Background:
Efficient video codec techniques are key to multimedia data storage and transmission, and advanced codec techniques are generally embodied in standards. Typical current video compression standards include the MPEG series of international standards proposed by the Moving Picture Experts Group (MPEG) under the International Organization for Standardization (ISO), the H.26x series of video compression standards proposed by the International Telecommunication Union (ITU), and the JVT video coding standard being formulated by the Joint Video Team (JVT) established jointly by ISO and ITU. The JVT standard employs novel coding techniques and is significantly more efficient than any existing coding standard. Its formal designation in ISO is Part 10 of the MPEG-4 standard, and in ITU it is the H.264 standard.
Video encoding is the process of encoding each frame of a video sequence. In the JVT video coding standard, each frame is coded on a macroblock basis, and a frame may be coded as an intra (I) frame, a predictive (P) frame, a bidirectionally predictive (B) frame, and so on. I-frame coding is characterized by requiring no reference to other frames during encoding and decoding. In general, I-frame, P-frame and B-frame coding alternate, for example in the order IBBPBBP. For some special applications, however, such as those requiring low computational complexity, low storage capacity, or real-time compression, only I-frames may be used. In addition, all-I-frame video is easier to edit. In I-frame coding, redundancy within a macroblock is eliminated by an orthogonal transform such as the Discrete Cosine Transform (DCT) or a wavelet transform, while traditional coding algorithms eliminate redundancy between macroblocks by prediction in the transform-coefficient domain. Such prediction, however, can only be performed on the DC component and is therefore inefficient.
In I-frame coding, multi-directional spatial prediction is the mainstream of current research and has achieved good results. Spatial prediction within a frame means that, during I-frame encoding and decoding, a prediction of the current block is first generated according to a certain mode from information within the frame that is also available at the decoder (such as adjacent reconstructed blocks); the predicted block is then subtracted from the actual block to be encoded, and the resulting residual is encoded.
Multi-directional spatial prediction has been applied to video coding with success, and the JVT video coding standard uses it. However, existing multi-directional spatial prediction techniques have two major drawbacks. First, they produce a severe flicker phenomenon when applied to consecutive I-frame coding, which degrades the visual quality. Second, multi-directional spatial prediction changes the probability distribution of the residual image over the coefficient domain, yet existing methods still use a fixed zigzag scanning order for the transform coefficients, see fig. 4. The scanning order is the coding order of the coefficients of a transformed and quantized block and has a large impact on coding efficiency; current coding systems (JPEG, MPEG, etc.) apply the same fixed scanning order to all blocks of the same size, so the coding efficiency is not optimal. JVT is an efficient video coding standard currently being formulated; it was initiated by the ITU (International Telecommunication Union) and subsequently adopted by the ISO/IEC international standards organization as Part 10 of ISO/IEC 14496 (MPEG-4).
However, when JVT is used to encode all-I-frame video, the reconstructed image may flicker during playback. Analysis and verification indicate that this is mainly caused by the variable block size and the relative randomness of intra prediction during encoding.
Variable block size means that a coded macroblock may be subdivided into smaller sub-blocks according to the encoding mode; different partition modes produce sub-blocks of different sizes. Variable block size causes flicker mainly because blocks at the same position, with essentially unchanged content, in two successive frames may be partitioned differently during encoding, so that their reconstructions differ considerably. This can be partially avoided by appropriately modifying the encoding strategy of the encoder, without modifying the decoder. See fig. 1, a schematic diagram of the subdivision of a macroblock in JVT into various sub-blocks.
Conventional coding schemes typically use only inter-frame prediction to remove temporal redundancy, while spatial redundancy is removed by various transforms. JVT introduces intra-frame prediction, used together with transform coding to eliminate spatial redundancy, which greatly improves coding efficiency. Specifically, there are two macroblock partition modes, Intra4 × 4 and Intra16 × 16, with 9 prediction modes in the Intra4 × 4 mode and 4 in the Intra16 × 16 mode. Referring to fig. 2, in Intra4 × 4 mode intra prediction is performed for each sub-block of a macroblock: the pixels of each 4 × 4 block are predicted from the 17 already-decoded pixels of the neighboring blocks.
Referring to fig. 3, the intra prediction modes are divided into 9 (mode 0 to mode 8), wherein mode 2 is the DC prediction of the MPEG-4 standard.
In Intra16 × 16, let the pixels of the block to be predicted be P(x, y), x, y = 0..15; the neighboring pixels to the left of the block are P(-1, y), y = 0..15, and the neighboring pixels above the block are P(x, -1), x = 0..15. Four prediction modes are defined: vertical prediction, horizontal prediction, DC prediction, and plane prediction. Chroma blocks likewise have four prediction modes, substantially similar to those of luma blocks.
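For illustration, a minimal Python sketch of the vertical, horizontal and DC predictions for a 16 × 16 luma block follows; the function name and array layout are our own, and the plane mode and the standard's boundary handling are omitted:

    import numpy as np

    def predict_16x16(left, top, mode):
        # left: the 16 reconstructed pixels P(-1, y), y = 0..15
        # top:  the 16 reconstructed pixels P(x, -1), x = 0..15
        if mode == "vertical":      # each column repeats the pixel above it
            return np.tile(top, (16, 1))
        if mode == "horizontal":    # each row repeats the pixel to its left
            return np.tile(left.reshape(16, 1), (1, 16))
        if mode == "dc":            # every pixel is the rounded mean of all 32 neighbors
            return np.full((16, 16), (int(left.sum()) + int(top.sum()) + 16) >> 5)
        raise NotImplementedError("plane prediction omitted in this sketch")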
The relative randomness of intra prediction means that, owing to differences in the in-frame information used to generate the prediction and in the prediction modes, the predicted values of blocks at the same position, with essentially unchanged content, in two successive frames generally differ. In a system without intra prediction, consecutive I-frame coding removes small differences between corresponding blocks through transform-domain quantization; the more similar the two macroblocks are, the greater the probability that this difference is removed, and if the two blocks are identical the reconstructed images are identical as well. In a system with intra prediction, however, the reconstructed image is the sum of two parts: a predicted image and a reconstructed residual image. Because of the frequency-domain quantization process, the frequency-domain coefficients of the reconstructed residual are integer multiples of the quantization step. The predicted image undergoes no such process, so the probability that its frequency-domain coefficients happen to be integer multiples of the quantization step is low; in fact, if its coefficients are divided by the quantization step, the fractional parts of the results can be regarded as random numbers uniformly distributed between 0 and 1 (when the quantization step is not large), i.e., every value between 0 and 1 (including 0) is equally likely.
Since the reconstructed image is the sum of these two parts, the fractional parts of its frequency-domain coefficients divided by the quantization step can likewise be regarded as random numbers between 0 and 1. For two corresponding blocks with similar pixel values in successive frames, this relative randomness of the reconstruction in the frequency domain means that the similarity of the reconstructed images is only loosely related to the similarity of the original images: even if the original blocks are identical, the probability that their reconstructions are identical is small.
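A small numeric illustration of this argument (the numbers are invented for exposition): let the quantization step be 10. Every frequency-domain coefficient of the reconstructed residual is then a multiple of 10, whereas a coefficient of the predicted image might be, say, 37; divided by the step this gives 3.7, whose fractional part 0.7 behaves as a random draw from [0, 1). The reconstructed coefficient 37 + 20 = 57 inherits that random fractional part, so two nearly identical source blocks in successive frames can reconstruct to visibly different values.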
Referring to figs. 5 and 6, the existing encoding process is as follows: under the control of a mode selection module, the prediction module processes the reconstructed image and outputs a predicted image; the predicted image and the current image to be coded are processed by a residual coefficient calculation module, then by a scanning module and entropy coding, and finally the coded bitstream is output.
Referring to fig. 7, the signal flow of the decoding part of a prior-art multi-directional spatial prediction coding system is: the decoded bitstream is entropy decoded, inverse quantized and inverse transformed, compensated with the predicted image, and output as a video stream.
This encoding/decoding process cannot overcome the flicker produced when a video coding method based on multi-directional spatial prediction is applied to consecutive I-frame coding, and because a fixed scanning order is adopted, the coding efficiency of such a method cannot be improved.
Disclosure of Invention
The main object of the present invention is to provide a novel spatial prediction method for video coding that overcomes the flicker produced when a video coding method based on multi-directional spatial prediction is applied to consecutive I-frame coding, and that improves the coding efficiency of such methods. The invention provides an anti-flicker consecutive I-frame coding scheme and a mode-based residual coefficient scanning scheme for multi-directional spatial prediction, reducing flicker while maintaining coding efficiency.
Another object of the present invention is to provide such a spatial prediction method taking the JVT standard as an implementation case, giving the JVT standard specific technical means for solving the flicker problem of consecutive I-frame coding and improving coding efficiency.
It is a further object of the present invention to provide a novel spatial prediction apparatus for video coding, a specific apparatus for implementing the above method.
The purpose of the invention is realized by the following technical scheme:
A novel spatial prediction method for video coding: during encoding, the predicted image is also transformed to the frequency domain, using the same transform as is applied to the residual image, and quantized with the same quantization coefficients; the result is then used as the predicted image. During decoding, the predicted image is transformed and quantized in the frequency domain by the same processing as during encoding, and the quantized predicted image is then compensated onto the decoded residual image.
The encoding process specifically comprises:
step 100: generating a predicted image from a decoded image of a block adjacent to a current encoding block according to the selected prediction mode;
step 101: transforming the predicted image to the frequency domain;
step 102: quantizing the frequency domain coefficients of the predicted image with the same quantization coefficients as are used when processing the residual image; the matrix of quantized frequency domain coefficients satisfies the following formula:
Z=Q(Y)=(Y×Quant(Qp)+Qconst(Qp))>>Q_bit(Qp)
wherein,
Z is the quantized frequency domain coefficient matrix,
Y is the frequency domain coefficient matrix,
Qp is the quantization parameter,
Quant(Qp), Qconst(Qp) and Q_bit(Qp) are the quantization functions defined by JVT;
step 103: inverse quantizing the matrix of frequency domain coefficients obtained in step 102 according to the following formulas;
W=DQ(Z)=(Z×DQuant(Qp)+DQconst(Qp))>>Q_per(Qp)
DQuant(Qp)×Quant(Qp)≈2^(Q_per(Qp)+Q_bit(Qp))
wherein,
W is the inverse-quantized frequency domain coefficient matrix,
Z is the frequency domain coefficient matrix after quantization and before inverse quantization,
Qp is the quantization parameter,
DQuant(Qp), DQconst(Qp), Q_bit(Qp) and Q_per(Qp) are the quantization functions defined by JVT;
step 104: transforming, by the same method as in step 101, the current block to be encoded to the frequency domain to obtain a frequency domain image;
step 105: subtracting the inverse-quantized frequency domain coefficient matrix from the frequency domain image to obtain the frequency domain residual image directly;
step 106: quantizing the frequency domain residual image, with the same formula as above, to obtain quantized frequency domain residual coefficients;
step 107: scanning the frequency domain residual coefficients and entropy coding them to obtain the bitstream;
step 108: compensating the frequency domain coefficient matrix onto the frequency domain residual coefficients according to the following formula;
C=C+Z;
wherein,
C is the frequency domain residual coefficient matrix,
Z is the frequency domain coefficient matrix;
step 109: inverse quantizing the frequency domain residual coefficients using the JVT formula;
step 110: inverse transforming the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image;
step 111: filtering the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
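As a concrete rendering of steps 100 through 111, a minimal Python sketch of this frequency-domain encoding flow follows. The transform matrix T, the quantization function q and the two inverse quantization functions are assumed to be supplied according to the formulas above; all names and signatures are illustrative only:

    def encode_block(cur_block, pred, qp, T, q, dq_new, dq_jvt, inv_transform):
        # q(X, qp):      quantization, (X * Quant(Qp) + Qconst(Qp)) >> Q_bit(Qp)
        # dq_new(Z, qp): the inverse quantization of step 103, which restores
        #                the pre-quantization scale
        # dq_jvt(Z, qp): the standard JVT inverse quantization of step 109
        Y = T @ pred @ T.T                # steps 100-101: prediction to frequency domain
        Z = q(Y, qp)                      # step 102: quantize the prediction
        W = dq_new(Z, qp)                 # step 103: inverse quantize to original scale
        F = T @ cur_block @ T.T           # step 104: current block to frequency domain
        C = q(F - W, qp)                  # steps 105-106: quantized frequency-domain residual
        # step 107: scan C and entropy-code it (omitted here)
        C = C + Z                         # step 108: compensate the quantized prediction
        B = inv_transform(dq_jvt(C, qp))  # steps 109-110: preliminary reconstruction
        return B                          # step 111, the deblocking filter, is omitted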
Before encoding, the method further includes determining a scanning order based on the prediction mode, as follows: for each mode, the probability that the coefficient at each frequency of the residual image is non-zero is counted, and a scanning order table is generated in descending order of these probabilities, replacing the single zigzag scan table. Zigzag scanning refers to the coding order of the coefficients of a transformed and quantized block in a video coding scheme, which has a great influence on coding efficiency. Referring to fig. 4, current coding systems (JPEG, MPEG, etc.) commonly use one fixed scanning order for all blocks of the same size.
During encoding, the scanning order table is looked up according to the selected mode, and the residual coefficients are scanned in the order of the looked-up positions.
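A minimal sketch of how such a mode-dependent scan table might be built offline from training statistics; the data layout and all names are assumptions, only the resulting order matters:

    import numpy as np

    def build_scan_tables(residuals_by_mode):
        # residuals_by_mode[m]: list of quantized 4x4 residual coefficient
        # blocks observed under prediction mode m in a training set
        tables = {}
        for m, blocks in residuals_by_mode.items():
            # probability that the coefficient at each position is non-zero
            p_nonzero = np.mean([b != 0 for b in blocks], axis=0)
            # positions sorted from most to least likely to be non-zero;
            # the i-th entry is the position T(m, i) scanned i-th
            order = np.argsort(-p_nonzero, axis=None, kind="stable")
            tables[m] = [divmod(int(pos), 4) for pos in order]
        return tables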
The decoding process comprises the following steps:
step 200: obtaining the prediction mode and the frequency domain residual coefficients by entropy decoding;
step 201: generating a predicted image from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
step 202: transforming the predicted image to the frequency domain;
step 203: quantizing the frequency domain coefficients of the predicted image to obtain the frequency domain coefficient matrix;
step 204: compensating the frequency domain coefficient matrix onto the frequency domain residual coefficients according to the following formula,
C=C+Z
wherein,
C is the frequency domain residual coefficient matrix,
Z is the frequency domain coefficient matrix;
step 205: inverse quantizing the frequency domain residual coefficients;
step 206: inverse transforming the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image;
step 207: filtering the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
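A matching minimal sketch of this decoding process, under the same assumptions as the encoding sketch above:

    def decode_block_v1(C, pred, qp, T, q, dq_jvt, inv_transform):
        # C: entropy-decoded frequency domain residual coefficients (steps 200-201
        #    also yield the prediction mode used to build pred)
        Z = q(T @ pred @ T.T, qp)         # steps 202-203: transform and quantize prediction
        C = C + Z                         # step 204: compensate in the quantized domain
        B = inv_transform(dq_jvt(C, qp))  # steps 205-206: preliminary reconstruction
        return B                          # step 207, the deblocking filter, is omitted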
The decoding process may also be:
step 210: obtaining the prediction mode and the frequency domain residual coefficients by entropy decoding;
step 211: generating a predicted image from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
step 212: transforming the predicted image to the frequency domain;
step 213: quantizing the frequency domain coefficients of the predicted image to obtain the frequency domain coefficient matrix;
step 214: inverse quantizing the frequency domain coefficients and the frequency domain residual coefficients respectively;
step 215: compensating the inverse-quantized frequency domain coefficient matrix onto the inverse-quantized frequency domain residual coefficients;
step 216: inverse transforming the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image;
step 217: filtering the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
The decoding process may alternatively be:
step 220: obtaining the prediction mode and the frequency domain residual coefficients by entropy decoding;
step 221: generating a predicted image from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
step 222: transforming the predicted image to the frequency domain;
step 223: quantizing the frequency domain coefficients of the predicted image to obtain the frequency domain coefficient matrix;
step 224: inverse quantizing the frequency domain coefficients and the frequency domain residual coefficients respectively;
step 225: inverse transforming the frequency domain coefficients and the frequency domain residual coefficients respectively, according to the block size and mode;
step 226: compensating the inverse-quantized and inverse-transformed frequency domain coefficient matrix onto the frequency domain residual coefficients to obtain a preliminary reconstructed image;
step 227: filtering the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
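Condensed sketches of these two decoding variants follow (same assumed helpers as above; note that both the prediction path and the residual path must be brought to a common scale by the inverse quantization for the compensation to be meaningful):

    def decode_block_v2(C, pred, qp, T, q, dq_jvt, inv_transform):
        Z = q(T @ pred @ T.T, qp)               # steps 212-213
        R = dq_jvt(C, qp) + dq_jvt(Z, qp)       # steps 214-215: compensate after inverse quantization
        return inv_transform(R)                 # step 216; deblocking omitted

    def decode_block_v3(C, pred, qp, T, q, dq_jvt, inv_transform):
        Z = q(T @ pred @ T.T, qp)               # steps 222-223
        rec_res = inv_transform(dq_jvt(C, qp))  # steps 224-225, residual path
        rec_pred = inv_transform(dq_jvt(Z, qp)) # steps 224-225, prediction path
        return rec_res + rec_pred               # step 226: compensate in the pixel domain; deblocking omitted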
In decoding, likewise, the scanning order table is looked up according to the selected mode, and the residual coefficients are scanned in the order of the looked-up positions.
A novel spatial prediction apparatus for video coding comprises at least an encoding module and a decoding module; wherein,
the encoding module is provided with at least: a prediction module, a residual coefficient calculation module, a scanning module and an entropy coding module; the prediction module processes the input reconstructed image to obtain a predicted image; the residual coefficient calculation module uses the predicted image to compensate the current image to be coded; the result is then processed by the scanning module and coded and output by the entropy coding module;
the decoding module is provided with at least: an entropy decoding module, a compensation module, an inverse quantization module, an inverse transform module and a transform quantization module; the input bitstream is subjected in turn to entropy decoding, inverse quantization and inverse transformation, and the transform quantization module processes the predicted image to obtain the compensation information for the decoding process.
The coding specifically comprises:
the prediction module generates a predicted image from the decoded images of blocks adjacent to the current coding block, according to the prediction mode selected by the mode selection module;
the residual coefficient calculation module transforms the predicted image to the frequency domain and quantizes its frequency domain coefficients, using the same quantization coefficients as when processing the residual image; the matrix of quantized frequency domain coefficients satisfies the following formula:
Z=Q(Y)=(Y×Quant(Qp)+Qconst(Qp))>>Q_bit(Qp)
wherein,
Z is the quantized frequency domain coefficient matrix;
Y is the frequency domain coefficient matrix;
Qp is the quantization parameter;
Quant(Qp), Qconst(Qp) and Q_bit(Qp) are the quantization functions defined by JVT;
the residual coefficient calculation module inverse quantizes the obtained matrix of frequency domain coefficients according to the following formulas;
W=DQ(Z)=(Z×DQuant(Qp)+DQconst(Qp))>>Q_per(Qp)
DQuant(Qp)×Quant(Qp)≈2^(Q_per(Qp)+Q_bit(Qp))
wherein,
W is the inverse-quantized frequency domain coefficient matrix,
Z is the frequency domain coefficient matrix after quantization and before inverse quantization,
Qp is the quantization parameter,
DQuant(Qp), DQconst(Qp), Q_bit(Qp) and Q_per(Qp) are the quantization functions defined by JVT;
the residual coefficient calculation module transforms, by the same method, the current block to be coded to the frequency domain to obtain a frequency domain image; it then subtracts the inverse-quantized frequency domain coefficient matrix from it to obtain the frequency domain residual image directly, and quantizes the frequency domain residual image to obtain quantized frequency domain residual coefficients;
the scanning module scans the frequency domain residual coefficients, and the entropy coding module codes the scanned information to obtain the bitstream;
the compensation module compensates the frequency domain coefficient matrix onto the frequency domain residual coefficients according to the following formula;
C=C+Z,
wherein,
C is the frequency domain residual coefficient matrix;
Z is the frequency domain coefficient matrix;
the inverse quantization module inverse quantizes the frequency domain residual coefficients according to the JVT formula; the inverse transform module inverse transforms the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image; and the filtering module filters the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
The encoding module and/or the decoding module is further provided with a mode selection module for determining the scanning order based on the prediction mode; it controls mode selection in the prediction module and the scanning module and improves encoding and/or decoding efficiency. The scanning module counts, for each mode, the probability that the coefficient at each frequency of the residual image is non-zero, and generates a scanning order table in descending order of these probabilities, replacing the single zigzag scan table.
The decoding specifically comprises:
the entropy decoding module decodes the input bitstream to obtain the prediction mode and the frequency domain residual coefficients; a predicted image is then generated from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
the transform quantization module transforms the predicted image to the frequency domain and quantizes its frequency domain coefficients to obtain the frequency domain coefficient matrix;
the compensation module compensates the frequency domain coefficient matrix to the frequency domain residual coefficient according to the following formula,
C=C+Z
wherein,
C is the frequency domain residual coefficient matrix;
Z is the frequency domain coefficient matrix;
the inverse quantization module inverse quantizes the frequency domain residual coefficients;
the inverse transform module inverse transforms the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image; finally, the filtering module filters the reconstructed image to remove blocking artifacts, obtaining the output image of the current block.
The decoding may specifically be:
the entropy decoding module decodes the input bitstream to obtain the prediction mode and the frequency domain residual coefficients; a predicted image is then generated from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
the transform quantization module transforms the predicted image to the frequency domain and quantizes its frequency domain coefficients to obtain the frequency domain coefficient matrix;
inverse quantization modules, placed after the entropy decoding module and the transform quantization module respectively, inverse quantize the frequency domain residual coefficients and the frequency domain coefficients;
the compensation module compensates the inverse-quantized frequency domain coefficient matrix onto the inverse-quantized frequency domain residual coefficients;
the inverse transform module inverse transforms the frequency domain residual coefficients according to the block size and mode to obtain a preliminary reconstructed image; finally, the reconstructed image is filtered to remove blocking artifacts, obtaining the output image of the current block.
The decoding may specifically be:
the entropy decoding module decodes the input bitstream to obtain the prediction mode and the frequency domain residual coefficients; a predicted image is then generated from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding;
the transform quantization module transforms the predicted image to the frequency domain and quantizes its frequency domain coefficients to obtain the frequency domain coefficient matrix;
inverse quantization modules, placed after the entropy decoding module and the transform quantization module respectively, inverse quantize the frequency domain residual coefficients and the frequency domain coefficients;
the inverse transform modules following the inverse quantization modules inverse transform the frequency domain coefficients and the frequency domain residual coefficients respectively, according to the block size and mode;
the compensation module compensates the inverse-quantized and inverse-transformed frequency domain coefficient matrix onto the frequency domain residual coefficients to obtain a preliminary reconstructed image; finally, the reconstructed image is filtered to remove blocking artifacts, obtaining the output image of the current block.
During decoding, the decoding module likewise looks up the scanning order table according to the selected mode and scans the residual coefficients in the order of the looked-up positions.
Analysis of the above technical solutions shows that the invention has the following advantages:
1. The novel spatial prediction method for video coding overcomes the flicker produced when multi-directional spatial prediction is applied to consecutive I-frame coding and improves the coding efficiency of such methods; it provides an anti-flicker consecutive I-frame coding scheme and a mode-based residual coefficient scanning scheme for multi-directional spatial prediction, reducing flicker while maintaining coding efficiency.
2. Taking the JVT standard as an implementation case, the invention provides the JVT standard with technical means for solving the flicker problem of consecutive I-frame coding and improving its coding efficiency.
3. The apparatus of the invention provides a specific system structure for implementing the method, together with the hardware modules and their combinations for realizing the system.
Detailed Description
The present invention is further illustrated in detail below with reference to specific examples:
the invention provides a novel space prediction method and a device thereof for video coding, aiming at effectively reducing the flicker phenomenon generated during continuous I frame coding in a video coding method based on multidirectional space prediction; and the scanning sequence is determined according to the prediction mode, so that the coding efficiency of the video coding method based on the multidirectional spatial prediction is effectively improved.
In the JVT coding standard, an embodiment of the present invention implements the anti-flicker processing by using the following steps:
referring to figures 7 and 8 of the drawings,
and (3) processing at the encoding end:
1. and (3) generating a predicted image: and generating a predicted image from the decoded image of the adjacent block of the current coding block according to the selected prediction mode. This step is the same as the original step of JVT;
2. The predicted image is transformed to the frequency domain, using the same transform as is applied to residual images in JVT. For example, for a 4 × 4 block with input X, the output Y is:

Y = C × X × C^T,  with  C = | 1  1  1  1 |
                            | 2  1 -1 -2 |
                            | 1 -1 -1  1 |
                            | 1 -2  2 -1 |

wherein Y is the frequency domain coefficient matrix of the predicted image and X is the predicted image.
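A direct Python rendering of this transform (a sketch; the matrix is the JVT 4 × 4 core transform, the function name is ours):

    import numpy as np

    # JVT 4x4 forward core transform matrix
    C4 = np.array([[1,  1,  1,  1],
                   [2,  1, -1, -2],
                   [1, -1, -1,  1],
                   [1, -2,  2, -1]])

    def forward_transform_4x4(X):
        # Y = C4 * X * C4^T, all in integer arithmetic
        return C4 @ X @ C4.T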
3. The frequency domain coefficients Y of the predicted image are quantized, with the same quantization parameter Qp as used when processing the residual image. Let the matrix of quantized frequency domain coefficients be Z; the quantization formula is:
Z=Q(Y)=(Y×Quant(Qp)+Qconst(Qp))>>Q_bit(Qp)
4. Z is inverse quantized to obtain W. The inverse quantization here differs from that in JVT: the inverse quantization of JVT does not return to the same scale as before quantization.
In JVT, the inverse quantization formula is:
W=DQ(Z)=(Z×DQuant(Qp)+DQconst(Qp))>>Q_per(Qp)
To restore the quantized coefficients to their scale before quantization, the inverse quantization formula must be redesigned. The inverse quantization formula of the invention is:
W=DQ'(Z)=(Z×DQuant'(Qp)+DQconst'(Qp))>>Q_per'(Qp)
Moreover, the new formula must satisfy:
DQuant'(Qp)×Quant(Qp)≈2^(Q_per'(Qp)+Q_bit(Qp))
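This constraint can be checked numerically; a minimal sketch in Python (the arguments are placeholder values, not the JVT tables):

    def scale_restored(quant, dquant_new, q_per_new, q_bit, tol=0.05):
        # the redesigned inverse quantization restores the original scale iff
        # DQuant'(Qp) * Quant(Qp) is approximately 2 ** (Q_per'(Qp) + Q_bit(Qp))
        target = 2 ** (q_per_new + q_bit)
        return abs(dquant_new * quant - target) / target < tol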
5. The current block I to be coded is transformed to the frequency domain to obtain the frequency domain image F, by the same method as above;
6. W is subtracted from F to obtain the frequency domain residual image S directly;
7. S is quantized to obtain the quantized frequency domain residual coefficients C, with the same formula as above;
8. C is coefficient-scanned and entropy coded to obtain the bitstream;
9. Z is compensated onto C, i.e., C = C + Z;
10. C is inverse quantized, using the JVT formula;
11. C is inverse transformed to obtain the preliminary reconstructed image B, using the original JVT inverse transform according to the block size and mode;
12. B is filtered to remove blocking artifacts, yielding the output image O of the current block.
Referring to figs. 10 and 11, the first process at the decoding end is:
1. Entropy decoding yields the prediction mode and the frequency domain residual coefficients C;
2. Generating the predicted image: a predicted image is generated from the decoded images of blocks adjacent to the current decoded block, according to the prediction mode obtained by entropy decoding. This step is the same as the original JVT step.
3. The predicted image is transformed to the frequency domain, by the same transform as used for residual images in JVT. This is the same as step 2 of encoding.
4. The frequency domain coefficients Y of the predicted image are quantized to obtain Z. This is the same as step 3 of encoding.
5. Z is compensated onto C, i.e., C = C + Z. Same as encoding step 9.
6. C is inverse quantized, using the JVT formula. Same as encoding step 10.
7. C is inverse transformed to obtain the preliminary reconstructed image B, using the original JVT inverse transform according to the block size and mode. Same as encoding step 11.
8. B is filtered to remove blocking artifacts, yielding the output image O of the current block. Same as encoding step 12.
Referring to figs. 12 and 13, the second and third decoding-end processes of the present invention are basically the same as the first, the difference being the position of the compensation: the first process compensates directly in the quantized domain, the second after inverse quantization, and the third after inverse transformation. Accordingly, in the second and third schemes the quantized predicted image is additionally inverse quantized, or inverse quantized and inverse transformed, before compensation.
Determining the scanning order based on the prediction mode can effectively improve the coding efficiency of video coding methods based on multi-directional spatial prediction. In the JVT coding standard, the prediction-mode-based scanning module is implemented in the following steps:
In the design stage: first, for each mode, the probability that the coefficient at each frequency of the residual image is non-zero is counted; then a scanning order table is generated in descending order of these probabilities (expressed, for example, by a matrix T(m, i): the position of the i-th scanned coefficient in mode m is T(m, i)). This table replaces the single zigzag scan table z(i).
In the encoding and decoding stage:
When scanning, according to the selected mode m, the scanning order table T is looked up in increasing order of i, and the residual coefficients are scanned in the order of the looked-up positions.
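In code, the scan then reduces to one table lookup per coefficient; a sketch, with tables T as built in the design stage above:

    import numpy as np

    def scan(block, m, T):
        # visit the 16 positions of a 4x4 block in the order T(m, 0), T(m, 1), ...
        return [block[r, c] for (r, c) in T[m]]

    def inverse_scan(coeffs, m, T):
        # reverse of scan: place coefficients back at their table positions
        block = np.zeros((4, 4), dtype=int)
        for i, (r, c) in enumerate(T[m]):
            block[r, c] = coeffs[i]
        return block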
Referring to fig. 9, the encoding module of the apparatus of the invention adds, relative to the prior-art apparatus, a new scanning module between the residual coefficient calculation module and the entropy coding module; the new scanning module selects the predetermined scanning order under the control of the mode selection module, improving processing efficiency.
Referring to figs. 11, 12 and 13, each decoding module of the apparatus of the invention adds a transform quantization module to the entropy decoding module, inverse quantization module, inverse transform module and prediction compensation module; the embodiments differ in that the prediction compensation module may be placed after the entropy decoding module, after the inverse quantization module, or after the inverse transform module. Taking fig. 11 as an example, the decoding process is: the bitstream is entropy decoded; the predicted image processed by the transform quantization module and the entropy-decoded bitstream information are compensated in the prediction compensation module, then processed in turn by the inverse quantization module and the inverse transform module, and output. Figs. 12 and 13 differ from fig. 11 in that the prediction compensation is located after the inverse quantization module or after the inverse transform module, respectively.
Finally, it should be noted that the above embodiments are intended to be illustrative rather than limiting. Although the invention has been described in detail with reference to the preferred embodiments disclosed above, those skilled in the art will appreciate that the invention may be modified and equivalents substituted; all such modifications and variations are intended to fall within the scope of the invention as defined by the following claims.