CN106998470B - Decoding method, encoding method, decoding apparatus, and encoding apparatus


Info

Publication number
CN106998470B
Authority
CN
China
Prior art keywords
pixel
image block
component
pixel signals
signal
Prior art date
Legal status
Active
Application number
CN201610050028.5A
Other languages
Chinese (zh)
Other versions
CN106998470A (en)
Inventor
曾兵
陈宸
朱树元
缪泽翔
张红
赵寅
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201610050028.5A
Priority to PCT/CN2017/071602
Publication of CN106998470A
Application granted
Publication of CN106998470B

Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136 - Incoming video signal characteristics or properties
    • H04N19/137 - Motion inside a coding unit, e.g. average field, frame or block difference
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13 - Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/60 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding
    • H04N19/61 - Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using transform coding in combination with predictive coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

Embodiments of the present invention provide a decoding method, an encoding method, a decoding device, and an encoding device. The decoding method includes: obtaining, from a code stream, the alternating current (AC) components and the direct current (DC) component residual of the transform quantization coefficients of a target image block; performing inverse quantization and inverse transformation on the AC components together with a preset DC component to obtain a transitional reconstructed image block of the target image block; determining a predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in a reference pixel area of the target image block; determining the original DC component of the target image block according to the predicted value of the DC component and the DC component residual; and performing inverse quantization and inverse transformation on the original DC component and the AC components of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.

Description

Decoding method, encoding method, decoding apparatus, and encoding apparatus
Technical Field
Embodiments of the present invention relate to the field of video encoding, decoding, and compression, and in particular to a decoding method, an encoding method, a decoding device, and an encoding device.
Background
Digital video capabilities can be incorporated into a wide range of devices, including digital televisions, digital direct broadcast systems, wireless broadcast systems, personal digital assistants (PDAs), laptop or desktop computers, tablet computers, e-book readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, cellular or satellite radio telephones, video conferencing devices, video streaming devices, and so forth. Digital video devices implement video compression techniques, such as those described in the MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), and ITU-T H.265 High Efficiency Video Coding (HEVC) standards and their extensions, to transmit and receive digital video information more efficiently. By implementing these video codec techniques, video devices can transmit, receive, encode, decode, and/or store digital video information more efficiently.
In the field of video encoding and decoding, a frame refers to one complete image; successive frames, played back in a certain sequence and at a certain frame rate, constitute a video. Once the frame rate reaches a certain speed, the interval between two frames falls below the resolution limit of the human eye and persistence of vision occurs, so that the frames appear to move continuously on the screen. The basis of video file compression is the compression coding of individual digital images, and a digitized image contains a great deal of repeated representation information, called redundant information. A frame often contains many regions with identical or similar spatial structure; for example, sampling points belonging to the same object or background usually have closely related, similar colors. Within a group of frames, one frame is generally highly correlated with the preceding or following frame, with only small differences in the pixel values describing them; these are the parts that can be compressed. For the same reason, a video file contains not only spatial redundancy but also a large amount of temporal redundancy, which results from the composition of video. For example, the frame rate of video sampling is typically 25 to 30 frames per second and in special cases may reach 60 frames per second; that is, the sampling interval between two adjacent frames is at most 1/25 to 1/30 of a second. In such a short time, the sampled pictures contain a large amount of similar information and are highly correlated. Original digital video recording systems, however, recorded each frame independently, without considering or exploiting these similarities, which results in a considerable amount of repeated, redundant data. In addition, research on the psychological characteristics of human visual sensitivity has shown that there is a further portion of video information that can be compressed, namely visual redundancy. Visual redundancy refers to appropriately compressing the video bit stream by exploiting the fact that the human eye is sensitive to luminance changes but relatively insensitive to chrominance changes. In high-brightness regions, the sensitivity of human vision to luminance changes declines; human vision is more sensitive to the edges of objects and relatively insensitive to their interior regions, and it is sensitive to overall structure while relatively insensitive to variations of internal detail. Since the ultimate consumer of video information is the human viewer, these characteristics of the human eye can be fully exploited to compress the original video information and achieve a better compression effect. In addition to the spatial, temporal, and visual redundancy mentioned above, video information also contains a series of other redundancies, such as information-entropy redundancy, structural redundancy, knowledge redundancy, and importance redundancy.
The purpose of video compression coding is to remove redundant information from a video sequence by various technical means, thereby reducing storage space and saving transmission bandwidth.
At the current state of the art, video compression processing techniques mainly include intra-frame prediction, inter-frame prediction, transform and quantization, entropy coding, and deblocking filtering. Across the internationally adopted standards, existing video compression coding relies mainly on four methods: chroma sampling, predictive coding, transform coding, and quantization coding.
Chroma sampling: the method makes full use of the psychovisual characteristics of human eyes, and tries to reduce the data volume of the single element description to the maximum extent from the data representation of the bottom layer. Most adopted in television systems is luminance-chrominance (YUV) color coding, which is a widely adopted standard for european television systems. The YUV color space includes a luminance signal Y and two color difference signals U and V, the three components being independent of each other. The mutually separated expression modes of the YUV color modes are more flexible, the transmission occupied bandwidth is less, and the model has more advantages than the traditional red, green and blue (RGB) color model. For example, the YUV 4:2:0 form indicates that the two chrominance components U and V have only half of the luminance Y component in both the horizontal and vertical directions, i.e., there are 4 luminance components Y in 4 sampling pixels, and there is only one chrominance component U and V. When this is expressed, the data amount is further reduced to about 33% of the original data amount. The aim of compressing video by using human physiological visual characteristics and the chrominance sampling mode is one of the video data compression modes widely adopted at present.
Predictive coding: data from previously encoded frames is used to predict the frame currently being encoded. Prediction yields a predicted value that is not exactly equal to the actual value; a certain residual remains between the two. The more accurate the prediction, the closer the predicted value is to the actual value and the smaller the residual, so encoding the residual greatly reduces the amount of data; at the decoding end, the original image is restored and reconstructed from the residual and the predicted value. This is the basic idea of predictive coding. In mainstream coding standards, predictive coding is divided into two basic types: intra-frame prediction and inter-frame prediction.
Transform coding: instead of encoding the original spatial-domain information directly, the information sample values are converted, according to some transform function, from the current domain into another artificially defined domain (usually called the transform domain), and compression coding is then performed according to the distribution characteristics of the information in that domain. The reason for transform coding is that video image data usually exhibit strong correlation in the spatial domain, so a large amount of redundant information exists and direct encoding requires a large number of bits; the correlation in the transform domain is greatly reduced, so the redundancy to be coded, and therefore the amount of data required, decrease substantially, yielding a higher compression ratio and a better compression effect. Typical transform codings are the Karhunen-Loève (K-L) transform and the Fourier transform. The integer Discrete Cosine Transform (DCT) is a transform coding scheme commonly used in many international standards.
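The energy-compaction property that motivates transform coding can be seen in a small sketch (an illustration under assumptions, not the patent's transform; it uses the orthonormal 2-D DCT from SciPy on an almost flat 8x8 block):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # Separable 2-D DCT-II with orthonormal scaling.
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.full((8, 8), 100.0) + 0.5 * np.random.randn(8, 8)  # nearly flat block
coeffs = dct2(block)

# coeffs[0, 0] is the DC component; for a nearly flat block almost all of the
# energy sits there, while the AC coefficients stay near zero.
print(coeffs[0, 0], np.abs(coeffs)[1:, :].max(), np.abs(coeffs)[0, 1:].max())
assert np.allclose(idct2(coeffs), block)  # the transform is invertible
```

The top-left coefficient of this transform is the DC component that the present application predicts rather than transmits directly.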
Quantization coding: transform coding does not itself compress data; the quantization process is the powerful means of compressing data and is the main source of data "loss" in lossy compression. Quantization forcibly maps input values with a large dynamic range onto a small number of output values. Because the quantization inputs span a wide range, many bits are needed to represent them, whereas the mapped output values span a small range and need only a few bits. Each quantization input is normalized to one quantization output, i.e., quantized into an order of magnitude, commonly called a quantization level (typically specified by the encoder).
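A minimal sketch of uniform scalar quantization (the step size here is an arbitrary assumption; real codecs derive it from a quantization parameter):

```python
import numpy as np

def quantize(coeffs, step):
    # Map values with a large dynamic range onto a small set of integer levels.
    return np.round(coeffs / step).astype(np.int64)

def dequantize(levels, step):
    # Reconstruction recovers only the level centers; the rounding error
    # is the "loss" in lossy compression.
    return levels.astype(np.float64) * step

coeffs = np.array([803.2, -41.7, 12.4, 0.8])
levels = quantize(coeffs, step=16.0)    # [ 50  -3   1   0]
recon = dequantize(levels, step=16.0)   # [800. -48.  16.   0.]
print(levels, recon, recon - coeffs)    # the last term is the quantization error
```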
In coding algorithms based on the hybrid coding architecture, the above compression coding methods are used together, and the encoder control module selects the coding mode for each image block according to the local characteristics of the blocks in the video frame. Frequency-domain or spatial-domain prediction is applied to blocks coded with intra-frame prediction, and motion-compensated prediction is applied to blocks coded with inter-frame prediction; the prediction residual is then transformed and quantized to form residual coefficients (also called transform quantization coefficients), and the final code stream is generated by the entropy encoder. To avoid accumulation of prediction errors, the reference signal for intra-frame or inter-frame prediction is obtained by a decoding module at the encoding end: the transformed and quantized residual coefficients are inverse-quantized and inverse-transformed to reconstruct the residual signal, which is added to the prediction reference signal to obtain the reconstructed image. Loop filtering may apply pixel-level correction to the reconstructed image, improving its coding quality.
However, as the demand for high-definition video increases, so does the demand for image coding efficiency; how to further improve coding efficiency has therefore become an urgent problem to be solved.
Disclosure of Invention
The application provides a decoding method, an encoding method, a decoding device and an encoding device, which can improve the encoding efficiency.
According to a first aspect of the present invention, there is provided a decoding method, including: obtaining the AC components and the DC component residual of the transform quantization coefficients of a target image block from a code stream; performing inverse quantization and inverse transformation on the AC components and a preset DC component to obtain a transitional residual, and adding the transitional residual to the predicted value of the target image block to obtain a transitional reconstructed image block of the target image block; determining a predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in a reference pixel area of the target image block; determining the original DC component of the target image block according to the predicted value of the DC component and the DC component residual; and performing inverse quantization and inverse transformation on the original DC component and the AC components of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.
For example, at the decoding end, a reconstructed transform quantization coefficient may be obtained from the code stream, where the reconstructed transform quantization coefficient includes the difference between a first DC component of the target image block and the predicted value of the DC component, together with the AC components of the target image block; a transitional transform quantization coefficient, consisting of a second DC component set to a preset value and the AC components, is inverse-quantized and inverse-transformed to obtain a transitional reconstructed image block of the target image block; the predicted value of the DC component is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block; the DC component (i.e., the original DC component) is determined from the difference between the DC component and its predicted value (i.e., the DC component residual) and the predicted value itself; and the initial transform quantization coefficient, consisting of the DC component and the AC components, is inverse-quantized and inverse-transformed to obtain the residual signal of the target image block.
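The decoding flow can be made concrete with a short sketch (an illustration only, not the patent's implementation: an orthonormal DCT and uniform quantization with step qstep are assumed, and predict_dc stands for any similarity-based DC predictor of the kind described below):

```python
import numpy as np
from scipy.fftpack import idct

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def decode_block(ac, dc_residual, prediction, predict_dc, qstep=16.0, preset_dc=0.0):
    # `ac`: quantized coefficient block whose DC slot will be overwritten;
    # `predict_dc`: hypothetical callable that maps the transitional
    # reconstruction to a predicted (quantized) DC level.
    coeffs = ac.astype(np.float64) * qstep      # inverse quantization of the AC part
    coeffs[0, 0] = preset_dc                    # DC slot forced to the preset value
    transitional = idct2(coeffs) + prediction   # transitional reconstructed block

    dc_pred = predict_dc(transitional)          # similarity-based DC prediction
    dc = dc_pred + dc_residual                  # recover the original DC level
    coeffs[0, 0] = dc * qstep                   # inverse quantization of the DC part
    residual = idct2(coeffs)                    # residual signal of the block
    return residual + prediction                # decoded image block
```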
According to a second aspect of the present invention, there is provided an encoding method, including: transforming and quantizing the residual signal of a target image block to obtain the direct current (DC) component and the alternating current (AC) components of the transform quantization coefficients of the target image block; performing inverse quantization and inverse transformation on the AC components and a preset DC component to obtain a transitional residual, and adding the transitional residual to the predicted value of the target image block to obtain a transitional reconstructed image block of the target image block; determining a predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in a reference pixel area of the target image block; determining the DC component residual of the target image block according to the predicted value of the DC component and the original DC component of the target image block; and writing the AC components and the DC component residual into the code stream.
For example, at the encoding end, the residual signal of the target image block may be transformed and quantized to obtain the initial transform quantization coefficient of the target image block, which includes a first DC component and the AC components; a transitional transform quantization coefficient, consisting of a second DC component set to a preset value and the AC components, is inverse-quantized and inverse-transformed to obtain a transitional reconstructed image block of the target image block; the predicted value of the DC component (i.e., of the original DC component) is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block; and the reconstructed transform quantization coefficient, which includes the difference between the DC component and its predicted value (i.e., the DC component residual) together with the AC components, is written into the code stream.
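The mirror-image encoding flow, under the same assumptions (orthonormal DCT, uniform quantization, hypothetical predict_dc):

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

def encode_block(block, prediction, predict_dc, qstep=16.0, preset_dc=0.0):
    residual = block - prediction
    levels = np.round(dct2(residual) / qstep)   # transform + quantization
    dc = levels[0, 0]                           # original DC component

    coeffs = levels * qstep                     # inverse quantization
    coeffs[0, 0] = preset_dc                    # DC slot forced to the preset value
    transitional = idct2(coeffs) + prediction   # transitional reconstructed block

    dc_pred = predict_dc(transitional)          # similarity-based DC prediction
    dc_residual = dc - dc_pred                  # written to the code stream
    ac = levels.copy()                          # AC components (DC slot unused)
    return ac, dc_residual
```

A decoder built as in the first aspect recovers dc as dc_pred + dc_residual, so encoder and decoder must use the same predict_dc.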
Based on the technical solutions of the application, the transform quantization coefficients can be inverse-quantized and inverse-transformed with the DC component set to a preset value, yielding a transitional reconstructed image block of the target image block, and the predicted value of the DC component is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block. When encoding or decoding the target image block, the DC component residual can then be used in place of the DC component. Since the absolute value of the DC component residual is smaller than that of the DC component, coding efficiency is improved.
In addition, according to the technical solutions of the application, the predicted value of the DC component is determined from the similarity between the pixels of the transitional reconstructed image block (whose DC component is set to a preset value) and the pixels in the reference pixel area. Compared with a scheme that directly uses the DC component of the reference pixel area as the predicted value of the DC component of the target image block, this improves the precision of the predicted value, so the DC component residual is smaller and coding efficiency is higher.
In some implementations, determining the predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in the reference pixel area of the target image block includes: determining, in the direction specified by the prediction mode of the target image block, at least one line and, for each line, a first group of pixel signals and an adjacent second group of pixel signals, where the first group of pixel signals includes first pixel signals of the transitional reconstructed image block, the second group of pixel signals includes second pixel signals of the reference pixel area, and the first and second pixel signals are adjacent; solving for the offset of the first pixel signals that minimizes, over the at least one line, the sum of squares formed from the second-order gradient of the reconstructed signal (the first pixel signals with the offset added) and the second-order gradient of the second pixel signals, where the offset, added to the pixel signals used to represent the second-order gradient in the transitional reconstructed image block, represents the predicted value of the DC component before quantization; and quantizing the solved value of the offset to obtain the predicted value of the DC component. Because the second-order gradient represents the correlation or similarity between the pixels of the target image block and the reference pixels more accurately, the predicted value of the DC component obtained from the second-order gradient has higher precision, the DC component residual is smaller, and coding efficiency is further improved.
In some implementations, the first and second groups of pixel signals corresponding to each of the at least one line satisfy one of the following two formulas (both formulas appear only as images in the original publication):

[Formula 1: threshold test for the case in which a second-order gradient is present in the reconstructed signal of the first group of pixel signals, with threshold λ1]

[Formula 2: threshold test for the case in which no second-order gradient is present in the reconstructed signal of the first group of pixel signals, with threshold λ2]

where the quantities tested are the second-order gradient of the first group of pixel signals and the second-order gradient of the second group of pixel signals, i_k is the index of a line among the at least one line, and j is the index of a pixel signal on each of the at least one line.

The offset is calculated according to the following formula (also given only as an image in the original publication):

[Formula 3: least-squares solution for the offset δx, minimizing the sum over the selected lines of the squared differences between the second-order gradient of the reconstructed signal of the first pixel signal and the second-order gradient of the second pixel signal]

where δx is the offset, C represents the set of lines satisfying the two formulas above, and q is the number of pixel signals on each of the at least one line. Because only the pixels on lines with strong correlation are selected for calculating the predicted value of the DC component, the precision of the predicted value of the DC component is further improved, the DC component residual is smaller, and coding efficiency is further improved.
In some implementations, [Formula 4] applies; in other implementations, [Formula 5] applies. Both formulas are given only as images in the original publication.
in some implementations, determining the prediction value of the DC component of the transform quantized coefficient of the target image block according to the similarity of the pixels in the reference pixel area of the transitional reconstructed image block and the target image block includes: determining a plurality of pixel pairs, a first pixel signal and an adjacent second pixel signal corresponding to each pixel pair in the direction specified by the prediction mode of the target image block, wherein the first pixel signal is a pixel signal of the transient reconstruction image block, and the second pixel signal is a pixel signal of a reference pixel area; adding an offset to the first pixel signal to obtain a reconstructed signal of the first pixel signal, wherein the offset is used for representing a predicted value of the DC component before quantization; solving for the offset such that a sum of squares of first order gradients of reconstructed signals of first pixel signals and second pixel signals of the plurality of pixel pairs is minimized; and quantizing the value of the offset obtained by solving to obtain a predicted value of the DC component.
In certain implementations, the predicted value of the DC component is calculated according to the following formula (given as an image in the original publication; from the definitions below, the least-squares solution is the mean difference over the pixel pairs):

δx = (1/n) Σ_{i=1..n} (y_i − x̂_i)

where δx is the offset, n is the number of pixels in each row or column of the transitional reconstructed image block, x̂_i is the first pixel signal, y_i is the second pixel signal, and x̂_i and y_i are adjacent in the direction specified by the prediction mode. Because only the pixels adjacent across the boundary between the transitional reconstructed image block and the reference pixel area are selected for calculating the predicted value of the DC component, computational complexity is reduced while the accuracy of the predicted value of the DC component is maintained.
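A numpy sketch of this mean-difference solution (dc_offset_first_order is a hypothetical name; the pairing of boundary pixels and the quantization step are assumptions):

```python
import numpy as np

def dc_offset_first_order(transitional_edge, reference_edge, qstep=1.0):
    # transitional_edge: the first pixel signals (boundary row or column of
    # the transitional reconstruction); reference_edge: the adjacent second
    # pixel signals in the reference area, paired along the prediction direction.
    x_hat = np.asarray(transitional_edge, dtype=np.float64)
    y = np.asarray(reference_edge, dtype=np.float64)
    # Minimizing sum(((x_hat + dx) - y)**2) over dx yields the mean difference.
    dx = np.mean(y - x_hat)
    return np.round(dx / qstep)  # quantized to give the DC component prediction

print(dc_offset_first_order([10, 12, 11, 13], [14, 15, 15, 16]))  # -> 4.0
```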
In some implementations, the first group of pixel signals and the second group of pixel signals may be the signals of the pixels through which the at least one line passes, or signals obtained by interpolating the signals of pixels around the at least one line.
In some implementations, determining the predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in the reference pixel area of the target image block includes: acquiring, in the reference pixel area, a first group of pixel signals located above the transitional reconstructed image block and a second group of pixel signals located to its left, as well as a third group of pixel signals along the upper edge inside the transitional reconstructed image block and a fourth group of pixel signals along the left edge inside it, where the first and third groups each comprise M rows of pixel signals, the second and fourth groups each comprise H columns of pixel signals, and M and H are positive integers; calculating the difference between the average of the first group of pixel signals and the average of the third group to obtain a first difference; calculating the difference between the average of the second group of pixel signals and the average of the fourth group to obtain a second difference; and quantizing the average of the first and second differences to obtain the predicted value of the DC component. Because multiple rows or columns of pixels of the transitional reconstructed image block and of the reference pixel area are used to calculate the predicted value of the DC component, computational complexity is reduced while the accuracy of the predicted value is maintained. In addition, using averages to calculate the predicted value of the DC component keeps the design of the encoder or decoder simple.
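A sketch of this mean-based variant (dc_pred_mean is a hypothetical name; the array layout and the quantization step are assumptions):

```python
import numpy as np

def dc_pred_mean(ref_above, ref_left, block, m=2, h=2, qstep=1.0):
    # ref_above: the M reference rows directly above the transitional block;
    # ref_left: the H reference columns directly to its left;
    # block: the transitional reconstructed image block.
    ref_above = np.asarray(ref_above, dtype=np.float64)
    ref_left = np.asarray(ref_left, dtype=np.float64)
    block = np.asarray(block, dtype=np.float64)
    d1 = ref_above.mean() - block[:m, :].mean()  # first difference
    d2 = ref_left.mean() - block[:, :h].mean()   # second difference
    return np.round((d1 + d2) / 2.0 / qstep)     # quantized average
```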
In certain implementations, M and H are integers greater than or equal to 2. When M and H cover multiple rows or columns, the correlation of more signals around the target image block, and the relation between the target image block and its surrounding signals, can be fully exploited to predict the DC component, so the DC component residual is smaller, the precision of the predicted value of the DC component is improved, and coding efficiency is further improved.
In some implementations, the target image block is a transform block. Since transformation and quantization, and likewise inverse transformation and inverse quantization, may be performed in units of transform blocks, determining the predicted value of the DC component in units of transform blocks as well reduces computational complexity and keeps the design of the encoder or decoder simple.
In some implementations, the preset DC component is zero. Setting the preset DC component to zero reduces computational complexity compared with setting it to another value.
In certain embodiments, the above method further includes: determining the magnitude of the DC component; if the DC component is larger than a preset threshold, the above method is executed, and if the DC component is smaller than the preset threshold, encoding is performed directly with the DC component and the AC components. Because the prediction is applied only when the DC component is larger than the preset threshold, the gain in coding efficiency compensates for the performance cost of the algorithmic complexity of predicting the DC residual, and the overall encoding or decoding performance improves.
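The gating decision reduces to a one-line test (a sketch; the threshold value below is a placeholder, since the patent only calls it preset):

```python
def use_dc_prediction(dc, threshold=32):
    # Apply the DC prediction scheme only when the DC component is large
    # enough for the rate saving to outweigh the extra computation.
    return abs(dc) > threshold
```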
According to a third aspect of the present invention, a decoding device comprises means for performing the decoding method of the first aspect. The decoding apparatus includes: the entropy decoding module is used for acquiring alternating current AC component and direct current DC component residual error of a transformation quantization coefficient of the target image block from the code stream; the first inverse quantization and inverse transformation module is used for performing inverse quantization and inverse transformation on the AC component and the preset DC component to obtain a transition residual error, and the transition residual error is added with the predicted value of the target image block to obtain a transition reconstructed image block of the target image block; the prediction module is used for determining a prediction value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of pixels in a reference pixel area of the transitional reconstructed image block and the target image block; and the second inverse quantization and inverse transformation module is used for determining the original DC component of the target image block according to the predicted value of the DC component and the DC component residual, inversely quantizing and inversely transforming the original DC component and the AC component of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.
According to a fourth aspect of the invention, an encoding device comprises means for performing the encoding method of the second aspect. The encoding device includes: the transformation and quantization module is used for transforming and quantizing the residual signal of the target image block to obtain a Direct Current (DC) component and an Alternating Current (AC) component of a transformation quantization coefficient of the target image block; the inverse quantization and inverse transformation module is used for performing inverse quantization and inverse transformation on the AC component and the preset DC component to obtain a transition residual error, and adding the transition residual error and the predicted value of the target image block to obtain a transition reconstruction image block of the target image block; the prediction module is used for determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity between the pixels of the transitional reconstruction image block and the pixels in the reference pixel area of the target image block; and the entropy coding module is used for determining the DC component residual error of the target image block according to the predicted value of the DC component and the original DC component of the target image block.
According to a fifth aspect of the present invention, a decoding apparatus comprises a video decoder configured to: obtaining alternating current AC component and direct current DC component residual error of a transformation quantization coefficient of a target image block from a code stream; performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transition residual error, and adding the transition residual error and a predicted value of the target image block to obtain a transition reconstruction image block of the target image block; determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of pixels in a reference pixel area of the transitional reconstructed image block and the target image block; determining an original DC component of the target image block according to the predicted value of the DC component and the DC component residual error; and carrying out inverse quantization and inverse transformation on the original DC component and the AC component of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.
According to a sixth aspect of the present invention, an encoding apparatus comprises a video encoder configured to: transforming and quantizing the residual signal of the target image block to obtain a direct current (DC) component and alternating current (AC) components of the transform quantization coefficients of the target image block; performing inverse quantization and inverse transformation on the AC components and a preset DC component to obtain a transitional residual, and adding the transitional residual to the predicted value of the target image block to obtain a transitional reconstructed image block of the target image block; determining a predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between pixels of the transitional reconstructed image block and pixels in a reference pixel area of the target image block; determining the DC component residual of the target image block according to the predicted value of the DC component and the original DC component of the target image block; and writing the AC components and the DC component residual into the code stream.
According to a seventh aspect of the invention, a computer-readable storage medium having stored thereon instructions that, when executed, cause one or more processors of a decoding device to: obtaining alternating current AC component and direct current DC component residual error of a transformation quantization coefficient of a target image block from a code stream; performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transition residual error, and adding the transition residual error and a predicted value of the target image block to obtain a transition reconstruction image block of the target image block; determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of pixels in a reference pixel area of the transitional reconstructed image block and the target image block; determining an original DC component of the target image block according to the predicted value of the DC component and the DC component residual error; and carrying out inverse quantization and inverse transformation on the original DC component and the AC component of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.
According to an eighth aspect of the invention, a computer-readable storage medium storing instructions that, when executed, cause one or more processors of an encoding apparatus to: transforming and quantizing the residual signal of the target image block to obtain a Direct Current (DC) component and an Alternating Current (AC) component of a transformation quantization coefficient of the target image block; performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transition residual error, and adding the transition residual error and a predicted value of the target image block to obtain a transition reconstruction image block of the target image block; determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of pixels of the transitional reconstruction image block and pixels in a reference pixel area of the target image block; determining a DC component residual error of the target image block according to the predicted value of the DC component and the original DC component of the target image block; and writing the residual error of the AC component and the DC component into the code stream.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a schematic block diagram of a video encoding system according to an embodiment of the present invention;
FIG. 2 is a schematic device diagram for video encoding according to an embodiment of the present invention;
FIG. 3 is a schematic block diagram of another video codec system according to an embodiment of the present invention;
FIG. 4 is a schematic block diagram of a video encoder according to an embodiment of the present invention;
FIG. 5 is a schematic block diagram of a video decoder according to an embodiment of the present invention;
FIG. 6 is a schematic device diagram of a video encoder according to an embodiment of the present invention;
FIG. 7 is a schematic device diagram of a video decoder according to an embodiment of the present invention;
FIG. 8 is a schematic flow chart of an encoding method according to an embodiment of the present invention;
FIG. 9 is a schematic flow chart diagram of a decoding method according to an embodiment of the present invention;
FIG. 10 shows the 35 prediction modes of HEVC;
FIGS. 11A and 11B are schematic diagrams of selecting a pixel signal based on a prediction mode according to an embodiment of the present invention;
FIGS. 12A-12D are diagrams illustrating selection of pixel signals based on directional prediction modes according to embodiments of the present invention;
FIGS. 13A and 13B are schematic diagrams of selecting pixel signals based on prediction modes according to another embodiment of the present invention;
FIG. 14 is a schematic flow chart diagram of a process of determining a predicted value of a DC component in accordance with an embodiment of the present invention;
FIG. 15 is a schematic flow chart diagram of a process of determining a predicted value of a DC component in accordance with another embodiment of the present invention;
FIG. 16 is a schematic flow chart diagram of a process of determining a predicted value of a DC component in accordance with another embodiment of the present invention;
FIG. 17 is a schematic block diagram of an embodiment of the present invention suitable for use in a television application;
FIG. 18 is a schematic configuration diagram of an application of the present invention to a mobile phone.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Fig. 1 is a schematic block diagram of a video codec device or electronic equipment 50 that may incorporate a codec according to an embodiment of the present invention. Fig. 2 is a schematic device diagram for video encoding according to an embodiment of the present invention. The elements of fig. 1 and 2 will be described below.
The electronic device 50 may for example be a mobile terminal or a user equipment of a wireless communication system. It should be understood that embodiments of the present invention may be implemented within any electronic device or apparatus that may require encoding and decoding, or encoding or decoding, of video images.
The apparatus 50 may include a housing 30 for incorporating and protecting equipment. The device 50 may also include a display 32 in the form of a liquid crystal display. In other embodiments of the invention, the display may be any suitable display technology suitable for displaying images or video. The apparatus 50 may also include a keypad 34. In other embodiments of the invention, any suitable data or user interface mechanism may be employed. For example, the user interface may be implemented as a virtual keyboard or a data entry system as part of a touch sensitive display. The device may include a microphone 36 or any suitable audio input, which may be a digital or analog signal input. The apparatus 50 may also include an audio output device which, in embodiments of the invention, may be any one of: headphones 38, speakers, or analog audio or digital audio output connections. The apparatus 50 may also include a battery 40, and in other embodiments of the invention the device may be powered by any suitable mobile energy device, such as a solar cell, a fuel cell, or a clock mechanism generator. The apparatus may also include an infrared port 42 for short-range line-of-sight communication with other devices. In other embodiments, the device 50 may also include any suitable short-range communication solution, such as a Bluetooth wireless connection or a USB/firewire wired connection.
The apparatus 50 may include a controller 56 or processor for controlling the apparatus 50. The controller 56 may be connected to a memory 58, which in embodiments of the present invention may store data in the form of images and audio data, and/or may also store instructions for implementation on the controller 56. The controller 56 may also be connected to a codec circuit 54 adapted to effect encoding and decoding of audio and/or video data or ancillary encoding and decoding effected by the controller 56.
The apparatus 50 may also include a card reader 48 and a smart card 46, such as a Universal Integrated Circuit Card (UICC) and a UICC reader, for providing user information and adapted to provide authentication information for authenticating and authorizing the user on a network.
The apparatus 50 may further comprise a radio interface circuit 52 connected to the controller and adapted to generate wireless communication signals, for example for communication with a cellular communication network, a wireless communication system, or a wireless local area network. The apparatus 50 may also include an antenna 44 connected to the radio interface circuit 52 for transmitting radio frequency signals generated at the radio interface circuit 52 to other apparatuses and for receiving radio frequency signals from other apparatuses.
In some embodiments of the present invention, the apparatus 50 includes a camera capable of recording or detecting single frames that are received and processed by the codec 54 or controller. In some embodiments of the invention, an apparatus may receive video image data to be processed from another device prior to transmission and/or storage. In some embodiments of the present invention, the apparatus 50 may receive images for encoding/decoding via a wireless or wired connection.
Fig. 3 is a schematic block diagram of another video codec system 10 according to an embodiment of this disclosure. As shown in fig. 3, video codec system 10 includes a source device 12 and a destination device 14. Source device 12 generates encoded video data. Accordingly, source device 12 may be referred to as a video encoding device or a video encoding apparatus. Destination device 14 may decode the encoded video data generated by source device 12. Destination device 14 may, therefore, be referred to as a video decoding device or a video decoding apparatus. Source device 12 and destination device 14 may be examples of video codec devices or video codec apparatuses. Source device 12 and destination device 14 may comprise a wide range of devices, including desktop computers, mobile computing devices, notebook (e.g., laptop) computers, tablet computers, set-top boxes, handsets such as smart phones, televisions, cameras, display devices, digital media players, video game consoles, in-vehicle computers, or the like.
Destination device 14 may receive the encoded video data from source device 12 via channel 16. Channel 16 may comprise one or more media and/or devices capable of moving encoded video data from source device 12 to destination device 14. In one example, channel 16 may comprise one or more communication media that enable source device 12 to transmit encoded video data directly to destination device 14 in real-time. In this example, source device 12 may modulate the encoded video data according to a communication standard (e.g., a wireless communication protocol), and may transmit the modulated video data to destination device 14. The one or more communication media may include wireless and/or wired communication media such as a Radio Frequency (RF) spectrum or one or more physical transmission lines. The one or more communication media may form part of a packet-based network, such as a local area network, a wide area network, or a global network (e.g., the internet). One or more communication media may include a router, switch, base station, or other apparatus that facilitates communication from source device 12 to destination device 14.
In another example, channel 16 may include a storage medium that stores encoded video data generated by source device 12. In this example, destination device 14 may access the storage medium via disk access or card access. The storage medium may include a variety of locally-accessed data storage media such as blu-ray discs, DVDs, CD-ROMs, flash memory, or other suitable digital storage media for storing encoded video data.
In another example, channel 16 may include a file server or another intermediate storage device that stores encoded video data generated by source device 12. In this example, destination device 14 may access encoded video data stored at a file server or other intermediate storage device via streaming or download. The file server may be of a type capable of storing encoded video data and transmitting the encoded video data to destination device 14. Example file servers include web servers (e.g., for a website), File Transfer Protocol (FTP) servers, Network Attached Storage (NAS) devices, and local disk drives.
Destination device 14 may access the encoded video data via a standard data connection, such as an internet connection. Example types of data connections include a wireless channel (e.g., a Wi-Fi connection), a wired connection (e.g., DSL, cable modem, etc.), or a combination of both, suitable for accessing encoded video data stored on a file server. The transmission of the encoded video data from the file server may be a streaming transmission, a download transmission, or a combination of both.
The technique of the present invention is not limited to wireless application scenarios, and for example, the technique can be applied to video encoding and decoding supporting various multimedia applications such as the following: over-the-air television broadcasts, cable television transmissions, satellite television transmissions, streaming video transmissions (e.g., via the internet), encoding of video data stored on a data storage medium, decoding of video data stored on a data storage medium, or other applications. In some examples, video codec system 10 may be configured to support one-way or two-way video transmission to support applications such as video streaming, video playback, video broadcasting, and/or video telephony.
In the example of fig. 3, source device 12 includes video source 18, video encoder 20, and output interface 22. In some examples, output interface 22 may include a modulator/demodulator (modem) and/or a transmitter. Video source 18 may include a video capture device (e.g., a video camera), a video archive containing previously captured video data, a video input interface to receive video data from a video content provider, and/or a computer graphics system for generating video data, or a combination of the aforementioned video data sources.
Video encoder 20 may encode video data from video source 18. In some examples, source device 12 transmits the encoded video data directly to destination device 14 via output interface 22. The encoded video data may also be stored on a storage medium or file server for later access by destination device 14 for decoding and/or playback.
In the example of fig. 3, destination device 14 includes input interface 28, video decoder 30, and display device 32. In some examples, input interface 28 includes a receiver and/or a modem. Input interface 28 may receive encoded video data via channel 16. The display device 32 may be integral with the destination device 14 or may be external to the destination device 14. In general, display device 32 displays decoded video data. The display device 32 may include a variety of display devices, such as a Liquid Crystal Display (LCD), a plasma display, an Organic Light Emitting Diode (OLED) display, or other types of display devices.
Video encoder 20 and video decoder 30 may operate according to a video compression standard, such as the High Efficiency Video Coding standard H.265, and may conform to the HEVC Test Model (HM). The text of the H.265 standard, ITU-T H.265 (V3) (04/2015), published on 29 April 2015, can be downloaded from http://handle.itu.int/11.1002/1000/12455; its entire content is incorporated herein by reference.
Alternatively, video encoder 20 and video decoder 30 may operate according to other proprietary or industry standards, including ITU-T H.261, ISO/IEC MPEG-1 Visual, ITU-T H.262 or ISO/IEC MPEG-2 Visual, ITU-T H.263, ISO/IEC MPEG-4 Visual, and ITU-T H.264 (also referred to as ISO/IEC MPEG-4 AVC), including its Scalable Video Coding (SVC) and Multiview Video Coding (MVC) extensions. It should be understood that the techniques of this disclosure are not limited to any particular codec standard or technique.
Moreover, fig. 3 is merely an example and the techniques of this disclosure may be applied to video codec applications (e.g., single-sided video encoding or video decoding) that do not necessarily include any data communication between an encoding device and a decoding device. In other examples, data is retrieved from local memory, streamed over a network, or otherwise manipulated. The encoding device may encode and store data to memory, and/or the decoding device may retrieve and decode data from memory. In many examples, encoding and decoding are performed by multiple devices that do not communicate with each other, but merely encode data to and/or retrieve data from memory and decode data.
Video encoder 20 and video decoder 30 may each be implemented as any of a variety of suitable circuits, such as one or more microprocessors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), discrete logic, hardware, or any combinations thereof. If the techniques are implemented partially or fully in software, the device may store instructions of the software in a suitable non-transitory computer-readable storage medium and may execute instructions in hardware using one or more processors to perform the techniques of this disclosure. Any of the foregoing may be considered one or more processors, including hardware, software, a combination of hardware and software, and the like. Each of video encoder 20 and video decoder 30 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the other device.
This disclosure may generally refer to video encoder 20 "signaling" some information to another device, such as video decoder 30. The term "signaling" may generally refer to the conveyance of syntax elements and/or other data representing the encoded video data. This communication may occur in real-time or near real-time. Alternatively, such communication may occur over a span of time, such as when, at encoding time, syntax elements are stored to a computer-readable storage medium together with the binary data produced by encoding; a decoding device may then retrieve the syntax elements at any time after they are stored to such a medium.
Video encoder 20 encodes video data. The video data may include one or more pictures. Video encoder 20 may generate a bitstream that contains the encoding information of the video data. The encoding information may include encoded picture data and associated data. The associated data may include Sequence Parameter Sets (SPS), Picture Parameter Sets (PPS), and other syntax structures. An SPS may contain parameters that apply to zero or more sequences. A PPS may contain parameters that apply to zero or more pictures. A syntax structure refers to a set of zero or more syntax elements arranged in a specified order in the codestream.
To generate coding information for a picture, video encoder 20 may partition the picture into a grid of Coding Tree Blocks (CTBs). In some examples, a CTB may be referred to as a "tree block", a "largest coding unit" (LCU), or a "coding tree unit". A CTB is not limited to a particular size and may include one or more Coding Units (CUs). Each CTB may be associated with a block of pixels of equal size within the picture. Each pixel may correspond to one luminance (luma) sample and two chrominance (chroma) samples. Thus, each CTB may be associated with one block of luma samples and two blocks of chroma samples. The CTBs of a picture may be divided into one or more slices. In some examples, each slice contains an integer number of CTBs. As part of encoding a picture, video encoder 20 may generate encoding information for each slice of the picture, i.e., encode the CTBs within the slice. To encode a CTB, video encoder 20 may recursively perform quadtree partitioning on the block of pixels associated with the CTB to partition it into successively smaller blocks of pixels. The smaller blocks of pixels may be associated with CUs.
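To make the recursive quadtree partitioning concrete, the following is a minimal sketch (illustrative only, not part of the described embodiment; the `should_split` callback is a hypothetical stand-in for the encoder's actual split decision, which would be driven by a rate-distortion test):

```python
def partition(x, y, size, min_cu, should_split):
    """Yield (x, y, size) for each leaf CU of the quadtree rooted at (x, y)."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        for dy in (0, half):          # top-left, top-right, bottom-left, bottom-right
            for dx in (0, half):
                yield from partition(x + dx, y + dy, half, min_cu, should_split)
    else:
        yield (x, y, size)

# Example: split every block larger than 16 samples, turning a 64x64 CTB
# into sixteen 16x16 CUs.
cus = list(partition(0, 0, 64, 16, lambda x, y, s: True))
assert len(cus) == 16
```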
Fig. 4 is a schematic block diagram of a video encoder 20 according to an embodiment of the present invention, which includes an encoding-side prediction module 201, a transform quantization module 202, an entropy coding module 203, an encoding reconstruction module 204, and an encoding-side filtering module 205. Fig. 5 is a schematic block diagram of a video decoder 30 according to an embodiment of the present invention, which includes a decoding-side prediction module 206, an inverse transform and inverse quantization module 207, an entropy decoding module 208, a decoding reconstruction module 209, and a decoding-side filtering module 210. Specifically:
the encoding-side prediction module 201 and the decoding-side prediction module 206 are used to generate prediction data. Video encoder 20 may generate one or more Prediction Units (PUs) for each CU that is no longer partitioned (i.e., a smallest-size CU). Each PU of a CU may be associated with a different block of pixels within the block of pixels of the CU. Video encoder 20 may generate a predictive block of pixels for each PU of the CU, using intra prediction or inter prediction. If video encoder 20 uses intra prediction to generate the predictive pixel block for a PU, it may generate the predictive pixel block based on decoded pixels within the same picture (also referred to as an image, i.e., the same video frame) as the PU. If video encoder 20 uses inter prediction to generate the predictive pixel block for a PU, it may generate the predictive pixel block based on decoded pixels of one or more pictures other than the picture associated with the PU, i.e., other than the video frame in which the PU is located. Video encoder 20 may generate residual blocks of pixels for the CU based on the predictive pixel blocks of the PUs of the CU. The residual pixel block of the CU may indicate the difference (also referred to as the residual signal or residual data) between the sample values in the predictive pixel blocks of the PUs of the CU and the corresponding sample values in the initial pixel block of the CU.
The transform quantization module 202 is configured to process the predicted residual signal or residual data. Video encoder 20 may perform recursive quadtree partitioning on the residual pixel blocks of the CU to partition them into one or more smaller residual pixel blocks associated with Transform Units (TUs) of the CU. Because the pixels in a block of pixels associated with a TU each correspond to one luma sample and two chroma samples, each TU may be associated with one block of luma residual samples and two blocks of chroma residual samples. Video encoder 20 may apply one or more transforms to a residual sample block associated with a TU to generate a coefficient block (i.e., a block of coefficients). The transform may be a discrete cosine transform (DCT) or a variant thereof, such as a discrete sine transform (DST). With the DCT transform matrix, the two-dimensional transform is computed by applying a one-dimensional transform in the horizontal direction and another in the vertical direction, yielding a block of coefficients (also called transform coefficients). Video encoder 20 may perform a quantization procedure on each transform coefficient in the coefficient block. Quantization generally refers to the process by which transform coefficients are quantized to reduce the amount of data used to represent them; it can be understood as mapping an infinite set of values onto a finite set, thereby providing further compression. After transformation and quantization, transform quantization coefficients are obtained, comprising a DC component and an AC component. The process of deriving the DC component and the AC component is not the focus of the present invention; for the specific derivation, reference may be made to ITU-T H.265 (V3) (04/2015), which is not described here. The process performed by the inverse transform and inverse quantization module 207 comprises the inverse of the process of the transform quantization module 202.
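As an illustration of the separable two-dimensional transform and the many-to-one quantization mapping described above, here is a hedged sketch using an orthonormal DCT-II matrix (a floating-point simplification; real codecs use integer approximations of the transform):

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis: T[k, i] = c_k * cos(pi * (2i + 1) * k / (2n)).
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    t = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    t[0, :] /= np.sqrt(2.0)
    return t

def transform_quantize(residual, qstep):
    t = dct_matrix(residual.shape[0])
    coeff = t @ residual @ t.T          # 1-D transform horizontally, then vertically
    levels = np.round(coeff / qstep)    # quantization maps many values to one level
    return levels                       # levels[0, 0] is the DC component, the rest AC

block = np.arange(16, dtype=float).reshape(4, 4) - 8.0   # toy residual block
q = transform_quantize(block, qstep=4.0)
```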
Video encoder 20 may generate a set of syntax elements that represent coefficients in the quantized coefficient block. Video encoder 20, through entropy encoding module 203, may apply an entropy encoding operation (e.g., a Context Adaptive Binary Arithmetic Coding (CABAC) operation) to some or all of the syntax elements described above. To apply CABAC encoding to syntax elements, video encoder 20 may binarize the syntax elements to form a binary sequence including one or more bits (referred to as "bins"). Video encoder 20 may encode a portion of the bins using regular (regular) encoding and may encode other portions of the bins using bypass (bypass) encoding.
In addition to entropy encoding syntax elements of the coefficient block, video encoder 20, through encoding reconstruction module 204, may apply inverse quantization and an inverse transform to the transformed coefficient block to reconstruct a residual sample block from the transformed coefficient block. Video encoder 20 may add the reconstructed residual block of samples to a corresponding block of samples of the one or more predictive blocks of samples to generate a reconstructed block of samples. By reconstructing a block of samples for each color component, video encoder 20 may reconstruct a block of pixels associated with a TU. The pixel blocks for each TU of the CU are reconstructed in this manner until the entire pixel block reconstruction for the CU is complete.
According to an embodiment of the present invention, the video encoder 20 may further include a DC component prediction module 206 for determining a predicted value of the DC component. Specifically, the DC component prediction module 206 may construct a new set of transform quantization coefficients from the transform quantization coefficients output by the transform quantization module 202, where the DC component of the new transform quantization coefficients is forced to 0 and the AC component is the AC component output by the transform quantization module 202. The inverse transform and inverse quantization module 207 performs inverse quantization and inverse transformation on the new transform quantization coefficients, and the reconstructed residual sample block obtained through the inverse transform and inverse quantization process may be added to the corresponding sample block of the one or more predictive sample blocks to generate a reconstructed sample block (hereinafter also referred to as a transitional reconstructed image block). A predicted value of the DC component is then determined according to the similarity or correlation between the pixels of the transitional reconstructed image block and the pixels of the reference pixel area. Finally, a residual signal of the DC component is obtained from the predicted value of the DC component and the DC component output by the transform quantization module 202. The residual signal of the DC component and the AC component are written into the code stream by the entropy coding module 203.
After video encoder 20 reconstructs the pixel block of the CU, video encoder 20, through encoding-side filtering module 205, performs a deblocking filtering operation to reduce blockiness of the pixel block associated with the CU. After video encoder 20 performs the deblocking filtering operation, video encoder 20 may modify the reconstructed pixel block of the CTB of the picture using Sample Adaptive Offset (SAO). After performing these operations, video encoder 20 may store the reconstructed block of pixels of the CU in a decoded picture buffer for use in generating predictive blocks of pixels for other CUs.
Video decoder 30 may receive the codestream. The bitstream contains, in the form of a bitstream, encoding information of the video data encoded by the video encoder 20. Video decoder 30, through entropy decoding module 208, parses the codestream to extract syntax elements from the codestream. When video decoder 30 performs CABAC decoding, video decoder 30 may perform regular decoding on some bins and may perform bypass decoding on other bins, where the bins in the bitstream have a mapping relationship with syntax elements, and the syntax elements are obtained by parsing the bins.
The video decoder 30, through the decoding reconstruction module 209, may reconstruct pictures of the video data based on syntax elements extracted from the codestream. The process of reconstructing video data based on syntax elements is generally reciprocal to the process performed by video encoder 20 to generate syntax elements. For example, video decoder 30 may generate, based on syntax elements associated with the CU, predictive pixel blocks for PUs of the CU. In addition, video decoder 30 may inverse quantize coefficient blocks associated with TUs of the CU. Video decoder 30 may perform an inverse transform on the inverse quantized coefficient blocks to reconstruct residual pixel blocks associated with the TUs of the CU. Video decoder 30 may reconstruct the block of pixels of the CU based on the predictive block of pixels and the residual block of pixels.
According to an embodiment of the present invention, the video decoder 30 may further include a DC component prediction module 211 for determining a predicted value of the DC component. Specifically, the DC component prediction module 211 may reconstruct another set of transform quantization coefficients from the transform coefficients obtained from the bitstream, where the DC component of the reconstructed coefficients is forced to 0 and the AC component is the AC component output by the entropy decoding module 208. The inverse quantization and inverse transform module performs inverse quantization and inverse transformation on the reconstructed coefficients, and the reconstructed residual sample block obtained through the inverse transform and inverse quantization process may be added to the corresponding sample block of the one or more predictive sample blocks to generate a reconstructed sample block (hereinafter also referred to as a transitional reconstructed image block). A predicted value of the DC component is then determined according to the similarity or correlation between the pixels of the transitional reconstructed image block and the pixels of the reference pixel area. Finally, the reconstructed DC component is obtained from the predicted value of the DC component and the residual signal of the DC component output by the entropy decoding module 208.
After video decoder 30 reconstructs the pixel block of the CU, video decoder 30, through decode filter module 210, performs a deblocking filter operation to reduce blockiness of the pixel block associated with the CU. In addition, video decoder 30 may perform the same SAO operations as video encoder 20 based on the one or more SAO syntax elements. After video decoder 30 performs these operations, video decoder 30 may store the pixel blocks of the CU in a decoded picture buffer. The decoded picture buffer may provide reference pictures for subsequent motion compensation, intra prediction, and display device presentation.
FIG. 6 is a block diagram illustrating an example video encoder 20 configured to implement the techniques of this disclosure. It should be understood that fig. 6 is exemplary and should not be taken as limiting the techniques as broadly illustrated and described herein. As shown in fig. 6, video encoder 20 includes prediction processing unit 100, residual generation unit 102, transform processing unit 104, quantization processing unit 106, inverse quantization processing unit 108, inverse transform processing unit 110, reconstruction unit 112, filter unit 113, decoded picture buffer 114, inverse quantization processing unit 131 within DC prediction module 130, inverse transform processing unit 132, reconstruction unit 133, DC component prediction processing unit 134, residual generation unit 135, and entropy encoding unit 116. Entropy encoding unit 116 includes a regular CABAC codec engine and a bypass codec engine. Prediction processing unit 100 includes inter prediction processing unit 121 and intra prediction processing unit 126. The inter prediction processing unit 121 includes a motion estimation unit and a motion compensation unit. In other examples, video encoder 20 may include more, fewer, or different functional components.
Video encoder 20 receives video data. To encode the video data, video encoder 20 may encode each slice of each picture of the video data. As part of encoding a slice, video encoder 20 may encode each CTB in the slice. As part of encoding a CTB, prediction processing unit 100 may perform quadtree partitioning on the block of pixels associated with the CTB to divide it into successively smaller blocks of pixels. For example, prediction processing unit 100 may partition the block of pixels of a CTB into four equally sized sub-blocks, partition one or more of those sub-blocks into four equally sized smaller sub-blocks, and so on.
Video encoder 20 may encode a CU of a CTB in a picture to generate encoding information for the CU. Video encoder 20 may encode CUs of the CTB according to zigzag scanning order. In other words, video encoder 20 may encode CUs as top-left CU, top-right CU, bottom-left CU, and then bottom-right CU. When video encoder 20 encodes the partitioned CU, video encoder 20 may encode CUs associated with sub-blocks of a block of pixels of the partitioned CU according to a zigzag scanning order.
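The zigzag (z-scan) order described above can be illustrated with Morton ordering, in which a CU's position in the coding order is obtained by interleaving the bits of its column and row indices (a sketch, not a normative derivation):

```python
def z_order(x, y, bits=8):
    """Interleave the bits of (x, y); a lower value means earlier in coding order."""
    z = 0
    for b in range(bits):
        z |= ((x >> b) & 1) << (2 * b) | ((y >> b) & 1) << (2 * b + 1)
    return z

# A 2x2 grid of CUs sorts to: top-left, top-right, bottom-left, bottom-right.
order = sorted([(x, y) for y in range(2) for x in range(2)],
               key=lambda p: z_order(*p))
assert order == [(0, 0), (1, 0), (0, 1), (1, 1)]
```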
Furthermore, prediction processing unit 100 may partition the pixel blocks of the CU among one or more PUs of the CU. Video encoder 20 and video decoder 30 may support various PU sizes. Assuming that the size of a particular CU is 2N×2N, video encoder 20 and video decoder 30 may support PU sizes of 2N×2N or N×N for intra prediction, and symmetric PU sizes of 2N×2N, 2N×N, N×2N, N×N, or similar for inter prediction. Video encoder 20 and video decoder 30 may also support asymmetric PUs of 2N×nU, 2N×nD, nL×2N, and nR×2N for inter prediction.
Inter prediction processing unit 121 may generate predictive data for a PU by performing inter prediction on each PU of the CU. The predictive data for the PU may include a predictive pixel block corresponding to the PU and motion information for the PU. A slice may be an I slice, a P slice, or a B slice. Inter prediction processing unit 121 may perform different operations on a PU of a CU depending on whether the PU is in an I slice, a P slice, or a B slice. In an I slice, all PUs are intra predicted.
If the PU is in a P slice, motion estimation unit 122 may search for a reference picture in a list of reference pictures (e.g., "list 0") to find a reference block for the PU. The reference block of the PU may be a block of pixels that most closely corresponds to the block of pixels of the PU. Motion estimation unit 122 may generate a reference picture index that indicates a reference picture of the reference block in list 0 that contains the PU, and a motion vector that indicates a spatial displacement between the pixel block of the PU and the reference block. Motion estimation unit 122 may output the reference picture index and the motion vector as motion information for the PU. Motion compensation unit 124 may generate the predictive pixel block for the PU based on the reference block indicated by the motion information of the PU.
If the PU is in a B slice, motion estimation unit 122 may perform uni-directional inter prediction or bi-directional inter prediction on the PU. To perform uni-directional inter prediction for a PU, motion estimation unit 122 may search the reference pictures of a first reference picture list ("list 0") or a second reference picture list ("list 1") for a reference block of the PU. Motion estimation unit 122 may output, as the motion information for the PU: a reference picture index indicating a location in list 0 or list 1 of a reference picture containing a reference block, a motion vector indicating a spatial displacement between a pixel block of the PU and the reference block, and a prediction direction indicator indicating whether the reference picture is in list 0 or list 1. To perform bi-directional inter prediction for the PU, motion estimation unit 122 may search the reference picture in list 0 for the reference block of the PU and may also search the reference picture in list 1 for another reference block of the PU. Motion estimation unit 122 may generate reference picture indices that indicate positions in list 0 and list 1 of reference pictures containing the reference block. In addition, motion estimation unit 122 may generate motion vectors that indicate spatial displacements between the reference block and the block of pixels of the PU. The motion information for the PU may include a reference picture index and a motion vector for the PU. Motion compensation unit 124 may generate the predictive pixel block for the PU based on the reference block indicated by the motion information of the PU.
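A hedged sketch of the block search that motion estimation is described as performing (brute-force SAD over a single reference picture; reference-list handling, sub-pel refinement, and fast search strategies are omitted, and all names are illustrative):

```python
import numpy as np

def motion_search(cur, ref, px, py, size, radius):
    """Full search over a (2*radius+1)^2 window; returns ((dx, dy), best SAD)."""
    block = cur[py:py + size, px:px + size].astype(int)
    best_mv, best_sad = (0, 0), float("inf")
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x, y = px + dx, py + dy
            if 0 <= x <= ref.shape[1] - size and 0 <= y <= ref.shape[0] - size:
                sad = np.abs(block - ref[y:y + size, x:x + size].astype(int)).sum()
                if sad < best_sad:
                    best_mv, best_sad = (dx, dy), sad
    return best_mv, best_sad   # motion vector and SAD of the matched reference block
```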
Intra-prediction processing unit 126 may generate predictive data for the PU by performing intra-prediction on the PU. The predictive data for the PU may include predictive pixel blocks for the PU and various syntax elements. Intra-prediction processing unit 126 may perform intra-prediction on PUs within I-slices, P-slices, and B-slices.
To perform intra-prediction for a PU, intra-prediction processing unit 126 may use multiple intra-prediction modes to generate multiple sets of predictive data for the PU. To generate a set of predictive data for the PU using an intra-prediction mode, intra-prediction processing unit 126 may extend samples from sample blocks of neighboring PUs across the PU's sample blocks in the direction associated with the intra-prediction mode. Assuming a left-to-right, top-to-bottom coding order for PUs, CUs, and CTBs, the neighboring PUs may be above the PU, above and to the right of the PU, above and to the left of the PU, or to the left of the PU. Intra-prediction processing unit 126 may use various numbers of intra-prediction modes, including, e.g., 33 directional intra-prediction modes. In some examples, the number of intra-prediction modes may depend on the size of the block of pixels of the PU.
Prediction processing unit 100 may select predictive data for a PU of the CU from among the predictive data generated for the PU by inter prediction processing unit 121 or by intra prediction processing unit 126. In some examples, prediction processing unit 100 selects the predictive data for PUs of the CU based on a rate/distortion metric over the sets of predictive data. For example, a Lagrangian cost function is used to select between a coding mode and its parameter values (such as motion vectors, reference indices, and intra-prediction directions). This kind of cost function uses a weighting factor λ to relate the actual or estimated image distortion caused by the lossy coding method to the actual or estimated amount of information needed to represent the pixel values in the image region: C = D + λ × R, where C is the Lagrangian cost to be minimized, D is the image distortion (e.g., mean squared error) under the mode and its parameters, and R is the number of bits needed to reconstruct the image block in the decoder (e.g., including the amount of data used to represent the candidate motion vectors). Generally, the least costly coding mode is selected as the actual coding mode. The predictive block of pixels of the selected predictive data may be referred to herein as the selected predictive block of pixels.
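A minimal sketch of the Lagrangian selection above; the candidate mode names and their (D, R) values are hypothetical numbers, not measurements:

```python
def select_mode(candidates, lam):
    """candidates: iterable of (mode, distortion D, rate R in bits); minimize C = D + lam * R."""
    return min(candidates, key=lambda c: c[1] + lam * c[2])

best = select_mode([("intra_dc", 1200.0, 96),
                    ("intra_angular_26", 900.0, 140),
                    ("inter_2Nx2N", 650.0, 210)], lam=2.5)
# With lam = 2.5 the costs are 1440.0, 1250.0 and 1175.0, so inter_2Nx2N wins.
```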
Residual generation unit 102 may generate residual blocks of pixels for the CU based on the blocks of pixels of the CU and the selected predictive blocks of pixels of the PUs of the CU. For example, residual generation unit 102 may generate the residual block of pixels for the CU such that each sample in the residual block of pixels has a value equal to a difference between: a sample in a block of pixels of the CU, and a corresponding sample in a selected predictive block of pixels of a PU of the CU.
Prediction processing unit 100 may perform quadtree partitioning to partition the residual pixel blocks of the CU into sub-blocks. Each residual pixel block that is no longer divided may be associated with a different TU of the CU. The size and location of the residual pixel blocks associated with the TUs of a CU are not necessarily related to the size and location of the pixel blocks of the PUs of the CU.
Because pixels of a residual pixel block of a TU may correspond to one luma sample and two chroma samples, each TU may be associated with one luma sample block and two chroma sample blocks. Transform processing unit 104 may generate coefficient blocks for each TU of the CU by applying one or more transforms to residual sample blocks associated with the TU. For example, transform processing unit 104 may apply a Discrete Cosine Transform (DCT), a directional transform, or a conceptually similar transform to the residual sample block.
Quantization unit 106 may quantize coefficients in the coefficient block. For example, n-bit coefficients may be truncated to m-bit coefficients during quantization, where n is greater than m. Quantization unit 106 may quantize coefficient blocks associated with TUs of the CU based on Quantization Parameter (QP) values associated with the CU. Video encoder 20 may adjust the degree of quantization applied to the coefficient blocks associated with the CU by adjusting the QP value associated with the CU.
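As an illustration of how a QP value controls the degree of quantization, the sketch below assumes the HEVC-style convention that the quantization step roughly doubles for every increase of 6 in QP (Qstep ≈ 2^((QP − 4) / 6)); real codecs implement this with integer arithmetic and rounding offsets:

```python
def qstep(qp):
    # Quantization step under the assumed convention: doubles every 6 QP.
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    return [round(c / qstep(qp)) for c in coeffs]

def dequantize(levels, qp):
    return [lev * qstep(qp) for lev in levels]

# A larger QP coarsens the coefficients, trading quality for fewer bits.
assert quantize([100.0], qp=22) != quantize([100.0], qp=34)
```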
Inverse quantization unit 108 and inverse transform processing unit 110 may apply inverse quantization and inverse transform, respectively, to the transformed coefficient block to reconstruct a residual sample block from the coefficient block. Reconstruction unit 112 may add samples of the reconstructed residual sample block to corresponding samples of one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed sample block associated with the TU. In this manner, video encoder 20 may reconstruct blocks of pixels of the CU by reconstructing blocks of samples for each TU of the CU.
Inverse quantization processing unit 131 and inverse transform processing unit 132 may apply inverse quantization and an inverse transform, respectively, to the transformed coefficient block, with the DC component in the coefficient block set to 0, to reconstruct a residual sample block from the coefficient block. Reconstruction unit 133 can add samples of the reconstructed residual sample block to corresponding samples (also referred to as prediction values) of one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed sample block (hereinafter also referred to as a transitional reconstructed image block) associated with the TU. The transitional reconstructed image block is different from the reconstructed sample block generated by reconstruction unit 112. DC component prediction processing unit 134 may determine a predicted value of the DC component according to the similarity or correlation between the transitional reconstructed image block obtained by reconstruction unit 133 and the pixels of the reference pixel region. Residual generation unit 135 may generate a DC residual based on the DC component prediction value output by DC component prediction processing unit 134 and the original DC component generated by quantization processing unit 106. For example, the DC component prediction value may be subtracted from the original DC component to obtain the DC residual.
It is to be understood that inverse quantization processing unit 131 and inverse transform processing unit 132 may also reuse the functions of inverse quantization processing unit 108 and inverse transform processing unit 110, respectively. In this case, a coefficient reconstruction unit may additionally be provided for acquiring the DC component of the transform quantization coefficients from quantization processing unit 106, forcing the DC component to zero, and inputting the forced-to-zero DC component to inverse quantization processing unit 108 for inverse quantization. Inverse quantization processing unit 108 is arranged to further inverse quantize the AC component and the forced-to-zero DC component, and an inverse transformation is performed by inverse transform processing unit 110 to reconstruct a residual sample block from the coefficient block. Likewise, reconstruction unit 133 can add samples of the reconstructed residual sample block to corresponding samples (also referred to as prediction values) of one or more predictive sample blocks generated by prediction processing unit 100 to generate a reconstructed sample block (hereinafter also referred to as a transitional reconstructed image block) associated with the TU.
Filter unit 113 may perform a deblocking filtering operation to reduce blocking artifacts for blocks of pixels associated with the CU. In addition, the filter unit 113 may apply the SAO offset determined by the prediction processing unit 100 to the reconstructed sample block to restore a pixel block. Filter unit 113 may generate coding information for SAO syntax elements of CTBs.
The decoded picture buffer 114 may store the reconstructed pixel block. Inter prediction unit 121 may perform inter prediction on PUs of other pictures using the reference picture containing the reconstructed pixel block. In addition, intra-prediction processing unit 126 may use the reconstructed pixel block in decoded picture buffer 114 to perform intra-prediction on other PUs in the same picture as the CU.
Entropy encoding unit 116 may receive data from other functional components of video encoder 20. For example, entropy encoding unit 116 may receive the residual of the DC component from DC residual generation unit 135 as well as the AC component and may receive syntax elements from prediction processing unit 100. Entropy encoding unit 116 may perform one or more entropy encoding operations on the data to generate entropy encoded data. For example, entropy encoding unit 116 may perform a Context Adaptive Variable Length Coding (CAVLC) operation, a CABAC operation, a variable to variable (V2V) length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a Probability Interval Partitioning Entropy (PIPE) coding operation, or other type of entropy encoding operation on the data. In a particular example, entropy encoding unit 116 may use regular CABAC engine 118 to encode regular CABAC-coded bins of syntax elements and may use bypass codec engine 120 to encode bypass-coded bins.
FIG. 7 is a block diagram illustrating an example video decoder 30 configured to implement the techniques of this disclosure. It should be understood that fig. 7 is exemplary and should not be taken as limiting the techniques as broadly illustrated and described herein. As shown in fig. 7, the video decoder 30 includes an entropy decoding unit 150, a prediction processing unit 152, an inverse quantization unit 154, an inverse transform processing unit 156, a reconstruction unit 158, a filter unit 159, an inverse quantization processing unit 171 within the DC prediction module 170, an inverse transform processing unit 172, a reconstruction unit 173, a DC component prediction processing unit 174, a DC component generation unit 175, and a decoded picture buffer 160. Prediction processing unit 152 includes an inter prediction processing unit 162 and an intra prediction processing unit 164. Entropy decoding unit 150 includes a regular CABAC codec engine and a bypass codec engine. In other examples, video decoder 30 may include more, fewer, or different functional components.
Video decoder 30 may receive the codestream. Entropy decoding unit 150 may parse the codestream to extract syntax elements from it. As part of parsing the codestream, entropy decoding unit 150 may parse entropy-encoded syntax elements in the codestream. In addition, entropy decoding unit 150 may further extract the transform quantization coefficients from the codestream, including: the residual of the DC component, and the AC component.
Inverse quantization processing unit 171 and inverse transform processing unit 172 may apply inverse quantization and an inverse transform, respectively, to the transformed coefficient block, with the DC component in the coefficient block set to 0, to reconstruct a residual sample block from the coefficient block. Reconstruction unit 173 can add samples of the reconstructed residual sample block to corresponding samples (also referred to as prediction values) of one or more predictive sample blocks generated by prediction processing unit 152 to generate a reconstructed sample block (also referred to as a transitional reconstructed image block) associated with the TU. The transitional reconstructed image block is different from the reconstructed sample block generated by reconstruction unit 158; i.e., the transitional reconstructed image block is a reconstructed image block used only for obtaining the DC component and cannot be used directly to reconstruct the decoded image. DC component prediction processing unit 174 may determine a predicted value of the DC component from the similarity or correlation between the transitional reconstructed image block obtained by reconstruction unit 173 and the pixels of the reference pixel region. DC component generation unit 175 may generate the DC component based on the DC component prediction value output by DC component prediction processing unit 174 and the DC residual generated by entropy decoding unit 150. For example, the DC component prediction value may be added to the DC residual to obtain the DC component.
It is to be understood that inverse quantization processing unit 171 and inverse transform processing unit 172 may also reuse the functions of inverse quantization unit 154 and inverse transform processing unit 156, respectively. In this case, another coefficient reconstruction unit may be provided for forcing the DC component of the transform quantization coefficients to zero and inputting the forced-to-zero DC component to inverse quantization unit 154 for inverse quantization. Inverse quantization unit 154 is adapted to further inverse quantize the AC component and the forced-to-zero DC component, and an inverse transformation is performed by inverse transform processing unit 156 to reconstruct a residual sample block from the coefficient block. Likewise, reconstruction unit 173 can add samples of the reconstructed residual sample block to corresponding samples (also referred to as prediction values) of one or more predictive sample blocks generated by prediction processing unit 152 to generate a reconstructed sample block (hereinafter also referred to as a transitional reconstructed image block) associated with the TU.
Prediction processing unit 152, inverse quantization unit 154, inverse transform processing unit 156, reconstruction unit 158, and filter unit 159 may decode the video data according to syntax elements such as a DC component and an AC component, i.e., generate decoded video data.
The syntax elements may include regular CABAC coded bins and bypass coded bins. Entropy decoding unit 150 may use regular CABAC codec engine 166 to decode regular CABAC-coded bins and may use bypass codec engine 168 to decode bypass-coded bins.
If the PU is encoded using intra prediction, intra prediction processing unit 164 may perform intra prediction to generate a predictive sampling block for the PU. Intra-prediction processing unit 164 may use the intra-prediction mode to generate predictive pixel blocks for the PU based on the pixel blocks of the spatially neighboring PUs. Intra prediction processing unit 164 may determine the intra prediction mode of the PU from one or more syntax elements parsed from the codestream.
The inter prediction processing unit 162 may include a motion compensation unit, and the motion compensation unit may construct the first reference picture list and the second reference picture list according to syntax elements parsed from the codestream. Furthermore, if the PU is encoded using inter prediction, entropy decoding unit 150 may parse the motion information of the PU. Motion compensation unit 162 may determine one or more reference blocks for the PU from the motion information of the PU. Motion compensation unit 162 may generate the predictive block of pixels for the PU from one or more reference blocks of the PU.
In addition, video decoder 30 may perform a reconstruction operation on CUs that are no longer partitioned. To perform a reconstruction operation on a CU that is no longer partitioned, video decoder 30 may perform a reconstruction operation on each TU of the CU. By performing a reconstruction operation for each TU of the CU, video decoder 30 may reconstruct residual blocks of pixels associated with the CU.
As part of performing a reconstruction operation on TUs of a CU, inverse quantization unit 154 may inverse quantize (i.e., dequantize) coefficient blocks associated with the TUs. Inverse quantization unit 154 may use a QP value associated with the CU of the TU to determine the degree of quantization and, likewise, the degree of inverse quantization to apply.
After inverse quantization unit 154 inverse quantizes the DC and AC components in the coefficient block, inverse transform processing unit 156 may apply one or more inverse transforms to the coefficient block in order to generate a residual sample block associated with the TU. For example, inverse transform processing unit 156 may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loeve transform (KLT), an inverse rotational transform, an inverse directional transform, or other inverse transforms corresponding to the transform at the encoding end to the coefficient block.
Reconstruction unit 158 may use, where applicable, residual pixel blocks associated with the TUs of the CU and predictive pixel blocks (i.e., intra-prediction data or inter-prediction data) of the PUs of the CU to reconstruct the pixel blocks of the CU. In particular, reconstruction unit 158 may add samples of the residual pixel block to corresponding samples of the predictive pixel block to reconstruct the pixel block of the CU.
Filter unit 159 may perform a deblocking filtering operation to reduce blocking artifacts of blocks of pixels associated with CUs of the CTB. In addition, the filter unit 159 may modify the pixel values of the CTB according to SAO syntax elements parsed from the codestream. For example, the filter unit 159 may determine the modifier value according to the SAO syntax element of the CTB and add the determined modifier value to the sample value in the reconstructed pixel block of the CTB. By modifying some or all of the pixel values of the CTBs of the picture, filter unit 159 may modify the reconstructed picture of the video data according to the SAO syntax elements.
Video decoder 30 may store the block of pixels of the CU in decoded picture buffer 160. Decoded picture buffer 160 may provide reference pictures for subsequent motion compensation, intra prediction, and display device presentation (e.g., display device 32 of fig. 3). For example, video decoder 30 may perform intra-prediction operations or inter-prediction operations on PUs of other CUs according to blocks of pixels in decoded picture buffer 160.
The encoding process and the decoding process of the embodiments of the present invention are described in detail below, taking HEVC as an example in conjunction with fig. 8 and 9. It should be understood that a similar approach may also be employed in video coding techniques such as H.264.
As described above, the embodiments of the present invention mainly differ from conventional video coding in that the transform quantization coefficients are written into the code stream as the residual of the DC component plus the AC component, rather than as the DC component plus the AC component as in conventional video coding. Since the absolute value of the residual of the DC component is generally smaller than the DC component, and smaller values can be coded more efficiently, coding efficiency is improved.
Fig. 8 is a schematic diagram of an encoding process according to an embodiment of the present invention.
When the encoder encodes, the image to be encoded is first divided into non-overlapping image blocks. Each image block is called a coding tree unit (CTU). A CTU may be further divided into a plurality of sub-blocks, where each sub-block is a coding unit (CU) whose coding parameters can be determined independently. The CTU is divided according to a quadtree structure. During compression coding, each CU independently selects its best coding parameters, e.g., prediction mode selection, prediction unit (PU) partitioning, etc. When all the CUs in one CTU have been encoded in sequence, encoding proceeds to the subsequent CTUs.
810: The encoder performs prediction to obtain the predicted value of the prediction block, and subtracts the predicted value of the prediction block from the original value to obtain the residual signal of the prediction block. For example, the prediction may be intra prediction. The original value refers to the original pixel values of the image block to be encoded, which includes the target image block.
820: The residual signal of the target image block is transformed and quantized to obtain the direct-current (DC) component and alternating-current (AC) component of the transform quantization coefficients of the target image block.
For example, the residual signal of the target image block is transformed and quantized, resulting in an initial transformed quantized coefficient of the target image block, which includes a DC component and an AC component.
The target image block of the present embodiment may be a transform block or correspond to a transform block. One prediction block may be divided into a plurality of transform blocks, and the transform blocks belonging to the same prediction block employ the same prediction mode. Embodiments of the present invention may transform and quantize a residual signal in units of transform blocks. The residual signal of each transform block may be obtained from the residual signal of the prediction block.
830: Inverse quantization and inverse transformation are performed on the initial transform quantization coefficients to obtain the reconstructed residual signal of the target image block; the reconstructed residual of the target image block is then added to the predicted value of the target image block to obtain a reconstructed value, thereby obtaining a reconstructed image block of the target image block for use as reference information in subsequent predictive coding.
It is understood that 810 through 830 are similar to conventional video coding and are not described in detail herein.
840: Inverse quantization and inverse transformation are performed on the AC component and a preset DC component to obtain a transitional residual, and the transitional residual is added to the predicted value of the target image block to obtain the transitional reconstructed image block of the target image block.
For example, inverse quantization and inverse transformation are performed on the transitional transform quantization coefficients to obtain a transitional residual, and the transitional residual is added to the predicted value to obtain the transitional reconstructed image block of the target image block, where the transitional transform quantization coefficients include a second DC component and the AC component obtained in 820, the second DC component being a preset value (e.g., 0). Setting the second DC component to 0 has the benefit of reducing computational complexity.
For each target image block, the DC component of the transitional transform quantization coefficients is set to zero; that is, these coefficients consist of the AC component obtained in 820 and a DC component of value 0. Inverse quantization and inverse transformation are performed on the transitional transform quantization coefficients to obtain the reconstructed residual of the target image block with the DC component being zero, and the reconstructed residual is added to the predicted value of the target image block to obtain a reconstructed value, thereby obtaining the transitional reconstructed image block of the target image block with the DC component being zero, denoted B_{DC=0}. It is to be understood that the transitional reconstructed image block obtained in 840 is different from the reconstructed image block obtained in 830.
850: A predicted value of the DC component of the transform quantization coefficients of the target image block is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block.
For example, the predicted value of the DC component is determined according to the similarity of the pixels of the transitional reconstructed image block to the pixels in the reference pixel region. For the calculation of the predicted value DC_pred of the DC component of the target image block, see the description of the embodiment of fig. 14, which is not repeated here.
860: A DC component residual of the target image block is determined according to the predicted value of the DC component and the original DC component of the target image block.
For example, the predicted value of the DC component is subtracted from the DC component of the target image block to obtain the residual ΔDC of the DC component.
870: The AC component and the DC component residual are written into the code stream.
For example, reconstructed transform quantization coefficients are written into the code stream, where the reconstructed transform quantization coefficients include the difference between the DC component and its predicted value, and the AC component obtained in 820. The reconstructed transform quantization coefficients may be entropy-encoded and written into the code stream. Information such as the CU partitioning of the CTU, the CU coding modes, the PU partitioning within each CU, and the prediction mode selection is also entropy-encoded and written into the code stream.
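Pulling steps 820 through 870 together, the following hedged sketch shows the data flow only (T is assumed to be an orthonormal transform matrix such as dct_matrix above, and predict_dc is a hypothetical stand-in for the similarity-based prediction of fig. 14; neither is the patent's code):

```python
import numpy as np

def encode_block(residual, prediction, qstep, T, predict_dc):
    coeff = np.round(T @ residual @ T.T / qstep)     # 820: transform + quantization
    dc = coeff[0, 0]                                 # original DC component

    coeff_dc0 = coeff.copy()
    coeff_dc0[0, 0] = 0.0                            # 840: DC forced to the preset value 0
    trans_res = T.T @ (coeff_dc0 * qstep) @ T        # inverse quantization + inverse transform
    transitional = prediction + trans_res            # transitional reconstructed image block

    dc_pred = predict_dc(transitional)               # 850: similarity-based DC prediction
    delta_dc = dc - dc_pred                          # 860: DC component residual
    return delta_dc, coeff_dc0                       # 870: delta_dc and the AC component go to the code stream
```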
Based on the technical solution of this application, inverse quantization and inverse transformation can be performed on the transform quantization coefficients with the DC component set to a preset value, so as to obtain the transitional reconstructed image block of the target image block, and the predicted value of the DC component is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block. When encoding or decoding the target image block, the DC component residual may be used instead of the DC component. Since the absolute value of the DC component residual is smaller than the DC component, coding efficiency is improved.
In addition, although the DC component of the reference pixel area could also be used directly as the predicted value of the DC component, the embodiments of the present invention determine the predicted value of the DC component according to the similarity between the pixels of the transitional reconstructed image block (whose DC component is a preset value) and the pixels in the reference pixel area, thereby improving the accuracy of the predicted value of the DC component and making the residual of the DC component smaller, so that coding efficiency is higher.
Fig. 9 is a schematic diagram of a decoding process according to another embodiment of the present invention.
The processing at the decoding end corresponds to the encoding end. After the decoding end obtains the code stream of a CTU, entropy decoding is performed to obtain the CU partitioning of the CTU and, for each CU, information such as the PU partitioning, the prediction mode, and the transform quantization coefficients; the CUs are then decoded in sequence. When all the CUs in the CTU have been decoded, decoding proceeds to the subsequent CTUs.
910: The alternating-current (AC) component and the direct-current (DC) component residual of the transform quantization coefficients of the target image block are obtained from the code stream.
For example, reconstructed transform quantization coefficients are obtained from the code stream, where the reconstructed transform quantization coefficients include the difference between the first DC component of the target image block and the predicted value of the DC component, and the AC component of the target image block. The decoding end may receive the code stream sent by the encoding end and entropy-decode it to obtain the prediction mode information and the transform quantization coefficients of the target image block. The prediction mode information may include the prediction mode of each prediction block; e.g., in HEVC, the prediction mode may be one of the DC prediction mode, the planar prediction mode, and 33 angular (or directional) prediction modes. The transform quantization coefficients include ΔDC and the AC component. Each prediction block may be divided into a plurality of transform blocks, and transform blocks belonging to the same prediction block use the same prediction mode. The target image block of this embodiment may be a transform block or correspond to a transform block.
920: Inverse quantization and inverse transformation are performed on the AC component and a preset DC component to obtain a transitional residual, and the transitional residual is added to the predicted value of the target image block to obtain the transitional reconstructed image block of the target image block.
For example, inverse quantization and inverse transformation are performed on the transitional transform quantization coefficients to obtain the transitional reconstructed image block of the target image block, where the transitional transform quantization coefficients include a second DC component and the AC component, the second DC component being a preset value. For each target image block, the DC component of the transform quantization coefficients is set to zero; that is, the coefficients consist of the AC component parsed from the code stream and a DC component of value 0. Inverse quantization and inverse transformation are performed on these coefficients to obtain the reconstructed residual of the target image block with the DC component being zero, and the reconstructed residual is added to the predicted value of the target image block to obtain the transitional reconstructed image block of the target image block with the DC component being zero, denoted B_{DC=0}.
930: A predicted value of the DC component of the transform quantization coefficients of the target image block is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block.
For example, the predicted value of the DC component is determined from the similarity of the pixels of the transitional reconstructed image block to the pixels in the reference pixel area of the target image block. For the calculation of the predicted value DC_pred of the DC component of the target image block, see the description of the embodiment of fig. 14 below.
940: The original DC component of the target image block is determined according to the predicted value of the DC component and the DC component residual.
For example, the DC component is determined based on its predicted value and the difference value. The predicted value DC_pred of the DC component obtained in the preceding step may be added to the ΔDC parsed from the code stream to obtain the quantized DC component value of the target image block.
950: Inverse quantization and inverse transformation are performed on the original DC component and the AC component of the target image block to obtain the residual signal of the target image block.
For example, the original transform quantization coefficients are inversely quantized and inversely transformed to obtain the residual signal of the target image block, where the original transform quantization coefficients include the DC component and the AC component. Inverse quantization and inverse transformation are performed on the DC component value of the target image block obtained in the preceding step and on the AC component value of the target image block parsed from the code stream, thereby obtaining the reconstructed residual signal of the target image block.
960: The target image block is decoded according to the residual signal.
For example, a reconstructed image block is generated from the prediction value and the reconstructed residual signal.
It should be understood that the process of generating the reconstructed image block according to the prediction value and the reconstructed residual signal is similar to that of the conventional video coding and will not be described herein.
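The decoder-side mirror of the encoder sketch given after step 870, under the same assumptions: because the transitional reconstructed image block is computed from the same AC coefficients and the same prediction, predict_dc yields the same DC_pred, and DC = DC_pred + ΔDC recovers the encoder's quantized DC component exactly.

```python
import numpy as np

def decode_block(delta_dc, coeff_dc0, prediction, qstep, T, predict_dc):
    trans_res = T.T @ (coeff_dc0 * qstep) @ T        # 920: DC slot is already 0
    transitional = prediction + trans_res            # transitional reconstructed image block

    dc_pred = predict_dc(transitional)               # 930: same prediction as the encoder
    coeff = coeff_dc0.copy()
    coeff[0, 0] = dc_pred + delta_dc                 # 940: quantized DC component recovered
    residual = T.T @ (coeff * qstep) @ T             # 950: inverse quantization + transform
    return prediction + residual                     # 960: reconstructed image block
```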
Based on the technical solution of this application, inverse quantization and inverse transformation can be performed on the transform quantization coefficients with the DC component set to a preset value, so as to obtain the transitional reconstructed image block of the target image block, and the predicted value of the DC component is determined according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block. When encoding or decoding the target image block, the DC component residual may be used instead of the DC component. Since the absolute value of the DC component residual is smaller than the DC component, coding efficiency is improved.
In addition, although the DC component of the reference pixel area could also be used directly as the predicted value of the DC component, the embodiments of the present invention determine the predicted value of the DC component according to the similarity between the pixels of the transitional reconstructed image block (whose DC component is a preset value) and the pixels in the reference pixel area, thereby improving the accuracy of the predicted value of the DC component and making the residual of the DC component smaller, so that coding efficiency is higher.
A method of determining a prediction value of a DC component of a target image block is described in detail below with reference to fig. 10 to 13B.
Fig. 10 shows the 35 prediction modes of HEVC. Referring to fig. 10, the intra prediction modes in HEVC include the direct-current (DC) prediction mode (index 1), the planar prediction mode (index 0), and the angular prediction modes (indices 2 to 34). When performing intra prediction, one of these 35 prediction modes may be selected. The current image block is predicted using information from the reconstructed blocks in its left and top reference pixel areas. In the DC prediction mode, all pixels in the current image block use the average value of the reference pixels as the predicted value. In the planar prediction mode, bilinear interpolation over the reference pixels yields the predicted values of all pixels in the current image block. In an angular prediction mode, exploiting the property that the texture contained in the current image block is highly correlated with the texture of the neighboring reconstructed blocks, each pixel of the current image block is projected onto the reference pixel area along the specified angle, and the corresponding pixel value at 1/32 precision in the reference pixel area is used as the predicted value of that pixel. Pixel values at 1/32 precision in the reference pixel area may be interpolated using two adjacent reference pixels. If the projected position falls exactly on an integer pixel of the reference area, the corresponding reference pixel value can be copied directly.
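The two-pixel interpolation at 1/32 precision can be sketched as an HEVC-style two-tap filter (frac is the fractional part, 0 to 31, of the projected position on the reference line; names are illustrative):

```python
def interpolate_ref(ref, idx, frac):
    """ref: line of integer reference pixels; idx: integer position; frac in [0, 31]."""
    if frac == 0:
        return ref[idx]                 # projection hits an integer pixel: copy it
    # Weighted mean of the two neighboring reference pixels at 1/32-pel precision.
    return ((32 - frac) * ref[idx] + frac * ref[idx + 1] + 16) >> 5

# Halfway (frac = 16) between reference pixels 100 and 120 yields 110.
assert interpolate_ref([100, 120], 0, 16) == 110
```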
When determining the predicted value of the DC component, the embodiments of the present invention may select the pixel signal according to the direction specified by the prediction mode, and determine the predicted value of the DC component according to the texture correlation of the pixel signal. Three methods of determining the predicted value of the DC component are described in detail below.
Fig. 14 is a schematic flow diagram of a process of determining a predicted value of a DC component according to an embodiment of the present invention.
Specifically, 850 of FIG. 8 and 930 of FIG. 9 may include the steps of:
1410: According to the direction indicated by the prediction mode, a plurality of adjacent pixel signals are selected along the k-th line in the transitional reconstructed image block and the reference pixel area;
for example, an intra prediction mode pred_mode of the target image block is obtained from the code stream, and four adjacent pixel signals are taken from the transitional reconstructed image block and from the reference pixel area around it, respectively, along the direction specified by the intra prediction mode. The k-th line is not an actual line; it is a convenient way of describing a hypothetical ray that passes through the four adjacent pixels and coincides with (i.e., is substantially parallel to) the direction in which the intra prediction mode points. It should be understood that when referring to a pixel through which a line passes, embodiments of the present invention may mean that the line passes through an actual pixel, or through a virtual pixel interpolated from actual pixels.
Fig. 11A is a diagram illustrating selection of pixel signals based on a directional prediction mode according to an embodiment of the present invention. Fig. 11B is a schematic diagram of selecting pixel signals based on the DC prediction mode and the planar prediction mode. As shown in fig. 11A, the group of pixel signals selected in the reference pixel area on the k-th line comprises the four signals

$$\hat{x}_{i_k,1},\ \hat{x}_{i_k,2},\ \hat{x}_{i_k,3},\ \hat{x}_{i_k,4},$$

and the group of pixel signals selected in the transitional reconstructed image block comprises the four signals

$$\tilde{y}_{i_k,1},\ \tilde{y}_{i_k,2},\ \tilde{y}_{i_k,3},\ \tilde{y}_{i_k,4},$$

where the subscript $i_k$ denotes the k-th line along the prediction direction and $j$ denotes the j-th pixel signal on this line. $\hat{x}_{i_k,1}$ and $\tilde{y}_{i_k,1}$ are adjacent, and the other signals are arranged in order of their numbering. For a directional prediction mode, the pixel signals are selected along the prediction direction; for the DC prediction mode and the planar prediction mode, only the correlation between adjacent pixels in the horizontal and vertical directions may be considered, so the pixel signals are selected in the horizontal and vertical directions. In the embodiments of the present invention, the direction specified by a directional prediction mode refers to its prediction direction, the directions specified by the DC prediction mode and the planar prediction mode refer to the horizontal and vertical directions, and k is an integer between 1 and 2n-1.
It should be understood that, for convenience of description, the embodiments of the present invention are described by taking four signals selected in each of the reference pixel area and the transitional reconstructed image block as an example; the embodiments are not limited thereto, and the number of selected signals may take other values, for example a value greater than 4, or equal to 3.
1420, calculating the second-order gradients of the two groups of pixel signals selected on the k-th line, and determining the set of lines having texture directionality according to these second-order gradients.

The second-order gradient of a group of pixel signals selected on a line is calculated as

$$\nabla^2 s_{i_k,j} = s_{i_k,j-1} - 2\,s_{i_k,j} + s_{i_k,j+1}, \qquad (1)$$

where $s$ stands for either the reference-area signals $\hat{x}$ or the transitional-block signals $\tilde{y}$ on that line.
Whether the $i_k$-th line has a strong texture characteristic in the prediction direction can be judged from the magnitude of the second-order gradient: when the second-order gradient value is small (for example, smaller than a preset value), the signal is consistent in that direction, i.e., the line has a strong texture characteristic. The present embodiment takes the above method of obtaining the second-order gradient as an example, but the embodiments of the present invention are not limited thereto; any other method of obtaining a second-order gradient may be used.
In a specific implementation, the following formula (2) is used to judge whether the $i_k$-th line has a strong texture characteristic in the prediction direction. When a signal for calculating the second-order gradient cannot be found in the transitional reconstructed image block but can be found in the reference area, formula (3) is used instead:

$$\left|\nabla^2 \tilde{y}_{i_k}\right| < \lambda_1, \qquad (2)$$

$$\left|\nabla^2 \hat{x}_{i_k}\right| < \lambda_2. \qquad (3)$$

The thresholds $\lambda_1$ and $\lambda_2$ may be chosen from experimental experience, e.g. $\lambda_1 = 40$ and $\lambda_2 = 20$. If the $i_k$-th line satisfies formula (2) or (3), the line is added to a set C representing the set of lines having texture directionality.
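A minimal sketch of steps 1410-1420 under stated assumptions: the per-line sample layout, the placement of the gradient window, and the precise form of tests (2)-(3) are reconstructions rather than the patent's literal formulas, while λ1 = 40 and λ2 = 20 follow the text above.

```python
LAMBDA1, LAMBDA2 = 40, 20  # empirical thresholds from the text

def second_order_gradient(s, j):
    # discrete second-order gradient at position j, as in formula (1)
    return s[j - 1] - 2 * s[j] + s[j + 1]

def texture_line_set(lines):
    """lines[k] = (x_ref, y_cur): the reference-area samples and the
    transitional-block samples selected on the k-th line, each ordered
    away from the block boundary (x_ref[0] adjacent to y_cur[0]).
    Returns the set C of lines with texture directionality."""
    C = []
    for k, (x_ref, y_cur) in enumerate(lines):
        if y_cur is not None and len(y_cur) >= 3:
            # formula (2): small gradients -> consistent signal on the line
            if (abs(second_order_gradient(y_cur, 1)) < LAMBDA1 and
                    abs(second_order_gradient(x_ref, 1)) < LAMBDA1):
                C.append(k)
        elif len(x_ref) >= 3:
            # formula (3): fall back to the reference-area gradient only
            if abs(second_order_gradient(x_ref, 1)) < LAMBDA2:
                C.append(k)
    return C
```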
It should be understood that the embodiments of the present invention determine the texture characteristic by means of a second-order gradient, but the embodiments are not limited thereto; other methods, such as a first-order gradient, may be used to determine whether the signals selected on each line have a strong texture characteristic. Of course, a combination of methods may also be used, for example a second-order gradient for the signals selected on some lines and a first-order gradient for those on others.
Referring to fig. 10, except for directions 2, 10, 18, 26 and 34, the prediction modes in the other directions use a weighted average of two adjacent pixels to obtain the pixel signals $\hat{x}_{i_k,j}$ and $\tilde{y}_{i_k,j}$. In other words, the two groups of pixel signals may be the signals of pixels through which the line actually passes, or signals obtained by interpolating the pixels on both sides of the line.
How the $\hat{x}_{i_k,j}$ and $\tilde{y}_{i_k,j}$ signals are obtained in the different prediction modes is described in detail below. For convenience of description, the corner marks in the following figures index the signals by their horizontal and vertical positions, which differs from the subscripts used above. Figs. 12A-12D are schematic diagrams of selecting pixel signals based on directional prediction modes according to embodiments of the present invention.
Referring to fig. 12A, for the interpolation of prediction modes 3-9, the blocks to the left and lower left of the transitional reconstructed image block may be used as references. Taking $x_{i,1}$ as the origin, a ray is drawn along the prediction direction (the prediction direction indicated in fig. 10), and the interpolated pixel $y'_{i,j}$ is obtained as the weighted average of the two adjacent original pixel points:

$$y'_{i,j} = d_2\, x_{\mathrm{up}} + d_1\, x_{\mathrm{down}}, \qquad d_1 + d_2 = 1,$$

where $d_1$ is the distance between the interpolation point and the upper pixel point and $d_2$ is the distance between the interpolation point and the lower pixel point. The remaining interpolation points are obtained by analogy; prediction modes 4-9 are similar and are not described further here.
Referring to fig. 12B, for the interpolation of prediction modes 11-17, the blocks to the left and upper left of the transitional reconstructed image block are used as references. Taking $x_{i,1}$ as the origin, a ray is drawn along the prediction direction (the prediction direction indicated in fig. 10), and the interpolated pixel $y'_{i,j}$ is obtained as the weighted average of the two adjacent original pixel points, with $d_1$ the distance between the interpolation point and the upper pixel point and $d_2$ the distance to the lower pixel point, as above. Interpolation of the remaining points is analogous; prediction modes 12-17 are similar and are not described further here.
Referring to fig. 12C, for the interpolation of prediction modes 19-25, the blocks above and at the upper left corner of the transitional reconstructed image block are used as references. Taking $x_{1,i}$ as the origin, a ray is drawn along the prediction direction, and the interpolated pixel $y'_{i,j}$ is obtained as the weighted average of the two adjacent original pixel points, the weights being given by the distances between the interpolation point and the left and right pixel points. The remaining interpolation points are obtained by analogy; modes 20-25 are similar and are not described in detail here.
Referring to fig. 12D, for the interpolation of directional prediction modes 27-33, the blocks above and at the upper right corner of the transitional reconstructed image block may be used as references. Taking $x_{1,i}$ as the origin, a ray is drawn along the prediction direction, and the interpolated pixel $y'_{i,j}$ is obtained as the weighted average of the two adjacent original pixel points, the weights being given by the distances between the interpolation point and the left and right pixel points. The remaining interpolation points are obtained by analogy; modes 28-33 are similar and are not described in detail here.
It should be understood that the selection of the interpolation point is not limited to a point between the upper and lower pixel points on the ray; a point between the left and right pixel points on the ray may also be selected, in which case $y'_{i,j}$ is interpolated from the left and right pixel points accordingly.
For the pixel points inside the transitional reconstructed image block: if 4 adjacent pixel points (actual, or obtained by interpolation) can be found on the reverse extensions of the rays in figs. 12A to 12D, those 4 adjacent pixel points inside the transitional reconstructed image block are selected; if 4 adjacent pixel points cannot be found, only pixel points of the reference block or reference pixel area are selected.
1430, solving for the offset of the first pixel signal such that the sum of squares of the second-order gradients formed, on the lines in set C, by the reconstructed signal (the first pixel signal plus the offset) together with the second pixel signal is minimized, the offset δx being added to the signals in the transitional reconstructed image block. The offset represents the DC prediction value of the transitional reconstructed image block before quantization: when δx equals the actual DC component before quantization, $\tilde{y}_{i_k,j} + \delta x$ is the actual reconstructed signal.
According to the characteristic that image signals have strong correlation or similarity in local regions, i.e., the reference signals $\hat{x}_{i_k,j}$ and the offset-corrected signals $\tilde{y}_{i_k,j} + \delta x$ are strongly correlated, the DC prediction value of the transitional reconstructed image block can be calculated. The correlation or similarity is measured using the second-order gradient, see formula (11) below, and the problem translates into finding δx such that the sum of the squares of the second-order gradients of the signals in that direction is minimized:

$$\delta x^{*} = \arg\min_{\delta x} \sum_{k \in C} \sum_{q \in \{1,2\}} \left( \nabla_q^2\!\left(\hat{x}_{i_k},\, \tilde{y}_{i_k} + \delta x\right) \right)^2. \qquad (11)$$

Here $\nabla_q^2(\cdot)$ denotes the q-th second-order gradient formed across the boundary between the reference pixel area and the transitional reconstructed image block on the $i_k$-th line; only these boundary-straddling windows depend on δx, since adding the same constant to all samples of a window leaves its second-order gradient unchanged. This convex optimization problem is a quadratic function of δx, so the value of δx is obtained in closed form by setting the derivative of (11) to zero:

$$\sum_{k \in C} \sum_{q \in \{1,2\}} \nabla_q^2 \cdot \frac{\partial \nabla_q^2}{\partial \delta x} = 0. \qquad (12)$$

Here q ∈ {1,2} indexes the two second-order gradients evaluated on each line, and k runs over the lines in set C.
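A minimal sketch of step 1430 under the assumption above, namely that the two gradients per line in formula (11) are the windows straddling the block boundary. Each such gradient is affine in δx, g(δx) = a + c·δx, so the least-squares minimizer is δx = -Σac / Σc²:

```python
def solve_offset_second_order(lines_in_C):
    """lines_in_C[k] = (x, y): reference samples x and transitional-block
    samples y on one line of set C, each ordered away from the boundary.
    Minimizes the sum of squared boundary-straddling second-order
    gradients over dx (formulas (11)-(12))."""
    num = den = 0.0
    for x, y in lines_in_C:
        # window (x[1], x[0], y[0]+dx): gradient (x[1] - 2x[0] + y[0]) + dx
        a1, c1 = x[1] - 2 * x[0] + y[0], 1.0
        # window (x[0], y[0]+dx, y[1]+dx): gradient (x[0] - 2y[0] + y[1]) - dx
        a2, c2 = x[0] - 2 * y[0] + y[1], -1.0
        num += a1 * c1 + a2 * c2
        den += c1 * c1 + c2 * c2
    return -num / den if den else 0.0
```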
1440, quantizing the offset value obtained by the solution to obtain the predicted value of the quantized DC component.

For example, the predicted value of the quantized DC component is $DC_{pred} = Q(m \cdot \delta x)$, where Q(·) is the quantization operation and m is the number of pixels in the target image block.
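Q(·) itself is not fixed by the text; the following sketch assumes a uniform scalar quantizer with step size qstep, which is only one possible choice:

```python
def quantize_dc_pred(delta_x, m, qstep):
    """DC_pred = Q(m * delta_x); the uniform scalar quantizer
    Q(v) = round(v / qstep) used here is an assumption."""
    return int(round(m * delta_x / qstep))
```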
It should be understood that, as an alternative to the embodiment of fig. 14, it is also possible to directly select pairs of adjacent pixel signals of the transitional reconstructed image block and the reference area according to the direction specified by the prediction mode, and to determine the DC component prediction value according to formulas (13) and (14); for the specific process, refer to the description of the method of fig. 15 below, which is not repeated here.
Because the second-order gradient can more accurately represent the correlation or similarity between the pixels of the target image block and the reference image block, the accuracy of the predicted value of the DC component obtained based on the second-order gradient is higher, the DC component residual is smaller, and the coding efficiency is further improved. In addition, only the pixels on certain lines with strong correlation are selected for calculating the predicted value of the DC component, so that the accuracy of the predicted value of the DC component is further improved, the residual error of the DC component is smaller, and the coding efficiency is further improved.
Fig. 15 is a schematic flow chart of a process of determining a predicted value of a DC component according to another embodiment of the present invention.
Specifically, step 850 of FIG. 8 and step 930 of FIG. 9 may include the steps of:
1510, determining, according to the direction indicated by the prediction mode, a plurality of pixel pairs in the transitional reconstructed image block and the reference pixel area, and the first pixel signal and the adjacent second pixel signal corresponding to each pixel pair, wherein the first pixel signal is a pixel signal of the transitional reconstructed image block and the second pixel signal is a pixel signal of the reference pixel area.
Fig. 13A and 13B are schematic diagrams of selecting pixel signals according to another embodiment of the invention. As shown in fig. 13A and 13B, the signals $\tilde{y}_1, \tilde{y}_2, \ldots, \tilde{y}_{2n}$ are taken from the transitional reconstructed image block and the signals $\hat{x}_1, \hat{x}_2, \ldots, \hat{x}_{2n}$ from the reference area. For a directional prediction mode, the pixel pairs may be selected along the prediction direction; for the DC prediction mode and the planar prediction mode, only the correlation of pixels in the vertical and horizontal directions may be considered, i.e., the pixel pairs are selected in the horizontal and vertical directions.
1520, solving for the offset of the first pixel signal such that the sum of squares of the first-order gradients between the reconstructed signals (the first pixel signals of the plurality of pixel pairs plus the offset) and the second pixel signals is minimized, wherein the offset represents the predicted value of the DC component before quantization.
Similar to the process described above for the case where the set C is not empty, the offset δx is added to the signals $\tilde{y}_i$ within the transitional reconstructed image block.
According to the characteristic that the image signal has strong correlation in a local area, i.e., $\hat{x}_i$ and $\tilde{y}_i + \delta x$ are strongly correlated, the predicted value of the DC component of the target image block can be calculated. The correlation can be measured using a first-order gradient, see formula (13); the problem translates into finding δx such that the sum of the squared first-order gradients of the signal in that direction is minimized:

$$\delta x^{*} = \arg\min_{\delta x} \sum_{i=1}^{2n} \left( \left(\tilde{y}_i + \delta x\right) - \hat{x}_i \right)^2. \qquad (13)$$

This convex optimization problem is a quadratic function, and the value of δx follows directly from formula (14):

$$\delta x = \frac{1}{2n} \sum_{i=1}^{2n} \left( \hat{x}_i - \tilde{y}_i \right), \qquad (14)$$

where n is the number of pixels in each row or column of the transitional reconstructed image block.
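A minimal sketch of formulas (13)-(14); pairing the top row and left column of the transitional block with the reference row above and the reference column to the left is an assumption based on figs. 13A and 13B:

```python
import numpy as np

def solve_offset_first_order(x_ref, y_cur):
    """Formula (14): minimizing sum(((y_i + dx) - x_i)^2) over the
    adjacent pixel pairs gives dx = mean(x_i - y_i)."""
    x = np.asarray(x_ref, dtype=np.float64)
    y = np.asarray(y_cur, dtype=np.float64)
    return float(np.mean(x - y))

# usage: pair the top row / left column of the transitional block `cur`
# with the reference row above (`ref_top`) and the reference column to
# the left (`ref_left`), n pixels per side (2n pairs in total):
# dx = solve_offset_first_order(
#     np.concatenate([ref_top[:n], ref_left[:n]]),
#     np.concatenate([cur[0, :n], cur[:n, 0]]))
```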
1530, the solved offset value is quantized to obtain a predicted value of the DC component.
It will be appreciated that the embodiment of fig. 14 may be combined with the embodiment of fig. 15, as sketched below. For example, if no line with texture directionality is found through 1420 of fig. 14, i.e., the set C is empty, adjacent pixel pairs of the transitional reconstructed image block and the reference area are selected and the prediction value of the DC component is determined according to the scheme of fig. 15.
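A sketch of this combination, reusing the hypothetical helpers from the earlier sketches and assuming every selected line carries both reference and transitional samples:

```python
def predict_dc_offset(lines, x_pairs, y_pairs):
    """Combined scheme of figs. 14 and 15: use the second-order-gradient
    method when set C is non-empty, otherwise fall back to the
    first-order adjacent-pair method."""
    C = texture_line_set(lines)                        # step 1420
    if C:
        return solve_offset_second_order([lines[k] for k in C])
    return solve_offset_first_order(x_pairs, y_pairs)  # fig. 15 fallback
```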
Because only the adjacent pixels between the transitional reconstructed image block and the reference pixel area are selected for calculating the predicted value of the DC component, the calculation complexity is reduced under the condition of ensuring the accuracy of the predicted value of the DC component.
Fig. 16 is a schematic flow chart of a process of determining a predicted value of a DC component according to another embodiment of the present invention.
Specifically, step 850 of FIG. 8 and step 930 of FIG. 9 may include the steps of:
1610, selecting M rows of pixels in the reference pixel area above the transitional reconstructed image block and H columns of pixels in the reference pixel area to its left, and calculating their respective averages $\mathrm{Ave}(\mathrm{Up}_M)$ and $\mathrm{Ave}(\mathrm{Left}_H)$.

1620, selecting the M upper rows of pixels inside the transitional reconstructed image block and the H left columns of pixels inside the transitional reconstructed image block, and calculating their respective averages $\mathrm{Ave}(\mathrm{Cur}_M)$ and $\mathrm{Ave}(\mathrm{Cur}_H)$.
Where M and H may be greater than or equal to 2 and less than the size N of the transitional reconstructed image block. M and H may or may not be equal.
1630, calculating the difference between the average of the M rows of pixels in the reference pixel area above the transitional reconstructed image block and the average of the M upper rows of pixels inside it, to obtain the first difference value

$$d_{\mathrm{Up}} = \mathrm{Ave}(\mathrm{Up}_M) - \mathrm{Ave}(\mathrm{Cur}_M).$$

1640, calculating the difference between the average of the H columns of pixels in the reference pixel area to the left of the transitional reconstructed image block and the average of the H left columns of pixels inside it, to obtain the second difference value

$$d_{\mathrm{Left}} = \mathrm{Ave}(\mathrm{Left}_H) - \mathrm{Ave}(\mathrm{Cur}_H).$$

1650, using the average of the two difference values as the predicted value of the DC component of the target image block before quantization; the predicted value of the DC component after quantization is

$$DC_{pred} = Q\!\left(m \cdot \frac{d_{\mathrm{Up}} + d_{\mathrm{Left}}}{2}\right),$$

where m is the number of pixels in the target image block.
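A compact sketch of steps 1610-1650; the array shapes and the quantizer argument are assumptions, with Q taken to be the same scalar quantizer applied to the DC transform coefficient:

```python
import numpy as np

def dc_pred_from_means(ref_top, ref_left, cur, M, H, quantize):
    """ref_top: M (or more) reference rows above the block,
    ref_left: H (or more) reference columns to its left,
    cur: the transitional reconstructed block itself."""
    d_up = np.mean(ref_top[:M, :]) - np.mean(cur[:M, :])      # first difference
    d_left = np.mean(ref_left[:, :H]) - np.mean(cur[:, :H])   # second difference
    m = cur.size                                              # pixels in the block
    return quantize(m * (d_up + d_left) / 2.0)
```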
Because pixels of multiple rows or columns of the transitional reconstructed image block and of the reference pixel area are selected for calculating the predicted value of the DC component, the computational complexity is reduced while the accuracy of the predicted value of the DC component is preserved. Using averages to calculate the predicted value also keeps the design of the encoder or decoder simple. In addition, when M and H cover multiple rows or columns, the correlation of more signals around the target image block, and the relation between the target image block and the surrounding signals, can be fully exploited, so that the residual of the DC component is smaller, the precision of the predicted value of the DC component is improved, and the coding efficiency is further improved.

In addition, before DC prediction is performed, the magnitude of the DC value of the image block to be encoded may first be judged: if the DC value is greater than a preset threshold, the method of the present invention is executed; if it is smaller than the preset threshold, conventional processing is used, i.e., the DC component and the AC component of the target image block are encoded directly without the DC prediction method. Correspondingly, at the decoding end, the present invention may add a syntax element, such as a flag, to parameters such as the slice header, PPS or SPS to indicate in which way the decoding end obtains the DC component and AC component of the target image block. For example, a DC_pred_present_flag equal to 1 may indicate that the DC component is obtained by prediction and a value of 0 that it is obtained by directly parsing the code stream, or vice versa.

The embodiments of the present invention can be applied to various electronic apparatuses; examples of applying the embodiments to a television apparatus and a mobile phone apparatus are given below.
Fig. 17 is a schematic structural diagram of a television application to which the embodiment of the present invention is applied. The television apparatus 1700 includes an antenna 1701, a tuner 1702, a demultiplexer 1703, a decoder 1704, a video signal processor 1705, a display unit 1706, an audio signal processor 1707, a speaker 1708, an external interface 1709, a controller 1710, a user interface 1711, and a bus 1712.
The tuner 1702 extracts a signal of a desired channel from a broadcast signal received via the antenna 1701, and demodulates the extracted signal. The tuner 1702 then outputs the encoded bit stream obtained by demodulation to the demultiplexer 1703. That is, the tuner 1702 functions as a transmitting device in the television apparatus 1700 that receives an encoded stream of encoded images.
The demultiplexer 1703 separates a video stream and an audio stream of a program to be viewed from the encoded bit stream, and outputs the separated streams to the decoder 1704. The demultiplexer 1703 also extracts auxiliary data, e.g., an electronic program guide, from the encoded bitstream and provides the extracted data to the controller 1710. If the encoded bit stream is scrambled, the demultiplexer 1703 may descramble the encoded bit stream.
The decoder 1704 decodes the video stream and the audio stream input from the demultiplexer 1703. The decoder 1704 then outputs the video data generated by the decoding to the video signal processor 1705. The decoder 1704 also outputs audio data generated by decoding to the audio signal processor 1707.
The video signal processor 1705 reproduces video data input from the decoder 1704 and displays the video data on the display unit 1706. The video signal processor 1705 may also display an application screen provided via a network on the display unit 1706. In addition, the video signal processor 1705 may perform additional processing, for example, noise removal, on the video data according to the setting. The video signal processor 1705 may also generate an image of a GUI (graphical user interface) and superimpose the generated image on an output image.
The display unit 1706 is driven by a driving signal supplied from the video signal processor 1705, and displays a video or an image on a video screen of a display device, for example, a liquid crystal display, a plasma display, or an OLED (organic electroluminescent display).
The audio signal processor 1707 performs reproduction processing, for example, digital-to-analog conversion and amplification, on the audio data input from the decoder 1704, and outputs audio through the speaker 1708. In addition, the audio signal processor 1707 may perform additional processing on the audio data, for example, noise removal.
The external interface 1709 is an interface for connecting the television apparatus 1700 to an external device or a network. For example, a video stream or an audio stream received via the external interface 1709 may be decoded by the decoder 1704. That is, the external interface 1709 also serves as a transmitting device in the television apparatus 1700 that receives an encoded stream of encoded images.
The controller 1710 includes a processor and a memory. The memory stores programs to be executed by the processor, program data, auxiliary data, data acquired via a network, and the like. For example, when the television apparatus 1700 is started, a program stored in the memory is read and executed by the processor. The processor controls the operation of the television apparatus 1700 in accordance with control signals input from the user interface 1711.
The user interface 1711 is connected to the controller 1710. For example, the user interface 1711 includes buttons and switches for allowing the user to operate the television apparatus 1700 and a receiving unit for receiving a remote control signal. The user interface 1711 detects operations performed by the user via these components, generates a control signal, and outputs the generated control signal to the controller 1710.
The bus 1712 connects the tuner 1702, demultiplexer 1703, decoder 1704, video signal processor 1705, audio signal processor 1707, external interface 1709, and controller 1710 to each other.
In the television apparatus 1700 having such a structure, the decoder 1704 has the function of the video decoding apparatus according to the above-described embodiment.
Fig. 18 is a schematic configuration diagram of an application of the present invention to a mobile phone. The mobile phone device 1720 includes an antenna 1721, a communication unit 1722, an audio codec 1723, a speaker 1724, a microphone 1725, a camera unit 1726, an image processor 1727, a demultiplexer 1728, a recording/reproducing unit 1729, a display unit 1730, a controller 1731, an operation unit 1732, and a bus 1733.
The antenna 1721 is connected to the communication unit 1722. A speaker 1724 and a microphone 1725 are connected to the audio codec 1723. The operation unit 1732 is connected to the controller 1731. The bus 1733 connects the communication unit 1722, the audio codec 1723, the camera unit 1726, the image processor 1727, the demultiplexer 1728, the recording/reproducing unit 1729, the display unit 1730, and the controller 1731 to each other.
The mobile phone device 1720 performs operations such as transmission/reception of audio signals, transmission/reception of e-mail and image data, capturing of images, and recording of data in various operation modes, including a voice call mode, a data communication mode, an imaging mode, and a video phone mode.
In the voice call mode, an analog audio signal generated by the microphone 1725 is provided to the audio codec 1723. The audio codec 1723 converts an analog audio signal into audio data, performs analog-to-digital conversion on the converted audio data, and compresses the audio data. The audio codec 1723 then outputs the audio data obtained as the compression result to the communication unit 1722. The communication unit 1722 encodes and modulates audio data to generate a signal to be transmitted. The communication unit 1722 then transmits the generated signal to be transmitted to the base station via the antenna 1721. The communication unit 1722 also amplifies a radio signal received via the antenna 1721 and performs frequency conversion on the radio signal received via the antenna 1721 to obtain a received signal. The communication unit 1722 then demodulates and decodes the received signal to generate audio data, and outputs the generated audio data to the audio codec 1723. The audio codec 1723 decompresses the audio data and performs digital-to-analog conversion on the audio data to generate an analog audio signal. The audio codec 1723 then provides the resulting audio signal to the speaker 1724 for output of audio from the speaker 1724.
In the data communication mode, for example, the controller 1731 generates text data to be included in an electronic mail according to an operation by a user via the operation unit 1732. The controller 1731 also displays the text on the display unit 1730. The controller 1731 also generates email data in response to an instruction for transmission from the user via the operation unit 1732, and outputs the generated email data to the communication unit 1722. The communication unit 1722 encodes and modulates the email data to generate a signal to be transmitted. The communication unit 1722 then transmits the generated signal to be transmitted to the base station via the antenna 1721. The communication unit 1722 also amplifies a radio signal received via the antenna 1721 and performs frequency conversion on the radio signal received via the antenna 1721 to obtain a received signal. The communication unit 1722 then demodulates and decodes the received signal to restore the email data, and outputs the restored email data to the controller 1731. The controller 1731 displays the contents of the email on the display unit 1730 and stores the email data in the storage medium of the recording/reproducing unit 1729.
The recording/reproducing unit 1729 includes a readable/writable storage medium. For example, the storage medium may be an internal storage medium, or may be an externally mounted storage medium, such as a hard disk, a magnetic disk, a magneto-optical disk, a USB (universal serial bus) memory, or a memory card.
In the imaging mode, the camera unit 1726 images a subject to generate image data, and outputs the generated image data to the image processor 1727. The image processor 1727 encodes image data input from the camera unit 1726 and stores the encoded stream in a storage medium of the storage/reproduction unit 1729.
In the video phone mode, the demultiplexer 1728 multiplexes a video stream encoded by the image processor 1727 and an audio stream input from the audio codec 1723, and outputs the multiplexed stream to the communication unit 1722. The communication unit 1722 encodes and modulates the multiplexed stream to generate a signal to be transmitted. The communication unit 1722 then transmits the generated signal to be transmitted to the base station via the antenna 1721. The communication unit 1722 also amplifies a radio signal received via the antenna 1721 and performs frequency conversion on the radio signal received via the antenna 1721 to obtain a received signal. The signal to be transmitted and the received signal may comprise coded bit streams. The communication unit 1722 then demodulates and decodes the received signal to restore the stream, and outputs the restored stream to the demultiplexer 1728. The demultiplexer 1728 separates a video stream and an audio stream from the input stream, outputs the video stream to the image processor 1727, and outputs the audio stream to the audio codec 1723. The image processor 1727 decodes the video stream to generate video data. The video data is supplied to the display unit 1730, and a series of images are displayed by the display unit 1730. The audio codec 1723 decompresses the audio stream and performs digital-to-analog conversion on the audio stream to generate an analog audio signal. The audio codec 1723 then provides the generated audio signals to the speaker 1724 to output audio from the speaker 1724.
In the mobile phone device 1720 having such a structure, the image processor 1727 has the functions of the video encoding device and the video decoding device according to the above-described embodiments.
In one or more examples, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media (which corresponds to tangible media such as data storage media) or communication media, including any medium that facilitates transfer of a computer program from one place to another, such as in accordance with a communication protocol. In this manner, the computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave. A data storage medium may be any available medium that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. The computer program product may include a computer-readable medium.
By way of example, and not limitation, some computer-readable storage media may comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Also, any connection is properly termed a computer-readable medium. For example, if the instructions are sent from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, Digital Subscriber Line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but rather pertain to non-transitory tangible storage media. Disk and disc, as used herein, include Compact Disc (CD), laser disc, optical disc, Digital Versatile Disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
The instructions may be executed by one or more processors, such as one or more Digital Signal Processors (DSPs), general purpose microprocessors, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Thus, the term "processor," as used herein, may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. Additionally, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques may be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented broadly by various means or devices including a wireless handset, an Integrated Circuit (IC), or a collection of ICs (e.g., a chipset). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. In particular, as described above, the various units may be combined in a codec hardware unit, or provided in conjunction with suitable software and/or firmware by a set of interoperability hardware units (including one or more processors as described above).
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In addition, the terms "system" and "network" are often used interchangeably herein. It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
In the embodiments provided herein, it should be understood that "B corresponding to a" means that B is associated with a from which B can be determined. It should also be understood that determining B from a does not mean determining B from a alone, but may be determined from a and/or other information.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described in a functional general in the foregoing description for the purpose of illustrating clearly the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (26)

1. A decoding method, characterized in that the decoding method comprises:
obtaining alternating current AC component and direct current DC component residual error of a transformation quantization coefficient of a target image block from a code stream;
performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transition residual error, and adding the transition residual error and a predicted value of a target image block to obtain a transition reconstructed image block of the target image block;
determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of pixels in the reference pixel area of the transitional reconstructed image block and the target image block;
determining an original DC component of the target image according to the predicted value of the DC component and the DC component residual error;
and carrying out inverse quantization and inverse transformation on the original DC component and the AC component of the target image to obtain a residual signal of the target image, and decoding the target image according to the residual signal.
2. The decoding method according to claim 1, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block from a similarity of the transitional reconstructed image block and pixels in a reference pixel area of the target image block comprises:
determining at least one line and a first group of pixel signals and an adjacent second group of pixel signals corresponding to each line in a direction specified by a prediction mode of the target image block, wherein the first group of pixel signals comprises first pixel signals of the transitional reconstructed image block, the second group of pixel signals comprises second pixel signals of the reference pixel area, and the first pixel signals are adjacent to the second pixel signals;
solving an offset of the first pixel signal so that a sum of squares of a second order gradient of the reconstructed signal of the at least one line after the first pixel signal is added by the offset and a second order gradient of the second pixel signal is minimum, wherein the pixel signal used for representing the second order gradient in the transient reconstructed image block is added by the offset, and the offset is used for representing a predicted value of the DC component before quantization;
and quantizing the value of the offset obtained by solving to obtain a predicted value of the DC component.
3. The decoding method according to claim 2, wherein the first group of pixel signals and the second group of pixel signals corresponding to each of the at least one line satisfy one of the following two formulas:

$$\left|\nabla^2 \tilde{y}_{i_k}\right| < \lambda_1,$$

$$\left|\nabla^2 \hat{x}_{i_k}\right| < \lambda_2,$$

wherein $\lambda_1$ is a threshold value for the second-order gradient of the reconstructed signal of the first group of pixel signals, $\lambda_2$ is a threshold value used when a second-order gradient of the reconstructed signal of the first group of pixel signals is not available, $\nabla^2 \tilde{y}_{i_k}$ is the second-order gradient of the first group of pixel signals, $\nabla^2 \hat{x}_{i_k}$ is the second-order gradient of the second group of pixel signals, $i_k$ is the number of the at least one line, and $j$ is the number of the pixel signal on each of the at least one line,

wherein the offset is calculated according to the following formula:

$$\delta x = \arg\min_{\delta x'} \sum_{k \in C} \sum_{q \in \{1,2\}} \left( \nabla_q^2\!\left(\hat{x}_{i_k},\, \tilde{y}_{i_k} + \delta x'\right) \right)^2,$$

wherein $\delta x$ is the offset, $\nabla_q^2\!\left(\hat{x}_{i_k},\, \tilde{y}_{i_k} + \delta x'\right)$ is the q-th second-order gradient of the reconstructed signal of the first pixel signal together with the second pixel signal, C represents the set of lines satisfying one of the two formulas, and q is a number of pixel signals on each of the at least one line.
4. The decoding method according to claim 1, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block according to the similarity of the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block comprises:
determining a plurality of pixel pairs, a first pixel signal corresponding to each pixel pair and an adjacent second pixel signal in a direction specified by a prediction mode of the target image block, wherein the first pixel signal is a pixel signal of the transitional reconstructed image block, and the second pixel signal is a pixel signal of the reference pixel area;
solving for an offset of the first pixel signal such that a sum of squares of first order gradients of a reconstructed signal and a second pixel signal of the plurality of pixel pairs after the offset is added to the first pixel signal is minimized, wherein the offset is used to represent a predicted value of the DC component before quantization;
and quantizing the value of the offset obtained by solving to obtain a predicted value of the DC component.
5. The decoding method according to claim 4, wherein the predicted value of the DC component is calculated according to the following formula:

$$\delta x = \frac{1}{2n} \sum_{i=1}^{2n} \left( \hat{x}_i - \tilde{y}_i \right),$$

wherein $\delta x$ is the offset, n is the number of pixels of each row or column of the transitional reconstructed image block, $\tilde{y}_i$ is the first pixel signal, $\hat{x}_i$ is the second pixel signal, and $\tilde{y}_i$ and $\hat{x}_i$ are adjacent in the direction specified by the prediction mode.
6. The decoding method according to claim 1, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block from a similarity of the transitional reconstructed image block and pixels in a reference pixel area of the target image block comprises:
acquiring a first group of pixel signals positioned above the transitional reconstructed image block, a second group of pixel signals positioned on the left side of the transitional reconstructed image block, a third group of pixel signals positioned on the upper side inside the transitional reconstructed image block and a fourth group of pixel signals positioned on the left side inside the transitional reconstructed image block in the reference pixel area, wherein the first group of pixel signals and the third group of pixel signals respectively comprise M rows of pixel signals, the second group of pixel signals and the fourth group of pixel signals respectively comprise H columns of pixel signals, and M and H are positive integers;
calculating the difference between the average value of the first group of pixel signals and the average value of the third group of pixel signals to obtain a first difference value;
calculating the difference between the average value of the second group of pixel signals and the average value of the fourth group of pixel signals to obtain a second difference value;
and quantizing the average value of the first difference value and the second difference value to obtain a predicted value of the DC component.
7. The decoding method according to claim 6, wherein M is an integer greater than or equal to 2, and H is an integer greater than or equal to 2.
8. A method of encoding, comprising:
transforming and quantizing the residual signal of the target image block to obtain a Direct Current (DC) component and an Alternating Current (AC) component of a transformation quantization coefficient of the target image block;
performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transition residual error, and adding the transition residual error and a predicted value of a target image block to obtain a transition reconstructed image block of the target image block;
determining a predicted value of a DC component of a transformation quantization coefficient of the target image block according to the similarity of the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block;
determining a DC component residual of the target image according to the predicted value of the DC component and the original DC component of the target image block;
and writing the residual errors of the AC component and the DC component into a code stream.
9. The encoding method according to claim 8, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block from a similarity of pixels of the transitional reconstructed image block to pixels in a reference pixel area of the target image block comprises:
determining at least one line and a first group of pixel signals and an adjacent second group of pixel signals corresponding to each line in a direction specified by a prediction mode of the target image block, wherein the first group of pixel signals comprises first pixel signals of the transitional reconstructed image block, the second group of pixel signals comprises second pixel signals of the reference pixel area, and the first pixel signals are adjacent to the second pixel signals;
solving an offset of the first pixel signal so that a sum of squares of a second order gradient of the reconstructed signal of the at least one line after the first pixel signal is added by the offset and a second order gradient of the second pixel signal is minimum, wherein the pixel signal used for representing the second order gradient in the transient reconstructed image block is added by the offset, and the offset is used for representing a predicted value of the DC component before quantization;
and quantizing the value of the offset obtained by solving to obtain a predicted value of the DC component.
10. The encoding method according to claim 9, wherein the first group of pixel signals and the second group of pixel signals corresponding to each of the at least one line satisfy one of the following two formulas:

$$\left|\nabla^2 \tilde{y}_{i_k}\right| < \lambda_1,$$

$$\left|\nabla^2 \hat{x}_{i_k}\right| < \lambda_2,$$

wherein $\lambda_1$ is a threshold value for the second-order gradient of the reconstructed signal of the first group of pixel signals, $\lambda_2$ is a threshold value used when a second-order gradient of the reconstructed signal of the first group of pixel signals is not available, $\nabla^2 \tilde{y}_{i_k}$ is the second-order gradient of the first group of pixel signals, $\nabla^2 \hat{x}_{i_k}$ is the second-order gradient of the second group of pixel signals, $i_k$ is the number of the at least one line, and $j$ is the number of the pixel signal on each of the at least one line,

wherein the offset is calculated according to the following formula:

$$\delta x = \arg\min_{\delta x'} \sum_{k \in C} \sum_{q \in \{1,2\}} \left( \nabla_q^2\!\left(\hat{x}_{i_k},\, \tilde{y}_{i_k} + \delta x'\right) \right)^2,$$

wherein $\delta x$ is the offset, $\nabla_q^2\!\left(\hat{x}_{i_k},\, \tilde{y}_{i_k} + \delta x'\right)$ is the q-th second-order gradient of the reconstructed signal of the first pixel signal together with the second pixel signal, C represents the set of lines satisfying one of the two formulas, and q is a number of pixel signals on each of the at least one line.
11. The encoding method according to claim 8, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block from a similarity of pixels of the transitional reconstructed image block to pixels in a reference pixel area of the target image block comprises:
determining a plurality of pixel pairs, a first pixel signal corresponding to each pixel pair and an adjacent second pixel signal in a direction specified by a prediction mode of the target image block, wherein the first pixel signal is a pixel signal of the transitional reconstructed image block, and the second pixel signal is a pixel signal of the reference pixel area;
solving for an offset of the first pixel signal, so that a sum of squares of first-order gradients of a reconstructed signal and a second pixel signal of the plurality of pixel pairs after the offset is added to the first pixel signal is minimized, wherein the offset is used for representing a predicted value of the DC component before quantization;
and quantizing the value of the offset obtained by solving to obtain a predicted value of the DC component.
12. The encoding method according to claim 11, wherein the prediction value of the DC component is calculated according to the following formula:

$$\delta x = \frac{1}{2n} \sum_{i=1}^{2n} \left( \hat{x}_i - \tilde{y}_i \right),$$

wherein $\delta x$ is the offset, n is the number of pixels of each row or column of the transitional reconstructed image block, $\tilde{y}_i$ is the first pixel signal, $\hat{x}_i$ is the second pixel signal, and $\tilde{y}_i$ and $\hat{x}_i$ are adjacent in the direction specified by the prediction mode.
13. The encoding method according to claim 8, wherein determining a prediction value of a DC component of a transform quantized coefficient of the target image block from a similarity of pixels of the transitional reconstructed image block to pixels in a reference pixel area of the target image block comprises:
acquiring a first group of pixel signals located above the transitional reconstructed image block and a second group of pixel signals located to the left of the transitional reconstructed image block in the reference pixel area, together with a third group of pixel signals on the upper side inside the transitional reconstructed image block and a fourth group of pixel signals on the left side inside the transitional reconstructed image block, wherein the first group of pixel signals and the third group of pixel signals each comprise M rows of pixel signals, the second group of pixel signals and the fourth group of pixel signals each comprise H columns of pixel signals, and M and H are positive integers;
calculating the difference between the average value of the first group of pixel signals and the average value of the third group of pixel signals to obtain a first difference value;
calculating the difference between the average value of the second group of pixel signals and the average value of the fourth group of pixel signals to obtain a second difference value;
and quantizing the average value of the first difference value and the second difference value to obtain a predicted value of the DC component.
14. The encoding method according to claim 13, wherein M is an integer greater than or equal to 2, and H is an integer greater than or equal to 2.
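Claims 13-14 need no solver at all: the offset is estimated by averaging two boundary differences, one across the top edge and one across the left edge. A toy sketch, again with illustrative names and a simplified quantizer:

```python
import numpy as np

def predict_dc_mean_diff(block, above_rows, left_cols, qstep):
    # block: transitional reconstructed image block (2-D array).
    # above_rows: M reference rows directly above the block.
    # left_cols:  H reference columns directly left of the block.
    M, H = above_rows.shape[0], left_cols.shape[1]
    d1 = above_rows.mean() - block[:M, :].mean()  # first difference
    d2 = left_cols.mean() - block[:, :H].mean()   # second difference
    # Quantize the average of the two differences (last step of claim 13).
    return int(round(0.5 * (d1 + d2) / qstep))
```

Claim 14's requirement of M, H ≥ 2 simply makes both averages less sensitive to noise on any single boundary row or column.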
15. A decoding device, characterized by comprising:
the entropy decoding module is used for acquiring, from the code stream, the alternating current (AC) component and the direct current (DC) component residual of the transform quantization coefficients of the target image block;
the first inverse quantization and inverse transformation module is used for performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transitional residual, and adding the transitional residual and the predicted value of the target image block to obtain a transitional reconstructed image block of the target image block;
the prediction module is used for determining a predicted value of the DC component of the transform quantization coefficients of the target image block according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block;
and the second inverse quantization and inverse transformation module is used for determining the original DC component of the target image block according to the predicted value of the DC component and the DC component residual, performing inverse quantization and inverse transformation on the original DC component and the AC component of the target image block to obtain a residual signal of the target image block, and decoding the target image block according to the residual signal.
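Taken together, claim 15's modules form a two-pass inverse pipeline: reconstruct once with a preset DC, predict the true DC from boundary similarity, then reconstruct again. The toy sketch below wires the stages together with an orthonormal DCT standing in for the codec's transform, one uniform quantization step, and the claim-11 mean-gap rule as `predict_dc`; every name and the n·δx DC scaling are assumptions, not the patent's API.

```python
import numpy as np
from scipy.fft import idctn

def inverse_qt(coeff, dc, qstep):
    # Inverse-quantize and inverse-transform, overriding the DC slot.
    c = coeff.astype(float) * qstep
    c[0, 0] = dc * qstep
    return idctn(c, norm='ortho')

def predict_dc(transitional, above_row, left_col, qstep):
    # Mean boundary gap between reference pixels and the transitional
    # reconstruction (the claim-11 pixel-pair rule, top and left edges).
    gaps = np.concatenate([above_row - transitional[0, :],
                           left_col - transitional[:, 0]])
    n = transitional.shape[0]
    # For an orthonormal n-by-n DCT, a mean pixel offset dx corresponds
    # to a DC coefficient of n*dx; convert to quantized-coefficient units.
    return round(gaps.mean() * n / qstep)

def decode_block(coeff, dc_residual, predictor, above_row, left_col, qstep):
    # First inverse Q/T module: preset DC (zero) -> transitional block.
    transitional = predictor + inverse_qt(coeff, dc=0.0, qstep=qstep)
    # Prediction module: estimate the DC from boundary similarity.
    dc = predict_dc(transitional, above_row, left_col, qstep) + dc_residual
    # Second inverse Q/T module: full residual with the recovered DC.
    return predictor + inverse_qt(coeff, dc=float(dc), qstep=qstep)
```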
16. The decoding device according to claim 15, wherein the prediction module is configured to: determine at least one line and, for each line, a corresponding first group of pixel signals and an adjacent second group of pixel signals in the direction specified by the prediction mode of the target image block, wherein the first group of pixel signals comprises a first pixel signal of the transitional reconstructed image block, the second group of pixel signals comprises a second pixel signal of the reference pixel area, and the first pixel signal is adjacent to the second pixel signal; solve for an offset of the first pixel signal such that the sum of squares of the second-order gradients of the at least one line, formed from the reconstructed signal with the offset added to the first pixel signal and from the second pixel signal, is minimized, wherein the offset is added to the pixel signals of the transitional reconstructed image block that are used to represent the second-order gradient, and the offset represents the predicted value of the DC component before quantization; and quantize the offset obtained by the solving to obtain the predicted value of the DC component.
17. The decoding apparatus according to claim 16, wherein the first group of pixel signals and the second group of pixel signals corresponding to each of the at least one line satisfy one of the following two formulas:
[equation image FDA0002188248670000061]
[equation image FDA0002188248670000062]
wherein λ1 is a threshold value for the second-order gradient of the reconstructed signal of the first group of pixel signals, λ2 is a threshold value for the reconstructed signal of the first group of pixel signals in the absence of a second-order gradient, [equation image FDA0002188248670000063] is the second-order gradient of the first group of pixel signals, [equation image FDA0002188248670000064] is the second-order gradient of the second group of pixel signals, i_k is the number of the at least one line, and j is the number of the pixel signal on each of the at least one line,
wherein the offset is calculated according to the following formula:
[equation image FDA0002188248670000071]
[equation image FDA0002188248670000072]
wherein δx is the offset, the second-order gradient of the reconstructed signal of the first pixel signal is denoted by [equation image FDA0002188248670000073], c (see [equation image FDA0002188248670000074]) represents the set of lines satisfying one of the two formulas, and q is the number of pixel signals on each of the at least one line.
18. The decoding device according to claim 15, wherein the prediction module is configured to: determine, in the direction specified by the prediction mode of the target image block, a plurality of pixel pairs, each pixel pair consisting of a first pixel signal and an adjacent second pixel signal, wherein the first pixel signal is a pixel signal of the transitional reconstructed image block, and the second pixel signal is a pixel signal of the reference pixel area; solve for an offset of the first pixel signal such that, after the offset is added to the first pixel signal, the sum of squares of the first-order gradients between the reconstructed signals and the second pixel signals of the plurality of pixel pairs is minimized, wherein the offset represents the predicted value of the DC component before quantization; and quantize the offset obtained by the solving to obtain the predicted value of the DC component.
19. The decoding apparatus according to claim 18, wherein the prediction value of the DC component is calculated according to the following formula:
[equation image FDA0002188248670000075]
wherein δx is the offset, n is the number of pixels in each row or column of the transitional reconstructed image block, [equation image FDA0002188248670000076] appears in the formula, the first pixel signal is denoted by [equation image FDA0002188248670000077], the second pixel signal by [equation image FDA0002188248670000078], and the pixel signals [equation image FDA0002188248670000079] and [equation image FDA00021882486700000710] are adjacent in the direction specified by the prediction mode.
20. The decoding device according to claim 15, wherein the decoding device is configured to acquire a first group of pixel signals located above the transitional reconstructed image block and a second group of pixel signals located to the left of the transitional reconstructed image block in the reference pixel area, together with a third group of pixel signals on the upper side inside the transitional reconstructed image block and a fourth group of pixel signals on the left side inside the transitional reconstructed image block, wherein the first group of pixel signals and the third group of pixel signals each comprise M rows of pixel signals, the second group of pixel signals and the fourth group of pixel signals each comprise H columns of pixel signals, and M and H are positive integers; calculate the difference between the average value of the first group of pixel signals and the average value of the third group of pixel signals to obtain a first difference value; calculate the difference between the average value of the second group of pixel signals and the average value of the fourth group of pixel signals to obtain a second difference value; and quantize the average value of the first difference value and the second difference value to obtain the predicted value of the DC component.
21. An encoding device, characterized by comprising:
the transformation and quantization module is used for transforming and quantizing the residual signal of the target image block to obtain a Direct Current (DC) component and an Alternating Current (AC) component of a transformation quantization coefficient of the target image block;
the inverse quantization and inverse transformation module is used for performing inverse quantization and inverse transformation on the AC component and a preset DC component to obtain a transitional residual, and adding the transitional residual and the predicted value of the target image block to obtain a transitional reconstructed image block of the target image block;
the prediction module is used for determining a prediction value of a DC component of a transformation quantization coefficient of the target image block according to the similarity between the pixels of the transitional reconstructed image block and the pixels in the reference pixel area of the target image block;
and the entropy coding module is used for determining a DC component residual of the target image block according to the predicted value of the DC component and the original DC component of the target image block.
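The encoder of claim 21 mirrors the decoder: it builds the same transitional reconstruction, runs the same DC prediction, and then signals only the DC residual, which is what saves bits when the prediction is good. A toy sketch, assuming `inverse_qt` and `predict_dc` from the decoder sketch above are in scope (same caveats apply):

```python
import numpy as np
from scipy.fft import dctn

def encode_block(original, predictor, above_row, left_col, qstep):
    # Transformation and quantization module: residual -> DC + AC.
    residual = original - predictor
    coeff = np.round(dctn(residual, norm='ortho') / qstep)
    dc_orig = coeff[0, 0]
    # Inverse Q/T module: transitional reconstruction with DC preset to 0.
    transitional = predictor + inverse_qt(coeff, dc=0.0, qstep=qstep)
    # Prediction module: the same boundary-similarity estimate the
    # decoder will compute, so both sides derive the same prediction.
    dc_pred = predict_dc(transitional, above_row, left_col, qstep)
    # Entropy coding module: emit the AC coefficients and only the
    # DC residual instead of the full DC component.
    return coeff, dc_orig - dc_pred
```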
22. The encoding device of claim 21, wherein the prediction module is configured to: determine at least one line and, for each line, a corresponding first group of pixel signals and an adjacent second group of pixel signals in the direction specified by the prediction mode of the target image block, wherein the first group of pixel signals comprises a first pixel signal of the transitional reconstructed image block, the second group of pixel signals comprises a second pixel signal of the reference pixel area, and the first pixel signal is adjacent to the second pixel signal; solve for an offset of the first pixel signal such that the sum of squares of the second-order gradients of the at least one line, formed from the reconstructed signal with the offset added to the first pixel signal and from the second pixel signal, is minimized, wherein the offset is added to the pixel signals of the transitional reconstructed image block that are used to represent the second-order gradient, and the offset represents the predicted value of the DC component before quantization; and quantize the offset obtained by the solving to obtain the predicted value of the DC component.
23. The encoding apparatus according to claim 22, wherein the first group of pixel signals and the second group of pixel signals corresponding to each of the at least one line satisfy one of the following two formulas:
[equation image FDA0002188248670000081]
[equation image FDA0002188248670000082]
wherein λ1 is a threshold value for the second-order gradient of the reconstructed signal of the first group of pixel signals, λ2 is a threshold value for the reconstructed signal of the first group of pixel signals in the absence of a second-order gradient, [equation image FDA0002188248670000083] is the second-order gradient of the first group of pixel signals, [equation image FDA0002188248670000084] is the second-order gradient of the second group of pixel signals, i_k is the number of the at least one line, and j is the number of the pixel signal on each of the at least one line,
wherein the offset is calculated according to the following formula:
[equation image FDA0002188248670000091]
[equation image FDA0002188248670000092]
wherein δx is the offset, the second-order gradient of the reconstructed signal of the first pixel signal is denoted by [equation image FDA0002188248670000093], c (see [equation image FDA0002188248670000094]) represents the set of lines satisfying one of the two formulas, and q is the number of pixel signals on each of the at least one line.
24. The encoding device according to claim 21, wherein the prediction module is configured to: determine, in the direction specified by the prediction mode of the target image block, a plurality of pixel pairs, each pixel pair consisting of a first pixel signal and an adjacent second pixel signal, wherein the first pixel signal is a pixel signal of the transitional reconstructed image block, and the second pixel signal is a pixel signal of the reference pixel area; solve for an offset of the first pixel signal such that, after the offset is added to the first pixel signal, the sum of squares of the first-order gradients between the reconstructed signals and the second pixel signals of the plurality of pixel pairs is minimized, wherein the offset represents the predicted value of the DC component before quantization; and quantize the offset obtained by the solving to obtain the predicted value of the DC component.
25. The encoding apparatus according to claim 24, wherein the prediction value of the DC component is calculated according to the following formula:
[equation image FDA0002188248670000095]
wherein δx is the offset, n is the number of pixels in each row or column of the transitional reconstructed image block, [equation image FDA0002188248670000096] appears in the formula, the first pixel signal is denoted by [equation image FDA0002188248670000097], the second pixel signal by [equation image FDA0002188248670000098], and the pixel signals [equation image FDA0002188248670000099] and [equation image FDA00021882486700000910] are adjacent in the direction specified by the prediction mode.
26. The encoding apparatus according to claim 22, wherein the prediction module is configured to acquire a first group of pixel signals located above the transitional reconstructed image block and a second group of pixel signals located to the left of the transitional reconstructed image block in the reference pixel area, together with a third group of pixel signals on the upper side inside the transitional reconstructed image block and a fourth group of pixel signals on the left side inside the transitional reconstructed image block, wherein the first group of pixel signals and the third group of pixel signals each comprise M rows of pixel signals, the second group of pixel signals and the fourth group of pixel signals each comprise H columns of pixel signals, and M and H are positive integers; calculate the difference between the average value of the first group of pixel signals and the average value of the third group of pixel signals to obtain a first difference value; calculate the difference between the average value of the second group of pixel signals and the average value of the fourth group of pixel signals to obtain a second difference value; and quantize the average value of the first difference value and the second difference value to obtain the predicted value of the DC component.
CN201610050028.5A 2016-01-25 2016-01-25 Decoding method, encoding method, decoding apparatus, and encoding apparatus Active CN106998470B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610050028.5A CN106998470B (en) 2016-01-25 2016-01-25 Decoding method, encoding method, decoding apparatus, and encoding apparatus
PCT/CN2017/071602 WO2017129023A1 (en) 2016-01-25 2017-01-18 Decoding method, encoding method, decoding apparatus, and encoding apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610050028.5A CN106998470B (en) 2016-01-25 2016-01-25 Decoding method, encoding method, decoding apparatus, and encoding apparatus

Publications (2)

Publication Number Publication Date
CN106998470A CN106998470A (en) 2017-08-01
CN106998470B true CN106998470B (en) 2020-03-20

Family

ID=59398967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610050028.5A Active CN106998470B (en) 2016-01-25 2016-01-25 Decoding method, encoding method, decoding apparatus, and encoding apparatus

Country Status (2)

Country Link
CN (1) CN106998470B (en)
WO (1) WO2017129023A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063091B (en) * 2018-07-26 2021-06-15 成都大学 Data migration method and device for hybrid coding and storage medium
CN113068028B (en) * 2018-08-09 2023-05-26 Oppo广东移动通信有限公司 Video image component prediction method, device and computer storage medium
US10796443B2 (en) * 2018-10-17 2020-10-06 Kneron, Inc. Image depth decoder and computing device
EP3890321A4 (en) 2018-12-15 2022-05-11 Huawei Technologies Co., Ltd. Image reconstruction method and device
JP7379501B2 (en) * 2019-01-16 2023-11-14 オッポ広東移動通信有限公司 Information processing methods, devices, equipment, storage media
CN110210283B (en) * 2019-04-09 2021-05-14 深圳市梦网视讯有限公司 Image processing method and system based on reloading application
CN112449191B (en) * 2019-08-27 2024-05-17 华为技术有限公司 Method for compressing multiple images, method and device for decompressing images
CN112913242B (en) * 2020-07-24 2023-02-28 深圳市大疆创新科技有限公司 Encoding method and encoding device
WO2022041206A1 (en) * 2020-08-31 2022-03-03 深圳市大疆创新科技有限公司 Image encoding method and apparatus, image decoding method and apparatus, and storage medium
CN113068037B (en) * 2021-03-17 2022-12-06 上海哔哩哔哩科技有限公司 Method, apparatus, device, and medium for sample adaptive compensation
CN114119789B (en) * 2022-01-27 2022-05-03 电子科技大学 Lightweight HEVC chrominance image quality enhancement method based on online learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1859574A (en) * 2005-07-18 2006-11-08 华为技术有限公司 Method for determining predicting mode in increased layer frame and coding and decoding device
WO2011088592A1 (en) * 2010-01-22 2011-07-28 Thomson Licensing Method and device for encoding an image block of an image and corresponding decoding method and device
CN102595122A (en) * 2011-01-14 2012-07-18 华为技术有限公司 Method and equipment for coding and decoding prediction mode, and network system
CN104333761A (en) * 2014-11-20 2015-02-04 富春通信股份有限公司 HEVC basic unit level code rate allocation method

Also Published As

Publication number Publication date
WO2017129023A1 (en) 2017-08-03
CN106998470A (en) 2017-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant