KR20130049603A - HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter - Google Patents

HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter

Info

Publication number
KR20130049603A
Authority
KR
South Korea
Prior art keywords
unit
block
prediction
parallel processing
data
Prior art date
Application number
KR1020110114713A
Other languages
Korean (ko)
Inventor
유제윤
김영조
변주원
김재석
Original Assignee
연세대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 연세대학교 산학협력단 filed Critical 연세대학교 산학협력단
Priority to KR1020110114713A priority Critical patent/KR20130049603A/en
Publication of KR20130049603A publication Critical patent/KR20130049603A/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/80Details of filtering operations specially adapted for video compression, e.g. for pixel interpolation

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

PURPOSE: A parallel processing algorithm for an HEVC (High Efficiency Video Coding) deblocking filter and a parallel processing method using the same are provided to eliminate data dependency by separating the data to be read from the data to be written. CONSTITUTION: An image encoding apparatus (100) encodes the difference between an original block and a predicted block. An intra-prediction unit (120) outputs the SATD for a prediction mode and stores it in a candidate intra-prediction mode list. A motion prediction unit (111) finds motion vectors in a reference image stored in a reference image buffer (190). A motion compensation unit (112) generates a prediction block by performing motion compensation using the motion vectors. A subtractor (125) generates a residual block from the difference between an input block and the generated prediction block. A transform unit (130) outputs transform coefficients by transforming the residual block. A quantization unit (140) outputs quantized coefficients by quantizing the transform coefficients based on a quantization parameter. An entropy coding unit (150) entropy-encodes the quantized coefficients according to their probability distribution and outputs the video bit stream. [Reference numerals] (111) Motion prediction unit; (112) Motion compensation unit; (120) Intra-prediction unit; (130) Transform unit; (140) Quantization unit; (150) Entropy coding unit; (160) Dequantization unit; (170) Inverse transform unit; (180) Filter unit; (190) Reference image buffer; (AA) Input image; (BB) Inter; (CC) Intra; (DD) Bit stream

Description

HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter

The present invention relates to a technique for improving the performance and speed of the deblocking filter of a video codec.

The present invention relates to a deblocking filter scheme of a video codec. This method can be applied to codecs that require faster speeds in processing high quality images.

Recently, demand for high-resolution, high-quality images such as HD (High Definition) and UHD (Ultra High Definition) images has been increasing in various applications. As video data becomes higher in resolution and quality, the amount of data increases relative to existing video data, so transmitting the data over a medium such as a conventional wired/wireless broadband line or storing it on existing storage media raises transmission and storage costs. High-efficiency image compression techniques can be used to solve these problems that accompany high-resolution, high-quality image data.

Image compression techniques include an inter-picture prediction technique that predicts pixel values included in the current picture from a picture before or after the current picture, an intra-picture prediction technique that predicts pixel values included in the current picture using pixel information within the current picture, and an entropy encoding technique that assigns short codes to values with a high frequency of appearance and long codes to values with a low frequency of appearance. Using these techniques, image data can be effectively compressed and then transmitted or stored.

In order to solve the problems of the prior art described above, the present invention provides a parallel processing method that achieves a faster execution time by removing data dependency.

If horizontal boundary filtering is performed only after waiting for the result of vertical boundary filtering, parallel processing is not possible. Accordingly, an object of the present invention is to separate the data to be read from the data to be written so that parallel processing becomes possible by eliminating the data dependency.

The present invention also aims to improve image quality by better reflecting the characteristics of the boundary, using the pixels before filtering instead of the filtered pixels.

According to one embodiment of the present invention, a method is provided that enables parallel processing in an HEVC deblocking filter.

Using the present invention yields a modest performance gain in the filtering process of the deblocking filter, and it introduces the scalability of parallel processing into a structure in which parallel processing was previously not possible, so it can be exploited in various ways.

According to the configuration of the present invention, the scalability for parallel processing can be used in various ways.

1 is a block diagram illustrating an image encoding apparatus according to an embodiment of the present invention.
2 is a block diagram illustrating a configuration of an image decoding apparatus according to another embodiment of the present invention.
3 is a diagram for explaining data dependency in an 8x8 block.
4 is a diagram for explaining data dependency among 8x8 blocks.
5 is a view for explaining the operation of the deblocking filter.
6 is a diagram for describing an operation sequence of neighboring inter-block filtering.
7 is a flowchart illustrating an algorithm of the proposed deblocking filter.
8 is a view for explaining the operation sequence of the proposed algorithm.
9 is a diagram for describing an operation of simultaneously processing two blocks.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear.

It is to be understood that when an element is referred to as being "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, describing the present invention as "including" a specific configuration does not exclude other configurations; it means that additional configurations can be included in the practice of the present invention or within its technical scope.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another. For example, without departing from the scope of the present invention, the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions, which does not mean that each component consists of a separate hardware or software unit. That is, each component is listed separately for convenience of explanation; at least two of the components may be combined into one component, or one component may be divided into a plurality of components, each performing part of the function. Such integrated and separated embodiments of the components are also included within the scope of the present invention as long as they do not depart from its essence.

In addition, some components may not be essential components that perform the essential functions of the present invention, but optional components intended only to improve performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used merely for performance improvement, and a structure that includes only these essential components, excluding the optional ones used for performance improvement, is also included in the scope of the present invention.

1 is a block diagram illustrating an image encoding apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the image encoding apparatus 100 may include a motion prediction unit 111, a motion compensation unit 112, an intra prediction unit 120, a switch 115, a subtractor 125, a transform unit 130, a quantization unit 140, an entropy encoding unit 150, an inverse quantization unit 160, an inverse transform unit 170, an adder 175, a filter unit 180, and a reference image buffer 190.

The image encoding apparatus 100 performs encoding on an input image in intra mode or inter mode and outputs a bit stream. In the embodiments of the present invention, intra prediction is used with the same meaning as intra-picture prediction, and inter prediction with the same meaning as inter-picture prediction. To determine the optimal prediction method for a prediction unit, the intra prediction method and the inter prediction method may be used selectively for that prediction unit. The image encoding apparatus 100 generates a prediction block for the original block of the input image and then encodes the difference between the original block and the prediction block.

In the intra-picture prediction mode, the intra prediction unit 120 (the term intra-picture prediction unit may also be used with the same meaning) performs spatial prediction using the pixel values of already encoded blocks around the current block and generates a prediction block.

The intra prediction unit 120 calculates SATD values for the intra prediction modes of a given prediction unit, stores the first (smallest) SATD value, the second SATD value, and the intra prediction mode corresponding to the first SATD value in a candidate intra prediction mode list, and determines whether the difference between the first and second SATD values is less than or equal to a predetermined threshold when generating the prediction block of that prediction unit. When the difference between the first SATD value and the second SATD value is larger than the predetermined threshold, the intra prediction mode corresponding to the first SATD value is determined as the final intra prediction mode of the current prediction unit. When the difference is less than or equal to the threshold, additional intra prediction modes are included in the candidate intra prediction mode list, and the final intra prediction mode is determined based on the RD cost calculated for the intra prediction modes stored in the candidate list.
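As a minimal sketch of this decision flow (the interfaces, the threshold value, and the restriction to two candidate modes are illustrative assumptions; the patent additionally enlarges the candidate list with further intra prediction modes before the RD comparison), the selection could look as follows:

/* Sketch of the SATD-based intra mode decision described above.
 * All names, the threshold, and the two-candidate simplification are
 * illustrative assumptions, not taken from the patent. */
#include <stdio.h>

static int decide_intra_mode(const int satd[], const double rd_cost[],
                             int num_modes, int threshold)
{
    int best = 0, second = -1;

    /* keep the indices of the smallest and second-smallest SATD values */
    for (int m = 1; m < num_modes; m++) {
        if (satd[m] < satd[best])                      { second = best; best = m; }
        else if (second < 0 || satd[m] < satd[second]) { second = m; }
    }

    /* clearly better best mode: decide by SATD alone and skip the RD search */
    if (second < 0 || satd[second] - satd[best] > threshold)
        return best;

    /* SATD values are close: fall back to the RD cost of the candidates
     * (the patent additionally adds more modes to the candidate list here) */
    return (rd_cost[best] <= rd_cost[second]) ? best : second;
}

int main(void)
{
    int    satd[4]    = { 120, 118, 200, 250 };
    double rd_cost[4] = { 10.5, 10.1, 30.0, 40.0 };
    printf("chosen mode: %d\n", decide_intra_mode(satd, rd_cost, 4, 16));
    return 0;
}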

In addition, the intra prediction unit 120 determines whether the SATD value for a given coding unit exceeds a threshold SATD value: if the SATD value is smaller than the threshold SATD value, the coding unit is not split further, and if it is greater than or equal to the threshold SATD value, the coding unit may be split further.

The operation of the intra prediction unit 120 will be described in detail with reference to the following embodiments of the present invention and FIGS. 3 to 9.

In the case of the inter-picture prediction mode, the motion prediction unit 111 finds a motion vector by searching an area of the reference picture stored in the reference picture buffer 190, which is best matched with the input block, in the motion prediction process. The motion compensation unit 112 generates a prediction block by performing motion compensation using a motion vector.

The subtracter 125 generates a residual block by a difference between the input block and the generated prediction block. The transforming unit 130 performs a transform on the residual block to output a transform coefficient. The quantization unit 140 quantizes the input transform coefficient according to the quantization parameter and outputs a quantized coefficient. The entropy encoding unit 150 entropy-codes the input quantized coefficients according to a probability distribution to output a bit stream.

Since HEVC performs inter-prediction coding, that is, prediction between pictures, the currently encoded image must be decoded and stored so that it can be used as a reference image. Accordingly, the quantized coefficients are inversely quantized in the inverse quantization unit 160 and inversely transformed in the inverse transform unit 170. The inversely quantized and inversely transformed coefficients are added to the prediction block through the adder 175, and a reconstructed block is generated.

The reconstructed block passes through the filter unit 180, which may apply at least one of a deblocking filter, a sample adaptive offset (SAO), and an adaptive loop filter (ALF) to the reconstructed block or reconstructed picture. The filter unit 180 may also be referred to as an adaptive in-loop filter. The deblocking filter removes block distortion occurring at the boundaries between blocks. The SAO adds an appropriate offset to the pixel values to compensate for coding errors. The ALF performs filtering based on a comparison between the reconstructed image and the original image, and may be applied only when high efficiency is required. The reconstructed block that has passed through the filter unit 180 is stored in the reference image buffer 190.

2 is a block diagram illustrating a configuration of an image decoding apparatus according to another embodiment of the present invention.

2, the image decoding apparatus 200 includes an entropy decoding unit 210, an inverse quantization unit 220, an inverse transform unit 230, an intra prediction unit 240, a motion compensation unit 250, an adder 255, a filter unit 260, and a reference image buffer 270.

The image decoding apparatus 200 receives the bit stream output from the encoder, performs decoding in intra mode or inter mode, and outputs a reconstructed image. In intra mode, a prediction block is generated using an intra prediction mode; in inter mode, a prediction block is generated using the inter prediction method. The image decoding apparatus 200 obtains a residual block from the input bit stream, generates a prediction block, and adds the residual block and the prediction block to generate a reconstructed block.

The entropy decoding unit 210 entropy-decodes the input bitstream according to a probability distribution and outputs a quantized coefficient. The quantized coefficients are inversely quantized in the inverse quantization unit 220 and inversely transformed in the inverse transformation unit 230. As a result of inverse quantization / inverse transformation of the quantized coefficients, a residual block is generated.

In the intra-picture prediction mode, the intra prediction unit 240 (or intra-picture prediction unit) performs spatial prediction using the pixel values of already decoded blocks around the current block to generate a prediction block.

In the inter-picture prediction mode, the motion compensation unit 250 generates a prediction block by performing motion compensation using a motion vector and a reference image stored in the reference image buffer 270.

The residual block and the prediction block are added through the adder 255, and the result passes through the filter unit 260. The filter unit 260 may apply at least one of the deblocking filter, SAO, and ALF to the reconstructed block or reconstructed picture, and outputs the reconstructed image. The reconstructed picture may be stored in the reference picture buffer 270 and used for inter prediction.

Methods for improving the prediction performance of the encoding/decoding apparatus include increasing the accuracy of the interpolated image and predicting the difference signal. Here, the difference signal is a signal representing the difference between the original image and the predicted image. In the present invention, the term "difference signal" may be replaced by "residual signal", "residual block", or "difference block" depending on the context; those skilled in the art will be able to distinguish these within a scope that does not affect the idea of the invention.

As described above, in the embodiments of the present invention a coding unit is used as the unit of encoding for convenience of explanation, but it may also be a unit on which decoding is performed. The methods described below with reference to FIGS. 3 to 9 according to embodiments of the present invention can be implemented according to the functions of the modules described above with reference to FIGS. 1 and 2, and such encoders and decoders are included in the scope of the present invention.

In addition, the image encoding method and the image decoding method described later in the embodiments of the present invention may be performed by each component included in the image encoder and the image decoder described above with reference to FIGS. 1 and 2. A component here includes not only a hardware element but also a software processing unit that can be executed through an algorithm.

Parallelization of the deblocking filter of HEVC, which is currently under standardization, is an open issue. The current deblocking filter, proposed by Panasonic, performs vertical boundary filtering first and then horizontal boundary filtering. Because a data dependency exists in the portion shown in FIG. 3 (horizontal boundary filtering must wait for the result of the vertical boundary filtering), the filter cannot be parallelized. That is, Panasonic's deblocking filter has no parallelism because of this data dependency problem.

The configuration of the present invention replaces all inputs with the pixels before filtering (unfiltered pixels).

The parallel deblocking filter algorithm proposed by the present invention is as follows.

1. Data dependency in 8x8 blocks

The reason there is no parallelism is the data dependency that exists between the processes. Referring to FIG. 4, the filtered value at the horizontal boundary of the previous 8x8 block must be written before the vertical boundary filtering of the next 8x8 block can be performed, so a data dependency exists between these two boundaries. The deblocking filters of Panasonic, Tandberg, Nokia, and Ericsson have data dependencies between the vertical and horizontal boundaries even within a single 8x8 block. Data dependency is an obstacle to parallelism because it prevents simultaneous filtering. As shown in FIG. 4, the #n 8x8 block and the #n+1 8x8 block cannot be processed simultaneously.
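As a toy illustration of this kind of dependency (the 8x8 pattern and the 3-tap smoothing operations below are illustrative stand-ins, not the actual HEVC deblocking filters), the horizontal pass reads a sample that the vertical pass has just written, so the two passes can neither be swapped nor run at the same time when they filter in place:

/* Toy illustration of the read-after-write dependency: both passes filter
 * the same buffer in place, and the horizontal pass reads img[4][x], which
 * the vertical pass has just overwritten at x == 4. The 3-tap filters are
 * illustrative only, not the HEVC deblocking filters. */
#include <stdio.h>

#define W 8
#define H 8

static void vertical_pass(int img[H][W])      /* filters across the vertical edge at x = 3|4 */
{
    for (int y = 0; y < H; y++)
        img[y][4] = (img[y][3] + 2 * img[y][4] + img[y][5] + 2) >> 2;
}

static void horizontal_pass(int img[H][W])    /* filters across the horizontal edge at y = 3|4 */
{
    for (int x = 0; x < W; x++)
        img[4][x] = (img[3][x] + 2 * img[4][x] + img[5][x] + 2) >> 2;
}

int main(void)
{
    int img[H][W];
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            img[y][x] = (x < 4) ? 10 : 90;    /* blocky pattern with a vertical edge */

    vertical_pass(img);      /* must finish first ...                          */
    horizontal_pass(img);    /* ... because this reads img[4][4] written above */
    printf("img[4][4] = %d\n", img[4][4]);
    return 0;
}

Swapping the two calls changes the value stored at img[4][4]; this ordering constraint is exactly what blocks parallel execution.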

2. Reuse of unfiltered pixels

Looking at Panasonic's deblocking filter algorithm, the unfiltered pixels are already used in the decision processes, which enables parallelism there by eliminating data dependency; if the same pixels are used in the filtering process, the data dependency in the filtering process also disappears. A data dependency occurs when data that must be read has not yet been written, because reading the old data instead of the new data may not produce the desired result. Panasonic's deblocking filter uses the pixels before filtering in the decision processes (ON/OFF, type), but in the filtering process the filtered result of the previous block must first be written and is then taken as a new input before the next process is performed. A data dependency therefore exists, which makes parallel processing impossible. To solve this problem, the pixels before filtering are used in the filtering process as well as in the decision process.

FIG. 5 shows the data processing sequence when the pre-filtering pixels are used in the filtering process. In FIG. 5, (1) reads data to make the ON/OFF decision, and (2) reads data to determine the type. The improvements over the Panasonic algorithm are (3) and (4) in FIG. 5; both processes use the pixels before filtering. The pixels before filtering are stored in the line buffer, and the filtering reads these stored pixels. After filtering, the data is written to the reference frame. If the writes of (3) and (4) are offset by one clock, the data is written back without collision. In the previous algorithm, performing (4) with the result of (3) introduces a delay as long as the execution time of the filtering; if the pixels before filtering are used instead, the delay is reduced to a one-clock difference.
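Continuing the toy example above (still purely illustrative, not the actual HEVC filters), the dependency disappears once both passes read only a saved copy of the unfiltered samples, as the proposed algorithm does with its line buffer; the only remaining interaction is the write to the sample where the two boundaries cross, which the patent resolves by staggering the write timing:

/* Same toy block as before, but both passes now read only the saved
 * unfiltered copy 'src' (modeling the line buffer) and write into 'dst'.
 * Neither pass waits for the other's result, so the two computations can
 * proceed concurrently; only the crossing sample dst[4][4] is written by
 * both passes, which is resolved by the one-clock write offset described
 * above (the later write simply overwrites). */
#include <stdio.h>
#include <string.h>

#define W 8
#define H 8

static void vertical_pass(const int src[H][W], int dst[H][W])
{
    for (int y = 0; y < H; y++)
        dst[y][4] = (src[y][3] + 2 * src[y][4] + src[y][5] + 2) >> 2;
}

static void horizontal_pass(const int src[H][W], int dst[H][W])
{
    for (int x = 0; x < W; x++)
        dst[4][x] = (src[3][x] + 2 * src[4][x] + src[5][x] + 2) >> 2;
}

int main(void)
{
    int src[H][W], dst[H][W];
    for (int y = 0; y < H; y++)
        for (int x = 0; x < W; x++)
            src[y][x] = (x < 4) ? 10 : 90;

    memcpy(dst, src, sizeof dst);     /* start from the unfiltered picture  */
    vertical_pass(src, dst);          /* the two computations are independent; */
    horizontal_pass(src, dst);        /* only the write order decides dst[4][4] */
    printf("dst[4][4] = %d\n", dst[4][4]);
    return 0;
}

Because neither pass reads anything the other writes, the two loops could be issued in either order or concurrently; only the single crossing sample is written twice, and the later write overwrites it, mirroring the one-clock offset of the writes in FIG. 5.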

In addition, the data dependency between neighboring blocks also disappears. As shown in FIG. 6, if only the timing at which the data is written is adjusted to differ, the #n block and the #n+1 block can be executed simultaneously. In FIG. 6, the pixels shown in dark color are given new values computed from 4 to 6 neighboring pixels, and these values are written back into the memory to complete the filtering. If the write timings do not overlap, operations (1) and (2) of FIG. 6 can both be carried through to the memory write after filtering with only a one-clock time difference.

3. New Deblocking Filter

When the pixels before filtering are used in both the decision process and the filtering process, the data dependency between blocks is removed and parallel processing becomes possible. FIG. 7 shows the algorithm of the proposed deblocking filter.

The new deblocking filter proceeds as shown in FIG. 7. The Panasonic algorithm handles both boundaries at the same time in the ON/OFF decision and the type decision, but in the filtering process the vertical boundary and the horizontal boundary must be processed sequentially because of the data dependency problem described above. The proposed algorithm, however, can filter the vertical and horizontal boundaries simultaneously because the data dependency is removed by using the pixels before filtering, as described above.

FIG. 8 shows the areas used and modified when each step is processed in one block. In FIG. 8, step (1) makes the ON/OFF decision using the pixels of the 3rd and 6th lines, and step (2) determines the filtering type of each line using the pixels of all lines. Once the filtering type of each line is determined, the process of FIG. 8 (3) starts and the process of (4) begins immediately afterwards. As explained in "2. Reuse of unfiltered pixels", the use of the pre-filtering pixels eliminates the data dependency, so each process can proceed without waiting for the results of the previous one.

FIG. 9 illustrates the operation of processing two blocks simultaneously. In the first step of FIG. 9, the ON/OFF decision for the vertical and horizontal boundaries of the #n block and its neighboring block is performed. In the second step, the #n block and the neighboring block determine the filtering type at the same time, while the #n+2 block starts the ON/OFF decision step. In the final step, the #n block performs the filtering and the #n+2 block performs the type decision. Since there is no dependency between the data, deblocking filtering can proceed continuously as shown in FIG. 9.
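A simplified software analogue of this pipeline is sketched below (the stage bodies, the block count, and the one-block-per-stage granularity are illustrative assumptions; FIG. 9 actually advances blocks in pairs):

/* Simplified three-stage pipeline modeled on FIG. 9: in every cycle one
 * block enters the ON/OFF decision while older blocks move through the
 * type decision and the filtering stage. The stage bodies are placeholder
 * prints; only the scheduling is of interest. */
#include <stdio.h>

static void on_off_decision(int blk) { printf("  ON/OFF decision  block %d\n", blk); }
static void type_decision(int blk)   { printf("  type decision    block %d\n", blk); }
static void filtering(int blk)       { printf("  filtering        block %d\n", blk); }

int main(void)
{
    const int num_blocks = 4;

    for (int cycle = 0; cycle < num_blocks + 2; cycle++) {
        int d = cycle, t = cycle - 1, f = cycle - 2;     /* blocks in each stage */
        printf("cycle %d:\n", cycle);
        if (d < num_blocks)           on_off_decision(d);
        if (t >= 0 && t < num_blocks) type_decision(t);
        if (f >= 0 && f < num_blocks) filtering(f);
    }
    return 0;
}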

Unlike Panasonic's algorithm, the biggest feature of the proposed algorithm is that it uses the pixels before filtering at every stage. Table 1 shows the difference between the pixels used in the conventional algorithm and the pixels used in the proposed algorithm.

Table 1 Input Comparison of Algorithms

[Table 1 image: comparison of the algorithm inputs; not reproduced in this text]

As shown in Table 1, the proposed algorithm differs from the reference Panasonic algorithm in that it uses the pre-filtering pixels in the filtering step as well. The equations used in each step are as follows.

[Figure: 8x8 block boundary and pixel labeling; image not reproduced]

[Equation 1: image not reproduced]

[Equation 2: image not reproduced]
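Because the images for Equations 1 and 2 survive only as placeholders, the following gives, for orientation, the form this ON/OFF decision takes in the HEVC draft (an assumed correspondence, not a reproduction of the patent's equations). The decision is evaluated on two sample lines i and j of the edge segment (the 3rd and 6th lines mentioned in connection with FIG. 8), where p_{k,i} and q_{k,i} denote the k-th samples on either side of the boundary in line i, and \beta is a threshold derived from the quantization parameter:

d_p = |p_{2,i} - 2p_{1,i} + p_{0,i}| + |p_{2,j} - 2p_{1,j} + p_{0,j}|
d_q = |q_{2,i} - 2q_{1,i} + q_{0,i}| + |q_{2,j} - 2q_{1,j} + q_{0,j}|
d = d_p + d_q < \beta

Filtering is switched ON for the edge segment when the last inequality holds.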

Equations 1 and 2 are ON / OFF determination steps. If Equation 2 is satisfied, the process proceeds to the next step.

[Equations 3 to 5: images not reproduced]
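Similarly, for Equations 3 to 5, the strong/weak decision of the HEVC draft checks, for each of the two sample lines i, three conditions of the following form (again an assumed correspondence; t_C is the clipping threshold derived from the quantization parameter), and the strong filter is chosen only when all of them hold on both lines:

2(d_{p,i} + d_{q,i}) < \beta / 4
|p_{3,i} - p_{0,i}| + |q_{0,i} - q_{3,i}| < \beta / 8
|p_{0,i} - q_{0,i}| < (5\,t_C + 1) / 2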

If all of Equations 3 to 5 are satisfied, the strong filter is performed; otherwise, the weak filter is performed. When the strong filter is selected in the type determination step, filtering is performed using Equations 6 to 11 below.

[Equations 6 to 11: images not reproduced]
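The images for Equations 6 to 11 likewise survive only as a placeholder. For reference, the strong-filter computation of the HEVC draft has the form sketched below (an assumed correspondence; the clip3 helper, the array interface, and the small test in main are illustrative, and the patent's own equations may differ in detail):

/* Strong deblocking filter along one line of samples, in the form used by
 * the HEVC draft: p[0..3] are the four samples on one side of the edge
 * (p[0] closest to it), q[0..3] the four on the other side, and tc is the
 * clipping threshold. Three samples on each side are replaced, each result
 * clipped to +/- 2*tc around its original value. */
#include <stdio.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }

static void strong_filter_line(int p[4], int q[4], int tc)
{
    int p0 = p[0], p1 = p[1], p2 = p[2], p3 = p[3];
    int q0 = q[0], q1 = q[1], q2 = q[2], q3 = q[3];

    p[0] = clip3(p0 - 2 * tc, p0 + 2 * tc, (p2 + 2 * p1 + 2 * p0 + 2 * q0 + q1 + 4) >> 3);
    p[1] = clip3(p1 - 2 * tc, p1 + 2 * tc, (p2 + p1 + p0 + q0 + 2) >> 2);
    p[2] = clip3(p2 - 2 * tc, p2 + 2 * tc, (2 * p3 + 3 * p2 + p1 + p0 + q0 + 4) >> 3);
    q[0] = clip3(q0 - 2 * tc, q0 + 2 * tc, (p1 + 2 * p0 + 2 * q0 + 2 * q1 + q2 + 4) >> 3);
    q[1] = clip3(q1 - 2 * tc, q1 + 2 * tc, (p0 + q0 + q1 + q2 + 2) >> 2);
    q[2] = clip3(q2 - 2 * tc, q2 + 2 * tc, (p0 + q0 + q1 + 3 * q2 + 2 * q3 + 4) >> 3);
}

int main(void)
{
    int p[4] = { 60, 62, 63, 64 }, q[4] = { 30, 29, 28, 27 };
    strong_filter_line(p, q, 4);
    printf("p0'=%d p1'=%d p2'=%d q0'=%d q1'=%d q2'=%d\n",
           p[0], p[1], p[2], q[0], q[1], q[2]);
    return 0;
}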

Equations 6 to 11 represent the operation of the strong filter. After each calculation and clipping, the pixels on both sides of the boundary are replaced. When the weak filter is selected in the type determination, filtering is performed using Equations 12 to 16 below.

[Equations 12 to 16: images not reproduced]
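For Equations 12 to 16, the following sketch shows the weak (normal) filter in the form used by the HEVC draft (again an assumed correspondence; in the standard the p1/q1 corrections are applied only when additional per-side conditions hold, and '>>' on negative values assumes an arithmetic right shift as in the HM reference software):

/* Weak (normal) deblocking filter along one line, in the form used by the
 * HEVC draft: a clipped delta is applied to p0 and q0, and a smaller,
 * separately clipped correction to p1 and q1. 8-bit samples are assumed
 * for the final clipping. */
#include <stdio.h>

static int clip3(int lo, int hi, int v) { return v < lo ? lo : (v > hi ? hi : v); }
static int clip_pixel(int v)            { return clip3(0, 255, v); }

static void weak_filter_line(int p[3], int q[3], int tc)
{
    int p0 = p[0], p1 = p[1], p2 = p[2];
    int q0 = q[0], q1 = q[1], q2 = q[2];

    int delta = (9 * (q0 - p0) - 3 * (q1 - p1) + 8) >> 4;
    if (delta >= 10 * tc || delta <= -10 * tc)
        return;                        /* the step across the edge is natural: no filtering */

    delta = clip3(-tc, tc, delta);
    p[0] = clip_pixel(p0 + delta);
    q[0] = clip_pixel(q0 - delta);

    /* second correction, one sample further from the edge */
    int dp = clip3(-(tc >> 1), tc >> 1, (((p2 + p0 + 1) >> 1) - p1 + delta) >> 1);
    int dq = clip3(-(tc >> 1), tc >> 1, (((q2 + q0 + 1) >> 1) - q1 - delta) >> 1);
    p[1] = clip_pixel(p1 + dp);
    q[1] = clip_pixel(q1 + dq);
}

int main(void)
{
    int p[3] = { 50, 51, 52 }, q[3] = { 60, 61, 62 };
    weak_filter_line(p, q, 4);
    printf("p1'=%d p0'=%d q0'=%d q1'=%d\n", p[1], p[0], q[0], q[1]);
    return 0;
}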

Equations 12 to 16 express the operation of the weak filter, which replaces up to two pixels on each side of the boundary.

Claims (1)

A method for enabling parallel processing in an HEVC deblocking filter.
KR1020110114713A 2011-11-04 2011-11-04 HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter KR20130049603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020110114713A KR20130049603A (en) 2011-11-04 2011-11-04 HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020110114713A KR20130049603A (en) 2011-11-04 2011-11-04 HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter

Publications (1)

Publication Number Publication Date
KR20130049603A true KR20130049603A (en) 2013-05-14

Family

ID=48660299

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020110114713A KR20130049603A (en) 2011-11-04 2011-11-04 HEVC Deblocking Filter Parallel Processing Algorithm and Method for Enabling Parallel Processing in an HEVC Deblocking Filter

Country Status (1)

Country Link
KR (1) KR20130049603A (en)


Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination