KR20150069584A

KR20150069584A - Method and apparatus of parallel deblocking filtering to minimize latency

Info

Publication number: KR20150069584A
Application number: KR1020130155158A
Authority: KR
Inventors: 조현호; 유종훈; 심동규
Original assignee: 광운대학교 산학협력단
Priority date: 2013-12-13
Filing date: 2013-12-13
Publication date: 2015-06-24

Abstract

The present invention relates to a parallel deblocking filtering method and apparatus that minimize delay by changing the order in which CTUs are filtered, in consideration of the dependency between divided regions, when parallelizing the deblocking filter, which is one of the in-loop filters of HEVC.

Description

Parallel deblocking filtering method and apparatus for delay minimization

The present invention relates to image processing techniques, and more particularly, to a method and apparatus for minimizing delay in the parallel deblocking filtering process of an HEVC video encoder/decoder.

Recently, as demand for high-resolution, high-definition video has increased, a highly efficient video compression technology has been needed for next-generation video services. In response to this market demand, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) formed the Joint Collaborative Team on Video Coding (JCT-VC) and in 2010 began developing a next-generation video standard called HEVC (High Efficiency Video Coding). Development of the HEVC standard was completed in January 2013, and HEVC achieves a compression efficiency improvement of about 50% over the H.264/AVC High profile, which was previously known to have the highest compression efficiency.

By adopting techniques such as a quad-tree partition structure, 35 intra prediction modes, advanced motion vector prediction (AMVP), a deblocking filter, and sample adaptive offset (SAO), HEVC improves coding efficiency over existing video codecs, but the complexity of the encoder and decoder also increases. Among these techniques, the deblocking filter accounts for a relatively large share of the computational complexity in the decoder.

An object of the present invention is to provide a method and an apparatus that, when data-level parallelism is applied to the deblocking filter, which is one of the HEVC in-loop filters, combine horizontal filtering for vertical edges and vertical filtering for horizontal edges into one step and thereby minimize the delay caused by synchronization.

To solve the above problem, a parallel deblocking filtering method and apparatus for delay minimization according to a first embodiment of the present invention divide a picture or slice into regions in units of CTU (Coding Tree Unit) rows and use data-level parallelism to allocate each region to a thread or core.

When a picture or slice is divided into several regions for data-level parallelization, an equal number of CTU rows is allocated to each thread or core.

When deblocking filtering of HEVC is performed, horizontal filtering for vertical edges and vertical filtering for horizontal edges are processed in one step without dividing them into two steps, thereby minimizing the explicit synchronization process between the two steps.

In the deblocking filtering, the filtering is performed in units of CTU considering the scanning order that minimizes the dependency between adjacent divided areas.

When vertical filtering is performed on the top CTU row of each region, the data dependency between regions is resolved by checking whether horizontal filtering has already been performed on the bottom CTU row of the region directly above the current region.

In the present invention, when deblocking filtering is performed on a picture or slice basis, an equal number of CTU rows is assigned to each core or thread, which resolves the workload imbalance that arises in data-level parallelization. In addition, horizontal filtering for vertical edges and vertical filtering for horizontal edges are processed as one step, which minimizes the explicit synchronization between the two stages and thereby the delay in data parallelization.

FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.
FIG. 2A is an exemplary diagram illustrating horizontal deblocking filtering performed at a vertical edge boundary.
FIG. 2B is an exemplary diagram illustrating vertical deblocking filtering performed at a horizontal edge boundary.
FIG. 3A is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each region performing horizontal filtering on vertical edges in parallel.
FIG. 3B is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each region performing vertical filtering on horizontal edges in parallel.
FIG. 4 is an example of the data dependency between regions that arises when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed as two stages in parallel deblocking filtering.
FIG. 5A illustrates the CTU order of horizontal deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
FIG. 5B illustrates the CTU order of vertical deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions; this does not mean that each component consists of separate hardware or of a single software unit. That is, the components are listed separately for convenience of explanation, and at least two of them may be combined into one component, or one component may be divided into a plurality of components that each perform part of its function. Such integrated and separated embodiments are also included within the scope of the present invention, as long as they do not depart from its essence.

In addition, some of the components may not be essential to the functions of the present invention but may be optional components that merely improve performance. The present invention can be implemented with only the components essential to realizing its essence, excluding those used for performance improvement, and a structure that includes only the essential components is also within the scope of the present invention.

FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.

Referring to FIG. 1, the video decoding apparatus 100 includes an entropy decoding unit 110, an inverse quantization unit 120, an inverse transform unit 130, an inter picture prediction unit 140, an intra prediction unit 150, a summing unit 155, a deblocking filter unit 160, an SAO performing unit 170, and a reference image buffer 180.

The entropy decoding unit 110 decodes the input bitstream and outputs syntax elements and quantized coefficients. Whether intra-picture or inter-picture prediction is used is determined from the transmitted syntax elements, and the transmitted quantized coefficients are converted into a residual signal by the inverse quantization unit 120 and the inverse transform unit 130.

The prediction signal is generated by either intra prediction 150 or inter-picture prediction 140. In the case of intra prediction 150, a prediction block is generated by spatial prediction from the pixel values of previously coded blocks adjacent to the current block. In the case of inter-picture prediction 140, a prediction block is generated by motion compensation using a motion vector transmitted from the encoder and a reconstructed image stored in the reference image buffer 180.

The prediction signal generated by inter-picture prediction 140 or intra prediction 150 is added to the residual signal by the summing unit 155 and then passes through the deblocking filter unit 160 and the SAO performing unit 170. The reconstructed picture on which deblocking filtering and SAO have been performed is stored in the reference image buffer 180 and can be used as a reference picture by the inter picture prediction unit 140.

FIG. 2A is an exemplary diagram illustrating horizontal deblocking filtering performed at a vertical edge boundary.

Referring to FIG. 2A, HEVC deblocking filtering is applied to all PU (prediction unit) and TU (transform unit) boundaries that lie on an 8x8 grid in the CTU. For a vertical edge boundary, horizontal filtering modifies up to 3 pixels from the edge.

When the HEVC performs horizontal filtering (215) on the vertical edge boundaries in the CTU, no data dependency occurs between the respective edge boundaries.
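The independence of neighboring edges can be checked with a small sketch. It assumes that "up to 3 pixels" applies on each side of an edge and that the CTU is 64 pixels wide; the names `GRID`, `MAX_MODIFIED`, `modified_columns`, and `edges_are_independent` are illustrative and do not come from the patent.

```python
GRID = 8          # spacing of candidate edges in pixels (8x8 grid, from the text)
MAX_MODIFIED = 3  # pixels changed per side of an edge (assumed to be per side)


def modified_columns(edge_x, reach=MAX_MODIFIED):
    """Pixel columns that horizontal filtering of the edge at edge_x may change."""
    return set(range(edge_x - reach, edge_x + reach))


def edges_are_independent(ctu_size=64, reach=MAX_MODIFIED):
    """True if no two neighboring vertical edges in a CTU touch the same column.

    ctu_size=64 is an assumption; the argument only relies on the 8-pixel grid.
    """
    edges = range(0, ctu_size, GRID)
    touched = [modified_columns(x, reach) for x in edges]
    # Equal-width intervals in increasing order: checking neighbors is enough.
    return all(a.isdisjoint(b) for a, b in zip(touched, touched[1:]))
```

With a reach of 3 the touched column ranges of neighboring edges never overlap, which is why the edges can be filtered in any order or in parallel. A hypothetical reach of 5 would make them overlap.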

FIG. 2B is an exemplary diagram illustrating vertical deblocking filtering performed at a horizontal edge boundary.

Referring to FIG. 2B, HEVC deblocking filtering is applied to all PU (prediction unit) and TU (transform unit) boundaries that lie on the 8x8 grid in the CTU. For a horizontal edge boundary, vertical filtering modifies up to 3 pixels from the edge.

When the HEVC performs vertical filtering (225) on the horizontal edge boundaries in the CTU, there is no data dependency between the respective edge boundaries.

FIG. 3A is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each region performing horizontal filtering on vertical edges in parallel.

Referring to FIG. 3A, the picture 300 or slice is evenly divided in units of rows of the CTU in accordance with the number of cores or threads to perform parallelization, and the divided areas are allocated to each core or thread.
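The row-based division can be sketched as follows. The patent only describes the case where the rows divide evenly; giving the leftover rows to the first regions is an assumption made here, and `partition_ctu_rows` is a hypothetical helper name.

```python
def partition_ctu_rows(num_ctu_rows, num_workers):
    """Split CTU row indices into contiguous regions, one per core or thread.

    When num_ctu_rows is not a multiple of num_workers, the first regions
    receive one extra row each (an assumption; the text covers only the
    evenly divisible case).
    """
    if num_workers <= 0:
        raise ValueError("num_workers must be positive")
    base, extra = divmod(num_ctu_rows, num_workers)
    regions = []
    start = 0
    for worker in range(num_workers):
        size = base + (1 if worker < extra else 0)
        regions.append(list(range(start, start + size)))
        start += size
    return regions
```

For a picture with four CTU rows and two cores, the sketch returns rows 0 and 1 for the first core and rows 2 and 3 for the second, which matches the two-region example used in the figures.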

When the reconstructed picture 300 is divided into two regions and the regions are processed in parallel by Core 0 (310) and Core 1 (320), horizontal filtering for the vertical edge boundaries is first performed for each CTU. Filtering is generally performed on the CTUs of each region in sequential order.

The HEVC deblocking filter is applied at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels from the edge boundary. Therefore, when horizontal filtering is performed on the vertical edge boundaries, there is no data dependency between CTUs.

FIG. 3B is an illustration of a method for allocating equal CTU rows to each core or thread, such that each partition performs parallel filtering in the vertical direction on a horizontal edge.

Referring to FIG. 3B, the picture 350 or slice is evenly divided in units of rows of the CTU in accordance with the number of cores or threads to be subjected to parallelization, and the divided regions are allocated to each core or thread.

When the reconstructed picture 350 is divided into two regions and the regions are processed in parallel by Core 0 (360) and Core 1 (370), horizontal filtering for the vertical edge boundaries is first performed for each CTU, generally in sequential order over the CTUs of each region. After horizontal filtering of the vertical edge boundaries of a region is complete, vertical filtering is performed on the horizontal edge boundaries of that region, again generally in sequential order over its CTUs.

The HEVC deblocking filter is applied at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels from the edge boundary. Therefore, when vertical filtering is performed on the horizontal edge boundaries, there is likewise no data dependency between CTUs.

However, because the HEVC deblocking filter uses the result of the horizontal filtering applied to the vertical edge boundaries as the input to the vertical filtering of the horizontal edge boundaries, an explicit synchronization step is required between the two filtering stages.

FIG. 4 is an example of data dependency between regions generated in two stages when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in two stages in parallel deblocking filtering.

Referring to FIG. 4, Core 0 (400), assigned CTU0 to CTU11, performs horizontal filtering on the vertical edge boundaries in its region. At the same time, Core 1 (410), assigned CTU12 to CTU23, performs vertical filtering on the horizontal edge boundaries in its region. The filtering speed may vary with the performance of each core or thread and with the deblocking workload of the assigned CTUs. In FIG. 4, Core 0 (400) is performing horizontal filtering on the vertical edges of CTU0 to CTU5 in its region, while Core 1 (410) has completed horizontal filtering on the vertical edges of CTU12 to CTU23 and is performing vertical filtering on the horizontal edges. In this case, the vertical filtering of CTU12 to CTU17, which lie at the partition boundary 430 in the region of Core 1 (410), has a data dependency on CTU6 to CTU11 in the adjacent region across the partition boundary 430.
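The dependency at a partition boundary can be written down as a pair of CTU lists. The sketch below assumes raster-order CTU numbering with six CTUs per row, as in the figure, and treats the dependency at row level as the text does; `boundary_dependency` is a hypothetical helper.

```python
def boundary_dependency(upper_rows, lower_rows, ctus_per_row):
    """Return (dependent_ctus, prerequisite_ctus) for one partition boundary.

    dependent_ctus:    top CTU row of the lower region, whose vertical
                       filtering reads pixels across the boundary.
    prerequisite_ctus: bottom CTU row of the upper region, which must be
                       horizontally filtered first.
    Assumes raster-order CTU numbering and non-empty regions.
    """
    def ctus_in_row(row):
        return list(range(row * ctus_per_row, (row + 1) * ctus_per_row))

    return ctus_in_row(lower_rows[0]), ctus_in_row(upper_rows[-1])
```

For the two regions of the example (rows 0 and 1, rows 2 and 3), the dependent CTUs are CTU12 to CTU17 and the prerequisite CTUs are CTU6 to CTU11.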

FIG. 5A illustrates the CTU order of horizontal deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Referring to FIG. 5A, when an HEVC deblocking filter is data-level parallelized, each core or thread is assigned an equal number of CTU rows. In order to minimize the delay in the data level parallelization process for HEVC deblocking filtering, the order of the CTUs to be filtered is adjusted in consideration of the positions of the CTUs having dependency.

FIG. 5A is an example of performing horizontal filtering on vertical edge boundaries when a picture is divided into two regions and HEVC deblocking filtering is performed in parallel by two cores or threads. Core 0 (500) performs horizontal filtering starting from the lowest CTU row in its region to minimize the dependency between its region and that of Core 1 (510). (CTU6 -> CTU7 -> CTU8 -> CTU9 -> CTU10 -> CTU11 -> CTU0 -> ... -> CTU5)

Likewise, Core 1 (510) also performs horizontal filtering from the lowest CTU row in its area. (CTU18 -> CTU19 -> CTU20 -> CTU21 -> CTU22 -> CTU23 -> CTU12 -> ... -> CTU17)
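The two orders listed above can be generated by a short sketch. It assumes raster-order CTU numbering, left-to-right order within a row, and, for regions with more than two rows, that rows are visited from the bottom upward as the claims describe; `horizontal_filter_order` is a hypothetical name.

```python
def horizontal_filter_order(region_rows, ctus_per_row):
    """CTU indices in the order one core applies horizontal filtering.

    Rows are visited from the lowest row of the region upward, so the row
    that the region below depends on is finished first. CTUs within a row
    are visited left to right (an assumption).
    """
    order = []
    for row in reversed(region_rows):
        first = row * ctus_per_row
        order.extend(range(first, first + ctus_per_row))
    return order


core0 = horizontal_filter_order([0, 1], ctus_per_row=6)
core1 = horizontal_filter_order([2, 3], ctus_per_row=6)
```

Here `core0` begins with CTU6 and ends with CTU5, and `core1` begins with CTU18 and ends with CTU17, matching the sequences in the text.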

FIG. 5B illustrates the CTU order of vertical deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Referring to FIG. 5B, Core 1 (510, 560) completes horizontal filtering of the vertical edges of the CTUs in its region through the process described for FIG. 5A. Core 1 (510, 560) does not wait for Core 0 to complete horizontal filtering of all CTUs in the partition of Core 0. Instead, it only checks whether horizontal filtering has been performed on the CTUs at the partition boundary 570, and can then perform vertical filtering on the horizontal edges.

In FIG. 5B, Core 1 (560) has completed horizontal filtering for all CTUs allocated to its region and continues directly with vertical filtering on the horizontal edges. At this point, Core 0 has performed horizontal filtering on the vertical edges of CTU6 to CTU11 in its region. By performing vertical filtering starting from the top CTU row of the allocated region, the delay of the synchronization that occurs at the partition boundary can be minimized.
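One way to picture this synchronization is a thread-per-region sketch in which each region signals as soon as its bottom row has been horizontally filtered. This is a minimal illustration under stated assumptions, not the patented implementation: the filtering itself is replaced by log entries, synchronization is done per CTU row rather than per CTU, every region is assumed non-empty, and `run_parallel_deblocking` is a hypothetical name.

```python
import threading


def run_parallel_deblocking(regions):
    """Simulate one-pass parallel deblocking over CTU-row regions.

    regions: list of non-empty lists of CTU row indices, top region first.
    Returns a log of ("H", row) and ("V", row) entries in execution order.
    """
    log = []
    log_lock = threading.Lock()
    # One flag per region: set once its bottom row is horizontally filtered.
    bottom_row_done = [threading.Event() for _ in regions]

    def record(stage, row):
        with log_lock:
            log.append((stage, row))

    def worker(k):
        rows = regions[k]
        # Horizontal filtering: bottom row first, so the region below can start early.
        for row in reversed(rows):
            record("H", row)  # placeholder for real horizontal filtering
            if row == rows[-1]:
                bottom_row_done[k].set()
        # Vertical filtering: top row first; only that row needs the region above.
        for row in rows:
            if row == rows[0] and k > 0:
                bottom_row_done[k - 1].wait()
            record("V", row)  # placeholder for real vertical filtering

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(len(regions))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Whatever the thread scheduling, the returned log always shows the horizontal pass of row 1 before the vertical pass of row 2 for regions `[[0, 1], [2, 3]]`, while the lower region never waits for row 0.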

In the above-described embodiments, the methods are described on the basis of flowcharts as a series of steps or blocks, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or simultaneously with, other steps described above. Those skilled in the art will also understand that the steps shown in the flowcharts are not exclusive, that other steps may be included, and that one or more steps in a flowchart may be deleted without affecting the scope of the invention.

The above-described embodiments include examples of various aspects. While it is not possible to describe every possible combination for expressing various aspects, one of ordinary skill in the art will recognize that other combinations are possible. Accordingly, it is intended that the invention include all alternatives, modifications and variations that fall within the scope of the following claims.

Claims

A video encoding/decoding apparatus and method that, when performing parallel deblocking filtering, assign an equal number of CTU rows to each core or thread and process each divided region in parallel

A video encoding/decoding apparatus and method that, in the horizontal filtering step for vertical edges, perform horizontal filtering in reverse order starting from the lowest CTU row in each region, in consideration of the dependency between divided regions

A video encoding/decoding apparatus and method that, in the vertical filtering step for horizontal edges, perform vertical filtering sequentially starting from the top CTU row in each region, in consideration of the dependency between divided regions

A video encoding/decoding apparatus and method that, in the vertical filtering step for horizontal edges, check whether vertical filtering has been performed on the adjacent CTUs when filtering the top CTU row in each region, in consideration of the dependency between divided regions