KR20150069584A - Method and apparatus of parallel deblocking filtering to minimize latency - Google Patents

Method and apparatus of parallel deblocking filtering to minimize latency

Info

Publication number
KR20150069584A
Authority
KR
South Korea
Prior art keywords
filtering
horizontal
vertical
ctu
core
Prior art date
Application number
KR1020130155158A
Other languages
Korean (ko)
Inventor
조현호
유종훈
심동규
Original Assignee
광운대학교 산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 광운대학교 산학협력단 filed Critical 광운대학교 산학협력단
Priority to KR1020130155158A priority Critical patent/KR20150069584A/en
Publication of KR20150069584A publication Critical patent/KR20150069584A/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/117 Filters, e.g. for pre-processing or post-processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/86 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present invention relates to a parallel deblocking filtering method and apparatus that minimize delay when parallelizing the deblocking filter, one of the in-loop filters of HEVC, by changing the order in which CTUs are filtered in consideration of the dependency between divided regions.

Description

Field of the Invention [0001] The present invention relates to a parallel deblocking filtering method and apparatus for delay minimization.

The present invention relates to image processing techniques, and more particularly, to a method and apparatus for minimizing delay in the parallel deblocking filtering process of an HEVC video encoder/decoder.

Recently, as demand for high-resolution, high-definition video has increased, the need has grown for highly efficient video compression technology for next-generation video services. In response to this market demand, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) formed the Joint Collaborative Team on Video Coding (JCT-VC) and in 2010 began developing a next-generation video standard called HEVC (High Efficiency Video Coding). Development of the HEVC standard was completed in January 2013, and HEVC achieves a compression efficiency improvement of about 50% over the H.264/AVC High profile, which was previously known to have the highest compression efficiency.

By adopting techniques such as a quad-tree partition structure, 35 intra prediction modes, advanced motion vector prediction (AMVP), a deblocking filter, and sample adaptive offset (SAO), HEVC improves coding efficiency over existing video codecs, but the complexity of the encoder and decoder also increases. Among these techniques, the deblocking filter accounts for a relatively large share of the computational complexity of the decoder.

An object of the present invention is to provide a method and an apparatus that, when data-level parallelism is applied to the deblocking filter, one of the HEVC in-loop filters, process horizontal filtering for vertical edges and vertical filtering for horizontal edges together in one step and thereby minimize the delay caused by synchronization.

In order to solve the above problem, the parallel deblocking filtering method and apparatus for minimum delay according to the first embodiment of the present invention divide a picture or slice into areas in units of CTU (Coding Tree Unit) rows and use data-level parallelism to allocate each area to a thread or core.

When a picture or slice is divided into several areas for data-level parallelization, an equal number of CTU rows is allocated to each thread or core.
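The row allocation described above could be sketched as follows. This is only an illustration: the function name `partition_ctu_rows` is hypothetical, and the handling of a row count that does not divide evenly is an assumption, since the text only covers the equal-split case.

```python
def partition_ctu_rows(num_ctu_rows, num_cores):
    """Split CTU rows into contiguous regions, one per core, as evenly as possible."""
    if num_cores < 1 or num_ctu_rows < num_cores:
        raise ValueError("need at least one CTU row per core")
    base, extra = divmod(num_ctu_rows, num_cores)
    regions = []
    start = 0
    for core in range(num_cores):
        # Assumption: when the split is uneven, the first `extra` cores
        # take one additional row. The source only describes equal splits.
        size = base + (1 if core < extra else 0)
        regions.append(range(start, start + size))
        start += size
    return regions


regions = partition_ctu_rows(num_ctu_rows=4, num_cores=2)
print([list(r) for r in regions])  # [[0, 1], [2, 3]]
```

With four CTU rows and two cores, each core receives two contiguous rows, which matches the two-region example used in the figures.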

When deblocking filtering of HEVC is performed, horizontal filtering for vertical edges and vertical filtering for horizontal edges are processed in one step without dividing them into two steps, thereby minimizing the explicit synchronization process between the two steps.

In the deblocking filtering, filtering is performed in units of CTUs, in a scanning order that minimizes the dependency between adjacent divided areas.

When vertical filtering is performed on the top CTU row of a partition, the data dependency between partitions is resolved by checking whether horizontal filtering has been completed on the bottommost CTU row of the partition directly above the current one.

In the present invention, when deblocking filtering is performed on a picture or slice basis, an equal number of CTU rows is allocated to each core or thread, which resolves the workload imbalance that arises in data-level parallelization. In addition, horizontal filtering for vertical edges and vertical filtering for horizontal edges are processed as one step, which reduces the explicit synchronization between the two stages and thereby minimizes the delay in data-level parallelization.

FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.
FIG. 2A is an exemplary diagram illustrating deblocking filtering in the horizontal direction performed at a vertical edge boundary.
FIG. 2B is an exemplary diagram illustrating deblocking filtering in the vertical direction performed at a horizontal edge boundary.
FIG. 3A is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each partition performing horizontal filtering on vertical edges in parallel.
FIG. 3B is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each partition performing vertical filtering on horizontal edges in parallel.
FIG. 4 is an example of the data dependency between regions that arises when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed as two stages in parallel deblocking filtering.
FIG. 5A illustrates the CTU order of horizontal deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
FIG. 5B illustrates the CTU order of vertical deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear.

The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.

In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions, which does not mean that each component is composed of separate hardware or a separate software unit. That is, the components are listed separately for convenience of explanation; at least two components may be combined into one component, or one component may be divided into a plurality of components that together perform its function. Such integrated and separated embodiments of the components are also included within the scope of the present invention, unless they depart from the essence of the present invention.

In addition, some of the components are not essential to performing the essential functions of the present invention and may be optional components that only improve performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used merely for performance improvement, and a structure that includes only the essential components is also included in the scope of the present invention.

FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.

Referring to FIG. 1, the video decoding apparatus 100 includes an entropy decoding unit 110, an inverse quantization unit 120, an inverse transform unit 130, an inter picture prediction unit 140, an intra prediction unit 150, a summing unit 155, a deblocking filter unit 160, an SAO performing unit 170, and a reference image buffer 180.

The entropy decoding unit 110 decodes the input bitstream and outputs syntax elements and quantized coefficients. Whether intra prediction or inter prediction is used is determined by the transmitted syntax elements, and the transmitted quantized coefficients are converted into a residual signal by the inverse quantization unit 120 and the inverse transform unit 130.

The prediction signal is generated by either intra prediction (150) or inter prediction (140). In intra prediction (150), a prediction block is generated by spatial prediction using the pixel values of previously coded blocks adjacent to the current block. In inter prediction (140), a prediction block is generated by motion compensation using a motion vector transmitted from the encoder and a reconstructed image stored in the reference image buffer 180.

The prediction signal generated by inter prediction (140) or intra prediction (150) is added to the residual signal in the summing unit 155, and the result passes through the deblocking filter unit 160 and the SAO performing unit 170. The reconstructed picture to which deblocking filtering and SAO have been applied is stored in the reference image buffer 180 and can be used as a reference picture by the inter picture prediction unit 140.

FIG. 2A is an exemplary diagram illustrating deblocking filtering in the horizontal direction performed at a vertical edge boundary.

Referring to FIG. 2A, HEVC deblocking filtering is applied to all PU (prediction unit) and TU (transform unit) boundaries that lie on an 8x8 grid in the CTU. For a vertical edge boundary, horizontal filtering modifies at most 3 pixels on each side of the edge.

When the HEVC performs horizontal filtering (215) on the vertical edge boundaries in the CTU, no data dependency occurs between the respective edge boundaries.

FIG. 2B is an exemplary diagram illustrating deblocking filtering in the vertical direction performed at a horizontal edge boundary.

Referring to FIG. 2B, HEVC deblocking filtering is applied to all PU (prediction unit) and TU (transform unit) boundaries that lie on the 8x8 grid in the CTU. For a horizontal edge boundary, vertical filtering modifies at most 3 pixels on each side of the edge.

When the HEVC performs vertical filtering (225) on the horizontal edge boundaries in the CTU, there is no data dependency between the respective edge boundaries.
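The independence of edges filtered in the same direction follows from the arithmetic: edges are at least 8 samples apart, and each edge changes at most 3 samples on either side. A minimal sketch of that check, assuming a 64-sample CTU and hypothetical helper names (`modified_span`, `edges_in_ctu`) that do not come from the source:

```python
GRID = 8          # deblocking edges lie on an 8x8 sample grid
MAX_MODIFIED = 3  # at most 3 samples on each side of an edge are changed


def modified_span(edge_pos):
    """Sample positions along the filtering direction that one edge may modify."""
    # Three samples before the edge (p2, p1, p0) and three after it (q0, q1, q2).
    return range(edge_pos - MAX_MODIFIED, edge_pos + MAX_MODIFIED)


def edges_in_ctu(ctu_origin, ctu_size):
    """Candidate edge positions on the 8-sample grid inside one CTU."""
    return list(range(ctu_origin, ctu_origin + ctu_size, GRID))


# Assumption for illustration: a 64-sample CTU starting at position 64.
edges = edges_in_ctu(ctu_origin=64, ctu_size=64)
spans = [set(modified_span(e)) for e in edges]
overlap = any(a & b for i, a in enumerate(spans) for b in spans[i + 1:])
print(edges)    # [64, 72, 80, 88, 96, 104, 112, 120]
print(overlap)  # False
```

Because no two spans share a sample, edges of the same orientation can be filtered in any order, which is what allows horizontal filtering (or vertical filtering) to be split freely across CTUs.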

FIG. 3A is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each partition performing horizontal filtering on vertical edges in parallel.

Referring to FIG. 3A, the picture 300 or slice is evenly divided in units of rows of the CTU in accordance with the number of cores or threads to perform parallelization, and the divided areas are allocated to each core or thread.

When the reconstructed picture 300 is divided into two regions and the regions are processed in parallel by Core 0 (310) and Core 1 (320), horizontal filtering for the vertical edge boundaries is performed first, CTU by CTU. In general, the CTUs in each region are filtered in sequential order.

The HEVC deblocking filter is applied at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels on each side of an edge boundary. Therefore, when horizontal filtering is performed on the vertical edge boundaries, there is no data dependency between CTUs.

FIG. 3B is an exemplary diagram of a method of assigning an equal number of CTU rows to each core or thread, with each partition performing vertical filtering on horizontal edges in parallel.

Referring to FIG. 3B, the picture 350 or slice is evenly divided in units of rows of the CTU in accordance with the number of cores or threads to be subjected to parallelization, and the divided regions are allocated to each core or thread.

When the reconstructed picture 350 is divided into two regions and the regions are processed in parallel by Core 0 (360) and Core 1 (370), horizontal filtering for the vertical edge boundaries is performed first, CTU by CTU. In general, the CTUs in each region are filtered in sequential order. After horizontal filtering of the vertical edge boundaries in each region, vertical filtering is performed on the horizontal edge boundaries in each region. In this case as well, the CTUs within each region are generally filtered in sequential order.

The HEVC deblocking filter is applied at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels on each side of an edge boundary. Therefore, when vertical filtering is performed on the horizontal edge boundaries, there is likewise no data dependency between CTUs.

However, because the HEVC deblocking filter uses the output of horizontal filtering on vertical edge boundaries as the input to vertical filtering on horizontal edge boundaries, an explicit synchronization process is required between the two filtering steps.
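The conventional two-stage arrangement can be pictured as a barrier placed between the stages. The sketch below is an illustration only: it uses Python's `threading.Barrier` as a stand-in for the explicit synchronization, and the filter callbacks merely record which CTU was visited rather than filtering real samples. The names `run_two_stage` and `record` are hypothetical.

```python
import threading


def run_two_stage(regions, horizontal_filter, vertical_filter):
    """Filter each region on its own thread with a barrier between the stages."""
    barrier = threading.Barrier(len(regions))

    def worker(ctus):
        for ctu in ctus:
            horizontal_filter(ctu)  # stage 1: vertical edges
        barrier.wait()              # every core waits here for the slowest one
        for ctu in ctus:
            vertical_filter(ctu)    # stage 2: horizontal edges

    threads = [threading.Thread(target=worker, args=(r,)) for r in regions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


log = []
lock = threading.Lock()


def record(stage):
    """Build a placeholder filter that only logs (stage, ctu)."""
    def op(ctu):
        with lock:
            log.append((stage, ctu))
    return op


run_two_stage([range(0, 12), range(12, 24)], record("H"), record("V"))
first_v = next(i for i, (stage, _) in enumerate(log) if stage == "V")
print(first_v)  # 24: no vertical filtering starts before all 24 CTUs are done
```

The barrier guarantees correctness, but a fast core sits idle until the slowest core finishes its entire region, which is the delay the invention aims to reduce.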

FIG. 4 is an example of data dependency between regions generated in two stages when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in two stages in parallel deblocking filtering.

Referring to FIG. 4, Core 0 (400), to which the region of CTU0 to CTU11 is allocated, performs horizontal filtering on the vertical edge boundaries in its region. At the same time, Core 1 (410), to which the region of CTU12 to CTU23 is allocated, performs vertical filtering on the horizontal edge boundaries in its region. The filtering speed may vary with the performance of each core or thread and with the deblocking filtering workload of the assigned CTUs. In the example of FIG. 4, Core 0 (400) is performing horizontal filtering on the vertical edges of CTU0 to CTU5 in its region, while Core 1 (410) has completed horizontal filtering on the vertical edges of CTU12 to CTU23 in its region and is performing vertical filtering on horizontal edges. In this case, vertical filtering of CTU12 to CTU17, which lie at the partition boundary (430) in the region of Core 1 (410), has a data dependency on CTU6 to CTU11 of the adjacent region across the boundary.
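The cross-region dependency in this example can be enumerated with a few lines of code. The sketch assumes 6 CTUs per row (inferred from the CTU numbering in the example) and follows the simplified model in the text, where each CTU in the top row of a region depends on the CTU directly above it. The function names are hypothetical.

```python
def upper_neighbor(ctu, ctus_per_row):
    """CTU directly above `ctu` in raster order, or None on the top picture row."""
    return ctu - ctus_per_row if ctu >= ctus_per_row else None


def cross_region_dependencies(region, ctus_per_row):
    """Map each CTU in the region's top row to the CTU it needs from the region above."""
    deps = {}
    for ctu in region[:ctus_per_row]:
        above = upper_neighbor(ctu, ctus_per_row)
        if above is not None:
            deps[ctu] = above
    return deps


# Assumption: 6 CTUs per row, regions CTU0..CTU11 and CTU12..CTU23.
print(cross_region_dependencies(range(12, 24), ctus_per_row=6))
# {12: 6, 13: 7, 14: 8, 15: 9, 16: 10, 17: 11}
print(cross_region_dependencies(range(0, 12), ctus_per_row=6))
# {}
```

Only the top row of the lower region depends on the other core, and only on the bottom row of the upper region, so a full barrier across all CTUs is stricter than the data actually requires.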

FIG. 5A illustrates the CTU order of horizontal deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Referring to FIG. 5A, when an HEVC deblocking filter is data-level parallelized, each core or thread is assigned an equal number of CTU rows. In order to minimize the delay in the data level parallelization process for HEVC deblocking filtering, the order of the CTUs to be filtered is adjusted in consideration of the positions of the CTUs having dependency.

FIG. 5A is an example of performing horizontal filtering on vertical edge boundaries when a picture is divided into two regions and HEVC deblocking filtering is performed in parallel by two cores or threads. Core 0 (500) performs horizontal filtering starting from the lowest CTU row in its area to minimize the dependency of Core 1 (510) on it. (CTU6 -> CTU7 -> CTU8 -> CTU9 -> CTU10 -> CTU11 -> CTU0 -> ... -> CTU5)

Likewise, Core 1 (510) also performs horizontal filtering from the lowest CTU row in its area. (CTU18 -> CTU19 -> CTU20 -> CTU21 -> CTU22 -> CTU23 -> CTU12 -> ... -> CTU17)
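The two orders listed above can be generated by visiting the CTU rows of a region from bottom to top while keeping left-to-right order within each row. The following sketch assumes 6 CTUs per row, as in the example. For regions with more than two rows, the bottom-to-top row order is an interpretation of the "reverse order" wording in the claims, and `horizontal_scan_order` is a hypothetical name.

```python
def horizontal_scan_order(region, ctus_per_row):
    """CTU order for horizontal filtering: bottom CTU row first, then upward."""
    rows = [region[i:i + ctus_per_row] for i in range(0, len(region), ctus_per_row)]
    return [ctu for row in reversed(rows) for ctu in row]


print(horizontal_scan_order(range(0, 12), 6))
# [6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5]
print(horizontal_scan_order(range(12, 24), 6))
# [18, 19, 20, 21, 22, 23, 12, 13, 14, 15, 16, 17]
```

Filtering the bottom row first means that the CTUs another core depends on are finished as early as possible.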

FIG. 5B illustrates the CTU order of vertical deblocking filtering that minimizes data dependency between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.

Referring to FIG. 5B, Core 1 (510, 560) completes horizontal filtering of the vertical edges of the CTUs in its region through the process described with reference to FIG. 5A. Core 1 (510, 560) does not wait for Core 0 to complete horizontal filtering of all CTUs in Core 0's partition; it only checks whether horizontal filtering has been performed on the CTUs at the partition boundary (570) and can then perform vertical filtering on the horizontal edges.

In FIG. 5B, Core 1 (560) has completed horizontal filtering for all CTUs allocated to its region and proceeds directly to vertical filtering on horizontal edges. At this point, Core 0 is shown performing horizontal filtering on the vertical edges of CTU6 to CTU11 in its region. By performing vertical filtering starting from the top CTU row of the allocated area, the synchronization delay that occurs at the boundary of the divided areas can be minimized.
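One way to picture the one-step scheme is to replace the global barrier with a per-CTU "horizontal filtering done" flag. The sketch below is not the patented implementation: it uses `threading.Event` objects as the flags, logs CTU visits instead of filtering samples, assumes 6 CTUs per row, and uses hypothetical names (`filter_region`, `run_one_stage`).

```python
import threading

CTUS_PER_ROW = 6  # assumed from the CTU numbering in the example


def filter_region(region, h_done, log, lock):
    rows = [region[i:i + CTUS_PER_ROW] for i in range(0, len(region), CTUS_PER_ROW)]
    # Horizontal filtering: bottom CTU row first, so the region below is unblocked early.
    for row in reversed(rows):
        for ctu in row:
            with lock:
                log.append(("H", ctu))
            h_done[ctu].set()
    # Vertical filtering: top CTU row first, waiting only on the CTU directly above.
    for row in rows:
        for ctu in row:
            above = ctu - CTUS_PER_ROW
            if above >= 0:
                h_done[above].wait()
            with lock:
                log.append(("V", ctu))


def run_one_stage(regions):
    total = sum(len(r) for r in regions)
    h_done = [threading.Event() for _ in range(total)]
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=filter_region, args=(r, h_done, log, lock))
               for r in regions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log


log = run_one_stage([range(0, 12), range(12, 24)])
pos = {entry: i for i, entry in enumerate(log)}
ok = all(pos[("H", c - CTUS_PER_ROW)] < pos[("V", c)] for c in range(CTUS_PER_ROW, 24))
print(ok)  # True
```

The final check confirms that every vertical filtering step ran after horizontal filtering of the CTU above it, without any core having to wait for another core to finish its whole region.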

In the above-described embodiments, the methods are described as a series of steps or blocks based on a flowchart, but the present invention is not limited to the order of the steps, and some steps may occur in a different order from, or concurrently with, the steps described above. Those skilled in the art will also understand that the steps shown in the flowchart are not exclusive, that other steps may be included, and that one or more steps in the flowchart may be deleted without affecting the scope of the invention.

The above-described embodiments include examples of various aspects. While it is not possible to describe every possible combination for expressing various aspects, one of ordinary skill in the art will recognize that other combinations are possible. Accordingly, it is intended that the invention include all alternatives, modifications and variations that fall within the scope of the following claims.

Claims (4)

A video encoding/decoding apparatus and method that, when parallel deblocking filtering is performed, allocate an equal number of CTU rows to each core or thread and process each divided region in parallel.
A video encoding/decoding apparatus and method that, in the horizontal filtering step for vertical edges, perform horizontal filtering in reverse order starting from the lowest CTU row in each area in consideration of the dependency between divided areas.
A video encoding/decoding apparatus and method that, in the vertical filtering step for horizontal edges, perform vertical filtering sequentially starting from the top CTU row in each area in consideration of the dependency between divided areas.
A video encoding/decoding apparatus and method that, in the vertical filtering step for horizontal edges, check whether filtering has been performed on the adjacent CTUs before filtering the top CTU row in each area, in consideration of the dependency between divided areas.

KR1020130155158A 2013-12-13 2013-12-13 Method and apparatus of parallel deblocking filtering to minimize latency KR20150069584A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020130155158A KR20150069584A (en) 2013-12-13 2013-12-13 Method and apparatus of parallel deblocking filtering to minimize latency

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020130155158A KR20150069584A (en) 2013-12-13 2013-12-13 Method and apparatus of parallel deblocking filtering to minimize latency

Publications (1)

Publication Number Publication Date
KR20150069584A (en) 2015-06-24

Family

ID=53516648

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020130155158A KR20150069584A (en) 2013-12-13 2013-12-13 Method and apparatus of parallel deblocking filtering to minimize latency

Country Status (1)

Country Link
KR (1) KR20150069584A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024098821A1 (en) * 2022-11-11 2024-05-16 上海哔哩哔哩科技有限公司 Av1 filtering method and apparatus

Legal Events

Date Code Title Description
WITN Withdrawal due to no request for examination