KR20150069584A - Method and apparatus of parallel deblocking filtering to minimize latency - Google Patents
- Publication number
- KR20150069584A
- Authority
- KR
- South Korea
- Prior art keywords
- filtering
- horizontal
- vertical
- ctu
- core
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/117—Filters, e.g. for pre-processing or post-processing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/85—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
- H04N19/86—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving reduction of coding artifacts, e.g. of blockiness
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The present invention relates to a parallel deblocking filtering method and apparatus that minimize delay when parallelizing the deblocking filter, one of the in-loop filters of HEVC, by changing the order in which CTUs are filtered in consideration of the dependency between divided regions.
Description
The present invention relates to image processing techniques, and more particularly, to a method and apparatus for minimizing delays in a parallel deblocking filtering process in an HEVC video encoder/decoder.
Recently, as the demand for high-resolution, high-definition video has increased, a need has arisen for highly efficient video compression technology for next-generation video services. In response to this market demand, the Moving Picture Experts Group (MPEG) and the Video Coding Experts Group (VCEG) formed the Joint Collaborative Team on Video Coding (JCT-VC) and, in 2010, began developing the next-generation video standard called HEVC (High Efficiency Video Coding). Development of the HEVC standard was completed in January 2013, and HEVC achieves a compression efficiency improvement of about 50% over the H.264/AVC High profile, which was previously known to have the highest compression efficiency.
HEVC applies techniques such as a quad-tree partition structure, 35 intra prediction modes, advanced motion vector prediction (AMVP), a deblocking filter, and sample adaptive offset (SAO). These improve coding efficiency over existing video codecs, but they also increase the complexity of the encoder and decoder. Among them, the deblocking filter accounts for a relatively high share of the computational complexity in the decoder.
An object of the present invention is to provide a method and an apparatus that, when applying data-level parallelism to the deblocking filter, which is one of the HEVC in-loop filters, combine horizontal filtering for vertical edges and vertical filtering for horizontal edges into one step and thereby minimize the delay caused by synchronization.
In order to solve the above problems, a parallel deblocking filtering method and apparatus for minimum delay according to the first embodiment of the present invention divide a picture or slice into regions in units of CTU (Coding Tree Unit) rows and use data-level parallelism to allocate each region to a thread or core.
When dividing a picture or slice into several regions for the data-level parallelization, an equal number of CTU rows is allocated to each thread or core.
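The row-based partitioning described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `partition_ctu_rows` is hypothetical, and the handling of a row count that does not divide evenly (earlier workers take one extra row) is an assumption, since the text only covers the equal case.

```python
from typing import List


def partition_ctu_rows(num_ctu_rows: int, num_workers: int) -> List[range]:
    """Split CTU rows 0..num_ctu_rows-1 into contiguous regions, one per worker.

    Hypothetical helper: the source only states that each core or thread
    receives an equal number of CTU rows.
    """
    if num_workers <= 0 or num_ctu_rows < num_workers:
        raise ValueError("need at least one CTU row per worker")
    base, extra = divmod(num_ctu_rows, num_workers)
    regions = []
    start = 0
    for worker in range(num_workers):
        # Assumption: leftover rows go to the first `extra` workers.
        count = base + (1 if worker < extra else 0)
        regions.append(range(start, start + count))
        start += count
    return regions
```

For a picture with four CTU rows and two cores, this gives rows 0 to 1 to the first core and rows 2 to 3 to the second.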
When deblocking filtering of HEVC is performed, horizontal filtering for vertical edges and vertical filtering for horizontal edges are processed in one step without dividing them into two steps, thereby minimizing the explicit synchronization process between the two steps.
In the deblocking filtering, CTUs are filtered in a scanning order chosen to minimize the dependency between adjacent divided regions.
When performing vertical filtering on the top CTU row of a region, synchronization between regions is handled by checking whether horizontal filtering has been completed on the bottom CTU row of the region directly above the current one, which resolves the data dependency.
In the present invention, when deblocking filtering is performed on a picture or slice basis, an equal number of CTU rows is assigned to each core or thread, which resolves the workload imbalance that arises in data-level parallelization. In addition, the horizontal filtering for vertical edges and the vertical filtering for horizontal edges are processed as one step, which minimizes the explicit synchronization between the two stages and thus the delay in data parallelization.
FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.
FIG. 2A is an exemplary diagram illustrating deblocking filtering in the horizontal direction performed at a vertical edge boundary.
FIG. 2B is an exemplary diagram illustrating deblocking filtering in the vertical direction performed at a horizontal edge boundary.
FIG. 3A is an exemplary diagram of a method of assigning equal numbers of CTU rows to each core or thread, with each partition performing parallel filtering in the horizontal direction on vertical edges.
FIG. 3B is an exemplary diagram of a method of assigning equal numbers of CTU rows to each core or thread, with each partition performing parallel filtering in the vertical direction on horizontal edges.
FIG. 4 is an example of the data dependency between regions that arises when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed as two stages in parallel deblocking filtering.
FIG. 5A illustrates the CTU order for horizontal deblocking filtering that minimizes data dependence between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
FIG. 5B illustrates the CTU order for vertical deblocking filtering that minimizes data dependence between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. In the following description of the embodiments of the present invention, a detailed description of known functions and configurations incorporated herein will be omitted when it may make the subject matter of the present disclosure rather unclear.
The terms first, second, etc. may be used to describe various components, but the components should not be limited by the terms. The terms are used only for the purpose of distinguishing one component from another.
In addition, the components shown in the embodiments of the present invention are shown independently to represent different characteristic functions, which does not mean that each component is composed of separate hardware or a single software unit. That is, the components are listed separately for convenience of explanation, and at least two components may be combined into one component, or one component may be divided into a plurality of components to perform a function. Integrated embodiments and separate embodiments of the components are also included within the scope of the present invention, unless they depart from the essence of the present invention.
In addition, some of the components may not be essential components that perform essential functions of the present invention, but optional components that merely improve performance. The present invention can be implemented with only the components essential for realizing its essence, excluding the components used for performance improvement, and a structure that includes only the essential components, excluding the optional components used for performance improvement, is also included in the scope of the present invention.
FIG. 1 is a block diagram showing a configuration of a video decoding apparatus to which the present invention is applied.
1, the
The
The
The predictive signal generated by performing the
FIG. 2A is an exemplary diagram illustrating deblocking filtering in the horizontal direction performed at a vertical edge boundary.
Referring to FIG. 2A, HEVC deblocking filtering is performed on all PU (Prediction Unit) and TU (Transform Unit) boundaries that lie on an 8x8 grid in the CTU. For a vertical edge boundary, horizontal filtering modifies up to 3 pixels from the edge boundary.
When the HEVC performs horizontal filtering (215) on the vertical edge boundaries in the CTU, no data dependency occurs between the respective edge boundaries.
FIG. 2B is an exemplary diagram illustrating deblocking filtering in the vertical direction performed at a horizontal edge boundary.
Referring to FIG. 2B, HEVC deblocking filtering is performed on all PU (Prediction Unit) and TU (Transform Unit) boundaries that lie on the 8x8 grid in the CTU. For a horizontal edge boundary, vertical filtering modifies up to 3 pixels from the edge boundary.
When the HEVC performs vertical filtering (225) on the horizontal edge boundaries in the CTU, there is no data dependency between the respective edge boundaries.
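The independence of edges filtered in the same direction follows from simple arithmetic on the 8x8 grid, which the sketch below checks. The 3-pixel modification limit comes from the text; the 4-sample read window on each side of an edge and all names here are assumptions made for illustration.

```python
GRID = 8          # edges lie on an 8x8 grid (from the text)
MAX_MODIFIED = 3  # samples changed on each side of an edge (from the text)
MAX_READ = 4      # assumption: samples read on each side of an edge


def modified_span(edge: int) -> range:
    """Sample positions an edge at coordinate `edge` may modify."""
    return range(edge - MAX_MODIFIED, edge + MAX_MODIFIED)


def read_span(edge: int) -> range:
    """Sample positions an edge at coordinate `edge` may read."""
    return range(edge - MAX_READ, edge + MAX_READ)


def edges_independent(edge_a: int, edge_b: int) -> bool:
    """True if neither edge modifies a sample that the other edge reads."""
    return (set(modified_span(edge_a)).isdisjoint(read_span(edge_b))
            and set(modified_span(edge_b)).isdisjoint(read_span(edge_a)))
```

Two neighbouring edges on the 8-sample grid, for example at positions 8 and 16, touch disjoint samples, whereas edges only 4 samples apart would overlap under the same assumptions.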
FIG. 3A is an exemplary diagram of a method of assigning equal numbers of CTU rows to each core or thread, with each partition performing parallel filtering in the horizontal direction on vertical edges.
Referring to FIG. 3A, the
When the
The deblocking filter of HEVC performs filtering at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels from the edge boundary. Therefore, when horizontal filtering is performed on vertical edge boundaries, there is no data dependency between CTUs.
FIG. 3B is an illustration of a method for allocating equal CTU rows to each core or thread, such that each partition performs parallel filtering in the vertical direction on a horizontal edge.
Referring to FIG. 3B, the
When the
The deblocking filter of HEVC performs filtering at all PU or TU boundaries that lie on the 8x8 grid and changes at most 3 pixels from the edge boundary. Therefore, even when vertical filtering is performed on horizontal edge boundaries, there is no data dependency between CTUs.
However, since the HEVC deblocking filter uses the result of the horizontal filtering applied to vertical edge boundaries as the input to the vertical filtering of horizontal edge boundaries, an explicit synchronization process is required between the two filtering steps.
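The explicit synchronization between the two filtering stages can be pictured as a barrier that every worker must reach before any vertical filtering starts. The following is a minimal sketch of that two-stage baseline using Python's `threading.Barrier`; the function name `run_two_stage` is hypothetical, and the log entries stand in for the actual filtering work.

```python
import threading
from typing import List, Tuple


def run_two_stage(regions: List[List[int]]) -> List[Tuple[str, int]]:
    """Run a two-stage schedule; `regions` holds the CTU rows of each worker."""
    barrier = threading.Barrier(len(regions))
    log: List[Tuple[str, int]] = []
    log_lock = threading.Lock()

    def worker(rows: List[int]) -> None:
        for row in rows:
            with log_lock:
                log.append(("H", row))  # placeholder: horizontal filtering
        # Explicit synchronization: wait until every worker finished stage 1.
        barrier.wait()
        for row in rows:
            with log_lock:
                log.append(("V", row))  # placeholder: vertical filtering

    threads = [threading.Thread(target=worker, args=(rows,)) for rows in regions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In this schedule a worker that finishes its horizontal filtering early sits idle at the barrier until the slowest worker arrives, which is the delay the one-step approach aims to reduce.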
FIG. 4 is an example of data dependency between regions generated in two stages when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in two stages in parallel deblocking filtering.
Referring to FIG. 4, Core 0 (400) allocated to the
FIG. 5A illustrates the CTU order for horizontal deblocking filtering that minimizes data dependence between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
Referring to FIG. 5A, when an HEVC deblocking filter is data-level parallelized, each core or thread is assigned an equal number of CTU rows. In order to minimize the delay in the data level parallelization process for HEVC deblocking filtering, the order of the CTUs to be filtered is adjusted in consideration of the positions of the CTUs having dependency.
FIG. 5A is an example of performing horizontal filtering on vertical edge boundaries when a picture is divided into two regions and HEVC deblocking filtering is performed in parallel by two cores or threads. Core 0 (500) performs horizontal filtering starting from the lowest CTU row in its region to minimize the dependency with Core 1 (510). (CTU6 -> CTU7 -> CTU8 -> CTU9 -> CTU10 -> CTU11 -> CTU0 -> ... -> CTU5)
Likewise, Core 1 (510) also performs horizontal filtering from the lowest CTU row in its area. (CTU18 -> CTU19 -> CTU20 -> CTU21 -> CTU22 -> CTU23 -> CTU12 -> ... -> CTU17)
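The CTU orders listed for the two cores can be reproduced with a short helper. This is a sketch under stated assumptions: the function name `horizontal_scan_order` is hypothetical, the picture width of 6 CTUs is inferred from the example (CTU6 starts the second row), and visiting the remaining rows from top to bottom after the bottom row is an assumption that the two-row example does not settle.

```python
from typing import List, Sequence


def horizontal_scan_order(rows: Sequence[int], ctus_per_row: int) -> List[int]:
    """CTU addresses in the order used for horizontal filtering of one region.

    The bottom CTU row of the region goes first (as in the example), then
    the remaining rows from top to bottom (assumption).
    """
    rows = list(rows)
    ordered_rows = [rows[-1]] + rows[:-1]
    return [row * ctus_per_row + col
            for row in ordered_rows
            for col in range(ctus_per_row)]
```

With 6 CTUs per row, rows 0 to 1 give CTU6 through CTU11 followed by CTU0 through CTU5, and rows 2 to 3 give CTU18 through CTU23 followed by CTU12 through CTU17, matching the sequences above.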
FIG. 5B illustrates the CTU order for vertical deblocking filtering that minimizes data dependence between regions when horizontal filtering for vertical edges and vertical filtering for horizontal edges are performed in one step in parallel deblocking filtering.
Referring to FIG. 5B, the Core 1 (510, 560) completes the horizontal filtering of the vertical edges with respect to the CTUs in its region through the process described in FIG. 5A. Core 1 (510, 560) does not wait for
In the example of FIG. 5B, Core 1 (560) continues directly with vertical filtering on horizontal edges after completing horizontal filtering for all CTUs allocated to its region. In this example, Core 0 performs horizontal filtering for vertical edges on CTU6 to CTU11 within its region. By performing vertical filtering from the top CTU row of the allocated region, the delay of the synchronization process occurring at the boundary between regions can be minimized.
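The one-step schedule can be sketched with a per-row completion flag in place of a global barrier: a worker only waits, before vertically filtering its top CTU row, until the bottom CTU row of the region above has been horizontally filtered. The code below is an illustrative sketch, not the patent's implementation; `run_one_step`, the use of `threading.Event`, and the log entries that stand in for filtering work are all assumptions.

```python
import threading
from typing import Dict, List, Tuple


def run_one_step(regions: List[List[int]]) -> List[Tuple[str, int]]:
    """`regions` lists consecutive CTU rows per worker, top region first."""
    h_done: Dict[int, threading.Event] = {
        row: threading.Event() for rows in regions for row in rows}
    log: List[Tuple[str, int]] = []
    log_lock = threading.Lock()

    def record(stage: str, row: int) -> None:
        with log_lock:
            log.append((stage, row))

    def worker(rows: List[int]) -> None:
        # Horizontal filtering: bottom row of the region first, so the
        # region below can start its vertical filtering as early as possible.
        for row in [rows[-1]] + rows[:-1]:
            record("H", row)       # placeholder: horizontal filtering
            h_done[row].set()
        # Vertical filtering: top row first, with no global barrier.
        for row in rows:
            if row == rows[0] and (row - 1) in h_done:
                # Only cross-region dependency: the bottom row of the region
                # above must already be horizontally filtered.
                h_done[row - 1].wait()
            record("V", row)       # placeholder: vertical filtering

    threads = [threading.Thread(target=worker, args=(rows,)) for rows in regions]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because the upper region filters its bottom row first, the flag is normally already set when the lower region reaches its vertical stage, so the wait returns immediately in the common case.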
In the above-described embodiments, the methods are described as a series of steps or blocks based on a flowchart, but the present invention is not limited to the order of the steps, and some steps may occur in a different order than described above. Those skilled in the art will also understand that the steps shown in the flowchart are not exclusive, that other steps may be included, and that one or more steps in the flowchart may be deleted without affecting the scope of the invention.
The above-described embodiments include examples of various aspects. While it is not possible to describe every possible combination for expressing various aspects, one of ordinary skill in the art will recognize that other combinations are possible. Accordingly, it is intended that the invention include all alternatives, modifications and variations that fall within the scope of the following claims.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130155158A KR20150069584A (en) | 2013-12-13 | 2013-12-13 | Method and apparatus of parallel deblocking filtering to minimize latency |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR1020130155158A KR20150069584A (en) | 2013-12-13 | 2013-12-13 | Method and apparatus of parallel deblocking filtering to minimize latency |
Publications (1)
Publication Number | Publication Date |
---|---|
KR20150069584A | 2015-06-24 |
Family
ID=53516648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
KR1020130155158A KR20150069584A (en) | 2013-12-13 | 2013-12-13 | Method and apparatus of parallel deblocking filtering to minimize latency |
Country Status (1)
Country | Link |
---|---|
KR (1) | KR20150069584A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024098821A1 (en) * | 2022-11-11 | 2024-05-16 | 上海哔哩哔哩科技有限公司 | Av1 filtering method and apparatus |
-
2013
- 2013-12-13 KR KR1020130155158A patent/KR20150069584A/en not_active Application Discontinuation
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WITN | Withdrawal due to no request for examination |