KR20120096848A

KR20120096848A - Apparatus and method for video decoding

Info

Publication number: KR20120096848A
Application number: KR1020110016218A
Authority: KR
Inventors: 정기석; 김원진; 조걸
Original assignee: 한양대학교 산학협력단 (Industry-University Cooperation Foundation, Hanyang University)
Priority date: 2011-02-23
Filing date: 2011-02-23
Publication date: 2012-08-31
Also published as: KR101216821B1

Abstract

PURPOSE: An image decoding apparatus and method are provided that decode an encoded image efficiently in parallel without large overhead, which increases decoding performance. CONSTITUTION: A thread allocation unit (510) allocates two or more threads to the decoding of first to third syntax elements and allocates one thread to the decoding of fourth and fifth syntax elements. A decoding unit (520) processes the two or more threads and the one thread in parallel according to a pipeline scheme, and in doing so decodes the first to fifth syntax elements.

Description

[0001] APPARATUS AND METHOD FOR VIDEO DECODING [0002]

Embodiments of the present invention relate to an image decoding apparatus and method. More particularly, they relate to an image decoding apparatus and method capable of efficiently decoding an encoded image in parallel.

As digital TVs have become widespread, users increasingly demand high-resolution video services. Research on video compression for providing high-resolution video services has therefore been actively conducted.

H.264/AVC is a representative video compression technology. It is the existing video codec standard with the best compression performance and is widely used in multimedia services such as digital broadcasting, multimedia players, and video conferencing. To achieve its high compression ratio, H.264/AVC uses algorithms of higher complexity than other video codecs. A high-performance processor is therefore needed to encode and decode data according to H.264/AVC.

Conventional single-core processors improve performance by raising the clock speed. The resulting heat generation and power consumption, however, prevent processing performance from being increased beyond a certain level. To solve this problem, research has continued on multi-core processors, which increase the number of processor cores and process data in parallel.

Likewise, various parallel processing methods that operate effectively on multi-core systems have been studied to improve the decoding performance of H.264/AVC decoders. A representative example is data-level parallelization. In this decoding method, H.264/AVC data is divided into units that can be processed in parallel, and those units are decoded simultaneously by multiple threads.

Data-level parallelization must respect the dependencies between H.264/AVC data during parallel decoding. The entropy-coded part, however, must be decoded sequentially, which makes parallel decoding difficult. In particular, as demand for high-resolution video grows, CABAC (Context Adaptive Binary Arithmetic Coding) entropy coding is preferred for its high compression ratio. Its high complexity, however, makes coding time-consuming.

Whether H.264/AVC decoding can be parallelized therefore depends closely on whether entropy decoding can be parallelized. Various algorithms for parallelizing entropy decoding have been proposed. Most conventional algorithms, however, target the performance of hardware decoders and speed up only part of entropy decoding, so they have had difficulty improving overall decoding performance.

To solve the problems of the prior art described above, the present invention proposes an image decoding apparatus and method capable of efficiently decoding an encoded image in parallel without large overhead.

Other objects of the present invention may be derived by those skilled in the art from the following embodiments.

To achieve the above object, a preferred embodiment of the present invention provides an image decoding apparatus for decoding an image encoded according to a syntax element partitioning technique. The apparatus includes:

- a thread allocation unit that allocates two or more threads to the decoding of a first syntax element included in an MBINFO group, a second syntax element included in a PRED group, and a third syntax element included in a CBP group, and allocates one thread to the decoding of a fourth syntax element included in a SIGMAP group and a fifth syntax element included in a COEFF group; and
- a decoding unit that processes the two or more threads and the one thread in parallel according to a pipeline technique to decode the first, second, third, fourth, and fifth syntax elements.

The thread allocation unit may allocate:

- a first thread to the decoding of the first syntax element;
- a second thread to the decoding of the second and third syntax elements; and
- a third thread to the decoding of the fourth and fifth syntax elements.

The decoding unit may then start the first, second, and third threads one after another at a predetermined time interval, so that the three threads run in parallel.

The decoding unit may process each of the first, second, and third threads in units of macroblocks. It starts the second thread one time slot after the first thread starts, and starts the third thread one time slot after the second thread starts.

Within one time slot, the decoding unit may decode:

- two or more first syntax elements;
- one or more second syntax elements and one or more third syntax elements; and
- one or more fourth syntax elements and one or more fifth syntax elements.

Alternatively, the thread allocation unit may allocate four threads:

- a first thread to the decoding of the first syntax element;
- a second thread to the decoding of the second syntax element;
- a third thread to the decoding of the third syntax element; and
- a fourth thread to the decoding of the fourth and fifth syntax elements.

The decoding unit may then start the first to fourth threads one after another at a predetermined time interval, so that the four threads run in parallel.

The decoding unit may process each of the first to fourth threads in units of time slots. It starts the second thread one time slot after the first thread starts, the third thread one time slot after the second thread starts, and the fourth thread one time slot after the third thread starts.

Within one time slot, the decoding unit may decode:

- two or more first syntax elements;
- two or more second syntax elements;
- two or more third syntax elements; and
- one or more fourth syntax elements and one or more fifth syntax elements.

Another embodiment of the present invention provides an image decoding method for decoding an image encoded according to a syntax element partitioning technique. The method includes:

- allocating two or more threads to the decoding of a first syntax element included in an MBINFO group, a second syntax element included in a PRED group, and a third syntax element included in a CBP group, and allocating one thread to the decoding of a fourth syntax element included in a SIGMAP group and a fifth syntax element included in a COEFF group; and
- processing the two or more threads and the one thread in parallel according to a pipeline technique to decode the first, second, third, fourth, and fifth syntax elements.

The image decoding apparatus and method according to the present invention can efficiently decode an encoded image in parallel without large overhead.

FIGS. 1 to 4 are diagrams explaining the concept of decoding according to the H.264/AVC standard.
FIG. 5 is a block diagram showing the general configuration of an image decoding apparatus according to an embodiment of the present invention.
FIGS. 6 to 12 are diagrams explaining the decoding concept of an image decoding apparatus according to an embodiment of the present invention.
FIG. 13 is a flowchart showing the overall flow of an image decoding method according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments, so specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to those embodiments. The invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and scope. Like reference numerals denote like elements throughout the drawings.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIGS. 1 to 4 are diagrams explaining the concept of decoding according to the H.264/AVC standard. That concept is described below with reference to FIGS. 1 to 4.

H.264/AVC is a video compression standard developed jointly by ISO/IEC and ITU. It achieves a higher compression ratio than earlier video compression standards and is well suited to streaming over networks, so it is used in many multimedia applications.

An H.264/AVC decoder is divided by function into the following stages:

- entropy decoding (ED);
- inverse transformation (IT);
- inverse quantization (IQ);
- intra prediction (IP);
- motion compensation (MC); and
- a deblocking filter (DF).

Data generated according to the H.264/AVC standard has the structure shown in FIG. 1. An H.264/AVC video sequence consists of groups of pictures (GOPs), and each GOP consists of a plurality of pictures, or frames. The frames of a GOP fall into three types:

- I frames, which exploit spatial redundancy;
- P frames, which exploit temporal redundancy; and
- B frames, which use reference frames in both directions.

A picture consists of one or more slices, and a slice consists of a plurality of macroblocks. Each macroblock consists of a luma block and chroma blocks.

Data-level parallelization is a representative conventional method of decoding H.264/AVC data in parallel. It divides H.264/AVC data into units that can be processed in parallel. Depending on the data unit used, it is broadly classified into frame-level, slice-level, and macroblock-level parallelization.

Frame-level parallelization can improve parallel decoding performance only to a limited extent, because P and B frames depend on their reference frames. Slice-level parallelization has no data dependency between slices. It can, however, be applied only to images that were encoded with each frame divided into multiple slices.

Because of these problems, macroblock-level parallelization has been studied extensively. It performs parallel decoding by assigning macroblocks that can be processed at the same time to separate threads. Macroblocks, however, depend on one another as shown in FIG. 2. For example, before the "Current MB" can undergo IP processing, macroblocks 1, 2, 3, and 4 must first complete IP processing.
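The sketch below models this dependency. It assumes that the four macroblocks of FIG. 2 are the left, top-left, top, and top-right neighbours used by H.264/AVC intra prediction; the figure's exact numbering is not reproduced. The function name `intra_neighbors` is hypothetical.

```python
def intra_neighbors(x, y, mb_width):
    """Return the (x, y) coordinates of the macroblocks that MB(x, y) must
    wait for (left, top-left, top, top-right), clipped to the frame."""
    candidates = [(x - 1, y), (x - 1, y - 1), (x, y - 1), (x + 1, y - 1)]
    return [(cx, cy) for cx, cy in candidates
            if 0 <= cx < mb_width and cy >= 0]


# A macroblock inside a 4-MB-wide frame waits for all four neighbours.
# One on the right edge has no top-right neighbour.
print(intra_neighbors(1, 1, 4))  # [(0, 1), (0, 0), (1, 0), (2, 0)]
print(intra_neighbors(3, 1, 4))  # [(2, 1), (2, 0), (3, 0)]
```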

2D-wave parallelization is a representative macroblock-level method. As shown in FIG. 3, it assigns a thread to each arrow and processes the macroblocks along that arrow; in other words, threads are allocated row by row in the direction of the arrows. For example, in FIG. 3, MB(4,0), MB(2,1), and MB(0,2) are processed simultaneously at the fifth time step (T5) while their data dependencies are respected.
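The timing in FIG. 3 can be reproduced with a simple formula. This sketch assumes that each macroblock takes exactly one time slot and that the top-right neighbour is the latest-finishing dependency. The names `wave_slot` and `wavefronts` are hypothetical.

```python
from collections import defaultdict


def wave_slot(x, y):
    """1-based time slot of MB(x, y) in a 2D-wave schedule.

    MB(x, y) waits for its top-right neighbour MB(x+1, y-1), whose slot is
    (x + 1) + 2 * (y - 1) + 1 = x + 2y. Each row therefore lags the row above
    by two slots, and MB(x, y) runs one slot later, at x + 2y + 1.
    """
    return x + 2 * y + 1


def wavefronts(mb_width, mb_height):
    """Group the macroblocks of a frame by the slot in which they are decoded."""
    slots = defaultdict(list)
    for y in range(mb_height):
        for x in range(mb_width):
            slots[wave_slot(x, y)].append((x, y))
    return dict(slots)


print(wavefronts(5, 3)[5])  # [(4, 0), (2, 1), (0, 2)]
```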

The 3D-wave method extends 2D-wave parallelization. As shown in FIG. 4, it proceeds in three steps to apply data-level parallelism:

1. Entropy decoding is performed first, as a preceding step.
2. MC+IT/IQ and IP+IT/IQ are then processed in parallel.
3. Finally, DF is processed in parallel.

Entropy decoding is performed separately in advance because the ED stage cannot be parallelized in this way.

In particular, CABAC, with its high compression ratio, is increasingly used as video resolution rises. CABAC compresses well, but its high complexity makes entropy decoding time-consuming. Various studies have therefore sought to parallelize CABAC entropy decoding to reduce the time it takes.

However, an H.264/AVC encoder using CABAC performs binarization, context modeling, and binary arithmetic coding in sequence. This sequential processing makes it difficult to speed up CABAC through parallelization.

To address this problem, entropy slice parallelization was proposed for the next-generation standard following H.264/AVC. It resembles conventional slice-level parallelization, but the two differ in what is divided into slices:

- Slice-level parallelization divides a picture into several slices and decodes each slice independently.
- The entropy slice method divides only the CABAC part into slices for entropy decoding, then performs the remaining decoding on the reconstructed data.

The entropy slice method, however, has the drawback of reduced coding efficiency.

Accordingly, a CABAC parallelization method based on syntax element partitioning was proposed to improve both the coding efficiency and the parallelization performance of parallel CABAC. Syntax element partitioning divides syntax elements (SEs) into groups, called syntax element groups, and performs CABAC on each group. As shown in Table 1 below, the syntax elements are classified into an MBINFO group, a PRED group, a CBP group, a SIGMAP group, and a COEFF group, and are encoded accordingly.

For convenience of description, the following terms are used hereinafter:

- "first syntax element": a syntax element in the MBINFO group;
- "second syntax element": a syntax element in the PRED group;
- "third syntax element": a syntax element in the CBP group;
- "fourth syntax element": a syntax element in the SIGMAP group; and
- "fifth syntax element": a syntax element in the COEFF group.

Table 1
Group    Syntax elements
MBINFO   mb_skip_flag, mb_type, sub_mb_type, mb_field_decoding_flag, end_of_slice_flag
PRED     prev_intra4x4_pred_mode_flag, rem_intra4x4_pred_mode, prev_intra8x8_pred_mode_flag, rem_intra8x8_pred_mode, intra_chroma_pred_mode, ref_idx_l0, ref_idx_l1, mvd_l0, mvd_l1
CBP      transform_size_8x8_flag, coded_block_pattern, coded_block_flag
SIGMAP   significant_coeff_flag, last_significant_coeff_flag
COEFF    coeff_abs_level_minus1, coeff_sign_flag
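Table 1 can also be expressed as a lookup structure. The sketch below only shows how syntax elements are routed to their group; it does not perform CABAC. The `partition` helper and the sample values are hypothetical.

```python
# Table 1 as a lookup structure: syntax element group -> member syntax elements.
SE_GROUPS = {
    "MBINFO": ["mb_skip_flag", "mb_type", "sub_mb_type",
               "mb_field_decoding_flag", "end_of_slice_flag"],
    "PRED": ["prev_intra4x4_pred_mode_flag", "rem_intra4x4_pred_mode",
             "prev_intra8x8_pred_mode_flag", "rem_intra8x8_pred_mode",
             "intra_chroma_pred_mode", "ref_idx_l0", "ref_idx_l1",
             "mvd_l0", "mvd_l1"],
    "CBP": ["transform_size_8x8_flag", "coded_block_pattern", "coded_block_flag"],
    "SIGMAP": ["significant_coeff_flag", "last_significant_coeff_flag"],
    "COEFF": ["coeff_abs_level_minus1", "coeff_sign_flag"],
}

# Inverse map: syntax element -> group.
GROUP_OF = {se: group for group, members in SE_GROUPS.items() for se in members}


def partition(elements):
    """Split an ordered list of (name, value) syntax elements into one stream
    per group, keeping the original order inside each stream."""
    streams = {group: [] for group in SE_GROUPS}
    for name, value in elements:
        streams[GROUP_OF[name]].append((name, value))
    return streams


mb = [("mb_type", 1), ("mvd_l0", (2, -1)), ("coded_block_flag", 1),
      ("significant_coeff_flag", 1), ("coeff_abs_level_minus1", 0)]
print([len(s) for s in partition(mb).values()])  # [1, 1, 1, 1, 1]
```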

The image decoding apparatus according to the present invention decodes, in parallel, data that has been entropy-encoded with its syntax elements classified into the MBINFO, PRED, CBP, SIGMAP, and COEFF groups described above. An image decoding apparatus according to an embodiment of the present invention is described in detail below with reference to FIG. 5.

FIG. 5 is a block diagram showing the general configuration of an image decoding apparatus according to an embodiment of the present invention.

Referring to FIG. 5, the image decoding apparatus 500 according to an embodiment of the present invention includes a thread allocation unit 510 and a decoding unit 520. The function of each component is described in detail below.

The thread allocation unit 510 allocates a plurality of threads to the tasks of decoding data encoded (more precisely, entropy-encoded) according to the syntax element partitioning technique.

More specifically, the thread allocation unit 510 combines the MBINFO, PRED, CBP, SIGMAP, and COEFF groups described above. It bundles syntax element groups that can be decoded together into a decoding task, then allocates one thread to each resulting task, creating a plurality of threads.

The decoding unit 520 processes the allocated threads in parallel according to a pipeline technique to decode the syntax elements of each syntax element group. In other words, the decoding unit 520 starts the threads at predetermined time intervals and decodes in parallel.

The following paragraphs refer to FIGS. 6 to 12. They explain in more detail why groups that can be decoded together are bundled into a single decoding task with its own thread and processed in parallel using a pipeline technique. They also describe the effects of this approach.

First, FIG. 6 shows an example of syntax elements classified into a plurality of groups by the syntax element partitioning technique. As described above, the syntax elements are classified into the MBINFO, PRED, CBP, SIGMAP, and COEFF groups. In theory, then, an image encoded with syntax element partitioning could be decoded by allocating one thread to each of the five syntax element groups and running them in parallel.

However, there are data dependencies between the syntax element groups, as shown in FIG. 7. When an image is encoded with syntax element partitioning, these dependencies arise from how tightly the data are coupled by the order in which they are encoded.

Referring to FIG. 7, the decoding order is constrained as follows:

- The second syntax elements in the PRED group must be decoded after the first syntax elements in the MBINFO group.
- The fourth syntax elements in the SIGMAP group must be decoded after the second syntax elements in the PRED group and the third syntax elements in the CBP group.
- The fifth syntax elements in the COEFF group must be decoded after the fourth syntax elements in the SIGMAP group.
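These ordering constraints form a small dependency graph. The sketch below encodes only the edges stated above; FIG. 7 itself is not reproduced, so any further edges it may contain are not modelled. `DEPENDS_ON` and `earliest_stage` are hypothetical names.

```python
from functools import lru_cache

# Group -> groups that must be decoded first (only the edges stated in the text).
DEPENDS_ON = {
    "MBINFO": (),
    "CBP": (),
    "PRED": ("MBINFO",),
    "SIGMAP": ("PRED", "CBP"),
    "COEFF": ("SIGMAP",),
}


@lru_cache(maxsize=None)
def earliest_stage(group):
    """Length of the longest dependency chain ending at `group` (0 = no deps)."""
    deps = DEPENDS_ON[group]
    return 0 if not deps else 1 + max(earliest_stage(d) for d in deps)


print({g: earliest_stage(g) for g in DEPENDS_ON})
# {'MBINFO': 0, 'CBP': 0, 'PRED': 1, 'SIGMAP': 2, 'COEFF': 3}
```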

To parallelize CABAC decoding while respecting these data dependencies, one thread can be allocated to each group, as shown in FIG. 8. A first thread (Task 1) through a fifth thread (Task 5) are assigned in sequence to the MBINFO, CBP, PRED, SIGMAP, and COEFF groups, respectively. The five threads are then started at predetermined time intervals, so that a multi-core system decodes in parallel as a five-task pipeline.
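The idealized timing of FIG. 8 can be sketched as follows. The sketch assumes that every task finishes one macroblock in exactly one time slot; the following paragraphs explain why this assumption does not hold well for SIGMAP and COEFF. `pipeline_schedule` is a hypothetical helper.

```python
def pipeline_schedule(stages, num_mbs):
    """Return {slot: [(stage, mb), ...]} for a linear pipeline in which
    stage k starts k slots after stage 0 and every stage handles exactly one
    macroblock per slot (slots are 0-based)."""
    schedule = {}
    for k, stage in enumerate(stages):
        for mb in range(num_mbs):
            schedule.setdefault(mb + k, []).append((stage, mb))
    return schedule


five_tasks = ["MBINFO", "CBP", "PRED", "SIGMAP", "COEFF"]
sched = pipeline_schedule(five_tasks, num_mbs=3)
print(len(sched))  # 7 slots = 3 macroblocks + 5 stages - 1
print(sched[4])    # [('PRED', 2), ('SIGMAP', 1), ('COEFF', 0)]
```

The same helper describes the three-thread and four-thread variants summarised earlier if it is given three or four stages instead of five.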

The residual data encoded in the SIGMAP and COEFF groups, however, are tightly linked, as shown in FIG. 9. When the coded block flag (coded_block_flag) is 1, the SIGMAP flags significant_coeff_flag and last_significant_coeff_flag are encoded first. The COEFF level information, coeff_abs_level_minus1 and coeff_sign_flag, is encoded after them. This encoding process creates a strong data coupling between the fourth syntax elements in the SIGMAP group and the fifth syntax elements in the COEFF group.
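The sketch below shows why the COEFF data cannot be interpreted without the SIGMAP data: the number and positions of the levels are known only after the block's significance map has been decoded. The sketch is simplified and hypothetical:

- The flags are given as plain lists rather than CABAC-decoded bins.
- The implicit handling of the final scan position is ignored.
- Levels are placed in reverse scan order, as in H.264/AVC CABAC residual coding.

```python
def significant_positions(sig_flags, last_flags):
    """SIGMAP part: scan the significance map and return the scan positions
    of non-zero coefficients, stopping at the last significant one."""
    positions = []
    for i, sig in enumerate(sig_flags):
        if sig:
            positions.append(i)
            if last_flags[i]:
                break
    return positions


def rebuild_block(coded_block_flag, sig_flags, last_flags, abs_minus1, sign_flags):
    """COEFF part: place one level per significant position. Levels arrive in
    reverse scan order, so the SIGMAP result is needed before any level can
    be placed, or even counted."""
    coeffs = [0] * len(sig_flags)
    if not coded_block_flag:      # empty block: no SIGMAP or COEFF data at all
        return coeffs
    positions = significant_positions(sig_flags, last_flags)
    if not (len(abs_minus1) == len(sign_flags) == len(positions)):
        raise ValueError("number of levels must match the significance map")
    for pos, a, s in zip(reversed(positions), abs_minus1, sign_flags):
        coeffs[pos] = -(a + 1) if s else a + 1
    return coeffs


print(rebuild_block(1, [1, 0, 1, 1, 0, 0], [0, 0, 0, 1, 0, 0],
                    abs_minus1=[0, 1, 2], sign_flags=[0, 1, 0]))
# [3, 0, -2, 1, 0, 0]
```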

Suppose a separate thread is allocated to each syntax element group and decoding is pipelined as shown in FIG. 8. Large overhead then arises when decoding the fourth syntax elements of the SIGMAP group (the fourth thread) and the fifth syntax elements of the COEFF group (the fifth thread). This greatly reduces the efficiency of pipelined parallel decoding.

The image decoding apparatus 500 according to an embodiment of the present invention avoids this overhead. It defines the decoding of the fourth syntax elements in the SIGMAP group and of the fifth syntax elements in the COEFF group as a single decoding task. It allocates one thread to that task and decodes it in parallel with the other threads.

More specifically, according to an embodiment of the present invention, the thread allocation unit 510 allocates threads as follows:

- two or more threads to the decoding of the first syntax elements in the MBINFO group, the second syntax elements in the PRED group, and the third syntax elements in the CBP group; and
- one thread to the decoding of the fourth syntax elements in the SIGMAP group and the fifth syntax elements in the COEFF group.

In other words, the thread allocation unit 510 combines the MBINFO, PRED, and CBP groups into two or more decoding tasks and allocates a thread to each. It defines the decoding of the SIGMAP and COEFF groups as a single decoding task and allocates one thread to it.
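The allocation rule can be stated as a check on a candidate thread layout. This is a hypothetical sketch: `is_valid_layout` is not part of the described apparatus and simply encodes the two conditions above. The two valid layouts shown correspond to the three-thread and four-thread variants summarised earlier.

```python
RESIDUAL = {"SIGMAP", "COEFF"}
ALL_GROUPS = {"MBINFO", "PRED", "CBP"} | RESIDUAL


def is_valid_layout(tasks):
    """Check a thread layout (a list of tasks, each a list of group names)
    against the allocation rule described above."""
    flat = [g for task in tasks for g in task]
    if len(flat) != len(ALL_GROUPS) or set(flat) != ALL_GROUPS:
        return False                      # every group exactly once
    residual_tasks = [t for t in tasks if set(t) & RESIDUAL]
    if len(residual_tasks) != 1 or set(residual_tasks[0]) != RESIDUAL:
        return False                      # SIGMAP + COEFF share one thread, alone
    front_tasks = [t for t in tasks if t and not set(t) & RESIDUAL]
    return len(front_tasks) >= 2          # MBINFO/PRED/CBP use two or more threads


first_embodiment = [["MBINFO"], ["PRED", "CBP"], ["SIGMAP", "COEFF"]]
four_threads = [["MBINFO"], ["PRED"], ["CBP"], ["SIGMAP", "COEFF"]]
one_per_group = [["MBINFO"], ["CBP"], ["PRED"], ["SIGMAP"], ["COEFF"]]
print(is_valid_layout(first_embodiment), is_valid_layout(four_threads),
      is_valid_layout(one_per_group))  # True True False
```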

In this case, the decoding unit 520 processes the two or more threads and the one thread in parallel according to a pipeline technique to decode the first, second, third, fourth, and fifth syntax elements.

According to the first embodiment of the present invention, the thread allocator 510 may allocate a first thread to the decoding of the first syntax element, a second thread to the decoding of the second and third syntax elements, and a third thread to the decoding of the fourth and fifth syntax elements. In this case, the decoding unit 520 may start processing the first, second, and third threads one after another, separated by a predetermined time gap, so that the three threads run in parallel to perform decoding.

In other words, according to the first embodiment of the present invention, as shown in FIG. 10, the image decoding apparatus 500 allocates a first thread (Thread 1) to the decoding of the first syntax element included in the MBINFO group, a second thread (Thread 2) to the decoding of the second syntax element included in the PRED group and the third syntax element included in the CBP group, and a third thread (Thread 3) to the decoding of the fourth syntax element included in the SIGMAP group and the fifth syntax element included in the COEFF group. It then starts processing the first, second, and third threads one after another, separated by a predetermined time gap (i.e., one time slot), and processes the three threads in parallel to perform decoding according to the pipeline technique.

Here, according to an embodiment of the present invention, as shown in FIG. 10, the decoding unit 520 may process each of the first, second, and third threads in units of time slots. In this case, the decoding unit 520 starts processing the second thread one time slot after processing of the first thread starts, and starts processing the third thread one time slot after processing of the second thread starts.
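The one-slot staggered start can be tabulated with a small helper. It assumes, for illustration only, that each thread handles one macroblock per time slot, and `pipeline_schedule` is a hypothetical name. The same helper covers the three-thread layout of FIG. 10 and the four-thread layout of FIG. 11.

```python
def pipeline_schedule(n_threads, n_units):
    """Map each time slot to the (thread, unit) pairs active in it.

    Thread k (1-based) starts one slot after thread k-1, so in slot t it
    works on unit t - (k - 1). A "unit" is whatever one thread handles per
    slot; one macroblock is assumed here.
    """
    n_slots = n_units + n_threads - 1  # pipeline fill + steady state + drain
    schedule = []
    for t in range(n_slots):
        active = []
        for k in range(1, n_threads + 1):
            unit = t - (k - 1)
            if 0 <= unit < n_units:
                active.append((k, unit))
        schedule.append(active)
    return schedule


for t, active in enumerate(pipeline_schedule(n_threads=3, n_units=4)):
    print(f"slot {t}: " + ", ".join(f"T{k}->MB{u}" for k, u in active))
```

With three threads and four macroblocks this prints six slots, from `slot 0: T1->MB0` to `slot 5: T3->MB3`. Slot 2 (`T1->MB2, T2->MB1, T3->MB0`) is the first slot in which all three threads are busy.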

Meanwhile, when decoding is performed according to the pipeline technique described above, processing only one syntax element per time slot increases the number of synchronizations, which may slow down overall decoding.

Therefore, according to an embodiment of the present invention, the decoding unit 520 may process multiple syntax elements within one time slot for each thread, reducing the number of synchronizations and preventing a drop in decoding speed. In other words, the decoding unit 520 may decode two or more first syntax elements within one time slot, one or more second syntax elements and one or more third syntax elements within one time slot, and one or more fourth syntax elements and one or more fifth syntax elements within one time slot.
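A minimal sketch of slot-synchronized pipelining with several syntax elements per slot follows, using Python's `threading.Barrier` as the per-slot synchronization. The toy stage functions, the batching of macroblocks into groups of `batch_size`, and all names are assumptions for illustration. Because of CPython's global interpreter lock, the sketch shows the synchronization structure rather than a real speedup.

```python
import threading


def run_pipeline(stages, items, batch_size):
    """Run len(stages) threads in a slot-synchronized pipeline.

    Stage k handles batch b in slot b + k, and all threads meet at a
    barrier at the end of every slot, so stage k never starts a batch
    before stage k-1 has finished it. Returns the number of slots, which
    equals the number of barrier synchronizations per thread.
    """
    batches = [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
    n_stages = len(stages)
    n_slots = len(batches) + n_stages - 1  # fill + steady state + drain
    barrier = threading.Barrier(n_stages)

    def worker(k):
        for slot in range(n_slots):
            b = slot - k  # stage k lags stage 0 by k slots
            if 0 <= b < len(batches):
                for item in batches[b]:
                    stages[k](item)
            barrier.wait()  # end of time slot: one synchronization

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(n_stages)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return n_slots


def demo(batch_size, n_mbs=8):
    """Hypothetical three-thread layout of the first embodiment."""
    traces = {mb: [] for mb in range(n_mbs)}
    names = ["MBINFO", "PRED+CBP", "SIGMAP+COEFF"]
    # Each toy stage only records that it has "decoded" its part of a macroblock.
    stages = [lambda mb, n=n: traces[mb].append(n) for n in names]
    slots = run_pipeline(stages, list(range(n_mbs)), batch_size)
    return slots, traces


print(demo(batch_size=1)[0])  # 10
print(demo(batch_size=4)[0])  # 4
```

Consider 8 macroblocks on three threads. Processing one macroblock per slot needs 8 + 3 - 1 = 10 barrier synchronizations per thread, while four per slot needs 2 + 3 - 1 = 4. In general the count is ceil(N / batch_size) + threads - 1, which is the reduction the paragraph above describes.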

Further, according to the second embodiment of the present invention, the thread allocator 510 may allocate a first thread to the decoding of the first syntax element, a second thread to the decoding of the second syntax element, a third thread to the decoding of the third syntax element, and a fourth thread to the decoding of the fourth and fifth syntax elements. In this case, the decoding unit 520 may start processing the first, second, third, and fourth threads one after another, separated by a predetermined time gap, so that the four threads run in parallel to perform decoding.

In other words, according to the second embodiment of the present invention, as shown in FIG. 11, the image decoding apparatus 500 allocates a first thread (Thread 1) to the decoding of the first syntax element included in the MBINFO group, a second thread (Thread 2) to the decoding of the second syntax element included in the PRED group, a third thread (Thread 3) to the decoding of the third syntax element included in the CBP group, and a fourth thread (Thread 4) to the decoding of the fourth syntax element included in the SIGMAP group and the fifth syntax element included in the COEFF group. It then starts processing the first, second, third, and fourth threads one after another, separated by a predetermined time gap, and processes the four threads in parallel to perform decoding according to the pipeline technique.

Here, according to an embodiment of the present invention, the decoding unit 520 may, as described above, process each of the first, second, third, and fourth threads in units of time slots. In this case, the decoding unit 520 starts processing the second thread one time slot after processing of the first thread starts, starts processing the third thread one time slot after processing of the second thread starts, and starts processing the fourth thread one time slot after processing of the third thread starts.

Further, according to an embodiment of the present invention, as described above, the decoding unit 520 may decode two or more first syntax elements within one time slot, two or more second syntax elements within one time slot, two or more third syntax elements within one time slot, and one or more fourth syntax elements and one or more fifth syntax elements within one time slot. This avoids the slowdown in overall decoding that results from processing only one syntax element per time slot and thereby increasing the number of synchronizations.

Accordingly, the image decoding apparatus 500 according to the present invention can efficiently decode image data in parallel without significant overhead.

The results of a parallel decoding simulation using the image decoding apparatus 500 according to the present invention are described below with reference to Table 2 and FIG. 12.

The simulation used the KTA 2.7 decoder. KTA (Key Technology Area) is a series of activities run by ITU-T SG16/Q6 VCEG (Video Coding Experts Group) to identify key technologies in preparation for standardizing H.264+ or H.265, the successor to H.264. The test sequences were encoded with the KTA 2.7 encoder using the encoding profiles provided with KTA 2.7. The operating system was Linux Ubuntu 9.10 with kernel version 2.6.31, the CPU was an Intel quad-core i5 processor, the compiler was gcc v4.4.1, and parallelization was implemented with OpenMP [21,22].

Table 2 below shows the decoding time results of the image decoding apparatus 500 according to an embodiment of the present invention, and FIG. 12 shows Table 2 in graph form.

Table 2
Sequence          Resolution        Before MT-SEP   MT-SEP(3)        MT-SEP(4)        MT-SEP(5)
                                    Time (μm)       Time (μm)  %     Time (μm)  %     Time (μm)  %
mobcal            HD, 1280×720      42976           29947      30%   28481      34%   28481      13%
stockholm         HD, 1280×720      57066           44106      23%   43701      23%   43701       0%
shields           HD, 1280×720      36418           25091      31%   21804      40%   21804      22%
blue_sky          FHD, 1920×1088    60446           41091      32%   26924      55%   38707      36%
pedestrian_area   FHD, 1920×1088    57079           39180      31%   27768      51%   37608      34%
sunflower         FHD, 1920×1088    61322           38752      37%   27589      55%   38431      37%
rush_hour         FHD, 1920×1088    52754           35706      32%   24734      53%   31552      40%

More specifically, in Table 2 and FIG. 12, "MT-SEP (Multi-Threaded Syntax Element Partitioning)(3)" corresponds to the image decoding apparatus 500 according to the first embodiment of the present invention, which allocates three threads to the decoding tasks. "MT-SEP(4)" corresponds to the image decoding apparatus 500 according to the second embodiment, which allocates four threads. The "MT-SEP(5)" results are for image decoding performed with five threads allocated to the decoding tasks, i.e., one thread for each of the five syntax element groups.

Table 2 and FIG. 12 show that decoding time increases when the decoding of the fourth syntax element included in the SIGMAP group and the decoding of the fifth syntax element included in the COEFF group are split into separate threads, as described above (compare MT-SEP(5) with MT-SEP(4)).

Table 2 and FIG. 12 also show that decoding with the image decoding apparatus 500 according to the first and second embodiments of the present invention improves decoding performance by about 23% to 55%. In particular, decoding with the image decoding apparatus 500 according to the second embodiment improves CABAC decoding performance by up to 55%.

FIG. 13 is a flowchart illustrating the overall flow of an image decoding method according to an embodiment of the present invention. Each step is described below.

First, in step S1310, two or more threads are allocated to the decoding of the first syntax element included in the MBINFO group, the second syntax element included in the PRED group, and the third syntax element included in the CBP group. One thread is allocated to the decoding of the fourth syntax element included in the SIGMAP group and the fifth syntax element included in the COEFF group.

Then, in step S1320, the two or more threads and the one thread are processed in parallel according to the pipeline technique to decode the first, second, third, fourth, and fifth syntax elements.

According to an embodiment of the present invention, in step S1310, a first thread may be allocated to the decoding of the first syntax element, a second thread to the decoding of the second and third syntax elements, and a third thread to the decoding of the fourth and fifth syntax elements. In this case, in step S1320, processing of the first, second, and third threads may be started one after another, separated by a predetermined time gap, so that the three threads are processed in parallel.

According to another embodiment of the present invention, in step S1310, a first thread may be allocated to the decoding of the first syntax element, a second thread to the decoding of the second syntax element, a third thread to the decoding of the third syntax element, and a fourth thread to the decoding of the fourth and fifth syntax elements. In this case, in step S1320, processing of the first, second, third, and fourth threads may be started one after another, separated by a predetermined time gap, so that the four threads are processed in parallel.

The embodiments of the image decoding method according to the present invention have been described above. The configuration of the image decoding apparatus 500 described with reference to FIG. 5 also applies to this embodiment, so a more detailed description is omitted.

In addition, the embodiments of the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include the following:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments of the present invention, and vice versa.

As described above, the present invention has been described with reference to specific matters such as particular components, limited embodiments, and drawings. These are provided only to aid an overall understanding of the present invention, and the present invention is not limited to the above embodiments. Those of ordinary skill in the art to which the present invention pertains may make various modifications and variations from this description. Therefore, the spirit of the present invention should not be limited to the described embodiments. Not only the following claims but also everything equal or equivalent to the claims falls within the scope of the spirit of the present invention.

Claims

An image decoding apparatus for decoding an image encoded according to a syntax element partitioning technique, the apparatus comprising:
a thread allocator that allocates two or more threads to the decoding of a first syntax element included in an MBINFO group, a second syntax element included in a PRED group, and a third syntax element included in a CBP group, and allocates one thread to the decoding of a fourth syntax element included in a SIGMAP group and a fifth syntax element included in a COEFF group; and
a decoding unit that processes the two or more threads and the one thread in parallel according to a pipeline technique to decode the first syntax element, the second syntax element, the third syntax element, the fourth syntax element, and the fifth syntax element.

The image decoding apparatus of claim 1, wherein
the thread allocator allocates a first thread to the decoding of the first syntax element, a second thread to the decoding of the second syntax element and the third syntax element, and a third thread to the decoding of the fourth syntax element and the fifth syntax element, and
the decoding unit sequentially starts processing of the first thread, processing of the second thread, and processing of the third thread, separated by a predetermined time gap, thereby processing the first thread, the second thread, and the third thread in parallel.

The image decoding apparatus of claim 2, wherein
the decoding unit processes each of the first thread, the second thread, and the third thread in units of time slots, starts processing of the second thread one time slot after processing of the first thread starts, and starts processing of the third thread one time slot after processing of the second thread starts.

The image decoding apparatus of claim 3, wherein
the decoding unit decodes two or more of the first syntax elements in one time slot, decodes one or more of the second syntax elements and one or more of the third syntax elements in one time slot, and decodes one or more of the fourth syntax elements and one or more of the fifth syntax elements in one time slot.

The image decoding apparatus of claim 1, wherein
the thread allocator allocates a first thread to the decoding of the first syntax element, a second thread to the decoding of the second syntax element, a third thread to the decoding of the third syntax element, and a fourth thread to the decoding of the fourth syntax element and the fifth syntax element, and
the decoding unit sequentially starts processing of the first thread, processing of the second thread, processing of the third thread, and processing of the fourth thread, separated by a predetermined time gap, thereby processing the first thread, the second thread, the third thread, and the fourth thread in parallel.

The image decoding apparatus of claim 5, wherein
the decoding unit processes each of the first thread, the second thread, the third thread, and the fourth thread in units of time slots, starts processing of the second thread one time slot after processing of the first thread starts, starts processing of the third thread one time slot after processing of the second thread starts, and starts processing of the fourth thread one time slot after processing of the third thread starts.

The image decoding apparatus of claim 6, wherein
the decoding unit decodes two or more of the first syntax elements in one time slot, decodes two or more of the second syntax elements in one time slot, decodes two or more of the third syntax elements in one time slot, and decodes one or more of the fourth syntax elements and one or more of the fifth syntax elements in one time slot.

An image decoding method for decoding an image encoded according to a syntax element partitioning technique, the method comprising:
allocating two or more threads to the decoding of a first syntax element included in an MBINFO group, a second syntax element included in a PRED group, and a third syntax element included in a CBP group, and allocating one thread to the decoding of a fourth syntax element included in a SIGMAP group and a fifth syntax element included in a COEFF group; and
decoding the first syntax element, the second syntax element, the third syntax element, the fourth syntax element, and the fifth syntax element by processing the two or more threads and the one thread in parallel according to a pipeline technique.

The image decoding method of claim 8, wherein
the allocating of the threads comprises allocating a first thread to the decoding of the first syntax element, a second thread to the decoding of the second syntax element and the third syntax element, and a third thread to the decoding of the fourth syntax element and the fifth syntax element, and
the decoding comprises sequentially starting processing of the first thread, processing of the second thread, and processing of the third thread, separated by a predetermined time gap, thereby processing the first thread, the second thread, and the third thread in parallel.

The image decoding method of claim 8, wherein
the allocating of the threads comprises allocating a first thread to the decoding of the first syntax element, a second thread to the decoding of the second syntax element, a third thread to the decoding of the third syntax element, and a fourth thread to the decoding of the fourth syntax element and the fifth syntax element, and
the decoding comprises sequentially starting processing of the first thread, processing of the second thread, processing of the third thread, and processing of the fourth thread, separated by a predetermined time gap, thereby processing the first thread, the second thread, the third thread, and the fourth thread in parallel.