KR20110122658A

KR20110122658A - Video decoder and method for video decoding using multi-thread

Info

Publication number: KR20110122658A
Application number: KR1020110100739A
Authority: KR
Inventors: 김원진; 정기석
Original assignee: 한양대학교 산학협력단
Priority date: 2011-10-04
Filing date: 2011-10-04
Publication date: 2011-11-10

Abstract

PURPOSE: A multi-thread based video decoding method and apparatus are provided that improve decoding performance while satisfying the data dependencies of the H.264/AVC standard. CONSTITUTION: A processing module (200) generates a (4-1)-th thread (241) to perform a first process on a fourth region, a (4-2)-th thread (242) to perform a third process on a second region, and a (4-3)-th thread (243) to perform a second process on a third region. Reusing the memory banks (310, 320, 330) reduces the memory required for multi-thread based video decoding.

Description

Video decoder and method for video decoding using multi-thread

The present invention relates to a multi-threaded video decoder and decoding method.

As information and communication technologies, including the Internet, have developed, video communication has increased alongside text and voice communication. Conventional text-oriented communication cannot satisfy the varied demands of consumers, so multimedia services that can carry text, video, music, and other forms of information are growing. Multimedia data is voluminous: it requires large-capacity storage media and wide bandwidth for transmission. Compression coding is therefore essential for transmitting multimedia data that includes text, video, and audio.

The basic principle of data compression is the removal of redundancy in the data. Because video data is especially large, efficient compression matters more for it than for other kinds of multimedia data.

Video compression works by removing spatial redundancy, such as the same color or object repeating within a single picture (frame); temporal redundancy, such as adjacent frames of a moving picture that barely change or the same sound repeating in audio; and perceptual redundancy, which exploits the insensitivity of human vision and perception to high frequencies. In a typical video coding method, temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by a spatial transform.

Conventionally, video coding and decoding were generally performed by a single-core processor. Recently, however, as powerful multi-core processors have become widespread, they are increasingly used in fields that consume substantial system resources, such as video coding and decoding.

The technical problem to be solved by the present invention is to provide a multi-threaded video decoding method with improved processing performance.

Another technical problem to be solved by the present invention is to provide a multi-threaded video decoder with improved processing performance.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

One aspect of the multi-threaded video decoding method of the present invention for achieving the above technical problem includes: dividing a video frame into first to N-th regions; in a first stage, generating a (1-1)-th thread to perform a first process on the first region; in a second stage, generating a (2-1)-th thread to perform a second process on the first region and a (2-2)-th thread to perform the first process on the second region; and in a third stage, generating a (3-1)-th thread to perform a third process on the first region, a (3-2)-th thread to perform the second process on the second region, and a (3-3)-th thread to perform the first process on the third region.

One aspect of the multi-threaded video decoder of the present invention for achieving the other technical problem includes an input module that receives a video frame including first to N-th regions, and a processing module that generates a plurality of threads to perform decoding on the video frame. The processing module, in a first stage, generates a (1-1)-th thread to perform a first process on the first region; in a second stage, generates a (2-1)-th thread to perform a second process on the first region and a (2-2)-th thread to perform the first process on the second region; and in a third stage, generates a (3-1)-th thread to perform a third process on the first region, a (3-2)-th thread to perform the second process on the second region, and a (3-3)-th thread to perform the first process on the third region.

Specific details of other embodiments are included in the detailed description and the drawings.

With the multi-threaded video decoder and decoding method according to an embodiment of the present invention, decoding performance can be improved while satisfying the data dependencies of the H.264/AVC standard. In addition, reusing memory banks during decoding allows memory to be used efficiently.

FIG. 1 is a flowchart of video decoding according to the H.264/AVC standard.
FIG. 2 is a diagram illustrating dependencies between macroblocks in video decoding according to the H.264/AVC standard.
FIGS. 3 to 6 are diagrams for describing a multi-threaded video decoding method according to an embodiment of the present invention.
FIGS. 7 and 8 are diagrams for describing the performance of the multi-threaded video decoding method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not limited to the embodiments disclosed below, however, and may be implemented in various different forms. The embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims.

The sizes and relative sizes of components shown in the drawings may be exaggerated for clarity. Like reference numerals refer to like elements throughout the specification, and "and/or" includes each of the mentioned items and every combination of one or more of them.

Unless otherwise defined, all terms used in this specification (including technical and scientific terms) have the meanings commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless expressly and specifically defined.

Hereinafter, a multi-threaded video decoding method according to an embodiment of the present invention will be described with reference to FIGS. 1 to 6.

The following description uses decoding according to the H.264/AVC standard as an example, but the present invention is not limited thereto. Likewise, this specification describes a video frame that includes a plurality of macroblocks (hereinafter, MBs) as an example, but the present invention is not limited thereto.

First, the workflow for decoding a video frame encoded according to the H.264/AVC standard is described with reference to FIG. 1.

FIG. 1 is a flowchart of video decoding according to the H.264/AVC standard.

Referring to FIG. 1, when a bit stream constituting a video frame is provided, it is entropy decoded (Entropy Decoding, hereinafter ED) (10). Specifically, in H.264/AVC the bit stream is input in Network Abstraction Layer (NAL) units, and the ED (10) generates coefficients from it. The coefficients generated by entropy decoding then pass through inverse quantization (hereinafter IQ) (20) and inverse transformation (hereinafter IT) (30) to produce residual data.

Meanwhile, the coefficients generated by entropy decoding may also be used to generate prediction data. Intra prediction (hereinafter IP) (40) performs prediction within a picture using spatial redundancy, and motion compensation (hereinafter MC) (50) performs prediction between pictures using temporal redundancy. Depending on the type of the MB, either IP or MC is applied.

The block produced by IP or MC may be combined with the residual data produced by IQ/IT (60). Because the decoded picture may exhibit blocking artifacts, in which the boundaries between blocks are clearly visible, a deblocking filter (hereinafter DF) (70) that smooths block boundaries may be applied to remove them. The DF of H.264/AVC may use an adaptive method that is applied while controlling the weight values.
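The combining step (60) can be illustrated with a small sketch. It assumes 8-bit samples and clipping of each sum to the valid sample range; the function name `reconstruct_block` and its parameters are hypothetical and do not come from the patent.

```python
def reconstruct_block(prediction, residual, bit_depth=8):
    """Add residual samples to predicted samples and clip to the valid range.

    `prediction` comes from IP or MC, `residual` from IQ/IT.
    Both are lists of rows with identical shape (an assumption of this sketch).
    """
    max_value = (1 << bit_depth) - 1
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]


prediction = [[100, 250], [3, 128]]
residual = [[5, 10], [-7, 0]]
reconstructed = reconstruct_block(prediction, residual)
# 250 + 10 is clipped to 255 and 3 - 7 is clipped to 0:
# reconstructed == [[105, 255], [0, 128]]
```

The deblocking filter (70) would then operate on the reconstructed samples; it is omitted here because its adaptive weights are not specified in this text.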

Next, data dependencies between MBs in decoding according to the H.264/AVC standard are described with reference to FIG. 2.

FIG. 2 is a diagram illustrating dependencies between macroblocks in video decoding according to the H.264/AVC standard.

Referring to FIG. 2, before the IP, MC, or DF of the MB currently being processed can be performed, the corresponding IP, MC, or DF of its neighboring MBs must already be complete. Specifically, for the IP and MC of the current MB, the IP and MC of the upper-left, upper, and upper-right MBs (1, 2, 3) and the left MB (4) must have been processed. For the DF of the current MB, the DF of the upper MB (2) and the left MB (4) must have been processed. As a result, processing each MB requires checking whether the MBs it depends on have been processed, which can adversely affect processing speed. This is described in more detail below.
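The dependency pattern of FIG. 2 can be written down as a lookup of neighbor offsets. The following is a minimal sketch, assuming MBs are addressed by (column, row) coordinates with the origin at the top-left of the frame; `required_neighbors` and its parameters are hypothetical names introduced for illustration.

```python
def required_neighbors(mb_x, mb_y, width_in_mbs, process):
    """Return the MBs that must be finished before (mb_x, mb_y) can run `process`."""
    if process in ("IP", "MC"):
        # Upper-left (1), upper (2), upper-right (3), and left (4) in FIG. 2.
        offsets = [(-1, -1), (0, -1), (1, -1), (-1, 0)]
    elif process == "DF":
        # Upper (2) and left (4) in FIG. 2.
        offsets = [(0, -1), (-1, 0)]
    else:
        raise ValueError(f"unknown process: {process}")

    neighbors = []
    for dx, dy in offsets:
        x, y = mb_x + dx, mb_y + dy
        # Neighbors outside the frame do not exist and impose no dependency.
        if 0 <= x < width_in_mbs and y >= 0:
            neighbors.append((x, y))
    return neighbors


print(required_neighbors(1, 1, 4, "IP"))  # [(0, 0), (1, 0), (2, 0), (0, 1)]
print(required_neighbors(0, 0, 4, "DF"))  # []
```

Every dependency points up or to the left, which is why processing regions in top-to-bottom order for a given process can satisfy them without per-MB checks across regions.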

Next, a multi-threaded video decoding method according to an embodiment of the present invention is described with reference to FIGS. 3 to 6.

FIGS. 3 to 6 are diagrams for describing a multi-threaded video decoding method according to an embodiment of the present invention.

First, referring to FIG. 3, a video frame is divided into first to N-th regions. As described above, the video frame may be a video frame encoded according to the H.264/AVC standard.

Each of the first to N-th regions may include a plurality of MBs (110). Referring to FIGS. 3 to 6 together, the multi-threaded video decoding method according to an embodiment of the present invention is illustrated with the video frame (100) divided into first to fourth regions of 16 MBs each, but the present invention is not limited thereto. The number of regions may be much larger if necessary. Although the first to fourth regions are illustrated as each containing the same number of MBs (110), 16, this is only an example: the number of MBs (110) may vary, and the first to fourth regions may contain different numbers of MBs (110) if necessary. Finally, FIGS. 3 to 6 show the video frame (100) divided into four equal parts in the horizontal direction, but the present invention is not limited thereto either.
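One way to picture the division is as a split of MB rows into contiguous bands from top to bottom. The sketch below assumes that layout and assumes the 64-MB illustration is an 8 x 8 grid of MBs; neither detail is stated explicitly in the text, and `split_into_regions` is a hypothetical helper.

```python
def split_into_regions(height_in_mbs, num_regions):
    """Split MB rows 0..height_in_mbs-1 into contiguous bands, top to bottom.

    When the rows do not divide evenly, the first bands get one extra row,
    so regions may contain different numbers of MBs.
    """
    if not 1 <= num_regions <= height_in_mbs:
        raise ValueError("need between 1 and height_in_mbs regions")
    base, extra = divmod(height_in_mbs, num_regions)
    regions = []
    start = 0
    for index in range(num_regions):
        rows = base + (1 if index < extra else 0)
        regions.append(range(start, start + rows))
        start += rows
    return regions


# 8 MB rows in 4 regions: 2 rows each, i.e. 16 MBs per region for an 8-MB-wide frame.
regions = split_into_regions(8, 4)
print([list(r) for r in regions])  # [[0, 1], [2, 3], [4, 5], [6, 7]]
```

A 1920x1088 frame has 68 MB rows of 16 pixels, so the same call with `split_into_regions(68, 4)` would give four bands of 17 rows.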

Referring again to FIG. 3, in a first stage (Stage 1), a (1-1)-th thread (211) is generated to perform a first process on the first region. Here, the first process may be ED. Once the (1-1)-th thread (211) has completed ED for the entire first region, the result may be stored in a first memory bank (310). As described later, this allows another thread to use the ED result of the first region for a different process.

Next, referring to FIG. 4, in a second stage (Stage 2), a (2-1)-th thread (221) is generated to perform a second process on the first region, and a (2-2)-th thread (222) is generated to perform the first process on the second region. Here, the first process may be ED, and the second process may be MC+IQ/IT or IP+IQ/IT; either MC or IP is performed depending on the type of the MB.

The (2-1)-th thread (221) may perform MC+IQ/IT or IP+IQ/IT processing on the first region using the result, stored in the first memory bank (310), of the ED performed on the first region by the (1-1)-th thread (211 in FIG. 3). The result of this processing may be stored back in the first memory bank (310). The (2-2)-th thread (222) may perform the first process on the second region and store the result in a second memory bank (320).

Next, referring to FIG. 5, in a third stage (Stage 3), a (3-1)-th thread (231) is generated to perform a third process on the first region, a (3-2)-th thread (232) is generated to perform the second process on the second region, and a (3-3)-th thread (233) is generated to perform the first process on the third region. Here, the first process may be ED, the second process may be MC+IQ/IT or IP+IQ/IT, and the third process may be DF.

The (3-1)-th thread (231) may perform DF processing on the first region using the result, stored in the first memory bank (310), of the MC+IQ/IT or IP+IQ/IT processing performed on the first region by the (2-1)-th thread (221 in FIG. 4). The result may be stored back in the first memory bank (310). The (3-2)-th thread (232) may perform MC+IQ/IT or IP+IQ/IT processing on the second region using the result, stored in the second memory bank (320), of the ED performed on the second region by the (2-2)-th thread (222 in FIG. 4), and store the result back in the second memory bank (320). In the multi-threaded video decoding method according to an embodiment of the present invention, the MC+IQ/IT or IP+IQ/IT processing of the second region is performed after that of the first region, so the data dependencies of the H.264/AVC standard shown in FIG. 2 are satisfied. Finally, the (3-3)-th thread (233) may perform the first process on the third region and store the result in a third memory bank (330).

Next, referring to FIG. 6, in a fourth stage (Stage 4), a (4-1)-th thread (241) is generated to perform the first process on the fourth region, a (4-2)-th thread (242) is generated to perform the third process on the second region, and a (4-3)-th thread (243) is generated to perform the second process on the third region. As before, the first process may be ED, the second process may be MC+IQ/IT or IP+IQ/IT, and the third process may be DF.

Referring to FIG. 6, the first region may be a fully decoded region (Decoded), for which ED, MC+IQ/IT or IP+IQ/IT, and DF have all been completed. Meanwhile, the (4-2)-th thread (242) may perform DF processing on the second region using the result, stored in the second memory bank (320), of the MC+IQ/IT or IP+IQ/IT processing performed on the second region by the (3-2)-th thread (232 in FIG. 5), and the result may be stored back in the second memory bank (320). The (4-3)-th thread (243) may perform MC+IQ/IT or IP+IQ/IT processing on the third region using the result, stored in the third memory bank (330), of the ED performed on the third region by the (3-3)-th thread (233 in FIG. 5), and store the result back in the third memory bank (330). Likewise, because the MC+IQ/IT or IP+IQ/IT processing of the third region is performed after that of the second region, the data dependencies of the H.264/AVC standard shown in FIG. 2 are satisfied.
Finally, the (4-1)-th thread (241) may perform ED on the fourth region and store the result in the first memory bank (310). The first memory bank (310) holds the decoding data of the first region while that region is being decoded and, once decoding of the first region is complete, is reused for decoding the fourth region. By reusing memory banks in this way, the multi-threaded video decoding method according to an embodiment of the present invention reduces the memory requirement, which otherwise grows sharply as higher-resolution video frames are processed.
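The bank assignment described for the first four regions (regions 1, 2, 3 use banks 1, 2, 3, and region 4 reuses bank 1) is consistent with a simple round-robin mapping. The sketch below assumes that this pattern continues for later regions, which the text does not state; `bank_for_region` is a hypothetical name.

```python
def bank_for_region(region, num_banks=3):
    """Map a 1-based region index to a 1-based memory bank index, round-robin."""
    if region < 1:
        raise ValueError("regions are numbered from 1")
    return (region - 1) % num_banks + 1


for region in range(1, 5):
    print(f"region {region} -> bank {bank_for_region(region)}")
# region 1 -> bank 1
# region 2 -> bank 2
# region 3 -> bank 3
# region 4 -> bank 1
```

Three banks are enough under this scheme because at most three regions are in flight at once (one per process), and a region finishes DF one stage before the region three positions later starts ED.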

The stages after the fourth stage can be readily inferred from the description of the first to fourth stages above, so a detailed description is omitted. Continuing the decoding process in this order completes the decoding of all of the first to fourth regions.
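The stage-by-stage assignment described above follows a regular pattern: in stage s, region r runs process number s - r. A minimal sketch of that schedule follows. Stages 5 and 6 are inferred from the pattern rather than described in the text, and the label `PRED+IQ/IT` is shorthand introduced here for the second process (MC or IP, plus IQ/IT).

```python
PROCESSES = ("ED", "PRED+IQ/IT", "DF")


def stage_schedule(num_regions):
    """Return, for each stage, the list of (region, process) pairs run in parallel."""
    stages = []
    for stage in range(1, num_regions + len(PROCESSES)):
        work = []
        for region in range(1, num_regions + 1):
            step = stage - region  # 0 = ED, 1 = prediction + IQ/IT, 2 = DF
            if 0 <= step < len(PROCESSES):
                work.append((region, PROCESSES[step]))
        stages.append(work)
    return stages


for number, work in enumerate(stage_schedule(4), start=1):
    print(f"Stage {number}: {work}")
# Stage 1: [(1, 'ED')]
# Stage 2: [(1, 'PRED+IQ/IT'), (2, 'ED')]
# Stage 3: [(1, 'DF'), (2, 'PRED+IQ/IT'), (3, 'ED')]
# Stage 4: [(2, 'DF'), (3, 'PRED+IQ/IT'), (4, 'ED')]
# Stage 5: [(3, 'DF'), (4, 'PRED+IQ/IT')]
# Stage 6: [(4, 'DF')]
```

With N regions and three processes the schedule has N + 2 stages, and no stage runs more than three threads.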

Next, the improvement in processing performance of the multi-threaded video decoding method according to an embodiment of the present invention is described with reference to FIGS. 7 and 8.

Table 1 below shows the decoding times of various video sequences (rush_hour through shields) using an H.264/AVC decoder before any parallelization method is applied. Table 2 shows the minimum processing time per frame when the sequences are decoded using a task-level parallelization method, a data-level parallelization method, and the multi-threaded video decoding method according to an embodiment of the present invention, and FIG. 7 is a graph of Table 1.

Finally, Table 3 shows the average decoding time per frame for the same methods as in Table 2, and FIG. 8 is a graph of Table 2.

Here, a typical pipeline technique (A) was used as the task-level parallelization method, and the 2Dwave technique (B) was used as the data-level parallelization method. The 2Dwave technique was run both with two threads (B-1) and with three threads (B-2). Each sequence (rush_hour through shields) was encoded with JM-v16, using the H.264/AVC baseline profile provided by JM-v16. Development was done on Ubuntu Linux 9.10 with kernel version 2.6.31, on an Intel quad-core i5 processor. The compiler was gcc v4.4.1, and OpenMP was used for parallelization.

Table 1 (Basic)

| Sequence | Resolution | ED AVG (μs) | ED AVG (%) | MC+IQ/IT or IP+IQ/IT AVG (μs) | MC+IQ/IT or IP+IQ/IT AVG (%) | DF AVG (μs) | DF AVG (%) | AVG (μs) | MIN (μs) |
|---|---|---|---|---|---|---|---|---|---|
| rush_hour | FHD, 1920x1088 | 3929 | 24% | 6567 | 41% | 5699 | 35% | 16195 | 14746 |
| blue_sky | FHD, 1920x1088 | 4184 | 27% | 5863 | 38% | 5514 | 35% | 15561 | 13694 |
| pedestrian_area | FHD, 1920x1088 | 4295 | 24% | 6800 | 39% | 6440 | 37% | 17535 | 13498 |
| sunflower | FHD, 1920x1088 | 3905 | 29% | 6301 | 47% | 3130 | 23% | 13336 | 11034 |
| park_run | HD, 1280x720 | 5150 | 40% | 3926 | 31% | 3714 | 29% | 12790 | 10687 |
| mobcal | HD, 1280x720 | 2310 | 28% | 3458 | 42% | 2564 | 31% | 8332 | 3882 |
| stockholm | HD, 1280x720 | 2029 | 28% | 2993 | 42% | 2190 | 30% | 7212 | 6640 |
| shields | HD, 1280x720 | 2211 | 30% | 2997 | 40% | 2210 | 30% | 7418 | 5707 |

Table 2

| Sequence | Resolution | A MIN (μs) | A MIN (%) | B-1 MIN (μs) | B-1 MIN (%) | B-2 MIN (μs) | B-2 MIN (%) | C MIN (μs) | C MIN (%) |
|---|---|---|---|---|---|---|---|---|---|
| rush_hour | FHD, 1920x1088 | 11873 | 19% | 12105 | 18% | 9998 | 32% | 6993 | 53% |
| blue_sky | FHD, 1920x1088 | 11018 | 20% | 11470 | 16% | 9558 | 30% | 6735 | 51% |
| pedestrian_area | FHD, 1920x1088 | 12126 | 10% | 10666 | 21% | 9604 | 29% | 8431 | 38% |
| sunflower | FHD, 1920x1088 | 9481 | 14% | 9616 | 13% | 8507 | 23% | 6718 | 39% |
| park_run | HD, 1280x720 | 7813 | 27% | 8482 | 21% | 7441 | 30% | 5580 | 48% |
| mobcal | HD, 1280x720 | 2655 | 32% | 3154 | 19% | 2678 | 31% | 2172 | 44% |
| stockholm | HD, 1280x720 | 4996 | 25% | 5395 | 19% | 4563 | 31% | 3702 | 44% |
| shields | HD, 1280x720 | 5046 | 12% | 4689 | 18% | 4160 | 27% | 3528 | 38% |

Table 3

| Sequence | Resolution | A AVG (μs) | A AVG (%) | B-1 AVG (μs) | B-1 AVG (%) | B-2 AVG (μs) | B-2 AVG (%) | C AVG (μs) | C AVG (%) |
|---|---|---|---|---|---|---|---|---|---|
| rush_hour | FHD, 1920x1088 | 14309 | 12% | 13390 | 17% | 12130 | 25% | 9236 | 43% |
| blue_sky | FHD, 1920x1088 | 13914 | 11% | 13159 | 15% | 11780 | 24% | 8477 | 46% |
| pedestrian_area | FHD, 1920x1088 | 15784 | 10% | 14264 | 19% | 13498 | 23% | 10875 | 38% |
| sunflower | FHD, 1920x1088 | 12023 | 10% | 11572 | 13% | 10580 | 21% | 8822 | 34% |
| park_run | HD, 1280x720 | 11025 | 14% | 10709 | 16% | 10045 | 21% | 6994 | 45% |
| mobcal | HD, 1280x720 | 7403 | 11% | 6938 | 17% | 6387 | 23% | 5458 | 34% |
| stockholm | HD, 1280x720 | 5865 | 19% | 6126 | 15% | 5713 | 21% | 4802 | 33% |
| shields | HD, 1280x720 | 6978 | 6% | 6209 | 16% | 5637 | 24% | 4950 | 33% |

First, referring to Table 1, the decoding time differs from sequence to sequence (rush_hour through shields), and even within a single sequence the processing time differs among the processing functions of the decoder.

Next, referring to Table 2 and FIG. 7, the pipeline parallelization method (A) reduced the minimum processing time by 12 to 32% relative to the processing time of the H.264/AVC decoder before parallelization, and the 2Dwave technique with two threads (B-1) reduced it by 13 to 21%. The 2Dwave technique with three threads (B-2) reduced it by 23 to 32%, and the video decoding method (C) according to an embodiment of the present invention reduced it by 38 to 53%.

Next, referring to Table 3 and FIG. 8, the pipeline parallelization method (A) improved the average processing time by 6 to 19% relative to the processing time of the H.264/AVC decoder before any parallelization was applied, and the 2Dwave technique using two threads (B-1) improved it by 13 to 19%. The 2Dwave technique using three threads (B-2) improved it by 21 to 25%, and the video decoding method according to an embodiment of the present invention (C) improved it by 33 to 46%.
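The ranges quoted for the average processing time can be checked against the per-sequence percentages in the table rows above. The following is a minimal sketch in Python. It assumes that the four percentage columns are, in order, methods A, B-1, B-2, and C, which is inferred from the quoted ranges and not stated in the rows themselves. The names `improvement`, `METHODS`, and `ranges` are hypothetical.

```python
# Improvement percentages transcribed from the table rows above.
# Assumption: the four columns are methods A, B-1, B-2 and C, in that order.
improvement = {
    "rush_hour":       (12, 17, 25, 43),
    "blue_sky":        (11, 15, 24, 46),
    "pedestrain_area": (10, 19, 23, 38),
    "sunflower":       (10, 13, 21, 34),
    "park_run":        (14, 16, 21, 45),
    "mobcal":          (11, 17, 23, 34),
    "stockholm":       (19, 15, 21, 33),
    "shields":         (6, 16, 24, 33),
}

METHODS = ("A", "B-1", "B-2", "C")


def ranges(table):
    """Return {method: (min, max)} of the improvement over all sequences."""
    columns = list(zip(*table.values()))  # one tuple of percentages per method
    return {m: (min(col), max(col)) for m, col in zip(METHODS, columns)}


if __name__ == "__main__":
    for method, (low, high) in ranges(improvement).items():
        print(f"{method}: {low} to {high}%")
```

Under that column assumption, the computed minimum and maximum for each method match the ranges stated in the text: 6 to 19% for A, 13 to 19% for B-1, 21 to 25% for B-2, and 33 to 46% for C.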

In the pipeline parallelization method (A) discussed above, the processing speed differs from one MB to another depending on the MB type, and the processing speed of each task differs depending on the work flow, so the threads must be synchronized continuously. The pipeline parallelization method (A) can therefore be regarded as suffering delays caused by thread synchronization. In the 2Dwave technique (B), ED is not parallelized, so the sequential processing of ED acts as a bottleneck that limits the performance improvement. In contrast, the video decoding method according to an embodiment of the present invention (C) divides the video frame into N regions and minimizes thread synchronization, so decoding completes in a comparatively short time and decoding performance improves.

Next, a multi-threaded video decoder according to an embodiment of the present invention is described with reference to FIGS. 3 to 6. Because this decoder operates in substantially the same way as the multi-threaded video decoding method according to an embodiment of the present invention described above, descriptions that duplicate the earlier material are omitted.

Referring to FIGS. 3 to 6, a multi-threaded video decoder according to an embodiment of the present invention may include an input module (not shown), a processing module 200, and first to third memory banks 310 to 330.

The input module (not shown) may be a module that receives the video frame 100 including the first to N-th regions.

The processing module 200 may be a module that, as in the multi-threaded video decoding method according to an embodiment of the present invention described above, generates threads 211 to 241 at each step, performs decoding on the video frame 100 provided to the input module (not shown), and stores the processing results in the first to third memory banks 310 to 330.
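The step-by-step thread generation described above can be sketched as follows. This Python example enumerates which process runs on which region at each step and runs each step's work on separate threads. It is a minimal sketch with several assumptions: it uses three processes, following the abstract, whose fourth step runs the first, second, and third processes on the fourth, third, and second regions respectively. The names `schedule`, `run_steps`, and `work` are hypothetical and do not appear in the specification. The actual decoding stages and the memory bank handling are not modeled.

```python
import threading

NUM_PROCESSES = 3  # assumption: three decoding processes, as in the abstract


def schedule(num_regions, num_processes=NUM_PROCESSES):
    """For each step, list the (process, region) pairs that run in parallel.

    At step s, process p works on region s - p + 1, so a region is handed
    from one process to the next on consecutive steps.
    """
    steps = []
    for step in range(1, num_regions + num_processes):
        pairs = []
        for process in range(1, num_processes + 1):
            region = step - process + 1
            if 1 <= region <= num_regions:
                pairs.append((process, region))
        steps.append(pairs)
    return steps


def run_steps(num_regions, work, num_processes=NUM_PROCESSES):
    """Run work(process, region) on one thread per pair, step by step."""
    log = []
    lock = threading.Lock()

    def task(step, process, region):
        work(process, region)
        with lock:  # protect the shared log only
            log.append((step, process, region))

    for step, pairs in enumerate(schedule(num_regions, num_processes), start=1):
        threads = [threading.Thread(target=task, args=(step, p, r))
                   for p, r in pairs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # the only synchronization point: the end of the step
    return log


if __name__ == "__main__":
    for step, pairs in enumerate(schedule(4), start=1):
        print(step, pairs)
```

In this sketch the threads of one step never touch the same region, and they are joined only once, at the end of the step. This illustrates how dividing the frame into regions keeps synchronization low. Whether an actual implementation joins its threads in exactly this way is not stated in the text.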

The first to third memory banks 310 to 330 may be memory banks that perform the same functions as in the multi-threaded video decoding method according to an embodiment of the present invention described above.

The multi-threaded decoder according to an embodiment of the present invention can thus improve decoding performance while satisfying the data dependencies of the H.264/AVC standard. In addition, reusing the memory banks 310 to 330 allows memory to be used efficiently.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to those embodiments and may be produced in various different forms. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: video frame 110: macroblock
200: processing module 211 to 241: threads
310, 320, 330: memory banks

Claims

A multi-threaded video decoding method comprising:
dividing a video frame into first to N-th regions, each including a plurality of macroblocks;
in a first step, performing a first process on the first region using a (1-1)-th thread; and
in a second step, performing a second process, different from the first process, on the first region using a (2-1)-th thread, and performing the first process on the second region using a (2-2)-th thread,
wherein the video frame includes a video frame encoded according to the H.264/AVC standard, and
wherein the first to N-th regions each include the same number of macroblocks.