KR20090020460A

KR20090020460A - Method and apparatus for video decoding

Info

Publication number: KR20090020460A
Application number: KR1020080017493A
Authority: KR
Inventors: 알렉세이 로마노프스키; 안드레이 칸
Original assignee: 삼성전자주식회사
Priority date: 2007-08-23
Filing date: 2008-02-26
Publication date: 2009-02-26
Also published as: KR101392349B1

Abstract

A video decoding method and apparatus are provided that improve decoding performance in a multi-core processor-based video decoding apparatus by interleaving the slice decoding process with the transfer of decoded macroblocks. A multi-core processor (100) is an integrated circuit that includes a plurality of cores for higher performance, lower power consumption, and efficient handling of multiple tasks. A memory unit (200) stores application programs and data, and loads a decoder unit (300) so that the multi-core processor can decode an input bit stream. The memory unit also includes a buffer or queue that temporarily stores data before it is processed. The decoder unit comprises various functional modules for video decoding of the input bit stream.

Description

Method and apparatus for video decoding

The present invention relates to a video decoding method and apparatus, and more particularly to a video decoding method and apparatus that improve decoding performance in a multi-core processor-based video decoding apparatus.

In general, as information and communication technology, including the Internet, develops, video communication is increasing alongside text and voice communication. Conventional text-oriented communication is not sufficient to satisfy the varied demands of consumers, so multimedia services that can carry various types of information such as text, video, and music are increasing. Multimedia data is very large, so it requires high-capacity storage media and wide bandwidth for transmission. Compression coding is therefore essential for transmitting multimedia data that includes text, video, and audio.

Video coding methods currently in use include MPEG-2, MPEG-4, H.263, and H.264. These methods are based on motion-compensated predictive coding: temporal redundancy is removed by motion compensation, and spatial redundancy is removed by transform coding.

Among these, MPEG-2 (Moving Picture Experts Group-2), defined by the ISO/IEC 13818-2 video standard, is a scheme for compressing video and audio. It extends MPEG-1 with the main purpose of efficiently compressing the video data used in current TV and HDTV, and was created to provide a high-quality video encoding technique that can be transmitted over computer networks.

The MPEG-2 video standard compresses massive video data by removing the spatial and temporal redundancy within the video and representing the result as an agreed-upon bit string of much shorter length.

Spatial redundancy is removed by applying the DCT (Discrete Cosine Transform) and quantization within a picture, which discards high-frequency components that carry a large amount of information but to which the human eye is not sensitive. Temporal redundancy (similarity between video frames) is removed by detecting similarity between frames: for similar regions, the image data itself is not sent, and the corresponding motion vector information and the error component that remains after prediction with that motion vector are sent instead. The error component also undergoes DCT and quantization. In addition, variable length coding (VLC) losslessly compresses the bit string by assigning much shorter codes to bit strings that occur frequently. In particular, DCT coefficients can be represented as a short bit string using a run-length code.
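The run-length idea mentioned above can be illustrated with a minimal sketch. It assumes the quantized coefficients have already been scanned into a one-dimensional list and encodes them as (zero run, level) pairs; the function names are hypothetical, and the actual MPEG-2 variable-length code tables are not modeled.

```python
def run_length_encode(coeffs):
    """Encode quantized coefficients as (zero_run, level) pairs.

    Trailing zeros are dropped; a real bit stream would signal them
    with an end-of-block code instead.
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1          # count zeros preceding the next non-zero level
        else:
            pairs.append((run, c))
            run = 0
    return pairs


def run_length_decode(pairs, length):
    """Rebuild the coefficient list, padding the tail with zeros."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))
    return out
```

Because quantization leaves most high-frequency coefficients at zero, a block typically collapses into a handful of pairs, which is what makes the subsequent variable-length coding effective.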

Conventionally, such video decoding was generally performed by a single-core processor. Recently, however, as powerful multi-core processors have become widespread, they are increasingly used in fields that consume a lot of system resources, such as video decoding.

However, in a functional partitioning scheme, in which each of the cores constituting a processor is assigned only a predetermined function, implementation is easy, but the times the cores take to process their assigned functions are not equal. Parallel processing is therefore difficult, and the overall performance of the processor cannot be fully used.

In a data partitioning scheme, in which one picture is divided into a plurality of regions that are each assigned to a core, a high degree of parallelism is achieved for simple data processing. When there are dependencies between the data processing steps, however, implementation becomes complicated, and additional work is needed to resolve them (predicting the relationship between the data partition size and the computational load), so performance drops sharply. Moreover, because each core of the multi-core processor must carry the full set of video decoding functions, the scheme also uses system resources inefficiently.

Accordingly, there is a need for a video decoding method that fully exploits the performance of a multi-core processor to improve video decoding performance.

The present invention has been devised to address the above problems. The problem to be solved by the present invention is to provide a video decoding method and apparatus that improve decoding performance in a multi-core processor-based video decoding apparatus by interleaving the computation required for decoding with the transfer of decoded data.

The technical problems of the present invention are not limited to those mentioned above, and other technical problems not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above object, a video decoding apparatus according to an embodiment of the present invention includes a decoder unit for performing video decoding and a multi-core processor that performs the video decoding on an input bit stream using the decoder unit. The multi-core processor includes a first core that parses the input bit stream, divides it into a plurality of slices, and allocates the slices, and a second core that decodes an allocated slice to generate a plurality of macroblocks, stores the macroblocks alternately in a first buffer and a second buffer included in an auxiliary memory, and then transfers them to a main memory to reconstruct the image for the plurality of macroblocks. While the decoded macroblocks are being transferred from one of the first buffer and the second buffer, newly decoded macroblocks are stored in the other buffer.

To achieve the above object, a video decoding method according to an embodiment of the present invention, in a multi-core processor-based video decoding apparatus comprising a first core and a second core, includes: parsing, in the first core, an input bit stream, dividing it into a plurality of slices, and allocating one of the slices to the second core; decoding, in the second core, the allocated slice to generate a plurality of macroblocks; storing the decoded macroblocks alternately in a first buffer and a second buffer included in an auxiliary memory and then transferring them to a main memory; and reconstructing the image for the plurality of macroblocks using the macroblocks transferred to the main memory. While the decoded macroblocks are being transferred from one of the first buffer and the second buffer, newly decoded macroblocks are stored in the other buffer.
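The alternating use of the first and second buffers can be sketched as follows. This is only an illustration under stated assumptions: Python threads stand in for the second core and the asynchronous transfer engine, a list stands in for the picture buffer in main memory, and `decode_with_double_buffer` and `batch_size` are hypothetical names that do not appear in the specification.

```python
import queue
import threading


def decode_with_double_buffer(macroblocks, batch_size):
    """Fill one buffer while the other is being transferred."""
    main_memory = []        # stands in for the picture buffer in main memory
    free = queue.Queue()    # buffers the decoder may fill
    full = queue.Queue()    # buffers waiting to be transferred
    free.put([])            # first buffer
    free.put([])            # second buffer

    def transfer():
        while True:
            buf = full.get()
            if buf is None:              # sentinel: decoding has finished
                return
            main_memory.extend(buf)      # stands in for the transfer to main memory
            buf.clear()
            free.put(buf)                # hand the emptied buffer back

    worker = threading.Thread(target=transfer)
    worker.start()

    buf = free.get()
    for mb in macroblocks:
        buf.append(mb)                   # stands in for decoding one macroblock
        if len(buf) == batch_size:
            full.put(buf)                # start transferring this buffer
            buf = free.get()             # keep decoding into the other buffer
    if buf:
        full.put(buf)                    # flush a partially filled buffer
    full.put(None)
    worker.join()
    return main_memory
```

The decoder only blocks when both buffers are in flight, so decoding and transfer overlap whenever the transfer keeps pace. Because a single transfer thread drains a FIFO queue, the macroblocks reach main memory in decoding order.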

To achieve the above object, a video decoding apparatus according to another embodiment of the present invention includes a decoder unit for performing video decoding and a multi-core processor that performs the video decoding on an input bit stream using the decoder unit. The multi-core processor includes a first core that parses the input bit stream, divides it into a plurality of slices, and allocates the slices, and a second core that performs motion compensation on each of a plurality of macroblocks generated from an allocated slice and transfers the motion-compensated macroblocks to a main memory to reconstruct the image for the plurality of macroblocks. While performing the motion compensation, the second core simultaneously performs other work that is not affected by the result of the motion compensation.

To achieve the above object, a video decoding method according to another embodiment of the present invention, in a multi-core processor-based video decoding apparatus comprising a first core and a second core, includes: parsing, in the first core, an input bit stream, dividing it into a plurality of slices, and allocating one of the slices to the second core; performing, in the second core, motion compensation on each of a plurality of macroblocks generated from the allocated slice; and transferring the motion-compensated macroblocks to a main memory to reconstruct the image for the plurality of macroblocks. While the motion compensation is being performed, other work that is not affected by the result of the motion compensation is performed simultaneously.

Specific details of other embodiments are included in the detailed description and the drawings.

The video decoding method and apparatus of the present invention provide one or more of the following effects.

First, decoding performance can be improved in a multi-core processor-based video decoding apparatus by interleaving the process of decoding a slice with the process of transferring the decoded macroblocks.

Second, decoding performance can be improved in a multi-core processor-based video decoding apparatus by interleaving the computation for motion compensation with other computation.

Third, decoding performance can be improved using a multi-core processor-based video decoding apparatus while still conforming to the MPEG-2 standard.

Fourth, improving the decoding performance of a multi-core processor-based video decoding apparatus allows the hardware to be used efficiently and lowers the hardware specification needed to achieve the same performance.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the claims.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the invention belongs are fully informed of the scope of the invention, which is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Hereinafter, the present invention is described with reference to the drawings, which illustrate a video decoding method and apparatus according to embodiments of the present invention.

It will be understood that each block of the flowchart illustrations, and combinations of blocks in the flowchart illustrations, can be carried out by computer program instructions. These computer program instructions may be loaded onto the processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, so that the instructions executed by that processor create means for performing the functions described in the flowchart block(s). These computer program instructions may also be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment to implement a function in a particular manner, so that the instructions stored in that memory produce an article of manufacture containing instruction means that perform the functions described in the flowchart block(s). The computer program instructions may also be loaded onto a computer or other programmable data processing equipment so that a series of operational steps is performed on that equipment to produce a computer-executed process, in which case the instructions that run on the equipment provide steps for executing the functions described in the flowchart block(s).

In addition, each block may represent a module, a segment, or a portion of code that includes one or more executable instructions for carrying out the specified logical function(s). It should also be noted that in some alternative implementations the functions noted in the blocks may occur out of order. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functionality involved.

FIG. 1 is a block diagram illustrating the configuration of a video decoding apparatus according to an embodiment of the present invention.

The video decoding apparatus according to an embodiment of the present invention may include a multi-core processor (100), a memory unit (200), and a decoder unit (300).

The multi-core processor (100) is an integrated circuit that includes a plurality of cores in order to provide higher performance, reduce power consumption, and process several tasks at once more efficiently.

Preferably, the video decoding apparatus according to an embodiment of the present invention may use the Cell BE architecture (Cell Broadband Engine Architecture, CBEA) as the multi-core processor (100). The Cell BE architecture was recently developed by three companies, Sony, Toshiba, and IBM (collectively called STI), and can be used in video decoding apparatuses that consume a lot of system resources.

FIG. 2 is a block diagram schematically illustrating the structure of the Cell BE architecture according to an embodiment of the present invention.

The Cell BE architecture (Cell Broadband Engine Architecture, CBEA) defines a new processor structure based on the 64-bit Power Architecture and focuses on distributed processing and media-centric applications.

As shown in FIG. 2, the Cell BE architecture can be defined as a single-chip multiprocessor consisting of at least one Power Processor Element (PPE 110), a plurality of high-performance Synergistic Processor Elements (SPE 120), an Element Interconnect Bus (EIB 130) that handles communication among them, and a memory (140).

Each SPE (120) is an independent processor capable of running application programs, and the shared memory (140) together with Direct Memory Access (DMA) commands enables complete and efficient communication among all Cell processing elements. Because the SPE (120) is not part of the hierarchy of the main memory (140) and has only its own independent Local Store (LS), it can use DMA to access the main memory (140).

The PPE (110) is a 64-bit Power Architecture processor; it is the microprocessor core that assigns work to each SPE (120). In a system based on the Cell BE architecture, the PPE (110) runs the operating system (OS) and most applications, but offloads the compute-intensive parts of the operating system and applications to the SPEs (120).

The SPE (120) behaves like an independent processor and has a SIMD (Single Instruction, Multiple Data) architecture specialized for vector and data-streaming processing. Each SPE (120) includes a 256 KB Local Store (LS). As shown in FIG. 2, eight SPEs (120) may be provided, but the number is not limited to eight.

The EIB (130) is the communication path for commands and data among all the processor elements on the Cell BE processor, the controller of the memory (140), and the I/O. The EIB (130) therefore operates in parallel with the PPE (110) and the SPEs (120), so that data transfer and computation can take place at the same time.

The detailed structure of the Cell BE architecture is well known, so a detailed description of it is omitted.

The memory unit (200) stores application programs and data, and can load the decoder unit (300), described below, so that the multi-core processor (100) can decode the input bit stream. The memory unit (200) may also include a buffer or a queue that temporarily stores data before it is processed.

The storage unit is a module capable of inputting and outputting information, such as a hard disk, flash memory, a CF card (Compact Flash Card), an SD card (Secure Digital Card), an SM card (Smart Media Card), an MMC (MultiMedia Card), or a Memory Stick. It may be provided inside the video decoding apparatus or in a separate device. Here, a memory (140) provided independently of the multi-core processor (100) is taken as an example, but memory inside the multi-core processor (100) may also be used.

The decoder unit (300) may consist of various functional modules for performing video decoding on the input bit stream.

FIG. 3 is a block diagram illustrating the configuration of the decoder unit according to an embodiment of the present invention.

The decoder unit (300) according to an embodiment of the present invention may include functional modules such as a symbol decoder (310), an inverse quantization unit (320), an inverse transform unit (330), a motion compensation unit (340), an adder (350), a deblocking unit (360), and a buffer (370).

The symbol decoder (310) performs lossless decoding on the input bit stream to obtain motion vectors and texture data. Lossless decoding methods include Huffman block decoding, arithmetic decoding, and variable length decoding. In general, the motion vector of a particular macroblock depends on the motion vectors of neighboring macroblocks; that is, the motion vector of a macroblock cannot be obtained until the motion vectors of its neighbors have been obtained. The texture data obtained by the symbol decoder (310) may be provided to the inverse quantization unit (320), and the motion vectors to the motion compensation unit (340).

The inverse quantization unit (320) inverse-quantizes the texture data provided by the symbol decoder (310). Inverse quantization restores, from the indices produced during quantization, the values matching those indices, using the same quantization table that was used during quantization.
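As a rough illustration of restoring values from quantization indices, the sketch below multiplies each index by the matching quantization-table entry and a scale factor. This is a deliberately simplified model, not the exact MPEG-2 reconstruction arithmetic; `inverse_quantize`, `quant_table`, and `scale` are hypothetical names chosen for the example.

```python
def inverse_quantize(levels, quant_table, scale):
    """Map quantization indices back to approximate coefficient values.

    levels      -- quantization indices read from the bit stream
    quant_table -- per-coefficient step sizes (same table the encoder used)
    scale       -- a global quantizer scale (assumed integer here)
    """
    return [level * step * scale for level, step in zip(levels, quant_table)]
```

The restored values are only approximations of the original coefficients, since the remainder discarded by the encoder during quantization cannot be recovered.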

The inverse transform unit (330) performs an inverse transform on the inverse-quantized result. Specific inverse transform methods include the inverse DCT (Inverse Discrete Cosine Transform) and the inverse wavelet transform. The inverse-transformed result, that is, the restored high-frequency image, is provided to the adder (350).
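For reference, a direct (unoptimized) 8x8 inverse DCT can be written as below. It is a textbook formulation with orthonormal scaling, offered as a sketch; production decoders use fast factorizations and fixed-point arithmetic, and the name `idct_2d` is hypothetical.

```python
import math

N = 8  # block size used for the transform


def idct_2d(coeffs):
    """Direct 2-D inverse DCT of an N x N block indexed as coeffs[v][u]."""
    def c(k):
        # Normalization factor for the DC basis function
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for y in range(N):
        for x in range(N):
            total = 0.0
            for v in range(N):
                for u in range(N):
                    total += (c(u) * c(v) * coeffs[v][u]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[y][x] = total * 2 / N
    return out
```

A block whose only non-zero coefficient is the DC term produces a flat output, which is a convenient sanity check for the scaling.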

The motion compensation unit (340) generates a predicted image by motion-compensating at least one reference frame (previously reconstructed and stored in the picture buffer) using the motion vector for the current macroblock provided by the symbol decoder (310). When motion compensation is performed in units of 1/2 pixel or 1/4 pixel, the interpolation needed to generate the predicted image requires a large amount of computation. When two reference frames are used, the average of the two motion-compensated macroblocks is computed, and in that case a dependency exists between the macroblocks. These macroblocks therefore need to be processed on a single core.
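The two-reference case can be sketched as follows, assuming integer-pixel motion vectors (no sub-pixel interpolation) and frames stored as lists of rows. The helper names and the round-half-up averaging are assumptions made for the example rather than details taken from the specification.

```python
def fetch_block(frame, top, left, mv, size):
    """Copy a size x size block displaced by an integer motion vector (dy, dx)."""
    dy, dx = mv
    rows = frame[top + dy: top + dy + size]
    return [row[left + dx: left + dx + size] for row in rows]


def average_predictions(first, second):
    """Average two motion-compensated blocks, rounding halves upward."""
    return [[(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(first, second)]
```

Since the average needs both motion-compensated blocks, the two fetches and the averaging for one macroblock naturally belong on the same core, which matches the single-core requirement stated above.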

The adder (350) reconstructs the image for the current macroblock by adding the high-frequency image provided by the inverse transform unit (330) to the generated predicted image.
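The addition step amounts to a per-sample sum of the inverse-transformed data and the prediction. The sketch below additionally clips the result to the 0..255 range, which assumes 8-bit samples; the clipping and the name `reconstruct` are assumptions not stated in the text.

```python
def reconstruct(residual, prediction):
    """Add the inverse-transformed block to the prediction, clipped to 8 bits."""
    return [[max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
            for res_row, pred_row in zip(residual, prediction)]
```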

The deblocking unit (360) applies a deblocking filter to the reconstructed image to remove its blocking artifacts. Because the reconstructed image is processed macroblock by macroblock, noise generally appears at macroblock boundaries; this noise is called a blocking artifact. Blocking artifacts tend to grow as the compression ratio of the video data increases. The reconstructed image that has passed through the deblocking filter may be temporarily stored in the buffer (370) and used to reconstruct other images.

Not every macroblock is reconstructed through motion compensation. Some macroblocks are coded using intra prediction (IP) and are called intra macroblocks. Intra prediction reconstructs the current macroblock using the images of other, adjacent macroblocks in the same frame. In this case too, the current macroblock depends on other macroblocks, so they need to be processed on a single core.

FIG. 4 is a diagram illustrating an example of decoding video using the multi-core processor in a video decoding apparatus according to an embodiment of the present invention.

Here, a method using the Cell BE architecture as the multi-core processor (100) is described as an example, but the invention is not limited to this, and the choice may be changed by those skilled in the art.

First, the PPE (110) may parse the input bit stream and divide it into a plurality of slices. Each slice is then sent to an SPE (120), and the SPE (120) may decode the slice to generate a plurality of macroblocks. That is, because each SPE (120) in the Cell BE architecture can decode a slice using the decoder unit (300), slice-level concurrency can be achieved. The macroblocks are the result of decoding a slice. The slice decoding process is well known, so a detailed description of it is omitted.
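The division of labor between the parsing core and the slice-decoding cores can be sketched as below. The sketch assumes an MPEG-2 elementary stream, in which slices begin with the start-code prefix 0x000001 followed by a code byte in the range 0x01 to 0xAF. A thread pool stands in for the SPEs, and `decode_slice` is a hypothetical placeholder for the real slice decoder; in CPython the threads illustrate the structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

START_PREFIX = b"\x00\x00\x01"


def split_slices(bitstream):
    """Return the byte ranges that begin with a slice start code."""
    positions = []
    pos = bitstream.find(START_PREFIX)
    while pos != -1:
        positions.append(pos)
        pos = bitstream.find(START_PREFIX, pos + 3)
    positions.append(len(bitstream))     # the last unit runs to the end

    slices = []
    for start, end in zip(positions, positions[1:]):
        # Keep only units whose code byte marks a slice (0x01..0xAF)
        if start + 3 < end and 0x01 <= bitstream[start + 3] <= 0xAF:
            slices.append(bitstream[start:end])
    return slices


def decode_slice(data):
    """Placeholder decoder: report the slice code and the payload size."""
    return data[3], len(data) - 4


def decode_picture(bitstream, workers=4):
    """Parse on the calling thread, then decode the slices concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(decode_slice, split_slices(bitstream)))
```

Each slice can be handed to a different worker because slices are decoded independently of one another, which is what allows the concurrency described above.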

Once macroblocks have been generated by decoding a slice, the EIB 130 may use DMA transfer to send the decoded macroblocks from the LS of the SPE 120 to the picture buffer in the memory 140. More precisely, the YUV pixels generated by the inverse transform unit are transferred to the picture buffer. The EIB 130 may also use DMA transfer to send prediction data, taken from the picture data stored in the picture buffer, to the SPE 120 so that motion compensation can be performed.

A control method of the video decoding apparatus configured as described above, according to an embodiment of the present invention, is as follows.

FIG. 5 is a flowchart illustrating a process of transmitting decoded macroblocks in a video decoding apparatus according to an embodiment of the present invention.

As described with reference to FIG. 4, the PPE 110 first parses the input bit stream and divides it into a plurality of slices, after which each SPE 120 may decode its slice to generate a plurality of macroblocks (S401).

Meanwhile, the LS included in the SPE 120 has a small capacity of about 256 KB, which is not large enough to hold the picture buffer, so the decoded macroblocks need to be transferred to the picture buffer in the memory 140. Although transfer to the picture buffer in the memory 140 is used as the example here, a system implementation may instead transfer them to another SPE 120.

Issuing a separate DMA transfer for each macroblock as soon as it is decoded is simple but inefficient. Instead, decoded macroblocks are held in the LS for a time, and when the LS is full, the accumulated macroblocks are transferred together. The number of macroblocks accumulated in the LS may vary with the capacity of the memory 140 available to the SPE 120.
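The effect of batching can be illustrated with a little arithmetic. The sketch below assumes 4:2:0 macroblocks of 384 bytes (one 16x16 luma block plus two 8x8 chroma blocks at one byte per sample) and a hypothetical 16 KB staging buffer in the LS. The buffer size and the function names are illustrative assumptions and do not come from the specification.

```python
import math

# 16x16 luma samples plus two 8x8 chroma blocks, one byte per sample.
MB_BYTES_420 = 16 * 16 + 2 * 8 * 8


def macroblocks_per_batch(buffer_bytes: int, mb_bytes: int = MB_BYTES_420) -> int:
    """How many whole macroblocks fit in one LS staging buffer."""
    return buffer_bytes // mb_bytes


def dma_transfers_needed(n_macroblocks: int, buffer_bytes: int,
                         mb_bytes: int = MB_BYTES_420) -> int:
    """Number of DMA transfers when macroblocks are sent one full buffer at a time."""
    per_batch = macroblocks_per_batch(buffer_bytes, mb_bytes)
    if per_batch == 0:
        raise ValueError("buffer is smaller than one macroblock")
    return math.ceil(n_macroblocks / per_batch)


if __name__ == "__main__":
    n = 45  # macroblocks in one row of a 720-pixel-wide picture (720 / 16)
    print(dma_transfers_needed(n, buffer_bytes=MB_BYTES_420))  # one transfer per macroblock
    print(dma_transfers_needed(n, buffer_bytes=16 * 1024))     # batched transfers
```

Under these assumptions, 45 macroblocks need 45 transfers when sent individually but only 2 when batched through a 16 KB buffer.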

FIG. 6 is a diagram illustrating a conventional example of transmitting macroblocks using DMA transfer.

As shown in FIG. 6, decoded macroblocks may be stored in a buffer in the LS of the SPE 120. When the buffer is full of macroblocks, the EIB 130 starts transferring the macroblocks accumulated in the buffer to the picture buffer in the memory 140.

At this point, storing a newly decoded macroblock in the buffer could overwrite previously stored macroblocks that have not yet been transferred to the picture buffer, so the SPE must wait until the transfer completes. This wait is incurred on every transfer to the picture buffer, so the more macroblocks there are, the more waits are needed. Because the SPE 120 cannot decode the slice and generate macroblocks during the waiting time, decoding is delayed.

Therefore, the video decoding apparatus according to an embodiment of the present invention may store and transmit macroblocks using double buffering in the LS of the SPE 120.

For double buffering, the LS of the SPE 120 may be configured with two buffers, a first buffer and a second buffer. Initially, the first buffer may be set to the active state and the second buffer to the passive state. The first buffer and the second buffer preferably have the same capacity.

Referring again to FIG. 5, the SPE 120 first stores decoded macroblocks in the active first buffer (S402), and continues doing so until the first buffer is full of macroblocks (S403).

When the first buffer is full (YES in S403), the EIB 130 requests (initiates) transfer of the macroblocks accumulated in the first buffer and begins transferring them to the picture buffer (S404), while at the same time decoded macroblocks may be stored in the second buffer of the LS (S405). At this time the states of the two buffers are swapped: the first buffer is set to the passive state and the second buffer to the active state.

When the second buffer becomes full (YES in S406), the EIB 130 may begin transferring the macroblocks accumulated in the second buffer to the picture buffer (S407). At this time, the second buffer is again set to the passive state and the first buffer to the active state. It is then determined whether decoding has been completed for all macroblocks of the slice (S408); if macroblocks remain (NO in S408), decoded macroblocks may be stored in the first buffer of the LS.

Steps S402 to S407 are repeated until decoding is complete for all macroblocks. When decoding of all macroblocks is complete (YES in S408), that is, when the last macroblock of the slice is stored, a transfer may be requested for the buffer in which it is stored. The SPE 120 then waits until all macroblocks have been transferred to the picture buffer (S409).

When decoding of one slice is completed in this way, steps S401 to S409 may be repeated for another slice.
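The control flow of FIG. 5 can be sketched roughly as follows. This is a minimal illustration rather than the implementation described in the specification: `FakeDma` is a hypothetical, synchronous stand-in for the DMA engine, macroblocks are represented by plain integers, and the buffer capacity is an arbitrary parameter.

```python
from typing import Iterable, List


class FakeDma:
    """Hypothetical stand-in for the DMA engine: records each batch it moves."""

    def __init__(self) -> None:
        self.picture_buffer: List[int] = []
        self.transfers = 0

    def start_transfer(self, batch: List[int]) -> None:
        # A real engine would return immediately and copy in the background.
        self.picture_buffer.extend(batch)
        self.transfers += 1

    def wait_all(self) -> None:
        # Nothing to wait for in this synchronous stand-in (S409).
        pass


def decode_slice(macroblocks: Iterable[int], capacity: int, dma: FakeDma) -> None:
    buffers: List[List[int]] = [[], []]  # first and second buffer in the LS
    active = 0                            # the first buffer starts active
    for mb in macroblocks:                # each item is one decoded macroblock
        buffers[active].append(mb)        # S402 / S405: store in the active buffer
        if len(buffers[active]) == capacity:     # S403 / S406: buffer full?
            dma.start_transfer(buffers[active])  # S404 / S407: request the transfer
            # With a real asynchronous engine, the buffer may only be refilled
            # once its own earlier transfer has completed.
            buffers[active] = []
            active ^= 1                          # swap active and passive roles
    if buffers[active]:                   # last macroblocks of the slice
        dma.start_transfer(buffers[active])
    dma.wait_all()                        # S409: wait until everything has arrived


if __name__ == "__main__":
    dma = FakeDma()
    decode_slice(range(10), capacity=4, dma=dma)
    print(dma.transfers, dma.picture_buffer)
```

With ten macroblocks and a capacity of four, the sketch issues three transfers: two full buffers and one final partial buffer at the end of the slice.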

FIG. 7 is a diagram illustrating an example of transmitting macroblocks using DMA transfer in a video decoding apparatus according to an embodiment of the present invention.

As shown in FIG. 7, storage and transmission of macroblocks may alternate repeatedly between the first buffer and the second buffer. More precisely, the step of decoding and storing macroblocks may be interleaved with the step of transmitting macroblocks. That is, while macroblocks are being transferred from the first buffer, decoded macroblocks are stored in the second buffer, and conversely, while macroblocks are being transferred from the second buffer, macroblocks are stored in the first buffer. The waiting time during each transfer shown in FIG. 6 is therefore no longer needed. However, because decoding a macroblock may depend on macroblocks of another picture, it is necessary to wait at the end of each slice until all transfers have completed; this wait is needed only once per slice.
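The difference between the two schedules can be made concrete with a toy timing model. The sketch below is an illustration under simplifying assumptions that are not taken from the specification: every batch takes the same time `fill` to decode and store and the same time `xfer` to transfer, and only one transfer runs at a time.

```python
def single_buffer_time(n_batches: int, fill: int, xfer: int) -> int:
    """One buffer: the SPE stalls for every transfer (the FIG. 6 case)."""
    return n_batches * (fill + xfer)


def double_buffer_time(n_batches: int, fill: int, xfer: int) -> int:
    """Two buffers: filling one overlaps with draining the other (the FIG. 7 case)."""
    spe_clock = 0        # time at which the SPE is free to decode
    dma_free = 0         # time at which the DMA engine is free
    drained_at = [0, 0]  # when each buffer's last transfer finishes
    for i in range(n_batches):
        buf = i % 2
        spe_clock = max(spe_clock, drained_at[buf])  # never overwrite unsent data
        spe_clock += fill                            # decode and store one batch
        start = max(spe_clock, dma_free)             # transfers run one at a time
        dma_free = start + xfer
        drained_at[buf] = dma_free
    return max(spe_clock, dma_free)  # includes the single end-of-slice wait


if __name__ == "__main__":
    print(single_buffer_time(4, fill=3, xfer=1))
    print(double_buffer_time(4, fill=3, xfer=1))
```

With four batches, `fill = 3` and `xfer = 1`, the single-buffer schedule takes 16 time units and the double-buffer schedule takes 13. Only the last transfer remains exposed, which corresponds to the one wait per slice.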

In this way, the thread of the SPE 120 can continue decoding during a data transfer without waiting for the transfer to finish, which reduces the overall workload of the EIB 130 and improves the performance of the entire decoding apparatus.

Furthermore, because the decoding of a macroblock is not affected by the decoding of other macroblocks, there is no need to wait until macroblock decoding has finished. When necessary, for example when the first buffer or the second buffer is full of macroblocks, a transfer of the macroblocks can be requested (transfer initiation) and control can be switched back to the SPE 120 thread immediately to continue decoding, which improves decoding performance.

Meanwhile, the time from requesting transfer of the macroblocks in the first or second buffer until that transfer completes is shorter than the time needed to store macroblocks in the other buffer, so the first or second buffer becomes available for storage again promptly. Accordingly, although FIGS. 5 and 7 illustrate the case in which transfer takes place when the first or second buffer is full, the transfer request may also be timed in consideration of the macroblock storage time and transfer time. That is, even when the first buffer is not yet full, transfer of the macroblocks in the first buffer may be requested and storage into the second buffer may begin.
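One way to reason about when a transfer may be requested before a buffer is full is a simple linear cost model. The sketch below is an assumption made for illustration and is not part of the specification: a transfer of k macroblocks is taken to cost `setup + k * xfer_per_mb`, storing k macroblocks is taken to cost `k * decode_per_mb`, and the function name is hypothetical.

```python
import math
from typing import Optional


def min_batch_for_overlap(setup: float, xfer_per_mb: float,
                          decode_per_mb: float) -> Optional[int]:
    """Smallest batch size k with setup + k * xfer_per_mb <= k * decode_per_mb.

    Returns None when a macroblock is transferred no faster than it is decoded,
    in which case no batch size hides the transfer completely.
    """
    gain = decode_per_mb - xfer_per_mb
    if gain <= 0:
        return None
    return max(1, math.ceil(setup / gain))


if __name__ == "__main__":
    print(min_batch_for_overlap(setup=10, xfer_per_mb=1, decode_per_mb=3))
```

Any batch at least this large can be sent early while the other buffer is being filled, without the SPE having to wait for the buffer to drain.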

As described above, according to an embodiment of the present invention, interleaving the storage and transmission of decoded macroblocks improves decoding performance while still satisfying the MPEG-2 standard defined in ISO/IEC 13818-2. Accordingly, hardware can be used more efficiently, and the hardware specification required to achieve the same performance can be lowered.

Meanwhile, the video decoding method according to another embodiment of the present invention, described below, improves decoding performance by interleaving the motion compensation step of the decoding process with other steps that are independent of motion compensation.

The configuration and operation of the video decoding apparatus according to another embodiment of the present invention are the same as those described with reference to FIGS. 1 to 4, so their description is omitted.

In general, in the motion compensation step of the MPEG-2 decoding process, motion prediction may be performed on the previous image frame of two consecutive image frames, using the motion vector obtained from a macroblock of the current image frame. To perform motion prediction, prediction data (also called prediction pixels) may be obtained from the previous image frame, that is, from an already decoded picture (reference frame).

As described with reference to FIG. 4, decoded macroblocks are transferred to the picture buffer and reconstructed into a picture, so motion compensation requires transferring prediction data from an already reconstructed picture to the SPE 120.

FIG. 8 is a flowchart illustrating a process of performing motion compensation in a video decoding apparatus according to another embodiment of the present invention.

First, the SPE 120 may decode the motion vector from a macroblock of the current image frame (S501). Next, the EIB 130 may calculate an offset within the picture buffer in order to obtain prediction data from an already decoded picture (reference frame) in the picture buffer in the memory 140 (S502). The EIB 130 may then initiate a DMA transfer to send the prediction data to the LS of the SPE 120 (S503). Steps S501 to S503 are referred to as the preparation stage of motion compensation. Once the preparation stage is finished, the DMA transfer of the prediction data begins.
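The offset calculation of step S502 might look like the sketch below. The specification does not describe the layout of the picture buffer, so the sketch assumes a row-major luma plane with `stride` bytes per row, and it uses the fact that MPEG-2 motion vectors are expressed in half-pixel units. The function name is hypothetical, only the full-pixel part of the vector is used, and no bounds checking is done.

```python
MB_SIZE = 16  # luma macroblock width and height in pixels


def prediction_offset(mb_x: int, mb_y: int, mv_x: int, mv_y: int, stride: int) -> int:
    """Byte offset of the top-left luma sample of the prediction block.

    mb_x, mb_y: macroblock column and row in the current picture.
    mv_x, mv_y: motion vector in half-pixel units; the shift keeps the
                full-pixel part, rounding toward minus infinity.
    stride:     bytes per luma row in the reference picture (assumed layout).
    """
    ref_x = mb_x * MB_SIZE + (mv_x >> 1)
    ref_y = mb_y * MB_SIZE + (mv_y >> 1)
    return ref_y * stride + ref_x


if __name__ == "__main__":
    # Macroblock at column 2, row 1 of a 720-pixel-wide picture, zero motion.
    print(prediction_offset(2, 1, 0, 0, stride=720))
```

The half-pixel remainder (`mv_x & 1`, `mv_y & 1`) would then select the interpolation applied once the prediction data has arrived in the LS.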

Once the DMA transfer of the prediction data has started, the SPE 120 may perform other steps that are independent of the motion compensation step, that is, steps not affected by the motion compensation step (S504).

FIG. 9 is a diagram illustrating an example of performing motion compensation in a conventional video decoding apparatus.

As shown in FIG. 9, in the conventional approach, after the preparation stage of motion compensation was finished, the SPE 120 remained idle until the transfer of the prediction data to the SPE 120 was complete. Only after the transfer was complete was motion prediction performed using the transferred prediction data. The time the SPE 120 spent waiting for the transfer of the prediction data to complete therefore delayed the overall decoding time.

Therefore, the video decoding apparatus according to another embodiment of the present invention allows the SPE 120 to perform other work while the prediction data is being transferred.

FIG. 10 is a diagram illustrating an example of performing motion compensation in a video decoding apparatus according to another embodiment of the present invention.

As shown in FIG. 10, once the preparation stage of motion compensation has finished and the transfer of the prediction data has started, the SPE 120 may perform other steps of the decoding process. Preferably, the SPE 120 performs steps that are not affected by the motion compensation step, that is, steps independent of the result of motion compensation, such as Huffman block decoding or quantizer matrix decoding.

Referring again to FIG. 8, the SPE 120 performs other steps independent of the motion compensation step until the transfer of the prediction data is complete (NO in S505). When the transfer is complete (YES in S505), the SPE 120 may perform motion prediction using the transferred prediction data and the motion vector obtained from the current macroblock (S506). Finally, motion compensation is performed through the motion prediction to generate a predicted image (S507).
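The loop of FIG. 8 can be sketched as follows. This is an illustration under stated assumptions rather than the implementation of the specification: `SimulatedDma` is a hypothetical toy engine whose transfer completes after a fixed number of polls, and the independent steps are plain callables that return their own names.

```python
from collections import deque
from typing import Callable, Deque, List


class SimulatedDma:
    """Toy DMA engine: the transfer completes after a fixed number of polls."""

    def __init__(self, polls_needed: int) -> None:
        self._remaining = polls_needed

    def done(self) -> bool:
        # Each poll that finds the transfer unfinished moves it one step forward.
        if self._remaining > 0:
            self._remaining -= 1
            return False
        return True


def motion_compensate(dma: SimulatedDma,
                      independent_steps: Deque[Callable[[], str]]) -> List[str]:
    log = ["prepare"]                    # S501 to S503: vector, offset, DMA start
    while not dma.done():                # S505: has the prediction data arrived?
        if independent_steps:
            log.append(independent_steps.popleft()())  # S504: overlap other work
        else:
            log.append("idle")           # nothing left to overlap with
    while independent_steps:             # leftover work still has to be done
        log.append(independent_steps.popleft()())
    log.append("predict")                # S506: motion prediction
    log.append("compensate")             # S507: build the predicted image
    return log


if __name__ == "__main__":
    steps = deque([lambda: "huffman_block_decode", lambda: "quantizer_matrix_decode"])
    print(motion_compensate(SimulatedDma(polls_needed=1), steps))
```

The SPE idles only when the transfer outlasts all of the independent work. In the conventional flow of FIG. 9, the whole transfer time is spent idle.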

Meanwhile, the video decoding method according to another embodiment of the present invention can be applied to all types of macroblocks, namely intra macroblocks, non-intra macroblocks, and skipped macroblocks.

FIG. 11 is a diagram illustrating an example implementation of motion compensation for a non-intra macroblock in a video decoding apparatus according to another embodiment of the present invention.

As shown in FIG. 11, to perform motion compensation on a non-intra macroblock, the preparation stage for motion compensation may be performed first (S601). Then, branching according to several conditions, zeroing DC prediction (S602), quantizer matrix decoding (S603), and Huffman block decoding (S604) may be performed while the DMA transfer of the prediction data is in progress. As FIG. 11 shows, the SPE 120 can perform at least the zeroing DC prediction step (S602) during the DMA transfer of the prediction data. Finally, when the transfer of the prediction data is complete, motion prediction may be performed using the prediction data (S605, S607, or S608). Here, the StartMotionCompensation() function may include a function that decodes the motion vector, a function that calculates the offset in the memory 140, and a function that initiates the DMA transfer, and the FinishMotionCompensation() function may include a function that performs motion prediction.

Meanwhile, performing the non-intra IDCT step (S606) after motion compensation has finished satisfies the MPEG-2 standard defined in ISO/IEC 13818-2.

Although only non-intra macroblocks have been described here, intra macroblocks and skipped macroblocks can be handled in a similar manner.

As described above, the video decoding apparatus according to another embodiment of the present invention interleaves the motion compensation step with other steps that are independent of motion compensation, thereby improving decoding performance while still satisfying the MPEG-2 standard defined in ISO/IEC 13818-2. Accordingly, hardware can be used more efficiently, and the hardware specification required to achieve the same performance can be lowered.

The term "unit" used in this embodiment refers to software or a hardware component such as an FPGA or an ASIC, and a unit performs certain roles. However, a unit is not limited to software or hardware. A unit may be configured to reside in an addressable storage medium or configured to execute on one or more processors. Thus, as an example, a unit includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided by the components and units may be combined into a smaller number of components and units or further separated into additional components and units. In addition, the components and units may be implemented so as to execute on one or more CPUs in a device or a secure multimedia card.

Those of ordinary skill in the art to which the present invention pertains will understand that the present invention can be embodied in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the appended claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

FIG. 4 is a diagram illustrating an example of decoding video using a multi-core processor in a video decoding apparatus according to an embodiment of the present invention.

<Explanation of symbols for the main parts of the drawings>

100: multi-core processor

110: PPE 120: SPE

130: EIB 140: memory

200: memory unit

300: decoder unit

310: symbol decoder 320: inverse quantization unit

330: inverse transform unit 340: motion compensation unit

350: adder 360: deblocking unit

370: buffer

Claims

A video decoding apparatus comprising:

a decoder unit for performing video decoding; and

a multi-core processor which performs the video decoding on an input bit stream by using the decoder unit,

wherein the multi-core processor comprises:

a first core for parsing the input bit stream and dividing the input bit stream into a plurality of slices; and

a second core for alternately storing a plurality of macroblocks, generated by decoding an allocated slice, in a first buffer and a second buffer included in an auxiliary memory, and then transferring the macroblocks to a main memory to restore an image of the plurality of macroblocks,

wherein, while the decoded macroblocks are being transmitted from one of the first buffer and the second buffer, the decoded macroblocks are stored in the other buffer.

The apparatus of claim 1,

wherein the auxiliary memory is provided inside the second core, and the main memory is provided in the multi-core processor separately from the second core.

The apparatus of claim 1,

wherein the second core repeats the following until all macroblocks of the slice allocated to the second core have been transmitted:

storing the decoded macroblocks in the first buffer;

when the first buffer is full, transmitting the macroblocks stored in the first buffer to the main memory while simultaneously storing the decoded macroblocks in the second buffer; and

when the second buffer is full, transmitting the macroblocks stored in the second buffer to the main memory while simultaneously storing the decoded macroblocks in the first buffer.

The apparatus of claim 1,

wherein the plurality of decoded macroblocks are transmitted to the main memory by direct memory access (DMA).

The apparatus of claim 1,

wherein the multi-core processor has a Cell Broadband Engine Architecture (Cell BE Architecture) comprising:

at least one power processor element (PPE);

a plurality of synergistic processor elements (SPEs); and

an EIB controlling the at least one PPE and the plurality of SPEs,

wherein the first core is the at least one PPE, and the second core is any one of the plurality of SPEs.

A decoder unit for performing video decoding; And

wherein the multi-core processor comprises:

a second core configured to perform motion compensation on each of a plurality of macroblocks generated from an allocated slice, and to transmit the plurality of macroblocks on which the motion compensation has been performed to a main memory to restore an image of the plurality of macroblocks,

wherein the second core, while performing the motion compensation, simultaneously performs another task that is not affected by the result of the motion compensation.

The apparatus of claim 6,

wherein the second core extracts a motion vector for each of the plurality of macroblocks, extracts prediction data from a previously reconstructed image in the main memory and transmits the prediction data to the second core, and generates a predicted image by performing motion prediction using the transmitted prediction data and the motion vector.

The apparatus of claim 7,

wherein the other task is performed while the prediction data is being transmitted to the second core.

The apparatus of claim 7,

wherein the prediction data is transmitted to the second core by direct memory access (DMA).

The apparatus of claim 6,

The multi-core processor,

At least one power processor element (PPE);

A plurality of synergistic processor elements (SPEs); And

A video decoding method based on a multi-core processor comprising a first core and a second core, the method comprising:

parsing, in the first core, an input bit stream into a plurality of slices and allocating one of the plurality of slices to the second core;

decoding the allocated slice in the second core to generate a plurality of macroblocks;

alternately storing the decoded macroblocks in a first buffer and a second buffer included in an auxiliary memory, and then transferring the decoded macroblocks to a main memory; and

restoring an image of the plurality of macroblocks using the plurality of macroblocks transmitted to the main memory,

wherein, while the decoded macroblocks are being transmitted from one of the first buffer and the second buffer, the decoded macroblocks are stored in the other buffer.

The method of claim 11,

wherein the alternately storing in the first buffer and the second buffer and then transferring to the main memory comprises:

a first step of storing the decoded macroblocks in the first buffer;

a second step of, when the first buffer is full, transmitting the macroblocks stored in the first buffer to the main memory while simultaneously storing the decoded macroblocks in the second buffer;

a third step of, when the second buffer is full, transmitting the macroblocks stored in the second buffer to the main memory while simultaneously storing the decoded macroblocks in the first buffer; and

a fourth step of repeating the first to third steps until all macroblocks of the slice allocated to the second core have been transmitted.

The method of claim 11, wherein

The multi-core processor,

At least one power processor element (PPE);

A plurality of synergistic processor elements (SPEs); And

The first core is the at least one PPE, and the second core is any one of the plurality of SPEs.

A video decoding method in a video decoding apparatus based on a multi-core processor comprising a first core and a second core, the method comprising:

performing, in the second core, motion compensation on each of a plurality of macroblocks generated from an allocated slice; and

restoring an image of the plurality of macroblocks by transmitting the plurality of macroblocks on which the motion compensation has been performed to a main memory,

wherein, while the motion compensation is being performed, another task that is not affected by the result of the motion compensation is performed simultaneously.

The method of claim 16,

wherein the performing of the motion compensation comprises:

extracting a motion vector for each of the plurality of macroblocks;

extracting prediction data from a previously reconstructed image in the main memory and transmitting the prediction data to the second core; and

generating a predicted image by performing motion prediction using the transmitted prediction data and the motion vector.

The method of claim 17,

wherein the other task is performed while the prediction data is being transmitted to the second core.

The method of claim 17,

The method of claim 16,

The multi-core processor,

At least one power processor element (PPE);

A plurality of synergistic processor elements (SPEs); And