KR101425620B1

KR101425620B1 - Method and apparatus for video decoding based on a multi-core processor

Info

Publication number: KR101425620B1
Application number: KR1020080004533A
Authority: KR
Inventors: 백현기; 그리고리 아불라제; 김연일; 배세현
Original assignee: 삼성전자주식회사
Priority date: 2007-12-17
Filing date: 2008-01-15
Publication date: 2014-07-31
Also published as: KR20090065398A

Abstract

The present invention relates to a method and apparatus for using system resources more effectively in video decoding, which requires a large amount of computation, in a multi-core processor environment.

A multi-core processor apparatus according to an embodiment of the present invention includes a video decoder module composed of functional modules for performing video decoding; a memory that stores an input bitstream and loads the functional modules; and a multi-core processor including a plurality of cores that perform video decoding on the input bitstream using the functional modules. When idle time occurs in a first core among the cores during video decoding, a second core that has remaining work related to video decoding assigns part of that remaining work to the first core, thereby reducing the idle time.

Video coding, multi-core processor, symbol decoding, motion compensation, dependency

Description

Method and apparatus for video decoding based on a multi-core processor

The present invention relates to video decoding technology and, more particularly, to a method and apparatus for using system resources more effectively in video decoding, which requires a large amount of computation, in a multi-core processor environment.

As information and communication technologies, including the Internet, have developed, video communication has increased alongside text and voice communication. Conventional text-oriented communication is not enough to satisfy the varied needs of consumers, and multimedia services that can carry various types of information such as text, video, and music are therefore increasing. Multimedia data is so large that it requires high-capacity storage media and wide bandwidth for transmission. Compression coding is therefore essential for transmitting multimedia data that includes text, video, and audio.

The basic principle of data compression is to remove redundancy from the data. Because video data in particular is very large, efficient compression matters more for video than for other kinds of multimedia data.

Video compression works by removing three kinds of redundancy: spatial redundancy, such as the same color or object being repeated within one picture (frame); temporal redundancy, such as adjacent frames of a moving picture hardly changing or the same sound being repeated in audio; and perceptual redundancy, which exploits the fact that human vision and perception are insensitive to high frequencies. In typical video coding methods, temporal redundancy is removed by temporal filtering based on motion compensation, and spatial redundancy is removed by a spatial transform.

Conventionally, such video coding and decoding was generally performed by a single-core processor. Recently, however, as high-performance multi-core processors have become widespread, they are increasingly used in fields that consume a large amount of system resources, such as video coding and decoding.

A multi-core processor is an integrated circuit in which two or more cores are combined to provide higher performance, lower power consumption, and more efficient simultaneous processing of multiple tasks. Multi-core processors are often compared with systems in which two or more independent processors are installed in one computer. In a multi-core processor, however, the processors actually sit in a single socket, so the connection between them is faster. In theory a dual-core processor should deliver twice the performance of a single-core processor, but in practice it is known to be about 1.5 times better.

Because single-core processors are currently considered to be approaching their physical limits in complexity and speed, growth in the industries related to multi-core processors is accelerating. Companies that produce or are involved with multi-core processor products include AMD, ARM, and Intel. These companies predict that multi-core processors will take most of the future processor market and are accelerating their development.

Conventional techniques for performing video decoding on such a multi-core processor include the functional partitioning method and the data partitioning method.

FIGS. 1 and 2 illustrate the functional partitioning method. As shown in FIG. 1, to decode video a processor generally has to perform a variety of detailed functions, such as data reading, preprocessing/initialization, entropy decoding (hereinafter ED), inverse quantization (hereinafter IQ), inverse transform (hereinafter IT), intra prediction (hereinafter IP), motion compensation (hereinafter MC), deblocking, and data writing.

In the functional partitioning method, each of the cores that make up a processor is assigned a fixed function. For example, core #2 handles only entropy decoding (ED), and core #4 handles only deblocking. When the cores are partitioned by function in this way, the amounts of computation handled by the individual cores (21, 22, 23, 24) become unbalanced, as shown in FIG. 2. In particular, core #3 carries a relatively heavy load and acts as the bottleneck (critical path), which degrades the overall performance of the processor. Thus, although functional partitioning is easy to implement, the cores do not take the same amount of time to process their assigned functions, so parallel processing is difficult and the full performance of the processor cannot be used.
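
The imbalance described above can be made concrete with a small calculation. The following is a minimal sketch, not part of the disclosure: the per-core workloads are hypothetical numbers in arbitrary time units, and `idle_times` is an illustrative helper name. It shows how the slowest core sets the stage time and how much the other cores idle.

```python
# Hypothetical per-core workloads under functional partitioning
# (arbitrary time units; not taken from FIG. 2).
def idle_times(loads):
    """Return the stage time (set by the slowest core) and each core's idle time."""
    stage_time = max(loads.values())
    return stage_time, {core: stage_time - t for core, t in loads.items()}

loads = {"core1": 3, "core2": 4, "core3": 9, "core4": 2}
stage_time, idle = idle_times(loads)

# Fraction of the available core time that is spent doing useful work.
utilization = sum(loads.values()) / (stage_time * len(loads))
print(stage_time, idle, utilization)
```

With these assumed loads, core3 is the critical path and half of the total core time is wasted, which is the situation the dynamic approach is meant to avoid.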

FIG. 3 illustrates the data partitioning method. As shown in FIG. 3, the data partitioning method divides one picture 30 into a plurality of regions and assigns each region to a core. For example, a picture is divided into four regions of equal size, and each region is processed by its corresponding core.
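
As an illustration of the data partitioning just described, the sketch below divides a picture into four equal regions and maps one region to each core. The function name, the rectangle format `(x, y, w, h)`, and the 1280x720 picture size are assumptions made for the example only.

```python
def split_into_quadrants(width, height):
    """Return four (x, y, w, h) rectangles that tile a width x height picture."""
    half_w, half_h = width // 2, height // 2
    return [
        (0, 0, half_w, half_h),                            # top-left
        (half_w, 0, width - half_w, half_h),               # top-right
        (0, half_h, half_w, height - half_h),              # bottom-left
        (half_w, half_h, width - half_w, height - half_h), # bottom-right
    ]

regions = split_into_quadrants(1280, 720)
# One region per core, as in the four-way example in the text.
assignment = {f"core{i + 1}": region for i, region in enumerate(regions)}
```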

The data partitioning method thus guarantees a high degree of parallelism for simple data processing. However, when there are dependencies between data processing steps, implementation becomes complicated, and resolving them requires additional work (predicting the relationship between the partition size of the data and the computational load), so performance drops sharply. In addition, each core of the multi-core processor must hold the full set of functions for video decoding, which makes the use of system resources (for example, local storage) inefficient.

In particular, the widely used H.264 standard requires more computation than decoders for other standards and has very strong dependencies between functions, so the conventional methods described above cannot bring out the performance of a multi-core processor.

The technical problem addressed by the present invention is to improve video decoding performance on a multi-core processor by having the cores share independent jobs that have no dependencies on one another.

The technical problems of the present invention are not limited to the one mentioned above, and other technical problems not mentioned here will be clearly understood by those skilled in the art from the following description.

To solve the above technical problem, a multi-core processor apparatus according to an embodiment of the present invention includes: a video decoder module composed of functional modules for performing video decoding; a memory that stores an input bitstream and loads the functional modules; and a multi-core processor including a plurality of cores that perform video decoding on the input bitstream using the functional modules. When idle time occurs in a first core among the cores during video decoding, a second core that has remaining work related to video decoding assigns part of that remaining work to the first core to reduce the idle time.

In addition, a video decoding method based on a multi-core processor according to an embodiment of the present invention includes: storing an input bitstream and loading functional modules for performing video decoding; generating jobs using the input bitstream and the functional modules, and queuing the generated jobs in a buffer so that they can be distributed to the cores according to the corresponding function; performing video decoding on the input bitstream using the functional modules, the video decoding being performed by a multi-core processor having a plurality of cores; and, when idle time occurs in a first core among the cores during video decoding, having a second core that has remaining work related to video decoding assign part of that remaining work to the first core.

According to the present invention, video decoding performance can be improved by balancing the load among cores in a multi-core processor environment.

In addition, because the present invention takes into account the dependencies that exist between the main operations, the functional modules of video decoding can be allocated dynamically to each core.

The advantages and features of the present invention, and the methods of achieving them, will become clear from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in many different forms. These embodiments are provided only so that the disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

An embodiment of the present invention is described in detail below with reference to the accompanying drawings.

FIGS. 4 and 5 illustrate the concept of the dynamic load balancing technique according to the present invention. Of these, FIG. 4 shows how a multi-core processor conventionally performs video decoding. Here, cores #1 and #2 handle only entropy decoding (ED), inverse quantization (IQ), and inverse transform (IT); core #3 handles only intra prediction (IP) and motion compensation (MC); and core #4 handles only deblocking.

In this case, core #4 finishes its work at t₁ and cores #1 and #2 finish at t₂, but none of them can process the next picture (or the next several macroblocks) until core #3 finishes at t₃, so they remain idle. As a result, the overall performance of the multi-core processor is not fully used.

FIG. 5 shows an example in which the dynamic load balancing technique according to the present invention is applied to the environment of FIG. 4. Applying functional partitioning to a multi-core processor as in FIG. 4 produces the idle time described above. However, if a second core that still has remaining work distributes part of that work to a first core when the first core becomes idle, this idle time can be eliminated or reduced.

When core #4 becomes idle at t₁, it notifies core #3 of this fact. Core #3 then assigns part of its remaining work to core #4. Core #3 can likewise assign some of its work to cores #1 and #2 as well as to core #4. When the excess load of a particular core is dynamically balanced across the other cores in this way, all cores can finish their work together at t₄.
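
The notify-and-reassign idea can be sketched as a small discrete-time simulation. This is a minimal model under stated assumptions, not the disclosed implementation: every job is assumed independent and to take one tick, an idle core asks the core with the most remaining jobs, and that donor hands over half of its remaining jobs. The function name `run` and the job counts are hypothetical.

```python
from collections import deque

def run(job_counts, balance):
    """Simulate cores that each finish one unit job per tick; return total ticks."""
    queues = [deque(range(n)) for n in job_counts]
    ticks = 0
    while any(queues):
        if balance:
            for idle in queues:
                if not idle:
                    # The idle core "notifies" the busiest core, which hands
                    # over half of its remaining jobs (assumed independent).
                    donor = max(queues, key=len)
                    for _ in range(len(donor) // 2):
                        idle.append(donor.pop())
        # Every core that has work completes one job this tick.
        for q in queues:
            if q:
                q.popleft()
        ticks += 1
    return ticks

static_ticks = run([3, 4, 9, 2], balance=False)
balanced_ticks = run([3, 4, 9, 2], balance=True)
print(static_ticks, balanced_ticks)
```

In this toy model the balanced run finishes in fewer ticks because the heavily loaded core no longer sets the completion time alone. A real decoder would additionally have to respect the dependencies between jobs discussed elsewhere in the description.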

FIG. 6 shows a video decoding process according to an embodiment of the present invention.

The symbol decoder 61 performs lossless decoding on the input bitstream and obtains motion vectors and texture data. Lossless decoding methods include Huffman decoding, arithmetic decoding, and variable length decoding. In general, the motion vector of a particular macroblock depends on the motion vectors of neighboring macroblocks. That is, the motion vector of the particular macroblock cannot be obtained without first obtaining the motion vectors of its neighbors. If data with such dependencies is processed on different cores, parallel processing becomes impossible. In the dynamic load balancing of the present invention, jobs therefore need to be assigned so that mutually dependent data is processed on a single core wherever possible.
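
The motion vector dependency can be illustrated with a short sketch. It assumes, for illustration only, that a motion vector is reconstructed as a decoded difference plus a predictor formed from the component-wise median of three neighboring motion vectors, which is the kind of scheme used in H.264; the text itself does not specify the predictor, and the function names are hypothetical.

```python
def median3(a, b, c):
    """Median of three scalar values."""
    return sorted((a, b, c))[1]

def reconstruct_mv(mvd, left, top, top_right):
    """Rebuild a motion vector from its decoded difference and three neighbors.

    The neighbors' vectors must already be known, which is the dependency
    that makes it awkward to split these blocks across cores.
    """
    pred_x = median3(left[0], top[0], top_right[0])
    pred_y = median3(left[1], top[1], top_right[1])
    return (pred_x + mvd[0], pred_y + mvd[1])

mv = reconstruct_mv(mvd=(1, -1), left=(4, 0), top=(2, 2), top_right=(8, -2))
```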

The texture data thus obtained is provided to the inverse quantization unit 62, and the motion vector is provided to the motion compensation unit 65.

The inverse quantization unit 62 inverse-quantizes the texture data provided by the symbol decoder 61. Inverse quantization restores, from the indices generated during quantization, the values that match them, using the quantization table that was used in the quantization process.
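
A minimal sketch of this step follows, assuming the simplest possible table-based scheme in which each index is multiplied by the step size stored at the same position of the quantization table. Actual standards define more elaborate scaling rules; the function name and the sample values are hypothetical.

```python
def dequantize(indices, table):
    """Restore coefficient values from quantization indices.

    Each index is scaled by the step size at the same position in the
    quantization table (a simplifying assumption for illustration).
    """
    return [
        [idx * step for idx, step in zip(index_row, table_row)]
        for index_row, table_row in zip(indices, table)
    ]

coefficients = dequantize([[2, -1], [0, 3]], [[8, 10], [10, 12]])
```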

The inverse transform unit 63 performs an inverse transform on the inverse-quantized result. Specific inverse transform methods include the inverse DCT and the inverse wavelet transform. The inverse-transformed result, that is, the reconstructed high-frequency image, is provided to the adder 66.

The motion compensation unit 65 generates a predicted image by motion-compensating at least one reference frame (already reconstructed and stored in the buffer 64) using the motion vector for the current macroblock provided by the symbol decoder 61. When motion compensation is performed in units of 1/2 pixel or 1/4 pixel, the interpolation needed to generate the predicted image requires a very large amount of computation. Also, when motion compensation uses two reference frames, the average of the two motion-compensated macroblocks is computed, and a dependency then exists between those macroblocks. These macroblocks therefore need to be processed on a single core.
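
The two-reference case can be sketched as a sample-wise average of two motion-compensated blocks. The rounding rule (add one before halving) is an assumption chosen for the example, and `average_prediction` is a hypothetical name. Both inputs must be available before the average can be taken, which is why the text keeps them on one core.

```python
def average_prediction(block_a, block_b):
    """Average two motion-compensated blocks sample by sample, rounding half up."""
    return [
        [(a + b + 1) // 2 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(block_a, block_b)
    ]

predicted = average_prediction([[100, 50]], [[101, 60]])
```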

The adder 66 adds the high-frequency image provided by the inverse transform unit 63 to the generated predicted image to reconstruct the image of the current macroblock. The deblocking unit 67 applies a deblocking filter to the reconstructed image to remove its block artifacts. Because the reconstructed image is generally processed in units of macroblocks, noise appears at macroblock boundaries; this noise is called a block artifact. Block artifacts tend to grow as the compression ratio of the video data increases. The reconstructed image that has passed through the deblocking filter is temporarily stored in the buffer 64 and may be used to reconstruct other images.
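
The adder stage can be sketched as follows. The sketch assumes 8-bit samples and clips the sum to the range 0 to 255; the clipping range and the function name are assumptions for illustration and are not stated in the text.

```python
def reconstruct(prediction, residual, max_value=255):
    """Add the residual to the predicted block and clip to the valid sample range."""
    return [
        [min(max(p + r, 0), max_value) for p, r in zip(pred_row, res_row)]
        for pred_row, res_row in zip(prediction, residual)
    ]

block = reconstruct([[250, 10, 128]], [[10, -20, 5]])
```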

Not every macroblock is reconstructed through motion compensation. Some macroblocks are coded using intra prediction (IP); these are called intra macroblocks. Intra prediction reconstructs the current macroblock using the image of other, adjacent macroblocks in the same frame. In this case the current macroblock depends on those other macroblocks, so they need to be processed on a single core.
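
To show why an intra macroblock depends on its neighbors, the sketch below implements a DC-style prediction in which every sample of the block is predicted as the rounded mean of the already reconstructed samples directly above and to the left. This is only one possible intra mode, chosen here as an assumption; the text does not name a specific mode, and `dc_predict` is a hypothetical helper.

```python
def dc_predict(top, left, size):
    """Predict a size x size block as the rounded mean of its neighboring samples."""
    neighbours = list(top) + list(left)
    dc = (sum(neighbours) + len(neighbours) // 2) // len(neighbours)
    return [[dc] * size for _ in range(size)]

predicted_block = dc_predict(top=[10, 20], left=[30, 40], size=2)
```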

FIG. 7 is a block diagram of an overall system according to an embodiment of the present invention. The system may represent a TV, a set-top box, a desktop, laptop, or palmtop computer, a personal digital assistant (PDA), or a video or image storage device (for example, a video cassette recorder (VCR) or a digital video recorder (DVR)). The system may also represent a combination of these devices, or one of them included as part of another device. The system may include at least one video source 71, at least one input/output device 72, a multi-core processor 110, a memory 120, and a display device 73.

The video source 71 may represent a TV receiver, a VCR, or another video storage device. The source 71 may also represent one or more network connections for receiving video from a server over the Internet, a wide area network (WAN), a local area network (LAN), a terrestrial broadcast system, a cable network, a satellite communication network, a wireless network, a telephone network, or the like. The source may further represent a combination of these networks, or one of them included as part of another network. The video source 71 thus refers to the path through which video data is obtained, but it may also refer to the bitstream itself, compressed by a given video compression algorithm.

The input/output device 72, the multi-core processor 110, and the memory 120 communicate through a communication medium 76. The communication medium 76 may represent a communication bus, a communication network, or one or more internal connection circuits. Input video data received from the source 71 may be processed by the multi-core processor 110 according to one or more software programs stored in the memory 120. The multi-core processor 110 is an integrated circuit in which two or more cores are combined to provide higher performance, lower power consumption, and more efficient simultaneous processing of multiple tasks.

The software programs may be executed by the multi-core processor 110 to generate the output video provided to the display device 73. The display device 73 may be implemented as a liquid crystal display (LCD), a light-emitting diode (LED) display, an organic light-emitting diode (OLED) display, a plasma display panel (PDP), or any other form of image display means known in the art.

The software programs stored in the memory 120 include a video decoder module for performing the decoding process shown in FIG. 6. The video decoder may be stored in the memory 120, read from a storage medium such as a CD-ROM or floppy disk, or downloaded from a server over various networks. The software may be replaced by hardware circuitry or by a combination of software and hardware circuitry. The memory 120 also includes a buffer or queue that temporarily stores data before it is processed.

FIG. 8 is a block diagram showing the configuration of a multi-core processor apparatus 100 that provides dynamic load balancing according to an embodiment of the present invention.

The multi-core processor apparatus 100 may include a multi-core processor 110, a memory 120, a buffer 130, and a video decoder module 140. The multi-core processor apparatus 100 basically uses the functional partitioning method shown in FIG. 1. That is, each core is primarily responsible only for its assigned function. However, to eliminate or reduce idle time, each core notifies the other cores when it has completed its work, and the other cores then split off part of their work in progress and assign it to the core that has finished.

The video decoder module 140 is video decoding software for performing the video decoding process of FIG. 6. The video decoder module 140 may consist of functional modules such as a symbol decoder, an inverse quantization unit, an inverse transform unit, and a motion compensation unit. The video decoder module may be video decoding software that conforms to a predefined video coding/decoding standard such as MPEG-2, MPEG-4, or H.264.

The memory 120 stores the input bitstream and loads the functional modules of the video decoder module 140. The bitstream is video data compressed at a video encoder (not shown). The memory 120 may be implemented as a non-volatile memory device such as ROM, PROM, EPROM, EEPROM, or flash memory; a volatile memory device such as RAM; a storage medium such as a hard disk or optical disc; or any other form known in the art.

The buffer 130 is storage that temporarily holds the jobs or image block data that the multi-core processor 110 has to process. The buffer 130 may be implemented as part of the memory 120 or as a storage means separate from the memory 120.

The multi-core processor 110 consists of at least two cores. In the embodiment of FIG. 8 the multi-core processor 110 consists of three cores, but it may of course be implemented with two cores or with four or more. Assuming the overall video decoding process is divided into three functions, namely symbol decoding, inverse quantization and inverse transform, and motion compensation, the following description assumes that the first core 111 handles symbol decoding, the second core 112 handles motion compensation (MC), and the third core 113 handles inverse quantization and inverse transform (IQ/IT).

The first core 111 reads the functional modules loaded in the memory 120 and the bitstream, and queues jobs in the buffer 130. The minimum unit of a job queued in the buffer 130 is a sub-block, which is a constituent of a macroblock and to which a motion vector is assigned. The cores 111, 112, and 113 first perform, from among the jobs queued in the buffer 130, the jobs that correspond to the functions they are each responsible for under the functional partitioning method. However, when a core becomes idle, part of the work of a core that is still busy needs to be assigned to the idle core.

FIG. 9 is a sequence diagram showing a concrete example of implementing dynamic load balancing among the cores.

The first core 111 sends the second core 112 a control message (Do_MC(N)) instructing it to perform motion compensation (S2). Here N is the number of image blocks stored in the buffer 130 and awaiting processing (in the present invention, an image block means at least one macroblock or sub-block). The first core 111 then sends the third core 113 a control message (Do_IQ/IT) instructing it to perform inverse quantization and inverse transform (S4).

From this point on, each core performs the work it is responsible for. That is, the first core 111 performs symbol decoding (S6), the second core 112 performs motion compensation (S8), and the third core 113 performs inverse quantization and inverse transform (S10). Inverse quantization and inverse transform can be performed only after symbol decoding has finished; however, if the symbol decoding in S6 is that of the image block following the current one, symbol decoding of the current image block may already be complete. When symbol decoding is complete, the first core 111 queues the tasks that the other cores must perform in the buffer 130. The first core 111 may queue all tasks in a single buffer, or it may create a separate buffer for each core and queue the tasks each core needs separately, by function.
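The per-core buffer option described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the task dictionaries, the `function` labels, and the name `queue_by_function` are all assumptions introduced here.

```python
from collections import deque

# Hypothetical function labels for the two worker cores; the text names the
# functions (MC, IQ/IT) but does not prescribe any data layout.
FUNCTIONS = ("MC", "IQ/IT")

def queue_by_function(tasks):
    """Split a flat task list into one FIFO buffer per function."""
    buffers = {name: deque() for name in FUNCTIONS}
    for task in tasks:
        buffers[task["function"]].append(task)
    return buffers

# Illustrative tasks produced after symbol decoding of two sub-blocks.
tasks = [
    {"block": 0, "function": "MC"},
    {"block": 0, "function": "IQ/IT"},
    {"block": 1, "function": "MC"},
    {"block": 1, "function": "IQ/IT"},
]
buffers = queue_by_function(tasks)
```

With a single shared buffer, each core would instead have to filter the queue for tasks matching its own function.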

While the cores are working according to the functional partitioning scheme, when the third core 113 finishes its inverse quantization and inverse transform work, it sends the first core 111 a control message (IQ/IT_Done) indicating that the work is complete (S12). The third core 113 thereby becomes idle.

Meanwhile, the third core 113 sends the second core 112 a signal (SendSignal(IQ/IT_Done)) notifying it that the inverse quantization and inverse transform work is complete (S14).

The second core 112 splits off a portion (p) of its current remaining work (S16) and sends the third core 113 a signal (SendSignal(Do_MC(p))) telling it to perform that portion (S18). How much of its remaining work the second core 112 hands to another core can be chosen freely, but a simple basis is the remaining amount of work divided by (number of idle cores + 1), because, if the cores have equal performance, this allocation minimizes idle time. For example, if the total amount of motion compensation work to be performed is N and the second core 112 has completed m so far, the second core 112 can allocate half of the remaining work, (N-m)/2, to the third core 113.
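The division rule above can be written out as a short sketch. The function name `split_remaining` and the handling of a remainder (the busy core keeps any leftover tasks) are assumptions made for illustration; the text specifies only the remaining / (idle cores + 1) basis.

```python
def split_remaining(total, done, idle_cores=1):
    """Return (kept_by_busy_core, share_per_idle_core).

    total      -- N, the total amount of motion compensation work
    done       -- m, the amount the busy core has already completed
    idle_cores -- number of cores currently idle
    """
    remaining = total - done
    # Each idle core receives remaining / (idle_cores + 1), rounded down.
    share = remaining // (idle_cores + 1)
    # Assumption: the busy core keeps whatever is left after the handout.
    kept = remaining - share * idle_cores
    return kept, share

# N = 20, m = 8, one idle core: 12 tasks remain and are split 6 / 6.
print(split_remaining(20, 8))
# N = 20, m = 9, two idle cores: 11 tasks remain, 3 per idle core, 5 kept.
print(split_remaining(20, 9, idle_cores=2))
```

The rule assumes cores of equal performance; with unequal cores a weighted split would be needed, which the text does not describe.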

The second core 112 and the third core 113 then each perform the motion compensation work allocated to them (S20, S22). Each of the cores 112 and 113 extracts the tasks allocated to it from among the tasks queued in the buffer 130 and executes them. To this end, a core may set a check bit in advance on those queued tasks that it will work on.
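One way to picture the check-bit idea is shown below. This is a single-threaded sketch under stated assumptions: the `owner` field stands in for the check bit, and the names `mark_tasks` and `take_marked` are hypothetical. A real multicore implementation would also need synchronization on the shared buffer, which is omitted here.

```python
def mark_tasks(buffer, task_ids, core_id):
    """Mark the given queued tasks as belonging to core_id."""
    for task in buffer:
        if task["id"] in task_ids:
            task["owner"] = core_id  # plays the role of the check bit

def take_marked(buffer, core_id):
    """Remove and return the tasks previously marked for core_id."""
    mine = [t for t in buffer if t["owner"] == core_id]
    buffer[:] = [t for t in buffer if t["owner"] != core_id]
    return mine

# Six queued motion compensation tasks, none claimed yet.
buffer = [{"id": i, "owner": None} for i in range(6)]
mark_tasks(buffer, {0, 1, 2}, core_id=2)  # second core keeps tasks 0..2
mark_tasks(buffer, {3, 4, 5}, core_id=3)  # third core receives tasks 3..5
core3_tasks = take_marked(buffer, core_id=3)
```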

When the third core 113 finishes the motion compensation work allocated to it, it sends the second core 112 a signal (SendSignal(MC_Done)) to that effect (S24).

Then, once the motion compensation work allocated to itself is also complete, the second core 112 sends the first core 111 a control message (MC_Done) indicating that the entire motion compensation work is complete (S26).
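The message exchange of steps S2 to S26 can be summarized as an ordered trace. The sketch below only lists the messages named in the text, in the order the steps are numbered; the tuple layout, the core labels, and the function name `fig9_trace` are assumptions, and the handed-over amount uses the one-idle-core case of the division rule.

```python
def fig9_trace(total_mc, done_mc):
    """Return (step, sender, receiver, message) tuples for the exchange."""
    # One idle core, so the handed-over portion p is remaining / (1 + 1).
    share = (total_mc - done_mc) // 2
    return [
        ("S2", "core1", "core2", f"Do_MC({total_mc})"),
        ("S4", "core1", "core3", "Do_IQ/IT"),
        ("S12", "core3", "core1", "IQ/IT_Done"),
        ("S14", "core3", "core2", "SendSignal(IQ/IT_Done)"),
        ("S18", "core2", "core3", f"SendSignal(Do_MC({share}))"),
        ("S24", "core3", "core2", "SendSignal(MC_Done)"),
        ("S26", "core2", "core1", "MC_Done"),
    ]

trace = fig9_trace(total_mc=20, done_mc=8)
for step, sender, receiver, message in trace:
    print(f"{step}: {sender} -> {receiver}: {message}")
```

Steps S6 to S10, S16, S20, and S22 are local processing steps rather than messages, so they do not appear in the trace.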

The above describes an example of applying dynamic load balancing according to an embodiment of the present invention to the existing functional partitioning scheme. If all tasks queued in the buffer 130 are independent tasks with no dependencies on one another, this poses no problem. If, however, some of the tasks queued in the buffer 130 have dependencies, the procedure of FIG. 9 needs to be modified slightly.

First, when queuing tasks in the buffer 130, the first core 111 either identifies the tasks that have dependencies and marks them with a separate check bit, or queues interdependent tasks and independent tasks in separate buffers. For example, assume that the third core 113 is idle and that the queued tasks for which the second core 112 is responsible are as shown in FIG. 10. Tasks 3 to 5 are interdependent, so it is preferable that they be processed on one core. Accordingly, of the 12 remaining tasks in total, the second core 112 can allocate the interdependent tasks 6 to 9 together with tasks 1 and 2 to itself, and allocate the remaining six tasks to the third core 113. Conversely, it may allocate the former set to the third core 113 and the latter set to itself.

In this way, even though interdependent tasks are allocated to a single core, the load can still be balanced overall by distributing the remaining independent tasks appropriately.
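A dependency-aware split of this kind can be sketched with a simple greedy rule: treat each set of interdependent tasks as an indivisible group and give each group to the core that currently holds fewer tasks. The greedy rule and the name `assign_groups` are assumptions for illustration, not the method of the invention, and the grouping below (tasks 3 to 5 and 6 to 9 as groups, the rest independent) is inferred from the example because FIG. 10 is not reproduced in the text.

```python
def assign_groups(groups, cores=2):
    """Assign whole groups of tasks to cores, largest group first."""
    loads = [[] for _ in range(cores)]
    for group in sorted(groups, key=len, reverse=True):
        # Pick the core with the fewest tasks so far (first one on a tie).
        target = min(loads, key=len)
        target.extend(group)
    return loads

groups = [
    [1], [2],        # independent tasks
    [3, 4, 5],       # interdependent group
    [6, 7, 8, 9],    # interdependent group
    [10], [11], [12],
]
core_a, core_b = assign_groups(groups)
print(len(core_a), len(core_b))
```

The resulting split differs in detail from the one given in the example, but it shares the two properties that matter: each interdependent group stays on one core, and the two cores receive six tasks each.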

As described above, the multicore processor apparatus 100 according to the present invention applies functional partitioning together with dynamic load balancing to minimize the idle time that arises in some cores. The cores, however, do not necessarily all work on the same image block at the same time. FIG. 11 illustrates an embodiment of the present invention in terms of a pipeline.

In FIG. 11, the operations performed on the same image block are shaded. That is, when core #1 completes symbol decoding of the current image block in interval 1, core #2 queues a plurality of tasks in the buffer in interval 2 on the basis of the decoded symbols. In interval 3, core #3 performs motion compensation for the current image block, notices that idle time has arisen in core #2, and allocates part of the motion compensation work to core #2.

When interval 3 has elapsed and motion compensation is complete, core #1 performs inverse quantization (IQ), inverse transform (IT), intra prediction (IP), and related processing for the current image block at a certain time within interval 4. Finally, in interval 5, core #4 performs deblocking on the current image block to remove its blocking artifacts, thereby reconstructing one image block (the first image block). With the same pipeline structure, another image block (the second image block) is reconstructed once interval 6 has elapsed.

Table 1 below summarizes experimental results for video decoding on a multicore processor using the existing functional partitioning scheme and using the dynamic load allocation scheme according to the present invention.

Table 1
                     Core #1            Core #2
Prior art            13.45 ms/frame     39.95 ms/frame
Present invention    26.40 ms/frame     27.01 ms/frame

In this experiment, core #2 performed only motion compensation work and core #1 performed all work other than motion compensation. A motion compensation technique with a very heavy computational load, such as quarter-pixel motion compensation, was used. Table 1 shows that with the prior art core #1 is idle for 26.5 ms by the time core #2 finishes its work, whereas with the present invention core #1 is idle for only 0.61 ms.
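The idle times quoted above follow directly from the per-frame figures in Table 1: the core that finishes first waits for the difference between the two times. The short check below reproduces that arithmetic; the dictionary layout and the name `idle_time_ms` are illustrative only.

```python
# Per-frame processing times from Table 1, in milliseconds: (core #1, core #2).
table1 = {
    "prior art": (13.45, 39.95),
    "present invention": (26.40, 27.01),
}

def idle_time_ms(core1_ms, core2_ms):
    """Idle time of the faster core, i.e. the gap between the two cores."""
    return round(abs(core2_ms - core1_ms), 2)

for name, (core1_ms, core2_ms) in table1.items():
    print(f"{name}: core #1 idle for {idle_time_ms(core1_ms, core2_ms)} ms")
```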

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention can be carried out in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

FIG. 1 illustrates the workflow under a conventional functional partitioning scheme.

FIG. 2 shows the load imbalance that arises in the conventional functional partitioning scheme.

FIG. 3 illustrates a conventional data partitioning scheme.

FIG. 4 shows how a multicore processor conventionally performs video decoding.

FIG. 5 shows an example of applying dynamic load balancing according to the present invention in the environment of FIG. 4.

FIG. 7 is a configuration diagram of an overall system according to an embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a multicore processor apparatus that provides dynamic load balancing, according to an embodiment of the present invention.

FIG. 9 is a sequence diagram showing a concrete example of implementing dynamic load balancing among the cores.

FIG. 10 shows an example in which the tasks queued in the buffer include both dependent tasks and independent tasks.

FIG. 11 illustrates an embodiment of the present invention in terms of a pipeline.

(Description of reference numerals for the main parts of the drawings)

61: symbol decoder    62: inverse quantization unit

63: inverse transform unit    65: motion compensation unit

66: adder    67: deblocking unit

71: video source    72: input/output device

73: display device    100: multicore processor apparatus

110: multicore processor    120: memory

130: buffer    140: video decoder module

Claims

A multicore processor apparatus comprising:

a video decoder module composed of functional modules for performing video decoding;

a memory that stores an input bitstream and into which the functional modules are loaded; and

a multicore processor comprising a plurality of cores that perform video decoding on the input bitstream using the functional modules,

wherein, when idle time occurs in a first core among the cores during the video decoding, the first core transmits a signal indicating that the idle time has occurred to a second core, and the second core, which has remaining work related to video decoding, divides the remaining work based on the number of cores in which idle time has occurred and allocates the divided work to the second core and to the core in which the idle time has occurred.

The apparatus of claim 1,

wherein the functional modules are functional modules of the H.264 standard.

The apparatus of claim 1,

wherein the functional modules include a symbol decoding module, an inverse quantization module, an inverse transform module, and a motion compensation module.

The apparatus of claim 1, wherein the multicore processor

further comprises a third core that generates tasks using the input bitstream and the functional modules and queues the generated tasks in a buffer so that the tasks are distributed to the cores according to the corresponding function.

The apparatus of claim 4, wherein the third core

separates tasks having dependencies from among the tasks and queues them in the buffer or in a separate buffer.

(Deleted)

The apparatus of claim 1, wherein the first core,

when the task allocated to it is completed, transmits a signal indicating the completion to the second core.

The apparatus of claim 1,

wherein interdependent tasks among the remaining work are allocated to one core.

The apparatus of claim 1,

wherein the remaining work belongs to motion compensation work.

A video decoding method based on a multicore processor, the method comprising:

storing an input bitstream and loading functional modules for performing video decoding;

generating tasks using the input bitstream and the functional modules, and queuing the generated tasks in a buffer so that the tasks are distributed to cores according to the corresponding function;

performing video decoding on the input bitstream using the functional modules, wherein the video decoding is performed by a multicore processor having a plurality of cores;

when idle time occurs in a first core among the cores during the video decoding, transmitting a signal indicating that the idle time has occurred to a second core; and

allocating, by the second core, which has remaining work related to video decoding, the remaining work divided based on the number of cores in which idle time has occurred, to the second core and to the core in which the idle time has occurred.

The method of claim 11,

wherein the functional modules are functional modules of the H.264 standard.

The method of claim 11,

wherein the functional modules include a symbol decoding module, an inverse quantization module, an inverse transform module, and a motion compensation module.

The method of claim 13, wherein the queuing

comprises separating tasks having dependencies from among the tasks and queuing them in the buffer or in a separate buffer.

(Deleted)

The method of claim 11,

further comprising, when the allocated task is completed, transmitting a signal indicating the completion to the second core.

The method of claim 11,

wherein interdependent tasks among the remaining work are all allocated to one core.

The method of claim 11,

wherein the remaining work belongs to motion compensation work.