KR100556341B1

KR100556341B1 - Vedeo decoder system having reduced memory bandwidth

Info

Publication number: KR100556341B1
Application number: KR1020040002199A
Authority: KR
Inventors: 강해용
Original assignee: (주)씨앤에스 테크놀로지
Priority date: 2004-01-13
Filing date: 2004-01-13
Publication date: 2006-03-03
Also published as: KR20050074011A

Abstract

The present invention relates to a video decoder system for multimedia signal processing. More specifically, it provides a video decoder system with reduced memory bandwidth that can support an H.264 HD-class video decoder by incorporating separate dual buses and memory controllers.

In the present invention, one bus uses the memory controller attached to it to perform read/write operations on the external memory for inter-mode prediction, while the other bus uses a second memory controller attached to it so that the remaining blocks can read from and write to the external memory. In addition, the memory controller serving inter-mode prediction has a 96-bit data bus, and the memory controller through which the remaining blocks read from and write to the external memory has a 32-bit data bus.

Memory bandwidth, external memory, dual bus, dual memory controller, inter mode

Description

Video decoder system with reduced memory bandwidth {Vedeo decoder system having reduced memory bandwidth}

FIG. 1 is an overall block diagram of a video decoder system according to the present invention.

FIGS. 2A and 2B illustrate the write path and the read path when one frame is decoded.

FIG. 3A shows the 4x4 block word structure of the reference frame memory, and FIG. 3B shows the 4x4 block word structure of the frame memory.

FIG. 4 illustrates the read/write paths when FMO is enabled or disabled in the DF block.

FIG. 5 illustrates conventional quarter-sample luma interpolation in inter-mode prediction.

* Explanation of reference numerals for the main parts of the drawings *

100: video decoder    200a, 200b: external memory

300a, 300b: dual buses    400a, 400b: dual memory controllers

The present invention relates to a video decoder system for multimedia signal processing and, more specifically, to a video decoder system with reduced memory bandwidth that can support an H.264 HD-class video decoder by incorporating separate dual buses and memory controllers.

When video or audio is converted to digital data, the volume of data is so large that storing it uncompressed consumes a great deal of space and is inefficient. Compression techniques that reduce the amount of digital data therefore became necessary; they improve both storage utilization and transmission over networks. For example, still-image compression formats include GIF and JPEG, and video and audio compression schemes include MPEG, H.263, and H.264.

Among these, H.264 is a new video compression coding standard developed by the Joint Video Team (JVT), formed jointly by the ITU and the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC). Also known as 'ISO/IEC 14496-10 Advanced Video Coding', it is a video compression standard designed for error resilience and network friendliness in view of rapidly changing wireless and Internet environments.

In an H.264 video decoder system, video data is coded or decoded in units of macroblocks (MBs), and an external memory stores the input video stream and the frames used for motion compensation.

The input video stream is divided into groups of pictures (GOPs), each a series of pictures or frames. Each GOP contains a number of pictures, and each picture is divided into a number of slices. Each slice contains a number of macroblocks, and each macroblock has four 8×8 luminance blocks and two 8×8 chrominance blocks. Two kinds of pictures may appear in a GOP: intra-mode pictures (I-pictures) and predictive motion-compensated pictures (P-pictures). In an I-picture, every macroblock is in intra mode and is decoded without motion compensation. A P-picture is compressed using one previous frame, and each of its macroblocks can be decoded in inter mode, which uses motion compensation.

Existing H.264 video decoder implementations that use a single bus, for the main profile as well as the baseline profile, demand an enormous memory bandwidth. Video decoding requires access to the external memory, which comprises the frame memory and the reference frame memory and holds entire frames. Because the current and previous frame memories are accessed very frequently during reconstruction, using a single system bus as in the prior art makes very inefficient use of that bus.

External frame memory accesses in an H.264 video decoder system comprise (1) memory access for variable-length decoding, (2) multiple-reference-frame memory access for inter-mode prediction at each block size (16x16, 8x8, 8x16, 16x8, 4x4), (3) memory access to neighboring macroblocks for intra-mode prediction, and (4) memory access for deblocking filtering. Of these, the memory bandwidth consumed by inter-mode prediction is larger than the bandwidth of all the other blocks combined.

FIG. 5 illustrates quarter-sample luma interpolation in inter-mode prediction.

Referring to FIG. 5, luma prediction in inter-mode prediction performs quarter-sample luma interpolation by applying a 6-tap filter (1, -5, 20, 20, -5, 1). For example, if the motion vector points between pixels G, H, M, and N, pixels A, B, C, D, E, F, I, J, K, L, M, N, P, Q, R, S, T, and U are needed to compute a, b, c, d, e, f, g, h, i, j, k, m, n, p, q, and r. A considerable amount of data must therefore be read from the external reference frame. An H.264 decoder can have a motion vector for every 4x4 block in inter-mode prediction, so in the worst case quarter-sample luma interpolation is performed per 4x4 block and accesses to the external frame memory increase accordingly.
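The half-sample step of this interpolation can be sketched as follows. This is a minimal Python illustration and not part of the patent: the tap weights come from the text above, while the rounding offset, the 5-bit shift, the clipping to 8 bits, and the averaging used for quarter samples follow the luma interpolation rule of the H.264 standard and are assumptions as far as this document is concerned. The function names are hypothetical.

```python
# Tap weights quoted in the text: (1, -5, 20, 20, -5, 1); they sum to 32.
SIX_TAP = (1, -5, 20, 20, -5, 1)


def clip_8bit(value):
    """Clamp an intermediate result to the 8-bit sample range (assumed)."""
    return max(0, min(255, value))


def half_sample(samples):
    """Interpolate one half-sample value from six integer-position samples.

    `samples` holds six neighbouring luma samples along a row or a column.
    Rounding (+16) and normalisation (>> 5) follow H.264 and are assumptions here.
    """
    if len(samples) != 6:
        raise ValueError("a 6-tap filter needs exactly six input samples")
    acc = sum(tap * s for tap, s in zip(SIX_TAP, samples))
    return clip_8bit((acc + 16) >> 5)


def quarter_sample(a, b):
    """Average two neighbouring samples with rounding (assumed H.264 rule)."""
    return (a + b + 1) >> 1


if __name__ == "__main__":
    row = [100, 100, 100, 100, 100, 100]
    print(half_sample(row))          # a flat region stays flat
    print(quarter_sample(100, 103))  # rounded average of two samples
```

Each half sample consumes six integer samples in a row or a column, which is why a 4x4 block pulls in neighboring pixels on every side.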

Assume that, to support high definition (HD), the encoder assigns motion vectors to every 4x4 block of a frame and that decoding runs at 30 frames per second, with the decoder using 32-bit accesses. The memory bandwidth needed to perform luma sample interpolation from the reference picture is then as follows. Data is transferred using the SDRAM burst mode.

Luma sample interpolation of one 4x4 block requires 3 words * 9 lines (the 3 words and 9 lines are explained later). If reading one line in burst mode is taken to cost 10 cycles, a block costs 90 cycles. One macroblock contains sixteen 4x4 blocks, giving 90 cycles * 16. An HD frame has 120 * 67.5 macroblocks, giving 90 * 16 * 120 * 67.5 cycles per frame. Decoding at 30 frames per second therefore requires 90 * 16 * 120 * 67.5 * 30 cycles per second (about 300 MHz) of memory bandwidth. Supporting an H.264 HD-class decoder thus calls for a high-performance chip with a memory bandwidth of at least about 500 MHz.
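The arithmetic above can be reproduced with a few lines of Python. The sketch below only multiplies the factors given in the text; the function and parameter names are hypothetical. Evaluated exactly, the product is about 350 million cycles per second, which the text quotes as roughly 300 MHz.

```python
def luma_interp_cycles_per_second(cycles_per_4x4_block,
                                  blocks_per_macroblock=16,
                                  mb_columns=120,
                                  mb_rows=67.5,
                                  frames_per_second=30):
    """Worst-case cycles per second spent fetching luma reference data.

    All default values are the figures used in the text for an HD frame
    in which every 4x4 block carries its own motion vector.
    """
    return (cycles_per_4x4_block * blocks_per_macroblock
            * mb_columns * mb_rows * frames_per_second)


# 32-bit bus, burst mode: 9 lines per block at 10 cycles per line.
LINES_PER_BLOCK = 9
CYCLES_PER_LINE_32BIT = 10

if __name__ == "__main__":
    total = luma_interp_cycles_per_second(LINES_PER_BLOCK * CYCLES_PER_LINE_32BIT)
    print(f"{total / 1e6:.2f} million cycles per second")  # 349.92
```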

As shown above, existing H.264 video decoder systems require a large amount of data access, and the resulting memory bandwidth requirement (that is, the speed of storing and retrieving data) is regarded as a major problem.

One way to address this problem is to widen the data bus, for example from 32 bits to 64 bits. Computing the memory bandwidth for this approach, luma sample interpolation of one 4x4 block requires 2 words * 9 lines. If reading one line in burst mode (burst-2 access) is taken to cost 8 cycles, a block costs 72 cycles. One macroblock contains sixteen 4x4 blocks, giving 72 cycles * 16. An HD frame has 120 * 67.5 macroblocks, so 72 * 16 * 120 * 67.5 cycles are required per frame. Decoding at 30 frames per second takes 72 * 16 * 120 * 67.5 * 30 cycles per second (about 250 MHz). A large memory bandwidth is still required, and no reduction in memory bandwidth can be expected even if the data bus is widened to 128 bits.

A second way is simply to split the data bus in two using two memory controllers; that is, to separate the decoding read/write path for inter-mode prediction from the read/write path that the other blocks use to reach the external memory. However, this approach performs much the same as the first.

A video decoder system that can support an H.264 HD-class video decoder even at a low memory bandwidth is therefore needed.

The present invention addresses the problems described above. Its object is to minimize the load on each block by providing a separate bus and memory controller that control access to the external memory during inter-mode prediction.

As a result, a video decoder system is provided that can support an H.264 HD-class video decoder even at a low memory bandwidth.

A video decoder system according to the present invention comprises: a video decoder including a variable-length decoding block, an inverse discrete cosine transform/quantization block, an inter block, an intra block, and a deblocking filter block; an external memory, consisting of a frame memory and a reference frame memory, that stores data decoded by the video decoder or outputs stored data for decoding; and a first bus with a first memory controller and a second bus with a second memory controller that control access to the external memory. When one frame is decoded, the inter block reads data from the reference frame memory through the first bus and the first memory controller to perform inter-mode prediction, and the sub-blocks other than the inter block read data from the frame memory through the second bus and the second memory controller to perform their respective operations.

The present invention is described below with reference to the accompanying drawings, FIGS. 1 to 3.

FIG. 1 is an overall block diagram of a video decoder system according to the present invention. The system consists of a video decoder 100, external memories 200a and 200b, dual buses 300a and 300b that connect the components, and dual memory controllers 400a and 400b that control the external memories.

The video decoder 100 reconstructs the input video stream and consists of a variable length decoding (VLD) block 102, an inverse discrete cosine transform/quantization (IDCT/Q) block 104, an intra block 106, an inter block 108, and a deblocking filter (DF) block 110.

The VLD block 102 parses the compressed video bit stream that arrives coded by context-adaptive variable length coding (context-adaptive variable length decoding, CAVLD). The IDCT/Q block 104 performs the pixel-level frequency-to-time-domain transform and the inverse quantization of the quantized pixel data it receives. The intra block 106 decodes the current pixels of an intra-frame picture using their correlation with neighboring macroblocks. The inter block 108 decodes an inter-frame picture by comparison with the previous picture. The DF block 110 is designed to reduce the blocking artifacts between macroblocks that can arise when the video stream is reconstructed.

The external memories 200a and 200b store data decoded by the video decoder 100 or output stored data for decoding, and consist of a reference frame memory 200a and a frame memory 200b.

The dual buses 300a and 300b and the dual memory controllers 400a and 400b control access to the external memories 200a and 200b.

When one frame is decoded, the inter block 108 reads data from the reference frame memory 200a through the bus 300a and the memory controller 400a to perform inter-mode prediction, and the sub-blocks other than the inter block 108, such as the VLD block and the IDCT/Q block, read data from the frame memory 200b through the bus 300b and the memory controller 400b to perform their respective operations.

In summary, the present invention provides dual buses 300a and 300b and dual memory controllers 400a and 400b. One bus 300a uses the memory controller 400a to perform read/write operations on the external memories 200a and 200b for inter-mode prediction, and the other bus 300b uses the memory controller 400b so that the remaining blocks can read from and write to the external memories 200a and 200b. In addition, the memory controller 400a serving inter-mode prediction has a 96-bit data bus, and the memory controller 400b through which the remaining blocks read from and write to the external memory has a 32-bit data bus.

If the data bus to the reference frame memory 200a is made 96 bits wide for inter-mode prediction, the access time drops to one word per line, whereas 32-bit and 64-bit storage need three-word and two-word accesses, respectively. In addition, if the memory controller supports "window access" so that nine lines can be read at once, the memory bandwidth requirement is as calculated below. Here, "window access" means that the memory controller performs a window operation rather than a burst operation. It can be defined as follows: when the frame memory map is laid out so that one macroblock does not occupy consecutive column addresses, the controller uses SDRAM multi-bank operation and RAS/CAS timing to read that macroblock's non-consecutive column addresses as if they were consecutive.

Computing the memory bandwidth for this case, luma sample interpolation of one 4x4 block requires 1 word * 9 lines. With window-mode access instead of burst mode, the nine lines can be read in about 18 cycles. One macroblock contains sixteen 4x4 blocks, giving 18 cycles * 16. An HD frame has 120 * 67.5 macroblocks, so 18 * 16 * 120 * 67.5 cycles are required per frame. Decoding at 30 frames per second takes 18 * 16 * 120 * 67.5 * 30 cycles per second (about 70 MHz).
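To put the three configurations discussed in this description side by side, the sketch below evaluates the same product for each of them. The per-block cycle counts (90, 72, and 18) are taken from the text; the dictionary keys and function name are hypothetical. Evaluated exactly, the products are about 350, 280, and 70 million cycles per second, so the 96-bit window-access case needs one fifth of the cycles of the 32-bit burst case.

```python
# Cycles needed to fetch the reference data of one 4x4 block, per the text.
CYCLES_PER_BLOCK = {
    "32-bit bus, burst": 9 * 10,  # 9 lines at 10 cycles per line
    "64-bit bus, burst": 9 * 8,   # 9 lines at 8 cycles per line
    "96-bit bus, window": 18,     # 9 lines fetched together in about 18 cycles
}

BLOCKS_PER_MACROBLOCK = 16
MACROBLOCKS_PER_FRAME = 120 * 67.5  # HD frame, as counted in the text
FRAMES_PER_SECOND = 30


def cycles_per_second(cycles_per_block):
    """Worst-case luma fetch cycles per second for one configuration."""
    return (cycles_per_block * BLOCKS_PER_MACROBLOCK
            * MACROBLOCKS_PER_FRAME * FRAMES_PER_SECOND)


if __name__ == "__main__":
    for name, cycles in CYCLES_PER_BLOCK.items():
        print(f"{name}: {cycles_per_second(cycles) / 1e6:.1f} million cycles/s")
```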

The required memory bandwidth is markedly lower than with the conventional approaches of widening the bus or simply splitting it. As a result, an H.264 video decoder can support high definition (HD) while operating below 100 MHz. In the calculation above, chroma sample interpolation can run through either the memory controller that performs luma sample interpolation or the memory controller attached to the system bus, as chosen at system configuration time to suit the application's requirements.

Basic read/write paths

FIGS. 2A and 2B show the write path when one frame is decoded, and the read path, in which each sub-block (VLD, intra, inter, and so on) reads from either the frame memory or the reference frame memory as appropriate.

First, referring to FIG. 2A, the last block to operate on the write path when one frame is decoded is the DF block 110. The DF block 110 consists of an engine (not shown) that performs deblocking filtering and direct memory access (DMA) units (not shown) that store data in the external memory: one DMA unit attached to the bus 300a and one attached to the bus 300b. The DMA units operate as follows.

When the DF block 110 finishes, the DMA unit attached to the bus 300a stores data in the reference frame memory 200a through the memory controller 400a, and the DMA unit attached to the bus 300b stores data in the frame memory 200b through the memory controller 400b.

The read path uses the frame memory or the reference frame memory depending on which sub-block is operating. In FIG. 2B, sub-blocks such as the VLD block 102 and the intra block 106 read data from the frame memory 200b over the bus 300b and perform their respective operations (step 1), while the inter block 108 reads data from the reference frame memory 200a over the bus 300a and performs inter-mode prediction (step 2). During inter-mode prediction, chroma prediction may instead use the frame memory 200b, depending on memory bandwidth. In short, luma prediction always uses the reference frame memory 200a, and chroma prediction uses either the reference frame memory 200a or the frame memory 200b depending on the application.

Read/write paths at the end of decoding one macroblock

After inter-mode or intra-mode prediction, if the flag of the DF block 110 is set, the video decoder 100 stores the data in the external reference frame memory 200a after deblocking filtering. If the flag of the DF block 110 is not set, the data is stored in the external frame memory 200b immediately after inter or intra prediction. That is, the reference frame memory 200a holds the reference frame data for inter-mode prediction, and the frame memory 200b holds the predicted results obtained from inter or intra prediction.

First, for inter-mode prediction, the reference frame is read from the reference frame memory 200a over the bus 300a using the memory controller 400a, as in step 2 of FIG. 2B. Second, the result of inter or intra prediction is stored in the frame memory 200b over the bus 300b using the memory controller 400b, as in step 1 of FIG. 2A. Here, the currently decoded macroblock is stored in the frame memory 200b through the memory controller 400b and, at the same time, in the reference frame memory 200a through the memory controller 400a (this operation is detailed in the description of the storage method).

When a macroblock is decoded, if it is in inter mode the data stored in the reference frame memory 200a is read over the bus 300a using the memory controller 400a; if it is in intra mode, the data stored in the frame memory 200b is read using the memory controller 400b. Consequently, the memory bandwidth used by the memory controller 400a during inter-mode prediction does not affect the memory bandwidth of the memory controller 400b, so high definition (HD) can be fully supported even at the low frequency mentioned above.
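The read routing described in this section can be modeled in a few lines. The sketch below is only an illustration of which bus, controller, and memory each read goes through; the class, the function, its parameters, and the string labels are hypothetical, and the reference numerals are reused purely as identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPath:
    bus: str
    controller: str
    memory: str


# Path reserved for inter-mode prediction (96-bit data bus in the text).
INTER_PATH = MemoryPath("300a", "400a", "reference frame memory 200a")
# Path shared by all other sub-blocks (32-bit data bus in the text).
OTHER_PATH = MemoryPath("300b", "400b", "frame memory 200b")


def read_path(block, component="luma", chroma_on_second_bus=False):
    """Return the path a sub-block reads through.

    `block` is a label such as "inter", "intra", or "vld". Luma reads of the
    inter block always use the first path; chroma reads may be moved to the
    second path, which the text leaves as an application-dependent choice.
    """
    if block != "inter":
        return OTHER_PATH
    if component == "chroma" and chroma_on_second_bus:
        return OTHER_PATH
    return INTER_PATH


if __name__ == "__main__":
    for block in ("vld", "intra", "inter"):
        print(block, "->", read_path(block))
```

Because the two paths share no bus or controller, traffic on one does not consume cycles on the other, which is the point made in the preceding paragraph.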

Mapping of the reference frame memory

FIG. 3A shows the 4x4 block word structure of the reference frame memory 200a attached to the memory controller 400a, and FIG. 3B shows the 4x4 block word structure of the frame memory 200b attached to the memory controller 400b.

To perform luma sample interpolation of a 4x4 block, if the motion vector lies between pixels 0, 1, 4, and 5 as in FIG. 3A, the horizontal interpolation needs pixels l4, l0, 0, 1, 2, and 3 in one word (because, as already mentioned, a 6-tap filter is used). If the motion vector lies between pixels 3, 7, r0, and r1, the horizontal pixels needed for interpolation are 1, 2, 3, r0, r4, and r8. Along the horizontal axis, therefore, the left neighbors l4 and l0, the right neighbors r0, r4, and r8, and the pixels 0, 1, 2, and 3 decoded in the current 4x4 block are stored as a single word in the reference frame memory. With 32-bit storage three words would have to be read, but storing them in one word as above cuts the number of memory controller accesses to one third.

Next, consider the vertical direction. If the motion vector lies between pixels 0, 1, 4, and 5, vertical luma sample interpolation needs pixels u5 and u0 from the block above and pixels 0, 4, 8, and 12 decoded in the current 4x4 block. If the motion vector lies between pixels 12, 13, d0, and d4, the vertical interpolation needs pixels 4, 8, 12, d0, d4, and d8. Luma sample interpolation therefore uses the pixels d0, d4, and d8 adjacent below along the vertical axis together with pixels 4, 8, and 12 decoded in the current 4x4 block. In the end, nine lines are needed along the vertical axis to perform luma sample interpolation for a 4x4 block.
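The 3-word, 2-word, and 1-word figures and the nine-line count follow from the filter length. The sketch below works them out; it assumes 8-bit luma samples, which the text does not state explicitly, and the function names are hypothetical.

```python
import math

FILTER_TAPS = 6  # the 6-tap filter quoted in the text


def reference_window(block_width=4, block_height=4, taps=FILTER_TAPS):
    """Size of the reference area needed to interpolate one block.

    A filter with `taps` taps needs taps - 1 extra samples per axis
    (two before and three after the block for six taps).
    """
    extra = taps - 1
    return block_width + extra, block_height + extra


def words_per_line(pixels_per_line, bus_bits, bits_per_pixel=8):
    """Bus words needed to fetch one line of the window (8-bit samples assumed)."""
    return math.ceil(pixels_per_line * bits_per_pixel / bus_bits)


if __name__ == "__main__":
    width, height = reference_window()
    print(f"window: {width} pixels x {height} lines")
    for bus_bits in (32, 64, 96):
        print(f"{bus_bits}-bit bus: {words_per_line(width, bus_bits)} word(s) per line")
```

Nine 8-bit pixels occupy 72 bits, so they fit in a single 96-bit word but spill over two 64-bit words or three 32-bit words.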

Read/write paths when FMO is enabled or disabled in the DF block

One point to note is that an H.264 video decoder supports flexible macroblock ordering (FMO) as well as raster scan order. During H.264 decoding no problem arises if the flag of the DF block 110 is set, but if the flag is cleared the following sequence must be followed. The flag is derived from the distribution of the inverse quantization coefficients of the dequantized video data and from the motion vector representing the difference between the previous and current frames, and it indicates whether the decoded picture needs loop filtering.

As described above, after inter-mode or intra-mode prediction the result is stored in the frame memory 200b through the memory controller 400b. When decoding of one frame is complete, the DF block 110 reads data from the frame memory 200b in raster scan order using the memory controller 400b, even if the flag is cleared (step 1 of FIG. 4).

After the first macroblock has been read and the second macroblock is then read, the second is combined with the previously read first macroblock to form single words with the structure shown in FIG. 3, and the result is stored at the first macroblock position of the reference frame memory 200a using the memory controller 400a (step 2 of FIG. 4).

Next, the third macroblock is read from the frame memory 200b through the memory controller 400b, combined with the previously fetched first and second macroblocks as shown in FIG. 3, and then stored at the second macroblock position of the reference frame memory 200a using the memory controller 400b. To build a memory map like that of the reference frame memory in FIG. 3, an internal memory capable of holding two macroblocks is required. Organizing each word of the reference frame memory as 96 bits in this way reduces the memory access time during inter-mode prediction.
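The ordering of reads and writes described above can be illustrated with a minimal Python sketch. It models only the sequencing: which macroblock position of the reference frame memory is written after each read, and which buffered macroblocks contribute to it. The function name `df_write_schedule` is hypothetical, the 96-bit word layout of FIG. 3 is not modeled, and the behavior from the fourth macroblock onward (the two-macroblock internal buffer sliding forward) is an assumption extrapolated from the first two steps. The text does not say how the final macroblock position is written, so the sketch does not cover it.

```python
from collections import deque


def df_write_schedule(num_macroblocks):
    """Return (write_position, source_macroblocks) pairs, 1-indexed.

    Hypothetical model of the DF block sequence: macroblocks are read
    from the frame memory in raster scan order, and each newly read
    macroblock is combined with the buffered ones before a write to
    the reference frame memory.
    """
    # Internal memory holding at most two previously read macroblocks.
    buffered = deque(maxlen=2)
    schedule = []
    for mb in range(1, num_macroblocks + 1):
        if buffered:
            # Combine the buffered macroblocks with the new one and
            # write to the position of the previous macroblock.
            sources = tuple(buffered) + (mb,)
            schedule.append((mb - 1, sources))
        # Assumption: the oldest buffered macroblock is dropped once
        # two are already held.
        buffered.append(mb)
    return schedule


if __name__ == "__main__":
    for position, sources in df_write_schedule(4):
        print(f"write position {position} from macroblocks {sources}")
```

For three macroblocks the sketch reproduces the two writes described in the text: position 1 is built from macroblocks 1 and 2, and position 2 from macroblocks 1, 2, and 3.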

As described above, according to the present invention, providing a separate bus and memory controller to control access to the external memory during inter-mode prediction minimizes the load on each block and makes it possible to provide a video decoder system that supports H.264 HD-class video decoding at a clock frequency below 100 MHz.
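As a rough illustration of why the bus widths matter, the following Python sketch computes the theoretical peak transfer rate of each data bus. It assumes one full-width transfer per clock cycle with no refresh, bank-switching, or arbitration overhead, and it uses 100 MHz as an upper bound on the clock. These assumptions and the function name `peak_bandwidth_bytes_per_s` are illustrative only and are not figures stated in the source.

```python
def peak_bandwidth_bytes_per_s(bus_width_bits, clock_hz):
    """Theoretical peak rate, assuming one transfer per clock cycle."""
    return (bus_width_bits // 8) * clock_hz


if __name__ == "__main__":
    CLOCK_HZ = 100_000_000  # assumed upper bound ("below 100 MHz")
    for name, width in (("reference frame memory bus", 96),
                        ("frame memory bus", 32)):
        rate = peak_bandwidth_bytes_per_s(width, CLOCK_HZ)
        print(f"{name}: {width} bits -> {rate / 1e6:.0f} MB/s peak")
```

Under these assumptions the 96-bit bus moves 12 bytes per cycle against 4 bytes for the 32-bit bus, a threefold difference in peak rate for the inter-mode prediction path.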

The video decoder system according to the present invention can decode DVD-quality video smoothly even over the Internet, so it is expected to be useful for content services such as HD video on demand (VOD). It can also be applied to a variety of applications, including digital broadcast receivers and real-time video calls.

From the foregoing description, those skilled in the art will appreciate that various changes and modifications can be made without departing from the technical spirit of the present invention. Therefore, the technical scope of the present invention should not be limited to the contents described in the embodiments but should be defined by the claims.

Claims

A video decoder system comprising:

a video decoder comprising a variable length decoding block, an inverse discrete cosine transform/quantization block, an inter block, an intra block, and a deblocking filter block;

an external memory for storing data decoded by the video decoder or outputting stored data for decoding, the external memory comprising a frame memory and a reference frame memory; and

a first bus, a first memory controller, a second bus, and a second memory controller for controlling access to the external memory,

wherein, in decoding one frame, the inter block performs inter-mode prediction by reading data from the reference frame memory through the first bus and the first memory controller, and the sub-blocks other than the inter block read data from the frame memory through the second bus and the second memory controller and perform their corresponding operations.

The video decoder system of claim 1, wherein the first memory controller controls access to the reference frame memory using a 96-bit data bus, and the second memory controller controls access to the frame memory using a 32-bit data bus.

The video decoder system of claim 1, wherein the predicted result value in the inter block or intra block is stored in the frame memory through the second bus and the second memory controller.

The video decoder system of claim 1, wherein, when luma sample interpolation is performed in inter-frame prediction, one block is constructed in the horizontal direction using two pixels of the left block and five pixels of the right block, and in the vertical direction using two lines of the upper block and three lines of the lower block.

The video decoder system of claim 1, wherein, when chroma sample interpolation is performed in inter-frame prediction, either the frame memory or the reference frame memory can be used.

The video decoder system of claim 1, wherein, when FMO is activated, the deblocking filter block composes one macroblock using a memory inside the deblocking filter block when storing decoded data in the reference frame memory.

The video decoder system of claim 1, wherein the first and second memory controllers access one macroblock at consecutive column addresses using multi-bank, RAS, and CAS timings.