KR20070111467A

KR20070111467A - Scratch pad for storing intermediate loop filter data

Info

Publication number: KR20070111467A
Application number: KR1020077017514A
Authority: KR
Inventors: Bill Kwan; Erik Schlanger; Casey King; Raquel Rojas
Original assignee: Advanced Micro Devices, Inc.
Priority date: 2005-01-25
Filing date: 2006-01-17
Publication date: 2007-11-21
Also published as: US20060165164A1; GB0712488D0; US7792385B2; TW200701795A; TWI382764B; WO2006081098A1; DE112006000270T5; CN101160971A; GB2435788B; CN101160971B; GB2435788A; JP2008529412A

Abstract

A video processing apparatus and methodology are implemented as a combination of a processor and a video decoding hardware block to decode video data. The video decoding block is provided with an in-loop filter and a scratch pad memory, so that the in-loop filter can perform piecewise processing of overlap smoothing and in-loop deblocking on a macroblock basis, which is considerably more efficient than the frame-based method.

Description

Scratch pad for storing intermediate loop filter data

The present invention relates to video processing techniques. In one aspect, the present invention relates to the decompression of digital video information.

Because video information requires a large amount of storage space, it is generally compressed. Accordingly, to display compressed video information stored, for example, on a CD-ROM or DVD, the compressed video information must first be decompressed to provide decompressed video information. The decompressed video information is then provided to a display device as a bit stream. The decompressed bit stream of video information is typically stored as a bit map in memory locations corresponding to pixel locations on the display. The video information needed to present a single screen of information on the display is referred to as a frame. Because motion video is provided by displaying a series of frames, a goal of many video systems is to decode compressed video information quickly and efficiently.

Standardization of recording media, devices, and various aspects of data processing, such as video compression, is highly desirable for the continued growth of these technologies and their applications.

Many compression (and decompression) standards for video information have been developed or are under development, such as the Moving Picture Experts Group (MPEG) standards for video encoding and decoding (e.g., MPEG-1, MPEG-2, MPEG-3, MPEG-4, MPEG-7, MPEG-21) and the Windows Media Video compression standards (e.g., WMV9). Each of the MPEG and WMV standards is incorporated herein by reference in its entirety, as if fully set forth herein.

In general, video compression techniques include intraframe compression and interframe compression, which compress video information by reducing both the spatial and the temporal redundancy present in video frames. Intraframe compression techniques use only the information contained within a frame to compress that frame, which is called an I-frame. Interframe compression techniques compress frames with reference to preceding and/or subsequent frames; such frames are called predicted frames, P-frames, or B-frames. Intraframe and interframe compression techniques typically use spatial or block-based encoding, in which a video frame is divided into blocks for encoding (also referred to as a block transform process). For example, an I-frame is divided into 8×8 blocks. The blocks are coded using a discrete cosine transform (DCT) coding scheme, which encodes coefficients as the amplitudes of specific cosine basis functions, or using some other transform (e.g., an integer transform). The transformed coefficients are then quantized, which produces coefficients with nonzero amplitude levels and runs (or subsequences) of coefficients with zero amplitude levels. The quantized coefficients are then run-level encoded (or run-length encoded) to condense the long runs of zero coefficients.
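The run-level step described above can be illustrated with a minimal sketch. It assumes the quantized coefficients have already been arranged as a one-dimensional list, and it uses a hypothetical `"EOB"` (end of block) marker for the trailing zeros; neither the function names nor the marker come from the source or from any particular standard.

```python
def run_level_encode(coeffs):
    """Encode quantized coefficients as (run, level) pairs.

    `run` is the number of zeros preceding each nonzero `level`.
    Trailing zeros are replaced by a single "EOB" marker
    (a convention assumed for this sketch only).
    """
    pairs = []
    run = 0
    for c in coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    pairs.append("EOB")
    return pairs


def run_level_decode(pairs, length):
    """Invert run_level_encode for a block of `length` coefficients."""
    out = []
    for p in pairs:
        if p == "EOB":
            break
        run, level = p
        out.extend([0] * run)
        out.append(level)
    out.extend([0] * (length - len(out)))  # restore trailing zeros
    return out
```

A 64-coefficient block with only four nonzero values collapses to four pairs plus the marker, which is why long zero runs compress well.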

The results are then entropy coded in a variable length coder (VLC), which uses a statistical coding technique that assigns codewords to the values to be encoded, or entropy coded using other techniques such as Context-based Adaptive Binary Arithmetic Coding (CABAC), Context Adaptive Variable Length Coding (CAVLC), and the like. Values with a high frequency of occurrence are assigned short codewords, and values with a low frequency of occurrence are assigned long codewords. On average, the more frequent short codewords dominate, so that the code string is shorter than the original data. Spatial or block-based encoding techniques thus compress the digital information associated with a single frame. To compress the digital information associated with a sequence of frames, video compression techniques use P-frames and/or B-frames to exploit the temporal correlation between successive frames. Although different implementations may use different block configurations, interframe compression techniques identify the differences between frames and then spatially encode the difference information using DCT, quantization, run-length, and entropy encoding techniques. For example, a P-frame is divided into 16×16 macroblocks (e.g., four 8×8 luminance blocks and two 8×8 chrominance blocks), and the macroblocks are compressed. Regardless of whether interframe or intraframe compression techniques are used, the use of spatial or block-based encoding techniques to encode video data means that the compressed video data has been variable length encoded and otherwise compressed using block-based compression techniques such as those described above.
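The principle that frequent values receive short codewords can be sketched with a Huffman code, used here only as one well-known example of a variable length code. The source does not say which code tables a given standard uses, and the function name below is hypothetical.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: codeword length in bits} for a Huffman code.

    Illustrative only: real video standards use predefined VLC tables
    rather than building a code from each block's statistics.
    """
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries are (weight, tie_breaker, {symbol: depth}); the unique
    # tie_breaker keeps the dictionaries from ever being compared.
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]
```

For sixteen values in which one value occurs eight times, that value gets a one-bit codeword, and the total coded length falls below what a fixed three-bit code would need.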

At the receiver or playback device, the compression steps are reversed to decode the video data that was processed with block transforms. FIG. 1 depicts a conventional system 30 for decompressing video information, which includes an input stream decoding portion 35, a motion decoding portion 38, an adder 39, a frame buffer 40, and a display 41. The input stream decoding portion 35 receives a stream of compressed video information at an input buffer 31, performs variable length decoding at a VLC decoder 32, reverses the zig-zag scan and quantization at an inverse quantizer 33, reverses the DCT transformation at an inverse DCT unit 34, and provides blocks of statically decompressed video information to the adder 39. In the motion decoding portion 38, a motion compensation unit 37 receives motion information from the VLC decoder 32 and a copy of the previous picture data (stored in a previous picture store buffer 36), and provides motion-compensated pixels to the adder 39. The adder 39 receives the statically decompressed video information and the motion-compensated pixels, and provides decompressed pixels to the frame buffer 40, which then provides the information to the display 41.
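The role of the adder 39 can be sketched in a few lines: it combines the decoded residual block with the motion-compensated prediction. The function name and the clipping to the 8-bit range 0 to 255 are assumptions made for this illustration; the source does not state the sample bit depth.

```python
def reconstruct_block(residual, prediction):
    """Add a decoded residual block to a motion-compensated prediction.

    Both arguments are equally sized 2-D lists of integers. Results are
    clipped to 0..255, assuming 8-bit samples (an assumption of this sketch).
    """
    return [
        [max(0, min(255, r + p)) for r, p in zip(res_row, pred_row)]
        for res_row, pred_row in zip(residual, prediction)
    ]
```

For an intra-coded block the prediction would simply be zero (or absent), so the output equals the clipped residual.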

In conventional video encoder and decoder designs, blocking artifacts (noticeable discontinuities between blocks) can be introduced into a frame by the block-based transform, motion compensation, quantization, and/or other lossy processing steps. Prior attempts to reduce blocking artifacts have used overlap smoothing or deblocking filtering (either in-loop or post-processing), which processes frames by smoothing the boundaries between blocks. For example, the WMV9 standard specifies that overlap smoothing and in-loop deblocking are performed on the whole picture to reduce blocking artifacts. In WMV9 decoding, overlap smoothing is performed only on 8×8 block boundaries: smoothing is first performed in the vertical direction for the entire frame, and then in the horizontal direction for the entire frame. Next, in-loop deblocking, when enabled, is performed in the following order: (i) the horizontal boundary lines of all 8×8 blocks in the frame are filtered, starting from the top line; (ii) the horizontal boundary lines of all 8×4 sub-blocks in the frame are filtered, starting from the top line; (iii) the vertical boundary lines of all 8×8 blocks are filtered, starting from the leftmost line; and (iv) the vertical boundary lines of all 4×8 sub-blocks are filtered, starting from the leftmost line. Conventional methods use two passes over the entire frame: the first pass performs overlap smoothing, and the second pass performs in-loop deblocking. Additional conditions (relating, for example, to the parameter PQUANT and to block types) may determine whether an individual step is performed, but the purpose of these processes is to smooth the edges of the 16×16 macroblocks, 8×8 blocks, or 4×4 sub-blocks, thereby removing the artifacts (such as blockiness) introduced by the two-dimensional transform and quantization.
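The frame-wide ordering of the four deblocking steps listed above can be made explicit with a small sketch that enumerates the edge positions in processing order. The labels `"H8"`, `"H4"`, `"V8"`, and `"V4"` and the function name are hypothetical, and the sketch assumes every sub-block edge is a candidate; in practice, whether an edge is actually filtered depends on conditions such as block type that are not modeled here.

```python
def deblocking_edge_order(width, height):
    """List (label, position) for every candidate edge, in frame-based order.

    "H8": horizontal 8x8 block boundaries (position is a pixel row)
    "H4": horizontal 8x4 sub-block boundaries (mid-block rows)
    "V8": vertical 8x8 block boundaries (position is a pixel column)
    "V4": vertical 4x8 sub-block boundaries (mid-block columns)
    """
    order = []
    order += [("H8", y) for y in range(8, height, 8)]   # step (i)
    order += [("H4", y) for y in range(4, height, 8)]   # step (ii)
    order += [("V8", x) for x in range(8, width, 8)]    # step (iii)
    order += [("V4", x) for x in range(4, width, 8)]    # step (iv)
    return order
```

Because each step sweeps the whole frame before the next begins, a direct implementation needs the entire frame to be accessible throughout, which is the cost the scratch pad approach is meant to avoid.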

With processor-based approaches to video decompression, adding smoothing or deblocking is a computationally intensive filtering process. This kind of processing can be done in software when a memory buffer large enough to hold a frame is available (e.g., a VGA-size frame of 640×480 pixels corresponds to about 307 kBytes). With hardware-based approaches to decoding, on the other hand, smoothing and deblocking have not been performed at the same time, and deblocking has been performed on the frame as a whole, which requires a large local memory, imposes significant bus bandwidth requirements, and costs memory access time. Consequently, there is a significant need to reduce the processing requirements associated with decompression methods and, in particular, to improve decompression operations that include overlap smoothing and/or deblocking filter operations.

Further limitations and disadvantages of conventional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings and the detailed description that follow.

By using a combination of software and hardware for video decompression, a flexible decompression system is provided that can quickly and efficiently handle a variety of different video compression schemes. The flexible decompression system includes a processor that performs the front end decompression steps and a video accelerator that performs the back end decompression steps. To reduce the memory bandwidth requirements in a video accelerator that performs overlap smoothing and in-loop deblocking filter operations on video frame data, the in-loop filter is connected to a scratch pad memory or storage device, which facilitates piecewise processing of overlap smoothing and in-loop deblocking in a macroblock-based fashion. Using a scratch pad to perform piecewise processing of the filtering operations on a macroblock basis is more efficient than the frame-based method. Because the size of the scratch pad memory is related to the width of the frame, the amount of on-chip memory is reduced. For example, the scratch pad memory may be sized to hold only a single row of partially filtered blocks from a frame of video data.
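As a rough sense of scale for this memory saving, the sketch below compares a full-frame buffer with a scratch pad that holds one row of blocks. It assumes one byte per pixel, luma samples only, and 8-pixel-high blocks; these assumptions, like the function names, are illustrative and are not figures given in the source beyond the 640×480 example.

```python
def frame_buffer_bytes(width, height, bytes_per_pixel=1):
    """Bytes needed to hold a whole frame (luma only in this sketch)."""
    return width * height * bytes_per_pixel


def scratch_pad_bytes(width, block_height=8, bytes_per_pixel=1):
    """Bytes needed to hold one row of blocks spanning the frame width.

    The size depends on the frame width but not on the frame height.
    """
    return width * block_height * bytes_per_pixel
```

Under these assumptions a 640×480 frame needs 307,200 bytes while one row of blocks needs 5,120 bytes, and doubling the frame height leaves the scratch pad size unchanged.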

In accordance with one or more embodiments of the present invention, a video processing system, apparatus, and methodology are provided whereby a processor and a video decode circuit decode video data that has been processed with block transformations into a plurality of macroblocks. In connection with the decode operations, an in-loop filter and a scratch pad memory provided on at least one integrated circuit are used to perform piecewise processing that smooths and deblocks selected pixel data in a first macroblock to produce one or more finished blocks and one or more partially filtered blocks, where at least one of the partially filtered blocks, comprising control data and pixel data, is stored in the scratch pad memory. As a result, any block adjacent to a previously processed macroblock can be completely filtered for overlap smoothing and deblocking and then output as a finished block during a first filter operation, while any block adjacent to a subsequently processed macroblock can be partially filtered for overlap smoothing and deblocking and then stored as a partially filtered block in the scratch pad memory. The scratch pad memory is also used by the in-loop filter to supply a partially filtered block that is needed to smooth and deblock pixel data in the first macroblock, the fetched partially filtered block having been generated during the processing of a previous macroblock. By sequentially processing each row of macroblocks in a video frame so that one macroblock at a time is overlap smoothed and deblocked, the in-loop filter and scratch pad memory can be used to smooth and deblock a plurality of macroblocks in a pipelined fashion. The objects, advantages, and other novel features of the present invention will become apparent to those skilled in the art from the following detailed description when read in conjunction with the appended claims and accompanying drawings.

FIG. 1 is a block diagram of a system for decompressing video information.

FIG. 2 is a block diagram of an exemplary video decompression system constructed in accordance with the present invention.

FIG. 3 is a simplified illustration of an in-loop filtering process that uses a scratch pad memory to efficiently handle overlap smoothing and in-loop deblocking in hardware, in accordance with a selected embodiment of the present invention.

FIG. 4 depicts an exemplary technique for reducing blockiness in a decoded frame using smoothing and deblocking filters in a video encoder or decoder.

FIGS. 5A to 5K illustrate how piecewise processing may be used to implement the smoothing and deblocking procedures for luma blocks.

FIGS. 6A to 6F illustrate how piecewise processing may be used to implement the smoothing and deblocking procedures for chroma blocks.

While illustrative embodiments of the present invention are described below, it will be appreciated that the present invention may be practiced without the specified details. In the interest of clarity, not all features of an actual implementation are described in this specification. It should be understood that, in the development of any such actual implementation, numerous implementation-specific decisions must be made to achieve the developer's specific goals, such as compliance with system-related and business-related constraints. Moreover, while such a development effort might be complex and time-consuming, it would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure. For example, selected aspects are shown in block diagram form, rather than in detail, in order to avoid obscuring the invention. Such descriptions and representations are used by those skilled in the art to describe and convey the substance of their work to others skilled in the art. The present invention will now be described with reference to the accompanying drawings.

Referring to FIG. 2, an exemplary video decompression system 100 in accordance with the present invention is shown. As depicted, the video decompression system 100 may be implemented in any video playback device, such as a desktop or laptop computer, a wireless or mobile device, a personal digital assistant (PDA), a mobile or cellular phone, or any other video playback device that includes video imaging features.

As depicted in FIG. 2, the video decompression system 100 is implemented as a host or application processing unit that includes a bus 95 coupled to one or more processors or processing units 50 and to a video (or media) acceleration hardware unit 101. The video decompression system 100 also includes a main memory system comprising a large DDR SDRAM 62, 64 that is accessed through a DDR controller 60. In addition or in the alternative, one or more memories (e.g., IDE 72, flash memory unit 74, ROM 76, etc.) are accessed through a static memory controller 70. Either or both of the DDR SDRAM and the other memories may be integrated with the video decompression system 100 or may be external to it. Of course, other peripheral and display devices 82, 84, 86, 92 may be accessed through their respective controllers 80, 90. For clarity and ease of understanding, not all of the elements making up the video decompression system 100 are described in detail. Such details are well known to those of ordinary skill in the art and may vary with the particular computer vendor and microprocessor type. Moreover, the video decompression system 100 may include other buses, devices, and/or subsystems, depending on the implementation desired. For example, the video decompression system 100 may include caches, modems, parallel or serial interfaces, SCSI interfaces, network interface cards, and the like. In the illustrated embodiment, the CPU 50 executes software stored in the flash memory 74 and/or the SDRAM 62, 64.

In the video decompression system 100 depicted in FIG. 2, the CPU 50 performs the initial variable length decoding function, as indicated by the VLD block 52, while the media acceleration hardware unit 101 performs inverse quantization 104, inverse transformation 106, motion compensation 108, in-loop filtering 110, color space conversion 112, scaling 114, and filtering 116 on the decoded data. The resulting decoded data may be temporarily stored in an output buffer 118 and/or a frame buffer (not shown) before being displayed on the display 92. By dividing the decode processing functions between the processor 50 and the media acceleration hardware 101, the front end decoding steps (e.g., variable length decoding) can be implemented in software to accommodate a variety of different compression schemes (e.g., MPEG-1, MPEG-2, MPEG-3, MPEG-4, MPEG-7, MPEG-21, WMV9, etc.). The decode data generated by the front end is provided to the media acceleration hardware 101, which further decodes it and provides pixel values to the output buffer or frame buffer on a macroblock-by-macroblock basis until the frame is completed.

In operation, the video decompression system 100 receives a compressed video signal from a video signal source, such as a CD-ROM, DVD, or other storage device. The compressed video signal is provided as a stream of compressed video information to the processor 50, which executes instructions to decode the variable length coded portion of the compressed signal and thereby provide a variable length decoded data (VLD data) signal. Once the software assist has been used to perform the variable length decoding, the VLD data (which includes headers, matrix weights, motion vectors, transformed residual coefficients, and differential motion vectors) is conveyed to the media acceleration hardware unit 101, either directly or using the data compression techniques described more fully in U.S. patent application Ser. No. 11/042,365, entitled "Lightweight Compression Of Input Data." At the media acceleration hardware unit 101, once the VLD data is received, it is provided to the inverse zig-zag and quantizer circuit 104, which decodes the VLD data signal to provide a zig-zag decoded signal. The inverse zig-zag and quantization compensates for the fact that, while the compressed video signal is compressed in a zig-zag run-length code fashion, the zig-zag decoded signal is provided to the inverse DCT circuit (IDCT) 106 as sequential blocks of information. Accordingly, the zig-zag decoded signal provides blocks in the order required for raster scanning across the display 92. The zig-zag decoded signal is then provided to the inverse transform circuit 106 (e.g., an IDCT or inverse integer transform), which performs an inverse discrete cosine transform on the zig-zag decoded video signal on a block-by-block basis to provide statically decompressed pixel values or decompressed error terms. The statically decompressed pixel values are processed on a block-by-block basis by the motion compensation unit 108, which provides intraframe, predicted, and bidirectional motion compensation, including support for one, two, and four motion vectors (16×16, 16×8, and 8×8 blocks). The in-loop filter 110 performs overlap smoothing and/or deblocking to reduce or eliminate blocking artifacts in accordance with the WMV9 compression standard, using the scratch pad memory 111 to store partially finished macroblock filter data, as described more fully below. The color space converter 112 converts one or more input data formats (e.g., YCbCr 4:2:0) into one or more output formats (e.g., RGB), and the result is filtered and/or scaled at filter 116.
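The inverse zig-zag step mentioned above can be sketched as follows. The sketch uses the classic 8×8 zig-zag scan known from JPEG and MPEG-style codecs; this choice is an assumption, since the source does not give the scan table and WMV9 defines its own scan patterns. The function names are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in classic zig-zag order.

    Positions are grouped by anti-diagonal (row + col); odd diagonals are
    walked with the row increasing, even diagonals with the column increasing.
    """
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def inverse_zigzag(coeffs, n=8):
    """Place a 1-D list of scanned coefficients back into an n x n block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs, zigzag_order(n)):
        block[r][c] = value
    return block
```

Feeding in the indices 0 to 63 returns a block whose entries show the scan position of each coefficient, which makes it easy to check the ordering by eye. Inverse quantization would then scale each entry before the inverse transform; that step is omitted here.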

As disclosed herein, the smoothing and deblocking in-loop filter 110 removes boundary discontinuities between neighboring blocks by partially filtering or processing each row of macroblocks during a first pass, and then completing the processing of the partially processed blocks while the next row of macroblocks is processed. With this technique, a small scratch pad memory 111 can be used efficiently to store the partially processed blocks, in contrast to conventional deblocking processes, which use a large memory to store the entire frame image for filtering.

As the processing of each block for overlap smoothing and deblocking is completed on a row-by-row basis, the completed blocks may be output from the filter 110 to a FIFO buffer (not shown) before being transferred to the color space converter (CSC) 112.

FIG. 3 is a simplified illustration of a macroblock-based in-loop filtering process that uses a scratch pad memory to handle overlap smoothing and deblocking efficiently in accordance with a selected embodiment of the present invention. In the filtering process, each pass of the in-loop filter over a row of macroblocks generates completely finished blocks (which are fully filtered to smooth and deblock the blocks) and partially finished blocks (which are stored in the scratch pad memory for subsequent use in smoothing and deblocking the next row of macroblocks). As depicted, every macroblock (e.g., macroblock 4 or "mb4", which includes the four luma blocks mb4y0, mb4y1, mb4y2 and mb4y3) goes through the following order of processing in the in-loop filtering process:

(i) Fully finish the smoothing and deblocking of the 8×8 blocks (e.g., mb4y0, mb4y1) adjacent to the previous macroblock (e.g., macroblock 1), and partially finish the smoothing and deblocking of the 8×8 blocks (e.g., mb4y2, mb4y3) adjacent to the next macroblock (e.g., macroblock 7);

(ii) Output the finished 8×8 blocks (e.g., mb4y0, mb4y1) and store the partially finished 8×8 blocks (e.g., mb4y2, mb4y3) in the scratch pad memory;

(iii) When the next macroblock (e.g., macroblock 7) is processed, fetch the partially finished 8×8 blocks (e.g., mb4y2, mb4y3) from the scratch pad memory and finish the processing of the fetched 8×8 blocks; and

(iv) Output the finished 8×8 blocks (e.g., mb4y2, mb4y3) together with the finished 8×8 blocks of the next macroblock (e.g., mb7y0, mb7y1).
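The four-step order above can be sketched as a small bookkeeping model. This is a simplified illustration under stated assumptions, not the filter itself: no pixels are filtered, completion is tracked per macroblock rather than per block column, the scratch pad is modeled as a dictionary keyed by macroblock column, and the treatment of the last row (finishing the lower blocks immediately) is a hypothetical stand-in for frame-edge handling. The name `filter_frame` is invented for this sketch.

```python
def filter_frame(mb_rows, mb_cols):
    """Yield (macroblock_index, finished_blocks) in raster order.

    Toy model: records which luma blocks become fully finished while each
    macroblock is processed; it does not filter any pixel data.
    """
    scratch_pad = {}  # macroblock column -> partially finished lower blocks from the row above
    for row in range(mb_rows):
        for col in range(mb_cols):
            mb = row * mb_cols + col
            finished = []
            # Steps (iii) and (iv): fetch and finish the lower blocks of the macroblock above.
            finished += scratch_pad.pop(col, [])
            # Step (i): the upper blocks of the current macroblock are fully finished.
            finished += [f"mb{mb}y0", f"mb{mb}y1"]
            lower = [f"mb{mb}y2", f"mb{mb}y3"]
            if row == mb_rows - 1:
                # Assumed edge handling: no row below exists, so finish the lower blocks now.
                finished += lower
            else:
                # Step (ii): the lower blocks wait in the scratch pad for the next row.
                scratch_pad[col] = lower
            yield mb, finished
```

In this model the scratch pad never holds more than one entry per macroblock column, which reflects why a buffer much smaller than a full frame is sufficient.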

While the implementation details may vary by application, FIG. 3 shows an example embodiment in which the image frame 150 being processed by the in-loop filter 110 is made up of macroblocks (e.g., mb0, mb1, mb2, mb3, mb4, mb5, mb6, mb7, mb8, etc.) arranged in multiple rows (e.g., a first row 151 made up of mb0, mb1 and mb2). As shown at 3A in FIG. 3, the in-loop filter 110 has already made a first pass over the first row 151 of macroblocks. As a result of this first pass, the upper blocks (mb0y0, mb0y1, mb1y0, mb1y1, mb2y0, mb2y1) are fully processed for overlap smoothing and deblocking (as indicated by cross-hatching), while the lower blocks (mb0y2, mb0y3, mb1y2, mb1y3, mb2y2, mb2y3) are only partially processed. So that the partially processed blocks of the first row 151 can be completed during the next pass of macroblock processing, they are stored in the scratch pad memory 111 (as indicated by the dot pattern).

As further shown at 3A in FIG. 3, the in-loop filter 110 has begun processing the second row 152 of macroblocks, a process that will complete the partially processed blocks from the first row 151. As a result, the mb0y2 and mb3y0 blocks are fully processed for smoothing and deblocking, and the mb3y2 block is only partially processed (and stored in the scratch pad). Of the blocks about to be processed by the filter 110 at 3A in FIG. 3 (the filtered blocks 154, indicated by diagonal hatching), the mb0y3, mb3y1 and mb3y3 blocks have been partially processed for smoothing and deblocking (and are retained in the filter 110), the mb1y2 block is a partially finished block fetched from the scratch pad, and the remaining blocks (mb4y0 and mb4y2) are obtained from the current macroblock (e.g., macroblock 4). As the in-loop filter 110 processes the filtered blocks 154, smoothing and deblocking are finished for one or more of the partially processed blocks (e.g., mb0y3, mb3y1), while the remaining blocks (mb1y2, mb4y0, mb4y2 and mb3y3) are only partially finished.

After processing the filtered blocks 154, the in-loop filter 110 shifts in new data. This is illustrated by the frame 155 shown at 3B in FIG. 3: the filter 110 obtains the filtered blocks 156 by outputting any finished blocks (e.g., mb0y3, mb3y1), storing one or more partially finished blocks (e.g., mb3y3) in the scratch pad memory, shifting the partially finished blocks remaining in the filter (e.g., mb1y2, mb4y0, mb4y2) by one block position, fetching a partially finished block (e.g., mb1y3) from the previous row of macroblocks, and loading new blocks (e.g., mb4y1, mb4y3) from the current macroblock. As the in-loop filter 110 processes the filtered blocks 156, smoothing and deblocking are finished for one or more of the partially processed blocks (e.g., mb1y2, mb4y0), while the remaining blocks (mb1y3, mb4y1, mb4y3 and mb4y2) are only partially finished.

After processing the filtered blocks 156, the in-loop filter 110 again shifts in new data, as illustrated by the frame 157 shown at 3C in FIG. 3. In particular, the filter 110 obtains the filtered blocks 158 by outputting any finished blocks (e.g., mb1y2, mb4y0), storing one or more partially finished blocks (e.g., mb4y2) in the scratch pad memory, shifting the partially finished blocks remaining in the filter (e.g., mb1y3, mb4y1, mb4y3) by one block position, fetching a partially finished block (e.g., mb2y2) from the previous row of macroblocks, and loading new blocks (e.g., mb5y0, mb5y2) from the current macroblock. As the in-loop filter 110 processes the filtered blocks 158, smoothing and deblocking are finished for one or more of the partially processed blocks (e.g., mb1y3, mb4y1), while the remaining blocks (mb2y2, mb5y0, mb5y2 and mb4y3) are only partially finished. At this point, smoothing and deblocking are finished for the upper blocks of macroblock 4 (mb4y0, mb4y1), but the lower blocks (mb4y2, mb4y3) are only partially finished. By storing the partially finished lower blocks in the scratch pad memory, the filtering operation can be completed when the filter 110 processes the next row of macroblocks.
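The walk through the filtered blocks 154, 156 and 158 can be reproduced with a short sketch that only tracks block names. It assumes the 3-macroblock-wide frame of FIG. 3 and models the filter as holding two block columns of three 8×8 luma blocks each; the helper names (`block_name`, `window`, `advance`) are hypothetical, and the model applies only to interior positions (macroblock row 1 or later, block column 1 or later).

```python
MB_COLS = 3  # macroblocks per row in the FIG. 3 example (mb0, mb1, mb2 form row 151)


def block_name(block_row, block_col):
    """Name the 8x8 luma block at (block_row, block_col) in the 'mbNyK' style."""
    mb = (block_row // 2) * MB_COLS + (block_col // 2)
    y = (block_row % 2) * 2 + (block_col % 2)
    return f"mb{mb}y{y}"


def window(mb_row, block_col):
    """Return (held, loaded): the two block columns in the filter at one step.

    Each column covers the lower block of the row above plus the two blocks
    of the current macroblock row.
    """
    rows = (2 * mb_row - 1, 2 * mb_row, 2 * mb_row + 1)
    held = [block_name(r, block_col - 1) for r in rows]   # already partially processed
    loaded = [block_name(r, block_col) for r in rows]     # scratch-pad fetch + new blocks
    return held, loaded


def advance(held, loaded):
    """Split one step's results into output, scratch-pad store and shifted blocks."""
    output = held[:2]       # the upper two held blocks are now finished
    to_scratch = held[2:]   # the lowest held block waits for the next macroblock row
    return output, to_scratch, loaded  # the loaded column shifts over and becomes 'held'
```

For example, `window(1, 2)` yields the six blocks listed for the filtered blocks 154, and `advance` on that window outputs mb0y3 and mb3y1, sends mb3y3 to the scratch pad, and shifts mb1y2, mb4y0 and mb4y2.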

Additional details of an alternative embodiment of the present invention are illustrated in FIG. 4, which shows a technique 200 for reducing blockiness in a decoded frame using a smoothing and deblocking filter in a video encoder or decoder. As will be appreciated, the depicted technique may be used to process luma or chroma blocks. There may be special corner cases at the periphery of each frame, which persons skilled in the art can address by adjusting or applying the present invention as needed. For the sake of simplicity, however, the present disclosure focuses primarily on the in-loop filtering steps performed on the interior macroblocks of each frame.

With reference to FIG. 4, once the video encoder/decoder generates at least the first macroblock for a frame (201), the in-loop filter processes the top row of macroblocks one macroblock at a time, filtering the boundaries of each block with its neighboring blocks. As will be appreciated, because no smoothing or deblocking is performed on the periphery of the frame, there are no partially finished blocks from outside the frame to use in the filtering process. However, as the first row of macroblocks is filtered, the scratch pad memory is filled with partially finished blocks. Starting with the first macroblock (201), the encoder/decoder loads the required blocks and, except for the first row of macroblocks, retrieves any partially finished adjacent block from above from the scratch pad memory. Where a macroblock consists of four luma blocks (y0, y1, y2, y3) and two chroma blocks (Cb, Cr), the blocks enter the encoder/decoder in the following order: y0, y1, y2, y3, Cb, Cr.

Next, the video encoder/decoder filters predetermined boundaries of the blocks loaded in the filter with the neighboring blocks or sub-blocks (210). In a selected embodiment, a piecewise processing technique may be used to partially process each block in the filter. For example, after an 8×8 block is decoded in either luminance or chrominance, all or part of its left and/or right (vertical) edges are subjected to smoothing filter processing (211). In addition or in the alternative, all or part of the top and/or bottom (horizontal) edges of the block are subjected to smoothing filter processing (212). Besides overlap smoothing, deblocking filter processing may be applied to all or part of selected horizontal boundary lines of the 8×8 blocks (213) and/or to all or part of selected horizontal boundary lines of the 8×4 sub-blocks (214). In addition or in the alternative, the deblocking filter processing may be applied to all or part of selected vertical boundary lines of the 8×8 blocks (215) and/or to all or part of selected vertical boundary lines of the 8×4 sub-blocks (216).

Once the blocks in the filter have been piecewise processed, the results are stored or shifted in the filter for additional processing. In particular, so that the filter can process new data, any finished blocks in the filter are output from the filter (217). In addition, any partially finished blocks that will not be processed with the new blocks are stored in the scratch pad memory (219) for subsequent use and additional processing with the next row of macroblocks. This continues until the last row of macroblocks is being processed (affirmative outcome at decision 218), in which case the step of storing to the scratch pad memory may be skipped.

Because the output and store steps (217, 219) have freed space in the filter, the filter can now process new data. In particular, if there are additional blocks in the frame (affirmative outcome at decision 220), the partially filtered blocks remaining in the filter are shifted to the left (222). For rows below the top row (negative outcome at decision 224), the available space in the filter is filled by retrieving the next partially finished block from the scratch pad memory (226), and any remaining space in the filter is filled with new blocks (228). Once the filter is loaded with new data, the block filtering process 210 is repeated on the new set of filter blocks. By repeating this sequence of operations, each macroblock in the frame is filtered in turn: partially finished blocks generated while processing the previous row of macroblocks are retrieved from the scratch pad memory, and partially filtered blocks are stored in the scratch pad memory for subsequent use while processing the next row of macroblocks. If, on the other hand, no blocks remain to be filtered (negative outcome at decision 220), the smoothing and deblocking processing for the current frame ends. At this point, the next frame is retrieved (230), and the filtering process is repeated, starting with the first macroblock of the new frame.
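The control flow of FIG. 4 described above can be sketched as a loop that records the reference numerals of the steps it visits. This is a reading of the flow chart under stated assumptions rather than a definitive implementation: the frame is reduced to a count of rows and filter steps per row, the initial load at step 201 is omitted, decision 224 is assumed to test the row of the blocks about to be loaded, and the function name `frame_steps` is hypothetical.

```python
def frame_steps(num_rows, steps_per_row):
    """Return the sequence of FIG. 4 step numbers visited for one frame."""
    log = []
    total = num_rows * steps_per_row
    for i in range(total):
        row = i // steps_per_row
        log.append(210)                  # filter the boundaries of the loaded blocks
        log.append(217)                  # output any finished blocks
        if row != num_rows - 1:          # decision 218: not the last row
            log.append(219)              # store partially finished blocks in the scratch pad
        if i + 1 < total:                # decision 220: more blocks in the frame
            log.append(222)              # shift the remaining blocks to the left
            next_row = (i + 1) // steps_per_row
            if next_row > 0:             # decision 224: below the top row
                log.append(226)          # fetch the next block from the scratch pad
            log.append(228)              # fill the remaining space with new blocks
    log.append(230)                      # frame done: retrieve the next frame
    return log
```

The log makes the two row-dependent exceptions visible: step 226 is absent while the top row is being loaded, and step 219 is absent while the last row is being filtered.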

Turning now to FIGS. 5A-5K, an illustrative embodiment of the present invention shows how the WMV smoothing and deblocking procedures may be implemented on the luma blocks of macroblock 4 (or mb4) using a piecewise processing technique. The starting point of the filtering process is depicted in FIG. 5A, where the filter 320 is already loaded with blocks (mb0y3, mb3y1 and mb3y3) that have been partially processed for overlap smoothing (e.g., as indicated by the oval 322) and deblocking (e.g., as indicated by the line 323).

The filter 320 is then filled with additional blocks, as depicted in FIG. 5B. In particular, a partially finished block (e.g., mb1y2) is retrieved from the scratch pad memory and loaded into the filter. In addition, selected blocks (e.g., mb4y0, mb4y2) from the current macroblock (e.g., mb4) are loaded into the encoder/decoder. Although the macroblock loading sequence (e.g., mb4y0, mb4y1, mb4y2 and mb4y3) requires that at least one other block be loaded, that block need not be shifted into the filter 320 at this point, as indicated at 321.

Once the filter blocks are loaded, the filter 320 performs piecewise overlap smoothing, as depicted in FIG. 5C. In particular, vertical overlap smoothing (V) is performed on selected interior vertical edges (301, 302). Next, horizontal overlap smoothing (H) is performed on selected interior horizontal edges (e.g., 303, 304, 305 and 306).

After the filter blocks have been partially smoothed, the filter 320 performs piecewise deblocking, as depicted in FIG. 5D. First, horizontal in-loop deblocking (HD) is performed on selected 8×8 block boundaries (e.g., 307, 308, 309, 310), followed by horizontal in-loop deblocking (HDH) on selected sub-block boundaries (e.g., 311, 312, 313, 314). Next, the filter performs vertical in-loop deblocking (VD) on selected 8×8 block boundaries (e.g., 315, 316), followed by vertical in-loop deblocking (VDH) on selected sub-block boundaries (e.g., 317, 318).

In each of the smoothing and deblocking steps described above, the order in which the boundary pieces are filtered does not matter, since there is no dependency between the boundary pieces. Moreover, in addition to the specific sequence of piecewise processing depicted in FIGS. 5C and 5D, other sequences and/or filtering steps may also be implemented in accordance with the present invention. For example, the boundary pieces may be filtered in a different order. In addition, more than one smoothing or deblocking mode may be applied, and the filtering operation may affect three or more pixels on either side of the boundary, depending on the filtering scheme. For example, the MPEG-4 standard uses two deblocking modes: the first mode applies a short filter to one pixel on either side of the block edge, and the second mode applies a longer filter to two pixels on either side of the block edge. In other implementations, the filter definitions, the number of different filters and/or the adaptive filtering conditions may be chosen to meet specific requirements.
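The order-independence claim can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the WMV9 or MPEG-4 filter: `smooth_piece` is a made-up two-pixel averaging filter across a vertical boundary, and the property holds here because the boundary pieces within one step touch disjoint pixels. All names are hypothetical.

```python
import copy
import itertools


def smooth_piece(pixels, row, col):
    """Toy filter: pull the two pixels straddling the boundary at `col` toward their mean."""
    left, right = pixels[row][col - 1], pixels[row][col]
    mean = (left + right) / 2
    pixels[row][col - 1] = (left + mean) / 2
    pixels[row][col] = (right + mean) / 2


def filter_in_order(pixels, pieces):
    """Apply the toy filter to each (row, col) boundary piece in the given order."""
    out = copy.deepcopy(pixels)
    for row, col in pieces:
        smooth_piece(out, row, col)
    return out


def order_independent(pixels, pieces):
    """Check that every ordering of the boundary pieces gives the same result."""
    reference = filter_in_order(pixels, pieces)
    return all(
        filter_in_order(pixels, list(perm)) == reference
        for perm in itertools.permutations(pieces)
    )
```

With pieces that overlap (for example, a vertical and a horizontal piece meeting at a block corner), the same check could fail, which is consistent with the text limiting the claim to pieces within each individual smoothing or deblocking step.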

Once the smoothing and deblocking filter operations are complete, the processed filter blocks are stored and shifted, as depicted in FIG. 5E. In particular, any finished blocks (e.g., mb0y3, mb3y1) may then be output, since they are complete. In addition, one or more partially finished blocks (e.g., mb3y3) may be moved to the scratch pad memory for subsequent use when processing the macroblock adjacent below (see macroblock 6 in relation to mb3y3 in FIG. 3). Any partially finished blocks remaining in the filter (e.g., mb1y2, mb4y0, mb4y2) may then be shifted within the filter to make room for new data. The result of the output, store and shift steps is depicted in FIG. 5F.

The filter 320 may now be filled with new data blocks, as depicted in FIG. 5G. In particular, a partially finished block (e.g., mb1y3) is retrieved from the scratch pad memory and loaded into the filter. In addition, the remaining blocks (e.g., mb4y1, mb4y3) from the current macroblock (e.g., mb4) are loaded into the filter 320.

Once the filter blocks are loaded, the filter 320 performs piecewise overlap smoothing, as depicted in FIG. 5H. In particular, vertical overlap smoothing (V1, V2) is performed on selected interior vertical edges. Next, horizontal overlap smoothing (H1, H2, H3, H4) is performed on selected interior horizontal edges.

After the filter blocks have been partially smoothed, the filter 320 performs piecewise deblocking, as depicted in FIG. 5I. First, horizontal in-loop deblocking (HD1, HD2, HD3, HD4) is performed on selected 8×8 block boundaries, followed by horizontal in-loop deblocking (HDH1, HDH2, HDH3, HDH4) on selected sub-block boundaries. Next, the filter performs vertical in-loop deblocking (VD1, VD2) on selected 8×8 block boundaries, followed by vertical in-loop deblocking (VDH1, VDH2) on selected sub-block boundaries.

Once the smoothing and deblocking filter operations are complete, the processed filter blocks are stored and shifted, as depicted in FIG. 5J. In particular, any finished blocks (e.g., mb1y2, mb4y0) may then be output, since they are complete. In addition, one or more partially finished blocks (e.g., mb4y2) may be moved to the scratch pad memory for subsequent use when processing the macroblock adjacent below. Any partially finished blocks remaining in the filter (e.g., mb1y3, mb4y1, mb4y3) may then be shifted within the filter to make room for new data. The result of the output, store and shift steps is depicted in FIG. 5K, which corresponds exactly to the initial filter state depicted in FIG. 5A. As a result, the sequence of steps depicted in FIGS. 5A-5K may be repeated to continue filtering the filter blocks, including those of the next macroblock (e.g., macroblock 5 or mb5).

Turning now to FIGS. 6A-6F, an illustrative embodiment of the present invention shows how the WMV9 smoothing and deblocking procedures may be implemented on the Cb or Cr blocks of a macroblock using a piecewise processing technique. Since the Cb and Cr blocks are handled similarly, the example is described with reference to the current Cb macroblock, identified by the index label Cb(x, y). The starting point of the filtering process is depicted in FIG. 6A, where the filter 420 is already loaded with blocks (e.g., Cb(x-1, y-1) and Cb(x-1, y)) that have been partially processed for overlap smoothing (e.g., as indicated by the oval 422) and deblocking (e.g., as indicated by the line 423).

The filter 420 is then filled with additional blocks, as depicted in FIG. 6B. In particular, a partially finished block (e.g., Cb(x, y-1)) is retrieved from the scratch pad memory and loaded into the filter. In addition, a block (e.g., Cb(x, y)) from the current macroblock is loaded into the filter 420. Once the filter blocks are loaded, the filter 420 performs piecewise overlap smoothing, as depicted in FIG. 6C. In particular, vertical overlap smoothing (V) is performed on selected interior vertical edges. Next, horizontal overlap smoothing (H1, H2) is performed on selected interior horizontal edges.

After the filter blocks have been partially smoothed, the filter 420 performs piecewise deblocking, as depicted in FIG. 6D. First, horizontal in-loop deblocking (HD1, HD2) is performed on selected 8×8 block boundaries, followed by horizontal in-loop deblocking (HDH1, HDH2) on selected sub-block boundaries. Next, the filter performs vertical in-loop deblocking (VD) on selected 8×8 block boundaries, followed by vertical in-loop deblocking (VDH) on selected sub-block boundaries.

Once the smoothing and deblocking filter operations are complete, the processed filter blocks are stored and shifted, as depicted in FIG. 6E. In particular, any finished block (e.g., Cb(x-1, y-1)) may be output, since it is complete. In addition, one or more partially finished blocks (e.g., Cb(x-1, y)) may be moved to the scratch pad memory for subsequent use when processing the macroblock adjacent below. Any partially finished blocks remaining in the filter (e.g., Cb(x, y-1), Cb(x, y)) may then be shifted within the filter to make room for new data. The result of the output, store and shift steps is depicted in FIG. 6F, which corresponds exactly to the initial filter state depicted in FIG. 6A. As a result, the sequence of steps depicted in FIGS. 6A-6F may be repeated to filter the next macroblock.
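The chroma sequence of FIGS. 6A-6F can be summarized with a small index-only sketch. It assumes one 8×8 chroma block per macroblock, identifies each block by its (x, y) macroblock coordinates, and applies only to interior macroblocks; the function name `chroma_step` is hypothetical and no pixel filtering is modeled.

```python
def chroma_step(x, y):
    """Describe the chroma filter contents while macroblock (x, y) is current.

    Returns (held, loaded, output, to_scratch), each given as (x, y) block indices.
    """
    held = [(x - 1, y - 1), (x - 1, y)]   # already partially processed (FIG. 6A)
    loaded = [(x, y - 1), (x, y)]         # scratch-pad fetch plus current block (FIG. 6B)
    output = held[0]                      # finished after smoothing and deblocking (FIG. 6E)
    to_scratch = held[1]                  # waits for the macroblock row below (FIG. 6E)
    return held, loaded, output, to_scratch
```

Two properties follow directly from the indices: the blocks loaded at step (x, y) are exactly the blocks held at step (x+1, y), which mirrors FIG. 6F matching FIG. 6A, and the block sent to the scratch pad at step (x, y) is the one fetched from it at step (x-1, y+1).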

As described above, by providing a small scratch pad memory in the hardware decoder unit, the in-loop filter may temporarily store in the scratch pad memory any partially finished filtering results from the luma and chroma blocks of the current macroblock (denoted MB(x, y)). The stored filtering results may then be used in processing the blocks of the adjacent row below. In particular, when the filter processes the macroblock directly below, i.e., MB(x, y+1), the stored data for MB(x, y) is fetched from the scratch pad memory and used in processing MB(x, y+1).

Although the partially completed filtering results stored in the scratch pad memory must include at least the 8×8 pixel data, in a selected embodiment the scratch pad memory also stores control data for determining whether boundary filtering is required for a block. For example, the control data includes the group of headers for the six blocks of each current macro block, including a 1mv or 4mv selector, block address, position of the block in the frame, mbmode, frame size, coefficient (zero or nonzero) and motion vectors (two forward, in the x and y directions, and two backward, in the x and y directions). The data may be packed in a manner that allows efficient use of the burst size.
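As an illustration of how such control data might be packed, the Python sketch below groups the listed fields into a per-block header and pads each header to a fixed slot. The field widths, the 32-byte slot size, the field order and all names (`BlockHeader`, `HEADER_FORMAT`, `pack_macroblock_headers`) are assumptions made for this example; the text does not specify a layout.

```python
import struct
from dataclasses import dataclass, astuple

# Hypothetical layout: little-endian, no implicit padding, 23 bytes of payload.
HEADER_FORMAT = "<BIHHBHHB4h"
# Hypothetical fixed slot size, chosen so headers align to a burst-friendly size.
HEADER_SLOT_BYTES = 32


@dataclass
class BlockHeader:
    mv_selector: int     # 1 or 4 (1mv / 4mv)
    block_address: int
    pos_x: int           # position of the block in the frame
    pos_y: int
    mbmode: int
    frame_width: int     # frame size
    frame_height: int
    coeff_nonzero: int   # 0 if the coefficient is zero, 1 otherwise
    fwd_mv_x: int        # forward motion vector
    fwd_mv_y: int
    bwd_mv_x: int        # backward motion vector
    bwd_mv_y: int

    def pack(self) -> bytes:
        raw = struct.pack(HEADER_FORMAT, *astuple(self))
        return raw.ljust(HEADER_SLOT_BYTES, b"\x00")  # zero-pad to the slot size

    @classmethod
    def unpack(cls, data: bytes) -> "BlockHeader":
        payload = data[:struct.calcsize(HEADER_FORMAT)]
        return cls(*struct.unpack(HEADER_FORMAT, payload))


def pack_macroblock_headers(headers):
    """Concatenate the headers of the six blocks of one macro block."""
    if len(headers) != 6:
        raise ValueError("expected one header per block, six per macro block")
    return b"".join(h.pack() for h in headers)


header = BlockHeader(4, 1000, 3, 2, 1, 720, 480, 1, -5, 7, 0, -2)
print(len(header.pack()), BlockHeader.unpack(header.pack()) == header)
```

A fixed slot size trades a few wasted bytes for headers that always start on a predictable boundary, which is one way to keep memory bursts fully used.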

Because of its small size, the scratch pad memory may be located on the same chip as the video accelerator, although for typical frame sizes the scratch pad memory may instead be located on a different chip, such as DDR memory or other external memory. However, locating the scratch pad memory in the video accelerator hardware unit (10) yields improved memory access performance. By minimizing the size of the scratch pad memory, the manufacturing cost of the media accelerator hardware unit may be reduced compared with including a large memory buffer that stores the data blocks for an entire frame. For example, the size of a scratch pad memory used to store the partially completed filtering results, including control data and pixel data, may be calculated as follows:

Size of scratch pad = (576 bytes) × (number of macro blocks horizontally in the frame)
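The formula can be worked through for concrete frame widths with a short Python sketch. The 576-byte figure comes from the formula above; the 16-pixel macro block width and the function name `scratch_pad_size_bytes` are assumptions for illustration.

```python
SCRATCH_BYTES_PER_MACROBLOCK = 576  # figure given in the formula above
MACROBLOCK_WIDTH_PIXELS = 16        # assumption: 16x16 macro blocks


def scratch_pad_size_bytes(frame_width_pixels: int) -> int:
    """Scratch pad size needed for one row of macro blocks."""
    # Round up so a partial macro block at the right edge still gets a slot.
    macroblocks_per_row = -(-frame_width_pixels // MACROBLOCK_WIDTH_PIXELS)
    return SCRATCH_BYTES_PER_MACROBLOCK * macroblocks_per_row


print(scratch_pad_size_bytes(720))   # 45 macro blocks per row -> 25920
print(scratch_pad_size_bytes(1920))  # 120 macro blocks per row -> 69120
```

The frame height does not appear in the calculation, so a taller frame of the same width needs no additional scratch pad space.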

As described above, the scratch pad is relatively small when the frame is large in the vertical direction. In other words, the size of the scratch pad depends on the frame size in the horizontal direction. It will be appreciated that, as long as filtering can begin on the first macro block before the entire frame has been decoded, the piecewise processing technique disclosed herein may also be used to advantage with a large memory holding the entire frame, to improve the speed of the filtering operation. However, using a scratch pad memory has cost and speed advantages over using a large memory to store the entire frame, since such a large memory is expensive and adds overhead to access time. In addition, the use of a scratch pad memory fits very well with pipelined processing in the filtering algorithm.

The particular embodiments described above are illustrative only and should not be taken as limiting the present invention, which may be modified and practiced in different but equivalent manners apparent to those skilled in the art having the benefit of the teachings herein. Accordingly, the foregoing description is not intended to limit the invention to the particular form set forth, but on the contrary is intended to cover such alternatives, modifications and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims, so that those skilled in the art should understand that they can make various changes, substitutions and alterations without departing from the spirit and scope of the invention in its broadest form.

Claims

1. A method of decoding video data processed by block transformation into a plurality of macro blocks, the method comprising:

smoothing and deblocking selected pixel data of a first macro block to produce at least one first partially filtered block and at least one first completed block; and

storing the first partially filtered block in a scratch pad memory for use in smoothing and deblocking selected pixel data of a second macro block.

2. The method of claim 1, further comprising:

fetching a second partially filtered block from the scratch pad memory for use in the smoothing and deblocking steps,

wherein the second partially filtered block was previously generated during processing of a prior macro block.

3. The method of claim 1,

wherein the smoothing and deblocking steps are performed by an in-loop filter.

4. The method of claim 3,

wherein the in-loop filter sequentially processes each row of macro blocks of a video frame to smooth and deblock one macro block at a time.

5. The method of claim 3,

wherein the in-loop filter performs the smoothing and deblocking for a plurality of macro blocks sequentially in a pipelined manner.

6. The method of claim 1, further comprising:

smoothing and deblocking selected pixel data of a first row of macro blocks to produce partially filtered blocks; and

storing the partially filtered blocks in the scratch pad memory.

7. The method of claim 6, further comprising:

retrieving a first partially filtered block from the scratch pad memory when smoothing and deblocking selected pixel data of a second row of macro blocks; and

smoothing and deblocking selected pixel data of the first partially filtered block to complete the smoothing and deblocking of the first partially filtered block, thereby generating a completed block.

8. The method of claim 1, wherein the smoothing and deblocking step comprises:

smoothing and deblocking at least one first block of each macro block,

wherein the first block is partially processed in a first filter operation, stored in the scratch pad memory, and subsequently fully processed in a second filter operation.

9. A video processing system for decoding video information from a compressed video data stream, comprising:

a processor that partially decodes the compressed video data stream to produce partially decoded video data; and

a video decode circuit that decodes the partially decoded video data to produce video frames,

wherein the video decode circuit comprises a scratch pad memory and an in-loop filter that sequentially performs piecewise processing of overlap smoothing and in-loop deblocking on a plurality of macro blocks of the video frames.

10. The video processing system of claim 9,

wherein the scratch pad memory stores partially filtered blocks from a first row of macro blocks, and each partially filtered block may be fetched by the in-loop filter during overlap smoothing and deblocking of a second row of macro blocks.