KR100900058B1

KR100900058B1 - Method and Circuit Operation for Motion Estimation of Diverse Multimedia Codec

Info

Publication number: KR100900058B1
Application number: KR1020070034776A
Authority: KR
Inventors: 선우명훈; 현충진
Original assignee: 아주대학교산학협력단
Priority date: 2007-04-09
Filing date: 2007-04-09
Publication date: 2009-06-01
Also published as: KR20080091651A

Abstract

The present invention relates to a motion estimation method, and a circuit implementing it, for use in various multimedia codecs such as MPEG-2, MPEG-4, and the next-generation H.264/AVC codec. Motion estimation is the stage of codec execution that demands the most computation and power. By sharing a single search area to estimate the motion of adjacent M × N blocks, the invention provides a motion estimation method and circuit that reduce the computation and power consumption of the multimedia codec.

H.264/AVC, Motion Estimation (ME), Memory Reuse Algorithm, Data Reuse Algorithm

Description

Motion Estimation Method and Circuit for Use in Various Multimedia Codecs {Method and Circuit Operation for Motion Estimation of Diverse Multimedia Codec}

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing the overall configuration of a conventional H.264/AVC encoder.

FIG. 2 is a diagram showing a conventional method of obtaining the center of the search area of the current block.

FIG. 3 is a diagram showing the computational load analysis of each block of a conventional encoder.

FIG. 4 is a diagram showing the computational load analysis of each block of a conventional decoder.

FIG. 5 is a diagram showing the process, according to the present invention, of finding the block that best matches the current block within the current block's search area laid out in the reference frame.

FIG. 6 is a diagram showing the regions where the search areas of the current block and its three adjacent blocks overlap one another, according to the present invention.

FIG. 7 is a diagram showing the process, according to the present invention, of finding the best-matching blocks for the current block and its three adjacent blocks within the search area of the current block.

FIG. 8 is a diagram showing the motion estimation circuit for various multimedia codecs according to the present invention.

FIG. 9 is a diagram showing the configuration of the computation unit that obtains the SAD value of a 4 × 1 block in the motion estimation circuit according to the present invention.

FIG. 10 is a diagram showing the configuration of the buffer that stores the SAD values of 4 × 1 blocks in the motion estimation circuit according to the present invention.

FIG. 11 is a diagram showing the SAD operation between the pixel data in the search area of the reference frame and the pixel data of the current block in the motion estimation circuit according to the present invention.

<Description of Reference Numerals for the Main Parts of the Drawings>

110: register; 120: distributor

130 to 160: SAD calculation units; 170, 220: switching blocks

180 to 210: buffers; 230 to 260: SAD comparison units

300 to 330, 350, 360, 380: adders

340, 370: pipeline registers; 400: demultiplexer

410 to 440: adders; 450 to 480: buffers

With advances in semiconductor process and design technology and in broadband wireless communication systems, multimedia signal processing has reached the stage where still-image and video signals can be compressed and transmitted wirelessly. Research continues toward preserving the original image as faithfully as possible while maintaining a high compression ratio for high-quality video data. International standards for this technology include MPEG-2 and MPEG-4 from ISO/IEC, H.263 from ITU-T, and H.264/AVC, developed jointly by ISO/IEC and ITU-T.

In addition, commercial satellite and terrestrial DMB services have drawn attention because they make multimedia data available on the move, and demand for them is expected to grow explosively. Such next-generation mobile communication transmits and receives not only voice but also multimedia information such as pictures and video at high speed and high quality. Implementing high-quality multimedia services at a limited data rate therefore requires technology that maintains high picture quality at a high compression ratio. H.264/AVC is known to offer a compression-efficiency gain of about 40% over MPEG-4 and about 60% over MPEG-2. Because H.264/AVC can maintain roughly DVD-level broadcast quality even at transmission rates of 1 Mbps or less, it has been adopted for VOD services over the Internet and wireless networks and for digital cameras, video cameras, DVDs, and the like.

Like existing video compression coding schemes such as MPEG-2 and MPEG-4, H.264/AVC estimates motion from a reference frame (the image frame used for motion estimation) to produce prediction data, forms the difference between the image to be encoded and the best-matching prediction data, applies the discrete cosine transform (DCT) and quantization, and finally performs entropy coding (also called variable-length coding). It is thus built on three techniques: motion estimation and compensation, DCT/quantization, and entropy coding.

H.264/AVC also defines three profiles, each supporting particular functions: the Baseline Profile (BP), the Main Profile (MP), and the Extended Profile (EP). The requirements a decoder must meet to be compatible with each profile are defined. A profile standardizes the technical components that go into the algorithm during video decoding. In other words, it is the set of technical elements needed to decode the bitstream of a compressed video.

Compared with earlier MPEG schemes, H.264/AVC sets the image processing unit to a smaller block size (4 pixels × 4 lines = 16 pixels) and uses fine pixel precision (down to 1/4 pixel), which enables very precise motion estimation. It also uses multiple reference frames so that motion estimation can select the optimal prediction signal.

FIG. 1 shows the configuration of the H.264/AVC encoder, which takes Fn (the current frame) and F'n-1 (the reference frame) and operates in inter mode or intra mode on a macroblock (16 × 16) basis. Inter mode is inter-picture coding, which performs motion estimation. Intra mode is intra-picture coding, which encodes the current frame by itself.

In inter mode, the input Fn to be encoded is processed in macroblock units and compared with F'n-1. The motion estimation stage searches F'n-1 for the macroblock that matches the current macroblock in Fn, and the offset between the position of the current macroblock and the selected reference region is the motion vector. A motion-compensated prediction region is generated according to the selected motion vector MV, and subtracting the prediction region from the current macroblock yields the residual (error) macroblock.

The residual macroblock is transformed using the discrete cosine transform (DCT). It is generally split into 8 × 8 or 4 × 4 sub-blocks, each of which is transformed independently. Each sub-block is quantized (Q), and the quantized coefficients are reordered by a scan and run-level coded. Finally, the coefficients, motion vectors, and associated header information of each macroblock are entropy-encoded to produce the compressed bitstream, which is passed to the decoder.
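The run-level step mentioned above can be sketched as follows. This is a minimal illustration of generic run-level coding, not the entropy coder of any particular standard (H.264/AVC itself uses CAVLC or CABAC). The function names and the sample coefficients are hypothetical.

```python
def run_level_encode(scanned):
    """Turn a scanned coefficient list into (run, level) pairs.

    run   = number of zeros preceding a nonzero coefficient
    level = the nonzero coefficient itself
    Trailing zeros are dropped; a real codec signals end-of-block separately.
    """
    pairs, run = [], 0
    for coeff in scanned:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs


def run_level_decode(pairs, length):
    """Rebuild the coefficient list, padding the tail with zeros."""
    out = []
    for run, level in pairs:
        out.extend([0] * run)
        out.append(level)
    return out + [0] * (length - len(out))


# Hypothetical quantized 4 x 4 sub-block after scanning (16 coefficients).
scanned = [7, 0, 0, -2, 1, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0]
pairs = run_level_encode(scanned)
print(pairs)
```

Because quantization leaves mostly zeros, the 16 coefficients collapse to four pairs, which is what makes the subsequent entropy coding effective.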

Motion estimation performs the following steps for each block of M × N samples in the current frame. First, a region of the reference frame (a previous or subsequent frame) is searched for the M × N sample region that matches the M × N block of the current frame. The current block is compared with all (full search) or some (fast search) of the candidate M × N blocks in the search area, which is generally a region of the reference frame defined relative to the position of the current block, and the best-matching region is chosen. The candidate whose sum of absolute differences (SAD), obtained by subtracting the candidate region from the current M × N block, is smallest is selected as the best match. Before the reference frame can be searched, the motion vector prediction (MVp) that serves as the center of the search area must be obtained. The MVp is computed by median-filtering the motion vectors (MVs) of three blocks: the blocks above, above-right of, and to the left of the current block.
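The full search and the median MVp described above can be sketched as follows. This is a minimal sketch under stated assumptions: the median is taken per component, candidates outside the frame are skipped, and the frame contents, block position, and neighbour vectors are made-up test values.

```python
def sad(cur, ref, rx, ry):
    """SAD between block `cur` and the same-size region of `ref` at (rx, ry)."""
    return sum(abs(cur[j][i] - ref[ry + j][rx + i])
               for j in range(len(cur)) for i in range(len(cur[0])))


def median_mv(mv_a, mv_b, mv_c):
    """Component-wise median of three neighbouring motion vectors (assumed form of MVp)."""
    xs = sorted(v[0] for v in (mv_a, mv_b, mv_c))
    ys = sorted(v[1] for v in (mv_a, mv_b, mv_c))
    return (xs[1], ys[1])


def full_search(cur, ref, bx, by, center, lo, hi):
    """Test every offset in [lo, hi] around `center`; return (best_mv, best_sad)."""
    rows, cols = len(cur), len(cur[0])
    best = None
    for dy in range(lo, hi + 1):
        for dx in range(lo, hi + 1):
            mv = (center[0] + dx, center[1] + dy)
            rx, ry = bx + mv[0], by + mv[1]
            if rx < 0 or ry < 0 or rx + cols > len(ref[0]) or ry + rows > len(ref):
                continue  # candidate falls outside the reference frame
            cost = sad(cur, ref, rx, ry)
            if best is None or cost < best[1]:
                best = (mv, cost)
    return best


# Toy 16 x 16 reference frame in which every pixel value is unique.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
bx, by = 6, 6
# The current 4 x 4 block is the reference content displaced by (+2, -1).
cur = [row[bx + 2:bx + 6] for row in ref[by - 1:by + 3]]
mvp = median_mv((0, 0), (1, 0), (3, -1))
best_mv, best_sad = full_search(cur, ref, bx, by, mvp, -2, 2)
print(mvp, best_mv, best_sad)
```

The search is centred on the predicted vector rather than on zero motion, so a small range is enough when the prediction is close to the true motion.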

FIG. 2 shows how the MVp of the current block is generated. The selected candidate region then becomes the prediction block for the current M × N block, and subtracting the prediction block from the current block produces the M × N residual data. This residual is encoded and transmitted, together with the motion vector between the position of the current block and the position of the candidate region. The decoder uses the received motion vector to regenerate the prediction region and adds it to the reconstructed M × N residual to rebuild the original block.
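The encoder-side subtraction and decoder-side addition can be illustrated with a short round trip. This sketch assumes a lossless residual (no transform or quantization), and the pixel values are invented for the example.

```python
def block_sub(cur, pred):
    """Residual = current block minus prediction block (encoder side)."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(cur, pred)]


def block_add(pred, resid):
    """Reconstruction = prediction block plus residual (decoder side)."""
    return [[p + r for p, r in zip(pr, rr)] for pr, rr in zip(pred, resid)]


cur = [[52, 55, 61, 66], [63, 59, 55, 90], [62, 59, 68, 113], [63, 58, 71, 122]]
pred = [[50, 55, 60, 66], [64, 59, 54, 88], [62, 60, 68, 110], [63, 58, 70, 120]]

resid = block_sub(cur, pred)     # what the encoder would transform and send
rebuilt = block_add(pred, resid)  # what the decoder recovers
print(resid)
```

A good match leaves a residual of small values, which is why accurate motion estimation reduces the amount of data to transmit.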

When such motion estimation reads from a single-port memory with a 32-bit bandwidth, reuses data within the search area, and uses full search, a search range of [-8, +7] requires 95 memory accesses per 4 × 4 block. A QCIF (176 × 144) image contains 1,584 4 × 4 blocks, so one QCIF frame requires 150,480 memory accesses. With a search range of [-15, +16], each 4 × 4 block requires 324 memory accesses, and one QCIF frame requires 513,216.
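The per-frame totals follow directly from the per-block counts. The short check below takes the per-block figures (95 and 324) as given and only reproduces the multiplication; the variable names are illustrative.

```python
QCIF_WIDTH, QCIF_HEIGHT, BLOCK = 176, 144, 4

# Number of 4 x 4 blocks in one QCIF frame: 44 columns x 36 rows.
blocks_per_frame = (QCIF_WIDTH // BLOCK) * (QCIF_HEIGHT // BLOCK)

# Memory accesses per 4 x 4 block, as stated in the text for each search range.
accesses_per_block = {"[-8, +7]": 95, "[-15, +16]": 324}

accesses_per_frame = {rng: n * blocks_per_frame
                      for rng, n in accesses_per_block.items()}
print(blocks_per_frame, accesses_per_frame)
```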

FIGS. 3 and 4 show the computational load analysis of each block of a conventional encoder and decoder. Motion estimation accounts for 39% of the computation and motion compensation for 29%, the largest shares. Thus, while motion estimation and compensation can maintain high picture quality with little transmitted data, the computation is heavy enough to account for 39% of the encoding workload and the number of memory accesses is also large, so computation and power consumption are high.

To solve these problems, an object of the present invention is to provide a motion estimation method and a motion estimation circuit that share a single search area to estimate the motion of adjacent M × N blocks in the motion estimation stage, which is heavy in computation and power consumption, of various multimedia codecs such as MPEG-2, MPEG-4, and the next-generation H.264/AVC codec, thereby reducing the computation and power consumption of the codec.

Another object of the present invention is to reduce the number of memory accesses in motion estimation not by reusing data within a search area, but by exploiting the similarity of MVp between adjacent blocks to share the search area itself.

To achieve these objects, the present invention provides a motion estimation method for multimedia codecs which, in a method of performing the motion estimation operation used in an MPEG-2, MPEG-4, or H.264/AVC multimedia codec, further comprises the step of performing motion estimation for adjacent M × N blocks by sharing the search area of one M × N block among them.

The present invention also provides a motion estimation circuit for multimedia codecs which, in a circuit that performs the motion estimation operation used in an MPEG-2, MPEG-4, or H.264/AVC multimedia codec, comprises: 144 8-bit registers capable of storing four rows of pixel data from the 36 × 36 search area, since a 4 × 4 block consists of four rows; a distributor that combines and outputs data values from the registers holding the data of each row of the current block and the data of the search area; SAD calculation units that obtain the difference between the input pixel data of the current block and the pixel data of the search area; a switching block that routes the output SAD values so that they can be stored in sequence; buffers that store the SAD values received from the switching block; a switching block that either feeds SAD values back so that they can be accumulated into the SAD value of a complete 4 × 4 block or sequentially outputs the completed 4 × 4 SAD values; and SAD comparison units that sequentially compare the incoming 4 × 4 SAD values and output the position and SAD value of the block with the smallest SAD value.

In the motion estimation operation of the present invention, the addresses of four neighboring 4 × 4 blocks of the current frame are first generated and applied to the memory holding the current frame, and the data of the four 4 × 4 current blocks read from memory are stored in their respective registers. To find the blocks in the reference frame that best match the four 4 × 4 current blocks, the MVp of the first of the four blocks is generated, and a search area of a given size is set in the reference frame around that MVp. Once the search area is set, the address of each datum in it is generated and applied to the memory holding the reference frame, and 4 × 1 SAD values are computed for the four blocks, in 4 × 1 units, between the search-area data read from memory and the data of the four 4 × 4 current blocks held in the registers. The 4 × 1 SAD values are then summed to produce each 4 × 4 SAD value. For each of the four 4 × 4 current blocks, the reference block with the smallest SAD value is the best match, and the difference between the data of the 4 × 4 current block and the data of its best-matching 4 × 4 reference block gives the residual block. The MVs of the four 4 × 4 current blocks are each calculated relative to the MVp of the first block, and the residual blocks and MVs of the four blocks are encoded and transmitted, which completes the motion estimation operation.
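The shared-search-area flow described above can be sketched in software as follows. This is an illustrative model, not the patented circuit: the search window is read once around the first block's MVp and all four blocks are matched inside it. The frame size, search range, and motion are toy values, and the pixel values are arbitrary unique integers rather than 8-bit samples.

```python
def region(frame, x, y, w, h):
    """Copy a w x h region whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in frame[y:y + h]]


def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def shared_search(ref, cur_blocks, mvp, lo, hi, n=4):
    """Match every block inside the single window set up for the first block.

    cur_blocks: list of ((bx, by), block); the first entry defines the window.
    Returns one (motion_vector, sad) pair per block.
    """
    (bx0, by0), _ = cur_blocks[0]
    x0, y0 = bx0 + mvp[0] + lo, by0 + mvp[1] + lo
    size = (hi - lo) + n
    window = region(ref, x0, y0, size, size)  # the only read of reference data
    results = []
    for (bx, by), block in cur_blocks:
        best = None
        for wy in range(size - n + 1):
            for wx in range(size - n + 1):
                cost = sad(block, region(window, wx, wy, n, n))
                if best is None or cost < best[1]:
                    best = ((x0 + wx - bx, y0 + wy - by), cost)
        results.append(best)
    return results


# Toy 24 x 24 reference frame with unique values; true motion is (+1, +2).
ref = [[24 * y + x for x in range(24)] for y in range(24)]
positions = [(8, 8), (12, 8), (8, 12), (12, 12)]  # four adjacent 4 x 4 blocks
cur_blocks = [((bx, by), region(ref, bx + 1, by + 2, 4, 4)) for bx, by in positions]

results = shared_search(ref, cur_blocks, mvp=(0, 0), lo=-6, hi=6)
print(results)
```

Each block still gets its own motion vector, but the reference data is fetched once instead of four times, which is where the saving in memory accesses comes from.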

The operations of the present invention are described in more detail below with reference to the accompanying drawings.

FIG. 5 shows, for a method of performing the motion estimation operation used in an MPEG-2, MPEG-4, or H.264/AVC multimedia codec, the process of finding the block that best matches the current block within the current block's search area formed in the reference frame. Once the MVp is generated, a search area is formed around it in the reference frame, and all data in the search area must be read from memory to find the best-matching block. The same process applies to every block. However, because the MVp positions that serve as the search-area centers are similar for adjacent blocks, a single search area can be shared instead of specifying a separate search area for each block.

FIG. 6 shows the regions where the search areas of the current block and its three adjacent blocks overlap. The dotted region is where the search areas of the adjacent blocks overlap.

| QCIF video | ±16 (%) | ±8 (%) | ±5 (%) | ±3 (%) |
|---|---|---|---|---|
| Akiyo | 100 | 100 | 99.8906 | 99.7475 |
| Bridge close | 100 | 99.9663 | 99.3406 | 97.7609 |
| Carphone | 99.8906 | 97.1661 | 91.8042 | 83.2407 |
| Clare | 99.9747 | 98.6700 | 93.5943 | 86.1728 |
| Coastguard | 99.9663 | 97.7581 | 91.9978 | 74.3406 |
| Foreman | 99.9832 | 98.8047 | 96.1700 | 91.8575 |
| Granma | 99.9074 | 97.4270 | 93.9113 | 89.2424 |
| suzie | 99.7980 | 95.4321 | 86.8642 | 72.8423 |
| Table tennis | 99.9579 | 98.5269 | 92.9181 | 83.6785 |
| trevor | 99.9832 | 99.8485 | 98.8861 | 96.5544 |

Table 1 shows the probability that the MVp of the adjacent blocks (B2, B3, B4) lies within a given range (±16, ±8, ±5, ±3) of the MVp of the 4 × 4 current block (B1). The probability that the adjacent blocks (B2, B3, B4) are found within the ±16 range of the current block (B1) exceeds 99%, and within the ±8 range it is at least 95%. Therefore, rather than reading the search areas of all the adjacent blocks (B2, B3, B4) from memory, only the search area of the current block (B1) is used as the search area for the adjacent blocks, and their best-matching blocks are found within the current block's search area. Compared with the conventional full search, this reduces memory accesses by 75%, and power consumption falls accordingly.
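A statistic of the kind reported in Table 1 could be gathered as sketched below. The patent does not state how "within range" was measured, so the per-component test here is an assumption, and the vector pairs are invented purely to exercise the function.

```python
def within_range(mvp_cur, mvp_nb, r):
    """True if the neighbour's MVp lies within +/- r of the current MVp (per component)."""
    return abs(mvp_nb[0] - mvp_cur[0]) <= r and abs(mvp_nb[1] - mvp_cur[1]) <= r


def hit_rate(pairs, r):
    """Percentage of (current, neighbour) MVp pairs that fall within +/- r."""
    hits = sum(within_range(cur, nb, r) for cur, nb in pairs)
    return 100.0 * hits / len(pairs)


# Hypothetical (current MVp, neighbour MVp) pairs, not data from the patent.
pairs = [((0, 0), (1, -1)), ((2, 3), (2, 3)), ((0, 0), (6, 0)), ((-4, 1), (5, 1))]

rates = {r: hit_rate(pairs, r) for r in (16, 8, 3)}
print(rates)
```

The higher the hit rate for a given range, the less quality is risked by letting neighbours search only inside the current block's area.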

FIG. 7 shows the process of finding, within the search area of the current block, the best-matching blocks for the current block and its three adjacent blocks. The center of the search area is the MVp of the current block. After the best-matching blocks for the current block and its neighbors are found, four MVs and four residual blocks are generated.

In one embodiment, when the search range of the 4 × 4 current block is [-16, +15], the search area is 36 × 36. Because the adjacent blocks are 4 × 4, when MVp = 0 the search area of B1 equals the current block's search area shifted 4 pixels to the right, and the search area of B2 equals the current block's search area shifted 4 pixels down. Finally, the search area of B3 equals the current block's search area shifted 4 pixels diagonally to the right.
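The combined extent of the four search areas can be checked with simple rectangle arithmetic. The sketch below takes the 36 × 36 size and the 4-pixel shifts as given, and treats the diagonal shift as 4 pixels right and 4 pixels down, which is an assumption; the helper names are illustrative.

```python
def bounding_box(rects):
    """Width and height of the smallest rectangle containing all (x, y, w, h) rects."""
    x0 = min(x for x, y, w, h in rects)
    y0 = min(y for x, y, w, h in rects)
    x1 = max(x + w for x, y, w, h in rects)
    y1 = max(y + h for x, y, w, h in rects)
    return (x1 - x0, y1 - y0)


def common_area(rects):
    """Width and height of the region shared by all rects (0 if none)."""
    x0 = max(x for x, y, w, h in rects)
    y0 = max(y for x, y, w, h in rects)
    x1 = min(x + w for x, y, w, h in rects)
    y1 = min(y + h for x, y, w, h in rects)
    return (max(0, x1 - x0), max(0, y1 - y0))


SIZE, SHIFT = 36, 4
# Current block's area plus the three areas shifted right, down, and diagonally.
areas = [(dx, dy, SIZE, SIZE)
         for dx, dy in ((0, 0), (SHIFT, 0), (0, SHIFT), (SHIFT, SHIFT))]

union_size = bounding_box(areas)
overlap_size = common_area(areas)
print(union_size, overlap_size)
```

The four areas share a 32 × 32 core, so the single 36 × 36 area already covers most of what each neighbour would otherwise read.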

Accordingly, the full search area that contains the search areas of all the adjacent blocks is 40 × 40. Table 2 shows the PSNR (peak signal-to-noise ratio) when the shared search area is 40 × 40 and when it is 36 × 36. As Table 2 shows, the average PSNR difference between sharing a 40 × 40 search area that contains the adjacent blocks' search areas and sharing the current block's own 36 × 36 search area is about ±0.07 dB, so the two are nearly the same. The search area of the first block, at the upper left, can therefore be shared to find the best-matching blocks for four adjacent 4 × 4 blocks.

| QCIF video | Mean PSNR, 36 × 36 search area (dB) | Mean PSNR, 40 × 40 search area (dB) |
|---|---|---|
| Akiyo | 43.1963 | 43.2263 |
| Bridge close | 38.2670 | 38.3170 |
| carphone | 33.6620 | 33.6720 |
| Clare | 44.0709 | 43.9909 |
| Coastguard | 32.4164 | 32.5364 |
| Foreman | 35.2080 | 35.2780 |
| Granma | 41.5239 | 41.4339 |
| Grasses | 26.7602 | 26.8002 |
| Miss am | 41.6294 | 41.5694 |
| Suzie | 38.5248 | 38.6148 |
| Table tennis | 30.6401 | 30.7201 |
| trevor | 36.3952 | 36.3752 |
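For reference, PSNR for 8-bit images is conventionally computed from the mean squared error, and the per-sequence differences in Table 2 can be averaged directly. The sketch below uses the standard PSNR definition with a peak of 255 (an assumption, since the patent does not give its formula) and the values from Table 2.

```python
import math


def psnr(orig, recon, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    count = sum(len(row) for row in orig)
    mse = sum((a - b) ** 2
              for ro, rr in zip(orig, recon) for a, b in zip(ro, rr)) / count
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


# (mean PSNR with 36 x 36 area, mean PSNR with 40 x 40 area), from Table 2.
table2 = {
    "Akiyo": (43.1963, 43.2263), "Bridge close": (38.2670, 38.3170),
    "carphone": (33.6620, 33.6720), "Clare": (44.0709, 43.9909),
    "Coastguard": (32.4164, 32.5364), "Foreman": (35.2080, 35.2780),
    "Granma": (41.5239, 41.4339), "Grasses": (26.7602, 26.8002),
    "Miss am": (41.6294, 41.5694), "Suzie": (38.5248, 38.6148),
    "Table tennis": (30.6401, 30.7201), "trevor": (36.3952, 36.3752),
}

diffs = {name: p40 - p36 for name, (p36, p40) in table2.items()}
mean_abs_diff = sum(abs(d) for d in diffs.values()) / len(diffs)
print(round(mean_abs_diff, 4))
```

The mean absolute difference over the twelve sequences comes out at about 0.06 dB, in line with the roughly ±0.07 dB figure quoted in the text.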

Table 3 shows the number of memory accesses when one QCIF (Quarter Common Intermediate Format) frame is processed in 4 × 4 block units, for the proposed method that shares the search area and for the method that does not. As Table 3 shows, sharing the search area reduces memory accesses by 75%.

| Method | Search range | Memory accesses |
|---|---|---|
| Each block uses its own search area | [-8, +7] | 150,480 |
| Each block uses its own search area | [-15, +16] | 513,216 |
| Adjacent blocks share one search area | [-8, +7] | 37,620 |
| Adjacent blocks share one search area | [-15, +16] | 128,304 |
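The 75% figure can be verified from the Table 3 values: four blocks share one search-area read, so the access count drops to one quarter. The dictionary keys below are illustrative names.

```python
# Memory accesses per QCIF frame, from Table 3.
accesses = {
    "[-8, +7]": {"own_area": 150480, "shared_area": 37620},
    "[-15, +16]": {"own_area": 513216, "shared_area": 128304},
}


def reduction_percent(before, after):
    """Relative reduction in memory accesses, in percent."""
    return 100.0 * (before - after) / before


reductions = {rng: reduction_percent(v["own_area"], v["shared_area"])
              for rng, v in accesses.items()}
print(reductions)
```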

FIG. 8 shows the circuit of the present invention for motion estimation using memory reuse. It consists of: 144 8-bit registers (110) capable of storing four rows of pixel data from the 36 × 36 search area, since a 4 × 4 block consists of four rows; a distributor (120) that combines and outputs data values from the registers (110) holding the data of each row of the current block and the data of the search area; SAD calculation units (130, 140, 150, 160) that obtain the difference between the input pixel data of the current block and the pixel data of the search area; a switching block (170) that routes the output SAD values so that they can be stored in sequence; buffers (180, 190, 200, 210) that store the SAD values received from the switching block (170); a switching block (220) that either feeds SAD values back so that they can be accumulated into the SAD value of a complete 4 × 4 block or sequentially outputs the completed 4 × 4 SAD values; and SAD comparison units (230, 240, 250, 260) that sequentially compare the incoming 4 × 4 SAD values and output the position and SAD value of the block with the smallest SAD value.

Because a 4 × 4 block consists of four rows, four rows of pixel data from the 36 × 36 search area are stored sequentially in the 144 registers (110). When the subsequent operations output the final 4 × 4 SAD values, the pixel data in the registers are shifted left by 36 bits and the pixel data of the next row are loaded in sequence. The three blocks adjacent to the current block are computed simultaneously in the same way. For the current block, the computation begins by calculating four SADs of 4 pixels each simultaneously in one cycle.

FIG. 9 shows the structure of the computation unit that obtains the SAD value of a 4 × 1 block. It consists of adders (300, 310, 320, 330) that take the pixel data of the reference block and the pixel data of the current block to compute SAD values, a pipeline register (340) that temporarily holds those results, adders (350, 360) that sum the SAD values of the 4 pixels, a pipeline register (370) that temporarily holds those results, and an adder (380) that produces the final sum for the SAD value of the 4 × 1 pixel data.
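A cycle-level software model of this three-stage unit might look like the sketch below. The component numbers follow FIG. 9, but the exact timing (a result appearing two clocks after its inputs) and the class and method names are assumptions made for illustration.

```python
class Sad4x1Pipeline:
    """Toy model: abs-diff stage -> register 340 -> pair sums -> register 370 -> final add."""

    def __init__(self):
        self.reg340 = None  # four per-pixel absolute differences
        self.reg370 = None  # two partial sums

    def clock(self, cur_row=None, ref_row=None):
        """Advance one clock; return a finished 4 x 1 SAD or None while filling."""
        # Adder 380: combine the two partial sums held in register 370.
        out = None if self.reg370 is None else self.reg370[0] + self.reg370[1]
        # Adders 350 and 360: pairwise sums of the values held in register 340.
        if self.reg340 is None:
            self.reg370 = None
        else:
            d = self.reg340
            self.reg370 = (d[0] + d[1], d[2] + d[3])
        # Units 300 to 330: per-pixel absolute differences of the new inputs.
        if cur_row is None:
            self.reg340 = None
        else:
            self.reg340 = tuple(abs(c - r) for c, r in zip(cur_row, ref_row))
        return out


pipe = Sad4x1Pipeline()
outputs = [
    pipe.clock([10, 20, 30, 40], [12, 18, 30, 45]),  # first row enters
    pipe.clock([0, 0, 0, 0], [1, 2, 3, 4]),          # second row enters
    pipe.clock(),                                     # flush: first result leaves
    pipe.clock(),                                     # flush: second result leaves
]
print(outputs)
```

Once the pipeline is full, one 4 × 1 SAD is produced per clock even though each individual result takes several clocks to complete.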

FIG. 10 shows the structure of the buffer that stores the SAD values of 4 × 1 blocks. It consists of a demultiplexer (400) that receives the computed 4 × 1 SAD value and selectively routes it to a buffer, adders (410, 420, 430, 440) that add the previously stored SAD value to the newly received 4 × 1 SAD value so as to build up the final 4 × 4 SAD value, and buffers (450, 460, 470, 480) that store the results.
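The accumulate-into-buffer behaviour can be modelled in a few lines. This is a simplified sketch: the class name, the number of candidates, and the row SAD values are illustrative, and the demultiplexer is reduced to an index.

```python
class SadBuffers:
    """Toy model of FIG. 10: a demultiplexer selects a buffer, an adder accumulates into it."""

    def __init__(self, count):
        self.values = [0] * count

    def accumulate(self, index, row_sad):
        # Stored value plus the incoming 4 x 1 SAD is written back to the same buffer.
        self.values[index] += row_sad
        return self.values[index]


# Hypothetical 4 x 1 SADs: row_sads[row][candidate] for four candidate positions.
row_sads = [
    [3, 0, 7, 2],
    [1, 0, 5, 2],
    [4, 1, 6, 0],
    [2, 0, 9, 1],
]

buffers = SadBuffers(4)
for row in row_sads:
    for candidate, value in enumerate(row):
        buffers.accumulate(candidate, value)
print(buffers.values)
```

After the fourth row has been added, each buffer holds the complete 4 × 4 SAD of its candidate position.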

FIG. 11 shows the SAD operation between the pixel data in the search area of the reference frame and the pixel data of the current block, performed with the motion estimation circuit used in multimedia codecs according to the present invention. The four 4 × 1 search windows are shifted right by one register on every clock, and after four shifts they are shifted right by twelve registers; this operation is repeated. Since there are 32 4 × 4 blocks in four rows of the reference frame, after 8 cycles the 32 SADs that SAD operation unit 1 (130) has computed from row 1 of the current block and row 1 of the reference frame are stored in Buffer_1-0 through Buffer_1-35 (180). The four pixels of row 2 of the current block are processed in the same way. All rows of the current block are calculated in the same way, so after 32 cycles each buffer (180) holds the final SAD value, accumulated over the rows, for one of the 32 blocks.

SAD operation unit 1 (130) computes the SAD between current block B1 and the pixel data of the reference frame, and SAD operation units 2 (140), 3 (150), and 4 (160) do the same for adjacent blocks B2, B3, and B4, respectively. The SAD 1 comparison unit (230) then outputs the minimum of the SAD values stored in Buffer_1-0 through Buffer_1-35 (180). Likewise, the SAD 2 comparison unit (240), SAD 3 comparison unit (250), and SAD 4 comparison unit (260) output the minimum of the values stored in Buffer_2-0 through Buffer_2-35 (190), Buffer_3-0 through Buffer_3-35 (200), and Buffer_4-0 through Buffer_4-35 (210), respectively.
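The overall data flow described for FIG. 11, in which several current blocks are matched against one shared search area, row SADs are accumulated into 4 × 4 SADs, and a comparison step keeps the minimum, can be sketched as follows. This is a functional model only: it does not reproduce the window-shift schedule, the cycle counts, or the buffer layout of the circuit, and the names `row_sad` and `shared_area_search` are hypothetical.

```python
def row_sad(ref_row, cur_row):
    """SAD between one row of the reference candidate and one block row."""
    return sum(abs(r - c) for r, c in zip(ref_row, cur_row))


def shared_area_search(search_area, current_blocks, n=4):
    """Full search of every n x n candidate in one shared search area.

    search_area    -- 2D list of pixel values read once and reused
    current_blocks -- list of n x n blocks (e.g. B1..B4)
    Returns one (min_sad, (y, x)) tuple per current block.
    """
    height, width = len(search_area), len(search_area[0])
    best = [(None, None)] * len(current_blocks)

    for y in range(height - n + 1):
        for x in range(width - n + 1):
            # The same reference rows serve every current block.
            candidate = [search_area[y + r][x:x + n] for r in range(n)]
            for b, block in enumerate(current_blocks):
                # Accumulate the n row SADs into one block SAD.
                sad = sum(row_sad(candidate[r], block[r]) for r in range(n))
                # Comparison step: keep the smallest SAD seen so far.
                if best[b][0] is None or sad < best[b][0]:
                    best[b] = (sad, (y, x))
    return best
```

The point of the sketch is the loop order: each candidate is fetched from the search area once and then compared with all current blocks, which is where the reuse of reference data comes from.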

The present invention has been described above with reference to the embodiments shown in the drawings, but these serve only to illustrate the invention. Those of ordinary skill in the art to which the invention pertains will understand that various modifications and equivalent embodiments are possible from the detailed description. Therefore, the true scope of the present invention should be determined by the technical spirit of the claims.

As described above, the present invention provides an efficient motion estimation method for use in various multimedia codecs, such as MPEG-2, MPEG-4, and the next-generation H.264/AVC codec, together with a circuit that implements the method. It improves performance by allowing the motion estimation process, which accounts for a large share of the computation in an H.264/AVC implementation, to be carried out efficiently.

Table 4 compares the operation cycles of the memory reuse algorithm, which shares a search area, with those of the general full search method. For the [-8, +7] search range, 514 clocks are needed to obtain the SAD of a 4 × 4 block. A QCIF (176 × 144) image contains 1584 4 × 4 blocks, so processing one frame with the general full search method takes 514 × 1584 = 814,176 clocks.

Table 4. Operation cycles (clocks) by search range

| Item | [-8, +7] general full search | [-8, +7] proposed algorithm | [-16, +15] general full search | [-16, +15] proposed algorithm |
|---|---|---|---|---|
| Clocks needed to obtain the SAD of one 4 × 4 block | 514 | 514 | 1163 | 1163 |
| QCIF (176 × 144), one frame | 814,176 | 203,544 | 1,842,192 | 460,548 |
| 30 frames (1 reference frame) | 24,425,280 | 6,106,320 | 55,265,760 | 13,816,440 |
| 30 frames (5 reference frames) | 122,126,400 | 30,531,600 | 276,328,800 | 69,082,200 |

The method according to the present invention processes four 4 × 4 blocks in parallel, so one frame takes 814,176 / 4 = 203,544 clocks. Therefore, for motion estimation at 30 frames per second with one reference frame, the general full search method must run at about 24 MHz, whereas the proposed method needs only about 6 MHz. With five reference frames at 30 frames per second, the general full search method must run at about 122 MHz and the proposed method at about 30 MHz.

For the [-16, +15] search range, 1163 clocks are needed to obtain the SAD of a 4 × 4 block. Processing one QCIF frame takes 1,842,192 clocks with the general full search method, and 1,842,192 / 4 = 460,548 clocks with the proposed method because four 4 × 4 blocks are processed in parallel. Therefore, for motion estimation at 30 frames per second with one reference frame, the general full search method must run at about 55 MHz and the proposed method at about 13 MHz; with five reference frames, the general full search method must run at about 276 MHz and the proposed method at about 69 MHz. The proposed method thus operates at about one quarter of the frequency of the general full search method, which makes it an efficient scheme that can reduce power consumption by 75% compared with the conventional approach.
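The clock counts quoted in this comparison follow from simple arithmetic, which the sketch below reproduces. The constant and function names are illustrative only; the per-block clock counts (514 and 1163), the frame size, the frame rate, and the four-way parallelism are the values stated in the text.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144   # QCIF frame size in pixels
BLOCK_SIZE = 4                       # 4 x 4 blocks
FRAMES_PER_SECOND = 30
PARALLEL_BLOCKS = 4                  # blocks sharing one search area


def clocks_per_frame(clocks_per_block, parallel_blocks=1):
    """Clocks needed to run motion estimation over one QCIF frame."""
    blocks = (QCIF_WIDTH // BLOCK_SIZE) * (QCIF_HEIGHT // BLOCK_SIZE)
    return clocks_per_block * blocks // parallel_blocks


for clocks_per_block in (514, 1163):          # [-8, +7] and [-16, +15]
    for reference_frames in (1, 5):
        scale = FRAMES_PER_SECOND * reference_frames
        full = clocks_per_frame(clocks_per_block) * scale
        proposed = clocks_per_frame(clocks_per_block, PARALLEL_BLOCKS) * scale
        # Clocks per second divided by 1e6 gives the required MHz.
        print(f"{clocks_per_block} clk/block, {reference_frames} ref: "
              f"full {full / 1e6:.1f} MHz, proposed {proposed / 1e6:.1f} MHz")
```

The required frequency is the number of clocks that must fit into one second, so the four-way parallelism lowers it by the same factor of four in every configuration.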

Table 5 compares the PSNR of the memory reuse algorithm, which shares a search area, with that of the H.264/AVC full search algorithm. The two are similar, with an average PSNR difference of about ±0.05 dB. The proposed memory reuse algorithm is therefore an efficient structure that reduces the number of memory accesses by 75% compared with the general H.264/AVC full search algorithm, with almost no loss of image quality.

Table 5. PSNR comparison for QCIF sequences

| QCIF sequence | H.264/AVC (dB) | Proposed algorithm (dB) |
|---|---|---|
| Akiyo | 43.1963 | 43.2013 |
| Bridge close | 38.2670 | 38.2675 |
| carphone | 33.6620 | 33.6099 |
| Clare | 44.0709 | 44.9770 |
| Coastguard | 32.4164 | 32.4070 |
| Foreman | 35.2080 | 35.1961 |
| Granma | 41.5239 | 41.5221 |
| Grasses | 26.7602 | 26.7453 |
| Miss am | 41.6294 | 41.5164 |
| Suzie | 38.5248 | 38.4887 |
| Table tennis | 30.6401 | 30.6095 |
| trevor | 36.3952 | 36.3968 |
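The per-sequence differences behind the PSNR comparison can be recomputed from the table values as printed. The sketch below is only a reading aid: the variable names are hypothetical, and taking the signed mean of "proposed minus H.264/AVC" is an assumption about how an average difference might be formed, since the text does not say how its figure was derived.

```python
# (H.264/AVC full search dB, proposed algorithm dB), copied from Table 5.
PSNR_DB = {
    "Akiyo": (43.1963, 43.2013),
    "Bridge close": (38.2670, 38.2675),
    "carphone": (33.6620, 33.6099),
    "Clare": (44.0709, 44.9770),
    "Coastguard": (32.4164, 32.4070),
    "Foreman": (35.2080, 35.1961),
    "Granma": (41.5239, 41.5221),
    "Grasses": (26.7602, 26.7453),
    "Miss am": (41.6294, 41.5164),
    "Suzie": (38.5248, 38.4887),
    "Table tennis": (30.6401, 30.6095),
    "trevor": (36.3952, 36.3968),
}

# Signed difference per sequence: positive means the proposed method is higher.
differences = {name: proposed - reference
               for name, (reference, proposed) in PSNR_DB.items()}
mean_difference = sum(differences.values()) / len(differences)

for name, diff in differences.items():
    print(f"{name:13s} {diff:+.4f} dB")
print(f"mean          {mean_difference:+.4f} dB")
```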

Claims

In an efficient motion estimation method for MPEG-2, MPEG-4, and H.264/AVC next-generation multimedia codecs,

A motion estimation method used in various multimedia codecs, characterized in that the search area of one M × N block among adjacent M × N blocks is shared to perform motion estimation of the adjacent M × N blocks.

The method of claim 1,

Wherein, when the M × N block is a 4 × 4 block,

The search area of the upper-left 4 × 4 block among four adjacent 4 × 4 blocks is shared to perform the motion estimation operation of the three adjacent blocks.

Generating four addresses of four adjacent 4 × 4 blocks of the current frame, inputting them into a memory in which the current frame is stored, and storing the data of the four 4 × 4 current blocks output from the memory in respective registers;

Generating the MVp of the first of the four 4 × 4 current blocks in order to find, in the reference frame, the blocks that best match the four 4 × 4 current blocks of the current frame, and determining a search area of a fixed size within the reference frame based on the generated MVp;

When the search area is determined, generating the address of each data item in the search area and inputting it into a memory in which the reference frame is stored, obtaining the 4 × 1 SAD values of the four blocks, in units of 4 × 1, between the data in the search area output from the memory and the data of the four 4 × 4 current blocks stored in the registers, and then generating 4 × 4 SAD values by adding the respective 4 × 1 SAD values;

For each of the four 4 × 4 current blocks, taking the reference block having the smallest SAD value as the best-matching block, and taking the difference between the data of the best-matching 4 × 4 block of the reference frame and the data of the 4 × 4 current block to generate the data of an error block; and

Calculating the MVs of the four 4 × 4 current blocks, each based on the MVp of the first block, and encoding and transmitting the error blocks and MVs of the four 4 × 4 current blocks; a motion estimation method used in various multimedia codecs, characterized by comprising the above steps.

In an arithmetic circuit that performs the motion estimation operation used in an MPEG-2, MPEG-4, or H.264/AVC multimedia codec,

144 8-bit registers capable of storing four rows of pixel data of a 36 × 36 search area, a 4 × 4 block consisting of four rows;

A distributor that combines and outputs the data values from the registers storing the data of each row of the current block and the data of the search area;

A SAD operation unit that obtains the difference values between the input pixel data of the current block and the pixel data of the search area;

A switching block that outputs the SAD values so that they are stored sequentially;

A buffer that stores the SAD values received from the switching block;

A switching block that either feeds back the SAD value so that it is accumulated into the SAD value of a complete 4 × 4 block, or sequentially outputs the SAD value of the finally completed 4 × 4 block; and

A SAD comparison unit that sequentially compares the input SAD values of the 4 × 4 blocks and outputs the smallest SAD value and the position of the block having it; a motion estimation circuit used in various multimedia codecs, characterized by comprising the above.

The circuit of claim 4, wherein

The SAD operation unit, which computes the SAD of the data received from the distributor,

Comprises parallel adders for obtaining the SAD between the 4 × 1 pixels of the reference block and the 4 × 1 pixels of the current block, with pipeline registers installed between the parallel adders, the adders and the pipeline registers being connected by connection lines; a motion estimation circuit used in various multimedia codecs, characterized by the above.

The circuit of claim 4, wherein

The buffer, which stores the SAD value of each block,

Comprises one demultiplexer for storing the SAD value of each 4 × 1 block, 36 adders and 36 buffers connected in parallel below the demultiplexer, and connection lines connecting them; a motion estimation circuit used in various multimedia codecs, characterized by the above.