KR100301835B1

KR100301835B1 - Method for block matching motion estimation and apparatus for the same

Info

Publication number: KR100301835B1
Application number: KR1019980036345A
Authority: KR
Inventors: 김상연
Original assignee: 구자홍; 엘지전자 주식회사
Priority date: 1998-09-03
Filing date: 1998-09-03
Publication date: 2001-09-06
Also published as: KR20000018666A

Abstract

The invention relates to a motion estimation method and apparatus using a block matching algorithm. The method comprises a first step of calculating a measurement function for all candidate vectors in the search area using the most significant bit of the reference block of the current frame and of the candidate blocks in the search area of the previous frame; a second step of comparing the results of the measurement function and extracting the candidate blocks whose value is the minimum; and a third step of repeating the process from the first step, using the next most significant bit, only for the extracted candidate blocks. The process is repeated until only one candidate vector remains or the bit position used in the calculation falls below a preset threshold. This greatly reduces the gate count without increasing the motion estimation error. In addition, the lower bits need not be used in the block matching process, which has the effect of low-pass filtering the image data and therefore reduces the influence of noise.

Description

METHOD FOR BLOCK MATCHING MOTION ESTIMATION AND APPARATUS FOR THE SAME

The present invention relates to systems that require compressed video transmission, and more particularly to a motion estimation method and apparatus using a block matching algorithm.

In video compression and transmission systems, removing the temporal redundancy of image data with a block matching algorithm is already common and widely used. This is because data reduction by block matching is very effective and, compared with other motion estimation methods, is relatively easy to implement while also performing well.

The block matching algorithm finds the motion vector (v_x, v_y) that minimizes the measurement function given by Equation 1.

Here, B is the set of coordinates of the block data currently being estimated, and SA is the set of coordinates of the search area in the previous frame. r(x, y) is the reference block data at position (x, y), and s(x+v_x, y+v_y) is the candidate block data in the search area at the position displaced by (v_x, v_y) from the reference block position. Several architectures have been proposed for implementing this in VLSI (Very Large Scale Integrated circuit); one of them, a first-order (one-dimensional) systolic array, is shown in FIG. 1.
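Equation 1 itself is not reproduced in this text. Judging from the definitions of B, r and s above and from the subtract, absolute-value and accumulate datapath described for the processing element of FIG. 2, it is presumably the sum of absolute differences. The symbol D and the way the search-area constraint is written are assumptions, not taken from the source:

```latex
D(v_x, v_y) = \sum_{(x,y) \in B} \bigl| r(x,y) - s(x+v_x,\, y+v_y) \bigr|,
\qquad (x+v_x,\, y+v_y) \in SA
```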

FIG. 1 shows the case of a 4×4-pixel block, and the structure of each processing element (PE) is shown in FIG. 2. To process a 16×16-pixel macroblock, 16 such processing elements are connected in series.

Since each pixel is 8 bits wide, the 8-bit data of a pixel in the reference block whose motion is being estimated and the 8-bit data of the pixel at the same position in a candidate block in the search area are each held temporarily in a latch and then fed in parallel to the subtractor of the corresponding processing element.

That is, the subtractor 21 outputs the difference between the pixel value (8-bit data) of the reference block and the pixel value (8-bit data) at the same position in the candidate block. The absolute value unit 22 takes the absolute value of this difference and passes it to the adder 23. The adder 23 adds the partial sum output by the previous PE to the output of the absolute value unit 22; the result is held in the latch 24 and then output to the next PE as the new partial sum. The topmost PE has no previous PE, so its input partial sum is 0. This proceeds through the PEs in sequence. When the partial sum reaches the bottommost PE, that PE likewise adds the absolute difference of its two pixel values to the partial sum and outputs the result to the accumulator, which adds the incoming partial sum to its previously accumulated value.

This process is repeated until all pixel values of the reference block and all pixel values of the candidate block have been processed by the PEs and the result is output from the accumulator. That value is then checked to determine whether it is the smallest among the candidate blocks. If it is, this candidate block is judged to be the block most similar to the current reference block. Once the most similar candidate block has been identified, the difference signal between that candidate block and the reference block, together with the corresponding motion vector (v_x, v_y), is encoded and transmitted to the decoder.
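The conventional full search described above can be sketched in software as follows. This is a minimal illustration, not the systolic hardware: the function names, the list-of-rows frame layout and the symmetric `search_range` parameter are all assumptions made for the example.

```python
def sad(ref_block, cand_block):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(r - s)
               for row_r, row_s in zip(ref_block, cand_block)
               for r, s in zip(row_r, row_s))


def block_at(frame, top, left, size):
    """Cut a size x size block out of a frame stored as a list of rows."""
    return [row[left:left + size] for row in frame[top:top + size]]


def full_search(ref_block, prev_frame, top, left, search_range):
    """Try every displacement (vx, vy) in the search area; keep the smallest SAD."""
    size = len(ref_block)
    height, width = len(prev_frame), len(prev_frame[0])
    best = None
    for vy in range(-search_range, search_range + 1):
        for vx in range(-search_range, search_range + 1):
            y, x = top + vy, left + vx
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + size > height or x + size > width:
                continue
            cost = sad(ref_block, block_at(prev_frame, y, x, size))
            if best is None or cost < best[0]:
                best = (cost, (vx, vy))
    return best[1], best[0]  # (motion vector, minimum SAD)
```

Every candidate costs one 8-bit subtraction, absolute value and addition per pixel, which is the workload the remainder of the description sets out to reduce.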

However, motion estimation methods, including the block matching algorithm described above, generally demand a large amount of computation and high cost, and much research has been directed at this problem. One approach, rather than matching every block in the search area as in FIGS. 1 and 2, matches only a subset of the blocks and then searches step by step for the best block based on those results. However, this approach considerably increases the motion compensation error during image compression, which lowers compression efficiency, and it is poorly suited to implementation as a parallel systolic array in VLSI.

More recently, another approach has been studied, in which image data represented by 8-bit pixels is reduced to a binary image before processing. Since each pixel is 8 bits wide, its value lies between 0 and 255. A threshold is chosen, and each pixel is reduced to 1 if its value is greater than the threshold and to 0 if it is smaller. In this case the 8-bit subtraction and absolute value operation in the measurement function of Equation 1 reduce to a 1-bit exclusive-OR (XOR) operation.
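A minimal sketch of this binary-image approach is shown below. The function names are hypothetical, and mapping a pixel equal to the threshold to 0 is an assumption, since the text only covers the greater and smaller cases.

```python
def binarize(block, threshold):
    """Reduce 8-bit pixels to 1 bit: 1 if above the threshold, else 0."""
    return [[1 if p > threshold else 0 for p in row] for row in block]


def binary_mismatch(ref_bits, cand_bits):
    """Count differing pixels; XOR replaces subtraction and absolute value."""
    return sum(r ^ s
               for row_r, row_s in zip(ref_bits, cand_bits)
               for r, s in zip(row_r, row_s))
```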

However, because this method also reduces the 8-bit image data to a single bit, it is difficult to find the most similar block, so motion estimation performance suffers and compression efficiency drops. In addition, the threshold value is critical when reducing the image to binary form, yet determining it is very difficult and requires a large amount of computation.

The present invention has been made to solve the above problems. An object of the invention is to provide a block matching motion estimation method and apparatus that reduce the gate count without lowering image compression efficiency, by performing block matching motion estimation stepwise, one bit at a time.

Another object of the invention is to provide a block matching motion estimation method and apparatus that reduce motion estimation errors caused by noise.

FIG. 1 is a block diagram showing the first-order systolic array structure of a conventional block matching motion estimation apparatus.

FIG. 2 is a detailed block diagram of each processing element of FIG. 1.

FIG. 3 is a block diagram showing the first-order systolic array structure of a block matching motion estimation apparatus according to the present invention.

FIG. 4 is a detailed block diagram of each processing element of FIG. 3.

Explanation of reference numerals for the main parts of the drawings

41: exclusive-OR gate
42: adder
43: latch

To achieve the above objects, the block matching motion estimation method according to the present invention comprises a first step of calculating a measurement function for all candidate vectors in the search area using the most significant bit of the reference block of the current frame and of the candidate blocks in the search area of the previous frame; a second step of comparing the results of the measurement function and extracting the candidate blocks whose value is the minimum; and a third step of repeating the process from the first step, using the next most significant bit, only for the extracted candidate blocks. The process is repeated until only one candidate vector remains or the bit position in use falls below a preset threshold.

In the block matching motion estimation apparatus according to the present invention, a plurality of processing elements are connected in series; each processing element is connected to serial registers that receive, in parallel, the data for one pixel of the reference block of the current frame and of a candidate block in the search range of the previous frame, and output that data serially one bit at a time; and an accumulator is connected to the last processing element.

Each processing element comprises an exclusive-OR gate that receives one bit each from the pixel values at the same position in the reference block of the current frame and in the candidate block in the search area of the previous frame and computes their exclusive OR; an adder that adds the output of the exclusive-OR gate to the output of the previous processing element; and a latch that temporarily holds the output of the adder and then outputs it to the next processing element.

Other objects, features and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The present invention exploits the fact that the effect of a bit on the result of the measurement function depends on its position in the image data: the higher the bit, the greater its influence on the measurement function. Table 1 shows the absolute difference values that can result per pixel when only the upper 3 bits are compared.

Table 1. Range of the measured value per pixel

q₇ q₆ q₅    Range of |r - s|
0  0  0     0 to 31
0  0  1     32 to 63
0  1  0     64 to 95
0  1  1     96 to 127
1  0  0     128 to 159
1  0  1     160 to 191
1  1  0     192 to 223
1  1  1     224 to 255

Here, q_i = r_i ⊕ s_i indicates whether the i-th bit of the reference block data and the i-th bit of the candidate block data of the previous frame are identical: 0 means they are the same and 1 means they differ. r_i is the i-th bit of the reference block data, and s_i is the i-th bit of the candidate block data in the search area. For 8-bit image data, i = 7 denotes the most significant bit (MSB) and i = 0 the least significant bit (LSB).
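The per-bit comparison q_i can be written directly with a shift and an XOR. The helper names `bit` and `q` below are chosen for illustration only.

```python
def bit(value, i):
    """Return bit i of an 8-bit pixel value (i = 7 is the MSB, i = 0 the LSB)."""
    return (value >> i) & 1


def q(r, s, i):
    """q_i = r_i XOR s_i: 0 if bit i of r and s match, 1 if they differ."""
    return bit(r, i) ^ bit(s, i)
```

For example, with r = 0b10110000 and s = 0b10010011 the two pixels agree in bits 7 and 6 and differ in bit 5, so `q(r, s, 7)` and `q(r, s, 6)` are 0 while `q(r, s, 5)` is 1.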

Table 1 shows that when the upper bits differ, the measurement value (the range of |r - s|) is larger than when the upper bits are the same, regardless of the lower bits. For example, when the upper three bits (q₇, q₆, q₅) all match (000), the measurement value lies between 0 and 31, whereas when only the most significant bit (q₇) matches and the next two bits (q₆, q₅) differ (011), the measurement value lies between 96 and 127. The larger the measurement value, the less likely the candidate is to be the similar block. The present invention makes full use of this fact: a stepwise, bit-by-bit motion estimation method reduces hardware complexity while limiting the resulting loss of performance.

FIG. 3 is a block diagram showing the first-order systolic array structure of the block matching motion estimation apparatus according to the present invention. Only one bit is used at each stage of the block matching process, and the figure shows the case of a 4×4-pixel block.

That is, a plurality of processing elements, for example four, are connected in series, and each processing element is connected to serial registers that receive in parallel the 8-bit data for one pixel of the reference block and of a candidate block in the search range of the previous frame and output it serially one bit at a time. The partial sum calculated in one processing element is passed to the next, and this continues in sequence up to the last processing element. An accumulator is connected to the last processing element and adds the partial sum it receives to its previous value, macroblock by macroblock. The first processing element has no previous PE, so 0 is input as its partial sum.

Since motion estimation is performed in units of macroblocks, 16 processing elements are needed in practice.

In the arrangement of FIG. 3, each processing element receives the data for one pixel bit by bit. That is, the pixel values at the same position in the reference block of the current frame and in the candidate block in the search area of the previous frame are fed to the processing element through their respective serial registers, starting with the most significant bit.

As shown in FIG. 4, each processing element consists of an exclusive-OR gate 41 that computes the exclusive OR of the reference block data and the data of the candidate block in the search area of the previous frame, an adder 42 that adds the output of the exclusive-OR gate 41 to the partial sum received from the previous PE, and a latch 43 that temporarily holds the output of the adder 42 and then outputs it to the next PE.
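A behavioural sketch of this processing element and of the serial chain is given below. It models only the arithmetic, not the clocked timing of the latches; the function names and the assumption that each pass through the chain handles one line of the block are illustrative and not taken from the figures.

```python
def processing_element(r_bit, s_bit, partial_in):
    """One PE: XOR gate (41) plus adder (42); the return value models latch (43)."""
    return partial_in + (r_bit ^ s_bit)


def chain_sum(ref_bits, cand_bits):
    """Pass a partial sum through serially connected PEs, one bit pair per PE."""
    partial = 0  # the first PE has no predecessor, so its input partial sum is 0
    for r_bit, s_bit in zip(ref_bits, cand_bits):
        partial = processing_element(r_bit, s_bit, partial)
    return partial


def accumulate(ref_plane, cand_plane):
    """Accumulator: add the chain output for every line of the block."""
    total = 0
    for ref_line, cand_line in zip(ref_plane, cand_plane):
        total += chain_sum(ref_line, cand_line)
    return total
```

Because each PE contributes at most 1, a chain of 16 PEs never produces a partial sum above 16, which fits in 5 bits.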

The processing elements thus use the most significant bit (MSB) of the reference block of the current frame and of the candidate blocks in the search area of the previous frame to calculate, for all candidate vectors in the search area, the measurement function given by Equation 2.

Here, r_i(x, y) is the i-th bit of the reference block data at position (x, y), and s_i(x+v_x, y+v_y) is the i-th bit of the candidate block data at the position displaced by (v_x, v_y) from the reference block position. In the very first stage the process starts from the most significant bit (i = 7).
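Equation 2 is likewise missing from this text. Given the definitions of r_i and s_i above and the exclusive-OR and adder structure of the processing element, it presumably sums the per-bit mismatches over the block. The symbol D_i is an assumption:

```latex
D_i(v_x, v_y) = \sum_{(x,y) \in B} r_i(x,y) \oplus s_i(x+v_x,\, y+v_y),
\qquad (x+v_x,\, y+v_y) \in SA
```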

Next, the results of the measurement function are used to extract the candidate blocks, that is, the candidate vectors, whose value is the minimum. More than one may be extracted because the measurement function is calculated from a single bit, so several candidate vectors can have the same measurement value. A candidate vector is the difference in coordinates between a candidate block and the reference block being estimated.

The steps above are then repeated for the extracted candidate vectors using the next most significant bit, i = i - 1. This is repeated until only one candidate vector remains or i falls below its minimum value.

The minimum value of i can be set arbitrarily to any value greater than or equal to 0. The candidate vectors may also be narrowed down to one before the least significant bit is reached.
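The stepwise procedure can be sketched as follows. This is an illustrative software model under stated assumptions: candidates are passed in as a dictionary from vector (vx, vy) to block, `min_bit` stands for the minimum value of i, and the function returns every surviving vector because the text does not say how ties that remain at the end are resolved.

```python
def bit_plane_cost(ref_block, cand_block, i):
    """Per-bit measurement: number of pixels whose bit i differs."""
    return sum(((r >> i) & 1) ^ ((s >> i) & 1)
               for row_r, row_s in zip(ref_block, cand_block)
               for r, s in zip(row_r, row_s))


def bitwise_search(ref_block, candidates, min_bit=0, msb=7):
    """Narrow the candidate vectors one bit plane at a time, MSB first."""
    survivors = list(candidates)
    i = msb
    # Stop when one vector remains or i drops below its minimum value.
    while len(survivors) > 1 and i >= min_bit:
        costs = {v: bit_plane_cost(ref_block, candidates[v], i)
                 for v in survivors}
        best = min(costs.values())
        survivors = [v for v in survivors if costs[v] == best]
        i -= 1  # move on to the next most significant bit
    return survivors
```

With `min_bit` greater than 0 the lowest bit planes are never examined, so two candidates that differ only in those bits both survive.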

The lower bits therefore need not be used in the block matching process. This has the effect of low-pass filtering the image data and so reduces the influence of noise: because noise usually affects the lower bits most, leaving those bits out reduces its influence correspondingly.

In the present invention, each processing element thus replaces the conventional 8-bit adder and absolute value operation with a 1-bit exclusive-OR operation. This greatly reduces hardware complexity, and the gate count of a VLSI implementation drops considerably as a result.

Consider the number of gates needed per processing element. The conventional method requires two adders (8-bit and 12-bit), a 12-bit latch, and 12 bits of inversion logic for the absolute value operation. The present invention, by contrast, is implemented with one exclusive-OR gate, one 5-bit adder and one 5-bit latch, which saves a large number of gates. An adder requires two exclusive-OR gates, two AND gates and one OR gate per bit, and a latch requires six NAND gates per bit, so the gate counts of the conventional method and of the present invention are as shown in Table 2.

Table 2. Comparison of the number of gates required per processing element

Gate type            Conventional method    Present invention
Exclusive-OR gate    40                     11
AND gate             40                     10
OR gate              20                     5
NAND gate            72                     30
Inverter             12                     -
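The figures in Table 2 follow from the per-bit costs stated above (two XOR, two AND and one OR gate per adder bit, six NAND gates per latch bit). The short calculation below reproduces them; the dictionary keys and function name are chosen for illustration.

```python
from collections import Counter

ADDER_GATES_PER_BIT = {"XOR": 2, "AND": 2, "OR": 1}
LATCH_GATES_PER_BIT = {"NAND": 6}


def gate_count(adder_bits, latch_bits, extra):
    """Total gates for the given adder width, latch width and extra gates."""
    counts = Counter()
    for gate, n in ADDER_GATES_PER_BIT.items():
        counts[gate] += n * adder_bits
    for gate, n in LATCH_GATES_PER_BIT.items():
        counts[gate] += n * latch_bits
    counts.update(extra)  # adds the extra gates to the running totals
    return dict(counts)


# Conventional PE: 8-bit and 12-bit adders, 12-bit latch, 12 inverters.
conventional = gate_count(adder_bits=8 + 12, latch_bits=12, extra={"INV": 12})
# Proposed PE: one XOR gate, 5-bit adder, 5-bit latch.
proposed = gate_count(adder_bits=5, latch_bits=5, extra={"XOR": 1})
```

Summing the columns gives 184 gates per conventional processing element against 56 for the proposed one.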

Moreover, motion is estimated sequentially from the most significant bit of the image data toward the lower bits, and estimation stops as soon as further estimation is unnecessary, that is, when a single candidate vector remains or the preset minimum value of i is reached. This limits the loss of compression efficiency caused by estimation errors.

As described above, the block matching motion estimation method and apparatus according to the present invention greatly reduce the gate count without increasing the motion estimation error, and therefore lower the cost of implementing a motion estimator in systems that require compressed video transmission, such as digital TVs, camcorders, DVD systems and video conferencing systems. In addition, the lower bits need not be used in the block matching process, which has the effect of low-pass filtering the image data and therefore reduces the influence of noise.

Claims

1. A block matching motion estimation method comprising:

a first step of calculating a measurement function for all candidate vectors in the search area using the most significant bit of the reference block of the current frame and of the candidate blocks in the search area of the previous frame;

a second step of comparing the results of the measurement function and extracting the candidate blocks whose value is the minimum; and

a third step of repeating the process from the first step, using the next most significant bit, only for the extracted candidate blocks, the process being repeated until the number of candidate vectors becomes one.

2. The method of claim 1, wherein the measurement function of the first step is obtained by applying the following equation.

Here, B is the set of coordinates of the block data currently being estimated, SA is the set of coordinates of the search area in the previous frame, and r_i(x, y) and s_i(x+v_x, y+v_y) are, respectively, the i-th bit of the reference block data at position (x, y) and the i-th bit of the candidate block data at the position displaced by (v_x, v_y) from the reference block position.

3. The method of claim 1, wherein the third step

repeats the process from the first step, using the next most significant bit, only for the extracted candidate blocks, until the bit position in use becomes smaller than a preset threshold.

4. The method of claim 3, wherein the threshold

is set to a value greater than or equal to zero.

5. A block matching motion estimation apparatus comprising:

a plurality of processing elements connected in series;

serial registers connected to each of the plurality of processing elements, which receive in parallel the data corresponding to one pixel of the reference block of the current frame and of a candidate block in the search range of the previous frame and output it serially one bit at a time; and

an accumulator connected to the last of the processing elements,

wherein each of the plurality of processing elements comprises an operation unit that performs an exclusive-OR operation on one bit each of the pixel values at the same position in the reference block of the current frame and in the candidate block in the search area of the previous frame;

an adder that adds the output of the operation unit to the output of the previous processing element; and

a latch that temporarily holds the output of the adder and then outputs it to the next processing element.