KR19980053017A

KR19980053017A - Method of adding absolute error value of motion estimator and adder structure

Info

Publication number: KR19980053017A
Application number: KR1019960072053A
Authority: KR
Inventors: 이수정
Original assignee: 배순훈; 대우전자 주식회사
Priority date: 1996-12-26
Filing date: 1996-12-26
Publication date: 1998-09-25
Also published as: KR100236033B1

Abstract

The present invention relates to an adder for a motion estimator, and more particularly to an addition method and an adder structure that simplify the interconnection of a pipelined adder built from Wallace trees. In the method of the invention, a matrix is added with 4-input, 2-output Wallace trees so that the matrix size is halved at each stage; the input data of each stage are divided into groups, and all intermediate result data are processed over the total number of cycles. The structure of the invention comprises a first Wallace adder unit (40), a second Wallace adder unit (41), a third Wallace adder unit (42), a fourth Wallace adder unit (43), a fifth Wallace adder unit (44), a sixth Wallace adder unit (45), a seventh Wallace adder unit (46), and a merge adder unit (47) that outputs the absolute error value (MAE). The outputs of two neighboring adders in the Wallace adder unit of one stage are fed to a single adder in the Wallace adder unit of the next stage, which simplifies the interconnection between modules and reduces the amount of hardware (the number of Wallace trees), thereby reducing chip area and cost in a VLSI implementation.

Description

Method of adding absolute error value of motion estimator and adder structure

The present invention relates to an adder for a motion estimator, and more particularly to an absolute error value addition method and an adder structure that simplify the interconnection between the modules of a pipelined adder built from Wallace trees and reduce the amount of hardware.

In general, a two-dimensional moving picture carries a large amount of information, so transmitting it requires considerable bandwidth. Because this information is redundant, it can be compressed to address the problem. Redundancy along the time axis can be removed through motion compensation, and the key component for this is the motion vector estimator.

Motion vector extraction algorithms include the pel recursive algorithm (PRA) and the block matching algorithm.

The block matching algorithm (BMA) divides the current frame into blocks of fixed size (hereinafter, reference blocks) and assumes that each reference block is displaced independently, block by block, within a defined region of the previous frame.

Assuming that motion on the screen is a horizontal or vertical translation, the method estimates which block of the frame before the motion (the previous frame) best matches a block of the frame after the motion (the current frame), and derives the motion vector from that position. Block sizes of 8×8 and 16×16 (horizontal pixels × vertical pixels) are most commonly used.

To find the block in the previous frame that is most similar to the reference block of the current frame, a fixed range around the position of the reference block is searched in the previous frame. This range is called the search window. The difference between the reference block and each candidate block within the search window is called the distortion, and it indicates the degree of similarity between the two blocks.

In video signal processing, then, motion estimation means estimating motion vectors, which express as vectors how far the pixels of the current frame have moved relative to the previous frame in a sequence of images. Transmitting these motion vectors instead of the entire image compresses the transmitted information (that is, image compression).

Among block matching algorithms, one that is computationally heavy but finds motion vectors relatively accurately has become practical with recent advances in integration technology. It compares the reference block with every candidate block in the search window and is called the full search block matching algorithm.

The general expression for extracting a motion vector with the full search block matching algorithm is given by the following equation.

[Equation 1]

In the above equation, one term denotes each pixel value at coordinates (i, j) within the reference block of the current frame, and the other denotes each pixel value at the corresponding position in the previous frame.

The absolute difference (distortion) is taken between these two pixel values. The partial sum accumulates the absolute pixel differences of the search block with a given motion vector, up to row k of column j.

The total sum is the accumulation of all absolute differences between the reference block and the search block with that motion vector.

Among all motion vector coordinates within the range -p to +p, the coordinate that gives the smallest total is chosen as the motion vector of the reference block.

Here, the total divided by the block size is called the mean absolute error (MAE).
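
Equation 1 itself did not survive in this text. One plausible form that is consistent with the definitions above is sketched below. The symbols a, b, D, N, u, v, and the index ranges are assumptions introduced here for illustration; they are not taken from the original.

```latex
% Assumed notation (not from the original):
%   a(i,j)       pixel of the reference block in the current frame
%   b(i+u, j+v)  pixel of the candidate block in the previous frame
%   N            block width and height, p = search range
\begin{align}
D(u,v) &= \sum_{i=1}^{N}\sum_{j=1}^{N} \bigl| a(i,j) - b(i+u,\, j+v) \bigr|,
          \qquad -p \le u, v \le p \\
(u^{*}, v^{*}) &= \arg\min_{(u,v)} D(u,v) \\
\mathrm{MAE}(u,v) &= \frac{D(u,v)}{N^{2}}
\end{align}
```

Under this reading, the adder described in the rest of the document computes the double sum D(u, v) for one candidate block.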

As the above equation shows, the basic operations required for motion estimation are a subtraction step, which takes the difference between the reference block and a candidate block, and an addition step, which sums all the resulting difference values within the macroblock to obtain the absolute error value.
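
The subtraction and addition steps can be illustrated in software. The following is a minimal sketch of full search block matching, assuming frames stored as lists of rows; the function names `sad` and `full_search` and the choice to skip candidates outside the frame are assumptions made here, not details from the original.

```python
def sad(ref, prev, top, left, n):
    """Sum of absolute differences between the n x n reference block and
    the candidate block of `prev` whose top-left corner is (top, left)."""
    return sum(
        abs(ref[i][j] - prev[top + i][left + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(ref, prev, top, left, n, p):
    """Compare the reference block with every candidate displaced by
    (dy, dx), -p <= dy, dx <= p, and return (best_vector, best_sad).
    Candidates that fall outside `prev` are skipped."""
    height, width = len(prev), len(prev[0])
    best = None
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + n > height or x + n > width:
                continue
            cost = sad(ref, prev, y, x, n)
            if best is None or cost < best[1]:
                best = ((dy, dx), cost)
    return best


# Usage: a 2 x 2 block copied from position (3, 2) of an 8 x 8 frame,
# searched around position (2, 2) with a range of +/-2.
prev = [[(7 * r + 3 * c) % 23 for c in range(8)] for r in range(8)]
ref = [row[2:4] for row in prev[3:5]]
vector, cost = full_search(ref, prev, top=2, left=2, n=2, p=2)
```

The inner `sad` call is the part that the hardware adder accelerates: for a 16×16 block it sums 256 absolute differences per candidate.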

In particular, the adder that sums all the difference values must operate at high speed, so it can be built as a pipelined Wallace tree.

Here, a Wallace tree is a multi-operand addition circuit that adds a number of partial products and reduces them to two output lines (carry and sum). It is the largest and most critical of the internal circuits of a parallel multiplier, and it was proposed by C. S. Wallace in 1964.
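
The reduction of several operands to a sum word and a carry word can be modelled on integers. The sketch below is an illustration under assumptions: it builds a 4-input, 2-output reduction from two full-adder (carry-save) steps, which is one common construction; the exact wiring of the circuit in the original figures is not reproduced, and the names `full_adder_3to2` and `w42` are hypothetical.

```python
def full_adder_3to2(a, b, c):
    """Bitwise full-adder step on non-negative integers: returns
    (sum_word, carry_word) with sum_word + carry_word == a + b + c."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def w42(a, b, c, d):
    """4-input, 2-output reduction built from two full-adder steps.
    No carry propagates along the word; that is left to a final adder."""
    s1, c1 = full_adder_3to2(a, b, c)
    return full_adder_3to2(s1, c1, d)


# Usage: four absolute differences reduced to two summands.
s, c = w42(200, 17, 255, 3)
```

Only the last adder in the chain has to resolve carries, which is why the two remaining words are handed to a conventional adder at the end.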

FIG. 1 is a block diagram of a conventional pipelined adder structure using Wallace trees in a motion estimator, and FIG. 2 is a conceptual diagram of the operation of adding the difference values of a macroblock according to the conventional structure of FIG. 1.

The structure shown in FIG. 1 applies to a macroblock size (horizontal data count × vertical data count) of 16×16.

Referring to FIG. 1, the conventional adder with a seven-stage pipeline structure consists of a first Wallace adder unit (100), a second Wallace adder unit (110), a third Wallace adder unit (120), a fourth Wallace adder unit (130), a fifth Wallace adder unit (140), a sixth Wallace adder unit (150), and a merge adder (160).

The first to sixth Wallace adder units (100 to 150) each consist of a number of 4-input, 2-output Wallace trees (W42), and the merge adder (160) is an adder that sums all four summands output from the sixth Wallace adder unit (150).

The operation of the adder of FIG. 1 is described with reference to FIG. 2. In the first step (step 1), the 16 pixels of column 1 of the 16×16 matrix of difference values between the two blocks are added by set 1 (100-1) of the first Wallace adder unit (100) and output as 8 data; the 16 pixels of column 2 are added by set 2 (100-2) of the first Wallace adder unit (100) and output as 8 data; and the 16 pixels of each remaining column (columns 3 to 16) are likewise reduced to 8 data.

After the first step has been performed by the first Wallace adder unit (100), arranging the 8 data output from each set of the adder unit (100) in columns gives a 16×8 matrix.

In the second step (step 2), the 16 data of row 1 of the 16×8 matrix are added by set 1 (110-1) of the second Wallace adder unit (110) and output as 8 data; the 16 data of row 2 are added by set 2 (110-2) of the second Wallace adder unit (110) and output as 8 data; and the 16 data of each remaining row (rows 3 to 8) are likewise reduced to 8 data.

After the second step has been performed by the second Wallace adder unit (110), arranging the 8 data output from each set of the adder unit (110) by rows gives an 8×8 matrix.

Likewise, the third step, performed by the third Wallace adder unit (120), yields an 8×4 data matrix; the fourth step, performed by the fourth Wallace adder unit (130), yields an 8×2 data matrix; the fifth step, performed by the fifth Wallace adder unit (140), yields a 4×2 data matrix; and the sixth step, performed by the sixth Wallace adder unit (150), yields a 4×1 data matrix.

Finally, in the seventh step (step 7), the four data of the 4×1 matrix are summed by the merge adder (160) to produce a single value. This value is the final absolute error value (MAE) of the candidate block.
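
The matrix sizes quoted for the conventional structure can be cross-checked with a little arithmetic. The sketch below assumes that each 4-input, 2-output tree handles exactly four data once per block, so the unit counts it prints are derived figures rather than numbers stated in the original; the function name `conventional_stage_plan` is hypothetical.

```python
def conventional_stage_plan(n_inputs=256, merge_inputs=4):
    """List (stage, inputs, w42_units, outputs) for a fully parallel
    4-to-2 reduction that stops when `merge_inputs` summands remain."""
    plan = []
    stage = 1
    while n_inputs > merge_inputs:
        units = n_inputs // 4      # each tree consumes four data
        outputs = units * 2        # and produces two
        plan.append((stage, n_inputs, units, outputs))
        n_inputs = outputs
        stage += 1
    return plan


plan = conventional_stage_plan()
total_units = sum(units for _, _, units, _ in plan)
for stage, n_in, units, n_out in plan:
    print(f"step {stage}: {n_in:3d} data -> {n_out:3d} data using {units:2d} trees")
```

The output counts 128, 64, 32, 16, 8, and 4 match the 16×8, 8×8, 8×4, 8×2, 4×2, and 4×1 matrices of steps 1 to 6, leaving four summands for the merge adder.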

However, the conventional pipelined adder has the problem that the connections (interconnection) between the output lines and input lines of adjacent modules are complex. In addition, when it is implemented in VLSI, this complex interconnection takes up a considerable amount of area.

The present invention was devised to solve these problems of the prior art. Its object is to provide an absolute error value addition method and an adder structure for a motion estimator that reduce the amount of hardware, that is, the number of Wallace trees, and simplify the pipeline interconnection, thereby reducing chip area.

To achieve the above object, the method of the present invention comprises: a first step (step 1) of adding the pixel values of a matrix (horizontal pixels × vertical pixels) to produce a reduced matrix; a second step (step 2) of adding the matrix produced in the first step to produce a further matrix; a third step (step 3) of adding the matrix produced in the second step to produce a further matrix; a fourth step (step 4) of adding the matrix produced in the third step to produce a further matrix; a fifth step (step 5) of adding the matrix produced in the fourth step to produce a further matrix; a sixth step (step 6) of adding the matrix produced in the fifth step to produce a further matrix; a seventh step (step 7) of adding the matrix of the sixth step to produce a further matrix; and an eighth step (step 8) of adding the matrix of the seventh step to obtain the absolute error value (MAE).

To achieve the above object, the adder structure of the present invention comprises: a first Wallace adder unit (40) that adds difference values and outputs data; a second Wallace adder unit (41) that adds the data output from the first Wallace adder unit (40) and outputs data; a third Wallace adder unit (42) that adds the data output from the second Wallace adder unit (41) and outputs data; a fourth Wallace adder unit (43) that adds the data output from the third Wallace adder unit (42) and outputs data; a fifth Wallace adder unit (44) that adds the data output from the fourth Wallace adder unit (43) and outputs data; a sixth Wallace adder unit (45) that adds the data output from the fifth Wallace adder unit (44) and outputs data; a seventh Wallace adder unit (46) that adds the data output from the sixth Wallace adder unit (45) and outputs data; and a merge adder unit (47) that adds the data output from the seventh Wallace adder unit (46) and outputs the absolute error value (MAE).

FIG. 1 is a block diagram of a conventional adder structure using Wallace trees in a motion estimator;

FIG. 2 is a conceptual diagram of the operation of adding the difference values of a macroblock according to the conventional structure of FIG. 1;

FIG. 3 is a basic structural diagram of the pipeline processor applied to the present invention;

FIG. 4 is a block diagram of the adder structure according to FIG. 3 using Wallace trees in a motion estimator;

FIG. 5 is a conceptual diagram of the operation of adding the difference values of a macroblock according to FIG. 4;

FIG. 6 is a bit map illustrating the Wallace tree operation, used in the present invention, that reduces four summands to two summands; and

FIG. 7 is a detailed diagram of the 4-input, 2-output adder (Wallace tree) that implements FIG. 6 with full adders.

* Description of reference numerals for the main parts of the drawings *

40: first Wallace adder unit    41: second Wallace adder unit

42: third Wallace adder unit    43: fourth Wallace adder unit

44: fifth Wallace adder unit    45: sixth Wallace adder unit

46: seventh Wallace adder unit  47: merge adder unit

W₄₂: 4-input, 2-output adder (Wallace tree)

First, in a pipeline structure each stage generally consists of combinational circuits, and the structure is useful when logic or arithmetic operations are to be performed repeatedly on data passed along the pipe. The interface latches that pass on the result of each stage must operate in synchronization with the same clock in order to prevent bottlenecks.

Accordingly, the pipeline clock period is determined by the stage with the longest delay time, as given by Equation 2.

[Equation 2]

In the above equation, one term is the delay time of each stage and the other is the delay time of each interface latch. The sum of the longest stage delay and the interface latch delay is taken as the clock period of the pipeline, and every interface latch is synchronized to that clock period.
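
Equation 2 is missing from this text. A form consistent with the description above is sketched below; the symbols are assumptions chosen here for illustration and are not taken from the original.

```latex
% Assumed notation (not from the original):
%   \tau    pipeline clock period      \tau_i  delay of stage S_i
%   \tau_L  delay of an interface latch, k = number of stages
\begin{equation}
\tau = \max_{1 \le i \le k} \{ \tau_i \} + \tau_L , \qquad f = \frac{1}{\tau}
\end{equation}
```

For example, with stage delays of 4, 6, and 5 time units and a latch delay of 1 unit, the clock period would be 7 units under this reading.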

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings.

The embodiment described here corresponds to a macroblock size of 16×16 (horizontal pixels × vertical pixels).

FIG. 3 is a basic structural diagram of the pipeline processor applied to the present invention, FIG. 4 is a block diagram of the adder structure according to FIG. 3 using Wallace trees in a motion estimator, and FIG. 5 is a conceptual diagram of the operation of adding the difference values of a macroblock according to FIG. 4.

Referring to FIG. 3, the pipeline structure of the present invention consists of eight processing stages (S1 to S8) and latches (LATCH) that temporarily store the intermediate results between stages.

Here, the first to sixth latches (LATCH1 to LATCH6) operate in synchronization with the pipeline clock period given by the above equation (hereinafter, the first clock), and the seventh to ninth latches (LATCH7 to LATCH9) operate in synchronization with a clock whose period is four times that clock period (hereinafter, the second clock).

FIG. 4 is a detailed block diagram of FIG. 3; each stage of FIG. 3 consists of a number of adders, including Wallace adders, and registers.

As shown in FIG. 4, the pipelined adder structure consists of a first Wallace adder unit (40), a second Wallace adder unit (41), a third Wallace adder unit (42), a fourth Wallace adder unit (43), a fifth Wallace adder unit (44), a sixth Wallace adder unit (45), a seventh Wallace adder unit (46), a merge adder unit (47), first to ninth latches (L1 to L9), and a counter (not shown).

The first Wallace adder unit (40) consists of sixteen 4-input, 2-output adders of Wallace tree structure, each of which receives four data and outputs two data: a first adder (40-1), which receives four of the data of vertical column 1 in each cycle and so processes the 16 data of that column over four cycles, and second to sixteenth adders (40-2 to 40-16), which likewise receive four data at a time and each process one of the remaining vertical columns.

The second Wallace adder unit (41) consists of eight 4-input, 2-output adders: a first adder (41-1), which receives the outputs of the first and second adders (40-1, 40-2) of the first Wallace adder unit (40); a second adder (41-2), which receives the outputs of the third and fourth adders (40-3, 40-4) of the first Wallace adder unit (40); and third to eighth adders (41-3 to 41-8), which receive the remaining outputs of the first Wallace adder unit.

The third Wallace adder unit (42) consists of four 4-input, 2-output adders: a first adder (42-1), which receives the outputs of the first and second adders (41-1, 41-2) of the second Wallace adder unit (41); a second adder (42-2), which receives the outputs of the third and fourth adders (41-3, 41-4); and third and fourth adders (42-3, 42-4), which receive the outputs of the remaining adders (41-5 to 41-8) of the second Wallace adder unit.

The fourth Wallace adder unit (43) consists of two 4-input, 2-output adders: a first adder (43-1), which receives the outputs of the first and second adders (42-1, 42-2) of the third Wallace adder unit (42), and a second adder (43-2), which receives the outputs of the remaining adders (42-3, 42-4) of the third Wallace adder unit.

The fifth Wallace adder unit (44) consists of one 4-input, 2-output adder (44), which receives the outputs of the first and second adders (43-1, 43-2) of the fourth Wallace adder unit (43).

The sixth Wallace adder unit (45) consists of two 4-input, 2-output adders and eight shift registers: a shift register (45-1), which receives the two outputs of the fifth Wallace adder unit (44) over four clocks and shifts them in so that eight data are latched and stored; a first adder (45-2), which receives and adds four of the data stored in the shift register (45-1); and a second adder (45-3), which receives and adds the other four data stored in the shift register.

The seventh Wallace adder unit (46) consists of one 4-input, 2-output adder (46), which receives the outputs of the first and second adders (45-2, 45-3) of the sixth Wallace adder unit (45) and outputs two data.

The merge adder unit (47) consists of a merge adder that sums the two data output from the seventh Wallace adder unit (46) to obtain the final absolute error value (MAE). The merge adder may be implemented with a carry select adder (CSA), a carry lookahead adder (CLA), or the like.

The first to sixth latches (L1 to L6) must latch the data output every cycle from the first to fifth Wallace adder units (40 to 44), so they are synchronized to the first clock (clock frequency f); the shift register also operates in synchronization with the first clock.

Because the addition that follows the shift register of the sixth Wallace adder unit (45) is performed only after the shift register has been filled with four valid pairs (eight data), the seventh to ninth latches (L7 to L9) are synchronized to the second clock, whose period is four times that of the first clock.

The second clock (COUNT_LATCH) is generated on every fourth clock by a counter that counts the cycles of the first clock (CLOCK_LATCH).
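
The relationship between the two clocks can be modelled with a simple modulo-4 counter. This is a behavioural sketch only; the class name `DivideByFourCounter` and its interface are assumptions made here, and the original does not describe the counter circuit itself.

```python
class DivideByFourCounter:
    """Counts first-clock cycles and signals every fourth one."""

    def __init__(self, period=4):
        self.period = period
        self.count = 0

    def tick(self):
        """Advance one first-clock cycle; return True when the second
        clock should fire (on cycles 4, 8, 12, ...)."""
        self.count += 1
        if self.count == self.period:
            self.count = 0
            return True
        return False


# Usage: list the first-clock cycles on which the second clock fires.
counter = DivideByFourCounter()
pulses = [cycle for cycle in range(1, 13) if counter.tick()]
```

Latches driven by the second clock therefore capture data once per group of four first-clock cycles, which is how long the shift register needs to fill.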

Next, the operation of the pipelined adder configured as in FIG. 4 is described in detail with reference to FIG. 5.

The 16×16 matrix shown in step 1 of FIG. 5 is the input data of the first Wallace adder unit (40).

During one clock cycle, four data of vertical column 1 are reduced to two data by the first adder, four data of column 2 are reduced to two data by the second adder, and four data of each remaining column are likewise reduced to two data.

That is, the sixteen 4-input, 2-output adders of the first Wallace adder unit receive group A1 (4×16 data) during the first cycle, group A2 during the second cycle, group A3 during the third cycle, and group A4 during the fourth cycle. All input data are thus processed over a total of four cycles, and arranging the intermediate result data gives the 16×8 matrix of step 2.

The 16×8 matrix shown in step 2 is the input data of the second Wallace adder unit (41). During one cycle, the four data output from the first and second adders (40-1, 40-2) of the first Wallace adder unit (40) are input to the first adder (41-1) of the second Wallace adder unit (41), and the remaining outputs of the first Wallace adder unit (40) are likewise input to the other seven adders (41-2 to 41-8) of the second Wallace adder unit (41).

That is, during the first cycle the data of group B1 (the 2×16 intermediate result data obtained from group A1) are added four at a time by the adders and output as two data each. Group B2 (the intermediate results of group A2) is processed during the second cycle, group B3 (the intermediate results of group A3) during the third cycle, and group B4 (the intermediate results of group A4) during the fourth cycle. All the intermediate result data (16×8) are thus processed by the second Wallace adder unit (41) over a total of four cycles, and arranging the results gives the 8×8 matrix of step 3.

Steps 3, 4, and 5 likewise process all the intermediate result data of the preceding step in four groups over four cycles, as in steps 1 and 2.

In other words, over four cycles the 8×8 matrix shown in step 3 is output as a 4×8 matrix by the third Wallace adder unit (42), the 4×8 matrix shown in step 4 is output as a 2×8 matrix by the fourth Wallace adder unit (43), and the 2×8 matrix shown in step 5 is output as a 2×4 matrix by the fifth Wallace adder unit (44).

In the operations of steps 1 to 5 described so far, all the data to be processed in each step are divided into four groups, and one group is processed per cycle so that everything is processed in four cycles. As a result, the hardware required in each module is reduced to about one quarter of that of the conventional pipeline shown in FIG. 1.
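
The one-quarter figure can be checked by counting 4-input, 2-output adders per stage. In the sketch below, the time-shared counts (16, 8, 4, 2, 1) are the ones given in the description of FIG. 4, while the fully parallel counts are derived here under the assumption that every adder handles four data once per block; the function names are hypothetical.

```python
GROUPS = 4  # each stage input is split into four groups, one per cycle


def units_fully_parallel(n_inputs):
    """Adders needed to reduce n_inputs data in a single cycle."""
    return n_inputs // 4


def units_time_shared(n_inputs, groups=GROUPS):
    """Adders needed when the same data are fed in over `groups` cycles."""
    return n_inputs // (4 * groups)


stage_inputs = [256, 128, 64, 32, 16]  # data entering steps 1 to 5
parallel = [units_fully_parallel(n) for n in stage_inputs]
shared = [units_time_shared(n) for n in stage_inputs]
ratios = [s / p for s, p in zip(shared, parallel)]
```

The price of the saving is throughput: each stage now needs four clock cycles per block instead of one.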

The processing of the data output after step 5 is as follows.

The 2×4 matrix shown in step 6 is the input data of the sixth Wallace adder unit (45); it corresponds to the two data output from the fifth Wallace adder unit (44) on each first clock (CLOCK_LATCH) and latched into the shift register (45-1).

That is, it corresponds to the arrangement in which the data output from the fifth Wallace adder unit (44) over four cycles are stored in the shift register. When the shift register (45-1) has been filled with eight valid data, four data are input to the first adder (45-2) and the remaining four to the second adder (45-3), and a 2×2 matrix is output.

Here, the seventh latch (L7), which latches the output of the sixth Wallace adder unit (45), must be synchronized to the second clock (COUNT_LATCH), which a counter generates so that the operation is performed once every four cycles.

The 2×2 matrix shown in step 7 is the input data of the seventh Wallace adder unit (46): the four data output from the sixth Wallace adder unit (45) are added and output as two data.

Finally, the 1×2 matrix shown in step 8 is summed by the merge adder (47) to obtain the final absolute error value (MAE).

The eighth latch (L8), which latches the output of the seventh Wallace adder unit (46), and the ninth latch (L9), which latches the output of the merge adder (47), are synchronized to the second clock (COUNT_LATCH) generated by the counter, just like the seventh latch (L7). However, because the ninth latch holds the final result of the last stage, it does not necessarily have to be synchronized to the second clock (COUNT_LATCH).
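Steps 6 to 8 can be modelled roughly as follows. This is a behavioural sketch under stated assumptions, not the patented circuit: the class `Stage6` and the functions `finish` and `stand_in_4to2` are hypothetical names, the 4-input, 2-output adder is again a stand-in that only preserves the total, the shift register is modelled as a list that is emptied once it holds eight values, and the latches L7 to L9 (and therefore the pipeline delay between steps 6, 7, and 8) are not modelled.

```python
def stand_in_4to2(a, b, c, d):
    # Stand-in for a 4-input, 2-output adder (total is preserved).
    return a + b, c + d


class Stage6:
    """Model of the sixth unit: shift register 45-1 plus adders 45-2, 45-3."""

    def __init__(self):
        self.shift_reg = []   # models shift register 45-1
        self.count = 0        # models the counter that derives COUNT_LATCH

    def clock(self, pair):
        """One CLOCK_LATCH cycle: shift in two values from the fifth unit.

        Returns four values (the 2x2 matrix) on every fourth cycle,
        and None on the cycles in between.
        """
        self.shift_reg.extend(pair)
        self.count += 1
        if self.count % 4:
            return None
        data, self.shift_reg = self.shift_reg, []
        return stand_in_4to2(*data[:4]) + stand_in_4to2(*data[4:])


def finish(four):
    """Step 7 (one 4-to-2 adder) followed by step 8 (merge adder)."""
    e, f = stand_in_4to2(*four)
    return e + f


stage6 = Stage6()
outputs = [stage6.clock(p) for p in [(1, 2), (3, 4), (5, 6), (7, 8)]]
mae = finish(outputs[3])   # equals the sum of all eight inputs
```

The first three calls return `None` because the register is not yet full, which mirrors why the latches after the sixth unit only need to capture a value once every four cycles.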

The pipeline operation of the adder described above is shown in Table 1 below.

| T=1/f | S1 | S2 | S3 | S4 | S5 | S6 | S7 | S8 |
|---|---|---|---|---|---|---|---|---|
| T0 | M1, A1 | | | | | | | |
| T1 | M1, A2 | M1, B1 | | | | | | |
| T2 | M1, A3 | M1, B2 | M1, C1 | | | | | |
| T3 | M1, A4 | M1, B3 | M1, C2 | M1, D1 | | | | |
| T4 | M2, A1 | M1, B4 | M1, C3 | M1, D2 | M1, E1 | | | |
| T5 | M2, A2 | M2, B1 | M1, C4 | M1, D3 | M1, E2 | | | |
| T6 | M2, A3 | M2, B2 | M2, C1 | M1, D4 | M1, E3 | | | |
| T7 | M2, A4 | M2, B3 | M2, C2 | M2, D1 | M1, E4 | | | |
| T8 | M3, A1 | M2, B4 | M2, C3 | M2, D2 | M2, E1 | M1, F1 | | |
| T9 | M3, A2 | M3, B1 | M2, C4 | M2, D3 | M2, E2 | | | |
| T10 | M3, A3 | M3, B2 | M3, C1 | M2, D4 | M2, E3 | | | |
| T11 | M3, A4 | M3, B3 | M3, C2 | M3, D1 | M2, E4 | | | |
| T12 | M4, A1 | M3, B4 | M3, C3 | M3, D2 | M3, E1 | M2, F1 | M1, G1 | |
| T13 | M4, A2 | M4, B1 | M3, C4 | M3, D3 | M3, E2 | | | |
| T14 | M4, A3 | M4, B2 | M4, C1 | M3, D4 | M3, E3 | | | |
| T15 | M4, A4 | M4, B3 | M4, C2 | M4, D1 | M3, E4 | | | |
| T16 | M5, A1 | M4, B4 | M4, C3 | M4, D2 | M4, E1 | M3, F1 | M2, G1 | M1, H1 |
| T17 | M5, A2 | M5, B1 | M4, C4 | M4, D3 | M4, E2 | | | |
| T18 | M5, A3 | M5, B2 | M5, C1 | M4, D4 | M4, E3 | | | |
| T19 | M5, A4 | M5, B3 | M5, C2 | M5, D1 | M4, E4 | | | |
| T20 | M6, A1 | M5, B4 | M5, C3 | M5, D2 | M5, E1 | M4, F1 | M3, G1 | M2, H1 |

In Table 1, T is the period of the first clock (CLOCK_LATCH), S_i is the i-th pipeline stage, M is the macroblock index, and A, B, C, D, E, F, G, and H are the data groups processed in the respective stages shown in FIG. 5.

As the table shows, stage 1 processes the four data groups A1, A2, A3, and A4 of macroblock M1 in turn during clocks T0 to T3, and then goes on to receive and process the next macroblock.

Summing all the data of macroblock M1 to compute its absolute error value takes at least 16 cycles. Thereafter, the absolute error value of the next macroblock is computed and output every four cycles.
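The timing pattern of Table 1 can be expressed as a small Python function. This is a sketch derived only from the entries of the table; the function name `schedule` and its closed-form rule are assumptions introduced for illustration. Stages 1 to 5 are modelled as busy on every cycle once filled, and stages 6 to 8 as active once every four cycles starting at T8, T12, and T16.

```python
def schedule(t, stage):
    """Return (macroblock index, data group) handled by `stage` at cycle t.

    Returns None when the stage is idle at that cycle.
    Stages are numbered 1..8 and cycles 0, 1, 2, ... as in Table 1.
    """
    letter = "ABCDEFGH"[stage - 1]
    if stage <= 5:
        # Stages 1-5 start one cycle apart and take 4 cycles per macroblock.
        u = t - (stage - 1)
        if u < 0:
            return None
        return u // 4 + 1, f"{letter}{u % 4 + 1}"
    # Stages 6-8 fire once every 4 cycles, starting at T8, T12, T16.
    u = t - (8 + 4 * (stage - 6))
    if u < 0 or u % 4:
        return None
    return u // 4 + 1, f"{letter}1"


# Print the row of Table 1 for cycle T16.
row = [schedule(16, s) for s in range(1, 9)]
print(row)
```

Under this model the first result (M1, H1) appears at T16 and each following macroblock appears four cycles later, which matches the throughput stated above.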

To explain the operation of the Wallace adder units in detail, a 4-input, 2-output Wallace tree is described below.

FIG. 6 is a bitmap diagram illustrating the Wallace tree operation that reduces four data to two data, and FIG. 7 is a detailed block diagram showing the structure of a 4-input, 2-output adder that implements FIG. 6 with full adders.

Referring to FIG. 7, the Wallace tree receives four data, groups and adds the bits at the same position to generate a sum (SUM) and a carry (CARRY), and outputs two data by treating the sum bits as one datum and the carry bits as another. Carry propagation is thereby avoided.

That is, the four data (or pixels) a, b, c, and d each consist of 8 bits, and addition is performed between identical bit positions. First, the bits of a, b, and c are added in groups of three. The sum (SUM) of each group has the weight of the same bit position, and the carry (CARRY) has the weight of the next higher bit position. These bits are then grouped three at a time with the bits of d that have the same weight and are added again. Once more, the sum of each group has the weight of the same bit position and the carry that of the next higher bit position, so that the two resulting data e and f form the output pair of the Wallace tree.

The two output data e and f can be up to 9 bits wide. However, since the values being added are the differences between corresponding pixels of the reference block and the candidate block, the two resulting data are not very large, and the result is unaffected even if the most significant bit (carry) is not taken into account.
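The 4-input, 2-output reduction described above can be illustrated with word-level bit operations in Python. This is a minimal sketch rather than the gate-level structure of FIG. 7: the function names `carry_save` and `wallace_4to2` are hypothetical, and each call to `carry_save` stands for one row of full adders working on all bit positions in parallel.

```python
def carry_save(x, y, z):
    """Add three words position by position without propagating carries."""
    s = x ^ y ^ z                               # SUM bits keep their weight
    c = ((x & y) | (y & z) | (x & z)) << 1      # CARRY bits move one position up
    return s, c


def wallace_4to2(a, b, c, d):
    """Reduce four words to a pair (e, f) with e + f == a + b + c + d."""
    s1, c1 = carry_save(a, b, c)    # first layer: bits of a, b, c
    return carry_save(s1, c1, d)    # second layer: SUM, CARRY and bits of d


e, f = wallace_4to2(200, 100, 50, 25)
total = e + f   # the carry-propagating addition is deferred to a later adder
```

For 8-bit inputs both outputs of this sketch stay within 9 bits, since only the first-layer carry word can reach bit position 8. The single carry-propagating addition is left to the final merge adder.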

As described above, the addition scheme for obtaining the absolute error value in motion estimation according to the present invention enables parallel high-speed processing through pipelining with Wallace trees, reduces the amount of hardware (the number of Wallace trees), and simplifies the interconnection between modules, thereby reducing chip area and cost in VLSI fabrication.

Claims

A method for adding the pixel values of a matrix (number of horizontal pixels × number of vertical pixels) using a Wallace tree structure, the method comprising:

a first step (step 1) of adding the pixel values of the matrix (number of horizontal pixels × number of vertical pixels) to calculate a matrix;

a second step (step 2) of adding the matrix calculated in the first step (step 1) to calculate a matrix;

a third step (step 3) of adding the matrix calculated in the second step (step 2) to calculate a matrix;

a fourth step (step 4) of adding the matrix calculated in the third step (step 3) to calculate a matrix;

a fifth step (step 5) of adding the matrix calculated in the fourth step (step 4) to calculate a matrix;

a sixth step (step 6) of adding the matrix calculated in the fifth step (step 5) to calculate a matrix;

a seventh step (step 7) of adding the matrix calculated in the sixth step (step 6) to calculate a matrix; and

an eighth step (step 8) of adding the matrix calculated in the seventh step (step 7) to obtain a final value (MAE).

A pipelined adder structure for adding the distortion values of blocks in a motion estimator using a Wallace tree structure, comprising:

a first Wallace adder unit (40) that adds the difference values and outputs data;

a second Wallace adder unit (41) that adds the data output from the first Wallace adder unit (40) and outputs data;

a third Wallace adder unit (42) that adds the data output from the second Wallace adder unit (41) and outputs data;

a fourth Wallace adder unit (43) that adds the data output from the third Wallace adder unit (42) and outputs data;

a fifth Wallace adder unit (44) that adds the data output from the fourth Wallace adder unit (43) and outputs data;

a sixth Wallace adder unit (45) that adds the data output from the fifth Wallace adder unit (44) and outputs data;

a seventh Wallace adder unit (46) that adds the data output from the sixth Wallace adder unit (45) and outputs data; and

a merge adder (47) that adds the data output from the seventh Wallace adder unit (46) and outputs a final value (MAE).

The absolute error value adder structure of a motion estimator according to claim 2, further comprising first to ninth latches (L1 to L9) that temporarily store intermediate results between the input and output ends of the first to seventh Wallace adder units (40 to 46) and the merge adder (47).

The absolute error value adder structure of a motion estimator according to claim 3, wherein the first to sixth latches (L1 to L6) operate in synchronization with a first clock (CLOCK_LATCH) corresponding to the clock frequency, and the seventh to ninth latches (L7 to L9) operate in synchronization with a second clock (COUNT_LATCH).

The absolute error value adder structure of a motion estimator according to claim 3, wherein the ninth latch (L9) operates in synchronization with the first clock (CLOCK_LATCH).

The absolute error value adder structure of a motion estimator according to claim 4, further comprising a counter that counts the first clock (CLOCK_LATCH) and generates the second clock (COUNT_LATCH) once every four clock cycles.

The absolute error value adder structure of a motion estimator according to claim 2, wherein the first to seventh Wallace adder units (40 to 46) are composed of a plurality of 4-input, 2-output adders, each applying a Wallace tree that receives four data, adds them, and outputs two data.

The absolute error value adder structure of a motion estimator according to claim 2, wherein the first Wallace adder unit (40) comprises: a first adder (40-1) that receives four data of one vertical column each cycle and processes the data of that column (16 values) over four cycles; and

second to sixteenth adders (40-2 to 40-16) that each receive four data of one of the remaining vertical columns and process the data of that column.

The absolute error value adder structure of a motion estimator according to claim 2, wherein the second Wallace adder unit (41) comprises: a first adder (41-1) that receives the outputs of the first and second adders (40-1, 40-2) of the first Wallace adder unit (40);

a second adder (41-2) that receives the outputs of the third and fourth adders (40-3, 40-4) of the first Wallace adder unit (40); and

third to eighth adders (41-3 to 41-8) that receive the remaining outputs of the first Wallace adder unit (40).

The absolute error value adder structure of a motion estimator according to claim 2, wherein the third Wallace adder unit (42) comprises: a first adder (42-1) that receives the outputs of the first and second adders (41-1, 41-2) of the second Wallace adder unit (41);

a second adder (42-2) that receives the outputs of the third and fourth adders (41-3, 41-4); and

third and fourth adders (42-3, 42-4) that receive the remaining outputs of the second Wallace adder unit (41).

The absolute error value adder structure of a motion estimator according to claim 2, wherein the fourth Wallace adder unit (43) comprises: a first adder (43-1) that receives the outputs of the first and second adders (42-1, 42-2) of the third Wallace adder unit (42); and

a second adder (43-2) that receives the outputs of the adders (42-3, 42-4) of the third Wallace adder unit (42).

The absolute error value adder structure of a motion estimator according to claim 2, wherein the fifth Wallace adder unit (44) comprises an adder (44) that receives the outputs of the first and second adders (43-1, 43-2) of the fourth Wallace adder unit (43).

The absolute error value adder structure of a motion estimator according to claim 2, wherein the sixth Wallace adder unit (45) comprises: a shift register (45-1) that receives and shifts the two outputs of the fifth Wallace adder unit (44);

a first adder (45-2) that receives and adds four data stored in the shift register (45-1); and

a second adder (45-3) that receives and adds the other four data stored in the shift register (45-1).

The absolute error value adder structure of a motion estimator according to claim 2, wherein the seventh Wallace adder unit (46) comprises an adder (46) that receives the four outputs of the first and second adders (45-2, 45-3) of the sixth Wallace adder unit (45), adds them, and outputs two data.