KR19980072455A

KR19980072455A - Apparatus and method for half-pixel motion estimation for video encoder

Info

Publication number: KR19980072455A
Application number: KR1019970007296A
Authority: KR
Inventors: 이동호; 이승원
Original assignee: 이준우; 동양컴퓨터기술개발 주식회사
Priority date: 1997-03-05
Filing date: 1997-03-05
Publication date: 1998-11-05
Also published as: KR100249087B1

Abstract

The present invention relates to a half-pixel motion estimation apparatus and method for a video encoder. Using the four-step search (4SS), it reduces the number of clocks required for operation so that it can be implemented as a real-time system, and it adds one step so that half-pixel motion estimation and the motion compensation decision can also be performed. Because 4SS performs better than TSS and requires fewer clocks than FS, the structure is suitable for real-time systems: when image data is input at 6.5 MHz, the system can be operated at about 27 MHz. A search algorithm capable of estimating larger motion vectors is also proposed, giving excellent performance and allowing efficient application to VLSI implementations of video encoders such as MPEG-1 and MPEG-2.

Description

Apparatus and method for half-pixel motion estimation for video encoder

The present invention relates to a half-pixel motion estimation apparatus and method for a video encoder, and more particularly to such an apparatus and method in which the four-step search algorithm (4SS) is used to reduce the number of clocks required for operation so that a real-time system can be implemented, and in which one step is added that decides motion compensation while estimating motion in half-pixel units.

Video compression technology has recently been expanding into applications such as multimedia communication, video phones, remote video conferencing, HDTV, and CD-ROM, and it is well known that the core of video compression lies in exploiting the temporal and spatial redundancy present in a video signal.

Among the various techniques for video compression, motion-compensated coding is widely used.

Motion-compensated coding consists of a part that compensates for motion by means of motion estimation and a part that encodes the prediction error. For motion vector estimation, the block matching algorithm (BMA), which takes the similarity of motion information into account, is the most widely used.
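Block matching can be illustrated with a minimal sketch. The function names, the list-of-rows frame layout, and the parameters `n` (block size) and `r` (search range) are assumptions made for this illustration only; the text above does not specify them. The matching cost used here is the sum of absolute differences (SAD), and the search is the exhaustive full search.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Frames are lists of rows of pixel values."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Exhaustive block matching over displacements in [-r, r] x [-r, r].
    Returns (best_sad, dx, dy), or None if no candidate fits in the frame."""
    best = None
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= len(ref)
                    and 0 <= bx + dx and bx + dx + n <= len(ref[0])):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Every displacement in the search range costs one full SAD evaluation, which is why the exhaustive search is expensive and why faster search patterns are of interest.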

Many block matching algorithms, such as FS (Full Search Algorithm), TSS (Three Step Search Algorithm), HS (Hierarchical Search Algorithm), and 4SS (Four Step Search Algorithm), have been published in papers, and some of them have been implemented and are in practical use.

Among these conventional block matching algorithms, however, FS gives the best performance but requires so much computation that real-time implementation is difficult. HS is the most ideal form of fast algorithm, but it too requires a large amount of computation, and the VLSI structure needed to implement it is complex, so it has been difficult to put into practical use.

TSS does not take the statistical distribution of motion vectors into account and treats all motion vectors as equally likely, so it is inefficient on real images.

4SS, by contrast, is known to be more efficient than TSS without requiring as many operations as FS.

Moreover, although many motion estimation algorithms have been proposed, research on VLSI structures for implementing them is still insufficient.

In addition, as compression is now required for MPEG-2 class video as well as MPEG-1, and the MPEG-2 model requires a wider search window for extracting motion vectors, there is a growing need for a real-time motion vector estimator that can handle the resulting increase in computation.

Accordingly, an object of the present invention is to provide a half-pixel motion estimation apparatus and method for a video encoder having a VLSI structure for a real-time half-pixel motion vector estimator that uses the 4SS algorithm together with linear-approximation half-pixel estimation.

Another object of the present invention is to adopt E4SS (Extended 4 Step Search Algorithm), a modification and extension of the 4SS algorithm that can cope with the wider search window, and to provide a VLSI structure for the E4SS algorithm that performs well and is easy to implement in VLSI.

To achieve these objects, the present invention uses the four-step search algorithm (4SS) to reduce the number of clocks required for operation so that it can be implemented as a real-time system, and adds one step so that half-pixel motion estimation and the motion compensation decision can also be performed. Because 4SS performs better than TSS and requires fewer clocks than FS, the structure is suitable for real-time systems: when image data is input at 6.5 MHz, the system can be operated at about 27 MHz. A search algorithm capable of estimating larger motion vectors is also proposed, giving excellent performance and allowing efficient application to VLSI implementations of video encoders such as MPEG-1 and MPEG-2.

FIG. 1 is a block diagram showing the configuration of a general video compression apparatus.

FIG. 2 is a schematic diagram showing the processing sequence of a general 4SS.

FIG. 3 is a schematic diagram showing half-pixel motion estimation by the general interpolation method.

FIG. 4 is a schematic diagram showing the general linear approximation method.

FIG. 5 is a schematic diagram showing the coordinate values of each PE in a general 4SS.

FIG. 6 is a block diagram showing the configuration of a half-pixel motion estimation unit according to an embodiment of the present invention.

FIG. 7 is a block diagram showing the configuration of a processor array according to an embodiment of the present invention.

FIG. 8 is a block diagram showing the configuration of a processor element according to an embodiment of the present invention.

FIG. 9 is a schematic diagram showing the PEs used for half-pixel motion calculation in the present invention.

FIG. 10 is a schematic diagram showing the motion compensation decision of the present invention.

FIG. 11 is a schematic diagram showing the E4SS search process according to another embodiment of the present invention.

FIG. 12 is a schematic diagram showing the data input state according to another embodiment of the present invention.

FIG. 13 is a schematic diagram showing the extents of the search area and the motion area according to another embodiment of the present invention.

FIG. 14 is a block diagram showing the configuration of a half-pixel motion estimation unit according to another embodiment of the present invention.

FIG. 15 is a block diagram showing the configuration of a processor array according to another embodiment of the present invention.

FIG. 16 is a block diagram showing the configuration of a processor element according to another embodiment of the present invention.

The present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 shows the overall configuration of a video compression apparatus that includes the motion estimation apparatus of the present invention, which comprises:

an NTSC decoder (1) that outputs, for each combination of externally input NTSC video signals, the signal corresponding to that combination;

a delay buffer (2) that amplifies the video signal input from the NTSC decoder (1) while delaying it for a predetermined time;

a discrete cosine transform unit (3) that transforms the video signal delayed by the delay buffer (2) into frequency coordinates, the transform coefficients being readily computed as values of the cosine function;

a quantization unit (4) that converts the continuous waveform of the frequency-coordinate video signal from the discrete cosine transform unit (3) into a stepped waveform, replacing each amplitude with an integer value in units of a suitable level;

a variable length coding unit (5) that assigns to each integer value of the stepped waveform from the quantization unit (4) a code whose length is proportional to the absolute value of the logarithm of its frequency of occurrence, and outputs the compressed video signal;

an inverse quantization unit (6) that restores the level-unit integer values from the quantization unit (4) to a video signal in frequency coordinates;

an inverse discrete cosine transform unit (7) that restores the frequency-coordinate video signal from the inverse quantization unit (6) to the original video signal;

a motion estimation and compensation unit (8) that estimates a motion vector from the video signal input from the NTSC decoder (1) and the video signal from the inverse discrete cosine transform unit (7) and then makes the compensation decision; and

a subtraction unit (9) that combines the video signal delayed by the delay buffer (2) with the signal from the motion estimation and compensation unit (8) and passes the result to the discrete cosine transform unit (3).

In the motion estimation and compensation unit (8), the 8-bit input data fed back from the inverse discrete cosine transform unit (7) is converted to a 16-bit bus structure by a register (11);

the 16-bit data from the register (11) is written to and read from two DRAM frame memories (14)(15) frame by frame through the switching of two tri-state buffers (12)(13);

the even-numbered 8-bit data and odd-numbered 8-bit data stored in the two DRAM frame memories (14)(15) are selected onto a single 16-bit data bus by two switches (16)(17) and split into 8-bit units;

the data from the two frame memories (14)(15), split by the switch (16), is stored in 8-bit units in two SRAM search windows (18)(19);

a pixel motion estimator (Pel Motion Estimator) (20), which receives the search window signal from the two frame memories (14)(15) as selected by the switch (17), estimates motion in pixel units;

a half-pixel motion estimator (Half-Pel Motion Estimator) (21), which receives the pixel-unit motion information output from the pixel motion estimator (20), estimates motion in half-pixel units using the surrounding error values;

a half-pixel motion compensator (Half-Pel Motion Compensation) (22), which receives the half-pixel motion information from the half-pixel motion estimator (21), takes data from the two search windows (18)(19) and performs motion compensation; and

a buffer (23), which receives the motion-compensated signal from the motion compensator (22), converts the scanning into 8 × 8 block units.

The operation of the half-pixel motion estimation apparatus of the present invention, configured as above, is described in detail below.

Most ordinary video consists of consecutive frames in which objects move slowly, so placing the check points (checking points) uniformly across the search window in the first step, as TSS does, is inefficient for typical low-motion video.

4SS, in contrast, is based on the statistical observation that most motion lies within about -2 to +2 pixels of the center, and starts its search at nine points spaced ±2 pixels horizontally and vertically around the center, so it also works comparatively well on video with relatively little motion.

The 4SS procedure, shown in FIG. 2, is as follows.

In the first step, the point with the minimum SAD (Sum of Absolute Difference) value is found among nine check points in a 5 × 5 window within the 15 × 15 search area.

In the second step, the point with the minimum SAD value is found among nine check points in a 5 × 5 window centered on the point found in the first step.

In the third step, the same process is repeated: the point with the minimum SAD value is found among nine check points in a 5 × 5 window centered on the point found in the second step.

In the fourth step, the point with the minimum SAD value is found among nine check points in a 3 × 3 window centered on the point found in the third step; the point found here determines the motion vector.
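The four steps above can be sketched as follows. This is a simplified illustration of the procedure as described in this text (nine check points in every step), not a reference implementation; the `cost` callback, which returns the SAD for an integer displacement, is a hypothetical interface introduced for the example.

```python
def four_step_search(cost):
    """Four-step search as described above.

    cost(dx, dy) -> SAD for the integer displacement (dx, dy).
    Steps 1 to 3 test nine points on a 5 x 5 window (spacing 2);
    step 4 tests nine points on a 3 x 3 window (spacing 1).
    Returns the integer motion vector (dx, dy).
    """
    cx, cy = 0, 0
    for spacing in (2, 2, 2, 1):
        best = None
        for oy in (-spacing, 0, spacing):
            for ox in (-spacing, 0, spacing):
                dx, dy = cx + ox, cy + oy
                c = cost(dx, dy)
                # Keep the first point that strictly improves on the best so far.
                if best is None or c < best[0]:
                    best = (c, dx, dy)
        # The minimum-SAD point becomes the center of the next step.
        _, cx, cy = best
    return cx, cy
```

The largest displacement reachable is 2 + 2 + 2 + 1 = 7 pixels in each direction, which matches the 15 × 15 search area, so no range clamping is needed in this sketch.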

In general, 4SS is known to outperform TSS and to come close to FS.

In the full-search half-pixel motion vector determination method based on interpolation, the motion vector is determined by computing the SAD for each of the nine possible half-pixel positions, as shown in FIG. 3. The linear approximation method, by contrast, does not compute the SAD at the possible half-pixel positions as interpolation does; instead it approximates, as shown in FIG. 4, using the pixel-unit SAD values that have already been computed.

The performance of this approximation comes very close to that of interpolation.

That is, the error is modeled as

$E(x) = a\,|x - b| + c \quad (a > 0,\ |b| < 1)$

where a is the slope and b is the position of the minimum. Then

$E(-1) = a(1 + b) + c$

$E(0) = a\,|b| + c$

$E(1) = a(1 - b) + c$

Assuming that the error curve in FIG. 4 is symmetric on both sides, when the minimum error point lies between -0.75 and -0.25,

$2\{E(-1) - E(0)\} < E(1) - E(0)$ (Equation 1)

and when the minimum error point lies between 0.25 and 0.75,

$E(-1) - E(0) > 2\{E(1) - E(0)\}$ (Equation 2)

In the case of Equation 1 the half-pixel motion information is obtained as mv - 0.5, and in the case of Equation 2 as mv + 0.5.
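As a check on Equations 1 and 2, the thresholds can be worked out directly from the error model. This is only a worked derivation under the assumptions already stated, a symmetric V-shaped error curve with slope a > 0; any constant offset in the model cancels in the differences.

```latex
% Case b < 0 (minimum to the left of the integer position), so |b| = -b:
\[
\begin{aligned}
E(-1) - E(0) &= a(1 + b) - a(-b) = a(1 + 2b), \\
E(1) - E(0)  &= a(1 - b) - a(-b) = a, \\
2\{E(-1) - E(0)\} < E(1) - E(0)
  &\iff 2a(1 + 2b) < a \iff b < -\tfrac{1}{4}.
\end{aligned}
\]

% Case b > 0 (minimum to the right of the integer position), so |b| = b:
\[
\begin{aligned}
E(-1) - E(0) &= a(1 + b) - ab = a, \\
E(1) - E(0)  &= a(1 - b) - ab = a(1 - 2b), \\
E(-1) - E(0) > 2\{E(1) - E(0)\}
  &\iff a > 2a(1 - 2b) \iff b > \tfrac{1}{4}.
\end{aligned}
\]
```

Each inequality therefore holds exactly when the estimated minimum lies closer to the half-pixel position (-0.5 or +0.5) than to the integer position 0, which is why the motion vector is adjusted by half a pixel in that direction.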

Accordingly, the existing 4SS algorithm is applied and one processing step is added for half-pixel motion estimation, so that in the fifth step the MC / no-MC decision can be made while motion is estimated in half-pixel units.

In 4SS, as shown in FIG. 5, the SAD (Sum of Absolute Difference) must be computed for the reference point and for the eight points displaced from it by 2 in the up, down, left, and right directions.

Because the areas that the individual PEs must process largely overlap, the number of clocks required for the computation can be reduced by delaying the reference block data (DB data) to match the processing timing.

Table 1 shows the W and DB data input to each PE in the first, second, and third steps of 4SS, and Table 2 shows the W and DB data input to each PE in the fourth and fifth steps.

As Tables 1 and 2 show, the DB data input to each PE is delayed between PE0, PE1, and PE2, between PE3, PE4, and PE5, and between PE6, PE7, and PE8 by 2 clocks in the first, second, and third steps, and by 1 clock in the fourth and fifth steps.

Between PE2 and PE3 and between PE5 and PE6, the data is delayed by 28 clocks in the first, second, and third steps and by 14 clocks in the fourth and fifth steps.
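The delay figures above can be tabulated with a short sketch. It assumes that the DB data is passed along a chain from PE0 to PE8 so that the delays accumulate, and that PE0 receives the data undelayed; the function name is hypothetical, and Tables 1 and 2 are not reproduced here, so the result should be read as an illustration of the stated per-hop delays rather than as the tables themselves.

```python
def db_delay_per_pe(step):
    """Cumulative DB-data delay, in clocks, seen by PE0..PE8 for a given step.

    Per-hop delays are the ones stated in the text:
      steps 1-3: 2 clocks inside a row of three PEs, 28 clocks between rows
      steps 4-5: 1 clock inside a row of three PEs, 14 clocks between rows
    Accumulating them along the PE chain is an assumption of this sketch.
    """
    if step in (1, 2, 3):
        within_row, between_rows = 2, 28
    elif step in (4, 5):
        within_row, between_rows = 1, 14
    else:
        raise ValueError("step must be in 1..5")
    delays = [0]
    for pe in range(1, 9):
        # The hops PE2->PE3 and PE5->PE6 cross from one row of PEs to the next.
        hop = between_rows if pe % 3 == 0 else within_row
        delays.append(delays[-1] + hop)
    return delays
```

For example, `db_delay_per_pe(1)` gives `[0, 2, 4, 32, 34, 36, 64, 66, 68]` and `db_delay_per_pe(4)` gives `[0, 1, 2, 16, 17, 18, 32, 33, 34]`.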

FIG. 6 shows the internal configuration of the half-pixel motion estimation unit according to the present invention, which comprises:

an address controller (31) that generates and outputs control signals (SRW)(DRW) for controlling the search window memory (32), into which the current frame is input, and the reference block memory (33), into which the immediately preceding frame is input, and that outputs an address signal (ADDR);

a processor array (34), made up of nine unit processors (processor elements), that receives the upper and lower data (P, Q) and the DB data (DB) from the search window memory (32) and the reference block memory (33) and computes the SAD;

a PMVG (preliminary motion vector generator) (35) that receives and compares the SAD values from the nine unit processors of the processor array (34) and outputs the location of the minimum error value as the index for the next set of nine points to be processed;

an HMVG (half-pel motion vector generator) MCD (MC decision) (36) that receives the SAD values from the processor array (34), calculates the motion vector, performs half-pixel motion estimation, and generates the motion compensation flag; and

a processor controller (37) that synchronizes the operation timing of all blocks and controls the internal operation by outputting an address control signal (address control) to the address controller (31), outputting the processor element enable signal (PE-E) and the step type control signal (step type), which determines the step type, to the processor array (34), outputting the MVG control signal (MVG-Control) to the PMVG (35) while receiving from it the indices (PMVH, PMVV) corresponding to the minimum error value, and outputting the motion estimation control signal (HMVG-Control) to the HMVG MCD (36).

FIG. 7 shows the internal structure of the processor array (34) of the motion estimator. Nine processor elements (PE0 to PE8) (34a) to (34i) for computing the SAD are arranged in parallel. Each receives in parallel the 8-bit upper and lower data (P, Q) from the search window memory (32) and the 1-bit step type control signal (step type) from the processor controller (37). Each also receives its own 1-bit processor element enable signal (PE0-E) to (PE8-E) from the processor controller (37) and its own processor element memory select signal (PE0-MSEL) to (PE8-MSEL), generated by the multiplexer select generator (38). In addition, the array receives the DB data (DB) from the reference block memory (33), together with the DB data that has been successively delayed (Delayed DB) in passing through the processor elements (PE0 to PE7) (34a) to (34h), and so computes and outputs the SAD values.

Thus, with the 8-bit upper data (P) and 8-bit lower data (Q) from the search window memory (32) applied to the nine parallel processor elements (PE0 to PE8) (34a) to (34i), the upper data (P) is selected when the processor element memory select signal (PE0-MSEL) to (PE8-MSEL) from the multiplexer select generator (38) is 0, and the lower data (Q) is selected when it is 1.

When the 1-bit step type control signal (step type) from the processor controller (37) is 0, the delay for the first, second, and third steps is applied; when it is 1, the delay for the fourth and fifth steps is applied.

The SAD is then computed from the 8-bit upper and lower data (P, Q) from the search window memory (32) and the DB data (DB) from the reference block memory (33), and the SAD value is output.

FIG. 8 shows the configuration of each processor element, which comprises: a search window multiplexer (38) that selects between the upper data (P) and the lower data (Q) input from the search window memory (32) according to the processor element memory select signal (PE0-MSEL) to (PE8-MSEL) generated by the multiplexer select generator (38);

a delay block (39) that receives the DB data (DB) from the reference block memory (33), delays it according to the 1-bit step type control signal (step type) from the processor controller (37), and outputs it to the next processor element (PE1 to PE8) (34b) to (34i);

a subtraction block (40) that receives the upper or lower data (P), (Q) from the search window multiplexer (38) and the delayed DB data (DB) from the delay block (39) and compares them by subtraction; and

an operation block (41) that takes the result of the subtraction in the subtraction block (40) and obtains and outputs a 16-bit SAD value through the absolute-value and accumulation operations of the SAD computation.

The delay block (39) takes one of two forms, a-type delay or b-type delay, depending on the processor element (PE0 to PE8) (34a) to (34i). The a-type delay is 2 clocks in the first, second, and third steps and 1 clock in the fourth and fifth steps.

The b-type delay is 28 clocks in the first, second, and third steps and 14 clocks in the fourth and fifth steps.

The DB data delayed in this way and the search window data output from the search window multiplexer (38) are subtracted, the absolute value is taken, and the results are accumulated to output a 16-bit SAD value.
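The data path of one processor element can be modeled behaviorally as below. This is a software sketch only, not the circuit of FIG. 8: the class and method names are hypothetical, the delay block is left out (the caller is assumed to supply the already delayed DB sample), and the enable input follows the PE-E convention of 1 = accumulate, 0 = hold.

```python
class ProcessorElement:
    """Behavioral model of one PE: multiplexer, subtract, absolute value, accumulate."""

    def __init__(self):
        self.sad = 0  # 16-bit accumulator

    def reset(self):
        """Clear the accumulator before a new block comparison."""
        self.sad = 0

    def clock(self, p, q, msel, db, enable=1):
        """Process one pixel pair per clock and return the running SAD.

        p, q   : 8-bit upper and lower search window data
        msel   : 0 selects P, 1 selects Q
        db     : 8-bit (already delayed) reference block sample
        enable : 1 accumulates, 0 leaves the accumulator unchanged
        """
        if not enable:
            return self.sad
        w = q if msel else p
        # Keep the accumulator within 16 bits, as in the described output width.
        self.sad = (self.sad + abs(w - db)) & 0xFFFF
        return self.sad
```

A 16-bit accumulator is sufficient if one assumes a 16 × 16 block of 8-bit pixels (the block size is an assumption here): the largest possible SAD is then 256 × 255 = 65280, which is below 65536.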

Tables 3 and 4 show the output timing of the processor element enable signal (PE-E) that is input to the processor elements of the processor array (34). A processor element computes the SAD when its enable signal (PE-E) is 1 and does not compute it when the signal is 0.

In steps 1, 2, and 3, the processor elements within each of the groups (PE0, PE1, PE2), (PE3, PE4, PE5), and (PE6, PE7, PE8) start operating 1 cycle apart, and between the processor elements (PE2, PE3) and (PE5, PE6) operation starts 14 cycles apart.

In steps 4 and 5, the processor elements within each of the groups (PE0, PE1, PE2), (PE3, PE4, PE5), and (PE6, PE7, PE8) start operating 2 cycles apart, and between the processor elements (PE2, PE3) and (PE5, PE6) operation starts 28 cycles apart.

The data values input to each processor element (PE) are delayed by the A-delay and B-delay: by 1 cycle and 2 cycles in steps 1, 2, and 3, and by 14 cycles and 28 cycles in steps 4 and 5.

Table 5 shows the timing at which the SAD outputs emerge from the processor elements (PE) in step 5 for the half-pixel motion vector calculation.

As shown in FIG. 9, half-pixel motion estimation requires the five SAD values from the processor elements (PE1, PE3, PE4, PE5, PE7).

When the outputs of the processor elements (PE1, PE3, PE4, PE5, PE7) emerge with the timing shown in Table 5, each SAD value is held until the output of the processor element it is to be compared with is available, and the half-pixel motion vector is then calculated using the half-pixel motion formulas of Equation 1 and Equation 2.
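How Equations 1 and 2 might be applied to the five SAD values can be sketched as follows. The mapping of PE indices to positions is an assumption made for this illustration (raster order, with PE4 at the center, PE3 and PE5 to its left and right, and PE1 and PE7 above and below); FIG. 9 is not reproduced here, and the function names are hypothetical.

```python
def half_pel_offset(e_minus, e_zero, e_plus):
    """Half-pixel correction along one axis from three integer-position SADs.

    Applies Equation 1 (offset -0.5) and Equation 2 (offset +0.5);
    returns 0.0 when neither inequality holds.
    """
    d_minus = e_minus - e_zero
    d_plus = e_plus - e_zero
    if 2 * d_minus < d_plus:      # Equation 1
        return -0.5
    if d_minus > 2 * d_plus:      # Equation 2
        return 0.5
    return 0.0


def half_pel_vector(mv, sad):
    """Refine an integer motion vector (mvx, mvy) to half-pixel precision.

    `sad` maps a PE index to its SAD value. The PE-to-position mapping
    (3 = left, 4 = center, 5 = right, 1 = above, 7 = below) is assumed.
    """
    mvx, mvy = mv
    hx = half_pel_offset(sad[3], sad[4], sad[5])
    hy = half_pel_offset(sad[1], sad[4], sad[7])
    return mvx + hx, mvy + hy
```

Only comparisons, subtractions, and a doubling (a one-bit shift in hardware) are needed, so no interpolated half-pixel blocks have to be formed to obtain the refined vector.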

When half-pixel motion estimation has been completed in this way, the result is evaluated together with the integer-pixel motion vector against the motion compensation decision curve of FIG. 10 to decide between MC and no MC, and the decision is output as an MC flag signal.

In the 32 × 32 search area of the SIF format used with MPEG-1, the search area is not very wide, so the motion vector can be tracked adequately with only three or four steps.

However, when the search area grows to 48 × 48 or 64 × 64, a small step size requires many steps to reach the maximum extent of the motion vector, while a larger step size can cause problems for images with little motion, so choosing a search algorithm becomes difficult.

That is, a method that reduces the number of steps and uses many search points in each step has the drawback of a heavy computational load, whereas a method that reduces the search points per step and increases the number of steps lowers the computational load but may have difficulty finding the motion vector when the motion is large.

Therefore, because the 4SS algorithm sometimes fails to adapt well to images with large motion, the E4SS algorithm according to another embodiment of the present invention performs the block comparison at 36 search points in the first and second steps, as shown in FIG. 11. The motion vector can then reach from -4 to 6 in the first step, so good performance is obtained even for images with large motion.

That is, step 1 performs the MAD operation on the points lying within two points to the left, three to the right, two above, and three below the origin, at 2-pixel spacing, and finds the minimum-error point.
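The step 1 pattern can be enumerated directly to check the point count and the reach. The helper name `step1_points` is a hypothetical choice for this sketch. Offsets are counted in pixels from the origin, negative meaning left or up.

```python
def step1_points(spacing=2, before=2, after=3):
    """Step-1 candidates: `before` points left/up and `after` points
    right/down of the origin, at `spacing` pixels, plus the origin itself."""
    offsets = [k * spacing for k in range(-before, after + 1)]
    return [(dx, dy) for dy in offsets for dx in offsets]


points = step1_points()
xs = sorted({dx for dx, _ in points})
print(len(points))  # 36
print(xs)           # [-4, -2, 0, 2, 4, 6]
```

Six offsets per axis give the 36 comparison points, and the asymmetric pattern reaches 4 pixels in one direction and 6 in the other.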

In step 2, the search range depends on the position of the minimum-error point found in step 1.

If the minimum-error point lies to the left of the reference point within the horizontal search range of step 1, the horizontal search range of step 2 covers three points to the left and two to the right at 2-pixel spacing; if it lies to the right, the horizontal search range of step 2 covers two points to the left and three to the right at 2-pixel spacing. In either case the minimum-error point is found by the MAD operation over the points in that range.

Likewise, if the minimum-error point lies above the reference point within the vertical search range of step 1, the vertical search range of step 2 covers three points above and two below at 2-pixel spacing; if it lies below, the vertical search range of step 2 covers two points above and three below at 2-pixel spacing. In either case the minimum-error point is found by the MAD operation over the points in that range.

Step 3 finds the minimum-error point by the MAD operation over the points lying within two points to the left, three to the right, two above, and three below the minimum-error point found in step 2, at 2-pixel spacing.

Step 4 finds the minimum-error point by the MAD operation over the search interval extending one point to the left, right, top, and bottom of the minimum-error point found in step 3, at 1-pixel spacing.

The displacement between the minimum-error point found here and the reference point of step 1 is the motion vector.
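The four steps above can be sketched as a short search routine. This is an illustrative sketch under stated assumptions, not the patented hardware. The block-matching error is passed in as a callable `cost(dx, dy)` that stands in for the MAD. The names `grid`, `e4ss_search`, and `toy_cost` are hypothetical. The vertical rule of step 2 is assumed to mirror the horizontal one, and a step 1 minimum lying exactly on the reference column or row is assumed to keep the default split of two points before and three after.

```python
def grid(center, left, right, up, down, spacing):
    """Candidate points around center; x grows rightward, y grows downward."""
    cx, cy = center
    return [(cx + i * spacing, cy + j * spacing)
            for j in range(-up, down + 1)
            for i in range(-left, right + 1)]


def e4ss_search(cost):
    """Return the integer-pixel motion vector (dx, dy) relative to the origin.

    cost(dx, dy) is a caller-supplied matching error such as the MAD.
    Ties are resolved by taking the first minimum in scan order (assumption).
    """
    def best_of(points):
        return min(points, key=lambda p: cost(*p))

    # Step 1: 36 points at 2-pixel spacing around the origin.
    best = best_of(grid((0, 0), 2, 3, 2, 3, 2))

    # Step 2: widen the range on the side where the step-1 minimum lies.
    # Assumption: dx == 0 or dy == 0 keeps the default 2/3 split.
    left, right = (3, 2) if best[0] < 0 else (2, 3)
    up, down = (3, 2) if best[1] < 0 else (2, 3)
    best = best_of(grid(best, left, right, up, down, 2))

    # Step 3: 2-pixel spacing around the step-2 minimum.
    best = best_of(grid(best, 2, 3, 2, 3, 2))

    # Step 4: one point in each direction at 1-pixel spacing (3 x 3 points).
    return best_of(grid(best, 1, 1, 1, 1, 1))


def toy_cost(target):
    """Synthetic error surface with a single minimum at `target`."""
    tx, ty = target
    return lambda dx, dy: abs(dx - tx) + abs(dy - ty)


print(e4ss_search(toy_cost((7, -5))))  # (7, -5)
```

The toy cost replaces real image data so that the control flow can be followed by hand; with real frames, `cost` would compare the reference block against the displaced block in the search window.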

Because the E4SS algorithm outperforms the 4SS algorithm described above, it also suits a VLSI implementation when the VLSI structure described above is modified and extended.

Because the E4SS algorithm increases the number of search points in each step and therefore performs many operations, data are input and processed 32 bits at a time.

FIGS. 12 to 16 show a VLSI structure for the E4SS algorithm according to another embodiment of the present invention; description of the parts identical to the VLSI structure for the 4SS algorithm described above is omitted.

FIG. 12 shows the data input order: four pixel values are input simultaneously, and the MAD operations for those four pixels are likewise performed simultaneously.

That is, because all input and all operations are performed 32 bits at a time, the number of clocks required for processing is also reduced.
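The effect of the 32-bit path can be illustrated with a small sketch that packs four 8-bit pixels into one word and counts the clocks needed to feed one block. The byte order (first pixel in the most significant byte), the 16 × 16 block size used in the example, and the helper names are assumptions made for illustration.

```python
def pack4(pixels):
    """Pack four 8-bit pixels into one 32-bit word (first pixel in the top byte)."""
    if len(pixels) != 4 or any(not 0 <= p <= 255 for p in pixels):
        raise ValueError("expected four 8-bit values")
    word = 0
    for p in pixels:
        word = (word << 8) | p
    return word


def unpack4(word):
    """Inverse of pack4: split a 32-bit word into four 8-bit pixels."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def input_clocks(block_pixels, pixels_per_clock):
    """Clocks needed to feed a block, rounding up (ceiling division)."""
    return -(-block_pixels // pixels_per_clock)


print(hex(pack4([0x12, 0x34, 0x56, 0x78])))                # 0x12345678
print(input_clocks(16 * 16, 1), input_clocks(16 * 16, 4))  # 256 64
```

Feeding four pixels per clock cuts the input clocks for a 16 × 16 block from 256 to 64, which is the reduction the paragraph above refers to.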

When the data fed to each processor element (PE) are input four pixels at a time as 32 bits, the search window data (SW) and the DB data (DB) are received as shown in Tables 6, 7, and 8.

It can be seen that the DB data (db1, db2, db3, db4) fed to each processor element (PE) are input with delays of 1 clock and 14 clocks, as in steps 4 and 5 of 4SS.

FIG. 13 shows a search window of size 48 × 48. The motion vector range is -16 to 16 and the search window consists of six macro blocks, so, in view of the 48 × 48 window size, the window is divided into upper, middle, and lower parts R, P, and Q.
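The window geometry can be checked with a short sketch. It assumes a 16 × 16 block and assumes that R, P, and Q are three horizontal bands of equal height (16 rows each), which the text does not state explicitly. The function names are hypothetical.

```python
BLOCK = 16     # block size in pixels (assumption: 16 x 16 macro block)
MV_RANGE = 16  # motion vectors span -16..16


def window_size(block=BLOCK, mv_range=MV_RANGE):
    """Side length of the search window: the block plus the range on both sides."""
    return block + 2 * mv_range


def split_bands(window, band_rows=BLOCK):
    """Split a window (list of rows) into upper, middle, lower bands R, P, Q.

    Assumption: the three bands have equal height.
    """
    if len(window) != 3 * band_rows:
        raise ValueError("expected three bands of equal height")
    return (window[:band_rows],
            window[band_rows:2 * band_rows],
            window[2 * band_rows:])


size = window_size()
window = [[row] * size for row in range(size)]  # each pixel holds its row index
r, p, q = split_bands(window)
print(size, r[0][0], p[0][0], q[0][0])  # 48 0 16 32
```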

FIGS. 14 to 16 show the configuration of a half-pixel motion estimator capable of 32-bit processing on a 48 × 48 search window. The basic configuration is almost the same as the VLSI structure for the 4SS algorithm; the difference is that the R, P, and Q search window data and the DB data are each input as 32 bits.

As shown in FIG. 14, the 32-bit data are input so that the search window data sw1, sw2, sw3, and sw4 arrive simultaneously and the DB data db1, db2, db3, and db4 arrive simultaneously.

FIG. 15 shows the internal structure of the processor array, which is also similar to that of 4SS. The differences are that, whereas 4SS receives two search window data streams, P and Q, of 8 bits each, here there are three 32-bit streams, R, P, and Q, and that the search window multiplexer (SWMUX) in each processor element has two select inputs.

FIG. 16 shows the configuration of a processor element of the processor array. Because input and operations are performed four pixels at a time, four delay units (42)(43)(44)(45) and four search window multiplexers (46)(47)(48)(49) are provided in parallel. Because the subtraction and absolute-value operations must be performed separately for each of the four pixels, four subtraction units (50)(51)(52)(53) are provided in parallel for these operations, and their outputs pass in turn through the adders (54)(55)(56).

However, because the four pixels are always processed within the same block, only one operation block (57) is provided.
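The four-pixel datapath can be modelled behaviourally as below. This is a toy model, not the circuit of FIG. 16. The class name `ProcessorElement4` is hypothetical, and the assignment of the three adders to a two-level tree and the role of block 57 as the accumulator are assumptions.

```python
class ProcessorElement4:
    """Toy model of a PE that handles four pixels per clock."""

    def __init__(self):
        self.sad = 0  # running sum kept by the single operation block

    def clock(self, sw4, db4):
        """Consume four search-window pixels and four reference pixels."""
        if len(sw4) != 4 or len(db4) != 4:
            raise ValueError("expected four pixels on each input")
        # Four parallel subtract / absolute-value units (one per pixel).
        diffs = [abs(s - d) for s, d in zip(sw4, db4)]
        # Three adders reduce the four differences (tree arrangement assumed).
        low = diffs[0] + diffs[1]
        high = diffs[2] + diffs[3]
        partial = low + high
        # One shared accumulator, since all four pixels belong to the same block.
        self.sad += partial
        return self.sad


pe = ProcessorElement4()
pe.clock([10, 20, 30, 40], [12, 18, 30, 45])
print(pe.clock([0, 0, 0, 0], [1, 1, 1, 1]))  # 13
```

A single accumulator suffices here because the four differences produced in one clock always contribute to the same block's SAD.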

As described above, the half-pixel motion estimation apparatus and method for a video encoder of the present invention uses 4SS and reduces the number of clocks needed for operation so that it can be implemented as a real-time system, and adds one step so that half-pixel motion estimation and the motion compensation decision are also performed. Because it uses 4SS, which performs better than TSS and requires fewer clocks than FSS, it is suited to real-time systems: when image data are input at 6.5 MHz, the system can operate at about 27 MHz. It also proposes a search algorithm that can estimate larger motion vectors, so it performs well and can be applied efficiently to VLSI implementations of video encoders such as MPEG-1 and MPEG-2.

Claims

An address controller (31) that outputs address signals together with control signals for controlling a search window memory (32), into which the current frame is input, and a reference block memory (33), into which the previous frame is input;

A processor array (34) comprising nine unit processors, which receives the upper and lower data (P, Q) and the DB data from the search window memory (32) and the reference block memory (33) and calculates the SAD;

A PMVG (35) which receives the SAD values from the processor array (34), compares them, and outputs the position having the minimum error value as an index from which the next step proceeds;

An HMVG MCD (36) which receives the SAD values from the processor array (34), calculates the motion vector, and generates the half-pixel motion estimate and the motion compensation flag;

and a processor controller (37) that controls internal operation by outputting an address control signal to the address controller (31), outputting to the processor array (34) a processor element enable signal and a step type signal that determines the step type, outputting an MVG control signal to the PMVG (35) while receiving from it the index corresponding to the minimum error value, and outputting a motion estimation control signal to the HMVG MCD (36); the foregoing elements constituting a half-pixel motion estimation apparatus for a video encoder.

2. The half-pixel motion estimation apparatus for a video encoder according to claim 1, wherein the processor array (34) has nine processor elements (PE0 to PE8) (34a to 34i) arranged in parallel to calculate the SAD; each processor element receives in parallel the upper and lower data (P, Q) and the 1-bit step type control signal (step type) from the processor controller (37), and receives its 1-bit processor element enable signal (PE0-E) to (PE8-E) from the processor controller (37) and its processor element memory selection signal (PE0-MSEL) to (PE8-MSEL) generated and sent by the multiplexer selection generator (38); and the processor elements output SAD values using the DB data (DB) from the reference block memory (33), which are delayed successively as they pass through the processor elements (PE0 to PE7) (34a to 34h).

2. The half-pixel motion estimation apparatus for a video encoder according to claim 1, wherein each processor element comprises: a search window multiplexer (38) that selects between the upper data (P) and the lower data (Q) input from the search window memory (32) according to the processor element memory selection signals (PE0-MSEL) to (PE8-MSEL) generated and sent by the multiplexer selection generator (38);

A delay block (39) that receives the DB data (DB) from the reference block memory (33), delays them according to the 1-bit step type control signal (step type) from the processor controller (37), and outputs them to the following processor elements (PE1 to PE8) (34b to 34i);

A subtraction block (40) that receives the upper and lower data (P, Q) from the search window multiplexer (38) and the delayed DB data (DB) from the delay block (39) and compares them by subtraction;

and an arithmetic block (41) that receives the result of the comparison by subtraction from the subtraction block (40) and calculates and outputs a 16-bit SAD value through the absolute-value and accumulation operations of the SAD computation.

2. The half-pixel motion estimation apparatus for a video encoder according to claim 1, wherein, so that input and operations are performed four pixels at a time, each processor element has four delay units (42)(43)(44)(45) and four search window multiplexers (46)(47)(48)(49) arranged in parallel;

has four subtraction units (50)(51)(52)(53) arranged in parallel for the subtraction and absolute-value operations, which must be performed separately for each of the four pixels; and outputs the SAD value by passing their outputs in turn through the adders (54)(55)(56) and the operation block (57).

A half-pixel motion estimation method for a video encoder, wherein half-pixel motion estimation is performed by linear approximation using Equation 1,

$$2\{E(-1) - E(0)\} < E(1) - E(0)$$

and Equation 2,

$$E(-1) - E(0) > 2\{E(1) - E(0)\}$$

The delay block is of two types, an a-type delay block and a b-type delay block; the a-type delay block delays by two clocks in steps 1, 2, and 3 and by one clock in steps 4 and 5, and the b-type delay block delays by 28 clocks in steps 1, 2, and 3 and by 14 clocks in steps 4 and 5.

7. The half-pixel motion estimation method for a video encoder according to claim 6, wherein in steps 1 and 2, 36 points at two-pixel spacing are searched instead of the existing 9 points; in step 3, 9 points at two-pixel spacing are searched; and in step 4, 9 points at one-pixel spacing are searched.