KR20000066066A

KR20000066066A - Method for determining the number of processing element in moving picture estimating algorithm

Info

Publication number: KR20000066066A
Application number: KR1019990012911A
Authority: KR
Inventors: 정의철
Original assignee: 윤종용; Samsung Electronics Co., Ltd. (삼성전자 주식회사)
Priority date: 1999-04-13
Filing date: 1999-04-13
Publication date: 2000-11-15
Also published as: KR100571907B1

Abstract

PURPOSE: A method for determining the number of processing elements in a moving-picture estimation algorithm is provided to find the optimal number of processing elements corresponding to the sizes of the basic block and the search point. CONSTITUTION: The method includes obtaining the minimum number of processing elements (N_PE1) corresponding to the system and application specifications (305). The size of the minimum basic block and the size of the minimum search point are compared (314). Whichever is found to be smaller in this comparison is taken as the block to be divided, and that block is divided sequentially (315, 316). A division processing-element count (N_PE2) corresponding to the number of pixels contained in each divided block is calculated. If an N_PE2 value equals N_PE1, that value is taken as the optimal number of processing elements. Otherwise, the N_PE2 value that is greater than N_PE1 and closest to it is taken as the optimal number of processing elements.

Description

Method for determining the number of processing element in moving picture estimating algorithm

The present invention relates to a method for designing a motion estimation algorithm in a video signal processing apparatus, and more particularly to a method for optimizing the number of processing elements that execute the set of operations required for video motion estimation.

With the development of digital image signal processing technology, image compression is no longer confined to communication media and is rapidly spreading into home appliances, computers, and other fields.

One of the representative standards of digital image signal processing is the moving-picture compression algorithm defined by MPEG (Moving Picture Experts Group).

FIG. 1 is a block diagram of an MPEG encoder to which a general moving-picture compression algorithm is applied.

Briefly, the video signal is input to the source encoder 11, which performs DCT, quantization, motion estimation, and similar operations to compress the amount of information. The video multiplexing encoder 12 then applies hierarchical coding and variable-length coding to generate standard-compliant data, and the transmission buffer 13 keeps the amount of transmitted data constant. The encoding controller 14 monitors the buffer occupancy reported by the transmission buffer 13 and instructs the source encoder 11 and the video multiplexing encoder 12 to increase or decrease the amount of information they generate.

In the moving-picture compression algorithm of such an encoder, the part that requires the most computation is motion estimation for video coding. The motion estimator in the source encoder 11 often accounts for up to 50% of the chip's total processing power. It is therefore impractical to run the motion estimator in software on a RISC processor together with the other system modules, so the motion estimator is usually implemented in hardware.

When the motion estimator is built as an independent hardware chip, changing the hardware structure to raise its operating speed makes the chip so large that integrating the video codec as a system on a chip (SoC) becomes almost impossible.

On the other hand, focusing on reducing hardware size lowers the operating speed and makes real-time operation difficult. An appropriate trade-off between operating speed and hardware size is therefore needed.

In a motion estimator, the constituent hardware modules vary with the algorithm used, and the hardware size varies with those modules.

Most motion estimation algorithms use the mean of absolute differences (MAD) as the matching criterion for measuring the difference between the previous frame and the current frame. Computing the MAD requires a set of operations (subtraction, absolute value, and addition), and the module that performs them is called a processing element (PE). The size and number of these processing elements strongly affect the size and performance of the motion estimator.
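A minimal Python sketch of the MAD criterion follows, assuming blocks are equal-sized nested lists of integer pixel values. Each inner-loop iteration does the subtract, absolute-value, and accumulate sequence that one processing element performs per pixel; the function name `mad` is illustrative and not taken from the patent.

```python
def mad(cur_block, ref_block):
    """Mean of absolute differences between two equally sized pixel blocks."""
    total = 0
    count = 0
    for cur_row, ref_row in zip(cur_block, ref_block):
        for c, r in zip(cur_row, ref_row):
            total += abs(c - r)  # per-pixel PE work: subtract, absolute value, add
            count += 1
    return total / count


# Two 2x2 blocks whose absolute differences are 1, 0, 2, 0.
print(mad([[1, 2], [3, 4]], [[2, 2], [1, 4]]))  # 0.75
```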

When a motion estimator is implemented, the basic searcher is a systolic array made up of processing elements that compute the MAD.

In the related art, as shown in FIG. 2, the number of processing elements (the part of the basic searcher that performs the actual computation) was chosen empirically by the designer from the search-point size or the macroblock size, according to the hardware structure in use and the system requirements.

For example, in full search, the best-known block-matching algorithm, with a 16×16 macroblock and a 32×32 search area, 256 processing elements could simply be used for the best performance. In that case, however, the processing elements occupy an excessively large chip area. If the motion estimator is intended for low-bit-rate transmission such as H.263, this wastes silicon area, which translates directly into chip cost.
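The cost of full search can be seen in a small Python sketch. It assumes frames are nested lists of integers and, to match the (2×16)² candidate count used in this document's full-search formula, searches displacements in the half-open range [-r, r); the helper names `sad` and `full_search` are hypothetical. It uses the sum rather than the mean of absolute differences, which selects the same best match for a fixed block size.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n, r):
    """Exhaustive block matching over displacements dx, dy in [-r, r).

    Returns (best_sad, dx, dy); candidates that would read outside the
    reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-r, r):
        for dx in range(-r, r):
            top, left = by + dy, bx + dx
            if top < 0 or left < 0 or top + n > height or left + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 16x16 reference frame with distinct pixel values.
ref = [[16 * y + x for x in range(16)] for y in range(16)]
# Current frame built from ref so that the true motion is (dx=2, dy=-1).
cur = [[ref[y - 1][x + 2] if y >= 1 and x + 2 < 16 else 0 for x in range(16)]
       for y in range(16)]
print(full_search(cur, ref, bx=4, by=4, n=4, r=4))  # (0, 2, -1)
```

With `n=16` and `r=16`, each macroblock visits (2×16)² = 1024 candidates of 16² = 256 pixel operations; a 256-PE array can compute all 256 pixel differences of one candidate at once, which is why it is fast but large.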

Thus, because the related art sets the number of processing elements from the designer's experience rather than by an objective calculation, it can in some cases cause a large loss in the size and performance of the motion estimator.

To solve the above problems, the technical object of the present invention is to provide a method for determining the number of processing elements in a moving-picture estimation algorithm that determines the optimal number of processing elements according to the sizes of the basic block and the search point.

FIG. 1 is a block diagram of a general MPEG encoder.

FIG. 2 is a flowchart of a method for determining the number of processing elements according to the related art.

FIG. 3 is a flowchart of a method for determining the number of processing elements in a moving-picture estimation algorithm according to the present invention.

To achieve the above technical object, the method according to the present invention for determining the optimal number of processing elements constituting a motion estimator comprises: (a) obtaining the minimum feasible number of processing elements (N_PE1) corresponding to the system and application specifications; (b) comparing the size of the minimum basic block with the size of the minimum search point; (c) determining whichever is smaller in the comparison of step (b) to be the block to be divided, and dividing it sequentially; (d) calculating a division processing-element count (N_PE2) equal to the number of pixels in each block produced by the sequential division of step (c); and (e) comparing N_PE1 with N_PE2 and, if one of the N_PE2 values equals N_PE1, taking that value as the optimal number of processing elements (N_PEO), and otherwise taking as N_PEO the N_PE2 value, among those greater than N_PE1, that is closest to N_PE1.
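Steps (a) to (e) can be sketched in Python as below. This is an illustration under assumptions, not the patent's own implementation: blocks are hypothetical (rows, cols) tuples; bisection halves an even dimension; a region with an odd pixel count is divided into single rows, as the embodiment does for its 5×5 search region; and for hierarchical search each layer is processed separately, keeping the smallest feasible per-layer result. The layer sizes in the usage example (a 4×4 block against 16×16 search points, then 8×8 and 16×16 blocks against 5×5 search regions) are taken from the embodiment and its N_OP formula.

```python
import math


def min_feasible_pe(n_op, f_c):
    """Step (a): Equation 1, N_PE1 ~ N_OP / F_C, rounded up to an integer."""
    return math.ceil(n_op / f_c)


def split_sizes(rows, cols):
    """Steps (c)-(d): pixel count (N_PE2) after each successive division."""
    sizes = []
    while True:
        if (rows * cols) % 2 == 0:
            # Even pixel count: bisect along an even dimension.
            if rows % 2 == 0:
                rows //= 2
            else:
                cols //= 2
        elif rows > 1:
            rows = 1  # Odd pixel count: divide into single rows.
        else:
            break  # A single odd-length row is not divided further.
        sizes.append(rows * cols)
    return sizes


def optimal_pe(n_pe1, block, search):
    """Steps (b)-(e) for one layer; block and search are (rows, cols) tuples."""
    # Steps (b)-(c): the region with fewer pixels is the one that gets divided.
    rows, cols = min(block, search, key=lambda rc: rc[0] * rc[1])
    best = rows * cols if rows * cols >= n_pe1 else None
    for n_pe2 in split_sizes(rows, cols):
        if n_pe2 < n_pe1:
            break  # Too few PEs; keep the previously stored value.
        best = n_pe2  # Feasible, and closer to N_PE1 than any earlier value.
        if n_pe2 == n_pe1:
            break  # An exact match is optimal.
    return best


def optimal_pe_hierarchical(n_pe1, layers):
    """Run the per-layer procedure on each layer; keep the value closest to N_PE1."""
    results = [optimal_pe(n_pe1, block, search) for block, search in layers]
    feasible = [r for r in results if r is not None]
    return min(feasible) if feasible else None


# HSBMA3S example: N_OP = 148.45 MOPS at F_C = 30 MHz.
n_pe1 = min_feasible_pe(148.45e6, 30e6)
layers = [((4, 4), (16, 16)),   # top layer: the 4x4 block is smaller
          ((8, 8), (5, 5)),     # middle layer: the 5x5 search region is smaller
          ((16, 16), (5, 5))]   # lower layer
print(n_pe1, optimal_pe_hierarchical(n_pe1, layers))  # 5 5
```

Because successive divisions only shrink the block, the first N_PE2 that falls below N_PE1 ends the search, and the last stored value is the feasible one closest to N_PE1.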


Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The algorithm according to the present invention for determining the optimal number of processing elements can be broadly divided into a first process (steps 301 to 305), which calculates the minimum number of processing elements, and a second process (steps 306 to 316), which calculates the optimal number of processing elements from the size of the basic block and the size of the search point.

The first process, which calculates the minimum number of processing elements, proceeds as follows.

First, the application is chosen as an initial design decision. In this embodiment, the application is H.263 (step 301).

Next, system specifications such as the system clock, image format, and frames per second are determined. Here the system is required to handle CIF (352×288 pixels) at 30 frames/s, the search range in the previous frame is -16 to 16, and the search block in the current frame is 16×16 (step 302).

Then, if the full search algorithm is selected, the amount of computation per second (N_OP) is as follows (step 303).

$$N_{OP} = \frac{352 \times 288}{16^2} \times (2 \times 16)^2 \times 16^2 \times 30 \approx 3114.27\ \text{MOPS}$$

If the clock frequency (F_C) is set to 30 MHz, the minimum number of processing elements (N_PE) follows from the previously published Equation 1, N_PE ≈ N_OP / F_C, as 3114.27M / 30M ≈ 103.8 (step 304).

Therefore, when the full search method is adopted, the minimum feasible number of processing elements (N_PE1) is the integer 104 (step 305).
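The full-search figures can be checked with a short Python calculation, assuming M = 10^6 and rounding Equation 1 up to the next integer; `full_search_ops` is a hypothetical helper name.

```python
import math


def full_search_ops(width, height, block, search_range, fps):
    """N_OP for full search: macroblocks x candidates x pixels per block x fps."""
    macroblocks = (width * height) // block**2
    candidates = (2 * search_range) ** 2  # (2 x 16)^2 candidates per macroblock
    return macroblocks * candidates * block**2 * fps


n_op = full_search_ops(352, 288, block=16, search_range=16, fps=30)
n_pe1 = math.ceil(n_op / 30e6)  # Equation 1 with F_C = 30 MHz, rounded up

print(f"N_OP  = {n_op / 1e6:.2f} MOPS")  # N_OP  = 3114.27 MOPS
print(f"N_PE1 = {n_pe1}")                # N_PE1 = 104
```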

Next, consider the case in which motion vectors are searched with a hierarchical search method that exploits correlation, instead of the full search method.

As an example, the hierarchical search method used here is HSBMA3S (Hierarchical Search Block Matching Algorithm with 3 candidates and Spatial correlation), one of the recently published existing algorithms.

The HSBMA3S method greatly reduces the amount of computation but requires more complex circuitry than the full search method.

In addition, the systolic array method is widely used: a fixed number of pixels is processed simultaneously in a pipelined manner, and the same processing elements are reused repeatedly for this purpose. From a speed standpoint this method lowers the performance of the motion estimator, but because it reduces the number of processing elements it can shrink the chip considerably. To determine the number of processing elements, the designer uses the size of the search block or the search area. For full search, the macroblock is usually the one that is divided, because the macroblock is generally smaller than the search area and is therefore the advantageous one to divide evenly. In contrast, hierarchical search methods split the search area and the macroblock into suitable sizes and use blocks of different sizes in each layer; there, the number of processing elements is determined from whichever of the divided search area and macroblock is smaller.

When HSBMA3S, one of the hierarchical search methods, is selected, the amount of computation per second (N_OP) is as follows (step 303).

$$N_{OP} = \left\{(9^2 \times 4^2) + (3 \times 5^2 \times 8^2) + (5^2 \times 16^2)\right\} \times \frac{352 \times 288}{16^2} \times 30 \approx 148.45\ \text{MOPS}$$

The minimum number of processing elements (N_PE) is then calculated with Equation 1 as follows (step 304).

$$N_{PE} = \frac{148.45\text{M}}{30\text{M}} \approx 4.95$$

Therefore, when HSBMA3S is selected, the minimum feasible number of processing elements (N_PE1) is 5, the smallest integer greater than or equal to 4.95 (step 305).
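A similar Python check for HSBMA3S reads each layer's terms directly off the N_OP formula (the factor 3 on the middle term is taken as given) and assumes M = 10^6 and F_C = 30 MHz; the names `TERMS` and `ops_per_macroblock` are illustrative.

```python
import math

# (multiplier, search points per side, block side) for each term of N_OP.
TERMS = [(1, 9, 4), (3, 5, 8), (1, 5, 16)]

ops_per_macroblock = sum(m * s**2 * b**2 for m, s, b in TERMS)  # 12496
macroblocks = (352 * 288) // 16**2                               # 396 for CIF
n_op = ops_per_macroblock * macroblocks * 30                     # 30 frames/s
n_pe1 = math.ceil(n_op / 30e6)                                   # F_C = 30 MHz

print(f"N_OP  = {n_op / 1e6:.2f} MOPS")  # N_OP  = 148.45 MOPS
print(f"N_PE1 = {n_pe1}")                # N_PE1 = 5
```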

Next, the second process, which obtains the optimal number of processing elements when HSBMA3S is selected, is described.

Assume that, as initial design settings, the HSBMA3S-based ME (motion estimation) algorithm is selected as the motion vector search method and a pipelined structure for H.263 is selected as the hardware (steps 306 to 307).

In HSBMA3S, the macroblock used in the top layer is a 4×4 block (16 pixels). With the minimum search point size set to a 16×16 block (steps 308 to 309), the basic block and search point sizes are compared, and the smaller one is judged to be the block to be divided. Since the basic block here is smaller than the search point, the basic block is judged to be the block to be divided (steps 310 to 311).

Next, because the 4×4 macroblock selected for division contains an even number of pixels, it is divided in half by bisection; the number of pixels in each resulting unit block is counted, and the corresponding division processing-element count (N_PE2) is calculated. After the first division, N_PE2 is therefore 8, half of 16 pixels (steps 312 to 313).

Then the minimum feasible number of processing elements (N_PE1) determined in step 305 is compared with the division processing-element count (N_PE2) calculated in step 313 (step 314).

In step 314, N_PE2 after the first division (8) is greater than N_PE1 (5), so 8 is stored in the memory that holds the optimal number of processing elements (N_PEO), and the process returns to step 312 to divide the once-divided block again. That is, each 8-pixel block from the first division is split in two, giving blocks that each handle 4 pixels.

Then, in step 313, N_PE2 is 4, because each block from the second division consists of 4 pixels.

Next, in step 314, N_PE2 after the second division (4) is compared with N_PE1 (5).

Since N_PE2 after the second division is smaller than N_PE1, step 316 outputs 8, the N_PEO stored in memory before the current division.
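The top-layer loop of steps 312 to 316 can be traced with the Python sketch below, assuming the block's pixel count is halved for as long as it stays even; `top_layer_pe` is a hypothetical name. For the 4×4 block (16 pixels) and N_PE1 = 5 it visits 8 and 4 and returns the stored value 8.

```python
def top_layer_pe(pixels, n_pe1):
    """Bisect a block (steps 312-313) until N_PE2 drops below N_PE1 (step 314).

    Returns the list of N_PE2 values visited and the stored N_PEO (step 316).
    """
    trace = []
    stored = None
    n_pe2 = pixels
    while n_pe2 % 2 == 0:
        n_pe2 //= 2            # steps 312-313: halve and count pixels
        trace.append(n_pe2)
        if n_pe2 < n_pe1:      # step 314: too few PEs, stop dividing
            break
        stored = n_pe2         # step 315: keep the feasible value in memory
        if n_pe2 == n_pe1:
            break
    return trace, stored


print(top_layer_pe(16, 5))  # ([8, 4], 8)
```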

However, the N_PEO of 8 determined in the top layer is considerably larger than the minimum feasible number of 5, so it incurs nearly 40% or more overhead.

Steps 308 to 316 are therefore applied to the middle and lower layers, and the optimal number of processing elements is obtained again in the same way.

The search ranges used in the middle and lower layers are -2 to 2.

In this case, the comparison in step 310 shows that the search point region is smaller than the basic block, so in step 311 the search point region is judged to be the block to be divided.

In step 312, the initial 5×5 search point region is divided. Because the region contains an odd number of pixels, it cannot be bisected, so it is divided row by row instead. One row contains 5 pixels, so N_PE2 is determined to be 5 (step 313).

In step 314, N_PE1 is compared with the calculated N_PE2. Both equal 5, so in step 315 the N_PEO already stored in memory is replaced, changing it from 8 to 5.

The final N_PEO is therefore 5, which in this case is a number of processing elements with almost no overhead.
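The odd-sized case can be sketched the same way. Under the assumptions that a row-wise division yields one row's pixel count as N_PE2, and that the stored N_PEO is replaced only by a feasible value closer to N_PE1, the hypothetical function below reproduces the replacement of 8 by 5.

```python
def row_split_update(rows, cols, n_pe1, stored):
    """Divide an odd-sized region by rows and update the stored N_PEO.

    Returns (N_PE2, new N_PEO).
    """
    if (rows * cols) % 2 == 0:
        raise ValueError("even-sized regions are bisected instead")
    n_pe2 = cols  # step 313: pixels in one row of the region
    # Steps 314-315: replace the stored value if N_PE2 is feasible and closer to N_PE1.
    if n_pe2 >= n_pe1 and (stored is None or n_pe2 < stored):
        stored = n_pe2
    return n_pe2, stored


print(row_split_update(5, 5, n_pe1=5, stored=8))  # (5, 5)
```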

The processing elements above run concurrently in a pipeline, handling as many operations at once as there are processing elements. For example, to perform all computations for the pixels of 25 search points with 5 processing elements, the 5 processing elements are simply used 5 times in succession.
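A simple model of this reuse, ignoring pipeline fill and drain latency, assigns work items to the processing elements one pass at a time; `pe_schedule` is a hypothetical name.

```python
import math


def pe_schedule(num_items, num_pe):
    """Assign work items to num_pe processing elements, one pass at a time."""
    passes = math.ceil(num_items / num_pe)
    return [list(range(p * num_pe, min((p + 1) * num_pe, num_items)))
            for p in range(passes)]


plan = pe_schedule(25, 5)  # 25 search points on 5 PEs
print(len(plan))           # 5
print(plan[0])             # [0, 1, 2, 3, 4]
```

Halving the PE count roughly doubles the number of passes, which is the speed-for-area trade-off the systolic approach makes.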

In conclusion, by choosing 5 processing elements, the designer can build the motion estimator with the optimal number of processing elements.

For convenience of explanation, the embodiment above shows how to obtain the optimal number of processing elements when HSBMA3S is selected. The same method can naturally be applied to the full search method to obtain its optimal number of processing elements.

Next, consider how much the number of processing elements (PEs) affects the total area of the motion estimator when the present invention is applied.

In general, one processing element takes about 256 gates of chip area. This figure was obtained by coding a processing element of typical structure in the VHDL hardware description language and synthesizing it with a Synopsys tool. When the full search method is adopted to process QCIF (176×144) at 30 frames/s, the formula introduced above gives a minimum of 26 processing elements, from which the present invention obtains an optimal count of 32 processing elements.
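The QCIF figures can be reproduced with the sketch below, assuming F_C = 30 MHz as in the CIF example, M = 10^6, and that the 16×16 macroblock is bisected while the half still meets N_PE1.

```python
import math

# Full search on QCIF at 30 frames/s (16x16 macroblocks, search range -16 to 16).
macroblocks = (176 * 144) // 16**2              # 99
n_op = macroblocks * (2 * 16)**2 * 16**2 * 30  # operations per second
n_pe1 = math.ceil(n_op / 30e6)                 # assumes F_C = 30 MHz

# Bisect the 256-pixel macroblock while the half is still >= N_PE1.
n_peo = 16 * 16
while n_peo % 2 == 0 and n_peo // 2 >= n_pe1:
    n_peo //= 2

print(n_pe1, n_peo)  # 26 32
```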

Accordingly, the gate count when chip area is not considered (N_G1) is

$$N_{G1} = 256\ \text{gates} \times 256\ \text{PEs} = 65536\ \text{gates}$$

and the gate count when the optimal number of PEs is obtained with the present invention (N_GO) is

$$N_{GO} = 256\ \text{gates} \times 32\ \text{PEs} = 8192\ \text{gates}$$

Thus, when chip area is not considered, 256 PEs are used, and their total gate count is more than five times the gate count of the rest of the motion estimator. A motion estimator designed without regard to the application's specifications therefore carries roughly 4.2 times the overhead of one designed with them in mind.
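The PE-array portion of this comparison reduces to one multiplication per design, sketched below with the text's figure of about 256 gates per PE. It covers only the PE gates; the non-PE gate count behind the 4.2× estimate is not given, so that figure is not reproduced here. `pe_gates` is a hypothetical helper.

```python
GATES_PER_PE = 256  # approximate synthesized gates per PE, from the text


def pe_gates(num_pe, gates_per_pe=GATES_PER_PE):
    """Total gates spent on the processing-element array."""
    return num_pe * gates_per_pe


n_g1 = pe_gates(256)  # one PE per macroblock pixel
n_go = pe_gates(32)   # optimal count for QCIF full search
print(n_g1, n_go, n_g1 / n_go)  # 65536 8192 8.0
```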

As described above, the present invention finds the optimal number of processing elements according to the sizes of the basic block and the search point for each type of application. This not only reduces overhead but also greatly reduces the required gate count, and thus the chip area.

Claims

A method of determining the optimal number of processing elements that make up a motion estimator, comprising:

(a) obtaining the minimum feasible number of processing elements (N_PE1) corresponding to the system and application specifications;

(b) comparing the size of the minimum basic block with the size of the minimum search point;

(c) determining whichever is smaller in the comparison of step (b) to be the block to be divided, and dividing it sequentially;

(d) calculating the division processing-element count (N_PE2) corresponding to the number of pixels in each block produced by the sequential division of step (c); and

(e) comparing N_PE1 with N_PE2 and, if one of the N_PE2 values equals N_PE1, determining that value to be the optimal number of processing elements (N_PEO), and otherwise determining as N_PEO the N_PE2 value, among those greater than N_PE1, that is closest to N_PE1.

The method for determining the number of processing elements in a moving-picture estimation algorithm of claim 1, wherein in step (a) the minimum feasible number of processing elements (N_PE1) is calculated by Equation 1:

[Equation 1]

$$N_{PE1} \approx \frac{N_{OP}}{F_C}$$

(where N_OP is the total amount of computation per second and F_C is the clock frequency).

The method of claim 1, wherein the steps (b) to (d) are executed sequentially from the highest layer to the lowest layer.

The method for determining the number of processing elements in a moving-picture estimation algorithm of claim 1, wherein the division in step (c) is performed by bisection when the number of pixels in the block is even, and row by row when it is odd.