JP5146270B2

JP5146270B2 - Normalized correlation processor

Info

Publication number: JP5146270B2
Application number: JP2008285651A
Authority: JP
Inventors: 毅 ▲葛▼
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2008-11-06
Filing date: 2008-11-06
Publication date: 2013-02-20
Anticipated expiration: 2028-11-06
Also published as: JP2010113522A

Description

The present invention relates to a normalized correlation processing apparatus for image matching.

In image processing applications, one approach to image pattern matching is to compute a normalized correlation value or a sum of absolute differences (SAD). Because this computation is very expensive on a CPU, it is often accelerated with dedicated circuitry, and various circuit configurations have been proposed.

In normalized correlation, a normalized correlation value is computed between the image being searched for (called the texture) and a rectangular sub-region (called the window) of the search target image. For a given texture, this is typically repeated while the window position in the search target image is moved in small steps, for example one pixel at a time.

The following formula gives the normalized correlation value r. (The formula appears as a figure in the original publication and is not reproduced in this text.)
Here, f is the pixel value (luminance) of the source image (window), g is the pixel value (luminance) of the template image (texture), and n is the number of valid pixels in the template region. For matching, r is calculated while the texture g is shifted little by little. The position at which the texture is applied is hereinafter called the “window”. In the following, g is written as x and f as y. Σx, (Σx)², and Σx² do not change from window to window, so they are calculated in advance. Σxy, Σy², and (Σy)² change with each window, so they are calculated every time. These are called “correlation element values”.
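As a hedged reconstruction only: the standard normalized correlation coefficient, written with x for the texture and y for the window, uses exactly the sums named above (plus Σy). The form below is an assumption based on that standard definition, not a copy of the formula in the original publication.

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left(n\sum x^{2} - \left(\sum x\right)^{2}\right)
                \left(n\sum y^{2} - \left(\sum y\right)^{2}\right)}}
```

In this form the texture-only terms Σx, (Σx)², and Σx² can be computed once, while Σxy, Σy, and Σy² must be recomputed for each window.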

FIG. 13 is a diagram illustrating how the normalized correlation value is calculated.
Only the correlation element values are calculated by the accelerator; the correlation value itself is then calculated in software. The upper-left coordinate of the window is taken as the window position. The correlation value for window position (0, 0) is calculated from the texture image x and the portion of the original image y inside the window. The window is then moved one pixel horizontally, and the correlation value is obtained for window position (1, 0). In this way, correlation values are obtained over the whole original image y, for example at window position (3, 3).
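The split between accelerator and software described above can be sketched as follows. This is a minimal illustration, not the patented circuit: the function names are hypothetical, images are plain nested lists, and the final formula is the standard normalized correlation coefficient, which is assumed here because the publication's own formula is not reproduced in this text.

```python
import math

def correlation_elements(texture, image, wx, wy):
    """Per-window sums (the part the text assigns to the accelerator).

    (wx, wy) is the upper-left pixel of the window in the image.
    """
    sxy = sy = syy = 0
    for j, row in enumerate(texture):
        for i, x in enumerate(row):
            y = image[wy + j][wx + i]
            sxy += x * y   # contributes to sum of x*y
            sy += y        # contributes to sum of y
            syy += y * y   # contributes to sum of y^2
    return sxy, sy, syy

def normalized_correlation(texture, image, wx, wy):
    """Final value r (the part the text assigns to software)."""
    n = len(texture) * len(texture[0])
    # Texture-only sums do not depend on the window and could be cached.
    sx = sum(v for row in texture for v in row)
    sxx = sum(v * v for row in texture for v in row)
    sxy, sy, syy = correlation_elements(texture, image, wx, wy)
    denom = math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))
    if denom == 0:
        return 0.0  # flat texture or flat window; 0 is a convention chosen here
    return (n * sxy - sx * sy) / denom

texture = [[1, 2], [3, 4]]
image = [[0, 0, 0], [0, 1, 2], [0, 3, 4]]
r_match = normalized_correlation(texture, image, 1, 1)  # window equals texture
r_other = normalized_correlation(texture, image, 0, 0)
```

Sliding the window means calling `normalized_correlation` once per window position; only `correlation_elements` touches image pixels.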

FIG. 14 is a diagram illustrating how window positions are chosen.
There are two ways to choose window positions: setting them at regular positions, or placing them at random positions.

When window positions are regular, the windows for which correlation values are wanted are offset from one another regularly, by one or a few pixels. When window positions are random, they are scattered with no regularity.

FIG. 15 is a diagram outlining accelerator processing when window positions are regular.
Only the correlation element values (Σxy, Σy, Σy²) are calculated by the accelerator, and the correlation value r is finally calculated in software. To calculate Σxy, at each window position (0, 0), (1, 0), ..., (3, 3), ..., the multiplier 10 computes the product xy of a texture pixel and a window pixel, and the adder 11 accumulates the products to obtain Σxy. The result is stored in the register 12. Calculating Σxy takes n × m machine cycles, corresponding to the window size of n × m pixels. In FIG. 15 the window size is 6 × 4, so one Σxy takes 24 cycles.

To speed up this processing, parallel processing can be used. It can be done either per window or per pixel.

In per-window parallel processing, each arithmetic unit handles a different window. Latency per window is proportional to the window size n × m. Throughput is multiplied by the number of arithmetic units. However, when window positions are given individually and are random, windows that are far apart cannot share accesses to the same pixels, so, unlike the case of regular window positions, this method is not suited to parallel processing.

Another method processes multiple pixels of one window in parallel. Here a single arithmetic unit set capable of processing multiple pixels per cycle is devoted entirely to one window. Because the parallelism is within a single window, it is unaffected by window position and suits the case where window positions are random.

FIG. 16 is a diagram showing an example hardware configuration for image template matching.
It shows the case where the window positions to be processed are dense and fixed: the 16 window positions of the 4 × 4 grid from (0, 0) to (3, 3) are processed in parallel. Data for Areas 1 to 16 from one input image is input to GSEUs (Gray-scale Search Engine Units). One GSEU is provided for each of Areas 1 to 16, so that the areas are processed in parallel. Each GSEU contains arithmetic units that calculate the normalized correlation element values, operating on the input data of its area and the reference data. Depending on the calculation, each arithmetic unit performs one, two, or three operations. The GSEUs run at, for example, a 300 MHz clock, and the results are output via a bus.

When windows overlap, only one pixel of the reference image needs to be referenced in a given cycle, which makes this an efficient method, but only when window positions are regular. If they are random, only one window can be processed at a time.

FIG. 17 is a diagram illustrating parallel processing per window.
Each arithmetic unit handles a different window. In FIG. 17 there are 16 arithmetic units, each consisting of a multiplier 10, an adder 11, and a register 12, and each is assigned one of the 16 windows from (0, 0) to (3, 3). The coordinates giving a window's position are those of its upper-left pixel. Latency per window is proportional to n × m, the number of pixels in one window. Throughput improves by a factor equal to the number of arithmetic units compared with a single unit.

Alternatively, a single arithmetic unit set capable of processing 4, 8, 9, or 16 pixels per cycle can be devoted entirely to one window, or the window can be divided into tile-shaped regions of 2x2, 4x2, 3x3, or 4x4 pixels (for 4, 8, 9, or 16 pixels respectively) and processed region by region.

FIG. 18 is a diagram illustrating parallel processing of multiple pixels within one window.
FIG. 18 shows a case where a 6 × 8 texture is divided into 4 × 2 tile regions and its pixels are read and processed tile by tile. With a 4 × 2 tile as the unit, one window is read in 6 cycles. The input image is likewise read in units of 4 × 2 tiles. The arithmetic unit, shown here for the Σxy calculation, has eight multipliers 10-1 to 10-8, plus an adder 11 and a register 12.
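The cycle counts quoted for FIG. 15 (24 cycles) and FIG. 18 (6 cycles) follow from simple tile arithmetic. The sketch below is illustrative; the function name is hypothetical, and it assumes one tile is consumed per cycle and that the 6 × 8 texture is 8 pixels wide by 6 high with tiles 4 wide by 2 high, an orientation the text does not state explicitly.

```python
import math

def cycles_per_window(width, height, tile_w=1, tile_h=1):
    """Cycles to stream one window when one tile is consumed per cycle."""
    return math.ceil(width / tile_w) * math.ceil(height / tile_h)

# FIG. 15: one pixel per cycle over a 6 x 4 window.
sequential = cycles_per_window(6, 4)

# FIG. 18: eight pixels per cycle (4 x 2 tile) over a 48-pixel texture.
tiled = cycles_per_window(8, 6, tile_w=4, tile_h=2)
```

Partial tiles at the right or bottom edge still cost a full cycle in this model, which is why the function rounds up.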

FIG. 19 is a diagram illustrating the configuration of a conventional normalized correlation processing apparatus.
The window position control parameters assume regular access: they include the position of the first window, the thinning width, and the numbers of windows vertically and horizontally. As shown in FIG. 19(a), the control register 25 stores the parameters given to the control circuit 26, which controls window reading: for example, the buffer fill size bh, bw; the texture size th, tw; the first window position x, y; the vertical and horizontal thinning widths cx, cy; and the vertical and horizontal window counts nx, ny. As shown in FIG. 19(b), the memory 20 is connected via the bus 21 to the texture buffer 22 and the window buffer 23. The texture buffer 22 stores the template to be matched. The window buffer 23 is loaded with the image data of the region to be matched, read from the original image stored in the memory 20. The normalized correlation calculation circuit 24 reads pixel data from the texture buffer 22 and the window buffer 23 and calculates the normalized correlation value.
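To make the regular-access parameters concrete, the sketch below expands them into explicit window positions. It is an illustration under assumptions: the function name is hypothetical, cx and nx are taken to apply to the horizontal direction and cy and ny to the vertical direction, and windows are enumerated row by row. The text does not specify the enumeration order.

```python
def regular_window_positions(x, y, cx, cy, nx, ny):
    """Expand first position (x, y), thinning widths (cx, cy) and
    window counts (nx, ny) into a list of upper-left coordinates."""
    return [(x + i * cx, y + j * cy)
            for j in range(ny)      # rows, top to bottom
            for i in range(nx)]     # columns, left to right

# The 16 densely packed positions (0, 0) .. (3, 3) used in the examples.
dense = regular_window_positions(0, 0, 1, 1, 4, 4)

# A thinned grid: every second pixel, starting at (2, 3).
thinned = regular_window_positions(2, 3, 2, 2, 2, 2)
```

A random layout cannot be described by these six numbers, which is the limitation the later window position array addresses.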

FIG. 20 is a diagram illustrating the configuration of a conventional double-buffered normalized correlation processing apparatus.
The window position control parameters are specified by the position of the first window and the thinning width, on the premise of regular access. As shown in FIG. 20(a), parameters are input from the control register 25 to the control circuit 26, which controls window reading. The parameters stored in the control register 25 are, for example, the buffer fill size bh, bw; the texture size th, tw; the first window position x, y; the vertical and horizontal thinning widths cx, cy; and the vertical and horizontal window counts nx, ny. As shown in FIG. 20(b), the memory 20 is connected to the bus 21 and exchanges data with the texture buffer 22, window buffer #0 23-1, and window buffer #1 23-2. Window buffer #0 23-1 and window buffer #1 23-2 hold different windows, and the selector 27 switches between them in turn to supply window data to the normalized correlation calculation circuit 24.

Double buffering in this way allows data supply and computation to proceed in parallel.
Conventional configurations have pursued speed on the premise that window positions are regular, for example computing at every position obtained by shifting one pixel at a time within some rectangle. In recent image processing applications, however, windows are increasingly placed at arbitrary, random positions in the search target image, and applying the conventional methods unchanged in such cases does not deliver adequate performance.

Speeding up requires parallel processing, which can be done per window or per pixel. It also requires batch hand-off processing, in which multiple windows are submitted to the hardware at once. When window positions can be assumed regular, per-window parallelism is more efficient at reducing memory traffic, so it has conventionally been used. With random positions, however, per-window parallelism means a memory access from a random address for each window processed in parallel, which is difficult to implement. Moreover, because batch hand-off schemes have assumed regular window positions, hand-off for random positions has not been considered.

For example, in conventional methods using per-window parallelism (Patent Documents 1 to 3), the window positions processed in parallel are regular, so with random positions batch hand-off is not possible and only one window can be processed at a time.

In the conventional method using parallel processing by image division (Patent Document 4), data locality is exploited to reduce memory load time while the image is divided for parallel processing. With random access, however, data locality cannot be exploited; even with many arithmetic units, memory access speed becomes the bottleneck and the units sit idle.
Patent Document 1: JP 09-009269 A
Patent Document 2: JP 10-134183 A
Patent Document 3: JP 2006-013873 A
Patent Document 4: JP 2008-84034 A

As described above, methods so far have assumed regular window positions. What is needed is an apparatus configuration that flexibly handles both regular and random positions at high speed.

Conventional normalized correlation circuits achieve their speed on the premise that search window positions are regular and densely packed. In recent image processing applications, however, window positions are increasingly random, and applying the conventional methods unchanged in such cases does not deliver adequate performance.

An object of the present invention is to provide a normalized correlation processing apparatus that can also handle random window positions.

The normalized correlation processing apparatus of the present invention is an apparatus for image pattern matching, comprising: template buffer means for holding a template for pattern matching; a plurality of search rectangle buffer means, each holding a search rectangle of the input image whose height and width can be specified arbitrarily; control register means for holding position information of the search rectangles and data for controlling writes to the search rectangle buffer means; and calculation means for calculating a normalized correlation value between the template stored in the template buffer means and the data within the search rectangle stored in one of the plurality of search rectangle buffer means.

According to the present invention, a normalized correlation processing apparatus that can also handle random window positions can be provided.

In this embodiment, the window positions to be processed in one batch hand-off can be specified individually for each window in a window position array, and when a window buffer is filled, the fill height and width bh, bw can be specified arbitrarily, independent of the texture size th, tw. The window buffer is double-buffered, so filling proceeds in parallel with computation. Buffer switching is performed automatically by a circuit that does not switch when the buffer currently in use contains the window to be processed next in array order. A control circuit also lets the hardware automatically calculate the region to be loaded into the other buffer, which makes the apparatus easier for the user to use. As a result, processing follows the same procedure whether or not window positions are regular.

FIGS. 1 and 2 are diagrams outlining an embodiment of the present invention.
As shown in FIG. 1(a), the control register 30 holds the buffer fill size bh, bw, the texture size th, tw, and the window position array. The buffer fill size lets the size of the rectangular region stored in a window buffer be specified arbitrarily, independent of the texture size. The texture size specifies the size of the texture. The window position array specifies window positions individually when they are random. When window positions are regular, they can also be specified by the vertical and horizontal sizes and the thinning distance relative to the first window position. The control circuit 31 consists of a buffer next-fill start position search circuit, which determines the start position of the next window to be loaded into the double-buffered window buffers, and a buffer switching determination circuit, which switches the buffer to be read according to which of the two buffers holds the window to be processed. These circuits make the switching decision and determine the buffer fill rectangle automatically from the parameters.

FIG. 1(b) shows the configuration of the normalized correlation processing apparatus. Texture data is sent from the memory 20 via the bus 21 to the texture buffer 22, and window data is sent to window buffer #0 23-1 and window buffer #1 23-2. In this embodiment, the size of the region stored in a window buffer can be specified arbitrarily. Window data is read from the window buffer selected by the selector 27 and input to the normalized correlation calculation circuit 24. Texture data from the texture buffer 22 is also input to the normalized correlation calculation circuit 24 and used, together with the window data, in the normalized correlation calculation. The normalized correlation calculation circuit 24 performs pixel-level parallel processing within one window.

In FIG. 2, parameters bh and bw specifying the height and width of the rectangular region loaded into a buffer are newly provided and can be set freely (th ≠ bh and tw ≠ bw are allowed). Because the fill rectangle size can be set freely, when randomly placed windows overlap, all the overlapping windows are loaded into a window buffer together. Windows that overlap one another can then be read without switching buffers.
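Whether a window can be read without switching buffers comes down to a rectangle containment test. The sketch below shows one plausible form of that test; it is an assumption for illustration, since the text does not give the circuit's logic, and the function name and argument layout are hypothetical.

```python
def window_in_buffer(window_pos, fill_pos, th, tw, bh, bw):
    """True if the th x tw window at window_pos lies entirely inside the
    bh x bw rectangle that was loaded starting at fill_pos.

    Positions are (x, y) upper-left coordinates; h is height, w is width.
    """
    wx, wy = window_pos
    bx, by = fill_pos
    return (bx <= wx and wx + tw <= bx + bw and
            by <= wy and wy + th <= by + bh)

# An 8 x 8 fill starting at (0, 0) with 4 x 4 windows:
inside = window_in_buffer((2, 1), (0, 0), th=4, tw=4, bh=8, bw=8)
outside = window_in_buffer((5, 1), (0, 0), th=4, tw=4, bh=8, bw=8)
```

With bh = th and bw = tw the test succeeds only for the window that started the fill, which reduces the scheme to an ordinary double buffer.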

The following parameters describe the performance of the normalized correlation processing apparatus.
・Basic data supply performance: the number of pixels that can be fetched from external memory per cycle
・Actual data supply performance: the number of pixels obtained per cycle when the cache effect of reusing pixels in the buffer is added to the basic data supply performance
・Computation performance: the number of pixels the arithmetic unit can process per cycle

FIG. 3 is a diagram illustrating operation when window positions are regular.
FIG. 3 assumes the following.
・Basic data supply performance < computation performance.
・The position array is in the order A, B, C, D, ..., L.
・bh and bw are set so that A-D, E-H, and I-L are each loaded in a single fill.
・The cache effect makes actual data supply performance > computation performance.
Actual data supply performance is given by the following expression.
(actual data supply performance = basic data supply performance + cache effect)

The operation is as follows.
1. Load the bh × bw rectangle starting at the upper-left coordinate of A into window buffer 0. The region covering A-D is loaded.
2. Calculate the correlation value of A. At the same time, the next-fill start window position search circuit finds E as the next fill start window position for window buffer 1, and buffer 1 is filled. The circuit then finds the following fill start window position, I. Because buffer 0 is in use, that fill is deferred until buffer 0 is released.
3. When the calculation of A completes, the buffer switching determination circuit checks whether B is contained in buffer 0.
4. B is contained, so B is calculated without switching buffers.
5. Calculation continues in the same way up to D.
6. When the calculation of D completes, the buffer switching determination circuit likewise checks whether E is contained in window buffer 0.
7. E is not contained, so the buffer is switched to window buffer 1, E is read, and E is calculated. In parallel, window buffer 0 is now free, so the deferred fill of the rectangle starting at I is performed. The next fill start window position is then determined, but everything up to the last window L is contained in window buffer 0, so filling ends.
8. As before, F, G, and H are calculated, with a buffer switching determination made each time a window finishes. The buffer does not switch until H.
9. When the buffer switching determination is made at the end of H, I is not contained in window buffer 1, so the buffer is switched to window buffer 0 and processing continues in the same way to the end.
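The fill and switch sequence in the steps above can be sketched as a small planning routine. This is a hedged model, not the circuit itself: the function names are hypothetical, the containment test is an assumed rectangle check, and the window layout (4 × 4 textures on a 4 × 3 grid with a 7 × 4 fill rectangle) is invented so that A-D, E-H, and I-L each fit in one fill as the text describes.

```python
def window_in_buffer(window_pos, fill_pos, th, tw, bh, bw):
    """Assumed containment test: th x tw window inside bh x bw fill."""
    (wx, wy), (bx, by) = window_pos, fill_pos
    return (bx <= wx and wx + tw <= bx + bw and
            by <= wy and wy + th <= by + bh)

def plan_fills(positions, th, tw, bh, bw):
    """Group windows, in array order, into alternating buffer fills.

    Each fill starts at the first window not covered by the previous fill
    (the "next fill start window position") and absorbs following windows
    for as long as they are contained, so no buffer switch is needed.
    Returns a list of (buffer_index, [window indices]).
    """
    fills = []
    i = 0
    while i < len(positions):
        start = positions[i]
        group = [i]
        j = i + 1
        while j < len(positions) and window_in_buffer(
                positions[j], start, th, tw, bh, bw):
            group.append(j)
            j += 1
        fills.append((len(fills) % 2, group))  # buffers 0 and 1 alternate
        i = j
    return fills

# Hypothetical layout: windows A..L as indices 0..11, one pixel apart.
positions = [(x, y) for y in range(3) for x in range(4)]
fills = plan_fills(positions, th=4, tw=4, bh=4, bw=7)
```

Each group boundary corresponds to a buffer switch (steps 7 and 9); within a group the switching determination keeps the current buffer (steps 4, 5, and 8).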

The example of FIG. 3 shows a case where the texture is relatively small and window positions are regular. The fill rectangle size is set to bh > th, bw > tw, so that multiple windows are loaded into one buffer at a time.

FIG. 4 is a diagram illustrating operation when window positions are random.
FIG. 4 assumes the following conditions.
・The circuit configuration has basic data supply performance < computation performance.
・The position array is in the order A, B, C.
・bh and bw are set so that A, B, and C are each loaded one at a time.
・There is no cache effect, so actual data supply performance < computation performance.

The operation is as follows.
1. Load the bh × bw rectangle starting at the upper-left coordinate of A into window buffer 0. Only the region of A is loaded.
2. Calculate the correlation value of A. At the same time, the next-fill start window position search circuit finds B as the next fill start window position for window buffer 1, and window buffer 1 is filled. The circuit then finds the following fill start window position, C. Because window buffer 0 is in use, that fill is deferred until window buffer 0 is released.
3. When the calculation of A completes, the buffer switching determination circuit checks whether B is contained in window buffer 0.
4. B is not contained, so the buffer is switched and B is calculated once B has been loaded. In parallel, window buffer 0 is now free, so the deferred fill of the rectangle starting at C is performed. The next fill start window position is then determined, but everything up to the last window C is contained in window buffer 0, so filling ends.
5. Likewise, when the calculation of B completes, the buffer switching determination is made; C is not contained, so the buffer is switched, and C is calculated once it has been loaded, after which processing ends.
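The waiting described in steps 4 and 5 can be illustrated with a simple double-buffer timeline. This is a sketch under assumptions: the function name and all cycle counts are hypothetical, fills are serialized on one memory path, and a buffer becomes free as soon as the computation using it ends.

```python
def double_buffer_timeline(fill_cycles, compute_cycles):
    """Return [(start, end)] lists for each fill and each computation.

    Entry k of each input is the cost of the k-th buffer fill and of the
    computation over the windows held by that fill.
    """
    fills, computes = [], []
    for k, (f, c) in enumerate(zip(fill_cycles, compute_cycles)):
        # A fill waits for the previous fill (single memory path) and for
        # its target buffer, last used by computation k - 2, to be released.
        path_free = fills[k - 1][1] if k >= 1 else 0
        buffer_free = computes[k - 2][1] if k >= 2 else 0
        f_start = max(path_free, buffer_free)
        fills.append((f_start, f_start + f))
        # A computation waits for its data and for the arithmetic unit.
        unit_free = computes[k - 1][1] if k >= 1 else 0
        c_start = max(fills[k][1], unit_free)
        computes.append((c_start, c_start + c))
    return fills, computes

# Supply slower than compute (as in FIG. 4): each computation waits for its fill.
_, slow_supply = double_buffer_timeline([10, 10, 10], [6, 6, 6])

# Supply faster than compute: the arithmetic unit never idles after the first fill.
_, fast_supply = double_buffer_timeline([4, 4, 4], [6, 6, 6])
```

With the illustrative numbers above, the slow-supply run finishes at cycle 36 and the fast-supply run at cycle 22.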

FIG. 4 shows an example where the texture is relatively large and window positions are random. The fill rectangle size is set equal to the texture size (bh = th, bw = tw), and the buffers operate as an ordinary double buffer.

FIG. 5 is a diagram illustrating operation when window positions are random but a cache effect arises.
FIG. 5 assumes the following conditions.
・The circuit configuration has basic data supply performance < computation performance.
・The position array is in the order A, B, C.
・bh and bw are set so that A and B are loaded into window buffer 0 and C is loaded into window buffer 1.
・There is a cache effect, so actual data supply performance ≈ computation performance.

The operation is as follows.
1. The bh × bw rectangle starting at the upper left coordinate of A is filled into window buffer 0. The areas of both A and B are filled.
2. The correlation value of A is calculated. At the same time, the next filling start window position search circuit finds the next filling start window position C, and window buffer 1 is filled from it. The search then continues for a further filling start window position, but the end of the position array has been reached, so filling ends.
3. When the calculation of A is completed, the buffer switching determination circuit determines whether B is included in window buffer 0.
4. Since B is included, B is calculated without switching the buffer.
5. Similarly, when the calculation of B is completed, a buffer switching determination is performed. Since C is not included, the buffer is switched, C is calculated, and the process ends.

FIG. 5 shows a case in which the texture is relatively large and the window positions are random. If the windows overlap substantially, setting the filling rectangle larger than the texture (bh > th, bw > tw) produces a cache effect and improves processing performance.

FIG. 6 is a diagram showing the operation timing when the window positions are regular.
FIG. 6 shows a case in which windows A, B, C, and D are filled into window buffer 0 in one batch and windows E, F, G, and H are filled into window buffer 1 in one batch. It assumes basic data supply performance < computation performance and actual data supply performance > basic data supply performance, where actual data supply performance = basic data supply performance + cache effect.

The control register stores the window positions of windows A to L. First, window buffer 0 is filled with windows A to D. The time required to store windows A to D in one buffer as a batch is shorter than the total time required to store the four windows individually, so efficiency improves. When window buffer 0 has been filled, calculation proceeds in order starting from window A. During this calculation, window buffer 1 is filled with windows E to H. Because calculation and filling run in parallel, the filling time for window buffer 1 is hidden. When processing of window buffer 0 has finished through window D, window E is read from window buffer 1 and the subsequent windows are calculated in order. While windows E to H in window buffer 1 are being processed, window buffer 0, which is now free, is filled with window I and the following windows, so that this filling time is hidden as well.

FIG. 7 is a diagram showing the operation timing when the window positions are random, the buffers are used as an ordinary double buffer, and there is no cache effect.
In a configuration with basic data supply performance < computation performance (for example, a data supply performance of 8 pixels/cycle against a computation performance of 16 pixels/cycle), the filling time becomes the critical path. In FIG. 7, the window positions of windows A to C are stored in the control register, but each window is stored in a window buffer one at a time. First, window A is filled into window buffer 0, and when the fill completes, window A is calculated. In parallel with the calculation of window A, filling of window buffer 1 with window B begins. When window B has been filled, calculation of window B starts and, at the same time, filling of window C into window buffer 0 begins. Because filling a window buffer takes longer than calculating a window, the overall processing time is determined by the filling time; that is, filling is the critical path.
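The timing in FIG. 7 can be checked with a small cycle-count model. The sketch below is an illustration only: the function name, the scheduling rules (one fill at a time, one calculation at a time, a buffer refilled only after its previous window has been calculated), and the 128 × 128 window size are assumptions, while the 8 and 16 pixels/cycle rates are the example figures given in the text.

```python
def double_buffer_cycles(n_windows, pixels, supply_rate, compute_rate):
    """Estimate total cycles for plain double buffering, one window per buffer.

    Assumed model (not taken from the patent figures): fills are serialized,
    calculations are serialized, and window k reuses the buffer of window k-2.
    """
    fill = pixels // supply_rate    # cycles to fill one window
    calc = pixels // compute_rate   # cycles to calculate one window
    fill_done = [0] * n_windows
    calc_done = [0] * n_windows
    for k in range(n_windows):
        prev_fill = fill_done[k - 1] if k >= 1 else 0
        buffer_free = calc_done[k - 2] if k >= 2 else 0
        # A fill starts once the previous fill is done and its buffer is free.
        fill_done[k] = max(prev_fill, buffer_free) + fill
        prev_calc = calc_done[k - 1] if k >= 1 else 0
        # A calculation starts once its data is present and the unit is idle.
        calc_done[k] = max(fill_done[k], prev_calc) + calc
    return calc_done[-1] if calc_done else 0


# Windows A, B, C of an assumed 128 x 128 pixels each.
total = double_buffer_cycles(3, 128 * 128, supply_rate=8, compute_rate=16)
```

With these numbers each fill takes 2048 cycles and each calculation 1024 cycles, so the three windows finish after 3 × 2048 + 1024 = 7168 cycles: the fills run back to back and dominate the total.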

FIG. 8 is a diagram explaining the operation timing when there is a cache effect even though the window positions are random.
FIG. 8 shows a configuration with basic data supply performance < computation performance in which A and B overlap substantially, so that the cache effect of batch filling can hide the filling time. The control register stores the window positions of windows A to C. Windows A and B are filled together into window buffer 0. When that fill completes, calculation of window A and filling of window buffer 1 both start. The calculation time of one window is shorter than the filling time of a window buffer, but because window buffer 0 holds two windows, the filling of window buffer 1 finishes before the calculation of the windows in window buffer 0 is completed, and the filling time can be hidden. In addition, filling windows A and B into window buffer 0 as a batch takes less time than the total for filling them separately, so the actual data supply performance improves.
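Why batch filling helps only when the windows overlap can be seen by counting transferred pixels. The following is a rough sketch under stated assumptions: the function name is hypothetical, the fill rectangle is taken to be the smallest one that starts at the first window's upper left corner and covers every listed window, and transfer cost is assumed proportional to pixel count.

```python
def batch_vs_separate(windows, tw, th):
    """Return (pixels for one batch fill, pixels for separate fills).

    windows is a list of (x, y) upper left coordinates; the first entry is
    the fill origin, so every other window must lie to its lower right.
    """
    x0, y0 = windows[0]
    if any(x < x0 or y < y0 for x, y in windows):
        raise ValueError("all windows must lie to the lower right of the first")
    bw = max(x for x, _ in windows) + tw - x0   # required fill width
    bh = max(y for _, y in windows) + th - y0   # required fill height
    return bw * bh, len(windows) * tw * th


# Two 16 x 16 windows offset by (4, 4): heavy overlap, batch fill is cheaper.
near = batch_vs_separate([(0, 0), (4, 4)], 16, 16)
# The same windows offset by (40, 40): no overlap, batch fill is more costly.
far = batch_vs_separate([(0, 0), (40, 40)], 16, 16)
```

In the first case one 20 × 20 fill (400 pixels) replaces two 16 × 16 fills (512 pixels); in the second the covering rectangle grows to 56 × 56, so separate fills are preferable.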

FIG. 9 is a diagram explaining the window buffer filling method and the cache effect. The window buffer has a double-buffer configuration (buffers 0 and 1), and the texture size is, for example, at most 16 KB (equivalent to 128 × 128). The buffer fill sizes bh and bw can be set freely and are not fixed to the texture sizes th and tw.

The window buffer switching determination is as follows.
If the next window is contained in the buffer currently in use, the buffer is not switched. The hardware determines this automatically from the next window position. At the start, a) the bh × bw rectangle of the first entry of the window position array is filled into buffer 0. b) The entries are then examined in order from the first entry, and when an entry x that is not contained in the rectangle filled into buffer 0 is reached, the bh × bw rectangle of entry x is filled into buffer 1. When processing reaches the entry that was used for a fill, then, as in b), the rectangle of the first subsequent entry not contained in the current rectangle is filled into the other buffer.

FIG. 9(a) shows the case where the window positions are regular and the texture is relatively small compared with the buffer capacity. Windows A to D are stored in window buffer 0, and windows E to H are stored in window buffer 1. When all the windows in window buffer 0 have been processed, window buffer 0 is filled with windows I to L. FIG. 9(b) shows the case where the window positions are random and there is no cache effect. Window A is stored in window buffer 0 and window B is stored in window buffer 1. When the windows overlap so little that several of them cannot be stored in one window buffer, each window buffer holds a single window. FIG. 9(c) shows the case where the window positions are random and there is a cache effect: windows A and B overlap substantially and both fit in one window buffer. When several windows can be stored in one window buffer, that buffer is filled with all of them. Because the windows are stored in one window buffer as a batch, data transfer is efficient, and because no buffer switch is needed when moving from one window to the next, the window data can be read sooner.

FIG. 10 is a diagram showing the processing flow for filling the window buffers.
In FIG. 10, win[i] is the i-th window (i ∈ {0, 1, …, N-1}) and buf[b] is buffer b (b ∈ {0, 1} for a double buffer).

There are N windows, numbered 0 to N-1. In step S10, i = 0 and b = 0 are set. In step S11, the rectangle whose start position is win[i] is filled into window buffer buf[b]. In step S12, i is incremented by 1, and in step S13 it is determined whether i is greater than or equal to N. If the determination in step S13 is Yes, the process ends. If it is No, step S14 determines whether the i-th window can be filled into window buffer buf[b]. If the determination in step S14 is Yes, the window is filled as part of that buffer and the process returns to step S12; if No, the process proceeds to step S15. In step S15, b is replaced with its bit inversion (0 becomes 1 and 1 becomes 0), and the process proceeds to step S16. Step S16 determines whether buffer buf[b] is empty. If the determination is No, step S16 is repeated until the buffer becomes empty. If it is Yes, the process returns to step S11.
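The flow of steps S10 to S16 can be sketched in software as a planner that lists which windows each fill covers. This is a minimal sketch, not the circuit itself: the function name and the returned structure are hypothetical, the wait in step S16 is not modeled (the other buffer is assumed to be free when needed), and the containment test is the expression given for FIG. 12.

```python
def plan_fills(windows, tw, th, bw, bh):
    """Return [(buffer index, [window indices covered by that fill]), ...].

    windows: list of (x, y) upper left coordinates in processing order.
    tw, th: window (texture) width and height; bw, bh: fill rectangle size.
    """
    def contained(win, origin):
        (x, y), (px, py) = win, origin
        return px <= x and py <= y and px + bw >= x + tw and py + bh >= y + th

    fills = []
    i, b = 0, 0                            # S10
    n = len(windows)
    while i < n:
        origin = windows[i]                # S11: fill buf[b] starting at win[i]
        covered = [i]
        i += 1                             # S12
        while i < n and contained(windows[i], origin):   # S13, S14
            covered.append(i)              # window already lies in this fill
            i += 1                         # back to S12
        fills.append((b, covered))
        b ^= 1                             # S15 (S16 wait is not modeled)
    return fills


# Windows A, B, C as in FIG. 5: A and B overlap, C is far away.
positions = [(0, 0), (4, 4), (100, 100)]
cached = plan_fills(positions, tw=16, th=16, bw=24, bh=24)
plain = plan_fills(positions, tw=16, th=16, bw=16, bh=16)
```

With a 24 × 24 fill rectangle, A and B share buffer 0 and C goes to buffer 1; with bw = tw and bh = th the same code degenerates to one window per buffer, alternating between the two buffers.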

FIG. 11 is a diagram showing the window buffer switching process flow.
In the flow of FIG. 11, which decides the buffer to use for the next window, the buffer is switched only when the next window is not in the buffer currently in use.

In FIG. 11, ic is the number of the window currently being processed, bc is the number of the buffer currently in use, and bn is the number of the buffer to be used next. In step S20, i = ic + 1 and b = bc are set. In step S21, it is determined whether window win[i] is contained in window buffer buf[b]. If the determination in step S21 is Yes, the process proceeds to step S23; if No, it proceeds to step S22. In step S22, the bit of b is inverted. In step S23, bn is set to b, and the process ends.
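Steps S20 to S23 amount to a single containment test followed by an optional bit flip. The sketch below illustrates this under assumptions: the function name and the `buffer_origins` list (the upper left coordinate of the rectangle currently held by each buffer) are hypothetical, and step S21 is evaluated with the containment expression described for FIG. 12.

```python
def next_buffer(ic, bc, windows, buffer_origins, tw, th, bw, bh):
    """Return bn, the buffer to use for the window after window ic."""
    i, b = ic + 1, bc                              # S20
    x, y = windows[i]
    px, py = buffer_origins[b]
    inside = (px <= x and py <= y and
              px + bw >= x + tw and py + bh >= y + th)   # S21
    if not inside:
        b ^= 1                                     # S22: switch buffers
    return b                                       # S23: bn = b


windows = [(0, 0), (4, 4), (100, 100)]        # A, B, C
origins = [(0, 0), (100, 100)]                # buffer 0 holds A and B, buffer 1 holds C
after_a = next_buffer(0, 0, windows, origins, 16, 16, 24, 24)
after_b = next_buffer(1, 0, windows, origins, 16, 16, 24, 24)
```

After A the function stays on buffer 0 because B lies inside it, and after B it switches to buffer 1 for C.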

FIG. 12 is a diagram explaining the automatic determination method for filling and switching the window buffers.
The condition that win is inside buf, win[i] ⊆ buf[b] (used in FIGS. 10 and 11), is determined as follows.

Let (x, y) be the upper left coordinate of win[i] and (px, py) be the upper left coordinate of buf[b]. The following expression is evaluated; it means that the upper left coordinate of win is at or to the lower right of the upper left coordinate of buf, and that the lower right coordinate of win is at or to the upper left of the lower right coordinate of buf.
((px <= x) && (py <= y)) && (((px + bw) >= (x + tw)) && ((py + bh) >= (y + th)))
FIG. 12 shows an example for (px, py) = (x0, y0) and (x, y) = (x1, y1). The initial setting values are bh (buffer height), bw (buffer width), th (window height), and tw (window width). The window position array supplies x and y for win[i]. The sub-expression (px <= x) && (py <= y) tests that the upper left coordinate of win is at or to the lower right of the upper left coordinate of buf. The sub-expression ((px + bw) >= (x + tw)) && ((py + bh) >= (y + th)) tests that the lower right coordinate of win is at or to the upper left of the lower right coordinate of buf.
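The containment expression translates directly into a predicate. The function name below is hypothetical; the comparisons are those of the expression above, with the same symbols.

```python
def window_in_buffer(x, y, px, py, tw, th, bw, bh):
    """True when the tw x th window at (x, y) lies inside the bw x bh
    rectangle that was filled starting at (px, py)."""
    upper_left_ok = (px <= x) and (py <= y)
    lower_right_ok = (px + bw >= x + tw) and (py + bh >= y + th)
    return upper_left_ok and lower_right_ok


# A 16 x 16 fill rectangle at the origin and 8 x 8 windows.
inside = window_in_buffer(4, 4, 0, 0, tw=8, th=8, bw=16, bh=16)
on_edge = window_in_buffer(8, 8, 0, 0, tw=8, th=8, bw=16, bh=16)
outside = window_in_buffer(10, 0, 0, 0, tw=8, th=8, bw=16, bh=16)
```

Because the comparisons are non-strict, a window that touches the right or bottom edge of the filled rectangle still counts as contained.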

The address generation method for the buffer filling operation (fill win[i] into buf[b]) is as follows. The input image is assumed to occupy contiguous addresses in main memory. Addresses are generated from iB (start address of the input image), ih (address change corresponding to one column of the input image height), iw (address change corresponding to one input image width), bh (buffer height), bw (buffer width), and the coordinates x, y of win[i], according to the following formula.
iB + (y + H) * iw + x + W,
where 0 <= H <= bh, 0 <= W <= bw
H and W are the vertical and horizontal offsets measured from the upper left position of the window.
Supporting both access patterns, a) access at random window positions and b) access at regular window positions, has the following effects.
a) For large textures at random positions, setting th, tw = bh, bw lets the buffers be used as an ordinary double buffer.
b) For small textures at regular positions, setting th, tw < bh, bw brings the neighboring windows into the buffer when the first window is filled. Latency can be hidden by not switching buffers while the next window is still in the current buffer.
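The address formula iB + (y + H) * iw + x + W can be illustrated with a short generator. This is a sketch under assumptions: the function name is hypothetical, one address unit per pixel is assumed (the formula adds x and W directly), and half-open ranges 0 <= H < bh, 0 <= W < bw are used so that exactly bh × bw pixels are read, whereas the text writes both bounds inclusively.

```python
def fill_addresses(iB, iw, x, y, bw, bh):
    """Yield main-memory addresses of the fill rectangle, row by row.

    iB: start address of the input image; iw: address step per image row;
    (x, y): upper left coordinate of win[i]; bw, bh: fill width and height.
    """
    for H in range(bh):                    # assumed half-open: 0 <= H < bh
        row_base = iB + (y + H) * iw + x
        for W in range(bw):                # assumed half-open: 0 <= W < bw
            yield row_base + W


# A 3 x 2 rectangle at (x, y) = (2, 1) in an image 640 pixels wide at address 1000.
addrs = list(fill_addresses(iB=1000, iw=640, x=2, y=1, bw=3, bh=2))
```

Each row of the rectangle is a contiguous run of bw addresses, and consecutive rows are iw apart, which is why a single wide fill transfers more efficiently than several narrow ones.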

According to the above embodiment, both regular and random window positions can be handled flexibly. Furthermore, in either case the user does not need to specify buffer switching or the areas to be filled. Specifying only the window position order win[N] and the height and width bh, bw of the rectangle filled into a buffer is enough to handle both cases consistently with the same usage.

In addition to the above embodiment, the following supplementary notes are disclosed.
(Appendix 1)
A normalized correlation processing device for pattern matching processing of an image,
A template buffer means for holding a template for pattern matching;
A plurality of search rectangle buffer means for arbitrarily specifying the vertical and horizontal sizes of the rectangle data including the search rectangle and holding the rectangle data including the search rectangle of the input image;
Control register means for holding position information of the search rectangle and data for controlling writing to the plurality of search rectangle buffer means;
Computing means for computing a normalized correlation value between the template stored in the template buffer means and data in the search rectangle stored in one of the plurality of search rectangle buffer means;
A normalized correlation processing device comprising:
(Appendix 2)
The normalized correlation processing apparatus according to appendix 1, wherein the control register means holds the position information of each search rectangle for which a normalized correlation is taken individually, one by one and independently of the others, and, when the positions of the search rectangles are regular, the positions can also be specified by a vertical and horizontal size from the first search rectangle position and a thinning distance.
(Appendix 3)
The normalized correlation processing apparatus according to appendix 1, wherein a correlation value is calculated using the search rectangle filled in a first search rectangle buffer means included in the plurality of search rectangle buffer means, while the search rectangle to be used for the next correlation value calculation is filled into a second search rectangle buffer means included in the plurality of search rectangle buffer means.
(Appendix 4)
The processing apparatus according to appendix 1, wherein, when the calculation of a first search rectangle is finished and the calculation of a second search rectangle is started, if the search rectangle buffer currently used for the calculation includes the second search rectangle, the search rectangle buffer is not switched and is used as it is for the calculation of the second search rectangle, and if the second search rectangle is not included, the search rectangle buffer is switched and used for the calculation of the second search rectangle.
(Appendix 5)
The normalized correlation processing apparatus according to appendix 4, wherein whether the first search rectangle is in the search rectangle buffer means currently in use is determined by checking that the upper left coordinate of the search rectangle is to the lower right of the upper left coordinate of the rectangular area filled in the search rectangle buffer means, and that the lower right coordinate of the search rectangle is to the upper left of the lower right coordinate of that rectangular area.
(Appendix 6)
The normalized correlation processing apparatus according to appendix 4, wherein, when rectangle data is filled into another search rectangle buffer means that is not currently in use, the search rectangles following the one currently being processed are checked, in the order of the search rectangle position array, for whether they are included in the search rectangle buffer means currently in use, and when a search rectangle that is not included is reached, rectangle data starting from that search rectangle is filled next into one of the search rectangle buffer means.
(Appendix 7)
The normalized correlation processing apparatus according to appendix 1, wherein the computing means is capable of processing a plurality of pixels in parallel.
(Appendix 8)
The normalized correlation processing apparatus according to appendix 1, wherein the computing means calculates, as element values of the normalized correlation value, the sum of the pixel values in the search rectangle, the sum of the squares of the pixel values in the search rectangle, and the sum of the products of each pixel value in the search rectangle and the pixel value at the same position in the template.
(Appendix 9)
The normalized correlation processing apparatus according to appendix 1, wherein the size of the rectangle data including the search rectangle is independent of the size of the template.
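The element values named in Appendix 8 (the sum of the window pixels, the sum of their squares, and the sum of products with the template) are enough to obtain the correlation value once the template-side sums are known. The sketch below assumes the standard form of the normalized correlation coefficient, r = (nΣxy − ΣxΣy) / sqrt((nΣx² − (Σx)²)(nΣy² − (Σy)²)), because the patent's own formula is not reproduced in this text; the function names are hypothetical, and x denotes template pixels and y window pixels as in the description.

```python
import math


def correlation_elements(window, template):
    """Accumulate the per-window element values: sum(y), sum(y^2), sum(x*y)."""
    sum_y = sum_yy = sum_xy = 0
    for y_row, x_row in zip(window, template):
        for y, x in zip(y_row, x_row):
            sum_y += y
            sum_yy += y * y
            sum_xy += x * y
    return sum_y, sum_yy, sum_xy


def normalized_correlation(window, template):
    """Combine template sums (fixed per template) with the window sums."""
    n = sum(len(row) for row in template)
    sum_x = sum(v for row in template for v in row)
    sum_xx = sum(v * v for row in template for v in row)
    sum_y, sum_yy, sum_xy = correlation_elements(window, template)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator if denominator else 0.0


template = [[1, 2], [3, 4]]
r_same = normalized_correlation([[2, 4], [6, 8]], template)      # scaled copy
r_flipped = normalized_correlation([[4, 3], [2, 1]], template)   # reversed order
```

Only the three sums returned by `correlation_elements` depend on the window, which matches the description's point that the template-side sums can be computed once in advance.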

A diagram (part 1) explaining the outline of an embodiment of the present invention.
A diagram (part 2) explaining the outline of an embodiment of the present invention.
A diagram explaining the operation when the window positions are regular.
A diagram explaining the operation when the window positions are random.
A diagram explaining the operation when the window positions are random but a cache effect arises.
A diagram showing the operation timing when the window positions are regular.
A diagram showing the operation timing when the window positions are random, the buffers are used as an ordinary double buffer, and there is no cache effect.
A diagram explaining the operation timing when there is a cache effect even though the window positions are random.
A diagram explaining the window buffer filling method and the cache effect.
A diagram showing the processing flow for filling the window buffers.
A diagram showing the window buffer switching process flow.
A diagram explaining the automatic determination method for filling and switching the window buffers.
A diagram explaining the method of calculating the normalized correlation value.
A diagram explaining how window positions are chosen.
A diagram explaining the outline of processing by an accelerator when the window positions are regular.
A diagram showing a hardware configuration example for image template matching.
A diagram explaining a method of parallel processing in units of windows.
A diagram explaining a method of processing multiple pixels in parallel within one window.
A diagram explaining a method of processing multiple pixels in parallel within one window.
A diagram explaining the configuration of a conventional double-buffer normalized correlation processing apparatus.

Explanation of symbols

10, 10-1 to 10-8 Multiplier
11 Adder
12 Register
20 Memory
21 Bus
22 Texture buffer
23, 23-1, 23-2 Window buffer
24 Normalized correlation operation circuit
25, 30 Control register
26, 31 Control circuit
27 Selector

Claims

A normalized correlation processing device for pattern matching processing of an image,
A template buffer means for holding a template for pattern matching;
A plurality of search rectangle buffer means for arbitrarily specifying the vertical and horizontal sizes of rectangular data including a search rectangle and holding the rectangle data including one or more search rectangles of an input image;
Control register means for holding position information of the search rectangle and data for controlling writing to the plurality of search rectangle buffer means;
Computing means for computing a normalized correlation value between the template stored in the template buffer means and data in the search rectangle stored in one of the plurality of search rectangle buffer means;
the device comprising the above means, wherein, when the calculation of a first search rectangle is completed and the calculation of a second search rectangle is started, if the search rectangle buffer currently used for the calculation does not include the second search rectangle, the search rectangle buffer is switched and used for the calculation of the second search rectangle.

The normalized correlation processing apparatus according to claim 1, wherein, when rectangle data is filled into the search rectangle buffer means that is not currently in use, the search rectangles following the one currently being processed are checked, in the order of the search rectangle position array, for whether they are included in the search rectangle buffer means currently in use, and when a search rectangle that is not included is reached, rectangle data starting from that search rectangle is filled next into one of the search rectangle buffer means.

The normalized correlation processing apparatus according to claim 1, wherein the control register means holds the position information of each search rectangle for which a normalized correlation is taken individually, one by one and independently of the others, and, when the positions of the search rectangles are regular, the positions can also be specified by a vertical and horizontal size from the first search rectangle position and a thinning distance.

The normalized correlation processing apparatus according to claim 1, wherein a correlation value is calculated using the search rectangle filled in a first search rectangle buffer means included in the plurality of search rectangle buffer means, while the search rectangle to be used for the next correlation value calculation is filled into a second search rectangle buffer means included in the plurality of search rectangle buffer means.

The normalized correlation processing apparatus according to any one of claims 1 to 4, wherein whether the first search rectangle is in the search rectangle buffer means currently in use is determined by checking that the upper left coordinate of the search rectangle is to the lower right of the upper left coordinate of the rectangular area filled in the search rectangle buffer means, and that the lower right coordinate of the search rectangle is to the upper left of the lower right coordinate of that rectangular area.