JPH0863452A

JPH0863452A - SIMD processor

Info

Publication number: JPH0863452A
Application number: JP6201654A
Authority: JP
Inventors: Junichi Goto; 順一後藤; Ichiro Tamiya; 一郎民谷
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1994-08-26
Filing date: 1994-08-26
Publication date: 1996-03-08

Abstract

PURPOSE: To provide, with a small amount of hardware, address generation suited to image processing for both the local memories and the common memory, so that each unit processor can process a partial image of up to several thousand pixels. CONSTITUTION: A first address generator 26 supplies image signals from a memory 25 to a unit processor 1. These signals include ones that the unit processor 1 does not need, so a control signal generator 28 supplies a signal to a register 11c so that no operation is performed on the unneeded signals. The same signal is also supplied to a second address generator 27 to stop reading from the memory 12. In image processing, the arrangement of the image signals processed by each unit processor usually has the same shape in the memory 25 and differs only in position. Therefore, the address generator 27 and the control signal generator 28 are mounted in a control part 6, and delay units 18 and 21 are mounted in each unit processor to absorb the position shift of the image.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a multiprocessor system that executes image processing and similar tasks in parallel, and more particularly to a SIMD processor.

【０００２】[0002]

[Prior Art] A known example of a conventional SIMD processor is described in Isonishi, Miyata, and Iwase, "Architecture of a Cellular Array Type Large-Scale Parallel Processor," Information Processing Society of Japan research group report, Computer Architecture 73-9 (1988.10). By associating unit processors with pixel data, processing can be performed with a degree of parallelism equal to the number of unit processors. That is, a two-dimensional array of pixel data is stored in the registers or local memories of unit processors connected in a two-dimensional grid, and each processor accesses the neighboring pixel data by shifting the entire two-dimensional array of pixel data through data transfers between unit processors.

【０００３】[0003]

[Problems to Be Solved by the Invention] In the prior art, however, a unit processor can access the neighboring pixel data only through data transfers between unit processors, which in hardware terms increases the number of wires.

[0004] An object of the present invention is to provide a unit processor local memory unit capable of holding a partial image consisting of multiple pixel data, so that processing can be completed within each unit processor, and to provide means for generating its addresses with the minimum necessary hardware.

【０００５】[0005]

[Means for Solving the Problems] The first invention is a SIMD processor comprising a unit processor group consisting of N unit processors (N is a natural number) and a control unit, characterized in that: the control unit comprises an address generator and a control unit local memory; each unit processor of the unit processor group comprises a unit processor local memory unit, an arithmetic unit, an address input terminal, an address output terminal, a delay unit, a data input terminal, a data output terminal, and a data bus; the delay unit delays the signal input to the address input terminal by a predetermined period and outputs it to the address output terminal; the signal input to the address input terminal is supplied to the unit processor local memory unit as an address; the data input terminal and the data output terminal are connected by the data bus; the data bus supplies data to the unit processor local memory unit; the data bus and the unit processor local memory unit supply data to the arithmetic unit; the address generator supplies an address to the address input terminal of the first unit processor of the unit processor group; the address output terminal of the i-th unit processor of the unit processor group (i is a natural number from 1 to N-1) is connected to the address input terminal of the (i+1)-th unit processor of the unit processor group; and the data output terminal of the i-th unit processor is connected to the data input terminal of the (i+1)-th unit processor.

[0006] The second invention is characterized in that, in the first invention, the delay period of the delay unit of each unit processor of the unit processor group is determined independently for each unit processor.

[0007] The third invention is characterized in that, in the first invention, the control unit comprises a timing signal generator; the output signal of the timing signal generator alternates between a first state and a second state, each held for an arbitrary predetermined period; the output signal of the timing signal generator is supplied to the address generator; and while the output signal of the timing signal generator is in the second state, the address generator stops address generation.

[0008] The fourth invention is characterized in that, in the third invention, each unit processor of the unit processor group comprises an operation control signal input terminal, an operation control signal output terminal, and a second delay unit; the second delay unit delays the signal input to the operation control signal input terminal by a predetermined period and outputs it to the operation control signal output terminal; the signal input to the operation control signal input terminal is supplied to the arithmetic unit; while the signal input to the operation control signal input terminal is in the second state, the arithmetic unit stops operation; the timing signal generator supplies a signal to the operation control signal output terminal of the first unit processor of the unit processor group; and the operation control signal output terminal of the i-th unit processor is connected to the operation control signal input terminal of the (i+1)-th unit processor.

[0009] The fifth invention is characterized in that, in the fourth invention, the delay period of the second delay unit of each unit processor of the unit processor group is determined independently for each unit processor.

[0010] The sixth invention is characterized in that, in the first invention, the output of the arithmetic unit in each unit processor of the unit processor group is supplied to the unit processor local memory unit.

[0011] The seventh invention is characterized in that, in the first invention, each unit processor of the unit processor group comprises a second data input terminal, a second data output terminal, and a second data bus; the second data input terminal and the second data output terminal are connected by the second data bus; the arithmetic unit supplies data to the second data bus; and the second data output terminal of the i-th unit processor is connected to the second data input terminal of the (i+1)-th unit processor.

[0012] The eighth invention is characterized in that, in the first invention, the control unit comprises a second address generator, and the second address generator supplies an address to the control unit local memory.

[0013] The ninth invention is characterized in that, in the first invention, the data input terminal, the data bus, the data output terminal, the address input terminal, the delay unit, and the address output terminal are multiplexed in each unit processor of the unit processor group.

[0014] The tenth invention is characterized in that, in the fourth invention, the operation control signal input terminal, the second delay unit, and the operation control signal output terminal are multiplexed in each unit processor of the unit processor group.

【００１５】[0015]

[Operation] The present invention focuses on two facts about image processing: the arrangement of the pixel data required by each unit processor has a common shape, and the timing at which each unit processor requires the pixel data is shifted. Accordingly, the address generator is mounted only in the control unit, and addresses and other control signals are delayed according to the timing shift before being supplied to each unit processor. This allows the pixel data required by each unit processor to be supplied correctly with the minimum necessary control means.

【００１６】[0016]

[Embodiments] Next, embodiments of the present invention will be described with reference to the drawings.

[0017] FIG. 1 shows an embodiment of the present invention. The SIMD processor consists of a unit processor group 5 and a control unit 6.

[0018] FIG. 1 shows the case in which the number N of unit processors in the unit processor group 5 is 4, the arithmetic unit 11 consists of a selector (SEL) 11d, an arithmetic logic unit (ALU) 11a, an adder 11b, and a register (REG) 11c, and the unit processor local memory unit 12 consists of a unit processor local memory core (PEM) 12a and a selector (SEL) 12b.

[0019] The control unit 6 consists of a control unit local memory (CUM) 25, a first address generator (GUAGU) 26, a second address generator (PEAGU) 27, and a timing signal generator (TG) 28.

[0020] In FIG. 1, 1, 2, 3, and 4 denote unit processors; 13, a data input terminal; 14, a data output terminal; 15, a data bus; 16, an address input terminal; 17, an address output terminal; 18, a first delay unit; 19, an operation control signal input terminal; 20, an operation control signal output terminal; 21, a second delay unit; 22, a second data input terminal; 23, a second data output terminal; and 24, a second data bus.

[0021] The description below takes pattern matching, one kind of image processing, as an example. In general, pattern matching is the process of searching a larger image for the region of the same size that is most similar to a given image. FIGS. 2(a) and 2(b) illustrate the pattern matching performed in this embodiment. In the following description, an image of 4 × 4 pixels is called a block. The pattern matching of this embodiment searches a larger reference image 101 of 11 × 11 pixels (this size is only an example) for the block most similar to the target block 100.

[0022] The following equation is used as an index of the difference between two images (the degree of difference).

【００２３】[0023]

[Equation 1]

$$D(x, y) = \sum_{j=0}^{3} \sum_{i=0}^{3} \left| P_0(i, j) - P_1(x+i,\, y+j) \right| \tag{1}$$

[0024] In the above equation, P₀(i, j) denotes the luminance value of the pixel in the target block 100 that is i-th to the right and j-th down, taking the upper-left pixel as (0, 0), and P₁(x+i, y+j) denotes the luminance value of the pixel in the reference image 101 that is (x+i)-th to the right and (y+j)-th down from its upper-left pixel. In other words, D(x, y) is the accumulated sum of the absolute differences between each of the 16 pixel values in the target block 100 and the corresponding pixel value of the block in the reference image 101 whose upper-left pixel is at position (x, y). The smaller this value, the more similar the block at position (x, y) is to the target block 100. Since (x, y) can range from (0, 0) to (7, 7), the reference image 101 contains 64 blocks.
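As a minimal sketch of the degree of difference D(x, y) defined above, assuming the images are held as row-major Python lists (the function name `sad` and the test data are illustrative and do not come from the patent):

```python
def sad(block, ref, x, y):
    """Sum of absolute differences D(x, y) between a 4x4 block and the
    4x4 window of ref whose upper-left pixel is at column x, row y."""
    return sum(
        abs(block[j][i] - ref[y + j][x + i])
        for j in range(4)
        for i in range(4)
    )


# Hypothetical data: an 11x11 reference image and a block copied from (5, 2).
ref = [[7 * r + 3 * c for c in range(11)] for r in range(11)]
block = [row[5:9] for row in ref[2:6]]

# Candidate positions run from (0, 0) to (7, 7), i.e. 8 * 8 = 64 blocks.
candidates = [(x, y) for y in range(8) for x in range(8)]
best = min(candidates, key=lambda p: sad(block, ref, *p))
```

Searching all 64 candidates sequentially like this is what the SIMD arrangement described below spreads across the unit processors.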

[0025] The 16 pixel values of the target block 100 are assumed to be stored in the unit processor local memory core 12a of each of the first unit processor 1, second unit processor 2, third unit processor 3, and fourth unit processor 4 of FIG. 1, at addresses equal to the pixel numbers shown in FIG. 2(a). Similarly, the pixel values of the reference image 101 are assumed to be stored in the control unit local memory 25 of the control unit 6 of FIG. 1, at addresses equal to the pixel numbers shown in FIG. 2(b).

[0026] To process these pixel values, work is assigned to the four unit processors as follows. The first unit processor 1 computes equation (1) for the target block 100 and the block in the reference image 101 whose upper-left pixel is pixel number 0. Likewise, the second unit processor 2, third unit processor 3, and fourth unit processor 4 compute equation (1) for the target block 100 and the blocks whose upper-left pixels are pixel numbers 1, 2, and 3, respectively.
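The assignment can be illustrated with a short sketch. It assumes that the pixel numbers of FIG. 2(b) run in raster order across the 11-pixel-wide reference image, which is consistent with the later statement that pixel 6 is followed by pixel 11 in the next row; the names below are illustrative only.

```python
REF_WIDTH = 11   # reference image 101 is 11 pixels wide
BLOCK = 4        # blocks are 4 x 4 pixels


def block_pixel_numbers(top_left):
    """Pixel numbers of the 4x4 block whose upper-left pixel is top_left."""
    return [top_left + REF_WIDTH * j + i
            for j in range(BLOCK) for i in range(BLOCK)]


# Unit processors 1..4 take the blocks whose upper-left pixels are 0..3.
assignment = {pe: block_pixel_numbers(pe - 1) for pe in range(1, 5)}

# Every pixel touched by any of the four processors (a 7 x 4 region).
needed = sorted(set(n for pixels in assignment.values() for n in pixels))
```

Under this assumption, the union of the four blocks is a region 7 pixels wide and 4 pixels tall, which is why a single pass over that region can serve all four processors.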

[0027] The control unit 6 operates as a so-called stored-program machine: it sequentially reads an instruction word sequence stored in an instruction memory, generates control signals, and supplies them to the unit processor group 5 and to the control unit 6 itself.

[0028] First, the operation of reading the pixel values of the partial reference image 101a is described. The partial reference image 101a is the part of the reference image 101 stored in the control unit local memory 25 whose processing is assigned to the first unit processor 1, second unit processor 2, third unit processor 3, and fourth unit processor 4. To read it, the first address generator 26 generates the address of each pixel of the partial reference image 101a (in this embodiment, as noted above, each address equals the pixel number), one pixel per cycle, in the order indicated by the arrows in FIG. 2(c). The read enable signal is also assumed to be generated by the first address generator 26.

[0029] To generate these addresses, 1 is added to the address six times, then 5 is added (to move from pixel 6 to pixel 11), and then 1 is again added six times. The same operation is repeated until the address of pixel 39 has been generated. FIG. 3 shows a flowchart of this address generation.

[0030] In FIG. 3, ADRS is a variable holding the address, DISP1 and DISP2 are variables holding the address increments, and CNT1 and CNT2 are variables holding the upper limits of the horizontal and vertical counts, respectively. The first address generator 26 therefore has registers corresponding to these variables, and the instruction word sequence must include instruction words that set appropriate values in these registers before the first address generator 26 starts generating addresses. The actual address generation is triggered by issuing an instruction word that starts address generation, after which the first address generator 26 runs autonomously according to the register contents.
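The register-driven address generation can be sketched as follows. The flowchart of FIG. 3 is not reproduced in the text, so the exact meaning of each register is an assumption here: DISP1 is taken as the step within a row, DISP2 as the step from the end of one row to the start of the next, CNT1 as the number of addresses per row, and CNT2 as the number of rows. The class name is illustrative.

```python
class AddressGenerator:
    """Toy model of a two-level address generator (register names from FIG. 3)."""

    def __init__(self, adrs, disp1, disp2, cnt1, cnt2):
        self.adrs = adrs      # ADRS: initial address
        self.disp1 = disp1    # DISP1: increment within a row (assumed meaning)
        self.disp2 = disp2    # DISP2: increment at the end of a row (assumed)
        self.cnt1 = cnt1      # CNT1: addresses per row (assumed meaning)
        self.cnt2 = cnt2      # CNT2: number of rows (assumed meaning)

    def run(self):
        """Yield one address per cycle once generation has been triggered."""
        adrs = self.adrs
        for _row in range(self.cnt2):
            for col in range(self.cnt1):
                yield adrs
                # Step within the row, or jump to the start of the next row.
                adrs += self.disp1 if col < self.cnt1 - 1 else self.disp2


# Settings assumed for partial reference image 101a (7 x 4 pixels, row stride 11).
gen = AddressGenerator(adrs=0, disp1=1, disp2=5, cnt1=7, cnt2=4)
addresses = list(gen.run())
```

With these assumed settings the generator adds 1 six times, then 5, and repeats until it reaches pixel 39, matching the sequence described in the text.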

[0031] An address generator that operates in this way is described, for example, in Goto et al., "Control Scheme in the Super High-Speed Video Signal Processor (S-VSP)," IEICE Technical Report ICD91-101, pp. 23-29 (1991).

[0032] Each pixel of the partial reference image 101a read out in this way is supplied, one pixel per cycle, to the data input terminal 13 of the first unit processor 1 (the selector 11d in the arithmetic unit 11, which is connected to the data input terminal 13, is set to select the signal of the data input terminal 13). The row labeled "first address generator 26" in FIG. 4 shows this operation over time. Each number is the address of a pixel generated by the first address generator 26.

[0033] Next, the operation of reading the target block 100 of FIG. 2(a), stored in the unit processor local memory core 12a of the first unit processor 1, is described.

[0034] In this embodiment, the addresses for the unit processor local memory core 12a form a simple sequence from pixel 0 to pixel 15, as shown in FIG. 2(a). A simple increment counter therefore suffices as the second address generator 27. If generality is required, however, an address generator with the same configuration as the first address generator 26 is preferable; the configuration of the first address generator 26 can also generate sequential addresses. In either case, like the first address generator 26, the second address generator 27 is assumed to generate addresses autonomously when triggered by an instruction word that starts address generation, and to generate the read enable signal as well.

[0035] By starting address generation in the second address generator 27 at the same time as in the first address generator 26 (this requires an instruction word that can start address generation in both at once), pixels 0 to 3 of the target block 100 of FIG. 2(a) are supplied to the arithmetic logic unit 11a of the first unit processor 1 in synchronization with pixels 0 to 3 of the partial reference image 101a of FIG. 2(b). The arithmetic logic unit 11a is assumed to be set in advance to output the absolute difference of its two inputs. (As described later, pattern matching also requires comparing the magnitudes of two inputs. Arithmetic logic units that can selectively execute multiple operations under a control signal, such as addition, subtraction, logical OR, and logical AND in addition to absolute difference, are widely used. The arithmetic logic unit 11a of this embodiment is of this kind, and here a control signal from the control unit 6 selects the absolute difference operation.) The row labeled "first unit processor 1" in FIG. 4 shows this behavior over time, aligned with the addresses generated by the first address generator 26 of FIG. 4 described above. The numbers 0, 1, 2, and 3 are the addresses of pixels 0, 1, 2, and 3 generated by the second address generator 27. (The row is labeled "first unit processor" because, as shown in FIG. 1, the output of the second address generator 27 is input to the first unit processor directly, without delay.)

[0036] However, pixel 4 of the partial reference image 101a, which is read in the next cycle, is not a pixel assigned to the first unit processor 1, so it must not be used in the operation. The same applies to the following pixels 5 and 6.

[0037] For this reason, as shown in FIG. 4, address generation in the second address generator 27 is stopped for these three cycles (the broken-line portion). Address generation resumes four cycles later, when pixel 11 of the partial reference image 101a is read; pixels 4, 5, 6, and 7 of the target block 100 are read, and generation then stops again for the following three cycles. By repeating this operation, each pixel of the target block 100 in the unit processor local memory core 12a of the first unit processor 1 is supplied to the arithmetic logic unit 11a in synchronization with the corresponding pixel of the partial reference image 101a.

[0038] To realize this operation, a timing signal generator 28 is provided in the control unit 6 and generates the signal shown in the row labeled "timing signal generator 28" in FIG. 4. Periods marked "ON" are address generation periods, and periods marked "OFF" are periods in which address generation is stopped. Address generation in the second address generator 27 can be stopped by, for example, using the logical AND of this signal and the operating clock as the operating clock supplied to the second address generator 27.
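The effect of gating the clock can be sketched with a simple increment counter that advances only in cycles where the timing signal is ON. This is a behavioral model under the assumption that the second address generator is a plain counter; it does not model the actual clock waveform, and the names are illustrative.

```python
def gated_counter(timing):
    """Return the address issued by an increment counter in each cycle.

    timing[t] is 1 (ON) or 0 (OFF). The counter clock is modelled as
    "clock AND timing", so the counter advances only after an ON cycle.
    None marks a cycle in which no valid address is issued.
    """
    address, trace = 0, []
    for on in timing:
        trace.append(address if on else None)
        if on:
            address += 1
    return trace


# Timing pattern from FIG. 4: 4 cycles ON, 3 cycles OFF, repeated 4 times.
timing = ([1] * 4 + [0] * 3) * 4
trace = gated_counter(timing)
```

In this model the counter issues addresses 0 to 3, pauses for three cycles, and resumes at address 4 in cycle 7, the cycle in which pixel 11 of the partial reference image is read.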

[0039] FIG. 5 shows a flowchart of the operation of the timing signal generator 28. The settings CNT1 = 4, CNT2 = 3, and CNT3 = 4 specify that the "ON" period lasts 4 cycles, the "OFF" period lasts 3 cycles, and the "ON"/"OFF" pair is repeated 4 times. The variable OUTPUT represents the output signal. While the variable DUMMY1 runs from 0 to 3 (that is, while it is less than CNT1), OUTPUT = 1, indicating the "ON" period. While the variable DUMMY2 runs from 0 to 2 (while it is less than CNT2), OUTPUT = 0, indicating the "OFF" period. This is repeated while the variable DUMMY3 runs from 0 to 3 (while it is less than CNT3).
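The loop structure described for FIG. 5 maps directly onto three nested counters. The sketch below is one possible reading of that flowchart, written as a Python generator; the function name is illustrative, and the loop variables mirror DUMMY1 to DUMMY3.

```python
def timing_signal(cnt1=4, cnt2=3, cnt3=4):
    """Yield OUTPUT once per cycle: 1 for cnt1 cycles, then 0 for cnt2
    cycles, with the ON/OFF pair repeated cnt3 times."""
    for _dummy3 in range(cnt3):        # DUMMY3 counts ON/OFF repetitions
        for _dummy1 in range(cnt1):    # DUMMY1 counts ON cycles
            yield 1
        for _dummy2 in range(cnt2):    # DUMMY2 counts OFF cycles
            yield 0


signal = list(timing_signal())
```

With the default settings the signal spans 28 cycles, the same length as the 7 × 4 pixel stream read by the first address generator, and is ON for exactly the 16 cycles in which a target-block pixel is needed.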

[0040] Next, consider the arithmetic unit 11 of the first unit processor 1. As described above, the second address generator 27 stops generating addresses while unneeded pixels of the partial reference image 101a are being read. Physically, however, the unit processor local memory core 12a still outputs some value, the arithmetic logic unit 11a operates on that value, and the result would be accumulated by the adder 11b and the register 11c that follow. Because this operation is unwanted, it must be suppressed. To that end, the register 11c is of a type whose write enable/disable can be controlled, and, as shown in FIG. 1, the output signal of the timing signal generator 28 is supplied to the register 11c through the operation control signal input terminal 19 to perform enable/disable control. That is, writing to the register 11c is enabled while the second address generator 27 generates addresses, and disabled while the second address generator 27 has stopped generating addresses.
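A small behavioral sketch shows why the write enable matters. It assumes that the memory core keeps outputting the value at the stalled address and that the ALU computes every cycle; only the register write is gated. The pixel values and names are hypothetical.

```python
def accumulate(ref_stream, block_pixels, enable):
    """Model register 11c: accumulate |block - ref| only in enabled cycles."""
    reg = 0    # contents of register 11c
    addr = 0   # address held by the second address generator
    last = len(block_pixels) - 1
    for ref_value, en in zip(ref_stream, enable):
        # The memory core outputs some value every cycle, stalled or not.
        pem_out = block_pixels[min(addr, last)]
        alu_out = abs(pem_out - ref_value)   # ALU 11a computes every cycle
        if en:
            reg += alu_out                   # adder 11b result is written back
            addr += 1                        # generator advances only when ON
    return reg


enable = ([1] * 4 + [0] * 3) * 4             # timing signal: 4 ON, 3 OFF, 4 times
ref_stream = [10 * t for t in range(28)]     # hypothetical pixel values, one per cycle
block_pixels = [5] * 16                      # hypothetical target block values
gated = accumulate(ref_stream, block_pixels, enable)
ungated = accumulate(ref_stream, block_pixels, [1] * 28)
```

In this model the ungated result also includes the absolute differences for the twelve unneeded pixels, which is exactly the contamination that disabling the register write avoids.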

[0041] With the above, the first unit processor 1 can compute equation (1), the degree of difference between the target block 100 of FIG. 2(a) and the block of FIG. 2(b) whose upper-left pixel is pixel 0. In the second unit processor 2, however, the pixels of the partial reference image 101a that are to be used in the operation are read from the control unit local memory 25 one cycle later than in the case of the first unit processor 1. This delay grows by one more cycle for the third unit processor 3 and again for the fourth unit processor 4.

[0042] This delay is therefore absorbed by providing a first delay unit 18 in each unit processor. A second delay unit 21 is likewise provided for the enable/disable control signal of the register 11c. These delay units are set in advance to a delay of one cycle. As a result, the second unit processor 2, third unit processor 3, and fourth unit processor 4 can each compute the degree of difference between the target block 100 of FIG. 2(a) and the blocks of FIG. 2(b) whose upper-left pixels are pixel 1, pixel 2, and pixel 3, respectively.
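The whole scheme can be checked with a cycle-level sketch. It assumes that the data bus delivers the same reference pixel to all four unit processors in the same cycle, that each processor passes the address and enable signal to its neighbor with a one-cycle delay, and that pixel numbers are in raster order. The function and variable names are illustrative, not taken from the patent.

```python
REF_W, BLK, N_PE = 11, 4, 4
ROW = BLK + N_PE - 1          # the partial reference image is 7 pixels wide


def simulate(ref, block):
    """Return the value accumulated in register 11c of each unit processor.

    ref   : 11x11 reference image as a flat raster-order list (memory 25)
    block : 4x4 target block as a flat list (held in every local memory 12a)
    """
    # First address generator: pixels 0..6, 11..17, 22..28, 33..39.
    stream = [ref[REF_W * j + i] for j in range(BLK) for i in range(ROW)]
    # Timing signal: 4 cycles ON, 3 cycles OFF, repeated 4 times.
    timing = ([1] * BLK + [0] * (ROW - BLK)) * BLK
    # Second address generator: advances only while the timing signal is ON.
    addrs, a = [], 0
    for on in timing:
        addrs.append(a)
        a += on

    regs = [0] * N_PE
    for t, pixel in enumerate(stream):
        for k in range(N_PE):
            # Delay units 18 and 21: processor k sees address and enable
            # k cycles after they were issued by the control unit.
            if t - k >= 0 and timing[t - k]:
                regs[k] += abs(block[addrs[t - k]] - pixel)
    return regs


# Hypothetical data: the target block is a copy of the block at pixel 2.
ref = [(37 * n) % 251 for n in range(REF_W * REF_W)]
block = [ref[REF_W * j + 2 + i] for j in range(BLK) for i in range(BLK)]
sads = simulate(ref, block)
```

Under these assumptions each processor ends up with the degree of difference for its own block after a single 28-cycle pass, with only one pair of address generators in the control unit.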

[0043] FIG. 4 shows a time chart of the combined operation of the first address generator 26, the first unit processor 1, the second unit processor 2, the third unit processor 3, and the fourth unit processor 4.

[0044] The above describes the computation of equation (1) for the partial reference image 101a of FIG. 2(b). It covers only the four blocks of the reference image 101 whose upper-left pixels are pixels 0, 1, 2, and 3. As stated earlier, the reference image 101 contains 64 blocks, so the computation must also be performed for the remaining 60 blocks.

【００４５】注目ブロック１００は変わらず、部分参照
画像１０１ａのみが変わる。即ち、図２（ｂ）の画素
４，画素５，画素６，画素７を各々左上とする４個のブ
ロックを対象とした計算を、第１の単位プロセッサ１，
第２の単位プロセッサ２，第３の単位プロセッサ３，第
４の単位プロセッサ４に新たに割り当てることとなる。
即ち図６に示す部分参照画像１０２を、第１のアドレス
発生器２６のアドレス発生によって、制御部ローカルメ
モリ２５から読み出すことである。そのアドレス発生の
様子は、アドレスの初期値が４であることを除いて図２
（ｃ）と同様である。つまりアドレス発生に先だって、
アドレスの初期値を４に設定する命令語の実行が必要と
なる。その後の動作は、既に説明した図２（ｂ）の部分
参照画像１０１ａの場合と同様である。The block of interest 100 does not change; only the partial reference image 101a changes. That is, the calculations for the four blocks whose upper-left pixels are pixel 4, pixel 5, pixel 6, and pixel 7 of FIG. 2(b) are newly assigned to the first unit processor 1, the second unit processor 2, the third unit processor 3, and the fourth unit processor 4, respectively. In other words, the partial reference image 102 shown in FIG. 6 is read from the control unit local memory 25 using addresses generated by the first address generator 26. The address generation is the same as in FIG. 2(c), except that the initial address value is 4. Thus, before address generation begins, an instruction word that sets the initial address value to 4 must be executed. The subsequent operation is the same as for the partial reference image 101a of FIG. 2(b), already described.
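The role of the initial address value can be sketched as follows. The sketch assumes that the reference image is stored in raster order with a row stride of 11 addresses and that pixel numbers coincide with addresses. Neither is stated in this paragraph, although both are consistent with the 11-cycle row step mentioned later in the text. The function name `rectangular_addresses` is hypothetical.

```python
def rectangular_addresses(start, width=7, height=4, row_stride=11):
    """Addresses of a width x height window in a raster-stored image.

    `row_stride=11` is an assumption about how the reference image is laid out.
    """
    return [start + row * row_stride + col
            for row in range(height)
            for col in range(width)]

# Same pattern, different initial value: only the starting point moves.
window_a = rectangular_addresses(0)   # stands in for partial reference image 101a
window_b = rectangular_addresses(4)   # stands in for partial reference image 102
print(window_a[:7])
print(window_b[:7])
```

Because only `start` differs between the two calls, a single instruction that reloads the initial address is enough to move on to the next partial reference image.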

【００４６】以降、第１のアドレス発生器２６のアドレ
ス初期値の設定を変えることにより、読み出すべき部分
参照画像の位置をずらして行くことで、参照画像１０１
全体を処理対象に用いることができる。これは部分参照
画像の大きさが横７画素×縦４画素で一定であり、その
左上の位置のみが、図７の円で囲んだ画素に逐次移って
行くからである。この図より部分参照画像は参照画像１
０１中に１６個存在し、各部分参照画像が、第１の単位
プロセッサ１，第２の単位プロセッサ２，第３の単位プ
ロセッサ３，第４の単位プロセッサ４に割り当てられて
処理される。従って各単位プロセッサは、合計で参照画
像１０１中の１６個のブロックを処理することとなる。Thereafter, by changing the initial address value of the first address generator 26 and thereby shifting the position of the partial reference image to be read, the entire reference image 101 can be processed. This works because the size of the partial reference image is fixed at 7 pixels wide by 4 pixels high, and only its upper-left position moves successively through the circled pixels in FIG. 7. As the figure shows, the reference image 101 contains 16 partial reference images, each of which is assigned to and processed by the first unit processor 1, the second unit processor 2, the third unit processor 3, and the fourth unit processor 4. Each unit processor therefore processes a total of 16 blocks in the reference image 101.
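The count of 16 partial reference images and 64 blocks can be checked with a short enumeration. FIG. 7 is not reproduced here, so the positions below are an assumption: the blocks are taken to form an 8 × 8 grid of upper-left positions in an image with a row stride of 11, and the partial reference images are taken to start at columns 0 and 4 of each of the 8 rows. All names are hypothetical.

```python
ROW_STRIDE = 11      # assumed raster stride of the reference image
BLOCKS_PER_ROW = 8   # assumed: 64 blocks arranged as 8 x 8 upper-left positions
NUM_PES = 4          # four unit processors, one block each per partial image

def partial_image_starts():
    """Initial address values for the first address generator, one per partial image."""
    return [row * ROW_STRIDE + col
            for row in range(BLOCKS_PER_ROW)
            for col in range(0, BLOCKS_PER_ROW, NUM_PES)]

starts = partial_image_starts()
# Unit processor k (0-based) handles the block whose upper-left pixel is start + k.
block_origins = {start + pe for start in starts for pe in range(NUM_PES)}
print(len(starts), len(block_origins))
```

Under these assumptions every block position is covered exactly once, with each unit processor handling one block per partial reference image.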

【００４７】以上述べた動作は、参照画像１０１全体に
わたる相違度（式（１）の値）の計算である。その他に
も、この計算の実行に先立っての種々の初期設定，前述
したように式（１）の最小値を見つける処理等が必要で
ある。The operation described above is the calculation of the degree of difference (the value of equation (1)) over the entire reference image 101. In addition, various initial settings made before this calculation, as well as processing such as finding the minimum value of equation (1) mentioned earlier, are required.

【００４８】先ず初期値設定では以下のような命令語が
実行される。第１のアドレス発生器２６，タイミング信
号発生器２８，第２のアドレス発生器２７の各動作を決
めるレジスタ設定のための命令語（第１のアドレス発生
器２６ならばアドレスの初期値設定，縦横のアドレス加
算値，縦横のカウント値）、算術論理演算値１１ａに差
分絶対値演算を実行するように設定する命令語（あるい
はこれは差分絶対値演算が必要となるサイクルの度に、
制御信号を供給するという方法も考えられる）、セレク
タ１１ｄ，セレクタ１２ｂを各々データ入力端子１３
側，レジスタ１１ｃ側を選択するように設定する命令語
（上述同様、必要となるサイクルの度に適宜制御信号を
供給する方法も可能である）、レジスタ１１ｃの内容を
０にする（クリアする）命令語である。First, the initial settings execute the following instruction words: instruction words that set the registers determining the operation of the first address generator 26, the timing signal generator 28, and the second address generator 27 (for the first address generator 26, the initial address value, the horizontal and vertical address increments, and the horizontal and vertical count values); an instruction word that sets the arithmetic logic unit 11a to perform the absolute-difference operation (alternatively, a control signal could be supplied in each cycle in which the absolute-difference operation is required); an instruction word that sets the selector 11d and the selector 12b to select the data input terminal 13 side and the register 11c side, respectively (as above, control signals could instead be supplied as needed in each cycle); and an instruction word that sets the contents of the register 11c to 0 (clears it).

【００４９】注目ブロック１００に最も類似したブロッ
クを見つけるということは、参照画像１０１全体で６４
個ある相違度の中から最小値を見つけることである。こ
れを、各単位プロセッサが、自分が計算する１６個の相
違度の中から最小値を見つけ、最後にこれら４個の単位
プロセッサが見つけた最小値の中からさらに最小値を見
つけるという方法で行う。そのための各単位プロセッサ
ローカルメモリコア１２ａの同一のアドレスに、相違度
の最小値を記憶する変数領域を確保する必要がある。具
体的には初期設定として、表現し得る最も大きい値をそ
のアドレスに記憶する命令語を実行することである。Finding the block most similar to the block of interest 100 means finding the minimum among the 64 degrees of difference over the entire reference image 101. This is done in two steps: each unit processor finds the minimum among the 16 degrees of difference it calculates, and finally the minimum is found among the four minima found by the unit processors. For this purpose, a variable area for storing the minimum degree of difference must be reserved at the same address in the unit processor local memory core 12a of every unit processor. Specifically, as an initial setting, an instruction word is executed that stores the largest representable value at that address.

【００５０】最小値と共に、式（１）のｘ，ｙに相当す
る情報、即ち参照画像１０１中における各ブロックの位
置情報も記憶する必要があり、最小値と同様にそのため
の変数領域も確保する。部分参照画像の位置と各単位プ
ロセッサが扱うブロックの位置は１対１に対応するの
で、第１のアドレス発生器２６のアドレス初期値を各単
位プロセッサに供給し、これを位置情報として記憶する
という方法が可能である。Along with the minimum value, the information corresponding to x and y in equation (1), that is, the position of each block within the reference image 101, must also be stored, so a variable area is reserved for it in the same way as for the minimum value. Because the position of the partial reference image and the position of the block handled by each unit processor correspond one to one, one possible method is to supply the initial address value of the first address generator 26 to each unit processor and store it as the position information.

【００５１】また上記した、各単位プロセッサ毎での相
違度の最小値検出を行うために、各単位プロセッサは、
注目ブロック１００と部分参照画像中の自分に割り当て
られたブロックに対しての相違度の計算が完了する度
に、その時点での相違度の最小値と、たった今求められ
た最新の相違度との比較を行い、後者の方が小さければ
新たな最小値として記憶し、そうでなければ最小値はそ
のままとするという処理を行う。その際、セレクタ１１
ｄとセレクタ１２ｂの選択を共にレジスタ１１ｃ側に、
算術論理演算器１１ａの演算を減算にし、算術論理演算
器１１ａの右側の入力には単位プロセッサローカルメモ
リコア１２ａ内に記憶しているその時点での最小値を入
力し、左側の入力にはたった今求めた相違度を入力す
る。そして減算結果の正負によって、後者を最小値の記
憶領域に記憶する（最小値更新）か、何もしないかの処
理を選択する。To detect the minimum degree of difference within each unit processor as described above, each time a unit processor completes the calculation of the degree of difference between the block of interest 100 and the block assigned to it in the partial reference image, it compares the minimum degree of difference so far with the degree of difference just obtained. If the latter is smaller, it is stored as the new minimum; otherwise the minimum is left unchanged. For this comparison, the selector 11d and the selector 12b are both set to select the register 11c side, the operation of the arithmetic logic unit 11a is set to subtraction, the current minimum stored in the unit processor local memory core 12a is supplied to the right input of the arithmetic logic unit 11a, and the degree of difference just obtained is supplied to the left input. Depending on the sign of the subtraction result, the unit processor either stores the latter in the storage area for the minimum (updating the minimum) or does nothing.

【００５２】記憶領域を指定するアドレスは、命令語中
にアドレスをそのまま記述するものとし、第２のアドレ
ス発生器２７を通して単位プロセッサローカルメモリコ
ア１２ａに供給される、あるいは第２のアドレス発生器
２７と別のアドレス信号線を設けて供給される等、いく
つかの方法が考えられる。このアドレス信号と同時に読
み出し／書き込みイネーブル信号も各単位プロセッサに
供給されるものとする。各単位プロセッサが、上記の減
算結果の正負によって、単位プロセッサローカルメモリ
コア１２ａへの書き込みイネーブル信号の供給を制御す
ることにより、最小値の更新／更新無しの選択を行うこ
とができる。The address designating the storage area is written directly in the instruction word. Several ways of delivering it are conceivable: for example, it may be supplied to the unit processor local memory core 12a through the second address generator 27, or over an address signal line provided separately from the second address generator 27. A read/write enable signal is supplied to each unit processor together with this address signal. Each unit processor gates the write enable signal to its unit processor local memory core 12a according to the sign of the subtraction result, and thereby selects whether or not to update the minimum value.

【００５３】各単位プロセッサが扱う参照画像１０１中
におけるブロックの位置情報に関しては、上記の相違度
の最小値の更新／更新無しの処理の後、その時点の部分
参照画像の読み出しに用いた第１のアドレス発生器２６
のアドレス初期値を制御部６から各単位プロセッサに供
給し、各単位プロセッサでは相違度の最小値と同様にし
て更新／更新無しの処理を行う方法が可能である。As for the position information of the block in the reference image 101 handled by each unit processor, one possible method is as follows: after the update/no-update processing of the minimum degree of difference, the control unit 6 supplies to each unit processor the initial address value of the first address generator 26 that was used to read the current partial reference image, and each unit processor performs the same update/no-update processing on it as for the minimum degree of difference.
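The per-processor handling of the minimum value and its position information, as described in the preceding paragraphs, can be sketched in software as follows. This is an illustrative model only: the class name `UnitProcessorState`, the 16-bit `MAX_VALUE`, and the example values are assumptions, and the write-enable gating is modeled by an ordinary conditional.

```python
MAX_VALUE = 0xFFFF  # assumed 16-bit word; the text only says "largest representable value"

class UnitProcessorState:
    def __init__(self):
        self.min_value = MAX_VALUE   # variable area for the minimum degree of difference
        self.min_start = None        # variable area for the position information

    def update(self, new_value, start_address):
        # Subtraction in the ALU: left input = new value, right input = stored minimum.
        difference = new_value - self.min_value
        write_enable = difference < 0    # negative result: the new value is smaller
        if write_enable:
            self.min_value = new_value       # minimum updated
            self.min_start = start_address   # position updated the same way
        return write_enable

pe = UnitProcessorState()
for start, dissimilarity in [(0, 120), (4, 95), (11, 95), (15, 40)]:
    pe.update(dissimilarity, start)
print(pe.min_value, pe.min_start)
```

Since an update happens only on a strictly negative difference, an equal value leaves the earlier minimum and its position in place, which matches the "otherwise unchanged" rule in the text.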

【００５４】その後レジスタ１１ｃのクリア，セレクタ
１１ｄ及びセレクタ１２ｂの選択を各々元に戻すという
処理を行い、次の部分参照画像を対象とした相違度の計
算を行う。以降これを繰り返して、参照画像１０１全体
にわたって、各単位プロセッサが相違度の最小値を検出
して行く。The register 11c is then cleared, the selections of the selector 11d and the selector 12b are each restored, and the degree of difference is calculated for the next partial reference image. This is repeated so that each unit processor tracks the minimum degree of difference over the entire reference image 101.

【００５５】以上の各単位プロセッサにおける最小値検
出及びブロックの位置情報の処理に必要となる制御信号
の供給方法としては、以下の２通りが考えられる。第１
は、第２のアドレス発生器２７のアドレスおよびタイミ
ング信号発生器２８の信号と同様に、第１の遅延器１
８，第２の遅延器２１と同一の遅延を各単位プロセッサ
毎に持たせて、第１の単位プロセッサ１が相違度の計算
を完了した時点で、制御部６が第１の単位プロセッサ１
のみに供給する方法である。第２の方法は、第４の単位
プロセッサ４が相違度の計算を完了した時点で、最小値
検出の処理に必要となる制御信号を、制御部６が４個の
単位プロセッサに同時に供給する方法である。Two methods are conceivable for supplying the control signals needed for the minimum-value detection and block-position processing in each unit processor. In the first, as with the address from the second address generator 27 and the signal from the timing signal generator 28, each unit processor is given a delay identical to that of the first delay device 18 and the second delay device 21, and the control unit 6 supplies the control signals only to the first unit processor 1 at the point when the first unit processor 1 completes its degree-of-difference calculation. In the second, the control unit 6 supplies the control signals needed for minimum-value detection to all four unit processors simultaneously at the point when the fourth unit processor 4 completes its degree-of-difference calculation.

【００５６】以上の方法により、参照画像１０１の全体
にわたって各単位プロセッサが最小値およびブロックの
位置情報の検出が完了したら、各単位プロセッサが持つ
これらの情報を、各単位プロセッサの番号と共に第２の
データバス２４を介して制御部６に集め（図には示して
いないが、第４の単位プロセッサ４の第２のデータ出力
端子２３を制御部６にフィードバックするなどして）、
その中から相違度の最小値を見つけ、さらにこれに対応
するブロックの位置情報と単位プロセッサの番号を基
に、ｘとｙを求めることができる。Once each unit processor has finished detecting its minimum value and block position information over the entire reference image 101 by the above method, this information is collected from each unit processor, together with the unit processor's number, into the control unit 6 via the second data bus 24 (for example, by feeding the second data output terminal 23 of the fourth unit processor 4 back to the control unit 6, although this is not shown in the figure). The minimum degree of difference is then found among them, and x and y can be obtained from the corresponding block position information and unit processor number.
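The final step in the control unit can be sketched as below. The mapping from position information to x and y is an assumption: the stored position is taken to be the initial address of the partial reference image, the block handled by the k-th unit processor (0-based) is taken to start k addresses later, and the image is taken to be stored in raster order with a row stride of 11. The function name `best_match` and the example values are hypothetical.

```python
ROW_STRIDE = 11  # assumed raster stride of the reference image

def best_match(results):
    """results[k] = (minimum value, stored initial address) from unit processor k (0-based)."""
    pe_index, (value, start) = min(enumerate(results), key=lambda item: item[1][0])
    origin = start + pe_index          # upper-left address of the winning block
    y, x = divmod(origin, ROW_STRIDE)  # row and column within the reference image
    return value, x, y

collected = [(40, 15), (37, 48), (52, 0), (37, 4)]
print(best_match(collected))
```

Here the second unit processor wins with a value of 37, so the block origin is address 48 + 1 = 49, that is, column 5 of row 4 under the assumed layout.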

【００５７】パタンマッチングの他の例として以下の処
理がある。図８に示すように、参照画像内のブロックの
取り方として、実線の円が示すような縦横とも１画素お
きとなる画素を左上の画素とするブロックを取り、これ
らと注目ブロックとのパタンマッチングを先ず行う。そ
の結果、画素Ｘが最も類似していたとし、その周囲の８
個の画素（画素ａから画素ｈ）を左上の画素とするブロ
ックに関して再びパタンマッチングを行うという２段階
の方法がある。これらは演算量を削減するための手法で
ある。Another example of pattern matching is the following. As shown in FIG. 8, blocks in the reference image are first taken at every other pixel both vertically and horizontally, that is, blocks whose upper-left pixels are those marked by solid circles, and these are pattern-matched against the block of interest. Suppose that, as a result, the block at pixel X is the most similar. Pattern matching is then performed again on the blocks whose upper-left pixels are the eight surrounding pixels (pixel a through pixel h). This two-stage method is a technique for reducing the amount of computation.

【００５８】この手法を、第２および第５の発明によ
る、各単位プロセッサの遅延器の遅延量を独立して設定
できるという特徴によって、以下のように処理すること
ができる。この場合８個の単位プロセッサを備えるもの
とし、画素ａから画素ｈを左上とする各ブロックを、第
１の単位プロセッサから第８の単位プロセッサで処理す
るものとする。This technique can be handled as follows using the feature of the second and fifth inventions that the delay amount of the delay device in each unit processor can be set independently. Here, eight unit processors are assumed, and the blocks whose upper-left pixels are pixel a through pixel h are processed by the first through eighth unit processors, respectively.

【００５９】１段階目においては、参照画像内の各ブロ
ックを各単位プロセッサが正しく取り込むためには、第
１の遅延器１８および第２の遅延器２１の遅延量を２サ
イクル分とする必要がある。２段階目では、図８の８個
の画素を取り込むためには、画素ａ，画素ｂ，画素ｃ間
の遅延、即ち第１の単位プロセッサ，第２の単位プロセ
ッサの遅延は１サイクル分とし、画素ｃ，画素ｄ間の遅
延，即ち第３の単位プロセッサの遅延は９サイクル分と
し（参照画像が図２（ｂ）のように記憶されている場
合、画素ｃから画素ｄへは、下に１画素で１１サイク
ル，左に２画素で−２サイクルであるから）、画素ｄ，
画素ｅ間の遅延，即ち第４の単位プロセッサの遅延は２
サイクル分とし、画素ｃ，画素ｆ間の遅延は再び９サイ
クルとし、画素ｆ，画素ｇ，画素ｈ間の遅延は再び１サ
イクルの遅延とする。In the first stage, for each unit processor to capture its block in the reference image correctly, the delay amounts of the first delay device 18 and the second delay device 21 must be set to two cycles. In the second stage, to capture the eight pixels of FIG. 8, the delays between pixel a, pixel b, and pixel c, that is, the delays of the first and second unit processors, are set to one cycle; the delay between pixel c and pixel d, that is, the delay of the third unit processor, is set to 9 cycles (when the reference image is stored as in FIG. 2(b), moving from pixel c to pixel d is one pixel down, which is 11 cycles, and two pixels left, which is −2 cycles); the delay between pixel d and pixel e, that is, the delay of the fourth unit processor, is set to 2 cycles; the delay between pixel e and pixel f is again set to 9 cycles; and the delays between pixel f, pixel g, and pixel h are again set to one cycle.
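The delay amounts quoted above follow from the raster distance between consecutive pixels. The sketch below recomputes them under two assumptions: a row stride of 11 cycles, as given in the text, and a layout in which pixels a, b, c lie on the row above X, pixels d and e lie to its left and right, and pixels f, g, h lie on the row below. FIG. 8 is not reproduced here, so that layout is inferred from the stated delays. The names `NEIGHBOURS` and `hop_delays` are hypothetical.

```python
ROW_STRIDE = 11  # one pixel down costs 11 cycles, as stated in the text

# Assumed (dx, dy) of pixels a..h relative to pixel X.
NEIGHBOURS = {
    "a": (-1, -1), "b": (0, -1), "c": (1, -1),
    "d": (-1, 0),                "e": (1, 0),
    "f": (-1, 1),  "g": (0, 1),  "h": (1, 1),
}

def hop_delays(offsets, row_stride=ROW_STRIDE):
    """Delay in cycles between each pair of consecutive upper-left pixels."""
    raster = [dy * row_stride + dx for dx, dy in offsets]
    return [later - earlier for earlier, later in zip(raster, raster[1:])]

second_stage = hop_delays(NEIGHBOURS.values())
first_stage = hop_delays([(0, 0), (2, 0), (4, 0)])  # blocks taken at every other pixel
print(second_stage)
print(first_stage)
```

The second-stage list gives the per-hop delays a→b, b→c, c→d, d→e, e→f, f→g, g→h, which is why the delay devices must be settable independently for each unit processor.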

【００６０】以上の遅延量の設定によって画素ａから画
素ｈに対して相違度を計算し、１段階目の画素Ｘの相違
度と付随する情報を保持しておくものとして、合計９個
の相違度から最小値を見い出すことにより、最終的な最
終値とそのブロックの位置を求めることができる。With the delay amounts set as above, the degrees of difference are calculated for pixel a through pixel h. Assuming that the degree of difference for pixel X from the first stage and its associated information have been retained, the final minimum value and the position of its block can be obtained by finding the minimum among the nine degrees of difference in total.

【００６１】また図１に示す実施例では、演算部１１に
は算術論理演算部１１ａ，加算器１１ｄ，レジスタ１１
ｃのみを搭載しているが、乗算器を搭載して、乗算結果
を加算器１１ｂおよびレジスタ１１ｃによって累算する
ことにより、積和演算器を構成すれば、フィルタ処理も
可能となる。In the embodiment shown in FIG. 1, the arithmetic unit 11 contains only the arithmetic logic unit 11a, the adder 11b, and the register 11c. If a multiplier is added and its products are accumulated by the adder 11b and the register 11c to form a multiply-accumulate unit, filter processing also becomes possible.
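As a rough software analogue of the multiply-accumulate arrangement mentioned above, the following sketch accumulates products the way an adder feeding a register would. The function name `multiply_accumulate` and the sample filter coefficients are illustrative assumptions, not part of the embodiment.

```python
def multiply_accumulate(pixels, coefficients):
    """Sum of pixel * coefficient, accumulated one product per step."""
    accumulator = 0                            # stands in for the cleared register 11c
    for pixel, coefficient in zip(pixels, coefficients):
        product = pixel * coefficient          # the added multiplier
        accumulator = accumulator + product    # adder 11b writing back to register 11c
    return accumulator

# A 3-tap smoothing filter applied at one position (unnormalized).
print(multiply_accumulate([10, 20, 30], [1, 2, 1]))  # prints 80
```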

【００６２】[0062]

【発明の効果】以上説明したように、本発明によるＳＩ
ＭＤプロセッサは、画像処理では、実施例で説明したパ
タンマッチングのように、メモリ中の画素データに対し
て矩形状にアクセスすることが多く行われる。このよう
なアクセスを可能にするアドレス発生器は、カウンタ，
レジスタ，加算器等を備える必要があり、ハードウェア
量が大きくなる。本発明によれば、このようなアドレス
発生器を単位プロセッサには搭載せず、制御部のみに搭
載することで、画像処理に必要な画素データへのアクセ
スが可能となる。しかもどのようなアドレス発生のパタ
ン（矩形の大きさや位置等）を設定するかは、第１の単
位プロセッサのみについて考慮すれば良い。それ以降の
単位プロセッサは第１の単位プロセッサと同様の動作が
単に遅延を持って実行されるということのみ考慮すれば
良いので、アプリケーションプログラムの開発も容易で
ある。As described above, image processing often accesses pixel data in memory in rectangular patterns, as in the pattern matching described in the embodiment. An address generator that enables such access needs counters, registers, adders, and the like, so its hardware is large. According to the present invention, such an address generator is mounted only in the control unit and not in the unit processors, yet the pixel data needed for image processing can still be accessed. Moreover, the address generation pattern to be set (the size and position of the rectangle, and so on) need only be considered for the first unit processor. For the subsequent unit processors, it suffices to consider that the same operation as in the first unit processor is simply executed with a delay, so application programs are also easy to develop.

[Brief description of drawings]

【図１】本発明の一実施例を示す図である。FIG. 1 is a diagram showing an embodiment of the present invention.

【図２】実施例のパタンマッチングに必要なアドレス発
生を説明するための図である。FIG. 2 is a diagram for explaining address generation necessary for pattern matching in the embodiment.

【図３】第１のアドレス発生器２６の動作のフローチャ
ートを示す図である。FIG. 3 is a diagram showing a flowchart of the operation of the first address generator 26.

【図４】本発明のＳＩＭＤプロセッサの実施例における
動作のタイムチャートを示す図である。FIG. 4 is a diagram showing a time chart of the operation in the embodiment of the SIMD processor of the present invention.

【図５】タイミング信号発生器２８の動作のフローチャ
ートを示す図である。FIG. 5 is a diagram showing a flowchart of the operation of the timing signal generator 28.

【図６】実施例のパタンマッチングに必要なアドレス発
生を説明するための図である。FIG. 6 is a diagram for explaining address generation necessary for pattern matching in the embodiment.

【図７】実施例のパタンマッチングに必要なアドレス発
生を説明するための図である。FIG. 7 is a diagram for explaining address generation necessary for pattern matching in the embodiment.

【図８】実施例のパタンマッチングの他の手法における
単位プロセッサ内の遅延量設定を説明するための図であ
る。FIG. 8 is a diagram for explaining a delay amount setting in a unit processor in another method of pattern matching of the embodiment.

[Explanation of symbols]

１，２，３，４単位プロセッサ５単位プロセッサ群６制御部１１演算部１１ａ算術論理演算器１１ｂ加算器１１ｃレジスタ１２単位プロセッサローカルメモリ部１２ａ単位プロセッサローカルメモリコア１３データ入力端子１４データ出力端子１５データバス１６アドレス入力端子１７アドレス出力端子１８第１の遅延器１９演算制御信号入力端子２０演算制御信号出力端子２１第２の遅延器２２第２のデータ入力端子２３第２のデータ出力端子２４第２のデータバス２５制御部ローカルメモリ２６第１のアドレス発生器２７第２のアドレス発生器２８タイミング信号発生器
1, 2, 3, 4 Unit processor
5 Unit processor group
6 Control unit
11 Arithmetic unit
11a Arithmetic logic unit
11b Adder
11c Register
12 Unit processor local memory unit
12a Unit processor local memory core
13 Data input terminal
14 Data output terminal
15 Data bus
16 Address input terminal
17 Address output terminal
18 First delay device
19 Operation control signal input terminal
20 Operation control signal output terminal
21 Second delay device
22 Second data input terminal
23 Second data output terminal
24 Second data bus
25 Control unit local memory
26 First address generator
27 Second address generator
28 Timing signal generator

Claims

[Claims]

1. A SIMD processor comprising a unit processor group consisting of N unit processors (N being a natural number) and a control unit, wherein: the control unit comprises an address generator and a control unit local memory; each unit processor of the unit processor group comprises a unit processor local memory unit, an arithmetic unit, an address input terminal, an address output terminal, a delay device, a data input terminal, a data output terminal, and a data bus; the delay device delays the signal input to the address input terminal by a predetermined period and outputs it to the address output terminal; the signal input to the address input terminal is supplied to the unit processor local memory unit as an address; the data input terminal and the data output terminal are connected by the data bus; the data bus supplies data to the unit processor local memory unit; the data bus and the unit processor local memory unit supply data to the arithmetic unit; the address generator supplies an address to the address input terminal of the first unit processor of the unit processor group; the address output terminal of the i-th unit processor (i being a natural number from 1 to N−1) of the unit processor group is connected to the address input terminal of the (i+1)-th unit processor of the unit processor group; and the data output terminal of the i-th unit processor is connected to the data input terminal of the (i+1)-th unit processor.

2. The SIMD processor according to claim 1, wherein the delay period of the delay device of each unit processor of the unit processor group is determined independently for each unit processor.

3. The SIMD processor according to claim 1, wherein: the control unit comprises a timing signal generator; the output signal of the timing signal generator alternates between a first state and a second state, each lasting a predetermined arbitrary period; the output signal of the timing signal generator is supplied to the address generator; and the address generator stops address generation while the output signal of the timing signal generator is in the second state.

4. The SIMD processor according to claim 3, wherein: each unit processor of the unit processor group comprises an operation control signal input terminal, an operation control signal output terminal, and a second delay device; the second delay device delays the signal input to the operation control signal input terminal by a predetermined period and outputs it to the operation control signal output terminal; the signal input to the operation control signal input terminal is supplied to the arithmetic unit; the arithmetic unit stops operating while the signal input to the operation control signal input terminal is in the second state; the timing signal generator supplies its output signal to the operation control signal input terminal of the first unit processor of the unit processor group; and the operation control signal output terminal of the i-th unit processor is connected to the operation control signal input terminal of the (i+1)-th unit processor.

5. The SIMD processor according to claim 4, wherein the delay period of the second delay device of each unit processor of the unit processor group is determined independently for each unit processor.

6. The SIMD processor according to claim 1, wherein in each unit processor of the unit processor group, an output of the arithmetic unit is supplied to the unit processor local memory unit.

7. The SIMD processor according to claim 1, wherein: each unit processor of the unit processor group comprises a second data input terminal, a second data output terminal, and a second data bus; the second data input terminal and the second data output terminal are connected by the second data bus; the arithmetic unit supplies data to the second data bus; and the second data output terminal of the i-th unit processor is connected to the second data input terminal of the (i+1)-th unit processor.

8. The SIMD processor according to claim 1, wherein the control unit includes a second address generator, and the second address generator supplies an address to the control unit local memory.

9. The SIMD processor according to claim 1, wherein, in each unit processor of the unit processor group, the data input terminal, the data bus, the data output terminal, the address input terminal, the delay device, and the address output terminal are multiplexed.

10. The SIMD processor according to claim 4, wherein, in each unit processor of the unit processor group, the operation control signal input terminal, the second delay device, and the operation control signal output terminal are multiplexed.