JPH09511078A - Signal processing method and apparatus

Info

Publication number: JPH09511078A
Application number: JP7516374A
Authority: JP
Inventors: Avidan Akerib (アケリブアヴィダン)
Original assignee: ASP Solutions USA, Inc. (エーエスピーソルーションユーエスエイインコーポレイション)
Priority date: 1993-12-12
Filing date: 1994-12-09
Publication date: 1997-11-04
Also published as: US5974521A; EP0733233A4; US6460127B1; EP0733233A1; WO1995016234A1; US5809322A; AU1433495A

Abstract

(57) [Summary] ASP (Associative Signal Processing) apparatus for processing an input signal that includes a plurality of samples. The apparatus includes a two-dimensional array of processors, each processor 20 including a multiplicity of content addressable memory cells, each sample of the input signal being processed by at least one processor, and a register array including at least one register that stores arriving responses and provides communication between non-adjacent processors within one cycle.
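
The associative processing summarized in the abstract can be illustrated with the Description's own example of adding 3 to 10,000 numbers between 1 and 5. The following is a minimal software sketch under stated assumptions, not the patented hardware: the class `AssociativeMemory`, its `compare` and `write` methods, and the helper `add_constant` are hypothetical names introduced here, and each compare or write over the whole memory is simply counted as one machine cycle, as in the Description.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AssociativeMemory:
    """Toy model of an 'intelligent memory' (hypothetical, for illustration only)."""
    words: List[int]
    tags: List[bool] = field(default_factory=list)
    cycles: int = 0

    def compare(self, comparand: int) -> None:
        # Every word checks itself against the comparand at once: one cycle.
        self.tags = [w == comparand for w in self.words]
        self.cycles += 1

    def write(self, value: int) -> None:
        # Every tagged word is overwritten with the same value: one cycle.
        self.words = [value if t else w for w, t in zip(self.words, self.tags)]
        self.cycles += 1


def add_constant(mem: AssociativeMemory, low: int, high: int, k: int) -> None:
    """Add a positive constant k to every word whose value lies in [low, high]."""
    # Query from the highest value down, so a freshly written result
    # (e.g. 2 -> 5) is never matched by a later query.
    for v in range(high, low - 1, -1):
        mem.compare(v)    # "Who is v? Identify yourselves."
        mem.write(v + k)  # "Become v + k."


data = [(i % 5) + 1 for i in range(10000)]
mem = AssociativeMemory(list(data))
add_constant(mem, 1, 5, 3)
print(mem.cycles)  # cycle count is independent of the number of words
```

The cycle count depends only on the number of distinct values queried (five compares and five writes), not on how many words the memory holds, which is the point of the comparison with the 10,000 to 30,000 cycles of the conventional machine.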

Description

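[Detailed Description of the Invention]

Title of the Invention

Signal processing method and apparatus

Field of the Invention

The present invention relates to signal processing methods and apparatus.

Background of the Invention

Systems and methods of computer vision, and methods of associative processing, are described in the publications listed below, whose disclosures are incorporated by reference. Image processing techniques and other subject matter useful for associative signal processing are described in the references listed below. All of the above references and the publications cited herein are incorporated by reference. Numbers in square brackets in the text below refer to these references.

Summary of the Invention

The present invention seeks to provide improved signal processing methods and apparatus.

ASP (Associative Signal Processing) can be contrasted with the architecture of a conventional parallel computer comprising memory and a CPU (Central Processing Unit). There the CPU does the computing and the memory is a simple mechanism that only stores data. The ASP architecture is entirely different. Computation is carried out by an "intelligent memory", so the CPU is replaced by a simple controller that merely manages this memory. In addition to reading and writing, the memory can identify and modify its own contents according to instructions received from the controller.

For example, suppose there is an array of 10,000 numbers between 1 and 5, and 3 must be added to the whole array. In a conventional parallel computer each number is transferred from memory to the CPU, 3 is added, and the result is returned to memory. Each number needs 1 to 3 machine cycles, so the whole array needs 10,000 to 30,000 machine cycles.

In associative processing the 10,000 numbers are stored in the intelligent memory. The controller asks five questions and issues five commands, as follows. "Who is 5? Identify yourselves." This takes one machine cycle. The controller then issues a command to all memory cells that identified themselves, and they become "8". The controller continues with "Who is 4?" and commands "Become 7", and so on until the whole array has been processed. Whereas the conventional parallel computer needs 10,000 to 30,000 machine cycles, this procedure needs only 10. All arithmetic and logical operations can be performed with this basic instruction set of read, identify and write.

In one embodiment of the invention, ASP apparatus for processing an input signal comprises an array of processors, each processor including a multiplicity of associative memory cells and each sample of the input signal being processed by at least one processor; a register array including at least one register that stores responses arriving from the processors and provides communication between the register and the processors; and an I/O buffer register for inputting and outputting signals, wherein the processor array, the register array and the I/O buffer register are arranged on a single module.

Also, in one embodiment, the ASP apparatus includes a processor array, a register array and an I/O buffer register; each processor of the array includes a multiplicity of associative memory cells, and at least one processor processes more than one input sample; the register array includes at least one register that stores responses arriving from the processors and provides communication between processors; and the I/O buffer register inputs and outputs signals.

Further, in one embodiment, the processor array, the register array and the I/O buffer register are arranged on a single chip.

Further, in one embodiment, the register array performs at least one multicell shift operation.

Also, in one embodiment, the ASP apparatus includes an array of associative memory words, each word including a processor, together with a register array and an I/O buffer register; each sample of the input signal is processed by at least one of the processors; the register array includes at least one register that provides communication between words and performs at least one multicell shift operation; and the I/O buffer register inputs and outputs signals.

Further, in one embodiment, the register array also performs single-cell shift operations.

Also, in one embodiment, the I/O buffer register and the processors operate in parallel.

In addition, in one embodiment, the word length of the I/O buffer register increases as the word length of the associative memory cells decreases.

Further, in one embodiment,
the apparatus can process video in real time.

Also, in one embodiment, the signal includes an image.

Further, in one embodiment, at least one word of the array of words includes at least one non-associative memory cell.

Also, in one embodiment, at least one word of the array of words includes at least one column of non-associative memory cells.

Further, in one embodiment, the array, the register array and the I/O buffer register are arranged on a single module.

Also, in one embodiment, the module has a bus that receives instructions and performs at least one multicell shift operation.

In addition, in one embodiment, the module has a first bus that performs at least one multicell shift operation and a second bus that performs at least one single-cell shift operation.

In one embodiment there is further provided an array of processors that communicate by multicell or single-cell shift operations. The array includes a plurality of processors, a first bus and a second bus. The first bus connects at least one pair of processors and can perform at least one multicell shift operation; the second bus connects at least one pair of processors and can perform at least one single-cell shift operation.

In addition, in one embodiment, the signal processing includes the following: for each consecutive pair of first and second signal characteristics in a sequence of signal characteristics, counting the number of samples having the first characteristic and then counting the number of samples having the second characteristic.

Further, in one embodiment, the counting includes generating a histogram.

Also, in one embodiment, the signal includes a color image.

Also, in one embodiment, the at least one characteristic includes at least one of intensity, noise and color density.

Further, in one embodiment, the method includes scanning a medium bearing the color image.

Also, in one embodiment, an edge detection method is provided that includes identifying a first plurality of edge pixels and a second plurality of edge pixels; identifying, in parallel, all edge pixels connected to at least one edge pixel; and repeating the second identifying step at least once.

In addition, in one embodiment, a signal processing method is provided that includes storing an indication that a first plurality of samples has a first characteristic; storing, in parallel for all individual samples connected to at least one sample having the first characteristic, an indication that the connected sample has the first characteristic; and repeating the second step at least once.

Further, in one embodiment, the signal includes an image and the first characteristic of the first samples is that they are edge pixels.

Also, in one embodiment, a feature labeling method is provided in which a signal is inspected, the signal including at least one feature and each feature including a set of connected samples. The method includes storing a plurality of indices corresponding to the plurality of samples; in parallel for each individual sample, replacing the stored index of that sample with the index of a sample connected to it, if the connected sample's index is ordered before the sample's own index; and repeating the replacing step at least once.

Further, in one embodiment, the replacing is repeated only until few indices are replaced in an iteration.

Also, in one embodiment, the signal includes an image.

In addition, in one embodiment, the signal includes a color image.

Further, in one embodiment, the samples include pixels, the first characteristic includes at least one color component, and adjacency of pixels at least partly determines the connectivity of samples.

In addition, in one embodiment, the pixels form an image in which a boundary is defined, and the repetition continues until the boundary is reached.

Further, in one embodiment, the repetition is performed a predetermined number of times.

In addition, in one embodiment, an image acquisition method includes computing, for an output image produced by a distorting lens such as an HDTV lens, a transformation that compensates for the lens distortion, and applying the transformation in parallel to each of a plurality of pixels in the output signal.

Also provided is ASP apparatus including a plurality of comparing memory elements, each of which compares the contents of memory elements other than itself according to a user-selected logical criterion and thereby generates
a response when the comparing memory element meets the criterion, and a register operative to store the responses.

Further, in one embodiment, the criterion includes at least one logical operand.

Also, in one embodiment, at least one logical operand includes a reference to at least one memory element other than itself. For example, suppose the memory elements correspond respectively to the pixels forming a color image. The reference includes three specific pixel values A, B and C, and the user-selected logical criterion is, for instance, that the individual pixel has value A or its upper-right neighbor has value B, and its lower-left neighbor has value C.

Further, in one embodiment, each memory element includes at least one memory cell.

Also, in one embodiment, the comparing memory elements operate in parallel, each comparing the contents of a memory element other than itself with an individual reference.

Also, in one embodiment, an associative memory is provided that includes a PE array of a plurality of PEs (Processor Elements), each PE including a processor of variable size and a word of variable size that includes associative memory cells, the associative memory cells being arranged at the same location within each word, and the plurality of words together taking the form of a FIFO.

Further, in one embodiment, the variable-size word includes more than one associative memory cell.

Also, in one embodiment, a method for modifying the contents of a plurality of memory cells is provided, including performing a computation on each value stored in the memory cells and storing the result of the computation in the memory cells holding those values.

Further, in one embodiment, the storing is performed in parallel for all memory cells.

Also described here is a chip for multimedia and image processing applications. It is low-cost, low-power and compact, and is suited to high-performance real-time image processing, both in commercial multimedia and image processing applications and in high-end, powerful image processing.

The chip is a massively parallel processing chip: 1024 associative processors are packed into a single chip, which can process 1024 digital words in one machine cycle of the computer clock. The chip is designed to run a wide range of image processing and multimedia applications at real-time video rates. By contrast, conventional general-purpose parallel computing chips and digital signal processing (DSP) chips process only 1 to 16 words per machine cycle.

The chip's main instruction set consists of four basic commands, from which all arithmetic and logical instructions can be executed. The ability to pack more than a thousand processors onto a single chip is a further design advantage.

A single chip performs processing equivalent to 500 to 2000 MIPS (Million Instructions Per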
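Second). A system built on this chip can perform the kind of multimedia processing done by high-end computers at a fraction of their price.

The chip is built on a modular architecture and can easily be connected to one or more further chips to obtain higher performance (in linear proportion). Many chips can therefore be connected in parallel to raise overall performance linearly, up to the level of the most advanced supercomputers. Existing CPU chips and DSPs, by contrast, require a dedicated operating system when more than one chip is connected in parallel, and their performance improves only in proportion to the square root of the number of chips connected. Connecting more than two such chips calls for a supercomputer architecture.

The chip architecture allows data input and output to proceed in parallel with processing, in large volume. As an associative processor, each of the 1024 processors on the chip has its own internal memory and data path. The chip's data path architecture loads data into the internal processors in parallel, eliminating the bottleneck between memory and CPU that reduces the performance of conventional parallel computers to a fraction.

The chip consumes on average 1 watt while delivering the equivalent of 500 MIPS, which is 10 to 25 times better than conventional approaches and DSP chips.

Brief Description of the Drawings

The invention will be understood and appreciated from the following detailed description, taken together with the drawings:

FIG. 1 is a simplified functional block diagram of ASP apparatus constructed and operative in accordance with an embodiment of the invention.
FIG. 2 is a simplified flowchart of a method of using the apparatus of FIG. 1.
FIG. 3 is a simplified block diagram of ASP apparatus for processing an input signal, constructed and operative in accordance with an embodiment of the invention.
FIG. 4 is a simplified block diagram of an example use of the apparatus of FIG. 1.
FIG. 5 is a simplified block diagram of another example use of the apparatus of FIG. 1.
FIG. 6 is a simplified block diagram of part of the apparatus of FIG. 5.
FIG. 7 is a simplified block diagram of part of the apparatus of FIG. 6.
FIG. 8 is a simplified block diagram of another part of the apparatus of FIG. 6.
FIG. 9 is a simplified flowchart illustrating the operation of the apparatus of FIG. 5.
FIG. 10 illustrates the operation of part of the apparatus of FIG. 5.
FIG. 11 is a simplified block diagram of associative real-time vision apparatus constructed and operative in accordance with another embodiment of the invention.
FIG. 12 is a simplified illustration of the operation of the apparatus of FIG. 11 during compare and write commands.
FIG. 13 is a simplified illustration of interprocessor communication in part of the apparatus of FIG. 11.
FIG. 14 is a simplified block diagram of the chip interface and interprocessor connections in part of the apparatus of FIG. 11.
FIG. 15 is a simplified illustration of the automaton used to evaluate complexity for the apparatus of FIG. 11.
FIG. 16 is a simplified block diagram of an associative memory word format in part of the apparatus of FIG. 11.
FIG. 17 is a simplified block diagram of another associative memory word format in part of the apparatus of FIG. 11.
FIG. 18 is a simplified block diagram of yet another associative memory word format in part of the apparatus of FIG. 11.
FIG. 19 is a simplified block diagram illustrating implementation of a thresholding method using the apparatus of FIG. 11.
FIG. 20 (20A to 20F) is a simplified illustration of test templates for implementing a thinning method using the apparatus of FIG. 11.
FIG. 21 is a simplified block diagram illustrating implementation of a matching method using the apparatus of FIG. 11.
FIG. 22 is a simplified block diagram of still another associative memory word format in part of the apparatus of FIG. 11.
FIG. 23 is a simplified block diagram of an additional associative memory word format in part of the apparatus of FIG. 11.
FIG. 24 is a simplified block diagram of another associative memory word format in part of the apparatus of FIG. 11.
FIG. 25 is a graph comparing the execution time, on selected machines, of a stereo method using the apparatus of FIG. 11.
FIG. 26 is a graph comparing the complexity, on selected machines, of a stereo method using the apparatus of FIG. 11.
FIG. 27 is a simplified block diagram of a further associative memory word format in part of the apparatus of FIG. 11.
FIG. 28 is a simplified block diagram illustrating part of an edge detection method using the apparatus of FIG. 11.
FIG. 29 illustrates a further associative memory word format in part of the apparatus of FIG. 11, as a simplified block
diagram.
FIG. 30 is a simplified illustration of the pixels used in a method for processing a connected saliency network using the apparatus of FIG. 11.
FIG. 31 is a simplified block diagram of a further associative memory word format in part of the apparatus of FIG. 11.
FIG. 32 is a graph illustrating the normal parameterization of a straight line used in computing the Hough transform with the apparatus of FIG. 11.
FIG. 33 is a graph illustrating part of a method for constructing a convex hull using the apparatus of FIG. 11.
FIG. 34 is a simplified block diagram illustrating a method for processing an associative Voronoi diagram in part of the apparatus of FIG. 11.

Attached herewith are the following appendices, which aid in understanding and appreciating embodiments of the invention:

Appendix A is a listing of a subroutine called "sub.rtn", which is called from each of the listings of Appendices B to O.
Appendix B is a listing of an ASP method for generating a histogram.
Appendix C is a listing of an ASP method for 1D convolution.
Appendix D is a listing of an ASP method for a low-pass filter application of 2D convolution.
Appendix E is a listing of an ASP method for a Laplacian filter application of 2D convolution.
Appendix F is a listing of an ASP method for a Sobel filter application of 2D convolution.
Appendix G is a listing of an ASP method for curve propagation.
Appendix H is a listing of an ASP method for optical flow.
Appendix I is a listing of an ASP method for RGB to YUV conversion.
Appendix J is a listing of an ASP method for corner and line detection.
Appendix K is a listing of an ASP method for contour labeling.
Appendix L is a listing of an ASP method for saliency networks.
Appendix M is a listing of an ASP method for performing a Hough transform on signals arranged as straight lines.
Appendix N is a listing of an ASP method for performing a Hough transform on signals arranged as circles.
Appendix O is a listing of an ASP method for generating a Voronoi diagram.

Detailed Description of Preferred Embodiments

FIG. 1 is a simplified functional block diagram of ASP apparatus constructed and operative in accordance with an embodiment of the invention. The apparatus of FIG. 1 includes a simultaneously accessible FIFO (First-In First-Out) 10 or, more generally, a simultaneously accessible memory, which stores at least part of an input signal arriving over a bus called DBUS. The simultaneously accessible FIFO 10 feeds data to a PE array 16 comprising a plurality of PEs (Processor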
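Elements) 20, and the PEs 20 feed data to a data link 30. The data link 30 preferably also serves as a responder memory; alternatively, a separate responder memory may be provided. Each PE includes at least one associative memory cell and, more typically, a multiplicity of them; in the example, 72 associative memory cells.

Each PE 20 stores and processes a subportion of the image. The subportions stored and processed by all the PEs 20 together form the part of the input signal held at one time in the simultaneously accessible FIFO 10. For example, suppose there are 1024 PEs 20. If the processing task is simple enough for each PE to process two pixels at a time, the FIFO can hold a block of 2048 pixels of the color image at a time. If the task is so complex that two PEs are needed to process one pixel, the FIFO can hold only a smaller block of 512 pixels at a time.

The PEs 20 are controlled by a controller 40, which is typically connected to all PEs in parallel.

FIG. 2 is a simplified flowchart of one method of using the apparatus of FIG. 1. The first step of the method is step 54, in which the system receives a user-selected command sequence to be executed for each pixel of the black-and-white or color image being handled. The command sequence is stored in a command sequence memory 50. Typically the command sequence includes some or all of the following types of command:

(a) Compare: compare the contents of each of one or more PEs with a comparand and output whether the contents equal the comparand.
(b) Write: modify the contents of each of one or more PEs according to a write operand, if its own contents and/or the contents of neighboring PEs satisfy a logical criterion that precedes the write command.
(c) Single-cell shift: shift the contents of each of one or more PEs, via the data link 30, to the adjacent PE.
(d) Multicell shift: shift the contents of each of one or more PEs, via the data link 30, to a PE that is not directly adjacent.

Also in step 54, the first block of the input signal is received by the simultaneously accessible FIFO 10. The commands of the sequence are executed one at a time, as shown in FIG. 2.

FIG. 3 is a simplified block diagram of ASP apparatus for processing an input signal, constructed and operative in accordance with an embodiment of the invention. The signal processing apparatus of FIG. 3 includes the following components, all arranged on a single module, for example a single chip:

(a) An array 110 of processors or PEs 114, of which three are shown for simplicity. Each processor 114 includes a multiplicity of memory cells 120, of which four are shown for simplicity. At least one of the memory cells (exactly one in the illustrated embodiment) is an associative memory cell 122. As shown, the associative cell or cells 122 of each processor occupy the same position within their respective processors. In the example there are 1K (1024) processors 114, each including 72 memory cells 120, all of which are associative. Preferably at least one processor can process more than one sample of the input signal.
(b) A responder memory 130 including one or more registers, which can store responses arriving from the processors 120 and preferably serves as a data link between them. Alternatively, a separate data link between the processors may be provided. Preferably the data link function of memory 130 can perform at least one multicell shift operation, at 16 cells per shift operation. The data link function of memory 130 preferably also performs single-cell shift operations, in which data shifts in each cycle from one cell to the adjacent cell, or from one PE to the adjacent PE.
(c) A simultaneously accessible FIFO 140 or, more generally, a simultaneously accessible memory, which inputs and outputs signals.
(d) A response counting unit 150, which can count the number of "YES" responses in the responder memory.
(e) Comparand, mask and write operand registers 180, as described above with reference to FIG. 1.

A command sequence memory 160, similar to the command sequence memory 50 of FIG. 1, and a controller 170 are typically external to the module 104. The controller 170 can control the command sequence memory 160.

Methods for associative signal processing include the following:

(a) Low-level associative signal processing methods: histogram generation, 1D and 2D convolution, optical flow, and conversion between color spaces, such as RGB to YUV.
(b) Mid-level associative signal processing methods: corner and line detection, contour labeling, saliency networks, the Hough transform, and geometric tasks such as generating a convex hull or a Voronoi diagram.

Each of these methods is described below.

Histogram Generation

A histogram generation method is now described with reference to Appendix B, which presents it as software. The method of Appendix B consists of a very short loop repeated for each gray level. A COMPARE operation tags all pixels of that level and COUNTAG counts them. The count is automatically available to the controller, which accumulates the histogram in an external buffer.

Convolution

Low-level vision, particularly edge detection, requires applying various filters to the image, most conveniently by convolution. The image may be regarded as a simple vector of length N×M, or as the concatenation of N row vectors each of length M. Convolving an N-element data vector with a P-element filter yields a vector of length N+P−1, of which only the central N−P+1 elements, resulting from the region of complete overlap between the two vectors, are of interest.

The convolution filter vector [f], of length P and precision 8, is applied by the controller as an operand, one element at a time. The result is accumulated in a field "fd" of length, for example, 8+8+log₂(P). The "temp" bit is used for temporary storage of the carry propagating through field "fd". The "mark" bit is used to identify the region completely overlapped by the filter vector.

An example of a row convolution method is given in Appendix C. An example of a 2D convolution method implementing a low-pass filter is given in Appendix D. An example of a 2D convolution method implementing a Laplacian filter is given in Appendix E. An example of a 2D convolution method implementing the Sobel method of edge detection is given in Appendix F.

Curve Propagation

Curve propagation is useful for eliminating weak edges as noise while still tracing strong edges as they weaken. Two gradient-magnitude thresholds, "low" and "high", are computed on the basis of the signal statistics and an estimate of the noise in the image. Edges whose gradient magnitude is below "low" are eliminated; those above "high" are taken to be edges. A value between "low" and "high" is taken to be an edge if it can be connected to a "high" pixel through a chain of pixels above "low". Other intermediate pixels are eliminated. This process involves curve propagation.

The associative method, detailed in Appendix G, uses three flags:

(i) "E", which initially marks the pixels above the "high" threshold (definite edge points) and finally designates all selected edge points.
(ii) "OE" (Old Edge), which keeps track of the edges confirmed in the last iteration.
(iii) "L", which designates pixels above "low".

In every iteration, each "L" pixel checks whether there is at least one edge among its 8 neighbors and, if so, is declared an edge by setting "E". Before "E" is moved to "OE", the two flags are compared to see whether a steady state has been reached, at which point the process ends.

Optical Flow

Optical flow assigns to every point in the image a velocity vector describing its motion across the visual field. Potential applications of optical flow include target area tracking, target identification, moving image compression, autonomous robots and related areas. The theory of optical flow computation typically rests on two constraints: the brightness of an individual point in the image stays constant, and the flow of the brightness pattern varies smoothly everywhere. Horn and Schunck derived an iterative process to solve the constrained minimization problem. The flow velocity has two components (u, v). In each iteration a new set of velocities [u(n+1), v(n+1)] can be estimated from the average of the previous velocity estimates. An associative implementation of the Horn and Schunck method is given in Appendix H.

Color Space Conversion

One of the most important operations in color image processing is converting the 24-bit space into another space, for example (Y, U, V), which is suited to image compression. An associative method for color space conversion is given below as Appendix I.

Corner and Line Direction Detection

An important feature for mid-level or high-level processing is the ability to detect corners and line directions. In Canny edge detection the line direction is obtained as part of the process. The M&H algorithm, on the other hand, is not directional, and the edge bitmap it produces must be processed further to detect lines.

In one embodiment of the invention, a 9×9 edge bitmap around each pixel is used to discriminate the direction of line segments. The resulting method can typically distinguish 120 different lines. A program listing of this method is given below as Appendix J.

Contour Tracing and Labeling

In a preparatory step, each contour point is labeled with its xy coordinates. The process is generally iterative and operates in parallel on the 3×3 neighborhood of every contour point. Each contour point examines its 8 neighbors in turn and adopts a neighbor's label if it is smaller than its own. The cyclic sequence in which neighbors are handled speeds label propagation considerably. Iteration stops when no label changes any longer, at which point each contour is identified by its lowest coordinates. Within each contour, only the point with the lowest coordinates retains its original label. These lowest-coordinate points are kept track of and counted to obtain the number of contours in the image. An associative implementation of this method is given below as Appendix K.

Associative Saliency Network

Salient structures in an image can be perceived at once, without organized search or prior knowledge of their shape. Such structures stand out even when embedded in a cluttered background or when their elements are fragmented. Sha'ashua and Ullman proposed a global measure of saliency for curves, based on length, continuity and smoothness. The image is regarded as a network of N×N grid points. d orientation elements (segments or gaps) enter each point from neighboring points and, likewise, leave toward its neighbors. A curve of length L in the image is a connected sequence of orientation elements p(i), p(i+1), ..., p(i+L), each of which corresponds to a line element or a gap in the image. An associative implementation of this method is given below as Appendix L.

Hough Transform

The Hough transform finds curves whose shape is described by a parametric equation, such as straight lines or conic sections, even when they contain gaps. Each figure point in the image is transformed to a locus in parameter space. After the parameters are quantized over a suitable range, a histogram is generated giving the distribution of loci in parameter space. Occurrences of the target curve are marked by distinct peaks in the histogram (intersections of many loci).

For straight lines the usual normal parameterization is used:

X cos(A) + Y sin(A) = R

This equation specifies a line by R and the angle A, and the histogram covers lines in all directions A. However, if the candidate points come from an edge detection method that yields direction, the angle A is known. An associative implementation of this method is given below in Appendix M.

(Example) The main steps needed to perform a Hough transform on a 256×256-pixel sample are described below. The 256×256 image, with its origin at the center, is arranged pixel by pixel and row by row, one pixel per processor, across multiple chips. For example, if a chip contains 1K processors, 64 chips are used to hold 256×256 = 64K pixels. The xy coordinates are given as magnitude and sign in 8 bits. The angle A, from 0 to π (3.1415...), is given to 10-bit precision (excluding the sign of the angle). Sine and cosine are obtained by table lookup. Preferably the table size is reduced fourfold by exploiting the symmetry of the functions. After A is compared, the histogram is evaluated and read out element by element using the "countag" command.

(Example) Circles of given radius R and center x₀, y₀ can be found. The gradient direction can readily be used in the process:

dy/dx = −(x − x₀)/(y − y₀) = tan(T − π/2)

From this derivative, x₀ and y₀ are obtained as follows, where T is the angle:

x₀ = x ± R sin(T − π/2)
y₀ = y ± R cos(T − π/2)

The histogram is built over x₀, y₀. An associative implementation of this Hough transform is given in Appendix N.

Voronoi Diagram

This type of diagram is useful for proximity analysis. Given a set of L points in the plane, P(i), i = 1, 2, ..., L, the Voronoi diagram surrounds each point P(i) with a region R(i) such that every point in R(i) is closer to P(i) than to any other given point P(j), j = 1, 2, ..., L, j ≠ i. The boundaries of all the regions R(i) constitute the Voronoi diagram. Appendix O gives an associative method based on the "brush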
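fire" technique. Each given point acts as the source of a "fire" that spreads in all directions. The boundaries consist of the points where the fires from two (or three) sources meet. Every point of the given set is initially marked with a distinct color, for example its own xy coordinates. Each point in the image examines its 8 neighbors. A blank (uncolored) point that finds a colored neighbor copies its color. If both are colored, the colors are compared and, if they differ, the point marks itself as a Voronoi (boundary) point. The process is repeated until all points are colored.

The following are basic ASP steps frequently used in the methods of Appendices B to O. Any step in group 1 can be executed in parallel with all steps of group 3. Also, all steps of group 1, all steps of group 2 and all steps of group 4 can be executed in parallel.

To run any of the methods of Appendices B to O, the listing must be run on a "C" language compiler that supports CLASS functions, such as the Borland "C++" compiler.

As already noted, each of the methods of Appendices B to O includes the following steps:
a. a step defining the basic memory size;
b. a step defining the basic associative word length;
c. a step in which the subroutines of Appendix A are called;
d. steps specific to the individual application.

The specific embodiments described in the appendices are intended to disclose the invention in great detail and are not limiting.

One ASP chip is now described. The device is referred to here as the ASP100.

1. Introduction

The ASP100 is an associative processing chip. It is intended to serve as part of a vision associative computer, most commonly within an array of multiple ASP100 chips. The ASP100 consists of a 1K × 72-bit associative memory array, peripheral circuitry, an image FIFO I/O buffer and control logic.

2. Modes of Operation

The ASP100 operates either as a single chip (single-chip mode) or as part of an array of ASP100 chips (array mode).

2.1 Single-chip mode

Single-chip mode is shown in FIG. 4. In this mode a single ASP100 operates in conjunction with a controller.

2.2 Array mode

Array mode is shown in FIG. 5. An array of ASP100 chips is interconnected in parallel to form a linear array. A single controller chip or suitable circuitry controls the array.

3. Pinout

3.1 Basic pinout

The ASP100 is packaged in a 160-pin PQFP package. All output and I/O pins are 3-state and are gated by CS. The full pin list follows.

3.2 Test pinout

The following is a list of pins that serve as test points. They are omitted when the production version is packaged.

4. Architecture and Operation

4.1 Floor plan

The floor plan of the device is shown in FIG. 6. ARRAY is the main associative array. FIFO is the image input/output FIFO buffer. SIDE is the side array, consisting of the tag, tag logic, tag count, select-first, row drivers and amplifiers (for the write lines and match lines), and the shift unit. TOP consists of the mask and comparand registers and the column drivers (for the bit lines, inverse bit lines and mask lines). BOTTOM contains the output register and amplifiers. CONTROL is the chip's control logic. Microcontrol is external in this version.

4.2 ARRAY architecture

The array consists of 1024 × 72 APEs (Associative Processing Elements), organized as three columns each 24 APEs wide and physically divided into three blocks of 342 × 72 APEs. This six-way partition gives the layout a square aspect ratio and eases the loading of the vertical bus wires.

As explained in 4.3 below, one 24-bit sector of the array can be reconfigured (by the CONFIFO (configure FIFO) instruction) as follows:
1. All 24 bits belong to the FIFO (total ARRAY width 48).
2. 16 bits belong to the FIFO and 8 bits to the ARRAY (total ARRAY width 56).
3. 8 bits belong to the FIFO and 16 bits to the ARRAY (total ARRAY width 64).

An APE (Associative Processing Element) is a CAM (Content Addressable Memory) cell. It consists of a storage element, a write device and a match device. There are three vertical buses and four horizontal buses in the APE.
Vertical: bit line (BL), inverse bit line (IL), mask line (MASK).
Horizontal: write line (WL), match line (ML), VDD, VSS (GND).

The storage element consists of a pair of cross-coupled CMOS inverters. The write device is dynamic EXCLUSIVE-OR (XOR) logic. This technique permits an area-efficient and reliable implementation of the compare operation.

4.3 FIFO architecture

Referring to FIG. 7, the FIFO is designed to input and output image data in parallel with the computation performed in the ARRAY. The FIFO consists of a reconfigurable matrix of 1024 × 24 (or 16 or 8) APEs 190, three columns of 1024 bidirectional switches each, and an address generator 194. The corresponding part of the comparand register, in TOP, serves as the FIFO input register, and the corresponding part of the output register, in BOTTOM, serves as the FIFO output register. The FIFO controller (FC) is located in TOP.

The FIFO is configured by the CONFIFO instruction, where the three LSBs of the operand are:
6: 8-bit FIFO
5: 16-bit FIFO
3: 24-bit FIFO
7: no FIFO

When the FIFO is 8 bits wide, input VIN[0:7] and output VOUT[0:7] are routed to FIFO bits [0:7], where bit 0 is the leftmost bit. When the FIFO is 16 bits wide, input VIN[0:15] and output VOUT[0:15] are routed to FIFO bits [0:15], with the least significant byte [0:7] routed as in the previous case. Likewise, when the FIFO is 24 bits wide, the two least significant bytes are routed as in the 16-bit case.

The address generator 194 consists of a shift register and implements a sequential addressing mode. It selects the currently active FIFO word line.

The FIFO has two modes of operation, image I/O mode and image transfer mode. The bidirectional switches (one of the three columns) disconnect the matrix from the array in image I/O mode and connect the matrix to the array in image transfer mode, forming a combined array of APEs. The input and output registers act as buffer registers during image input and output.

Image I/O mode

In image I/O mode a new image is read into the FIFO while the previous image is written out. The FIFO controller (FC) controls the FIFO as follows. Pixel I/O is synchronous with CLK. The external control input RSTFIFO resets the address generator 194. FENB (which requires at least 2 CLK cycles) enables input and output of the next pixel on the positive edge of CLK. Once all pixels have been input (or output), FFUL is asserted for 2 CLK cycles. This I/O activity proceeds asynchronously with the computation in the rest of the chip.

The basic operation of image I/O mode is as follows. The pixel on the VIN pins is input to the FIFO input register (the FIFO part of the comparand register). The address generator 194 enables exactly one word line. The corresponding word is read into the FIFO output register (the FIFO part of the output register) and passes directly through it to the VOUT pins, in the same way as a read operation. The word in the FIFO input register is then written into the same word, in the same way as a write operation.

Note that the VOUT pins are 3-state; they are enabled or disabled internally as required. This sequence of operations is repeated 1024 times to fill the 1024 processors with data.

Multiple ASP100 chips can be chained together by an FENB/FFUL chain: the first ASP100 receives FENB (exactly 2 cycles) from the external controller, the FFUL (exactly 2 cycles) of each ASP100 is connected directly to the FENB input of the next chip, and the last FFUL returns to the controller.

Image transfer mode

In image transfer mode the image previously loaded into the FIFO is transferred to the ARRAY for subsequent processing, and the previously processed image is transferred from the ARRAY to the FIFO for subsequent output. These transfers are carried out via the tag register of the SIDE block by a sequence of compare and write operations, as follows.

Image input

The destination bit slice of the ARRAY is masked by the mask register and reset by a chain of SETAG, clear comparand, write operations (all executed in one cycle). The source bit slice of the FIFO matrix is masked by the mask register. The contents of the bit slice are passed to the tag register as the result of a compare operation. The destination bit slice is masked again, and the contents of the tag register are passed to the destination bit slice by a set comparand, write operation. In all, the following 5 cycles are used.

Image output

This operation is carried out exactly like image input, except that the source bit slice is located in the ARRAY and the destination bit slice in the FIFO matrix.

Note that image transfer requires two different fields in the ARRAY (the first is allocated to the new image and the second temporarily stores the processed image). The two operations (image input and output) are combined in a single loop.

4.4 SIDE architecture

FIG. 8 illustrates an implementation of the SIDE block of FIG. 6. The SIDE block includes the tag register, NEAR, FAR, COUNTTAG, the FIRST RESPONDER circuit, the RSP circuit, and the horizontal bus drivers and amplifiers.

The tag register consists of a column of 1024 tag cells. It is implemented with D flip-flops with a set input and non-inverted output. The input is selected by an 8-input multiplexer from the following: far north, near north, far south, near south, match line (through an amplifier), tag (feedback loop), GND (resets the tag) and the first-responder output. The mux is controlled by MUX[0:2].

4.5 TOP architecture

The TOP section contains the COMPARAND and MASK registers, each with its logic and vertical drivers.

The COMPARAND register holds the word to be compared with the ARRAY. It is 72 bits long and is partitioned according to the FIFO configuration (see 4.3). It is affected by the LETC, LETMC, LMCC, LMSC and LCSM instructions. All of these instructions affect only one of the three sectors at a time, according to the sector bits. The FIFO part of the COMPARAND operates differently, as described in 4.3.

The MASK register masks, with "0", the COMPARAND bits to be ignored during compare and write. The bit line and inverse bit line of a masked bit are held at "0". The register is affected by the LETM, LETMC, LMCC, LMSC, LCSM, LMX and SMX instructions. The first five affect only one sector at a time. LMX and SMX also clear the mask bits of the unaddressed sectors. The FIFO part of the MASK operates differently, as described in 4.3.

4.6 BOTTOM architecture

BOTTOM contains the amplifiers of the bit lines and inverse bit lines, the output register and its multiplexer, the DBUS multiplexer and the DBUS I/O buffer.

Because the ARRAY is physically organized in three columns, the outputs of the three amplifiers must be resolved. A logic unit selects which column actually drives the output, as follows:
READ: selects the column whose RSP is true.
FIFO OUT: selects the column containing the generated address.

The output register is 72 bits long. 8, 16 or 24 bits serve as the FIFO and are connected to the VOUT pins. In a READ operation one of the three sectors (according to the sector bits) is connected to the 24 DBUS bits through the multiplexer.

The DBUS multiplexer has two possible configurations:
(1) SHIFT: connects the south long-shift lines (from rows 1008:1023) to DBUS[31:16] and the north long-shift lines (from rows 0:15) to DBUS[15:0].
(2) READ: connects bits [0:15] of the output register (bit 0 is the LSB) to DBUS[15:0], and bits [23:16] to DBUS[23:16].

The DBUS I/O buffer controls whether DBUS is connected as input or as output, and is governed by the HIN and LIN control signals as follows.

In SHIFT mode, lines DBUS[31:16] of one ASP chip must be connected to lines DBUS[15:0] of the next chip. The external connection is not a simple fixed one; it is switched as required.

4.7 CONTROL architecture

The ASP100 is controlled by an external microcoded state machine and receives decoded control lines. The external microcontroller allows parallel execution and horizontal microprogramming.

The combined operation of the ASP100, the microcontroller and the external controller is organized as a five-stage instruction pipeline: fetch, decode, μfetch, compare and execute. In the fetch stage an instruction is fetched from external program memory and transferred over the system bus to the IR (Instruction Register). In the decode stage the instruction from the IR is decoded by the microcontroller and stored in the μIR. In the μfetch stage the control code is transferred from the external μIR through the input pads to the internal μIR. In the compare stage the parts affecting the comparand register are executed, and the control code moves from the internal μIR to μIR2. In the execute stage, execution takes place in the ARRAY and the other parts, as shown in FIG. 9.

4.8 Initialization

On reset, all internal registers of the ASP100 are set to 0, except the TAG and the memory array.

4.9 Operating notes

1. The chip is designed for alternating WRITE, COMPARE, WRITE, ... operation; two consecutive WRITEs or two consecutive COMPAREs cannot be executed.
2. In every instruction that includes a SET, such as SETAG, the corresponding signals LETM and/or LETC are set high. This is necessary because setting and resetting of the comparand and mask registers are synchronous.

4.10 Clock generator

Preferably a single 50 MHz CLKIN clock is input to the ASP100. A clock synchronization control signal, DCKIN, is also input. The CLKIN signal serves as the clock of the generator circuit. DCLKIN is an input signal (connected with the required setup and hold timing). The circuit produces two clocks, CLK and DCLK, of for example 25 MHz, delayed a quarter cycle relative to each other. CLK is fed back to a clock-generating pad to provide the required drive capability.

5. Programming Model

5.1 Instruction set

An example instruction set is described below. Two instruction formats are used. Instruction format A is used for group 1 and READ instructions, and instruction format B for the other groups. Format A contains 1 NOP bit, 5 OpCode bits, 2 sector bits and 24 operand bits. Format B contains 1 NOP bit, 7 OpCode bits and 24 operand bits.

As shown here, these formats do not allow several instructions to be executed in parallel. An alternative instruction format could allow several instructions to be executed in parallel.

In the tables below, d(n) is an n-bit argument with n ≤ 24, and s(2) is a 2-bit sector number.

Parallel ("horizontal") processing
1. Instructions of groups 1 and 3 can be executed in parallel.
2. Instructions of groups 1, 2 and 4 can be executed in parallel.

It will be appreciated that features of the invention which are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable combination.

The research on associative real-time vision by Dr. Avidan Akerib, which forms the main part of a doctoral thesis submitted to the Weizmann Institute of Science, Rehovot, Israel, is presented with reference to FIGS. 11 to 34.

The ARTVM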
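Architecture

The associative real-time vision machine (ARTVM) is endowed with a number of features that adapt it to vision requirements. These operating characteristics are emphasized in describing the machine's organization and its primitive operations.

Referring to FIG. 11, the core of the machine is a basic, classical associative processor that is parallel by both word and bit. The main associative primitive is COMPARE. The comparand register is matched against all words of memory simultaneously, and agreement is indicated by setting the corresponding tag bit. The comparison is carried out only in the bits indicated by the mask register and in the words indicated by the tag register. The status bit rsp signals that there is at least one match (FIG. 12).

The WRITE primitive operates in a similar manner. The contents of the comparand are written simultaneously into all words indicated by the tag and into all bits indicated by the mask (FIG. 12). The READ command is normally used to bring out a single word, the one pointed to by the tag.

Since the COMPARE-WRITE combination is of the "if condition then action" type, all logical and arithmetic functions can be implemented [3]. An associative machine can therefore be regarded as an array of simple processors, one for each word in memory. The ARTVM provides N×N words, one word for each pixel of the image to be processed, with the pixels arranged linearly, row after row.

The full machine instruction set is given in the following table. [table not reproduced]

Neighborhood operations, which play a key role in vision algorithms, require bringing data from neighboring pixels. Data communication is carried out one bit slice at a time via the tag register, by means of the SHIFTAG primitive, as shown in FIG. 13. The number of shifts applied determines the distance and relation between source and destination. When this relation is uniform, communication between all processors is simultaneous. Fortunately, neighborhood algorithms require only uniform communication patterns.

Since the image is two-dimensional while the tag register is one-dimensional, communication between neighbors in adjacent rows requires N shifts. To speed up these long shifts, a multiple-shift primitive, SHIFTAG(±b), is implemented in hardware, where b is a submultiple of N. The time complexity, in cycles, of shifting an N×N image by k places is

(M/2) [5 + k − ⌊k/b⌋ (b − 1)]

where M is the precision and b is the size of the multiple-shift primitive.

Loading image data into the associative memory and outputting the computed results could take up much of the processing time. This is avoided by distributing a frame buffer within the associative memory and giving it access to the tag register [33]. This is the function of the I/O buffer array, which consists of a 16-bit shift register attached to each word. A stereo image frame can be shifted into the buffer array as it is received and digitized, without interfering with associative processing. During vertical blanking, the stereo image frame is transferred to the associative memory one bit slice at a time, using the TAGXCH (tag exchange) primitive. In this command the buffer
array is rotated right via the tag register. If computed results of the previous frame are ready for output, they are placed in the tag by a compare instruction before the tag exchange is executed, so that the tag exchange outputs one bit slice at the same time as it brings one in; hence the name of the primitive. For a full stereo image this operation must be repeated 16 times. During the next frame time, input and output proceed in parallel without interfering with each other.

The following routine exchanges the contents of the buffer array with the contents of a 16-bit field of the associative memory starting at bit position io. [listing not reproduced]

The execution time is 64 machine cycles (under 2 μs), which is negligible compared with the vertical blanking period (1.8 ms). Whereas the sample routine exchanges data between the buffer array and a contiguous field of memory, the tag exchange primitive is completely flexible: it can fetch data from one field and put it into another, and both fields may be distributed.

In a given memory cycle, four operations can be executed concurrently: SETAG or SHIFTAG; loading M (SETX; LETX); loading C (SETX; LETX); and COMPARE, READ or WRITE. FIRSEL, which resolves multiple responses, is taken to execute in 6 cycles, and COUNTAG, which compiles the counts, in 12 cycles. Control functions are given in the C language and run concurrently with associative operations, so they do not contribute to execution time.

To illustrate the parallel processing capability, consider an associative memory containing two data vectors A and B of J elements and M-bit precision. We wish to replace vector field B with the sum A+B. The associative operation is carried out sequentially, one bit slice at a time, starting with the least significant bit. At each step the three slices A_i, B_i and C (i = 0, 1, ..., M−1, where C is the carry slice) are compared in parallel against an input combination of the truth table for addition, by the statements [letm d(.); letc d(.); setag; compare], followed by parallel replacement of B and C with the appropriate output combination, by the statements [letc d(.); write]. A full description of the routine is given under The Machine Simulator.

The execution time of addition in machine cycles is independent of the vector size J. Thus, for a 512×512 image (J = 2¹⁸) and a machine cycle of 30 nanoseconds (see VLSI Chip Implementation and Interconnection), the machine performs 125 billion 8-bit additions per second. Associative subtraction operates on the same principle and also executes in 8.5M cycles. It is straightforward to extend addition to multiplication (8.5M² cycles) and subtraction to division (15.5M² cycles). Multiplication techniques are discussed in detail under Vision Algorithms.

Since the ARTVM can be regarded as microprogrammed, the precision needed at each phase can be tailored, and only as many significant bits as required need be generated. In many vision algorithms the precision is quite low, which gives the ARTVM an additional speed advantage over conventional machines.

As noted, each word of memory acts as a simple processor, so memory and processor are indistinguishable. Input, output and processing take place simultaneously in different fields of the same word. The fields accessed and processed are completely flexible and depend on the application. Processing capability (the family of vision algorithms) can therefore be extended by increasing the word length K. As shown under Vision Algorithms, with K = 152 all the vision algorithms we consider run in real time, while K = 32 suffices for many simple image processing applications such as histogram evaluation, convolution and morphological operations.

VLSI Chip Implementation and
Interconnection

Ruhman and Scherson [34,35] devised a static associative cell and used it to lay out an associative memory chip. After evaluating its performance by circuit-level simulation, they conservatively estimated the area ratio between associative memory and static RAM at 4. Given that 4 megabits of static RAM is now commercially available on a chip area of less than 100 mm², the capacity of an associative memory chip comes to 1 megabit. The chip proposed for the ARTVM stores 4K words × 152 bits, which is only 59% of that capacity. A conservative extrapolation of cycle time to 0.5 micron technology gives 30 nanoseconds. This value was used to compute the execution times of the associative vision algorithms.

Loading a full comparand or mask word would require a bus 152 bits wide. Fortunately, associative algorithms operate on only one or two short fields at a time, plus a number of flag bits. The word was therefore partitioned into four 32-bit sectors and one 8-bit flag field. Buses are provided to access the flag field and one sector at a time. FIG. 14 shows the chip interface and how 64 such chips can be interconnected to make up the associative memory of the ARTVM.

Given the exponential growth of chip capacity by a factor of 10 every five years [36], the ARTVM could be reduced to 8 chips by around 1995. Since the bulk of the machine is associative memory, upgrading is simple and inexpensive.

As shown in FIG. 11, the control unit is required to generate a sequence of microinstructions and constants (mask and comparand) for the associative memory and to receive and test its outputs (count and rsp). The unit can be implemented with high-speed bit-slice components, or most effectively by designing one or more VLSI chips. The functions of the control unit become clear from the associative algorithms below. See the notes to FIG. 14.

The Machine Simulator

A simulator of the ARTVM was written. It enables the user to check out, and improve, associative implementations of vision algorithms. It is written in the "C" language and is denoted "asslib.h". The vision machine simulator consists of an associative instruction modeler and an execution time evaluator.

Associative instruction modeler

Its main features are as follows:
- The dimensions of the associative memory are defined in terms of a variable memory size and a variable word length, and so must be initialized in the application program by #define commands.
- The contents of the associative memory and of its registers are defined in terms of the type parameters described next; all of them are structure elements referred to as parameters.
- The associative instructions defined under The ARTVM Architecture are implemented as "C" functions, which produce their outputs by writing results into external structure parameters.
- Three instructions are added: load, save and print cycles. The load command initializes the array A[.][.] with data from the associative input file, while the save command writes the contents of the memory array A[.][.] to the associative output file at the end of the application program. The print cycles command displays the number of machine cycles needed to execute the simulated program.
- Program control functions are written directly in "C".
- The general format of an application program is as follows: [listing not reproduced]

Execution Time Evaluator

Speed evaluation is done by modeling the machine as a simplified finite
automaton (F.A.) in which a cost in machine cycles is assigned to each state transition. The machine has only two states, S₀ and S₁. The input alphabet was chosen by grouping the instructions into five categories I_y, y = 1, ..., 5, as follows. [table not reproduced]

FIG. 15 shows the transition table and diagram. At initialization the cycle counter (called cycles) is reset to 0, and at each state transition it is incremented by the assigned cost in cycles. This speed model yields the result shown in the figure: any instruction from group I₁ can execute concurrently with an instruction from group I₂, and both can overlap an instruction from group I₃.

The cost of COUNTAG (I₄) is conservatively estimated at 12 cycles, on the basis of implementing it on-chip with a pyramid of adders and summing the partial counts off-chip in a two-dimensional array. The cost of FIRSEL (I₅) is conservatively estimated at 6 cycles, on the basis of implementing it with a pyramid of OR gates whose depth is log₂N − 1, part of it on-chip and the rest off-chip. In the worst case the pyramid must be traversed twice, after which the tag flip-flops must be reset.

Keeping the speed evaluation model simple imposes a mild constraint on the programmer: instructions that can execute in parallel must be written in the order I₁, I₂, I₃.

To illustrate the instruction set and assembly language used in the simulator, the listing of the vector addition program discussed earlier (under The ARTVM Architecture) follows. [listing not reproduced]

Vision Algorithms

To test the flexibility and speed of the proposed associative architecture, ARTVM, a wide range of vision functions was implemented. These include low-level algorithms such as histogram generation, convolution, edge detection, thinning, stereo matching and optical flow. Mid-level functions were also implemented, such as contour tracing and labeling, the Hough transform, saliency mapping, and geometric tasks such as the convex hull and the Voronoi diagram. Our simulator was used to test the associative algorithms and to verify their complexity.

Before detailing the associative algorithms, it is useful to recall briefly the features of the machine, the values of some parameters and the data structures used. The image resolution is taken to be 512×512 (N = 512), so the associative memory capacity is 256K words (one word per pixel). Data are arranged linearly in memory in pixel order, as in a video scan, starting from the top left corner of the image. Incoming data have 8-bit precision (M = 8), and although all processing is fixed point, the algorithms are designed to retain the full inherent precision.

A long shift instruction is provided for communication between rows. Its size is denoted b, where b is a divisor of the row length N. In our model of the machine the long shift is taken to be 32 places (b = 32). The complexity of the algorithms is expressed in cycles; the machine cycle was conservatively estimated at 30 nanoseconds under VLSI Chip Implementation and Interconnection. This value is used to compute execution times.

Low Level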
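Vision

Histogram

The associative nature of the ARTVM and its fast COUNTAG instruction make histogram evaluation easy. The program is shown in Listing 1. It consists of a very short loop repeated for each gray level: a compare instruction tags all pixels of the gray level and COUNTAG totals them. The count is automatically available to the controller, which accumulates the histogram in an external buffer. The time complexity in machine cycles is therefore given by the following expression, [equation not reproduced] where M is the gray-level precision, taken as 8 bits in our model. The histogram is thus executed in 3330 machine cycles, or about 100 μs.

Convolution

Low-level vision, particularly edge detection, involves applying various filters to the image, which is most conveniently done by convolution. The image may be regarded as a simple vector of length N², or as the concatenation of N row vectors each of length N. Convolution of an N-element data vector with a P-element filter results in a vector of length N+P−1, but only the central N−P+1 elements, which represent the region of complete overlap between the two vectors, are of interest. We developed several techniques for implementing convolution associatively, depending on filter properties such as dimensional separability and symmetry.

We began with the multiply-and-shift approach given by Ruhman and Scherson [6,37,38]. The word format of the associative memory is shown in FIG. 16. The convolution filter vector [f], of length P and precision 8, is applied as an operand by the machine controller, one element at a time. The result is accumulated in field [fd], of length 8+8+log₂(P). The temp bit is used for temporary storage of the carry propagating through field [fd]. The mark bit serves to identify the region completely overlapped by the filter vector.

The row convolution program is shown in Listing 2. Field [d] is multiplied by successive elements of vector [f]. Between multiplications, field [d] is shifted down one word position for row convolution, or N word positions for column convolution. The f element acts as the multiplier: each of its bits is tested by the controller and, if set, field [d] is added to field [fd] starting at bit position add-offset. After each addition the carry is propagated to the top bit of [fd]. For column convolution only the last program line need be changed, "shiftag(1)" being replaced by [expression not reproduced].

The time complexity of 1-d convolution in machine cycles is given by the following expression, [equation not reproduced] where t_a and t_s denote the per-bit addition and shift complexity and T_p^1d denotes the complexity of carry propagation over field [fd]. Since addition and carry propagation are performed only for the ONE digits of the multiplier (filter element), the time complexity is a function of α, the proportion of ONE digits in the filter vector elements. The range of α is 1/M ≤ α ≤ 1.

According to the program listing the addition time t_a is 8.5 cycles. Carry propagation takes 4 cycles per bit, and the average propagation distance is [expression not reproduced], so that [equation not reproduced]. The program time for shifting a field, excluding shiftag(), is 3 cycles per bit. Hence moving a pixel field across adjacent columns takes t^c
_s = 3 cycles per bit, and moving a pixel field across adjacent rows takes [expression not reproduced] per bit.

Extending the above algorithm to 2-d gives the following expression, [equation not reproduced] where [symbol not reproduced] denotes the average carry propagation of 2-d convolution. Substituting for t_a, T_p^2d and t_s in eq. 11 gives [equation not reproduced].

This simple algorithm is quite efficient for 1-d convolution, particularly when the filter has a relatively low proportion of ONE digits (α ≪ 1). For a wide 2-d filter such as 31×31, however, the execution time can exceed half a video frame (20 ms). Several approaches can be considered to reduce the time complexity.

Some 2-d filters are separable into two 1-d convolutions. Thus the 2-d Gaussian can be effected by successive application of a 1-d filter in the two coordinate directions. Reduction to 1-d convolution leads to a drastic improvement in execution time. Equation 11 is then written as follows. [equation not reproduced] Substituting for t_a, T_p^1d and t_s reduces it to the following. [equation not reproduced]

The filters we deal with are all symmetric about the origin. The Gaussian has even symmetry, while the derivative of the Gaussian has odd symmetry. This leads to the interesting question of how far symmetry can be exploited to improve execution time. Consider first one-dimensional convolution with an even-symmetric filter of length P = 2L+1 applied at point d_m, [equation not reproduced] where the filter elements f_i are equal on either side of the center element f₀. Using the following word format, the algorithm for 1-d convolution with an even function is as shown in FIG. 17.

For convolution in the x direction all shifts are of one place, i.e. "shiftag(±1)", while in the y direction each consists of N/b long shifts, i.e. [expression not reproduced]. Applying this algorithm to a separable symmetric 2-d filter such as the Gaussian, the time complexity becomes the following. [equation not reproduced]

For odd symmetry, f₀ = 0 and f_i is the absolute value of the filter element on either side. The convolution expression therefore reduces to the following, [equation not reproduced] and two modifications of the algorithm are required: eliminating step 1 of the program (evaluation of d·f₀) and changing step 1 of the loop from NEXT+d to NEXT−d. The time complexity of an odd-symmetric filter, applied in both directions, is given by the following. [equation not reproduced]

The methods considered so far for reducing convolution time exploit special properties of the filter: symmetry, element bit statistics and separability of 2-d filters. We now consider an enhancement that applies to the most general filter: treating the pixel data as the multiplier and speeding up multiplication by applying 4 bits of the multiplier at a time. Referring to the following algorithm and the word format shown in FIG. 18, the 8-bit data field is split into high (d_h) and low (d_L) nibbles. A 12-bit field, nd, is provided for the partial product of the current filter element with the current data nibble. Filling in all the partial products by table lookup requires 15 compare-write cycles (no action is required for the all-ZERO nibble).

Using the enhanced multiplication algorithm, the time complexity of general 2-d image convolution is given by the following, [equation not reproduced] and that of 1-d convolution in both directions by the following. [equation not reproduced] T_m is the time to generate the two partial products by table lookup (2 × 15 × 2.5 cycles), and T_p1, T_p2 are the carry propagation times following their addition into field fd. Substituting for T_m, t_c and t_s and evaluating T_p1 and T_p2, the time complexity becomes the following. [equation not reproduced]

The following table compares all the convolution methods discussed, for filters of size 7×7, 15×15 and 31×31. [table not reproduced]

Enhanced multiplication, T^enh, appears to be the fastest way to compute a general convolution. It is achieved at the cost of a 12-bit increase in word length. For filters with special properties the other methods offer a small advantage, but exploiting symmetry demands a still larger increment in word length (17 bits).

Marr & Hildreth Edge
Detection

This algorithm [39] finds the zero crossings (ZC) of the Laplacian of the Gaussian-filtered image and can be written as follows, [equation not reproduced] where I is the original image, G_σ denotes a 2-d Gaussian filter of scale σ, and ∇² denotes the Laplacian operator. The M&H algorithm consists of the following two steps. [steps not reproduced]

The DOG filter has the following form, [equation not reproduced] where σ_p and σ_n are the space constants of the positive and negative Gaussians respectively, and the ratio σ_p/σ_n is about 1.6, which gives the closest match to the Laplacian-of-Gaussian (∇²G) operator.

Implementing the DOG requires four 1-d convolutions: one row and one column convolution for each space constant. The complexity of the associative DOG implementation is given by the following, [equation not reproduced] where P_p and P_n are the filter sizes appropriate to the space constants σ_p and σ_n respectively, and T_diff denotes the complexity of an N-bit subtraction with borrow propagation into the sign bit. Associative subtraction is as fast as addition. Taking the fastest general convolution method, T_1d^enh, and substituting eqs. 20 and 16 with M = 8 into eq. 19 gives the following. [equation not reproduced]

The following table gives P_p, P_n and the DOG complexity (in cycles and milliseconds) for three filters corresponding to σ = 0.5, 1 and 2, where P_p and P_n denote the filter vector lengths. [table not reproduced]

The second step of the M&H algorithm, zero-crossing detection, operates on a 3×3 neighborhood. The center pixel is considered an edge point if any one of the four directions (horizontal, vertical and the two diagonals) yields a change of sign. Specifically, the test is whether one item of a pair (about the center) exceeds a positive threshold T while the other item of the pair is less than −T. The associative implementation of ZC for each spatial filter has the following outline:

1. Compare all pixels (the results of the DOG function) simultaneously against the thresholds T and −T, and write two indicator bits into memory to record the results.
2. Shift in the pairs of indicator bits from the 8 neighbors of each pixel simultaneously, and write them into a 16-bit field of the word.
3. Using the 16-bit indicator field, test all four directions for ZC and mark the edge points.

The associative algorithm for zero-crossing detection has a time complexity of 165 cycles, or 4.95 microseconds.

Note that the M&H algorithm produces edge points without gradient direction. This parameter can be computed by operating on a larger neighborhood (9×9) around each edge point. An associative algorithm that detects 16 segment directions (and corners) was developed and is described below. Its time complexity is 1010 cycles, or 30.3 microseconds.

Canny Edge Detection

Canny's algorithm [40] has three stages:
1. directional derivatives of a Gaussian filter (∇G * I);
2. non-maximum suppression;
3. thresholding with hysteresis.

The general form of the Gaussian derivative of scale σ in direction n is given by the following. [equation not reproduced] The x and y derivatives of the Gaussian filter are obtained by convolving the image with the respective expressions. [equations not reproduced] Applying the enhanced multiplication method, T_1d^enh, the execution times for a typical set of filter sizes are as follows. [table not reproduced]

Non-maximum suppression selects as edge candidates the pixels at which the gradient magnitude is maximal. For best sensitivity the test is performed in the gradient direction. Since the 3×3 neighborhood offers 8 directions, interpolation doubles this number to 16. To determine whether it is a maximum, the gradient value of each pixel is compared with those on either side of it. The associative implementation requires a few more operations than that of the zero-crossing detection described above.

Thresholding with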
hysteresis（ヒステリシスを持つスレッショルド）はノイズになる弱いエッジを除去し、強いエッジが弱くなるにしたがいその強いエッジのトレースを継続する。信号スタシスティックスとイメージのノイズの評価を基本として、傾斜量の２個のスレッショルドがローまたはハイに計算される。ローレベルの傾斜量を持つエッジ候補は除去され、一方ハイレベルの対応するエッジ候補はエッジとして考慮される。ローとハイの間の値を持つ候補は、それらがローレベルを越える一連の画素を通してハイレベルを越える画素に接続される場合、エッジとして考慮される。この間隔のすべての他の候補は除去される。プロセスはカーブに沿って伝達される。連想インプレメンテーション（リスト３）は図１９のＥに示す３つのフラグを使用する。この図１９は、初期時にハイスレッショルド（曖昧でないエッジポイント）を越える候補をマークし、最後にすべての選択エッジポイントを、すなわち最終繰り返し時に確認エッジのトラックをキープするＯＥ（オールドエッジ）と下記の候補を表するＬとを表示する。各繰り返し時、各Ｌ候補は、１個以上の８近傍がエッジであるとき、見るように審査され、この場合、Ｅをセッティングすることによりエッジであると申告される。ＥをＯＥに移動する前、安定状態に到達したとき２個のフラグが見るように支度され、この場合、プロセスが終了する。ＬＩＳＴＩＮＧ３:回り伝達（Ｃurve Ｐropagation）。プログラム時間コンプレクシチィは次式で与えられる。ここで、Ｉは繰り返し数であり、２３.５は８近傍の値を審査する時間であり、Ｎ/bは、近隣の列からエッジポイントに持ってくるロングシフトを説明する。Ｉの上限は、最長伝達チェインにより与えられ、ほぼＮ²であり、しかし、１００繰り返しの代表値としては、曲がり伝達委コンプレクシチィは３９５０サイクルすなわち１１９マイクロセコンドになる。Ｔhinning（シニング）前記伝達アルゴリズムは薄くないカーブを生成する。マルチパスシニングアルゴリズムは、プリシニング層と繰り返しシニング層からなる。図２０Ａ〜２０Ｅを参照すると、プリシニング層はテンプレート（a）を適用することにより単一ギャップを満たし、テンプレートb、cまたはｄの１つが保持されるときポイントＰをクリアーにすることにより境界ノイズを除去する。マルチパスは、テンプレートが北方向に最初に適用されることを意味し、そのとき南、東および西方向については、十分に調和し、一度適用されることが必要なテンプレート（a）を除外する。すべてのテンプレートは北方向に示され、「注意不要」（ＯＮＥまたはＺＥＲＯ）を表示するＸを用いる。同様に、シニング（細い）層は、連続して４方向の各方向にテンプレートe、fとｇをテストし、同意があったとき、ポイントＰをクリアーにする。この４パスシークエンスは、変化がそれ以上なくなるまで繰り返される。特に価値のあるノートは、シンプルローカルプロセスにより生成されるスケルトンの質である。スケルトンの最も正確な提示は中間軸を基礎としている。デービーサンドプラマー[41]は、そのようなスケルトンを生成する高評価のアルゴリズムを提案し、それをテストするための８のイメージを選択する。わたしたちのシニングアルゴリズムは、それらのイメージに適用され、興味深い結果を得た。そのスケルトンはデービーサンドパルマーのそれに正確に仮想的に同意する。エンドポイントでない不一致は、曖昧ポイントで発生し、等しい有効結果を構成する。プレシニング層の境界ノイズを除去すると、スケルトンの外部刺激の形成を回避する。アルゴリズムの時間コンプレクシチィは次式で与えられる。ここで、最初の２個のタームはプレシニング層を説明する。実行時間はプレシニングのために１５０サイクル（４.５マイクロセコンド）であり、シニング繰り返し当たり２１４サイクル（６.４マイクロセコンド）である。エッジシニングにとって３繰り返し数は十分に足り、２５マイクロセコンドの遂行時間を与える。単一パスシニングは考慮され、むしろクリティカルに見いだされた。シンヘトアリヤ[42]により提案され留アルゴリズムは最適のように見えるが、しかしそれは理想スケルトンを果たすものではないし、ノイズツリミングの自身の予備層のアプリケーションはシニングの際いくつかのメーンブランチの整理に導く。ステレオビジョン（Ｓtereo 
Ｖision）は、左イメージの各要点における対応する問題を解決しなければならないし、右イメージの対応するポイントを見いださなければならないし、また共通点の欠如を計算しなければならない。ステレオは過去１０年間以上コンピュータービジョンのメジャーリサーチトピックであったので、大変多くのアプローチが提案されており、またそれらの多くのアプローチのすべてを仲間的に実行する試みがなされている。ここで、グリムソン[43]のアルゴリズムに着目する。これは、また、人間ビジョンのヒエラーキーカル構造に類似性を持ち、前記のＭ＆Ｈあるいはキャニーエッジ支持スキームにより生成された入力エッジを使うことができる。エッジ検出が左および右イメージの両方で実行されたと仮定すると、結果がメモリーにサイドバイサイドにセットされる。エッジポイントはマークされ、そのオリエンテーションは２πラジアンを越える４ビット精度すなわち２２.５度の分析で与えられる。ステレオプロセスはデファレンスとして左イメージを使用し、エッジポイントを同等サインの傾斜と概略の同じオリエンテーションにエッジポイントを一致させる。水平線（±３.７５度）近傍のエッジラインは、共通点の欠如エラーを最小にするために除外される。グリムゾンアルゴリズムは、次のステップからなる。・左イメージにエッジポイント（オリエンテーションを受け入れ可能な）を位置すること・右イメージの対応ポイント回りの領域を３つのプールに区分すること・プール内でポテンシャルマッチを基礎とするエッジポイントにマッチを移動すること・曖昧なマッチを明確にすること・共通点の欠如する値を割り当てることマッチングプロセスのアソシエイティブメモリーワードフォーマットは次の通りである。左と右イメージの入力フィールドＤＬ、ＤＲラベルエッジポイントはそれぞれイメージし、そのときＤＩＲ-Ｌ、ＤＩＲ-Ｒはそれらのオリエンテーションを供与する。共通点の欠如の結論的値は出力フィールドＤＩＳＰに記録される。アソシエイティブアルゴリズムのアウトラインは図２１に示される。３つのプールＡ、Ｂ、Ｃに区分される±Ｗピクセルの近傍をサーチすること。ここでプールＡとＣはサイズが同等であり、それぞれ分岐ならびに集中領域を代表する。より小さなプールＢは共通点の欠如０のまわりの領域である。・フィールドＤＩＲ-ＲとＤＲ（右イメージのもの）をＷワードポジションにシフトダウンすること（右のＷピクセルの右イメージのシフトに対応）。・エッジポイント（ＤＲ、ＤＬ=１）において、交差±１の範囲内においてフィールドＤＩＲ-ＲをＤＩＲ-Ｌに対し比較すること。各比較の後ポジションＷに到達するまで１ワードポジションに右イメージフィールドをシフトアップすること。比較結果は、フィールドＰＸとＫＸ（Ｘ=Ａ、ＢまたはＣ）に記録される。ＡＰＸ値００は、マッチなしを指示し、０１は１マッチを指示し、１１はプールＸの１以上のマッチを指示する、ＴＺフィールドは、一時的に、曖昧な場合に使用される各プールの各共通点の欠如を蓄積する。マッチが発見されたとき右イメージのネットシフトが共通点の欠如であることを注意されたい。・明確な共通点の欠如（ＰＸの０１、すべてのＰＹの００、ここでＹ≠Ｘ）を持つエッジポイントが選択され、それらの共通点の欠如はフィールドＤＩＳＰ（ＤＩＳＰ=ＴＺ）に割り当てられる。・近隣の有力な共存点の欠如を使用することによる１以上のプールにマッチを明確にする計画を立てる。近隣に有力なプールがある場合曖昧なポイントは、同じプールにポテンシャルマッチを持ち、そのときその曖昧なポイントはマッチとして選択される。他の場合、該当ポイントのマッチは曖昧さを残す。有力な共存の欠如のための近隣のテストするためすべてのエッジポイント（フィールドＤＬ）をＣＯＵＮＴフィールドにカウントすることによりスタートする。すると、各プールの順番に、同じ近隣を越えて明確なマッチをフィールドＣＯＵＮ-Ｐに勘定する。ＣＯＵＮＴ/Ｐ＞ＣＯＵＮＴ/２のとき、プールは有力であり、その同じプールのマッチはＴＸからＤＩＳＰにコピーされた共通点の欠如を持つ。有力なプールがない場合、もしくは有力なプールにマッチがないとき、ＤＩＳＰはアップデートされず質問ポイントがＭＲビットで曖昧なマークとして存在する。・最終ステップは共通点の欠如が範囲内であるかどうかをテストする。マール＆ポグジオ（Ｍarr＆Ｐoggio）は、範囲内領域において、エッジポイントの７０％以上がマッチされることを示す。非マッチエッジポイントはＭＲでラベルされ、ＣＯＵＮＴ 
-Ｐフィールドに近隣を越えて計算される。ＣＯＵＮＴ/Ｐ＞ＣＯＵＮＴ/４のとき、近隣のすべてのエッジポイントは非マッチとしてラベルされ、その共通点の欠如はクリアーにされる。アソシエイティブアルゴリズムの時間複雑性は次式で与えられる。「Ｂ４」ここで、Ｔ_shは右イメージのシフトを計算し、Ｔ_matはプール内のマッチを評価数時間を示し、Ｔ_disは明確さの時間を示し、さらにＴ_orは範囲外の共通点の欠如の発見と除去時間を示す。サイクルのシフト時間は次式で与えられる。第１の用語はフィールドＤＲ、ＤＩＲ-Ｒ（５ビツト）の最初と最後のＷプレースシフトアップを説明する。第２の用語は、連続的なマッチング捜査官の１プレイスシフトダウンをカバーする。そして最後の用語は、ジェネレーションに起因して、列の最終効果を取り扱う境界フラグのアップデートである。今、マッチング複雑性Ｔ_matを考慮する。それは、交差±１の範囲ですべての共通点の欠如（２Ｗ+１）とすべてのオリエンテーション（１０）のために８サイクル四角を要求する。ここで、ここで、前記第２の用語は、比較結果の最終プロセスを説明する。明確なプロセスは次のステップからなる。・Ｌ×Ｌ近傍毎にエッジポイントをカウントオーバーすること・３個のプールのそれぞれについて： -同じ近隣を越えて明確なマッチをカウントすること。 -この結果をエッジカウントの半分と比較すること。 -有効プールのマッチから共通点の欠如をコピーすること。ここで次のように書くことができる。ここでＴ_cntは、近傍を越えてラベルピクセルをカウントする時間を示し、Ｔ_g _t はそれよりも大きな値を比較する時間を示し、Ｔ_cpyは、共通点の欠如をコピーする時間を示す。Ｔ_cntは、次のセクションの主題であり他の２つは次式によって与えられる。明確なアルゴリズムは次に示すリストの通りである。最後にハイ以外の共存点の欠如をテストし、記録する時間は次式で与えられる。ここで固有の用語は、ＭＲで非マッチエッジポイントをグルーピングし、それらをレベリングすることを説明し、すなわち、Ｔ_cntは、近傍の非マッチエッジポイントをカウントする時間であり、Ｔ_lt、は、近傍のエッジポイントの数にその数を比較することをカバーし、Ｔ_rmは、範囲内近傍のエッジポイントの共通点の欠如をラベルし、クリアーにする時間である。ステレオマッチングは、空間的に頻繁なチャネルの各々のために実行される。マール＆ポグジオ[44]のステレオビジョンのモデルから次式が得られる。ここでＰは、チャネルのフイルターサイズ（ベクトル長さ）を示す。式２^I-1 のＰを選択すると、Ｌは同様の次の方程式を持つ。２１-２８を式２０に代入すると、Ｐ、ＷとＬの間の関係を用いて、ｋの上記定義に応用すると、次式を得る。近傍数値をのぞくアルゴリズム複雑性は、空間的頻度の３個のチャネル、用のサイクルとミリセコンドで次の表の通り示される。相当大きな近傍を越えてラベルピクセルをカウントする問題を考慮すると、ステレオ評価の際に５回実行されるフアンクションは、その時間複雑性よりも有利な位置にくることがでできる。線形の概要（Ｌinear Ｓummation）各服セル回りの近傍を越えてラベルピクセルをカウントする直進アプローチは、各ワードのカウンフィールドを提供し、近傍ラベルがインクリメントされ、その結果近傍ラベルは都合の良いシーケンスにより入力される。Ｌ×Ｌ近傍では、最大カウント値はＬ²であり、Ｌが奇数ではカウントフィールド長さは２log₂Ｌである。プログラムリストは次の通りである。ここで、「フラグ」は家運とされるピクセルをラベルし、「フィールド」は、カウントフィールドの初期（ＬＳＢ）に指示する。ワードフォーマットは図２２に示される通りである。リスト５：Ｌinear Ｓummation Ｏver Ｌ×Ｌ with ＬＯdd マシンサイクル実行時間を下に示す。ここで、k＝log₂（Ｌ＋１）、それ故それは、Ｌ²logＬに成長する。ステレオアルゴリズムの上記プログラムを結合すると、その実行時間を破壊し、それによって最も粗悪なチャネルでは、ビデオフレーム時間を越える。このことは、次のテーブルに示されるように、数ミリセコンドの単位ですべての時間を供与する。比較のため、近傍カウント値を除くステレオ実行時間はＴ_2st-cntsのとき次の通り繰り返される。２次元の概要（２-d 
Ｓummation）扇形の概要プログラムでは、Ｌラベルの各列が仮想オーバーラップの近傍のそれぞれでＬ回実際にカウントされた。近傍の２次元構造の効果を見ると、カウントは２つの段階で実行される。１つの段階は列毎に近傍ラベルが照合され、第２の段階では仮想の近傍列の合計が集計され、これによってその合計が適時シーケンスにより入力される。このことは、長さlogＬの付加的「列」フィールドが要求され、次のプログラムを生み出す。ここで、ワードフォーマットは図２３に示す通りである。マシンサイクルの実行時間は次式で与えられる。フィルターベクトル長さが式２^y-1で表されることを選択した私たちのケースでは、前記に示したとおり、Ｌは式２^k-1で示される。したがって次式を得る。そしてＴ_cntには、次式で表される。実行時間はＬlogＬにまで大きくなる。このカウントアルゴリズムによると数ミリセコンドの実行時間をリストする次のテーブルに示されるように、リアルタイムビデオ限界内でステレオ複雑性が良好になる。２次元つりの概要（２-d Ｔree Ｓummation）２次元の概要（２-d Ｓummation）はステレオ時間複雑性を要求リアルタイムに合わせるように減少するが、しかしこの結果がさいてきであるかすなわち意義深く改良できるかどうかという質問が発生する。２次元の概要アプローチ（第１に列による、第２にコラムによる）を継続する場合、西のつりファッションの各次元を取り扱い、ペア毎にエレメントを集計し、ペアーの結果を集計し、次元がカバーされるまで再び加算する。Ｌは式に^k-1で示されるため、大きな列とコラムはツリーにコンプリートされるように加算される。「テイル」により表示される特別なエレメントは再び思い起こされるように集計時から置換され、それ故、ｋビット「テイル」フィールドは特別な列集計の一時の蓄積を供与する。プログラムリストは下記の通りである。ワードフォーマットは図２４に示すとおりである。実行時間は次から得られる。ここで、結果を前記３チャンネルについて表に示す。コンプレクシティー計算はＬlogＬが３２に等しくなるまで行う。しかし、表は２-dの和（最も大きい近傍）が４０％以上の改良であることを示す。２７.５％のステレオコンプレクシティーが結果として改良される。計算が進歩として知られている要求された精度で実行されるという事実から大抵は改良が起こる。ステレオ結果の議論連想プロセスによるリアルタイムのステレオビジョンは、考慮され、問題ある関数は近傍と区別するためにラベルをつけられた画素の計算として認識される。この関数は二通り以上の組合わせとして４回分析され、１回は範囲外の相違として扱われる。まっすぐ前に向いて（直線的な）やり方によって関数の計算の実行はリアルタイム（ビデオ）限界を超えてステレオ実行時間をすすめる。配列と幹をどのように連想させるかという技術は大きく進歩し、ステレオコンプレクシティーはリアルタイムで十分に行われている。結果は図に示すと２５図のように直線、二次元そして二次元の幹という３つ実行されたものに分かれる。近傍の次元の関数として、実行時間は二つの比較できる点として表す。２６図は近傍の次元の関数として３つの方法でそれぞれ近傍計算無しおよび近傍計算有りでステレオコンプレクシティーを表す。オプティカルフローオプティカルフローは視野を横切る動きとして分かれる速度ベクトルをイメージした点で示す。オプティカルフローに適用されるポテンシャルはトラッキングターゲット、同一性のターゲットおよび圧縮イメージの大きい自立したロボットと関連したエリアを含んでいる。オプティカルフローをコンピューターで計算する理論は、イメージが一定で粒子の点の明るさおよび輝度パターンの流れをいつもどこでも平滑に変える２つの制約に基づいている。ＨornとＳchunck[45]は制約された問題を解くのを繰り返すプロセスを導いた。流れの速度はその要素を（ 
u,v）がある。繰り返す毎に速度（uⁿ⁺¹,vⁿ⁺¹）は、新しく前もって設定された平均速度（uⁿ,vⁿ）から算出される。ここでαは測定におけるノイズに依存する重みファクターである。３３式で導かれたＥ_x、Ｅ_yとＥ_tは立体的に近傍の測定値の４つの最初の相違の平均値として得られる。Ｅ_i,_j,_kはフレームｋにおけるローｉとカラムｊの交点の画素値である。インデックスｉとｊはそれぞれ上から下、る。時間毎の繰り返しをどのように組み合わせるかという実際上の問題がある。前のタイムステップ（ビデオフレームタイム）から普通に利用できる。高速度の動きの場合、タイムステップ当たりの１つの繰り返しはオプティカルフローの安定値を十分には得られない。次のフレームに進む前に数回の繰り返しが必要である。次の方程式を実行すると、メモリーワードが入力データ、出力データ、中間結果といった複数の分野に分割される。フォーマットを図２７に示す。Ｅ_nとＥ_n1の２つの連続するビデオフレームから流れはコンピューターで計算される。それぞれのフレームはグレーレベルが８ビットの精度で得られる５１２ ×５１２画素に含まれる。垂直の抹消の間（フレーム間の時間間隔）流れのイメージはＥ_n1入力Ｉ/Ｏ緩衝配列（図１５）からの新しいイメージはフィールドＥ_n に書かれる。フレームタイムの間アルゴリズムの１つかそれ以上の繰り返しは、次のフレームで使われるオプティカルフローの理にかなった近似が得られる。方程式３３を再び下に示す。ここで、Ｄ_x＝Ｅ_x/Ｐ、Ｄ_y＝Ｅ_y/ＰかつＰ＝α²+Ｅ_x ²+Ｅ_y ²である。ここで導かれたＥ_x、Ｅ_yおよびＥ_tと同じようにＤ_xおよびＤ_yは与えられたフレームで固定され、フレーム間で繰り返す過程で伴うことはない。従って、これらのパラメーターのコンピューターの計算はアルゴリズムの固定された部分として引用される。固定された部分第１段階は個々に誘導されたＥ_x、Ｅ_yおよびＥ_t、をコンピューターで計算する。クリアーなフィールドＥ_x、Ｅ_yおよびＥ_tＥ_nおよびＥ_n1のローを１段上げ、カラム左（Ｎ+1ワード上げる）に移動すると、Ｅ_i+1，_j+1，_nおよびＥ_i+1，_j+1，_n ₊₁ が得られる。Ｅ_n+1をＥ_x、Ｅ_yおよびＥ_tに入れる。Ｅ_nを下の表に示す誘導されたフィールドに加える、または減じる。第２段階。Ｅ_nおよびＥ_n1のカラム右（１ワード下げる）に移動すると、Ｅ_i+1，_j+nおよびＥ_i+1，j，_n+1が得られる。Ｅ_nおよびＥ_n1を下の表に示すように積み上げる。第３および４段階。Ｅ_nおよびＥ_n1のロー（Ｎワード）を１段下げ、次の表に示すように積み上げる。第５及び６段階。Ｅ_nおよびＥ_n1のコラムを左（１ワード上げる）に移動すると、次の表に示すように積み上げる。第７および８段階。Ｅ_nおよびＥ_n1をそれぞれの元あった場所（１ワード下げる）に移動する。次の段階でＤ_xとＤ_yを計算する。表を参照してα+Ｅ_x ²をコンピューターで計算し、フィールドＳ_cに入れる。同様にＥ_y ²をコンピューターで計算し、フィールドＡ_cに入れる。フィールドＡ_cにフィールドＳ_cを加えるとＰが得られる。フィールドＥ_xをフィールドＳ_cで割り、フィールドＵに商を置く。フィールドＥ_yをフィールドＳ_cで割り、フィールドＶに商を置く。フィールドＵからフィールドＳ_cの右側（最低重要な９ビット）にＤ_xをコピーする。フィールドＶからフィールドＳ_c左側にＤ_yをコピーする。もう１つの階層を実行する前に、除数Ｐに対応して正しく増減するように割られたＥ_XまたはＥ_yは、左にゼロ拡張される。 
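The fixed part and one iteration of the optical-flow computation can be sketched as follows. This is a minimal illustration, assuming the standard Horn and Schunck update u = U_av − D_x(E_x·U_av + E_y·V_av + E_t) (and likewise for v), with D_x = E_x/P, D_y = E_y/P and P = α² + E_x² + E_y² as defined above. The function names and the use of floating point are assumptions for illustration; the machine described here works bit-serially in fixed point on all pixels at once.

```python
def fixed_part(ex, ey, alpha):
    """Per-frame quantities that do not change between iterations."""
    p = alpha ** 2 + ex ** 2 + ey ** 2   # P = alpha^2 + Ex^2 + Ey^2
    return ex / p, ey / p                # (Dx, Dy)


def iterate(u_av, v_av, ex, ey, et, dx, dy):
    """One update of the flow at a single pixel from the neighborhood averages."""
    common = ex * u_av + ey * v_av + et  # Ex*u + Ey*v + Et
    u_new = u_av - dx * common           # U_av - Dx[...]
    v_new = v_av - dy * common           # V_av - Dy[...]
    return u_new, v_new


# Hypothetical single-pixel example: derivatives (3, 4, -5), alpha = 5.
dx, dy = fixed_part(3.0, 4.0, 5.0)
u, v = iterate(0.0, 0.0, 3.0, 4.0, -5.0, dx, dy)
```

In the associative machine the same arithmetic is applied to every pixel word simultaneously, so the cost per iteration does not depend on the image size.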
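Since α is positive, D_x and D_y are less than 1, so the quotient does not overflow. The complexity of the fixed part is calculated as follows.

Iterative part

As shown in equation 44, compute the average of the U component (field U) and place the result (u) in field U_av. The U field is shifted to each required neighbor position and accumulated into field A_c. The four nearest neighbors receive double weight by being added one place to the left. The value accumulated in A_c is divided by 3 and the result is placed in field U_av. Similarly, compute the average of the V component (field V) as shown in equation 44 and place the result (v) in field V_av. Copy E_t to field A_c. Multiply E_x by U_av and place the product in field U. Add field A_c to field U. Multiply E_y by V_av and place the product in field A_c. Add field A_c to field U to obtain E_x u + E_y v + E_t. Multiply field A_c by the right-hand part of field S_c to obtain D_x[E_x u + E_y v + E_t], and place the result in field U. Multiply A_c by the left-hand part of S_c to obtain D_y[E_x u + E_y v + E_t], and place the result in field V. Compute the u component of the optical flow by evaluating U_av − U in field U. Compute the v component of the optical flow by evaluating V_av − V in field V. The complexity of this part, in machine cycles per iteration, is given by the following expression. The complexity of the optical flow can therefore be calculated as below, where I is the number of iterations needed for the flow to converge. Evaluating the above expressions gives 260 μs for the fixed part and 196 μs per iteration. Execution times for different values of I are given in the following table.

Mid-Level Vision

Corner and Line Direction Detection

An important capability for mid-level and high-level processing is the ability to distinguish corners and line directions. In Canny edge detection, the line direction is produced during the process. The M&H algorithm, by contrast, is non-directional, so its edge bitmap must be processed further to detect line direction. We propose examining the edge bitmap in a 9×9 neighborhood around each pixel in order to distinguish segment directions. The algorithm can identify 120 different corners and lines. The approach is outlined as follows.
1. Because a 9×9 neighborhood involves far too many patterns to handle directly, it is divided into 24 sectors as shown in Fig. 27. Sector size increases away from the center so as to preserve angular resolution. The sector function is defined as the logical OR of the edge-point indicators in the sector, and a 24-bit field is allocated to the sector values. The values are computed by shifting in the edge-point indicators of the neighborhood and ORing each one directly into the corresponding sector value.
2. The sectors define 16 equally spaced segments (rays) around the circle, giving an angular resolution of π/8. Each segment (direction) is characterized by a code over a subset of the sector values. A maximum Hamming distance of 1 is tolerated. The sector-value field is compared against each of the 16 codes, and the results mark the corresponding segment directions in a 16-bit field.
3. Matching and testing single segments yields a three-way ambiguity around each of the eight principal compass directions. Additional sector values resolve this uncertainty. Inherent ambiguities are mostly left unresolved.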
4.１６ビットのセグメントフィールドはラインとコーナーのある幾対かのセグメントをテストできる。サンプルプログラムはすべて区別なしに選べる。シュミレーションからアルゴリズムのコンプレクシティーは１０１０サイクルで３０.３μsであった。このアルゴリズムの概念はセクター境界、関数の計算、パターンの正確付けの選択に依存した広いレンジの関数に拡張することができる。トレーシングとラベリングの輪郭それぞれの輪郭のポイントにxy座標に用意されたステップラベルをつける。主な過程が繰り返され、３×３の近傍のすべての輪郭ポイントに平行に運転される。すべての輪郭点は周りの８つの近傍の内のそれぞれ１つに見られ、近傍のラベルがそれ自身より小さいなら受け入れる。近傍が容易に算出できるように操作される円の連続はラベルの伝達を束ねる。すべてのラベルが変わらないときに繰り返しは止まる。最も低い座標に位置するそれぞれの輪郭はなくなる。それぞれの輪郭の受ける最も低い座標点は元あったラベルを１つのみとどめる。これらの点は跡を残し、数えられ、イメージの輪郭数を得る。リスト８は、連想メモリーにプログラムを送る。入力フィールドは[xy-coord]を１つのポジションに指定し[e dge]を輪郭点と同一にする。出力フィールドは輪郭[label]と輪郭出発点[mr]である。ワードフォーマットを図２９に示す。リスト８：トレースとラベリングの輪郭アルゴリズムの時間のコンプレクシティー（マシンサイクルで）は次の通りである。Ｉの上の範囲は約Ｎ²/２である。しかし、１００回の繰り返しを表す値として実行時間は２１８キロサイクルまたは６.６msとなる。時間コンプレクシティーへの最適近似は、輪郭のリストラベルと長さ（画素において）を与える。相対的にショートオーダー（１輪郭当たり２４サイクル）で生成される。連想突出ネットワークイメージにおける突出した構造は、それらの形について組織的な調査や秀でた知識がなくても一目で感知できる。そのような構造は混乱したバックグラウンドに埋め込まれたときや、その要素がばらばらになったときに目立つであろう。Ｓ ha'ashuaとＵllman[46]はその長さ、連続性、平滑さに基づいたカーブのグローバルな突出した測定を提案する。Ｎ×Ｎの格子点のネットワークとしてイメージを考慮するとＤ方向要素（セグメントまたは間隙）でその近傍からぞれぞれの点に入っていき、たくさんのものが近傍へ出ていく。このイメージで長さＬのカーブは方向要素Ｐ_i、Ｐ_i+1、...Ｐ_i+L、ラインセグメントやイメージの隙間に現れるそれぞれの要素を結合する連続である。そして、曲線の突出量(saliency measure)は次式で定義される。局地的突出であるσは、活動要素（active element，実セグメント）としては１ (unity)が、仮想要素（virtual element，ギャップ）としては０が割り当てられる。減衰関数ρ_i,jギャップにペナルティ(penalty)を与える。ここで、減衰要因ρは活動要素になるように１に近づき、仮想要素としては１よりもかなり小さい（ここでは０．７とする）。第１要因ｃ_i,jは総曲率の逆数の境界量に対する不連続近似値(discrete approximation)である。ここで、α_kはk番目の要素から次要素へ向かう方向の違いを示している。そして、ΔＳは方向要素の長さを示している。この量はグローバルであり、Ｌ個の方向要素を有する曲線に関して算出される。これら要素の内いくつかはギャップであるかも知れない。それ故、与えられたセグメントにおいて最大値を求めるためには、このセグメントから始まるすべての可能な曲線ｄ^Lを計算しなければならない。ｄは各ポイントで考慮される（discrete、不連続な）方向番号である。突出曲線の部分部分または突出している必要がないので、指数コンプレクシティ（complexity）はピラミッド・テクニック(pyramid 
techniques)では減少できない。Sha'shuaとUllmanは短曲線をそれぞれ最大にすることによってdLを配列する為にコンプレクシティ（complexity）を減少した。Ｅ_iを要素ρ_iに関連する状態変数とすると、反復プロセスは次式のように定義される。Ｅ_jはｐ_jの状態変数であり、ｐ_iの可能近傍であるｄの内の一つである。Ｅの右肩の数字は反復回数を表している。そして、ｆ_i,jはｐ_iからρ_jへの逆曲率要因である。Ｌ回の反復のあと、状態変数は先に定義した突出量に等しくなる。証明は、［４６］に概略が示され、［４７］に詳細に示されている。Ｎ×Ｎ格子におけるすべての方向要素（セグメントまたはギャップ）の最終状態変数は画像の突出マップを構成している。我々の連想アーキテクチャでは、画素は格子点を構成し、ｄ＝８の方向要素は各画素を近傍に連結する（図３０）。図３０において、次の表記が用いられている。・実線：突出が計算されるＥ_i目の要素・破線：計算に使用される次要素・点線：計算で無視される要素ｄの不連続方向に対し、角αは（３６０／ｄ）＝４５°の増加量で与えられ、ｆ_i,j は次に示す値をとる。 −４５°、０°、４５°の三つの値だけが重要であることが判るだろう。それ故、これらαの値とその次要素だけが計算に用いられる。初期画像はエッジ・ポイント（edge point）として与えられるので、前処理段階は、８近傍であるいずれのか１組のエッジ・ポイントとして活動要素を識別することを要求される。プログラムの概略はリスト９に示される。メモリのワードフォーマットは図３１に示されている。リスト９：連想突出ネットワーク（Associative Saliency Network）アクティブρが１に近づくとき、異なる高突出度を区別可能になるとは意味がないかもしれないが、多くの反復が要求される。アリゴリズムには９０ビットの語長が必要であり、サイクルのタイム・コンプレクシティ（time complexity）は次式で示される。Ｉは反復回数を示しており、括弧内の値を求めると、各反復における実行時間は０．４msになる。各反復回数の５００msという実行時間はコネクションマシン（Connection Machine）［４８］で報告された。ホク(Hough)変換ホク変換により、曲線中にギャップがあっても、直線または円錐曲線のように媒介変数表示方程式により記述された曲線の輪郭を検出できる。画像スペース(ima ge space)における図の各点はパラメータスペース(parameter space)における軌跡に変換される。パラメータを適当な領域に分けてから、パラメータスペースにおける軌跡を分配してヒストグラム(histogram)が作成される。対象曲線の出現はヒストグラムの顕著なピークにより表される（多くの軌跡の交線）。直線の場合（図３２）、我々は、Duda & Hartによるノーマル・パラメタライゼーション(normal parameterization)を用いる［４９］。 xcosθ＋ysinθ=ρ この式は、ρとθにより線を特定する。そして、ヒストグラムはθにより示される全方向における直線を含んでいる。しかしながら、もし侯補点が方向を生成する方法によるエッジ検素の結果であるなら、θは既知である。次のO'Gorman & C lowes［５０］では、この情報は、ハードウェア（語長）とタイム・コンプレクシティの両方を主に低下させることに適用された。中央の発端部を含む５１１× ５１１面像にとって、x-y座標は絶対値および絶対符号により９ビットで与えられる。０からπの範囲内の角θは、１０ビットのマッチング精度で与えられる（傾きの符号は除外する）。sinとcosはテーブル索引により求められる。テーブルサイズを４倍減少するこれら関数の調和により有利さが得られる。ρを比較してからヒストグラムが求められ、ＣＯＵＮＴＡＧ原子関数を用いた要索により読み出し要素が求められる。このアルゴリズムは５２ビットの語長が必要であり、次式に示すマシンサイクルのタイム・コンプレクシティを有する。 T_l ＝1870＋13t(r-1) ｔ、ｒはヒストグラムにおけるρ、θそれぞれの解析結果である。第２ターム（ term）はヒストグラム算出の説明となり、ｔ，ｒ≧３２におけるＴ_lを左右する。ｔ，ｒ＝１６における解析結果では、フレーム当たりの実行時間は１５０μｓであり、１２８の解析結果では丁度６．４msに増大する。ここで、与えられた半径Ｒの円の検索を考えてみよう。その方程式は次式のように記載できる。 (x-x₀)²＋(y-y₀)²=R² 
ｘ₀，ｙ₀は中心座標である。直線の場合、我々は処理を単純化させるために傾斜方向を用いたい。我々が得た円の方程式の微分式を次に示す。 θは傾斜方向を示している。ｘ₀，ｙ₀の解を次に示す。これらの方程式が解かれ、ｘ₀，ｙ₀を求めるためのヒストグラムが生成される。アルゴリズムは、暗い背景の明円と明るい背景の暗円とを区別するために傾斜極性（gradient polarity）を用い、分かれたヒストグラムをいずれの場合にも生成する。Ｒを３２画素よりも少ないと仮定すると、要求語長は６２ビットになり、サイクルのタイム・コンプレクシティは次式で与えられる。 T_c =1550 ＋ 26r_xｒ_y （４６）ｒ_x、ｒ_yはヒストグラムにおけるｘ₀，ｙ₀の領域解析結果（range resolution）である。ｘ₀，ｙ₀の両方を１２８とすると、フレーム当たりの実効時間は１０．８msになる。白地に部分的に黒い混合円、黒地に部分的に白い混合円は境界を決定する前に（ホストにおいて）二つのヒストグラムを合計することにより求めることができる。検査が暗い背景の明円に限られるのであれば、コンプレクシティは次のように減少する（逆の場合も同様）。 T_c =1280 ＋ 13r_xｒ_y （４７）そして、実行時間は６．４にmsに減少する。幾何問題凸包（Convex Hull）画像において、点集合の境界を見つけることは興味深く、また有益である。そのような点集合をみるとき、境界の点と内側の点とを区別することはあまり難しくはない。これら自然境界点(natural boundary pouint)は凸包の頂点である。凸包は点集合を含む最も小さい凸多角形として数学的に定義される。同じように、凸包は、頂点が点集合に属する点集合を含む唯一の凸多角形である。それ故、それは点集合を囲む最短経路である。連想実行のために選択されたアプローチは、パッケージ・ラッピング法（pack age-wrapping method）［５１］として知られている。凸包に存在すると保証されている点から始め、集合中最も小さい（ｙ座標が最も小さい）点であると仮定し、正方向に水平光線を当て、他の点に当たるまでそれを上方（時計の針と反対方向）に振らす。他の点も包体に存在しなければならない。そしてこの点で光線を止め、開始点にたどり着くまで次の点に振らし続けると、パッケージは完全に包み込まれる。便利さのため、総ての画像（と点集合）が第１象限に存在するように、我々は座標構成を選択する。最下点Ｐ_iは集合中の最小ｙ座標を求めることにより配置される。それは凸体上にあり、そのようにラベルされている。ｘi＝０、ｙ_i＝ｙ_j であるセグメントＰ_jＰ_kの拡張を参照し、Ｐ_kが集合の他の点のどれかである総てのＰ_jＰ_kを形成する角θを考慮してみる。凸包上の次の点は角θが最小の点である（図３３）。Ｖ₁でベクトルＰ_iＰ_jを表し、Ｖ₂でベクトルＰ_iＰ_kを表すと、スカラー量は次式で示される。 V₁V₂=|V₁||V₂|cosθ （４８）したがって、ここで、a₁=x_i−x_i；a₂=x_k−x_j； b₁=y_i−y_i；b₂=X_k−X_j；平方根を求めることを避けるためにcos²θを用いると、 θは０からπの範囲内にあるので、a₁a₂＋b₁b₂を２乗する前に正の値をテストする。正の値があれば、それらをマークし、その中で最大のcos²θの値を探す。もし総てのcosθの分子が負であれば、最小のcos²θの値を探す。選択したθに一致するＰ_kは凸体上にあり、そのようにラベルされている。処理を続行するために、Ｐ_jは新しいＰ_iになり、選択されたＰ_kは新しいＰ_jになる（図３３）。最初の（最も低い）点に戻ると処理が終了する。二つの特別な場合が生じることがある。第１ステップにおいて、最下点を探しているとき、同じ最小のｙ座標を有する２以上の点を見つけることがある。その中で、Ｐ_jとして最大のｘ座標を有する点を選択し、Ｐ_iとして最小のｘ座標を有する点を選択し、Ｐ_iＰ_jは最初の参照セグメントになる。θを最小にする点を探す反復中、同じ最小値を生成する２以上の点、Ｐ_k1、Ｐ_k2・・・Ｐ_ksを見つけることがある。明らかに、線分Ｐ_jＰ_k1、Ｐ_jＰ_k2、・・・、Ｐ_jＰ_ksは同一直線上にあり、選択された点は最大値｜ｘ_k−ｘ_j｜を有するＰ_jから最も遠い。すべてのｘ_k−ｘ_j を０に等しくする選択は、最大の｜ｙ_k−ｙ_i｜を基にされる。ＡＴＲＴＶＭで行われるアルゴリズムの実行の分析は、次式に示されるマシンサイクルの実行時間を与える。 T_cHull = 60＋105V 
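(51) where V is the number of vertices of the convex hull. The time complexity is therefore independent of the number of points in the set. For a hull with 1000 vertices, the execution time is 3.15 ms.

Voronoi Diagram

This is a classical mathematical object that has become an important tool in computational geometry for proximity problems. Starting from a given set of L points P_i (i = 1, 2, ..., L) in the plane, the Voronoi diagram surrounds each point P_i by a region R_i such that every point in R_i is closer to P_i than to any other point P_j (j = 1, 2, ..., L; j ≠ i) of the set. The boundaries of all these regions R_i constitute the Voronoi diagram. An associative algorithm based on the brush-fire technique is shown in Listing 10. The boundaries consist of the points at which fires from two (or three) sources meet. Every point in the given set is initially marked with a different color, in practice its x-y coordinates. Each point in the image examines its eight neighbors. A blank (uncolored) point facing a colored neighbor copies that color. If both points are colored, their colors are compared, and if the colors differ the point marks itself as a Voronoi (boundary) point. This process is repeated until all points are colored. One more cycle of color comparison with the neighbors is needed to complete the delineation of the boundaries. The order in which the eight neighbors are processed was chosen to improve boundary precision. Not surprisingly, it alternates between opposite compass directions: N, S, E, W, NE, SW, NW, SE. Analysis of the algorithm gives the time complexity in cycles by the following expression, and the execution time per iteration is 75 μs. Since the regions grow diagonally at a rate of 2 pixels per iteration, up to 103 iterations are required. However, a representative value of I = 20 gives 1.5 ms. The algorithm produces quite thin boundaries. With the following templates, a single thinning pass in the four directions (south, north, west, east) is sufficient. Note that the templates are shown in their initial southward orientation and that the order within each pair of opposite directions is reversed. This reversal appears to be essential for maintaining boundary precision. Pixels removed from the boundary are recolored by examining the four neighbors in turn and copying the color of the first one that is not a boundary point. Thinning and recoloring are not iterative, so they have no significant effect on execution time. Fig. 34 is useful for understanding Listing 10.

Listing 10: Associative Voronoi Diagram

The associative Voronoi algorithm was designed for quick access to statistical data. Reading out the length (in pixels) of the Voronoi diagram, or the area (in pixels) of a Voronoi region identified by its seed coordinates, therefore requires only 13 machine cycles.

Word Length

This section evaluates the associative memory word length (K) required to compute the vision algorithms. Consider the machine model described for monochrome computer stereo vision in three channels. The input is M bits for each of the left and right input images. The machine generates parameters for the three channels for use in high-level processing. The parameters are: the disparity, of bit length [log₂(2W_i+1)]; the gradient direction and edge designation for the left and right images (of lengths 4 and 1, respectively); and a 1-bit match label. Assuming the input data is retained for subsequent processing, the final word length is given by the following expression, where M = 8 and W = P = 7, 15, 31. Additional word space is needed for temporary storage of intermediate results, and this varies dynamically during execution of the various algorithms. The maximum word length depends on the order of execution. In our case, the best order is to compute the channels in sequence, starting with the coarsest. Examination of the various processing phases shows that the maximum word length occurs while computing the disparity for the final channel. The maximum word length is therefore

K_max = 2M + K_ch1,2 + K_sp (54)

The first term is for the input data, and K_ch1,2 denotes the results of the first two channels:

K_ch1,2 = 2(2(4+1)+1) + [log₂(2W₃+1)] + [log₂(2W₂+1)] = 33 (55)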
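The arithmetic behind equation (55) can be checked with a short sketch. It assumes that the square brackets around log₂ denote rounding up to a whole number of bits, and that W₂ = 15 and W₃ = 31 (taken from the listed values W = 7, 15, 31); both readings are assumptions, chosen because they reproduce the stated total of 33 bits. The function names are hypothetical.

```python
import math


def disparity_bits(w):
    """Bits needed for a disparity in the range -w..+w, i.e. ceil(log2(2w + 1))."""
    return math.ceil(math.log2(2 * w + 1))


def channel_results_bits(w2, w3, direction_bits=4, edge_bits=1, match_bits=1):
    """Word space holding the results of the first two channels, as in equation (55)."""
    # Per channel: direction and edge designation for the left and right images,
    # plus one match label.
    per_channel = 2 * (direction_bits + edge_bits) + match_bits
    return 2 * per_channel + disparity_bits(w3) + disparity_bits(w2)


k_ch12 = channel_results_bits(w2=15, w3=31)  # 22 + 6 + 5
```

With these assumptions the per-channel term is 11 bits, the two disparity fields take 6 and 5 bits, and the total is 33 bits, in agreement with the value stated in equation (55).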
K_spは最終チャネルのディスパリティを計算するための作業スペースであり、次式で表される（立体画像参照）。 K_sp=3×2＋2[log₂(2W₁＋1)²]＋5[log₂(2W₁＋1)] =42 （５６） K_spはフラグビットを含まない。それ故、Ｋ_maxは９１ビットになる。我々のモデルを拡張し、上記で実行したほとんどの画像アルゴリズムを包含するようにしてみよう。前述したように、要求される最小語長は実行順序に依存する。推奨される順序は次の順序である。・オプティカルフロー（Optical Flow）・エッジ検索と輪郭処理・ホク変換(Hough transform)、突出マッピング・立体マッチング(stereo matching) 臨界処理は１３２ビットのオプティカルフローであると思われる（立体用画像データの付加バイトを含む）。Ｅ_n1、Ｕ_av、Ｖ_avフィールドを分配するか再使用することにより、要求語長は１０６ビットに減少する。新しいアルゴリズムに予備の容量を提供し、ＡＲＴＶＭの語長は、１２８ビット（４つの３２ビットセクター）と８ビットのフラグビット、または１３６ビットに固定された。このことは連想記憶装置だけを考慮に入れている。もし１６ビットバッファが含まれれば、総語長は１５２ビットになる。アケリブ（Ａkerib）論文の結果と結論低コストの一般的目的ビジョンアーティテクチャーは、ビデオ比でどのビジョンアルゴリズムも実行できることを提案した。提案されるマシンはコンピュータビジョンとＶＬＳＩインプレメンテーションに適合するクラシカル連想構造を持つ。それは、連想リアルタイムビジョンマシン（Ａrtvm）を特定し、ローカル近傍に操作を増強するタグレジスターをアップダウンシフト機構を使用する。内部フレームバッファーは、仮想的にコンピュータＩ/Ｏ時間を削除し、同時入力、出力および計算を許容する。実体的に発生する速度無しにチップインターフェイスを縮小するため、ワードは４つのセクターに区分され、そのうちの１個のセクターが一時にアクセス可能であり、フラグフィールドは常時アクセス可能である。５１２×５１２イメージを取り扱う主要なハードウェア補足物は、連想メモリーの２５６Ｋワード数×１５２ビット数からなることを示す。前記実験を０.５ミクロン技術に外挿すると、１チップ面積１００mm²と１サイクル時間３０ナノセコンドの下では１Ｍビット連想メモリーの容量を生産する。提案されるチップは、４Ｋワード数×１５２ビット数を蓄積し、この値は５９％容量であり、これらのチップの６４個が連想メモリーを構成する。ＡＲＴＶＭシュミレータは、連想ミクロソフトウェアを発展させ、その時間コンプレクシチィを評価するのに使用されるＣ言語において発生する。ｘとｙの方向に１５エレメントフィルタを備えるコンボルーションは、０.３４msを要するため、キャニーエッジ指示は０.５msで実行し、マールアンドヒルドレス（Ｍarr ＆Ｈildreth）方法は同じ時間内に約２回実施する。±１５画素の範囲を越えてグリムソン（Ｇrimson ）の方法によるステレオ共通点の欠如を同様に計算することは、明確さと範囲外テストを含み、１マイクロセコンドで終了する。このステレオ実行は、近傍を越えるラベル化された画素をカウントするアレイアルゴリズムによって達成された。ホンアンドシャンク（Ｈorn & Ｓchunck）による光フローは０.５ms以下で実行する。曲がり伝達、シーニングおよび幻覚トレースは、それぞれ繰り返しごとに１.５、６.４ならびに６６μｓの時間がかかる。リニアーホフトランスフォーム（lynear Ｈough）は、オリジナルから方向および距離の点で１６の体積のために１５０μｓの時間がかかる。興味深い結果として、シャウシャーアンドウルマン（Ｓha'ashua ＆Ｕllman）のグローバル重要マップのための結果が得られた。それは、繰り返しごとに０.４ms時間かかり、コネクションマシンよりも速く量の３個の命令である。幾何学的な問題が発生した。即ち、凸状ハルは頂部（ベルテックス）ごとに３.１５μｓの時間がかかり、ベロノイ（Ｖoronoi）ダイヤグラムは、ブラッシュファイヤー技術により繰り返しごとに０.１５msで実行する。２つの方法は、ＡＲＴＶＭ実行の比較評価用に選択された。最初に、私たちのアーティテクチャーを２５６ハイパフオーマンスプロセツサーのＳＩＭＤアレイ (Ｉnmos 
Ｔ800/Ｉntel860)と比較し、それが速度上２〜３のオーダーの量だけ有利な点を持つことを発見した。速度上の利得は、より高い精度の近傍算術オペレーション、例えばコンボルーション（９７のファクター）にとって最も低く、例えば曲がり伝達（２５００のファクター）としての近傍論理オペレーションにとってピークに到達した。第二の方法は、テスト結果がよく知られたビジョンアーティテクチャーのいくつかに活用されたアビングドン（Ａbingdon）クロスベンチマークであった。ＡＲＴＶＭは、価格パフォーマンスの点で２〜６のオーダーリードすることが分かった。この調査を通して使用されたＡＲＴＶＭ形態は、ロングシフト３２プレイス（ b=３２）と仮定した。これは、トータルインターフェイス１６０ピンでは、連想チップに６４ピンを付加した。ｂを１６に減少すると、平均速度で１７％程度の損失の大きさで３２ピンをセーブする。前記した通り、技術上の有利点に関してはアーティテクチャーは柔軟であり、より高度のチップ密度という十分な利点をとることができる。メモリーチップカウントがこのパラメータにより連続的に変化するとき、イメージ分析の中に同等のフレキシビリティが存在する。従って、１０２４×１０２４イメージでは、チップカウントはフアクター４により増大し、速度上の小さな損失を伴い、おそらくは、ワード長さに少量の増大を発生する。上記したビジョンアルゴリズムにおいては、データベースは本来的に一定方向に向けた画素すなわちオリエンテーションであり、このことが所定の有利点を提供する。ホフ（Ｈough）トランスフォームとコンベックスハルは除外される。より高度なレベルのビジョンファンクションにとっては、より複雑なイメージエレメントを取り扱う場合、連想アーティテクチャーはより大きな有利点を提供することが期待される。この作業は、重要な商業的含蓄をもっている。上記に示した装置および方法は、他種類のアプリケーションにおいて有用であり、例えば、以下のものに限定されるわけではないが、有用な例を次に挙げる。Ｈ.２６１標準を遂行するビデオテレフォン：ＱＣＩＦ分析およびＱＣＩＦ分析用のビデオテレビ会議：ビデオゲームの圧縮ならびに拡張：デスクトップパブリッシング用のカラーイメージ増強ならびにマルティプレーション：光学文字認識（ＯＣＲ）；バーチャルリアリティ；漫画映画を写すコンピュータのようなイメージアニメーション；２または３次元Ｂ/Ｗすなわちカラーイメージ検索ならびに処理；交通制御用のビデオ検出；３Ｄ再構築などのような医療イメージングならびに背景投影フィルタリング；リアルタイムノーマル化グレースケール相互関係；例えば交通制御目的用の車両のトラッキングのような１以上の対象のＴＶトラッキング；例えばライセンス数のハイデンティフィケーションのようなその他の交通アプリケーション；例えば農業生産物を対象とする製造的対象物の検査；木と金属生産物とマイクロエレクトロニック生産物；コンピュータ計算のアクセラレーション；神経系ネットワークアプリケーション；曖昧論理アプリケーション；圧縮イメージのクオリティのポストプロセッシング；ビデオ、デジタルまたはアナログカメラのフォトグラフィであって、イメージ圧縮の有無、特別な効果の有無、例えばオートフォーカス、ガンマー修正、フォトモンテージ、ブルースクリーン、ハナ修正、バクロ修正、リアルタイム形態および幾何学的歪みの修正など；テレビアプリケーション例えばＨＤＴＶ（高画質テレビ）、サテライトテレビ、ケーブルテレビ；インフォタインメント；会話認識；金融、旅行、ショッピングならびにその他の目的用のキオスク；自動オフィス備品例えばファクシミリ機、プリンター、スキャナーおよびフォトコピアの特別な成果ならびに機能向上；圧縮アプリケーション例えば顔面、指紋、その他のインフォメーションの免許証用の小型カードへの圧縮、IDカードおよびメンバーシップカード；コミュニケーションアプリケーション例えばデジタルフィルタリング、ビテルビ（ＶＩＴＥＲＢＩ）デコーディングおよびダイナミックプログラミング；ならびにトレーニング、教育さらにエンターテイメントアプリケーションなど。ビデオとピクチャー編集アプリケーションは、デスクトップパブリッシング機能例えばブルーリング（blurring）、シャーペニング（sharpening）、ローテーションならびにその他の幾何学的トランスフォーメーションおよびビデオフィルタリングなどのアクセラレーションを含む。ＣＤＬＯＭ（コンパクトディスクリードオンリーメモリー）は、圧縮機例えばＭＰＥＧ-Ｉ、ＭＰＥＧ-II、ＪＰＥＧ、部分的（fractal）圧縮機、ならびにこなみ圧縮機を含む。これらは、幅広
い他種類のアプリケーション例えば医療、リアルエステート、旅行、リサーチならびにジャーナリステイック目的用のイメージに到達するようなもののためのビデオシャープニングなどの増強のあるなしに関わらない。ファクシミリアプリケーションの実例は、カラーバックグラウンドを取り消すことを含み、その結果、カラーバックグラウンドに付加されたテキストの出現を鋭くし、漢字、ＯＣＲ,ファクシミリデーター圧縮などのようなレターの隙間を満たす。フォトコピアアプリケーションの一つの実例は、メモリーに例えばロゴをフォトコピー上で蓄積するテンプレートを自動的に付加するものである。家庭、職場、銀行、貴重品及び所有権情報用の容器のためのセキュリティアプリケーションは、次のものを含む：個人認識例えば顔の認識、指紋の認識、眼球の認識、音声の認識、サイン認識のような手書きの認識のような個人の認識である。本件で示される実例の実施によりアクセレレートされるカメラ特性は次の事項を含む：１.ガンマー修正：ＡＬＵＴ(ＬＯＯＫＵＰＴＡＢＬＥ)は256個のセルを含むように仕様され、それぞれガンマーパワーにダイズする値a(a=0,...,255)を含む。ガンマーは、例えば0.36または0.45の値である。同じＬＵＴは、すべての３つの要素（Ｒ,ＧとＢ)を仕様し、ガンマー修正はすべての３つの要素によって平行に達成される。２.迅速なカラーベース変換例えばルミナンス(luminance)とクロマイナンス(c hrominance)は次のプロセス前に分離されるカラートランスフォーメーション。例をあげると、ＣrとＣbコンポーネントに表示されるリットの数を減少することにより圧縮することができるＹＣrＣbの値にＲＧＢの値またはＣＭＹＫの値を変換することが時として望ましい。結果的に、圧縮されたＹＣrＣbの値はＲＧＢまたはＣＭＹＫの値に復帰される。３.5〜15のタップフィルターを持つルミナンスクロミナンス信号のローパスフィルタリング。４.孔修正。例えば、次の段階によって達成することができる。： a.写真場面を代表するオリジナル信号の高品度コンポーネントへの抽出； b.分離可能なフィルターが、例えば、それぞれコラムおよび列に［-0.25 0.5 -0.25］である２個の分離可能なフィルターの適用； c.Ｋ=0,1,2または４個のピクセルにより水平ポイント信号をシフトすることによる修正信号の発生； d.オリジナル信号に修正信号を加えることによる修正出力の発生。５.オートフォーカスおよびオート露出コンピューテーション：例えばカメラの焦点は、第一方向に予備的に定められた量によって調整することができる。その時、高頻度コンポーネントの比は、上述した実例を使うことにより計算することができ、その結果この比が調整結果として増大するかまたは減少するかを決定する。増大の時、焦点は第一方向にあらかじめ決められた量を再び調節される。減少の時、焦点は第二の方向にあらかじめ決められた量だけ調節される。６.オートカラー修正コンピューテーション、例えば自動りとく？制御と自動白色平行。例えば次の段階を実行する。： a.Ｒ,ＧまたはＢ信号の最も暗色部分があらかじめ決められた所定水準に到達するまで黒色水準を調節すること、 b.３っつの信号の意味を同等にするよう異なる利得を調節すること、 c.３個の波型の最もポジティブなピークが白色水準に到達するように前後利得を調節すること。これは、各カラーの最大、最小および中間水準を計算することによりなされることができる。７.クロマ-キーイング：８.ノイズ減少ウエイト 1/Ｋ,1-1/Ｋ,Ｋ=2,4,8を持つ２個の連続するイメージの重量平均化９.移動保護（対象物の移動時の不鮮明回避）。 10.複合S-ビデオ信号の創成クロミナンス調整選択ライン：Ｕsin(ｗt)＋Ｖsin(ｗt)Ｕsin(ｗt)-Ｖsin(ｗt)が13.5ＭＨzの頻度で実行される。本発明は、記述された内容に制限されるものではないことが当業者にとって明らかである。また本発明の範囲は、次に示されるクレームにより定義される。Detailed Description of the Invention Title of invention Signal processing method and apparatus Field of the invention The present invention relates to a signal processing method and device. 
BACKGROUND OF THE INVENTION

Computer vision systems and methods, and associative processing methods, are described in the following publications, the disclosures of which are incorporated herein by reference. Image processing techniques and other subject matter useful for associative signal processing are described in the references below. All of the above references and the publications cited therein are incorporated herein by reference. Numbers in square brackets in the text below refer to the above references.

Summary of the Invention

The present invention seeks to provide an improved signal processing method and apparatus.

ASP (Associative Signal Processing) is contrasted here with the architecture of a conventional parallel computer, which includes a memory and a CPU (Central Processing Unit). The CPU performs the computation, and the memory is a simple mechanism that merely stores data. The ASP architecture is entirely different. Computation is performed by an "intelligent memory", so the CPU is replaced by a simple controller that merely manages this memory. In addition to reading and writing, this memory can identify and modify its own contents according to instructions received from the controller.

For example, suppose there is a list of 10,000 numbers, each between 1 and 5, and 3 must be added to every element of the list. In a conventional parallel computer, each number is transferred from memory to the CPU, 3 is added, and the result is returned to memory. Each number requires 1 to 3 machine cycles, so the whole list requires 10,000 to 30,000 machine cycles. In associative processing, the 10,000 numbers are stored in the intelligent memory. The controller asks five questions and issues five commands, as follows. It asks, "Who is 5? Identify yourselves." This takes one machine cycle. The controller then issues a command to all memory cells that identified themselves, and they become 8. The controller continues by asking "Who is 4?" and commanding "Become 7", and so on until the whole list has been processed. Whereas the conventional parallel computer required 10,000 to 30,000 machine cycles, this process requires only 10. All arithmetic and logical operations can be performed using this basic instruction set of read, identify, and write.

According to one embodiment of the present invention, an ASP apparatus for processing an input signal comprises an array of processors, each processor including a plurality of associative memory cells and each sample of the input signal being processed by at least one processor; a register array including at least one register that stores responses arriving from the processors and provides communication between the registers and the processors; and an I/O buffer register for inputting and outputting signals, wherein the processor array, the register array, and the I/O buffer register are arranged on a single module.

Also according to an embodiment of the present invention, the ASP apparatus includes an array of processors, a register array, and an I/O buffer register, wherein each processor of the array includes a plurality of associative memory cells and at least one processor processes a plurality of input signal samples; the register array includes at least one register that stores responses arriving from the processors and provides communication between processors; and the I/O buffer register inputs and outputs signals.

Further, according to an embodiment of the present invention, the processor array, the register array, and the I/O buffer register are arranged on a single chip.

Further, according to an embodiment of the present invention, the register array performs at least one multi-cell shift operation.

Also according to an embodiment of the present invention, the ASP apparatus includes an array of associative memory words, each word including a processor, a register array, and an I/O buffer register, wherein each sample of the input signal is processed by at least one of the processors; the register array includes at least one register that provides communication between words and performs at least one multi-cell shift operation; and the I/O buffer register inputs and outputs signals.

Further, according to an embodiment of the present invention, the register array also performs single-cell shift operations.

Also according to an embodiment of the present invention, the I/O buffer register and the processors operate in parallel.

In addition, according to an embodiment of the present invention, the shorter the word length of the associative memory cells, the longer the word length of the I/O buffer register.

Further, according to an embodiment of the present invention, the apparatus can operate in video real time.

Also according to an embodiment of the present invention, the signal includes an image.

Further, according to an embodiment of the present invention, at least one word of the array of words includes at least one non-associative memory cell.

Also according to an embodiment of the present invention, at least one word of the array of words includes at least one column of non-associative memory cells.

Further, according to an embodiment of the present invention, the array, the register array, and the I/O buffer register are arranged on a single module.

Also according to an embodiment of the present invention, the module has a bus that receives instructions and performs at least one multi-cell shift operation.

In addition, according to an embodiment of the present invention, the module has a first bus that performs at least one multi-cell shift operation and a second bus that performs at least one single-cell shift operation.
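The add-3 example from the Summary (10,000 numbers between 1 and 5, processed with five questions and five commands) can be sketched in software as follows. This is only an illustrative emulation: the function name and the cycle counter are assumptions, and the inner loops stand in for what the associative memory does to all cells at once in a single cycle.

```python
def associative_add(memory, addend, values):
    """Emulate adding `addend` to every cell with one compare and one write per value."""
    cells = list(memory)
    cycles = 0
    # Highest value first, as in the text (5, then 4, ...), so that a cell
    # that has already been updated is never matched by a later question.
    for value in sorted(values, reverse=True):
        # Compare ("Who is `value`?"): all matching cells tag themselves. One cycle.
        tagged = [i for i, cell in enumerate(cells) if cell == value]
        cycles += 1
        # Write ("Become `value + addend`"): all tagged cells are rewritten. One cycle.
        for i in tagged:
            cells[i] = value + addend
        cycles += 1
    return cells, cycles


numbers = [1, 5, 3, 2, 4] * 2000            # 10,000 numbers between 1 and 5
result, cycles = associative_add(numbers, 3, range(1, 6))
```

The cycle count depends only on the number of distinct values (five questions plus five commands, or 10 cycles), not on the 10,000 cells. Processing in descending order matters: if 2 were turned into 5 before the question "Who is 5?" was asked, those cells would be updated twice.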
According to some embodiments of the present invention, there is further provided an array of processors that communicate by multi-cell or single-cell shift operations, the array including a multiplicity of processors, a first bus and a second bus. The first bus is connected to at least one set of processors and can perform at least one multi-cell shift operation. The second bus is connected to at least one set of processors and can perform at least one single-cell shift operation.

In addition, according to some embodiments of the present invention, a signal processing method includes, for each pair of consecutive first and second signal characteristics in a sequence of signal characteristics, counting the number of samples having the first signal characteristic and then counting the number of samples having the second signal characteristic.

Further, according to an embodiment of the present invention, the counting includes forming a histogram.

Also according to some embodiments of the invention, the signal comprises a color image.

Also according to an embodiment of the present invention, the at least one characteristic includes at least one of intensity, noise and color density.

Further, according to an embodiment of the present invention, the method includes scanning a medium bearing a color image.

Further, according to an embodiment of the present invention, there is provided an edge detection method including identifying a first plurality of edge pixels and a second plurality of candidate edge pixels; identifying as edge pixels, in parallel, all candidate edge pixels that are connected to at least one edge pixel; and repeating the second identifying step at least once.
In addition, according to an embodiment of the present invention, there is provided a signal processing method including storing an indication that a first plurality of first samples has a first characteristic; storing, in parallel for all individual samples connected to at least one sample having the first characteristic, an indication that the connected sample has the first characteristic; and repeating the second step at least once.

Further, according to an embodiment of the present invention, the signal comprises an image and the first characteristic of the first samples is that they are edge pixels.

Further, according to an embodiment of the present invention, there is provided a feature labeling method in which a signal is inspected, the signal including at least one feature and each feature including a set of connected samples. The method includes storing a multiplicity of indices corresponding to the multiplicity of samples; in parallel for each individual sample, replacing the stored index of the individual sample with the index of a sample connected to it whenever the connected sample's index is ordered before the individual sample's index; and repeating the replacing step at least once.

Further, according to some embodiments of the present invention, the replacing is repeated until only a small number of indices are replaced in an iteration.

Also according to an embodiment of the present invention, the signal includes an image.

In addition, according to an embodiment of the present invention, the signal comprises a color image.

Further, according to an embodiment of the present invention, the samples include pixels, the first characteristic includes at least one color component, and the adjacency of at least some of the pixels determines the connectivity of the samples.
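The feature labeling method described above (each sample repeatedly takes over the index of a connected sample that is ordered before its own) can be sketched as follows. This is an illustrative model only: the function name `label_features`, the use of row-major position as the initial index, 8-neighbor connectivity and the choice of the smallest neighboring index are all assumptions of the sketch, and the nested loops merely simulate an update that the device would perform on all samples in parallel.

```python
def label_features(image):
    """Label connected features of a binary image by index propagation.

    Every feature pixel starts with its row-major position as its index.
    In each iteration every pixel takes the smallest index found among
    itself and its 8 neighbours; iteration stops when nothing is replaced.
    """
    height, width = len(image), len(image[0])
    labels = [[r * width + c if image[r][c] else None for c in range(width)]
              for r in range(height)]
    while True:
        updated = [row[:] for row in labels]   # synchronous ("parallel") update
        replaced = 0
        for r in range(height):
            for c in range(width):
                if labels[r][c] is None:
                    continue
                best = labels[r][c]
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < height and 0 <= cc < width
                                and labels[rr][cc] is not None):
                            best = min(best, labels[rr][cc])
                if best != labels[r][c]:
                    updated[r][c] = best
                    replaced += 1
        labels = updated
        if replaced == 0:
            return labels


image = [[1, 1, 0, 0],
         [0, 1, 0, 1],
         [0, 0, 0, 1]]
for row in label_features(image):
    print(row)
```

After convergence, all pixels of one feature share the index of the feature's first pixel, so the number of distinct indices equals the number of features.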
In addition, according to some embodiments of the present invention, the pixels form an image in which a boundary is defined, and the repeating continues until the boundary is reached.

Further, according to some embodiments of the present invention, the repeating is performed a predetermined number of times.

In addition, according to an embodiment of the present invention, an image correction method includes computing a transformation for an output image produced through a distorting lens, as in HDTV recording, the transformation compensating for the distortion caused by the lens, and performing the transformation in parallel for each of a multiplicity of pixels in the output image.

There is also provided an ASP device including a multiplicity of comparing memory elements, each of which compares its own contents and the contents of other memory elements according to a logical criterion selected by the user and thereby generates a response indicating whether the compared memory elements meet the criterion, together with a register.

Further, according to an embodiment of the present invention, the criterion includes at least one logical operand.

Also according to an embodiment of the present invention, the at least one logical operand includes a reference to at least one of the memory element itself and another memory element. For example, suppose that the multiplicity of memory elements corresponds to a multiplicity of pixels forming a color image. The reference contains three specific pixel values A, B and C, and the user-selected logical criterion is of the kind in which an individual pixel has the value A and the pixel to its lower left has the value C, with the value B required of another pixel in the same way.

Further, according to some embodiments of the invention, each memory element includes at least one memory cell.

Also according to an embodiment of the present invention, the multiplicity of comparing memory elements compare the contents of memory elements other than themselves with the reference.
Further, according to an embodiment of the present invention, there is provided an associative memory including a PE (processor element) array made up of a multiplicity of PEs, each PE including a processor of variable size and a word of variable size that includes a multiplicity of associative memory cells, the associative memory cells being arranged at the same position within each word, and a FIFO including a multiplicity of words of the same format.

Furthermore, according to one embodiment of the invention, more than one variable-size word includes a multiplicity of associative memory cells.

Also according to some embodiments of the present invention, there is provided a method of modifying the contents of a multiplicity of memory cells, including performing a computation on an individual value stored in a multiplicity of memory cells and storing the result of the computation in the memory cells that hold that value.

Further, according to one embodiment of the present invention, the storing is performed in parallel for all of the memory cells.

A chip for multimedia and image processing applications is also described here. It is low in cost, low in power consumption and small in size, and is suited to high-performance real-time image processing in multimedia and image processing applications ranging from consumer products to powerful high-end image processing.

The chip is a massively parallel processing chip. It packs 1024 associative processors into a single chip and can process 1024 digital words in one machine cycle of the computer clock. The chip is designed to run a wide range of image processing and multimedia applications at real-time video rates. By contrast, conventional general-purpose parallel computer chips and digital signal processing (DSP) chips can process only 1 to 16 words in one machine cycle. The chip's main instruction set supports all arithmetic and logical instructions.
It consists of four basic commands from which those instructions are executed. The ability to pack thousands more processors into a single chip is a further advantage of the design. A single chip executes 500 to 2000 MIPS (million instructions per second). A system built on this chip can perform the multimedia processing that takes a high-end computer several minutes.

The chip has a modular structure, so one or more further chips are easily connected to obtain higher performance (in linear proportion). Many chips can therefore be connected in parallel, raising overall performance linearly up to the level of the most advanced supercomputers. With CPU chips and DSPs, connecting more than one chip in parallel requires a dedicated operating system, and performance increases only in proportion to the square root of the number of chips connected. When more than two chips are connected, a supercomputer configuration is required.

The chip is structured so that a large amount of data can be processed in parallel with data input and output. As an associative processor, each of the 1024 processors on the chip has its own internal memory and data path. The chip's data path is structured so that the internal processors read data in parallel, which eliminates the bottleneck that leaves conventional parallel computers using only a fraction of their performance.

The chip consumes an average of 1 watt while delivering as much as 500 MIPS, which is 10 to 25 times better than conventional processors and DSP chips.

Brief description of the drawings

The invention will be understood and appreciated from the following detailed description, taken in conjunction with the drawings.
FIG. 1 is a simplified block diagram showing the functions of an ASP device constructed and operative in accordance with an embodiment of the present invention.
FIG. 2 is a simplified flowchart of a method of using the apparatus of FIG.
FIG. 3 is a simplified block diagram of an ASP device for processing input signals, constructed and operative in accordance with an embodiment of the present invention.
FIG. 4 is a simplified block diagram of an example use of the apparatus of FIG.
FIG. 5 is a simplified block diagram of another use of the apparatus of FIG.
FIG. 6 is a simplified block diagram of a portion of the apparatus of FIG.
FIG. 7 is a simplified block diagram of a portion of the apparatus of FIG.
FIG. 8 is a simplified block diagram of another portion of the apparatus of FIG.
FIG. 9 is a simplified flowchart illustrating the operation of the apparatus of FIG.
FIG. 10 is a diagram explaining part of the operation of the apparatus of FIG.
FIG. 11 is a simplified block diagram of an associative real-time video device constructed and operative in accordance with an alternative embodiment of the present invention.
FIG. 12 is a simplified illustration of the operation of the device of FIG. 11 for compare and write commands.
FIG. 13 is a simplified illustration of communication between some of the processors of the apparatus of FIG.
FIG. 14 is a simplified block diagram of the connection between the chip interface and the processors in part of the device of FIG.
FIG. 15 is a simplified diagram of the automaton used in evaluating the complexity of the device of FIG.
FIG. 16 is a simplified block diagram illustrating a word format of the associative memory in part of the device of FIG.
FIG. 17 is a simplified block diagram illustrating another word format of the associative memory in part of the apparatus of FIG.
FIG. 18 is a simplified block diagram illustrating a further word format of the associative memory in part of the apparatus of FIG.
FIG. 19 is a simplified block diagram illustrating the implementation of a threshold-finding method using the apparatus of FIG.
FIG. 20 (20A to 20F) is a simplified illustration of test templates explaining the execution of a thinning method using the apparatus of FIG.
FIG. 21 is a simplified block diagram illustrating the implementation of a matching method using the apparatus of FIG.
FIG. 22 is a simplified block diagram illustrating yet another word format of the associative memory in part of the apparatus of FIG.
FIG. 23 is a simplified block diagram illustrating an additional word format of the associative memory in part of the apparatus of FIG.
FIG. 24 is a simplified block diagram illustrating another word format of the associative memory in part of the apparatus of FIG.
FIG. 25 is a graph comparing the execution times of selected devices for a stereo method using the device of FIG.
FIG. 26 is a graph comparing the complexity of selected devices for a stereo method using the device of FIG.
FIG. 27 is a simplified block diagram illustrating another word format of the associative memory in part of the apparatus of FIG.
FIG. 28 is a simplified block diagram illustrating part of an edge recognition method using the apparatus of FIG. 11.
FIG. 29 is a simplified block diagram illustrating another word format of the associative memory in part of the apparatus of FIG.
FIG. 30 is a simplified explanatory diagram of the pixels used in a method for processing a connected saliency network using the device of FIG.
FIG. 31 is a simplified block diagram illustrating another word format of the associative memory in part of the apparatus of FIG.
FIG. 32 is a graph illustrating the normal parameterization of a straight line used in a Hough transform computation method using the apparatus of FIG.
FIG. 33 is a graph explaining part of a method of forming a convex hull using the device of FIG.
FIG. 34 is a simplified block diagram illustrating a method of processing an associative Voronoi diagram in part of the device of FIG.

The following appendices are attached to aid in understanding and appreciating the embodiments of the present invention.
Appendix A is a listing of the subroutines called "sub.rtn"; these routines are called from each of the listings in Appendices B to O.
Appendix B is a listing of an ASP method for forming a histogram.
Appendix C is a listing of an ASP method for 1D convolution.
Appendix D is a listing of an ASP method for a 2D convolution low-pass filter application.
Appendix E is a listing of an ASP method for a Laplacian filter application of 2D convolution.
Appendix F is a listing of an ASP method for a 2D convolution Laplacian filter application.
Appendix G is a listing of an ASP method for curve propagation.
Appendix H is a listing of an ASP method for optical flow.
Appendix I is a listing of an ASP method for performing RGB to YUV conversion.
Appendix J is a listing of an ASP method for corner and line recognition.
Appendix K is a listing of an ASP method for contour labeling.
Appendix L is a listing of an ASP method for saliency networks.
Appendix M is a listing of an ASP method for performing a Hough transform on a signal arranged as a straight line.
Appendix N is a listing of an ASP method for performing a Hough transform on signals arranged as circles.
Appendix O is a listing of an ASP method for forming Voronoi diagram signals.

Detailed Description of the Preferred Embodiment

FIG. 1 is a simplified functional block diagram of an ASP device constructed and operative in accordance with an embodiment of the invention. The apparatus of FIG. 1 includes a simultaneously accessible FIFO (first-in first-out) 10
, or more generally a simultaneously accessible memory, which stores at least part of an input signal arriving over a bus referred to as DBUS. The simultaneously accessible FIFO 10 sends data to a PE array 16 that includes a plurality of PEs (processor elements) 20, and the PEs 20 send data to a data link 30. The data link 30 preferably also serves as a response memory. Alternatively, a separate response memory may be provided.

Each PE includes at least one associative memory cell and, more typically, a multiplicity of associative memory cells; in the example, each includes 72 associative memory cells. Each PE 20 stores and processes a subportion of the image. The subportions stored and processed by all of the PEs 20 together form the part of the input signal that is held at one time in the simultaneously accessible FIFO 10. For example, suppose there are 1024 PEs 20. If the processing task is simple enough for one PE to process two pixels at a time, the FIFO can store a block of 2048 pixels of a color image. If the processing task is so complex that two PEs are needed to process one pixel, the FIFO can store only a smaller block of 512 pixels of the color image each time.

The PEs 20 are controlled by a controller 40. The controller 40 is typically connected in parallel to all of the PEs.

FIG. 2 is a simplified flowchart of one method of using the apparatus of FIG. The first step of the method is step 54. In step 54, the system receives a selected command sequence. The command sequence is to be performed for each pixel of a black-and-white or color image. The command sequence is stored in the command sequence memory 50. Typically, the sequence contains some or all of the following types of commands:
(A) Compare: the contents of each of one or more PEs are compared with a comparand held in a register, and an indication of whether the contents equal the comparand is output.
(B) Write: the contents of each of one or more PEs are changed according to the respective write operand, depending on whether the PE's own contents, the contents of the PEs around it, or both satisfy the logical criterion specified for the write command.
(C) Single-cell shift: the contents of each of one or more PEs are shifted to an adjacent PE via the data link 30.
(D) Multi-cell shift: the contents of each of one or more PEs are shifted via the data link 30 to a PE that is not directly adjacent.

Also in step 54, the first block of the input signal is received by the simultaneously accessible FIFO 10. The commands of the sequence are executed one by one as shown in FIG. 2.

FIG. 3 is a simplified block diagram of an ASP device for processing an input signal, constructed and operative in accordance with one embodiment of the invention. The signal processing device of FIG. 3 has the following components, all of which are located on a single module, for example a single chip.
(A) An array 110 of processors, or PEs, 114, of which three are shown for simplicity. Each processor 114 includes a multiplicity of memory cells 120, of which four are shown for simplicity. At least one of the memory cells (exactly one in some cases) is a content addressable memory cell 122. As shown, the associative cell or cells 122 of each processor are located at the same position within their respective processors. In the example, there are 1K (1024) processors 114, each containing 72 memory cells 120, all of which are associative. Preferably, at least one processor can process more than one sample of the input signal.
(B) A response memory 130 including one or more registers. The registers can store the responses received from the processors and preferably also serve as a data link between them. Alternatively, a separate data link between the processors may be provided.
Preferably, the data link function of memory 130 can perform at least one multi-cell shift operation; in the example, a shift of 16 cells is performed. The data link function of memory 130 also preferably performs single-cell shift operations, in which, in each cycle, data shift from one cell to the next cell or from one PE to the adjacent PE.
(C) A simultaneously accessible FIFO 140, or more generally a simultaneously accessible memory, which inputs and outputs signals.
(D) A response counting unit 150 that can count the number of "YES" responses in the response memory.
(E) Comparand, mask and write-operand registers 180, as described above with reference to FIG.

A command sequence memory 160, similar to the command sequence memory 50 of FIG. 1, and a controller 170 are typically external to module 104. The controller 170 can control the command sequence memory 160.

Methods for associative signal processing include:
(A) Low-level associative signal processing methods: histogram formation, 1D and 2D convolution, optical flow, and conversion between color spaces, such as RGB to YUV conversion.
(B) Medium-level associative signal processing methods: recognition of corners and straight lines, contour labeling, saliency networks, Hough transforms, and geometric tasks such as the formation of convex hulls and Voronoi diagrams.
Each of these associative signal processing methods is described below.

Histogram formation

The histogram formation method is given as software in Appendix B. The method of Appendix B consists of a very short loop that is repeated for each gray level. A COMPARE operation tags all pixels at that level, and COUNTAG counts them. The counting is done automatically by the controller, and the histogram is accumulated in an external buffer.
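The per-gray-level loop described for histogram formation can be sketched in a few lines. This is a software model only, not the Appendix B listing: the function name `associative_histogram` is hypothetical, the list comprehension stands in for a COMPARE that tags all pixels at once, and `sum` stands in for COUNTAG.

```python
def associative_histogram(pixels, levels=256):
    """Form a gray-level histogram with one compare and one count per level."""
    histogram = []
    for level in range(levels):
        # COMPARE: every pixel equal to the current level tags itself.
        tags = [pixel == level for pixel in pixels]
        # COUNTAG: count the tagged pixels and record the result.
        histogram.append(sum(tags))
    return histogram


print(associative_histogram([0, 2, 2, 3, 0, 2], levels=4))
```

The number of loop iterations equals the number of gray levels and is independent of the number of pixels, which is the point of the associative formulation.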
Convolution

In low-level vision, and in edge recognition in particular, various filters have to be applied to the image, and this is most conveniently done by convolution. The image is regarded either as a simple vector of length N × M or as a concatenation of N row vectors, each of length M. Convolving an N-element data vector with a P-element filter gives a vector of length N + P − 1, of which only the central N − P + 1 elements, which result from complete overlap of the two vectors, are significant.

The convolution filter vector [f], of length P and 8-bit precision, is applied by the controller as operands, one element at a time. The result is stored in a field "fd" of length, for example, 8 + 8 + log₂(P). The "temp" bit is used temporarily to store a carry into the field "fd". The "mark" bit is used to identify the area completely covered by the filter vector.

An example of a row convolution method is shown in Appendix C. An example of a 2D convolution method implemented with a low-pass filter is shown in Appendix D. An example of a 2D convolution method implemented with a Laplacian filter is shown in Appendix E. An example of a 2D convolution method implemented with the Sobel method of edge recognition is given in Appendix F.

Curve propagation

Curve propagation is useful for eliminating weak edges as noise while still tracing strong edges even where they weaken. On the basis of the signal statistics and an estimate of the noise in the image, two gradient-magnitude thresholds are computed: "low" and "high". Edges whose gradient magnitude is below "low" are removed, and those above "high" are considered to be edges. Pixels with values between "low" and "high" are considered edges if they can be connected to a "high" pixel through a chain of pixels above "low". All other intermediate pixels are removed. This process involves propagation along curves. In associative processing, the method detailed in Appendix G uses three flags:
(i) "E", which initially marks the "high" points (definite edge points) and finally designates all selected edge points.
(ii) "OE" (old edge), which holds the edge points confirmed in the last iteration.
(iii) "L", which designates pixels at or above "low".
In every iteration, each "L" pixel checks whether there is at least one edge among its 8 neighbors and, if so, is declared an edge by setting "E". Before "E" is copied to "OE", the two flags are compared to detect the steady state, at which point processing is complete.

Optical flow

Optical flow assigns to every point of the image a velocity vector that describes its motion across the visual field. Potential applications of optical flow include target region tracking, target identification, moving-image compression, autonomous navigation and related areas. The theory of optical flow computation rests on two constraints: the brightness of an individual point in the image remains constant, and the flow of the brightness pattern varies smoothly everywhere. Horn and Schunck derived an iterative process by solving the constrained minimization problem. The flow velocity has two components (u, v). In each iteration, a new set of velocities [u(n+1), v(n+1)] is estimated from the average of the previous velocity estimates. An associative implementation of the Horn and Schunck method is shown in Appendix H.

Color space conversion

One of the most important operations on color images is conversion from the 24-bit space to another space, for example (Y, U, V), which is suitable for image compression. An associative method of color space conversion is shown in Appendix I.

Recognition of corners and lines

An important feature in medium-level and high-level processing is the recognition of corners and straight lines. In Canny edge recognition, the direction of the straight line is determined in the course of the process. The M&H algorithm, on the other hand, is not directional, and the edge bitmap it produces must further undergo a straight-line recognition process.
According to an embodiment of the present invention, for example, a 9 × 9 edge bitmap around each pixel is used to detect line segments. The resulting method can generally distinguish 120 different straight lines. A program listing for this method is given in Appendix J.

Contour tracing and labeling

In a preparatory step, each contour point is labeled with its x-y coordinates. The process is iterative and operates in parallel on the 3 × 3 neighborhood of every contour point. Each contour point examines its eight neighbors in turn and adopts a neighbor's label if that label is smaller than its own. The cyclic order in which the neighbors are handled significantly speeds up label propagation. Iteration stops when all labels have stopped changing, at which point each contour is identified by its smallest coordinates. The point with the lowest coordinates in each contour is the only one that keeps its original label. These lowest-coordinate points are counted to obtain the number of contours in the image. An associative implementation of this method is listed in Appendix K.

Associative saliency network

Salient structures in an image can be recognized immediately, without an organized search and without prior knowledge of their shape. Such a structure stands out even when it is embedded in a cluttered background and even when its elements are fragmented. Sha'ashua and Ullman proposed a global saliency measure for curves based on length, continuity and smoothness. The image is regarded as a network of N × N grid points. At each point, orientation elements in d directions (segments or gaps) come in from neighboring points and likewise go out to neighboring points.
A curve of length L in the image is a sequence of connected orientation elements p(i), p(i+1), ..., p(i+L), where each element corresponds to a line segment or a gap. An associative implementation of this method is given in Appendix L.

Hough transform

The Hough transform finds curves, even ones with gaps, that are described by a parametric equation, such as straight lines or conic sections. Each point in the image is transformed into a locus in parameter space. After the parameters have been quantized into appropriate ranges, a histogram of the distribution of the loci in parameter space is formed. Occurrences of the target curve show up as distinct peaks (intersections of many loci). For straight lines, the normal parameterization is

x cos(A) + y sin(A) = R

This equation specifies a straight line by the distance R and the angle A, and the histogram covers straight lines in every direction A. However, if the candidate points come from an edge recognition method that supplies a direction, the angle A is known. An associative implementation of this method is shown in Appendix M.

(Example) The main steps needed to perform a Hough transform on a sample of 256 × 256 pixels are as follows. A 256 × 256 image with its origin at the center is distributed over several chips, with each pixel assigned to one processor. For example, if a chip contains 1K processors, then 64 chips are used to hold 256 × 256 = 64K pixels. The x-y coordinates are given as an 8-bit absolute value plus a sign. The angle A, from 0 to π (3.1415...), is given with 10-bit precision (excluding the sign of the angle). The sine and cosine are obtained by table lookup. Preferably, the size of this table is reduced by a factor of four by exploiting the symmetry of the functions. After the comparison, each histogram element is obtained with the "countag" command and must then be read out.
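When every candidate point carries a known angle A, as described above, each point casts a single vote in (A, R) space. The sketch below illustrates that case only; it is not the Appendix M listing. The function name `hough_lines_known_angle`, the bin counts and the use of floating-point `math.cos`/`math.sin` instead of a lookup table are assumptions of the sketch.

```python
import math
from collections import Counter


def hough_lines_known_angle(points, angle_bins=180, r_step=1.0):
    """Accumulate Hough votes for lines x*cos(A) + y*sin(A) = R.

    `points` is an iterable of (x, y, A) with A in [0, pi), where A is the
    direction supplied by the edge detector. Each point votes for exactly
    one (angle bin, R bin) cell.
    """
    votes = Counter()
    for x, y, angle in points:
        r = x * math.cos(angle) + y * math.sin(angle)
        angle_bin = int(angle / math.pi * angle_bins) % angle_bins
        r_bin = round(r / r_step)
        votes[(angle_bin, r_bin)] += 1
    return votes


# Seven points on the vertical line x = 5 (normal angle A = 0) plus one outlier.
points = [(5, y, 0.0) for y in range(-3, 4)] + [(1, 1, math.pi / 2)]
votes = hough_lines_known_angle(points)
print(votes.most_common(1))
```

The peak of the accumulator identifies the line; in the associative version each accumulator cell would be obtained by a compare followed by a count of the tagged responders.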
(Example) A circle given by a radius R and a center (x₀, y₀) can be found. The gradient direction can easily be used in the process:

dy/dx = −(x − x₀)/(y − y₀) = tan(T − π/2)

From this equation, x₀ and y₀ are calculated as follows, where T is the angle of the gradient direction:

x₀ = x ± R sin(T − π/2)
y₀ = y ± R cos(T − π/2)

The histogram is formed over x₀ and y₀. An associative implementation of this Hough transform is shown in Appendix N.

Voronoi diagram

This type of diagram is convenient for proximity analysis. Consider a set of L points in the plane, P(i), i = 1, 2, ..., L. The Voronoi diagram surrounds each point P(i) with a region R(i) such that every point in R(i) is closer to P(i) than to any other point P(j), j = 1, 2, ..., L, i ≠ j. The boundaries of all these regions R(i) constitute the Voronoi diagram. An associative method based on the "brush fire" technique is shown in Appendix O. Each given point acts as the source of a "fire" that spreads in all directions. The boundary consists of the points where the fires from two (or three) sources meet. Every point of the given set is first marked with a different color, for example its own x-y coordinates. Each point in the image examines its eight neighbors. A blank (uncolored) point that finds a colored neighbor copies that color. If both are colored, the colors are compared, and if they differ the point marks itself as a Voronoi (boundary) point. This process is repeated until all points are colored.

The basic ASP steps often used in the methods of Appendices B to O are given below. Every step in group 1 can be executed in parallel with every step in group 3. Likewise, all steps of group 1, all steps of group 2 and all steps of group 4 can be performed in parallel.
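The brush-fire procedure described for the Voronoi diagram can be modeled on a small grid. This is a sketch under assumptions, not the Appendix O listing: the function name `brush_fire` is hypothetical, a source's position in the input list is used as its color instead of its x-y coordinates, the nested loops simulate an update that would run on all points in parallel, and the boundary is marked in one final pass for simplicity rather than during the iterations.

```python
def brush_fire(height, width, sources):
    """Grow colored regions from `sources` and mark where regions meet."""
    if not sources:
        raise ValueError("at least one source point is required")

    def neighbours(r, c):
        # The eight surrounding points that lie inside the grid.
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < height and 0 <= cc < width:
                    yield rr, cc

    color = [[None] * width for _ in range(height)]
    for index, (r, c) in enumerate(sources):
        color[r][c] = index

    # Repeat until every point is colored.
    while any(cell is None for row in color for cell in row):
        spread = [row[:] for row in color]      # synchronous update
        for r in range(height):
            for c in range(width):
                if color[r][c] is None:
                    # A blank point copies the color of a colored neighbour.
                    for rr, cc in neighbours(r, c):
                        if color[rr][cc] is not None:
                            spread[r][c] = color[rr][cc]
                            break
        color = spread

    # A point with a differently colored neighbour is a boundary point.
    boundary = [[any(color[rr][cc] != color[r][c] for rr, cc in neighbours(r, c))
                 for c in range(width)] for r in range(height)]
    return color, boundary


color, boundary = brush_fire(1, 5, [(0, 0), (0, 4)])
print(color)
print(boundary)
```

Because the fire advances one neighbor per iteration in all eight directions, the regions approximate a Voronoi partition under the chessboard distance rather than the Euclidean one.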
To implement the method of any of Appendices B to O, the listing must be run on a "C" language compiler that supports CLASS functions, such as the Borland "C++" compiler.

As already mentioned, each method of Appendices B to O comprises the following steps:
a. A step defining the basic memory size
b. A step defining the basic associative word length
c. Steps in which the subroutines of Appendix A are called
d. Steps specific to the individual application

The specific examples given in the appendices are intended to illustrate the invention in detail and not to limit it.

One ASP chip is now described. The device described here is the ASP100.

1. Introduction

The ASP100 is an associative processing chip. It is intended to serve as part of a vision associative computer and most commonly operates in an array of multiple ASP100 chips. The ASP100 consists of a 1K × 72-bit associative memory array, peripheral circuits, an image FIFO I/O buffer and control logic.

2. Modes of operation

The ASP100 operates either as a single chip (single chip mode) or as part of an array of ASP100 chips (array mode).

2.1 Single chip mode

Single chip mode is shown in FIG. In this mode, a single ASP100 operates in conjunction with a controller.

2.2 Array mode

Array mode is shown in FIG. ASP100 chips are interconnected in parallel to form a linear array. A single controller chip or other suitable circuit controls the array.

3. Pinout

3.1 Basic pinout

The ASP100 is packaged in a 160-pin PQFP package. All output and I/O pins are tri-state and are controlled by CS. A complete list of pins is given below.

3.2 Test pinout

The following is a list of pins that serve as test points. They are ignored when the production version is packaged.

4. Structure and operation

4.1 Floor plan

A floor plan of the device is shown in FIG. ARRAY is the main associative array. FIFO is the FIFO buffer for image input and output.
SIDE is a side array consisting of the tag and tag logic, tag count, select first, the row drivers (of the write lines and match lines), amplifiers and the shift unit. TOP consists of the mask and comparand registers and the column drivers (of the bit lines, inverse bit lines and mask lines). BOTTOM includes the output register and amplifiers. CONTROL is the chip control logic. Microcontrol is external in this version.
4.2 Structure of ARRAY
The ARRAY consists of 1024 × 72 APEs (Associative Processing Elements). It is divided into three columns, each 24 APEs wide, and is physically divided into three blocks of 342 × 72 APEs. This 6-way division gives the layout a square aspect ratio and helps reduce the load on the vertical bus wires. As described in 4.3 below, one 24-bit sector of the array can be reconfigured by the CONFIFO (Configure FIFO) instruction as follows:
1. All 24 bits are assigned to the FIFO. (The total ARRAY width is 48.)
2. 16 bits are assigned to the FIFO and 8 bits to the ARRAY. (The total ARRAY width is 56.)
3. 8 bits are assigned to the FIFO and 16 bits to the ARRAY. (The total ARRAY width is 64.)
The APE (Associative Processing Element) is a CAM (Content Addressable Memory) cell. It consists of a storage element, a write device and a match device. Within the APE there are three vertical buses and four horizontal buses.
Vertical: bit line (BL), inverse bit line (IL), mask line (MASK).
Horizontal: write line (WL), match line (ML), VDD, VSS (GND).
The storage element consists of a pair of cross-coupled CMOS inverters. The match device is dynamic EXCLUSIVE-OR (XOR) logic. This technique allows an area-efficient and reliable implementation of the compare operation.
4.3 FIFO structure
Referring to FIG. 7, the FIFO is designed to input and output image data in parallel with computation in the ARRAY. The FIFO consists of a reconfigurable matrix of 1024 × 24, 16 or 8 APEs 190, three columns of 1024 bidirectional switches each, and an address generator 194.
The corresponding part of the comparand register in TOP serves as the FIFO input register, and the corresponding part of the output register in BOTTOM serves as the FIFO output register. The FIFO controller (FC) is in TOP. The FIFO is configured by the CONFIFO instruction, where the three LSBs of the operand are:
6: 8-bit FIFO
5: 16-bit FIFO
3: 24-bit FIFO
7: no FIFO
When the FIFO is 8 bits wide, input VIN[0:7] and output VOUT[0:7] are routed to bits [0:7] of the FIFO, where bit 0 is the leftmost bit. When the FIFO is 16 bits wide, input VIN[0:15] and output VOUT[0:15] are routed to bits [0:15] of the FIFO, the least significant byte [0:7] being routed as in the previous case. Similarly, when the FIFO is 24 bits wide, the two least significant bytes are routed as when the FIFO is 16 bits wide.
The address generator 194 consists of a shift register and implements a sequential addressing mode. It selects the currently active FIFO word line.
The FIFO has two operating modes: image I/O mode and image transfer mode. In image I/O mode the bidirectional switches (one of the three columns) isolate the matrix from the ARRAY; in image transfer mode they connect the matrix to the ARRAY to form one combined array of APEs. The input and output registers act as buffer registers for image input and output.
Image I/O mode
In image I/O mode, a new image is written into the FIFO while the previous image is read out. The FIFO controller (FC) controls the FIFO as follows. Pixel I/O is synchronized with CLK. The external control input RSTFIFO resets the address generator 194. FENB (which must last at least two CLK cycles) enables input and output of the next pixel on the positive edge of CLK. Once all pixels have been input (or output), FFUL is asserted for two CLK cycles. This I/O activity proceeds asynchronously to the computation in the rest of the chip.
The basic operation of image I/O mode is performed as follows. The pixel on the VIN pins is latched into the FIFO input register (the FIFO part of the comparand register). The address generator 194 enables exactly one word line. The corresponding word is read into the FIFO output register (the FIFO part of the output register) and from there directly to the VOUT pins, in the same way as a READ operation is executed. The contents of the FIFO input register are then written into the same word, as in a WRITE operation. Note that the VOUT pins are tri-state; they can be enabled and disabled internally as needed. This sequence of operations is repeated 1024 times to fill the 1024 processors with data.
Multiple ASP100 chips can be chained together with a FENB/FFUL chain: the first ASP100 receives FENB (exactly two cycles) from the external controller, the FFUL of each ASP100 (exactly two cycles) is connected directly to the FENB input of the next chip, and the last FFUL is returned to the controller.
Image transfer mode
In image transfer mode, the image previously loaded into the FIFO is transferred to the ARRAY, and the previously processed image is transferred from the ARRAY to the FIFO for subsequent output. These transfers are performed by a sequence of compare and write operations via the tag register of the SIDE block, as follows.
Image input
The destination bit slice of the ARRAY is masked by the mask register and reset by a clear-comparand write operation (all in one cycle). The source bit slice of the FIFO matrix is masked by the mask register. The contents of that bit slice are passed to the tag register as the result of a compare operation. The destination bit slice is masked again, and the contents of the tag register are passed to the destination bit slice by a set-comparand write operation. In short, the following 5 cycles are used.
Image output
This operation is performed exactly like image input, except that the source bit slice is allocated in the ARRAY and the destination bit slice is allocated in the FIFO matrix. The two image transfer operations require two different fields in the ARRAY (the first field receives the new image, the second field temporarily stores the processed image). The two operations (image input and output) can be combined into one loop.
4.4 Structure of SIDE
FIG. 8 illustrates an implementation of the SIDE block of the floor plan. The SIDE block includes the tag register, the NEAR, FAR, COUNT TAG and FIRST RESPONDER circuits, the RSP circuit, and the horizontal bus drivers and amplifiers. The tag register consists of a column of 1024 tag cells. Each tag cell is implemented by a D flip-flop with a set input and a non-inverted output. The input is selected by a multiplexer from eight sources: Far North, Near North, Far South, Near South, the match line (through the amplifier), the tag (feedback loop), GND (tag reset) and the first responder output. The mux is controlled by MUX[0:2].
4.5 TOP structure
The TOP block contains the COMPARAND and MASK registers, each with its logic and vertical drivers. The COMPARAND register holds the word that is compared against the ARRAY. It is 72 bits long and is partitioned according to the FIFO configuration (see 4.3). It is affected by the LETC, LETMC, LMCC, LMSC and LCSM instructions. All of these instructions affect only one of the three sectors at a time, depending on the sector bits. The FIFO part of the COMPARAND works differently, as described in 4.3.
The MASK register masks, with "0", the COMPARAND bits that are to be ignored during compare and write. The bit lines and inverse bit lines of masked bits are held at "0". It is affected by the LETM, LETMC, LMCC, LMSC, LCSM, LMX and SMX instructions. The first five instructions affect only one sector at a time. LMX and SMX also clear the mask bits of the sectors that are not addressed. The FIFO part of the MASK works differently, as described in 4.3.
4.6 BOTTOM structure
BOTTOM contains the bit line and inverse bit line amplifiers, the output register and its multiplexer, the DBUS multiplexer and the DBUS I/O buffers. Because the ARRAY is physically arranged in three columns, the outputs of the three amplifiers must be multiplexed. A logic unit selects which column actually drives the output, as follows.
READ: selects the column in which RSP is true.
FIFO OUT: selects the column containing the current address.
The output register is 72 bits long. 8, 16 or 24 of its bits serve the FIFO and are connected to the VOUT pins. For a READ operation, one of the three sectors (depending on the sector bits) is connected through the multiplexer to 24 bits of the DBUS.
The DBUS multiplexer can be configured in two ways.
(1) SHIFT: connects the south long-shift lines (from rows 1008:1023) to DBUS[31:16] and the north long-shift lines (rows 0:15) to DBUS[15:0].
(2) READ: connects bits [0:15] of the output register (bit 0 is the LSB) to DBUS[15:0] and bits [23:16] to DBUS[23:16].
The DBUS I/O buffers control whether the DBUS is connected as an input or as an output, and are controlled by the HIN and LIN control signals as follows. In SHIFT mode, DBUS[31:16] of one ASP chip must be connected to DBUS[15:0] of the next chip. The external connection need not be fixed; it can be switched as needed.
4.7 Structure of CONTROL
The ASP100 is controlled by an external microcoded state machine; it receives and decodes the control lines. The external microcontroller is capable of parallel execution and horizontal microprogramming. The combined operation of the ASP100, the microcontroller and the external controller is summarized by the following five-stage instruction pipeline: fetch, decode, μ-fetch, compare and execute. In the fetch stage, an instruction is fetched from external program memory and transferred over the system bus to the IR (Instruction Register). In the decode stage, the instruction in the IR is decoded by the microcontroller.
The decoded control codes are stored in the μIR. In the μ-fetch stage, the control codes are transferred from the external μIR to the internal μIR through the input pads. In the compare stage, the part that affects the parameter registers is executed, and the control codes move from μIR to μIR2. In the execute stage, as shown in FIG. 9, execution takes place in the ARRAY and the other parts.
4.8 Initialization
At reset, all registers inside the ASP100 are set to 0, except as follows.
4.9 Precautions for operation
1. Writing is performed as a sequence WRITE, COMPARE, WRITE, and so on. The chip is designed so that two WRITEs or two COMPAREs cannot be executed in succession.
2. In every instruction that includes a SET, such as SETAG, the corresponding signals LETM and/or LETC are set high. This is necessary because setting and resetting of the comparand and mask registers are synchronous.
4.10 Clock generator
Preferably, a single 50 MHz CLKIN clock is input to the ASP100. In addition, a clock synchronization control signal, DCLKIN, is also input. The CLKIN signal serves as the clock of the generator circuit. DCLKIN is an input signal that must meet the required setup and hold timing. The circuit generates two clocks of, for example, 25 MHz, CLK and DCLK, which are delayed by a quarter cycle relative to each other. CLK is fed back to the clock generator through a pad in order to provide the required drive capability.
5. Programming model
5.1 Instruction set
An example instruction set is described below. Two instruction formats are used. Instruction format A is used for group 1 and READ instructions, and instruction format B for the other groups.
Instruction format A consists of 1 NOP bit, 5 OpCode bits, 2 sector bits and 24 operand bits.
Instruction format B consists of 1 NOP bit, 7 OpCode bits and 24 operand bits.
As shown here, these formats do not allow multiple instructions to be executed in parallel. By implementing an alternative instruction set format, multiple instructions can be executed in parallel.
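Both instruction formats add up to 32 bits (1 + 5 + 2 + 24 and 1 + 7 + 24). The following sketch packs such words; the field widths are taken from the text, but the bit ordering (NOP bit most significant, operand least significant) and the function names are assumptions made for illustration, since the text does not state where each field sits in the word.

```python
def pack_fields(fields):
    """Pack (value, width) pairs into one integer, first field most significant."""
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def pack_format_a(nop, opcode, sector, operand):
    # Format A: 1 NOP bit, 5 OpCode bits, 2 sector bits, 24 operand bits.
    return pack_fields([(nop, 1), (opcode, 5), (sector, 2), (operand, 24)])


def pack_format_b(nop, opcode, operand):
    # Format B: 1 NOP bit, 7 OpCode bits, 24 operand bits.
    return pack_fields([(nop, 1), (opcode, 7), (operand, 24)])
```

The opcode values used with these helpers would come from the instruction table, which is not reproduced in this text.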
In the table below, d(n) is an n-bit argument with n ≤ 24, and s(2) is the 2-bit sector number.
Parallel ("horizontal") processing
1. The instructions of groups 1 and 3 can be executed in parallel.
2. The instructions of groups 1, 2 and 4 can be executed in parallel.
It will be appreciated that various features of the invention which are, for clarity, described in the context of separate embodiments may also be provided in combination in a single embodiment. Conversely, various features of the invention which are, for brevity, described in the context of a single embodiment may also be provided separately or in any suitable subcombination.
The main part of the doctoral dissertation of Avidan Akerib on associative real-time vision, submitted to the Weizmann Institute of Science, Rehovot, Israel, is presented with reference to FIGS. 11 to 34.
The ARTVM Architecture
The Associative Real-Time Vision Machine (ARTVM) has many features that suit it to the requirements of vision. The machine configuration and its basic operations are described, and their characteristics are highlighted.
Referring to FIG. 11, the heart of the machine is a basic, classical associative processor that is parallel by both word and bit. The main associative primitive is COMPARE. The comparand register is compared against all memory words simultaneously, and agreement is indicated by setting the corresponding tag bit. The comparison is carried out only in the bits indicated by the mask register and only in the words indicated by the tag register. The status bit rsp signals that there was at least one match (FIG. 12). The WRITE primitive operates in a similar way: the contents of the comparand are written simultaneously into all words indicated by the tag and into all bits indicated by the mask (FIG. 12). The READ command is normally used to bring out a single word, the one pointed to by the tag.
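The COMPARE and WRITE primitives just described can be modeled in a few lines. The sketch below is illustrative only: the class `AssociativeMemory`, its attribute names and the one-cycle cost per primitive are assumptions, not the simulator of the appendices. It replays the example from the summary of the invention, adding 3 to a list of numbers between 1 and 5 with five compare/write pairs.

```python
class AssociativeMemory:
    """Toy word-parallel model of the COMPARE and WRITE primitives."""

    def __init__(self, words, width):
        self.words = list(words)
        self.mask = (1 << width) - 1      # bits that take part in compare/write
        self.comparand = 0
        self.tag = [True] * len(self.words)
        self.rsp = False                  # True if the last COMPARE had a match
        self.cycles = 0

    def setag(self):
        # Select all words before a new COMPARE.
        self.tag = [True] * len(self.words)

    def compare(self):
        # Keep the tag only where the masked bits equal the masked comparand.
        self.tag = [t and ((w ^ self.comparand) & self.mask) == 0
                    for w, t in zip(self.words, self.tag)]
        self.rsp = any(self.tag)
        self.cycles += 1

    def write(self):
        # Write the masked comparand bits into every tagged word.
        self.words = [(w & ~self.mask) | (self.comparand & self.mask) if t else w
                      for w, t in zip(self.words, self.tag)]
        self.cycles += 1


def add_constant(memory, low, high, delta):
    """Replace every value v in [low, high] by v + delta."""
    # Largest value first, so a rewritten word never matches a later query.
    for value in range(high, low - 1, -1):
        memory.setag()
        memory.comparand = value
        memory.compare()                  # "Who is <value>?"
        memory.comparand = value + delta
        memory.write()                    # "Become <value + delta>."
```

For example, `add_constant(AssociativeMemory([1, 5, 3, 2, 4], 4), 1, 5, 3)` uses 10 modeled cycles whatever the length of the list, which is the point of the example in the summary.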
By combining COMPARE and WRITE, all logical and arithmetic functions can be executed [3]. The associative machine can therefore be viewed as an array of simple processors, one for each word in memory. The ARTVM has N × N words, one word for each pixel of the image to be processed, arranged consecutively row by row. The full machine instruction set is shown in the following table.
Neighborhood operations, which play a key role in vision algorithms, require data to be brought from neighboring pixels. As shown in FIG. 13, data communication is carried out via the tag register by the SHIFTAG primitive, one bit slice at a time. The number of shifts applied determines the distance and relation between source and destination. When this relation is uniform, communication among all processors takes place simultaneously. Fortunately, neighborhood algorithms require only uniform communication patterns. Because the tag register is one-dimensional while the image is two-dimensional, communication between neighbors in adjacent rows requires N shifts. To speed up these long shifts, a multiple-shift primitive, SHIFTAG(±b), is implemented in hardware, where b is a submultiple of N. The time complexity, in cycles, of shifting an N × N image by k places is expressed as $\frac{M}{2}\left[5 + k - \lfloor k/b \rfloor (b - 1)\right]$, where M is the precision and b is the size of the multiple-shift primitive.
Loading the image data into the associative memory and outputting the computed results can take up a large part of the processing time. This can be avoided by distributing the frame buffer along the associative memory and accessing it through the tag register [33]. The I/O buffer array consists of a 16-bit shift register attached to each word.
A stereo image frame can be shifted into the buffer array as it is received and digitized, without interfering with processing. During vertical blanking, the stereo image frame is transferred into the associative memory, one bit slice at a time, using the TAGXCH (tag exchange) primitive. With this instruction the buffer array is rotated through the tag register. When the computation results of the previous frame are ready for output, a result bit slice is placed in the tag by a compare instruction and then exchanged: the tag exchange outputs one bit slice and brings in one bit slice, hence the name of the primitive. This operation must be repeated 16 times for a full stereo image. During the next frame time, both input and output proceed in parallel with the processing, without interfering with it.
In the following routine, the contents of the buffer array are exchanged with the contents of a 16-bit field of the associative memory starting at bit position io. Execution time is 64 machine cycles (under 2 μs), which is negligible compared with the vertical blanking period (1.8 ms). Although the sample routine exchanges data between the buffer array and a contiguous field of memory, the tag exchange primitive is completely flexible: data can be fetched from one field and placed in another, and both fields can be distributed.
In a given memory cycle, four operations can be performed concurrently: SETAG or SHIFTAG; loading M (SETX; LETX); loading C (SETX; LETX); and COMPARE, READ or WRITE. FIRSEL resolves multiple responses in 6 cycles, and COUNTAG counts the responders in 12 cycles. Control functions are given in the C language and are executed in parallel with the associative operations, so they do not contribute to execution time.
To illustrate the parallel processing capability, consider an associative memory holding two data vectors A and B, each of J elements with M-bit precision. We wish to replace vector field B by the sum A + B. The associative operation is carried out sequentially, one bit slice at a time, starting with the least significant bit. At each step, the three slices $A_i$, $B_i$ and C (i = 0, 1, ..., M−1, where C is the carry slice) are compared in parallel against the input combinations of the addition truth table, using the sequence [letm d(.); letc d(.); setag; compare], each followed by the appropriate [letc d(.); write] sequence, which performs the parallel replacement of $B_i$ and C. The full description of the routine is given with the machine simulator. The number of machine cycles for addition is seen to be independent of the vector size J. Thus, for a 512 × 512 image (J = 2¹⁸) and a 30-nanosecond machine cycle (see VLSI chip implementation and interconnection), the machine performs approximately 125 billion 8-bit additions per second.
Associative subtraction operates on the same principle and also runs in 8.5M cycles. Multiplication (8.5M² cycles) is straightforward, and division (15.5M² cycles) is likewise easy to implement. Multiplication techniques are discussed in detail under the vision algorithms. The ARTVM is regarded as microprogrammed, so the precision required at each phase can be handled accordingly, generating only the significant bits required. For many vision algorithms the precision is fairly low, which gives the ARTVM an additional speed advantage over conventional machines.
As mentioned earlier, each word of memory acts as a simple processor, so memory and processor are indistinguishable. Input, output and processing take place simultaneously in different fields of the same word. The fields that are accessed and processed are allocated completely flexibly.
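The bit-serial addition described earlier in this passage can be sketched as follows. This is an illustration under stated assumptions, not the routine of the machine simulator: the word layout (A in the low M bits, B above it, one carry bit on top), the helper names and the order of the truth-table entries are chosen for the example. Only the four full-adder input combinations that change $B_i$ or C need a compare-write pair, and they are ordered so that a word rewritten by one entry cannot match a later entry in the same bit step.

```python
def compare(words, mask, comparand):
    """Tag every word whose masked bits equal the masked comparand."""
    return [((w ^ comparand) & mask) == 0 for w in words]


def write(words, tags, mask, comparand):
    """Write the masked comparand bits into every tagged word."""
    return [(w & ~mask) | (comparand & mask) if t else w
            for w, t in zip(words, tags)]


def associative_add(a_values, b_values, m):
    """Replace B by A + B (mod 2**m), one bit slice at a time.

    Returns (sums, carries, passes), where passes is the number of
    compare-write pairs issued; it depends on m only, not on the vector size.
    """
    # Assumed layout: A in bits 0..m-1, B in bits m..2m-1, carry C in bit 2m.
    words = [a | (b << m) for a, b in zip(a_values, b_values)]
    carry_bit = 1 << (2 * m)
    # (a, b, c) -> (new b, new c); entries that leave b and c unchanged are omitted.
    table = [((0, 0, 1), (1, 0)),
             ((0, 1, 1), (0, 1)),
             ((1, 1, 0), (0, 1)),
             ((1, 0, 0), (1, 0))]
    passes = 0
    for i in range(m):                      # least significant bit first
        a_bit, b_bit = 1 << i, 1 << (m + i)
        for (a, b, c), (new_b, new_c) in table:
            pattern = a * a_bit | b * b_bit | c * carry_bit
            tags = compare(words, a_bit | b_bit | carry_bit, pattern)
            words = write(words, tags, b_bit | carry_bit,
                          new_b * b_bit | new_c * carry_bit)
            passes += 1
    sums = [(w >> m) & ((1 << m) - 1) for w in words]
    carries = [(w >> (2 * m)) & 1 for w in words]
    return sums, carries, passes
```

With four compare-write pairs per bit, the sketch issues 4M pairs, which is of the same order as the 8.5M cycles quoted in the text; the exact cycle count depends on the instruction overlap rules, which are not modeled here.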
Therefore, by increasing the word length K, the processing capability (the family of vision algorithms that can be handled) can be extended. As shown under the vision algorithms, with K = 152 all the vision algorithms we consider run in real time, whereas K = 32 is sufficient for image processing applications such as histogram evaluation, convolution and morphological operations.
VLSI Chip Implementation and Interconnection
Ruhman and Scherson [34,35] devised a static associative cell and used it to lay out an associative memory chip. After evaluating its performance by circuit-level simulation, they conservatively estimated its area ratio relative to static RAM at 4. Given that 4-Mbit static RAMs are now commercially available on chip areas smaller than 100 mm², the associative memory chip capacity would be 1 megabit. The chip proposed for the ARTVM stores 4K words × 152 bits, which is only 59% of that capacity. The cycle time in 0.5 micron technology is conservatively extrapolated to 30 nanoseconds. This value was used to calculate the execution times of the associative algorithms.
Loading a full comparand or mask word would require a bus 152 bits wide. Fortunately, associative algorithms operate on only one or two short fields and a few flag bits in each step. The word is therefore organized as four 32-bit sectors and one 8-bit flag field, and the bus is arranged so that one flag field and one sector can be accessed at the same time. FIG. 14 shows the chip interface and how 64 such chips can be interconnected to form the associative memory of the ARTVM. Taking into account the exponential growth of chip capacity by a factor of 10 every 5 years [36], the ARTVM could be reduced to 8 chips around 1995. Because the machine is essentially an associative memory, upgrades are easy and inexpensive.
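The capacity figures quoted above can be checked with a few lines of arithmetic. The sketch assumes that "4K" means 4096 words, that "1 megabit" means 2²⁰ bits, and that the 64-chip array holds one word per pixel of a 512 × 512 image, as stated elsewhere in the text; the variable names are illustrative.

```python
WORDS_PER_CHIP = 4 * 1024          # "4K words" (assumed to mean 4096)
WORD_LENGTH_BITS = 152             # K = 152
CHIP_CAPACITY_BITS = 1 << 20       # "1 megabit" (assumed to mean 2**20 bits)
IMAGE_SIDE = 512                   # N = 512, one word per pixel

used_bits = WORDS_PER_CHIP * WORD_LENGTH_BITS          # 622,592 bits
utilization = used_bits / CHIP_CAPACITY_BITS           # fraction of the chip used
chips_needed = (IMAGE_SIDE * IMAGE_SIDE) // WORDS_PER_CHIP

print(f"bits used per chip: {used_bits}")
print(f"utilization: {utilization:.1%}")
print(f"chips for a {IMAGE_SIDE} x {IMAGE_SIDE} image: {chips_needed}")
```

Under these assumptions the utilization comes to about 59% and the chip count to 64, in agreement with the figures in the text.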
As shown in FIG. 11, the control unit is required to receive the outputs (count and rsp) and to supply the associative memory with a sequence of microinstructions and constants (mask and comparand). This unit can be built from bit-slice components, and can be implemented most effectively by designing one or more VLSI chips. The function of the control unit becomes clear from the associative algorithms that follow. See the notes in the figure.
The Machine Simulator
An ARTVM simulator was written. It enables the user to check out associative implementations of vision algorithms and to evaluate their performance. It is written in the "C" language and is provided as "asslib.h". The vision machine simulator consists of an associative instruction modeler and an execution time evaluator.
Associative instruction modeler
The main features are:
- The dimensions of the associative memory are defined by the variables memory size and word length, which must be initialized in the application program by #define commands.
- The contents of the associative memory and its registers are defined by the following external parameters.
- The associative instructions defined in the ARTVM architecture are implemented as "C" functions and write their results to the external parameters.
- Three instructions are added: load, save and print cycles. The load command initializes array A[.][.] with data from the file associative input, while the save command writes the contents of memory array A[.][.] to the file associative output at the end of the application program. The print cycles command displays the number of machine cycles required to execute the simulated program.
- The program control functions are written directly in "C".
The general format of an application program is as follows:
Execution Time Evaluator
Speed evaluation is done by modeling the machine as a simple finite automaton (FA) in which a cost in machine cycles is assigned to each state transition. The machine has only two states, $S_0$ and $S_1$. The input alphabet consists of five categories $I_y$, y = 1, ..., 5, obtained by grouping the instructions into the following categories. FIG. 15 shows the transition table. At initialization, the cycle counter (called cycles) is reset to 0, and the cycle counter is incremented on each state transition.
This speed model yields the results shown in the figure: an instruction from group $I_1$ can execute concurrently with an instruction from group $I_2$, and both of these can overlap with an instruction from group $I_3$. The cost of COUNTAG ($I_4$) is conservatively estimated at 12 cycles, based on implementing it with a pyramid of adders on chip and a 2-D array for off-chip summation of the partial sums. The cost of FIRSEL ($I_5$) is conservatively estimated at 6 cycles, based on implementing it with a pyramid of OR gates of depth $\log_2 N - 1$, part of which is on chip and the rest off chip. In the worst case the pyramid must be traversed twice, after which the tag flip-flops must be reset. The simple model used for speed evaluation imposes a loose constraint on the programmer: to allow parallel processing, instructions must be written in the order $I_1$, $I_2$, $I_3$.
To illustrate the instruction set and assembly language used in the simulator, the vector addition program already discussed (under the ARTVM architecture) is listed below.
Vision Algorithms
To test the flexibility and speed of the proposed associative architecture, ARTVM, a wide range of vision functions was implemented.
They include low-level algorithms such as histogram generation, convolution, edge detection, thinning, stereo matching and optical flow. Mid-level functions were also implemented, including contour tracing and labeling, the Hough transform, saliency mapping, and geometric tasks such as the convex hull and the Voronoi diagram. Our simulator was used to test the associative algorithms and to verify their complexity.
Before elaborating on the associative algorithms, it may be useful to recall briefly the features of the machine and the values and data structures used. The image resolution is taken to be 512 × 512 (N = 512), so the memory capacity is 256K words (one word per pixel). Data are arranged linearly in memory in the video-scan order of the pixel array, starting at the top left-hand corner of the image. Incoming data have 8-bit precision (M = 8), and although all processing is fixed point, the algorithms are designed to retain full inherent precision. A long shift instruction is provided for communication between rows. Its size is denoted b, where b is a divisor of the row length N. In our model of the machine the long-shift size is taken as 32 places (b = 32). Algorithm times are expressed in machine cycles; the cycle time is conservatively estimated at 30 nanoseconds under VLSI chip implementation and interconnection. This value is used to compute execution times.
Low Level Vision
Histogram
The associative nature of the ARTVM and its fast COUNTAG instruction make histogram evaluation easy. The program is shown in Listing 1. It consists of a very short loop that is repeated for each gray level: the compare instruction tags all pixels of the given gray level, and COUNTAG counts them. The count is automatically available to the controller, which accumulates it into the histogram.
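The histogram loop can be sketched as follows. This is not Listing 1: the function name and the cycle model (one cycle per COMPARE plus the 12 cycles quoted in the text for COUNTAG) are assumptions made for illustration. The point of the sketch is that the modeled cycle count depends on the number of gray levels and not on the number of pixels.

```python
def associative_histogram(pixels, precision=8, count_cost=12):
    """Histogram by one COMPARE and one COUNTAG per gray level.

    Returns (histogram, cycles). The cycle model is an assumption:
    1 cycle per COMPARE and count_cost cycles per COUNTAG.
    """
    histogram = []
    cycles = 0
    for level in range(1 << precision):
        # COMPARE: every word (pixel) is tested against the level at once.
        tags = [p == level for p in pixels]
        cycles += 1
        # COUNTAG: the number of responders becomes the histogram entry.
        histogram.append(sum(tags))
        cycles += count_cost
    return histogram, cycles
```

With `precision=8` this model gives 256 × 13 = 3328 cycles, close to the 3330 machine cycles quoted in the text; at 30 ns per cycle that is just under 100 μs.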
The time complexity in machine cycles is therefore given by the following equation, where M is the gray-level precision, taken as 8 bits in our model. The histogram is thus computed in 3330 machine cycles, or almost 100 μs.
Convolution
Low-level vision, and edge detection in particular, involves applying various filters to the image, which is most conveniently done by convolution. The image may be regarded as a simple vector of length N², or as N row vectors, each of length N. Convolving an N-element data vector with a P-element filter yields a vector of length N + P − 1, but only the central N − P + 1 elements, which represent the region of complete overlap between the two vectors, are of interest. We developed several techniques for performing convolution associatively, depending on filter characteristics such as separability and symmetry. We begin with the multiply-and-shift approach of Ruhman & Scherson [6,37,38]. The associative memory word format is shown in the accompanying figure.
The convolution filter vector [f], of length P and precision 8, is applied by the controller one element at a time. The result is accumulated in field [fd], of length 8 + 8 + log₂(P). The temp bit is used for temporary storage of the carry propagating through field [fd]. The mark bit serves to identify the region of complete overlap with the filter vector. The row convolution program is shown in Listing 2. Field [d] is multiplied by successive elements of vector [f]. Between multiplications, field [d] is shifted down one word position for row convolution, or N word positions for column convolution. The filter element acts as the multiplier: each of its bits is tested and, if it is set, field [d] is added to field [fd] starting at the corresponding bit position (add-offset). After each addition the carry is propagated to the most significant bit of [fd].
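The multiply-and-shift scheme described above can be sketched as follows for the row direction. This is an illustration and not Listing 2: the function name is hypothetical, the filter elements are assumed to be unsigned, and the bit-serial additions and carry propagation of the associative version are replaced by ordinary integer arithmetic on whole fields.

```python
def multiply_and_shift_convolution(data, taps, tap_bits=8):
    """1-d convolution with one filter element applied at a time.

    Returns (fd, mark): fd[m] accumulates sum_j taps[j] * data[m - j], and
    mark[m] is True where the filter overlaps the data completely.
    """
    n = len(data)
    d = list(data)                 # field [d], one entry per memory word
    fd = [0] * n                   # accumulator field [fd]
    for tap in taps:
        # The controller tests each bit of the multiplier (filter element).
        for bit in range(tap_bits):
            if (tap >> bit) & 1:
                # Add field [d], offset by the bit position, to [fd] in all words.
                fd = [acc + (value << bit) for acc, value in zip(fd, d)]
        # Shift field [d] down one word position (shiftag(1) in the text).
        d = [0] + d[:-1]
    mark = [m >= len(taps) - 1 for m in range(n)]
    return fd, mark
```

The number of additions equals the number of ONE digits in the filter elements, which is why the time complexity discussed in the text depends on the fraction of ONE digits.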
For column convolution, only the last program line has to be changed: "shiftag(1)" is replaced by the corresponding N-word shift. The time complexity of 1-d convolution in machine cycles is given by the following equation, where $t_a$ and $t_s$ denote the per-bit addition and shift complexity, and $t_p^{1d}$ denotes the complexity of carry propagation across field [fd]. Addition and carry propagation are performed only for the ONE digits of the multiplier (filter element), so the time complexity is a function of α, the fraction of ONE digits in the filter vector elements. The range of α is 1/M ≤ α ≤ 1. According to the program listing, the addition time $t_a$ is 8.5 cycles. Carry propagation takes 4 cycles per bit, and the average propagation distance determines $t_p^{1d}$. The program time for shifting one field, excluding shiftag(), is 3 cycles per bit. Therefore, moving the pixel field to the neighboring column takes $t_s^c$ = 3 cycles per bit, and moving the field by one pixel takes the per-bit time given by the following expression.
Extending the above algorithm to 2-d gives the following equation, where $t_p^{2d}$ denotes the average carry propagation for 2-d convolution. Substituting the per-bit addition time, $t_p^{2d}$ and $t_s$ into eq. 11 gives the following.
This simple algorithm is quite efficient for 1-d convolution, and very efficient when the filter elements contain few ONE digits (α << 1). But for a wide 2-d filter, such as 31 × 31, the time can exceed half of the video frame time (20 ms). Several approaches to reducing the time complexity are considered.
Some 2-d filters can be separated into two 1-d convolutions. A 2-d Gaussian, for example, is realized by successive application of a 1-d filter in two orthogonal directions. Reducing the computation to 1-d convolutions leads to a drastic improvement in execution time. Equation 11 then takes the following form. Substituting $t_a$, $t_p^{1d}$ and $t_s$ reduces it to the following.
The filters we deal with are all symmetric about the origin: the Gaussian is even-symmetric, while the derivative of the Gaussian is odd-symmetric.
This leads to the interesting question of how far symmetry can be exploited to improve execution time. Consider first the one-dimensional convolution of a symmetric filter of length P = 2L + 1 applied at point $d_m$, where the filter elements $f_i$ on either side of the central element $f_0$ are equal. Using the following word format, the 1-d convolution algorithm for an even function is as shown in the accompanying figure. All shifts are of one place, "shiftag(±1)", for convolution in the x direction, and each consists of N/b long shifts in the y direction. When this algorithm is applied with a separable symmetric 2-d filter such as the Gaussian, the time complexity is as follows.
For odd symmetry, $f_0$ = 0 and $f_i$ are the absolute values of the filter elements on one side. The convolution formula therefore reduces to the following. Two modifications of the algorithm are then required: dropping step 1 of the program (the evaluation of d·f₀), and changing the loop from NEXT + d to NEXT − d. The time complexity of an odd-symmetric filter applied in both directions is given by the following.
The methods of reducing convolution time considered so far take advantage of special properties of the filter: symmetry, the bit statistics of its elements, and the separability of 2-d filters. We now consider an enhancement applicable to the most general filter, which treats the pixel data as the multiplier and speeds up multiplication by applying 4 bits of the multiplier at a time. In the following algorithm, with the word format shown in the accompanying figure, the 8-bit data field is separated into high ($d_h$) and low ($d_l$) nibbles. A 12-bit field nd is provided for the partial product of the current filter element and the current data nibble. Filling in all the partial products by table lookup requires 15 compare-write cycles (no action is required for the all-ZERO nibble).
Using the enhanced multiplication algorithm, the time complexity of general 2-d image convolution is expressed by the following equation, and the time complexity of 1-d convolution in both directions is expressed by the following equation. T_m is the multiplication time (2 × 15 × 2.5 cycles), and T_p1, T_p2 are the carry propagation times for adding into field fd. Substituting T_m, T_c and t_s and evaluating T_p1, T_p2, the time complexity is expressed by the following equation. The next table compares all the convolution methods discussed, for filters of size 7x7, 15x15 and 31x31. The enhanced multiplication method, T^enh, appears to be the fastest way to compute a general convolution. It is also achieved with a word-length increase of only 12 bits. The other methods, which exploit special filter properties, offer a small advantage, but the use of symmetry requires a larger increase in word length (17 bits).

Marr & Hildreth Edge Detection

This algorithm [39] detects the zero crossings (ZC) of the Laplacian of a Gaussian-filtered image, and can be written as follows, where I is the original image, G_σ denotes the 2-d Gaussian filter of scale σ, and ∇² denotes the Laplacian operator. The M&H algorithm consists of the following two steps. The DOG filter has the following form, where σ_p and σ_n are the space constants of the positive and negative Gaussians, respectively; a ratio σ_p/σ_n of about 1.6 gives the closest approximation to the Laplacian of Gaussian (∇²G) operator. The DOG implementation requires four 1-d convolutions: a row convolution and a column convolution for each space constant. The complexity of the associative DOG implementation is expressed by the following equation, where P_p and P_n are the filter sizes for the space constants σ_p and σ_n, and T_diff denotes the complexity of subtracting N-bit fields, including propagation through to the sign bit.
Associative subtraction is as fast as addition. Assuming the fastest general convolution method, T_1d^enh, substituting eqs. 20 and 16 with M = 8 into eq. 19 gives the following equation. P_p, P_n and the DOG complexity (in cycles and milliseconds) for three filters corresponding to σ = 0.5, 1 and 2 are given in the following table, where P_p and P_n denote the filter vector lengths.

The second step of the M&H algorithm, zero-crossing detection, operates on a 3 × 3 neighborhood. The center pixel is considered an edge point if one of the four directions (horizontal, vertical and the two diagonals) yields a change of sign. Specifically, one item of a pair (about the center) must exceed the positive threshold T while the other is less than −T. The associative ZC implementation for each spatial filter is outlined as follows.
1. Compare all pixels (the DOG results) against the thresholds T and −T, and write the result into memory as 2 indicator bits.
2. Shift the indicator-bit pairs from the 8 neighbors of each pixel simultaneously into a 16-bit field of the word.
3. Using the 16-bit indicator field, test all four directions for a ZC and mark the edge points.
The associative algorithm for zero-crossing detection has a time complexity of 165 cycles, or 4.95 microseconds. Note that the M&H algorithm produces edge points without gradient direction. This parameter can be computed by operating on a larger neighborhood (9x9) around each edge point. An algorithm for detecting 16 segment directions (and corners) has been developed and is described below. Its time complexity is 1010 cycles, or 30.3 microseconds.

Canny Edge Detection

Canny's algorithm [40] has three stages:
1. Directional derivative of Gaussian filter (∇G * I)
2. Non-maximum suppression
3. Thresholding with hysteresis
The general form of the derivative of a Gaussian with scale σ in direction n is as follows.
The x and y derivatives of the Gaussian filter are, respectively, as follows, and are obtained by convolving the image with these expressions. Applying the enhanced multiplication method, the execution time T_1d^enh for a typical set of filter sizes is as follows.

Non-maximum suppression selects as edge candidates the pixels at which the gradient magnitude is maximal. For optimum sensitivity, the test is carried out in the gradient direction. Since the 3 × 3 neighborhood provides only 8 directions, the number of directions is doubled to 16. The gradient magnitude at each pixel is compared with that of its neighbors. The associative implementation is similar to the zero-crossing detection described above and requires a few additional operations.

Thresholding with hysteresis removes weak edges caused by noise, but continues to trace strong edges as they become weaker. Based on an estimate of the signal statistics and image noise, two thresholds on the gradient magnitude are computed, low and high. Edge candidates with gradient magnitude below low are removed, while candidates above high are considered edges. Candidates with values between low and high are considered edges if they are connected to a pixel above high through a chain of pixels above low. All other candidates in this interval are removed. The process propagates along curves. The associative implementation (Listing 3) uses the three flags shown in FIG. 19: E, which initially marks the candidates that exceed the high threshold (unambiguous edge points) and finally designates all selected edge points; OE (old edge), which keeps track of the confirmed edges at the last iteration; and L, which designates the candidates above low. At each iteration, each L candidate is examined to see whether at least one of its 8 neighbors is an edge, in which case it is declared an edge by setting E.
Before E is copied to OE, the two flags are compared to see whether they are identical, in which case the process terminates.

Listing 3: Curve Propagation

The time complexity of the program is given by the following equation, where I is the number of iterations, 23.5 is the time to examine the 8 neighbors, and the N/b terms account for the long shifts that bring in edge points from the neighboring columns. The upper bound of I is given by the longest propagation chain and is approximately N², but taking 100 iterations as a typical value, the curve propagation complexity is 3950 cycles, or 119 microseconds.

Thinning

The curve propagation algorithm produces curves that are not thin. A multi-pass thinning algorithm is used, consisting of a pre-thinning phase and an iterative thinning phase. Referring to FIGS. 20A to 20E, the pre-thinning phase fills single gaps by applying template (a), and removes boundary noise by clearing point P when one of the templates b, c or d holds. Multi-pass means that the templates are applied first in the north direction and then in the south, east and west directions, except for template (a), which is fully symmetric and needs to be applied only once. All templates are shown in the north direction, and X denotes a don't-care (ONE or ZERO). Similarly, the thinning phase tests templates e, f and g in each of the 4 directions in succession and clears point P if there is a match. This 4-pass sequence is repeated until there are no more changes.

Particularly noteworthy is the quality of the skeleton produced by such a simple local process. The most accurate representation of the skeleton is based on the medial axis. Davies and Plummer [41] proposed an algorithm that produces such skeletons and selected eight key images for testing it. Our thinning algorithm was applied to these images with interesting results: the skeletons agree virtually exactly with those of Davies and Plummer.
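The curve propagation step of hysteresis thresholding described above (flags E, OE and L, with termination when E equals OE) can be sketched as follows. This is a sequential emulation under stated assumptions, not Listing 3: the nested loops stand in for one parallel pass over all memory words, and the function name and the strict comparisons against the two thresholds are choices made for illustration.

```python
def curve_propagation(grad, low, high):
    """Hysteresis thresholding by iterative propagation along curves.

    E  : current edge flag (initially the candidates above `high`)
    OE : edge flag as it stood at the start of the iteration
    L  : candidates above `low`
    """
    h, w = len(grad), len(grad[0])
    E = [[g > high for g in row] for row in grad]
    L = [[g > low for g in row] for row in grad]
    iterations = 0
    while True:
        OE = [row[:] for row in E]
        for y in range(h):
            for x in range(w):
                if L[y][x] and not OE[y][x]:
                    # Declare an edge if any of the 8 neighbors was an edge.
                    if any(OE[yy][xx]
                           for yy in range(max(0, y - 1), min(h, y + 2))
                           for xx in range(max(0, x - 1), min(w, x + 2))):
                        E[y][x] = True
        if E == OE:            # no flag changed: propagation is complete
            return E, iterations
        iterations += 1
```

A chain of weak candidates attached to one strong pixel is accepted one pixel per iteration, which is why the number of iterations I is bounded by the longest propagation chain, while weak candidates with no strong pixel in their chain are never set.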
Mismatches that are not end points occur at ambiguous points and constitute equally valid results. Removing boundary noise in the pre-thinning phase avoids the formation of extraneous skeleton spurs. The time complexity of the algorithm is given by the following equation, where the first two terms account for the pre-thinning phase. The execution time is 150 cycles (4.5 microseconds) for pre-thinning and 214 cycles (6.4 microseconds) per thinning iteration. Three iterations are sufficient for edge thinning. Single-pass thinning was also considered and found to be rather critical. The algorithm proposed in [42] seems optimal, but it does not provide the ideal skeleton, and applying its preliminary noise-trimming phase affects several main branches during thinning.

Stereo Vision

Stereo vision must solve the correspondence problem: for each point in the left image, the corresponding point in the right image must be found and the disparity computed. Stereo has been a major research topic in computer vision for the past 10 years, and many approaches have been proposed. We do not attempt an associative implementation of all of them; here we focus on the algorithm of Grimson [43]. It bears a similarity to the structure of the human visual system, and it can use input edges generated by the M&H or Canny edge detection schemes described above. Assuming that edge detection has been performed on both the left and right images, the results are placed side by side in memory. Edge points are marked, and their orientation is given to 4-bit precision over 2π radians, a resolution of 22.5 degrees. Using the left image as the reference, the stereo process matches edge points that have the same orientation and the same gradient sign. Edge lines near the horizontal (± 3.75 degrees) are excluded to minimize disparity errors. The Grimson algorithm consists of the following steps.
・ Locating an edge point (and its orientation) in the left image
・ Dividing the area around the corresponding point in the right image into three pools
・ Assigning a match to the edge point based on the potential matches in the pools
・ Disambiguating ambiguous matches
・ Assigning disparity values
The associative memory word format for the matching process is as follows. The input fields DL and DR label the edge points of the left and right images, respectively, and DIR-L and DIR-R give their orientations. The resulting disparity value is recorded in the output field DISP. An outline of the associative algorithm is shown in FIG.
・ A neighborhood of ± W pixels is searched, divided into three pools A, B and C. Pools A and C are of equal size and represent the divergent and convergent regions. The smaller pool B is the region around zero disparity.
・ Fields DIR-R and DR (from the right image) are shifted down by W word positions (corresponding to shifting the right image W pixels to the right).
・ At edge points (DR, DL = 1), DIR-R is compared against DIR-L within a tolerance of ± 1. After each comparison, the right image fields are shifted up by one word position, until the full ± W range has been covered. The comparison results are recorded in fields PX and TX (X = A, B or C). A PX value of 00 indicates no match, 01 indicates one match, and 11 indicates more than one match in pool X. The TX field temporarily stores the disparity for each pool, for use in ambiguous cases. Note that the net shift of the right image at the moment a match is found is the disparity.
・ Edge points with an unambiguous disparity (01 in PX and 00 in all PY, where Y ≠ X) are selected, and their disparity is copied into field DISP (DISP = TX).
・ Points with matches in more than one pool are disambiguated using the dominant disparity in the neighborhood.
If there is a dominant pool in the neighborhood and the ambiguous point has a potential match in the same pool, that match is selected for the ambiguous point. Otherwise, the match for the point remains ambiguous. Testing the neighborhood for a dominant disparity begins by counting all edge points (field DL) into the COUNT field. Then, for each pool, the unambiguous matches across the same neighborhood are counted into field COUNT-P. When COUNT-P > COUNT/2, the pool is dominant, and points with a match in the same pool have their disparity copied from TX to DISP. If there is no dominant pool, or no match in the dominant pool, DISP is not updated for the point in question, and the point remains marked as ambiguous in the MR bit.
・ The final step tests whether the disparity is in range. Marr & Poggio show that, in an in-range region, 70% or more of the edge points are matched. Unmatched edge points are labeled in MR and counted across the neighborhood into the COUNT-P field. If COUNT-P > COUNT/4, all edge points in the neighborhood are labeled as unmatched and their disparities are cleared.
The time complexity of the associative algorithm is given by the following equation, where T_sh is the time to shift the right image, T_mat is the time to evaluate the matches in the pools, T_dis is the disambiguation time, and T_or is the time to find and remove out-of-range disparities. The shift time, in cycles, is given by the following equation. The first term accounts for the initial and final W-place shifts of fields DR and DIR-R (5 bits). The second term covers the successive one-place shifts during the match search. The last term is the update of the boundary flags that handle the end effects of the columns. Consider now the matching complexity T_mat. It requires 8 cycles for every disparity (2W + 1) within the tolerance of ± 1 and for every orientation (10).
Here, the second term accounts for the final processing of the comparison results. The disambiguation process consists of the following steps.
・ Counting the edge points over each L × L neighborhood
・ For each of the three pools:
- Counting the unambiguous matches across the same neighborhood.
- Comparing this result with half the edge count.
- Copying the disparity from a valid match in the pool.
Hence we can write the following, where T_cnt is the time to count labeled pixels over the neighborhood, T_gt is the time for a greater-than comparison, and T_cpy is the time to copy the disparity. T_cnt is the subject of the next section, and the other two are given by the following. The disambiguation algorithm is listed below. Finally, the time to test for and record out-of-range disparities is given by the following equation. The terms here account for grouping the unmatched edge points in MR and labeling them: T_cnt is the time to count the unmatched edge points in the neighborhood, T_lt covers the comparison of this number with the number of edge points in the neighborhood, and T_rm is the time to label the edge points in the neighborhood as out of range and clear their disparities.

Stereo matching is carried out for each of the spatial-frequency channels. The following relation is obtained from the stereo vision model of Marr & Poggio [44], where P denotes the filter size (vector length) of the channel. If P is chosen of the form 2^i − 1, then L has a similar form. Substituting eqs. 21-28 into eq. 20, using the relations between P, W and L, and applying the definition of k above, we obtain the following. The algorithm complexity excluding neighborhood counting, for the three spatial-frequency channels, is shown in cycles and milliseconds in the table below. We now consider the problem of counting labeled pixels over a fairly large neighborhood, which is performed 5 times during stereo evaluation and can therefore dominate its time complexity.
Linear Summation

A straightforward approach to counting the labeled pixels over the neighborhood around each pixel is to provide a count field in each word and to increment it with the neighborhood labels as they are brought in, in a convenient sequence. For an L × L neighborhood, the maximum count value is L², and when L is odd, the count field length is 2log₂L. The program listing follows. Here, "flag" denotes the label bit being counted, and "field" denotes the count field, starting at its LSB. The word format is as shown in FIG.

Listing 5: Linear Summation Over LxL with L Odd

The execution time in machine cycles is shown below, where k = log₂(L + 1); hence it grows as L²logL. Combining the above program with the stereo algorithm ruins its execution time, which then exceeds the video frame time on the worst channel. This is shown in the following table, which gives all times in milliseconds. For comparison, the stereo execution time excluding neighborhood counting is repeated as T_st-cnt.

2-d Summation

In the linear summation program, each column of L labels was in effect counted L times, once for each overlapping neighborhood. Taking advantage of the two-dimensional structure of the neighborhood, the count is carried out in two stages. In the first stage, the neighborhood labels are summed column by column, and in the second stage, the sums of the neighboring columns are aggregated. This requires an additional "column" field of length logL and leads to the following program. The word format is shown in FIG. The execution time in machine cycles is given by the following equation. In our case, where the filter vector length is chosen of the form 2^i − 1, L is of the form 2^k − 1, as shown above. We therefore obtain the following, and T_cnt is expressed by the following equation.
The execution time grows as LlogL. With this counting algorithm, as shown in the table of execution times in milliseconds, the stereo complexity falls well within real-time video limits.

2-d Tree Summation

2-d summation brings the stereo time complexity within real time, but the question arises whether this result can be improved further. Continuing with the two-dimensional approach (first by rows, then by columns), each dimension of the neighborhood is handled by summing the elements pairwise, then summing the pair results, and so on until the dimension is covered. Since L is of the form 2^k − 1, a row and a column must be added to complete the tree. The extra element, denoted "tail", is set aside during summation so that it can be brought back in later; a k-bit "tail" field therefore provides temporary storage for the extra column sum. The program listing follows. The word format is shown in FIG. 24. The execution time is obtained as follows, and the results are shown in the table for the three channels. The complexity still grows as LlogL. However, the table shows an improvement of more than 40% over 2-d summation (for the largest neighborhood). The stereo complexity is consequently improved by 27.5%. Most of the improvement comes from the fact that the computation is carried out with only the precision required as it progresses.

Discussion of stereo results

Real-time stereo vision with an associative processor was considered, and the critical function was identified as the counting of labeled pixels over a neighborhood. This function is evaluated 4 times for disambiguation and once for out-of-range disparity. Computing the function in a straightforward (linear) manner pushes the stereo execution time beyond the real-time (video) limit.
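The gain from the two-stage count discussed above can be illustrated with a short sketch. It is sequential Python rather than the associative program, the function names are hypothetical, and neighborhoods are simply clipped at the image border, which is an assumption. The first function counts every label in the L × L window directly; the second sums each pixel's column segment once and then aggregates the neighboring column sums.

```python
def count_direct(labels, size):
    """Count ONE labels in the size x size neighborhood of every pixel."""
    h, w, r = len(labels), len(labels[0]), size // 2
    return [[sum(labels[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def count_two_stage(labels, size):
    """Same count in two stages: column segments first, then across columns."""
    h, w, r = len(labels), len(labels[0]), size // 2
    # Stage 1: each pixel sums the labels of its own column segment.
    column = [[sum(labels[yy][x]
                   for yy in range(max(0, y - r), min(h, y + r + 1)))
               for x in range(w)] for y in range(h)]
    # Stage 2: aggregate the column sums of the neighboring columns.
    return [[sum(column[y][xx]
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def additions_per_pixel(size):
    """Label additions per pixel: direct count versus two-stage count."""
    return size * size, 2 * size
```

Both functions return the same counts, but the two-stage version needs about 2L additions per pixel instead of L². With count fields of roughly logL bits, this matches the growth rates L²logL and LlogL quoted in the text.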
The two-dimensional and tree techniques bring the associative implementation well within real time. The results are shown in FIG. 25 for the three implementations: linear, two-dimensional, and two-dimensional tree. The execution time is plotted as a function of the neighborhood dimension. FIG. 26 shows the stereo complexity as a function of the neighborhood dimension for the three methods, with and without neighborhood counting.

Optical flow

Optical flow assigns to every point in the field of view a velocity vector describing its motion across the field. Potential applications of optical flow include areas such as target tracking, target identification, image compression and autonomous robots. The theory for computing optical flow is based on two constraints: the brightness of a particular point in the pattern is constant over time, and the flow of the brightness pattern varies smoothly almost everywhere. Horn and Schunck [45] derived an iterative process for solving the constrained minimization problem. The flow velocity has two components (u, v). At each iteration, the new velocity (u^{n+1}, v^{n+1}) is estimated from the average of the previous velocity estimates (ū^n, v̄^n) by the following equations, where α is a weighting factor that depends on the noise in the measurements. The derivatives E_x, E_y and E_t in Equation 33 are estimated as the average of four first differences taken over adjacent measurements in a cube, where E_{i,j,k} is the pixel value at the intersection of row i and column j in frame k. The indices i and j increase from top to bottom and from left to right.

There is a practical issue of how to interleave the iterations with the time steps. The estimates from the previous time step (video frame time) can normally be used as initial values. For high-speed motion, one iteration per time step is not enough to stabilize the optical flow values, and several iterations are required before moving on to the next frame. To implement these equations, the memory word is divided into multiple fields for input data, output data and intermediate results.
The format is shown in FIG. The flow is computed from two successive video frames, held in fields E_n and E_n1. Each frame contains 512 × 512 pixels, with gray levels given to a precision of 8 bits. During vertical blanking (the time interval between frames), the image in E_n1 is moved to field E_n and the new image is written into E_n1 from the I/O buffer array (FIG. 15). With one or more iterations of the algorithm during the frame time, a reasonable approximation of the optical flow is obtained for use in the next frame. Equation 33 is rewritten below, where D_x = E_x/P, D_y = E_y/P and P = α² + E_x² + E_y². The derivatives E_x, E_y and E_t, as well as D_x and D_y, are fixed for a given frame and take no part in the iteration between frames. The computation of these parameters is therefore referred to as the fixed part of the algorithm.

Fixed part

The first stage computes the individual derivatives E_x, E_y and E_t.
Clear fields E_x, E_y and E_t. Shift E_n and E_n1 up one row and one column to the left (N + 1 words up), to obtain E_{i+1,j+1,n} and E_{i+1,j+1,n+1}. Add E_{n+1} into E_x, E_y and E_t. Add E_n to, or subtract it from, the derivative fields as shown in the table below.
Second stage. Shift E_n and E_n1 one column to the right (1 word down), to obtain E_{i+1,j,n} and E_{i+1,j,n+1}. Accumulate E_n and E_n1 as shown in the table below.
Stages 3 and 4. Shift E_n and E_n1 down one row (N words) and accumulate as shown in the following table.
Stages 5 and 6. Shift E_n and E_n1 one column to the left (1 word up) and accumulate as shown in the following table.
Stages 7 and 8. Shift E_n and E_n1 back to their original position (1 word down).
In the next stage D_x and D_y are computed.
By table lookup, α² + E_x² is computed and placed in field S_c.
Similarly, E_y² is computed and placed in field A_c.
Field A_c is added to field S_c to obtain P.
Divide field E_x by field S_c and place the quotient in field U.
Divide field E_y by field S_c and place the quotient in field V.
Copy D_x from field U into the right part of field S_c (the least significant 9 bits).
Copy D_y from field V into the left part of field S_c.
Before each division is carried out, the dividend E_x or E_y is zero-extended on the left so that it is scaled correctly with respect to the divisor P. Since α is positive, D_x and D_y are less than 1, so the quotient does not overflow. The complexity of the fixed part is calculated as follows.

Iterative part

Compute the average of the u components (field U) as shown in equation 44, and place the result (ū) in field U_av.
Scale the U field by the appropriate shifts and add it into field A_c. The four direct neighbors are added in shifted one place to the left, which gives them twice the weight.
Divide the value accumulated in A_c and place the result in U_av.
Similarly, compute the average of the v components (field V) and place the result (v̄) in field V_av.
Copy E_t into field A_c.
Multiply E_x by U_av and place the product in field U.
Add field U into field A_c.
Multiply E_y by V_av and place the product in field U.
Add field U into field A_c, obtaining E_x ū + E_y v̄ + E_t.
Multiply field A_c by the right part of field S_c, obtaining D_x[E_x ū + E_y v̄ + E_t]. Place the result in field U.
Multiply A_c by the left part of S_c, obtaining D_y[E_x ū + E_y v̄ + E_t]. Place the result in field V.
Compute the u component of the optical flow by evaluating U_av − U in field U.
Compute the v component of the optical flow by evaluating V_av − V in field V.
The complexity of this part, in machine cycles per iteration, is given by the following equation. The complexity of optical flow can therefore be calculated as follows, where I denotes the number of iterations required for the flow to converge. Evaluating the above formula gives 260 μs for the fixed part and 196 μs per iteration. The run times for different values of I are given in the table below.
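The split into a fixed part and an iterative part can be summarized per pixel in a short sketch. It follows the quantities named in the text (P, D_x, D_y, the weighted neighbor average and the updates U_av − U, V_av − V) but uses floating point instead of the fixed-point fields; the function names, the normalization of the average by 12 and the border replication are assumptions made for illustration.

```python
def local_average(field, y, x):
    """Weighted neighbor mean: the four direct neighbors count twice as
    much as the four diagonal ones (borders replicated, an assumption)."""
    h, w = len(field), len(field[0])

    def at(yy, xx):
        return field[min(max(yy, 0), h - 1)][min(max(xx, 0), w - 1)]

    direct = at(y - 1, x) + at(y + 1, x) + at(y, x - 1) + at(y, x + 1)
    diagonal = (at(y - 1, x - 1) + at(y - 1, x + 1)
                + at(y + 1, x - 1) + at(y + 1, x + 1))
    return (2 * direct + diagonal) / 12


def fixed_part(ex, ey, alpha):
    """Per-frame constants D_x = E_x / P and D_y = E_y / P (alpha > 0)."""
    p = alpha ** 2 + ex ** 2 + ey ** 2
    return ex / p, ey / p


def iterate(ex, ey, et, dx, dy, u_av, v_av):
    """One iteration: subtract D times the residual from the averages."""
    residual = ex * u_av + ey * v_av + et
    return u_av - dx * residual, v_av - dy * residual


def flow_time_us(iterations):
    """Run time from the figures in the text: 260 us fixed, 196 us/iteration."""
    return 260 + 196 * iterations
```

Because D_x and D_y depend only on the derivatives and α, they are computed once per frame, and each iteration needs only the neighbor averages, one residual and two multiply-subtract steps. `flow_time_us(5)`, for example, evaluates to 1240 μs.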
Mid-level vision

Corner and line direction detection

An important capability in mid-level and high-level processing is the ability to distinguish corners and line directions. In Canny edge detection, the line direction is generated within the process. The M&H algorithm, on the other hand, is not directional, and its edge bitmap requires further processing to detect line direction. We propose an algorithm that determines segment directions from the edge bitmap in a 9x9 neighborhood around each pixel. The algorithm can distinguish 120 different corners and lines. The approach is outlined as follows.
1. As shown in FIG. 27, the 9x9 neighborhood is divided into 24 sectors in a wheel-like pattern. The sector size increases away from the center so as to preserve angular precision. The function computed for each sector is defined as the logical OR of its edge points, and a 24-bit field is assigned to the sector values. The edge-point indicators of the neighborhood are shifted in and ORed directly into the sector values.
2. The sectors define 16 equally spaced segments, or lines, around the circle, giving an angular resolution of π/8. Each segment (direction) is characterized by a subset of the sector values. A Hamming distance of at most 1 is accepted. The sector-value field is compared against a code for each of the 16 segment directions, and the results are marked in a 16-bit field, one bit per segment direction.
3. To make the test uniform and the segmentation precise, three sets of ambiguity measures are computed for each of the 8 main compass directions. Additional sector values resolve this ambiguity. Most of the original ambiguity is not decided.
4. Pairs of segments in the 16-bit segment field can be tested to identify lines and corners. The sample program selects all of them without distinction.
From simulation, the complexity of the algorithm was 1010 cycles, or 30.3 μs.
The concept of this algorithm can be extended to a wide range of functions through the choice of sector boundaries, the function computed, and the pattern codes.

Tracing and labeling contours

An initial label derived from its x-y coordinates is attached to each contour point. The process is iterative and operates in parallel on the 3 × 3 neighborhoods of all contour points. Every contour point looks at each of its eight neighbors and adopts the neighbor's label if it is smaller than its own. The neighbors are visited in a circular sequence, which facilitates the propagation of labels. Iteration stops when all labels remain unchanged, leaving each contour identified by its lowest coordinates. The point of lowest coordinates in each contour is the only one to retain its original label. These points are marked and counted to obtain the number of contours in the image. Listing 8 gives the associative memory program. The input fields are [xy-coord], giving the position, and [edge], identifying the contour points. The output fields are the contour [label] and the contour starting point [mr]. The word format is shown in FIG.

Listing 8: Tracing and labeling contours

The time complexity of the algorithm (in machine cycles) is as follows. The upper bound on I is about N²/2. Taking 100 iterations as a representative value, however, the execution time is 218 kilocycles, or 6.6 ms. An additional step lists the label and length (in pixels) of each contour; its time complexity is relatively small (24 cycles per contour).

Associative saliency network

Salient structures in an image can be perceived at a glance, without systematic search or prior knowledge of their shape. Such structures stand out even when embedded in a cluttered background or when their elements are fragmented.
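The contour labeling procedure described above (initial labels from coordinates, repeated adoption of the smallest label among the 8 neighbors, termination when nothing changes) can be sketched as follows. This is a sequential emulation, not Listing 8: each pass of the loop stands in for one parallel iteration, and the function name and the row-major encoding of the initial label are assumptions.

```python
def label_contours(edge):
    """Propagate the smallest coordinate label along each contour.

    edge: 2-d list of 0/1 values marking contour points.
    Returns the final labels and the contour starting points.
    """
    w = len(edge[0])
    # Initial label derived from the x-y coordinates (row-major, assumed).
    original = {(y, x): y * w + x
                for y, row in enumerate(edge)
                for x, e in enumerate(row) if e}
    label = dict(original)
    while True:
        new = {}
        for (y, x), own in label.items():
            best = own
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    other = label.get((y + dy, x + dx))
                    if other is not None and other < best:
                        best = other
            new[(y, x)] = best
        if new == label:       # all labels unchanged: iteration stops
            break
        label = new
    # Only the lowest-coordinate point of each contour keeps its own label.
    starts = [p for p in label if label[p] == original[p]]
    return label, starts
```

The number of starting points returned equals the number of contours, which corresponds to counting the points marked in [mr].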
Sha'ashua and Ullman [46] proposed a global saliency measure for curves based on their length, continuity and smoothness. Consider the image as a network of N × N grid points, with d orientation elements (segments or gaps) entering each point from its neighbors and as many leaving it for its neighbors. In this image, a curve of length L is a connected sequence of orientation elements p_i, p_{i+1}, ..., p_{i+L}, each element representing a line segment or a gap in the image. The saliency measure of the curve is defined by the following equation. The local saliency σ is assigned 1 (unity) for an active element (an actual segment) and 0 for a virtual element (a gap). The attenuation function ρ_{i,j} penalizes gaps: the attenuation factor ρ approaches 1 for an active element and is appreciably smaller (0.7 here) for a virtual element. The first factor, c_{i,j}, is a discrete approximation to a bounded quantity that is the inverse of the total curvature, where α_k denotes the difference in orientation from the k-th element to its successor, and ΔS denotes the length of an orientation element.

This quantity is global and is computed for a curve of L orientation elements, some of which may be gaps. Hence, to find the maximum saliency at a given segment, one would have to evaluate all d^L possible curves starting from this segment, where d is the number of (discrete) orientations considered at each point. Because a part of a salient curve need not itself be salient, this exponential complexity cannot be reduced by pyramid techniques. Sha'ashua and Ullman reduced the complexity to the order of dL by maximizing over successively longer curves. Given a state variable E_i associated with element p_i, the iterative process is defined as follows, where E_j is the state variable of p_j, one of the d possible neighbors of p_i.
The superscript of E indicates the iteration number, and f_{i,j} is the inverse curvature factor from p_i to p_j. After L iterations, the state variable equals the saliency measure defined above. The proof is outlined in [46] and detailed in [47]. The final state variables of all the orientation elements (segments or gaps) in the N×N grid constitute the saliency map of the image.

In our associative architecture, the pixels constitute the grid points, and d = 8 orientation elements connect each pixel to its neighbors (FIG. 30). The following notation is used in the figure.
・ Solid line: the i-th element, whose saliency E_i is computed
・ Dashed line: next elements used in the computation
・ Dotted line: elements ignored in the computation
For d discrete orientations, the angle α is given in increments of (360/d) = 45°, and f_{i,j} takes the following values. It can be seen that only the three values −45°, 0° and 45° are significant. Therefore, only elements with these values of α relative to their successor are used in the computation. The initial image is given as edge points, and a preprocessing stage is required to identify the active elements from the set of edge points. The program is outlined in Listing 9. The memory word format is shown in FIG. 31.

Listing 9: Associative Saliency Network

As the active ρ approaches 1, it becomes possible to discriminate meaningfully between different high saliencies, but many iterations are required. The algorithm requires a word length of 90 bits, and its time complexity in cycles is given by the following equation, where I denotes the number of iterations. Evaluating the expression in parentheses gives an execution time of 0.4 ms per iteration. An execution time of 500 ms per iteration was reported for the Connection Machine [48].
Hough Transform

The Hough transform detects curves that can be described by a parametric equation, such as straight lines or conic sections, even when the curve contains gaps. Each point of a figure in image space is transformed into a locus in parameter space. The parameters are quantized into suitable bins, and a histogram is formed by accumulating the loci in parameter space. The presence of the sought curve is indicated by a prominent peak in the histogram (an intersection of many loci).

For straight lines (FIG. 32), we use the normal parameterization of Duda & Hart [49]:

x cos θ + y sin θ = ρ

This equation specifies a line by ρ and θ, and the histogram covers straight lines in all directions θ. However, if the input to the Hough transform is the output of an edge detector that produces directions, as in the methods described above, θ is known. O'Gorman & Clowes [50] applied this information chiefly to reduce both the hardware (word length) and the time complexity.

For a 511 × 511 image with the origin at its center, the x-y coordinates are given in 9 bits as magnitude and sign. The angle θ in the range 0 to π is given to a matching precision of 10 bits (excluding the sign of the gradient). sin and cos are obtained by table look-up; the symmetry of these functions is exploited to reduce the table size by a factor of 4. After ρ is computed, the histogram is built by compare (search) operations, and its elements are counted and read out with the COUNTAG primitive. This algorithm requires a word length of 52 bits, and its time complexity in machine cycles is:

T_l = 1870 + 13t(r − 1)

where t and r are the resolutions of ρ and θ in the histogram. The second term accounts for the histogram evaluation and dominates T_l for t, r ≥ 32. At a resolution of t, r = 16 the execution time is 150 μs per frame, and at a resolution of 128 it rises to just 6.4 ms.

Now consider the detection of circles of a given radius R. The equation of the circle can be written as

(x − x₀)² + (y − y₀)² = R²

where x₀, y₀ are the coordinates of the center. As with straight lines, we want to use the gradient direction to simplify the computation. Differentiating the circle equation yields a second equation, in which θ denotes the gradient direction. The two equations are solved for x₀, y₀, and a histogram is generated to determine them. The algorithm uses the gradient polarity to distinguish between light circles on a dark background and dark circles on a light background; a separate histogram is generated for each case. Assuming R is less than 32 pixels, the required word length is 62 bits, and the time complexity in cycles is:

T_c = 1550 + 26 r_x r_y (46)

where r_x, r_y are the range resolutions of x₀, y₀ in the histogram. At the resolution considered for x₀, y₀, the execution time per frame is 10.8 ms. Circles whose boundary is partly dark on a light background and partly light on a dark background can be detected by summing the two histograms (in the host) before thresholding. If the search is restricted to light circles on a dark background (or vice versa), the complexity drops to:

T_c = 1280 + 13 r_x r_y (47)

and the execution time is reduced to 6.4 ms.

Geometric Problems

Convex Hull

Finding the boundary of a set of points in an image is an interesting and useful task. When we look at such a set of points, we have little difficulty distinguishing boundary points from interior points. These natural boundary points are the vertices of the convex hull. The convex hull is defined mathematically as the smallest convex polygon that contains the set of points.
Equivalently, the convex hull is the unique convex polygon that contains the point set and whose vertices belong to the point set; it is therefore the shortest path that encloses the point set.

The approach chosen for associative implementation is the package-wrapping method [51]. Start from a point that is guaranteed to be on the convex hull, say the lowest point in the set (smallest y coordinate). Take a horizontal ray in the positive direction and sweep it upward (counterclockwise) until it hits another point. This point must also be on the hull. Anchor the ray at this point and continue sweeping to the next point, and so on until the starting point is reached and the package is fully wrapped.

For convenience, we choose the coordinate system so that the whole image (and the point set) lies in the first quadrant. The lowest point P_j is located by finding the minimum y coordinate in the set. It is on the hull and is labeled as such. The initial reference point P_i is taken at x_i = 0, y_i = y_j. Consider the angle θ that each segment P_jP_k, where P_k is any other point of the set, forms with the extension of segment P_iP_j. The next point on the convex hull is the one giving the smallest angle θ (FIG. 33). Let V₁ denote the vector P_iP_j and V₂ the vector P_jP_k. Their scalar product is

V₁ · V₂ = |V₁||V₂| cos θ (48)

Therefore,

cos θ = (a₁a₂ + b₁b₂) / (√(a₁² + b₁²) √(a₂² + b₂²))

where a₁ = x_j − x_i; a₂ = x_k − x_j; b₁ = y_j − y_i; b₂ = y_k − y_j.

To avoid computing the square root, cos²θ is used. Since θ lies in the range 0 to π, the sign of a₁a₂ + b₁b₂ is tested before squaring. If there are positive values, they are marked and the maximum cos²θ among them is found. If all numerators of cos θ are negative, the minimum cos²θ is found. The P_k corresponding to the selected θ is on the hull and is labeled as such. To continue, P_j becomes the new P_i and the selected P_k becomes the new P_j (FIG. 33). When the process returns to the first (lowest) point, it terminates.

Two special cases can occur. When finding the lowest point in the first step, two or more points with the same minimum y coordinate may be found. In that case, the point with the largest x coordinate is selected as P_j and the point with the smallest x coordinate as P_i, and P_iP_j becomes the first reference segment. During the iteration that finds the point minimizing θ, two or more points P_k1, P_k2, ..., P_ks giving the same minimum may be found. Clearly the segments P_jP_k1, P_jP_k2, ..., P_jP_ks are collinear; the point selected is the one farthest from P_j, that is, the one with the largest |x_k − x_j|. If all the differences x_k − x_j are equal, the selection is based on the largest |y_k − y_j|.

An analysis of the algorithm as implemented on ARTVM gives the execution time in machine cycles:

T_cHull = 60 + 105V (51)

where V is the number of vertices of the convex hull. The time complexity is therefore independent of the number of points in the set. The execution time is 3.15 ms for a convex hull of 1000 vertices.

Voronoi Diagram

The Voronoi diagram is a classical mathematical object that has become an important tool in computational geometry for proximity problems. Given a set of L points P_i (i = 1, 2, ..., L) in the plane, the Voronoi diagram surrounds each point P_i with a region R_i such that every point in R_i is closer to P_i than to any other point P_j (j = 1, 2, ..., L, j ≠ i) of the set. The boundaries of all these regions constitute the Voronoi diagram.

The associative algorithm, based on the brush-fire technique, is given in Listing 10. The boundaries consist of the points where fires from 2 (or 3) sources meet. Every point of the given set is initially marked with a different color, given by its x-y coordinates. Each point in the image looks at its 8 neighbors. A blank (uncolored) point adjacent to a colored neighbor copies that neighbor's color.
If both points are colored, their colors are compared, and the point marks itself as a Voronoi (boundary) point if they differ. The process is iterated until all points are colored. One more cycle of color comparison with the neighbors is needed to complete the boundary. The order in which the 8 neighbors are processed is chosen to improve the accuracy of the boundary. Not surprisingly, it alternates between opposite directions: N, S, E, W, NE, SW, NW, SE.

Analysis of the algorithm gives the time complexity in cycles as a function of the number of iterations. The execution time per iteration is 75 μs. Since a region grows by 2 pixels per iteration along the diagonal, up to 103 iterations may be required. However, for I = 20 the tabulated value is 1.5 ms.

The algorithm produces very thin boundaries; a single thinning pass in four directions (south, north, west, east), using the templates below, is sufficient. The templates are shown in their initial south orientation. Note that the order within each set is reversed for the opposite direction; this reversal is considered essential for maintaining boundary accuracy. Pixels removed from the boundary examine their four neighbors in order and are recolored by copying the color of the first neighbor that is not a boundary point. Since thinning and recoloring are not iterative processes, they have no significant impact on the execution time. FIG. 34 is helpful in understanding Listing 10.

Listing 10: Associative Voronoi Diagram

The associative Voronoi algorithm is designed to provide fast access to statistical data. Thus, reading out the length (in pixels) of the Voronoi diagram, or the area (in pixels) of the Voronoi region identified by its seed coordinates, requires only 13 machine cycles.

Word Length

This section evaluates the word length (K) of associative memory required to compute the vision algorithms. Consider a machine model described as a 3-channel monochrome computer stereo vision system. The input consists of M bits for each of the left and right images. The machine generates, for each of the 3 channels, parameters that are used in higher-level processing. The parameters are as follows: a disparity of length ⌈log₂(2W_i + 1)⌉ bits; for each of the left and right images, the gradient direction and the edge designation (lengths 4 and 1); and a 1-bit match label. If the input data is retained for subsequent processing, the final word length follows from these field lengths, with M = 8 and W = P = 7, 15 and 31.

Additional word space is needed for temporary storage of intermediate results, and it varies dynamically during execution of the various algorithms. The maximum word length depends on the order of execution. In our case, the best order is to compute the channels starting from the lowest (coarsest) one. Examination of the various processing phases shows that the maximum word length occurs during the disparity computation for the final channel. The maximum word length is therefore:

K_max = 2M + K_ch1,2 + K_sp (54)

The first term is for the input data, and K_ch1,2 denotes the results of the first two channels:

K_ch1,2 = 2(2(4 + 1) + 1) + ⌈log₂(2W₃ + 1)⌉ + ⌈log₂(2W₂ + 1)⌉ = 33 (55)

K_sp is the working space for computing the disparity of the final channel (see the discussion of stereo vision):

K_sp = 3 × 2 + 2⌈log₂(2W₁ + 1)²⌉ + 5⌈log₂(2W₁ + 1)⌉ = 42 (56)

K_sp does not include flag bits. K_max is therefore 91 bits.

Let us extend the model to include most of the vision algorithms discussed above. As noted, the minimum required word length depends on the order of execution. The recommended order is:
・ Optical flow
・ Edge detection and contour processing
・ Hough transform, saliency mapping
・ Stereo matching

The critical process appears to be optical flow, at 132 bits (including the additional bytes of stereo image data). By sharing or reusing the fields E_n1, U_av and V_av, the required word length is reduced to 106 bits. To accommodate new algorithms, the ARTVM word length is set at 128 bits (four 32-bit sectors) plus 8 flag bits, or 136 bits. This counts associative memory only. If the 16-bit buffer is included, the total word length is 152 bits.

Results and Conclusions of the Akerib Paper

A low-cost, general-purpose architecture is proposed on which a range of vision algorithms can be implemented. The proposed machine has a classical associative structure that is well suited to computer vision and to VLSI implementation. It is specified as an associative real-time vision machine (ARTVM), and its operation is enhanced by an up/down shift mechanism in the tag register. An internal frame buffer virtually eliminates I/O time and allows input, output and computation to proceed simultaneously. To reduce the chip interface without any loss in speed, the word is divided into four sectors, only one of which is accessible at a time, while the flag field is always accessible.

The main hardware component for handling 512 × 512 images is the associative memory, 256K words × 152 bits. Extrapolating the above experience to 0.5-micron technology, a chip area of 100 mm² yields an associative memory capacity of 1 Mbit at a cycle time under 30 ns. The proposed chip stores 4K words × 152 bits, which is 59% of that capacity. 64 such chips make up the associative memory.

The ARTVM simulator, written in the C language, was used to develop the associative micro-software and to evaluate its complexity.
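The memory sizing above and the cycle-count formulas quoted earlier in this description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check: it assumes a cycle time of exactly 30 ns (the text says "under 30 ns"), takes K as 1024 and 1 Mbit as 1024 × 1024 bits, and uses hypothetical function names.

```python
CYCLE_NS = 30  # assumption: the text states a cycle time "under 30 ns"


def chip_fill(words=4 * 1024, bits_per_word=152, capacity_bits=1024 * 1024):
    """Fraction of a 1 Mbit chip occupied by 4K words of 152 bits."""
    return words * bits_per_word / capacity_bits


def chips_needed(total_words=256 * 1024, words_per_chip=4 * 1024):
    """Number of chips required to hold one word per pixel of a 512 x 512 image."""
    return total_words // words_per_chip


def cycles_to_ms(cycles, cycle_ns=CYCLE_NS):
    """Convert a machine-cycle count to milliseconds."""
    return cycles * cycle_ns / 1e6


def hough_line_cycles(t, r):
    """T_l = 1870 + 13 t (r - 1), the straight-line Hough transform cost."""
    return 1870 + 13 * t * (r - 1)


def convex_hull_cycles(v):
    """T_cHull = 60 + 105 V, the convex hull cost for V hull vertices."""
    return 60 + 105 * v


if __name__ == "__main__":
    print(f"chip fill:            {chip_fill():.1%}")
    print(f"chips for 256K words: {chips_needed()}")
    print(f"Hough, t = r = 16:    {cycles_to_ms(hough_line_cycles(16, 16)):.3f} ms")
    print(f"Hough, t = r = 128:   {cycles_to_ms(hough_line_cycles(128, 128)):.2f} ms")
    print(f"hull, V = 1000:       {cycles_to_ms(convex_hull_cycles(1000)):.2f} ms")
```

Under these assumptions the results agree with the figures quoted in the text: 59% chip utilization, 64 chips, about 150 μs and 6.4 ms for the Hough transform at resolutions of 16 and 128, and 3.15 ms for a hull of 1000 vertices.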
Convolution with a 15-element filter in both the x and y directions takes 0.34 ms. Canny edge detection therefore executes in 0.5 ms, and the Marr & Hildreth method in about twice that time. Stereo matching by the Grimson method over a disparity range of ±15 pixels, including disambiguation and out-of-range testing, completes in 1 microsecond. This stereo performance is achieved by an array algorithm that counts the number of labeled pixels across a neighborhood. Optical flow by the Horn & Schunck method takes less than 0.5 ms. Curve propagation, thinning and contour tracing take 1.5, 6.4 and 66 μs per iteration, respectively. The linear Hough transform, with a resolution of 16 in both direction and distance from the origin, takes 150 μs. An interesting result was obtained for the global saliency map of Sha'ashua & Ullman: each iteration takes 0.4 ms, three orders of magnitude faster than on the Connection Machine. Geometric problems were also addressed: the convex hull takes 3.15 μs per vertex, and the Voronoi diagram, using the brush-fire technique, runs at 0.15 ms per iteration.

Two methods were selected for comparative evaluation of ARTVM performance. First, ARTVM was compared with a SIMD array of 256 high-performance processors (Inmos T800 / Intel 860) and was found to have a speed advantage of 2 to 3 orders of magnitude. The speed gain is lowest for higher-precision neighborhood arithmetic operators, such as convolution (a factor of 97), and peaks for neighborhood logic operations, such as curve propagation (a factor of 2500). The second method was the Abingdon Cross benchmark, for which test results are available for a number of well-known vision architectures. ARTVM was found to lead by 2 to 6 orders of magnitude in price-performance.

The ARTVM configuration used throughout this study has a long shift of 32 places (b = 32).
This adds 64 pins to the chip, for a total interface of 160 pins. Reducing b to 16 saves 32 pins at an average speed loss of about 17%. As noted above, with regard to technological advances the architecture is flexible and can take full advantage of higher chip densities; the memory chip count varies continuously with this parameter. There is equivalent flexibility in changing the image resolution. Thus, for a 1024 × 1024 image the chip count increases by a factor of 4, with a small loss in speed and probably a small increase in word length.

In the vision algorithms above, the data are essentially organized by pixel or by orientation, which offers certain advantages. The Hough transform and the convex hull are exceptions. For higher-level vision functions, which deal with more complex image data, the associative architecture is expected to offer even greater advantages. This work has important commercial implications.

The devices and methods shown above are useful in other types of applications. Examples include, but are not limited to: video telephones implementing the H.261 standard; video conferencing at QCIF resolution; compression and decompression for video games; color image enhancement and manipulation for desktop publishing; optical character recognition (OCR); virtual reality; computer image animation for cartoon films; retrieval and processing of 2- or 3-dimensional black-and-white or color images; video detection for traffic control; medical imaging, such as 3D reconstruction and back-projection filtering; real-time normalized gray-scale correlation; TV tracking of one or more targets, such as tracking vehicles for traffic control purposes; other transportation applications, such as enhancement of license plate numbers; inspection of manufactured objects, for example agricultural products, wood and metal products, and microelectronic products; computer acceleration; neural network applications; fuzzy logic applications; post-processing to improve the quality of compressed images; video, digital or analog camera photography, with or without image compression and with or without special effects, for example autofocus, gamma correction, photo montage, blue screen, aperture correction, exposure correction, real-time morphing and geometric distortion correction; TV applications such as HDTV (high-definition television), satellite TV and cable TV; infotainment; speech recognition; kiosks for finance, travel, shopping and other purposes; special features and enhancements for office automation equipment such as fax machines, printers, scanners and photocopiers; compression applications, such as compression of face images, fingerprints and other information onto small cards such as licenses, ID cards and membership cards; communication applications such as digital filtering, Viterbi decoding and dynamic programming; as well as training, education and entertainment applications.

Video and picture editing applications include acceleration of desktop publishing functions, for example blurring, sharpening, rotation and other geometric transformations, and video filtering.

CD-ROM (Compact Disc Read Only Memory) applications include compressors, for example MPEG-I, MPEG-II, JPEG, fractal compressors and wavelet compressors. These can be used in a wide variety of other applications, such as healthcare, real estate, travel, research and journalism, with or without enhancements such as image sharpening.

Facsimile application examples include canceling a color background and improving the resulting appearance of text superimposed on the color background; sharpening characters and the gaps within characters such as Kanji; OCR; and facsimile data compression.

One example of a photocopier application is the automatic addition to copies of a template, such as a logo, stored in memory.

Security applications for homes, workplaces, banks, and repositories of valuables and proprietary information include personal identification such as face recognition, fingerprint recognition, eye recognition, voice recognition, and handwriting recognition such as signature recognition.

Camera features that are accelerated by implementing the embodiments presented here include:

1. Gamma correction: a LUT (look-up table) contains 256 cells holding the values of a (a = 0, ..., 255) raised to the specified gamma power. Gamma has a value of, for example, 0.36 or 0.45. The same LUT serves all three components (R, G and B), and gamma correction is performed on all three components in parallel.

2. Fast color-space conversions, such as the color transformation that separates luminance and chrominance before further processing. For example, it is sometimes desirable to convert RGB or CMYK values to YCrCb values, which can be compressed by reducing the number of bits representing the Cr and Cb components. The compressed YCrCb values are subsequently restored to RGB or CMYK values.

3. Low-pass filtering of the luminance and chrominance signals with 5- to 15-tap filters.

4. Aperture correction, which can be achieved, for example, by the following steps:
a. extracting the high-frequency component of the original signal representing the photographed scene;
b. applying a separable filter, for example [−0.25 0.5 −0.25] along the rows and along the columns;
c. generating a correction signal by shifting the horizontal signal by K = 0, 1, 2 or 4 pixels;
d. generating the corrected output by adding the correction signal to the original signal.

5. Autofocus and auto-exposure computation: for example, the camera focus is adjusted by a predetermined amount in a first direction. The proportion of high-frequency components, computed as in the example above, is then evaluated to determine whether it increased or decreased as a result of the adjustment. If it increased, the focus is adjusted again by the predetermined amount in the first direction. If it decreased, the focus is adjusted by a predetermined amount in a second direction.

6. Automatic color correction computation, for example automatic gain control and automatic white balance. For example, the following steps are performed:
a. adjusting the black level until the darkest part of the R, G or B signal reaches a predetermined level;
b. adjusting the differential gains so that the means of the three signals are equal;
c. adjusting the overall gain so that the most positive peak of the three waveforms reaches the white level.
This can be done by computing the maximum, minimum and mean levels for each color.

7. Chroma keying.

8. Noise reduction: weighted averaging of two consecutive images with weights 1/K and 1 − 1/K, K = 2, 4, 8.

9. Motion protection (avoiding blur when objects move).

10. Generation of a composite S-video signal: chrominance modulation on alternating lines, U sin(ωt) + V sin(ωt) and U sin(ωt) − V sin(ωt), executed at a frequency of 13.5 MHz.

It will be apparent to those skilled in the art that the present invention is not limited to what has been described above. The scope of the present invention is defined by the claims that follow.
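Two of the camera functions listed above, gamma correction through a 256-entry look-up table and noise reduction by weighted averaging of consecutive frames, can be sketched in a few lines. This is a conventional sequential illustration, not the associative implementation. The normalization of the table to the 0 to 255 range, the choice of which frame receives the weight 1/K, and all function names are assumptions.

```python
def build_gamma_lut(gamma=0.45, levels=256):
    """Build a look-up table mapping each level a to a**gamma.

    Assumption: levels are normalized to [0, 1] before the power is
    applied and scaled back afterwards, so that 0 and 255 map to themselves.
    """
    top = levels - 1
    return [round(top * (a / top) ** gamma) for a in range(levels)]


def gamma_correct(pixels, lut):
    """Apply the same LUT to the R, G and B components of every pixel."""
    return [(lut[r], lut[g], lut[b]) for (r, g, b) in pixels]


def temporal_average(prev_frame, new_frame, k=4):
    """Weighted average of two consecutive frames.

    Assumption: the new frame is weighted 1/K and the previous frame
    1 - 1/K, with K = 2, 4 or 8.
    """
    return [(n + (k - 1) * p) / k for p, n in zip(prev_frame, new_frame)]


if __name__ == "__main__":
    lut = build_gamma_lut(0.45)
    print(gamma_correct([(0, 64, 255)], lut))
    print(temporal_average([100, 200], [200, 200], k=4))
```

Restricting K to powers of two means the division can be carried out as a bit shift, which presumably suits a bit-serial associative memory, although the text does not state this reasoning.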

Continuation of front page

(81) Designated states: EP (AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE), OA (BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG), AP (KE, MW, SD, SZ), AM, AT, AU, BB, BG, BR, BY, CA, CH, CN, CZ, DE, DK, EE, ES, FI, GB, GE, HU, JP, KE, KG, KP, KR, KZ, LK, LR, LT, LU, LV, MD, MG, MN, MW, NL, NO, NZ, PL, PT, RO, RU, SD, SE, SI, SK, TJ, TT, UA, US, UZ, VN

Claims

1. An associative signal processing apparatus for processing an input signal comprising a plurality of samples, the apparatus comprising: a two-dimensional array of a plurality of processors, each processor including a plurality of content-addressable memory cells, each sample of the input signal being processed by one or more of the processors; and a register array including one or more registers operative to store responses arriving from the processors and to provide communication, within a single cycle, between non-adjacent processors.

2. An associative signal processing apparatus comprising: an array of processors, each processor including a plurality of associative memory cells, one or more of the processors being operative to process a plurality of samples of an input signal; a register array including one or more registers operative to store responses arriving from the processors and to provide communication between processors; and an I/O buffer register operative to input the input signal and to output an output signal.

3. The apparatus of claim 2, wherein the processor array, the register array and the I/O buffer register are arranged on a single chip.

4. The apparatus of claim 1, wherein the register array is operative to perform one or more multi-cell shift operations.

5. The apparatus of claim 2 or 3, wherein the register array is operative to perform one or more multi-cell shift operations.

6. An associative apparatus comprising: a plurality of comparison memory elements, each operative to compare the contents of memory elements other than itself with respective references according to a user-selected logical criterion, and to generate a response when the comparison memory element complies with the criterion; and a register operative to store the responses.

7. The apparatus of claim 6, wherein the criterion comprises one or more logical operands.

8. The apparatus of any one of claims 3 and 5, wherein the I/O buffer register and the processors operate in parallel.

9. The apparatus of any one of claims 2, 3 or 5, wherein the word length of the I/O buffer register can be increased by decreasing the word length of the associative memory cells.

10. The apparatus of any one of claims 1, 2, 4 or 5, which operates in real time on video.

11. The apparatus of any one of claims 1, 2, 4, 5 or 9, wherein the signal comprises an image.

12. The apparatus of claim 7, wherein the at least one logical operand constitutes a reference for at least one memory element other than the comparison memory element.

13. The apparatus of claim 6, wherein each memory element comprises one or more memory cells.

14. The apparatus of any one of claims 6, 7, 12 and 13, wherein the plurality of comparison memory elements are operative to compare, in parallel, the contents of memory elements with a single reference.

15. An image correction method comprising: computing a transformation of an output image imaged by a distorting lens, the transformation compensating for the lens distortion; and applying the transformation in parallel to each of a plurality of pixels of the output image.

16. The method of claim 15, wherein the distorting lens comprises an HDTV lens.

17. An array of processors communicating by multi-cell and single-cell shift operations, the array comprising: a plurality of processors; a first bus connecting one or more pairs of processors and operative to perform one or more multi-cell shift operations; and a second bus connecting one or more pairs of processors and operative to perform single-cell shift operations.

18. A signal processing method for processing a signal, comprising: for each successive pair of first and second signal characteristics in a sequence of signal characteristics, counting in parallel the number of samples having the first signal characteristic, and subsequently counting in parallel the number of samples having the second signal characteristic.

19. The method of claim 18, wherein the counting generates a histogram.

20. The method of claim 18 or 19, wherein the signal comprises a color image.

21. The method as claimed above, wherein one or more of the characteristics comprises one or more of the following group of characteristics: intensity; noise; and color density.

22. The method of claim 21, further comprising scanning a medium carrying a color image.

23. The apparatus of claim 11, wherein the image comprises a color image.

24. An edge detection method comprising: identifying a first plurality of edge pixels and a second plurality of candidate edge pixels; identifying as edge pixels, in parallel, all candidate edge pixels connected to one or more edge pixels; and repeating the identification one or more times.

25. A feature labeling method for a signal containing one or more features, each feature consisting of a set of connected samples, the method comprising: storing a plurality of indices for a corresponding plurality of samples; in parallel for each individual sample of the plurality of samples, replacing the stored index of the individual sample with the index of a connected sample when the index of the connected sample is ordered above the index of the individual sample; and repeating the replacement one or more times.

26. The method of claim 25, wherein the replacement is repeated until only a small number of indices are replaced in an iteration.

27. The method of any one of claims 1 to 5, 8 to 11, 18 to 23, 25 and 26, wherein the signal comprises an image.

28. The method of claim 26, wherein the signal comprises a color image.

29. An image correction apparatus comprising: a transformation computer operative to compute a transformation of an output image imaged by a distorting lens, the transformation compensating for the lens distortion; and transformation means operative to apply the transformation in parallel to each of a plurality of pixels of the output image.

30. An associative memory comprising an array of PEs including a plurality of PEs, each PE including a processor of variable size and a word of variable size including an associative memory cell, wherein the associative memory cells included in the plurality of PEs are all arranged in the same location within their respective words, and the plurality of words included in the plurality of PEs form a FIFO.

31. The apparatus of claim 30, wherein the variable-size word comprises one or more associative memory cells.

32. A method of changing the contents of a plurality of memory cells, comprising: performing an arithmetic computation once for each individual value stored in the plurality of memory cells; and storing the result of the arithmetic computation in those memory cells that contain the individual value.

33. The method of claim 32, wherein the storing is performed on all the memory cells in parallel.

34. A method of constructing an associative signal processing apparatus for processing an input signal, comprising: arranging an array of a plurality of processors in one module, each processor including a plurality of associative memory cells, each sample of the input signal being processed by one or more of the processors; arranging in the same module a register array including one or more registers operative to store responses arriving from the processors and to provide communication between the processors; and arranging in the same module an I/O buffer register for inputting and outputting signals.

35. The apparatus as claimed above, wherein one or more samples are processed by more than one processor.

36. The apparatus of claim 1, wherein one or more processors process one or more samples.

37. The apparatus of claim 1, wherein the register array comprises a plurality of registers.

38. The apparatus of claim 2, wherein the order in which the I/O buffer inputs an image differs from the row/column order of the image.

39. The apparatus of claim 2, wherein the order in which the I/O buffer inputs the samples differs from the order of the samples in the input signal.

40. The apparatus of claim 1, wherein the register array includes a plurality of registers operative to store the responses arriving from the processors.

41. The apparatus of claim 1, wherein the one or more registers provide communication between processors.

42. The apparatus of claim 41, wherein the one or more registers provide communication between processors that process non-adjacent samples.

43. The apparatus of claim 1, further comprising an I/O buffer register operative to input and output signals.

44. The apparatus of claim 43, wherein the processor array, the register array and the I/O buffer register are arranged in a single module.

45. The apparatus of claim 43, wherein the processor array, the register array and the I/O buffer register are arranged on a silicon die.

46. The apparatus of claim 45, wherein the I/O buffer register includes a plurality of buffer register cells, the number of buffer register cells being equal to or greater than the number of processors in the two-dimensional processor array.