JPS62259164A

JPS62259164A - Computer and real time voice recognizer

Info

Publication number: JPS62259164A
Application number: JP61283012A
Authority: JP
Inventors: Alastair D. McAulay (アラステア　ディー　マッコーレイ)
Original assignee: Texas Instruments Inc
Current assignee: Texas Instruments Inc
Priority date: 1985-11-27
Filing date: 1986-11-27
Publication date: 1987-11-11

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to electronic computers and real-time speech recognizers, and more particularly to computers having interconnected parallel processors.

(Prior Art) Parallel processing and symbolic processing are the main trends in the effort to achieve real-time computation. Many real-time applications require rapid logical decisions that draw on stored knowledge and process large amounts of data at high speed. Furthermore, tight coupling between symbolic and numerical computation is often desirable in fields such as speech and image understanding and recognition, robotics, weapon systems, and industrial plant control. Indeed, the spread of small computers in offices and homes and the emergence of artificial intelligence and robotics as research fields have drawn attention to the fact that a growing share of computational effort is spent on non-numerical, or symbolic, computation, and many software tools used with computers, such as editors, compilers, and debuggers, are extending the use of symbolic processing. Because symbolic computation can exploit qualitative information or a priori knowledge in the form of databases and procedures, it opens up new ways of solving problems in addition to numerical and statistical methods.

For example, attempts to solve real-world problems that require human-like intelligence in robotics, speech, and vision demand enormous symbolic and numerical computing power, because of the vast amount of a priori information required for what are regarded as simple operations and because of the high data rates from sensors. Signal processing of sensor data arises in fields such as acoustics, sonar, seismology, speech communication, and bioengineering; typical purposes of such processing include estimating characteristic parameters, removing noise, and transforming data into a more desirable form. Traditionally, most signal processors have been tailored for performance and efficiency on a small number of specific algorithms. Future signal processors will need greater speed and algorithmic flexibility, so that high-resolution eigensystem beamforming and optimal Wiener filtering can be computed on the same processor, and so that new algorithms can be executed efficiently as they are developed. The ability to handle a wide range of algorithms in military systems allows different algorithms to be used during a mission and allows field equipment to be updated with new algorithms. Conventional vector approaches cannot satisfy the ever-increasing demands on computer performance, and future designs must be able to exploit extensive concurrency efficiently (see McAulay, "Parallel Arrays or Vector Machines, Which Direction in VLSI?", IEEE International Workshop on Computer Systems Organization, IEEE Computer Society, New Orleans, USA, March 1983, IEEE Publication 83CH1879-6; L.S. Haynes, R.L. Lau, D.P. Siewiorek, and D.W. Mizell, Computer 15(1), 9 (1982); J. Allen, Proceedings of the IEEE 73(5), 852 (1985); and A.D. McAulay, IEEE Region 5 Conference Record 85CH2123-8 (1985)). The contents of these references, along with all other references cited, are discussed in this specification.

Very large scale integration (VLSI) in semiconductor devices is also leading toward greater use of parallel processing.

Parallel processing requires some form of interconnection between processing elements, which introduces a trade-off between speed and the ability to handle a wide range of algorithms.

For example, a complex interconnection network provides some flexibility at the expense of speed, whereas high speed is obtained with fixed interconnections for a specific algorithm. The problem, then, is to achieve very high speed by using a large number of processing elements efficiently while at the same time retaining very high algorithmic flexibility. Efficiency for parallel processing is the speed gain over a single processor of the same type, divided by the number of processors. The complexity of the processing elements also affects the degree of parallelism that can be achieved: sophisticated computations tend to contain parts that cannot be parallelized at a coarse level, and overall speed is dominated by those parts.
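As a rough illustration of the efficiency measure defined above, the following Python sketch computes efficiency as speed gain divided by processor count. The function names are hypothetical, and the second function uses Amdahl's law as an assumed model of how the non-parallelizable parts limit overall speed; the specification itself gives no formula for that effect.

```python
def parallel_efficiency(single_time: float, parallel_time: float, n_processors: int) -> float:
    """Efficiency = (speed gain over one processor of the same type) / (number of processors)."""
    if n_processors < 1 or single_time <= 0 or parallel_time <= 0:
        raise ValueError("times must be positive and n_processors at least 1")
    speed_gain = single_time / parallel_time
    return speed_gain / n_processors


def amdahl_speedup(serial_fraction: float, n_processors: int) -> float:
    """Assumed model (Amdahl's law): speed gain when a fixed fraction of the work stays serial."""
    if not 0.0 <= serial_fraction <= 1.0 or n_processors < 1:
        raise ValueError("serial_fraction must be in [0, 1] and n_processors at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)


if __name__ == "__main__":
    # A job that takes 100 time units on one processor and 25 on eight processors.
    print(parallel_efficiency(100.0, 25.0, 8))  # 0.5
    # With 10% of the work not parallelizable, 512 processors give a gain below 10.
    print(amdahl_speedup(0.1, 512))
```

Under this assumed model, adding processors beyond a certain point mostly lowers efficiency, which is the sense in which the non-parallelizable parts dominate overall speed.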

Furthermore, a large number of fast elementary processors places a considerable communication burden on the interconnections between processors. A parallel processor interconnection that can be reconfigured easily is needed.

At present, most experimental systems show that it is difficult to obtain parallelism for a range of algorithms even with a modest number of processors (see McAulay, "Parallel Arrays or Vector Machines, Which Direction in VLSI?", IEEE International Workshop on Computer Systems Organization, IEEE Computer Society, New Orleans, USA, March 1983, IEEE Publication 83CH1879-6). In current prototypes and proposed systems, the number of parallel processors that can be used efficiently, and hence the speed, is limited by communication delays and interconnection complexity. The constraints that the interconnection imposes on algorithm design are a serious problem, because they reduce the opportunity to gain performance through new algorithm designs and raise costs by limiting the range of applications and the useful life of the equipment.

Fixed interconnections limit the range of algorithms that can be executed efficiently. For example, the limitations of a bus structure in parallel computation using new machines have been examined (McAulay, "Finite Element Computation on Nearest Neighbor Connected Machines", NASA Symposium on Advances and Trends in Structures and Dynamics, NASA Langley Research Center, October 22, 1984).

Systolic configurations, such as those being developed at Carnegie-Mellon University (Kung H.T., "Why Systolic Architectures?", IEEE Computer, January 1982, pp. 37-46), reduce communication time and allow a large number of processors to be used efficiently in parallel. However, because the interconnections are fixed, the constraints on algorithms are severe.

Algorithmic flexibility can be obtained with a complex reconfigurable interconnection network (Siegel H.J., "Interconnection Networks for Large Scale Parallel Processing, Theory and Case Studies", Lexington Books, 1984), and a prototype system with eight processors using a banyan switch is operating at the University of Texas at Austin, USA (Brown J.C., "Parallel Architecture for Computer Systems", Physics Today, Vol. 37, No. 5, May 1984). A banyan is a multichannel switch built from levels of 2 x 2 switches. However, in most proposed systems this form of reconfigurability introduces large delays and high control overhead, which limit the number of processors and the speed of the system.

Distributing work among several processors does not remove the need for some minimum level of central control, although for fault tolerance this control need not always reside in the same physical part of the system. The idea of a single program that alone determines the complete operation of the computer is replaced by several such programs running concurrently in different processors. The communication channels to the central controller must be adequate to keep it from becoming a bottleneck. A common memory is often used to communicate information from one processor to another. Memory contention, a potential difficulty, arises when two or more processors simultaneously request the same piece of information from the common memory.

Some form of arbitration is then needed, and one processor must either sit idle or request the memory again later. This increases complexity, cost, and inefficiency. A simple example arises in matrix-matrix multiplication, where a single row of the first matrix is required by all processors for simultaneous multiplication with each column of the second matrix. Memory contention for such well-defined operations needs to be addressed in the design of the computer.
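The matrix-matrix multiplication example can be sketched in Python as follows. This is only an illustration under stated assumptions: one hypothetical processor is assigned to each column of the second matrix, `row_requests` counts how many processors ask for the same row at the same step, and `matmul_by_row_broadcast` shows one possible remedy (fetch the row once and hand the same copy to every processor), which is not a method stated in the specification.

```python
from collections import Counter


def row_requests(n_rows_a: int, n_cols_b: int) -> list:
    """For each step i, count simultaneous requests for each shared-memory item.

    Processor j multiplies row i of A by column j of B, so at step i every
    one of the n_cols_b processors asks for the same item ("A_row", i).
    """
    schedule = []
    for i in range(n_rows_a):
        schedule.append(Counter(("A_row", i) for _ in range(n_cols_b)))
    return schedule


def matmul_by_row_broadcast(a: list, b: list) -> list:
    """Multiply A by B, fetching each row of A once and reusing it for every column of B."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    for i in range(n):
        row = a[i]  # single fetch of the contended row
        for j in range(m):  # conceptually, m processors working at the same time
            c[i][j] = sum(row[t] * b[t][j] for t in range(k))
    return c


if __name__ == "__main__":
    print(row_requests(2, 4)[0])  # four simultaneous requests for row 0 of A
    print(matmul_by_row_broadcast([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

The request count grows with the number of columns, which is why an arbitration-free way of delivering one value to many processors matters for this class of operation.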

Considerable skill is required to partition a problem so that the various processors complete their tasks at the right time to provide information for the next stage. Synchronization forces everything to wait for the slowest link, which results in inefficiency. A parallel algorithm, although more efficient on a particular parallel machine, may involve more steps than the commonly used serial algorithm. When efficiency is measured as the speed on the multiprocessor divided by the speed of the fastest algorithm on a single processor, this overhead reduces the efficiency of the algorithm. The stability and accuracy of parallel algorithms must also be compared with those of serial algorithms.

The telecommunications industry uses optical fiber widely and is developing optical switching devices so that signals need not be converted to electronics for switching. Optics has also been suggested for communication with VLSI in order to overcome bandwidth and pin limitations and edge connection constraints (see Goodman J.W., Leonberger F.J., Kung S.Y., and Athale R.A., "Optical Interconnections for VLSI Systems", Proceedings of the IEEE, Vol. 72, No. 7, July 1984, pp. 850-866; and Neff J.A., "Electro-optic Techniques for VLSI Interconnect", AGARD-NATO Avionics Panel Specialists' Meeting on Digital Optical Circuit Technology, September 1984).

Digital optical computers are expected eventually to become dominant, and a design has been proposed for solving finite element problems, an important class of problems (see McAulay, "Deformable Mirror Nearest Neighbor Optical Computer", Optical Engineering (1985), and the inventor's copending U.S. Patent Application No. 777,660). This design uses deformable mirrors or other spatial light modulators (see Pape D.R. and Hornbeck L.J., "Characteristics of the Deformable Mirror Device for Optical Information Processing", Opt. Eng., Vol. 22, No. 6, December 1983, pp. 675-681). Machines that use acousto-optics for matrix algebra operations are currently under study. These computers are significant for numerical computation, but the interconnection systems they use limit algorithmic flexibility. Nor are they aimed at signal processing.

Dataflow has been studied extensively at MIT, at SRI, and in Japan (see Arvind and Iannucci R.A., "Two Fundamental Issues in Multiprocessing: The Dataflow Solution", MIT Report MIT/LCS/TM-241, September 1983; Hiraki K., Shimada T., and Nishida K., "A Hardware Design of the Sigma-1, a Data-flow Computer for Scientific Computations", Proceedings of the IEEE International Conference on Parallel Processing, August 1984; Jaganathan R. and Ashcroft E.A., "Easyflow: A Hybrid Model for Parallel Processing", in the same proceedings; Omandi A. and Klappholtz D., "Data Driven Computations on Process Based MIMD Machines", in the same proceedings; and Rong G.G., "Pipelining of Homogeneous Dataflow Programs", in the same proceedings). Allowing an operation to start as soon as its required inputs are available avoids the single program counter of a von Neumann machine, and is therefore widely regarded as a possible means of exploiting parallelism.

However, many forms of dataflow machine have been proposed, and so far no system has become dominant in practice. Texas Instruments has previously developed software and hardware for dataflow systems (see Oxley D., Sauber B., and Cornish M., "Software Development for Data-Flow Machines", in C.R. Vick and C.V. Ramamoorthy (eds.), Handbook of Software Engineering (1984); and U.S. Patent No. 4,197,589). The problems of interconnection and of matching algorithms to processors are not automatically solved by the dataflow concept.

(Problems to Be Solved by the Invention) The present invention proposes a very fast computer that uses a large number of elementary processing elements to obtain maximum parallelism efficiently, while at the same time retaining very high algorithmic flexibility by means of a fast, large, general-purpose interconnection network in the form of an optical spatial light modulator.

(Means for Solving the Problems) In the present invention, an optical spatial light modulator acts as a crossbar switch, allowing any processing element to be connected directly to any combination of other processing elements. The use of simple processing elements makes it possible to parallelize parts of a computation that cannot be parallelized at a coarse level. In the embodiments, the processing elements are adders, multipliers, comparators, and the like.

(Operation) The present invention solves the problems of parallelism at the elementary processor level and of algorithmic flexibility. Optical interconnection has advantages over electronic interconnection in that it reduces the effects of capacitive loading and increases immunity to mutual interference, and it offers the performance of a hard-wired system.

(Embodiments) First, the features of a general optical crossbar interconnected computer are outlined; embodiments of the present invention are then described in detail.

System overview. In an optical crossbar interconnected computer, algorithm graphs can be mapped directly onto the hardware, which enables efficient, high-speed execution of a wide range of processing algorithms while automatically making maximum use of low-level parallelism.

FIG. 1 schematically shows such a computer 30. The computer has an optical, high-bandwidth, reconfigurable N x M crossbar switch 32, which is connected by optical fibers 34 to each of K elementary processors P1 to PK. Each output of an elementary processor goes to one row of crossbar switch 32, and each column of crossbar switch 32 goes to one input of an elementary processor. Thus N is the total number of output terminals, and M is less than or equal to the total number of input terminals (this allows for independent external inputs, as shown in FIG. 1). For example, a 1024 x 1024 crossbar switch supports 512 elementary processors if each is assumed to have four connections (two inputs and two outputs). The elementary processors are a mixture of, for example, multipliers, adders, multiplier-adders, comparators, buffer registers, programmable elements, and input/output registers for general-purpose signal processing, or logic gates, comparators, pattern matchers, and the like for general-purpose symbolic processing. A provisional configuration is easily assembled in which the mix and number of elements attached to the crossbar switch are optimized on the basis of test algorithms for a particular application area. Computer 30 also includes memory 36, host computer 38, input/output unit 40, display 42, programmable address generator 44, and controller 46. These devices are peripheral to the main crossbar switch and elementary processor architecture and may be replaced in other versions of computer 30.
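The sizing arithmetic in the 1024 x 1024 example can be checked with a few lines of Python. The function name and parameters are hypothetical; the sketch simply assumes that every processor output occupies one crossbar row and every processor input occupies one crossbar column, as described for FIG. 1.

```python
def processors_for_crossbar(n_rows: int, m_cols: int,
                            outputs_per_processor: int,
                            inputs_per_processor: int) -> int:
    """Largest number of identical processors an n_rows x m_cols crossbar can serve.

    Rows carry processor outputs and columns feed processor inputs, so the
    count is limited by whichever side runs out first.
    """
    if outputs_per_processor < 1 or inputs_per_processor < 1:
        raise ValueError("each processor needs at least one input and one output")
    return min(n_rows // outputs_per_processor, m_cols // inputs_per_processor)


if __name__ == "__main__":
    # Two inputs and two outputs per processor on a 1024 x 1024 switch.
    print(processors_for_crossbar(1024, 1024, 2, 2))  # 512
```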

Basically, computer 30 operates as follows. After each parallel processing step, each of the elementary processors P1 to PK simultaneously passes its result through crossbar switch 32 to the inputs of selected other elementary processors, where it becomes the input for the next parallel processing step. This passage of data through the crossbar switch may be parallel (in which case each data bit requires its own elementary processor output line and its own row of crossbar switch 32), serial (in which case parallel-to-serial converters and multiple clock cycles are needed to pass the data), or a combination of the two.

FIG. 2 shows how a group of these computers, each shown as a processing unit, can be connected to form a system in which the user can exploit high-level parallelism: the processing units are connected in parallel, and within each processing unit the elementary processors are connected in parallel.

The user specifies the algorithm to be executed either in mathematical notation or in a high-level language such as Ada.

Software in host computer 38 constructs a flow graph and then prepares tables that specify the interconnection network, the operations to be performed by the elementary processors, and the timing schedule for those operations. These are mapped onto the hardware: the network is mapped onto the settings of crossbar switch 32, and the operations onto the available elementary processors P1 to PK. The mapping is done so that the timing constraints are satisfied while the available resources are used efficiently. A library of configurations for different algorithms can be kept for future use.
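The mapping step can be sketched in Python as below. This is a deliberately simplified illustration, not the host software described in the specification: the data layout, the name `map_graph`, and the first-fit assignment of operations to processors are all assumptions, and the timing schedule mentioned in the text is left out entirely.

```python
def map_graph(nodes: dict, edges: list, processors: dict):
    """Assign flow-graph nodes to processors and derive crossbar settings.

    nodes:      {node_name: operation_type}, e.g. {"m1": "mul"}
    edges:      [(source_node, destination_node, destination_input_port)]
    processors: {processor_id: operation_type} for the available hardware
    Returns (placement, settings), where settings lists
    (source_processor, destination_processor, destination_input_port).
    """
    free = dict(processors)
    placement = {}
    for name, op in nodes.items():
        # First-fit: take the first free processor of the required type.
        proc = next((p for p, kind in free.items() if kind == op), None)
        if proc is None:
            raise ValueError(f"no free {op} processor for node {name}")
        placement[name] = proc
        del free[proc]
    settings = sorted((placement[s], placement[d], port) for s, d, port in edges)
    return placement, settings


if __name__ == "__main__":
    # Flow graph for a*b + c*d: two multiplies feeding one add.
    nodes = {"m1": "mul", "m2": "mul", "a1": "add"}
    edges = [("m1", "a1", 0), ("m2", "a1", 1)]
    hardware = {0: "mul", 1: "mul", 2: "add"}
    print(map_graph(nodes, edges, hardware))
```

A stored `(placement, settings)` pair is one way to picture an entry in the library of configurations mentioned above.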

The user need only supply data to the computer; after a short delay, processed data appears at the output. Each elementary processor performs its operation as soon as it has received the logical or numerical inputs it needs, and automatically passes its output to the crossbar switch to be routed to the next required operation. In signal processing applications a continuous stream of signal data must be processed through the same algorithm, so pipelining through the elementary processors makes maximum use of the available parallelism. Several parallel streams may be used if this makes better use of the resources.
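The data-driven behaviour described here, where each elementary processor fires as soon as all of its inputs have arrived, can be imitated with a small Python simulation. The sketch is sequential and purely illustrative: `run_dataflow`, its argument layout, and the queue standing in for the crossbar are assumptions, not part of the specification.

```python
import operator


def run_dataflow(nodes: dict, wires: dict, external: list) -> dict:
    """Fire each node once all of its input ports hold a value.

    nodes:    {name: (function, number_of_inputs)}
    wires:    {source_name: [(destination_name, destination_port), ...]}
    external: [(destination_name, destination_port, value)] fed in from outside
    Returns the most recent result produced by each node that fired.
    """
    pending = {name: {} for name in nodes}
    results = {}
    queue = list(external)
    while queue:
        dst, port, value = queue.pop(0)
        pending[dst][port] = value
        func, arity = nodes[dst]
        if len(pending[dst]) == arity:  # all inputs present: the node fires
            out = func(*(pending[dst][p] for p in range(arity)))
            results[dst] = out
            pending[dst] = {}  # ready for the next item in the stream
            for nxt, nxt_port in wires.get(dst, []):
                queue.append((nxt, nxt_port, out))  # routed on to the next operation
    return results


if __name__ == "__main__":
    nodes = {"m1": (operator.mul, 2), "m2": (operator.mul, 2), "a1": (operator.add, 2)}
    wires = {"m1": [("a1", 0)], "m2": [("a1", 1)]}
    data = [("m1", 0, 2), ("m1", 1, 3), ("m2", 0, 4), ("m2", 1, 5)]
    print(run_dataflow(nodes, wires, data))  # {'m1': 6, 'm2': 20, 'a1': 26}
```

No program counter appears anywhere in the loop; the order of execution follows only from the arrival of data.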

The system described above has the following advantages.

1. Ease of use. Sophisticated software and the high-bandwidth crossbar switch relieve the programmer of the effort of parallelizing code at a low level; for most algorithms, parallelization is performed automatically.

2. Because the crossbar switch is reconfigurable by software, an entirely new set of algorithms can be executed optimally, which extends the life of field equipment; the same computer can also be switched quickly from one algorithm to another.

3. Fault tolerance. Because the crossbar switch can be reconfigured quickly, failed elements can be bypassed.

4. High performance. Because the algorithm graph can be mapped directly onto the computer, the crossbar switch allows full parallelism to be exploited at the lowest level for a wide range of algorithms.

5. Accuracy. Accuracy is determined by the choice of processing elements.

6. Optical crossbar switch. A silicon deformable mirror device provides rapid electronic reconfigurability and minimizes communication time for algorithms that require interconnection among the various elementary processors.

7. Optical fiber links. Because fiber does not exhibit capacitive loading effects, 3 GHz fiber links provide easy connections to other boards.

8. Programmed data flow. The flow of data is programmed by the crossbar switch, which, unlike a serial von Neumann processor with a program counter, allows parallel streams of data to flow. There is no need to fetch information and instructions from a remote memory.

Description of the signal processing system. FIG. 3 shows the organization of an optical crossbar signal processor 100. A total of 512 elementary processors P1 to P512 are connected to a 768 x 768 optical crossbar switch 102 by serial fiber optic links 104 running at 320 megabits per second or more. The first 256 processors, P1 to P256, each have two inputs, one from switch 102 and the other directly from a digitized sensor or from main memory 106, and two outputs that go directly into optical crossbar switch 102. Processors P1 to P256 operate only as multipliers for the basic algorithms considered in the signal processor of the first embodiment.

第２の２５６個のプロセッサＰ　！ＳｆｆないしＰ、１
！は、スイッチ１０２からの２つの入力端子、並びに、
主メモリ１０６へ及びスイッチ１０２へ直接つながる１
つの出力端子を有す。プロセッサＰ２．。Each of the second 256 processors, P257 through P512, has two inputs from switch 102 and one output that goes directly both to main memory 106 and to switch 102.

ないしｐｓ＋□は、考察されるアルゴリズムのための加
算器としてのみ動作する。基本プロセッサを２つのバン
クに再分割することは、データをこれらの間で前後に通
過させることができるので、考察されるアルゴリズムに
対して有利である。Processors P257 through P512 operate only as adders for the algorithms considered. Dividing the basic processors into two banks is advantageous for these algorithms because data can be passed back and forth between the banks.

多重メモリ通路が、主メモリ１０６とプロセッサＰ、な
いしＰ　２Ｓｋとの間に高帯域中を発生させる。光学ク
ロスバ−スイッチ１０２が、メモリ１０６へ戻るデータ
を、将来の計算のために、適正なバンクのプロセッサ内
へ導く。これは、メモリの管理及びアドレス指定の複雑
性を低減し、且つ速度を保持するためである（後述のＦ
ＦＴの項参照）。システム即ち信号プロセッサ１００は
また、ホストコンピュータ１０８、ホストコンピュータ
１０８とメモリ１０６との間の入出力器１１０、ディス
プレイ１１２、プログラマブルアドレスジェネレータ１
１４、制御器１１６、及び即時入力のための２５６個の
ＮＡＮＤゲート１１８を有す。Multiple memory paths provide high bandwidth between main memory 106 and processors P1 through P256. Optical crossbar switch 102 directs data returning to memory 106 into the appropriate bank of processors for future computation. This reduces the complexity of memory management and addressing and preserves speed (see the FFT section below). The system, that is, signal processor 100, also includes a host computer 108, an input/output device 110 between host computer 108 and memory 106, a display 112, a programmable address generator 114, a controller 116, and 256 NAND gates 118 for immediate input.

データシ中フリングは、フレームバッファリングによる
かまたは光学的セツティングにより、数マイクロ秒内に
スイッチ１０２をリセットすることによって行なわれる
。光学的セツティングは、電子的セツティングとは異な
り、従来から論証されている（「光工学Ｊ　　（Ｏｐｔ
、　Ｅｎｇ、　）　２４（１）、１０７　（１９８５）
のディー・アール・ペーゾ（Ｄ、　Ｒ，Ｐａｐｅ）参照
）。一般に、長いベクトルのみがメモリとプログラムと
の間で次々に移動させられ、かかる高速機に対するアド
レス計算の困難を軽減する。メモリ１０６へ戻る前にス
イッチ１０２内でループするアルゴリズムのインプリメ
ンテ−シランが、性能上の必要条件及びメモリ転送に付
随する時間を低減するのに望ましい（後述のＦＦＴの項
参照）。Data shuffling is accomplished by resetting switch 102 within a few microseconds, either by frame buffering or by optical setting. Optical setting, as distinct from electronic setting, has been demonstrated previously (see D. R. Pape, Opt. Eng. 24(1), 107 (1985)). In general, only long vectors are moved back and forth between memory and the program, which eases the difficulty of address computation for such a high-speed machine. Implementing algorithms so that they loop within switch 102 before returning to memory 106 is desirable because it reduces the performance requirements and the time associated with memory transfers (see the FFT section below).

処ユ反襄五叫反所第４図は基本プロセッサ構造１２０を示すものであり、
信号プロセッサ１００のＰ、ないしｐｓ＋□のために使
用される。（米国のテキサス・インストルーメンツ（Ｔ
ｅｘａｓ　Ｉｎｓｔｒｕｍｅｎｔｓ　）、　ＶＨ５ＩＣ
プレイプロセッサ及びカーネギ−・メロン大学（Ｃａｒ
ｎｅｇｉｅ−Ｍｅｌｌｏｎ　Ｕｎｉｖｅｒｓｉｔｙ）に
おけるシストリックチップの開発（ケイ・ブロムレイ　
（Ｋ。FIG. 4 shows the basic processor structure 120 used for processors P1 through P512 of signal processor 100. (Note that the Texas Instruments VHSIC array processor and the systolic chip developed at Carnegie-Mellon University (K.

Ｂｒｏｍｌｅｙ　）　曙、５ＰＩＥ会報４９５．１３０
　（１９８４）の「リアルタイム信号処理（Ｒｅａｌ　
Ｔｉｎｇｅ　ＳｉｇｎａｌＰｒｏｃｅｓｓｉｎｇ）■」
におけるエイチ・ティ・カング（Ｈ，Ｔ、　Ｋｕｎｇ）
及びオー・メンツィルシオグルー（０，Ｍｅｎｚｉｌｃ
ｉｏｇｌｕ）を参照）は、そのプロセッサ設計にプログ
ラマブルクロスバ−スイッチを使用していることに注意
されたい。）プロセッサ１２０の演算のための２つのオ
ペランドを表わす主光学クロスバ−スイッチ１０２から
のデータが、第４図の左側に入り、ホトダイオード１２
２及び１２２′によって検出され、コンバータ１２４及
び１２４′によって直並列変換される。一つのオペラン
ドが、また、主メモリ、ローカル基本プロセッサメモリ
１２７、またはディジタル化センサから来る場合もある
。右側における計算の出力は、並直列コンバータ１２６
及びレーザドライバ１２８へ与えられて光学クロスバ−
スイッチ１０２へ戻る。第２の出力がシストリック形構
成に対して与えられ、そこでは入力データは基本プロセ
ッサを通過甘しめられる。上記出力はまた主メモリ１０
６を通過する場合もある。算術演算はＡＬＵ／乗算器１
２５内で行なわれ、これには、入出力器とＡＬＵ／乗算
器とを結び合わすプログラマブル８×８相互接続器１２
３が付属している。従って、例えば、相互接続器１２３
は、入力をコンバータ１２４′からコンバータ１２６′
へ変化なしに通過させ、また、入力をコンバータ１２４
及び１２４′からＡＬＵ／乗算器１２５へ通過させて乗
算し、そして最後に、この乗算積を出力コンバータ１２
６へ通過させる。（これは、事実上、後述のシストリッ
クフィルタリングの例における第１のバンクの基本プロ
セッサに対する準備である。）第３図は、上半分のプロ
セッサが下半分のプロセッサとは別様に接続されている
ことを示すものである。プロセッサの正常な演算はロー
カルプログラムの制御の下にあり、計算または移動サイ
クルの開始はマスク同期信号によって決定される。Bromley, ed., "Real Time Signal Processing," SPIE Proceedings 495, 130 (1984); see H. T. Kung and O. Menzilcioglu) use programmable crossbar switches in their processor designs.) Data from the main optical crossbar switch 102, representing the two operands for an operation of processor 120, enters at the left of FIG. 4, is detected by photodiodes 122 and 122', and is converted from serial to parallel by converters 124 and 124'. One operand may instead come from main memory, from local basic-processor memory 127, or from a digitized sensor. The result of the computation, at the right, is supplied to parallel-to-serial converter 126 and laser driver 128 and returns to optical crossbar switch 102. A second output is provided for systolic configurations, in which the input data is passed through the basic processor. The outputs may also pass to main memory 106. Arithmetic is performed in ALU/multiplier 125, which is accompanied by a programmable 8×8 interconnect 123 linking the inputs and outputs to the ALU/multiplier. Thus, for example, interconnect 123 can pass the input from converter 124' unchanged to converter 126', pass the inputs from converters 124 and 124' to ALU/multiplier 125 to be multiplied, and finally pass the product to output converter 126. (This is in fact the setup for the first bank of basic processors in the systolic filtering example described below.) FIG. 3 shows that the processors in the upper half are connected differently from those in the lower half. Normal operation of a processor is under the control of a local program, and the start of a compute or move cycle is determined by a master synchronization signal.

多ギガフロップパフォーマンスは、市販の１００ナノ秒
の乗算器及び加算器のチップの使用を必要とする。Multi-gigaflop performance requires the use of commercially available 100 nanosecond multiplier and adder chips.
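As a rough check of the multi-gigaflop figure, the sketch below multiplies out the numbers given in the text. It assumes, purely for illustration, that each of the 512 basic processors completes one arithmetic operation per 100 ns cycle; the function name is hypothetical.

```python
def peak_ops_per_second(num_processors: int, cycle_time_s: float) -> float:
    """Peak rate if every processor finishes one operation per cycle (assumption)."""
    return num_processors / cycle_time_s


# 512 basic processors, 100 ns multiplier and adder chips (figures from the text).
peak = peak_ops_per_second(512, 100e-9)
print(f"{peak / 1e9:.2f} billion operations per second")
```

Under this assumption the peak rate is about 5 billion operations per second, which is consistent with the multi-gigaflop claim.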

第５Ａ図は４×４クロスバ−スイッチ１０２に対する線
図を示すものである。各交差点は、水平入力線を垂直出
力線と結合させる指向性スイッチを有す。黒−丸は閉じ
たスイッチを示す。一つの出力線は一つの入力線から情
報を受取るが、一つの入力線はいくつかの出力線へ一斉
送信することができる。第５Ｂ図は、空間光変調器１３
０を具備するクロスバ−スイッチ１０２を線図的に示す
ものであり、ドツトは、第５Ａ図のセツティングと符合
する透明領域を示す。光学レンス系（図示せず）を用い
、入力源（ＬＥＤＩないし４）からの光を、この光を垂
直方向に広げることなしに、水平方向に広げる。空間光
変調器１３０を通過する光は、水平方向に広げることな
しに垂直方向に合焦させるレンズ系（図示せず）により
、受光用ダイオード（検出器１ないし４）上へ落とされ
る。FIG. 5A shows a diagram of a 4×4 crossbar switch 102. Each crosspoint has a directional switch that couples a horizontal input line to a vertical output line. Black circles indicate closed switches. An output line receives information from only one input line, but an input line can broadcast to several output lines. FIG. 5B shows schematically a crossbar switch 102 built around a spatial light modulator 130; the dots indicate transparent regions that correspond to the settings of FIG. 5A. An optical lens system (not shown) spreads the light from each input source (LEDs 1 through 4) horizontally without spreading it vertically. Light passing through spatial light modulator 130 is brought onto the receiving diodes (detectors 1 through 4) by a lens system (not shown) that focuses it vertically without spreading it horizontally.
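The routing rule of FIG. 5A (each output listens to at most one input, while one input may broadcast to several outputs) can be sketched with a boolean matrix. The function name `route` and the particular settings below are illustrative assumptions, not the settings drawn in the figure.

```python
def route(settings, inputs):
    """Route inputs through a crossbar.

    settings[i][j] is True when the switch joining input line i to
    output line j is closed (a transparent cell of the modulator).
    """
    outputs = []
    for j in range(len(settings[0])):
        sources = [i for i in range(len(inputs)) if settings[i][j]]
        if len(sources) > 1:
            raise ValueError(f"output {j} is driven by more than one input")
        outputs.append(inputs[sources[0]] if sources else None)
    return outputs


# Hypothetical 4x4 settings: input 0 broadcasts to outputs 1 and 2.
settings = [
    [False, True, True, False],
    [True, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]
result = route(settings, ["a", "b", "c", "d"])
print(result)
```

Rejecting a column with two closed switches mirrors the statement that an output line receives information from only one input line.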

第６図は、空間光変調器１３０として変形可能ミラー装
置（ＤＭＤ）を具備するスイッチ１０２を示すものであ
る。ＤＭＤは、透明変調器としては働かずに、可変強度
反射器として働き、従って、この変調器の右側は折り返
される。ビームスプリッタ１３２を用いて、戻り光を入
射光から分離する。シュリーレン光学装置１３４を用い
て、ＤＭＤ１３０のミラー偏向画素相互間の領域からの
反射を阻止する。３　ＧＨｚまでの変調能力を有する５
１２個のレーザダイオード１３６　（ｒ今日の物理」（
Ｐｈｙｓｉｃｓ　Ｔｏｄａｙ　）　　３８　（５）　、
３２　　（１９８５）のＹ。FIG. 6 shows a switch 102 that uses a deformable mirror device (DMD) as spatial light modulator 130. The DMD acts as a variable-intensity reflector rather than as a transmissive modulator, so the right-hand side of the modulator is folded back. A beam splitter 132 separates the returning light from the incident light. Schlieren optics 134 block reflections from the regions between the mirror-deflection pixels of DMD 130. 512 laser diodes 136 with modulation capability up to 3 GHz (Physics Today 38(5), 32 (1985), Y.

５ｕｅｓａｔｓｕ参照）光源として働き、５１２個のＰ
ＩＮダイオード１３８が受光体として働＜　　（ＩＥＥ
Ｅ会議録７２（７）　、８５０　（１９８４）のリュー
・ダブリュー・グツドマン（Ｊ、　Ｗ、　Ｇｏｏｄａ＋
ａｎ　）　ｓエフ・ジェイ・レオンバーガー（Ｆ、　Ｊ
、　Ｌｅｏｎｂｅｒｇｅｒ）、ニス・ワイ・カング（Ｓ
、　Ｙ、　Ｋｕｎｇ）　、及びアール・エイ・アサール
（Ｒ，Ａ、　Ａｔｈａｌｅ）　、並びに、ＡＧＡＲＤ会
議録３６２．１７（１９８５）、ビー・エル・ドープ（
Ｂ、　Ｌ、　Ｄｏｖｅ）編、「ディジタル光学回路技術
Ｊ　　（Ｄｉｇｉｔａｌ　０ｐｔｉｃａｌ　Ｃ１ｒｃｕ
ｉｔ　Ｔｅｃｈｎｏｌｏｇｙ）におけるリュー・エイ・
ネフ（Ｊ、＾、Ｎｅｆｆ）を参照）。第５Ｂ図に示して
ない光学装置を第６図に示しである。即ち、円柱状光学
装置１４０が入力源１３６からの光を水平に広がらせ、
円柱状光学装置１４２が光を受光用ダイオード１３８上
へ垂直に落とす。光学１３６及び受光体１３８は、電子
チップ上に直接集積することができる。Suematsu) act as light sources, and 512 PIN diodes 138 act as photoreceptors (see J. W. Goodman, F. J. Leonberger, S. Y. Kung, and R. A. Athale, Proc. IEEE 72(7), 850 (1984), and J. A. Neff in B. L. Dove, ed., "Digital Optical Circuit Technology," AGARD Conference Proceedings 362, 17 (1985)). The optics not shown in FIG. 5B are shown in FIG. 6: cylindrical optics 140 spread the light from input sources 136 horizontally, and cylindrical optics 142 focus the light vertically onto receiving diodes 138. Light sources 136 and photoreceptors 138 can be integrated directly onto electronic chips.

膜及び片持ばり形の変形可能ミラー装置（ＤＭＤ）（Ｉ
ＥＥＥｉｉ事録ＥＤ−２２（９）、（１９７５）のアー
ル・エヌ・トーマス（Ｒ，Ｎ、　Ｔｈｏｍａｓ）　）が
開発されている。テキサス・インストルーメンツの膜Ｄ
ＭＤで行なった結像及び分光分析実施に対する結果が発
表されている（「光工学Ｊ　２２　（６）、６７５　（
１９８３）のディー・アール・ペーゾ及びエル・ジェイ
・ホーンベック（Ｌ、　Ｊ、　Ｈｏｒｎｂｅｃｋ）参照
）。膜形光変調器は変形可能ミラー素子のＸ−Ｙアレイ
から成っており、このミラー素子は、その下に横たわっ
ているＭＯＳ）ランジスタのアレイによってアドレス指
定が可能である。即ち、−第７Ａ図は相隣る４つのミラ
ー素子の斜視図であリ、第７Ｂ図は上記アレイの略図で
ある。反射性導電膜１７０が上記アレイの面をおおい、
ミラーとなっている。ＤＭＤの線アドレス指定式組織を
第７Ｂ図に示しである。即ち、データは直並列コンバー
タ１７１へ与えられ、このコンバータはＭｏＳトランジ
スタのドレイン線１７２に接続されている。ドレイン線
１７２が充電され（ｋ番目の線１７２が電位φ６１．に
充電される）、ゲート１７６に接続されているデコーダ
１７４がｍ番目のゲートを選択してターンオンさせる。Membrane and cantilever-beam deformable mirror devices (DMDs) have been developed (R. N. Thomas, IEEE Trans. ED-22(9) (1975)). Results of imaging and spectral-analysis experiments performed with a Texas Instruments membrane DMD have been published (see D. R. Pape and L. J. Hornbeck, Opt. Eng. 22(6), 675 (1983)). The membrane light modulator consists of an X-Y array of deformable mirror elements that can be addressed by an underlying array of MOS transistors. FIG. 7A is a perspective view of four adjacent mirror elements, and FIG. 7B is a schematic of the array. A reflective conductive membrane 170 covers the surface of the array and forms the mirror. The line-addressed organization of the DMD is shown in FIG. 7B: data is fed to a serial-to-parallel converter 171 that is connected to the drain lines 172 of the MOS transistors. Drain lines 172 are charged (the kth line 172 to a potential $\phi_{k,m}$), and decoder 174, connected to gates 176, selects the mth gate and turns it on.

次いで、ｍ番目のゲート線１７？内のＭｏ３）ランジス
タの浮動ソース１７８が対応のドレイン１７２の電位に
充電される（このｍ番目のゲート線はφ３、。The floating sources 178 of the MOS transistors in the mth gate line 177 are then charged to the potentials of the corresponding drain lines 172 (the source on the kth drain line is charged to $\phi_{k,m}$).

に充電される）。次いで、上記ゲートはターンオフされ
、ミラー１７０はｖＭの固定電位に保持される。従って
、■９−φつ１．に比例する静電力が（ｋ、　ｍ）番目
のミラー素子に働いてこれを浮動ソース１７８へ向かっ
て下方へ偏向させる。ミラー素子の機械的応答時間、従
って線整定時間は数μｓである。ｍ番目のゲート線１７
７内の浮動ソース１７８が設定されると、次いで、次の
線のデータがドレイン線１７２内へ与えられ、そして次
のゲート線１７７がデコーダ１７４によって選択される
。片持ばり装置は、各浮動ソースの上に小さなフラップ
が一つの角でヒンジ止めされて導電性ミラーを形成して
いるという点を除き、同様である。上記膜形光変調器に
おけると同じように、上記トランジスタがターンオンす
ると上記浮動ソースが充電され、そして対応のフラップ
が、そのヒンジ部で、上記充電済み浮動ソースへ向かっ
て下方へ曲げられる。上記膜またはフラップの偏向は印
加電圧の非線形関数であり、第７Ｃ図に示す形状に近似
している。臨界「挫折電圧」よりも上では上記膜または
フラップは充電済みコンデンサ板の方への挫折に対して
不安定である。上記膜形装置及び片持ばり形装置のため
のミラー素子の大きさは３０ミクロン平方程度である。The gate is then turned off, and mirror 170 is held at a fixed potential $V_M$. An electrostatic force proportional to $V_M - \phi_{k,m}$ therefore acts on the (k, m)th mirror element and deflects it downward toward floating source 178. The mechanical response time of a mirror element, and hence the line settling time, is a few microseconds. Once the floating sources 178 in the mth gate line 177 have been set, the next line of data is fed into drain lines 172 and the next gate line 177 is selected by decoder 174. The cantilever-beam device is similar, except that a small flap hinged at one corner over each floating source forms the conductive mirror. As in the membrane light modulator, the floating source is charged when the transistor is turned on, and the corresponding flap bends downward at its hinge toward the charged floating source. The deflection of the membrane or flap is a nonlinear function of the applied voltage and approximates the shape shown in FIG. 7C. Above a critical "collapse voltage" the membrane or flap is unstable against collapse onto the charged capacitor plate. The mirror elements of the membrane and cantilever-beam devices are on the order of 30 microns square.

プロブーミング　び　箭・“ 使用者は、対象とする関数を計算するための近似式を引
き出し、この式を実行するためのアルゴリズムを選定す
る。有向グラフ（７３１ＥＥＥ会報８５２　（１９８５
）のアレン・ジェイ（ＡｌｉｅｎＪ、）の「ディジタル
信号処理のためのコンビュータアーキテクチ＋Ｊ　　（
Ｃｏｍｐｕｔｅｒ　Ａｒｃｈｉｔｅｃｔｕｒｅ　ｆｏｒ
Ｄｉｇｉｔａｌ　Ｐｒｏｃｅｓｓｉｎｇ）参照）を、こ
のアルゴリズムのために、最大限の並列処理を提示する
如き方法で構成する。演算をノードとして表わし、接続
をエツジまたは弧として表わす。上記有向グラフを光学
クロスバ−スイッチシステム上にマツピングする。即ち
、上記エツジをクロスバ−セツティング内に、そして上
記ノードを、使用可能な基本プロセッサ内に、使用可能
なリソースを効率的に利用しながらタイミング上の制約
を満足させる如き仕方で、マツピングする。たたみごみ
を実施するためのシストリックフィルタリング、ＦＦＴ
。Programming. The user derives an approximate formula for computing the function of interest and selects an algorithm for evaluating it. A directed graph (see J. Allen, "Computer Architecture for Digital Signal Processing," Proc. IEEE 73, 852 (1985)) is constructed for this algorithm in a way that exposes the maximum parallelism. Operations are represented as nodes and connections as edges or arcs. The directed graph is then mapped onto the optical crossbar switch system: the edges are mapped to crossbar settings and the nodes to available basic processors, in a way that satisfies the timing constraints while using the available resources efficiently. Systolic filtering for performing convolution, the FFT,

及び倍加アルゴリズムを次いでプロセッサ１００にマツ
ピングする。並列共役勾配アルゴリズムグラフをどこか
に提示する（科学計算のための並列処理に関する第２回
ＳＩＡＭ会議（１９８５年１１月）のエイ・ディー・マ
コーレイ　（Ａ、Ｄ、ＭｃＡｕｌａｙ）の［光学クロス
バ−相互接続式多重プロセッサに対する共役勾配Ｊ　　
（Ｃｏｎｊｕｇａｔｅ　Ｇｒａｄｉｅｎｔｓ　ｏｎＯｐ
ｔｉｃａｌ　Ｃｒｏｓｓｂａｒ　Ｉｎｔｅｒｃｏｎｎｅ
ｃｔｅｄ　Ｍｕｌｔｉｐｒｏｃｅｓｓｏｒ）参照）。主
な目的は、アルゴリズムのための最大限の並行処理をも
って有向グラフを自動的に構成しくダブりニー・ビー・
アッカマン（Ｗ、　Ｂ。and the doubling algorithm are mapped onto processor 100 below. A graph for a parallel conjugate gradient algorithm is presented elsewhere (see A. D. McAulay, "Conjugate Gradients on Optical Crossbar Interconnected Multiprocessor," Second SIAM Conference on Parallel Processing for Scientific Computing (November 1985)). The main objective is to develop software that automatically constructs a directed graph with maximum concurrency for an algorithm (W. B.

Ａｃｋｅｒｍａｎ）の「コンピュータＪ１５（２）、１
５（１９８２））、次いでこれをプロセッサ上に効率的
にマツピングするためのソフトウェアを開発することで
ある。これは、相互接続ネットトワークを決定するテー
ブルを準備すること、基本プロセッサによって行なうこ
とが必要である演算、及びこれら演算のためのタイミン
グスケジュールを含む。Ackerman, Computer 15(2), 15 (1982)) and then maps it efficiently onto the processors. This includes preparing tables that specify the interconnection network, the operations to be performed by the basic processors, and the timing schedule for those operations.
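The mapping step described above (nodes to basic processors, edges to crossbar settings) might be sketched as follows. This is only an illustration under simplifying assumptions: processors are assigned greedily in the order given, timing constraints are ignored, and the names `map_graph`, `"mul"`, and `"add"` are hypothetical rather than taken from the patent.

```python
def map_graph(nodes, edges, multipliers, adders):
    """Assign graph nodes to processors and derive crossbar settings.

    nodes: dict mapping node name -> "mul" or "add"
    edges: list of (source node, destination node) pairs
    multipliers, adders: processor numbers available in each bank
    Returns (placement, settings); settings lists the
    (source processor, destination processor) switches to close.
    """
    free = {"mul": list(multipliers), "add": list(adders)}
    placement = {}
    for name, kind in nodes.items():
        if not free[kind]:
            raise ValueError(f"no free {kind} processor for node {name}")
        placement[name] = free[kind].pop(0)
    settings = sorted((placement[src], placement[dst]) for src, dst in edges)
    return placement, settings


# Two products feeding one sum, mapped onto multipliers 1-2 and adder 9.
nodes = {"m1": "mul", "m2": "mul", "s": "add"}
edges = [("m1", "s"), ("m2", "s")]
placement, settings = map_graph(nodes, edges, [1, 2], [9])
print(placement, settings)
```

A table of this kind, together with a timing schedule, is what the text says the mapping software has to prepare.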

使用者はデータのストリームをプロセッサに供給し、結
果は、パイプラインが充填されると、各サイクルごとに
出力端子に現われる。各基本プロセッサＰ、ないしｐｓ
ｔ□は、必要な全ての論理または数値入力を受取った後
、次の同期パルスにおいてその演算を行ない、そして出
力をクロスバースイフチ１０２？自動的に通過させて必
要な次の演算へルート指定する。The user supplies a stream of data to the processor, and once the pipeline is full a result appears at the output on every cycle. Each basic processor P1 through P512, after receiving all the logical or numerical inputs it needs, performs its operation on the next synchronization pulse, and its output is automatically routed through crossbar switch 102 to the next required operation.

言０几　アルゴリズムシストリックフィルタリング、たたみこみ、相関、及び
フーリエ変換は基本信号処理アルゴリズムである。線形
フィルタは次式の通りである。Signal Processing Algorithms. Systolic filtering, convolution, correlation, and the Fourier transform are basic signal processing algorithms. A linear filter is given by the following equation.

ここに、ａ、、（ｋ＝１　　・・・ｋ）はフィルタ係数
であり、ｂｎ（ｎ　＝　１　　・・・Ｎ）はデータであ
り、＊印はたたみこみ演算子である。Here $a_k$ ($k = 1, \ldots, K$) are the filter coefficients, $b_n$ ($n = 1, \ldots, N$) are the data, and * denotes the convolution operator.

ベクトルａ及びｂの相互相関は次式の如くに書ける。The cross-correlation between vectors a and b can be written as follows.

式（１）と（２）とを比べると、たたみこみ及び相関は
、１つの入力のオーダを逆転することにより、同じプロ
セッサで計算することができるということが解る。光学
クロスバ−信号プロセッサ上でかかる演算を行なうシス
トリック、ＦＦＴ、及び倍加の方法を次に説明する。但
し、明瞭化のために１６個の基本プロセッサだけを図示
する。Comparing equations (1) and (2) shows that convolution and correlation can be computed on the same processor by reversing the order of one of the inputs. The systolic, FFT, and doubling methods for performing these operations on the optical crossbar signal processor are described next. For clarity, only 16 basic processors are illustrated.
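The remark that correlation is convolution with one input reversed can be checked with a small sketch. Equations (1) and (2) did not survive extraction, so the indexing below (full-length outputs, zero-based lists) is an assumption chosen for illustration.

```python
def convolve(a, b):
    """Full linear convolution: out[i + j] accumulates a[i] * b[j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def correlate(a, b):
    """Full cross-correlation: out[lag + len(b) - 1] = sum_j a[j + lag] * b[j]."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i - j + len(b) - 1] += ai * bj
    return out


a, b = [1, 2, 3], [4, 5, 6]
# Reversing b turns the convolution routine into a correlation routine.
same = correlate(a, b) == convolve(a, b[::-1])
print(same)
```

The same hardware graph can therefore serve both operations, with only the order in which one vector is loaded changing.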

第８Ａ図は、フィルタ式（１）をシストリックモードで
実行するための有向グラフを示すものである。フィルタ
係数ａｋは乗算器プロセッサＰ、ないしＰ、内に記憶さ
れる。データは、データ値相互間にゼロを挿入した状態
で、プロセッサＰ、に直列に入れられる。上記ゼロは２
サイクル演算を許すものであり、一つのサイクルにおい
て計算が行すわれ、もう一つのサイクルにおいてデータ
が次のプロセッサへ移動させられる。全てのプロセッサ
は対応の計算を行ない、パイプラインが充満すると演算
を同時に移動させる。従って、各移動サイクルにおいて
、出力ｃ７の新たな値が右上のプロセッサＰ、において
出て来る。例えば、項ｃ４．＝ａ、ｂ、＋ａｚｂ２　＋
ａ３ｂｔは次のようにして生ずる。即ち、第１の移動サ
イクルがｂｌをプロセッサＰ、に入れ、他の全ての基本
プロセッサは０で充満している。第１の計算サイクルは
Ｃ４に対して無関係である（即ち、プロセッサＰ、にお
いて計算される積ａ、ｂ、は項ｃ、に対するものである
）。第２の移動サイクルが、ｂｌをプロセッサＰ２へ転
送し、そしてＯをプロセッサＰ１内に入れる。他の全て
の転送（即ち、プロセッサＰ、からＰ、への転送）は無
関係である。第２の計算サイクルはこれもまた無関係で
ある。第３の移動サイクルが、ｂ、をプロセッサｐＨ内
へ転送し、０をプロセッサｐＨ内へ転送し、そしてｂｔ
をプロセッサｐＨ内に入れる。第３の計算サイクルが積
ａ、ｂ、を作る。第４の移動サイクルが、ａ、ｂ、をプ
ロセッサＰ３からＰ、へ、ｂｔをプロセッサＰｔからＰ
２へ転送し、そして０をプロセッサＰ、に入れる。第４
の計算サイクルが、積ａ、ｂ、をプロセッサｐＨ内に、
そしてＯ及びａ３ｂ、の単純和をプロセッサｐＨ内に作
る。第５の移動サイクルが、ａ、ｂ、をプロセッサＰ、
からＰＩ（１へ、ａ２ｂ、をプロセッサＰ２からＰ、。FIG. 8A shows a directed graph for executing filter equation (1) in systolic mode. The filter coefficients $a_k$ are stored in multiplier processors P1 through P8. The data is fed serially into processor P1 with a zero inserted between successive data values. The zeros permit two-cycle operation: computation takes place in one cycle, and data is moved to the next processor in the other. Once the pipeline is full, all processors compute simultaneously and then move simultaneously, so a new value of the output $c_n$ emerges from the upper-right processor P9 on each move cycle. For example, the term $c_4 = a_1 b_3 + a_2 b_2 + a_3 b_1$ arises as follows. The first move cycle places $b_1$ in processor P1; all other basic processors hold zeros. The first compute cycle is irrelevant to $c_4$ (the product $a_1 b_1$ computed in processor P1 belongs to term $c_2$). The second move cycle transfers $b_1$ to processor P2 and places 0 in processor P1. All other transfers (that is, the transfer from processor P1 to P9) are irrelevant. The second compute cycle is likewise irrelevant. The third move cycle transfers $b_1$ into processor P3 and 0 into processor P2, and places $b_2$ in processor P1. The third compute cycle forms the product $a_3 b_1$. The fourth move cycle transfers $a_3 b_1$ from processor P3 to P11 and $b_2$ from processor P1 to P2, and places 0 in processor P1. The fourth compute cycle forms the product $a_2 b_2$ in processor P2 and the trivial sum of 0 and $a_3 b_1$ in processor P11. The fifth move cycle transfers $a_3 b_1$ from processor P11 to P10, $a_2 b_2$ from processor P2 to P10,

へ転送し、そしてす、をプロセッサｐＨ内に入れる。第
５の計算サイクルが、和ａ＝ｂ、　＋ａ２ｂ２をプロセ
ッサＰ、。内に、そして積ａｌｂ３をプロセッサＰ１内
に作る。第６の移動サイクルが、ａａｂ＋＋ａｚｂ２を
プロセッサｐｔｏからＰ、へ、そしてａ、ｂ３をプロセ
ッサＰ１からＰ、へ転送する。第６の計算サイクルが和
ａｘｂｔ　＋　ａｚｂ２　＋　ａ＋ｂ３を作る。この和
はプロセッサｐＨ内のＣ４である。そして第７の移動サ
イクルが０４を出力する。and places $b_3$ in processor P1. The fifth compute cycle forms the sum $a_3 b_1 + a_2 b_2$ in processor P10 and the product $a_1 b_3$ in processor P1. The sixth move cycle transfers $a_3 b_1 + a_2 b_2$ from processor P10 to P9 and $a_1 b_3$ from processor P1 to P9. The sixth compute cycle forms the sum $a_3 b_1 + a_2 b_2 + a_1 b_3$, which is $c_4$, in processor P9. The seventh move cycle then outputs $c_4$.
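The move/compute walk-through above can be imitated in software. The sketch below is an assumption-laden model rather than the patented circuit: it takes equation (1) to have the form $c_n = \sum_k a_k b_{n-k}$ (the form implied by the $c_4$ example), uses one multiplier and one adder per coefficient, and names such as `systolic_filter` are hypothetical.

```python
def systolic_filter(a, b):
    """Simulate the two-phase (move, compute) systolic filter.

    Returns [c_2, c_3, ...] with c_n = sum_k a_k * b_(n-k), using
    one-based subscripts as in the text.
    """
    K = len(a)
    stream = []
    for value in b:
        stream += [value, 0]          # a zero between successive data values
    stream += [0] * (2 * K)           # flush the pipeline
    data = [0] * K                    # operand held by each multiplier
    prod = [0] * K                    # product formed in the last compute cycle
    sums = [0] * (K + 1)              # adder chain; sums[K] is a constant 0
    out = []
    for x in stream:
        # Move cycle: products and partial sums travel to the adders,
        # data values shift one multiplier along.
        arriving = [(prod[k], sums[k + 1]) for k in range(K)]
        data = [x] + data[:-1]
        # Compute cycle: every adder and multiplier works at once.
        sums = [p + s for p, s in arriving] + [0]
        prod = [a[k] * data[k] for k in range(K)]
        out.append(sums[0])           # value held by the output adder
    # Useful outputs appear on every second cycle, starting with c_2.
    return out[1::2][: len(a) + len(b) - 1]


print(systolic_filter([1, 2, 3], [4, 5, 6]))
```

In this model $c_4$ is formed in the output adder on the sixth compute cycle, matching the walk-through.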

第８Ｂ図は、第３図のコンピュータ上での第８Ａ図の有
向フローグラフのインプリメンテーションを示すもので
ある。入力はプロセッサＰ、へ与えられる。プロセッサ
ＰＩの上部の出力は、クロスバ−スイッチ１０２の左上
四分の一区域内の頂部の閉じたスイッチ（記号Ａで示す
）を介して、プロセッサＰ２の入力端子へ送り返される
。プロセッサＰ、からの第２の出力は入力データ値ｂ７
であり、これは、クロスバ−スイッチ１０２の右上四分
の一区域上のスイッチセツティング（記号Ｂで示す）を
介して、加算器Ｐ、の最上部の入力端子へ通過させられ
る。加算器Ｐ、の他の入力は、クロスバ−スイッチ１０
２の下布四分の一区域内のセ／ニイング（記号Ｃで示す
）を介して、加算器ＰＩＯから来る。出力はプロセッサ
Ｐ、から来る。シストリックアレイについては、パイプ
ラインが充満すると、各計算・移動クロックサイクルご
とに、出力が得られる。FIG. 8B shows an implementation of the directed flow graph of FIG. 8A on the computer of FIG. 3. The input is applied to processor P1. The upper output of processor P1 is routed back to the input of processor P2 through the top closed switch (labeled A) in the upper-left quadrant of crossbar switch 102. The second output of processor P1, formed from the input data value $b_n$, is passed through a switch setting (labeled B) in the upper-right quadrant of crossbar switch 102 to the top input of adder P9. The other input of adder P9 comes from adder P10 through a setting (labeled C) in the lower-right quadrant of crossbar switch 102. The output is taken from processor P9. As with any systolic array, once the pipeline is full an output is obtained on every compute-and-move clock cycle.

高゛束フーリエ　換（ＦＦＴ）ＦＦＴは、２つの等長ベクトルを相関させるために、例
えば、認識のためのテンプレートに対して入力データを
整合させる際に、屡々有利に用いられる。入力データの
フーリエ変換を計算し、そしてその値に、記憶されてい
るテンプレート変換係数を乗する。全体の計算はたたみ
こみよりも時間的に少ない。即ち、２つのＮ長ベクトル
のたたみこみはほぼＯ（Ｎ”）の乗算・加算計算を行な
い、一方、２ずつのパディングを用いるＦＦＴによるた
たみこみはほぼＯ（２Ｎ＋４Ｎ　ｌｏｇｚ（２Ｎ））の
演算を行なうに過ぎないからである。ＦＦＴはまた、ス
ペクトル情報を求める場合、例えばノイズデータ中のプ
ロペラ運動から船舶を認識する場合に、重要である。Fast Fourier Transform (FFT). The FFT is often used to advantage for correlating two vectors of equal length, for example when matching input data against a template for recognition. The Fourier transform of the input data is computed and multiplied by the stored transform coefficients of the template. The whole computation takes less time than direct convolution: convolving two vectors of length N takes on the order of $N^2$ multiply-add operations, whereas convolution by FFT with zero-padding to length 2N takes only on the order of $2N + 4N \log_2(2N)$ operations. The FFT is also important when spectral information is wanted, for example for recognizing a ship from its propeller motion in noisy data.
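To make the two operation counts concrete, the following sketch evaluates both expressions for a few vector lengths. The function names are illustrative, and the counts are the order-of-magnitude expressions quoted in the text, not measured timings.

```python
import math


def direct_ops(n: int) -> int:
    """Order-of-magnitude multiply-add count for direct convolution."""
    return n * n


def fft_ops(n: int) -> float:
    """Order-of-magnitude count for FFT convolution padded to length 2n."""
    return 2 * n + 4 * n * math.log2(2 * n)


for n in (64, 256, 1024):
    print(n, direct_ops(n), fft_ops(n))
```

For N = 1024 the direct count is 1,048,576 while the FFT expression gives 47,104, roughly a factor of 22 fewer operations.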

有限シーケンスＸのフーリエ変換は、次式で与えられる
もう一つの有限シーケンスＸである。The Fourier transform of a finite sequence x is another finite sequence X, given by the following equation.

第９Ａ図は時間的８点デシメーションＦＦＴのためのグ
ラフを示すものであり、第９Ｂ図はこの構成の開始のた
めのビット反転を示すものである。FIG. 9A shows the graph for an 8-point decimation-in-time FFT, and FIG. 9B shows the bit reversal applied at the start of this arrangement.
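The bit reversal of FIG. 9B reorders the inputs by reversing the binary digits of each index. A minimal sketch follows; the helper name `bit_reverse` is hypothetical.

```python
def bit_reverse(index: int, num_bits: int) -> int:
    """Reverse the lowest num_bits binary digits of index."""
    result = 0
    for _ in range(num_bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Input order for an 8-point decimation-in-time FFT (3 index bits).
order = [bit_reverse(i, 3) for i in range(8)]
print(order)
```

For eight points, index 1 (binary 001) swaps with index 4 (binary 100) and index 3 (011) swaps with index 6 (110), while the palindromic indices stay in place.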

第９Ｃ図は、長さＮのＦＦＴに対して、右手にある出力
を、ｌｏｇ、Ｎ回、入力端子へ送り返すことにより、各
繰返しにおいて用いられる固定構成ステージを示すもの
である（例えば、図示の８点ＦＦＴは、第９Ａ図に示す
ように、ｌｏｇｚ８　＝　３回の繰返しを取る）。式（
３）内の適切な指数項に対応する重みＷは、これも第９
Ａ図に示すように、各繰返しにおいて変位させることが
必要である。これら重みを、基本プロセッサローカルメ
モリ１２７または主メモリ１０６に記憶させる。FIG. 9C shows a fixed-configuration stage that is used on every iteration: for an FFT of length N, the outputs on the right are fed back to the inputs $\log_2 N$ times (for example, the 8-point FFT shown takes $\log_2 8 = 3$ iterations, as in FIG. 9A). The weights W, which correspond to the appropriate exponential terms in equation (3), must be changed on each iteration, as FIG. 9A also shows. These weights are stored in basic-processor local memory 127 or in main memory 106.

第１０図は２４Ｘ２４クロスバ−スイッチ１０２上のＦ
ＦＴインプリメンテーションを示すものである。ＦＦＴ
入力はプロセッサＰ１ないしＰ、に与えられ、これらプ
ロセッサはこのデータをクロスバ−スイッチ１０２へ通
過させる。（即ち、１を乗する）。上記スイッチの左上
四分の一区域は、そのビットを、第９Ｂ図において要求
されるビット反転済みシーケンスに変換し、そしてこれ
をプロセッサＰｌないしＰ、へ戻す。第１の組の重みＷ
は上記ＦＦＴステージの第１のループに対して用いられ
る（第９Ｃ図）。次いで、クロスバ−スイッチ１０２は
リセットされ、繰返しループに対して、左上四分の一区
域をターンオフし、上古及び下車の四分の一区域をター
ンオンする。そこで、上記のデータを、固定構成ＦＦＴ
グラフを実行するクロスバ−スイッチ１０２の上布四分
の一区域を介して、加算器プロセッサＰ、ないしＰＩ３
へ通過させる。クロスバ−スイッチ１０２の下車四分の
一区域を用い、ＦＦＴの次のループのために、上記デー
タをプロセッサＰ、ないしＰ、へ戻す。ｌｏｇｚＨのル
ープの後、その出力は加算器プロセッサＰ、ないしＰ、
。から取り出される。FIG. 10 shows an FFT implementation on a 24×24 crossbar switch 102. The FFT input is applied to processors P1 through P8, which pass the data on to crossbar switch 102 (that is, they multiply by 1). The upper-left quadrant of the switch rearranges the data into the bit-reversed sequence required by FIG. 9B and returns it to processors P1 through P8. The first set of weights W is used for the first loop of the FFT stage (FIG. 9C). Crossbar switch 102 is then reset for the iteration loop: the upper-left quadrant is turned off and the upper-right and lower-left quadrants are turned on. The data then passes through the upper-right quadrant of crossbar switch 102, which implements the fixed-configuration FFT graph, to adder processors P9 through P16. The lower-left quadrant of crossbar switch 102 returns the data to processors P1 through P8 for the next loop of the FFT. After $\log_2 N$ loops the output is taken from adder processors P9 through P16.
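The overall sequence (bit-reverse the input once, then loop $\log_2 N$ times through a multiply step and an add step) is the pattern of a textbook iterative radix-2 FFT, sketched below. This is the standard in-place algorithm, not the fixed-geometry wiring of FIG. 9C, and the sign convention $W = e^{-2\pi i/N}$ is an assumption because equation (3) was not recovered.

```python
import cmath


def fft(x):
    """Iterative radix-2 decimation-in-time FFT (length must be a power of 2)."""
    n = len(x)
    bits = n.bit_length() - 1
    # Initial bit-reversal shuffle (done once by the upper-left quadrant in FIG. 10).
    a = [x[int(format(i, f"0{bits}b")[::-1], 2)] if bits else x[i] for i in range(n)]
    size = 2
    while size <= n:                       # log2(n) loops in total
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0 + 0.0j
            for k in range(half):
                t = w * a[start + k + half]        # multiplier bank
                u = a[start + k]
                a[start + k] = u + t               # adder bank
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a


spectrum = fft([1, 2, 3, 4, 0, -1, -2, 5])
print([round(abs(v), 3) for v in spectrum])
```

Each pass of the `while` loop corresponds to one trip of the data from the multipliers to the adders and back.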

複合乗算及び複合加算がＦＦＴの各ループ内でへ　　　
次々に行なわれる。しかし、複合乗算は複合加算よりも
溝かに長い時間がかかる。２つのＦＦＴをインクリーブ
しても、複合の乗算及び加算の計算時間に食い違いがあ
るので、高い効率は得られない。実数部分および虚数部
分を分離すればこの食い違いを克服することができる。Complex multiplications and complex additions are performed alternately within each loop of the FFT. A complex multiplication, however, takes considerably longer than a complex addition. Even if two FFTs are interleaved, high efficiency is not obtained because of this mismatch between the computation times of complex multiplication and complex addition. Separating the real and imaginary parts overcomes the mismatch.
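One way to see why separating real and imaginary parts balances the workload is to write a butterfly using only real operations. The sketch below is an illustration under the assumption that a butterfly computes $u \pm w v$; the function name is hypothetical.

```python
def butterfly_real(ur, ui, vr, vi, wr, wi):
    """Radix-2 butterfly (u + w*v, u - w*v) using only real arithmetic."""
    # Multiplication cycle: four independent real products.
    p1, p2, p3, p4 = vr * wr, vi * wi, vr * wi, vi * wr
    # First addition cycle: assemble the complex product w*v.
    tr, ti = p1 - p2, p3 + p4
    # Second addition cycle: form the two butterfly outputs.
    return (ur + tr, ui + ti), (ur - tr, ui - ti)


top, bottom = butterfly_real(1.0, 2.0, 3.0, -1.0, 0.6, 0.8)
print(top, bottom)
```

Each butterfly then needs four real multipliers and six real adders, arranged as one multiplication cycle followed by two addition cycles of comparable duration. A 4-point stage with two butterflies would need 8 multipliers and 12 adders on this count.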

第１１Ａ図及び第１１Ｂ図は、実数部分及び虚数部分を
分離する４点ＦＦＴフローグラフを示すものであり、８
つの乗算器、１２個の加算器、及び３６Ｘ３６クロスバ
ースイツチを必要とする。計算は次のように進行する。FIGS. 11A and 11B show a 4-point FFT flow graph in which the real and imaginary parts are separated; it requires 8 multipliers, 12 adders, and a 36×36 crossbar switch. The computation proceeds as follows.

即ち、第１のサイクルは、変換すべきシーケンスのオー
ダを再配置する（第１１Ａ図）。The first cycle reorders the sequence to be transformed (FIG. 11A).

次に、上記クロスバ−スイッチを、類似の計算時間を含
んでいる１つの乗算サイクル及び２つの加算サイクルを
各々が有している繰返しループに対してリセットする。The crossbar switch is then reset for the iteration loop, each pass of which consists of one multiplication cycle and two addition cycles of similar computation time.

従って、３つのＦＦＴをインタリープ及び計算し、相異
なる変換に対して全数３つのサイクルが同時に計算する
ようにすることができる。第１表はこのフローのための
クロスバ−スイッチセツティングを示すものである。Three FFTs can therefore be interleaved so that all three cycles are computing simultaneously, each on a different transform. Table 1 shows the crossbar switch settings for this flow.

びマトリックス・ペルトル第１２Ａ図は、再帰的ダブリングにより、時間領域にお
けるテンプレートベクトルｂ１即ち式（２）、に対して
多くのベクトルを相関させるためのグラフを示すもので
ある。上記ベクトルは、マトリックス・ベクトル乗算に
おけるマトリックスＡの行と考えることができる。一つ
のベクトルがプロセッサＰ、ないしＰ、内で乗算を行な
われている間に、先行の乗算の結果がプロセッサＰ、な
いしｐｔｚ内で加算されつつあり、第２の先行の乗算の
結果がプロセッサＰ１３ないしＰＩＪ内で加算されつつ
あり、そして第３の先行の乗算の結果がプロセッサＰＩ
Ｓ内で加算されつつある。プロセッサＰ０は使用されな
い、上記マトリックス・ベクトル乗算に対する出力ベク
トルの値は、各計算移動クロンクサイクルにおいて得ら
れる。第１２Ｂ図は、第３図のコンピュータ上のインプ
リメンテーションを示すものである。この場合には１６
Ｘ２４スイツチが適当である。このインプリメンテーシ
ョンに対する待ち時間は、シストリックアレイモードに
一対するＮと異なり、ｌｏｇ、Ｎである。ここでは並列
入力が必要であり、そして、パイプラインを充満させる
ために多重ベクトルを相関させなければならない。フー
リエ変換をとることから生ずる問題もまた避けられる。Correlation and Matrix-Vector Multiplication. FIG. 12A shows a graph for correlating many vectors against a template vector b in the time domain, equation (2), by recursive doubling. The vectors can be regarded as the rows of a matrix A in a matrix-vector multiplication. While one vector is being multiplied in processors P1 through P8, the results of the preceding multiplication are being added in processors P9 through P12, those of the multiplication before that in processors P13 and P14, and those of the one before that in processor P15. Processor P16 is not used. A value of the output vector of the matrix-vector multiplication is obtained on every compute-and-move clock cycle. FIG. 12B shows the implementation on the computer of FIG. 3; a 16×24 switch suffices in this case. The latency of this implementation is $\log_2 N$, as opposed to N for the systolic array mode. Parallel input is required here, and multiple vectors must be correlated to keep the pipeline full. The problems that arise from taking Fourier transforms are also avoided.
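The recursive-doubling tree (8 products, then 4, 2, and 1 partial sums) can be sketched as a pairwise reduction. The function name `tree_dot` is hypothetical, and the sketch handles one vector at a time rather than pipelining successive rows as the hardware would.

```python
def tree_dot(row, template):
    """Dot product by pairwise (tree) summation.

    Returns the result and the number of addition levels used.
    """
    level = [r * t for r, t in zip(row, template)]   # multiplier bank
    steps = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)                           # pad odd-length levels
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        steps += 1
    return level[0], steps


value, steps = tree_dot([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8)
print(value, steps)
```

For eight elements the three addition levels use 4, 2, and 1 adders, matching processors P9 through P15 in the text, and the depth of 3 equals $\log_2 8$.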

スペクトル−〇ための　己ロヅモデリング自己回帰モデ
リングは、音声、水中音響学、ソナー、レーダ、及び地
震処理に広く用いられているので、これが選択される。Autoregressive Modeling for Spectral Estimation. Autoregressive modeling is chosen because it is widely used in speech, underwater acoustics, sonar, radar, and seismic processing.

自己回帰モデリング（ＡＲ）は、一般に、最適最小自乗
またはウィーナ・フィルタリング、線形予測（ＬＰ）、
及び最大エントロピー法（Ｍ　Ｅ　Ｍ）と同等である。Autoregressive (AR) modeling is, in general, equivalent to optimal least-squares or Wiener filtering, linear prediction (LP), and the maximum entropy method (MEM).

モデリングの目的は少数の自己回帰パラメータ（全ボー
ルモデル）によって時系列を表わすことにあり、これは
、白色雑音をモデルを通過させることにより、最小自乗
精度内で時系列を再生させることのできるものである。The aim of the modeling is to represent a time series by a small number of autoregressive parameters (an all-pole model) such that passing white noise through the model reproduces the time series to within least-squares accuracy.

従って、上記時系列を上記モデルの逆フィルタを通過さ
せると、上記時系列に対する情報が除去され、そして白
色雑音が後に残る。従って、ＡＲパラメータは、上記時
系列のスベクトル、即ちその色を計算するための情報を
含む。上記ＡＲパラメータはまた線形予測係数（ＬＰＣ
）とみなすこともできる。即ち、これら係数を有するＦ
ＩＲまたはＭＲフィルタは、複数の過去の値から時系列
の次の値を予測するものであるからである。実際の値か
ら上記予測値を減すると、逆ＡＲフィルタを適用する場
合と同じように、白色雑音が与えられる。ｍ番目のオー
ダのＬＰＧａｍ（ｋ＝１　　・・・ｍ）が、時間ｊにお
ける時系列の値を過去の値から予測することを可能なら
しめる。Passing the time series through the inverse filter of the model therefore removes the information in the time series and leaves white noise. The AR parameters thus contain the information needed to compute the spectrum of the time series, that is, its color. The AR parameters can also be regarded as linear prediction coefficients (LPC), since an FIR or IIR filter with these coefficients predicts the next value of the time series from a number of past values. Subtracting the predicted value from the actual value gives white noise, just as applying the inverse AR filter does. The mth-order LPC $a_k$ ($k = 1, \ldots, m$) allow the value of the time series at time j to be predicted from past values.

即ち、そして、下記の値を最小化するように計算される。That is, the prediction is given by equation (4), and the coefficients are computed so as to minimize the following quantity.

$$\sum_j \left| x_j - \hat{x}_j \right|^2 \qquad (5)$$
３つの用途を略述する。地理学においては、大地からの
反射信号はランダムであると考えられる。Three applications are outlined. In geophysics, the reflection signals from the earth are considered to be random.

表面において測定されるレスポンスは、ソース小波及び
ランダムな大地の連続のたたみこみである。The response measured at the surface is the convolution of the source wavelet with the random earth sequence.

上記ソース小波の影響を、予測的デコンボリューション
により、センサデータから除去する。即ち、スペクトル
内の着色情報を除去する（Ｉ　ＥＥＥ国際会議、「音響
学、音声及び信号処理Ｊ　　（１９８５年３月）のマコ
ーレイの「逆転のための地震アレイデータの予測的デコ
ンボリューション」（Ｐｒｅｄｉｃｔｉｖｅ　Ｄｅｃｏ
ｎｖｏｌｕｔｉｏｎ　ｏｆ　Ｓｅ１５ｍ１ｃ　Ａｒｒａ
ｙＤａｔａ　ｆｏｒ　Ｉｎｖｅｒｓｉｏｎ）参照）。The effect of the source wavelet is removed from the sensor data by predictive deconvolution, that is, by removing the coloring information in the spectrum (see McAulay, "Predictive Deconvolution of Seismic Array Data for Inversion," IEEE International Conference on Acoustics, Speech and Signal Processing (March 1985)).

第２の用途においては、音声の２００サンプルセグメン
トが１６個の線形予測係数（ＡＲパラメータ）で表わさ
れる。即ち、この多ボールが、スペクトルのモデリング
のため、及びこのセグメントの認識のために、適当なの
である。この場合には、２００個から１６個へのデータ
の圧縮が、後続のステージにおける計算を溝かに速くす
る特徴抽出として働く。In the second application, a 200-sample segment of speech is represented by 16 linear prediction coefficients (AR parameters); this many poles is adequate for modeling the spectrum and for recognizing the segment. Here the compression of the data from 200 values to 16 acts as a feature extraction that makes the computation in subsequent stages much faster.

第３の例は、次式を用いてＡＲパラメータａｌｌ（ｋ＝
１　　・・・ｍ）及びエネルギーＶを得ることに関する
ものである。The third example concerns the AR parameters $a_k$ ($k = 1, \ldots, m$) and the energy V, which are used in the following equation.

データがゼロまたは測定領域外の繰返しであるという仮
定が、ＦＦＴにおいてなされるが、ＡＲモデルからスペ
クトルを計算する場合には避けられる。The FFT assumes that the data is zero, or repeats, outside the measured interval; this assumption is avoided when the spectrum is computed from the AR model.

ＡＲ，ＬＰ及びＭＥＭのような前述の方法は、ＡＲパラ
メータまたはＬＰＧを決定するための次のステップへ通
ずる。時系列に対する次式の自己相関関数を推定する。The methods mentioned above (AR, LP, and MEM) all lead to the following steps for determining the AR parameters or LPC. First, estimate the autocorrelation function of the time series from the following equation.

τ＝０・・・ｍ（７）次式におけるＡＲパラメータまたは線形予測係数ａｋ（
ｋ＝ｌ　　・・・ｍ）を求める。$\tau = 0, \ldots, m$ (7). Then solve for the AR parameters or linear prediction coefficients $a_k$ ($k = 1, \ldots, m$) in the following equation.

$R a = b$ (8)

Here R is the Toeplitz autocorrelation matrix formed from $R_{xx}$. This system is generally solved on sequential processors using the Durbin or Levinson algorithm. For parallel machines, however, the Schur algorithm is generally considered more suitable (see S. Kung and Y. Hu, "A Highly Concurrent Algorithm and Pipelined Architecture for Solving Toeplitz Systems," IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-31 (1983)). The matrix R is decomposed into the product of lower and upper triangular matrices as follows.

$R = U^{T} U$ (9)

Substituting into equation (8) allows the system to be solved in two steps. First, find g from

$b = U^{T} g$ (10)

and then find a from

$g = U a$ (11)

The autocorrelation coefficients of equation (7) can be computed using the tree correlator of FIGS. 12A and 12B. Assuming a sufficient number of processors, the data are set in the multipliers, and a copy of the data, delayed by an amount equal to the maximum desired lag, is then fed in from the top. At each step the lag is reduced by one and the correlation is performed, so that the delayed data stream is finally brought into exact alignment with the original data.
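The two-step solution of equations (8) to (11) can be illustrated with a small sequential sketch. This is only an illustration under stated assumptions, not the systolic Schur implementation of the patent: the biased autocorrelation estimator, the right-hand side b taken as lags 1 to m (the usual Yule-Walker form), and the function names are all assumptions, and NumPy's Cholesky routine stands in for the triangular factorization.

```python
import numpy as np

def autocorrelation(x, m):
    """Biased autocorrelation estimates for lags 0..m (assumed estimator)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([np.dot(x[: n - tau], x[tau:]) / n for tau in range(m + 1)])

def solve_by_triangular_factors(r):
    """Solve R a = b through R = U^T U, then U^T g = b, then U a = g."""
    m = len(r) - 1
    # Toeplitz matrix built from lags 0..m-1 (assumed layout of R).
    R = np.array([[r[abs(i - j)] for j in range(m)] for i in range(m)])
    b = r[1 : m + 1]                 # assumed right-hand side (lags 1..m)
    U = np.linalg.cholesky(R).T      # upper triangular factor, R = U^T U
    g = np.linalg.solve(U.T, b)      # step 1, equation (10)
    a = np.linalg.solve(U, g)        # step 2, equation (11)
    return a

# Example with the frame size used in the text: 200 samples, 16 coefficients.
rng = np.random.default_rng(0)
frame = rng.standard_normal(200)
coefficients = solve_by_triangular_factors(autocorrelation(frame, 16))
```

The general-purpose `np.linalg.solve` is used for both substitution steps to keep the sketch short; it does not exploit the triangular structure that the systolic array relies on.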

As the autocorrelation coefficients (up to R_N) are computed, they are fed into the basic processors of the systolic array shown in FIG. 13. This array uses the Schur algorithm to compute the AR parameters or LPCs for the time series, and it calls on a set of basic processors of computer 100 different from those used to compute the autocorrelation coefficients. (Alternatively, of course, instead of being fed to the basic processors as they are computed, the autocorrelation coefficients may be stored after computation and crossbar switch 102 reset so that the same basic processors are used for the systolic array.) The upper triangular matrix U is computed and then used in the lower part of the systolic array to compute g and a. While g and a are being computed, the processors of the upper two rows can be started on the computation of the AR parameters for the next time series. The basic processors are interconnected with up to four inputs and up to three outputs. This is more complex than the arrangement shown in FIG. 3 and requires a larger crossbar switch, as shown in FIG. 1.

Computer 100 (or, more generally, computer 30) as a signal processor can provide multi-gigaflop performance. The cantilever-beam spatial light modulator yields a large, high-speed crossbar switch whose size, speed and cost conventional semiconductor technology is unlikely to match. In effect, an algorithm is reduced to a maximally parallel directed graph, and this graph is mapped onto the signal processor. Programmed dataflow is used to minimize run-time overhead. Implementations of filtering, convolution, correlation, the fast Fourier transform and matrix-vector multiplication have been illustrated. Because the crossbar switch is fully reconfigurable, complex algorithms can be implemented quickly and efficiently, and new algorithms can be mapped onto computer 100 or computer 30 automatically.

Symbolic Processing

The optical crossbar-connected parallel basic processors can also form a symbolic computer or a mixed symbolic-numeric computer. For symbolic computation the basic processors include logic gates, comparators and the like; that is, in FIG. 4 the ALU/multiplier 125 is replaced or supplemented by logic gates, pattern matching or other symbolic processing functions. This constitutes a second embodiment computer, which is described in connection with the following applications.

Expert Systems

A widely applicable, straightforward form of expert system is one in which the information is contained in procedures in the form of if-then rules. There are many real-time applications in which rapid inference is required and for which the processor proposed here is suitable, including speech, vision, industrial plant, robotics and weapons systems. In a forward-chaining rule-based system, a set of observations is made and a probability of correctness is assigned to each. The inference engine rapidly identifies the situation, surroundings or threat and indicates the appropriate action. In a backward-chaining rule-based system, a goal is hypothesized and subdivided into the conditions required to satisfy it; these are subdivided in turn.

Prolog is a computer language that works in this manner. Forward and backward chaining are illustrated below. In either form, parallel processing is needed to obtain a rapid response. It is not economical to hard-wire a system containing a set of interconnected logic rules and probabilities, because the rules may change, or may need to be changed, for new situations. The speed and reconfigurability of the optical crossbar switch proposed here are therefore required. The simple system considered here uses the propositional calculus; it does not address the more complex first-order predicate calculus required in more demanding applications.

Forward Chaining and Implementation

FIG. 14 shows a directed flow graph for identifying seven animals on the basis of twenty observed features. The example and its description follow Winston, "Artificial Intelligence" (Addison-Wesley, 1984). The explanation is simplified by first assuming that each observation is either true or false, with no assigned probability. The basic processors (labeled P1 onward) are AND gates, shown as circles; OR gates, shown as circles with a black dot; and inverting gates, shown as circles carrying an inversion mark. An input is "true" if the corresponding observation holds for the animal. The if-then rules that characterize a rule-based system appear, for example, at one AND-gate processor as "if the animal is a mammal and eats meat, then it is a carnivore." An OR gate shows that another line of reasoning also establishes that the animal is a carnivore, namely "if the animal is a mammal, has pointed teeth and has forward-facing eyes, then it is a carnivore."
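The true/false case described above can be sketched in software as a small forward-chaining loop. This is only an illustration: the rule table holds just the two carnivore rules quoted in the text, and the fact names and the function `forward_chain` are hypothetical, not part of the patent.

```python
# Each rule is (set of conditions that must all hold, conclusion).
# Two rules with the same conclusion play the role of an OR gate.
RULES = [
    ({"mammal", "eats meat"}, "carnivore"),
    ({"mammal", "pointed teeth", "forward-facing eyes"}, "carnivore"),
]

def forward_chain(observations, rules=RULES):
    """Repeatedly fire rules whose conditions are all known facts."""
    facts = set(observations)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = forward_chain({"mammal", "pointed teeth", "forward-facing eyes"})
print("carnivore" in derived)
```

In the patent's hardware version each rule is a gate in the flow graph and all gates evaluate in parallel; the loop here only mimics the same logic sequentially.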

In the case of interest, each observation has an associated probability of correctness. Assuming independence, the joint probability of two events is the product of their probabilities. Accordingly, the processors shown as open circles (those without a black dot) multiply the two incoming probabilities. These processors are referred to below as type A.

An output of zero corresponds to logical false and is obtained when any input probability is zero. The processors shown as circles with a black dot, referred to below as type B, determine the maximum of the incoming probabilities; Winston gives other formulas. A type C processor subtracts the incoming probability from one. All three types of processor also have a probability mapping function that modifies the output by mapping the computed probability to an output probability between 0 and 1.
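The three processor types can be written down directly from this description. The sketch below is an assumption-laden illustration: the function names are hypothetical, the probability mapping function is taken to be the identity because the text does not specify it, and the carnivore sub-graph is built from the two rules quoted earlier.

```python
import math

def type_a(*probabilities):
    """AND-like processor: product of the incoming probabilities."""
    return math.prod(probabilities)

def type_b(*probabilities):
    """OR-like processor: maximum of the incoming probabilities."""
    return max(probabilities)

def type_c(probability):
    """Inverting processor: one minus the incoming probability."""
    return 1.0 - probability

def carnivore_probability(mammal, eats_meat, pointed_teeth, forward_eyes):
    """Two type A rules feeding one type B processor (mapping = identity)."""
    by_diet = type_a(mammal, eats_meat)
    by_anatomy = type_a(mammal, pointed_teeth, forward_eyes)
    return type_b(by_diet, by_anatomy)

print(carnivore_probability(0.9, 0.5, 0.8, 1.0))
```

Any input probability of zero forces a type A output of zero, which matches the statement that a zero output corresponds to logical false.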

The flow graph of FIG. 14 can be executed directly on a systolic system such as that shown in FIG. 1, in which the first twelve processors have input connections and the last eight processors have output connections. Only forty processors are needed for the simple example shown here. The processors are programmed to act as type A, B or C as shown in Table 2. The observations are entered into the first twenty processors, P1 through P20. Probabilities are obtained at the outputs of the last seven processors, and the highest of them indicates the identified animal. The confidence factor is clearly related to the magnitude of this probability and to the amount by which it exceeds the probabilities for the other animals. Table 2 also shows which crossbar switch elements must be activated to provide the interconnections between the rules shown in the flow graph of FIG. 14.

Backward Chaining Implementation

Many applications are better suited to backward chaining than to forward chaining, because the user is interested in resolving a particular hypothesis or goal. Data may be difficult to obtain, and the user does not want to spend effort acquiring unnecessary observations. A typical question put to the expert system of FIG. 14 is "Is this animal a cheetah?" Answering it means reading the flow graph of FIG. 14 from right to left. The goal is satisfied if the animal has a tawny color (processor P34), has dark spots and is a carnivore. The advantage of backward chaining is that it indicates which features should be observed, so that no time is wasted obtaining irrelevant data.

Implementation on the optical crossbar-interconnected processors requires extracting, in the backward direction, the tree that leads to the hypothesized goal (FIG. 15). The processors are numbered from Q1 as shown. All known data are set in the processors as true information. Assuming that further hypotheses will follow, it is advisable to set the crossbar switch for several trees that use different sets of basic processors. A signal enters processor Q1. At each circle representing a backward AND gate (Q1 corresponds to one of the P processors of FIG. 14), the signal is transmitted in both output directions. At each circle with a black dot, representing a backward OR gate, the signal is tagged with the number of the OR gate and the branch of the OR gate. At any processor, if all of the outputs are already marked true, there is no need to proceed further along that branch. Alternatively, the processor may output a question to the user, such as "Is the animal a carnivore?" Eventually the signal filters down to the base, indicating what information is needed to establish whether the animal is a cheetah. The output indicators are also tagged, so that the machine can interpret the output and present the alternative combinations that the user must satisfy to confirm that the animal is a cheetah.
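The backward pass can be sketched as a recursive walk over an AND/OR tree that returns the alternative sets of observations still needed. This is an illustrative sketch only: the `GOALS` table contains just the cheetah and carnivore rules mentioned in the text, and the function name and data layout are hypothetical. Each OR branch yields a separate alternative, which loosely corresponds to the tagging of OR-gate branches described above.

```python
from itertools import product

# goal -> list of alternative branches (OR); each branch is a list of
# conditions that must all hold (AND).
GOALS = {
    "cheetah": [["carnivore", "tawny color", "dark spots"]],
    "carnivore": [
        ["mammal", "eats meat"],
        ["mammal", "pointed teeth", "forward-facing eyes"],
    ],
}

def needed_observations(goal, known, goals=GOALS):
    """Return the alternative sets of observations that would establish goal."""
    if goal in known:                 # already marked true: stop here
        return [frozenset()]
    if goal not in goals:             # base of the tree: must be observed
        return [frozenset([goal])]
    alternatives = []
    for branch in goals[goal]:        # backward OR: one alternative per branch
        per_condition = [needed_observations(c, known, goals) for c in branch]
        for combo in product(*per_condition):   # backward AND: need all
            alternatives.append(frozenset().union(*combo))
    return alternatives

for option in needed_observations("cheetah", {"mammal", "tawny color"}):
    print(sorted(option))
```

With "mammal" and "tawny color" already known, the result lists two ways to confirm the hypothesis, so the user can choose whichever observations are easiest to obtain.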

Combined Symbolic and Numerical Computation

Signal processing and logical inference can be performed simultaneously and coupled through the crossbar switch. This has the advantage that signal processing outputs can be used directly in the inference, and that inference outputs can initiate the specific computations needed to complete an inference step.

In a practical arrangement for these computations, the upper section of the processors is assigned to inference and the lower section to signal processing. The crossbar switch is divided into four corresponding segments: the upper segment is used for signal processing and the lower segment for symbolic computation, while the other two segments allow communication at any stage between the symbolic and the numerical computations. Thus, in a third embodiment, one set of basic processors has arithmetic functions (the ALU/multiplier 125 of FIG. 4), and another set of basic processors has symbolic functions (the ALU/multiplier 125 of FIG. 4 replaced by symbolic functions such as logic gates).

Needless to say, a more general approach is to provide both arithmetic and symbolic functions in every basic processor.

In geophysical exploration, faster symbolic and numerical processing is required to make real-time processing and interpretation possible. Signal processing is used to clean up the data and remove the source signature (see A. McAulay, "Predictive Deconvolution of Seismic Array Data for Inversion," IEEE International Conference on Acoustics, Speech and Signal Processing, March 1985).

Direct estimation of the earth parameters is then performed using modeling and inversion (see McAulay, "Prestack Inversion with Plane-Layer Point Source Modeling," Geophysics, vol. 50, pp. 77-87 (1985)). Selecting the parameters that control these computations and deriving the equations require symbolic computation. A rule-based expert system is required for interpretation, and it must communicate with the signal processing so that, in backward-chaining mode, a particular hypothesis can be confirmed by specific reprocessing of selected data.

Symbolic-Numeric Speech Recognition System

The symbolic-numeric speech recognizer illustrates the benefit of performing numerical and symbolic computation simultaneously.

FIG. 16 is a block diagram of speech recognition with an embedded parser. Each block is subdivided into three parts: the upper part describes the function performed, the middle part gives the algorithmic method used, and the lower part gives the architectural implementation set up by crossbar switch 102. The difficult problem of speech recognition is greatly simplified here in order to illustrate the principles involved. First, assume that speech data enter continuously and are sampled. The stream of sampled data is divided into 20 ms frames of 200 samples each. Unlike most systems, which operate in batch mode on batches of frames, this embodiment operates on a continuous stream of frames. This avoids the complexity associated with handling the interfaces between batches.
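The continuous framing described above can be sketched as a generator that emits one frame as soon as 200 samples have arrived. Two hundred samples per 20 ms corresponds to a 10 kHz sampling rate. The function name `frames` and the choice of non-overlapping frames are assumptions made for this sketch.

```python
def frames(samples, frame_len=200):
    """Yield consecutive non-overlapping frames from a sample stream.

    Works on any iterable, so frames are produced as samples arrive
    rather than after a whole batch has been collected.
    """
    frame = []
    for sample in samples:
        frame.append(sample)
        if len(frame) == frame_len:
            yield frame
            frame = []

# 450 samples give two complete frames; the trailing 50 samples stay pending.
for index, frame in enumerate(frames(range(450))):
    print(index, frame[0], frame[-1])
```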

In FIG. 16, the first two function blocks 202 and 204 use the AR modeling method described above to reduce the data for a single frame from 200 samples to 16 linear prediction coefficients (LPCs). A parallel implementation using optical crossbar-interconnected basic processors was given earlier in FIG. 13. In general, further stages are used to refine the values representing the speech on the basis of heuristics. In practice, with N data samples and M LPCs per frame, the correlation takes only about N/M times as long as the following stage. Therefore, to use the processors efficiently, a tree of width N/M can be used repeatedly, M times.

After the 20 ms frame has been reduced to 16 LPCs, the crossbar switch is reset for dynamic time warping. As shown in function block 206 of FIG. 16, the warping is used to identify or classify an incoming word as a member of the dictionary by correlating the LPCs of the incoming signal against the reference LPC templates of the dictionary.

As shown in function block 208, the crossbar switch is reset again to configure the computer as an inference machine. Simple rules within a rule-based system are used to decide on the end of one word and the beginning of a new one.

Symbolic parsing in the form of a situation-action, wait-and-see rule-based system uses the rules of the language's grammar to predict the possible parts of speech of the next word. This reduces the number of dictionary words that must be considered and also helps to resolve ambiguities in recognition. It is carried out on the computer by resetting the crossbar switch once more, as shown by function block 210 in FIG. 16.

Parallel dynamic time warping and symbolic parsing are described next, together with their implementation using the optical crossbar switch. The crossbar switch is reset three times during each frame. The first setting is for computing the LPCs and for making the logical inferences of the rule-based parser; the second is for determining the accumulated cost function for each reference word during dynamic time warping; and the third is for determining the end of a word and classifying it.

Speech Recognition by Dynamic Time Warping

As shown in FIG. 16, the LPCs for each frame of unknown incoming data must be correlated with each frame of every feasible word in the dictionary of LPC templates. FIG. 17A shows the input-data LPC frames, labeled ..., i-1, i, i+1, ..., in the x direction and the LPC frames of the k-th reference word in the y direction. The dictionary of reference templates is divided into subdictionaries by part of speech. A word that can occur as more than one part of speech is included in each of the corresponding subdictionaries.

The symbolic parser is used to predict the parts of speech that cannot follow because of the rules of grammar. Subdictionaries containing parts of speech that do not agree with the grammar need not be correlated, which saves time in the local distance and dynamic time warping stages.

Typical local distance measures are given by equations (12) and (13), where L is the number of linear prediction coefficients, i is the number of the input frame, which has LPCs $i_1, \ldots, i_L$, and r is the number of the reference template frame, which has LPCs $r_1, \ldots, r_L$.
Ci, is the input frame number with -------iL, and r is the reference template number with LPCr, -------rL.

The local distances $d_{i,r}$ are computed simultaneously along the wavefront shown in FIG. 17A as each new input frame arrives. The point (i, r) in FIG. 17A represents the corresponding $d_{i,r}$. To perform recognition, a measure of the correlation between the incoming frames and the reference frames must be obtained. To allow for variations in the speaker's rate relative to the reference, stretching or shrinking by up to a factor of 2 is permitted at each input frame. This is done by computing an accumulated distance cost function along the wavefront as new input frames arrive. The cost function accumulates a running sum of the local distances.

FIG. 17B shows that the accumulated sum is selected from three possibilities, corresponding to shrinking, stretching or neither. This is expressed as

$$S_{i,r} = d_{i,r} + \min\left(S_{i-1,r-2} + d_{i,r-1},\; S_{i-1,r-1},\; S_{i-2,r-1} + d_{i-1,r}\right) \qquad (14)$$

The accumulated cost function at the top of the column in FIG. 17A is used to decide whether a word has ended and whether a match has been obtained. The dictionary word giving the best match is also identified.
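The three-way accumulated cost of equation (14) can be sketched sequentially as follows. The recurrence used is one reading of the garbled equation, S[i][r] = d[i][r] + min(S[i-1][r-2] + d[i][r-1], S[i-1][r-1], S[i-2][r-1] + d[i-1][r]); the boundary handling (start at the first input and reference frames, infinite cost outside the grid) and the function name are assumptions of this sketch, and the patent computes the same quantities along a wavefront in parallel.

```python
import math

def accumulate_costs(d):
    """Accumulated cost table for local distances d[i][r].

    i indexes input frames, r indexes reference frames.
    """
    n, m = len(d), len(d[0])
    S = [[math.inf] * m for _ in range(n)]
    S[0][0] = d[0][0]                       # assumed starting point

    def prev(i, r):
        # Accumulated cost at (i, r), or infinity outside the grid.
        return S[i][r] if i >= 0 and r >= 0 else math.inf

    for i in range(n):
        for r in range(m):
            if i == 0 and r == 0:
                continue
            # Two reference frames consumed for one input frame.
            skip_reference = prev(i - 1, r - 2) + (d[i][r - 1] if r >= 1 else math.inf)
            # One reference frame per input frame (no warping).
            diagonal = prev(i - 1, r - 1)
            # Two input frames consumed for one reference frame.
            skip_input = prev(i - 2, r - 1) + (d[i - 1][r] if i >= 1 else math.inf)
            S[i][r] = d[i][r] + min(skip_reference, diagonal, skip_input)
    return S

# A perfect match along the diagonal accumulates zero cost.
d = [[0 if i == r else 1 for r in range(3)] for i in range(3)]
print(accumulate_costs(d)[2][2])
```

The value in the last reference row for the current input frame plays the role of the accumulated cost at the top of the column in FIG. 17A.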

FIG. 18 shows how the local distances and the dynamic time warping are computed using optical crossbar-interconnected parallel processors. The sixteen linear prediction coefficients of the next input frame are loaded in parallel, over lines L1, L2, ..., L16 (top of FIG. 18), into ten groups of sixteen basic processors P1, P2, ..., P160 (along the top of FIG. 18), and remain there until all reference words have been processed. Reference words are stored as 10 frames for short words (0.2 s), 20 frames for long words or 30 frames for very long words (0.6 s). The ten frames of a short word are thus supplied over input lines R1, R2, ..., R160 as shown in FIG. 18, and long words are split into two or three parts.

For simplicity, only reference words ten frames long are considered. The ten sets of sixteen LPCs for a reference word are supplied over high-speed memory lines to the 160 basic processors P1 through P160, and trees T1, T2, ..., T10 of further basic processors perform the subtractions and additions of equation (12) or the multiplications and additions of equation (13). The reference frames are skewed in time across the word so that accumulation over all frames is possible. That is, the sixteen LPCs for the (j-1)-th frame of the word enter their group of sixteen processors before the sixteen LPCs for the j-th frame enter the next group. As a result, the local distance between the input frame and the (j-1)-th frame of the word (together with its accumulation with the local distances of earlier frames) is available for accumulation, in adders A, B and C, with the local distance between the input frame and the j-th frame of the word as part of the dynamic time warping computation.

The outputs of trees T1 through T10 represent the time-skewed column of local distances in FIG. 17A.

These are followed, in pipeline fashion, by the local distance values for all of the other reference words. The pattern then repeats for the next input frame. The dynamic time warping procedure must be applied separately to the correlation with each reference word; in the following description, therefore, the delays must be long enough to align in time the next input frame with the same reference word. For simplicity of explanation, assume that only one reference word is used and that the skewed pipeline output of the trees is the column of FIG. 17A for that single reference word. The accumulated cost functions for the three paths of FIG. 17B are computed in the flow graph according to equation (14). Path 2 is computed by adding the local distance to the accumulated result for the preceding time frame and one reference frame earlier. Path 1 is obtained by adding the accumulated value computed for path 2 in the preceding time frame to the current local distance. Path 3 is computed by adding the accumulated path for the preceding frame to the delayed current local distance. Delays are added in paths 1 and 2 so that the three paths can be compared. The accumulated cost at the right-hand end represents the accumulated cost at the top of the column in FIG. 17A and is used to determine the end of a word and the identity of the word. As stated above, for each input frame a set of values, one for every reference word or part of a reference word, flows out sequentially at the right-hand end.

FIG. 19 shows how the end of a word is detected by logical inference. The stream from FIG. 18, representing the accumulated costs of correlating one input frame against the frames of all references, is fed into the top of the processors. The results for partitioned words are summed, as shown for word 1. The cost function then passes through a set of logical operations that determine whether a word ends at this point.

Feasible rules are shown in FIG. 19: the energy has been below a threshold for a short time; the accumulated cost is below a threshold; the cost is less than the cost at neighboring points; and more than a certain number of frames have passed since the end of the preceding word. Any test that fails for a word results in a zero output. Comparisons are then made between words. If one word passes the tests with the minimum value compared with the other words, it filters through to the output together with its value and an identification tag. A non-zero tag indicates the end of a word, and the tag identifies the word. After the end of a word has been detected, all the accumulated cost functions in the dynamic time warping computation are set to zero and the search for a new word begins. The recognized word and its possible parts of speech are sent to the parser.
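The end-of-word tests can be sketched as a filter followed by a comparison between words. Everything numeric here is hypothetical: the thresholds, the argument names, and the representation of each candidate as (tag, cost, neighboring costs) are assumptions made for illustration, since the text gives the rules only qualitatively.

```python
def end_of_word(candidates, low_energy_frames, frames_since_last_word,
                cost_threshold=8.0, min_low_energy_frames=2, min_word_frames=5):
    """Return (tag, cost) of the detected word, or (0, None) if none ends here.

    candidates: iterable of (tag, cost, neighbor_costs) with non-zero tags.
    """
    # Tests shared by all words: a short pause, and enough frames since the last word.
    if low_energy_frames < min_low_energy_frames:
        return 0, None
    if frames_since_last_word <= min_word_frames:
        return 0, None
    passed = [
        (cost, tag)
        for tag, cost, neighbor_costs in candidates
        if cost < cost_threshold and all(cost < c for c in neighbor_costs)
    ]
    if not passed:
        return 0, None               # zero tag: no word ended at this frame
    cost, tag = min(passed)          # comparison between the surviving words
    return tag, cost

words = [(1, 4.0, (5.0, 6.0)), (2, 3.0, (2.5, 4.0)), (3, 9.0, (10.0, 11.0))]
print(end_of_word(words, low_energy_frames=3, frames_since_last_word=12))
```

In this example word 2 fails the neighboring-cost test and word 3 fails the threshold test, so word 1 is reported with its cost and tag.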

Vocabularies of more than 10,000 words contain many similar words, and it is difficult to obtain good performance without using grammar rules to help resolve ambiguity. With a situation-action parser such as the one described earlier, phrases are assembled as new words are recognized, so that parts of speech that are unacceptable for the following word can be predicted. This reduces the number of dictionary words against which the input must be correlated during dynamic time warping. Larger vocabularies can then be accommodated, or time is freed for other computations. To be useful, the parser must operate at high speed. Parsing is performed by switching the crossbar switch to activate pattern-matching elements in the elementary processors. The parser considered here is a wait-and-see parser (see P. Winston, "Artificial Intelligence," 2nd ed., Addison-Wesley, 1984), which consists of a set of rules and the actions to be taken when a rule fires.
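The reduction in correlation work can be sketched as follows, assuming the vocabulary is split into subdictionaries keyed by part of speech. The function name `active_vocabulary` and the sample words are hypothetical and serve only to illustrate the idea.

```python
def active_vocabulary(subdictionaries, unacceptable):
    """Collect the words that still need to be correlated with the input.

    subdictionaries maps a part of speech to its list of reference words;
    unacceptable is the set of parts of speech the parser has ruled out
    for the next word.
    """
    words = []
    for part_of_speech, entries in subdictionaries.items():
        if part_of_speech not in unacceptable:
            words.extend(entries)
    return words
```

With a vocabulary of nouns, verbs, and prepositions, ruling out verbs for the next word removes every verb entry from the set that dynamic time warping has to process.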

Approximately 500 rules can be executed on a system such as that shown in FIG. 3.

The set of 13 rules given in Winston's book is used here for illustration.

FIG. 20 shows the parsing of a sentence and the generation of its parse tree. As described in Winston's book, a set of rules is used to generate the tree from the sentence. Further rules are needed to predict unacceptable parts of speech. In addition, for a word that can be more than one part of speech, all of its parts of speech must be considered. If more than one part of speech is satisfactory, it is necessary either to keep a double trace or to store the state of the processor so that backtracking is possible if the parse later fails.

Table 3 shows the parsing sequence. Three buffers are generally required, but only two buffers, B1 and B2, are needed for this example. Three stack nodes, K1, K2, and K3, are shown. The buffers and stack nodes are multiple registers, and the contents of all registers are normally moved together. Words enter from the left, and rules fire according to the contents of the buffers and of the topmost stack node K1. The first action that a fired rule can take is to attach buffer B1 to the item in stack K1 and to shift the words in the buffers right by one position to fill the space made available in buffer B1. The second action is to create a new node in stack K1 in order to start a subtree, for example a verb phrase (VP). The nodes in the stack are moved along to make room for the new node. The last action used in this example is to move a node from stack K1 to buffer B1 and to shift the nodes in the stack left to fill the space left in stack K1.
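The three actions can be sketched with ordinary lists standing in for the buffer and stack registers. This is a simplified illustration, not the register-level design: the class name `WaitAndSeeState`, the method names, and the use of unbounded Python lists in place of fixed buffers B1 and B2 and stack nodes K1 to K3 are all assumptions.

```python
class WaitAndSeeState:
    """Buffers and stack for a wait-and-see parse (illustrative only)."""

    def __init__(self, parts_of_speech):
        # buffers[0] plays the role of B1, buffers[1] the role of B2.
        self.buffers = list(parts_of_speech)
        # stack[0] plays the role of K1; each node is (label, children).
        self.stack = [("S", [])]

    def attach(self):
        """Attach the item in B1 to the node in K1; later items move up."""
        item = self.buffers.pop(0)
        self.stack[0][1].append(item)

    def create(self, label):
        """Start a new subtree in K1; existing nodes move down the stack."""
        self.stack.insert(0, (label, []))

    def drop(self):
        """Move the node in K1 into B1; remaining nodes move up the stack."""
        node = self.stack.pop(0)
        self.buffers.insert(0, node)
```

For example, starting from the parts of speech `["NP", "V"]`, the calls `attach()`, `create("VP")`, `attach()`, `drop()`, `attach()` leave a single sentence node whose children are the noun phrase and the completed verb phrase.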

Table 3 shows a sentence being parsed in a sequence of 12 steps, following Winston. At the start, a sentence node (S) is assumed in stack K1, and the parts of speech for the sentence (FIG. 20) are entered into buffers B1 and B2 from the left, beginning with the first word of the sentence. Because stack K1 contains a sentence node and buffer B1 holds a noun phrase (NP), the sentence rule shown on the right fires.

The action taken is to move the NP to stack K1 and attach it to S. At the same time, the incoming words are shifted right to refill buffer B1. As shown on the right, another sentence rule fires in step 2; it moves the stack down from K1 to K2 and sets up a new verb phrase node in stack K1. At this stage the NP subtree of the parse is complete and the VP subtree is being started (FIG. 20). Steps 4 and 5 attach the verb and the noun to this subtree.

Step 6 starts the prepositional phrase (PP) subtree while the subtrees already generated are held in stacks K1 and K2. Steps 7 and 8 build the PP subtree. The rule in step 9 allows the PP to be attached under the VP, and steps 10 and 12 attach the VP under the sentence S to give the complete parse tree.

FIG. 21 shows a flow graph for the example of Table 3 and FIG. 20 that can be mapped onto an optical crossbar-interconnected elementary processor computer similar to that of FIG. 3.

The 13 rules, S1 through S4, VP1 through VP6, and PP1 through PP3, have inputs from the buffer registers and from the top stack register. On each clock cycle, every rule attempts to match its stored pattern against the pattern at its inputs. In this example, two matches are required to activate any rule, except rule VP4, which requires three matches.
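The per-cycle matching can be sketched by giving each rule a stored pattern over the top stack node and the two buffers, with a wildcard for positions the rule ignores. The two rules below are hypothetical stand-ins (one needing two matches, one needing three) and are not Winston's actual rule set; the names `EXAMPLE_RULES` and `active_rules` are assumptions.

```python
# Each pattern is (K1, B1, B2); None means the rule does not test that input.
# These rules are illustrative stand-ins, not the 13 rules of the example.
EXAMPLE_RULES = {
    "two-match rule": ("S", "NP", None),
    "three-match rule": ("VP", "V", "NP"),
}


def active_rules(k1, b1, b2, rules):
    """Return the names of all rules whose stored pattern matches the inputs.

    Every rule is tested against the same inputs, mirroring the idea that
    all rules attempt a match on each clock cycle.
    """
    inputs = (k1, b1, b2)
    return [
        name
        for name, pattern in rules.items()
        if all(p is None or p == x for p, x in zip(pattern, inputs))
    ]
```

In hardware the comparisons would run concurrently in separate elementary processors; the list comprehension only models the outcome of one cycle.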

It is also assumed that the rules are unordered. Ordered rules can be accommodated by rearranging the interconnections, as in FIG. 14. Eight of the rules, when activated, assert an output at the bottom that initiates a shift of the buffers, including the transfer of buffer B1 so that it is attached at the bottom of the registers in stack K1. Rules S3 and VP4, when activated, shift the stack registers right, generate a new phrase node, and place it in stack K1. Two further rules, one in the VP group and one in the PP group, when activated shift the stack registers, including stack K1, into buffer B1.

Activation of rule S4 signals completion of the parse and readout of the tree.

(Modifications) Various modifications can be made to the computer of the invention while retaining the feature of optical, dynamic reconfigurability of the interconnections among parallel elementary processors. For example, the number and type of processors can be varied, including by mixing processor types. Not all of the processors need be used, which allows dynamic changes between algorithms that require different numbers of processors. Some subsets of the elementary processors can be hardwired together to reduce the number of elementary processors interconnected by the crossbar switch, and thereby to reduce the time required to reset the crossbar switch. Naturally, the elementary processors can be configured as a pipeline by the crossbar switch, and the reconfigurability of the crossbar switch permits time sharing among different algorithms on the same computer. The spatial light modulator may also be of a transmissive type, such as an LCD, or an array of smaller modulators. Modulators based on quantum well devices appear to offer nanosecond switching times (see T. Wood et al., "High-Speed Optical Modulation with GaAs/GaAlAs Quantum Wells in a p-i-n Diode Structure," 44 Applied Physics Letters 16 (1984)).

The fast Fourier transform of the embodiment, which separates the real and imaginary parts to make the computation more efficient, can be modified, for example, by using the well-known methods of decimation in time or in frequency with a number of points that is not a power of two.

Similarly, the symbolic-numeric speech recognizer of the embodiment can be modified widely while retaining the feature of numeric and symbolic computation for each input frame, such as the correlation of each frame with a dictionary of frames and the end-of-word inference rules. Other characterizations of the samples that make up a frame may be used instead of LPC; the dynamic time warp may be omitted, or made more extensive so that the stretch factor can be increased or decreased; different end-of-word rules may be used; and different parsers may be invoked to select subdictionaries and limit the correlation search, or the parser may be omitted.

(Effects of the Invention) Advantages of reconfigurability: Hardwiring provides a faster system for executing one particular algorithm, but does not provide the flexibility needed to execute a variety of algorithms. A bus tends to saturate for most algorithms when more than about ten high-performance processors are used (see A. D. McAulay in R. J. Hayduk and A. K. Noor, eds., "Research in Structures and Dynamics - 1984," NASA Publication 2335, 15 (1984)). Nearest-neighbor connections between processors are cost-effective for certain algorithms. For example, NASA's 2-D processor array, the MPP, is effective for edge enhancement. Systolic arrays (see H. T. Kung, 15(1) Computer 37 (1982)), such as those being built at Carnegie-Mellon University (see P. J. Kuekes and M. S. Schlansker in K. Bromley, ed., "Real Time Signal Processing," Proceedings of SPIE 495, 137 (1984)), ESL, Hughes (J. G. Nash and C. Petrozolin, International Conference on Acoustics, Speech and Signal Processing, 85CH2118-8, 3, 1392 (1984)), and NOSC (J. M. Speiser and H. J. Whitehouse in K. Bromley, ed., "Real Time Signal Processing," Proceedings of SPIE 431, 2 (1983)), are effective for convolution, correlation, and other operations that are easily pipelined, and are expected to dominate conventional signal processing in the near future. However, algorithms that require more complex connections perform poorly on such processors and are difficult to map onto them, especially for automated systems. The computer of the embodiment can be reconfigured to give the appearance and performance of a hardwired system, a systolic array, or a more complex network.

General-purpose networks of arbitrary size and speed (see H. J. Siegel, "Interconnection Networks for Large Scale Parallel Processing: Theory and Case Studies," Lexington Books, 1984) are far too expensive to build with conventional semiconductor technology.

Consequently, today's systems have multiple stages, which require several switches between input and output, or use incomplete crossbar switches. The BBN Butterfly machine (see D. Y. Cheng, SRC Technical Report No. 059, University of California, Berkeley (1984)) has multiple stages of 4×4 crossbar switches interconnected by a perfect shuffle network, and TRAC at Austin, Texas (J. C. Browne, 37(5) Physics Today (1984)) uses 2×2 switches in a Banyan configuration. Multiple stages increase latency and control complexity.
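For readers unfamiliar with the perfect shuffle interconnection mentioned above, it maps each port index to the index obtained by rotating its binary address left by one bit. The sketch below shows only this permutation; the function name and the eight-port example are illustrative assumptions and do not describe the internal wiring of the BBN Butterfly.

```python
def perfect_shuffle(index, n_bits):
    """Rotate the n_bits-wide binary address of index left by one bit."""
    mask = (1 << n_bits) - 1
    # The top bit wraps around to become the lowest bit.
    return ((index << 1) & mask) | (index >> (n_bits - 1))
```

For eight ports (three address bits), the outputs for inputs 0 through 7 interleave the lower and upper halves of the port list, which is the card-shuffle pattern that gives the network its name.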

Message passing adds further overhead, which sacrifices still more speed and flexibility. The Intel iPSC is based on the Cosmic Cube (C. L. Seitz, 28(1) Communications of the ACM 22 (1985)) and has 2^6 = 64 nodes, with distributed memory, at the vertices of a six-dimensional hypercube interconnection system. Each processor is connected to 6 of the other processors among the 64. This gives more algorithm flexibility, and more control complexity, than a systolic array, but less flexibility than a fully reconfigurable network. The signal processor of the embodiment is fully reconfigurable and, by using preprogrammed optical switches, provides high speed with less latency and simpler control than multistage reconfigurable systems.
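The hypercube connectivity can be made concrete: in a six-dimensional hypercube, two nodes are linked exactly when their 6-bit addresses differ in one bit. The following minimal sketch lists a node's neighbors under that standard addressing; the function name `hypercube_neighbors` is an assumption made for illustration.

```python
def hypercube_neighbors(node, dim=6):
    """Return the nodes directly linked to node in a dim-dimensional hypercube.

    Flipping each of the dim address bits in turn gives the dim neighbors.
    """
    return [node ^ (1 << k) for k in range(dim)]
```

Any node that is not among these six neighbors can be reached only through intermediate nodes, which is the source of the message-passing overhead noted above.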

Advantages of programmed data flow: Programmed data flow reduces the overhead time spent on memory address computation, instruction decoding, and memory and instruction fetches. Data flow provides a mechanism for executing the directed graph of an algorithm on a suitable reconfigurable machine (J. B. Dennis, 13(11) Computer 48 (1980); P. C. Treleaven, D. R. Brownbridge, and R. P. Hopkins, 14(1) Computing Surveys (1982); and D. Oxley, B. Sauber, and M. Cornish in C. R. Vick and C. V. Ramamoorthy, eds., "Handbook of Software Engineering" (1984)).

An operator fires as soon as it has all of the input tokens it requires. In most data flow projects, which tend to be directed at general computation or artificial intelligence rather than signal processing, flexibility and the ability to perform recursive functions have higher priority than speed. The machines generally involve dynamic allocation of processors and the transmission of packets that contain information about future operations, routing, and data. Examples of current prototype machines include the MIT machine (Arvind and R. A. Iannucci, MIT Report MIT/LCS/TM-241 (1983)), the Manchester data flow machine (J. R. Gurd, C. C. Kirkham, and I. Watson, 28(1) Communications of the ACM 34 (1985)), the Japanese Sigma-1 machine (K. Hiraki, T. Shimada, and K. Nishida in R. M. Keller, ed., International Conference on Parallel Processing, IEEE Proceedings 84CH2045-3, 524 (1984)), and the Texas Instruments machine (M. Cornish, Third Conference on Digital Avionics Systems, IEEE Proceedings 19 (1979)).
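The firing rule stated at the start of this paragraph can be sketched with a small operator object that waits until every input port holds a token. This is a generic illustration of data flow firing and does not model any of the machines cited; the class name `Operator` and its `receive` method are hypothetical.

```python
import operator


class Operator:
    """A data flow node that fires once all of its input tokens are present."""

    def __init__(self, func, n_inputs):
        self.func = func
        self.values = [None] * n_inputs
        self.present = [False] * n_inputs

    def receive(self, port, value):
        """Store a token; return the result if the node fires, else None."""
        self.values[port] = value
        self.present[port] = True
        if not all(self.present):
            return None
        result = self.func(*self.values)
        # Consume the tokens so the node waits for a fresh set.
        self.present = [False] * len(self.present)
        return result
```

For example, `Operator(operator.add, 2)` returns nothing when its first token arrives and returns the sum when the second arrives, regardless of which port is filled first.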

The computer of the embodiment in effect uses programmed data flow. The data paths and the sequence of operations at each elementary processor are computed in advance to reduce the need to send overhead bits. Each elementary processor, once it has all of its required inputs, performs a prespecified operation on data from its inputs or from internal memory, as determined by local code and synchronization pulses. The aim is to obtain maximum throughput and minimum latency with a simple, predetermined control and data flow strategy.

Advantages of optical interconnection: Compared with electronic interconnections, optical interconnections have the advantages of reduced capacitive loading effects and greater immunity to mutual interference. Optics has previously been suggested for communication with VLSI, in order to overcome bandwidth pin limitations and edge-connection constraints, and for connecting multiple processors.

The communications industry is developing optical switching devices so that, when optical fibers are used, signals need not be converted to electronics and back for switching. Developments in optical spatial light modulators suggest that crossbar switches will become available at a cost, speed, and size unlikely to be matched by conventional semiconductor technology. Digital optical computers are expected to become practical eventually, and a design for solving nearest-neighbor problems is described in the inventor's copending U.S. Patent Application No. 777,660. Such computers, unlike that of the embodiment, have limited flexibility because of the interconnection systems they use, and they are not aimed at real-time processing.

[Brief explanation of drawings]

FIG. 1 is a schematic diagram of a general optical crossbar-connected parallel processor computer. FIG. 2 is a block diagram of a high-level parallel architecture. FIG. 3 is a block diagram of the optical crossbar-connected parallel signal processor of the first embodiment. FIG. 4 is a block diagram of an elementary processor of the first embodiment. FIGS. 5A and 5B are diagrams explaining the operation of an optical crossbar switch. FIG. 6 is a schematic diagram of a possible optical arrangement for a deformable mirror device optical crossbar switch. FIG. 7, in parts beginning with FIG. 7A, presents perspective views and diagrams explaining the operation of the deformable mirror device. FIGS. 8A and 8B are diagrams of the flow and crossbar switch settings for a filtering algorithm. FIGS. 9A to 9C, 10, 11A, and 11B are diagrams of the flow and crossbar switch settings for the fast Fourier transform. FIGS. 12A and 12B are diagrams of the flow and crossbar switch settings for the doubling algorithm for correlation and matrix-vector multiplication. FIG. 13 is a diagram explaining the parallel computation of linear prediction coefficients by Schur's algorithm. FIGS. 14 and 15 are diagrams of forward-chaining and backward-chaining rule-based graphs. FIG. 16 is a block diagram explaining speech recognition with an embedded systolic parser. FIGS. 17A and 17B are diagrams explaining the principle of dynamic time warping. FIG. 18 is a diagram of the flow graph for the continuous dynamic time warp algorithm. FIG. 19 is a diagram of the flow graph for end-of-word determination and recognition. FIG. 20 is a diagram of a parse tree. FIG. 21 is a diagram of the flow graph for a situation-action wait-and-see parser.
102: optical crossbar switch; 104: fiber optic link; 110: input/output device; 116: controller; 134: schlieren optics.
[Figure labels: channel stop (FIG. 7); boundary; input data; accumulated distance used; local distance used.]
Procedural amendment (formal). 1. Indication of the case: Patent Application No. 283012 of 1986 (Showa 61). 3. Person making the amendment; relationship to the case: applicant. 4. Agent.

Claims

[Claims]
1. A computer comprising: (a) a plurality of parallel processors; (b) a resettable optical crossbar switch that switchably interconnects said processors in pairs such that the output of a first processor is directed to an input terminal of a second processor; (c) a controller for synchronizing the processors and controlling the settings of the optical crossbar switch; and (d) an input/output device connected to the plurality of processors.
2. The computer according to claim 1, wherein (a) the processors perform only basic computations.
3. The computer according to claim 1, wherein (a) the crossbar switch is dynamically resettable.
4. The computer according to claim 1, wherein (a) the optical crossbar switch is a deformable mirror device with schlieren optics.
5. The computer as claimed above, wherein (a) the switch is illuminated by a light-emitting diode or laser diode activated by a processor, and the unobstructed light reflected by the switch is detected by a photodiode that drives a processor.
6. The computer according to claim 1, wherein (a) the interconnection includes a fiber optic link.
7. The computer according to claim 1, wherein (a) the plurality of processors comprises a first plurality of low-level numeric processors and a second plurality of low-level symbolic processors.
8. A computer whose hardware can be dynamically reset between parallel and pipelined architectures, comprising: (a) a plurality of processors; and (b) a resettable optical crossbar switch that switchably interconnects the processors in pairs such that the output of a first processor is directed to the input/output terminals of a second processor, whereby the setting of the switch allows the processors to be configured in parallel, as a pipeline, or as a mixture of the two.
9. A real-time speech recognizer for large vocabularies, comprising: (a) a plurality of subdictionaries organized by the parts of speech of words; (b) a word recognizer with an end-of-word detector for speech input data; and (c) a rule-based parser that (i) resolves ambiguities for the recognizer and (ii) selects the subdictionary that the word recognizer applies to the input data following a detected end of word.
10. The real-time speech recognizer according to claim 9, wherein (a) the speech input data is processed as a continuous stream of frames.