JPH05500124A - Concurrent computation/communication mechanism in SIMD architecture

Info

Publication number: JPH05500124A
Application number: JP2509527A
Authority: JP
Inventors: Hammerstrom, Daniel W.
Original assignee: Adaptive Solutions, Inc.
Priority date: 1990-05-30
Filing date: 1990-05-30
Publication date: 1993-01-14
Also published as: EP0485594A1; WO1991019256A1; EP0485594A4

Abstract

(57) [Abstract] This publication consists of application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention] Concurrent computation/communication mechanism in SIMD architecture

(Technical field) The present invention relates to computer processor architectures, and in particular to an architecture that provides concurrent computation and communication in a SIMD architecture. The architecture includes an output processor node that enables simultaneous data processing and communication within a single processor node.

(Background art) A neural network is a form of architecture that allows a computer to closely approximate human thought processes. One form of neural network architecture enables single-instruction-stream, multiple-data-stream (SIMD) operation, in which a single instruction directs many processors, and therefore many data sets, at the same time.

There are several important practical problems that cannot be solved using existing conventional algorithms executed by traditional computers. These problems are often incompletely specified and are characterized by many small constraints that require a large search space.

Computer processing of primarily perceptual information, such as computer speech recognition, computer vision and robot control, falls into this category. Traditional computational models, when used to solve problems of this kind, bog down to the point of failure under the computational load. Yet animals perform these tasks using neurons that are millions of times slower than transistors. A neuron performs a weighted sum of its inputs, which can be expressed as $\sum_j w_{ij}\, o_j$, where $w_{ij}$ is a value drawn from memory and $o_j$ is an input value. The SIMD multiply/accumulate function performs this operation.
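The weighted sum that a neuron performs can be written as a plain multiply/accumulate loop. The following is a minimal sketch, not the circuit of the invention; the function name `weighted_sum`, its parameters and the use of `double` are assumptions made for illustration only.

```c
#include <stddef.h>

/* Weighted sum for one emulated neuron: the sum over j of w[j] * o[j].
 * `weights` stands in for the values drawn from memory and `inputs` for
 * the input values. All names here are hypothetical. */
double weighted_sum(const double *weights, const double *inputs, size_t n)
{
    double acc = 0.0;                      /* accumulator */
    for (size_t j = 0; j < n; ++j) {
        acc += weights[j] * inputs[j];     /* one multiply/accumulate step */
    }
    return acc;
}
```

Each loop iteration corresponds to one multiply/accumulate step; in a SIMD array every processor would run the same step on its own weights.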

Feldman's 100-step rule, an argument for massively parallel computation, observes that a "human" cognitive process lasting about 500 milliseconds is carried out with a neuron switching time of about 5 milliseconds, that is, in roughly 100 sequential steps. Even if this "switching" time is slow, a fast system can be built from a large number of "switches". This suggests that two very different computational models are in use. It also suggests that, to build a computer that does what a nervous system does, the computer must be structured to emulate an animal nervous system. A SIMD system is a computing system designed to emulate a massively parallel neural network.

Nervous systems, and neurocomputers, are characterized by continuous, non-symbolic, massively parallel structures that tolerate input noise and hardware failure. Representations, i.e. inputs, are distributed among groups of computing elements that reach results, or conclusions, independently and then generalize and interpolate the information to arrive at the final output conclusion.

In other words, connectionist/neural networks seek a "good" solution through massively parallel computation by many small computing elements. The model is one of hypothesis generation and relaxation toward the dominant, or "most likely", hypothesis.

Search speed is more or less independent of the size of the search space. Learning is a process of incrementally changing the strength of connections (synapses), in contrast to allocating data structures. "Programming" in such a neural network is done by example.

When the connection node units are configured with full connectivity or near-full connectivity (30% or more), a large amount of time is spent waiting for the individual processors to communicate with one another after a data set has been used. Known processing units provide no mechanism that allows computation on a subsequent data set to begin before all of the processor nodes have sent the data they used previously.

(Disclosure of the invention) An object of the present invention is to provide a concurrent-use architecture that offers concurrent computation and communication capability in a SIMD structure.

Another object of the invention is to provide an output processor architecture that holds processed data while similar output processors send processed data over the transmission network.

Another object of the invention is to provide an output processor that stores processed data while the processor operates on a subsequent data set.

A further object of the invention is to provide a mechanism in the output processor for arbitrating the communication protocol on the communication network.

The concurrent-use architecture of the invention is intended for use in a single-instruction-stream, multiple-data-stream processor node that includes an input bus, an input unit, operating units and an output bus. The architecture includes an output processor having an output buffer unit that receives data from the input unit and the various operating units. The processor node stores the data from the output buffer unit and transmits it on the output bus at a selected time. A control unit is provided that directs SIMD operation and controls data exchange among the processor node, the associated output processor node and the output buffer unit. The output processor has an internal controller that directs the storage of data and its transmission on the output bus.

These and other objects and advantages of the invention will become more apparent when the following description is read in conjunction with the drawings.

(Brief description of the drawings) FIG. 1 is a schematic diagram of an array of processor nodes constructed according to the invention. FIG. 2 is a schematic diagram of the broadcast pattern of the connection nodes contained in the processor nodes of FIG. 1. FIG. 3 is a block diagram of a single processor node of the invention.

(Embodiment) Turning to the drawings, and first to FIG. 1, an array of single-instruction-stream, multiple-data-stream (SIMD) processors is shown generally at 10. Array 10 includes processor nodes 12, 14, 16 and 18, and may include as many processor nodes as a particular application requires. For simplicity, and to allow the interaction among multiple processor nodes to be described, FIG. 1 contains only four such processor nodes. Each processor node is connected to an input bus 20 and to an output bus 22. Buses 20 and 22 are shown in FIG. 1 as single-entity structures, but in some cases the input and/or output bus may be a multiple-bus structure. A controller 24 is connected to a control bus 26, which in turn is connected to each of the processor nodes. In some cases output bus 22 is connected directly to input bus 20 by line 28. Alternatively, output bus 22 may be connected to the input bus of another array of processor nodes, while the output bus of that other array may be connected to input bus 20.

As shown in FIG. 1, each processor node (PN) includes one or more connection nodes (CN), such as those indicated at 30 to 37 in processor nodes 12 to 18. A CN is associated with an emulated node in the neural network, whose state resides in the PN. Each PN may have several CNs located within it. Following conventional computer architecture notation, processor node 12 holds the state information for connection node 0 (CN0) and connection node 4 (CN4); processor node 14 holds the state information for connection node 1 (CN1) and connection node 5 (CN5); processor node 16 holds the state information for connection node 2 (CN2) and connection node 6 (CN6); and processor node 18 holds the state information for connection node 3 (CN3) and connection node 7 (CN7).

Referring briefly to FIG. 2, the broadcast pattern set up among connection nodes 0 to 7 is shown. The CNs are arranged in "layers": CN0 to CN3 comprise one layer, and CN4 to CN7 comprise a second layer. As noted above, there may be more than two layers of connection nodes in any one processor node or in an array of processor nodes. The connection nodes operate in what is called a broadcast hierarchy, in which each of connection nodes 0 to 3 broadcasts to each of connection nodes 4 to 7. An exemplary technique for constructing such a broadcast hierarchy is disclosed in U.S. Pat. No. 4,796,199 to Hammerstrom et al., issued January 3, 1989, "NEURAL-MODEL INFORMATION-HANDLING ARCHITECTURE AND METHOD", which is incorporated herein by reference.

Conceptually, the available processor nodes can be thought of as a "layer" of processors, each of which performs its function (multiply, accumulate and weight index increment) for each input on every clock, with one processor node broadcasting its output to all of the other processor nodes. By using the output processor node arrangement, it is possible to make $n^2$ connections in $n$ clocks using only a two-layer arrangement. Known conventional SIMD structures can also make $n^2$ connections in $n$ clocks, but they require a three-layer arrangement, i.e. 50% more structure.
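The count of $n^2$ connections in $n$ clocks follows from one source node broadcasting to all $n$ destinations on each clock. The sketch below illustrates only that counting argument for the four-node layers of FIG. 2; `LAYER_SIZE`, `broadcast_schedule` and the `served` table are hypothetical names, and the fixed left-to-right order is an assumption.

```c
#define LAYER_SIZE 4  /* CN0..CN3 broadcast to CN4..CN7, as in FIG. 2 */

/* served[src][dst] receives the clock (1-based) on which source `src`
 * reaches destination `dst`. Returns the total number of connections. */
int broadcast_schedule(int served[LAYER_SIZE][LAYER_SIZE])
{
    int connections = 0;
    for (int clock = 1; clock <= LAYER_SIZE; ++clock) {
        int src = clock - 1;                 /* one source owns the bus per clock */
        for (int dst = 0; dst < LAYER_SIZE; ++dst) {
            served[src][dst] = clock;        /* every destination hears it */
            ++connections;
        }
    }
    return connections;
}
```

With four nodes per layer the function reports 16 connections after 4 clocks.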

The normal sequence of events in the SIMD array shown at 10 begins with a particular piece of data being sent to each processor node. The data is operated on by a single instruction sent on input bus 20. Each processor node performs whatever operation is required on the data and then attempts to transmit the information over output bus 22. Clearly, the processor nodes cannot all transmit on the output bus at the same time, and under normal conditions a processor node would have to wait many clocks, or cycles, until every processor node in the array had placed its information on the output bus. During this waiting time the processor node is necessarily idle, because it cannot perform another function until it has transmitted. To solve this problem, an output processor, or output buffer, such as those shown at 38, 40, 42 and 44, is included in the architecture of each processor node. The output buffer receives information from the associated connection nodes and holds this information, or data, until such time as its processor node receives clearance to transmit on output bus 22.

Because the data is held in the output buffer, the rest of the processor node can perform other functions while the output processor node waits to transmit.

To control arbitration among the various processor nodes so that each node transmits on output bus 22 at the appropriate time, each processor node includes a flip-flop, such as flip-flops 46, 48, 50 and 52. The flip-flops are also referred to herein as arbitration signal generators.

Turning now to FIG. 3, the remaining components of a single processor node, such as processor node 12, are shown in greater detail. Node 12 includes an input unit 54 that is connected to input bus 20 and output bus 22. Again, only one input bus and one output bus are shown for simplicity. If multiple input and/or output buses are provided, input unit 54 and the other units described herein have a connection to each input and output bus. A processor node controller 56 is provided to establish the operating parameters for each processor node. An addition unit 58 performs addition, receives input from input unit 54 and multiplier 60, and is connected to the input and output buses.

Register unit 62 contains a register array which, in the preferred embodiment of the architecture, includes 32 16-bit registers. Many other configurations can also be used. A weight address generation unit 64 computes the next address for the weight memory unit 66. In the preferred embodiment, this memory address can be set in one of two ways: (1) by writing directly to the weight address register, or (2) by asserting a command that adds the contents of the weight offset register to the current contents of the memory address register, thereby producing a new address. Node controller 56, addition unit 58, multiplier 60, register unit 62 and weight address generation unit 64 comprise what are referred to herein as the operating units.

An output unit 68 is provided to store data before the data is sent onto output bus 22. Output unit 68 includes output processor node 38, which receives data from the remainder of the output unit before the data is transmitted. The data is sent to output bus 22 by one or more connection nodes, such as connection node 30 or 34, that are part of the output unit. Output unit 68 includes an output buffer register that receives the initially processed data. Once this data has been loaded into the register, the output buffer unit is "armed". Once armed, the output buffer operates independently of, but synchronously with, the rest of the processor node.
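The two ways of setting the weight memory address described above can be sketched as a pair of small functions. This is an illustrative model under stated assumptions, not the patented circuit: the `WeightAddrGen` type, the function names and the 16-bit register width are all hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the weight address generation unit. */
typedef struct {
    uint16_t address;  /* weight (memory) address register */
    uint16_t offset;   /* weight offset register */
} WeightAddrGen;

/* Way (1): write the address register directly. */
void wag_write_address(WeightAddrGen *g, uint16_t addr)
{
    g->address = addr;
}

/* Way (2): add the offset register to the current address,
 * producing and returning the new address. */
uint16_t wag_step(WeightAddrGen *g)
{
    g->address = (uint16_t)(g->address + g->offset);
    return g->address;
}
```

Stepping by a fixed offset lets a node walk through its stored weights one per clock without the controller supplying each address.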

Because only one PN can use the output bus at a time, an arbitration process is provided among the output buffers of the different processor nodes to determine which output buffer, or output processor node, transmits first on output bus 22. When one processor node has transmitted, a handshake mechanism, indicated by arrow 70 extending from a flip-flop such as flip-flop 46 in processor node 12, signals the next processor node to transmit. The arrangement of FIG. 1 shows this hand-off passing to the immediately adjacent node, but that is not necessarily what occurs in an actual processor node array. The arbitration process may determine some other transmission order.

Arbitration and transmission occur only when the transmit signal is asserted by controller 24, which allows transmission to be synchronized with the other operations performed in the array. Several modes of arbitration/data transfer are provided in the architecture, such as sequential, global and direct. Controller 24 and flip-flops 46, 48, 50 and 52 comprise what are referred to herein as arbitration means, operable to determine at what point in processor operation a signal is sent from the output processor node. Sequential mode, as its name implies, requires that a control signal on control bus 26 move from one processor node to the next.

Global arbitration enables transmission from all of the processor nodes, but uses a signal traveling on the control bus that operates the processor nodes as a daisy chain. The signal passes by the other processor nodes while enabling one processor node to transmit, so that the others send no data in that particular cycle, or clock.

Direct arbitration is used in situations where data is written into the output buffer and is transmitted on the output bus immediately, on the very next transmit cycle.
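The sequential mode described above, in which each node hands the bus to its right-hand neighbor after transmitting, can be modeled in a few lines. The sketch below is an assumption-laden illustration: `OutputProcessor`, `bus_clock`, `NUM_PN` and the field names are hypothetical, and the flip-flop is reduced to a single `my_turn` flag.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_PN 4  /* four processor nodes, as in FIG. 1 */

/* Hypothetical model of one node's output processor. */
typedef struct {
    uint16_t outbuf;   /* output buffer register */
    bool armed;        /* buffer holds data that is ready to send */
    bool my_turn;      /* arbitration signal received from the left neighbor */
} OutputProcessor;

/* One bus clock in sequential mode. When `xmit` is asserted, the armed node
 * holding the arbitration signal drives the bus and passes the signal to
 * its right-hand neighbor. Returns the index of the node that transmitted,
 * or -1 if no node transmitted on this clock. */
int bus_clock(OutputProcessor pn[NUM_PN], bool xmit, uint16_t *bus)
{
    if (!xmit) {
        return -1;                         /* controller has not enabled sending */
    }
    for (int i = 0; i < NUM_PN; ++i) {
        if (pn[i].my_turn && pn[i].armed) {
            *bus = pn[i].outbuf;           /* drive the output bus */
            pn[i].armed = false;           /* buffer is free again */
            pn[i].my_turn = false;
            if (i + 1 < NUM_PN) {
                pn[i + 1].my_turn = true;  /* signal the next node */
            }
            return i;
        }
    }
    return -1;
}
```

Because the buffer, not the node's arithmetic units, waits for `my_turn`, the rest of the node is free to start on its next data set in the meantime.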

In a SIMD structure, a single instruction passes over control bus 26 to all of the processor nodes. The instruction is executed on the values present in the individual processor nodes, which are input at the same time on input bus 20. Each output buffer has its own internal controller, shown at 38a, 40a, 42a and 44a in FIG. 1. These controllers are described in the code and structure below, which explain how each output processor operates separately from SIMD controller 24.

In conventional architectures, outputs are sent on output bus 22 from the individual processor nodes in a predetermined transmission sequence. This, of course, requires a processor node to wait until the other processor nodes in the array have finished transmitting before it can receive a new instruction set or new data. Providing an output buffer in each PN provides a place to store processed data while the processor node waits for its turn on the output bus. Other operations can then take place in the processor node: new data may be loaded, or existing data may be operated on by a new instruction set.

Line 28 can be enabled if output data from one processor node is required to become input data for that processor node or for other processor nodes. Such enabling is performed by conventional microcircuitry and is shown schematically in FIG. 2.

The actual operation of the individual processors in array 10 is described by the following instruction set, which is presented in software form but is built into the physical design of the integrated circuit containing the output processor node of the invention. The code below is a simplification of the code describing the actual CMOS implementation of the PN structure in a neurocomputer chip. The code shown below is in the C programming language, augmented by certain predefined macros. The code is used as a register-transfer-level description language for the actual implementation of the circuit described herein. Bold characters denote signals, hardware or firmware elements, or phases or clock cycles.

The variables ph1 and ph2 denote the two phases of the two-phase non-overlapping clock used to implement dynamic MOS devices. The suffix "_B" denotes a bus (more than one signal line), and "_1" denotes a dynamic signal that is valid only during ph1. These can be combined arbitrarily.

The output unit (OBUNIT) 68 contains the output buffer. The PN output buffer interface is used for output; it allows output to proceed independently of input, and it is used during various recurrent and feedback operations.

if (reset) { outst = 0; seqrght = 0; obarmd = 0; seqgo_ID = 0; }

This step initializes the variables used during processing. During arbitration of the output buffers for the output bus, the arbitration signal from the PN to the left is the sequential arbitration OK signal. leftff is the signal from flip-flop 46, also referred to herein as the arbitration signal generator means, in PN 12 on the left; it indicates that PN 12 has transmitted and that it is now the turn of PN 14. When the output buffer is written, the obarmd flip-flop 46 is set. obarmd indicates that the output buffer is armed and ready to transmit.

This transmission takes place independently of the rest of the computation inside the PN, and it forms the core of the output buffer architecture of the invention.

If the transmission is two bytes, the low-order byte is sent first. This occurs when transmit control is asserted and the output buffer of the PN is armed. The outst flip-flop is used to indicate that this is the second byte of a two-byte transmission. All of the arbitration decisions described here operate in sequential mode, in which a PN signals its right-hand neighbor when it has completed its transmission. xmit is the SIMD command that tells the output buffer of a particular PN to transmit (provided it has received the arbitration signal). The seqarb flip-flop indicates that this arbitration mode is enabled, and outmd2 indicates that this is a two-byte transmission.

if ((ph1) ANDb (obarmd) ANDb (xmit)) { if (seqarb) { if (outst == 1) { outbus_B2 = ((outbuf_B AND 0xFF00L) >> 8); outst = 0; obarmd = 0; seqrght = 1;

The preceding sequence implements the outbuf state machine for sequential arbitration.
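The two-byte transmission handled by the outbuf state machine can be restated as ordinary C. The sketch below borrows the signal names `outst`, `obarmd` and `seqrght` from the description, but the struct, the function `send_two_byte_step` and the first-byte branch are assumptions filled in for illustration; clock phases and arbitration mode selection are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of one output buffer's state. */
typedef struct {
    uint16_t outbuf;   /* 16-bit value waiting to be sent */
    bool obarmd;       /* output buffer is armed (ready to send) */
    bool outst;        /* true when the next byte is the second byte */
    bool seqrght;      /* signal telling the right-hand neighbor to send */
} OutBufState;

/* One transmit clock of a two-byte send. Returns true if a byte was
 * driven onto `bus` during this clock. */
bool send_two_byte_step(OutBufState *s, bool xmit, uint8_t *bus)
{
    if (!(s->obarmd && xmit)) {
        return false;                                   /* nothing to send yet */
    }
    if (!s->outst) {
        *bus = (uint8_t)(s->outbuf & 0x00FFu);          /* low-order byte first */
        s->outst = true;
    } else {
        *bus = (uint8_t)((s->outbuf & 0xFF00u) >> 8);   /* then the high byte */
        s->outst = false;
        s->obarmd = false;                              /* buffer is free again */
        s->seqrght = true;                              /* hand over to the right */
    }
    return true;
}
```

For a buffered value of 0xABCD, two enabled clocks drive 0xCD and then 0xAB, after which the buffer is disarmed and the right-hand neighbor is signaled.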

seqrght signals the next PN to transmit on the next clock.

else if (seqgo_ID) { if (outmd2) {

After all of the PNs have transmitted on output bus 22, the arbitration protocol is reset and the system is queried to determine whether more output is to follow, i.e.:

if (reset) obarmd = 0; obarmd = 1; }
if ((ph2) ANDb (vcval) ANDb (rgctl_B2 == F_RGBBUS) ANDb (r_B == F_OUTBUF)) { obarmd = 1; }

If the output buffer has information to be transmitted, obarmd is set and the cycle repeats. vcval indicates a valid output buffer control signal. A write to register OUTBUF sets obarmd, which turns on (enables) the PN output buffer.

The operation of the architecture of array 10 can be summarized as follows.

Clock 0: Initially, CN0, CN1, CN2 and CN3 each have a value held in register 62 of processor nodes 12, 14, 16 and 18 respectively. These values are written, over the appropriate lines, into the respective output buffer in each PN.

Clock 1: Output buffer 38 in processor node 12 sends the output of CN0 onto output bus 22. This value is read by processor nodes 12, 14, 16 and 18 via line 28 and input bus 20. Each processor node fetches a weight from its weight memory unit 66 and multiplies the output of CN0 by that weight, for example $w_{40}$, $w_{50}$, $w_{60}$ and $w_{70}$. Output buffer 38 and processor node 12 then send an arbitration signal, via flip-flop 46 and line 70, to output buffer 40 in the next processor node, processor node 14, indicating that it may now transmit on output bus 22.

Clock 2: Output buffer 40 in processor node 14 sends the output of CN1 onto output bus 22. This value is read by processor nodes 12, 14, 16 and 18 via line 28 and input bus 20. Each processor node fetches a weight from its weight memory unit 66 and multiplies the output of CN1 by that weight, for example $w_{41}$, $w_{51}$, $w_{61}$ and $w_{71}$. Output buffer 40 and processor node 14 then signal output buffer 42 in processor node 16, via flip-flop 48 and line 70, that it may now transmit.

Similar operations occur during successive clock cycles until the data has been processed.

Another way to look at processor node function is to consider each processor node as having a multiplier 60, an addition unit 58, and two lookup tables, namely the weight address generation unit 64 and the weight memory 66. Each node receives an input and performs a multiply-accumulate on every clock. After the accumulation loop, each processor node moves its output to its output buffer and waits for its turn to broadcast.

This step can be expressed as follows:

$$o_j = \sum_{i=1}^{n} w_{ji}\, o_i$$

Thus the output o_j equals the sum of the weights w_ji, drawn from weight memory 66, each multiplied by the corresponding input value o_i, where the full function is stored in weight address generation unit 64.
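As a worked instance of the weighted sum described above, assume four sending connection nodes CN0 to CN3 and four receiving connection nodes CN4 to CN7. This indexing is an assumption inferred from the weight names W40, W41 and so on; weights such as w_42 and w_43 follow the same pattern but are not named in the text. Under that assumption the accumulation expands to:

```latex
\begin{aligned}
o_4 &= w_{40}\,o_0 + w_{41}\,o_1 + w_{42}\,o_2 + w_{43}\,o_3 \\
o_5 &= w_{50}\,o_0 + w_{51}\,o_1 + w_{52}\,o_2 + w_{53}\,o_3 \\
o_6 &= w_{60}\,o_0 + w_{61}\,o_1 + w_{62}\,o_2 + w_{63}\,o_3 \\
o_7 &= w_{70}\,o_0 + w_{71}\,o_1 + w_{72}\,o_2 + w_{73}\,o_3
\end{aligned}
```

Each column corresponds to one clock: the first column is the set of products formed while CN0 is broadcasting, the second while CN1 is broadcasting, and so on. Four clocks therefore complete 4 x 4 = 16 connections, which matches the n² connections in n clocks recited in the claims.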

Although a preferred embodiment of the invention has been disclosed, it will be understood that variations and modifications may be made without departing from the scope of the invention as defined in the claims.

Industrial Applicability
Processors constructed according to the invention are useful in neural network systems that can be used to simulate functions of the human brain in analysis and decision-making applications.

FIG. 2
International Search Report

Claims

1. In a SIMD processor node (12, 14, 16, 18) having an input bus (20), an input unit (54), operating units (56-66) and an output bus (22), an output processor architecture comprising: an output unit (68) for receiving data from the input unit (54) and the operating units (56-66); an output buffer (38), located in a selected processor node (12), for storing data from the output unit (68) and transmitting it at a selected time; and a control unit (56) for controlling the exchange of data between the processor node (12), the output buffer (38) and the output unit (68).

2. The architecture of claim 1, further comprising arbitration means (46) for determining at what point in the operation of the processor a signal is transmitted from the output buffer (38).

3. The architecture of claim 2, wherein said arbitration means includes an arbitration signal generator (46), located in each processor node (12), that generates a control signal for the next processor node (14).

4. The architecture of claim 1, wherein each output buffer includes a separate internal controller (38a) for controlling the ordering of data storage and the subsequent transmission to the output bus (22).

5. The architecture of claim 1, including a plurality of connection nodes (30, 34) located within each processor node, the connection nodes being configured to perform n² connections in n clocks.

6. The architecture of claim 5, wherein, during each clock, a selected connection node (30) broadcasts to the set of all selected connection nodes (34, 35, 36, 37) connected via the processor node.

7. In a SIMD processor node (12, 14, 16, 18) having an input bus (20), an input unit (54), operating units (58-66) and an output bus (22), an output processor architecture comprising: an output unit (68) for receiving data from the input unit (54) and the operating units (58-66); an output buffer (38), located in a selected processor node (12), for storing data from the output unit (68) and transmitting it at a selected time; a control unit (56) for controlling the exchange of data between the processor node (12) and its associated output buffer (38) and output unit (68); and an arbitration signal generator (46), located in each processor node, that generates a signal instructing the next processor node (14) to transmit.

8. The architecture of claim 7, wherein each output buffer (38) includes a separate internal controller (38a) for controlling the ordering of data storage and the subsequent transmission to the output bus (22).

9. The architecture of claim 7, including a plurality of connection nodes (30, 34) located within each processor node (12), the connection nodes being configured to perform n² connections in n clocks.

10. The architecture of claim 9, wherein, during each clock, the selected connection node (30) broadcasts to the set of all selected connection nodes (34, 35, 36, 37) connected via the processor node.