JP2741793B2 - Neural network processor

Info

Publication number: JP2741793B2
Application number: JP51653091A
Authority: JP
Inventors: Hideki Yoneda; Edgar Sanchez-Sinencio
Original assignee: Texas A&M University System
Current assignee: Texas A&M University System
Priority date: 1991-10-17
Filing date: 1991-10-17
Publication date: 1998-04-22
Anticipated expiration: 2013-04-22

Description

TECHNICAL FIELD

The present invention relates to the hardware implementation of neural networks.

BACKGROUND ART

In recent years, the neural network, an approach based on a concept entirely different from earlier methods, has emerged as a way of taking in given information, performing some form of recognition on it, and outputting the recognition result, and it is being applied in a variety of fields. A neural network models the workings of the human brain, and many different models exist.

Neural networks were proposed as mathematical algorithms and have complicated structures, so they have conventionally been realized by simulation on a computer. Simulation on a computer, however, is slow, which is a problem in practice. Research on neural networks has since advanced and hardware implementations have been reported, but these were limited to one or two layers.

One example is the neocognitron, which is one neural network model. Because the neocognitron has a complicated structure even among neural networks, its hardware implementation has received little study so far, although there is one paper from MIT on the subject.

That paper was presented at the 1990 International Solid-State Circuits Conference (ISSCC) and also in a poster session at NIPS (Neural Information Processing Systems) '90. The structure is simple: it combines 143 CCD arrays with seven MDACs (multiplying D/A converters), and most of the circuitry is digital. Basically, both the input data and the coefficient data are stored in digital circuits, and multiplication is performed by the semi-analog MDACs. Because a division circuit could not be built successfully with this approach, only the first layer was realized. The integration density is poor: seven multipliers occupy 29 mm².

Because hardware implementation of neural networks has been so difficult, methods of simulating neural networks of three or more layers at high speed have been studied as an alternative. One such method is simulation by a program running on a parallel computer. With this method, however, the computational topology often does not match the architecture of the particular parallel computer, so the efficiency of data transfer between processing elements drops. A further problem is that even if speed is gained by using a parallel computer with many processing elements, it is difficult to improve cost performance.

DISCLOSURE OF THE INVENTION

In view of the above, an object of the present invention is to provide a technique that allows even a neural network with a complicated structure, such as the neocognitron, to be implemented in hardware.

The neural network processor of the present invention is a neural network processor that realizes a feedforward neural network having a multilayer structure, characterized in that the processing elements corresponding to the neurons of the neural network are MOS analog circuits whose input and output variables are voltages, and in that a large number of these MOS analog circuits form a systolic array.

Here, a "feedforward neural network" means a network in which, in normal use (that is, apart from signal transmission in a learning mode such as backpropagation), every input to every neuron is an output of the preceding layer, and there is no signal transmission from a later layer to an earlier layer or between neurons within the same layer.

The neural network processor of the present invention is typically configured as a processor that realizes a neocognitron. When it is configured for a neocognitron, each of the MOS analog circuits preferably comprises: a plurality of numerator Gilbert multipliers whose output terminals are connected together and which perform the sum-of-products operation for the numerator; a plurality of denominator Gilbert multipliers whose output terminals are connected together and which perform the sum-of-products operation for the denominator; and a divider that has a first input terminal connected to the output terminals of the numerator Gilbert multipliers and a second input terminal connected to the output terminals of the denominator Gilbert multipliers, and that outputs the result as a voltage.

This divider may be formed, for example, by combining a current-mode divider, which outputs its result as a current, with a current-to-voltage converter.

With the approach of the present invention, a neural network system can be built with far better cost performance than a parallel computer whose processing elements are digital circuits, because analog circuits are used as the processing elements. When integrated as an LSI, an analog circuit generally needs less silicon area than a digital circuit. A multiplier, for example, can be built from about 10 transistors as an analog circuit, whereas a digital circuit needs 1,000 to 10,000 transistors for comparable resolution. Analog circuits normally use larger transistors than digital circuits, so the area ratio on the chip cannot be inferred from transistor counts alone; even so, using analog circuits makes the processing elements substantially smaller.

The present invention does not, however, simply substitute analog circuits into the hardware implementations with digital processing elements that had previously been attempted without success. When an algorithm with a complicated structure, such as a neural network, is implemented in hardware with analog circuits, the errors inherent in analog circuits normally accumulate until the result is unusable. For this reason there appear to have been almost no attempts even to build a multilayer neural network from analog circuits, and at least no successful example has been reported.

The inventors recognized that a neural network only has to recognize, for example, an image in the end, and that it therefore tolerates larger errors than an ordinary analog arithmetic circuit. In building the neural network from analog circuits, they further regarded the network as a three-dimensional systolic array and applied projection and scheduling, techniques used in digital circuits, to reduce the number of processing elements drastically, which made it possible to build the neural network from analog circuitry of a realistic scale. A systolic array must be pipelined (systolized) to raise throughput. Because the present invention uses MOS analog circuits whose processing-element inputs and outputs are analog voltages, an analog voltage can be stored on the parasitic capacitance of a switch and a transistor, which makes pipelining easy. In addition, an analog voltage can be delivered to many processing elements over a single wire, which is efficient.

Here, a systolic array is an array in which all the processing elements are relatively simple arithmetic units of essentially identical structure and which is pipelined.

In general, a systolic array has the following features.

(1) Many processing elements of identical structure are connected together to achieve performance that an ordinary computer cannot.

(2) No bus is needed; data is passed in one direction and locally.

(3) Each processing element holds only a small amount of data, and holds it for one pipeline period or at most a few. No memory system that holds large amounts of data for long periods, as found in ordinary computer systems, is needed.

(4) The number of processing elements required can be reduced by the projection technique.

When a systolic array is built from digital circuits, feature (1) is the most important, but with analog circuits features (2) and (3) are the important ones. The data handled by an analog circuit are analog values. If an analog value is represented by a voltage, for example, transferring analog data stored in one register or memory to another register or memory over a bus takes longer than the digital equivalent. This is because a bus generally has a large parasitic capacitance, so charging the bus itself to a voltage that matches the source analog voltage with sufficient accuracy takes considerable time. In some cases transferring analog data over the bus takes longer than the computation itself, and the analog transfer rate then limits the performance of the whole circuit.
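The bus-charging argument can be illustrated with a first-order RC estimate. The sketch below is not from the patent: it assumes the driver and wire behave as a single RC stage, and the resistance and capacitance values are hypothetical, chosen only to contrast a long shared bus with a short local wire.

```python
import math

def settle_time(resistance_ohm, capacitance_f, bits):
    """Time for a first-order RC node to settle to within 1/2**bits of its
    final value: exp(-t / RC) = 2**-bits  =>  t = RC * bits * ln 2."""
    return resistance_ohm * capacitance_f * bits * math.log(2)

# Hypothetical values: same 10 kOhm driver, 8-bit-equivalent accuracy.
bus_time = settle_time(10e3, 5e-12, 8)      # long shared bus, assumed 5 pF
local_time = settle_time(10e3, 50e-15, 8)   # short local wire, assumed 50 fF

print(f"bus: {bus_time * 1e9:.1f} ns, local: {local_time * 1e9:.2f} ns")
```

Under these assumptions the settling time scales linearly with the parasitic capacitance and with the required accuracy in bits, which is why local, bus-free connections suit analog processing elements.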

Moreover, unlike a digital circuit, an analog circuit has great difficulty holding analog data accurately. A large-scale arithmetic circuit built from analog circuits must therefore use an architecture that relies on memory as little as possible. In a digital system, memory is cheap compared with arithmetic units, which is why the von Neumann architecture has been so widely adopted for computers. In an analog system, by contrast, memory tends to require more hardware than the arithmetic units and is therefore expensive. Technically, too, it is difficult to build a memory that holds analog data accurately for a long time. Unless an architecture that eliminates memory as far as possible is adopted, implementation itself becomes difficult.

The combination of features (2) and (3) with analog circuits is thus what matters, and its effect is very different from merely replacing the processing elements of a digital systolic array with analog circuits. Only by exploiting features (2) and (3) of the systolic array does a large-scale analog computing circuit become possible.

When the neural network model to be implemented in hardware is the neocognitron, the connections between processing elements of the projected systolic array are local, which allows a more efficient layout on silicon. This is because each neuron in the neocognitron receives inputs only from nearby neurons in the preceding layer, so signal transmission remains local on the projected systolic array as well.

Among the equations (given later) that express the operation of a neocognitron neuron (processing element), simplifying the equation for the U_S layers gives the U_S-layer neurons and the U_C-layer neurons equations of the same form. As described later, each is a fraction whose numerator and denominator are each formed by a sum of several products.

When Gilbert multipliers are used to perform the sum-of-products operations for the numerator and the denominator, the inputs of the processing element can be applied directly to the multipliers, because a Gilbert multiplier takes analog voltages as inputs. Its output is a current, so connecting the outputs of several multipliers to a single wire performs the addition and yields the sum of products. Division is performed by feeding the currents that represent the numerator and denominator sums of products to the two inputs of a current-input divider. Because the output of the processing element must be a voltage, a current-input, voltage-output divider is used, formed for example by combining a current-mode divider with a current-to-voltage converter. Analog circuits that perform the sum of products and the division can also be built from operational amplifiers, but the combination above is more compact.
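The signal path just described (voltage inputs, products summed as currents, a current-mode division, conversion back to a voltage) can be sketched as an idealized behavioral model. This is an illustration only, not the patent's circuit or its exact equations, which appear later in the description. The function name, the `gain` parameter, and the choice to feed the same inputs to both sums are assumptions.

```python
def analog_neuron(inputs, num_coeffs, den_coeffs, gain=1.0):
    """Idealized processing element: ratio of two sums of products."""
    if not (len(inputs) == len(num_coeffs) == len(den_coeffs)):
        raise ValueError("inputs and coefficient lists must have equal length")
    # Each multiplier contributes one product as a current; wiring the
    # multiplier outputs together adds the currents.
    i_num = sum(a * x for a, x in zip(num_coeffs, inputs))
    i_den = sum(b * x for b, x in zip(den_coeffs, inputs))
    if i_den == 0:
        raise ZeroDivisionError("denominator current is zero")
    # Current-mode division and current-to-voltage conversion,
    # lumped into a single (hypothetical) gain factor.
    return gain * i_num / i_den

# Nine inputs, as for a 3 x 3 neighborhood: a horizontal bar.
x = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
a = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]  # numerator coefficients
b = [0.5] * 9                                      # denominator coefficients
v_out = analog_neuron(x, a, b)
print(v_out)
```

The model ignores offsets, nonlinearity, and mismatch, which are exactly the analog error sources the description says a neural network can tolerate.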

As described above, the neural network processor of the present invention forms the processing elements corresponding to neurons from MOS analog circuits whose input and output variables are voltages, and forms a systolic array from a large number of these circuits. Each processing element can therefore be made much smaller and fewer processing elements are needed, which yields hardware with high cost performance.

When the neural network model to be implemented in hardware is the neocognitron, each neuron receives inputs only from nearby neurons in the preceding layer, so signal transmission is local on the projected systolic array as well, and a more efficient layout on silicon becomes possible.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows examples of numerals that the neocognitron can recognize correctly.

FIG. 2 is a schematic diagram of the three-dimensional structure of the neocognitron.

FIG. 3 is a schematic diagram showing an example of the windows of the neocognitron.

FIG. 4 is a schematic diagram showing the local connections of the neocognitron.

FIG. 5 is a schematic diagram showing that later layers of the neocognitron are smaller.

FIG. 6 is a schematic diagram showing the concept of pattern recognition in the neocognitron.

FIG. 7 is a schematic diagram showing the 12 coefficient sets used in the U_S1 layer.

FIG. 8 is a schematic diagram showing an example of the coefficient sets used in the U_S2 layer.

FIG. 9 shows an example of the operation performed in each neuron.

FIG. 10 shows an example of projection and scheduling.

FIG. 11 is a schematic diagram showing the analog pipeline registers and their connections.

FIG. 12 shows another example of projection and scheduling.

FIG. 13 is an overall configuration diagram of the hardware system.

FIG. 14 shows a typical arrangement of analog neurons.

FIG. 15 is a block diagram of a pipeline stage.

FIG. 16 is a block diagram of a 9-input analog neuron.

FIG. 17 is a circuit diagram of the current-mode divider.

FIG. 18 is a circuit diagram of the Gilbert multiplier.

FIG. 19 is a circuit diagram of the current-to-voltage converter.

FIG. 20 shows the input patterns used for measurement and simulation of a processing element with 3 × 3 inputs.

FIG. 21 is a graph of measurement results for the processing element implemented in hardware.

FIG. 22 is a graph of simulation results for the processing element obtained with an analog simulator.

FIG. 23 is a graph comparing the measurement, simulation, and calculation results.

BEST MODE FOR CARRYING OUT THE INVENTION

Embodiments of the present invention are described below.

As stated above, the present invention applies to feedforward neural networks having a multilayer structure and is not limited to the neocognitron. The description here, however, uses the neocognitron, which is one example of such a network and one of the neural networks to which the invention is best suited.

The neocognitron algorithm itself is already known (see [1] K. Fukushima, "Neural network model for selective attention in visual pattern recognition and associative recall", Applied Optics, Vol. 26, No. 23, December 1987; [2] K. Fukushima, "A Neural Network for Visual Pattern Recognition", Computer, pp. 65-75, March 1988; [3] K. Fukushima, "Analysis of the Process of Visual Pattern Recognition by the Neocognitron", Neural Networks, Vol. 2, pp. 413-420, 1989; [4] K. Fukushima, "Neocognitron: A Hierarchical Neural Network Capable of Visual Pattern Recognition", Neural Networks, Vol. 1, pp. 119-130, 1988). The neocognitron itself is therefore described only briefly below, followed by a description of its hardware implementation.

The neocognitron was proposed as an improvement on the cognitron, a neural network model for image recognition (see [5] K. Fukushima, "Cognitron: A self-organizing multi-layered neural network", Biol. Cyber., Vol. 20, pp. 121-136, 1975). By adopting a hierarchical structure and adding position-error-absorbing layers called U_C layers, it is robust to shifts in position and deformation of the input image, making it a neural network ideally suited to image recognition. With the neocognitron, the numerals shown in FIG. 1, for example, can be distinguished from one another and recognized, and even a deformed numeral such as the 4′ shown in that figure is recognized correctly.

Most multilayer neural networks other than the neocognitron (see [6] M. L. Minsky, "Perceptron", The MIT Press, 1969; [7] R. P. Lippmann, "An introduction to computing with neural nets", IEEE ASSP Magazine, pp. 4-22, April 1987; [8] B. Kosko, "Bidirectional Associative Memories", IEEE Trans. on Systems, Man and Cybernetics, Vol. 18, No. 1, pp. 46-60, January/February 1988; [9] G. A. Carpenter, "Neural Network Models for Pattern Recognition and Associative Memory", Neural Networks, Vol. 2, pp. 243-257, 1989; [10] Moises E. Robinson G., Hideki Yoneda, Edgar Sanchez-Sinencio, "A Modular VLSI Design of a CMOS Hamming Network", IEEE/ISCAS 91, pp. 1920-1923, June 1991) consist of two or three layers. The neocognitron is more complex than these and typically consists of four or more two-dimensional layers. In the examples in references [2] and [3], for instance, eight layers are used in addition to the input layer. As the four-layer example in FIG. 2 shows, these two-dimensional layers shrink progressively until the final, output layer consists of a small number of neurons. Within each layer there are subgroups of neurons called windows. Each window extracts one particular partial shape from the input image, so when N different partial shapes must be extracted, at least N windows are required.

FIG. 3 is a schematic diagram showing an example of the windows of a neocognitron for handwritten character recognition. The U_S1 layer contains 12 windows, each of which extracts a feature such as a horizontal line, a vertical line, or a slanted line. The size of a window is determined by the size of the windows in the preceding layer (for example, the U_S1 layer as seen from the U_C1 layer) and by the size of that layer's input region. To extract its partial shape, each window needs a set of coefficients a_{k,i,j,k′}. This set of coefficients corresponds to the "connection strength" between neurons in a real neural network such as the human brain. In a_{k,i,j,k′}, i and j denote the displacement in the x and y directions, respectively, from the adjacent neuron in the preceding layer; k is the number of the window within the layer, and k′ is the number of the window within the preceding layer. Because all neurons in a window extract the same partial shape, the same set of coefficients a_{k,i,j,k′} is used for every neuron in that window. The windows in each layer merge the partial shapes extracted in the preceding layer and so extract partial shapes that are more complex and cover a larger part of the input pattern. The later the layer, and hence the larger the part of the input pattern whose shape a window extracts, the smaller the window's area (the number of neurons it contains); in the final layer each window consists of a single neuron that outputs the final result. FIG. 3 shows this: each of the 10 neurons in the U_S3 layer forms a window by itself. In the example of FIG. 3, each of the 10 neurons in the U_S3 layer is responsible for recognizing one of the 10 numerals the neocognitron is to recognize, and the numeral corresponding to the neuron with the largest output is taken as the recognition result. To determine which partial shape each window extracts, a set of coefficients a_{k,i,j,k′} must be set as the initial state before learning is carried out.
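The final decision rule described above, in which the numeral whose output neuron has the largest value is the recognition result, amounts to an argmax over the 10 outputs. A minimal sketch follows; the function name and the assumption that neuron index d corresponds to numeral d are illustrative and not stated in the patent.

```python
def recognized_numeral(final_outputs):
    """Return the index of the final-layer neuron with the largest output.

    Assumes (hypothetically) that neuron d stands for numeral d.
    """
    if len(final_outputs) != 10:
        raise ValueError("expected one output per numeral (10 values)")
    return max(range(10), key=lambda d: final_outputs[d])

outputs = [0.1, 0.0, 0.2, 0.1, 0.9, 0.3, 0.0, 0.1, 0.2, 0.1]
result = recognized_numeral(outputs)
print(result)
```

Because only the ordering of the outputs matters, moderate analog errors in the individual values leave the result unchanged, which is consistent with the error-tolerance argument made earlier in the description.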

In the neocognitron, as shown in FIG. 4, the inputs to each neuron come only from the neuron at the corresponding position in a window of the preceding layer and from the neurons in its neighborhood. What counts as the "neighborhood" differs from layer to layer, but it is usually about 3 × 3, 5 × 5, or 7 × 7. As FIG. 5 shows, this connectivity means that the further back the layer, the more the partial shapes extracted in earlier layers are merged into partial shapes covering a larger region, and the window size shrinks correspondingly from layer to layer. In FIG. 5, for a window in the later layer whose neurons have connections from a 3 × 3 neighborhood, a 5 × 5 window in the preceding layer yields a 3 × 3 window in the later layer. At the same time, a neuron in the later layer represents a larger sub-region of the input layer than a neuron in the preceding layer does.
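The shrinking of the windows can be checked with a small calculation. The sketch below assumes that every neuron needs its full neighborhood inside the preceding window and that neighboring neurons are one position apart (no padding, unit stride); the patent states only the 5 × 5 to 3 × 3 example, so the general formula and the function name are assumptions.

```python
def next_window_size(prev_size, neighborhood):
    """Side length of a window fed by an n x n neighborhood of the
    preceding window, assuming no padding and unit stride."""
    size = prev_size - neighborhood + 1
    if size < 1:
        raise ValueError("neighborhood is larger than the preceding window")
    return size

# The example of FIG. 5: a 5 x 5 window seen through a 3 x 3 neighborhood.
print(next_window_size(5, 3))

# A hypothetical chain of layers, each using a 3 x 3 neighborhood.
sizes = [9]
for _ in range(4):
    sizes.append(next_window_size(sizes[-1], 3))
print(sizes)
```

Under these assumptions each 3 × 3 stage removes two positions per side, so a chain of such stages ends in a single-neuron window, as in the final layer described for FIG. 3.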

The neocognitron has no feedback loops. The signals input to each neuron are only those output by nearby neurons in the preceding layer, so signals always travel from earlier layers to later layers; no signal travels from a later layer to an earlier one or from neuron to neuron within a layer.

As stated above, each layer contains subgroups of neurons called windows. A window has a two-dimensional rectangular structure in x and y, with one neuron at each intersection of equally spaced parallel lines drawn in the x and y directions. A neocognitron contains a very large number of neurons, but because all neurons in a window use the same coefficient set, the number of distinct coefficient values is comparatively small. If the coefficients of a window are changed during learning, the changed coefficients are used for every neuron in that window; in other words, the result of learning applies identically to all neurons in the window.

The neocognitron has two types of layers. A U_S layer extracts partial shapes of the input image, and a U_C layer integrates the extraction results of the U_S layer and absorbs positional errors of the input image.

As shown in FIG. 3, the U_S layers and U_C layers are arranged alternately. This structure allows the neocognitron to absorb positional errors and deformations of the input image.

FIG. 6 is a schematic diagram of the concept of pattern recognition in the neocognitron. In the earlier layers, simple partial shapes such as horizontal and vertical lines are detected. Toward the later layers, these simple shapes are merged and integrated into more complex partial shapes, such as the apex of the numeral '4'. U_C layers are inserted between the U_S layers, and they gradually absorb positional errors and deformations of the input pattern.

In a U_S layer, a two-dimensional convolution of the image data input from the preceding layer with the coefficients a_{k,i,j,k} is computed, and partial shapes are extracted as a result. The coefficients a_{k,i,j,k} correspond to what is called a template in digital image processing. FIGS. 7 and 8 show examples of template patterns used for 'learning' in the U_S layers. When 'learning' is performed with these templates, the coefficients corresponding to the white portions of FIGS. 7 and 8 remain at zero, while the coefficients corresponding to the black portions change to a predetermined positive value. The final result is a set of coefficients a_{k,i,j,k} whose values are equivalent to the template.
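The template matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 3 × 3 horizontal-line template and the 5 × 5 binary image are hypothetical (they are not taken from FIGS. 7 and 8), the function name `correlate2d` is invented here, and the template is applied without flipping, which is the usual convention for template matching.

```python
def correlate2d(image, template):
    """Slide `template` over `image` (fully overlapping positions only)
    and return the sum of products at each position."""
    th, tw = len(template), len(template[0])
    out_h = len(image) - th + 1
    out_w = len(image[0]) - tw + 1
    return [
        [
            sum(template[i][j] * image[y + i][x + j]
                for i in range(th) for j in range(tw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]

# Hypothetical template: black cells -> positive coefficient, white cells -> 0.
HORIZONTAL = [[0, 0, 0],
              [1, 1, 1],
              [0, 0, 0]]

# Hypothetical input image with one horizontal stroke in row 2.
IMAGE = [[0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1],
         [0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0]]

response = correlate2d(IMAGE, HORIZONTAL)
for row in response:
    print(row)
```

The response is largest where the stroke lines up with the black cells of the template and zero elsewhere, which is the sense in which the coefficient set acts as a template.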

[Simplification of mathematical model for hardware]

The neuron model of the neocognitron is more complex than those of the other conventional neural network models described in references [6] to [10] cited above, and the computation in the neurons of a U_S layer is especially complex. For a U_S-layer neuron with 3 × 3 inputs, the computation is expressed mathematically by equation (1). Here C₁, C₂ and C₃ are constants, and U_{S(l+1),x,y,k} and U_{Cl,x,y,k} are the outputs of the layer under consideration and of the preceding layer, respectively. x and y are the coordinates of a neuron within its window, and l and k are, respectively, the number of the layer and the number of the window to which the neuron belongs. As described above, k′ denotes the number of the window in the preceding layer that outputs the signal input to this neuron. A set of coefficients a_{k,i,j,k} is defined for each window. These coefficients correspond to the strengths of the connections (synapses) between neurons. b_k is the average of the a_{k,i,j,k} and serves to normalize the output of the neuron. To make equation (1) easier to understand, consider the following simple example.

In this case, equation (1) reduces to equation (2).

FIG. 9 outlines the output U_{S1,x,y,k} as a function of the input ν. The physical meaning of equation (2) is illustrated in FIGS. 4(A) and 4(B). The computation for each neuron divides the result of the two-dimensional convolution by the root of the sum of squares (rms; root-mean-square) of its inputs. The set of coefficients a_{k,i,j,k} in equation (1) corresponds to the synaptic connection strengths.

However, simulations showed that in almost all cases the root of the sum of squares (rms) can simply be replaced by an ordinary average, in which case equation (1) becomes equation (3).
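The effect of this substitution can be illustrated with a small Python sketch. It does not reproduce equation (1) or equation (3): the constants C₁, C₂, C₃ and the factor b_k are omitted, and the three nine-element input patterns are hypothetical. It only compares a matched sum divided by the root of the sum of squares of the inputs with the same sum divided by their plain average.

```python
import math

def matched_sum(inputs, coeffs):
    """Sum of products of inputs and template coefficients."""
    return sum(a * u for a, u in zip(coeffs, inputs))

def root_sum_squares(inputs):
    """Root of the sum of squares of the inputs (the text's 'rms')."""
    return math.sqrt(sum(u * u for u in inputs))

def plain_average(inputs):
    """Ordinary arithmetic mean of the inputs."""
    return sum(inputs) / len(inputs)

def response(inputs, coeffs, norm):
    """Matched sum divided by the chosen normalization (0 if no input)."""
    denom = norm(inputs)
    return matched_sum(inputs, coeffs) / denom if denom > 0 else 0.0

# Hypothetical 3x3 patterns, flattened row by row.
TEMPLATE = [0, 0, 0, 1, 1, 1, 0, 0, 0]
INPUT_A = [0, 0, 0, 1, 1, 1, 0, 0, 0]   # matches the template completely
INPUT_B = [0, 0, 0, 1, 1, 0, 0, 1, 0]   # matches partially
INPUT_C = [1, 0, 1, 0, 0, 0, 1, 0, 1]   # does not match

for norm in (root_sum_squares, plain_average):
    values = [response(p, TEMPLATE, norm) for p in (INPUT_A, INPUT_B, INPUT_C)]
    print(norm.__name__, values)
```

For these patterns both normalizations rank the inputs in the same order (complete match, partial match, mismatch); only the scale of the response differs.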

The computation in a U_C-layer neuron, on the other hand, is simpler than in a U_S-layer neuron; for a neuron with 3 × 3 inputs, as above, it is given by equation (4). Here α_k is a constant. This function has the same form as the function shown in FIG. 9. The only difference between equations (3) and (4) is that in equation (3) the coefficients in the numerator differ from those in the denominator, whereas in equation (4) the numerator and denominator coefficients are identical.

Simplifying equation (1) for the U_S layer to equation (3) in this way means that the U_S layers and U_C layers can use the same circuit configuration, which greatly simplifies the hardware implementation.
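The idea that one circuit can serve both layer types can be sketched as a single function that divides one weighted sum by another. This is a hedged illustration, not equations (3) and (4) themselves: the function name `ratio_neuron`, the additive constants `c_num` and `c_den`, and all coefficient values are assumptions made for the example.

```python
def ratio_neuron(inputs, num_coeffs, den_coeffs, c_num=0.0, c_den=1.0):
    """Ratio of two weighted sums of the same inputs.

    c_num and c_den are hypothetical constants standing in for the
    constant terms of the original equations.
    """
    numerator = c_num + sum(a * u for a, u in zip(num_coeffs, inputs))
    denominator = c_den + sum(b * u for b, u in zip(den_coeffs, inputs))
    return numerator / denominator

inputs = [0, 0, 0, 1, 1, 1, 0, 0, 0]

# U_S-like use: template coefficients in the numerator,
# uniform averaging coefficients in the denominator.
template = [0, 0, 0, 1, 1, 1, 0, 0, 0]
averaging = [1 / 9] * 9
us_out = ratio_neuron(inputs, template, averaging)

# U_C-like use: the same coefficient set in numerator and denominator.
shared = [0.5] * 9
uc_out = ratio_neuron(inputs, shared, shared)

print(us_out, uc_out)
```

Only the coefficient sets passed to the function differ between the two uses, which mirrors the statement that the two layer types differ only in their numerator and denominator coefficients.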

[Pipelining and scheduling]

In the handwritten character recognition example described in reference [4] cited above, as many as 34,000 neurons are needed to process a 19 × 19 pixel input. To reduce this number for the hardware implementation, a projection along the x-axis is performed here, as shown in FIG. 10; the projection vector is P = (1,0,0). The x-axis is one of the two axes, x and y, of the input layer. Projection is a technique for mapping an array of many computing elements (here, neurons), for example a three-dimensional array, onto a comparatively small number of computing elements in a lower-dimensional array, for example a two- or one-dimensional one. Projecting in a suitable direction maps several computing elements onto a single one. For example, with P = (1,0,0), the computing element at (x,y,z) = (i,j,k) in the original three-dimensional array is mapped to (y,z) = (j,k) in the resulting two-dimensional array. Projecting in the direction P = (1,0,0) maps the three-dimensional array of neurons of the neocognitron onto a two-dimensional array, so the whole system can be built simply by constructing, for each window, the computing elements corresponding to the neurons in the projected y-z plane. With this projection, the example of reference [4], which requires 34,000 neurons, is reduced to roughly 2,000 neurons (computing elements), only about 6% of 34,000.

After the projection, a scheduling vector S is selected. There is more than one way to choose S, but the suitable choices are quite limited. The scheduling vector S gives the direction of travel of a hyperplane containing the computing elements that perform their computations at substantially the same time. Once S is chosen, the computation for the computing element at n = (i,j,k) in the original three-dimensional array is executed at time t = n·S, where · denotes the scalar product.
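The projection and the schedule t = n·S can be sketched as follows. The array size (4 × 4 × 3) is a hypothetical toy example, the helper names `project` and `schedule_time` are invented here, and `project` assumes an axis-aligned projection vector such as P = (1,0,0).

```python
def project(n, p):
    """Drop the coordinate along which the axis-aligned vector p points."""
    return tuple(c for c, pc in zip(n, p) if pc == 0)

def schedule_time(n, s):
    """Execution time of element n: the scalar product n . s."""
    return sum(a * b for a, b in zip(n, s))

P = (1, 0, 0)   # projection along the x-axis
S = (1, 0, 2)   # scheduling vector used in the text

# Hypothetical 4 x 4 x 3 array of computing elements (i, j, k).
elements = [(i, j, k) for i in range(4) for j in range(4) for k in range(3)]

# Physical computing elements that remain after projection.
processors = {project(n, P) for n in elements}

# Which original elements are computed at each time step.
timetable = {}
for n in elements:
    timetable.setdefault(schedule_time(n, S), []).append(n)

print(len(elements), "elements ->", len(processors), "processors")
print(project((2, 5, 7), P), schedule_time((2, 5, 7), S))
```

In this toy case 48 elements collapse onto 12 physical computing elements, and no physical element is asked to compute two original elements in the same time step.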

The computation of the whole system is complete when the hyperplane has swept past all computing elements (neurons). FIG. 10(A) shows examples of the scheduling vector S. If S = (1,0,1) is chosen, as indicated by the dash-dot line, signals must pass between computing elements lying in the same hyperplane within one pipeline period. One computing element then cannot start its computation until another has finished, which delays signal propagation within the hyperplane and forces a longer pipeline period. S = (1,0,2) is therefore chosen here. With this choice no signal passes between computing elements of the same hyperplane within one pipeline period, and the array is fully systolized. Here "systolized" means that, within each pipeline stage, all computing elements are pipelined. In the neocognitron every neuron receives its inputs only from nearby neurons in the preceding layer, and these nearby neurons are mapped into the same pipeline stage. The output of each neuron is stored as a voltage in analog pipeline registers 2 (see FIGS. 11 and 14, among others), each consisting of a capacitor and a group of switches inside the neuron. When the input region is a 3 × 3 neighborhood, as in FIG. 10, signals only need to be shared between adjacent neurons (computing elements) as shown in FIG. 11, so four analog pipeline registers per neuron suffice. In hardware, each analog pipeline register 2 is built like the circuit enclosed by the dotted line in FIG. 11.
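The difference between the two scheduling vectors can be checked with a short Python sketch. It assumes, as an interpretation of FIG. 10 rather than a statement from the text, that the third coordinate indexes the layer, so a neuron at (i, j, k) takes its inputs from (i + dx, j + dy, k − 1) with dx, dy in {−1, 0, 1}. The function name `input_delays` is invented for the example.

```python
def dot(a, b):
    """Scalar product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def input_delays(s):
    """Delays t(consumer) - t(producer) under schedule s for a 3 x 3
    neighborhood in the preceding layer (assumed offset (dx, dy, -1))."""
    delays = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            producer_offset = (dx, dy, -1)
            # t_consumer - t_producer = n.s - (n + offset).s = -(offset.s)
            delays.add(-dot(producer_offset, s))
    return sorted(delays)

print("S = (1,0,1):", input_delays((1, 0, 1)))
print("S = (1,0,2):", input_delays((1, 0, 2)))
```

Under these assumptions S = (1,0,1) produces a delay of zero, meaning a producer and its consumer fall in the same time step, whereas S = (1,0,2) gives delays of one, two and three time steps, all strictly positive.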

Of these four analog pipeline registers, one register 2a holds the current computation result of the computing element in the preceding layer, and the other three registers 2b, 2c and 2d hold results computed earlier: the results of the pipeline stages one, two and three stages before, respectively. Because one pipeline period is normally very short, for example 1 μs, a very small capacitor formed from a MOS transistor is sufficient for each analog pipeline register 2, and it can hold its voltage adequately for several pipeline periods. Needing only such small capacitors greatly helps to reduce the silicon area of an LSI implementation.

When the computation of one pipeline stage is finished, the contents of the analog pipeline registers 2 would have to be shifted as the hyperplane of FIG. 10 advances by one neuron. Instead of shifting the contents stored in the analog pipeline registers 2, however, the coefficients of each neuron are shifted in the opposite direction within the neuron's input region, as shown in FIG. 11. This is equivalent to shifting the register contents. Each analog pipeline register 2 therefore never has to switch the destination of its data while it is holding a value. This eliminates the errors that accompany switching the destination, so the data stored in each analog pipeline register 2 are held accurately over several pipeline periods.
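The equivalence between shifting the stored values and shifting the coefficients in the opposite direction can be sketched in software. This is a digital analogy under stated assumptions, not a model of the analog circuit: three storage slots, integer samples and the function names are all hypothetical.

```python
def shift_register_outputs(samples, coeffs):
    """Reference scheme: stored values move, coefficients stay in place."""
    regs = [0] * len(coeffs)
    outs = []
    for s in samples:
        regs = [s] + regs[:-1]            # every stored value moves one place
        outs.append(sum(c * r for c, r in zip(coeffs, regs)))
    return outs

def rotating_coeff_outputs(samples, coeffs):
    """Alternative scheme: each value stays in the slot it was written to,
    and the coefficient assignment rotates instead."""
    n = len(coeffs)
    slots = [0] * n
    outs = []
    for t, s in enumerate(samples):
        slots[t % n] = s                  # overwrite only the oldest slot
        # The value of age a sits in slot (t - a) % n, so coefficient a
        # is applied there.
        outs.append(sum(coeffs[a] * slots[(t - a) % n] for a in range(n)))
    return outs

samples = [1, 2, 3, 4, 5]
coeffs = [1, 2, 3]
print(shift_register_outputs(samples, coeffs))
print(rotating_coeff_outputs(samples, coeffs))
```

Both schemes give the same outputs, but in the second one a stored value is written once and never moved, which corresponds to the register not having to switch the destination of its data while holding it.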

The projection vector P is not limited to one direction; the projection can also be made in the direction P = (0,0,1), for example, as shown in FIG. 12. In that case S = (0,0,1), for example, is chosen as the scheduling vector and, as described above, the results of the preceding pipeline stage are stored and used in the computation of the current pipeline stage.

[Overall configuration of hardware system]

FIG. 13 shows the overall configuration of the hardware system of this embodiment. The analog processor 10, which performs the computations of the pipeline stages 10a, is controlled by a host system 12 built from simple logic such as a finite state machine (FSM). The main role of the host system 12 is to control the switches that select the voltage signals stored in the analog pipeline registers 2 and the switches that select the voltage signals corresponding to the sets of coefficients a_{k,i,j,k}.

Pixel data 14 of N pixels × N columns are divided into N columns and input to the analog processor 10 one column at a time over N steps. All the sets of coefficients a_{k,i,j,k} used for multiplication, and the reference voltages described later, are stored in a digital memory 16 attached to the host system 12 and supplied to the analog processor 10 through a D/A converter 18. As noted above, all neurons in a window use the same set of coefficients, so these coefficients and reference voltages are distributed to many neurons. The number of voltage input signals from the host system 12 to the analog processor 10 is therefore small compared with the number of neurons (computing elements) formed on the VLSI chip. The analog processor 10 is an array of many mutually adjacent analog neurons (analog computing elements) of the configuration shown enlarged in FIG. 14. In FIG. 14 the blocks marked × and ÷ represent the analog multipliers and analog dividers described later. The vertical lines represent wires carrying the voltages that represent the coefficients needed by the analog multipliers and the reference voltages; as shown in FIG. 15, these are distributed to the many neurons of each window in each pipeline stage. The analog voltage signals carried on these vertical wires are held on capacitors in the analog weight register 20 at the top of the analog processor 10 in FIG. 13. Because the voltage signals are first stored on capacitors in this way, a single D/A converter 18 can supply all of them within one pipeline period.

Because each neuron receives its inputs only from nearby neurons in the preceding layer, the connections between pipeline stages are purely local, which makes a VLSI implementation straightforward. As in other multilayer neural networks, a 'learning' process can also be introduced into the neocognitron. The host system executes this learning process, taking considerably more time than in normal mode. Input patterns that were misrecognized in normal mode are stored in a memory inside or outside the system, and in learning mode these stored patterns are input to the analog processor 10. When a misrecognized input pattern is input to the analog processor 10, the outputs of the intermediate layers are monitored, converted to digital signals by an A/D converter 22, input to the host system 12 and stored in the digital memory 16. The host system 12 computes the propagated error from the monitored outputs of the intermediate layers, and learning is performed on that basis. Adding this learning mode makes the host system 12 somewhat more complex, but it should still be small enough to be mounted on the same semiconductor chip as the analog processor 10.

[Basic operation element]

As described above, the computation in each computing element of a U_S layer is given by equation (3), which involves multiplication, addition and division. FIG. 16 shows an example of a nine-input (3 × 3) neuron. As mentioned, the computing element is implemented here as an analog circuit. A nine-input neuron requires eighteen analog multipliers 301, 302, …, 318 and one analog divider 32.

In general, an N-input neuron requires 2N multipliers and one divider, because equation (3) contains N multiplications in both the numerator and the denominator. The basic computing elements used here are the analog current-mode divider shown in FIG. 17 (see [12] K. Bult and H. Wallinga, "A Class of Analog CMOS Circuits Based on the Square-Law Characteristic of an MOS Transistor in Saturation", IEEE Journal of Solid-State Circuits, vol. SC-22, no. 3, pp. 357-365, June 1987) and the Gilbert multiplier shown in FIG. 18.

In FIGS. 17, 18 and other figures, every transistor is a MOS transistor. The label 5/5 or 10/5 attached to each MOS transistor gives its channel width W (μm) as the numerator and its channel length L (μm) as the denominator.

The input U_{Cl,x,y,k} is supplied from a neuron of the preceding layer as a voltage (V_x − V_xref), and the coefficient a_{k,i,j,k} is supplied from the analog weight register 20 (see FIGS. 13 and 14) as a voltage (V_y − V_yref). The multipliers 301, 302, …, 318 multiply these together and convert the products to current signals. The outputs of nine of the multipliers, 301, 302, …, 309, are combined at one node, and the outputs of the other nine, 310, 311, …, 318, are combined at a second node; the two nodes carry the numerator and denominator sums, and the two summed currents are input to the current-mode divider 32. The output current of the divider 32 is converted to a voltage signal by the current-to-voltage (I-V) converter 34 shown in FIG. 19. Outside the neuron every variable must be represented as a voltage so that a single signal can be distributed to many computing elements over a common wire.

Like any analog circuit, the circuits used here are not ideal. For example, the divider 32 actually computes I_in1 × I_in1 / I_in2 rather than a true quotient. The outputs of the divider 32 and of the multipliers 301, 302, …, 318 contain offset currents, and the output of the I-V converter 34 contains an offset voltage. These error sources are easily cancelled, however. The divider 32 can be treated as a correct divider by restricting its input signal range. The offset currents of the divider 32 and the multipliers 301, 302, …, 318 merely contribute to the constants C₁, C₂ and C₃ in equations (3) and (4). The offset voltage of the I-V converter 34 is cancelled by setting the reference voltage V_ref for multiplication in the next pipeline stage equal to this offset voltage. As for other error sources in these circuits, the nature of neural network algorithms allows a considerable tolerance.
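The cancellation of the I-V converter offset by the choice of reference voltage can be illustrated numerically. The sketch assumes an idealized linear converter and a multiplier input that responds only to the difference between its signal and reference voltages; the gain, offset and current values are hypothetical and are not taken from the prototype.

```python
def iv_converter(current_a, gain_v_per_a, v_offset):
    """Idealized I-V converter: linear gain plus a constant offset voltage."""
    return gain_v_per_a * current_a + v_offset

def multiplier_differential_input(v_x, v_xref):
    """The multiplier of the next stage sees only the difference V_x - V_xref."""
    return v_x - v_xref

V_OFFSET = 1.2           # hypothetical offset of the I-V converter, in volts
GAIN = 1.0e5             # hypothetical transresistance, in volts per ampere
signal_current = 2.0e-6  # hypothetical divider output current, in amperes

v_x = iv_converter(signal_current, GAIN, V_OFFSET)

# Choosing the next stage's reference voltage equal to the offset
# leaves only the signal part of the voltage.
recovered = multiplier_differential_input(v_x, v_xref=V_OFFSET)
print(v_x, recovered)
```

With the reference set to the offset voltage, the next stage sees only the part of the voltage that is proportional to the signal current.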

Because the computation in a U_C layer (equation (4)) closely resembles that in a U_S layer (equation (3)), the computing elements of a U_C layer can have exactly the same structure as those of a U_S layer, apart from the connection of the wires that carry the coefficients. If the wires are connected so that they serve both cases, the same computing element can be used for either a U_S layer or a U_C layer.

[Experimental and simulation results]

FIGS. 21 and 22 show, respectively, measurements on a prototype of the nine-input neuron (computing element) hardware and the results of simulating it with an analog circuit simulator. The outputs were measured or simulated for the three kinds of input shown in FIG. 20: a pattern that matches the template completely (input A), a pattern that matches partially (input B), and a pattern that does not match at all (input C). In FIGS. 21 and 22 the output voltage V_o and the input voltage V_x correspond to U_{S(l+1),x,y,k} and the input ν in equation (2), respectively. As the results show, the neuron distinguishes inputs with different degrees of match to the template. Because the channel-length modulation coefficient (λ) of the MOS transistors in the prototype was larger than expected, the pattern separation of this neuron is somewhat reduced, but the differences in degree of match are still identified correctly.

FIG. 23 is a graph comparing the measurements, the analog simulator results and the calculated results. The simulation agrees very well with the values calculated from equation (3), but the measurements deviate considerably from the simulation on the high-voltage side, where the input voltage V_x is 3.5 V or more. This problem can be solved either by fabricating transistors with a longer channel length L, and hence a smaller λ, or by restricting the input range of the analog neuron.

To verify the operation of a complete neocognitron built from this hardware, a simulation program was developed for a six-layer neocognitron with an 11 × 11 pixel input, as shown in FIG. 3. The neuron model in this program is based on the simplified equations (3) and (4). Running the program confirmed that the characters shown in FIG. 1 are recognized correctly.

The prototype neuron (computing element) was fabricated in a 2 μm CMOS process and occupies an area of only 450 μm × 650 μm. Taking the character recognition example of references [2] and [3] cited above, which uses 34,000 neurons, a systolic array needs only 2,000 computing elements. Implemented in hardware these occupy 450 μm × 650 μm × 2,000 = 585 mm², which can be realized with two or three semiconductor chips of an ordinary size of about 250 mm² each. With hardware whose pipeline period is 1 μs, one character can be recognized in 26 μs, about one millionth of the time needed when the same function is implemented as a software simulation on a personal computer. The present invention thus provides a processor with excellent cost performance.
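The figures quoted in this paragraph can be checked with a few lines of arithmetic. The numbers are those given in the text; the variable names are invented, and the software time of about 26 s is only what the stated factor of one million implies, not a value reported in the source.

```python
import math

CELL_W_UM, CELL_H_UM = 450, 650   # area of one prototype neuron, in micrometres
N_ELEMENTS = 2000                 # computing elements after projection
N_NEURONS = 34000                 # neurons in the original network
CHIP_AREA_MM2 = 250               # "ordinary" chip size quoted in the text
PIPELINE_PERIOD_US = 1
RECOGNITION_TIME_US = 26

# 1 mm^2 = 1,000,000 um^2
total_area_mm2 = CELL_W_UM * CELL_H_UM * N_ELEMENTS / 1_000_000
chips_needed = math.ceil(total_area_mm2 / CHIP_AREA_MM2)
reduction_percent = 100 * N_ELEMENTS / N_NEURONS
pipeline_periods = RECOGNITION_TIME_US // PIPELINE_PERIOD_US

# Implied by "about one millionth of the time" (an inference, not source data).
implied_software_time_s = RECOGNITION_TIME_US * 1e-6 * 1_000_000

print(total_area_mm2, chips_needed, round(reduction_percent, 1),
      pipeline_periods, implied_software_time_s)
```

The total area of 585 mm² corresponds to 2.34 chips of 250 mm², consistent with the two or three chips stated above, and 26 μs at 1 μs per period corresponds to 26 pipeline periods per character.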

Continued from the front page

(72) Inventor: Edgar Sanchez-Sinencio, 1812 Hondo Drive, College Station, Texas 77840, United States

(56) References: JP-A-2-76062 (JP, A); JP-A-2-64880 (JP, A); JP-A-2-228784 (JP, A); JP-A-1-237754 (JP, A)

Claims

(57) [Claims]

1. A neural network processor for realizing a feed-forward neural network having a multilayer structure, wherein each arithmetic element corresponding to a neuron of the neural network is constituted by a MOS analog circuit using voltages as input/output variables, and wherein the processor comprises a systolic array constituted by a large number of the MOS analog circuits.

2. The neural network processor according to claim 1, wherein said neural network is a neocognitron.

3. The neural network processor according to claim 2, wherein each of the MOS analog circuits comprises: a plurality of numerator Gilbert multipliers, whose output terminals are connected to each other, for executing a numerator product-sum operation; a plurality of denominator Gilbert multipliers, whose output terminals are connected to each other, for executing a denominator product-sum operation; and a divider having a first input terminal connected to the output terminals of the numerator Gilbert multipliers and a second input terminal connected to the output terminals of the denominator Gilbert multipliers, the divider outputting the operation result as a voltage.