JPH05346914A

JPH05346914A - Neuro processor

Info

Publication number: JPH05346914A
Application number: JP4156561A
Authority: JP
Inventors: Yoichi Tamura; 洋一田村; Tadayuki Morishita; 賢幸森下
Original assignee: Matsushita Electronics Corp
Current assignee: Panasonic Holdings Corp
Priority date: 1992-06-16
Filing date: 1992-06-16
Publication date: 1993-12-27

Abstract

PURPOSE: To provide a neuroprocessor that performs neural network inference and learning calculations at high speed without increasing its hardware. CONSTITUTION: Processor elements 2-10, in each of which the connections among a multiplier 24, an adder 25, a memory 26, and registers 34-40 can be changed by switching selectors 27-33, are arranged in a two-dimensional matrix, and memories 11-16 and adders 17-22 are added to the respective columns and rows to form a matrix arithmetic unit 1 whose hardware is flexible. The neuroprocessor consists of this matrix arithmetic unit 1 and an auxiliary arithmetic unit 23; the matrix arithmetic unit 1 selectively performs matrix calculations, and the auxiliary arithmetic unit 23 performs the calculations other than matrix calculations. Because the matrix calculations and the other calculations are carried out in parallel, calculations including neural network learning are processed at high speed.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to a neuroprocessor that performs, at high speed, the inference and learning calculations needed to simulate the behavior of the synapses and neurons that make up a neural network.

【０００２】[0002]

[Prior Art] Neural networks model and imitate the behavior of nerve cells in the human brain. The aim is to realize a new kind of computer that is good at tasks that conventional von Neumann computers handle poorly, such as recognition, association, optimization problems, and speech synthesis.

[0003] Various neural network models have been studied, differing in how the neurons are arranged, such as hierarchical (layered) structures and mutually connected structures. Hierarchical networks can be trained easily with an algorithm called backpropagation, so a wide range of applications is being considered, including control, character recognition, image recognition, and image processing.

[0004] The structure and operation of such a hierarchical network are described below with reference to FIGS. 8 and 9. In FIG. 8, the network consists of neurons 101 arranged in each layer and synapses 102 representing the connections between neurons; the input layer, intermediate layer, and output layer that make up the hierarchy are denoted 103, 104, and 105, respectively. FIG. 8 shows a three-layer network, but a network of four or more layers can be built by using multiple intermediate layers. For simplicity the description below is limited to a three-layer network; networks with four or more layers, or with more or fewer neurons per layer, can be handled in the same way.

[0005] In a three-layer neural network, the input signal is first applied to each neuron of the input layer 103. The signal then propagates through the input layer, the intermediate layer, and the output layer in that order, undergoing the operations of the synapses and neurons of each layer, and the signals of the neurons in the output layer 105 become the output of the network. This ordinary propagation from the input layer through the intermediate layer to the output layer is called forward propagation. The role of a neuron in forward propagation is described with reference to FIG. 9. In FIG. 9, neurons are denoted 101, 107, 108, 109, and so on, a synapse is denoted 102, and the characteristic function f of a neuron is denoted 106. Each neuron receives the outputs of the neurons in the preceding layer through synapses. Each synapse is characterized by its own value, called the connection weight, and passes to the next neuron the output of the preceding neuron multiplied by that weight. For example, the synapse in FIG. 9 between the i-th neuron 108 of the l-th layer and the j-th neuron 101 of the (l+1)-th layer has the connection weight W_lji. The output O_li of neuron 108 is multiplied by this weight, and the product W_lji × O_li is given to neuron 101. Neuron 101 adds up all such inputs from every synapse connected to it, applies the characteristic function 106 to the sum, and outputs the resulting function value as its output O_(l+1),j. This is expressed by the following equation.

【０００６】[0006]

【数１】 [Equation 1]
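The equation image is not reproduced in this text. Judging from the definitions in paragraph [0005] and the remark in [0007] that I is the product-sum ΣW × O, it presumably has the following form; the exact typesetting and index order in the original are assumptions.

```latex
O_{l+1,j} = f\left(I_{l+1,j}\right), \qquad I_{l+1,j} = \sum_{i} W_{lji}\, O_{li}
```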

[0007] Here, I denotes the result of the product-sum operation ΣW × O. With neurons and synapses operating as described, forward propagation in the three-layer network is computed by the following procedure. First, the outputs of all neurons in the intermediate layer are calculated according to (Equation 2).

【０００８】[0008]

【数２】 [Equation 2]

[0009] Next, using that result, the outputs of all neurons in the output layer are calculated by (Equation 3).

【００１０】[0010]

【数３】 [Equation 3]
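The two-stage forward calculation of paragraphs [0007] to [0009] can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the sigmoid used for the characteristic function f, the function names, and the list-of-rows weight layout (`weights[j][i]` playing the role of W_lji) are all assumptions made for the example.

```python
import math

def sigmoid(x):
    """Example characteristic function f (assumed; the text does not fix a particular f)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(weights, inputs, f=sigmoid):
    """Return O_(l+1),j = f(sum_i W[j][i] * O_l,i) for every neuron j of the next layer."""
    return [f(sum(w_ji * o_i for w_ji, o_i in zip(row, inputs))) for row in weights]

def forward(w1, w2, o1):
    """Propagate the input-layer outputs o1 through a three-layer network."""
    o2 = layer_forward(w1, o1)  # intermediate layer, role of (Equation 2)
    o3 = layer_forward(w2, o2)  # output layer, role of (Equation 3)
    return o2, o3
```

For example, `forward([[0.5, -0.5], [1.0, 1.0]], [[1.0, -1.0]], [1.0, 1.0])` returns the intermediate-layer outputs and the single output-layer value for a 2-2-1 network.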

[0011] Next, the learning method based on the backpropagation algorithm, and the roles of neurons and synapses in it, are described. For learning with backpropagation, pairs of an input and the corresponding ideal output are prepared in advance, and the synaptic connection weights are corrected so that the difference between the actual output for such an input and the ideal output (called the teacher signal) decreases. Taking the three-layer network of FIG. 8 as an example, the derivation of the weight corrections, namely ΔW_1ji and ΔW_2ji for the three-layer case, is described below as a concrete calculation procedure.
1. The actual output obtained by forward propagation is calculated based on (Equation 1) and (Equation 2).
2. For each neuron in the output layer, a value δ_3j (hereinafter simply called delta) corresponding to the error between the calculated result and the teacher signal is calculated by (Equation 4).

【００１２】[0012]

【数４】 [Equation 4]
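The equation image is missing here as well. Paragraphs [0036] and [0037] describe the output-layer delta as the difference t − O multiplied by the derivative g evaluated at the output, so the equation presumably reads as follows (a reconstruction, not a copy of the original):

```latex
\delta_{3j} = g\left(O_{3j}\right)\,\left(t_j - O_{3j}\right)
```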

[0013] δ_lj is the delta for the j-th neuron of the l-th layer, t_j is the teacher signal for the j-th neuron of the output layer corresponding to the given input, and g, expressed by (Equation 5), corresponds to the derivative of the neuron's characteristic function f. Because a monotonically non-decreasing function is used as f, g can be expressed as a function of the value of f.

【００１４】[0014]

【数５】 [Equation 5]
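The original equation image is not available. From the description in [0013], g is the derivative of f written as a function of the output value O = f(x). A generic form consistent with that description is given below; the sigmoid case is added purely as an illustration, since the text does not say which f is used.

```latex
g\bigl(f(x)\bigr) = \frac{d f(x)}{d x},
\qquad \text{e.g. for } f(x) = \frac{1}{1 + e^{-x}}: \quad g(O) = O\,(1 - O)
```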

[0015] 3. Let ΔW_lji be the correction to the connection weight of a synapse between the intermediate layer and the output layer; this correction is calculated by (Equation 6).

【００１６】[0016]

【数６】 [Equation 6]

[0017] In (Equation 6), η is called the correction coefficient, and R denotes the matrix δ × O.
4. The delta for the neurons of the intermediate layer is calculated according to (Equation 7).

【００１８】[0018]

【数７】 [Equation 7]

[0019] In (Equation 7), s denotes the product-sum result Σδ × W.
5. The correction to the connection weight of a synapse between the intermediate layer and the input layer is calculated according to (Equation 8), using the output of the intermediate layer.

【００２０】[0020]

【数８】 [Equation 8]
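Steps 1 to 5 can be put together as one learning pass. The sketch below is an illustration under stated assumptions, since the equation images are not available: f is taken to be a sigmoid, g(O) = O(1 − O), the corrections are taken as ΔW_2ji = η δ_3j O_2i and ΔW_1ji = η δ_2j O_1i, and the function names are hypothetical.

```python
import math

def f(x):
    """Characteristic function; a sigmoid is assumed for this example."""
    return 1.0 / (1.0 + math.exp(-x))

def g(o):
    """Derivative of the assumed sigmoid, written in terms of its output o = f(x)."""
    return o * (1.0 - o)

def matvec(w, v):
    """Product-sum of each weight row with the vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def learn_once(w1, w2, o1, t, eta=0.5):
    """One pass of steps 1-5; returns corrected (w1, w2) and the output before correction."""
    # Step 1: forward propagation
    o2 = [f(x) for x in matvec(w1, o1)]
    o3 = [f(x) for x in matvec(w2, o2)]
    # Step 2: output-layer delta
    d3 = [g(o) * (tj - o) for o, tj in zip(o3, t)]
    # Step 4: s_j = sum_k delta_3k * W_2kj (uses the weights before correction)
    s = [sum(d3[k] * w2[k][j] for k in range(len(d3))) for j in range(len(o2))]
    d2 = [g(o) * sj for o, sj in zip(o2, s)]
    # Steps 3 and 5: add the corrections eta * delta * O to the weights
    new_w2 = [[w2[j][i] + eta * d3[j] * o2[i] for i in range(len(o2))]
              for j in range(len(d3))]
    new_w1 = [[w1[j][i] + eta * d2[j] * o1[i] for i in range(len(o1))]
              for j in range(len(d2))]
    return new_w1, new_w2, o3
```

Calling `learn_once` repeatedly on the same input and teacher signal moves the output toward the teacher signal, which is the repeated calculation that the text goes on to describe as costly.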

[0021] In actual learning, the calculation must be repeated until no further correction is needed, and the procedure above is carried out for many combinations of input and teacher signal. If steps 1 through 5 are counted as one learning iteration, the total number of iterations is (number of input/teacher-signal pairs used for learning) × (number of repetitions), which amounts to an enormous amount of computation.

[0022] As is clear from the above, neural network computation becomes enormous because the matrix operations involved in learning are repeated, and large hardware such as a mainframe computer has been needed to carry it out. For this reason, when learning results are applied to home appliances, learning and inference are separated: backpropagation learning is performed on a large computer, and the resulting data are transferred to a microprocessor or digital signal processor (DSP), which performs only the forward-propagation calculation.

【００２３】[0023]

[Problems to Be Solved by the Invention] In the conventional applications described above, large hardware has been indispensable because of the enormous amount of calculation. To ensure speed, such hardware has many arithmetic units that operate in parallel, but because the hardware lacks flexibility, some calculations cannot be processed in parallel. For example, for certain types of calculation the required data paths between memory and arithmetic units, or between arithmetic units, cannot be formed, so those calculations are difficult to process at high speed. Two approaches have been used to deal with calculations that cannot be parallelized: adding arithmetic units and processing them sequentially, or dividing the overall calculation into two groups and providing hardware suited to each. In either case, high speed has been achieved only by making the hardware larger.

[0024] From the application side, use of a general-purpose microprocessor or a DSP is often a prerequisite. A microprocessor processes everything sequentially and therefore needs an enormous calculation time, which limits practical applications. A DSP pipelines the product-sum operation by running its adder and multiplier in parallel and is therefore far faster than a microprocessor, but data transfer from memory to the arithmetic units becomes a bottleneck that prevents the pipeline from being fully used, so calculations such as neural network learning cannot be sped up. This is why learning and forward calculation have had to be separated even in relatively simple home appliance applications. With this conventional approach the behavior of the neural network is fixed, and behavior tailored to the individual user's preferences cannot be realized.

[0025] The present invention solves the above problems. Its object is to provide a neuroprocessor that, by making the hardware flexible, processes the matrix calculations that account for most of the computation efficiently, in parallel, and at high speed without increasing the hardware.

【００２６】[0026]

[Means for Solving the Problems] To achieve the above object, the neuroprocessor of the present invention uses processor elements each composed of selectors, a multiplier, an adder, a memory, and registers, and makes the function of each processor element flexible by using the selectors to switch the connections among the multiplier, adder, memory, and registers. It comprises a matrix arithmetic unit in which these processor elements are arranged two-dimensionally with a memory and an adder added to each column and each row, and a separate auxiliary arithmetic unit that performs the calculations other than matrix calculations and can operate in parallel with the matrix arithmetic unit, so that the calculations are performed at high speed.

【００２７】[0027]

[Operation] With this configuration, each processor element of the matrix arithmetic unit can change the connections among its multiplier, adder, memory, and registers by switching its selectors, and can thereby perform the matrix calculation assigned to it efficiently and flexibly. In addition, calculations other than matrix calculations are performed in parallel by the separately provided auxiliary arithmetic unit, so the hardware is used effectively. As a result, the enormous computation involved in neural network learning can be processed at high speed with relatively small hardware.

【００２８】[0028]

[Embodiments] Embodiments of the present invention are described below with reference to the drawings. FIGS. 1(a) and 1(b) are block diagrams of a neuroprocessor according to one embodiment of the present invention. In FIG. 1(a), the matrix arithmetic unit 1 consists of processor elements (hereinafter PEs) 2 to 10, memories 11 to 16, and adders 17 to 22. Each PE is connected to its neighboring PEs in the row and column directions and can pass data to the adjacent PE in either direction. The memories 11 to 16 are placed on the input side of each row and each column of the two-dimensional PE array, and the adders 17 to 22 are likewise placed on the output side of each row and each column. Each input-side memory can pass data to the first PE of its row or column, and each output-side adder can accumulate the calculation results passed to it. The auxiliary arithmetic unit 23 shown in FIG. 1(a) is independent of the matrix arithmetic unit 1 and operates in parallel with it. It computes the parts of the neural network calculation other than the matrix calculations.

[0029] FIG. 1(a) shows an example in which the PEs are arranged in a two-dimensional matrix of 3 rows and 3 columns, but the array may have any number of rows and columns.

[0030] FIG. 1(b) is a block diagram showing the configuration of the PE used in the neuroprocessor of the present invention. In FIG. 1(b), the PE of this embodiment consists of a multiplier 24, an adder 25, a memory 26, selectors 27 to 33, and registers 34 to 40. Y_IN is the column-direction input terminal, Y_OUT the column-direction output terminal, X_IN the row-direction input terminal, and X_OUT the row-direction output terminal. Through these terminals each PE is connected to its neighbors in the row and column directions. By switching the selectors, data can be output in either the column direction or the row direction.

[0031] The operation of the neuroprocessor of this embodiment is described in detail below. In a neural network with one intermediate layer, as in the example of FIG. 8, (Equation 1), (Equation 7), and (Equation 6) are expressed by the matrix equations shown in FIG. 2 (I = W × O), FIG. 3 (s = W^T × δ), and FIG. 4 (R = δ × O^T), respectively, where ^T denotes the transpose. These three matrix calculations account for most of the enormous computation of this kind of neural network. In FIGS. 2, 3, and 4, the connection-weight matrix W and the matrix R each have 9 rows and 9 columns and are divided into nine submatrices of 3 rows and 3 columns. The vectors O, I, s, and δ are correspondingly divided into three blocks each. With this division, the calculations for the nine submatrices of W are assigned to the nine PEs of FIG. 1(a): submatrix w_nm of W is assigned to PE_nm, and its nine elements are stored in the memory W (FIG. 1(b)) of that PE. The submatrices of R are assigned in the same way. Blocks δ₁, δ₂, and δ₃ of δ are stored in memories 11, 12, and 13, and blocks O₁, O₂, and O₃ of O are stored in memories 14, 15, and 16, respectively.

[0032] Switching the selectors gives the hardware flexibility in the calculations it performs. This effect is explained using the calculation I = W × O of FIG. 2. The contact positions of the selectors of the PE in FIG. 1(b) are set to D for selector 1, C for selector 2, E for selector 3, A for selector 4, B for selector 6, and D for selector 7. The PE then takes the configuration shown in FIG. 5 and performs the corresponding operation. In this configuration, the data O₁, O₂, and O₃ from memories 14, 15, and 16 of FIG. 1(a) are passed along the row (horizontal) direction, and 0 is input to Y_IN of PE₁₁, PE₁₂, and PE₁₃. Each PE multiplies the O data passed to it by the W values stored in its internal memory, adds the product to the input from Y_IN, and outputs the sum on Y_OUT. The results passed through the PEs in the column (vertical) direction are accumulated by the adders 20 to 22 at the end of each column, so that the product-sum operation of FIG. 2 is carried out.
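The data flow of this first mode can be imitated in software. The sketch below is only an illustration: it models a 3 × 3 PE grid working on a 9 × 9 weight matrix, and it assumes that the PE at grid row r, column c stores the block that multiplies input block r and contributes to output block c. How that mapping lines up with the w_nm labels of FIG. 2 cannot be checked without the figure, and all function names are hypothetical.

```python
N = 3  # grid is N x N PEs, each holding an N x N block (as in the 9 x 9 example)

def block(w, n, m):
    """Return the N x N submatrix at block row n, block column m of w."""
    return [row[m * N:(m + 1) * N] for row in w[n * N:(n + 1) * N]]

def matvec(a, v):
    """Product-sum of each row of a with the vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def forward_mode(w, o):
    """Compute I = W x O with O sent along rows and sums passed down columns."""
    o_blocks = [o[r * N:(r + 1) * N] for r in range(N)]  # memories 14-16, one per row
    result = []
    for c in range(N):                 # one grid column per output block
        y = [0.0] * N                  # Y_IN of the first PE in the column is 0
        for r in range(N):             # partial sums move from PE to PE down the column
            partial = matvec(block(w, c, r), o_blocks[r])  # assumed block placement
            y = [a + b for a, b in zip(y, partial)]        # Y_OUT = Y_IN + W * O
        result.extend(y)               # value collected by the column adder (20-22)
    return result
```

In hardware the nine block products run at the same time; the nested loops here only reproduce the order in which partial sums are combined.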

[0033] Next, taking the calculation s = W^T × δ of FIG. 3 as an example, it is shown how the same hardware adapts to a different matrix calculation. The contact positions of the selectors of the PE in FIG. 1(b) are set to E for selector 1, C for selector 2, D for selector 3, A for selector 4, D for selector 6, and A for selector 7. The PE then takes the configuration shown in FIG. 6. In this configuration, the data δ₁, δ₂, and δ₃ from memories 11, 12, and 13 are passed along the column (vertical) direction, and 0 is input to X_IN of PE₁₁, PE₂₁, and PE₃₁. Each PE multiplies the δ data passed to it by the W values stored in its internal memory, adds the product to the input from X_IN, and outputs the sum on X_OUT. The results passed through the PEs in the row (horizontal) direction are accumulated by the adders 17 to 19 at the end of each row, so that the product-sum operation of FIG. 3 is carried out.
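The point of this second mode is that the transposed product is obtained without moving the stored weights: only the direction of data flow changes. A minimal software sketch follows, using an assumed block placement (the PE at grid row r, column c holds the block at block row c, block column r of W) and hypothetical function names.

```python
N = 3  # grid is N x N PEs, each holding an N x N block

def block(w, n, m):
    """Return the N x N submatrix at block row n, block column m of w."""
    return [row[m * N:(m + 1) * N] for row in w[n * N:(n + 1) * N]]

def backward_mode(w, delta):
    """Compute s = W^T x delta with delta sent down columns and sums passed along rows."""
    d_blocks = [delta[c * N:(c + 1) * N] for c in range(N)]  # memories 11-13, one per column
    result = []
    for r in range(N):                 # one grid row per output block of s
        x = [0.0] * N                  # X_IN of the first PE in the row is 0
        for c in range(N):             # partial sums move from PE to PE along the row
            blk = block(w, c, r)       # the same stored block as in the forward mode
            # multiply by the transpose of the block by swapping the index roles
            partial = [sum(blk[k][j] * d_blocks[c][k] for k in range(N))
                       for j in range(N)]
            x = [a + b for a, b in zip(x, partial)]          # X_OUT = X_IN + W^T * delta
        result.extend(x)               # value collected by the row adder (17-19)
    return result
```

The result can be compared against a direct computation of the transposed product to confirm that the block bookkeeping is consistent.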

[0034] Finally, the calculation R = δ × O^T of FIG. 4 is described in the same way. The contact positions of the selectors of the PE in FIG. 1(b) are set to E for selector 1, D for selector 2, C for selector 3, A for selector 4, B for selector 5, B for selector 6, and A for selector 7. The PE then takes the configuration shown in FIG. 7. In this configuration, the data O₁, O₂, and O₃ from memories 14, 15, and 16 are passed along the row (horizontal) direction, the data δ₁, δ₂, and δ₃ from memories 11, 12, and 13 are passed along the column (vertical) direction, and each PE multiplies the O and δ data passed to it to obtain R. When η in (Equation 6) is 1, R is equal to the weight correction ΔW, so the selectors of the PE are switched again and the adder adds ΔW to the W values stored in the memory, thereby correcting W.
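In this third mode no accumulation across PEs is needed, because each PE forms its own block of the outer product from the δ block arriving down its column and the O block arriving along its row. The following sketch illustrates this and the subsequent weight correction; the block placement is an assumption, the `eta` parameter generalizes the η = 1 case described in the text, and the function names are hypothetical.

```python
N = 3  # grid is N x N PEs, each holding an N x N block

def outer_mode(delta, o):
    """Compute R = delta x O^T block by block, one block per PE."""
    size = N * N
    r_full = [[0.0] * size for _ in range(size)]
    for r in range(N):          # grid row: receives O block r (memories 14-16)
        for c in range(N):      # grid column: receives delta block c (memories 11-13)
            for k in range(N):
                for j in range(N):
                    # element (k, j) of the block formed inside this PE
                    r_full[c * N + k][r * N + j] = delta[c * N + k] * o[r * N + j]
    return r_full

def correct_weights(w, r_full, eta=1.0):
    """Add the correction eta * R to the stored weights (eta = 1 in the text)."""
    return [[w_ab + eta * r_ab for w_ab, r_ab in zip(w_row, r_row)]
            for w_row, r_row in zip(w, r_full)]
```

With `eta=1.0`, `correct_weights(w, outer_mode(delta, o))` corresponds to adding R directly to W as described above.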

[0035] As described above, the three matrix calculations are performed concurrently. In this embodiment, the calculations inside each PE pass through registers and can therefore be pipelined. Moreover, since the nine PEs can operate in parallel, pipelining in units of PEs along the row or column direction allows the matrix calculations to be executed very quickly.

[0036] The auxiliary arithmetic unit is used for the calculations other than matrix calculations, which are mainly of the following four kinds:
(1) applying the neuron characteristic function f in the forward-propagation calculation;
(2) computing the difference t − O when calculating the delta of the output layer;
(3) multiplying by the derivative g of the characteristic function when calculating the delta of the intermediate layer;
(4) multiplying by the derivative g of the characteristic function when calculating the delta of the output layer.
To perform these, the auxiliary arithmetic unit consists of memories holding lookup tables for the characteristic function f and its derivative g, an adder, a multiplier, a data memory, and so on.

[0037] The auxiliary arithmetic unit operates as follows.
(1) To apply the characteristic function f in forward propagation, it takes in the product-sum result I output by the adders 20 to 22, obtains f(I) from the lookup table of f, and writes the result to the memories 14 to 16.
(2) To compute the difference t − O for the output-layer delta, it reads O from the memories 14 to 16, uses its adder to compute the difference t − O from the t values stored in its internal data memory, and stores the result back in the data memory.
(3) To multiply by g for the intermediate-layer delta, it reads O from the memories 14 to 16, obtains g(O) from the lookup table of g, multiplies it in the multiplier by the s values output by the adders 17 to 19, and writes the resulting δ to the memories 11 to 13.
(4) To multiply by g for the output-layer delta, it reads O from the memories 14 to 16, obtains g(O) from the lookup table of g, multiplies it by the t − O values stored in the data memory, and writes the resulting δ to the memories 11 to 13.
In all of these calculations, the auxiliary arithmetic unit can operate in parallel with, and independently of, the matrix arithmetic unit.
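The four auxiliary operations can be mimicked with two lookup tables, a subtraction, and a multiplication. The sketch below is an illustration only: the table size, the input range, the sigmoid chosen for f, and every function name are assumptions, since the text does not specify how the tables are organized.

```python
import math

TABLE_SIZE = 256           # assumed number of table entries
X_MIN, X_MAX = -8.0, 8.0   # assumed input range covered by the f table

def _index(value, lo, hi):
    """Map a value in [lo, hi] to the nearest table index, clamping out-of-range inputs."""
    pos = (value - lo) / (hi - lo) * (TABLE_SIZE - 1)
    return max(0, min(TABLE_SIZE - 1, round(pos)))

# Table for f (sigmoid assumed), indexed by the product-sum I
F_TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + k * (X_MAX - X_MIN) / (TABLE_SIZE - 1))))
           for k in range(TABLE_SIZE)]
# Table for g, indexed by the output O in [0, 1]; g(O) = O * (1 - O) for a sigmoid
G_TABLE = [(k / (TABLE_SIZE - 1)) * (1.0 - k / (TABLE_SIZE - 1))
           for k in range(TABLE_SIZE)]

def apply_f(i_values):
    """Operation (1): look up f(I) for each product-sum result."""
    return [F_TABLE[_index(i, X_MIN, X_MAX)] for i in i_values]

def output_delta(o, t):
    """Operations (2) and (4): delta = g(O) * (t - O) for the output layer."""
    diff = [tj - oj for oj, tj in zip(o, t)]
    return [G_TABLE[_index(oj, 0.0, 1.0)] * d for oj, d in zip(o, diff)]

def hidden_delta(o, s):
    """Operation (3): delta = g(O) * s for the intermediate layer."""
    return [G_TABLE[_index(oj, 0.0, 1.0)] * sj for oj, sj in zip(o, s)]
```

A table lookup trades accuracy for speed: results agree with the exact functions only to within the table resolution, which is why a finer table or interpolation might be chosen in practice.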

[0038] As described above, in the neuroprocessor of the present invention, switching the selectors of each PE of the matrix arithmetic unit changes the connections among the multiplier, adder, memory, and registers, so that the configuration that computes most efficiently can be selected. Furthermore, each PE can pass its calculation results freely in the column direction and the row direction, and the adders provided at the end of each column and each row can cumulatively add the results in the column and row directions, so matrix calculations can be performed at high speed. In addition, because an auxiliary arithmetic unit for calculations other than matrix calculations is provided separately, matrix calculations and the other calculations can be executed in parallel, which eliminates idle hardware and allows a large volume of calculations to be processed at high speed. As a result, neural network inference and learning calculations can be performed at high speed with relatively small hardware, which makes it possible to broaden the range of fields in which learning is applied to create neural networks that behave as each user prefers.
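The three matrix calculations that the PE array is reconfigured for, I = W × O, s = W^T × δ, and R = δ × O^T, can be modeled in a few lines. The sketch below is a software analogy under stated assumptions: the function names are hypothetical, each list-comprehension term stands for the multiplication done in one PE, and which physical direction (row or column) each accumulation uses in the figures is not assumed here.

```python
# Software analogy of the PE array; all names are hypothetical.
# W[i][j] plays the role of the weight held in the memory of PE(i, j).


def forward(W, O):
    """I = W x O: PE(i, j) multiplies W[i][j] by O[j]; the products are
    accumulated along one direction of the array by the end-of-line adders."""
    return [sum(W[i][j] * O[j] for j in range(len(O))) for i in range(len(W))]


def backward(W, delta):
    """s = W^T x delta: PE(i, j) multiplies W[i][j] by delta[i]; the products
    are accumulated along the other direction, so W is never transposed."""
    rows, cols = len(W), len(W[0])
    return [sum(W[i][j] * delta[i] for i in range(rows)) for j in range(cols)]


def outer(delta, O):
    """R = delta x O^T: PE(i, j) multiplies delta[i] by O[j]; no accumulation
    is needed because every PE produces one element of R."""
    return [[d * o for o in O] for d in delta]


if __name__ == "__main__":
    W = [[1, 2], [3, 4]]
    print(forward(W, [1, 1]))
    print(backward(W, [1, 1]))
    print(outer([1, 2], [3, 4]))
```

The point of the analogy is that `forward` and `backward` read the same stored weights and differ only in the direction of accumulation, which is what switching the selectors achieves in hardware.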

[0039] Although this embodiment has been described using the calculation of a hierarchical neural network as an example, the PEs used here can execute arbitrary calculations efficiently by switching their selectors, so it is clear that the matrix calculations of a neural network of any structure can be performed at high speed. It also goes without saying that the matrix arithmetic unit of the present invention is effective for matrix calculations other than those of neural networks.

[0040] The neuroprocessor of the present invention may be a single chip, or may be built on the same chip as other processors, arithmetic blocks, memories, and the like.

[0041]

[Effects of the Invention] As is apparent from the above embodiment, the neuroprocessor of the present invention can perform various matrix calculations at high speed by switching the selectors that make up each processor element. It is therefore effective for executing neural network inference and learning calculations at high speed, and thereby provides a neuroprocessor whose hardware is significantly reduced in size.

[Brief description of drawings]

FIG. 1: (a) Block diagram of a neuroprocessor according to an embodiment of the present invention. (b) Block diagram of a processor element according to an embodiment of the present invention.

FIG. 2: Diagram showing the equation of matrix calculation (1), I = W × O.

FIG. 3: Diagram showing the equation of matrix calculation (2), s = W^T × δ.

FIG. 4: Diagram showing the equation of matrix calculation (3), R = δ × O^T.

FIG. 5: Diagram showing the configuration of a processor element for matrix calculation (1).

FIG. 6: Diagram showing the configuration of a processor element for matrix calculation (2).

FIG. 7: Diagram showing the configuration of a processor element for matrix calculation (3).

FIG. 8: Diagram for explaining a model of a hierarchical neural network.

FIG. 9: Diagram for explaining the function of a neuron.

[Explanation of symbols]

1: Matrix arithmetic unit
2 to 10: Processor elements
11 to 16: Memories
17 to 22: Adders
23: Auxiliary arithmetic unit
24: Multiplier
25: Adder
26: Memory
27 to 33: Selectors
34 to 40: Registers

Claims

[Claims]

1. A neuroprocessor comprising a matrix arithmetic unit that selectively performs the matrix calculations of a neural network, and an auxiliary arithmetic unit that operates independently of the matrix arithmetic unit, wherein the matrix calculations are assigned to the matrix arithmetic unit, the calculations other than matrix calculations are assigned to the auxiliary arithmetic unit, and the two units are used in parallel.

2. The neuroprocessor according to claim 1, wherein the matrix arithmetic unit comprises processor elements, each processor element comprising a multiplier, an adder, a memory, a register, a selector, and a plurality of output terminals, being able to change the connections among the multiplier, the adder, the memory, and the register by switching the contact position of the selector, and having a function of outputting data from one of the plurality of output terminals selected by switching the selector.

3. The neuroprocessor according to claim 2, wherein the matrix arithmetic unit has the processor elements arranged in a two-dimensional matrix and a memory provided on the input side of each row and each column of the processor elements to transmit data to the processor elements.

4. The neuroprocessor according to claim 2, wherein the matrix arithmetic unit has the processor elements arranged in a two-dimensional matrix and an adder provided on the output side of each row and each column of the processor elements, so that the results are cumulatively added in the row direction or the column direction.