JPH06139217A

JPH06139217A - Highly precise processing unit and method

Info

Publication number: JPH06139217A
Application number: JP29116492A
Authority: JP
Inventors: Masa Hashimoto; Takahiro Sakaguchi; Yuji Sato; Katsunari Shibata; Mitsuo Asai; Yoshihiro Kuwabara; Hiroshi Takayanagi; Takuo Okabashi; Tatsuo Ochiai; Keiji Mogi
Original assignee: Hitachi Microcomputer System Ltd; Hitachi Ltd
Current assignee: Hitachi Microcomputer System Ltd; Hitachi Ltd
Priority date: 1992-10-29
Filing date: 1992-10-29
Publication date: 1994-05-20

Abstract

PURPOSE: To let a neurocomputer that performs fixed-point, single-precision arithmetic carry out higher-precision arithmetic without increasing the circuit scale or lowering the arithmetic processing speed. CONSTITUTION: The neurocomputer 10 consists of a plurality of neuro boards 12 that perform neural network operations and a control board 11. The control board 11 consists of a neural network controller 13 that controls the neurocomputer 10, a control storage 15 that stores microprograms, and a global memory 16 that stores the values of input-layer neurons and the like. The neural network controller 13 decodes and executes the microprogram code stored in the control storage 15 to carry out neural network arithmetic at higher than single precision.

Description

Detailed Description of the Invention

[0001]

[Field of the Invention] The present invention relates to arithmetic processing in the neurons of a neurocomputer that performs neural network operations with fixed-point, single-precision arithmetic. More specifically, it relates to an information processing device and a high-precision arithmetic processing method that use the arithmetic units provided for single-precision processing to perform arithmetic at higher precision.

[0002]

[Description of the Related Art] At present, neural networks are mostly executed by software simulation on existing von Neumann computers. Such a simulation can generally raise the precision of neuron input and output values fairly easily by changing the program. However, it is often limited by the processing speed of neural network operation, so neurocomputers that realize neural network operation in hardware have been anticipated, and practical neurocomputers able to learn quickly on large-scale neural networks are under study. For example, in the neurocomputer presented in 1990 in an IEICE paper (NC90-12) titled "High-speed learning type neuro-WSI", digital neurons are connected by a bus-type network and communicate with one another by time division, so that the backpropagation learning method can be executed at high speed.

[0003]

[Problems to be Solved by the Invention] In a neurocomputer implemented in hardware, however, the bit width of the multiplier inputs cannot be made variable to suit the problem at hand in order to raise computational precision. One could widen the bit width of the multiplier inputs and outputs in advance, but widening the multiplier inputs for high-precision arithmetic that is generally expected to be used only rarely, or adding a dedicated circuit for high-precision arithmetic, lowers the utilization of the arithmetic circuits relative to chip area. Even if such a dedicated circuit made high-precision arithmetic faster than software simulation, the larger circuit would enlarge the chip area and raise chip cost. The problem to be solved by the present invention is therefore to provide a high-precision arithmetic processing method that performs fast, high-precision arithmetic without enlarging the circuit scale, and an information processing device that implements this method.

[0004]

[Means for Solving the Problems] As a first means, a CCR is provided in each neuron so that arithmetic more precise than single-precision processing can be performed. The CCR can detect the carry, overflow, sign, and zero states of the output value of each arithmetic unit, such as the multiplier and the adder, and of the multiplier words and multiplicand words prepared for high-precision arithmetic as described later. In addition, to perform the high-precision multiplication described later, each neuron is provided with a storage device that can hold the output values of the arithmetic units together with the multiplier words and multiplicand words, and with a bus arrangement that allows the stored values to be processed again by the arithmetic units. Further, to reduce coding errors in the microprograms, latches are provided so that the number of machine cycles needed for pipeline processing is the same when neuron state values are broadcast from the input layer to the intermediate layer as when they are broadcast from the intermediate layer to the output layer. The neural network is controlled by microprograms so that the network topology and the computational precision can be chosen freely.
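A minimal sketch of the four CCR conditions for a word-wide addition follows. It assumes 16-bit two's-complement words (16 bits matches the multiplicand lower word fetched in the embodiment; treating it as the ALU width is an assumption), and the name `add_with_flags` is hypothetical. Each flag is 1 when its condition holds, which is the encoding the embodiment gives for the CCR 35.

```python
W = 16  # assumed word width in bits


def add_with_flags(a: int, b: int, bits: int = W) -> tuple[int, dict[str, int]]:
    """Add two two's-complement words and derive CCR-style condition flags.

    Returns the signed `bits`-bit sum and a dict with C (carry out),
    V (signed overflow), N (negative result) and Z (zero result), each 0 or 1.
    """
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    ua, ub = a & mask, b & mask          # raw bit patterns of the operands
    raw = ua + ub                        # may carry into bit `bits`
    result = raw & mask
    flags = {
        "C": raw >> bits,                                   # carry out of the top bit
        "V": int(bool(~(ua ^ ub) & (ua ^ result) & sign)),  # same-sign inputs, different-sign result
        "N": int(bool(result & sign)),                      # sign bit of the result
        "Z": int(result == 0),                              # result is zero
    }
    signed = result - (1 << bits) if result & sign else result
    return signed, flags


if __name__ == "__main__":
    # Reading a lower word's MSB through the sign flag by adding it to zero.
    _, f = add_with_flags(0, 0xABCD)
    print(f["N"])  # 1, because bit 15 of 0xABCD is set
```

The last two lines mirror how the embodiment later (machine cycles 3 and 4) obtains the most significant bit of the multiplicand lower word: it adds the word to zero and takes the sign condition from the CCR.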

[0005] As a second means, a high-precision multiplication method is used for arithmetic at higher than single precision. So that multipliers and multiplicands wider than the input bit width of the multiplier can still be multiplied by the single-precision multiplier already provided in each neuron, the multiplier and the multiplicand are each divided into several words, none wider than the multiplier's input bit width; the resulting multiplier words and multiplicand words are multiplied together, and the product is computed from the results. When the multiplier and multiplicand (the multiplier's input values) are represented in two's complement, the divided multiplier and multiplicand words must be preprocessed. Two preprocessing methods are given below.

[0006]
1. Add the most significant bit of the lower word to the least significant bit of the upper word of the divided multiplier and multiplicand words. The CCR is used to detect the state of the most significant bit of the lower word.
2. Fill every word other than the most significant word of the divided multiplier and multiplicand words with zeros from the most significant bit side. If the value is negative, the most significant word is sign-extended.
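The two preprocessing methods can be checked with a short Python sketch. It models words as Python integers; the function names and the parameters `w` (lower-word width) and `e` (number of effective bits kept in a zero-filled lower word) are assumptions for illustration. Both methods preserve the identity x == hi * 2**k + lo; they differ in how the lower word looks to a signed multiplier.

```python
def to_signed(value: int, bits: int) -> int:
    """Read the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_method1(x: int, w: int) -> tuple[int, int]:
    """Method 1: add the lower word's MSB to the upper word's LSB.

    The lower word keeps all w bits and is read as signed, the way a signed
    multiplier would see it; the +MSB on the upper word compensates, so
    x == hi * 2**w + lo.
    """
    lo_bits = x & ((1 << w) - 1)     # raw lower w bits
    msb = lo_bits >> (w - 1)         # MSB of the lower word (0 or 1)
    hi = (x >> w) + msb              # arithmetic shift, then add the MSB
    return hi, to_signed(lo_bits, w)


def split_method2(x: int, e: int) -> tuple[int, int]:
    """Method 2: keep e bits in the lower word and fill the rest with zeros.

    The lower word is therefore never negative; the upper word carries the
    sign (Python's >> sign-extends negative values), so x == hi * 2**e + lo.
    """
    lo = x & ((1 << e) - 1)          # e effective bits, zero-filled above
    hi = x >> e                      # sign-extended upper word
    return hi, lo


if __name__ == "__main__":
    print(split_method1(-12345, 16))  # (0, -12345)
    print(split_method2(-12345, 8))   # (-49, 199)
```

Method 1 keeps the full lower word but lets it be negative, so the upper word absorbs a +1 whenever the lower word's MSB is set, and that +1 can itself overflow the upper word. Method 2 gives up some bits of the lower word so that it is never negative, which only works if e is smaller than the multiplier's input width.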

[0007] The words preprocessed by either method are stored in the storage device and multiplied pairwise by the multiplier again; adding the partial products with their digits aligned yields a product more precise than single-precision multiplication. The product is represented by the multiple words obtained by adding the partial products. Because, as in single-precision processing, the arithmetic is performed word by word on the divided words, shift processing and ALU processing can also be performed at high precision.
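The overall idea of [0005] and [0007] (split both operands into words no wider than the multiplier, multiply every pair of words, and add the partial products with their digits aligned) can be sketched as below. This is a generic schoolbook version, not the exact sequence of the embodiment: it assumes a 16-bit signed multiplier, splits every operand in the style of method 2 with 15 effective bits per lower word, and sums into a single Python integer rather than a multi-word register. `mul_single`, `split_words`, and `multiword_mul` are hypothetical names.

```python
W = 16  # assumed input width of the signed single-precision multiplier
E = 15  # assumed effective bits per lower word (must be < W so it stays non-negative)


def mul_single(a: int, b: int) -> int:
    """Signed single-precision multiply; rejects operands wider than W bits."""
    lim = 1 << (W - 1)
    if not (-lim <= a < lim and -lim <= b < lim):
        raise ValueError("operand does not fit the single-precision multiplier")
    return a * b


def split_words(x: int, n: int) -> list[int]:
    """Split x into n words, least significant first (method-2 style)."""
    words = []
    for _ in range(n - 1):
        words.append(x & ((1 << E) - 1))  # zero-filled lower word
        x >>= E                           # arithmetic shift keeps the sign
    words.append(x)                       # most significant word carries the sign
    return words


def multiword_mul(x: int, y: int, n: int = 2) -> int:
    """Multiply every pair of words and add the partial products with digits aligned."""
    xs, ys = split_words(x, n), split_words(y, n)
    product = 0
    for i, xi in enumerate(xs):
        for j, yj in enumerate(ys):
            product += mul_single(xi, yj) << (E * (i + j))  # align by word position
    return product
```

With `n=2`, operands in the range [-2**30, 2**30) fit; each extra word adds 15 bits of range at the cost of more partial products (n * n multiplications). An operand whose top word does not fit raises `ValueError`, which stands in for the situation the patent avoids by choosing the word split in advance.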

[0008]

[Operation] With the first means, the carry, overflow, sign, and zero states can be detected for the output value of each arithmetic unit, such as the multiplier and the adder, and for the multiplier and multiplicand words prepared for high-precision arithmetic; in addition, the output values of the arithmetic units and the multiplier and multiplicand words can be stored temporarily and processed again by the arithmetic units. This makes the high-precision arithmetic processing method of the second means possible. Moreover, because high-precision multiplication, like single-precision arithmetic, is performed by decoding and executing microprograms, the same neuron architecture can perform both single-precision and high-precision arithmetic. And because no new circuit dedicated to high-precision arithmetic is added, neither the circuit scale nor the chip area grows compared with the circuitry for single-precision arithmetic.

[0009] With the second means, the input and output data of each arithmetic unit can be held as multiple words, so their bit width can exceed that of the input and output data of each arithmetic unit in single-precision arithmetic. In high-precision multiplication on such data, applying the preprocessing described in the second means yields a product more precise than single precision. Likewise, in shift and ALU processing, the input and output data are wider than those used in single-precision arithmetic and are processed as multiple divided words, so the computation is more precise than single-precision arithmetic.

[0010]

[Embodiments] The following describes the configuration of a device that enables higher-precision arithmetic in a neurocomputer performing fixed-point, single-precision arithmetic, and a high-precision multiplication method using this device; finally, the operation of high-precision multiplication using both is described in detail.

[0011] First, as an example of a neurocomputer using the present invention, an outline of the neurocomputer is given.

[0012] FIG. 1 is a block diagram of a neurocomputer using the present invention. The neurocomputer 10 consists of a plurality of neuro boards 12 that perform neural network operations and a control board 11. The control board 11 consists of a neural network controller 13, a control storage 15, and a global memory 16. The neural network controller 13 controls the neurocomputer 10 by decoding and executing the microprogram code stored in the control storage 15. The neural network controller 13 consists of a control storage control circuit 14a, which decodes and executes the microprogram code stored in the control storage 15; an external bus control circuit 14b, which controls an external bus provided as the means of communication between a workstation 18 and the neurocomputer 10; a neuron control circuit 14c, which controls the neurons in the neurochips 17; and a global memory control circuit 14d, which handles the storage of input-layer neuron values and the like. The external bus is connected to a memory, a CPU, and a SCSI board, and the SCSI board is connected to the workstation 18, which has a microgenerator that generates the microprogram code. Since the present invention concerns pipeline processing between the control board 11 and the neuro boards 12 and high-precision arithmetic in the neurochips 17, the pipeline processing of the neural network and the neurochip 17 are described below.

[0013] In the present invention, to prevent coding errors in the microprogram code that controls the neurocomputer 10, the number of machine cycles needed to broadcast neuron state values from the input layer to the intermediate layer is set equal to the number needed to broadcast them from the intermediate layer to the output layer. With this setting, nine machine cycles of NOP (no operation) arise while the neuron 20 fetches an input value during high-precision arithmetic, and the multiplicand preprocessing for the high-precision multiplication described later is carried out during this period. This pipeline configuration is therefore described below.

[0014] FIG. 2 illustrates the pipeline configuration and shows the internal structure of the neurochip 17 of FIG. 1. The neurochip 17 of FIG. 2 consists of a plurality of neurons 20, which correspond to the nerve cells of a biological brain.

[0015] First, the pipeline configuration for neural network operation is described. In FIG. 2, the pipelines drawn with broken lines carry control signals, and those drawn with solid lines carry data. The number attached to each pipeline stage gives, in time order, the machine cycle count from when the control storage control circuit 14a decodes the microprogram stored in the control storage 15 until it is executed.

[0016] First, the number of machine cycles needed when a neuron 20 broadcasts a value to another neuron 20 is described. In FIG. 2, seven machine cycles are needed from when the control storage control circuit 14a decodes the instruction that makes the neuron 20 output a value until the resulting control signal reaches the neuron 20. The neuron 20 then outputs the value, which passes through the control board 11 and the neural network controller 13 and then through the neuro board 12 and the neurochip 17 containing the other neuron 20; a total of 16 machine cycles elapse before it is input to the other neuron 20.

[0017] Second, the values of input-layer neurons are stored in the global memory 16, and the number of machine cycles until such a value is stored in a neuron 20 is described. In FIG. 2, five machine cycles are needed from when the control storage control circuit 14a decodes the instruction that makes the global memory 16 output a value until the control signal reaches the global memory 16. Then, for the reason given above, to make this cycle count equal to the first one, a redundant pipeline is provided in the neural network controller 13 so that a total of 16 machine cycles are needed until the value output by the global memory 16 is input to the neuron 20.

[0018] With the above pipeline configuration, the number of machine cycles needed to broadcast neuron state values from the input layer to the intermediate layer can be set equal to the number needed to broadcast them from the intermediate layer to the output layer. This configuration also has the following consequence. Suppose a neuron 20 performs a multiplication using a value from the global memory 16 or from a neuron 20 as the multiplier. Seven machine cycles elapse from decoding the microcode in the control storage 15 until the control signal reaches the neuron 20, while 16 machine cycles are needed for the value to travel from the global memory 16 or a neuron 20 to the other neuron 20. Hence a NOP state of nine machine cycles, the difference between the two, arises between the fetch of the multiplier by the multiplier unit in the neuron 20 and the execution of the multiply instruction. The preprocessing for the high-precision multiplication described later is performed during this NOP state.
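The nine-cycle window is simply the difference between the two latencies given above (the symbols are introduced here for illustration only):

```latex
\begin{aligned}
T_{\mathrm{ctrl}} &= 7  && \text{decode in 14a until the control signal reaches neuron 20}\\
T_{\mathrm{data}} &= 16 && \text{value from global memory 16 or another neuron 20 to neuron 20}\\
T_{\mathrm{NOP}}  &= T_{\mathrm{data}} - T_{\mathrm{ctrl}} = 16 - 7 = 9
\end{aligned}
```

Because the redundant pipeline in [0017] pads the global-memory path to the same 16 cycles, this window is the same whether the multiplier comes from the input layer or from another neuron.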

[0019] Next, the detailed internal structure of the neuron 20, which has the means for performing the high-precision multiplication described later, is described.

[0020] FIG. 3 is a block diagram of the neuron 20 in FIG. 2. Data enter and leave the neuron 20 through the input/output bus. Instructions for the neuron 20 enter through the instruction bus and are passed to each logic circuit via the control circuit. The data storage devices are a register file 30, a RAM 31a, and a RAM 31b; the register file 30 can store any data placed on the F bus. It can therefore hold the words prepared for high-precision arithmetic described later, and feeding those values back into the arithmetic units makes the high-precision multiplication described later possible. The flip-flop (FF) group for multi-stage pipeline processing consists of five flip-flops, FF32a, FF32b, FF32c, FF32d, and FF32e, and the neuron 20 has seven internal buses: A, B, C, D, E, F, and G. Of these, the present invention provides the E bus to enable the high-precision multiplication described later. With the E bus, in single-precision arithmetic the product from the multiplier 33 can be shifted directly by the shifter 36 without passing through the ALU 34 and FF32d. In high-precision arithmetic, the same E bus lets the product from the multiplier 33 pass through the shifter 36 unshifted and be stored in the register file 30 even while the ALU 34 is busy; the stored value can then be processed again by the ALU 34 or the multiplier 33, which enables the high-precision arithmetic described later. Further, in the present invention the CCR 35 is used not only to detect the state of each value in single-precision arithmetic but also on the words into which a value wider than single precision is divided for high-precision arithmetic, which enables the high-precision arithmetic processing described later.

[0021] Next, the preprocessing for high-precision multiplication and the high-precision arithmetic using the neuron circuit described above are explained. Because high-precision multiplication, as explained below, involves not only multiplication but also addition, shifting, and other operations, it is used as the example here, with reference to FIG. 4. FIG. 4 shows how a multiplicand divided into two words, a multiplicand upper word 40a and a multiplicand lower word 40b, is multiplied by a multiplier likewise divided into two words, a multiplier upper word 41a and a multiplier lower word 41b. The products of the individual words are partial products 42, 43, 44, and 45.

[0022] Assume first that, in the arithmetic processing device considered here, the multiplicand, the multiplier, and the product are all wider than the bit width of the multiplier unit. Under this assumption the conventional method cannot perform the multiplication, so the multiplier and the multiplicand are each divided into several words (in FIG. 4, each into two words). First, so that the multiplication remains correct even though the arithmetic unit in the neurochip treats the most significant bit of the multiplicand lower word 40b as a sign bit, the following preprocessing for high-precision multiplication is applied to the multiplicand upper word 40a: the most significant bit of the multiplicand lower word 40b is added to the least significant bit of the multiplicand upper word 40a. Next, the divided words are multiplied, four times in the example of FIG. 4, and the results are added with their digits aligned. Specifically, the four multiplications are as follows.

[0023]
1. Multiplicand lower word 40b × multiplier lower word 41b = partial product 42
2. Multiplicand lower word 40b × multiplier upper word 41a = partial product 43
3. Multiplicand upper word 40a × multiplier lower word 41b = partial product 44
4. Multiplicand upper word 40a × multiplier upper word 41a = partial product 45

The four partial products are added with their digits aligned, as follows.

[0024]
Partial product 48b = partial product 42 + partial product 43
Partial product 48a = partial product 44 + partial product 45 (+ multiplier upper word 47 + multiplier lower word 46)

The parenthesized operation in the calculation of partial product 48a is performed only if the sum overflowed when the most significant bit of the multiplicand lower word 40b was added to the least significant bit of the multiplicand upper word 40a.
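Written out with explicit weights, the digit alignment in [0023] and [0024] is the following identity. The notation is introduced here for illustration: X_H and X_L are the multiplicand words 40a and 40b after preprocessing, Y_H and Y_L are the multiplier words 41a and 41b, w is the width of the multiplicand lower word, and e is the number of effective bits of 41b. The shifts by e correspond to flows 54 and 55 of FIG. 5; the parenthesized correction term of [0024] is left out because it applies only when the preprocessing of 40a overflowed.

```latex
\begin{aligned}
XY &= \left(X_H 2^{w} + X_L\right)\left(Y_H 2^{e} + Y_L\right) \\
   &= \underbrace{X_L Y_L}_{P_{42}} + 2^{e}\,\underbrace{X_L Y_H}_{P_{43}}
      + 2^{w}\Bigl(\underbrace{X_H Y_L}_{P_{44}} + 2^{e}\,\underbrace{X_H Y_H}_{P_{45}}\Bigr) \\
   &= P_{48b} + 2^{w} P_{48a},
   \qquad P_{48b} = P_{42} + 2^{e} P_{43},\quad P_{48a} = P_{44} + 2^{e} P_{45}
\end{aligned}
```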

[0025] For the multiplier, too, the most significant bit of the lower word could be added to the least significant bit of the upper word, as for the multiplicand, to allow more precise multiplication, but this embodiment does not do so, for the following reason. The effective bits of the multiplier upper word 41a and the multiplier lower word 41b in FIG. 4 can be adjusted appropriately. (Effective bits: the bit widths of the multiplier lower word 41b and the multiplier upper word 41a can be adjusted so that partial products 48b and 48a do not overflow when added. For example, even if the multiplier lower word 41b can be up to 10 bits wide, it is limited to 8 bits, with its upper 2 bits filled with 0, to prevent overflow; in this case the 8 bits other than the zero-filled bits are defined as the effective bits.) This adjustment prevents overflow when the partial products are added, so there is no need to align the overflowing bits with the bits of another word and add them there, which saves the corresponding processing time. From the result of this addition, as with partial products 48b and 48a in FIG. 4, the bits where the words overlap are collected as the product lower word 49b, and the overflow from them becomes the product upper word 49a; together these form the final multiplication result (product). With this multiplication method, the bit widths of the multiplier, the multiplicand, and the product can be extended, allowing high-precision multiplication.

[0026] Next, the multiplication shown in FIG. 4 is described in time order.

[0027] FIG. 5 is an outline flowchart of high-precision multiplication; the letters in the figure correspond to those in FIG. 4. First, before the multiplier reaches the neuron 20 as described with reference to FIG. 2, the multiplicand preprocessing shown in flows 50 and 51 of FIG. 5 is performed. Next, in flow 52, partial product 42 of FIG. 4 is calculated, and the multiplier lower word 41b is stored in the register file 30 as the multiplier lower word 46. In flow 53, partial product 44 is calculated. In flow 54, partial product 43 is calculated, and the multiplier upper word 41a is stored in the register file 30 as the multiplier upper word 47; further, partial product 43 is shifted left by the number of effective bits of the multiplier lower word 41b, partial product 42 is added to it, and the sum becomes partial product 48b. In flow 55, partial product 45 is calculated, shifted left by the number of effective bits of the multiplier lower word 41b, and added to partial product 44; the sum becomes partial product 48a. Next, flow 56 checks whether the overflow detection in flow 51 found an overflow. If it did, flow 57 adds to partial product 48a the multiplier lower word 46 and the multiplier upper word 47 shifted left by the number of effective bits of the multiplier lower word 41b, and this sum again becomes partial product 48a. If flow 56 finds no overflow, processing proceeds to flow 58. In flow 58, the overlapping bits of partial products 48b and 48a are added to form the product lower word 49b, the overflow from this addition is added to the corresponding bits of partial product 48a, and the sum becomes the product upper word 49a.
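The flow of FIG. 5 can be modeled end to end in Python to check that the partial products recombine into the exact product. This is a sketch under stated assumptions, not the microprogram: a 16-bit signed multiplier, a multiplicand in [-2**31, 2**31) split by method 1 into two 16-bit words, a multiplier in [-2**23, 2**23) split by method 2 with 8 effective bits in 41b (the example width from [0025]), and Python integers standing in for registers. The text does not spell out the register alignment of the flow-57 correction, so the code uses the weight that makes the identity exact in this model (the multiplier value times 2**16, added to 48a); that weighting, the 32-bit split into 49a and 49b, and all function names are assumptions.

```python
W = 16  # assumed single-precision multiplier input width (bits)
E = 8   # assumed effective bits of the multiplier lower word 41b


def wrap_signed(v: int, bits: int = W) -> tuple[int, bool]:
    """Wrap v into the signed `bits`-bit range; also report whether it overflowed."""
    r = v & ((1 << bits) - 1)
    if r >> (bits - 1):
        r -= 1 << bits
    return r, r != v


def mul_single(a: int, b: int) -> int:
    """Model of the signed single-precision multiplier 33."""
    lim = 1 << (W - 1)
    assert -lim <= a < lim and -lim <= b < lim, "operand exceeds multiplier width"
    return a * b


def high_precision_mul(x: int, y: int) -> tuple[int, int]:
    """Return (49a, 49b) with x * y == 49a * 2**32 + 49b.

    Assumes -2**31 <= x < 2**31 and -2**23 <= y < 2**23.
    """
    # Flows 50-51: method 1 on the multiplicand, noting any overflow of 40a.
    lo_bits = x & ((1 << W) - 1)
    xl, _ = wrap_signed(lo_bits)                                 # 40b, read as signed
    xh, overflow = wrap_signed((x >> W) + (lo_bits >> (W - 1)))  # 40a + MSB of 40b
    # Method 2 on the multiplier: zero-filled lower word with E effective bits.
    yl = y & ((1 << E) - 1)                         # 41b, kept as word 46
    yh = y >> E                                     # 41a, kept as word 47
    p42 = mul_single(xl, yl)                        # flow 52
    p44 = mul_single(xh, yl)                        # flow 53
    p48b = (mul_single(xl, yh) << E) + p42          # flow 54: 43 << E, plus 42
    p48a = (mul_single(xh, yh) << E) + p44          # flow 55: 45 << E, plus 44
    if overflow:                                    # flows 56-57
        p48a += ((yh << E) + yl) << W               # assumed weight: y * 2**W
    # Flow 58: align 48a over 48b, add, and split into 49a / 49b.
    total = (p48a << W) + p48b
    return total >> (2 * W), total & ((1 << (2 * W)) - 1)


if __name__ == "__main__":
    hi, lo = high_precision_mul(0x7FFF8000, -1234567)  # exercises flows 56-57
    assert (hi << 32) + lo == 0x7FFF8000 * -1234567
```

The multiplicand 0x7FFF8000 is a useful test value: its lower word has the MSB set, so adding that bit to the upper word 0x7FFF overflows and forces the correction path.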

[0028] Next, using the neuron architecture of FIG. 2 and the high-precision multiplication method of FIG. 4, the pipelined high-precision multiplication is broken down into its operations in each machine cycle and described in detail in correspondence with the flow of FIG. 5.

[0029] In the following, each itemized statement corresponds to microprogram code, and the number at the start of each statement is the number of machine cycles elapsed since the high-precision multiplication started. The flow of data is described with reference to FIG. 3.

[0030] Machine cycles 0 to 8 correspond to flows 50 to 51 of FIG. 5.
0. Broadcast the multiplier lower word.
1. NOP (NO OPERATION: a state in which no instruction is given to the neuron).
2. Fetch the multiplicand lower word (16 bits) from RAMs 31a and 31b and store it in FF32c. Reset FF32d.
3. Add FF32d and FF32c in ALU 34 and record the condition of the sum in CCR 35. CCR 35 can record four states of the data at that moment: carry, overflow, sign, and whether the value is 0. Each CCR 35 flag is 1 when a carry occurs, when an overflow occurs, when the sign is negative, or when the value is 0, respectively, and 0 otherwise.
4. In ALU 34, add FF32d, 0, and the sign flag of CCR 35. Store the sum in FF32d.
5. Logically shift the value of FF32d left by 8 bits and store the result in FF32d. Shifter 36 supports ten shift amounts: 0, 1, 2, 4, 8, -1, -2, -4, -8, and -16.
6. Logically shift the value of FF32d left by 8 bits and store the result at address 13 of register file 30 (hereafter written RF30[13], with the address in brackets). Broadcast the multiplier lower word.
7. Store the value of RF30[13] in FF32b, and store the multiplicand upper word from RAMs 31a and 31b in FF32c.
8. Add FF32b and FF32c, record the condition of the sum in CCR 35, and store the sum in RF30[13].
Machine cycles 9 to 14 correspond to flow 52 of FIG. 5.
9. Store the multiplier lower word in FF32b and the multiplicand lower word in FF32c. Reset FF32d. Broadcast the multiplier upper word.
10. This is a NOP in the microprogram, but the multiplication is in progress and the product is stored in FF32e.
11. Add FF32d and FF32e and store the sum in RF30[11].
12. Arithmetically shift FF32b (multiplier lower word) right by 16 bits and store the result in FF32d.
13. Logically shift FF32d right by 4 bits and store the result in FF32d.
14. Logically shift FF32d right by 2 bits and store the result in RF30[12].
Machine cycles 15 to 17 correspond to flow 53 of FIG. 5.
15. Store the multiplier lower word in FF32b and RF30[13] (multiplicand upper word) in FF32c. Reset FF32d. Broadcast the multiplier upper word.
16. This is a NOP in the microprogram, but the multiplication is in progress and the product is stored in FF32e.
17. In ALU 34, add FF32d (value 0) and FF32e (product) and store the sum in RF30[10].
Machine cycles 18 to 23 correspond to flow 54 of FIG. 5.
18. Store the multiplier upper word in FF32b and the multiplicand lower word in FF32c.
19. This is a NOP in the microprogram, but the multiplication is in progress and the product is stored in FF32e.
20. Arithmetically shift FF32b (multiplier upper word) right by 16 bits and store the result in FF32d.
21. Logically shift FF32d right by 2 bits and store the result in RF30[9].
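The preprocessing of cycles 2 to 8 can be sketched at register level. The sketch assumes 32-bit registers in which each 16-bit word sits in the upper half (top-aligned), which is consistent with shifting the sign bit left by 16 before adding it to the upper word, and it assumes that ALU overflow processing clamps toward the sign of the exact sum. Only the four CCR flags and the ten shift amounts of shifter 36 come from the text; `alu_add`, `logical_shift`, and `preprocess_multiplicand` are hypothetical names. Because the largest left shift shifter 36 supports is 8, the 16-bit shift takes two passes (cycles 5 and 6).

```python
MASK32 = 0xFFFF_FFFF
INT32_MAX, INT32_MIN = 2**31 - 1, -(2**31)
SHIFTER_36_AMOUNTS = (0, 1, 2, 4, 8, -1, -2, -4, -8, -16)  # listed in the text


def to_s32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def alu_add(a: int, b: int, cin: int = 0) -> tuple[int, dict]:
    """32-bit add with overflow processing; returns (sum, CCR flags).
    Each flag is 1 on carry, on overflow, if negative, if zero."""
    raw = (a & MASK32) + (b & MASK32) + cin
    exact = to_s32(a) + to_s32(b) + cin
    overflow = not (INT32_MIN <= exact <= INT32_MAX)
    result = to_s32(raw)
    if overflow:  # assumed: clamp toward the sign of the exact sum
        result = INT32_MAX if exact > 0 else INT32_MIN
    ccr = {"carry": raw >> 32, "overflow": int(overflow),
           "sign": int(result < 0), "zero": int(result == 0)}
    return result, ccr


def logical_shift(x: int, amount: int) -> int:
    """One pass of shifter 36: positive amount = left, negative = right."""
    if amount not in SHIFTER_36_AMOUNTS:
        raise ValueError(f"shifter 36 cannot shift by {amount}")
    u = x & MASK32
    return to_s32(u << amount if amount >= 0 else u >> -amount)


def preprocess_multiplicand(upper16: int, lower16: int) -> tuple[int, dict]:
    """Cycles 2-8: add the MSB of the lower word to the LSB of the upper word."""
    ff32c = (lower16 & 0xFFFF) << 16           # cycle 2: lower word, top-aligned
    ff32d = 0                                  #          reset FF32d
    _, ccr = alu_add(ff32d, ff32c)             # cycle 3: sign flag = MSB
    ff32d, _ = alu_add(ff32d, 0, ccr["sign"])  # cycle 4: 0 + 0 + sign flag
    ff32d = logical_shift(ff32d, 8)            # cycle 5: first 8-bit pass
    rf30_13 = logical_shift(ff32d, 8)          # cycle 6: second 8-bit pass
    ff32c = (upper16 & 0xFFFF) << 16           # cycle 7: upper word, top-aligned
    return alu_add(rf30_13, ff32c)             # cycle 8: sum and CCR flags


value, flags = preprocess_multiplicand(0x7FFF, 0x8000)
print(hex(value), flags["overflow"])  # overflow case handled later by flow 57
```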

[0031] 22. Logically shift FF32e (product) right by 2 bits (because the product arriving on the E bus is packed toward the upper bits, this effectively amounts to an 8-bit shift). Store the result in FF32d. Store RF30[11] in FF32c.
23. Add FF32d and FF32c in ALU 34 and store the sum in RF30[11].
Machine cycles 24 to 27 correspond to flow 55 of FIG. 5.
24. Store the multiplier upper word in FF32b and RF30[13] (multiplicand upper word) in FF32c.
25. This is a NOP in the microprogram, but the multiplication is in progress and the product is stored in FF32e.
26. Logically shift FF32e (product) right by 2 bits (effectively an 8-bit shift, since the product from the E bus is packed toward the upper bits). Store the result in FF32d. Store RF30[10] in FF32c.
27. Add FF32d and FF32c in ALU 34 and store the sum in RF30[10].
Machine cycles 28 to 30 correspond to flows 56 to 57 of FIG. 5.
28. If the overflow flag of CCR 35 is set, store RF30[10] in FF32b and RF30[12] in FF32c.
29. If the overflow flag of CCR 35 is set, add FF32b and FF32c in ALU 34 and store the sum in FF32d. Store RF30[9] in FF32c.
30. If the overflow flag of CCR 35 is set, add FF32d and FF32c in ALU 34 and store the sum in RF30[10].
Machine cycles 31 to 44 correspond to flow 58 of FIG. 5.
31. Store RF30[11] in FF32b.
32. Add FF32b and 0 in ALU 34 and record the condition of the sum in CCR 35.
33. If the sign flag of CCR 35 indicates a negative value, store RF30[10] in FF32b and RF30[a] in FF32c (RF30[a] is loaded in advance with the hexadecimal value ffff0000).
34. If the sign flag of CCR 35 indicates a negative value, add FF32b and FF32c in ALU 34 and store the sum in RF30[10].
35. Store RF30[10] in FF32b.
36. Add FF32b and 0 in ALU 34 and store the sum in FF32d.
37. Logically shift the value of FF32d left by 8 bits with shifter 36 and store the result in FF32d.
38. Logically shift the value of FF32d left by 8 bits with shifter 36 and store the result in FF32d. Store RF30[11] in FF32c.
39. Add FF32d and FF32c in ALU 34. The carry of the sum is recorded in CCR 35, but overflow processing of the sum (setting it to the maximum value) is not performed. Store the sum in RF30[11]. Reset FF32d.
40. In ALU 34, add FF32d (0), 0, and the carry flag of CCR 35, and store the sum in RF30[13].
41. Store RF30[10] in FF32b and RF30[a] in FF32c.
42. Take the logical AND of FF32b and FF32c in ALU 34 and store the result in FF32d.
43. Arithmetically shift the value of FF32d right by 16 bits with shifter 36 and store the result in FF32d. Store RF30[13] in FF32c.
44. Add FF32d and FF32c in ALU 34 and store the sum in RF30[10].
This completes the multiplication: the upper word of the product is stored in RF30[10] and the lower word in RF30[11]. In this way, using the neuron architecture of FIG. 2 and the high-precision multiplication method of FIGS. 4 and 5 in a neurocomputer that performs fixed-point, single-precision arithmetic makes high-precision arithmetic possible.
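The combination step of cycles 31 to 44 can likewise be sketched on 32-bit words. The sketch assumes that 48a and 48b are 32-bit two's-complement values and that the intended product is 48a·2^16 + 48b, the 16-bit overlap implied by the two 8-bit left shifts of cycles 37 and 38; `combine_48a_48b` is a hypothetical name. The upper result word is returned signed and the lower word unsigned.

```python
MASK32 = 0xFFFF_FFFF
RF30_A = 0xFFFF_0000  # constant preloaded into RF30[a]


def to_s32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x


def combine_48a_48b(p48a: int, p48b: int) -> tuple[int, int]:
    """Flow 58 (cycles 31-44). Returns (RF30[10], RF30[11])."""
    rf10, rf11 = to_s32(p48a), to_s32(p48b)
    # Cycles 31-34: a negative 48b sign-extends as -1 into the upper word;
    # adding ffff0000 (i.e. -2**16) to 48a accounts for that in advance.
    if rf11 < 0:
        rf10 = to_s32(rf10 + RF30_A)
    # Cycles 35-38: shift 48a left by 16 as two 8-bit shifter passes.
    ff32d = (rf10 << 8) & MASK32
    ff32d = (ff32d << 8) & MASK32
    # Cycle 39: unsigned add without saturation; the carry goes to CCR 35.
    total = ff32d + (rf11 & MASK32)
    carry, lower = total >> 32, total & MASK32
    # Cycles 41-43: keep the upper half of 48a, arithmetic shift right by 16.
    high_part = to_s32(rf10 & RF30_A) >> 16
    # Cycles 40 and 44: add the saved carry to form the upper word.
    upper = to_s32(high_part + carry)
    return upper, lower


upper, lower = combine_48a_48b(1, -1)
print(upper * 2**32 + lower)  # equals 1 * 2**16 - 1
```

Adding ffff0000 before the upper half of 48a is extracted lowers the upper result word by exactly 1, which is the contribution the sign extension of a negative 48b would otherwise make.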

[0032]

[Effects of the Invention] As described above, the present invention provides the following effects. By dividing the multiplier and the multiplicand, each of which conventionally consisted of a single word, into multiple words and thereby widening the bit width, arithmetic of higher precision than before can be achieved. Moreover, because no special circuit dedicated to high-precision arithmetic is added, the circuit scale does not grow and the chip area does not increase, so the chip cost does not increase.

[0033] Furthermore, because flip-flops and multiple buses are provided before and after each arithmetic unit, values involved in high-precision arithmetic, such as the words obtained by dividing the multiplier and the multiplicand, can be fed to any arithmetic unit and processed there, which makes high-precision arithmetic possible. In addition, pipeline processing lets the arithmetic units, for example the ALU and the multiplier, operate in parallel. As a result, not only ordinary single-precision arithmetic but also high-precision arithmetic can process high-precision data in parallel on the ALU and CCR during a high-precision multiplication, just as in single-precision arithmetic. When, as shown in FIG. 6, the user determines the form of the neural network on the workstation 18 of FIG. 1 through the visual user interface, the single-precision microprogram is uniquely determined. Likewise, once the form of the neural network is uniquely determined, the microprogram for high-precision arithmetic is also uniquely determined. Therefore, by having the microgenerator generate a microprogram of the precision specified by the user and using the neuron architecture and high-precision arithmetic processing method of the present invention, the user can set both the form of the neural network and the arithmetic precision through the visual user interface, and the neural network operation is carried out with the desired precision.

[Brief description of drawings]

FIG. 1 is a configuration diagram of the neurocomputer.

FIG. 2 is a pipeline configuration diagram.

FIG. 3 is a block diagram of a neuron.

FIG. 4 is a conceptual diagram of the high-precision multiplication method.

FIG. 5 shows the high-precision multiplication flow.

FIG. 6 shows the user interface.

[Explanation of symbols]

10 ... neurocomputer, 11 ... control board, 12 ... neuro board, 13 ... neural network controller, 15 ... control storage, 16 ... global memory, 17 ... neurochip, 20 ... neuron, 30 ... register file, 31a ... RAM, 31b ... RAM, 32a ... flip-flop, 32b ... flip-flop, 32c ... flip-flop, 32d ... flip-flop, 32e ... flip-flop, 33 ... multiplier, 34 ... ALU, 35 ... condition code register, 36 ... shifter, 40a ... multiplicand upper word, 40b ... multiplicand lower word, 41a ... multiplier upper word, 41b ... multiplier lower word, 42 ... partial product, 43 ... partial product, 44 ... partial product, 45 ... partial product, 46 ... multiplier lower word, 47 ... multiplier upper word, 48a ... partial product, 48b ... partial product, 49a ... product upper word, 49b ... product lower word.

Continuation of the front page
(72) Inventor: Takahiro Sakaguchi, c/o Hitachi ULSI Engineering Corp., 5-20-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Yuji Sato, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Katsunari Shibata, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Mitsuo Asai, c/o Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Yoshihiro Kuwabara, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Hiroshi Takayanagi, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Takuo Okabashi, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Tatsuo Ochiai, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo
(72) Inventor: Keiji Mogi, c/o Hitachi Microcomputer System Ltd., 5-22-1 Josuihoncho, Kodaira-shi, Tokyo

Claims

1. A high-precision arithmetic processing device that performs arithmetic processing as a neuron in a neurocomputer executing neural network operations with fixed-point, single-precision arithmetic, the device comprising, in order to perform arithmetic with higher precision than single-precision arithmetic: means for storing the arithmetic results of at least a multiplier, an ALU, and a shifter; and communication means for transferring the arithmetic results.

2. The high-precision arithmetic processing device according to claim 1, wherein a register file used as a work area during single-precision arithmetic processing is used as the means for storing the arithmetic results.

3. The high-precision arithmetic processing device according to claim 1, wherein, as the means for communicating the arithmetic results, the outputs of the multiplier, the ALU, and the shifter are connected to a storage device by buses.

4. A high-precision arithmetic processing device having, in order to perform arithmetic with higher precision than single-precision arithmetic, a condition code register (CCR) capable of detecting, for the multiplier, the multiplicand, partial products, addition results, and shift results, each of the following states: carry, overflow, sign, and whether the value is 0.

5. The high-precision arithmetic processing device according to claim 4, wherein the CCR used during single-precision arithmetic processing to detect the state of values taken into the ALU is used as the CCR.

6. A high-precision arithmetic processing device that performs arithmetic processing as a neuron in a neurocomputer executing neural network operations with fixed-point, single-precision arithmetic, wherein the number of machine cycles required for pipeline processing is made equal when neuron state values are broadcast from the input layer to the intermediate layer and when they are broadcast from the intermediate layer to the output layer.

7. A high-precision arithmetic processing device that performs arithmetic processing as a neuron in a neurocomputer executing neural network operations with fixed-point, single-precision arithmetic, characterized in that, when arithmetic with higher precision than single precision is performed on the neurocomputer, whose architecture performs single-precision arithmetic, the high-precision arithmetic processing is controlled and executed by a microprogram.

8. A high-precision arithmetic processing method for performing arithmetic processing as a neuron in a neurocomputer executing neural network operations with fixed-point, single-precision arithmetic, the method performing arithmetic with higher precision than single-precision arithmetic, characterized in that, as high-precision multiplication, even for a multiplier and a multiplicand that exceed the input bit width of the multiplier circuit, the multiplier and the multiplicand are each divided into a plurality of words within a range not exceeding the input bit width of the single-precision multiplier circuit, so that they can be multiplied by the single-precision multiplier circuit already provided in the neuron, and the product is calculated by multiplying the divided multiplier words by the divided multiplicand words.

9. The high-precision arithmetic processing method according to claim 8, characterized in that the divided multiplier words and multiplicand words, together with the partial products, addition results, and shift results output by the respective arithmetic units, are stored in a storage device, and the stored values are processed again by an arithmetic unit.

10. The high-precision arithmetic processing method according to claim 8, characterized in that a CCR is used to detect the states of the divided multiplier words, the multiplicand words, and the partial products, addition results, and shift results output by the respective arithmetic units.

11. A high-precision arithmetic processing method characterized in that, when two's complement representation is used for the multiplier and the multiplicand that are the inputs of the multiplier circuit, the most significant bit of the lower word is added to the least significant bit of the upper word of the divided multiplier words and multiplicand words, which makes it possible to perform high-precision multiplication of values in two's complement representation.

12. The high-precision arithmetic processing method according to claim 8, characterized in that, when two's complement representation is used for the multiplier and the multiplicand that are the inputs of the multiplier circuit, the divided multiplier words and multiplicand words are aligned from the most significant bit side of each word and, when a value is negative, sign extension is performed in the most significant word, so that the bit width of the partial products can be adjusted arbitrarily, and control is performed so that no overflow processing is applied when the partial products are added.

13. The high-precision arithmetic processing method according to claim 8, characterized in that, when two's complement representation is used for the multiplier and the multiplicand that are the inputs of the multiplier circuit, the most significant bit of the lower word of the divided multiplicand words (or multiplier words) is added to the least significant bit of the upper word as described in claim 11, and the divided multiplier words (or multiplicand words) are processed from the most significant bit side of each divided word as described in claim 12, so that no overflow processing is performed when the partial products are added.

14. A high-precision arithmetic processing method for performing arithmetic processing as a neuron in a neurocomputer executing neural network operations with fixed-point, single-precision arithmetic, characterized in that, in the pipeline processing, during the period from when the multiplier is fetched until it is input to the multiplier circuit, the most significant bit of the lower word of the divided multiplicand is added to the least significant bit of the upper word of the multiplicand, and the state of this addition is detected by the CCR.