JPH0646412B2

JPH0646412B2 - Data Flow Processor

Info

Publication number: JPH0646412B2
Application number: JP18984587A
Authority: JP
Inventors: 薫内田; 勉天満
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1987-07-29
Filing date: 1987-07-29
Publication date: 1994-06-15
Anticipated expiration: 2009-06-15
Also published as: JPS6433635A

Description

[Field of industrial application]

The present invention relates to a data flow processor in which a memory section and an arithmetic section are coupled by a pipelined bus and the order of operations is controlled by the data flow method.

[Conventional technology]

The μPD7281 manufactured by NEC Corporation is a conventional data flow processor.

The μPD7281 is configured as shown in FIG. 4. A token, the unit of data input to the device from the external bus, carries a data value, a link table address used to reference the link table 92 after input, and a module number identifying the device that should process the token. The token input unit 91 takes a token passing along the external bus into the device when its module number matches the device's own number; otherwise it passes the token through the token output unit 97 back onto the external bus unchanged. An accepted token references the link table 92 using its link table address, obtains there a function table address for referencing the function table 93 and the link table address for its next reference to the link table 92, and is then sent to the function table 93.

In the function table 93 the token performs a reference using its function table address. There the management information for the data memory 94 is referenced and updated, and at the same time the token obtains a processing code indicating the processing to be done in the processing unit 96 and an access address for the data memory 94. The token is then sent to the data memory 94, where, as required, it waits for the partner operand of a binary operation or reads a constant for a constant operation. The queue memory 95 holds tokens temporarily while the processing unit 96 is processing a previous token and cannot accept the next one. When the processing unit 96 is not busy, the token is sent from the queue memory 95 to the processing unit 96 and, according to its processing code, undergoes one of the following: arithmetic operation, logical operation, shift, comparison, bit inversion, priority encoding, branching, number generation, copy, or an operation using an internal register. If the token's processing code indicates output, the token is sent from the queue memory 95 to the token output unit 97, converted into the input token format, and output to the external bus. A token processed by the processing unit 96 is sent to the link table 92 and again performs a reference using its link table address. In this way the token circulates on the internal ring bus, receiving the necessary processing on its data value, until an output instruction is executed.

[Problems to be solved by the invention]

Consider using the data flow processor described above for so-called convolution processing, in which each of several data values in external memory is multiplied by a coefficient and the products are summed. A conventional data flow processor has only one arithmetic unit, and each time a token completes one circuit of the internal ring and enters the processing unit, only one binary operation can be performed on the two data values the token carries. A convolution therefore requires tokens to circulate the internal ring and enter the processing unit N times for the multiplications of the N data pairs and (N-1) times for the additions of the results. Moreover, the additions are performed serially in time, so the processing time becomes long. The μPD7281 provides a register in the processing unit to speed up the addition of consecutive data, but this still does not speed up the multiplications.
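The pass count stated above can be checked with a few lines of Python. This is only an illustrative sketch: the function name `conventional_ring_passes` is hypothetical, and the model assumes exactly one binary operation per pass through the processing unit, as the text describes for the conventional processor.

```python
def conventional_ring_passes(n: int) -> int:
    """Ring passes for an n-term sum of products when each pass through
    the processing unit performs exactly one binary operation."""
    if n < 1:
        raise ValueError("n must be at least 1")
    multiplications = n   # one pass per product of a data value and a coefficient
    additions = n - 1     # one pass per addition of two partial results
    return multiplications + additions
```

For a 9-point convolution this model gives 17 passes through the processing unit.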

In addition, a data flow processor generally takes time to input data from external memory in the form of tokens, so the number of memory accesses should be kept as small as possible. To that end, data read once from external memory is held in internal memory and, when several convolutions are performed, referenced as needed while shifting the position by one each time. With a conventional data flow processor, however, both the data to be processed and the coefficients must be placed in the data memory, so the token used for an operation has to circulate the internal ring twice to reference both and have the processing unit operate on them. This also slows processing.

An object of the present invention is to provide a data flow processor that can perform such convolution processing with the N tokens to be processed making only one circuit of the internal ring, thereby achieving high-speed processing.

[Means for solving problems]

The present invention is a data flow processor that performs processing by passing tokens, the unit of data, around an internal ring-shaped bus. It comprises: a link table that stores the destination addresses of tokens; an operand fetch table that controls operand fetching; a data memory that temporarily stores operand data used in operations; a function table that stores instruction codes and coefficient data; a buffer queue that temporarily holds tokens; a processing unit that performs data processing on tokens; a token output unit that sends tokens from the processing unit to an external bus; and a token input unit that receives tokens from the external bus and sends them to the link table or the token output unit. The processing unit consists of a token generation unit that copies tokens, an arithmetic calculation unit that performs stateless operations such as arithmetic and comparison operations, a register processing unit that has an internal register and performs operations using it, and a token shaping unit that shapes result tokens and sends them to the link table or the token output unit. The link table, the operand fetch table, the data memory, the function table, the buffer queue, and the processing unit are connected in this order to form the ring-shaped bus.

[Action]

When the present invention is used to perform a convolution Σ w(i) × x(k) over N data values, the N coefficients w(i) (i = 0, 1, ..., N-1) are set in advance in the state-holding part of the function table, and the N data values x(k) (k = 0, 1, ..., N-1) to be processed are stored in the data memory.

When the operation is performed, the token generation unit generates a set token consisting of N consecutive tokens. The tokens in a set token share the same link table address, each carry a set identifier that distinguishes them from one another, and are controlled so that they always flow through the data flow processor consecutively.

These tokens travel around the internal ring. In the operand fetch table each obtains a data memory address such that the data memory is accessed in sequence, and in the data memory each uses that address to reference its data value x(k). In the function table each token then fetches the previously set coefficient w(i) corresponding to its position within the set token, and so enters the processing unit carrying two data values. In the processing unit, the arithmetic calculation unit performs the multiplication w(i) × x(k), and the register processing unit receives the results one after another and adds them to its internal register. Because a set token flows consecutively, several convolutions cannot become interleaved and produce a wrong result. The last token of the set token causes the register value at that point to be output as the result and, at the same time, the register to be cleared, which completes one convolution.

When convolutions are performed one after another on consecutive data while shifting the target window by one each time, the next required data value x(k) is input from external memory and set in the data memory. Merely shifting the address accessed in the data memory by one then allows a convolution of the next set x(k) (k = 1, 2, ..., N) with the same coefficients w(i) (i = 0, 1, ..., N-1).
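As an illustration of this sliding reuse, the following Python sketch computes successive convolutions over a list that stands in for the data memory. It assumes that the convolution starting at offset j pairs w(i) with x(j+i), which matches how the embodiment pairs x(0) to x(8) and then x(1) to x(9) with w(0) to w(8). The function name `sliding_convolutions` is hypothetical, and the sketch models only the arithmetic, not the hardware.

```python
from typing import List, Sequence


def sliding_convolutions(w: Sequence[float], x: Sequence[float]) -> List[float]:
    """Sum of products for every window of len(w) consecutive samples of x.

    The coefficients stay fixed; only the start offset into x moves by one,
    mirroring the one-address shift in the data memory.
    """
    n = len(w)
    if n == 0 or len(x) < n:
        return []
    return [
        sum(w[i] * x[start + i] for i in range(n))
        for start in range(len(x) - n + 1)
    ]
```

Each new result needs only one new sample in `x`; the other N-1 samples are reused from the previous window.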

[Example]

Next, an embodiment of the present invention will be described with reference to the drawings.

FIG. 1 is an internal block diagram showing the configuration of a data flow processor 1 according to one embodiment of the present invention. The data flow processor 1 consists of a token input unit 10, a link table 11, an operand fetch table 12, a data memory 13, a function table 14, a buffer queue 15, a processing unit 16, and a token output unit 17. As shown in the figure, the link table 11, the operand fetch table 12, the data memory 13, the function table 14, the buffer queue 15, and the processing unit 16 are connected in this order in a ring by a pipelined bus, and tokens are transferred on this internal ring bus in synchronization with the pipeline clock of the data flow processor. The processing unit 16 consists of a token generation unit 20, an arithmetic calculation unit 21, a register processing unit 22, and a token shaping unit 23.

FIG. 2 shows the overall configuration of an example data processing device that uses the embodiment of FIG. 1. In this device, several data flow processors 1 and 2 and one memory interface circuit 3 are connected by an external bus 5, and the external bus 5 is connected to an external memory 4 through the memory interface circuit 3. Tokens are transferred asynchronously on the external bus 5 using handshaking.

FIG. 3 shows the formats of the token, the unit of data. A token 60 on the external bus 5 consists of a module number 61, a set identifier 62, a link table address 63, and data 64.

In this embodiment, a set token consisting of one or more tokens can be processed as a single unit. A set token always flows through the data processing device consecutively, and its tokens share the same module number 61 and the same link table address 63.

The set identifier 62 identifies a token within its set token. It is "0" when the token forms a set token by itself; when the tokens of a multi-token set token need to be distinguished from one another, each can carry a different value. The last token in a set token, however, always has the set identifier "0".

Reference numerals 70, 75, 80, and 85 denote the token formats on input from the link table 11 to the operand fetch table 12, from the operand fetch table 12 to the data memory 13, from the data memory 13 to the function table 14, and from the function table 14 to the buffer queue 15 and the processing unit 16, respectively.

Of the tokens input from the preceding data flow processor or memory interface circuit, the token input unit 10 takes in only those whose module number 61 equals the number assigned to this data flow processor and sends them to the link table 11 in synchronization with the pipeline cycle; all other tokens are sent unchanged to the token output unit 17 as pass-through tokens. If, however, the link table 11 or the token output unit 17 is busy processing a set token as described below, the token input unit 10 does not send the token on and stops further input by withholding the handshake acknowledge signal from the preceding data flow processor or memory interface circuit.
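The routing rule of the token input unit can be sketched in Python as follows. This is a simplified model, not the circuit: the class `ExternalToken`, the function `route`, and the destination names are hypothetical, field widths are not modeled, and the sketch assumes that a token stalls only when its own destination is busy.

```python
from dataclasses import dataclass
from typing import AbstractSet, Optional


@dataclass(frozen=True)
class ExternalToken:
    """Fields of token 60 on the external bus (numerals from FIG. 3)."""
    module_number: int       # 61
    set_id: int              # 62
    link_table_address: int  # 63
    data: int                # 64


def route(token: ExternalToken, own_module: int,
          busy: AbstractSet[str] = frozenset()) -> Optional[str]:
    """Return the destination of an incoming token, or None to stall.

    A stall models withholding the handshake acknowledge so that the
    preceding device stops sending.
    """
    if token.module_number == own_module:
        destination = "link_table"      # token is for this processor
    else:
        destination = "token_output"    # pass-through token
    if destination in busy:
        return None
    return destination
```

A token addressed to another module never enters the internal ring; it only occupies the path from the token input unit to the token output unit.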

The link table 11 receives tokens from the processing unit 16 or the token input unit 10. When both request input at the same time, input from the token input unit 10 normally takes priority. While the processing unit 16 is generating consecutive tokens by a copy operation, however, the processing unit takes priority. In addition, whichever source a token comes from, a set identifier other than "0" shows that the token is a non-final token of a multi-token set token and that the rest of the set token will follow consecutively. The link table therefore signals busy to the source that is not sending those tokens and stops its input, so that the whole set token is input consecutively with priority. This guarantees the continuity of set tokens.
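The arbitration rule can be sketched as two small Python functions. This is a hedged illustration only: the names `choose_source` and `next_lock` and the source labels "external" (token input unit 10) and "internal" (processing unit 16) are hypothetical, and the extra priority given to the processing unit during a copy operation is deliberately left out.

```python
from typing import Optional


def choose_source(locked: Optional[str], ext_ready: bool,
                  int_ready: bool) -> Optional[str]:
    """Pick the source whose token is accepted in this cycle.

    `locked` names the source currently delivering a set token, if any.
    """
    if locked is not None:
        # Only the locked source may deliver; the other one sees "busy".
        ready = ext_ready if locked == "external" else int_ready
        return locked if ready else None
    if ext_ready:
        return "external"   # normal priority: token input unit first
    if int_ready:
        return "internal"
    return None


def next_lock(source: str, set_id: int) -> Optional[str]:
    """A nonzero set identifier means more tokens of the same set follow,
    so the link table stays locked to that source; "0" releases it."""
    return source if set_id != 0 else None
```

Calling `next_lock` after every accepted token keeps the lock in place from the first token of a set until the token with set identifier 0 arrives.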

The link table 11 is referenced by the link table address 63 of the token 60 input from the processing unit 16 or the token input unit 10. The token obtains an operand fetch table address 71, a function table address 72, and a link table address 73 for its next reference to the link table 11, and is sent to the operand fetch table 12. The function table address is generated from the data obtained by the link table reference together with the token's set identifier.

The operand fetch table 12 is referenced by the operand fetch table address 71 that the input token 70 read from the link table 11. The entry at that address holds an instruction code for reading or writing the data memory 13 or for two-operand queue control of data, which is referenced, and state management information, which is referenced and updated. As a result the token receives a data memory address 77 and a data memory code 76 indicating the operation to perform in the data memory.

The data memory 13 is accessed by the data memory address 77 of the input token 75. As required, it serves as a queue for matching the two operands of a binary operation, as a queue for matching the token carrying the data with the token carrying the address when a write token is output, or as a memory for storing constants for constant operations and the like. For example, data input to the data flow processor from external memory can be held by writing it to consecutive addresses in the data memory; when processing is later performed, a data memory read token reads the data from the data memory and attaches it to the token as the first operand 81, so that the processing unit 16 can use it in an operation.

The function table 14 accesses its internal table using the function table address 72 of the input token 80. While holding internal state, it controls the flow by changing the link table address field of passing tokens as required, and at the same time attaches to the token the processing codes that indicate the processing to be done in the processing unit 16: a token generation code 86, an arithmetic operation code 87, a register processing code 88, and a token shaping code 89. Instead of the flow control operation described above, it can attach the data in its internal state holding part to the token as the second operand 82 and pass it to the processing unit 16 together with the first operand 81 that the token carried on entering the function table 14.

The buffer queue 15 is a memory that temporarily holds tokens before they are input to the processing unit 16. It stops output to the processing unit 16 while the token generation unit 20 of the processing unit 16 has stopped token input and is executing a token copy operation.

As described above, the processing unit 16 consists of four operation units connected in series: the token generation unit 20, the arithmetic calculation unit 21, the register processing unit 22, and the token shaping unit 23. An input token passes through them in order, and they act on it in pipeline fashion. The instruction codes 86, 87, 88, and 89 read earlier in the function table 14 are a concatenation of the codes for each of these operation units, and each unit operates independently according to its own code.

According to the token generation code 86, the token generation unit 20 performs a copy operation on one input token and outputs several tokens. The copied tokens can use the link table address of the input token unchanged, and the set identifiers can be manipulated so that the tokens output consecutively by the copy operation form a set token as they are. The unit can also generate several tokens whose data values form an arithmetic progression.

The arithmetic calculation unit 21 executes, according to the arithmetic operation code 87 and without holding internal state, a binary operation on the data the token carried when passing through the link table 11 and the data read from the data memory 13, or on the data read from the data memory 13 and the data read from the function table 14, or a unary operation on any one of these. The operations include arithmetic operations including multiplication, logical operations, shifts, comparisons, and bit manipulation.

The register processing unit 22 has an internal register and can apply the result data of the arithmetic calculation unit 21 to the register according to the register processing code 88. For example, it can take the sum of vector data by adding each value to the register, and it can clear the register.

According to the token shaping code 89, the token shaping unit 23 selects, from the result output of the arithmetic calculation unit 21 and the register contents of the register processing unit 22, the value to output as the result data of the processing unit 16, applies any necessary transformation to shape the token, and outputs it to the link table 11. When the token carries an instruction code indicating that it should be output from the data flow processor, the unit attaches the module number required for an external bus token and sends the token to the token output unit 17. If the token output unit 17 is busy, the token shaping unit stops output to it and also refuses input.

The token output unit 17 outputs tokens received from the processing unit 16 or the token input unit 10 to the following data flow processor or memory interface circuit through the external bus 5a. When both the processing unit 16 and the token input unit 10 request output at the same time, input from the token input unit 10 normally takes priority. Whichever source a token comes from, however, a set identifier other than "0" shows that the token is a non-final token of a multi-token set token and that the rest of the set token will follow consecutively. The token output unit therefore signals busy to the source that is not sending those tokens and stops its input, so that the whole set token is input consecutively with priority. This guarantees the continuity of set tokens.

Next, the operation of this embodiment is described for a convolution over, for example, nine data values.

Before the operation, the nine coefficients w(0) to w(8) are set in the internal state holding parts at addresses c1+1, ..., c1+8, c1 of the function table 14, respectively. Next, the data x(k) used in the processing is input one value after another from the external memory 4 through the memory interface circuit 3 to the data flow processor and stored in a contiguous area of the data memory 13 starting at address d1. When convolutions are performed in succession, the first needs x(0) to x(8), the second needs x(1) to x(9), and so on. The initial state therefore provides nine or more data points, and thereafter, in step with the completion of each 9-point convolution, the data needed next is input one value at a time and appended in the data memory 13. The register used for addition in the register processing unit 22 is also cleared.

To compute a convolution, a token carrying an instruction code to generate a set token of length 9 is first input to the token generation unit 20, which prepares a set token of nine tokens. These tokens carry the same link table address a1 and the data d1 used for accessing the data memory 13. Their set identifiers are "1" for the first token, then "2", "3", ..., "8", and "0" for the last token.
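The numbering of the nine tokens can be reproduced with a short Python sketch. The `Token` class and `generate_set_token` function are hypothetical names used only for illustration; the sketch assumes the numbering scheme of this example (1, 2, ..., length-1, then 0), which also yields the single identifier 0 for a set token of length 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    set_id: int              # set identifier within the set token
    link_table_address: int  # shared by all tokens of the set (a1)
    data: int                # shared data memory base address (d1)


def generate_set_token(length: int, link_table_address: int,
                       data: int) -> List[Token]:
    """Copy one request into `length` consecutive tokens forming a set token."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # Non-final tokens get 1, 2, ...; the final token always gets 0.
    set_ids = list(range(1, length)) + [0]
    return [Token(s, link_table_address, data) for s in set_ids]
```

For example, `generate_set_token(9, a1, d1)` with integer values for `a1` and `d1` yields nine tokens that differ only in their set identifier.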

These tokens pass through the arithmetic calculation unit 21, the register processing unit 22, and the token shaping unit 23, and reference the link table 11 using their link table address as the address. After this reference, the output token 70 of the link table carries b1 as the operand fetch table address, a2 as the link table address that the result token should carry once the convolution has been computed, and a function table address. The function table address is generated by adding the set identifier to the function table base address c1 written at address a1 of the link table 11, so the nine tokens in the set token carry the values c1+1, c1+2, ..., c1+8, c1.
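The address generation just described is a simple addition, sketched here in Python. The function name `function_table_addresses` and the numeric base used in the usage note are hypothetical; only the rule "base address plus set identifier" comes from the text.

```python
from typing import Iterable, List


def function_table_addresses(base: int, set_ids: Iterable[int]) -> List[int]:
    """Function table address of each token: base address plus set identifier."""
    return [base + set_id for set_id in set_ids]
```

With a base of 100 and set identifiers 1, 2, ..., 8, 0, the tokens address 101 to 108 and finally 100, so the last token of the set reads the entry at the base address itself.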

The operand fetch table 12 is referenced using the operand fetch table address b1 of the input token 70. The data memory access address is generated by adding the value d1 in the token's data field to the value (initially 0) in the state holding part accessed at address b1, which gives d1 for the first token. The value in the state holding part is incremented by one each time a token enters the operand fetch table 12 and references that address, so the first eight tokens receive the data memory access addresses d1, d1+1, ..., d1+7. When the ninth token is input, it receives d1+8 as its data memory access address, and the state holding part is reset to 0. The value 9 is the cycle size for data memory reads and is embedded in the instruction code at address b1 of the operand fetch table 12.
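The cyclic address generation can be modeled in Python as below. This is a minimal sketch under stated assumptions: `OperandFetchEntry` is a hypothetical name for one operand fetch table entry, and only the counter behavior described in the text (add, increment, wrap at the cycle size) is modeled.

```python
class OperandFetchEntry:
    """One operand fetch table entry with a cyclic offset counter."""

    def __init__(self, cycle_size: int) -> None:
        if cycle_size < 1:
            raise ValueError("cycle_size must be at least 1")
        self.cycle_size = cycle_size  # 9 in the example; part of the instruction code
        self.offset = 0               # state holding part, initially 0

    def next_address(self, base: int) -> int:
        """Return base + offset, then advance the offset with wrap-around."""
        address = base + self.offset
        self.offset = (self.offset + 1) % self.cycle_size
        return address
```

Nine successive calls with the same base yield base, base+1, ..., base+8 and leave the offset back at 0, ready for the next set token.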

In the data memory 13, read operations are performed at the access addresses d1, d1+1, ..., d1+8 obtained above, and each token fetches as its first operand one of the data values x(0), x(1), ..., x(8) previously loaded from the external memory 4 into the data memory 13.

The function table 14 is referenced using the function table address 72 of the input token 80 as the address. Here the nine tokens carry the values c1+1, c1+2, ..., c1+8, c1, so the nine coefficients w(0), w(1), ..., w(8) held in the internal state holding parts at those addresses are fetched as the second operand of the respective tokens. The instruction codes 86, 87, 88, and 89 for the processing unit 16 are fetched at the same time. The arithmetic operation code 87 is a multiplication instruction. The register processing code 88 is, for the first to eighth tokens, an instruction to add the data to the register and delete the token, and, for the last token, an instruction to add the data to the register and output the register contents.

The nine tokens, which carry x(0), x(1), ..., x(8) as first operands, w(0), w(1), ..., w(8) as second operands, and "multiply" as the arithmetic operation code, pass through the buffer queue 15 and the token generation unit 20 and enter the arithmetic calculation unit 21, where the two operands of each token are multiplied.

In the register processing unit 22, the multiplication results x(0) × w(0), x(1) × w(1), ..., x(8) × w(8) carried by these nine tokens are added one after another to a register whose initial value is 0, in accordance with the register processing code of each token. The first through eighth tokens are deleted after their values are added to the register. After the last token has been added to the register, the result Σw(i) × x(k) is output to the token shaping unit 23, and the register is cleared for the next convolution operation.
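The multiply and accumulate flow above can be modeled in a few lines. This is a minimal sketch under stated assumptions: the names `Token`, `RegisterProcessor`, `ADD_DELETE`, `ADD_OUTPUT`, and `convolve_once`, as well as the sample data and coefficients, are hypothetical and do not appear in the patent. The sketch only mirrors the described behavior: a stateless multiplication, followed by a register that accumulates every product and emits and clears its contents on the last token.

```python
from dataclasses import dataclass

ADD_DELETE = "add_and_delete"  # hypothetical code: accumulate, then discard the token
ADD_OUTPUT = "add_and_output"  # hypothetical code: accumulate, emit the sum, clear the register


@dataclass
class Token:
    first: int     # first operand x, fetched from the data memory
    second: int    # second operand w, fetched from the function table
    reg_code: str  # register processing code


class RegisterProcessor:
    """Stateful stage: holds the running sum between tokens."""

    def __init__(self):
        self.register = 0

    def process(self, product, reg_code):
        self.register += product
        if reg_code == ADD_OUTPUT:
            result, self.register = self.register, 0  # output and clear
            return result
        return None  # token is deleted, nothing leaves this stage


def convolve_once(xs, ws):
    last = len(xs) - 1
    tokens = [Token(x, w, ADD_OUTPUT if i == last else ADD_DELETE)
              for i, (x, w) in enumerate(zip(xs, ws))]
    reg = RegisterProcessor()
    outputs = []
    for token in tokens:
        product = token.first * token.second  # stateless arithmetic stage
        out = reg.process(product, token.reg_code)
        if out is not None:
            outputs.append(out)
    return outputs, reg.register


xs = [1, 2, 3, 4, 5, 6, 7, 8, 9]        # sample data, chosen for illustration
ws = [1, 0, -1, 2, 0, -2, 1, 0, -1]     # sample coefficients, chosen for illustration
outputs, leftover = convolve_once(xs, ws)
print(outputs, leftover)
```

In this model exactly one value leaves the register stage per group of nine tokens, and the register is back at 0 afterwards, ready for the next group.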

In the token shaping unit 23, the result token is, in accordance with the program, either sent back to the link table 11 or sent to the token output unit 17 and output from the data flow processor.

This completes one 9-point convolution calculation. In the same manner, the next required input data x(9), x(10), ... are input from outside the data flow processor and written consecutively to the addresses d1 + 9, d1 + 10, ... of the data memory 13. The next group token, consisting of nine tokens that carry d1 + 1 as data, is then generated, followed by operand fetch starting from the data memory address d1 + 1, the operation, and output of the result. The convolution on the next data x(1), x(2), ..., x(9), and the convolution on the following data x(2), x(3), ..., x(10), do not have to wait for the current convolution to finish before starting. Processing can be sped up by running them as continuously as possible, which shortens the time during which the processing unit is idle.
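The successive windows described above can be sketched as follows. This is an illustrative model only: the function names `window_addresses` and `sliding_convolution`, the use of a dictionary as the data memory, the base address, and the sample data x(n) = n * n with all coefficients equal to 1 are assumptions made here. The sketch computes the windows one after another and does not model the pipelined overlap between them.

```python
def window_addresses(base, size):
    """Addresses read by one group token: base, base + 1, ..., base + size - 1."""
    return [base + i for i in range(size)]


def sliding_convolution(memory, d1, ws, windows):
    """Compute one weighted sum per window; window k starts at address d1 + k."""
    results = []
    for k in range(windows):
        addrs = window_addresses(d1 + k, len(ws))
        results.append(sum(w * memory[a] for w, a in zip(ws, addrs)))
    return results


d1 = 4                                        # arbitrary base address for illustration
memory = {d1 + n: n * n for n in range(11)}   # x(0)..x(10) stored at d1..d1 + 10
ws = [1] * 9                                  # nine sample coefficients

results = sliding_convolution(memory, d1, ws, windows=3)
print(results)
```

Each window reuses eight of the nine data items from the previous one, which is why only one new data item needs to be written to the data memory per additional result.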

The operation for convolution processing has been described above using an embodiment of the present invention. More generally, the present invention makes it possible to efficiently execute processing in which a plurality of data items are treated as a single unit of operation.

As an example, consider double-precision arithmetic on double-precision data whose bit width is twice that of the data carried by one token. The data is split into an upper word and a lower word and is represented by a group token consisting of two tokens in this order: the token carrying the lower word, then the token carrying the upper word. This group token carries a group identifier that marks it as a group, and the control mechanism described above makes it flow contiguously through the data flow processor and through the data processing apparatus that uses it. To add double-precision data, for example, the two data items of the group token holding one operand are written consecutively to the data memory, for instance by a queuing instruction of the operand fetch table. The group token holding the other operand is then passed from the link table to the operand fetch table and the data memory, where the previously written data are read out in the order lower word, upper word. As a result, two tokens, one carrying both lower words and one carrying both upper words, are sent to the processing unit. In the processing unit, the arithmetic calculation unit adds the lower words to each other and the upper words to each other. The carry produced by the lower-word addition is held in the internal register of the register processing unit and added to the sum of the upper words to obtain the upper word of the result. With this control, the input double-precision data are added, and a group token carrying the result in the same format as the input is output.
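The carry handling in the double-precision addition can be sketched as below. This is a minimal sketch under assumptions: the 16-bit word width (`WORD_BITS`) and the helper names `split`, `add_double`, and `join` are introduced here for illustration and are not given in the patent. The sketch follows the described order: the lower words are added first, the carry is held in a register, and it is then added to the sum of the upper words.

```python
WORD_BITS = 16                 # assumed token data width, for illustration only
MASK = (1 << WORD_BITS) - 1


def split(value):
    """Return the group token payload as [lower word, upper word], in flow order."""
    return [value & MASK, (value >> WORD_BITS) & MASK]


def join(words):
    """Rebuild a double-precision value from [lower word, upper word]."""
    return words[0] | (words[1] << WORD_BITS)


def add_double(a_words, b_words):
    """Add two group tokens word by word, lower word first, propagating the carry."""
    carry = 0  # models the internal register of the register processing stage
    result = []
    for a, b in zip(a_words, b_words):
        total = a + b + carry
        result.append(total & MASK)   # word of the result token
        carry = total >> WORD_BITS    # carry kept for the next (upper) word
    return result, carry


a, b = 0x0001FFFF, 0x00000001
words, carry_out = add_double(split(a), split(b))
print(hex(join(words)), carry_out)
```

The result is returned in the same lower word, upper word order as the inputs, matching the statement that the output group token has the same format as the input.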

Complex data having a real part and an imaginary part can likewise be passed through in group token form and operated on by the processing unit.

[Effects of the invention]

As described above, the present invention provides the following effects: (1) the coefficient data w(i) can be fetched from the function table using the group identifier carried by the group token, so the token sequence generated by the token generation unit does not need to travel around the ring bus twice in order to fetch x(k) and w(i); (2) because the arithmetic calculation unit and the register processing unit are arranged consecutively, the products can be added immediately after multiplication, and the tokens do not need to make one more lap of the ring bus between the two operations; (3) the arithmetic calculation unit and the register processing unit can operate in parallel in a pipelined manner, which improves computing performance. As a result, in convolution processing the burden on the program is reduced and processing is sped up.

Furthermore, the processing of double-precision data and complex data, which conventionally had to be implemented in software and required many steps to execute, can be executed at high speed with the present invention by passing the data through the data flow processor in the form of group tokens.

[Brief description of drawings]

FIG. 1 is a block diagram of the data flow processor of the present invention. FIG. 2 is an overall block diagram showing an example of a processing apparatus using the data flow processor of FIG. 1. FIG. 3 is a diagram showing the token format used to explain the present invention. FIG. 4 is a diagram showing the configuration of a conventional data flow processor.

10 ... token input unit
11 ... link table
12 ... operand fetch table
13 ... data memory
14 ... function table
15 ... buffer memory
16 ... processing unit
17 ... token output unit
20 ... token generation unit
21 ... arithmetic calculation unit
22 ... register processing unit
23 ... token shaping unit

Claims

[Claims]

1. A data flow processor that performs processing by circulating tokens, each being a unit of data, on an internal ring-shaped bus, the data flow processor comprising: a link table that stores destination addresses of tokens; an operand fetch table that controls operand fetching; a data memory that temporarily stores operand data used in operations; a function table that stores instruction codes and coefficient data; a buffer queue that temporarily holds tokens; a processing unit that performs data processing on tokens; a token output unit that sends tokens from the processing unit to an external bus; and a token input unit that inputs tokens from the external bus and sends them to the link table or the token output unit, wherein the processing unit comprises: a token generation unit that copies tokens; an arithmetic calculation unit that has no internal state and performs operations such as comparison operations; a register processing unit that has internal registers and performs operations; and a token shaping unit that shapes operation result tokens and sends them to the link table or the token output unit, and wherein the link table, the operand fetch table, the data memory, the function table, the buffer queue, and the processing unit are connected in sequence to form the ring-shaped bus.