JP2007164472A

JP2007164472A - Arithmetic device with queuing mechanism

Info

Publication number: JP2007164472A
Application number: JP2005359815A
Authority: JP
Inventors: Takayuki Sugawara (菅原 崇之); Shinichi Iwamoto (岩本 信一); Yasutoku Sakakibara (榊原 泰徳)
Original assignee: Sonac KK
Current assignee: Sonac KK
Priority date: 2005-12-14
Filing date: 2005-12-14
Publication date: 2007-06-28
Also published as: WO2007069464A1

Abstract

PROBLEM TO BE SOLVED: To provide an arithmetic device that locally queues the plural pieces of data input to each processor element, so that data synchronization need not be considered at programming time.

SOLUTION: The arithmetic device has one or more first processor elements, one or more second processor elements, and a queuing means. The queuing means has a payload that holds the data output by a first processor element, and a flag that corresponds to the payload and indicates whether the corresponding payload is in a valid state or an invalid state. The flag enters the valid state when the first processor element writes data into the corresponding payload, and enters the invalid state when all of the second processor elements that take the data as input have read it from the payload.

COPYRIGHT: (C)2007,JPO&INPIT

Description

The present invention relates to an arithmetic device having a queuing mechanism between processor elements, or between the functional blocks that constitute the stages of a pipeline.

Patent Document 1 describes an arithmetic device in which a plurality of processor elements (hereinafter, PEs) are arranged in a matrix, and in which the intended data processing is realized by changing, by program, the operation performed in each PE and the connections between the PEs.

FIG. 8 is a diagram illustrating the arithmetic device described in Patent Document 1. In FIG. 8, PE 72 and PE 73 are each programmed to multiply the values supplied to their inputs 61 and 62, and PE 71 is programmed to add the values supplied to its inputs 61 and 62. In addition, the output of PE 72 is programmed to connect to input 61 of PE 71, and the output of PE 73 to input 62 of PE 71. That is, if the value supplied to input 61 of PE 72 is a, the value supplied to input 62 of PE 72 is b, the value supplied to input 61 of PE 73 is c, and the value supplied to input 62 of PE 73 is d, then the output of PE 71 is a * b + c * d.

In FIG. 8, the values “3”, “5”, and “7” are supplied in order to input 61 of PE 72, the values “2”, “4”, and “1” are supplied in order to input 62 of PE 72, the values “1”, “4”, and “4” are supplied in order to input 61 of PE 73, and the values “3”, “3”, and “1” are supplied in order to input 62 of PE 73. PE 71 outputs the values “9”, “32”, and “11” in order.
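The arithmetic in this example can be checked with a short Python sketch. The function names `pe_multiply` and `pe_add` are illustrative and do not appear in Patent Document 1; the input streams are the values given for FIG. 8.

```python
# Illustrative model of the FIG. 8 dataflow: PE 72 and PE 73 multiply, PE 71 adds.
# Function names are hypothetical; only the data values come from the text.

def pe_multiply(x, y):
    """Operation programmed into PE 72 and PE 73."""
    return x * y


def pe_add(x, y):
    """Operation programmed into PE 71."""
    return x + y


a_stream = [3, 5, 7]  # input 61 of PE 72
b_stream = [2, 4, 1]  # input 62 of PE 72
c_stream = [1, 4, 4]  # input 61 of PE 73
d_stream = [3, 3, 1]  # input 62 of PE 73

outputs = [
    pe_add(pe_multiply(a, b), pe_multiply(c, d))
    for a, b, c, d in zip(a_stream, b_stream, c_stream, d_stream)
]
print(outputs)  # [9, 32, 11]
```

Note that `zip` pairs the four streams element by element, so the sketch silently assumes that all inputs are perfectly aligned in time.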

In the arithmetic device described in Patent Document 1, synchronization control must be applied to the timing of the plural pieces of data input to each PE. For example, in FIG. 8, control is needed so that the value “4” is supplied to input 62 of PE 72 while the value “5” is being supplied to its input 61, and so that the value “12” is supplied to input 62 of PE 71 while the value “20” is being supplied to its input 61. Without this synchronization control, the arithmetic device does not operate correctly.

Patent Document 1 does not describe how this synchronization control is configured. However, synchronization control over the whole arithmetic device complicates the configuration of the device, and when synchronization is instead guaranteed by programming, the burden on the programmer becomes heavy.

The pipeline of a RISC processor has a similar problem. FIG. 9 is a diagram showing the pipeline structure of a RISC processor. According to FIG. 9, the RISC processor forms a four-stage pipeline from an instruction fetch unit 81, an instruction decode unit 82, an instruction execution unit 83, and a write-back unit 84. In an ideal pipeline, each instruction fed into the pipeline is executed in sequence, as on an assembly line. In practice, however, stalls are needed because of various hazards, such as data hazards and control hazards, or because some instructions need several stages to execute. A stall means that an instruction that has finished processing in one of the functional blocks constituting the pipeline stages (in the example of FIG. 9, the instruction fetch unit 81, the instruction decode unit 82, the instruction execution unit 83, or the write-back unit 84) cannot move on to the next functional block and waits until that block is able to process it. For example, a stall occurs when decoding of an instruction has finished in the instruction decode unit 82 but execution of the preceding instruction has not yet finished in the instruction execution unit 83.

For this reason, the RISC processor includes a pipeline control unit 9, which, based on the processing status of each unit, issues stalls, that is, instructions to hold processing, to the individual units.

JP 2001-312481 A (Patent Document 1)

As described above, in the arithmetic device of Patent Document 1, synchronization control over the whole arithmetic device complicates the configuration of the device, and when synchronization is guaranteed by programming, the burden on the programmer becomes heavy.

Further, in the pipeline structure shown in FIG. 9, a design change that alters the number of pipeline stages, for example providing a plurality of instruction execution units 83, requires a design change to the entire processor, including the pipeline control unit 9.

Accordingly, an object of the present invention is to provide an arithmetic device that, with a simple configuration, makes the plural pieces of data input to each PE wait for one another, so that data synchronization need not be considered at programming time.

Another object is to provide an arithmetic device with a pipeline configuration that needs no pipeline control unit for overall management and control, so that design changes to the arithmetic device become easy.

According to the present invention, there is provided an arithmetic device having one or more first processor elements, one or more second processor elements, and a queuing means. The queuing means has a payload field that holds the data output by each first processor element, and a flag field that corresponds to the payload field and indicates whether the corresponding payload field is in a valid state or an invalid state. The flag field enters the valid state when the first processor element writes data into the corresponding payload field, and enters the invalid state when all of the second processor elements that take that data as input have read the data from the payload field.

According to another embodiment of the arithmetic device of the present invention, the first processor element preferably writes new data into the corresponding payload field only when the flag field is in the invalid state.

According to another embodiment of the arithmetic device of the present invention, the flag field preferably consists of flags that each correspond to a second processor element and indicate whether the corresponding second processor element has read the data in the payload field. A second processor element preferably reads its input data only when, for every payload field holding data that it takes as input, its own flag in the corresponding flag field indicates that the data has not yet been read.

According to yet another embodiment of the arithmetic device of the present invention, each input of a second processor element is preferably provided with a FIFO buffer that holds one or more pieces of data. Data newly written into a payload field is output simultaneously to the FIFO buffers of all the second processor elements that take that data as input. Each FIFO buffer sets a full flag to valid when the data it holds reaches its capacity, and to invalid otherwise. The first processor element writes new data into the payload field only when all the FIFO buffers that take its output data as input have set their full flags to invalid.

According to yet another embodiment of the arithmetic device of the present invention, there is provided an arithmetic device having a queuing means between the stages that constitute a pipeline. The queuing means has a payload field that holds the instruction or data output by a stage, and a flag field that indicates whether the payload field is in a valid state or an invalid state. The flag field enters the valid state when an instruction or data is written into the payload field, and enters the invalid state when the instruction or data is read from the payload field. Each stage writes an instruction or data into the payload field of the downstream queuing means only when its flag field is in the invalid state, and reads an instruction or data from the payload field of the upstream queuing means only when its flag field is in the valid state.

Synchronization of the input data to each processor element can be realized with a simple configuration, namely local control between the processor elements that exchange data, without synchronization control over the whole arithmetic device and without programming that takes input data synchronization into account. Further, providing a FIFO buffer at the input of each processor element shortens the cycle at which an upstream processor element can write data to the queuing means.

In an arithmetic device having a pipeline configuration, the pipeline control unit is no longer needed. Moreover, because stall control between stages is performed locally and independently, an increase or decrease in the number of pipeline stages can be handled with a local design change.

The best mode for carrying out the present invention is described in detail below with reference to the drawings.

FIG. 1 is a diagram showing the connection configuration between PEs in an arithmetic device according to the present invention. The operation performed by each PE in FIG. 1 is defined by a program. According to FIG. 1, a PE never acquires its input directly from another PE; it always acquires it through the queuing unit 5. That is, PEs 10, 20, and 30 write their output data to the queuing unit 5, and PEs 11, 21, and 31 obtain their input data by reading it from the queuing unit 5. The numbers of rows and columns of PEs are examples; the present invention is applicable to any number of rows of one or more and any number of columns of two or more.

FIG. 2 is a diagram showing the configuration of the queuing unit 5. The queuing unit 5 has n registers, denoted by reference numerals 51 to 5n, and each register contains a connection flag field, a status flag field, and a payload field. The PE that writes data to each register is determined by a program or by the hardware configuration. The PEs that read data from each register are determined by a program.

The connection flag field of each of registers 51 to 5n consists of a plurality of bits, one for each PE that can read data. The bit corresponding to a PE that actually needs to read the data is set to, for example, “1”, and the bits corresponding to the other PEs are set to “0”. Which PEs actually need to read the data depends on the program, and the connection flag field is written as an initial setting when the program is executed.

The payload field of each of registers 51 to 5n has an arbitrary bit length and holds the data passed between PEs. The status flag field likewise consists of a plurality of bits, one for each PE that can read data. For example, when data is written into the payload field, all of its bits are set to “1”. When data is read from the payload field, the bit corresponding to the PE that performed the read is set to “0”.

A PE that reads data recognizes that input data has been set in the payload field when its own bit in the status flag field is “1”, and that input data has not yet been set in the payload field when that bit is “0”. The values of the status flag field may be set and changed by the PEs that read and write the data, or by a control unit (not shown) of the queuing unit 5. When the bitwise AND of the connection flag field and the status flag field, taken between the bits that correspond to the same processor element, is “0” in every bit, all the PEs that use the written data as input have read it. Therefore, a PE that writes data into the payload field writes new data only when the bitwise AND of the connection flag field and the status flag field, taken between the bits that correspond to the same PE, is “0” in every bit.
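As a rough illustration of these write and read rules, the following Python sketch models one register of the queuing unit 5. The class and method names (`QueuingRegister`, `is_invalid`, `write`, `read`) are hypothetical. The sketch assumes that the register itself updates the flags, which is one of the two options mentioned above, and it does not model hardware timing.

```python
class QueuingRegister:
    """Hypothetical model of one register: connection flags, status flags, payload."""

    def __init__(self, connection):
        # connection[i] == 1 means reader PE i actually consumes this payload.
        self.connection = list(connection)
        # status[i] == 1 means data is present and reader PE i has not read it yet.
        self.status = [0] * len(self.connection)
        self.payload = None

    def is_invalid(self):
        # Invalid when (connection AND status) is 0 in every bit position.
        return all((c & s) == 0 for c, s in zip(self.connection, self.status))

    def write(self, value):
        # The writer may store new data only while the register is invalid.
        if not self.is_invalid():
            return False
        self.payload = value
        self.status = [1] * len(self.status)  # every status bit is set on a write
        return True

    def read(self, reader):
        # A reader may take the data only while its own status bit is 1.
        if self.status[reader] != 1:
            return None
        self.status[reader] = 0
        return self.payload


# Register 51 of FIG. 3: read by the first and second reader, not by the third.
reg51 = QueuingRegister([1, 1, 0])
reg51.write(10)
reg51.read(0)
print(reg51.is_invalid())  # False: the second reader has not read yet
reg51.read(1)
print(reg51.is_invalid())  # True: all connected readers have read
```

The status bit of the unconnected third reader stays at 1 after the reads, but it is masked out by the connection flag, so it does not block the next write.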

The operation for passing data between PEs is described concretely with reference to FIGS. 3 and 4. FIG. 3 is a diagram showing the data input/output relationship between the PEs used in the description. According to FIG. 3, the data output by PE 10 is written to register 51 of the queuing unit 5 and becomes the input of PEs 11 and 21; the data output by PE 20 is written to register 52 of the queuing unit 5 and becomes the input of PEs 11 and 31; and the data output by PE 30 is written to register 53 of the queuing unit 5 and becomes the input of PEs 21 and 31.

FIG. 4 is a diagram explaining the transitions of the connection flag fields and status flag fields of the queuing unit 5 for the input/output relationship of FIG. 3. As shown in FIG. 3, PE 10 writes to register 51 of FIG. 4, PE 20 writes to register 52, and PE 30 writes to register 53. In each register, the three bits on the left are the connection flag field and the three bits on the right are the status flag field. In both fields, the first bit corresponds to PE 11, the second bit to PE 21, and the third bit to PE 31.

In all of FIGS. 4(a) to 4(d), the first, second, and third bits of the connection flag field of register 51 are “1”, “1”, and “0”, respectively, which corresponds to the data output by PE 10 being the input of PEs 11 and 21. Similarly, the first, second, and third bits of the connection flag field of register 52 are “1”, “0”, and “1”, respectively, which corresponds to the data output by PE 20 being the input of PEs 11 and 31. The first, second, and third bits of the connection flag field of register 53 are “0”, “1”, and “1”, respectively, which corresponds to the data output by PE 30 being the input of PEs 21 and 31.

First, assume that PEs 10, 20, and 30 have each written output data to the corresponding register of the queuing unit 5. FIG. 4(a) shows the state of each register at that time: every bit of every status flag field is set to “1”.

The registers from which each of PEs 11, 21, and 31 reads data are determined by the program. That is, the program determines that PE 11 reads data from registers 51 and 52, PE 21 reads data from registers 51 and 53, and PE 31 reads data from registers 52 and 53; PE 11, for example, never accesses register 53. Alternatively, a configuration is also possible in which each PE itself recognizes which registers to read from the values of the connection flag fields.

FIG. 4(b) shows the state of each register after PE 11 has read data from registers 51 and 52 and PE 21 has read data from registers 51 and 53. As shown in FIG. 4(b), because PE 11 has read data from registers 51 and 52, the first bit of the status flag field of registers 51 and 52, which corresponds to PE 11, has been changed to “0”. Because PE 21 has read data from registers 51 and 53, the second bit of the status flag field of registers 51 and 53, which corresponds to PE 21, has been changed to “0”.

In the state shown in FIG. 4(b), the bitwise AND of the connection flag field and the status flag field of register 51 is “0” in every bit. This means that all the PEs that take the data output by PE 10 as input, namely PEs 11 and 21, have read the data, so PE 10 can write new data to register 51.

On the other hand, the bitwise AND of the connection flag field and the status flag field of each of registers 52 and 53 is not “0” in every bit. This means that not all of the processor elements that take the data output by PE 20 and PE 30 as input have read the data yet, so PEs 20 and 30 cannot write new data to registers 52 and 53.

FIG. 4(c) shows the state of each register after PE 10 has written new data to register 51 and PE 31 has read data from registers 52 and 53. As shown in FIG. 4(c), because PE 10 has written data to register 51, all bits of the status flag field of register 51 are “1”. Because PE 31 has read data from registers 52 and 53, the third bit of the status flag field of registers 52 and 53, which corresponds to PE 31, has been changed to “0”.

In the state of FIG. 4(c), the bitwise AND of the connection flag field and the status flag field of each of registers 52 and 53 is “0” in every bit, so PEs 20 and 30 can write new data to registers 52 and 53. As for PE 11, its corresponding bit, the first bit, is “1” in register 51 and “0” in register 52. PE 11 therefore recognizes that data has already been set in register 51 but not yet in register 52, that is, that the data needed for its processing is not yet complete, and hence that it cannot read.

FIG. 4(d) shows the state of each register after PEs 20 and 30 have written new data to registers 52 and 53. As shown in FIG. 4(d), because PEs 20 and 30 have written data to registers 52 and 53, respectively, all bits of the status flag fields of registers 52 and 53 are “1”.

In the state of FIG. 4(d), the bitwise AND of the connection flag field and the status flag field is not “0” in every bit for any of registers 51 to 53, so PEs 10, 20, and 30 cannot write new data. As for reading by PEs 11, 21, and 31, the bit corresponding to each PE is “1” in the status flag field of every register that the PE reads, so PEs 11, 21, and 31 recognize that they can read data.
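The sequence of FIGS. 4(a) to 4(d) can be replayed with a small Python trace. This is only a sketch of the flag bookkeeping: the helper names (`writable`, `write`, `read`, `snapshot`) are hypothetical, payload values are omitted, and the status bit strings are derived from the description above rather than copied from the drawing.

```python
# Bit order in every 3-bit field: [PE 11, PE 21, PE 31].
READER_BIT = {"PE11": 0, "PE21": 1, "PE31": 2}
REGS = (51, 52, 53)
connection = {51: [1, 1, 0], 52: [1, 0, 1], 53: [0, 1, 1]}
status = {reg: [0, 0, 0] for reg in REGS}


def writable(reg):
    # New data may be written when (connection AND status) is 0 in every bit.
    return all((c & s) == 0 for c, s in zip(connection[reg], status[reg]))


def write(reg):
    assert writable(reg), f"register {reg} still holds unread data"
    status[reg] = [1, 1, 1]


def read(pe, regs):
    bit = READER_BIT[pe]
    # A PE fires only when all of its input registers hold unread data.
    assert all(status[reg][bit] == 1 for reg in regs), f"{pe} must wait"
    for reg in regs:
        status[reg][bit] = 0


def snapshot():
    return {reg: "".join(str(b) for b in status[reg]) for reg in REGS}


# (a) PEs 10, 20 and 30 write their outputs.
for reg in REGS:
    write(reg)
fig_a = snapshot()

# (b) PE 11 reads registers 51 and 52; PE 21 reads registers 51 and 53.
read("PE11", [51, 52])
read("PE21", [51, 53])
fig_b = snapshot()
writable_b = [reg for reg in REGS if writable(reg)]

# (c) PE 10 writes register 51 again; PE 31 reads registers 52 and 53.
write(51)
read("PE31", [52, 53])
fig_c = snapshot()
writable_c = [reg for reg in REGS if writable(reg)]

# (d) PEs 20 and 30 write registers 52 and 53 again.
write(52)
write(53)
fig_d = snapshot()
writable_d = [reg for reg in REGS if writable(reg)]

print(fig_b, writable_b)  # {51: '001', 52: '011', 53: '101'} [51]
print(fig_c, writable_c)  # {51: '111', 52: '010', 53: '100'} [52, 53]
print(fig_d, writable_d)  # {51: '111', 52: '111', 53: '111'} []
```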

FIG. 5 is a state transition diagram of each register of the queuing unit 5. In FIG. 5, the invalid state is the state in which the bitwise AND of the connection flag field and the status flag field, taken between the bit positions that correspond to the same PE, is “0” in every bit. The upstream PE, that is, the PE that writes to the register, can write new data into the payload field only in the invalid state, and the write causes a transition to the valid state. While the register is in the valid state, the upstream PE cannot write new data into the payload field.

In the valid state, each time a downstream PE, that is, a PE that reads from the register, reads the data, the corresponding bit of the status flag field is changed to “0”. When the bitwise AND of the connection flag field and the status flag field, taken between the bit positions that correspond to the same PE, becomes “0” in every bit, that is, when all the downstream PEs that use the data as input have read it, the register transitions to the invalid state.

As described above, synchronization of the input data to each PE can be realized with a simple configuration, namely local control between the PEs that exchange data, without synchronization control over the whole device and without programming that takes input data synchronization into account.

In the queuing unit 5 described above, a connection flag field indicates which PEs read the data, and all bits of the status flag field are set to “1” when data is written. Alternatively, the connection flag field may be omitted, and on a write only the status flag bits that correspond to the PEs that read the data may be set to “1”. In either configuration, managing the read state of each reading PE with the bits of the connection flag field and/or the status flag field allows the data in one register to be branched as input data to a plurality of PEs.
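A minimal Python sketch of the variant without a connection flag field follows. The class name `MaskedStatusRegister` and its members are hypothetical. The sketch assumes that the set of reading PEs is fixed by the program or wiring and is applied at write time, so the register is writable exactly when all of its status bits are “0”.

```python
class MaskedStatusRegister:
    """Hypothetical register variant that has no connection flag field."""

    def __init__(self, reader_bits, width):
        # Assumption: the readers are known to the write logic (program or wiring).
        self.reader_bits = set(reader_bits)
        self.status = [0] * width
        self.payload = None

    def write(self, value):
        # Writable only when every status bit is 0.
        if any(self.status):
            return False
        self.payload = value
        # Set only the bits of the PEs that will read this data.
        self.status = [1 if i in self.reader_bits else 0
                       for i in range(len(self.status))]
        return True

    def read(self, bit):
        if self.status[bit] != 1:
            return None
        self.status[bit] = 0
        return self.payload


reg = MaskedStatusRegister(reader_bits={0, 1}, width=3)
reg.write(7)
print(reg.status)  # [1, 1, 0]
```

Compared with the two-field layout, the validity test becomes a plain check for all-zero status bits, at the cost of making the write logic depend on the reader set.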

When the data output by a PE is not input to two or more PEs, that is, in a configuration with a single row of PEs, the connection flag field is unnecessary and the status flag field needs only one bit.

In the configuration shown in FIG. 3, if, for example, PE 10 writes data in a certain cycle, PEs 11 and 21 can read that data in the next cycle, but PE 10 cannot write data in that cycle. PE 10 can therefore write data at most once every two cycles. For this reason, a FIFO buffer is placed at each input of each PE.
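The two-cycle limit can be seen in a toy cycle count. The Python sketch below uses a hypothetical function name and assumes that the flags are sampled at the start of each cycle, so that a read performed in one cycle frees the register only for the following cycle, and that the consumers always read as soon as data is available.

```python
def count_writes(cycles):
    """Count producer writes into one single-slot register over `cycles` cycles."""
    valid = False  # register state as seen at the start of the cycle
    writes = 0
    for _ in range(cycles):
        if not valid:
            writes += 1    # producer writes; register becomes valid
            valid = True
        else:
            valid = False  # all consumers read; register becomes invalid
    return writes


print(count_writes(10))  # 5
```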

FIG. 6 is a configuration diagram in which FIFO buffers are arranged; it shows only the part of FIG. 3 that relates to PEs 10, 11, and 21.

According to FIG. 6, PE 11 acquires its input data from FIFO buffer 110, and PE 21 acquires its input data from FIFO buffer 210. FIFO buffers 110 and 210 can each hold one or more pieces of data, and when data can be output they set output-enable flags 111 and 211, respectively, to valid. PEs 11 and 21 read data from the FIFO buffers and start their operation when the output-enable flags of all the FIFO buffers corresponding to the inputs needed for the operation are set to valid.

When PE 10 writes data to register 51, the written data is output simultaneously to FIFO buffers 110 and 210. The connection flag field and/or the status flag field of register 51 have the same configuration as already described, and outputting the data to a FIFO buffer changes the corresponding bit of the status flag field to “0”.

FIFO buffers 110 and 210 each set FULL flags 111 and 211, respectively, to valid when the data they hold reaches their capacity and no new input can be accepted, and otherwise, that is, when new input can be accepted, set FULL flags 111 and 211 to invalid.

PE 10 writes data to register 51 only when the FULL flags of all the FIFO buffers that take the data output by PE 10 as input are invalid.

As described above, providing FIFO buffers 110 and 210 at the inputs of PEs 11 and 21 shortens the cycle at which the upstream PE 10 can write data to the queuing unit 5.
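The FIFO arrangement of FIG. 6 can be sketched in Python as follows. The names `InputFifo`, `producer_write`, and `consumer_read` are hypothetical, the buffer depth of 2 is an arbitrary choice for the example, and register 51 is folded into the fan-out step instead of being modeled separately.

```python
from collections import deque


class InputFifo:
    """Hypothetical input FIFO of a consumer PE, with FULL and output-enable flags."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    @property
    def full(self):
        # FULL flag: valid when the held data has reached the capacity.
        return len(self.items) >= self.capacity

    @property
    def output_ready(self):
        # Output-enable flag: valid when at least one item can be output.
        return len(self.items) > 0


def producer_write(value, fifos):
    """Write only when no destination FIFO is full, then fan out to all of them."""
    if any(f.full for f in fifos):
        return False
    for f in fifos:
        f.items.append(value)
    return True


def consumer_read(fifos):
    """A consumer PE fires only when every one of its input FIFOs has data."""
    if not all(f.output_ready for f in fifos):
        return None
    return [f.items.popleft() for f in fifos]


fifo_110, fifo_210 = InputFifo(2), InputFifo(2)
print(producer_write(1, [fifo_110, fifo_210]))  # True
print(producer_write(2, [fifo_110, fifo_210]))  # True: back-to-back write accepted
print(producer_write(3, [fifo_110, fifo_210]))  # False: both FIFOs are full
print(consumer_read([fifo_110]))                # [1]
```

With a depth of two or more, the producer can write in consecutive cycles as long as the consumers keep draining their buffers.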

Next, another embodiment of the arithmetic device according to the present invention is described with reference to FIG. 7. According to FIG. 7, queuing units 4 are inserted between the instruction fetch unit 81, the instruction decode unit 82, the instruction execution unit 83, and the write-back unit 84, which constitute the stages of the pipeline. The description uses a four-stage pipeline, but the number of stages is an example, and the present invention is applicable to a pipeline with any number of stages.

Because the pipeline configuration is equivalent to the arithmetic device shown in FIG. 1 with a single row, the queuing unit 4 of this embodiment may be the queuing unit 5 shown in FIG. 2 with the connection flag field omitted and the status flag field reduced to one bit. Hereinafter, the state in which the value "1" is set in the status flag field is called the valid state, and the state in which the value "0" is set is called the invalid state.

Each functional block constituting the pipeline writes its processing result into the payload field of the downstream queuing unit 4: the fetched instruction in the case of the instruction fetch unit 81, the decoded instruction in the case of the instruction decode unit 82, and the execution result data in the case of the instruction execution unit 83. The write sets the status flag field of that queuing unit 4 to the valid state. Each functional block recognizes that processing in the upstream stage has finished from the fact that the status flag field of the upstream queuing unit 4 is in the valid state, and it reads the instruction or data from the payload field of that queuing unit 4 and starts processing only in that case. The status flag field of a queuing unit 4 is set to the invalid state when the instruction or data is read from the corresponding payload field. Each functional block constituting the pipeline can write data or an instruction into the payload field only when the status flag field of the downstream queuing unit 4 is in the invalid state.
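
The one-bit handshake described above can be sketched in Python as follows. This is an illustrative model only: the class name `QueuingUnit` and the methods `try_write` and `try_read` are hypothetical names, not terms from the specification, and here the queuing unit itself updates the flag.

```python
class QueuingUnit:
    """One-entry register with a payload field and a one-bit status flag."""

    def __init__(self):
        self.valid = False   # status flag: True = valid ("1"), False = invalid ("0")
        self.payload = None  # payload field

    def try_write(self, item):
        """Upstream stage: write only while the flag is in the invalid state."""
        if self.valid:
            return False     # previous item not yet consumed, so the writer stalls
        self.payload = item
        self.valid = True    # writing sets the valid state
        return True

    def try_read(self):
        """Downstream stage: read only while the flag is in the valid state."""
        if not self.valid:
            return (False, None)  # nothing to consume yet, so the reader waits
        self.valid = False        # reading sets the invalid state
        return (True, self.payload)


q = QueuingUnit()
first_write = q.try_write("instruction A")   # accepted
second_write = q.try_write("instruction B")  # refused until A has been read
first_read = q.try_read()
second_read = q.try_read()                   # nothing left to read
```

Because each stage consults only the flags of its own upstream and downstream queuing units, no global controller is needed to decide when a stage may proceed.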

For example, consider executing the following instruction on the arithmetic device shown in FIG. 7:
add R0, #1, R1
Here, add is an unsigned addition instruction, and the operation adds the immediate value "1" to the value stored in register R0 and stores the result in register R1.

First, when the downstream queuing unit 4 is in the invalid state, the instruction fetch unit 81 fetches an instruction from an instruction memory (not shown) and writes the fetched instruction into the payload field of that queuing unit 4. The write sets the status flag field of the queuing unit 4 to the valid state. Either the instruction fetch unit 81 or the queuing unit 4 itself may set the valid state.

Because the status flag field of the upstream queuing unit 4 is now in the valid state, the decode unit 82 decodes the instruction in the payload field. This sets the status flag field of the upstream queuing unit 4 to the invalid state. Either the decode unit 82 or the queuing unit 4 itself may set the invalid state. If the downstream queuing unit 4 is in the invalid state, the decode unit 82 writes the decode result into its payload field. If it is in the valid state, the decode unit 82 waits until it becomes invalid and then writes the decode result. The write sets the status flag field of the downstream queuing unit 4 to the valid state. In this example, the decode result consists of the microcode that causes the instruction execution unit 83 to perform addition, the value held in register R0, the immediate value "1", and a value designating register R1, in which the operation result is to be stored.

Because the status flag field of the upstream queuing unit 4 is now in the valid state, the instruction execution unit 83 configures itself for addition based on the microcode stored in the payload field of the upstream queuing unit 4. It then adds the values stored in that payload field. Finally, when the downstream queuing unit 4 is in the invalid state, the instruction execution unit 83 writes the addition result, together with the value designating register R1 that was stored in the payload field of the upstream queuing unit 4, into the payload field of the downstream queuing unit 4. The write by the instruction execution unit 83 sets the status flag field of the downstream queuing unit 4 to the valid state.

Because the status flag field of the upstream queuing unit 4 is now in the valid state, the write-back unit 84 stores the addition result held in the payload field of the upstream queuing unit 4 into register R1, which is designated in the same payload field.
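
The walk-through of `add R0, #1, R1` can be traced with a small Python simulation. This is a sketch under stated assumptions rather than the claimed implementation: the tuple encodings of instructions and payloads, the 32-bit wraparound, the initial value 41 in R0, and the names `QueuingUnit` and `step` are all hypothetical. As a simplification, a stage here consumes its upstream payload only in a cycle in which it can also write downstream.

```python
class QueuingUnit:
    """Payload field plus one-bit status flag (hypothetical model)."""

    def __init__(self):
        self.valid = False
        self.payload = None


def step(regs, program, state, q_fd, q_de, q_ew):
    """Advance every stage by one cycle, using only local flag checks."""
    # Write-back unit 84: consume (destination, value) when upstream is valid.
    if q_ew.valid:
        dest, value = q_ew.payload
        q_ew.valid = False
        regs[dest] = value
    # Instruction execution unit 83: needs valid upstream and invalid downstream.
    if q_de.valid and not q_ew.valid:
        op, a, b, dest = q_de.payload
        q_de.valid = False
        if op == "add":
            result = (a + b) & 0xFFFFFFFF  # unsigned add, 32-bit width assumed
        q_ew.payload, q_ew.valid = (dest, result), True
    # Decode unit 82: resolve the source register to the value it holds.
    if q_fd.valid and not q_de.valid:
        op, src, imm, dest = q_fd.payload
        q_fd.valid = False
        q_de.payload, q_de.valid = (op, regs[src], imm, dest), True
    # Instruction fetch unit 81: fetch only when downstream is invalid.
    if state["pc"] < len(program) and not q_fd.valid:
        q_fd.payload, q_fd.valid = program[state["pc"]], True
        state["pc"] += 1


regs = {"R0": 41, "R1": 0}
program = [("add", "R0", 1, "R1")]  # add R0, #1, R1
state = {"pc": 0}
queues = [QueuingUnit() for _ in range(3)]

for _ in range(4):  # one cycle per stage for a single instruction
    step(regs, program, state, *queues)
```

After four cycles the instruction has passed through fetch, decode, execute, and write-back, and R1 holds the sum. Adding a stage to this model means adding one more queuing unit and one more local check, with no change to the other stages.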

The above configuration eliminates the need for a pipeline control unit. In addition, because stall control between functional blocks is performed locally and independently, an increase or decrease in the number of pipeline stages can be handled with local design changes alone.

Note that the values written to the connection flag field and the status flag field of the queuing unit 4 or 5 in the above description, namely "1" and "0", are examples, and the invention is not limited to the values used in the description.

FIG. 1 is a diagram showing the connection configuration between processor elements in the arithmetic device according to the present invention.
FIG. 2 is a diagram showing the configuration of the queuing unit.
FIG. 3 is a diagram showing the data input/output relationship between processor elements.
FIG. 4 is a diagram explaining the transitions of the connection flag field and the status flag field of the queuing unit for the input/output relationship of FIG. 3.
FIG. 5 is a state transition diagram of each register of the queuing unit.
FIG. 6 is a configuration diagram in which FIFO buffers are arranged.
FIG. 7 is a diagram showing the pipeline configuration of the arithmetic device according to the present invention.
FIG. 8 is a diagram explaining an arithmetic device according to the prior art.
FIG. 9 is a diagram explaining a pipeline structure according to the prior art.

Explanation of symbols

10, 20, 30, 11, 21, 31, 71, 72, 73: Processor element
4, 5: Queuing unit
9: Pipeline control unit
51, 52, 53, 5n: Register
61, 62: Input
81: Instruction fetch unit
82: Instruction decode unit
83: Instruction execution unit
84: Write-back unit
110, 210: FIFO buffer
111, 211: Output enable flag
112, 212: FULL flag

Claims

An arithmetic device having one or more first processor elements, one or more second processor elements, and a queuing means, wherein
the queuing means has a payload field that holds data output by each of the first processor elements, and a flag field that corresponds to the payload field and indicates a valid state or an invalid state of the corresponding payload field, and
the flag field enters the valid state when the first processor element writes data to the corresponding payload field, and enters the invalid state when all the second processor elements that take the data as input have read the data from the payload field.

The arithmetic device according to claim 1, wherein
the first processor element writes new data to the corresponding payload field only when the flag field is in the invalid state.

The arithmetic device according to claim 2, wherein
the flag field includes, for each second processor element, a flag indicating whether or not the corresponding second processor element has read the data in the payload field, and
the second processor element reads its input data only when all the flags corresponding to itself, in the flag fields corresponding to the payload fields holding its input data, indicate that the data has not yet been read.

The arithmetic device according to claim 1, wherein
each input of the second processor element is provided with a FIFO buffer that holds one or more pieces of data, and data newly written to the payload field is output simultaneously to the FIFO buffers of all the second processor elements that take the data as input,
the FIFO buffer sets its full flag to valid when the held data reaches its capacity, and sets it to invalid otherwise, and
the first processor element writes new data to the payload field only when the full flags of all the FIFO buffers that take its output data as input are invalid.

An arithmetic device having a queuing means between the stages constituting a pipeline, wherein
the queuing means has a payload field that holds an instruction or data output by each stage, and a flag field that indicates a valid state or an invalid state of the payload field, the flag field entering the valid state when an instruction or data is written to the payload field and entering the invalid state when an instruction or data is read from the payload field, and
each stage writes an instruction or data into the payload field of the downstream queuing means only when the flag field is in the invalid state, and reads an instruction or data from the payload field of the upstream queuing means only when the flag field is in the valid state.