JPH04273529A

JPH04273529A - Parallel arithmetic circuit

Info

Publication number: JPH04273529A
Application number: JP3443691A
Authority: JP
Inventors: Hajime Kubosawa; 久保沢　元
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-02-28
Filing date: 1991-02-28
Publication date: 1992-09-29

Abstract

PURPOSE: To obtain a parallel arithmetic circuit that simultaneously supplies arithmetic instructions to each of a plurality of arithmetic units performing pipeline processing. CONSTITUTION: The circuit comprises a plurality of instruction registers CR1, CR2, and CR3 that store predetermined arithmetic instructions; a data register DR that stores the data needed for the operations specified by the instructions stored in the instruction registers; a plurality of decoders DC1, DC2, and DC3 that decode the instructions stored in the instruction registers; a plurality of arithmetic units EX-1, EX-2, and EX-3 that perform predetermined operations based on the decoding results of the decoders; and an instruction reconfiguration means 1 that simultaneously supplies the arithmetic instructions to each of the instruction registers CR1, CR2, and CR3.

Description

[Detailed description of the invention]

[0001]

[Industrial Application Field] The present invention relates to a parallel arithmetic circuit, and more specifically to a parallel arithmetic circuit that performs numerical operations at high speed and is suitable for use in fields such as image processing.

[0002] In recent years, many parallel arithmetic circuits have been developed to perform numerical operations at high speed, for example in image processing and various kinds of simulation. Such a circuit integrates, for example, a plurality of arithmetic units on a single chip and has each unit execute operations. If the arithmetic units operate in parallel at the same time, very high-speed computation is possible.

[0003] However, to operate the arithmetic units simultaneously, arithmetic instructions and input data (the operation parameters) must be supplied to the plurality of arithmetic units at the same time. A way of supplying arithmetic instructions and input data to a plurality of arithmetic units simultaneously is therefore required.

[0004]

[Prior Art] A conventional parallel arithmetic circuit of this type has, for example, the configuration shown in FIG. 6.

[0005] This parallel arithmetic circuit broadly consists of an instruction register CR, a data register DR, a decoder DC, and three arithmetic units EX-1, EX-2, and EX-3. The instruction register CR temporarily holds predetermined arithmetic instructions input from outside, and the data register DR temporarily holds the data used as parameters when operations are performed according to those instructions.

[0006] The decoder DC decodes the arithmetic instructions stored in the instruction register CR and outputs the results to the arithmetic units EX-1, EX-2, and EX-3. The arithmetic units EX-1, EX-2, and EX-3 operate as three pipelines.

[0007] In the above configuration, when the decoder DC decodes only one of the arithmetic instructions stored in the instruction register CR at a time, operations are normally executed in the order shown in FIGS. 7 and 8. Executing an operation is assumed to take three clock cycles.

[0008] That is, the decoder DC reads the arithmetic instructions ■ to ■ from the instruction register CR sequentially, one per clock cycle, and supplies the decoding results to the arithmetic units EX-1, EX-2, and EX-3. After arithmetic instructions ■ and ■ are processed by arithmetic unit EX-1, arithmetic instructions ■ and ■ are processed by arithmetic unit EX-2. Thereafter, arithmetic instructions ■ and ■ are processed by arithmetic unit EX-3, arithmetic instruction ■ by arithmetic unit EX-1, and arithmetic instruction ■ by arithmetic unit EX-3.

[0009] Incidentally, in this case 10 clock cycles are required to execute all eight arithmetic instructions ■ to ■.

[0010]

[Problems to be Solved by the Invention] In such a conventional parallel arithmetic circuit, however, the decoder DC reads the arithmetic instructions ■ to ■ from the instruction register CR sequentially, one per clock cycle, and supplies the decoding results to the arithmetic units EX-1, EX-2, and EX-3. The decoding results therefore cannot be supplied to the arithmetic units EX-1, EX-2, and EX-3 at the same time. Although the three arithmetic units are configured to allow three-way pipeline processing, the pipelines do not operate effectively and the arithmetic units EX-1, EX-2, and EX-3 are frequently idle.

[0011] The ineffective operation of the pipelines is even more pronounced when, for example, as shown in FIGS. 9 and 10, an arithmetic instruction ■ is executed based on the output of another arithmetic instruction ■, that is, when the address of the output data of arithmetic instruction ■ executed by arithmetic unit EX-2 matches the address of the input data of arithmetic instruction ■ executed by arithmetic unit EX-2. In this case pipeline processing stops until the output of arithmetic unit EX-2 is available, so executing all eight arithmetic instructions ■ to ■ takes 12 clock cycles. The arithmetic units EX-1, EX-2, and EX-3 are idle even more often, and the computation speed falls.

[0012] [Objective] An object of the present invention is therefore to provide a parallel arithmetic circuit that simultaneously supplies arithmetic instructions to each of a plurality of arithmetic units performing pipeline processing.

[0013]

[Means for Solving the Problems] To achieve the above object, the parallel arithmetic circuit according to the present invention comprises: a plurality of instruction registers CR1, CR2, and CR3 that store predetermined arithmetic instructions; a data register DR that stores the data needed for the operations specified by the arithmetic instructions stored in the instruction registers CR1, CR2, and CR3; a plurality of decoders DC1, DC2, and DC3 that decode the arithmetic instructions stored in the instruction registers CR1, CR2, and CR3; a plurality of arithmetic units EX-1, EX-2, and EX-3 that perform predetermined operations based on the decoding results of the decoders DC1, DC2, and DC3; and an instruction reconfiguration means 1 that reads the predetermined arithmetic instructions to be stored in the instruction registers and supplies them simultaneously to each of the instruction registers CR1, CR2, and CR3.

[0014] Where the clock of the decoders DC1, DC2, and DC3 is CLOCK1, the number of arithmetic units EX-1, EX-2, and EX-3 is N, and the clock of the arithmetic units is CLOCK2, CLOCK1 is preferably set to N × CLOCK2. It is also effective, when one of the instruction registers CR1, CR2, and CR3 overflows, to invalidate the decoding result of the corresponding decoder DC1, DC2, or DC3 and to have that decoder stop decoding until the instruction register becomes free.

[0015]

[Operation] In the present invention, the instruction reconfiguration means simultaneously supplies arithmetic instructions to the plurality of instruction registers that store the predetermined arithmetic instructions.

[0016] As a result, the arithmetic units carry out pipeline operation efficiently, and the computation speed is improved.

[0017]

[Embodiment] The present invention is described below with reference to the drawings. FIGS. 1 to 5 show one embodiment of the parallel arithmetic circuit according to the present invention, and FIG. 1 is a block diagram showing the overall configuration of this embodiment.

[0018] The configuration is described first. In FIG. 1, reference symbols identical to those used in the conventional example of FIG. 6 denote identical parts.

[0019] The parallel arithmetic circuit of this embodiment broadly consists of instruction registers CR1, CR2, and CR3, a data register DR, decoders DC1, DC2, and DC3, arithmetic units EX-1, EX-2, and EX-3, and a dependency controller 1, which serves as the instruction reconfiguration means. There are three instruction registers and three decoders, one of each for each of the arithmetic units EX-1, EX-2, and EX-3.

[0020] The dependency controller 1 examines resource and operand dependencies between arithmetic instructions. If there is no dependency, processing proceeds as is; if there is a dependency, the controller outputs a nop (no-operation) instruction to suspend processing. More specifically, when arithmetic instructions are input from outside, the controller decodes them one at a time and checks whether the address of the output data of a previously read instruction matches the address of the input data of the instruction just read. It outputs a nop instruction only when the addresses match.
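
The address check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the circuit of the embodiment: the `Instr` record, its `srcs` and `dst` fields, and the `issue` helper are hypothetical names, and a single previously read instruction is compared against the current one.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction record (field names are assumptions)."""
    name: str
    srcs: Tuple[int, ...]   # addresses of the input data
    dst: Optional[int]      # address of the output data, None if no output


# A nop has no inputs and no output, so it never creates a dependency.
NOP = Instr("nop", (), None)


def depends_on(prev: Instr, cur: Instr) -> bool:
    """True if cur reads an address that prev writes."""
    return prev.dst is not None and prev.dst in cur.srcs


def issue(prev: Instr, cur: Instr) -> Instr:
    """Return what would be sent on: a nop on an address match, else cur."""
    return NOP if depends_on(prev, cur) else cur
```

For example, if `prev` writes address 2 and `cur` reads addresses 2 and 3, `issue(prev, cur)` returns `NOP`; if `cur` reads unrelated addresses, it is passed through unchanged.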

[0021] The operation is described next. As shown in FIGS. 2 and 3, the arithmetic units EX-1, EX-2, and EX-3 operate as follows. The dependency controller 1 first distributes the arithmetic instructions input from outside among the instruction registers CR1, CR2, and CR3. The decoders DC1, DC2, and DC3 then decode the arithmetic instructions stored in the instruction registers CR1, CR2, and CR3, and the decoding results are output to the arithmetic units EX-1, EX-2, and EX-3 simultaneously.

[0022] In each clock cycle, the decoders DC1, DC2, and DC3 each read one arithmetic instruction (■, ■, and ■ respectively) from the instruction registers CR1, CR2, and CR3, and the decoding results are supplied to the arithmetic units EX-1, EX-2, and EX-3 respectively. Arithmetic unit EX-1 processes arithmetic instructions ■, ■, and ■ in sequence, arithmetic unit EX-2 processes arithmetic instructions ■ and ■, and arithmetic unit EX-3 processes arithmetic instructions ■, ■, and ■.
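
As a rough illustration of this distribute-then-issue flow, the following Python sketch splits an instruction stream into one queue per instruction register and then issues one instruction per register in each cycle. It rests on assumptions that the text does not state: each instruction is assumed to carry a hypothetical target-unit index, and the function names `distribute` and `issue_cycles` are invented for the example. It models only the issue order, not the cycle counts of FIGS. 2 and 3.

```python
from collections import deque


def distribute(instructions, n_units=3):
    """Split (name, unit_index) pairs into one FIFO per instruction register."""
    registers = [deque() for _ in range(n_units)]
    for name, unit in instructions:
        registers[unit].append(name)
    return registers


def issue_cycles(registers):
    """Yield, per cycle, the instructions issued to all units at once.

    None marks a unit that has nothing to issue in that cycle.
    The register queues are consumed as the generator runs.
    """
    while any(registers):
        yield tuple(r.popleft() if r else None for r in registers)


# Eight hypothetical instructions split 3 / 2 / 3 across the three units.
stream = [("i1", 0), ("i2", 0), ("i3", 1), ("i4", 1),
          ("i5", 2), ("i6", 2), ("i7", 0), ("i8", 2)]
schedule = list(issue_cycles(distribute(stream)))
```

Here `schedule` holds three issue groups, the last of which leaves the second unit idle because its register has already been drained.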

[0023] Incidentally, in this case all eight arithmetic instructions ■ to ■ are executed in 6 clock cycles. Next, consider applying this embodiment to the same case as the conventional example of FIGS. 9 and 10, in which an arithmetic instruction ■ is executed based on the output of another arithmetic instruction ■. A dependency exists between the two instructions: the output of the first is the input of the second, so the second cannot be executed until the first has finished. In this case, as shown in FIGS. 4 and 5, the dependency controller 1 issues a nop instruction to instruction register CR2, and arithmetic unit EX-2 suspends operation until it has completed the first instruction. Even so, arithmetic units EX-1 and EX-3 continue processing, so executing all eight arithmetic instructions ■ to ■ takes only 7 clock cycles.

[0024] Thus the execution times that required 10 and 12 clock cycles in the conventional example are reduced to 6 and 7 clock cycles, respectively, in this embodiment. If the dependency controller 1 operates at three times the clock rate of the arithmetic units EX-1, EX-2, and EX-3, it can decode three arithmetic instructions per clock cycle. In that case, a total of 9 address comparisons are required when decoding the first instruction, 12 for the second, and 15 for the third.
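
The figures quoted above can be checked with a little arithmetic. The Python sketch below computes the cycle-count ratios and the number of decode slots per arithmetic-unit cycle. The function names are hypothetical, and the second function assumes one decode per tick of the faster clock, which the text implies but does not state outright.

```python
def speedup(old_cycles: int, new_cycles: int) -> float:
    """Ratio of conventional cycle count to the embodiment's cycle count."""
    return old_cycles / new_cycles


def decodes_per_unit_cycle(n_units: int, clock2: int = 1) -> int:
    """Decode slots per arithmetic-unit cycle when CLOCK1 = N x CLOCK2.

    Assumes one instruction is decoded per tick of the faster clock.
    """
    clock1 = n_units * clock2
    return clock1 // clock2


no_dependency = speedup(10, 6)    # about 1.67
with_dependency = speedup(12, 7)  # about 1.71
```

With three arithmetic units, `decodes_per_unit_cycle(3)` gives 3, matching the three instructions per cycle stated in the text.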

[0025] When the address comparison reveals a dependency, a nop instruction is issued to the instruction registers CR1, CR2, and CR3. If arithmetic instructions with dependencies occur in succession, however, the instruction registers CR1, CR2, and CR3 may overflow. In that case, the dependency controller 1 outputs a control signal to the instruction registers CR1, CR2, and CR3 and the decoders DC1, DC2, and DC3 that invalidates the decoding result of that execution cycle and makes decoding wait until the instruction registers CR1, CR2, and CR3 become free.
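
A software analogy for this overflow handling is a bounded queue that rejects new entries while full and stays stalled until it has drained. The Python sketch below is an assumption-laden illustration, not the control logic of the embodiment: the class name, the `capacity` parameter, and the choice to resume only once the register is completely empty are all hypothetical.

```python
from collections import deque


class InstructionRegister:
    """Bounded FIFO standing in for one instruction register (hypothetical)."""

    def __init__(self, capacity: int = 2):
        self.capacity = capacity   # assumed depth, not given in the text
        self.slots = deque()
        self.stalled = False

    def offer(self, instr) -> bool:
        """Try to store a decoded instruction.

        Returns False when the decode result is invalidated, either because
        the register is full or because the decoder is still stalled.
        """
        if self.stalled or len(self.slots) >= self.capacity:
            self.stalled = True
            return False
        self.slots.append(instr)
        return True

    def consume(self):
        """The arithmetic unit takes the oldest instruction, if any."""
        instr = self.slots.popleft() if self.slots else None
        if not self.slots:
            self.stalled = False   # register has drained: decoding may resume
        return instr
```

A rejected instruction is not lost in this model: the caller is expected to offer it again once `stalled` is cleared, which mirrors the decoder waiting rather than discarding work.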

[0026] This prevents arithmetic instructions from overflowing. In this embodiment, therefore, the dependency controller 1 can simultaneously supply arithmetic instructions to the plurality of instruction registers CR1, CR2, and CR3 that store the predetermined arithmetic instructions, and the arithmetic units EX-1, EX-2, and EX-3 can carry out pipeline operation efficiently.

[0027] Consequently, the arithmetic units EX-1, EX-2, and EX-3 can execute operations simultaneously, and the computation speed can be improved. Although the above embodiment is described using a parallel arithmetic circuit with three arithmetic units as an example, the invention is not limited to this, and the number of arithmetic units can of course be set according to the parallelism required.

[0028]

[Effects of the Invention] In the present invention, the instruction reconfiguration means can simultaneously supply arithmetic instructions to the plurality of instruction registers that store the predetermined arithmetic instructions, and the arithmetic units can carry out pipeline operation efficiently.

[0029] Consequently, the plurality of arithmetic units can execute operations simultaneously, and the computation speed can be improved.

[Brief explanation of the drawing]

FIG. 1 is a block diagram showing the overall configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing an example of operation of an embodiment of the present invention.

FIG. 3 is a diagram showing an example of execution of arithmetic instructions in an embodiment of the present invention.

FIG. 4 is a diagram showing another example of operation of an embodiment of the present invention.

FIG. 5 is a diagram showing another example of execution of arithmetic instructions in an embodiment of the present invention.

FIG. 6 is a block diagram showing the overall configuration of a conventional example.

FIG. 7 is a diagram showing an example of operation of the conventional example.

FIG. 8 is a diagram showing an example of execution of arithmetic instructions in the conventional example.

FIG. 9 is a diagram showing another example of operation of the conventional example.

FIG. 10 is a diagram showing another example of execution of arithmetic instructions in the conventional example.

[Explanation of symbols]

1: Dependency controller (instruction reconfiguration means)
CR1, 2, 3: Instruction registers
DC1, 2, 3: Decoders
DR: Data register
EX-1, 2, 3: Arithmetic units

Claims

[Claims]

1. A parallel arithmetic circuit comprising: a plurality of instruction registers that store predetermined arithmetic instructions; a data register that stores data necessary for operations according to the arithmetic instructions stored in the instruction registers; a plurality of decoders that decode the arithmetic instructions stored in the instruction registers; a plurality of arithmetic units that perform predetermined operations based on the decoding results of the decoders; and instruction reconfiguration means that reads the predetermined arithmetic instructions to be stored in the instruction registers and simultaneously supplies the arithmetic instructions to each of the plurality of instruction registers.

2. The parallel arithmetic circuit according to claim 1, wherein, where the clock of the decoders is CLOCK1, the number of the arithmetic units is N, and the clock of the arithmetic units is CLOCK2, CLOCK1 is N × CLOCK2.

3. The parallel arithmetic circuit according to claim 1 or 2, wherein, when an instruction register overflows, the decoding result of the decoder corresponding to that instruction register is invalidated, and the decoder stops decoding until the instruction register becomes empty.