JPS58106642A

JPS58106642A - Parallel operating device

Info

Publication number: JPS58106642A
Application number: JP56205013A
Authority: JP
Inventors: Tsutomu Sakamoto; 務坂本
Original assignee: Toshiba Corp; Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1981-12-18
Filing date: 1981-12-18
Publication date: 1983-06-25

Abstract

PURPOSE: To execute the processing of the individual arithmetic units in parallel and thereby attain a high operating speed, by providing a plurality of arithmetic units, each performing a specific kind of arithmetic processing, with a plurality of arithmetic elements shared among the units and an instruction register into which instructions are set. CONSTITUTION: An instruction read out from a main memory 11 is set in an instruction register 13, and the instruction in register 13 is executed by one of the arithmetic units 20₁ to 20ₘ, each of which has a specific arithmetic processing function. Shared arithmetic elements 16₁ to 16ₙ are connected to the arithmetic units 20₁ to 20ₘ, and each arithmetic unit monitors the state of register 13. Each arithmetic unit determines whether the instruction in register 13 is one that it should process and, depending on the state of the element-in-use information line 22, that is, on whether the required elements are in use, takes in the instruction set in register 13 and executes it using the corresponding elements among 16₁ to 16ₙ. The unit signals on line 22 that those elements are in use, and the next instruction to be executed is set in register 13 from an instruction buffer 12.

Description

Technical Field of the Invention

The present invention relates to a parallel arithmetic device using a pipeline control scheme.

Technical Background of the Invention

In general, an arithmetic device using a pipeline control scheme is configured as shown in FIG. 1. In the figure, 11 is a main memory (hereinafter MEM), and 12 is an instruction buffer (hereinafter IB) in which instructions prefetched from MEM 11 are stored in sequence. An instruction prefetch mechanism (not shown) prefetches instructions from MEM 11 into IB 12, which makes possible so-called pipeline control, in which instructions are processed continuously. This pipeline control is not directly related to the gist of the present invention, so its explanation is omitted. An instruction prefetched into IB 12 is set in an instruction register (hereinafter IR) 13 once its type and other attributes have been decoded. If the instruction is of the RX type, its operand is read from MEM 11 and stored in an operand register (not shown).

The instruction set in IR 13 is taken into an arithmetic unit 14, which executes it. Arithmetic unit 14 has a control store (not shown) and expands the instruction taken in from IR 13 into microinstructions to perform the corresponding processing. That is, the microinstructions read out of arithmetic unit 14 are sent onto a microinstruction bus 15. This activates some of the arithmetic elements 16₁ to 16ₙ, and processing based on the microinstructions is performed. Each of the arithmetic elements 16₁ to 16ₙ has its own specific function; for example, arithmetic elements 16₁ and 16₂ are an adder and a multiplier, respectively. Data is exchanged among the arithmetic elements 16₁ to 16ₙ over a data bus 17.

Problems with the Background Art

In a pipeline-controlled arithmetic device such as that of FIG. 1, an RX-type instruction, for example, is processed in the following order: an instruction fetch stage IF, in which the instruction is read from MEM 11; an operand address stage A; an operand read stage; and an execution stage E. Under pipeline control, the instruction fetch stage of each instruction is shifted back by one period (one machine cycle) relative to the preceding instruction. Instruction reads and operand reads can be handled within the same period by known techniques, such as accessing MEM 11 at different timings or splitting the cache memory (not shown) placed in front of MEM 11 into an instruction store and an operand store. However, an instruction whose execution stage E does not finish in one period (one machine cycle) could not be processed with its execution stage overlapping that of a subsequent instruction. The execution stage E of a subsequent instruction therefore had to wait until the execution stage E of the preceding instruction finished. Consequently, as shown in FIG. 2, when the execution stage E of the preceding instruction needs N periods, for example four periods (N = 4), each of the subsequent instructions is made to wait at least N - 1 periods, that is, three periods.
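The waiting time described above can be checked with a few lines of Python. This is only an illustrative sketch, not part of the patent: the function names are hypothetical, and the model assumes that instruction i would ideally start its execution stage E in cycle i and that only one execution stage can be active at a time.

```python
def execution_start_cycles(exec_cycles):
    """Cycle in which each instruction's execution stage E starts when only
    one execution stage may be active at a time (the conventional case).
    Instruction i could start E in cycle i if nothing were in its way."""
    starts = []
    stage_free_at = 0  # first cycle in which the execution stage is free
    for i, cycles in enumerate(exec_cycles):
        start = max(i, stage_free_at)
        starts.append(start)
        stage_free_at = start + cycles
    return starts


def stall_cycles(exec_cycles):
    """Extra cycles each instruction waits beyond its ideal slot."""
    starts = execution_start_cycles(exec_cycles)
    return [start - i for i, start in enumerate(starts)]


# A 4-cycle instruction followed by three 1-cycle instructions:
# each follower waits N - 1 = 3 cycles.
print(stall_cycles([4, 1, 1, 1]))  # [0, 3, 3, 3]
```

With N = 4 the followers each lose three cycles, which is the N - 1 figure given in the text.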

Thus, in a conventional pipeline-controlled arithmetic device, when the execution stage E of a preceding instruction does not finish in one period, the execution stage E of the subsequent instruction is made to wait and the flow of the pipeline is disrupted. This was a major obstacle to raising the arithmetic speed.

Object of the Invention

The present invention has been made in view of the above circumstances. Its object is to provide a parallel arithmetic device in which execution stages can be processed in parallel efficiently, so that disturbance of the pipeline flow is greatly reduced and the arithmetic speed is increased.

Summary of the Invention

The invention provides an instruction register in which instructions prefetched into an instruction buffer are set, a plurality of arithmetic units each having its own arithmetic processing function, and a plurality of arithmetic elements shared by these arithmetic units. Each arithmetic unit monitors the instruction register. When a unit determines that the instruction set in the instruction register is one it should process, and that the arithmetic elements needed to execute the instruction are not in use, it takes in the instruction and starts executing it, outputs to the other arithmetic units a signal indicating that those arithmetic elements are in use, and causes the next instruction to be executed to be set in the instruction register. As a result, in parallel with the processing of an arithmetic unit that is still executing an instruction, another arithmetic unit takes in and executes the subsequent instruction newly set in the instruction register.

The invention further provides a comparison circuit that compares the register designation fields of the instruction currently being executed and of the instruction to be executed next. The output of this comparison circuit is made one of the conditions for taking in the next instruction and starting its execution, so that a plurality of arithmetic units can share the general-purpose registers concurrently at the level of individual registers, according to how those registers are being used.

Embodiments of the Invention

FIG. 3 is a block diagram showing one embodiment of the present invention. Parts identical to those in FIG. 1 carry the same reference numerals, and their detailed description is omitted. In the figure, 20₁ to 20ₘ are arithmetic units, each having its own arithmetic processing function (for example, fixed-point arithmetic, floating-point arithmetic, or function evaluation). The arithmetic units 20₁ to 20ₘ are essentially the arithmetic unit 14 of FIG. 1 split up by function; each has its own independent control store (not shown) and can operate in parallel with the other units. 21 is an operation bus that carries the contents of IR 13 to the arithmetic units 20₁ to 20ₘ, and 22 is an element-in-use information line made up of signal lines OPE 1 to OPE n. The signal lines OPE 1 to OPE n indicate whether the arithmetic elements 16₁ to 16ₙ, respectively, are in use, and all of them are connected in common to every arithmetic unit 20₁ to 20ₘ. In this embodiment, these signal lines are normally at the high level ("H" level). When the corresponding arithmetic element is to be used, one of the arithmetic units 20₁ to 20ₘ drives the line to the low level ("L" level), and this state is held until the element is released. PG is a load clock signal line.
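The active-low convention of the signal lines OPE 1 to OPE n can be modelled in software roughly as follows. This is a minimal sketch under stated assumptions: the class and method names are hypothetical, and a Boolean `True` stands for the "H" level (element free) while `False` stands for the "L" level (element in use).

```python
class ElementInUseLines:
    """Toy model of signal lines OPE 1..OPE n.

    True  = "H" level: the corresponding arithmetic element is free.
    False = "L" level: some arithmetic unit is using the element.
    """

    def __init__(self, n):
        self.level = [True] * n  # all lines idle at "H"

    def is_free(self, elements):
        """True when every listed element (1-based index) is free."""
        return all(self.level[e - 1] for e in elements)

    def claim(self, elements):
        """Drive the lines to "L" if all elements are free; report success."""
        if not self.is_free(elements):
            return False
        for e in elements:
            self.level[e - 1] = False
        return True

    def release(self, elements):
        """Return the lines to "H" when the elements are released."""
        for e in elements:
            self.level[e - 1] = True


lines = ElementInUseLines(4)
print(lines.claim([4]))  # True: element 4 was free and is now marked in use
print(lines.claim([4]))  # False: a second unit has to wait
lines.release([4])
print(lines.claim([4]))  # True again after the release
```

Because every unit sees the same lines, a single shared structure is enough; no unit needs to know which other unit holds an element.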

Signal line PG is connected in common to the load clock terminal of IR 13 and to each of the arithmetic units 20₁ to 20ₘ. In this embodiment, signal line PG is normally at the "H" level. When any of the arithmetic units 20₁ to 20ₘ takes in the instruction held in IR 13, that unit drives PG to the "L" level, and it returns PG to the "H" level one period of the machine clock CLK later.

Next, the operation of the configuration of FIG. 3 is described with reference to the timing chart of FIG. 4. Suppose that IR 13 holds an instruction A whose execution stage takes one machine cycle, and that signal line PG is at the "L" level. Suppose also that one of the arithmetic units has taken in instruction A from IR 13 and is executing it using arithmetic element 16₂. Signal line OPE 2 has been driven to the "L" level by that unit, indicating that arithmetic element 16₂ is in use. In this state, the unit returns signal line PG to the "H" level one machine cycle after starting to execute instruction A (at the end of the first period in the timing chart of FIG. 4). In this example, that moment is also the end of instruction A's execution stage, so the unit returns signal line OPE 2 to the "H" level and releases arithmetic element 16₂.

When signal line PG goes to the "H" level, the next instruction to be executed, say instruction B, which is being output from IB 12, is latched into IR 13. Assume that instruction B requires an execution stage of, for example, four machine cycles.

Each of the arithmetic units 20₁ to 20ₘ receives the instruction held in IR 13 (instruction B) over operation bus 21, decodes it, and determines whether it is an instruction that the unit itself should execute. In other words, the arithmetic units 20₁ to 20ₘ constantly monitor the instructions appearing on operation bus 21. Suppose that arithmetic unit 20₁ determines that it should execute instruction B. Unit 20₁ then works out which arithmetic elements are needed to execute instruction B and checks whether they are in use. If they are, execution of instruction B by unit 20₁ is held off. In this embodiment, assume that the element needed for instruction B is arithmetic element 16ₙ (an instruction is not limited to a single element). Unit 20₁ determines whether element 16ₙ is in use from the state ("H" or "L" level) of signal line OPE n. In this example OPE n is at the "H" level, so unit 20₁ concludes that element 16ₙ is not in use. It therefore takes instruction B from IR 13 into its internal instruction register (not shown); it does not matter if elements other than 16ₙ are in use. At the same time, unit 20₁ drives signal line PG to the "L" level and likewise drives signal line OPE n to the "L" level. It then executes instruction B using arithmetic element 16ₙ.

As when instruction A was executed, arithmetic unit 20₁ returns signal line PG to the "H" level one machine cycle after starting to execute instruction B (at the end of the second period in the timing chart of FIG. 4). As a result, the next instruction to be executed, say instruction C, which is being output from IB 12, is latched into IR 13. Assume that instruction C, like instruction A, requires an execution stage of one machine cycle. It is transferred to the arithmetic units 20₁ to 20ₘ over operation bus 21. Because unit 20₁ is still executing instruction B, it is the arithmetic units other than 20₁, namely 20₂ to 20ₘ, that monitor the instruction (instruction C) appearing on operation bus 21. Suppose that one of these units determines that it should execute instruction C and finds, from the state of signal line OPE 2, that the element needed for instruction C, for example arithmetic element 16₂, is not in use. That unit then takes instruction C into its internal instruction register, drives signal line PG to the "L" level, likewise drives signal line OPE 2 to the "L" level, and executes instruction C using arithmetic element 16₂.

Meanwhile, arithmetic unit 20₁ is still executing instruction B using arithmetic element 16ₙ. That is, according to this embodiment, even when the preceding instruction B does not finish in one machine cycle, the next instruction C can be executed in parallel without waiting for instruction B's execution stage to end. The same holds for the following instructions D and E (each requiring an execution stage of one machine cycle): provided that the arithmetic units executing them differ from unit 20₁, which is executing instruction B, and that the arithmetic elements they use do not overlap with the element used for instruction B, they are executed in parallel with instruction B just as instruction C is. Therefore, as shown in FIG. 5, even though the execution stage E of the preceding instruction B needs four periods, the subsequent instructions C, D, and E each merely start executing one period after the one before. Under pipeline control, each instruction is normally processed one period behind its predecessor, and according to this embodiment the pipeline flow is not disrupted even when the execution stage E of a preceding instruction does not finish in one period. The arithmetic speed can therefore be increased.

Next, operation when parallel processing is not possible is described. Suppose that arithmetic unit 20₂ is executing instruction F using arithmetic element 16₁, as shown in FIG. 4. Assume that instruction F requires an execution stage of two machine cycles. In this case, signal line OPE 1 has naturally been driven to the "L" level by unit 20₂. In this state, unit 20₂ returns signal line PG to the "H" level one machine cycle after starting to execute instruction F (at the end of the sixth period in the timing chart of FIG. 4).

When signal line PG goes to the "H" level, the next instruction to be executed, say instruction G, is latched from IB 12 into IR 13. Assume that instruction G is to be executed by arithmetic unit 20₁ using arithmetic element 16₁. In this case, element 16₁ is being used by unit 20₂, which is still processing instruction F, so unit 20₁ has to wait before executing instruction G. Unit 20₂, for its part, finishes processing instruction F two machine cycles after starting it and then returns signal line OPE 1 to the "H" level. Unit 20₁ has been monitoring signal line OPE 1, and when it detects that OPE 1 has gone to the "H" level, it concludes that element 16₁ has been released (is not in use). Unit 20₁ then takes instruction G into its internal instruction register, drives signal line PG to the "L" level, and likewise drives signal line OPE 1 to the "L" level. It then executes instruction G using arithmetic element 16₁.
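The two cases of the first embodiment, parallel execution alongside a long instruction and waiting for a shared element, can be reproduced with a small cycle-level model. This is a sketch under simplifying assumptions, not the circuit itself: the function name is hypothetical, instructions are taken in strictly in program order, the instruction register receives the next instruction one cycle after the previous one is taken in, and the unit and element assignments for C, D, and E, as well as the use of element 4 to stand in for element 16ₙ, are illustrative choices.

```python
def simulate(instructions, first_cycle=1):
    """instructions: list of (name, unit, elements, cycles) in program order.
    Returns {name: (start_cycle, end_cycle)} with an inclusive end cycle."""
    unit_busy_until = {}      # unit -> last cycle in which it is executing
    element_busy_until = {}   # element -> last cycle in which it is in use
    schedule = {}
    ir_ready = first_cycle    # earliest cycle the instruction can be taken in
    idle = first_cycle - 1
    for name, unit, elements, cycles in instructions:
        candidates = [ir_ready, unit_busy_until.get(unit, idle) + 1]
        candidates += [element_busy_until.get(e, idle) + 1 for e in elements]
        start = max(candidates)            # wait for the unit and its elements
        end = start + cycles - 1
        unit_busy_until[unit] = end
        for e in elements:
            element_busy_until[e] = end
        schedule[name] = (start, end)
        ir_ready = start + 1               # next instruction arrives a cycle later
    return schedule


# B runs for four cycles; C, D and E use other units and elements.
parallel = simulate([
    ("B", "20_1", [4], 4),
    ("C", "20_2", [2], 1),
    ("D", "20_3", [1], 1),
    ("E", "20_2", [2], 1),
])
print(parallel)  # C, D and E start one cycle apart while B is still running

# F holds element 1 for two cycles, so G (same element) starts one cycle late.
blocked = simulate([
    ("F", "20_2", [1], 2),
    ("G", "20_1", [1], 1),
])
print(blocked)
```

In the first run the short instructions start in consecutive cycles despite the four-cycle instruction ahead of them; in the second, G is delayed only until F releases the shared element.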

Next, another embodiment of the present invention is described. FIG. 6 is a block diagram showing this other embodiment.

Parts identical to those in FIG. 3 carry the same reference numerals, and their detailed description is omitted. In the figure, 31 is an instruction register (first instruction register) that, like IR 13 of FIG. 1 and FIG. 3, holds the instruction taken out of IB 12, and 32 is an instruction register (second instruction register) into which the contents of instruction register 31 (hereinafter IR 31) are loaded. A signal line LG, described later, is connected to the load clock terminal and the clear terminal of instruction register 32 (hereinafter IR 32). The contents of IR 31 are loaded into IR 32 on a transition of signal line LG, for example from "H" to "L", and IR 32 is cleared on the opposite transition from "L" to "H". 33 is a comparison circuit (hereinafter CMP) that compares the register designation fields contained in the instructions held in IR 31 and IR 32.

In IR 31 and IR 32 as drawn in FIG. 6, OP denotes the operation code field, and R1 and R2 denote the first and second operand register designation fields, respectively; the figure shows the state in which RR-type instructions are held in IR 31 and IR 32. The registers designated by the fields R1 and R2 of an RR-type instruction are each one of the general-purpose registers (not shown). When IR 31 and IR 32 each hold an RR-type instruction, CMP 33 detects whether a register designation field of the instruction in one register matches the field R1 or the field R2 of the instruction in the other. The comparison result matters when the operation result of the preceding instruction (stored in the general-purpose register that the instruction designates) is used as the first operand (stored in the general-purpose register designated by R1) or the second operand (stored in the general-purpose register designated by R2) of the next instruction. In other words, before the next instruction is executed, CMP 33 detects whether any of the general-purpose registers that instruction will use is serving as the result register of the preceding instruction. IPG is the output signal line for the comparison result (match or mismatch detection) of CMP 33.
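The register comparison performed by the comparison circuit can be illustrated in Python. This is a hedged sketch: the type and function names are hypothetical, the mnemonics are placeholders, and the code assumes that an RR-type instruction writes its result to the register named by its R1 field, which is common for two-register formats but is an assumption here.

```python
from typing import NamedTuple


class RRInstruction(NamedTuple):
    op: str   # operation code (placeholder mnemonic)
    r1: int   # first operand register; assumed here to receive the result
    r2: int   # second operand register


def registers_match(preceding: RRInstruction, following: RRInstruction) -> bool:
    """True when the following instruction uses the register that the
    preceding instruction writes its result to (assumed to be R1)."""
    result_register = preceding.r1
    return result_register in (following.r1, following.r2)


add = RRInstruction("ADD", r1=1, r2=2)
print(registers_match(add, RRInstruction("MUL", r1=3, r2=1)))  # True: reads register 1
print(registers_match(add, RRInstruction("SUB", r1=4, r2=5)))  # False: independent
```

A match corresponds to the case in which the following instruction must not start yet, because it depends on a result that the preceding instruction has not finished producing.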

In this embodiment, signal line IPG is at the "L" level while CMP 33 detects a match and for one further machine clock after that, and it is at the "H" level while CMP 33 detects a mismatch (except during that one extra machine clock). In other words, CMP 33 holds the match-detected state for one additional machine clock.

40₁ to 40ₘ are arithmetic units with almost the same configuration as the arithmetic units 20₁ to 20ₘ of FIG. 3, and LG is a signal line connected in common to the arithmetic units 40₁ to 40ₘ and to the load clock terminal and clear terminal of IR 32. The arithmetic units 40₁ to 40ₘ differ from the units 20₁ to 20ₘ of FIG. 3 in the following respects. Besides monitoring the instructions that IR 31 sends onto operation bus 21 and monitoring signal lines OPE 1 to OPE n, the units 40₁ to 40ₘ also monitor signal line IPG. At least while IPG is at the "L" level, they are held off from taking in the instruction. In addition, when the instruction a unit is executing requires an execution stage of two or more machine cycles, the unit switches signal line LG from the "H" level to the "L" level in order to load that instruction from IR 31 into IR 32. In this embodiment, a unit sets LG to the "L" level one machine cycle after starting to execute the instruction and returns it to the "H" level one machine cycle before the end of the required execution stage.

The arithmetic units 40₁ to 40ₘ also monitor signal line LG. Even when the instruction sent onto operation bus 21 is one that a unit should execute, if that instruction requires an execution stage of two or more machine cycles, the unit is held off from taking it in at least while LG is at the "L" level. The reason is that IR 32 is then in use, so the next instruction could not in turn be loaded (saved) into IR 32. The units detect this situation in advance from the content of the instruction (whether it needs two or more machine cycles and therefore must be loaded into IR 32) and from the state of signal line LG, which prevents a malfunction.
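The conditions under which an arithmetic unit of this second embodiment may take in an instruction can be collected into a single predicate. The sketch below is an illustration only: the function and parameter names are hypothetical, and each signal is reduced to a Boolean in which `True` means the line is at the "H" level.

```python
def may_take_in(is_mine, elements_free, ipg_high, lg_high, exec_cycles):
    """Decide whether a unit may take in the instruction on the operation bus.

    is_mine       -- the instruction belongs to this unit's function
    elements_free -- all needed OPE lines are at "H" (elements not in use)
    ipg_high      -- IPG at "H": no register match with the preceding instruction
    lg_high       -- LG at "H": the second instruction register is available
    exec_cycles   -- machine cycles the instruction's execution stage needs
    """
    if not (is_mine and elements_free and ipg_high):
        return False
    if exec_cycles >= 2 and not lg_high:
        # A multi-cycle instruction would have to be saved in the second
        # instruction register, which still holds an earlier instruction.
        return False
    return True


# A one-cycle instruction is accepted even while LG is at "L".
print(may_take_in(True, True, True, lg_high=False, exec_cycles=1))  # True
# A three-cycle instruction has to wait until LG returns to "H".
print(may_take_in(True, True, True, lg_high=False, exec_cycles=3))  # False
```

Only multi-cycle instructions consult LG, since single-cycle instructions never need to be saved in the second instruction register.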

Next, the operation of the configuration of FIG. 6 is described with reference to the timing chart of FIG. 7. Suppose that one of the arithmetic units 40₁ to 40ₘ has taken in an instruction J held in IR 31 and has started executing it using one of the arithmetic elements.

At this point, signal lines PG, LG, and IPG are at the "L", "H", and "H" levels, respectively, and the signal line OPE corresponding to the element in use is at the "L" level. Assume that instruction J requires two or more machine cycles, for example four. The unit executing it then sets signal line LG to the "L" level one machine cycle after starting to execute instruction J (at the end of the third period in the timing chart of FIG. 7). This loads instruction J, held in IR 31, into IR 32. At the same time, the unit returns signal line PG to the "H" level, so that the next instruction to be executed, say instruction K, is latched into IR 31. Assume that instruction K requires an execution stage of, for example, one machine cycle.

Instruction K held in IR 31 is transferred to the arithmetic units 401 to 40m via the operation bus 21. Assume that the arithmetic unit 401 determines that it should execute instruction K. When the arithmetic unit 401 detects that the signal line IPG is at the "H" level (CMP 33 has detected that the register designation parts do not match) and that the arithmetic element to be used, for example the arithmetic element 161, is not in use (signal line OPE1 at the "H" level), and determines that instruction K completes in one machine cycle, it takes instruction K into its internal instruction register regardless of the "L" level of the signal line LG. At this time the arithmetic unit 401 sets the signal line PG to the "L" level and also sets the signal line OPE1 to the "L" level. The arithmetic unit 401 then executes instruction K using the arithmetic element 161.

One machine cycle after starting to execute instruction K (in this example, at the end of its execution), the arithmetic unit 401 returns the signal line PG to the "H" level. As a result, the instruction to be executed after instruction K, for example instruction L, is held in IR 31. Instruction L requires, for example, an execution stage of three machine cycles. Instruction L held in IR 31 is transferred to the arithmetic units 401 to 40m via the operation bus 21. Assume that the arithmetic unit 401 determines that it should execute instruction L. Because instruction L requires an execution stage of two or more machine cycles, as shown in the timing chart of FIG. 7, the arithmetic unit 401 must wait to take in instruction L while the signal line LG is at the "L" level, until LG goes to the "H" level, even if the signal line IPG is at the "H" level and the arithmetic element to be used, for example the arithmetic element 161, is not in use (signal line OPE1 at the "H" level). In other words, instruction L is kept from executing in parallel with instruction J, which is currently being executed.

The reason is that if instruction L, which requires two or more machine cycles, were executed in parallel with instruction J, IR 32, into which instruction L should be saved, would still hold instruction J, so instruction L could not be saved into IR 32. If instruction L were an instruction that completes in one machine cycle, it would clearly be taken in and executed.
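The fetch conditions described above for instructions K and L can be summarized in a small sketch. It is only an illustration under stated assumptions: the names `Lines` and `may_fetch` and the boolean encoding of the "H" and "L" levels are hypothetical, and the check of whether the instruction belongs to the arithmetic unit at all is left out.

```python
from dataclasses import dataclass


@dataclass
class Lines:
    """Hypothetical snapshot of the signal lines; True stands for "H", False for "L"."""
    ipg: bool  # "H" when CMP 33 reports no match between register designation parts
    ope: bool  # "H" when the arithmetic element to be used is not in use
    lg: bool   # "L" while an instruction of two or more cycles is held in IR 32


def may_fetch(lines: Lines, cycles: int) -> bool:
    """Return True if an arithmetic unit may take the instruction from IR 31."""
    if not lines.ipg or not lines.ope:
        return False      # register match or busy element: wait in every case
    if cycles == 1:
        return True       # one-cycle instruction: the level of LG is ignored
    return lines.lg       # two or more cycles: IR 32 must be free (LG at "H")
```

With LG at "L", a one-cycle instruction such as K passes this check while a three-cycle instruction such as L does not, which is the behavior the text describes.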

The arithmetic unit executing instruction J continues the execution and, one machine cycle before the end of the execution stage required for instruction J (four machine cycles), that is, at the end of the fifth period in the timing chart of FIG. 7, returns the signal line LG to the "H" level. The arithmetic unit 401 monitors the signal line LG, and when LG goes to the "H" level it takes instruction L held in IR 31 into its internal instruction register. At this time the arithmetic unit 401 sets the signal line PG to the "L" level and sets the signal line OPE1 to the "L" level to indicate that the arithmetic element 161 is in use. The arithmetic unit 401 then executes instruction L using the arithmetic element 161.

One machine cycle after starting to execute instruction L, the arithmetic unit 401 sets the signal line LG to the "L" level. As a result, instruction L held in IR 31 is loaded (saved) into IR 32.
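The timing of the signal line LG follows a simple rule: it goes to "L" one machine cycle after execution starts and returns to "H" one machine cycle before the execution stage ends. The sketch below works this out in whole periods. The function name `lg_window` is hypothetical, and the assumption that instruction J starts executing in the third period is an inference from the periods quoted in the text, not something the text states.

```python
from typing import Tuple


def lg_window(start_period: int, cycles: int) -> Tuple[int, int]:
    """Return the periods at whose ends LG goes to "L" and returns to "H".

    start_period is the period in which execution starts (an assumed input);
    cycles is the length of the execution stage in machine cycles.
    """
    if cycles < 2:
        raise ValueError("LG is only driven for instructions of two or more cycles")
    goes_low = start_period                 # one machine cycle after execution starts
    goes_high = start_period + cycles - 2   # one machine cycle before execution ends
    # For cycles == 2 both events coincide in this model; the text does not
    # discuss that case.
    return goes_low, goes_high
```

For the four-cycle instruction J, `lg_window(3, 4)` gives `(3, 5)`, which agrees with the third and fifth periods named in the description.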

At this time the arithmetic unit 401 returns the signal line PG to the "H" level, so that the next instruction to be executed, for example instruction M, is held in IR 31. Assume that instruction M performs an operation using the execution result of instruction L (the contents of the general-purpose register designated by the first-operand storage register designation part R1 of instruction L) and completes in one machine cycle.

CMP 33 compares the information in the first-operand storage register designation part R1 of instruction L held in IR 32 with the information in the first-operand storage register designation part R1 and the second-operand storage register designation part R2 of instruction M held in IR 31, and detects whether it matches either of them. In this example a match is detected, so CMP 33 sets the signal line IPG to the "L" level for the match detection period plus one machine cycle. Meanwhile, instruction M held in IR 31 is transferred to the arithmetic units 401 to 40m via the operation bus 21.
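The comparison performed by CMP 33 can be illustrated as follows. This is a minimal sketch, not the circuit itself: the `RRInstruction` type, the function name `cmp_match`, the use of `None` for an empty register, and the register numbers in the usage note are all hypothetical.

```python
from typing import NamedTuple, Optional


class RRInstruction(NamedTuple):
    """Hypothetical model of an RR-type instruction, reduced to the compared fields."""
    name: str
    r1: int  # first-operand storage register designation part R1
    r2: int  # second-operand storage register designation part R2


def cmp_match(ir32: Optional[RRInstruction], ir31: Optional[RRInstruction]) -> bool:
    """True when R1 of the instruction in IR 32 equals R1 or R2 of the one in IR 31."""
    if ir32 is None or ir31 is None:
        return False  # assumption: an empty register never produces a match
    return ir32.r1 in (ir31.r1, ir31.r2)
```

For example, with `RRInstruction("L", 3, 5)` in IR 32 and `RRInstruction("M", 7, 3)` in IR 31, the function reports a match because instruction M reads register 3, which instruction L writes.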

The arithmetic units other than the arithmetic unit 401, which is executing instruction L, monitor the instructions that appear on the operation bus 21. Assume, for example, that one of them determines that it should execute instruction M. While the signal line IPG is at the "L" level, however, that arithmetic unit must wait to take in instruction M until IPG goes to the "H" level, even if the arithmetic element to be used is not in use (its signal line OPE at the "H" level) and instruction M completes in one machine cycle.

This is because instruction M uses the operation result of the preceding instruction L; if instruction M were executed before the execution stage of instruction L had finished, its result would be wrong.

The arithmetic unit 401 continues executing instruction L and, one machine cycle before the end of the execution stage required for instruction L (three machine cycles), that is, at the end of the seventh period in the timing chart of FIG. 7, returns the signal line LG to the "H" level. The transition of LG from "L" to "H" clears IR 32. The match detection by CMP 33 described above therefore ends, but because CMP 33 holds the match detection state for one further machine cycle, the "L" level of the signal line IPG is maintained until the execution of instruction L ends. When the signal line IPG returns to the "H" level at the end of the execution of instruction L (the end of the eighth period in the timing chart of FIG. 7), the waiting arithmetic unit judges that the execution stage of the preceding instruction L has finished and executes instruction M using the operation result of instruction L (the contents of the general-purpose register concerned).
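The one-cycle hold that CMP 33 applies to its match output can be modeled per machine cycle. The sketch below is an assumption-laden simplification: the function name `ipg_levels` is hypothetical, and signals are sampled once per cycle rather than at clock edges.

```python
from typing import List


def ipg_levels(match_per_cycle: List[bool]) -> List[str]:
    """Return the IPG level for each cycle, given the raw match output per cycle.

    IPG is "L" in every cycle with a match and in the cycle right after it,
    modeling the hold of the match detection state for one machine cycle.
    """
    levels = []
    previous_match = False
    for match in match_per_cycle:
        levels.append("L" if (match or previous_match) else "H")
        previous_match = match
    return levels
```

A match that lasts a single cycle therefore keeps IPG at "L" for that cycle and the following one, which covers the last cycle of the preceding instruction's execution stage.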

Thus, according to this embodiment, the usage of the general-purpose registers can be detected in advance at the register level, which prevents the fault in which an instruction that operates on the result of a preceding instruction is executed in parallel with that instruction and yields a wrong result. If the general-purpose registers as a whole can be treated as one of the arithmetic elements, this fault can also be prevented without detecting register usage at the register level; in that case, however, a subsequent instruction is made to wait even when the registers used do not match, so the utilization of the general-purpose registers and the processing speed fall. In this embodiment, by contrast, the usage of the general-purpose registers is detected at the register level, and instructions are executed in parallel when the registers used do not match, so the utilization of the general-purpose registers and the processing speed improve.

In the other embodiment described above, CMP 33 compares the information in the first-operand storage register designation part R1 and the second-operand storage register designation part R2 of RR-type instructions. The same applies to comparing the information in the register designation parts (including the index register designation part) between an RR-type instruction and an RX-type instruction, between an RX-type instruction and an RR-type instruction, and between RX-type instructions. In that case, however, CMP 33 must have a function for determining the type of each instruction and a function for selecting, according to the result, the register designation information to be compared from IR 31 and IR 32 (or a function for masking the information in the instructions held in IR 31 and IR 32 that is not to be compared).
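The type-dependent selection of compared fields can be sketched as a lookup keyed by instruction type. Everything concrete here is an assumption made for illustration: the dictionary-based instruction model, the field names `r1`, `r2`, `x2`, and `d2`, and the choice of which fields are compared for each type are not taken from the patent.

```python
from typing import Dict, Set

# Hypothetical table of the register designation parts compared for each type.
COMPARED_FIELDS = {
    "RR": ("r1", "r2"),
    "RX": ("r1", "x2"),  # x2 stands for the index register designation part
}


def compared_registers(instr: Dict) -> Set[int]:
    """Select the compared register numbers; all other fields are masked out."""
    fields = COMPARED_FIELDS[instr["type"]]
    return {instr[field] for field in fields}


def registers_conflict(older: Dict, newer: Dict) -> bool:
    """True when R1 of the older instruction is among the newer one's compared registers."""
    return older["r1"] in compared_registers(newer)
```

In this model a field such as the hypothetical displacement `d2` never takes part in the comparison, so a displacement that happens to equal a register number does not cause a false match.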

Effects of the Invention

As described in detail above, according to the parallel arithmetic device of the present invention, parallel processing of the execution stage is performed efficiently, so disturbances in the pipeline flow are greatly reduced and a higher operating speed is achieved.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing the configuration of a general arithmetic device, FIG. 2 is a diagram for explaining the flow of a general pipeline, FIG. 3 is a block diagram showing an embodiment of the present invention, FIG. 4 is a timing chart for explaining the operation of that embodiment, FIG. 5 is a diagram for explaining the pipeline flow in that embodiment, FIG. 6 is a block diagram showing another embodiment of the present invention, and FIG. 7 is a timing chart for explaining the operation of that other embodiment.

11: main memory (MEM); 12: instruction buffer (IB); 13, 31, 32: instruction registers (IR); 14, 201 to 20m, 401 to 40m: arithmetic units; 161 to 16n: arithmetic elements; 22: element-in-use information line; 33: comparison circuit (CMP); OPE1 to OPEn, PG, LG, IPG: signal lines.

Applicant's agent: Patent Attorney Takehiko Suzue

Claims

[Claims]

(1) A parallel arithmetic device comprising: an instruction buffer in which instructions prefetched from a main memory are sequentially stored; an instruction register in which an instruction taken out of the instruction buffer is placed; a plurality of arithmetic units, each having its own arithmetic processing function, for executing the instruction placed in the instruction register; a plurality of arithmetic elements shared by the arithmetic units; and an element usage information line commonly connected to the plurality of arithmetic units, wherein each arithmetic unit monitors the instruction placed in the instruction register, takes in the instruction placed in the instruction register according to whether the instruction is one to be processed by that arithmetic unit and to the state of the element usage information line, performs the operation using the corresponding arithmetic element, outputs a signal indicating that the arithmetic element is in use to the element usage information line, and causes the instruction to be executed next to be placed in the instruction register from the instruction buffer.

(2) A parallel arithmetic device comprising: an instruction buffer in which instructions prefetched from a main memory are sequentially stored; a first instruction register in which an instruction taken out of the instruction buffer is placed; a second instruction register in which the instruction placed in the first instruction register is placed; a comparison circuit that compares the information in the register designation parts contained in the instructions placed in the first and second instruction registers; a plurality of arithmetic units, each having its own arithmetic processing function, for executing the instruction placed in the first instruction register; a plurality of arithmetic elements shared by the arithmetic units; and an element usage information line commonly connected to the arithmetic units, wherein each arithmetic unit monitors the instruction placed in the first instruction register, takes in the instruction placed in the first instruction register according to at least whether the instruction is one to be processed by that arithmetic unit, the state of the element usage information line, and the comparison result of the comparison circuit, performs the operation using the corresponding arithmetic element, outputs a signal indicating that the arithmetic element is in use to the element usage information line, places the instruction placed in the first instruction register into the second instruction register as necessary, and causes the instruction to be executed next to be placed in the first instruction register from the instruction buffer.

(3) The parallel arithmetic device according to claim 2, wherein each arithmetic unit is made to wait to take in the instruction placed in the first instruction register at least during the match detection output period of the comparison circuit.

(4) The parallel arithmetic device according to claim 3, wherein, when the execution stage of the instruction placed in the first instruction register requires two or more cycles, the arithmetic unit executing that instruction places it in the second instruction register.

(5) The parallel arithmetic device according to claim 4, wherein the arithmetic unit executing the instruction outputs a load clock signal for placing the instruction placed in the first instruction register into the second instruction register, and stops outputting the load clock signal one cycle before the end of the execution stage of that instruction.

(6) The parallel arithmetic device according to claim 5, further comprising a load clock signal line which is commonly connected to the arithmetic units and the second instruction register and over which the load clock signal is transferred.

(7) The parallel arithmetic device according to claim 6, wherein, when the instruction placed in the first instruction register is one whose execution stage does not complete in one cycle, each arithmetic unit is made to wait to take in the instruction placed in the first instruction register at least while the load clock signal is being transferred from another arithmetic unit over the load clock signal line.