JPH05298093A

JPH05298093A - Processor

Info

Publication number: JPH05298093A
Application number: JP4097659A
Authority: JP
Inventors: Yutaka Harada; 豊原田; Kazumasa Takagi; 一正高木; Koji Nakahara; 宏治中原
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-04-17
Filing date: 1992-04-17
Publication date: 1993-11-12

Abstract

PURPOSE: To improve the practical utilization efficiency of hardware and realize a high-performance computer by providing a processor that processes plural instructions simultaneously, with the computer functions subdivided so as to raise the pipeline pitch. CONSTITUTION: The processor consists of an adder 100, an MDR (memory data register) 120, an ACC (accumulator) 110, a PC (program counter) 140, an IR (instruction register) 150, a 1-adder (incrementer) 141, and channels 161, 162, 171, 172, 300 to 302, and 304 to 307. The address register of an instruction memory 170 is the PC 140 in stage 0, and its data register is the IR 150 in stage 2. The address register of a data memory 160 is the IR 150 in stage 2, and its data register is the MDR 120 in stages 3 and 4. Correlations between the simultaneously processed instruction streams are eliminated, which prevents the loss of processing efficiency caused by branch instructions.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a method of constructing a computer, and more particularly to a method of constructing a multiprocessor in which a large number of processing units operate in parallel.

[0002]

[Prior Art] Computer performance can be improved either by improving device performance, in particular circuit speed, or by improving the computer's structure. In improving the structure, it is extremely effective to divide the computer's functions and have a sequence of instructions use those functions in a time-shared manner, that is, to introduce the so-called pipeline method. The pipeline method is described, for example, in "The Architecture of Pipelined Computers", Hemisphere Publishing Corporation, 1981. In a conventional pipeline, the execution of each instruction is divided into, for example, five consecutive operations: instruction fetch, instruction decode, operand fetch, execution, and write. Separate instructions perform this series of operations while successively overlapping one another.

[0003]

[Problems to Be Solved by the Invention] The prior art described above makes effective use of the available hardware by time sharing, but it cannot be said to use that hardware with high efficiency. In the ultimate pipeline, time-shared operation would be performed at the level of the individual gates that make up the computer, so that as many instructions as the computer has logic stages could be in execution at once. In any case, to raise computer performance it is effective to subdivide the computer's structure more finely and raise the pipeline pitch. The conventional pipeline method, however, divides the computer's functions into about five parts at most and achieves only that degree of overlapped operation. The reason is that in the so-called Single Instruction Stream, Single Data Stream (SISD) mode of processing, in which one instruction sequence is executed sequentially, a change in the sequential processing order caused by a branch instruction disturbs the order of the pipeline, invalidates processing that has already been performed, and lowers overall processing efficiency.
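The way branch losses grow with pipeline depth can be illustrated with a deliberately simple model. The sketch below is not taken from the patent: it assumes that every branch discards the `stages - 1` instructions fetched behind it, and the function name `pipeline_efficiency` and the 20% branch fraction are hypothetical choices made only for illustration.

```python
def pipeline_efficiency(stages: int, branch_fraction: float) -> float:
    """Fraction of issue slots doing useful work in a simplified SISD pipeline.

    Assumption (not from the source): each branch invalidates the
    (stages - 1) instructions that entered the pipeline after it.
    """
    wasted_slots_per_instruction = branch_fraction * (stages - 1)
    return 1.0 / (1.0 + wasted_slots_per_instruction)


# Deeper pipelines lose more work per branch under this model.
for depth in (5, 10, 20):
    print(depth, round(pipeline_efficiency(depth, 0.2), 3))
```

Under these assumptions the useful fraction falls as the stage count rises, which is consistent with the text's explanation of why single-stream pipelines stop at about five stages.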

[0004] An object of the present invention is to provide a method of constructing a computer in which the computer's functions are subdivided more finely and the pipeline pitch is raised, so as to increase the utilization efficiency of the computer hardware and realize a high-performance computer.

[0005]

[Means for Solving the Problems] The above object is achieved by allocating a register set to each of the subdivided computer functions and moving the data in those register sets forward in step as the execution of an instruction proceeds.

[0006]

[Operation] With the method of the present invention, each pipeline stage can execute an instruction from a different program (instruction sequence), which has the same effect as operating a plurality of processors on a single piece of hardware. In this arrangement, the instruction sequences being processed simultaneously by the single piece of hardware can be made independent of one another, so the loss of processing efficiency due to branch instructions, which was a problem of the conventional pipeline method, does not occur. Therefore, according to the present invention, a multiprocessor that executes a plurality of programs with high efficiency can be built from a single piece of hardware.

[0007]

[Embodiment] The present invention is described below by way of an embodiment. A simple computer is taken as the embodiment. The computer reads instructions from an instruction memory (IM), reads operands from a data memory (DM), and performs operations. It has an adder and the following four registers.

ACC: accumulator (operation register; holds operation data)
MDR: memory data register (holds operands read from the data memory and data to be written to the data memory)
PC: program counter (holds the program execution address)
IR: instruction register (holds the instruction code)

The instruction format of this computer is shown in FIG. 2. It consists of an instruction code (OP Code) and an instruction address (Address). An instruction is read from the instruction memory, set in the IR, decoded, and then executed. The computer has the following seven instructions.

JMP: unconditional jump to the instruction address
CAL: subroutine jump to the instruction address
RTN: return from a subroutine jump
JAN: jump to the instruction address if the accumulator is negative (conditional jump)
LAC: read the data at the instruction address into the accumulator
SAC: write the accumulator data to the instruction address
ADD: add the data at the instruction address to the accumulator

The computer of this embodiment has a simple structure and performs only the operations corresponding to instruction fetch and execution in conventional pipeline technology. Looked at in more detail, however, instruction fetch and instruction execution are each subdivided into finer operations. FIG. 3 shows the subdivided operations for executing the seven instructions. In FIG. 3, instruction fetch is divided into three operation stages and instruction execution into three operation stages, so each instruction completes after passing through six operation stages.

In instruction fetch, stage 0 transfers the PC data to the instruction memory as an address and puts the instruction memory into a read cycle. Stage 1 increments the PC by 1 so that it holds the execution address of the next instruction. In stage 2, the instruction code to be executed is returned to the IR from the instruction memory that was started in stage 0. In this stage, at the same time as the operation part (OP Code) of the instruction code is set in the IR, the instruction address, or the specific location (for example, address 0) that holds the return address of a CAL instruction, is sent to the data memory as an address.

Instruction execution differs according to the instruction code set in the IR. For example, a JMP instruction merely transfers the jump destination address set in the IR to the PC in stage 3. A CAL instruction, in order to save the return address, transfers the PC data to the MDR and puts the data memory into a write cycle to save that return address, and at the same time transfers the address from the IR to the PC in order to set the subroutine address in the PC. In a LAC instruction, data is returned in stage 4 from the data memory, whose read was started in stage 2, and is set in the MDR; in stage 5 the data set in the MDR is transferred to the ACC. In an ADD instruction, the data set in the MDR in stage 4 and the data in the ACC are added in stage 5, and the result is set in the ACC.
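The two-field instruction format and the seven instructions can be sketched as a small encoder and decoder. The text does not give field widths or opcode numbers (FIG. 2 is not reproduced here), so the 12-bit address field, the numeric opcode values, and the names `Op`, `encode`, and `decode` below are all assumptions made for illustration.

```python
from enum import IntEnum
from typing import Tuple

ADDRESS_BITS = 12  # assumed width; the source does not state field sizes


class Op(IntEnum):
    """The seven instructions; the numeric values are hypothetical."""
    JMP = 0  # unconditional jump
    CAL = 1  # subroutine jump
    RTN = 2  # return from subroutine
    JAN = 3  # jump if accumulator is negative
    LAC = 4  # load accumulator from data memory
    SAC = 5  # store accumulator to data memory
    ADD = 6  # add data memory word to accumulator


def encode(op: Op, address: int = 0) -> int:
    """Pack an opcode and an address into one instruction word."""
    if not 0 <= address < (1 << ADDRESS_BITS):
        raise ValueError("address out of range")
    return (int(op) << ADDRESS_BITS) | address


def decode(word: int) -> Tuple[Op, int]:
    """Split an instruction word into (opcode, address), as the IR would hold it."""
    return Op(word >> ADDRESS_BITS), word & ((1 << ADDRESS_BITS) - 1)


print(decode(encode(Op.ADD, 0x123)))
```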

[0008] FIG. 3 shows that the operation each instruction performs is fixed for each stage. FIG. 1 shows an embodiment of the hardware configuration of a computer according to the present invention that executes the subdivided operations of FIG. 3. In FIG. 1, the computer consists of an adder 100 and groups of registers provided for each operation stage (stage 0 to stage 5). Registers of the same kind form a shift register so that data is transferred in stage order. Depending on its use, a chain of shift registers omits the stages in which it is not needed; a chain that must hold data across the program (instruction sequence) is connected in a ring so that the data circulates within the shift register.

The operation of the embodiment of FIG. 1 is described in detail below. The processing device of this embodiment consists of the adder 100, ACC 110, MDR 120, PC 140, IR 150, data memory 160, and instruction memory 170. The ACC 110 has a register for holding data in each stage from stage 0 to stage 5. The registers of the stages are arranged as a shift register in the stage direction so that each bit can be transferred to the next stage. To hold data across the program (instruction sequence), the data of stage 5 is transferred to stage 0. Similarly, the MDR 120 consists of a shift register spanning stage 3 to stage 5. In stage 4 the MDR 120 receives data from the data memory 160, and in stage 3 it transfers data to the data memory 160. The PC 140 consists of a shift register spanning stage 0 to stage 5, and the data of stage 5 is transferred to stage 0 in order to hold the next instruction address. In stage 0 the data of the PC 140 is transferred to the instruction memory 170 as the address from which to read an instruction, and the instruction memory is set to a read cycle. Stage 1 is provided with a 1-adder 141 that increments the held data by 1 to set the next instruction address. The IR 150 consists of a shift register spanning stage 2 to stage 5. In stage 2 the IR 150 receives an instruction from the instruction memory 170.
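The ring-connected shift register used for the ACC and the PC can be modelled in a few lines. This is a minimal sketch, not the patent's circuit: the class name `RingRegister` is hypothetical, and a Python `deque` stands in for the per-stage hardware registers.

```python
from collections import deque


class RingRegister:
    """One register kind (for example ACC or PC) with a copy in every stage."""

    def __init__(self, stages: int = 6):
        self.slots = deque([0] * stages)

    def shift(self) -> None:
        # The value in stage k moves to stage k + 1; the last stage wraps to stage 0.
        self.slots.rotate(1)

    def __getitem__(self, stage: int) -> int:
        return self.slots[stage]

    def __setitem__(self, stage: int, value: int) -> None:
        self.slots[stage] = value


acc = RingRegister()
acc[0] = 7          # value held by the instruction stream currently in stage 0
acc.shift()
print(acc[1])       # the same value has followed its stream into stage 1
```

After six shifts the value is back in stage 0, which is how a stream's accumulator or program counter survives from one instruction to the next. A register such as the MDR, which exists only in stages 3 to 5, would be a shorter chain without the wrap-around.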

[0009] Next, the operation shown in FIG. 3 is described with reference to FIG. 1. In instruction fetch, stage 0 uses the PC 140 as the address register, sends its data as an address to the instruction memory 170 via channel 171, and puts the memory into a read cycle. In stage 1, the data of the PC 140 is incremented by 1 using the 1-adder 141. In stage 2, an instruction is read from the instruction memory 170 and transferred to the IR 150 via channel 172. Also in stage 2, the address part of the instruction code set in the IR 150, or the specific address that holds the return destination of a CAL instruction, is transferred to the data memory 160 via channel 161. It is thus clear that instruction fetch is carried out by transferring data successively from stage 0 to stage 2. The operation part transferred to the IR 150 in stage 2 controls the execution of the instruction in the following stages 3 to 5.

[0010] In instruction execution, for a JMP instruction the jump destination address is transferred from the IR 150 to the PC 140 via channel 300 in stage 3. For a CAL instruction, in stage 3 the return address is transferred from the PC 140 to the MDR 120 via channel 301 and the data memory 160 is put into a write cycle; the jump destination address is also transferred from the IR 150 to the PC 140 via channel 300. For an RTN instruction, the return address is transferred to the MDR 120 in stage 4, and in stage 5 it is transferred from the MDR 120 to the PC 140 via channel 304. A JAN instruction tests the sign of the ACC 110 and, if it is negative, performs the same operation as a JMP instruction. For a LAC instruction, data is transferred from the data memory 160 to the MDR 120 in stage 4, and in stage 5 that data is transferred from the MDR 120 to the ACC 110 via the adder 100 and channels 306 and 307. For a SAC instruction, in stage 3 the write data is transferred from the ACC 110 to the MDR 120 via channel 302, and at the same time the data memory 160 is put into a write cycle. For an ADD instruction, data is transferred from the data memory 160 to the MDR 120 in stage 4; in stage 5 the data of the ACC 110 and the MDR 120 is supplied to the adder 100 via channels 305 and 306, and the output of the adder 100 is transferred to the ACC 110 via channel 307. It is clear from the above that an instruction is executed by transferring data successively to the next stage from stage 3 to stage 5. After instruction execution completes in stage 5, operation returns to stage 0 and instruction fetch begins again. The processing device of FIG. 1 can therefore execute a program (instruction sequence) by cycling through stage 0 to stage 5.
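The stage-by-stage behaviour of one instruction stream can be followed in a short interpreter. This is a sketch under stated assumptions rather than the patent's implementation: instructions are plain `(opcode, address)` tuples, execution stops when the PC leaves the program (the text defines no halt instruction), and `RETURN_SLOT` and `run` are hypothetical names. The return address is kept at data address 0, following the text's own example.

```python
RETURN_SLOT = 0  # data-memory address holding the CAL return address ("for example, address 0")


def run(program, data, max_instructions=1000):
    """Run one instruction stream, walking stages 0 to 5 for each instruction.

    `program` is a list of (opcode, address) tuples; `data` is a mutable list
    used as the data memory. Returns the final accumulator value.
    """
    pc, acc = 0, 0
    for _ in range(max_instructions):
        if not 0 <= pc < len(program):
            break                             # assumed stop condition, not in the source
        fetch_address = pc                    # stage 0: PC -> instruction memory, start read
        pc += 1                               # stage 1: 1-adder increments PC
        op, address = program[fetch_address]  # stage 2: instruction memory -> IR
        dm_address = RETURN_SLOT if op in ("CAL", "RTN") else address  # stage 2: address -> data memory
        mdr = None
        # stage 3
        if op == "JMP" or (op == "JAN" and acc < 0):
            pc = address                      # IR -> PC
        elif op == "CAL":
            mdr = pc                          # PC -> MDR (return address)
            data[dm_address] = mdr            # data memory write cycle
            pc = address                      # IR -> PC
        elif op == "SAC":
            mdr = acc                         # ACC -> MDR
            data[dm_address] = mdr            # data memory write cycle
        # stage 4
        if op in ("RTN", "LAC", "ADD"):
            mdr = data[dm_address]            # data memory -> MDR
        # stage 5
        if op == "RTN":
            pc = mdr                          # MDR -> PC
        elif op == "LAC":
            acc = mdr                         # MDR -> ACC
        elif op == "ADD":
            acc = acc + mdr                   # adder output -> ACC
    return acc


# Load data[1], call a subroutine that adds data[2], store the sum in data[3].
program = [("LAC", 1), ("CAL", 4), ("SAC", 3), ("JMP", 6), ("ADD", 2), ("RTN", 0)]
data = [0, 5, 7, 0]
print(run(program, data), data)
```

Because a single slot holds the return address, nested subroutine calls would overwrite it in this sketch; the text does not say how nesting is handled.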

[0011] With the embodiment of FIG. 1, a single piece of hardware can process a plurality of instructions simultaneously. The description above showed that an instruction passes through the stages one after another. This means that at any moment one instruction occupies only one of the stages. The embodiment of FIG. 1 therefore allows a different instruction to be assigned to each stage: six instructions can be assigned to the six stages, and a single piece of hardware can process six instructions at once. Furthermore, because registers (ACC, MDR, PC, IR) are provided in each stage, the stages can be made independent of one another. A plurality of mutually independent instructions can thus be executed simultaneously, so the loss of processing efficiency due to branch instructions that was a problem in the conventional pipeline method does not occur. Since no executed instruction is invalidated, the pipeline is not disturbed. Moreover, because a plurality of different programs (instruction sequences) can be executed on a single piece of hardware, the present invention realizes the same function as a multiprocessor.
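The assignment of six independent instruction streams to the six stages can be pictured as a rotating schedule. The sketch below assumes a steady state in which stream `t` entered stage 0 at clock `t`; the stream numbering and the name `stage_occupancy` are hypothetical and serve only to illustrate the idea.

```python
STAGES = 6


def stage_occupancy(clock: int, stages: int = STAGES) -> list:
    """Return a list whose entry s is the id of the stream occupying stage s."""
    return [(clock - s) % stages for s in range(stages)]


# Each row shows which stream sits in stages 0..5 at that clock.
for clock in range(3):
    print(clock, stage_occupancy(clock))
```

At every clock each stage holds a different stream, so no stage ever works on two instructions from the same program at once. A given stream finishes one instruction every six clocks, while the hardware as a whole finishes one instruction per clock.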

[0012] The above description used the case of six stages as an example, but the operations can clearly be broken into more stages so that more instructions are executed simultaneously. The stages can clearly be subdivided down to the level of a single gate. In the embodiment of FIG. 1, the memory is divided into an instruction memory and a data memory. As a result, the general-purpose registers for memory found in conventional computers, in particular the address register and the data register, can be omitted, and data can be transferred efficiently. That is, in FIG. 1 the address register of the instruction memory is the PC of stage 0 and the data register of the instruction memory is the IR of stage 2. The address register of the data memory is the IR of stage 2, and its data register is the MDR of stage 3 or stage 4. This use of the registers matches the roles that registers in a computer are meant to have, and nothing is wasted. Such a waste-free configuration, in which each register performs its proper role, is made possible by the embodiment of FIG. 1, in which a role can be assigned to each stage. In addition, the instruction memory is accessed every time, whereas the data memory may or may not be accessed depending on the instruction being executed. In particular, as the number of data registers in the computer increases, the data memory is referenced less often. The instruction memory can have a relatively small capacity, whereas the data memory is required to have a large capacity. It is therefore appropriate for the instruction memory to be fast with a small capacity and the data memory to be slow with a large capacity. The embodiment of FIG. 1 can accommodate this memory hierarchy.
The address register of the instruction memory is the PC of stage 0, and the data register of the instruction memory is the IR of stage 2. The address register of the data memory is the IR of stage 2, and the data register is the MDR of stage 3 or stage 4. The usage of these registers is essentially compatible with the meaning of the registers placed in the computer, and there is no waste. A configuration in which the original functions of these registers are executed without any waste can be assigned to each stage, but this is possible in the embodiment of FIG. Further, although the instruction memory is accessed every time, the data memory may or may not be accessed depending on the content of the execution instruction. In particular, as the number of data registers in the computer increases, the frequency of referring to the data memory decreases. Further, although the capacity of the instruction memory is relatively small, the data memory is required to have a large capacity. From the above, it is appropriate that the instruction memory has a high speed and a small capacity, and the data memory has a low speed and a large capacity. The embodiment of FIG. 1 can handle this hierarchical structure of memory.

[0013]

[Effects of the Invention] The present invention raises the utilization efficiency of hardware and provides a multiprocessor in which a single piece of hardware can execute a plurality of instruction sequences simultaneously. The present invention therefore provides an extremely effective means of realizing a high-performance processing device.

[Brief description of drawings]

FIG. 1 is a structural diagram of a processing device according to the present invention.

FIG. 2 shows the instruction format of the processing device.

FIG. 3 shows the operation sequence of the processing device.

[Explanation of symbols]

100: adder; 110: ACC; 120: MDR; 140: PC; 141: 1-adder; 150: IR; 160: data memory; 170: instruction memory; 161, 162, 171, 172, 200, 201, 202, 300, 301, 302, 304, 305, 306, 307: channels

Claims

1. A processing device characterized in that processing is divided into a plurality of stages, a register is provided for each stage, data in registers of the same kind is transferred sequentially in stage order, one instruction occupies one of the plurality of stages, and the instruction is executed by advancing the stage it occupies.

2. A processing device according to claim 1, wherein a plurality of instructions are assigned to the plurality of stages.

3. A processing device according to claim 1, wherein at least one of the registers forms a closed loop across the stages, and data stored in the loop is held across a plurality of instructions as necessary.

4. A processing device according to claim 1, having an instruction memory and a data memory, wherein the instruction read stage refers to the instruction memory and the instruction execution stage refers to the data memory.

5. A processing device according to claim 1, wherein the address register of the instruction memory is the program counter of the instruction read stage, and the data register of the instruction memory is the instruction register of that stage.