JP2762441B2 - Coprocessor

Info

Publication number: JP2762441B2
Application number: JP62277198A
Authority: JP
Inventors: 博昭金子
Original assignee: Nippon Electric Co Ltd
Current assignee: NEC Corp
Priority date: 1987-10-30
Filing date: 1987-10-30
Publication date: 1998-06-04
Anticipated expiration: 2013-06-04
Also published as: JPH01118954A

Description

[Detailed Description of the Invention]

[Industrial Field of Application]
The present invention relates to a coprocessor tightly coupled to a CPU, and in particular to a coprocessor in which the operand data transfer scheme is improved to increase the efficiency of execution processing.

[Prior Art]
To improve the instruction processing capability of a CPU in an information processing apparatus, a configuration is sometimes adopted in which a processing device separate from the CPU is tightly coupled to it and takes over part of the processing that the CPU would otherwise perform. This separate processing device is called a coprocessor. It speeds up instructions that the CPU already has, or executes instructions that the CPU cannot process itself (for example, floating-point operations).

Because a coprocessor is used strictly as an extension of the CPU, its operation is not controlled by software. That is, the transfer of data for operations and the transfer of commands for control are not controlled by software as they are for an input/output (I/O) device.
Instead, when the CPU detects an instruction that requires the coprocessor, data is transferred only between the CPU and the coprocessor.

FIG. 5 is a block diagram showing an information processing apparatus that uses a conventional coprocessor of this type for floating-point operations. The CPU 1, the memory 2, and the coprocessor 3 are connected via a shared data bus 4, and data is transferred between the CPU 1 and the memory 2 and between the CPU 1 and the coprocessor 3. The bus control circuit 5 interprets the information STATUS concerning data transfer that is input from the CPU 1, and generates a read/write signal MSTB for the memory 2 and a read/write signal CPSTB for the coprocessor 3.

When the CPU 1 detects that it is to execute a floating-point operation instruction, it reads floating-point data from the memory 2 into the CPU 1 as operand data via the shared data bus 4. Next, the CPU 1 transfers a command to the coprocessor 3 via the shared data bus 4, specifying the operation and the start of the operation. The operand data read into the CPU 1 is then transferred to the coprocessor 3, and the coprocessor 3 begins operating on it. When the coprocessor 3 completes the operation, the result is transferred from the coprocessor 3 to the CPU 1, and the CPU 1 then writes it to the memory 2.

When the CPU 1 outputs the STATUS signal, the bus control circuit 5 detects it and recognizes that operand data destined for the coprocessor 3 is being read from the memory 2 into the CPU 1. The bus control circuit 5 can therefore generate the read/write signal MSTB for the memory 2 and the write/read signal CPSTB for the coprocessor 3 simultaneously, so that the operand data is transferred directly from the memory 2 to the coprocessor 3 without passing through the CPU 1.

FIG. 6 is a block diagram showing the configuration of the conventional coprocessor 3 in detail. The shared data bus 4 is connected to the instruction register 6 and the operand buffer 8. A command transferred from the CPU 1 to the instruction decode unit 7 via the shared data bus 4 and the instruction register 6 is analyzed in the instruction decode unit 7, which notifies the instruction execution unit 9, the unit that performs the arithmetic processing, of the processing to be executed. The execution unit 9 performs the specified operation using the operand data stored in the operand buffer 8.

The read/write control circuit 10 issues strobe signals to the instruction register 6 and the operand buffer 8 based on the externally specified bus cycle information RD, WR, and CD, and also performs wait control for the operand buffer 8.

Here, the RD signal indicates read timing for the coprocessor 3, and the WR signal indicates write timing. The CD signal indicates the type of bus cycle: CD = 0 means transfer of a command to the instruction register 6, and CD = 1 means transfer of operand data to the operand buffer 8. The DRQ signal indicates a data transfer request from the instruction execution unit 9 to the operand buffer 8, and the BSY signal indicates that valid operand data has not yet been transferred to the operand buffer 8.

Next, the operation timing of the conventional coprocessor is described with reference to the timing chart of FIG. 7. When a command is transferred to the instruction register 6 at timing Tc, the instruction decode unit 7 interprets the instruction and starts the instruction execution unit 9 at timing Te.

When processing in the instruction execution unit 9 has advanced to timing Tr1, it requires operand data, so the unit issues the DRQ signal to the operand buffer 8. Because the BSY signal of the read/write control circuit 10 indicates that no valid operand data is stored in the operand buffer 8, processing is suspended until timing To1.

When operand data is transferred to the operand buffer 8 at timing To1, the BSY signal becomes inactive and the processing suspended in the instruction execution unit 9 continues. When the resumed processing reaches timing Tr2, the next operand data is needed, and processing is again suspended until operand data is transferred to the operand buffer 8, that is, until timing To2. In FIG. 7, periods in which the instruction execution unit 9 is actually processing are hatched, and periods in which processing is suspended are left blank.

[Problems to be Solved by the Invention]
In the conventional coprocessor, however, as shown in FIG. 7, execution in the instruction execution unit 9 is suspended for the time (t1 + t2) spent waiting for operand data to be transferred to the operand buffer 8, so extra time is needed before the operation result is obtained.

The present invention has been made in view of this problem, and its object is to provide a coprocessor that eliminates the time during which execution is suspended and thereby shortens the processing time.

[Means for Solving the Problems]
The coprocessor of the present invention is a coprocessor to which a command specifying a type of operation and operand data subject to that operation are transferred from outside. It comprises: at least one set of registers (a register group) having the total word length needed to store, prior to transfer of the command, all operand data to be processed by one command; control means that, each time operand data is transferred, sequentially selects a register of the register group and stores the operand data in it; a command register that receives a command transferred from outside; and an instruction execution unit that receives operand data from the register group and processes an instruction based on the command stored in the command register. All operand data is stored in the register group before the command is transferred from outside to the command register, and the instruction execution unit starts processing the instruction after the command is transferred.

[Operation]
Each time operand data is transferred, the control means sequentially selects a register of the register group and stores the operand data in it. The register group has the total word length needed to store all operand data to be processed by one command, and the control means stores all of that operand data.

When a command is input, the instruction execution unit starts processing that includes operating on the operand data stored in the register group. Because all operand data is already stored in the register group by the time the instruction execution unit processes the instruction, processing is never suspended because operand data has not yet been transferred.

[Embodiments]
Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 is a block diagram showing a coprocessor according to a first embodiment of the present invention. Three 32-bit operand registers 11, 12, and 13, which together hold 80-bit floating-point data, and the instruction register 6 are connected to the 32-bit shared data bus 4. The instruction register 6 temporarily holds a command transferred via the shared data bus 4 and outputs it to the instruction decode unit 7. The instruction decode unit 7 analyzes the command and notifies the instruction execution unit 14 of the processing to be performed. The instruction execution unit 14 performs the specified operation in one pass using the 80-bit operand data stored in the operand registers 11, 12, and 13.

The operand register control circuit 15 issues strobe signals to the instruction register 6 and the operand registers 11, 12, and 13 based on the externally specified bus cycle information RD, WR, and CD, and also performs wait control for the operand registers 11, 12, and 13.

Here, the RD signal indicates read timing for the coprocessor 3, and the WR signal indicates write timing. The CD signal indicates the type of bus cycle: CD = 0 means transfer of a command to the instruction register 6, and CD = 1 means transfer of operand data to the operand registers 11, 12, and 13.

The 2-bit address (A) signal identifies the individual operand registers 11, 12, and 13 and is input in synchronization with the bus cycle that transfers operand data. An address (A) signal of (00b) indicates a transfer to the operand register 11, (01b) a transfer to the operand register 12, and (10b) a transfer to the operand register 13.

Next, the operation of the coprocessor configured in this way is described with reference to the timing chart of FIG. 2. When the first 32-bit operand data, with the address (A) signal at (00b), is transferred from the shared data bus 4 at timing To1, it is stored in the operand register 11. When the second 32-bit operand data, with the address (A) signal at (01b), is transferred at timing To2, it is stored in the operand register 12. When the third 32-bit operand data, with the address (A) signal at (10b), is transferred at timing To3, it is stored in the operand register 13. Because the floating-point data to be processed is 80 bits long, the upper 16 bits of the 32-bit operand data transferred in the third bus cycle are invalid.

Then, when a command is transferred to the instruction register 6 at timing Tc, the instruction decode unit 7 interprets the command and instructs the instruction execution unit 14 to start processing at timing Te.
When the instruction execution unit 14 starts processing and needs operand data at, for example, timings Tr1, Tr2, and Tr3, all of the 80-bit floating-point data is already stored in the operand registers 11, 12, and 13. The operand registers can therefore supply operand data to the instruction execution unit 14 whether it requests any 32-bit portion of the 80 bits or all 80 bits at once. As a result, processing is never suspended for untransferred operand data, and the instruction execution unit 14 can continue the operation without interruption.

In this embodiment, the instruction execution unit 14 performs no processing during the period from the timing origin T0 to timing Te, at which it starts. This period can, however, be overlapped with the processing of the preceding command. Doing so eliminates the idle time from T0 to Te and makes effective use of the instruction execution unit 14.

Although operand data is transferred in this embodiment in the order of the operand registers 11, 12, and 13, it can of course be transferred in any order.

FIG. 3 is a block diagram showing a coprocessor according to a second embodiment of the present invention. Components in FIG. 3 that are identical to those in FIG. 1 carry the same reference numerals, and their description is omitted. The strobe signal that the operand register control circuit 16 issues for the operand registers 11, 12, and 13 is input to them via the incrementer 17 and the decoder 18. The RD, WR, and CD signals are input to the operand register control circuit 16.
The incrementer 17 is initialized to (00b) when the WR signal input to the operand register control circuit 16 becomes 1 and the CD signal becomes 0 (command transfer to the instruction register 6).
The incrementer 17 then detects each transfer of operand data to the operand registers 11, 12, and 13 from WR = 1 and CD = 1, and its content is incremented by 1 each time.

When operand data is transferred to the operand registers 11, 12, and 13, the decoder 18 generates a write strobe signal for one of the operand registers according to the content of the incrementer 17. At the first operand data transfer the incrementer 17 holds (00b), so the decoder 18 generates a write strobe signal for the operand register 11. As soon as this write strobe signal is generated, the incrementer 17 is incremented by 1 to (01b).

The write strobe signal for the operand register 12 is issued when the incrementer 17 holds (01b), and the write strobe signal for the operand register 13 is issued when the incrementer 17 holds (10b).

Thus, in the second embodiment, when the transfer order of the operand data is fixed in advance, operand data can be transferred to the operand registers 11, 12, and 13 without address information being specified from outside.
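The incrementer and decoder of the second embodiment can be sketched in the same spirit. This is a minimal model under assumptions, not the actual circuit: the names are hypothetical, the incrementer is assumed to hold (00b) before the first command ever arrives, and the reaction to a fourth operand word before the next command (here, an exception) is an assumption, since the description does not cover that case.

```python
MASK32 = 0xFFFFFFFF


class SequencedCoprocessor:
    """Toy model of FIG. 3 (hypothetical names, not from the patent)."""

    def __init__(self):
        self.operand_regs = [0, 0, 0]  # operand registers 11, 12, 13
        self.command = None            # instruction register 6
        self.incrementer = 0b00        # incrementer 17, assumed to start at 00b

    def decode(self):
        """Decoder 18: one write strobe per register, driven by the incrementer."""
        return [self.incrementer == i for i in range(3)]

    def write(self, cd, data):
        """One write bus cycle (WR = 1); no address signal is needed."""
        if cd == 0:
            # Command transfer: latch the command and reset the incrementer.
            self.command = data & MASK32
            self.incrementer = 0b00
            return
        strobes = self.decode()
        if not any(strobes):
            # Assumption: a fourth operand word before a command is an error.
            raise RuntimeError("no operand register selected")
        for index, strobe in enumerate(strobes):
            if strobe:
                self.operand_regs[index] = data & MASK32
        self.incrementer += 1  # advance to the next register
```

Three consecutive operand writes fill registers 11, 12, and 13 in that fixed order, and the command write that follows returns the incrementer to (00b) so the next operand set starts at register 11 again.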
As in the first embodiment, the execution unit 14 in the second embodiment can also process instructions continuously without being interrupted by untransferred data.

FIG. 4(a) shows the bus cycles when the CPU executes an instruction that requires the coprocessor in the first embodiment, and FIG. 4(b) shows the bus cycles in the second embodiment. In the second embodiment, the CPU 1 does not need to supply address information to select the operand registers 11, 12, and 13, so the bus cycle that the CPU 1 (see FIG. 5) issues to the coprocessor for operand data transfer can be omitted. That is, in the first embodiment, as shown in FIG. 4(a), the operand data needed by the coprocessor is first read into the CPU by the bus cycle Bm for reading operand data from the memory 2, and is then transferred to the coprocessor by the operand data transfer bus cycle Bo and the command transfer bus cycle Bc. In the second embodiment, by contrast, as shown in FIG. 4(b), operand data is transferred to the coprocessor in synchronization with the bus cycle Bm, so the bus cycle Bo is unnecessary and the utilization (throughput) of the instruction execution unit 14 can be improved further.

[Effects of the Invention]
As described above, according to the present invention, all operand data is stored in the register group by the control means. Because all operand data is stored in the registers before execution begins, the instruction execution unit can process instructions without waiting for operand data transfers, which shortens the processing time and improves the efficiency of execution.
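The time saved can be illustrated with a small timing sketch. It is not taken from the patent: the function name and all numbers are hypothetical, and time is simply counted from the start of execution (Te). Each operand is needed after a stretch of computation (timings Tr1, Tr2), and execution stalls if the operand has not yet arrived (timings To1, To2).

```python
def finish_time(start, segments, arrivals):
    """Return (finish time, total stall time) for one command.

    segments: computation time before each operand is needed, followed by
              one final segment after the last operand.
    arrivals: time at which each operand becomes available.
    """
    if len(segments) != len(arrivals) + 1:
        raise ValueError("need one more segment than operands")
    now, stalled = start, 0
    for work, arrival in zip(segments, arrivals):
        now += work                   # execution reaches Tr_k
        wait = max(0, arrival - now)  # stall t_k until To_k
        stalled += wait
        now += wait
    now += segments[-1]               # finish the remaining computation
    return now, stalled


# Hypothetical numbers: operands arrive late (conventional) or before Te.
conventional = finish_time(start=2, segments=[3, 4, 5], arrivals=[8, 15])
preloaded = finish_time(start=2, segments=[3, 4, 5], arrivals=[0, 1])
```

With these made-up values the conventional case stalls twice and the preloaded case not at all, so the difference in finish time equals the stall total t1 + t2. The sketch ignores when Te itself occurs; as the description notes, the operand loading period can be overlapped with the preceding command.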

[Brief Description of the Drawings]
FIG. 1 is a block diagram showing a coprocessor according to a first embodiment of the present invention. FIG. 2 is a timing chart explaining the operation of that coprocessor. FIG. 3 is a block diagram showing a coprocessor according to a second embodiment of the present invention. FIGS. 4(a) and 4(b) are timing charts showing the state of the data bus of an information processing apparatus to which the coprocessor is connected. FIG. 5 is a block diagram showing the configuration of an information processing apparatus having a coprocessor. FIG. 6 is a block diagram showing a conventional coprocessor. FIG. 7 is a timing chart explaining the operation of the conventional coprocessor.

1; CPU, 2; memory, 3; coprocessor, 4; shared data bus, 5; bus control circuit, 6; instruction register, 7; instruction decode unit, 8; operand buffer, 9, 14; instruction execution unit, 10; read/write control circuit, 11, 12, 13; operand register, 15, 16; operand register control circuit, 17; incrementer, 18; decoder

Claims

(57) [Claims]
A coprocessor to which a command specifying a type of operation and operand data subject to that operation are transferred from outside, the coprocessor comprising: at least one set of registers (a register group) having the total word length needed to store, prior to transfer of the command, all operand data to be processed by one command; control means for sequentially selecting each register of the register group and storing the operand data in it each time operand data is transferred; a command register that receives a command transferred from outside; and an instruction execution unit that receives operand data from the register group and processes an instruction based on the command stored in the command register, wherein all operand data is stored in the register group before the command is transferred from outside to the command register, and the instruction execution unit starts processing the instruction after the command is transferred.