JPS63316133A

JPS63316133A - Arithmetic processor

Info

Publication number: JPS63316133A
Application number: JP62151207A
Authority: JP
Inventors: Masatsugu Kametani; 亀谷　雅嗣
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1987-06-19
Filing date: 1987-06-19
Publication date: 1988-12-23
Anticipated expiration: 2010-09-27
Also published as: JPH0789320B2

Abstract

PURPOSE: To reduce additional overhead and speed up arithmetic processing by providing a macro instruction sequence processor in addition to the host processor and having that processor take over the direct issuing of instructions to the arithmetic unit. CONSTITUTION: In addition to the host processor 2, a macro instruction sequence processor 3 is provided that issues arithmetic execution instructions to the arithmetic unit 1 in response to macro instructions from the host processor 2. The interface 10 of processor 3 is controlled via the interface 7 of processor 2 so that processor 3 operates in synchronization with processor 2. Processor 3 issues arithmetic instructions to unit 1 through the multiplex unit 4 while, in parallel, processor 2 performs data transfers and similar operations. This reduces the additional overhead on the host processor and allows arithmetic processing to run at high speed.

Description

[Detailed Description of the Invention] [Industrial Application Field] The present invention relates to an arithmetic processing device that performs operations such as numerical calculations, and in particular to a method of implementing an arithmetic processing device suited to applications in which arithmetic processing can be sped up by running the instruction execution sequence and the data input/output sequence in parallel.

[Conventional technology]

Conventionally, when a host processor is connected to an arithmetic processing unit such as a floating-point unit (FPU) to form an arithmetic processing device that operates under the control of the host processor, the execution time of scalar processing such as random operations is determined by the sum of several overheads: transferring operand data from the host processor to the arithmetic unit, issuing the arithmetic instruction, executing the operation in the arithmetic unit, and transferring the result data from the arithmetic unit back to the host processor. Vector operations involve similar overheads: transferring vector data of sufficient vector length from the host processor to the vector registers of the vector arithmetic unit, issuing the vector arithmetic instruction from the host processor to the vector arithmetic unit, executing the vector operation on all of that vector data in the vector arithmetic unit, and transferring the result data from the vector registers to the main memory of the host processor. In both cases the overheads generally add up serially to give the operation time. This is because the arithmetic unit and the host processor are connected one to one, and while the arithmetic unit is executing an arithmetic instruction the host processor can neither access the register file in the arithmetic unit nor issue the next instruction. A conventional arithmetic processing device of this kind, consisting of a host processor and an arithmetic unit, is described in Nikkei Electronics, 1986.7.14 (No. 399), pp. 172, 173.
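The serial accumulation of overheads in the conventional arrangement can be written as a one-line cost model. The following is a minimal sketch, not taken from the patent: the function name `serial_time` and the time values are hypothetical and use arbitrary units.

```python
def serial_time(operations):
    """Total time when nothing overlaps.

    Each operation is a tuple of hypothetical costs:
    (operand transfer, instruction issue, execution, result transfer).
    """
    return sum(sum(costs) for costs in operations)


# Four scalar operations with assumed, illustrative costs.
operations = [(2, 1, 3, 2)] * 4
print(serial_time(operations))  # 32
```

Under this model, making the execution term smaller leaves the transfer and issue terms untouched, which is the situation the next section describes.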

[Problem that the invention attempts to solve]

The above prior art gives no consideration to random operations (scalar processing) or small-scale vector operations that require real-time processing, and it incurs a large overhead. In particular, now that the part that actually executes operations is rapidly becoming faster thanks to advances in VLSI technology, the additional overhead in such random and small-scale vector operations, such as the exchange of the necessary data and instructions between the host processor and the arithmetic unit, is becoming far larger than the execution of the operation by the arithmetic unit itself, and this hinders improvements in speed and cost performance.

An object of the present invention is to provide a means of configuring an arithmetic processing device that reduces the additional overhead associated with the arithmetic processing discussed above, shortens the overall operation execution time, and thereby increases speed.

[Means for solving problems]

The above object is achieved as follows. In addition to the host processor, a macro instruction sequence processor is provided as a second processor that shares the arithmetic unit. This processor is given at least a function equivalent to the function conventionally supported on the host processor of issuing the instructions of an arithmetic execution sequence to the arithmetic unit (that is, the program supported on the host processor for describing the execution sequence of the arithmetic unit), and it takes over the direct issuing of instructions to the arithmetic unit. As a result, the host processor's accesses to the register file of the arithmetic unit for exchanging the necessary data run in parallel with the issuing of execution instructions by the macro instruction sequence processor and the actual execution of operations by the arithmetic unit that results from it, which reduces the additional overhead.

Instructions are passed from the host processor to the macro instruction sequence processor as follows. The program with which the macro instruction sequence processor issues the arithmetic execution sequence to the arithmetic unit is grouped, along the execution sequence, into macro instructions of one or more instruction steps each, and the host processor directs this series of macro instructions to the macro instruction sequence processor one after another using a simpler instruction operation. Specifically, the host processor gives the macro instruction sequence processor, as a macro instruction, the number of instructions to be executed by the arithmetic unit, while a counter counts the number of instructions executed by the arithmetic unit since the macro instruction was given. Means is provided that, when this count matches the number specified by the macro instruction, stops the macro instruction sequence processor and puts it in a state of waiting for the next macro instruction.

As long as the counts do not match, the macro instruction is regarded as still executing, and means is provided to hold back the next macro instruction until execution completes. In addition, means is provided so that the host processor can access the register file in the arithmetic unit without conflict even while an operation is executing. While the macro instruction sequence processor issues to the arithmetic unit the number of arithmetic instructions specified by the macro instruction, the host processor in parallel transfers the data needed for the next macro instruction into the register file of the arithmetic unit and retrieves earlier results from the register file.

[Effect]

With the above means, even for random operations and small-scale vector operations, the only arithmetic processing overhead that falls on the host processor is the transfer of operand data and the like and the issuing of very simple macro instructions. Because these run overlapped with the complex issuing of arithmetic instructions to the arithmetic unit by the macro instruction sequence processor, the effective additional overhead can be reduced considerably compared with the conventional approach, without greatly impairing real-time performance. For vector processing in particular, a relatively large number of arithmetic instructions can be grouped into a single macro instruction, and most of the effective overhead is the transfer of vector data. Moreover, in vector processing the vector data is placed contiguously in main memory and in the register file of the arithmetic unit, so it can be moved at high speed with transfer instructions, DMA, and the like, reducing the overhead further.

[Example]

An embodiment of the present invention will be described below with reference to FIGS. 1 to 6.

FIG. 1 shows a block diagram of the arithmetic processing device of the present invention. The device consists of an arithmetic unit 1; a host processor 2, in which the arithmetic instruction sequence program resides and which uses that program to realize the arithmetic functions the user wants; a macro instruction sequence processor 3, which issues the instructions of the arithmetic execution sequence to the arithmetic unit; and a multiplex unit 4, which is the means by which the host processor 2 and the macro instruction sequence processor 3 share the arithmetic unit 1.

FIG. 2 shows a conventional arithmetic processing device consisting of a host processor and an arithmetic unit.

The host processor 2 consists of a main memory 6, a CPU 5, and an interface circuit 7 that provides the necessary signals to the arithmetic unit. The interface circuit 7 need not be on the host processor side; it may sit between the host processor 2 and the arithmetic unit 1, or on the arithmetic unit 1 side. The arithmetic unit 1 consists of a microsequencer 15, a macro code memory 16, a control line generation circuit 17, an execution unit 18 (ALU, multiplier, etc.) that carries out operations, a register file 22, an instruction decoder 19 that decodes instructions, a bus buffer 23 that connects the host processor to the internal bus Q1, and so on. The arithmetic unit 1 implements, as its minimum unit (basic operation) functions, addition, subtraction, multiplication, and division on the data in the register file 22, together with various defined functions. In this scheme the host processor uses the interface circuit 7 and the bus buffer 26 to present the required address (corresponding to a register number) to the register file 22 and transfer the necessary data, then passes an instruction to the instruction decoder 19 to start the microsequencer 15 and have the required operation executed. Until the operation finishes, the host processor is made to wait before accessing the register file 22 or sending the next instruction; when the operation finishes, the control line Q2 permits the bus buffer 23 to open. FIG. 3 shows this execution sequence. D1 to D4 denote data input/output between the host processor 2 and the arithmetic unit 1, I1 to I4 denote the sending of arithmetic instructions from the host processor 2 to the arithmetic unit 1, and E1 to E4 denote the execution of those instructions in the arithmetic unit. The vertical arrows show the flow of operations. As the figure shows, the flow of processing is serial, and each unit has a large amount of idle time (shown by dotted lines in the figure).

In the present embodiment shown in FIG. 1, a macro instruction sequence processor 3 capable of issuing the instructions of the arithmetic execution sequence to the arithmetic unit 1 is provided in addition to the host processor 2. The macro instruction sequence processor 3 has a CPU 8, a local memory 9, and an interface circuit 10 that supplies the necessary signals to the arithmetic unit. It must have at least a connection path to the instruction decoder 19 in the arithmetic unit 1 and be able to issue arithmetic instructions to the arithmetic unit in place of the host processor. In this embodiment the host processor 2 and the macro instruction sequence processor 3 share the arithmetic unit 1 through the multiplex unit 4. The multiplex unit 4 has a multiplexer 11 that multiplexes the instruction lines to the decoder and a multiplexer 13 that multiplexes the lines carrying the addresses and data needed for arithmetic processing. Arbiter 12 arbitrates for multiplexer 11, and arbiter 14 arbitrates for multiplexer 13. In this example the host processor and the macro instruction sequence processor share only the instruction lines, so only the multiplexer 11 and the arbiter 12 are provided.

The roles of the host processor and the macro instruction sequence processor may also be separated completely, with the host processor 2 handling data input/output exclusively and the macro instruction sequence processor 3 handling the sending of arithmetic instructions to the arithmetic unit 1 exclusively. In addition, the interface circuits 7 and 10 are given the function of generating the data line Q8, which is used to pass macro instructions from the host processor 2 to the macro instruction sequence processor 3 and to synchronize the processing sequences. With this scheme, at least the data input/output operations of the host processor 2 and the arithmetic instruction issuing of the macro instruction sequence processor 3 can run in parallel.

Inside the arithmetic unit 1, a multiplexer 21 is newly provided that connects to the register file 22 either the internal bus Q1, which consists of the data bus on the execution unit 18 side and the address bus Qss from the control line generation circuit 17, or the bus line Q4 on the host processor 2 side. It is switched by an arbiter circuit 20, which arbitrates between, and activates, the access request and grant line Q11 on the microsequencer side and the access request and grant line Q6 from the host processor side. As a result, even while the execution unit 18 is operating, the host processor 2 can access the register file 22 during the large part of the time in which the execution unit is presumably not using it. The data input/output operations of the host processor 2 and the execution of operations in the arithmetic unit 1 can therefore run in parallel.
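The sharing of the register file between the execution side and the host side can be pictured as a per-cycle grant decision. The sketch below is only an illustration: the patent does not state the arbitration policy of arbiter circuit 20, so the rule that the execution side wins a conflict is an assumption, and the function name `arbitrate` is hypothetical.

```python
def arbitrate(exec_requests, host_requests):
    """Return the grant for each cycle: "exec", "host", or None.

    Assumption (not from the patent): when both sides request the
    register file in the same cycle, the execution side is served first.
    """
    grants = []
    for exec_req, host_req in zip(exec_requests, host_requests):
        if exec_req:
            grants.append("exec")
        elif host_req:
            grants.append("host")
        else:
            grants.append(None)
    return grants


# The execution side needs the register file only in some cycles,
# so the host is granted access in the remaining ones.
exec_requests = [True, False, False, True, False]
host_requests = [True, True, False, True, True]
print(arbitrate(exec_requests, host_requests))
```

With these inputs the host is granted the second and fifth cycles, which is the sense in which host transfers fit into the time the execution unit leaves free.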

FIG. 4 shows how arithmetic processing proceeds in the present embodiment described above, drawn to correspond to FIG. 3 for the conventional example. First, the host processor 2 loads data D1 and D2 into the register file 22 in the arithmetic unit 1 and then directs the first macro instruction MI1 to the macro instruction sequence processor 3. Because a macro instruction is given with simplified code, issuing it takes little time. Macro instruction MI1 represents a group, MIOI, of two actual basic unit instructions I1 and I2 for the arithmetic unit 1; the macro instruction sequence processor 3 sends I1 and then I2 to the arithmetic unit 1 and returns to waiting for the next macro instruction. The arithmetic unit 1, for its part, carries out the executions E1 and E2 of the basic operations corresponding to I1 and I2. Meanwhile, the host processor 2 in parallel loads the data D3 and D4 needed next into the register file 22. Processing continues in the same way. Compared with the conventional example shown in FIG. 3, the idle time of each unit is shortened and the efficiency is nearly doubled.
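The difference between the serial sequence of FIG. 3 and the overlapped sequence of FIG. 4 can be explored with a rough timing model. This is a sketch under stated assumptions rather than the patent's own analysis: the function names, the cost tuples, and the `macro_issue` cost are hypothetical, result read-back is ignored, and the host is assumed to issue the next macro instruction only once the previous one has completed.

```python
def conventional(groups):
    """Serial case: the host loads data, issues instructions, then waits."""
    return sum(load + issue + run for load, issue, run in groups)


def overlapped(groups, macro_issue=1):
    """Host data loading overlaps with issue and execution of the previous group.

    groups: list of (load, issue, run) totals per macro instruction,
    in arbitrary, assumed time units.
    """
    host_free = 0  # time at which the host has finished its current work
    unit_free = 0  # time at which the previous macro instruction completes
    for load, issue, run in groups:
        host_free += load                                # host loads operands
        start = max(host_free, unit_free) + macro_issue  # then issues the macro
        host_free = start
        unit_free = start + issue + run                  # sequencer and unit work
    return unit_free


groups = [(6, 2, 4)] * 4
print(conventional(groups))  # 48
print(overlapped(groups))    # 34
```

The gain depends entirely on the assumed costs; it approaches a factor of two only when data transfer and execution take similar time and the macro instruction itself is cheap to issue.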

FIG. 5 shows a block diagram of the circuit in the interface 10 through which the host processor 2 directs macro instructions to the macro instruction sequence processor 3, together with the circuitry needed to synchronize the sequences of the two processors. The interface 10 consists of a signal control circuit 23, a counter circuit 24 that counts the number of arithmetic instructions executed, and a latch circuit 25 that latches the number of instructions to execute sent from the host processor 2. From the signals of the CPU 8, the signal control circuit 23 generates the signal line UXO that sends instruction data to the multiplexer 11, the access request and grant line QIZ through which access to the instruction decoder 19 of the arithmetic unit 1 is requested from the arbiter 12, and the signal line Q9, which carries a pulse to the clock input of the counter circuit 24 each time an instruction is sent.

FIG. 6 shows how the macro instructions sent from the host processor 2 to the macro instruction sequence processor are composed and executed. Using a simple instruction with a short execution time, such as an OUT instruction, the host processor 2 specifies the number of arithmetic instructions that the macro instruction sequence processor 3 and the arithmetic unit 1 are to execute next. The specified number is stored in the latch circuit 25 in FIG. 5, and this is the macro instruction itself. If the previous macro instruction has finished, the latched value is loaded into the counter circuit 24 and decremented by one by the pulse on signal line QB each time an arithmetic instruction is sent; when the counter value reaches zero, a zero-count signal is sent on signal line Q7. When the zero-count signal makes signal line Q7 active, the host processor 2 knows that the next macro instruction can be given and sends the number of arithmetic instructions to execute next to the latch circuit 25 as a macro instruction. As shown in FIG. 6, macro instruction 1 (MI1) is a command to execute three arithmetic instructions and is written OUT3 (a command to execute n instructions is written OUTn). When a macro instruction is sent to the macro instruction sequence processor 3 before the previous macro instruction has completed, the host processor 2 is made to wait until the macro instruction currently executing completes, as with MI3 in the figure. Likewise, if a macro instruction from the host processor 2 arrives late, the macro instruction sequence processor 3 is made to wait until the next macro instruction is given (marked WAIT in the figure). The arithmetic unit 1 simply executes (E1, E2, ...) the basic arithmetic instructions it is given (I1, I2, ...).
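The latch, down-counter, and zero-count handshake described for FIGS. 5 and 6 can be modelled in a few lines. This is a behavioural sketch only, assuming the latch and counter can be treated as plain integers; the class and method names are hypothetical and do not come from the patent.

```python
class MacroInstructionInterface:
    """Behavioural model of latch circuit 25 and counter circuit 24."""

    def __init__(self):
        self.latch = 0    # instruction count written by the host (OUT n)
        self.counter = 0  # instructions still to be issued

    def zero_count(self):
        """True when the counter is zero (the zero-count signal is active)."""
        return self.counter == 0

    def host_out(self, n):
        """Host gives a macro instruction OUT n.

        Returns False when the previous macro instruction is still
        running, meaning the host has to wait and retry.
        """
        if not self.zero_count():
            return False
        self.latch = n
        self.counter = self.latch
        return True

    def sequencer_issue(self):
        """Sequencer sends one arithmetic instruction to the arithmetic unit.

        Returns False when no macro instruction is pending (WAIT state).
        """
        if self.zero_count():
            return False
        self.counter -= 1
        return True


iface = MacroInstructionInterface()
print(iface.host_out(3))        # True: OUT3 accepted
print(iface.host_out(2))        # False: host must wait
print([iface.sequencer_issue() for _ in range(4)])  # [True, True, True, False]
print(iface.host_out(2))        # True: previous macro instruction finished
```

The fourth call to `sequencer_issue` returns False because the three instructions of OUT3 have been sent, which corresponds to the sequencer waiting for the next macro instruction.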

With this scheme, the program on the macro instruction sequence processor 3 side can be the same description of the arithmetic execution sequence as before, and it can be divided freely into macro instructions. An object program executed on the macro instruction sequence processor 3 can therefore also run unchanged on the host processor 2. Furthermore, a macro instruction can be given with something as simple as an OUT instruction; if the address output lines are used, it takes a single machine instruction. Combined with the effect of grouping operations, this minimizes the overhead of issuing instructions.
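Because a macro instruction is only a count, the same instruction sequence can be cut at any point without changing the program itself. The sketch below illustrates this; the function name `to_macro_instructions` and the instruction labels are hypothetical, and only the OUTn notation is taken from the description.

```python
def to_macro_instructions(program, group_sizes):
    """Split an instruction sequence into (OUTn, instructions) groups.

    program: list of basic instructions in execution order.
    group_sizes: how many consecutive instructions each macro covers.
    """
    if sum(group_sizes) != len(program):
        raise ValueError("group sizes must cover the whole program")
    macros = []
    position = 0
    for n in group_sizes:
        macros.append((f"OUT{n}", program[position:position + n]))
        position += n
    return macros


program = ["I1", "I2", "I3", "I4", "I5"]
for name, instructions in to_macro_instructions(program, [3, 2]):
    print(name, instructions)

# A different grouping of the same program yields the same instruction stream.
flat = [i for _, group in to_macro_instructions(program, [1, 4]) for i in group]
print(flat == program)  # True
```

Choosing the group sizes is then a scheduling decision on the host side: larger groups mean fewer macro instructions, smaller groups give the host more frequent synchronization points.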

[Effect of the invention]

According to the present invention, the data input/output operations between the host processor and the arithmetic unit that are needed for instruction processing can run in parallel with the arithmetic instruction issuing performed by the macro instruction sequence processor and the execution of operations in the arithmetic unit. This reduces the additional overhead associated with arithmetic processing, shortens the overall processing time, and speeds up processing.

[Brief explanation of drawings]

FIG. 1 shows an embodiment of the present invention; FIG. 2 shows a conventional example; FIG. 3 shows the arithmetic processing sequence in the conventional example; FIG. 4 shows the arithmetic processing sequence in the present embodiment; FIG. 5 is a block diagram of the macro instruction directing circuit; and FIG. 6 shows the means of composing macro instructions and the execution method.
1: arithmetic unit, 2: host processor, 3: macro instruction sequence processor, 4: multiplex unit, 10: interface, 18: execution unit, 19: instruction decoder, 20: bus arbiter, 21: multiplexer, 22: register file, 24: counter circuit, 25: latch circuit.

Claims

[Scope of Claims]
1. An arithmetic processing device comprising an arithmetic unit that executes arithmetic processing, and a first processor having a function of issuing the instructions of an arithmetic execution sequence to be executed by the arithmetic unit and a function of inputting and outputting the data required for the operations, characterized in that a second processor is provided that shares the arithmetic unit with the first processor and has at least the instruction-issuing function, the instruction-issuing operations of the arithmetic execution sequence of the second processor are grouped by instruction step, and the first processor directs the grouped instruction steps to the second processor, thereby controlling the sequence in which the second processor issues instructions to the arithmetic unit.
2. The arithmetic processing device according to claim 1, wherein the first processor and the second processor operate on the same machine instructions.
3. The arithmetic processing device according to claim 1, wherein the first processor has a function of transferring data to and from a register file of the arithmetic unit.
4. The arithmetic processing device according to claim 1, wherein the instruction step is at least one instruction step that is executed consecutively.
5. An arithmetic processing device comprising an arithmetic unit that executes arithmetic processing, and a first processor having a function of issuing the instructions of an arithmetic execution sequence to be executed by the arithmetic unit and a function of inputting and outputting the data required for the operations, characterized by comprising: a second processor that shares the arithmetic unit with the first processor and has at least the instruction-issuing function; means for grouping the instruction-issuing operations of the arithmetic execution sequence of the second processor into macro instructions by instruction step; and means for executing in parallel the arithmetic execution operation of the second processor, performed for each grouped macro instruction, and the data transfer operation between the first processor and the arithmetic unit.
6. The arithmetic processing device according to claim 5, wherein the data transfer operation is a data transfer operation between a storage means of the first processor and a register file of the arithmetic unit.
7. An arithmetic processing device comprising an arithmetic unit that executes arithmetic processing, and a first processor having a function of issuing the instructions of an arithmetic execution sequence to be executed by the arithmetic unit and a function of inputting and outputting the data required for the operations, characterized by comprising: a second processor that shares the arithmetic unit with the first processor and has at least the instruction-issuing function; means for grouping the instruction-issuing operations of the arithmetic execution sequence of the second processor into macro instructions by instruction step; means for directing, from the first processor to the second processor, a macro instruction that is the number of arithmetic instructions to be executed in the arithmetic unit; means for latching the directed number of instructions, counting the arithmetic instructions each time one is executed in the arithmetic unit, and operating the second processor until the count matches the directed number; and means for stopping the operation of the first processor until execution of the macro instruction in progress is completed.