JPH0744401A

JPH0744401A - Logic integrated circuit and its data processing system

Info

Publication number: JPH0744401A
Application number: JP19215793A
Authority: JP
Inventors: Toyohito Iketani; 豊人池谷
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1993-08-03
Filing date: 1993-08-03
Publication date: 1995-02-14

Abstract

PURPOSE: To efficiently control a multiprocessor comprising a plurality of CPUs. CONSTITUTION: In a RISC processor, the area of the general-purpose register file that a single process may use is limited, and the register file is divided so that it can hold a plurality of processes, so that general-purpose register data need not be saved to external memory when the executing process is switched. A management register is provided that holds information on the division of the general-purpose registers and information for process switching, and the register address read with an instruction is converted into a general-purpose register address when the instruction is decoded. As the division information for this conversion, the general-purpose register address for each process is referenced from the management register. Furthermore, two program counters are provided in one CPU: one is used for the executing process, and the other is auxiliary and is assigned, in an execution-waiting state, to any process waiting in the divided general-purpose registers.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a technique that is effective when applied to arithmetic circuits in logic integrated circuits, and in particular to a technique that is effective when used in microprocessors and data processing systems having a large general-purpose register file.

[0002]

[Prior Art] In recent years, microprocessor performance has improved dramatically with the development of RISC processors. In general, a RISC processor is a microprocessor built only from simple instructions so that each pipeline stage can be executed in one machine cycle, in order to exploit pipelining and speed up operation. As shown in FIG. 2, because such a processor runs at a high operating frequency and each instruction is simple, it needs a large number of instructions and a large capacity to hold them, so a primary cache memory built into the CPU and a secondary cache memory are provided. Data of infrequently used general-purpose registers is also saved to the main memory. For this reason, operations between the CPU's general-purpose registers and the primary cache, the secondary cache, or the main memory, and operations between the secondary and primary caches, all of which would disturb the pipeline flow and hinder speed-up, are avoided, and only register-to-register operations are performed. For the same reason, RISC processors do not adopt the complex addressing modes found in CISC processors that depend on the ordering of pipeline stages, typified by auto-increment and auto-decrement, which also determine the next address so that the next data item can be read. FIG. 3 is a conceptual diagram of pipeline processing in a typical RISC processor. Instructions are executed by pipelining a series of operations: instruction fetch, instruction decode, general-purpose register designation, ALU operation, and write-back. When memory is accessed, one stage is added between the ALU operation and write-back in order to shorten the memory access. CISC processors, by contrast, incorporated complex instructions and addressing in order to reduce assembler-level programming effort, and paid little attention to the loss of processing speed caused by pipeline disturbances. In a CISC processor such disturbances cost several extra cycles, which slows instruction execution. The RISC processor, on the other hand, adopts an index-modified addressing mode that uses only register-to-register operations, designating a general-purpose register by an operation between a base general-purpose register and an index. Because the address can be calculated in one cycle, the RISC processor can execute an instruction in one cycle. However, existing processors, whether RISC or CISC, let a single executing process occupy the entire general-purpose register file. A large number of general-purpose registers is therefore provided to reduce transfers of saved data from external memory back into the registers, and the compiler optimizes the variables assigned to the registers so that only the most frequently used variables remain in them. Conventional RISC processors thus owe much of their performance to advances in software technology as well as to improvements in hardware. Most computer systems and workstations, however, support multitasking, in which processes are switched by time sharing so that several programs run concurrently, and must therefore switch among multiple executing processes within a fixed period. If one executing process occupies a large number of general-purpose registers, the operating system (hereinafter, OS) must save the data held in all of those registers to external memory at each process switch. This saving, performed by addressing between external memory and the general-purpose registers, introduces a delay that is pure overhead from the viewpoint of the running application software. Existing computer systems and workstations, most of which assume a single-chip processor configuration, have dealt with this by limiting the number of general-purpose registers so that the time spent saving data at a process switch does not sacrifice overall system throughput. In a computer system built from multiple processor chips, however, processors must be allocated per executing process, which complicates the process switching mechanism and further increases the switching delay, so the same problem arises. With a conventional OS, it was only necessary to decide how to select among multiple processes on the RISC processor; an OS for multiprocessors must select a processor as well. In a distributed OS, moreover, software is composed of even finer-grained processes at the subroutine level, such as threads, so process switching must be performed frequently. When the RISC architecture described above is applied to a multiprocessor, all programs must be distributed among the processors. Process switching then becomes frequent, pipeline processing is disrupted, and saving general-purpose register data to external memory becomes a heavy burden on the whole system.

[0003]

[Problem to Be Solved by the Invention] An object of the present invention is to provide a microprocessor having high-speed process management and register management functions suited to next-generation operating systems, and to provide a microprocessor having a management function that efficiently controls a multiprocessor in which a plurality of CPUs are integrated in a single LSI.

[0004]

[Means for Solving the Problem] In a RISC processor, the area of the general-purpose register file that one process can use is limited, and the register file is divided so that it can hold a plurality of processes, so that general-purpose register data need not be saved to external memory when the executing process is switched. A management register holding information on this division and information for process switching is provided, and the register address within a register group, as read with an instruction, is converted into a general-purpose register address when the instruction is decoded. As the division information for this conversion, the program counter value for each process in the general-purpose registers is referenced from the management register. Furthermore, two program counters are provided in one CPU: one is used for the executing process, and the other is auxiliary and is assigned, in an execution-waiting state, to any process waiting in the divided registers.
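
The conversion from a register address inside a group to an address in the full register file can be illustrated with a short sketch. It assumes the sizes used in Embodiment 1 of this document (256 registers in 16 groups of 16). The function name `to_physical_register` and the base-plus-offset arithmetic are hypothetical, chosen for illustration, and are not the disclosed decoder circuit.

```python
# Hypothetical sketch of group-relative register addressing.
# Sizes follow Embodiment 1; the arithmetic itself is an assumption.
REGISTERS_PER_GROUP = 16
GROUP_COUNT = 16  # 256 registers / 16 registers per group


def to_physical_register(group: int, local_index: int) -> int:
    """Map a register number inside a group to an index in the full file."""
    if not 0 <= group < GROUP_COUNT:
        raise ValueError("group out of range")
    if not 0 <= local_index < REGISTERS_PER_GROUP:
        raise ValueError("register index out of range")
    # The group number selects a 16-register window; the index selects
    # a register inside that window.
    return group * REGISTERS_PER_GROUP + local_index
```

Under these assumptions an instruction only ever names a register 0 to 15, and the currently selected group decides which physical registers are touched, so a process switch changes the group number rather than moving register contents.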

[0005]

[Operation] A microprocessor having the above register division function can hold information for a plurality of processes within one CPU and can switch processes within one to a few cycles. This minimizes the amount of general-purpose register data that must be saved to external memory when the executing process is switched, and greatly shortens the switching time. In addition, by processing another process in parallel, when an instruction that cannot be executed because of a delay slot is detected in the instruction stream being executed, the program counter is switched to a waiting process and the stalled instruction is replaced by an executable one, which reduces pipeline disturbance.
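
The idea of filling an unusable slot with an instruction from a waiting process can be modeled in a few lines. This is a toy model under stated assumptions: instruction streams are plain lists, a slot that the running process cannot use is marked with `None`, and `fill_stalls` is a hypothetical name. It does not model real pipeline timing.

```python
# Toy model: issue from the running process, and on a stalled slot
# issue from the waiting process via a second (auxiliary) counter.
STALL = None  # assumed marker for a slot the running process cannot use


def fill_stalls(running, waiting):
    """Return the issued instruction sequence, one entry per slot."""
    issued = []
    waiting_pc = 0  # auxiliary program counter for the waiting process
    for instruction in running:
        if instruction is not STALL:
            issued.append(instruction)
        elif waiting_pc < len(waiting):
            issued.append(waiting[waiting_pc])
            waiting_pc += 1
        else:
            issued.append("nop")  # nothing available to fill the slot
    return issued
```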

[0006]

[Embodiments]

(Embodiment 1) FIG. 1 shows a processor of the present invention that performs process switching in hardware. A general-purpose register file consisting of 256 general-purpose registers, Rg0 to Rg255, is grouped into sets of 16 registers each, giving 16 groups, Grp0 to Grp15. In the present invention this grouping is achieved without adding any dividing hardware to the register file itself; the grouping method is described later. A management register is provided for each general-purpose register group, and a management register group is formed by as many management registers as there are register groups. Each management register stores either a preset value of the program counter PC or the value PC* of the program counter at the time the process was interrupted, together with execution state flag information. The execution state flag needs at least bits indicating "running" and "waiting for execution". The flag can be read and written in two ways: by accessing the flag of a given management register by its register number, or by accessing the individual bits that indicate the execution state in each management register. In the latter method, the position of a status bit that is read corresponds to the number of the management register. A concrete example is shown in FIG. 4. In FIG. 4, the flag of management register 0, which corresponds to general-purpose register group Grp0, is assumed to be in the running state, and the flag of management register 1, which corresponds to Grp1, in the waiting state. Each management register has a flag consisting of a running bit and a waiting bit; because Grp0 is running, the running bit at bit position 0 is set. To switch Grp1 to the running state, all of the running bits are scanned and bit 0 is found to be set. The running bit at position 0 is then cleared, switching that process to the waiting state, and the running bit at position 1 is set. In this way the running and waiting bits are exchanged. An arithmetic unit ALU is connected to the general-purpose registers and performs logical operations between designated registers. A data cache memory and an instruction cache memory are also provided; they hold copies of main memory data and read from the main memory when the requested data is not present. These caches are primary and/or secondary cache memories. The instruction cache reads instruction codes, and the data cache reads and writes data referenced from the registers. The data cache also inputs and outputs data bidirectionally and reads the address of the program counter PC via an address line. The instruction cache is connected to a fetch circuit that fetches the input instructions, and the fetch circuit is connected to a decode circuit DCR. Instructions decoded by the decode circuit DCR are supplied as control signals to logic circuits such as the management registers, the general-purpose registers, and the ALU, and one of these control signals is fed to a process switching circuit. The process switching circuit is connected to the decode circuit DCR when a process switching instruction occurs, and consists of a save block, a selection algorithm block, and a restore block. These three blocks are connected to a switching control line, which is connected to a register control line, and they control the program counter value PC* and the execution state flag stored in the management registers. The selection algorithm block generates a register group signal Grp#, which is sent to the management registers and the general-purpose register groups and designates the assigned register group. The management registers and the general-purpose register groups are connected to an input data bus and an output data bus. Data output from them is transferred to the ALU, as addressed by the decode circuit DCR, and also to the program counter PC. Output data from the program counter PC is in turn delivered to the management registers and the general-purpose register groups via the input data bus. On the OS, the management registers correspond one-to-one with the general-purpose register groups. For example, if management register a corresponds to Grp0, then designating Grp0 with the register group signal Grp# also designates management register a, and the process recorded in management register a is assigned to Grp0. The data recorded in a management register is not particularly limited, but it includes the execution data at the time execution was interrupted. Thus, in this embodiment, if Grp0 was designated by the register group signal Grp# when execution was interrupted, management register a, which corresponds to Grp0, holds two items of data: the program counter value PC* and the execution state flag. Here the flag of management register a, corresponding to Grp0, indicates the running state, and the flag of management register b, corresponding to Grp1, indicates the standby state.
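
The flag exchange described for FIG. 4 can be sketched with two bit masks, one for the running bits and one for the waiting bits, where bit position i stands for management register i. This is a minimal sketch under that assumption; the function names and the use of two separate masks are illustrative and are not taken from the figures.

```python
# Sketch of the FIG. 4 flag exchange using two bit masks.
# Bit position i corresponds to management register i (an assumption
# about encoding made for this illustration).
GROUP_COUNT = 16


def find_running(running_bits: int):
    """Scan the running bits; return the position of the set bit, or None."""
    for position in range(GROUP_COUNT):
        if (running_bits >> position) & 1:
            return position
    return None


def switch_running(running_bits: int, waiting_bits: int, target: int):
    """Move the running process to waiting and mark target as running."""
    current = find_running(running_bits)
    if current is not None:
        running_bits &= ~(1 << current)   # clear the old running bit
        waiting_bits |= 1 << current      # old process now waits
    waiting_bits &= ~(1 << target)        # target no longer waits
    running_bits |= 1 << target           # target is now running
    return running_bits, waiting_bits
```

Starting from the state in the text (register 0 running, register 1 waiting), calling `switch_running(0b01, 0b10, 1)` exchanges the two states.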

[0007] Next, the initialization operation and the process switching operation of this processor are described.

[0008] First, the method of grouping the general-purpose registers is described. Although not shown in the figures, a reset signal is input to the processor, the flag data of the management register group is cleared, and only the boot management register is placed in the running state. The boot management register holds the start-up PC value in advance; this value is transferred to the program counter PC, and reading from memory into the CPU is enabled. The boot program is thereby loaded into the CPU. It reserves the necessary data in the CPU's management registers and general-purpose registers so that the OS can start, allocates data in memory, sets up the peripheral input/output devices, and so on. When preparation is complete, the OS is started, and processes are assigned to the management registers other than the boot one under OS control. The processes assigned to the management registers are not particularly limited: they may execute the OS itself, control peripheral input/output devices, or be applications launched by the OS.
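
The reset sequence above can be sketched as follows. The `ManagementRegister` class, its field names, and the `reset` function are hypothetical names for this illustration; the sketch only models clearing the flags, marking the boot register as running, and loading its preset PC value.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManagementRegister:
    pc_saved: int = 0      # preset or saved program counter value (PC*)
    running: bool = False  # execution state flag, reduced to one bit here


def reset(registers: List[ManagementRegister], boot_index: int) -> int:
    """Clear all flags, mark the boot register running, return its PC."""
    for register in registers:
        register.running = False
    registers[boot_index].running = True
    # The returned value is what would be transferred to the program counter.
    return registers[boot_index].pc_saved
```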

[0009] Next, the data transfer procedure of the processor of this embodiment at a process switch is described. As stated above, the general-purpose register file is divided into 16 groups; it is assumed that Grp0 is in the running state at the time of the switch and that the process is switched to Grp1. A process switching instruction is output from the data cache memory and the instruction cache memory. The data cache addresses the registers via the data bus, and the process switching instruction is fetched from the instruction cache into the processor via the fetch circuit. The fetched instruction is transferred to the decode circuit DCR, which decodes it, selects the operation to be performed by the ALU, and transfers that selection to the ALU. The output of the decode circuit DCR is also input to the process switching circuit.

[0010] Next, the processing in the process switching circuit is described. First, the save block accesses the register control line via the switching control line and selects management register a, which is running. The program counter PC holds its current value PC*, and this value is written over the contents of management register a via the input data bus. The execution state flag stored in management register a is then read onto the switching control line via the register control line and switched to the standby state. Next, the selection algorithm block selects the process to run next from among those whose flags indicate the waiting state, and generates the register group signal Grp# to designate the corresponding general-purpose register group. A latch circuit holds this signal while Grp1 is selected, and management register b is thereby designated. The restore block then designates management register b through the register control line accessed via the switching control line, reads the program counter value PC* stored in management register b via the output data bus, and sets it in the program counter PC. Management register b is identified from the register group number that the selection algorithm block read out by bit number, and the standby flag stored in management register b is switched to "running" at the same bit position of the execution state flag. When the first instruction of the selected process is read from the instruction cache into the fetch circuit, the process switching operation is complete and the next process begins executing. Not every process needs to be managed solely by the process switching circuit and the management registers; processes can also be managed in the main memory. In practice the number of processes managed by an OS is very large, and they cannot all be managed with the management registers in the CPU. When the invention is used under an OS, therefore, the most frequently used waiting processes are kept in the management registers, and for processes that have terminated or been suspended the management register data is saved to the main memory. The processing performed by the OS includes saving data from the management registers to the main memory, copying the data of waiting processes in the main memory into the management registers, and selecting which processes to copy into the management registers. When data resides in memory, the necessary data is transferred from the data cache memory to the management register group under program control or automatic control, for the less frequently used entries among the management registers and the general-purpose register groups Grp0 to Grp15. It is assumed that the CPU does not execute the next instruction until this switching work is finished. This switching method can be implemented in either software or hardware, but a software implementation occupies at least one management register because a switching process must be started. Moreover, since conventionally one process occupied the whole general-purpose register file and saving it consumed several tens of cycles, a hardware implementation is preferable if this scheme is to deliver a speed-up. Even in that case, no dividing hardware is added to the register file itself; instead, the register address assigned to each process is converted into an address in the general-purpose register file. Although an ALU is described as the arithmetic unit in this embodiment, using a floating-point processing unit instead can further reduce the software burden on the whole system.
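
The save, select, and restore steps of the process switching circuit can be modeled in software as a rough sketch. The class and field names are hypothetical. The text does not specify the selection algorithm, so a round-robin scan over waiting entries is assumed here purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ManagementRegister:
    pc_saved: int  # PC*: program counter value saved at interruption
    state: str     # "running", "waiting", or "idle" (illustrative encoding)


class Processor:
    def __init__(self, registers: List[ManagementRegister], running: int):
        self.mgmt = registers
        self.group = running                   # models the latched Grp# signal
        self.pc = registers[running].pc_saved  # active program counter

    def switch_process(self) -> int:
        # Save block: record the current PC and mark the process as waiting.
        current = self.mgmt[self.group]
        current.pc_saved = self.pc
        current.state = "waiting"

        # Selection block: round-robin over waiting entries (an assumption).
        count = len(self.mgmt)
        selected = self.group
        for step in range(1, count + 1):
            candidate = (self.group + step) % count
            if self.mgmt[candidate].state == "waiting":
                selected = candidate
                break

        # Restore block: latch the new group number and reload the PC.
        self.group = selected
        self.pc = self.mgmt[selected].pc_saved
        self.mgmt[selected].state = "running"
        return selected
```

Note that the model never copies general-purpose register contents: only the program counter, the flag, and the group number change, which is the property the text relies on to keep the switch short.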

[0011] (Embodiment 2) FIG. 5 shows a processor in which each management register can select an arbitrary general-purpose register group, even when the number of management registers does not match the number of general-purpose register groups. In this embodiment, each management register of the management register group of Embodiment 1 additionally stores data on the group number of a general-purpose register group. In the following description, the group number data grp* stored in management registers a and b is assumed to designate general-purpose register groups Grp0 and Grp2, respectively. First, as in Embodiment 1, the general-purpose registers are grouped (Grp0, Grp2, and so on), and the OS sets a program counter value PC* in each management register and stores the execution status flag data and the group number data grp* there. A management register in the management register group is then designated by inputting a start address, and the process associated with a general-purpose register group is transferred to the general-purpose register group corresponding to that management register and executed.
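The contents of a management register in this embodiment can be pictured with a small Python sketch. The field names, the `State` enumeration, and the concrete program counter values are assumptions chosen for illustration; only the three stored items (PC*, the execution status flag, and grp*) and the assignment of Grp0 and Grp2 to registers a and b come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    RUNNING = "running"
    WAITING = "waiting"


@dataclass
class ManagementRegister:
    pc: int       # saved program counter value (PC*)
    state: State  # execution status flag
    grp: int      # general-purpose register group number (grp*)


# Two management registers that point at two of several register groups.
# The PC values are made up for the example.
management = {
    "a": ManagementRegister(pc=0x1000, state=State.RUNNING, grp=0),
    "b": ManagementRegister(pc=0x2000, state=State.WAITING, grp=2),
}
```

Because the group number is stored in the register rather than implied by its position, the number of management registers does not have to equal the number of register groups.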

[0012] Next, the process switching method of the processor of this embodiment is described. As above, the general-purpose register file is divided into an arbitrary number of groups. The description assumes that management register a, whose group number data grp* designates general-purpose register group Grp0, is in the execution state, and that the process is switched to management register b, whose group number data grp* designates general-purpose register group Grp2. A switching instruction is issued from the instruction cache memory and fetched into the processor through the fetch circuit. The fetched instruction is transferred to the decode circuit DCR, and the resulting data is input to the process switching circuit. The data transfer within the process switching circuit proceeds as follows. First, the save processing block accesses the register control line via the switching control line and selects management register a, which is executing. The program counter value PC* is held in the program counter PC and is then written over management register a via the input data bus. The execution status flag stored in management register a is read onto the switching control line via the register control line and is switched to the standby state. Next, the selection algorithm block selects the process to be executed next and transfers that data to the restore block. The restore block designates management register b by accessing it through the register control line connected to the switching control line. The group number data grp* stored in management register b, which designates general-purpose register group Grp2, is read onto the switching control line via the register control line. To select Grp2, a register group signal Grp# is generated from the group number data read from management register b and is held in a latch circuit, thereby designating general-purpose register group Grp2. At the same time, the program counter value PC* stored in management register b is read out via the output data bus and set in the program counter PC. Management register b is identified from the register group number that the selection algorithm block read out by bit number, and the standby flag stored in management register b is switched to the bit value that the execution status flag has during execution. When the first instruction of the selected process is read from the instruction cache memory into the fetch circuit, the process switching operation is complete and the next process is executed. As in Embodiment 1, not all processes need to be managed solely by the process switching circuit and the management registers; they can also be managed in main memory. In practice, the number of processes managed by an OS is enormous, and the management registers in the processor CPU cannot manage all of them. When the present invention is used under an OS, therefore, the most frequently used processes awaiting execution are kept in the management registers, and the management register data of a process whose execution has finished or been suspended is saved to main memory. The processing performed by the OS here includes saving data from the management registers to main memory, copying the data of a process awaiting execution in main memory to a management register, and selecting the process to be copied to a management register. Because, conventionally, several tens of cycles were consumed when one process occupied the general-purpose register file, it is preferable to implement this method in hardware to achieve high speed. In this case, however, no hardware for division is added to the general-purpose register file itself; only process switching is performed. Although an ALU is described as the arithmetic unit in this embodiment, using an FPU instead can further reduce the software load. Furthermore, when there are more general-purpose register groups than management registers, so that there are not enough management registers for the general-purpose register groups, software handles this by dividing the corresponding management register group and creating new management registers in memory. In this embodiment, software can reference the process data held in the general-purpose registers, so usability for the user is much better than in Embodiment 1.
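The save, select, and restore steps of the process switching circuit can be sketched in Python as follows. This is a software model under stated assumptions, not the hardware itself: the class and function names are hypothetical, and the round-robin choice in the selection step is an assumption, since the description does not say which policy the selection algorithm block uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MgmtReg:
    pc: int        # saved program counter value (PC*)
    running: bool  # execution status flag (True = executing, False = standby)
    grp: int       # register group number (grp*)


@dataclass
class Cpu:
    pc: int          # program counter PC
    active_grp: int  # latched register group signal (Grp#)


def switch_process(cpu: Cpu, regs: List[MgmtReg]) -> int:
    """Switch to another process and return the index of its management register."""
    # Save block: write the live PC back and mark the process as standby.
    current = next(i for i, r in enumerate(regs) if r.running)
    regs[current].pc = cpu.pc
    regs[current].running = False

    # Selection block: assumed policy, take the next register in round-robin order.
    nxt = (current + 1) % len(regs)

    # Restore block: latch the group number, load PC*, and mark as executing.
    cpu.active_grp = regs[nxt].grp
    cpu.pc = regs[nxt].pc
    regs[nxt].running = True
    return nxt
```

Note that no general-purpose register contents are copied anywhere in this model: only the program counter value, one flag, and the latched group number change, which is where the saving over a conventional register save comes from.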

[0013] (Embodiment 3) FIG. 6 shows a multiprocessor formed in a single LSI by applying the present invention: a plurality of fetch circuits, decode circuits, and arithmetic units are provided, together with the same number of program counters as arithmetic units. In this embodiment, one management register group and one general-purpose register file are provided so that all processors can access them in common, and the management register group and the general-purpose register file are divided and associated with individual processors to receive instructions. FIG. 6 is described taking as an example a multiprocessor built from two processor CPUs. That is, so that multiple processor elements do not occupy the same general-purpose register group, each processor is given a number, and the number of the assigned processor is registered in the execution status flag of the management register. Accordingly, processor CPU1 here includes processor element PE1, consisting of program counter PC1 and arithmetic unit ALU1, and processor CPU2 likewise includes processor element PE2. Further, as shown in FIG. 6, the data cache memory and the instruction cache memory are shared by processors CPU1 and CPU2. In the related art, by contrast, as in FIG. 2 described above, each processor CPU had its own dedicated built-in cache memory, a shared bus was provided for access between the processor CPUs, and the processors accessed a secondary cache memory in common. Consequently, even if the processor CPUs are on separate chips, when different processes read the same data area and produce different results, the data saved to the secondary cache memory is the result of only one of the processes, and the result of the other process is lost. Moreover, starting another process that depends on the same data may read incorrect data, so the OS prohibits such data accesses. In this embodiment, however, as shown in FIG. 6, the data cache memory and the instruction cache memory are shared by processors CPU1 and CPU2, and the processor number of the process to be executed is checked in the management register, which avoids data collisions between the processes. Because data is transferred through the built-in cache memory, the time the OS spends checking data areas is reduced and high-speed processing is readily achieved. Although not particularly limited to this, multiport memories are used for the data cache memory and the instruction cache memory of this embodiment, because they must supply instruction codes for as many processors as there are. Also, in this embodiment a fetch circuit, decode circuit, data bus, address bus, and so on are provided for each processor CPU, whereas the process switching circuit, like the data cache memory and the instruction cache memory, is provided so that it can be accessed in common regardless of the number of processors.

[0014] The process switching method of the multiprocessor of this embodiment is described below. First, as in Embodiment 1, the general-purpose registers are grouped (Grp1a, Grp2a, and so on), and the OS sets the values PC* of program counters PC1 and PC2 in the respective management registers and stores the processor element data PE* there together with the execution status flag data. A management register in the management register group is then designated by inputting a start address, and the process associated with a general-purpose register group is transferred to the general-purpose register group corresponding to that management register and executed.

[0015] Next, the data transfer procedure of the multiprocessor of this embodiment at process switching is described. As an example, the following describes the processing when a process is switched within the general-purpose register groups associated with one of the processor CPUs. Here, processors CPU1 and CPU2 are associated with processor elements PE1 and PE2, respectively, and the general-purpose register file is divided into the same number of groups as there are management registers. The description assumes that general-purpose register group Grp1a is in the execution state on processor CPU1, that general-purpose register group Grp2a is in the execution state on processor CPU2, and that processor CPU1 switches its process to general-purpose register group Grp1b. A process switching instruction for processor CPU1 is issued from the data cache memory and the instruction cache memory, while no process switching instruction is issued for processor CPU2. The instruction cache memory delivers the process switching instruction into processor CPU1 via fetch circuit 1. The instruction fetched by fetch circuit 1 is transferred to decode circuit DCR1, and the data output from decode circuit DCR1 is input to the process switching circuit. Fetch circuit 2, on the other hand, receives no process switching instruction for processor CPU2 from the instruction cache memory. An instruction for executing the specified operation in arithmetic unit ALU2, which corresponds to processor element PE2, is therefore held via decode circuit DCR2; no data is input to the process switching circuit; and the process on processor element PE2 and management register 2a, which is executing instructions, is retained and continues to run. In this way, instructions are taken into the process switching circuit only from a decode circuit that has decoded a process switching instruction, as decode circuit DCR1 has.

[0016] Next, the processing in the process switching circuit is described. First, the save block accesses the register control line via the switching control line and selects management register 1a, which is executing. The program counter value PC* is held in program counter PC1 of processor element PE1 and is then written over management register 1a via input data bus 1. The execution status flag stored in management register 1a is read onto the switching control line via the register control line, and the execution status flag, together with the processor element data PE* held there (the data designating processor element PE1), is switched to the standby state. Next, the selection algorithm block selects the next process to execute from the execution-waiting flags and generates a register group signal Grp# to designate the general-purpose register group corresponding to that process. A latch circuit holds the register group signal Grp# generated by the selection algorithm block while general-purpose register group Grp1b is selected, which in turn designates management register 1b. In the restore block, management register 1b is designated via the switching control line, and the value PC* of program counter PC1 stored in management register 1b is set in program counter PC1. Management register 1b is identified from the register group number that the selection algorithm block read out by bit number, and the standby flag stored in management register 1b is switched to the bit pattern that the execution status flag and processor element data have while processor element PE1 is executing. The data read from management register 1b and general-purpose register group Grp1b is operated on by arithmetic unit ALU1. When an execution restart instruction is issued from the instruction cache memory, the process switching operation is complete and the process is executed. Here, when there is data on the OS, the less frequently used data in the management registers and the general-purpose register groups is saved to the data cache memory or the instruction cache memory under program control or automatic control, and the required data is transferred from the cache memory to the management register group. This switching method can be implemented in either software or hardware, but a software implementation requires a switching process. Because, conventionally, several cycles were consumed when one process occupied the general-purpose register file, it is preferable to implement this method in hardware to achieve high speed. In this case, however, no hardware for division is added to the general-purpose register file itself; only process switching is performed. Although an ALU is described as the arithmetic unit in this embodiment, using a graphic accelerator instead, or using the ALU and a graphic accelerator in parallel, further reduces the software load, allows an instruction to be executed in one cycle, and enables high-performance arithmetic processing. With a graphic accelerator, each graphic instruction is regarded as a process; unlike processes on an ordinary CPU, these have no dependencies between processes, so high-speed processing is possible. Also, in this embodiment, because the processor element data PE* is stored in each executing management register, other processor elements can also be shared.
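The role of the processor element data PE* during a switch on one processor can be sketched in Python. The names `MgmtReg`, `owner`, and `switch_on_pe` are hypothetical, the PC values are made up, and picking the first waiting register is an assumed selection policy; the sketch only illustrates that a switch on PE1 leaves the register owned by PE2 untouched.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MgmtReg:
    name: str
    pc: int                # saved program counter value (PC*)
    owner: Optional[int]   # PE number while executing (PE*), None while waiting


def switch_on_pe(pe: int, pe_pc: int, regs: List[MgmtReg]) -> MgmtReg:
    """Switch the process running on processor element `pe`; return the new register."""
    # Save block: find the register owned by this PE, store its PC, release it.
    current = next(r for r in regs if r.owner == pe)
    current.pc = pe_pc
    current.owner = None

    # Selection block (assumed policy): first waiting register other than the old one.
    # Registers owned by other PEs are never candidates.
    candidate = next(r for r in regs if r.owner is None and r is not current)

    # Restore block: record this PE's number in the newly selected register.
    candidate.owner = pe
    return candidate
```

For example, with registers 1a (owned by PE1), 1b (waiting), and 2a (owned by PE2), a switch requested by PE1 selects 1b and leaves 2a running on PE2.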

[0017] (Embodiment 4) FIG. 7 shows a processor, applying the present invention, in which a plurality of program counters are provided for one processor so that it can support high-speed process switching in units of one cycle. Conventionally, in a processor that performs pipeline processing every cycle, such as a RISC processor, evaluation delays caused by conditional branch instructions and delays caused by memory loads and stores have led to the problem that an instruction that cannot enter the pipeline is usually turned into a delay slot, so that one cycle is consumed without anything being executed. This embodiment is a scheme for reducing that loss. A plurality of program counters, such as PC1 and PC2, are provided for one processor; they hold addresses indicating the process being executed and a process not being executed, and the program counter is switched cycle by cycle so that the instruction codes are used selectively. A selector is provided to choose between program counters PC1 and PC2, which switch between the executing and non-executing processes, and a PC* selector is provided to select the program counter value PC* in the management register associated with the general-purpose register group that corresponds to each of PC1 and PC2. These control the setting of the program counter value PC* into PC1 and PC2, as well as the management registers and the general-purpose register groups. Further, a delay slot detection circuit is provided between the instruction cache memory and the fetch circuit to detect when a delay slot is input as an instruction from outside. Because the delay slot detection circuit must operate the selector when it detects a delay slot, the delay slot detection circuit is connected to the selector. The latch circuits for the pipeline stages in the above embodiment may also be provided separately for PC1 and PC2.

[0018] FIG. 8(a) shows the instructions stored in the instruction cache memory, and FIG. 8(b) shows a diagram of instruction execution. In FIG. 8(a), DS1 and DS2 in the data executed under program counter PC1 are delay slots, and both contain the same instruction code, which does nothing. Program counter PC2 belongs to a process selected from those awaiting execution, which may be run at any time; although not particularly limited to this, after the PC1 process ends, PC2 becomes the main executing process. Before an instruction is fetched, when the address indicated by program counter PC1 points to delay slot DS1, the delay slot instruction is detected, the selector is switched to program counter PC2, and an instruction code that is not a delay slot is read as the fetch data. Likewise, for delay slot DS2, an instruction code executed under program counter PC2 is read as the fetch data. When an instruction code of program counter PC1 is executed, the arithmetic unit ALU operates on the general-purpose register group corresponding to PC1, and when an instruction code of program counter PC2 is executed, it operates on the general-purpose register group corresponding to PC2. In this way, by using program counter PC1 as an auxiliary program that fills the pipeline with instructions, one processor can execute one instruction per cycle. However, the instruction codes of program counter PC2 also contain delay slots, without exception, so delay slots must be detected for program counter PC2 as well. Since these delay slot cycles can be detected and consumed while program counter PC1 is executing, a delay slot almost never enters the fetch cycle.
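The delay slot replacement described above can be modelled with a short Python simulation of the fetch order. The instruction names, the `NOP` marker, and the function `issue_order` are hypothetical and are not taken from FIG. 8; the sketch also ignores branch semantics and pipeline timing and only shows which stream supplies the instruction in each cycle.

```python
from typing import List

NOP = "NOP"  # stands for the do-nothing instruction code found in a delay slot


def issue_order(main: List[str], aux: List[str]) -> List[str]:
    """Return the instructions issued per cycle when aux fills main's delay slots."""
    pc1 = 0  # program counter of the executing process
    pc2 = 0  # program counter of the standby process
    issued = []
    while pc1 < len(main):
        instr = main[pc1]
        pc1 += 1
        if instr != NOP:
            issued.append(instr)
            continue
        # Delay slot detected: the selector switches to the second program counter.
        # Delay slots in the standby stream itself are skipped at this point.
        while pc2 < len(aux) and aux[pc2] == NOP:
            pc2 += 1
        if pc2 < len(aux):
            issued.append(aux[pc2])
            pc2 += 1
        else:
            issued.append(NOP)  # nothing left to substitute; the cycle is lost
    return issued
```

With a main stream `["A1", NOP, "A2", NOP, "A3"]` and a standby stream `["B1", NOP, "B2"]`, the model issues `A1, B1, A2, B2, A3`, so every cycle carries a useful instruction.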

[0019] Next, the data transfer procedure for switching the program counter of the processor in this embodiment is described with reference to FIG. 7, FIG. 8(a), and FIG. 8(b). The process switching operation is the same as in Embodiment 1, so its description is omitted. The following assumes that the instruction is switched from delay slot DS1 of PC1 in FIG. 8(a) to operand OP22. First, a NOP instruction, which does nothing, is output from the instruction cache memory. The NOP instruction is read into the delay slot (DS) detection circuit, which detects that the instruction from the instruction cache memory is delay slot DS1. The detection circuit then instructs the selector to switch processes and transfers the data for switching the program counter, and the active program counter is switched from PC1 to PC2. The instruction fetched by the fetch circuit is transferred to the decode circuit DCR, where it is decoded and the operation of the arithmetic unit ALU is selected, and the operation data is transferred to the arithmetic unit ALU. The data output from the decode circuit DCR is also taken into the management registers and the general-purpose registers, OP22 is executed using management register 2a, and when the next instruction is issued from the instruction cache memory, control switches back to program counter PC1.

[0020] Although an ALU is described as the arithmetic unit in this embodiment, using a graphic accelerator instead, or using the ALU and a graphic accelerator in parallel, further reduces the software load and enables high-performance processing. In this embodiment, when the number of general-purpose register groups is small or when they are treated as special registers, management registers are not necessarily required as long as each general-purpose register is matched with its program counter, and two or more program counters may be provided. One program counter is always used for execution, and the other program counters are used for delay slot replacement.

【００２１】（実施例５）図９に本発明のマイクロプロ
セッサを適用したワークステーションの機能ブロック図
を示す。システムバスにより、本発明を適用した複数の
プロセッサＣＰＵ，メモリマネージメントユニットＭＭ
Ｕ，メインメモリ，グラフィックアクセラレータ，入出
力装置Ｉ／Ｏ，２ｎｄキャッシュメモリが接続されてい
る。そして、上記グラフィックアクセラレータとＣＲＴ
が接続され、上記入出力装置Ｉ／Ｏとネットワーク，キ
ーボード，ハードディスクとが接続されている。ここ
で、上記メモリマネージメントユニットＭＭＵは、階層
化したメモリと実メモリとを変換するためのものであっ
て、上記グラフィックアクセラレータは点，線，塗り潰
し，文字等の高速描画を命令により読み込み実行するも
のであり、上記グラフィックアクセラレータによる演算
結果はＶＲＡＭへドットデータ出力として書き込み、Ｃ
ＲＴに出力されて表示される。そして、上記入出力装置
Ｉ／Ｏは基本的に本発明のプロセッサＣＰＵからの命令
にしたがってワークステーション外部とのデータの入出
力を行なうが、ＤＭＡ転送の際にはメモリ−Ｉ／Ｏ間の
データ転送を行なう。さらに、上記入出力装置Ｉ／Ｏは
上記キーボードから入力された外部からの命令を取り込
むとともに、上記ネットワークにより他のワークステー
ション上との通信を行なう。そして、上記ハードディス
クは本発明のプロセッサから発生された図示していない
上記ハードディスクのハードディスクドライバに対して
このハードディスクの内容を取り出す量とそのメインメ
モリへの格納場所を指定する。ここで、本発明のプロセ
ッサは１つの１次キャッシュメモリを複数のプロセッサ
要素ＰＥ１〜ＰＥｎで共用できるために、従来ソフトウ
ェアで切り換えを行なっていたマルチスレッドの一部を
ハードウエアで切り換えることが可能となるので、メモ
リアクセス回数が低減でき、高速切り換えが可能とな
る。また、従来は内容を認識するまで他のプロセッサＣ
ＰＵを動作させることが不可能であったが、内部バスで
対応させることが可能となるので、プロセッサＣＰＵ内
で演算可能となるためにスケジューリングが高速に行な
える。さらに、外部割込みの時の実行プロセスを管理レ
ジスタの中に格納すると、割込み要求後のプロセス切り
換えを高速化することができる。(Embodiment 5) FIG. 9 shows a functional block diagram of a workstation to which the microprocessor of the present invention is applied. A system bus connects a plurality of processor CPUs to which the present invention is applied, a memory management unit MMU, a main memory, a graphic accelerator, an input/output device I/O, and a secondary cache memory. The graphic accelerator is connected to a CRT, and the input/output device I/O is connected to a network, a keyboard, and a hard disk. Here, the memory management unit MMU converts between hierarchical memory and real memory, and the graphic accelerator reads instructions and performs high-speed drawing of points, lines, fills, characters, and the like. The calculation result of the graphic accelerator is written to the VRAM as dot data and is output to the CRT and displayed. The input/output device I/O basically inputs and outputs data from and to the outside of the workstation according to instructions from the processor CPU of the present invention; during DMA transfer, however, it transfers data between the memory and the I/O. The input/output device I/O also takes in external commands input from the keyboard and communicates with other workstations via the network. For the hard disk, the amount of data to be extracted from the hard disk and its storage location in the main memory are specified to the hard disk driver (not shown) in response to a request generated from the processor of the present invention. In the processor of the present invention, one primary cache memory can be shared by a plurality of processor elements PE1 to PEn, so part of the multithread switching that was conventionally performed by software can be performed by hardware. The number of memory accesses can therefore be reduced and switching can be performed at high speed. In addition, conventionally another processor CPU could not operate until the contents were recognized; in the present invention this can be handled over the internal bus, so the calculation can be performed within the processor CPU and scheduling can be performed at high speed. Further, storing the execution process at the time of an external interrupt in the management register speeds up the process switching after the interrupt request.
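The management-register-based process switching described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the circuit of the invention: the class names `ManagementRegister` and `ProcessSwitcher`, the separate `running`/`waiting` flags, and the round-robin selection rule are all hypothetical choices made for illustration. The point it shows is that a switch only saves and restores the program counter and flips status flags, without moving general-purpose register contents to external memory.

```python
from dataclasses import dataclass


@dataclass
class ManagementRegister:
    """Hypothetical per-process entry: saved program counter and status flags."""
    pc: int = 0
    running: bool = False
    waiting: bool = False


class ProcessSwitcher:
    """Toy model of a process switch that touches only management registers."""

    def __init__(self, entry_points):
        # One management register per process; all start in the waiting state.
        self.mgmt = [ManagementRegister(pc=pc, waiting=True) for pc in entry_points]
        # Process 0 is assumed to be the initial execution process.
        self.current = 0
        self.mgmt[0].running, self.mgmt[0].waiting = True, False
        self.pc = self.mgmt[0].pc

    def switch(self):
        """Save the current PC, pick the next waiting process, restore its PC."""
        old = self.mgmt[self.current]
        old.pc = self.pc                      # hold the PC value in the management register
        old.running, old.waiting = False, True

        # Assumed selection algorithm: round-robin over the waiting flags.
        n = len(self.mgmt)
        for step in range(1, n + 1):
            candidate = (self.current + step) % n
            if self.mgmt[candidate].waiting:
                break

        new = self.mgmt[candidate]
        new.running, new.waiting = True, False
        self.pc = new.pc                      # restore the stored PC value
        self.current = candidate
        return candidate
```

For example, with three processes whose entry points are 0x100, 0x200 and 0x300, calling `switch()` after the first process has advanced to 0x104 selects process 1 and leaves 0x104 stored in the first management register, ready to be restored when that process is selected again.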

【００２２】[0022]

【発明の効果】大容量の汎用レジスタファイルをもつ論
理集積回路において、汎用レジスタファイルをプロセス
ごとに分割することにより、プロセス切り換えの時間を
大幅に短縮し、マルチプロセッサへの拡張を容易にし、
パイプライン処理に割り当てる命令コードの最適化によ
り、処理速度を大幅に短縮することができる。In a logic integrated circuit having a large-capacity general-purpose register file, dividing the general-purpose register file among processes greatly shortens the process switching time and facilitates expansion to a multiprocessor. In addition, optimizing the instruction codes assigned to pipeline processing can significantly reduce the processing time.
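The effect of dividing the general-purpose register file among processes can be sketched in software as follows. This is a minimal illustration under assumptions, not the hardware of the invention: the names `ManagementRegister` and `PartitionedRegisterFile`, the fixed equal group size, and the base-plus-offset mapping are hypothetical. It shows how the same register number in an instruction resolves to a different physical register for each process, so no register contents need to be saved to external memory on a switch.

```python
from dataclasses import dataclass


@dataclass
class ManagementRegister:
    """Hypothetical per-process entry describing its slice of the register file."""
    base: int  # index of the first physical register in the group
    size: int  # number of registers in the group


class PartitionedRegisterFile:
    """Toy register file split into equal groups, one group per process."""

    def __init__(self, total_registers, group_size):
        self.regs = [0] * total_registers
        # Only whole groups are created; leftover registers stay unassigned.
        self.mgmt = [
            ManagementRegister(base=b, size=group_size)
            for b in range(0, total_registers - group_size + 1, group_size)
        ]

    def physical(self, process, logical):
        """Translate the register number in an instruction to a physical index."""
        entry = self.mgmt[process]
        if not 0 <= logical < entry.size:
            # The use range is limited to the group assigned to the process.
            raise IndexError("register number outside the group of this process")
        return entry.base + logical

    def write(self, process, logical, value):
        self.regs[self.physical(process, logical)] = value

    def read(self, process, logical):
        return self.regs[self.physical(process, logical)]
```

With 128 registers and groups of 32, register 3 of process 0 and register 3 of process 1 map to physical registers 3 and 35, so both processes keep their values across a switch.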

[Brief description of drawings]

【図１】本発明を適用したプロセッサの機能ブロック
図。FIG. 1 is a functional block diagram of a processor to which the present invention is applied.

【図２】従来のＲＩＳＣプロセッサの機能ブロック図。FIG. 2 is a functional block diagram of a conventional RISC processor.

【図３】一般的なＲＩＳＣプロセッサのパイプライン処
理の概念図。FIG. 3 is a conceptual diagram of pipeline processing of a general RISC processor.

【図４】本発明のプロセッサにおいて実行状態フラグを
読み書きするための方法を示す図。FIG. 4 illustrates a method for reading and writing execution status flags in a processor of the present invention.

【図５】実施例２の管理レジスタに汎用レジスタグルー
プのデータを格納したプロセッサの機能ブロック図。FIG. 5 is a functional block diagram of a processor in which data of a general-purpose register group is stored in a management register according to the second embodiment.

【図６】実施例３の複数の演算器と複数のプログラムカ
ウンタを設けたマルチプロセッサの機能ブロック図。FIG. 6 is a functional block diagram of a multiprocessor including a plurality of arithmetic units and a plurality of program counters according to the third embodiment.

【図７】実施例４の１つのプロセッサに対して複数のプ
ログラムカウンタを設けたプロセッサの機能ブロック
図。FIG. 7 is a functional block diagram of a processor in which a plurality of program counters are provided for one processor according to the fourth embodiment.

【図８】実施例４の１つのプロセッサに対して複数のプ
ログラムカウンタを設けたプロセッサの命令キャッシュ
メモリの命令内容とパイプライン処理を示す図。FIG. 8 is a diagram showing instruction contents and pipeline processing of an instruction cache memory of a processor in which a plurality of program counters are provided for one processor of the fourth embodiment.

【図９】実施例５の本発明を適用したワークステーショ
ンシステムの機能ブロック図。FIG. 9 is a functional block diagram of a workstation system according to a fifth embodiment of the invention.

[Explanation of symbols]

ＰＣ・・・プログラムカウンタ，Ｒｅｇ・・・汎用レジスタ，
ＤＣＲ・・・デコード回路，ＢＬＫ・・・ブロック，Ｇｒｐ・・
・汎用レジスタグループ，ＰＣ＊・・・プログラムカウンタ
の値，ＡＬＵ・・・演算器，Ｇｒｐ＃・・・レジスタグループ
信号，ＣＰＵ・・・プロセッサ，ｇｒｐ＊・・・汎用レジスタ
グループデータ，ＰＥ＊・・・プロセッサエレメントデー
タ，ＰＥ・・・プロセッサ要素，ＭＭＵ・・・メモリマネージ
メントユニット，Ｉ／Ｏ・・・入出力装置。PC: program counter, Reg: general-purpose register, DCR: decode circuit, BLK: block, Grp: general-purpose register group, PC*: program counter value, ALU: arithmetic unit, Grp#: register group signal, CPU: processor, grp*: general-purpose register group data, PE*: processor element data, PE: processor element, MMU: memory management unit, I/O: input/output device.

Claims

[Claims]

1. A logic integrated circuit comprising, in one chip: a general-purpose register file configured by general-purpose register groups each including a plurality of general-purpose registers; a management register group including a plurality of management registers for storing information on the processes assigned to each of the general-purpose registers and information on switching of the processes; and a memory, wherein the management register holds an instruction for performing a read operation and a write operation between the general-purpose registers in the general-purpose register file and the memory, the management register has a function of being divided for each process, the general-purpose registers are assigned by the management register for each execution process, when different execution processes designate the same general-purpose register at the time of instruction input, the designation is directed to the general-purpose register assigned to each execution process, and the use range of the general-purpose registers is limited for each process by referring, from the register designation at the time of instruction input, to the data stored in the management register.

2. The logic integrated circuit according to claim 1, wherein information about the process in the execution state and the processes in the execution wait state is stored in each of the general-purpose registers in the general-purpose register file.

3. The logic integrated circuit according to claim 1, wherein the logic integrated circuit has a plurality of processors and a plurality of processes are simultaneously executed by the general-purpose register.

4. The logic integrated circuit according to claim 1, comprising: a program counter indicating an instruction code of the execution process; a program counter indicating an instruction code of a process, other than the execution process, in the execution wait state; an instruction code detection circuit; and a program counter switching circuit, wherein the program counter of the execution process is used in normal processing, an instruction code that is not executed in the pipeline is detected by the instruction code detection circuit before the instruction is read, the program counter switching circuit switches to the program counter indicating the instruction code of the process in the execution wait state, and the instruction code of the other process is inserted in its place, so that at the time of instruction input the instruction code is replaced with one that can be executed in the pipeline.

5. The logic integrated circuit according to claim 1, wherein switching of the process is performed by a process switching circuit having: a function of selecting a management register, holding the value of the program counter in it, and overwriting the management register to switch the execution state flag to the execution wait state flag; a function of transferring data according to a selection algorithm that selects the next execution process from the execution wait state flags and designates the general-purpose register group corresponding to that process; and a data restoring function of designating the management register of the process to be executed next, reading the program counter value stored therein, and setting it in the program counter, and wherein, in the selection algorithm, the management register having the information of the next execution process is calculated by bit number from the general-purpose register group that was read, and the execution wait state flag stored at the same bit is switched to the execution state flag at the time of execution.

6. A data processing system in which a system bus connects a plurality of processors, a memory management unit, a main memory, a graphic accelerator, an input/output device, and a cache memory; the graphic accelerator is connected to a CRT, and the input/output device is connected to a network, a keyboard, and a hard disk; the memory management unit converts between hierarchical memory and real memory; the graphic accelerator reads instructions and performs high-speed drawing, writes the calculation result to a VRAM as dot data, and outputs it to the CRT for display; the input/output device inputs and outputs data from and to the outside according to instructions from the processor, transfers data between the memory and the input/output device during DMA transfer, takes in external commands input from the keyboard, and communicates with other systems over the network; and, for the hard disk, the amount of data to be extracted from the hard disk and the storage location in the main memory are designated to a hard disk driver in response to a request generated from the processor, wherein the processor is a logic integrated circuit having, in one chip, a general-purpose register file configured by general-purpose register groups each including a plurality of general-purpose registers, a management register group composed of a plurality of management registers for storing information on the processes assigned to each of the general-purpose registers and information on switching of the processes, and a memory, wherein the management register holds information for performing a read operation and a write operation between the general-purpose registers in the general-purpose register file and the memory, the management register has a function of being divided for each process, the general-purpose registers are assigned by the management register for each execution process, when different execution processes designate the same general-purpose register at the time of instruction input, the designation is directed to the general-purpose register assigned to each execution process, and the use range of the general-purpose registers is limited for each process by referring, from the register designation at the time of instruction input, to the data stored in the management register.