JP3737573B2 - VLIW processor

Info

Publication number: JP3737573B2
Application number: JP23769496A
Authority: JP
Inventors: 隆二境
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1996-09-09
Filing date: 1996-09-09
Publication date: 2006-01-18
Anticipated expiration: 2016-09-09
Also published as: JPH1083302A

Description

【０００１】
【Technical Field of the Invention】
The present invention relates to a VLIW processor that executes a very long instruction word (VLIW) having a plurality of instruction fields.
【０００２】
【従来の技術】
従来より、プロセッサの性能を向上させるために、プロセッサの動作周波数を速くするための試みが行われてきたが、回路の集積度、消費電力、素子のスピードなどから、物理的限界に近づいてきている。そこで今日では、より高速な処理を実現するために、スーパースカラや、長命令語（ＶＬＩＷ）といった複数の命令を同時（並列）に実行するアーキテクチャを採用しているプロセッサが開発され、広く利用されるようになっている。
【０００３】
さて、長命令語を実行するプロセッサ（ＶＬＩＷプロセッサ）では、その長命令語中の各命令（単位命令）で使用可能なレジスタ数を２ⁿ とした場合、その命令（命令フィールド）中で１つのレジスタ（デスティネーションレジスタまたはソースレジスタ）を指定するにはｎビットのレジスタ指定部を必要とする。このため、例えば３つのオペランドを扱う命令の例では、図１６（ａ）に示すように、ｎビットのレジスタ指定部を３オペランド分必要とし、１命令全体ではレジスタ指定のために必要なビット数は３ｎビットとなる。
【０００４】
もし、メモリアクセス回数を少なくして高速処理を実現するために、扱えるレジスタ数を２倍の２ⁿ⁺¹ 個にしようとすると、各命令中のレジスタ指定部（ここでは３つのレジスタ指定部ＯＰ１〜ＯＰ３）のビット数を、図１６（ａ）に示すｎビットから、図１６（ｂ）に示すようにｎ＋１ビットに増やす必要があり、３オペランドの例では、１命令全体で３ビット増やさなければならない。
【０００５】
一方、上記の長命令語を実行するプロセッサ（ＶＬＩＷプロセッサ）で、より高い性能を実現しようとすると、同時実行可能命令（単位命令）数を増やして、並列度を上げる必要がある。並列度を上げるには、長命令語長を伸ばして命令フィールド数を増やせばよい。しかし、長命令語中の命令フィールド数（命令数）を増やすと、一度に（１サイクルで）読み出さなければならないレジスタ数（レジスタファイルのポート数）も増大し、またパイプライン処理に必要なバイパス回路（例えば、演算結果をパイプラインの書き込みステージを経ずに演算器側に導くためのバイパス回路）の規模も大きくなる。このため、ハードウェアの複雑度が増大し、動作周波数を速くするのを妨げる要因となる。
【０００６】
図１７は、このような例を、４命令並列に実行可能なＶＬＩＷプロセッサのパイプライン構成について示す。ここでは、並列実行可能な命令数４に一致する数の２入力１出力の演算器２２１-0〜２２１-3と、その演算器２２１-0〜２２１-3の演算結果を一時保持するバッファ２２２-0〜２２２-3からなるラッチ回路２２２と、このバッファ２２２-0〜２２２-3の出力を演算器２２１-0〜２２１-3の左側入力（Ｌ入力）または右側入力（Ｒ入力）に選択的にバイパスするためのバイパス回路２２３と、並列実行可能な命令数４に一致する数の入力ポート並びにその２倍の数の出力ポートを持つレジスタファイル２２４とが設けられる。
【０００７】
バイパス回路２２３は、演算器２２１-0〜２２１-3の左側入力に対応して設けられたマルチプレクサ（ＭＰＸ）２２３Ｌ0 〜２２３Ｌ3 と、演算器２２１-0〜２２１-3の右側入力に対応して設けられたマルチプレクサ２２３Ｒ0 〜２２３Ｒ3 とから構成される。マルチプレクサ２２３Ｌ0 〜２２３Ｌ3 ，２２３Ｒ0 〜２２３Ｒ3 は、レジスタファイル２２４のそれぞれ異なる出力ポートと１対１で対応しており、対応する出力ポートからの出力及びバッファ２２２-0〜２２２-3の出力の１つを選択して演算器２２１-0〜２２１-3の対応する入力側に出力する。
【０００８】
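Thus, when four instructions can be executed in parallel, the bypass circuit consists of 2 × 4 multiplexers, each with 4 + 1 = 5 inputs. The total number of multiplexer inputs in the bypass circuit, that is, the number of bypass-circuit input ports, is therefore 5 × 2 × 4 = 40.
【０００９】
In general, consider a conventional VLIW processor whose long instruction word has N instruction fields, that is, whose number of simultaneously executed instructions (degree of parallelism) is N, and whose instructions use a three-operand format. Its register file has N input ports and 2N output ports. Its bypass circuit has (N + 1) × 2N input ports (each of the 2N multiplexers that make up the bypass circuit has N + 1 inputs) and 2N output ports.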
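The counts above can be checked with a short calculation. The following Python sketch is illustrative only: the function name `conventional_port_counts` and the dictionary keys are assumptions introduced here, not terms from the specification.

```python
def conventional_port_counts(n_fields: int) -> dict:
    """Port counts for a conventional N-way VLIW with three-operand instructions."""
    mux_count = 2 * n_fields   # one multiplexer per arithmetic-unit input (L and R)
    mux_inputs = n_fields + 1  # N bypassed results plus one register-file output port
    return {
        "rf_input_ports": n_fields,
        "rf_output_ports": 2 * n_fields,
        "bypass_input_ports": mux_inputs * mux_count,
        "bypass_output_ports": mux_count,
    }


if __name__ == "__main__":
    for n in (4, 8, 16):
        print(n, conventional_port_counts(n))
```

For N = 4 this reproduces the 40 bypass input ports derived for Figure 17. The input-port count is quadratic in N, which is the growth the invention sets out to avoid.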
【００１０】
【発明が解決しようとする課題】
上記したように、従来のＶＬＩＷプロセッサでは、扱えるレジスタ数を増やそうとすると、命令（単位命令）中のレジスタ指定部のビット数を増やさなければならず、長命令語（ＶＬＩＷ）長を伸ばさなければならないという問題があった。
【００１１】
また従来のＶＬＩＷプロセッサでは、高並列度にすると、一度に読み出さなければならないレジスタ数が増加してレジスタファイルのポート数の増加を招き、更にパイプライン処理に必要なバイパス回路の規模が大きくなってハードウェアの複雑度が増大するという問題があった。
【００１２】
本発明は上記事情を考慮してなされたものでその目的は、扱えるレジスタ数が長命令語（ＶＬＩＷ）長を伸ばすことなく増やすことができ、しかもハードウェア構成の複雑化を招かないで済む高性能なＶＬＩＷプロセッサを提供することにある。
【００１３】
【課題を解決するための手段】
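In a VLIW processor that executes a long instruction word (VLIW) having a plurality of instruction fields, the present invention is characterized by comprising a plurality of register files and allocation means. Based on a first predetermined portion of the field number of each instruction field in the long instruction word, the allocation means allocates the register file from which the source operands referenced by the instruction in that field can be read. Based on a second predetermined portion of the field number that differs at least in part from the first predetermined portion, it allocates the register file to which the execution result of the instruction in that field can be written. Taking the allocation of the writable register file as an example, the register file may also be allocated from the concatenation of the second predetermined portion and a part (the high-order bits) of the destination register specifier of each instruction field. The first and second predetermined portions may also partially overlap.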
【００１４】
上記複数のレジスタファイル内の各レジスタには、それぞれ固有のレジスタ番号が付けられている。そこで、この固有のレジスタ番号を持つレジスタの指定のためには、長命令語中の各命令フィールドのソースレジスタ指定部の示すレジスタ番号を、その命令フィールドのフィールド番号の上記第１の所定部分により修飾し、上記各命令フィールドのデスティネーションレジスタ指定部の示すレジスタ番号を、その命令フィールドのフィールド番号の前記第１の所定部分とは異なる第２の所定部分により修飾するとよい。また、このレジスタ番号の修飾には、命令フィールドのソースレジスタ指定部の示すレジスタ番号の上位に、その命令フィールドのフィールド番号の第１の所定部分を付加し、各命令フィールドのデスティネーションレジスタ指定部の示すレジスタ番号の上位に、その命令フィールドのフィールド番号の第２の所定部分を付加する方法を適用するとよい。
【００１５】
上記構成のＶＬＩＷプロセッサにおいては、長命令語の各命令フィールド（の命令）のフィールド番号により、その命令フィールドの命令で参照するソースオペランドの読み出しが可能なレジスタファイルと、その命令フィールドの命令の実行結果の書き込みが可能なレジスタファイルとが決められるため、各命令フィールドのレジスタ指定部（ソースレジスタ指定部、デスティネーションレジスタ指定部）では、そのレジスタファイル内のレジスタ位置（相対位置、相対レジスタ番号）を指定するだけでよく、長命令語全体で扱えるレジスタ数を増やしても、命令フィールドのレジスタ指定部の構成ビット数を増やさなくても済む。また、レジスタファイルの決定に、フィールド番号だけでなく、レジスタ指定部の一部（上位ビット）を用いる場合には、命令フィールドのレジスタ指定部の構成ビット数を増やす必要があるが、フィールド番号を利用しない場合に比べれば、増加するビット数は少なくて済む。
【００１６】
しかも上記構成のＶＬＩＷプロセッサにおいては、長命令語の各命令フィールド毎に、ソース指定とデスティネーション指定のそれぞれについて、対象となるレジスタファイルを制限しているため、従来に比べてレジスタファイルの入力ポート数及び出力ポート数を減らすことが可能となる。同様の理由で、バイパス回路についても、入力ポート数及び出力ポート数を減らすことが可能となる。これにより、並列度を上げても（長命令語中の命令フィールド数を増やしても）ハードウェアの複雑度が著しく増大するのを防ぐことができる。
【００１７】
更に、上記構成のＶＬＩＷプロセッサにおいては、長命令語の各命令フィールド毎に使用可能なレジスタファイルを制限していながら、ソース指定では、各命令フィールドのフィールド番号の第１の所定部分を用いたレジスタ修飾が、デスティネーション指定では、この第１の所定部分とは異なる第２の所定部分を用いたレジスタ修飾が適用されることから、ある命令フィールドの命令の演算結果を他の命令フィールドの命令でも参照できる。
【００１８】
このようなＶＬＩＷプロセッサで実行可能なプログラム（オブジェクトプログラム）、即ち長命令語中の各命令フィールドのフィールド番号によるレジスタ番号の修飾により、各命令フィールド毎に（ソース指定とデスティネーション指定のそれぞれについて）使用可能なレジスタファイルを制限することを可能とする命令語形式に従ったオブジェクトプログラムを生成するには、以下に述べる命令スケジュールとレジスタアロケーション（レジスタ割り当て）を行うコンパイル機能を用意すればよい。
【００１９】
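For example, when instructions are scheduled top-down and the instruction being scheduled is instruction I, the compiler examines the field numbers of the instructions that define the source operands referenced by instruction I (that is, the instruction fields in which those instructions have already been placed) and judges whether the field numbers and the source operands match. For a three-operand instruction, this means checking whether the destination register files determined by the field numbers of the two instructions that define the two source operands are the same. If they match, instruction I is placed in an instruction field that can specify registers of that register file as sources. If they do not match, the compiler generates a copy instruction that copies a source operand between register files so that they do match, places the copy instruction in the instruction field determined by its source and destination register files, and then places instruction I in an instruction field that can specify the copy destination register file as its source register file.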
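The top-down matching step can be sketched as follows. This is a minimal illustration, not the compiler of the specification. It assumes the 4-way field layout of the first embodiment described later (high-order field-number bit B0 selects the source register file, low-order bit B1 the destination register file). The names `src_file`, `dst_file`, and `plan_top_down` are hypothetical.

```python
N_FIELDS = 4  # assumed: 4-way long instruction word, register files 0 and 1


def src_file(field: int) -> int:
    """Register file a field may read from (bit B0 of the field number)."""
    return (field >> 1) & 1


def dst_file(field: int) -> int:
    """Register file a field may write to (bit B1 of the field number)."""
    return field & 1


def plan_top_down(defining_fields):
    """Plan the placement of instruction I.

    defining_fields: fields of the already placed instructions that define
    I's source operands. Returns (copies, candidate_fields), where each copy
    is (from_file, to_file, field_for_copy).
    """
    files = [dst_file(f) for f in defining_fields]
    target = files[0]  # assumption: gather all operands in the first operand's file
    copies = []
    for rf in files[1:]:
        if rf != target:
            # A copy must read file rf and write file target.
            copy_field = next(f for f in range(N_FIELDS)
                              if src_file(f) == rf and dst_file(f) == target)
            copies.append((rf, target, copy_field))
    candidates = [f for f in range(N_FIELDS) if src_file(f) == target]
    return copies, candidates


if __name__ == "__main__":
    print(plan_top_down([0, 2]))  # both operands already in file 0
    print(plan_top_down([0, 1]))  # operands in files 0 and 1: one copy needed
```

In the second call the operand held in register file 1 has to be copied into register file 0, and the only field that reads file 1 and writes file 0 under this layout is field #2. A real scheduler would also choose which operand to move and account for the extra cycle the copy costs; the sketch ignores both.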
【００２０】
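When instructions are scheduled bottom-up and the instruction being scheduled is instruction I, the compiler examines the field numbers of all instructions that use the virtual register (variable) a defined by instruction I (that is, the instruction fields in which those instructions have already been placed). It judges whether the field numbers of the instructions that use a determine exactly one register file that can be specified as a source, regardless of how many such instructions there are. This state is described as the virtual register and the field numbers matching. If they match, instruction I is placed in an instruction field determined by the field numbers of the instructions that use a. If they do not match, the compiler generates copy instructions that copy a to the target register files so that a is present in every register file determined by the field numbers of the instructions that use a, places each copy instruction in the instruction field determined by its source and destination register files, and then places instruction I in an instruction field that can specify the copy source register file as its destination.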
【００２１】
以上の命令スケジュール処理を、トップダウン方式の場合であれば始端命令から順に終端命令まで行い、ボトムアップ方式であれば終端命令から始端命令まで行うと、スケジュールされた各命令をスキャンして、各変数（仮想レジスタ）が参照或いは定義される命令のフィールド番号から、全ての変数をレジスタファイル別にクラス分けし、各クラスの各変数について、クラス別に、そのクラスに対応するレジスタファイル内の物理レジスタを割り当てるレジスタアロケーション処理を行えばよい。
【００２２】
【発明の実施の形態】
以下、本発明の実施の形態につき図面を参照して説明する。
［第１の実施形態］
図１は本発明の第１の実施形態に係るＶＬＩＷプロセッサの概略構成を示すブロック図である。
【００２３】
図１に示すＶＬＩＷプロセッサは、例えば３オペランド命令形式の４つの命令フィールド＃０〜＃３を持つ４並列の長命令語（４並列ＶＬＩＷ）を実行する演算プロセッサであり、命令フェッチ機構１０１、命令デコード機構１０２、パイプラインレジスタ（ＰＲ）１０３〜１０５、演算器１０６-0〜１０６-3、レジスタファイル１０７-0，１０７-1、デコード（Ｄ）ステージのバイパス回路１０８-0，１０８-1、実行（Ｅ）ステージのバイパス回路１０９-0，１０９-1、及びラッチ回路１１０，１１１を備えている。
【００２４】
命令フェッチ機構１０１は、（図示せぬ命令キャッシュ等から）パイプラインで長命令語をフェッチする（読み出す）Ｉステージ（命令フェッチステージ）を司る。
【００２５】
命令デコード機構１０２は、命令フェッチ機構１０１によりフェッチされた長命令語の命令フィールド＃０〜＃３に配置されている各命令（単位命令）をパイプラインで解読するＤステージ（命令デコードステージ）を司る。本実施形態では、３オペランドの命令形式の命令、即ち３つのレジスタ指定部（デスティネーションレジスタ指定部、第１及び第２ソースレジスタ指定部）を持つ命令（例えば演算命令）が用いられる。したがって、命令デコード機構１０２により演算命令がデコードされた場合、そのデコード結果には、演算結果の格納先を示すディスティネーションレジスタ番号（ＯＰ１）、及び演算に使用するソースオペランドが格納されているレジスタを指定する２つのソースレジスタ番号（第１及び第２ソースレジスタ番号ＯＰ２，ＯＰ３）が含まれる。
【００２６】
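Pipeline register 103 holds the long instruction word fetched by instruction fetch mechanism 101 for the duration of the D stage. Pipeline register 104 holds the decode result of instruction decode mechanism 102 for the duration of the E stage (instruction execution stage), which follows the D stage. Pipeline register 105 holds the output of pipeline register 104 for the duration of the W stage (write stage), which follows the E stage.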
【００２７】
演算器１０６-0〜１０６-3は、長命令語中の命令フィールド＃０〜＃３の命令の指示する演算の実行（Ｅステージ）を司る。
レジスタファイル１０７-0（＃０），１０７-1（＃１）は、ＶＬＩＷプロセッサでの演算結果を記憶するための、それぞれ２ⁿ 個のレジスタから構成される。レジスタファイル１０７-0内の２ⁿ 個のレジスタには、それぞれ０〜２ⁿ −１のレジスタ番号が割り当てられ、レジスタファイル１０７-1内の２ⁿ 個のレジスタには、それぞれ２ⁿ 〜２×２ⁿ −１のレジスタ番号、即ち２ⁿ 〜２ⁿ⁺¹ −１のレジスタ番号が割り当てられている。ここで、レジスタ番号のビット数はｎ＋１ビットであり、最上位ビットにより該当するレジスタが存在するレジスタファイルが指定され（“０”の場合はレジスタファイル１０７-0、“１”の場合はレジスタファイル１０７-1）、残りのｎビット（下位ｎビット）により、そのレジスタファイル内のレジスタ位置が指定される。
【００２８】
一方、図１のＶＬＩＷプロセッサで適用される長命令語中の各命令の３つのレジスタ指定部のビット長はｎビットである。この場合、レジスタ指定部だけでは、レジスタファイル１０７-0，１０７-1により提供される合計２ⁿ⁺¹ 個のレジスタを指定することはできない。
【００２９】
そこで本実施形態では、以下に述べるように、長命令語の各命令フィールド＃０〜＃３毎（で且つソース指定とデスティネーション指定の別毎）に使用可能なレジスタをレジスタファイル１０７-0または１０７-1の一方に制限し、その命令フィールドの命令中のｎビットの各レジスタ指定部により、その制限されたレジスタファイル内のレジスタ位置（ｎ＋１ビットのレジスタ番号の最上位ビットを除くｎビット）が示される構成とすることにより、レジスタ指定部のビット長がｎビットでありながら、長命令語全体で２ⁿ⁺¹ 個のレジスタを指定できるようにしている。
【００３０】
まず本実施形態では、命令フィールド＃０，＃１（フィールド番号０，１）の命令の指示する演算に用いるソースオペランドの参照先には、レジスタファイル１０７-0が固定的に割り当てられ、命令フィールド＃２，＃３（フィールド番号２，３）の命令の指示する演算に用いるソースオペランドの参照先には、レジスタファイル１０７-1が固定的に割り当てられる。
【００３１】
また、命令フィールド＃０，＃２（フィールド番号０，２）の命令の指示する演算の実行結果、即ち演算器１０６-0〜１０６-3のうちの演算器１０６-0，１０６-2の演算結果の書き込み先には、レジスタファイル１０７-0が固定的に割り当てられ、命令フィールド＃１，＃３（フィールド番号１，３）の命令の指示する演算の実行結果、即ち演算器１０６-0〜１０６-3のうちの演算器１０６-1，１０６-3の演算結果の書き込み先には、レジスタファイル１０７-1が固定的に割り当てられる。
【００３２】
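The above allocation is realized by the following connections. The outputs of arithmetic units 106-0 and 106-2 are connected to the input ports of register file 107-0, and the outputs of arithmetic units 106-1 and 106-3 are connected to the input ports of register file 107-1, in each case via latch circuit 111. The output ports of register file 107-0 are connected to the inputs of arithmetic units 106-0 and 106-1 (via bypass circuit 108-0, latch circuit 110, and bypass circuit 109-0), and the output ports of register file 107-1 are connected to the inputs of arithmetic units 106-2 and 106-3 (via bypass circuit 108-1, latch circuit 110, and bypass circuit 109-1). The outputs of arithmetic units 106-0 and 106-2 are also connected to bypass circuits 108-0 and 109-0, and the outputs of arithmetic units 106-1 and 106-3 are also connected to bypass circuits 108-1 and 109-1.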
【００３３】
以上の構成により本実施形態では、長命令語中の各命令のｎビットのレジスタ指定部により、レジスタファイル内のレジスタ位置、即ちｎ＋１ビットのレジスタ番号の最上位ビットを除くｎビットが指定され、そのレジスタ位置のレジスタが存在するレジスタファイルの情報（レジスタファイル１０７-0または１０７-1のいずれに存在するかの情報）、即ちｎ＋１ビットのレジスタ番号の最上位ビットは、命令位置（命令フィールド番号）により決定されることになる。
【００３４】
これは、各命令フィールドの３つのレジスタ指定部（デスティネーションレジスタ指定部、第１及び第２ソースレジスタ指定部）で指定されるｎビットのレジスタ番号（ＯＰ１，ＯＰ２，ＯＰ３）を命令位置（命令フィールド番号）により修飾して、ｎ＋１ビットのレジスタ番号として指定することと等価である。
【００３５】
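Here, if the field numbers 0 (“00”) to 3 (“11”) of instruction fields #0 to #3 of the long instruction word are written as two bits “B0 B1”, bit B0 is used to qualify source register numbers (to select the source register file) and bit B1 is used to qualify destination register numbers (to select the destination register file). As source registers, instruction fields #0 and #1, for which B0 is “0”, specify registers in register file 107-0, and instruction fields #2 and #3, for which B0 is “1”, specify registers in register file 107-1. As destination registers, instruction fields #0 and #2, for which B1 is “0”, specify registers in register file 107-0, and instruction fields #1 and #3, for which B1 is “1”, specify registers in register file 107-1.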
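The qualification just described amounts to prepending one field-number bit to each n-bit register specifier. The Python sketch below shows this under the first embodiment's layout. The function name `qualify` and the choice n = 4 in the example are assumptions made for illustration.

```python
def qualify(field: int, op1: int, op2: int, op3: int, n: int):
    """Turn n-bit specifiers into (n+1)-bit register numbers for a 4-way word.

    field: field number 0..3, read as two bits "B0 B1" (B0 is the high bit).
    op1: destination specifier; op2, op3: first and second source specifiers.
    """
    b0 = (field >> 1) & 1  # selects the source register file
    b1 = field & 1         # selects the destination register file
    mask = (1 << n) - 1    # keep only the n specifier bits
    dest = (b1 << n) | (op1 & mask)
    src1 = (b0 << n) | (op2 & mask)
    src2 = (b0 << n) | (op3 & mask)
    return dest, src1, src2


if __name__ == "__main__":
    n = 4  # assumed: 16 registers per register file
    for field in range(4):
        print(field, qualify(field, op1=3, op2=5, op3=7, n=n))
```

With n = 4, field #1 writes register 19 (position 3 of register file 107-1) and reads registers 5 and 7 (register file 107-0), while field #2 writes register 3 and reads registers 21 and 23.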
【００３６】
レジスタファイル１０７-0は、並列実行可能な命令数４の半分である２つの入力ポートと、入力ポート数の２倍の４つの出力ポート（Ｐ00，Ｐ01，Ｐ02，Ｐ03）とを持つ。レジスタファイル１０７-1もまた、並列実行可能な命令数４の半分である２つの入力ポートと、入力ポート数の２倍の４つの出力ポート（Ｐ10，Ｐ11，Ｐ12，Ｐ13）とを持つ。
【００３７】
バイパス回路１０８-0はＤステージに対応するもので、演算器１０６-0，１０６-1での演算（命令フィールド＃０，＃１の命令の指示する演算）に用いられる４つのソースオペランドとして、基本的には命令フィールド＃０，＃１の命令中の各ソースレジスタ指定部の示すレジスタファイル１０７-0内のレジスタから読み出されるデータを選択する。但し、ソースレジスタが演算器１０６-0または１０６-2の演算結果の格納先レジスタ（デスティネーションレジスタ）に一致するものについては、バイパス回路１０８-0は、そのレジスタのデータではなくて、その演算結果、即ち２サイクル前の命令の演算結果を選択するＤステージバイパスを行う。
【００３８】
バイパス回路１０８-1もバイパス回路１０８-0と同様にＤステージに対応するもので、演算器１０６-2，１０６-3での演算（命令フィールド＃２，＃３の命令の指示する演算）に用いられる４つのソースオペランドとして、基本的には命令フィールド＃２，＃３の命令中の各ソースレジスタ指定部の示すレジスタファイル１０７-1内のレジスタから読み出されるデータを選択する。但し、ソースレジスタが演算器１０６-1または１０６-3の演算結果の格納先レジスタ（デスティネーションレジスタ）に一致するものについては、バイパス回路１０８-1は、そのレジスタのデータではなくて、その演算結果、即ち２サイクル前の命令の演算結果を選択するＤステージバイパスを行う。
【００３９】
バイパス回路１０９-0はＥステージに対応するもので、演算器１０６-0，１０６-1での演算（命令フィールド＃０，＃１の命令の指示する演算）に用いられる４つのソースオペランドとして、基本的にはバイパス回路１０８-0からラッチ回路１１０を介して導かれるデータを選択する。但し、命令フィールド＃０，＃１の命令中の各ソースレジスタ指定部の示すソースレジスタのうち、演算器１０６-0または１０６-2の演算結果の格納先レジスタ（デスティネーションレジスタ）に一致するものについては、バイパス回路１０９-0は、そのレジスタのデータ（バイパス回路１０８-0からラッチ回路１１０を介して導かれるデータ）ではなくて、その演算結果、即ち１サイクル前（直前）の命令の演算結果を選択するＥステージバイパスを行う。
【００４０】
バイパス回路１０９-1もバイパス回路１０９-0と同様にＥステージに対応するもので、演算器１０６-2，１０６-3での演算（命令フィールド＃２，＃３の命令の指示する演算）に用いられる４つのソースオペランドとして、基本的にはバイパス回路１０８-1からラッチ回路１１０を介して導かれるデータを選択する。但し、命令フィールド＃２，＃３の命令中の各ソースレジスタ指定部の示すソースレジスタのうち、演算器１０６-1または１０６-3の演算結果の格納先レジスタ（デスティネーションレジスタ）に一致するものについては、バイパス回路１０９-1は、そのレジスタのデータ（バイパス回路１０８-1からラッチ回路１１０を介して導かれるデータ）ではなくて、その演算結果（１サイクル前の命令の演算結果）を選択するＥステージバイパスを行う。
【００４１】
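Latch circuit 110 holds the source operands selected by bypass circuits 108-0 and 108-1 for the duration of the E stage, and latch circuit 111 holds the operation results of arithmetic units 106-0 to 106-3 for the duration of the W stage.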
【００４２】
図２は、バイパス回路１０９-0，１０９-1の内部構成を、その周辺の構成と共に示す。
バイパス回路１０９-0は、演算器１０６-0，１０６-1の左側入力に対応して設けられたマルチプレクサ（ＭＰＸ）１１９Ｌ0 ，１１９Ｌ1 と、演算器１０６-0，１０６-1の右側入力に対応して設けられたマルチプレクサ（ＭＰＸ）１１９Ｒ0 ，１１９Ｒ1 とから構成される。
【００４３】
マルチプレクサ１１９Ｌ0 ，１１９Ｒ0 ，１１９Ｌ1 ，１１９Ｒ1 は、レジスタファイル１０７-0の出力ポートＰ00，Ｐ01，Ｐ02，Ｐ03と１対１で対応しており、対応する出力ポートから読み出されて（図１中のバイパス回路１０８-0、ラッチ回路１１０を介して）導かれるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-0，１０６-2の演算結果の１つを選択して演算器１０６-0，１０６-1の対応する入力側に出力する。
【００４４】
バイパス回路１０９-1は、演算器１０６-2，１０６-3の左側入力に対応して設けられたマルチプレクサ（ＭＰＸ）１１９Ｌ2 ，１１９Ｌ3 と、演算器１０６-2，１０６-3の右側入力に対応して設けられたマルチプレクサ（ＭＰＸ）１１９Ｒ2 ，１１９Ｒ3 とから構成される。
【００４５】
マルチプレクサ１１９Ｌ2 ，１１９Ｒ2 ，１１９Ｌ3 ，１１９Ｒ3 は、レジスタファイル１０７-1の出力ポートＰ10，Ｐ11，Ｐ12，Ｐ13と１対１で対応しており、対応する出力ポートから読み出されて（図１中のバイパス回路１０８-1、ラッチ回路１１０を介して）導かれるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-1，１０６-3の演算結果の１つを選択して演算器１０６-2，１０６-3の対応する入力側に出力する。
【００４６】
ラッチ回路１１１は、演算器１０６-0〜１０６-3の演算結果をＷステージの期間保持しておくバッファ１１１-0〜１１１-3から構成される。バッファ１１１-0，１１１-2の保持データはレジスタファイル１０７-0への書き込みに用いられ、バッファ１１１-1，１１１-3の保持データはレジスタファイル１０７-1への書き込みに用いられる。
【００４７】
なお、図２では、バイパス回路１０８-0，１０８-1及びラッチ回路１１０が省略されているが、そのハードウェア構成は、バイパス回路１０９-0，１０９-1及びラッチ回路１１１と同様である。
【００４８】
次に、図１及び図２の構成における動作を説明する。
図１のＶＬＩＷプロセッサで適用されるパイプラインは、（１）命令フェッチが行われるＩステージ、（２）命令デコードとデコード結果に基づくレジスタ読み出し（ソースオペランド読み出し）が行われるＤステージ、（３）命令実行（演算）が行われるＥステージ、（４）演算結果のレジスタへの書き込みが行われるＷステージ、の４ステージで構成されるものとする。なお、命令デコード及びレジスタ読み出しや、レジスタ書き込みに２ステージを必要とする、５ステージや６ステージで構成されるパイプラインもある。
【００４９】
まず、Ｉステージでは、命令フェッチ機構１０１により命令キャッシュ等から長命令語（ＶＬＩＷ）がフェッチされる。この命令フェッチ機構１０１によりフェッチされた長命令語はパイプラインレジスタ１０３に保持され、命令デコード機構１０２によるＤステージでの命令デコードに供される。
【００５０】
このＤステージでは、命令デコード機構１０２によりデコードされた命令フィールド＃ｉ（ｉ＝０〜３）の命令が例えば演算命令の場合には、その命令のフィールド番号（２ビット）を“Ｂ0 Ｂ1 ”とすると、その上位側ビットＢ0 の値で決まるレジスタファイル（Ｂ0 ＝０であればレジスタファイル１０７-0、Ｂ0 ＝１であればレジスタファイル１０７-1）を対象に、その命令の第１及び第２ソースレジスタ指定部で指定される（当該レジスタファイル内の）レジスタからのデータ（ソースオペランド）読み出しが行われる。
【００５１】
したがって、フィールド番号“Ｂ0 Ｂ1 ”中のＢ0 が“０”、即ちフィールド番号が０（“００”），１（“０１”）の命令フィールド＃０，＃１の命令についてはレジスタファイル１０７-0を対象に、Ｂ0 が“１”、即ちフィールド番号が２（“１０”），３（“１１”）の命令フィールド＃２，＃３の命令についてはレジスタファイル１０７-1を対象に、それぞれその命令フィールドの第１及び第２ソースレジスタ指定部で指定される（当該レジスタファイル内の）レジスタからのソースオペランド読み出しが行われる。
【００５２】
このことは、各命令フィールド＃０〜＃３の第１及び第２ソースレジスタ指定部の示すｎビットのレジスタ番号（ＯＰ２，ＯＰ３）の上位に、その命令フィールド＃０〜＃３のフィールド番号“Ｂ0 Ｂ1 ”中のビットＢ0 を付加するレジスタ番号修飾が行われ、そのＢ0 が付加されたｎ＋１ビットのソースレジスタ番号（第１及び第２ソースレジスタ番号）によりソースレジスタが指定されて、そのソースレジスタからのデータ読み出しが行われることと等価である。ここで、ｎ＋１ビットのソースレジスタ番号の最上位ビット、即ちビットＢ0 は“０”でレジスタファイル１０７-0を、“１”でレジスタファイル１０７-1を指定し、当該最上位ビットを除くｎビット、即ちソースレジスタ指定部の示すｎビットは、そのレジスタファイル内のソースレジスタ位置を示す。
【００５３】
なお、本実施形態では、第１ソースレジスタ指定部により、演算器の左側入力用のソースオペランドのレジスタ指定が、第２ソースレジスタ指定部により、演算器の右側入力用のソースオペランドのレジスタ指定が行われるものとする。
【００５４】
以上の命令フィールドのフィールド番号によるソースレジスタ番号の修飾について、主としてフィールド番号が１（“０１”）の命令フィールド＃１を例に、後述するデスティネーションレジスタ番号の修飾と共に図３に示す。このレジスタ番号修飾により、命令中のソースレジスタ指定部がｎビット長であっても、長命令語全体で２ⁿ⁺¹ 個のレジスタを扱うことができる。
【００５５】
さて、命令フィールド＃０，＃１の命令の第１ソースレジスタ指定部で指定された（レジスタファイル１０７-0内の）レジスタからの読み出しデータは、演算器１０６-0，１０６-1の左側入力に対応するレジスタファイル１０７-0の出力ポートＰ00，Ｐ02から、第２ソースレジスタ指定部で指定された（レジスタファイル１０７-0内の）レジスタからの読み出しデータは、演算器１０６-0，１０６-1の右側入力に対応するレジスタファイル１０７-0の出力ポートＰ01，Ｐ03から、それぞれ出力されてバイパス回路１０８-0の対応する入力ポートに導かれる。
【００５６】
同様に、命令フィールド＃２，＃３の命令の第１ソースレジスタ指定部で指定された（レジスタファイル１０７-1内の）レジスタからの読み出しデータは、演算器１０６-2，１０６-3の左側入力に対応するレジスタファイル１０７-1の出力ポートＰ10，Ｐ12から、第２ソースレジスタ指定部で指定された（レジスタファイル１０７-1内の）レジスタからの読み出しデータは、演算器１０６-2，１０６-3の右側入力に対応するレジスタファイル１０７-1の出力ポートＰ11，Ｐ13から、それぞれ出力されてバイパス回路１０８-1の対応する入力ポートに導かれる。
【００５７】
バイパス回路１０８-0には、命令デコード機構１０２によりデコードされた現在Ｄステージにある長命令語中のデコード結果のうちの命令フィールド＃０，＃１の第１及び第２ソースレジスタ指定部のデコード結果、即ち第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）と、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果のうちの命令フィールド＃０，＃２のデスティネーションレジスタ指定部のデコード結果、即ちデスティネーションレジスタ番号（の下位ｎ−１ビット）とが導かれる。
【００５８】
一方、バイパス回路１０８-1には、命令デコード機構１０２によりデコードされた現在Ｄステージにある長命令語中のデコード結果のうちの命令フィールド＃２，＃３の第１及び第２ソースレジスタ指定部のデコード結果、即ち第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）と、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果のうちの命令フィールド＃１，＃３のデスティネーションレジスタ指定部のデコード結果、即ちデスティネーションレジスタ番号（の下位ｎ−１ビット）とが導かれる。
【００５９】
バイパス回路１０８-0は、Ｄステージにある命令の命令フィールド＃０，＃１で指定される第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）を、Ｗステージにある命令（Ｄステージにある命令より２サイクル前の命令）の命令フィールド＃０，＃２で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）とそれぞれ比較する。
【００６０】
そしてバイパス回路１０８-0は、デスティネーションレジスタ番号に一致していないソースレジスタ番号の指定するソースオペランドとして、レジスタファイル１０７-0内の当該ソースレジスタ番号の指定するレジスタからの読み出しデータを選択する。
【００６１】
またバイパス回路１０８-0は、デスティネーションレジスタ番号に一致しているソースレジスタ番号の指定するソースオペランドとして、（ラッチ回路１１１を介して導かれる）当該デスティネーションレジスタ番号の指定するレジスタへの書き込みに用いられる演算器（演算器１０６-0または１０６-2）の演算結果を選択するＤステージバイパスを行う。もし、このＤステージバイパスが行われないならば、２サイクル前の長命令語（の命令フィールド＃０または＃２）の指定により実行された演算器１０６-0または１０６-2の演算結果がレジスタファイル１０７-0内のデスティネーションレジスタ番号の指定するレジスタに書き込まれるまでは、現在Ｄステージにある命令の長命令語（の命令フィールド＃０または＃１）で指定される当該レジスタからのデータ読み出しを待たなければならず、パイプラインの流れが乱れる。
【００６２】
一方、バイパス回路１０８-1は、Ｄステージにある命令の命令フィールド＃２，＃３で指定される第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）を、Ｗステージにある長命令語（Ｄステージにある長命令語より２サイクル前の長命令語）の命令フィールド＃１，＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）とそれぞれ比較する。
【００６３】
そしてバイパス回路１０８-1は、デスティネーションレジスタ番号に一致していないソースレジスタ番号の指定するソースオペランドとして、レジスタファイル１０７-1内の当該ソースレジスタ番号の指定するレジスタからの読み出しデータを選択する。
【００６４】
またバイパス回路１０８-1は、デスティネーションレジスタ番号に一致しているソースレジスタ番号の指定するソースオペランドとして、（ラッチ回路１１１を介して導かれる）当該デスティネーションレジスタ番号の指定するレジスタへの書き込みに用いられる演算器（演算器１０６-1または１０６-3）の演算結果を選択するＤステージバイパスを行う。
【００６５】
バイパス回路１０８-0により選択された、Ｄステージにある長命令語の命令フィールド＃０，＃１の指定する４つのソースオペランド、及びバイパス回路１０８-1により選択された、Ｄステージにある長命令語の命令フィールド＃２，＃３の指定する４つのソースオペランドは、ラッチ回路１１０に保持されて、Ｅステージの期間、対応するバイパス回路１０９-0，１０９-1に導かれる。
【００６６】
このとき、当該長命令語に対する命令デコード機構１０２でのデコード結果がパイプラインレジスタ１０４に移される。同時に、このパイプラインレジスタ１０４に保持されていた、１サイクル前の長命令語のデコード結果はパイプラインレジスタ１０５に移される。
【００６７】
バイパス回路１０９-0には、パイプラインレジスタ１０４に保持されている現在Ｅステージにある長命令語中のデコード結果のうちの命令フィールド＃０，＃１の第１及び第２ソースレジスタ指定部のデコード結果、即ち第１及び第２ソースレジスタ番号と、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果のうちの命令フィールド＃０，＃２のデスティネーションレジスタ指定部のデコード結果、即ちデスティネーションレジスタ番号とが導かれる。
【００６８】
一方、バイパス回路１０９-1には、パイプラインレジスタ１０４に保持されている現在Ｅステージにある長命令語中のデコード結果のうちの命令フィールド＃２，＃３の第１及び第２ソースレジスタ指定部のデコード結果、即ち第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）と、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果のうちの命令フィールド＃１，＃３のデスティネーションレジスタ指定部のデコード結果、即ちデスティネーションレジスタ番号（の下位ｎ−１ビット）とが導かれる。
【００６９】
バイパス回路１０９-0は、Ｅステージにある命令の命令フィールド＃０，＃１で指定される第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）を、Ｗステージにある命令（Ｅステージにある命令の直前の命令）の命令フィールド＃０，＃２で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）とそれぞれ比較する。
【００７０】
そしてバイパス回路１０９-0は、デスティネーションレジスタ番号に一致していないソースレジスタ番号の指定するソースオペランドとして、バイパス回路１０８-0により選択されてラッチ回路１１０を介して導かれる該当するソースオペランドを選択する。
【００７１】
またバイパス回路１０９-0は、デスティネーションレジスタ番号に一致しているソースレジスタ番号の指定するソースオペランドとして、（ラッチ回路１１１を介して導かれる）当該デスティネーションレジスタ番号の指定するレジスタへの書き込みに用いられる演算器（演算器１０６-0または１０６-2）の演算結果を選択するＥステージバイパスを行う。
【００７２】
一方、バイパス回路１０９-1は、Ｅステージにある命令の命令フィールド＃２，＃３で指定される第１及び第２ソースレジスタ番号（の下位ｎ−１ビット）を、Ｗステージにある命令（Ｅステージにある命令の直前の命令）の命令フィールド＃１，＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）とそれぞれ比較する。
【００７３】
そしてバイパス回路１０９-1は、デスティネーションレジスタ番号に一致していないソースレジスタ番号の指定するソースオペランドとして、バイパス回路１０８-1により選択されてラッチ回路１１０を介して導かれる該当するソースオペランドを選択する。
【００７４】
またバイパス回路１０９-1は、デスティネーションレジスタ番号に一致しているソースレジスタ番号の指定するソースオペランドとして、（ラッチ回路１１１を介して導かれる）当該デスティネーションレジスタ番号の指定するレジスタへの書き込みに用いられる演算器（演算器１０６-1または１０６-3）の演算結果を選択するＥステージバイパスを行う。
【００７５】
以上のバイパス回路１０９-0，１０９-1の選択動作の詳細を説明する。
まず、バイパス回路１０９-0内のマルチプレクサ１１９Ｌ0 ，１１９Ｌ1 は、Ｅステージにある命令の命令フィールド＃０，＃１で指定される第１ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃０及び＃２で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）のいずれにも一致していない場合には、バイパス回路１０８-0により演算器１０６-0，１０６-1の左側入力用として選択されてラッチ回路１１０を介して導かれるソースオペランドを選択する。
【００７６】
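Multiplexers 119L0 and 119L1 also behave as follows. When the first source register number (its low-order n−1 bits) specified by instruction field #0 or #1 of the instruction in the E stage matches the destination register number (its low-order n−1 bits) specified by instruction field #0 of the instruction in the W stage, they select the result of arithmetic unit 106-0 supplied through buffer 111-0 in latch circuit 111. When it matches the destination register number (its low-order n−1 bits) specified by instruction field #2 of the instruction in the W stage, they select the result of arithmetic unit 106-2 supplied through buffer 111-2 in latch circuit 111.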
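The behavior of one such three-input multiplexer can be modeled in a few lines. The sketch below is a software illustration, not a description of the circuit. The name `bypass_select` is hypothetical, and the rule that the first matching writer wins when both W-stage destinations match is an assumption, since the text does not address that case.

```python
def bypass_select(src_reg: int, upstream_value: int, w_stage_writes) -> int:
    """Model one multiplexer of the E-stage bypass circuit.

    src_reg: in-file register position named by the E-stage source specifier.
    upstream_value: operand delivered from the D stage through latch 110.
    w_stage_writes: (dest_reg, result) pairs for the two arithmetic units
        that write into the same register file (e.g. 106-0 and 106-2).
    """
    for dest_reg, result in w_stage_writes:
        if dest_reg == src_reg:
            return result      # forward the result that is about to be written
    return upstream_value      # no hazard: use the value read earlier


if __name__ == "__main__":
    writes = [(5, 42), (7, 43)]  # W-stage results destined for registers 5 and 7
    print(bypass_select(5, 100, writes))
    print(bypass_select(6, 100, writes))
```

Each multiplexer therefore needs only three inputs: the upstream operand and the two results bound for its own register file.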
【００７７】
マルチプレクサ１１９Ｌ0 ，１１９Ｌ1 により選択されたデータ（ソースオペランド）は演算器１０６-0，１０６-1の左側入力（Ｌ入力）に供給される。
次に、バイパス回路１０９-0内のマルチプレクサ１１９Ｒ0 ，１１９Ｒ1 は、Ｅステージにある命令の命令フィールド＃０，＃１で指定される第２ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃０及び＃２で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）のいずれにも一致していない場合には、バイパス回路１０８-0により演算器１０６-0，１０６-1の右側入力用として選択されてラッチ回路１１０を介して導かれるソースオペランドを選択する。
【００７８】
またマルチプレクサ１１９Ｒ0 ，１１９Ｒ1 は、Ｅステージにある命令の命令フィールド＃０，＃１で指定される第２ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃０で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-0を介して導かれる演算器１０６-0の演算結果を選択し、Ｗステージにある命令の命令フィールド＃２で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-2を介して導かれる演算器１０６-2の演算結果を選択する。
【００７９】
マルチプレクサ１１９Ｒ0 ，１１９Ｒ1 により選択されたデータ（ソースオペランド）は演算器１０６-0，１０６-1の右側入力（Ｒ入力）に供給される。
一方、バイパス回路１０９-1内のマルチプレクサ１１９Ｌ2 ，１１９Ｌ3 は、Ｅステージにある命令の命令フィールド＃２，＃３で指定される第１ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃１及び＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）のいずれにも一致していない場合には、バイパス回路１０８-1により演算器１０６-2，１０６-3の左側入力用として選択されてラッチ回路１１０を介して導かれるソースオペランドを選択する。
【００８０】
またマルチプレクサ１１９Ｌ2 ，１１９Ｌ3 は、Ｅステージにある命令の命令フィールド＃２，＃３で指定される第１ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃１で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-1を介して導かれる演算器１０６-1の演算結果を選択し、Ｗステージにある命令の命令フィールド＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-3を介して導かれる演算器１０６-3の演算結果を選択する。
【００８１】
マルチプレクサ１１９Ｌ2 ，１１９Ｌ3 により選択されたデータ（ソースオペランド）は演算器１０６-2，１０６-3の左側入力（Ｌ入力）に供給される。
次に、バイパス回路１０９-1内のマルチプレクサ１１９Ｒ2 ，１１９Ｒ3 は、Ｅステージにある命令の命令フィールド＃２，＃３で指定される第２ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃１及び＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）のいずれにも一致していない場合には、バイパス回路１０８-1により演算器１０６-2，１０６-3の右側入力用として選択されてラッチ回路１１０を介して導かれるソースオペランドを選択する。
【００８２】
またマルチプレクサ１１９Ｒ2 ，１１９Ｒ3 は、Ｅステージにある命令の命令フィールド＃２，＃３で指定される第２ソースレジスタ番号（の下位ｎ−１ビット）が、Ｗステージにある命令の命令フィールド＃１で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-1を介して導かれる演算器１０６-1の演算結果を選択し、Ｗステージにある命令の命令フィールド＃３で指定されるデスティネーションレジスタ番号（の下位ｎ−１ビット）に一致している場合には、ラッチ回路１１１内のバッファ１１１-3を介して導かれる演算器１０６-3の演算結果を選択する。
【００８３】
マルチプレクサ１１９Ｒ2 ，１１９Ｒ3 により選択されたデータ（ソースオペランド）は、演算器１０６-2，１０６-3の右側入力（Ｒ入力）に供給される。
演算器１０６-0，１０６-1は、バイパス回路１０９-0から供給されるソースオペランド間のデータの演算を行い、演算器１０６-2，１０６-3は、バイパス回路１０９-1から供給されるソースオペランド間のデータの演算を行う。演算器１０６-0〜１０６-3の演算結果はラッチ回路１１１（内のバッファ１１１-0〜１１１-3）に保持される。
【００８４】
このとき、演算器１０６-0〜１０６-3での演算を指定した長命令語のデコード結果（現在Ｅステージにある長命令語のデコード結果）がパイプラインレジスタ１０４からパイプラインレジスタ１０５に移され、同時に当該長命令語の直後の長命令語に対する命令デコード機構１０２でのデコード結果（現在Ｄステージにある長命令語のデコード結果）がパイプラインレジスタ１０４に移される。
【００８５】
ラッチ回路１１１（内のバッファ１１１-0〜１１１-3）に保持された演算器１０６-0〜１０６-3の演算結果のうち、演算器１０６-0，１０６-2の演算結果（命令フィールド＃０，＃２の命令の演算結果）はレジスタファイル１０７-0の各入力ポートに、演算器１０６-1，１０６-3の演算結果（命令フィールド＃１，＃３の命令の演算結果）はレジスタファイル１０７-1の各入力ポートに、それぞれ導かれる。
【００８６】
レジスタファイル１０７-0に導かれた演算器１０６-0，１０６-2の演算結果は、当該レジスタファイル１０７-0内のレジスタのうち、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果に含まれている対応する命令フィールド＃０，＃２のデスティネーションレジスタ指定部の指定するレジスタに書き込まれる。
【００８７】
また、レジスタファイル１０７-1に導かれた演算器１０６-1，１０６-3の演算結果は、当該レジスタファイル１０７-1内のレジスタのうち、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果に含まれている対応する命令フィールド＃１，＃３のデスティネーションレジスタ指定部の指定するレジスタに書き込まれる。
【００８８】
このように、ラッチ回路１１１（内のバッファ１１１-0〜１１１-3）に保持された演算器１０６-0〜１０６-3の演算結果は、パイプラインレジスタ１０５に保持されている現在Ｗステージにある長命令語中のデコード結果のうちの命令フィールド＃０〜＃３のデスティネーションレジスタ指定部と、その命令フィールド＃０〜＃３のフィールド番号“Ｂ0 Ｂ1 ”の下位側ビットＢ1 とで決まるｎ＋１ビットのデスティネーションレジスタ番号の示すレジスタに書き込まれる。
【００８９】
即ち本実施形態では、図３に示すように、命令フィールド＃０〜＃３のデスティネーションレジスタ指定部の示すｎビットのレジスタ番号（ＯＰ１）の上位に、その命令フィールド＃０〜＃３のフィールド番号“Ｂ0 Ｂ1 ”のビットＢ1 が付加されるレジスタ番号修飾が行われ、そのビットＢ1 が付加されたｎ＋１ビットのデスティネーションレジスタ番号により、（レジスタ番号０〜２ⁿ⁺¹ −１の）２ⁿ⁺¹ 個のデスティネーションレジスタのいずれかが指定され、そのデスティネーションレジスタへの演算器１０６-0〜１０６-3の演算結果の書き込みが行われる。ここで、ｎ＋１ビットのデスティネーションレジスタ番号の最上位ビット、即ちビットＢ1 は“０”でレジスタファイル１０７-0を、“１”でレジスタファイル１０７-1を指定し、当該最上位ビットを除く下位ｎビット、即ちデスティネーションレジスタ指定部の示すｎビット（ＯＰ１）は、そのレジスタファイル内のデスティネーションレジスタ位置を示す。
【００９０】
したがって、ビットＢ1 が“０”の命令フィールド、即ちフィールド番号が０（“００”），２（“１０”）の命令フィールド＃０，＃２の命令に対しては、レジスタファイル１０７-0を対象に、当該命令のデスティネーションレジスタ指定部で指定されたレジスタへの、当該命令の演算結果（演算器１０６-0，１０６-2の演算結果）の書き込みが行われる。また、ビットＢ1 が“１”の命令フィールド、即ちフィールド番号が１（“０１”），３（“１１”）の命令フィールド＃１，＃３の命令に対しては、レジスタファイル１０７-1を対象に、当該命令のデスティネーションレジスタ指定部で指定されたレジスタへの、当該命令の演算結果（演算器１０６-1，１０６-3の演算結果）の書き込みが行われる。
【００９１】
以上に述べたように本実施形態においては、２ⁿ 個のレジスタからなる２つのレジスタファイル１０７-0，１０７-1を設け、４並列の長命令語中の各命令フィールド＃０〜＃３のフィールド番号によるレジスタ番号の修飾を行い、各命令フィールド毎に（ソース指定とデスティネーション指定のそれぞれについて）使用可能なレジスタをレジスタファイル１０７-0または１０７-1の一方に制限することにより、レジスタ指定部のビット長を（従来と同じ）ｎビットとしながらも（即ち長命令語長を伸ばさないにも拘らず）、長命令語全体として、使用可能なレジスタ数を従来の２ⁿ 個から、その２倍の２ⁿ⁺¹ 個とすることができる。
【００９２】
また本実施形態においては、レジスタファイル１０７-0，１０７-1の入出力ポート数を、レジスタ数が従来と同じ２ⁿ 個でありながら、入力ポート数２、出力ポート数４と、従来の半分にすることができる。
【００９３】
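Figure 4 shows the relationship described above between register files 107-0 and 107-1 and instruction fields #0 to #3 (field numbers “00” to “11”) of the long instruction word. Specifically, it shows the relationship between instruction fields #0 to #3 and the register file referenced by the instruction in each field (the source-side register file), and the relationship between instruction fields #0 to #3 and the register file to which the result of the operation specified by the instruction in each field is written (the destination-side register file). Register files 107-0 and 107-1 each appear twice in Figure 4, but only to show the relationships at reference time and at result-write time. Physically there is only one of each, as shown in Figures 1 and 2.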
【００９４】
この他、本実施形態においては、各命令フィールド毎に使用可能なレジスタを制限したことから、バイパス回路１０８-0，１０８-1の１演算器当たりの入力ポート数（マルチプレクサの入力数）を、従来の５から３に減らすことができ、ハードウェア構成の簡略化が図れる。
【００９５】
なお、以上に述べた実施形態では、図２からも明らかなように、演算器１０６-0，１０６-1での演算（４並列の長命令語の命令フィールド＃０，＃１の命令の指定する演算）で使用可能なソースレジスタはレジスタファイル１０７-0のレジスタに、演算器１０６-2，１０６-3での演算（命令フィールド＃２，＃３の命令の指定する演算）で使用可能なソースレジスタはレジスタファイル１０７-1にそれぞれ制限され、演算器１０６-0，１０６-2の演算結果（命令フィールド＃０，＃２の命令の指定する演算の演算結果）の書き込み先として使用可能なデスティネーションレジスタはレジスタファイル１０７-0のレジスタに、演算器１０６-1，１０６-3の演算結果（命令フィールド＃１，＃３の命令の指定する演算の演算結果）の書き込み先として使用可能なデスティネーションレジスタはレジスタファイル１０７-1のレジスタにそれぞれ制限されている場合について説明したが、これに限るものではない。
【００９６】
そこで、各命令フィールド毎に（ソース指定とデスティネーション指定のそれぞれについて）使用可能なレジスタの制限が、以上の実施形態とは異なる第２の実施形態について図面を参照して説明する。
［第２の実施形態］
図５は本発明の第２の実施形態に係るＶＬＩＷプロセッサの概略構成を図２と同様の形式で示すブロック図であり、図２と同一部分には同一符号を付してある。
【００９７】
図５において、バイパス回路２０９-0は、演算器１０６-0，１０６-1の左側入力に対応して設けられたマルチプレクサ（ＭＰＸ）２１９Ｌ0 ，２１９Ｌ1 と、演算器１０６-0，１０６-1の右側入力に対応して設けられたマルチプレクサ（ＭＰＸ）２１９Ｒ0 ，２１９Ｒ1 とから構成される。
【００９８】
バイパス回路２０９-0内のマルチプレクサ２１９Ｌ0 ，２１９Ｌ1 ，２１９Ｒ1 は、レジスタファイル１０７-0の出力ポートＰ00，Ｐ02，Ｐ03と１対１で対応しており、対応する出力ポートから読み出されるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-0，１０６-2の演算結果の１つを選択して演算器１０６-0，１０６-1の対応する入力側に出力する。
【００９９】
一方、バイパス回路２０９-0内のマルチプレクサ２１９Ｒ0 はレジスタファイル１０７-1の出力ポートＰ13と１対１で対応しており、当該出力ポートＰ13から読み出されるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-1，１０６-3の演算結果の１つを選択して演算器１０６-0の右側入力に出力する。
【０１００】
バイパス回路２０９-1は、演算器１０６-2，１０６-3の左側入力に対応して設けられたマルチプレクサ（ＭＰＸ）２２９Ｌ2 ，２２９Ｌ3 と、演算器１０６-2，１０６-3の右側入力に対応して設けられたマルチプレクサ（ＭＰＸ）２２９Ｒ2 ，２２９Ｒ3 とから構成される。
【０１０１】
バイパス回路２０９-1内のマルチプレクサ２２９Ｌ2 ，２２９Ｒ2 ，２２９Ｌ3 は、レジスタファイル１０７-1の出力ポートＰ10，Ｐ11，Ｐ12と１対１で対応しており、対応する出力ポートから読み出されるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-1，１０６-3の演算結果の１つを選択して演算器１０６-2，１０６-3の対応する入力側に出力する。
【０１０２】
一方、バイパス回路２０９-1内のマルチプレクサ２２９Ｒ3 はレジスタファイル１０７-0の出力ポートＰ01と１対１で対応しており、当該出力ポートＰ01から読み出されるデータ及び（ラッチ回路１１１を介して導かれる）演算器１０６-0，１０６-2の演算結果の１つを選択して演算器１０６-3の右側入力に出力する。
【０１０３】
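The configuration of Figure 5 differs from that of Figure 2 in one respect: the inputs of multiplexers 219R0 and 229R3, which correspond to the right inputs of arithmetic units 106-0 and 106-3 in Figure 5, are the reverse of the inputs of multiplexers 119R0 and 119R3, which correspond to the right inputs of arithmetic units 106-0 and 106-3 in Figure 2.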
【０１０４】
この図５の構成では、命令フィールド＃０〜＃３を持つ４並列の長命令語（４並列ＶＬＩＷ）のフィールド番号を“Ｂ0 Ｂ1 ”とすると、デスティネーションレジスタ番号は、図６に示すように、命令フィールド＃０〜＃３の命令のデスティネーションレジスタ指定部の示すレジスタ番号（ＯＰ１）の上位にビットＢ1 が付加されたものとなる。また、第１ソースレジスタ番号は、図６に示すように、命令フィールド＃０〜＃３の命令の第１ソースレジスタ指定部の示すレジスタ番号（ＯＰ２）の上位にビットＢ0 が付加されたものとなる。ここまでは、前記第１の実施形態と同様である。
【０１０５】
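The second source register number, as shown in Figure 6, depends on the field number of the instruction field. For the instructions in instruction fields #0 and #3, whose field numbers are “00” (= 0) and “11” (= 3), it is the register number (OP3) given by the second source register specifier with the logical inverse of bit B0 prepended as the high-order bit. For the instructions in instruction fields #1 and #2, whose field numbers are “01” (= 1) and “10” (= 2), it is the register number (OP3) given by the second source register specifier with bit B0 itself prepended as the high-order bit.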
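The only change from the first embodiment is how the register file of the second source operand is chosen. A small sketch of the rule, with the hypothetical helper name `source_files_second_embodiment`:

```python
def source_files_second_embodiment(field: int):
    """Return (file of first source, file of second source) for fields 0..3."""
    b0 = (field >> 1) & 1
    first = b0
    # Fields #0 and #3 use the inverse of B0 for the second source operand.
    second = b0 ^ 1 if field in (0, 3) else b0
    return first, second


if __name__ == "__main__":
    for field in range(4):
        print(field, source_files_second_embodiment(field))
```

Fields #0 and #3 each read one operand from each register file, while fields #1 and #2 read both operands from a single file. This agrees with the multiplexer wiring described for Figure 5, where the right input of arithmetic unit 106-0 is fed from register file 107-1 and the right input of arithmetic unit 106-3 from register file 107-0.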
【０１０６】
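In this case, the relationship in this embodiment between register files 107-0 and 107-1 and instruction fields #0 to #3 (field numbers “00” to “11”) of the long instruction word is as shown in Figure 7. Specifically, Figure 7 shows the relationship between instruction fields #0 to #3 and the register files referenced by the instruction in each field (the source-side register files), and the relationship between instruction fields #0 to #3 and the register file in which the result of the operation specified by the instruction in each field is stored (the destination-side register file).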
【０１０７】
このように、４並列の長命令語の命令フィールド＃０〜＃３（のフィールド番号“００”〜“１１”）の命令で指定された演算の演算結果の書き込み先として、命令フィールド＃０，＃２についてはレジスタファイル１０７-0（内のレジスタ）に、命令フィールド＃１，＃３についてはレジスタファイル１０７-1（内のレジスタ）に制限すると共に、命令フィールド＃０〜＃３の命令の参照先を、命令フィールド＃０，＃３については２つのレジスタファイル１０７-0，１０７-1（内のレジスタ）に、命令フィールド＃１についてはレジスタファイル１０７-0（内のレジスタ）に、命令フィールド＃２についてはレジスタファイル１０７-1（内のレジスタ）に制限することでも、レジスタ指定部のビット長を（従来と同じ）ｎビットとしながらも（即ち長命令語長を伸ばさないにも拘らず）、長命令語全体として、使用可能なレジスタ数を従来の２ⁿ 個から、その２倍の２ⁿ⁺¹ 個とすることができる。
【０１０８】
なお、以上に述べた第１及び第２の実施形態では、長命令語の命令フィールド＃０〜＃３のフィールド番号“Ｂ0 Ｂ1 ”の上位側ビットＢ0 をソースレジスタ番号修飾に、下位側ビットＢ1 をデスティネーションレジスタ番号修飾に用いる場合について説明したが、これに限るものではなく、Ｂ0 をデスティネーションレジスタ番号修飾に、Ｂ1 をソースレジスタ番号修飾に用いるようにしても構わない。この場合、命令フィールド＃０〜＃３とレジスタファイル１０７-0，１０７-1との関係は、ソース指定とデスティネーション指定とで、以上の実施形態の逆になり、ＶＬＩＷプロセッサの構成（例えば第１の実施形態では図２の構成、第２の実施形態では図５の構成）もそれに適合するように変更する必要がある。
【０１０９】
また、以上に述べた第１及び第２の実施形態では、本発明を、命令フィールド＃０〜＃３を持つ４並列の長命令語を実行するＶＬＩＷプロセッサに適用した場合について説明したが、本発明は、例えば命令フィールド＃０〜＃７（フィールド番号“０００”〜“１１１”）を持つ８並列の長命令語を実行するＶＬＩＷプロセッサ、更には命令フィールド＃０〜＃１５（フィールド番号“００００”〜“１１１１”）を持つ１６並列の長命令語を実行するＶＬＩＷプロセッサ等にも適用可能である。そこでまず、本発明を８並列の長命令語を実行するＶＬＩＷプロセッサに適用した第３の実施形態につき説明する。
［第３の実施形態］
図８は、本発明を８並列の長命令語を実行するＶＬＩＷプロセッサに適用した第３の実施形態におけるレジスタファイルと長命令語の各命令フィールドとの間の関係を示す。
【０１１０】
図８において、４つのレジスタファイル２０７-0（＃０）〜２０７-3（＃３）は、図１中のレジスタファイル１０７-0，１０７-1と同様に２ⁿ 個のレジスタから構成される。レジスタファイル２０７-0内の２ⁿ 個のレジスタには０〜２ⁿ −１のレジスタ番号が、レジスタファイル２０７-1内の２ⁿ 個のレジスタには２ⁿ 〜２×２ⁿ −１のレジスタ番号、即ち２ⁿ 〜２ⁿ⁺¹ −１のレジスタ番号が、レジスタファイル２０７-2内の２ⁿ 個のレジスタには２ⁿ⁺¹ 〜３×２ⁿ −１のレジスタ番号が、そしてレジスタファイル２０７-3内の２ⁿ 個のレジスタには３×２ⁿ 〜４×２ⁿ −１のレジスタ番号、即ち３×２ⁿ 〜２ⁿ⁺² −１のレジスタ番号が、それぞれ割り当てられている。
【０１１１】
本実施形態においては、８並列の長命令語の各命令フィールド＃０〜＃７の命令は３オペランド形式の命令（演算命令の場合）であり、デスティネーションレジスタ指定部（ＯＰ１）のビット長は（前記第１及び第２の実施形態の場合より１ビット多い）ｎ＋１ビット、第１及び第２ソースレジスタ指定部（ＯＰ２，ＯＰ３）のビット長は（前記第１及び第２の実施形態の場合と同じ）ｎビットである。デスティネーションレジスタ指定部（ＯＰ１）の最上位ビット（Ｂ）の値は、命令フィールド（のフィールド番号）によって予め定められており、フィールド＃０〜＃３（フィールド番号“０００”〜“０１１”）では“０”、フィールド＃４〜＃７（フィールド番号“１００”〜“１１１”）では“１”である。
【０１１２】
ここで、各命令フィールド＃ｉ（ｉ＝０〜７）の命令（演算命令）の演算結果の書き込み先のレジスタファイルは、その命令フィールド＃ｉのフィールド番号（３ビット）を“Ｂ0 Ｂ1 Ｂ2 ”とすると、そのフィールド番号中の最下位ビットＢ2 と、その命令フィールド中のｎ＋１ビットのデスティネーションレジスタ指定部（ＯＰ１）の最上位ビット（Ｂ）からなる２ビット“Ｂ2 Ｂ”により決定される。
【０１１３】
また、決定されたレジスタファイル内の書き込み先レジスタは、デスティネーションレジスタ指定部（ＯＰ１）の最上位ビットを除くｎビットにより指定される。即ち命令フィールド＃ｉの命令（演算命令）の演算結果の書き込み先レジスタは、図９に示すように、その命令のデスティネーションレジスタ指定部で指定されるｎ＋１ビットのレジスタ番号（ＯＰ１）の上位に、その命令フィールドのフィールド番号の最下位ビットＢ2 が付加されたｎ＋２ビットのデスティネーションレジスタ番号により指定される。
【０１１４】
一方、命令フィールド＃ｉの命令（演算命令）の参照先のレジスタファイルは、そのフィールド番号中の上位側の２ビット“Ｂ0 Ｂ1 ”により決定される。また、決定されたレジスタファイル内の参照先レジスタは、その命令フィールド＃ｉ中のｎビットの第１及び第２ソースレジスタ指定部（ＯＰ２，ＯＰ３）により指定される。
【０１１５】
即ち命令フィールド＃ｉの命令（演算命令）の演算で参照する２つのソースレジスタは、図９に示すように、その命令の第１及び第２ソースレジスタ指定部で指定されるｎビットのレジスタ番号（ＯＰ２，ＯＰ３）の上位に、その命令フィールドのフィールド番号の上位側２ビット“Ｂ0 Ｂ1 ”が付加されたｎ＋２ビットのソースレジスタ番号（第１及び第２ソースレジスタ番号）により指定される。
【０１１６】
本実施例において、上記ｎ＋２ビットのレジスタ番号の上位２ビットは、“００”でレジスタファイル２０７-0を、“０１”でレジスタファイル２０７-1を、“１０”でレジスタファイル２０７-2を、“１１”でレジスタファイル２０７-3を指定し、当該上位２ビットを除くｎビットは、そのレジスタファイル内のレジスタ位置を示す。
【０１１７】
したがって、図８に示すように、フィールド番号“Ｂ0 Ｂ1 Ｂ2 ”中の上位側２ビット“Ｂ0 Ｂ1 ”が“００”、即ちフィールド番号が０（“０００”），１（“００１”）の命令フィールド＃０，＃１の命令についてはレジスタファイル２０７-0を対象に、“Ｂ0 Ｂ1 ”が“０１”、即ちフィールド番号が２（“０１０”），３（“０１１”）の命令フィールド＃２，＃３の命令についてはレジスタファイル２０７-1を対象に、“Ｂ0 Ｂ1 ”が“１０”、即ちフィールド番号が４（“１００”），５（“１０１”）の命令フィールド＃４，＃５の命令についてはレジスタファイル２０７-2を対象に、そして“Ｂ0 Ｂ1 ”が“１１”、即ちフィールド番号が６（“１１０”），７（“１１１”）の命令フィールド＃６，＃７の命令についてはレジスタファイル２０７-3を対象に、それぞれその命令フィールドの第１及び第２ソースレジスタ指定部で指定される（当該レジスタファイル内の）レジスタからのソースオペランド読み出しが行われる。
【０１１８】
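Results are written as follows. For the instruction fields in which the least significant bit B2 of field number “B0 B1 B2” is “0” and the most significant bit B of the destination register specifier is “0”, namely instruction fields #0 and #2 with field numbers 0 (“000”) and 2 (“010”), the target is register file 207-0. For the fields in which B2 is “0” and B is “1”, namely instruction fields #4 and #6 with field numbers 4 (“100”) and 6 (“110”), the target is register file 207-1. For the fields in which B2 is “1” and B is “0”, namely instruction fields #1 and #3 with field numbers 1 (“001”) and 3 (“011”), the target is register file 207-2. For the fields in which B2 is “1” and B is “1”, namely instruction fields #5 and #7 with field numbers 5 (“101”) and 7 (“111”), the target is register file 207-3. In each case the result of the instruction is written to the register in the target register file specified by the instruction's destination register specifier.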
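The 8-way mapping can be summarized in code. In the sketch below the function names are hypothetical. Bit B is modeled as equal to the top field-number bit, which is how the text fixes it ("0" for fields #0 to #3, "1" for fields #4 to #7).

```python
def files_8way(field: int):
    """Return (source file, destination file) for field numbers 0..7."""
    src = (field >> 1) & 0b11  # upper two bits "B0 B1"
    b2 = field & 1             # least significant field-number bit
    b = (field >> 2) & 1       # MSB of OP1, predetermined per field
    dst = (b2 << 1) | b        # two-bit selector "B2 B"
    return src, dst


def dest_number_8way(field: int, position: int, n: int) -> int:
    """(n+2)-bit destination register number: B2, then the (n+1)-bit OP1."""
    b2 = field & 1
    b = (field >> 2) & 1
    op1 = (b << n) | (position & ((1 << n) - 1))  # (n+1)-bit specifier
    return (b2 << (n + 1)) | op1


if __name__ == "__main__":
    for field in range(8):
        print(field, files_8way(field))
```

For example, with n = 4 (an assumed size), field #5 writing position 3 yields register number 51, which is position 3 of register file 207-3.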
【０１１９】
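As described above, this embodiment provides four register files 207-0 to 207-3, each consisting of 2ⁿ registers, qualifies register numbers with the field numbers of instruction fields #0 to #7 of the 8-way long instruction word, and restricts the registers usable by each instruction field (separately for source and destination specification) to one of register files 207-0 to 207-3. As a result, even though the destination register specifier is n + 1 bits long and the first and second source register specifiers are n bits long, the long instruction word as a whole can use 2ⁿ⁺² registers.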
【０１２０】
しかも本実施形態においては、レジスタファイル２０７-0〜２０７-3の入出力ポート数を、レジスタ数が従来と同じ２ⁿ 個でありながら、入力ポート数２、出力ポート数４とすることができる（８並列の長命令語の場合、従来は入力ポート数８、出力ポート数１６）。
【０１２１】
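Although omitted from Figure 8, bypass circuits are provided for each of register files 207-0 to 207-3, corresponding to bypass circuits 108-i and 109-i (i = 0, 1) in Figure 1. They need only three input ports per arithmetic-unit input (three multiplexer inputs). For an 8-way long instruction word the conventional figure is nine.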
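These per-register-file counts follow directly from how many fields write to and read from each file. The sketch below derives them from the field-to-file mappings of the first and third embodiments. The helper name `partitioned_port_counts` and the result keys are assumptions introduced for illustration.

```python
from collections import Counter


def partitioned_port_counts(n_fields, src_file, dst_file):
    """Count ports per register file, given field-to-file mapping functions."""
    writers = Counter(dst_file(f) for f in range(n_fields))
    readers = Counter(src_file(f) for f in range(n_fields))
    counts = {}
    for rf in sorted(set(writers) | set(readers)):
        counts[rf] = {
            "input_ports": writers[rf],       # one per field writing this file
            "output_ports": 2 * readers[rf],  # two source operands per reading field
            "mux_inputs": 1 + writers[rf],    # one read port plus bypassed results
        }
    return counts


# First embodiment (4-way): source file from B0, destination file from B1.
four_way = partitioned_port_counts(4, lambda f: f >> 1, lambda f: f & 1)

# Third embodiment (8-way): source file from "B0 B1", destination from "B2 B".
eight_way = partitioned_port_counts(
    8, lambda f: f >> 1, lambda f: ((f & 1) << 1) | (f >> 2))

if __name__ == "__main__":
    print(four_way)
    print(eight_way)
```

Both configurations come out at two input ports, four output ports, and three multiplexer inputs per register file, even though the degree of parallelism doubles.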
【０１２２】
なお、前記実施形態（第３の実施形態）では、長命令語の命令フィールド＃０〜＃７のフィールド番号“Ｂ0 Ｂ1 Ｂ2 ”の上位側の２ビット“Ｂ0 Ｂ1 ”をソースレジスタ番号修飾に、下位側の１ビットＢ2 を（デスティネーション指定部の最上位ビットＢと合わせて）デスティネーションレジスタ番号修飾に用いる場合について説明したが、これに限るものではない。例えば、Ｂ1 をソースレジスタ番号修飾とデスティネーションレジスタ番号修飾の一部のビットとして共通に用い、“Ｂ0 Ｂ1 ”を（前記実施形態と同様に）ソースレジスタ番号修飾に用いると共に、“Ｂ1 Ｂ2 ”をデスティネーション番号修飾に用いるようにしても構わない。この場合、デスティネーションレジスタ指定部のビット長は、前記実施形態と異なってｎビットで済む。
【０１２３】
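With this kind of register number qualification, the correspondence between instruction fields #0 to #7 and the register files that each field (through its destination register specifier) can specify as a destination differs from that of the embodiment above. Instruction fields #0 and #4 correspond to register file 207-0 (#0), instruction fields #1 and #5 to register file 207-1 (#1), instruction fields #2 and #6 to register file 207-2 (#2), and instruction fields #3 and #7 to register file 207-3 (#3).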
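This overlapped variant can be tabulated with a short sketch. The function name `overlapped_files` is hypothetical.

```python
def overlapped_files(field: int):
    """Variant where bit B1 is shared: source from "B0 B1", destination from "B1 B2"."""
    src = (field >> 1) & 0b11
    dst = field & 0b11
    return src, dst


# Group the eight fields by the register file they may write to.
dest_table = {}
for f in range(8):
    dest_table.setdefault(overlapped_files(f)[1], []).append(f)

# Distinct (source file, destination file) pairs reachable in one instruction.
reachable_pairs = {overlapped_files(f) for f in range(8)}

if __name__ == "__main__":
    print(dest_table)
    print(sorted(reachable_pairs))
```

The table reproduces the pairing of fields #0/#4, #1/#5, #2/#6, and #3/#7. Because B1 is shared, only 8 of the 16 possible source/destination file pairs are reachable by a single instruction. This is an observation about the bit layout rather than a statement from the specification. It suggests that moving a value between some pairs of files could take more than one copy.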
【０１２４】
次に本発明を１６並列の長命令語を実行するＶＬＩＷプロセッサに適用した第４の実施形態につき説明する。
［第４の実施形態］
図１０は、本発明を１６並列の長命令語を実行するＶＬＩＷプロセッサに適用した第４の実施形態におけるレジスタファイルと長命令語の各命令フィールドとの間の関係を示す。
【０１２５】
図１０に示すように、本実施形態においても、前記第３の実施形態と同様に、それぞれ２ⁿ 個のレジスタからなる４つのレジスタファイル２０７-0（＃０）〜２０７-3（＃３）が用いられる。
【０１２６】
本実施形態において、１６並列の長命令語の各命令フィールド＃０〜＃１５の命令は３オペランド形式の命令（演算命令の場合）であり、デスティネーションレジスタ指定部（ＯＰ１）、並びに第１及び第２ソースレジスタ指定部（ＯＰ２，ＯＰ３）のビット長は、いずれもｎビットである。ここで、デスティネーションレジスタ指定部（ＯＰ１）が、前記第３の実施形態で適用された８並列の長命令語の各命令フィールド＃０〜＃７のデスティネーションレジスタ部のビット数より１ビット少ないことに注意されたい。
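[0127]
In this embodiment, when the 4-bit field number of an instruction field #i (i = 0 to 15) is written "B0 B1 B2 B3", the register file to which the result of the instruction (operation instruction) in that field is written is determined by the lower two bits "B2 B3" of the field number. The destination register within the register file so determined is designated by the n bits of the destination register designation part (OP1). That is, as shown in FIG. 11, the register to which the result of the instruction in instruction field #i is written is designated by an (n + 2)-bit destination register number formed by adding the lower two bits "B2 B3" of the field number above the n-bit register number (OP1) designated by the destination register designation part of the instruction.
[0128]
On the other hand, the register file referenced by the instruction (operation instruction) in instruction field #i is determined by the upper two bits "B0 B1" of the field number. The registers referenced within that register file are designated by the n-bit first and second source register designation parts (OP2, OP3) in instruction field #i. That is, as shown in FIG. 11, the two source registers referenced by the operation of the instruction in instruction field #i are designated by (n + 2)-bit source register numbers (first and second source register numbers) formed by adding the upper two bits "B0 B1" of the field number above the n-bit register numbers (OP2, OP3) designated by the first and second source register designation parts of the instruction.
[0129]
In this embodiment, the upper two bits of the (n + 2)-bit register number designate register file 207-0 when they are "00", register file 207-1 when "01", register file 207-2 when "10", and register file 207-3 when "11"; the n bits other than the upper two indicate the register position within that register file.
[0130]
Therefore, as shown in FIG. 10, source operands are read from the registers (within the register file concerned) designated by the first and second source register designation parts of each instruction field as follows. Instructions in instruction fields #0, #1, #2 and #3, for which the upper two bits "B0 B1" of the field number "B0 B1 B2 B3" are "00" (field numbers 0 ("0000"), 1 ("0001"), 2 ("0010") and 3 ("0011")), read from register file 207-0. Instructions in instruction fields #4, #5, #6 and #7, for which "B0 B1" is "01" (field numbers 4 ("0100"), 5 ("0101"), 6 ("0110") and 7 ("0111")), read from register file 207-1. Instructions in instruction fields #8, #9, #10 and #11, for which "B0 B1" is "10" (field numbers 8 ("1000"), 9 ("1001"), 10 ("1010") and 11 ("1011")), read from register file 207-2. Instructions in instruction fields #12, #13, #14 and #15, for which "B0 B1" is "11" (field numbers 12 ("1100"), 13 ("1101"), 14 ("1110") and 15 ("1111")), read from register file 207-3.
[0131]
Likewise, the result of each instruction is written to the register designated by the destination register designation part of that instruction as follows. Instructions in instruction fields #0, #4, #8 and #12, for which the lower two bits "B2 B3" of the field number "B0 B1 B2 B3" are "00" (field numbers 0 ("0000"), 4 ("0100"), 8 ("1000") and 12 ("1100")), write to register file 207-0. Instructions in instruction fields #1, #5, #9 and #13, for which "B2 B3" is "01" (field numbers 1 ("0001"), 5 ("0101"), 9 ("1001") and 13 ("1101")), write to register file 207-1. Instructions in instruction fields #2, #6, #10 and #14, for which "B2 B3" is "10" (field numbers 2 ("0010"), 6 ("0110"), 10 ("1010") and 14 ("1110")), write to register file 207-2. Instructions in instruction fields #3, #7, #11 and #15, for which "B2 B3" is "11" (field numbers 3 ("0011"), 7 ("0111"), 11 ("1011") and 15 ("1111")), write to register file 207-3.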
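The register number modification of this embodiment can be illustrated with a minimal Python sketch. The function names `modify_register_numbers` and `file_index` and their parameters are assumptions made for illustration and do not appear in the specification; the sketch only restates the bit concatenation described above.

```python
def modify_register_numbers(field, op1, op2, op3, n):
    """Form the (n + 2)-bit register numbers for one field of a 16-field word.

    field: field number 0..15, read as bits B0 B1 B2 B3 (B0 is the MSB).
    op1, op2, op3: n-bit destination and source register designations.
    Returns (destination, source1, source2) as full register numbers.
    """
    if not 0 <= field <= 15:
        raise ValueError("field number must be in 0..15")
    limit = 1 << n
    if not all(0 <= op < limit for op in (op1, op2, op3)):
        raise ValueError("register designations must fit in n bits")

    source_file = (field >> 2) & 0b11  # upper bits B0 B1 select the file read
    dest_file = field & 0b11           # lower bits B2 B3 select the file written

    # The file index is placed above the n-bit register position.
    return ((dest_file << n) | op1,
            (source_file << n) | op2,
            (source_file << n) | op3)


def file_index(register_number, n):
    """Register file (0..3 for 207-0..207-3) named by a full register number."""
    return register_number >> n
```

For example, with n = 5, the instruction in field #6 ("0110") reads its source operands from register file 207-1 and writes its result to register file 207-2.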
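[0132]
As described above, this embodiment provides four register files 207-0 to 207-3, each consisting of 2ⁿ registers, modifies register numbers with the field numbers of the instruction fields #0 to #15 of the 16-parallel long instruction word, and restricts the registers usable from each instruction field (separately for source designation and destination designation) to one of the register files 207-0 to 207-3. As a result, the long instruction word as a whole can use 2ⁿ⁺² registers even though the destination register designation part and the first and second source register designation parts are each only n bits long.
[0133]
Moreover, in this embodiment, the number of input and output ports of each of the register files 207-0 to 207-3 can be reduced substantially compared with the conventional design, to 2 input ports and 4 output ports, while each file still holds 2ⁿ registers as before (for a 16-parallel long instruction word, the conventional design requires 16 input ports and 32 output ports).
[0134]
Although omitted from FIG. 10, bypass circuits corresponding to the bypass circuits 108-i and 109-i (i = 0, 1) in FIG. 1 are provided for each of the register files 207-0 to 207-3, and the number of input ports of these bypass circuits per input of each arithmetic unit (the number of multiplexer inputs) can be made 5 (for a 16-parallel long instruction word, the conventional number is 17).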
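The multiplexer input counts quoted above can be reproduced with a short Python sketch. It rests on an assumption that is not stated in this paragraph: each operand multiplexer takes one input from a register file output port plus one input per arithmetic unit whose result is written to the same register file, and the fields are divided evenly among the register files. The function name `bypass_mux_inputs` is hypothetical.

```python
def bypass_mux_inputs(num_fields, num_register_files=1):
    """Inputs of one operand multiplexer in a bypass circuit.

    Assumption: one input from the register file output port plus one
    input per arithmetic unit writing to the same register file.
    """
    if num_fields % num_register_files != 0:
        raise ValueError("fields must divide evenly among register files")
    writers_per_file = num_fields // num_register_files
    return writers_per_file + 1
```

With 16 fields and four register files this gives 5 inputs, and with a single shared register file it gives 17, matching the figures in the text.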
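[0135]
The third and fourth embodiments have been described for the case where the same modification method is applied to the first and second source registers; however, as in the second embodiment, different modification methods may be applied to the first source register and the second source register.
[0136]
In the first to fourth embodiments, the register files accessible from each instruction field of the long instruction word are restricted by field number (separately for source designation and destination designation), so the registers accessible from each instruction field (those it can designate as a source or a destination) are likewise restricted by that field (its field number). However, registers with predetermined register numbers (for example, the eight registers numbered 0 to 7) may be made commonly accessible from all instruction fields (that is, exempted from register modification by field number); various modifications not limited to the above embodiments are possible.
[0137]
Next, a compiler (parallel optimizing compiler) will be described that generates objects conforming to the instruction word format applied in the first to fourth embodiments described above, that is, the instruction word format in which modifying register numbers with the field number of each instruction field in the long instruction word restricts the register files usable from each instruction field (separately for source designation and destination designation).
[0138]
FIG. 12 is a block diagram showing one embodiment of this compiler.
In the figure, the parallel optimizing compiler 310 consists of the following functional elements: a lexical and syntax analysis unit 311, a scalar optimization unit 312, an instruction scheduling unit 313, a register allocation unit 314, and a code output unit 315.
[0139]
The parallel optimizing compiler 310 uses the lexical and syntax analysis unit 311 to perform well-known lexical and syntax analysis on the source program stored in the source file 320, detecting program errors and converting the program into a first internal form (intermediate code).
[0140]
Next, the parallel optimizing compiler 310 uses the scalar optimization unit 312 to apply well-known optimizations to the intermediate code generated by the lexical and syntax analysis unit 311, producing a program in a second internal form that contains no redundant processing and takes less time to execute. This program consists of a serial instruction sequence.
[0141]
The processing of the parallel optimizing compiler 310 up to this point is the same as in an ordinary compiler and is unrelated to VLIW.
Next, the parallel optimizing compiler 310 uses the instruction scheduling unit 313 to perform instruction scheduling, which schedules each instruction of the program in the second internal form generated by the scalar optimization unit 312. Instruction scheduling by the instruction scheduling unit 313, assuming for example the instruction word format applied in the first embodiment (see FIG. 3), will be described with reference to the flowchart of FIG. 13, taking as an example the case where an instruction I specifying the operation "a ← b + c" is scheduled top-down.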
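[0142]
First, the instruction scheduling unit 313 examines the field numbers of the instructions that define the source operands (b, c) of the instruction I to be scheduled (the instruction field positions in which those instructions are already placed) (step S1).
[0143]
Next, the instruction scheduling unit 313 determines whether the examined field numbers and the source operands (b, c) match, that is, whether the register file (as a destination) determined by the field number of the instruction that defines source operand b is the same as the register file (as a destination) determined by the field number of the instruction that defines source operand c (step S2).
[0144]
If the register files are the same, the instruction scheduling unit 313 places instruction I in an instruction field that can designate registers in that register file as sources (step S3).
[0145]
Thus, if the field numbers of the instruction that defines source operand b and the instruction that defines source operand c are both 0 or 2, the destination register file determined by those field numbers is register file 107-0 (#0) in both cases, so instruction I is placed in instruction field #0 or #1 (whichever is free). Likewise, if the field numbers of the two defining instructions are both 1 or 3, the destination register file determined by those field numbers is register file 107-1 (#1) in both cases, so instruction I is placed in instruction field #2 or #3 (whichever is free).
[0146]
If, on the other hand, the register files differ, the instruction scheduling unit 313 generates an instruction (a MOVE instruction) that copies one of the source operands b and c from the register file in which it resides to the register file in which the other source operand resides, and places this copy instruction in the instruction field determined by its source and destination (step S4).
[0147]
For example, if source operand b is in register file 107-0 (#0), source operand c is in register file 107-1 (#1), and c is to be copied to register file 107-0 (#0) as variable d, then the source is register file 107-1 (#1) and the destination is register file 107-0 (#0), so the copy instruction (d ← c) is placed in instruction field #2.
[0148]
After generating and placing the copy instruction (step S4), the instruction scheduling unit 313 places the instruction I′ "a ← b + d", which is equivalent to instruction I, in an instruction field that can use the copy destination register file of the copy instruction as its source register file (here, #0 or #1) (step S5).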
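Steps S1 to S5 can be sketched in Python for the four-field format of the first embodiment. This is a minimal illustration under stated assumptions, not the compiler's actual implementation: all function names are hypothetical, the sketch always copies c toward b when the files differ (the text allows either operand to be copied), and it returns candidate fields without checking which slots are free.

```python
def source_file(field):
    """Register file read by a field: bit B0 of the 2-bit field number."""
    return (field >> 1) & 1


def dest_file(field):
    """Register file written by a field: bit B1 of the 2-bit field number."""
    return field & 1


def fields_reading(register_file):
    """Fields whose source operands come from the given register file."""
    return [f for f in range(4) if source_file(f) == register_file]


def copy_field(from_file, to_file):
    """The single field that reads from_file and writes to_file."""
    return (from_file << 1) | to_file


def schedule_top_down(field_defining_b, field_defining_c):
    """Plan the placement of I: a <- b + c.

    Returns the field for a copy instruction (or None) and the candidate
    fields for I (or for I': a <- b + d when a copy is needed).
    """
    file_b = dest_file(field_defining_b)  # steps S1 and S2
    file_c = dest_file(field_defining_c)
    if file_b == file_c:                  # step S3: operands already match
        return {"copy_field": None, "candidates": fields_reading(file_b)}
    # Step S4: copy c into the file holding b; step S5: place I' next to b.
    return {"copy_field": copy_field(file_c, file_b),
            "candidates": fields_reading(file_b)}
```

When b was defined in field #0 and c in field #1, the sketch places the copy instruction in field #2 and offers fields #0 and #1 for I′, as in the example above.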
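[0149]
Next, instruction scheduling by the instruction scheduling unit 313, again assuming the instruction word format applied in the first embodiment (see FIG. 3), will be described with reference to the flowchart of FIG. 14, taking as an example the case where an instruction I that defines register (variable) a is scheduled bottom-up.
[0150]
First, the instruction scheduling unit 313 examines the field numbers of the instructions that use (as a source) the register (virtual register, variable) a defined by the instruction I to be scheduled (the instruction field positions in which those instructions are already placed) (step S11).
[0151]
Next, the instruction scheduling unit 313 determines whether the examined field numbers and the virtual register (destination) a defined by instruction I match; specifically, it determines from the examined field numbers whether virtual register (destination) a must reside in register file 107-0 (#0), in register file 107-1 (#1), or in both (step S12).
[0152]
The criterion for this determination is whether all instructions that use a are on only one side (instruction fields #0 and #1, or instruction fields #2 and #3) or on both sides; in other words, whether the register file that can be designated as a source, as determined by the field numbers of the instructions that use a, is a single file regardless of the number of such instructions (this state is described as destination register a and the field numbers matching).
[0153]
If all instructions that use a are in instruction field #0 or #1 only (destination register a and the field numbers match), a must be in register file 107-0 (#0); if they are in instruction field #2 or #3 only, a must be in register file 107-1 (#1). When only one instruction uses a, that instruction is necessarily on only one side, either in instruction field #0 or #1 or in instruction field #2 or #3.
[0154]
If, on the other hand, several instructions use a and they are distributed between the side of instruction fields #0 and #1 and the side of instruction fields #2 and #3 (destination register a and the field numbers do not match), a must be in both register file 107-0 (#0) and register file 107-1 (#1).
[0155]
When the instruction scheduling unit 313 determines that the variable (virtual register) a defined by instruction I must be in register file 107-0 (#0), it places instruction I in instruction field #0 or #2; when it determines that a must be in register file 107-1 (#1), it places instruction I in instruction field #1 or #3 (step S13).
[0156]
When, in contrast, it determines that the variable (virtual register) a defined by instruction I must be in both register file 107-0 (#0) and register file 107-1 (#1), it generates an instruction (x ← a) that copies a, as variable x, from register file 107-0 (#0) to register file 107-1 (#1) or from register file 107-1 (#1) to register file 107-0 (#0), and places this copy instruction in instruction field #1 or #2 (step S14).
[0157]
Here, if instruction I is to be placed in instruction field #0 or #2, an instruction that copies a from register file 107-0 (#0) to register file 107-1 (#1) is placed in instruction field #1; if instruction I is to be placed in instruction field #1 or #3, an instruction that copies a from register file 107-1 (#1) to register file 107-0 (#0) is placed in instruction field #2.
[0158]
After generating and placing the copy instruction (step S14), the instruction scheduling unit 313 places instruction I in instruction field #0 or #2 if the copy instruction was placed in instruction field #1, and in instruction field #1 or #3 if the copy instruction was placed in instruction field #2 (step S15).
[0159]
At this time, if the copy instruction was placed in instruction field #1, the instruction scheduling unit 313 changes a to x in those already-placed instructions using a that are in instruction fields #2 and #3; if the copy instruction was placed in instruction field #2, it changes a to x in those already-placed instructions using a that are in instruction fields #0 and #1.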
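Steps S11 to S15 can be sketched in the same way. Again this is only an illustration under assumptions: the function names and the `home_file` parameter (which register file instruction I itself writes when a copy is needed) are hypothetical, and free-slot checking is omitted.

```python
def source_file(field):
    """Register file read by a field: bit B0 of the 2-bit field number."""
    return (field >> 1) & 1


def fields_writing(register_file):
    """Fields whose result goes to the given register file (bit B1)."""
    return [f for f in range(4) if (f & 1) == register_file]


def schedule_bottom_up(user_fields, home_file=0):
    """Plan the placement of an instruction I that defines variable a.

    user_fields: fields of the already-placed instructions that use a.
    home_file: file that I writes when a is needed in both files (assumed
    to be chosen by the caller).
    """
    if not user_fields:
        raise ValueError("a must be used by at least one placed instruction")
    if home_file not in (0, 1):
        raise ValueError("home_file must be 0 or 1")

    needed = {source_file(f) for f in user_fields}  # steps S11 and S12
    if len(needed) == 1:                            # match: step S13
        (only_file,) = needed
        return {"copy_field": None,
                "candidates": fields_writing(only_file),
                "renamed_users": []}

    other_file = 1 - home_file
    return {
        # Step S14: the copy x <- a reads home_file and writes other_file.
        "copy_field": (home_file << 1) | other_file,
        # Step S15: I itself must write home_file.
        "candidates": fields_writing(home_file),
        # Users reading the other file now refer to x instead of a.
        "renamed_users": sorted(f for f in user_fields
                                if source_file(f) == other_file),
    }
```

For users in fields #0 and #3 with `home_file=0`, the copy instruction goes to field #1, instruction I to field #0 or #2, and the user in field #3 is rewritten to read x.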
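[0160]
When the instruction scheduling unit 313 has carried out the above scheduling for every instruction of the program in the second internal form generated by the scalar optimization unit 312, proceeding from the first instruction to the last (top-down) or from the last instruction to the first (bottom-up), the parallel optimizing compiler 310 uses the register allocation unit 314 to allocate physical registers to the variables in the scheduled instructions (register allocation). Register allocation by the register allocation unit 314 will be described with reference to the flowchart of FIG. 15.
[0161]
The register allocation unit 314 scans the instructions scheduled by the instruction scheduling unit 313 and classifies all variables by register file according to the field numbers of the instructions in which each variable (virtual register) is referenced or defined (step S21). Here, the variables are divided into two classes: those that must reside in register file 107-0 (#0) and those that must reside in register file 107-1 (#1).
[0162]
Next, for each variable in each class, the register allocation unit 314 allocates, class by class, a physical register in the register file corresponding to that class (step S22).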
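Steps S21 and S22 can be sketched as follows for the two register files of the first embodiment. The data layout (tuples of field number, defined variable, and used variables) and the function names are assumptions for illustration. The allocation step is deliberately simplistic: it gives every variable its own register and ignores liveness, which a real allocator would not do.

```python
from collections import defaultdict


def classify_variables(scheduled):
    """Step S21: group variables by the register file they must live in.

    scheduled: iterable of (field, defined_variable, used_variables) for
    the four-field format (B0 selects the source file, B1 the destination).
    """
    classes = defaultdict(set)
    home = {}
    for field, defined, used in scheduled:
        placements = [(defined, field & 1)]
        placements += [(var, (field >> 1) & 1) for var in used]
        for var, register_file in placements:
            # After scheduling, each variable should map to exactly one file.
            if home.setdefault(var, register_file) != register_file:
                raise ValueError(f"{var} is needed in more than one register file")
            classes[register_file].add(var)
    return dict(classes)


def allocate_registers(classes, n):
    """Step S22: assign each variable a physical register in its own file."""
    allocation = {}
    for register_file, variables in classes.items():
        if len(variables) > (1 << n):
            raise ValueError("not enough registers in the register file")
        for position, var in enumerate(sorted(variables)):
            allocation[var] = (register_file << n) | position
    return allocation
```

The returned register numbers are the full (n + 1)-bit numbers; the n-bit value actually encoded in an instruction field would be the lower n bits.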
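[0163]
When register allocation by the register allocation unit 314 is complete, the parallel optimizing compiler 310 uses the code output unit 315 to generate, from the register-allocated instructions in internal form, code (object code) executable on the computer (here, the VLIW processor), and outputs it as the object file 330.
[0164]
[Effects of the Invention]
As described in detail above, according to the present invention, a plurality of register files are provided, register numbers are modified by the field number of each instruction field in the long instruction word (VLIW), and the registers usable from each instruction field (separately for source designation and destination designation) are restricted to one of the register files. This makes it possible to increase the number of registers that the long instruction word as a whole can handle without lengthening the long instruction word, and without complicating the hardware configuration.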
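[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the schematic configuration of a VLIW processor according to the first embodiment of the present invention.
[FIG. 2] A block diagram showing the internal configuration of the bypass circuits 109-0 and 109-1 in FIG. 1 together with the surrounding configuration.
[FIG. 3] A diagram for explaining the modification of register numbers by the field number of an instruction field in the first embodiment, mainly using instruction field #1, whose field number is 1 ("01"), as an example.
[FIG. 4] A diagram showing the relationship between the register files 107-0 and 107-1 and the instruction fields #0 to #3 (field numbers "00" to "11") of the long instruction word in the first embodiment.
[FIG. 5] A block diagram showing the schematic configuration of a VLIW processor according to the second embodiment of the present invention, in the same form as FIG. 2.
[FIG. 6] A diagram for explaining the modification of register numbers by the field number of an instruction field in the second embodiment.
[FIG. 7] A diagram showing the relationship between the register files 107-0 and 107-1 and the instruction fields #0 to #3 (field numbers "00" to "11") of the long instruction word in the second embodiment.
[FIG. 8] A diagram showing the relationship between the register files and the instruction fields of the long instruction word in the third embodiment, in which the present invention is applied to a VLIW processor that executes 8-parallel long instruction words.
[FIG. 9] A diagram for explaining the modification of register numbers by the field number of an instruction field in the third embodiment.
[FIG. 10] A diagram showing the relationship between the register files and the instruction fields of the long instruction word in the fourth embodiment, in which the present invention is applied to a VLIW processor that executes 16-parallel long instruction words.
[FIG. 11] A diagram for explaining the modification of register numbers by the field number of an instruction field in the fourth embodiment.
[FIG. 12] A block diagram showing one embodiment of a compiler for generating objects that conform to the instruction word format applied in the first to fourth embodiments.
[FIG. 13] A flowchart for explaining instruction scheduling in the compiler of FIG. 12 when scheduling is performed top-down.
[FIG. 14] A flowchart for explaining instruction scheduling in the compiler of FIG. 12 when scheduling is performed bottom-up.
[FIG. 15] A flowchart for explaining register allocation in the compiler of FIG. 12.
[FIG. 16] A diagram showing, for a conventional VLIW processor, the relationship between the number of registers usable by the instruction in each instruction field of the long instruction word and the number of bits of the register designation parts in that instruction, for the case of 2ⁿ registers and the case of twice as many, 2ⁿ⁺¹ registers.
[FIG. 17] A block diagram showing the schematic configuration of a conventional VLIW processor.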
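[Explanation of Reference Numerals]
101 ... Instruction fetch mechanism,
102 ... Instruction decode mechanism,
106-0 to 106-3 ... Arithmetic units,
107-0, 107-1, 207-0 to 207-3 ... Register files,
108-0, 108-1, 109-0, 109-1, 209-0, 209-1 ... Bypass circuits,
110, 111 ... Latch circuits,
119L0 to 119L3, 119R0 to 119R3, 219L0 to 219L3, 219R0 to 219R3 ... Multiplexers (MPX)
[0001]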
BACKGROUND OF THE INVENTION
The present invention relates to a VLIW processor that executes a very long instruction word (VLIW) having a plurality of instruction fields.
[0002]
[Prior art]
Conventionally, attempts have been made to raise the operating frequency of processors in order to improve their performance, but circuit integration density, power consumption, device speed, and similar factors are approaching their physical limits. Today, therefore, processors that adopt architectures executing multiple instructions simultaneously (in parallel), such as superscalar and very long instruction word (VLIW) architectures, have been developed and are widely used to achieve faster processing.
[0003]
Now, in a processor that executes long instruction words (a VLIW processor), if the number of registers usable by each instruction (unit instruction) in the long instruction word is 2ⁿ, an n-bit register designation part is required to designate one register (a destination register or a source register) in that instruction (instruction field). Thus, for an instruction that handles three operands, as shown in FIG. 16A, an n-bit register designation part is required for each of the three operands, and the number of bits needed for register designation in the whole instruction is 3n.
[0004]
If, in order to reduce the number of memory accesses and achieve high-speed processing, the number of usable registers is to be doubled to 2ⁿ⁺¹, the number of bits of each register designation part in each instruction (here, the three register designation parts OP1 to OP3) must be increased from the n bits shown in FIG. 16A to n + 1 bits as shown in FIG. 16B; in the three-operand example, each instruction must grow by 3 bits.
[0005]
On the other hand, to achieve higher performance in a processor that executes such long instruction words (a VLIW processor), the number of instructions (unit instructions) that can be executed simultaneously must be increased to raise the degree of parallelism. This can be done by lengthening the long instruction word and increasing the number of instruction fields. However, increasing the number of instruction fields (instructions) in the long instruction word also increases the number of registers that must be read at once (in one cycle), and hence the number of register file ports, and enlarges the bypass circuits required for pipeline processing (for example, a bypass circuit that routes an operation result to the arithmetic units without passing through the write stage of the pipeline). This increases hardware complexity and hinders raising the operating frequency.
[0006]
FIG. 17 shows such an example for the pipeline configuration of a VLIW processor that can execute four instructions in parallel. It comprises two-input, one-output arithmetic units 221-0 to 221-3, equal in number to the four instructions executable in parallel; a latch circuit 222 consisting of buffers 222-0 to 222-3 that temporarily hold the results of the arithmetic units 221-0 to 221-3; a bypass circuit 223 that selectively bypasses the outputs of the buffers 222-0 to 222-3 to the left input (L input) or right input (R input) of the arithmetic units 221-0 to 221-3; and a register file 224 having as many input ports as the number of instructions executable in parallel (four) and twice as many output ports.
[0007]
The bypass circuit 223 consists of multiplexers (MPX) 223L0 to 223L3 provided for the left inputs of the arithmetic units 221-0 to 221-3 and multiplexers 223R0 to 223R3 provided for the right inputs of the arithmetic units 221-0 to 221-3. The multiplexers 223L0 to 223L3 and 223R0 to 223R3 correspond one-to-one to different output ports of the register file 224; each selects one of the output from its corresponding output port and the outputs of the buffers 222-0 to 222-3 and supplies it to the corresponding input of the arithmetic units 221-0 to 221-3.
[0008]
Thus, when the number of instructions that can be executed in parallel is 4, the number of multiplexers constituting the bypass circuit is 2 × 4, and the number of inputs of each multiplexer is 4 + 1 = 5. Therefore, the total number of inputs of all the multiplexers in the bypass circuit, that is, the number of input ports of the bypass circuit is 5 × 2 × 4 = 40.
[0009]
In general, in the conventional VLIW processor, when the number of instruction fields of the long instruction word to be executed, that is, the number of simultaneously executed instructions (the degree of parallelism), is N, then for an instruction format that handles three operands the register file has N input ports and 2N output ports, and the bypass circuit has (N + 1) × 2N input ports (each of the 2N multiplexers constituting the bypass circuit has N + 1 inputs) and 2N output ports.
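The port counts of the conventional design can be tabulated with a short Python sketch. The function name `conventional_port_counts` and the dictionary keys are hypothetical; the formulas are the ones given above.

```python
def conventional_port_counts(n_fields):
    """Port counts of the conventional design for N parallel instructions.

    Assumes the three-operand format described in the text (one result,
    two source operands per instruction).
    """
    mux_count = 2 * n_fields       # one multiplexer per arithmetic unit input
    inputs_per_mux = n_fields + 1  # N result buffers plus one register file port
    return {
        "register_file_inputs": n_fields,
        "register_file_outputs": 2 * n_fields,
        "bypass_inputs": inputs_per_mux * mux_count,
        "bypass_outputs": mux_count,
    }
```

For N = 4 this reproduces the 40 bypass input ports computed for FIG. 17, and the quadratic term (N + 1) × 2N shows why the bypass circuit grows quickly as the degree of parallelism rises.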
[0010]
[Problems to be solved by the invention]
As described above, in the conventional VLIW processor, increasing the number of usable registers requires increasing the number of bits of the register designation parts in each instruction (unit instruction), which lengthens the long instruction word (VLIW).
[0011]
In addition, in the conventional VLIW processor, when the degree of parallelism is increased, the number of registers that must be read at a time increases, leading to an increase in the number of register file ports, and the size of the bypass circuit required for pipeline processing increases. There was a problem that the complexity of hardware increased.
[0012]
The present invention has been made in view of the above circumstances, and the object thereof is to increase the number of registers that can be handled without increasing the length of the long instruction word (VLIW), and without increasing the complexity of the hardware configuration. It is to provide a high performance VLIW processor.
[0013]
[Means for Solving the Problems]
The present invention is a VLIW processor that executes a long instruction word (VLIW) having a plurality of instruction fields, comprising: a plurality of register files; and allocating means that, based on a first predetermined part of the field number of each instruction field in the long instruction word, allocates the register file from which the source operands referenced by the instruction in that field can be read, and, based on a second predetermined part of the field number that differs at least partly from the first predetermined part, allocates the register file to which the execution result of the instruction in that field can be written. For the allocation of the writable register file, for example, the register file may instead be allocated according to information formed by concatenating the first predetermined part with a part (the upper bits) of the destination register designation part of each instruction field. The first predetermined part and the second predetermined part may also partly overlap.
[0014]
Each register in the plurality of register files is assigned a unique register number. To designate a register by this unique register number, the register number indicated by the source register designation part of each instruction field in the long instruction word may be modified by the first predetermined part of the field number of that instruction field, and the register number indicated by the destination register designation part of each instruction field may be modified by a second predetermined part of the field number that differs from the first predetermined part. A preferable modification method is to add the first predetermined part of the field number above (as the upper bits of) the register number indicated by the source register designation part of the instruction field, and to add the second predetermined part of the field number above the register number indicated by the destination register designation part of the instruction field.
[0015]
In the VLIW processor configured as described above, the field number of each instruction field of the long instruction word determines both the register file from which the source operands referenced by the instruction in that field can be read and the register file to which its execution result can be written. The register designation parts (source register designation part and destination register designation part) of each instruction field therefore only need to indicate a register position (relative position, or relative register number) within the register file, so the number of bits in the register designation parts need not grow even when the number of registers usable by the long instruction word as a whole is increased. When part of the register designation part (its upper bits) is used together with the field number to determine the register file, the register designation part does need more bits, but fewer than would be needed if the field number were not used.
[0016]
Moreover, in the VLIW processor configured as described above, the target register file is restricted, separately for source designation and destination designation, for each instruction field of the long instruction word, so the number of input ports and output ports of each register file can be reduced. For the same reason, the number of input ports and output ports of the bypass circuits can be reduced. As a result, even if the degree of parallelism is raised (even if the number of instruction fields in the long instruction word is increased), a significant increase in hardware complexity can be avoided.
[0017]
Furthermore, although the register files usable from each instruction field of the long instruction word are restricted in the VLIW processor configured as described above, source designation applies register modification using the first predetermined part of the field number of each instruction field, while destination designation applies register modification using a second predetermined part different from the first. The operation result of an instruction in one instruction field can therefore be referenced by an instruction in another instruction field.
[0018]
To generate a program (object program) executable on such a VLIW processor, that is, an object program conforming to the instruction word format in which modifying register numbers with the field number of each instruction field in the long instruction word restricts the register files usable from each instruction field (separately for source designation and destination designation), a compile function that performs the instruction scheduling and register allocation described below may be provided.
[0019]
For example, when instruction scheduling is performed top-down and the instruction to be scheduled is instruction I, the compiler examines the field numbers of the instructions that define the source operands referenced by I (the instruction fields in which those instructions are already placed) and checks whether the field numbers and the source operands match (for an instruction that handles three operands, whether the destination register files determined by the field numbers of the two instructions that define the two source operands are the same). If they match, I is placed in an instruction field that can designate registers in that register file as sources. If they do not match, a copy instruction that copies a source operand between register files so that they do match is generated and placed in the instruction field determined by its source and destination, and I is then placed in an instruction field that can designate the copy destination register file of the copy instruction as its source register file.
[0020]
When instruction scheduling is performed bottom-up and the instruction to be scheduled is instruction I, the compiler examines the field numbers of all instructions that use the virtual register (variable) a defined by I (the instruction fields in which those instructions are already placed) and checks whether the virtual register and the field numbers match, that is, whether the register file that can be designated as a source, as determined by the field numbers of the instructions that use a, is a single file regardless of the number of such instructions. If they match, I is placed in the instruction field determined by the field numbers of the instructions that use a. If they do not match, a copy instruction that copies a to the required register file is generated, so that a exists in every register file determined by the field numbers of the instructions that use a, and is placed in the instruction field determined by its source and destination; I is then placed in an instruction field that can designate the copy source register file of the copy instruction as its destination.
[0021]
Once this instruction scheduling has been performed from the first instruction to the last (top-down) or from the last instruction to the first (bottom-up), a register allocation process is performed that scans the scheduled instructions, classifies all variables by register file according to the field numbers of the instructions in which each variable (virtual register) is referenced or defined, and, class by class, allocates to each variable a physical register in the register file corresponding to its class.
[0022]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[First Embodiment]
FIG. 1 is a block diagram showing a schematic configuration of a VLIW processor according to the first embodiment of the present invention.
[0023]
The VLIW processor shown in FIG. 1 is an arithmetic processor that executes, for example, 4-parallel long instruction words (4-parallel VLIW) having four instruction fields #0 to #3 in a three-operand instruction format. It comprises an instruction fetch mechanism 101, an instruction decode mechanism 102, pipeline registers (PR) 103 to 105, arithmetic units 106-0 to 106-3, register files 107-0 and 107-1, bypass circuits 108-0 and 108-1 of the decode (D) stage, bypass circuits 109-0 and 109-1 of the execution (E) stage, and latch circuits 110 and 111.
[0024]
The instruction fetch mechanism 101 handles the I stage (instruction fetch stage) of the pipeline, in which a long instruction word is fetched (read) from an instruction cache (not shown) or the like.
[0025]
The instruction decode mechanism 102 handles the D stage (instruction decode stage) of the pipeline, in which each instruction (unit instruction) placed in the instruction fields #0 to #3 of the long instruction word fetched by the instruction fetch mechanism 101 is decoded. This embodiment uses instructions in a three-operand format, that is, instructions (for example, operation instructions) having three register designation parts (a destination register designation part and first and second source register designation parts). Therefore, when an operation instruction is decoded by the instruction decode mechanism 102, the decode result includes a destination register number (OP1) indicating where the operation result is stored and two source register numbers (first and second source register numbers OP2 and OP3) designating the registers that hold the source operands used in the operation.
[0026]
The pipeline register 103 holds the long instruction word fetched by the instruction fetch mechanism 101 for the duration of the D stage; the pipeline register 104 holds the decode result of the instruction decode mechanism 102 for the duration of the E stage (instruction execution stage) that follows the D stage; and the pipeline register 105 holds the output of the pipeline register 103 for the duration of the W stage (write stage) that follows the E stage.
[0027]
The arithmetic units 106-0 to 106-3 are responsible for executing (in the E stage) the operations indicated by the instructions in the instruction fields #0 to #3 of the long instruction word.
The register files 107-0 (#0) and 107-1 (#1) each consist of 2ⁿ registers for storing operation results in the VLIW processor. The 2ⁿ registers in the register file 107-0 are assigned register numbers 0 to 2ⁿ − 1, and the 2ⁿ registers in the register file 107-1 are assigned register numbers 2ⁿ to 2 × 2ⁿ − 1, that is, 2ⁿ to 2ⁿ⁺¹ − 1. A register number is thus n + 1 bits long: its most significant bit designates the register file in which the register exists (register file 107-0 for "0", register file 107-1 for "1"), and the remaining n bits (the lower n bits) designate the register position within that register file.
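The numbering scheme can be checked with a minimal Python sketch. The function name `locate_register` is an assumption for illustration; the split into most significant bit and lower n bits follows the description above.

```python
def locate_register(register_number, n):
    """Split an (n + 1)-bit register number into (file index, position).

    File index 0 stands for register file 107-0 and 1 for 107-1.
    """
    if not 0 <= register_number < (1 << (n + 1)):
        raise ValueError("register number must fit in n + 1 bits")
    file_index = register_number >> n              # most significant bit
    position = register_number & ((1 << n) - 1)    # lower n bits
    return file_index, position
```

With n = 4, register numbers 0 to 15 fall in register file 107-0 and 16 to 31 in register file 107-1.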
[0028]
On the other hand, each of the three register designation parts of each instruction in the long instruction word used by the VLIW processor of FIG. 1 is n bits long. The register designation parts alone therefore cannot designate all of the 2ⁿ⁺¹ registers provided by the register files 107-0 and 107-1 together.
[0029]
Therefore, in this embodiment, as described below, the registers usable from each of the instruction fields #0 to #3 of the long instruction word (and separately for source designation and destination designation) are restricted to one of the register files 107-0 and 107-1, and each n-bit register designation part in the instruction of that field designates a register position within the restricted register file (the n bits of the (n + 1)-bit register number excluding the most significant bit). With this configuration, the long instruction word as a whole can designate 2ⁿ⁺¹ registers even though each register designation part is only n bits long.
[0030]
First, in this embodiment, the register file 107-0 is fixedly assigned as the reference source of the source operands used by the operations indicated by the instructions in instruction fields #0 and #1 (field numbers 0 and 1), and the register file 107-1 is fixedly assigned as the reference source of the source operands used by the operations indicated by the instructions in instruction fields #2 and #3 (field numbers 2 and 3).
[0031]
Further, the register file 107-0 is fixedly assigned as the write destination of the results of the operations indicated by the instructions in instruction fields #0 and #2 (field numbers 0 and 2), that is, the results of the arithmetic units 106-0 and 106-2 among the arithmetic units 106-0 to 106-3, and the register file 107-1 is fixedly assigned as the write destination of the results of the operations indicated by the instructions in instruction fields #1 and #3 (field numbers 1 and 3), that is, the results of the arithmetic units 106-1 and 106-3.
[0032]
This assignment is realized by connecting the outputs of the arithmetic units 106-0 and 106-2 to the input ports of the register file 107-0 and the outputs of the arithmetic units 106-1 and 106-3 to the input ports of the register file 107-1 (in each case via the latch circuit 111), by connecting the output ports of the register file 107-0 to the inputs of the arithmetic units 106-0 and 106-1 (via the bypass circuit 108-0, the latch circuit 110, and the bypass circuit 109-0), and by connecting the output ports of the register file 107-1 to the inputs of the arithmetic units 106-2 and 106-3 (via the bypass circuit 108-1, the latch circuit 110, and the bypass circuit 109-1). The outputs of the arithmetic units 106-0 and 106-2 are also connected to the bypass circuits 108-0 and 109-0, and the outputs of the arithmetic units 106-1 and 106-3 are also connected to the bypass circuits 108-1 and 109-1.
[0033]
With the above configuration, in this embodiment the n-bit register designation part of each instruction in the long instruction word designates the register position within the register file, that is, the n bits of the (n + 1)-bit register number excluding the most significant bit, while the register file in which the register at that position exists (register file 107-0 or 107-1), that is, the most significant bit of the (n + 1)-bit register number, is indicated by the instruction position (the field number of the instruction field).
[0034]
This is equivalent to designating (n + 1)-bit register numbers by modifying the n-bit register numbers (OP1, OP2, OP3) designated by the three register designation parts (the destination register designation part and the first and second source register designation parts) of each instruction field with the instruction position (the field number of the instruction field).
[0035]
Here, when the field numbers 0 ("00") to 3 ("11") of the instruction fields #0 to #3 of the long instruction word are written as two bits "B0 B1", bit B0 is used to modify the source register number (to designate the source register file) and bit B1 is used to modify the destination register number (to designate the destination register file). As source registers, registers in the register file 107-0 are designated in instruction fields #0 and #1, where bit B0 is "0", and registers in the register file 107-1 are designated in instruction fields #2 and #3, where bit B0 is "1". As destination registers, registers in the register file 107-0 are designated in instruction fields #0 and #2, where bit B1 is "0", and registers in the register file 107-1 are designated in instruction fields #1 and #3, where bit B1 is "1".
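The mapping from field number to register files can be written out as a small Python table. The names `field_register_files` and `FIELD_TABLE` are hypothetical; file index 0 stands for register file 107-0 and 1 for 107-1.

```python
def field_register_files(field):
    """Return (source file, destination file) for a field number "B0 B1"."""
    if not 0 <= field <= 3:
        raise ValueError("field number must be in 0..3")
    b0 = (field >> 1) & 1  # modifies the source register numbers
    b1 = field & 1         # modifies the destination register number
    return b0, b1


# One entry per instruction field #0..#3.
FIELD_TABLE = {field: field_register_files(field) for field in range(4)}
```

Each of the four (source, destination) combinations occurs exactly once. Fields #1 and #2 read one register file and write the other, so a value produced in either register file can be moved to the other by an instruction placed in the appropriate field.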
[0036]
The register file 107-0 has two input ports that are half the number of instructions that can be executed in parallel, and four output ports (P00, P01, P02, P03) that are twice the number of input ports. The register file 107-1 also has two input ports that are half the number of instructions that can be executed in parallel, and four output ports (P10, P11, P12, P13) that are twice the number of input ports.
[0037]
The bypass circuit 108-0 corresponds to the D stage, and as four source operands used for operations in the arithmetic units 106-0 and 106-1 (operations indicated by the instructions in the instruction fields # 0 and # 1), Basically, the data to be read from the register in the register file 107-0 indicated by each source register designating part in the instructions in the instruction fields # 0 and # 1 is selected. However, if the source register matches the storage destination register (destination register) of the operation result of the arithmetic unit 106-0 or 106-2, the bypass circuit 108-0 does not use the data of the register but the operation D stage bypass is performed to select the result, that is, the operation result of the instruction two cycles before.
[0038]
The bypass circuit 108-1 corresponds to the D stage in the same manner as the bypass circuit 108-0, and is used for the calculation in the calculators 106-2 and 106-3 (the calculation indicated by the instructions in the instruction fields # 2 and # 3). As four source operands to be used, data to be read from the register in the register file 107-1 indicated by each source register designating part in the instructions in the instruction fields # 2 and # 3 is basically selected. However, if the source register matches the storage destination register (destination register) of the operation result of the arithmetic unit 106-1 or 106-3, the bypass circuit 108-1 does not use the data of the register, but the operation D stage bypass is performed to select the result, that is, the operation result of the instruction two cycles before.
[0039]
The bypass circuit 109-0 corresponds to the E stage. As the four source operands used for the operations in the arithmetic units 106-0 and 106-1 (the operations indicated by the instructions in the instruction fields #0 and #1), it basically selects the data guided from the bypass circuit 108-0 through the latch circuit 110. However, for any source register indicated by the source register designating parts of the instructions in the instruction fields #0 and #1 that matches the destination register (storage destination register) of the operation result of the arithmetic unit 106-0 or 106-2, the bypass circuit 109-0 performs an E-stage bypass that selects, instead of the register data (the data guided from the bypass circuit 108-0 via the latch circuit 110), that operation result, that is, the operation result of the instruction one cycle before (the immediately preceding instruction).
[0040]
The bypass circuit 109-1 corresponds to the E stage in the same manner as the bypass circuit 109-0. As the four source operands used for the operations in the arithmetic units 106-2 and 106-3 (the operations indicated by the instructions in the instruction fields #2 and #3), it basically selects the data guided from the bypass circuit 108-1 through the latch circuit 110. However, for any source register indicated by the source register designating parts of the instructions in the instruction fields #2 and #3 that matches the destination register (storage destination register) of the operation result of the arithmetic unit 106-1 or 106-3, the bypass circuit 109-1 performs an E-stage bypass that selects, instead of the register data (the data guided from the bypass circuit 108-1 via the latch circuit 110), that operation result (the operation result of the instruction one cycle before).
[0041]
The latch circuit 110 is used to hold, during the E stage, the source operands selected by the bypass circuits 108-0 and 108-1. The latch circuit 111 is used to hold, during the W stage, the results of the operations performed on the source operands selected by the bypass circuits 109-0 and 109-1.
[0042]
FIG. 2 shows the internal configuration of the bypass circuits 109-0 and 109-1 along with the peripheral configuration thereof.
The bypass circuit 109-0 comprises multiplexers (MPX) 119L0 and 119L1, provided corresponding to the left inputs of the arithmetic units 106-0 and 106-1, and multiplexers (MPX) 119R0 and 119R1, provided corresponding to the right inputs of the arithmetic units 106-0 and 106-1.
[0043]
The multiplexers 119L0, 119R0, 119L1, and 119R1 correspond one-to-one with the output ports P00, P01, P02, and P03 of the register file 107-0. Each selects one of the data read from its corresponding output port (guided through the bypass circuit 108-0 and the latch circuit 110 in FIG. 1) and the operation results of the arithmetic units 106-0 and 106-2 (guided through the latch circuit 111), and outputs it to the corresponding input of the arithmetic unit 106-0 or 106-1.
[0044]
The bypass circuit 109-1 comprises multiplexers (MPX) 119L2 and 119L3, provided corresponding to the left inputs of the arithmetic units 106-2 and 106-3, and multiplexers (MPX) 119R2 and 119R3, provided corresponding to the right inputs of the arithmetic units 106-2 and 106-3.
[0045]
The multiplexers 119L2, 119R2, 119L3, and 119R3 correspond one-to-one with the output ports P10, P11, P12, and P13 of the register file 107-1. Each selects one of the data read from its corresponding output port (guided through the bypass circuit 108-1 and the latch circuit 110 in FIG. 1) and the operation results of the arithmetic units 106-1 and 106-3 (guided through the latch circuit 111), and outputs it to the corresponding input of the arithmetic unit 106-2 or 106-3.
[0046]
The latch circuit 111 includes buffers 111-0 to 111-3 that hold the calculation results of the calculators 106-0 to 106-3 during the W stage. The data held in the buffers 111-0 and 111-2 is used for writing to the register file 107-0, and the data held in the buffers 111-1 and 111-3 is used for writing to the register file 107-1.
[0047]
In FIG. 2, the bypass circuits 108-0 and 108-1 and the latch circuit 110 are omitted, but the hardware configuration is the same as that of the bypass circuits 109-0 and 109-1 and the latch circuit 111.
[0048]
Next, the operation in the configuration of FIGS. 1 and 2 will be described.
The pipeline applied in the VLIW processor of FIG. 1 is assumed to have four stages: (1) an I stage in which the instruction is fetched, (2) a D stage in which the instruction is decoded and, based on the decode result, registers are read (source operands are read), (3) an E stage in which the instruction is executed (the operation is performed), and (4) a W stage in which the operation result is written to a register. There are also pipelines composed of five or six stages, in which two stages are required for instruction decoding, register reading, and register writing.
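The four-stage flow can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that one long instruction word enters the I stage every cycle and that no stalls occur; the names STAGES and stage_occupancy are hypothetical and do not appear in this specification.

```python
# Hypothetical model: one long instruction word is issued per cycle, no stalls.
STAGES = ["I", "D", "E", "W"]

def stage_occupancy(cycle, num_words):
    """Return {stage: index of the long instruction word in that stage}."""
    occupancy = {}
    for depth, stage in enumerate(STAGES):
        word = cycle - depth  # a word issued at cycle c is in stage `depth` at c + depth
        if 0 <= word < num_words:
            occupancy[stage] = word
    return occupancy

if __name__ == "__main__":
    for cycle in range(5):
        print(cycle, stage_occupancy(cycle, num_words=10))
```

Under these assumptions, the word in the W stage is always the one issued two cycles before the word in the D stage and one cycle before the word in the E stage, which is the relationship the D-stage and E-stage bypasses rely on.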
[0049]
First, in the I stage, a long instruction word (VLIW) is fetched from the instruction cache or the like by the instruction fetch mechanism 101. The long instruction word fetched by the instruction fetch mechanism 101 is held in the pipeline register 103 and used for instruction decoding at the D stage by the instruction decoding mechanism 102.
[0050]
In this D stage, when the instruction in the instruction field #i (i = 0 to 3) decoded by the instruction decoding mechanism 102 is, for example, an arithmetic instruction, and the field number (2 bits) of that instruction field is "B0 B1", data (source operands) are read from the registers designated by the first and second source register designating parts of the instruction, within the register file determined by the value of the higher-order bit B0 (the register file 107-0 if B0 = 0, the register file 107-1 if B0 = 1).
[0051]
Therefore, for the instructions in the instruction fields #0 and #1, whose field numbers "B0 B1" have B0 = "0", that is, whose field numbers are 0 ("00") and 1 ("01"), the source operands are read from the register file 107-0. For the instructions in the instruction fields #2 and #3, whose B0 is "1", that is, whose field numbers are 2 ("10") and 3 ("11"), the source operands are read from the register file 107-1. In each case, the registers read are those (within that register file) designated by the first and second source register designating parts in the instruction field.
[0052]
This is equivalent to performing a register number modification in which the bit B0 of the field number "B0 B1" of each of the instruction fields #0 to #3 is added above the n-bit register numbers (OP2 and OP3) indicated by the first and second source register designating parts, and reading data from the source registers designated by the resulting (n+1)-bit source register numbers (the first and second source register numbers). Here, the most significant bit of the (n+1)-bit source register number, that is, the bit B0, designates the register file 107-0 when it is "0" and the register file 107-1 when it is "1", and the n bits excluding the most significant bit, that is, the n bits indicated by the source register designating part, indicate the position of the source register within that register file.
[0053]
In this embodiment, the first source register designating part designates the register of the source operand for the left input of the arithmetic unit, and the second source register designating part designates the register of the source operand for the right input of the arithmetic unit.
[0054]
FIG. 3 shows the modification of the source register number by the field number of the instruction field described above, together with the modification of the destination register number described later, mainly using the instruction field #1, whose field number is 1 ("01"), as an example. With this register number modification, even though the source register designating part in each instruction is only n bits long, the long instruction word as a whole can handle 2ⁿ⁺¹ registers.
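The source register number modification described above can be sketched in Python as follows. This is an illustrative model only: the function names source_register_number and split_register_number are hypothetical, and the field number is taken as a 2-bit integer whose higher-order bit is B0.

```python
def source_register_number(field_number, op, n):
    """Add bit B0 of the 2-bit field number above the n-bit source designation."""
    assert 0 <= field_number <= 3
    assert 0 <= op < (1 << n)
    b0 = (field_number >> 1) & 1  # higher-order bit of "B0 B1"
    return (b0 << n) | op         # (n+1)-bit source register number

def split_register_number(reg_number, n):
    """Split an (n+1)-bit register number into (register file, position in file)."""
    return reg_number >> n, reg_number & ((1 << n) - 1)

if __name__ == "__main__":
    n = 5  # example width, chosen only for illustration
    for field in range(4):
        number = source_register_number(field, 7, n)
        print(field, number, split_register_number(number, n))
```

With n = 5, the same 5-bit designation 7 refers to register 7 of the register file 107-0 in the instruction fields #0 and #1, and to register 7 of the register file 107-1 in the instruction fields #2 and #3.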
[0055]
The data read from the registers (in the register file 107-0) designated by the first source register designating parts of the instructions in the instruction fields #0 and #1 is output from the output ports P00 and P02 of the register file 107-0, which correspond to the left inputs of the arithmetic units 106-0 and 106-1. The data read from the registers (in the register file 107-0) designated by the second source register designating parts is output from the output ports P01 and P03 of the register file 107-0, which correspond to the right inputs of the arithmetic units 106-0 and 106-1. In both cases the data is guided to the corresponding input port of the bypass circuit 108-0.
[0056]
Similarly, the data read from the registers (in the register file 107-1) designated by the first source register designating parts of the instructions in the instruction fields #2 and #3 is output from the output ports P10 and P12 of the register file 107-1, which correspond to the left inputs of the arithmetic units 106-2 and 106-3. The data read from the registers (in the register file 107-1) designated by the second source register designating parts is output from the output ports P11 and P13 of the register file 107-1, which correspond to the right inputs of the arithmetic units 106-2 and 106-3. In both cases the data is guided to the corresponding input port of the bypass circuit 108-1.
[0057]
The bypass circuit 108-0 is supplied with the decoding results of the first and second source register designating parts of the instruction fields #0 and #1 in the decoding result of the long instruction word currently in the D stage decoded by the instruction decoding mechanism 102, that is, the first and second source register numbers (the lower n-1 bits thereof), and with the decoding results of the destination register designating parts of the instruction fields #0 and #2 in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105, that is, the destination register numbers (the lower n-1 bits thereof).
[0058]
On the other hand, the bypass circuit 108-1 is supplied with the decoding results of the first and second source register designating parts of the instruction fields #2 and #3 in the decoding result of the long instruction word currently in the D stage decoded by the instruction decoding mechanism 102, that is, the first and second source register numbers (the lower n-1 bits thereof), and with the decoding results of the destination register designating parts of the instruction fields #1 and #3 in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105, that is, the destination register numbers (the lower n-1 bits thereof).
[0059]
The bypass circuit 108-0 compares the first and second source register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #1 of the long instruction word in the D stage with the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #2 of the long instruction word in the W stage (the long instruction word two cycles before the one in the D stage).
[0060]
Then, the bypass circuit 108-0 selects read data from the register designated by the source register number in the register file 107-0 as the source operand designated by the source register number that does not match the destination register number.
[0061]
Further, as the source operand designated by a source register number that matches a destination register number, the bypass circuit 108-0 performs a D-stage bypass that selects the operation result (guided through the latch circuit 111) of the arithmetic unit (arithmetic unit 106-0 or 106-2) that is to be written to the register designated by that destination register number. If this D-stage bypass were not performed, data could not be read from the register designated by the long instruction word (instruction field #0 or #1) currently in the D stage until the operation result of the arithmetic unit 106-0 or 106-2, executed as designated by the long instruction word (instruction field #0 or #2) two cycles before, had been written to the register designated by the destination register number in the register file 107-0, and the pipeline flow would be disturbed.
[0062]
On the other hand, the bypass circuit 108-1 uses the first and second source register numbers (the lower n−1 bits thereof) designated in the instruction fields # 2 and # 3 of the instruction in the D stage as the long instruction in the W stage. It compares with the destination register number (the lower n−1 bits) specified in the instruction fields # 1 and # 3 of the word (long instruction word two cycles before the long instruction word in the D stage).
[0063]
Then, the bypass circuit 108-1 selects the read data from the register designated by the source register number in the register file 107-1 as the source operand designated by the source register number that does not match the destination register number.
[0064]
In addition, as the source operand designated by a source register number that matches a destination register number, the bypass circuit 108-1 performs a D-stage bypass that selects the operation result (guided through the latch circuit 111) of the arithmetic unit (arithmetic unit 106-1 or 106-3) that is to be written to the register designated by that destination register number.
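The D-stage selection performed by the bypass circuits 108-0 and 108-1 can be sketched as follows. This is a simplified model under stated assumptions: d_stage_select is a hypothetical name, register numbers are compared as plain integers, and when both W-stage results target the same register the first listed one is chosen, a priority that this specification does not define.

```python
def d_stage_select(src_index, register_data, w_stage_writes):
    """Select one source operand in the D stage.

    src_index      -- source register number within the register file
    register_data  -- data read from that register via the output port
    w_stage_writes -- [(destination register number, operation result), ...]
                      for the two W-stage results feeding the same register file
                      (fields #0/#2 for 108-0, fields #1/#3 for 108-1)
    """
    for dest_index, result in w_stage_writes:
        if dest_index == src_index:
            return result       # D-stage bypass: the register is not yet updated
    return register_data        # no match: use the data read from the register

if __name__ == "__main__":
    writes = [(3, 42), (7, 99)]  # results held in the latch circuit 111
    print(d_stage_select(3, 111, writes))  # bypassed value
    print(d_stage_select(5, 111, writes))  # register value
```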
[0065]
Four source operands specified by the instruction fields # 0 and # 1 of the long instruction word in the D stage selected by the bypass circuit 108-0, and the long instruction in the D stage selected by the bypass circuit 108-1 The four source operands designated by the word instruction fields # 2 and # 3 are held in the latch circuit 110 and are guided to the corresponding bypass circuits 109-0 and 109-1 during the E stage.
[0066]
At this time, the decoding result of the instruction decoding mechanism 102 for the long instruction word is moved to the pipeline register 104. At the same time, the decoding result of the long instruction word one cycle before held in the pipeline register 104 is moved to the pipeline register 105.
[0067]
The bypass circuit 109-0 is supplied with the decoding results of the first and second source register designating parts of the instruction fields #0 and #1 in the decoding result of the long instruction word currently in the E stage held in the pipeline register 104, that is, the first and second source register numbers, and with the decoding results of the destination register designating parts of the instruction fields #0 and #2 in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105, that is, the destination register numbers.
[0068]
On the other hand, the bypass circuit 109-1 is supplied with the decoding results of the first and second source register designating parts of the instruction fields #2 and #3 in the decoding result of the long instruction word currently in the E stage held in the pipeline register 104, that is, the first and second source register numbers (the lower n-1 bits thereof), and with the decoding results of the destination register designating parts of the instruction fields #1 and #3 in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105, that is, the destination register numbers (the lower n-1 bits thereof).
[0069]
The bypass circuit 109-0 compares the first and second source register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #1 of the instruction in the E stage with the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #2 of the instruction in the W stage (the instruction immediately before the one in the E stage).
[0070]
As the source operand designated by a source register number that does not match any destination register number, the bypass circuit 109-0 selects the corresponding source operand that was selected by the bypass circuit 108-0 and guided through the latch circuit 110.
[0071]
In addition, as the source operand designated by a source register number that matches a destination register number, the bypass circuit 109-0 performs an E-stage bypass that selects the operation result (guided through the latch circuit 111) of the arithmetic unit (arithmetic unit 106-0 or 106-2) that is to be written to the register designated by that destination register number.
[0072]
On the other hand, the bypass circuit 109-1 compares the first and second source register numbers (the lower n-1 bits thereof) designated in the instruction fields #2 and #3 of the instruction in the E stage with the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #1 and #3 of the instruction in the W stage (the instruction immediately before the one in the E stage).
[0073]
As the source operand designated by a source register number that does not match any destination register number, the bypass circuit 109-1 selects the corresponding source operand that was selected by the bypass circuit 108-1 and guided through the latch circuit 110.
[0074]
In addition, as the source operand designated by a source register number that matches a destination register number, the bypass circuit 109-1 performs an E-stage bypass that selects the operation result (guided through the latch circuit 111) of the arithmetic unit (arithmetic unit 106-1 or 106-3) that is to be written to the register designated by that destination register number.
[0075]
Details of the selection operation of the bypass circuits 109-0 and 109-1 will be described.
First, when the first source register number (the lower n-1 bits thereof) designated in the instruction field #0 or #1 of the instruction in the E stage matches neither of the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #2 of the instruction in the W stage, the multiplexers 119L0 and 119L1 in the bypass circuit 109-0 select the source operand that was selected by the bypass circuit 108-0 for the left input of the arithmetic unit 106-0 or 106-1 and guided through the latch circuit 110.
[0076]
When the first source register number (the lower n-1 bits thereof) designated in the instruction field #0 or #1 of the instruction in the E stage matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #0 of the instruction in the W stage, the multiplexers 119L0 and 119L1 select the operation result of the arithmetic unit 106-0 guided through the buffer 111-0 in the latch circuit 111. When it matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #2 of the instruction in the W stage, they select the operation result of the arithmetic unit 106-2 guided through the buffer 111-2 in the latch circuit 111.
[0077]
The data (source operand) selected by the multiplexers 119L0 and 119L1 is supplied to the left inputs (L inputs) of the arithmetic units 106-0 and 106-1.
Next, when the second source register number (the lower n-1 bits thereof) designated in the instruction field #0 or #1 of the instruction in the E stage matches neither of the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #0 and #2 of the instruction in the W stage, the multiplexers 119R0 and 119R1 in the bypass circuit 109-0 select the source operand that was selected by the bypass circuit 108-0 for the right input of the arithmetic unit 106-0 or 106-1 and guided through the latch circuit 110.
[0078]
When the second source register number (the lower n-1 bits thereof) designated in the instruction field #0 or #1 of the instruction in the E stage matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #0 of the instruction in the W stage, the multiplexers 119R0 and 119R1 select the operation result of the arithmetic unit 106-0 guided through the buffer 111-0 in the latch circuit 111. When it matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #2 of the instruction in the W stage, they select the operation result of the arithmetic unit 106-2 guided through the buffer 111-2 in the latch circuit 111.
[0079]
The data (source operand) selected by the multiplexers 119R0 and 119R1 is supplied to the right input (R input) of the arithmetic units 106-0 and 106-1.
On the other hand, when the first source register number (the lower n-1 bits thereof) designated in the instruction field #2 or #3 of the instruction in the E stage matches neither of the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #1 and #3 of the instruction in the W stage, the multiplexers 119L2 and 119L3 in the bypass circuit 109-1 select the source operand that was selected by the bypass circuit 108-1 for the left input of the arithmetic unit 106-2 or 106-3 and guided through the latch circuit 110.
[0080]
When the first source register number (the lower n-1 bits thereof) designated in the instruction field #2 or #3 of the instruction in the E stage matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #1 of the instruction in the W stage, the multiplexers 119L2 and 119L3 select the operation result of the arithmetic unit 106-1 guided through the buffer 111-1 in the latch circuit 111. When it matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #3 of the instruction in the W stage, they select the operation result of the arithmetic unit 106-3 guided through the buffer 111-3 in the latch circuit 111.
[0081]
The data (source operand) selected by the multiplexers 119L2 and 119L3 is supplied to the left inputs (L inputs) of the arithmetic units 106-2 and 106-3.
Next, when the second source register number (the lower n-1 bits thereof) designated in the instruction field #2 or #3 of the instruction in the E stage matches neither of the destination register numbers (the lower n-1 bits thereof) designated in the instruction fields #1 and #3 of the instruction in the W stage, the multiplexers 119R2 and 119R3 in the bypass circuit 109-1 select the source operand that was selected by the bypass circuit 108-1 for the right input of the arithmetic unit 106-2 or 106-3 and guided through the latch circuit 110.
[0082]
When the second source register number (the lower n-1 bits thereof) designated in the instruction field #2 or #3 of the instruction in the E stage matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #1 of the instruction in the W stage, the multiplexers 119R2 and 119R3 select the operation result of the arithmetic unit 106-1 guided through the buffer 111-1 in the latch circuit 111. When it matches the destination register number (the lower n-1 bits thereof) designated in the instruction field #3 of the instruction in the W stage, they select the operation result of the arithmetic unit 106-3 guided through the buffer 111-3 in the latch circuit 111.
[0083]
The data (source operand) selected by the multiplexers 119R2 and 119R3 is supplied to the right input (R input) of the arithmetic units 106-2 and 106-3.
The arithmetic units 106-0 and 106-1 perform operations on the source operands supplied from the bypass circuit 109-0, and the arithmetic units 106-2 and 106-3 perform operations on the source operands supplied from the bypass circuit 109-1. The operation results of the arithmetic units 106-0 to 106-3 are held in the latch circuit 111 (its internal buffers 111-0 to 111-3).
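The three-way selection made by each multiplexer in the bypass circuits 109-0 and 109-1 can be summarized in a short Python sketch. The names BYPASS_BUFFERS and e_stage_operand are hypothetical, the dictionaries stand in for the pipeline register 105 and the buffers 111-0 to 111-3, and checking the lower-numbered field first is an assumption, since no priority is stated for the case in which both W-stage destinations match.

```python
# Bypass circuit index -> instruction fields whose W-stage results it can select
# (109-0: buffers 111-0 and 111-2, 109-1: buffers 111-1 and 111-3).
BYPASS_BUFFERS = {0: (0, 2), 1: (1, 3)}

def e_stage_operand(unit, src_index, latched_operand, w_dest, buffers):
    """Return the operand fed to one input of arithmetic unit 106-<unit>.

    latched_operand -- operand chosen in the D stage, held in the latch circuit 110
    w_dest          -- {field: destination register number} for the W-stage word
    buffers         -- {field: operation result held in buffer 111-<field>}
    """
    circuit = unit // 2  # units 0, 1 -> 109-0; units 2, 3 -> 109-1
    for field in BYPASS_BUFFERS[circuit]:
        if w_dest.get(field) == src_index:
            return buffers[field]  # E-stage bypass
    return latched_operand

if __name__ == "__main__":
    w_dest = {0: 9, 1: 4, 2: 4, 3: 6}
    buffers = {0: 10, 1: 11, 2: 12, 3: 13}
    print(e_stage_operand(1, 4, 100, w_dest, buffers))  # matches field #2
    print(e_stage_operand(3, 4, 100, w_dest, buffers))  # matches field #1
    print(e_stage_operand(0, 5, 100, w_dest, buffers))  # no match
```

Each call models one multiplexer, so every multiplexer has exactly three candidate inputs: the latched operand and the two buffers assigned to its bypass circuit.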
[0084]
At this time, the decoding result of the long instruction word (the decoding result of the long instruction word currently in the E stage) designating the operation in the arithmetic units 106-0 to 106-3 is transferred from the pipeline register 104 to the pipeline register 105. At the same time, the decoding result (decoding result of the long instruction word currently in the D stage) in the instruction decoding mechanism 102 for the long instruction word immediately after the long instruction word is moved to the pipeline register 104.
[0085]
Of the operation results of the arithmetic units 106-0 to 106-3 held in the latch circuit 111 (its internal buffers 111-0 to 111-3), the operation results of the arithmetic units 106-0 and 106-2 (the operation results of the instructions in the instruction fields #0 and #2) are guided to the input ports of the register file 107-0, and the operation results of the arithmetic units 106-1 and 106-3 (the operation results of the instructions in the instruction fields #1 and #3) are guided to the input ports of the register file 107-1.
[0086]
The operation results of the arithmetic units 106-0 and 106-2 guided to the register file 107-0 are written to those registers in the register file 107-0 that are designated by the destination register designating parts of the corresponding instruction fields #0 and #2, contained in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105.
[0087]
Further, the operation results of the arithmetic units 106-1 and 106-3 guided to the register file 107-1 are written to those registers in the register file 107-1 that are designated by the destination register designating parts of the corresponding instruction fields #1 and #3, contained in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105.
[0088]
As described above, the operation results of the arithmetic units 106-0 to 106-3 held in the latch circuit 111 (its internal buffers 111-0 to 111-3) are written to the registers indicated by (n+1)-bit destination register numbers. Each such number is determined by the destination register designating part of the corresponding instruction field #0 to #3, in the decoding result of the long instruction word currently in the W stage held in the pipeline register 105, and by the lower-order bit B1 of the field number "B0 B1" of that instruction field.
[0089]
That is, in the present embodiment, as shown in FIG. 3, a register number modification is performed in which the bit B1 of the field number "B0 B1" of each of the instruction fields #0 to #3 is added above the n-bit register number (OP1) indicated by the destination register designating part of that instruction field. The resulting (n+1)-bit destination register number designates one of the 2ⁿ⁺¹ destination registers (register numbers 0 to 2ⁿ⁺¹-1), and the operation results of the arithmetic units 106-0 to 106-3 are written to those destination registers. Here, the most significant bit of the (n+1)-bit destination register number, that is, the bit B1, designates the register file 107-0 when it is "0" and the register file 107-1 when it is "1", and the remaining n bits, that is, the n bits (OP1) indicated by the destination register designating part, indicate the position of the destination register within that register file.
[0090]
Therefore, for the instruction fields in which the bit B1 is "0", that is, the instruction fields #0 and #2, whose field numbers are 0 ("00") and 2 ("10"), the register file 107-0 is the write target, and the operation result of each instruction (the operation results of the arithmetic units 106-0 and 106-2) is written to the register in it designated by the destination register designating part of that instruction. For the instruction fields in which the bit B1 is "1", that is, the instruction fields #1 and #3, whose field numbers are 1 ("01") and 3 ("11"), the register file 107-1 is the write target, and the operation result of each instruction (the operation results of the arithmetic units 106-1 and 106-3) is written to the register in it designated by the destination register designating part of that instruction.
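The destination-side modification, and the resulting pairing of source-side and destination-side register files for each instruction field, can be sketched as follows. The function names are hypothetical, and the field number is taken as a 2-bit integer whose bits are "B0 B1" from high to low.

```python
def destination_register_number(field_number, op1, n):
    """Add bit B1 of the 2-bit field number above the n-bit destination designation."""
    assert 0 <= field_number <= 3
    assert 0 <= op1 < (1 << n)
    b1 = field_number & 1  # lower-order bit of "B0 B1"
    return (b1 << n) | op1

def register_files_for_field(field_number):
    """Return (source-side register file, destination-side register file)."""
    b0 = (field_number >> 1) & 1  # selects the file that is read
    b1 = field_number & 1         # selects the file that is written
    return b0, b1

if __name__ == "__main__":
    for field in range(4):
        src, dst = register_files_for_field(field)
        print(f"field #{field}: reads 107-{src}, writes 107-{dst}")
```

In this model, the instruction fields #1 and #2 read one register file and write the other, which is how a value can be moved between the register files 107-0 and 107-1.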
[0091]
As described above, in this embodiment, two register files 107-0 and 107-1, each comprising 2ⁿ registers, are provided, and the register numbers are modified by the field numbers of the instruction fields #0 to #3 in the four-way parallel long instruction word, so that the registers available to each instruction field are limited to one of the register files 107-0 and 107-1 (separately for source designation and destination designation). As a result, the number of usable registers can be doubled from 2ⁿ to 2ⁿ⁺¹ while the bit length of the register designating part remains n bits, the same as in the conventional case (that is, without increasing the long instruction word length).
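The saving in instruction bits can be checked with a small calculation. The sketch below uses the three-operand instruction format of the conventional example and an arbitrary example width n = 5; the function names are hypothetical.

```python
def designation_bits(bits_per_operand, operands=3):
    """Register designation bits needed by one instruction."""
    return operands * bits_per_operand

def compare(n, fields=4):
    """Compare widening every designating part with the field-number modification."""
    widened = designation_bits(n + 1)  # conventional way to reach 2^(n+1) registers
    modified = designation_bits(n)     # this embodiment: designating parts stay n bits
    registers = 2 ** (n + 1)           # registers usable by the long instruction word
    saved_per_word = fields * (widened - modified)
    return widened, modified, registers, saved_per_word

if __name__ == "__main__":
    print(compare(5))
```

For n = 5, widening would need 18 designation bits per instruction against 15 here, for the same 64 registers, which saves 12 bits per four-field long instruction word.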
[0092]
In the present embodiment, although each of the register files 107-0 and 107-1 holds 2ⁿ registers, the same number as the conventional register file, each needs only 2 input ports and 4 output ports, half the conventional numbers.
[0093]
FIG. 4 shows the relationship between the register files 107-0 and 107-1 described above and the instruction fields #0 to #3 (field numbers "00" to "11") of the long instruction word. Specifically, it shows the relationship between each of the instruction fields #0 to #3 and the register file referenced by the instruction in that field (the source-side register file), and the relationship between each of the instruction fields #0 to #3 and the register file to which the operation result designated by the instruction in that field is written (the destination-side register file). In FIG. 4, the register files 107-0 and 107-1 are each drawn twice, but this is only to show the relationship at reference time and at result-writing time; physically, as shown in the preceding figures, there is only one of each.
[0094]
In addition, in the present embodiment, since the registers that can be used by each instruction field are limited, the number of input ports per arithmetic unit input of the bypass circuits 108-0 and 108-1 (the number of multiplexer inputs) can be reduced from the conventional 5 to 3, and the hardware configuration can be simplified.
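The port and multiplexer-input counts quoted above follow from simple counting, sketched below. The specification gives the numbers only for four-way parallelism with two register files; expressing them as functions of the parallelism is an assumption made here for illustration, and the function names are hypothetical.

```python
def conventional_costs(parallel):
    """Single register file shared by all instruction fields."""
    return {
        "input_ports": parallel,       # one write port per instruction
        "output_ports": 2 * parallel,  # two source operands per instruction
        "mux_inputs": 1 + parallel,    # register port + every result buffer
    }

def split_costs(parallel, files=2):
    """Per-register-file costs when each file serves parallel/files results."""
    per_file = parallel // files
    return {
        "input_ports": per_file,
        "output_ports": 2 * per_file,
        "mux_inputs": 1 + per_file,    # register port + results written to this file
    }

if __name__ == "__main__":
    print(conventional_costs(4))
    print(split_costs(4))
```

For four-way parallelism this gives 4 input ports, 8 output ports, and 5 multiplexer inputs in the conventional configuration, against 2, 4, and 3 per register file in the present embodiment.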
[0095]
In the embodiment described above, as is clear from FIG. 2, the source registers usable in the operations of the arithmetic units 106-0 and 106-1 (the operations designated by the instructions in the instruction fields #0 and #1 of the four-way parallel long instruction word) are limited to the registers of the register file 107-0, and the source registers usable in the operations of the arithmetic units 106-2 and 106-3 (the operations designated by the instructions in the instruction fields #2 and #3) are limited to the registers of the register file 107-1. Likewise, the destination registers usable as write destinations for the operation results of the arithmetic units 106-0 and 106-2 (the results of the operations designated by the instructions in the instruction fields #0 and #2) are limited to the registers of the register file 107-0, and the destination registers usable as write destinations for the operation results of the arithmetic units 106-1 and 106-3 (the results of the operations designated by the instructions in the instruction fields #1 and #3) are limited to the registers of the register file 107-1. However, the present invention is not limited to this.
[0096]
Therefore, a second embodiment, in which the restriction on the registers usable by each instruction field (separately for source designation and destination designation) differs from that of the above embodiment, will be described with reference to the drawings.
[Second Embodiment]
FIG. 5 is a block diagram showing a schematic configuration of a VLIW processor according to the second embodiment of the present invention in the same format as FIG. 2, and the same parts as those in FIG.
[0097]
In FIG. 5, the bypass circuit 209-0 comprises multiplexers (MPX) 219L0 and 219L1, provided corresponding to the left inputs of the arithmetic units 106-0 and 106-1, and multiplexers (MPX) 219R0 and 219R1, provided corresponding to the right inputs of the arithmetic units 106-0 and 106-1.
[0098]
The multiplexers 219L0, 219L1, and 219R1 in the bypass circuit 209-0 correspond one-to-one to the output ports P00, P02, and P03 of the register file 107-0. Each selects one of the data read from its corresponding output port and the operation results of the arithmetic units 106-0 and 106-2 (delivered via the latch circuit 111), and outputs it to the corresponding input of the arithmetic units 106-0 and 106-1.
[0099]
On the other hand, the multiplexer 219R0 in the bypass circuit 209-0 corresponds one-to-one to the output port P13 of the register file 107-1. It selects one of the data read from the output port P13 and the operation results of the arithmetic units 106-1 and 106-3 (delivered via the latch circuit 111), and outputs it to the right input of the arithmetic unit 106-0.
[0100]
The bypass circuit 209-1 consists of multiplexers (MPX) 229L2 and 229L3 provided for the left inputs of the arithmetic units 106-2 and 106-3, and multiplexers (MPX) 229R2 and 229R3 provided for the right inputs of the arithmetic units 106-2 and 106-3.
[0101]
The multiplexers 229L2, 229R2, and 229L3 in the bypass circuit 209-1 correspond one-to-one to the output ports P10, P11, and P12 of the register file 107-1. Each selects one of the data read from its corresponding output port and the operation results of the arithmetic units 106-1 and 106-3 (delivered via the latch circuit 111), and outputs it to the corresponding input of the arithmetic units 106-2 and 106-3.
[0102]
On the other hand, the multiplexer 229R3 in the bypass circuit 209-1 corresponds one-to-one to the output port P01 of the register file 107-0. It selects one of the data read from the output port P01 and the operation results of the arithmetic units 106-0 and 106-2 (delivered via the latch circuit 111), and outputs it to the right input of the arithmetic unit 106-3.
[0103]
The configuration of FIG. 5 differs from the configuration of FIG. 2 in that the inputs of the multiplexers 219R0 and 229R3, which correspond to the right inputs of the arithmetic units 106-0 and 106-3 in FIG. 5, are swapped relative to the inputs of the multiplexers 119R0 and 119R3, which correspond to the right inputs of the arithmetic units 106-0 and 106-3 in FIG. 2.
[0104]
In the configuration of FIG. 5, let the field number of each of the instruction fields #0 to #3 of a 4-parallel long instruction word (4-parallel VLIW) be “B0 B1”. As shown in FIG. 6, the destination register number is obtained by adding the bit B1 above the register number (OP1) indicated by the destination register designating part of the instruction in each of the instruction fields #0 to #3. Likewise, as shown in FIG. 6, the first source register number is obtained by adding the bit B0 above the register number (OP2) indicated by the first source register designating part of the instruction. Up to this point, the processing is the same as in the first embodiment.
[0105]
The second source register number, as shown in FIG. 6, depends on the field number of the instruction field. For the instructions in the instruction fields #0 and #3, whose field numbers are “00” (= 0) and “11” (= 3), the inverted value of the bit B0 is added above the register number (OP3) indicated by the second source register designating part of the instruction. For the instructions in the instruction fields #1 and #2, whose field numbers are “01” (= 1) and “10” (= 2), the bit B0 itself is added above the register number (OP3).
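The register number modification of the second embodiment can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name, the argument order, and the use of plain integers for OP1 to OP3 are hypothetical, and the real mechanism is hardware wiring rather than software.

```python
def modify_registers_4way_second(field, op1, op2, op3, n):
    """Sketch of the second-embodiment register number modification.

    field : field number 0..3, read as the two bits "B0 B1" (B0 is the upper bit)
    op1   : n-bit destination register designating part (OP1)
    op2   : n-bit first source register designating part (OP2)
    op3   : n-bit second source register designating part (OP3)
    Returns (destination, first source, second source) as n+1-bit numbers.
    The name and signature are assumptions made for this illustration.
    """
    b0 = (field >> 1) & 1
    b1 = field & 1
    # Fields #0 and #3 use the inverted B0 for the second source;
    # fields #1 and #2 use B0 unchanged.
    second_prefix = (b0 ^ 1) if field in (0, 3) else b0
    dest = (b1 << n) | op1
    src1 = (b0 << n) | op2
    src2 = (second_prefix << n) | op3
    return dest, src1, src2


if __name__ == "__main__":
    n = 3  # 2**3 = 8 registers per register file, chosen only for the demo
    for field in range(4):
        dest, src1, src2 = modify_registers_4way_second(field, 1, 2, 3, n)
        # The top bit of each modified number selects register file 107-0 or 107-1.
        print(field, dest >> n, src1 >> n, src2 >> n)
```

In this sketch the top bit of each returned number identifies the register file (0 for 107-0, 1 for 107-1), so fields #0 and #3 read their two sources from different register files while fields #1 and #2 read both from one file.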
[0106]
FIG. 7 shows the resulting relationship between the register files 107-0 and 107-1 and the instruction fields #0 to #3 (field numbers “00” to “11”) of the long instruction word in this embodiment. Specifically, it shows the relationship between the instruction fields #0 to #3 and the register file referenced by the instruction in each field (source-side register file), and the relationship between the instruction fields #0 to #3 and the register file in which the operation result specified by the instruction in each field is stored (destination-side register file).
[0107]
As described above, in the present embodiment, the register numbers are modified by the field numbers (“00” to “11”) of the instruction fields #0 to #3. As a result, the registers in which the operation results of the instructions can be stored are limited to the registers in the register file 107-0 for the instruction fields #0 and #2, and to the registers in the register file 107-1 for the instruction fields #1 and #3. The registers that the instructions can reference are limited to the registers in the two register files 107-0 and 107-1 for the instruction fields #0 and #3, to the registers in the register file 107-0 for the instruction field #1, and to the registers in the register file 107-1 for the instruction field #2. Consequently, even though the bit length of the register designating part remains n bits, the same as the conventional one (that is, the long instruction word length is not extended), the number of registers usable by the long instruction word as a whole can be 2ⁿ⁺¹, twice the conventional 2ⁿ.
[0108]
In the first and second embodiments described above, the upper bit B0 of the field number “B0 B1” of the instruction fields #0 to #3 of the long instruction word is used for source register number modification, and the lower bit B1 is used for destination register number modification. However, the present invention is not limited to this; B0 may be used for destination register number modification and B1 for source register number modification. In this case, the relationship between the instruction fields #0 to #3 and the register files 107-0 and 107-1 is reversed between the source designation and the destination designation compared with the above embodiments, and the configuration of the VLIW processor (for example, the configuration of FIG. 2 in the first embodiment and the configuration of FIG. 5 in the second embodiment) must also be changed accordingly.
[0109]
In the first and second embodiments described above, the case where the present invention is applied to a VLIW processor that executes 4-parallel long instruction words having instruction fields #0 to #3 has been described. The present invention is also applicable to, for example, a VLIW processor that executes 8-parallel long instruction words having instruction fields #0 to #7 (field numbers “000” to “111”), or a VLIW processor that executes 16-parallel long instruction words having instruction fields #0 to #15 (field numbers “0000” to “1111”). First, a third embodiment in which the present invention is applied to a VLIW processor that executes 8-parallel long instruction words will be described.
[Third Embodiment]
FIG. 8 shows the relationship between the register file and each instruction field of the long instruction word in the third embodiment in which the present invention is applied to a VLIW processor that executes eight parallel long instruction words.
[0110]
In FIG. 8, the four register files 207-0 (#0) to 207-3 (#3) each consist of 2ⁿ registers, like the register files 107-0 and 107-1 described above. The 2ⁿ registers in the register file 207-0 are assigned register numbers 0 to 2ⁿ−1. The 2ⁿ registers in the register file 207-1 are assigned register numbers 2ⁿ to 2×2ⁿ−1, that is, 2ⁿ to 2ⁿ⁺¹−1. The 2ⁿ registers in the register file 207-2 are assigned register numbers 2ⁿ⁺¹ to 3×2ⁿ−1. The 2ⁿ registers in the register file 207-3 are assigned register numbers 3×2ⁿ to 4×2ⁿ−1, that is, 3×2ⁿ to 2ⁿ⁺²−1.
[0111]
In the present embodiment, the instructions in the instruction fields #0 to #7 of the 8-parallel long instruction word are instructions in the three-operand format (in the case of operation instructions). The bit length of the destination register designating part (OP1) is n+1 bits (one bit more than in the first and second embodiments), and the bit length of the first and second source register designating parts (OP2, OP3) is n bits (the same as in the first and second embodiments). The value of the most significant bit (B) of the destination register designating part (OP1) is determined in advance by the instruction field (its field number): it is “0” for the fields #0 to #3 (field numbers “000” to “011”) and “1” for the fields #4 to #7 (field numbers “100” to “111”).
[0112]
Here, let the 3-bit field number of the instruction field #i (i = 0 to 7) be “B0 B1 B2”. The register file into which the operation result of the instruction (operation instruction) in the instruction field #i is written is determined by the 2 bits “B2 B” consisting of the least significant bit B2 of the field number and the most significant bit (B) of the n+1-bit destination register designating part (OP1) in the instruction field.
[0113]
Further, the write destination register within the determined register file is designated by the n bits of the destination register designating part (OP1) excluding its most significant bit. That is, as shown in FIG. 9, the write destination register for the operation result of the instruction (operation instruction) in the instruction field #i is specified by an n+2-bit destination register number obtained by adding the least significant bit B2 of the field number of the instruction field above the n+1-bit register number (OP1) specified by the destination register designating part of the instruction.
[0114]
On the other hand, the register file referenced by the instruction (operation instruction) in the instruction field #i is determined by the upper 2 bits “B0 B1” of the field number. The register to be referenced within the determined register file is designated by the n-bit first and second source register designating parts (OP2, OP3) in the instruction field #i.
[0115]
That is, as shown in FIG. 9, the two source registers referenced by the instruction (operation instruction) in the instruction field #i are specified by n+2-bit source register numbers (first and second source register numbers) obtained by adding the upper 2 bits “B0 B1” of the field number of the instruction field above the n-bit register numbers (OP2, OP3) specified by the first and second source register designating parts of the instruction.
[0116]
In this embodiment, the upper 2 bits of the register number of n + 2 bits are “00” for register file 207-0, “01” for register file 207-1, “10” for register file 207-2, “11” designates the register file 207-3, and the n bits excluding the upper 2 bits indicate the register position in the register file.
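The following Python sketch illustrates how the n+2-bit register numbers of the third embodiment could be formed from the field number. The function name and the argument `op1_low` (the n low-order bits of OP1, with the most significant bit B supplied from the field number as the text describes) are assumptions made for this illustration only.

```python
def modify_registers_8way(field, op1_low, op2, op3, n):
    """Sketch of the third-embodiment register number modification.

    field   : field number 0..7, read as the three bits "B0 B1 B2"
    op1_low : the n low-order bits of the destination designating part OP1
    op2/op3 : n-bit source register designating parts OP2 and OP3
    Returns (destination, first source, second source) as n+2-bit numbers.
    Hypothetical helper; the patent describes hardware, not this function.
    """
    b0 = (field >> 2) & 1
    b1 = (field >> 1) & 1
    b2 = field & 1
    # Most significant bit B of OP1 is fixed by the field:
    # "0" for fields #0..#3 and "1" for fields #4..#7.
    b = b0
    op1 = (b << n) | op1_low                 # n+1-bit OP1
    dest = (b2 << (n + 1)) | op1             # prefix "B2" above OP1 -> "B2 B ..."
    src_prefix = (b0 << 1) | b1              # prefix "B0 B1"
    src1 = (src_prefix << n) | op2
    src2 = (src_prefix << n) | op3
    return dest, src1, src2


def register_file_of(register_number, n):
    """Upper 2 bits of an n+2-bit register number: 0..3 for 207-0..207-3."""
    return register_number >> n


if __name__ == "__main__":
    n = 4  # 16 registers per register file, chosen only for the demo
    for field in range(8):
        dest, src1, _ = modify_registers_8way(field, 5, 1, 15, n)
        print(field, register_file_of(dest, n), register_file_of(src1, n))
```

Under these assumptions the destination register file index is the 2-bit value “B2 B” and the source register file index is “B0 B1”, which is how the sketch maps each field to one of the register files 207-0 to 207-3.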
[0117]
Therefore, as shown in FIG. 8, the source operands are read from the registers designated by the first and second source register designating parts of the instruction field, within the register file selected by the upper 2 bits “B0 B1” of the field number “B0 B1 B2”. When “B0 B1” is “00”, that is, for the instructions in the instruction fields #0 and #1 with field numbers 0 (“000”) and 1 (“001”), the register file 207-0 is targeted. When “B0 B1” is “01”, that is, for the instructions in the instruction fields #2 and #3 with field numbers 2 (“010”) and 3 (“011”), the register file 207-1 is targeted. When “B0 B1” is “10”, that is, for the instructions in the instruction fields #4 and #5 with field numbers 4 (“100”) and 5 (“101”), the register file 207-2 is targeted. When “B0 B1” is “11”, that is, for the instructions in the instruction fields #6 and #7 with field numbers 6 (“110”) and 7 (“111”), the register file 207-3 is targeted.
[0118]
Likewise, the operation result of an instruction is written to the register specified by the destination register designating part of the instruction, within the register file selected by the least significant bit B2 of the field number “B0 B1 B2” and the most significant bit B of the destination register designating part. When the bit B2 is “0” and the bit B is “0”, that is, for the instructions in the instruction fields #0 and #2 with field numbers 0 (“000”) and 2 (“010”), the register file 207-0 is targeted. When the bit B2 is “0” and the bit B is “1”, that is, for the instructions in the instruction fields #4 and #6 with field numbers 4 (“100”) and 6 (“110”), the register file 207-1 is targeted. When the bit B2 is “1” and the bit B is “0”, that is, for the instructions in the instruction fields #1 and #3 with field numbers 1 (“001”) and 3 (“011”), the register file 207-2 is targeted. When the bit B2 is “1” and the bit B is “1”, that is, for the instructions in the instruction fields #5 and #7 with field numbers 5 (“101”) and 7 (“111”), the register file 207-3 is targeted.
[0119]
As described above, in this embodiment, four register files 207-0 to 207-3, each consisting of 2ⁿ registers, are provided, and the register numbers are modified by the field numbers of the instruction fields #0 to #7 in the 8-parallel long instruction word, so that the usable registers are limited (for each of the source designation and the destination designation) to one of the register files 207-0 to 207-3. As a result, although the bit length of the destination register designating part is n+1 bits and the bit length of the first and second source register designating parts is n bits, the number of registers usable by the long instruction word as a whole can be 2ⁿ⁺².
[0120]
Moreover, in the present embodiment, although each of the register files 207-0 to 207-3 has the same number of registers, 2ⁿ, as the conventional register file, the number of input ports can be 2 and the number of output ports can be 4 (for an 8-parallel long instruction word, the conventional register file has 8 input ports and 16 output ports).
[0121]
Although omitted in FIG. 8, bypass circuits corresponding to the bypass circuits 108-i and 109-i (i = 0, 1) in FIG. 1 are provided for the register files 207-0 to 207-3, respectively, and the number of input ports (the number of multiplexer inputs) per arithmetic unit input of each bypass circuit can be set to 3 (conventionally 9 for an 8-parallel long instruction word).
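The port and multiplexer counts quoted for the 8-parallel case can be checked with a small worked calculation. The sketch below rests on assumptions that are not spelled out in this form in the text: the instruction fields are spread evenly over the register files for both writing and reading, each instruction writes one result and reads two source operands, and each bypass multiplexer selects between one register file output port and the results of the arithmetic units that write to that register file. The function name is hypothetical.

```python
def port_counts(num_fields, num_files):
    """Worked count of register file ports and bypass multiplexer inputs.

    Assumes an even spread of instruction fields over register files,
    one destination and two sources per instruction (three-operand format).
    Illustrative only; not taken verbatim from the specification.
    """
    fields_per_file = num_fields // num_files
    return {
        # one write port per field that can write into a given register file
        "input_ports": fields_per_file,
        # two read ports per field that can read from a given register file
        "output_ports": 2 * fields_per_file,
        # one register file port plus one bypassed result per writing unit
        "mux_inputs": 1 + fields_per_file,
        # single shared register file serving every field
        "conventional_input_ports": num_fields,
        "conventional_output_ports": 2 * num_fields,
        "conventional_mux_inputs": 1 + num_fields,
    }


if __name__ == "__main__":
    # Third embodiment: 8 instruction fields, 4 register files.
    print(port_counts(8, 4))
```

For 8 fields and 4 register files this gives 2 input ports, 4 output ports, and 3 multiplexer inputs, against 8, 16, and 9 for a single conventional register file, in line with the figures stated above.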
[0122]
In the above embodiment (third embodiment), the case has been described where the upper 2 bits “B0 B1” of the field number “B0 B1 B2” of the instruction fields #0 to #7 of the long instruction word are used for source register number modification, and the lower 1 bit B2 is used for destination register number modification (in combination with the most significant bit B of the destination register designating part). However, the present invention is not limited to this. For example, B1 may be shared between source register number modification and destination register number modification, with “B0 B1” used for source register number modification (as in the above embodiment) and “B1 B2” used for destination register number modification. In this case, unlike the above embodiment, the bit length of the destination register designating part may be n bits.
[0123]
With such a register number modification, the correspondence between the instruction fields #0 to #7 and the register files that can be designated as destinations in those fields (by the destination register designating part) differs from the above embodiment: the instruction fields #0 and #4 correspond to the register file 207-0 (#0), the instruction fields #1 and #5 to the register file 207-1 (#1), the instruction fields #2 and #6 to the register file 207-2 (#2), and the instruction fields #3 and #7 to the register file 207-3 (#3).
[0124]
Next, a fourth embodiment in which the present invention is applied to a VLIW processor that executes 16 parallel long instruction words will be described.
[Fourth Embodiment]
FIG. 10 shows the relationship between the register file and each instruction field of the long instruction word in the fourth embodiment in which the present invention is applied to a VLIW processor that executes 16 parallel long instruction words.
[0125]
As shown in FIG. 10, in this embodiment as well, as in the third embodiment, four register files 207-0 (#0) to 207-3 (#3), each consisting of 2ⁿ registers, are used.
[0126]
In the present embodiment, the instructions in the instruction fields #0 to #15 of the 16-parallel long instruction word are instructions in the three-operand format (in the case of operation instructions), and the destination register designating part (OP1) and the first and second source register designating parts (OP2, OP3) all have a bit length of n bits. Note that the destination register designating part (OP1) is one bit shorter than the destination register designating part in each of the instruction fields #0 to #7 of the 8-parallel long instruction word used in the third embodiment.
[0127]
In this embodiment, letting the 4-bit field number of the instruction field #i (i = 0 to 15) be “B0 B1 B2 B3”, the register file into which the operation result of the instruction (operation instruction) in the instruction field #i is written is determined by the lower 2 bits “B2 B3” of the field number. Further, the write destination register within the determined register file is designated by the n bits of the destination register designating part (OP1). In other words, as shown in FIG. 11, the write destination register for the operation result of the instruction (operation instruction) in the instruction field #i is specified by an n+2-bit destination register number obtained by adding the lower 2 bits “B2 B3” of the field number of the instruction field above the n-bit register number (OP1) specified by the destination register designating part of the instruction.
[0128]
On the other hand, the register file referenced by the instruction (operation instruction) in the instruction field #i is determined by the upper 2 bits “B0 B1” of the field number. The register to be referenced within the determined register file is designated by the n-bit first and second source register designating parts (OP2, OP3) in the instruction field #i. That is, as shown in FIG. 11, the two source registers referenced by the instruction (operation instruction) in the instruction field #i are specified by n+2-bit source register numbers (first and second source register numbers) obtained by adding the upper 2 bits “B0 B1” of the field number of the instruction field above the n-bit register numbers (OP2, OP3) specified by the first and second source register designating parts of the instruction.
[0129]
In this embodiment, the upper 2 bits of the register number of n + 2 bits are “00” for register file 207-0, “01” for register file 207-1, “10” for register file 207-2, “11” designates the register file 207-3, and the n bits excluding the upper 2 bits indicate the register position in the register file.
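As a compact illustration of the fourth embodiment, the sketch below forms the n+2-bit register numbers for a 16-parallel long instruction word by prefixing “B2 B3” to OP1 and “B0 B1” to OP2 and OP3. The function name and signature are hypothetical and serve only to make the bit arithmetic concrete.

```python
def modify_registers_16way(field, op1, op2, op3, n):
    """Sketch of the fourth-embodiment register number modification.

    field       : field number 0..15, read as the four bits "B0 B1 B2 B3"
    op1/op2/op3 : n-bit register designating parts OP1, OP2, OP3
    Returns (destination, first source, second source) as n+2-bit numbers.
    Hypothetical helper for illustration only.
    """
    src_prefix = (field >> 2) & 0b11   # upper two bits "B0 B1"
    dest_prefix = field & 0b11         # lower two bits "B2 B3"
    dest = (dest_prefix << n) | op1
    src1 = (src_prefix << n) | op2
    src2 = (src_prefix << n) | op3
    return dest, src1, src2


if __name__ == "__main__":
    n = 5  # 32 registers per register file, chosen only for the demo
    for field in range(16):
        dest, src1, _ = modify_registers_16way(field, 0, 0, 0, n)
        # dest >> n and src1 >> n index the register files 207-0 .. 207-3
        print(field, dest >> n, src1 >> n)
```

With these assumptions, the instruction field #9 (“1001”) reads its sources from the register file 207-2 and writes its result to the register file 207-1.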
[0130]
Therefore, as shown in FIG. 10, the source operands are read from the registers designated by the first and second source register designating parts of the instruction field, within the register file selected by the upper 2 bits “B0 B1” of the field number “B0 B1 B2 B3”. When “B0 B1” is “00”, that is, for the instructions in the instruction fields #0, #1, #2, and #3 with field numbers 0 (“0000”), 1 (“0001”), 2 (“0010”), and 3 (“0011”), the register file 207-0 is targeted. When “B0 B1” is “01”, that is, for the instructions in the instruction fields #4, #5, #6, and #7 with field numbers 4 (“0100”), 5 (“0101”), 6 (“0110”), and 7 (“0111”), the register file 207-1 is targeted. When “B0 B1” is “10”, that is, for the instructions in the instruction fields #8, #9, #10, and #11 with field numbers 8 (“1000”), 9 (“1001”), 10 (“1010”), and 11 (“1011”), the register file 207-2 is targeted. When “B0 B1” is “11”, that is, for the instructions in the instruction fields #12, #13, #14, and #15 with field numbers 12 (“1100”), 13 (“1101”), 14 (“1110”), and 15 (“1111”), the register file 207-3 is targeted.
[0131]
Further, the operation result of an instruction is written to the register specified by the destination register designating part of the instruction, within the register file selected by the lower 2 bits “B2 B3” of the field number “B0 B1 B2 B3”. When “B2 B3” is “00”, that is, for the instructions in the instruction fields #0, #4, #8, and #12 with field numbers 0 (“0000”), 4 (“0100”), 8 (“1000”), and 12 (“1100”), the register file 207-0 is targeted. When “B2 B3” is “01”, that is, for the instructions in the instruction fields #1, #5, #9, and #13 with field numbers 1 (“0001”), 5 (“0101”), 9 (“1001”), and 13 (“1101”), the register file 207-1 is targeted. When “B2 B3” is “10”, that is, for the instructions in the instruction fields #2, #6, #10, and #14 with field numbers 2 (“0010”), 6 (“0110”), 10 (“1010”), and 14 (“1110”), the register file 207-2 is targeted. When “B2 B3” is “11”, that is, for the instructions in the instruction fields #3, #7, #11, and #15 with field numbers 3 (“0011”), 7 (“0111”), 11 (“1011”), and 15 (“1111”), the register file 207-3 is targeted.
[0132]
As described above, in this embodiment, four register files 207-0 to 207-3, each consisting of 2ⁿ registers, are provided, and the register numbers are modified by the field numbers of the instruction fields #0 to #15 in the 16-parallel long instruction word, so that the usable registers are limited (for each of the source designation and the destination designation) to one of the register files 207-0 to 207-3. As a result, although the destination register designating part and the first and second source register designating parts each have a bit length of n bits, the number of registers usable by the long instruction word as a whole can be 2ⁿ⁺².
[0133]
Moreover, in the present embodiment, although each of the register files 207-0 to 207-3 has the same number of registers, 2ⁿ, as the conventional register file, the number of input ports is 2 and the number of output ports is 4, a significant reduction compared with the conventional case (for a 16-parallel long instruction word, the conventional register file has 16 input ports and 32 output ports).
[0134]
Although omitted in FIG. 10, bypass circuits corresponding to the bypass circuits 108-i and 109-i (i = 0, 1) in FIG. 1 are provided for the register files 207-0 to 207-3, respectively, and the number of input ports (the number of multiplexer inputs) per arithmetic unit input of each bypass circuit can be set to 5 (conventionally 17 for a 16-parallel long instruction word).
[0135]
In the third and fourth embodiments, the case where the same modification method is applied to the first source register and the second source register has been described. However, as in the second embodiment, different modification methods may be applied to the first source register and the second source register.
[0136]
In the first to fourth embodiments, the accessible register file is limited by the field number of each instruction field of the long instruction word (for each of the source designation and the destination designation), so the registers accessible from each instruction field (usable for source designation and destination designation) are also limited by the instruction field (its field number). However, registers with predetermined register numbers (for example, the 8 registers with register numbers 0 to 7) may be made commonly accessible from all instruction fields (that is, they may be exempt from register number modification by the field number). Various modifications not limited to the above-described embodiments are thus possible.
[0137]
Next, a compiler (parallel optimization compiler) will be described that generates objects in accordance with the instruction word format applied in the first to fourth embodiments, that is, the instruction word format in which modifying the register numbers by the field number of each instruction field in the long instruction word makes it possible to limit the usable register file for each instruction field (for each of the source designation and the destination designation).
[0138]
FIG. 12 is a block diagram showing an embodiment of the compiler.
In the figure, the parallel optimization compiler 310 includes functional elements of a lexical analysis / syntax analysis unit 311, a scalar optimization unit 312, an instruction schedule unit 313, a register allocation unit 314, and a code output unit 315.
[0139]
The parallel optimization compiler 310 first uses the lexical analysis / syntax analysis unit 311 to perform well-known lexical analysis and syntax analysis on the source program stored in the source file 320, thereby detecting program errors and converting the source program into a program in a first internal format (intermediate code).
[0140]
Next, the parallel optimization compiler 310 uses the scalar optimization unit 312 to perform known optimization on the intermediate code generated by the lexical analysis / syntax analysis unit 311, and generates a program in a second internal format that contains no redundant processing and has a reduced execution time. This program consists of a serial instruction sequence.
[0141]
The processing in the parallel optimization compiler 310 up to this point is the same as ordinary compiler processing and is not specific to VLIW.
Next, the parallel optimization compiler 310 uses the instruction scheduling unit 313 to perform instruction scheduling on each instruction of the program in the second internal format generated by the scalar optimization unit 312. As an example of instruction scheduling by the instruction scheduling unit 313 based on the instruction word format applied in the first embodiment (see FIG. 3), the case where an instruction I specifying the operation “a ← b + c” is scheduled in a top-down manner will be described with reference to the corresponding flowchart.
[0142]
First, the instruction scheduling unit 313 checks the field numbers of the instructions that define the source operands (b, c) of the instruction I to be scheduled, that is, the instruction field positions in which those defining instructions have already been placed (step S1).
[0143]
Next, the instruction scheduling unit 313 determines whether the checked field numbers are consistent for the source operands (b, c), that is, whether the register file (as destination) determined by the field number of the instruction defining the source operand b matches the register file (as destination) determined by the field number of the instruction defining the source operand c (step S2).
[0144]
If the register files match, the instruction scheduling unit 313 places the instruction I in an instruction field that can designate the registers in that register file as sources (step S3).
[0145]
As a result, if the field numbers of the instruction defining the source operand b and the instruction defining the source operand c are both 0 or 2, the destination register file determined by those field numbers is the register file 107-0 (#0) in both cases, so the instruction I is placed in the instruction field #0 or #1 (whichever is empty). Similarly, if the field numbers of the instruction defining the source operand b and the instruction defining the source operand c are both 1 or 3, the destination register file determined by those field numbers is the register file 107-1 (#1) in both cases, so the instruction I is placed in the instruction field #2 or #3 (whichever is empty).
[0146]
On the other hand, if the register files do not match, the instruction scheduling unit 313 generates an instruction (MOVE instruction) that copies one of the source operands b and c from the register file in which it exists to the register file in which the other source operand exists, and places the copy instruction in the instruction field determined by the source and the destination of the copy (step S4).
[0147]
Thus, for example, if the source operand b exists in the register file 107-0 (#0), the source operand c exists in the register file 107-1 (#1), and the source operand c is to be copied to the register file 107-0 (#0), the source is the register file 107-1 (#1) and the destination is the register file 107-0 (#0), so the copy instruction (d ← c) is placed in the instruction field #2.
[0148]
After generating and placing the copy instruction (step S4), the instruction scheduling unit 313 places the instruction I' “a ← b + d”, which replaces the instruction I, in an instruction field (#0 or #1 in this case) that can use the copy destination register file of the copy instruction as its source register file (step S5).
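Steps S1 to S5 of the top-down scheduling can be sketched in Python for the first-embodiment format, in which bit B0 of the field number selects the source register file and bit B1 selects the destination register file. The sketch is a simplification under stated assumptions: it always copies operand c (the text allows either operand to be copied), it returns the candidate fields rather than picking an empty one, and the function and variable names are hypothetical.

```python
def src_file(field):
    """Source register file (0 or 1) selected by bit B0 of a 2-bit field number."""
    return (field >> 1) & 1


def dst_file(field):
    """Destination register file (0 or 1) selected by bit B1 of the field number."""
    return field & 1


def schedule_top_down(def_field_b, def_field_c):
    """Place instruction I: a <- b + c, given the fields of the definers of b and c.

    Returns a list of (instruction text, candidate instruction fields).
    Assumption for this sketch: on a mismatch, c is the operand that gets copied.
    """
    file_b = dst_file(def_field_b)   # S1: register file holding b
    file_c = dst_file(def_field_c)   # S1: register file holding c
    readers_of_b_file = [f for f in range(4) if src_file(f) == file_b]
    if file_b == file_c:             # S2: register files match
        return [("a <- b + c", readers_of_b_file)]          # S3
    # S4: copy c into b's register file; the field is fixed by source and destination
    copy_field = (file_c << 1) | file_b
    # S5: place I' in a field that reads from the copy destination
    return [("d <- c", [copy_field]), ("a <- b + d", readers_of_b_file)]


if __name__ == "__main__":
    print(schedule_top_down(0, 2))   # both operands in register file #0
    print(schedule_top_down(0, 1))   # b in register file #0, c in register file #1
```

When b is defined in field #0 and c in field #1, the sketch places the copy in field #2 and the rewritten instruction in field #0 or #1, matching the example in the text.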
[0149]
Next, as an example of instruction scheduling by the instruction scheduling unit 313 based on the instruction word format applied in the first embodiment (see FIG. 3), the case where an instruction I defining a register (variable) a is scheduled in a bottom-up manner will be described with reference to the corresponding flowchart.
[0150]
First, the instruction scheduling unit 313 checks the field numbers of the instructions that use (as a source) the register (virtual register, variable) a defined by the instruction I to be scheduled, that is, the instruction field positions in which those instructions have already been placed (step S11).
[0151]
Next, the instruction scheduling unit 313 determines whether the checked field numbers match the virtual register (destination) a defined by the instruction I. Specifically, it determines from the checked field numbers whether the virtual register (destination) a must be in the register file 107-0 (#0), in the register file 107-1 (#1), or in both (step S12).
[0152]
This judgment asks whether all instructions using a lie on only one of the two sides, the instruction field #0 or #1 side or the instruction field #2 or #3 side, or on both. In other words, it asks whether the field numbers of the instructions using a, however many such instructions there are, determine exactly one register file (this state is referred to as the destination register a matching the field numbers).
[0153]
If all instructions that use a are only in the instruction field #0 or #1 (the destination register a and the field numbers match), a must be in the register file 107-0 (#0); if they are only in the instruction field #2 or #3, a must be in the register file 107-1 (#1). If only a single instruction uses a, that instruction necessarily lies on only one of the two sides, the instruction field #0 or #1 side or the instruction field #2 or #3 side.
[0154]
On the other hand, when a plurality of instructions use a and those instructions are distributed over both the instruction field #0 or #1 side and the instruction field #2 or #3 side (the destination register a and the field numbers do not match), a must be in both register files 107-0 (#0) and 107-1 (#1).
[0155]
If the instruction scheduling unit 313 determines that the variable (virtual register) a defined by the instruction I must be in the register file 107-0 (#0), it places the instruction I in the instruction field #0 or #2; if it determines that a must be in the register file 107-1 (#1), it places the instruction I in the instruction field #1 or #3 (step S13).
[0156]
On the other hand, if it is determined that the variable (virtual register) a defined by the instruction I must be in both register files 107-0 (#0) and 107-1 (#1), an instruction (x ← a) that copies a, as a variable x, from the register file 107-0 (#0) to the register file 107-1 (#1), or from the register file 107-1 (#1) to the register file 107-0 (#0), is generated, and the copy instruction is placed in the instruction field #1 or #2 (step S14).
[0157]
Here, if the instruction I is to be placed in the instruction field #0 or #2, an instruction that copies a from the register file 107-0 (#0) to the register file 107-1 (#1) is placed in the instruction field #1; if the instruction I is to be placed in the instruction field #1 or #3, an instruction that copies a from the register file 107-1 (#1) to the register file 107-0 (#0) is placed in the instruction field #2.
[0158]
After generating and placing the copy instruction (step S14), the instruction scheduling unit 313 places the instruction I in the instruction field #0 or #2 if the copy instruction was placed in the instruction field #1, and in the instruction field #1 or #3 if the copy instruction was placed in the instruction field #2 (step S15).
[0159]
At this time, if the copy instruction was placed in instruction field #1, the instruction scheduling unit 313 changes a to x in those already-placed instructions using a that lie in instruction fields #2 and #3. If the copy instruction was placed in instruction field #2, it changes a to x in those already-placed instructions using a that lie in instruction fields #0 and #1.
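Steps S13 to S15 and the subsequent renaming can be summarized in the following Python sketch. It is only an illustration under stated assumptions: four instruction fields, the source register file selected by the upper bit of the field number and the destination register file by the lower bit, as the field assignments above imply. The `Plan` type, the function name, and the `copy_field` parameter (which stands in for the scheduler's choice between the two copy directions) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Plan:
    instr_fields: Tuple[int, ...]        # fields in which instruction I may be placed
    copy_field: Optional[int]            # field of the copy "x <- a", or None
    renamed_use_fields: Tuple[int, ...]  # use fields whose operand a becomes x


def plan_definition(use_fields, copy_field: int = 1) -> Plan:
    """Decide where the instruction defining a goes, given the fields of its users."""
    if not use_fields:
        raise ValueError("at least one use of the variable is expected")
    needed = {f >> 1 for f in use_fields}  # register files that read a
    if needed == {0}:                      # step S13: a only in register file #0
        return Plan((0, 2), None, ())
    if needed == {1}:                      # step S13: a only in register file #1
        return Plan((1, 3), None, ())
    # Steps S14/S15: a is needed in both files, so a copy is required.
    if copy_field == 1:                    # copy file #0 -> file #1, I in #0 or #2
        renamed = tuple(sorted(f for f in set(use_fields) if f >> 1 == 1))
        return Plan((0, 2), 1, renamed)
    if copy_field == 2:                    # copy file #1 -> file #0, I in #1 or #3
        renamed = tuple(sorted(f for f in set(use_fields) if f >> 1 == 0))
        return Plan((1, 3), 2, renamed)
    raise ValueError("a copy instruction can only occupy field #1 or #2")


if __name__ == "__main__":
    print(plan_definition([0, 3], copy_field=1))
```

Field #1 is the only field that reads register file #0 and writes register file #1, and field #2 is the only one that does the reverse, which is why the copy instruction is confined to those two fields.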
[0160]
When the above scheduling process has been executed by the instruction scheduling unit 313 on each instruction of the program in the second internal format generated by the scalar optimization unit 312, in order from the first instruction to the last (top-down method) or from the last instruction to the first (bottom-up method), the parallel optimization compiler 310 has the register allocation unit 314 allocate physical registers to the variables in each scheduled instruction (register allocation). The register allocation by the register allocation unit 314 will be described with reference to the flowchart of FIG.
[0161]
The register allocation unit 314 scans the instructions scheduled by the instruction scheduling unit 313 and classifies all variables by register file, based on the field numbers of the instructions in which each variable (virtual register) is referenced or defined (step S21). Here the variables fall into two classes: those that must reside in register file 107-0 (#0) and those that must reside in register file 107-1 (#1).
[0162]
Next, the register allocation unit 314 assigns to each variable of each class a physical register in the register file corresponding to that class (step S22).
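Steps S21 and S22 can be sketched in Python as follows. The sketch assumes that each scheduled instruction is given as a tuple (field number, destination variable, list of source variables), that the source register file is selected by the upper bit of the field number and the destination register file by the lower bit, and that each register file has enough registers. Live ranges and spilling are not modeled. The function names and the default of 32 registers per file are hypothetical.

```python
def classify_variables(instructions):
    """Step S21: map each variable to the register file implied by its field numbers."""
    classes = {}
    for field, dest, sources in instructions:
        pairs = [(src, field >> 1) for src in sources]  # reads: upper bit of field
        if dest is not None:
            pairs.append((dest, field & 1))             # write: lower bit of field
        for var, rf in pairs:
            if classes.setdefault(var, rf) != rf:
                raise ValueError(f"{var} would need both register files; a copy is missing")
    return classes


def allocate(classes, regs_per_file=32):
    """Step S22: give each variable a physical register inside its class's file."""
    next_free = {0: 0, 1: 0}
    assignment = {}
    for var in sorted(classes):
        rf = classes[var]
        if next_free[rf] >= regs_per_file:
            raise RuntimeError("out of registers (spilling is not modeled)")
        assignment[var] = (rf, next_free[rf])
        next_free[rf] += 1
    return assignment


if __name__ == "__main__":
    scheduled = [
        (0, "a", ["p", "q"]),  # field #0: reads file #0, writes file #0
        (1, "x", ["a"]),       # field #1: copy x <- a, reads file #0, writes file #1
        (2, "r", ["x"]),       # field #2: reads file #1, writes file #0
    ]
    print(allocate(classify_variables(scheduled)))
```

Because the scheduler has already inserted copy instructions where needed, every variable falls into exactly one class; the sketch raises an error if that precondition does not hold.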
[0163]
When register allocation by the register allocation unit 314 is finished, the parallel optimization compiler 310 has the code output unit 315 generate, from each register-allocated instruction in the internal format, code (object code) that can be executed by a computer (here, a VLIW processor), and outputs it as an object file 330.
[0164]
[Effect of the invention]
As described above in detail, according to the present invention, a plurality of register files are provided, and register numbers are modified by the field number of each instruction field in the long instruction word (VLIW), so that the registers usable from each instruction field are limited to those of one of the register files (separately for the source and destination designations). As a result, the number of registers that the long instruction word as a whole can handle is increased without lengthening the long instruction word and without complicating the hardware configuration.
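As a rough illustration of this effect, the following Python sketch shows how an n-bit register designation can address 2ⁿ⁺¹ registers once one bit of the field number is attached to it as the high-order bit. It assumes the four-field, two-register-file arrangement, with the upper bit of the field number used for source designations and the lower bit for destination designations. The function name and the default n = 5 are hypothetical.

```python
def modified_register_number(reg: int, field: int, is_dest: bool, n_bits: int = 5) -> int:
    """Attach one bit of the field number above an n-bit register number.

    Assumption: the upper bit of the 2-bit field number qualifies source
    designations and the lower bit qualifies destination designations.
    """
    if not 0 <= field <= 3:
        raise ValueError("field number must be in 0..3")
    if not 0 <= reg < (1 << n_bits):
        raise ValueError("register number does not fit in n_bits")
    file_bit = (field & 1) if is_dest else (field >> 1)
    return (file_bit << n_bits) | reg


if __name__ == "__main__":
    # The same 5-bit designation "3" in field #1 names two different registers.
    print(modified_register_number(3, 1, is_dest=False))  # read from file #0
    print(modified_register_number(3, 1, is_dest=True))   # written to file #1
```

The instruction encoding still carries only n bits per operand; the extra bit comes from the position of the instruction in the long instruction word.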
[Brief description of the drawings]
FIG. 1 is a block diagram showing a schematic configuration of a VLIW processor according to a first embodiment of the present invention.
FIG. 2 is a block diagram showing an internal configuration of bypass circuits 109-0 and 109-1 in FIG.
FIG. 3 is a diagram for explaining modification of a register number by the field number of an instruction field in the first embodiment, mainly using instruction field #1, whose field number is 1 (“01”), as an example.
FIG. 4 is a diagram showing the relationship between the register files 107-0 and 107-1 and the instruction fields #0 to #3 (field numbers “00” to “11”) of the long instruction word in the first embodiment.
FIG. 5 is a block diagram showing a schematic configuration of a VLIW processor according to a second embodiment of the present invention in the same format as FIG. 2.
FIG. 6 is a view for explaining modification of a register number by a field number of an instruction field in the second embodiment.
FIG. 7 is a diagram showing the relationship between the register files 107-0 and 107-1 and the instruction fields #0 to #3 (field numbers “00” to “11”) of the long instruction word in the second embodiment.
FIG. 8 is a diagram showing a relationship between a register file and each instruction field of a long instruction word in a third embodiment in which the present invention is applied to a VLIW processor that executes eight parallel long instruction words.
FIG. 9 is a view for explaining modification of a register number by a field number of an instruction field in the third embodiment.
FIG. 10 is a diagram showing a relationship between a register file and each instruction field of a long instruction word in a fourth embodiment in which the present invention is applied to a VLIW processor that executes 16 parallel long instruction words.
FIG. 11 is a diagram for explaining modification of a register number by a field number of an instruction field in the fourth embodiment.
FIG. 12 is a block configuration diagram showing an embodiment of a compiler for generating an object according to the instruction word format applied in the first to fourth embodiments.
FIG. 13 is a flowchart for explaining the case in which the instruction scheduling process in the compiler of FIG. 12 schedules in a top-down manner.
FIG. 14 is a flowchart for explaining the case in which the instruction scheduling process in the compiler of FIG. 12 schedules in a bottom-up manner.
FIG. 15 is a flowchart for explaining the register allocation process in the compiler of FIG. 12.
FIG. 16 is a diagram showing, for a conventional VLIW processor, the relationship between the number of registers usable by the instruction in each instruction field of a long instruction word and the number of bits of the register designation part in that instruction, for the case of 2ⁿ registers and for the case of twice that number, 2ⁿ⁺¹ registers.
FIG. 17 is a block diagram showing a schematic configuration of a conventional VLIW processor.
[Explanation of symbols]
101 ... instruction fetch mechanism,
102 ... instruction decoding mechanism,
106-0 to 106-3 ... arithmetic unit,
107-0, 107-1, 207-0 to 207-3 ... register file,
108-0, 108-1, 109-0, 109-1, 209-0, 209-1 ... bypass circuit,
110, 111 ... latch circuit,
119L0 to 119L3, 119R0 to 119R3, 219L0 to 219L3, 219R0 to 219R3 ... multiplexer (MPX).

Claims

A VLIW processor that executes a very long instruction word (VLIW) having a plurality of instruction fields, comprising:
a plurality of register files; and
allocating means for allocating, from among the plurality of register files, a register file from which the source operand referred to by the instruction in each instruction field of the long instruction word can be read, based on a first predetermined part of the field number of that instruction field, and for allocating a register file to which the execution result of the instruction in that instruction field can be written, based on a second predetermined part of the field number that differs at least partially from the first predetermined part.

A VLIW processor that executes a very long instruction word (VLIW) having a plurality of instruction fields, comprising:
a plurality of register files, each consisting of a group of registers having unique register numbers; and
allocating means for modifying the register number indicated by the source register designation part of each instruction field in the long instruction word with a first predetermined part of the field number of that instruction field, and modifying the register number indicated by the destination register designation part of that instruction field with a second predetermined part of the field number that differs at least partially from the first predetermined part, thereby allocating to each instruction field, from among the plurality of register files, a register file from which the source operand referred to by the instruction in that instruction field can be read and a register file to which the execution result of that instruction can be written.

The VLIW processor according to claim 2, wherein the allocating means performs register number modification by adding the first predetermined part of the field number of each instruction field to the high-order side of the register number indicated by the source register designation part of that instruction field, and adding the second predetermined part of the field number of that instruction field to the high-order side of the register number indicated by the destination register designation part of that instruction field.

A VLIW processor that executes a very long instruction word (VLIW) having a plurality of instruction fields, comprising:
arithmetic units, each provided in correspondence with an instruction field of the long instruction word and executing the operation specified by the instruction in the corresponding instruction field;
a plurality of register files associated with the field numbers of the instruction fields of the long instruction word; and
allocating means for allocating, from among the plurality of register files, a register file from which the source operand referred to by the instruction in each instruction field of the long instruction word can be read, based on a first predetermined part of the field number of that instruction field, and for allocating a register file to which the execution result of the instruction in that instruction field can be written, based on a second predetermined part of the field number that differs at least partially from the first predetermined part,
wherein the allocating means comprises: decoding means for decoding the instruction in each instruction field of the long instruction word to be executed and, for an instruction that uses a source operand, reading the source operand to be used by the arithmetic unit corresponding to that instruction field from the register file determined based on the first predetermined part of the field number of that instruction field; and a plurality of buffer means, provided in correspondence with the respective arithmetic units, each temporarily holding the operation result of the corresponding arithmetic unit for writing to the register file determined based on the second predetermined part of the field number of the instruction field corresponding to that arithmetic unit.

The VLIW processor according to claim 4, wherein the allocating means comprises a plurality of first bypass circuits provided in correspondence with the respective register files, and a plurality of second bypass circuits provided in correspondence with the respective first bypass circuits; each first bypass circuit comprises a plurality of first multiplexers, provided in correspondence with the respective arithmetic units that use a source operand read from the register file corresponding to that bypass circuit, each first multiplexer selecting as a source operand one of that source operand and the data held in each of the buffer means used for writing to the register file corresponding to that bypass circuit; and
each second bypass circuit comprises a plurality of second multiplexers, provided in correspondence with the respective first multiplexers in the first bypass circuit corresponding to that bypass circuit, each second multiplexer selecting as a source operand one of the source operand selected by the corresponding first multiplexer and the data held in each of the buffer means used for writing to the register file corresponding to that first bypass circuit, and outputting it to the corresponding arithmetic unit.