JP3658101B2

JP3658101B2 - Data processing device

Info

Publication number: JP3658101B2
Application number: JP24337596A
Authority: JP
Inventors: 寿賀子大谷; 俊一岩田
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 1996-09-13
Filing date: 1996-09-13
Publication date: 2005-06-08
Anticipated expiration: 2016-09-13
Also published as: US6209079B1; KR100260353B1; CN1095116C; US6463520B1; TW379305B; CN1177140A; JPH1091435A; KR19980023918A

Description

【０００１】
【発明の属する技術分野】
この発明は、複数の命令コードの演算を実行するためのデータ処理装置に関するものであり、具体的には、プロセッサにおける命令コードの配置技術に関する。
【０００２】
【従来の技術】
通常、プロセッサにおいては、命令は、プロセッサとデータバスを介してつながっているメモリに、命令コードとして格納されている。この場合、当該メモリに格納されている命令コードのフォーマットには、▲１▼命令コードの長さが命令の種類によらずに常に一定に設定されている「固定長フォーマット」と、▲２▼命令コードの長さがそれぞれ命令の種類によって異なるように設定されている「任意長命令フォーマット」とがある。
【０００３】
命令コードには、演算、転送、分岐などの、命令の機能を指定するオペレーションコード部分と、命令の実行対象データ（オペランド）を指定するオペランドコード部分とがある。オペランドの指定は、命令コード中のアドレッシングモードの指定部において当該オペランドがレジスタ内に格納されているのか、それとも外部のメモリ内に格納されているのかを指定することにより、行われる。そして、オペランドがメモリ内にある場合には、更にアドレス情報を命令コード中に付加する。
【０００４】
以下に、固定長フォーマット及び任意長フォーマットの命令フォーマットを、それぞれ図１８及び図１９に模式的に示す。両図において、命令フォーマット１００はオペランドがない場合を、命令フォーマット１０１はオペランドがある場合を、命令フォーマット１０２はオペランドがない場合の任意長フォーマットを、命令フォーマット１０３はオペランドがある場合の任意長フォーマットを、それぞれ表している。
【０００５】
【発明が解決しようとする課題】
▲１▼命令コードが固定長命令フォーマットの場合
この場合には、命令コードのデコードが容易であるという利点がある。しかし、この命令フォーマットでは、決められた一定の命令長の範囲内で、オペレーションコードやアドレッシングモードの指定部や、さらにはアドレス情報などの付加情報を記述しなくてはならないという制約がある。従って、より多くの付加情報を記述するためには命令長を大きく設定する必要がある。その結果、命令長を大きくした固定長命令フォーマットでは、命令ビットパターンに冗長部分が増加し、コードサイズが大きくなるという問題点が生ずる。一方では、コードサイズを小さくするために命令長を小さく設定すると、命令機能に対する制限が大きくなるという問題点が生じる。
【０００６】
▲２▼命令コードが任意長命令フォーマットの場合
この場合には、２種類以上の任意の命令長の命令フォーマットが使用されるので、各々の命令毎に応じて命令機能を拡張することができるという利点がある。また、オペランドのない命令の命令長を短く設定することができるので、固定長命令フォーマットの場合と比較して、コードサイズを小さくできる利点がある。
【０００７】
その反面、メモリから読み込んだデータを各命令コードとして抽出し、更に、命令コード自身の各々をデコードする作業が複雑化するので、命令デコード方法が複雑にならざるを得ないという問題点がある。このため、メモリの内容から命令コードを抽出して、命令デコーダに送り込むためのＨ／Ｗ（ハードウェア）が大きくなる。例えば、図２０に示す様に、１６／３２ビット長命令フォーマット１０５，１０４を任意長命令フォーマットとして導入するときには、図２１に示す様に、命令フェッチ部と命令デコーダの間には、命令コードの転送のために、４つの経路を用意する必要がある。このためには、有効な命令コードをシフトして適切な命令コードの配置の下でデコードが実行されるように、複雑なシフト機能を命令デコーダに具備させなければならない。
【０００８】
以上の通り、▲１▼の固定長命令フォーマット及び▲２▼の任意長命令フォーマットには、それぞれ一長一短がある。そこで、▲１▼及び▲２▼のそれぞれの利点を兼ね備えたプロセッサの実現が要望されているのである。
【０００９】
この発明は、上記の懸案事項を実現するためになされたものであり、具体的には、（ｉ）固定長命令フォーマットに比べてコードサイズを縮小化し、且つ（ｉｉ）従来の任意長命令フォーマットに比べてプロセッサのＨ／Ｗ量を削減して高速化を図ることができる命令フォーマットを備えたプロセッサ及びプロセッサ用の入力装置を実現することを、その主たる目的としている。
【００１０】
又、この発明は、命令実行中に発生する各割り込み（外部割り込み、ＰＣブレーク割り込み）やソフトウェア割り込みの機能を具備したプロセッサを実現することを、その副次的目的としている。
【００１１】
又、この発明は、そのような命令コードをデコードするためのプロセッサの構成を具体化することを、その副次的目的としている。
【００１２】
【課題を解決するための手段】
第１の発明に係るデータ処理装置は、Ｎ（Ｎは１以上の整数）ビット長命令を与える第１命令データ信号と２Ｎビット長命令を与える第２命令データ信号のみから成る２種類の命令コードを実行するデータ処理装置であって、前記第１及び第２命令コードは、（１）２個の前記第１命令データ信号を２Ｎビット長のワードの境界内に格納し、（２）前記第２命令データ信号の各々を２Ｎビット長のワードの境界内に格納するという規則の下に配置された、命令コード入力手段と、前記命令コード入力手段において配置された命令コードをフェッチする命令フェッチ手段とを備え、前記第１及び第２命令データ信号のそれぞれは、その所定のビット位置において、命令実行順序の制御情報を与える命令長識別子データを備える。
【００１８】
【発明の実施の形態】
（実施の形態１）
ここでは、Ｎ＝１６とした場合の一例について、図面に基づき説明する。
【００１９】
図１は、本発明に係る複数の演算を実行するデータ処理装置の構成を示すブロック図である。図１に示す通り、同装置は、プロセッサ１０を中核として構成される。
【００２０】
プロセッサ１０は、演算部１、レジスタ２、プログラムカウンタ部（以下、ＰＣ部と称す）３、アドレス生成器４、命令キュー（ｑｕｅｕｅ）部５、命令フェッチ部６、命令デコード部７及び制御部８より構成されている。この内、制御部８は、命令デコード部７が出力するデコード結果に従って、プロセッサ１０内の各部１〜７の動作を制御する。尚、図示の簡略化のために、同図中には、制御部８から出力される各制御信号の図示化は省略されている。又、演算部１は、ＡＬＵ（算術論理演算器）、シフタ（ｓｈｉｆｔ）、Ｌｏａｄ−Ｓｔｏｒｅユニット及び乗算器（Ｍｕｌ．）から成る。又、レジスタ２は汎用レジスタであり、３２ビット幅で１６本の信号線を有し、演算結果のデータ及びアドレスデータを保持する。他の構成部分の詳細については、後述する。
【００２１】
尚、命令キュー部５と命令フェッチ部６とを総称して、「命令フェッチ機能部」と定義する。
【００２２】
又、プロセッサ１０と、外部の周辺回路１１、メモリ１２及びデータセレクタ１４とは、データバス１５とバスインタフェイス部１３とによって接続されている。尚、データバス１５は、命令コードを与える命令コードデータ信号を本プロセッサ１０に入力する「入力手段」に該当する。
【００２３】
Ｉ．命令コードの配置方法（命令セット）
次に、この発明の根幹をなす「命令コードの配置方法」について説明する。
【００２４】
既述した通り、▲１▼固定長命令フォーマット及び▲２▼任意長命令フォーマットでは、それぞれに一長一短があった。特に、▲２▼の任意長命令フォーマットにおいて、１６ビット長命令コードと３２ビット長命令コードの２種類の命令コードを用いる場合は、命令フェッチ部（ないし命令キュー部）と命令デコード部間には、４つの転送経路が必要とされる。この内、特にハードウェアの複雑化という深刻な問題点を惹起せしめる要因となっているのは、３２（２Ｎ）ビット長命令が３２（２Ｎ）ビット境界を乗り越えてしまうような命令コードの配置を許していることにあると考えられる。この点を、図２に模式的に示す。
【００２５】
今、命令フェッチ部が外部のメモリより同図の上段側に示す様な順序で配列された３つの命令コード１０６〜１０９をフェッチしてきたものとし、これらの内で、３２ビット長の命令コード１０７及び１０８がデータ処理にとって有効であるものとする。このような場合には、同図の下段側に示すような配置として、有効命令コード１０７，１０８をデコードする必要がある。従って、同図に示す様な、交差した２つの経路１１０，１１１を実現する必要が生じるのである。このような経路となる命令コードの配置を、ここでは「２Ｎビット命令が２Ｎビット境界をまたぐ配置」と呼称している。このために、命令デコーダ側のハードウェアが複雑化せざるを得ないのである。かかる要因に着眼するならば、そのような命令コードの配置を禁止する必要がある。
【００２６】
そこで、この発明では、（ｉ）Ｎ（Ｎ≧１）ビット長の命令（第１命令データ信号）と２Ｎビット長の命令（第２命令データ信号）という、２種類の命令長のみから成る複数の命令コード（命令コードデータ信号）を実行することとし、（ｉｉ）命令コードの配置方法、即ち、２Ｎビット長で与えられる「ワード」内（これを、２Ｎビット長ワード境界内と称す）に各命令をどのように配置するのかという点を、以下の２種類のみに制限することとしている。即ち、
（１）２個のＮビット長命令を２Ｎビットワード境界内に格納する。
【００２７】
（２）単一の２Ｎビット長命令を２Ｎビットワード境界内に格納する。
【００２８】
本実施例では、Ｎ＝１６の場合を扱うものとしているので、１６ビット長命令と３２ビット長命令（最大ビット長の命令）の２種類のみとなり、命令コードの配置方法は図３に示す通りとなる。
【００２９】
このような配置制限を設けることによって、３２ビット長命令が３２ビット境界をまたぐような命令コードの配置が完全に禁止されることとなり、その結果として、図１の命令フェッチ部６と命令デコード部７との転送経路は、図４に示す様に、３種類に削減される。この３種類の転送経路は、次の模式的な説明図５によって、理解されうるであろう。
【００３０】
同図に示す様に、命令キュー部５内に保持されている状態では、３種類の命令コードの配列がありうる。その内、同図の上段及び中段に示されるものは、それぞれ、先行する１６ビット長命令コードＡ１及び後行する１６ビット長命令コードＢ２のみが有効となる場合であり、いずれも上記配列方法（１）の制限の下で配列されている。この内、同図の上段の場合には、図４に示す経路ＲＴ１を経て、命令コードＡ１はデコーダに転送される。又、図５の中段の場合には、図４に示す交差経路ＲＴ３を経て、命令コードＢ２はデコーダに転送される。他方、図５の下段に示される３２ビット長の命令コードＣ１は上記配置方法（２）に基づく制約の下で配列されているものであり、図４に示す経路ＲＴ１，ＲＴ２を経てデコーダに転送される。従って、デコード化のためには、３種類の経路ＲＴ１〜ＲＴ３のみで良いこととなる。
【００３１】
このように、命令コードの配置を上記（１），（２）の制約に服させることとしているので、転送経路を従来の場合よりも１種類分だけ削減することが可能となるのであるが、上記（１），（２）の制約を受けて配置された命令コードを与える命令コードデータ信号は、図１のデータバス１５上の入力データ信号として実現されて、命令キュー部５に保持される。そのためには、ここでは、上記制約ルール（１），（２）に基づいたプログラム制御によって、命令コードのデータをメモリ１２内に書き込んでいる。従って、上記制約ルール（１），（２）に基づく配置で書き込まれた命令コードデータ信号を記憶する「メモリ１２」と、上記配置順序に従って当該メモリ１２から読み出された命令コードデータ信号をプロセッサ１０内に入力する手段たる「データバス１５」とを、「プロセッサ用命令コード入力装置」として総称する。
【００３２】
尚、上記命令コードデータ信号の内、上記制約（１）に基づき配置された１６ビット（一般的にはＮビット）長命令コードを与えるものを「第１命令コードデータ信号」と、上記制約（２）に基づき配置された３２ビット（一般的には２Ｎビット）長命令コードを与えるものを「第２命令コードデータ信号」と、それぞれ呼称する。
【００３３】
プロセッサ１０の命令キュー部５内の命令コードデータ信号のフォーマットを、図６に示す。同図中の記号ｏｐ１，ｏｐ２はオペランドコードを、記号Ｒ１，Ｒ２はレジスタを、記号Ｃは定数を、記号ｃｏｎｄは分岐条件の指定を表わす。
【００３４】
尚、上記制限（１），（２）の下でメモリ１２に命令コードのデータを書き込むのに代えて、上記制限とは無関係に命令コードのデータをメモリ１２内に書き込み、新たにデータバス１５上にメモリ１２より読み出した命令コードデータ信号を上記制限（１），（２）の下で配置する機能部を設け、この機能部の出力データを命令キュー部５に格納するようにしても良い。
【００３５】
ＩＩ．分岐先の制御
更に、本プロセッサ１０においては、分岐先のアドレスを３２ビット境界にのみ指定ないし制限する。このように、命令コードの３２ビット境界への配置を禁止したことに加えて、分岐先アドレスを３２ビット境界にのみに制限したことにより、命令フェッチ部６と命令デコーダ間の転送経路は２種類にまで削減される。
【００３６】
この点は、既述した図５に立ち戻って概観すれば、容易に理解されうるであろう。即ち、分岐先アドレスを３２ビット境界においてのみ指定できるように制限したということは、図１の命令デコード部７では、命令フェッチ部６より出力された、３２ビット長ワード境界内に配置された命令コードデータ信号を受け取った後は、当該命令コードデータ信号をその３２ビット（２Ｎビット）境界からデコードを開始することを意味する。従って、３２ビットワード境界内に２個の１６ビット長命令コードが配置されている場合は、先行する命令Ａ２をデコードした後に、同図の中段の命令コードＢ２を、先行する命令コードＡ２があるビット位置へシフトした上でデコードすればよい。
【００３７】
よって、命令キュー部５から命令デコード部７への命令コードデータ信号の転送経路は、図７に模式的に示す様に、２種類のみとなるのである。即ち、転送経路は、
（イ）命令キュー部：上位１６ビット→命令デコード部：上位１６ビット。
【００３８】
（ロ）命令キュー部：下位１６ビット→命令デコード部：下位１６ビット。
【００３９】
分岐命令の飛び先の指定は、以下の形式で行われる。その点を、図８、図９及び図１０に示す本プロセッサ１０における命令コード表に基づき説明する。尚、上記命令コード表において、「Ｆｏｒｍａｔ」中の「ｄｅｓｔ」は結果収納先のレジスタの番号を示しており、「ｓｒｃ」は演算対象であり、ここでは図６に示したレジスタＲ１を意味しており、当該レジスタＲ１中の数値がメモリのアドレス値となっている。又、「ｐｃｄｉｓｐ８」は、即値が８ビットで与えられていることを示している。
【００４０】
上記命令コード表において、
（ａ）ＪＭＰ，ＪＬ命令は、命令コード内で指定したレジスタの値が分岐先アドレスとなる。（ただし、レジスタの下位２ビットの値”０”は無視する）。
（ｂ）ＢＲＡ，ＢＬ，ＢＣ，ＢＮＣ命令は、８ビット又は２４ビットの即値を指定する。
（ｃ）ＢＥＱ，ＢＮＥ，ＢＥＱＺ，ＢＮＥＺ，ＢＬＴＺ，ＢＧＥＺ，ＢＬＥＺ，ＢＧＴＺ命令は１６ビットの即値を指定する。
上記（ｂ），（ｃ）における分岐先のアドレスは、
（分岐命令のＰＣ値）＋（符号拡張された即値を左に（最上位ビット位置側に）２ビットシフトした値）
とする。ただし、加算を行う際には、ＰＣ値の下位２ビットは”００”となる。
【００４１】
以上のように、分岐先を３２ビット境界のみに指定したことにより、分岐先アドレスの下位２ビットは常に”００”となる。したがって、命令コード内で分岐先アドレスを指定する場合には、その下位２ビットを指定する必要がなくなる。結果として、命令コード内から直接分岐できる範囲は２²倍、従って４倍となる。本実施の形態の場合には、上記命令コード表に示されるように、命令コード内のアドレス指定部は最大で２４ビットであるので、実行中の命令のアドレスから±３２ＭＢｙｔｅの範囲に直接分岐できる。
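The branch-target formula above (branch instruction PC plus the sign-extended immediate shifted left by 2 bits, with the low 2 bits of the PC treated as "00") can be sketched as follows. This is a minimal illustration, not the patented circuit: the function names `sign_extend` and `branch_target` and the 32-bit wrap-around are assumptions made here.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    mask = (1 << bits) - 1
    value &= mask
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc, imm, bits):
    """Target = (PC with low 2 bits cleared) + (sign-extended imm << 2)."""
    base = pc & ~0x3                      # low 2 bits of the PC are taken as "00"
    offset = sign_extend(imm, bits) << 2  # immediate counts 32-bit words
    return (base + offset) & 0xFFFFFFFF   # assumed 32-bit address space


# Reach of a 24-bit immediate, in bytes.
max_forward = sign_extend(0x7FFFFF, 24) << 2
max_backward = sign_extend(0x800000, 24) << 2
```

With a 24-bit immediate the reach comes out as -32 MiB to +32 MiB minus 4 bytes, which matches the ±32 MByte range stated in the paragraph above.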
【００４２】
ＩＩＩ．命令のデコード順序の制御
更に本プロセッサ１０においては、命令コード内に命令フォーマット識別子を与える情報を所定のビット数として、ここでは１ビットとして設けている。そして、この命令フォーマット識別子の値いかんによって、命令のデコード及び実行順序を制御している。ここでは、各命令コードのＭＳＢ（ＭｏｓｔＳｉｇｎｉｆｉｃａｎｔＢｉｔ）が命令フォーマット識別子に該当する。
【００４３】
上記命令フォーマット識別子を用いた制御のルールは、次の通りに設定される。即ち、単一の３２ビット長命令コードのＭＳＢは常に１に設定されている。他方、命令コードが２個の１６ビット長命令コードから成る場合においては、上位１６ビット側に存在する命令コードのＭＳＢは常に０とされる。それに続く下位１６ビット側の命令コードに対しては、そのＭＳＢの値いかんによって異なる処理を行う。
【００４４】
命令の実行順序の制御方法を、図１１を用いて説明する。同図において、▲１▼「命令Ｂ」のＭＳＢが０の場合は、「命令Ａ」と「命令Ｂ」とは連続して実行される。
【００４５】
それに対して、▲２▼「命令Ｂ」のＭＳＢが１の場合は、「命令Ａ」のみが実行される。命令Ｂは実行されない。即ち、３２ビット長命令の命令コード配置の制約に基づいたワードアライメント調整のための命令Ｂとしてワードアライメント調整用「ＮＯＰ命令」を挿入したときには、アセンブラが自動的に当該「ＮＯＰ命令」にあたる命令コードのＭＳＢを１とし、これにより「命令Ａ」のみの実行が行われる。通常の無効演算「ＮＯＰ命令」は”０１１１００００００００００００”として与えられるが、アライメントという上記目的のために「ＮＯＰ命令」は３２ビットワード境界内の下位側１６ビット位置に挿入されたのであって、それを直接実行する必要もない。従って、本プロセッサ１０では、「ＮＯＰ命令」は”１１１１００００００００００００”として与えられ、その結果、「ＮＯＰ命令」自体は実行されない。
【００４６】
このような命令実行順序の制御を行うことにより、コード配置を満たすために挿入された無効演算である「ＮＯＰ命令」の実行時間ペナルティがなくなるという利点が得られる。
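The execution-order rule of paragraphs 0043 to 0045 can be sketched in a few lines. The sketch assumes that the description's "MSB" of each code is the top bit of its 16-bit half, and that a 32-bit word is held as a Python integer with the upper half in bits 31..16. The function name `instructions_to_execute` is hypothetical; the two NOP encodings are the bit patterns quoted in paragraph 0045.

```python
NOP = 0x7000        # ordinary NOP pattern "0111000000000000" from the text
ALIGN_NOP = 0xF000  # alignment NOP pattern "1111000000000000" from the text


def instructions_to_execute(word):
    """Return the instruction codes in one 32-bit word that are executed."""
    upper = (word >> 16) & 0xFFFF
    lower = word & 0xFFFF
    if upper & 0x8000:
        # MSB of the word is 1: a single 32-bit instruction.
        return [("32-bit", word & 0xFFFFFFFF)]
    if lower & 0x8000:
        # MSB of the lower code is 1: only the upper instruction runs.
        return [("16-bit", upper)]
    # Both MSBs are 0: the two 16-bit instructions run in sequence.
    return [("16-bit", upper), ("16-bit", lower)]
```

For example, a word made of a 16-bit instruction followed by `ALIGN_NOP` yields only the upper instruction, so the padding costs no execution time.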
【００４７】
Ｖ．命令デコード方法とその制御部
以下では、命令のデコード制御方法の具体例を、図１２を用いて説明する。
【００４８】
（命令デコード部７の構成）
命令デコード部７は、命令デコード入力ラッチ７ａ、命令デコーダ７ｃ、定数生成器７ｄ及び命令デコード部７の制御ロジック７ｂから構成される。これらの内で、制御ロジック７ｂは、命令デコード部７の各部の制御を司る。又、命令デコーダ７ｃは、命令デコード入力ラッチ７ａに格納された３２ビットのビットパターンのうちの有効な命令コードを、入力として受け取り、当該命令デコーダ７ｃは命令コードをデコードする。尚、命令デコード入力ラッチ７ａ内の命令コードの配置は、メモリ１２上での命令コードの配置と同じである。デコード結果はプロセッサ１０の制御部８へと出力され、制御部８は、そのデコード結果に基づき、演算部１やプロセッサ１０全体を制御する。
【００４９】
各部のより詳細な動作は、次の通りである。
【００５０】
（命令フェッチ部６）
先ず、命令フェッチ部６は、データバス１５（図１）を介してメモリ１２から命令コードデータ信号を３２ビット長単位でフェッチし、それらを命令キュー部５に格納する。更に命令フェッチ部６は、後述の第５制御信号ＣＢに応じて、命令キュー部５より順次に命令コードデータ信号を読み出して命令デコード部７へ転送する。その結果、転送されてきた命令コードデータ信号は、３２ビット幅の命令デコード入力ラッチ７ａへ格納される。
【００５１】
命令デコード入力ラッチ７ａに格納された命令コードデータ信号の内で有効な命令コードを与える信号のデコードの実行は、既述した命令フォーマット識別子に基づき、制御ロジック７ｂにより、次の通りに制御される。その点を、以下に詳述する。
【００５２】
（制御ロジック７ｂ）
図１２の第１〜第３制御信号ＣＴ１〜ＣＴ３（入力信号とも称す）と命令デコード入力ラッチ７ａ内の有効コード配置との関係を、図１３に示す。図中の斜線部は命令フォーマット識別子を示し、梨地部は有効なコードを示す。
【００５３】
図１２に示すように、制御ロジック７ｂは、命令デコード入力ラッチ７ａに納められた３２ビットパターンのうちの命令長識別子を与える第１，第２制御信号ＣＴ１，ＣＴ２と、制御ロジック７ｂ自身から出力される第３制御信号ＣＴ３とを、入力とする。即ち、命令デコード入力ラッチ７ａは、命令フォーマット識別子として自ら保有する３２ビットのビットパターンの０ビット目の値と１６ビット目の値とを、各々第１制御信号ＣＴ１及び第２制御信号ＣＴ２として、制御ロジック７ｂに出力する。第３制御信号ＣＴ３は制御ロジック７ｂ自身からの出力信号であり、それは、命令デコーダ７ｃが出力するデコード終了信号ＥＳに応じて、上記ラッチ７ａに納められた３２ビットのビットパターン中の０ビット目から１５ビット目までの部分がデコードされているのか（この場合には”０”）、それとも１６ビット目から３１ビット目までの部分が現在デコードされているのか（この場合には”１”）を、表わす。
【００５４】
（α）第１制御信号ＣＴ１の値が”１”である時は、命令デコード入力ラッチ７ａに納められている命令コードデータ信号は３２ビット長命令を与えるものである。
【００５５】
（β）第１制御信号ＣＴ１が”０”であるときは、命令デコード入力ラッチ７ａには２個の１６ビット長命令から成る命令コードデータ信号が格納されている。このとき、さらに第２制御信号ＣＴ２が”０”かつ第３制御信号ＣＴ３が”０”であれば、有効な命令コードは上位１６ビット側にある。これに対して、第２制御信号ＣＴ２が”０”かつ第３制御信号ＣＴ３が”１”であれば、有効な命令コードは下位１６ビット側にある。
【００５６】
（γ）第１制御信号ＣＴ１が”０”で、しかも第２制御信号ＣＴ２が”１”である場合は、上位１６ビットの命令コードのみが有効なコードであり、下位１６ビットの命令コードは、上述したワードアライメント調整用「ＮＯＰ命令」であり実行されない。
【００５７】
制御ロジック７ｂは、以上の入力信号ＣＴ１〜ＣＴ３の値を基に、次コードのデコードのために、次の３種類の制御信号ＣＡ，ＣＢ，ＣＴ３を出力する。この内、第４制御信号ＣＡは、命令デコード入力ラッチ７ａ内において下位１６ビットの命令コードを上位１６ビット位置へシフトするための制御信号、即ちシフタ制御信号であり、第５制御信号ＣＢは命令アドレスの３２ビット境界のポインタであり、命令フェッチ部６に対して命令デコード入力ラッチ７ａへの命令コードの転送開始を命令する信号である。第３制御信号ＣＴ３は、既述した通り、次の有効コードが３２ビットワード境界内の上位１６ビット位置にあるのか、下位１６ビット位置にあるのかを示す信号である。
【００５８】
さらに、図１２に示すように、制御ロジック７ｂは、第６制御信号ＣＣとして、命令デコーダ７ｃにおいてデコード実行中の命令コードのアドレスが３２ビットワード境界上にある場合には”１”を、ワード境界上にない場合には”０”を、後述するＰＣのビット３０部（ＰＣ３０部）に出力する。
【００５９】
命令コード内の即値により分岐先アドレスを指定する分岐命令、即ち、上述した（ｂ）ＢＲＡ命令等や（ｃ）ＢＥＱ命令等をデコードする場合は、命令デコード入力ラッチ７ａの後半２４ビット（８ビット目から３１ビット目まで）位置に接続された定数生成器７ｄへ、同ラッチ７ａは即値を出力する。定数生成器７ｄは、即値を符号拡張して得られる値を、バスＳ２に出力する。
【００６０】
命令実行時に分岐が発生した場合には、第１〜第３制御信号（入力）ＣＴ１〜ＣＴ３の値に関わりなく、命令デコード部７の第４制御信号ＣＡ、第５制御信号ＣＢ、第６制御信号ＣＣ及び第３制御信号ＣＴ３は、すべて初期化される。
【００６１】
制御信号（入力）と制御信号（出力）の関係を図１４に示す。
【００６２】
以上より、命令デコード部７（図１，図１２）においては、いずれの場合にも、第１回目にデコードする命令コードは、必ず命令デコード入力ラッチ７ａの先頭からはじまる。したがって、命令デコード入力ラッチ７ａの内容が、そのまま命令デコーダ７ｃに転送される。命令コードデータ信号が単一の３２ビット長命令コードの場合には、転送経路は図１２に示す経路Ｐ２，Ｐ３である。又、上位の１６ビット長命令コードの転送経路は、経路Ｐ２である。
【００６３】
命令デコード入力ラッチ７ａにおける命令コードの配置が１６ビット長命令の逐次実行である場合のみ、命令コードの２回目の命令デコーダ７ｃへの出力が行われる。この際、２回目の命令デコーダ７ｃへの出力前に、次のような処理が行われる。デコード対象となる命令コード、即ち有効なコードは３２ビットワード境界内の下位側の１６ビット位置にあるので、命令デコード入力ラッチ７ａの内容は左へ（３２ビットのビットパターンの０ビット目方向へ）１６ビット分だけシフトされ（経路Ｐ１）、入力ラッチ７ａに入る。その結果が、２回目の命令デコーダ７ｃへの出力となる（経路Ｐ２）。
【００６４】
このように、経路Ｐ１を経由するのは、有効コードが命令デコード入力ラッチ７ａの下位１６ビット位置にある場合に限られるので、Ｈ／Ｗ量を削減することができ、さらに高速化を図れることが可能となる。
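The control logic 7b described in paragraphs 0053 to 0063 can be modelled as a small next-state function. The table of FIG. 14 is not reproduced in the text, so the mapping below is a reconstruction from cases (α), (β) and (γ) and should be read as an assumption. The names `next_controls` and `decode_sequence` are hypothetical, and CT1 and CT2 are sampled once per fetched word.

```python
def next_controls(ct1, ct2, ct3):
    """Return (CA, CB, next CT3) once the current code has been decoded.

    CA = 1 requests the shift of the lower half to the upper half;
    CB = 1 requests transfer of the next 32-bit word into the latch.
    """
    if ct1 == 1:
        return (0, 1, 0)   # 32-bit instruction decoded: move to next word
    if ct2 == 1:
        return (0, 1, 0)   # lower half is an alignment NOP: skip it
    if ct3 == 0:
        return (1, 0, 1)   # upper half decoded: shift lower half up
    return (0, 1, 0)       # lower half decoded: move to next word


def decode_sequence(words):
    """List the codes handed to the decoder for a stream of 32-bit words."""
    decoded = []
    for word in words:
        ct1 = (word >> 31) & 1   # "bit 0" in the description's numbering
        ct2 = (word >> 15) & 1   # "bit 16" in the description's numbering
        ct3 = 0                  # decoding always starts at the word boundary
        while True:
            if ct1:
                decoded.append(word & 0xFFFFFFFF)
            elif ct3 == 0:
                decoded.append((word >> 16) & 0xFFFF)
            else:
                decoded.append(word & 0xFFFF)
            _ca, cb, ct3 = next_controls(ct1, ct2, ct3)
            if cb:
                break
    return decoded
```

In this model only the (CT1, CT2, CT3) = (0, 0, 0) case raises CA, which reflects the statement that path P1 is used only when a valid code sits in the lower 16 bits.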
【００６５】
ＶＩ．ＰＣ部３のインクリメント動作
図１５は、図１中のＰＣ部３及びアドレス生成器４の部分を拡大したブロック図である。図１５を用いてＰＣ部３のインクリメント動作について説明する。
【００６６】
アドレス生成器４は、シフタ４ａと加算器４ｂとから構成されており、分岐命令のアドレスをアドレッシングモードに従って計算する。
【００６７】
ＰＣ部３は、プログラムカウンタ（以後、ＰＣと称す）を中核として、更に比較器３ｂ、（＋１／０）部３ｅ、バックアップ・プログラムカウンタ（以下、ＢＰＣと称す）３ｆ、ＢＰＣ３０部３ｇ及びプログラムカウンタ・ブレークポインタ（以下、ＰＢＰと称す）より構成される。この内、ＰＢＰ３ａとＢＰＣ３ｆとは、制御レジスタである。
【００６８】
上述のプログラムカウンタは３２ビットのカウンタであり、現在実行中の命令のアドレス値を保持する。命令コードの配置方法を上述の通り限定しているので、本プロセッサ１０（図１）の命令は偶数アドレス値からのみ始まる。よって、プログラムカウンタの３１ビット目の値は、図２２に例示するように、”０”固定となる。従って、ハードウェア上では、ＰＣの３１ビット目の値を実現する必要がないので、ＰＣは、図１５に示す通り、０ビット目から２９ビット目までの値を与えるＰＣ（０：２９）３ｃと、３０ビット目の値を与えるＰＣ３０部３ｄとによって実現されている。この内、ＰＣ３０部３ｄは、第６制御信号ＣＣの値”０”又は”１”を保有するレジスタである。
【００６９】
又、上述のＢＰＣは、後述する割り込み・トラップ発生時にＰＣ３ｃが保持するＰＣ値を退避する。ここで、図２３に例示するように、ＢＰＣの３１ビット目の値は常に”０”固定されているので、ハードウェア上では、ＢＰＣは、０ビット目から２９ビット目までの値を与えるＢＰＣ（０：２９）３ｆと、３０ビット目の値を与えるＢＰＣ３０より実現されている。従って、本プロセッサ１０（図１）は、後述する割り込み・トラップの発生を検出すると、ＰＣ（０：２９）３ｃの値をＢＰＣ（０：２９）３ｆへ退避させる。他方、ＰＢＰ３ａは、後述するＰＣブレーク割り込みを制御するための３２ビット幅制御レジスタであり、ＰＢＰ３ａは、割り込みを起動する命令実行のアドレス値を予め保有しており、制御部８（図１）から出力するバスＤ１上の書き込み命令信号を用いて、上記アドレス値を書き込んでいる。
【００７０】
ＰＣ（０：２９）３ｃの更新は、本プロセッサ１０（図１）において分岐以外の命令が実行された時には、以下に示すように行われる。尚、ここでは、４本のバスＳ１，Ｓ２，Ｓ３及びＤ１を使用しており、信号線Ｄは、ＰＣ３ｃに接続されている。
【００７１】
図１２の命令デコード部７の制御ロジック７ｂから出力される第５制御信号ＣＢ（３２ビットワード境界のアドレスを指すポインタ）が更新されると、ＰＣ（０：２９）３ｃが保持する値は（＋１／０）部３ｅによって”＋１”だけ増加させられ、その増加後の値が信号線Ｄ上にのる。これに対して、第５制御信号ＣＢが更新されない場合は、ＰＣ（０：２９）３ｃの値は、（＋１／０）部３ｅによって増加されることなく、そのまま信号線Ｄ上にのる。
【００７２】
信号線Ｄ上の値はＰＣ（０：２９）３ｃへ書き込まれ、これにより、ＰＣ（０：２９）３ｃのＰＣ値は、次命令のＰＣ（０：２９）３ｃの値へと書き換えられる。また、信号線ＤはバスＳ１に結線されているので、このＰＣ（０：２９）３ｃの更新値をバスＳ１上に呼びだすことができる。
【００７３】
ＰＣ３０部３ｄには、命令デコード部７内の制御ロジック７ｂ（図１）から出力される第６制御信号ＣＣの値が、命令実行時に書き込まれる。
【００７４】
他方、命令実行時に分岐が発生した場合には、次命令のＰＣ値は、アドレス生成器４を用いて生成され、その生成値はバスＳ３を経由してＰＣ（０：２９）３ｃに書き込まれる。
【００７５】
又、飛び先アドレス指定を即値でおこなう分岐命令（上述した分岐命令（ｂ），（ｃ））の場合には、命令デコード部７の定数生成器７ｄ（図１）は、当該命令コードから抽出した即値を３２ビットに拡張し、拡張後の即値をバスＳ２を介してアドレス生成器４へ出力する。そして、符号拡張された即値は、アドレス生成器４内のシフタ４ａによって左に（最上位側へ）２ビットだけシフトされる。ＰＣ（０：２９）３ｃの更新値は、信号線ＤからバスＳ１を介して、アドレス生成器４に出力される。ここで、ＰＣ（０：２９）３ｃの更新値は、分岐命令のＰＣ上位３０ビットにあたる。加算器４ｂは、
（ＰＣ（０：２９）３ｃの更新値）＋（符号拡張された即値を左に２ビットシフトした値）
を計算し、これにより、分岐先アドレスの上位３０ビットが得られる。この上位３０ビットの値は、バスＳ３を介して、ＰＣ（０：２９）３ｃに書き込まれる。
【００７６】
飛び先アドレス指定をレジスタ２（図１）の値で行う分岐命令の場合（上述の分岐命令（ａ））には、命令コードにより指定されたレジスタ２から、バスＳ１を介して、レジスタに保持されている値が受け取られ、その定数値がＰＣ３ｃに書き込まれる。
【００７７】
分岐発生時は、命令デコード部７が第６制御信号ＣＣを”０”に初期化することにより、ＰＣ３０部３ｄには”０”が書き込まれる。よって、本プロセッサ１０（図１）では、上述の通り、分岐先のアドレスは３２ビットワード境界のみとなる。
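The split of the program counter into PC(0:29) and the PC30 bit can be illustrated as follows. The sketch assumes the description's bit numbering (bit 0 is the most significant bit, bit 31 the least significant) and assumes that PC30 = 1 marks the lower 16-bit half of a word, as the trap paragraph suggests for BPC30. The function names are hypothetical.

```python
WORD_MASK = 0x3FFFFFFF  # PC(0:29) holds 30 bits


def pc_address(word_index, half):
    """Compose the 32-bit byte address from PC(0:29) and PC30.

    Bit 31 (the least significant bit) is always 0, so it is not stored.
    """
    return ((word_index & WORD_MASK) << 2) | ((half & 1) << 1)


def advance(word_index, cb_updated):
    """(+1/0) unit: add 1 to PC(0:29) only when the pointer CB is updated."""
    return (word_index + (1 if cb_updated else 0)) & WORD_MASK
```

Adding 1 to PC(0:29) moves the byte address forward by 4, that is, by one 32-bit word.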
【００７８】
ＶＩＩ．割り込み・トラップ動作の説明
プロセッサ１０（図１）が通常のプログラムを実行している途中で、ある事象が発生すると、そのプログラムの実行を中断して、別のプログラムを実行する必要が生じる。このような事象としては、大別して、割り込み及びトラップ動作がある。
【００７９】
（ａ）割り込み（Ｉｎｔｅｒｒｕｐｔ）
上記事象の内、外部からのハードウェア信号（これを割り込み要求信号と称す）により、又は特定のアドレス実行時に生ずるＰＣブレーク信号により発生する事象。
【００８０】
（ｂ）トラップ（Ｔｒａｐ）
上記事象のうち、命令により発行される事象。
【００８１】
本プロセッサ１０（図１）はまた、外部割り込み（ＥＩ）及びＰＣブレーク割り込み（ＰＢＩ）から成る２種類の（ａ）割り込みと、１種類のトラップ（ＴＲＡＰ）とを実現する機能を備える。
【００８２】
割り込み・トラップ発生時の処理手順を図１６を用いて説明すれば、概要、次の通りとなる。
【００８３】
割り込み要求信号又はＰＣブレーク信号が有効となった場合には、後程詳述する様に、プロセッサ１０は、３２ビットワード境界でのみ割込みを受け付ける。また、トラップ命令は、その命令実行後、トラップの処理を開始する。
【００８４】
これにより、プロセッサ１０（図１）は、プログラムの実行を中断して割り込み又はトラップの処理を行う。その際、同プロセッサ１０は、後述する様に、割り込み又はトラップの事象を検出して図１５のＰＣ３ｃのＰＣ値をＢＰＣ３ｆへ退避させることとしており、その後、各々の割り込み又はトラップに対応した処理プログラム「割り込み・トラップ処理ハンドラ」への分岐をおこなう。
【００８５】
「割り込み・トラップ処理ハンドラ」での処理が完了すると、「割り込み・トラップ処理ハンドラ」からの復帰命令を実行し、続いてＰＣ３ｃのＰＣ値の復帰を行い、プロセッサ１０は当該割り込み・トラップ処理から復帰する。
【００８６】
本プロセッサ１０における割り込み・トラップの処理は、以上の通り、ハードウェアが処理を行う部分と、プログラムが処理をする部分とから成る。即ち、本プロセッサ１０では、上述の処理の内、
（１）戻り先であるＰＣ値のＢＰＣ３ｆへの退避、
（２）「割り込み・トラップ処理ハンドラ」への分岐、
（３）ＢＰＣ値のＰＣ３ｃへの書き込み、
を、ハードウェア部分が実行する。
【００８７】
前述の外部割り込み（ＥＩ）は、外部からのハードウェア信号（割込み要求信号）により発生する。割込み要求信号による割り込みの要求は、３２ビットワード境界上でのみ受け付ける（この機構は、後記の検出回路の構成による）。割り込み発生時に図１５のＢＰＣ３ｆに退避される値は、次命令のＰＣ値である。
【００８８】
他方、ＰＣブレーク割り込み（ＰＢＩ）は、特定のアドレスを実行したときに発生する。本プロセッサ１０（図１）においては、指定するアドレスは３２ビットワード境界のみである。各サイクル毎に、図１５の比較器３ｂは、ＰＢＰ３ａが保持する値とＰＣ３ｃの値とを比較し、両者の値が一致するときにＰＣブレーク信号１８を出力する。ＰＣブレーク信号１８は、後述する割り込み・トラップ検出回路を介して、割り込み・トラップ検出信号として検出され、その信号は図１５の（＋１／０）部３ｅへ出力される。その結果、プロセッサ１０に割り込みが発生し、ＰＣ値の退避が生じる。既述した通り、ＰＢＰ３ａへの書き込みは、データバスＤ１を用いて行われる。又、割り込み発生時にＢＰＣ３ｆに退避される値は、次命令のＰＣ値である。
【００８９】
又、トラップとは、ソフトウェアで制御する割り込みのことであり、トラップ命令の実行により発生する。この場合、トラップ命令が３２ビットワード境界内の上位１６ビット側にあるのか、それとも下位１６ビット側にあるのかを与える情報を、ＢＰＣビット３０、即ち、ＢＰＣ３０部３ｇに格納する。トラップ命令が上位１６ビットにある場合は、ＢＰＣ３０部３ｇの値は”０”、下位１６ビットである場合は、ＢＰＣ３０部３ｇの値は”１”である。割り込み発生時にＢＰＣに退避される値は、（トラップ命令のＰＣ値＋４）である。
【００９０】
割り込み・トラップを検出する回路を、図１７に示す。同検出回路は、図１の制御部８の一部を構成しており、インバータ２２、ＡＮＤ回路２３，２４及び割り込み・トラップ検出回路１６より成る。
【００９１】
本プロセッサ１０は、当該検出回路によって、外部信号１９による割り込みの要求及びＰＣブレーク信号１８による割り込みの要求を、共に３２ビットワード境界においてのみ受け付ける。即ち、３２ビットワード境界においては第６制御信号ＣＣのレベルは”０”であるので、第６制御信号ＣＣを反転したワード境界検出信号２１（そのレベルは”１”）と割込み要求信号１９とが同時に有効（”１”）になったときにのみ、出力信号ＶＥが有効（”１”）となる。また、ワード境界検出信号２１とＰＣブレーク信号１８とが同時に有効（”１”）になったときにのみ、出力信号ＶＦが有効となる。このように、３２ビットワード境界でのみ上記割込み要求が受け付けられ、割り込み・トラップ処理ハンドラへ分岐し、当該処理を行った後に、「割り込み・トラップ処理ハンドラ」からの復帰命令の実行によって、割り込み・トラップ処理から実行プログラムへの復帰が行われる。
【００９２】
他方、トラップ命令では、トラップ命令による割り込みが指令されると、トラップ要求信号２０が有効（”１”）になる。
【００９３】
トラップ要求信号２０、出力信号ＶＥ、出力信号ＶＦの内のいずれか１つが有効（”１”）になると、割り込み・トラップ検出回路１６により、割り込み・トラップ検出信号１７が出力される。割り込み・トラップ検出信号１７が出力されると、割り込み・トラップ処理ハンドラへ分岐し、当該処理を行った後、復帰命令を実行し、当該処理から実行プログラムへと復帰する。
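The detection circuit of FIG. 17, as described in paragraphs 0090 to 0093, reduces to a few gates. The sketch below follows the text: signal 21 is the inverse of CC, VE and VF are AND outputs, and signal 17 is raised when any of the trap request, VE or VF is active. The function name and argument names are hypothetical, and all signals are modelled as the integers 0 and 1.

```python
def detect(cc, ext_request, pc_break, trap_request):
    """Return the interrupt/trap detection signal 17 (0 or 1)."""
    boundary = 1 - cc              # inverter 22 -> word boundary detection signal 21
    ve = boundary & ext_request    # external interrupt accepted only at a boundary
    vf = boundary & pc_break       # PC break interrupt accepted only at a boundary
    return trap_request | ve | vf  # detection circuit 16
```

A trap request passes regardless of CC, whereas the two interrupt requests are masked while CC is "1".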
【００９４】
既述の通り、割り込み・トラップ発生時には、ＰＣ３ｃ（図１５）の値がＢＰＣ３ｆに退避される。割り込み発生時、ＢＰＣ３ｆには次命令のＰＣ値が書き込まれる。
【００９５】
例えば、分岐命令の実行直後に割り込みが発生した場合には、図１５のアドレス生成器４で生成された分岐先を与えるＰＣ値は、制御部８の制御の下、バスＳ３を経由してＢＰＣ３ｆに書き込まれる。これに対して、分岐命令以外の命令の実行直後に割り込みが発生した場合には、（＋１／０）部３ｅによってＰＣ（０：２９）３ｃの出力値に”＋１”を加えられたものが、ＢＰＣ（０：２９）３ｆに書き込まれる。
【００９６】
ＢＰＣ３０部３ｇには、ＰＣ３０部３ｄの値が書き込まれる。本プロセッサ１０（図１）においては、既述した通り、割り込みは３２ビットワード境界においてのみ受け付けられるので、割り込み発生時にＢＰＣ３０部３ｇに退避される値は、常に”０”である。
【００９７】
トラップ発生時には、（（トラップ命令のＰＣ）＋４）の値がＢＰＣ３ｆに書き込まれる。即ち、（＋１／０）部３ｅによってＰＣ（０：２９）３ｃの出力値に”＋１”が加えられたものが、ＢＰＣ（０：２９）３ｆに書き込まれ、ＢＰＣ３０部３ｇにはＰＣ３０部３ｄの値が書き込まれる。即ち、トラップを発生させたトラップ命令が３２ビットワード境界の上位１６ビットにある場合は、ＢＰＣ３０部３ｇには”０”が、３２ビットワード境界の下位１６ビットにある場合はＢＰＣ３０部３ｇには”１”が、それぞれ書き込まれる。
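The values saved to the BPC in paragraphs 0094 to 0097 can be summarised in one function. This is a sketch under assumptions: `saved_bpc` and its arguments are hypothetical names, PC(0:29) is treated as a word index (so "+1" corresponds to "+4" in byte addresses), and `branch_target_word` stands for the value produced by the address generator when a branch has just executed.

```python
WORD_MASK = 0x3FFFFFFF  # BPC(0:29) holds 30 bits


def saved_bpc(pc_word, pc_half, event, branch_target_word=None):
    """Return (BPC(0:29), BPC30) saved when an interrupt or trap occurs."""
    if event == "interrupt":
        # The PC of the next instruction is saved: the branch target if a
        # branch has just executed, otherwise the next word.
        if branch_target_word is not None:
            next_word = branch_target_word
        else:
            next_word = pc_word + 1
        # Interrupts are accepted only on a word boundary, so BPC30 is 0.
        return (next_word & WORD_MASK, 0)
    if event == "trap":
        # (trap instruction PC) + 4, with BPC30 copied from PC30.
        return ((pc_word + 1) & WORD_MASK, pc_half & 1)
    raise ValueError(f"unknown event: {event}")
```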
【００９８】
割り込み・トラップからの復帰は、既述の通り、リターン命令を実行することによって行う。リターン命令は、制御部８が出力する信号２５を受けてＢＰＣ３ｆが出力する信号が与えるアドレスへ分岐する。ただし、本プロセッサ１０においては、分岐先のアドレスは常に３２ビット境界のみに設定されているので、ＢＰＣ３ｆがＰＣ３ｃに復帰する際、ＰＣの下位２ビットは常に”００”となる。
【００９９】
（まとめ）
以上の構成を採用したことにより、次の様な特徴点が得られる。
【０１００】
本プロセッサ１０は、Ｎビット長と２Ｎビット長という２種類の命令長から成る命令コードを実行することにより、固定長命令フォーマットに比べて、命令機能の制限を受けることなくコードサイズを縮小することができると共に、従来の任意長命令フォーマットに比べて命令デコード方法を簡略化できる。
【０１０１】
特に、命令コードの配置に一定の制限を加えて、２Ｎビット境界をまたぐ命令の配置を禁止すると共に、分岐先アドレスを２Ｎビットワード境界のみに指定することとしたことにより、命令フェッチ部６及び命令デコード部７間のデータ転送経路を格段に削減することができる。その結果、命令デコードのためのＨ／Ｗ量を削減でき、高速化が図れる。
【０１０２】
しかも、本プロセッサ１０においては、分岐先アドレスを２Ｎビット境界に制約したことにより、分岐先アドレスの下位２ビットは常に”００”となる。したがって、命令コード内で分岐先アドレスの下位２ビットを指定する必要がない。その結果として、アドレスの全てのビットを指定する場合と比較して、２²倍、即ち４倍の広範囲へ実行中の命令のアドレスから直接分岐することができる。
【０１０３】
そのような機能を備えた本プロセッサ１０に対して、更に、各種の割り込みやトラップ処理をサポートし得る機能を備えることも可能である。
【０１０４】
【発明の効果】
請求項１記載の発明によれば、命令コード入力手段により２個の第１命令データ信号及び第２命令データ信号は共に２Ｎビットワード境界内に格納されるので、２Ｎビット長命令のデータが当該２Ｎビット境界をまたいでしまうこととなる命令コードの配置が禁止されることとなる。このため、命令フェッチ部から命令デコード部への命令コードデータ信号の転送経路を従来の４種類から３種類に削減することができるという効果がある。
【０１０６】
更に、請求項１記載の発明によれば、命令長識別子に基づいて各命令のデコード及びその実行順序を制御することが可能となる。そして、命令長識別子の適切な設置によっては、無効演算としての命令コードデータ信号を実行しなくて済むようにすることができ、これにより実行時間ペナルティをなくすことも可能になるという利点がある。
【図面の簡単な説明】
【図１】データ処理装置の構成を示すブロック図である。
【図２】命令コードの配置方法を模式的に示す図である。
【図３】命令コードの配置方法を模式的に示す図である。
【図４】命令フェッチ部と命令デコード部との転送経路を模式的に示す図である。
【図５】３種類の命令コードの配列を示す図である。
【図６】命令キュー部内の命令コードデータ信号のフォーマットを示す図である。
【図７】命令キュー部と命令デコード部との転送経路を模式的に示す図である。
【図８】プロセッサの命令コード表を示す図である。
【図９】プロセッサの命令コード表を示す図である。
【図１０】プロセッサの命令コード表を示す図である。
【図１１】命令コードの実行順序の制御方法を示す図である。
【図１２】命令デコード部の構成の詳細を示すブロック図である。
【図１３】第１〜第３制御信号と命令デコード入力ラッチ内の有効コード配置との関係を示す図である。
【図１４】命令デコード部の制御ロジックの第１〜第３制御信号と第４〜第６制御信号との関係を示す図である。
【図１５】ＰＣ部及びアドレス生成器の構成の詳細を示すブロック図である。
【図１６】割り込み・トラップ発生時の処理手順を示す図である。
【図１７】割り込み・トラップを検出する回路を示すブロック図である。
【図１８】従来の固定長命令フォーマットを示す図である。
【図１９】従来の任意長命令フォーマットを示す図である。
【図２０】従来の任意長命令フォーマットの具体例を示す図である。
【図２１】従来の命令フェッチ部と命令デコーダ部間の転送経路を示す図である。
【図２２】ＰＣがハードウェア上で実現されている態様を示す図である。
【図２３】ＢＰＣがハードウェア上で実現されている態様を示す図である。
【符号の説明】
１演算部、２レジスタ、３ＰＣ部、３ａプログラムカウンタ・ブレークポインタ、３ｂ比較器、３ｃプログラムカウンタ部（ビット０〜２９）、３ｄプログラムカウンタ部（ビット３０）、３ｅ（＋１／０）部、３ｆバックアップ・プログラムカウンタ（ビット０〜２９）、３ｇバックアップ・プログラムカウンタ（ビット３０）、４アドレス生成器、５命令キュー部、６命令フェッチ部、７命令デコード部、７ａ命令デコード入力ラッチ、７ｂ制御ロジック、７ｃ命令デコーダ、７ｄ定数生成器、８制御部、１０プロセッサ、１１周辺回路、１２メモリ、１３バスインタフェイス部、１４データセレクタ、１５データバス、１６割り込み・トラップ検出回路、１７割り込み・トラップ検出信号、１８ＰＣブレーク信号、１９割込み要求信号、２０ＴＲＡＰ要求信号、２１ワード境界検出信号、２２インバータ、２３，２４ＡＮＤ回路、２５出力信号。
[0001]
BACKGROUND OF THE INVENTION
The present invention relates to a data processing apparatus for executing an operation of a plurality of instruction codes, and more specifically to an instruction code arrangement technique in a processor.
[0002]
[Prior art]
Usually, in a processor, an instruction is stored as an instruction code in a memory connected to the processor via a data bus. The format of the instruction codes stored in the memory is either (1) a “fixed-length format”, in which the length of the instruction code is constant regardless of the type of instruction, or (2) an “arbitrary-length instruction format”, in which the length of the instruction code differs depending on the type of instruction.
[0003]
The instruction code includes an operation code part that designates the function of the instruction, such as operation, transfer, and branch, and an operand code part that designates execution target data (operand) of the instruction. The designation of the operand is performed by designating whether the operand is stored in the register or in the external memory in the addressing mode designating part in the instruction code. If the operand is in the memory, address information is further added to the instruction code.
[0004]
The instruction formats of the fixed-length format and the arbitrary-length format are shown schematically in FIGS. 18 and 19, respectively. In these figures, instruction format 100 is the fixed-length format without an operand, instruction format 101 is the fixed-length format with an operand, instruction format 102 is the arbitrary-length format without an operand, and instruction format 103 is the arbitrary-length format with an operand.
[0005]
[Problems to be solved by the invention]
(1) When instruction code is fixed length instruction format
In this case, there is an advantage that the instruction code can be easily decoded. However, this instruction format has a restriction that an operation code, an addressing mode designation part, and additional information such as address information must be described within a predetermined fixed instruction length. Therefore, in order to describe more additional information, it is necessary to set a large instruction length. As a result, in the fixed-length instruction format in which the instruction length is increased, there is a problem that redundant portions are increased in the instruction bit pattern and the code size is increased. On the other hand, if the instruction length is set small in order to reduce the code size, there arises a problem that the restriction on the instruction function becomes large.
[0006]
(2) When instruction code is in arbitrary length instruction format
In this case, since an instruction format having two or more arbitrary instruction lengths is used, there is an advantage that the instruction function can be expanded according to each instruction. In addition, since the instruction length of an instruction having no operand can be set short, there is an advantage that the code size can be reduced as compared with the case of the fixed-length instruction format.
[0007]
On the other hand, extracting each instruction code from the data read from the memory, and then decoding each instruction code, becomes more complicated, so the instruction decoding method inevitably becomes complex. As a result, the H/W (hardware) needed to extract instruction codes from the memory contents and send them to the instruction decoder grows. For example, when the 16-bit and 32-bit instruction formats 105 and 104 shown in FIG. 20 are introduced as an arbitrary-length instruction format, four paths must be provided between the instruction fetch unit and the instruction decoder to transfer instruction codes, as shown in FIG. 21. To achieve this, the instruction decoder must be given a complicated shift function so that valid instruction codes are shifted and decoded in the proper arrangement.
[0008]
As described above, the fixed-length instruction format (1) and the arbitrary-length instruction format (2) each have advantages and disadvantages. A processor that combines the advantages of both (1) and (2) is therefore desired.
[0009]
The present invention has been made to meet this need. Specifically, its main object is to realize a processor, and an input device for the processor, having an instruction format that (i) reduces code size compared with the fixed-length instruction format and (ii) reduces the amount of processor H/W, and thereby increases speed, compared with the conventional arbitrary-length instruction format.
[0010]
Another object of the present invention is to realize a processor having functions of interrupts (external interrupts, PC break interrupts) generated during instruction execution and software interrupts.
[0011]
Another object of the present invention is to embody a configuration of a processor for decoding such an instruction code.
[0012]
[Means for Solving the Problems]
A data processing apparatus according to a first aspect of the present invention executes two types of instruction codes consisting only of a first instruction data signal that gives an N-bit instruction (N is an integer of 1 or more) and a second instruction data signal that gives a 2N-bit instruction. The apparatus comprises instruction code input means, in which the first and second instruction codes are arranged under the rules that (1) two of the first instruction data signals are stored within the boundary of a 2N-bit word and (2) each of the second instruction data signals is stored within the boundary of a 2N-bit word, and instruction fetch means for fetching the instruction codes arranged in the instruction code input means. Each of the first and second instruction data signals includes, at a predetermined bit position, instruction length identifier data that gives control information for the instruction execution order.
[0018]
DETAILED DESCRIPTION OF THE INVENTION
(Embodiment 1)
Here, an example where N = 16 will be described with reference to the drawings.
[0019]
FIG. 1 is a block diagram showing a configuration of a data processing apparatus that executes a plurality of operations according to the present invention. As shown in FIG. 1, the apparatus is configured with a processor 10 as a core.
[0020]
The processor 10 is made up of an arithmetic unit 1, a register 2, a program counter unit (hereinafter referred to as the PC unit) 3, an address generator 4, an instruction queue unit 5, an instruction fetch unit 6, an instruction decode unit 7, and a control unit 8. Among these, the control unit 8 controls the operations of the units 1 to 7 in the processor 10 in accordance with the decoding result output from the instruction decode unit 7. To simplify the illustration, the control signals output from the control unit 8 are omitted from the figure. The arithmetic unit 1 includes an ALU (arithmetic logic unit), a shifter, a load-store unit, and a multiplier (Mul.). The register 2 is a general-purpose register, has 16 signal lines with a 32-bit width, and holds operation result data and address data. Details of the other components will be described later.
[0021]
The instruction queue unit 5 and the instruction fetch unit 6 are collectively referred to as “instruction fetch function unit”.
[0022]
Further, the processor 10, the external peripheral circuit 11, the memory 12, and the data selector 14 are connected by a data bus 15 and a bus interface unit 13. The data bus 15 corresponds to “input means” for inputting an instruction code data signal for giving an instruction code to the processor 10.
[0023]
I. Instruction code layout (instruction set)
Next, the “instruction code arrangement method” which forms the basis of the present invention will be described.
[0024]
As described above, the (1) fixed-length instruction format and the (2) arbitrary-length instruction format each have merits and demerits. In particular, when the arbitrary-length instruction format (2) uses two types of instruction codes, a 16-bit instruction code and a 32-bit instruction code, four transfer paths are required between the instruction fetch unit (or instruction queue unit) and the instruction decode unit. The factor considered to cause the serious problem of hardware complexity is that instruction code arrangements are permitted in which a 32-bit (2N-bit) instruction crosses a 32-bit (2N-bit) boundary. This point is shown schematically in FIG. 2.
[0025]
Assume now that the instruction fetch unit has fetched from the external memory the three instruction codes 106 to 109, arranged in the order shown in the upper part of the figure, and that of these the 32-bit instruction codes 107 and 108 are valid for data processing. In this case, the valid instruction codes 107 and 108 must be decoded in the arrangement shown in the lower part of the figure. It therefore becomes necessary to implement the two crossing paths 110 and 111 shown in the figure. An instruction code arrangement that leads to such paths is called here “an arrangement in which a 2N-bit instruction crosses a 2N-bit boundary”. This is why the hardware on the instruction decoder side inevitably becomes complicated. In view of this cause, such instruction code arrangements need to be prohibited.
[0026]
Therefore, in the present invention, (i) a plurality of instruction codes (instruction code data signals) consisting of only two instruction lengths, namely N-bit instructions (N ≥ 1; first instruction data signals) and 2N-bit instructions (second instruction data signals), are executed, and (ii) the instruction code arrangement method, that is, how each instruction is placed within a “word” of 2N-bit length (referred to as within a 2N-bit word boundary), is limited to the following two types only. That is,
(1) Two N-bit instructions are stored within a 2N-bit word boundary.
[0027]
(2) Store a single 2N bit long instruction within a 2N bit word boundary.
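As a rough illustration of rules (1) and (2), the sketch below lays out a sequence of instruction lengths in 2N-bit words for N = 16. The function name `pack_words` and the `"pad"` label are hypothetical; the padding slot stands in for the alignment NOP that the description introduces in paragraph [0045], and the greedy in-order layout is an assumption made here, not a method stated in the patent.

```python
def pack_words(lengths, n=16):
    """Lay out instructions of n or 2n bits in 2n-bit words.

    Returns a list of words; each word is a list of slot labels.
    """
    words = []
    current = []  # slots already placed in the word being filled
    for i, length in enumerate(lengths):
        if length == n:
            current.append(f"i{i}:{n}")
            if len(current) == 2:       # rule (1): two N-bit instructions per word
                words.append(current)
                current = []
        elif length == 2 * n:
            if current:                 # half-filled word: pad it so the 2N-bit
                current.append("pad")   # instruction does not cross a boundary
                words.append(current)
                current = []
            words.append([f"i{i}:{2 * n}"])  # rule (2): one 2N-bit instruction per word
        else:
            raise ValueError(f"unsupported instruction length: {length}")
    if current:                         # trailing lone N-bit instruction
        current.append("pad")
        words.append(current)
    return words
```

Every word produced this way holds either two 16-bit slots or one 32-bit instruction, so no 32-bit instruction can cross a 32-bit boundary.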
[0028]
In this embodiment, since the case of N = 16 is handled, there are only two types of instructions, 16-bit instructions and 32-bit instructions (instructions of the maximum bit length), and the instruction code arrangement method is as shown in FIG. 3.
[0029]
With this arrangement restriction, instruction code arrangements in which a 32-bit instruction crosses a 32-bit boundary are completely prohibited. As a result, the transfer paths between the instruction fetch unit 6 and the instruction decode unit 7 of FIG. 1 are reduced to three types, as shown in FIG. 4. These three types of transfer paths can be understood from the schematic explanatory diagram of FIG. 5.
[0030]
As shown in the figure, three types of instruction code arrangement are possible in the state held in the instruction queue unit 5. Those shown in the upper and middle parts of the figure are the cases where only the preceding 16-bit instruction code A1 and only the succeeding 16-bit instruction code B2 are valid, respectively; both are arranged under the restriction of arrangement method (1). In the case of the upper part of the figure, the instruction code A1 is transferred to the decoder via the path RT1 shown in FIG. 4. In the case of the middle part of FIG. 5, the instruction code B2 is transferred to the decoder via the crossing path RT3 shown in FIG. 4. On the other hand, the 32-bit instruction code C1 shown in the lower part of FIG. 5 is arranged under the restriction of arrangement method (2), and is transferred to the decoder via the paths RT1 and RT2 shown in FIG. 4. Therefore, only the three types of paths RT1 to RT3 are required for decoding.
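The path counts (four conventionally, three under the arrangement restriction) can be reproduced by enumerating which half of a queue word may feed which half of the decoder input. The sketch assumes that a 16-bit instruction is always decoded in the upper half of the decoder input, which is what the crossing path RT3 implies; the function name `required_paths` is hypothetical.

```python
def required_paths(allow_straddle):
    """Return the set of (queue half, decoder half) routes that can occur."""
    paths = set()
    # A 16-bit instruction goes to the upper decoder half from either
    # half of the queue word.
    paths.add(("upper", "upper"))
    paths.add(("lower", "upper"))
    # An aligned 32-bit instruction passes straight through.
    paths.add(("upper", "upper"))
    paths.add(("lower", "lower"))
    if allow_straddle:
        # A 32-bit instruction that crosses the boundary starts in the
        # lower half of one word and ends in the upper half of the next.
        paths.add(("lower", "upper"))
        paths.add(("upper", "lower"))
    return paths
```

Under these assumptions the restricted case leaves upper-to-upper, lower-to-lower and lower-to-upper, corresponding to RT1, RT2 and RT3, while allowing boundary-crossing instructions adds the upper-to-lower route.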
[0031]
As described above, since the instruction code arrangement is subject to restrictions (1) and (2), the number of transfer paths can be reduced by one compared with the conventional case. The instruction code data signal that gives instruction codes arranged under restrictions (1) and (2) is realized as an input data signal on the data bus 15 of FIG. 1 and is held in the instruction queue unit 5. For this purpose, the instruction code data is written into the memory 12 by program control based on constraint rules (1) and (2). Accordingly, the “memory 12”, which stores the instruction code data signals written in the arrangement based on constraint rules (1) and (2), and the “data bus 15”, which is the means for inputting into the processor 10 the instruction code data signals read from the memory 12 in that arrangement order, are collectively referred to as the “processor instruction code input device”.
[0032]
Of the instruction code data signals, those that give a 16-bit (generally N-bit) length instruction code arranged based on the restriction (1) are referred to as “first instruction code data signals”, and those that give a 32-bit (generally 2N-bit) length instruction code arranged based on the restriction (2) are referred to as “second instruction code data signals”.
[0033]
The format of the instruction code data signal in the instruction queue unit 5 of the processor 10 is shown in FIG. In the figure, symbols op1 and op2 represent operand codes, symbols R1 and R2 represent registers, symbol C represents a constant, and symbol cond represents branch condition designation.
[0034]
Instead of writing the instruction code data in the memory 12 under the restrictions (1) and (2), the instruction code data may be written in the memory 12 without regard to the restrictions. In that case, a function unit that rearranges the instruction code data signal read from the memory 12 under the restrictions (1) and (2) may be newly provided on the data bus 15, and the output data of that function unit may be stored in the instruction queue unit 5.
[0035]
II. Branch destination control
Further, in the processor 10, the branch destination address is limited to 32-bit boundaries only. Thus, in addition to prohibiting the placement of an instruction code across a 32-bit boundary, the branch destination address is limited to 32-bit boundaries, so that the transfer paths between the instruction fetch unit 6 and the instruction decoder are reduced to two types.
[0036]
This point can be easily understood by referring back to FIG. In other words, the restriction that the branch destination address can be specified only at a 32-bit boundary means that the instruction decoding unit 7 in FIG. 1, after receiving the instruction code data signal that is arranged within the 32-bit word boundary and output from the instruction fetch unit 6, always starts decoding that signal from its 32-bit (2N-bit) boundary. Therefore, when two 16-bit instruction codes are arranged within a 32-bit word boundary, the preceding instruction code A2 is decoded first, and the instruction code B2 in the middle stage of the figure need only be decoded after being shifted to the bit position that the preceding instruction code A2 occupied.
[0037]
Therefore, there are only two types of instruction code data signal transfer paths from the instruction queue unit 5 to the instruction decoding unit 7, as schematically shown in FIG. That is, the transfer paths are:
(A) Instruction queue part: upper 16 bits → instruction decode part: upper 16 bits
[0038]
(B) Instruction queue part: lower 16 bits → instruction decode part: lower 16 bits.
[0039]
The jump destination of a branch instruction is specified in the following format. This point will be described based on an instruction code table in the processor 10 shown in FIGS. In the above instruction code table, “dest” in “Format” indicates the register number of the result storage destination, and “src” is the object of calculation, and here means register R1 shown in FIG. The numerical value in the register R1 is the memory address value. “Pcdisp8” indicates that the immediate value is given by 8 bits.
[0040]
In the above instruction code table,
(a) For the JMP and JL instructions, the value of the register specified in the instruction code becomes the branch destination address. (However, the value of the lower 2 bits of the register is ignored and treated as “0”.)
(b) The BRA, BL, BC, and BNC instructions specify an 8-bit or 24-bit immediate value.
(c) The BEQ, BNE, BEQZ, BNEZ, BLTZ, BGEZ, BLEZ, and BGTZ instructions specify a 16-bit immediate value.
The branch destination address in (b) and (c) above is given by:
(PC value of branch instruction) + (value obtained by shifting the sign-extended immediate value to the left (toward the most significant bit) by 2 bits)
However, in this addition, the lower 2 bits of the PC value are treated as “00”.
[0041]
As described above, since the branch destination is designated only on a 32-bit boundary, the lower 2 bits of the branch destination address are always “00”. Therefore, when the branch destination address is specified in the instruction code, it is not necessary to specify the lower 2 bits. As a result, the range that can be branched to directly from the instruction code becomes 2² times, that is, 4 times as wide. In the present embodiment, as shown in the above instruction code table, the address designation part in the instruction code is 24 bits at the maximum, so it is possible to branch directly within a range of ±32 MBytes from the address of the instruction being executed.
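The address arithmetic described in paragraphs [0040] and [0041] can be sketched as follows. This is only an illustrative model, not the circuit of the embodiment: the function names `sign_extend`, `branch_target`, and `reach_bytes` are hypothetical, and wrapping the result to 32 bits is an assumption.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def branch_target(pc, imm, imm_bits):
    """(PC with lower 2 bits treated as "00") + (sign-extended immediate << 2)."""
    base = pc & ~0x3                      # lower 2 bits of the PC are "00" in the addition
    offset = sign_extend(imm, imm_bits) << 2
    return (base + offset) & 0xFFFFFFFF   # assumption: the address wraps at 32 bits


def reach_bytes(imm_bits):
    """Smallest and largest byte offsets reachable with an imm_bits-wide immediate."""
    lowest = -(1 << (imm_bits - 1)) << 2
    highest = ((1 << (imm_bits - 1)) - 1) << 2
    return lowest, highest
```

With a 24-bit immediate, `reach_bytes(24)` spans from -32 MBytes to just under +32 MBytes, which matches the range stated above.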
[0042]
III. Control instruction decoding order
Further, in the processor 10, information giving an instruction format identifier is provided in the instruction code as a predetermined number of bits, here 1 bit. The instruction decoding and execution order is controlled based on the value of the instruction format identifier. Here, the MSB (Most Significant Bit) of each instruction code corresponds to the instruction format identifier.
[0043]
The rule of control using the instruction format identifier is set as follows. That is, the MSB of a single 32-bit length instruction code is always set to 1. On the other hand, when the instruction code is composed of two 16-bit instruction codes, the MSB of the instruction code existing on the upper 16 bits side is always 0. Subsequent lower 16-bit instruction codes are processed differently depending on the MSB value.
[0044]
A method for controlling the execution order of instructions will be described with reference to FIG. In the figure, (1) When the MSB of “instruction B” is 0, “instruction A” and “instruction B” are executed successively.
[0045]
On the other hand, (2) when the MSB of “instruction B” is 1, only “instruction A” is executed, and “instruction B” is not executed. That is, when a “NOP instruction” is inserted as instruction B for word alignment adjustment based on the arrangement restriction on 32-bit length instructions, the assembler automatically sets the instruction code corresponding to that “NOP instruction”, so that only “instruction A” is executed. The normal “NOP instruction”, an invalid operation, is given as “0111000000000000”; however, a “NOP instruction” inserted into the lower 16-bit position within the 32-bit word boundary for the above alignment purpose does not actually need to be executed. Therefore, in the processor 10, this “NOP instruction” is given as “1111000000000000”, and as a result the “NOP instruction” itself is not executed.
[0046]
By controlling the instruction execution order as described above, there is an advantage that there is no execution time penalty for the “NOP instruction” which is an invalid operation inserted to satisfy the code arrangement.
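How an assembler might lay out code under these rules can be sketched as follows. This is a minimal illustration under stated assumptions: the function `pack` and its input format are hypothetical, forcing the MSB of each code is a simplification of the encoding, and padding a trailing unpaired 16-bit instruction is an assumption the description does not spell out.

```python
NOP_EXECUTED = 0x7000  # "0111000000000000": ordinary NOP, executed as an invalid operation
NOP_PADDING = 0xF000   # "1111000000000000": alignment NOP, MSB = 1, never executed


def pack(instructions):
    """Pack (bits, code) pairs, bits being 16 or 32, into a list of 32-bit words."""
    words = []
    pending = None  # a 16-bit code waiting in the upper half of the current word
    for bits, code in instructions:
        if bits == 16:
            if pending is None:
                pending = code & 0x7FFF  # upper-half 16-bit code: MSB is 0
            else:
                # second 16-bit code, executed after the first: MSB is 0
                words.append((pending << 16) | (code & 0x7FFF))
                pending = None
        elif bits == 32:
            if pending is not None:
                # a 32-bit code must not cross the boundary: pad the lower half
                words.append((pending << 16) | NOP_PADDING)
                pending = None
            words.append((code | 0x80000000) & 0xFFFFFFFF)  # 32-bit code: MSB is 1
        else:
            raise ValueError("instruction length must be 16 or 32 bits")
    if pending is not None:
        words.append((pending << 16) | NOP_PADDING)  # assumption: pad the final word
    return words
```

For example, a 16-bit instruction followed by a 32-bit instruction yields one word whose lower half is the padding NOP, followed by the 32-bit instruction aligned on the next word boundary.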
[0047]
V. Instruction decoding method and its control unit
Hereinafter, a specific example of the instruction decoding control method will be described with reference to FIG.
[0048]
(Configuration of instruction decode unit 7)
The instruction decode unit 7 includes an instruction decode input latch 7a, an instruction decoder 7c, a constant generator 7d, and a control logic 7b of the instruction decode unit 7. Among these, the control logic 7b controls each part of the instruction decoding unit 7. The instruction decoder 7c receives, as its input, the valid instruction code within the 32-bit bit pattern stored in the instruction decode input latch 7a and decodes it. The instruction code arrangement in the instruction decode input latch 7a is the same as the instruction code arrangement on the memory 12. The decoding result is output to the control unit 8 of the processor 10, and the control unit 8 controls the arithmetic unit 1 and the entire processor 10 based on the decoding result.
[0049]
The detailed operation of each part is as follows.
[0050]
(Instruction fetch unit 6)
First, the instruction fetch unit 6 fetches an instruction code data signal from the memory 12 in units of 32 bits through the data bus 15 (FIG. 1), and stores them in the instruction queue unit 5. Further, the instruction fetch unit 6 sequentially reads out the instruction code data signal from the instruction queue unit 5 and transfers it to the instruction decoding unit 7 in accordance with a fifth control signal CB described later. As a result, the transferred instruction code data signal is stored in the 32-bit instruction decode input latch 7a.
[0051]
Among the instruction code data signals stored in the instruction decode input latch 7a, the decoding of a signal that gives a valid instruction code is controlled by the control logic 7b based on the instruction format identifier described above. This will be described in detail below.
[0052]
(Control logic 7b)
FIG. 13 shows the relationship between the first to third control signals CT1 to CT3 (also referred to as input signals) in FIG. 12 and the valid code arrangement in the instruction decode input latch 7a. The hatched portion in the figure indicates an instruction format identifier, and the stippled portion indicates a valid code.
[0053]
As shown in FIG. 12, the control logic 7b receives as inputs the first and second control signals CT1 and CT2, which give the instruction length identifiers of the 32-bit pattern stored in the instruction decode input latch 7a, and the third control signal CT3, which the control logic 7b itself outputs. That is, the instruction decode input latch 7a outputs the value of the 0th bit and the value of the 16th bit of the 32-bit bit pattern it holds, each serving as an instruction format identifier, to the control logic 7b as the first control signal CT1 and the second control signal CT2, respectively. The third control signal CT3 is an output signal from the control logic 7b itself. In response to the decode end signal ES output from the instruction decoder 7c, it indicates whether the part from the 0th bit to the 15th bit of the 32-bit bit pattern stored in the latch 7a is currently being decoded (in this case “0”) or the part from the 16th bit to the 31st bit is currently being decoded (in this case “1”).
[0054]
(α) When the value of the first control signal CT1 is “1”, the instruction code data signal stored in the instruction decode input latch 7a gives a 32-bit instruction.
[0055]
(β) When the first control signal CT1 is “0”, an instruction code data signal composed of two 16-bit instructions is stored in the instruction decode input latch 7a. At this time, if the second control signal CT2 is “0” and the third control signal CT3 is “0”, the valid instruction code is on the upper 16 bits side. On the other hand, if the second control signal CT2 is “0” and the third control signal CT3 is “1”, the valid instruction code is on the lower 16 bits side.
[0056]
(γ) When the first control signal CT1 is “0” and the second control signal CT2 is “1”, only the upper 16-bit instruction code is a valid code; the lower 16-bit instruction code is the above-described “NOP instruction” for word alignment adjustment and is not executed.
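The three cases (α), (β), and (γ) amount to a small truth table, which can be sketched as below. The function name `valid_code` and the returned labels are hypothetical and serve only to illustrate the case analysis.

```python
def valid_code(ct1, ct2, ct3):
    """Return which part of the 32-bit latch contents holds the valid code.

    ct1: bit 0 of the latch (MSB of the word / upper 16-bit code)
    ct2: bit 16 of the latch (MSB of the lower 16-bit code)
    ct3: 0 while the upper half is being decoded, 1 while the lower half is
    """
    if ct1 == 1:
        return "32-bit"        # case (alpha): a single 32-bit instruction
    if ct2 == 1:
        return "upper16-only"  # case (gamma): lower half is the alignment NOP
    # case (beta): two 16-bit instructions executed one after the other
    return "upper16" if ct3 == 0 else "lower16"
```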
[0057]
Based on the values of the input signals CT1 to CT3, the control logic 7b outputs the following three types of control signals CA, CB, and CT3 for decoding the next code. Among them, the fourth control signal CA is a control signal for shifting the lower 16-bit instruction code to the upper 16-bit position in the instruction decode input latch 7a, that is, a shifter control signal. The fifth control signal CB is a pointer to the 32-bit boundary of the instruction address, and is a signal that instructs the instruction fetch unit 6 to start transferring the instruction code to the instruction decode input latch 7a. As described above, the third control signal CT3 is a signal indicating whether the next valid code is in the upper 16-bit position or the lower 16-bit position within the 32-bit word boundary.
[0058]
Further, as shown in FIG. 12, the control logic 7b outputs the sixth control signal CC to the PC bit 30 (PC 30) described later: “0” when the address of the instruction code being decoded in the instruction decoder 7c is on a 32-bit word boundary, and “1” when it is not on the boundary.
[0059]
When decoding a branch instruction that designates the branch destination address by an immediate value in the instruction code, that is, the above-mentioned (b) BRA instruction or (c) BEQ instruction, the latch 7a outputs the immediate value to the constant generator 7d, which is connected to the latter 24-bit positions (from the 8th bit to the 31st bit) of the instruction decode input latch 7a. The constant generator 7d outputs the value obtained by sign-extending the immediate value to the bus S2.
[0060]
When a branch occurs during the execution of an instruction, the fourth control signal CA, the fifth control signal CB, the sixth control signal CC, and the third control signal CT3 of the instruction decoding unit 7 are all initialized, regardless of the values of the first to third control signals (inputs) CT1 to CT3.
[0061]
The relationship between the control signal (input) and the control signal (output) is shown in FIG.
[0062]
As described above, in any case, in the instruction decode unit 7 (FIGS. 1 and 12), the instruction code to be decoded for the first time always starts from the head of the instruction decode input latch 7a. Therefore, the contents of instruction decode input latch 7a are transferred to instruction decoder 7c as they are. When the instruction code data signal is a single 32-bit instruction code, the transfer path is paths P2 and P3 shown in FIG. The transfer path for the upper 16-bit instruction code is path P2.
[0063]
Only when the instruction code arrangement in the instruction decode input latch 7a calls for sequential execution of two 16-bit instructions is an instruction code output to the instruction decoder 7c a second time. In this case, the following processing is performed before the second output to the instruction decoder 7c. Since the instruction code to be decoded, that is, the valid code, is in the lower 16-bit position within the 32-bit word boundary, the contents of the instruction decode input latch 7a are shifted to the left (toward the 0th bit of the 32-bit bit pattern) by 16 bits (path P1) and re-entered into the input latch 7a. The result is then output to the instruction decoder 7c for the second time (path P2).
[0064]
In this way, the path P1 is used only when the valid code is in the lower 16-bit position of the instruction decode input latch 7a, so the amount of H/W can be reduced and the speed can be further increased.
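The decode sequence for one 32-bit latch load, including the 16-bit left shift along path P1, can be modelled roughly as follows. This is a behavioural sketch only: `decode_word` is a hypothetical name, and the latch is modelled as a plain integer whose most significant bit corresponds to the 0th bit in the numbering used above.

```python
def decode_word(word):
    """Return the (width, code) pairs decoded from one 32-bit latch load, in order."""
    latch = word & 0xFFFFFFFF
    ct1 = (latch >> 31) & 1  # 0th bit of the latch (most significant bit)
    ct2 = (latch >> 15) & 1  # 16th bit of the latch
    if ct1 == 1:
        return [(32, latch)]               # one 32-bit instruction
    decoded = [(16, latch >> 16)]          # first output: the upper 16-bit code
    if ct2 == 0:
        latch = (latch << 16) & 0xFFFFFFFF  # path P1: shift the lower code upward
        decoded.append((16, latch >> 16))   # second output from the upper position
    return decoded                          # ct2 == 1: alignment NOP is skipped
```

Because the decoder always reads from the head of the latch, only the shift along path P1 is needed to reach the lower 16-bit code.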
[0065]
VI. PC unit 3 increment operation
FIG. 15 is an enlarged block diagram of the PC unit 3 and the address generator 4 in FIG. The increment operation of the PC unit 3 will be described with reference to FIG.
[0066]
The address generator 4 includes a shifter 4a and an adder 4b, and calculates the address of the branch instruction according to the addressing mode.
[0067]
The PC unit 3 has a program counter (hereinafter referred to as PC) as its core, and further includes a comparator 3b, a (+1/0) unit 3e, a backup program counter (hereinafter referred to as BPC) 3f, a BPC 30 unit 3g, and a program counter break pointer (hereinafter referred to as PBP) 3a. Among these, the PBP 3a and the BPC 3f are control registers.
[0068]
The above-mentioned program counter is a 32-bit counter and holds the address value of the instruction currently being executed. Since the instruction code arrangement method is limited as described above, an instruction of the processor 10 (FIG. 1) starts only from an even address value. Therefore, the value of the 31st bit of the program counter is fixed to “0” as illustrated in FIG. Consequently, the value of the 31st bit of the PC need not be realized in hardware, and the PC is realized, as shown in FIG. 15, by the PC (0:29) 3c, which gives the values of the 0th to 29th bits, and the PC 30 unit 3d, which gives the value of the 30th bit. Of these, the PC 30 unit 3d is a register that holds the value “0” or “1” of the sixth control signal CC.
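The split of the 32-bit program counter into PC (0:29) and PC 30 can be illustrated with the sketch below. The helper names `pc_address` and `split_address` are hypothetical; the bit numbering follows the description, with the 0th bit as the most significant bit and the 31st bit as the least significant bit.

```python
def pc_address(pc_0_29, pc30):
    """Rebuild the 32-bit byte address from PC (0:29) and PC 30; bit 31 is fixed to 0."""
    return ((pc_0_29 & 0x3FFFFFFF) << 2) | ((pc30 & 1) << 1)


def split_address(address):
    """Split an even 32-bit byte address into (PC (0:29), PC 30)."""
    if address & 1:
        raise ValueError("instructions start only at even addresses")
    return (address >> 2) & 0x3FFFFFFF, (address >> 1) & 1
```

Here PC (0:29) selects the 32-bit word, and PC 30 selects the upper (“0”) or lower (“1”) 16-bit half within it.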
[0069]
The BPC saves the PC value held by the PC 3c when an interrupt / trap described later occurs. Here, as illustrated in FIG. 23, the value of the 31st bit of the BPC is always fixed to “0”, so in hardware the BPC is realized by the BPC (0:29) 3f, which gives the values of the 0th to 29th bits, and the BPC 30 unit 3g, which gives the value of the 30th bit. Therefore, the processor 10 (FIG. 1) saves the value of the PC (0:29) 3c to the BPC (0:29) 3f when it detects the occurrence of an interrupt / trap described later. On the other hand, the PBP 3a is a 32-bit wide control register for controlling a PC break interrupt, which will be described later. The PBP 3a holds in advance the instruction execution address value that activates the interrupt; this address value is written from the control unit 8 (FIG. 1) using a write command signal on the output bus D1.
[0070]
The update of the PC (0:29) 3c is performed as shown below when an instruction other than a branch is executed in the processor 10 (FIG. 1). Here, four buses S1, S2, S3 and D1 are used, and the signal line D is connected to the PC 3c.
[0071]
When the fifth control signal CB (the pointer indicating the address of the 32-bit word boundary) output from the control logic 7b of the instruction decoding unit 7 in FIG. 12 is updated, the value held by the PC (0:29) 3c is increased by “+1” by the (+1/0) unit 3e, and the increased value is placed on the signal line D. On the other hand, when the fifth control signal CB is not updated, the value of the PC (0:29) 3c is placed on the signal line D as it is, without being increased by the (+1/0) unit 3e.
[0072]
The value on the signal line D is written to the PC (0:29) 3c, whereby the value of the PC (0:29) 3c is rewritten to that of the next instruction. Since the signal line D is connected to the bus S1, the updated value of the PC (0:29) 3c can be read out onto the bus S1.
[0073]
The value of the sixth control signal CC output from the control logic 7b (FIG. 1) in the instruction decoding unit 7 is written in the PC 30 unit 3d when the instruction is executed.
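The update for non-branch instructions described in paragraphs [0071] to [0073] can be sketched as a single step function. This is an assumption-laden model: `update_pc` is a hypothetical name, the inputs abstract away all timing, and the wrap-around at 30 bits is an assumption.

```python
def update_pc(pc_0_29, cb_updated, cc):
    """Return the next (PC (0:29), PC 30) pair for a non-branch instruction.

    cb_updated: True when the fifth control signal CB has been updated
    cc:         value of the sixth control signal CC written into PC 30
    """
    increment = 1 if cb_updated else 0          # the (+1/0) unit 3e
    next_word = (pc_0_29 + increment) & 0x3FFFFFFF
    return next_word, cc & 1
```

For two 16-bit instructions in the same word, the word part stays unchanged between them and only PC 30 changes; it advances once the next word is transferred.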
[0074]
On the other hand, if a branch occurs during instruction execution, the PC value of the next instruction is generated using the address generator 4, and the generated value is written to the PC (0:29) 3c via the bus S3.
[0075]
Further, in the case of a branch instruction in which the jump address is designated by an immediate value (the branch instructions (b) and (c) described above), the constant generator 7d (FIG. 1) of the instruction decode unit 7 sign-extends the immediate value extracted from the instruction code to 32 bits and outputs the sign-extended immediate value to the address generator 4 via the bus S2. The sign-extended immediate value is then shifted by 2 bits to the left (toward the most significant bit) by the shifter 4a in the address generator 4. The updated value of the PC (0:29) 3c is output from the signal line D to the address generator 4 via the bus S1. Here, the updated value of the PC (0:29) 3c corresponds to the upper 30 bits of the PC of the branch instruction. The adder 4b computes
(Update value of PC (0:29) 3c) + (value obtained by shifting the sign-extended immediate value to the left by 2 bits)
and thereby obtains the upper 30 bits of the branch destination address. The value of the upper 30 bits is written to the PC (0:29) 3c via the bus S3.
[0076]
In the case of a branch instruction in which the jump destination is specified by the value of the register 2 (FIG. 1) (the above branch instruction (a)), the value held in the register 2 specified by the instruction code is received via the bus S1, and that value is written to the PC 3c.
[0077]
When a branch occurs, the instruction decode unit 7 initializes the sixth control signal CC to “0”, so that “0” is written in the PC 30 unit 3d. Therefore, in the processor 10 (FIG. 1), as described above, the branch destination address is only a 32-bit word boundary.
[0078]
VII. Explanation of interrupt / trap operation
When a certain event occurs while the processor 10 (FIG. 1) is executing a normal program, it is necessary to interrupt the execution of the program and execute another program. Such events are roughly classified into interrupt and trap operations.
[0079]
(a) Interrupt
Among the above events, an event generated by an external hardware signal (referred to as an interrupt request signal) or a PC break signal generated when a specific address is executed.
[0080]
(b) Trap
Among the above events, events issued by instructions.
[0081]
The processor 10 (FIG. 1) also has a function of realizing two types of (a) interrupts including an external interrupt (EI) and a PC break interrupt (PBI) and one type of trap (TRAP).
[0082]
The processing procedure when an interrupt / trap occurs will be described with reference to FIG.
[0083]
When the interrupt request signal or the PC break signal becomes valid, the processor 10 accepts the interrupt only at a 32-bit word boundary, as will be described in detail later. For a trap instruction, trap processing starts after execution of the instruction.
[0084]
As a result, the processor 10 (FIG. 1) interrupts execution of the program and performs interrupt or trap processing. At that time, as will be described later, the processor 10 detects the interrupt or trap event and saves the PC value of the PC 3c in FIG. 15 to the BPC 3f. Thereafter, it branches to the “interrupt / trap processing handler”, which is the processing program corresponding to each interrupt or trap.
[0085]
When the processing in the “interrupt / trap processing handler” is completed, a return instruction from the “interrupt / trap processing handler” is executed, the PC value of the PC 3c is then restored, and the processor 10 returns from the interrupt / trap processing.
[0086]
As described above, the interrupt / trap processing in the processor 10 includes a part processed by hardware and a part processed by a program. That is, in the processor 10, among the above processes,
(1) saving the PC value that is the return destination to the BPC 3f,
(2) branching to the “interrupt / trap processing handler”, and
(3) writing the BPC value to the PC 3c
are executed by the hardware part.
[0087]
The aforementioned external interrupt (EI) is generated by an external hardware signal (interrupt request signal). An interrupt request by an interrupt request signal is accepted only on a 32-bit word boundary (this mechanism depends on the configuration of a detection circuit described later). The value saved in the BPC 3f in FIG. 15 when an interrupt occurs is the PC value of the next instruction.
[0088]
On the other hand, a PC break interrupt (PBI) occurs when a specific address is executed. In the processor 10 (FIG. 1), the designated address is only a 32-bit word boundary. For each cycle, the comparator 3b of FIG. 15 compares the value held by the PBP 3a with the value of the PC 3c, and outputs a PC break signal 18 when the two values match. The PC break signal 18 is detected as an interrupt / trap detection signal via an interrupt / trap detection circuit described later, and the signal is output to the (+1/0) section 3e in FIG. As a result, an interrupt occurs in the processor 10 and the PC value is saved. As described above, writing to the PBP 3a is performed using the data bus D1. The value saved in the BPC 3f when an interrupt occurs is the PC value of the next instruction.
[0089]
A trap is an interrupt controlled by software, and is generated by execution of a trap instruction. In this case, information indicating whether the trap instruction is on the upper 16 bits side or the lower 16 bits side within the 32-bit word boundary is stored in the BPC bit 30, that is, the BPC 30 unit 3g. When the trap instruction is in the upper 16 bits, the value of the BPC 30 unit 3g is “0”, and when the trap instruction is in the lower 16 bits, the value of the BPC 30 unit 3g is “1”. The value saved in the BPC when a trap occurs is (PC value of trap instruction + 4).
[0090]
A circuit for detecting an interrupt / trap is shown in FIG. The detection circuit constitutes a part of the control unit 8 of FIG. 1, and includes an inverter 22, AND circuits 23 and 24, and an interrupt / trap detection circuit 16.
[0091]
By means of this detection circuit, the processor 10 accepts both the interrupt request by the external signal 19 and the interrupt request by the PC break signal 18 only at a 32-bit word boundary. That is, since the level of the sixth control signal CC is “0” at a 32-bit word boundary, the output signal VE becomes valid (“1”) only when the word boundary detection signal 21 obtained by inverting the sixth control signal CC (its level being “1”) and the interrupt request signal 19 are simultaneously valid (“1”). Likewise, the output signal VF becomes valid only when the word boundary detection signal 21 and the PC break signal 18 are simultaneously valid (“1”). In this way, the above interrupt requests are accepted only at a 32-bit word boundary. After branching to the interrupt / trap processing handler and executing the processing, the processor returns from the interrupt / trap processing to the program under execution by executing the return instruction.
[0092]
On the other hand, in the trap instruction, when an interrupt by the trap instruction is instructed, the trap request signal 20 becomes valid (“1”).
[0093]
When any one of the trap request signal 20, the output signal VE, and the output signal VF becomes valid (“1”), the interrupt / trap detection circuit 16 outputs the interrupt / trap detection signal 17. When the interrupt / trap detection signal 17 is output, the process branches to an interrupt / trap processing handler, performs the processing, executes a return instruction, and returns from the processing to the execution program.
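The gating performed by the detection circuit can be written as a small Boolean sketch. The function name `interrupt_trap_detect` and its parameter names are hypothetical; signals are modelled as the integers 0 and 1, and the circuit is treated as purely combinational.

```python
def interrupt_trap_detect(cc, interrupt_request, pc_break, trap_request):
    """Return the interrupt / trap detection signal 17 (1 when an event is accepted)."""
    word_boundary = 1 - (cc & 1)            # inverter 22: signal 21 is 1 on a word boundary
    ve = word_boundary & interrupt_request  # external interrupt, accepted only on a boundary
    vf = word_boundary & pc_break           # PC break interrupt, accepted only on a boundary
    return (trap_request | ve | vf) & 1     # a trap request is not gated by the boundary
```

An external interrupt request raised while the lower 16-bit half is being executed (CC = 1) is therefore not accepted until the next word boundary, whereas a trap request passes through in either case.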
[0094]
As described above, when an interrupt / trap occurs, the value of PC3c (FIG. 15) is saved in BPC3f. When an interrupt occurs, the PC value of the next instruction is written into BPC3f.
[0095]
For example, when an interrupt occurs immediately after execution of a branch instruction, the PC value that gives the branch destination generated by the address generator 4 in FIG. 15 is written to the BPC 3f via the bus S3 under the control of the control unit 8. On the other hand, when an interrupt occurs immediately after execution of an instruction other than a branch instruction, the (+1/0) unit 3e adds “+1” to the output value of the PC (0:29) 3c, and the result is written into the BPC (0:29) 3f.
[0096]
The value of the PC 30 part 3d is written in the BPC 30 part 3g. In the present processor 10 (FIG. 1), as described above, an interrupt is accepted only at a 32-bit word boundary. Therefore, the value saved in the BPC 30 unit 3g when an interrupt occurs is always “0”.
[0097]
When a trap occurs, the value of ((trap instruction PC) + 4) is written into the BPC 3f. That is, the value obtained by adding “+1” to the output value of the PC (0:29) 3c by the (+1/0) unit 3e is written into the BPC (0:29) 3f, and the value of the PC 30 unit 3d is written into the BPC 30 unit 3g. In other words, if the trap instruction that generated the trap is in the upper 16 bits of the 32-bit word boundary, “0” is written into the BPC 30 unit 3g, and if it is in the lower 16 bits of the 32-bit word boundary, “1” is written into the BPC 30 unit 3g.
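That this save operation yields (trap instruction PC) + 4 in both cases can be checked with the following sketch. The names `save_bpc_on_trap`, `as_address`, and `return_address` are hypothetical; the clearing of the lower 2 bits on return follows from the rule that a branch destination lies only on a 32-bit boundary.

```python
def save_bpc_on_trap(pc_0_29, pc30):
    """Return (BPC (0:29), BPC 30) saved when a trap instruction executes."""
    return (pc_0_29 + 1) & 0x3FFFFFFF, pc30 & 1  # +1 on the word part, PC 30 copied


def as_address(upper30, bit30):
    """Rebuild a 32-bit byte address; bit 31 is fixed to 0."""
    return (upper30 << 2) | (bit30 << 1)


def return_address(bpc_0_29, bpc30):
    """Address branched to by the return instruction: lower 2 bits are "00"."""
    return as_address(bpc_0_29, bpc30) & ~0x3
```

For a trap instruction in the lower half of the word at address 0x1000, the trap PC is 0x1002, the saved value is 0x1006, and the return lands on the word boundary 0x1004.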
[0098]
As described above, the return from the interrupt / trap is performed by executing the return instruction. The return instruction receives the signal 25 output from the control unit 8 and branches to an address given by the signal output from the BPC 3f. However, in this processor 10, since the branch destination address is always set only at a 32-bit boundary, when the BPC 3f returns to the PC 3c, the lower 2 bits of the PC are always “00”.
[0099]
(Summary)
By adopting the above configuration, the following feature points can be obtained.
[0100]
The processor 10 executes an instruction code having two instruction lengths of N bit length and 2N bit length, thereby reducing the code size without being restricted by the instruction function as compared with the fixed length instruction format. The instruction decoding method can be simplified as compared with the conventional arbitrary length instruction format.
[0101]
In particular, a certain restriction is imposed on the instruction code arrangement to prohibit any instruction arrangement across a 2N-bit boundary, and the branch destination address is designated only on a 2N-bit word boundary, so that the data transfer paths between the instruction fetch unit 6 and the instruction decode unit 7 can be significantly reduced. As a result, the H/W amount for instruction decoding can be reduced and the speed can be increased.
[0102]
In addition, in this processor 10, since the branch destination address is restricted to a 2N-bit boundary, the lower 2 bits of the branch destination address are always “00”. Therefore, it is not necessary to specify the lower 2 bits of the branch destination address in the instruction code. As a result, compared with specifying all bits of the address, it is possible to branch directly from the address of the instruction being executed to a range 2² times, that is, four times, as wide.
[0103]
The processor 10 having such a function can be further provided with a function capable of supporting various interrupts and trap processes.
[0104]
[Effects of the invention]
According to the invention described in claim 1, since both the first instruction data signal and the second instruction data signal are stored within the 2N-bit word boundary by the instruction code input means, an arrangement of instruction codes in which the data of a 2N-bit length instruction crosses a 2N-bit boundary is prohibited. For this reason, there is an effect that the transfer paths of the instruction code data signal from the instruction fetch unit to the instruction decode unit can be reduced from four types to three types.
[0106]
Further, according to the invention described in claim 1, it is possible to control the decoding of each instruction and the execution order thereof based on the instruction length identifier. By setting the instruction length identifier appropriately, execution of an instruction code data signal that is an invalid operation can be omitted, which has the advantage of eliminating the execution time penalty.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a data processing apparatus.
FIG. 2 is a diagram schematically showing an instruction code arrangement method;
FIG. 3 is a diagram schematically showing an instruction code arrangement method;
FIG. 4 is a diagram schematically illustrating a transfer path between an instruction fetch unit and an instruction decode unit.
FIG. 5 is a diagram showing an array of three types of instruction codes.
FIG. 6 is a diagram illustrating a format of an instruction code data signal in an instruction queue unit.
FIG. 7 is a diagram schematically illustrating a transfer path between an instruction queue unit and an instruction decoding unit.
FIG. 8 is a diagram showing a processor instruction code table;
FIG. 9 is a diagram showing a processor instruction code table;
FIG. 10 is a diagram showing a processor instruction code table;
FIG. 11 is a diagram illustrating a method for controlling the execution order of instruction codes.
FIG. 12 is a block diagram illustrating details of a configuration of an instruction decoding unit.
FIG. 13 is a diagram showing a relationship between first to third control signals and a valid code arrangement in an instruction decode input latch.
FIG. 14 is a diagram illustrating a relationship between the first to third control signals and the fourth to sixth control signals of the control logic of the instruction decode unit.
FIG. 15 is a block diagram showing details of the configuration of a PC unit and an address generator.
FIG. 16 is a diagram showing a processing procedure when an interrupt / trap occurs.
FIG. 17 is a block diagram showing a circuit for detecting an interrupt / trap.
FIG. 18 is a diagram illustrating a conventional fixed-length instruction format.
FIG. 19 is a diagram showing a conventional arbitrary length instruction format.
FIG. 20 is a diagram showing a specific example of a conventional arbitrary length instruction format.
FIG. 21 is a diagram illustrating a transfer path between a conventional instruction fetch unit and an instruction decoder unit.
FIG. 22 is a diagram illustrating an aspect in which a PC is realized on hardware.
FIG. 23 is a diagram illustrating a mode in which BPC is realized on hardware.
[Explanation of symbols]
1 arithmetic unit, 2 register, 3 PC unit, 3a program counter / break pointer, 3b comparator, 3c program counter unit (bits 0 to 29), 3d program counter unit (bit 30), 3e (+1/0) unit, 3f backup program counter (bits 0 to 29), 3g backup program counter (bit 30), 4 address generator, 5 instruction queue unit, 6 instruction fetch unit, 7 instruction decode unit, 7a instruction decode input latch, 7b control logic, 7c instruction decoder, 7d constant generator, 8 control unit, 10 processor, 11 peripheral circuit, 12 memory, 13 bus interface unit, 14 data selector, 15 data bus, 16 interrupt / trap detection circuit, 17 interrupt / trap detection signal, 18 PC break signal, 19 interrupt request signal, 20 TRAP request signal, 21 word boundary detection signal, 22 inverter, 23, 24 AND circuits, 25 output signal.

Claims

A data processing device that executes two types of instruction codes consisting only of a first instruction data signal that gives an N-bit-length instruction (N is an integer of 1 or more) and a second instruction data signal that gives a 2N-bit-length instruction, the data processing device comprising:
instruction code input means in which the first and second instruction codes are arranged under the rules that (1) two of the first instruction data signals are stored within a 2N-bit word boundary, and (2) each of the second instruction data signals is stored within a 2N-bit word boundary; and
an instruction fetch unit that fetches the instruction codes arranged in the instruction code input means,
wherein each of the first and second instruction data signals includes, at a predetermined bit position, instruction length identifier data that gives control information on the instruction execution order.