JP3753368B2

JP3753368B2 - Data processor and data processing system

Info

Publication number: JP3753368B2
Application number: JP2001027990A
Authority: JP
Inventors: 重純松井; 康之村上; 久仁彦西山; 淳木内; 雄一瀧常
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 2000-02-24
Filing date: 2001-02-05
Publication date: 2006-03-08
Anticipated expiration: 2021-02-05
Also published as: JP2001312404A

Description

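[0001]
[Technical Field of the Invention]
The present invention relates to a data processor, and in particular to instruction prefetch from an external memory. For example, it relates to a technique that is effective when applied to a data processing system that executes subroutine programs containing few branches that change the execution order of consecutive instruction addresses (in this specification, the concept of a branch also includes a jump).
[0002]
[Prior Art]
Techniques for speeding up the sequential execution of instructions by a data processor or the like include the instruction cache memory, which exploits the temporal and spatial locality of information references, and the instruction prefetch buffer.
[0003]
For example, Japanese Unexamined Patent Publication No. H6-243036 (U.S. Patent No. 5,511,178) discloses an invention in which a loop lock indicating the locality of fetched instructions is provided, and the instruction sequence inside a loop is retained in a cache memory until program control moves outside the loop.
[0004]
Japanese Unexamined Patent Publication No. H4-62637 discloses a microprocessor provided with an instruction queue (instruction prefetch buffer) that retains fetched loop instructions in a FIFO (First-In, First-Out) buffer in order to improve execution speed.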
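[0005]
[Problems to Be Solved by the Invention]
Techniques such as the above prior art, which speed up sequential instruction execution by keeping loop instructions from being evicted from an instruction cache memory or instruction prefetch buffer, are effective for processing that makes heavy use of loop instructions. However, for processing that contains almost no loop instructions and mostly executes instructions at linear, consecutive addresses one after another, a configuration that prohibits eviction of loop instructions yields little benefit for its cost. The inventors' study revealed that in such cases even an ordinary instruction cache memory can be practically meaningless.
[0006]
Specifically, the inventors examined the execution of subroutine programs such as protocol processing or system control processing in a mobile phone system. Because this protocol processing or system control processing is complex and the program is large, storing the program in the built-in ROM of the data processor is not realistic. On the other hand, the access speed of external memory is slow relative to the processing speed of the data processor, and an instruction cache memory can be adopted in the data processor to absorb the difference. However, the protocol processing or system control processing contains almost no loop instructions and mostly consists of sequentially executing instructions at linear, consecutive addresses, so adopting a cache memory cannot be expected to help much.
[0007]
The inventors therefore considered doing without a cache memory and instead using instruction prefetch, which has a comparatively simple configuration. They found that, given that loop instructions are almost absent and linear sequential execution dominates, a configuration such as prohibiting eviction of loop instructions is entirely unnecessary. They also found that, from a cost-effectiveness standpoint, the association between a prefetched instruction and its address must be made simpler than a control mechanism based on cache address tags or a read/write pointer control mechanism based on counters.
[0008]
Further study by the inventors showed that when a fixed-length burst transfer mechanism is used for instruction prefetch, instructions that are wasted when a branch instruction causes a branch are also prefetched, which results in overhead.
[0009]
It was also found that when the instruction prefetch operation is triggered by execution of a branch instruction, or by a combination of the low-order address bits and a normal instruction fetch request, then once all prefetched instructions have been executed, program execution stalls until the next instruction prefetch operation completes the instruction fetch from external memory.
[0010]
A closer look at external memory access further showed that, while instruction prefetch is effective for fetching instruction codes (instruction fetch), fetching data specified as operands (data fetch) can still require access to external memory, and in that case program execution stalls until the data fetch from external memory completes.
[0011]
The inventors therefore considered addressing these newly identified problems of instruction prefetch by refining the prefetch method. Here too, they found that, from a cost-effectiveness standpoint, the association between a prefetched instruction and its address must be kept simpler than a cache address tag mechanism or a counter-based read/write pointer mechanism.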
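[0012]
An object of the present invention is to provide a data processor that can prefetch instructions from outside with a comparatively simple configuration and thereby improve instruction execution efficiency.
[0013]
Another object of the present invention is to provide a data processing system that speeds up processing in which instructions at linear, consecutive addresses, with almost no loop instructions, are fetched from external memory and executed sequentially, using an instruction prefetch mechanism of comparatively simple configuration in the data processor.
[0014]
A further object of the present invention is to improve, at comparatively low cost, the data processing efficiency of a data processing system that executes subroutine programs with few branches that change the execution order of consecutive instruction addresses.
[0015]
The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.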
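[0016]
[Means for Solving the Problems]
A brief outline of representative aspects of the invention disclosed in this application follows.
[0017]
A data processor has instruction execution means that fetches an instruction, decodes the fetched instruction, and executes it, and a bus controller that controls external bus access based on directions from the instruction execution means. The bus controller includes a plurality of instruction buffers, a flag specific to each instruction buffer, and a buffer control circuit. The buffer control circuit assigns to each instruction buffer a unique value that the low-order plural bits of an instruction address can take. Starting from the address that follows a given instruction fetch address, it prefetches instructions into the corresponding instruction buffers in the address order given by those low-order bits. It sets the corresponding flag to the valid state in response to a prefetch of an instruction, and to the invalid state in response to output of the instruction held in the instruction buffer.
[0018]
With the above means, it is sufficient to prefetch into the instruction buffers only when the low-order bits of the instruction address take one particular predetermined value. For example, to keep prefetch control simple, when an instruction fetch occurs for an instruction address whose low-order bits hold the first value, instructions may be prefetched into the corresponding instruction buffers in address order from the following address up to the last address expressible by those low-order bits. Furthermore, to allow for a branch instruction changing the instruction address sequence, when an instruction fetch of a branch target instruction occurs, instructions may be prefetched into the corresponding instruction buffers in address order from the address following that instruction fetch address up to the last address expressible by the low-order bits.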
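[0019]
A data processing system using the data processor has, outside the data processor, a memory that stores the operating program of the data processor and is the target of external bus access by the bus controller.
[0020]
The memory holds a program that has almost no loop instructions and mostly executes instructions at linear, consecutive addresses sequentially. When such a program is executed, adopting a cache memory in the data processor cannot be expected to help much.
[0021]
With the data processor according to the above means, which instruction buffer should receive an instruction read from outside for prefetching is uniquely determined by the value of the predetermined low-order bits of its instruction address, so prefetch control is simple. This prefetch configuration can be realized more simply than a cache memory's address tag control mechanism or a FIFO buffer's counter-based read/write pointer control mechanism.
[0022]
In addition, as described above, the corresponding flag is set to the valid state in response to a prefetch of the instruction at the assigned instruction address, and to the invalid state in response to output of the prefetched buffer entry. The valid state of a flag thus indicates that the buffer entry is valid and can be fetched, and the invalid state indicates that the buffer entry is invalid and a new entry can be loaded.
[0023]
Using this, the buffer control circuit need only output the instruction held by an instruction buffer to the instruction execution means on condition that the flag of the instruction buffer assigned to the low-order bits of the instruction address to be fetched is in the valid state. Likewise, it need only permit a prefetch into an instruction buffer on condition that the flag is in the invalid state.
[0024]
To allow for events such as branches that change the execution order of consecutive instruction addresses, the buffer control circuit need only initialize all the flags to the invalid state in response to a direction from the instruction execution means to change the execution order.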
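[0025]
If each instruction buffer has the bit width of the instruction fetch unit of the instruction execution means, controlling instruction fetch from the instruction buffers becomes easy.
[0026]
When prefetching into the instruction buffers, instead of prefetching up to the last address expressible by the low-order bits of the instruction address, the last address to prefetch may be determined from information set in a register or the like, or from the frequency with which branch instructions and the like appear. This makes it possible to control the number of instructions that are prefetched but wasted because of a branch instruction.
[0027]
Furthermore, prefetching may be stopped not only on a branch caused by a branch instruction but also when interrupt processing arises. When an interrupt occurs, program execution is suspended as needed and the interrupt handling program is executed, so the prefetched instructions would be wasted.
[0028]
The plurality of instruction buffers may be treated as one unit, with at least two units provided. While the instruction execution means executes the instructions prefetched into the first unit (the instruction buffers of a first buffer table), the buffer control circuit preferably prefetches the instructions starting at the address following the last instruction address of the first unit into the second unit (the instruction buffers of a second buffer table). By executing the instructions in the second unit after all instructions in the first unit have been executed, instructions can be executed without program execution stalling for the time needed to prefetch from external memory.
[0029]
The instruction buffers or the buffer control circuit may be given an instruction decode function so that instructions being prefetched are decoded. It then becomes known whether a prefetched instruction is a branch instruction, and if it is, stopping the prefetch of subsequent instructions makes it possible to control the number of wasted prefetches.
[0030]
Furthermore, the instruction buffers or the buffer control circuit may be given an address calculation function. When the branch target address can be determined by address calculation, prefetching the target instruction in advance allows execution to proceed without stalling for a new prefetch from external memory. In addition, with at least two units of instruction buffers, prefetching both the instructions that follow the branch instruction and the instructions at the branch target allows execution without stalling whether or not the branch is taken.
[0031]
The instruction buffers or the buffer control circuit may be given an instruction decode function and an operand buffer so that, when an instruction having an operand is prefetched, the operand can be prefetched as well. When the operand is, for example, address-modified immediate data, fetching it requires an access to external memory. Prefetching the immediate data along with the instruction allows instructions to be executed without program execution being interrupted.
[0032]
Providing the data processor with a cache memory as well allows part or all of the program held in the cache to be reused for branches to already executed addresses, for loops, and for repeated execution of the protocol processing itself, which reduces the stalls caused by access to external memory.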
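[0033]
The invention is now described from the viewpoint of a mobile phone. The mobile phone has a data processing device, a memory, and a bus connected to the data processing device and the memory. The memory stores at least a program for protocol control or system control. The data processing device has an instruction execution unit that fetches, decodes, and executes instructions, and a bus controller that controls access to the memory via the bus based on signals from the instruction execution unit. The bus controller has a plurality of instruction buffers, each as wide as the instruction fetch unit of the instruction execution unit, a flag corresponding to each instruction buffer, and a buffer control circuit. The buffer control circuit assigns to each instruction buffer a unique value that the low-order plural bits of an instruction address can take. When an instruction fetch occurs for the instruction address corresponding to the minimum value expressible by those low-order bits, the buffer control circuit stores the instructions from the next instruction address up to the last instruction address expressible by the low-order bits into the instruction buffers corresponding to their addresses, and sets each corresponding flag to a first state. In response to an instruction fetch request from the instruction execution unit, if the flag of the instruction buffer corresponding to the low-order bits of the requested instruction address is in the first state, the buffer control circuit outputs the instruction held in that buffer to the instruction execution unit and sets the flag to a second state.
[0034]
If the flag of the instruction buffer corresponding to the low-order bits of the requested instruction address is in the second state, the instructions from the next instruction address up to the last instruction address expressible by the low-order bits may be stored in the corresponding instruction buffers, and the corresponding flags set to the first state.
[0035]
In the above, among the instructions requested by the instruction execution unit, an instruction whose address corresponds to the minimum value of the low-order bits, or an instruction whose corresponding instruction buffer flag is in the second state, may be supplied directly to the instruction execution unit after being read from memory, without being stored in an instruction buffer.
[0036]
The instruction execution unit outputs predetermined signals according to the type of the fetched instruction. In response to a first signal output by the instruction execution unit, the buffer control circuit may set all the flags corresponding to the instruction buffers to the second state. An instruction for which the instruction execution unit outputs the first signal is, for example, a branch instruction.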
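[0037]
[Embodiments of the Invention]
FIG. 1 shows an example of a data processing system according to the present invention. A data processor 1 and an external memory 2 are shown as representative elements.
[0038]
The data processor 1 has a central processing unit (CPU) 3 and a bus controller (BSC) 4, shown as representative elements. The CPU 3 constitutes instruction execution means that fetches, decodes, and executes instructions. The bus controller 4 controls external bus access to the external memory 2 and the like based on directions from the CPU 3.
[0039]
The CPU 3 has an arithmetic unit 10 typified by an arithmetic logic unit (ALU), general-purpose registers 11, a program counter 12, an instruction decoder 13, and a memory access command generator 14. The program counter 12 holds the address of the instruction to be executed next. The memory access command generator 14 receives control information for memory access operations from the instruction decoder 13 and outputs a memory access command 18 onto a memory access command bus 17. For an instruction fetch it does so in synchronization with the output of an instruction address from the program counter 12 to an internal address bus 16, and for a data access in synchronization with the output of a data address from the general-purpose registers 11 to the internal address bus 16. The memory access command 18 includes information indicating the type of operation (read or write), the access data width (number of parallel data bits), whether the cycle is an instruction fetch cycle, and whether the instruction fetch is a forced instruction fetch or a normal instruction fetch. A forced instruction fetch is the fetch of a branch target instruction by a branch instruction that changes the execution order from one linear address sequence to another. A normal instruction fetch is one whose address is consecutive, in a linear address sequence, with the previous instruction fetch address.
[0040]
If the access specified by the memory access command is an instruction fetch, the instruction read via the bus controller 4 is taken into the instruction decoder 13 via an internal data bus 15. The instruction decoder 13 decodes the instruction and, according to the result, executes it by controlling operations such as loading an operand from the external memory 2 into the general-purpose registers 11, having the arithmetic unit 10 operate on the operand, and storing the result in the external memory 2.
[0041]
If the access specified by the memory access command is a data access, data read via the bus controller 4 is taken into the general-purpose registers 11 via the internal data bus 15, or write data output from the general-purpose registers 11 to the internal data bus 15 is written to the external memory 2 via the bus controller 4.
[0042]
FIG. 2 shows an example of the address map of the CPU 3. H'00000000 to H'0FFFFFFF is the external memory space, and H'10000000 to H'FFFFFFFF is the space for built-in memory, peripheral modules, and so on. The external memory space is divided, in units of a predetermined capacity, into spaces CS0 to CS3 in order. Although not particularly limited, the types of memory device that can be connected to the external memory spaces CS0 to CS3 are fixed to several types selected in advance from ROM (read-only memory), SRAM (static random access memory), burst ROM, DRAM (dynamic random access memory), and SDRAM (synchronous DRAM). The external memory 2 is made up of the memory devices placed in the external memory spaces CS0 to CS3; "external memory 2" is a collective name for the memory devices placed in the four memory spaces. Although not particularly limited, the external program memory area is allocated to a fixed region starting at the beginning of memory space CS0.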
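[0043]
The bus controller 4 performs access control for each of the memory spaces CS0 to CS3 of the external memory 2. An external memory access control unit 20 generates, for each address space, the access control signals 25 needed to control the memory chips in the respective memory spaces. For example, when memory space CS2, in which DRAM is placed, is accessed, the external memory access control unit 20 outputs a row address strobe signal, a column address strobe signal, a write enable signal, and so on. When memory space CS3, in which SRAM is placed, is accessed, it outputs a chip enable signal, a read/write signal, and so on.
[0044]
Which memory device is assigned to which memory space is determined by the settings of an external memory access setting register 21. For example, a setting field for control code information (memory device control code) 26 indicating the characteristics of the memory device (required number of wait-state cycles, number of parallel input/output data bits, and so on) is provided for each memory space, and the memory device control code 26 that has been set is supplied to the external memory access control unit 20.
[0045]
Which memory space is being accessed is determined by decoding the address on the internal address bus 16 with a memory access address decoder 22 and supplying the result to the external memory access control unit 20. Whether the access request from the CPU 3 is a data access or an instruction fetch, a read or a write, and so on, is determined by decoding the memory access command on the memory access command bus 17 with a memory access command decoder 23, and the result is likewise supplied to the external memory access control unit 20.
[0046]
Referring to this input information, the external memory access control unit 20 supplies access control information such as chip selection to the target memory device of the external memory 2, and controls the supply of address signals and the input and output of data via an address/data input/output control unit 24. In a data access, read data and write data pass through a data path 27.
[0047]
For instruction prefetch, the bus controller 4 includes, for example, three instruction buffers Buf4, Buf8, and BufC, flags Flg4, Flg8, and FlgC specific to the respective instruction buffers, a buffer controller 30, an input-stage selector 31, and an output-stage selector 32. The input-stage selector 31 performs 1:4 output selection, and the output-stage selector 32 performs 4:1 input selection. A through path 33 and the instruction buffers Buf4, Buf8, and BufC are arranged in parallel between the output of the input-stage selector 31 and the input of the output-stage selector 32.
[0048]
Although not particularly limited, the instruction set of the CPU 3 has a fixed length of 16 bits, and the CPU 3 fetches instructions in units of two instructions (32 bits). The address signal output by the CPU 3 is a byte address whose minimum unit is a byte (8 bits). Correspondingly, the instruction buffers Buf4, Buf8, and BufC are each 32 bits wide. Looking at the low-order 4 bits of the byte address, 16 consecutive bytes of instructions can be managed. Accordingly, instruction buffer Buf4 is assigned as the prefetch area for instruction addresses whose low-order 4 bits are H'4 (= B'0100), Buf8 for those whose low-order 4 bits are H'8 (= B'1000), and BufC for those whose low-order 4 bits are H'C (= B'1100). This address assignment logic for the instruction buffers is implemented in the buffer controller 30.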
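The address assignment of paragraph [0048] can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `buffer_for` and `prefetch_addresses` and the dictionary are hypothetical, and fetch addresses are assumed to be aligned to the 32-bit fetch unit.

```python
# Illustrative sketch only: these names do not appear in the patent text.
BUFFER_FOR_LOW_BITS = {0x4: "Buf4", 0x8: "Buf8", 0xC: "BufC"}


def buffer_for(address: int):
    """Return the instruction buffer assigned to a byte address, or None.

    None covers low bits H'0 (served over the through path) and any
    address that is not aligned to the 32-bit fetch unit.
    """
    return BUFFER_FOR_LOW_BITS.get(address & 0xF)


def prefetch_addresses(fetch_address: int):
    """Addresses that follow fetch_address inside its 16-byte group."""
    base = fetch_address & ~0xF
    # Next 4-byte slot after the one that contains fetch_address.
    start = ((fetch_address & 0xF) // 4 + 1) * 4
    return [base + low for low in range(start, 0x10, 4)]
```

Because the buffer is chosen by masking the low 4 bits, no address tag comparison or read/write pointer is involved, which is the simplification the text argues for.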
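[0049]
When the memory access command decoder 23 detects an instruction fetch request, the buffer controller 30 determines, from the decode output of the memory access command decoder 23, whether the request is a normal instruction fetch or a forced instruction fetch. The buffer controller 30 also receives the low-order 4 bits of the internal address bus 16 and examines their value.
[0050]
When the buffer controller 30 determines that the request is a normal instruction fetch for an instruction address whose low-order 4 bits hold the first value (H'0), it controls the instruction fetch from the external memory 2 for that first address and the prefetch of the instructions at the following addresses into the instruction buffers Buf4, Buf8, and BufC. Specifically, the buffer controller 30 causes the external memory access control unit 20 to read 32 bits of instructions from memory space CS0 via the address/data input/output control unit 24 and to output them toward the input-stage selector 31. The buffer controller 30 routes the instruction supplied to the input-stage selector 31 onto the through path 33, selects the through path at the output-stage selector 32, and outputs the instruction onto the internal data bus 15 so that the instruction decoder 13 can take it in. The buffer controller 30 then changes the low-order 4 bits of that fetch address to H'4, H'8, and H'C in turn and stores the instructions at the respective addresses in Buf4, Buf8, and BufC in turn. The buffer controller 30 directs these external memory accesses through the external memory access control unit 20 while the CPU 3 is not requesting access to external memory. Each time an entry is stored in Buf4, Buf8, or BufC, the buffer controller 30 sets the corresponding flag Flg4, Flg8, or FlgC to the valid state (set state).
[0051]
When the request is a normal instruction fetch and the low-order 4 bits of the instruction address are H'4, H'8, or H'C, the buffer controller 30 does not fetch from the external memory 2. Instead it selects, at the output-stage selector 32, the output of the corresponding instruction buffer among the already prefetched Buf4, Buf8, and BufC, and outputs the required instruction onto the internal data bus 15 so that the instruction decoder 13 can take it in. The buffer controller 30 then sets the flag of the instruction buffer whose entry was output to the invalid state (reset state).
[0052]
When the request is a forced instruction fetch, the buffer controller 30 first forces the flags Flg4, Flg8, and FlgC to the invalid state. Next, regardless of the value of the low-order 4 bits of the instruction address, it controls the instruction fetch from the external memory 2 for that address and the prefetch of the instructions at the following addresses into the instruction buffers. Specifically, the buffer controller 30 causes the external memory access control unit 20 to read the instruction for the forced instruction fetch from memory space CS0 via the address/data input/output control unit 24 and to output it toward the input-stage selector 31. The buffer controller 30 routes the instruction supplied to the input-stage selector 31 onto the through path 33, selects the through path 33 at the output-stage selector 32, and outputs the instruction onto the internal data bus 15 so that the instruction decoder 13 can take it in. The buffer controller 30 then advances the low-order 4 bits of the forced fetch address up to H'C and stores the instructions at the respective addresses in the corresponding instruction buffers. If the low-order 4 bits of the forced fetch address are H'4, the low-order 4 bits are changed to H'8 and then H'C, and prefetches into Buf8 and BufC are performed. The buffer controller 30 directs these external memory accesses through the external memory access control unit 20 while the CPU 3 is not requesting access to external memory. As before, the flag of each instruction buffer in which an entry has been stored is set to the valid state (set state).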
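[0053]
FIGS. 3 and 4 show an example of the control procedure for instruction fetch and prefetch by the data processor 1.
[0054]
If the access request to the external memory 2 is a data access, a read or write operation is performed on the issued address (S1).
[0055]
If the request is not a data access, it is determined whether it is a forced instruction fetch (S2). If it is, the flags Flg4, Flg8, and FlgC are reset (S3), and the value of the low-order 4 bits of the fetch address is examined (S4 to S7). For example, when the address of the forced instruction fetch is 16n+0 (low-order 4 bits = H'0), the instruction at address 16n+0 is passed from the external memory 2 to the instruction decoder 13 (S8), so that the CPU 3 can decode and execute the fetched instruction. Meanwhile, when the CPU 3 is not accessing the external memory 2, the bus controller 4 prefetches instructions from the following addresses 16n+4 (low-order 4 bits = H'4), 16n+8 (H'8), and 16n+C (H'C) into the corresponding instruction buffers Buf4, Buf8, and BufC and sets the corresponding flags Flg4, Flg8, and FlgC (S9 to S14). When the address of the forced instruction fetch is 16n+4 or 16n+8, the instruction at that address is likewise supplied to the decoder (S15, S20), instructions from the following addresses are prefetched into the instruction buffers, and the corresponding flags are set (S16 to S19, S21 to S22). When the address of the forced instruction fetch is 16n+C, the instruction at that address is fetched to the decoder (S23) and no prefetch into the instruction buffers is performed.
[0056]
When the result of step S2 is a normal instruction fetch, the value of the low-order 4 bits of the fetch address is examined as shown in FIG. 4 (S30 to S33). For example, when the address of the normal instruction fetch is 16n+0 (low-order 4 bits = H'0), the instruction at address 16n+0 is passed from the external memory 2 to the instruction decoder 13 (S34), so that the CPU 3 can decode and execute the fetched instruction. Meanwhile, when the CPU 3 is not accessing the external memory 2, the bus controller 4 prefetches instructions from the following addresses 16n+4 (H'4), 16n+8 (H'8), and 16n+C (H'C) into the corresponding instruction buffers Buf4, Buf8, and BufC and sets the corresponding flags Flg4, Flg8, and FlgC (S34 to S40). When the address of the normal instruction fetch is 16n+4, 16n+8, or 16n+C, the controller waits until the flag Flg4, Flg8, or FlgC corresponding to the low-order 4 bits of that address is set (S41 to S43), the instruction is supplied from the corresponding instruction buffer Buf4, Buf8, or BufC to the decoder 13 (S44 to S46), and the corresponding flag is reset afterwards (S47 to S49).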
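The fetch and prefetch procedure of FIGS. 3 and 4 can be approximated by a small behavioural model. The sketch below is an assumption-laden toy, not the circuit itself: the class name `PrefetchController` and its counters are hypothetical, memory is a plain dictionary, addresses are assumed to be 4-byte aligned, and prefetches complete instantly, so the "wait until the flag is set" steps (S41 to S43) never actually block.

```python
class PrefetchController:
    """Toy model of the buffer controller; `memory` maps address -> 32-bit word."""

    def __init__(self, memory):
        self.memory = memory
        # Key = low 4 address bits (0x4, 0x8, 0xC); presence of a key
        # plays the role of the valid flag (Flg4, Flg8, FlgC).
        self.buffers = {}
        self.demand_reads = 0     # external reads the CPU had to wait for
        self.prefetch_reads = 0   # external reads done ahead of the CPU

    def _prefetch_rest_of_group(self, address):
        base, low = address & ~0xF, address & 0xF
        for nxt in range(low + 4, 0x10, 4):
            self.buffers[nxt] = self.memory[base + nxt]  # store entry, set flag
            self.prefetch_reads += 1

    def fetch(self, address, forced=False):
        low = address & 0xF
        if forced:
            self.buffers.clear()              # S3: reset every flag
        if forced or low == 0x0:
            word = self.memory[address]       # through path to the decoder
            self.demand_reads += 1
            self._prefetch_rest_of_group(address)
            return word
        # Normal fetch at H'4, H'8 or H'C: output the entry and reset its flag.
        return self.buffers.pop(low)
```

For a linear run of eight fetch units starting at a 16-byte boundary, this model performs two demand reads and six prefetch reads, which illustrates why three of every four sequential fetches avoid waiting on external memory in this scheme.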
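[0057]
FIG. 5 shows an example of the operation timing of a memory device placed in memory space CS0 of the external memory 2. The timing shown is, for example, a memory read operation in page mode of a flash memory that has a page mode.
[0058]
A flash memory is an electrically rewritable semiconductor memory device whose storage elements are memory cell transistors, each having a source, a drain, a floating gate, and a control gate. In FIG. 5, address A[19:3] is the memory's 17-bit page address signal. Accesses within the same page can be sped up by switching the 3-bit in-page address signal A[2:0] in sequence. In addition, given that the program has almost no branch instructions and executes linearly, outputting CE, the page address of the next instruction, and so on when the instruction stored in BufC is output shortens the time until the data is read. With instruction prefetch in mind, adopting a flash memory with page mode as the memory device in memory space CS0 that stores the program contributes to speeding up instruction prefetch, which must be performed in the idle periods of external memory access by the CPU 3. In FIG. 5, CE is the chip enable signal that directs chip selection, OE is the output enable signal that directs the output operation, and WE is the write enable signal that directs the write operation.
[0059]
FIG. 6 shows an example of the operation timing of another memory device placed in memory space CS0 of the external memory 2. The timing shown is a memory read operation by the burst operation of an SDRAM. An SDRAM has a plurality of memory banks, each having dynamic memory cells consisting of a select transistor and a storage capacitor, and operates in synchronization with a clock based on commands supplied in synchronization with the clock signal. The burst length (number of consecutive output data) and the CAS latency (number of clock cycles from the start of the column operation to data output) are set in advance in the control register of the SDRAM.
[0060]
An SDRAM accepts commands or data when the chip select signal /CS is low. When command input is enabled by /CS and a bank active command is specified by the states of the row address strobe signal /RAS, the column address strobe signal /CAS, and the write enable signal /WE, the bank and row address are specified by the address signals input together with the command, and word line selection is performed according to the row address. Next, when command input is enabled by /CS and a bank read command is specified by the states of /RAS, /CAS, and /WE, the column address is specified by the address signals input together with the command, column operations such as bit line selection are performed, and the data D1 thus read is output externally in synchronization with the elapse of the clock cycles specified by the CAS latency. In the example of FIG. 6 the CAS latency is 2. Thereafter the column operation is repeated the number of times corresponding to the specified burst length while the column address is updated sequentially by an internal address counter. For a burst length of 4, for example, data D2, D3, and D4 are output following D1 in synchronization with the cycles of the clock signal CLK. With instruction prefetch in mind, adopting the burst-capable SDRAM of FIG. 6 as the memory device in memory space CS0 that stores the program contributes to speeding up instruction prefetch, which must be performed in the idle periods of external memory access by the CPU 3.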
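As a small worked example of the burst read described above, the sketch below lists the clock cycle at which each data word appears, counted from the cycle in which the read command is issued. The function name is hypothetical, and the model deliberately ignores the delay between the bank active command and the read command as well as any precharge time.

```python
def burst_read_schedule(cas_latency: int, burst_length: int):
    """Cycle numbers (read command = cycle 0) at which D1..Dn are output."""
    return [cas_latency + i for i in range(burst_length)]


# CAS latency 2 and burst length 4, as in the example of FIG. 6:
# D1 appears 2 cycles after the read command, then one word per cycle.
schedule = burst_read_schedule(cas_latency=2, burst_length=4)
```

Under these assumptions a burst pays the CAS latency once for four words, whereas four independent reads would each pay it again, which is why a burst-capable device suits prefetching the rest of a 16-byte group.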
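[0061]
Also, given that the program has almost no branch instructions and executes linearly, the word line selection operation can be performed ahead of time when the instruction stored in BufC is output, which shortens the time until the data is read.
[0062]
FIG. 7 is a block diagram of a mobile phone system to which the data processor of FIG. 1 is applied. The mobile phone system is broadly divided into an analog section 40 and a digital section 41. In the analog section 40, an antenna switch 43 serving as a duplexer is connected to an antenna 42. A high-frequency signal received by the antenna 42 has its high-frequency noise removed by a low-noise amplifier (LNA) 44; the signal detected by a detection/decoding circuit (DEM) 45 is decoded, converted to digital data by an A/D converter 46, and supplied to the digital section 41. Digital transmit data supplied from the digital section 41 is, although not particularly limited, modulated by a GMSK (Gaussian Filtered Minimum Shift Keying) modulation circuit 47 and converted to an analog signal by a D/A conversion circuit 48. The converted analog signal is encoded by an encoding circuit (MOD) 49, and the encoded signal is amplified to a high-frequency signal by a high-frequency amplifier (HPA) 50 and transmitted from the antenna 42. The encoding circuit (MOD) 49 and the detection/decoding circuit (DEM) 45 operate in synchronization with a clock signal generated by a PLL circuit 51.
[0063]
The digital section 41 has, although not particularly limited, a digital signal processor (DSP) 53, a time-division multiple access control unit (TDMA) 54, the data processor 1, and the external memory 2. The digital signal processor 53 implements an equalizer 55, a channel codec 56, a voice compression/decompression unit 57, a Viterbi processing unit 58, and an encryption processing unit 59 by means of a multiply-accumulate circuit (not shown) and its operating program. The equalizer 55 equalizes the output of the A/D converter 46. The Viterbi processing unit 58 determines the logic values of the equalized data, and the result is supplied to the channel codec 56 for a predetermined format conversion and then decompressed by the voice compression/decompression unit 57. The decompressed data is output as sound from a speaker 61 via a D/A converter 60. Voice input to a microphone 62 is converted to digital voice data by an A/D converter 63, compressed by the voice compression/decompression unit 57, format-converted via the channel codec 56, and supplied to the GMSK modulation circuit 47.
[0064]
During a call, the data processor 1 controls the operation of the analog section 40 and the digital section 41 in real time. The data processor 1 also performs protocol control processing and system control processing specific to mobile communications. Protocol control processing determines, during a call or while waiting for incoming calls, which service area the mobile phone belongs to, and handles changes of the base station in charge of the area. System control processing detects instructions given through the operation buttons of the mobile phone and controls the display. The protocol control processing and system control processing do not require strict real-time performance, and their programs are large. Therefore the operating program for the real-time control is stored in the built-in ROM of the data processor 1, and the operating programs for protocol control processing and system control processing are stored in the external memory 2.
[0065]
The operating programs for protocol control processing and system control processing have almost no loop instructions and mostly execute instructions at linear, consecutive addresses sequentially. When such a program is executed, a cache memory in the data processor 1 cannot be expected to help much. Moreover, providing a cache memory on the data processor increases its transistor count, which raises the processor cost and enlarges the chip area. By contrast, with the data processor 1 having the instruction prefetch function described above, which of the instruction buffers Buf4, Buf8, and BufC should receive an instruction read from outside for prefetching is uniquely determined by the predetermined low-order 4 bits of its instruction address, so prefetch control is simple. This prefetch configuration can be realized more simply than a cache memory's address tag control mechanism or a FIFO buffer's counter-based read/write pointer control mechanism. It can therefore contribute to reducing the cost and size of the mobile phone system.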
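[0066]
In particular, it is sufficient to prefetch into the instruction buffers only when the low-order 4 bits of the instruction address take one predetermined value. For example, when an instruction fetch occurs for an instruction address whose low-order 4 bits hold the first value (H'0), instructions are prefetched into the corresponding instruction buffers in address order from the following address up to the last address (H'C) expressible by the low-order 4 bits, which keeps prefetch control simple. In addition, when a branch target instruction is fetched by a branch instruction, instructions are prefetched into the corresponding instruction buffers in address order from the address following that fetch address up to the last address expressible by the low-order 4 bits, so that instruction fetch remains efficient after a branch instruction changes the instruction address sequence.
[0067]
FIG. 8 shows another example of a data processing system according to the present invention. In the data processing system of FIG. 8, a transfer control unit 211 is used in place of the external memory access setting register 21 of FIG. 1, and an external memory 200 that includes a burst-capable external memory with a page mode function is connected outside a data processor 100. The buffer controller 30 performs control so that up to n instructions can be transferred from the external memory 200 to the instruction buffers according to the burst transfer length determined by the transfer control unit 211.
[0068]
The external memory with the page mode function (CS0 space) stores a program in which instructions are executed sequentially with comparatively few branches and loops, such as the system protocol processing of a mobile phone.
[0069]
FIG. 9 is a block diagram of the burst transfer length setting unit 250 of the transfer control unit 211, FIG. 10 shows the control flow for setting the burst transfer length, and FIG. 11 shows an example of how an up/down counter 253 and a burst word length setting register 254 change. The burst transfer length setting unit 250 counts the number of non-branch instructions executed between one branch instruction and the next. If many non-branch instructions are executed before a branch instruction appears, the burst transfer length is lengthened; if few are executed, it is shortened. The initial burst transfer length is not particularly limited but may be four instructions.
[0070]
In the setting unit of FIG. 9, the burst word length setting register 254 is set when a branch instruction appears, but it may instead be set each time the up/down counter 253 reaches a predetermined value.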
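The adaptive burst length of paragraphs [0069] and [0070] can be sketched as follows. The text states only the direction of the adjustment (longer after long runs of non-branch instructions, shorter after short runs) and an example initial length of four; the doubling and halving rule, the bounds, and the class and method names below are assumptions made for illustration.

```python
class BurstLengthSetter:
    """Toy stand-in for the burst transfer length setting unit 250."""

    def __init__(self, initial=4, minimum=1, maximum=16):
        self.length = initial   # plays the role of setting register 254
        self.run = 0            # plays the role of counter 253
        self.minimum, self.maximum = minimum, maximum

    def on_instruction(self, is_branch: bool) -> int:
        """Feed one executed instruction; return the current burst length."""
        if not is_branch:
            self.run += 1
            return self.length
        # A branch ends the run: compare the run with the current length.
        if self.run > self.length:
            self.length = min(self.length * 2, self.maximum)   # assumed rule
        elif self.run < self.length:
            self.length = max(self.length // 2, self.minimum)  # assumed rule
        self.run = 0
        return self.length
```

Updating only when a branch appears matches the default behaviour described for FIG. 9; the alternative of updating whenever the counter reaches a set value would move the comparison into the non-branch path.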
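[0071]
FIGS. 12 and 13 show the control procedure for instruction fetch and prefetch by the data processor 100. Compared with the procedure of FIGS. 3 and 4, the procedure of FIGS. 12 and 13 does not differ except that more instructions are stored (Sα) because the number of instruction buffers has increased.
[0072]
FIG. 14 shows another example of a data processing system according to the present invention. The data processing system of FIG. 14 addresses the case where the CPU 3 must perform interrupt processing for some reason. This is because, when interrupt processing is performed, the instruction addresses executed by the CPU 3 cease to be consecutive, just as when a branch instruction causes a branch.
[0073]
An interrupt control unit 171 accepts interrupts arising from various causes and notifies the CPU 3 that an interrupt request has occurred. When an interrupt handling program is to be executed in response to an interrupt from the interrupt control unit 171, the instruction decoder 105 notifies (153) the bus controller 4. In response to this notification, the buffer controller 30 performs the same processing as when a branch instruction causes a branch.
[0074]
FIG. 15 shows another example of a data processing system according to the present invention. The data processing system of FIG. 15 has two prefetch buffer tables (162, 163), each having n instruction buffers. The buffer controller 30 is controlled so that, while the CPU 3 is using prefetch buffer table 162, instructions are fetched from the external memory 200 and stored in the instruction buffers of prefetch buffer table 163. Specifically, after the CPU 3 has fetched all the instructions stored in the instruction buffers (191, 157, 159) of prefetch buffer table 162, the next instruction fetch is served from the instruction buffers of prefetch buffer table 163, and instructions fetched from the external memory 200 are stored in the instruction buffers of prefetch buffer table 162.
[0075]
When the CPU 3 has fetched all the instructions stored in the instruction buffers of prefetch buffer table 163, the reverse switching control is performed.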
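[0076]
FIG. 16 is a timing chart of prefetch buffer table switching. After the instruction fetch of a branch instruction at time t1, the bus controller 4 accesses the external memory 200 for the branch target instruction address, and from time t4 to t6 stores the instructions supplied from the external memory 200 in the instruction buffers of prefetch buffer table 162 (prefetch buffer table A). After the instruction stored in A3, the last instruction buffer of prefetch buffer table A, is fetched at time t8, the instructions at the following addresses supplied from the external memory 200 are stored from time t9 to t12 in the instruction buffers of prefetch buffer table 163 (prefetch buffer table B). As a result, the instruction stored in instruction buffer B0 of prefetch buffer table B can be supplied for the instruction fetch issued at time t10, and there is no need to wait for the external memory 200 to supply it.
[0077]
The address output onto the address bus of the external memory in order to read instructions from the external memory 200 is described next.
[0078]
For the instruction fetch of the branch instruction at time t1, the bus controller 4 generates the address of the instruction to be supplied from the information that the CPU 3 outputs onto the internal bus, and outputs it onto the address bus of the external memory. For the instruction fetch that follows a non-branch instruction at time t8, on the other hand, the address of the following instruction can be calculated from the internal information of the buffer controller 30, so the address of the instruction to be supplied can be output ahead of time, before the information is output onto the internal bus.
[0079]
FIG. 17 shows the operation when multiple prefetch buffer tables are used. When the CPU 3 issues an instruction fetch caused by a branch instruction (FIG. 17(A)), a read from the external memory is performed, and the instruction fetch by the CPU 3 proceeds in parallel with writing the instructions read from the external memory into the instruction buffers. In this case, although not particularly limited, the instructions are preferably stored in the least recently used prefetch buffer table.
[0080]
For an instruction fetch caused by a non-branch instruction (FIG. 17(B)), the controller waits until the flag corresponding to the instruction buffer indicated by the low-order bits of the instruction address becomes valid, the instruction fetch by the CPU 3 is performed, and the flag is set to the invalid state. Then, if there is a prefetch buffer table from which all instructions have been fetched (that is, an empty table), a read from the external memory 200 is performed for the addresses that follow the last executed instruction fetch, regardless of whether the CPU 3 issues an instruction fetch, and the instructions read from the external memory are stored in the instruction buffers of the empty table and the corresponding flags are set to the valid state.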
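The alternating use of two prefetch buffer tables can be illustrated with a toy model. Everything named here (`TwoTablePrefetcher`, `stalls`, the table size of four 4-byte entries) is an assumption for the sketch, refills are treated as instantaneous, and a fetch that misses both tables is treated as a branch target that the CPU must wait for.

```python
class TwoTablePrefetcher:
    """Toy model of two prefetch buffer tables that are refilled when empty."""

    def __init__(self, memory, entries=4, word_bytes=4):
        self.memory = memory
        self.word_bytes = word_bytes
        self.span = entries * word_bytes
        self.tables = [{}, {}]   # address -> word; presence == valid flag
        self.next_addr = 0       # next address to prefetch
        self.stalls = 0          # fetches that waited on external memory

    def _refill_empty_tables(self):
        for table in self.tables:
            if not table:
                stop = self.next_addr + self.span
                for a in range(self.next_addr, stop, self.word_bytes):
                    if a in self.memory:
                        table[a] = self.memory[a]
                self.next_addr = stop

    def fetch(self, address):
        for table in self.tables:
            if address in table:
                word = table.pop(address)   # output entry, flag -> invalid
                break
        else:
            # Not prefetched: treat as a branch target and discard old entries.
            self.stalls += 1
            for table in self.tables:
                table.clear()
            word = self.memory[address]
            self.next_addr = address + self.word_bytes
        self._refill_empty_tables()
        return word
```

In this idealised model a linear run stalls only on its first fetch, because one table is always being refilled while the other is being drained; each branch to an address that is not buffered adds one more stall.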
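[0081]
FIG. 18 shows another example of a data processing system according to the present invention. The data processing system of FIG. 18 has an instruction decoder 170 for determining whether an instruction read from the external memory 200 is a branch instruction or a non-branch instruction. The instruction decoder 170 makes this determination, and if the instruction is a branch instruction, reading of the instructions that follow it is suspended.
[0082]
FIG. 19 is a timing chart for the case where the instruction decoder 170 performs branch instruction determination.
[0083]
In the instruction read from the external memory 200 that starts at time t3, if the instruction read at time t7 turns out to be a branch instruction, the instruction read (burst transfer) from the external memory 200 is suspended, and reading of the next instruction starts (t12) once the branch target address has been determined at time t10.
[0084]
Suspension of the instruction read from the external memory 200 is not limited to detection of a branch instruction by the instruction decoder 170; it may also be triggered by detection of an interrupt cause. As explained for FIG. 14, when an interrupt cause is detected, the instruction addresses executed by the CPU 3 cease to be consecutive, just as with a branch caused by a branch instruction.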
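[0085]
FIG. 20 shows another example of a data processing system according to the present invention. The data processing system of FIG. 20 has an instruction decoder 170 that determines whether an instruction read from the external memory 200 is a branch instruction or a non-branch instruction, and an address calculator 172 for calculating the branch target address of a branch instruction.
[0086]
FIG. 21 is a timing chart for the case where the instruction decoder 170 performs branch instruction determination and the address calculator 172 performs address calculation.
[0087]
In the instruction read from the external memory 200 that starts at time t3, if the instruction read at time t7 turns out to be a branch instruction, the instruction read (burst transfer) from the external memory 200 is suspended, and instructions are read from the external memory 200 from time t10 for the branch target address calculated by the address calculator 172. Thus, even when a branch instruction is detected, instruction execution by the CPU 3 is not interrupted in order to read the instruction at the branch target address.
[0088]
When the instruction decoder 170 identifies a branch instruction, it may also determine whether the instruction is a one-way branch instruction or a two-way branch instruction. A one-way branch instruction always branches to the branch target address, whereas a two-way branch instruction either branches to the branch target address or does not branch and executes the instruction at the following address.
[0089]
If the detected branch instruction is a one-way branch instruction, reading is suspended from the instruction that follows it. If it is a two-way branch instruction, control may be performed so that the instructions following the branch instruction and the instructions at the branch target address are each stored in a prefetch buffer table. Then, whether or not the two-way branch is taken, the instructions that the CPU 3 executes are already in a prefetch buffer table, so the time needed to read instructions from the external memory 200 is eliminated. The instructions stored in the prefetch buffer table on the side that is not executed can be set to the invalid state once it is certain that they will not be executed.
[0090]
As for how many instructions to prefetch on each side of a two-way branch instruction, although not particularly limited, about two each is sufficient. After a branch instruction is detected, about two instructions following it are read, reading from the external memory 200 is suspended, and then about two instructions at the branch target address are read. With about two instructions read on each side, a new read from the external memory 200 started once the executed path is determined will still be in time for execution. Concretely, the number should be decided in view of the time the CPU 3 takes to execute instructions and the time needed to read instructions from the external memory 200.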
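The prefetch policy for one-way and two-way branches can be written down as a small planning function. The names `BranchKind` and `plan_prefetch` are hypothetical, the depth of two instructions follows the "about two" suggested in the text, and the 2-byte instruction size follows the 16-bit fixed-length instruction set mentioned earlier; prefetching the same depth at the target of a one-way branch is an assumption.

```python
from enum import Enum


class BranchKind(Enum):
    ONE_WAY = "one-way"   # always branches to the target
    TWO_WAY = "two-way"   # branches or falls through


def plan_prefetch(kind, branch_addr, target_addr, depth=2, instr_bytes=2):
    """Return (fall_through_addresses, target_addresses) to prefetch."""
    target = [target_addr + i * instr_bytes for i in range(depth)]
    if kind is BranchKind.ONE_WAY:
        # Nothing after a one-way branch is ever executed sequentially.
        return [], target
    fall_through = [branch_addr + (i + 1) * instr_bytes for i in range(depth)]
    return fall_through, target
```

For a two-way branch the two address lists would go to different prefetch buffer tables, and the list for the path not taken would be invalidated once the branch outcome is known.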
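[0091]
FIG. 22 shows another example of a data processing system according to the present invention. The data processing device of FIG. 22 has operand buffers (176, 177) along with the prefetch buffer tables.
[0092]
FIG. 23 is a timing chart for the case with the operand buffers (176, 177). When the instruction decoder detects (t6) an instruction that requires reading the external memory 200 at the address indicated by its operand, the address calculator 172 calculates that address, operand data is read from the external memory 200 (t9), and the data read from the external memory 200 is stored in the operand buffers (176, 177). This shortens the stall time of the CPU 3 compared with accessing the external memory 200 only after the operand fetch from the CPU 3 (t8). After the operand data read completes, instruction reading may continue with the following instructions.
[0093]
FIG. 24 shows another example of a data processing system according to the present invention. The data processing system of FIG. 24 has a cache memory in addition to the prefetch buffer. Because protocol processing has comparatively few branches and loops, a cache memory alone can hardly improve processing efficiency, so a prefetch buffer is useful. With a prefetch buffer alone, however, even a branch to an already executed address or a loop requires access to the external memory 200, and in such cases a cache memory is useful. Moreover, beyond branches and loops within the protocol processing, the protocol processing program itself is executed repeatedly at predetermined intervals. Storing the entire program in the cache memory is not realistic, but if even part of the program is held in the cache memory, that part needs no access to the external memory 200, which is useful. Accordingly, instructions held in the cache memory may be read from the cache memory, and instructions not held in the cache memory may be read ahead from the external memory 200 using the prefetch buffer.
[0094]
Furthermore, the prefetch buffer preferably has the instruction decoder 170 and the address calculator 172 shown in FIG. 20 so that branch instructions are detected and branch target addresses are calculated. When the branch target address is lower than the address of the instruction currently being executed, the instruction at the branch target address is likely to be held in the cache memory. In that case instruction prefetch is suspended, the cache memory controller 184 checks whether the instruction is held in the cache memory, and if it is, the instruction is read from the cache memory.
[0095]
On the other hand, when the branch target address is higher than the address of the instruction currently being executed, or when the instruction is not held in the cache memory, instruction prefetch is performed for the branch target address.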
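The choice between the cache memory and the prefetch buffer on a detected branch reduces to one comparison and one lookup. The sketch below uses a hypothetical function name and models the cache contents as a plain set of addresses, which stands in for the check made by the cache memory controller.

```python
def instruction_source(target_addr, current_addr, cached_addresses):
    """Decide where the branch target instruction should come from.

    A backward branch is looked up in the cache first; a forward branch,
    or a backward branch that misses, is prefetched from external memory.
    """
    if target_addr < current_addr and target_addr in cached_addresses:
        return "cache"
    return "prefetch"


# A loop back to 0x100 that is already cached is served by the cache,
# while a forward jump to 0x400 is prefetched even if it happens to be cached.
cached = {0x100, 0x104, 0x400}
backward = instruction_source(0x100, 0x180, cached)
forward = instruction_source(0x400, 0x180, cached)
```

The text does not say what happens when the target equals the current address; this sketch sends that case to the prefetch path.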
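[0096]
The invention made by the inventors has been described concretely on the basis of embodiments, but needless to say the present invention is not limited to them and can be modified in various ways without departing from its gist.
[0097]
For example, the data processor may of course include other circuit modules as appropriate besides the instruction execution means such as a CPU and the bus controller. A memory management unit, a floating-point unit, a multiply-accumulate unit, a data cache memory, a direct memory access controller, a timer/counter, and the like may be built in as needed.
[0098]
It is also possible to configure the system so that no prefetch is performed when a branch target instruction is fetched by a branch instruction. Making the size of each instruction buffer equal to the instruction size that is the unit of instruction fetch is preferable for simplifying the control of instruction prefetch and instruction fetch, but the present invention is not limited to this; instruction buffers with a capacity that is an integer multiple of the instruction size that is the unit of instruction fetch may also be adopted.
[0099]
The above description has mainly dealt with application to a mobile phone system, the field of use that formed the background of the invention, but the present invention is not limited to this and can be widely applied to other data processing systems such as communication terminals and portable information terminals.
[0100]
[Effects of the Invention]
The effects obtained by representative aspects of the invention disclosed in this application are briefly as follows.
[0101]
A data processor can be realized that can prefetch instructions from outside with a comparatively simple configuration and improve instruction execution efficiency.
[0102]
In addition, processing in which instructions at linear, consecutive addresses, with almost no loop instructions, are fetched from external memory and executed sequentially can be sped up by an instruction prefetch mechanism of comparatively simple configuration in the data processor.
[0103]
Furthermore, the data processing efficiency of a data processing system that executes subroutine programs with few branches that change the execution order of consecutive instruction addresses can be improved at comparatively low cost.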
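[Brief Description of the Drawings]
[FIG. 1] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 2] An address map of the CPU built into the data processor.
[FIG. 3] A flowchart showing, together with FIG. 4, the control procedure for instruction fetch and prefetch by the data processor.
[FIG. 4] A flowchart showing, together with FIG. 3, the control procedure for instruction fetch and prefetch by the data processor.
[FIG. 5] A timing chart of a memory read operation in page mode when a flash memory having a page mode is adopted as the external memory.
[FIG. 6] A timing chart of a burst read operation when an SDRAM having a burst operation is adopted as the external memory.
[FIG. 7] A block diagram of a mobile phone system to which the data processor of FIG. 1 is applied.
[FIG. 8] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 9] A block diagram showing an example of the burst transfer length setting unit of FIG. 8.
[FIG. 10] A flowchart showing the burst transfer length setting procedure in the burst transfer length setting unit of FIG. 9.
[FIG. 11] An explanatory diagram showing an example of changes in the burst transfer length set by the burst transfer length setting unit of FIG. 8.
[FIG. 12] A flowchart showing, together with FIG. 13, the control procedure for instruction fetch and prefetch by the data processor.
[FIG. 13] A flowchart showing, together with FIG. 12, the control procedure for instruction fetch and prefetch by the data processor.
[FIG. 14] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 15] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 16] A timing chart showing instruction fetch by the data processor, the instructions stored in the instruction buffers, and accesses to the external memory for the case with the multiple prefetch buffer tables of FIG. 15.
[FIG. 17] A flowchart showing the operation of the instruction buffers for branch instructions and for non-branch instructions.
[FIG. 18] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 19] A timing chart showing instruction fetch by the data processor, the instructions stored in the instruction buffers, and accesses to the external memory, including detection of a branch instruction, for the case with the instruction decoder of FIG. 18.
[FIG. 20] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 21] A timing chart showing instruction fetch by the data processor, the instructions stored in the instruction buffers, and accesses to the external memory, including detection of a branch instruction, for the case with the address calculator of FIG. 20.
[FIG. 22] A block diagram showing an example of a data processing system according to the present invention together with the data processor.
[FIG. 23] A timing chart showing instruction fetch by the data processor, the instructions stored in the instruction buffers, and accesses to the external memory, including detection of an operand fetch instruction, for the case with the operand fetch function of FIG. 22.
[FIG. 24] A block diagram showing an example of a data processing system according to the present invention together with the data processor.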
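[Description of Reference Numerals]
1 Data processor
2 External memory
CS0 to CS3 Memory spaces
3 CPU
4 Bus controller
12 Program counter
13 Instruction decoder
14 Memory access command generator
20 External memory access control unit
22 Memory access address decoder
23 Memory access command decoder
24 Address/data input/output control unit
30 Buffer controller
31 Input-stage selector
32 Output-stage selector
33 Through path
Buf4, Buf8, BufC Instruction buffers
Flg4, Flg8, FlgC Flags
40 Analog section
41 Digital section
[0001]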
BACKGROUND OF THE INVENTION
The present invention relates to a data processor, and in particular to instruction prefetch from an external memory. It relates, for example, to a technique that is effective when applied to a data processing system that executes subroutine programs with little branch processing (in this specification, the concept of a branch includes a jump) that changes the execution order of consecutive instruction addresses.
[0002]
[Prior art]
Techniques for speeding up sequential execution of instructions by a data processor or the like include an instruction cache memory, which takes into account the temporal and spatial locality of information references, and an instruction prefetch buffer.
[0003]
For example, Japanese Patent Laid-Open No. 6-243036 (US Patent No. 5,511,178) discloses an invention in which a loop lock indicating the locality of fetched instructions is provided and the instruction sequence in a loop is kept in a cache memory until program control moves outside the loop.
[0004]
JP-A-4-62637 discloses a microprocessor provided with an instruction queue (instruction prefetch buffer) that keeps fetched loop instructions in a FIFO (First-In / First-Out) buffer in order to improve execution speed.
[0005]
[Problems to be solved by the invention]
A technique that speeds up sequential execution of instructions by preventing loop instructions from being evicted from the instruction cache memory or the instruction prefetch buffer, as in the above prior art, is effective in processing that uses many loop instructions. In processing where loop instructions are almost nonexistent and instructions at linear, consecutive addresses are executed sequentially, however, a configuration that prohibits eviction of loop instructions hardly yields a benefit commensurate with its cost. According to the inventors' study, in such cases even using an ordinary instruction cache memory may be substantially meaningless.
[0006]
That is, the inventors examined execution of subroutine programs such as protocol processing or system control processing in a mobile phone system. Since the protocol processing or system control processing by the subroutine program is complicated and the program is large, storing the program in the built-in ROM of the data processor is not realistic. On the other hand, the access speed of the external memory is slow relative to the data processing speed of the data processor, and an instruction cache memory can be adopted in the data processor to absorb the difference. However, the protocol processing or system control processing has almost no loop instructions and mostly executes instructions at linear, consecutive addresses sequentially. As a result, even if a cache memory is adopted, not much of an effect can be expected.
[0007]
In view of this, the inventors considered not using a cache memory and instead relying on instruction prefetch, which has a relatively simple configuration. Given that loop instructions are almost absent and that sequential execution of instructions at linear, consecutive addresses dominates, a configuration such as prohibiting eviction of loop instructions is not necessary at all. The inventors also found that, from the viewpoint of cost-effectiveness, the correspondence between a prefetched instruction and its address needs to be simpler than a control mechanism using cache memory address tags or a read/write pointer control mechanism using counters.
[0008]
Further examination by the inventors showed that when a fixed-length burst transfer mechanism is used for instruction prefetch, instructions that are wasted when a branch instruction causes a branch are also prefetched, so that overhead results.
[0009]
It was also found that when the instruction prefetch operation is triggered by execution of a branch instruction, or by a combination of the lower bits of an address and a normal instruction fetch request, then once all prefetched instructions have been executed, execution of the program is interrupted until the next instruction prefetch operation completes the instruction fetch from the external memory.
[0010]
Furthermore, a closer look at external memory access showed that instruction prefetch is effective for fetching instruction codes (instruction fetch), but that fetching data described as operands (data fetch) may still require access to the external memory, in which case execution of the program is interrupted until the data fetch from the external memory is completed.
[0011]
Therefore, the inventors considered dealing with the problems newly found with instruction prefetch by devising the instruction prefetch method. In this case as well, the inventors found that, from the viewpoint of cost-effectiveness, the correspondence between a prefetched instruction and its address needs to be simpler than a control mechanism using cache memory address tags or a read/write pointer control mechanism using counters.
[0012]
Another object of the present invention is to provide a data processor that can perform instruction prefetching from the outside with a relatively simple configuration and can improve the execution efficiency of instructions.
[0013]
Another object of the present invention is to provide a data processing system in which faster processing of fetching instructions at linear, consecutive addresses with almost no loop instructions from an external memory and executing them sequentially can be realized by a relatively simple mechanism.
[0014]
Another object of the present invention is to improve data processing efficiency at a relatively low cost in a data processing system that executes a subroutine program with few branch processes for changing the execution order of consecutive instruction addresses.
[0015]
The above and other objects and novel features of the present invention will be apparent from the description of this specification and the accompanying drawings.
[0016]
[Means for Solving the Problems]
The following is a brief description of an outline of typical inventions disclosed in the present application.
[0017]
That is, the data processor includes an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes it, and a bus controller that controls external bus access based on an instruction from the instruction execution unit. The bus controller includes a plurality of instruction buffers, a flag unique to each instruction buffer, and a buffer control circuit. The buffer control circuit assigns to each instruction buffer a unique value that the lower-order bits of an instruction address can take, and prefetches instructions into the corresponding instruction buffers in the address order given by those lower-order bits, starting from the address following a predetermined instruction fetch address. It sets the corresponding flag to the valid state in response to an instruction prefetch, and sets the corresponding flag to the invalid state in response to output of the instruction prefetched into the instruction buffer.
[0018]
In the above means, it suffices to prefetch into the instruction buffers only when the lower-order bits of the instruction address take one predetermined value. For example, to simplify instruction prefetch control, when an instruction fetch is made to the instruction address whose lower-order bits hold the head value, instructions may be prefetched into the corresponding instruction buffers in address order from the following address up to the final address given by the lower-order bits. Furthermore, to allow for a branch instruction changing the instruction address sequence, when the branch destination instruction of a branch instruction is fetched, instructions may be prefetched into the corresponding instruction buffers in address order from the address following that instruction fetch address up to the final address given by the lower-order bits.
[0019]
A data processing system using the data processor has a memory that stores an operation program of the data processor outside the data processor and is a target of external bus access by the bus controller.
[0020]
The memory holds a program that contains almost no loop instructions and consists mostly of processing that executes instructions at linear, consecutive addresses one after another. When such a program is executed, little benefit can be expected even if the data processor is given a cache memory.
[0021]
In contrast, with the data processor according to the above means, which instruction buffer serves as the buffer entry for an instruction read from outside for prefetching is uniquely determined by the value of predetermined lower-order bits of the instruction address, so prefetch control is simple. This configuration for instruction prefetch can be realized more simply than a control mechanism using cache memory address tags or a read/write pointer control mechanism using FIFO buffer counters.
[0022]
As described above, the corresponding flag is set to the valid state in response to an instruction prefetch of the assigned instruction address and to the invalid state in response to output of the prefetched buffer entry. Consequently, a flag in the valid state indicates that the buffer entry is valid and can be fetched, and a flag in the invalid state indicates that the buffer entry of the instruction buffer is invalid and a new buffer entry can be loaded.
[0023]
Using this recognition, the buffer control circuit may output the instruction stored in an instruction buffer to the instruction execution means on condition that the flag of the instruction buffer assigned to the value of the lower-order bits of the instruction address to be fetched by the instruction execution means is in the valid state. The buffer control circuit may enable instruction prefetch into an instruction buffer on condition that its flag is in the invalid state.
[0024]
To allow for processing such as a branch that changes the execution order of consecutive instruction addresses, the buffer control circuit may initialize all the flags to the invalid state in response to an indication from the instruction execution means that the execution order of consecutive instruction addresses is to be changed.
[0025]
If each instruction buffer has the number of bits of the instruction fetch unit of the instruction execution means, control of instruction fetch from the instruction buffers by the instruction execution means is easy.
[0026]
When instructions are prefetched into the instruction buffers, instead of prefetching up to the final address given by the lower-order bits of the instruction address, the final address to be prefetched may be determined from information set in a register or the like, or from the frequency with which branch instructions and the like appear. This makes it possible to control the number of instructions that are prefetched but wasted because of a branch instruction.
[0027]
Furthermore, instruction prefetch may be stopped not only when a branch instruction causes a branch but also when interrupt processing occurs. This is because, when interrupt processing occurs, execution of the program is suspended as necessary and the interrupt processing program is executed, so the prefetched instructions are wasted.
[0028]
A plurality of the instruction buffers may be treated as one unit, and at least two units of instruction buffers may be provided. In that case, while the instruction execution unit executes the instructions prefetched into the instruction buffers of the first unit (the instruction buffers of the first buffer table), the buffer control circuit may prefetch the instructions from the instruction address following the last instruction address of the first unit into the instruction buffers of the second unit (the instruction buffers of the second buffer table). After the instruction execution means has executed all the instructions prefetched into the first unit, control then passes to executing the instructions prefetched into the second unit, so that instructions can be executed without suspending program execution for the time needed to prefetch instructions from the external memory.
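The two-unit arrangement described above can be sketched as a ping-pong buffer. This is only an illustrative model under stated assumptions: the class name `TwoUnitPrefetcher`, the 16-byte unit size, the 4-byte fetch width, and the dictionary standing in for the external memory are all hypothetical, and the refill that hardware would perform in idle bus cycles is done immediately here.

```python
class TwoUnitPrefetcher:
    """Toy model of two buffer units; names and sizes are illustrative assumptions."""

    def __init__(self, memory, unit_bytes=16, word_bytes=4):
        self.memory = memory          # dict: byte address -> instruction word
        self.unit_bytes = unit_bytes  # assumed power of two
        self.word_bytes = word_bytes
        self.units = [{}, {}]         # each unit maps byte address -> word
        self.active = 0               # unit the CPU currently executes from

    def _fill(self, unit, base):
        # Prefetch one unit's worth of consecutive words starting at base.
        self.units[unit] = {
            a: self.memory[a]
            for a in range(base, base + self.unit_bytes, self.word_bytes)
            if a in self.memory
        }

    def start(self, base):
        # Fill the first unit, then the second with the following addresses.
        self._fill(self.active, base)
        self._fill(1 - self.active, base + self.unit_bytes)

    def fetch(self, addr):
        # Linear execution is assumed; a branch would need a fresh start().
        if addr not in self.units[self.active]:
            # First unit exhausted: execute from the other unit and
            # refill the unit just vacated with the next address range.
            self.active = 1 - self.active
            next_base = (addr & ~(self.unit_bytes - 1)) + self.unit_bytes
            self._fill(1 - self.active, next_base)
        return self.units[self.active][addr]
```

In this sketch the CPU side never waits on a refill as long as execution stays linear, which is the property the paragraph above attributes to having at least two units.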
[0029]
The instruction buffer or the buffer control circuit may be provided with an instruction decoding function to decode the instructions being prefetched into the instruction buffer. This makes it possible to determine whether an instruction prefetched into the instruction buffer is a branch instruction. If it is a branch instruction, prefetching of the subsequent instructions is stopped, which controls the number of instructions that are prefetched but wasted.
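As a rough illustration of stopping prefetch at a branch, the sketch below walks a range of addresses and stops once a decoded branch is seen. The function name `prefetch_until_branch`, the `is_branch` predicate, and the mnemonics in the usage note are hypothetical, and one instruction per buffer entry is assumed for simplicity.

```python
def prefetch_until_branch(memory, start, last, is_branch, step=4):
    """Prefetch entries from start to last inclusive, stopping after a branch.

    memory is a dict from byte address to instruction; is_branch is a
    caller-supplied decoder predicate (both are illustrative assumptions).
    """
    fetched = []
    for addr in range(start, last + 1, step):
        word = memory[addr]
        fetched.append((addr, word))
        if is_branch(word):
            # Later instructions may never execute, so do not prefetch them.
            break
    return fetched
```

For example, with `memory = {4: "ADD", 8: "BRA", 12: "SUB"}` and a predicate that treats `"BRA"` as a branch, only the entries at addresses 4 and 8 are prefetched.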
[0030]
Furthermore, if the instruction buffer or the buffer control circuit has an address calculation function and the branch destination address of a branch instruction is obtained by address calculation, the branch destination instruction can be prefetched, so that instructions can be executed without suspending program execution for the time needed to newly prefetch instructions from the external memory into the instruction buffer. In addition, if at least two units of instruction buffers are provided and both the instructions at the addresses following the branch instruction and the instructions at the branch destination are prefetched, instructions can be executed, whether or not the branch is taken, without suspending program execution for the time needed to newly prefetch instructions from the external memory into the instruction buffer.
[0031]
The instruction buffer or the buffer control circuit may be provided with an instruction decoding function and an operand buffer so that, when an instruction having an operand is prefetched, the operand can be prefetched as well. If the operand is address-modified immediate data or the like, the external memory is accessed to fetch the immediate data; prefetching the immediate data at the time of instruction prefetch therefore makes it possible to execute the instruction without suspending program execution.
[0032]
By providing the data processor with a cache memory, part or all of the program held in the cache memory can be reused when a branch to an already executed address, loop processing, or the protocol processing itself is executed, which reduces suspension of program execution caused by accesses to the external memory.
[0033]
The present invention is now described from the viewpoint of a cellular phone. The cellular phone has a data processing device, a memory, and a bus connected to the data processing device and the memory, and at least a program for protocol control or system control is stored in the memory. The data processing device includes an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes it; a plurality of instruction buffers each having the number of bits of the instruction fetch unit of the instruction execution unit; and a bus controller that controls access to the memory via the bus based on a signal from the instruction execution unit. The buffer control circuit assigns to each of the instruction buffers a unique value that the lower-order bits of an instruction address can take. When an instruction fetch is made to the instruction address corresponding to the minimum value expressed by the lower-order bits of the instruction address, the buffer control circuit stores the instructions from the instruction address following that address up to the last instruction address expressed by the lower-order bits in the instruction buffers corresponding to those instruction addresses, and sets the flag corresponding to each of those instruction buffers to the first state. Further, in response to an instruction fetch request from the instruction execution unit, if the flag of the instruction buffer corresponding to the lower-order bits of the instruction address to be fetched, output from the instruction execution unit, is in the first state, the buffer control circuit outputs the instruction stored in that instruction buffer to the instruction execution unit and sets the flag to the second state.
[0034]
When the flag of the instruction buffer corresponding to the lower-order bits of the instruction address to be fetched, output from the instruction execution unit, is in the second state, the instructions from the instruction address following that address up to the last instruction address expressed by the lower-order bits may be stored in the instruction buffers corresponding to those instruction addresses, and the flag corresponding to each of those instruction buffers may be set to the first state.
[0035]
In the above, of the instructions at the instruction addresses to be fetched that the instruction execution unit outputs, an instruction at the instruction address corresponding to the minimum value expressed by the lower-order bits of the instruction address, or an instruction at an instruction address whose corresponding instruction buffer flag is in the second state, may be read from the memory and supplied directly to the instruction execution unit without being stored in an instruction buffer.
[0036]
The instruction execution unit outputs a predetermined signal according to the type of the fetched instruction. The buffer control circuit may set all the flags corresponding to the plurality of instruction buffers to the second state in response to a first signal output from the instruction execution unit. An instruction for which the instruction execution unit outputs the first signal is, for example, a branch instruction.
[0037]
[Detailed Description of the Invention]
FIG. 1 shows an example of a data processing system according to the present invention. In the figure, a data processor 1 and an external memory 2 are representatively shown.
[0038]
The data processor 1 has a central processing unit (CPU) 3 and a bus controller (BSC) 4 which are shown representatively. The CPU 3 constitutes an instruction execution means for fetching an instruction, decoding the fetched instruction, and executing the instruction. The bus controller 4 controls external bus access to the external memory 2 and the like based on an instruction from the CPU 3.
[0039]
The CPU 3 includes an arithmetic unit 10 typified by an arithmetic logic unit (ALU), a general-purpose register 11, a program counter 12, an instruction decoder 13, and a memory access command generation unit 14. The program counter 12 holds the address of the instruction to be executed next. The memory access command generation unit 14 receives control information for a memory access operation from the instruction decoder 13 and outputs a memory access command 18 to the memory access command bus 17, in synchronization with output of the instruction address from the program counter 12 to the internal address bus 16 for an instruction fetch, and in synchronization with output of the data address from the general-purpose register 11 to the internal address bus 16 for a data access. The memory access command 18 includes information indicating the type of operation (read or write), information indicating the access data width (number of parallel data bits), information indicating an instruction fetch cycle, and information indicating whether the instruction fetch is a forced instruction fetch or a normal instruction fetch. A forced instruction fetch is a fetch of a branch destination instruction caused by a branch instruction that changes the instruction execution order from one linear address sequence to another address sequence. A normal instruction fetch is an instruction fetch whose address is contiguous, in a linear address sequence, with the previous instruction fetch address.
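The information carried by the memory access command 18 can be pictured as a small record. The sketch below is only one possible representation: the field names, the `AccessKind` enumeration, and the `kind` method are hypothetical, and the actual signal encoding is not given in this description.

```python
from dataclasses import dataclass
from enum import Enum


class AccessKind(Enum):
    DATA = "data access"
    NORMAL_FETCH = "normal instruction fetch"
    FORCED_FETCH = "forced instruction fetch"


@dataclass(frozen=True)
class MemoryAccessCommand:
    """Illustrative stand-in for memory access command 18 (field names assumed)."""
    is_write: bool                 # type of operation: read or write
    width_bits: int                # access data width (parallel data bits)
    is_instruction_fetch: bool     # instruction fetch cycle or data access
    is_forced_fetch: bool = False  # branch-destination fetch vs. linear fetch

    def kind(self) -> AccessKind:
        # Classify the access the way the bus controller needs to see it.
        if not self.is_instruction_fetch:
            return AccessKind.DATA
        if self.is_forced_fetch:
            return AccessKind.FORCED_FETCH
        return AccessKind.NORMAL_FETCH
```

A branch-destination fetch would then be written as `MemoryAccessCommand(False, 32, True, True)`, and a linear fetch as `MemoryAccessCommand(False, 32, True)`.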
[0040]
If the access instructed by the memory access command is an instruction fetch, the instruction read via the bus controller 4 is taken into the instruction decoder 13 via the internal data bus 15. The instruction decoder 13 decodes the instruction and, according to the decoded result, controls processing such as loading an operand from the external memory 2 into the general-purpose register 11, having the arithmetic unit 10 operate on the operand, and storing the operation result in the external memory 2, whereby the instruction is executed.
[0041]
If the access instructed by the memory access command is a data access, the data read via the bus controller 4 is taken into the general-purpose register 11 via the internal data bus 15, or the write data output from the general-purpose register 11 to the internal data bus 15 is written into the external memory 2 via the bus controller 4.
[0042]
The address map of the CPU 3 is illustrated in FIG. 2: H'00000000 to H'0FFFFFFF is the external memory space, and H'10000000 to H'FFFFFFFF is the internal memory and peripheral module space. The external memory space is divided, in order and by predetermined capacity, into CS0 to CS3. Although not particularly limited, the type of memory device that can be connected to each of the external memory spaces CS0 to CS3 is selected in advance from ROM (Read Only Memory), SRAM (Static Random Access Memory), burst ROM, DRAM (Dynamic Random Access Memory), and SDRAM (Synchronous DRAM). The external memory 2 is composed of the memory devices arranged in the external memory spaces CS0 to CS3; that is, it is a generic name for the memory devices arranged in these four memory spaces. Although not particularly limited, the external program memory area is allocated to a certain area from the top of the memory space CS0.
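The decoding implied by this address map can be sketched as follows. The text only says that CS0 to CS3 each take a predetermined capacity, so the equal 64 MB split used here is an assumption, as are the names `decode_region`, `EXTERNAL_TOP`, and `CS_SIZE`.

```python
EXTERNAL_TOP = 0x0FFFFFFF   # top of the external memory space
CS_SIZE = 0x04000000        # ASSUMED: four equal 64 MB chip-select spaces


def decode_region(address: int) -> str:
    """Return 'CS0'..'CS3' for external addresses, else 'internal'."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address outside the 32-bit address map")
    if address <= EXTERNAL_TOP:
        return f"CS{address // CS_SIZE}"
    # Internal memory and peripheral module space.
    return "internal"
```

Under this assumed split, the program area at the top of CS0 starts at address 0, and any address from H'10000000 upward never reaches the external bus.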
[0043]
The bus controller 4 performs access control for each of the memory spaces CS0 to CS3 of the external memory 2. The external memory access control unit 20 generates an access control signal 25 necessary for the access control of the memory chips in the respective memory spaces CS0 to CS3 for each address space. For example, when the memory space CS2 in which the DRAM is arranged is to be accessed, the external memory access control unit 20 outputs a row address strobe signal, a column address strobe signal, a write enable signal, and the like. When the memory space CS3 in which the SRAM is arranged is to be accessed, the external memory access control unit 20 outputs a chip enable signal, a read / write signal, and the like.
[0044]
Which memory device is assigned to which memory space is determined by the set value of the external memory access setting register 21. For example, a setting area for control code information (memory device control code) 26 indicating the characteristics of the memory device (the number of wait state cycles required, the number of parallel input/output data bits, and so on) is provided for each memory space, and the memory device control code 26 that has been set is given to the external memory access control unit 20.
[0045]
Which memory space is to be accessed is determined by decoding the address on the internal address bus 16 with the memory access address decoder 22 and giving the decoded result to the external memory access control unit 20. Whether the access request from the CPU 3 is a data access or an instruction fetch, and a read operation or a write operation, is determined by decoding the memory access command on the memory access command bus 17 with the memory access command decoder 23, and the result is given to the external memory access control unit 20.
[0046]
The external memory access control unit 20 refers to this input information, gives access control information such as chip selection to the memory device of the external memory 2 that is to be accessed, and controls supply of the address signal and input/output of data via the address/data input/output control unit 24. In a data access, read data and write data pass through the data path 27.
[0047]
For instruction prefetch, the bus controller 4 has, for example, three instruction buffers Buf4, Buf8, and BufC, flags Flg4, Flg8, and FlgC each specific to one instruction buffer, a buffer controller 30, an input stage selector 31, and an output stage selector 32. The input stage selector 31 performs 1:4 output selection, and the output stage selector 32 performs 4:1 input selection. A through path 33 and the instruction buffers Buf4, Buf8, and BufC are arranged in parallel between the output of the input stage selector 31 and the input of the output stage selector 32.
[0048]
Although not particularly limited, the instructions of the CPU 3 have a fixed length of 16 bits, and the CPU 3 fetches instructions in units of two instructions (32 bits). The address signal output from the CPU 3 is a byte address whose minimum unit is a byte (8 bits). Accordingly, the instruction buffers Buf4, Buf8, and BufC each have 32 bits. Attending to the lower 4 bits of the byte address makes it possible to manage 16 consecutive bytes of instructions. Therefore, the instruction buffer Buf4 is assigned as the prefetch area for instruction addresses whose lower 4 bits are H′4 (= B′0100), the instruction buffer Buf8 as the prefetch area for instruction addresses whose lower 4 bits are H′8 (= B′1000), and the instruction buffer BufC as the prefetch area for instruction addresses whose lower 4 bits are H′C (= B′1100). The logic for this address assignment of the instruction buffers is formed in the buffer controller 30.
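The assignment of buffers to the lower 4 bits of the byte address amounts to a fixed lookup, which the following sketch illustrates. The names `BUFFER_FOR_LOW_BITS` and `buffer_for` are hypothetical, and rejecting addresses that are not 32-bit aligned is an assumption based on the 32-bit fetch unit.

```python
# Fixed assignment of lower-4-bit values to instruction buffers.
BUFFER_FOR_LOW_BITS = {0x4: "Buf4", 0x8: "Buf8", 0xC: "BufC"}


def buffer_for(address: int):
    """Return the buffer assigned to a byte address, or None for H'0.

    A lower-4-bit value of H'0 has no buffer; such a fetch uses the
    through path instead.
    """
    low = address & 0xF
    if low % 4 != 0:
        # ASSUMPTION: instruction fetches are aligned to the 32-bit unit.
        raise ValueError("instruction fetch address must be 32-bit aligned")
    return BUFFER_FOR_LOW_BITS.get(low)
```

Because the lookup depends only on the address bits, no tag comparison or read/write pointer is needed to locate a prefetched instruction.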
[0049]
When the memory access command decoder 23 detects an instruction fetch request, the buffer controller 30 determines, based on the decoded output, whether the instruction fetch is a normal instruction fetch or a forced instruction fetch. The buffer controller 30 also receives the lower 4 bits of the internal address bus 16 and determines their value.
[0050]
When the buffer controller 30 determines that the instruction fetch is a normal instruction fetch directed to the instruction address whose lower 4 bits hold the head value (H′0), it controls the instruction fetch from the external memory 2 for that head instruction address and the prefetch of instructions into the instruction buffers Buf4, Buf8, and BufC for the following addresses. That is, the buffer controller 30 has the external memory access control unit 20 read a 32-bit instruction from the memory space CS0 via the address/data input/output control unit 24 and output the read instruction to the input stage selector 31. The buffer controller 30 routes the instruction supplied to the input stage selector 31 to the through path 33, has the output stage selector 32 select the through path 33, and outputs the instruction to the internal data bus 15 so that it can be taken into the instruction decoder 13. Thereafter, the buffer controller 30 sequentially changes the lower 4 bits of the instruction fetch address to H′4, H′8, and H′C, and stores the instructions at those addresses in the instruction buffers Buf4, Buf8, and BufC in turn. The buffer controller 30 issues these accesses to the external memory 2 through the external memory access control unit 20 while the CPU 3 is not requesting access to the external memory. Each time the buffer controller 30 stores an entry in one of the instruction buffers Buf4, Buf8, and BufC, it sets the corresponding flag Flg4, Flg8, or FlgC to the valid state (set state).
[0051]
When the instruction fetch is a normal instruction fetch and the lower 4 bits of the instruction address are H′4, H′8, or H′C, the buffer controller 30 has the output stage selector 32 select the output of the corresponding one of the already prefetched instruction buffers Buf4, Buf8, and BufC, and makes that output available to the instruction decoder 13. The buffer controller 30 then sets the flag corresponding to the instruction buffer that output the buffer entry to the invalid state (reset state).
[0052]
When the instruction fetch is a forced instruction fetch, the buffer controller 30 first forces the flags Flg4, Flg8, and FlgC to the invalid state. Next, regardless of the value of the lower 4 bits of the instruction address, it controls the instruction fetch from the external memory 2 for that instruction address and the prefetch of instructions into the instruction buffers for the following addresses. That is, the buffer controller 30 has the external memory access control unit 20 read the instruction for the forced instruction fetch from the memory space CS0 via the address/data input/output control unit 24 and output the read instruction to the input stage selector 31. The buffer controller 30 routes the instruction supplied to the input stage selector 31 to the through path 33, has the output stage selector 32 select the through path 33, and outputs the instruction to the internal data bus 15 so that it can be taken into the instruction decoder 13. Thereafter, the buffer controller 30 sequentially advances the lower 4 bits of the forced instruction fetch address up to H′C and stores the instruction at each address in the corresponding instruction buffer. For example, if the lower 4 bits of the forced instruction fetch address are H′4, the lower 4 bits of the instruction address are sequentially changed to H′8 and H′C, and instructions are prefetched into the instruction buffers Buf8 and BufC. The buffer controller 30 issues these accesses to the external memory 2 through the external memory access control unit 20 while the CPU 3 is not requesting access to the external memory. As above, the flag of each instruction buffer in which a buffer entry is stored is set to the valid state (set state).
[0053]
FIGS. 3 and 4 illustrate an instruction fetch and prefetch control procedure by the data processor 1.
[0054]
If the access request to the external memory 2 is data access, a read / write operation for the issued address is performed (S1).
[0055]
If the access is not a data access, it is determined whether the access request is a forced instruction fetch (S2). If it is, the flags Flg4, Flg8, and FlgC are reset (S3), and the value of the lower 4 bits of the fetch address at that time is determined (S4 to S7). For example, when the instruction address of the forced instruction fetch is 16n + 0 (lower 4 bits = H′0), the instruction at instruction address 16n + 0 is transferred from the external memory 2 to the instruction decoder 13 (S8), which enables the CPU 3 to decode and execute the fetched instruction. Thereafter, while the CPU 3 is not accessing the external memory 2, the bus controller 4 prefetches instructions from the following instruction addresses 16n + 4 (lower 4 bits = H′4), 16n + 8 (lower 4 bits = H′8), and 16n + C (lower 4 bits = H′C) into the corresponding instruction buffers Buf4, Buf8, and BufC and sets the corresponding flags Flg4, Flg8, and FlgC (S9 to S14). When the instruction address of the forced instruction fetch is 16n + 4 or 16n + 8, the instruction at that address is supplied to the decoder (S15, S20), instructions are prefetched from the following instruction addresses into the instruction buffers, and the corresponding flags are set (S16 to S19, S21 to S22). When the instruction address of the forced instruction fetch is 16n + C, the instruction at that address is fetched by the decoder (S23), and no instruction prefetch into the instruction buffers is performed.
[0056]
When the determination result of step S2 is a normal instruction fetch, the value of the lower 4 bits of the fetch address at that time is determined as shown in FIG. 4 (S30 to S33). For example, when the instruction address of the normal instruction fetch is 16n + 0 (lower 4 bits = H′0), the instruction at instruction address 16n + 0 is transferred from the external memory 2 to the instruction decoder 13 (S34), which enables the CPU 3 to decode and execute the fetched instruction. Thereafter, while the CPU 3 is not accessing the external memory 2, the bus controller 4 prefetches instructions from the following instruction addresses 16n + 4 (lower 4 bits = H′4), 16n + 8 (lower 4 bits = H′8), and 16n + C (lower 4 bits = H′C) into the corresponding instruction buffers Buf4, Buf8, and BufC and sets the corresponding flags Flg4, Flg8, and FlgC (S34 to S40). When the instruction address of the normal instruction fetch is 16n + 4, 16n + 8, or 16n + C, the bus controller waits for the flag Flg4, Flg8, or FlgC corresponding to the value of the lower 4 bits of the instruction address to be set (S41 to S43), supplies the instruction from the corresponding instruction buffer Buf4, Buf8, or BufC to the decoder 13 (S44 to S46), and resets the corresponding flag after the supply (S47 to S49).
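The procedure of FIGS. 3 and 4 can be approximated by the following behavioral sketch. It is a simplification under stated assumptions: the class and method names are hypothetical, the external memory is a dictionary, and prefetch is performed immediately rather than in idle bus cycles, so the wait for a flag to be set (S41 to S43) never actually occurs in the model and is replaced by an error.

```python
class BufferController:
    """Toy model of buffer controller 30 with Buf4/Buf8/BufC and their flags."""

    SLOTS = (0x4, 0x8, 0xC)

    def __init__(self, memory):
        self.memory = memory                      # dict: byte address -> word
        self.buf = {s: None for s in self.SLOTS}
        self.flag = {s: False for s in self.SLOTS}
        self.external_reads = 0                   # external memory accesses

    def _read_external(self, addr):
        self.external_reads += 1
        return self.memory[addr]

    def _prefetch_after(self, addr):
        # Prefetch from the following address up to low bits H'C.
        base, low = addr & ~0xF, addr & 0xF
        for s in self.SLOTS:
            if s > low:
                self.buf[s] = self._read_external(base | s)
                self.flag[s] = True               # valid (set state)

    def fetch(self, addr, forced=False):
        low = addr & 0xF
        if forced:
            for s in self.SLOTS:                  # S3: reset all flags
                self.flag[s] = False
        if forced or low == 0x0:
            word = self._read_external(addr)      # through path to decoder
            self._prefetch_after(addr)
            return word
        if not self.flag[low]:
            # Hardware would wait here; the model treats it as an error.
            raise RuntimeError("buffer entry not valid")
        self.flag[low] = False                    # invalid (reset state)
        return self.buf[low]
```

In a linear run each word is read from the external memory exactly once, and three out of every four fetches by the CPU are served from the buffers rather than from the external bus.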
[0057]
FIG. 5 illustrates the operation timing of the memory device arranged in the memory space CS0 of the external memory 2. The operation timing shown in the figure indicates a memory read operation in a page mode of a flash memory having a page mode, for example.
[0058]
A flash memory is an electrically rewritable semiconductor memory device whose memory elements are memory cell transistors each having a source, a drain, a floating gate, and a control gate. In FIG. 5, address A[19:3] is the 17-bit page address signal of the memory. Accesses within the same page can be sped up by sequentially switching the 3-bit in-page address signal A[2:0]. Also, given the characteristics of a program that contains almost no branch instructions and the like and executes instructions linearly, outputting CE, the page address of the next instruction, and the like when the instruction stored in BufC is output can shorten the time until that next instruction is read. With instruction prefetch in mind, adopting a flash memory with a page mode as the memory device of the memory space CS0 that stores the program can contribute to speeding up instruction prefetch, which must be performed in the idle periods of external memory access by the CPU 3. In FIG. 5, CE is a chip enable signal instructing chip selection, OE is an output enable signal instructing output operation, and WE is a write enable signal instructing write operation.
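The split between the page address A[19:3] and the in-page address A[2:0] can be illustrated as below. The function names are hypothetical, and the sketch treats the 20-bit device address as a plain integer without assuming what unit of data each address selects.

```python
def split_page_address(a: int):
    """Split a 20-bit device address into (page A[19:3], in-page A[2:0])."""
    a &= 0xFFFFF            # keep the 20 address bits A[19:0]
    return a >> 3, a & 0x7


def same_page(a: int, b: int) -> bool:
    """True when two addresses differ only in A[2:0], so page mode applies."""
    return split_page_address(a)[0] == split_page_address(b)[0]
```

Consecutive prefetch accesses that satisfy `same_page` only need the in-page bits to change, which is what makes the page-mode read faster than a fresh page access.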
[0059]
FIG. 6 illustrates the operation timing of another memory device arranged in the memory space CS0 of the external memory 2. The operation timing shown in the figure shows the memory read operation by the burst operation of the SDRAM. The SDRAM has a plurality of memory banks each having a dynamic memory cell composed of a selection transistor and a storage capacitor, and is operated in synchronization with a clock based on a command given in synchronization with a clock signal. Burst length (number of continuous output data) and CAS latency (number of clock cycles from the start of column system operation to data output) of the burst operation are preset in the control register of the SDRAM.
[0060]
In the SDRAM, a command or data can be input while the chip select signal /CS is at the low level. When command input is enabled by the chip select signal /CS and a bank active command is specified by the signal states of the row address strobe signal /RAS, the column address strobe signal /CAS, and the write enable signal /WE, the bank and the row address are specified by the address signal input along with the command, and a word line selection operation is performed according to the row address. Next, when command input is enabled by the chip select signal /CS and a bank read command is specified by the signal states of the row address strobe signal /RAS, the column address strobe signal /CAS, and the write enable signal /WE, a column address is specified by the address signal input along with the command, column-system operations such as bit line selection are performed, and the data D1 thus read is output to the outside after the number of clock signal cycles specified by the CAS latency has elapsed. In the example of FIG. 6, the CAS latency is 2. Thereafter, the column-system operation is repeated, with the column address sequentially updated by the internal address counter, as many times as corresponds to the designated burst length; for example, with a burst length of 4, data D2, D3, and D4 are output in synchronization with the clock cycles of the clock signal CLK. With instruction prefetch in mind, adopting the burst-capable SDRAM of FIG. 6 as the memory device of the memory space CS0 that stores the program can contribute to speeding up instruction prefetch, which must be performed in the idle periods of external memory access by the CPU 3.
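The relationship between the read command, the CAS latency, and the burst length can be sketched as a small timing helper. The function name and the convention of counting clock cycles from the cycle in which the read command is issued are assumptions made for illustration; FIG. 6 itself is not reproduced here.

```python
def burst_output_cycles(read_cycle: int, cas_latency: int = 2,
                        burst_length: int = 4):
    """Return the clock cycles on which D1..Dn appear for one burst read.

    ASSUMPTION: cycles are counted from the cycle of the read command,
    and one data item is output per clock once the CAS latency elapses.
    """
    return [read_cycle + cas_latency + i for i in range(burst_length)]
```

With the CAS latency of 2 and the burst length of 4 used in the example, a read command issued in cycle 0 yields D1 to D4 in cycles 2 to 5 under this counting convention, so one command delivers four consecutive words.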
[0061]
In view of the characteristics of a program that has almost no branch instructions and executes instructions linearly, the word line selection operation can be performed when the instruction stored in BufC is output, which shortens the time until the data is read.
[0062]
FIG. 7 shows a block diagram of a cellular phone system to which the data processor of FIG. 1 is applied. The cellular phone system is roughly divided into an analog unit 40 and a digital unit 41. In the analog unit 40, an antenna switch 43 serving as a duplexer is connected to the antenna 42. The high frequency signal received by the antenna 42 has its noise removed by a low noise amplifier (LNA) 44, is detected and decoded by a detection/decoding circuit (DEM) 45, is converted into digital data by the A/D converter 46, and is supplied to the digital unit 41. The digital transmission data provided from the digital unit 41 is modulated by, although not limited to, a GMSK (Gaussian Filtered Minimum Shift Keying) modulation circuit 47 and converted to an analog signal by a D/A conversion circuit 48. The converted analog signal is encoded by an encoding circuit (MOD) 49, and the encoded signal is amplified into a high frequency signal by a high frequency amplifier (HPA) 50 and transmitted from the antenna 42. The encoding circuit (MOD) 49 and the detection/decoding circuit (DEM) 45 operate in synchronization with the clock signal generated by the PLL circuit 51.
[0063]
The digital unit 41 includes, but is not limited to, a digital signal processing unit (DSP) 53, a time division multiple access control unit (TDMA) 54, the data processor 1, and the external memory 2. The digital signal processing unit 53 realizes the equalizer 55, the channel codec 56, the voice compression/decompression unit 57, the Viterbi processing unit 58, and the encryption processing unit 59 by means of a product-sum operation circuit and its operation program (not shown). The equalizer 55 equalizes the output of the A/D converter 46, the logical value of the equalized data is determined by the Viterbi processing unit 58, and the determination result is given to the channel codec 56, where a predetermined format conversion is performed. The voice compression/decompression unit 57 then decompresses the data, and the decompressed data is output from the speaker 61 via the D/A converter 60. The voice input to the microphone 62 is converted into digital voice data by the A/D converter 63, compressed by the voice compression/decompression unit 57, subjected to a predetermined format conversion by the channel codec 56, and given to the GMSK modulation circuit 47.
[0064]
The data processor 1 controls the operations of the analog unit 40 and the digital unit 41 in real time during a call. Further, the data processor 1 performs protocol control processing and system control processing specific to mobile communication. The protocol control process is a process for determining which call area the mobile phone system belongs to during a call or waiting for an incoming call, changing a base station that controls the call area, and the like. The system control process is a process for detecting an instruction corresponding to a change in an operation button of the mobile phone system and controlling display on the display. The protocol control process and the system control process do not require strict real-time properties, and the program capacity is large. Therefore, the operation program for real-time control is stored in the built-in ROM of the data processor 1, and the operation program for protocol control processing and system control processing is stored in the external memory 2.
[0065]
The operation program for the protocol control process and the system control process has few loop instructions and consists mostly of processing that sequentially executes instructions at linear consecutive addresses. When such a program is executed, little benefit can be expected even if a cache memory is adopted in the data processor 1. Moreover, providing a cache memory on the data processor increases its transistor count, which raises the cost of the processor and enlarges its occupied area. If the data processor 1 having the instruction prefetch function described above is used instead, prefetch control is simple, because which of the instruction buffers Buf4, Buf8, and BufC serves as the buffer entry for an instruction read from the outside for prefetching is uniquely determined by the value of the predetermined lower 4 bits of the instruction address. This configuration for instruction prefetch can be realized more simply than a control mechanism based on cache memory address tags or a read/write pointer control mechanism based on FIFO buffer counters. It can therefore contribute to cost reduction and miniaturization of the mobile phone system.
[0066]
In particular, prefetching to the instruction buffers only needs to be triggered when the lower 4 bits of the instruction address take one predetermined value. For example, when there is an instruction fetch for an instruction address whose lower 4 bits are the head value (H′0), instructions are prefetched into the instruction buffers corresponding to the address order from the subsequent address up to the final address (H′C) of the lower 4 bits, which simplifies instruction prefetch control. Further, when there is an instruction fetch of a branch destination instruction by a branch instruction, instructions are prefetched into the instruction buffers corresponding to the address order from the address following the instruction fetch address up to the final address of the lower 4 bits. Thus, even when the instruction address sequence changes, instruction fetch remains efficient after the change.
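The buffer selection and flag handling described above can be sketched in software as follows. This is only an illustrative model under stated assumptions, not the circuit of the patent: the class name `PrefetchBuffers`, the `branch` argument, the dictionary standing in for the external memory, and the 4-byte instruction fetch unit (offsets H′4, H′8, H′C) are all assumptions.

```python
class PrefetchBuffers:
    """Hypothetical model of Buf4/Buf8/BufC selected by the lower 4 address bits."""

    OFFSETS = (0x4, 0x8, 0xC)  # assumed 4-byte instruction fetch unit

    def __init__(self, memory):
        self.memory = memory  # address -> instruction, stands in for external memory
        self.buf = {o: None for o in self.OFFSETS}
        self.flag = {o: False for o in self.OFFSETS}  # Flg4, Flg8, FlgC
        self.external_reads = 0

    def _read_external(self, addr):
        self.external_reads += 1
        return self.memory[addr]

    def _prefetch_after(self, addr):
        # Prefetch from the subsequent address up to offset H'C of the same block.
        base, low = addr & ~0xF, addr & 0xF
        for o in self.OFFSETS:
            if o > low:
                self.buf[o] = self._read_external(base | o)
                self.flag[o] = True  # flag becomes valid on prefetch

    def fetch(self, addr, branch=False):
        low = addr & 0xF
        if branch:
            # A change of instruction sequence invalidates every flag.
            for o in self.OFFSETS:
                self.flag[o] = False
        if self.flag.get(low, False):
            self.flag[low] = False  # flag becomes invalid on output
            return self.buf[low]
        insn = self._read_external(addr)  # no buffered copy: read directly
        if low == 0x0 or branch:
            self._prefetch_after(addr)
        return insn


memory = {a: ("insn", a) for a in range(0x100, 0x130, 4)}
pb = PrefetchBuffers(memory)
first = pb.fetch(0x100)   # head value H'0: triggers prefetch of H'4, H'8, H'C
second = pb.fetch(0x104)  # served from Buf4 without an external read
```

Because the buffer entry is chosen directly from the lower address bits, the model needs neither address tags nor read/write pointers, which is the simplification the text emphasizes.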
[0067]
FIG. 8 shows another example of the data processing system according to the present invention. In the data processing system shown in FIG. 8, a transfer control unit 211 is used in place of the external memory access setting register 21, and an external memory 200 with a page mode function is connected. In accordance with the burst transfer length determined by the transfer control unit 211, the buffer controller 30 performs control so that a maximum of n instructions can be transferred from the external memory 200 to the instruction buffers.
[0068]
The external memory (CS0 space) with the page mode function stores a program that executes instructions sequentially with relatively few branches and loops, such as mobile phone system protocol processing.
[0069]
FIG. 9 is a block diagram of the burst transfer length setting unit 250 of the transfer control unit 211, FIG. 10 shows the burst transfer length setting control flow, and FIG. 11 shows an example of changes in the up/down counter 253 and the burst word length setting register 254. The burst transfer length setting unit 250 counts the number of non-branch instructions executed between branch instructions. If many non-branch instructions are executed before a branch instruction appears, the burst transfer length is lengthened; if few are executed, the burst transfer length is shortened. The burst transfer length set as the initial value is not particularly limited, but may be four instructions.
[0070]
In the setting unit of FIG. 9, the burst word length setting register 254 is set when a branch instruction appears, but it may instead be set every time the up/down counter 253 reaches a predetermined value.
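The idea of adapting the burst transfer length to the run of non-branch instructions can be sketched as below. The flow of FIG. 10 is not reproduced here: the class name `BurstLengthSetter`, the comparison against the current burst length, the step of one instruction, and the minimum and maximum limits are all assumptions chosen for illustration. Only the initial value of four instructions comes from the text.

```python
class BurstLengthSetter:
    """Hypothetical stand-in for the burst transfer length setting unit 250."""

    def __init__(self, initial=4, minimum=1, maximum=16):
        self.burst_length = initial  # plays the role of setting register 254
        self.run_length = 0          # non-branch instructions since the last branch
        self.minimum = minimum       # assumed lower limit
        self.maximum = maximum       # assumed upper limit

    def on_instruction(self, is_branch):
        """Update state for one executed instruction; return the burst length."""
        if not is_branch:
            self.run_length += 1
            return self.burst_length
        # The register is updated when a branch instruction appears.
        if self.run_length > self.burst_length:
            self.burst_length = min(self.maximum, self.burst_length + 1)
        elif self.run_length < self.burst_length:
            self.burst_length = max(self.minimum, self.burst_length - 1)
        self.run_length = 0
        return self.burst_length


setter = BurstLengthSetter()
for _ in range(10):
    setter.on_instruction(is_branch=False)
after_long_run = setter.on_instruction(is_branch=True)   # long run: lengthen
setter.on_instruction(is_branch=False)
after_short_run = setter.on_instruction(is_branch=True)  # short run: shorten
```

A longer burst pays off only when the fetched instructions are actually executed, so shortening it after short runs limits the wasted transfers that follow a branch.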
[0071]
FIGS. 12 and 13 show the control procedure of instruction fetch and prefetch by the data processor 100. The control procedure shown in FIGS. 12 and 13 does not differ from the control procedure shown in FIGS. 3 and 4, except that the number of instructions to be stored (Sα) is increased because of the increase in the number of instruction buffers.
[0072]
FIG. 14 shows another example of the data processing system according to the present invention. The data processing system shown in FIG. 14 addresses the case where the CPU 3 needs to perform interrupt processing for some reason. This is because, when interrupt processing is performed, the instruction addresses executed by the CPU 3 are no longer consecutive, just as when a branch is taken by a branch instruction.
[0073]
The interrupt control unit 171 receives interrupts arising from various factors and notifies the CPU 3 that an interrupt request has been made. In response to the interrupt from the interrupt control unit 171, the instruction decoder 105 notifies the bus controller 4 that the interrupt processing program will be executed (153). In response to the notification, the buffer controller 30 performs the same processing as when a branch is taken by a branch instruction.
[0074]
FIG. 15 shows another example of the data processing system according to the present invention. The data processing system shown in FIG. 15 has two prefetch buffer tables (162, 163), each having n instruction buffers. The buffer controller 30 performs control so that, while the CPU 3 is using the prefetch buffer table 162, instructions are fetched from the external memory 200 and stored in the instruction buffers of the prefetch buffer table 163. Specifically, after the CPU 3 has fetched all the instructions stored in the instruction buffers (191, 157, 159) of the prefetch buffer table 162, the next instruction fetch is served from the instruction buffers of the prefetch buffer table 163, and the instructions fetched from the external memory 200 are stored in the instruction buffers of the prefetch buffer table 162.
[0075]
When the CPU 3 fetches all instructions stored in the instruction buffer of the prefetch buffer table 163, reverse switching control is performed.
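The alternation between the two prefetch buffer tables can be sketched as a simple sequential model. The patent describes hardware in which refilling overlaps with CPU fetches; the sketch below does not model that concurrency. The class name `PingPongPrefetch`, the list standing in for the external memory, and the table size of four are assumptions.

```python
from collections import deque


class PingPongPrefetch:
    """Hypothetical model of two prefetch buffer tables used alternately."""

    def __init__(self, memory, start, n=4):
        self.memory = memory    # list of instructions, stands in for external memory
        self.next_addr = start  # next index to read from external memory
        self.n = n              # instruction buffers per table (assumed)
        self.tables = [deque(), deque()]
        self.active = 0         # table currently supplying the CPU
        self._fill(0)
        self._fill(1)

    def _fill(self, t):
        # Store instructions at consecutive addresses into an empty table.
        for _ in range(self.n):
            if self.next_addr < len(self.memory):
                self.tables[t].append(self.memory[self.next_addr])
                self.next_addr += 1

    def fetch(self):
        insn = self.tables[self.active].popleft()
        if not self.tables[self.active]:
            emptied = self.active
            self.active ^= 1     # switch to the other table
            self._fill(emptied)  # refill the table that was just emptied
        return insn


program = list(range(12))
pp = PingPongPrefetch(program, start=0, n=4)
executed = [pp.fetch() for _ in range(10)]
```

In this model a full table is always ready at the moment of switching, which corresponds to the CPU not having to wait for the external memory in the linear case.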
[0076]
FIG. 16 shows a timing chart relating to switching of the prefetch buffer tables. After the instruction fetch of the branch instruction is performed at time t1, the bus controller 4 accesses the external memory 200 at the branch destination instruction address, and the instructions supplied from the external memory 200 are stored in the instruction buffers of the prefetch buffer table 162 (prefetch buffer table A) from time t4 to time t6. After the instruction stored in the last instruction buffer A3 of the prefetch buffer table A is fetched at time t8, the instructions supplied from the external memory 200 for the subsequent instruction addresses are stored in the instruction buffers of the prefetch buffer table 163 (prefetch buffer table B) from time t9 to time t12. As a result, the instruction stored in the instruction buffer B0 of the prefetch buffer table B can be supplied to the instruction fetch issued at time t10, and it is no longer necessary to wait for the instruction to be supplied from the external memory 200.
[0077]
An address output to the address bus of the external memory in order to read an instruction from the external memory 200 will be described.
[0078]
In the case of the instruction fetch of the branch instruction at time t1, the address of the instruction to be supplied is generated by the bus controller 4 using information output from the CPU 3 to the internal bus, and is output to the address bus of the external memory. On the other hand, in the case of the instruction fetch following a non-branch instruction at time t8, the address of the subsequent instruction can be calculated from information internal to the buffer controller 30, so the address of the instruction to be supplied can be output in advance, before the information is output to the internal bus.
[0079]
FIG. 17 shows the operation when a plurality of prefetch buffer tables are used. When an instruction fetch caused by a branch instruction is output from the CPU 3 (FIG. 17A), the read operation to the external memory is performed, and the instruction read from the external memory is written to the instruction buffer in parallel with being supplied to the instruction fetch of the CPU 3. In this case, the prefetch buffer table to be written is not particularly limited, but the instructions may be stored in the least recently used prefetch buffer table.
[0080]
On the other hand, when an instruction fetch caused by a non-branch instruction is performed (FIG. 17B), the CPU 3 waits until the flag of the instruction buffer indicated by the lower bits of the instruction address becomes valid; the CPU 3 then fetches the instruction, and the flag is set to the invalid state. When there is a prefetch buffer table from which all instructions have been fetched (that is, an empty table), the address following that of the last executed instruction fetch is determined regardless of whether an instruction fetch is output from the CPU 3, the read operation to the external memory 200 is performed, the instructions read from the external memory are stored in the instruction buffers of the empty prefetch buffer table, and the corresponding flags are set to the valid state.
[0081]
FIG. 18 shows another example of the data processing system according to the present invention. The data processing system shown in FIG. 18 has an instruction decoder 170 that determines whether an instruction read from the external memory 200 is a branch instruction or a non-branch instruction. If the instruction is a branch instruction, reading of the instructions that follow the branch instruction is interrupted.
[0082]
FIG. 19 shows a timing chart when branch instruction determination is performed by the instruction decoder 170.
[0083]
In the instruction reading from the external memory 200 that starts at time t3, when the instruction read at time t7 is found to be a branch instruction, the instruction reading (burst transfer) from the external memory 200 is interrupted. When the branch destination address of the branch instruction becomes known at time t10, reading of the next instruction is started (t12).
[0084]
Interruption of instruction reading from the external memory 200 need not be triggered only by detection of a branch instruction by the instruction decoder 170; it may also be triggered by detection of an interrupt factor. This is because, as described above for interrupt processing, when an interrupt factor is detected the instruction addresses executed by the CPU 3 are no longer consecutive, just as when a branch is taken by a branch instruction.
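Stopping a burst transfer once a branch instruction has been read can be sketched as follows. This is a minimal illustration rather than the patent's circuit: the function name `burst_read`, the `is_branch` predicate standing in for the instruction decoder 170, and the mnemonics in the example are hypothetical.

```python
def burst_read(memory, start, burst_length, is_branch):
    """Read up to `burst_length` instructions, stopping after a branch.

    `is_branch` is a stand-in for the decoder; any other stop condition,
    such as a detected interrupt factor, could be substituted.
    """
    fetched = []
    end = min(start + burst_length, len(memory))
    for addr in range(start, end):
        insn = memory[addr]
        fetched.append(insn)
        if is_branch(insn):
            break  # instructions after the branch are not read
    return fetched


# Hypothetical mnemonics; "bra" plays the role of a branch instruction.
program = ["add", "mov", "bra", "sub", "nop"]
read = burst_read(program, start=0, burst_length=4,
                  is_branch=lambda insn: insn == "bra")
```

Here the fourth word of the burst is never transferred, which is the overhead that interrupting the burst is meant to avoid.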
[0085]
FIG. 20 shows another example of the data processing system according to the present invention. The data processing system shown in FIG. 20 has an instruction decoder 170 that determines whether an instruction read from the external memory 200 is a branch instruction or a non-branch instruction, and an address calculator 172 that calculates the branch destination address of the branch instruction.
[0086]
FIG. 21 shows a timing chart when branch instruction determination is performed by the instruction decoder 170 and address calculation is performed by the address calculator 172.
[0087]
In the instruction reading from the external memory 200 that starts at time t3, when the instruction read at time t7 is found to be a branch instruction, the instruction reading (burst transfer) from the external memory 200 is interrupted, and instructions are read from the external memory 200 from time t10 at the branch destination address calculated by the address calculator 172. Thus, even when a branch instruction is detected, instruction execution by the CPU 3 is not interrupted to read the instruction at the branch destination address.
[0088]
When the instruction decoder 170 detects a branch instruction, it may also determine whether the branch instruction is a one-way branch instruction or a two-way branch instruction. A one-way branch instruction always branches to the branch destination address. A two-way branch instruction either branches to the branch destination address or, without branching, continues with the instruction at the following instruction address.
[0089]
If the detected branch instruction is a one-way branch instruction, reading is interrupted from the instruction following the branch instruction. If the detected branch instruction is a two-way branch instruction, control may be performed so that both the instructions following the branch instruction and the instructions at the branch destination address are stored in the prefetch buffer tables. As a result, the instruction to be executed by the CPU 3 is already stored in a prefetch buffer table whether or not the two-way branch instruction branches, so no time is spent reading it from the external memory 200. The instructions stored in the prefetch buffer table for the path that is not executed may be set to the invalid state once it is certain that they will not be executed.
[0090]
In the case of a two-way branch instruction, the number of instructions prefetched for each of the two paths, that is, the instructions following the branch instruction and the instructions at the branch destination address, is not particularly limited, but may be about two. After the branch instruction is detected, about two instructions following the branch instruction are read, reading from the external memory 200 is interrupted, and then about two instructions are read at the branch destination address. This is because, once about two instructions have been read, a further instruction read from the external memory 200 that starts when the path to be executed is determined will still arrive in time. The specific number may be determined in consideration of the time required for the CPU 3 to execute an instruction and the time required to read an instruction from the external memory 200.
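The choice of prefetch depth and of the addresses to prefetch for each kind of branch can be sketched as below. This is an illustrative reading, not the patent's method: the function names, the cycle-count parameters, the rule that the buffered instructions should cover one external read, and the 4-byte instruction size are all assumptions.

```python
def prefetch_depth(read_cycles, exec_cycles_per_insn):
    """Smallest number of buffered instructions whose execution time covers
    one external memory read (an assumed sizing rule)."""
    return max(1, (read_cycles + exec_cycles_per_insn - 1) // exec_cycles_per_insn)


def prefetch_plan(branch_addr, target_addr, two_way, depth=2, insn_size=4):
    """Return the addresses to prefetch for each path of a detected branch.

    For a one-way branch nothing after the branch is read; applying the
    same depth to its target path is a simplification of this sketch.
    """
    target = [target_addr + i * insn_size for i in range(depth)]
    if not two_way:
        return {"fall_through": [], "target": target}
    fall_through = [branch_addr + (i + 1) * insn_size for i in range(depth)]
    return {"fall_through": fall_through, "target": target}


depth = prefetch_depth(read_cycles=6, exec_cycles_per_insn=3)
plan = prefetch_plan(branch_addr=0x108, target_addr=0x200, two_way=True, depth=depth)
```

With the assumed figures the depth comes out at two instructions per path, matching the "about two instructions" suggested in the text. Other memory and CPU timings would give a different depth.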
[0091]
FIG. 22 shows another example of the data processing system according to the present invention. The data processing apparatus shown in FIG. 22 has an operand buffer (176, 177) together with a prefetch buffer table.
[0092]
FIG. 23 shows a timing chart for the case where the operand buffers (176, 177) are provided. When the instruction decoder detects an instruction that requires data to be read from the external memory 200 at the address indicated by its operand (t6), the address calculator 172 calculates the address indicated by the operand, the operand data is read from the external memory 200 (t9), and the data read is stored in the operand buffers (176, 177). This shortens the time for which execution by the CPU 3 is suspended, compared with accessing the external memory 200 only after waiting for the operand fetch (t8) from the CPU 3. After reading of the operand data is completed, reading of the subsequent instructions may be continued.
[0093]
FIG. 24 shows another example of the data processing system according to the present invention. The data processing system shown in FIG. 24 has a cache memory in addition to a prefetch buffer. Since protocol processing has relatively few branches and loops, it is difficult to improve processing efficiency with a cache memory alone, and it is therefore useful to use a prefetch buffer. However, if only the prefetch buffer is used, access to the external memory 200 is required even when a branch or loop returns to an address that has already been executed. In such a case, the cache memory is useful. Furthermore, apart from branching and loop processing within the protocol processing, the protocol processing program itself is executed repeatedly at predetermined time intervals. Storing the entire program in the cache memory is not realistic, but even if only part of the program is stored in the cache memory, access to the external memory 200 becomes unnecessary for that part, which is useful. Therefore, an instruction stored in the cache memory is read from the cache memory, and an instruction not stored in the cache memory is read in advance from the external memory 200 using the prefetch buffer.
[0094]
Furthermore, the prefetch buffer may include the instruction decoder 170 and the address calculator 172 shown in FIG. 20 to detect a branch instruction and calculate a branch destination address. When branching to an address whose branch destination address is smaller than the currently executing instruction address, there is a high possibility that the instruction at the branch destination address is stored in the cache memory. Whether or not an instruction is stored is checked by the cache memory controller 184, and if stored, the stored instruction may be read from the cache memory.
[0095]
On the other hand, if the branch destination address is larger than the currently executing instruction address, or if the instruction is not stored in the cache memory, the instruction prefetch may be performed on the branch destination address.
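The choice between the cache memory and the prefetch path for a branch destination can be sketched as a small decision function. It is a simplified illustration: the function name `instruction_source` and the set standing in for the lookup performed by the cache memory controller 184 are assumptions.

```python
def instruction_source(branch_target, current_addr, cached_addrs):
    """Decide where the branch destination instruction is read from.

    `cached_addrs` is a hypothetical stand-in for the cache lookup.
    """
    # A backward branch is likely to hit the cache, so the cache is checked.
    if branch_target < current_addr and branch_target in cached_addrs:
        return "cache"
    # Forward branch, or not stored in the cache: prefetch from external memory.
    return "prefetch"


cached = {0x100, 0x104, 0x108}
backward_hit = instruction_source(0x104, 0x140, cached)
backward_miss = instruction_source(0x0F0, 0x140, cached)
forward = instruction_source(0x180, 0x140, cached)
```

The cache is consulted only for backward branches in this sketch because those are the cases the text identifies as likely hits.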
[0096]
Although the invention made by the present inventor has been specifically described based on the embodiments, it is needless to say that the present invention is not limited thereto and can be variously modified without departing from the gist thereof.
[0097]
For example, it goes without saying that the data processor may include an appropriate circuit module in addition to the instruction execution means such as the CPU and the bus controller. For example, a memory management unit, a floating point arithmetic unit, a product-sum arithmetic unit, a data cache memory, a direct memory access controller, a timer / counter, and the like may be incorporated as necessary.
[0098]
It is also possible to configure the data processor so that prefetching is not performed when a branch destination instruction is fetched by a branch instruction. Further, from the viewpoint of simplifying control of instruction prefetch and instruction fetch, it is preferable that the size of each instruction buffer equal the instruction size that is the unit of instruction fetch. However, the present invention is not limited to this, and an instruction buffer having a capacity that is an integral multiple of that instruction size may also be employed.
[0099]
The above description has dealt mainly with the case where the invention made by the present inventor is applied to a mobile phone system, which is the field of application that forms its background. However, the present invention is not limited thereto and can be widely applied to data processing systems such as other communication terminals and portable information terminals.
[0100]
[Effects of the invention]
The effects obtained by the representative ones of the inventions disclosed in the present application will be briefly described as follows.
[0101]
That is, an instruction prefetch can be performed from the outside with a relatively simple configuration, and a data processor that can improve instruction execution efficiency can be realized.
[0102]
In addition, processing that has few loop instructions and mainly fetches instructions at linear consecutive addresses from an external memory and executes them sequentially can be sped up by a relatively simple instruction prefetch mechanism in the data processor.
[0103]
Furthermore, the data processing efficiency in a data processing system that executes a subroutine program with few branch processes for changing the execution order of consecutive instruction addresses can be improved at a relatively low cost.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 2 is an address map of a CPU built in the data processor.
FIG. 3 is a flowchart showing a control procedure of instruction fetch and prefetch by the data processor together with FIG. 4;
FIG. 4 is a flowchart showing a control procedure of instruction fetch and prefetch by the data processor together with FIG. 3;
FIG. 5 is a timing chart of a memory read operation in the page mode when a flash memory having a page mode is employed as the external memory.
FIG. 6 is a timing chart of the burst read operation when an SDRAM having a burst operation is employed as the external memory.
FIG. 7 is a block diagram of a mobile phone system to which the data processor of FIG. 1 is applied.
FIG. 8 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 9 is a block diagram showing an example of a burst transfer length setting unit in FIG. 8.
FIG. 10 is a flowchart showing a burst transfer length setting procedure in the burst transfer length setting unit of FIG. 9;
FIG. 11 is an explanatory diagram showing an example of a change in burst transfer length set by the burst transfer length setting unit in FIG. 8.
FIG. 12 is a flowchart showing a control procedure of instruction fetch and prefetch by the data processor together with FIG. 13;
FIG. 13 is a flowchart showing a control procedure of instruction fetch and prefetch by the data processor together with FIG. 12;
FIG. 14 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 15 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 16 is a timing chart showing instruction fetching by the data processor, instructions stored in the instruction buffer, and access to the external memory when the plurality of prefetch buffer tables of FIG. 15 are provided.
FIG. 17 is a flowchart showing the operation of the instruction buffer in each case of a branch instruction and a non-branch instruction.
FIG. 18 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 19 is a timing chart showing instruction fetch by the data processor, instruction stored in the instruction buffer, and access to the external memory including branch instruction detection when the instruction decoder of FIG. 18 is included.
FIG. 20 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 21 is a timing chart showing instruction fetch by the data processor, instruction stored in the instruction buffer, and access to the external memory including branch instruction detection when the address calculator of FIG. 20 is included.
FIG. 22 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
FIG. 23 is a timing chart showing instruction fetch by the data processor, instruction stored in the instruction buffer, and access to the external memory including detection of operand fetch instruction when the operand fetch function of FIG. 22 is provided.
FIG. 24 is a block diagram showing an example of a data processing system according to the present invention together with a data processor.
[Explanation of symbols]
1 Data processor
2 External memory
CS0 to CS3 memory space
3 CPU
4 Bus controller
12 Program counter
13 Instruction decoder
14 Memory access command generator
20 External memory access controller
22 Memory access address decoder
23 Memory access command decoder
24 Address / data input / output controller
30 Buffer controller
31 Input stage selector
32 Output stage selector
33 Through route
Buf4, Buf8, BufC instruction buffer
Flg4, Flg8, FlgC flags
40 Analog part
41 Digital part

Claims

A data processor comprising: an instruction execution means that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution means,
wherein the bus controller has a plurality of instruction buffers, a flag specific to each instruction buffer, and a buffer control circuit, and
wherein the buffer control circuit assigns to each of the instruction buffers a unique value that can be taken by a lower plurality of bits of an instruction address, prefetches instructions, starting from the address subsequent to a predetermined instruction fetch address, into the instruction buffers corresponding to the address order of the lower plurality of bits, sets the corresponding flag to a valid state in response to an instruction prefetch, and sets the corresponding flag to an invalid state in response to output of a prefetched instruction.

2. The data processor according to claim 1, wherein the buffer control circuit outputs the instruction held in an instruction buffer to the instruction execution means on condition that the flag of the instruction buffer allocated corresponding to the value of the lower plurality of bits of the instruction address to be fetched by the instruction execution means is in the valid state.

3. The data processor according to claim 2, wherein the buffer control circuit enables instruction prefetch into an instruction buffer on condition that the flag of that instruction buffer is in the invalid state.

4. The data processor according to claim 3, wherein the buffer control circuit initializes all the flags to the invalid state in response to an instruction by the instruction execution means to change the execution order of consecutive instruction addresses.

5. The data processor according to claim 1, wherein each instruction buffer has the number of bits of an instruction fetch unit of the instruction execution means.

A data processor comprising: an instruction execution means that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution means,
wherein the bus controller has a plurality of instruction buffers each having the number of bits of an instruction fetch unit of the instruction execution means, a flag specific to each instruction buffer, and a buffer control circuit, and
wherein the buffer control circuit assigns to each instruction buffer a unique value that can be taken by a lower plurality of bits of an instruction address; when there is an instruction fetch for an instruction address whose lower plurality of bits have the leading value, prefetches instructions into the instruction buffers corresponding to the address order up to the final address of the lower plurality of bits; when there is an instruction fetch of a branch destination instruction by a branch instruction, prefetches instructions into the instruction buffers corresponding to the address order from the address subsequent to the instruction fetch address up to the final address of the lower plurality of bits; sets the corresponding flag to a valid state in response to an instruction prefetch; and outputs the instruction held in an instruction buffer to the instruction execution means on condition that the flag of the instruction buffer allocated corresponding to the value of the lower plurality of bits of the instruction address to be fetched by the instruction execution means is in the valid state.

7. The data processor according to claim 6, wherein the buffer control circuit sets the corresponding flag to the invalid state in response to output of an instruction prefetched into an instruction buffer, enables instruction prefetch into an instruction buffer on condition that the flag of that instruction buffer is in the invalid state, and initializes all the flags to the invalid state in response to an instruction fetch request caused by a branch instruction.

A data processing system comprising a data processor formed on a single semiconductor chip and having an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction, and a bus controller that is provided with a plurality of instruction buffers, a flag unique to each instruction buffer, and a buffer control circuit, and that controls external bus access based on an instruction from the instruction execution unit; and
a memory that stores an operation program of the data processor and is a target of external bus access by the bus controller,
wherein the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; when there is an instruction fetch for an instruction address whose low-order bits have the leading value, prefetches instructions into the instruction buffers corresponding, in address order, to the addresses up to the final address given by the low-order bits; when there is an instruction fetch of a branch destination instruction caused by a branch instruction, prefetches instructions into the instruction buffers corresponding, in address order, to the addresses from the address following the instruction fetch address up to the final address given by the low-order bits; sets the corresponding flag to the valid state in response to each instruction prefetch; and outputs the instruction held in an instruction buffer to the instruction execution unit on condition that the flag of the instruction buffer assigned the value of the low-order bits of the instruction address to be fetched by the instruction execution unit is in the valid state.

A data processing system comprising a data processor formed on a single semiconductor chip and having an instruction execution means that fetches an instruction, decodes the fetched instruction, and executes the instruction, and a bus controller that has a plurality of instruction buffers each having the number of bits of the instruction fetch unit of the instruction execution means, a flag unique to each instruction buffer, and a buffer control circuit, and that controls external bus access based on an instruction from the instruction execution means; and
a memory that stores an operation program of the data processor and is a target of external bus access by the bus controller,
wherein the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; when there is an instruction fetch for an instruction address whose low-order bits have the leading value, prefetches instructions into the instruction buffers corresponding, in address order, to the addresses up to the final address given by the low-order bits; when there is an instruction fetch of a branch destination instruction caused by a branch instruction, prefetches instructions into the instruction buffers corresponding, in address order, to the addresses from the address following the instruction fetch address up to the final address given by the low-order bits; sets the corresponding flag to the valid state in response to each instruction prefetch; outputs the instruction held in an instruction buffer to the instruction execution means on condition that the flag of the instruction buffer assigned the value of the low-order bits of the instruction address to be fetched by the instruction execution means is in the valid state; sets the corresponding flag to the invalid state in response to output of an instruction prefetched into an instruction buffer; is able to prefetch an instruction into an instruction buffer on condition that its flag is in the invalid state; and initializes all the flags to the invalid state in response to an instruction fetch directed by a branch instruction.

A mobile phone comprising a data processing device, a memory, and a bus connected to the data processing device and the memory,
wherein the memory stores at least a program for protocol control or system control,
the data processing device has an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction, and a bus controller that has a plurality of instruction buffers each having the number of bits of the instruction fetch unit of the instruction execution unit, a flag corresponding to each instruction buffer, and a buffer control circuit, and that controls access to the memory via the bus based on a signal from the instruction execution unit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take,
when there is an instruction fetch to the instruction address corresponding to the minimum value represented by the low-order bits of the instruction address, the buffer control circuit stores the instructions from the instruction address following that instruction address up to the last instruction address represented by the low-order bits in the respective instruction buffers, among the plurality of instruction buffers, that correspond to those instruction addresses, and sets the flags corresponding to those instruction buffers to a first state, and
in response to an instruction fetch request from the instruction execution unit, if the flag corresponding to the instruction buffer that corresponds to the low-order bits of the instruction address to be fetched is in the first state, the buffer control circuit outputs the instruction stored in that instruction buffer to the instruction execution unit and sets the flag to a second state.

The mobile phone according to claim 10, wherein, when the flag corresponding to the instruction buffer that corresponds to the low-order bits of the instruction address to be fetched, as output from the instruction execution unit, is in the second state, the instructions from the instruction address following that instruction address up to the last instruction address represented by the low-order bits are stored in the respective instruction buffers, among the plurality of instruction buffers, that correspond to those instruction addresses, and the flags corresponding to those instruction buffers are set to the first state.

12. The mobile phone according to claim 11, wherein, among the instructions at instruction addresses to be fetched that are output by the instruction execution unit, the instruction at the instruction address corresponding to the minimum value represented by the low-order bits of the instruction address, or the instruction at an instruction address for which the flag of the instruction buffer corresponding to the value of the low-order bits is in the second state, is read from the memory and then supplied to the instruction execution unit as is.

13. The mobile phone according to claim 12, wherein the instruction execution unit outputs a predetermined signal according to the type of the fetched instruction, and
the buffer control circuit sets all the flags corresponding to the plurality of instruction buffers to the second state in accordance with a first signal output from the instruction execution unit.

14. The mobile phone according to claim 13, wherein the instruction that causes the instruction execution unit to output the first signal is a branch instruction.

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution unit,
wherein the bus controller has a plurality of instruction buffers, a flag specific to each instruction buffer, and a buffer control circuit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the instruction buffers corresponding to those addresses in the order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
the number of instruction buffers, among the plurality of instruction buffers, into which instructions are prefetched can be changed.

16. The data processor according to claim 15, wherein the number of instruction buffers for prefetching the instructions is determined by information set in a predetermined register.

17. The data processor according to claim 15, wherein the number of instruction buffers for prefetching the instructions is determined based on the number of non-branch instructions executed until a branch instruction is executed.
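
The variable prefetch depth described in the claims above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed circuit: the function names, word-granular addresses, and the particular policy in `count_from_run_length` (reuse the length of the most recent run of non-branch instructions, clamped to the number of buffers) are hypothetical. The claims state only that the count may come from a register or be based on the number of non-branch instructions executed before a branch.

```python
def prefetch_addresses(fetch_addr, low_bits, prefetch_count):
    """Addresses to prefetch after a fetch at fetch_addr.

    The run never passes the last address of the block selected by the
    low-order bits, and never holds more than prefetch_count entries.
    """
    last_in_block = fetch_addr | ((1 << low_bits) - 1)
    end = min(last_in_block, fetch_addr + prefetch_count)
    return list(range(fetch_addr + 1, end + 1))


def count_from_run_length(non_branch_run, max_buffers):
    """Hypothetical policy: prefetch about as many instructions as were
    executed between the last two branches, within [1, max_buffers]."""
    return max(1, min(max_buffers, non_branch_run))


if __name__ == "__main__":
    # Register-style setting: eight buffers exist, but only two are used.
    print(prefetch_addresses(0, low_bits=3, prefetch_count=2))
    # Adaptive setting derived from an observed run of five non-branch instructions.
    depth = count_from_run_length(5, max_buffers=8)
    print(prefetch_addresses(0, low_bits=3, prefetch_count=depth))
```

Limiting the depth in branch-heavy code avoids fetching instructions that a taken branch would discard, which is the overhead the description attributes to fixed-length burst transfers.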

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; a bus controller that controls external bus access based on an instruction from the instruction execution unit; and an interrupt control circuit,
wherein the bus controller has a plurality of instruction buffers, a flag specific to each instruction buffer, and a buffer control circuit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the instruction buffers corresponding to those addresses in the order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
prefetching of instructions into the instruction buffers is suspended in response to the interrupt control circuit accepting an interrupt.

19. The data processor according to claim 18, wherein, after the interrupt control circuit accepts an interrupt, prefetching of instructions into the instruction buffers is suspended in response to the instruction decoder branching to an instruction address related to interrupt processing.

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution unit,
wherein the bus controller includes a first buffer table, a second buffer table, and a buffer control circuit,
each buffer table has a plurality of instruction buffers and a flag specific to each instruction buffer,
the buffer control circuit assigns to each instruction buffer included in each buffer table a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the corresponding instruction buffers in the address order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
an instruction prefetched into an instruction buffer included in the second buffer table is output in response to the instructions prefetched into all the instruction buffers included in the first buffer table having been output.

21. The data processor according to claim 20, wherein, when the instruction decoded by the instruction execution unit belongs to a first instruction type, the buffer control circuit suspends instruction prefetch to the first buffer table and performs instruction prefetch into the instruction buffers included in the second buffer table based on an instruction address supplied from the instruction execution unit.
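
The hand-over between the two buffer tables can be pictured with the short Python sketch below. It is an assumption-laden toy, not the claimed hardware: `BufferTable`, `output`, and the rule "serve from the first table until all of its flags are invalid" are one possible reading of the claims above, and the table size is assumed to be a power of two so that the low-order bits select a slot.

```python
class BufferTable:
    """Instruction buffers plus one valid flag each, indexed by low-order address bits."""

    def __init__(self, size):
        self.mask = size - 1              # size is assumed to be a power of two
        self.slots = [None] * size
        self.valid = [False] * size

    def store(self, addr, insn):
        # Prefetch: fill the slot chosen by the low-order bits and mark it valid.
        slot = addr & self.mask
        self.slots[slot] = insn
        self.valid[slot] = True

    def take(self, addr):
        # Output: return the buffered instruction and clear its flag, or None on a miss.
        slot = addr & self.mask
        if not self.valid[slot]:
            return None
        self.valid[slot] = False
        return self.slots[slot]

    def drained(self):
        return not any(self.valid)


def output(first, second, addr):
    """Serve from the first table until it is drained, then from the second."""
    table = second if first.drained() else first
    return table.take(addr)


if __name__ == "__main__":
    first, second = BufferTable(4), BufferTable(4)
    first.store(2, "a2")
    first.store(3, "a3")
    second.store(8, "b8")                 # e.g. instructions at a new address run
    for addr in (2, 3, 8):
        print(addr, output(first, second, addr))
```

With two tables, instructions for the new address run can be collected while the remaining instructions of the old run are still being consumed.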

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution unit,
wherein the bus controller includes a plurality of instruction buffers, a flag specific to each instruction buffer, a buffer control circuit, and an instruction decoding unit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the instruction buffers corresponding to those addresses in the order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
an instruction stored in an instruction buffer is decoded by the instruction decoding unit, and when the decoded instruction belongs to a first instruction type, instruction prefetch is suspended until that instruction is output from the instruction buffer.

The data processor according to claim 22, wherein the first instruction type is a branch instruction.

24. The data processor according to claim 23, wherein the bus controller further includes an address calculation unit,
the address calculation unit calculates the branch destination address to which the branch instruction branches, and
instruction prefetch is performed based on the branch destination address.
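
A rough sketch of the pre-decoding step in the claims above is given here. The instruction encoding is invented for the example: instructions are Python tuples, `("bra", displacement)` stands for a branch relative to its own address, and `predecode_prefetch` is a hypothetical name. None of this reflects the actual instruction set of the processor.

```python
def predecode_prefetch(memory, start, count):
    """Prefetch up to `count` instructions from `start`, decoding each one.

    Returns (prefetched_addresses, branch_target). Sequential prefetch stops
    as soon as a branch is decoded; branch_target is then the address from
    which prefetch would continue, or None if no branch was seen.
    """
    prefetched = []
    for addr in range(start, start + count):
        insn = memory[addr]
        prefetched.append(addr)
        if insn[0] == "bra":
            # Hypothetical address calculation: displacement relative to the branch itself.
            return prefetched, addr + insn[1]
    return prefetched, None


if __name__ == "__main__":
    program = {0: ("op",), 1: ("op",), 2: ("bra", 5), 3: ("op",)}
    addrs, target = predecode_prefetch(program, start=0, count=4)
    print(addrs, target)
```

Stopping at the branch keeps the bus free of instructions that lie beyond it, and knowing the destination early lets the next prefetch start from the address that will actually be executed.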

25. The data processor, wherein the plurality of instruction buffers and the flags specific to each instruction buffer are divided into a first buffer table and a second buffer table,
the branch instruction is prefetched into an instruction buffer included in the first buffer table, and instructions are prefetched, based on the branch destination address, into instruction buffers included in the second buffer table.

26. The data processor according to claim 25, wherein instructions up to a predetermined address following the branch instruction are prefetched into the instruction buffers included in the first buffer table, and instructions up to a predetermined address based on the branch destination address are prefetched into the second buffer table.

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; and a bus controller that controls external bus access based on an instruction from the instruction execution unit,
wherein the bus controller includes a plurality of instruction buffers, a flag specific to each instruction buffer, one or more data buffers, a flag specific to each data buffer, an instruction decoding unit, an address calculation unit, and a buffer control circuit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the instruction buffers corresponding to those addresses in the order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
when an instruction stored in an instruction buffer is decoded by the instruction decoding unit and belongs to a second instruction type that requires information stored at a predetermined address, the address calculation unit calculates the predetermined address, the data stored at the predetermined address is stored in a data buffer, the flag specific to that data buffer is set to the valid state, and that flag is set to the invalid state in response to output of the stored data.

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; a bus controller that controls external bus access based on an instruction from the instruction execution unit; and a cache memory,
wherein the bus controller has a plurality of instruction buffers, a flag specific to each instruction buffer, and a buffer control circuit,
the buffer control circuit assigns to each instruction buffer a unique value that the plurality of low-order bits of an instruction address can take; prefetches instructions, starting from the address following a predetermined instruction fetch address, into the instruction buffers corresponding to those addresses in the order given by the low-order bits; sets the corresponding flag to the valid state in response to the instruction prefetch; and sets the corresponding flag to the invalid state in response to output of the prefetched instruction, and
the prefetched instruction is also supplied to the cache memory.

29. The data processor according to claim 28, wherein, when the instruction at an instruction fetch address is stored in the cache memory, the bus controller supplies the instruction stored in the cache memory to the instruction execution unit without performing an instruction prefetch.

A data processor comprising an instruction execution unit that fetches an instruction, decodes the fetched instruction, and executes the instruction; a bus controller that controls external bus access based on an instruction from the instruction execution unit; and a cache memory,
wherein the bus controller has a plurality of instruction buffers and a buffer control circuit,
the buffer control circuit prefetches instructions into the instruction buffers starting from the address following a predetermined instruction fetch address,
the prefetched instruction is also supplied to the cache memory,
the bus controller further includes an instruction decoding unit and an address calculation unit,
the instruction decoding unit decodes the prefetched instruction, and when the instruction is a branch instruction, the address calculation unit calculates a branch destination address,
if the branch destination address is smaller than the instruction address being executed by the instruction execution unit, the instruction prefetch is suspended, and
if the branch destination address is larger than the instruction address being executed by the instruction execution unit, an instruction prefetch is performed at the branch destination address.

31. The data processor according to claim 30, wherein, in a case where the branch destination address is smaller than the instruction address being executed by the instruction execution unit, when the instruction at the branch destination address is stored in the cache memory, the instruction stored in the cache memory is supplied to the instruction execution unit, and when the instruction at the branch destination address is not stored in the cache memory, the instruction is prefetched into the instruction buffer.
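
The branch handling in the last two claims amounts to a three-way decision, sketched below in Python. The function name, the string labels it returns, and the representation of the cache as a set of cached addresses are all hypothetical. The claims also do not say what happens when the destination equals the current address; this sketch arbitrarily treats that case as a backward branch.

```python
def on_predecoded_branch(target, current_pc, cached_addresses):
    """Choose an action for a branch found while decoding prefetched instructions.

    target           -- branch destination computed by the address calculation unit
    current_pc       -- instruction address currently being executed
    cached_addresses -- hypothetical stand-in for the cache: a set of cached addresses
    """
    if target > current_pc:
        # Forward branch: continue prefetching from the destination.
        return "prefetch_at_target"
    # Backward branch (or equal, by assumption): sequential prefetch is suspended.
    if target in cached_addresses:
        # Likely a loop body seen before: let the cache supply it.
        return "supply_from_cache"
    # Not cached: fall back to prefetching the destination into the instruction buffer.
    return "prefetch_into_buffer"


if __name__ == "__main__":
    cache = {0x100, 0x104}
    print(on_predecoded_branch(0x200, 0x150, cache))
    print(on_predecoded_branch(0x100, 0x150, cache))
    print(on_predecoded_branch(0x120, 0x150, cache))
```

The split reflects the division of labor the claims describe: backward branches usually revisit instructions that were already supplied to the cache during prefetch, while forward branches reach code that has not been fetched yet.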