JP3778573B2

JP3778573B2 - Data processor and data processing system

Info

Publication number: JP3778573B2
Application number: JP51547898A
Authority: JP
Inventors: 重純松井; 進金子
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 1996-09-27
Filing date: 1996-09-27
Publication date: 2006-05-24
Anticipated expiration: 2016-09-27
Also published as: WO1998013759A1; TW332272B

Description

技術分野
本発明は、データプロセッサ、更にはデータプロセッサにおけるマルチタスキング若しくはタスク切換え技術に関し、例えば複数のタスクをパイプライン的に処理するデータプロセッサ、そしてそのデータプロセッサを適用したデータ処理システムに適用して有効な技術に関するものである。
背景技術
データプロセッサによるデータ処理を高速化する技術としてパイプライン処理がある。パイプライン処理は、一つの大きな処理を複数の処理要素に分割し、各処理要素に必要な時間即ちパイプラインピッチで次々に新しい処理を実行することにより、データ処理のスループットを向上させるものである。例えば、一つの命令を実行するための制御処理を、命令フェッチ、命令デコード、演算、メモリアクセス及びレジスタストアの各処理に分けた場合、前記夫々の処理を一つのパイプラインステージとし、一つのパイプラインステージのピッチ（パイプラインピッチ）毎に命令フェッチを行って、見掛け上、一つの命令を１パイプラインピッチで実行していく。
このようなパイプライン処理の途上で、タスク切換えを行う場合には、後から、現在実行中のタスクに復帰できるように、プログラムカウンタ、ステータスレジスタ及びデータレジスタなどの値をスタック領域に退避する処理を行わなければならない。
しかしながら、そのような退避処理には少なからず時間を要するため、パイプラインに乱れを生ずることになる。特に、複雑な処理を行う場合には、プログラム実行状態を局所的に見てもタスク切換えが頻繁に生ずることになる。これにより、折角パイプライン処理を採用しても、思うようにデータ処理のスループットを向上させることができなくなる。
特開昭６２−２３７５３１号公報には、複数のプログラムを時分割で実行する場合に、割り込みでは処理が複雑になるから、プログラムＲＯＭとプログラムカウンタを２組み設け、各組におけるＲＯＭアクセスタイミングをずらし、セレクタで交互にＲＯＭの出力を選択して命令レジスタに命令を供給する様にし、これによって、複数のプログラムを簡単に時分割で実行できるようにする技術が示されている。
これは、クロック信号に基づいてプログラムを単に交互に切り替えてプログラムを完全時分割で実行させるための技術であり、複数のプログラムを見掛け上、単に並列的に実行する場合を想定しており、特定のイベントがデータプロセッサの内外で発生するのに応じてタスクを切り替えることについては考慮されていない。機器制御などに汎用的に利用されるデータプロセッサでは少なくともそれを考慮し、タスク切換えに際してパイプラインの乱れ等を小さくし、スループットを向上することが要求される。
また、スーパスカラアーキテクチャーのデータプロセッサは、複数の命令を複数のパイプラインによって同時に実行することができる。このようなデータプロセッサにおいて、ある命令が他の命令の実行結果を利用するようというようなデータコンフリクトの状態に代表される命令相互間の依存関係を管理しなければならない。並列実行されるべき命令にデータコンフリクトを生ずることが明らかになった場合、複数のパイプラインの一部は命令実行を停止して他方の命令実行の完了を待つことになる。このようにデータコンフリクトによって空いたパイプラインを別のタスクの実行に利用することを考慮すると、やはり、タスク切換えに伴う処理時間を短くしてパイプラインの乱れを最小限に抑えなければならないことが本発明者によって明らかにされた。
また、データプロセッサはオペランドアクセスを高速化するためにキャッシュメモリを搭載することができる。キャッシュメモリのキャッシュラインが書き換えられると、それに対応されるメモリの内容も書き換えられなければならない。例えばデータプロセッサのみが主メモリを占有しているなら、キャッシュラインのリプレースが行われるときだけ前記書き換えられた内容を主メモリに反映させればよい。このような動作をライトバックと称する。
しかしながら、データプロセッサの外部に接続されたＤＭＡ（Direct Memory Access）コントローラは、キャッシュメモリの書き換えが主メモリに反映されていない誤ったデータを主メモリから読出してデータ転送を行う虞がある。このような虞をキャッシュ・コヒーレンシの問題と称し、これを解消するために、メモリライト動作時にキャッシュヒットであっても、その都度メモリライト動作を行うライトスルー方式をキャッシュメモリに採用すると共に、ライトバッファを用いてキャッシュメモリをノンブロッキング構成にすることができる。ところが、キャッシュ・コヒーレンシのために、メモリライト動作が頻発すると、データプロセッサはＤＭＡコントローラと主メモリを接続するバスのデータ転送能力をキャッシュ・コヒーレンシのために使い切ってしまい、ＤＭＡコントローラによって高速のデータ転送を行うとき、そのデータ転送速度が制限されてしまうという問題を生ずる。
そこで、ライトスルーを採用せず、ライトバック方式でキャッシュ・コヒーレンシを保つために、キャッシュコヒーレンシを保たない動作を検出し、その時点でライトバックさせる技術を採用することができる。例えば、データプロセッサは、ＤＭＡコントローラがキャッシュメモリに蓄積されているデータをリードアクセスする動作（バス・スヌープ）を検出すると、そのデータをライトバックする動作を割り込ませ、その後で、ＤＭＡ転送を可能にする。しかしながら、データプロセッサは、キャッシュコヒーレンシを保たない動作を検出するための負担が増すことになる。
本発明の目的は、タスクの切換えに伴う処理を軽減でき、データ処理能力を向上させることができるデータプロセッサを提供することにある。
本発明の別の目的は、タスク切換えに際してパイプラインの乱れを最小限に抑えることができるデータプロセッサを提供することにある。
本発明の更に別の目的は、スーパスカラアーキテクチャーのデータプロセッサにおいて、データコンフリクトによって空いたパイプラインを別のタスクの実行に切り替えて有効利用できるようにすることにある。
本発明のその他の目的は、ライトバック方式のキャッシュメモリを内蔵するとき、ＤＭＡ転送に際してキャッシュ・コヒーレンシを保たせるための負担を最小限に抑えることができるデータプロセッサを提供することにある。
本発明の前記ならびにその他の目的と新規な特徴は本明細書の以下の記述から明らかにされるであろう。
発明の開示
本発明において、第１図に例示されるように、命令フェッチユニット（１０）が命令をフェッチし、命令レジスタ（１１）にラッチされた命令を命令デコーダ（１２）が解読し、その解読結果に基づいて命令実行ユニット（１３）が命令を実行するデータプロセッサ（１）は、プログラムの格納領域（１６０，１７０）とその領域に格納され命令を順次読出すためのポインタ（１６１，１７１）とを夫々が備えた複数個のタスクバッファ（１６，１７）と、前記夫々のタスクバッファ毎に専用化され、前記命令実行ユニットに配置されたレジスタ手段（Ｓ１，Ｓ２）と、前記複数個のタスクバッファと命令フェッチユニットとの中から一つを選択的に前記命令レジスタに接続するセレクタ（１８）と、初期状態において前記セレクタに前記命令フェッチユニットを選択させると共に、内部又は外部で発生されるイベントに従って前記セレクタを選択制御する切換え制御手段（１９）と、前記命令実行ユニットの制御に基づいて前記複数個のタスクバッファの全部又は一部をデータ書き込み可能に外部とインタフェースするインタフェース手段（２１，ＢＵＳ）とを含む。
前記タスクバッファは夫々固有のポインタを有し、命令実行ユニットは夫々のタスクバッファに割り当てられた固有のレジスタ手段を有するから、実行すべきタスクが、命令フェッチユニットのプログラムに従った通常命令処理とタスクバッファのプログラムに従ったスワップタスク処理との間で切換えられるとき、中断される通常命令処理の実行状態（例えばプログラムカウンタや汎用レジスタの値）を退避したり復帰したりするために外部メモリのスタック領域をアクセスする処理を必要としない。これにより、タスク切換えの高速化と、タスク切換えに伴う処理の軽減とが達成され、データプロセッサのデータ処理能力向上に寄与する。
前記命令レジスタ、命令デコーダ及び命令実行ユニットが、パイプラインステージ単位で処理を進めて命令をパイプライン処理を行う場合、上記により、パイプラインの乱れを最小限に抑えることができる。
前記命令実行ユニットは、前記命令レジスタに命令をラッチさせる指示信号（ＬＩＲ）を出力し、前記セレクタは、その指示信号を前記切換え制御手段が選択する命令フェッチユニット又はタスクバッファに供給し、命令フェッチユニットは命令レジスタに供給すべき命令をその指示信号に基づいて更新し、前記タスクバッファは前記ポインタをその指示信号に基づいて更新することができる。この制御は、前記タスクバッファのポインタ制御を容易化する。
前記スワップタスク処理から通常命令処理への復帰の手法として、前記切換え制御手段が選択したタスクバッファから命令デコーダに供給された命令の解読結果に基づいて当該切換え制御手段が前記セレクタを前記命令フェッチユニットの選択状態に戻すようにできる。即ち、選ばれたスワップタスク処理の完了を以って、通常命令処理に復帰させる。
スワップタスク処理を最優先に完了させることを考慮した場合、第１図に例示されるように、前記切換え制御手段は、前記タスクバッファの選択に呼応して、命令実行ユニットに入力される割り込み信号を無効化する割り込み禁止信号（ＩＮＨ）を出力するとよい。これにより、スワップタスク処理中において割込み要求は受け付けられない。
割り込み受け付けを許容する場合、第１２図に例示されるデータプロセッサ（１Ａ）のように、前記切換え制御手段（１９）は、前記タスクバッファ（１６、１７）を選択しているとき、前記命令実行ユニット（１３）による割り込みの受け付けに対応される制御信号ＩＣＮＴにより前記セレクタ（１８）を命令フェッチユニットの選択状態に戻すと共に、その直前のタスクバッファの選択状態を退避させればよい。
データプロセッサ（１）は、前記命令実行ユニットと外部との間にデータキャッシュメモリ（１５）を備えることができる。このデータプロセッサは、第２０図に例示されるように、バス（４）を介してメモリ等の複数の周辺回路（２，５）に接続されてデータ処理システムを構成する。このとき、前記タスクバッファにＤＭＡ転送制御プログラム若しくはＤＭＡ転送及びデータ変換制御プログラムを設定した場合、キャッシュコヒーレンシの問題を解決するためのデータプロセッサの負担を軽減することができる。すなわち、データプロセッサの処理タスクがセレクタ等を介してＤＭＡ転送制御処理に切換えられた状態において、ＤＭＡコントローラとしての機能は実行ユニットが実現することになる。従って、データプロセッサの外部メモリ間、或いは外部メモリと外部の入出力回路間でＤＭＡデータ転送を制御する場合、ＤＭＡ転送制御のためのアドレス信号若しくはアクセス制御情報は必ずデータキャッシュメモリを通ることになる。換言すれば、キャッシュメモリがライトバック方式を採用する場合に、データキャッシュメモリの書き換えが外部メモリに反映されていない状態でＤＭＡ転送が開始されても、そのような外部メモリに反映されていないデータはデータキャッシュメモリから命令実行ユニットに読み込まれて、転送されることになる。これにより、データプロセッサは、キャッシュコヒーレンシを保たないＤＭＡ転送動作を検出するとともに検出したときには予じめライトバック動作を行なわせることを要せず、キャッシュコヒーレンシを保たないＤＭＡ転送動作を検出するというデータプロセッサの処理負担を軽減することができる。当然データプロセッサで実現するＤＭＡ転送制御機能において、転送データは一旦データプロセッサに読み込まれることになる。
上記タスク切換え手段は、第１４図及び第１６図に例示されるスーパスカラ形式のデータプロセッサ（１Ｂ，１Ｃ）にも応用できる。すなわち、命令レジスタ（１１Ａ，１１Ｂ）にラッチした命令を命令デコーダ（１２Ａ，１２Ｂ）で解読して命令実行ユニット（１３Ａ，１３Ｂ）がその命令を実行する命令実行制御系列を複数系列備えると共に、命令をフェッチする命令フェッチユニット（１０）を含み、複数の命令を前記複数の命令実行制御系列で並列実行可能なデータプロセッサ（１Ｂ、１Ｃ）は、プログラムの格納領域とその領域に格納され命令を順次読出すためのポインタとを夫々が備えた複数個のタスクバッファ（１６、１７）と、前記夫々のタスクバッファ毎に専用化され、特定の前記命令実行ユニットに配置されたレジスタ手段（Ｓ１，Ｓ２）と、前記複数個のタスクバッファと命令フェッチユニットとの中から一つを選択して前記特定の命令実行ユニットに対応される命令レジスタに接続するセレクタ（１８）と、初期状態において前記セレクタに前記命令フェッチユニットを選択させると共に、内部又は外部で発生されるイベントに従って前記セレクタを選択制御する切換え制御手段（１９）とを有する。このデータプロセッサにおいても、一方の命令実行制御系を利用して通常命令処理とスワップタスク処理を切換えられるので、上記同様、タスク切換えの高速化とタスク切換えに伴うデータプロセッサの負担軽減とを達成でき、また、パイプラインの乱れも最小限に抑えることができる。したがって、スーパスカラアーキテクチャが本来企図する高いデータ処理能力を保証することができる。
複数の命令を並列実行可能なスーパスカラデータプロセッサにおいて、データコンフリクトのような命令相互の依存関係をハードウェアによって調停する場合、前記夫々の命令実行制御系列に含まれる命令デコーダからの命令解読結果に基づいて、相互に異なる命令実行制御系列による命令の並列実行が可能か否かについてそれら命令相互間の依存関係を調べ、他の命令の実行結果に依存する命令の実行を遅らせる競合管理ユニット（２５）を備えることになる。
このとき、前記切換え制御手段は、第１６図に例示されるように、データコンフリクトなどによって前記競合管理ユニットが特定の命令の実行を遅らせるとき、それを通知する制御信号２５０に応答して前記セレクタ（１８）にタスクバッファを選択させることにより、処理が中断される一方の命令実行制御系若しくはパイプによる通常命令処理をスワップタスク処理に切換えることができ、命令実行制御系列を有効利用することができる。特にタスク切換え時には前述の通り、途中で中断される通常命令処理の実行状態の退避を要しないから、命令実行制御系列の空き時間が短い場合にも効率的にタスク切換えを行ってスワップタスク処理に移行することができる。
前記データコンフリクト等の競合状態は命令デコード結果に基づいて競合管理ユニット（２５）が判定し、そのとき、処理が遅延されるべき命令は既にデコードを終了している。その後で、スワップタスク処理に切換えられるが、処理が中断される通常命令処理と、それに代えて処理が開始されるスワップタスク処理とが相互に同じ命令レジスタ及び命令デコーダを用いる場合には、第１７図に例示されるように、パイプ１におけるパイプラインステージｍの命令フェッチ（Ｉｎ）と同じ命令をパイプ１のステージｍ＋２で再度フェッチし、パイプ１におけるパイプラインステージｍ＋１の命令デコード（Ｄｎ）と同じ命令をパイプ１のステージｍ＋３で再度デコードしなければならず、この意味においてパイプラインが乱れることになる。従って、スワップタスク処理の後に、処理が中断された通常命令処理に復帰するときは命令フェッチから再開しなければならない。
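To ensure that the pipeline suffers no disturbance at all when switching from normal instruction processing to swap task processing because of the data conflict described above, the data processor (1D) may, as illustrated in FIG. 18, add an instruction register (11C) and an instruction decoder (12C) dedicated to swap task processing to one of the instruction execution control series. The premise is that the processor has a plurality of instruction execution control series, in each of which an instruction latched in an instruction register (11A, 11B) is decoded by an instruction decoder (12A, 12B) and executed by an instruction execution unit (13A, 13B), together with an instruction fetch unit (10) that fetches instructions, so that a plurality of instructions can be executed in parallel by those series. The data processor (1D) then includes: a plurality of task buffers (16, 17), each having a program storage area and a pointer for sequentially reading the instructions stored in that area; a specific-task instruction register (11C) dedicated to the plurality of task buffers; a specific-task instruction decoder (12C) that decodes the instruction latched in the specific-task instruction register; register means (S1, S2) dedicated to each task buffer and disposed in a specific instruction execution unit; a first selector (18) that selects one of the plurality of task buffers and the instruction fetch unit and connects it to the instruction register corresponding to the specific instruction execution unit; a second selector (26) that selects one of the plurality of task buffers and connects it to the specific-task instruction register; a third selector (27) that selectively connects either the output of the instruction decoder corresponding to the specific instruction execution unit or the output of the specific-task instruction decoder to the specific instruction execution unit; a conflict management unit (25) that, based on the decode results from the instruction decoders in the respective instruction execution control series, examines the dependencies between instructions to determine whether they can be executed in parallel by different instruction execution control series, delays execution of a specific instruction that depends on the execution result of another instruction, and, when delaying that instruction, causes the third selector to select the specific-task instruction decoder; and switching control means (19) that, in the initial state, causes the first selector to select the instruction fetch unit and holds the second selector in a non-selecting state, controls the selection of the first selector according to events generated internally or externally, and, in response to the third selector selecting the specific-task instruction decoder, causes the second selector to select the task buffer corresponding to the internally or externally generated event.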
【図面の簡単な説明】
第１図は本発明の第１の実施例に係るデータプロセッサのブロック図、
第２図は命令フェッチユニットの一例ブロック図、
第３図はスワップタスクバッファの第１の例を示すブロック図、
第４図はスワップタスクバッファの第２の例を示すブロック図、
第５図はスワップタスクバッファの第３の例を示すブロック図、
第６図はスワップタスクバッファの第４の例を示すブロック図、
第７図は命令実行ユニットに含まれるレジスタセットの一例説明図、
第８図は第１の実施例に係るデータプロセッサにおけるタスク切換え動作の一例説明図、
第９図は通常命令処理と割り込み処理との切換え動作の一例説明図、
第１０図は第１の実施例に係るデータプロセッサにおけるタスク切換えとパイプラインとの関係を示す一例タイミングチャート、
第１１図はスワップタスク中に割り込みを受け付けない第１の実施例に係るデータプロセッサの一例動作タイミングチャート、
第１２図は本発明の第２の実施例に係るデータプロセッサのブロック図、
第１３図はスワップタスク中に割り込みを受け付ける第２の実施例に係るデータプロセッサの一例動作タイミングチャート、
第１４図は本発明の第３の実施例に係るデータプロセッサのブロック図、
第１５図は第３の実施例に係るデータプロセッサにおいてデータコンフリクトを生じた場合の制御とタスク切換え制御の内容を示す一例タイミングチャート、
第１６図は本発明の第４の実施例に係るデータプロセッサのブロック図、
第１７図は第４の実施例に係るデータプロセッサにデータコンフリクトを生じたときのタスク切換え制御の内容を示すタイミングチャート、
第１８図は本発明の第５の実施例に係るデータプロセッサのブロック図、
第１９図はデータコンフリクトを生じたとき第５の実施例に係るデータプロセッサで行われるタスク切換え制御を示すタイミングチャート、
第２０図は本発明のデータプロセッサを適用したデータ処理システムの一例ブロック図、
第２１図はＤＭＡ転送制御及びデータ変換制御プログラムによるタスクの一例を示す説明図、
第２２図はＤＭＡ転送制御及びデータ変換制御プログラムのプログラム記述の最小単位の一例を示す説明図、
第２３図はライトバック方式を採用するキャッシュメモリとデータプロセッサの外部に配置されたＤＭＡコントローラとを含むデータ処理システムの一例ブロック図である。
発明を実施するための最良の形態
第１図には本発明の第１の実施例に係るデータプロセッサのブロック図が示される。同図に示されるデータプロセッサ１は、特に制限されないが、公知の半導体集積回路製造技術によって単結晶シリコンのような１個の半導体基板に形成されている。
第１図において１０は命令フェッチユニット、１１は命令レジスタ、１２は命令デコーダ、１３は命令実行ユニット、１４は命令キャッシュメモリ、１５はデータキャッシュメモリ、１６，１７は代表的に示されたスワップタスクバッファ、１８はセレクタ、１９は切換え制御回路、２０は内蔵周辺モジュールを総称する回路ブロックである。
前記命令実行ユニット１３は、プログラムカウンタＰＣ、汎用レジスタＧＲ、夫々のスワップタスクバッファ１６，１７に個々に割り当てられたレジスタセットＳ１，Ｓ２、割り込み制御回路１３１、シーケンス制御回路１３２、演算回路等１３３などを含む。
本実施例のデータプロセッサ１において、命令レジスタ１１、命令デコーダ１２及び命令実行ユニット１３はパイプラインステージ単位で処理を進めて、命令をパイプライン処理する。命令レジスタ１１、命令デコーダ１２及び命令実行ユニット１３における動作サイクルは、データプロセッサ１の図示を省略する動作基準クロック信号に同期して、前記シーケンス制御回路１３２が制御する。
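Although not particularly limited, the instruction execution unit 13 is interfaced to the outside through a data cache memory 15 connected to the internal bus BUS. The data cache memory caches the external memory 2 and the like. The data cache memory 15 is shown as a circuit block that includes a cache data section, a cache tag section, and a cache controller, none of which is illustrated individually. The cache data section holds part of the data held in the external memory 2 and the like. The cache tag section holds, as a cache tag, part of the address (the address tag) of each piece of data held in the cache data section. On a cache hit in an external access, the cache controller outputs the hit data from the cache data section to the internal bus BUS, or writes the data concerned into the cache data section as a new entry. On a cache miss, it supplies data read from the external memory 2 or the like to the internal bus BUS, or performs a write access to the external memory 2 or the like. A cache line can be replaced on a cache miss. Although not particularly limited, this cache controller uses so-called write-back: the contents of the cache data section rewritten on a cache hit are written back to the external memory 2 or the like only when the cache line is replaced.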
前記プログラムカウンタＰＣは次に実行すべき命令アドレスを保有する。命令フェッチユニット１０は、特に制限されないが、プログラムカウンタＰＣの値に基づいて、将来実行されると予想される命令（例えばプログラムカウンタＰＣによって指定される命令からその先の複数命令）をフェッチする。フェッチされるべき命令は、特に制限されないが、外部メモリ３に格納されている。この実施例において外部メモリ３と命令フェッチユニット１０との間には命令キャッシュメモリ１４が配置されている。
この命令キャッシュメモリ１４は、図示を省略するキャッシュデータ部、キャッシュタグ部及びキャッシュコントローラを含む回路ブロックとして図示されている。キャッシュデータ部は外部メモリ３等が保有する命令の一部を保持する。キャッシュタグ部はキャッシュデータ部が保有する命令と対応させてそのアドレスの一部（アドレスタグ）をキャッシュタグとして保有する。キャッシュコントローラは命令フェッチユニット１０によるメモリアクセスにおいてキャッシュ・ヒットの場合にはキャッシュデータ部が保有する命令を命令フェッチユニット１０に与え、キャッシュ・ミスの場合には外部メモリ３等から命令を読み込んで命令フェッチユニット１０に与える。
命令フェッチユニット１０は、特に制限されないが、先入れ・先出し（First-in・First-out）バッファの機能を有し、プログラムカウンタＰＣの値に対して複数ワード分の命令をプリフェッチすることができる。例えば第２図に示されるように４段のラッチ１００Ａ〜１００Ｄが直列配置され、セレクタ１０１Ａ〜１０１Ｃを介して前段のラッチを介することなく直接外部若しくは命令キャッシュメモリ１４からの命令を取り込むことができるようにされている。１０２は命令フェッチのための制御回路であり、プログラムカウンタＰＣの値に基づいてフェッチすべき命令のアドレスを出力するとともに、それによって入力される命令を先入れ・先出し形態で前記ラッチ１００Ａ〜１００Ｄに保持させ且つラッチ１００Ａ〜１００Ｄから出力させる。特に制限されないが、ラッチ１００Ａ〜１００Ｄは２ワード単位で命令をラッチし、命令デコーダ１２は１ワード単位で命令をデコードする。これに応じて、データラッチ１００Ｄの出力はセレクタ１０３で下位ワードと上位ワードに分けて出力される。
前記夫々のスワップタスクバッファ１６、１７は、プログラム格納領域１６０、１７０とその格納領域１６０、１７０に格納された命令を順次読み出すためのポインタ１６１、１７１とを有する。特に制限されないが、スワップタスクバッファ１６は内部バスＢＵＳを介して実行ユニット１３によりそのプログラム格納領域１６０に対する書込みが可能にされている。また、スワップタスクバッファ１７は、命令実行ユニット１３により制御されるシリアルインタフェース（その制御線は図示を省略してある）２１を介してそのプログラム格納領域１７０に対する書込みが可能にされている。
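Examples of the swap task buffers 16 and 17 are shown in FIGS. 3 to 6. In the example of FIG. 3, the storage area 160 (170) consists of a shift register and a selector. The buffer can be built from a shift register made up of a cascade of parallel input/output latches LAT, a selector SEL that selects one bit from the parallel outputs of each latch LAT and outputs the selected bits in parallel, and a pointer 161 (171) that makes the selector SEL select the outputs of the latches LAT in order from the upper or the lower side. For example, with n stages of m-bit latches LAT, instructions can be output sequentially m times in units of n bits. Data is written to the shift register under the control of the serial interface 21 or the instruction execution unit 13. The number of latch LAT stages is determined by the number of bits in an instruction, and FIG. 4 shows a configuration whose number of latch stages differs from that of FIG. 3. In the example of FIG. 5, the storage area 160 is a RAM (Random Access Memory) consisting of a memory cell array MCA1, in which dynamic or static memory cells are arranged in a matrix, and an address decoder ADEC1; the pointer 161 generates the access address for the RAM. The instruction execution unit 13 controls writes to the RAM. In the example of FIG. 6, the storage area 160 is a ROM (Read Only Memory) consisting of a memory cell array MCA2, in which nonvolatile storage elements are arranged in a matrix, and an address decoder ADEC2; the pointer 161 generates the access address for the ROM.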
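The readout of the FIG. 3 style buffer can be pictured with a small software model. The following is only a sketch under assumptions: the function name `read_words`, the use of Python integers for latch contents, and the choice of stepping the pointer from the lower bit upward are all illustrative and do not come from the specification.

```python
def read_words(latches, m):
    """Model of the FIG. 3 readout: n latches of m bits each.

    For each pointer position, the selector takes one bit from every
    latch, giving an n-bit word. The pointer steps m times in total.
    """
    for bit in range(m):                      # pointer position (lower side first)
        word = 0
        for stage, value in enumerate(latches):
            word |= ((value >> bit) & 1) << stage
        yield word


# Three 2-bit latches (n = 3, m = 2) yield two 3-bit words.
words = list(read_words([0b01, 0b10, 0b11], m=2))
print([format(w, "03b") for w in words])
```

In this model each latch stage contributes one bit position of the instruction word, which matches the statement that the number of latch stages follows the instruction width.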
スワップタスクバッファ１６，１７には、一つのまとまった処理を実現するための命令系列によって構成されるプログラムが格納される。特定の命令系列によって実現される一つのまとまった処理の単位をタスクと定義するならば、特定のタスクに係る処理プログラムが格納される。例えば、ＤＭＡ転送のための処理プログラム、データ圧縮・伸長のための処理プログラムなどが設定される。スワップタスクバッファ１６，１７への処理プログラムのロードは、特に制限されないが、パワーオンリセットなどによるシステムイニシャライズ時にシリアルインタフェース２１や命令実行ユニット１３を介して行うことができる。
前記セレクタ１８はスワップタスクバッファ１６、１７と命令フェッチユニット１０との中から一つを選択して命令レジスタ１１に接続する。その接続制御は切換え制御回路１９が行う。この切換え制御回路１９は、データプロセッサ１のイニシャライズリセット時に前記セレクタ１８に命令フェッチユニット１０を選択させ、その後、内外で発生される所定のイベント、例えば、内蔵周辺回路モジュール２０からの割り込み信号２２や外部における所定のイベント発生の通知信号２３に従ってセレクタ１８にスワップタスクバッファ１６又は１７の出力を選択させる。どのスワップタスクバッファを選択するかは、イベントの発生元とスワップタスクバッファとの対応テーブルを切換え制御回路１９が備えて判定したり、或いはイベント発生の通知信号毎に固有のスワップタスクバッファを割り当てて制御することができる。
特に制限されないが、前記命令実行ユニット１３は、前記命令レジスタ１１に命令をラッチさせる指示信号ＬＩＲを出力する。命令レジスタ１１はその指示信号ＬＩＲに同期して命令をラッチする。このとき、前記セレクタ１８は、前記切換え制御回路１９が選択する命令フェッチユニット１０又はスワップタスクバッファ１６，１７にその指示信号ＬＩＲを供給する。命令フェッチユニット１０はその指示信号ＬＩＲを受けると、命令レジスタ１１に供給すべき命令をその指示信号に基づいて更新する。また、前記タスクバッファ１６，１７はその指示信号ＬＩＲを受けると、前記ポインタ１６１，１７１をその指示信号ＬＩＲに基づいて更新する。これにより、セレクタ１８で選択されるスワップタスクバッファ１６又は１７のポインタ１６１又は１７１が順次更新され、そのポインタの値に応じた命令が格納領域１６０，１７０から命令レジスタ１１に供給されることになる。
スワップタスクバッファ１６，１７に格納されたプログラムの実行終了は、当該プログラムの最後に実行される命令が命令デコーダ１２でデコードされて出力される終了信号１２０によって切換え制御回路１９が認識する。切換え制御回路１９は、そのデコード結果（終了信号１２０）を受け取ると、前記セレクタ１８を命令フェッチユニット１０の選択状態に戻す。
第７図には命令実行ユニット１３のレジスタ構成例が示される。汎用レジスタＧＲは、レジスタＳＲ，Ｒ０〜Ｒ１５を含む。ＳＲはステータスレジスタに割り当てられ、Ｒ０〜Ｒ７はデータレジスタやアドレスレジスタに割り当てられ、Ｒ８〜Ｒ１５はデータレジスタ、アドレスレジスタ、スタックポインタ等に割り当てられる。前記レジスタセットＳ１はレジスタＳ１ＳＲ，Ｓ１Ｒ０〜Ｓ１Ｒ７を含み、前記レジスタセットＳ２はレジスタＳ２ＳＲ，Ｓ２Ｒ０〜Ｓ２Ｒ７を含み、それらレジスタセットＳ１，Ｓ２は、汎用レジスタＧＲのレジスタＳＲ，Ｒ０〜Ｒ７に代えて利用されるものであり、夫々固有のレジスタアドレスを有する。レジスタセットＳ１はスワップタスクバッファ１６に格納されたプログラムの実行に専用的に割り当てられ、レジスタセットＳ２はスワップタスクバッファ１７に格納されたプログラムの実行に専用的に割り当てられ、汎用レジスタＧＲのレジスタＳＲ，Ｒ０〜Ｒ７は命令フェッチユニット１０から出力された命令の実行に割り当てられる。
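Although not particularly limited, which register is used among the registers SR and R0 to R7 of the general-purpose registers GR, the register set S1, and the register set S2 is determined by the register number and the type of task, and is specified, for example, in the operand field of the instruction. When an instruction output from the instruction fetch unit 10 is selected, the instruction execution unit 13 executes it using the registers SR and R0 to R15; when an instruction output from the swap task buffer 16 is selected, it uses the registers S1SR and S1R0 to S1R7; and when an instruction output from the swap task buffer 17 is selected, it uses the registers S2SR and S2R0 to S2R7.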
上述のように、スワップタスクバッファ１６，１７は夫々固有のポインタ１６１，１７１を有し、夫々のスワップタスクバッファ１６，１７に割り当てられた固有のレジスタセットＳ１，Ｓ２を有するから、実行すべきタスクが命令フェッチユニット１０とスワップタスクバッファ１６，１７との間で切換えられたとき、プログラムカウンタＰＣやレジスタＧＲの値を退避したり復帰したりするために外部メモリ２等のスタック領域をアクセスする処理を必要としない。
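FIG. 8 shows an example of the task switching operation. While instructions from the instruction fetch unit 10 are being executed (normal instruction processing), if execution of the program stored in the swap task buffer 16 (swap task 1) is requested, for example by the signal 23, the switching control circuit 19 switches the selection of the selector 18 to the swap task buffer 16 in synchronization with a pipeline stage boundary. The swap task buffer 16 then outputs the first instruction of swap task 1, designated by the pointer 161, in synchronization with the latch signal LIR, and the instruction register 11 latches it. When executing a swap task, the instruction execution unit 13 uses the register set S1 designated by the instruction descriptions of that task. Execution can therefore move to swap task 1 without saving the program counter PC or the registers SR and R0 to R7. When the last instruction of swap task 1 is decoded by the instruction decoder 12, the switching control circuit 19 makes the selector 18 select the instruction fetch unit 10. At this point the program counter PC, the status register SR, and the data and address registers R0 to R7 still hold the values they had immediately before the switch to swap task 1. The registers R8 to R15 are not used in executing the instructions stored in the swap task buffer 16. Consequently, no memory access for restoring state is needed when switching back to normal instruction processing either. By contrast, switching between normal instruction processing and interrupt processing, shown in FIG. 9, requires memory accesses for saving and restoring at every switch. Those memory accesses are the overhead of task switching or pipeline switching.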
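The switching behaviour of FIG. 8 can be illustrated with a minimal software model. This is a sketch under stated assumptions, not the circuit itself: the class names, the `(register, value)` instruction format, the `"END"` marker standing in for the instruction that produces the end signal 120, and the rewinding of the pointer at the end of a task are all hypothetical.

```python
class TaskBuffer:
    """Swap task buffer: a program store plus its own read pointer."""

    def __init__(self, program):
        self.program = list(program)
        self.pointer = 0

    def next_instruction(self):
        instr = self.program[self.pointer]
        self.pointer += 1                      # advanced on each latch signal (LIR)
        return instr


class Processor:
    FETCH = "GR"  # selector position for the instruction fetch unit

    def __init__(self, main_program, buffers):
        self.main_program = list(main_program)
        self.pc = 0
        self.buffers = buffers                 # e.g. {"S1": TaskBuffer(...)}
        # One register bank per instruction source: GR plus one per buffer.
        self.regs = {name: [0] * 8 for name in ("GR", *buffers)}
        self.selected = self.FETCH

    def request_swap_task(self, name):
        self.selected = name                   # only the selector changes

    def step(self):
        if self.selected == self.FETCH:
            op = self.main_program[self.pc]
            self.pc += 1
        else:
            op = self.buffers[self.selected].next_instruction()
        if op == "END":                        # stands in for end signal 120
            self.buffers[self.selected].pointer = 0   # assumption: rewind for reuse
            self.selected = self.FETCH
        else:
            reg, value = op
            self.regs[self.selected][reg] = value


cpu = Processor([(0, 11), (1, 22)], {"S1": TaskBuffer([(0, 99), "END"])})
cpu.step()                    # normal instruction: GR[0] = 11
cpu.request_swap_task("S1")
cpu.step()                    # swap task: S1[0] = 99
cpu.step()                    # END: selector returns to the fetch unit
print(cpu.pc, cpu.regs["GR"][0], cpu.regs["S1"][0])
```

Because the swap task has its own pointer and register bank, the model never copies `pc` or the `GR` bank anywhere, which is the point the paragraph above makes about avoiding stack accesses.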
第１０図には通常命令処理とスワップタスク１との切換え時におけるパイプラインの状態の一例が示される。特に制限されないが、本実施例のデータプロセッサ１におけるパイプラインステージは５段とされ、通常命令処理におけるパイプラインステージは、命令フェッチ（Ｉｎ）、命令デコード（Ｄｎ）、演算（Ｅｎ）、メモリアクセス（Ａｎ）及びレジスタストア（Ｓｎ）の各ステージとされる。スワップタスクにおけるパイプラインステージは、命令転送（Ｃｓ）、命令デコード（Ｄｓ）、演算（Ｅｓ）、メモリアクセス（Ａｓ）及びレジスタストア（Ｓｓ）の各ステージとされる。
例えば、パイプラインステージｍにおいてスワップタスク１の実行が要求されると、切換え制御回路１９は、パイプラインステージｍ＋１においてセレクタ１８による選択状態をスワップタスクバッファ１６に切換え、当該パイプラインステージｍ＋１においてスワップタスク１の先頭の命令に対する命令が命令レジスタ１１に転送（Ｃｓ１）される。タスク切換え時には前述の通りプログラムカウンタＰＣやレジスタＳＲ，Ｒ０〜Ｒ７の退避を要せずに、スワップタスク１の実行に移ることができる。以下パイプラインステージ毎に処理が一つずつ進められる。命令実行ユニット１３は、通常命令処理の実行では汎用レジスタＧＲを利用するが、スワップタスク１の実行ではレジスタセットＳ１を利用する。どのレジスタを利用するかは夫々の命令記述によって決定される。切換えられたスワップタスク１の最後の命令がパイプラインステージｎにおいて命令デコーダ１２で解読（Ｄｓ１）されると、切換え制御回路１９に終了信号１２０が供給される。切換え制御回路１９はパイプラインステージｎ＋１でセレクタ１８に命令フェッチユニット１０を選択させ、これによってパイプラインステージｎ＋１以降では命令レジスタ１１には命令フェッチユニット１０から命令が供給される。通常命令処理への切換えに際しても、前述の通り、復帰のためのメモリアクセスを必要としない。以上のように、通常命令処理とスワップタスク１との間でのタスク切換えに際して、パイプラインは一切乱れを生じていない。
第１図において前記割り込み制御回路１３１には、代表的に示された割り込み要求信号ＩＲＱが供給される。割り込み制御回路１３１はそれに設定されている割り込み優先度に応じて割り込み要求を受け付ける。本実施例では、前記切換え制御回路１９は、スワップタスクバッファ１６又は１７をセレクタ１８に選択させている状態において、割り込み受け付け禁止信号ＩＮＨをイネーブルにして割り込み制御回路１３１に供給する。割り込み制御回路１３１は、割り込み禁止信号ＩＮＨがイネーブルにされているとき割り込み要求を一切受け付けない。したがって、データプロセッサ１は、スワップタスクバッファ１６又は１７のプログラムに従ったタスクを実行しているとき、そのタスクの実行完了まで、タスクの切換えは行われない。換言すれば、スワップタスクバッファ１６又は１７に格納されたプログラムによるタスクには最も高い実行優先度が与えられることになる。割り込み制御回路１３１は、割り込み要求を受け付けると、現在の命令実行を中断し、プログラムカウンタＰＣ、ステータスレジスタＳＲ、データ及びアドレスレジスタＲ０〜Ｒ１５の内容等を外部メモリ２等のスタック領域に退避し、その後、受け付けた割り込み要求処理プログラムに分岐される。
第１１図には上述のようにスワップタスク中に割り込みを受け付けない場合の動作例が示される。通常処理の途中で割り込み要求があると、戻り番地などを退避した後、割り込み処理に分岐され、割り込み処理が終了されると、復帰処理を行った後に、通常処理に戻される。通常処理においてスワップタスク１の実行要求があると、切換え制御回路１９はスワップタスクバッファ１６を選択させて、即座にスワップタスク１の実行に移される。スワップタスク１の実行中は前記割り込み禁止信号ＩＮＨがイネーブルにされるので、割込み要求があっても、その間は割り込み要求は受け付けられない。受け付け禁止されていた割り込み要求は、スワップタスク１の実行完了後に割り込み禁止信号ＩＮＨがディスエーブルにされた後受け付けられる。割り込み処理に分岐する際には先ず、中断されている通常命令処理の戻り番地やレジスタの値を退避させ、その後割り込み処理に分岐される。割り込み処理終了後は、退避された情報を復帰させた後、通常命令処理に戻される。
第１２図には本発明に係るデータプロセッサの第２の実施例が示される。同図に示されるデータプロセッサ１Ａは、スワップタスクバッファ１６又は１７に格納されたプログラムによるタスクの実行中にも割り込みを受付可能にした点が、第１図のデータプロセッサ１と相違される。その他の点については第１図と同じであり、それと同一機能の回路ブロックには同一符号を付してその詳細な説明を省略する。
データプロセッサ１Ａにおいて、割り込み制御回路１３１は割り込み要求を受け付けると、割り込み制御信号ＩＣＮＴをイネーブルにして前記切換え制御回路１９に供給する。切換え制御回路１９は、セレクタ１８がスワップタスクバッファ１６又は１７を選択しているとき、割り込み制御信号ＩＣＮＴがイネーブルにされると、セレクタ１８による選択状態を命令フェッチユニット１０に切換え制御する。更に、切換え直前に選択されていたスワップタスクバッファ１６，１７を特定するための情報（スワップタスク選択情報）を退避する。退避先は、切換え制御回路１９内部の図示を省略する退避用ラッチとすることが望ましい。外部メモリ２等のスタック領域に退避させてもよいが、その場合には、当該割り込み処理からスワップタスクに復帰する時にスワップタスク選択情報を復帰させるのに外部バスアクセスサイクルを起動しなければならず、スワップタスク処理への復帰が遅れるからである。
スワップタスクの実行中に割り込みを受け付ける場合、それ以前に通常命令処理からスワップタスクへの分岐が行われている。したがって、当該割り込み処理を完了した後、現在中断中の通常処理に復帰出来るようにしなければならない。このため、前記セレクタ１８の切換えとスワップタスク選択情報の退避の後、現在中断している通常命令処理の戻り番地とレジスタ情報が退避され、その後、割り込み処理プログラムに分岐される。
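FIG. 13 shows an example of operation when interrupts are accepted during a swap task. If an interrupt request arrives during normal instruction processing, the return address and other state are saved, control branches to the interrupt processing, and when that processing ends, the restore processing is performed and normal instruction processing resumes. If execution of swap task 1 is requested during normal instruction processing, the switching control circuit 19 makes the selector 18 select the swap task buffer 16 and execution moves to swap task 1 immediately. The interrupt control circuit 131 can accept an interrupt even while swap task 1 is executing; when it does, it enables the interrupt control signal ICNT and supplies it to the switching control circuit 19. The switching control circuit 19 then switches the selector 18 to the instruction fetch unit 10 and saves the swap task selection information that identifies the swap task buffer selected at that time. The instruction execution unit 13, having accepted the interrupt, saves to the stack area the return address and register information of the normal instruction processing that was suspended earlier (S1), and then branches to the interrupt processing program. When this interrupt processing ends (T1), the interrupt control signal ICNT is disabled, and the switching control circuit 19 resumes the suspended swap task 1 according to the saved swap task selection information. When the last instruction of swap task 1 is decoded by the instruction decoder 12, the end signal 120 is given to the switching control circuit 19, which makes the selector 18 select the output of the instruction fetch unit 10. The restore processing (S2) that follows the interrupt processing then starts: the saved return address and register information of the normal instruction processing are restored, and normal instruction processing resumes. The restore processing (S2) is deferred until the end of swap task 1, which is resumed after the interrupt processing ends (T1), because at that moment the switching control circuit 19 first switches the selector 18 to the swap task buffer 16 on the basis of the saved swap task selection information.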
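The selector control described for FIG. 13 behaves like a small state machine. The sketch below assumes hypothetical names (`SwitchControl`, `on_icnt`, and so on) and models only the selector and the save latch; the stack save and restore performed by the instruction execution unit are outside its scope.

```python
FETCH = "fetch_unit"


class SwitchControl:
    """Model of the switching control circuit with a save latch."""

    def __init__(self):
        self.selector = FETCH
        self.saved_selection = None            # save latch for swap task selection

    def on_swap_request(self, buffer_name):
        if self.selector == FETCH:
            self.selector = buffer_name

    def on_icnt(self, enabled):
        if enabled and self.selector != FETCH:
            # Interrupt accepted during a swap task: remember which buffer
            # was selected and hand the instruction register to the fetch unit.
            self.saved_selection = self.selector
            self.selector = FETCH
        elif not enabled and self.saved_selection is not None:
            # Interrupt processing finished: resume the suspended swap task.
            self.selector = self.saved_selection
            self.saved_selection = None

    def on_end_signal(self):
        self.selector = FETCH                  # end signal 120 from the decoder


ctl = SwitchControl()
trace = []
ctl.on_swap_request("buffer16"); trace.append(ctl.selector)
ctl.on_icnt(True);               trace.append(ctl.selector)
ctl.on_icnt(False);              trace.append(ctl.selector)
ctl.on_end_signal();             trace.append(ctl.selector)
print(trace)
```

Keeping the selection in a latch inside the controller, rather than on the external stack, is what lets the swap task resume without an external bus cycle.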
第１４図には本発明に係るデータプロセッサの第３の実施例が示される。同図に示されるデータプロセッサ１Ｂはスーパスカラアーキテクチャを有し、複数の命令を２本のパイプラインによって並列的に実行することができる。すなわち、命令レジスタ１１Ａにラッチした命令を命令デコーダ１２Ａで解読して命令実行ユニット１３Ａがその命令を実行する第１の命令実行制御系列と、命令レジスタ１１Ｂにラッチした命令を命令デコーダ１２Ｂで解読して命令実行ユニット１３Ｂがその命令を実行する第２の命令実行制御系列とを有する。第１の命令実行制御系列で行われるパイプライン処理をパイプ０と称し、第２の命令実行制御系列で行われるパイプライン処理をパイプ１と称する。ＬＩＲＡは命令レジスタ１１Ａに対する命令ラッチの指示信号、ＬＩＲＢは命令レジスタ１１Ｂに対する命令ラッチの指示信号であり、前記指示信号ＬＩＲに対応される。
前記命令実行ユニット１３Ａ、１３Ｂは夫々に専用化されたシーケンス制御回路１３２Ａ，１３２Ｂと演算回路１３３Ａ，１３３Ｂを有する。パイプ０とパイプ１との間で生ずるデータコンフリクトのような命令相互間の依存関係は競合管理ユニット２５が命令デコーダ１２Ａ，１２Ｂのデコード結果に基づいて検出する。すなわち、競合管理ユニット２５は、命令デコーダ１２Ａ，１２Ｂからの命令解読結果に基づいて、パイプ０とパイプ１による命令の並列実行が可能か否かについてそれら命令相互間の依存関係を調べ、他の命令の実行結果に依存することになる命令の実行を遅らせるように、制御信号ＡＲＢＡ，ＡＲＢＢによってシーケンス制御回路１３２Ａ，１３２Ｂを制御する。
割り込み制御回路１３１、プログラムカウンタＰＣ，汎用レジスタＧＲは双方の命令実行ユニット１３Ａ，１３Ｂに共有されている。レジスタセットＳ１，Ｓ２は命令実行ユニット１３Ｂに専用化されている。それらの詳細については第１図のデータプロセッサと同じである。
このスーパスカラアーキテクチャーのデータプロセッサ１Ｂにおいて、前記セレクタ１８、切換え制御回路１９及びスワップタスクバッファ１６，１７は命令レジスタ１１Ｂ側の命令実行制御系列に配置されている。そして、第１図のデータプロセッサと同様に、命令フェッチユニット１０、命令キャッシュメモリ１４、内蔵周辺モジュール２０、データキャッシュメモリ１５などが設けられている。第１４図において第１図と同一機能を有するものには同一符号を付してその詳細な説明を省略する。尚、第１４図の場合、双方のスワップタスクバッファ１６、１７は内部バスＢＵＳを介してプログラムの初期ロードが行われるようになっている。
第１５図にはデータプロセッサ１Ｂにおいてデータコンフリクトを生じた場合の制御とタスク切換え制御の内容が例示されている。
例えばパイプラインステージｍで夫々命令レジスタ１１Ａ，１１Ｂにラッチされた命令のデコードステージ（ｍ＋１）で競合管理ユニット２５がデータコンフリクトを検出すると、後から実行されるべき命令の実行は、先に実行されるべき命令の実行結果が得られるまでＮＯＰ（ノン・オペレーション）とされる。すなわち、パイプラインステージ（ｍ＋４）におけるパイプ０のレジスタストア（Ｓｎ）の結果を当該ステージ（ｍ＋４）におけるパイプ１の演算ステージ（Ｅｎ）で利用できるようになるまで、パイプ１のパイプラインステージがＮＯＰとされる。
また、パイプラインステージｍ＋３においてスワップタスク１の実行が要求されると、切換え制御回路１９は、パイプラインステージｍ＋４においてセレクタ１８による選択状態をスワップタスクバッファ１６に切換え、当該パイプラインステージｍ＋４においてパイプ１ではスワップタスク１の先頭の命令に対する命令が命令レジスタ１１Ｂに転送される（Ｃｓ１）。タスク切換え時には前述の通りプログラムカウンタＰＣやレジスタＳＲ，Ｒ０〜Ｒ７の退避を要せずに、スワップタスク１の実行に移ることができる。以下パイプ１のパイプラインステージ毎に処理が順次進められる。このとき、命令実行ユニット１３Ｂは、スワップタスク１の実行にはレジスタセットＳ１を利用する。どのレジスタを利用するかは前記の例と同様に夫々の命令記述によって決定される。切換えられたスワップタスク１の最後の命令がパイプ１におけるパイプラインステージｎ＋１で命令デコーダ１２Ｂにてデコードされると、切換え制御回路１９に終了信号１２０が供給される。切換え制御回路１９はパイプラインステージｎ＋１でセレクタ１８に命令フェッチユニット１０を選択させ、これによってパイプ１のパイプラインステージｎ＋１以降では命令レジスタ１１Ｂには命令フェッチユニット１０から命令が供給される。これによってパイプ１では通常命令処理が再開される。通常命令処理への切換えに際しても、前述の通り、復帰のためのメモリアクセスを必要としない。以上のように、通常命令処理とスワップタスク１との間でのタスク切換えに際して、パイプラインは一切乱れを生じていない。
第１６図には本発明に係るデータプロセッサの第４の実施例が示される。同図に示されるデータプロセッサ１Ｃは前記データプロセッサ１Ｂと同様にスーパスカラアーキテクチャを有し、複数の命令を２本のパイプラインによって並列的に実行することができる。データプロセッサ１Ｂと相違する点は、パイプ０及びパイプ１によって通常命令処理を行っているときのデータコンフリクトの発生をスワップタスクへの切換え要因の一つとして有することである。競合管理ユニット２５はデータコンフリクトの発生に同期する制御信号２５０を切換え制御回路１９に与える。これによって切換え制御回路１９は、データコンフリクトによる通常命令処理におけるパイプ１の空きを利用してスワップタスク１の処理を行なう。但し、パイプ１側の命令レジスタ１１Ｂ及び命令デコーダ１２Ｂは一組しかないので、データコンフリクトによって実行が中断された命令実行の再開に際しては、命令フェッチからやり直すことになる。その制御はシーケンス制御回路１３２Ｂが行なう。その他の構成は図１４のデータプロセッサ１Ｂと同じであるのでその構成の詳細な説明は省略する。
第１７図にはデータコンフリクトを生じたときのタスク切換え制御の内容が例示されている。例えばパイプラインステージｍで夫々命令レジスタ１１Ａ，１１Ｂにラッチされた命令のデコードステージ（ｍ＋１）で競合管理ユニット２５がデータコンフリクトを検出すると、後から実行されるべき命令の実行は、先に実行されるべき命令の実行結果を利用できるまでＮＯＰ（ノン・オペレーション）とされる。すなわち、パイプラインステージ（ｍ＋４）におけるパイプ０のレジスタストア（Ｓｎ）の結果を当該ステージ（ｍ＋４）におけるパイプ１の演算ステージ（Ｅｎ）で利用できるようになるまで、パイプ１のパイプラインステージにおける通常命令処理の実行が停止される。その指示は制御信号ＡＲＢＢによって命令実行ユニット１３Ｂに通知される。このとき、競合管理ユニット２５は制御信号２５０を活性化して切換え制御回路１９に与える。切換え制御回路１９は、それに応答してセレクタ１８にスワップタスクバッファ１６を選択させる。これにより、パイプラインステージm＋１〜ｍ＋５においてパイプ１は、スワップタスク１の処理を行うことができる。スワップタスク１の処理に許容される期間は、データコンフリクトによってパイプ１の通常命令処理が中断される期間であり、その期間は競合管理ユニット２５で制御され、制御信号２５０に反映され、当該制御信号２５０がインアクティブにされることによって、セレクタ１８の選択状態は元の通常命令処理の選択状態（命令フェッチユニット１０の選択状態）に戻される。タスク切換え時には前述の通りプログラムカウンタＰＣやレジスタＳＲ，Ｒ０〜Ｒ７の退避を要せずに、スワップタスク１の実行に移ることができる。このとき、命令実行ユニット１３Ｂは、スワップタスク１の実行にはレジスタセットＳ１を利用する。どのレジスタを利用するかは前記の例と同様に夫々の命令記述によって決定される。
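In the example of FIG. 17, the conflict management unit 25 also detects a data conflict in the decode stage (m+4) of the instructions latched in the instruction registers 11A and 11B at pipeline stage m+3. As above, execution of normal instruction processing in pipe 1 is stopped until the result of the register store (Sn) of pipe 0 at pipeline stage (m+7) can be used in the operation stage (En) of pipe 1 at that stage (m+7), and pipe 1 processes swap task 1 instead. In this example swap task 1 is processed in small fragments, and only at times when a data conflict occurs, but the approach is effective when applied to processing specific to data conflicts or to processing with no constraint on its processing interval. The control signal 250 may also be used as a control signal that defines when the swap task selected by the signals 22 and 23 is actually processed.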
第１８図には本発明に係るデータプロセッサの第５の実施例が示される。同図に示されるデータプロセッサ１Ｄは前記データプロセッサ１Ｂと同様にスーパスカラアーキテクチャを有し、複数の命令を２本のパイプラインによって並列的に実行することができる。データプロセッサ１Ｄは、データプロセッサ１Ｃと同様に、パイプ０及びパイプ１によって通常命令処理を行っているときのデータコンフリクトの発生をスワップタスクへの切換え要因の一つとするが、そのとき実行されるスワップタスクに専用的に割り当てられる命令レジスタ１１Ｃと命令デコーダ１２Ｃを備える点が前記データプロセッサ１Ｃと相違される。命令レジスタ１１Ｃの入力の選択はセレクタ２６が行い、命令デコーダ１２Ｂ又は１２Ｃの出力はセレクタ２７によって選択される。
競合管理ユニット２５はデータコンフリクトの発生に同期してイネーブルにされる制御信号２５０を切換え制御回路１９と前記セレクタ２７に与える。これによってセレクタ２７は命令デコーダ１２Ｃの出力を選択すると共に、制御信号ＬＩＲＢも命令レジスタ１１Ｃ側に供給され、命令レジスタ１１Ｂは現在保持している命令をそのまま維持し、それに代えて命令レジスタ１１Ｃが制御信号ＬＩＲＢに従って新たな命令をラッチ可能にされる。また、切換え制御回路１９は、イネーブル状態の制御信号２５０によってセレクタ２６によってスワップタスクバッファ１６又は１７を命令レジスタ１１Ｃに接続する。どちらを接続するかは選択可能であっても固定的であってもよい。例えばデータプロセッサのイニシャライズリセット時に設定される動作モードに応じて何れを選択するかを決定するようにできる。
データコンフリクトによる通常命令処理のパイプ１の空きを利用して例えばスワップタスク１の処理を行なう場合、パイプ１側にはそれ専用の命令レジスタ１１Ｃ及び命令デコーダ１２Ｃを備えるから、データコンフリクトによって実行が中断された命令実行の再開に際しては、データプロセッサ１Ｃのように命令フェッチからやり直す必要はない。即ちパイプラインは一切乱れない。その他の構成はデータプロセッサ１Ｃと同じであるのでその構成の詳細な説明は省略する。
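FIG. 19 illustrates the task switching control performed in the data processor 1D when a data conflict occurs. For example, when the conflict management unit 25 detects a data conflict in the decode stage (m+1) of the instructions latched in the instruction registers 11A and 11B at pipeline stage m, the instruction that should execute later is held as NOP (no operation) until the execution result of the instruction that should execute first is available. That is, execution of normal instruction processing in pipe 1 is stopped until the result of the register store (Sn) of pipe 0 at pipeline stage (m+4) can be used in the operation stage (En) of pipe 1 at that stage (m+4). The difference from FIG. 17 is that the instruction fetch and decode need not be repeated for the operation stage (En) of pipe 1 at stage m+4 in FIG. 19. The instruction to stop normal instruction processing in pipe 1 is conveyed to the instruction execution unit 13B by the control signal ARBB. At this time the conflict management unit 25 activates the control signal 250 and gives it to the switching control circuit 19. In response, the switching control circuit 19 makes the selector 18 select the swap task buffer 16. Pipe 1 can thus process swap task 1 in pipeline stages m+1 to m+5. The period allowed for swap task 1 is the period during which normal instruction processing in pipe 1 is suspended by the data conflict; that period is controlled by the conflict management unit 25 and reflected in the control signal 250, and when the signal 250 becomes inactive the selector 18 returns to its original selection for normal instruction processing (the instruction fetch unit 10). As described above, execution can move to swap task 1 at a task switch without saving the program counter PC or the registers SR and R0 to R7. The instruction execution unit 13B uses the register set S1 to execute swap task 1. As in the earlier examples, which registers are used is determined by each instruction description.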
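In the example of FIG. 19, the conflict management unit 25 also detects a data conflict in the decode stage (m+4) of the instructions latched in the instruction registers 11A and 11B at pipeline stage m+3. As above, execution of normal instruction processing in pipe 1 is stopped until the result of the register store (Sn) of pipe 0 at pipeline stage (m+7) can be used in the operation stage (En) of pipe 1 at that stage (m+7), and pipe 1 processes swap task 1 instead.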
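The idea of filling the stalled slots of pipe 1 with swap task instructions can be sketched as a simple schedule. This is a deliberately coarse model under assumptions: it treats each pipeline pitch as one issue slot, ignores the individual stages, and uses hypothetical names (`schedule_pipe1`, `stall_pitches`) that do not appear in the specification.

```python
def schedule_pipe1(normal, swap_task, stall_pitches, total_pitches):
    """Return what pipe 1 issues in each pitch.

    In a pitch where the conflict signal (250 in the text) is active,
    pipe 1 issues the next swap task instruction instead of idling.
    """
    normal_iter, swap_iter = iter(normal), iter(swap_task)
    timeline = []
    for pitch in range(total_pitches):
        source = swap_iter if pitch in stall_pitches else normal_iter
        timeline.append(next(source, "NOP"))   # NOP once a stream runs dry
    return timeline


timeline = schedule_pipe1(
    normal=["n0", "n1", "n2"],
    swap_task=["s0", "s1", "s2"],
    stall_pitches={1, 2},
    total_pitches=5,
)
print(timeline)
```

The swap task advances only during the stalled pitches, so it is processed in fragments, and the normal instruction stream continues from where it stopped.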
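FIG. 20 shows an example of a data processing system to which the data processor 1 is applied. The external memory 2 and an input/output circuit 5 are shown as representative devices on the external bus 4 of the data processor 1. The external bus 4 includes an address bus ABUS, a data bus DBUS, and a control bus CBUS. In this system, a DMA transfer control and data conversion control program is stored in the swap task buffer 16 of the data processor 1. This program is started by an interrupt signal 230 assigned to one of the control signals 23. The interrupt signal 230 is supplied from the input/output circuit 5.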
第２１図には前記ＤＭＡ転送制御及びデータ変換制御プログラムによるタスクの一例が示される。すなわち、入出力回路５から割込み信号２３０が切換え制御回路１９に与えられると、データプロセッサ１の処理プログラムは、スワップタスクバッファ１６に格納されているＤＭＡ転送制御及びデータ変換制御プログラムに切換えられる。このプログラムによって処理されるタスクは、入出力回路５からデータを読み込み、読み込んだデータを命令実行ユニット１３でデータ変換（例えば圧縮や座標変換）し、変換されたデータをメモリ２の所定領域に書込み制御する。読み出しアドレスと書込みアドレスは、データ転送及びデータ変換毎に、前記プログラムによって順次更新される。そのようなＤＭＡ転送制御及びデータ変換制御プログラムのプログラム記述の最小単位の例を第２２図に示す。スワップタスクバッファを用いたタスク切換えには前述の通り、通常の割り込み処理のような退避処理を必要とせずパイプラインの乱れもないから、発生したイベントに対して高速に応答することができる。
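In the embodiments above, typified by the data processor 1, setting a DMA transfer control program in the swap task buffers 16 and 17 reduces the burden on the data processor 1 of solving the cache coherency problem, compared with a system configuration such as that illustrated in FIG. 23. In the configuration of FIG. 23, when the cache memory 15 uses write-back, cache coherency is lost if the DMA controller 6 starts a DMA transfer while rewritten cache contents have not yet been reflected in the external memory. The data processor 1E must therefore constantly monitor for the start of DMA transfers that would break cache coherency and, on detecting one, perform a write-back beforehand; that is, the data processor 1E must bear the processing needed to detect operations that do not maintain cache coherency. In contrast, taking the data processor 1 of FIG. 1 as an example, once the processing task of the data processor 1 has been switched to DMA transfer control through the selector 18 and related circuits, the execution unit 13 performs the function of the DMA controller. Therefore, when DMA data transfer is controlled between external memories of the data processor 1, or between an external memory and an external input/output circuit, the address signals or access control information for the DMA transfer always pass through the data cache memory 15. As a result, when the cache memory 15 uses write-back, even if a DMA transfer starts while rewritten cache contents have not been reflected in the external memory, the data not yet reflected there is read from the cache memory 15 into the instruction execution unit 13 and transferred, so the data processor 1 need not bear the processing for detecting operations that do not maintain cache coherency. Note that in the DMA transfer control function implemented by the data processor 1, the transfer data is first read into the data processor 1.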
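The coherency argument above can be made concrete with a toy model. The sketch below is an illustration under assumptions: the `WriteBackCache` class, the dictionary used as external memory, and the two copy functions are hypothetical, and the cache never evicts, so dirty data stays in it for the whole run.

```python
class WriteBackCache:
    """Toy write-back cache: dirty data lives here until written back."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}                        # address -> value not yet in memory

    def write(self, addr, value):
        self.lines[addr] = value               # external memory is not updated

    def read(self, addr):
        return self.lines.get(addr, self.memory[addr])


def external_dma_copy(memory, src, dst, length):
    """DMA controller outside the processor: bypasses the cache (FIG. 23)."""
    for i in range(length):
        memory[dst + i] = memory[src + i]


def swap_task_dma_copy(cache, src, dst, length):
    """DMA performed by the execution unit: every access goes through the cache."""
    for i in range(length):
        cache.write(dst + i, cache.read(src + i))


memory = {0: 1, 1: 2, 10: 0, 11: 0, 20: 0, 21: 0}
cache = WriteBackCache(memory)
cache.write(0, 100)                            # CPU rewrites address 0 in the cache only

external_dma_copy(memory, src=0, dst=10, length=2)
swap_task_dma_copy(cache, src=0, dst=20, length=2)
print(memory[10], cache.read(20))              # stale copy vs. up-to-date copy
```

In this model the external copy picks up the old value at address 0, while the copy routed through the cache sees the rewritten value without any snooping or forced write-back.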
以上本発明者によってなされた発明を実施例に基づいて具体的に説明したが、本発明はそれに限定されるものではなく、その要旨を逸脱しない範囲において種々変更可能であることは言うまでもない。
例えば、スワップタスクバッファの数は上記実施例に限定されず適宜変更することができる。また、キャッシュメモリはデータキャッシュメモリと命令キャッシュメモリを分離した構成に限定されず、命令とデータに兼用されるユニファイド・キャッシュメモリであってもよい。また、パイプラインステージの段数は上記実施例の５段に限定されない。また、スーパスカラデータプロセッサにおける並列動作可能なパイプの数は２本に限定されず、それ以上であってもよい。更に、スワップタスクの内容は必要に応じてどのようなものでも適用でき、制限されない。
産業上の利用可能性
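As described above, the data processor according to the present invention can be widely applied to various data processing systems, in particular systems in which tasks are switched frequently and systems that require improved data processing capability. For example, it can be applied to a computer system for embedded device control that provides, as swap tasks, the transfer and compression of image data captured by a digital camera.
Technical field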
The present invention relates to a data processor, and further to multitasking or task-switching technology in a data processor. It relates, for example, to technology that is effective when applied to a data processor that processes a plurality of tasks in a pipelined manner, and to a data processing system that uses such a data processor.
Background art
Pipeline processing is one technique for speeding up data processing by a data processor. It improves data processing throughput by dividing one large process into a plurality of processing elements and starting a new process at intervals of the time required for each processing element, that is, the pipeline pitch. For example, when the control processing for executing one instruction is divided into instruction fetch, instruction decode, operation, memory access, and register store, each of these is treated as one pipeline stage, an instruction fetch is performed at every pipeline-stage pitch (pipeline pitch), and one instruction is apparently executed per pipeline pitch.
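The throughput gain can be checked with a short calculation. The sketch below assumes an ideal pipeline with no stalls, in which one instruction enters per pipeline pitch; the function names are illustrative only.

```python
def pipelined_pitches(num_instructions, num_stages):
    """Pitches needed when a new instruction enters every pitch."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages pitches; each later one adds one.
    return num_stages + num_instructions - 1


def unpipelined_pitches(num_instructions, num_stages):
    """Pitches needed when each instruction finishes before the next starts."""
    return num_stages * num_instructions


STAGES = ["fetch", "decode", "operation", "memory access", "register store"]
print(pipelined_pitches(100, len(STAGES)))     # 104
print(unpipelined_pitches(100, len(STAGES)))   # 500
```

For long instruction streams the pipelined count approaches one pitch per instruction, which is the "apparent" one-instruction-per-pitch execution described above.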
When tasks are switched in the middle of such pipeline processing, the values of the program counter, status register, data registers, and so on must be saved to a stack area so that the currently executing task can be resumed later.
However, such save processing takes a considerable amount of time, so the pipeline is disturbed. In particular, when complicated processing is performed, task switching occurs frequently even when the program execution state is viewed locally. As a result, data processing throughput cannot be improved as expected even though pipeline processing has been adopted.
JP-A-62-237531 discloses a technique for executing a plurality of programs in a time-sharing manner. Because processing becomes complicated when interrupts are used, two sets of a program ROM and a program counter are provided, the ROM access timing of each set is shifted, and a selector alternately selects the ROM outputs and supplies instructions to an instruction register, so that a plurality of programs can easily be executed in a time-sharing manner.
This technique simply alternates between programs on the basis of a clock signal and executes them in complete time division. It assumes that a plurality of programs are merely executed in apparent parallel, and does not consider switching tasks in response to a specific event occurring inside or outside the data processor. A data processor used for general purposes such as device control must at least take this into account, reducing pipeline disturbance at task switching and improving throughput.
A data processor with a superscalar architecture can execute a plurality of instructions simultaneously on a plurality of pipelines. Such a data processor must manage dependencies between instructions, typified by a data conflict in which one instruction uses the execution result of another. When it becomes clear that instructions to be executed in parallel have a data conflict, some of the pipelines stop executing and wait for the other instruction to complete. The present inventors found that, if a pipeline left idle by a data conflict is to be used to execute another task, the processing time associated with task switching must likewise be shortened and pipeline disturbance minimized.
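The dependency check can be sketched in a few lines. This is a minimal illustration under assumptions: the `Instr` record, the register names, and the `issue_pair` helper are hypothetical, and only the read-after-write case named in the text is checked.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dest: Optional[str]          # register written, if any
    srcs: Tuple[str, ...]        # registers read


def has_data_conflict(first, second):
    """True if `second` reads the register that `first` writes."""
    return first.dest is not None and first.dest in second.srcs


def issue_pair(first, second):
    """Return what the two pipelines execute in this pitch."""
    if has_data_conflict(first, second):
        return first.name, "NOP"     # the dependent instruction waits
    return first.name, second.name


add = Instr("add", "r1", ("r2", "r3"))
sub = Instr("sub", "r4", ("r1", "r5"))   # needs the result of add
mov = Instr("mov", "r6", ("r7",))        # independent of add

print(issue_pair(add, sub))
print(issue_pair(add, mov))
```

The `NOP` slot in the first pair is the idle pipeline time that the invention proposes to use for another task.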
In addition, the data processor can be equipped with a cache memory in order to speed up operand access. When the cache line of the cache memory is rewritten, the contents of the corresponding memory must be rewritten. For example, if only the data processor occupies the main memory, the rewritten content may be reflected in the main memory only when the cache line is replaced. Such an operation is called a write back.
However, a DMA (Direct Memory Access) controller connected outside the data processor may read stale data from the main memory, that is, data in which a rewrite of the cache memory has not yet been reflected, and transfer it. This risk is called the cache coherency problem. To eliminate it, the cache memory can adopt a write-through method, which performs a memory write on every memory write operation even on a cache hit, and can be made non-blocking by using a write buffer. However, if memory writes occur frequently for the sake of cache coherency, the data processor uses up the data transfer capacity of the bus connecting the DMA controller and the main memory, and the transfer speed is limited when the DMA controller performs high-speed data transfer.
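The difference in bus traffic between the two write policies can be illustrated by counting external memory writes. The sketch below rests on several assumptions that are not in the text: a direct-mapped cache with one word per line, write-allocate behaviour, and a hypothetical function name `bus_writes`. Dirty lines still in the cache at the end are not counted.

```python
def bus_writes(addresses, num_lines, policy):
    """Count external memory writes for a stream of CPU write addresses."""
    tags = [None] * num_lines        # address currently held by each line
    dirty = [False] * num_lines
    writes = 0
    for addr in addresses:
        idx = addr % num_lines       # direct-mapped placement
        if policy == "write-through":
            tags[idx] = addr
            writes += 1              # every CPU write also goes to memory
        else:                        # write-back
            if tags[idx] != addr and dirty[idx]:
                writes += 1          # a dirty line is written back on replacement
            tags[idx] = addr
            dirty[idx] = True
    return writes


stream = [0, 0, 0, 0, 4, 0]          # repeated writes, then two replacements
print(bus_writes(stream, 4, "write-through"))
print(bus_writes(stream, 4, "write-back"))
```

With repeated writes to the same address, write-through occupies the bus on every write, whereas write-back touches it only when a dirty line is replaced.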
Therefore, to maintain cache coherency with the write-back method instead of write-through, a technique can be adopted that detects an operation that would break cache coherency and performs the write-back at that point. For example, when the data processor detects (by bus snooping) that the DMA controller is making a read access to data held in the cache memory, it inserts an operation that writes the data back and only then allows the DMA transfer. However, this increases the burden on the data processor of detecting operations that do not maintain cache coherency.
An object of the present invention is to provide a data processor that can reduce processing accompanying task switching and can improve data processing capability.
Another object of the present invention is to provide a data processor capable of minimizing pipeline disturbance during task switching.
Still another object of the present invention is to enable a data processor of a superscalar architecture to effectively use a pipeline freed by data conflict by switching to execution of another task.
Another object of the present invention is to provide a data processor capable of minimizing the burden for maintaining cache coherency during DMA transfer when a write-back cache memory is incorporated.
The above and other objects and novel features of the present invention will become apparent from the following description of the present specification.
Disclosure of the invention
In the present invention, as illustrated in FIG. 1, a data processor (1) in which an instruction fetch unit (10) fetches instructions, an instruction decoder (12) decodes the instruction latched in an instruction register (11), and an instruction execution unit (13) executes the instruction based on the decode result comprises: a plurality of task buffers (16, 17), each having a program storage area (160, 170) and a pointer (161, 171) for sequentially reading the instructions stored in that area; register means (S1, S2) dedicated to each task buffer and disposed in the instruction execution unit; a selector (18) that selects one of the plurality of task buffers and the instruction fetch unit and connects it to the instruction register; switching control means (19) that causes the selector to select the instruction fetch unit in the initial state and controls the selection by the selector according to an event generated internally or externally; and interface means (21, BUS) for interfacing with the outside so that all or some of the plurality of task buffers can be written under the control of the instruction execution unit.
Each task buffer has its own pointer, and the instruction execution unit has register means dedicated to each task buffer. Therefore, when the task to be executed is switched between normal instruction processing, which follows the program supplied by the instruction fetch unit, and swap task processing, which follows the program in a task buffer, no processing is needed to access a stack area in external memory in order to save or restore the execution state of the interrupted normal instruction processing (for example, the values of the program counter and general-purpose registers). This speeds up task switching and reduces the processing accompanying it, which contributes to improving the data processing capability of the data processor.
When the instruction register, the instruction decoder, and the instruction execution unit advance the processing in units of pipeline stages and perform the pipeline processing of the instructions, the above can minimize the disturbance of the pipeline.
The instruction execution unit outputs an instruction signal (LIR) for latching an instruction in the instruction register, and the selector supplies the instruction signal to the instruction fetch unit or the task buffer selected by the switching control means. The instruction fetch unit can update the instruction to be supplied to the instruction register based on the instruction signal, and the task buffer can update its pointer based on the instruction signal. This control simplifies pointer control of the task buffer.
As a technique for returning from swap task processing to normal instruction processing, the switching control means can return the selector to the state in which the instruction fetch unit is selected, based on the result of the instruction decoder decoding an instruction supplied from the selected task buffer. In other words, normal instruction processing resumes upon completion of the selected swap task processing.
When swap task processing is to be completed with the highest priority, the switching control means may, as illustrated in FIG. 1, output an interrupt disable signal (INH) that invalidates interrupt signals input to the instruction execution unit while a task buffer is selected. As a result, no interrupt request is accepted during swap task processing.
When interrupts are to be accepted, as in the data processor (1A) illustrated in the drawings, the switching control means (19) may, while a task buffer (16, 17) is selected, return the selector (18) to the state in which the instruction fetch unit is selected in response to the control signal ICNT, which indicates that the instruction execution unit (13) has accepted an interrupt, and may save the task buffer selection state that was in effect immediately before.
The data processor (1) may include a data cache memory (15) between the instruction execution unit and the outside. As illustrated in FIG. 20, this data processor is connected via a bus (4) to a plurality of peripheral circuits (2, 5), such as a memory, to constitute a data processing system. If a DMA transfer control program, or a DMA transfer and data conversion control program, is set in a task buffer, the burden on the data processor of solving the cache coherency problem can be reduced. That is, once the processing task of the data processor has been switched to DMA transfer control processing via the selector, the instruction execution unit performs the function of a DMA controller. Therefore, when a DMA data transfer is controlled between external memories of the data processor, or between an external memory and an external input/output circuit, the address signals and access control information for DMA transfer control always pass through the data cache memory. In other words, when the cache memory adopts the write-back method, even if a DMA transfer is started while a rewrite of the data cache memory has not yet been reflected in the external memory, the data not yet reflected in the external memory is read from the data cache memory into the instruction execution unit and transferred. As a result, the data processor no longer needs to detect a DMA transfer operation that would break cache coherency and perform a write-back in advance upon detection, which reduces its processing burden. Naturally, with the DMA transfer control function realized by the data processor, the transfer data is first read into the data processor.
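As a rough illustration of why a DMA transfer executed by the instruction execution unit sees up-to-date data, the sketch below routes every source read through a write-back data cache. The names `WriteBackDataCache`, `dma_swap_task`, and `io_port` are hypothetical, and modeling the external input/output circuit as a Python list is an assumption made only for this example.

```python
# Hypothetical sketch: DMA transfer performed as a swap task, so every
# source read passes through the write-back data cache.

class WriteBackDataCache:
    def __init__(self, external_memory):
        self.external_memory = external_memory  # address -> value
        self.lines = {}                          # cached (possibly dirty) data

    def write(self, addr, value):
        # Write-back: the rewrite is held in the cache, not in external memory.
        self.lines[addr] = value

    def read(self, addr):
        # A hit returns the cached copy, including data not yet written back.
        if addr in self.lines:
            return self.lines[addr]
        return self.external_memory[addr]


def dma_swap_task(cache, src, length, io_port):
    # The execution unit acts as the DMA controller: each word is read
    # through the cache and then passed on to the external I/O circuit.
    for offset in range(length):
        io_port.append(cache.read(src + offset))


external_memory = {0: 10, 1: 11, 2: 12}
cache = WriteBackDataCache(external_memory)
cache.write(1, 99)      # rewrite not yet reflected in external memory

io_port = []            # stands in for an external input/output circuit
dma_swap_task(cache, src=0, length=3, io_port=io_port)
```

In this sketch the transferred block contains the rewritten word even though external memory still holds the old one, and no snooping or advance write-back is involved.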
The task switching means can also be applied to the superscalar data processors (1B, 1C) illustrated in the drawings. That is, the data processor (1B, 1C) includes a plurality of instruction execution control systems, in each of which an instruction decoder (12A, 12B) decodes the instruction latched in an instruction register (11A, 11B) and an instruction execution unit (13A, 13B) executes the instruction; includes an instruction fetch unit (10) that fetches instructions; and can execute a plurality of instructions in parallel using the plurality of instruction execution control systems. It has a plurality of task buffers (16, 17), each having a program storage area and a pointer for sequentially reading the instructions stored in that area; register means (S1, S2) dedicated to each task buffer and arranged in a specific instruction execution unit; a selector (18) that selects one of the plurality of task buffers and the instruction fetch unit and connects it to the instruction register corresponding to the specific instruction execution unit; and switching control means (19) that causes the selector to select the instruction fetch unit in the initial state and controls the selection by the selector according to an event generated internally or externally. In this data processor as well, normal instruction processing and swap task processing can be switched within one instruction execution control system, so that, as described above, task switching is faster, the burden on the data processor accompanying task switching is reduced, and pipeline disturbance is minimized. The high data processing capability that the superscalar architecture is intended to deliver can therefore be secured.
In a superscalar data processor capable of executing a plurality of instructions in parallel, when contention between instructions, such as a data conflict, is arbitrated by hardware, a contention management unit (25) is provided. Based on the instruction decode results from the instruction decoders included in the respective instruction execution control systems, it checks the dependencies between instructions to determine whether they can be executed in parallel by different instruction execution control systems, and delays the execution of any instruction that depends on the execution result of another instruction.
At this time, as shown in FIG. 16, when the contention management unit delays execution of a specific instruction because of a data conflict or the like, the switching control means causes the selector (18) to select a task buffer in response to the control signal 250 that notifies it of the delay. The instruction execution control system, or pipe, whose normal instruction processing has been interrupted can thereby be switched to swap task processing and used effectively. In particular, because the execution state of the interrupted normal instruction processing need not be saved at task switching, as described above, the transition to swap task processing can be made promptly.
A contention state such as a data conflict is determined by the contention management unit (25) based on the instruction decode result, so the instruction whose processing is to be delayed has already been decoded at that point. Processing is then switched to swap task processing. However, when the interrupted normal instruction processing and the swap task processing started in its place use the same instruction register and instruction decoder, then, as illustrated in the figure, the instruction fetched (In) at pipeline stage m in pipe 1 must be fetched again at stage m + 2 of pipe 1, and the instruction decoded (Dn) at pipeline stage m + 1 in pipe 1 must be decoded again at stage m + 3 of pipe 1; in this sense the pipeline is disturbed. Therefore, when returning after swap task processing to the interrupted normal instruction processing, execution must resume from the instruction fetch.
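The role of the contention management unit can be sketched as a dependency check on two decoded instructions, followed by a decision on what the second pipe does. This is a simplified illustration only: the `Instr` tuple, the function names, and the restriction to a single read-after-write check are assumptions, not the circuit described in the specification.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def has_data_conflict(first: Instr, second: Instr) -> bool:
    # The second instruction uses the execution result of the first.
    return first.dest in second.srcs


def schedule_pair(first: Instr, second: Instr, swap_task_ready: bool):
    # Returns what pipe 0 and pipe 1 do in this slot. On a conflict, the
    # delayed pipe is either handed to a swap task (corresponding to the
    # notification by control signal 250) or simply stalls.
    if not has_data_conflict(first, second):
        return ("first", "second")
    return ("first", "swap_task" if swap_task_ready else "stall")


a = Instr("R1", ("R2", "R3"))
b = Instr("R4", ("R1", "R5"))   # reads R1, which a writes
c = Instr("R6", ("R7",))        # independent of a
```

With these definitions, `schedule_pair(a, b, True)` hands pipe 1 to the swap task, whereas `schedule_pair(a, c, True)` lets both instructions run in parallel.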
To prevent any pipeline disturbance when switching from normal instruction processing to swap task processing because of the data conflict described above, an instruction register (11C) and an instruction decoder (12C) dedicated to swap task processing may be added to the instruction execution control system, as in the data processor (1D) illustrated in the drawings. That is, the data processor (1D) includes a plurality of instruction execution control systems, in each of which an instruction decoder (12A, 12B) decodes the instruction latched in an instruction register (11A, 11B) and an instruction execution unit (13A, 13B) executes the instruction; includes an instruction fetch unit (10) for fetching instructions; and can execute a plurality of instructions in parallel using the plurality of instruction execution control systems. The data processor (1D) has: a plurality of task buffers (16, 17), each having a program storage area and a pointer for sequentially reading the instructions stored in that area; a specific task instruction register (11C) dedicated to the task buffers; a specific task instruction decoder (12C) that decodes the instruction latched in the specific task instruction register; register means (S1, S2) dedicated to each task buffer and arranged in a specific instruction execution unit; a first selector (18) that selects one of the plurality of task buffers and the instruction fetch unit and connects it to the instruction register corresponding to the specific instruction execution unit; a second selector (26) that selects one of the plurality of task buffers and connects it to the specific task instruction register; a third selector (27) that selectively connects either the output of the instruction decoder corresponding to the specific instruction execution unit or the output of the specific task instruction decoder to the specific instruction execution unit;
a contention management unit (25) that, based on the instruction decode results from the instruction decoders included in the respective instruction execution control systems, checks the dependencies between instructions to determine whether they can be executed in parallel by different instruction execution control systems, delays the execution of a specific instruction that depends on the execution result of another instruction, and causes the third selector to select the specific task instruction decoder when it delays that instruction; and switching control means (19) that, in the initial state, causes the first selector to select the instruction fetch unit and places the second selector in a non-selected state, controls the selection by the first selector according to an event generated internally or externally, and, in response to the third selector selecting the specific task instruction decoder, causes the second selector to select a task buffer according to an event generated internally or externally.
[Brief description of the drawings]
FIG. 1 is a block diagram of a data processor according to a first embodiment of the present invention.
FIG. 2 is an example block diagram of an instruction fetch unit.
FIG. 3 is a block diagram showing a first example of a swap task buffer.
FIG. 4 is a block diagram showing a second example of the swap task buffer.
FIG. 5 is a block diagram showing a third example of the swap task buffer.
FIG. 6 is a block diagram showing a fourth example of the swap task buffer.
FIG. 7 is a diagram illustrating an example of a register set included in the instruction execution unit.
FIG. 8 is an explanatory diagram showing an example of task switching operation in the data processor according to the first embodiment.
FIG. 9 is an explanatory diagram showing an example of switching operation between normal instruction processing and interrupt processing.
FIG. 10 is an example timing chart showing the relationship between task switching and pipeline in the data processor according to the first embodiment.
FIG. 11 is an example operation timing chart of the data processor according to the first embodiment that does not accept an interrupt during the swap task.
FIG. 12 is a block diagram of a data processor according to the second embodiment of the present invention.
FIG. 13 is an example operation timing chart of the data processor according to the second embodiment for accepting an interrupt during the swap task.
FIG. 14 is a block diagram of a data processor according to a third embodiment of the present invention.
FIG. 15 is an example timing chart showing the contents of control and task switching control when a data conflict occurs in the data processor according to the third embodiment.
FIG. 16 is a block diagram of a data processor according to the fourth embodiment of the present invention.
FIG. 17 is a timing chart showing the contents of task switching control when a data conflict occurs in the data processor according to the fourth embodiment.
FIG. 18 is a block diagram of a data processor according to a fifth embodiment of the present invention.
FIG. 19 is a timing chart showing task switching control performed by the data processor according to the fifth embodiment when a data conflict occurs.
FIG. 20 is a block diagram showing an example of a data processing system to which the data processor of the present invention is applied.
FIG. 21 is an explanatory diagram showing an example of tasks by the DMA transfer control and data conversion control program.
FIG. 22 is an explanatory diagram showing an example of a minimum unit of program description of the DMA transfer control and data conversion control program.
FIG. 23 is a block diagram showing an example of a data processing system including a cache memory employing a write-back method and a DMA controller arranged outside the data processor.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 shows a block diagram of a data processor according to a first embodiment of the present invention. The data processor 1 shown in the figure is not particularly limited, but is formed on a single semiconductor substrate such as single crystal silicon by a known semiconductor integrated circuit manufacturing technique.
In FIG. 1, 10 is an instruction fetch unit, 11 is an instruction register, 12 is an instruction decoder, 13 is an instruction execution unit, 14 is an instruction cache memory, 15 is a data cache memory, 16 and 17 are representatively shown swap task buffers, 18 is a selector, 19 is a switching control circuit, and 20 is a circuit block that collectively denotes built-in peripheral modules.
The instruction execution unit 13 includes a program counter PC, a general-purpose register GR, register sets S1 and S2 individually assigned to the swap task buffers 16 and 17, an interrupt control circuit 131, a sequence control circuit 132, an arithmetic circuit 133, and the like.
In the data processor 1 of the present embodiment, the instruction register 11, the instruction decoder 12, and the instruction execution unit 13 advance the processing in units of pipeline stages, and pipeline the instructions. The sequence control circuit 132 controls operation cycles in the instruction register 11, the instruction decoder 12, and the instruction execution unit 13 in synchronization with an operation reference clock signal (not shown) of the data processor 1.
Although not particularly limited, the instruction execution unit 13 is interfaced to the outside via the data cache memory 15 connected to the internal bus BUS. The data cache memory caches the external memory 2 or the like. The data cache memory 15 is illustrated as a circuit block including a cache data portion, a cache tag portion, and a cache controller (not shown). The cache data portion holds part of the data held in the external memory 2 or the like. The cache tag portion holds part of the address (the address tag) as a cache tag in correspondence with the data held by the cache data portion. In the case of a cache hit in an external access, the cache controller outputs the data relating to the hit from the cache data portion to the internal bus BUS, or writes the data relating to the hit to the cache data portion as a new entry. In the case of a cache miss, the data read from the external memory 2 or the like is supplied to the internal bus BUS, or a write access is made to the external memory 2 or the like. A cache line can be replaced on a cache miss. Although not particularly limited, the cache controller uses the so-called write-back method, in which the contents of the cache data portion rewritten on a cache hit are written back to the external memory 2 or the like only when the cache line is replaced.
The program counter PC holds an instruction address to be executed next. Although not particularly limited, the instruction fetch unit 10 fetches an instruction that is expected to be executed in the future (for example, a plurality of instructions ahead from an instruction specified by the program counter PC) based on the value of the program counter PC. The instruction to be fetched is not particularly limited, but is stored in the external memory 3. In this embodiment, an instruction cache memory 14 is arranged between the external memory 3 and the instruction fetch unit 10.
The instruction cache memory 14 is illustrated as a circuit block including a cache data portion, a cache tag portion, and a cache controller (not shown). The cache data portion holds part of the instructions held in the external memory 3 or the like. The cache tag portion holds part of the address (the address tag) as a cache tag in association with the instruction held by the cache data portion. In a memory access by the instruction fetch unit 10, the cache controller supplies the instruction held in the cache data portion to the instruction fetch unit 10 in the case of a cache hit; in the case of a cache miss, it reads the instruction from the external memory 3 or the like and supplies it to the instruction fetch unit 10.
Although not particularly limited, the instruction fetch unit 10 has a first-in first-out buffer function and can prefetch instructions several words ahead of the value of the program counter PC. For example, as shown in FIG. 2, four stages of latches 100A to 100D are arranged in series, and an instruction from the outside or from the instruction cache memory 14 can be taken into a latch directly through the selectors 101A to 101C without passing through the preceding latch. Reference numeral 102 denotes a control circuit for instruction fetch; it outputs the address of the instruction to be fetched based on the value of the program counter PC, and controls the latches 100A to 100D so that input instructions are held and output in first-in first-out order. Although not particularly limited, the latches 100A to 100D latch instructions in units of two words, while the instruction decoder 12 decodes instructions in units of one word. Accordingly, the selector 103 divides the output of the latch 100D into a lower word and an upper word and outputs them.
Each of the swap task buffers 16 and 17 has a program storage area 160 or 170 and a pointer 161 or 171 for sequentially reading the instructions stored in that storage area. Although not particularly limited, the program storage area 160 of the swap task buffer 16 can be written by the instruction execution unit 13 via the internal bus BUS. The program storage area 170 of the swap task buffer 17 can be written via a serial interface 21 (whose control line is not shown) controlled by the instruction execution unit 13.
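The behavior of the prefetch buffer can be approximated in software as a four-entry first-in first-out queue of two-word units whose head entry is handed to the decoder one word at a time. This is a behavioral sketch only: the 16-bit word width, the upper-word-first output order, and the names `InstructionFetchFifo`, `push`, and `next_word` are assumptions that the text does not specify.

```python
from collections import deque

WORD_BITS = 16                     # assumed word width
WORD_MASK = (1 << WORD_BITS) - 1


class InstructionFetchFifo:
    """Four two-word entries, standing in for latches 100A to 100D."""

    def __init__(self, depth=4):
        self.depth = depth
        self.entries = deque()
        self.upper_pending = True  # assumed order: upper word first

    def push(self, double_word):
        # A new two-word unit is accepted only while a latch is free.
        if len(self.entries) >= self.depth:
            raise OverflowError("prefetch buffer full")
        self.entries.append(double_word)

    def next_word(self):
        # Role of selector 103: split the head entry into single words.
        head = self.entries[0]
        if self.upper_pending:
            word = (head >> WORD_BITS) & WORD_MASK
            self.upper_pending = False
        else:
            word = head & WORD_MASK
            self.upper_pending = True
            self.entries.popleft()  # both halves consumed
        return word


fifo = InstructionFetchFifo()
fifo.push(0x11112222)
fifo.push(0x33334444)
words = [fifo.next_word() for _ in range(4)]
```

The deque hides the bypass selectors 101A to 101C; in hardware they let a new entry drop straight into the first free latch, which a software queue does implicitly.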
Examples of the swap task buffers 16 and 17 are shown in the drawings. In the example of FIG. 3, the storage area 160 (170) is built from a shift register, in which a plurality of parallel-input/output latches LAT are cascaded, and a selector SEL that selects the parallel outputs of the latches LAT one bit at a time and outputs the selected bits in parallel. The pointer 161 (171) causes the selector SEL to select the outputs of the latches LAT in order from the upper side or the lower side. For example, if n stages of m-bit latches LAT are provided, instructions can be output sequentially m times in units of n bits. Data is written to the shift register under the control of the serial interface 21 or the instruction execution unit 13. The number of stages of the latches LAT is determined according to the number of bits of the instruction, and FIG. 4 shows a configuration in which the number of stages differs from that of the preceding example. In the example of FIG. 5, a RAM (Random Access Memory) comprising a memory cell array MCA1, in which dynamic or static memory cells are arranged in a matrix, and an address decoder ADEC1 is used as the storage area 160, and the pointer 161 generates the access address for the RAM. The instruction execution unit 13 controls writing to the RAM. In the example of FIG. 6, a ROM (Read Only Memory) comprising a memory cell array MCA2, in which nonvolatile memory elements are arranged in a matrix, and an address decoder ADEC2 is used as the storage area 160, and the pointer 161 generates the access address for the ROM.
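The statement that n stages of m-bit latches yield m instructions of n bits each can be checked with a small model of the shift-register buffer. The function name `read_instructions` and the choice of placing the first latch in the most significant bit of each output word are assumptions made for this sketch.

```python
def read_instructions(latches, m, from_upper=True):
    """Read n-bit words out of n cascaded m-bit latches.

    The pointer steps through the m bit positions; at each step the
    selector takes that one bit from every latch and outputs the n
    selected bits in parallel as one instruction word.
    """
    positions = range(m - 1, -1, -1) if from_upper else range(m)
    words = []
    for bit in positions:
        word = 0
        for latch in latches:  # first latch ends up in the top bit (assumed)
            word = (word << 1) | ((latch >> bit) & 1)
        words.append(word)
    return words


# n = 4 latches of m = 3 bits each give 3 instruction words of 4 bits.
latches = [0b101, 0b110, 0b011, 0b000]
words = read_instructions(latches, m=3)
```

Reading from the upper side gives the words 0b1100, 0b0110, and 0b1010; passing `from_upper=False` returns the same words in the opposite order.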
The swap task buffers 16 and 17 store a program composed of an instruction sequence for realizing a single process. If a single unit of processing realized by a specific instruction sequence is defined as a task, a processing program related to the specific task is stored. For example, a processing program for DMA transfer and a processing program for data compression / decompression are set. The loading of the processing program to the swap task buffers 16 and 17 is not particularly limited, but can be performed via the serial interface 21 or the instruction execution unit 13 at the time of system initialization by a power-on reset or the like.
The selector 18 selects one of the swap task buffers 16 and 17 and the instruction fetch unit 10 and connects it to the instruction register 11. The switching control circuit 19 controls this connection. The switching control circuit 19 causes the selector 18 to select the instruction fetch unit 10 at the initialization reset of the data processor 1. Thereafter, it causes the selector 18 to select the output of the swap task buffer 16 or 17 in response to a predetermined event generated internally or externally, for example an interrupt signal 22 from the built-in peripheral circuit module 20 or a notification signal 23 indicating that a predetermined event has occurred externally. Which swap task buffer is selected can be controlled either by giving the switching control circuit 19 a correspondence table between event sources and swap task buffers, or by assigning a specific swap task buffer to each event notification signal.
Although not particularly limited, the instruction execution unit 13 outputs an instruction signal LIR that causes the instruction register 11 to latch an instruction. The instruction register 11 latches the instruction in synchronization with the instruction signal LIR. At this time, the selector 18 supplies the instruction signal LIR to whichever of the instruction fetch unit 10 and the swap task buffers 16 and 17 is selected by the switching control circuit 19. Upon receiving the instruction signal LIR, the instruction fetch unit 10 updates the instruction to be supplied to the instruction register 11. Upon receiving the instruction signal LIR, the swap task buffer 16 or 17 updates its pointer 161 or 171. As a result, the pointer 161 or 171 of the swap task buffer 16 or 17 selected by the selector 18 is updated sequentially, and the instruction corresponding to the pointer value is supplied from the storage area 160 or 170 to the instruction register 11.
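The correspondence-table variant of the selection control can be sketched as a small state machine. The class name `SwitchingControl`, the string labels for event sources and buffers, and the decision to ignore new events while a swap task is already selected are all assumptions of this sketch; the text does not state how overlapping requests are handled.

```python
FETCH_UNIT = "fetch_unit_10"


class SwitchingControl:
    def __init__(self, event_table):
        # Correspondence table: event source -> swap task buffer.
        self.event_table = dict(event_table)
        self.selected = FETCH_UNIT      # state after initialization reset

    def on_event(self, source):
        # Select the buffer registered for this event source, if any.
        # Assumption: events arriving during a swap task are ignored.
        target = self.event_table.get(source)
        if target is not None and self.selected == FETCH_UNIT:
            self.selected = target
        return self.selected

    def on_end_signal(self):
        # End of the swap task program: reselect the instruction fetch unit.
        self.selected = FETCH_UNIT
        return self.selected


control = SwitchingControl({"signal_22": "buffer_16", "signal_23": "buffer_17"})
after_reset = control.selected
after_event = control.on_event("signal_23")
after_end = control.on_end_signal()
```

The alternative mentioned in the text, one dedicated buffer per notification signal, corresponds to hard-wiring the table instead of making it configurable.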
The end of execution of the program stored in the swap task buffers 16 and 17 is recognized by the switching control circuit 19 based on the end signal 120 output by decoding the instruction executed at the end of the program by the instruction decoder 12. When receiving the decoding result (end signal 120), the switching control circuit 19 returns the selector 18 to the selected state of the instruction fetch unit 10.
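How the instruction signal LIR steps only the selected instruction source, and how decoding the final instruction returns control to the fetch unit, can be sketched behaviorally as follows. Pipeline timing is ignored here, and the `"END"` marker, the pointer reset on completion, and the names `FetchUnit`, `SwapTaskBuffer`, and `run` are hypothetical.

```python
class FetchUnit:
    def __init__(self, instructions):
        self.instructions = list(instructions)
        self.index = 0

    def on_lir(self):
        # Supply the current instruction and advance to the next one.
        instr = self.instructions[self.index]
        self.index += 1
        return instr


class SwapTaskBuffer:
    def __init__(self, program):
        self.storage = list(program)   # program storage area
        self.pointer = 0               # read pointer

    def on_lir(self):
        # Supply the instruction at the pointer, then update the pointer.
        instr = self.storage[self.pointer]
        self.pointer += 1
        return instr


def run(fetch_unit, buffer, request_at, total_lir):
    selected = fetch_unit              # reset state of the selector
    trace = []
    for cycle in range(total_lir):
        if cycle == request_at:
            selected = buffer          # event: switch to the swap task
        instr = selected.on_lir()      # LIR reaches only the selected source
        trace.append(instr)
        if instr == "END":             # stands in for end signal 120
            buffer.pointer = 0         # assumption: pointer restarts
            selected = fetch_unit
    return trace


fetch = FetchUnit(["n0", "n1", "n2", "n3"])
buf = SwapTaskBuffer(["s0", "s1", "END"])
trace = run(fetch, buf, request_at=2, total_lir=6)
```

Because the fetch unit receives no LIR while the buffer is selected, it resumes at `n2` without any bookkeeping of its own.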
FIG. 7 shows a register configuration example of the instruction execution unit 13. The general-purpose register GR includes registers SR and R0 to R15. SR is assigned to a status register, R0 to R7 are assigned to data registers and address registers, and R8 to R15 are assigned to data registers, address registers, a stack pointer, and the like. The register set S1 includes registers S1SR and S1R0 to S1R7. The register set S2 includes registers S2SR and S2R0 to S2R7. The register sets S1 and S2 are used in place of the registers SR and R0 to R7 of the general-purpose register GR, and each register has a unique register address. The register set S1 is assigned exclusively to the execution of the program stored in the swap task buffer 16, the register set S2 is assigned exclusively to the execution of the program stored in the swap task buffer 17, and the registers SR and R0 to R7 of the general-purpose register GR are assigned to the execution of instructions output from the instruction fetch unit 10.
Although not particularly limited, which of the registers SR and R0 to R7 of the general-purpose register GR, the register set S1, or the register set S2 is used is specified by the register number and the type of task. When the instruction output from the instruction fetch unit 10 is selected, the instruction execution unit 13 uses the registers SR and R0 to R15 for instruction execution. When the instruction output from the swap task buffer 16 is selected, the instruction execution unit 13 uses the registers S1SR and S1R0 to S1R7 for instruction execution. When the instruction output from the swap task buffer 17 is selected, the instruction execution unit 13 uses the registers S2SR and S2R0 to S2R7 for instruction execution.
As described above, the swap task buffers 16 and 17 have their own pointers 161 and 171, and the instruction execution unit 13 has register sets S1 and S2 dedicated to the swap task buffers 16 and 17, respectively. Therefore, when processing is switched between the instruction fetch unit 10 and the swap task buffers 16 and 17, no processing is needed to access a stack area in the external memory 2 or the like in order to save or restore the values of the program counter PC and the general-purpose register GR.
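The mapping from a register number plus a task type to a physical register can be written out as a lookup. The task labels `"normal"`, `"swap1"`, and `"swap2"` and the function name `physical_register` are hypothetical; the register names follow the description of FIG. 7 above.

```python
GENERAL = ["SR"] + [f"R{i}" for i in range(16)]   # SR, R0 to R15
BANKED = ["SR"] + [f"R{i}" for i in range(8)]     # SR, R0 to R7
PREFIX = {"normal": "", "swap1": "S1", "swap2": "S2"}


def physical_register(task, name):
    # Normal processing sees SR and R0 to R15; a swap task sees only its
    # own dedicated copies of SR and R0 to R7.
    allowed = GENERAL if task == "normal" else BANKED
    if name not in allowed:
        raise ValueError(f"{name} is not available to task {task}")
    return PREFIX[task] + name


registers = {}
registers[physical_register("normal", "R0")] = 5   # written by normal code
registers[physical_register("swap1", "R0")] = 7    # written by swap task 1
```

After both writes the dictionary holds separate entries `R0` and `S1R0`, which is why the swap task cannot clobber the interrupted program's registers.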
FIG. 8 shows an example of task switching operation. When execution of a program (swap task 1) stored in the swap task buffer 16 is requested by, for example, the signal 23 in the course of executing an instruction from the instruction fetch unit 10 (normal instruction processing), switching is performed. The control circuit 19 switches the selection state by the selector 18 to the swap task buffer 16 in synchronization with the switching of the pipeline stage. As a result, the swap task buffer 16 designates and outputs the first instruction of the swap task 1 in synchronization with the instruction signal LIR, and the instruction register 11 latches it. Further, when executing the swap task, the instruction execution unit 13 uses the register set S1 specified by the instruction description of the task. Thus, the swap task 1 can be executed without saving the program counter PC and saving the registers SR and R0 to R7. When the last instruction of the switched swap task 1 is decoded by the instruction decoder 12, the switching control circuit 19 causes the selector 18 to select the instruction fetch unit 10. At this time, the program counter PC, status register SR, data and address registers R0 to R7 maintain the values immediately before switching to the wap task 1. The registers R8 to R15 are not used in the execution of the instruction stored in the swap task buffer 16. Therefore, memory access for return is not required even when switching to the normal instruction. In the case of switching between normal instruction processing and interrupt processing shown in FIG. 9, memory access for saving and restoring is required each time switching is performed. Memory access for saving and restoring is an overhead of task switching or pipeline switching.
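The overhead difference between the two kinds of switching can be made concrete by counting memory accesses. The sketch assumes one memory access per saved or restored word and a context consisting of PC, SR, and R0 to R15; the exact set of saved registers and the function names are assumptions for illustration.

```python
CONTEXT = ["PC", "SR"] + [f"R{i}" for i in range(16)]   # assumed saved state


def interrupt_round_trip(registers, stack):
    # Conventional switching: push every context word, run the handler,
    # then pop everything back. Each push or pop is one memory access.
    accesses = 0
    for name in CONTEXT:
        stack.append(registers[name])
        accesses += 1
    # ... the interrupt handler would run here ...
    for name in reversed(CONTEXT):
        registers[name] = stack.pop()
        accesses += 1
    return accesses


def swap_task_round_trip(registers):
    # Swap task switching: the task has its own pointer and register set,
    # so nothing is written to or read from the stack.
    return 0


registers = {name: index for index, name in enumerate(CONTEXT)}
before = dict(registers)
stack = []
interrupt_cost = interrupt_round_trip(registers, stack)
swap_cost = swap_task_round_trip(registers)
```

Under these assumptions a single interrupt round trip costs 36 memory accesses, while a swap task round trip costs none.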
FIG. 10 shows an example of the pipeline state at the time of switching between the normal instruction processing and the swap task 1. Although not particularly limited, the pipeline stage in the data processor 1 of this embodiment is five stages, and the pipeline stage in normal instruction processing is instruction fetch (In), instruction decode (Dn), operation (En), and memory access. (An) and register store (Sn) stages. The pipeline stages in the swap task are the instruction transfer (Cs), instruction decode (Ds), operation (Es), memory access (As), and register store (Ss) stages.
For example, when execution of the swap task 1 is requested in the pipeline stage m, the switching control circuit 19 switches the selection state by the selector 18 to the swap task buffer 16 in the pipeline stage m + 1, and the swap task in the pipeline stage m + 1. The instruction for the first instruction of 1 is transferred to the instruction register 11 (Cs1). When switching tasks, the swap task 1 can be executed without saving the program counter PC and the registers SR and R0 to R7 as described above. Thereafter, one process is performed for each pipeline stage. The instruction execution unit 13 uses the general-purpose register GR for execution of normal instruction processing, but uses the register set S1 for execution of the swap task 1. Which register is used is determined by each instruction description. When the last instruction of the switched swap task 1 is decoded (Ds1) by the instruction decoder 12 in the pipeline stage n, the end signal 120 is supplied to the switching control circuit 19. The switching control circuit 19 causes the selector 18 to select the instruction fetch unit 10 at the pipeline stage n + 1, whereby the instruction is supplied from the instruction fetch unit 10 to the instruction register 11 after the pipeline stage n + 1. Even when switching to normal instruction processing, as described above, memory access for return is not required. As described above, at the time of task switching between the normal instruction processing and the swap task 1, the pipeline is not disturbed at all.
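The claim that the switch leaves no gap in the pipeline can be visualized by generating a stage chart in which one instruction is issued per pipeline stage, regardless of whether it comes from the fetch unit or the swap task buffer. The function name `timing_chart` and the string encoding of the issue order are hypothetical.

```python
NORMAL_STAGES = ("In", "Dn", "En", "An", "Sn")   # fetch, decode, op, mem, store
SWAP_STAGES = ("Cs", "Ds", "Es", "As", "Ss")     # transfer, decode, op, mem, store
BLANK = "  "


def timing_chart(issue_order):
    # issue_order has one character per pipeline stage slot:
    # 'n' for a normal instruction, 's' for a swap task instruction.
    rows = []
    for slot, kind in enumerate(issue_order):
        stages = NORMAL_STAGES if kind == "n" else SWAP_STAGES
        rows.append([BLANK] * slot + list(stages))
    return rows


# Two normal instructions, a two-instruction swap task, then normal again.
chart = timing_chart("nnssn")
for row in chart:
    print(" ".join(row))
```

Reading any column of the printed chart from top to bottom shows each pipeline stage occupied by a different instruction, with the swap task rows simply taking the place of normal rows.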
In FIG. 1, the interrupt control circuit 131 is supplied with a representatively shown interrupt request signal IRQ. The interrupt control circuit 131 accepts an interrupt request according to the interrupt priority set for it. In this embodiment, while the selector 18 selects the swap task buffer 16 or 17, the switching control circuit 19 enables the interrupt acceptance inhibition signal INH and supplies it to the interrupt control circuit 131. The interrupt control circuit 131 does not accept any interrupt request while the interrupt acceptance inhibition signal INH is enabled. Therefore, when the data processor 1 is executing a task according to the program of the swap task buffer 16 or 17, no task switching is performed until the execution of that task is completed. In other words, a task whose program is stored in the swap task buffer 16 or 17 is given the highest execution priority. When it accepts an interrupt request, the interrupt control circuit 131 interrupts the current instruction execution, saves the contents of the program counter PC, the status register SR, the data and address registers R0 to R15, and the like to a stack area in the external memory 2 or the like, and then branches to the processing program for the accepted interrupt request.
FIG. 11 shows an operation example in which no interrupt is accepted during the swap task, as described above. If there is an interrupt request in the middle of normal processing, the return address is saved and the processing then branches to interrupt processing. When the interrupt processing is completed, return processing is performed and normal processing resumes. If there is a request to execute the swap task 1 during normal processing, the switching control circuit 19 causes the swap task buffer 16 to be selected and execution immediately moves to the swap task 1. Since the interrupt acceptance inhibition signal INH is enabled during the execution of the swap task 1, no interrupt request is accepted during that time even if one is issued. An interrupt request whose acceptance was inhibited is accepted once the execution of the swap task 1 is completed and the interrupt acceptance inhibition signal INH is disabled. When branching to interrupt processing, the return address and register values of the interrupted normal instruction processing are first saved, and the processing then branches to interrupt processing. After completion of the interrupt processing, the saved information is restored and normal instruction processing resumes.
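The deferral of interrupts while the swap task runs can be illustrated with a cycle-by-cycle sketch. The function `run`, the event dictionary, and the fixed swap task length are assumptions introduced only for this example; the inhibit condition is modeled simply as "a swap task is in progress".

```python
def run(events, swap_length):
    """Simulate interrupt deferral while a swap task is executing.

    events:      dict mapping a cycle number to "swap" (swap task request)
                 or "irq" (interrupt request)
    swap_length: number of cycles the swap task occupies (assumed fixed)
    Returns a list of (cycle, action) tuples.
    """
    log = []
    swap_remaining = 0
    pending_irq = False
    last_cycle = max(events) + swap_length + 2
    for cycle in range(last_cycle + 1):
        event = events.get(cycle)
        if event == "irq":
            pending_irq = True            # request is held, not lost
        inh = swap_remaining > 0          # INH is enabled while the swap task runs
        if event == "swap" and not inh:
            swap_remaining = swap_length
            inh = True
            log.append((cycle, "swap task starts"))
        if inh:
            swap_remaining -= 1
            if swap_remaining == 0:
                log.append((cycle, "swap task ends"))
        elif pending_irq:
            pending_irq = False           # INH disabled: the interrupt is accepted
            log.append((cycle, "interrupt accepted"))
    return log
```

With a swap task requested at cycle 2 and an interrupt requested at cycle 3, the interrupt is accepted only in the cycle after the swap task ends, which corresponds to the ordering shown for FIG. 11.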
FIG. 12 shows a second embodiment of the data processor according to the present invention. The data processor 1A shown in the figure is different from the data processor 1 in FIG. 1 in that an interrupt can be accepted even during execution of a task by a program stored in the swap task buffer 16 or 17. The other points are the same as those in FIG. 1. Circuit blocks having the same functions are denoted by the same reference numerals, and detailed description thereof is omitted.
In the data processor 1A, when it accepts an interrupt request, the interrupt control circuit 131 enables the interrupt control signal ICNT and supplies it to the switching control circuit 19. When the selector 18 selects the swap task buffer 16 or 17 and the interrupt control signal ICNT is enabled, the switching control circuit 19 switches the selection state of the selector 18 to the instruction fetch unit 10. Further, information specifying which of the swap task buffers 16 and 17 was selected immediately before the switch (swap task selection information) is saved. The save destination is preferably a save latch (not shown) inside the switching control circuit 19. The information may instead be saved in the stack area of the external memory 2 or the like, but in that case an external bus access cycle must be started to restore the swap task selection information when returning from the interrupt processing to the swap task, which delays the return to the swap task processing.
When an interrupt is accepted during the execution of the swap task, the branch from the normal instruction processing to the swap task has already taken place. Therefore, after the interrupt processing is completed, it must be possible to return to the normal processing that is currently suspended. For this reason, after the selector 18 is switched and the swap task selection information is saved, the return address and register information of the currently suspended normal instruction processing are saved, and the processing then branches to the interrupt processing program.
FIG. 13 shows an example of operation when an interrupt is accepted during a swap task. If there is an interrupt request in the middle of normal instruction processing, the return address and the like are saved and the processing then branches to interrupt processing. When the interrupt processing ends, return processing is performed and normal instruction processing resumes. When there is a request to execute the swap task 1 during normal instruction processing, the switching control circuit 19 causes the selector 18 to select the swap task buffer 16 and execution immediately shifts to the swap task 1. The interrupt control circuit 131 can accept an interrupt even while the swap task 1 is being executed. When an interrupt is accepted, the interrupt control signal ICNT is enabled and supplied to the switching control circuit 19. As a result, the switching control circuit 19 switches the selection state of the selector 18 to the instruction fetch unit 10, and saves the swap task selection information specifying the swap task buffer selected at that time. Then, the instruction execution unit 13 that has received the interrupt saves the return address and register information of the normal instruction processing that was suspended earlier (S1), and branches to the interrupt processing program. When this interrupt processing is completed (T1), the interrupt control signal ICNT is disabled, whereby the switching control circuit 19 resumes execution of the interrupted swap task 1 according to the saved swap task selection information. When the last instruction of the swap task 1 is decoded by the instruction decoder 12, the end signal 120 is given to the switching control circuit 19, which causes the selector 18 to select the output of the instruction fetch unit 10.
Then, the return process (S2) following the interrupt process is started, the saved return address and register information of the normal instruction processing are restored, and the normal instruction processing is resumed. The return process (S2) is deferred until the end of the swap task 1 processing that is resumed after the end of the interrupt process (T1). This is because, when the interrupt process is completed (T1), the switching control circuit 19 first switches the selector 18 to the swap task buffer 16 on the basis of the saved swap task selection information.
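The behaviour of the switching control circuit in this second embodiment can be summarized as a small state machine. The class `SwitchingControl`, its method names, and the string labels for the selector state are hypothetical names chosen for this sketch; only the save latch, the signal ICNT, and the end signal come from the description.

```python
class SwitchingControl:
    """Illustrative model of selector control with a save latch."""

    def __init__(self):
        self.selector = "fetch"   # instruction fetch unit selected initially
        self.save_latch = None    # holds the swap task selection information

    def request_swap(self, buffer_name):
        # An event requests a swap task: select the corresponding buffer.
        self.selector = buffer_name

    def icnt_enabled(self):
        # Interrupt accepted: if a swap task was running, remember which one
        # and hand the instruction register back to the instruction fetch unit.
        if self.selector != "fetch":
            self.save_latch = self.selector
            self.selector = "fetch"

    def icnt_disabled(self):
        # Interrupt processing finished: resume the suspended swap task first.
        if self.save_latch is not None:
            self.selector = self.save_latch
            self.save_latch = None

    def end_signal(self):
        # Last swap task instruction decoded: return to normal processing.
        self.selector = "fetch"
```

Keeping the selection information in an internal latch, as the description prefers, means that `icnt_disabled` needs no memory access before the swap task resumes.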
FIG. 14 shows a third embodiment of the data processor according to the present invention. The data processor 1B shown in the figure has a superscalar architecture, and can execute a plurality of instructions in parallel by two pipelines. That is, it has a first instruction execution control sequence in which the instruction decoder 12A decodes the instruction latched in the instruction register 11A and the instruction execution unit 13A executes that instruction, and a second instruction execution control sequence in which the instruction decoder 12B decodes the instruction latched in the instruction register 11B and the instruction execution unit 13B executes that instruction. Pipeline processing performed in the first instruction execution control sequence is referred to as pipe 0, and pipeline processing performed in the second instruction execution control sequence is referred to as pipe 1. LIRA is an instruction latch instruction signal for the instruction register 11A, and LIRB is an instruction latch instruction signal for the instruction register 11B; both correspond to the instruction signal LIR.
The instruction execution units 13A and 13B have their own dedicated sequence control circuits 132A and 132B and arithmetic circuits 133A and 133B, respectively. Dependencies between instructions, such as data conflicts occurring between pipe 0 and pipe 1, are detected by the conflict management unit 25 based on the decoding results of the instruction decoders 12A and 12B. That is, based on the instruction decoding results from the instruction decoders 12A and 12B, the conflict management unit 25 checks the dependency between the instructions to determine whether they can be executed in parallel by the pipe 0 and the pipe 1. It then controls the sequence control circuits 132A and 132B by the control signals ARBA and ARBB so as to delay the execution of an instruction that depends on the execution result of another instruction.
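A dependency check of the kind performed on the two decode results can be sketched as follows. This is a deliberately narrow illustration: the `Decoded` record, the function `arbitrate`, the assumption that pipe 0 holds the earlier instruction, and the restriction to a single read-after-write check are all assumptions of this example rather than details given in the description.

```python
from typing import NamedTuple, Tuple


class Decoded(NamedTuple):
    """Hypothetical decode result: one destination and its source registers."""
    dest: str
    sources: Tuple[str, ...]


def arbitrate(pipe0: Decoded, pipe1: Decoded) -> Tuple[bool, bool]:
    """Return (delay_pipe0, delay_pipe1), standing in for ARBA and ARBB.

    Assumes pipe 0 holds the instruction that comes first in program order,
    so only pipe 1 can depend on the other pipe's result.
    """
    conflict = pipe0.dest in pipe1.sources   # pipe 1 reads what pipe 0 writes
    return (False, conflict)
```

For example, if pipe 0 writes R1 and pipe 1 reads R1, the second flag is set and pipe 1 would be held until the result becomes available.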
The interrupt control circuit 131, the program counter PC, and the general-purpose register GR are shared by both instruction execution units 13A and 13B. The register sets S1 and S2 are dedicated to the instruction execution unit 13B. Details thereof are the same as those of the data processor of FIG. 1.
In the data processor 1B of the superscalar architecture, the selector 18, the switching control circuit 19, and the swap task buffers 16 and 17 are arranged in the instruction execution control sequence on the instruction register 11B side. As in the data processor of FIG. 1, an instruction fetch unit 10, an instruction cache memory 14, a built-in peripheral module 20, a data cache memory 15, and the like are provided. In FIG. 14, components having the same functions as those in FIG. 1 are denoted by the same reference numerals. In the case of FIG. 14, both swap task buffers 16 and 17 are initially loaded with their programs via the internal bus BUS.
FIG. 15 illustrates the contents of control and task switching control when a data conflict occurs in the data processor 1B.
For example, when the conflict management unit 25 detects a data conflict at the decode stage (m + 1) of the instructions latched in the instruction registers 11A and 11B at the pipeline stage m, the instruction to be executed later is made a NOP (no operation) until the execution result of the instruction to be executed first is obtained. That is, the pipeline stages of pipe 1 are made NOP until the result of the register store (Sn) of pipe 0 in the pipeline stage (m + 4) becomes available to the operation stage (En) of pipe 1 in the stage (m + 4).
When execution of the swap task 1 is requested in the pipeline stage m + 3, the switching control circuit 19 switches the selection state of the selector 18 to the swap task buffer 16 in the pipeline stage m + 4, and in the pipeline stage m + 4 of the pipe 1 the first instruction of the swap task 1 is transferred to the instruction register 11B (Cs1). When switching tasks, the swap task 1 can be executed without saving the program counter PC and the registers SR and R0 to R7, as described above. Thereafter, the processing advances sequentially through each pipeline stage of the pipe 1. At this time, the instruction execution unit 13B uses the register set S1 for execution of the swap task 1. Which register is used is determined by the description of each instruction, as in the above example. When the last instruction of the swap task 1 is decoded by the instruction decoder 12B at the pipeline stage n + 1 in the pipe 1, the end signal 120 is supplied to the switching control circuit 19. The switching control circuit 19 causes the selector 18 to select the instruction fetch unit 10 at the pipeline stage n + 1, whereby instructions are supplied from the instruction fetch unit 10 to the instruction register 11B from the pipeline stage n + 1 of the pipe 1 onward. As a result, normal instruction processing is resumed in the pipe 1. When switching back to normal instruction processing as well, no memory access for return is required, as described above. Thus, at the time of task switching between the normal instruction processing and the swap task 1, the pipeline is not disturbed at all.
FIG. 16 shows a fourth embodiment of the data processor according to the present invention. The data processor 1C shown in the figure has a superscalar architecture similar to the data processor 1B, and can execute a plurality of instructions in parallel by two pipelines. The difference from the data processor 1B is that the occurrence of a data conflict while normal instruction processing is performed by the pipe 0 and the pipe 1 is treated as one of the factors for switching to the swap task. The conflict management unit 25 provides the switching control circuit 19 with a control signal 250 that is enabled in synchronization with the occurrence of the data conflict. As a result, the switching control circuit 19 has the swap task 1 processed using the slots of the pipe 1 left vacant in normal instruction processing by the data conflict. However, since there is only one set of the instruction register 11B and the instruction decoder 12B on the pipe 1 side, when execution of the instruction that was interrupted due to the data conflict is resumed, it must be redone from the instruction fetch. This control is performed by the sequence control circuit 132B. Since the other configuration is the same as that of the data processor 1B of FIG. 14, a detailed description of the configuration is omitted.
FIG. 17 illustrates the contents of task switching control when a data conflict occurs. For example, when the conflict management unit 25 detects a data conflict at the decode stage (m + 1) of the instructions latched in the instruction registers 11A and 11B at the pipeline stage m, the instruction to be executed later is made a NOP (no operation) until the execution result of the instruction to be executed first can be used. In other words, until the result of the register store (Sn) of pipe 0 in the pipeline stage (m + 4) becomes available to the operation stage (En) of pipe 1 in the stage (m + 4), execution of normal instruction processing in the pipeline stages of pipe 1 is stopped. This stop instruction is notified to the instruction execution unit 13B by the control signal ARBB. At this time, the conflict management unit 25 activates the control signal 250 and gives it to the switching control circuit 19. In response to this, the switching control circuit 19 causes the selector 18 to select the swap task buffer 16. As a result, the pipe 1 can perform the processing of the swap task 1 in the pipeline stages m + 1 to m + 5. The period allowed for the processing of the swap task 1 is the period in which the normal instruction processing of the pipe 1 is interrupted due to the data conflict. The period is controlled by the conflict management unit 25 and reflected in the control signal 250. When the control signal 250 is made inactive, the selection state of the selector 18 is returned to the original selection state for normal instruction processing (the selection state of the instruction fetch unit 10). When switching tasks, the swap task 1 can be executed without saving the program counter PC and the registers SR and R0 to R7, as described above. At this time, the instruction execution unit 13B uses the register set S1 for execution of the swap task 1. Which register is used is determined by the description of each instruction, as in the above example.
In the example of FIG. 17, the conflict management unit 25 also detects a data conflict at the decode stage (m + 4) of the instructions latched in the instruction registers 11A and 11B at the pipeline stage m + 3. Execution of normal instruction processing in the pipeline stages of pipe 1 is stopped until the result of the register store (Sn) of pipe 0 in the pipeline stage (m + 7) becomes available to the operation stage (En) of pipe 1 in the stage (m + 7), and the pipe 1 performs the swap task 1 processing instead. In this example, the processing of the swap task 1 is finely divided and its processing timing is limited to when a data conflict occurs. Nevertheless, this is effective when applied to processing specific to data conflicts or to processing with no restriction on the processing interval. Further, the control signal 250 may be used as a control signal that defines the timing at which the swap task selected by the signals 22 and 23 is actually processed.
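The way the swap task is divided across the vacant slots of pipe 1 can be sketched as a simple scheduling function. The name `fill_pipe1`, the "normal"/"stall" slot labels, and the choice to leave a NOP when the swap task has no instructions left are assumptions of this illustration; the description does not specify what happens in the latter case.

```python
def fill_pipe1(slots, swap_task):
    """Place swap task instructions into the stalled slots of pipe 1.

    slots:     per-stage state of pipe 1, "normal" or "stall"
               ("stall" stands for the period in which signal 250 is active)
    swap_task: instructions held in the swap task buffer
    """
    ptr = 0          # read pointer of the swap task buffer, kept across stalls
    schedule = []
    for slot in slots:
        if slot == "stall" and ptr < len(swap_task):
            schedule.append(swap_task[ptr])   # selector picks the swap task buffer
            ptr += 1
        elif slot == "stall":
            schedule.append("NOP")            # assumption: nothing left to issue
        else:
            schedule.append("normal")         # selector picks the instruction fetch unit
    return schedule
```

Because the read pointer is carried from one data conflict to the next, the swap task advances in small pieces, which matches the remark that its processing is finely divided and tied to the timing of data conflicts.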
FIG. 18 shows a fifth embodiment of the data processor according to the present invention. The data processor 1D shown in the figure has a superscalar architecture similar to the data processor 1B, and can execute a plurality of instructions in parallel by two pipelines. As with the data processor 1C, the data processor 1D treats the occurrence of a data conflict during normal instruction processing by the pipe 0 and the pipe 1 as one of the factors for switching to the swap task. The data processor 1D differs from the data processor 1C in that it includes an instruction register 11C and an instruction decoder 12C that are dedicated to swap tasks. The selector 26 selects the input of the instruction register 11C, and the selector 27 selects between the outputs of the instruction decoders 12B and 12C.
The conflict management unit 25 provides the switching control circuit 19 and the selector 27 with a control signal 250 that is enabled in synchronization with the occurrence of a data conflict. As a result, the selector 27 selects the output of the instruction decoder 12C, and the control signal LIRB is also supplied to the instruction register 11C. The instruction register 11B keeps holding its current instruction as it is, and the instruction register 11C can instead latch a new instruction according to the control signal LIRB. Further, in response to the enabled control signal 250, the switching control circuit 19 connects the swap task buffer 16 or 17 to the instruction register 11C via the selector 26. Which one is connected may be selectable or fixed. For example, which one to select can be determined according to the operation mode set at the time of initialization reset of the data processor.
For example, when the swap task 1 is processed using the slots of the pipe 1 left vacant in normal instruction processing by a data conflict, the pipe 1 has the dedicated instruction register 11C and instruction decoder 12C. Therefore, when execution of the instruction that was interrupted due to the data conflict is resumed, it is not necessary to restart from the instruction fetch as in the data processor 1C. That is, the pipeline is not disturbed at all. Since the other configuration is the same as that of the data processor 1C, a detailed description of the configuration is omitted.
FIG. 19 illustrates the contents of task switching control performed by the data processor 1D when a data conflict occurs. For example, when the conflict management unit 25 detects a data conflict at the decode stage (m + 1) of the instructions latched in the instruction registers 11A and 11B at the pipeline stage m, the instruction to be executed later is made a NOP (no operation) until the execution result of the instruction to be executed first is obtained. In other words, until the result of the register store (Sn) of pipe 0 in the pipeline stage (m + 4) becomes available to the operation stage (En) of pipe 1 in the stage (m + 4), execution of normal instruction processing in the pipeline stages of pipe 1 is stopped. The difference from FIG. 17 is that it is not necessary to repeat the instruction fetch and decode for the operation stage (En) in the pipe 1 at the stage m + 4 in FIG. 19. The instruction to stop execution of the normal instruction processing in the pipeline stages of the pipe 1 is notified to the instruction execution unit 13B by the control signal ARBB. At this time, the conflict management unit 25 activates the control signal 250 and gives it to the switching control circuit 19. In response to this, the switching control circuit 19 causes the selector 18 to select the swap task buffer 16. As a result, the pipe 1 can perform the processing of the swap task 1 in the pipeline stages m + 1 to m + 5. The period allowed for the processing of the swap task 1 is the period in which the normal instruction processing of the pipe 1 is interrupted due to the data conflict. The period is controlled by the conflict management unit 25 and is reflected in the control signal 250. When the control signal 250 is made inactive, the selection state of the selector 18 is returned to the original selection state for normal instruction processing (the selection state of the instruction fetch unit 10).
When switching tasks, the swap task 1 can be executed without saving the program counter PC and the registers SR and R0 to R7 as described above. At this time, the instruction execution unit 13B uses the register set S1 for execution of the swap task 1. Which register is used is determined by each instruction description as in the above example.
In the example of FIG. 19, the conflict management unit 25 also detects a data conflict at the decode stage (m + 4) of the instructions latched in the instruction registers 11A and 11B at the pipeline stage m + 3. Execution of normal instruction processing in the pipeline stages of pipe 1 is stopped until the result of the register store (Sn) of pipe 0 in the pipeline stage (m + 7) becomes available to the operation stage (En) of pipe 1 in the stage (m + 7), and the pipe 1 performs the swap task 1 processing instead.
FIG. 20 shows an example of a data processing system to which the data processor 1 is applied. The external memory 2 and the input / output circuit 5 are representatively shown on the external bus 4 of the data processor 1. The external bus 4 includes an address bus ABUS, a data bus DBUS, and a control bus CBUS. In this system, a DMA transfer control and data conversion control program is stored in the swap task buffer 16 of the data processor 1. The DMA transfer control and data conversion control program is activated by an interrupt signal 230 assigned to one of the control signals 23. The interrupt signal 230 is supplied from the input / output circuit 5.
FIG. 21 shows an example of a task performed by the DMA transfer control and data conversion control program. That is, when the interrupt signal 230 is given from the input / output circuit 5 to the switching control circuit 19, the processing program of the data processor 1 is switched to the DMA transfer control and data conversion control program stored in the swap task buffer 16. The task processed by this program reads data from the input / output circuit 5, converts the read data in the instruction execution unit 13 (for example, by compression or coordinate conversion), and writes the converted data to a predetermined area of the memory 2. The read address and the write address are sequentially updated by the program for each data transfer and data conversion. An example of the minimum unit of program description of such DMA transfer control and data conversion control program is shown in FIG. As described above, task switching using the swap task buffer does not require the save processing of normal interrupt processing and does not disturb the pipeline, so that the data processor can respond at high speed to an event that has occurred.
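The read, convert, and write cycle of such a task can be sketched in a few lines. The function `swap_task_dma`, the list-based stand-ins for the input / output circuit and the memory, and the caller-supplied `convert` function are hypothetical and serve only to illustrate the sequential address update; they are not the program description referred to in the text.

```python
def swap_task_dma(io_port, memory, write_addr, convert):
    """Transfer every item from io_port into memory, converting each one.

    io_port:    list standing in for data readable from the input/output circuit
    memory:     list standing in for the predetermined area of the memory
    write_addr: first index in memory to write to
    convert:    conversion applied to each item (e.g. compression, coordinates)
    Returns the write address following the last item written.
    """
    read_addr = 0
    while read_addr < len(io_port):
        data = io_port[read_addr]            # read from the input/output circuit
        memory[write_addr] = convert(data)   # convert, then write the result
        read_addr += 1                       # both addresses are updated
        write_addr += 1                      # once per transferred item
    return write_addr
```

A call such as `swap_task_dma([1, 2, 3], memory, 4, lambda x: x * 2)` fills three consecutive memory locations starting at index 4 with the doubled values.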
Further, in the above-described embodiments represented by the data processor 1, when a DMA transfer control program is set in the swap task buffers 16 and 17, the burden on the data processor 1 of dealing with the cache coherency problem can be reduced as compared with the system configuration illustrated in FIG. 23. That is, in the system configuration of FIG. 23, when the cache memory 15 adopts the write-back method, cache coherency is not maintained if the DMA controller 6 starts a DMA transfer while a rewrite of the cache memory 15 has not yet been reflected in the external memory. Therefore, the data processor 1E must constantly monitor for the start of a DMA transfer operation that would not maintain cache coherency and, when one is detected, perform a write-back operation in advance. In other words, it must bear the processing for detecting operations that do not maintain cache coherency. On the other hand, taking the data processor 1 of FIG. 1 as an example, when the processing task of the data processor 1 is switched to the DMA transfer control process via the selector 18 or the like, the instruction execution unit 13 functions as a DMA controller. Therefore, when DMA data transfer is controlled between external memories of the data processor 1 or between the external memory and an external input / output circuit, the address signals and access control information for DMA transfer control always pass through the data cache memory 15. As a result, when the cache memory 15 adopts the write-back method, even if a DMA transfer is started in a state where a rewrite of the cache memory 15 is not reflected in the external memory, the data not yet reflected in the external memory is read from the cache memory 15 into the instruction execution unit 13 and transferred. The data processor 1 therefore does not need to bear the processing for detecting operations that do not maintain cache coherency.
In the DMA transfer control function realized by the data processor 1, the transfer data is once read into the data processor 1.
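The coherency argument can be made concrete with a toy model of a write-back cache. The class `WriteBackCache` and the two read helpers are assumptions for illustration only; real caches track lines, tags, and dirty bits, none of which is modeled here.

```python
class WriteBackCache:
    """Toy write-back cache: writes stay in the cache until written back."""

    def __init__(self, memory):
        self.memory = memory   # dict standing in for the external memory
        self.lines = {}        # address -> value not yet reflected in memory

    def write(self, addr, value):
        self.lines[addr] = value              # external memory is not updated

    def read(self, addr):
        # A hit returns the newest value; a miss falls through to memory.
        return self.lines.get(addr, self.memory[addr])


def external_dma_read(memory, addr):
    """Separate DMA controller: reads external memory directly (FIG. 23 case)."""
    return memory[addr]


def swap_task_dma_read(cache, addr):
    """DMA performed by the execution unit: every access goes through the cache."""
    return cache.read(addr)
```

After a processor write that has not been written back, `external_dma_read` returns the stale value while `swap_task_dma_read` returns the current one, which is why no separate monitoring is needed when the transfer runs as a swap task.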
Although the invention made by the present inventor has been specifically described based on the embodiments, it is needless to say that the present invention is not limited thereto and can be variously modified without departing from the gist thereof.
For example, the number of swap task buffers is not limited to the above embodiment, and can be changed as appropriate. Further, the cache memory is not limited to a configuration in which the data cache memory and the instruction cache memory are separated, and may be a unified cache memory that is used for both instructions and data. Further, the number of pipeline stages is not limited to five in the above embodiment. Further, the number of pipes that can be operated in parallel in the superscalar data processor is not limited to two, and may be more than that. Further, the contents of the swap task can be applied as needed, and are not limited.
Industrial applicability
As described above, the data processor according to the present invention can be widely applied to various data processing systems, in particular to systems in which tasks are switched frequently and systems that require improved data processing capability. For example, the present invention can be applied to a computer system for embedded device control in which data transfer and data compression in a digital camera are provided as swap tasks.

Claims

In a data processor that has a plurality of instruction execution control sequences, in each of which an instruction decoder decodes an instruction latched in an instruction register and an instruction execution unit executes the instruction, that has an instruction fetch unit for fetching instructions, and that can execute a plurality of instructions in parallel in the plurality of instruction execution control sequences, the data processor comprising:
A plurality of task buffers, each comprising a program storage area and a pointer for sequentially reading out the instructions stored in that area;
Register means dedicated to each of the respective task buffers and arranged in a specific instruction execution unit;
A selector that selects one of the plurality of task buffers and the instruction fetch unit and connects it to the instruction register corresponding to the specific instruction execution unit;
Switching control means for controlling the selection by the selector according to an event generated internally or externally, and causing the selector to select the instruction fetch unit in an initial state;
The instruction register, instruction decoder, and instruction execution unit included in each of the instruction execution control sequences performing pipeline processing in units of pipeline stages; and
A conflict management unit that examines, based on the instruction decoding results from the instruction decoders included in each of the instruction execution control sequences, whether instructions can be executed in parallel by mutually different instruction execution control sequences, and delays execution of an instruction that depends on the execution result of another instruction,
The data processor being characterized in that the switching control means causes the selector to select a task buffer when the conflict management unit delays execution of a specific instruction.

The data processor according to claim 1, wherein the instruction execution unit included in each of the instruction execution control sequences outputs an instruction signal for causing the instruction register to latch an instruction, the selector supplies the instruction signal output from the corresponding instruction execution unit to the instruction fetch unit or task buffer selected by the switching control means, and the instruction fetch unit updates the instruction to be supplied to the instruction register based on the instruction signal.

In a data processor that has a plurality of instruction execution control sequences, in each of which an instruction decoder decodes an instruction latched in an instruction register and an instruction execution unit executes the instruction, that has an instruction fetch unit for fetching instructions, and that can execute a plurality of instructions in parallel in the plurality of instruction execution control sequences, the data processor comprising:
A plurality of task buffers, each comprising a program storage area and a pointer for sequentially reading out the instructions stored in that area;
A specific task instruction register dedicated to the plurality of task buffers;
A specific task instruction decoder for decoding an instruction latched in the specific task instruction register;
Register means dedicated to each task buffer and arranged in a specific instruction execution unit;
A first selector that selectively connects one of the plurality of task buffers and the instruction fetch unit to the instruction register corresponding to the specific instruction execution unit;
A second selector for selecting one of the plurality of task buffers and connecting it to the specific task instruction register;
A third selector for selectively connecting the output of the instruction decoder corresponding to the specific instruction execution unit or the output of the specific task instruction decoder to the specific instruction execution unit;
A conflict management unit that examines, based on the instruction decoding results from the instruction decoders included in each of the instruction execution control sequences, whether instructions can be executed in parallel by mutually different instruction execution control sequences, delays the execution of a specific instruction that depends on the execution result of another instruction, and causes the third selector to select the specific task instruction decoder when delaying the execution of the specific instruction; and
Switching control means that, in an initial state, causes the first selector to select the instruction fetch unit and controls the second selector to a non-selected state, that controls the selection by the first selector according to an event generated internally or externally, and that, in response to the selection of the specific task instruction decoder by the third selector, causes the second selector to select a task buffer corresponding to an event generated internally or externally.