JP2005044017A - Microprocessor and compiler for program to be executed by microprocessor

Info

Publication number: JP2005044017A
Application number: JP2003200893A
Authority: JP
Inventors: Atsutake Asai; 淳毅朝井
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 2003-07-24
Filing date: 2003-07-24
Publication date: 2005-02-17
Anticipated expiration: 2023-07-24
Also published as: JP3853309B2

Abstract

PROBLEM TO BE SOLVED: To provide a microprocessor capable of accessing data quickly, and a compiler for programs to be executed by the microprocessor.

SOLUTION: The microprocessor 40 includes an instruction decoder 6 that decodes successively read instruction codes and generates control signals, an address register 11 that stores and generates address information, an ALU 9 that performs various arithmetic operations, and an internal RAM 7 that temporarily stores information. Each of the address decode parts 7A, 7B and 7C of the internal RAM 7 consists of a high-order address decoder and a low-order address decoder, and generates an overall address decode signal from the high-order address decode signal and the low-order address decode signal. The delay until the low-order address decoder derives the low-order address decode signal is shorter than the delay until the high-order address decoder derives the high-order address decode signal.

COPYRIGHT: (C)2005, JPO & NCIPI

Description

【０００１】
【発明の属する技術分野】
主に高級言語で書かれたプログラムを実行するマイクロプロセッサおよびコンパイラであって、特にスタックを用いて処理を行なうマイクロプロセッサと該マイクロプロセッサで実行されるプログラムのためのコンパイラに関する。
【０００２】
【従来の技術】
高級言語でプログラムを設計する際、サブルーチンへの引数・戻り値がスタックでやり取りされ、また、Ｃ言語でのローカル変数はスタック上に領域が確保されて演算が行なわれる。そのため、ローカル変数での演算は、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）のアドレス演算とそのリード、格納アドレス演算とそのライトが発生し複雑である。また、外部ＲＡＭのアクセス速度が遅い場合には、その速度が全体の処理速度に影響する。サブルーチンの実行処理を高速化するために、サブルーチン呼出し・復帰に関するスタック操作を並列化して、よって高速化する方法が提供されている（たとえば、特許文献１参照）。
【０００３】
【特許文献１】
特開平１１−２４２５９５号公報
【０００４】
【発明が解決しようとする課題】
上述の文献で提案された方法は高速化を図るように構成されるが、実際に使用するスタックは外部メモリとして接続されているから、外部メモリへのアクセスが一般的に低速であることに鑑みると、外部メモリへのアクセスが頻繁となる処理内容であった場合には実行処理を高速にすることが困難である。この文献ではスタック上のデータへのアクセス、演算には言及されていない。
【０００５】
それゆえにこの発明の目的は、データを高速にアクセスできるマイクロプロセッサおよび該マイクロプロセッサで実行されるプログラムのためのコンパイラを提供することである。
【０００６】
【課題を解決するための手段】
この発明のある局面に従うマイクロプロセッサは、アドレス信号に基づいてアクセスされる情報を記憶する内部メモリと、指令に基づき内部メモリの情報を用いた演算を含む各種演算を行なう演算部と、与えられる命令コードをデコードしデコード結果に基づき演算の指令を含む各部を制御するための制御情報を出力する命令解読部と、与えられるアドレス情報を入力して解読しアドレス信号を出力するアドレスデコード部とを備える。
【０００７】
アドレス情報は上位アドレス情報と下位アドレス情報を含み、アドレスデコード部による、下位アドレス情報を入力してからアドレス信号を導出するまでの遅延は、上位アドレス情報を入力してからアドレス信号を導出するまでの遅延よりも短い。
【０００８】
したがって、内部メモリの下位アドレス情報に基づいて参照される部分領域のアクセスは上位アドレス情報に基づいて参照される部分領域のアクセスよりも速く行なわれるので、下位アドレスが変化した場合に内部メモリの高速アクセスが可能となる。また、内部メモリの下位アドレス情報に基づいて参照される部分領域のアクセスを高速化できる。例えばレジスタファイルを用いて演算する従来手法に比較して大量データを高速アクセスできる。
【０００９】
好ましくは、アドレスデコード部は、アドレス情報のうちの上位アドレス情報をデコードして上位デコード信号を出力する上位デコーダと、アドレス情報のうちの下位アドレス情報をデコードして下位デコード信号を出力する下位デコーダと、上位デコード信号と下位デコード信号を入力してアドレス信号を生成し出力する生成デコーダとを有する。
【００１０】
したがって、アドレスデコード部に上位デコーダと下位デコーダとを別個に設けて、さらに生成デコーダを設けることにより、下位アドレス情報を入力してからアドレス信号を導出するまでの遅延を、上位アドレス情報を入力してからアドレス信号を導出するまでの遅延よりも短くしている。
【００１１】
好ましくは下位アドレスのビット長は上位アドレスのビット長よりも短い。したがって、下位デコーダを簡単に構成できるとともに、デコードのための処理段数を少なくできる。
【００１２】
上述のマイクロプロセッサは好ましくは上位アドレス情報を生成するアドレス生成部をさらに備えて、アドレス生成部は、上位アドレス情報を格納するアドレスレジスタと、アドレスレジスタに格納される上位アドレス情報を、命令解読部の制御情報に基づき更新するアドレス更新部とを有し、下位アドレス情報は、命令解読部が出力する制御情報に含まれる。
【００１３】
したがってアドレスレジスタの上位アドレス情報を更新することにより、内部メモリの下位アドレス情報に基づいてアクセスされる領域を、すなわち高速アクセス可能なスタック領域を、内部メモリにおいて可変に設定できる。
【００１４】
好ましくは上述の更新は上位アドレス情報が示すアドレスのインクリメントまたはデクリメントである。
【００１５】
好ましくは命令解読部から出力された下位アドレス情報により内部メモリへのアクセスは１サイクルで行なわれる。
【００１６】
上述のマイクロプロセッサは、命令解読部に与える命令コードを逐次指定する情報を保持するプログラムカウンタをさらに備えて、命令解読部に与えられた命令コードが他の処理ルーチンに分岐することを指令する命令コードであるとき、制御情報により、プログラムカウンタに保持される情報を内部メモリに退避させて、プログラムカウンタの値を指定の値に変更して、アドレスレジスタの内容を更新する。
【００１７】
したがって、他の処理ルーチンへの分岐時には、１命令コードを実行することにより、プログラムカウンタに保持される情報を内部メモリに退避させて、プログラムカウンタの値を指定の値に変更して、アドレスレジスタの内容を更新するという一連の処理を実行できる。
【００１８】
好ましくは上述のプログラムカウンタに保持される情報の内部メモリへの退避と、プログラムカウンタの値の指定値への変更と、アドレスレジスタの内容更新とは、並列に行なわれる。したがって、他の処理ルーチンへの分岐を速やかに処理できる。
【００１９】
好ましくは、処理において命令解読部に例外要因の信号が入力されたとき、この処理から予め準備された例外要因処理ルーチンに分岐するために、制御情報により、アドレスレジスタの内容およびプログラムカウンタの内容は退避されて、かつアドレスレジスタの内容は所定の固定値に変更される。
【００２０】
したがって１命令コードにより、アドレスレジスタの内容およびプログラムカウンタの内容を退避して、かつアドレスレジスタの内容を所定の固定値に変更できるから、速やかに例外要因処理ルーチンに分岐できる。
【００２１】
好ましくは命令解読部に、分岐先の処理ルーチンから元の処理に復帰する命令コードが与えられたとき、制御情報によりプログラムカウンタの内容を予め退避していた内容に復元する。
【００２２】
好ましくは命令解読部に、分岐先の処理ルーチンから元の処理に復帰する命令コードが与えられたとき、制御情報によりアドレスレジスタの内容と前記プログラムカウンタの内容とを復元する。
【００２３】
したがって１命令コードによりアドレスレジスタの内容とプログラムカウンタの内容とを復元できるから、分岐先の処理ルーチンから元の処理に速やかに復帰できる。
【００２４】
好ましくは、内部メモリは独立した３つのポートを有し、３つのポートのうちの１つ目のポートのためのアドレス信号に基づいて内部メモリから情報を読出し、３つのポートのうちの２つ目のポートのためのアドレス信号に基づいて内部メモリから情報を読出し、これら読出された情報を演算部に与えて、その演算結果を、３つのポートのうちの３つ目のポートのためのアドレス信号に基づいて内部メモリに格納する動作を１サイクルで行なう。
【００２５】
したがって内部メモリを参照しながらの演算処理を高速に実行できる。
好ましくは命令コードは、３つのポートのためのアドレス情報それぞれの下位アドレス情報を含む。
【００２６】
好ましくは命令コードは定数情報を含み、命令解読部は、定数情報を、内部メモリから読出された情報を演算部に与えるためのバスに送出する。したがってバスは内部メモリを参照した情報と定数情報との演算部への供給に共用できて装置構成を簡単化できる。また定数情報を命令解読部から直接に演算部に与えることができて、定数情報を用いた演算を速く実行できる。
【００２７】
好ましくは処理中に例外要因の発生に応じて例外処理ルーチンに分岐するとき、アドレスレジスタの内容を退避して固定値に更新し、待ち時間のためのサイクルを実行し、プログラムカウンタの値を退避させるために内部メモリに格納する。
【００２８】
このように処理中に例外要因の発生するとアドレスレジスタの内容は退避されて固定値に更新される。もし分岐先の例外処理ルーチンの最初に内部メモリを参照する命令が実行される場合でも、分岐時には待ち時間のためのサイクルが実行されているから、アドレスレジスタが変更されて、変更後の内容に基づきアドレス解読部がアドレス信号を生成するのに要する時間は、待ち時間で相殺されることになって、その後の内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【００２９】
好ましくは、例外処理ルーチンから分岐前の元の処理に復帰するとき、アドレスレジスタを退避していた内容に復元して、待ち時間のためのサイクルを実行する。
【００３０】
このように元の処理に復帰するときはアドレスレジスタの内容が変更される。もし復帰した元の処理の最初の命令が内部メモリを参照することを指令する場合でも、復帰時には待ち時間のためのサイクルが実行されているから、アドレスレジスタが変更されて、変更後の内容に基づきアドレス解読部がアドレス信号を生成するのに要する時間は、待ち時間で相殺されることになって、その後の内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【００３１】
この発明の他の局面に従うコンパイラは、ソースプログラムを構成するソースコードを順次入力して、上述のマイクロプロセッサのための命令コードに変換して命令コード列を出力するコンパイラであって、内部メモリを参照するソースコードを変換するとき、直前に上位アドレス情報の変更を指示する命令コードへの変換がなされている場合には、命令コード列において該ソースコードの命令コードの前に待ち時間サイクルのための命令コードを置くようにコンパイルする。
【００３２】
このように上位アドレス情報を変更するような命令コードと、内部メモリを参照する命令コードとの間には待ち時間サイクルのための命令コードが置かれるから、内部メモリを参照する命令コード実行時には、上位アドレス情報が更新されて更新後の内容に基づきアドレス信号を生成するのに要する時間は待ち時間で相殺されることになって、その後の内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【００３３】
好ましくは、ソースプログラムにおいてサブルーチン呼出しを指示するソースコードを検出すると、該ソースコードを、プログラムカウンタが保持する情報を内部メモリに退避させて、プログラムカウンタの値を指定の値に変更して、アドレスレジスタの内容を更新することを指令する命令コードに変換する。
【００３４】
したがって、サブルーチン呼出しが指令されるときは、１命令コードにより、マイクロプロセッサに対して、プログラムカウンタが保持する情報を内部メモリに退避させて、プログラムカウンタの値を指定の値に変更して、アドレスレジスタの内容を更新することを指令できる。
【００３５】
好ましくは、ソースプログラムにおいてサブルーチンの終了を指示するソースコードを検出すると、該ソースコードを、プログラムカウンタの内容を予め退避されていた内容に復元する命令コードに変換する。
【００３６】
好ましくは、ソースプログラムにおいてサブルーチン呼出しを指示するソースコードの次位のソースコードが、内部メモリを参照することを指示している場合には、該サブルーチンからの復帰を指示する命令コード群に待ち時間サイクルのための命令コードを置くようにコンパイルする。
【００３７】
サブルーチンから復帰するためには上位アドレス情報が変更されるから、該サブルーチンからの復帰を指示する命令コード群に待ち時間サイクルのための命令コードを置くようにすることで、サブルーチン復帰後に内部メモリを参照する場合は、上位アドレス情報が更新されて、更新後の内容に基づいて内部メモリをアクセスするための信号の生成に要する時間は待ち時間で相殺されることになって、該内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【００３８】
好ましくは、ソースプログラムにおいてサブルーチンの最初のソースコードが内部メモリの参照を指示することを検出したとき、命令コード列において該ソースコードの命令コードの前に待ち時間サイクルのための命令コードを置くようにコンパイルする。
【００３９】
したがって、直前のサブルーチンで上位アドレス情報が更新されて次位のサブルーチンの最初で内部メモリを参照する場合は、上位アドレス情報が更新されて、更新後の上位アドレス情報に基づいて内部メモリをアクセスする為の信号の生成に要する時間は待ち時間で相殺されることになって、該内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【００４０】
【発明の実施の形態】
以下、この発明の各実施の形態について図面を参照して説明する。
【００４１】
（実施の形態１）
本実施の形態では、内部ＲＡＭの部分領域をスタックとして機能させ、内部ＲＡＭに関するアドレスデコーダのスタック領域をアクセスする回路規模を小さくすることにより、アドレスデータをデコードしてデコード信号を生成（導出）するまでの所要時間（遅延）を短くするようなプロセッサが提供される。
【００４２】
図１には、本実施の形態に係るマイクロプロセッサ４０の構成が示される。マイクロプロセッサ４０はポートであるアドレス出力１、データ入力２およびデータ出力３、プログラムカウンタ（以下、ＰＣと略す）４、パイプ５、命令デコーダ６、３−ＰｏｒｔＲＡＭ（以下、単に内部ＲＡＭと略す）７、Ａバス８、内部ＲＡＭ７の情報を用いた演算（算術・論理演算）および命令デコーダ６から与えられる情報を用いた演算（算術・論理演算）を行なうＡＬＵ（ＡｒｉｔｈｍｅｔｉｃａｎｄＬｏｇｉｃＵｎｉｔ）９、Ｂバス１０、アドレスレジスタ１１、インクリメンタ１２および１４、ＭＵＸ（Ｍｕｌｔｉｐｌｅｘｏｒ）１５と１６、データ出力バッファ１７、データ入力バッファ１８およびＣバス１９を備える。内部ＲＡＭ７はメモリセルアレイ７０、ならびにＡバス側、Ｂバス側およびＣバス側のそれぞれにおいてアドレスデコード部７Ａ、７Ｂおよび７Ｃを備える。アドレスデコード部７Ａ、７Ｂおよび７Ｃはバス側から与えられるメモリセルアレイ７０のアクセスに関するアドレス情報を入力して解読し、解読結果としてメモリセルアレイ７０をアクセスするためのアドレス信号を出力する。
【００４３】
マイクロプロセッサ４０において実行されるプログラムは複数の命令コードを含み、図示しない外部メモリに格納されている。マイクロプロセッサ４０から外部メモリをアクセスするためのアドレスデータはアドレス出力１に接続されて、外部メモリからマイクロプロセッサ４０に読込まれるデータはデータ入力２に接続されて、マイクロプロセッサ４０から外部メモリに書込まれるデータはデータ出力３に接続されている。マイクロプロセッサ４０全体は、外部からのクロック入力に同期して動作しているが、ここではクロック入力の図示および同期動作の説明は略す。
【００４４】
ＰＣ４は、カウント動作しながら次にフェッチすべき命令が格納されているアドレスデータを保持しており、アドレス出力１を介して外部メモリに指示する。指示されたアドレスデータに基づき指定されたアドレスから読出されたデータである命令コードは、データ入力２を経由してタイミングを調整するパイプ５へ入力すると、パイプ５により適宜命令デコーダ６に与えられて、ここで解読される。マイクロプロセッサ４０のその他の部分は、命令コードの解読結果に基づく命令デコーダ６による指示に従い動作する。
【００４５】
ＡＬＵ９は２つの入力ポートと１つの出力ポートを有し、ＲＡＭ７は３つの独立したポート（３−Ｐｏｒｔ）を有する。Ａバス８は内部ＲＡＭ７のデータをＡＬＵ９に出力する専用バスであり、ＡＬＵ９の１方の入力ポートに接続されている。Ｂバス１０はデータ入力２およびデータ入力バッファ１８を介して外部メモリからのデータをＡＬＵ９に出力する専用バスであり、ＡＬＵ９の他方の入力ポートに接続され、同時にデータ出力バッファ１７を介してデータ出力３にも接続されている。Ｃバス１９は入力専用であり、ＡＬＵ９の出力、ＰＣ４などが接続されている。ここで、内部ＲＡＭ７はデータ幅は３２ビット、アドレスデータ幅は１６ビットで、６４ｋワード＝２Ｍビットの容量を有すると想定する。
【００４６】
アドレスレジスタ１１は、内部ＲＡＭ７のアドレスデータ１６ビット中の上位１１ビットを保持するレジスタである。アドレスレジスタ１１に保持される値は、命令デコーダ６の指示により、インクリメンタ１２を介して＋１されたり−１されたりする。
【００４７】
Ａバス８のＡバスアドレスにはアドレスレジスタ１１の出力およびインクリメンタ１２の出力の一方がＭＵＸ１６で選択されて接続される。Ｂバス１０のＢバスアドレスにはアドレスレジスタ１１の出力が接続されている（図示しない）。
【００４８】
Ｃバス１９のＣバスアドレスにはアドレスレジスタ１１の出力およびインクリメンタ１２の出力の一方がＭＵＸ１５で選択されて接続される。内部ＲＡＭ７のアドレスデータの下位５ビットは、内部ＲＡＭ７の３つのポートでそれぞれ独立であって、命令デコーダ６からの信号が接続されている。したがって、内部ＲＡＭ７をプログラムの命令コードに従って様々なアクセスが可能となっている。
【００４９】
ここで、アドレスレジスタ１１に保持される値（上位アドレスデータの値）を変化させないと想定した時、内部ＲＡＭ７のアクセスできる領域は下位アドレスデータで指定できる範囲、すなわち３２（＝２＾５）ワードの空間となるから、内部ＲＡＭ７を３２ビットの内蔵レジスタが３２個存在するかの如くアクセスできて、命令コードに従い、その空間内で様々な演算が行なえる。
【００５０】
アドレスレジスタ１１の値を＋１インクリメントすれば、内部ＲＡＭ７のアドレスを３２増加させることとなり、アクセスできる内部ＲＡＭ７の領域（＝命令コードによってデータの読出しまたは書込みが可能な領域）が移動し、異なった空間の３２ワード分にアクセスが可能となる。アドレスレジスタの値を−１させた場合も同様である。
【００５１】
いま、内部ＲＡＭ７はＳＲＡＭ（ＳｔａｔｉｃＲＡＭ）で実現されているとする。ＳＲＡＭの各セルを図２に示す。図２のＳＲＡＭのセル２１は、データ線対から供給される１ビットの情報を保持するインバータ２２と２３、およびトランジスタ２４〜２９を有する。内部ＲＡＭ７の３つのポートのうちの１つめのポートのアドレスをデコードした信号が供給されるワード線ＷＡはトランジスタ２４と２５に接続され、同様に２つめのポートのアドレスをデコードした信号が供給されるＷＢはトランジスタ２６と２７に接続され、同様に３つめのポートのアドレスをデコードした信号が供給されるＷＣはトランジスタ２８と２９に接続される。
【００５２】
ワード線ＷＡ、ＷＢおよびＷＣのうち選択されたワード線のみがハイとなり、選択されたワード線が接続されたトランジスタはオン状態となる。ワード線ＷＡが選択された場合はデータ線の対ＤＡとＤＡｂがトランジスタ２４と２５を介してインバータ２２と２３に接続される。同様にワード線ＷＢが選択された場合はデータ線の対ＤＢとＤＢｂがトランジスタ２６と２７を介してインバータ２２と２３に接続され、ワード線ＷＣが選択された場合はデータ線の対ＤＣとＤＣｂがトランジスタ２８と２９を介してインバータ２２と２３に接続される。
【００５３】
セル２１は、図３の如く並べられＳＲＡＭが形成される。図３では、Ａバスアドレスからのデコードされた信号はワード線ＷＡ００００〜ＷＡＦＦＦＦを介して、Ｂバスアドレスからのデコードされた信号はワード線ＷＢ００００〜ＷＢＦＦＦＦを介して、Ｃバスアドレスからのデコードされた信号はワード線ＷＣ００００〜ＷＣＦＦＦＦを介して、各セル２１に与えられる。選択されたワード線に接続されてオンしたセル２１には対応のデータ線対が接続される。
【００５４】
ＳＲＡＭ（内部ＲＡＭ７）のそれぞれのポートにセンスアンプＡＭＰが配され、各セル２１で保持されたデータはセンスアンプＡＭＰにより増幅されてＡバス８のデータＤＡ００〜ＤＡ１Ｆ、Ｂバス１０のデータＤＢ００〜ＤＢ１Ｆ、Ｃバス１９のデータＤＣ００〜ＤＣ１Ｆとしてそれぞれ出力（読出し）される。
【００５５】
データを書込みする場合も同様にワード線で選択され、センスアンプＡＭＰから駆動されたデータ線の値が、各セル２１に書き込まれる。
【００５６】
図４には、各ワード線の信号を生成するアドレスデコード部の構成が示される。内部ＲＡＭ７のアドレスデコード部７Ａ、７Ｂおよび７Ｃは図４に示す同様の構成を有する。アドレスデコード部は上位アドレスデコーダ４１、下位アドレスデコーダ４２およびアドレスデコーダ４３を有する。アドレスレジスタ１１から駆動される上位１１ビットＡ５〜Ａ１５は、アドレスデコーダ４１に入力され、デコード結果の信号ＧＡ０００〜ＧＡ７ＦＦが生成されてアドレスデコーダ４３に与えられる。信号ＧＡ０００〜ＧＡ７ＦＦのうち該当するアドレスの信号のみがハイとなり、その他の信号はローとなる。
【００５７】
命令デコーダ６から駆動される下位５ビットＡ０〜Ａ４は、アドレスデコーダ４２に入力され、デコード結果の信号ＧＢ００〜ＧＢ１Ｆが生成されアドレスデコーダ４３に与えられる。信号ＧＢ００〜ＧＢ１Ｆのうち該当するアドレスの信号のみがハイとなり、その他の信号はローとなる。
【００５８】
さらに、それぞれにデコードされた信号はアドレスデコーダ４３に入力されて、個々のワード線信号Ｗ００００〜ＷＦＦＦＦが生成されて出力される。ワード線信号Ｗ００００〜ＷＦＦＦＦのうち該当するアドレスのワード線信号のみがハイとなり、その他のワード線信号はローとなる。
【００５９】
ここでは、アドレスデコーダを上位アドレスデコーダ４１と下位アドレスデコーダ４２とに分離して、かつ下位アドレスデコーダ４２は比較的規模の小さい回路、たとえばゲート１段の回路で実現し、また後段のアドレスデコーダ４３も段数を少なくしているため、アドレスデコード部では下位アドレスデータ（下位５ビットＡ０〜Ａ４のデータ）を入力してからワード線信号Ｗ００００〜ＷＦＦＦＦを出力するまでの所要時間（遅延量）を最小とすることができる。
【００６０】
いま、マイクロプロセッサ４０を１００ＭＨｚのクロックで動作させたとすると、１サイクルは１０ｎｓであるが、下位アドレスデータ（下位５ビットＡ０〜Ａ４のデータ）の変化から、デコードして、デコード結果を出力し、さらにそのデコード結果を用いてＳＲＡＭセルからデータを読出すまでの遅延（所要時間）を１サイクル以内となるようタイミング設計を行なったとする。このように設計されたマイクロプロセッサ４０では、下位アドレスデータ（下位５ビットＡ０〜Ａ４のデータ）のみが変化する場合、１サイクルで各種データの演算が行なえ、従来の汎用レジスタマシンによる、レジスタの演算と同様に演算を行なうことができる。
【００６１】
このマイクロプロセッサ４０を用いて、図５のＣ言語で書かれたソースプログラム５０（以下、単にプログラム５０という）を実行する。実行時にはプログラム５０は予めコンパイルされてマイクロプロセッサ４０が実行可能な機械コードに変換（翻訳）されていると想定する。
【００６２】
プログラム５０は関数ｆｕｎｃ１のサブルーチンプログラムであり変数ａとｂを引数としており、ローカル変数ｃ、ｄおよびｅを有する。関数ｆｕｎｃ１のプログラム５０は図示しない他の関数プログラムなどから呼出されたとき実行される。また、プログラム５０の関数ｆｕｎｃ１の中で別の関数ｆｕｎｃ２のサブルーチンプログラムが呼出されている。関数ｆｕｎｃ２のサブルーチンプログラムは変数ｃを引数としており、ローカル変数ｆを有する。
【００６３】
プログラム５０を実行する時の内部ＲＡＭ７上のデータの配置例を図６に模式的に示すとともに、各サイクル毎のマイクロプロセッサ４０の動作を図７に表形式で示す。図７には図５のプログラム５０の左端に当てられた行番号を示すソース行番号７１と、該ソース行番号７１の行に記載されたソースコードをコンパイルして得られた機械コードであるニーモニックコード７２、該ニーモニックコード７２を実行する時のマイクロプロセッサ４０の動作７３および該ニーモニックコード７２を実行するサイクルの順番を示すサイクル番号７４が示される。
【００６４】
図示しない他の関数から関数ｆｕｎｃ１が呼出されてプログラム５０が実行される時、アドレスレジスタ１１の値は０で、関数ｆｕｎｃ１の引数にコピーすべき変数ａとｂが内部ＲＡＭ７のアドレス“００ｈ＋２”とアドレス“００ｈ＋１”にそれぞれ格納されていたと想定する。
【００６５】
まず引数をスタックへコピーする必要があるため、サイクル番号７４が示す１番目および２番目のサイクルで、現在のアドレスレジスタ１１のデータにより指定される内部ＲＡＭ７の領域であるローカル変数領域の内容を、次に実行される関数ｆｕｎｃ１が使用するローカル領域である（アドレスレジスタ１１の値＋１）により指定される領域へコピーする。この時、Ａバス８にはアドレスレジスタ１１の値が上位アドレスとなり、Ｃバス１９には（アドレスレジスタ１１の値＋１）が上位アドレスとなり、異なった上位アドレスの領域間でのコピーを行なう。これにより変数ａとｂはアドレス“２０ｈ＋１”と“２０ｈ＋２”にそれぞれコピーされる。
【００６６】
サイクル番号７４が示す３番目のサイクルでは、サブルーチン（関数ｆｕｎｃ１）の呼出し命令のニーモニックコード７２を実行することにより、関数ｆｕｎｃ１の処理が終了した時の戻り番地（ＰＣ＋４）を、次の関数が使用するローカル領域に退避し、アドレスレジスタ１１の値を＋１し、そしてＰＣ４に関数ｆｕｎｃ１の先頭番地を代入して制御を関数ｆｕｎｃ１に移すという動作を並列に実行できる。したがってサブルーチン呼出しは１サイクルで実行できるから、サブルーチン呼出しを高速に処理できる。
【００６７】
関数ｆｕｎｃ１のローカル変数ｃ、ｄ、ｅの領域をローカル領域のそれぞれ３番地、４番地、５番地にコンパイラが割り当てている（図６参照）。関数ｆｕｎｃ１での最初の命令は「ｃ＝ａ＋ｂ；」の演算命令であり内部ＲＡＭ７を参照する動作を伴う命令である。この時点では、先に関数ｆｕｎｃ１を呼出した際にアドレスレジスタ１１の値は変化しており、上位アドレスからの遅延量が大きい場合には該演算命令は１サイクルで実行完了しない場合がある。そのためサイクル番号７４が示す４番目のサイクルでは何も行なわない命令を示すニーモニックコード７２（‘ＮＯＰ’）がコンパイラによって挿入され、次のローカル変数に対する演算、すなわち「ｃ＝ａ＋ｂ；」の演算に備える。命令デコーダ６は命令コードを入力して‘ＮＯＰ’であることを解釈すると、該命令コードのために当てられたサイクルにおいては何ら動作せずに次の命令コードの入力まで待機する。‘ＮＯＰ’のサイクルにおいては、全ての内部バスおよび制御信号は変化せずに現状状態を維持することになる。
【００６８】
ここで命令コード（‘ＮＯＰ’）を挿入する目的について説明する。マイクロプロセッサ４０では上位アドレスデータを入力してからデコードしてデコード信号を導出するまでの遅延時間が、下位アドレスデータを入力してからデコードしてデコード信号を導出するまでの遅延時間よりも相対的に長くなる。アドレスデータを入力してからデコード信号導出までの遅延時間（所要時間）が長いと、内部ＲＡＭ７のアクセス（参照）が１サイクルで終了せずに次の演算命令のためのデータを読出しできない惧れがある。つまり、実際は、「アクセス時間」＝「アドレスデコードの遅延時間」＋「メモリセルの読出し時間（ワード線・ビット線の遅延とセンスアンプ部の遅延）」であるので、デコードに時間がかかるとアクセス時間が長くなる。仮にアクセス時間が１サイクル以上（１００ＭＨｚでは１０ｎｓ以上）かかった場合でも、次の演算命令の直前に命令コード（‘ＮＯＰ’）が１サイクル分実行されることで、アクセス時間のために余分に１サイクル分充てることができて、次の演算命令実行時には常にオペランドデータのアクセスに成功している状態とすることができる。これにより必要データが揃わずに演算命令が実行できないというエラー状態を確実に回避できる。
【００６９】
サイクル番号７４が示す５番目と６番目のサイクルでローカル変数間での演算が行なわれる。これら演算のための変数（オペランド）は、すべて同じ上位アドレスの領域に割当てられているため、すべて１サイクルで終了する。
【００７０】
次に実行される関数ｆｕｎｃ２の呼出し命令のため、サイクル番号７４が示す７番目のサイクルで、引数ｃを関数ｆｕｎｃ２で使用するローカル領域（図６のアドレス“４０ｈ＋１”）へコピーし、サイクル番号７４が示す８番目のサイクルではＰＣ４の戻り番地の退避（図６のアドレス“４０ｈ＋０”へのコピー）をして、アドレスレジスタ１１の繰り上げ、およびＰＣ４に対する関数ｆｕｎｃ２の先頭番地の代入が同時に行なわれ、関数ｆｕｎｃ２に制御が移される。
【００７１】
サイクル番号７４の９番目のサイクルでは、４番目のサイクルと同様にコンパイラが挿入したニーモニックコード７２（‘ＮＯＰ’）により、上位アドレスの遅延待ちを行なう。
【００７２】
サイクル番号７４の１０番目のサイクルでは、ローカル変数と定数との演算命令（ｆ＝ｃ＋１；）が実行される。この演算のための定数（＝１）の情報は対応のニーモニックコード７２に含まれており、定数情報はＢバス１０を命令デコーダ６が駆動してＡＬＵ９に与えられて演算実行される。したがってＢバス１０を内部ＲＡＭ７から読出したデータのＡＬＵ９への転送とともに、定数情報の命令デコーダ６からＡＬＵ９への転送に利用できる。
【００７３】
関数ｆｕｎｃ２の処理が終了し関数ｆｕｎｃ１の処理に戻るため、サイクル番号７４が示す１１番目のサイクルでアドレスレジスタ１１の値を−１して戻し、サイクル番号７４が示す１３番目のサイクルで戻り値を元の関数ｆｕｎｃ１で使用するローカル領域にコピーし、サイクル番号７４の１４番目のサイクルでＰＣ４の値を復元して関数ｆｕｎｃ１の制御に戻る。このようにサブルーチンから復帰する（関数ｆｕｎｃ２から元の関数ｆｕｎｃ１にリターンする）際にもアドレスレジスタ１１の値を変化させるため、上位アドレスの遅延待ちが必要となり、サイクル番号７４が示す１２番目のサイクルではコンパイラが挿入したニーモニックコード７２（‘ＮＯＰ’）が実行される。
【００７４】
サイクル番号７４が示す１５番目のサイクルでは、ローカル変数間の演算が行なわれ、１６番目〜１９番目のサイクルで関数ｆｕｎｃ１を呼出した図示のない他の関数に復帰（リターン）するための処理が行なわれる。この時も同様にアドレスレジスタ１１の値が変更されるため、コンパイラによって挿入された何もしない命令を示すニーモニックコード７２（‘ＮＯＰ’）が実行される。
【００７５】
図７のサブルーチン呼出し命令（‘ＣＡＬＬｆｕｎｃ１’、‘ＣＡＬＬｆｕｎｃ２’）の動作７３では、この１サイクル（クロックの１周期）で、「ＰＣ４の内容の退避」、「アドレスレジスタ１１の値の繰り上げ」および「ＰＣ４の値の更新」の３つの動作が、同時に並行して行なわれる。従来は、「アドレスレジスタ」にあたる物はないので、「ＰＣ内容の退避」と「ＰＣ値の更新」とは逐次処理されるか、同時並列処理される物もあるかも知れない。本実施の形態では、「アドレスレジスタ１１の繰り上げ」も含めて同時に並列処理される特徴を有する。
【００７６】
この同時並列処理を図１を参照し説明する。「ＰＣ４の内容の退避」は、「ＰＣ４→インクリメンタ１４→Ｃバス１９→内部ＲＡＭ７」の経路を用いて行なう。「アドレスレジスタ１１の繰り上げ」は、「アドレスレジスタ１１→インクリメンタ１２→アドレスレジスタ１１」の経路を用いて行なう。「ＰＣ４の値の更新」は、図示されないが、「命令デコーダ６→ＰＣ４」という経路を用いて行なう。これら３種類の経路は独立している（共通したバス接続でない）ために、同時並列処理が可能となる。
【００７７】
またマイクロプロセッサ４０では割込みを処理する。命令デコーダ６は図示しない割込み入力を受付けて、その割込み入力の信号がアクティブになると、予め準備された割込み処理プログラムへ分岐する。その様子を図８に示す。図８には割込み処理プログラムについて図７と同様に表形式でマイクロプロセッサ４０の動作が示される。図８ではソース番号７１は省略されている。
【００７８】
プログラム５０などの処理中に例外要因が生じたことを示す割込み信号が発生すると、ハード的にサイクル番号７４が示す１０１番目〜１０３番目の３つのサイクルが実行される。１０１番目のサイクルでは、アドレスレジスタ１１の値を退避レジスタ１３にコピー（保存）し、同時にアドレスレジスタ１１に所定値（固定した値）を代入する。アドレスレジスタ１１の値が固定値になることで、割込み発生時は、常に内部ＲＡＭ７の同じスタック領域を使用することとなる。割込み処理プログラムの中で、アドレスレジスタ１１を適宜変更することにより、異なった領域をスタックとして使用することもできる。
【００７９】
サイクル番号７４が示す１０３番目のサイクルで、アドレスレジスタ１１の新たな値が示すスタック領域の０番地に、戻り番地（ＰＣ４の値＋４）を保存すると同時に、ＰＣ４に固定値を代入する。ＰＣ４の値が固定値となることで、割込み発生時は、常に同じ番地（割込み処理プログラムの先頭番地）にジャンプすることとなる。割込み処理プログラムの中で、適宜分岐することにより、様々な処理を行なうことができる。
【００８０】
この場合も、１０１番目のサイクルでアドレスレジスタ１１の値の変更が行なわれるので、次の１０２番目のサイクルで上位アドレスの遅延待ちを行なう。この１０２番目のサイクルでは、前述のコンパイラが挿入するニーモニックコード７２（‘ＮＯＰ’）を実行するのとは異なり、ハード制御により待ちサイクルが実行される。この待ちサイクルにおいては全ての内部バスおよび制御信号は変化せずに現状状態を維持し、かつ命令デコーダ６に対するパイプ５による命令コードの供給も停止する。
【００８１】
サイクル番号７４が示す１０４番目以降のサイクルでは、割込み処理が行なわれて処理の最後には、割込み処理から元の処理に復帰（リターン）するためのニーモニックコード７２（“ＲＥＴＩ”）が必ず配置される。この命令“ＲＥＴＩ”を実行する２０１番目のサイクルでは、割込み処理用スタック領域の０番地の内容をＰＣ４にコピーしてＰＣ４の値を復元し、同時に退避レジスタ１３の内容をアドレスレジスタ１１にコピーし、分岐前の元の処理時のアドレスを復元する。これにより割込み入力により中断された元の処理を再開して、中断した時点の内容から実行することができる。
【００８２】
サイクル番号７４が示す２０３番目のサイクルから元の処理が再開することになるが、再開して最初に実行される命令コード（ニーモニックコード７２）が内部ＲＡＭ７を参照する命令か否か判別することは困難である。そのために、２０１番目のサイクルでアドレスレジスタ１１の内容が変更されているので、２０３番目のサイクルの前に必ず２０２番目のサイクルが実行されるようにして上位アドレスの遅延待ちを行なう。これも１０２番目のサイクルと同様にハード制御による待ちサイクルである。
【００８３】
上述のようにアドレスレジスタ１１の内容が更新された後に内部ＲＡＭ７を参照するような命令が実行されるときは、‘ＮＯＰ’などによる待ちサイクルが挿入されるから、アドレスレジスタ１１の内容が更新されたとしても内部ＲＡＭ７参照のためのサイクルを確保できて、たとえば図５の３行目の演算命令でもＡＬＵ９での実行時には上位アドレスからのデコード信号（ＧＡ０００〜ＧＡ７ＦＦ）を準備しておくことができる。
【００８４】
したがって、このような演算命令実行時には次のような動作となる。つまり、内部ＲＡＭ７の３つのアドレスデコード部７Ａ〜７Ｃに対応した独立した３つのポートのうちの１つ目のポートのＡバスアドレスの信号に基づいて内部ＲＡＭ７から情報を読出し、２つ目のポートのためのＢバスアドレスの信号に基づいて内部ＲＡＭ７から情報を読出し、これら読出された情報をＡＬＵ９に与えて、その演算結果を、３つ目のポートのＣバスアドレスの信号に基づいて内部ＲＡＭ７に格納する動作を１サイクルで行なえる。
【００８５】
また、Ａ、ＢおよびＣバスアドレスに与えられる命令デコーダ６からの下位アドレスに基づいて上述したように高速に内部ＲＡＭ７をアクセスできる。
【００８６】
（実施の形態２）
本実施の形態ではソースプログラムを入力して、ソースプログラム中の内部ＲＡＭ７をアクセス（参照）する命令コードを検出したときは、上述の待ちサイクルを設けるための命令コード（‘ＮＯＰ’）を挿入して、該ソースプログラムをコンパイルするコンパイラが提供される。
【００８７】
図９は実施の形態２に係るコンパイル手順を実行するマイクロコンピュータ８０である。マイクロコンピュータ８０はＣＰＵ（ＣｅｎｔｒａｌＰｒｏｃｅｓｓｉｎｇＵｎｉｔ）、コンパイラプログラム（以下、単にコンパイラと呼ぶ）などのデータを予め格納するＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）８２、ＲＡＭ８３、入出力Ｉ／Ｆ（ＩｎｔｅｒＦａｃｅ）８４、キーボードなどの外部から指示などの情報を入力するための入力部８５、情報を外部に出力するための画面などの出力部８６、インターネットなどの各種通信回線と接続するための通信Ｉ／Ｆ８７、記録媒体８９が着脱自在に挿入されて、挿入された記録媒体に対して情報をアクセスするための記録媒体駆動部８８を備える。
【００８８】
コンパイラは記録媒体８９に予め記録されて記録媒体駆動部８８により読出されることで供給されてもよく、ネットワークから通信Ｉ／Ｆ８７を介してロードされることで供給されてよい。
【００８９】
入出力Ｉ／Ｆ８４はマイクロプロセッサ４０を含む各種装置と入出力する。ＲＯＭ８２に格納されたコンパイラはＣＰＵ８１の制御のもとに実行されることにより、ＲＡＭ８３などに準備された高級言語のソースプログラムは逐次読出されて、マイクロプロセッサ４０のための機械語命令に翻訳されて、翻訳された内容はＲＡＭ８３の所定領域に格納される。ＲＡＭ８３の所定領域に格納された機械語命令列は読出されて入出力Ｉ／Ｆ８４を介してデータ入力２としてマイクロプロセッサ４０に与えられる。なお、ＲＡＭ８３から読出された機械語命令列は通信Ｉ／Ｆ８７およびネットワークを介して読出されてデータ入力２としてマイクロプロセッサ４０に与えられても良く、または記録媒体駆動部８８を介して記録媒体８９に書込んで、記録媒体８９の内容がデータ入力２としてマイクロプロセッサ４０に与えられても良い。
【００９０】
ここでは、マイクロコンピュータ８０が動作していることを前提としてマイクロプロセッサ４０が動作するとしているが、通常は、コンパイラが例えば記録媒体８９にコンパイル結果（命令コード列）を格納し、その結果を、アドレス出力１、データ入力２に接続されているＲＯＭ等に書き込んだ後に、マイクロプロセッサ４０を動作させることになる。
【００９１】
図１０は実施の形態２に係るコンパイル手順を示すフローチャートであり、ＲＯＭ８２のコンパイラが実行されることにより図１０の手順が実行される。図１０のフローチャートに従い図５のＣ言語のプログラム５０が図７のニーモニックコード７２の列にコンパイルされる手順を説明する。プログラム５０において宣言される変数（引数を含む）のスタック領域への図６のような割当てもコンパイラによりなされるが、ここではその説明は省略する。また、手順を追って逐次生成されるニーモニックコード７２はＣＰＵ８１の図示のない内部メモリに逐次格納される。コンパイルすべきソースコードが無くなる（全てのソースコードのコンパイルが終了する）と、内部メモリのニーモニックコード７２の列はＲＡＭ８３の所定領域に格納される。
【００９２】
まず、ＣＰＵ８１はＲＡＭ８３からプログラム５０のファイルを図示のない内部メモリに読込む（ステップＳ１）。次に、読込んだソースファイルにコンパイルすべき関数のコードがあるか判定する（ステップＳ２）。コンパイルすべきコードがなければ、コンパイル結果である内部メモリに格納されたニーモニックコード７２の列（機械語命令列）はＲＡＭ８３の所定領域に出力されるが（ステップＳ１７）、あればプログラム５０の先頭行の内容は内部ＲＡＭ７を参照する命令コードか否か判定する（ステップＳ３）。内部ＲＡＭ７を参照する命令コードであればＣＰＵ８１は‘ＮＯＰ’のニーモニックコード７２を生成して（ステップＳ４）、読込んだ内容に基づいて内部ＲＡＭ７を参照するニーモニックコード７２を生成する（ステップＳ５）。その後、ステップＳ６に移行する。
【００９３】
関数ｆｕｎｃ１のサブルーチンを呼び出す命令は、図示していない他のソースコードファイルに含まれるプログラムに記載されているため、ニーモニックコード「ＣＡＬＬｆｕｎｃ１」は、このソースコードファイルのコンパイル時には生成されない（図示していない他のソースコードファイルをコンパイルする時に生成される）。
【００９４】
プログラム５０の１行目の読込み内容は関数ｆｕｎｃ１を定義する内容である（ステップＳ２でＹＥＳ）。プログラム５０の最初の命令は３行目の命令コードであり、内部ＲＡＭ７を参照する命令であるから（Ｓ３でＹＥＳ）、ＣＰＵ８１は‘ＮＯＰ’のニーモニックコード７２を生成して（ステップＳ４）、読込んだ内容に基づいて内部ＲＡＭ７を参照するニーモニックコード７２を生成する（ステップＳ５）。その後、ステップＳ６に移行する。
【００９５】
４行の命令コードは一般命令であるから（Ｓ６でＮＯ、Ｓ１１でＮＯ）、読込んだ内容に基づいてニーモニックコード７２を生成する（ステップＳ１６）。その後、ステップＳ６に移行する。
【００９６】
ＣＰＵ８１はプログラム５０の５行目を読込む。読込んだ内容はサブルーチン呼出し命令コードであるから（ステップＳ６でＹＥＳ）、対応のニーモニックコード７２（‘ＣＡＬＬｆｕｎｃ２’）を生成する（ステップＳ７）。関数ｆｕｎｃ２のサブルーチンから戻ってきた最初の命令である、６行目のソースコードは内部ＲＡＭ７を参照する命令であるから（ステップＳ８でＹＥＳ）、‘ＮＯＰ’のニーモニックコード７２を生成して、さらに内部ＲＡＭ７を参照する命令に対応のニーモニックコード７２を生成する（ステップＳ９、Ｓ１０）。その後、ステップＳ６に戻る。
【００９７】
次の７行目はサブルーチン終了命令と判定されるので（ステップＳ６でＮＯ、ステップＳ１１でＹＥＳ）、元のサブルーチンに戻るために、即ち関数ｆｕｎｃ１を呼出したサブルーチンに戻るために、ニーモニックコード７２（‘ＤＥＣＡＤＲ’）を生成し（ステップＳ１２）、‘ＮＯＰ’のニーモニックコード７２を生成し（ステップＳ１３）、戻り値コピー命令のニーモニックコード７２を生成し（ステップＳ１４）、ニーモニックコード７２（‘ＲＥＴ’）を生成し（ステップＳ１５）、ステップＳ２に戻る。
【００９８】
これで関数ｆｕｎｃ１のサブルーチンのコンパイルが終了し、さらにソースファイルには関数ｆｕｎｃ２が続いている（ステップＳ２でＹＥＳ）。関数ｆｕｎｃ２のサブルーチンの最初の命令は１１行目の命令コードで示される。これは、内部ＲＡＭ７を参照する命令であるから（ステップＳ３でＹＥＳ）、ＣＰＵ８１は‘ＮＯＰ’のニーモニックコード７２を生成して（ステップＳ４）、読込んだ内容に基づいて内部ＲＡＭ７を参照するための、ニーモニックコード７２を生成する（ステップＳ５）。その後、ステップＳ６に移行する。
【００９９】
この時点でコンパイルすべきコードは残っていないので（ステップＳ２でＮＯ）、コンパイル結果の機械語命令列はＲＡＭ８３の所定領域に出力（格納）される（ステップＳ１７）。以上でプログラム５０のコンパイルは終了する。
【０１００】
（実施の形態の変形例）
実施の形態１と２では、上位アドレスからの遅延量が大きいために、アドレスレジスタ１１の値の変更の直後に内部ＲＡＭ７の参照が実行される場合には、該参照は１サイクルで処理を終了することができない惧れがあるので、コンパイラにより何も動作しないことを指示する機械語命令（‘ＮＯＰ’）を挿入したが、上位アドレスからの遅延量が比較的小さいので上述のような内部ＲＡＭ７参照も１サイクルで処理が終了する場合は、該機械語命令（‘ＮＯＰ’）の挿入は不要である。したがって、コンパイラのオプションとして、機械語命令（‘ＮＯＰ’）の自動挿入を許可するか否かを可変に設定するようにしてもよい。
【０１０１】
また、割込み発生および割込み処理からの復帰時には、ハード制御による待ちサイクルが挿入されるが、これも上位アドレスからの遅延量が比較的小さい場合には、ハード設計時のオプションとすることができる。
【０１０２】
また、本実施の形態１では、下位アドレスを５ビットとし、内部ＲＡＭ７の各ローカル領域を３２ワードとしたが、これに限定されない。たとえば、多量のローカル変数を必要とするプログラムが実行されるようなマイクロプロセッサ４０では、下位アドレスを増加させて各ローカル領域を大きく取る等、システムの最適化を行なうこともできる。また逆に必要なローカル変数が少ない場合には、下位アドレスを減少させて内部ＲＡＭ７の未使用領域を減らして、内部ＲＡＭ７に関する容量削減およびコストダウンをして、システムの最適化を行なうこともできる。
【０１０３】
（実施の形態の効果）
マイクロプロセッサ４０を使用することにより、通常スタックを使用するローカル変数上の演算を、アドレスを演算するためのサイクルなしに１サイクルで行なえる。また、スタックをアドレスレジスタ１１の値の変化によって切換えるため、ローカルに使用するアドレスレジスタ１１の値のスタックへの退避等が必要なく、関数（サブルーチン）呼出しのオーバーヘッドが減少する。また、アドレスレジスタ１１の出力と、インクリメンタ１２の出力をＭＵＸ１５と１６で切換える構造により、関数（サブルーチン）呼出しの前後の領域間での引数渡しが可能となる。また、関数（サブルーチン）呼出しの際、戻り番地を退避し、アドレスレジスタ１１の繰上げをし、ＰＣ４へ値を代入することを並列実行でき、関数（サブルーチン）呼出しを高速に実行できる。
【０１０４】
したがってマイクロプロセッサ４０では、すべての演算をスタック上で行なうような言語仕様を有する言語のプログラムを、たとえばＪａｖａ（Ｒ）で書かれたプログラムを高速に実行できる。
【０１０５】
今回開示された実施の形態はすべての点で例示であって制限的なものではないと考えられるべきである。本発明の範囲は上記した説明ではなくて特許請求の範囲によって示され、特許請求の範囲と均等の意味および範囲内でのすべての変更が含まれることが意図される。
【０１０６】
【発明の効果】
この発明のマイクロプロセッサによれば、内部メモリの下位アドレス情報に基づいて参照される部分領域のアクセスを高速化できる。例えばレジスタファイルを用いて演算する従来手法に比較して大量データを高速アクセスできる。
【０１０７】
この発明のコンパイラによれば、上位アドレス情報を変更するような命令コードと、内部メモリを参照する命令コードとの間には待ち時間サイクルのための命令コードが置かれるから、内部メモリを参照する命令コード実行時には、上位アドレス情報が更新されて更新後の上位アドレス情報に基づいて内部メモリをアクセスするための信号の生成に要する時間は待ち時間で相殺されることになって、その後の内部メモリ参照においては更新後のアドレス情報を用いて適正な領域を参照できる。
【図面の簡単な説明】
【図１】マイクロプロセッサの構成図である。
【図２】ＳＲＡＭの各セルの構成例を示す図である。
【図３】ＳＲＡＭのセルの配列例を示す図である。
【図４】各ワード線の信号を生成するアドレスデコード部の構成図である。
【図５】Ｃ言語で書かれたソースプログラムの一例を示す図である。
【図６】プログラムを実行する時の内部ＲＡＭ上のデータの配置例を示す図である。
【図７】各サイクル毎のマイクロプロセッサの動作を表形式で示す図である。
【図８】割込み処理プログラムへ分岐時の各サイクル毎のマイクロプロセッサの動作を表形式で示す図である。
【図９】コンパイル手順を実行するマイクロコンピュータのブロック図である。
【図１０】コンパイル手順を示すフローチャートである。
【符号の説明】
６命令デコーダ、７内部ＲＡＭ、７Ａ，７Ｂ，７Ｃアドレスデコード部、９ＡＬＵ、１１アドレスレジスタ、４０マイクロプロセッサ、４１上位アドレスデコーダ、４２下位アドレスデコーダ。
[0001]
[Technical Field of the Invention]
The present invention relates to a microprocessor that mainly executes programs written in a high-level language and to a compiler, and more particularly to a microprocessor that performs processing using a stack and a compiler for programs executed by that microprocessor.
[0002]
[Prior art]
When a program is written in a high-level language, subroutine arguments and return values are passed on the stack, and in the C language local variables are allocated on the stack and operated on there. An operation on local variables is therefore complicated: it requires a RAM (Random Access Memory) address calculation and a read, followed by a destination address calculation and a write. In addition, when the external RAM is slow to access, its speed limits the overall processing speed. To speed up subroutine execution, a method has been proposed that parallelizes the stack operations involved in subroutine call and return (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
JP 11-242595 A
[0004]
[Problems to be solved by the invention]
The method proposed in the above document aims at higher speed, but the stack it actually uses is connected as external memory. Given that access to external memory is generally slow, it is difficult to speed up execution when the processing accesses external memory frequently. Moreover, the document does not address access to, or operations on, data on the stack.
[0005]
Therefore, an object of the present invention is to provide a microprocessor capable of accessing data at high speed and a compiler for a program executed by the microprocessor.
[0006]
[Means for Solving the Problems]
A microprocessor according to one aspect of the present invention includes: an internal memory that stores information accessed based on an address signal; an arithmetic unit that, in response to a command, performs various operations including operations that use information in the internal memory; an instruction decoding unit that decodes a given instruction code and, based on the decoding result, outputs control information for controlling each unit, including the operation command; and an address decoding unit that receives and decodes given address information and outputs the address signal.
[0007]
The address information includes upper address information and lower address information. In the address decoding unit, the delay from the input of the lower address information to the derivation of the address signal is shorter than the delay from the input of the upper address information to the derivation of the address signal.
[0008]
Therefore, access to a partial area referenced by the lower address information of the internal memory is faster than access to a partial area referenced by the upper address information, so the internal memory can be accessed at high speed when only the lower address changes. Access to the partial area referenced by the lower address information can thus be accelerated. For example, a large amount of data can be accessed at high speed compared with the conventional approach of operating on a register file.
[0009]
Preferably, the address decoding unit includes an upper decoder that decodes the upper address information and outputs an upper decode signal, a lower decoder that decodes the lower address information and outputs a lower decode signal, and a generation decoder that receives the upper decode signal and the lower decode signal and generates and outputs the address signal.
[0010]
Therefore, by providing the upper decoder and the lower decoder separately in the address decoding unit, and further providing the generation decoder, the delay from the input of the lower address information to the derivation of the address signal is made shorter than the delay from the input of the upper address information to the derivation of the address signal.
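As a rough illustration of this split decoder, the Python sketch below models the upper decoder, the lower decoder and the generation decoder as one-hot signals. The 11-bit/5-bit split is taken from the embodiment described in this document; the function names are hypothetical, and the sketch models only the logic, not the gate delays that make the lower path faster.

```python
UPPER_BITS = 11  # upper address width used in the embodiment
LOWER_BITS = 5   # lower address width used in the embodiment


def one_hot(value, width):
    """Return 2**width signals in which only entry `value` is high (1)."""
    return [1 if i == value else 0 for i in range(1 << width)]


def word_lines(upper, lower):
    """Model of the generation decoder.

    Word line (u, l) goes high only when upper decode signal u AND
    lower decode signal l are both high.
    """
    upper_decode = one_hot(upper, UPPER_BITS)  # output of the upper decoder
    lower_decode = one_hot(lower, LOWER_BITS)  # output of the lower decoder
    return [u & l for u in upper_decode for l in lower_decode]


def selected_word(upper, lower):
    """Index of the single word line that is high."""
    lines = word_lines(upper, lower)
    assert sum(lines) == 1  # exactly one word line may be selected
    return lines.index(1)
```

In this model the selected word index equals `upper * 32 + lower`, so changing only `lower` moves within one 32-word block while the upper decode signal stays unchanged.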
[0011]
Preferably, the bit length of the lower address is shorter than the bit length of the upper address. The lower decoder can therefore be built simply, and the number of processing stages needed for decoding can be reduced.
[0012]
Preferably, the above microprocessor further includes an address generation unit that generates the upper address information. The address generation unit has an address register that stores the upper address information, and an address update unit that updates the upper address information stored in the address register based on the control information from the instruction decoding unit. The lower address information is included in the control information output by the instruction decoding unit.
[0013]
Therefore, updating the upper address information in the address register makes it possible to relocate, within the internal memory, the area accessed via the lower address information, that is, the stack area that can be accessed at high speed.
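The relocatable fast-access area can be pictured with a small model. The sketch below is only an illustration: it assumes the embodiment's 5-bit lower address (a 32-word window) and a 64k-word internal RAM, and the class and method names are hypothetical.

```python
WINDOW_WORDS = 32  # 2**5 words reachable through the 5-bit lower address


class StackWindow:
    """Internal RAM viewed through an address register (upper address bits)."""

    def __init__(self, words=1 << 16):
        self.ram = [0] * words
        self.address_register = 0  # upper address information

    def physical(self, lower):
        """Full word address formed from the register and a lower address."""
        if not 0 <= lower < WINDOW_WORDS:
            raise ValueError("lower address must fit in 5 bits")
        return self.address_register * WINDOW_WORDS + lower

    def read(self, lower):
        return self.ram[self.physical(lower)]

    def write(self, lower, value):
        self.ram[self.physical(lower)] = value

    def increment(self):
        """Move the window up by one block of 32 words."""
        self.address_register += 1

    def decrement(self):
        """Move the window down by one block of 32 words."""
        self.address_register -= 1
```

With the register held constant, the 32 words behave like a bank of 32 registers; incrementing the register exposes a different 32-word block without copying any data.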
[0014]
Preferably, the above update is an increment or decrement of the address indicated by the upper address information.
[0015]
Preferably, access to the internal memory is performed in one cycle based on the lower address information output from the instruction decoding unit.
[0016]
The above microprocessor further includes a program counter that holds information sequentially designating the instruction code to be given to the instruction decoding unit. When the instruction code given to the instruction decoding unit commands a branch to another processing routine, the control information causes the information held in the program counter to be saved in the internal memory, the value of the program counter to be changed to a specified value, and the contents of the address register to be updated.
[0017]
Therefore, when branching to another processing routine, executing a single instruction code carries out the whole sequence: saving the information held in the program counter to the internal memory, changing the value of the program counter to the specified value, and updating the contents of the address register.
[0018]
Preferably, saving of the information held in the program counter to the internal memory, changing the value of the program counter to a specified value, and updating the contents of the address register are performed in parallel. Therefore, it is possible to quickly process a branch to another processing routine.
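A minimal sketch of such a branch instruction follows. It assumes details given in the embodiment of this document (the return address is PC + 4, it is stored in word 0 of the callee's 32-word area, and the address register is incremented by one); the `Cpu` class and `call` function are hypothetical names, and sequential Python statements stand in for what the hardware does in parallel.

```python
from dataclasses import dataclass, field

WINDOW_WORDS = 32        # words per local area (5-bit lower address)
INSTRUCTION_BYTES = 4    # the embodiment saves PC + 4 as the return address


@dataclass
class Cpu:
    pc: int = 0
    address_register: int = 0
    ram: list = field(default_factory=lambda: [0] * (1 << 16))


def call(cpu, target):
    """Model one subroutine-call instruction.

    All three results are derived from the state before the instruction,
    mirroring the parallel update described in the text.
    """
    old_pc = cpu.pc
    old_ar = cpu.address_register
    return_slot = (old_ar + 1) * WINDOW_WORDS            # word 0 of the callee's area
    cpu.ram[return_slot] = old_pc + INSTRUCTION_BYTES    # save the return address
    cpu.address_register = old_ar + 1                    # switch to the callee's area
    cpu.pc = target                                      # jump to the subroutine
```

Because none of the three updates needs the result of another, the model stays correct whatever order the assignments are written in, which is the property that allows the hardware to perform them in the same cycle.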
[0019]
Preferably, when an exception factor signal is input to the instruction decoding unit during processing, then, in order to branch from that processing to an exception factor processing routine prepared in advance, the control information causes the contents of the address register and the contents of the program counter to be saved and the contents of the address register to be changed to a predetermined fixed value.
[0020]
Accordingly, the contents of the address register and the contents of the program counter can be saved by one instruction code, and the contents of the address register can be changed to a predetermined fixed value, so that it is possible to quickly branch to the exception factor processing routine.
[0021]
Preferably, when an instruction code for returning from the branch destination processing routine to the original processing is given to the instruction decoding unit, the control information causes the contents of the program counter to be restored to the previously saved contents.
[0022]
Preferably, when the instruction decoding unit is given an instruction code for returning from the branch destination processing routine to the original processing, the contents of the address register and the contents of the program counter are restored by the control information.
[0023]
Therefore, the contents of the address register and the contents of the program counter can be restored by one instruction code, so that it is possible to quickly return to the original process from the branch destination processing routine.
[0024]
Preferably, the internal memory has three independent ports. In a single cycle, information is read from the internal memory based on the address signal for the first of the three ports, information is read from the internal memory based on the address signal for the second port, the two values read are supplied to the arithmetic unit, and the operation result is stored in the internal memory based on the address signal for the third port.
[0025]
Therefore, it is possible to execute arithmetic processing while referring to the internal memory at high speed.
Preferably, the instruction code includes lower address information for each of the address information for the three ports.
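The three-port operation can be sketched as follows. This is an assumption-laden illustration: the 32-bit data width and 32-word window come from the embodiment, while the operand packing shown in `pack_operands` is a hypothetical layout (the document does not specify the instruction format) and all function names are invented for the example.

```python
WINDOW_WORDS = 32
MASK32 = 0xFFFFFFFF  # the embodiment assumes 32-bit data words


def pack_operands(dst, src_a, src_b):
    """Pack three 5-bit lower addresses into one field (hypothetical layout)."""
    for operand in (dst, src_a, src_b):
        if not 0 <= operand < WINDOW_WORDS:
            raise ValueError("operand must fit in 5 bits")
    return (dst << 10) | (src_a << 5) | src_b


def unpack_operands(bits):
    """Inverse of pack_operands: returns (dst, src_a, src_b)."""
    return (bits >> 10) & 0x1F, (bits >> 5) & 0x1F, bits & 0x1F


def execute_add(ram, address_register, dst, src_a, src_b):
    """Model of one cycle: two port reads, one ALU add, one port write."""
    base = address_register * WINDOW_WORDS
    a = ram[base + src_a]                 # first port: read operand A
    b = ram[base + src_b]                 # second port: read operand B
    ram[base + dst] = (a + b) & MASK32    # third port: write the result
```

For example, with `a` and `b` at local words 1 and 2 and `c` at word 3, `execute_add(ram, 1, 3, 1, 2)` models `c = a + b` for the area selected by an address register value of 1.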
[0026]
Preferably, the instruction code includes constant information, and the instruction decoding unit sends the constant information onto the bus that supplies information read from the internal memory to the arithmetic unit. The bus is thus shared between information read from the internal memory and constant information, which simplifies the apparatus configuration. In addition, constant information can be given directly from the instruction decoding unit to the arithmetic unit, so operations that use constants can be executed quickly.
[0027]
Preferably, when branching to an exception handling routine in response to the occurrence of an exception factor during processing, the contents of the address register are saved and updated to a fixed value, a cycle for waiting time is executed, and the value of the program counter is saved in the internal memory.
[0028]
Thus, when an exception factor occurs during processing, the contents of the address register are saved and updated to a fixed value. Even if an instruction that references the internal memory is executed at the beginning of the exception handling routine at the branch destination, a cycle for waiting time is executed at the time of branching. The time required for the address decoding unit to generate the address signal from the changed contents of the address register is therefore absorbed by the waiting time, and the subsequent internal memory reference can refer to the appropriate area using the updated address information.
[0029]
Preferably, when returning from the exception handling routine to the original process before branching, the address register is restored to the saved contents and a cycle for waiting time is executed.
[0030]
Thus, when returning to the original processing, the contents of the address register are changed. Even if the first instruction of the restored original process instructs a reference to the internal memory, a cycle for waiting time is executed at the time of return. The time required for the address decoding unit to generate the address signal from the changed contents of the address register is therefore absorbed by the waiting time, and the subsequent internal memory reference can refer to the appropriate area using the updated address information.
[0031]
A compiler according to another aspect of the present invention is a compiler that sequentially inputs source codes constituting a source program, converts them into instruction codes for the above-described microprocessor, and outputs an instruction code string. When converting a source code that refers to the internal memory, if a conversion to an instruction code instructing a change of the upper address information was performed immediately before, the compiler places an instruction code for a waiting cycle before the instruction code of that source code in the instruction code string.
[0032]
Since the instruction code for the waiting cycle is placed between the instruction code that changes the upper address information and the instruction code that refers to the internal memory, when the instruction code that refers to the internal memory is executed, the time required to generate the address signal from the updated upper address information is absorbed by the waiting time, and the subsequent internal memory reference can refer to the appropriate area using the updated address information.
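The placement rule described above can be sketched as a small pass over an instruction list. This is only an illustrative sketch: the dictionary representation and the flag names `changes_upper` and `refs_ram` are assumptions introduced here, not part of the described compiler.

```python
# Hypothetical instruction model: each instruction is a dict holding a mnemonic
# and two flags. The flag names are illustrative assumptions.
def insert_wait_cycles(instructions):
    """Place a NOP between an upper-address change and a following RAM reference."""
    out = []
    upper_changed = False  # True if the previously emitted instruction changed the upper address
    for ins in instructions:
        if upper_changed and ins.get("refs_ram", False):
            out.append({"op": "NOP", "changes_upper": False, "refs_ram": False})
        out.append(ins)
        upper_changed = ins.get("changes_upper", False)
    return out


program = [
    {"op": "CALL func1", "changes_upper": True, "refs_ram": True},
    {"op": "c = a + b", "changes_upper": False, "refs_ram": True},
]
mnemonics = [ins["op"] for ins in insert_wait_cycles(program)]
```

Under these assumptions, a wait instruction appears only where a reference to the internal memory directly follows a change of the upper address information; other instruction pairs are left untouched.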
[0033]
Preferably, when a source code instructing a subroutine call is detected in the source program, it is converted into an instruction code that instructs saving the information held in the program counter to the internal memory, changing the value of the program counter to a specified value, and updating the contents of the address register.
[0034]
Therefore, when a subroutine call is instructed, a single instruction code can command saving the information held by the program counter in the internal memory, changing the value of the program counter to the specified value, and updating the contents of the address register.
[0035]
Preferably, when a source code instructing the end of a subroutine is detected in the source program, the source code is converted into an instruction code for restoring the contents of the program counter to the contents saved in advance.
[0036]
Preferably, when the source code following the source code instructing the subroutine call in the source program instructs a reference to the internal memory, the compiler places an instruction code for a waiting cycle in the instruction code group instructing the return from the subroutine.
[0037]
Since the upper address information is changed in order to return from the subroutine, the instruction code for the waiting cycle is placed in the instruction code group instructing the return from the subroutine. When the internal memory is then referred to, the time required to generate a signal for accessing the internal memory from the updated upper address information is absorbed by the waiting time, and the appropriate area can be referred to using the updated address information.
[0038]
Preferably, when it is detected in the source program that the first source code of a subroutine instructs an internal memory reference, the compiler places an instruction code for a waiting cycle in the instruction code string before the instruction code of that source code.
[0039]
Therefore, when the upper address information is updated in the immediately preceding subroutine and the internal memory is referred to at the beginning of the next subroutine, the upper address information is updated and the internal memory is accessed based on the updated upper address information. The time required to generate a signal for this purpose is offset by the waiting time, and in the internal memory reference, an appropriate area can be referred to using the updated address information.
[0040]
DETAILED DESCRIPTION OF THE INVENTION
Embodiments of the present invention will be described below with reference to the drawings.
[0041]
(Embodiment 1)
In the present embodiment, a partial area of the internal RAM is made to function as a stack, and the circuit scale of the part of the address decoder of the internal RAM that is involved in accessing the stack area is reduced. A processor is thereby provided that shortens the time (delay) required from the decoding of the address data until a decode signal is generated (derived).
[0042]
FIG. 1 shows a configuration of a microprocessor 40 according to the present embodiment. The microprocessor 40 includes ports for an address output 1, a data input 2 and a data output 3, a program counter (hereinafter abbreviated as PC) 4, a pipe 5, an instruction decoder 6, a 3-Port RAM (hereinafter simply abbreviated as internal RAM) 7, an A bus 8, an ALU (Arithmetic and Logic Unit) 9 for performing an operation (arithmetic / logical operation) using information in the internal RAM 7 and an operation (arithmetic / logical operation) using information given from the instruction decoder 6, a B bus 10, an address register 11, incrementers 12 and 14, MUXes (multiplexers) 15 and 16, a data output buffer 17, a data input buffer 18, and a C bus 19. The internal RAM 7 includes a memory cell array 70 and address decode units 7A, 7B, and 7C on the A bus side, the B bus side, and the C bus side, respectively. Address decode units 7A, 7B and 7C receive and decode address information relating to access to memory cell array 70 given from the respective bus side, and output an address signal for accessing memory cell array 70 as a decoding result.
[0043]
A program executed in the microprocessor 40 includes a plurality of instruction codes and is stored in an external memory (not shown). Address data for accessing the external memory from the microprocessor 40 is output through the address output 1, data read from the external memory into the microprocessor 40 is received through the data input 2, and data to be written from the microprocessor 40 to the external memory is output through the data output 3. Although the entire microprocessor 40 operates in synchronization with an external clock input, the illustration of the clock input and the description of the synchronous operation are omitted here.
[0044]
The PC 4 holds, while performing a counting operation, the address at which the instruction to be fetched next is stored, and specifies it to the external memory via the address output 1. The instruction code read from the specified address is input via the data input 2 to the pipe 5, which adjusts the timing, and is passed through the pipe 5 to the instruction decoder 6 as appropriate, where it is decoded. The other parts of the microprocessor 40 operate according to instructions from the instruction decoder 6 based on the result of decoding the instruction code.
[0045]
The ALU 9 has two input ports and one output port, and the internal RAM 7 has three independent ports (3-Port). The A bus 8 is a dedicated bus that outputs data in the internal RAM 7 to the ALU 9, and is connected to one input port of the ALU 9. The B bus 10 is a dedicated bus for outputting data from the external memory to the ALU 9 via the data input 2 and the data input buffer 18; it is connected to the other input port of the ALU 9 and also to the data output 3 via the data output buffer 17. The C bus 19 is for input only, and is connected to the output of the ALU 9 and the PC 4. Here, it is assumed that the internal RAM 7 has a data width of 32 bits, an address data width of 16 bits, and a capacity of 64k words = 2M bits.
[0046]
The address register 11 is a register that holds the upper 11 bits of the 16 bits of address data in the internal RAM 7. The value held in the address register 11 is incremented or decremented by the instruction decoder 6 via the incrementer 12.
[0047]
One of the output of the address register 11 and the output of the incrementer 12 is selected by the MUX 16 and connected to the A bus address of the A bus 8. The output of the address register 11 is connected to the B bus address of the B bus 10 (not shown).
[0048]
One of the output of the address register 11 and the output of the incrementer 12 is selected by the MUX 15 and connected to the C bus address of the C bus 19. The lower 5 bits of the address data of the internal RAM 7 are independent at the three ports of the internal RAM 7, and a signal from the instruction decoder 6 is connected thereto. Therefore, the internal RAM 7 can be accessed in various ways according to the instruction code of the program.
[0049]
Here, assuming that the value held in the address register 11 (the value of the upper address data) is not changed, the accessible area of the internal RAM 7 is a range that can be specified by the lower address data, that is, 32 (= 2 ^ 5) words. Therefore, the internal RAM 7 can be accessed as if there were 32 32-bit internal registers, and various operations can be performed in the space according to the instruction code.
[0050]
If the value of the address register 11 is incremented by 1, the address of the internal RAM 7 increases by 32, and the accessible area of the internal RAM 7 (= the area where data can be read or written by the instruction code) moves, so that the next 32 words can be accessed. The same applies when the value of the address register 11 is decremented by 1.
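The relation between the address register 11, the lower address from the instruction code, and the resulting 16-bit address of the internal RAM 7 can be illustrated with a minimal sketch. The function and constant names are assumptions made for illustration only.

```python
LOW_BITS = 5                    # lower address bits supplied by the instruction code
UPPER_BITS = 11                 # upper address bits held in the address register 11
WINDOW_WORDS = 1 << LOW_BITS    # 32 words reachable without changing the address register


def ram_address(address_register, low):
    """Combine the 11-bit upper address with the 5-bit lower address."""
    assert 0 <= address_register < (1 << UPPER_BITS)
    assert 0 <= low < WINDOW_WORDS
    return (address_register << LOW_BITS) | low
```

With this model, `ram_address(0, 1)` gives 01h and `ram_address(1, 1)` gives 21h, so incrementing the address register by 1 moves the accessible 32-word window up by 32 addresses.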
[0051]
Now, it is assumed that the internal RAM 7 is realized by an SRAM (Static RAM). Each cell of the SRAM is shown in FIG. 2. The SRAM cell 21 in FIG. 2 includes inverters 22 and 23 that hold 1-bit information supplied from a data line pair, and transistors 24 to 29. A word line WA, to which a signal obtained by decoding the address of the first of the three ports of the internal RAM 7 is supplied, is connected to the transistors 24 and 25. Similarly, a word line WB, to which a signal obtained by decoding the address of the second port is supplied, is connected to the transistors 26 and 27, and a word line WC, to which a signal obtained by decoding the address of the third port is supplied, is connected to the transistors 28 and 29.
[0052]
Of the word lines WA, WB and WC, only the selected word line goes high, and the transistors connected to the selected word line are turned on. When the word line WA is selected, the data line pair DA and DAb is connected to the inverters 22 and 23 via the transistors 24 and 25, respectively. Similarly, when the word line WB is selected, the data line pair DB and DBb is connected to the inverters 22 and 23 via the transistors 26 and 27, and when the word line WC is selected, the data line pair DC and DCc is connected to the inverters 22 and 23 via the transistors 28 and 29.
[0053]
The cells 21 are arranged as shown in FIG. 3 to form an SRAM. In FIG. 3, the signal decoded from the A bus address is given to each cell 21 via the word lines WA0000 to WAFFFF, the signal decoded from the B bus address via the word lines WB0000 to WBFFFF, and the signal decoded from the C bus address via the word lines WC0000 to WCFFFF. The corresponding data line pair is connected to each cell 21 that is connected to the selected word line and turned on.
[0054]
A sense amplifier AMP is arranged at each port of the SRAM (internal RAM 7), and the data held in each cell 21 is amplified by the sense amplifier AMP and output (read) as data DA00 to DA1F of the A bus 8, data DB00 to DB1F of the B bus 10, and data DC00 to DC1F of the C bus 19, respectively.
[0055]
Similarly, when data is written, the value of the data line selected by the word line and driven from the sense amplifier AMP is written to each cell 21.
[0056]
FIG. 4 shows a configuration of an address decoding unit that generates the signal of each word line. The address decoding units 7A, 7B and 7C of the internal RAM 7 have the same configuration, shown in FIG. 4. The address decoding unit has an upper address decoder 41, a lower address decoder 42 and an address decoder 43. The upper 11 bits A5 to A15 driven from the address register 11 are input to the address decoder 41, and decoding result signals GA000 to GA7FF are generated and applied to the address decoder 43. Of the signals GA000 to GA7FF, only the signal at the corresponding address is high, and the other signals are low.
[0057]
The lower 5 bits A0 to A4 driven from the instruction decoder 6 are input to the address decoder 42, and the decoding result signals GB00 to GB1F are generated and applied to the address decoder 43. Of the signals GB00 to GB1F, only the corresponding address signal is high, and the other signals are low.
[0058]
Further, the decoded signals are input to the address decoder 43, and individual word line signals W0000 to WFFFF are generated and output. Of the word line signals W0000 to WFFFF, only the word line signal at the corresponding address is high, and the other word line signals are low.
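The two-level decoding just described can be modelled in a few lines. This is a behavioural sketch only: modelling the address decoder 43 as a simple AND of one upper group signal and one lower group signal is an assumption made here, and the function names are illustrative.

```python
def one_hot(value, width):
    """Return 2**width signals in which only the signal at index `value` is high (1)."""
    return [1 if i == value else 0 for i in range(1 << width)]


def decode_word_lines(address):
    """Behavioural model of decoders 41, 42 and 43 for a 16-bit address."""
    upper = one_hot(address >> 5, 11)    # GA000..GA7FF from bits A5..A15
    lower = one_hot(address & 0x1F, 5)   # GB00..GB1F from bits A0..A4
    # Decoder 43 (assumed AND stage): a word line is high only when both of
    # its group signals are high. Index = upper index * 32 + lower index.
    return [u & l for u in upper for l in lower]
```

For any address, exactly one of the 65536 word line signals is high, and its index equals the address. When only the lower 5 bits change, only the small 32-output decoder has to settle.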
[0059]
Here, the address decoder is separated into the upper address decoder 41 and the lower address decoder 42, the lower address decoder 42 is realized by a relatively small circuit, for example a one-stage circuit, and the number of stages of the subsequent address decoder 43 is reduced. The address decoding unit can therefore minimize the time required (delay amount) from the input of the lower address data (data of the lower 5 bits A0 to A4) to the output of the word line signals W0000 to WFFFF.
[0060]
Assume that the microprocessor 40 is operated with a 100 MHz clock, so that one cycle is 10 ns. Further assume that the timing is designed so that the delay (time required) from a change in the lower address data (data of the lower 5 bits A0 to A4), through decoding and output of the decoding result, until data is read from the SRAM cell using that result is within one cycle. In the microprocessor 40 designed in this way, as long as only the lower address data changes, operations on various data can be performed in one cycle, in the same manner as register operations in a conventional general-purpose register machine.
[0061]
Using this microprocessor 40, a source program 50 (hereinafter simply referred to as program 50) written in the C language of FIG. 5 is executed. At the time of execution, it is assumed that the program 50 is precompiled and converted (translated) into machine code executable by the microprocessor 40.
[0062]
The program 50 is a subroutine program of the function func1, has variables a and b as arguments, and has local variables c, d and e. The program 50 of the function func1 is executed when called from another function program (not shown). In addition, a subroutine program of another function func2 is called in the function func1 of the program 50. The subroutine program of the function func2 takes a variable c as an argument and has a local variable f.
[0063]
An example of the arrangement of data in the internal RAM 7 when the program 50 is executed is shown schematically in FIG. 6, and the operation of the microprocessor 40 in each cycle is shown in table form in FIG. 7. FIG. 7 shows a source line number 71, which is the line number assigned at the left end of the program 50 of FIG. 5; a mnemonic code 72, which is the machine code obtained by compiling the source code on the line of the source line number 71; an operation 73 of the microprocessor 40 when executing the mnemonic code 72; and a cycle number 74 indicating the order of the cycles in which the mnemonic code 72 is executed.
[0064]
When the function func1 is called from another function (not shown) and the program 50 is executed, it is assumed that the value of the address register 11 is 0 and that the variables a and b to be copied to the arguments of the function func1 are stored at the addresses “00h + 1” and “00h + 2” of the internal RAM 7, respectively.
[0065]
First, since the arguments must be copied to the stack, in the first and second cycles indicated by the cycle number 74, the contents of the local variable area, which is the area of the internal RAM 7 specified by the data in the current address register 11, are copied to the area specified by (the value of the address register 11 + 1), which is the local area used by the function func1 to be executed next. At this time, the value of the address register 11 becomes the upper address for the A bus 8 and (the value of the address register 11 + 1) becomes the upper address for the C bus 19, so that copying is performed between areas with different upper addresses. As a result, the variables a and b are copied to the addresses “20h + 1” and “20h + 2”, respectively.
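The argument copy between two areas with different upper addresses can be sketched as follows. The RAM is modelled as a Python dictionary, and the function name and the stored values 7 and 9 are hypothetical, chosen only to make the example concrete.

```python
def copy_to_next_window(ram, address_register, src_low, dst_low):
    """Read through the A bus window and write through the C bus window one step higher."""
    src = (address_register << 5) | src_low          # A bus: upper address = register value
    dst = ((address_register + 1) << 5) | dst_low    # C bus: upper address = register value + 1
    ram[dst] = ram[src]
    return dst


ram = {0x01: 7, 0x02: 9}                   # a and b at "00h + 1" and "00h + 2" (values hypothetical)
dst_a = copy_to_next_window(ram, 0, 1, 1)  # first cycle
dst_b = copy_to_next_window(ram, 0, 2, 2)  # second cycle
```

With the address register at 0, the two copies land at 21h and 22h, matching the addresses “20h + 1” and “20h + 2” given in the text. The address register itself is not changed by the copy.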
[0066]
In the third cycle indicated by the cycle number 74, executing the mnemonic code 72 of the subroutine call instruction (for the function func1) performs three operations in parallel: saving the return address (PC + 4), to which control returns when the processing of the function func1 is completed, to the local area to be used by the next function; incrementing the value of the address register 11 by 1; and substituting the start address of the function func1 into the PC 4 to transfer control to the function func1. Since a subroutine call can thus be executed in one cycle, subroutine calls can be processed at high speed.
[0067]
The compiler assigns the areas of the local variables c, d, and e of the function func1 to addresses 3, 4, and 5 of the local area, respectively (see FIG. 6). The first instruction in the function func1 is the arithmetic instruction “c = a + b;”, which involves an operation referring to the internal RAM 7. At this point, the value of the address register 11 has just been changed by the call to the function func1. If the delay from the upper address is large, the operation instruction may not complete in one cycle. Therefore, a mnemonic code 72 ('NOP') indicating an instruction that performs no operation is inserted by the compiler in the fourth cycle indicated by the cycle number 74, to prepare for the operation on the next local variables, that is, “c = a + b;”. When the instruction decoder 6 receives the instruction code and interprets it as “NOP”, it performs no operation in the cycle assigned to that instruction code and waits until the next instruction code is input. In the “NOP” cycle, none of the internal buses and control signals change, and the current state is maintained.
[0068]
Here, the purpose of inserting the instruction code ('NOP') will be described. In the microprocessor 40, the delay from the input of the upper address data until the decode signal is derived is longer than the delay from the input of the lower address data until the decode signal is derived. If the delay (required time) from the input of address data until the decode signal is derived is long, the access (reference) to the internal RAM 7 may not complete in one cycle, and the data for the next operation instruction may not be read. In other words, since “access time” = “address decoding delay time” + “memory cell read time (word line / bit line delay and sense amplifier delay)”, a longer address decoding delay lengthens the access time. Even if the access time takes one cycle or more (10 ns or more at 100 MHz), executing the instruction code ('NOP') for one cycle immediately before the next operation instruction allocates one extra cycle to the access time, so that the operand data can always be accessed successfully when the next operation instruction is executed. As a result, an error state in which an operation instruction cannot be executed correctly because the necessary data has not been obtained can be reliably avoided.
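The timing argument can be made concrete with a short calculation based on the relation “access time” = “address decoding delay time” + “memory cell read time”. All delay values used below (2 ns, 8 ns and 6 ns) are hypothetical figures chosen for illustration; only the 10 ns cycle at 100 MHz comes from the text.

```python
import math


def wait_cycles(decode_ns, cell_read_ns, period_ns=10.0):
    """Extra cycles needed so that decode delay plus cell read time fits before use."""
    access_ns = decode_ns + cell_read_ns
    return max(0, math.ceil(access_ns / period_ns) - 1)


# Hypothetical delays: the lower address decoder is fast, the upper one is slow.
lower_only = wait_cycles(decode_ns=2.0, cell_read_ns=6.0)     # 8 ns access time
upper_changed = wait_cycles(decode_ns=8.0, cell_read_ns=6.0)  # 14 ns access time
```

Under these assumed figures, an access that changes only the lower address needs no wait cycle, while an access after a change of the upper address needs one, which corresponds to the single 'NOP' described above.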
[0069]
Calculations between local variables are performed in the fifth and sixth cycles indicated by cycle number 74. Since all the variables (operands) for these operations are assigned to the same upper address area, they all end in one cycle.
[0070]
Since the function func2 is to be executed next, the argument c is copied to the local area used in the function func2 (address “40h + 1” in FIG. 6) in the seventh cycle indicated by the cycle number 74. In the eighth cycle, the return address of the PC 4 is saved (copied to the address “40h + 0” in FIG. 6), the address register 11 is incremented, and the start address of the function func2 is substituted into the PC 4, all at the same time, and control is transferred to the function func2.
[0071]
In the ninth cycle of cycle number 74, as in the fourth cycle, the mnemonic code 72 ('NOP') inserted by the compiler is executed to wait out the upper address delay.
[0072]
In the tenth cycle of cycle number 74, an operation instruction (f = c + 1;) with local variables and constants is executed. Information on the constant (= 1) for this calculation is included in the corresponding mnemonic code 72, and the constant information is given to the ALU 9 by driving the B bus 10 by the instruction decoder 6 and executed. Therefore, the B bus 10 can be used for transferring data read from the internal RAM 7 to the ALU 9 and for transferring constant information from the instruction decoder 6 to the ALU 9.
[0073]
When the processing of the function func2 is completed, control returns to the function func1. The value of the address register 11 is decremented by 1 in the 11th cycle indicated by the cycle number 74, the return value is copied to the local area used by the original function func1 in the 13th cycle, and the value of the PC 4 is restored in the 14th cycle to return control to the function func1. Since the value of the address register 11 is changed when returning from the subroutine (from the function func2 to the original function func1), it is necessary to wait out the upper address delay; in the 12th cycle, therefore, the mnemonic code 72 ('NOP') inserted by the compiler is executed.
[0074]
In the fifteenth cycle indicated by cycle number 74, an operation between local variables is performed, and in the sixteenth to nineteenth cycles, processing for returning to the other function (not shown) that called the function func1 is performed. At this time as well, since the value of the address register 11 is changed, the mnemonic code 72 ('NOP') inserted by the compiler is executed.
[0075]
In the operation 73 of the subroutine call instruction ('CALL func1', 'CALL func2') in FIG. 7, three operations, “saving the contents of PC4”, “incrementing the value of the address register 11” and “updating the value of PC4”, are performed simultaneously in parallel in this one cycle (one cycle of the clock). Conventionally, since there was nothing corresponding to the “address register”, “saving the contents of the PC” and “updating the value of the PC” were either processed sequentially or processed simultaneously in parallel. The present embodiment is characterized in that the simultaneous parallel processing also includes “incrementing the address register 11”.
[0076]
This simultaneous parallel processing will be described with reference to FIG. 1. “Saving the contents of PC4” is performed using the path “PC4 → incrementer 14 → C bus 19 → internal RAM 7”. “Incrementing the address register 11” is performed using the path “address register 11 → incrementer 12 → address register 11”. “Updating the value of PC4” is performed using the path “instruction decoder 6 → PC4” (not shown). Since these three paths are independent (not connected to a common bus), simultaneous parallel processing is possible.
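The three parallel operations of the call instruction can be modelled as updates that all read the state as it was before the cycle. This is a behavioural sketch under assumptions: the state dictionary, the function name, and the example addresses 100h and 400h are hypothetical; the return address (PC + 4) and its storage at offset 0 of the next window follow the description.

```python
def call(state, target):
    """Model a one-cycle CALL: all three updates use the pre-cycle values."""
    old_pc, old_ar = state["pc"], state["ar"]
    # Path PC 4 -> incrementer 14 -> C bus 19 -> internal RAM 7:
    # save the return address at offset 0 of the callee's window.
    state["ram"][((old_ar + 1) << 5) | 0] = old_pc + 4
    # Path address register 11 -> incrementer 12 -> address register 11.
    state["ar"] = old_ar + 1
    # Path instruction decoder 6 -> PC 4.
    state["pc"] = target
    return state


state = {"pc": 0x100, "ar": 0, "ram": {}}   # hypothetical initial values
call(state, 0x400)                          # hypothetical start address of func1
```

After the call in this model, the return address 104h sits at address 20h, the address register holds 1, and the PC holds the target address. Because none of the three updates depends on the result of another, they can be carried out in the same cycle.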
[0077]
The microprocessor 40 also processes interrupts. The instruction decoder 6 receives an interrupt input (not shown), and when the interrupt input signal becomes active, the instruction decoder 6 branches to an interrupt processing program prepared in advance. This is shown in FIG. 8, which shows the operation of the microprocessor 40 for the interrupt processing program in a table similar to FIG. 7. In FIG. 8, the source line number 71 is omitted.
[0078]
When an interrupt signal indicating that an exception factor has occurred during processing of the program 50 or the like is input, the three cycles 101 to 103 indicated by the cycle number 74 are executed by hardware. In the 101st cycle, the value of the address register 11 is copied (saved) to the save register 13, and at the same time, a predetermined value (fixed value) is substituted into the address register 11. Since the value of the address register 11 becomes a fixed value, the same stack area of the internal RAM 7 is always used when an interrupt occurs. Different areas can be used as a stack by appropriately changing the address register 11 in the interrupt processing program.
[0079]
In the 103rd cycle indicated by cycle number 74, the return address (PC4 value + 4) is stored at address 0 of the stack area indicated by the new value of address register 11, and at the same time, a fixed value is assigned to PC4. Since the value of PC4 becomes a fixed value, when an interrupt occurs, it always jumps to the same address (the first address of the interrupt processing program). Various processes can be performed by appropriately branching in the interrupt processing program.
[0080]
Also in this case, since the value of the address register 11 is changed in the 101st cycle, the next (102nd) cycle waits out the upper address delay. In the 102nd cycle, unlike the execution of the mnemonic code 72 ('NOP') inserted by the above-mentioned compiler, a wait cycle is executed under hardware control. In this wait cycle, none of the internal buses and control signals change, the current state is maintained, and the supply of instruction codes from the pipe 5 to the instruction decoder 6 is also stopped.
[0081]
In the 104th and subsequent cycles indicated by the cycle number 74, the interrupt processing is performed, and a mnemonic code 72 ("RETI") for returning from the interrupt processing to the original processing is always placed at the end of the processing. In the 201st cycle, in which this instruction “RETI” is executed, the contents of address 0 of the interrupt processing stack area are copied to PC4 to restore the value of PC4, and at the same time, the contents of the save register 13 are copied to the address register 11 to restore the address of the original processing before branching. As a result, the original process interrupted by the interrupt input can be resumed and executed from the point at which it was interrupted.
[0082]
The original process is restarted from the 203rd cycle indicated by the cycle number 74, but it is difficult to determine whether the instruction code (mnemonic code 72) executed first after the restart is an instruction referring to the internal RAM 7. For this reason, and because the contents of the address register 11 are changed in the 201st cycle, the 202nd cycle is always executed before the 203rd cycle to wait out the upper address delay. As in the 102nd cycle, this is a wait cycle under hardware control.
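The interrupt entry (cycles 101 to 103) and the return by “RETI” (cycles 201 and 202) can be summarized in a behavioural sketch. The fixed values `FIXED_AR` and `FIXED_PC`, the state dictionary and the function names are assumptions for illustration; the text states only that fixed values are used.

```python
FIXED_AR = 0x7FF    # hypothetical fixed address register value for the interrupt stack
FIXED_PC = 0x0008   # hypothetical start address of the interrupt processing program


def enter_interrupt(state):
    """Cycles 101 to 103: swap the address register, wait, then save and replace the PC."""
    state["save"] = state["ar"]     # cycle 101: address register 11 -> save register 13
    state["ar"] = FIXED_AR          #            and load the fixed value
    trace = ["101: save AR, load fixed AR", "102: wait"]   # cycle 102: hardware wait
    state["ram"][(state["ar"] << 5) | 0] = state["pc"] + 4  # cycle 103: return address to offset 0
    state["pc"] = FIXED_PC
    trace.append("103: save PC+4, load fixed PC")
    return trace


def reti(state):
    """Cycles 201 and 202: restore the PC and the address register, then wait."""
    state["pc"] = state["ram"][(state["ar"] << 5) | 0]  # cycle 201: read from the interrupt window
    state["ar"] = state["save"]
    return ["201: restore PC and AR", "202: wait"]


state = {"pc": 0x200, "ar": 3, "save": None, "ram": {}}  # hypothetical pre-interrupt state
entry_trace = enter_interrupt(state)
```

In this model, each change of the address register is followed by exactly one wait cycle before any instruction can reference the internal RAM 7, both on entry and on return.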
[0083]
When an instruction that refers to the internal RAM 7 is executed after the contents of the address register 11 are updated as described above, a wait cycle such as “NOP” is inserted, so that a cycle for referring to the internal RAM 7 is secured even though the contents of the address register 11 have been updated. For example, even for the operation instruction in the third line of FIG. 5, the decode signals (GA000 to GA7FF) from the upper address are ready by the time the instruction is executed by the ALU 9.
[0084]
Accordingly, the following operation is performed when such an arithmetic instruction is executed. Information is read from the internal RAM 7 based on the A bus address signal for the first of the three independent ports corresponding to the three address decoding units 7A to 7C of the internal RAM 7, information is read from the internal RAM 7 based on the B bus address signal for the second port, the read information is given to the ALU 9, and the calculation result is stored in the internal RAM 7 based on the C bus address signal for the third port, all in one cycle.
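The one-cycle operation over the three ports can be sketched as a single function. The function name, the dictionary model of the internal RAM 7 and the operand values 3 and 4 are hypothetical; the offsets 1, 2 and 3 for a, b and c follow the local area layout described for the function func1.

```python
import operator


def alu_cycle(ram, address_register, a_low, b_low, c_low, op=operator.add):
    """Read two operands through ports A and B and write the result through port C."""
    base = address_register << 5
    a = ram[base | a_low]           # first port, A bus address
    b = ram[base | b_low]           # second port, B bus address
    ram[base | c_low] = op(a, b)    # third port, C bus address
    return ram[base | c_low]


ram = {0x21: 3, 0x22: 4}               # a and b in the window at 20h (values hypothetical)
result = alu_cycle(ram, 1, 1, 2, 3)    # models "c = a + b;" with c at offset 3
```

All three addresses share the same upper address from the address register, so only the three 5-bit lower addresses carried in the instruction code differ between the ports.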
[0085]
Further, as described above, the internal RAM 7 can be accessed at high speed based on the lower address from the instruction decoder 6 given to the A, B, and C bus addresses.
[0086]
(Embodiment 2)
In this embodiment, when a source program is input and an instruction code for accessing (referring to) the internal RAM 7 in the source program is detected, an instruction code ('NOP') for providing the above-described wait cycle is inserted. Thus, a compiler for compiling the source program is provided.
[0087]
FIG. 9 shows a microcomputer 80 that executes a compiling procedure according to the second embodiment. The microcomputer 80 includes a CPU (Central Processing Unit) 81, a ROM (Read Only Memory) 82 that stores data such as a compiler program (hereinafter simply referred to as a compiler), a RAM 83, an input / output I / F (Interface) 84, an input unit 85 such as a keyboard for inputting information such as instructions from the outside, an output unit 86 for outputting information to the outside, a communication I / F 87 for connecting to various communication lines such as the Internet, and a recording medium driving unit 88 into which a recording medium 89 is detachably inserted and which accesses information on the inserted recording medium 89.
[0088]
The compiler may be supplied by being recorded in advance on the recording medium 89 and read by the recording medium driving unit 88, or may be supplied by being loaded from the network via the communication I / F 87.
[0089]
The input / output I / F 84 inputs / outputs various devices including the microprocessor 40. The compiler stored in the ROM 82 is executed under the control of the CPU 81, so that the high-level language source program prepared in the RAM 83 or the like is sequentially read and translated into machine language instructions for the microprocessor 40. The translated contents are stored in a predetermined area of the RAM 83. The machine language instruction sequence stored in a predetermined area of the RAM 83 is read out and applied to the microprocessor 40 as the data input 2 via the input / output I / F 84. The machine language instruction sequence read from the RAM 83 may be read via the communication I / F 87 and the network and may be given to the microprocessor 40 as the data input 2, or may be supplied to the recording medium 89 via the recording medium driving unit 88. And the contents of the recording medium 89 may be given to the microprocessor 40 as the data input 2.
[0090]
The description here assumes that the microprocessor 40 operates while the microcomputer 80 is operating. Usually, however, the compiler stores the compilation result (instruction code string) on, for example, the recording medium 89, and the microprocessor 40 is operated after the result has been written to a ROM connected to the address output 1 and the data input 2.
[0091]
FIG. 10 is a flowchart showing the compile procedure according to the second embodiment; the procedure shown in FIG. 10 is carried out by executing the compiler in the ROM 82. The procedure for compiling the C language program 50 of FIG. 5 into the mnemonic code 72 of FIG. 7 is described below with reference to this flowchart. The compiler also assigns the variables (including arguments) declared in the program 50 to the stack area as shown in FIG. 6, but the description of that assignment is omitted here. The mnemonic codes 72 generated in the course of the procedure are stored one after another in an internal memory (not shown) of the CPU 81. When no source code remains to be compiled (compilation of all source code is complete), the sequence of mnemonic codes 72 in the internal memory is stored in a predetermined area of the RAM 83.
[0092]
First, the CPU 81 reads the file of the program 50 from the RAM 83 into an internal memory (not shown) (step S1). Next, it determines whether the read source file contains the code of a function to be compiled (step S2). If there is no code to be compiled, the sequence of mnemonic codes 72 (machine language instruction sequence) stored in the internal memory as the compilation result is output to a predetermined area of the RAM 83 (step S17). If there is code to be compiled, it is determined whether the first instruction of the function is an instruction code that refers to the internal RAM 7 (step S3). If it is, the CPU 81 generates a mnemonic code 72 of 'NOP' (step S4) and then generates a mnemonic code 72 that refers to the internal RAM 7 based on the read contents (step S5). Thereafter, the process proceeds to step S6.
[0093]
Since the instruction that calls the subroutine of the function func1 is described in a program contained in another source code file (not shown), the mnemonic code 'CALL func1' is not generated when this source code file is compiled (it is generated when the other source code file, not shown, is compiled).
[0094]
The content read from the first line of the program 50 defines the function func1 (YES in step S2). Since the first instruction of the program 50 is the instruction code on the third line and refers to the internal RAM 7 (YES in step S3), the CPU 81 generates a mnemonic code 72 of 'NOP' (step S4) and then generates a mnemonic code 72 that refers to the internal RAM 7 based on the read contents (step S5). Thereafter, the process proceeds to step S6.
[0095]
Since the instruction code on the fourth line is a general instruction (NO in step S6, NO in step S11), a mnemonic code 72 is generated based on the read contents (step S16). Thereafter, the process returns to step S6.
[0096]
The CPU 81 reads the fifth line of the program 50. Since the read content is a subroutine call instruction code (YES in step S6), the corresponding mnemonic code 72 ('CALL func2') is generated (step S7). Since the source code on the sixth line, which is the first instruction executed after returning from the subroutine of the function func2, is an instruction that refers to the internal RAM 7 (YES in step S8), a mnemonic code 72 of 'NOP' is generated, followed by a mnemonic code 72 corresponding to the instruction that refers to the internal RAM 7 (steps S9 and S10). The process then returns to step S6.
[0097]
Since the next line, the seventh, is determined to be a subroutine end instruction (NO in step S6, YES in step S11), the following are generated in order to return to the subroutine that called the function func1: a mnemonic code 72 ('DEC ADR') (step S12), a mnemonic code 72 of 'NOP' (step S13), a mnemonic code 72 of the return value copy instruction (step S14), and a mnemonic code 72 ('RET') (step S15). The process then returns to step S2.
[0098]
This completes the compilation of the subroutine of the function func1, and the function func2 continues in the source file (YES in step S2). The first instruction of the subroutine of the function func2 is indicated by the instruction code on the 11th line. Since this is an instruction for referring to the internal RAM 7 (YES in step S3), the CPU 81 generates a mnemonic code 72 of “NOP” (step S4), and refers to the internal RAM 7 based on the read contents. The mnemonic code 72 is generated (step S5). Thereafter, the process proceeds to step S6.
[0099]
Since no code to be compiled remains at this point (NO in step S2), the machine language instruction sequence obtained as the compilation result is output to (stored in) a predetermined area of the RAM 83 (step S17). This completes the compilation of the program 50.
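The handling of one function in steps S3 to S16 can be sketched as follows. This is only an illustration in Python, not the compiler of the embodiment. The (kind, text) representation of source statements, the name compile_function, and the placeholder mnemonics 'OP ...' and 'COPY ...' are assumptions made for the sketch; only 'NOP', 'CALL', 'DEC ADR' and 'RET' are taken from the description.

```python
def compile_function(body):
    """Translate one function body into a list of mnemonic strings.

    body is a list of (kind, text) pairs (a hypothetical representation):
      "ref"     - statement that refers to the internal RAM
      "call"    - subroutine call, text is the callee name
      "end"     - subroutine end, text names the return value
      "general" - any other statement
    """
    out = []
    i = 0
    # S3-S5: a RAM reference at the head of the function gets a wait cycle
    if body and body[0][0] == "ref":
        out.append("NOP")
        out.append("OP " + body[0][1])   # placeholder mnemonic
        i = 1
    while i < len(body):
        kind, text = body[i]
        i += 1
        if kind == "call":                               # S6: YES
            out.append("CALL " + text)                   # S7
            if i < len(body) and body[i][0] == "ref":    # S8: YES
                out.append("NOP")                        # S9
                out.append("OP " + body[i][1])           # S10
                i += 1
        elif kind == "end":                              # S11: YES
            # S12-S15; "COPY" stands in for the return value copy instruction
            out += ["DEC ADR", "NOP", "COPY " + text, "RET"]
            break
        else:
            # S16: treated as a general instruction, no wait cycle needed
            out.append("OP " + text)
    return out


# Shape of func1 as walked through above (lines 3 to 7 of the program 50)
func1 = [
    ("ref", "line 3"),
    ("general", "line 4"),
    ("call", "func2"),
    ("ref", "line 6"),
    ("end", "result"),
]
code = compile_function(func1)
```

In this sketch a RAM reference that is neither the first statement nor the statement after a call falls through to step S16, since the address register 11 has not changed immediately before it.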
[0100]
(Modification of the embodiment)
In the first and second embodiments, the delay from the upper address is large, so a reference to the internal RAM 7 executed immediately after the value of the address register 11 is changed does not complete in one cycle; for this reason the compiler inserts a machine language instruction ('NOP') instructing that no operation be performed. If, however, the delay from the upper address is relatively small and such a reference to the internal RAM 7 completes in one cycle, the machine language instruction ('NOP') need not be inserted. Therefore, whether to allow automatic insertion of the machine language instruction ('NOP') may be made selectable as a compiler option.
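Such a compiler option might look like the following sketch. It is an assumption-laden illustration in Python: the function name emit, the auto_nop flag, the (mnemonic, changes_upper, refs_ram) tuple layout and the 'COPY ret' mnemonic are all hypothetical and do not appear in the description.

```python
def emit(instructions, auto_nop=True):
    """Return the mnemonic sequence, optionally inserting wait cycles.

    instructions is a list of (mnemonic, changes_upper, refs_ram) tuples
    (hypothetical layout):
      changes_upper - the instruction changes the upper address
                      (the value of the address register)
      refs_ram      - the instruction refers to the internal RAM
    """
    out = []
    prev_changes_upper = False
    for mnemonic, changes_upper, refs_ram in instructions:
        # A RAM reference right after an upper-address change needs a
        # wait cycle unless the option is switched off.
        if auto_nop and prev_changes_upper and refs_ram:
            out.append("NOP")
        out.append(mnemonic)
        prev_changes_upper = changes_upper
    return out


seq = [("DEC ADR", True, False), ("COPY ret", False, True), ("RET", False, False)]
with_wait = emit(seq)                     # slow upper-address path
without_wait = emit(seq, auto_nop=False)  # fast upper-address path
```

With the option enabled the sketch reproduces the 'DEC ADR', 'NOP', copy, 'RET' order of steps S12 to S15; with it disabled the same sequence is emitted without the wait cycle.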
[0101]
In addition, a wait cycle under hardware control is inserted when an interrupt occurs and when returning from interrupt processing. This wait cycle can likewise be made optional at hardware design time if the delay from the upper address is relatively small.
[0102]
In the first embodiment, the lower address is 5 bits and each local area of the internal RAM 7 is 32 words, but the present invention is not limited to this. For example, in a microprocessor 40 that executes programs requiring many local variables, the system can be optimized by increasing the number of lower address bits to enlarge each local area. Conversely, when few local variables are needed, the system can be optimized by reducing the number of lower address bits, which reduces the unused area of the internal RAM 7 and thus its capacity and cost.
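The relation between the lower address width and the size of a local area can be checked with a short calculation. The sketch below is in Python and assumes that a word of the internal RAM 7 is selected by concatenating the upper address with the lower address; the function names and this concatenated layout are assumptions for illustration.

```python
def local_area_words(lower_bits):
    """Number of words in one local area for a given lower address width."""
    return 1 << lower_bits


def word_index(upper, lower, lower_bits=5):
    """Word index assuming the address is upper and lower concatenated."""
    if not 0 <= lower < (1 << lower_bits):
        raise ValueError("lower address does not fit in lower_bits")
    return (upper << lower_bits) | lower


words_5bit = local_area_words(5)   # 5-bit lower address of the first embodiment
words_6bit = local_area_words(6)   # one more bit doubles each local area
index = word_index(upper=2, lower=3)
```

With a 5-bit lower address each local area holds 32 words, matching the first embodiment; each additional lower address bit doubles the area available to one function.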
[0103]
(Effect of embodiment)
By using the microprocessor 40, an operation on a local variable, which normally uses a stack, can be performed in one cycle without a separate cycle for calculating an address. Further, since the stack is switched by changing the value of the address register 11, the value of the address register 11 used locally need not be saved to the stack, and the overhead of calling a function (subroutine) is reduced. Further, the structure in which the output of the address register 11 and the output of the incrementer 12 are switched by the MUXs 15 and 16 makes it possible to pass arguments between the areas used before and after a function (subroutine) call. Further, when a function (subroutine) is called, saving the return address, incrementing the address register 11, and assigning a value to the PC 4 can be executed in parallel, so that the function (subroutine) can be called at high speed.
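The argument passing described above can be illustrated with a toy software model. The Python sketch below is not the hardware of the embodiment: the class name WindowedRAM, the use_next flag (standing in for selecting the incrementer output through a MUX), and the choice of lower addresses 0 and 1 for the argument and the return value are all assumptions.

```python
class WindowedRAM:
    """Toy model of an internal RAM divided into local areas."""

    def __init__(self, areas=4, lower_bits=5):
        self.lower_bits = lower_bits
        self.words = [0] * (areas << lower_bits)
        self.adr = 0  # models the address register (upper address)

    def _index(self, lower, use_next):
        # use_next picks the incremented upper address (ADR + 1),
        # otherwise the current address register value is used.
        upper = self.adr + 1 if use_next else self.adr
        return (upper << self.lower_bits) | lower

    def read(self, lower, use_next=False):
        return self.words[self._index(lower, use_next)]

    def write(self, lower, value, use_next=False):
        self.words[self._index(lower, use_next)] = value

    def call(self):
        self.adr += 1   # switch to the callee's local area

    def ret(self):
        self.adr -= 1   # switch back to the caller's local area


ram = WindowedRAM()
ram.write(0, 42, use_next=True)        # caller writes the argument into the next area
ram.call()
arg = ram.read(0)                      # callee reads it as an ordinary local
ram.write(1, arg + 1)                  # callee leaves a return value in its area
ram.ret()
result = ram.read(1, use_next=True)    # caller copies the return value back
```

No data is moved at the call itself; only the upper address changes, which is why the local values of the caller do not need to be saved.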
[0104]
Therefore, the microprocessor 40 can execute a program of a language having a language specification that performs all operations on the stack, for example, a program written in Java (R) at high speed.
[0105]
The embodiment disclosed this time should be considered as illustrative in all points and not restrictive. The scope of the present invention is defined by the terms of the claims, rather than the description above, and is intended to include any modifications within the scope and meaning equivalent to the terms of the claims.
[0106]
[Effect of the invention]
According to the microprocessor of the present invention, it is possible to speed up the access to the partial area referred to based on the lower address information of the internal memory. For example, a large amount of data can be accessed at a high speed as compared with the conventional method of calculating using a register file.
[0107]
According to the compiler of the present invention, an instruction code for a waiting cycle is placed between the instruction code that changes the upper address information and the instruction code that refers to the internal memory. When the instruction code that refers to the internal memory is executed, the time required to update the upper address information and to generate the signal for accessing the internal memory from the updated upper address information is therefore absorbed by the waiting cycle, so that the reference reaches the appropriate area using the updated address information.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a microprocessor.
FIG. 2 is a diagram illustrating a configuration example of each cell of an SRAM.
FIG. 3 is a diagram illustrating an example of an array of SRAM cells.
FIG. 4 is a configuration diagram of an address decoding unit that generates a signal of each word line.
FIG. 5 is a diagram showing an example of a source program written in C language.
FIG. 6 is a diagram illustrating an example of data arrangement on an internal RAM when a program is executed.
FIG. 7 is a diagram showing the operation of the microprocessor for each cycle in a table format.
FIG. 8 is a diagram showing, in a tabular form, the operation of the microprocessor for each cycle when branching to an interrupt processing program.
FIG. 9 is a block diagram of a microcomputer that executes a compiling procedure.
FIG. 10 is a flowchart showing a compilation procedure.
[Explanation of symbols]
6 instruction decoder, 7 internal RAM, 7A, 7B, 7C address decoding unit, 9 ALU, 11 address register, 40 microprocessor, 41 upper address decoder, 42 lower address decoder.

Claims

A microprocessor comprising:
an internal memory for storing information, accessed based on an address signal;
a calculation unit that performs various calculations, including calculations using information in the internal memory, based on an operation instruction;
an instruction decoding unit that decodes a given instruction code and, based on the decoding result, outputs control information, including the operation instruction, for controlling each unit; and
an address decoding unit that receives and decodes given address information and outputs the address signal,
wherein the address information includes upper address information and lower address information, and the delay from input of the lower address information until the address decoding unit derives the address signal is shorter than the delay from input of the upper address information until the address signal is derived.

2. The microprocessor according to claim 1, wherein the address decoding unit comprises: an upper decoder that decodes the upper address information in the address information and outputs an upper decode signal; a lower decoder that decodes the lower address information in the address information and outputs a lower decode signal; and a generation decoder that receives the upper decode signal and the lower decode signal and generates and outputs the address signal.

The microprocessor according to claim 1, further comprising an address generation unit that generates the upper address information, the address generation unit including:
an address register for storing the upper address information; and
an address update unit that updates the upper address information stored in the address register based on the control information of the instruction decoding unit,
wherein the lower address information is included in the control information output from the instruction decoding unit.

The microprocessor according to any one of claims 1 to 3, further comprising a program counter that holds information for sequentially designating the instruction code to be given to the instruction decoding unit,
wherein, when the instruction code given to the instruction decoding unit is an instruction code instructing a branch to another processing routine,
the information held in the program counter is saved to the internal memory, the value of the program counter is changed to a specified value, and the contents of the address register are updated, in accordance with the control information.

5. The microprocessor according to claim 4, wherein the saving of the information held in the program counter to the internal memory, the change of the value of the program counter to a specified value, and the update of the contents of the address register are performed in parallel.

The microprocessor according to claim 4, wherein, when an exception factor signal is input to the instruction decoding unit during processing,
the contents of the address register and the contents of the program counter are saved by the control information and the contents of the address register are changed to a predetermined fixed value, in order to branch from the processing to an exception factor processing routine prepared in advance.

When the instruction decoding unit is given an instruction code for returning from the branch destination processing routine to the original processing, the control information restores the contents of the program counter to the previously saved contents. The microprocessor according to any one of claims 4 to 6.

When the instruction decoding unit is given an instruction code for returning from the branch destination processing routine to the original processing, the contents of the address register and the contents of the program counter are restored by the control information. The microprocessor according to any one of claims 4 to 6.

The microprocessor according to any one of claims 1 to 8, wherein the internal memory has three independent ports, and
the following operation is performed in one cycle: information is read from the internal memory based on the address signal for a first port of the three ports, information is read from the internal memory based on the address signal for a second port of the three ports, the read information is given to the calculation unit, and the calculation result is stored in the internal memory based on the address signal for a third port of the three ports.

10. The microprocessor according to claim 9, wherein the instruction code includes the lower address information of each of the address information for the three ports.

The instruction code includes constant information;
11. The microprocessor according to claim 9, wherein the instruction decoding unit sends the constant information to a bus for supplying information read from the internal memory to the arithmetic unit.

The microprocessor according to any one of claims 4 to 11, wherein, when branching to an exception handling routine in response to the occurrence of an exception factor during processing, the contents of the address register are saved and updated to a fixed value, a cycle for waiting time is executed, and the value of the program counter is saved to the internal memory.

13. The microprocessor, wherein, when returning from the exception handling routine to the original process before branching, the address register is restored to the saved contents and a cycle for waiting time is executed.

A compiler that sequentially inputs source code constituting a source program, converts the source code into the instruction code for the microprocessor according to any one of claims 1 to 13, and outputs an instruction code string,
wherein, when source code that refers to the internal memory is converted and an instruction code that changes the upper address information was converted immediately before it, the compiler places an instruction code for a waiting cycle immediately before the instruction code of that source code in the instruction code string.

15. The compiler according to claim 14, wherein, when source code instructing a subroutine call is detected in the source program, the compiler converts it into an instruction code that instructs saving the information held in the program counter to the internal memory, changing the value of the program counter to a specified value, and updating the contents of the address register.

The compiler according to claim 14, wherein, when the source code following the source code that instructs the subroutine call in the source program instructs a reference to the internal memory, the compiler places an instruction code for a waiting cycle in the instruction code group that instructs the return from the subroutine.

The compiler according to any one of claims 14 to 16, wherein, when it is detected in the source program that the first source code of a subroutine instructs a reference to the internal memory, the compiler places an instruction code for the waiting cycle before the instruction code of that source code in the instruction code string.