JP2003532221A - Processor architecture with ALU, Java stack and multiple stack pointers

Info

Publication number: JP2003532221A
Application number: JP2001580661A
Authority: JP
Inventors: ロニー、シー．ゴフ; デイビッド、アール．エボイ; サティエンドラ、エス．セシー
Original assignee: Philips Electronics NV
Current assignee: Koninklijke Philips NV
Priority date: 2000-05-04
Filing date: 2001-03-01
Publication date: 2003-10-28
Also published as: EP1281120A1; WO2001084305A1

Abstract

(57) [Abstract] A stack-based processor mechanism and an associated method for processing data with the processor are provided. First data to be processed is written to an operand stack. The location of the data within the operand stack is identified using stack pointers contained in the stack/ALU controller of the stack processor; the stack pointers identify the bank of the operand stack in which the data is to be stored. Once the data has been located, a parallel transfer of the data selected by the stack pointers and a function code is performed. The selected data is transferred from the operand stack to an arithmetic logic unit (ALU), which processes it according to the instruction specified by the function code. The result is then efficiently returned to the operand stack, from which it can be read out when desired.

Description

Detailed Description of the Invention

[0001] (Technical Field) The present invention relates to an execution unit within a Java (registered trademark) stack-based processor. More specifically, it relates to data processing using a stack processing hardware configuration that moves data more efficiently between the processing portion and the storage portion.

[0002] (Background Art) As computer processors flood the market, demand is growing for processors with higher processing performance. Today's processors handle more complex tasks that call for faster processing speeds, and these tasks in turn call for processors that use their internal storage more efficiently. Known processors perform arithmetic and logic operations on data taken from an operand stack. One such stack-based processor model is the Java Virtual Machine (JVM), a general computing model that executes the Java language. The JVM works well for less demanding applications, but implementing parts of the JVM in hardware requires better performance. Stack processors have been implemented in hardware, but those implementations have not kept pace efficiently with rising performance demands. For example, a typical stack-based processor is built around a stack that is 32 or 64 bits wide.

[0003] If a 32-bit-wide stack-based processor is used and a 64-bit-wide entry is to be written, two accesses are needed to read the data from the operand stack into the ALU, and two more are needed to write the result back to the stack. That is, two 32-bit reads are required to read the data from the stack and two further 32-bit writes are required to write the data back. If a 64-bit-wide stack-based processor has two 64-bit-wide data items, a total of four accesses is required to read the data from the stack and store the result in the operand stack: two 64-bit accesses to read the data and two 64-bit accesses to store the result. The multiple accesses that current stack-based processors need in order to process data therefore increase overall processing time and reduce the overall efficiency of implementations that use them.
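The access counts quoted in the preceding paragraph follow from simple ceiling division. The Python sketch below is only an illustration of that arithmetic; the function name `prior_art_accesses` and its parameters are hypothetical and do not come from the patent, and it assumes that each access moves at most one stack-width entry and that as many bits are written back as were read, which is how the paragraph counts them.

```python
def prior_art_accesses(total_data_bits, stack_width_bits):
    """Count the reads and writes a conventional stack needs for one operation.

    Assumptions (not from the source): one access moves at most one
    stack-width entry, and the write-back is as wide as the data read.
    """
    # Ceiling division: number of stack-width entries covering the data.
    per_direction = -(-total_data_bits // stack_width_bits)
    return {
        "reads": per_direction,
        "writes": per_direction,
        "total": 2 * per_direction,
    }


# One 64-bit entry on a 32-bit stack: two reads plus two writes.
print(prior_art_accesses(64, 32))
# Two 64-bit items (128 bits) on a 64-bit stack: four accesses in total.
print(prior_art_accesses(128, 64))
```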

[0004] In addition, current 64-bit-wide stacks do not allow data narrower than 64 bits to be allocated efficiently. For example, a 32-bit-wide data item can occupy a stack row capable of holding 64 bits, in which case the remaining 32 bits of that row are wasted. Implementations of this kind are highly undesirable because they waste expensive storage space.

[0005] Besides the stack-based processors described above, there are currently three basic techniques for improving performance: full hardware interpretation of Java instructions, native Java, and compilation. The hardware interpretation approach uses hardware to improve the interpretation process. The host hardware translates Java instructions into machine language that is compatible with the industry-standard processor in the hardware modeling the Java machine. This approach has two functional problems. First, translation takes time, whether it is performed by software interpretation (the typical case) or by hardware translation. Once the translated result has been produced, it must still be executed, so hardware translation is a two-step process (translation followed by execution). The second problem with hardware translation of Java is architectural. If the target processor is not a stack-based machine (and most machines are not), the translation process becomes complicated: each Java instruction expands into multiple target-processor instructions so that an artificial (that is, virtual) stack can be emulated.

[0006] A pure Java processor implements a true stack-based processor in hardware. However, a true stack-based processor is very complex and cannot use most of the software available for standard processors (for example, ARM, MIPS, and x86).

[0007] Compilation is another technique for improving performance. It eliminates the interpreter by compiling Java directly into the machine language of a specific target. However, the resulting executable is not small. The resources needed to perform Java compilation are expensive and sometimes impractical (for example, handheld personal digital assistants and mobile phones do not have enough memory to support a compiler). Moreover, the time needed to perform the compilation causes an excessive initial delay (that is, real-time response is not available at start-up).

[0008] In view of the foregoing, there is a need for a stack-based processing mechanism that overcomes the problems of the prior art. The disclosed mechanism and its associated methods should also allow efficient use of storage space within the operand stack.

[0009] (Disclosure of the Invention) Broadly speaking, the present invention fills this need by providing methods and hardware for processing data efficiently with a stack-based processor. It should be appreciated that the invention can be implemented in numerous ways, including as a process, an apparatus, a system, a device, or a method. Several inventive embodiments are described below.

[0010] In one embodiment, a method for processing data with a stack processor is disclosed. The method writes data to an operand stack and identifies the location of the data within the operand stack using one or more stack pointers. Once the stack pointers have located the data in the operand stack, the data is transferred in parallel to an arithmetic logic unit (ALU). The data to be transferred is selected using the stack pointers and a function code, and is processed in the ALU according to the instruction defined by the function code.

[0011] In another embodiment, a stack processing subsystem for processing data is disclosed. The subsystem contains an operand stack having banks capable of storing data, and stack pointers that identify the location of data within the banks of the operand stack. An ALU interfaces with the operand stack so that a parallel word transfer of data can be performed from the operand stack to the ALU. The data to be moved by the parallel word transfer is identified using the stack pointers and a function code, and the function code specifies the particular instruction to be processed by the ALU.

[0012] In yet another embodiment, a method for performing operations on data using stack pointers is disclosed. The data is placed in an operand stack contained within a stack processor, and its location in the operand stack is tracked by stack pointers that are also located within the stack processor. After the data has been placed in the operand stack, it is transferred from the operand stack to an ALU; the transfer can proceed in 128-bit increments. Once the data has been transferred to the ALU, it is processed according to the instruction given by a function code.

[0013] The many advantages of the present invention should be recognized. In one embodiment, the operand stack and the ALU are integrated into a single unit. Dividing the stack into banks also allows the 128-bit width to be used more efficiently. In the past, when data narrower than the stack was stored (for example, 32-bit data in a 64-bit stack), the remaining space was wasted (the remaining 32 bits of the 64-bit stack entry). In the present invention, the stack pointers allow several data items to share one row (for example, two independent 64-bit entries in a single 128-bit row) instead of placing each item in a row of its own. The invention also allows 128-bit-wide data to be fully processed by a single arithmetic operation with only two accesses (one read access and one write access). The processing time of a Java stack-based processor is therefore greatly reduced, because the disclosed and claimed structures and methods allow data to be processed efficiently with fewer accesses.

[0014] Other features and advantages of the present invention will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, which illustrate the principles of the invention.

[0015] (Best Mode for Carrying Out the Invention) The present invention will be readily understood from the following detailed description in conjunction with the accompanying drawings, in which like reference numerals designate like elements.

[0016] An integrated stack/ALU subsystem, and an associated method for processing data with the stack subsystem, are disclosed. In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. It will be apparent to those skilled in the art, however, that the invention may be practiced without some or all of these details. In other instances, well-known processing operations have not been described in detail so as not to obscure the invention unnecessarily.

[0017] FIG. 1 shows an overview 100 of a Java stack subsystem 102 according to one embodiment of the present invention. The Java stack subsystem 102 is the execution unit for a Java processor. It has a 32-bit-wide bus 120 for communicating with other subsystems and with a system bus 150. In one embodiment, the bus 120 supplies the data to be loaded to an input multiplexer (input mux) 108, which directs the 32-bit data from the bus 120 to an operand stack 110.

[0018] To load data into the operand stack 110, a stack/ALU controller 104 decodes a function code 124. The function code 124 tells the stack/ALU controller 104 what operation is to be performed on the data stored in the operand stack 110. For example, the function code 124 may request that an ADD operation be performed on a given data set, in which case it causes an arithmetic logic unit (ALU) 106 to perform the ADD on the data to be read from the operand stack 110. To that end, the function code 124 causes stack pointers 105c-f (stack pointer-1, stack pointer-2, stack pointer-3, and stack pointer-4) to identify the location of the desired data within the operand stack 110. The desired data is delivered to the ALU 106 through an output multiplexer 112 over a 128-bit-wide bus. Data from each portion of the operand stack 110 (for example, each 32-bit portion) can thus be read by the ALU 106 during a single access.

[0019] In one embodiment, the operand stack 110 has four banks 110a-110d, each preferably 32 bits wide. The stack pointers 105 are 32-bit-centric and are used to select which of the banks 110a-110d receives data and which data within the banks 110a-110d is passed to the ALU 106. Stack pointers 105c-f identify the data locations of the last four entries pushed onto the operand stack 110.
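One way to picture how a 32-bit-centric stack pointer can pick a bank is to split a word address into a row and a bank index. The Python sketch below is a minimal illustration under an assumption the text does not state: that consecutive 32-bit words are interleaved across the four banks, so the low two address bits choose the bank. The names `BANK_NAMES` and `locate` are hypothetical.

```python
BANK_NAMES = ("110a", "110b", "110c", "110d")  # the four 32-bit-wide banks
BANK_COUNT = len(BANK_NAMES)


def locate(word_address):
    """Split a 32-bit-word address into (row, bank name).

    Assumption (not from the source): words are interleaved across the
    banks, so address modulo 4 selects the bank and address // 4 the row.
    """
    row, bank_index = divmod(word_address, BANK_COUNT)
    return row, BANK_NAMES[bank_index]


# Word 5 is the second word of the second 128-bit row.
print(locate(5))
```

Under this layout, any four consecutive word addresses fall in four different banks, which is what makes a single 128-bit access possible.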

[0020] In a preferred embodiment, the stack/ALU controller 104 controls all operations and manages all functionality within the stack/ALU subsystem 102. The stack/ALU controller 104 has six stack pointers 105 that allow flexible and efficient access to the stack. Stack pointer 105a (stack pointer+1) and stack pointer 105b (stack pointer) are defined alongside stack pointers 105c-f. Stack pointer 105a is the address of top-of-stack+1, and stack pointer 105b is always the top-of-stack address. Stack pointers 105c-f are, as noted above, the addresses of top-of-stack-1, -2, -3, and -4, respectively. During an access, each of the stack pointers 105c-f can define which one of the banks 110a-d, or whether all of them, is to be gated. As a result, all four banks 110a-110d can be gated onto the 128-bit-wide bus to the ALU 106 during a single access cycle. The stack/ALU subsystem 102 therefore does not waste time reading only one 32-bit entity per cycle and then adjusting the stack pointer to read the next 32-bit entity.
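The relationship between the six stack pointers can be sketched in a few lines of Python. This is a hypothetical model, not the patent's circuit: the function names are invented for illustration, and the bank index is assumed to be the word address modulo 4. Under that assumption, the four pointers 105c-f always address four distinct banks, which is why all four banks can be gated in one cycle.

```python
BANK_COUNT = 4  # banks 110a-110d, each 32 bits wide


def stack_pointers(top):
    """Return the six pointer values for a given top-of-stack word address."""
    return {
        "105a": top + 1,  # stack pointer+1
        "105b": top,      # stack pointer (top of stack)
        "105c": top - 1,  # stack pointer-1
        "105d": top - 2,  # stack pointer-2
        "105e": top - 3,  # stack pointer-3
        "105f": top - 4,  # stack pointer-4
    }


def banks_gated(top):
    """Bank indices addressed by pointers 105c-f (assumes bank = address % 4)."""
    pointers = stack_pointers(top)
    return sorted(pointers[name] % BANK_COUNT
                  for name in ("105c", "105d", "105e", "105f"))


# Whatever the top-of-stack address, the last four entries span all four banks.
print(banks_gated(10))
```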

[0021] A bus master interface unit (BMIU) 130 is also connected to the bus 120. The BMIU 130 handles communication between the system bus 150 and the stack/ALU subsystem 102; for example, it can control addressing 142 and read/write 144 to the stack/ALU subsystem 102. Also shown is a halt 145 command that can be communicated to the stack/ALU subsystem 102, together with a read/write stack (RD/WRT STK) 126 signal for communicating with the subsystem. In one example, RD/WRT STK 126 is specified by the function code discussed with reference to FIG. 2. RD/WRT STK 126 allows a host processor to read from and write to the operand stack 110. A wait 136 command and ALU CC (condition codes) 134 are shown as signals from the stack/ALU subsystem 102. A bus 140 carries the output from output 116 along path 132 to the BMIU 130.

[0022] The following example illustrates the efficient operation of the stack/ALU subsystem 102. First, 32-bit-wide data is sent to the stack/ALU subsystem 102 over the bus 120, and the stack pointers 105c-f direct the 32-bit data items into banks 110a, 110b, 110c, and 110d, respectively. A function code 124 defining, for example, an ADD operation is then passed to the stack/ALU controller 104, where it is decoded. After the stack/ALU controller 104 has decoded the function code 124, the first cycle begins. In the first cycle, the stack/ALU controller 104 directs the data from the operand stack 110 to the ALU 106 through the output mux 112. In a preferred embodiment, the data passed from the output mux 112 to the ALU 106 is 128 bits wide (that is, several 32-bit words). This contrasts with the prior art, which transfers data in 32-bit increments (one word at a time) and, lacking such stack pointers, needs many cycles to transfer more than 32 bits.

[0023] When the data is passed to the ALU 106, the stack/ALU controller 104 also forwards the function code to the ALU 106. The ALU 106 then performs on the data the operation specified by the function code 124 to produce a result (the operation may be, for example, any arithmetic operation commonly performed by a processor or a Java processor). The ALU 106 signals the stack/ALU controller 104 that the operation is complete, and the stack/ALU controller 104 reads the result from the ALU 106 and returns it to the operand stack 110. The return path 114 from the ALU 106 to the operand stack 110 consists of two 32-bit-wide paths 114a and 114b (that is, a 64-bit-wide path), which ensures that the result can be stored in a single access. When the result is written back to the operand stack 110, stack pointer 105e (stack pointer-3) and stack pointer 105f (stack pointer-4) identify the location of the result within the operand stack 110.

[0024] After the result has been written back to the operand stack 110 in the second cycle, the stack pointers 105 are readjusted so that stack pointer 105c (stack pointer-1) and stack pointer 105d (stack pointer-2) point to the result. Note that the location of the data within the operand stack 110 does not change. The instruction to readjust the stack pointers is part of the function code 124. When the user reads out the result stored in the operand stack 110, the output mux 112 delivers a 32-bit-wide result. The instruction to pop the result from the operand stack 110 is carried in the function code 124 and is part of standard Java code; information is generally retrieved from the Java stack under the control of a POP instruction. The function code is sent to the stack/ALU subsystem 102 by the decode/execute subsystem, and when it arrives it causes the appropriate data in the operand stack 110 to be gated onto the B bus.
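The two-cycle ADD walk-through above can be modeled in software. The Python sketch below is a behavioral illustration only, not the patented hardware: the class `OperandStack` and its methods are hypothetical, the stack is a flat list of 32-bit words, the top-of-stack pointer is taken to be the next free word, and the low word of a 64-bit value is assumed to sit at the lower address. It shows one 128-bit read, one 64-bit write at the stack pointer-4 and stack pointer-3 locations, and a pointer readjustment that leaves the data in place.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


class OperandStack:
    """Behavioral model of a word-addressed operand stack (illustrative only)."""

    def __init__(self, words=64):
        self.mem = [0] * words  # 32-bit words; word i would live in bank i % 4
        self.sp = 0             # top-of-stack pointer: next free word (assumption)
        self.accesses = 0       # stack accesses made by ALU operations

    def push32(self, value):
        self.mem[self.sp] = value & MASK32
        self.sp += 1

    def push64(self, value):
        # Assumption: the low word is pushed first (lower address).
        self.push32(value)
        self.push32(value >> 32)

    def long_add(self):
        sp = self.sp
        # Cycle 1: pointers sp-4 .. sp-1 gate four words (128 bits) at once.
        a_lo, a_hi, b_lo, b_hi = self.mem[sp - 4:sp]
        self.accesses += 1
        result = (((a_hi << 32) | a_lo) + ((b_hi << 32) | b_lo)) & MASK64
        # Cycle 2: the 64-bit return path writes both result words at once,
        # at the locations named by stack pointer-4 and stack pointer-3.
        self.mem[sp - 4] = result & MASK32
        self.mem[sp - 3] = result >> 32
        self.accesses += 1
        # Readjust the pointers; the result itself does not move.
        self.sp = sp - 2

    def top64(self):
        # After readjustment the result sits at stack pointer-2 and -1.
        return (self.mem[self.sp - 1] << 32) | self.mem[self.sp - 2]


stack = OperandStack()
stack.push64(0x100000001)
stack.push64(0xFFFFFFFF)
stack.long_add()
print(hex(stack.top64()), stack.accesses)
```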

[0025] FIG. 2 shows, according to one embodiment of the present invention, the stack pointers 105 used to transfer data from the operand stack 110 to ALU slots 112a-d. A function code register 202 holds the operation to be performed by the stack/ALU subsystem 102. This information is derived from the Java instruction by the decode/execute subsystem, which communicates with the stack/ALU subsystem 102 and whose function is known to those skilled in the art. In one embodiment, the function code requests that an operation be performed on data in the operand stack 110. For example, when an ADD operation is requested, the function code register 202 is written so as to indicate the ADD operation. The function code register 202 is shown in communication with a function decode table 204. In one embodiment, the function decode table 204 holds decode signals (for example, for ADD or long divide). These decode signals 204a identify the function to be performed.

[0026] The function decode table 204 sends the decode signals 204a to a bank selector mux 206. The stack pointers 105 send the bank address information for the data in the operand stack 110 to the bank selector mux 206; they include address bits 105c-1 to 105f-1, which specify the bank addresses of that data. The bank selector mux 206 combines the information from the decode signals 204a and the address bits 105c-1 to 105f-1 to generate slot select signals 206a. The slot select signals 206a associate the data to be sent for processing by the ALU 106 with the ALU slots 112a-d of the output mux 112. They also tell the input mux 108 the target location within the operand stack 110 that is to receive the result of the ALU operation. Note that data can be taken from any of the banks 110a-d, regardless of the order in which it resides in the operand stack 110.

[0027] For purposes of understanding, an example of transferring data from the operand stack 110 to the ALU slots 112a-d, as discussed with reference to FIG. 2, is described below. Suppose a Java instruction specifies that an ADD operation be performed on a data set held in the operand stack 110. Decoding the Java instruction causes the function code requesting the ADD operation to be loaded.

[0028] After the function decode table 204 has been accessed, the decode signals 204a are generated and sent to the bank selector mux 206. In addition, the stack pointers 105 send information indicating the bank addresses of the data in the operand stack 110. For example, stack pointer 105f identifies, with address bits 105f-1, the bank location of the data set for the requested ADD instruction. The bank selector mux 206 generates the slot select signals 206a and sends them to the output mux 112, informing the output mux of the bank addresses of the data set.
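The role of the bank selector can be illustrated with a small Python sketch that combines a decoded function with the low bits of the stack pointers to decide which bank feeds each ALU slot. Everything here is an assumption made for illustration: the table `WORDS_PER_FUNCTION`, the function `slot_select`, the rule that slot 0 takes the word at stack pointer-1, and the bank = address modulo 4 layout are not specified in the source.

```python
BANK_COUNT = 4

# Hypothetical decode table: how many 32-bit operand words each function reads.
WORDS_PER_FUNCTION = {"ADD": 2, "LONG_ADD": 4}


def slot_select(function, top):
    """Return, for each ALU slot, the index of the bank that feeds it.

    Assumptions (not from the source): slot n receives the word addressed
    by stack pointer-(n+1), and a word's bank is its address modulo 4.
    """
    word_count = WORDS_PER_FUNCTION[function]
    return [(top - 1 - slot) % BANK_COUNT for slot in range(word_count)]


# With top of stack at word 6, a four-word operation draws from banks 1, 0, 3, 2:
# the operands are found wherever they sit, in any bank order.
print(slot_select("LONG_ADD", 6))
```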

[0029] As described above, the present invention offers users many advantages. Operations on a stack processor are carried out more efficiently and with fewer accesses: at most two accesses, one read and one write, are required. The data path from the operand stack to the ALU is 128 bits wide, so two 64-bit entries are read in a single access. Previously, the data path was either 32 or 64 bits wide, and in either case several accesses were needed to read a 128-bit entity from the operand stack into the ALU, which increased processing time.

[0030] In addition, the stack pointers prevent expensive storage space in the stack from being wasted. They make the banks individually selectable, which creates 32-bit-wide storage locations within the 128-bit-wide stack. A single 128-bit-wide stack row can thus hold up to four independent 32-bit-wide data entries. In contrast to the prior art, a single 32-bit data entry does not waste storage space in the 128-bit-wide stack, so the stack's storage space is used more efficiently.
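The storage argument can be checked with a short calculation. The Python sketch below compares a stack that gives every entry its own full-width row with a banked stack that packs entries side by side; the function names `rows_used` and `unused_bits` are hypothetical and serve only to illustrate the comparison. In the banked case, bits reported as unused in a partly filled row remain available to later entries rather than being lost.

```python
def rows_used(entry_count, entry_bits, row_bits, banked):
    """Rows occupied by `entry_count` entries of `entry_bits` each."""
    if banked:
        entries_per_row = row_bits // entry_bits
        return -(-entry_count // entries_per_row)  # ceiling division
    return entry_count  # one entry per row, however narrow


def unused_bits(entry_count, entry_bits, row_bits, banked):
    """Bits in the occupied rows that hold no data."""
    occupied = rows_used(entry_count, entry_bits, row_bits, banked) * row_bits
    return occupied - entry_count * entry_bits


# Four 32-bit entries in a 128-bit-wide stack, with and without banks.
print(unused_bits(4, 32, 128, banked=True))
print(unused_bits(4, 32, 128, banked=False))
```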

[0031] The present invention can be implemented using computer-implemented operations driven by suitable software. Various computer-implemented operations involving data stored in a computer system for driving computer peripherals (that is, in the form of software drivers) may be employed. These operations require physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. The manipulations performed are often referred to in terms such as verifying, identifying, scanning, or comparing.

【００３２】Some of the operations described above that form part of the present invention are useful machine operations. Any appropriate device or apparatus may be used to perform these operations. The apparatus may be specially constructed for the required purpose, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the above description, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

【００３３】Although the foregoing invention has been described in detail for clarity of understanding, it will be apparent that certain changes and modifications may be made within the scope of the appended claims. Accordingly, the present embodiment is to be considered illustrative and not restrictive. The invention is not limited to the details given above and may be modified within the scope of the appended claims and their equivalents.

【００３４】The present invention may also be practiced using computer-implemented operations driven by suitable software. Accordingly, various computer-implemented operations involving data stored in a computer system for driving computer peripherals (that is, in the form of software drivers) may be employed. These operations require physical manipulation of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals that can be stored, transferred, combined, compared, and otherwise manipulated. Further, the manipulations performed are often referred to in terms such as ascertaining, identifying, scanning, or comparing.

【００３５】Some of the operations described above that form part of the present invention are useful machine operations. Any appropriate device or apparatus may be useful for performing these operations. The apparatus may be specially constructed for the required purpose, or it may be a general-purpose computer selectively activated or configured by a computer program stored in the computer. In particular, various general-purpose machines may be used with computer programs written in accordance with the above description, or it may be more convenient to construct a more specialized apparatus to perform the required operations.

[Brief description of drawings]

【図１】FIG. 1 is a diagram showing a Java stack subsystem according to one embodiment of the present invention.

【図２】FIG. 2 is a diagram of one embodiment of the present invention showing a stack pointer that transfers data directly from the operand stack to a specific ALU slot.

Continuation of front page
(72) Inventor: デイビッド、アール．エボイ, 68 West Secretariat Drive, Tempe, Arizona, United States
(72) Inventor: サティエンドラ、エス．セシー, 5179 マニアー、ウッド、ドライブ, Pleasanton, California, United States
F-term (reference): 5B033 DE00

Claims


1. A method of processing data through a stack processor, comprising: writing data to an operand stack; identifying a data position within the operand stack using one or more stack pointers; and generating a parallel transfer of selected data, using the stack pointers and a function code, from the operand stack to an arithmetic logic unit that performs processing according to an instruction specified by the function code.

2. The method of processing data through a stack processor according to claim 1, wherein the parallel transfer includes moving 64 bits in parallel, moving 96 bits in parallel, and moving 128 bits in parallel.

3. The method of processing data through a stack processor according to claim 1, wherein the parallel transfer supplies the selected data to the arithmetic logic unit for parallel word processing.

4. The method of processing data through a stack processor according to claim 1, wherein the arithmetic logic unit produces a result and transfers the result, as one of a single word and a plurality of words, onto the operand stack, and the one or more stack pointers specify a position in the operand stack for the result.

5. The method of processing data through a stack processor according to claim 4, wherein the parallel transfer supplies the selected data to the arithmetic logic unit for parallel word processing.

6. The method of processing data through a stack processor according to claim 1, wherein generating the parallel transfer of selected data, using the stack pointers and the function code, from the operand stack to the arithmetic logic unit comprises: transferring address information of the stack pointers and a decode signal to a selector/multiplexer; and transferring a slot select signal received from the selector/multiplexer to an output multiplexer, the output multiplexer selecting specific data from the operand stack.

7. The method of processing data via a stack processor of claim 1, wherein the stack processor is a Java stack processor.

8. The method of processing data through a stack processor of claim 1, wherein the operand stack is divided into a plurality of banks, each bank being 32 bits wide.

9. The method of processing data through a stack processor according to claim 1, wherein the identifying and the generating are controlled by a stack/ALU controller.

10. A stack processing subsystem for data processing, comprising: an operand stack arranged in the stack processing subsystem, capable of storing data and having banks; a stack pointer in the stack processing subsystem for locating data within a bank of the operand stack; and an arithmetic logic unit connected to the operand stack such that a parallel word transfer of data is performed from the operand stack to the arithmetic logic unit, wherein the data to be transferred by the parallel word transfer is selected using the stack pointer and a function code, and the function code specifies a specific instruction to be processed by the arithmetic logic unit.

11. The stack processing subsystem for data processing according to claim 10, wherein the parallel transfer includes 64-bit parallel transfer, 96-bit parallel transfer, and 128-bit parallel transfer.

12. The stack processing subsystem for data processing according to claim 10, wherein a stack pointer in the stack processor identifies a position of processing data in the operand stack.

13. The stack processing subsystem for data processing according to claim 10, wherein a bank selector/multiplexer receives bank address information from the stack pointer and a decode signal from the function code, and sends a slot select signal to an output multiplexer.

14. The stack processing subsystem for data processing of claim 13, wherein the output multiplexer selects specific data from the operand stack.

15. The stack processing subsystem for data processing according to claim 10, wherein the operand stack is divided into a plurality of banks, each having a 32-bit word width.

16. The stack processing subsystem for data processing of claim 10, wherein the operand stack is hardware logic.

17. The stack processing subsystem for data processing of claim 10, wherein the stack pointer is hardware logic.

18. The stack processing subsystem for data processing according to claim 10, wherein the arithmetic logic unit is hardware logic.

19. The stack processing subsystem for data processing according to claim 10, further comprising an input multiplexer arranged in the immediate vicinity of the operand stack.

20. The stack processing subsystem for data processing according to claim 14, wherein the output multiplexer is arranged in proximity to the operand stack and the arithmetic logic unit.

21. The stack processing subsystem for data processing of claim 10, wherein the stack pointer is located in the stack / ALU controller.

22. A method of performing data processing using a stack pointer, comprising: placing data on an operand stack contained within a stack processor; tracking the data position within the operand stack with a stack pointer located within the stack processor; transferring the data from the operand stack to an arithmetic logic unit, the data being transferable in 128-bit increments; and processing the data in the arithmetic logic unit according to an instruction from a function code.

23. The method of performing data processing using a stack pointer according to claim 22, wherein the arithmetic logic unit produces a result and transfers the result, as one of a single word and a plurality of words, to the operand stack, and one or more stack pointers identify the position in the operand stack for the result.

24. The method of performing data processing using a stack pointer according to claim 22, wherein, when data is transferred from the operand stack to the arithmetic logic unit, stack pointer address information and a decode signal are transferred to a selector/multiplexer, a slot select signal received from the selector/multiplexer is transferred to an output multiplexer, and the output multiplexer selects specific data from the operand stack.