JP5610551B2 - Data processor

Info

Publication number: JP5610551B2
Application number: JP2013019200A
Authority: JP
Inventors: 荒川 文男 (Fumio Arakawa)
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2013-02-04
Filing date: 2013-02-04
Publication date: 2014-10-22
Anticipated expiration: 2028-02-19
Also published as: JP2013093051A

Description

The present invention relates to data processors such as microprocessors and microcomputers, and more particularly to a technique that enables efficient code allocation to instructions.

Since the 68020, developed by Motorola in 1984, 32-bit processors have long been the mainstream among microprocessors. This is because the 2^32 B = 4 GB that 32 bits can address remained a sufficiently large address space for about 20 years. However, as the required memory capacity has grown with improving system performance and the unit price of memory has fallen, 64-bit processors that can handle a space exceeding 4 GB have recently been spreading in the PC/server field. Embedded processors are expected to follow the PC/server field and move to 64 bits with a delay of several years to a decade.

Unlike PC/server processors, for which performance is the top priority, embedded processors must combine high efficiency with high performance. As a result, RISC (Reduced Instruction Set Computer) embedded processors with a 16-bit fixed-length instruction set, which can achieve high code efficiency, have become widespread. Even now that off-chip memory capacity has grown, high code efficiency remains indispensable for making effective use of on-chip caches, RAM, and ROM. Extending such a processor to 64 bits, however, requires efficient use of the 16-bit fixed-length instruction code space.

Moreover, because the era of 32-bit processors lasted so long, 32 bits became the basic unit of operation, and it became common either to extend 8-bit and 16-bit data to 32 bits in the processor's registers or to handle them in 32-bit units as four 8-bit data items or two 16-bit data items. A 64-bit processor must therefore support this 32-bit-based arithmetic system in addition to the 64-bit one. For this reason, existing 64-bit processors define both a 32-bit and a 64-bit instruction for the same operation where necessary. As a result, the number of operation instructions in a 64-bit processor increases, and so does the code space needed to define them.

Japanese Patent Laid-Open No. 06-337783

PowerPC User Instruction Set Architecture Book I Version 2.02, Internet URL <http://www.ibm.com/developerworks/power/library/pa-archguidev2>, retrieved January 23, 2008

SH-4A Extended Function Software Manual, Internet URL <http://documentation.renesas.com/jpn/products/mpumcu/rjj09b0235_sh4asm.pdf>, retrieved January 23, 2008

AMD64 Architecture Programmer's Manual Volume 1: Application Programming, Revision 3.11, Internet URL <http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24592.pdf>, retrieved January 23, 2008

As described above, extending a RISC embedded processor with 16-bit fixed-length instruction codes to 64 bits requires efficient use of the code space for instructions (also referred to simply as the instruction code space). In particular, whereas a 32-bit processor needed to support only the 32-bit arithmetic system, a 64-bit processor must support both the 32-bit and the 64-bit arithmetic systems. If, as in existing 64-bit processors, both a 32-bit and a 64-bit instruction are defined for the same operation, a 16-bit fixed-length instruction set runs short of instruction code space, and it becomes difficult to build a 64-bit arithmetic system on a par with the existing 32-bit one. For example, if the instruction set of a 32-bit arithmetic system has 256 operation codes, the number expressible in 8 bits, then simply adding a number of 64-bit operation instructions requires at least one more operation code bit. The instruction code space grows, and the existing instruction system for 32-bit operations can no longer be maintained.

In particular, even when the lower 32 bits of a 64-bit operation result are identical to the result of the 32-bit operation, separate instructions must be defined if the flags generated from the result differ between the 32-bit and 64-bit operations. When only the generated flags differ, the number of flag-generating instructions can be reduced by increasing the number of flags that one instruction generates. For example, the PowerPC of Document 1 generates several kinds of flags, such as positive, negative, zero, overflow, and carry, with a single instruction. Document 2 goes further and generates flags of several kinds for several sizes, that is, "number of kinds" × "number of sizes" flags. In Document 2 this is "4 kinds" × "2 sizes" = "8 flags".

However, increasing the number of flags that one instruction generates requires increasing the number of instructions that use flags. For example, the branch condition of a conditional branch instruction is generally determined by the combination of which flag is used and whether that flag is set or cleared. The conditional branch instructions of Document 2 devote a 5-bit field to specifying how the flags are used, allowing 32 possible specifications. The number of conditional branch instructions is therefore 32 × "the number of variations other than flags". Variations other than flags include the presence or absence of a delay slot and the method of specifying the branch target address.

Thus, an increase in the number of flags helps reduce the number of instructions that generate flags (flag generation instructions) but brings an increase in the number of instructions that use flags (flag use instructions). Increasing the number of flags as in Document 2 therefore does not necessarily reduce the instruction count. Document 2 assumes a CISC (Complex Instruction Set Computer), in which operation instructions, the main flag generation instructions, are numerous because they can also specify memory operands. There, increasing the number of flags to cut down the numerous flag generation instructions did reduce the total. A typical RISC, by contrast, has a 32-bit fixed-length instruction set with ample instruction code space, so there is little need to reduce the instruction count. For this reason there has been no example of a RISC in which the number of flags was adjusted to minimize the number of instructions. When a RISC with a 16-bit fixed-length instruction set is extended to a 64-bit processor, however, there is no spare instruction code space. A RISC also has fewer flag generation instructions than a CISC. Consequently, simply increasing the number of flags does not lead to the optimum. What matters is a scheme that balances the number of flag generation instructions against the number of flag use instructions.

The first problem to be solved by the present invention is to reduce the number of instructions by adjusting the number of flags in an instruction set that has few flag generation instructions, to minimize the code space needed to define those instructions, and thereby to make it possible to extend to 64 bits a processor with no spare instruction code space, such as a RISC with a 16-bit fixed-length instruction set.

In general, even when the number of flags is increased, it is rare for several flags generated by one instruction to be used; usually only one is used. On the other hand, combining flags generated by several instructions can sometimes make a program more efficient. If several flags are updated every time an instruction executes, however, a flag generated by a preceding instruction is overwritten by a subsequent one, which makes combining flags difficult. It then becomes necessary to transfer each generated flag to a register and either perform the logical operation there and write the result back to a flag, or test the result of the logical operation as a numeric value to generate a flag, or else to perform a conditional branch or conditional execution each time a flag is generated. These methods are inefficient and degrade performance, because they increase the number of executed instructions or the branch frequency.

In particular, a given piece of data treated as an operand never has two sizes at once, so when flags are generated for several sizes, one of them is unnecessary. With appropriate sign extension or zero extension the flags for the different sizes may take the same value, so that either could be used, but one of them is still unnecessary. When several flags are defined, therefore, it is also effective to update only the flag that is needed rather than all of them at once, preserving the flags that should be kept and making operations between flags possible. Achieving this, however, requires specifying the kind and location of both the flag that a flag generation instruction updates and the flag that a flag use instruction uses, which demands the largest instruction code space of all.

The second problem to be solved by the present invention is to make use of the plurality of flags, which are defined primarily to minimize the instruction code space, without consuming a large instruction code space, and to make it possible to combine flags generated by several instructions.

A representative one of the inventions disclosed in the present application is briefly outlined as follows.

The present invention starts from the observation that, when flag generation instructions are numerous, increasing the number of flags generated by one instruction can make the reduction in flag generation instructions exceed the increase in flag use instructions, thereby reducing the total instruction count. On that basis it adopts the approach of defining instructions that generate a plurality of flags according to operand data size. In short, in a reduced instruction set computer type data processor capable of operating on operands of a plurality of data sizes, the instruction set includes an instruction that performs, on the lower part of a large-data-size operand, the same processing as the operation on a small-data-size operand, and that generates a flag for each data size regardless of the data size of the operand being processed.

From the second viewpoint, defining a plurality of flags, updating only those that are needed, preserving those that should be kept, and enabling operations between flags requires specifying the kind and location of both the flag that a flag generation instruction updates and the flag that a flag use instruction uses. To this end, a prefix instruction is added to the instruction set. It modifies a subsequent instruction and designates which of the flags corresponding to the respective data sizes is to be updated by a flag generated by the subsequent instruction, which of the flags generated by that subsequent instruction is to be used, and the logical operation to be performed between the two designated flags.

The effects obtained by the representative invention disclosed in the present application are briefly described as follows.

According to the invention of the first aspect, the number of instruction types (the instruction count) making up the instruction set decreases as a whole. This helps shrink the code space occupied by instruction codes in a RISC type data processor with no spare instruction code space. For example, it becomes possible to extend to 64 bits a processor with no spare instruction code space, such as a RISC with a 16-bit fixed-length instruction set.

According to the invention of the second aspect, the plurality of flags defined primarily to minimize the instruction code space can be used without consuming a large instruction code space, and flags generated by several instructions can be used in combination.

FIG. 1 is a block diagram schematically illustrating the configuration of a processor core in a data processor according to the present invention.

FIG. 2 is a block diagram schematically illustrating the execution unit of a processor core according to Embodiment 1 of the present invention.

FIG. 3 is an explanatory diagram schematically illustrating a flag update prefix instruction according to Embodiment 2 of the present invention.

FIG. 4 is a block diagram schematically illustrating the instruction decode unit of a processor core according to Embodiment 2 of the present invention.

FIG. 5 is a block diagram schematically illustrating the execution unit of a processor core according to Embodiment 2 of the present invention.

FIG. 6 is an explanatory diagram schematically illustrating the operation of a processor core according to Embodiment 2 of the present invention.

FIG. 7 is a block diagram illustrating the schematic configuration of a data processor according to the present invention.

1. Overview of the Embodiments

First, an overview is given of representative embodiments of the invention disclosed in the present application. Reference numerals from the drawings that appear in parentheses in this overview merely exemplify what is included in the concept of the components to which they are attached.

First, each of the above aspects is described concretely. Regarding the first aspect, consider the number of flag generation instructions in a RISC type 32-bit processor with a 16-bit fixed-length instruction set. In the SH-4A processor core of Document 3, for example, there are 17 such instructions with an 8-bit operand field and 12 with a 4-bit operand field. Instructions that both use and update a flag are counted as flag generation instructions, and floating-point instructions, which are unrelated to the move to 64 bits, are excluded. The flag use instructions number 4 with an 8-bit operand field and 1 with a 4-bit operand field, and there is a single flag. The fewer the flags, the more flag generation instructions and the fewer flag use instructions there are, so the SH-4A processor core has 29 flag generation instructions against 5 flag use instructions, about six times as many. Of the 29 flag generation instructions, 26 behave differently depending on operand size, so simply adding 64-bit instructions adds 26 instructions. The ratio of flag generation instructions to flag use instructions then becomes 55 to 5, a factor of 11.
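The baseline counts in this paragraph can be tallied in a few lines. The following is a minimal Python sketch; the variable names are illustrative and do not come from the source, and the figures are simply those quoted above for the SH-4A processor core.

```python
# Flag-related instruction counts quoted above for the SH-4A core
# (floating-point instructions excluded). All names are illustrative.
gen_8bit_field, gen_4bit_field = 17, 12   # flag generation instructions
use_8bit_field, use_4bit_field = 4, 1     # flag use instructions

gen_32 = gen_8bit_field + gen_4bit_field  # generators in the 32-bit processor
use_32 = use_8bit_field + use_4bit_field  # users in the 32-bit processor

size_dependent = 26                       # generators whose behaviour depends on operand size
gen_64_naive = gen_32 + size_dependent    # generators after naively adding 64-bit versions
total_naive = gen_64_naive + use_32       # all flag-related instructions after the naive extension

print("32-bit:", gen_32, "vs", use_32, "ratio", round(gen_32 / use_32, 1))
print("naive 64-bit:", gen_64_naive, "vs", use_32, "ratio", gen_64_naive // use_32)
print("flag-related instructions in total:", total_naive)
```

The total of 60 is the starting point against which the three flag-definition schemes are compared.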

When flag generation instructions are this numerous, the ratio can be changed and the instruction count reduced by increasing the number of flags that one instruction generates. The flags can be increased by defining them according to (1) flag kind, (2) operand size, or (3) both.

First, consider scheme (1), defining flags by flag kind. The kinds of flag in the SH-4A processor core include signed comparison, unsigned comparison, zero, overflow, carry, and shift-out bit. Because there is only a single 1-bit flag, its meaning depends on which instruction set it. When different operations set different kinds of flags, adding flag kinds does not reduce the instruction count, so the candidates are instructions that perform the same operation and differ only in the flag generated, namely the compare instructions. If signed comparison, unsigned comparison, and zero are made three separate flags, the 18 compare instructions can be reduced to 8. Other instructions generate zero, overflow, carry, shift-out bit, and so on, but since their operations differ, separating the flags by kind does not reduce their number. Meanwhile, the number of flag use instructions triples with the number of flag kinds, from 5 to 15. As a result, the number of flag-related instructions goes from 60 down by 10 and up by 10, and remains 60.

Next, consider scheme (2), defining flags by operand size. If a flag is provided for each of the 32-bit and 64-bit operand sizes, then of the 26 instructions whose behavior depends on operand size, the 15 whose lower-32-bit operation is identical can be made common to 32 bits and 64 bits. Meanwhile, the number of flags doubles, so the number of flag use instructions goes from 5 to 10. As a result, the number of flag-related instructions goes from 60 down by 15 and up by 5, falling to 50.

Finally, consider scheme (3), defining flags by both. Using three kinds of flag reduces the compare instructions from 18 to 8, and defining flags per size further reduces those 8 to 4. In addition, among the flag generation instructions other than compare instructions, 6 whose lower-32-bit operation is identical can be eliminated. Meanwhile, the number of flag use instructions increases sixfold with the number of flags, from 5 to 30. As a result, the number of flag-related instructions goes from 60 down by 20 and up by 25, rising to 65.

As shown above, optimizing the number of flags with the aim of minimizing the instruction count makes clear that scheme (2), defining flags by operand size, is the best.
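The comparison of the three schemes is simple bookkeeping: each scheme removes some flag generation instructions and adds some flag use instructions. A small Python sketch, using only the net changes stated in the text (the dictionary keys and variable names are illustrative):

```python
# Flag-related instructions after the naive 64-bit extension: 55 generators + 5 users.
BASELINE = 60

# scheme -> (flag generation instructions removed, flag use instructions added),
# taken from the figures stated in the description.
schemes = {
    "(1) by flag kind": (10, 10),
    "(2) by operand size": (15, 5),
    "(3) by kind and size": (20, 25),
}

totals = {name: BASELINE - removed + added
          for name, (removed, added) in schemes.items()}
best = min(totals, key=totals.get)

for name, total in totals.items():
    print(name, "->", total, "instructions")
print("fewest instructions:", best)
```

Scheme (2) is the only one whose savings on the generation side outweigh the growth on the use side.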

The instruction code space that an instruction consumes varies greatly with the number of bits it uses for its operand field. An instruction with an N-bit operand field consumes 1/2^(16-N) of the entire instruction code space: 1/256 for 8 bits, for example, and 1/4096 for 4 bits. What matters most, therefore, is reducing the number of instructions with an 8-bit operand field.
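To make the weight of the 8-bit operand field concrete, the share of the 16-bit code space taken by one instruction can be computed exactly. This is a minimal sketch; the function name is hypothetical, and the counts of 17 and 12 are the SH-4A flag generation instruction counts quoted earlier.

```python
from fractions import Fraction

INSTRUCTION_BITS = 16  # fixed instruction length assumed throughout the description

def code_space_share(operand_field_bits: int) -> Fraction:
    """Fraction of the whole code space used by one instruction
    whose operand field is operand_field_bits wide."""
    opcode_bits = INSTRUCTION_BITS - operand_field_bits
    return Fraction(1, 2 ** opcode_bits)

share_8 = code_space_share(8)   # one instruction with an 8-bit operand field
share_4 = code_space_share(4)   # one instruction with a 4-bit operand field

# Code space used by the 17 and 12 flag generation instructions of each kind.
gen_8bit_total = 17 * share_8
gen_4bit_total = 12 * share_4
print(share_8, share_4, gen_8bit_total, gen_4bit_total)
```

One 8-bit-field instruction costs as much code space as sixteen 4-bit-field instructions, which is why the analysis is repeated for the 8-bit operand field instructions alone.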

Restricting the above analysis to instructions that have an 8-bit operand field (also referred to simply as 8-bit operand field instructions), the flag-related 8-bit operand field instructions of the 32-bit processor comprise 17 generation instructions and 4 use instructions, the latter being conditional branches, for a total of 21. Of these, the instructions whose behavior depends on operand size are 15 of the generation instructions, so simply adding 64-bit instructions increases the generation instructions by 15 to 32, for a total of 36.

First, in scheme (1), defining flags by flag kind, using three kinds of flag reduces the compare instructions from 14 to 6, but the conditional branch instructions increase from 4 to 12. The number of flag-related instructions therefore goes from 36 down by 8 and up by 8, and remains 36.

Next, in scheme (2), defining flags by operand size, of the 15 instructions whose behavior depends on operand size, the 10 whose lower-32-bit operation is identical can be made common to 32 bits and 64 bits. Meanwhile, the number of flags doubles, so the number of conditional branch instructions goes from 4 to 8. As a result, the number of flag-related instructions goes from 36 down by 10 and up by 4, falling to 30.

Finally, consider scheme (3), defining flags by both. Using three kinds of flag reduces the compare instructions from 14 to 6, and defining flags per size further reduces those 6 to 3. In addition, among the flag generation instructions other than compare instructions, 3 whose lower-32-bit operation is identical can be eliminated. Meanwhile, the number of flag use instructions increases sixfold with the number of flags, from 4 to 24. As a result, the number of flag-related instructions goes from 36 down by 14 and up by 20, rising to 42.

Thus, even when the analysis is restricted to the 8-bit operand field instructions, which have the greatest impact on instruction code space consumption, scheme (2), defining flags by operand size, proves to be the best. Scheme (3), which was the best for minimizing instruction code size when applied to a CISC in Document 2, is the worst for a RISC.
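The same bookkeeping for the 8-bit operand field instructions can also be expressed as a share of the code space, since each such instruction occupies 1/256 of a 16-bit code space. A short Python sketch under that assumption; the names are illustrative and the net changes are those stated in the text.

```python
from fractions import Fraction

# Naive 64-bit extension, 8-bit operand field instructions only:
# 32 flag generation instructions + 4 conditional branches.
BASELINE_8BIT = 36
SHARE_PER_INSTRUCTION = Fraction(1, 256)  # 16-bit instruction, 8-bit operand field

# scheme -> (flag generation instructions removed, flag use instructions added)
schemes_8bit = {
    "(1) by flag kind": (8, 8),
    "(2) by operand size": (10, 4),
    "(3) by kind and size": (14, 20),
}

counts_8bit = {name: BASELINE_8BIT - removed + added
               for name, (removed, added) in schemes_8bit.items()}
space_8bit = {name: count * SHARE_PER_INSTRUCTION
              for name, count in counts_8bit.items()}

for name in schemes_8bit:
    print(name, counts_8bit[name], "instructions,",
          f"{float(space_8bit[name]):.1%} of the code space")
```

Under these figures scheme (2) frees 6/256 of the code space relative to the naive extension, while scheme (3) consumes 6/256 more.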

The first problem of the present invention is to reduce the number of instructions by adjusting the number of flags in an instruction set that has few flag generation instructions, to minimize the code space needed to define those instructions, and thereby to make it possible to extend to 64 bits a processor with no spare instruction code space, such as a RISC with a 16-bit fixed-length instruction set. This is achieved by defining flags according to operand size. Specifically, a flag is provided for each of the 32-bit and 64-bit operand sizes, instructions for 32-bit and 64-bit operands whose lower-32-bit operation is identical are merged, and the number of flag use instructions, such as conditional branches, is increased in line with the larger number of flags. The number of instruction types (the instruction count) making up the instruction set thereby decreases as a whole.

As for the second aspect, as noted in the discussion of the problems, defining a plurality of flags, updating only those that are needed, preserving those that should be kept, and enabling operations between flags requires specifying the kind and location of both the flag that a flag generation instruction updates and the flag that a flag use instruction uses, which demands the largest instruction code space of all.

The solution is to define a prefix instruction, that is, an instruction that modifies a subsequent instruction. Implementing a prefix instruction resembles implementing a variable-length instruction set, and processors that use prefix instructions have long existed, as described from page 87 of Document 4. An engineer of ordinary skill in this field can implement one, so what matters is which prefix instruction to define. In the present invention, a flag update prefix instruction designates the flag to be updated, the flag to be used among those generated by the subsequent instruction, and the logical operation between the two designated flags. With two kinds of flags and eight kinds of operations, this can be specified in a 5-bit operand field, so a large instruction code space is not needed. Because a logical operation can also be designated, logical operations between flags can be executed with the same number of instructions as in an instruction set that can suppress updating of flags to be kept and performs the logical operation with a separate instruction. This makes it possible to use the plurality of flags, defined primarily to minimize the instruction code space, without consuming a large instruction code space, and to combine flags generated by several instructions.
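The 5-bit figure follows from 1 bit for the flag to update, 1 bit for the flag to use, and 3 bits for one of eight operations. The Python sketch below models such a field. The bit layout, the flag names T and U, the particular eight operations, and all function names are assumptions made for illustration; the text above fixes only the counts.

```python
# Hypothetical model of a 5-bit flag update prefix operand field.
FLAGS = ("T", "U")  # assumed names for the two flags

# Assumed set of eight operations: old is the current value of the flag being
# updated, new is the selected flag produced by the modified instruction.
OPS = {
    "COPY": lambda old, new: new,
    "NOT":  lambda old, new: new ^ 1,
    "AND":  lambda old, new: old & new,
    "OR":   lambda old, new: old | new,
    "XOR":  lambda old, new: old ^ new,
    "NAND": lambda old, new: (old & new) ^ 1,
    "NOR":  lambda old, new: (old | new) ^ 1,
    "XNOR": lambda old, new: (old ^ new) ^ 1,
}
OP_NAMES = tuple(OPS)

def encode(dest: str, src: str, op: str) -> int:
    """Pack (flag to update, flag to use, operation) into 5 bits."""
    return (FLAGS.index(dest) << 4) | (FLAGS.index(src) << 3) | OP_NAMES.index(op)

def decode(field: int):
    """Unpack a 5-bit field into (flag to update, flag to use, operation)."""
    if not 0 <= field < 32:
        raise ValueError("operand field must fit in 5 bits")
    return FLAGS[(field >> 4) & 1], FLAGS[(field >> 3) & 1], OP_NAMES[field & 7]

def apply_prefix(field: int, current: dict, generated: dict) -> dict:
    """Update only the designated flag; every other flag keeps its value."""
    dest, src, op = decode(field)
    updated = dict(current)
    updated[dest] = OPS[op](current[dest], generated[src])
    return updated

# Example: AND the T flag produced by the next instruction into the existing U flag.
field = encode("U", "T", "AND")
print(field, apply_prefix(field, {"T": 1, "U": 1}, {"T": 0, "U": 1}))
```

In this model the flag that is not designated for update survives the modified instruction, which is what allows results from several instructions to be combined.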

Representative embodiments are described below on the basis of the above viewpoints.

[1] A reduced instruction set computer type data processor has, in its instruction set, a first instruction that can perform arithmetic processing on operands of a plurality of data sizes, performs on the lower side of a large-data-size operand the same processing as the arithmetic processing on a small-data-size operand, and generates flags (newU, newT) corresponding to the respective data sizes regardless of the data size of the operand being processed. This reduces the overall number of instruction types in the instruction set. It therefore contributes to reducing the instruction code space in a RISC type data processor whose instruction code space has no room to spare. For example, a processor with no spare instruction code space, such as a RISC with a 16-bit fixed-length instruction set, can be extended to 64 bits.

[2] The data processor of item 1 further has in the instruction set, for example, a second instruction that selects and uses a flag generated by the first instruction.

[3] The data processor of item 1 further has in the instruction set, for example, a prefix instruction that modifies a subsequent instruction and specifies which of the flags corresponding to the respective data sizes generated by the first instruction is to be updated with a flag generated by that subsequent instruction. This makes it possible to define a plurality of flags and update only the necessary ones.

[4] The data processor of item 1 further has in the instruction set, for example, a prefix instruction that, in addition to specifying which of the flags corresponding to the respective data sizes generated by the first instruction is to be updated with a flag generated by a subsequent instruction, specifies the flag to be used among the flags generated by the modified subsequent instruction and the logical operation between the two specified flags. This makes it possible to define a plurality of flags, update only the necessary ones, retain the flags that should be kept, and perform operations between flags. Accordingly, the plurality of flags defined mainly to minimize the instruction code space can be exploited without consuming a large instruction code space, and flags generated by a plurality of instructions can be used in combination.

[5] In the data processor of item 1, the plurality of data sizes are, for example, 32 bits and 64 bits.

[6] In the data processor of item 2, the flag is, for example, a signed magnitude comparison, unsigned magnitude comparison, zero, overflow, carry, or shift-out bit flag for each of the plurality of data sizes.

[7] Another reduced instruction set computer type data processor, having an instruction execution unit (EXU), has in its instruction set a first instruction for executing processing that involves generating flags and a second instruction for executing processing that involves using a flag. The instruction execution unit has arithmetic circuits (ALU, SFT) that perform processing according to an instruction decode result, flag latch circuits (U, T), and a flag selection circuit (FMUX). In accordance with the decode result of the first instruction, the arithmetic circuits can perform arithmetic processing on operands of a plurality of data sizes, perform on the lower side of a large-data-size operand the same processing as the arithmetic processing on a small-data-size operand, and generate flags corresponding to the respective data sizes regardless of the data size of the operand being processed. The flag latch circuits latch the flags generated by the arithmetic circuits in accordance with the decode result of the first instruction. The flag selection circuit selects a flag latched in the flag latch circuits in accordance with the decode result of the second instruction.

[8] In the data processor of item 7, for example, the arithmetic circuits generate signed magnitude comparison, unsigned magnitude comparison, zero, overflow, carry, and shift-out bit flags for each data size, and one kind of flag is selected from the generated flags by the first instruction and latched in the flag latch circuits for each operand size.

[9] In the data processor of item 8, the plurality of data sizes are, for example, 32 bits and 64 bits.

[10] Still another data processor has in its instruction set an arithmetic instruction capable of performing arithmetic processing and generating a plurality of flags, together with a prefix instruction that modifies a subsequent instruction and specifies which of the plurality of flags generated by the arithmetic instruction is to be updated with a flag generated by that subsequent instruction.

[11] Still another data processor has in its instruction set an arithmetic instruction capable of performing arithmetic processing and generating a plurality of flags, together with a prefix instruction that, in addition to specifying which of the plurality of flags generated by the arithmetic instruction is to be updated with a flag generated by a subsequent instruction, specifies the flag to be used among the flags generated by the modified subsequent instruction and the logical operation between the two specified flags.
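As a rough illustration of items [1] and [5], the following C sketch models a single compare-equal operation that produces one flag per data size from the same 64-bit operands. The type `flag_pair` and the function `cmp_eq` are hypothetical names introduced here for illustration; only newT and newU come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two flags named in the text. */
typedef struct {
    bool newT; /* flag for the 32-bit data size */
    bool newU; /* flag for the 64-bit data size */
} flag_pair;

/* One size-independent "compare equal": the 32-bit flag is computed from
 * the lower 32 bits of the same operands, so a separate 32-bit compare
 * instruction is not needed. */
flag_pair cmp_eq(uint64_t a, uint64_t b)
{
    flag_pair f;
    f.newT = ((uint32_t)a == (uint32_t)b); /* lower side only */
    f.newU = (a == b);                     /* full 64 bits    */
    return f;
}
```

A flag-using instruction would then pick newT or newU depending on the data size it cares about, which is the role of the second instruction in item [2].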

2. Details of Embodiments
The embodiments are described in further detail. The best mode for carrying out the present invention is described below in detail with reference to the drawings. In all the drawings used to describe the best mode, members having the same function are denoted by the same reference symbols, and repeated description of them is omitted.

<< Embodiment 1 >>
FIG. 7 illustrates a data processor DPU according to the present invention. The data processor DPU is built around a processor core CPU, such as a central processing unit, and includes a nonvolatile memory ROM, a volatile memory RAM, an input/output interface circuit IOC, an external bus interface circuit EBIF, and the like connected to the core by an internal bus. It is formed on a single semiconductor substrate, such as single-crystal silicon, by, for example, a complementary MOS integrated circuit manufacturing technique. The nonvolatile memory ROM is used as a storage area for programs executed by the processor core CPU, and the volatile memory RAM is used as a work area for the processor core CPU.

FIG. 1 schematically illustrates the block configuration of the processor core CPU. The processor core CPU consists of, for example, an instruction cache IC, an instruction fetch unit IFU, an instruction decode unit IDU, an execution unit EXU, a load store unit LSU, a data cache DC, and a bus interface unit BIU.

The instruction fetch unit IFU outputs an instruction address IA to the instruction cache IC, and the instruction cache IC returns the instruction FI fetched from the address specified by the instruction address IA to the instruction fetch unit IFU. On a cache miss, the instruction cache IC outputs the missed address to the bus interface unit BIU as an external instruction address EIA, receives an external fetch instruction EI, and then returns the instruction FI to the instruction fetch unit IFU.

The instruction decode unit IDU receives an instruction OP from the instruction fetch unit IFU and outputs a branch control signal BRC. It also decodes the instruction OP and outputs execution control information EXC to the execution unit EXU and load/store control information LSC to the load store unit LSU. In addition, it accesses the register file RF and supplies execution operands EXA and EXB to the execution unit EXU, and load/store address operands LSA and LSB and store data SD to the load store unit LSU. Furthermore, it receives an execution result EXO from the execution unit EXU and load data LD from the load store unit LSU, and stores them in the register file RF.

The execution unit EXU receives the execution control information EXC and the execution operands EXA and EXB from the instruction decode unit IDU, performs the operation according to the execution control information EXC, and returns the execution result EXO to the instruction decode unit IDU.

The load store unit LSU receives the load/store control information LSC, the load/store address operands LSA and LSB, and the store data SD from the instruction decode unit IDU, executes the load or store according to the load/store control information LSC, and returns the load data LD to the instruction decode unit IDU. For a load or store it outputs a data address DA to the data cache DC, and for a store it also outputs data cache store data DCSD. The data cache DC returns data cache load data DCLD to the load store unit LSU on a load, and stores the data cache store data DCSD on a store. On a cache miss, the data cache DC outputs the missed address to the bus interface unit BIU as an external data address EDA, receives external load data ELD, and then returns the data cache load data DCLD to the load store unit LSU. When data is copied back as a result of a cache miss, or when uncached data is stored externally, the data cache DC outputs that data as external store data ESD and its address as the external data address EDA.

The bus interface unit BIU receives the external instruction address EIA from the instruction cache IC or the external data address EDA from the data cache DC, outputs an external address EA outside the processor core CPU to request data, receives the data as external data ED, and outputs it as the external fetch instruction EI or the external load data ELD, respectively. It also receives the external data address EDA and the external store data ESD from the data cache DC, outputs them outside the processor core CPU as the external address EA and the external data ED, and issues a store request.

FIG. 2 schematically illustrates the execution unit EXU of the processor according to Embodiment 1 of the present invention. The execution unit EXU consists of an arithmetic logic unit ALU, a shifter SFT, a 32-bit flag multiplexer FM32, a 64-bit flag multiplexer FM64, a 32-bit shift-out bit multiplexer M32, a 64-bit shift-out bit multiplexer M64, an output multiplexer OMUX, a 32-bit operation flag T, a 64-bit operation flag U, and a flag multiplexer FMUX. Although not shown, the execution control information EXC from the instruction decode unit IDU is input to each of these components and controls them.

The arithmetic logic unit ALU receives the execution operands EXA and EXB from the instruction decode unit IDU, executes various arithmetic and logical operations according to the execution control information EXC, and outputs an execution result ALO, a 32-bit flag group (signed greater-than GT32, unsigned greater-than GU32, zero Z32, overflow V32, carry C32), and a 64-bit flag group (signed greater-than GT64, unsigned greater-than GU64, zero Z64, overflow V64, carry C64).

The shifter SFT receives the execution operands EXA and EXB from the instruction decode unit IDU, executes various shift operations according to the execution control information EXC, and outputs an execution result SFO, a 32-bit left shift-out bit SL32, a 64-bit left shift-out bit SL64, and a right shift-out bit SR. The 32-bit shift-out bit multiplexer M32 selects the 32-bit left shift-out bit SL32 or the right shift-out bit SR according to the shift direction and outputs it as a 32-bit shift-out flag SF32, which is a member of the 32-bit flag group. Likewise, the 64-bit shift-out bit multiplexer M64 selects the 64-bit left shift-out bit SL64 or the right shift-out bit SR according to the shift direction and outputs it as a 64-bit shift-out flag SF64, which is a member of the 64-bit flag group.
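The shift-out selection described above can be sketched in C as follows. This is only an illustrative model under stated assumptions: the struct and function names are hypothetical, the shift amount is restricted to 1 to 31 so that it is valid for both data sizes, and each shift-out bit is taken to be the last bit shifted out of the operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Shift-out candidates produced by the shifter (signal names from the text). */
typedef struct {
    bool SL32; /* last bit shifted out of bit 31 by a left shift  */
    bool SL64; /* last bit shifted out of bit 63 by a left shift  */
    bool SR;   /* last bit shifted out of bit 0 by a right shift  */
} shift_out_bits;

/* Assumption: 1 <= n <= 31, so the amount is meaningful for both sizes. */
shift_out_bits shifter_outs(uint64_t x, unsigned n)
{
    shift_out_bits o;
    assert(n >= 1 && n <= 31);
    o.SL32 = (x >> (32 - n)) & 1u;
    o.SL64 = (x >> (64 - n)) & 1u;
    o.SR   = (x >> (n - 1)) & 1u;
    return o;
}

/* M32 and M64: choose the left or right candidate by shift direction. */
bool sf32(shift_out_bits o, bool left) { return left ? o.SL32 : o.SR; }
bool sf64(shift_out_bits o, bool left) { return left ? o.SL64 : o.SR; }
```

In this model a right shift drops the same low-order bit for either data size, which is consistent with the text describing a single SR signal shared by M32 and M64.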

The output multiplexer OMUX selects either the execution result ALO or the execution result SFO according to the execution control information EXC and outputs it as the execution result EXO.

The 32-bit flag multiplexer FM32 selects a flag from the 32-bit flag group according to the type of instruction, generates a new 32-bit flag newT, and supplies it as the input to the 32-bit flag T. Similarly, the 64-bit flag multiplexer FM64 selects a flag from the 64-bit flag group according to the type of instruction, generates a new 64-bit flag newU, and supplies it as the input to the 64-bit flag U. The 32-bit flag T and the 64-bit flag U latch these inputs and output them to the flag multiplexer FMUX. The flag multiplexer FMUX selects either the 32-bit flag T or the 64-bit flag U according to the instruction that uses the flag and outputs it as a flag output FO. The flag multiplexer FMUX operates on the latched values to select the flag used by the next instruction. Although not shown, it can receive the control information for that next instruction by using the pre-latch value of the execution control information EXC from the instruction decode unit IDU.
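The flag path of Embodiment 1 can be summarized with a small C model. It is a sketch under assumptions rather than the circuit itself: the selector enum `flag_sel` and the function names are hypothetical, and a single selector is assumed to drive both FM32 and FM64 because the text says both select according to the type of instruction.

```c
#include <stdbool.h>

/* Flag candidates produced by the ALU and shifter for one data size. */
typedef struct { bool GT, GU, Z, V, C, SF; } flag_group;

/* Hypothetical selector values for FM32/FM64, one per flag kind. */
typedef enum { SEL_GT, SEL_GU, SEL_Z, SEL_V, SEL_C, SEL_SF } flag_sel;

/* FM32 / FM64: pick one flag from a group according to the instruction. */
bool flag_mux(flag_group g, flag_sel s)
{
    switch (s) {
    case SEL_GT: return g.GT;
    case SEL_GU: return g.GU;
    case SEL_Z:  return g.Z;
    case SEL_V:  return g.V;
    case SEL_C:  return g.C;
    default:     return g.SF;
    }
}

/* The two flag latches T (32-bit) and U (64-bit). */
typedef struct { bool T, U; } flag_latches;

/* A flag-generating instruction latches newT and newU together. */
void latch_flags(flag_latches *f, flag_group g32, flag_group g64, flag_sel s)
{
    f->T = flag_mux(g32, s); /* newT -> T */
    f->U = flag_mux(g64, s); /* newU -> U */
}

/* FMUX: a flag-using instruction reads either T or U as flag output FO. */
bool fmux(const flag_latches *f, bool use_64bit)
{
    return use_64bit ? f->U : f->T;
}
```

Because both latches are always written, the flag-generating instruction needs no size field; only the flag-using instruction chooses a size.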

Embodiment 1 reduces the overall number of instruction types in the instruction set. It therefore contributes to reducing the instruction code space in a RISC type data processor whose instruction code space has no room to spare, and makes it possible to extend to 64 bits a processor with no spare instruction code space, such as a RISC with a 16-bit fixed-length instruction set.

<< Embodiment 2 >>
FIG. 3 schematically illustrates the flag update prefix instruction according to Embodiment 2 of the present invention. The flag update prefix instruction specifies the flag to be updated, the flag to be used among the flags generated by the subsequent instruction, and the logical operation between the two specified flags. With two kinds of flags, the 32-bit flag T and the 64-bit flag U, 1 bit specifies the flag to be updated and 1 bit specifies the flag to be used among the flags generated by the subsequent instruction. With six kinds of operations, 3 bits specify the logical operation between the two specified flags. The flag update prefix instruction can therefore be specified with a 5-bit operand field and does not require a large instruction code space.

When defined in a 16-bit fixed-length instruction set, as shown in FIG. 3, an 11-bit operation type designation field OPT identifies the instruction as a flag update prefix instruction, a 2-bit source/destination designation field SD specifies the flag to be updated and the flag to be used among the flags generated by the subsequent instruction, and a 3-bit logical operation designation field TYP specifies the logical operation between the two specified flags. There are six kinds of logical operation: logical product AND, logical sum OR, negated logical product ANDN, negated logical sum ORN, exclusive logical sum XOR, and new flag NEW, assigned to TYP field values 000 through 101, respectively. The mnemonic is formed by appending the source flag and the destination flag to the operation name. In the SD field, the upper bit is the source and the lower bit is the destination; 0 designates the 32-bit flag T and 1 designates the 64-bit flag U. In the operation column, newT is the 32-bit flag generated by the subsequent instruction and newU is the 64-bit flag generated by the subsequent instruction. The operators &=, |=, ^=, =, and ~ have the same meaning as in the C language: &= assigns to the left-hand variable the logical product of the right-hand and left-hand values, |= assigns their logical sum, ^= assigns their exclusive logical sum, = assigns the right-hand value to the left-hand variable, and ~ logically inverts the value to its right.
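The field layout and mnemonic rule above can be sketched in C as follows. The text gives only the field widths, so the bit positions used here (OPT in bits 15 to 5, SD in bits 4 to 3, TYP in bits 2 to 0) and the opcode value `PFX_OPT` are assumptions made for illustration, as are all function names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED opcode pattern: the real 11-bit OPT value is not given in the text. */
#define PFX_OPT 0x7FFu

typedef struct { unsigned src, dst, typ; } pfx_fields;

/* ASSUMED layout: OPT = bits 15..5, SD = bits 4..3, TYP = bits 2..0. */
uint16_t pfx_encode(unsigned src, unsigned dst, unsigned typ)
{
    return (uint16_t)((PFX_OPT << 5) | ((src & 1u) << 4) |
                      ((dst & 1u) << 3) | (typ & 7u));
}

/* Returns 1 and fills *f if op is a flag update prefix, otherwise 0. */
int pfx_decode(uint16_t op, pfx_fields *f)
{
    if ((unsigned)(op >> 5) != PFX_OPT)
        return 0;
    f->src = (op >> 4) & 1u; /* upper SD bit: source, 0 = T, 1 = U      */
    f->dst = (op >> 3) & 1u; /* lower SD bit: destination, 0 = T, 1 = U */
    f->typ = op & 7u;        /* 000 = AND ... 101 = NEW                  */
    return 1;
}

/* Mnemonic = operation name + source flag + destination flag. */
int pfx_mnemonic(pfx_fields f, char *buf, size_t n)
{
    static const char *const names[] = { "AND", "OR", "ANDN", "ORN", "XOR", "NEW" };
    if (f.typ > 5)
        return -1; /* 110 and 111 are not assigned in the text */
    return snprintf(buf, n, "%s%c%c", names[f.typ],
                    f.src ? 'U' : 'T', f.dst ? 'U' : 'T');
}
```

With this layout, SD = 01 and TYP = 011 decode to the mnemonic ORNTU, that is, source T and destination U.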

For example, when SD = 00 and TYP = 000, the logical operation is the logical product AND, the flag to be updated (destination flag) is the 32-bit flag T, and the flag to be used among the flags generated by the subsequent instruction (source flag) is also the 32-bit flag. The mnemonic is ANDTT and the operation is T &= newT; U: unchanged. This is therefore a flag update prefix instruction that takes the logical product of the 32-bit flag T and the 32-bit flag generated by the subsequent instruction, stores the result in the 32-bit flag T, and does not update the 64-bit flag U. Without a prefix instruction, the subsequent instruction would update both the 32-bit flag T and the 64-bit flag U with the flags it generates; with the prefix, that update is replaced by the operation the flag update prefix specifies.
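The effect of a prefixed flag update can be modeled by the following C sketch. The TYP numbering follows the 000 to 101 assignment in the text. The ORN behavior (destination |= ~source) follows the ORNTU operation given for FIG. 6, while the ANDN behavior (destination &= ~source) is assumed by analogy. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum { TYP_AND, TYP_OR, TYP_ANDN, TYP_ORN, TYP_XOR, TYP_NEW }; /* 000..101 */
enum { FLAG_T = 0, FLAG_U = 1 };

typedef struct { bool T, U; } flags;

/* Apply the flag update requested by a prefix to the flags that the
 * subsequent instruction generated; only the destination flag changes. */
void prefixed_update(flags *f, unsigned typ, unsigned src, unsigned dst,
                     bool newT, bool newU)
{
    bool s = (src == FLAG_U) ? newU : newT;    /* source: newly generated flag */
    bool *d = (dst == FLAG_U) ? &f->U : &f->T; /* destination: latched flag    */

    assert(typ <= TYP_NEW);
    switch (typ) {
    case TYP_AND:  *d = *d && s;  break; /* d &= s                     */
    case TYP_OR:   *d = *d || s;  break; /* d |= s                     */
    case TYP_ANDN: *d = *d && !s; break; /* d &= ~s (assumed analogy)  */
    case TYP_ORN:  *d = *d || !s; break; /* d |= ~s                    */
    case TYP_XOR:  *d = (*d != s); break; /* d ^= s                    */
    default:       *d = s;        break; /* NEW: d = s                 */
    }
}
```

Calling `prefixed_update(&f, TYP_AND, FLAG_T, FLAG_T, newT, newU)` corresponds to ANDTT: T becomes T & newT and U is left unchanged.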

The structural differences between Embodiment 2 and Embodiment 1 appear in the instruction decode unit IDU and the execution unit EXU, so the representative block configuration of the processor core is the one shown in FIG. 1, as in Embodiment 1.

FIG. 4 schematically illustrates the instruction decode unit IDU of the processor according to Embodiment 2 of the present invention. A scalar processor that issues one instruction per cycle is taken as the example. However, processors that use prefix instructions already exist, as in Document 4, and an engineer of ordinary skill in this field can apply the prefix decoding and issue scheme to other issue forms such as superscalar and out-of-order. This embodiment assumes that the flag update prefix instruction is the only prefix instruction, but an engineer of ordinary skill in this field can also extend it to handle other prefix instructions.

The instruction decode unit IDU consists of a main decoder DEC and a prefix decoder PF-DEC. The main decoder DEC decodes the instruction OP supplied from the instruction fetch unit IFU. It outputs execution control information op-exc to the execution unit EXU as part of the execution control information EXC, the 32-bit flag update control op-wrt and the 64-bit flag update control op-wru to the prefix decoder PF-DEC, the load/store control information LSC to the load store unit LSU, and register file control information RFC to the register file RF. Of the register file control information RFC, the write information is supplied at the time the issued instruction reaches the register write stage.

Based on the register file control information RFC, the register file RF supplies the execution operands EXA and EXB to the execution unit EXU, and the load/store address operands LSA and LSB and the store data SD to the load store unit LSU. It also receives the execution result EXO from the execution unit EXU and the load data LD from the load store unit LSU, and stores them.

The prefix decoder PF-DEC decodes the operation type designation field OPT of the instruction OP, sets a valid flag v if the instruction OP is a flag update prefix, and clears it otherwise. It latches the two bits of the source/destination designation field SD as prefix source flag designation information pfsrc and prefix destination flag information pfdst, respectively, and latches the logical operation designation field TYP as prefix logical operation designation information pftyp. When the instruction OP is a flag update prefix instruction, it is also supplied to the main decoder DEC. In that case the main decoder DEC treats the flag update prefix as a no-operation code and outputs control information that causes the execution unit EXU and the load store unit LSU to do nothing.

In the cycle after the instruction OP was a flag update prefix instruction, the main decoder DEC decodes the subsequent instruction and outputs the various control information described above. Meanwhile, the prefix decoder PF-DEC proceeds using the information latched in the previous cycle. Because the instruction OP was a flag update prefix instruction, the valid flag v is set. The prefix decoder therefore outputs, as logical operation designation information typ, 32-bit flag source information srt, 64-bit flag source information sru, 32-bit flag update control wrt, and 64-bit flag update control wru, respectively: the prefix logical operation designation information pftyp, the prefix source flag designation information pfsrc, the same prefix source flag designation information pfsrc, the condition that the prefix destination flag information pfdst is 0, and the condition that the prefix destination flag information pfdst is 1. As a result, the 32-bit flag update control op-wrt and the 64-bit flag update control op-wru from the main decoder are overridden, and the information from the flag update prefix instruction is output as the control information for flag generation.

On the other hand, in the cycle after the instruction OP was not a flag update prefix instruction, the valid flag v is not set. The prefix decoder therefore outputs, as logical operation designation information typ, 32-bit flag source information srt, 64-bit flag source information sru, 32-bit flag update control wrt, and 64-bit flag update control wru, respectively: 101, 0, 1, the 32-bit flag update control op-wrt, and the 64-bit flag update control op-wru. As a result, the output of the main decoder DEC becomes the output of the instruction decode unit IDU. For typ, srt, and sru, which have no corresponding main decoder DEC output, the values 101, 0, and 1 specify the original operation of the instruction, that is, T = newT and U = newU.
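The selection between prefix-supplied and default control values described in the two paragraphs above can be sketched as a small C function. The signal names (v, pftyp, pfsrc, pfdst, typ, srt, sru, wrt, wru, op-wrt, op-wru) come from the text; the struct and function names are hypothetical, and the sketch models only the flag-related part of EXC.

```c
#include <stdbool.h>

/* Values latched by PF-DEC from the instruction of the previous cycle. */
typedef struct {
    bool v;          /* valid: previous instruction was a flag update prefix */
    unsigned pftyp;  /* latched TYP field                                    */
    unsigned pfsrc;  /* latched source bit of SD, 0 = T, 1 = U               */
    unsigned pfdst;  /* latched destination bit of SD, 0 = T, 1 = U          */
} pf_latch;

/* Flag-related control sent to the execution unit as part of EXC. */
typedef struct {
    unsigned typ;    /* logical operation                      */
    unsigned srt;    /* source select for the T update path    */
    unsigned sru;    /* source select for the U update path    */
    bool wrt, wru;   /* update enables for T and U             */
} flag_ctl;

flag_ctl pf_dec_output(pf_latch p, bool op_wrt, bool op_wru)
{
    flag_ctl c;
    if (p.v) {                    /* prefix overrides the main decoder */
        c.typ = p.pftyp;
        c.srt = p.pfsrc;
        c.sru = p.pfsrc;
        c.wrt = (p.pfdst == 0);   /* update T only if destination is T */
        c.wru = (p.pfdst == 1);   /* update U only if destination is U */
    } else {                      /* no prefix: original behaviour     */
        c.typ = 5;                /* 101 = NEW                         */
        c.srt = 0;                /* T takes newT                      */
        c.sru = 1;                /* U takes newU                      */
        c.wrt = op_wrt;
        c.wru = op_wru;
    }
    return c;
}
```

For an ORNTU prefix (pftyp = 011, pfsrc = 0, pfdst = 1) this yields typ = 3, srt = sru = 0, wrt = false, wru = true, so only U is updated and its source is newT.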

The logical operation designation information typ, the 32-bit flag source information srt, the 64-bit flag source information sru, the 32-bit flag update control wrt, and the 64-bit flag update control wru are output to the execution unit EXU as the execution control information EXC, together with the execution control information op-exc generated by the main decoder DEC.

FIG. 5 schematically illustrates the execution unit EXU of the processor according to Embodiment 2 of the present invention. The parts it shares with the execution unit EXU of Embodiment 1 illustrated in FIG. 2 have the same functions. The added parts are a 32-bit flag source multiplexer S32, a 64-bit flag source multiplexer S64, a 32-bit flag logical operator FL32, and a 64-bit flag logical operator FL64.

The 32-bit flag source multiplexer S32 selects the new 32-bit flag newT or the new 64-bit flag newU according to the 32-bit flag source information srt from the instruction decode unit IDU and supplies it to the 32-bit flag logical operator FL32. The 32-bit flag logical operator FL32 performs a logical operation on this value and the 32-bit flag T according to the logical operation designation information typ, and the result becomes the new value to be latched in the 32-bit flag T. Similarly, the 64-bit flag source multiplexer S64 selects the new 32-bit flag newT or the new 64-bit flag newU according to the 64-bit flag source information sru from the instruction decode unit IDU and supplies it to the 64-bit flag logical operator FL64. The 64-bit flag logical operator FL64 performs a logical operation on this value and the 64-bit flag U according to the logical operation designation information typ, and the result becomes the new value to be latched in the 64-bit flag U.

As described above, the instruction decode unit IDU and the execution unit EXU according to the second embodiment of the present invention make it possible, by means of a flag update prefix instruction that does not require a large instruction code space, to suppress updates of a flag that is to be retained and to perform logical operations between flags generated by a plurality of instructions.

Next, the effect of the flag update prefix instruction is explained with a concrete example. FIG. 6 schematically illustrates an operation example of the processor according to the second embodiment of the present invention. The C program in FIG. 6 executes the code inside { } if the 64-bit pointer p is not a NULL pointer and the 32-bit variable i is greater than 10. A NULL pointer points to nothing and has the value 0.

When this C program is written in assembly language that includes the flag update prefix instruction, it takes four instructions, as shown in FIG. 6. In the first step, CMP/EQ p, 0 compares the 64-bit pointer p with the NULL pointer value 0 at 64-bit size and stores the comparison result in the 64-bit flag U. The 64-bit flag U is set when p is a NULL pointer; that is, U = (p == NULL). At the same time, the result of comparing the lower 32 bits of p with those of the NULL pointer value 0 is stored in the 32-bit flag T, but this program does not use it. In the second step, the flag update prefix instruction ORNTU is decoded. In the third step, CMP/GT i, 10 sets the new 32-bit flag newT if the 32-bit variable i is greater than 10. Because of the flag update prefix instruction ORNTU, U |= ~newT, so U = (p == NULL) | ~(i > 10). The 32-bit flag T remains unchanged. As a result, the 64-bit flag U holds the inverted value of the conditional expression of the C program's if statement. In the fourth step, BT.D _after_if_close branches to the point after the if statement when U is 1, that is, when the conditional expression is false, so the body of the if statement is not executed.
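The four steps can be traced with a small C model that tracks only the two flags. It is a sketch for checking the logic, not a description of the hardware: the function names are hypothetical, and the 1-bit flags are modeled as bool, so the ~ in the text becomes a logical NOT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Models the four-instruction sequence and returns true when the final
 * branch is taken, i.e. when the body of the if statement is skipped. */
static bool branch_skips_if_body(const void *p, int32_t i)
{
    uint64_t pv = (uint64_t)(uintptr_t)p;
    bool T, U, newT;

    /* Step 1: CMP/EQ p, 0 sets U from the 64-bit compare and T from the
     * compare of the lower 32 bits. */
    U = (pv == 0);
    T = ((uint32_t)pv == 0);

    /* Step 2: ORNTU is only decoded; it modifies the flag update of the
     * instruction that follows. */

    /* Step 3: CMP/GT i, 10 produces newT; because of ORNTU the result is
     * merged into U as U |= !newT, and T is left unchanged. */
    newT = (i > 10);
    U = U || !newT;

    /* Step 4: the branch is taken when U is 1. */
    (void)T; /* T is not used by this program */
    return U;
}

/* The condition of the C program in FIG. 6 as described in the text. */
static bool c_condition(const void *p, int32_t i)
{
    return p != NULL && i > 10;
}

/* The branch must be taken exactly when the condition is false. */
static void check_case(const void *p, int32_t i)
{
    assert(branch_skips_if_body(p, i) == !c_condition(p, i));
}
```

Calling check_case with a NULL and a non-NULL pointer, each with i on both sides of 10, covers all four combinations of the two comparison results.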

As described above, the flag update prefix instruction allows a plurality of comparison results to be combined, so the condition test completes with a single conditional branch. Without the flag update prefix instruction, a conditional branch is needed for every condition test, which is difficult to speed up. Alternatively, the generated flags can be transferred to general-purpose registers and combined there. In that case, instead of the flag update prefix instruction in the second step, the flag transfer instruction MOVU R0 is executed to transfer the generated U flag to general-purpose register R1. Then, before the conditional branch in the fourth step, the flag transfer instruction MOVT R0 is executed to transfer the generated T flag to general-purpose register R1, NOT R0 inverts it, AND #1, R0 clears the upper bits, and OR R0, R1 generates (p == NULL) | ~(i > 10). Finally, SHLR R1 stores (p == NULL) | ~(i > 10) in the 32-bit flag T. The number of instructions therefore increases by four, doubling the count and degrading performance. In this way, the flag update prefix instruction can speed up complex condition tests.
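The register-based alternative can be modeled in the same way to confirm that it produces the same value as the prefix instruction. The sketch assumes that the U flag ends up in R1 and the T flag in R0, which is what the subsequent NOT R0, AND #1, R0, and OR R0, R1 operate on, and that SHLR moves the shifted-out bit into T as the text describes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Combines the two flags through general-purpose registers.
 * U is the flag from CMP/EQ p, 0; newT is the T flag from CMP/GT i, 10. */
static bool combine_via_registers(bool U, bool newT)
{
    uint64_t r0, r1;
    bool T;

    r1 = U ? 1u : 0u;     /* MOVU: copy the U flag into R1 (assumed destination) */
    r0 = newT ? 1u : 0u;  /* MOVT: copy the T flag into R0 (assumed destination) */
    r0 = ~r0;             /* NOT R0: inverts all 64 bits */
    r0 &= 1u;             /* AND #1, R0: clear everything above bit 0 */
    r1 |= r0;             /* OR R0, R1: (p == NULL) | !(i > 10) */
    T = (r1 & 1u) != 0;   /* SHLR R1: the bit shifted out goes to T */
    r1 >>= 1;
    (void)r1;             /* the shifted register value is not needed */
    return T;
}

/* The same result as produced by ORNTU in one flag update. */
static bool combine_via_prefix(bool U, bool newT)
{
    return U || !newT;
}

/* Exhaustive check over the four flag combinations. */
static void check_equivalence(void)
{
    for (int u = 0; u <= 1; u++)
        for (int t = 0; t <= 1; t++)
            assert(combine_via_registers(u != 0, t != 0) ==
                   combine_via_prefix(u != 0, t != 0));
}
```

Both routes compute the same flag; the difference the text points to is the number of instructions needed before the branch.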

The invention made by the present inventor has been described specifically above on the basis of embodiments, but it goes without saying that the present invention is not limited to them and can be modified in various ways without departing from its gist. For example, the flag update prefix instruction, as typified by ORNTU, has been described as having the functions of designating the flag to be updated by a flag generated by the subsequent instruction, designating which of the flags generated by the subsequent instruction is to be used, and designating the logical operation between the two designated flags. The present invention is not limited to this; the instruction may have only the function of designating which of the previously generated flags, each corresponding to a data size, is to be updated by a flag generated by the subsequent instruction.

IC: instruction cache
IFU: instruction fetch unit
IDU: instruction decode unit
EXU: execution unit
LSU: load/store unit
DC: data cache
BIU: bus interface unit
IA: instruction address
EIA: external instruction address
OP: instruction
BRC: branch control signal
EXC: execution control information
LSC: load/store control information
RF: register file
EXA, EXB: execution operands
LSA, LSB: load/store address operands
SD: store data
EXO: execution result
DA: data address
DCSD: data cache store data
DCLD: data cache load data
ELD: external load data
EA: external address
ED: external data
EI: externally fetched instruction
ESD: external store data
ALU: arithmetic logic unit
SFT: shifter
FM32: 32-bit flag multiplexer
FM64: 64-bit flag multiplexer
M32: 32-bit shift-out multiplexer
M64: 64-bit shift-out bit multiplexer
OMUX: output multiplexer
T: flag for 32-bit operations
U: flag for 64-bit operations
newT: new 32-bit flag
newU: new 64-bit flag
FMUX: flag multiplexer
ALO, SFO: execution results
GT32: signed greater-than flag for 32-bit data size
GU32: unsigned greater-than flag for 32-bit data size
Z32: zero flag for 32-bit data size
V32: overflow flag for 32-bit data size
C32: carry flag for 32-bit data size
GT64: signed greater-than flag for 64-bit data size
GU64: unsigned greater-than flag for 64-bit data size
Z64: zero flag for 64-bit data size
V64: overflow flag for 64-bit data size
C64: carry flag for 64-bit data size
SL32: 32-bit left shift-out bit
SL64: 64-bit left shift-out bit
SR: right shift-out bit
SF32: 32-bit shift-out flag
SF64: 64-bit shift-out flag
S32: 32-bit flag source multiplexer
S64: 64-bit flag source multiplexer
FL32: 32-bit flag logic operator
FL64: 64-bit flag logic operator

Claims

A data processor having, in its instruction set, an arithmetic instruction capable of performing arithmetic processing and generating a plurality of flags,
the data processor further having, in the instruction set, a prefix instruction that designates, in addition to the flag to be updated, among the plurality of flags generated by the arithmetic instruction, by a flag generated by a subsequent instruction, the flag to be used among the flags generated by the subsequent instruction being modified, and the logical operation between the two designated flags.