JP2006227777A - Data processor

Info

Publication number: JP2006227777A
Application number: JP2005038760A
Authority: JP
Inventors: Toru Hiraoka; 徹平岡; Kimihiro Sugino; 貴美広杉野; Kesami Hagiwara; 今朝巳萩原; Koji Kobayashi; 浩二小林
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 2005-02-16
Filing date: 2005-02-16
Publication date: 2006-08-31
Anticipated expiration: 2025-02-16
Also published as: JP4737592B2

Abstract

PROBLEM TO BE SOLVED: To enhance the protection of a customer program in data processing technology.

SOLUTION: A data processor is configured to include a central processing unit (1600) capable of executing instruction codes; an instruction cache (100) capable of holding encrypted instruction codes; and instruction code decryption logic (300) that is arranged between the central processing unit and the instruction cache, fetches an encrypted instruction code through the instruction cache, decrypts it, and supplies it to the central processing unit. The instruction cache therefore holds only encrypted instruction codes, decrypted instruction codes are never stored in it, and the protection of the customer program is enhanced.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to data processing technology, and more particularly to a technique for enhancing the protection of a customer program.

As a technique for protecting software and data, it is known to add an encryption/decryption unit and a key storage unit to a central processing unit. Encrypted machine-language instructions and data stored in a storage device are decrypted by the decryption unit and executed by the central processing unit, and when data is stored from the central processing unit into the storage device, it is first encrypted by the encryption unit (see, for example, Patent Document 1).

Another known technique aims to achieve both security and high-speed processing when protecting software and the like in an information processing apparatus, so that encryption strength and computer throughput are not traded off against each other. Encrypted instructions and data stored in a second storage means are decrypted and stored in a first storage means, where they are executed and processed by a data processing means. When data is output from the data processing means to the second storage means, it is first stored in the first storage means, then encrypted by an encryption means and stored in the second storage means (see, for example, Patent Document 2).

Patent Document 1: Japanese Patent Laid-Open No. 2-155034 (FIG. 1)
Patent Document 2: Japanese Patent Laid-Open No. 9-259044 (FIG. 1)

Conventionally, an encrypted instruction code in memory was first decrypted and placed in an instruction cache or user RAM (random access memory), and the instruction was then processed by the central processing unit. With this approach the decrypted instruction code resides in the instruction cache or user RAM, so a debugging tool or a malicious program may compromise the confidentiality of the customer program. It is therefore necessary to decrypt the instruction code between the instruction cache or user RAM and the CPU. The present inventors studied this and found that when decryption is performed immediately before the central processing unit processes the instruction, the long processing time required for decryption significantly degrades instruction processing performance.

An object of the present invention is to enhance the protection of customer programs.

Another object of the present invention is to provide a technique for enhancing the protection of customer programs without degrading instruction processing performance.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

A representative one of the inventions disclosed in the present application is briefly outlined as follows.

That is, a data processing apparatus is configured to include a central processing unit capable of executing instruction codes, an instruction cache capable of holding encrypted instruction codes, and instruction code decryption logic that is arranged between the central processing unit and the instruction cache, fetches the encrypted instruction code via the instruction cache, decrypts it, and supplies it to the central processing unit.

With the above means, because the instruction code decryption logic is provided between the instruction cache and the central processing unit, the instruction cache holds encrypted instruction codes and decrypted instruction codes are never stored in it. This achieves enhanced protection of the customer program.

In this case, to avoid overhead in instruction execution during sequential instruction reads or when a branch instruction occurs, the instruction code decryption logic should decrypt the encrypted instruction code by pipeline processing.

To hide the overhead of decrypting the branch destination instruction code when a branch instruction occurs, a decryption conversion buffer capable of holding the decrypted branch destination instructions in association with the branch destination instruction address should be provided in the central processing unit.

To hide both the overhead of instruction code decryption and the overhead of reading the branch destination instruction when a branch instruction occurs, a branch destination address buffer should be provided in the central processing unit. This buffer serves as a dynamic branch prediction mechanism that outputs the branch destination address of an instruction using the instruction fetch address as a key, and instruction fetch should be executed speculatively via it.

Furthermore, the data processing apparatus configured as above can be formed on a single semiconductor substrate as a microcomputer.

The effect obtained by a representative one of the inventions disclosed in the present application is briefly described as follows.

That is, a technique for enhancing the protection of customer programs can be provided.

FIG. 10 shows a microcomputer as an example of a data processing apparatus according to the present invention.

The data processing apparatus shown in FIG. 10 includes, although not particularly limited to these, an instruction cache (INS-CACHE) 100, instruction code decryption logic (INS-DEC) 300, a central processing unit (CPU) 1600, and a memory (MEM) 1500, and is formed on a single semiconductor substrate, such as a single-crystal silicon substrate, by known semiconductor integrated circuit manufacturing technology. The encrypted instruction code is transmitted to the instruction code decryption logic 300 via the instruction cache 100. The instruction code decryption logic 300 decrypts the encrypted instruction code immediately before the CPU 1600 and supplies it to the CPU 1600, which executes the decrypted instruction. The encrypted instruction code may, for example, be encrypted by the CPU 1600 or a coprocessor (not shown) and stored in the MEM 1500, be stored in the MEM 1500 in advance, or be transmitted in encrypted form from outside the data processing apparatus.
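The arrangement of FIG. 10 can be illustrated with a minimal Python sketch. The source does not name a cipher, so the XOR transform, the key value, and all class and function names below are assumptions made purely for illustration. The sketch only demonstrates the structural point: the cache is filled with whatever the memory holds (ciphertext), and plaintext appears only on the path from the cache to the CPU.

```python
from dataclasses import dataclass, field

KEY = 0x5A  # hypothetical one-byte key; the source does not specify a cipher


def toy_cipher(block: bytes, key: int = KEY) -> bytes:
    """XOR stand-in for the unspecified cipher (XOR is its own inverse)."""
    return bytes(b ^ key for b in block)


@dataclass
class InstructionCache:
    """Models INS-CACHE 100: it stores exactly what the memory holds."""
    memory: dict                              # models MEM 1500: address -> encrypted block
    lines: dict = field(default_factory=dict)

    def read(self, addr: int) -> bytes:
        if addr not in self.lines:            # miss: fill from memory, still encrypted
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]


def fetch_for_cpu(cache: InstructionCache, addr: int) -> bytes:
    """Models INS-DEC 300: decrypts between the cache and the CPU."""
    return toy_cipher(cache.read(addr))


plain = bytes([0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0])  # example instruction bytes
cache = InstructionCache(memory={0x1000: toy_cipher(plain)})
cpu_view = fetch_for_cpu(cache, 0x1000)
```

After the fetch, `cpu_view` equals the plaintext while `cache.lines[0x1000]` still holds the encrypted bytes, which is why dumping the cache through a debugger would not reveal the program.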

FIG. 1 shows a detailed configuration example of the CPU 1600.

As shown in FIG. 1, the CPU 1600 comprises an instruction queue 500, an instruction decoder (DEC) 600, an operand address adder (OP-ADR-ADD) 700, an operand cache (OP-CACHE) 800, operand data decryption logic (OP-DEC) 900, an instruction execution unit (INS-PRA) 1000, operand data encryption logic (OP-COD) 1100, an instruction address adder (INS-ADR-ADD) 1200, sequential instruction fetch address generation logic (ADR-CRE) 1300, and selectors 90-1, 90-2, and 90-3.

The instruction queue 500 stores 32 bytes of instructions transmitted from the instruction code decryption logic 300. The selector 90-1 selectively transmits the output of the instruction code decryption logic 300 or the output of the instruction queue 500 to the instruction decoder 600. The instruction decoder 600 decodes the instruction transmitted via the selector 90-1. The operand address adder 700 generates the operand address of an instruction. The operand cache 800 is referenced using this operand address. The operand data decryption logic 900 decrypts the encrypted data output from the operand cache 800. The instruction execution unit 1000 executes the instruction decoded by the instruction decoder 600, and its execution result is transmitted to the selector 90-3 and the operand data encryption logic 1100. The instruction address adder 1200 calculates the branch destination address of a branch instruction. The sequential instruction fetch address generation logic 1300 calculates the address for sequential instruction reads.

To fetch an instruction, the selector 90-2 selectively transmits the output of the instruction address adder 1200 or the output of the sequential instruction fetch address generation logic 1300 to the instruction cache 100. If the instruction to be fetched is present in the instruction cache 100, it is transmitted to the instruction code decryption logic 300. If it is not present, the corresponding instruction is read from the memory 1500 and is stored in the instruction cache 100 at that time.

In FIG. 1, the labels I1, I2, I-DC through WB, O-DC, and so on indicate the correspondence with pipeline stages in instruction processing. I1 and I2 are the instruction cache reference stages; I-DC is the decryption stage for the instruction code read from the instruction cache 100; IQ is the stage in which an instruction waits in the instruction queue; ID is the instruction decode stage; E1 is the stage for register read, branch destination instruction address addition, and operand address addition; E2 and E3 are the operation execution stages or operand cache reference stages; O-EC is the encryption stage for operand store data; O-DC is the decryption stage for load data from the operand cache 800; and WB is the write-back of the instruction execution result. The I-DC, O-EC, and O-DC stages usually each require several cycles or more.

Assume that each instruction is 2 bytes long and that one instruction fetch reads four instructions (8 bytes) of instruction code from the instruction cache. When instruction code is read from the memory, it is stored in the instruction cache 100 via the bus 200. For sequential instruction processing, the instruction cache 100 is referenced using the address generated by the sequential instruction fetch address generation logic 1300. The instruction code read from the instruction cache 100 is transferred to the instruction code decryption logic 300 and decrypted. The decrypted instruction code is stored in the instruction queue 500 and is also transferred to the instruction decoder 600, where the instruction is decoded.

If the decoded instruction is a load-type instruction, the operand address adder 700 generates an operand address and the operand cache 800 is referenced based on that address. The operand read from the operand cache 800 is decrypted by the operand data decryption logic 900 and stored in a register (not shown) in the CPU 1600. If the decoded instruction is a store-type instruction, data read from a register (not shown) passes through the instruction execution unit 1000, is encrypted by the operand data encryption logic 1100, and is then written to the memory 1500. If the decoded instruction is an arithmetic-type instruction, the instruction execution unit 1000 performs the operation and the result is written to a register (not shown) via the selector 90-3.
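The load and store paths described above can be sketched as follows. This is an illustrative model only: the XOR transform stands in for the unspecified operand cipher, a single dictionary stands in for both the operand cache 800 and the memory 1500, and the class, method, and register names are hypothetical.

```python
KEY = 0xA5  # hypothetical key; the source does not specify a cipher or key size


def toy_cipher(data: bytes) -> bytes:
    """XOR stand-in for OP-COD 1100 and OP-DEC 900 (XOR is its own inverse)."""
    return bytes(b ^ KEY for b in data)


class OperandPath:
    def __init__(self):
        self.backing = {}     # operand cache 800 / memory 1500: ciphertext only
        self.registers = {}   # CPU registers: plaintext

    def store(self, reg: str, addr: int) -> None:
        # Store-type instruction: register -> encryption logic -> memory
        self.backing[addr] = toy_cipher(self.registers[reg])

    def load(self, reg: str, addr: int) -> None:
        # Load-type instruction: operand cache -> decryption logic -> register
        self.registers[reg] = toy_cipher(self.backing[addr])


path = OperandPath()
path.registers["r0"] = b"\x00\x00\x00\x2a"
path.store("r0", 0x2000)
path.load("r1", 0x2000)
```

In this model the value round-trips through `r1` unchanged, while the stored copy at address `0x2000` is never held in plaintext outside the registers.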

FIG. 2 shows the instruction processing timing in the above configuration.

In FIG. 2, the 8 bytes of instructions 1 through 4 are read from the instruction cache 100 in cycles 1 and 2. The 8 bytes of instruction code read out are decrypted in cycles 3 through 8. The decrypted instruction code is transferred to the instruction queue 500, and the first instruction, instruction 1, is also transferred to the instruction decoder 600. Instruction 1 is decoded in cycle 9. For convenience of explanation, instructions 1 through 9 and instructions 10 through 17 are assumed to be arithmetic-type instructions. Instruction 1 then proceeds through the pipeline, and its result is written in cycle 13. Instruction 2 is transferred from the instruction queue 500 to the instruction decoder 600 and decoded in cycle 10. Likewise, instruction 3 is decoded in cycle 11 and instruction 4 in cycle 12.

Next, the 8 bytes of instructions 5 through 8 are read from the instruction cache 100 in cycles 2 and 3. In cycle 4, however, the instruction code decryption logic 300 is still decrypting instructions 1 through 4, so decryption of instructions 5 through 8 must wait until cycle 9. It starts in cycle 9 and completes in cycle 14. The decrypted instruction code is transferred to the instruction queue 500, and instruction 5 is transferred to the instruction decoder 600. Instruction 5 is therefore decoded in cycle 15 and its result is written in cycle 19. Thus an overhead (processing delay) of two cycles arises between the completion of instruction 4 (cycle 16) and the completion of instruction 5 (cycle 19); without the stall, instruction 5 would have completed in cycle 17. A two-cycle overhead likewise arises between instruction 8 and instruction 9.

The branch instruction is decoded in cycle 22, and the instruction address adder 1200 determines the branch destination instruction address in cycle 23. Using that address, the instruction cache 100 is referenced in cycles 24 and 25. Decryption of the branch destination instruction code (instructions 10 through 13) starts in cycle 26 and completes in cycle 31. Instruction 10, the branch destination instruction, is decoded in cycle 32 and its result is written in cycle 36.
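The cycle numbers quoted for FIG. 2 can be reproduced with a small timing model. The sketch below is an illustration, not the patent's own method: it assumes that fetch group g (counting from 0) finishes its cache read in cycle g + 2, that decryption takes six cycles and handles one group at a time, that one instruction is decoded per cycle, and that write-back follows decode by four cycles (ID, E1, E2, E3, WB). The function and constant names are hypothetical.

```python
DECRYPT_CYCLES = 6   # I-DC occupies cycles 3 to 8 for the first group
GROUP_SIZE = 4       # four 2-byte instructions per 8-byte fetch
WB_OFFSET = 4        # ID -> E1 -> E2 -> E3 -> WB


def non_pipelined_schedule(num_groups: int):
    """Return (decode_cycle, writeback_cycle) per instruction, in program order."""
    schedule = []
    decrypt_busy_until = 0   # last cycle in which the decryption logic is occupied
    last_decode = 0
    for g in range(num_groups):
        read_done = g + 2                                   # I1/I2 of group g end here
        start = max(read_done + 1, decrypt_busy_until + 1)  # wait for the previous group
        decrypt_busy_until = start + DECRYPT_CYCLES - 1
        for _ in range(GROUP_SIZE):
            last_decode = max(decrypt_busy_until + 1, last_decode + 1)
            schedule.append((last_decode, last_decode + WB_OFFSET))
    return schedule


sched = non_pipelined_schedule(3)
print(sched[0])  # instruction 1 -> (9, 13)
print(sched[3])  # instruction 4 -> (12, 16)
print(sched[4])  # instruction 5 -> (15, 19)
```

Under these assumptions the gap between the write-backs of instructions 4 and 5 is three cycles instead of the usual one, which corresponds to the two-cycle overhead stated in the text. The tenth entry of the schedule decodes in cycle 22, matching the branch instruction.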

The above example provides the following effects.

Because the instruction code decryption logic 300 is arranged between the instruction cache 100 and the instruction queue 500, the instruction cache 100 holds encrypted instruction codes and decrypted instruction codes are never stored in it, so the protection of the customer program can be strengthened. That is, even if the contents of the instruction cache 100 are displayed by a debugging mechanism, or the instruction codes in the instruction cache 100 are transferred to another storage device, those instruction codes cannot be deciphered, and the customer program is thereby protected.

FIG. 3 shows another configuration example of the CPU 1600.

The CPU 1600 shown in FIG. 3 differs from that shown in FIG. 1 mainly in two respects. First, the instruction code decryption logic 300 decrypts the encrypted instruction code by pipeline processing. Second, a decryption conversion buffer (DTB) 400 is provided, which associatively holds, keyed by the branch destination instruction address, the decrypted instruction codes corresponding to the eight encrypted instructions at the branch destination.

In FIG. 3, I1 and I2 are the instruction cache reference stages; I-dc1 through I-dc6 are the decryption stages for the instruction code read from the instruction cache 100; IQ is the stage in which an instruction waits in the instruction queue; ID is the instruction decode stage; E1 is the stage for register read, branch destination instruction address addition, and operand address addition; E2 and E3 are the operation execution stages or operand cache reference stages; O-ec1 through O-ec6 are the encryption stages for operand store data; O-dc1 through O-dc6 are the decryption stages for load data from the operand cache 800; and WB is the write-back of the instruction execution result. The I-dc1 through I-dc6, O-ec1 through O-ec6, and O-dc1 through O-dc6 stages are pipelined versions of the decryption and encryption processing that usually requires several cycles or more.

The instruction code read from the instruction cache 100 is transferred via the selector 90-4 to the instruction code decryption logic 300 and decrypted. The decrypted instruction code output from the instruction code decryption logic 300 is stored in the instruction queue 500 and is also transferred to the instruction decoder 600, where the instruction is decoded. In addition, the eight instructions at the branch destination of a branch instruction are, after being decrypted by the instruction code decryption logic 300, stored in the decryption conversion buffer 400 paired with the corresponding branch destination instruction address.

If the decoded instruction is a load-type instruction, the operand address adder 700 generates an operand address and the operand cache 800 is referenced. The operand read from the operand cache 800 is decrypted by the operand data decryption logic 900 and stored in a register (not shown) in the CPU 1600. If the decoded instruction is a store-type instruction, data read from a register (not shown) passes through the instruction execution unit 1000, is encrypted by the operand data encryption logic 1100, and is then written to the memory 1500. If the decoded instruction is an arithmetic-type instruction, the instruction execution unit 1000 performs the operation and the result is written to a register (not shown).

FIG. 4 shows the instruction processing timing in the above configuration.

The 8 bytes of instructions 1 through 4 are read from the instruction cache 100 in cycles 1 and 2. The 8 bytes of instruction code read out are decrypted in cycles 3 through 8. The decrypted instruction code is transferred to the instruction queue 500 via the selector 90-4, and the first decrypted instruction, instruction 1, is transferred to the instruction decoder 600. Instruction 1 is decoded in cycle 9. For convenience of explanation, instructions 1 through 9 and instructions 10 through 21 are assumed to be arithmetic-type instructions. Instruction 1 then proceeds through the pipeline, and its result is written in cycle 13. Instruction 2 is transferred from the instruction queue 500 to the instruction decoder 600 and decoded in cycle 10. Likewise, instruction 3 is decoded in cycle 11 and instruction 4 in cycle 12.

Next, the 8 bytes of instructions 5 through 8 are read from the instruction cache 100 in cycles 2 and 3. Decryption of instructions 5 through 8 starts in cycle 4 and completes in cycle 9. The decrypted instruction code is transferred to the instruction queue 500. Instruction 5 is therefore decoded in cycle 13 and its result is written in cycle 17. Processing continues in the same way up to instruction 9.

The branch instruction is decoded in cycle 18, and the instruction address adder 1200 determines the branch destination instruction address in cycle 19. Using that address, the instruction cache 100 is referenced in cycles 20 and 21. Decryption of the branch destination instruction code (instructions 10 through 13) starts in cycle 22 and completes in cycle 27. Meanwhile, the branch destination instruction address and the address "branch destination instruction address + 8" are also sent to the decryption conversion buffer 400. In cycle 20 it is determined whether the decrypted branch destination instruction code corresponding to the branch destination address is present in the decryption conversion buffer 400. If it is present, the decrypted branch destination instruction code is read from the decryption conversion buffer 400 in cycle 21. The instruction code read out is transferred to the instruction queue 500 via the selector 90-4, and the first instruction, instruction 10, is transferred to the instruction decoder 600. Also in cycle 21, it is determined whether the decrypted instruction code corresponding to "branch destination instruction address + 8" is present in the decryption conversion buffer 400. If it is present, the decrypted instruction code is read from the decryption conversion buffer 400 in cycle 22 and transferred to the instruction queue 500 via the selector 90-4.

Instruction 10, the branch destination instruction, is decoded in cycle 22 and its result is written in cycle 26. Thus an overhead of three cycles arises between the branch instruction and instruction 10: the time needed to calculate the branch destination address in the instruction address adder 1200 and to read the branch destination instruction code from the decryption conversion buffer 400. If the instruction code of the branch destination instruction is not present in the decryption conversion buffer 400, decryption of the branch destination instruction code completes in cycle 27, so instruction 10 can be decoded in cycle 28.

Instructions 11 through 17 are then executed in the same way. The instruction code of instructions 10 through 13 is read from the instruction cache 100 in cycle 21 and that of instructions 14 through 17 in cycle 22, so the instruction code of instructions 18 through 21 is read in cycle 23. Decryption of that instruction code starts in cycle 24 and completes in cycle 29. The decrypted instruction code (instructions 18 through 21) is transferred to the instruction queue 500, and the first instruction, instruction 18, is transferred to the instruction decoder 600. Instruction 18 is decoded in cycle 30 and its result is written in cycle 34. With this processing, no overhead arises in the processing of instructions 10 through 21.
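The FIG. 4 timing can be checked with a similar model. This sketch rests on assumptions that are not spelled out as formulas in the source: fetch group g (counting from 0) finishes its cache read in cycle g + 2, pipelined decryption adds six cycles per group without blocking the next group, and one instruction is decoded per cycle. The function names and the way the branch latency is broken down are hypothetical.

```python
DECRYPT_CYCLES = 6   # stages I-dc1 to I-dc6
GROUP_SIZE = 4       # four 2-byte instructions per 8-byte fetch


def pipelined_decode_cycles(num_groups: int):
    """Decode cycle of each sequential instruction with pipelined decryption."""
    cycles = []
    last = 0
    for g in range(num_groups):
        decrypt_end = (g + 2) + DECRYPT_CYCLES   # cache read ends at g + 2, then six stages
        for _ in range(GROUP_SIZE):
            last = max(decrypt_end + 1, last + 1)
            cycles.append(last)
    return cycles


def branch_target_decode_cycle(branch_decode: int, dtb_hit: bool) -> int:
    """Cycle in which the first branch destination instruction is decoded."""
    addr_ready = branch_decode + 1               # E1: instruction address adder 1200
    if dtb_hit:
        return addr_ready + 3                    # hit check, DTB read, then ID
    cache_read_done = addr_ready + 2             # I1, I2
    return cache_read_done + DECRYPT_CYCLES + 1  # six decryption stages, then ID


seq = pipelined_decode_cycles(3)
print(seq[4])                                   # instruction 5 -> 13
print(branch_target_decode_cycle(18, True))     # DTB hit  -> 22
print(branch_target_decode_cycle(18, False))    # DTB miss -> 28
```

With the branch decoded in cycle 18, a hit gives a gap of four cycles to the decode of instruction 10, that is, three cycles beyond back-to-back issue, matching the three-cycle overhead in the text. A miss costs six more cycles because the decryption stages are then on the critical path.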

In the above example, because the instruction code decryption logic 300 is provided between the instruction cache 100 and the instruction queue, the instruction cache 100 holds encrypted instruction codes and decrypted instruction codes are never stored in it, so the protection of the customer program can be strengthened. In addition, because decryption in the instruction code decryption logic 300 is pipelined, the overhead of sequential instruction reads can be hidden. Furthermore, because the decryption conversion buffer 400 stores the decrypted branch destination instructions in association with the branch destination address, the overhead of decryption when a branch instruction occurs can be hidden.

FIG. 5 shows a configuration example of the decryption conversion buffer 400.

The decryption conversion buffer 400 shown in FIG. 5 includes a tag section and a data section. Although not particularly limited, a direct-mapped organization is adopted, with a 704-byte tag section and a 2-Kbyte data section. The decryption conversion buffer 400 may instead be 2-way or 4-way set associative, or fully associative.
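
The stated sizes follow from the address fields given in the description of FIG. 6 (256 columns indexed by address bits [10:3], a tag holding bits [31:11], and one valid bit). The short calculation below is only an arithmetic check under that reading. The variable names are hypothetical, and the per-column data width of 8 bytes is inferred from the 2-Kbyte total rather than stated in the text.

```python
COLUMNS = 2 ** 8            # 8 index bits: address bits [10:3]
TAG_BITS = 31 - 11 + 1      # address bits [31:11] -> 21 bits
VALID_BITS = 1              # valid bit [V]

# Tag section: one tag plus one valid bit per column, expressed in bytes.
tag_section_bytes = COLUMNS * (TAG_BITS + VALID_BITS) // 8

# Data section: 2 Kbytes in total, shared evenly by the 256 columns.
data_section_bytes = 2 * 1024
bytes_per_column = data_section_bytes // COLUMNS

print(tag_section_bytes, bytes_per_column)
```

The result is 704 bytes for the tag section and 8 bytes of decrypted instruction code per column, which is consistent with the lowest index bit being bit 3.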

FIG. 6 shows the reference (Read) and write (Write) operations of the decryption conversion buffer 400 shown in FIG. 5.

On a reference, the 8 bits [10:3] of the branch destination instruction address select one of 256 columns, from which the tag and data are read. The tag that is read contains bits [31:11] of the instruction address corresponding to the instruction code registered in the decryption conversion buffer, together with a valid bit [V] indicating that the entry is valid. The comparator (Cmp.) compares bits [31:11] of the branch destination address with bits [31:11] of the tag that was read. When the valid bit is logical "1" and the two [31:11] fields match, the decrypted instruction code exists in the decryption conversion buffer 400, so the instruction code read from the decryption conversion buffer 400 is transferred to the instruction queue 500 and the instruction decoder 600. When the decrypted instruction code does not exist in the decryption conversion buffer 400, the instruction code decrypted by the instruction code decryption logic 300 is transferred to the instruction queue 500 and the instruction decoder 600, and is also transferred to the decryption conversion buffer 400 and written there. At that time, the 8 bits [10:3] of the instruction address corresponding to the decrypted instruction code select the one of the 256 columns to be written. Bits [31:11] of the instruction address are written to the tag, logical "1" is written to the valid bit, and the decrypted instruction code is written to the same column.
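
The read and write behavior described above can be illustrated in software. The following is a minimal sketch of a direct-mapped buffer with the stated index bits [10:3], tag bits [31:11], and valid bit. It is a behavioral illustration only, not the hardware of FIG. 5 and FIG. 6. The class and function names, and the `decrypt_from_cache` callable standing in for the instruction cache 100 plus the instruction code decryption logic 300, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

NUM_COLUMNS = 256  # selected by address bits [10:3]


@dataclass
class Column:
    valid: bool = False  # valid bit [V]
    tag: int = 0         # address bits [31:11]
    code: bytes = b""    # decrypted instruction code


class DecryptionConversionBuffer:
    def __init__(self) -> None:
        self.columns: List[Column] = [Column() for _ in range(NUM_COLUMNS)]

    @staticmethod
    def index_of(address: int) -> int:
        return (address >> 3) & 0xFF        # bits [10:3]

    @staticmethod
    def tag_of(address: int) -> int:
        return (address >> 11) & 0x1FFFFF   # bits [31:11]

    def read(self, address: int) -> Optional[bytes]:
        """Return the decrypted code on a hit, or None on a miss."""
        column = self.columns[self.index_of(address)]
        if column.valid and column.tag == self.tag_of(address):
            return column.code
        return None

    def write(self, address: int, code: bytes) -> None:
        """Register decrypted code; any previous entry in the column is replaced."""
        self.columns[self.index_of(address)] = Column(True, self.tag_of(address), code)


def fetch_branch_target(
    buffer: DecryptionConversionBuffer,
    address: int,
    decrypt_from_cache: Callable[[int], bytes],
) -> Tuple[bytes, bool]:
    """Return (decrypted code, hit flag) for a branch destination address."""
    code = buffer.read(address)
    if code is not None:
        return code, True
    code = decrypt_from_cache(address)  # slow path through the decryption logic
    buffer.write(address, code)         # register the result for the next branch
    return code, False
```

Because the organization is direct-mapped, two branch destinations whose addresses differ only above bit 10 (for example 0x1000 and 0x1800) share a column, and the later write evicts the earlier entry. The set-associative and fully associative alternatives mentioned for FIG. 5 would reduce such conflicts.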

FIG. 7 shows another configuration example of the CPU 1600.

The configuration shown in FIG. 7 differs significantly from that shown in FIG. 1 in two respects. First, the encrypted instruction code is decrypted by pipeline processing. Second, a speculative instruction fetch scheme based on a dynamic branch prediction mechanism or the like is used in combination, and a branch destination address buffer (BTB) 1400 is provided which, using the instruction fetch address as a key, can output the branch destination address of a branch instruction when the instruction code being fetched contains a branch instruction whose branch was previously taken. Corresponding to the provision of the branch destination address buffer 1400, the selector 90-2 is configured to selectively transmit the output of the instruction address adder 1200, the output signal of the branch destination address buffer 1400, or the output of the sequential instruction fetch address generation logic 1300 to the instruction cache 100 and the branch destination address buffer 1400.

In FIG. 7, I1, I2, I-dc1, and so on indicate the correspondence with the pipeline stages of instruction processing. I1 and I2 are the instruction cache reference stages; I-dc1 to I-dc6 are the decryption stages for the instruction code read from the instruction cache 100; IQ is the stage in which an instruction waits in the instruction queue; ID is the instruction decode stage; E1 is the stage for register read, branch destination instruction address addition, and operand address addition; E2 and E3 are the operation execution stages or operand cache reference stages; O-ec1 to O-ec6 are the encryption stages for operand store data; O-dc1 to O-dc6 are the decryption stages for load data from the operand cache 800; and WB is the write-back of the instruction execution result. The I-dc1 to I-dc6 stages, the O-ec1 to O-ec6 stages, and the O-dc1 to O-dc6 stages are obtained by pipelining decryption and encryption processing that normally requires several cycles or more.

Here, with the configuration shown in FIG. 3, as shown in FIG. 8, the instruction code decryption overhead during sequential instruction reads and when a branch instruction occurs is hidden, but the overhead of reading the branch destination instruction when a branch instruction occurs is not hidden, as indicated by the arrows. In contrast, with the configuration shown in FIG. 7, as described in detail below, both the overhead due to decryption of the instruction code and the overhead of reading the branch destination instruction when a branch instruction occurs can be hidden.

FIG. 9 shows the instruction processing timing of the configuration shown in FIG. 7.

Instruction 1 and branch instruction 1 are read from the instruction cache 100 in cycles 1 and 2. At the same time, the branch destination address buffer 1400 is referenced in cycle 1, and in cycle 2 the branch destination address of branch instruction 1 (that is, the instruction address of instruction 2) is output from the branch destination address buffer 1400. In cycles 3 and 4, instruction 2 and branch instruction 2 are read from the instruction cache 100 using the branch destination address of branch instruction 1 output from the branch destination address buffer 1400, and the branch destination address buffer 1400 is referenced; in cycle 4 the branch destination address of branch instruction 2 (that is, the instruction address of instruction 3) is output from the branch destination address buffer 1400. In cycles 5 and 6, instruction 3 and branch instruction 3 are read from the instruction cache 100 using the branch destination address of branch instruction 2 output from the branch destination address buffer 1400, and the branch destination address buffer 1400 is referenced; in cycle 6 the branch destination address of branch instruction 3 (that is, the instruction address of instruction 4) is output from the branch destination address buffer 1400. In cycles 7 and 8, instructions 4 to 7 are read from the instruction cache 100 using the branch destination address of branch instruction 3 output from the branch destination address buffer 1400.

Meanwhile, instruction 1 and branch instruction 1 are decrypted from cycle 3 to cycle 8; instruction 1 is decoded in cycle 9 and branch instruction 1 in cycle 10. Because instruction 2 and branch instruction 2 have been read from the instruction cache 100 by cycle 4, they are decrypted from cycle 5 to cycle 10; instruction 2 is decoded in cycle 11 and branch instruction 2 in cycle 12. Because instruction 3 and branch instruction 3 have been read from the instruction cache 100 by cycle 6, they are decrypted from cycle 7 to cycle 12; instruction 3 is decoded in cycle 13 and branch instruction 3 in cycle 14. Because instructions 4 to 7 have been read from the instruction cache 100 by cycle 8, they are decrypted from cycle 9 to cycle 14; instruction 4 is decoded in cycle 15, instruction 5 in cycle 16, instruction 6 in cycle 17, and instruction 7 in cycle 18. In this way, by speculatively executing instruction fetches using a dynamic branch prediction mechanism such as the branch destination address buffer 1400, both the overhead due to decryption of the instruction code and the overhead of reading the branch destination instruction when a branch instruction occurs can be hidden.
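
The timing walked through above can be reproduced with a simple schedule calculation. The following is a minimal sketch under stated assumptions: every lookup in the branch destination address buffer 1400 hits with a correct prediction, so fetch groups start back to back; a cache read takes two cycles (I1, I2); decryption takes six cycles (I-dc1 to I-dc6); and the decoder accepts one instruction per cycle. The function name `decode_schedule` and the group labels are hypothetical.

```python
FETCH_CYCLES = 2    # I1, I2
DECRYPT_CYCLES = 6  # I-dc1 .. I-dc6


def decode_schedule(fetch_groups):
    """Map each instruction name to the cycle in which it is decoded.

    fetch_groups is a list of fetch groups in program order; each group is a
    list of instruction names read together from the instruction cache.
    """
    decode_cycle = {}
    decoder_free = 1  # first cycle in which the decoder is available
    for k, group in enumerate(fetch_groups):
        fetch_end = (k + 1) * FETCH_CYCLES          # groups are fetched back to back
        decrypt_end = fetch_end + DECRYPT_CYCLES    # decryption follows the fetch
        for name in group:
            cycle = max(decrypt_end + 1, decoder_free)
            decode_cycle[name] = cycle
            decoder_free = cycle + 1                # one instruction per cycle
    return decode_cycle


groups = [
    ["instruction 1", "branch 1"],
    ["instruction 2", "branch 2"],
    ["instruction 3", "branch 3"],
    ["instruction 4", "instruction 5", "instruction 6", "instruction 7"],
]
for name, cycle in decode_schedule(groups).items():
    print(f"{name}: decoded in cycle {cycle}")
```

With these assumptions the instructions are decoded in consecutive cycles 9 through 18, matching the cycle numbers in the text; the three taken branches introduce no gap in the decode stream.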

The invention made by the present inventors has been specifically described above; however, it goes without saying that the present invention is not limited thereto and can be modified in various ways without departing from its gist.

For example, the instruction cache 100, the instruction code decryption logic 300, and the CPU 1600 can each be formed on a separate semiconductor chip.

The above description has mainly dealt with the case where the invention made by the present inventors is applied to a microcomputer, which is the field of use that forms its background; however, the present invention is not limited thereto and can be widely applied to various data processing apparatuses.

The present invention can be applied on the condition that a CPU (central processing unit) is included.

FIG. 1 is a block diagram of a configuration example of a CPU in a microcomputer, which is an example of the data processing apparatus according to the present invention.
FIG. 2 is an operation timing diagram of the configuration shown in FIG. 1.
FIG. 3 is a block diagram of another configuration example of the CPU in the microcomputer.
FIG. 4 is an operation timing diagram of the configuration shown in FIG. 3.
FIG. 5 is an explanatory diagram of a configuration example of a main part in FIG. 3.
FIG. 6 is an explanatory diagram of another configuration example of the main part in FIG. 3.
FIG. 7 is a block diagram of another configuration example of the CPU in the microcomputer.
FIG. 8 is an operation timing diagram of a configuration used for comparison with the configuration shown in FIG. 7.
FIG. 9 is an operation timing diagram of the configuration shown in FIG. 7.
FIG. 10 is a block diagram of an overall configuration example of the microcomputer.

Explanation of symbols

90-1 to 90-3 Selector
100 Instruction cache
200 Bus
300 Instruction code decryption logic
500 Instruction queue
600 Instruction decoder
700 Operand address adder
800 Operand cache
900 Operand data decryption logic
1000 Instruction execution unit
1100 Operand data encryption logic
1200 Instruction address adder
1300 Sequential instruction fetch address generation logic
1400 Branch destination address buffer
1500 Memory
1600 CPU

Claims

1. A data processing apparatus comprising:
a central processing unit capable of executing instruction codes;
an instruction cache capable of holding an encrypted instruction code; and
instruction code decryption logic arranged between the central processing unit and the instruction cache, for fetching the encrypted instruction code via the instruction cache, decrypting it, and supplying it to the central processing unit.

2. The data processing apparatus according to claim 1, wherein the instruction code decryption logic sequentially decrypts the encrypted instruction code by pipeline processing.

3. The data processing apparatus according to claim 2, wherein the central processing unit includes a signal conversion buffer capable of holding the decrypted branch destination instruction code in association with the branch destination instruction address, and wherein, when a branch destination instruction code corresponding to the branch destination address exists in the signal conversion buffer, it is read and executed.

4. The data processing apparatus according to claim 2, wherein the central processing unit includes a branch destination address buffer as a dynamic branch prediction mechanism capable of outputting the branch destination address of an instruction using the instruction fetch address as a key, and speculatively executes instruction fetches via the branch destination address buffer.

5. A data processing apparatus according to claim 3, wherein the data processing apparatus is formed on one semiconductor substrate as a microcomputer.