JP5968693B2

JP5968693B2 - Semiconductor device

Info

Publication number: JP5968693B2
Application number: JP2012141718A
Authority: JP
Inventors: 直石川; 雅勝石崎
Original assignee: Renesas Electronics Corp
Current assignee: Renesas Electronics Corp
Priority date: 2012-06-25
Filing date: 2012-06-25
Publication date: 2016-08-10
Anticipated expiration: 2032-06-25
Also published as: JP2014006685A

Description

The present invention relates to a semiconductor device, for example, a semiconductor device having a cache memory.

Methods for speeding up memory access are conventionally known.
In the memory access method of Patent Document 1 (Japanese Patent Laid-Open No. 2001-142698), a main storage device stores a program described by a plurality of instruction codes, and a central processing unit reads the program stored in the main storage device and executes instructions according to the instruction codes described in the read program. An instruction memory, which can be read faster than the main storage device, stores the instruction codes of address-discontinuous instructions, that is, next instructions whose instruction code is stored at an address that is not contiguous with the address storing the instruction code of the instruction currently being executed by the central processing unit. A high-speed access control unit detects that the next instruction is an address-discontinuous instruction; in that case, it searches the instruction memory for the instruction code of the address-discontinuous instruction, obtains the instruction code from the instruction memory, and transfers it to the central processing unit.

Patent Document 2 (Japanese Patent Application Laid-Open No. 2006-293748) describes an information processing apparatus in which the number of skip bits can be set in the input interface of an information processing module. Because processing can start after skipping the set number of skip bits from the head address, input data can be supplied correctly even from an area that is not n-bit aligned.

Patent Document 1: JP 2001-142698 A
Patent Document 2: JP 2006-293748 A

However, in the methods of Patent Document 1 and Patent Document 2, an instruction such as a branch destination instruction may be registered across a plurality of cache lines. As a result, the cache hit rate does not improve and waits for memory access occur.

Other problems and novel features will be apparent from the description of this specification and the accompanying drawings.

A semiconductor device according to one embodiment includes a main memory having a bit width of N bits, a cache memory having cache lines with a bit width of L bits (L = N × X, where X is an integer of 2 or more), and a cache control unit. When registering an M-bit instruction (2 × M ≤ L) stored in the main memory in the cache memory, the cache control unit registers, in one cache line, the N-bit read unit data that includes the head of the instruction and the subsequent (X-1) N-bit read unit data.

According to one embodiment, the cache hit rate can be improved.

FIG. 1 is a diagram showing the configuration of the semiconductor device of the first embodiment.
FIG. 2 is a diagram showing the configuration of the semiconductor device of the second embodiment.
FIG. 3 is a diagram showing the instructions stored in the instruction memory of the second embodiment.
FIG. 4(a) is a diagram showing the valid bits (V0, V1) stored in the V-bit memory and the tag address (A3 to A24) stored in the tag memory. FIG. 4(b) is a diagram showing the data of one cache line stored in the cache data memory.
FIG. 5 is a diagram showing a configuration example of the address comparator.
FIG. 6 is a diagram for explaining an example in which a branch destination instruction is cached in the conventional cache method.
FIG. 7 is a diagram for explaining another example in which a branch destination instruction is cached in the conventional cache method.
FIG. 8 is a diagram for explaining an example in which a branch destination instruction is cached in the cache method of the second embodiment.
FIG. 9 is a diagram for explaining another example in which a branch destination instruction is cached in the cache method of the second embodiment.
FIG. 10 is a diagram for explaining the operation of the sequencer that controls the process of registering a branch destination instruction in the cache memory.
FIG. 11 is a diagram showing an operation example when a branch destination instruction hits.
FIG. 12 is a diagram showing an operation example when a branch destination instruction misses.
FIG. 13 is a diagram showing the configuration of the address comparator of the third embodiment.
FIG. 14 is a diagram showing the input/output control unit of the third embodiment.
FIG. 15 is a diagram showing the storage state when two branch destination instructions are stored in one cache line.
FIG. 16 is a diagram showing the configuration of the semiconductor device of the fourth embodiment.
FIG. 17 is a diagram showing the instructions stored in the instruction memory of the fourth embodiment.
FIG. 18(a) is a diagram showing the valid bits (V0, V1, V2) stored in the V-bit memory and the tag address (A2 to A24) stored in the tag memory. FIG. 18(b) is a diagram showing the data of one cache line stored in the cache data memory.
FIG. 19 is a diagram for explaining the operation of the sequencer that controls the process of registering a branch destination instruction in the cache memory.
FIG. 20 is a diagram showing the configuration of the semiconductor device of the fifth embodiment.
FIG. 21 is a diagram for explaining registration in the cache data memory.
FIG. 22 is a diagram for explaining reading from the cache data memory.

Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[First Embodiment]
FIG. 1 is a diagram illustrating the configuration of the semiconductor device according to the first embodiment.

Referring to FIG. 1, the semiconductor device 1 includes an instruction memory 8, which is a main memory, and a CPU 2.

The CPU 2 includes a CPU core 4, a cache memory 3, and a cache control unit 81.

The instruction memory 8 has a bit width of N bits. The position of an instruction within an N-bit read unit data is specified by s bits. For example, s = log₂(N/8); when N = 64, s = 3.

The cache memory 3 has cache lines with a bit width of L bits (L = N × X, where X is a natural number of 2 or more). The cache memory 3 stores, for each cache line, a tag address representing the storage position of that line's data in the instruction memory.

When registering an M-bit instruction (2 × M ≤ L) stored in the instruction memory 8 in the cache memory, the cache control unit 81 registers, in one cache line, the N-bit read unit data that includes the head of the instruction and the subsequent (X-1) N-bit read unit data. Here, X is a natural number of 2 or more.

More specifically, the cache control unit 81 registers in the cache line the N-bit read unit data specified by a first address, which is the address specifying the instruction with its lower s bits set to 0, together with the next (X-1) N-bit read unit data. The cache control unit 81 also registers the first address, excluding its lower s bits, in the cache memory 3 as the tag address that designates the cache line.
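
The address arithmetic described above can be sketched as follows. This is only an illustration under stated assumptions: byte addressing is assumed (consistent with s = log₂(N/8)), and the function name `line_fill_addresses` and its parameters are hypothetical, not taken from the specification.

```python
import math


def line_fill_addresses(instr_addr: int, n_bits: int = 64, x_units: int = 2):
    """Return (first_address, tag, unit_addresses) for registering one instruction.

    Assumes byte addressing, so one N-bit read unit spans N/8 addresses.
    """
    unit_bytes = n_bits // 8
    s = int(math.log2(unit_bytes))                 # s = log2(N/8); 3 when N = 64
    first_address = instr_addr & ~((1 << s) - 1)   # lower s bits forced to 0
    tag = first_address >> s                       # first address without its lower s bits
    # The line holds the unit at first_address plus the next (X-1) units.
    unit_addresses = [first_address + i * unit_bytes for i in range(x_units)]
    return first_address, tag, unit_addresses
```

For N = 64, X = 2 and an instruction at byte address 0b01101, the first address is 0b01000, and the line holds the read units at 0b01000 and 0b10000.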

As described above, according to this embodiment, the entire instruction is registered in one cache line, so the cache hit rate can be improved.

[Second Embodiment]
The second embodiment describes a preferred configuration for caching branch destination instructions.

FIG. 2 is a diagram illustrating the configuration of the semiconductor device according to the second embodiment.
Referring to FIG. 2, the semiconductor device 1 includes a CPU (Central Processing Unit) 2, a ROM (Read Only Memory) 5, a RAM (Random Access Memory) 7, and a CIF (CPU Interface) 6.

The CIF 6 controls the exchange of data and instructions between the CPU 2 and the RAM 7 and ROM 5. The ROM 5 includes an instruction memory 8 that stores instructions.

The CPU core 4 is connected to the ROM 5 and to the RAM 7 by a 64-bit bus via the CIF 6.

The CPU 2 includes a CPU core 4, a cache memory 3, and a cache control unit 81.

The CPU core 4 decodes and executes instructions.
The cache memory 3 includes a cache data memory 9, a tag memory 11, and a V-bit memory 13.

The cache control unit 81 includes an LRU (Least Recently Used) 15, an input/output control unit 10, an address comparator 12, a V-bit control unit 14, an LRU control unit 16, a first queue 17, a second queue 18, and a sequencer 19.

The instruction memory 8 has a bit width of 64 bits. The position of an instruction within a 64-bit read unit data is specified by 3 bits (A0 to A2). The position of an instruction in the instruction memory 8 is specified by a 25-bit address (A0 to A24). The address of a read unit data in the instruction memory 8 is specified by a 22-bit address (A3 to A24).

The cache data memory 9 has cache lines 128 bits wide. That is, each cache line stores two consecutive read unit data from the instruction memory 8. Of the two read unit data in a cache line, the one that was stored at the smaller address in the instruction memory 8 is called the first-half 64-bit data, and the one that was stored at the larger address is called the second-half 64-bit data.

The tag memory 11 stores tag addresses (A3 to A24). A tag address represents the address in the instruction memory 8 of the first-half 64-bit data of the corresponding cache line.

The V-bit memory 13 stores valid bits V0 and V1. When the valid bit V0 is "1", the first-half 64-bit data of the corresponding cache line is valid; when it is "0", that data is invalid. When the valid bit V1 is "1", the second-half 64-bit data of the corresponding cache line is valid; when it is "0", that data is invalid.

One cache line, the tag address corresponding to that cache line, and the valid bits corresponding to that cache line constitute one entry.
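
As an illustration of how one entry groups these fields, a minimal software model is sketched below. The class name `CacheEntry`, its field names, and the choice of D0 as the least significant bit of the line are assumptions made for this sketch; the specification describes separate hardware memories, not a software structure.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    tag: int = 0           # A3..A24: read-unit address of the first-half data (22 bits)
    v0: bool = False       # valid bit for the first-half 64-bit data
    v1: bool = False       # valid bit for the second-half 64-bit data
    first_half: int = 0    # D0..D63
    second_half: int = 0   # D64..D127

    def line(self) -> int:
        """Return the 128-bit line, taking D0 as the least significant bit (assumed)."""
        return (self.second_half << 64) | self.first_half
```

An entry with `v0=True, v1=False` models a line whose first half has been filled while the second half is still invalid.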

The LRU 15 stores information indicating the lines that have not been used recently.
When registering a branch destination instruction stored in the instruction memory 8 in the cache data memory 9, the input/output control unit 10 registers, in one cache line, the 64-bit read unit data that includes the head of the branch destination instruction and the subsequent 64-bit read unit data. Specifically, the input/output control unit 10 registers in the cache line the 64-bit read unit data specified by a first address, which is the address specifying the branch destination instruction with its lower 3 bits set to 0, together with the next 64-bit read unit data. The input/output control unit 10 also registers the first address, excluding its lower 3 bits, in the tag memory 11 as the tag address that designates the cache line.

The input/output control unit 10 also reads the 128-bit data of a designated cache line from the cache data memory 9.

The address comparator 12 compares the tag addresses (A3 to A24) stored in the tag memory 11 with the address JADDR[24:3], which is the address JADDR of the branch destination instruction excluding its lower 3 bits.

The V-bit control unit 14 controls the updating of the valid bits V0 and V1 in the V-bit memory 13.

The LRU control unit 16 controls the updating of the data in the LRU 15.
On a cache hit, the first queue 17 temporarily holds the first-half 64-bit data of the cache line read from the cache data memory 9, and the second queue 18 temporarily holds the second-half 64-bit data of that cache line.

FIG. 3 is a diagram showing the instructions stored in the instruction memory 8.
The instruction memory 8 stores variable-length instructions, that is, instructions of 8, 16, 24, 32, 40, 48, 56, or 64 bits. Data is read from the instruction memory 8 in 64-bit units through a 64-bit instruction bus. The addresses (A3 to A24) that specify the position from which data is read from the instruction memory 8 are given by 0b0...00000000, 0b0...00001000, 0b0...00010000, and so on, which represent the heads of the 64-bit read unit data #0, data #1, data #2, and so on.

FIG. 4(a) is a diagram showing the valid bits (V0, V1) stored in the V-bit memory 13 and the tag address (A3 to A24) stored in the tag memory.

FIG. 4(b) is a diagram showing the data of one cache line stored in the cache data memory 9.

The data of one cache line consists of 128 bits (D0 to D127). Of these, the 64 bits D0 to D63 constitute the first-half 64-bit data, and the 64 bits D64 to D127 constitute the second-half 64-bit data.

FIG. 5 is a diagram showing the configuration of the address comparator 12.
Referring to FIG. 5, the address comparator 12 includes comparators 21_1 to 21_N, AND circuits 22_1 to 22_N, and an OR circuit 23. Here, N is the total number of lines that can be stored in the cache data memory.

The comparator 21_i (i = 1 to N) compares the tag address A3 to A24 of the i-th entry with the address JADDR[24:3], which is the address JADDR of the branch destination instruction excluding its lower 3 bits. It outputs an "H" level signal if they match and an "L" level signal if they do not.

The AND circuit 22_i outputs the logical product of the output of the comparator 21_i and the valid bit V0. The OR circuit 23 outputs the logical sum of the outputs of the AND circuits 22_1 to 22_N. This output is a signal indicating a cache hit or miss: the "H" level indicates a cache hit, and the "L" level indicates a cache miss. The cache line corresponding to the tag address of the entry that hit is read by the input/output control unit 10.
(Conventional cache method)
FIG. 6 is a diagram for explaining an example in which a branch destination instruction is cached in the conventional cache method.

In the conventional cache method, the position from which data is read from the instruction memory (the alignment boundary) is fixed. As shown in FIG. 6, when the branch destination instruction 100 is registered in the cache data memory, the 128 bits of data starting at the alignment boundary 0b0...00010000 are registered as the data of one cache line of the cache data memory.

FIG. 7 is a diagram for explaining another example in which a branch destination instruction is cached in the conventional cache method.

As shown in FIG. 7, when the branch destination instruction 200 is registered in the cache data memory, the 128 bits of data starting at the alignment boundary 0b0...00000000 are first registered as the data of one cache line of the cache data memory. In this case, however, only part of the branch destination instruction 200 is registered in the cache data memory. Consequently, even if the branch destination instruction 200 hits the cache on a jump to it, a wait for memory access still occurs and performance does not improve.
(Cache method of this embodiment)
FIG. 8 is a diagram for explaining an example in which a branch destination instruction is cached in the cache method of this embodiment.

In the cache method of this embodiment, the boundary from which data is read from the instruction memory 8 (the alignment boundary) is not fixed. As shown in FIG. 8, when the branch destination instruction 100 is registered in the cache data memory 9, the 64-bit read unit data that includes the head of the branch destination instruction 100 (specified by the address 0b0...00010000) is registered in the first half of the cache line and becomes the first-half 64-bit data (D0 to D63). The next 64-bit read unit data (specified by 0b0...00011000) is then registered in the second half of the cache line and becomes the second-half 64-bit data (D64 to D127).
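
The difference between a fixed 128-bit alignment boundary and the boundary used in this embodiment can be checked with a small calculation. The sketch below assumes byte addressing; the function names and the example instruction (8 bytes long, starting at byte address 0x0C) are hypothetical values chosen for illustration, not values read from the figures.

```python
UNIT_BYTES = 8    # one 64-bit read unit
LINE_BYTES = 16   # one 128-bit cache line


def conventional_line_base(addr: int) -> int:
    """Fixed alignment: the line always starts on a 128-bit boundary."""
    return addr & ~(LINE_BYTES - 1)


def embodiment_line_base(addr: int) -> int:
    """This embodiment: the line starts at the read unit holding the instruction head."""
    return addr & ~(UNIT_BYTES - 1)


def fully_cached(base: int, addr: int, length: int) -> bool:
    """True if the instruction occupying [addr, addr + length) lies inside the line at base."""
    return base <= addr and addr + length <= base + LINE_BYTES
```

With the hypothetical instruction at 0x0C, the fixed-alignment line starts at 0x00 and covers bytes 0x00 to 0x0F, so the instruction is only partly cached. The line of this embodiment starts at 0x08 and covers bytes 0x08 to 0x17, so the whole instruction is cached. Because an instruction is at most 64 bits long and its head lies in the first read unit of the line, it always ends within the second read unit.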

FIG. 9 is a diagram for explaining another example in which a branch destination instruction is cached in the cache method of this embodiment.

As shown in FIG. 9, when the branch destination instruction 200 is registered in the cache data memory 9, the 64-bit read unit data that includes the head of the branch destination instruction 200 (specified by the address 0b0...00001000) is first registered in the first half of the cache line and becomes the first-half 64-bit data (D0 to D63). The next 64-bit read unit data (specified by 0b0...00010000) is then registered in the second half of the cache line and becomes the second-half 64-bit data (D64 to D127). In this case, the whole of the branch destination instruction 200 is registered in the cache data memory 9, so the wait for memory access that occurs in the conventional method does not occur.
(Registration of a branch destination instruction in the cache memory)
FIG. 10 is a diagram for explaining the operation of the sequencer 19 that controls the process of registering a branch destination instruction in the cache memory.

The sequencer 19 controls the following processes (1) to (3).
(1) Register the tag address, clear the valid bits, and update the LRU.
(2) Write the first-half 64-bit data to the cache data memory and write the valid bit V0.
(3) Write the second-half 64-bit data to the cache data memory and write the valid bit V1.

The sequencer 19 is normally in the "IDLE state" and starts registration when a cache miss occurs. On starting registration, the sequencer 19 executes (1) and moves to the "D0 waiting state". When it then receives the first half of the data, it executes (2) and moves to the "D1 waiting state". Finally, when it receives the second half of the data, it executes (3) and returns to the "IDLE state".
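
The three-state sequence above can be sketched as a small state machine. This is a simplified software model, not the circuit of the specification: the class and method names are hypothetical, a single cache line is modeled, and the LRU update in step (1) is left out.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    WAIT_D0 = auto()   # "D0 waiting state"
    WAIT_D1 = auto()   # "D1 waiting state"


class Sequencer:
    def __init__(self):
        self.state = State.IDLE
        self.tag = None
        self.valid = [False, False]   # [V0, V1]
        self.data = [None, None]      # [first half, second half]

    def start_registration(self, tag: int) -> None:
        # Step (1): register the tag and clear both valid bits (LRU update not modeled).
        self.tag = tag
        self.valid = [False, False]
        self.data = [None, None]
        self.state = State.WAIT_D0

    def receive(self, unit: int) -> None:
        if self.state is State.WAIT_D0:
            # Step (2): write the first half and set V0.
            self.data[0], self.valid[0] = unit, True
            self.state = State.WAIT_D1
        elif self.state is State.WAIT_D1:
            # Step (3): write the second half and set V1.
            self.data[1], self.valid[1] = unit, True
            self.state = State.IDLE
        # Data arriving in the IDLE state is ignored in this sketch.
```

Calling `start_registration` and then `receive` twice walks the model through IDLE, D0 waiting, D1 waiting, and back to IDLE, leaving both valid bits set.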

Note that if a branch occurs while the sequencer 19 is waiting for registration data and the branch destination instruction hits the cache, the sequencer 19 may end registration with the cache line only partly filled. If a branch occurs while the sequencer 19 is waiting for registration data and the branch destination instruction misses the cache, the sequencer 19 may start registration of another cache line.

(Operation on a hit)
FIG. 11 is a diagram illustrating an operation example when a branch destination instruction hits.

Referring to FIG. 11, the CPU core 4 executes a branch instruction JUMP that specifies a branch destination address JADDR (here, the address @A of instruction A).

Because the portion JADDR[24:3] of the branch destination address excluding the lower 3 bits matches one of the tag addresses, the address comparator 12 sets the cache hit signal brahit to the "H" level.
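
The hit decision made by the address comparator 12 can be modeled in software as follows. The hardware compares all entries in parallel and ORs the results; the loop below is a sequential equivalent. The function name `branch_hit` and the representation of entries as `(tag, v0)` pairs are assumptions made for this sketch.

```python
TAG_MASK = (1 << 22) - 1   # tag address A3..A24 is 22 bits wide


def branch_hit(jaddr: int, entries):
    """Return (brahit, index) for a list of (tag, v0) entries.

    brahit is True when some entry has V0 set and its tag equals JADDR[24:3];
    index is the position of that entry, or None on a miss.
    """
    target = (jaddr >> 3) & TAG_MASK   # JADDR[24:3]
    for i, (tag, v0) in enumerate(entries):
        if v0 and tag == target:       # comparator output ANDed with V0
            return True, i
    return False, None
```

An entry whose tag matches but whose V0 is "0" does not produce a hit, which mirrors the AND with the valid bit V0.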

Because the valid bit V0 is "1", indicating valid, the sequencer 19 sends the first-half 64-bit data bradat0[63:0] of the hit cache line in the cache data memory to the first queue 17. Here, the first-half 64-bit data is assumed to contain instruction A and instruction B.

Because the valid bit V1 is also "1", indicating valid, the sequencer 19 sends the second-half 64-bit data bradat1[127:64] of the hit cache line in the cache data memory to the second queue 18. Here, the second-half 64-bit data is assumed to contain instruction C and instruction D.

The LRU control unit 16 registers the address of instruction A at the head of the LRU.
The sequencer 19 sends the instructions in the first queue 17 and the second queue 18 to the CPU core 4 in order, and the CPU core 4 decodes them. After instruction C is decoded, the CPU core 4 executes a branch instruction JUMP that specifies a branch destination address JADDR (here, the address @E of instruction E).

Because the valid bit V0 is "1", indicating valid, the sequencer 19 sends the first-half 64-bit data bradat0[63:0] of the hit cache line in the cache data memory to the first queue 17. Here, the first-half 64-bit data is assumed to contain instruction E and instruction F.

Because the valid bit V1 is "0", indicating invalid, the sequencer 19 does not send the second-half 64-bit data bradat1[127:64] of the hit cache line in the cache data memory to the second queue 18.
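
How the valid bits gate what reaches the two queues on a hit can be sketched as below. The function name `dispatch_on_hit` is hypothetical, and treating D0 as the least significant bit of the 128-bit line is an assumption of this sketch.

```python
MASK64 = (1 << 64) - 1


def dispatch_on_hit(line: int, v0: bool, v1: bool):
    """Return (first_queue_data, second_queue_data); None means nothing is sent."""
    first = (line & MASK64) if v0 else None            # bradat0[63:0]
    second = ((line >> 64) & MASK64) if v1 else None   # bradat1[127:64]
    return first, second
```

With V0 = "1" and V1 = "0", only the first queue receives data, as in the jump to instruction E above.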

The LRU control unit 16 registers the address of instruction E at the head of the LRU.
The sequencer 19 sends the instructions in the first queue 17 to the CPU core 4 in order, and the CPU core 4 decodes them.

(Operation on a miss)
FIG. 12 is a diagram illustrating an operation example when a branch destination instruction misses.

Referring to FIG. 12, the CPU core 4 executes a branch instruction JUMP that specifies a branch destination address JADDR (here, the address @A of instruction A).

Because the portion JADDR[24:3] of the branch destination address excluding the lower 3 bits does not match any tag address, the address comparator 12 does not set the cache hit signal brahit to the "H" level.

The sequencer 19 writes the portion of the branch destination address JADDR excluding the lower 3 bits to the tag memory as the tag address A3 to A24.

The LRU control unit 16 registers the address of instruction A at the head of the LRU.
The CPU core 4 outputs a request signal REQ and receives the response signal ACQ that the instruction memory 8 sends in response. The V-bit control unit 14 clears the valid bits V0 and V1 to "0" and "0".

When the CPU core 4 receives the first-half 64-bit data D0 from the instruction memory 8, it writes the data D0 to the first-half position of the corresponding cache line in the cache data memory. When the V-bit control unit 14 receives a completion signal END from the instruction memory 8, it sets the valid bit V0 to "1".

The CPU core 4 then outputs another request signal REQ and receives the response signal ACQ that the instruction memory 8 sends in response. When the CPU core 4 receives the second-half 64-bit data D1 from the instruction memory 8, it writes the data D1 to the second-half position of the corresponding cache line in the cache data memory. When the V-bit control unit 14 receives a completion signal END from the instruction memory 8, it sets the valid bit V1 to "1".

As described above, according to this embodiment, the entire branch destination instruction can be held in one cache line regardless of where the head of the branch destination instruction lies in the instruction memory. As a result, on a cache hit, execution of the branch destination instruction can start without waiting for a memory access.

[Third Embodiment]
In the second embodiment, a jump searches only for the head of a branch destination instruction located in the first half of a cache line. However, another branch destination instruction may be stored in the second half of the cache line. Including this branch destination instruction in the search improves the hit rate. In the third embodiment, the address corresponding to the second-half 64-bit data is calculated from the tag address and used for hit determination.

FIG. 13 is a diagram illustrating the configuration of the address comparator of the third embodiment.
Referring to FIG. 13, the address comparator 512 includes comparators 21_1 to 21_N and 25_1 to 25_N, AND circuits 22_1 to 22_N and 26_1 to 26_N, OR circuits 23 and 27, adders 24_1 to 24_N, an OR circuit 28, and an AND circuit 29. Here, N is the total number of cache lines that can be stored in the cache data memory.
The comparator 21_i (i = 1 to N) compares the tag address A3 to A24 of the i-th entry with the address JADDR[24:3], that is, the address JADDR of the branch destination instruction excluding its lower 3 bits. It outputs an “H” level signal if they match and an “L” level signal if they do not.

The AND circuit 22_i outputs the logical AND of the output of the comparator 21_i and the valid bit V0.

The OR circuit 23 outputs the logical OR of the outputs of the AND circuits 22_1 to 22_N. This output is a signal indicating a hit or miss in the first half of the cache lines: the “H” level indicates that the first half of one of the cache lines has hit, and the “L” level indicates that the first halves of all cache lines have missed.
The adder 24_i adds “1” to the tag address A3 to A24 of the i-th entry and outputs the address following the tag address. Because the tag address indicates the address in the instruction memory 8 of the first-half 64-bit data of a cache line, the address following the tag address indicates the address in the instruction memory 8 of the second-half 64-bit data of the same cache line.

The comparator 25_i compares the address following the tag address A3 to A24 of the i-th entry, which is output from the adder 24_i, with the address JADDR[24:3], that is, the address JADDR of the branch destination instruction excluding its lower 3 bits. It outputs an “H” level signal if they match and an “L” level signal if they do not.

The AND circuit 26_i outputs the logical AND of the output of the comparator 25_i and the valid bit V1.

The OR circuit 27 outputs the logical OR of the outputs of the AND circuits 26_1 to 26_N. This output is a signal indicating a hit or miss in the second half of the cache lines: the “H” level indicates that the second half of one of the cache lines has hit, and the “L” level indicates that the second halves of all cache lines have missed.

The OR circuit 28 outputs the logical OR of the output of the OR circuit 23 and the output of the OR circuit 27. This output is a signal indicating a cache line hit or miss: the “H” level indicates that one of the cache lines has hit in its first half or second half, and the “L” level indicates that all cache lines have missed. The cache line corresponding to the tag address of the hit entry is read by the input/output control unit 10.
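
The hit determination described for the address comparator 512 can be sketched in software as follows. This is only an illustrative model, not the circuit itself: the names `Entry`, `lookup`, and `TAG_MASK` are hypothetical, the hardware compares all entries in parallel whereas the sketch loops over them, and giving the first-half comparison priority within an entry is an assumption.

```python
from dataclasses import dataclass

TAG_MASK = (1 << 22) - 1  # tag address A3..A24 is 22 bits wide


@dataclass
class Entry:
    tag: int   # tag address (A3..A24) of the first-half 64-bit data
    v0: bool   # valid bit V0 (first half)
    v1: bool   # valid bit V1 (second half)


def lookup(entries, jaddr):
    """Return (entry index, half) on a hit, or None on a miss."""
    key = (jaddr >> 3) & TAG_MASK  # JADDR[24:3]
    for i, e in enumerate(entries):
        # comparator 21_i gated by V0 (AND circuit 22_i)
        if e.v0 and e.tag == key:
            return i, "first"
        # adder 24_i, comparator 25_i gated by V1 (AND circuit 26_i)
        if e.v1 and ((e.tag + 1) & TAG_MASK) == key:
            return i, "second"
    return None  # corresponds to an "L" level at OR circuit 28
```

A return value other than `None` corresponds to an “H” level at the OR circuit 28, and the `half` field tells the output stage whether a shift is needed.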

FIG. 14 is a diagram illustrating the input/output control unit of the third embodiment.
The input/output control unit 10 includes a selector 70 and a shifter 51.

The selector 70 outputs the 128-bit data “D0 to D127” of the hit cache line.

When the first-half data of a cache line hits, the output of the selector 70 is sent to the CPU core 4, as in the second embodiment.

When the second-half data of a cache line hits, the shifter 51 shifts “D0 to D127” output from the selector 70 right by 64 bits. As a result, data whose bits 0 to 63 are “D64 to D127” (the second-half 64-bit data) and whose bits 64 to 127 are dummy data is sent to the CPU core 4.
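
As a rough illustration of the selector and shifter behavior just described, the sketch below models a cache line as a Python integer with D0 at bit 0. The function name `output_to_cpu` is hypothetical, and filling the dummy upper bits with zeros is an assumption; the text only says they are dummy data.

```python
MASK128 = (1 << 128) - 1


def output_to_cpu(line, half):
    """Model of selector 70 followed by shifter 51.

    line: 128-bit cache line as an int, with D0 at bit 0.
    half: "first" or "second", the half of the line that hit.
    """
    line &= MASK128
    if half == "first":
        return line          # passed through unshifted
    # Second-half hit: right shift by 64 so bits 0..63 hold D64..D127.
    # The upper 64 bits are dummy data (zeros in this sketch).
    return line >> 64
```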

FIG. 15 is a diagram illustrating the storage state when two branch destination instructions are stored in one cache line.

The head of the branch destination instruction 100 is located in the first half of the cache line, and the head of the branch destination instruction 200 is located in the second half of the cache line.

In the second embodiment, even if two branch destination instructions are registered in one cache line, only the branch destination instruction 100, whose head is contained in the first-half 64-bit data of the cache line, is subject to cache hit determination. In the present embodiment, by contrast, the branch destination instruction 200, whose head is contained in the second-half 64-bit data of the cache line, is also subject to cache hit determination.

As described above, according to the present embodiment, adding the address obtained by incrementing the tag address by 1 to the targets of address comparison increases the cache hit rate compared with the second embodiment.

[Fourth Embodiment]
This embodiment describes a case where the unit in which data is read from the instruction memory is smaller than in the second embodiment.

FIG. 16 is a diagram illustrating the configuration of the semiconductor device of the fourth embodiment.
The semiconductor device 101 of FIG. 16 differs from the semiconductor device 1 of FIG. 2 in the following respects.

The CPU core 4 and the ROM 5, and the CPU core 4 and the RAM 7, are connected through the CIF 6 by a 32-bit instruction bus.

The instruction memory 108 has a bit width of 32 bits. The position of an instruction within a 32-bit read unit data is designated by 2 bits (A0 to A1). The position of an instruction within the instruction memory 108 is designated by the 25-bit address A0 to A24.

The cache data memory 9 has cache lines with a width of 128 bits. That is, the cache data memory 9 stores four consecutive read unit data of the instruction memory 108. The four read unit data of a cache line are referred to, in ascending order of their addresses in the instruction memory 108, as the first 32-bit data, the second 32-bit data, the third 32-bit data, and the fourth 32-bit data.

The V bit memory 113 stores valid bits V0, V1, V2, and V3. The valid bit V0 indicates whether the first 32-bit data of the cache line is valid or invalid. The valid bit V1 indicates whether the second 32-bit data of the cache line is valid or invalid. The valid bit V2 indicates whether the third 32-bit data of the cache line is valid or invalid. The valid bit V3 indicates whether the fourth 32-bit data of the cache line is valid or invalid.

The tag memory 111 stores the tag address (A2 to A24) of each cache line of the cache data memory 9. The tag address represents the address in the instruction memory 108 of the first 32-bit data of the corresponding cache line.

When registering a branch destination instruction stored in the instruction memory 108 in the cache data memory 9, the input/output control unit 110 registers the 32-bit read unit data containing the head of the branch destination instruction and the three subsequent 32-bit read unit data in one cache line. Specifically, the input/output control unit 110 registers in the cache line the 32-bit read unit data designated by a first address, obtained by setting the lower 2 bits of the address designating the branch destination instruction to 0, together with the next three 32-bit read unit data. The input/output control unit 110 also registers the portion of the first address excluding its lower 2 bits in the tag memory 111 as the tag address designating the cache line.
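
The address arithmetic of this registration can be sketched as follows. The function name `registration_plan` and the constants are hypothetical, and the sketch assumes that consecutive 32-bit read units are 4 address steps apart, as the read addresses 0b0...0000000, 0b0...0000100, 0b0...0001000 given for the instruction memory 108 suggest.

```python
UNIT_STEP = 4        # address distance between consecutive 32-bit read units
UNITS_PER_LINE = 4   # a 128-bit cache line holds four read units


def registration_plan(jaddr):
    """Return (tag address, list of read-unit addresses) for a branch target."""
    first_addr = jaddr & ~0b11        # lower 2 bits (A0, A1) set to 0
    tag = first_addr >> 2             # A2..A24, stored in the tag memory
    unit_addrs = [first_addr + UNIT_STEP * k for k in range(UNITS_PER_LINE)]
    return tag, unit_addrs
```

For example, a branch target at address 0b1110 yields the first address 0b1100, so the line is filled from the read units at 12, 16, 20, and 24, and the tag is 3.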

The input/output control unit 110 also reads the 128-bit data of a designated cache line from the cache data memory 9.

FIG. 17 is a diagram showing the instructions stored in the instruction memory 108.
The instruction memory 108 stores variable-length instructions, that is, instructions of 8, 16, 24, 32, 40, 48, 56, or 64 bits. Data is read from the instruction memory 108 in 32-bit units through the 32-bit instruction bus. The addresses that designate the read position in the instruction memory 108 are 0b0...0000000, 0b0...0000100, 0b0...0001000, and so on, which indicate the heads of the 32-bit data #0, data #1, data #2, and so on.

FIG. 18(a) is a diagram showing the valid bits (V0, V1, V2, V3) stored in the V bit memory 113 and the tag address (A2 to A24) stored in the tag memory 111.

FIG. 18(b) is a diagram showing the data of one cache line stored in the cache data memory 9.

The data of one cache line consists of 128 bits (D0 to D127). Of these, the 32 bits D0 to D31 form the first 32-bit data, D32 to D63 form the second 32-bit data, D64 to D95 form the third 32-bit data, and D96 to D127 form the fourth 32-bit data.

(Registration in the branch destination instruction cache memory)
FIG. 19 is a diagram for explaining the operation of the sequencer 119, which controls the process of registering a branch destination instruction in the cache memory.

The sequencer 119 controls the following processes (1) to (5).
(1) Register the tag address, clear the valid bits, and update the LRU.
(2) Write the first 32-bit data to the cache data memory and write the valid bit V0.
(3) Write the second 32-bit data to the cache data memory and write the valid bit V1.
(4) Write the third 32-bit data to the cache data memory and write the valid bit V2.
(5) Write the fourth 32-bit data to the cache data memory and write the valid bit V3.

The sequencer 119 is normally in the “IDLE” state and starts registration when a cache miss occurs. On starting registration, the sequencer 119 executes (1) above and moves to the “D0 wait” state. When it then receives the first 32-bit data, it executes (2) and moves to the “D1 wait” state. When it receives the second 32-bit data, it executes (3) and moves to the “D2 wait” state. When it receives the third 32-bit data, it executes (4) and moves to the “D3 wait” state. Finally, when it receives the fourth 32-bit data, it executes (5) and returns to the “IDLE” state.
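
The state transitions above can be modeled as a small state machine. The following is a minimal sketch under stated assumptions: the class name `Sequencer`, its method names, and the state labels are hypothetical, and the LRU update in step (1) is left out.

```python
class Sequencer:
    """Software model of the registration sequence (1) to (5)."""

    def __init__(self):
        self.state = "IDLE"
        self.tag = None
        self.valid = [False] * 4   # valid bits V0..V3
        self.words = [0] * 4       # first to fourth 32-bit data

    def start(self, tag):
        """Step (1): called on a cache miss."""
        assert self.state == "IDLE"
        self.tag = tag
        self.valid = [False] * 4   # clear the valid bits
        self.state = "WAIT_D0"     # LRU update omitted in this sketch

    def receive(self, word):
        """Steps (2) to (5): store one 32-bit word and set its valid bit."""
        assert self.state.startswith("WAIT_D")
        k = int(self.state[-1])    # index of the word being waited for
        self.words[k] = word & 0xFFFFFFFF
        self.valid[k] = True
        self.state = f"WAIT_D{k + 1}" if k < 3 else "IDLE"
```

Because each valid bit is set only after its own word has been written, a partially filled line is never reported as fully valid.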

As described above, in the present embodiment, even when the bit width of the instruction memory is 1/4 of the bit width of the cache line and 1/2 of the maximum instruction length, the entire branch destination instruction can be held in one cache line regardless of where the head of the branch destination instruction lies in the instruction memory, as in the second embodiment. As a result, on a cache hit, execution of the branch destination instruction can start without waiting for a memory access.

[Fifth Embodiment]
In the fifth embodiment, the branch destination instruction is shifted before being stored in the cache memory.

Unlike the first embodiment, the present embodiment shifts and packs the branch destination instruction read from the instruction memory before storing it in the cache memory, so that the branch destination instruction can be stored in a smaller area of the cache memory.

FIG. 20 is a diagram illustrating the configuration of the semiconductor device of the fifth embodiment.
The semiconductor device 201 of FIG. 20 differs from the semiconductor device 1 of FIG. 2 in the following respects.

The cache data memory 209 has cache lines with a width of 64 bits. That is, each cache line stores one read unit data of the instruction memory 8.

The branch destination instruction address memory 211 stores branch destination instruction addresses (A0 to A24). A branch destination instruction address consists of a tag address (A3 to A24) and an address (A0 to A2) indicating the position of the branch destination instruction within the read unit data.

The V bit memory 213 stores a valid bit V0. When the valid bit V0 is “1”, the 64-bit data of the corresponding cache line is valid; when it is “0”, the 64-bit data of the corresponding cache line is invalid.

One cache line, the branch destination instruction address corresponding to that cache line, and the valid bit corresponding to that cache line constitute one entry.

When registering a branch destination instruction stored in the instruction memory 8 in the cache data memory 209, the input/output control unit 210 reads the 64-bit read unit data designated by a first address, obtained by setting the lower 3 bits of the address designating the branch destination instruction to 0, and the next 64-bit read unit data. The input/output control unit 210 shifts the 128 bits of read data so that the head of the branch destination instruction comes to the head of the cache line, and registers the result in the cache line.

The input/output control unit 210 also stores, in the branch destination instruction address memory 211, the portion of the first address excluding its lower 3 bits as the tag address designating the cache line, and the lower 3 bits of the address of the branch destination instruction as the address indicating the position of the branch destination instruction within the read unit data.

The input/output control unit 210 also reads the 64-bit data of a designated cache line from the cache data memory 209.

The address comparator 212 compares the tag address (A3 to A24) stored in the branch destination instruction address memory 211 with the address JADDR[24:3], that is, the address JADDR of the branch destination instruction excluding its lower 3 bits.

The V bit control unit 214 controls updating of the valid bit V0 in the V bit memory 213.

FIG. 21 is a diagram for explaining registration in the cache data memory.
The input/output control unit 210 includes a shifter 61. The shifter 61 shifts the 128-bit read data read from the instruction memory 8 by the value of the lower 3 bits of the address JADDR of the branch destination instruction, in units of 8 bits, and outputs the result to the 64-bit-wide cache line.

For example, when the lower 3 bits of the address of the branch destination instruction are “000”, the shifter 61 registers the 128-bit read data in the 64-bit-wide cache line without shifting it.

When the lower 3 bits of the address of the branch destination instruction are “001”, “010”, “011”, “100”, “101”, “110”, or “111”, the shifter 61 shifts the 128-bit read data right by 8, 16, 24, 32, 40, 48, or 56 bits, respectively, and registers it in the 64-bit cache line.

As a result, even a branch destination instruction of the maximum length (64 bits) can be registered in one cache line.
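
The registration shift performed by the shifter 61 can be sketched as below. The function name `register_line` is hypothetical, and the bit ordering is an assumption: the read unit at the lower address occupies the low 64 bits of the 128-bit value, so that a right shift brings the head of the instruction to bit 0.

```python
MASK64 = (1 << 64) - 1


def register_line(unit0, unit1, jaddr):
    """Model of shifter 61.

    unit0: 64-bit read unit at the first address (lower 3 bits zeroed).
    unit1: the next 64-bit read unit.
    jaddr: address of the branch destination instruction.
    """
    data128 = ((unit1 & MASK64) << 64) | (unit0 & MASK64)
    shift = 8 * (jaddr & 0b111)          # 0, 8, ..., 56 bits
    return (data128 >> shift) & MASK64   # value stored in the 64-bit line
```

Because the shift is at most 56 bits and two read units are fetched, the 64 bits starting at the head of the instruction always lie within the 128 bits read.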

FIG. 22 is a diagram for explaining reading from the cache data memory.
The input/output control unit 210 includes a shifter 62. The shifter 62 shifts the 64-bit data read from the cache data memory 209 by the value of the lower 3 bits (A0 to A2) of the branch destination instruction address stored in the branch destination instruction address memory 211, in units of 8 bits, and outputs the result to the CPU core 4.

For example, when the lower 3 bits (A0 to A2) of the address of the branch destination instruction are “000”, the shifter 62 outputs the branch destination instruction output from the cache data memory 209 to the CPU core 4 without shifting it. When the lower 3 bits (A0 to A2) are “001”, “010”, “011”, “100”, “101”, “110”, or “111”, the shifter 62 shifts the branch destination instruction output from the cache data memory 209 left by 8, 16, 24, 32, 40, 48, or 56 bits, respectively, and outputs it to the CPU core 4.
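
The read-out shift of the shifter 62 can be illustrated in the same style. The function name `read_line` is hypothetical, and the sketch assumes a 128-bit output toward the CPU core so that no instruction bits are lost by the left shift; the text does not state the output width. The vacated low bits are zeros here.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def read_line(line64, stored_low3):
    """Model of shifter 62.

    line64: 64-bit cache line with the instruction head at bit 0.
    stored_low3: A0..A2 from the branch destination instruction address memory.
    """
    shift = 8 * (stored_low3 & 0b111)    # 0, 8, ..., 56 bits
    return ((line64 & MASK64) << shift) & MASK128
```

Under these assumptions the left shift undoes the right shift applied at registration, so the instruction reappears at the bit position it had in the data read from the instruction memory.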

As described above, according to the present embodiment, shifting the instruction on data input and output reduces the cache memory area needed to store a branch destination instruction compared with the first to fourth embodiments.

The invention made by the present inventors has been described above in detail on the basis of the embodiments. Needless to say, however, the present invention is not limited to the embodiments and can be modified in various ways without departing from its gist.

In addition, part of the contents described in the embodiments is set out below.
(1) The semiconductor device includes a main memory having a bit width of N bits and a cache memory having cache lines with a bit width of N bits. The semiconductor device further includes a cache control unit that, when registering in the cache memory an M-bit (M ≦ N) instruction stored in the main memory, reads the N-bit read unit data containing the head of the instruction and the subsequent N-bit read unit data, shifts the 2N bits of read data so that the head of the instruction comes to the head of the cache line, and registers the result.

1, 101, 201: semiconductor device; 2: CPU; 3: cache memory; 4: CPU core; 5: ROM; 6: CIF; 7: RAM; 8, 108: instruction memory; 9, 209: cache data memory; 10, 110, 210: input/output control unit; 11, 111: tag memory; 12, 212, 512: address comparator; 13, 113, 213: V bit memory; 14, 114, 214: V bit control unit; 15: LRU; 16: LRU control unit; 17: first queue; 18: second queue; 19, 119: sequencer; 81: cache control unit; 21_1 to 21_N, 25_1 to 25_N: comparators; 24_1 to 24_N: adders; 22_1 to 22_N, 26_1 to 26_N: AND circuits; 61, 62: shifters; 70, 90: selectors; 23, 27, 28: OR circuits; 211: branch destination instruction address memory.

Claims

A semiconductor device comprising:
a main memory having a bit width of N bits;
a cache memory having a cache line with a bit width of L bits (L = N × X, where X is a natural number of 2 or more); and
a cache control unit that, when registering in the cache memory an M-bit (2 × M ≦ L) instruction stored in the main memory, registers the N-bit read unit data containing the head of the instruction and the subsequent (X − 1) pieces of N-bit read unit data in a single cache line,
wherein, when the position of the instruction within each read unit data of the main memory is designated by s bits,
the cache control unit registers in the cache line the N-bit read unit data designated by a first address, in which the lower s bits of the address designating the instruction are set to 0, and the next (X − 1) pieces of N-bit read unit data.

The semiconductor device according to claim 1, wherein the cache control unit registers in the cache memory the portion of the first address excluding the lower s bits as a tag address designating the cache line.

The semiconductor device according to claim 2, wherein the instruction is a branch destination instruction designated by a branch instruction.

The semiconductor device according to claim 3, wherein, when jumping to the address of the branch destination instruction designated by the branch instruction, the cache control unit reads the data of the cache line corresponding to the tag address if the portion of that address excluding the lower s bits matches the tag address.

The semiconductor device according to claim 3, wherein X = 2.

The semiconductor device according to claim 3, wherein, when jumping to the address of the branch destination instruction designated by the branch instruction, the cache control unit reads the data of the cache line corresponding to the address if the portion of that address excluding the lower s bits matches the address following the tag address.

The semiconductor device according to claim 5, wherein the cache control unit outputs to the CPU core the N-bit data corresponding to the next address among the data of the read cache line.

The semiconductor device according to claim 1, wherein the main memory stores variable-length instructions.

The semiconductor device according to claim 1, wherein the cache control unit registers in the cache memory data indicating whether each read unit data registered in the cache line is valid.