JP6088951B2 - Cache memory system and processor system

Info

Publication number: JP6088951B2
Application number: JP2013196128A
Authority: JP
Inventors: 野口紘希 (Hiroki Noguchi); 藤田忍 (Shinobu Fujita)
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2013-09-20
Filing date: 2013-09-20
Publication date: 2017-03-01
Anticipated expiration: 2033-09-20
Also published as: WO2015041151A1; US20160196210A1; US9740613B2; JP2015060571A

Description

Embodiments of the present invention relate to a cache memory system and a processor system that use a nonvolatile memory.

A cache memory can be accessed faster than a main memory and directly affects the processing performance of a processor, so cache memory capacities are expected to keep growing.

As the capacity of a cache memory grows, the tag information that manages the data in the cache memory also becomes enormous, and it takes longer to determine whether the data requested by the processor is present in the cache memory. When this determination is slow, access to the main memory is delayed as well, which lowers the processing performance of the processor.

Japanese Translation of PCT Publication No. 2002-536715; Japanese Translation of PCT Publication No. 2002-536716; Japanese Translation of PCT Publication No. 2002-536717

The problem to be solved by the present invention is to provide a cache memory system and a processor system that can improve the efficiency of access to a large-capacity cache memory.

The present embodiment provides a cache memory system comprising:
a k-th level cache memory (k = every integer from 1 to n, where n is an integer of 1 or more);
a large-capacity cache memory that uses a nonvolatile memory, has a larger memory capacity than the k-th level cache memory, and can be accessed faster than a main memory; and
a translation lookaside buffer that stores address translation information from a virtual address issued by a processor to a physical address, together with flag information that records, in units of pages holding more data than a cache line (the access unit of the k-th level cache memory), whether data is stored in the large-capacity cache memory.

FIG. 1 is a diagram showing a schematic configuration of a processor system 1 according to a first embodiment of the present invention.
FIG. 2 is a diagram showing the access priority of the TLB 3, the cache memories 4 to 6, and the main memory 7 in the first embodiment.
FIG. 3 is a diagram showing the internal configuration of the TLB 3 in the first embodiment.
FIG. 4 is a diagram showing the internal configuration of the TLB 3 in a set-associative configuration.
FIG. 5 is a flowchart showing the processing procedure when the CPU 2 according to the first embodiment issues a read request address.
FIG. 6 is a block diagram showing a schematic configuration of the processor system 1 according to a second embodiment.
FIG. 7 is a diagram showing the access priority of the TLB 3, the cache memories 4 to 6, and the main memory 7 in the second embodiment.
FIG. 8 is a diagram showing the internal configuration of the TLB 3 in the second embodiment.
FIG. 9 is a flowchart showing the processing procedure when the CPU 2 according to the second embodiment issues a read request address.
FIG. 10 is a block diagram showing a schematic configuration of the processor system 1 according to a third embodiment.
FIG. 11 is a diagram showing the access priority of the TLB 3, the page table 10, the cache memories 4 to 6, and the main memory 7 in the third embodiment.
FIG. 12 is a flowchart showing the processing procedure when the CPU 2 according to the third embodiment issues a read request address.
FIG. 13 is a block diagram showing a schematic configuration of the processor system 1 according to a fourth embodiment.
FIG. 14 is a diagram showing the access priority of the TLB 3, the cache memories, and the main memory 7 in the fourth embodiment.
FIG. 15 is a flowchart showing the processing procedure when the CPU 2 according to the fourth embodiment issues a read request address.

Embodiments of the present invention are described below with reference to the drawings.

(First embodiment)
FIG. 1 is a diagram showing a schematic configuration of a processor system 1 according to the first embodiment of the present invention. The processor system 1 of FIG. 1 includes a processor (CPU) 2, a translation lookaside buffer (TLB) 3, a primary cache memory (L1 cache) 4, a secondary cache memory (L2 cache) 5, a large-capacity cache memory (page mapping cache) 6, and a main memory 7.

The components other than the main memory 7, namely the processor 2, the TLB 3, the L1 cache 4, the L2 cache 5, and the page mapping cache 6, are integrated on a single chip 8, for example. The TLB 3, the L1 cache 4, the L2 cache 5, and the page mapping cache 6 together correspond to a memory system 9.

The L1 cache 4 and the L2 cache 5 are configured from semiconductor memory (for example, SRAM) that can be accessed faster than the main memory 7. The page mapping cache 6 is configured from nonvolatile memory (for example, MRAM) that can be accessed faster than the main memory 7 and has a larger memory capacity than the L1 cache 4 and the L2 cache 5. This specification describes an example in which the page mapping cache 6 is configured from low-power spin-transfer torque MRAM (STT-MRAM).

The TLB 3 stores address translation information from a virtual address issued by the CPU 2 to a physical address, together with flag information that records, in units of pages holding more data than a cache line (the access unit of the k-th level cache memory, where k is every integer from 1 to n and n is an integer of 1 or more), whether data is stored in the page mapping cache 6. Because the CPU 2 accesses the TLB 3 of the present embodiment ahead of the L1 cache 4 and the L2 cache 5, the TLB 3 is configured from high-speed memory (for example, SRAM).

Because the main memory 7 has a larger memory capacity than any memory in the memory system 9, it is placed outside the chip 8 or implemented with package stacking technology, and is configured from DRAM, for example.

FIG. 2 is a diagram showing the access priority of the TLB 3, the cache memories 4 to 6, and the main memory 7 in the first embodiment. As illustrated, the CPU 2 accesses the TLB 3, the L1 cache 4, the L2 cache 5, the page mapping cache 6, and the main memory 7 in that order. Data held in a more frequently accessed memory is also held in the less frequently accessed memories. That is, data in the L1 cache 4 is also stored in the L2 cache 5, data in the L2 cache 5 is also stored in the page mapping cache 6, and data in the page mapping cache 6 is also stored in the main memory 7. The memories 4 to 7 thus maintain a hierarchical relationship, and the TLB 3 holds the address translation information and other information needed to access them.

FIG. 2 shows an example in which the CPU 2 is configured from flip-flops (F/F) and similar circuits built from MOS transistors, the TLB 3, the L1 cache 4, and the L2 cache 5 are configured from SRAM, the page mapping cache 6 is configured from STT-MRAM, and the main memory 7 is configured from DRAM.

FIG. 3 is a diagram showing the internal configuration of the TLB 3 in the first embodiment. The TLB 3 of FIG. 3 stores, for each page, Valid information, Dirty information, virtual address information (VPN: Virtual Page Number), physical address information (PPN: Physical Page Number), flag information (Flag), and cache address information (CPN: Cache Page Number).

The address for which the CPU 2 issues a read request is a virtual address which, as shown in FIG. 3, consists of the virtual address information VPN and a page offset. The TLB 3 translates the virtual address from the CPU 2 into a physical address. As shown in FIG. 3, the translated physical address consists of the physical address information PPN and a page offset. The page offset in the physical address is the same as the page offset in the virtual address requested by the CPU 2.

As shown in FIG. 3, the TLB 3 stores cache address information and uses it to access the page mapping cache 6. As shown in FIG. 3, the cache address consists of the cache page number CPN and a page offset. The page offset in the cache address is the same as the page offset in the virtual address requested by the CPU 2.
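The address handling described above can be sketched in a few lines of Python. This is an illustration only: the 4 KiB page size, the `TlbEntry` class, and the function names are assumptions introduced here and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Tuple

PAGE_OFFSET_BITS = 12  # assumption: 4 KiB pages; the source does not fix a page size


@dataclass
class TlbEntry:
    """One per-page TLB entry with the fields listed for FIG. 3."""
    valid: bool
    dirty: bool
    vpn: int    # virtual page number
    ppn: int    # physical page number
    flag: bool  # True if the page is held in the page mapping cache
    cpn: int    # cache page number inside the page mapping cache


def split_virtual_address(vaddr: int) -> Tuple[int, int]:
    """Return (VPN, page offset) for a virtual address."""
    return vaddr >> PAGE_OFFSET_BITS, vaddr & ((1 << PAGE_OFFSET_BITS) - 1)


def physical_address(entry: TlbEntry, vaddr: int) -> int:
    """Replace the VPN with the PPN; the page offset is carried over unchanged."""
    vpn, offset = split_virtual_address(vaddr)
    assert entry.valid and entry.vpn == vpn, "entry does not translate this address"
    return (entry.ppn << PAGE_OFFSET_BITS) | offset


def cache_address(entry: TlbEntry, vaddr: int) -> int:
    """Replace the VPN with the CPN; the page offset is carried over unchanged."""
    _, offset = split_virtual_address(vaddr)
    assert entry.flag, "page is not resident in the page mapping cache"
    return (entry.cpn << PAGE_OFFSET_BITS) | offset
```

Both derived addresses reuse the low-order offset bits of the virtual address, which is why a single TLB lookup is enough to reach either the main memory or the page mapping cache.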

As shown in FIG. 3, when the TLB 3 contains cache address information, the page mapping cache 6 can be accessed with that information, which improves access efficiency. However, the larger the memory capacity (number of page entries) of the page mapping cache 6, the more cache address information must be stored in the TLB 3, so the TLB 3 grows and takes longer to search. When the page mapping cache 6 has a large memory capacity, the cache address information may therefore be removed from the TLB 3 to reduce the amount of information held in the TLB 3. In that case, however, the page mapping cache 6 must be accessed with the physical address from the TLB 3, so access takes longer than when the TLB 3 contains cache address information.

When the operating system (OS) task (process) executed by the CPU 2 is switched, the information in the TLB 3 must be rewritten (flushed). This is because the mapping between virtual addresses and physical addresses differs from task to task, so the same virtual address corresponds to different physical addresses. Consequently, every page entry in the TLB 3 must be invalidated when the task is switched. This is not a serious problem when the TLB 3 is small, but when the TLB 3 is large, updating it takes time and delays processing by the CPU 2. To eliminate this delay, an address space ID (ASID) that identifies the virtual address space of each task can be provided, and the page information can be stored in the TLB 3 in advance for each address space ID. The TLB 3 then no longer needs to be flushed every time the task is switched.
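As a rough illustration of the ASID idea above, the following Python sketch keys each entry by an (ASID, VPN) pair so that a task switch only changes the active ASID. The class and method names are hypothetical, and capacity limits and replacement are deliberately left out.

```python
class AsidTaggedTlb:
    """Toy TLB whose entries are tagged with an address space ID (ASID)."""

    def __init__(self):
        self._entries = {}     # (asid, vpn) -> ppn
        self.current_asid = 0

    def switch_task(self, asid):
        # Only the active ASID changes; no entry is invalidated.
        self.current_asid = asid

    def insert(self, vpn, ppn):
        self._entries[(self.current_asid, vpn)] = ppn

    def lookup(self, vpn):
        # Returns the PPN for the current task, or None on a miss.
        return self._entries.get((self.current_asid, vpn))
```

Two tasks can map the same VPN to different PPNs without interfering with each other, and the mappings of a suspended task are still present when it resumes.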

When the capacity of the page mapping cache 6 increases, the number of entries in the TLB 3 also increases, which slows down searches of the TLB 3. When the TLB 3 has many entries, it is therefore desirable to reduce the search delay by giving the TLB 3 a multi-level hierarchical structure, or a set-associative configuration indexed by some bits of the virtual address information VPN (for example, the lower 10 bits).

FIG. 4 is a diagram showing the internal configuration of the TLB 3 in a set-associative configuration. The TLB 3 of FIG. 4 uses some bits of the virtual address information VPN as an index and has a plurality of ways. The VPN bits used as the set-associative index (for example, the lower 10 bits) are common to all entries in the same set, whereas the remaining VPN bits differ from way to way. The physical address information PPN output by the TLB 3 therefore differs for each way.

In the TLB 3 of FIG. 4, part of the virtual address for which the CPU 2 issued a read request selects a set in the TLB 3. If the remaining part of the virtual address matches the virtual address information VPN held by one of the ways in the selected set, the corresponding physical address information PPN is output.
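The set selection and way comparison described above can be sketched as follows. The 10 index bits follow the example in the text. The number of ways, the FIFO replacement, and all names are assumptions made for illustration.

```python
INDEX_BITS = 10  # lower 10 bits of the VPN, as in the example in the text
NUM_WAYS = 4     # assumption: the source does not state the number of ways


class SetAssociativeTlb:
    def __init__(self):
        # sets[index] holds up to NUM_WAYS (vpn_tag, ppn) pairs
        self.sets = [[] for _ in range(1 << INDEX_BITS)]

    @staticmethod
    def _split(vpn):
        """Return (set index, remaining VPN bits used as the tag)."""
        return vpn & ((1 << INDEX_BITS) - 1), vpn >> INDEX_BITS

    def insert(self, vpn, ppn):
        index, tag = self._split(vpn)
        ways = self.sets[index]
        ways[:] = [w for w in ways if w[0] != tag]  # replace an existing mapping
        if len(ways) >= NUM_WAYS:
            ways.pop(0)  # assumption: evict the oldest way (FIFO)
        ways.append((tag, ppn))

    def lookup(self, vpn):
        index, tag = self._split(vpn)
        # Only the ways of one set are compared, not the whole TLB.
        for way_tag, ppn in self.sets[index]:
            if way_tag == tag:
                return ppn
        return None
```

A lookup compares at most `NUM_WAYS` tags regardless of how many entries the TLB holds in total, which is the source of the reduced search delay.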

FIG. 5 is a flowchart showing the processing procedure when the CPU 2 according to the first embodiment issues a read request address. First, it is determined whether the read request address issued by the CPU 2 hits the virtual address information VPN in the TLB 3 (step S1). On a miss, the address translation information is loaded from a page table entry (PTE, not shown) in the main memory 7 and the information in the TLB 3 is updated (step S2). Steps S1 and S2 correspond to the first process.

When a hit is determined in step S1, or when step S2 has finished, it is determined whether the read request address issued by the CPU 2 hits the tag information in the L1 cache 4 (step S3). On a hit, the corresponding data stored in the L1 cache 4 is read and transferred to the CPU 2, and the processing of FIG. 5 ends (step S4). Note that when the index of the L1 cache 4 is formed from address bits within the page, the tag memory of the L1 cache 4 can be accessed speculatively in parallel with the first process, but the hit determination must wait until the first process has finished.
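The condition mentioned above, that the L1 index is formed from address bits within the page, can be checked with simple arithmetic: the bytes covered by one way (number of sets times line size) must not exceed the page size. The sketch below illustrates this. The function name and the cache sizes in the usage lines are assumptions, not values from the specification.

```python
def index_fits_in_page_offset(cache_bytes, ways, page_bytes):
    """True if the set index and line offset use only page-offset bits.

    One way spans (number of sets) * (line size) bytes. If that span is no
    larger than a page, the index bits are identical in the virtual and
    physical address, so the tag memory can be read before translation ends.
    """
    bytes_per_way = cache_bytes // ways
    return bytes_per_way <= page_bytes


# Hypothetical configurations with 4 KiB pages:
small_l1 = index_fits_in_page_offset(32 * 1024, 8, 4096)  # 4 KiB per way
large_l1 = index_fits_in_page_offset(64 * 1024, 4, 4096)  # 16 KiB per way
```

Even when the check passes, the tag comparison itself needs the physical page number, which is why the hit determination has to wait for the first process.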

When a miss is determined in step S3, it is determined whether the read request address issued by the CPU 2 hits the tag information in the L2 cache 5 (step S5). On a hit, the data stored in the L2 cache 5 is read and transferred to the CPU 2, and the processing of FIG. 5 ends (step S6). Steps S3 to S6 correspond to the second process.

When a miss is determined in step S5, it is determined, based on the flag information held in the TLB 3, whether the data corresponding to the read request address issued by the CPU 2 is stored in the page mapping cache 6 (step S7). If it is stored, the page of data corresponding to this address is read from the page mapping cache 6 and transferred to the CPU 2, and the cache line of data corresponding to this address is transferred to the L1 cache 4 and the L2 cache 5 (step S8). Steps S7 and S8 correspond to the third process.

When it is determined in step S7 that the data is not stored, the data corresponding to the read request address issued by the CPU 2 is read from the main memory 7 and transferred to the CPU 2, the page of data corresponding to this address is transferred to the page mapping cache 6, the cache line of data corresponding to this address is transferred to the L1 cache 4 and the L2 cache 5, and the TLB 3 is updated (step S9). Step S9 corresponds to the fourth process.
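The read procedure of steps S1 to S9 can be summarized in a small behavioral model. This is a sketch under several assumptions that are not in the specification: 4 KiB pages, 64-byte lines, unbounded dictionaries in place of real caches (so no eviction or write-back), and hypothetical class and field names.

```python
LINE_SIZE = 64    # assumption: cache line size in bytes
PAGE_SIZE = 4096  # assumption: page size in bytes


class FirstEmbodimentModel:
    def __init__(self, page_table, main_memory):
        self.page_table = page_table    # vpn -> ppn, stands in for the PTEs in main memory
        self.main_memory = main_memory  # physical line address -> data
        self.tlb = {}                   # vpn -> {"ppn": int, "flag": bool}
        self.l1 = {}                    # physical line address -> data
        self.l2 = {}
        self.page_cache = {}            # ppn -> {line address: data}

    def read(self, vaddr):
        """Return (data, name of the memory that supplied it)."""
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in self.tlb:                                   # S1: TLB miss
            self.tlb[vpn] = {"ppn": self.page_table[vpn], "flag": False}  # S2
        entry = self.tlb[vpn]
        base = entry["ppn"] * PAGE_SIZE
        line = base + (offset // LINE_SIZE) * LINE_SIZE
        if line in self.l1:                                       # S3
            return self.l1[line], "L1"                            # S4
        if line in self.l2:                                       # S5
            return self.l2[line], "L2"                            # S6
        if entry["flag"]:                                         # S7: flag in the TLB
            data = self.page_cache[entry["ppn"]][line]            # S8
            source = "page mapping cache"
        else:                                                     # S9
            self.page_cache[entry["ppn"]] = {
                a: self.main_memory.get(a, 0)
                for a in range(base, base + PAGE_SIZE, LINE_SIZE)
            }
            entry["flag"] = True                                  # TLB update
            data = self.main_memory.get(line, 0)
            source = "main memory"
        self.l1[line] = self.l2[line] = data                      # line fill
        return data, source
```

In this model the presence check for the page mapping cache is a single flag read from the TLB entry that was already fetched in steps S1 and S2, so no separate tag lookup is needed in step S7.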

As described above, the first embodiment provides the page mapping cache 6, which has a larger capacity than the L1 cache 4 and the L2 cache 5 and can be accessed faster than the main memory 7, and stores the tag information of the page mapping cache 6 in the existing TLB 3 in units of pages. Storing the tag information in the TLB 3 in units of pages requires less information than storing it in units of cache lines, as is done for the L1 cache 4 and the L2 cache 5, and removes the need for a dedicated tag memory for the page mapping cache 6. That is, according to the present embodiment, the tag information of the large-capacity, high-speed page mapping cache 6 can be stored in the existing TLB 3.
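To give a feel for the reduction in tag information, the following sketch counts how many units must be tracked at page granularity and at cache-line granularity. The 64 MiB capacity, 4 KiB page size, and 64-byte line size are illustrative assumptions. The specification does not give concrete sizes.

```python
def tracked_units(capacity_bytes, unit_bytes):
    """Number of entries needed to track a cache at the given granularity."""
    return capacity_bytes // unit_bytes


CAPACITY = 64 * 1024 * 1024  # assumption: 64 MiB page mapping cache
PAGE_BYTES = 4096            # assumption
LINE_BYTES = 64              # assumption

page_entries = tracked_units(CAPACITY, PAGE_BYTES)  # entries at page granularity
line_entries = tracked_units(CAPACITY, LINE_BYTES)  # entries at line granularity
reduction = line_entries // page_entries            # equals PAGE_BYTES // LINE_BYTES
```

Under these assumptions, page-granularity tracking needs 16,384 entries where line-granularity tracking would need 1,048,576, a factor of 64.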

In the present embodiment, the L1 cache 4 and the L2 cache 5 are accessed ahead of the page mapping cache 6, so they can be accessed quickly. Furthermore, data that does not fit in the L1 cache 4 and the L2 cache 5 is stored in the large-capacity, high-speed page mapping cache 6, so data can be read and written faster than by accessing the main memory 7.

In the present embodiment, the TLB 3 also holds cache address information for the page mapping cache 6, so when an access misses in the L2 cache 5, the desired data can be read quickly from the page mapping cache 6 by using this cache address information.

(Second embodiment)
The second embodiment described below parallelizes access to the L2 cache 5 and the page mapping cache 6.

The present embodiment is particularly effective when the access latency of the page mapping cache 6 is low enough to rival that of the L2 cache 5, or when the memory capacity of the page mapping cache 6 is several times to several tens of times that of the L2 cache 5.

The page mapping cache 6 and the L2 cache 5 store data of different physical addresses. That is, the page mapping cache 6 and the L2 cache 5 hold data exclusively of each other.

The page mapping cache 6 of the present embodiment stores data for which accesses occur frequently across the entire page. The L2 cache 5, by contrast, stores the data of a line when accesses occur frequently only to that particular line within a page.

In this way, the present embodiment dynamically switches, within a single page, whether data is stored in the page mapping cache 6 or in the L2 cache 5.

FIG. 6 is a block diagram showing a schematic configuration of the processor system 1 according to the second embodiment. The processor system 1 of FIG. 6 differs from that of FIG. 1 in that the CPU 2 accesses the L2 cache 5 and the page mapping cache 6 in parallel.

FIG. 7 is a diagram showing the access priority of the TLB 3, the cache memories 4 to 6, and the main memory 7 in the second embodiment. As illustrated, the CPU 2 accesses the TLB 3 and then the L1 cache 4, accesses the L2 cache 5 and the page mapping cache 6 in parallel after the L1 cache 4, and then accesses the main memory 7.

FIG. 8 is a diagram showing the internal configuration of the TLB 3 in the second embodiment. In addition to the configuration of the TLB 3 of FIG. 3, the TLB 3 of FIG. 8 has an access map for each page. The access map has, for example, one bit for every line in the page. When data is stored in the L2 cache 5, the bit of the corresponding line is set to 1, for example. When the number of bits set to 1 among all the bits for one page in the access map exceeds a predetermined threshold, that page is stored in the page mapping cache 6 and the corresponding data in the L2 cache 5 is invalidated.
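The per-page access map and its threshold test can be sketched as a bitmap. The 64 lines per page and the threshold of 32 are assumptions chosen for illustration, as are the class and method names. The specification only says that the threshold is predetermined.

```python
LINES_PER_PAGE = 64  # assumption: 4 KiB page with 64-byte lines
THRESHOLD = 32       # assumption: the source only says "predetermined"


class AccessMap:
    """One bit per line of a page; a bit is set when that line is placed in L2."""

    def __init__(self):
        self.bits = 0

    def mark_line_in_l2(self, line_index):
        if not 0 <= line_index < LINES_PER_PAGE:
            raise ValueError("line index outside the page")
        self.bits |= 1 << line_index

    def count(self):
        # Population count of the bitmap.
        return bin(self.bits).count("1")

    def should_move_to_page_cache(self):
        # The page is promoted once more than THRESHOLD lines are marked.
        return self.count() > THRESHOLD
```

Marking the same line twice does not change the count, so the test reflects how widely the page is touched rather than how often a single line is accessed.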

Like the TLB 3 of FIG. 3, the TLB 3 of FIG. 8 has cache address information for accessing the page mapping cache 6, but this cache address information is not essential. When the page mapping cache 6 has many entries, the TLB 3 may be given a set-associative configuration. In addition, the cache address information is unnecessary while the data is stored in the L2 cache 5, and conversely the access map is unnecessary while the data is stored in the page mapping cache 6. The bits for the access map and the bits for the cache address information in the TLB 3 can therefore be shared, which saves capacity in the TLB 3.

FIG. 9 is a flowchart showing the processing procedure when the CPU 2 according to the second embodiment issues a read request address. Steps S11 to S14 are the same as steps S1 to S4 of FIG. 5. When a miss in the L1 cache 4 is determined in step S13, it is determined, based on the flag information held in the TLB 3, whether the data corresponding to the read request address is stored in the page mapping cache 6 (step S15). If it is stored, the page of data corresponding to this address is read from the page mapping cache 6 and transferred to the CPU 2, and the cache line of data corresponding to this address is transferred to the L1 cache 4 (step S16). Steps S11 and S12 correspond to the first process, steps S13 and S14 to the second process, and steps S15 and S16 to the third process.

When it is determined in step S15 that the data is not stored, it is determined whether the read request address issued by the CPU 2 hits the tag information in the L2 cache 5 (step S17). On a hit, the data stored in the L2 cache 5 is read and transferred to the CPU 2 (step S18). Steps S17 and S18 correspond to the fourth process. Note that the information needed for step S15 has already been read from the TLB 3 when the TLB 3 was accessed in step S11, so the timing of the access to the L2 cache 5 is not delayed compared with a memory system that has no page mapping cache 6.

When a miss is determined in step S17, the data corresponding to the read request address issued by the CPU 2 is read from the main memory 7 and transferred to the CPU 2, the page of data corresponding to this address is transferred to the page mapping cache 6, and the cache line of data corresponding to this address is transferred to the L1 cache 4 and the L2 cache 5 (step S19). Step S19 corresponds to the fifth process.

Next, the entry for the corresponding page in the access map in the TLB 3 is checked (step S20). That is, when the data read from the main memory 7 has been written to the L2 cache 5 and the access map in the TLB 3 has been updated, it is checked whether the number of bits set to 1 for the corresponding page in the access map exceeds the threshold (steps S20 and S21).

When it is determined that the threshold is exceeded, the data of all lines in the corresponding page is transferred from the L2 cache 5 and the main memory 7 to the page mapping cache 6, the data in the L2 cache 5 for all lines in the corresponding page is invalidated, and the TLB 3 is updated. At this time, data evicted from the page mapping cache 6 is written back to the main memory 7 as necessary. In addition, the data corresponding to the read request address issued by the CPU 2 is transferred to the L1 cache 4 (step S22). Steps S20 to S22 correspond to the sixth process.

When it is determined in step S20 that the threshold is not exceeded, the data corresponding to the read request address issued by the CPU 2 is transferred to the L1 cache 4 and the L2 cache 5 (step S23). Step S23 corresponds to the seventh process.
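The read procedure of FIG. 9 can be approximated by the following behavioral model. It is a simplified reading rather than a faithful implementation: translation (steps S11 and S12) is omitted and physical addresses are used directly, the page transfer in step S19 is folded into the promotion of step S22 so that the L2 cache and the page mapping cache stay exclusive, dictionaries stand in for the memories, and the tiny threshold and all names are assumptions.

```python
LINE = 64      # assumption: cache line size in bytes
PAGE = 4096    # assumption: page size in bytes
THRESHOLD = 2  # assumption: kept tiny so the example is easy to follow


class SecondEmbodimentModel:
    def __init__(self, main_memory):
        self.main_memory = main_memory  # physical line address -> data
        self.l1 = {}
        self.l2 = {}
        self.page_cache = {}  # ppn -> {line address: data}; presence acts as the flag
        self.access_map = {}  # ppn -> set of line indices placed in L2

    def read(self, paddr):
        """Return (data, name of the memory that supplied it)."""
        ppn, offset = divmod(paddr, PAGE)
        line = paddr - paddr % LINE
        if line in self.l1:                        # S13
            return self.l1[line], "L1"             # S14
        if ppn in self.page_cache:                 # S15: flag check
            data = self.page_cache[ppn][line]      # S16
            self.l1[line] = data
            return data, "page mapping cache"
        if line in self.l2:                        # S17
            return self.l2[line], "L2"             # S18
        data = self.main_memory.get(line, 0)       # S19: read from main memory
        marked = self.access_map.setdefault(ppn, set())
        marked.add(offset // LINE)                 # S20: update the access map
        if len(marked) > THRESHOLD:                # S21
            base = ppn * PAGE                      # S22: promote the whole page
            self.page_cache[ppn] = {
                # Take each line from L2 if present (invalidating it there),
                # otherwise from main memory.
                a: self.l2.pop(a, self.main_memory.get(a, 0))
                for a in range(base, base + PAGE, LINE)
            }
            del self.access_map[ppn]               # the map is no longer needed
            self.l1[line] = data
        else:                                      # S23
            self.l1[line] = self.l2[line] = data
        return data, "main memory"
```

With the threshold set to 2, the third distinct line fetched from a page moves the whole page into the page mapping cache and removes that page's lines from the L2 cache, while a page touched in only one or two lines stays line-managed in the L2 cache.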

As described above, the second embodiment parallelizes access to the L2 cache 5 and the page mapping cache 6, so whether data is stored in the L2 cache 5 or in the page mapping cache 6 can be switched depending on whether accesses occur frequently across the entire corresponding page or are concentrated on particular lines within it. The L2 cache 5 and the page mapping cache 6 can therefore each be used where it is most efficient.

(Third embodiment)
The third embodiment described below provides a page table separately from the TLB 3. As the number of entries in the page mapping cache 6 increases, the TLB 3 may no longer be able to hold the address translation information, flag information, and other information for all entries. In the present embodiment, the information that does not fit in the TLB 3 is therefore stored in the page table.

FIG. 10 is a block diagram showing a schematic configuration of the processor system 1 according to the third embodiment. Compared with FIG. 1, the processor system 1 of FIG. 10 additionally places a page table 10 between the L2 cache 5 and the page mapping cache 6. The page table 10 stores the address translation information, flag information, and other information that could not be stored in the TLB 3. The page table 10 therefore has basically the same internal configuration as the TLB 3. Like the page mapping cache 6, the page table 10 is configured from memory (for example, MRAM) that can be accessed faster than the main memory 7.

FIG. 11 is a diagram showing the access priorities of the TLB 3, the page table 10, the cache memories 4 to 6, and the main memory 7 in the third embodiment. As illustrated, the CPU 2 accesses the TLB 3, the L1 cache 4, the L2 cache 5, the page table 10, the page mapping cache 6, and the main memory 7, in that order.

When the read request address of the CPU 2 does not hit the TLB 3, the page table 10 is searched before the main memory 7 is accessed. If the address hits the page table 10, the address translation information can be loaded without accessing the main memory 7, and, if the requested data is in neither the L1 cache 4 nor the L2 cache 5, the corresponding data is retrieved from the page mapping cache 6. This reduces the frequency of accesses to the main memory 7.

As described above, the page table 10 has basically the same internal configuration as the TLB 3 and desirably holds cache address information for directly accessing the page mapping cache 6. When the page mapping cache 6 has a large number of entries, the page table 10 is desirably given a set-associative configuration. In addition, an address space ID (ASID) may be assigned to each task and used to manage the address translation information and the like, so that the entire page table 10 does not have to be invalidated and updated when the CPU 2 switches the task it is executing.
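The set-associative, ASID-tagged organization suggested for the page table 10 can be illustrated with a minimal Python sketch. The class name `AsidPageTable`, the set and way counts, the FIFO replacement, and the stored cache address values are all assumptions made for illustration; the embodiment does not specify any of them.

```python
class AsidPageTable:
    """Toy set-associative page table whose entries are tagged with an ASID.

    Assumptions (not from the embodiment): sets are indexed by the low bits of
    the virtual page number, and the oldest entry in a full set is replaced.
    """

    def __init__(self, num_sets=4, ways=2):
        self.num_sets = num_sets
        self.ways = ways
        # Each set is a list of (asid, vpn, cache_addr) tuples, oldest first.
        self.sets = [[] for _ in range(num_sets)]

    def lookup(self, asid, vpn):
        """Return the page mapping cache address on a hit, or None on a miss."""
        for e_asid, e_vpn, cache_addr in self.sets[vpn % self.num_sets]:
            if e_asid == asid and e_vpn == vpn:
                return cache_addr
        return None

    def insert(self, asid, vpn, cache_addr):
        """Register an entry, evicting the oldest entry if the set is full."""
        entries = self.sets[vpn % self.num_sets]
        # Drop any stale entry for the same (asid, vpn) pair.
        entries[:] = [e for e in entries if (e[0], e[1]) != (asid, vpn)]
        if len(entries) >= self.ways:
            entries.pop(0)  # assumed FIFO replacement
        entries.append((asid, vpn, cache_addr))
```

Because every entry carries its own ASID, two tasks can map the same virtual page number to different cache addresses at the same time, and a task switch only changes the ASID used for lookups instead of clearing the table.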

FIG. 12 is a flowchart showing the processing procedure when the CPU 2 according to the third embodiment issues a read request address. Steps S31 to S36 are the same as steps S1 to S6 in FIG. 5. Steps S31 and S32 correspond to the first process. Steps S33 to S36 correspond to the second process.

If it is determined in step S35 that the L2 cache 5 missed, it is determined whether the read request address of the CPU 2 hits the page table 10 (step S37). If it hits, the corresponding data is read from the page mapping cache 6 and transferred to the CPU 2, and the data for the cache line corresponding to the read request address is transferred to the L1 cache 4 and the L2 cache 5 (step S38). Steps S37 and S38 correspond to the third process.

If a miss is determined in step S37, the data corresponding to the read request address issued by the CPU 2 is read from the main memory 7 and transferred to the CPU 2. In addition, the page of data corresponding to this address is transferred to the page mapping cache 6, the cache line of data corresponding to this address is transferred to the L1 cache 4 and the L2 cache 5, and the TLB 3 and the page table 10 are updated (step S39). Step S39 corresponds to the fourth process.
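The read procedure of FIG. 12 can be sketched as a small Python model. This is only an illustration under stated assumptions: the class name `ThirdEmbodimentModel`, the page and line sizes, the identity mapping between virtual and physical addresses, and the unbounded dictionaries standing in for the caches are all hypothetical, and no replacement policy is modeled.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache-line size in bytes


class ThirdEmbodimentModel:
    """Toy model of the read flow of FIG. 12 (steps S31 to S39)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # line address -> data
        self.tlb = set()                # pages with an entry in the TLB 3
        self.page_table = set()         # pages registered in the page table 10
        self.l1 = {}                    # line address -> data
        self.l2 = {}                    # line address -> data
        self.page_cache = {}            # page -> {line address: data}

    def read(self, addr):
        line = addr - addr % LINE_SIZE
        page = addr // PAGE_SIZE
        # S31/S32: on a TLB miss, load the translation info and update the TLB.
        if page not in self.tlb:
            self.tlb.add(page)
        # S33/S34: L1 lookup.
        if line in self.l1:
            return self.l1[line], "L1"
        # S35/S36: L2 lookup (any fill into L1 on an L2 hit is omitted here).
        if line in self.l2:
            return self.l2[line], "L2"
        # S37/S38: page table hit, so the page is in the page mapping cache.
        if page in self.page_table:
            data = self.page_cache[page][line]
            self.l1[line] = data
            self.l2[line] = data
            return data, "page mapping cache"
        # S39: read from main memory, fill the page and the line, update tables.
        data = self.main_memory[line]
        base = page * PAGE_SIZE
        self.page_cache[page] = {
            a: self.main_memory[a]
            for a in range(base, base + PAGE_SIZE, LINE_SIZE)
        }
        self.l1[line] = data
        self.l2[line] = data
        self.tlb.add(page)
        self.page_table.add(page)
        return data, "main memory"
```

In this model the first read of a page is served from main memory and pulls the whole page into the page mapping cache, so a later read of a different line in the same page is served from the page mapping cache without touching main memory.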

As described above, the third embodiment provides the page table 10 separately from the TLB 3. Even if the number of entries in the page mapping cache 6 grows until the address translation information and the like no longer fit in the TLB 3, they can be stored in the page table 10, which makes it possible to accommodate a larger page mapping cache 6.

(Fourth embodiment)
In the first to third embodiments described above, the CPU 2 first accesses the TLB 3 and then accesses the cache memories 4 to 6 and the main memory 7 in order. Consequently, as the capacity of the TLB 3 grows, searching the TLB 3 takes longer and the L1 cache 4 can no longer be accessed quickly. In the fourth embodiment described below, the CPU 2 therefore accesses the L1 cache 4 before the TLB 3.

FIG. 13 is a block diagram showing a schematic configuration of the processor system 1 according to the fourth embodiment, and FIG. 14 is a diagram showing the access priorities of the TLB 3, the cache memories, and the main memory 7 in the fourth embodiment. Compared with FIGS. 6 and 7, this embodiment swaps the L1 cache 4 and the TLB 3.

When the CPU 2 issues a read request address, it first accesses the L1 cache 4. The L1 cache 4 in FIG. 13 can be accessed directly with the read request address, which is a virtual address issued by the CPU 2. If the L1 cache 4 misses, the CPU 2 then accesses the TLB 3.

When the L1 cache 4 is accessed with a virtual address, as in this embodiment, the entire L1 cache 4 must be invalidated (flushed) whenever the CPU 2 switches tasks. However, because the data stored in the L1 cache 4 is also stored in either the L2 cache 5 or the page mapping cache 6, there is almost no need to access the main memory 7, and the address space can be switched quickly.

FIG. 15 is a flowchart showing the processing procedure when the CPU 2 according to the fourth embodiment issues a read request address. Compared with the flowchart of FIG. 9, the flowchart of FIG. 15 swaps the determination processes of steps S11 and S13 of FIG. 9. That is, it is determined whether the read request address issued by the CPU 2 hits the L1 cache 4 (step S41); if it hits, the data read from the L1 cache 4 is transferred to the CPU 2 (step S42). If it does not hit, it is determined whether the read request address issued by the CPU 2 hits the TLB 3 (step S43); if it does not, the address translation information is loaded from the page table entry in the main memory 7 and the data in the TLB 3 is updated (step S44). Steps S41 and S42 correspond to the first process. Steps S43 and S44 correspond to the second process.

If it is determined in step S43 that the address is stored in the TLB 3, or when the processing of step S44 has finished, whether the data corresponding to the read request address issued by the CPU 2 is stored in the page mapping cache 6 is determined based on the flag information held in the TLB 3 (step S45). Thereafter, the same processing as in step S17 and the subsequent steps of FIG. 9 is performed (steps S46 to S53). Steps S45 and S46 correspond to the third process. Steps S47 and S48 correspond to the fourth process. Step S49 corresponds to the fifth process. Steps S50 to S52 correspond to the sixth process. Step S53 corresponds to the seventh process.
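The flow of FIG. 15 can be sketched in Python as follows, with the first to seventh processes taken from the corresponding claim wording. This is a simplified model, not the implementation: the names `FourthEmbodimentModel`, `TlbEntry`, and `THRESHOLD`, the page and line sizes, the identity mapping between virtual and physical addresses, and the choice to fill both L1 and L2 in the seventh process are assumptions, and cache capacity and replacement are not modeled.

```python
PAGE_SIZE = 4096  # assumed page size in bytes
LINE_SIZE = 64    # assumed cache-line size in bytes
THRESHOLD = 2     # assumed value of the predetermined threshold


class TlbEntry:
    def __init__(self):
        self.in_page_cache = False  # flag information for this page
        self.access_map = set()     # lines of this page currently held in L2


class FourthEmbodimentModel:
    """Toy model of FIG. 15: the L1 cache is looked up before the TLB."""

    def __init__(self, main_memory):
        self.mem = main_memory  # line address -> data
        self.l1 = {}
        self.l2 = {}
        self.page_cache = {}    # page -> {line address: data}
        self.tlb = {}           # page -> TlbEntry

    def flush_l1(self):
        """Invalidate the whole virtually addressed L1 on a task switch."""
        self.l1.clear()

    def read(self, addr):
        line = addr - addr % LINE_SIZE
        page = addr // PAGE_SIZE
        # S41/S42 (first process): L1 lookup with the virtual address.
        if line in self.l1:
            return self.l1[line], "L1"
        # S43/S44 (second process): TLB lookup, loading an entry on a miss.
        entry = self.tlb.setdefault(page, TlbEntry())
        # S45/S46 (third process): the flag says the page is cached as a whole.
        if entry.in_page_cache:
            data = self.page_cache[page][line]
            self.l1[line] = data
            return data, "page mapping cache"
        # S47/S48 (fourth process): higher-order cache lookup.
        if line in self.l2:
            return self.l2[line], "L2"
        # S49 (fifth process): read from main memory.
        data = self.mem[line]
        if len(entry.access_map) > THRESHOLD:
            # S50 to S52 (sixth process): move the whole page to the page
            # mapping cache and invalidate its lines in L2.
            base = page * PAGE_SIZE
            self.page_cache[page] = {
                a: self.mem[a] for a in range(base, base + PAGE_SIZE, LINE_SIZE)
            }
            for a in entry.access_map:
                self.l2.pop(a, None)
            entry.access_map.clear()
            entry.in_page_cache = True
            self.l1[line] = data
        else:
            # S53 (seventh process): keep caching this page line by line.
            self.l1[line] = data
            self.l2[line] = data
            entry.access_map.add(line)
        return data, "main memory"
```

Calling `flush_l1()` and then re-reading an address shows the point made for task switches: the data comes back from the L2 cache or the page mapping cache rather than from main memory.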

In this embodiment as well, an address space ID (ASID) may be provided in the TLB 3 to manage the address translation information and the like for each task.

FIG. 13 shows an example in which the L2 cache 5 and the page mapping cache 6 are arranged in parallel. The access order of the L1 cache 4 and the TLB 3 may also be swapped when the L2 cache 5 and the page mapping cache 6 are not arranged in parallel, as in FIG. 1 and FIG. 10.

As described above, in the fourth embodiment, the L1 cache 4 is accessed before the TLB 3. The L1 cache 4 can therefore be accessed quickly even when the TLB 3 has a large capacity and takes time to access.

Note that not only the L1 cache 4 but also the L2 cache 5 may be accessed before the TLB 3.

In the embodiments described above, an example with two levels of cache memory, the L1 cache 4 and the L2 cache 5, has been described, but three or more levels of cache memory may be provided. When k-th order cache memories (k = all integers from 1 to n, where n is an integer of 1 or more) are provided, in the processing of FIG. 5 the page mapping cache 6 is accessed when all of the cache memories miss. In the processing of FIG. 9, the processing from step S19 onward is performed when, after a miss in step S15, all of the cache memories of the L2 cache 5 and higher orders miss. In the processing of FIG. 12, the processing of step S37 is performed after all of the cache memories miss. In the processing of FIG. 15, the processing from step S49 onward is performed when, after a miss in step S45, all of the cache memories of the L2 cache 5 and higher orders miss.
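The generalization to n cache levels for the serial flow of FIG. 5 can be pictured with a short Python sketch. The function name `read_n_levels` and the use of plain dictionaries keyed by line address are assumptions for illustration; the TLB lookup and all fill and replacement behavior are deliberately left out.

```python
def read_n_levels(caches, page_mapping_cache, main_memory, line):
    """Look up `line` in L1..Ln in order, then the page mapping cache.

    caches: list of dicts (index 0 is L1, index n-1 is Ln), n >= 1.
    Returns (data, name of the level that served the request).
    """
    # Check the k-th order caches from the lowest order upward.
    for level, cache in enumerate(caches, start=1):
        if line in cache:
            return cache[line], "L%d" % level
    # Only when every cache level misses is the page mapping cache accessed.
    if line in page_mapping_cache:
        return page_mapping_cache[line], "page mapping cache"
    # Last resort: main memory.
    return main_memory[line], "main memory"
```

With `caches` holding a single dictionary the function reduces to the n = 1 case, and with two dictionaries it follows the L1, L2, page mapping cache, main memory order of the two-level embodiments.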

In each of the embodiments described above, an example in which the memory cells of the L2 cache 5 are MRAM cells has been described, but the cache may be composed of another nonvolatile memory (for example, ReRAM memory cells, phase-change RAM (PRAM or PCM) memory cells, or NAND flash memory cells). In each of the embodiments described above, part or all of the memory control circuit 1 may be built into the L1 cache 4 or the L2 cache 5. Furthermore, in each of the embodiments described above, when power to a specific memory is shut off, power may be shut off all at once to some or all of those memories, among the n-th order cache memories (n is an integer of 2 or more) other than the specific memory and the main memory 7, whose power can be shut off. Alternatively, the power-off timing may be controlled individually for each nonvolatile memory among the n-th order cache memories, including the specific memory.

At least part of the processor system 1 described in the above embodiments may be implemented in hardware or in software. When implemented in software, a program that realizes at least some of the functions of the processor system 1 may be stored on a recording medium such as a flexible disk or a CD-ROM and read and executed by a computer. The recording medium is not limited to removable media such as magnetic disks and optical disks, and may be a fixed recording medium such as a hard disk device or a memory.

A program that realizes at least some of the functions of the processor system 1 may also be distributed via a communication line (including wireless communication) such as the Internet. Furthermore, the program may be distributed in an encrypted, modulated, or compressed state, either via a wired or wireless line such as the Internet or stored on a recording medium.

Aspects of the present invention are not limited to the individual embodiments described above and include various modifications that those skilled in the art could conceive of, and the effects of the present invention are likewise not limited to those described above. That is, various additions, changes, and partial deletions are possible without departing from the conceptual idea and spirit of the present invention derived from the content defined in the claims and its equivalents.

1: processor system, 2: CPU, 3: TLB, 4: L1 cache, 5: L2 cache, 6: page mapping cache, 7: main memory, 9: memory system, 10: page table

Claims

A cache memory system comprising:
a k-th order cache memory (k = all integers from 1 to n, where n is an integer of 1 or more);
a large-capacity cache memory using a nonvolatile memory, having a larger memory capacity than the k-th order cache memory and accessible at a higher speed than a main memory; and
a translation lookaside buffer that stores address translation information from a virtual address issued by a processor to a physical address, and flag information that records whether data is stored in the large-capacity cache memory in units of pages, each page holding more data than a cache line that is the access unit of the k-th order cache memory.

The cache memory system according to claim 1, wherein the translation lookaside buffer is accessed by a processor prior to the k-th order cache memory.

The cache memory system according to claim 2, wherein the k-th cache memory is accessed by a processor in preference to the large-capacity cache memory.

The cache memory system according to claim 2 or 3, wherein the large-capacity cache memory stores all data stored in the kth-order cache memory.

The cache memory system according to claim 1, wherein the translation lookaside buffer is accessed by a processor next to a primary cache memory in the k-th cache memory.

The cache memory system according to claim 5, wherein the large-capacity cache memory stores all data stored in all cache memories other than the primary cache memory in the k-th cache memory.

The specific cache memory higher than the primary cache memory in the k-th cache memory and the large-capacity cache memory are accessed in parallel by the processor,
The cache memory system according to claim 1 or 2, wherein the specific cache memory and the large-capacity cache memory store data corresponding to different addresses.

The cache memory system, wherein the translation lookaside buffer has an access map that stores, in units of pages, information indicating whether data is stored in the specific cache memory for each cache line in each page.

9. The cache memory system according to claim 1, wherein the translation lookaside buffer stores address information for accessing the large-capacity cache memory in units of pages.

The cache memory system according to claim 1, wherein the translation lookaside buffer has dirty information indicating, in units of pages, whether data in the large-capacity cache memory is to be written back to the main memory.

11. The cache memory system according to claim 1, wherein the translation lookaside buffer has a set associative configuration using a part of bits of a virtual address as an index.

12. The cache memory system according to claim 1, further comprising a page table that stores address translation information and flag information that could not be stored in the translation lookaside buffer and that can be accessed faster than the main memory.

The page table is accessed by the processor after accessing the k th cache memory;
The cache memory system according to claim 12, wherein the large-capacity cache memory is accessed after access to the page table by a processor.

A processor system comprising:
a processor;
a main memory;
a k-th order cache memory (k = all integers from 1 to n, where n is an integer of 1 or more);
a large-capacity cache memory using a nonvolatile memory, having a larger memory capacity than the k-th order cache memory and accessible at a higher speed than the main memory; and
a translation lookaside buffer that stores address translation information from a virtual address issued by the processor to a physical address, and flag information that records whether data is stored in the large-capacity cache memory in units of pages, each page holding more data than a cache line that is the access unit of the k-th order cache memory.

The processor is
It is determined whether or not a read request address has hit the translation lookaside buffer. If not, the address translation information related to the read request address is loaded from the main memory and the translation lookaside buffer is loaded. A first process for updating the buffer;
After the first processing, it is examined in order from the low-order cache memory whether the data corresponding to the read request address is stored in the k-th order cache memory, and if stored, the stored data is A second process of reading;
If the data corresponding to the read request address is not stored in any of the k-th cache memories, it is determined whether the data corresponding to the read request address is stored in the large-capacity cache memory. If the determination is made based on the flag information held by the look-aside buffer and stored in the large-capacity cache memory, the data corresponding to the read request address is read from the large-capacity cache memory, and the read request A third process of storing data for a cache line corresponding to an address in the k-th cache memory;
If there is no hit in the third process, a fourth process of reading the data corresponding to the read request address from the main memory, storing the page-unit data corresponding to the read request address in the large-capacity cache memory, storing data for a cache line corresponding to the read request address in the k-th cache memory, and updating the translation lookaside buffer based on the read request address; the processor system according to claim 14, wherein the above processes are executed.

The specific cache memory higher than the primary cache memory in the k-th cache memory and the large-capacity cache memory are accessed in parallel by the processor,
The translation lookaside buffer has an access map for storing information indicating whether data is stored in the specific cache memory for each cache line in each page in units of pages,
The processor is
It is determined whether or not a read request address has hit the translation lookaside buffer. If not, the address translation information related to the read request address is loaded from the main memory and the translation lookaside buffer is loaded. A first process for updating the buffer;
After the first processing, it is checked whether data corresponding to the read request address is stored in the primary cache memory in the k-th cache memory, and if stored, the stored data is A second process of reading;
If it is determined in the second process that the data is not stored in the primary cache memory, it is determined whether the data corresponding to the read request address is stored in the large-capacity cache memory. Determine based on the flag information held in the buffer, and if stored in the large-capacity cache memory, read data corresponding to the read request address from the large-capacity cache memory and correspond to the read request address A third process of storing data for the cache line to be stored in the primary cache memory;
If it is determined in the third process that the translation lookaside buffer has not been hit, does the read request address hit a secondary or higher order cache memory in the kth order cache memory? A fourth process for reading out data corresponding to the read request address from the higher-level cache memory if it is determined in order, and if hit,
A fifth process of reading data corresponding to the read request address from the main memory when it is determined in the fourth process that the data is not stored in the higher-order cache memory;
With reference to a page corresponding to the read request address of the access map in the translation lookaside buffer, when the number of data stored in the specific cache memory exceeds a predetermined threshold, All the data of the corresponding page is stored in the large-capacity cache memory, the data in the specific cache memory is invalidated, and the data for the cache line corresponding to the read request address is read from the main memory and the 1 A sixth process of storing in the next cache memory and updating the translation lookaside buffer;
If it is determined in the sixth process that the predetermined threshold is not exceeded, a seventh process of reading data for the cache line corresponding to the read request address from the main memory and storing it in the specific cache memory; 15. The processor system according to claim 14, wherein the processor system is executed.

A page table that stores address translation information and flag information that could not be stored in the translation lookaside buffer, and that can be accessed faster than the main memory,
The processor is
It is determined whether or not a read request address has hit the translation lookaside buffer. If not, the address translation information related to the read request address is loaded from the main memory and the translation lookaside buffer is loaded. A first process for updating the buffer;
After the first processing, it is examined in order from the low-order cache memory whether the data corresponding to the read request address is stored in the k-th order cache memory, and if stored, the stored data is A second process of reading;
If the data corresponding to the read request address is not stored in any of the k-th cache memories, it is determined whether or not the read request address hits the page table. A third process of reading data corresponding to the read request address from a cache memory and storing data for a cache line corresponding to the read request address in the primary cache memory and the secondary cache memory;
If there is no hit in the third process, the data corresponding to the read request address is read from the main memory, the page unit data corresponding to the read request address is stored in the large-capacity cache memory, and the read A fourth process of storing data for a cache line corresponding to a request address in the k-th cache memory and updating the translation lookaside buffer and the page table based on the read request address; 15. The processor system according to claim 14, which is executed.

The specific cache memory higher than the primary cache memory in the k-th cache memory and the large-capacity cache memory are accessed in parallel by the processor,
The translation lookaside buffer has an access map for storing information indicating whether data is stored in the specific cache memory for each cache line in each page in units of pages,
The processor is
Checking whether data corresponding to the read request address is stored in a primary cache memory in the k-th cache memory, and if stored, a first process of reading the stored data;
If the data corresponding to the read request address is not stored in the primary cache memory, it is determined whether or not the read request address hits the translation lookaside buffer. A second process of loading address translation information relating to the read request address from main memory and updating the translation lookaside buffer;
After the end of the second process, it is determined based on the flag information held by the translation lookaside buffer whether data corresponding to the read request address is stored in the large-capacity cache memory, If stored in the large-capacity cache memory, data corresponding to the read request address is read from the large-capacity cache memory, and data in units of cache lines corresponding to the read request address is read to the primary cache memory. A third process to store;
If it is determined in the third process that the address is not stored in the translation lookaside buffer, the read request address is stored in a higher-order cache memory than the first-order cache in the k-th order cache memory. Whether or not to hit, in order, and if hit, a fourth process of reading data corresponding to the read request address from the higher-order cache memory;
A fifth process of reading data corresponding to the read request address from the main memory when it is determined in the fourth process that the data is not stored in the higher-order cache memory;
With reference to a page corresponding to the read request address of the access map in the translation lookaside buffer, when the number of data stored in the specific cache memory exceeds a predetermined threshold, All the data of the corresponding page is stored in the large-capacity cache memory, the data in the specific cache memory is invalidated, and the data for the cache line corresponding to the read request address is stored in the primary cache memory And a sixth process for updating the translation lookaside buffer;
If it is determined in the sixth process that the predetermined threshold is not exceeded, a seventh process of storing data for a cache line corresponding to the read request address in the k-th cache memory; the processor system according to claim 14, wherein the above processes are executed.