JP2018136922A

JP2018136922A - Memory division for computing system having memory pool

Info

Publication number: JP2018136922A
Application number: JP2017224258A
Authority: JP
Inventors: Pavel Zaykov; Lucie Matusova
Original assignee: Honeywell International Inc
Current assignee: Honeywell International Inc
Priority date: 2017-02-23
Filing date: 2017-11-22
Publication date: 2018-08-30
Anticipated expiration: 2037-11-22
Also published as: US20180239709A1; EP3367246B1; EP3367246A1; US10515017B2; JP7242170B2

Abstract

PROBLEM TO BE SOLVED: To provide performance-efficient partitioning of the memory hierarchy in a COTS multi-core processor.
SOLUTION: A computing system comprises at least one processing unit 110, a memory controller 120, and a main memory 130 that communicates with the processing unit via the memory controller. The memory hierarchy is divided into a plurality of memory pools. The main memory includes a set of memory modules divided into ranks, each having a rank address determined by a set of rank address bits. Each rank has a set of memory devices including one or more banks, each having a bank address determined by a set of bank address bits. A plurality of threads execute on the processing unit and are allocated to the memory pools on the basis of one or more memory partitioning techniques, including bank partitioning, rank partitioning, or memory controller partitioning.
SELECTED DRAWING: Figure 1

Description

[0001] One of the most important requirements in avionics systems is to ensure time and space partitioning of executing processes. Time partitioning is a technique that guarantees that the threads of a process receive a predetermined portion of processor time. To ensure that this portion is sufficient for the threads to complete execution, a safety margin is typically added on top of the measured worst-case execution time (WCET). Space partitioning refers to hardware-enforced restrictions on memory access that prevent processes from corrupting each other's data. Time and space partitioning are guaranteed by a real-time operating system (RTOS), which typically runs on a commercial off-the-shelf (COTS) single-core processor.

[0002] The complexity and computing-power requirements of avionics systems are constantly increasing, while COTS single-core processors are becoming obsolete. It is therefore necessary to select a new computer architecture that meets the needs of avionics systems. Available COTS multi-core processors tend to be among the best candidates because they combine high performance with low size, weight, and power (SWaP). Despite these advantages, COTS multi-core processors suffer from the temporal effects of unpredictable contention, which can compromise time partitioning.

[0003] Unpredictable contention is caused by multiple cores accessing the same shared hardware resource. Examples of shared hardware resources are caches, main memory, and input/output (I/O) interfaces. Because of unpredictable contention, task timing must be budgeted conservatively, which incurs a processor performance penalty. There is therefore a need for a performance-efficient technique that addresses the temporal effects of shared hardware resources in COTS multi-core processors.

[0004] In a COTS multi-core processor, the cache is a hardware resource whose availability greatly affects application performance. If a cache is shared between processor cores, tasks mapped to different cores may invalidate each other's cache lines. Such cross-core cache invalidation can impose a processor performance penalty.

[0005] Various approaches have been developed to reduce cross-core cache invalidation and thereby increase processor performance. In one approach, cache partitioning is provided through a mechanism of the Deos RTOS from DDC-I called "memory pools", which allows fine-grained control over which memory pages are included in each memory pool.

US Patent No. 8,069,308; US Patent Publication No. 2015/0205724

Liu et al., "A software memory partition approach for eliminating bank-level interference in multicore systems," Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012, pp. 367-376

[0006] Although the memory pool concept has been applied successfully to cache partitioning, COTS multi-core processors still exhibit large unpredictable contention caused by significant interference in main memory, which may be dynamic random access memory (DRAM). Main memory is accessed by the processor through one or more memory controllers.

[0007] In another approach, DRAM bank partitioning and cache coloring are performed using Linux kernel extensions that control the memory management unit (MMU). This approach suggests that partitioning the cache and main memory can yield substantial performance improvements.

[0008] Accordingly, there is a need to address the problem of providing performance-efficient partitioning of the memory hierarchy in COTS multi-core processors.

[0009] A computing system includes at least one processing unit; at least one memory controller, with or without a cache, in operative communication with the at least one processing unit; and a main memory in operative communication with the at least one processing unit via the at least one memory controller. The memory hierarchy of the computing system includes at least one cache, the at least one memory controller, and the main memory, and is divided into a plurality of memory pools. The main memory includes a set of memory modules divided into ranks, each having a rank address defined by a set of rank address bits. Each rank has a set of memory devices, and each memory device includes one or more banks, each having a bank address defined by a set of bank address bits. A plurality of threads execute on the at least one processing unit and are assigned to the memory pools based on one or more memory partitioning techniques: bank partitioning, which uses the bank address bits to define the size and pattern of one or more of the memory pools; rank partitioning, which uses the rank address bits to access one or more ranks; or memory controller partitioning, which uses memory controller interleaving.

[0010] Features of the present invention will become apparent to those skilled in the art from the following description with reference to the drawings. Understanding that the drawings depict only typical embodiments and are therefore not to be considered limiting in scope, the invention will be described with additional specificity and detail through the use of the accompanying drawings.

[0011] FIG. 1 is a block diagram of a multi-core processor architecture in which memory partitioning can be implemented, according to one embodiment.

[0012] FIG. 2A is a block diagram of the memory hierarchy implemented in the multi-core processor architecture of FIG. 1.

[0013] FIG. 2B is a block diagram of an exemplary arrangement of multiple dynamic random access memory (DRAM) devices in a dual in-line memory module, according to one embodiment that can be implemented in the multi-core processor architecture of FIG. 1.

[0014] FIG. 2C is a block diagram of the architecture of a DRAM device that can be implemented in the dual in-line memory module of FIG. 2B.

[0015] FIG. 3 is a block diagram of the logical structure of a DRAM memory controller that can be implemented in the multi-core processor architecture of FIG. 1.

[0016] FIG. 4 is a flowchart of a method for memory partitioning of a computing system, according to one embodiment.

[0017] FIG. 5 is a graphical representation of the thread mapping and execution timeline for iterations of the worker and trasher processes on a COTS multi-core processor.

[0018] FIG. 6 is a graph showing the execution times of the worker process for both non-partitioned DRAM and DRAM bank partitioning.

[0019] In the following detailed description, the embodiments are described in sufficient detail to enable those skilled in the art to practice the invention. It is to be understood that other embodiments may be utilized without departing from the scope of the invention. The following detailed description is, therefore, not to be taken in a limiting sense.

[0020] Memory partitioning for a computing system, implemented with memory pools, is disclosed herein. In particular, the approach herein provides performance-efficient partitioning of the memory hierarchy of commercial off-the-shelf (COTS) multi-core processor systems. In this approach, the memory pool concept is applied specifically to dynamic random access memory (DRAM) devices.

[0021] The memory hierarchy to which the approach herein can be applied includes one or more caches, one or more memory controllers, and one or more main memories. As used herein, "memory partitioning" refers to partitioning of one or more caches, one or more memory controllers, or one or more main memories.

[0022] The approach herein can be used to enhance cache partitioning with memory partitioning, including bank partitioning, rank partitioning, and management of interleaving across multiple DRAM controllers. Memory pools are implemented for memory partitioning by taking into account various factors, including the address bits that select a memory bank, the active memory ranks, the number of rank bits and the type of rank interleaving, the number of active memory controllers, the granularity and type of controller interleaving, and, optionally, the cache index address bits.

[0023] One or more memory partitioning techniques, including bank partitioning, rank partitioning, and memory controller partitioning, can be used together with conventional cache partitioning techniques to significantly reduce execution time cycles per processing core, thereby substantially increasing processing performance.

[0024] Because the address bits that select a DRAM bank lie sufficiently high in the address space, no additional modification of existing memory pool implementations is required. As a result, the approach herein can partition the memory hierarchy simply by choosing an appropriate configuration (offset and size) for each memory pool.
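A minimal sketch of why an offset and a size can be enough, assuming a hypothetical layout in which the three bank-select bits are the top bits (bits 28 to 30) of a 2 GiB physical address space. These bit positions are illustrative and not taken from the patent. Under that assumption each bank is one contiguous 256 MiB range, so a pool confined to one bank is just an (offset, size) pair. `bank_of` and `pool_for_bank` are hypothetical helpers, not Deos APIs.

```python
# Hypothetical layout (not from the patent): bank-select bits are bits 28..30
# of a 2 GiB physical address space, so each bank is one contiguous range.
BANK_SHIFT = 28
BANK_BITS = 3
PAGE_SIZE = 4096

def bank_of(addr: int) -> int:
    """Return the bank index encoded in a physical address."""
    return (addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)

def pool_for_bank(bank: int) -> tuple:
    """Return the (offset, size) of a memory pool covering exactly one bank."""
    if not 0 <= bank < (1 << BANK_BITS):
        raise ValueError(f"bank {bank} out of range")
    size = 1 << BANK_SHIFT      # all addresses that share this bank value
    offset = bank * size        # page-aligned because size is a multiple of PAGE_SIZE
    return offset, size

offset, size = pool_for_bank(5)
print(hex(offset), size // (1 << 20), "MiB")
```

If the bank bits sat lower in the address, one bank would no longer be a single contiguous range, and a pool could not be described by one offset and size.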

[0025] The memory partitioning techniques herein are particularly advantageous in avionics applications, such as avionics platforms on board aircraft that use avionics computer systems to execute multiple concurrent processes. In addition, the techniques can be applied to both single-core and multi-core processors.

[0026] Further details of the approach herein are described below with reference to the drawings.

[0027] FIG. 1 illustrates a multi-core processor architecture 100 in which memory partitioning can be implemented, according to one embodiment. The multi-core processor architecture 100 generally includes a COTS multi-core processor unit 110 and a memory hierarchy that includes one or more caches 114, 116, 118, one or more memory controllers 120, and a main memory 130 such as DRAM.

[0028] The COTS multi-core processor unit 110 includes one or more processor clusters 112, each of which includes one or more central processing unit (CPU) cores (CPU0, CPU1, ..., CPUk). Each core has a dedicated level 1 (L1) cache, such as cache 114, and a shared level 2 (L2) cache, such as cache 116. The processor clusters 112 are operatively connected to the memory hierarchy through an interconnect 117, which can also provide input/output connections between the processor clusters 112 and other I/O interfaces 119.

[0029] In some implementations, there is at least one level 3 (L3) cache, such as cache 118, located between the interconnect 117 and the rest of the memory hierarchy. The L3 cache 118, commonly known as the platform cache, buffers memory accesses by the cores. The L3 cache 118 is operatively connected to the one or more memory controllers 120, which issue the commands that access main memory 130.

[0030] Main memory 130 is operatively connected to the one or more processor clusters 112 via the one or more memory controllers 120. Main memory 130 includes at least one memory module 132, such as a dual in-line memory module (DIMM). Main memory 130 is the physical memory in which data is stored and accessed at run time.

[0031] In a DRAM memory architecture, each memory cell (a single bit) is implemented by a small capacitor. Because the capacitor's charge leaks over time, the stored data is eventually lost unless it is explicitly refreshed. To prevent data loss, additional hardware periodically reads and writes back each memory cell (that is, performs a refresh), restoring the capacitor charge to its original level. DRAM refresh occurs automatically and is transparent to the user.

[0032] The DRAM memory architecture provides three levels of parallelism: memory controllers, ranks, and banks. In addition, the number of ranks determines the number of available banks. These levels are described below.

[0033] FIG. 2A is a block diagram of the memory hierarchy implemented in the multi-core processor architecture 100 of FIG. 1. The memory hierarchy includes one or more caches 114, 116, 118, one or more memory controllers 120, and main memory 130 (DRAM), which can include one or more memory modules 132 such as DIMMs. DIMMs enable rank interleaving, which is described below.

[0034] FIG. 2B shows an exemplary arrangement of a DIMM 240, according to one embodiment. DIMM 240 typically consists of two ranks (for example, rank 0 and rank 1) connected to a circuit board 242. A rank is selected explicitly by a chip-select signal, and each rank has a rank address defined by a set of rank address bits. A rank consists of a set of DRAM devices 200, all of which share the address, data, and command buses.

[0035] FIG. 2C shows the architecture of a single DRAM device 200 that can be implemented in DIMM 240, according to one embodiment. DRAM device 200 includes a set of memory banks 210 (for example, bank 1 through bank 8), each of which contains a DRAM array 212 of rows and columns with associated logic. Each bank 210 has a bank address defined by a set of bank address bits. Each row of a bank 210 holds a single memory page, which is the smallest addressable data unit in DRAM device 200 and is typically 4 kB. Each page can be either open or closed. A row buffer 214 holds the most recently opened page. Each bank 210 also includes a row decoder 216 and a column decoder 218.

[0036] DRAM device 200 has three basic interfaces: a command (cmd) interface 220, an address (addr) interface 222, and a data interface 224. The command interface 220 communicates with a command decoder 226 to indicate the type of memory operation: read, write, or refresh. The address interface 222 communicates with the row decoder 216 and the column decoder 218, and the data interface 224 communicates with the column decoder 218. A refresh counter 228 is operatively connected between the command decoder 226 and the row decoder 216.

[0037] FIG. 3 is a block diagram of the logical structure of a DRAM memory controller 300, which can be implemented in a processor unit. DRAM memory controller 300 receives memory requests from the CPU cores, for example via the L3 cache (block 310). The memory requests are stored in a request buffer 320, which contains a separate priority queue 322 for each memory bank (for example, bank 0, bank 1, ..., bank 8). Once multiple memory requests are present in the priority queues 322, a memory scheduler 330 is invoked and selects one of them using bank schedulers 332, each of which communicates with its respective priority queue 322. The selected memory request is then sent to a channel scheduler 340, which communicates with the DRAM address and command buses.
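The request path can be illustrated with a toy model. This sketch assumes that each bank scheduler serves its queue first-in, first-out and that the channel scheduler visits banks round-robin; the description above does not specify either policy (real controllers often use first-ready, first-come-first-served), and `ControllerModel` is a hypothetical name.

```python
from collections import deque

class ControllerModel:
    """Toy model of request buffer 320, bank schedulers 332 and channel scheduler 340.

    Assumptions (not from the patent): per-bank queues are served FIFO, and the
    channel scheduler picks the next non-empty bank in round-robin order.
    """

    def __init__(self, n_banks: int) -> None:
        # Request buffer: one queue of pending (core, bank, row) requests per bank.
        self.queues = [deque() for _ in range(n_banks)]
        self.next_bank = 0

    def submit(self, core: int, bank: int, row: int) -> None:
        self.queues[bank].append((core, bank, row))

    def issue(self):
        """Return the next request sent to the DRAM address/command bus, or None."""
        n = len(self.queues)
        for i in range(n):
            bank = (self.next_bank + i) % n
            if self.queues[bank]:
                self.next_bank = (bank + 1) % n
                return self.queues[bank].popleft()  # bank scheduler: FIFO head
        return None  # nothing pending

mc = ControllerModel(n_banks=8)
mc.submit(core=0, bank=0, row=10)
mc.submit(core=0, bank=0, row=11)
mc.submit(core=1, bank=1, row=20)
order = [mc.issue() for _ in range(3)]
print(order)
```

Core 1's request, submitted last, is issued before core 0's second request. Even this simple policy reorders requests across cores, which is why the timing seen by one core depends on what the other cores are doing.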

[0038] The DRAM memory controller allows the user to specify DRAM parameters such as the refresh rate. While a refresh is in progress, the DRAM device is temporarily unavailable for read and write operations. If many pages of a DRAM device are accessed concurrently, pages can be evicted from the row buffer, increasing memory access time. One mitigation for row buffer eviction is to assign a single bank to each processing core.
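The effect of assigning one bank per core can be illustrated by counting row-buffer misses. This is a deliberately simplified sketch. It assumes that each bank holds exactly one open row, that an access to any other row in that bank evicts it, and that two hypothetical cores issue their accesses in strict alternation. Timing, refresh, and scheduler reordering are ignored.

```python
def count_row_misses(accesses):
    """Count row-buffer misses for a sequence of (bank, row) accesses.

    Simplified model: each bank keeps exactly one open row; an access to a
    different row of the same bank evicts the open row (a miss).
    """
    open_row = {}
    misses = 0
    for bank, row in accesses:
        if open_row.get(bank) != row:
            misses += 1
            open_row[bank] = row
    return misses

def interleave(a, b):
    """Alternate the accesses of two cores, as if they ran in lockstep."""
    out = []
    for x, y in zip(a, b):
        out += [x, y]
    return out

# Hypothetical workload: core 0 streams 16 accesses through row 10,
# core 1 streams 16 accesses through row 20.
shared = interleave([(0, 10)] * 16, [(0, 20)] * 16)   # both cores use bank 0
private = interleave([(0, 10)] * 16, [(1, 20)] * 16)  # one bank per core

print(count_row_misses(shared), count_row_misses(private))
```

With a shared bank every access evicts the other core's row (32 misses), while with private banks only the two initial row openings miss.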

[0039] When multiple CPU cores send memory requests simultaneously, the channel scheduler may also reorder them. Using multiple memory controllers can mitigate the interference delays caused by this reordering.

[0040] The approach herein implements the memory pool concept for memory partitioning by taking into account the cache index address bits, the address bits that select a DRAM bank, the active memory ranks, the number of rank bits and the type of rank interleaving, and the number of active memory controllers together with the granularity and type of their interleaving. Depending on application needs, the approach can achieve various levels of memory access isolation. For example, a single memory pool can be assigned to one memory controller, or a single memory pool can be isolated to a specific bank of a specific rank. In general, stricter isolation of memory accesses leaves fewer memory pools available.
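The trade-off between isolation and pool count can be made concrete with a small calculation. This sketch assumes a hypothetical system with 3 memory controllers, 2 ranks per controller, and 8 banks per rank, and it reads "stricter isolation" as giving each pool exclusive use of a larger unit. A pool that owns a whole controller shares nothing, but there can be at most 3 such pools. A pool that owns only a bank still shares its rank and controller with other pools, but up to 48 pools fit.

```python
from math import prod

# Hierarchy levels, outermost first.
LEVELS = ("controller", "rank", "bank")

# Hypothetical system: 3 controllers, 2 ranks per controller, 8 banks per rank.
COUNTS = {"controller": 3, "rank": 2, "bank": 8}

def max_isolated_pools(counts: dict, isolate_at: str) -> int:
    """Upper bound on the number of pools when every pool gets exclusive use
    of one unit at level `isolate_at` (and of everything below that unit)."""
    depth = LEVELS.index(isolate_at) + 1
    return prod(counts[level] for level in LEVELS[:depth])

for level in LEVELS:
    print(level, max_isolated_pools(COUNTS, level))
```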

[0041] FIG. 4 is a flowchart of a method 400 for implementing memory partitioning according to the approach herein. Method 400 includes dividing the main memory of a computing system into a plurality of memory pools (block 410). The main memory can include a set of memory devices, such as DRAM devices, arranged in one or more ranks, each having a rank address defined by a set of rank address bits. Each memory device includes one or more banks, each having a bank address defined by a set of bank address bits. Multiple threads executing on one or more CPU cores are assigned to the memory pools (block 420) based on one or more memory partitioning techniques, including bank partitioning (block 422), rank partitioning (block 424), or memory controller partitioning (block 426). In addition, one or more of these memory partitioning techniques can optionally be used in conjunction with a conventional cache partitioning technique, which uses cache index address bits to define the size and number of cache partitions (block 430).

[0042] In some implementations, at least some of the threads are assigned to the same memory pool, or at least some of the threads are each assigned to a different memory pool. Likewise, at least some of the threads can be mapped to the same CPU core, or at least some of the threads can each be mapped to a different CPU core. Furthermore, at least some of the memory pools can each be mapped, in a one-to-one (1:1) correspondence, to a memory controller, to one or more ranks, or to one or more banks. Alternatively, at least some of the memory pools can each be mapped, in a one-to-many (1:N) correspondence, to multiple memory controllers, one or more ranks, and one or more banks.
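The assignments and mappings above can be captured in a small data model. This is a hypothetical sketch, not the patent's implementation: a pool is modeled as the set of (controller, rank, bank) units backing it, threads are assigned to pools through a plain dictionary (block 420), and a pool is labeled 1:1 or 1:N at a chosen level depending on how many distinct units it touches there. The pool and thread names are illustrative only.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryPool:
    """Hypothetical model: a pool is the set of (controller, rank, bank) units backing it."""
    name: str
    units: set = field(default_factory=set)

    def kind(self, level: int) -> str:
        """Return '1:1' if the pool maps to a single unit at `level`
        (0 = controller, 1 = rank, 2 = bank), otherwise '1:N'."""
        distinct = {unit[: level + 1] for unit in self.units}
        return "1:1" if len(distinct) == 1 else "1:N"

def shared_units(a: MemoryPool, b: MemoryPool) -> set:
    """Units used by both pools; an empty set means the pools never share a bank."""
    return a.units & b.units

# Bank partitioning: pools A and B each own one bank on controller 0, rank 0.
pool_a = MemoryPool("A", {(0, 0, 0)})
pool_b = MemoryPool("B", {(0, 0, 1)})
# Pool C spans every bank of rank 0 on controllers 1 and 2.
pool_c = MemoryPool("C", {(c, 0, b) for c in (1, 2) for b in range(8)})

# Assign threads (hypothetical names) to pools; several threads may share a pool.
thread_pool = {"writer": pool_a, "reader": pool_a, "trasher": pool_b, "other": pool_c}

print(pool_a.kind(2), pool_c.kind(0))
print(shared_units(pool_a, pool_b))
```

Keeping the units as explicit tuples makes it easy to check that two pools meant to be isolated really share no bank.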

[0043] The bank partitioning technique uses the bank address bits to define the size and pattern of the memory pools in main memory. The memory pools are mapped to main memory with respect to the bank address bits and rank address bits. With this technique, DRAM bank partitioning can avoid the delays caused by bank sharing between cores, which leads to row buffer eviction, and can provide an expected performance increase of up to about 30%.
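The "size and pattern" of a bank-partitioned pool can be shown by enumerating its pages. This sketch assumes a hypothetical layout in which three bank bits sit directly above a 4 KiB page offset, so consecutive pages rotate through eight banks. Under that assumption a pool confined to one bank is a strided set of pages (every eighth page), and its size is one eighth of memory. The bit positions and helper names are illustrative only.

```python
PAGE_SHIFT = 12   # 4 KiB pages
BANK_SHIFT = 12   # assumption: bank bits sit directly above the page offset
BANK_BITS = 3     # assumption: 8 banks

def bank_of_page(page: int) -> int:
    """Bank selected by the physical address of page frame `page`."""
    return ((page << PAGE_SHIFT) >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)

def bank_pool_pages(bank: int, total_pages: int) -> list:
    """Page frames whose addresses select `bank`: the pool's pattern."""
    return [p for p in range(total_pages) if bank_of_page(p) == bank]

pages = bank_pool_pages(bank=3, total_pages=32)
print(pages)
```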

[0044] The rank partitioning technique uses the rank address bits to access ranks; consequently, the more ranks there are, the more banks are available.

[0045] A typical memory address layout for a DRAM device is shown in Table 1.

As the address layout in Table 1 suggests, there are dedicated address bits for the memory banks.

[0046] If the processor documentation is limited and information about the address layout is missing, a discovery algorithm can be applied to determine the address layout, for example the algorithm proposed by Liu et al., "A software memory partition approach for eliminating bank-level interference in multicore systems," Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT), 2012, pp. 367-376, the disclosure of which is incorporated herein by reference.
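Table 1 is not reproduced here, so the sketch below uses a purely hypothetical layout to show how dedicated bank and rank bits turn a physical address into DRAM coordinates. The field positions (3 byte bits, 10 column bits, 3 bank bits, 1 rank bit, 14 row bits) are assumptions chosen for illustration and do not correspond to any specific processor. `decode` and `encode` are hypothetical helpers.

```python
# Hypothetical DRAM address layout (NOT the patent's Table 1):
# (field name, lowest bit, width in bits), from least to most significant.
LAYOUT = [
    ("byte",   0,  3),
    ("column", 3, 10),
    ("bank",  13,  3),
    ("rank",  16,  1),
    ("row",   17, 14),
]

def decode(addr: int) -> dict:
    """Split a physical address into its DRAM fields."""
    return {name: (addr >> lo) & ((1 << width) - 1) for name, lo, width in LAYOUT}

def encode(**fields: int) -> int:
    """Build a physical address from DRAM fields (missing fields default to 0)."""
    addr = 0
    for name, lo, width in LAYOUT:
        value = fields.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        addr |= value << lo
    return addr

addr = encode(row=7, rank=1, bank=5, column=3)
print(addr, decode(addr))
```

When the real layout is unknown, the bit positions in `LAYOUT` are exactly what a discovery algorithm such as the one cited above would need to establish.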

[0047] On 32-bit systems, the memory controller partitioning technique can be implemented using memory controller interleaving, taking into account the granularity and type of interleaving. Memory controller interleaving can be used to distribute memory requests fairly or evenly across multiple memory controllers, or to isolate memory requests completely to a specific memory controller. On 64-bit systems, memory controller partitioning can be implemented by disabling memory controller interleaving while keeping all memory controllers accessible.
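The two modes can be contrasted with a sketch of address-to-controller mapping. It assumes simple modulo interleaving at a fixed granularity when interleaving is on, and one contiguous region per controller when it is off. Real controllers, especially with a non-power-of-two controller count, may use other or hashed schemes. The 3 controllers match the experimental platform described later; the 4 KiB granularity and 1 GiB region size are hypothetical.

```python
def controller_interleaved(addr: int, n_controllers: int, granularity: int) -> int:
    """Interleaving on (assumed modulo scheme): consecutive
    `granularity`-byte chunks rotate across controllers."""
    return (addr // granularity) % n_controllers

def controller_linear(addr: int, region_size: int) -> int:
    """Interleaving off: controller i owns the contiguous region
    [i * region_size, (i + 1) * region_size)."""
    return addr // region_size

GRANULARITY = 4096   # hypothetical interleave granularity (4 KiB)
REGION = 1 << 30     # hypothetical 1 GiB of DRAM behind each controller

# With interleaving, a 24 KiB buffer touches all three controllers.
print([controller_interleaved(a, 3, GRANULARITY) for a in range(0, 6 * GRANULARITY, GRANULARITY)])
# Without interleaving, a pool with offset=REGION and size=REGION stays on controller 1.
print({controller_linear(a, REGION) for a in (REGION, 2 * REGION - 1)})
```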

[0048] When a cache partitioning technique is used, the memory pools are mapped to main memory with respect to the cache index address bits. Further details on cache partitioning techniques are described in U.S. Patent No. 8,069,308, entitled CACHE POOLING FOR COMPUTING SYSTEMS, and in U.S. Publication No. 2015/0205724, entitled SYSTEM AND METHOD OF CACHE PARTITIONING FOR PROCESSORS WITH LIMITED CACHED MEMORY POOLS, the disclosures of both of which are incorporated herein by reference.
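Mapping pools by cache index bits is commonly done through page coloring. In this scheme, the index bits that lie above the page offset assign each physical page a "color", and pages of different colors never compete for the same cache sets. This sketch assumes a physically indexed, set-associative cache with plain modulo indexing. It uses the 2 MB L2 size from the experimental platform described later, while the 16-way associativity is a hypothetical value.

```python
def num_colors(cache_bytes: int, ways: int, page_bytes: int = 4096) -> int:
    """Number of page colors: one way spans cache_bytes / ways bytes of
    address space, which covers this many pages."""
    way_bytes = cache_bytes // ways
    return max(1, way_bytes // page_bytes)

def page_color(addr: int, colors: int, page_bytes: int = 4096) -> int:
    """Color = the cache index bits that lie above the page offset
    (assumes modulo set indexing, no address hashing)."""
    return (addr // page_bytes) % colors

colors = num_colors(2 * 1024 * 1024, ways=16)   # 2 MB L2, assumed 16-way
print(colors, page_color(0x21000, colors))
```

A pool that only receives pages of one color occupies only that slice of the cache's sets.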

[0049] Depending on the memory address layout, a one-to-one (1:1) or one-to-many (1:N) mapping of memory pools can be established among caches, memory controllers, ranks, and banks.

[0050] Experimental studies of DRAM bank partitioning were conducted to demonstrate its advantages. These studies were performed on a COTS multi-core processor running the Deos real-time operating system (RTOS) from DDC-I. The processor has 12 physical CPUs, each with two hardware threads, for a total of 24 virtual CPUs. Each virtual CPU has its own small 32 KB L1 instruction and data caches. The virtual CPUs are grouped into three clusters of eight virtual CPUs each, and each cluster has its own 2 MB L2 cache. All clusters are connected to an on-chip interconnect. DRAM is mapped to three memory controllers, each with 512 KB of dedicated L3 cache.

[0051] The benefits of DRAM bank partitioning were demonstrated with a set of uncached memory accesses issued simultaneously to the same DRAM bank and to different DRAM banks. Because the accesses are uncached, the presence or absence of a cache is assumed not to affect the measured execution time. Nevertheless, the L3 cache (platform cache) is disabled in the experiments that follow. The experiments involve two processes, a worker process and a trasher process, and both are invoked repeatedly.

[0052] The worker process has two threads: a writer thread and a reader thread. The writer thread writes a predetermined number of pages to a memory array larger than its own memory pool. When the writer thread finishes, the reader thread reads the pages and computes a checksum of the memory array. Both threads of the worker process share the same memory pool. At the end of each thread iteration, the cache is invalidated so that none of the worker threads' memory requests are served from the cache.
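One worker iteration can be sketched as follows. This Python model reproduces only the data flow: the writer fills the pages, and the reader then checksums them. It cannot issue uncached accesses or invalidate caches, because on the real platform those steps are handled by the RTOS and the hardware. The page size, the fill pattern, the use of `zlib.crc32` as the checksum, and the function names are all assumptions.

```python
import threading
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def writer(buf, num_pages):
    """Write a per-page byte pattern into the first num_pages pages of buf."""
    for page in range(num_pages):
        start = page * PAGE_SIZE
        buf[start:start + PAGE_SIZE] = bytes([page & 0xFF]) * PAGE_SIZE


def reader(buf, result):
    """Read the whole array and record its checksum."""
    result["checksum"] = zlib.crc32(buf)


def worker_iteration(num_pages):
    """One worker iteration: the reader starts only after the writer finishes."""
    buf = bytearray(num_pages * PAGE_SIZE)
    result = {}
    w = threading.Thread(target=writer, args=(buf, num_pages))
    w.start()
    w.join()  # enforce writer-then-reader ordering
    r = threading.Thread(target=reader, args=(buf, result))
    r.start()
    r.join()
    # On the real system, the cache would be invalidated at this point.
    return result["checksum"]


if __name__ == "__main__":
    print(hex(worker_iteration(16)))
```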

[0053] The trasher process has multiple trasher threads. Each trasher thread continuously writes a large number of pages to a memory array, so that the memory it accesses is as large as, or larger than, its allocated memory pool. The trasher process therefore continuously performs uncached writes and stresses the memory hierarchy. It runs in parallel with the worker process, with slack usage enabled. The trasher threads are mapped to CPUs other than the CPU on which the worker process threads run.
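A trasher thread can be sketched as a loop that sweeps its whole array until it is told to stop. The sketch assumes a `threading.Event` as the stop signal and adds an optional `max_sweeps` bound, a hypothetical parameter that makes the loop testable. As with the worker sketch, Python cannot bypass the cache or pin threads to CPUs. Those steps are left to the RTOS.

```python
import threading

PAGE_SIZE = 4096  # assumed page size in bytes


def trasher(buf, pool_size, stop, max_sweeps=None):
    """Repeatedly write every page of buf until stop is set.

    buf must be at least as large as the trasher's memory pool, so that the
    writes cover the whole pool. max_sweeps (hypothetical) bounds the loop.
    Returns the number of pages written.
    """
    if len(buf) < pool_size:
        raise ValueError("trasher array must be at least the pool size")
    num_pages = len(buf) // PAGE_SIZE
    written = 0
    sweeps = 0
    value = 0
    while not stop.is_set():
        for page in range(num_pages):
            start = page * PAGE_SIZE
            buf[start:start + PAGE_SIZE] = bytes([value]) * PAGE_SIZE
            written += 1
        value = (value + 1) & 0xFF  # change the pattern on every sweep
        sweeps += 1
        if max_sweeps is not None and sweeps >= max_sweeps:
            break
    return written


if __name__ == "__main__":
    stop = threading.Event()
    size = 16 * PAGE_SIZE
    # One trasher per remaining CPU of the cluster (seven in the experiment).
    threads = [threading.Thread(target=trasher, args=(bytearray(size), size, stop))
               for _ in range(7)]
    for t in threads:
        t.start()
    stop.set()  # in a real run, this would be set after the worker finishes
    for t in threads:
        t.join()
```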

[0054] FIG. 5 is a graphical representation of the thread mapping and execution timeline for worker and trasher process iterations. The worker process threads (writer and reader) were mapped to CPU0, while the trasher threads ran on the remaining seven CPUs of the same cluster (CPU1 to CPUN). Execution time measurements are shown for writer and/or reader timing.
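The platform arithmetic from the experimental setup, together with the thread placement just described, can be checked with a few lines of code. This sketch assumes that virtual CPUs are numbered consecutively, with each cluster occupying eight consecutive IDs, and that one trasher thread runs per remaining CPU. Both are hypothetical conventions that the text does not state.

```python
PHYSICAL_CPUS = 12
HW_THREADS_PER_CPU = 2
VCPUS_PER_CLUSTER = 8

VIRTUAL_CPUS = PHYSICAL_CPUS * HW_THREADS_PER_CPU  # 24 virtual CPUs
NUM_CLUSTERS = VIRTUAL_CPUS // VCPUS_PER_CLUSTER    # 3 clusters


def cluster_of(vcpu):
    """Cluster index under the assumed consecutive numbering."""
    return vcpu // VCPUS_PER_CLUSTER


def cluster_peers(vcpu):
    """Other virtual CPUs in the same cluster (candidates for trasher threads)."""
    c = cluster_of(vcpu)
    return [v for v in range(VIRTUAL_CPUS) if cluster_of(v) == c and v != vcpu]


def experiment_mapping(worker_vcpu=0):
    """Writer and reader share one vCPU; one trasher per remaining peer vCPU."""
    mapping = {"writer": worker_vcpu, "reader": worker_vcpu}
    for i, v in enumerate(cluster_peers(worker_vcpu), start=1):
        mapping[f"trasher{i}"] = v
    return mapping


if __name__ == "__main__":
    print(experiment_mapping())
```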

[0055] FIG. 6 is a graph showing execution times of the worker process for both non-partitioned DRAM and DRAM bank partitioning. The reported performance results are the maximum, minimum, and average execution times measured over 100 thread iterations. To illustrate the advantages of DRAM bank partitioning with memory pools, non-partitioned DRAM is compared with DRAM bank partitioning. Non-partitioned DRAM is implemented by placing the trasher threads in the same memory pool as the worker process threads. DRAM bank partitioning is implemented by placing the trasher threads in separate memory pools. The worker and trasher memory pools are defined according to the address bits used to select DRAM banks. As the experimental data show, DRAM bank partitioning reduced execution time by about 20% per processing core compared with non-partitioned DRAM, effectively providing about 20% additional CPU performance.
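Pools defined "according to the address bits used to select DRAM banks" can be illustrated by filtering physical page frames on their bank bits. The sketch assumes 4 KB pages and a hypothetical layout in which the bank index occupies bits 13 to 15 (8 banks). It gives the worker pool bank 0 and the trasher pool banks 1 to 7. Because the OS allocates whole pages, it can only steer bank bits that lie at or above the page offset, and the sketch asserts that condition.

```python
PAGE_SHIFT = 12  # assumed 4 KB pages
BANK_SHIFT = 13  # hypothetical position of the lowest bank address bit
BANK_BITS = 3    # hypothetical: 8 banks

# Page-granular allocation can only control address bits above the page offset.
assert BANK_SHIFT >= PAGE_SHIFT


def bank_of(phys_addr):
    """Extract the bank index from a physical address."""
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


def pool_frames(frames, banks):
    """Page frames whose bank bits select one of the given banks."""
    return [f for f in frames if bank_of(f << PAGE_SHIFT) in banks]


if __name__ == "__main__":
    frames = range(64)
    worker_pool = pool_frames(frames, {0})
    trasher_pool = pool_frames(frames, set(range(1, 8)))
    print(worker_pool)
    print(set(worker_pool) & set(trasher_pool))  # empty: no shared banks
```

In the non-partitioned configuration, both pools would simply draw from all eight banks.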

[0056] Additional experimental results suggest that cache partitioning can achieve up to about 60% performance improvement over a non-partitioned cache when an application's operational data set can be fully cache resident (and the cache can be protected from eviction and pollution across cores). Cache size is often too small for most applications' data sets to be 100% cache resident. Combining cache partitioning with DRAM bank partitioning therefore yields fewer worst-case memory transactions and shorter worst-case execution for each application (due to cache partitioning), together with the minimum possible DRAM access time for the memory transactions that do occur (due to DRAM bank partitioning).

[0057] A computer or processor used in the systems and methods described herein can be implemented using software, firmware, hardware, or any suitable combination thereof, as known to those skilled in the art. By way of example and not limitation, hardware components can include one or more microprocessors, memory elements, digital signal processing (DSP) elements, interface cards, and other standard components known in the art. Any of the foregoing can be supplemented by, or incorporated in, specially designed application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs). The computer or processor can also include functions with software programs, firmware, or other computer-readable instructions for carrying out the various process tasks, calculations, and control functions used in the methods and systems described herein.

[0058] The methods can be implemented by computer-executable instructions, such as program modules or components, that are executed by at least one processor. Generally, program modules include routines, programs, objects, components, data structures, algorithms, and the like that perform particular tasks or implement particular data types.

[0059] Instructions for carrying out the various process tasks, calculations, and generation of other data used in the operation of the methods described herein can be implemented in software, firmware, or other computer-readable or processor-readable instructions. These instructions are typically stored on any appropriate computer program product that includes a computer-readable medium used for storage of computer-readable instructions or data structures. Such a computer-readable medium can be any available medium that can be accessed by a general-purpose or special-purpose computer or processor, or by any programmable logic device.

[0060] Suitable computer-readable media may include storage or memory media such as magnetic or optical media. For example, storage or memory media may include conventional hard disks, compact disk read-only memory (CD-ROM), DVDs, and volatile or non-volatile media such as random access memory (RAM) (including, but not limited to, synchronous dynamic random access memory (SDRAM), double data rate (DDR) RAM, RAMBUS dynamic RAM (RDRAM), and static RAM (SRAM)). They may also include read-only memory (ROM), electrically erasable programmable ROM (EEPROM), flash memory, Blu-ray discs, and the like. Combinations of the above are also included within the scope of computer-readable media.
Exemplary Embodiments
[0061] Example 1 includes a computing system comprising: at least one processing unit; at least one memory controller, with or without a cache, in operative communication with the at least one processing unit; and a main memory in operative communication with the at least one processing unit via the at least one memory controller. A memory hierarchy of the computing system includes at least one cache, the at least one memory controller, and the main memory, and the memory hierarchy is divided into a plurality of memory pools. The main memory includes a set of memory modules divided into ranks, each rank having a rank address defined by a set of rank address bits. Each rank has a set of memory devices, and each memory device includes one or more banks, each bank having a bank address defined by a set of bank address bits. A plurality of threads execute on the at least one processing unit and are assigned to the memory pools based on one or more memory partitioning techniques. These techniques include bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools; rank partitioning, which uses the rank address bits to access the one or more ranks; or memory controller partitioning, which uses memory controller interleaving.
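Bank, rank, and memory controller partitioning all start from the same step: reading particular fields out of a physical address. The sketch below decodes those fields under a hypothetical layout with bank bits 13 to 15, a rank bit at 16, and a controller bit at 17. Real layouts are platform specific and must be taken from the processor and memory controller documentation. The names `LAYOUT`, `DramCoord`, `extract`, and `decode` are illustrative.

```python
from dataclasses import dataclass

# Hypothetical address layout: field name -> (lowest bit position, width in bits).
LAYOUT = {
    "bank": (13, 3),        # 8 banks per device
    "rank": (16, 1),        # 2 ranks per module
    "controller": (17, 1),  # 2 memory controllers
}


@dataclass(frozen=True)
class DramCoord:
    controller: int
    rank: int
    bank: int


def extract(phys_addr, name, layout=LAYOUT):
    """Return the value of one address field (bank, rank, or controller)."""
    shift, width = layout[name]
    return (phys_addr >> shift) & ((1 << width) - 1)


def decode(phys_addr, layout=LAYOUT):
    """Split a physical address into its controller, rank, and bank fields."""
    return DramCoord(
        controller=extract(phys_addr, "controller", layout),
        rank=extract(phys_addr, "rank", layout),
        bank=extract(phys_addr, "bank", layout),
    )


if __name__ == "__main__":
    print(decode(0x36000))  # bits 13, 14, 16, and 17 set
```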

[0062] Example 2 includes the computing system of Example 1, wherein the main memory comprises dynamic random access memory (DRAM).

[0063] Example 3 includes the computing system of any of Examples 1-2, wherein the threads are also assigned to the memory pools based on a cache partitioning technique that uses cache index address bits to define the size and number of cache partitions.

[0064] Example 4 includes the computing system of any of Examples 1-3, wherein at least some of the threads are assigned to the same memory pool.

[0065] Example 5 includes the computing system of any of Examples 1-3, wherein at least some of the threads are each assigned to a different memory pool.

[0066] Example 6 includes the computing system of any of Examples 1-5, wherein the at least one processing unit comprises one or more central processing unit (CPU) cores.

[0067] Example 7 includes the computing system of Example 6, wherein at least some of the threads are mapped to the same CPU core.

[0068] Example 8 includes the computing system of Example 6, wherein at least some of the threads are each mapped to a different CPU core.

[0069] Example 9 includes the computing system of any of Examples 1-8, wherein at least some of the memory pools are each mapped, in a one-to-one (1:1) correspondence, to the at least one memory controller, the one or more ranks, or the one or more banks.

[0070] Example 10 includes the computing system of any of Examples 1-8, wherein at least some of the memory pools are each mapped, in a one-to-many (1:N) correspondence, to a plurality of the at least one memory controller, the one or more ranks, and the one or more banks.

[0071] Example 11 includes an avionics computer system comprising a multi-core processor unit. The multi-core processor unit includes one or more processor clusters, each including a plurality of central processing unit (CPU) cores, with each core having a dedicated first-level cache and a shared second-level cache. It also includes an interconnect operatively coupled to the one or more processor clusters, and one or more memory controllers in operative communication with the one or more processor clusters via the interconnect. A main memory is in operative communication with the one or more processor clusters via the one or more memory controllers. A memory hierarchy of the avionics computer system includes at least one of the first-level or second-level caches, the one or more memory controllers, and the main memory, and the memory hierarchy is divided into a plurality of memory pools. The main memory includes a set of dual in-line memory modules (DIMMs) divided into ranks, each rank having a rank address defined by a set of rank address bits. Each rank has a set of dynamic random access memory (DRAM) devices, and each DRAM device includes one or more banks, each bank having a bank address defined by a set of bank address bits. A plurality of threads execute on the CPU cores. The threads are assigned to the memory pools based on one or more memory partitioning techniques. These techniques include bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools; rank partitioning, which uses the rank address bits to access the one or more ranks; or memory controller partitioning, which uses memory controller interleaving either to distribute memory requests evenly across multiple memory controllers or to isolate memory requests entirely to a specific memory controller. The avionics computer system is implemented as part of an avionics platform on board an aircraft.
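The two uses of memory controller interleaving named in Example 11, even distribution and complete isolation, can be contrasted in a short sketch. It assumes simple round-robin interleaving at a hypothetical 4 KB granularity across three controllers, matching the controller count of the experimental platform. Real platforms may use a different granularity or a hashed interleave, and the function names are illustrative.

```python
from collections import Counter

NUM_CONTROLLERS = 3      # as on the experimental platform described above
INTERLEAVE_BYTES = 4096  # hypothetical interleave granularity


def controller_of(phys_addr, num_controllers=NUM_CONTROLLERS,
                  granule=INTERLEAVE_BYTES):
    """Round-robin interleave: consecutive granules go to consecutive controllers."""
    return (phys_addr // granule) % num_controllers


def spread(addrs):
    """Count how many of the given addresses land on each controller."""
    return Counter(controller_of(a) for a in addrs)


def isolated_pool(addrs, controller):
    """Keep only the addresses served by one specific controller."""
    return [a for a in addrs if controller_of(a) == controller]


if __name__ == "__main__":
    pages = [i * INTERLEAVE_BYTES for i in range(12)]
    print(spread(pages))            # even distribution across controllers
    print(isolated_pool(pages, 2))  # pool confined to controller 2
```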

[0072] Example 12 includes the avionics computer system of Example 11, wherein one or more of the threads are also assigned to one or more of the memory pools based on a cache partitioning technique that uses cache index address bits to define the size and number of cache partitions.

[0073] Example 13 includes the avionics computer system of any of Examples 11-12, wherein at least some of the threads are assigned to the same memory pool.

[0074] Example 14 includes the avionics computer system of any of Examples 11-12, wherein at least some of the threads are each assigned to a different memory pool.

[0075] Example 15 includes the avionics computer system of any of Examples 11-14, wherein at least some of the threads are mapped to the same CPU core.

[0076] Example 16 includes the avionics computer system of any of Examples 11-14, wherein at least some of the threads are each mapped to a different CPU core.

[0077] Example 17 includes the avionics computer system of any of Examples 11-16, wherein at least some of the memory pools are each mapped, in a one-to-one (1:1) correspondence, to the one or more memory controllers, the one or more ranks, or the one or more banks.

[0078] Example 18 includes the avionics computer system of any of Examples 11-16, wherein at least some of the memory pools are each mapped, in a one-to-many (1:N) correspondence, to a plurality of the one or more memory controllers, the one or more ranks, and the one or more banks.

[0079] Example 19 includes a method of operating a computing system. The method comprises dividing a memory hierarchy of the computing system into a plurality of memory pools, the memory hierarchy including at least one cache, at least one memory controller, and a main memory. The main memory includes a set of memory modules divided into ranks, each rank having a rank address defined by a set of rank address bits and a set of memory devices, and each memory device includes one or more banks, each bank having a bank address defined by a set of bank address bits. The method further comprises assigning each of a plurality of threads executing on at least one processing unit of the computing system to one or more of the memory pools based on one or more memory partitioning techniques. These techniques include bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools; rank partitioning, which uses the rank address bits to access the one or more ranks; or memory controller partitioning, which uses memory controller interleaving either to distribute memory requests evenly across multiple memory controllers or to isolate memory requests entirely to a specific memory controller.
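The two steps of the method, dividing the hierarchy into pools and assigning threads to them, can be modeled as follows. In this sketch a pool is the Cartesian product of the controllers, ranks, and banks it may use, and a thread assignment is a lookup from thread name to pool. Several threads may share one pool, as in Examples 4 and 13. The `MemoryPool` class, `overlap`, and `assign_threads` are hypothetical names, and the decoding of addresses into (controller, rank, bank) is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryPool:
    """A pool is the set of (controller, rank, bank) combinations it may use."""
    name: str
    controllers: frozenset
    ranks: frozenset
    banks: frozenset

    def admits(self, controller, rank, bank):
        """True if a decoded (controller, rank, bank) location belongs to the pool."""
        return (controller in self.controllers and rank in self.ranks
                and bank in self.banks)


def overlap(a, b):
    """Two product-shaped pools share a location only if every dimension intersects."""
    return bool(a.controllers & b.controllers and a.ranks & b.ranks
                and a.banks & b.banks)


def assign_threads(threads_to_pools, pools):
    """Map each thread to its MemoryPool; raises KeyError for an unknown pool name."""
    by_name = {p.name: p for p in pools}
    return {thread: by_name[name] for thread, name in threads_to_pools.items()}


if __name__ == "__main__":
    # Bank partitioning: same controller and ranks, disjoint bank sets.
    worker = MemoryPool("worker", frozenset({0}), frozenset({0, 1}), frozenset(range(4)))
    trasher = MemoryPool("trasher", frozenset({0}), frozenset({0, 1}), frozenset(range(4, 8)))
    plan = assign_threads({"writer": "worker", "reader": "worker",
                           "trasher1": "trasher"}, [worker, trasher])
    print({t: p.name for t, p in plan.items()}, overlap(worker, trasher))
```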

[0080] Example 20 includes the method of Example 19, wherein one or more of the threads are also assigned to one or more of the memory pools based on a cache partitioning technique that uses cache index address bits to define the size and number of cache partitions.

[0081] The present invention may be embodied in other specific forms without departing from its essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the invention is, therefore, indicated by the appended claims rather than by the foregoing description. All changes that come within the meaning and range of equivalency of the claims are to be embraced within their scope.

110 COTS multi-core processor
114 Cache
116 Cache
117 Interconnect
118 Cache
119 Input/output interface
120 Memory controller
130 Main memory
132 DIMM
200 DRAM device
214 Row buffer
216 Row decoder
218 Column decoder
220 Command
222 Address
224 Data
226 Command decoder
228 Refresh counter
320 Request buffer
330 Memory scheduler
340 Channel scheduler

Claims

A computing system, comprising:
at least one processing unit;
at least one memory controller, with or without a cache, in operative communication with the at least one processing unit; and
a main memory in operative communication with the at least one processing unit via the at least one memory controller;
wherein a memory hierarchy of the computing system includes at least one cache, the at least one memory controller, and the main memory, and the memory hierarchy is divided into a plurality of memory pools;
wherein the main memory includes a set of memory modules divided into ranks, each rank having a rank address defined by a set of rank address bits and a set of memory devices, and each memory device including one or more banks, each bank having a bank address defined by a set of bank address bits;
wherein a plurality of threads execute on the at least one processing unit; and
wherein the threads are assigned to the memory pools based on one or more memory partitioning techniques, including:
bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools;
rank partitioning, which uses the rank address bits to access the one or more ranks; or
memory controller partitioning, which uses memory controller interleaving.

An avionics computer system, comprising:
a multi-core processor unit comprising:
one or more processor clusters, each including a plurality of central processing unit (CPU) cores, each core having a dedicated first-level cache and a shared second-level cache;
an interconnect operatively coupled to the one or more processor clusters; and
one or more memory controllers in operative communication with the one or more processor clusters via the interconnect; and
a main memory in operative communication with the one or more processor clusters via the one or more memory controllers;
wherein a memory hierarchy of the avionics computer system includes at least one of the first-level or second-level caches, the one or more memory controllers, and the main memory, and the memory hierarchy is divided into a plurality of memory pools;
wherein the main memory includes a set of dual in-line memory modules (DIMMs) divided into ranks, each rank having a rank address defined by a set of rank address bits and a set of dynamic random access memory (DRAM) devices, and each DRAM device including one or more banks, each bank having a bank address defined by a set of bank address bits;
wherein a plurality of threads execute on the CPU cores, the threads being assigned to the memory pools based on one or more memory partitioning techniques, including:
bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools;
rank partitioning, which uses the rank address bits to access the one or more ranks; or
memory controller partitioning, which uses memory controller interleaving either to distribute memory requests evenly across multiple memory controllers or to isolate memory requests entirely to a specific memory controller; and
wherein the avionics computer system is implemented as part of an avionics platform on board an aircraft.

A method of operating a computing system, the method comprising:
dividing a memory hierarchy of the computing system into a plurality of memory pools, the memory hierarchy including at least one cache, at least one memory controller, and a main memory, wherein the main memory includes a set of memory modules divided into ranks, each rank having a rank address defined by a set of rank address bits and a set of memory devices, and each memory device including one or more banks, each bank having a bank address defined by a set of bank address bits; and
assigning each of a plurality of threads executing on at least one processing unit of the computing system to one or more of the memory pools based on one or more memory partitioning techniques, including:
bank partitioning, which uses the bank address bits to define a size and pattern of one or more of the memory pools;
rank partitioning, which uses the rank address bits to access the one or more ranks; or
memory controller partitioning, which uses memory controller interleaving either to distribute memory requests evenly across multiple memory controllers or to isolate memory requests entirely to a specific memory controller.