JP2013200827A - Process scheduling device, process scheduling method, and program

Info

Publication number: JP2013200827A
Application number: JP2012070143A
Authority: JP
Inventor: Atsushi Tsuji (辻 篤史)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2012-03-26
Filing date: 2012-03-26
Publication date: 2013-10-03

Abstract

PROBLEM TO BE SOLVED: To provide a process scheduling device, a process scheduling method, and a program capable of suppressing performance degradation in a multi-CPU system.

SOLUTION: A process scheduling device 30 schedules processes in a computer 100 having CPUs 10 and 11. It includes: a cost calculation unit 31 that calculates, for each process, a maintenance cost for keeping the cache contents of the CPUs consistent, based on information that is acquired for each process executed by the computer 100 and that indicates the state of access to the cache of each CPU or to a memory 12; a grouping unit 32 that divides the processes into groups based on the maintenance cost calculated for each process; and a scheduling unit 33 that assigns the processes, group by group, to one of the CPUs.

Description

The present invention relates to a process scheduling apparatus and a process scheduling method for scheduling processes in a multi-CPU system, and to a program for implementing them.

In recent years, increases in CPU (Central Processing Unit) clock frequency have begun to reach their limits, and the use of multi-core CPUs having a plurality of processor cores has increased. FIG. 12 is a diagram showing the schematic configuration of a conventional, typical multi-core CPU. As shown in FIG. 12, in a typical multi-core CPU each core has its own L1 cache and L2 cache, and the cores share the single L3 cache of the CPU.

In addition, to further improve the processing capability of computers, the use of multi-CPU systems equipped with a plurality of such multi-core CPUs is also increasing. However, in a multi-CPU system having a plurality of multi-core CPUs, when processes executed by different physical CPUs access the same memory area, cache invalidation is performed to maintain cache coherency, and memory access may be delayed. As a result, system performance degrades.

Here, the memory access delay will be described with reference to FIG. 13. FIG. 13 is a diagram for explaining the operation of a conventional multi-CPU system. As shown in FIG. 13, suppose that CPU 2, in order to execute a process, accesses the memory and requests a write of data X, while another CPU 1 has already read data X into its own L3 cache.

In this case, to maintain cache consistency, the cache coherency mechanism of the system invalidates data X in the L3 cache of the other CPU 1 and writes it back to the memory. The other CPU 1 therefore has to load data X from the memory into its own L3 cache again.

When such cache invalidation is performed, the data accesses (reads/writes) of the CPU are delayed while the data is loaded from the memory, and performance is greatly reduced compared with the case where the CPU operates alone. For example, when processes running on different CPUs contend for the same lock, or when processes (or threads) access a shared memory area, such cache invalidation occurs frequently and system performance degrades.

For this reason, methods for suppressing memory access delay have been proposed (see Patent Documents 1 and 2). In these methods, processes (threads) are executed on the same CPU as far as possible, which suppresses the memory access delay caused by cache invalidation.

Specifically, in the method disclosed in Patent Document 1, each process's access to common data stored in the cache is first detected. The address of the common data whose access was detected, the identification information of the process that accessed it, and the number of accesses to the same data by the same process are then recorded. Next, based on the recorded information, processes that access the same address many times are assigned to the same CPU.

In the method disclosed in Patent Document 2, the memory access amount, memory access waiting time, and similar values of each process are first acquired by performance measurement means provided in the CPU or by other hardware (HW) performance measurement means. Next, processes are assigned to CPUs with larger caches in descending order of access amount or access time.

Patent Document 1: JP 2002-55966 A
Patent Document 2: JP 2003-06175 A

In the method disclosed in Patent Document 1, processes with many accesses are assigned to the same CPU. In practice, however, a process with more accesses does not necessarily cause more cache invalidation. In addition, acquiring and storing the addresses of the common data whose access was detected places a burden on the CPU because the amount of data becomes enormous. For these reasons, the method disclosed in Patent Document 1 does not sufficiently suppress memory access delay, and system performance may degrade.

The method disclosed in Patent Document 2 does not take the cache efficiency of the CPUs into account, so a group of processes that access the same memory area may be assigned to different CPUs. In that case the number of cache invalidations actually increases. The method disclosed in Patent Document 2 therefore also fails to suppress memory access delay sufficiently, and system performance may degrade.

An example of an object of the present invention is to solve the above problems and to provide a process scheduling apparatus, a process scheduling method, and a program that can suppress performance degradation in a multi-CPU system.

To achieve the above object, a process scheduling apparatus according to one aspect of the present invention is an apparatus for scheduling processes in a computer having a plurality of CPUs, and includes:
a cost calculation unit that calculates, for each process, a cost for maintaining the consistency of the cache contents of each of the plurality of CPUs, based on information that is acquired for each process executed by the computer and that indicates the state of access to the cache of each of the plurality of CPUs or to the memory of the computer;
a grouping unit that divides the processes into groups based on the cost calculated for each process; and
a scheduling unit that assigns the processes, group by group, to one of the plurality of CPUs.

To achieve the above object, a process scheduling method according to one aspect of the present invention is a method for scheduling processes in a computer having a plurality of CPUs, and includes:
(a) a step of calculating, for each process, a cost for maintaining the consistency of the cache contents of each of the plurality of CPUs, based on information that is acquired for each process executed by the computer and that indicates the state of access to the cache of each of the plurality of CPUs or to the memory of the computer;
(b) a step of dividing the processes into groups based on the cost calculated for each process in step (a); and
(c) a step of assigning the processes, group by group, to one of the plurality of CPUs.

Furthermore, to achieve the above object, a program according to one aspect of the present invention is a program for causing a computer having a plurality of CPUs to schedule processes, and causes the computer to execute:
(a) a step of calculating, for each process, a cost for maintaining the consistency of the cache contents of each of the plurality of CPUs, based on information that is acquired for each process to be executed and that indicates the state of access to the cache of each of the plurality of CPUs or to the memory of the computer;
(b) a step of dividing the processes into groups based on the cost calculated for each process in step (a); and
(c) a step of assigning the processes, group by group, to one of the plurality of CPUs.

As described above, the present invention makes it possible to suppress performance degradation in a multi-CPU system.

FIG. 1 is a block diagram showing the configuration of the process scheduling apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a diagram for explaining processing in the performance information acquisition unit provided in the computer shown in FIG. 1.
FIG. 3 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 1 of the present invention.
FIG. 4 is a diagram showing an example of processes for which the maintenance cost has been calculated in Embodiment 1 of the present invention.
FIG. 5 is a diagram conceptually illustrating the scheduling process shown in FIG. 3.
FIG. 6 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 2 of the present invention.
FIG. 7 is a diagram conceptually illustrating the cost calculation process in Embodiment 2 of the present invention.
FIG. 8 is a diagram conceptually illustrating the scheduling process in Embodiment 2 of the present invention.
FIG. 9 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 3 of the present invention.
FIG. 10 is a diagram conceptually showing the information acquisition process in Embodiment 3 of the present invention.
FIG. 11 is a diagram conceptually illustrating the scheduling process in Embodiment 3 of the present invention.
FIG. 12 is a diagram showing the schematic configuration of a conventional, typical multi-core CPU.
FIG. 13 is a diagram for explaining the operation of a conventional multi-CPU system.

(Embodiment 1)
Hereinafter, a process scheduling apparatus, a process scheduling method, and a program according to Embodiment 1 of the present invention will be described with reference to FIGS. 1 to 11.

[Apparatus configuration]
First, the configuration of the process scheduling apparatus according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the process scheduling apparatus according to Embodiment 1 of the present invention.

As shown in FIG. 1, the process scheduling apparatus 30 according to Embodiment 1 is an apparatus that schedules processes in a computer 100 having CPUs 10 and 11. In Embodiment 1, the process scheduling apparatus 30 is built, by a program described later, on the operating system 20 installed in the computer 100.

As shown in FIG. 1, in Embodiment 1 the computer 100 further includes a memory 12 serving as a main storage device and a performance information acquisition unit 13 provided for each CPU. For each process executed by the computer 100, the performance information acquisition unit 13 acquires information indicating the state of access by the corresponding CPU to its cache or to the memory 12 (hereinafter referred to as "performance information"), and inputs the acquired information to the process scheduling apparatus.

Further, in Embodiment 1, the CPUs 10 and 11 are multi-core CPUs of the kind shown in FIG. 12. Although only two CPUs are illustrated in FIG. 1, the number of CPUs provided in the computer is not particularly limited in Embodiment 1.

As shown in FIG. 1, the process scheduling apparatus 30 includes a cost calculation unit 31, a grouping unit 32, and a scheduling unit 33. Of these, the cost calculation unit 31 calculates, for each process, the cost of maintaining the consistency of the cache contents of the CPUs 10 and 11 (cache coherency), based on the performance information acquired for each process by the performance information acquisition unit 13. This cost is hereinafter referred to as the "maintenance cost".

The grouping unit 32 divides the processes into groups based on the maintenance cost calculated for each process. The scheduling unit 33 then assigns the processes, group by group, to one of the CPUs 10 and 11.

Thus, in Embodiment 1, the process scheduling apparatus 30 calculates, for each process, the maintenance cost required to maintain cache coherency, based on the actual access status of each CPU. Processes are then assigned to CPUs based on the calculated maintenance costs; for example, processes with high maintenance costs are assigned together to the same CPU. This suppresses memory access delay and, as a result, performance degradation in the multi-CPU system.

Here, the configuration of the process scheduling apparatus 30 according to Embodiment 1 will be described more specifically with reference to FIG. 2 in addition to FIG. 1. FIG. 2 is a diagram for explaining processing in the performance information acquisition unit provided in the computer shown in FIG. 1.

First, in Embodiment 1, the performance information acquisition unit 13 provided in the computer 100 acquires, as performance information, the number of times the cache of the corresponding CPU was invalidated, for each process. The cost calculation unit 31 therefore calculates the maintenance cost based on the number of times the cache of each CPU was invalidated (hereinafter referred to as the "cache invalidation count").

That is, in Embodiment 1, the "cache invalidation count" is acquired and regarded as the maintenance cost for maintaining cache coherency. Scheduling is then performed so that the cache invalidation count becomes smaller, which improves system performance. The reasoning is that a process in which cache invalidation occurs accesses the same memory area as a process executed on another CPU; it therefore triggers cache invalidation in the other CPU and is considered a cause of performance degradation.

Note that when a plurality of CPUs access the same memory area, no cache invalidation occurs as long as each CPU only reads from the memory. When one of the CPUs writes to the memory, the cache of the other CPU is invalidated and performance degrades.

Specifically, in Embodiment 1, the performance information acquisition unit 13 acquires, as the "cache invalidation count", the number of times a CPU accesses data in the memory that corresponds to a cache line held in the M state by another CPU. In other words, as shown in FIG. 2, this count corresponds to the number of times a modified cache line is invalidated in another CPU.

As shown in FIG. 2, the "M state" is a state defined by the MESI protocol, one of the protocols for maintaining cache coherency. It means that the cache line exists only in that cache and has been modified from the value in main memory.
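
For illustration only, the counting rule described above can be sketched as a toy model in Python. The class name CoherenceModel and its fields are hypothetical and do not appear in the specification, and the model tracks only which CPU holds a line in the M state rather than the full MESI state machine. It shows that read-only sharing never increments the counter, while alternating writes from two CPUs do.

```python
from collections import defaultdict


class CoherenceModel:
    """Toy model of per-process 'hit a Modified line in another CPU' counting."""

    def __init__(self):
        self.holders = defaultdict(set)  # addr -> CPUs holding a copy of the line
        self.dirty_owner = {}            # addr -> CPU holding the line in M state
        self.hitm = defaultdict(int)     # pid -> counted events (maintenance cost)

    def access(self, pid, cpu, addr, write):
        owner = self.dirty_owner.get(addr)
        if owner is not None and owner != cpu:
            # The line is Modified in another CPU: count it for this process.
            self.hitm[pid] += 1
            del self.dirty_owner[addr]   # the modified line is written back
        if write:
            self.holders[addr] = {cpu}   # all other copies are invalidated
            self.dirty_owner[addr] = cpu # the line is now Modified in this CPU
        else:
            self.holders[addr].add(cpu)  # a read only adds a copy


model = CoherenceModel()

# Two processes on different CPUs only read the same address: nothing is counted.
for _ in range(3):
    model.access(pid=1, cpu=0, addr=0x40, write=False)
    model.access(pid=2, cpu=1, addr=0x40, write=False)
reads_only = dict(model.hitm)

# The same two processes alternately write another address: events are counted.
for _ in range(3):
    model.access(pid=1, cpu=0, addr=0x80, write=True)
    model.access(pid=2, cpu=1, addr=0x80, write=True)
```

In this sketch the very first write is not counted because no other CPU holds the line in the M state yet; every later write by the other CPU is.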

A specific example of the performance information acquisition unit 13, if the CPU is made by Intel, is the CPU performance counter (see Reference Document 1 below). In this case, "L3_CACHE_HIT_OTHER_CORE_HITM" of the CPU performance counter "MSR_OFFCORE_RSP_0" is used to obtain, for each logical core, the number of accesses to L3 cache lines that exist in the M state in another CPU.
Reference Document 1: "Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B: System Programming Guide, Part 2"

In Embodiment 1, a cost calculation unit 31 is provided for each performance information acquisition unit (that is, for each CPU), and each cost calculation unit 31 takes the per-process "cache invalidation count" acquired by the performance information acquisition unit 13 as the maintenance cost. Each cost calculation unit 31 also records the resulting maintenance cost value, for each process, in the cost recording unit 34 (see FIG. 4, described later). Note that Patent Document 2 mentioned above discloses a method of recording CPU performance counter values for each process.

In Embodiment 1, the grouping unit 32 can divide the processes into a group of processes whose cache invalidation count exceeds a preset threshold and a group of processes whose cache invalidation count is at or below the threshold. This is because the performance degradation is negligible when the cache invalidation count is small.

In this case, the scheduling unit 33 assigns the group of processes whose cache invalidation count exceeds the threshold to the same CPU. As a result, processes that are likely to access the same memory area are gathered together, so cache invalidation occurs less often and performance degradation is suppressed.
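
The threshold-based grouping and assignment described above might be sketched as follows. This is a minimal Python sketch under stated assumptions: the function names, the process labels, and the counts are hypothetical, and leaving the at-or-below-threshold group to the ordinary scheduler is an assumption, since the text does not say where those processes are placed.

```python
def split_by_threshold(costs, threshold):
    """Split processes into (over, rest) by cache invalidation count.

    costs maps a process identifier to its cache invalidation count.
    """
    over = [pid for pid, count in costs.items() if count > threshold]
    rest = [pid for pid, count in costs.items() if count <= threshold]
    return over, rest


def assign_over_threshold(costs, threshold, target_cpu):
    """Place every over-threshold process on the same CPU.

    Processes at or below the threshold are not placed here (assumption:
    they stay under the control of the ordinary OS scheduler).
    """
    over, _ = split_by_threshold(costs, threshold)
    return {pid: target_cpu for pid in over}


# Hypothetical counts for four processes.
costs = {"A": 900, "B": 850, "C": 20, "D": 0}
over, rest = split_by_threshold(costs, threshold=100)
placement = assign_over_threshold(costs, threshold=100, target_cpu=10)
```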

[Apparatus operation]
Next, the operation of the process scheduling apparatus according to Embodiment 1 will be described with reference to FIGS. 3 to 5. FIG. 3 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 1 of the present invention. FIG. 4 is a diagram showing an example of processes for which the maintenance cost has been calculated in Embodiment 1 of the present invention. FIG. 5 is a diagram conceptually illustrating the scheduling process shown in FIG. 3.

In Embodiment 1, the process scheduling method is carried out by operating the process scheduling apparatus 30. The description of the process scheduling method in Embodiment 1 is therefore replaced by the following description of the operation of the process scheduling apparatus 30. In the following description, FIGS. 1 and 2 are referred to as appropriate.

In Embodiment 1, the process scheduling apparatus 30 cooperates with the scheduler provided in advance in the operating system 20 and operates like the migration thread in the Linux (registered trademark) OS (see Reference Document 2 below).
Reference Document 2: "詳解 Linux カーネル 第3版" (Japanese edition of "Understanding the Linux Kernel", 3rd edition), Daniel P. Bovet and Marco Cesati; translation supervised by 高橋浩和; translated by 杉田由美子, 清水正明, 高杉昌督, 平松雅巳, and 安井隆宏

As shown in FIG. 3, each cost calculation unit 31 first initializes the maintenance cost of each process recorded in the cost recording unit 34 to 0 (zero) (step A1). Each cost calculation unit 31 then selects one of the CPUs 10 and 11 (step A2). However, if steps A1 to A7 shown in FIG. 3 have already been executed before the current step A2, each cost calculation unit 31 selects a CPU that was not selected in the previous round.

Next, each cost calculation unit 31 receives the per-process performance information from the corresponding performance information acquisition unit 13, obtains the cache invalidation count for each process, and records it in the cost recording unit 34 as the maintenance cost (step A3). When step A3 has been executed, the maintenance cost of each process is known, as shown in FIG. 4.

After step A3 has been executed, the grouping unit 32 sorts the processes in order of the maintenance cost recorded for each process (step A4). When the sort is complete, the grouping unit 32 determines whether the maintenance cost of any process exceeds the set threshold T (step A5).

In Embodiment 1, the "threshold T" used as the criterion in step A5 can be specified by the user according to the system configuration of the computer 100. For example, the user can specify the threshold T based on the number of accesses by the process to L3 cache lines, obtained from the CPU performance counter "UNCORE_HIT+OTHER_CORE_HIT_SNP+OTHER_CORE_HITM". A specific example of the threshold T is 80% of the number of accesses. When the threshold T is set to such a value, a process whose maintenance cost exceeds the threshold T is one for which more than 80% of all its accesses to L3 cache lines result in cache invalidation.
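
As a worked example of the 80% threshold, the following minimal sketch checks whether a process exceeds the threshold T. The function name and the sample counts are hypothetical; integer arithmetic is used so that the comparison at exactly 80% is not affected by rounding.

```python
def exceeds_threshold(invalidations, l3_accesses, percent=80):
    """Return True if invalidations exceed percent% of the L3 accesses.

    Equivalent to invalidations > T with T = l3_accesses * percent / 100,
    written without division to stay in integers.
    """
    return invalidations * 100 > percent * l3_accesses


# With 1000 L3 accesses, T is 800: 850 invalidations exceed it, 800 do not.
over = exceeds_threshold(850, 1000)
at_limit = exceeds_threshold(800, 1000)
```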

If the determination in step A5 finds that no process has a maintenance cost exceeding the threshold T (that is, the maintenance costs of all processes are at or below the threshold T), the processing in the process scheduling apparatus 30 ends for the time being. This is because, in this case, no performance degradation due to cache invalidation is considered to be occurring.

On the other hand, if the determination in step A5 finds that the maintenance cost of any process exceeds the threshold T, the grouping unit 32 places the top N processes with the largest maintenance costs (N is an arbitrary natural number) into one group, as shown in FIG. 4 (step A6). The number of processes N gathered into one group is not particularly limited; for example, it can be set to the number of cores of the CPU.

Next, the scheduling unit 33 assigns the processes belonging to the group set in step A6 to the CPU selected in step A2 (step A7). This is because running the top N processes with the largest maintenance costs on separate CPUs makes performance degradation due to cache invalidation more likely.
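
One way to approximate step A7 from user space on Linux is to restrict each process in the group to the logical cores of the selected physical CPU. The sketch below uses Python's os.sched_setaffinity for this. It is not the mechanism of the specification, which cooperates with the OS scheduler. The function names are hypothetical, and the contiguous numbering of logical cores per physical CPU is an assumption that does not hold on every machine.

```python
import os


def cores_of_cpu(cpu_index, cores_per_cpu):
    """Logical core IDs of one physical CPU.

    Assumption: cores are numbered contiguously per physical CPU
    (CPU 0 -> cores 0..n-1, CPU 1 -> cores n..2n-1). Check the real
    topology before relying on this.
    """
    first = cpu_index * cores_per_cpu
    return set(range(first, first + cores_per_cpu))


def pin_group(pids, cpu_index, cores_per_cpu):
    """Restrict every process in the group to the cores of one physical CPU."""
    cores = cores_of_cpu(cpu_index, cores_per_cpu)
    for pid in pids:
        # Linux-only call; may raise PermissionError or ProcessLookupError.
        os.sched_setaffinity(pid, cores)
```

For example, pin_group(group_pids, 1, 4) would confine the group to cores 4 to 7 under the numbering assumption above, where group_pids is whatever list of process IDs step A6 produced.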

When step A7 is executed, the N processes A to H with high maintenance costs are assigned to the same CPU. As shown in FIG. 5, for example, the maintenance costs of process B and process C, which were high, are reduced by the scheduling. Note that FIG. 5 shows only some of the processes shown in FIG. 4.

Steps A1 to A7 are executed again after a fixed time has elapsed. In that case, as described above, a CPU different from the previously selected CPU is selected in step A2 and scheduling is performed. In other words, the target CPU is chosen in a round-robin manner. By repeating steps A1 to A7 in this way, the maintenance cost is reduced across the computer 100 as a whole, and performance degradation is suppressed.
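
The repeated execution of steps A1 to A7 can be sketched as the following loop. This is a minimal Python sketch, not the implementation of the specification: the function names, the read_counts callback standing in for the performance information acquisition unit, and the sample counts are all hypothetical, and the wait between rounds is omitted.

```python
def schedule_round(read_counts, cpu, n, threshold):
    """One pass of steps A1 and A3 to A7 for an already selected CPU."""
    costs = dict(read_counts())                           # A1 + A3: fresh costs
    ranked = sorted(costs, key=costs.get, reverse=True)   # A4: sort by cost
    if not ranked or costs[ranked[0]] <= threshold:       # A5: nothing exceeds T
        return {}
    group = ranked[:n]                                    # A6: top N as one group
    return {pid: cpu for pid in group}                    # A7: same CPU for all


def run(read_counts, cpus, n, threshold, rounds):
    """Repeat the pass, choosing the target CPU round robin (step A2)."""
    placements = []
    for i in range(rounds):
        cpu = cpus[i % len(cpus)]                         # A2: round-robin choice
        placements.append(schedule_round(read_counts, cpu, n, threshold))
    return placements


# Hypothetical samples: high counts in the first round, low counts afterwards.
samples = iter([
    {"A": 900, "B": 700, "C": 10},
    {"A": 5, "B": 3, "C": 10},
])
result = run(lambda: next(samples), cpus=[10, 11], n=2, threshold=100, rounds=2)
```

In the first round processes A and B are grouped onto CPU 10; in the second round no cost exceeds the threshold, so nothing is moved.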

[Effects of Embodiment 1]
As described above, according to Embodiment 1, scheduling is performed so that processes with high cache invalidation counts are executed by the same CPU. As a result, multiple processes that would contend for the same lock if run on separate CPUs, as well as multiple processes that access a memory area shared between processes or threads, are executed by the same CPU. The cache invalidation count of each process therefore decreases, memory access delay is reduced, and the performance of the multi-CPU system improves.

A process with a high cache invalidation count causes cache invalidation through cache contention with other processes and is thus a cause of performance degradation. Because the user can identify the processes that cause cache contention from the calculated maintenance costs, the user can also manually configure CPU assignment at scheduling time and improve the program structure.

[Program]
The program in Embodiment 1 may be any program that causes the computer 100 to execute steps A1 to A7 shown in FIG. 3. By installing this program on a computer and executing it, the process scheduling apparatus and the process scheduling method according to Embodiment 1 can be realized. In this case, the CPUs 10 and 11 of the computer function as the cost calculation unit 31, the grouping unit 32, and the scheduling unit 33 and perform the processing. A storage device provided in the computer, such as a hard disk or memory, functions as the cost recording unit 34.

The program in Embodiment 1 may be provided stored on a computer-readable recording medium, or may be distributed over a network such as the Internet. Specific examples of the recording medium include general-purpose semiconductor storage devices such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), magnetic storage media such as a flexible disk, and optical storage media such as a CD-ROM (Compact Disk Read Only Memory).

(Embodiment 2)
Next, a process scheduling apparatus, a process scheduling method, and a program according to Embodiment 2 of the present invention will be described with reference to FIGS. 6 to 8. FIG. 1 is also referred to as appropriate in the following description.

The process scheduling apparatus of Embodiment 2 has the same configuration as the process scheduling apparatus 30 of Embodiment 1 shown in FIG. 1. In Embodiment 2, however, the cost calculation unit 31 calculates, for each process and on the basis of the cache invalidation count of each CPU, the number of cache invalidations that occur while the process is executing shared area access processing, and uses that count as the maintenance cost.

In other words, in Embodiment 2, scheduling is performed so that processes with a high maintenance cost during shared area access processing are executed by the same CPU. "Shared area access processing" generally refers to lock processing, access to shared memory, and the like. Shared area access processing may also be defined by the user; in that case, the user defines, at programming time, any processing in which memory contention between processes can occur as shared area access processing. Shared area access processing may also be lock processing extracted from an existing program.

The operation of the process scheduling apparatus of Embodiment 2 is described here with reference to FIGS. 6 to 8. FIG. 6 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 2 of the present invention. FIG. 7 is a diagram conceptually illustrating the cost calculation processing in Embodiment 2 of the present invention. FIG. 8 is a diagram conceptually illustrating the scheduling processing in Embodiment 2 of the present invention.

In Embodiment 2 as well, the process scheduling method is carried out by operating the process scheduling apparatus. The following description of the operation of the process scheduling apparatus therefore also serves as the description of the process scheduling method of Embodiment 2.

First, Embodiment 2 assumes that each cost calculation unit 31 continuously receives the cache invalidation count from the corresponding performance information acquisition unit 13 as per-process performance information and records it in the cost recording unit 34. The recorded cache invalidation counts are used to calculate the maintenance cost.

Next, as shown in FIG. 6, each cost calculation unit 31 determines whether any process on the corresponding CPU has executed shared area access processing (step B1). If the determination in step B1 finds that no process has executed shared area access processing, each cost calculation unit 31 enters a standby state and executes step B1 again after a set time has elapsed.

If, on the other hand, the determination in step B1 finds that some process has executed shared area access processing, each cost calculation unit 31 identifies the process that executed it (step B2).

Next, for the identified process, each cost calculation unit 31 acquires from the cost recording unit 34 the cache invalidation counts before and after the shared area access processing. As shown in FIG. 7, each cost calculation unit 31 then calculates the difference between the two, that is, the number of cache invalidations that occurred during the shared area access processing, and uses the calculated difference as the maintenance cost (step B3). Each cost calculation unit 31 also records the calculated maintenance cost in the cost recording unit 34.
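
The difference calculation of step B3 can be sketched as follows. This is a minimal illustration, assuming the two counter readings are plain integers; the function name `maintenance_cost` and the sample readings are hypothetical.

```python
def maintenance_cost(count_before, count_after):
    """Step B3: cache invalidations that occurred during the shared area access.

    count_before / count_after: cumulative invalidation counts recorded for one
    process immediately before and after the shared area access processing.
    """
    if count_after < count_before:
        raise ValueError("a cumulative invalidation count must not decrease")
    return count_after - count_before


# Hypothetical readings taken around one lock acquisition.
cost = maintenance_cost(count_before=340, count_after=415)
```

Because the counts are cumulative, only the difference is attributed to the shared area access processing; invalidations that happened earlier in the process's lifetime do not inflate the cost.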

Next, the grouping unit 32 determines whether the maintenance cost calculated in step B3 exceeds the set threshold T (step B4). The threshold T is set in the same way as in step A5 shown in FIG. 3.

If the determination in step B4 finds that the maintenance cost does not exceed the threshold T, processing in the process scheduling apparatus 30 ends for the time being. If the maintenance cost exceeds the threshold T, the grouping unit 32 places the processes whose maintenance cost exceeded the threshold T into one group (step B5).

Next, the scheduling unit 33 selects an arbitrary CPU and assigns the processes belonging to the group set in step B5 to the selected CPU (step B6). Once step B6 has been executed, steps B1 to B6 are executed again. Thus, in Embodiment 2, scheduling is performed each time a process executes shared area access processing. The CPU to be selected is chosen in round-robin fashion.
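
Steps B4 to B6 can be sketched as below. This is one possible reading, not the apparatus itself: the class name `Scheduler`, its attributes, and the choice to move the whole group on every trigger are assumptions made for illustration.

```python
from itertools import cycle


class Scheduler:
    """Sketch of steps B4 to B6 for a computer with several CPUs."""

    def __init__(self, cpu_ids, threshold):
        self.threshold = threshold    # threshold T used in step B4
        self._cpus = cycle(cpu_ids)   # round-robin CPU selection
        self.group = set()            # processes whose cost exceeded T
        self.assignment = {}          # process id -> CPU id

    def on_cost(self, pid, cost):
        """Handle one maintenance cost produced by step B3."""
        if cost <= self.threshold:    # B4: threshold not exceeded, stop here
            return None
        self.group.add(pid)           # B5: add the process to the group
        cpu = next(self._cpus)        # B6: select a CPU in round-robin order
        for member in self.group:     # assign every group member to that CPU
            self.assignment[member] = cpu
        return cpu


sched = Scheduler(cpu_ids=[10, 11], threshold=50)
sched.on_cost("A", 75)   # exceeds T: process A is assigned to CPU 10
sched.on_cost("B", 20)   # does not exceed T: nothing changes
sched.on_cost("C", 90)   # exceeds T: A and C are both assigned to CPU 11
```

Keeping the group in one place means that every process that has exceeded the threshold ends up on the same CPU after each scheduling pass, which is the property the embodiment relies on.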

Specifically, as shown in FIG. 8, process A, which executes shared area access processing 1, was due to be executed by the CPU 10 but is assigned to the CPU 11 as a result of step B6. Running processes whose maintenance cost exceeds the threshold on separate CPUs tends to degrade performance through cache invalidation; because all processes that execute shared area access processing 1 are executed by the CPU 11, performance degradation in the multi-CPU system is suppressed.

The program in Embodiment 2 may be any program that causes the computer 100 to execute steps B1 to B6 shown in FIG. 6. Installing this program on a computer and executing it realizes the process scheduling apparatus and the process scheduling method of Embodiment 2. In this case, the CPUs 10 and 11 of the computer function as the cost calculation unit 31, the grouping unit 32, and the scheduling unit 33 and perform the processing. A storage device provided in the computer, such as a hard disk or memory, functions as the cost recording unit 34.

As in Embodiment 1, the program in Embodiment 2 may also be provided stored on a computer-readable recording medium, or may be distributed over a network such as the Internet.

(Embodiment 3)
Next, a process scheduling apparatus, a process scheduling method, and a program according to Embodiment 3 of the present invention will be described with reference to FIGS. 9 to 11. FIG. 1 is also referred to as appropriate in the following description.

The process scheduling apparatus of Embodiment 3 has the same configuration as the process scheduling apparatus 30 of Embodiment 1 shown in FIG. 1. In Embodiment 3, however, a plurality of NUMA (Non-Uniform Memory Access) nodes are configured in the computer 100 by each of the CPUs together with memory provided in the computer. In Embodiment 3, a plurality of memories 12 may be provided. In addition, the performance information acquisition unit 13 acquires, as performance information, whether each process has executed write processing on the memory 12.

In Embodiment 3, each cost calculation unit 31 tallies, on the basis of the performance information and for each NUMA node, the number of times each process has executed write processing on the memory 12, and uses the number of memory accesses tallied for each process and each NUMA node as the maintenance cost. The grouping unit 32 then identifies, for each process, the NUMA node with the highest number of accesses to the memory 12, and places processes that share the same identified NUMA node into one group.

The operation of the process scheduling apparatus of Embodiment 3 is described here with reference to FIGS. 9 to 11. FIG. 9 is a flowchart showing the operation of the process scheduling apparatus according to Embodiment 3 of the present invention. FIG. 10 is a diagram conceptually illustrating the information acquisition processing in Embodiment 3 of the present invention. FIG. 11 is a diagram conceptually illustrating the scheduling processing in Embodiment 3 of the present invention.

In Embodiment 3 as well, the process scheduling method is carried out by operating the process scheduling apparatus. The following description of the operation of the process scheduling apparatus therefore also serves as the description of the process scheduling method of Embodiment 3.

First, Embodiment 3 assumes that, as shown in FIG. 10, when a process executes write processing on the memory, the performance information acquisition unit 13 obtains the page that was written, sets the corresponding bit to 1, and notifies the corresponding cost calculation unit 31. Each cost calculation unit 31 holds a counter for each NUMA node and, when the performance information acquisition unit 13 changes a bit from 0 to 1, increments the counter of the NUMA node to which the written page belongs.
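
The bit-and-counter bookkeeping described above can be sketched as follows. The sketch assumes that bits and counters are kept per process and that a page-to-node mapping is available; the class name `WriteTracker` and its attributes are hypothetical.

```python
from collections import defaultdict


class WriteTracker:
    """Per-process write tracking: one bit per page, one counter per NUMA node."""

    def __init__(self, page_to_node):
        self.page_to_node = page_to_node   # page number -> NUMA node id (assumed given)
        self.dirty = defaultdict(set)      # pid -> pages whose bit is currently 1
        self.counters = defaultdict(lambda: defaultdict(int))  # pid -> node -> count

    def on_write(self, pid, page):
        """Record a write; the counter moves only when the bit flips from 0 to 1."""
        if page in self.dirty[pid]:
            return False                   # bit was already 1: no increment
        self.dirty[pid].add(page)          # set the bit to 1
        self.counters[pid][self.page_to_node[page]] += 1
        return True


# Hypothetical layout: pages 0 and 1 belong to node 0, page 2 to node 1.
tracker = WriteTracker({0: 0, 1: 0, 2: 1})
tracker.on_write("A", 0)
tracker.on_write("A", 0)   # repeated write to the same page is not counted again
tracker.on_write("A", 2)
```

Counting only the 0-to-1 transition means the counter reflects how many distinct pages of each node a process has written, not the raw number of store operations.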

Next, as shown in FIG. 9, each cost calculation unit 31 tallies the NUMA node counters at each set time interval and records the per-process counter value for each NUMA node (the number of writes by each process to the memory of each NUMA node) in the cost recording unit 34 as the maintenance cost (step C1).

Next, on the basis of the maintenance costs, the grouping unit 32 identifies, for each process, the NUMA node with the largest maintenance cost (counter value) (step C2). Then, as shown in FIG. 11, the grouping unit 32 groups the processes so that processes with the same identified NUMA node form one group (step C3).

After that, as shown in FIG. 11, the grouping unit 32 assigns the processes to the CPUs group by group so that each group is assigned to a different CPU (step C4). As a result, processes that are likely to write to the same memory area are assigned to the same CPU, which improves cache efficiency and, in turn, the performance of the multi-CPU system.
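
Steps C2 to C4 can be sketched as below. The function names, the input layout, the tie-breaking rule, and the sample counts are assumptions made for illustration; the embodiment does not state how ties or a shortage of CPUs are handled.

```python
from collections import defaultdict


def group_by_hottest_node(costs):
    """Steps C2 and C3: group processes by the NUMA node they write to most.

    costs: dict pid -> dict node_id -> write count (the maintenance cost).
    Ties go to the lowest node id (an assumption, not stated in the source).
    """
    groups = defaultdict(list)
    for pid, per_node in costs.items():
        # max() returns the first maximal item, so iterating node ids in
        # ascending order resolves ties toward the lowest node id.
        hottest = max(sorted(per_node), key=lambda node: per_node[node])
        groups[hottest].append(pid)
    return dict(groups)


def assign_groups(groups, cpu_ids):
    """Step C4: give each group its own CPU; returns pid -> CPU id."""
    if len(groups) > len(cpu_ids):
        raise ValueError("more groups than CPUs")
    assignment = {}
    for cpu, node in zip(cpu_ids, sorted(groups)):
        for pid in groups[node]:
            assignment[pid] = cpu
    return assignment


# Hypothetical per-node write counts for three processes.
costs = {"A": {0: 30, 1: 5}, "B": {0: 2, 1: 40}, "C": {0: 12, 1: 3}}
groups = group_by_hottest_node(costs)
assignment = assign_groups(groups, cpu_ids=[10, 11])
```

With these sample counts, processes A and C write mostly to node 0 and land on one CPU, while process B writes mostly to node 1 and lands on the other.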

The program in Embodiment 3 may be any program that causes the computer 100 to execute steps C1 to C4 shown in FIG. 9. Installing this program on a computer and executing it realizes the process scheduling apparatus and the process scheduling method of Embodiment 3. In this case, the CPUs 10 and 11 of the computer function as the cost calculation unit 31, the grouping unit 32, and the scheduling unit 33 and perform the processing. A storage device provided in the computer, such as a hard disk or memory, functions as the cost recording unit 34.

As in Embodiment 1, the program in Embodiment 3 may also be provided stored on a computer-readable recording medium, or may be distributed over a network such as the Internet.

Part or all of the embodiments described above can be expressed as (Appendix 1) to (Appendix 15) below, but is not limited to the following.

(Appendix 1)
An apparatus for scheduling processes in a computer having a plurality of CPUs, the apparatus comprising:
a cost calculation unit that calculates, for each process executed by the computer, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
a grouping unit that divides the processes into groups on the basis of the cost calculated for each process; and
a scheduling unit that assigns the processes, group by group, to any of the plurality of CPUs.

(Appendix 2)
The process scheduling apparatus according to Appendix 1, wherein
the number of times the cache of each of the plurality of CPUs has been invalidated is acquired for each process as the information, and
the cost calculation unit calculates the cost on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated.

(Appendix 3)
The process scheduling apparatus according to Appendix 2, wherein
the grouping unit divides the processes into a group of processes for which the number of times exceeds a preset threshold and a group of processes for which the number of times is equal to or less than the preset threshold.

(Appendix 4)
The process scheduling apparatus according to Appendix 2, wherein
the cost calculation unit calculates, for each process and on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated, the number of invalidations that occur while the process is executing shared area access processing, and uses the calculated number as the cost.

(Appendix 5)
The process scheduling apparatus according to Appendix 1, wherein
a plurality of nodes are configured in the computer by each of the plurality of CPUs together with memory provided in the computer,
whether each process has executed write processing on the memory is acquired for each process as the information,
the cost calculation unit tallies, on the basis of the information and for each node, the number of times each process has executed write processing on the memory, and uses the number of memory accesses tallied for each process and each node as the cost, and
the grouping unit identifies, for each process, the node with the highest number of accesses to the memory, and places processes that share the same identified node into one group.

(Appendix 6)
A method for scheduling processes in a computer having a plurality of CPUs, the method comprising:
(a) calculating, for each process executed by the computer, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
(b) dividing the processes into groups on the basis of the cost calculated for each process in step (a); and
(c) assigning the processes, group by group, to any of the plurality of CPUs.

(Appendix 7)
The process scheduling method according to Appendix 6, wherein
the number of times the cache of each of the plurality of CPUs has been invalidated is acquired for each process as the information, and
in step (a), the cost is calculated on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated.

(Appendix 8)
The process scheduling method according to Appendix 7, wherein
in step (b), the processes are divided into a group of processes for which the number of times exceeds a preset threshold and a group of processes for which the number of times is equal to or less than the preset threshold.

(Appendix 9)
The process scheduling method according to Appendix 7, wherein
in step (a), the number of invalidations that occur while each process is executing shared area access processing is calculated for each process on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated, and the calculated number is used as the cost.

(Appendix 10)
The process scheduling method according to Appendix 6, wherein
a plurality of nodes are configured in the computer by each of the plurality of CPUs together with memory provided in the computer,
whether each process has executed write processing on the memory is acquired for each process as the information,
in step (a), the number of times each process has executed write processing on the memory is tallied for each node on the basis of the information, and the number of memory accesses tallied for each process and each node is used as the cost, and
in step (b), the node with the highest number of accesses to the memory is identified for each process, and processes that share the same identified node are placed into one group.

(Appendix 11)
A program for causing a computer having a plurality of CPUs to perform process scheduling, the program causing the computer to execute:
(a) calculating, for each process to be executed, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
(b) dividing the processes into groups on the basis of the cost calculated for each process in step (a); and
(c) assigning the processes, group by group, to any of the plurality of CPUs.

(Appendix 12)
The program according to Appendix 11, wherein
the number of times the cache of each of the plurality of CPUs has been invalidated is acquired for each process as the information, and
in step (a), the cost is calculated on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated.

(Appendix 13)
The program according to Appendix 12, wherein
in step (b), the processes are divided into a group of processes for which the number of times exceeds a preset threshold and a group of processes for which the number of times is equal to or less than the preset threshold.

(Appendix 14)
The program according to Appendix 12, wherein
in step (a), the number of invalidations that occur while each process is executing shared area access processing is calculated for each process on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated, and the calculated number is used as the cost.

(Appendix 15)
The program according to Appendix 11, wherein
a plurality of nodes are configured in the computer by each of the plurality of CPUs together with memory provided in the computer,
whether each process has executed write processing on the memory is acquired for each process as the information,
in step (a), the number of times each process has executed write processing on the memory is tallied for each node on the basis of the information, and the number of memory accesses tallied for each process and each node is used as the cost, and
in step (b), the node with the highest number of accesses to the memory is identified for each process, and processes that share the same identified node are placed into one group.

According to the present invention, performance degradation in a multi-CPU system can be suppressed. The present invention is useful for server computers and the like equipped with a multi-CPU system.

10, 11 CPU
12 Main memory
13 Performance information acquisition unit
20 Operating system
30 Process scheduling apparatus
31 Cost calculation unit
32 Grouping unit
33 Scheduling unit
34 Cost recording unit
100 Computer

Claims

An apparatus for scheduling processes in a computer having a plurality of CPUs, the apparatus comprising:
a cost calculation unit that calculates, for each process executed by the computer, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
a grouping unit that divides the processes into groups on the basis of the cost calculated for each process; and
a scheduling unit that assigns the processes, group by group, to any of the plurality of CPUs.

The process scheduling apparatus according to claim 1, wherein
the number of times the cache of each of the plurality of CPUs has been invalidated is acquired for each process as the information, and
the cost calculation unit calculates the cost on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated.

The process scheduling apparatus according to claim 2, wherein
the grouping unit divides the processes into a group of processes for which the number of times exceeds a preset threshold and a group of processes for which the number of times is equal to or less than the preset threshold.

The process scheduling apparatus according to claim 2, wherein
the cost calculation unit calculates, for each process and on the basis of the acquired number of times the cache of each of the plurality of CPUs has been invalidated, the number of invalidations that occur while the process is executing shared area access processing, and uses the calculated number as the cost.

The process scheduling apparatus according to claim 1, wherein
a plurality of nodes are configured in the computer by each of the plurality of CPUs together with memory provided in the computer,
whether each process has executed write processing on the memory is acquired for each process as the information,
the cost calculation unit tallies, on the basis of the information and for each node, the number of times each process has executed write processing on the memory, and uses the number of memory accesses tallied for each process and each node as the cost, and
the grouping unit identifies, for each process, the node with the highest number of accesses to the memory, and places processes that share the same identified node into one group.

A method for scheduling processes in a computer having a plurality of CPUs, the method comprising:
(a) calculating, for each process executed by the computer, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
(b) dividing the processes into groups on the basis of the cost calculated for each process in step (a); and
(c) assigning the processes, group by group, to any of the plurality of CPUs.

A program for causing a computer having a plurality of CPUs to perform process scheduling, the program causing the computer to execute:
(a) calculating, for each process to be executed, a cost of maintaining consistency of the contents of the caches of the plurality of CPUs, on the basis of information that is acquired for each process and that indicates the status of access to the cache of each of the plurality of CPUs or to a memory of the computer;
(b) dividing the processes into groups on the basis of the cost calculated for each process in step (a); and
(c) assigning the processes, group by group, to any of the plurality of CPUs.