JP2014146366A

JP2014146366A - Multi-core processor system, and control method and control program of multi-core processor system

Info

Publication number: JP2014146366A
Application number: JP2014077378A
Authority: JP
Inventors: Hiromasa Yamauchi; 宏真山内; Koichiro Yamashita; 浩一郎山下; Takahisa Suzuki; 貴久鈴木; Yasushi Kurihara; 康志栗原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2014-04-03
Filing date: 2014-04-03
Publication date: 2014-08-14
Anticipated expiration: 2030-08-27
Also published as: JP5776813B2

Abstract

PROBLEM TO BE SOLVED: To improve the processing capability of a multi-core processor system by raising cache utilization efficiency even when parallel processing and multitask processing are executed. SOLUTION: When each CPU simultaneously executes parallel-executable processes set to the same priority, as in the multi-core processor system 100 on the left, a scheduler 110 places the shared data of high-priority processes preferentially, starting with the memory area that has the highest access speed. When each CPU simultaneously executes parallel-executable processes with different priorities, as in the multi-core processor system 100 on the right, the scheduler 110 places the shared data of high-priority processes in the same manner as in the left system. The scheduler 110 then places the shared data of task #2 and task #3, which are set to low priority, in the remaining memory.

Description

The present invention relates to a multi-core processor system that performs multitask processing through parallel processing by a plurality of cores, and to a control method and a control program for the multi-core processor system.

Conventionally, a hierarchical memory configuration consisting of a cache memory, a main memory, and a file system has been adopted as the memory area that stores the data a processor uses while executing processing. Because a hierarchical memory configuration improves data access speed, it is expected to speed up the system. In such a configuration, the cache memory, which operates faster than the other memories, has a limited capacity, so the data stored in it is replaced using an algorithm such as LRU (Least Recently Used) (see, for example, Patent Document 1 below).
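As an illustration of the LRU replacement mentioned above, the following is a minimal sketch and not the mechanism of any cited document. The class name `LRUCache`, its `access` method, and the sample variables are assumptions introduced here for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first, newest last

    def access(self, key, load):
        """Return the cached value for key, loading it on a miss."""
        if key in self.lines:
            self.lines.move_to_end(key)  # mark as most recently used
            return self.lines[key]
        value = load(key)  # miss: fetch from the slower memory level
        self.lines[key] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used
        return value


# Hypothetical main memory contents and access pattern.
main_memory = {"a": 1, "b": 2, "c": 3}
cache = LRUCache(capacity=2)
for name in ["a", "b", "a", "c"]:
    cache.access(name, main_memory.__getitem__)
resident = list(cache.lines)
```

With a capacity of two entries, the access to `c` evicts `b`, because `a` was touched more recently than `b`.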

In recent years, multi-core processor systems equipped with a plurality of processors have been widely adopted. Because a multi-core processor system has its processors execute tasks in parallel, it can greatly improve processing performance (see, for example, Patent Document 1 below). On the other hand, when tasks are executed in parallel and data in the cache memory of one processor is rewritten, the multi-core processor system must synchronize the data in the cache memories of the other processors.

A specific example of a data synchronization method is the snoop cache mechanism, which maintains cache coherency between processors. The snoop cache mechanism operates when data in a cache memory that one processor shares with another processor is rewritten. The rewrite is detected by the snoop controller mounted on the cache memory of the other processor. The snoop controller then reflects the new value in the cache memory of the other processor via the bus between the cache memories (see, for example, Patent Document 2 below).

Embedded systems are also required to execute a plurality of applications in parallel, and techniques for achieving parallel execution have been provided. Specifically, the disclosed techniques include multitask processing, in which the task executed on one processor is switched by time division or the like, distributed processing, in which a plurality of processors execute a plurality of tasks, and processing that combines the two (see, for example, Patent Document 3 below).

Patent Document 1: JP-A-6-175923
Patent Document 2: JP-A-10-240698
Patent Document 3: JP-A-11-212869

In a multi-core processor system, however, performance can be degraded by the synchronization between cache memories that is required when a plurality of processors execute parallel tasks, and by the frequent rewriting of cache memory that multitask processing causes.

FIG. 20 is an explanatory diagram showing an example of snoop operation in multi-core parallel processing. In a multi-core processor system 2000, the multiple cores (for example, CPU #0 and CPU #1 in FIG. 20) perform parallel processing in which the CPUs execute processes simultaneously. In parallel processing, particularly when tasks that use common data are executed simultaneously on the CPUs, the snoop 120 performs synchronization whenever data in one of the cache memories (for example, cache L1$0 or cache L1$1) is rewritten. Specifically, when CPU #0 rewrites the value of variable a in the data placed in cache L1$0, the snoop 120 rewrites the data of variable a in cache L1$1 via the bus.

If data rewriting by the snoop 120 occurs frequently, the bus connecting cache L1$0 and cache L1$1 becomes congested, which degrades performance. Frequent rewriting also increases the number of bus transactions and occupies the bus of the snoop 120. If an execution request for another process with real-time constraints arrives in this state, that process's access to the cache memory is obstructed, which may cause a serious performance problem.

FIG. 21 is an explanatory diagram showing an example of cache rewriting in multitask processing. When the multi-core processor system 2000 performs multitask processing, task switches occur that change the task being executed according to the execution status of the tasks. In FIG. 21, for example, the multi-core processor system 2000 performs multitask processing on task #0 through task #2.

Suppose that a task switch occurs while CPU #0 is executing task #0 and CPU #1 is executing task #2, as shown on the left side of FIG. 21. As a result of the task switch, the task executed by CPU #0 changes from task #0 to task #1, as shown on the right side of FIG. 21. When the task being executed changes, the data placed in cache L1$0 is also rewritten, from the data used by task #0 to the data used by task #1.

If execution returns to the earlier process after the data placed in cache L1$0 has been rewritten, CPU #0 must read the data used by task #0 from the memory 140 again. Moreover, even when the data placed in the target cache memory is rewritten because of a task switch, the CPU often never uses the newly placed data afterward. Rewriting data that is not reused in this way degrades the performance of the CPU that uses the cache memory.

To solve the problems of the conventional technology described above, an object of the disclosed technology is to provide a multi-core processor system, and a control method and a control program for the multi-core processor system, that can raise cache utilization efficiency and thereby improve the processing capability of the multi-core processor system even when parallel processing and multitask processing are executed.

To solve the above problems and achieve the object, the disclosed technology is a multi-core processor system having a plurality of cores that each process tasks and a plurality of caches that each store the data accessed when the cores process the tasks. When the priority of a task assigned to any one of the plurality of cores is equal to or higher than a predetermined value, a first core among the plurality of cores stores the data in the cache corresponding to the core to which the task is assigned, before the processing of the task is executed.

The multi-core processor system, and the control method and control program for the multi-core processor system, have the effect of raising cache utilization efficiency and improving the processing capability of the multi-core processor system even when parallel processing and multitask processing are executed.

FIG. 1 is an explanatory diagram showing an example of the scheduling process according to the present embodiment.
FIG. 2 is an explanatory diagram showing an example of a hierarchical memory configuration.
FIG. 3 is an explanatory diagram showing an example of multitask processing.
FIG. 4 is an explanatory diagram showing the procedure of normal cache coherency (part 1).
FIG. 5 is an explanatory diagram showing the procedure of normal cache coherency (part 2).
FIG. 6 is an explanatory diagram showing the procedure of normal cache coherency (part 3).
FIG. 7 is an explanatory diagram showing the procedure of normal cache coherency (part 4).
FIG. 8 is an explanatory diagram showing the procedure of cache coherency in low-priority parallel tasks.
FIG. 9 is a block diagram showing the functional configuration of the scheduler.
FIG. 10 is a flowchart showing the procedure of the shared data placement process.
FIG. 11 is a flowchart showing the procedure of the task table creation process.
FIG. 12 is a data table showing an example of the data structure of the task table.
FIG. 13 is a data table showing a setting example of the task table.
FIG. 14 is a flowchart showing the procedure of the task execution process (part 1).
FIG. 15 is a flowchart showing the procedure of the task execution process (part 2).
FIG. 16 is a flowchart showing the procedure of the task execution process (part 3).
FIG. 17 is a flowchart showing the procedure of the task execution process (part 4).
FIG. 18 is an explanatory diagram showing an execution example of parallel tasks with the same priority.
FIG. 19 is an explanatory diagram showing an execution example of parallel tasks with different priorities.
FIG. 20 is an explanatory diagram showing an example of snoop operation in multi-core parallel processing.
FIG. 21 is an explanatory diagram showing an example of cache rewriting in multitask processing.

Preferred embodiments of the multi-core processor system, and of the control method and control program for the multi-core processor system, according to the present invention are described in detail below with reference to the accompanying drawings.

FIG. 1 is an explanatory diagram showing an example of the scheduling process according to the present embodiment. In the present embodiment, the plurality of processors provided in the multi-core processor system 100 can execute a plurality of processes in parallel. The multi-core processor system 100 can therefore extract groups of processes that can be executed in parallel (for example, parallel tasks) from an application and perform efficient parallel processing.

In the present embodiment, a priority concerning execution order, either high or low, is set for each process to be executed, so that highly reusable data can be selected and placed in the cache memory. The priority is set based on how often the process, when executed, accesses data once stored in the cache memory, and on its deadline. The priority setting of each task is stored in the task table 111. In FIG. 1 and the subsequent figures, blocks representing high-priority tasks are drawn larger than blocks representing low-priority tasks.

The scheduler 110 of the multi-core processor system 100 therefore refers to the priorities set for the processes to be executed in parallel and places the data accessed when each process is executed (hereinafter, "shared data") in the most suitable memory area. When the same shared data is placed in a plurality of cache memories, the scheduler 110 also selects, according to the priority, which cache coherency method is used to synchronize the shared data.

Specifically, when the CPUs simultaneously execute parallel-executable processes set to the same priority, as in the multi-core processor system 100 on the left, the scheduler 110 places the shared data of high-priority processes preferentially, starting with the memory area that has the highest access speed. For example, the shared data of the parallelizable tasks #0 and #1 and tasks #3 and #4, which are set to high priority, is placed in the fastest memory areas in order, starting with the cache L1$. The shared data of task #2 and task #5, which are set to low priority, is placed in the remaining memory after the shared data of the high-priority processes has been placed.

When the CPUs simultaneously execute parallel-executable processes with different priorities, as in the multi-core processor system 100 on the right, the scheduler 110 likewise places the shared data of the high-priority processes in the cache L1$, as in the left system. The scheduler 110 then places the shared data of task #2 and task #3, which are set to low priority, in the remaining memory.

In the multi-core processor system 100 on the left, the scheduler 110 performs normal cache coherency, that is, coherency at the time a new value is written to a cache memory. In the multi-core processor system 100 on the right, by contrast, after a new value has been written to one cache memory (for example, cache L1$0), the scheduler 110 performs cache coherency at the time a CPU issues a read to a cache memory (cache L1$1) in which the write of the new value has not yet been reflected.

In this way, the multi-core processor system 100 according to the present embodiment preferentially places frequently used shared data in the cache memory with the highest access speed, which improves processing speed. For the shared data of processes set to low priority, synchronization by cache coherency is postponed until a CPU issues an access request. This avoids operations that degrade processing performance, such as writing shared data that will not be reused into the cache memory. The detailed configuration and processing procedures of the multi-core processor system 100 according to the present embodiment are described below.

(Hierarchical memory configuration)
FIG. 2 is an explanatory diagram showing an example of a hierarchical memory configuration. As illustrated in FIG. 2, the multi-core processor system 100 according to the present embodiment includes several types of memory areas. Because the memory areas differ in access speed from the processors and in capacity, each stores data suited to its use.

As shown in FIG. 2, four types of memory areas are provided for the processors (CPU #0 and CPU #1) of the multi-core processor system 100: the cache L1$ (the cache memory mounted on each processor), the cache L2$ (the cache memory mounted on the snoop 120), the memory 140, and the file system 150.

The higher a memory area is in the hierarchy, that is, the more closely it is connected to the processors, the faster its access speed and the smaller its capacity. Conversely, the lower a memory area is in the hierarchy, that is, the more distantly it is connected to the processors, the slower its access speed and the larger its capacity. As described with reference to FIG. 1, the multi-core processor system 100 therefore places the shared data used by tasks that should be processed preferentially, and shared data that is used frequently, in the higher memory levels.
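The placement policy described so far (high-priority shared data first, fastest memory first, low-priority data in whatever remains) can be sketched as follows. This is only an illustration under assumed conditions: the function name `place_shared_data`, the capacities, and the data sizes are hypothetical and do not come from the specification.

```python
def place_shared_data(tasks, levels):
    """Assign each task's shared data to a memory level.

    tasks:  list of (name, is_high_priority, size) tuples.
    levels: list of (level_name, capacity) ordered from fastest to slowest.
    """
    free = [capacity for _, capacity in levels]
    placement = {}
    # High-priority tasks are placed first; sorted() is stable, so the
    # original order is kept within each priority class.
    for name, is_high, size in sorted(tasks, key=lambda t: not t[1]):
        for i, (level_name, _) in enumerate(levels):
            if free[i] >= size:
                free[i] -= size
                placement[name] = level_name
                break
        else:
            raise MemoryError(f"no memory level can hold data of {name}")
    return placement


# Hypothetical capacities (in arbitrary units) and task set.
levels = [("L1$", 2), ("L2$", 2), ("memory", 8)]
tasks = [
    ("task#2", False, 1),
    ("task#0", True, 1),
    ("task#1", True, 1),
    ("task#3", True, 1),
]
placement = place_shared_data(tasks, levels)
```

In this example the two L1$ slots go to high-priority tasks, the third high-priority task falls through to L2$, and the low-priority task takes what is left.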

(Multitask processing)
FIG. 3 is an explanatory diagram showing an example of multitask processing. In the multi-core processor system 100 according to the present embodiment, multitask processing means processing in which a plurality of tasks are executed in parallel by a plurality of processors.

In FIG. 3, for example, task #0 through task #5 are prepared as the tasks to be executed by the multi-core processor system 100. Under the control of the scheduler 110, CPU #0 and CPU #1 each execute the tasks dispatched to them. The scheduler 110 has the tasks executed in parallel while switching the task to be executed among the plurality of tasks as appropriate, for example by time slicing.
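A simple way to picture time-sliced dispatch onto several CPUs is a shared ready queue. The sketch below is an assumption made for illustration (a plain round-robin with a fixed slice length); the specification does not state which time-slicing policy the scheduler 110 uses, and `run_time_sliced` is a hypothetical name.

```python
from collections import deque


def run_time_sliced(tasks, num_cpus, slice_len):
    """Dispatch tasks to CPUs in fixed time slices until all finish.

    tasks: dict mapping task name to remaining execution time.
    Returns one list per time slice with the task names that ran in it.
    """
    ready = deque(tasks.items())
    timeline = []
    while ready:
        # Dispatch up to num_cpus tasks from the head of the ready queue.
        running = [ready.popleft() for _ in range(min(num_cpus, len(ready)))]
        timeline.append([name for name, _ in running])
        for name, remaining in running:
            remaining -= slice_len
            if remaining > 0:
                ready.append((name, remaining))  # task switch: requeue
    return timeline


timeline = run_time_sliced(
    {"task#0": 2, "task#1": 1, "task#2": 1}, num_cpus=2, slice_len=1
)
```

Here task #0 needs two slices, so it is switched out after the first slice and resumes in the second one alongside task #2.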

(Cache coherency)
Next, the cache coherency procedures executed by the snoop 120 of the multi-core processor system 100 according to the present embodiment are described. As described with reference to FIG. 1, the coherence scheme of the snoop 120 is set, according to an instruction from the scheduler 110, to either normal cache coherency or cache coherency for low-priority parallel tasks.

<Normal cache coherency (update on write)>
FIGS. 4 to 7 are explanatory diagrams showing the procedure of normal cache coherency. In the multi-core processor system 100 illustrated in FIG. 4, the latest data is stored in the cache memories (cache L1$0 and cache L1$1) of CPU #0 and CPU #1, which execute the parallel tasks, based on the description 400 of the task to be executed.

Suppose that one CPU of the multi-core processor system 100 then rewrites the contents of variable a in the description 400, as shown in FIG. 5. In FIG. 5, for example, CPU #0 rewrites the value of variable a in cache L1$0. The variable a in cache L1$1, which holds the same data, then becomes stale, so the same variable a has different values in the two caches.

In normal cache coherency, the stale value of variable a in cache L1$1 is therefore first purged based on the description 400, as shown in FIG. 6.

Then, as shown in FIG. 7, the value of variable a in cache L1$0 is stored as the value of variable a in cache L1$1 via the bus of the snoop 120. As described above, in normal cache coherency the processing illustrated in FIGS. 4 to 7 keeps cache L1$0 and cache L1$1 consistent.
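The update-on-write procedure of FIGS. 4 to 7 can be modeled in a few lines. The following is a simplified sketch, assuming each L1 cache is a dictionary and counting one bus transfer per propagated write; the class `WriteUpdateSnoop` and its counter are hypothetical and are not the hardware design of the snoop 120.

```python
class WriteUpdateSnoop:
    """Keeps per-CPU L1 caches consistent by updating on every write."""

    def __init__(self, num_cpus):
        self.caches = [{} for _ in range(num_cpus)]
        self.bus_transfers = 0

    def write(self, cpu, var, value):
        self.caches[cpu][var] = value
        for other, cache in enumerate(self.caches):
            if other != cpu and var in cache:
                del cache[var]      # purge the stale copy
                cache[var] = value  # copy the new value over the bus
                self.bus_transfers += 1

    def read(self, cpu, var):
        return self.caches[cpu][var]


snoop = WriteUpdateSnoop(num_cpus=2)
snoop.caches[0]["a"] = 0  # both caches start with the same data
snoop.caches[1]["a"] = 0
for value in (1, 2, 3):
    snoop.write(0, "a", value)  # CPU #0 rewrites variable a three times
```

Every write by CPU #0 triggers a transfer to the other cache, whether or not CPU #1 ever reads the variable.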

<Cache coherency in low-priority parallel tasks (update on read)>
FIG. 8 is an explanatory diagram showing the procedure of cache coherency in low-priority parallel tasks. It shows the coherency procedure used when the multi-core processor system 100 executes parallel tasks set to low priority.

First, in the multi-core processor system 100, CPU #0 and CPU #1 are executing parallel tasks, and the same data is placed in cache L1$0 and cache L1$1 (step S801).

When CPU #0 of the multi-core processor system 100 then rewrites the contents of variable a (step S802), the variable a in cache L1$1 is purged (step S803). Thus, in cache coherency for low-priority parallel tasks, the procedure is the same as in normal cache coherency up to the point where the rewrite of variable a in the cache memory is detected and the stale data is purged.

When CPU #1 of the multi-core processor system 100 subsequently executes a process that accesses variable a, the snoop 120 stores the latest value of variable a held in cache L1$0 into cache L1$1 via the bus (step S804).

As described above, in cache coherency for low-priority parallel tasks, the snoop 120 is controlled and coherence is established when CPU #1 issues an access request for variable a in cache L1$1, where the latest rewrite has not yet been reflected. This avoids the redundant bus transactions that occur in normal cache coherency.

As described above, normal cache coherency starts operating at the time variable a is updated. Cache coherency for low-priority parallel tasks, by contrast, starts operating only when CPU #1 issues a read request for variable a after CPU #0 has updated variable a in cache L1$0. Specifically, the snoop 120 reads the value of variable a from cache L1$0, where the latest variable a is placed, and places the value it read in cache L1$1 as variable a.
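The update-on-read procedure of steps S801 to S804 can be modeled in the same simplified way. In this sketch, a write only purges stale copies and the transfer is deferred until another CPU reads the variable. The class `ReadUpdateSnoop`, the dictionary-based caches, and the transfer counter are assumptions made for illustration.

```python
class ReadUpdateSnoop:
    """Purges stale copies on write and refills them only on read."""

    def __init__(self, num_cpus):
        self.caches = [{} for _ in range(num_cpus)]
        self.bus_transfers = 0

    def write(self, cpu, var, value):
        self.caches[cpu][var] = value  # step S802
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(var, None)   # step S803: purge only, no transfer

    def read(self, cpu, var):
        cache = self.caches[cpu]
        if var not in cache:
            # Step S804: fetch the latest value from a cache that holds it.
            for other in self.caches:
                if var in other:
                    cache[var] = other[var]
                    self.bus_transfers += 1
                    break
            else:
                raise KeyError(var)
        return cache[var]


snoop = ReadUpdateSnoop(num_cpus=2)
snoop.caches[0]["a"] = 0  # step S801: same data in both caches
snoop.caches[1]["a"] = 0
for value in (1, 2, 3):
    snoop.write(0, "a", value)  # three writes, no bus transfer yet
transfers_before_read = snoop.bus_transfers
latest = snoop.read(1, "a")     # the first read triggers one transfer
```

Three consecutive writes cost a single transfer at read time, and none at all if CPU #1 never reads the variable, which is the saving in bus transactions that the text describes.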

In step S804 illustrated in FIG. 8, the data to be accessed by CPU #0 was placed in cache L1$0. Depending on the task being executed, however, data stored in another memory area may be the access target. For example, CPU #0 may access data placed in the cache L2$, the memory 140, or the file system 150. In such a case, the snoop 120 can read the target data from the corresponding memory area and place it in the cache L1$.

The functional configuration and operation of the scheduler 110 of the multi-core processor system 100, which implements the scheduling process according to the present embodiment shown in FIG. 1, are described below.

(Functional configuration of the scheduler)
FIG. 9 is a block diagram showing the functional configuration of the scheduler. In FIG. 9, the multi-core 901 includes n CPUs (Central Processing Units) and governs overall control of the multi-core processor system 100. The multi-core 901 is a processor, or a group of processors, with a plurality of cores. As long as a plurality of cores are present, it may be a single processor containing a plurality of cores or a group of single-core processors arranged in parallel. To simplify the explanation, the present embodiment is described using a group of single-core processors arranged in parallel as an example.

The scheduler 110 includes a determination unit 1001, a first placement unit 1002, a second placement unit 1003, a third placement unit 1004, a specifying unit 1005, an extraction unit 1006, and an allocation unit 1007. Specifically, the functions of the determination unit 1001 through the allocation unit 1007 are realized, for example, by having a specific CPU in the multi-core 901 execute a program stored in the other memory 1008 of the multi-core processor system 100 (a memory other than the cache memories mounted on the CPUs).

The determination unit 1001 has a function of determining whether the priority set for a process to be executed in the multi-core processor system 100 (hereinafter, "execution target process") is equal to or higher than a threshold. Specifically, from the group of processes that are allocated to and executed by the processors (CPU #0 to CPU #n) of the multi-core processor system 100, the determination unit 1001 determines whether the priority of the execution target process allocated to each processor is equal to or higher than the threshold. The determination result is temporarily stored in a storage area such as the other memory 1008.

The priority is set based on operation results obtained by simulating the execution target process. For example, the deadlines of the execution target processes may be compared, and a higher priority may be set for an execution target process with less time remaining until its deadline. Once the scheduler 110 according to the present embodiment has placed the shared data of an execution target process with a high priority in a fast memory (the cache L1$ or the cache L2$), it keeps the data locked until the process ends. An execution target process with a high priority is therefore executed in preference to the other execution target processes.

Alternatively, with reference to the operation results, a higher priority may be set for an execution target process whose shared data placed in the cache memory is updated more often. Because the scheduler 110 according to the present embodiment preferentially places highly reusable shared data in the cache memory (cache L1$) of each processor, the utilization efficiency of the cache memory can be kept high.

The threshold used as the criterion by the determination unit 1001 is adjustable, so an optimal value can be set according to the application to be executed. For each execution target process, the determination unit 1001 treats the process as a high-priority execution target process if its priority is equal to or higher than the threshold, and as a low-priority execution target process if its priority is below the threshold. Any unit, such as a task, a process, or a thread, can be chosen as the unit of an execution target process. In the present embodiment, a task is used as the unit by way of example.
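The threshold comparison performed by the determination unit 1001 can be illustrated with a minimal Python sketch. The function name `classify_by_priority`, the integer priority scale, and the sample task names are assumptions made for illustration; the source does not specify how priorities are encoded.

```python
from typing import Dict


def classify_by_priority(priorities: Dict[str, int], threshold: int) -> Dict[str, str]:
    """Label each execution target process as 'high' or 'low' priority.

    A process is 'high' when its priority is equal to or higher than the
    threshold, and 'low' otherwise (hypothetical integer scale).
    """
    return {
        name: ("high" if priority >= threshold else "low")
        for name, priority in priorities.items()
    }


# Hypothetical priorities for three tasks, with an adjustable threshold of 5.
labels = classify_by_priority({"task_a": 8, "task_b": 5, "task_c": 2}, threshold=5)
```

Changing `threshold` changes which tasks count as high priority, which is the tuning knob the paragraph above describes.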

The first placement unit 1002 has a function of placing data in the cache memory mounted on each CPU according to the determination result of the determination unit 1001. Specifically, the first placement unit 1002 places, in the cache memory of the target CPU, the shared data accessed at execution time by a high-priority execution target process, that is, a process whose priority the determination unit 1001 has determined to be equal to or higher than the threshold.

For example, when task A, a high-priority execution target process, is executed by CPU #1 in the multi-core 901, the first placement unit 1002 places the shared data that task A accesses at execution time in cache memory 1. Similarly, when task B, a high-priority execution target process, is executed by CPU #0 in the multi-core 901, the first placement unit 1002 places the shared data that task B accesses at execution time in cache memory 0.

Depending on the application 1000, the determination unit 1001 may determine that no high-priority execution target process exists among the execution target processes. Leaving the cache memory empty in such a case would lower its utilization efficiency. The first placement unit 1002 therefore places shared data in the cache memory mounted on each CPU even for processes other than high-priority execution target processes (for example, the low-priority execution target processes described later). If a high-priority execution target process subsequently appears, the first placement unit 1002 preferentially places the shared data of that high-priority process in the cache memory of the target CPU.

As described above, when placing the shared data of a high-priority execution target process in the cache memory of the target processor, the first placement unit 1002 can also prohibit overwriting of that shared data (a locked state) until execution of the high-priority execution target process ends. The first placement unit 1002 can thereby prevent the shared data of a high-priority execution target process from being overwritten by data that will not be reused.

The second placement unit 1003 has a function of placing data, according to the determination result of the determination unit 1001, in the other memory 1008, whose access speed is slower than that of the cache memory of each processor. Specifically, the second placement unit 1003 places in the other memory 1008 the shared data accessed at execution time by a low-priority execution target process, that is, a process whose priority the determination unit 1001 has determined to be below the threshold.

As described with reference to FIG. 2, the other memory 1008 consists of multiple types of memory arranged hierarchically by access speed and capacity. The second placement unit 1003 therefore stores data in order starting from the fastest memory, up to the amount each memory can hold. For example, in the case of FIG. 9, data is placed in the order cache L2$ → memory 140 → file system 150. In addition, data identified by prior simulation as having a high update frequency is placed preferentially in the faster memory.
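The tiered placement performed by the second placement unit 1003 can be sketched as follows. This is only an illustration under stated assumptions: the function `place_in_tiers`, the tuple layouts, and the sample sizes are hypothetical, and the sketch uses a simple first-fit rule (an item that does not fit in a faster tier falls through to the next tier), which the source does not spell out.

```python
from typing import Dict, List, Tuple


def place_in_tiers(
    items: List[Tuple[str, int, int]],  # (name, size, update_count), hypothetical layout
    tiers: List[Tuple[str, int]],       # (tier_name, free_capacity), fastest tier first
) -> Dict[str, str]:
    """Place data with the highest update count first, into the fastest tier with room."""
    free = [capacity for _, capacity in tiers]
    placement: Dict[str, str] = {}
    # Frequently updated data gets the first chance at the fastest memory.
    for name, size, _count in sorted(items, key=lambda item: item[2], reverse=True):
        for index, (tier_name, _) in enumerate(tiers):
            if size <= free[index]:
                free[index] -= size
                placement[name] = tier_name
                break
    return placement


# Tier order follows the example in the text: cache L2$ -> memory 140 -> file system 150.
tiers = [("L2", 4), ("memory_140", 4), ("file_system_150", 100)]
items = [("d1", 4, 10), ("d2", 4, 50), ("d3", 4, 1)]
placement = place_in_tiers(items, tiers)
```

Items that fit in no tier are simply left out of the returned mapping in this sketch.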

The third placement unit 1004 has a function of placing shared data for which an access request has been issued from the multi-core 901 in the cache memory mounted on the requesting CPU. Specifically, when a request to access shared data placed in the memory 1008 occurs in any CPU of the multi-core 901 (for example, CPU #1), the third placement unit 1004 places that shared data in cache memory 1 of CPU #1.

Once the determination unit 1001 has determined whether the priority of an execution target process is equal to or higher than the threshold, the identification unit 1005 identifies the capacity of the rewritable area in the cache memory of each CPU of the multi-core 901. A rewritable area is an area that may be overwritten.

Accordingly, an area holding the shared data of a completed process and an area holding the shared data of a low-priority process can both be overwritten and are identified as rewritable areas. The identification result of the identification unit 1005 is temporarily stored in a storage area such as the other memory 1008.
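The notion of a rewritable area can be made concrete with a small sketch. The `Region` record and its fields are assumptions introduced for illustration; the source only states that areas holding data of completed processes or of low-priority processes count as rewritable.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Region:
    """Hypothetical description of one cache area."""
    size: int
    owner_priority: str   # "high", "low", or "none" for an empty area
    owner_finished: bool  # True once the owning process has completed


def rewritable_capacity(regions: List[Region]) -> int:
    """Sum the sizes of areas that may be overwritten.

    An area is treated as rewritable unless it holds shared data of a
    high-priority process that is still running.
    """
    return sum(
        region.size
        for region in regions
        if region.owner_priority != "high" or region.owner_finished
    )


regions = [
    Region(size=16, owner_priority="high", owner_finished=False),  # locked
    Region(size=8, owner_priority="high", owner_finished=True),    # completed process
    Region(size=4, owner_priority="low", owner_finished=False),    # low priority
    Region(size=4, owner_priority="none", owner_finished=False),   # empty
]
capacity = rewritable_capacity(regions)
```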

The first placement unit 1002 can also adjust the placement process according to the capacity of the rewritable area identified by the identification unit 1005. For example, if the capacity of the rewritable area is smaller than the size of the shared data accessed at execution time by a high-priority execution target process, the first placement unit 1002 cannot place all of the shared data in the cache memory. In that case, the first placement unit 1002 places as much of the shared data as fits in the cache memory, in descending order of update frequency. The second placement unit 1003 then places the shared data that could not be placed in the cache memory in the other memory 1008.

Conversely, the capacity of the rewritable area may be larger than the size of the shared data accessed at execution time by the high-priority execution target process. In that case, the first placement unit 1002 first places the shared data accessed by the high-priority execution target process in the cache memory as usual. The first placement unit 1002 then fills the remaining free space in the cache memory with shared data accessed by low-priority execution target processes, in descending order of update frequency.

The extraction unit 1006 has a function of extracting, from the execution target processes included in the application 1000, processes that satisfy a specific condition. Specifically, the extraction unit 1006 extracts execution target processes that access common data at execution time (for example, parallel tasks). Whether processes access common data at execution time is determined by referring to the shared data identifier set for each execution target process (for example, the shared data ID described later with reference to FIG. 13). The extraction result of the extraction unit 1006 is temporarily stored in a storage area such as the memory 1008.

The allocation unit 1007 has a function of allocating execution target processes to the CPUs of the multi-core 901. Absent an instruction from the scheduler 110, the allocation unit 1007 allocates each execution target process to the most suitable CPU based on the preset dependencies and execution order and on the current processing load of each CPU.

When processes have been extracted by the extraction unit 1006, the allocation unit 1007 allocates the processes extracted as sharing common data to the same CPU in the multi-core 901. The allocation unit 1007 can also allocate, among the processes extracted by the extraction unit 1006, those with the same priority to the same CPU in the multi-core 901 (for example, CPU #1).
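The combined behavior of the extraction unit 1006 and the allocation unit 1007 can be sketched as grouping tasks by shared data ID and sending each group to one CPU. The function name, the one-ID-per-task simplification, and the use of task count as the load measure are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def assign_by_shared_data(task_shared_ids: Dict[str, str], cpu_load: List[int]) -> Dict[str, int]:
    """Assign tasks that share a data ID to the same CPU (the least-loaded one).

    Simplification: each task has exactly one shared data ID, and load is
    counted in number of tasks.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for task, shared_id in task_shared_ids.items():
        groups[shared_id].append(task)

    load = list(cpu_load)  # work on a copy so the caller's list is untouched
    assignment: Dict[str, int] = {}
    for tasks in groups.values():
        cpu = min(range(len(load)), key=load.__getitem__)  # least-loaded CPU index
        for task in tasks:
            assignment[task] = cpu
        load[cpu] += len(tasks)
    return assignment


# Hypothetical input: t1 and t2 share data D1, t3 uses D2; CPU #0 already has load 2.
assignment = assign_by_shared_data({"t1": "D1", "t2": "D1", "t3": "D2"}, cpu_load=[2, 0])
```

Keeping tasks with common shared data on one CPU means that data only has to sit in one cache, which is the motivation given in the text.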

The following describes a case in which, as an example of execution target processes, the multi-core processor system 100 executes the parallel tasks constituting the application 1000 in parallel on the CPUs.

(Shared data placement process)
FIG. 10 is a flowchart showing the procedure of the shared data placement process. The flowchart of FIG. 10 shows the procedure for deciding in which cache memory (cache L1$ or cache L2$) shared data is to be placed. By executing the steps of FIG. 10, the shared data used when executing each task can be placed in an appropriate cache memory corresponding to the cache coherency processing.

In FIG. 10, tasks to be executed are input to the scheduler 110 one after another. The scheduler 110 first determines whether the task to be executed is a high-priority task (step S1001). If the task is determined to be a high-priority task (step S1001: Yes), the scheduler 110 determines whether the total shared data size of the task is smaller than the cache L1$ size (step S1002).

If it is determined in step S1002 that the total shared data size is smaller than the cache L1$ size (step S1002: Yes), the scheduler 110 places all of the shared data in the cache L1$ (step S1003) and ends the series of processes. That is, in step S1003, if the task to be executed is a high-priority task and all of its shared data can be stored in the cache memory of the CPU, the scheduler 110 places all of the shared data in the fast cache L1$.

If it is determined in step S1002 that the total shared data size is not smaller than the cache L1$ size (step S1002: No), the scheduler 110 cannot place all of the shared data in the cache L1$. The scheduler 110 therefore places the shared data of the task in the caches L1$ and L2$ in descending order of update frequency (step S1004). That is, in step S1004, the scheduler 110 places shared data in the cache L1$ starting from the most frequently updated data and, once the cache L1$ is full, places the remaining shared data in the cache L2$, again starting from the most frequently updated data.

Steps S1002 to S1004 described above are the procedure for placing the shared data of a high-priority task. For tasks other than high-priority tasks (low-priority tasks), only frequently updated shared data is placed, and it is placed in the free area of the cache L1$.

If it is determined in step S1001 that the task to be executed is not a high-priority task (step S1001: No), the scheduler 110 performs the placement process only for the frequently updated portion of the shared data. First, the scheduler 110 determines whether the total size of the frequently updated shared data of the task is smaller than the unlocked cache L1$ size (step S1005). The unlocked cache L1$ size is the capacity of the cache L1$ excluding the locked areas in which shared data of other execution target tasks has already been placed.

If it is determined in step S1005 that the total size of the frequently updated shared data is smaller than the unlocked cache L1$ size (step S1005: Yes), the scheduler 110 determines that all of the frequently updated shared data can be placed in the cache L1$. The scheduler 110 therefore places the frequently updated shared data in the cache L1$ (step S1006) and ends the series of processes.

If, on the other hand, the total size of the frequently updated shared data is determined not to be smaller than the unlocked cache L1$ size (step S1005: No), the scheduler 110 cannot place all of the frequently updated shared data in the cache L1$. The scheduler 110 therefore places the frequently updated shared data of the task in the caches L1$ and L2$ in descending order of update frequency (step S1007). That is, as in step S1004, the scheduler 110 places shared data in the cache L1$ starting from the most frequently updated data. Once the cache L1$ is full, the scheduler 110 places the remaining shared data in the cache L2$, again starting from the most frequently updated data.

As described above, the scheduler 110 can efficiently place the shared data of a low-priority task in memory areas where no shared data of a high-priority task is placed. Even when it is placed in a fast memory area (for example, the cache L1$), the shared data of a low-priority task is not locked, unlike the shared data of a high-priority task, so it does not get in the way of high-priority task processing.
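The branching of FIG. 10 (steps S1001 to S1007) can be summarized in a short sketch. Several details are assumptions made for illustration and are not stated in the source: the function names, the `hot_threshold` used to decide which data counts as frequently updated, the treatment of the cache L2$ as unbounded, and the use of the full L1$ size as the available capacity in step S1004.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, int, int]  # (name, size, update_count), hypothetical layout


def fill_l1_then_l2(items: List[Item], l1_free: int) -> Dict[str, str]:
    """Place the most frequently updated data in L1$ until it is full, the rest in L2$."""
    placement: Dict[str, str] = {}
    l1_full = False
    for name, size, _count in sorted(items, key=lambda item: item[2], reverse=True):
        if not l1_full and size <= l1_free:
            placement[name] = "L1"
            l1_free -= size
        else:
            l1_full = True  # once L1$ runs out, everything else goes to L2$
            placement[name] = "L2"
    return placement


def place_shared_data(items: List[Item], high_priority: bool, l1_size: int,
                      l1_unlocked: int, hot_threshold: int) -> Dict[str, str]:
    if high_priority:                                        # S1001: Yes
        if sum(size for _, size, _ in items) < l1_size:      # S1002
            return {name: "L1" for name, _, _ in items}      # S1003
        return fill_l1_then_l2(items, l1_size)               # S1004
    # S1001: No -> only frequently updated data is considered.
    hot = [item for item in items if item[2] >= hot_threshold]
    if sum(size for _, size, _ in hot) < l1_unlocked:        # S1005
        return {name: "L1" for name, _, _ in hot}            # S1006
    return fill_l1_then_l2(hot, l1_unlocked)                 # S1007


items = [("a", 8, 30), ("b", 8, 5), ("c", 8, 20)]
fits = place_shared_data(items, True, l1_size=32, l1_unlocked=32, hot_threshold=10)
spills = place_shared_data(items, True, l1_size=16, l1_unlocked=16, hot_threshold=10)
low = place_shared_data(items, False, l1_size=32, l1_unlocked=8, hot_threshold=10)
```

In the low-priority call, item `b` is below the assumed `hot_threshold` and is therefore not placed in either cache by this procedure.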

(Task table creation process)
FIG. 11 is a flowchart showing the procedure of the task table creation process. The flowchart of FIG. 11 shows the procedure for simulating the tasks that constitute an application to be executed by the multi-core processor system 100 and creating, based on the simulation results, a task table 111 representing the priority of each task. By executing the steps of FIG. 11, the scheduler 110 can create the task table 111 it needs in order to place the shared data of each task appropriately.

In FIG. 11, the scheduler 110 first analyzes the size of each piece of data in each task to be executed (step S1101). Next, the scheduler 110 performs deadline analysis of each task (step S1102). The scheduler 110 then performs data dependency analysis between tasks (step S1103). Through steps S1101 to S1103, the scheduler 110 obtains the data needed to identify the structure of each task. The data obtained in steps S1101 to S1103 is stored in the task table 111 and used in the simulation, described later, for setting priorities.

Next, the scheduler 110 determines whether any parallel tasks remain that have not yet been simulated (step S1104). If it is determined in step S1104 that unsimulated parallel tasks exist (step S1104: Yes), the scheduler 110 simulates one set of the unsimulated parallel tasks (step S1105).

The scheduler 110 then measures the update frequency of data having dependencies (step S1106) and determines whether that update frequency is greater than a threshold (step S1107). Step S1107 determines whether a priority needs to be set.

If the update frequency of the data having dependencies is greater than the threshold in step S1107 (step S1107: Yes), the scheduler 110 sets a priority based on the deadline stored in the task table 111 (step S1108). If the update frequency is not greater than the threshold (step S1107: No), the data would be updated infrequently even if stored in the cache, so the scheduler 110 proceeds to step S1109 without setting a priority.

Next, the scheduler 110 marks the parallel tasks being processed as simulated (step S1109), returns to step S1104, and determines whether any unsimulated parallel tasks remain.

As long as it is determined in step S1104 that unsimulated parallel tasks exist, the scheduler 110 repeats the simulation through steps S1105 to S1109 and sets the priorities of the parallel tasks. When it is determined in step S1104 that no unsimulated parallel tasks remain (step S1104: No), all parallel tasks have been simulated, so the scheduler 110 ends the series of processes.

As described above, the scheduler 110 can create the task table 111 by executing the steps of FIG. 11. Although the scheduler 110 performs the task table creation process in the description above, the process may instead be performed in advance by a separate compiler or simulator.
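The loop of steps S1104 to S1109 can be sketched as below. The simulator is replaced by a caller-supplied function, and the rule in step S1108 (a deadline at or below `deadline_threshold` yields high priority) is an assumption; the source says only that the priority is set based on the deadline.

```python
from typing import Callable, Dict, List


def build_priorities(
    parallel_tasks: List[str],
    deadlines: Dict[str, int],
    measure_update_frequency: Callable[[str], int],  # stand-in for the simulation
    freq_threshold: int,
    deadline_threshold: int,  # hypothetical rule for turning a deadline into a priority
) -> Dict[str, str]:
    """Return priorities for tasks whose dependent data is updated frequently."""
    priorities: Dict[str, str] = {}
    pending = list(parallel_tasks)
    while pending:                                    # S1104: unsimulated tasks remain
        task = pending.pop(0)                         # S1105: simulate one task
        frequency = measure_update_frequency(task)    # S1106
        if frequency > freq_threshold:                # S1107: Yes
            short_deadline = deadlines[task] <= deadline_threshold
            priorities[task] = "high" if short_deadline else "low"  # S1108
        # S1109: the task counts as simulated once it has left `pending`.
    return priorities


# Hypothetical measurements standing in for simulator output.
measured = {"t1": 50, "t2": 50, "t3": 1}
result = build_priorities(
    ["t1", "t2", "t3"],
    deadlines={"t1": 10, "t2": 100, "t3": 10},
    measure_update_frequency=measured.__getitem__,
    freq_threshold=10,
    deadline_threshold=20,
)
```

Task `t3` receives no priority because its measured update frequency does not exceed the threshold, matching the step S1107: No branch.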

For example, the analysis in steps S1101 to S1103 can be performed by a general-purpose compiler. The simulation in step S1105, which uses the analysis results of steps S1101 to S1103, can likewise be performed by a known simulator that estimates the execution time and the number of updates when each task is executed (see, for example, Japanese Laid-Open Patent Publication No. 2000-276381).

FIG. 12 is a data table showing an example of the data structure of the task table. FIG. 13 is a data table showing an example of task table settings. The data table 1200 of FIG. 12 shows an example of the data structure of the task table 111 created by the task table creation process described with reference to FIG. 11.

As shown in the data table 1200 of FIG. 12, the task table 111 consists of the following group of fields representing task information and the following group of fields representing shared data information. Fields whose values are blank, such as task name, task ID, and deadline, receive a different value for each task. Fields that take one of two values (shown as ○/×), such as priority and coherence mode, receive one of those two values.

<Task information>
・Task name: (name of the task)
・Task ID: (identifier of the task)
・Deadline: (analysis result of step S1102)
・Priority: high / low (value set in step S1108)
・Coherence mode: update on write / update on read
・Fork to other CPUs: permitted / not permitted

<Shared data information>
・Shared data name: (name of the data)
・Shared data ID: (ID of the data)
・Update count: (measurement result of step S1106)
・Cache level for placement: L1 (cache L1$) / L2 (cache L2$)
・Data size: (analysis result of step S1101)

Of the information above, the coherence mode, fork to other CPUs, and the cache level for placement are determined when the task is executed. Specifically, the coherence mode and fork to other CPUs are determined by the task execution process described later with reference to FIGS. 14 to 17. The cache level for placement is determined by the shared data placement process described above with reference to FIG. 10. FIG. 13 shows an example of the data table 1200 with specific values set in the task table 111.
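The task information and shared data information fields could be modeled as two record types, with the fields that are decided at execution time left unset initially. The class names, field names, value encodings, and the sample entry are assumptions for illustration and do not reproduce the values of FIG. 13.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SharedDataInfo:
    name: str
    data_id: str
    update_count: int                   # measured in step S1106
    size: int                           # analyzed in step S1101
    cache_level: Optional[str] = None   # "L1" or "L2"; set by the placement process


@dataclass
class TaskEntry:
    name: str
    task_id: str
    deadline: int                              # analyzed in step S1102
    priority: str                              # "high" or "low"; set in step S1108
    coherence_mode: Optional[str] = None       # "update_on_write" / "update_on_read"
    fork_to_other_cpus: Optional[bool] = None  # decided during task execution
    shared_data: List[SharedDataInfo] = field(default_factory=list)


# Hypothetical entry: execution-time fields stay None until they are decided.
entry = TaskEntry(name="decode", task_id="T1", deadline=100, priority="high")
entry.shared_data.append(SharedDataInfo("frame_buffer", "D1", update_count=40, size=8))
entry.shared_data[0].cache_level = "L1"  # as if set by the placement process of FIG. 10
```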

(Task execution process)
FIGS. 14 to 17 are flowcharts showing the procedure of the task execution process. The flowcharts of FIGS. 14 to 17 show the procedure by which the scheduler 110 causes the processors to execute the parallel tasks to be executed. By executing the steps of FIGS. 14 to 17, each parallel task is executed using a coherence method that depends on the priority set in the task table 111 and on the priorities of the other parallel tasks currently being executed.

In FIG. 14, the scheduler 110 first determines whether a state transition has occurred in a task to be executed (step S1401). A state transition in step S1401 means "task generation", "task end", or "task switch". If it is determined in step S1401 that a state transition has occurred, the scheduler 110 further determines which of these three types it is.

In step S1401, the scheduler 110 waits until a state transition occurs (step S1401: No loop). If it is determined in step S1401 that the state transition is a task generation (step S1401: Yes, task generation), the scheduler 110 determines whether the task to be executed is a parallel task (step S1402).

If it is determined in step S1402 that the task to be executed is a parallel task (step S1402: Yes), the scheduler 110 determines whether the newly generated parallel task is a Master thread (step S1403). A Master thread is a thread that is executed preferentially.

If it is determined in step S1403 that the newly generated parallel task is a Master thread (step S1403: Yes), the scheduler 110 further determines whether the newly generated parallel task is a high-priority task (step S1404). Whether the task is a high-priority task in step S1404 can be determined by referring to the task table 111.

If it is determined in step S1404 that the newly generated parallel task is a high-priority task (step S1404: Yes), the scheduler 110 further determines whether a CPU is currently executing a high-priority task (step S1405).

If it is determined in step S1405 that a high-priority task is being executed (step S1405: Yes), the scheduler 110 prepares to move the execution target task into execution. That is, the scheduler 110 migrates the parallel task currently being executed to the least-loaded CPU among the CPUs executing parallel threads, and prohibits forking of new threads to other CPUs (generation of copies of new threads) during execution (step S1406).

The scheduler 110 then locks the cache area in which the shared data of the task migrated in step S1406 is placed (step S1407). The scheduler 110 executes the migrated task sequentially (step S1408), prohibits the newly generated parallel task from forking threads to other CPUs, and assigns it to the least-loaded CPU (step S1409).

After that, the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed and starts executing the task (step S1410). When step S1410 ends, the scheduler 110 returns to step S1401 and waits until a new state transition occurs.

If it is determined in step S1403 that the newly generated parallel task is not a Master thread (step S1403: No), the scheduler 110 determines whether forking of threads is prohibited (step S1411). The thread examined in step S1411 is a thread that constitutes the newly generated task.

If it is determined in step S1411 that forking of the thread of the newly generated task is prohibited (step S1411: Yes), the scheduler 110 queues the newly generated task to the same CPU as the one on which the Master thread is executed (step S1412). The task queued in step S1412 is executed by the destination CPU after the currently executing task ends. When the processing of step S1412 ends, the scheduler 110 returns to step S1401 and waits until a new state transition occurs.

If it is determined that the newly generated task is not a parallel task (step S1402: No), or that forking of the thread is not prohibited (step S1411: No), the scheduler 110 queues the task to the CPU with the smallest load (step S1413). The task queued in step S1413 is the task determined in step S1401 to have been newly generated. When the processing of step S1413 ends, the scheduler 110 returns to step S1401 and waits until a new state transition occurs.
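The queueing branches described above (steps S1402, S1403, S1411, S1412, and S1413) can be sketched roughly as follows. This is only an illustrative model, not the patented implementation: the `Task` and `Cpu` classes, the `cost` field used to measure load, and the convention that a CPU's list index equals its ID are all assumptions made for the example. The Master-thread branch is assumed to continue to the priority checks and is not modelled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    cost: int = 1                      # hypothetical amount of queued work
    parallel: bool = False             # outcome of the S1402 check
    master: bool = False               # outcome of the S1403 check
    fork_prohibited: bool = False      # outcome of the S1411 check
    master_cpu: Optional[int] = None   # index of the CPU running the Master thread


@dataclass
class Cpu:
    cpu_id: int
    queue: List[Task] = field(default_factory=list)

    def load(self) -> int:
        # "Smallest load" is modelled as the smallest amount of queued work.
        return sum(t.cost for t in self.queue)


def dispatch_new_task(task: Task, cpus: List[Cpu]) -> str:
    """Queue a newly generated task and return the step that handled it."""
    if task.parallel and task.master:
        # Assumption: Master threads go on to the priority checks
        # (S1404 onward), which this sketch does not model.
        return "S1404"
    if task.parallel and task.fork_prohibited:
        # S1411: Yes -> same CPU as the Master thread (S1412).
        # Assumes cpus is indexed by CPU ID.
        cpus[task.master_cpu].queue.append(task)
        return "S1412"
    # S1402: No, or S1411: No -> CPU with the smallest load (S1413).
    min(cpus, key=Cpu.load).queue.append(task)
    return "S1413"
```

Returning the step label keeps the sketch easy to compare against the flowchart; a real scheduler would act on the run queues directly.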

The flowchart of FIG. 15 shows the processing of the scheduler 110 when it is determined in step S1401 that a task end has occurred (step S1401: Yes, task end) and when it is determined that a task switch has occurred (step S1401: Yes, task switch).

In FIG. 15, when it is determined in step S1401 that a task end has occurred (step S1401: Yes, task end), the scheduler 110 first releases the locked cache area in which the shared data of the parallel task was placed (step S1501).

Thereafter, the scheduler 110 determines whether there is a task waiting for execution (step S1502). If there is (step S1502: Yes), the scheduler 110 proceeds to step S1503 and performs the processing for executing the waiting task. If there is not (step S1502: No), the scheduler 110 returns to step S1401 of FIG. 14 and waits until the next state transition occurs.

On the other hand, if it is determined in step S1401 that a task switch has occurred (step S1401: Yes, task switch), the scheduler 110 determines whether the task receiving the execution right is a low-priority parallel task (step S1503). The scheduler 110 also performs the determination of step S1503 when it is determined in step S1502 that there is a task waiting for execution (step S1502: Yes).

If it is determined in step S1503 that the task receiving the execution right is a low-priority parallel task (step S1503: Yes), the scheduler 110 adopts the cache coherence method used when executing low-priority parallel tasks. That is, the scheduler 110 sets the cache coherence method of the CPU to a mode in which the snoop mechanism operates when another CPU accesses the data (step S1504).

If it is determined in step S1503 that the task receiving the execution right is not a low-priority parallel task (step S1503: No), or when the processing of step S1504 ends, the scheduler 110 starts executing the execution target task (step S1505). Once the task has been started in step S1505, the scheduler 110 returns to step S1401 and waits until the next task state transition occurs.
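The FIG. 15 handling of task end and task switch events can be summarized with a small sketch. The dictionary layout (`locked`, `coherence`, `running`), the event strings, and the mode name `snoop_on_access` are hypothetical names chosen for illustration; the source only specifies the order of steps S1501 to S1505.

```python
def handle_transition(event, cpu, finished=None, next_task=None):
    """Return the FIG. 15 steps taken for one state transition.

    cpu is a dict with the keys "locked" (set of task names whose
    shared-data cache area is locked), "coherence" and "running".
    A task switch is assumed to always supply next_task.
    """
    steps = []
    if event == "task_end":
        cpu["locked"].discard(finished)        # S1501: release the locked area
        steps += ["S1501", "S1502"]
        if next_task is None:                  # S1502: No -> wait at S1401
            return steps
    elif event != "task_switch":
        raise ValueError(f"unknown event: {event}")
    steps.append("S1503")
    if next_task["parallel"] and not next_task["high_priority"]:
        # S1503: Yes -> snoop only when another CPU accesses the data.
        cpu["coherence"] = "snoop_on_access"   # S1504
        steps.append("S1504")
    cpu["running"] = next_task["name"]         # S1505: start execution
    steps.append("S1505")
    return steps
```

For example, a task switch to a low-priority parallel task yields `["S1503", "S1504", "S1505"]`, while a task end with nothing waiting stops after `["S1501", "S1502"]`.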

The flowchart of FIG. 16 shows the processing of the scheduler 110 when it is determined in step S1404 that the newly generated parallel task is not a high-priority task (step S1404: No).

In FIG. 16, when it is determined in step S1404 that the newly generated parallel task is not a high-priority task (step S1404: No), the scheduler 110 first determines whether a high-priority task is being executed (step S1601). Specifically, step S1601 determines whether a high-priority task is currently being executed on the CPU that is to execute the newly generated task.

If it is determined in step S1601 that a high-priority task is being executed (step S1601: Yes), the scheduler 110 adopts the cache coherence method used when executing low-priority parallel tasks. That is, the scheduler 110 sets the cache coherence method of the parallel task being executed to a mode in which the snoop mechanism of the snoop 120 operates when another CPU accesses the data (step S1602).

Thereafter, the scheduler 110 queues the execution target task to the CPU with the smallest load (step S1603) and proceeds to step S1401. The task queued in step S1603 is executed after the currently executing task ends. The CPU with the smallest load means the CPU whose already-queued tasks amount to the least processing. After moving to step S1401, the scheduler 110 waits until the next state transition occurs.

If it is determined in step S1601 that no high-priority task is being executed (step S1601: No), the scheduler 110 adopts the cache coherence method used when executing high-priority parallel tasks. That is, the scheduler 110 migrates the parallel task being executed to the CPU with the smallest load among the other CPUs executing parallel threads included in that parallel task, and prohibits new threads included in the parallel task from being forked to other CPUs during execution (step S1604).

Further, the scheduler 110 causes the task migrated in step S1604 to be executed sequentially (step S1605). Then, for the newly generated parallel task, the scheduler 110 prohibits the threads included in the parallel task from forking to other CPUs and queues the task to the CPU with the smallest load (step S1606).

The task queued in step S1606 is executed after the currently executing task ends. When step S1606 ends, the scheduler 110 proceeds to step S1401 and waits until a new state transition occurs.

The flowchart of FIG. 17 shows the processing of the scheduler 110 when it is determined in step S1405, for the newly generated parallel task, that no high-priority task is being executed (step S1405: No).

In FIG. 17, when it is determined in step S1405 that the target CPU is not executing a high-priority task (step S1405: No), the scheduler 110 first assigns the newly generated task to the CPU with the smallest load (step S1701).

The scheduler 110 then determines whether the newly generated parallel task would fail to satisfy its deadline constraint under sequential execution (step S1702). In step S1702, the scheduler 110 makes this determination based on the deadline constraint set in the task table 111.

If it is determined in step S1702 that the deadline constraint would not be satisfied (step S1702: Yes), the scheduler 110 further determines whether a low-priority parallel task is currently being executed (step S1703).

If it is determined in step S1703 that a low-priority parallel task is being executed (step S1703: Yes), the scheduler 110 adopts the cache coherence method used when executing low-priority parallel tasks. That is, the scheduler 110 sets the coherence method of the parallel task being executed to a mode in which the snoop mechanism operates when another CPU accesses the data (step S1704).

When the processing of step S1704 ends, the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed (step S1705). If it is determined in step S1703 that no low-priority parallel task is being executed (step S1703: No), the scheduler 110 adopts the normal coherence method and therefore proceeds to step S1705 without performing step S1704.

When the processing of step S1705 ends, the scheduler 110 starts execution of the newly generated parallel task (step S1706), returns to step S1401, and waits until the next task state transition occurs.

On the other hand, if it is determined in step S1702 that the deadline constraint is satisfied (step S1702: No), the scheduler 110 locks the cache area in which the shared data of the newly generated parallel task is placed (step S1707).

The scheduler 110 then starts sequential execution of the newly generated parallel task (step S1708). Thereafter, the scheduler 110 returns to step S1401 and waits until the next task state transition occurs.
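The FIG. 17 branch structure (steps S1701 to S1708) can be traced with a short sketch. How the scheduler evaluates the deadline constraint from the task table 111 is not detailed here, so comparing a hypothetical `sequential_time` estimate against a `deadline` value is purely an assumption for illustration.

```python
def plan_new_high_priority_task(sequential_time, deadline,
                                low_priority_parallel_running):
    """Return (execution mode, list of FIG. 17 steps taken).

    sequential_time and deadline are hypothetical values in the same
    time unit; the source only says the check uses the deadline
    constraint stored in the task table.
    """
    steps = ["S1701", "S1702"]                 # assign to least-loaded CPU, check deadline
    if sequential_time > deadline:             # S1702: Yes (deadline would be missed)
        steps.append("S1703")
        if low_priority_parallel_running:      # S1703: Yes
            steps.append("S1704")              # switch to snoop-on-access coherence
        steps += ["S1705", "S1706"]            # lock cache area, start parallel execution
        return "parallel", steps
    steps += ["S1707", "S1708"]                # lock cache area, start sequential execution
    return "sequential", steps
```

In both branches the shared-data cache area is locked before the task starts; only the execution mode and the optional coherence switch differ.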

As described above, the scheduler 110 can schedule each task identified as a parallel task so that it is executed by the optimal CPU, according to the priority (high or low) set for each task and to whether the parallel tasks share the same priority. In addition, because the scheduler 110 sets the cache coherence method for shared data according to the priority of each task, a decrease in the utilization efficiency of the cache memory (cache L1$) can be prevented.

(Application example)
Next, an operation example in which the scheduling processing according to the present embodiment is applied to communication devices will be described. Specifically, the parallel tasks executed by a portable communication device such as a smartphone and by a stationary communication device such as a server will be described.

<Parallel tasks with the same priority>
FIG. 18 is an explanatory diagram showing an execution example of parallel tasks with the same priority. In FIG. 18, the smartphone 1801 communicates with another smartphone 1802 in conformance with the WLAN (Wireless LAN) standard. The smartphone 1801 also communicates with the server 1803 in conformance with the LTE (Long Term Evolution) standard.

The tasks that follow the WLAN standard (WLAN #0, 1) and the tasks that follow the LTE standard (LTE #0, 1) both have real-time constraints and are therefore high-priority tasks. Accordingly, the smartphone 1801 executes WLAN #0, 1 and LTE #0, 1 as parallel tasks with the same priority. Because parallel tasks with the same priority are executed, the snoop 120 of the smartphone 1801 adopts the snoop method that performs normal cache coherency.

<Parallel tasks with different priorities>
FIG. 19 is an explanatory diagram showing an execution example of parallel tasks with different priorities. In FIG. 19, the smartphone 1801 communicates with the server 1803 in conformance with the LTE standard. The smartphone 1801 is also executing tasks (driver #0, 1) of a driver application that does not require communication.

The driver application executed by the smartphone 1801 has no real-time constraint and is therefore a low-priority task. Accordingly, the smartphone 1801 executes LTE #0, 1 as high-priority parallel tasks and driver #0, 1 as low-priority parallel tasks. Because parallel tasks with different priorities are executed, the snoop 120 of the smartphone 1801 adopts, with respect to LTE #0, 1, the snoop method that performs cache coherency for low-priority parallel tasks.

As described above, according to the multi-core processor system and the control method and control program of the multi-core processor system, frequently used shared data is preferentially placed in the cache memory, which has a fast access speed, so that processing speed can be improved.

For shared data of processing set to low priority, synchronization by cache coherency is deferred until an access request from a CPU occurs. This avoids processing that degrades the performance of the multi-core processor system, such as writing shared data with no reusability into the cache memory. Therefore, even when parallel processing and multitask processing are executed, the utilization efficiency of the cache can be raised and the processing capability of the multi-core processor system improved.
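The effect of deferring synchronization can be illustrated with a toy model of one shared value cached by several CPUs. This is a deliberately simplified sketch, not the snoop 120 hardware: the `SharedLine` class, its `lazy` flag, and the `sync_count` counter are hypothetical names, and real coherence protocols track far more state.

```python
class SharedLine:
    """One shared value with a private cached copy per CPU (toy model)."""

    def __init__(self, n_cpus, lazy):
        self.copies = [0] * n_cpus
        self.lazy = lazy          # True: synchronize only when another CPU accesses
        self.pending = None       # CPU holding a value not yet propagated
        self.sync_count = 0       # number of synchronization operations performed

    def _propagate(self, src):
        # Copy the newest value into every CPU's cache.
        self.copies = [self.copies[src]] * len(self.copies)
        self.pending = None
        self.sync_count += 1

    def write(self, cpu, value):
        self.copies[cpu] = value
        if self.lazy:
            self.pending = cpu    # defer the synchronization
        else:
            self._propagate(cpu)  # normal mode: synchronize on every update

    def read(self, cpu):
        if self.pending is not None and self.pending != cpu:
            self._propagate(self.pending)  # access request triggers the sync
        return self.copies[cpu]
```

With three consecutive writes from one CPU followed by a read from another, the deferred mode performs a single synchronization where the normal mode performs three, which is the saving the paragraph above describes for data that other CPUs rarely reuse.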

When there is no high-priority task and the cache memory has free space, the shared data of low-priority tasks may be placed in the cache memory of each CPU. Therefore, the cache memory can be used efficiently even when no high-priority task exists.

Further, the shared data placed in the cache memory and accessed during execution of a high-priority task may be set to remain locked until the high-priority task ends. Locking the shared data prevents it from being overwritten by the shared data of other tasks even when a task switch occurs, so the high-priority task can be executed efficiently.

When the shared data that a high-priority task accesses during execution is larger than the capacity of the cache memory and cannot be placed in it entirely, the shared data may be placed in a memory area with a fast access speed among the memory areas other than the cache memory. If multiple memory areas exist when the shared data is placed, the shared data is placed in order starting from the memory with the fastest access speed. Because the shared data of the high-priority task is thus preferentially placed in fast memory areas, efficient processing can be expected.
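The fastest-first placement rule can be sketched as a greedy fill. The function name, the `(name, access_time, free_bytes)` tuple layout, and the choice to split the data across areas at byte granularity are assumptions for illustration; the source only states that areas are used in order of access speed.

```python
def place_shared_data(size, regions):
    """Greedily place `size` bytes into the fastest areas first.

    regions: iterable of (name, access_time, free_bytes); a smaller
    access_time means a faster area. Returns a list of
    (name, bytes_placed) in the order the areas were filled.
    """
    placement = []
    remaining = size
    for name, _access_time, free in sorted(regions, key=lambda r: r[1]):
        if remaining == 0:
            break
        used = min(remaining, free)
        if used > 0:
            placement.append((name, used))
            remaining -= used
    if remaining:
        raise MemoryError(f"{remaining} bytes could not be placed")
    return placement
```

For instance, with a 64-byte cache area, a 256-byte second area, and a large main memory, 300 bytes of shared data fill the cache area completely and spill the remaining 236 bytes into the second area, leaving main memory untouched.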

Further, when the shared data that a high-priority task accesses during execution is smaller than the capacity of the cache memory and the cache memory has room to spare, the shared data of low-priority tasks may be placed in the remaining area. Placing low-priority shared data in the remaining area keeps cache capacity from sitting unused and maintains high utilization efficiency.

When multiple memory areas are provided in addition to the cache memory of each CPU, the shared data may be placed in order starting from the memory area with the fastest access speed. Regardless of priority, preferentially placing the shared data of each task in a memory area with a fast access speed allows each task to be executed efficiently.

Further, parallel tasks may be extracted from the tasks to be executed and assigned to the same processor. Moreover, among the parallel tasks, those that also have the same priority may be extracted and assigned to the same processor. Assigning parallel tasks with the same priority to the same processor allows shared data once placed in the cache memory to be used efficiently.

The control method of the multi-core processor system described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The control program of the multi-core processor system is recorded on a computer-readable recording medium such as a hard disk, flexible disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The control program of the multi-core processor system may also be distributed via a network such as the Internet.

DESCRIPTION OF SYMBOLS
100 Multi-core processor system
110 Scheduler
120 Snoop
130 Memory controller
140 Memory
150 File system
1000 Application
1001 Determination unit
1002 First placement unit
1003 Second placement unit
1004 Third placement unit
1005 Identification unit
1006 Extraction unit
1007 Allocation unit

Claims

A multi-core processor system comprising:
a plurality of cores each processing a task; and
a plurality of caches each storing data to be accessed when the plurality of cores process a task,
wherein a first core of the plurality of cores, when the priority of a task assigned to any of the plurality of cores is equal to or higher than a predetermined value, stores the data in the cache corresponding to the core to which the task is assigned before the processing of the task is executed.

A control method for a multi-core processor system comprising a plurality of cores each processing a task and a plurality of caches each storing data to be accessed when the plurality of cores process a task, wherein a first core of the plurality of cores executes processing of:
determining whether the priority of a task assigned to any of the plurality of cores is equal to or higher than a predetermined value; and
when the priority of the task is equal to or higher than the predetermined value, storing the data in the cache corresponding to the core to which the task is assigned before the processing of the task is executed.

A control program for a multi-core processor system comprising a plurality of cores each processing a task and a plurality of caches each storing data to be accessed when the plurality of cores process a task, the control program causing a first core of the plurality of cores to execute processing of:
determining whether the priority of a task assigned to any of the plurality of cores is equal to or higher than a predetermined value; and
when the priority of the task is equal to or higher than the predetermined value, storing the data in the cache corresponding to the core to which the task is assigned before the processing of the task is executed.