JP2014096024A

JP2014096024A - Control program for multi-core processor, electronic apparatus, and control method

Info

Publication number: JP2014096024A
Application number: JP2012247172A
Authority: JP
Inventors: Masaki Gondo; 正樹権藤; Junichi Tatsuta; 純一立田; Satoru Tsurugatani; 哲鶴ヶ谷
Original assignee: ESOL CO Ltd
Current assignee: ESOL CO Ltd
Priority date: 2012-11-09
Filing date: 2012-11-09
Publication date: 2014-05-22
Anticipated expiration: 2032-11-09
Also published as: JP5734941B2

Abstract

PROBLEM TO BE SOLVED: To provide a control program for a multi-core processor, an electronic apparatus, and a control method with which a many-core processor can improve throughput by realizing an SMP model, while securing real-time performance by guaranteeing the execution time of high-priority threads. SOLUTION: A control program for a multi-core processor includes a global scheduler 21 that determines the operating core 10 of each generated thread, and a local scheduler 31 provided for each of a plurality of cores 10. Each local scheduler 31 causes its own core 10 to execute the threads assigned to it according to priority. The global scheduler 21 decides whether to execute thread migration between the plurality of cores 10 based on a predetermined scheduling policy. The top N highest-priority threads among the threads allocated to each core 10 are excluded from thread migration.

Description

The present invention relates to the control of a multi-core processor having a plurality of cores. In particular, it relates to a control program, an electronic device, and a control method that can both improve throughput and secure real-time performance in a processor incorporating a large number of cores.

Conventionally, many multi-core processors of this kind have used heterogeneous cores. Such configurations are adopted to improve performance per unit of power consumption by providing cores dedicated to specific applications. However, given the rapid progress of process technology, it is expected to become increasingly difficult to maintain the advantage of chips designed for specific applications. For this reason, more scalable configurations, that is, chips incorporating homogeneous cores, are expected to become more common. In recent years the number of cores incorporated in a single chip has also tended to increase. A configuration using homogeneous cores adapts easily even as the number of cores grows, so it is also expected to encourage further increases in the number of cores.

The most common approach to managing runtime software is to use an operating system, and a multi-core processor is controlled by an operating system that supports it. In a multi-core environment with up to about four cores, the operating system performs control using either an asymmetric multiprocessing (AMP) model or a symmetric multiprocessing (SMP) model.

AMP is a processing method in which the core on which each thread runs is fixed. It is widely used because it avoids thread migration and cache-related problems and makes it easy to secure real-time processing. AMP is advantageous when the number of cores is relatively small, but as the number of cores increases, the processing costs of inter-core communication, device sharing, service sharing, and the like become unacceptable. Hypervisor-based partitioning models are in principle AMP models and suffer from the same problem.

Thus, although AMP is effective when the number of cores is relatively small, its ability to cope with an increasing number of cores is limited. Now that further increases in the number of cores are expected, there is a strong demand to improve throughput with an operating system that adopts the scalable SMP model.

However, implementing an SMP-model operating system on a many-core processor with eight or more cores raises a new problem: cache coherency is either lacking or costly. For example, exclusive control of shared memory becomes complex and costly and turns into a bottleneck for the processing as a whole. In addition, the SMP model performs thread migration, which changes the core on which a thread runs, and the overhead of this thread migration increases. Most many-core processors do not provide a hardware cache coherency mechanism, and even in the rare many-core processors that do, maintaining coherency among the cores in the chip costs far more than in conventional processors with a small number of cores.

As described above, adopting the SMP model to increase throughput brings bottlenecks and overhead as a trade-off, and real-time performance is sacrificed. There are several prior studies on the control of many-core processors (see, for example, Non-Patent Document 1), but they mainly target server processing. Because they do not address fields that require real-time performance, such as embedded systems, scheduling for securing real-time performance has hardly been discussed.

David Wentzlaff et al., "An Operating System for Multicore and Clouds: Mechanisms and Implementation", MIT Open Access Articles, [online], Internet <URL: http://dspace.mit.edu/openaccess-disseminate/1721.1/62570>

An object of the present invention is to provide a control program for a multi-core processor, an electronic device, and a control method that realize an SMP model on a many-core processor to improve throughput, while guaranteeing the execution time of high-priority threads to secure real-time performance.

The present invention has been made to solve the above problem and has the following features.

(Claim 1)
The control program for a multi-core processor according to claim 1 is a control program for a multi-core processor having a plurality of cores. It comprises a global scheduler that determines the operating core of each generated thread, and a local scheduler provided for each of the plurality of cores. The local scheduler schedules the threads assigned to its own core according to priority and causes its own core to execute them. The global scheduler decides, based on a predetermined scheduling policy, whether to execute thread migration between the plurality of cores, and excludes from thread migration the top N highest-priority threads (N being a predetermined natural number of 1 or more) among the threads assigned to each core.

(Claim 2)
The invention according to claim 2 has the following feature in addition to the features of the invention according to claim 1.

That is, the global scheduler maps the top M highest-priority threads among all generated threads (M being the number of cores to which threads can be assigned) so that their operating cores differ from one another.

(Claim 3)
The invention according to claim 3 has the following feature in addition to the features of the invention according to claim 1 or 2.

That is, the global scheduler determines the thread and the core to be migrated based on the result of periodically checking the load balance of threads among the plurality of cores.

(Claim 4)
The invention according to claim 4 is an electronic device equipped with the control program for a multi-core processor according to any one of claims 1 to 3.

(Claim 5)
The control method for a multi-core processor according to claim 5 operates on a multi-core processor having a plurality of cores and schedules threads with a local scheduler provided for each of the plurality of cores. The method comprises a step of determining the operating core of each generated thread, a step in which the local scheduler schedules the threads assigned to its own core according to priority and causes its own core to execute them, and a step of executing thread migration between the plurality of cores based on a predetermined scheduling policy. Thread migration is not executed for the top N highest-priority threads (N being a predetermined natural number of 1 or more) among the threads assigned to each core.

(Claim 6)
The invention according to claim 6 has the following feature in addition to the features of the invention according to claim 5.

That is, the top M highest-priority threads among all generated threads (M being the number of cores to which threads can be assigned) are mapped so that their operating cores differ from one another.

(Claim 7)
The invention according to claim 7 has the following feature in addition to the features of the invention according to claim 5 or 6.

That is, the thread and the core to be migrated are determined based on the result of periodically checking the load balance of threads among the plurality of cores.

According to the invention of claim 1, threads are executed under two-stage scheduling by the global scheduler and the local schedulers. The global scheduler determines the operating core of each generated thread and decides whether to execute thread migration based on a predetermined scheduling policy. Threads can therefore be allocated almost evenly across the cores, so hardware resources are used effectively and throughput improves.

In addition, the top N highest-priority threads (N being a predetermined natural number of 1 or more) among the threads assigned to each core are excluded from thread migration, so a high-priority thread is never migrated and its top-priority execution on its assigned core is guaranteed. The execution time of high-priority threads that require real-time performance can thus be guaranteed, which secures real-time performance.

According to the invention of claim 2, the global scheduler maps the top M highest-priority threads among all generated threads (M being the number of cores to which threads can be assigned) so that their operating cores differ from one another. In other words, execution is guaranteed, in descending order of priority, for as many threads as there are cores to which threads can be assigned. The execution time of these high-priority threads can therefore be guaranteed, which secures real-time performance.

According to the invention of claim 3, the global scheduler determines the thread and the core to be migrated based on the result of periodically checking the load balance of threads among the plurality of cores. The load balance can therefore be kept in an optimal state, which improves throughput.

Furthermore, thread migration is not performed constantly but only based on the results of periodic checks, so it does not occur frequently, and it does not occur at all while the load balance remains unchanged. This keeps the cost of thread migration down. The cost matters in particular when shared memory is not used (each core uses its own local memory) in order to avoid bottlenecks and cache coherency problems, because local memory must then be copied at every thread migration. With the control of the present invention, keeping the number of thread migrations to a minimum keeps the associated cost down. In other words, bottlenecks and cache coherency problems can be avoided while throughput and real-time performance are secured.

According to the invention of claim 4, an electronic device equipped with a control program that provides the effects described above can be obtained.

According to claim 5, the same effects as those of the invention of claim 1 can be obtained.

According to claim 6, the same effects as those of the invention of claim 2 can be obtained.

According to claim 7, the same effects as those of the invention of claim 3 can be obtained.

FIG. 1 is a conceptual diagram showing an outline of the system.
FIG. 2 is a diagram explaining the scheduling policy.
FIG. 3 shows (a) the formula for calculating the workload of each core and (b) the formula for calculating the variation in load balance.
FIG. 4 is a flowchart of the thread generation process.
FIG. 5 is a flowchart of the thread deletion process.
FIG. 6 is a flowchart of the load balancing process.

An embodiment of the present invention will be described with reference to the drawings.

(Basic system configuration)
The system according to this embodiment is an embedded system incorporated in an electronic device and includes a multi-core processor (a many-core processor). As shown in FIG. 1(a), this multi-core processor has a plurality of cores 10 (64 cores 10 in FIG. 1(a)). A non-volatile memory built into the electronic device stores a control program (operating system) for controlling the multi-core processor, and the various applications 25 run as this control program is executed on the multi-core processor.

This system has no hardware cache coherency mechanism. A shared memory shared by the cores 10 does exist, but accessing it is costly, so the microkernel 30 described later does not use it and instead uses a local memory for each core 10.

As shown in FIG. 1(a), the cores 10 are divided into OS server execution cores 11 and application execution cores 12. An OS server execution core 11 is a core 10 that executes an OS server 20, which forms part of the control program. An application execution core 12 is a core 10 that executes an application 25 such as a user application, middleware, or a driver. Each OS server 20 and each application 25 is assigned to a core 10 as a thread and executed. Which cores 10 serve as OS server execution cores 11 and which as application execution cores 12 may be determined statically in advance, or dynamically through assignment by the global scheduler 21 described later. In the dynamic case, when the OS server 20 on an OS server execution core 11 enters the waiting state, an application 25 may be executed on that core, so that the OS server execution core 11 becomes an application execution core 12.

The OS servers 20 execute the various functions provided by the operating system as threads. One of these OS servers 20 is the global scheduler 21, which determines the operating core 10 of each generated thread. Based on a predetermined scheduling policy, the global scheduler 21 determines the operating core 10 of a thread to be generated and decides whether to change the operating core 10 of a thread (thread migration). The global scheduler 21 is described in detail later.

As shown in FIG. 1(a), a microkernel 30, which forms part of the control program, is distributed to each core 10. As shown in FIG. 1(b), the microkernel 30 includes a local scheduler 31, a message manager 32, a memory manager 33, and an interrupt manager 34.

The local scheduler 31 schedules the threads assigned to its own core 10 according to priority and causes its own core 10 to execute them. For example, when the global scheduler 21 assigns three threads to the core 10 controlled by a given local scheduler 31, the local scheduler 31 runs the highest-priority thread of the three first. When that thread enters the waiting state, the thread with the next highest priority runs, and the lowest-priority thread runs only when both of the other two are waiting. When a thread with a higher priority than the running thread becomes ready to run, the running thread is stopped and execution switches to the higher-priority thread.
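The per-core behavior described above amounts to fixed-priority preemptive scheduling. The following is a minimal sketch of that selection rule, not the patent's implementation. The names `Thread`, `pick_next`, and `should_preempt` are hypothetical, and the sketch assumes that a larger number means a higher priority, which the source does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Thread:
    name: str
    priority: int       # assumption: larger value = higher priority
    ready: bool = True  # False while the thread is in the waiting state


def pick_next(threads: List[Thread]) -> Optional[Thread]:
    """Return the highest-priority ready thread, or None if all are waiting."""
    ready = [t for t in threads if t.ready]
    return max(ready, key=lambda t: t.priority, default=None)


def should_preempt(running: Optional[Thread], woken: Thread) -> bool:
    """A thread that becomes ready preempts only a strictly lower-priority one."""
    return running is None or woken.priority > running.priority


# Three threads on one core, as in the example in the text.
core = [Thread("high", 30), Thread("mid", 20), Thread("low", 10)]
first = pick_next(core)      # the highest-priority thread runs first
core[0].ready = False        # it enters the waiting state
second = pick_next(core)     # the next-highest thread takes over
```

Whether equal-priority threads preempt each other is not stated in the text; the sketch assumes they do not.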

The message manager 32 provides messaging to other threads, including messaging to the OS servers 20. For example, when an application thread 42 calls a kernel API (to create or delete a thread, for instance), the call is made through the interface library 41 of its core 10, and inside the interface library 41 the message manager 32 is used to invoke the OS server 20, which is running on another core 10. Communication between cores 10 is thus carried out through the message manager 32, which sends the processing request to the OS server 20 and waits for the response, so the application thread 42 can call OS services without being aware of which core 10 is involved.
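The request and response pattern described above can be illustrated with ordinary message queues. This is only a sketch under stated assumptions: Python threads and `queue.Queue` stand in for cores and the message manager 32, and `os_server`, `kernel_api_call`, and the message layout are hypothetical names that do not come from the patent.

```python
import queue
import threading


def os_server(requests: queue.Queue) -> None:
    """Stand-in for an OS server 20: serve requests until a None sentinel arrives."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        operation, argument, reply_box = msg
        # A real server would create or delete a thread here; this one only acknowledges.
        reply_box.put(f"{operation}:{argument}:ok")


def kernel_api_call(requests: queue.Queue, operation: str, argument: str) -> str:
    """Stand-in for the interface library 41: send a request, then wait for the reply."""
    reply_box: queue.Queue = queue.Queue(maxsize=1)
    requests.put((operation, argument, reply_box))
    return reply_box.get(timeout=1)  # the caller blocks until the server answers


requests: queue.Queue = queue.Queue()
server = threading.Thread(target=os_server, args=(requests,), daemon=True)
server.start()
result = kernel_api_call(requests, "create_thread", "worker")
requests.put(None)       # ask the server to stop
server.join(timeout=1)
```

The caller never names the core on which the server runs, which mirrors the point that application threads can use OS services without being aware of the cores.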

The memory manager 33 manages the local memory allocated to its own core 10. This system avoids bottlenecks and cache coherency problems by using a local memory for each core 10 instead of the costly shared memory, and it is the memory manager 33 that manages this per-core local memory. For example, the memory manager 33 manages the memory images associated with creating and deleting threads.

The interrupt manager 34 manages interrupts for the processing of its own core 10. When an interrupt request occurs, the interrupt manager 34 suspends the current processing of the core 10 and switches it to the interrupt handling.

(About thread groups)
The global scheduler 21 determines and changes the operating core 10 of each thread based on a predetermined scheduling policy. One of the basic ideas of this scheduling policy is to divide the threads assigned to each core 10 into a "higher-priority thread group" and a "lower-priority thread group". The higher-priority thread group consists of the top N highest-priority threads among the threads assigned to each core 10. The lower-priority thread group consists of the remaining lower-priority threads, which are not included in the higher-priority thread group (see FIG. 2; FIG. 2 shows only four cores 10 for simplicity, but in practice a large number of cores 10 are incorporated, as shown in FIG. 1).

The difference between the higher-priority thread group and the lower-priority thread group is whether their threads are subject to thread migration. Threads belonging to the higher-priority thread group are not subject to thread migration, whereas threads belonging to the lower-priority thread group are.
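The grouping rule can be sketched as follows. This is an illustrative example, not the patent's code: `split_groups` and `migration_candidates` are hypothetical names, each core is modeled as a mapping from thread name to priority, a larger number is assumed to mean a higher priority, and ties are broken by insertion order.

```python
from typing import Dict, List, Tuple


def split_groups(core_threads: Dict[str, int], n: int = 1) -> Tuple[List[str], List[str]]:
    """Split one core's threads into (higher-priority group, lower-priority group)."""
    ordered = sorted(core_threads, key=lambda name: core_threads[name], reverse=True)
    return ordered[:n], ordered[n:]


def migration_candidates(cores: List[Dict[str, int]], n: int = 1) -> List[str]:
    """Only threads in a lower-priority group may be migrated."""
    candidates: List[str] = []
    for core_threads in cores:
        _, lower = split_groups(core_threads, n)
        candidates.extend(lower)
    return candidates


cores = [
    {"t1": 90, "t2": 40},
    {"t3": 80},
    {"t4": 70, "t5": 60, "t6": 10},
]
movable_n1 = migration_candidates(cores, n=1)
movable_n2 = migration_candidates(cores, n=2)
```

With N = 1 each core pins exactly one thread; raising N pins more threads per core and shrinks the set the global scheduler is allowed to move.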

As long as a thread belongs to the higher-priority thread group, it is never migrated, so it is executed preferentially on its assigned core 10. The execution time of high-priority threads that require real-time performance can therefore be guaranteed, which secures real-time performance.

In this embodiment N = 1, so only one thread belongs to the higher-priority thread group of each core 10. A thread in this group therefore always transitions to the running state whenever it is ready to run, so its execution is always guaranteed.

(Determining the operating core 10 of a generated thread)
Next, how the operating core 10 of a generated thread is determined will be described.

The global scheduler 21 determines the operating core 10 of a generated thread based on the following scheduling policy: the top M highest-priority threads among all generated threads (M being the number of cores 10 to which threads can be assigned) are mapped so that their operating cores 10 differ from one another. In this embodiment M = 64, so the 64 highest-priority threads are mapped to run on separate cores 10. In other words, each of them is mapped so that it is the highest-priority thread on its own core 10.
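The mapping rule can be read as an invariant that a test harness could check. The sketch below is one such check under stated assumptions: `top_m_on_distinct_cores` is a hypothetical helper, each thread is recorded as a `(priority, core_id)` pair, and a larger number is assumed to mean a higher priority.

```python
from typing import Dict, Tuple


def top_m_on_distinct_cores(assignment: Dict[str, Tuple[int, int]], m: int) -> bool:
    """Check that the m highest-priority threads all run on different cores.

    assignment maps thread name -> (priority, core_id).
    """
    top = sorted(assignment.values(), key=lambda pc: pc[0], reverse=True)[:m]
    core_ids = [core_id for _, core_id in top]
    return len(set(core_ids)) == len(core_ids)


assignment = {"a": (90, 0), "b": (80, 1), "c": (70, 0)}
ok_for_two_cores = top_m_on_distinct_cores(assignment, m=2)
ok_for_three_cores = top_m_on_distinct_cores(assignment, m=3)
```

Here the two highest-priority threads sit on cores 0 and 1, so the rule holds for M = 2; it would be violated for M = 3 because threads "a" and "c" share core 0.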

The specific behavior of the global scheduler 21 is described below with reference to the flowchart of the thread generation process in FIG. 4.

The thread generation process shown in FIG. 4 is executed when, for example, an application thread 42 issues a thread generation request (kernel API). When the global scheduler 21 receives the request, it checks whether there is a vacancy in the higher-priority thread group, as shown in step S101 of FIG. 4. If there is a vacancy (in this embodiment, if the number of generated threads is smaller than the number of cores 10 to which threads can be assigned, which is 64), the process proceeds to step S102. If there is no vacancy, the process proceeds to step S103.

In step S102, the global scheduler 21 issues a thread generation instruction to a core 10 whose higher-priority thread group has a vacancy. The thread is created on that core 10 and belongs to the higher-priority thread group.

In step S103, the global scheduler 21 checks whether the higher-priority thread group contains a thread with a lower priority than the thread being generated. If such a thread exists, the process proceeds to step S104. If not, the process proceeds to step S105.

In step S104, the global scheduler 21 issues a thread generation instruction to the core 10 that holds the lowest-priority thread among the threads in the higher-priority thread group (referred to here as thread X). The thread is created on that core 10 and belongs to the higher-priority thread group, and at the same time thread X moves from the higher-priority thread group to the lower-priority thread group.

In step S105, the global scheduler 21 issues a thread generation instruction to the core 10 with the fewest assigned threads. The thread is created on that core 10 and belongs to the lower-priority thread group.
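Steps S101 to S105 can be condensed into a single core-selection function. The following is a minimal sketch for the N = 1 case of this embodiment, not the patent's implementation. `choose_core` and `create_thread` are hypothetical names, each core is modeled as a list of thread priorities, a larger number is assumed to mean a higher priority, and ties go to the lowest core index.

```python
from typing import List


def choose_core(cores: List[List[int]], new_priority: int) -> int:
    """Pick the core for a new thread; cores[i] lists the priorities on core i (N = 1)."""
    # S101 -> S102: with N = 1, a vacancy in the higher-priority group is an empty core.
    for core_id, threads in enumerate(cores):
        if not threads:
            return core_id
    # S103 -> S104: find thread X, the weakest member of the higher-priority group.
    tops = [max(threads) for threads in cores]
    x_core = min(range(len(cores)), key=lambda i: tops[i])
    if tops[x_core] < new_priority:
        return x_core  # the new thread displaces X into the lower-priority group
    # S105: otherwise use the core with the fewest assigned threads.
    return min(range(len(cores)), key=lambda i: len(cores[i]))


def create_thread(cores: List[List[int]], new_priority: int) -> int:
    """Create the thread on the chosen core and return that core's index."""
    core_id = choose_core(cores, new_priority)
    cores[core_id].append(new_priority)
    return core_id


cores: List[List[int]] = [[], []]
placements = [create_thread(cores, p) for p in (50, 40, 45, 10)]
```

In this run the first two threads fill the empty cores (S102), the thread with priority 45 displaces the thread with priority 40 on core 1 (S104), and the thread with priority 10 goes to the less loaded core 0 (S105).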

In the flow above, the global scheduler 21 always determines the operating core 10 of a thread, but a thread may also be created with its operating core 10 specified explicitly, for example through an argument of the kernel API. In that case, the global scheduler 21 issues the thread generation instruction directly to the specified core 10 without performing steps S101 to S105.

(Thread deletion)
FIG. 5 is a flowchart of the thread deletion process. The thread deletion process is described below with reference to FIG. 5.

The thread deletion process shown in FIG. 5 is executed, for example, when the application thread 42 issues a thread deletion request (kernel API). When the global scheduler 21 receives this request, it deletes the thread, as shown in step S200 of FIG. 5. The process then proceeds to step S201.

In step S201, it is checked whether the deleted thread belonged to the higher priority thread group. If it did not, the process ends. If it did, the process proceeds to step S202.

In step S202, the highest-priority thread among the threads that do not belong to the higher priority thread group (referred to here as thread Y) is extracted, and it is checked whether thread Y is assigned to the same core 10 as the deleted thread. If it is, the process ends (thread Y then belongs to the higher priority thread group in place of the deleted thread). If it is not, the process proceeds to step S203.

In step S203, thread Y is migrated to the core 10 to which the deleted thread belonged. Thread Y thereby belongs to the higher priority thread group on that core 10.

As described above, when a thread belonging to the higher priority thread group is deleted, the remaining threads are promoted to the higher priority thread group in descending order of priority.
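The deletion flow of steps S200 to S203 can be sketched as follows. This is an illustrative sketch only: the `delete_thread` function, the dictionary layout, and the use of priority numbers as thread identities are hypothetical, and a smaller number is assumed to mean a higher priority.

```python
def delete_thread(cores, core_id, prio):
    """Delete a thread and promote thread Y if needed (steps S200 to S203).

    cores maps a core id to {"top": [...], "low": [...]}, lists of priority numbers.
    Returns (priority, source core, destination core) when a migration takes
    place, otherwise None. Assumption: a smaller number is a higher priority.
    """
    core = cores[core_id]
    # S200 and S201: delete the thread; nothing more to do if it was in the lower group.
    if prio in core["low"]:
        core["low"].remove(prio)
        return None
    core["top"].remove(prio)
    # S202: thread Y is the highest-priority thread outside the higher priority group.
    lows = [(p, cid) for cid, c in cores.items() for p in c["low"]]
    if not lows:
        return None
    y_prio, y_core = min(lows)
    cores[y_core]["low"].remove(y_prio)
    # Thread Y joins the higher priority group on the core of the deleted thread.
    core["top"].append(y_prio)
    # S203: a migration is needed only when thread Y lived on a different core.
    return (y_prio, y_core, core_id) if y_core != core_id else None
```

Returning the migration as a tuple keeps the sketch free of any real inter-core messaging, which the description leaves to the microkernel.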

(Thread migration)
To execute thread migration, the global scheduler 21 according to the present embodiment periodically checks the load balance of threads among the cores 10 and determines the thread to be migrated and the destination core 10 based on the result of this check.

As shown in FIG. 2, the load balance is calculated from the priorities of the threads in the READY state, including the thread in the RUNNING state (the load measurement threads).

Specifically, the workload of each core 10 is calculated with the formula shown in FIG. 3(a). For example, the workload of "Core0" shown in FIG. 2 is (256-1)^2 + (256-6)^2 + (256-10)^2 = 188,041.
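The worked example for "Core0" can be reproduced with a short Python sketch. The formula of FIG. 3(a) is not reproduced in this text, so the form used here (the sum of (256 - priority)^2 over the load measurement threads) is inferred from the worked example; the function name `workload` and its parameters are hypothetical.

```python
def workload(ready_priorities, levels=256, exponent=2):
    """Sum (levels - priority) ** exponent over the load measurement threads.

    Assumption: this form is inferred from the Core0 example, not copied
    from FIG. 3(a) itself.
    """
    return sum((levels - p) ** exponent for p in ready_priorities)

# Core0 of FIG. 2 holds load measurement threads with priorities 1, 6 and 10.
print(workload([1, 6, 10]))  # 188041
```

Because the term grows as the priority number shrinks, a thread with priority 1 contributes more to the workload than a thread with priority 10.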

The variation in load balance is calculated by substituting the workload values obtained in this way into the formula shown in FIG. 3(b). The smaller the value D given by this formula, the smaller the variation in load balance and the higher the throughput are judged to be, so the global scheduler 21 executes thread migration so as to reduce D.

FIG. 6 is a flowchart of the load balancing process, including thread migration. The load balancing process performed by the global scheduler 21 is described below with reference to FIG. 6.

The load balancing process shown in FIG. 6 is invoked at a fixed period, for example every 50 ms. In the present embodiment, the global scheduler 21 starts the process at a fixed period by means of a timer interrupt.

When the process starts, the workloads of all the cores 10 are first measured, as shown in step S300 of FIG. 6. Specifically, the global scheduler 21 issues a workload measurement instruction to each core 10, and each core 10 that receives the instruction calculates its workload with the formula shown in FIG. 3(a) and returns the result to the global scheduler 21. The process then proceeds to step S301.

In step S301, the core 10 with the lowest load (the core 10 with the smallest workload) is selected as the "migration target" based on the workload measurement results of the cores 10. The core 10 with the lowest load is selected as the migration target because the purpose of thread migration is defined as "effective use of lightly loaded cores 10". Limiting the purpose in this way prevents the calculation from becoming excessively complicated and increasing the processing load. Once the migration target has been selected, the process proceeds to step S302.

In step S302, for every core 10 other than the migration target, the variation in load balance is calculated for the case in which a thread on that core 10 (the highest-priority thread in its lower priority thread group) is migrated to the migration target.

Specifically, for a core 10 other than the migration target, the highest-priority thread in its lower priority thread group is assumed to have been moved to the migration target, and the variation in load balance is calculated with the formula shown in FIG. 3(b). This calculation is repeated for every core 10 other than the migration target to find the combination that yields the smallest variation in load balance. The thread to be migrated is the highest-priority thread in the lower priority thread group because the purpose of migration is defined as "maximizing the execution opportunities of high-priority threads". Limiting the purpose in this way prevents the number of load balance variation calculations from becoming excessive and increasing the processing load. Once the variation in load balance has been calculated, the process proceeds to step S303.

In step S303, thread migration is executed with the combination that was calculated in step S302 to yield the smallest variation in load balance. If the variation in load balance is smaller without thread migration, the process ends without executing thread migration.
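One pass of steps S300 to S303 can be sketched in Python as follows. Several parts are assumptions made for illustration: the formula of FIG. 3(b) is not reproduced in this text, so the population variance of the per-core workloads is used as a stand-in for the value D; the workload formula is inferred from the Core0 example; every listed thread is treated as a load measurement thread; and the names `balance_once`, `spread`, and `workload` are hypothetical.

```python
from statistics import pvariance

def workload(prios):
    # Form inferred from the Core0 worked example (assumption).
    return sum((256 - p) ** 2 for p in prios)

def spread(loads):
    # Stand-in for the value D of FIG. 3(b): ASSUMED to be the population variance.
    return pvariance(loads)

def balance_once(cores):
    """Run one load balancing pass and migrate at most one thread.

    cores maps a core id to {"top": [...], "low": [...]}, lists of priority
    numbers (smaller number assumed to be higher priority).
    Returns (priority, source core, target core) or None.
    """
    loads = {cid: workload(c["top"] + c["low"]) for cid, c in cores.items()}  # S300
    target = min(loads, key=loads.get)                                        # S301
    best_d, best_move = spread(list(loads.values())), None  # baseline: no migration
    for cid, c in cores.items():                                              # S302
        if cid == target or not c["low"]:
            continue
        p = min(c["low"])  # highest-priority thread of the lower priority group
        w = (256 - p) ** 2
        trial = dict(loads)
        trial[cid] -= w
        trial[target] += w
        d = spread(list(trial.values()))
        if d < best_d:
            best_d, best_move = d, (p, cid)
    if best_move is None:                                                     # S303
        return None
    p, src = best_move
    cores[src]["low"].remove(p)
    # Simplification: group membership on the target core is not re-evaluated here.
    cores[target]["low"].append(p)
    return (p, src, target)
```

Threads in the higher priority group never appear as migration candidates, and the baseline comparison reproduces the rule that no migration takes place when it would not reduce the variation.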

With the process described above, thread migration is executed periodically, so throughput can be improved. In the present embodiment, at most one thread is migrated in a single load balancing pass, which keeps thread migration from occurring excessively.

In addition, thread migration is executed not simply on the basis of priority but on the basis of the priority-based variation in load balance, so that the number of thread migrations can be limited while throughput is improved.

In the process described above, the global scheduler 21 issues a workload measurement instruction to each core 10. Alternatively, the microkernel 30 of each core 10 may send its workload measurement result to the global scheduler 21 at predetermined intervals.

(Summary)
As described above, according to the present embodiment, threads are executed under two-level scheduling by the global scheduler 21 and the local schedulers 31. The global scheduler 21 determines the operating core 10 of each generated thread and decides whether to execute thread migration based on a predetermined scheduling policy. Threads can therefore be assigned almost evenly to the cores 10, and hardware resources can be used effectively to improve throughput.

In addition, the single highest-priority thread among the threads assigned to each core 10 is not subject to thread migration, so a high-priority thread is never migrated and is guaranteed to run with the highest priority on its assigned core 10. The execution time of high-priority threads that require real-time performance can therefore be guaranteed, and real-time performance can be ensured.

In addition, the global scheduler 21 maps the 64 highest-priority threads among all generated threads so that their operating cores 10 differ from one another. In other words, execution can be guaranteed, in descending order of priority, for as many threads as there are cores 10 to which threads can be assigned. The execution time of these high-priority threads can therefore be guaranteed, and real-time performance can be ensured.

In addition, the global scheduler 21 determines the thread to be migrated and the destination core 10 based on the result of periodically checking the load balance of threads among the plurality of cores 10, so the load balance can be kept in a favorable state and throughput can be improved.

In addition, thread migration is not executed continuously but only on the basis of the periodic check, so it does not occur frequently, and it does not occur at all while the load balance remains unchanged. The cost of thread migration can therefore be limited. In particular, when shared memory is not used (local memory is used for each core 10) in order to avoid bottlenecks and cache coherency problems, the local memory must be copied at migration time, which makes the cost of thread migration a concern. With the control of the present embodiment, however, the number of thread migrations is kept to a minimum, which limits the associated cost. In other words, bottlenecks and cache coherency problems can be avoided while throughput and real-time performance are ensured.

In the embodiment described above, the number M of cores 10 to which threads can be assigned is equal to the total number of cores 10, namely 64, but embodiments of the present invention are not limited to this. Only some of the cores 10 mounted on the processor may be treated as cores 10 to which threads can be assigned. For example, the OS server execution cores 11 may be determined in advance, placed outside the management of the global scheduler 21, and excluded from the cores 10 to which threads can be assigned. Specifically, when there are 64 cores 10, eight of them may serve as OS server execution cores 11 and the remaining 56 as application execution cores 12, and these 56 application execution cores 12 may be the cores 10 that are subject to thread assignment and thread migration by the global scheduler 21.

The embodiment described above concerns the case of 64 cores 10, but embodiments of the present invention are not limited to this and can of course be applied to any number of cores 10.

In the embodiment described above, the workload and the load balance are calculated with the formulas shown in FIG. 3, but embodiments of the present invention are not limited to this, and other formulas may be used. For example, the exponent may be changed to alter the weighting of priority.

In the embodiment described above, the number N of threads belonging to the "higher priority thread group" of each core 10 is set to 1, but embodiments of the present invention are not limited to this. N may be any predetermined natural number equal to or greater than 1, for example 2 or 3. However, throughput decreases if N is too large, so N must be set to an appropriate value. When the number of cores 10 is large, execution of a sufficient number of high-priority threads can be guaranteed even if N is small. In such a case, setting N to a small value (for example the minimum value, 1) maximizes throughput while ensuring real-time performance.

When N > 1, the global scheduler 21 preferably maps the operating cores 10 of the high-priority threads as follows. First, as already described, the M highest-priority threads among all generated threads are mapped so that their operating cores 10 differ from one another. Next, the following M highest-priority threads (that is, the threads ranked (M+1)th to (M×2)th in priority) are mapped so that their operating cores 10 differ from one another. In this way, the high-priority threads are divided into blocks of M threads, and the work of mapping the threads in each block to mutually different cores 10 is repeated N times. With this process, execution of threads can be guaranteed in descending order of priority even when N > 1.
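The block-wise mapping for N > 1 can be sketched as follows. The function `map_high_priority` is hypothetical, a smaller number is assumed to mean a higher priority, and the choice of core by position within the block (rank modulo M) is only one possible way of satisfying the requirement that the threads of a block run on mutually different cores.

```python
def map_high_priority(priorities, m, n):
    """Map the top m*n threads to cores, one block of m threads at a time.

    priorities[i] is the priority of thread i (smaller number assumed higher).
    Returns {thread_index: core_index} for the threads in the top n blocks.
    """
    # Rank thread indices from highest to lowest priority.
    ranked = sorted(range(len(priorities)), key=lambda i: priorities[i])
    mapping = {}
    for rank, idx in enumerate(ranked[: m * n]):
        # The position inside the block of m selects a distinct core.
        mapping[idx] = rank % m
    return mapping
```

With M = 2 and N = 2, for example, the two highest-priority threads land on different cores, and so do the third and fourth, so each core ends up with N threads in its higher priority group.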

10 core
11 OS server execution core
12 application execution core
20 OS server
21 global scheduler
25 application
30 microkernel
31 local scheduler
32 message manager
33 memory manager
34 interrupt manager
41 interface library
42 application thread

Claims

A control program for a multi-core processor having a plurality of cores, the control program comprising:
a global scheduler that determines the operating core of a generated thread; and
a local scheduler provided for each of the plurality of cores,
wherein the local scheduler schedules the threads assigned to its own core according to priority and causes its own core to execute them, and
the global scheduler determines execution of thread migration between the plurality of cores based on a predetermined scheduling policy, and the top N high-priority threads (N being a predetermined natural number equal to or greater than 1) among the threads assigned to each core are not subject to thread migration.

The multi-core processor control program according to claim 1, wherein the global scheduler maps the top M high-priority threads (M being the number of cores to which threads can be assigned) among all generated threads so that the operating cores of the threads differ from one another.

The multi-core processor control program according to claim 1 or 2, wherein the global scheduler determines the thread and the core to be migrated based on a result of periodically checking the load balance of threads among the plurality of cores.

An electronic device equipped with the control program for a multi-core processor according to claim 1.

A control method for a multi-core processor having a plurality of cores, in which thread scheduling is performed with a local scheduler provided for each of the plurality of cores, the method comprising:
determining the operating core of a generated thread;
the local scheduler scheduling the threads assigned to its own core according to priority and causing its own core to execute them; and
executing thread migration between the plurality of cores based on a predetermined scheduling policy,
wherein thread migration is not executed for the top N high-priority threads (N being a predetermined natural number equal to or greater than 1) among the threads assigned to each core.

The multi-core processor control method according to claim 5, wherein the top M high-priority threads (M being the number of cores to which threads can be assigned) among all generated threads are mapped so that the operating cores of the threads differ from one another.

The multicore processor control method according to claim 5 or 6, wherein a thread and a core to be migrated are determined based on a result of periodically checking a load balance of threads among the plurality of cores.