JP2014002787A

JP2014002787A - Multi-core processor system, cache coherency control method, and cache coherency control program

Info

Publication number: JP2014002787A
Application number: JP2013184347A
Authority: JP
Inventors: Takahisa Suzuki; 貴久鈴木; Koichiro Yamashita; 浩一郎山下; Hiromasa Yamauchi; 宏真山内; Yasushi Kurihara; 康志栗原
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2013-09-05
Filing date: 2013-09-05
Publication date: 2014-01-09
Anticipated expiration: 2030-06-14
Also published as: JP5614483B2

Abstract

PROBLEM TO BE SOLVED: To reduce the operations performed within a cache coherency mechanism. SOLUTION: A multi-core processor system 100 includes an execution unit 503 that maintains coherency of the values of shared data stored in the cache memories accessed by the respective CPUs. The multi-core processor system 100 detects a first thread executed by CPU #0 and identifies a second thread being executed by CPU #1, a CPU other than CPU #0. It then determines whether any shared data is accessed in common by the first and second threads. When it determines that no such shared data exists, the multi-core processor system 100 causes the execution unit 503 to stop executing coherency between snoop-compatible cache #0, corresponding to CPU #0, and snoop-compatible cache #1, corresponding to CPU #1.

Description

The present invention relates to a multi-core processor system that controls a cache coherency mechanism, a cache coherency control method, and a cache coherency control program.

In recent years, multi-core processor systems have typically provided an independent cache memory for each core and used a cache coherency mechanism to keep these cache memories consistent. A multi-core processor system with a cache coherency mechanism maintains the consistency of shared data stored in the cache memories in hardware. As a result, parallel software for the multi-core processor can be created easily.

Because the cache coherency mechanism monitors cache memory operations, accesses to the cache memory incur a delay. One disclosed technique for preventing this delay controls the cache coherency mechanism according to whether the system is operating as SMP (Symmetric Multi-Processing) or ASMP (Asymmetric Multi-Processing) (see, for example, Patent Document 1 below). Patent Document 1 refers to the case where a plurality of cores execute a plurality of processes as SMP, and to the case where a plurality of cores execute a single process as ASMP. A process is a unit of program execution, and one or more threads belong to each process. Threads belonging to the same process access the same memory space.

Another disclosed technique has a plurality of cores execute coherency when they are executing threads belonging to the same process. The cores do not execute coherency when they are executing threads belonging to different processes (see, for example, Patent Document 2 below).

In addition, a technique has been disclosed for analyzing dependencies between threads. Each thread is executed one statement at a time to create information representing its accesses to shared data. The dependencies of each thread are then analyzed statement by statement (see, for example, Patent Document 3 below).

Patent Document 1: Japanese Patent Laid-Open No. 10-97465
Patent Document 2: Japanese Patent Laid-Open No. 2004-133753
Patent Document 3: Japanese Patent Laid-Open No. 2000-207248

In the prior art described above, the techniques of Patent Documents 1 and 2 decide whether to execute coherency on a per-process basis. However, devices that do not use many functions at the same time, such as embedded devices, often run as a single process. Consequently, applying the techniques of Patent Documents 1 and 2 to an embedded device means that coherency is always executed. The cache coherency mechanism then operates more often, which delays cache memory access and increases power consumption.

Furthermore, the technique of Patent Document 3 analyzes access information for shared data statement by statement. The cache coherency mechanism would therefore have to be controlled for every statement, greatly increasing the number of control operations.

To solve these problems of the prior art, an object of the present invention is to provide a multi-core processor system, a cache coherency control method, and a cache coherency control program that can reduce the operations of a cache coherency mechanism.

To solve the above problems and achieve this object, the disclosed multi-core processor system has a plurality of cache memories, each accessed by one of a plurality of cores. Any one of the plurality of cores may detect a first thread executed by a first core among the plurality of cores. In that case, it detects a second thread being executed by a second core other than the first core. If there is no shared data accessed in common during execution of the first thread and the second thread, it stops the coherency processing between two cache memories: the first cache memory, corresponding to the first core, and the second cache memory, corresponding to the second core.
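
The following Python sketch is not part of the disclosure. It illustrates the control described above under stated assumptions:

- Each thread's shared data can be represented as a set of identifiers.
- Coherency between two caches can be modelled as an on/off flag per core pair.

The text specifies neither representation. The names `Thread` and `control_coherency` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class Thread:
    name: str
    shared_data: FrozenSet[str]  # hypothetical: IDs of the shared data the thread accesses


def control_coherency(first_core: int, first_thread: Thread,
                      running: Dict[int, Optional[Thread]],
                      coherency: Dict[FrozenSet[int], bool]) -> Dict[FrozenSet[int], bool]:
    """Set the coherency flag between the first core's cache and each other busy core's cache."""
    for core, second_thread in running.items():
        if core == first_core or second_thread is None:
            continue  # skip the first core itself and idle cores
        pair = frozenset((first_core, core))
        # coherency stays on only if the two threads access some shared data in common
        coherency[pair] = not first_thread.shared_data.isdisjoint(second_thread.shared_data)
    return coherency


a1 = Thread("A-1", frozenset({"x"}))
a2 = Thread("A-2", frozenset({"y"}))
b1 = Thread("B-1", frozenset({"x", "z"}))
state = control_coherency(0, a1, {0: a1, 1: a2, 2: b1, 3: None}, {})
print(state)
```

In this example, coherency between the caches of cores 0 and 1 is switched off because A-1 and A-2 share nothing. The pair of cores 0 and 2 keeps coherency because both threads touch "x".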

The multi-core processor system, cache coherency control method, and cache coherency control program reduce the operations within the cache coherency mechanism. This reduces power consumption and prevents delays.

FIG. 1 is a block diagram showing the hardware of a multi-core processor system 100 according to an embodiment.
FIG. 2 is a block diagram showing part of the hardware and the software of the multi-core processor system 100.
FIG. 3 is a block diagram showing the inside of snoop-compatible cache #0.
FIG. 4 is an explanatory diagram showing details of the snoop-compatible bus 110.
FIG. 5 is a block diagram showing the functions of the multi-core processor system 100.
FIG. 6 is an explanatory diagram showing the execution state and the stopped state of cache coherency.
FIG. 7 is an operation overview of the multi-core processor system 100.
FIG. 8 is an explanatory diagram showing a method of registering dependency information 501.
FIG. 9 is an explanatory diagram showing the member list of an extended thread data structure 901 and an example of its stored contents.
FIG. 10 is a flowchart showing line fetch processing by snoop control unit #0.
FIG. 11 is a flowchart showing processing for writing to a line by snoop control unit #0.
FIG. 12 is a flowchart showing coherency control processing.
FIG. 13 is a flowchart showing coherency target CPU determination processing.

Exemplary embodiments of a multi-core processor system, a cache coherency control method, and a cache coherency control program according to the present invention are described in detail below with reference to the accompanying drawings.

(Hardware of the multi-core processor system 100)
FIG. 1 is a block diagram showing the hardware of the multi-core processor system 100 according to the embodiment. In FIG. 1, the multi-core processor system 100 includes CPUs 101 (a plurality of mounted CPUs), a ROM (Read-Only Memory) 102, and a RAM (Random Access Memory) 103. It also includes a flash ROM 104, a flash ROM controller 105, and a flash ROM 106. As input/output devices for the user and other devices, it includes a display 107, an I/F (Interface) 108, and a keyboard 109. These units are connected to one another by a bus 110.

Here, the CPUs 101 govern overall control of the multi-core processor system 100. "CPUs 101" refers collectively to all of the CPUs, which are single-core processors connected in parallel. Details of the CPUs 101 will be described later with reference to FIG. 2. A multi-core processor system is a computer system that includes a processor equipped with a plurality of cores. To simplify the explanation, the present embodiment uses as an example a group of single-core processors arranged in parallel.

The ROM 102 stores programs such as a boot program. The RAM 103 is used as a work area for the CPUs 101. The flash ROM 104 stores system software such as an OS (Operating System), application software, and the like. For example, to update the OS, the multi-core processor system 100 receives the new OS through the I/F 108. It then replaces the old OS stored in the flash ROM 104 with the received new OS.

The flash ROM controller 105 controls the reading and writing of data to and from the flash ROM 106 under the control of the CPUs 101. The flash ROM 106 stores data written under the control of the flash ROM controller 105. Examples of such data include image data and video data that a user of the multi-core processor system 100 acquires through the I/F 108. A memory card, an SD card, or the like can be used as the flash ROM 106.

The display 107 displays a cursor, icons, or a toolbox, as well as data such as documents, images, and function information. A TFT liquid crystal display, for example, can be used as the display 107.

The I/F 108 is connected through a communication line to a network 111 such as a LAN (Local Area Network), a WAN (Wide Area Network), or the Internet. Through the network 111 it is connected to other devices. The I/F 108 manages the interface between the network 111 and the internal components, and controls the input and output of data from and to external devices. A modem, a LAN adapter, or the like can be used as the I/F 108. The keyboard 109 has keys for entering numbers, various instructions, and the like, and is used to input data. The keyboard 109 may also be a touch-panel input pad, a numeric keypad, or the like.

FIG. 2 is a block diagram showing part of the hardware and the software of the multi-core processor system 100. The hardware shown in FIG. 2 consists of a cache coherency mechanism 201, a shared memory 202, and CPU #0 to CPU #3 included in the CPUs 101. The shared memory 202 and CPU #0 to CPU #3 are connected by the cache coherency mechanism 201. To access data in the shared memory 202 at high speed, CPU #0 to CPU #3 each hold a cache memory into which data from the shared memory 202 is copied. In the present embodiment, the cache memories of CPU #0 to CPU #3 reside inside the cache coherency mechanism 201.

The cache coherency mechanism 201 is a device that maintains consistency among the cache memories accessed by CPU #0 to CPU #3. Cache coherency mechanisms are broadly classified into those using the snoop method and those using the directory method.

In the snoop method, each cache memory manages the update state of its own contents and of the cache memories of the other CPUs. It exchanges this update-state information with the other cache memories. From the exchanged information, a snoop-based cache coherency mechanism determines which cache memory holds the latest data. So that each cache memory can obtain the latest data, the mechanism also changes the state of its own cache memory or invalidates cache memory contents.

In the directory method, the consistency of the cache memories is managed centrally in a dedicated area called a directory. In a directory-based cache coherency mechanism, data is sent from the cache memories to the directory, and all of the cache memories share the data.
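
As an illustration only, the following Python sketch shows the kind of centralized bookkeeping a directory might perform. It records, per line address, which caches hold a copy. The text states only that consistency is managed centrally. The write-invalidate policy and the `Directory` class below are assumptions, not the embodiment.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class Directory:
    """Hypothetical directory: one central record of which caches hold each line."""

    def __init__(self) -> None:
        self.sharers: DefaultDict[int, Set[int]] = defaultdict(set)  # line address -> cache IDs

    def on_read(self, cache_id: int, addr: int) -> None:
        # a cache that loads the line is recorded as a sharer
        self.sharers[addr].add(cache_id)

    def on_write(self, cache_id: int, addr: int) -> Set[int]:
        """Return the caches whose copies must be invalidated; the writer becomes the sole holder."""
        victims = self.sharers[addr] - {cache_id}
        self.sharers[addr] = {cache_id}
        return victims


d = Directory()
for cid in (0, 1, 2):
    d.on_read(cid, 0x100)
print(d.on_write(1, 0x100))
```

Because the directory holds the sharer set, a write needs to contact only the listed caches rather than broadcasting to all of them.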

Under either method, the cache coherency mechanism 201 determines whether the cache memory of each CPU is consistent with the shared memory 202, which serves as main memory. When an inconsistency occurs, the cache coherency mechanism 201 resolves it by copying, updating, or invalidating cache memory contents. The cache coherency mechanism 201 in the present embodiment is described using the snoop method, but the present embodiment can also be applied to the directory method.

Inside the cache coherency mechanism 201 are snoop-compatible cache #0 to snoop-compatible cache #3 and the snoop-compatible bus 110. CPU #0 to CPU #3 access snoop-compatible caches #0 to #3, respectively; for example, CPU #0 accesses snoop-compatible cache #0. Details of the snoop-compatible caches will be described later with reference to FIG. 3.

The snoop-compatible bus 110 is a conventional bus with added functions that support snooping. Details of the snoop-compatible bus 110 will be described later with reference to FIG. 4. Snoop-compatible cache #0 is connected to the snoop-compatible bus 110 through master I/F #0 and slave I/F #0. Likewise, snoop-compatible caches #1 to #3 are connected to the snoop-compatible bus 110 through their respective master I/Fs and slave I/Fs.

The shared memory 202 is a storage area accessible from CPU #0 to CPU #3. Specifically, the storage area is, for example, the ROM 102, the RAM 103, or the flash ROM 104.

The software shown in FIG. 2 consists of an OS 203 and thread #0 to thread #3. The OS 203 is a program that controls the multi-core processor system 100; specifically, it schedules the software executed by CPU #0 to CPU #3. Thread #0 to thread #3 are threads that the OS 203 has assigned to CPU #0 to CPU #3. They may belong to the same process or to different processes.

FIG. 3 is a block diagram showing the inside of snoop-compatible cache #0. Of snoop-compatible caches #0 to #3, FIG. 3 uses snoop-compatible cache #0 to illustrate the internal structure.

Snoop-compatible cache #0 contains a cache line storage unit #0 and a cache line control unit #0. Cache line storage unit #0 includes a data field, an address field, and a state field:
- The data field stores data in units of contiguous data called lines, each about several tens of bytes long.
- The address field stores the address in the shared memory 202 that corresponds to the line stored in the data field.
- The state field stores the state of the data field.
The address field and the state field together form the tag area.
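
A minimal Python model of one cache line storage unit follows. It shows how the data, address, and state fields fit together and how an address is mapped to its line. The 64-byte line size is an assumption standing in for "several tens of bytes". The class names and the dictionary-based lookup are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

LINE_SIZE = 64  # assumption: the text only says "several tens of bytes"


@dataclass
class CacheLine:
    address: int  # address field: base address of the line in shared memory 202
    state: str    # state field: "M", "E", "S" or "I" under MESI
    data: bytearray = field(default_factory=lambda: bytearray(LINE_SIZE))  # data field

    @property
    def tag(self) -> Tuple[int, str]:
        # the address field and the state field together make up the tag area
        return (self.address, self.state)


def line_base(addr: int) -> int:
    """Round an address down to the start of its line."""
    return addr - addr % LINE_SIZE


class CacheLineStorage:
    def __init__(self) -> None:
        self.lines: Dict[int, CacheLine] = {}  # line base address -> line

    def lookup(self, addr: int) -> Optional[CacheLine]:
        """Return the valid line containing addr, or None on a miss."""
        line = self.lines.get(line_base(addr))
        if line is None or line.state == "I":
            return None
        return line


storage = CacheLineStorage()
storage.lines[0x1000] = CacheLine(0x1000, "E")
hit = storage.lookup(0x1004)   # same 64-byte line as 0x1000
miss = storage.lookup(0x1040)  # the next line, which is not cached
```

A line in the I state is treated as a miss even though its entry still exists, which mirrors the meaning of the state field.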

The states that the state field can take depend on the protocol used to implement the snoop method. Four typical states are the M state, E state, S state, and I state.

The M (Modified) state indicates that the corresponding line exists only in that cache memory and has been modified relative to main memory. The E (Exclusive) state indicates that the line exists only in that cache memory and has not been modified relative to main memory. The S (Shared) state indicates that the line exists in a plurality of cache memories and has not been modified relative to main memory. The I (Invalid) state indicates that the line is invalid. Some protocols also use a fifth state, the O (Owned) state. In this state, the line exists in a plurality of cache memories and has been updated, and the cache holding it is responsible for writing it back to main memory.
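
The state definitions above can be captured as a small Python enumeration. Each state is expressed through two properties taken from the text: whether the line exists only in this cache, and whether it differs from main memory (so that this cache must write it back). This is an illustrative sketch. The name `LineState` and the property names are hypothetical.

```python
from enum import Enum


class LineState(Enum):
    M = "Modified"
    O = "Owned"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

    @property
    def valid(self) -> bool:
        return self is not LineState.I

    @property
    def dirty(self) -> bool:
        # differs from main memory, so this cache is responsible for the write-back
        return self in (LineState.M, LineState.O)

    @property
    def only_copy(self) -> bool:
        # the line exists only in this cache memory
        return self in (LineState.M, LineState.E)


for s in LineState:
    print(s.name, s.valid, s.dirty, s.only_copy)
```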

Protocols that implement the snoop method are broadly divided into invalidation-type protocols and update-type protocols. In an invalidation-type protocol, one cache memory may update an address referenced by a plurality of cache memories. That address is then treated as dirty, and the corresponding line is invalidated in every other cache that references it. In an update-type protocol, when data at an address referenced by a plurality of cache memories is updated, the updated data is sent to the main memory and to the other cache memories.
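
The difference between the two protocol families can be sketched in Python as the handling of a single write to a line that other caches also hold. Each cache is modelled as a dictionary from address to `[state, value]`. This model and the function name `write_shared` are hypothetical. The writer's resulting state under the update-type branch is a simplification, and no specific protocol's state machine is reproduced.

```python
from typing import Dict, List

Cache = Dict[int, List]  # hypothetical model: addr -> [state, value]


def write_shared(caches: List[Cache], writer: int, addr: int, value: int,
                 memory: Dict[int, int], protocol: str = "invalidate") -> None:
    """Apply a write by cache `writer` to a line that other caches may also hold."""
    if protocol not in ("invalidate", "update"):
        raise ValueError(f"unknown protocol: {protocol}")
    for cid, cache in enumerate(caches):
        if cid == writer or addr not in cache:
            continue
        if protocol == "invalidate":
            cache[addr][0] = "I"    # invalidation type: other copies become invalid
        else:
            cache[addr][1] = value  # update type: other copies receive the new data
    if protocol == "invalidate":
        caches[writer][addr] = ["M", value]  # only valid copy, modified vs main memory
    else:
        caches[writer][addr] = ["S", value]  # simplification: copies stay shared and consistent
        memory[addr] = value                 # main memory is notified as well


caches = [{0x10: ["S", 1]}, {0x10: ["S", 1]}, {}]
memory = {0x10: 1}
write_shared(caches, 0, 0x10, 5, memory, "invalidate")
print(caches, memory)
```

Under the invalidation type, main memory keeps the old value until the modified line is written back. Under the update type, memory and the other copies are refreshed immediately.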

Examples of invalidation-type protocols include the MESI (Illinois) protocol, which uses the four states M, E, S, and I, and the MOSI (Berkeley) protocol, which uses the four states M, O, S, and I. Examples of update-type protocols include the MEI (Firefly) protocol, which uses the three states M, E, and I, and the MOES (Dragon) protocol, which uses the four states M, O, E, and S.

The present embodiment is described using the MESI protocol, one of the invalidation-type protocols, as an example. However, it can also be applied to other invalidation-type protocols and to update-type protocols.

Cache line control unit #0 provides the various functions required to implement a cache memory, such as data storage structure determination, line replacement, and data update processing. It is connected to the corresponding CPU #0 through a CPU I/F, and to the bus 110 through master I/F #0 and slave I/F #0.

To support snooping, cache line control unit #0 includes a snoop control unit #0. Snoop control unit #0 controls cache line storage unit #0 according to the protocol that implements the snoop method; in the present embodiment, this is the MESI protocol. Snoop control unit #0 performs processing to fetch a new cache line and processing to write to a cache line. These two processes will be described later with reference to FIGS. 10 and 11.
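
The details of the fetch processing are given only in FIG. 10, which is not reproduced here. The following Python sketch therefore shows a textbook MESI read-miss fetch rather than the embodiment's exact flowchart:

- Any other cache holding the line in the M state writes it back first.
- Every other holder moves to S.
- The requester installs the line in S if another copy exists, otherwise in E.

The cache model and the name `fetch_line` are hypothetical.

```python
from typing import Dict, List

Cache = Dict[int, List]  # hypothetical model: addr -> [state, value]


def fetch_line(caches: List[Cache], requester: int, addr: int,
               memory: Dict[int, int]) -> str:
    """Handle a read miss in cache `requester` under MESI; return the new state."""
    holders = [c for i, c in enumerate(caches)
               if i != requester and addr in c and c[addr][0] != "I"]
    for c in holders:
        if c[addr][0] == "M":
            memory[addr] = c[addr][1]  # write the dirty line back to main memory first
        c[addr][0] = "S"               # every other copy is now shared
    state = "S" if holders else "E"
    caches[requester][addr] = [state, memory[addr]]
    return state


caches = [{}, {0x20: ["M", 9]}]
memory = {0x20: 0}
print(fetch_line(caches, 0, 0x20, memory), memory[0x20])
```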

A snoop-compatible cache exists for each of CPU #0 to CPU #3. Accordingly, the cache line storage unit and the cache line control unit that make up its internal structure also exist for each CPU. For example, cache line storage unit #1 denotes the cache line storage unit of snoop-compatible cache #1, and cache line storage units #2 and #3 correspond to snoop-compatible caches #2 and #3, respectively. The same applies to the cache line control units and the snoop control units.

FIG. 4 is an explanatory diagram showing details of the snoop-compatible bus 110. Each line in FIG. 4 is a single physical signal line, and each black dot represents a connection between signal lines. The bus 110 receives signals from master I/F #0 to master I/F #3 and from slave I/F #0 to slave I/F #3. A master I/F sends address information and command information indicating, for example, a read or a write. A controller 401 outputs a select signal to the corresponding slave I/F. It does so based on the address information and on mapping information registered in the controller 401 in advance. A slave I/F that receives the select signal accepts the address information and command information and exchanges data according to the command information.
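
The address decoding performed by the controller 401 can be sketched as a lookup in a table of address ranges registered in advance. The range-table format, the slave names, and the `BusController` class are hypothetical. The text does not describe how the mapping information is encoded.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


class BusController:
    """Hypothetical model of controller 401: decodes an address into a slave select."""

    def __init__(self, mapping: List[Tuple[int, int, str]]) -> None:
        # mapping: (start, end_exclusive, slave I/F name) entries registered in advance
        self.mapping = sorted(mapping)
        self.starts = [start for start, _, _ in self.mapping]

    def select(self, addr: int) -> Optional[str]:
        """Return the slave I/F to receive the select signal, or None if unmapped."""
        i = bisect_right(self.starts, addr) - 1  # last range starting at or below addr
        if i >= 0:
            start, end, slave = self.mapping[i]
            if start <= addr < end:
                return slave
        return None


ctrl = BusController([(0x0000, 0x1000, "slave I/F #1"), (0x1000, 0x2000, "slave I/F #2")])
print(ctrl.select(0x0800), ctrl.select(0x1FFF), ctrl.select(0x2000))
```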

In the example of FIG. 4, the controller 401 receives the signal sent by master I/F #0. The controller 401 then outputs a select signal to, for example, slave I/F #1 to slave I/F #3. Having received the select signal, slave I/F #1 to slave I/F #3 accept the address information and command information and exchange data according to the command information.

In addition, three functions are added to the snoop-compatible bus 110: broadcast, block, and invalidate.
- Broadcast sends a request combining command information and data information from a master I/F to all of the slave I/Fs set in advance as broadcast destinations.
- Block forcibly releases the current bus connection.
- Invalidate causes a cache memory to invalidate the line corresponding to an address.
Using these functions, the bus 110 provides the functions required of a cache coherency mechanism.
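
A rough Python model of the broadcast and invalidate functions follows. A master sends one request to every slave preset as a broadcast destination, and an "invalidate" command marks the addressed line invalid in each receiving cache. The block function is not modelled. The class, the command strings, and the destination table are hypothetical.

```python
from typing import Dict, Set


class SnoopBus:
    """Hypothetical model of the broadcast and invalidate functions of bus 110."""

    def __init__(self, caches: Dict[int, Dict[int, str]],
                 broadcast_dests: Dict[int, Set[int]]) -> None:
        self.caches = caches                    # cache ID -> {line address: MESI state}
        self.broadcast_dests = broadcast_dests  # master ID -> slave IDs set in advance

    def broadcast(self, master: int, command: str, addr: int) -> Dict[int, object]:
        """Send the same request to every preset destination and collect each slave's reply."""
        return {dst: self._deliver(dst, command, addr)
                for dst in sorted(self.broadcast_dests[master])}

    def _deliver(self, dst: int, command: str, addr: int) -> object:
        if command == "invalidate":
            return self.invalidate(dst, addr)
        return self.caches[dst].get(addr, "I")  # e.g. a snoop query for the line's state

    def invalidate(self, dst: int, addr: int) -> bool:
        """Invalidate the line for addr in cache dst; report whether a copy existed."""
        if addr in self.caches[dst]:
            self.caches[dst][addr] = "I"
            return True
        return False


bus = SnoopBus({0: {0x40: "S"}, 1: {0x40: "S"}, 2: {}}, {0: {1, 2}})
result = bus.broadcast(0, "invalidate", 0x40)
print(result)
```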

(Functions of the multi-core processor system 100)
Next, the functions of the multi-core processor system 100 will be described. FIG. 5 is a block diagram showing the functions of the multi-core processor system 100. The multi-core processor system 100 includes an execution unit 503, a detection unit 504, a specifying unit 505, a determination unit 506, and a control unit 507. The functions that serve as the control section (detection unit 504 to control unit 507) are realized, for example, by CPU #0 executing a program stored in a storage device. Specifically, the storage device is, for example, the ROM 102, the RAM 103, or the flash ROM 104 shown in FIG. 1. The function of the execution unit 503 is realized by the cache coherency mechanism 201.

The multi-core processor system 100 can also access dependency information 501. For any given thread, the dependency information 501 stores two things: a list of the threads that do not access the same shared data as that thread, and the inter-process communication areas that the thread does not access. Details of the dependency information 501 will be described later with reference to FIG. 9.
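
The two kinds of content in the dependency information 501 can be held per thread, as in the Python sketch below. Because the information lists what a thread does *not* share, the query functions assume sharing unless an entry says otherwise. That conservative default is an assumption. The actual layout (the extended thread data structure 901 of FIG. 9) is not reproduced here, and all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DependencyEntry:
    # threads known NOT to access any of the same shared data as this thread
    independent_threads: Set[str] = field(default_factory=set)
    # inter-process communication areas this thread does NOT access
    unused_ipc_areas: Set[str] = field(default_factory=set)


def may_share_data(dep: Dict[str, DependencyEntry], t1: str, t2: str) -> bool:
    """Assume sharing unless either thread's entry lists the other as independent."""
    e1 = dep.get(t1, DependencyEntry())
    e2 = dep.get(t2, DependencyEntry())
    return not (t2 in e1.independent_threads or t1 in e2.independent_threads)


def may_use_ipc_area(dep: Dict[str, DependencyEntry], thread: str, area: str) -> bool:
    """Assume the area is used unless the thread's entry excludes it."""
    return area not in dep.get(thread, DependencyEntry()).unused_ipc_areas


dependency_info = {"A-1": DependencyEntry({"A-2"}, {"ipc0"})}
print(may_share_data(dependency_info, "A-1", "A-2"))
```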

The detection unit 504 to the control unit 507 may exist as internal functions of a scheduler 502. Alternatively, they may exist outside the scheduler 502, in a form that can be notified of the scheduler 502's processing results. The scheduler 502 is software included in the OS 203 and has the function of determining which process to assign to a CPU. In FIG. 5, the detection unit 504 to the control unit 507 reside in CPU #0. They may instead reside in any of CPU #1 to CPU #3, or in all of CPU #0 to CPU #3.

For example, the scheduler 502 determines the next thread to assign to a CPU based on the priority and other attributes set for each thread. When the predetermined time arrives, the scheduler 502 uses the dispatcher to assign the determined thread to the CPU. In some configurations, the dispatcher is one of the functions of the scheduler; in the present embodiment, the dispatcher function resides in the scheduler 502.
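
The priority-based selection with a built-in dispatcher can be sketched as follows. The sketch assumes that a larger number means a higher priority and that ties are served first-in, first-out. The text fixes neither convention, and the `Scheduler` class is hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class Scheduler:
    """Hypothetical priority scheduler with a built-in dispatcher (cf. scheduler 502)."""

    def __init__(self) -> None:
        self._ready: List[Tuple[int, int, str]] = []
        self._seq = itertools.count()      # tie-breaker keeps FIFO order within a priority
        self.running: Dict[int, str] = {}  # CPU ID -> thread currently assigned

    def add(self, thread: str, priority: int) -> None:
        # heapq is a min-heap, so negate: a larger number means a higher priority
        heapq.heappush(self._ready, (-priority, next(self._seq), thread))

    def dispatch(self, cpu: int) -> Optional[str]:
        """At the predetermined time, assign the highest-priority ready thread to `cpu`."""
        if not self._ready:
            return None
        _, _, thread = heapq.heappop(self._ready)
        self.running[cpu] = thread
        return thread


sched = Scheduler()
sched.add("low", 1)
sched.add("high", 5)
sched.add("high2", 5)
print(sched.dispatch(0), sched.dispatch(1))
```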

The execution unit 503 has the function of executing coherency of the values of shared data stored in the cache memories accessed by the respective cores. For example, the cache coherency mechanism 201, serving as the execution unit 503, executes coherency of the shared-data values between two caches: snoop-compatible cache #0, accessed by CPU #0, and snoop-compatible cache #1, accessed by CPU #1.

The detection unit 504 has the function of detecting a first thread executed by a first core among the plurality of cores. For example, the detection unit 504 detects thread A-1 executed by CPU #0, which serves as the first core. Specifically, thread A-1 is detected when the scheduler 502 issues a thread rescheduling request. Information on the detected thread is stored in a register, local memory, or the like of CPU #0.

When the detection unit 504 detects the first thread, the specifying unit 505 has a function of specifying a second thread being executed by a second core, meaning any core among the plurality of cores other than the first core. For example, when thread A-1 is detected, the specifying unit 505 specifies thread A-2, which is being executed by a CPU among CPUs #0 to #3 other than CPU #0, such as CPU #1. Information on the specified thread is stored in a register of CPU #0, in local memory, or the like.

The determination unit 506 has a function of determining whether there is shared data that is commonly accessed by both the first thread and the second thread specified by the specifying unit 505. The determination unit 506 may further determine whether the first and second threads belong to the same process. The determination unit 506 may also determine whether the first and second threads belong to different processes and whether there is an inter-process communication area used in common by both threads. Details of the inter-process communication area will be described later with reference to FIG. 8.

The determination unit 506 may also determine whether there is an inter-process communication area used between the processes to which the first and second threads belong. When determining whether a commonly used inter-process communication area exists, the determination unit 506 may first check whether at least one of the first and second threads uses no inter-process communication area at all. If both the first and second threads do use inter-process communication areas, the determination unit 506 may then determine whether there is an area used in common by both threads.

For example, the determination unit 506 accesses the dependency information 501 and checks whether the second thread appears in the first thread's list of threads that do not access the same shared data. This tells it whether any shared data is commonly accessed by the two threads. The determination unit 506 may also access the dependency information 501 and use the lists of inter-process communication areas that the first and second threads do not use to determine whether a commonly used inter-process communication area exists. The determination result is stored in a register of CPU #0, in local memory, or the like.

When the determination unit 506 determines that there is no commonly accessed shared data, the control unit 507 has a function of causing the execution unit 503 to stop maintaining coherency between the first and second cache memories, which correspond to the first and second cores respectively. The control unit 507 may stop coherency between the first and second cache memories when two conditions hold: the first and second threads are determined to belong to the same process, and no commonly accessed shared data is determined to exist. The control unit 507 may also stop coherency between the first and second cache memories when the first and second threads are determined to belong to different processes and no commonly used inter-process communication area is determined to exist.
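The conditions under which the control unit 507 may stop coherency can be summarized in the Python sketch below. The `ThreadInfo` fields stand in for the dependency information 501. The field names and the numeric thread, process, and chunk IDs are assumptions made for illustration. The sketch also accepts a no-sharing record found on either thread, which is an assumption as well.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ThreadInfo:
    tid: int
    pid: int                                  # process the thread belongs to
    no_share: FrozenSet[int] = frozenset()    # threads recorded as sharing no data with it
    unused_ipc: FrozenSet[int] = frozenset()  # IPC chunks recorded as unused by it


def may_stop_coherency(t1: ThreadInfo, t2: ThreadInfo, all_ipc: FrozenSet[int]) -> bool:
    """True if coherency between the caches of the cores running t1 and t2 may be stopped."""
    if t1.pid == t2.pid:
        # Same process: stop only if the dependency info records that no data is shared.
        # Sharing is symmetric, so a record on either thread is accepted (assumption).
        return t2.tid in t1.no_share or t1.tid in t2.no_share
    # Different processes: chunks not recorded as unused are treated as possibly used.
    used1 = all_ipc - t1.unused_ipc
    used2 = all_ipc - t2.unused_ipc
    if not used1 or not used2:
        return True  # at least one thread uses no IPC area at all
    # Both use some IPC area: stop only if no area can be used by both.
    return not (used1 & used2)


# Threads of FIG. 6 (IDs hypothetical): A-1 -> 1, A-2 -> 2, A-3 -> 3, all in process 1.
a1 = ThreadInfo(tid=1, pid=1, no_share=frozenset({2}))
a2 = ThreadInfo(tid=2, pid=1, no_share=frozenset({1}))
a3 = ThreadInfo(tid=3, pid=1)
print(may_stop_coherency(a1, a2, frozenset()))  # True
print(may_stop_coherency(a1, a3, frozenset()))  # False
```

Treating any chunk that is not explicitly recorded as unused as "possibly used" keeps the check conservative. Incomplete analysis can then only leave coherency running. It cannot stop coherency wrongly.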

When the control unit 507 stops coherency between the first and second cache memories, it may also cause the shared data stored in the first cache memory to be erased from the first cache memory.

For example, assume that the determination unit 506 determines that no shared data is commonly accessed by thread A-1 and thread A-2. In this case, the control unit 507 causes the cache coherency mechanism 201 to stop maintaining coherency between snoop-compatible cache #0 and snoop-compatible cache #1. The control unit 507 also flushes the shared data stored in snoop-compatible cache #0, which erases that data from snoop-compatible cache #0. The specific flush operation will be described later with reference to FIG. 12.

FIG. 6 is an explanatory diagram showing cache coherency in its executing state and its stopped state. The block diagram denoted by reference numeral 601 shows the threads of process A running on the multi-core processor system 100 and the memory accessed by each thread. Process A has threads A-1, A-2, and A-3. Within the process A data area of the shared memory 202, thread A-1 accesses data A, thread A-2 accesses data B, and thread A-3 accesses both data A and data B.

Under the premise shown at 601, the multi-core processor system 100 shown at 602 has thread A-1 assigned to CPU #0 and thread A-3 assigned to CPU #1. In state 602, the cache coherency mechanism 201 executes cache coherency, which lets CPU #0 and CPU #1 share data A in a consistent state.

Next, the multi-core processor system 100 shown at 603 is the state reached from 602 when the thread assigned to CPU #1 switches from thread A-3 to thread A-2. In state 603, CPU #0 and CPU #1 have no data that they access in common, so the cache coherency mechanism 201 may stop executing cache coherency between them. Cache coherency can be disabled, for example, on the snoop-compatible bus 110 described above with reference to FIG. 4. When a broadcast is transmitted on that bus, the relevant cache is excluded from the broadcast destinations, which stops cache coherency for it.

One concrete way to disable coherency is to stop broadcasts between snoop-compatible cache #0 and snoop-compatible cache #1 on the bus 110. For example, when the cache coherency mechanism 201 broadcasts an invalidation notification or similar message from master I/F #0 of snoop-compatible cache #0, it sends the message to slave I/F #2 and slave I/F #3 but not to slave I/F #1. Stopping cache coherency between specific CPUs in this way reduces the number of transmission destinations. This in turn reduces traffic and processing on the bus 110, lowers power consumption, and helps prevent delays.
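Excluding a slave I/F from a master I/F's broadcasts can be modeled with the following minimal sketch, which keeps one destination set per master I/F. Real hardware would more likely hold a bitmask in a setting register. That detail, and every class and method name here, is an assumption.

```python
from typing import Dict, List, Set


class SnoopBus:
    """Toy model of the snoop-compatible bus 110 (class and method names are hypothetical)."""

    def __init__(self, n_cpus: int = 4) -> None:
        # destinations[i]: slave I/Fs reached by a broadcast from master I/F #i.
        self.destinations: Dict[int, Set[int]] = {
            i: {j for j in range(n_cpus) if j != i} for i in range(n_cpus)
        }

    def stop_coherency(self, a: int, b: int) -> None:
        # Exclude each cache from the other's broadcasts, in both directions.
        self.destinations[a].discard(b)
        self.destinations[b].discard(a)

    def resume_coherency(self, a: int, b: int) -> None:
        self.destinations[a].add(b)
        self.destinations[b].add(a)

    def broadcast_targets(self, src: int) -> List[int]:
        """Slave I/Fs that a read or invalidation broadcast from master I/F #src reaches."""
        return sorted(self.destinations[src])


bus = SnoopBus()
bus.stop_coherency(0, 1)
print(bus.broadcast_targets(0))  # [2, 3]
print(bus.broadcast_targets(2))  # [0, 1, 3]
```

Broadcasts between CPUs #0 and #1 and the other CPUs are unaffected. Only the traffic between the two excluded caches disappears.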

FIG. 7 outlines the operation of the multi-core processor system 100. It shows processes 701 to 710. Together they switch the cache coherency mechanism 201 between executing and stopped, based on the shared data accessed by threads as illustrated in FIG. 6. Processes 701 and 702 are executed when the multi-core processor system 100 is designed, and processes 703 to 710 are executed while the system is running.

In process 701, the dependencies between threads belonging to the target process are analyzed with an ESL (Electronic System Level) simulator or similar tool. A specific analysis example will be described later with reference to FIG. 8. After the analysis, in process 702, the compiler rewrites the program from which the target process is built. Specifically, the compiler adds to the program, for example at the beginning of a given thread, information on the threads that have a dependency relationship with that thread.

In process 703, the CPU executing the target process creates a new thread. In process 704, that CPU adds the dependency information inserted by the compiler to the dependency information 501 in the OS 203. Details of the dependency information 501 will be described later with reference to FIG. 9. The target process then requests the OS 203 to allocate the thread. In process 705, the created thread is executed by the CPU that the scheduler 502 in the OS 203 has chosen for it.

In process 706, the scheduler 502 of the OS 203 determines the thread to be assigned to the target CPU. The thread is chosen from the thread created in process 703 and the threads that are in an executable state. After determining the thread, the scheduler 502 obtains the threads being executed by the other CPUs in process 707.

After obtaining the running threads, the scheduler 502 decides in process 708, based on the registered dependency information 501, whether to execute or stop coherency. In process 709, the scheduler 502 controls the cache coherency mechanism 201 according to that decision. In process 710, the scheduler 502 assigns the determined thread to the target CPU. Details of processes 706 to 710 will be described later with reference to FIGS. 12 and 13.

FIG. 8 is an explanatory diagram showing how the dependency information 501 is registered, using process A (shown at 601) as the example. When the multi-core processor system 100 is designed, the ESL simulator examines the data areas used by each thread in process A, which performs thread-parallel processing.

Two kinds of examination method exist. A static method obtains the data areas a thread uses by analyzing its source code. A dynamic method actually executes the thread and obtains the data areas it uses from a record of its memory accesses. If a complete analysis of a thread's data areas is difficult, it is sufficient to record only those combinations of threads that the examination clearly shows do not share any data area.
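The dynamic method can be sketched in Python as follows. The sketch assumes the memory-access record has already been reduced to (thread, data region) pairs. A trace covers only the paths that were actually executed, so each pair reported here is a candidate. It still needs confirmation before it is recorded as clearly not sharing data. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def independent_pairs(trace: Iterable[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Return thread pairs whose recorded data regions do not overlap.

    trace: (thread, data region) pairs recorded while actually running the threads.
    """
    regions: Dict[str, Set[str]] = {}
    for thread, region in trace:
        regions.setdefault(thread, set()).add(region)
    return {
        (a, b)
        for a, b in combinations(sorted(regions), 2)
        if not (regions[a] & regions[b])  # no common data region in the record
    }


# Accesses of process A from FIG. 6.
trace = [("A-1", "data A"), ("A-2", "data B"), ("A-3", "data A"), ("A-3", "data B")]
print(independent_pairs(trace))  # {('A-1', 'A-2')}
```

With the accesses of FIG. 6, only threads A-1 and A-2 come out as independent. This matches the "no dependency" entry registered for process A in FIG. 8.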

In addition to examining data-area sharing, the examination records any inter-process communication areas provided by the OS 203 that the examined thread clearly does not use. An inter-process communication area is an area for communication between multiple processes, and the OS 203 provides it through an API (Application Programming Interface) or the like. In general, each process has its own independent memory space and cannot directly access the memory space of another. Processes therefore exchange information through an inter-process communication area. For example, the OS 203 reserves chunks (blocks of memory space) and provides them to processes as inter-process communication areas.

Reference numeral 801 denotes the source program of process A. Based on the examination results, the ESL simulator modifies the program so that information is registered in the dependency information 501 whenever a thread is started during the execution of process A. This information covers the threads that do not share data with the started thread and the inter-process communication areas it does not use. The compiler compiles the modified program and generates executable code. Specifically, in FIG. 8, threads A-1 and A-2 have no commonly accessed shared data, so the ESL simulator adds to the program of process A a statement that threads A-1 and A-2 have no dependency.

The OS 203 registers the information on threads that do not share data in the dependency information 501. In the inter-thread dependency relationship 802 of process A shown in FIG. 8, "no dependency" is registered between thread A-1 and thread A-2. To hold the dependency information 501, the OS 203 extends its thread data structure so that it manages both the threads that do not share data and the inter-process communication areas that are not used. The extended thread data structure will be described later with reference to FIG. 9.

FIG. 9 is an explanatory diagram showing the member list of the extended thread data structure 901 and an example of its stored contents. As examples of those contents, FIG. 9 shows table 902, thread ID lists 903 and 904, and inter-process communication area lists 905 and 906.

The thread ID field of the thread data structure 901 holds a number assigned to each thread. The thread function field holds the function name of the thread. The thread data structure 901 also contains the fields found in a conventional thread data structure. This embodiment adds two fields, the "non-sharing thread ID list" field and the "unused inter-process communication area list" field, which together embody the dependency information 501.

The non-sharing thread ID list field holds a pointer to a list of the IDs of threads that do not share data with the target thread. For example, in table 902, the non-sharing thread ID list field of the Browser_Main thread (thread ID "1") points to thread ID list 903. Thread ID list 903 contains thread IDs "2" and "3". This indicates that the Browser_Main thread shares no data with the Browser_Download thread (thread ID "2") or with the Browser_Upload thread (thread ID "3").

Similarly, the non-sharing thread ID list field of the Browser_Download thread points to thread ID list 904. The contents of thread ID list 904 indicate that the Browser_Download thread shares no data with the Browser_Main thread.

The unused inter-process communication area list field holds a pointer to a list of the inter-process communication areas that the thread does not use. For example, this field of the Browser_Main thread points to inter-process communication area list 905, which contains chunk 3 and chunk 4.

The FTP_Download thread (thread ID "4") runs in a different process from the Browser_Main thread. Its unused inter-process communication area list field points to inter-process communication area list 906, which contains chunk 1 and chunk 2. Suppose the chunks reserved by the OS 203 are chunks 1 to 4. Lists 905 and 906 then show that no inter-process communication area is used in common: Browser_Main can use at most chunks 1 and 2, and FTP_Download can use at most chunks 3 and 4.
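The extended thread data structure 901, and the lookups the determination unit 506 performs on it, can be sketched in Python as follows. Python lists stand in for the pointed-to lists 903 to 906. The `DependencyInfo` registry and its method names are hypothetical. The example data reproduces table 902 and lists 903 to 906. Browser_Upload is omitted because its own lists are not given.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class ThreadData:
    """Sketch of thread data structure 901; the conventional fields are omitted."""
    thread_id: int
    thread_function: str
    no_share_ids: Optional[List[int]] = None  # stands in for the pointer to list 903/904
    unused_ipc: Optional[List[int]] = None    # stands in for the pointer to list 905/906


class DependencyInfo:
    """Hypothetical OS-side registry playing the role of dependency information 501."""

    def __init__(self, chunks: Set[int]) -> None:
        self.chunks = chunks                  # IPC chunks reserved by the OS
        self.threads: Dict[int, ThreadData] = {}

    def register(self, t: ThreadData) -> None:
        # Would be called at thread start-up with the information embedded at build time.
        self.threads[t.thread_id] = t

    def shares_no_data(self, a: int, b: int) -> bool:
        # True only if thread b appears in thread a's non-sharing thread ID list.
        return b in (self.threads[a].no_share_ids or [])

    def common_ipc(self, a: int, b: int) -> Set[int]:
        # Chunks not recorded as unused by either thread, i.e. possibly used by both.
        used_a = self.chunks - set(self.threads[a].unused_ipc or [])
        used_b = self.chunks - set(self.threads[b].unused_ipc or [])
        return used_a & used_b


dep = DependencyInfo(chunks={1, 2, 3, 4})
dep.register(ThreadData(1, "Browser_Main", no_share_ids=[2, 3], unused_ipc=[3, 4]))
dep.register(ThreadData(2, "Browser_Download", no_share_ids=[1]))
dep.register(ThreadData(4, "FTP_Download", unused_ipc=[1, 2]))
print(dep.shares_no_data(1, 2), dep.shares_no_data(2, 1))  # True True
print(dep.common_ipc(1, 4))                                 # set()
```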

As described with reference to FIG. 9, the OS 203 manages threads with the thread data structure 901. It may manage processes, which are sets of threads, in the same way. The OS 203 may also generate an unused inter-process communication area list for each process.

FIG. 10 is a flowchart showing line fetch processing by snoop control unit #0. The figure assumes that snoop-compatible cache #0 has received a read request from CPU #0 and that the data requested by CPU #0 is not in cache line storage unit #0, so snoop control unit #0 performs a line fetch. Snoop control units #1 to #3 likewise perform line fetch processing in response to read requests from their corresponding CPUs.

Snoop control unit #0 decides to fetch a new line (step S1001). It then broadcasts a read request for one line onto the bus 110 from master I/F #0 (step S1002).

On receiving the read request, snoop control units #1 to #3 begin responding to the line fetch. All of them perform the same processing, so the following description uses snoop control unit #1 for simplicity. Snoop control unit #1 searches the tag area of cache line storage unit #1 for a line whose address matches the requested address (step S1003). The search considers only lines that are not in the I (invalid) state.

After the search, snoop control unit #1 determines whether a line with a matching address exists (step S1004). If no such line exists (step S1004: No), snoop control unit #1 ends its response to the line fetch. If such a line exists (step S1004: Yes), snoop control unit #1 issues a block command to the sender of the read request, which in the example of FIG. 10 is snoop control unit #0 (step S1005).

After issuing the command, snoop control unit #1 determines whether the matching line is in the M state (step S1006). If it is (step S1006: Yes), snoop control unit #1 writes the data of that line to the shared memory 202 (step S1007). After the write, or if the line is in a state other than M (step S1006: No), snoop control unit #1 changes the line to the S state (step S1008) and ends its response to the line fetch.

Snoop control unit #0 determines whether it has been blocked by a block command from any of snoop control units #1 to #3 (step S1009). If it has not been blocked (step S1009: No), snoop control unit #0 stores the read line in cache line storage unit #0 in the E state (step S1010) and ends the line fetch. If it has been blocked (step S1009: Yes), snoop control unit #0 resends the one-line read request to the bus 110 (step S1011). It then stores the read line in cache line storage unit #0 in the S state (step S1012) and ends the line fetch.
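The line fetch of FIG. 10 can be simulated with the following sketch. It is a toy model of the MESI transitions described above, not of the bus timing. The block command and the resent read collapse into a single return value. All class and function names are hypothetical. The `others` argument plays the role of the broadcast destinations, so a cache excluded from coherency would simply be left out of it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class Line:
    state: State
    data: int


class SnoopCache:
    """Toy snoop-compatible cache; an address with no entry is treated as the I state."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory                  # stands in for shared memory 202
        self.lines: Dict[int, Line] = {}

    def respond_to_read(self, addr: int) -> bool:
        """Remote side, S1003-S1008. Returns True if a block command was issued (S1005)."""
        line = self.lines.get(addr)
        if line is None or line.state is State.I:  # S1003/S1004: only non-I lines match
            return False
        if line.state is State.M:                  # S1006/S1007: write the dirty line back
            self.memory[addr] = line.data
        line.state = State.S                       # S1008
        return True


def line_fetch(requester: SnoopCache, others: List[SnoopCache], addr: int) -> None:
    """Requesting side, S1001-S1012; `others` are the caches reached by the broadcast."""
    blocked = False
    for cache in others:                           # S1002: broadcast a one-line read
        blocked |= cache.respond_to_read(addr)     # every destination responds
    # S1009-S1012: unblocked -> store as E; blocked -> the resent read sees the
    # written-back memory and the line is stored as S.
    state = State.S if blocked else State.E
    requester.lines[addr] = Line(state, requester.memory[addr])


memory = {0x100: 1, 0x200: 7}
c0, c1, c2, c3 = (SnoopCache(memory) for _ in range(4))
c1.lines[0x100] = Line(State.M, 42)                # CPU #1 holds a modified copy
line_fetch(c0, [c1, c2, c3], 0x100)
print(memory[0x100], c0.lines[0x100].state, c1.lines[0x100].state)  # 42 State.S State.S
line_fetch(c0, [c1, c2, c3], 0x200)
print(c0.lines[0x200].state)                       # State.E
```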

FIG. 11 is a flowchart showing how snoop control unit #0 writes to a line. The figure assumes that snoop-compatible cache #0 has received a write request from CPU #0. Snoop control units #1 to #3 likewise perform write processing in response to write requests from their corresponding CPUs.

Snoop control unit #0 decides to write to a line (step S1101). It then determines whether the line to be written is in the S state (step S1102). If it is (step S1102: Yes), snoop control unit #0 broadcasts an invalidation request to snoop control units #1 to #3 (step S1103). After sending the invalidation request, or if the line is not in the S state (step S1102: No), snoop control unit #0 changes the line to the M state (step S1104). It then writes the data to the line (step S1105) and ends the write processing.

On receiving the invalidation request sent in step S1103, snoop control units #1 to #3 begin responding to the write. All of them perform the same processing, so the following description uses snoop control unit #1 for simplicity.

Snoop control unit #1 searches the tag area of cache line storage unit #1 for a line whose address matches the requested address (step S1106), considering only lines that are not in the I (invalid) state. It then determines whether a matching line exists (step S1107). If one exists (step S1107: Yes), snoop control unit #1 invalidates it by setting it to the I state (step S1108). After invalidating the line, or if no matching line exists (step S1107: No), snoop control unit #1 ends its response to the write.
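The write path of FIG. 11 can be simulated in the same toy style as below. It assumes the line being written is already cached in the M, E, or S state. A write miss would first go through a line fetch, which is not modeled here. The function names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


@dataclass
class Line:
    state: State
    data: int


Cache = Dict[int, Line]  # address -> line; no entry means the I state


def respond_to_invalidate(cache: Cache, addr: int) -> None:
    """Remote side, S1106-S1108."""
    line = cache.get(addr)
    if line is not None and line.state is not State.I:  # S1106/S1107: non-I match
        line.state = State.I                             # S1108: invalidate the copy


def write_line(cache: Cache, others: List[Cache], addr: int, value: int) -> None:
    """Writing side, S1101-S1105; assumes the line is already cached (M, E or S)."""
    line = cache[addr]
    if line.state is State.S:                            # S1102
        for remote in others:                            # S1103: broadcast invalidation
            respond_to_invalidate(remote, addr)
    line.state = State.M                                 # S1104
    line.data = value                                    # S1105


c0: Cache = {0x100: Line(State.S, 5)}
c1: Cache = {0x100: Line(State.S, 5)}
write_line(c0, [c1, {}, {}], 0x100, 9)
print(c0[0x100].state, c0[0x100].data, c1[0x100].state)  # State.M 9 State.I
```

Only a write to an S-state line generates bus traffic. Lines in the E or M state are private to the writer, so no invalidation is broadcast for them.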

FIG. 12 is a flowchart showing the coherency control process. Any of CPUs #0 to #3 may execute the coherency control process. FIG. 12 is described for the case in which CPU #0 executes it.

CPU #0 detects that a rescheduling request has been accepted (step S1201). The coherency control process runs every time threads are rescheduled. It may therefore reside inside the scheduler 502, or outside the scheduler 502 in a state in which it can be notified. CPU #0 determines the thread to be assigned to the CPU targeted by the rescheduling (step S1202). The target CPU may be CPU #0 itself. If the scheduler 502 also schedules other CPUs, the target may instead be another CPU whose rescheduling request was accepted.

After the determination, CPU #0 checks whether the allocation prohibition flag is set (step S1203). If it is set (step S1203: Yes), CPU #0 executes step S1203 again after a fixed interval. If it is not set (step S1203: No), CPU #0 sets the allocation prohibition flag for every CPU other than the target CPU (step S1204).

After setting the flags, CPU #0 executes the coherency target CPU determination process (step S1205), which will be described in detail later with reference to FIG. 13. CPU #0 then checks whether that process has changed any CPU from the coherency-executing state to the stopped state (step S1206). If so (step S1206: Yes), CPU #0 flushes the cache of the target CPU (step S1207). In a flush, for example, the snoop control unit writes lines whose data has been updated, such as lines in the M state, back to the shared memory 202. It then sets every line, including the former M-state lines, to the I state.
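The flush operation in step S1207 can be written as the short sketch below. A cache is modeled as a dictionary from address to a (MESI state letter, data) pair, and the function name is hypothetical. Only M-state lines carry data newer than the shared memory, so they are the only lines written back.

```python
from typing import Dict, Tuple

Line = Tuple[str, int]  # (MESI state letter, data)


def flush(lines: Dict[int, Line], memory: Dict[int, int]) -> None:
    """Write back updated (M-state) lines, then set every line to the I state."""
    for addr, (state, data) in list(lines.items()):
        if state == "M":
            memory[addr] = data     # only updated data needs writing back
        lines[addr] = ("I", data)   # every line, including former M lines, becomes I


memory = {0x100: 1, 0x200: 2}
cache0 = {0x100: ("M", 10), 0x200: ("S", 2)}
flush(cache0, memory)
print(memory)  # {256: 10, 512: 2}
print(cache0)  # {256: ('I', 10), 512: ('I', 2)}
```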

After the flush, or if no CPU changed to the stopped state (step S1206: No), CPU #0 controls the cache coherency mechanism 201 (step S1208). As a concrete example of this control, assume that in step S1205 the target CPU was CPU #0 and the process decided to stop coherency between CPU #0 and CPU #1.

In this case, CPU #0 instructs snoop-compatible cache #0 to remove snoop-compatible cache #1 from its broadcast destinations. Specifically, CPU #0 does this by changing the setting register of snoop-compatible cache #0.

Similarly, CPU #0 instructs snoop-compatible cache #1 to remove snoop-compatible cache #0 from its broadcast destinations. Excluding these caches from the broadcast destinations reduces the volume of broadcasts sent in steps S1002 and S1103, and thus reduces traffic on the bus 110. In addition, a snoop-compatible cache that has been removed from the broadcast destinations no longer has to perform the corresponding response processing, which reduces its processing load.
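The register change described above can be sketched by modeling each setting register as a bitmask of broadcast destinations. The bitmask layout (bit i set means "broadcast to snoop-compatible cache #i") and the `SnoopCache` and `stop_coherency` names are hypothetical; the text does not specify the register format. Note that stopping coherency edits both caches' registers, matching the two instructions CPU #0 issues.

```python
class SnoopCache:
    """Hypothetical model of a snoop-compatible cache.

    The setting register is modeled as a bitmask in which bit i set means
    "broadcast coherency requests to snoop-compatible cache #i".
    """

    def __init__(self, cache_id: int, num_caches: int) -> None:
        self.cache_id = cache_id
        # Initially broadcast to every cache except this one.
        self.broadcast_mask = ((1 << num_caches) - 1) & ~(1 << cache_id)

    def remove_destination(self, other_id: int) -> None:
        self.broadcast_mask &= ~(1 << other_id)

    def add_destination(self, other_id: int) -> None:
        self.broadcast_mask |= 1 << other_id

    def destinations(self) -> list[int]:
        return [i for i in range(self.broadcast_mask.bit_length())
                if (self.broadcast_mask >> i) & 1]


def stop_coherency(caches: list[SnoopCache], a: int, b: int) -> None:
    """Stop coherency between caches #a and #b by editing both registers."""
    caches[a].remove_destination(b)
    caches[b].remove_destination(a)
```

With four caches, calling `stop_coherency(caches, 0, 1)` leaves caches #0 and #1 broadcasting only to #2 and #3, while cache #2 still broadcasts to #0, #1, and #3.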

After this control, CPU #0 clears the allocation prohibition flags of the other CPUs (step S1209) and allocates the determined thread to the target CPU (step S1210). After the allocation, CPU #0 ends the coherency control process.
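The ordering of steps S1206 to S1210 can be summarized as a plan of actions. This sketch rests on assumptions: the `plan_coherency_update` name and the action tuples are hypothetical, CPUs with no previous decision are assumed to have had coherency enabled, and the mechanism is assumed to be reprogrammed for every CPU pair (the text only gives the stop case as an example).

```python
def plan_coherency_update(target_cpu: int,
                          previous: dict[int, bool],
                          decisions: dict[int, bool],
                          thread_id: str) -> list[tuple]:
    """Return the ordered actions for steps S1206 to S1210 as tuples.

    previous / decisions map each other CPU to True (coherency with the
    target CPU executed) or False (stopped); CPUs missing from `previous`
    are assumed to have coherency enabled.
    """
    actions: list[tuple] = []
    # S1206: does any CPU change from "executed" to "stopped"?
    if any(not on and previous.get(cpu, True) for cpu, on in decisions.items()):
        # S1207: flush the target CPU's cache before cutting it off.
        actions.append(("flush", target_cpu))
    # S1208: reprogram the broadcast destinations for every CPU pair.
    for cpu, on in sorted(decisions.items()):
        actions.append(("enable" if on else "disable", target_cpu, cpu))
    # S1209: release the allocation prohibition flags of the other CPUs.
    actions.append(("clear_prohibit_flags",))
    # S1210: allocate the decided thread to the target CPU.
    actions.append(("allocate", thread_id, target_cpu))
    return actions
```

The key point is that the flush is placed before the broadcast destinations are cut, so no modified line is left in a cache that other caches will no longer snoop.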

FIG. 13 is a flowchart showing the coherency target CPU determination process. Like the coherency control process, this process may be executed by any of CPUs #0 to #3. As with FIG. 12, FIG. 13 is described for the case where CPU #0 executes the process. The coherency target CPU determination process receives, as arguments from the coherency control process, the target CPU and the target thread, that is, the thread scheduled to be allocated.

CPU #0 selects an unselected CPU from among the CPUs other than the target CPU (step S1301). CPU #0 then determines whether the process to which the target thread belongs is the same as the process to which the thread running on the selected CPU belongs (step S1302).

If they are the same (step S1302: Yes), CPU #0 determines whether the thread running on the selected CPU is included in the target thread's field listing the IDs of threads that do not share data with it (step S1303). If it is included (step S1303: Yes), CPU #0 decides to stop coherency between the target CPU and the selected CPU (step S1304). If it is not included (step S1303: No), CPU #0 decides to execute coherency between the target CPU and the selected CPU (step S1305).

If the process to which the target thread belongs differs from the process to which the thread running on the selected CPU belongs (step S1302: No), CPU #0 determines whether the two processes use a common inter-process communication area (step S1306). If they do (step S1306: Yes), CPU #0 determines whether the target thread and the thread running on the selected CPU use a common inter-process communication area (step S1307).

If they do (step S1307: Yes), CPU #0 decides to execute coherency between the target CPU and the selected CPU (step S1308). If no common inter-process communication area is used (step S1306: No, or step S1307: No), CPU #0 decides to stop coherency between the target CPU and the selected CPU (step S1309).

After any of steps S1304, S1305, S1308, and S1309, CPU #0 determines whether any CPU other than the target CPU remains unselected (step S1310). If so (step S1310: Yes), CPU #0 returns to step S1301. If not (step S1310: No), CPU #0 ends the coherency target CPU determination process.
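The decision loop of FIG. 13 (steps S1301 to S1310) can be sketched as follows. The data model is an assumption for illustration: `Thread`, `no_share_tids` (a stand-in for the "thread IDs that do not share data" field), `ipc_areas`, and `process_ipc` are hypothetical names, and CPUs with no running thread are simply assumed to be absent from `running`, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thread:
    tid: str
    pid: str
    # Hypothetical stand-in for the "thread ID list of threads that do not
    # share data" field held in the dependency information.
    no_share_tids: frozenset[str] = frozenset()
    # Inter-process communication areas this thread uses.
    ipc_areas: frozenset[str] = frozenset()


def decide_coherency(target_cpu: int,
                     target_thread: Thread,
                     running: dict[int, Thread],
                     process_ipc: dict[str, set[str]]) -> dict[int, bool]:
    """Return {cpu: True to execute coherency, False to stop} for each
    CPU other than target_cpu that is running a thread."""
    decisions: dict[int, bool] = {}
    for cpu, other in running.items():               # S1301 / S1310 loop
        if cpu == target_cpu:
            continue
        if target_thread.pid == other.pid:           # S1302: same process
            # S1303: listed as non-sharing -> stop (S1304), else execute (S1305)
            decisions[cpu] = other.tid not in target_thread.no_share_tids
        elif (process_ipc.get(target_thread.pid, set())
              & process_ipc.get(other.pid, set())):  # S1306: processes share an IPC area
            # S1307: execute (S1308) only if the two threads share an IPC area
            decisions[cpu] = bool(target_thread.ipc_areas & other.ipc_areas)
        else:
            decisions[cpu] = False                   # S1309
    return decisions
```

For example, if thread A-1 (which lists A-2 as non-sharing) is allocated to CPU #0 while A-2, A-3, and a thread B-1 of another process run on CPUs #1 to #3, coherency is stopped with CPU #1, kept with CPU #2, and stopped with CPU #3 unless A-1 and B-1 use a common IPC area.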

As described above, according to the multi-core processor system, the cache coherency control method, and the cache coherency control program, it is determined whether the areas accessed by the first and second threads running on the first and second cores are different. When the areas are determined to be different, the multi-core processor system stops coherency between the first cache memory corresponding to the first core and the second cache memory corresponding to the second core. Because the threads access different areas, stopping coherency does not cause inconsistency in shared data. The multi-core processor system can therefore reduce the traffic of the cache coherency mechanism, which lowers power consumption and prevents delays caused by increased traffic.

The multi-core processor system may also stop coherency when it determines that the first and second threads belong to the same process and that there is no shared data accessed in common by both. This allows the system to reduce the traffic of the cache coherency mechanism, and thus its power consumption, even when two cores are running threads of the same process. The present embodiment is particularly effective in embedded devices such as mobile phones, where multiple processes are rarely started at the same time and multiple threads belonging to the same process are frequently executed.

Furthermore, some embedded OSs installed in embedded devices have no concept of a process, and all threads (tasks) access the same memory space. One example is μITRON, described in Reference 1 below.
(Reference 1: "The right embedded OS for the right job: Windows (registered trademark) Embedded CE edition (1): First, understand the difference between μITRON and Windows (registered trademark) Embedded CE", TechVillage / CQ Publishing Co., Ltd., [online], [retrieved May 13, 2010], Internet <URL: http://www.kumikomi.net/archives/2009/09/itronwindows_embedded_ceoswindows_embedded_ce1.php?page=2>)

In an OS such as the one described above, every thread can access the entire memory space, so applying the conventional method leaves coherency permanently enabled. In contrast, the multi-core processor system 100 of the present embodiment can reduce the traffic and processing load of the cache coherency mechanism whenever it determines that no shared data is accessed in common, which makes it particularly effective in this setting.

The multi-core processor system may also stop coherency when the first and second threads belong to different processes and there is no inter-process communication area used in common by the first and second threads. By stopping coherency in all cases except those in which it must be executed even across different processes, the system reduces traffic and processing load, which lowers power consumption and prevents delays caused by increased traffic.

The multi-core processor system may also determine whether there is an inter-process communication area used by both of the processes to which the first and second threads belong. When determining whether there is an inter-process communication area used in common by the first and second threads, the system may first determine whether at least one of the two threads uses no inter-process communication area at all.

Determining whether a thread uses an inter-process communication area at all requires less processing than determining whether an area is used in common. Therefore, by first checking whether the first and second threads use inter-process communication areas and, only if both do, checking whether an area is used in common, the overall processing load can be reduced.
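The cheap-check-first ordering described above can be sketched as a short function. The set-based representation of each thread's IPC areas and the `share_ipc_area` name are assumptions for illustration; the point is only that the emptiness test runs before any comparison of area identifiers.

```python
def share_ipc_area(areas_a: set[str], areas_b: set[str]) -> bool:
    """Return True if two threads use at least one common IPC area."""
    # Cheap check first: a thread that uses no IPC area cannot share one,
    # so the more expensive common-area test is skipped entirely.
    if not areas_a or not areas_b:
        return False
    # Only when both threads use IPC areas: scan the smaller set and
    # probe the larger one.
    small, large = sorted((areas_a, areas_b), key=len)
    return any(area in large for area in small)
```

Because embedded workloads often have threads that use no IPC at all, the early return is expected to resolve many pairs without touching the area lists.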

In addition, when the multi-core processor system stops coherency between the first cache memory and the second cache memory, it may erase the shared data stored in the first cache memory from the first cache memory. This allows the system to keep the shared memory consistent even after coherency has been stopped.

The cache coherency control method described in the present embodiment can be realized by executing a program prepared in advance on a computer such as a personal computer or a workstation. The cache coherency control program is recorded on a computer-readable recording medium such as a hard disk, a flexible disk, a CD-ROM, an MO, or a DVD, and is executed by being read from the recording medium by the computer. The cache coherency control program may also be distributed via a network such as the Internet.

#0, #1 CPU, snoop-compatible cache
A-1, A-2 Thread
100 Multi-core processor system
110 Bus
501 Dependency information
502 Scheduler
503 Execution unit
504 Detection unit
505 Identification unit
506 Determination unit
507 Control unit

Claims

A multi-core processor system comprising:
a plurality of cores; and
a plurality of cache memories accessed by the respective cores, wherein
any one of the plurality of cores, when detecting a first thread executed by a first core among the plurality of cores, detects a second thread being executed by a second core other than the first core among the plurality of cores, and, when there is no shared data accessed in common during execution of the first thread and the second thread, stops coherency processing between a first cache memory corresponding to the first core and a second cache memory corresponding to the second core among the plurality of cache memories.

A cache coherency control method for a multi-core processor system having a plurality of cache memories accessed by the respective cores of a plurality of cores, wherein any one of the plurality of cores executes processing of:
detecting, when a first thread executed by a first core among the plurality of cores is detected, a second thread being executed by a second core other than the first core among the plurality of cores; and
stopping, when there is no shared data accessed in common during execution of the first thread and the second thread, coherency processing between a first cache memory corresponding to the first core and a second cache memory corresponding to the second core among the plurality of cache memories.

A cache coherency control program for a multi-core processor system having a plurality of cache memories accessed by the respective cores of a plurality of cores, the program causing any one of the plurality of cores to execute processing of:
detecting, when a first thread executed by a first core among the plurality of cores is detected, a second thread being executed by a second core other than the first core among the plurality of cores; and
stopping, when there is no shared data accessed in common during execution of the first thread and the second thread, coherency processing between a first cache memory corresponding to the first core and a second cache memory corresponding to the second core among the plurality of cache memories.