JP2021117577A

JP2021117577A - Information processing device, information processing method and program

Info

Publication number: JP2021117577A
Application number: JP2020009086A
Authority: JP
Inventors: 武早坂; Takeshi Hayasaka
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2020-01-23
Filing date: 2020-01-23
Publication date: 2021-08-10
Anticipated expiration: 2040-01-23
Also published as: JP7434925B2

Abstract

To provide an information processing device, an information processing method, and a program that can store information corresponding to the start time and the end time of a waiting time in parallel processing.
SOLUTION: An information processing device 10 that processes a plurality of processing units 20 and 21 in parallel includes: operation information storage sections 20-1 and 21-1, which store information on the operation status of their own processing unit (hereinafter, operation information) in a storage area that the other processing unit can read; and information acquisition sections 20-4 and 21-4. When a predetermined waiting time occurs before a predetermined condition is satisfied by the other processing unit, each information acquisition section reads the operation information of that other processing unit from the storage area at the start and at the end of the waiting time. It then stores information corresponding to the read operation information in a predetermined storage section.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an information processing device, an information processing method, and a program.

The information processing apparatus described in Patent Document 1 executes a plurality of processes in parallel. When a communication wait occurs in inter-process communication, it measures the communication wait time, the data transfer time, and the like, and stores them in a predetermined file. To do so, it collects the communication wait start time, the communication wait end time, and the data transfer end time. It then calculates the communication wait time and the data transfer time from the differences between these times. After the data transfer, it also collects other information, such as the data size and information on the communication partner process, and stores that information in the file.

Patent Document 1: JP-A-2009-199121

The information processing apparatus described in Patent Document 1 collects other information, such as the communication partner process information, only after the data transfer between processes. It then stores that information in a predetermined file. Consequently, if some information changes between the start and the end of the waiting time, the apparatus cannot record that change.

An object of the present invention is to provide an information processing device, an information processing method, and a program that solve the above problem.

To solve the above problem, one aspect of the present invention is an information processing device that processes a plurality of processing units in parallel. Each processing unit includes an operation information storage unit and an information acquisition unit. The operation information storage unit stores information on the operation state of its own processing unit (hereinafter, operation information) in a storage area that the other processing units can read. The information acquisition unit acts when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit. At the start and at the end of the waiting time, it reads the operation information of that other processing unit from the storage area, and it stores information corresponding to the read operation information in a predetermined storage unit.

Another aspect of the present invention is an information processing method for processing a plurality of processing units in parallel. The method includes the following steps in each processing unit. First, a step of storing information on the operation state of its own processing unit (hereinafter, operation information) in a storage area that the other processing units can read. Second, a step performed when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit: reading the operation information of that other processing unit from the storage area at the start and at the end of the waiting time, and storing information corresponding to the read operation information in a predetermined storage unit.

Yet another aspect of the present invention is a program that causes a computer, when processing a plurality of processing units in parallel, to execute the following steps in each processing unit. First, a step of storing information on the operation state of its own processing unit (hereinafter, operation information) in a storage area that the other processing units can read. Second, a step performed when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit: reading the operation information of that other processing unit from the storage area at the start and at the end of the waiting time, and storing information corresponding to the read operation information in a predetermined storage unit.

According to each aspect of the present invention, information corresponding to the start and the end of a waiting time in parallel processing can be stored.

FIG. 1 is a block diagram showing a basic configuration example of an embodiment of the information processing device according to the present invention.
FIG. 2 is a block diagram showing a functional configuration example of the information processing device according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing an operation example of the node 1-1 shown in FIG. 2.
FIG. 4 is a block diagram showing a functional configuration example of the information processing device according to the second embodiment of the present invention.
FIG. 5 is a flowchart showing an operation example of the node 1-1 shown in FIG. 4.
FIG. 6 is a diagram showing the minimum configuration of an information processing device according to an embodiment of the present invention.
FIG. 7 is a diagram showing the processing flow of the information processing device with the minimum configuration according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings. In the drawings, identical or corresponding components are given the same reference numerals, and their description is omitted where appropriate.

<Basic configuration example of the embodiment>
FIG. 1 is a block diagram showing a basic configuration example of an information processing device according to an embodiment of the present invention. The information processing device 10 shown in FIG. 1 can be configured as a computer such as a server, a personal computer, or a tablet terminal, or as such a computer together with peripheral devices.

The information processing device 10 has a processing unit 20 and a processing unit 21 as functional components. Each is realized by combining hardware of the computer (or of the computer and its peripheral devices) with software. The hardware includes one or more CPUs (central processing units), storage devices such as main and auxiliary storage, input/output devices, and communication devices. The software includes programs executed by the one or more CPUs.

The processing units 20 and 21 are functional components that divide a single job and execute its parts in parallel. In a distributed parallel program, for example, a processing unit corresponds to a process combined with the hardware that executes it. In a shared-memory parallel program, it corresponds to a thread (or task; hereafter only "thread" is mentioned) combined with the hardware that executes it. In the present embodiment, the information processing device 10 processes the plurality of processing units 20 and 21 in parallel. The information processing device 10 may also include three or more processing units.

The processing unit 20 includes an operation information storage unit 20-1, a storage area 20-2 that stores operation information 20-3, an information acquisition unit 20-4, and a storage unit 20-5. Likewise, the processing unit 21 includes an operation information storage unit 21-1, a storage area 21-2 that stores operation information 21-3, an information acquisition unit 21-4, and a storage unit 21-5.

The processing units 20 and 21 have the same basic configuration. The following pairs correspond to each other and have the same basic configuration: the operation information storage units 20-1 and 21-1, the storage areas 20-2 and 21-2, the operation information 20-3 and 21-3, the information acquisition units 20-4 and 21-4, and the storage units 20-5 and 21-5. The following description therefore focuses on the processing unit 20, and the description of the processing unit 21 is omitted where appropriate.

The processing units 20 and 21 include, for example, a user program. They also include one or more other functional components (not shown) as components of that user program. The storage units 20-5 and 21-5 may be a single shared storage unit.

The operation information storage unit 20-1 stores information on the operation state of its own processing unit 20 (operation information 20-3) in the storage area 20-2. The storage area 20-2 is an area that its own processing unit 20 can read and write and that the other processing unit 21 can read directly. Here, "can read directly" means that the other processing unit 21 can read the data by executing, for example, a memory read instruction or a register read instruction. It does not need to execute a program for transferring data between the processing units 20 and 21.

The operation information 20-3 includes, for example:
- the value of an execution counter, such as the PC (program counter);
- the values of performance counters, such as the number of executed instructions, the number of memory accesses, the branch prediction success rate, and the cache memory hit rate;
- the user routine being executed;
- the call history of user routines.

How and when the operation information storage unit 20-1 stores the operation information 20-3 in the storage area 20-2 is not limited.
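The following Python sketch roughly illustrates this arrangement. Operation information is modeled as an immutable record, and the storage areas as per-unit slots. Each unit overwrites only its own slot, and any unit can read any slot directly, without exchanging messages. The names `OperationInfo`, `SharedArea`, `publish`, and `read` are hypothetical. A lock with whole-record replacement stands in for whatever memory or register access the actual hardware provides.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationInfo:
    """Hypothetical snapshot of one processing unit's operation state."""
    program_counter: int = 0          # execution counter
    executed_instructions: int = 0    # performance counter
    memory_accesses: int = 0          # performance counter
    current_routine: str = ""         # user routine being executed
    call_history: tuple = ()          # outermost call first


class SharedArea:
    """Hypothetical storage area: one slot per unit. Only the owner writes
    its slot, but every unit may read every slot directly."""

    def __init__(self, unit_ids):
        self._slots = {uid: OperationInfo() for uid in unit_ids}
        self._lock = threading.Lock()

    def publish(self, unit_id, info):
        # Owner side: replace the whole record so readers never see a mix
        # of old and new fields.
        with self._lock:
            self._slots[unit_id] = info

    def read(self, unit_id):
        # Reader side: another unit fetches the record without the
        # owner's cooperation.
        with self._lock:
            return self._slots[unit_id]


area = SharedArea([20, 21])
area.publish(21, OperationInfo(program_counter=0x400, executed_instructions=1000,
                               current_routine="solve", call_history=("main", "solve")))
print(area.read(21).current_routine)  # solve
```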

The information acquisition unit 20-4 acts when a predetermined waiting time occurs before a predetermined condition is satisfied in the other processing unit 21. At the start and at the end of the waiting time, it reads the operation information 21-3 of the other processing unit 21 from the storage area 21-2. It then stores information corresponding to the read operation information 21-3 in the predetermined storage unit 20-5.

The predetermined condition can take several forms. For example, the processing unit 20 may be waiting to communicate with the other processing unit 21; the condition is then satisfied when the other processing unit 21 becomes (or already is) ready to communicate with the processing unit 20. Alternatively, the processing unit 20 may be attempting to synchronize with the other processing unit 21; the condition is then satisfied when the other processing unit 21 becomes (or already is) able to synchronize with the processing unit 20. The predetermined waiting time is, for example, a waiting time longer than the time needed to check at least once whether the predetermined condition is satisfied.
The information corresponding to the operation information 21-3 is either the operation information 21-3 itself or information generated from it. When the operation information 21-3 includes numerical values, the generated information is, for example, the difference between the value at the start and the value at the end of the waiting time.
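Below is a minimal sketch of how such "information corresponding to the operation information" might be derived, assuming the two snapshots are plain dictionaries. Numeric items are reduced to their end-minus-start difference. All other items, such as the routine being executed, are kept as a (start, end) pair, so a change across the wait is not lost. The function name `info_for_wait` and the item names are hypothetical.

```python
from numbers import Number


def _is_numeric(value):
    # bool is a subclass of int, but a flag is not a counter
    return isinstance(value, Number) and not isinstance(value, bool)


def info_for_wait(start, end):
    """Build the record stored for one waiting period from the snapshot
    taken at its start and the snapshot taken at its end."""
    record = {}
    for key in start.keys() | end.keys():
        a, b = start.get(key), end.get(key)
        if _is_numeric(a) and _is_numeric(b):
            record[key] = b - a      # e.g. instructions executed during the wait
        else:
            record[key] = (a, b)     # e.g. routine at start vs. routine at end
    return record
```

For example, take `start = {"executed_instructions": 1000, "current_routine": "solve"}` and `end = {"executed_instructions": 1750, "current_routine": "reduce"}`. The resulting record equals `{"executed_instructions": 750, "current_routine": ("solve", "reduce")}`.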

As described above, the information processing device 10 shown in FIG. 1 operates symmetrically in both processing units.

In the processing unit 20, the information acquisition unit 20-4 acts when a predetermined waiting time occurs before the predetermined condition is satisfied in the other processing unit 21. At the start and at the end of the waiting time, it reads the operation information 21-3 of that unit from the storage area 21-2. It then stores information corresponding to the read operation information 21-3 in the predetermined storage unit 20-5.

In the processing unit 21, the information acquisition unit 21-4 acts when a predetermined waiting time occurs before the predetermined condition is satisfied in the other processing unit 20. At the start and at the end of the waiting time, it reads the operation information 20-3 of that unit from the storage area 20-2. It then stores information corresponding to the read operation information 20-3 in the predetermined storage unit 21-5.

With this configuration, information corresponding to the start and the end of a waiting time in parallel processing can be stored in the storage units 20-5 and 21-5.

In the information processing device 10, the operation information storage unit 20-1 stores the operation information 20-3 in the storage area 20-2, which the other processing unit 21 can read. The operation information 20-3 is information on the operation state of its own processing unit 20. Likewise, the operation information storage unit 21-1 stores the operation information 21-3 of its own processing unit 21 in the storage area 21-2, which the other processing unit 20 can read. With this configuration, the information acquisition units 20-4 and 21-4 can efficiently read the operation information 21-3 and 20-3 of the other processing units 21 and 20, respectively.

The storage area 20-2 may include a stack area that holds the user routine being executed in its own processing unit 20 and the call history of user routines. In this case, the running user routine and the call history are included in the operation information 20-3. Similarly, the storage area 21-2 may include a stack area that holds the user routine being executed in its own processing unit 21 and the call history of user routines. In this case, they are included in the operation information 21-3.

As a result, the information acquisition unit 20-4 can directly read the operation information 21-3, including the running user routine and call history of the other processing unit 21. Likewise, the information acquisition unit 21-4 can directly read the operation information 20-3, including the running user routine and call history of the other processing unit 20.
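For threads that share one address space, CPython offers a rough analogue of reading another unit's stack area directly. `sys._current_frames()` maps each live thread's identifier to its topmost frame, and `traceback.extract_stack()` turns that frame into a call history. The sketch below is CPython-specific; the leading underscore marks `sys._current_frames` as an implementation detail. The helper name `read_call_history` and the demo functions `outer` and `inner` are hypothetical.

```python
import sys
import threading
import traceback


def read_call_history(thread):
    """Return the call history (outermost call first) of another live thread,
    read from its stack without that thread's cooperation.
    Returns None if the thread is not currently running."""
    frame = sys._current_frames().get(thread.ident)
    if frame is None:
        return None
    return [entry.name for entry in traceback.extract_stack(frame)]


# Demo: the partner thread stays busy inside inner(), called from outer().
entered = threading.Event()
release = threading.Event()


def inner():
    entered.set()
    release.wait()  # the partner is "busy" here


def outer():
    inner()


partner = threading.Thread(target=outer)
partner.start()
entered.wait()                        # partner is now inside inner()
history = read_call_history(partner)
release.set()
partner.join()
print(history)  # the innermost entries depend on timing; "outer" precedes "inner"
```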

The operation information 20-3 and 21-3 may also include numerical information that changes according to the operation state of the processing units 20 and 21. In this case, the information acquisition units 20-4 and 21-4 can store, for example, the difference between the numerical information at the start and at the end of the waiting time in the storage units 20-5 and 21-5, respectively.

<First Embodiment>
Next, the first embodiment of the present invention is described with reference to FIGS. 2 and 3. FIG. 2 is a block diagram showing a configuration example of the information processing device according to the first embodiment of the present invention as a node 1-1. Here, a node is a logical (or functional) representation of a component that makes up a communication network, such as a computer, a terminal, or a communication device. The plurality of nodes 1-1 shown in FIG. 2 form a cluster system 100 connected by an inter-node interconnect 1-7. FIG. 3 is a flowchart showing an operation example of each node 1-1 shown in FIG. 2.

Each node 1-1 shown in FIG. 2 can be configured using a computer such as a server, a personal computer, or a tablet terminal, or using such a computer together with peripheral devices. Each node 1-1 has a plurality of CPU cores 1-3 and a shared memory 1-6 that all of the CPU cores 1-3 can access. Each CPU core 1-3 processes one process 1-2, so each node 1-1 uses its plurality of CPU cores 1-3 to process a plurality of processes 1-2 in parallel.

In the first embodiment, the plurality of processes 1-2 constitute a distributed parallel program. Each process 1-2 comprises the CPU core 1-3 that executes it and a process memory 1-5 allocated in the shared memory 1-6.

Each CPU core 1-3 has an execution counter, such as a PC (program counter), and performance counters. The performance counters are storage areas that hold the number of executed instructions, the number of memory accesses, the branch prediction success rate, the cache memory hit rate, and the like. These are collectively referred to as the execution counter and performance counter group 1-4. Each CPU core 1-3 has means for accessing the execution counter and performance counter groups 1-4 of the other CPU cores 1-3. It also has means for accessing the process memories 1-5 allocated to the other CPU cores 1-3.

Next, inter-process communication processing is described with reference to FIG. 3 as an operation example of each node 1-1 shown in FIG. 2. The processing shown in FIG. 3 is executed when a process 1-2 executed by one node 1-1 (the first process) transfers data to or from a process 1-2 executed by another node 1-1 (the second process). The first process first determines whether it can communicate with the second process, its communication target (3-1). If communication is possible ("true" in 3-1), the first process performs the data transfer processing (3-10) with the second process and completes the inter-process communication processing. In this case no communication wait occurs, so no profile information is collected.

If 3-1 determines that the first process cannot yet communicate with the second process ("false" in 3-1), a communication wait begins. In the communication wait processing, the first process obtains the time at which the communication wait started (T0) (3-2). It then repeatedly checks whether communication is possible while obtaining the current time (T1) (3-3, 3-4, 3-5). It stops checking once a predetermined fixed time has elapsed, that is, once T1 - T0 exceeds the fixed time. The fixed time is chosen to be sufficiently short compared with the communication processing time.

If communication becomes possible within the fixed time ("true" in 3-3), the processing proceeds to the data transfer processing 3-10. A communication wait did occur in this case, but because it was sufficiently short, no profile information is collected.

If the waiting time exceeds the fixed time in 3-5 ("true" in 3-5), the first process collects profile information (3-6). To do so, it refers to the memory of the second process, its communication partner. From that process's stack area it obtains the user routine being executed and the call history of user routines. The first process also collects the performance counter information of the second process.

The first process then continues waiting until communication becomes possible (3-7). Once communication is possible, it collects profile information again (3-8). Next, in 3-9, the first process saves the collected profile information in, for example, a predetermined storage unit (not shown). Specifically, it records the user routines being executed by the second process (and by the first process) and their call histories. It also records the difference between the performance counter values collected in 3-6 and in 3-8. Recording the difference of the performance counters makes it possible to determine what processing the partner process was performing while the first process was waiting for communication. Finally, the first process performs the data transfer processing 3-10 and completes the inter-process communication processing.
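The flow of FIG. 3 (3-1 to 3-10) can be sketched in Python under several assumptions:
- readiness checking, reading the partner's operation information, the data transfer, and saving the record are passed in as plain callables;
- the clock can be injected, so the logic can be exercised deterministically;
- each snapshot is a dictionary with the hypothetical keys `"routine"` and `"counters"`.

The function name `communicate` and all parameter names are illustrative and do not come from the original.

```python
import time


def communicate(can_communicate, read_partner, transfer, store,
                threshold=0.001, clock=time.monotonic):
    """Sketch of the inter-process communication flow of FIG. 3.

    can_communicate() -> bool : whether the partner is ready (3-1, 3-3, 3-7)
    read_partner()    -> dict : partner's operation information (3-6, 3-8)
    transfer()                : the data transfer itself (3-10)
    store(record)             : saves the profile record (3-9)
    """
    if can_communicate():                  # 3-1: no wait at all, no profiling
        return transfer()
    t0 = clock()                           # 3-2: wait start time T0
    while True:
        if can_communicate():              # 3-3: short wait, no profiling
            return transfer()
        t1 = clock()                       # 3-4: current time T1
        if t1 - t0 > threshold:            # 3-5: wait exceeded the fixed time
            break
    start = read_partner()                 # 3-6: snapshot at the start of the long wait
    while not can_communicate():           # 3-7: keep waiting (busy loop for brevity)
        pass
    end = read_partner()                   # 3-8: snapshot once communication is possible
    store({                                # 3-9: routines plus counter differences
        "routine": (start["routine"], end["routine"]),
        "counter_delta": {name: end["counters"][name] - start["counters"][name]
                          for name in start["counters"]},
    })
    return transfer()                      # 3-10
```

Passing a fake clock such as `lambda: next(ticks)` makes the threshold test reproducible. The busy loop that stands in for 3-7 is only for brevity; a real implementation would more likely block on the communication layer's own wait primitive.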

The execution counter and performance counter groups 1-4 and the process memories 1-5 can be written (updated) by several possible agents. Examples are a predetermined program in the process 1-2, the operating system that manages the execution of the processes 1-2 in the node 1-1, and the firmware or hardware that controls the CPU cores 1-3.

In the first embodiment, profile information and performance counter information are acquired at the start and at the end of the communication wait time in inter-process communication processing for distributed parallel processing. This information, or the difference between the two acquisitions, is then stored. Information corresponding to the start and the end of the waiting time can therefore be stored.

The components of the first embodiment correspond to the components shown in FIG. 1 as follows.
- The node 1-1 shown in FIG. 2 corresponds to the information processing device 10 shown in FIG. 1.
- The processes 1-2 shown in FIG. 2 correspond to the processing units 20 and 21.
- The execution counter and performance counter groups 1-4 and the process memories 1-5 shown in FIG. 2 correspond to the storage areas 20-2 and 21-2.
- The component (program) in the process 1-2 that executes the processing of (3-6), (3-8), and (3-9) in FIG. 3 corresponds to the information acquisition units 20-4 and 21-4.
- The information stored in the execution counter and performance counter groups 1-4 and the process memories 1-5 shown in FIG. 2 (or the information collected in (3-6) and (3-8) of FIG. 3) corresponds to the operation information 20-3 and 21-3.
- The destination where the profile information is saved in (3-9) of FIG. 3 corresponds to the storage units 20-5 and 21-5.
- The component of the node 1-1 shown in FIG. 2 that writes predetermined information to the execution counter and performance counter groups 1-4 and the process memories 1-5 (such as a program in the process 1-2) corresponds to the operation information storage units 20-1 and 21-1.

<Second Embodiment>
Next, the second embodiment of the present invention is described with reference to FIGS. 4 and 5. FIG. 4 is a block diagram showing a configuration example of the information processing device according to the second embodiment of the present invention as a node 1-1. FIG. 5 is a flowchart showing an operation example of the node 1-1 shown in FIG. 4.

The node 1-1 shown in FIG. 4 corresponds to the node 1-1 shown in FIG. 2. It can be configured using a computer such as a server, a personal computer, or a tablet terminal, or using such a computer together with peripheral devices. The node 1-1 has a plurality of CPU cores 1-3 and a shared memory 1-6 that all of the CPU cores 1-3 can access. Each CPU core 1-3 has its own execution counter and performance counter group 1-4. The node 1-1, CPU cores 1-3, execution counter and performance counter groups 1-4, and shared memory 1-6 shown in FIG. 4 are identical to the components with the same reference numerals in FIG. 2.

In the second embodiment, process 2-1, which consists of a plurality of threads 2-2, constitutes a shared parallel program. Each thread 2-2 is executed by one of the CPU cores 1-3 and can access both the process memory 2-3 allocated in the shared memory 1-6 and the execution counter and performance counter groups 1-4 of the CPU cores 1-3 that execute the other threads 2-2.
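The arrangement described above, in which each thread publishes its own state where the other threads can read it, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the hardware execution and performance counters 1-4 are replaced by a software dictionary, the stack area in process memory 2-3 is replaced by an explicit list of routine names, and `OperationInfo` and `SharedOperationArea` are hypothetical names.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class OperationInfo:
    """Hypothetical per-thread record standing in for counters 1-4 and the
    stack area held in process memory 2-3 (software simulation only)."""
    routine: str = "<idle>"                            # user routine currently running
    call_history: list = field(default_factory=list)  # callers, outermost first
    counters: dict = field(default_factory=dict)       # e.g. {"instructions": 0}


class SharedOperationArea:
    """A storage area every thread can read; each thread writes only its own slot."""

    def __init__(self):
        self._lock = threading.Lock()
        self._slots = {}  # thread name -> OperationInfo

    def publish(self, name, routine, call_history, counters):
        # Writer side (role of the operation information storage unit):
        # store copies so the caller can keep mutating its own lists/dicts.
        with self._lock:
            self._slots[name] = OperationInfo(routine, list(call_history), dict(counters))

    def read(self, name):
        # Reader side: return a copy so later writes by the owner cannot alter it.
        with self._lock:
            info = self._slots[name]
            return OperationInfo(info.routine, list(info.call_history), dict(info.counters))
```

For example, after `area.publish("worker-2", "solve_block", ["main", "solve"], {"instructions": 1200})`, any other thread calling `area.read("worker-2").routine` obtains `"solve_block"`. Returning copies is a design choice of this sketch: a snapshot taken at the start of a wait stays unchanged when the partner later publishes new values.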

Next, inter-thread synchronization processing will be described with reference to FIG. 5 as an operation example of node 1-1 shown in FIG. 4. The processing shown in FIG. 5 is executed when one thread 2-2 of process 2-1 (referred to as the first thread) establishes synchronization with another thread 2-2 (referred to as the second thread). In the inter-thread synchronization processing shown in FIG. 5, the first thread first determines whether synchronization with the second thread, its synchronization target, has already been established (4-1). If synchronization has been established ("true" in 4-1), the first thread completes the inter-thread synchronization processing. In this case no synchronization wait occurs, so no profile information is collected.

If it is determined in 4-1 that synchronization with the second thread has not yet been established ("false" in 4-1), a synchronization wait occurs. In the synchronization wait processing, the first thread refers to the time at which the synchronization wait started (T0) (4-2). The first thread then keeps checking whether synchronization has been established, while referring to the current time (T1), until a predetermined fixed time elapses, that is, until T1 - T0 > fixed time (4-3, 4-4, 4-5). The fixed time is chosen to be sufficiently small compared with the synchronization processing time.

If synchronization is established within the fixed time ("true" in 4-3), the first thread completes the inter-thread synchronization processing. In this case a synchronization wait has occurred, but it is sufficiently short (within the fixed time), so no profile information is collected.
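Steps 4-1 to 4-5 amount to a bounded polling loop. The Python sketch below is illustrative only: `wait_briefly`, the `is_synchronized` callback, and the injectable `clock` are hypothetical names, and the threshold value is left to the caller because the document only requires it to be small relative to the synchronization processing time.

```python
import time


def wait_briefly(is_synchronized, threshold_s, clock=time.monotonic):
    """Sketch of steps 4-1 to 4-5 in FIG. 5 (all names are hypothetical).

    Returns (synchronized, t0):
      (True, None) -> synchronized at the first check (4-1); no wait occurred
      (True, t0)   -> synchronized within threshold_s (4-3); wait too short to profile
      (False, t0)  -> threshold exceeded (4-5); the caller should start profiling (4-6)
    """
    if is_synchronized():          # 4-1: already synchronized with the target thread?
        return True, None
    t0 = clock()                   # 4-2: time at which the synchronization wait started
    while True:
        if is_synchronized():      # 4-3: synchronization established in time?
            return True, t0
        t1 = clock()               # 4-4: current time
        if t1 - t0 > threshold_s:  # 4-5: waited longer than the fixed time?
            return False, t0
```

The clock is a parameter so the loop can be exercised with a fake time source. A production version would likely yield the CPU inside the loop (for example with `time.sleep(0)`) rather than spin at full speed.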

If the waiting time exceeds the fixed time in 4-5 ("true" in 4-5), the first thread collects profile information (4-6). As profile information, the first thread refers to the memory of the second thread, its synchronization partner, and obtains from the second thread's stack area the user routine being executed and the call history of that user routine. The first thread also collects the performance counter information of the second thread.

The first thread then continues waiting until synchronization is established (4-7) and, once synchronization is established, collects profile information again (4-8). Subsequently, in 4-9, the first thread stores the collected profile information in, for example, a predetermined storage unit (not shown). Specifically, the first thread takes the user routines and call histories being executed by the second thread (and the first thread), calculates the difference between the performance counter values collected in 4-6 and 4-8, and stores them in, for example, a predetermined storage unit (not shown). By using the performance counter difference, the inter-thread synchronization processing shown in FIG. 5 makes it possible to determine what processing the synchronization target thread (second thread) was performing while the thread in question (first thread) was waiting for synchronization. After 4-9, the first thread completes the inter-thread synchronization processing.
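The record assembled in step 4-9 can be sketched as follows, assuming the performance counters are monotonically increasing cumulative values so that the end value minus the start value measures the partner's activity during the wait. `Snapshot`, `counter_delta`, `build_profile_record`, and the record's field names are hypothetical choices for illustration; the document does not specify a storage format.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """Hypothetical view of the partner thread's operation information."""
    routine: str        # user routine being executed when the snapshot was taken
    call_history: tuple  # call history read from the partner's stack area
    counters: dict       # counter name -> cumulative value


def counter_delta(start, end):
    """Per-counter difference between the 4-8 (end) and 4-6 (start) snapshots.

    Assumes cumulative, monotonically increasing counters, so end - start is
    the partner thread's activity while the first thread was waiting.
    Counters missing from either snapshot are skipped.
    """
    return {name: end.counters[name] - start.counters[name]
            for name in start.counters if name in end.counters}


def build_profile_record(start, end, wait_s):
    """Assemble the information stored in step 4-9 (field names are assumptions)."""
    return {
        "wait_seconds": wait_s,
        "partner_routine_at_start": start.routine,
        "partner_call_history_at_start": list(start.call_history),
        "partner_routine_at_end": end.routine,
        "partner_counter_delta": counter_delta(start, end),
    }
```

For instance, a start snapshot with `{"instructions": 1000, "l1_misses": 40}` and an end snapshot with `{"instructions": 9000, "l1_misses": 340}` give a delta of `{"instructions": 8000, "l1_misses": 300}`, which shows how much the partner executed, and how cache-heavy that work was, while the first thread waited.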

According to the second embodiment, profile information and performance counter information are acquired at the start and end of the synchronization wait in inter-thread synchronization processing for shared parallel processing, and this information, or the difference between the two acquisitions, is stored. Information corresponding to the start and end of the wait can therefore be stored.

The correspondence between the components of the second embodiment and those shown in FIG. 1 is as follows. Node 1-1 shown in FIG. 4 corresponds to the information processing device 10 shown in FIG. 1. The threads 2-2 of process 2-1 shown in FIG. 4 correspond to the processing units 20 and 21 shown in FIG. 1. The execution counter and performance counter groups 1-4 and the process memory 2-3 shown in FIG. 4 correspond to the storage areas 20-2 and 21-2 shown in FIG. 1. The component (program) within thread 2-2 that executes steps (4-6), (4-8), and (4-9) shown in FIG. 5 corresponds to the information acquisition units 20-4 and 21-4 shown in FIG. 1. The information stored in the execution counter and performance counter groups 1-4 and the process memory 2-3 shown in FIG. 4 (or the information collected in steps (4-6) and (4-8) shown in FIG. 5) corresponds to the operation information 20-3 and 21-3 shown in FIG. 1. The destination to which the profile information is saved in step (4-9) shown in FIG. 5 corresponds to the storage units 20-5 and 21-5 shown in FIG. 1. The component (such as a program within thread 2-2) that writes predetermined information to the execution counter and performance counter groups 1-4 and the process memory 2-3 of node 1-1 shown in FIG. 4 corresponds to the operation information storage units 20-1 and 21-1 shown in FIG. 1.

<Other effects of the first and second embodiments>
As described above, in the first and second embodiments, during execution of a distributed parallel program (such as an MPI (Message Passing Interface) program) or a shared parallel program (such as an OpenMP (Open Multi-Processing) program), profile information of the communication partner process or of the synchronization target thread is collected when a communication wait occurs during inter-process communication (in a distributed parallel program) or a synchronization wait occurs between threads (or tasks) (in a shared parallel program), and that wait continues for a fixed time or longer. With this configuration, there is no influence from additional code for collecting information (such as an increased number of executed instructions, increased memory accesses, changes in the branch prediction success rate, or changes in cache hit and miss rates), so profile information that accurately represents the behavior of the user program can be collected. In addition, no additional resources (CPU cores, memory, and the like) are required for collecting information, so execution of the user program is not hindered. Furthermore, information can be collected specifically about the processing that causes the communication wait or synchronization wait (that is, the processing performed by the communication partner process or by the thread being waited on). Finally, because information is collected while the communication wait or synchronization wait is in progress, user program execution performance is not affected.

The first and second embodiments can be applied to, for example, profiling aimed at optimizing and speeding up distributed parallel programs, and profiling aimed at optimizing and speeding up shared parallel programs.

FIG. 6 is a diagram showing the minimum configuration of an information processing device according to an embodiment of the present invention.
FIG. 7 is a diagram showing the processing flow of the information processing device with the minimum configuration according to an embodiment of the present invention.
The information processing device 10 processes a plurality of processing units in parallel. As shown in FIG. 6, the processing unit 20 includes an operation information storage unit 20-1 and an information acquisition unit 20-4.
The operation information storage unit 20-1 stores information related to the operation state of its own processing unit (hereinafter, operation information) in a storage area that can be read by other processing units (step S7-1).
When a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit, the information acquisition unit 20-4 reads the operation information of the other processing unit from the storage area at the start and end of the waiting time, and stores information corresponding to the read operation information in a predetermined storage unit (step S7-2).
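Steps S7-1 and S7-2 can be combined into a small end-to-end sketch using Python threads as the processing units. This is an illustration under several assumptions: the shared dictionary `area` stands in for the readable storage area, the list `profile_log` stands in for the predetermined storage unit, a `threading.Event` stands in for the predetermined condition, and `ProcessingUnit`, `store_operation_info`, and `wait_for` are hypothetical names. Unlike the polling loop of FIG. 5, this sketch uses the blocking `Event.wait(timeout=...)` call to detect that the fixed time has been exceeded.

```python
import threading
import time


class ProcessingUnit:
    """Minimal sketch of processing unit 20 in FIG. 6 (all names hypothetical)."""

    def __init__(self, name, area, area_lock, profile_log):
        self.name = name
        self._area = area            # storage area readable by all units
        self._lock = area_lock
        self.profile_log = profile_log  # "predetermined storage unit"

    def store_operation_info(self, routine, counters):
        # Step S7-1: publish own operation information where others can read it.
        with self._lock:
            self._area[self.name] = {"routine": routine, "counters": dict(counters)}

    def _read(self, other):
        # Copy the other unit's record; it must already have been published.
        with self._lock:
            info = self._area[other]
            return {"routine": info["routine"], "counters": dict(info["counters"])}

    def wait_for(self, other, condition, threshold_s):
        # Step S7-2: if the wait for `other` exceeds threshold_s, read its
        # operation information at the start and end of the wait and store it.
        if condition.is_set():
            return  # condition already satisfied: no waiting time occurred
        t0 = time.monotonic()
        if condition.wait(timeout=threshold_s):
            return  # wait shorter than the threshold: nothing recorded
        start = self._read(other)
        condition.wait()  # keep waiting until the condition holds
        end = self._read(other)
        delta = {k: end["counters"][k] - start["counters"][k]
                 for k in start["counters"] if k in end["counters"]}
        self.profile_log.append({
            "waited_s": time.monotonic() - t0,
            "partner": other,
            "routine_at_start": start["routine"],
            "routine_at_end": end["routine"],
            "counter_delta": delta,
        })
```

In use, the partner unit calls `store_operation_info` as it works and sets the event when done, while the waiting unit calls `wait_for(partner_name, event, threshold_s)`. The sketch covers only threads within one process; the distributed case of the first embodiment, where the partner is a process on another node, would need some form of remote memory access instead of a shared dictionary.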

Although the embodiments of the present invention have been described above with reference to the drawings, the specific configuration is not limited to these embodiments, and design changes and the like that do not depart from the gist of the present invention are also included.

Part or all of the program executed by the computer in the above embodiments can be distributed via a computer-readable storage medium or a communication line.

1-1 Node
1-2, 2-1 Process
1-3 CPU core
1-4 Execution counter and performance counter group
1-5, 2-3 Process memory
1-6 Shared memory
1-7 Inter-node interconnect
2-2 Thread
10 Information processing device
20, 21 Processing unit
20-1, 21-1 Operation information storage unit
20-2, 21-2 Storage area
20-3, 21-3 Operation information
20-4, 21-4 Information acquisition unit
20-5, 21-5 Storage unit

Claims

1. An information processing device that processes a plurality of processing units in parallel, wherein each of the processing units includes:
an operation information storage unit that stores information related to the operation state of that processing unit (hereinafter, operation information) in a storage area readable by the other processing units; and
an information acquisition unit that, when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit, reads the operation information of the other processing unit from the storage area at the start and end of the waiting time and stores information corresponding to the read operation information in a predetermined storage unit.

2. The information processing device according to claim 1, wherein
the storage area includes a stack area that stores the user routine being executed in that processing unit and the call history of the user routine, and
the information acquisition unit reads the operation information including the user routine being executed in the other processing unit and the call history of that user routine.

3. The information processing device according to claim 1 or 2, wherein
the operation information includes numerical information that changes according to the operation state of the processing unit, and
the information acquisition unit stores, in the storage unit, the difference between the numerical information at the start and the numerical information at the end.

4. The information processing device according to any one of claims 1 to 3, wherein the processing unit is a process or a thread.

5. An information processing method for processing a plurality of processing units in parallel, the method comprising, in each of the processing units:
a step of storing information related to the operation state of that processing unit (hereinafter, operation information) in a storage area readable by the other processing units; and
a step of, when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit, reading the operation information of the other processing unit from the storage area at the start and end of the waiting time and storing information corresponding to the read operation information in a predetermined storage unit.

6. A program that, when a plurality of processing units are processed in parallel, causes a computer to execute, in each of the processing units:
a step of storing information related to the operation state of that processing unit (hereinafter, operation information) in a storage area readable by the other processing units; and
a step of, when a predetermined waiting time occurs before a predetermined condition is satisfied in another processing unit, reading the operation information of the other processing unit from the storage area at the start and end of the waiting time and storing information corresponding to the read operation information in a predetermined storage unit.