JP2004171167A

JP2004171167A - Multiprocessor computer and program

Info

Publication number: JP2004171167A
Application number: JP2002334577A
Authority: JP
Inventors: Jun Tanabe; 純田辺
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2002-11-19
Filing date: 2002-11-19
Publication date: 2004-06-17
Anticipated expiration: 2022-11-19
Also published as: JP3876818B2

Abstract

PROBLEM TO BE SOLVED: To acquire the parallel execution time of each parallel program even when a plurality of parallel programs are executed on a multiprocessor computer.
SOLUTION: When a thread is generated on a processor 100-n (1≤n≤N) and the thread is the first one in its process, the processor 100-n generates a parallel execution time table 202 unique to that process on a main storage device 200. When the processor 100-n allocates itself to a thread in the process, and when it takes itself back from a thread in the process, the processor 100-n updates the parallel execution time table 202 on the basis of the current time indicated by a system timer 205, the time of the last update of the table indicated by a time stamp 204, the number of threads indicated by a parallel counter 203, and the contents of the parallel execution time table 202. A thread acquires the contents of the parallel execution time table 202 by means of a parallel execution time acquisition system call.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、複数のプロセッサが主記憶装置を共有するマルチプロセッサ計算機に関し、特に、並列プログラムが如何に効率良く動作したかを表す指標である並列実行時間を算出する技術に関する。
【０００２】
【従来の技術】
複数のプロセッサから構成されるマルチプロセッサ計算機の性能評価を行う従来の技術として、各プロセッサ毎にプロセッサが稼働状態にあるときの開始時刻及び終了時刻と、プロセッサが非稼働状態のときの時刻とを記録し、制御部が、上記各時刻に基づいて任意の時刻におけるプロセッサの稼働台数や、稼働率を求める技術が知られている（例えば、特許文献１参照）。
【０００３】
【特許文献１】
特開平５−１８９３９５号公報
【０００４】
【発明が解決しようとする課題】
上述した従来の技術によれば、何台のプロセッサが同時に稼働しているかを知ることはできるが、マルチプロセッサ計算機を使用して並列プログラムを実行するユーザにとって重要なことは、プロセッサが何台同時に稼働しているかではなく、並列プログラムが効率的に実行されるか否かである。マルチプロセッサ計算機で実行される並列プログラムが１つの場合には、上述した従来の技術によっても並列プログラムが効率的に実行されているか否かを判定することができる。つまり、多くのプロセッサが同時に稼働している時間が長いほど、並列プログラムが効率的に実行されていると判定することができる。
【０００５】
しかし、マルチプロセッサ計算機で実行される並列プログラムが複数の場合は、プロセッサの稼働台数が分かったとしても、稼働台数に基づいて並列プログラムが効率的に実行されているか否かを判定することはできない。例えば、並列プログラムαの２つのスレッドと、並列プログラムβの１つのスレッドとがマルチプロセッサ計算機で実行されている場合、上述した従来の技術では、プロセッサの稼働台数が３台であることが分かるだけであり、並列プログラムα，β毎に、同時に実行されているスレッド数が分からないため、各並列プログラムが効率的に実行されているか否かを判定することはできない。
【０００６】
そこで、本発明の目的は、複数の並列プログラムが同時に実行されている場合でも、各並列プログラムが効率的に実行されているか否かを判定できるようにすることにある。
【０００７】
【課題を解決するための手段】
本発明にかかる第１のマルチプロセッサ計算機は、上記目的を達成するため、複数のプロセッサと、該複数のプロセッサによって共有される主記憶装置とを備えたマルチプロセッサ計算機において、
前記各プロセッサが、
自プロセッサで生成されたスレッドが、該スレッドが属するプロセス中の最初のスレッドである場合、前記主記憶装置上に、前記プロセスに固有の並列実行時間テーブルであって、複数のスレッド数毎に、そのスレッド数以上のスレッドに同時にプロセッサが割り当てられた時間を登録する領域を有する並列実行時間テーブルを生成する手段と、
自プロセッサを前記プロセス内のスレッドに割り当てるとき、及び自プロセッサを前記プロセス内のスレッドから返還させるとき、現時点においてプロセッサが割り当てられている、前記プロセス内のスレッドの数と、現時点の時刻と、現時点における前記並列実行時間テーブルの内容と、前記並列実行時間テーブルに対する最新の更新時刻とに基づいて、前記並列実行時間テーブルを更新する手段と、
前記並列実行時間テーブルの内容を取得する手段とを備えたことを特徴とする。
【０００８】
また、本発明にかかる第２のマルチプロセッサ計算機は、
第１のマルチプロセッサ計算機において、
前記並列実行時間テーブルが、前記主記憶装置以外の、前記複数のプロセッサによって共有される記憶装置上に生成されることを特徴とする。
【０００９】
より具体的には、本発明にかかる第３のマルチプロセッサ計算機は、
複数のプロセッサと、該複数のプロセッサによって共有される主記憶装置とを備えたマルチプロセッサ計算機において、
前記各プロセッサが、
自プロセッサで生成されたスレッドが、該スレッドが属するプロセス中の最初のスレッドである場合、前記主記憶装置上に、前記プロセスに固有のスレッド共有空間であって、複数のスレッド数毎に、そのスレッド数以上のスレッドに同時にプロセッサが割り当てられていた時間が登録される領域を有する並列実行時間テーブルと、前記プロセス内のスレッドの内の、現時点においてプロセッサが割り当てられているスレッドの数が設定される並列カウンタと、前記並列実行時間テーブルに対する最新の更新時刻が設定されるタイムスタンプとを含むスレッド共有空間を生成する手段と、
自プロセッサを前記プロセス内のスレッドに割り当てるとき、現在時刻と、前記並列実行時間テーブルの内容と、前記並列カウンタの内容と、前記タイムスタンプの内容とに基づいて、前記並列実行時間テーブルの内容を更新すると共に、前記タイムスタンプに現在時刻を設定し、更に、前記並列カウンタの値を増加させる手段と、
自プロセッサを前記プロセス内のスレッドから返還させるとき、現在時刻と、前記並列実行時間テーブルの内容と、前記並列カウンタの内容と、前記タイムスタンプの内容とに基づいて、前記並列実行時間テーブルの内容を更新すると共に、前記タイムスタンプに現在時刻を設定し、更に、前記並列カウンタの値を減少させる手段と、
前記並列実行時間テーブルの内容を取得する手段とを備えたことを特徴とする。
【００１０】
本発明にかかる第４のマルチプロセッサ計算機は、
第３のマルチプロセッサ計算機において、
前記スレッド共有空間が、前記主記憶装置以外の、前記複数のプロセッサによって共有される記憶装置上に生成されることを特徴とする。
【００１１】
【作用】
或るプロセッサでスレッドが生成されると、そのスレッドがプロセス中の最初のスレッドである場合、主記憶装置上に上記プロセス固有の並列実行時間テーブルが作成される。その後、何れかのプロセッサが、上記プロセス内のスレッドにプロセッサを割り当てたり、上記プロセス内のスレッドからプロセッサを返還させるとき、現時点の時刻と、上記並列実行時間テーブルに対する最新の更新時刻と、現時点においてプロセッサが割り当てられている上記プロセスに属するスレッドの数と、並列実行時間テーブルの内容とに基づいて並列実行時間テーブルを更新する。このように、プロセス固有の並列実行時間テーブルを用いて並列実行時間を管理するようにしているので、マルチプロセッサ計算機で複数の並列プログラムが同時に実行されている場合であっても、各並列プログラムが効率的に動作しているか否かを判定することが可能になる。
【００１２】
【発明の実施の形態】
次に本発明の実施の形態について図面を参照して詳細に説明する。
【００１３】
【実施例の構成】
図１は本発明にかかるマルチプロセッサ計算機の実施例のブロック図であり、複数のプロセッサ（ＣＰＵ，演算器）１００−１〜１００−Ｎと、各プロセッサ１００−１〜１００−Ｎによって共有される主記憶装置２００とから構成されている。
【００１４】
先ず、以下の説明で使用する用語の定義について述べておく。
【００１５】
プロセスとは、ＵＮＩＸ（登録商標）システムにおいて一般的にプロセスと呼ばれる概念に相当する。すなわち、プロセスとは、固有の記憶領域を有する最小の実行単位であり、時分割処理によりスケジューリングされるものである。従って、マルチプロセッサ計算機においては、複数のプロセッサ上で同時に異なるプロセスを実行することができ、シングルプロセッサシステムにおいても複数のプロセスをインタラクティブに実行することができる。
【００１６】
また、スレッドとは、ＵＮＩＸ（登録商標）システムにおいて一般的にスレッドと呼ばれる概念に相当する。すなわち、スレッドは単一のプロセスから生成された複数の実行単位であり、プロセスと同様に時分割処理によりスケジューリングされる。スレッドは固有の記憶領域を持たない最小の実行単位であり、同一プロセスに属する各スレッドはそれらが属している同一プロセスに固有の記憶領域を共有している。
【００１７】
また、時分割処理とは、非同期的あるいは同期的に発生する割り込みを契機として、実行待ち状態のプロセスのうち実行優先度の高いものからプロセッサが割り当てられ、プロセッサ上で実行される間に実行優先度が低下し、優先度が低いものからプロセッサを返還して実行待ち状態に移行するスケジューリング方式のことである。一般的にはマルチプロセッサ計算機における並列プログラムの最小実行単位はスレッドにより実現される事が多いが、プロセスによって実現される場合もある。スレッドにより実現される場合は、並列プログラムは複数のスレッドにより構成されたプロセスに相当し、このプロセスに固有の空間でかつこのプロセスを構成する複数のスレッドが互いに共有するスレッド共有空間を利用して動作する。プロセスにより実現される場合は、並列プログラムは複数のプロセスにより構成されたジョブに相当し、このジョブに固有の空間でかつこのジョブを構成する複数のプロセスが互いに共有するプロセス共有空間を利用して動作する。以下では並列プログラムの最小の実行単位がスレッドである事を想定した記述となっているが、最小の実行単位がプロセスであるようなシステムの場合でも、プロセスをスレッド、ジョブをプロセスと読み替えることで同様の効果を得ることができる。
【００１８】
図１において、主記憶装置２００は、各プロセッサ１００−１〜１００−Ｎから等価に共有されたシステムに唯一の主記憶装置である。主記憶装置２００は、スレッド共有空間２０１と、システムタイマ２０５とを含んでいる。
【００１９】
スレッド共有空間２０１は、任意のある時点においてシステムに存在する複数のプロセスの内の任意の一つが占有してアクセス可能な記憶領域であり、なおかつそのプロセスに属する各スレッドが共有している記憶領域である。このようなスレッド共有空間２０１には、並列実行時間テーブル２０２と、並列カウンタ２０３と、タイムスタンプ２０４とが設けられている。
【００２０】
並列実行時間テーブル２０２は、対応するプロセスを構成しているスレッドに関する１並列実行時間からＮ並列実行時間までの配列（要素数Ｎ）が格納される領域を表している。ここで、ｎ並列実行時間（ｎ＝１，２，…，Ｎ）とはｎ個以上のスレッドが同時に動作していた時間（ｎ個以上のスレッドに同時にプロセッサが割り当てられていた時間）を表すものとする。
【００２１】
並列カウンタ２０３は、対応するプロセスを構成するスレッドの内、同時に異なるプロセッサで動作しているスレッドの数（０〜Ｎ）を表している。
【００２２】
タイムスタンプ２０４には、並列実行時間テーブル２０２に対する最新の更新時刻が格納される。
【００２３】
システムタイマ２０５には、システム稼働中にマシンクロック単位で単調増加し続けるシステムに唯一のタイマ値が格納される。
【００２４】
また、図１においてプロセッサ１００−１は、ユーザプログラム実行処理部１０１と、割込み開始処理部１０２と、並列実行時間算出処理部１０３と、並列カウンタ減少処理部１０４と、並列実行時間取得処理部１０５と、処理部１０６と、並列実行時間算出処理部１０７と、並列カウンタ増加処理部１０８と、割込み終了処理部１０９と、スレッド開始処理部１１０と、プロセッサ割当処理部１１１と、ウェイト処理部１１２と、プロセッサ返還処理部１１３と、スレッド終了処理部１１４と、記録媒体Ｋとを備えている。なお、他のプロセッサもプロセッサ１００−１と同様の構成を有している。
【００２５】
ユーザプログラム実行処理部１０１は、スレッドがユーザモードでプログラムのコードを実行する手段である。
【００２６】
割込み開始処理部１０２は、システムコールやＩ／Ｏ処理、例外処理などの非同期ないし同期的な割込みの発生に伴い、プロセッサコンテキストのセーブや割込み原因に対応する割込みハンドラを呼び出す。
【００２７】
並列実行時間算出部１０３は、割込み開始処理の直後に呼び出され、システムコールの発行元のスレッドが属するプロセスに固有のスレッド共有空間２０１内の並列カウンタ２０３の値を参照し、その値が、「１」以上であった場合は、並列実行時間テーブル２０２内の第１番目のエントリから並列カウンタ２０３の値に対応するエントリまでの各値に、システムタイマ２０５とタイムスタンプ２０４との差分を加算し、その後、タイムスタンプ２０４にシステムタイマ２０５の値を設定する。また、並列カウンタ２０３の値が「０」であった場合は、タイムスタンプ２０４にシステムタイマ２０５の値を設定する処理のみを行い、並列実行時間テーブル２０２に対する更新処理は行わない。
【００２８】
並列カウンタ減少処理部１０４は、並列カウンタ２０３の値を１減少させる機能を有する。
【００２９】
並列実行時間取得処理部１０５は、並列実行時間取得システムコールにより呼び出され、発行元のスレッドが属するプロセスに対応するスレッド共有空間２０１の並列実行時間テーブル２０２の各エントリの内容を取得し、スレッドに返却する機能を有する。
【００３０】
並列実行時間算出処理部１０７は、並列実行時間算出処理部１０３と同様の機能を有する。
【００３１】
並列カウンタ増加処理部１０８は、並列カウンタ２０３の値を１増加させる機能を有する。
【００３２】
割込み終了処理部１０９は、プロセッサコンテキストのリストアや割込み発生ポイントへの復帰を行う。
【００３３】
スレッド開始処理部１１０は、新たに生成されたスレッドが、そのスレッドの属しているプロセス内の最初のスレッドであった場合、プロセスに固有の空間で且つそのプロセスに属している全てのスレッドが共有するスレッド共有空間２０１を確保した上で、並列実行時間テーブル２０２、並列カウンタ２０３及びタイムスタンプ２０４の値を「０」に初期化する。
【００３４】
プロセッサ割当処理部１１１は、他の実行優先度の低いスレッドからプロセッサを奪い取り、仮想空間の切り替えなどのスイッチ処理を実行してスレッドの実行を開始させる。
【００３５】
ウェイト処理部１１２は、スリープシステムコールやタイムスライス切れが発生した場合、スレッドを実行待ちキューやスリープキューなどを用いて停止させる。
【００３６】
プロセッサ返還処理部１１３は、システムコールやＩ／Ｏ待ちによるスリープ呼び出しや、終了処理、タイムスライス切れ割込みで自スレッドよりも実行優先度の高いスレッドがある時などに呼び出され、他の実行優先度の高いスレッドにプロセッサを譲るために仮想空間の切り替えなどのスイッチ処理を実行する。
【００３７】
スレッド終了処理部１１４は、終了したスレッドがプロセス内の最後のスレッドである場合は、上記プロセス固有のスレッド共有空間２０１を解放する機能を有する。
【００３８】
なお、並列実行時間算出処理部１０３，並列カウンタ減少処理部１０４によって連続的に行われる処理と、並列実行時間算出処理部１０７，並列カウンタ増加処理部１０８によって連続的に行われる処理と、並列実行時間取得処理部１０５によって行われる処理とはそれぞれ互いに、同一プロセス内の異なるスレッドにおいて同時に実行されないよう、排他制御によりシリアライズされている。
【００３９】
処理部１０６は、プロセッサ返還処理、並列実行時間取得処理以外の、システムコールに応じた処理、Ｉ／Ｏ処理、例外処理などの一般的な割込み処理を行う。
【００４０】
記録媒体Ｋは、ディスク、半導体メモリ、その他の記録媒体である。この記録媒体Ｋに記録されているプログラムは、プロセッサ１００−１によって読み取られ、その動作を制御することで、プロセッサ１００−１上に、ユーザプログラム実行処理部１０１、割込み開始処理部１０２、並列実行時間算出処理部１０３、並列カウンタ減少処理部１０４、並列実行時間取得処理部１０５、処理部１０６、並列実行時間算出処理部１０７、並列カウンタ増加処理部１０８、割込み終了処理部１０９、スレッド開始処理部１１０、プロセッサ割当処理部１１１、ウェイト処理部１１２、プロセッサ返還処理部１１３、スレッド終了処理部１１４を実現する。
【００４１】
【実施例の動作】
次に各図を参照して本実施例の動作について詳細に説明する。
【００４２】
先ず、プロセッサ１００−１においてスレッドＸが生成されると、プロセッサ１００−１内のスレッド開始処理部１１０は、スレッドＸを実行待ちキュー（図示せず）につなぎ、更に、図２の流れ図に示すように、スレッドＸが、それが属するプロセスＺ内の最初のスレッドであるか否かを判断し（ステップＳ１）、最初のスレッドであった場合（ステップＳ１がＹｅｓ）は、上記プロセスＺに固有のスレッド共有空間２０１を生成し、更に、並列実行時間テーブル２０２の各エントリ、並列カウンタ２０３及びタイムスタンプ２０４の値を「０」に初期設定する（ステップＳ２）。これに対して、最初のスレッドでなかった場合（ステップＳ１がＮｏ）は、ステップＳ２をスキップする。
【００４３】
その後、或るプロセッサ（例えば、プロセッサ１００−１）内のプロセッサ割当処理部１１１によって、実行待ちキューにつながれているスレッドＸにプロセッサ１００−１が割り当てられると、並列実行時間算出処理部１０７が、並列実行時間の算出処理を行う（ステップＳ１３）。
【００４４】
このステップＳ１３の処理を詳しく説明すると、次のようになる。先ず、スレッドＸが属するプロセスＺに割り当てられているスレッド共有空間２０１中の並列カウンタ２０３の値Ｃを参照し、その値Ｃが「１」以上であるか否かを調べる。
【００４５】
そして、「１」以上であった場合は、システムタイマ２０５の値ＳＴＭとタイムスタンプ２０４の値ＴＳとの差分（ＳＴＭ−ＴＳ）を求める。その後、並列実行時間テーブル２０２の各エントリの内の、第１番目のエントリから並列カウンタ２０３の値に対応するエントリまでを対象にして、そこに設定されている値に上記差分（ＳＴＭ−ＴＳ）を加算する。そして、最後に、タイムスタンプ２０４の値ＴＳにシステムタイマ２０５の値ＳＴＭを設定する。
【００４６】
これに対して、並列カウンタ２０３の値Ｃが「１」未満であった場合は、タイムスタンプ２０４の値ＴＳにシステムタイマ２０５の値ＳＴＭを設定する処理のみを行い、並列実行時間テーブル２０２に対する更新処理は行わない。以上がステップＳ１３で行う処理の詳細である。
【００４７】
並列実行時間算出処理部１０７の処理が終了すると、並列カウンタ増加処理部１０８が、スレッドＸが属しているプロセスＺに割り当てられているスレッド共有空間２０１内の並列カウンタ２０３の値Ｃを増加させる（ステップＳ１４）。なお、本実施例では、並列カウンタ２０３の値Ｃを「１」増加させるものとする。
【００４８】
その後、割込み終了処理部１０９による割込み終了処理が行われ、上記スレッドＸは、ユーザモードに移行し、ユーザプログラム実行処理部１０１でユーザプログラムを実行する（ステップＳ３）。このステップＳ３では、並列プログラムがプログラムの記述に従って実行され、割込みが発生しない限り（ステップＳ４がＹｅｓとならない限り）、ユーザプログラムが実行され続ける。
【００４９】
スレッドＸがシステムコールやＩ／Ｏを発行したり、タイムスライス切れや例外処理により割り込まれた場合（ステップＳ４がＹｅｓ）は、割込み開始処理部１０２に処理が移行し、以下の処理が行われる。
【００５０】
先ず、並列実行時間算出処理部１０３がスレッドＸが属しているプロセスＺのスレッド共有空間２０１を対象にして、ステップＳ１３において並列実行時間算出処理部１０７が行った処理と同様の処理を行う（ステップＳ７）。次いで、並列カウンタ減少処理部１０４が並列カウンタ２０３の値Ｃを減少させる（ステップＳ８）。本実施例では、並列カウンタ２０３の値Ｃを「１」減少させるものとする。
【００５１】
その後、割込み要因に対応する割込み処理が実行される。もし、割込み要因が並列実行時間取得システムコールである場合（ステップＳ９がＹｅｓ）は、並列実行時間取得処理部１０５において、並列実行時間の取得処理が行われる（ステップＳ１１）。このステップＳ１１の並列実行時間の取得処理では、並列実行時間テーブル２０２の内容を全て取得し、システムコールの発行元のスレッドに返却する処理が行われる。その後、並列実行時間算出処理部１０７において、スレッドＸが属するプロセスＺのスレッド共有空間２０１を処理対象にして前述した並列実行時間の算出処理が行われ（ステップＳ１３）、更に、並列カウンタ増加処理部１０８において前述した並列カウンタ２０３の増加処理が行われる（ステップＳ１４）。その後、割込み終了処理部１０９において割込み終了処理が実行され、上記スレッドＸは、ユーザモードに移行し、ユーザプログラム実行処理部１０１でユーザプログラムを実行する（ステップＳ３）。
【００５２】
また、割込み要因が並列実行時間取得システムコールでなかった場合（ステップＳ９がＮｏ）は、割込み要因が終了システムコールであるか否かを判定する（ステップＳ１０）。
【００５３】
そして、終了システムコールでなかった場合（ステップＳ１０がＮｏ）は、処理部１０６において割込み要因に応じた割込み処理が行われる（ステップＳ１２）。このステップＳ１２で行われる処理としては、例えば、新たなスレッドＹの生成処理などがある。もし、ステップＳ１２で新たなスレッドＹが生成されたとすると、スレッド開始処理部１１０は、スレッドＹを実行待ちキューにつなぐと共に、上記スレッドＹを処理対象にして前述したステップＳ１の処理を行う。この場合、スレッドＹはプロセスＺ中の最初のスレッドでないので（ステップＳ１がＮｏ）、ステップＳ２はスキップされる。その後、プロセッサ１００−１〜１００−Ｎの内の、或る１台のプロセッサ１００−ｎ（１≦ｎ≦Ｎ）内のプロセッサ割当処理部１１１によってスレッドＹにプロセッサ１００−ｎが割り当てられると、並列実行時間算出処理部１０７、並列カウンタ増加処理部１０８、割込み終了処理部１０９による処理が順次行われる。
【００５４】
これに対して、割込み要因が終了システムコールであった場合（ステップＳ１０がＹｅｓ）は、プロセッサ返還処理部１１３によるプロセッサ返還処理が行われた後、スレッド終了処理部１１４によるスレッド終了処理が行われる。このスレッド終了処理においては、先ず、終了システムコールの発行元のスレッド（例えば、スレッドＸ）が、同一プロセスＺ内の最後のスレッドであるか否かを判断する（ステップＳ５）。そして、最後のスレッドである場合（ステップＳ５がＹｅｓ）は、スレッドＸが属しているプロセスＺに割り当てられているスレッド共有空間２０１を解放する（ステップＳ６）。これに対して、最後のスレッドでない場合（ステップＳ５がＮｏ）は、ステップＳ６をスキップする。その後、スレッドＸは終了する。
【００５５】
以上の処理により、割込みの発生毎に並列実行時間テーブル２０２に１並列実行時間からＮ並列実行時間までの各値が適時採取されることになる。なお、並列実行時間をＴｎ、並列カウンタ２０３の値をＣ、タイムスタンプ２０４の値をＴＳ、システムタイマ２０５の値をＳＴＭと表記すると、ステップＳ７、Ｓ１３の並列実行時間の算出処理は、Ｔｎ＝Ｔｎ＋（ＳＴＭ−ＴＳ）（但し１≦ｎ≦Ｃの各値について）、およびＴＳ＝ＳＴＭと表記できる。また、ステップＳ８の並列カウンタ減算処理はＣ＝Ｃ−１と表記でき、ステップＳ１４の並列カウンタ増加処理はＣ＝Ｃ＋１と表記できる。
【００５６】
次に、具体例を用いて並列実行時間が算出される様子を説明する。
【００５７】
図３は、３つのスレッドＡ、Ｂ、Ｃにより構成される並列プログラムが動作した際の、並列実行時間テーブル２０２の各エントリの値Ｔｎ（ｎ＝１，２，…，Ｎ）、並列カウンタ２０３の値Ｃ及びタイムスタンプ２０４の値ＴＳの遷移を表した図である。なお、以下では説明を容易にするために、割り込みは時刻Ｐ１，Ｐ２，…，Ｐ８の８箇所においてのみ発生したものとし、なおかつ割り込みが発生してから終了するまでに要する時間は十分に短く０に等しかったと仮定する。すなわち、割込み開始処理部１０２で割込み開始処理が行われてから割込み終了処理部１０９で割込み終了処理が行われるまでの間にシステムタイマ２０５の値は変動しなかったと仮定する。なお、現実のシステムにおいては、割り込みは任意のスレッドの任意の時点で発生し、割り込みが発生してから終了するまでに要する時間は０以上の可変値となるが、現実のシステムのどのような場合も図３で例示した処理の組み合わせにすぎず、並列実行時間が正しく採取される。
【００５８】
時刻Ｐ１（ＳＴＭ＝１０）において、プロセッサ１００−１でスレッドＡが生成されると、プロセッサ１００−１内のスレッド開始処理部１１０は、スレッドＡを実行待ちキューにつなぐと共に、スレッドＡが、それが属するプロセスＤ内の最初のスレッドであるか否かを判断する（図２、ステップＳ１）。この場合、スレッドＡは最初のスレッドであるので（ステップＳ１がＹｅｓ）、スレッド開始処理部１１０は、スレッドＡが属するプロセスＤ固有のスレッド共有空間２０１を生成し、更に、並列実行時間テーブル２０２の各エントリの値Ｔ１〜ＴＮ、並列カウンタ２０３の値Ｃ及びタイムスタンプ２０４の値ＴＳを全て「０」に初期化する（ステップＳ２）。
【００５９】
その後、或るプロセッサ（例えば、プロセッサ１００−１）内のプロセッサ割当処理部１１１によって、実行待ちキューにつながれているスレッドＡにプロセッサ１００−１が割り当てられると、並列実行時間算出処理部１０７が、ステップＳ１３の並列実行時間算出処理を行う。この場合、並列カウンタ２０３の値Ｃは「０」であるので、並列実行時間算出処理部１０７は、並列実行時間テーブル２０２に対する更新処理は行わず、タイムスタンプ２０４の値ＴＳにシステムタイマ２０５の値ＳＴＭの値を設定する処理だけを行う。従って、ＴＳ＝１０となる。
【００６０】
その後、並列カウンタ増加処理部１０８が、並列カウンタ２０３の値Ｃを＋１し、Ｃ＝１とする（ステップＳ１４）。その後、割込み終了処理部１０９による割込み終了処理が行われ、スレッドＡがユーザプログラム実行処理部１０１でユーザプログラムの実行を開始する（ステップＳ３）。
【００６１】
次に、時刻Ｐ２（ＳＴＭ＝２５）において、スレッドＡが新たなスレッドＢを生成するためにスレッド生成システムコールを発行する（ステップＳ４がＹｅｓ）。これにより、割込み開始処理部１０２に処理が移行し、並列実行時間算出処理部１０３において並列実行時間算出処理が行われる（ステップＳ７）。このとき、並列カウンタ２０３の値Ｃは「１」であるので、並列実行時間テーブル２０２の第１番目のエントリの値Ｔ１に、システムタイマ２０５の値ＳＴＭとタイムスタンプ２０４の値ＴＳとの差分（ＳＴＭ−ＴＳ）を加算し、更に、システムタイマ２０５の値ＳＴＭをタイムスタンプ２０４の値ＴＳに設定する。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝０＋（２５−１０）＝１５、ＴＳ＝ＳＴＭ＝２５となる。その後、並列カウンタ減少処理部１０４が、並列カウンタ２０３の値Ｃを１減少させ、Ｃ＝０とする（ステップＳ８）。
【００６２】
その後、プロセッサ１００−１内の処理部１０６においてスレッドＢが生成される（ステップＳ１２）。
【００６３】
スレッドＢが生成されると、プロセッサ１００−１内のスレッド開始処理部１１０は、スレッドＢを実行待ちキューにつなぐと共に、スレッドＢがプロセスＤの最初のスレッドであるか否かを判断する（ステップＳ１）。この場合、スレッドＢは最初のスレッドでないので、ステップＳ２の処理はスキップされる。
【００６４】
また、プロセッサ１００−１内の並列実行時間算出処理部１０７において、並列実行時間算出処理が行われる（ステップＳ１３）。このとき、並列カウンタ２０３の値Ｃは「０」であるので、並列実行時間テーブル２０２は更新されず、タイムスタンプ２０４の値ＴＳにシステムタイマ２０５の値ＳＴＭが設定される。すなわち、ＴＳ＝２５となる（ＳＴＭは変動しなかったと仮定）。その後、並列カウンタ増加処理部１０８において、並列カウンタ増加処理が行われ、並列カウンタ２０３の値Ｃが「１」にされる（ステップＳ１４）。その後、割込み終了処理部１０９による割込み終了処理が行われ、スレッドＡはユーザプログラムの処理に復帰する。
【００６５】
一方、実行待ちキューにつながれたスレッドＢ（新たに生成されたスレッド）に、或るプロセッサ（例えば、プロセッサ１００−２）内のプロセッサ割当処理部１１１によりプロセッサ１００−２が割り当てられると、プロセッサ１００−２内の並列実行時間算出処理部１０７が並列実行時間の算出処理を行う（ステップＳ１３）。この時、スレッドＢが属するプロセスＤに割り当てられているスレッド共有空間２０１内の並列カウンタ２０３の値Ｃは「１」であるので、並列実行時間テーブル２０２の第１番目のエントリの値Ｔ１をＴ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝１５＋（２５−２５）＝１５とし、タイムスタンプ２０４の値ＴＳを「２５」とする。その後、並列カウンタ増加処理部１０８において、並列カウンタ２０３に対する増加処理が行われ、並列カウンタ２０３の値Ｃが「２」にされる。
【００６６】
その後、割込み終了処理部１０９による割込み終了処理が行われ、スレッドＢは、ユーザプログラムの実行を開始する。ここで、スレッドＡが割り当てられているプロセッサ１００−１内の並列実行時間算出処理部１０７、並列カウンタ増加処理部１０８よりも、スレッドＢが割り当てられるプロセッサ１００−２内の並列実行時間算出処理部１０７、割込み終了処理部１０９が先の処理を行ったとしても最終的な結果は同じである。
【００６７】
次に、時刻Ｐ３（ＳＴＭ＝３５）において、プロセッサ１００−２で実行されているスレッドＢがスリープシステムコールを発行すると、プロセッサ１００−２内の割込み開始処理部１０２に制御が移り、並列実行時間算出処理部１０３が並列実行時間の算出処理を行う（ステップＳ７）。この時、並列カウンタ２０３の値Ｃは「２」であるので、並列実行時間テーブル２０２の第１番目、第２番目の値Ｔ１、Ｔ２が更新される。すなわち、ＳＴＭ−ＴＳ＝３５−２５＝１０が、値Ｔ１、Ｔ２に加算され、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝１５＋１０＝２５、Ｔ２＝Ｔ２＋（ＳＴＭ−ＴＳ）＝０＋１０＝１０となる。その後、並列カウンタ減少処理部１０４において並列カウンタ２０３の減少処理が行われ、並列カウンタ２０３の値Ｃが「１」にされる（ステップＳ８）。その後、プロセッサ返還処理部１１３がプロセッサの返還処理を行い、更に、ウェイト処理部１１２がスレッドＢをスリープキューにつなぎ、停止させる。
【００６８】
次に、時刻Ｐ４（ＳＴＭ＝４０）において、例えば、プロセッサ１００−２内のプロセッサ割当処理部１１１が、停止していたスレッドＢにプロセッサ１００−２を割り当てると、プロセッサ１００−２内の並列実行時間算出処理部１０７が、並列実行時間の算出処理を行う（ステップＳ１３）。この時、並列カウンタ２０３の値Ｃは「１」となっているので、並列実行時間算出処理部１０７は、並列実行時間テーブル２０２の第１番目のエントリの値Ｔ１を更新すると共に、タイムスタンプ２０４の値ＴＳを更新する。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝２５＋（４０−３５）＝３０、ＴＳ＝ＳＴＭ＝４０とする。その後、並列カウンタ増加処理部１０８が、並列カウンタ２０３の値Ｃを＋１して「２」にする。その後、割込み終了処理部１０９により割込み終了処理が行われ、スレッドＢがユーザプログラムの実行を再開する。
【００６９】
次に、時刻Ｐ５（ＳＴＭ＝５０）において、プロセッサ１００−２で動作しているスレッドＢが、スレッドＣを生成するためにスレッド生成システムコールを発行すると、プロセッサ１００−２内の割込み開始処理部１０２に制御が移る。これにより、並列実行時間算出処理部１０３が並列実行時間の算出処理を行う（ステップＳ７）。この時、並列カウンタ２０３の値Ｃは「２」となっているので、並列実行時間テーブル２０２の第１、第２番目のエントリの値Ｔ１、Ｔ２に（ＳＴＭ−ＴＳ）を加算し、その後、タイムスタンプ２０４の値ＴＳにシステムタイマ２０５の値ＳＴＭを設定する。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝３０＋（５０−４０）＝４０、Ｔ２＝Ｔ２＋（ＳＴＭ−ＴＳ）＝１０＋（５０−４０）＝２０、ＴＳ＝ＳＴＭ＝５０となる。次に、並列カウンタ減少処理部１０４が、並列カウンタ２０３の値Ｃを１減少させ、Ｃ＝１とする。
【００７０】
その後、プロセッサ１００−２内の処理部１０６でスレッドＣが生成される。スレッドＣが生成されると、プロセッサ１００−２内のスレッド開始処理部１１０は、スレッドＣを実行待ちキューにつなぐと共に、スレッドＣがプロセスＤ内の最初のスレッドであるか否かを判断する（ステップＳ１）。この場合、スレッドＣは最初のスレッドでないので、ステップＳ２の処理はスキップされる。
【００７１】
また、プロセッサ１００−２内の並列実行時間算出処理部１０７において、並列実行時間の算出処理が実行される（ステップＳ１３）。この時、並列カウンタ２０３の値Ｃが「１」であるので、並列実行時間テーブル２０２の第１番目のエントリの値Ｔ１を更新すると共に、タイムスタンプ２０４の値ＴＳを更新する。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝４０＋（５０−５０）＝４０、ＴＳ＝ＳＴＭ＝５０となる。次に、並列カウンタ増加処理部１０８が、並列カウンタ２０３の値Ｃを１増加させ、Ｃ＝２とする（ステップＳ１４）。その後、割込み終了処理部１０９で割込み終了処理が実行され、スレッドＢは、ユーザプログラムの処理に復帰する。
【００７２】
一方、実行待ちキューにつながれている、新たに生成されたスレッドＣに或るプロセッサ（例えば、プロセッサ１００−３とする）内のプロセッサ割当処理部１１１によりプロセッサ１００−３が割り当てられると、並列実行時間算出処理部１０７が、並列実行時間の算出処理を行う（ステップＳ１３）。この時、並列カウンタ２０３の値Ｃは「２」であるので、並列実行時間テーブル２０２の第１番目、第２番目のエントリの値Ｔ１、Ｔ２が更新されると共に、タイムスタンプ２０４の値ＴＳが更新される。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝４０＋（５０−５０）＝４０、Ｔ２＝Ｔ２＋（ＳＴＭ−ＴＳ）＝２０＋（５０−５０）＝２０、ＴＳ＝ＳＴＭ＝５０となる。次に、並列カウンタ増加処理部１０８が、並列カウンタ２０３の値Ｃを１増加させ、Ｃ＝３とする。
【００７３】
その後、割込み終了処理部１０９が割込み終了処理を行い、スレッドＣはユーザプログラムの実行を開始する。なお、スレッドＢが実行されているプロセッサ１００−２内の並列実行時間算出処理部１０７、並列カウンタ増加処理部１０８よりも先に、スレッドＣが実行されるプロセッサ１００−３内の並列実行時間算出処理部１０７、並列カウンタ増加処理部１０８がステップＳ１３、Ｓ１４の処理を実行したとしても最終的な結果は同じである。
【００７４】
次に、時刻Ｐ６（ＳＴＭ＝６０）において、プロセッサ１００−１で動作しているスレッドＡが、終了するために終了システムコールを発行すると、プロセッサ１００−１内の割込み開始処理部１０２に制御が移り、並列実行時間算出処理部１０３が並列実行時間の算出処理を行う（ステップＳ７）。このとき、並列カウンタ２０３の値Ｃは「３」であるので、並列実行時間テーブル２０２の第１番目、第２番目、第３番目のエントリの値Ｔ１、Ｔ２、Ｔ３が更新されると共に、タイムスタンプ２０４の値ＴＳが更新される。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝４０＋（６０−５０）＝５０、Ｔ２＝Ｔ２＋（ＳＴＭ−ＴＳ）＝２０＋（６０−５０）＝３０、Ｔ３＝Ｔ３＋（ＳＴＭ−ＴＳ）＝０＋（６０−５０）＝１０、ＴＳ＝６０となる。その後、並列カウンタ減少処理部１０４が、並列カウンタ２０３の値Ｃを−１し、Ｃ＝２とする。
【００７５】
その後、プロセッサ返還処理部１１３が、スレッドＡに割り当てていたプロセッサの返還処理を行い、更に、スレッド終了処理部１１４が、スレッドＡの終了処理を行う。このとき、スレッドＡは、プロセスＤ内の最後のスレッドでないので（ステップＳ５がＮｏ）、スレッド共有空間２０１の解放処理は行われない。
【００７６】
次に、時刻Ｐ７（ＳＴＭ＝６５）において、プロセッサ１００−３で動作しているスレッドＣが、終了するために終了システムコールを発行すると、プロセッサ１００−３内の割込み開始処理部１０２に制御が移る。これにより、並列実行時間算出処理部１０３が、並列実行時間の算出処理を行う（ステップＳ７）。このとき、並列カウンタ２０３の値Ｃは、「２」であるので、並列実行時間テーブル２０２の第１番目、第２番目のエントリの値Ｔ１、Ｔ２が更新されると共に、タイムスタンプ２０４の値ＴＳが更新される。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝５０＋（６５−６０）＝５５、Ｔ２＝Ｔ２＋（ＳＴＭ−ＴＳ）＝３０＋（６５−６０）＝３５、ＴＳ＝ＳＴＭ＝６５となる。その後、並列カウンタ減少処理部１０４が並列カウンタ２０３の値Ｃを１減少させ、Ｃ＝１とする（ステップＳ８）。
【００７７】
その後、プロセッサ返還処理部１１３が、スレッドＣに割り当てていたプロセッサの返還処理を行い、更に、スレッド終了処理部１１４が、スレッドＣの終了処理を行う。このとき、スレッドＣは、プロセスＤ内の最後のスレッドでないので（ステップＳ５がＮｏ）、スレッド共有空間２０１の解放処理は行われない。
【００７８】
次に、時刻Ｐ８（ＳＴＭ＝７０）において、プロセッサ１００−２で動作しているスレッドＢが、終了するために終了システムコールを発行すると、プロセッサ１００−２内の割込み開始処理部１０２に制御が移る。これにより、並列実行時間算出処理部１０３が、並列実行時間の算出処理を行う（ステップＳ７）。このとき、並列カウンタ２０３の値Ｃは「１」であるので、並列実行時間テーブル２０２の第１番目の値Ｔ１を更新すると共に、タイムスタンプ２０４の値ＴＳを更新する。すなわち、Ｔ１＝Ｔ１＋（ＳＴＭ−ＴＳ）＝５５＋（７０−６５）＝６０、ＴＳ＝ＳＴＭ＝７０となる。その後、並列カウンタ減少処理部１０４が、並列カウンタ２０３の値Ｃを１減算し、Ｃ＝０とする（ステップＳ８）。
【００７９】
その後、プロセッサ返還処理部１１３がスレッドＢに割り当てていたプロセッサの返還処理を行い、更に、スレッド終了処理部１１４がスレッドＢの終了処理を行う。この時、スレッドＢは、プロセスＤ内で最後のスレッドであるので（ステップＳ５がＹｅｓ）、スレッド終了処理部１１４は、スレッド共有空間２０１を解放する（ステップＳ６）。
【００８０】
以上の処理によりスレッドＡ、Ｂ、Ｃから構成される並列プログラムの並列実行時間が算出でき、任意の時点で発行する並列実行時間取得システムコールにより、その瞬間の並列実行時間（Ｔ１，Ｔ２，Ｔ３）を容易に取得することが可能となる。例えば、時刻Ｐ８でスレッドＢが並列実行時間取得システムコールを発行した場合、スレッドＢは、並列実行時間（Ｔ１＝６０，Ｔ２＝３５，Ｔ３＝１０）を取得することができる。そして、スレッドＢは、上記並列実行時間を取得すると、例えば、取得した並列実行時間を表示部（図示せず）に表示したり、並列実行時間に基づいて２並列度＝３５／６０＝０．５８３３（約５８．３％）、３並列度＝１０／６０＝０．１６６６６（約１６．７％）を算出し、それらを表示部に表示する。
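The FIG. 3 walkthrough in paragraphs [0057] to [0080] can be replayed in a few lines. The following Python sketch is an illustration only: the function name `replay` and the event encoding (a time paired with +1 for a processor allocation or -1 for a processor return) are assumptions made here, not part of the patent. At each event it applies the update rule Tn = Tn + (STM - TS) for 1 ≤ n ≤ C, then TS = STM, then adjusts C.

```python
def replay(events, n_processors):
    """Replay (time, delta) events and return the parallel execution time table.

    delta is +1 when a processor is allocated to a thread of the process
    and -1 when a processor is returned (hypothetical encoding).
    """
    table = [0] * n_processors   # table[n-1] = time with n or more threads running
    counter = 0                  # parallel counter C
    timestamp = 0                # time stamp TS
    for now, delta in events:
        elapsed = now - timestamp
        for n in range(counter):     # entries 1..C; nothing to do when C == 0
            table[n] += elapsed
        timestamp = now
        counter += delta
    return table


# Event sequence following the times P1..P8 of the FIG. 3 example.
FIG3_EVENTS = [
    (10, +1),  # P1: thread A is given a processor
    (25, -1),  # P2: A enters the thread-creation system call
    (25, +1),  # P2: A returns to user mode
    (25, +1),  # P2: thread B is given a processor
    (35, -1),  # P3: B issues a sleep system call
    (40, +1),  # P4: B is given a processor again
    (50, -1),  # P5: B enters the thread-creation system call
    (50, +1),  # P5: B returns to user mode
    (50, +1),  # P5: thread C is given a processor
    (60, -1),  # P6: A terminates
    (65, -1),  # P7: C terminates
    (70, -1),  # P8: B terminates
]

t1, t2, t3 = replay(FIG3_EVENTS, n_processors=3)
degree2 = t2 / t1   # 2-parallel degree
degree3 = t3 / t1   # 3-parallel degree
print(t1, t2, t3, round(degree2, 4), round(degree3, 4))
```

Under these assumptions the replay ends with T1 = 60, T2 = 35, and T3 = 10, and the ratios T2/T1 and T3/T1 reproduce the 2-parallel and 3-parallel degrees quoted in paragraph [0080].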
【００８１】
なお、上述した実施例においては、主記憶装置２００上にスレッド共有空間２０１を生成するようにしたが、各プロセッサ１００−１〜１００−Ｎによって共有される他の記憶装置上にスレッド共有空間２０１を生成するようにしても構わない。
【００８２】
【発明の効果】
本発明の第１の効果は、マルチプロセッサ計算機上で複数の並列プログラムが動作している場合であっても、各並列プログラムの並列実行時間を取得できるという点である。その理由は、プロセス固有或いはジョブ固有の並列実行時間テーブルを利用して並列実行時間を管理するようにしているからである。
【００８３】
本発明の第２の効果は、並列プログラムの実行時の並列度として、２並列度からＮ並列度までの最大（Ｎ−１）個の並列度が取得でき、並列度の詳細な検証ができる点である。その理由は、複数のスレッド数（１，２，…，Ｎ）毎に、そのスレッド数以上のスレッドに同時にプロセッサが割り当てられた時間を登録する領域を有する並列実行時間テーブルを用いて、並列実行時間を管理しているからである。
【図面の簡単な説明】
【図１】本発明の実施例のブロック図である。
【図２】図１の処理例を示す流れ図である。
【図３】具体例を挙げて実施例の動作を説明するための図である。
【符号の説明】
１００−１〜１００−Ｎ…プロセッサ
１０１…ユーザプログラム実行処理部
１０２…割込み開始処理部
１０３…並列実行時間算出処理部
１０４…並列カウンタ減少処理部
１０５…並列実行時間取得処理部
１０６…処理部
１０７…並列実行時間算出処理部
１０８…並列カウンタ増加処理部
１０９…割込み終了処理部
１１０…スレッド開始処理部
１１１…プロセッサ割当処理部
１１２…ウェイト処理部
１１３…プロセッサ返還処理部
１１４…スレッド終了処理部
２００…主記憶装置
２０１…スレッド共有空間
２０２…並列実行時間テーブル
２０３…並列カウンタ
２０４…タイムスタンプ
２０５…システムタイマ
Ｋ…記録媒体
[0001]
[Technical field of the invention]
The present invention relates to a multiprocessor computer in which a plurality of processors share a main storage device, and more particularly, to a technique for calculating a parallel execution time, which is an index indicating how efficiently a parallel program operates.
[0002]
[Prior art]
As a conventional technique for evaluating the performance of a multiprocessor computer composed of a plurality of processors, a technique is known in which, for each processor, the start time and end time of the period in which the processor is in an operating state and the time at which the processor is in a non-operating state are recorded, and a control unit obtains, from these times, the number of operating processors and the operating rate at an arbitrary time (see, for example, Patent Document 1).
[0003]
[Patent Document 1]
JP-A-5-189395
[0004]
[Problems to be solved by the invention]
According to the above-described conventional technique, it is possible to know how many processors are operating simultaneously. However, what matters to a user who executes a parallel program on a multiprocessor computer is not how many processors are operating simultaneously but whether the parallel program is executed efficiently. When only one parallel program is executed on the multiprocessor computer, the conventional technique can also determine whether the parallel program is being executed efficiently: the longer the time during which many processors operate simultaneously, the more efficiently the parallel program can be judged to be executing.
[0005]
However, when a plurality of parallel programs are executed on the multiprocessor computer, knowing the number of operating processors is not enough to determine whether each parallel program is being executed efficiently. For example, when two threads of a parallel program α and one thread of a parallel program β are executed on the multiprocessor computer, the above-described conventional technique shows only that three processors are operating. Because the number of simultaneously executing threads is not known for each of the parallel programs α and β, it cannot be determined whether each parallel program is being executed efficiently.
[0006]
Therefore, an object of the present invention is to make it possible to determine whether or not each parallel program is being executed efficiently even when a plurality of parallel programs are being executed simultaneously.
[0007]
[Means for Solving the Problems]
In order to achieve the above object, a first multiprocessor computer according to the present invention is a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors, wherein
each of the processors comprises:
means for generating, when a thread generated on the own processor is the first thread in the process to which the thread belongs, a parallel execution time table unique to the process on the main storage device, the table having, for each of a plurality of thread counts, an area for registering the time during which processors were simultaneously allocated to at least that number of threads;
means for updating the parallel execution time table, when the own processor is allocated to a thread in the process and when the own processor is returned from a thread in the process, based on the number of threads in the process to which a processor is currently allocated, the current time, the current contents of the parallel execution time table, and the latest update time of the parallel execution time table; and
means for acquiring the contents of the parallel execution time table.
[0008]
Further, a second multiprocessor computer according to the present invention is
the first multiprocessor computer, wherein
the parallel execution time table is generated on a storage device, other than the main storage device, that is shared by the plurality of processors.
[0009]
More specifically, a third multiprocessor computer according to the present invention is
a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors, wherein
each of the processors comprises:
means for generating, when a thread generated on the own processor is the first thread in the process to which the thread belongs, a thread shared space unique to the process on the main storage device, the thread shared space including a parallel execution time table having, for each of a plurality of thread counts, an area in which the time during which processors were simultaneously allocated to at least that number of threads is registered, a parallel counter in which the number of threads in the process to which a processor is currently allocated is set, and a time stamp in which the latest update time of the parallel execution time table is set;
means for updating, when the own processor is allocated to a thread in the process, the contents of the parallel execution time table based on the current time, the contents of the parallel execution time table, the contents of the parallel counter, and the contents of the time stamp, setting the current time in the time stamp, and further increasing the value of the parallel counter;
means for updating, when the own processor is returned from a thread in the process, the contents of the parallel execution time table based on the current time, the contents of the parallel execution time table, the contents of the parallel counter, and the contents of the time stamp, setting the current time in the time stamp, and further decreasing the value of the parallel counter; and
means for acquiring the contents of the parallel execution time table.
[0010]
A fourth multiprocessor computer according to the present invention is
the third multiprocessor computer, wherein
the thread shared space is generated on a storage device, other than the main storage device, that is shared by the plurality of processors.
[0011]
[Action]
When a thread is generated on a certain processor and the thread is the first thread in its process, a parallel execution time table unique to that process is created on the main storage device. Thereafter, whenever any processor is allocated to a thread in the process or is returned from a thread in the process, the parallel execution time table is updated based on the current time, the latest update time of the parallel execution time table, the number of threads belonging to the process to which a processor is currently allocated, and the contents of the parallel execution time table. Because the parallel execution time is managed with a parallel execution time table unique to each process, it is possible to determine whether each parallel program is operating efficiently even when a plurality of parallel programs are executed simultaneously on the multiprocessor computer.
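The effect of keeping one table per process can be illustrated with a small Python sketch. Everything named here (`ParallelTimeRegistry`, its methods, and the process identifiers `"alpha"` and `"beta"`) is hypothetical and chosen for illustration; the sketch assumes integer timer values and is not the patent's implementation.

```python
class ParallelTimeRegistry:
    """Keeps one parallel execution time table per process (illustrative only)."""

    def __init__(self, n_processors):
        self.n_processors = n_processors
        self.spaces = {}  # process id -> table, counter, timestamp

    def thread_created(self, pid):
        # The table is created only for the first thread of a process.
        if pid not in self.spaces:
            self.spaces[pid] = {
                "table": [0] * self.n_processors,
                "counter": 0,
                "timestamp": 0,
            }

    def _update(self, pid, now):
        space = self.spaces[pid]
        elapsed = now - space["timestamp"]
        for n in range(space["counter"]):   # entries 1..counter
            space["table"][n] += elapsed
        space["timestamp"] = now
        return space

    def processor_assigned(self, pid, now):
        self._update(pid, now)["counter"] += 1

    def processor_returned(self, pid, now):
        self._update(pid, now)["counter"] -= 1

    def parallel_times(self, pid):
        return list(self.spaces[pid]["table"])


# Program alpha runs two threads and program beta runs one, all from t=0 to t=10.
registry = ParallelTimeRegistry(n_processors=3)
for pid in ("alpha", "alpha", "beta"):
    registry.thread_created(pid)
for pid in ("alpha", "alpha", "beta"):
    registry.processor_assigned(pid, now=0)
for pid in ("alpha", "alpha", "beta"):
    registry.processor_returned(pid, now=10)

alpha_times = registry.parallel_times("alpha")
beta_times = registry.parallel_times("beta")
print(alpha_times, beta_times)
```

A system-wide count would report only that three processors were busy. The per-process tables separate the two programs: the second entry grows for the program with two threads and stays at zero for the program with one.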
[0012]
[Embodiments of the invention]
Next, embodiments of the present invention will be described in detail with reference to the drawings.
[0013]
[Configuration of the embodiment]
FIG. 1 is a block diagram of an embodiment of a multiprocessor computer according to the present invention, which comprises a plurality of processors (CPUs, arithmetic units) 100-1 to 100-N and a main storage device 200 shared by the processors 100-1 to 100-N.
[0014]
First, the terms used in the following description are defined.
[0015]
A process corresponds to the concept generally called a process in the UNIX (registered trademark) system. That is, a process is the minimum execution unit having its own storage area, and is scheduled by time-division processing. Therefore, in a multiprocessor computer, different processes can be executed simultaneously on a plurality of processors, and even in a single-processor system a plurality of processes can be executed interactively.
[0016]
Further, a thread corresponds to the concept generally called a thread in the UNIX (registered trademark) system. That is, threads are a plurality of execution units generated from a single process and, like processes, are scheduled by time-division processing. A thread is the minimum execution unit having no storage area of its own, and the threads belonging to the same process share the storage area unique to that process.
[0017]
Time-division processing is a scheduling method in which, triggered by interrupts that occur asynchronously or synchronously, processors are allocated to the processes waiting for execution in descending order of execution priority; the execution priority of a process falls while it runs on a processor, and processes return their processors in ascending order of priority and move to the execution waiting state. In general, the minimum execution unit of a parallel program on a multiprocessor computer is often realized by a thread, but it may also be realized by a process. When it is realized by threads, the parallel program corresponds to a process composed of a plurality of threads and operates using a thread shared space, which is unique to the process and shared by the threads that constitute it. When it is realized by processes, the parallel program corresponds to a job composed of a plurality of processes and operates using a process shared space, which is unique to the job and shared by the processes that constitute it. The description below assumes that the minimum execution unit of a parallel program is a thread; however, in a system whose minimum execution unit is a process, the same effect can be obtained by treating processes as the threads, and jobs as the processes, of this description.
[0018]
In FIG. 1, the main storage device 200 is the sole main storage device of the system and is shared equally by the processors 100-1 to 100-N. The main storage device 200 includes a thread shared space 201 and a system timer 205.
[0019]
The thread shared space 201 is a storage area that, at any given point in time, can be exclusively accessed by one of the processes existing in the system, and that is shared by the threads belonging to that process. The thread shared space 201 holds a parallel execution time table 202, a parallel counter 203, and a time stamp 204.
[0020]
The parallel execution time table 202 is an area that stores an array (of N elements) holding the 1-parallel execution time through the N-parallel execution time for the threads constituting the corresponding process. Here, the n-parallel execution time (n = 1, 2, ..., N) is the time during which n or more threads were operating simultaneously (the time during which processors were simultaneously allocated to n or more threads).
[0021]
The parallel counter 203 indicates the number (0 to N) of threads operating on different processors at the same time among the threads constituting the corresponding process.
[0022]
The time stamp 204 stores the latest update time for the parallel execution time table 202.
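One way to picture the three fields of the thread shared space 201 is as a small record. The Python sketch below is illustrative only; the class name, the field names, and the helper `exact_times` are assumptions introduced here. The helper relies on the definition of the n-parallel execution time: because entry n holds the time with n or more threads running, the time with exactly n threads running is the difference between entry n and entry n+1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ThreadSharedSpace:
    """Per-process bookkeeping record (field names are hypothetical)."""
    n_processors: int
    table: List[int] = field(default_factory=list)  # table[n-1]: time with >= n threads running
    counter: int = 0     # parallel counter 203: threads currently holding a processor
    timestamp: int = 0   # time stamp 204: time of the last table update

    def __post_init__(self):
        if not self.table:
            self.table = [0] * self.n_processors


def exact_times(table):
    """Return the time during which exactly n threads ran, for n = 1..N."""
    shifted = table[1:] + [0]   # entry n+1, with a zero after the last entry
    return [at_least_n - at_least_next for at_least_n, at_least_next in zip(table, shifted)]


space = ThreadSharedSpace(n_processors=4)
# Arbitrary sample table: 60 units with >= 1 thread, 35 with >= 2, 10 with >= 3.
sample_exact = exact_times([60, 35, 10, 0])
print(space.table, sample_exact)
```

Storing cumulative "n or more" times keeps each update a simple addition to the first C entries, and the "exactly n" breakdown remains recoverable by subtraction when it is needed.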
[0023]
The system timer 205 stores a timer value, unique in the system, that keeps increasing monotonically in units of the machine clock while the system is running.
[0024]
In FIG. 1, the processor 100-1 includes a user program execution processing unit 101, an interrupt start processing unit 102, a parallel execution time calculation processing unit 103, a parallel counter decrease processing unit 104, a parallel execution time acquisition processing unit 105, a processing unit 106, a parallel execution time calculation processing unit 107, a parallel counter increase processing unit 108, an interrupt end processing unit 109, a thread start processing unit 110, a processor allocation processing unit 111, a wait processing unit 112, a processor return processing unit 113, a thread end processing unit 114, and a recording medium K. The other processors have the same configuration as the processor 100-1.
[0025]
The user program execution processing unit 101 is the means by which a thread executes program code in user mode.
[0026]
When an asynchronous or synchronous interrupt such as a system call, I/O processing, or exception processing occurs, the interrupt start processing unit 102 saves the processor context and calls the interrupt handler corresponding to the cause of the interrupt.
[0027]
The parallel execution time calculation processing unit 103 is called immediately after the interrupt start processing and refers to the value of the parallel counter 203 in the thread shared space 201 unique to the process to which the thread that issued the system call belongs. If the value is 1 or more, it adds the difference between the system timer 205 and the time stamp 204 to each value from the first entry of the parallel execution time table 202 up to the entry corresponding to the value of the parallel counter 203, and then sets the value of the system timer 205 in the time stamp 204. If the value of the parallel counter 203 is 0, it only sets the value of the system timer 205 in the time stamp 204 and does not update the parallel execution time table 202.
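The update rule described above can be sketched as a small pure function. This is a minimal illustration in Python, not the patent's implementation; the function and parameter names are assumptions chosen here, with `counter`, `timestamp`, and `now` standing for the values of the parallel counter 203, the time stamp 204, and the system timer 205.

```python
def update_parallel_times(table, counter, timestamp, now):
    """Return (new_table, new_timestamp) after one calculation step.

    Adds the elapsed time (now - timestamp) to entries 1..counter and
    leaves the table unchanged when counter is 0. The time stamp is set
    to the current timer value in both cases.
    """
    elapsed = now - timestamp
    new_table = list(table)          # do not mutate the caller's list
    for n in range(counter):         # indices 0..counter-1 are entries 1..counter
        new_table[n] += elapsed
    return new_table, now


# Values taken from time P3 of the FIG. 3 example: C = 2, TS = 25, STM = 35.
busy_table, busy_ts = update_parallel_times([15, 0, 0], counter=2, timestamp=25, now=35)

# With C = 0 only the time stamp moves.
idle_table, idle_ts = update_parallel_times([15, 0, 0], counter=0, timestamp=25, now=35)
print(busy_table, busy_ts, idle_table, idle_ts)
```

Because the function only reads the counter, the caller is expected to increase or decrease the counter afterwards, which matches the separation between the calculation units 103 and 107 and the counter units 104 and 108.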
[0028]
The parallel counter decrease processing unit 104 has a function of decreasing the value of the parallel counter 203 by one.
[0029]
The parallel execution time acquisition processing unit 105 is called by the parallel execution time acquisition system call, acquires the contents of each entry of the parallel execution time table 202 in the thread shared space 201 corresponding to the process to which the issuing thread belongs, and returns them to the issuing thread.
[0030]
The parallel execution time calculation processing unit 107 has the same function as the parallel execution time calculation processing unit 103.
[0031]
The parallel counter increase processing unit 108 has a function of increasing the value of the parallel counter 203 by one.
[0032]
The interrupt termination processing unit 109 restores the processor context and returns to the interrupt occurrence point.
[0033]
When the newly created thread is the first thread in the process to which the thread belongs, the thread start processing unit 110 secures the thread shared space 201, which is a space unique to the process and shared by all the threads belonging to the process, and then initializes the values of the parallel execution time table 202, the parallel counter 203, and the time stamp 204 to “0”.
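The per-process bookkeeping described above can be sketched as follows. This is only an illustrative model, not the patented implementation: the names `ThreadSharedSpace`, `thread_start`, `thread_end`, the `live_threads` field, and the table size `N_PROCESSORS = 4` are all assumptions introduced here.

```python
from dataclasses import dataclass, field

N_PROCESSORS = 4  # hypothetical N; the source does not fix a value


@dataclass
class ThreadSharedSpace:
    """Per-process area modelling table 202, counter 203, and time stamp 204."""
    table: list = field(default_factory=lambda: [0] * N_PROCESSORS)  # T1..TN
    counter: int = 0       # parallel counter 203 (C)
    timestamp: int = 0     # time stamp 204 (TS)
    live_threads: int = 0  # assumed bookkeeping to detect the first/last thread


spaces = {}  # process id -> ThreadSharedSpace


def thread_start(pid):
    """Allocate and zero the space only when the first thread of pid starts."""
    if pid not in spaces:
        spaces[pid] = ThreadSharedSpace()
    spaces[pid].live_threads += 1
    return spaces[pid]


def thread_end(pid):
    """Release the space when the last thread of pid terminates."""
    space = spaces[pid]
    space.live_threads -= 1
    if space.live_threads == 0:
        del spaces[pid]
```

Because each process id maps to its own space, two parallel programs running at the same time never share a table in this model.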
[0034]
The processor allocation processing unit 111 takes a processor away from another thread having a lower execution priority, executes switch processing such as switching of a virtual space, and starts execution of the thread.
[0035]
When a sleep system call is issued or a time slice expires, the wait processing unit 112 stops the thread using an execution waiting queue or a sleep queue.
[0036]
The processor return processing unit 113 is called when a thread having a higher execution priority than the current thread exists as a result of a sleep caused by a system call or an I/O wait, a termination process, or a time-slice expiration interrupt. It performs switch processing such as switching of a virtual space in order to yield the processor to the thread having the higher execution priority.
[0037]
The thread termination processing unit 114 has a function of releasing the process-specific thread shared space 201 when the terminated thread is the last thread in the process.
[0038]
Note that the processing performed consecutively by the parallel execution time calculation processing unit 103 and the parallel counter decrease processing unit 104, the processing performed consecutively by the parallel execution time calculation processing unit 107 and the parallel counter increase processing unit 108, and the processing performed by the parallel execution time acquisition processing unit 105 are serialized by exclusive control so that they are not executed simultaneously by different threads in the same process.
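One way to picture this exclusive control is a single per-process lock that covers each update-plus-counter-change pair and the table read. The sketch below is a user-space analogy using Python's `threading.Lock`; the class `SharedSpace` and its method names are hypothetical, and a kernel would use its own locking primitive.

```python
import threading


class SharedSpace:
    """Hypothetical model of thread shared space 201 with one lock per process."""

    def __init__(self, n):
        self.table = [0] * n          # T1..TN (table 202)
        self.counter = 0              # C (parallel counter 203)
        self.timestamp = 0            # TS (time stamp 204)
        self.lock = threading.Lock()  # assumed exclusive-control primitive

    def _account(self, now):
        # Credit the elapsed time to T1..TC, then record the update time.
        for i in range(self.counter):
            self.table[i] += now - self.timestamp
        self.timestamp = now

    def on_assign(self, now):
        with self.lock:               # units 107 and 108 as one critical section
            self._account(now)
            self.counter += 1

    def on_interrupt(self, now):
        with self.lock:               # units 103 and 104 as one critical section
            self._account(now)
            self.counter -= 1

    def snapshot(self):
        with self.lock:               # unit 105 reads a consistent table
            return list(self.table)
```

Holding the lock across both the table update and the counter change prevents another thread from observing a counter value that does not match the time stamp.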
[0039]
The processing unit 106 performs general interrupt processing such as processing according to a system call, I / O processing, and exception processing other than the processor return processing and the parallel execution time acquisition processing.
[0040]
The recording medium K is a disk, a semiconductor memory, or another recording medium. The program recorded on the recording medium K is read by the processor 100-1 and controls its operation, thereby realizing on the processor 100-1 the user program execution processing unit 101, the interrupt start processing unit 102, the parallel execution time calculation processing unit 103, the parallel counter decrease processing unit 104, the parallel execution time acquisition processing unit 105, the processing unit 106, the parallel execution time calculation processing unit 107, the parallel counter increase processing unit 108, the interrupt end processing unit 109, the thread start processing unit 110, the processor assignment processing unit 111, the wait processing unit 112, the processor return processing unit 113, and the thread termination processing unit 114.
[0041]
[Operation of the embodiment]
Next, the operation of this embodiment will be described in detail with reference to the drawings.
[0042]
First, when the thread X is generated in the processor 100-1, the thread start processing unit 110 in the processor 100-1 connects the thread X to an execution waiting queue (not shown) and, as shown in the flowchart of FIG. 2, determines whether or not the thread X is the first thread in the process Z to which the thread X belongs (step S1). If the thread X is the first thread (Yes in step S1), the thread start processing unit 110 generates the thread shared space 201 unique to the process Z and initializes the values of the parallel execution time table 202, the parallel counter 203, and the time stamp 204 to "0" (step S2). On the other hand, if it is not the first thread (No in step S1), step S2 is skipped.
[0043]
Thereafter, when the processor assignment processing unit 111 in a certain processor (for example, the processor 100-1) assigns the processor 100-1 to the thread X connected to the execution waiting queue, the parallel execution time calculation processing unit 107 calculates the parallel execution time (step S13).
[0044]
The processing in step S13 will be described in detail below. First, referring to the value C of the parallel counter 203 in the thread shared space 201 assigned to the process Z to which the thread X belongs, it is checked whether or not the value C is equal to or more than “1”.
[0045]
If the value is “1” or more, the difference (STM-TS) between the value STM of the system timer 205 and the value TS of the time stamp 204 is obtained. Then, the difference (STM-TS) is added to the value set in each entry of the parallel execution time table 202, from the first entry to the entry corresponding to the value of the parallel counter 203. Finally, the value STM of the system timer 205 is set to the value TS of the time stamp 204.
[0046]
On the other hand, if the value C of the parallel counter 203 is less than “1”, only the process of setting the value STM of the system timer 205 to the value TS of the time stamp 204 is performed, and the update to the parallel execution time table 202 is performed. No processing is performed. The above is the details of the processing performed in step S13.
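The update performed in step S13 (and, identically, in step S7) can be sketched as a small pure function. The function name `update_parallel_time` and the list representation of the table are assumptions made for illustration only.

```python
def update_parallel_time(table, counter, timestamp, now):
    """Return (new_table, new_timestamp) after one parallel-time update.

    table     -- list [T1, ..., TN], the parallel execution time table 202
    counter   -- C, the value of the parallel counter 203
    timestamp -- TS, the value of the time stamp 204
    now       -- STM, the value of the system timer 205
    """
    new_table = list(table)
    if counter >= 1:
        delta = now - timestamp
        # Entries T1..TC each receive the elapsed time.
        for i in range(counter):
            new_table[i] += delta
    # TS is set to STM whether or not the table was updated.
    return new_table, now
```

For instance, with T1 = 15, T2 = 0, C = 2, TS = 25, and STM = 35, the function yields T1 = 25, T2 = 10, and TS = 35.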
[0047]
When the processing of the parallel execution time calculation processing unit 107 ends, the parallel counter increase processing unit 108 increases the value C of the parallel counter 203 in the thread shared space 201 allocated to the process Z to which the thread X belongs (step S14). In this embodiment, the value C of the parallel counter 203 is increased by “1”.
[0048]
Thereafter, an interrupt end process is performed by the interrupt end processing unit 109, and the thread X shifts to the user mode, and the user program is executed by the user program execution processing unit 101 (step S3). In step S3, the parallel program is executed according to the description of the program, and the user program continues to be executed unless an interrupt occurs (unless step S4 becomes Yes).
[0049]
If the thread X issues a system call or an I/O request, or is interrupted because of a time-slice expiration or exception processing (Yes in step S4), control shifts to the interrupt start processing unit 102, and the following processing is performed.
[0050]
First, the parallel execution time calculation processing unit 103 performs, on the thread shared space 201 of the process Z to which the thread X belongs, the same processing as that performed by the parallel execution time calculation processing unit 107 in step S13 (step S7). Next, the parallel counter decrease processing unit 104 decreases the value C of the parallel counter 203 (step S8). In this embodiment, the value C of the parallel counter 203 is decreased by "1".
[0051]
Thereafter, an interrupt process corresponding to the interrupt factor is executed. If the interrupt factor is a parallel execution time acquisition system call (Yes in step S9), the parallel execution time acquisition processing unit 105 performs a parallel execution time acquisition process (step S11). In the parallel execution time acquisition process of step S11, all the contents of the parallel execution time table 202 are acquired and returned to the thread that issued the system call. Thereafter, the parallel execution time calculation processing unit 107 performs the above-described parallel execution time calculation processing on the thread shared space 201 of the process Z to which the thread X belongs (step S13), and the parallel counter increase processing unit 108 performs the above-described process of increasing the parallel counter 203 (step S14). Thereafter, the interrupt end processing unit 109 executes an interrupt end process, the thread X shifts to the user mode, and the user program execution processing unit 101 executes the user program (step S3).
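The sequence for the acquisition system call (steps S7, S8, S11, S13, S14) can be sketched as below. The dictionary layout and the names `account` and `acquire_parallel_time` are hypothetical; the two timer arguments model the system timer value on entry to and exit from the interrupt.

```python
def account(space, now):
    """Steps S7/S13: add (STM - TS) to T1..TC, then set TS = STM."""
    for i in range(space["C"]):
        space["T"][i] += now - space["TS"]
    space["TS"] = now


def acquire_parallel_time(space, stm_enter, stm_exit):
    """Model of the acquisition system call path for one calling thread."""
    account(space, stm_enter)    # step S7
    space["C"] -= 1              # step S8
    snapshot = list(space["T"])  # step S11: copy the whole table for the caller
    account(space, stm_exit)     # step S13
    space["C"] += 1              # step S14
    return snapshot
```

In this model the time spent inside the handler is credited at a parallelism one lower than before, since the calling thread is not counted while it is in the interrupt.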
[0052]
If the interrupt factor is not the parallel execution time acquisition system call (No in step S9), it is determined whether the interrupt factor is a termination system call (step S10).
[0053]
If it is not an end system call (No in step S10), the processing unit 106 performs an interrupt process according to the interrupt factor (step S12). The process performed in step S12 includes, for example, a process of generating a new thread Y. If a new thread Y is generated in step S12, the thread start processing unit 110 connects the thread Y to the execution waiting queue, and performs the above-described step S1 on the thread Y as a processing target. In this case, since the thread Y is not the first thread in the process Z (No in step S1), step S2 is skipped. Thereafter, when the processor 100-n (1 ≦ n ≦ N) is assigned to the thread Y by the processor assignment processing unit 111 in one of the processors 100-1 to 100-N, processing by the parallel execution time calculation processing unit 107, the parallel counter increase processing unit 108, and the interrupt end processing unit 109 is performed in that order.
[0054]
On the other hand, when the interrupt factor is the termination system call (Yes in step S10), the processor return processing unit 113 performs the processor return processing, and then the thread termination processing unit 114 performs the thread termination processing. In this thread termination processing, first, it is determined whether or not the thread that issued the termination system call (for example, thread X) is the last thread in the same process Z (step S5). If it is the last thread (Yes in step S5), the thread shared space 201 allocated to the process Z to which the thread X belongs is released (step S6). On the other hand, if it is not the last thread (No in step S5), step S6 is skipped. Thereafter, the thread X ends.
[0055]
With the above processing, the values of the 1-parallel execution time through the N-parallel execution time are accumulated in the parallel execution time table 202 every time an interrupt occurs. If the n-parallel execution time is represented by Tn, the value of the parallel counter 203 by C, the value of the time stamp 204 by TS, and the value of the system timer 205 by STM, the parallel execution time calculation processing in steps S7 and S13 can be expressed as Tn = Tn + (STM-TS) (for each n with 1 ≦ n ≦ C) followed by TS = STM. Further, the parallel counter decrease processing in step S8 can be expressed as C = C-1, and the parallel counter increase processing in step S14 can be expressed as C = C + 1.
[0056]
Next, how the parallel execution time is calculated will be described using a specific example.
[0057]
FIG. 3 is a diagram showing the transition of the value Tn (n = 1, 2,..., N) of each entry of the parallel execution time table 202, the value C of the parallel counter 203, and the value TS of the time stamp 204 when a parallel program composed of three threads A, B, and C operates. In the following, for the sake of simplicity, it is assumed that interrupts occur only at the eight points in time P1, P2,..., P8, and that the time required from the occurrence of an interrupt to its end is short enough to be regarded as 0. That is, it is assumed that the value of the system timer 205 does not change between the time when the interrupt start processing is performed by the interrupt start processing unit 102 and the time when the interrupt end processing is performed by the interrupt end processing unit 109. In an actual system, an interrupt occurs at an arbitrary point in an arbitrary thread, and the time required from the occurrence of an interrupt to its end is a variable value of 0 or more. Even in such a case, the processing is merely a combination of the processes illustrated in FIG. 3, and the parallel execution time is collected correctly.
[0058]
At time P1 (STM = 10), when the thread A is generated by the processor 100-1, the thread start processing unit 110 in the processor 100-1 connects the thread A to the execution waiting queue and determines whether or not the thread A is the first thread in the process D to which it belongs (step S1 in FIG. 2). In this case, since the thread A is the first thread (Yes in step S1), the thread start processing unit 110 generates a thread shared space 201 unique to the process D to which the thread A belongs, and initializes the values T1 to TN of the entries of the parallel execution time table 202, the value C of the parallel counter 203, and the value TS of the time stamp 204 all to “0” (step S2).
[0059]
Thereafter, when the processor assignment processing unit 111 in a certain processor (for example, the processor 100-1) assigns the processor 100-1 to the thread A connected to the execution waiting queue, the parallel execution time calculation processing unit 107 performs the parallel execution time calculation processing of step S13. In this case, since the value C of the parallel counter 203 is “0”, the parallel execution time calculation processing unit 107 does not update the parallel execution time table 202 and only sets the value STM of the system timer 205 to the value TS of the time stamp 204. Therefore, TS = 10.
[0060]
Thereafter, the parallel counter increase processing unit 108 increments the value C of the parallel counter 203 by 1 and sets C = 1 (step S14). Thereafter, an interrupt end process is performed by the interrupt end processing unit 109, and the thread A starts execution of the user program by the user program execution processing unit 101 (step S3).
[0061]
Next, at time P2 (STM = 25), the thread A issues a thread generation system call to generate a new thread B (Yes in step S4). As a result, control shifts to the interrupt start processing unit 102, and the parallel execution time calculation processing unit 103 performs the parallel execution time calculation processing (step S7). At this time, since the value C of the parallel counter 203 is “1”, the difference (STM-TS) between the value STM of the system timer 205 and the value TS of the time stamp 204 is added to the value T1 of the first entry of the parallel execution time table 202, and the value STM of the system timer 205 is set to the value TS of the time stamp 204. That is, T1 = T1 + (STM-TS) = 0 + (25-10) = 15 and TS = STM = 25. Thereafter, the parallel counter decrease processing unit 104 decreases the value C of the parallel counter 203 by 1 and sets C = 0 (step S8).
[0062]
Then, the thread B is generated in the processing unit 106 in the processor 100-1 (Step S12).
[0063]
When the thread B is generated, the thread start processing unit 110 in the processor 100-1 connects the thread B to the execution waiting queue and determines whether the thread B is the first thread of the process D (step S1). In this case, since the thread B is not the first thread, the processing in step S2 is skipped.
[0064]
The parallel execution time calculation processing unit 107 in the processor 100-1 performs a parallel execution time calculation process (step S13). At this time, since the value C of the parallel counter 203 is “0”, the parallel execution time table 202 is not updated, and the value STM of the system timer 205 is set to the value TS of the time stamp 204. That is, TS = 25 (assuming that the STM has not changed). Thereafter, the parallel counter increase processing unit 108 performs parallel counter increase processing, and the value C of the parallel counter 203 is set to "1" (step S14). Thereafter, an interrupt end process is performed by the interrupt end processing unit 109, and the thread A returns to the process of the user program.
[0065]
On the other hand, when the processor assignment processing unit 111 in a certain processor (for example, the processor 100-2) assigns the processor 100-2 to the newly created thread B connected to the execution waiting queue, the parallel execution time calculation processing unit 107 in the processor 100-2 performs a parallel execution time calculation process (step S13). At this time, since the value C of the parallel counter 203 in the thread shared space 201 assigned to the process D to which the thread B belongs is “1”, the value T1 of the first entry of the parallel execution time table 202 becomes T1 = T1 + (STM-TS) = 15 + (25-25) = 15, and the value TS of the time stamp 204 becomes “25”. Thereafter, the parallel counter increase processing unit 108 performs an increase process on the parallel counter 203, and the value C of the parallel counter 203 is set to “2”.
[0066]
Thereafter, an interrupt end process is performed by the interrupt end processing unit 109, and the thread B starts executing the user program. Note that the final result is the same even if the parallel execution time calculation processing unit 107, the parallel counter increase processing unit 108, and the interrupt end processing unit 109 in the processor 100-2 to which the thread B is assigned perform their processing before the parallel execution time calculation processing unit 107 and the parallel counter increase processing unit 108 in the processor 100-1 to which the thread A is assigned.
[0067]
Next, at time P3 (STM = 35), when the thread B executed by the processor 100-2 issues a sleep system call, control is transferred to the interrupt start processing unit 102 in the processor 100-2, and the parallel execution time calculation processing unit 103 performs a parallel execution time calculation process (step S7). At this time, since the value C of the parallel counter 203 is “2”, the values T1 and T2 of the first and second entries of the parallel execution time table 202 are updated. That is, STM-TS = 35-25 = 10 is added to the values T1 and T2, giving T1 = T1 + (STM-TS) = 15 + 10 = 25, T2 = T2 + (STM-TS) = 0 + 10 = 10, and TS = STM = 35. Thereafter, the parallel counter decrease processing unit 104 performs a process of decreasing the parallel counter 203, and the value C of the parallel counter 203 is set to “1” (step S8). After that, the processor return processing unit 113 performs the return processing of the processor, and the wait processing unit 112 connects the thread B to the sleep queue and stops it.
[0068]
Next, at time P4 (STM = 40), when, for example, the processor allocation processing unit 111 in the processor 100-2 allocates the processor 100-2 to the stopped thread B, the parallel execution time calculation processing unit 107 in the processor 100-2 performs a parallel execution time calculation process (step S13). At this time, since the value C of the parallel counter 203 is “1”, the parallel execution time calculation processing unit 107 updates the value T1 of the first entry of the parallel execution time table 202 and updates the value TS of the time stamp 204. That is, T1 = T1 + (STM-TS) = 25 + (40-35) = 30 and TS = STM = 40. After that, the parallel counter increase processing unit 108 increases the value C of the parallel counter 203 by “1” to “2”. After that, the interrupt end processing is performed by the interrupt end processing unit 109, and the thread B resumes the execution of the user program.
[0069]
Next, at time P5 (STM = 50), when the thread B operating in the processor 100-2 issues a thread generation system call to generate the thread C, control is transferred to the interrupt start processing unit 102 in the processor 100-2. Thus, the parallel execution time calculation processing unit 103 performs a parallel execution time calculation process (step S7). At this time, since the value C of the parallel counter 203 is "2", (STM-TS) is added to the values T1 and T2 of the first and second entries of the parallel execution time table 202, and thereafter the value STM of the system timer 205 is set to the value TS of the time stamp 204. That is, T1 = T1 + (STM-TS) = 30 + (50-40) = 40, T2 = T2 + (STM-TS) = 10 + (50-40) = 20, and TS = STM = 50. Next, the parallel counter decrease processing unit 104 decreases the value C of the parallel counter 203 by one, and sets C = 1.
[0070]
Then, the thread C is generated by the processing unit 106 in the processor 100-2. When the thread C is generated, the thread start processing unit 110 in the processor 100-2 connects the thread C to the execution waiting queue and determines whether the thread C is the first thread in the process D (step S1). In this case, since the thread C is not the first thread, the processing in step S2 is skipped.
[0071]
Further, the parallel execution time calculation processing unit 107 in the processor 100-2 executes a parallel execution time calculation process (step S13). At this time, since the value C of the parallel counter 203 is “1”, the value T1 of the first entry of the parallel execution time table 202 is updated, and the value TS of the time stamp 204 is updated. That is, T1 = T1 + (STM-TS) = 40 + (50-50) = 40 and TS = STM = 50. Next, the parallel counter increase processing unit 108 increases the value C of the parallel counter 203 by 1 and sets C = 2 (step S14). Thereafter, the interrupt end processing unit 109 executes the interrupt end processing, and the thread B returns to the processing of the user program.
[0072]
On the other hand, when the processor allocation processing unit 111 in a certain processor (for example, the processor 100-3) allocates the processor 100-3 to the newly generated thread C connected to the execution waiting queue, the parallel execution time calculation processing unit 107 performs a parallel execution time calculation process (step S13). At this time, since the value C of the parallel counter 203 is “2”, the values T1 and T2 of the first and second entries of the parallel execution time table 202 are updated, and the value TS of the time stamp 204 is updated. That is, T1 = T1 + (STM-TS) = 40 + (50-50) = 40, T2 = T2 + (STM-TS) = 20 + (50-50) = 20, and TS = STM = 50. Next, the parallel counter increase processing unit 108 increases the value C of the parallel counter 203 by 1 and sets C = 3.
[0073]
Thereafter, the interrupt end processing unit 109 performs an interrupt end process, and the thread C starts executing the user program. Note that, prior to the parallel execution time calculation processing unit 107 and the parallel counter increase processing unit 108 in the processor 100-2 in which the thread B is executed, the parallel execution time calculation in the processor 100-3 in which the thread C is executed is performed. Even if the processing unit 107 and the parallel counter increase processing unit 108 execute the processing of steps S13 and S14, the final result is the same.
[0074]
Next, at time P6 (STM = 60), when the thread A operating in the processor 100-1 issues a termination system call to terminate, control is transferred to the interrupt start processing unit 102 in the processor 100-1. Then, the parallel execution time calculation processing unit 103 performs a parallel execution time calculation process (step S7). At this time, since the value C of the parallel counter 203 is “3”, the values T1, T2, and T3 of the first, second, and third entries of the parallel execution time table 202 are updated, and the value TS of the time stamp 204 is updated. That is, T1 = T1 + (STM-TS) = 40 + (60-50) = 50, T2 = T2 + (STM-TS) = 20 + (60-50) = 30, T3 = T3 + (STM-TS) = 0 + (60-50) = 10, and TS = STM = 60. Thereafter, the parallel counter decrease processing unit 104 decrements the value C of the parallel counter 203 by 1 and sets C = 2.
[0075]
Thereafter, the processor return processing unit 113 performs a return process of the processor assigned to the thread A, and further, the thread termination processing unit 114 performs a termination process of the thread A. At this time, since the thread A is not the last thread in the process D (No in step S5), the release processing of the thread shared space 201 is not performed.
[0076]
Next, at time P7 (STM = 65), when the thread C operating in the processor 100-3 issues a termination system call to terminate, control is transferred to the interrupt start processing unit 102 in the processor 100-3. Thereby, the parallel execution time calculation processing unit 103 performs a parallel execution time calculation process (step S7). At this time, since the value C of the parallel counter 203 is “2”, the values T1 and T2 of the first and second entries of the parallel execution time table 202 are updated, and the value TS of the time stamp 204 is updated. That is, T1 = T1 + (STM-TS) = 50 + (65-60) = 55, T2 = T2 + (STM-TS) = 30 + (65-60) = 35, and TS = STM = 65. After that, the parallel counter decrease processing unit 104 decreases the value C of the parallel counter 203 by 1 and sets C = 1 (step S8).
[0077]
Thereafter, the processor return processing unit 113 performs return processing of the processor assigned to the thread C, and further, the thread termination processing unit 114 performs termination processing of the thread C. At this time, since the thread C is not the last thread in the process D (No in step S5), the release processing of the thread shared space 201 is not performed.
[0078]
Next, at time P8 (STM = 70), when the thread B operating in the processor 100-2 issues a termination system call to terminate, control is transferred to the interrupt start processing unit 102 in the processor 100-2. Thereby, the parallel execution time calculation processing unit 103 performs a parallel execution time calculation process (step S7). At this time, since the value C of the parallel counter 203 is “1”, the value T1 of the first entry of the parallel execution time table 202 is updated and the value TS of the time stamp 204 is updated. That is, T1 = T1 + (STM-TS) = 55 + (70-65) = 60, and TS = STM = 70. Thereafter, the parallel counter decrease processing unit 104 subtracts 1 from the value C of the parallel counter 203, and sets C = 0 (step S8).
[0079]
Thereafter, the processor return processing unit 113 performs return processing of the processor assigned to the thread B, and further, the thread termination processing unit 114 performs termination processing of the thread B. At this time, since the thread B is the last thread in the process D (Step S5: Yes), the thread termination processing unit 114 releases the thread shared space 201 (Step S6).
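The walk-through from P1 to P8 can be replayed in a few lines to check the arithmetic. This is a simplified sketch under the same zero-duration interrupt assumption; the event encoding, the function names, and the table size of 3 are choices made here for illustration and do not come from the source.

```python
def account(state, now):
    """Steps S7/S13: add (STM - TS) to T1..TC, then set TS = STM."""
    for i in range(state["C"]):
        state["T"][i] += now - state["TS"]
    state["TS"] = now


def replay_fig3(n=3):
    """Replay the eight interrupt points of the three-thread example."""
    state = {"T": [0] * n, "C": 0, "TS": 0}
    # (STM, kind): "enter" = S7 then S8, "leave" = S13 then S14.
    events = [
        (10, "leave"),  # P1: processor assigned to thread A
        (25, "enter"),  # P2: A issues the thread generation call for B
        (25, "leave"),  # P2: A returns to the user program
        (25, "leave"),  # P2: processor assigned to thread B
        (35, "enter"),  # P3: B issues a sleep call
        (40, "leave"),  # P4: processor assigned to B again
        (50, "enter"),  # P5: B issues the thread generation call for C
        (50, "leave"),  # P5: B returns to the user program
        (50, "leave"),  # P5: processor assigned to thread C
        (60, "enter"),  # P6: A terminates
        (65, "enter"),  # P7: C terminates
        (70, "enter"),  # P8: B terminates
    ]
    for now, kind in events:
        account(state, now)
        state["C"] += 1 if kind == "leave" else -1
    return state
```

Running `replay_fig3()` ends with T1 = 60, T2 = 35, T3 = 10, C = 0, and TS = 70, matching the values derived step by step in the text.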
[0080]
With the above processing, the parallel execution time of the parallel program composed of the threads A, B, and C is calculated, and the parallel execution time (T1, T2, T3) at any moment can easily be obtained by issuing the parallel execution time acquisition system call at that moment. For example, when the thread B issues a parallel execution time acquisition system call at time P8, the thread B can acquire the parallel execution time (T1 = 60, T2 = 35, T3 = 10). After acquiring the parallel execution time, the thread B, for example, displays it on a display unit (not shown), or calculates from it the degree of 2-parallelism = 35/60 = 0.5833 (approximately 58.3%) and the degree of 3-parallelism = 10/60 = 0.1667 (approximately 16.7%) and displays them on the display unit.
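The ratio calculation in this example amounts to dividing each Tn by T1. A minimal sketch follows; the function name `parallel_ratios` and the empty result for T1 = 0 are assumptions added here, since the source does not say how a zero T1 should be handled.

```python
def parallel_ratios(table):
    """Return {n: Tn / T1} for n >= 2, or {} when T1 is 0."""
    t1 = table[0]
    if t1 == 0:
        return {}  # assumed behaviour: nothing to report before any run time
    return {n: table[n - 1] / t1 for n in range(2, len(table) + 1)}
```

With the acquired values (60, 35, 10), the entry for n = 2 is 35/60 and the entry for n = 3 is 10/60.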
[0081]
In the above-described embodiment, the thread shared space 201 is generated on the main storage device 200. However, the thread shared space 201 may instead be generated on another storage device shared by the processors 100-1 to 100-N.
[0082]
[Effects of the invention]
A first effect of the present invention is that even when a plurality of parallel programs are operating on a multiprocessor computer, the parallel execution time of each parallel program can be obtained. The reason is that the parallel execution time is managed using a parallel execution time table specific to the process or the job.
[0083]
The second effect of the present invention is that up to (N-1) degrees of parallelism, from 2-parallelism to N-parallelism, can be acquired as the degrees of parallelism during execution of a parallel program, so that the parallelism can be verified in detail. The reason is that the parallel execution time is managed using a parallel execution time table that has, for each of a plurality of thread counts (1, 2,..., N), an area for registering the time during which processors are simultaneously allocated to at least that number of threads.
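Because each entry Tn accumulates while at least n threads hold a processor, the time spent at exactly n threads can be derived as Tn minus the next entry. This derivation is not stated in the source; the sketch below, with the hypothetical name `exact_parallel_times`, only illustrates what the cumulative table makes possible.

```python
def exact_parallel_times(table):
    """Convert cumulative entries [T1..TN] into times at exactly n threads."""
    padded = list(table) + [0]  # treat T(N+1) as 0
    return [padded[i] - padded[i + 1] for i in range(len(table))]
```

For a table of (60, 35, 10) this gives 25, 25, and 10 time units at exactly one, two, and three running threads, which sum to T1.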
[Brief description of the drawings]
FIG. 1 is a block diagram of an embodiment of the present invention.
FIG. 2 is a flowchart showing a processing example of FIG. 1.
FIG. 3 is a diagram for explaining the operation of the embodiment by giving a specific example.
[Explanation of symbols]
100-1 to 100-N: Processor
101: User program execution processing unit
102: Interrupt start processing unit
103: Parallel execution time calculation processing unit
104: Parallel counter decrease processing unit
105: Parallel execution time acquisition processing unit
106: Processing unit
107: Parallel execution time calculation processing unit
108: Parallel counter increase processing unit
109: Interrupt end processing unit
110: Thread start processing unit
111: Processor allocation processing unit
112: Wait processing unit
113: Processor return processing unit
114: Thread end processing unit
200: Main storage device
201: Thread shared space
202: Parallel execution time table
203: Parallel counter
204: Time stamp
205: System timer
K: Recording medium

Claims

In a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors,
Each of the processors,
Means for generating on the main storage device, when the thread generated by the own processor is the first thread in the process to which the thread belongs, a parallel execution time table that is unique to the process and has, for each of a plurality of thread counts, an area for registering the time during which processors are simultaneously allocated to at least that number of threads;
When assigning the own processor to a thread in the process, and when returning the own processor from the thread in the process, the number of threads in the process to which the processor is currently assigned, the current time, Means for updating the parallel execution time table based on the content of the parallel execution time table and the latest update time for the parallel execution time table,
Means for acquiring the contents of the parallel execution time table.

The multiprocessor computer according to claim 1,
A multiprocessor computer, wherein the parallel execution time table is generated on a storage device shared by the plurality of processors other than the main storage device.

In a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors,
Each of the processors,
Means for generating on the main storage device, when the thread created by the own processor is the first thread in the process to which the thread belongs, a thread shared space that is unique to the process and includes a parallel execution time table having, for each of a plurality of thread counts, an area for registering the time during which processors are simultaneously allocated to at least that number of threads, a parallel counter in which the number of threads in the process to which a processor is currently allocated is set, and a time stamp in which the latest update time of the parallel execution time table is set;
Means for updating, when allocating its own processor to a thread in the process, the content of the parallel execution time table based on the current time, the content of the parallel execution time table, the content of the parallel counter, and the content of the time stamp, setting the current time in the time stamp, and further increasing the value of the parallel counter;
Means for updating, when returning the own processor from a thread in the process, the content of the parallel execution time table based on the current time, the content of the parallel execution time table, the content of the parallel counter, and the content of the time stamp, setting the current time in the time stamp, and further decreasing the value of the parallel counter;
Means for acquiring the contents of the parallel execution time table.
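
The thread shared space of this claim groups three items: the table, the parallel counter, and the time stamp. A minimal sketch of how they might interact is given below. The class name `ThreadSharedSpace`, its method names, and the use of a lock to serialize updates from different processors are all assumptions made for illustration; the claim itself does not state how concurrent updates are coordinated.

```python
import threading


class ThreadSharedSpace:
    """Hypothetical per-process shared area: table, parallel counter, time stamp."""

    def __init__(self, max_threads, now):
        self.table = [0.0] * max_threads  # table[k - 1]: time with >= k threads running
        self.parallel_counter = 0         # threads currently holding a processor
        self.time_stamp = now             # time of the latest table update
        self._lock = threading.Lock()     # assumption: updates are serialized

    def _account(self, now):
        # Credit the interval since the last update to entries 1..parallel_counter.
        elapsed = now - self.time_stamp
        for k in range(min(self.parallel_counter, len(self.table))):
            self.table[k] += elapsed
        self.time_stamp = now

    def on_allocate(self, now):
        """A processor is allocated to a thread: update, stamp, then count up."""
        with self._lock:
            self._account(now)
            self.parallel_counter += 1

    def on_return(self, now):
        """A processor is taken back from a thread: update, stamp, then count down."""
        with self._lock:
            self._account(now)
            self.parallel_counter -= 1

    def acquire(self):
        """Return a copy of the table contents."""
        with self._lock:
            return list(self.table)


space = ThreadSharedSpace(max_threads=4, now=0.0)
for t in (0.0, 2.0, 3.0):
    space.on_allocate(t)
for t in (7.0, 8.0, 12.0):
    space.on_return(t)
print(space.acquire())  # [12.0, 6.0, 4.0, 0.0]
```

The table is updated before the counter changes, so each interval is credited according to the number of threads that were actually running during it.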

The multiprocessor computer according to claim 3,
wherein the thread shared space is generated on a storage device, other than the main storage device, that is shared by the plurality of processors.

A multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors,
each of the processors comprising:
means for generating on the main storage device, when a process generated on its own processor is the first process in the job to which that process belongs, a parallel execution time table unique to the job, the table having, for each of a plurality of process counts, an area that registers the time during which processors are simultaneously allocated to at least that number of processes;
means for updating the parallel execution time table, both when allocating its own processor to a process in the job and when taking its own processor back from a process in the job, based on the number of processes in the job to which processors are currently allocated, the current time, the contents of the parallel execution time table, and the time of the latest update to the parallel execution time table; and
means for acquiring the contents of the parallel execution time table.
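
The "first process in the job" condition means that each job receives exactly one table of its own, which is what allows several parallel programs to be measured separately. The sketch below illustrates only that creation step. The class `JobTimeTables`, the `job_id` keys, and the per-job process count are hypothetical and are not described in the claim.

```python
class JobTimeTables:
    """Hypothetical registry holding one parallel execution time table per job."""

    def __init__(self, max_processes):
        self.max_processes = max_processes
        self.tables = {}         # job_id -> table (entry k - 1: time with >= k processes running)
        self.process_count = {}  # job_id -> number of processes created so far

    def on_process_created(self, job_id):
        """Generate the job's table only when its first process is created.

        Returns True if a new table was generated by this call.
        """
        first = self.process_count.get(job_id, 0) == 0
        self.process_count[job_id] = self.process_count.get(job_id, 0) + 1
        if first:
            self.tables[job_id] = [0.0] * self.max_processes
        return first


registry = JobTimeTables(max_processes=8)
created = [
    registry.on_process_created("job-A"),  # first process of job-A
    registry.on_process_created("job-A"),  # second process reuses the table
    registry.on_process_created("job-B"),  # a different job gets its own table
]
print(created)  # [True, False, True]
```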

A multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors,
each of the processors comprising:
means for generating on the main storage device, when a process generated on its own processor is the first process in the job to which that process belongs, a process shared space unique to the job, the process shared space including a parallel execution time table having, for each of a plurality of process counts, an area that registers the time during which processors are simultaneously allocated to at least that number of processes, a parallel counter in which the number of processes in the job to which processors are currently allocated is set, and a time stamp in which the time of the latest update to the parallel execution time table is set;
means for, when allocating its own processor to a process in the job, updating the contents of the parallel execution time table based on the current time, the contents of the parallel execution time table, the contents of the parallel counter, and the contents of the time stamp, setting the current time in the time stamp, and then increasing the value of the parallel counter;
means for, when taking its own processor back from a process in the job, updating the contents of the parallel execution time table based on the current time, the contents of the parallel execution time table, the contents of the parallel counter, and the contents of the time stamp, setting the current time in the time stamp, and then decreasing the value of the parallel counter; and
means for acquiring the contents of the parallel execution time table.

A program for causing a processor that is a component of a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors to function as:
means for generating on the main storage device, when a thread generated on its own processor is the first thread in the process to which that thread belongs, a parallel execution time table unique to the process, the table having, for each of a plurality of thread counts, an area that registers the time during which processors are simultaneously allocated to at least that number of threads;
means for updating the parallel execution time table, both when allocating its own processor to a thread in the process and when taking its own processor back from a thread in the process, based on the number of threads in the process to which processors are currently allocated, the current time, the contents of the parallel execution time table, and the time of the latest update to the parallel execution time table; and
means for acquiring the contents of the parallel execution time table.

A program for causing a processor that is a component of a multiprocessor computer including a plurality of processors and a main storage device shared by the plurality of processors to function as:
means for generating on the main storage device, when a process generated on its own processor is the first process in the job to which that process belongs, a parallel execution time table unique to the job, the table having, for each of a plurality of process counts, an area that registers the time during which processors are simultaneously allocated to at least that number of processes;
means for updating the parallel execution time table, both when allocating its own processor to a process in the job and when taking its own processor back from a process in the job, based on the number of processes in the job to which processors are currently allocated, the current time, the contents of the parallel execution time table, and the time of the latest update to the parallel execution time table; and
means for acquiring the contents of the parallel execution time table.
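
Once the table contents have been acquired, a caller could derive further figures from them. The sketch below is one possible reading, assuming the table stores "at least k" times as described in the claims and is large enough to cover the highest concurrency reached. The function `summarize` and the notion of "average parallelism" are illustrative additions, not part of the claims.

```python
def summarize(table):
    """Derive per-level times and average parallelism from an 'at least k' table.

    table[k - 1] is the time during which at least k threads (or processes)
    held a processor. Hypothetical helper for illustration only.
    """
    # Time with exactly k running = time with >= k minus time with >= k + 1.
    exactly = [
        table[k] - (table[k + 1] if k + 1 < len(table) else 0)
        for k in range(len(table))
    ]
    busy_time = table[0] if table else 0  # time with at least one running
    processor_time = sum(table)           # summing the entries gives total processor time
    average = processor_time / busy_time if busy_time else 0.0
    return exactly, average


exactly, average = summarize([12.0, 6.0, 4.0, 0.0])
print(exactly)  # [6.0, 2.0, 4.0, 0.0]
print(round(average, 3))  # 1.833
```

Summing the entries yields total processor time because an instant with n running threads is counted once in each of the first n entries.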