JP3546694B2

JP3546694B2 - Multi-thread computer system and multi-thread execution control method

Info

Publication number: JP3546694B2
Application number: JP10397098A
Authority: JP
Inventors: 淳嗣酒井
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1998-03-31
Filing date: 1998-03-31
Publication date: 2004-07-28
Anticipated expiration: 2018-03-31
Also published as: JPH11282815A

Description

【０００１】
【発明の属する技術分野】
本発明は並列計算機システムに関し、特に、共有メモリ型マルチプロセッサ計算機上で、オペレーティングシステムを介することなく複数スレッドを効率的にスケジューリングするマルチスレッド計算機システムに関する。
【０００２】
【従来の技術】
より高い演算処理性能を得るために、一つのシステム内に複数のプロセッサエレメントを備えるマルチプロセッサ構成の計算機システムがある。そのような計算機システムのうち、各プロセッサエレメントが主記憶メモリを共有する構成のものは共有メモリ型マルチプロセッサ計算機システムと呼ばれ、分散メモリ型システムに比べプログラムの記述が容易であるという利点を持つ。
【０００３】
他方、ソフトウェア面では、一つのプロセスをスレッドと呼ばれる制御の流れに分割し、複数のスレッドを並行して実行する、マルチスレッド実行と呼ばれる並列実行方式がある。マルチプロセッサ計算機システム上では、複数のスレッドを複数のプロセッサエレメントに割り当てて同時に実行させることで処理性能が向上する。
【０００４】
マルチプロセッサ計算機システムにおけるスレッドの管理は、通常、マルチプロセッサ用オペレーティングシステムが行なう。すなわち、新しいスレッドの生成、別のスレッドとの同期、スレッドの消滅等を行なう場合、当該スレッドはオペレーティングシステムのサービスを呼び出す。
【０００５】
スレッドのプリエンプションもオペレーティングシステムを介して行なわれる。プリエンプションとは、長時間プロセッサを占有し続けるスレッドの実行を中断させ、他のスレッドの実行に切り替える処理である。プリエンプション機能をもつ計算機システムは、ハードウェアタイマ装置と、所定の時間が経過するとタイマ装置からプロセッサに割り込み信号を伝える仕組みを持っている。ユーザプログラムの実行を開始してから一定の時間が経過すると、タイマ割り込みによってオペレーティングシステムに制御が移り、オペレーティングシステムが実行スレッドの切り替えを行なう。
【０００６】
マルチプロセッサ計算機システム上で複数のスレッドを効率良く実行するには、スレッドの生成、同期、切り替え、消滅といったスレッド管理のオーバーヘッドを低減させることが重要である。
【０００７】
サン・マイクロシステムズのソラリス（Ｓｏｌａｒｉｓ）オペレーティングシステムでは、オペレーティングシステムではなく、ユーザレベルのライブラリによってスレッドの管理を行なうことができる。これにより、より低いオーバーヘッドでスレッド管理ができるが、プリエンプションの場合にはオペレーティングシステムが介在する（マルチスレッドプログラミング入門、アスキー出版局、１９９６年９月、ｐ６９〜ｐ７０およびｐ７６〜ｐ７７）。
【０００８】
特開平５−１５８９００号公報に記載されているプリエンプション処理回路は、オペレーティングシステムではなく、専用ハードウェアによってプリエンプションの受け付けを行なうものであるが、受け付け後の切り替え処理はオペレーティングシステムへの通常の割り込みによって行なっている。
【０００９】
他方、複数のスレッドによって共有される計算機資源、例えば主記憶メモリ上の共有変数を読み書きする場合には、複数のスレッドがその計算機資源を同時にアクセスすることで誤った処理結果を得ることを回避するために、排他制御と呼ばれる処理を行なう必要がある（排他制御については、例えば、Ａ．Ｓ．タネンバウム原著「ＯＳの基礎と応用」プレンティスホール・トッパン、第２章２．２節等に説明されている）。
【００１０】
排他制御は通常、テストアンドセットやエクスチェンジといった排他制御用のプロセッサ命令によって実現される。これらの命令は、主記憶上のあるメモリセルの値の検査とそのメモリセルへの新しい値の設定とを不可分に行なう。例えば、テストアンドセット命令は、以下の３つの操作を、割り込みあるいは他のプロセッサによって分断されること無く行なう。
（１）主記憶上のあるメモリセルの値の計算機レジスタへの読み出し
（２）そのメモリセルへの値１の書き込み
（３）計算機レジスタへ読み出した値と値０との比較
【００１１】
共有メモリ型マルチプロセッサ計算機システムでは、異なるプロセッサ間の排他制御を行なうため、これらの排他制御用命令は共有している主記憶メモリに対して作用する。すなわち、排他制御自身のために主記憶メモリへのアクセスが必要になる。
【００１２】
【発明が解決しようとする課題】
従来技術の第１の問題点は、マルチスレッド実行のために十分高速な割り込み応答処理ができず、スレッド切り替えのオーバーヘッドが大きかった点である。これは、割り込み処理がオペレーティングシステム内で処理されるため、ユーザプログラムからオペレーティングシステムへのコンテキストの切り替えが発生するからである。
【００１３】
第２の問題点は、排他制御処理自身のために主記憶メモリへのアクセスが発生し、マルチスレッド処理に不可欠なスレッド間の排他制御や同期処理のオーバーヘッドが大きかった点である。近年のプロセッサの処理速度は主記憶メモリのアクセス速度よりも格段に速いため、多くのプロセッサはキャッシュメモリと呼ばれる高速で小容量のメモリをプロセッサと主記憶メモリの間に備え、プロセッサは専らキャッシュメモリをアクセスすることでメモリアクセスによる処理の遅延を回避している。しかし、排他制御命令は常に主記憶メモリへのアクセスを行なうため、処理速度の低下を招く。
【００１４】
上に挙げた問題点はいずれもマルチスレッド処理のオーバーヘッドを増大させる原因となり、特に、粒度の小さいスレッド（含有する命令数の少ないスレッド）を単位としてマルチスレッド処理を行なう場合に大きな影響をもたらす。
【００１５】
【発明の目的】
本発明の目的は、オペレーティングシステムが介在しない高速なユーザレベル割り込みを提供し、マルチスレッド処理の効率を向上させることである。
【００１６】
本発明の他の目的は、マルチプロセッサ計算機システムにおけるプロセッサエレメント間の高速なロック機構を提供し、マルチスレッド処理の効率を向上させることである。
【００１７】
【課題を解決するための手段】
本発明の第１のマルチスレッド計算機システムは、複数のプロセッサエレメントを含むプロセッサと前記複数のプロセッサエレメントで共有される主記憶メモリとを備え、一つのユーザプロセスを粒度の小さい複数のスレッドに分割し、そのユーザプロセス内のスレッドスケジューラの制御の下に複数のスレッドを複数のプロセッサエレメントに割り当てて同時に実行するマルチスレッド計算機システムにおいて、
前記プロセッサエレメントのそれぞれは、レジスタ集合、演算ユニット及び制御ユニットに加えて、クロック毎に値が減じられていくカウンタと、該カウンタの値がゼロになった場合に割り込み要求信号を発生するゼロ比較器と、ユーザプロセス内のスレッドスケジューラの置かれているメモリアドレスを保持するユーザハンドラレジスタと、プログラムカウンタの値の退避先レジスタであるユーザ退避ＰＣとを含んで構成され、
前記プロセッサは、前記ゼロ比較器から前記割り込み要求信号が発生したプロセッサエレメントにてユーザレベル割り込み処理を開始し、プログラムカウンタの値を前記ユーザ退避ＰＣに設定すると共に前記ユーザハンドラレジスタの値を前記プログラムカウンタに設定する割り込み制御部と、プロセッサエレメント間で排他的に値の操作が行なえる計算機命令セットによって各プロセッサエレメントからアクセス可能な１ビットの記憶装置の集合とを含んで構成され、
前記ユーザプロセス内のスレッドスケジューラは、次に実行すべきスレッドが割り当てられるプロセッサエレメント内の前記カウンタに当該次に実行すべきスレッドに割り当てるタイムクォンタム値を設定すると共に前記ユーザハンドラレジスタに自スレッドスケジューラの置かれているメモリアドレスを設定して、前記プロセッサエレメントに当該次に実行すべきスレッドの実行を開始させるものであり、
複数のプロセッサエレメントで実行される複数のスレッド間の排他制御は、前記１ビットの記憶装置の集合を用いて行なうものである。
【００１８】
また、本発明の第２のマルチスレッド計算機システムは、複数のプロセッサエレメントを含むプロセッサと前記複数のプロセッサエレメントで共有される主記憶メモリとを備え、一つのユーザプロセスを粒度の小さい複数のスレッドに分割し、そのユーザプロセス内のスレッドスケジューラの制御の下に複数のスレッドを複数のプロセッサエレメントに割り当てて同時に実行するマルチスレッド計算機システムにおいて、
前記プロセッサエレメントのそれぞれは、レジスタ集合、演算ユニット及び制御ユニットに加えて、クロック毎に値が減じられていくカウンタと、該カウンタの値がゼロになった場合に割り込み要求信号を発生するゼロ比較器と、ユーザプロセス内のスレッドスケジューラの置かれているメモリアドレスを保持するユーザハンドラレジスタと、プログラムカウンタの値の退避先レジスタであるユーザ退避ＰＣとを含んで構成され、
前記プロセッサは、前記ゼロ比較器から前記割り込み要求信号が発生したプロセッサエレメントにてユーザレベル割り込み処理を開始し、プログラムカウンタの値を前記ユーザ退避ＰＣに設定すると共に前記ユーザハンドラレジスタの値を前記プログラムカウンタに設定する割り込み制御部と、プロセッサエレメントに対応した識別番号をもつトークンを到着順に格納するキュー構造の集合であって、プロセッサエレメント間で排他的にトークンの追加、検索あるいは削除が行なえる計算機命令セットによって各プロセッサエレメントからアクセスできるキュー構造の集合とを含んで構成され、
前記ユーザプロセス内のスレッドスケジューラは、次に実行すべきスレッドが割り当てられるプロセッサエレメント内の前記カウンタに当該次に実行すべきスレッドに割り当てるタイムクォンタム値を設定すると共に前記ユーザハンドラレジスタに自スレッドスケジューラの置かれているメモリアドレスを設定して、前記プロセッサエレメントに当該次に実行すべきスレッドの実行を開始させるものであり、
複数のプロセッサエレメントで実行される複数のスレッド間の排他制御は、前記キュー構造の集合を用いて行なうものである。
【００１９】
また本発明の第１のマルチスレッド実行制御方法は、複数のプロセッサエレメントを含むプロセッサと前記複数のプロセッサエレメントで共有される主記憶メモリとを備え、一つのユーザプロセスを粒度の小さい複数のスレッドに分割し、そのユーザプロセス内のスレッドスケジューラの制御の下に複数のスレッドを複数のプロセッサエレメントに割り当てて同時に実行するマルチスレッド計算機システムにおけるマルチスレッド実行制御方法において、オペレーティングシステムが介在しない高速なユーザレベル割り込みを提供するために、
（ａ）ユーザプロセスのスレッドが割り当てられるプロセッサエレメント内のユーザハンドラレジスタにそのユーザプロセスのスレッドスケジューラの置かれているメモリアドレスを設定すると共に、そのプロセッサエレメント内のカウンタにそのスレッドに割り当てるタイムクォンタム値を設定して、そのプロセッサエレメントでそのスレッドの実行を開始させ、実行を開始させたスレッドと他のスレッドとの間の排他制御は、プロセッサ内に設けられ且つプロセッサエレメント間で排他的に値の操作が行なえる計算機命令セットによって各プロセッサエレメントからアクセス可能な１ビットの記憶装置の集合を用いて行なわせる段階
（ｂ）プロセッサエレメントにおけるスレッドの実行開始と同時にそのプロセッサエレメント内の前記カウンタの値を一定周期で更新し、予め定められたカウント値に達した時点でユーザレベル割り込みを発生させる段階
（ｃ）ユーザレベル割り込みの処理において、割り込み要求元のプロセッサエレメントの現在のプログラムカウンタの値をそのプロセッサエレメント内のユーザ退避ＰＣに設定し、そのプロセッサエレメント内のユーザハンドラレジスタに設定されたメモリアドレスをプログラムカウンタに設定することにより制御をユーザプロセス内のスレッドスケジューラに移す段階
を含むことを特徴とする。
【００２０】
また、本発明の第２のマルチスレッド実行制御方法は、複数のプロセッサエレメントを含むプロセッサと前記複数のプロセッサエレメントで共有される主記憶メモリとを備え、一つのユーザプロセスを粒度の小さい複数のスレッドに分割し、そのユーザプロセス内のスレッドスケジューラの制御の下に複数のスレッドを複数のプロセッサエレメントに割り当てて同時に実行するマルチスレッド計算機システムにおけるマルチスレッド実行制御方法において、オペレーティングシステムが介在しない高速なユーザレベル割り込みを提供するために、
（ａ）ユーザプロセスのスレッドが割り当てられるプロセッサエレメント内のユーザハンドラレジスタにそのユーザプロセスのスレッドスケジューラの置かれているメモリアドレスを設定すると共に、そのプロセッサエレメント内のカウンタにそのスレッドに割り当てるタイムクォンタム値を設定して、そのプロセッサエレメントでそのスレッドの実行を開始させ、実行を開始させたスレッドと他のスレッドとの間の排他制御は、プロセッサエレメントに対応した識別番号をもつトークンを到着順に格納するプロセッサ内のキュー構造の集合であって、プロセッサエレメント間で排他的にトークンの追加、検索あるいは削除が行なえる計算機命令セットによって各プロセッサエレメントからアクセスできるキュー構造の集合を用いて行なわせる段階
（ｂ）プロセッサエレメントにおけるスレッドの実行開始と同時にそのプロセッサエレメント内の前記カウンタの値を一定周期で更新し、予め定められたカウント値に達した時点でユーザレベル割り込みを発生させる段階
（ｃ）ユーザレベル割り込みの処理において、割り込み要求元のプロセッサエレメントの現在のプログラムカウンタの値をそのプロセッサエレメント内のユーザ退避ＰＣに設定し、そのプロセッサエレメント内のユーザハンドラレジスタに設定されたメモリアドレスをプログラムカウンタに設定することにより制御をユーザプロセス内のスレッドスケジューラに移す段階
を含むことを特徴とする。
【００２１】
【作用】
カウンタがゼロになると割り込み制御部はユーザレベル割り込み処理を開始する。ユーザレベル割り込み処理では、予めユーザプロセスのスレッドスケジューラの開始アドレスに設定しておいたユーザハンドラレジスタの値をプログラムカウンタに設定し、オペレーティングシステムを経由すること無くスレッドスケジューラに制御を移行する。
【００２２】
複数のプロセッサエレメントで実行される複数のスレッド間の排他制御は、プロセッサ内に設けられた１ビットの記憶装置の集合またはキュー構造の集合を用いて行なわれ、主記憶メモリへアクセスする必要のないロック獲得及び解放機能を提供する。
【００２３】
【発明の実施の形態】
【構成の説明】
本発明の実施の形態について、図面を参照して詳細に説明する。
【００２４】
図１は、本発明が実施される計算機構成の一例を示す図である。プロセッサ１は内部に複数のプロセッサエレメント１１、１２及び１９を持ち、それらのプロセッサエレメントは共通の主記憶メモリ２に対してアクセスする。
【００２５】
図２を参照すると、本発明の第１の実施の形態は、内部に複数のプロセッサエレメントを有するプロセッサ１と、そのプロセッサ上で動作するオペレーティングシステム５０と、そのオペレーティングシステム上で動作するユーザプロセス１００とから構成される。
【００２６】
プロセッサ１内の各プロセッサエレメントは、一般的な計算機のプロセッサが持つレジスタ集合や演算ユニット、制御ユニット等の他に、ユーザレベルのプリエンプション割り込みを発生させるためのカウンタ２６とゼロ比較器２５、ユーザレベル割り込み発生時の制御移動に使用されるユーザハンドラレジスタ２０、ユーザ退避ＰＣ２３を備えている。
【００２７】
カウンタ２６の値は、クロック信号によってクロック毎に減算される。ゼロ比較器２５は、カウンタ２６の値とゼロとを比較する。ユーザハンドラレジスタ２０はユーザレベル割り込み発生時に実行制御を移すべきプログラムカウンタ値を保持しているレジスタであり、ユーザ退避ＰＣ２３はユーザレベル割り込みの発生直前のプログラムカウンタ値を退避しておくためのレジスタである。
【００２８】
図２のプログラムカウンタ（ＰＣ）２２及び割り込み制御部４１は一般的な計算機のプロセッサが持つプログラムカウンタ及び割り込み制御部と同等の機能を有するものであり、カーネルハンドラレジスタ２１及びカーネル退避ＰＣ２４は一般的な計算機のプロセッサにおける通常の割り込み発生時に用いられる割り込みハンドラレジスタ及びプログラムカウンタ退避用レジスタと同等の機能を有するものである。
【００２９】
ユーザプロセス１００はオペレーティングシステム５０により生成される実行プログラムの実体である。ユーザプロセス１００は主記憶メモリ空間を共有する複数のスレッド１０２、１０３及び１０４から構成され、これらのスレッドがプロセッサエレメントに割り当てられ、実行される。スレッドスケジューラ１０１はユーザプログラムとリンクされた形でユーザプロセス１００の内部に存在し、当該ユーザプロセスを構成する全スレッドの管理を行なう。
【００３０】
また、プロセッサ内には全プロセッサエレメントで共有されるロック変数セット３０とアクセス調停機構４０がある。ロック変数セット３０は１ビットの状態を記憶できるロック変数の集合であり、アクセス調停機構４０はロック変数へのアクセスをプロセッサエレメント間で調停する機構である。
【００３１】
【動作の説明】
図２を参照して、本実施の形態の動作について詳細に説明する。
【００３２】
オペレーティングシステム５０がユーザプロセス１００を開始させると、ユーザプロセス１００はその内部に組み込んだスレッドスケジューラ１０１を呼び出す。スレッドスケジューラ１０１は、ある決まったスケジューリングアルゴリズムに従って当該ユーザプロセス１００内のスレッドの集合をスケジューリングする。スレッドスケジューラ１０１が次に実行すべきスレッド（ここではスレッド１０２とする）を選択すると、スレッドスケジューラ１０１はスレッド１０２に割り当てるタイムクォンタム値をスレッド１０２が割り当てられるプロセッサエレメント１１のカウンタ２６に設定し、スレッドスケジューラ１０１の置かれているメモリアドレスを当該プロセッサエレメント１１のユーザハンドラレジスタ２０に設定した後、当該プロセッサエレメント１１でスレッド１０２の実行を開始させる。
【００３３】
スレッド１０２が実行されるのと並行して、カウンタ２６の値がクロック信号によって一定時間毎に減じられてゆく。その値がゼロになると、ゼロ比較器２５が割り込み制御部４１に割り込み要求信号を送る。割り込み制御部４１は当該プロセッサエレメント１１にてユーザレベル割り込みの処理を開始する。
【００３４】
ユーザレベル割り込み処理が開始すると、プロセッサエレメント１１は現在のプログラムカウンタ２２の値をユーザ退避ＰＣ２３に設定し、ユーザハンドラレジスタ２０の値をプログラムカウンタ２２に設定する。これにより、オペレーティングシステム５０を介することなく、ユーザプロセス１００の内部にあるスレッドスケジューラ１０１に速やかに制御が移行する。スレッドスケジューラ１０１は、レジスタセットの値をはじめとするスレッド１０２の実行状態をスレッドスケジューラ１０１の内部の管理データ領域に保存し、定められたスケジューリングアルゴリズムに従って次に実行開始すべきスレッドを選択し、次スレッドのレジスタセットの値を復元すると共に、カウンタ２６にそのスレッドに対応するタイムクォンタム値を設定して次スレッドの実行を開始する。
【００３５】
これに対して、従来の割り込み処理を用いる場合を以下に説明する。まず、割り込み処理開始時にユーザプログラムからオペレーティングシステムにコンテキストを切り替える。このコンテキスト切り替えには、
○現在のプログラムカウンタ２２の値のカーネル退避ＰＣ２４への設定
○カーネルハンドラレジスタ２１の値のプログラムカウンタ２２への設定
○プロセッサの動作モードのカーネルモードへの切り替え
○必要なレジスタセットの内容の主記憶メモリへの退避
といった処理が含まれる。このコンテキスト切り替えによって制御はオペレーティングシステム５０内部のカーネルレベル割り込みハンドラ５１、次いでプロセススケジューラ５２に移行する。プロセススケジューラ５２は割り込み要因を分析し、あらかじめ定められたスケジューリングアルゴリズムに従って次に実行すべきプロセスを決定する。そして、再度コンテキスト切り替えを行なって次プロセスの実行を開始する。このコンテキスト切り替えには、
○次プロセスのためのレジスタセットの設定
○次プロセス用論理アドレス空間の設定
○プロセッサの動作モードのユーザモードへの切り替え処理
が含まれる。これら一連の処理を終えて、ようやく制御がユーザプロセスに移行する。
【００３６】
つまり、ユーザレベル割り込み機構の導入により、オペレーティングシステムを経由することによるオーバーヘッドの無い、従来より高速なプリエンプション機構をユーザプログラムで利用することが可能になる。
【００３７】
なお、ユーザレベル割り込み機構の割り込み要因は、ダウンカウンタのゼロ一致によるプリエンプション割り込みに限定されるものではない。割り込み制御部さえ対応させれば、例えば、ユーザプロセス内で無効命令割り込みを処理するなどの目的に利用することも可能である。
【００３８】
次に図２を参照してロック変数セット３０を用いた高速ロック機構について説明する。各プロセッサエレメントは、本プロセッサに備わっているテストアンドセット命令を用いてロック変数セット３０の内部にあるロック変数にアクセスする。あるプロセッサエレメント１１がロック変数に対してテストアンドセット命令を用いると、アクセス調停機構４０は他のプロセッサエレメントからロック変数へのアクセスを一時的に禁止し、その間にプロセッサエレメント１１はロック変数の値をゼロと比較してプロセッサエレメント１１内の状態フラグに反映させ、ロック変数に値１を設定する。ソフトウェア側から見ると、本実施例で述べたテストアンドセット命令の動作は、従来のプロセッサが備えているテストアンドセット命令の動作と同様であるが、大きな遅延をもたらす主記憶メモリへのアクセスを伴わないため、従来よりもはるかに少ないオーバーヘッドで変数をロックすることができる。
【００３９】
プロセッサ内に搭載するロック変数の個数は８個、１６個または３２個が適当である。その理由は、ユーザ向けの高機能な排他制御及び同期制御機構はより基本的な排他制御機構を用いて容易に構築可能であり、そのような基本的な排他制御機構が用いるロック変数は少数で済むためと、ロック変数の個数を小さな２の冪乗個とすることでロック変数の番号を指定するための命令フィールド幅を小さくし、命令効率を高めることが可能であるためである。また、ロック変数の個数をレジスタセットの一般的なビット数である３２あるいは６４以内にすることにより、ロック変数セット全体を一つの特殊レジスタとして扱うことが可能になる。これによりロック変数セットをプロセスに対応するコンテキストに含めることが可能になり、プロセス毎に論理的に独立した高速ロック機構を用いることができるようになる。
【００４０】
上に述べたユーザレベル割り込み機構及び高速ロック機構の導入により、従来より細かな単位での並列処理が可能になり、マルチプロセッサ計算機の実効性能を高めると共に、その適用分野を拡大することができる。これが本実施の形態の効果である。
【００４１】
【発明の他の実施の形態】
次に、ユーザレベル割り込み機構の他の実施形態について説明する。
【００４２】
図２において、カウンタ２６の入力はクロック信号となっているが、この信号は計算機システムのクロック信号そのものには限らない。カウンタ２６の入力としてシステムクロック信号を適当な分周器で数分の一に分周したものを用いれば、カウンタ２６を構成するハードウェアのビット数やカウンタ２６に値を設定する計算機命令中のカウンタ値フィールドのビット幅を減らすことができ、ハードウェア及びソフトウェア双方の効率を改善できる。また、システムクロック信号ではなく、当該プロセッサエレメントの命令実行制御機構の信号をカウンタ２６に入力し、１命令実行する毎にカウンタ２６が減じられるようにする方式にしてもよい。
【００４３】
次に、高速ロック機構の他の実施形態について説明する。
【００４４】
図３を参照すると、高速ロック機構の第２の実施の形態は、各ロック変数がキュー構造になっているものである。各キュー６１、６２及び６９には、プロセッサエレメント数に等しい個数までのトークンを格納することができる。トークンは各プロセッサエレメントが発行するもので、トークンには発行したプロセッサエレメントの番号が付される。キューに一つ以上のトークンがある場合、当該キューの先頭にあるトークンを発行したプロセッサエレメントがそのキューに対応するロック変数をロックしていることを意味する。このロック変数を操作するために、ロック試行命令とロック解除命令の２種の計算機命令が備わっている。プロセッサエレメント１１、１２、…、１９とキュー６１、６２、…、６９との間に設けられたキュー制御機構６０は、ロック試行命令、ロック解除命令の実行時、キュー６１、６２、…、６９に対してプロセッサエレメント間で排他的にトークンの追加、検索、削除の操作を行なう。
【００４５】
プロセッサエレメント１１がロック試行命令を実行すると、キュー制御機構６０は、ロック試行命令のオペランドで指定されたキュー内（ここではキュー６１とする）に当該プロセッサエレメント１１のトークンが存在するか否かを検査し、もしトークンが存在しなければ新たにトークンを投入する。また、キュー制御機構６０は指定されたキュー６１の先頭に当該プロセッサエレメント１１に対応するトークンがあるか否かを調べ、その結果をプロセッサエレメント１１の状態フラグに設定する。ユーザプログラムはロック試行命令に続いて状態フラグを参照する分岐命令を実行することで、ロックを獲得できたか否かを判断する。
【００４６】
プロセッサエレメント１１がロック解除命令を実行すると、キュー制御機構６０は、ロック試行命令のオペランドで指定されたキュー内（ここではキュー６１とする）に当該プロセッサエレメント１１のトークンがあるか否かを検査し、もしあればそのトークンをキュー６１から除去する。
【００４７】
本実施形態は、ロックを要求した順序をキュー内で記憶し先着順にロックを獲得させるという点で、より公平な排他制御機構を実現できる特長を持つ。
【００４９】
【発明の効果】
本発明の第１の効果は、オペレーティングシステムを経ないユーザレベルの割り込み機構を実現できることである。これによりスレッド切り替えのためのオーバーヘッドが小さくなり、従来より小さな粒度でのマルチスレッド実行が可能になる。
【００５０】
本発明の第２の効果は、比較的応答速度の遅い主記憶メモリをアクセスすること無くプロセッサエレメント間の排他制御が実現できることである。これによりスレッド管理やスレッド間の通信にかかわるオーバーヘッドが小さくなり、従来より小さな粒度でのマルチスレッド実行が可能になる。
【００５１】
これらの低オーバーヘッド効果により、共有メモリ型マルチプロセッサ計算機の応用分野が拡大され、大きな粒度での並列性が少ないプログラムに対してもマルチスレッド実行による性能向上が可能になる。
【図面の簡単な説明】
【図１】本発明が実施される計算機の構成例を示す図である。
【図２】本発明の第１の実施の形態を示す図である。
【図３】本発明における高速ロック機構の第２の実施の形態を示す図である。
【符号の説明】
１プロセッサ
２主記憶メモリ
１１〜１９プロセッサエレメント
２０ユーザハンドラレジスタ
２１カーネルハンドラレジスタ
２２プログラムカウンタ
２３ユーザ退避ＰＣ
２４カーネル退避ＰＣ
２５ゼロ比較器
２６カウンタ
３０ロック変数セット
４０アクセス調停機構
４１割り込み制御部
５０オペレーティングシステム
５１カーネルレベル割り込みハンドラ
５２プロセススケジューラ
６０キュー制御機構
６１〜６９キュー
１００ユーザプロセス
１０１スレッドスケジューラ
１０２〜１０４スレッド

[0001]
[Technical field of the invention]
The present invention relates to a parallel computer system, and more particularly, to a multi-thread computer system for efficiently scheduling a plurality of threads on a shared memory multiprocessor computer without passing through an operating system.
[0002]
[Prior art]
To obtain higher processing performance, some computer systems use a multiprocessor configuration with a plurality of processor elements in one system. Among such systems, those in which the processor elements share a main memory are called shared memory multiprocessor computer systems, and they have the advantage that programs are easier to write than for a distributed memory system.
[0003]
On the other hand, on the software side, there is a parallel execution method called multi-thread execution in which one process is divided into control flows called threads and a plurality of threads are executed in parallel. On a multiprocessor computer system, processing performance is improved by allocating a plurality of threads to a plurality of processor elements and executing them simultaneously.
[0004]
The management of threads in a multiprocessor computer system is usually performed by a multiprocessor operating system. That is, when creating a new thread, synchronizing with another thread, or deleting a thread, the thread calls a service of the operating system.
[0005]
Thread preemption is also performed through the operating system. Preemption is a process of interrupting the execution of a thread that has occupied the processor for a long time and switching to the execution of another thread. A computer system having a preemption function has a hardware timer device and a mechanism for transmitting an interrupt signal from the timer device to a processor when a predetermined time has elapsed. When a certain time has elapsed since the start of the execution of the user program, the control is transferred to the operating system by a timer interrupt, and the operating system switches the execution thread.
[0006]
In order to efficiently execute a plurality of threads on a multiprocessor computer system, it is important to reduce thread management overhead such as thread generation, synchronization, switching, and deletion.
[0007]
In the Sun Microsystems Solaris operating system, threads can be managed by a user-level library instead of the operating system. This allows for thread management with lower overhead, but in the case of preemption the operating system intervenes (Introduction to Multithread Programming, ASCII Publishing, September 1996, p69-p70 and p76-p77).
[0008]
The preemption processing circuit described in JP-A-5-158900 accepts preemption requests by dedicated hardware rather than by the operating system, but the switching process after acceptance is still performed by a normal interrupt to the operating system.
[0009]
On the other hand, when reading or writing a computer resource shared by a plurality of threads, for example a shared variable in main memory, a process called exclusive control must be performed so that simultaneous access to the resource by several threads does not produce an erroneous result (exclusive control is described in, for example, A. S. Tanenbaum, "Basics and Application of OS", Prentice Hall Toppan, Chapter 2, Section 2.2).
[0010]
Exclusive control is usually realized by processor instructions dedicated to exclusive control, such as test and set or exchange. These instructions indivisibly test the value of a memory cell in main memory and set a new value in that memory cell. For example, the test and set instruction performs the following three operations without being split by an interrupt or by another processor.
(1) Reading the value of a certain memory cell in the main memory to a computer register
(2) Writing value 1 to the memory cell
(3) Comparison between the value read into the computer register and the value 0
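The three operations above can be modeled in a few lines of Python. This is only an illustrative sketch: the names `MemoryCell`, `test_and_set`, `clear`, and `run_demo` are hypothetical, and a `threading.Lock` stands in for the indivisibility that real hardware provides. The small spin lock built on top shows how the instruction is typically used for exclusive control.

```python
import threading
import time


class MemoryCell:
    """One shared memory cell with an indivisible test-and-set (a model)."""

    def __init__(self):
        self._value = 0
        # Assumption of this model: a Python lock stands in for the
        # hardware guarantee that the three steps cannot be split.
        self._indivisible = threading.Lock()

    def test_and_set(self):
        """Return True if the cell held 0 before it was set to 1."""
        with self._indivisible:
            register = self._value   # (1) read the cell into a "register"
            self._value = 1          # (2) write the value 1 to the cell
            return register == 0     # (3) compare the value read with 0

    def clear(self):
        """Release: write 0 back to the cell (hypothetical helper)."""
        with self._indivisible:
            self._value = 0


def run_demo(num_threads=4, increments=200):
    """Protect a shared counter with a spin lock built on test_and_set."""
    lock_cell = MemoryCell()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            while not lock_cell.test_and_set():
                time.sleep(0)        # lock is held elsewhere: yield and retry
            shared["count"] += 1     # critical section
            lock_cell.clear()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

A caller that gets `True` from `test_and_set()` owns the lock; every other caller gets `False` until `clear()` is called.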
[0011]
In a shared memory multiprocessor computer system, exclusive control must work across different processors, so these exclusive control instructions operate on the shared main memory. That is, exclusive control itself requires access to the main memory.
[0012]
[Problems to be solved by the invention]
The first problem of the prior art is that interrupt response is not fast enough for multi-thread execution, so the thread switching overhead is large. This is because interrupts are handled inside the operating system, which causes a context switch from the user program to the operating system.
[0013]
The second problem is that the exclusive control processing itself requires access to the main memory, so the overhead of exclusive control and synchronization between threads, which are indispensable for multi-thread processing, is large. Because recent processors are much faster than main memory access, many processors place a high-speed, small-capacity memory called a cache memory between the processor and the main memory, and avoid memory access delays by having the processor access mostly the cache memory. However, exclusive control instructions always access the main memory, which reduces processing speed.
[0014]
All of the above-mentioned problems cause an increase in the overhead of multithread processing, and in particular, greatly affect multithread processing in units of small-grained threads (threads containing a small number of instructions).
[0015]
[Object of the invention]
An object of the present invention is to provide a high-speed user-level interrupt without the intervention of an operating system, and to improve the efficiency of multi-thread processing.
[0016]
Another object of the present invention is to provide a high-speed lock mechanism between processor elements in a multiprocessor computer system, and to improve the efficiency of multithread processing.
[0017]
[Means for Solving the Problems]
The first multi-thread computer system of the present invention comprises a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements. It divides one user process into a plurality of fine-grained threads and, under the control of a thread scheduler within that user process, assigns the threads to the processor elements and executes them simultaneously. In this multi-thread computer system:
Each of the processor elements includes, in addition to a register set, an arithmetic unit, and a control unit: a counter whose value is decremented every clock; a zero comparator that generates an interrupt request signal when the value of the counter becomes zero; a user handler register that holds the memory address at which the thread scheduler in the user process is located; and a user save PC, which is the register into which the value of the program counter is saved.
The processor includes an interrupt control unit that starts user-level interrupt processing in the processor element whose zero comparator generated the interrupt request signal, sets the value of the program counter in the user save PC, and sets the value of the user handler register in the program counter; and a set of 1-bit storage devices accessible from each processor element through a computer instruction set that can operate on their values exclusively among the processor elements.
The thread scheduler in the user process sets, in the counter of the processor element to which the next thread to be executed is assigned, the time quantum value allotted to that thread, sets its own memory address in the user handler register, and causes the processor element to start executing that thread.
Exclusive control among the threads executed by the processor elements is performed using the set of 1-bit storage devices.
[0018]
The second multi-thread computer system of the present invention comprises a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements. It divides one user process into a plurality of fine-grained threads and, under the control of a thread scheduler within that user process, assigns the threads to the processor elements and executes them simultaneously. In this multi-thread computer system:
Each of the processor elements includes, in addition to a register set, an arithmetic unit, and a control unit: a counter whose value is decremented every clock; a zero comparator that generates an interrupt request signal when the value of the counter becomes zero; a user handler register that holds the memory address at which the thread scheduler in the user process is located; and a user save PC, which is the register into which the value of the program counter is saved.
The processor includes an interrupt control unit that starts user-level interrupt processing in the processor element whose zero comparator generated the interrupt request signal, sets the value of the program counter in the user save PC, and sets the value of the user handler register in the program counter; and a set of queue structures that store, in order of arrival, tokens carrying identification numbers corresponding to the processor elements, the set being accessible from each processor element through a computer instruction set that can add, search for, or delete tokens exclusively among the processor elements.
The thread scheduler in the user process sets, in the counter of the processor element to which the next thread to be executed is assigned, the time quantum value allotted to that thread, sets its own memory address in the user handler register, and causes the processor element to start executing that thread.
Exclusive control among the threads executed by the processor elements is performed using the set of queue structures.
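The queue-based exclusive control can be sketched as follows, using the lock try and lock release behavior that the description gives for the second embodiment of the high-speed lock mechanism (paragraphs 【００４５】 and 【００４６】). The class and method names (`QueueLockSet`, `lock_try`, `lock_release`) are hypothetical, and a `threading.Lock` is assumed as a stand-in for the queue control mechanism 60 that serializes token operations in hardware.

```python
import threading
from collections import deque


class QueueLockSet:
    """A software model of the set of token queues (illustrative only)."""

    def __init__(self, num_queues):
        self._queues = [deque() for _ in range(num_queues)]
        # Assumption: this lock models the queue control mechanism, which
        # makes add/search/delete exclusive among processor elements.
        self._control = threading.Lock()

    def lock_try(self, queue_index, pe_id):
        """Enqueue the caller's token if absent; True if it is at the head."""
        with self._control:
            queue = self._queues[queue_index]
            if pe_id not in queue:        # search for an existing token
                queue.append(pe_id)       # add a token in arrival order
            return queue[0] == pe_id      # the "status flag" result

    def lock_release(self, queue_index, pe_id):
        """Remove the caller's token from the queue if it is present."""
        with self._control:
            queue = self._queues[queue_index]
            if pe_id in queue:
                queue.remove(pe_id)
```

Because each processor element holds at most one token per queue, a queue never grows beyond the number of processor elements. A processor element that receives `False` simply calls `lock_try` again later; its place in line is kept, so locks are granted in arrival order.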
[0019]
The first multi-thread execution control method of the present invention applies to a multi-thread computer system that comprises a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, divides one user process into a plurality of fine-grained threads, and, under the control of a thread scheduler within that user process, assigns the threads to the processor elements and executes them simultaneously. To provide a high-speed user-level interrupt without the intervention of an operating system, the method is characterized by including the following steps:
(A) Setting, in the user handler register of the processor element to which a thread of the user process is assigned, the memory address at which the thread scheduler of that user process is located; setting, in the counter of that processor element, the time quantum value allotted to the thread; and starting execution of the thread on that processor element, with exclusive control between the started thread and other threads performed using a set of 1-bit storage devices that is provided in the processor and is accessible from each processor element through a computer instruction set that can operate on their values exclusively among the processor elements.
(B) Updating the value of the counter in the processor element at a constant period from the moment the thread starts executing on that processor element, and generating a user-level interrupt when the counter reaches a predetermined count value.
(C) In processing the user-level interrupt, setting the current program counter value of the interrupting processor element in the user save PC of that processor element, and setting the memory address held in its user handler register in the program counter, thereby transferring control to the thread scheduler in the user process.
[0020]
The second multi-thread execution control method of the present invention applies to a multi-thread computer system that comprises a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, divides one user process into a plurality of fine-grained threads, and, under the control of a thread scheduler within that user process, assigns the threads to the processor elements and executes them simultaneously. To provide a high-speed user-level interrupt without the intervention of an operating system, the method is characterized by including the following steps:
(A) Setting, in the user handler register of the processor element to which a thread of the user process is assigned, the memory address at which the thread scheduler of that user process is located; setting, in the counter of that processor element, the time quantum value allotted to the thread; and starting execution of the thread on that processor element, with exclusive control between the started thread and other threads performed using a set of queue structures in the processor that store, in order of arrival, tokens carrying identification numbers corresponding to the processor elements, the set being accessible from each processor element through a computer instruction set that can add, search for, or delete tokens exclusively among the processor elements.
(B) Updating the value of the counter in the processor element at a constant period from the moment the thread starts executing on that processor element, and generating a user-level interrupt when the counter reaches a predetermined count value.
(C) In processing the user-level interrupt, setting the current program counter value of the interrupting processor element in the user save PC of that processor element, and setting the memory address held in its user handler register in the program counter, thereby transferring control to the thread scheduler in the user process.
[0021]
[Action]
When the counter reaches zero, the interrupt controller starts user level interrupt processing. In the user level interrupt processing, the value of the user handler register set in advance to the start address of the thread scheduler of the user process is set in the program counter, and control is transferred to the thread scheduler without passing through the operating system.
[0022]
Exclusive control among the threads executed by the processor elements is performed using a set of 1-bit storage devices or a set of queue structures provided in the processor, which gives a lock acquisition and release function that does not require access to main memory.
[0023]
[Embodiments of the invention]
[Description of configuration]
Embodiments of the present invention will be described in detail with reference to the drawings.
[0024]
FIG. 1 is a diagram illustrating an example of a computer configuration in which the present invention is implemented. The processor 1 has a plurality of processor elements 11, 12, and 19 therein, and these processor elements access a common main memory 2.
[0025]
Referring to FIG. 2, the first embodiment of the present invention consists of a processor 1 having a plurality of processor elements therein, an operating system 50 running on the processor, and a user process 100 running on the operating system.
[0026]
In addition to the register set, arithmetic unit, control unit, and so on found in a general computer processor, each processor element in the processor 1 has a counter 26 and a zero comparator 25 for generating user-level preemption interrupts, and a user handler register 20 and a user save PC 23 used to transfer control when a user-level interrupt occurs.
[0027]
The value of the counter 26 is decremented every clock by the clock signal. The zero comparator 25 compares the value of the counter 26 with zero. The user handler register 20 holds the program counter value to which execution control is to be transferred when a user-level interrupt occurs, and the user save PC 23 is a register for saving the program counter value immediately before the user-level interrupt occurs.
[0028]
The program counter (PC) 22 and the interrupt control unit 41 of FIG. 2 have the same functions as the program counter and the interrupt control unit of a general computer processor. The kernel handler register 21 and the kernel save PC 24 have the same functions as the interrupt handler register and the program counter saving register used when a normal interrupt occurs in a general computer processor.
[0029]
The user process 100 is an entity of an execution program generated by the operating system 50. The user process 100 is composed of a plurality of threads 102, 103 and 104 sharing a main memory space, and these threads are assigned to processor elements and executed. The thread scheduler 101 exists inside the user process 100 in a form linked to the user program, and manages all threads constituting the user process.
[0030]
In the processor, there are a lock variable set 30 and an access arbitration mechanism 40 shared by all processor elements. The lock variable set 30 is a set of lock variables that can store a 1-bit state, and the access arbitration mechanism 40 is a mechanism for arbitrating access to the lock variable between the processor elements.
[0031]
[Description of operation]
The operation of the present exemplary embodiment will be described in detail with reference to FIG.
[0032]
When the operating system 50 starts the user process 100, the user process 100 calls the thread scheduler 101 incorporated in it. The thread scheduler 101 schedules the set of threads in the user process 100 according to a fixed scheduling algorithm. When the thread scheduler 101 selects the thread to be executed next (here, the thread 102), it sets the time quantum value allotted to the thread 102 in the counter 26 of the processor element 11 to which the thread 102 is assigned, sets the memory address at which the thread scheduler 101 is located in the user handler register 20 of the processor element 11, and then has the processor element 11 start executing the thread 102.
[0033]
In parallel with the execution of the thread 102, the value of the counter 26 is reduced at regular intervals by the clock signal. When the value becomes zero, the zero comparator 25 sends an interrupt request signal to the interrupt control unit 41. The interrupt control unit 41 starts processing of a user-level interrupt in the processor element 11.
[0034]
When user-level interrupt processing starts, the processor element 11 sets the current value of the program counter 22 in the user save PC 23 and sets the value of the user handler register 20 in the program counter 22. Control thereby passes promptly to the thread scheduler 101 inside the user process 100 without going through the operating system 50. The thread scheduler 101 saves the execution state of the thread 102, including the register set values, in a management data area inside the thread scheduler 101, selects the next thread to run according to a predetermined scheduling algorithm, restores the register set values of that thread, sets the time quantum value for that thread in the counter 26, and starts its execution.
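The sequence described in the operation paragraphs (arming the counter, counting down, and swapping the program counter on zero) can be sketched as a small simulation. This is a simplified model, not the hardware design: it assumes one instruction per clock, treats the program counter as a plain integer, and uses hypothetical names (`ProcessorElement`, `start_thread`, `clock_tick`, `run_round_robin`, `SCHEDULER_ADDRESS`). Register set save and restore is reduced to remembering each thread's saved PC.

```python
SCHEDULER_ADDRESS = 0x1000  # hypothetical address of the thread scheduler


class ProcessorElement:
    """Models the registers involved in the user-level interrupt."""

    def __init__(self):
        self.pc = 0              # program counter 22
        self.user_handler = 0    # user handler register 20
        self.user_saved_pc = 0   # user save PC 23
        self.counter = 0         # counter 26

    def start_thread(self, entry_pc, quantum, scheduler_address):
        """What the scheduler does before handing over the processor."""
        self.counter = quantum
        self.user_handler = scheduler_address
        self.pc = entry_pc

    def clock_tick(self):
        """Advance one clock; return True if a user-level interrupt fired."""
        self.pc += 1                      # assumption: one instruction/clock
        self.counter -= 1
        if self.counter == 0:             # zero comparator 25
            self.user_saved_pc = self.pc  # save PC without entering the OS
            self.pc = self.user_handler   # jump straight to the scheduler
            return True
        return False


def run_round_robin(threads, quantum, total_ticks):
    """Preempt threads in turn; return (thread name, saved PC) per interrupt."""
    pe = ProcessorElement()
    order = list(threads)
    saved_pc = dict(threads)
    trace = []
    current = 0
    pe.start_thread(saved_pc[order[current]], quantum, SCHEDULER_ADDRESS)
    for _ in range(total_ticks):
        if pe.clock_tick():
            name = order[current]
            saved_pc[name] = pe.user_saved_pc      # save the thread's state
            trace.append((name, pe.user_saved_pc))
            current = (current + 1) % len(order)   # pick the next thread
            pe.start_thread(saved_pc[order[current]], quantum,
                            SCHEDULER_ADDRESS)
    return trace
```

For example, `run_round_robin({"t102": 100, "t103": 200}, 3, 9)` preempts thread `t102` after three ticks, then `t103`, then `t102` again, each time resuming from the PC saved at the previous interrupt.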
[0035]
By contrast, conventional interrupt processing proceeds as follows. First, at the start of interrupt processing, the context is switched from the user program to the operating system. This context switch includes:
○ Setting the current value of the program counter 22 in the kernel save PC 24
○ Setting the value of the kernel handler register 21 in the program counter 22
○ Switching the processor operation mode to kernel mode
○ Saving the required register set contents to main memory
This context switch transfers control to the kernel level interrupt handler 51 inside the operating system 50 and then to the process scheduler 52. The process scheduler 52 analyzes the interrupt factor and determines the process to be executed next according to a predetermined scheduling algorithm. The context is then switched again and execution of the next process starts. This second context switch includes:
○ Setting the register set for the next process
○ Setting the logical address space for the next process
○ Switching the processor operation mode to user mode
Only after this series of steps is control finally transferred to the user process.
[0036]
In other words, the user-level interrupt mechanism gives user programs a preemption mechanism that is faster than the conventional one, because it avoids the overhead of passing through the operating system.
[0037]
Note that the interrupt factors handled by the user-level interrupt mechanism are not limited to the preemption interrupt raised when the down counter reaches zero. If the interrupt control unit is used on its own, the mechanism can also serve, for example, to process an invalid instruction interrupt within a user process.
[0038]
Next, the high-speed lock mechanism using the lock variable set 30 will be described with reference to the drawings. Each processor element accesses a lock variable inside the lock variable set 30 by using the test and set instruction provided in the processor. When one processor element 11 executes a test and set instruction on a lock variable, the access arbitration mechanism 40 temporarily inhibits access to that lock variable from the other processor elements. During that period, the value of the lock variable is compared with zero, the result is reflected in the status flag in the processor element 11, and the value 1 is set in the lock variable. From the software point of view, the test and set instruction of the present embodiment behaves like the test and set instruction of a conventional processor, but because it involves no access to the main storage memory, which causes a large delay, variables can be locked with much less overhead than before.
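The semantics of the test and set instruction on a lock variable can be illustrated in software. The following Python sketch is an assumption-laden model, not the hardware: the class `LockVariableSet` and the functions `increment_shared` and `demo` are hypothetical names, a `threading.Lock` stands in for the access arbitration mechanism 40, and the `clear` operation used to release a lock is assumed, since the text above describes only the test and set instruction.

```python
import threading
import time


class LockVariableSet:
    """Software stand-in for the lock variable set 30 (1-bit lock variables)."""

    def __init__(self, count=8):
        self._bits = [0] * count
        # Stands in for the access arbitration mechanism 40.
        self._arbiter = threading.Lock()

    def test_and_set(self, index):
        """Compare the lock variable with zero, then set it to 1, atomically.

        The return value plays the role of the status flag:
        True means the variable was zero, so the caller now holds the lock.
        """
        with self._arbiter:
            was_zero = self._bits[index] == 0
            self._bits[index] = 1
            return was_zero

    def clear(self, index):
        """Assumed release operation: write zero back to the lock variable."""
        with self._arbiter:
            self._bits[index] = 0


def increment_shared(locks, shared, times):
    """Increment a shared value under lock variable 0, spinning until acquired."""
    for _ in range(times):
        while not locks.test_and_set(0):
            time.sleep(0)  # yield while spinning
        shared["value"] += 1
        locks.clear(0)


def demo(workers=4, times=500):
    """Run several threads that contend for the same lock variable."""
    locks = LockVariableSet()
    shared = {"value": 0}
    threads = [
        threading.Thread(target=increment_shared, args=(locks, shared, times))
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["value"]
```

The sketch shows only the software-visible behavior. The speed advantage claimed in the text comes from keeping the lock variables inside the processor, which a Python model cannot reproduce.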
[0039]
The number of lock variables mounted in the processor is suitably 8, 16, or 32. The reasons are as follows. Sophisticated exclusive control and synchronization mechanisms for users can readily be built on top of a more basic exclusive control mechanism, and such a basic mechanism needs only a small number of lock variables. Making the number of lock variables a small power of 2 also keeps the instruction field that specifies the lock variable number narrow, which improves instruction efficiency. Further, by keeping the number of lock variables within 32 or 64, the usual bit width of the registers in the register set, the entire lock variable set can be handled as one special register. The lock variable set can then be included in the context of each process, so that each process has a logically independent high-speed lock mechanism.
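The arithmetic behind these sizing choices can be checked with a short sketch. The helper names `field_width`, `pack`, and `unpack` are hypothetical, and the bit ordering (lock variable i in bit i) is an assumption made for illustration.

```python
def field_width(num_locks):
    """Bits needed in an instruction field to name one of num_locks lock variables."""
    if num_locks < 1 or num_locks & (num_locks - 1):
        raise ValueError("num_locks must be a power of two")
    return num_locks.bit_length() - 1


def pack(bits):
    """Pack a list of 0/1 lock variables into one register-sized integer."""
    value = 0
    for i, bit in enumerate(bits):
        value |= (bit & 1) << i  # assumed layout: lock variable i in bit i
    return value


def unpack(value, num_locks):
    """Inverse of pack: recover the individual lock variables."""
    return [(value >> i) & 1 for i in range(num_locks)]
```

With 8, 16, or 32 lock variables the instruction field needs only 3, 4, or 5 bits, and the whole set fits in one 32-bit value that could be saved and restored together with the rest of a process context.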
[0040]
The introduction of the user-level interrupt mechanism and the high-speed lock mechanism described above enables parallel processing in smaller units than before, thereby improving the effective performance of the multiprocessor computer and expanding its application field. This is the effect of the present embodiment.
[0041]
[Another embodiment of the present invention]
Next, another embodiment of the user level interrupt mechanism will be described.
[0042]
In FIG. 2, the input of the counter 26 is a clock signal, but this signal is not limited to the system clock signal of the computer. If a signal obtained by dividing the system clock signal down to a fraction of its frequency with a suitable frequency divider is used as the input to the counter 26, the number of hardware bits constituting the counter 26 and the bit width of the counter value field in computer instructions can both be reduced, improving the efficiency of both hardware and software. Alternatively, instead of the system clock signal, a signal from the instruction execution control mechanism of the processor element may be input to the counter 26 so that the counter 26 is decremented each time one instruction is executed.
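The effect of a frequency divider on the counter width can be estimated with simple arithmetic. In the following sketch the function name `counter_bits` and the example figures (a quantum of one million clock cycles and a divide-by-1024 divider) are hypothetical values chosen only to illustrate the calculation.

```python
def counter_bits(quantum_cycles, divider=1):
    """Bits the counter needs to hold a time quantum.

    quantum_cycles is the quantum in system clock cycles; divider is the
    division ratio applied to the clock before it reaches the counter.
    """
    ticks = -(-quantum_cycles // divider)  # ceiling division
    return max(1, ticks.bit_length())
```

Under these assumed figures, a counter driven directly by the system clock needs 20 bits to hold the quantum, while a counter driven by the divided clock needs 10 bits, at the cost of a coarser time resolution.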
[0043]
Next, another embodiment of the high-speed lock mechanism will be described.
[0044]
Referring to FIG. 3, in the second embodiment of the high-speed lock mechanism each lock variable has a queue structure. Each of the queues 61, 62, ..., 69 can store as many tokens as there are processor elements. A token is issued by a processor element and carries the number of the processor element that issued it. If a queue holds one or more tokens, the processor element that issued the token at the head of the queue holds the lock on the lock variable corresponding to that queue. Two types of computer instructions are provided to operate on the lock variables: a lock attempt instruction and a lock release instruction. The queue control mechanism 60, provided between the processor elements 11, 12, ..., 19 and the queues 61, 62, ..., 69, ensures that the operations of adding, searching for, and deleting tokens are performed exclusively among the processor elements.
[0045]
When the processor element 11 executes the lock attempt instruction, the queue control mechanism 60 checks whether a token of the processor element 11 exists in the queue designated by the operand of the lock attempt instruction (here, the queue 61) and, if no such token exists, inserts a new token. The queue control mechanism 60 then checks whether the token at the head of the designated queue 61 belongs to the processor element 11 and sets the result in the status flag of the processor element 11. The user program determines whether the lock has been acquired by executing, after the lock attempt instruction, a branch instruction that refers to the status flag.
[0046]
When the processor element 11 executes the lock release instruction, the queue control mechanism 60 checks whether a token of the processor element 11 exists in the queue specified by the operand of the lock release instruction (here, the queue 61) and, if one exists, removes it from the queue 61.
[0047]
This embodiment has the advantage of realizing a fairer exclusive control mechanism, because the order in which locks are requested is recorded in the queue and locks are acquired on a first-come, first-served basis.
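The behavior of the lock attempt and lock release instructions on one queue can be modeled as follows. This Python sketch is illustrative only: the class `QueueLock` and its method names are hypothetical, and calls are assumed to be serialized, which in the hardware is the job of the queue control mechanism 60.

```python
from collections import deque


class QueueLock:
    """Software model of one queue (for example, the queue 61) used as a lock."""

    def __init__(self):
        # Tokens are processor element numbers, kept in order of arrival.
        # Each processor element holds at most one token, so the queue never
        # holds more tokens than there are processor elements.
        self._tokens = deque()

    def lock_attempt(self, pe_id):
        """Model of the lock attempt instruction.

        Adds a token for pe_id if none is queued, then reports whether the
        head token belongs to pe_id (the value placed in the status flag).
        """
        if pe_id not in self._tokens:
            self._tokens.append(pe_id)
        return self._tokens[0] == pe_id

    def lock_release(self, pe_id):
        """Model of the lock release instruction: remove pe_id's token, if any."""
        if pe_id in self._tokens:
            self._tokens.remove(pe_id)
```

In this model, if processor elements 11, 12, and 13 attempt the lock in that order, only 11 succeeds. After 11 releases, a retry by 13 still fails while a retry by 12 succeeds, which is the first-come, first-served behavior described above.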
[0049]
[Effect of the invention]
A first effect of the present invention is that a user-level interrupt mechanism that does not pass through an operating system can be realized. As a result, the overhead for thread switching is reduced, and multi-thread execution with smaller granularity than before becomes possible.
[0050]
A second effect of the present invention is that exclusive control between processor elements can be realized without accessing a main storage memory having a relatively slow response speed. This reduces overhead related to thread management and communication between threads, and enables multi-thread execution with smaller granularity than before.
[0051]
Due to these low overhead effects, the application field of the shared memory type multiprocessor computer is expanded, and the performance can be improved by multithread execution even for a program with large granularity and low parallelism.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration example of a computer on which the present invention is implemented.
FIG. 2 is a diagram showing a first embodiment of the present invention.
FIG. 3 is a view showing a second embodiment of the high-speed lock mechanism according to the present invention.
[Explanation of symbols]
1 Processor
2 Main memory
11-19 Processor elements
20 User handler register
21 Kernel handler register
22 Program counter
23 User save PC
24 Kernel save PC
25 Zero comparator
26 Counter
30 Lock variable set
40 Access arbitration mechanism
41 Interrupt control unit
50 Operating system
51 Kernel level interrupt handler
52 Process scheduler
60 Queue control mechanism
61-69 Queues
100 User process
101 Thread scheduler
102-104 Threads

Claims

A multi-thread computer system comprising a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, in which one user process is divided into a plurality of threads of small granularity and, under control of a thread scheduler in the user process, the plurality of threads are assigned to the plurality of processor elements and executed simultaneously, wherein:
each of the processor elements includes, in addition to a register set, an operation unit, and a control unit, a counter whose value is decremented every clock, a zero comparator that generates an interrupt request signal when the value of the counter becomes zero, a user handler register that holds the memory address where the thread scheduler in the user process is located, and a user save PC that is a register to which the value of the program counter is saved;
the processor includes an interrupt control unit that starts user-level interrupt processing in the processor element whose zero comparator generated the interrupt request signal, sets the value of the program counter in the user save PC, and sets the value of the user handler register in the program counter, and a set of 1-bit storage devices accessible from each processor element by a computer instruction set capable of operating on their values exclusively among the processor elements;
the thread scheduler in the user process sets the time quantum value to be allocated to the next thread to be executed in the counter in the processor element to which that thread is allocated, sets the memory address where the thread scheduler itself is located in the user handler register, and causes the processor element to start execution of that thread; and
exclusive control among the plurality of threads executed by the plurality of processor elements is performed using the set of 1-bit storage devices.

A multi-thread computer system comprising a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, in which one user process is divided into a plurality of threads of small granularity and, under control of a thread scheduler in the user process, the plurality of threads are assigned to the plurality of processor elements and executed simultaneously, wherein:
each of the processor elements includes, in addition to a register set, an operation unit, and a control unit, a counter whose value is decremented every clock, a zero comparator that generates an interrupt request signal when the value of the counter becomes zero, a user handler register that holds the memory address where the thread scheduler in the user process is located, and a user save PC that is a register to which the value of the program counter is saved;
the processor includes an interrupt control unit that starts user-level interrupt processing in the processor element whose zero comparator generated the interrupt request signal, sets the value of the program counter in the user save PC, and sets the value of the user handler register in the program counter, and a set of queue structures that store, in order of arrival, tokens each having an identification number corresponding to a processor element, the set of queue structures being accessible from each processor element by a computer instruction set capable of adding, searching for, or deleting tokens exclusively among the processor elements;
the thread scheduler in the user process sets the time quantum value to be allocated to the next thread to be executed in the counter in the processor element to which that thread is allocated, sets the memory address where the thread scheduler itself is located in the user handler register, and causes the processor element to start execution of that thread; and
exclusive control among the plurality of threads executed by the plurality of processor elements is performed using the set of queue structures.

A multi-thread execution control method in a multi-thread computer system comprising a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, in which one user process is divided into a plurality of threads of small granularity and, under control of a thread scheduler in the user process, the plurality of threads are assigned to the plurality of processor elements and executed simultaneously, the method comprising:
(a) a step of setting, in the user handler register in the processor element to which a thread of the user process is allocated, the memory address where the thread scheduler of the user process is located, setting in the counter in the processor element the time quantum value to be allocated to the thread, and starting execution of the thread in the processor element, with exclusive control between the thread that started execution and other threads being performed using a set of 1-bit storage devices that is provided in the processor and is accessible from each processor element by a computer instruction set capable of operating on their values exclusively among the processor elements;
(b) a step of updating the value of the counter in the processor element at a constant period concurrently with the start of execution of the thread in the processor element, and generating a user-level interrupt when the counter reaches a preset count value; and
(c) a step of, in the processing of the user-level interrupt, setting the current program counter value of the processor element that issued the interrupt request in the user save PC in the processor element, and setting the memory address held in the user handler register in the processor element in the program counter, thereby transferring control to the thread scheduler in the user process.

A multi-thread execution control method in a multi-thread computer system comprising a processor including a plurality of processor elements and a main memory shared by the plurality of processor elements, in which one user process is divided into a plurality of threads of small granularity and, under control of a thread scheduler in the user process, the plurality of threads are assigned to the plurality of processor elements and executed simultaneously, the method comprising:
(a) a step of setting, in the user handler register in the processor element to which a thread of the user process is allocated, the memory address where the thread scheduler of the user process is located, setting in the counter in the processor element the time quantum value to be allocated to the thread, and starting execution of the thread in the processor element, with exclusive control between the thread that started execution and other threads being performed using a set of queue structures that is provided in the processor, stores in order of arrival tokens each having an identification number corresponding to a processor element, and is accessible from each processor element by a computer instruction set capable of adding, searching for, or deleting tokens exclusively among the processor elements;
(b) a step of updating the value of the counter in the processor element at a constant period concurrently with the start of execution of the thread in the processor element, and generating a user-level interrupt when the counter reaches a preset count value; and
(c) a step of, in the processing of the user-level interrupt, setting the current program counter value of the processor element that issued the interrupt request in the user save PC in the processor element, and setting the memory address held in the user handler register in the processor element in the program counter, thereby transferring control to the thread scheduler in the user process.