JPH09146904A

JPH09146904A - Address space sharing system

Info

Publication number: JPH09146904A
Application number: JP7310174A
Authority: JP
Inventors: Hideki Yamanaka (山中英樹)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1995-11-29
Filing date: 1995-11-29
Publication date: 1997-06-06
Anticipated expiration: 2015-11-29
Also published as: JP3639366B2

Abstract

PROBLEM TO BE SOLVED: To make effective use of local memory space by avoiding the address collisions that occur when an arbitrary running thread (process) is moved to another processor in the shared address space of a parallel distributed processing system in which a plurality of processors cooperate in processing. SOLUTION: The address spaces of processors 11 and 11' are divided into a space shared by a plurality of threads and spaces, such as stacks, that threads do not share. Processors 11 and 11' are provided with CPU address/logical address conversion circuits 12 and 12'. For an access to the unshared space, the value of a base register 15 is added to the requested address to perform address conversion. This makes address spaces such as stacks relative, so that they can be relocated freely.

Description

Detailed Description of the Invention

[0001] FIELD OF THE INVENTION: The present invention relates to an address space sharing system for a parallel distributed processing system in which a plurality of processors cooperate in processing. In this system, a plurality of processors or threads operate while sharing an address space.

[0002] Parallel programming is complex and difficult. As a result, today's parallel distributed processing environments have become the preserve of a small number of experts. The bottleneck of processing on a single CPU has now become apparent. There is therefore an urgent need for an environment that lets ordinary computer users perform high-performance parallel processing easily.

[0003] PROBLEMS TO BE SOLVED BY THE INVENTION: One class of system under consideration is the parallel distributed processing system, in which a plurality of processors cooperate in processing. A particular case is a LAN or WAN environment in which a plurality of heterogeneous processors are coordinated as a single cluster to process one task in a parallel, distributed manner. Offering such an environment to ordinary users requires simplicity and ease of learning, and both of these trade off against performance.

[0004] With the current state of the art, one must either sacrifice a good deal of simplicity for performance or accept a large drop in performance for simplicity.

On the performance side, the problems are the latency and throughput of data transfer between processors. These essentially reduce to the problem of transfer latency. In highly parallel computations the units of data transferred are small, so the dominant delay depends only on the number of transfers, not on the amount transferred.

To reduce this per-transfer delay overall, data must be accumulated in a buffer and sent together in a single transfer. The factors that determine the optimum buffer size are intricately intertwined, however, so in practice that size can be found only by experiment.
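The sketch below illustrates why batching helps. It rests on a deliberately simple linear cost model, which is an assumption and does not come from the source: every transfer pays a fixed latency, and every byte costs 1/bandwidth regardless of grouping. The function name and all numbers are hypothetical.

```python
import math

def transfer_time(n_items: int, item_bytes: int, batch: int,
                  latency_s: float, bytes_per_s: float) -> float:
    """Total time to send n_items when they are grouped `batch` per transfer.

    Assumed linear model: each transfer pays a fixed latency, and each byte
    costs 1/bandwidth no matter how the items are grouped.
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    n_transfers = math.ceil(n_items / batch)
    count_term = n_transfers * latency_s              # depends only on the number of transfers
    volume_term = n_items * item_bytes / bytes_per_s  # depends only on the amount transferred
    return count_term + volume_term

# Hypothetical workload: 10,000 messages of 16 bytes, 1 ms per transfer, 1 MB/s.
for batch in (1, 10, 100, 1000):
    print(batch, round(transfer_time(10_000, 16, batch, 1e-3, 1e6), 3))
```

In this model the count term falls from about 10 s to 0.01 s as the batch grows from 1 to 1000, while the 0.16 s volume term stays fixed. What the model omits is what makes the real optimum hard to predict. Examples include the time spent waiting for a buffer to fill and the memory the buffers occupy.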

[0005] To execute a program in parallel, it must also be divided into units of parallel execution. Smaller units make more CPUs usable, but they bring their own costs:

- synchronization overhead;
- overhead from more frequent context switches;
- data latency;
- a larger volume of data transfer;
- more paging caused by memory fragmentation.

Here, too, a trade-off arises.

[0006] End users, too, should be able to enjoy the benefits of parallel processing, namely faster and larger-scale programs. At present, however, we face a contradiction. To obtain a reasonable level of performance, end users must acquire the intricate parallel-processing know-how of expert users and tune their programs themselves.

[0007] To address these problems, several high-level languages for parallel processing have been developed:

- Occam, a procedural language (A. Burns, Programming in occam 2, Addison-Wesley, 1988);
- HPF, a procedural language (High Performance Fortran Forum, High Performance Fortran Language Specification, 1994);
- CLEAN, a functional language (R. Plasmeijer and M. van Eekelen, Functional Programming and Parallel Graph Rewriting, Addison-Wesley, 1993);
- PARLOG, a logic language (T. Conlon, Programming in PARLOG, Addison-Wesley, 1989).

[0008] High-level library interfaces have also been developed:

- PVM (A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek and V. Sunderam, PVM: Parallel Virtual Machine: A Users' Guide and Tutorial for Networked Parallel Computing, MIT Press, 1994);
- MPI (Message Passing Interface Forum, MPI: A Message-Passing Interface Standard, May 5, 1994).

[0009] However, high-level languages either do not deliver sufficient performance or require expert-level know-how to do so. High-level libraries have not yet reached a level at which end users can use them.

[0010] As an intermediate solution, methods have been proposed that combine a relatively low-level procedural sequential language with a command language for parallel processing (I. Foster, R. Olson and S. Tuecke, Productive Parallel Programming, Scientific Programming, Vol. 1, pp. 51-66, 1992; L. A. Crowl and T. J. LeBlanc, Parallel Programming with Control Abstraction, ACM Transactions on Programming Languages and Systems, Vol. 16, No. 3, pp. 524-576, 1994).

[0011] These methods are not aimed at users who are end users in every respect. They target people who are experts in sequential processing but relatively close to end users where parallel processing is concerned. The methods separate the tuning of the low-level sequential language from the tuning of parallel processing. They introduce a simple, uniform tuning style only for the parallel-processing interface.

[0012] The system targeted by the present invention builds on this latter approach, with three further aims:

- to make the parallel-processing interface more uniform;
- to make it more general for expert users;
- to make its interface with the sequential parts more flexible.

To achieve these aims, the whole system is unified under the semantics of a procedural language.

[0013] Consider, for example, a parallel program executed under such a system, either on a group of networked workstations or on a dedicated multi-CPU parallel computer. A known way to make parallel programming as close as possible to single-CPU programming is to use a shared memory mechanism (including a virtual shared memory mechanism; likewise below). The mechanism builds an address space, real or logical, that all CPUs share.

Address spaces such as the stack and various buffers need not be shared with other CPUs. Another known method therefore makes part of the address space local to each CPU instead of sharing all of it.

[0014] In the processing model of these methods, multiple computing entities called threads proceed in parallel within the shared address space, synchronizing from time to time. Each thread has its own stack and, within the address space shared with other threads, uses the CPU exclusively to carry out its processing.

A thread allocates its private data area and temporary data on its stack for use in CPU operations. When it yields the CPU to another thread, it saves the contents of the CPU registers onto its own stack. When the CPU is next given back to it, it restores the saved contents. This keeps the CPU state consistent for each thread.
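The following is a toy model of the context switch just described. It assumes a four-register CPU and represents each thread's stack as a Python list. `Thread`, `CPU` and `switch` are hypothetical names. In a real system the thread's data area would share the same stack.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Thread:
    name: str
    stack: List[Tuple[int, ...]] = field(default_factory=list)  # this thread's private stack
    suspended: bool = False  # True while a saved context sits on top of the stack

class CPU:
    def __init__(self, n_regs: int = 4):
        self.regs = [0] * n_regs  # hypothetical register file

    def switch(self, current: Thread, nxt: Thread) -> None:
        """Yield the CPU from `current` to `nxt`."""
        # Save the register contents onto the yielding thread's own stack.
        current.stack.append(tuple(self.regs))
        current.suspended = True
        if nxt.suspended:
            # Restore the context that `nxt` saved when it last yielded.
            self.regs = list(nxt.stack.pop())
            nxt.suspended = False
        else:
            # A thread that has never run starts with cleared registers (assumption).
            self.regs = [0] * len(self.regs)
```

Because each thread saves its context onto its own stack, the stack is the only per-thread state that must travel with the thread. This is why stack placement becomes the issue in the paragraphs that follow.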

[0015] The stack space is thus a collection of spaces private to individual threads. If it is nevertheless shared by all CPUs, the portions used by threads on one CPU cannot be used by any other CPU. A large part of the memory address space then becomes unusable by the other CPUs.

[0016] A second scheme places stacks in local address spaces. Even there, a thread that moves to another CPU must have its stack placed at the same addresses as before. If threads are to be movable, the stacks must therefore be placed at (virtual) shared addresses in practice, which makes poor use of the address space.

[0017] FIG. 5 illustrates the problems of the conventional technique described above.

As shown in FIG. 5(A), suppose thread A, executing on CPUi, is moved to another CPUj to continue execution there. Its stack must then be placed in CPUj's address space at the same position as thread A's stack in CPUi's address space. If a stack for another thread C already occupies that position, an address collision occurs and the thread cannot be moved.
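The conventional constraint amounts to a simple overlap test. A stack with fixed addresses can move only if its address range is free on the destination CPU. The minimal sketch below shows this test. Its half-open ranges and example addresses are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open address range [start, end)

def overlaps(a: Range, b: Range) -> bool:
    return a[0] < b[1] and b[0] < a[1]

def can_migrate_absolute(stack: Range, dest_in_use: List[Range]) -> bool:
    """Conventional scheme: the stack must keep exactly the same addresses,
    so migration fails if any range in use on the destination overlaps it."""
    return not any(overlaps(stack, used) for used in dest_in_use)

# Thread A's stack on CPUi, and thread C's stack already occupying it on CPUj.
stack_a = (0x8000, 0x9000)
print(can_migrate_absolute(stack_a, [(0x8000, 0x9000)]))  # collision: cannot move
```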

[0018] To make thread A movable, regions must be reserved in advance so that no two threads' stacks overlap in the stack address space, as shown for example in FIG. 5(B). This wastes address space, which is an important resource.

[0019] The present invention solves the above problems. Its object is to prevent collisions of addresses, such as stack addresses, in a shared memory system even when a thread or process moves, and thereby to enable effective use of the address space.

[0020] MEANS FOR SOLVING THE PROBLEMS: In the present invention, the address space is divided into a shared space and a local address space that other CPUs do not share. A stack space dedicated to stack addresses is then formed within the local address space.

Memory accesses from the CPU to addresses outside the stack space undergo no address conversion. Only accesses to addresses in the stack space are converted. Each such address is automatically converted into an offset from the origin, and the contents of a dedicated register (the stack base register, sbr) are added to that offset.

[0021] FIG. 1 shows a configuration example of the present invention.

In the system of FIG. 1(A), a plurality of processors 11 and 11' are connected by a network 10. A plurality of threads 13a to 13d run on these processors; threads are the computing entities that obtain the right to use a processor. The memory address space of each processor 11, 11' is divided into a first address space shared by the threads and a second address space not shared between threads.

Although the computing entity is described here as a thread, the present invention applies equally when the entity is a process.

[0022] Each processor 11, 11' includes a CPU address/logical address conversion circuit 12, 12'. A thread specifies an address when accessing memory, called a CPU address. The circuit converts some of these CPU addresses into other addresses, called logical addresses.

[0023] As shown in FIG. 1(B), the CPU address/logical address conversion circuit 12, 12' comprises, for example:

- a space discrimination circuit 14;
- a base register 15;
- an adder circuit 16;
- an address register 17.

[0024] The space discrimination circuit 14 determines where a memory access request from a thread is directed. The request may target the shared address space (for example, space other than the stack space) or an address space not shared between threads (for example, the stack space).

[0025] The base register 15 holds the base address that makes the space not shared between threads a relative address space. For example, it holds the stack base address, which is the offset of each thread's stack.

Suppose an access request targets a space not shared between threads, such as the stack space. The adder circuit 16 then adds the base address held in base register 15 to the address of the request.

[0026] The address register 17 holds one of two values and outputs it as the logical address:

- when the CPU address indicates a non-shared space such as the stack space, the address to which the adder circuit 16 has added the base address;
- when the CPU address indicates the shared space, the CPU address unchanged.
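The four components of circuit 12 can be mirrored in software, as the sketch below does. It is not the patented circuit. The class name, the predicate standing in for space discrimination circuit 14, and the assumption of a contiguous stack window starting at `origin` are all hypothetical.

```python
from typing import Callable

class CpuToLogicalConverter:
    """Software model of conversion circuit 12 (hypothetical names)."""

    def __init__(self, is_unshared: Callable[[int], bool], origin: int):
        self.is_unshared = is_unshared  # space discrimination circuit 14 (assumed predicate)
        self.origin = origin            # origin of the unshared (stack) space in CPU addresses
        self.base_register = 0          # base register 15: holds sbr for the running thread
        self.address_register = 0       # address register 17: the logical address output

    def convert(self, cpu_addr: int) -> int:
        if self.is_unshared(cpu_addr):
            # Adder circuit 16: offset from the origin plus the base address.
            self.address_register = self.base_register + (cpu_addr - self.origin)
        else:
            # Shared space: the CPU address is passed through unchanged.
            self.address_register = cpu_addr
        return self.address_register

# Hypothetical layout: CPU addresses 0xF000-0xFFFF form the stack window.
conv = CpuToLogicalConverter(lambda a: 0xF000 <= a < 0x10000, origin=0xF000)
conv.base_register = 0x40000
print(hex(conv.convert(0x1234)), hex(conv.convert(0xF010)))
```

The predicate is passed in because the discrimination rule is left open. The embodiment uses the upper address bits, and a control signal derived from the instruction code is named as an alternative.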

[0027] As shown in FIG. 1(C), the CPU address/logical address conversion circuit 12, 12' lets each thread's stack space, as seen in the CPU address space, be moved within the logical address space. The stack can be placed at the position offset from the address origin by the value sbr of base register 15. Address collisions can thus be avoided.

[0028] MODE FOR CARRYING OUT THE INVENTION: This system aims to provide an easily parallelized, high-performance execution environment. It achieves two things. First, on a networked heterogeneous cluster, for example, the overall processing can proceed while arbitrary running threads (or processes) are moved (migrated) as appropriate. Second, processes on the network are linked by high-speed stream communication.

[0029] While one process waits for a communication response, the CPU is allocated to other processes. This improves CPU utilization. It also overlaps in time the communication performed by those other processes. The aim is a system whose communication latency, averaged over the whole system, is reduced.

[0030] To handle such a heterogeneous computing environment flexibly, the system adopts a machine-independent virtual code (intermediate code) scheme. It also adopts a virtual shared memory mechanism so that threads (processes) can be migrated at any point in time.

[0031] (1) Virtual code scheme: In this system, virtual code and interpreters for that code are distributed across multiple locations on the network. Accesses to remote memory in the virtual shared memory are converted into high-speed communication as processing proceeds.

[0032] Virtual code can be transferred between servers when needed. It can also be placed on every server in advance, so that no transfer is needed each time a thread moves.

Every access to the heap area of another CPU is trapped during process execution, and the data is transferred one cell at a time. In this case, neither the virtual code nor the heap data needs to be transferred when a thread moves. Only the process's stack area and the contents of the few registers used by the virtual code interpreter must be transferred.

[0033] (2) Virtual shared memory: If the virtual shared memory mechanism also managed the stack area, the stack would not need to be transferred when a thread moves. However, a stack can be accessed only by the process that uses it for its own processing. Placing stacks in virtual shared memory would therefore reserve memory space on every server, shrinking the local memory available on all servers. The overhead of virtual shared memory would also reduce efficiency.

[0034] A process can move dynamically in the middle of execution only if its virtual code is position-independent on the network. This requires every memory address appearing in the virtual code to be virtually shared by all computers. Supporting this at the operating system (OS) level would require a large-scale virtual shared memory mechanism (D. Lenoski, J. Laudon, K. Gharachorloo, A. Gupta and J. Hennessy, The Directory-Based Cache Coherence Protocol for the DASH Multiprocessor, IEEE Proceedings of the 17th Annual International Symposium on Computer Architecture, pp. 148-159, 1990).

[0035] In this system, stacks must be movable between servers at any time. The virtual code is therefore designed so that the virtual stack address space is relative to the stack base register (sbr), an internal register of the virtual code. On the actual computer, the contents of a stack can be copied to another address at any time, and the start of that address is then set in the stack base register. Execution continues without inconsistency.
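The minimal sketch below shows why copying the stack and resetting sbr is enough. It assumes a hypothetical stack window at CPU address 0xF000 and two servers whose memories are modelled as byte arrays. The thread keeps using the same CPU address before and after the move; only `sbr` changes. The function names are illustrative.

```python
STACK_ORIGIN = 0xF000  # hypothetical start of the stack window in CPU addresses
STACK_SIZE = 0x100     # hypothetical size of one thread's stack

def logical(cpu_addr: int, sbr: int) -> int:
    """sbr-relative translation for an address inside the stack window."""
    if not STACK_ORIGIN <= cpu_addr < STACK_ORIGIN + STACK_SIZE:
        raise ValueError("address is outside the stack window")
    return sbr + (cpu_addr - STACK_ORIGIN)

def move_stack(src: bytearray, old_sbr: int,
               dst: bytearray, new_sbr: int, size: int = STACK_SIZE) -> int:
    """Copy a stack image to a new location and return the sbr to use there."""
    dst[new_sbr:new_sbr + size] = src[old_sbr:old_sbr + size]
    return new_sbr

# Two servers' memories, modelled as byte arrays.
server_i, server_j = bytearray(0x1000), bytearray(0x1000)
addr = STACK_ORIGIN + 0x10         # the CPU address the thread uses; it never changes
sbr = 0x200
server_i[logical(addr, sbr)] = 42  # the thread stores a local variable on server i
sbr = move_stack(server_i, sbr, server_j, 0x600)
print(server_j[logical(addr, sbr)])  # the same CPU address finds the value on server j
```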

[0036] As a result, the entire virtual code area need only be copied once, on demand, on each computer. It is copied into memory space reserved just before the first process starts running there.

All other memory that involves writes, together with the heap area, is managed in variable-length, write-once memory units called cells. Virtual shared memory is built only for accesses to these restricted cells.

[0037] Accordingly, the following can all be placed on the local computer:

- the memory space used internally by the interpreter;
- the stack spaces of processes;
- stream buffers and the like.

The cell space that the virtual shared memory must handle is therefore quite small.

Moreover, because cells are write-once, no mechanism is needed to keep cached cell data coherent. Once a cell has been cached, the copy can be kept permanently as local data, without the cost of a coherence mechanism. This reduces the overhead of the virtual shared memory mechanism.
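The small model below illustrates the coherence saving. It assumes that a cell is identified by an integer and fetched from a single home node. `WriteOnceCellStore` and `CellCache` are hypothetical names. Because a written cell can never change, a reader may keep its copy forever without invalidation messages.

```python
class WriteOnceCellStore:
    """Home-node cell storage: each cell may be written exactly once."""

    def __init__(self):
        self._cells = {}

    def write(self, cell_id: int, value: bytes) -> None:
        if cell_id in self._cells:
            raise ValueError(f"cell {cell_id} already written")
        self._cells[cell_id] = value

    def read(self, cell_id: int) -> bytes:
        return self._cells[cell_id]

class CellCache:
    """Remote reader: fetches a cell once and keeps the copy permanently.
    No invalidation is needed because a written cell never changes."""

    def __init__(self, home: WriteOnceCellStore):
        self.home = home
        self.local = {}
        self.remote_fetches = 0  # stands in for network transfers

    def read(self, cell_id: int) -> bytes:
        if cell_id not in self.local:
            self.remote_fetches += 1
            self.local[cell_id] = self.home.read(cell_id)
        return self.local[cell_id]
```

A mutable cell would need the home node to track readers and invalidate their copies on every write. The write-once rule removes that bookkeeping entirely.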

[0038] FIG. 2 illustrates how the CPU address space is converted into the logical addresses used by the virtual memory mechanism. In FIG. 2:

- A is a CPU address x outside the stack space. It is converted unchanged into A' at logical address x.
- B is a CPU address in the stack space. It is first converted into the offset y from the origin. The value of the stack base register (sbr) is then added, giving the logical address (sbr + y).
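The same mapping can be written as a formula. The symbols $S$ (the set of CPU addresses in the stack space) and $s_0$ (its origin) are introduced here for illustration and do not appear in the source.

```latex
\mathrm{logical}(a) =
\begin{cases}
  a, & a \notin S \quad (A = x \;\mapsto\; A' = x),\\[2pt]
  \mathit{sbr} + y, \quad y = a - s_0, & a \in S \quad (B \;\mapsto\; \mathit{sbr} + y).
\end{cases}
```

Migrating a thread changes only $\mathit{sbr}$. The offset $y$, and therefore every stack address the thread computes, is unchanged.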

[0039] The CPU/logical address conversion is implemented at the CPU level. An algorithm or circuit optimized for address conversion can therefore be built in beforehand. This is much faster than performing the conversion explicitly at the program level, and the resulting overhead is very small.

[0040] In return for this small overhead, the system gains scalability. The more CPUs there are, the larger the total address space that the CPUs can use effectively.

[0041] EXAMPLE: FIG. 3 shows an embodiment of the present invention. A language was created for parallel computation on a plurality of networked workstations (processors 30, 30', ...). For its interpreter, virtual code with the stack base register of this scheme was designed and implemented. The language is designed so that any thread can move to another processor at any time.

[0042] The thread schedulers 33, 33' of the operating system (OS) control the execution of each thread and the movement of threads between processors. They communicate with each other across processors to do so.

The thread schedulers 33, 33' manage thread control tables 34, 34'. These tables store control information on the execution of each thread, for example:

- a thread ID identifying the thread;
- the thread execution time;
- the thread execution right;
- the number of the processor (CPU) to which the thread is assigned;
- migration conditions specifying when the thread should be migrated;
- the value of the stack base register (sbr);
- the value of the stack pointer (sp).

Entries in the thread control tables 34, 34' are registered, updated and deleted in two situations: when processors 30, 30' schedule thread execution, and when threads are transferred to other processors.
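The sketch below shows one possible shape for a thread control table and its entries, using the fields listed above. The field types, the units and the free-form migration condition are assumptions. The source names the fields but not their representation.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class ThreadControlEntry:
    thread_id: int
    exec_time: float          # accumulated execution time (unit assumed: seconds)
    exec_right: bool          # whether the thread holds the execution right (assumed boolean)
    cpu_number: int           # processor the thread is assigned to
    migration_condition: str  # when to migrate; free-form text here (assumption)
    sbr: int                  # stack base register value on the current processor
    sp: int                   # stack pointer, relative to sbr

class ThreadControlTable:
    """Register / update / delete operations, as named in the text."""

    def __init__(self):
        self.entries: Dict[int, ThreadControlEntry] = {}

    def register(self, entry: ThreadControlEntry) -> None:
        self.entries[entry.thread_id] = entry

    def update(self, thread_id: int, **changes) -> None:
        entry = self.entries[thread_id]
        for name, value in changes.items():
            if not hasattr(entry, name):
                raise AttributeError(name)
            setattr(entry, name, value)

    def delete(self, thread_id: int) -> ThreadControlEntry:
        return self.entries.pop(thread_id)
```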

[0043] FIG. 3 shows thread A moving from processor 30 to processor 30'. On processor 30, thread A uses thread stack A in address space 31. In address space 31' of the destination processor 30', however, thread C is already using the addresses of thread stack A. Under the conventional scheme, thread stack A therefore could not be moved.

[0044] With this scheme, however, thread A on processor 30 has a stack base register value, sbr, which makes the stack-dedicated address space relative. When thread A moves from processor 30 to processor 30', the stack base register value on processor 30' is changed to sbr' so that the stack does not overlap thread stack C. Thread A can then use thread stack A on processor 30' without problems.

[0045] Changing only the stack base register value suffices on an address collision, for two reasons:

- every processor access to an address in the stack space is relative to sbr;
- the address space used on the processor and the address space of virtual memory (the logical address space) are associated only indirectly, through address conversion using sbr.

As a result, every address that the processor loads or stores, and every address that appears in other computations, is a CPU address of the processor.

[0046] In a relative stack address space based on the stack base register (sbr), a stack can thus be relocated freely within the logical address space, and the access requester cannot tell the difference before and after the move. The stack of an incoming thread can therefore be copied into any free stack space without negotiation between operating systems.

[0047] Consequently, when thread A moves from processor 30 to processor 30', thread scheduler 33 need not determine whether the move would cause an address collision in address space 31' of the destination processor 30'. It simply passes thread A's control information to thread scheduler 33' of processor 30' and moves thread A.

[0048] The destination thread scheduler 33' then carries out three steps:

- it allocates thread stack A to a free stack space in address space 31';
- it sets sbr', the value relative to the origin of address space 31', in thread A's control information;
- at a scheduling opportunity, it grants thread A the right to use the processor and runs it.
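The sketch below models the destination side under simple assumptions. The stack space of address space 31' is divided into equal fixed-size slots, the first free slot is chosen, and the control information is a plain dictionary. The source does not specify an allocation policy, so the slot size, slot count and names here are hypothetical.

```python
from typing import Dict, Optional

class StackSpace:
    """Stack region of one processor's address space, split into equal slots."""

    def __init__(self, base: int, slot_size: int, n_slots: int):
        self.base, self.slot_size = base, slot_size
        self.owner: Dict[int, Optional[int]] = {i: None for i in range(n_slots)}

    def allocate(self, thread_id: int) -> int:
        for slot, owner in self.owner.items():  # dicts keep slot order
            if owner is None:
                self.owner[slot] = thread_id
                return self.base + slot * self.slot_size  # becomes sbr'
        raise MemoryError("no free stack slot")

def receive_thread(space: StackSpace, control: dict,
                   stack_image: bytes, memory: bytearray) -> None:
    """Place an incoming stack in any free slot and record the new sbr."""
    if len(stack_image) > space.slot_size:
        raise ValueError("stack image larger than a slot")
    sbr_new = space.allocate(control["thread_id"])
    memory[sbr_new:sbr_new + len(stack_image)] = stack_image
    control["sbr"] = sbr_new  # sp is left unchanged: it is relative to sbr
```

No check against the sender's addresses is needed; any free slot will do. This matches the point that schedulers need not negotiate.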

[0049] Here, the stack pointer (sp) holds a value relative to the stack base register (sbr), and this value is preserved after the thread is moved. In this embodiment, the space determination circuit 14 in the CPU address/logical address conversion circuit shown in FIG. 1(B) decides whether a received CPU address belongs to the stack space or to some other space as follows: if the upper 2 bits of the CPU address are “11”, the address is treated as belonging to the stack space; otherwise, it is treated as belonging to a space other than the stack space. When the upper 2 bits are “11” and the adder circuit 16 adds the value (sbr) of the base register 15, the upper 2 bits of the CPU address are first masked to “00”, and the addition is performed afterward.
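The conversion performed by the space determination circuit and the adder can be modelled in a few lines. The sketch assumes 32-bit CPU addresses (the source does not give a width) and assumes that addresses outside the stack space pass through unchanged; `is_stack_space` and `to_logical` are illustrative names, not signals defined in the patent.

```python
ADDR_BITS = 32                  # assumed CPU address width
SHIFT = ADDR_BITS - 2           # bit position of the upper 2 bits
LOW_MASK = (1 << SHIFT) - 1     # keeps the low bits, forces the upper 2 bits to "00"


def is_stack_space(cpu_addr: int) -> bool:
    """Space determination (circuit 14): upper 2 bits "11" means stack space."""
    return ((cpu_addr >> SHIFT) & 0b11) == 0b11


def to_logical(cpu_addr: int, sbr: int) -> int:
    """Convert one CPU address to a logical address."""
    if is_stack_space(cpu_addr):
        # Mask the upper 2 bits to "00", then add base register 15 (adder 16).
        return (cpu_addr & LOW_MASK) + sbr
    # Shared (non-stack) space: assumed to be passed through unchanged.
    return cpu_addr


sbr = 0x0010_0000
print(hex(to_logical(0xC000_0010, sbr)))  # 0x100010 (stack space, relocated)
print(hex(to_logical(0x4000_0010, sbr)))  # 0x40000010 (shared space)
```

Because only sbr enters the sum, giving the same thread a different sbr after migration (for example `to_logical(0xC0000010, 0x00200000)`, which yields `0x200010`) relocates every stack access at once without changing any CPU address the thread uses.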

[0050] The space may also be determined not from part of the CPU address but, for example, from a control signal obtained from the instruction code. FIG. 4 is a processing flowchart centered on the processing of the thread scheduler shown in FIG. 3. It shows the flow of processing of the OS and of each thread in each CPU (processor), and consists of three loops: receiving threads from other CPUs (OSes) (steps S1 to S4), sending threads to other CPUs (OSes) (steps S5 to S8), and controlling the execution of threads within the CPU itself (steps S9 to S15).
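The control flow of the three loops can be traced with a small skeleton. This is a hypothetical Python model: threads are plain names, `incoming` stands in for pending transfer requests, `should_move` for the migration condition, and the steps that touch stacks and registers are reduced to trace entries. One call performs one pass; it is assumed here that the next pass starts again at S1.

```python
from collections import deque
from typing import Callable


def scheduler_pass(incoming: deque, threads: list, outgoing: list,
                   should_move: Callable[[str], bool], trace: list) -> None:
    """One pass through the three loops of FIG. 4, recording the steps visited."""
    # Loop 1: receive threads from other CPUs (S1-S4).
    while True:
        trace.append("S1")
        if not incoming:                    # no transfer request: go to S5
            break
        threads.append(incoming.popleft())  # S2-S4 reduced to one append
        trace += ["S2", "S3", "S4"]
    # Loop 2: send threads to other CPUs (S5-S8).
    while True:
        trace.append("S5")
        movers = [t for t in threads if should_move(t)]
        trace.append("S6")
        if not movers:                      # nothing to move: go to S9
            break
        threads.remove(movers[0])
        outgoing.append(movers[0])          # S7-S8 reduced to one append
        trace += ["S7", "S8"]
    # Loop 3: execute a local thread (S9-S15).
    trace += ["S9", "S10"]
    if threads:                             # otherwise: back to S1 next pass
        trace += ["S11", "S12", "S13", "S14", "S15"]


trace: list = []
incoming = deque(["B"])
threads = ["A"]
outgoing: list = []
scheduler_pass(incoming, threads, outgoing, lambda t: t == "A", trace)
print(trace)
```

Here thread B arrives, thread A leaves, and B is then run locally, so the trace visits every step of all three loops.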

[0051] In step S1, it is determined whether there is a thread transfer request from another CPU (OS). If there is, step S2 is performed; if not, the process proceeds to step S5.

[0052] In step S2, the thread stack is received from the other CPU (OS), and its contents are copied to a free area of the address space. In step S3, the start address of the area into which the thread stack was copied is set in the save area of the stack base register (sbr) for that stack. This change to the sbr value of the migrated thread is made by writing to the sbr field of the stack control table that the CPU reserved for the stack.

[0053] In step S4, the control information of the thread that uses the transferred stack is received from the other CPU (OS) and set in the thread control table. The process then returns to step S1.
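Steps S2 to S4 on the receiving CPU might look like the following sketch. The data model is assumed for illustration: the local stack region is a `bytearray` split into fixed-size slots, the stack control table and thread control table are dictionaries keyed by a thread identifier, and the class `Receiver` is hypothetical. The free slot is chosen without looking at where the stack lived on the sending CPU.

```python
STACK_SIZE = 256  # assumed size of one stack slot, in bytes


class Receiver:
    """Receiving side of a thread migration, steps S2-S4 (illustrative model)."""

    def __init__(self, num_slots: int) -> None:
        self.memory = bytearray(STACK_SIZE * num_slots)  # local stack region
        self.free_bases = [i * STACK_SIZE for i in range(num_slots)]
        self.stack_control_table: dict = {}   # thread id -> {"sbr": base}
        self.thread_control_table: dict = {}  # thread id -> control information

    def receive(self, thread_id: str, stack_image: bytes, control_info: dict) -> int:
        if len(stack_image) > STACK_SIZE or not self.free_bases:
            raise ValueError("stack does not fit or no free slot")
        # S2: copy the stack into any free slot; the sender's addresses are
        # irrelevant because the thread addresses its stack through sbr.
        base = self.free_bases.pop(0)
        self.memory[base:base + len(stack_image)] = stack_image
        # S3: the start of the copied area becomes the saved sbr for this stack.
        self.stack_control_table[thread_id] = {"sbr": base}
        # S4: register the thread's control information.
        self.thread_control_table[thread_id] = dict(control_info)
        return base


r = Receiver(num_slots=4)
r.free_bases.remove(0)  # slot 0 is already used by a local thread
base = r.receive("A", bytes([1, 2, 3]), {"sp": 2})
print(hex(base), r.memory[base + 2])  # 0x100 3
```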

[0054] In step S5, the migration conditions in the thread control table are checked for each thread to decide which thread to move. In step S6, it is determined whether any thread satisfies the migration condition. If one exists, step S7 is performed; if not, the process proceeds to step S9.

[0055] In step S7, the thread stack of the thread that meets the migration condition is transferred to another CPU (OS). In step S8, the control information of the thread whose stack was transferred is sent from the thread control table to that CPU (OS), and the information is erased from the thread control table. The process then returns to step S5.
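The sending side, steps S5 to S8, can be sketched in the same hypothetical style. `should_move` represents the migration condition held in the thread control table (the `"load"` field used below is invented for illustration), and `send` is an assumed transport callback to the other CPU (OS); the patent specifies neither.

```python
from typing import Callable


def send_migrating_threads(thread_table: dict, stack_table: dict,
                           memory: bytearray, stack_size: int,
                           should_move: Callable[[dict], bool],
                           send: Callable[[str, str, object], None]) -> list:
    """Steps S5-S8 on the sending CPU; returns the ids of the threads moved."""
    moved = []
    while True:
        # S5/S6: look for a thread whose control info meets the migration condition.
        tid = next((t for t, info in thread_table.items() if should_move(info)), None)
        if tid is None:
            return moved  # no candidate: continue with S9
        base = stack_table[tid]["sbr"]
        # S7: transfer the thread's stack.
        send("stack", tid, bytes(memory[base:base + stack_size]))
        # S8: transfer its control information and erase it from the local table.
        send("control", tid, thread_table.pop(tid))
        moved.append(tid)


memory = bytearray(range(16))
threads = {"A": {"load": 9}, "B": {"load": 1}}
stacks = {"A": {"sbr": 0}, "B": {"sbr": 8}}
sent = []
moved = send_migrating_threads(threads, stacks, memory, 8,
                               lambda info: info["load"] > 5,
                               lambda kind, tid, payload: sent.append((kind, tid, payload)))
print(moved, list(threads))  # ['A'] ['B']
```

The stack is sent before the control information, following the order of steps S7 and S8.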

[0056] In step S9, the thread to be executed is determined. In step S10, it is determined whether there is a thread to execute. If there is, step S11 is performed; if no thread is in an executable state, the process returns to step S1.

[0057] In step S11, the value of the stack base register (sbr) is obtained from the stack control table of the stack to be executed and set in the stack base register.

[0058] In step S12, the contents of the other registers saved on the stack are restored, and a context switch to the thread gives it the right to use the CPU. In step S13, the thread is executed.

[0059] In step S14, when an event occurs that stops or interrupts execution of the thread, the registers are saved on the stack and a context switch to the OS is performed. In step S15, the value of the stack base register (sbr) is saved in the thread control table, and the thread control table is updated with items such as the thread's execution time.
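Steps S9 to S15 revolve around loading sbr before a thread runs and saving it afterward. The sketch below is a hypothetical Python model, not the patent's code: the CPU is a small dataclass, registers "saved on the stack" are kept in a dictionary, and `run` stands in for executing the thread until it stops and returning its elapsed time.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Cpu:
    sbr: int = 0                                # stack base register (15)
    regs: dict = field(default_factory=dict)    # all other registers


def run_one_thread(cpu: Cpu, ready: list, stack_table: dict, thread_table: dict,
                   saved_regs: dict, run: Callable[[str, Cpu], int]) -> bool:
    """Steps S9-S15 for one thread; returns False if nothing is runnable."""
    # S9/S10: pick a thread to execute; none runnable means back to S1.
    if not ready:
        return False
    tid = ready.pop(0)
    # S11: load sbr from the stack control table of the thread's stack.
    cpu.sbr = stack_table[tid]["sbr"]
    # S12: restore the other registers saved on the stack and switch to the thread.
    cpu.regs = dict(saved_regs.get(tid, {}))
    # S13: run until an event stops or interrupts the thread.
    elapsed = run(tid, cpu)
    # S14: save the registers (modelled as a dict) and switch back to the OS.
    saved_regs[tid] = dict(cpu.regs)
    # S15: save sbr and update bookkeeping such as execution time.
    info = thread_table[tid]
    info["sbr"] = cpu.sbr
    info["time"] = info.get("time", 0) + elapsed
    return True


def demo_run(tid: str, cpu: Cpu) -> int:
    cpu.regs["r1"] = cpu.regs.get("r1", 0) + 1  # pretend the thread did some work
    return 5                                     # pretend elapsed time units


cpu = Cpu()
ready = ["A"]
stack_table = {"A": {"sbr": 0x200}}
thread_table: dict = {"A": {}}
saved_regs = {"A": {"r1": 7}}
print(run_one_thread(cpu, ready, stack_table, thread_table, saved_regs, demo_run))
print(thread_table["A"], saved_regs["A"])
```

Only sbr distinguishes where a thread's stack physically lives; the saved registers, including a relative sp, need no adjustment.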

[0060] As described above, this system (1) provides in the CPU a dedicated base register that makes accesses to the stack use relative addresses; (2) when a (virtual) shared memory mechanism is used on a group of computers connected over a network or on a parallel computer, creates a relative address space dedicated to the stack in the local memory space that is not subject to (virtual) sharing; and (3) divides the address space and makes part of it a relative address space based on the dedicated base register. Together, these measures make effective use of the local memory space possible in systems that use shared memory or a virtual shared memory mechanism.

[0061]

[Effects of the Invention] As described above, according to the present invention, the stack space can be handled independently on each processor (CPU). Therefore, if threads are distributed appropriately among the processors, the system as a whole can run a number of threads close to the maximum possible when the stack space is (virtually) shared, multiplied by the number of processors, which enables larger-scale computation. In addition, the stack size per thread can be increased, so that large data can be placed on the stack.
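A back-of-the-envelope calculation makes the capacity claim concrete. The symbols are assumptions introduced for illustration: $P$ processors, a stack region of size $S$ available in each processor's address space, and a per-thread stack of size $s$. If the stack space is (virtually) shared, the stacks of all threads must fit into one region of size $S$; if it is handled independently, each processor contributes its own region:

$$N_{\text{shared}} = \left\lfloor \frac{S}{s} \right\rfloor, \qquad N_{\text{independent}} \le P \left\lfloor \frac{S}{s} \right\rfloor = P\,N_{\text{shared}}$$

The bound is reached only when threads are spread evenly over the processors, which is why the text says "close to". For example, with the illustrative values $S = 1\,\text{GiB}$, $s = 1\,\text{MiB}$ and $P = 8$, $N_{\text{shared}} = 1024$, while up to $8 \times 1024 = 8192$ threads can run. Equivalently, for a fixed number of threads, each stack can be up to about $P$ times larger.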

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example of the present invention.

FIG. 2 is a diagram illustrating address processing by the CPU according to the present invention.

FIG. 3 is a diagram showing an embodiment of the present invention.

FIG. 4 is a processing flowchart of the thread scheduler.

FIG. 5 is a diagram illustrating a problem of the conventional technique.

[Explanation of symbols]

10 Network
11, 11' Processor
12, 12' CPU address/logical address conversion circuit
13a to 13d Thread
14 Space determination circuit
15 Base register
16 Adder circuit
17 Address register

Claims


1. An address space sharing system in which a plurality of processes or threads, each being a unit of computation that obtains the right to use a processor, operate on a plurality of processors, wherein the address space of the memory is divided into a first address space shared by the plurality of processes or threads and a second address space not shared between processes or threads, and each of the processors comprises: means for holding a base address that makes all or a specific portion of the second address space a relative address space; means for determining whether a memory access request from a process or thread is directed to all or the specific portion of the second address space; and means for converting the address, when the access request is directed to all or the specific portion of the second address space, by adding the base address to the address of the access request.

2. The address space sharing system according to claim 1, wherein all or the specific portion of the second address space is a stack space that each process or thread uses as a stack.