JPH06318191A - Multi-thread processing system

Info

Publication number: JPH06318191A
Application number: JP10711493A
Authority: JP
Inventors: Yoshio Kato; 喜郎加藤
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1993-05-10
Filing date: 1993-05-10
Publication date: 1994-11-15

Abstract

PURPOSE: To reduce the overhead of starting and ending parallel operation by keeping threads in a waiting state rather than terminating them when a parallel processing portion ends, and by resuming the waiting threads, instead of creating new ones, when the next parallel processing starts. CONSTITUTION: The multi-thread processing system comprises threads 15 and parallel call processing 13. The parallel call processing 13, called at the start of parallel processing in a program, assigns a parallel-executable program portion to a waiting thread 15 if one exists. If there is no waiting thread 15, a new thread 15 is created and the parallel-executable program portion is assigned to it for execution. When the processing ends, the thread is managed as a waiting thread 15.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a multi-thread processing system in which a plurality of threads perform parallel processing.

[0002] In recent years, UNIX systems equipped with a parallel execution mechanism called threads have appeared, and it is desirable to perform parallel processing efficiently on multiprocessors.

[0003]

[Prior Art] In application programs that demand high performance, such as scientific and engineering computation, execution performance is improved by executing DO loops and subroutines in parallel on multiple CPUs. For example, programs and data are loaded into an MSU (main storage unit) accessible from the two CPUs (CPU1, CPU2) shown in FIG. 8(a), or from the four CPUs (CPU1, CPU2, CPU3, CPU4) shown in FIG. 8(b), and CPU1 and CPU2, and likewise CPU3 and CPU4, each execute the portion of the program assigned to them. In these systems, the job ideally completes in 1/2 (FIG. 8(a)) or 1/4 (FIG. 8(b)) of the time required when the original program is executed on a single CPU.

[0004] In practice, execution efficiency is strongly affected by the structure of the program (for example, having few loops that can be executed in parallel) and by the overhead of parallel processing control. Performance can therefore be brought closer to the ideal by rewriting the user program into a structure suited to parallel processing (user program tuning) or by reducing the overhead of parallel processing control (system tuning). The prior art concerning the overhead of parallel processing control is described below.

[0005] A job consists of a number of tasks commensurate with the number of CPUs. On a UNIX system, a Fortran job is executed as a multi-threaded program in which the job is a process and each task is a thread.

[0006] The OS (operating system) assigns threads to CPUs. Which program portion a thread processes in parallel, and when, is controlled by the run-time library.

[0007] As the program portion to be processed in parallel, a subroutine may serve as the unit of parallel processing, or a DO loop of 1000 iterations may be divided into four parts of 250 iterations each, which are then executed in parallel.
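The loop division mentioned above can be illustrated with a short sketch. It is written in Python rather than Fortran purely for illustration, and the helper name `split_iterations` is an assumption introduced here, not something taken from the patent.

```python
def split_iterations(total, parts):
    """Split range(total) into `parts` contiguous chunks of near-equal size."""
    base, extra = divmod(total, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional iteration each.
        size = base + (1 if i < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks


# A 1000-iteration loop divided into four units of parallel work.
chunks = split_iterations(1000, 4)
sizes = [len(chunk) for chunk in chunks]
```

Each chunk would then be handed to one thread as its unit of parallel processing.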

[0008] In either case, the program has a sequential portion from the start of execution up to a certain point, performs parallel processing, and then returns to sequential processing before terminating. In the example of FIG. 9, processing is sequential up to (a). At point (a), parallel processing of the two program portions (b) and (c) begins. At point (d), when the computations of both (b) and (c) have finished, the program resumes sequential operation. When the operations from (a) to (d) are contained in a loop, the cost of starting and ending parallel operation grows in proportion to the number of loop iterations. Furthermore, the program portions (b) and (c) that run in parallel may themselves contain start and end points of parallel processing.

[0009]

[Problems to Be Solved by the Invention] In scientific and engineering computation, the amount of computation in a unit of parallel processing (a DO loop, a subroutine, and so on) is small, for example 1/1000 or 1/100 of the computation involved in input/output processing such as a PRINT statement. This differs greatly from the conventional parallel processing of input/output and computation.

[0010] Therefore, to perform parallel processing efficiently even when the cost of each unit of parallel processing is small, the overhead required to control parallel processing must be made small relative to the amount of computation processed in parallel.

[0011] To perform the start and end processing of parallel operation, one conceivable approach is to use the thread-creation system call provided by the OS and to associate each program portion to be executed in parallel one-to-one with a thread. That is, the threads corresponding to the parallel portions are created at the stage where parallel operation starts. In the example of FIG. 9, at point (a) the thread-creation system call (pthread create) is issued to execute (b) and (c) in parallel, immediately followed by the system call (pthread join) that waits for each of (b) and (c) to finish. When the parallel processing of (b) and (c) ends, the wait in pthread join is released and execution reaches (d). At (d), the system call for deleting the threads (pthread delete) is issued.
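The conventional sequence described above can be sketched as follows. This is a minimal Python illustration, not the patent's code: `threading.Thread` stands in for the thread-creation system call and `join` for the waiting call, and the names `part_b`, `part_c`, `parallel_region`, and the counter `created` are assumptions introduced for the example.

```python
import threading

created = 0  # number of thread creations, the cost at issue


def part_b(out):
    out["b"] = sum(range(100))   # program portion (b)


def part_c(out):
    out["c"] = sum(range(200))   # program portion (c)


def parallel_region(out):
    """Points (a) to (d) of FIG. 9: create threads, wait for them, discard them."""
    global created
    threads = [threading.Thread(target=f, args=(out,)) for f in (part_b, part_c)]
    created += len(threads)
    for t in threads:
        t.start()                # (a): one new thread per parallel portion
    for t in threads:
        t.join()                 # wait until (b) and (c) have both finished
    return out["b"] + out["c"]   # (d): sequential processing resumes


# The region sits inside a loop, so threads are created on every iteration.
results = [parallel_region({}) for _ in range(10)]
```

With ten iterations and two parallel portions, twenty threads are created and discarded, which is the repeated cost that grows with the loop count.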

[0012] If the parallel processing of FIG. 9 is performed by carrying out thread creation, thread joining, and thread deletion in sequence in this way, the time spent on creation, deletion, and so on becomes too large to ignore relative to the time spent on the parallel processing itself, and it becomes especially large when the sequence is repeated in a loop.

[0013] To solve these problems, the present invention aims to reduce the overhead of starting and ending parallel operation by not terminating threads at the end of a parallel processing portion but placing them in a waiting state, and, when the next parallel processing starts, resuming the waiting threads to perform the parallel processing instead of creating new threads.

[0014]

[Means for Solving the Problems] FIG. 1 is a block diagram showing the principle of the present invention. In FIG. 1, the parallel call processing 13 is called from the program at the start of parallel processing; it assigns a parallel-executable program portion to a thread 15 for execution and also manages the waiting threads 15.

[0015] The thread 15 takes in a parallel-executable program portion and executes it.

[0016]

[Operation] In the present invention, as shown in FIG. 1, when the parallel call processing 13 called from the program finds a waiting thread 15, it assigns the parallel-executable program portion to that thread 15. When there is no waiting thread 15, it creates a new thread 15, assigns the parallel-executable program portion to it, and has it execute. When the processing ends, the thread is managed as a waiting thread 15. This reduces the number of times threads 15 are created and deleted at the start and end of parallel processing.

[0017] Accordingly, by not terminating the threads 15 at the end of a parallel processing portion but keeping them in a waiting state, and by resuming the waiting threads 15 to perform parallel processing when the next parallel processing starts instead of creating new threads 15, the overhead of starting and ending parallel operation can be reduced.

[0018]

[Embodiment] Next, the configuration and operation of an embodiment of the present invention are described in detail in order, with reference to FIGS. 1 to 7.

[0019] FIG. 1 is a block diagram showing the principle of the present invention. In FIG. 1, the main storage device 11 is a storage device accessible from a plurality of CPUs 1, 2, ..., n, into which programs are loaded and in which data is stored. As illustrated, it holds the OS 12, the parallel call processing 13, the waiting thread 14, the user program 17, and the subroutines.

[0020] The OS 12 is the operating system, which supervises and controls the entire system; here it creates and deletes threads.

[0021] The parallel call processing 13 is called from the program at the start of parallel processing. It assigns a parallel-executable program portion to a waiting thread 15 or, when there is no waiting thread 15, to a newly created thread 15, and has it execute. It also manages threads 15 that have finished parallel processing as waiting threads.

[0022] The waiting thread 14 is the structure used to manage the waiting threads 15; here it links or points to management blocks 16 and thereby manages the threads 15 represented by those management blocks 16.

[0023] The thread 15 takes in a parallel-executable program portion and executes it, and is identified by a management block 16. The management block 16 manages the information that identifies a thread 15; here the following information is set and managed.

[0024]
- Thread identifier: the identifier of the thread, reported by the OS 12 when the OS 12 is asked to create the thread.
- Address of the next management block: used for chaining the management blocks 16.

[0025]
- Running flag: a flag that indicates whether the thread is in the normal operating state or in the waiting state.
- Call information: the entry address of the subroutine to be executed, the parameters to pass to the subroutine, and the number of parameters.
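The four items held in a management block, and the chaining through the next-block address, can be sketched as a small data structure. This is an illustrative Python model only: the class names `ManagementBlock` and `WaitingList`, the field names, and the use of a Python callable in place of an entry address are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass(eq=False)
class ManagementBlock:
    thread_id: Optional[int] = None                  # thread identifier reported at creation
    next_block: Optional["ManagementBlock"] = None   # address of the next management block
    running: bool = False                            # running flag: True = operating, False = waiting
    entry: Optional[Callable] = None                 # call information: subroutine entry
    params: Tuple = ()                               # call information: parameters

    @property
    def param_count(self) -> int:                    # call information: number of parameters
        return len(self.params)


class WaitingList:
    """Chain of management blocks for threads that are currently waiting."""

    def __init__(self):
        self.head = None

    def push(self, block):
        block.running = False
        block.next_block = self.head
        self.head = block

    def pop(self):
        block = self.head
        if block is not None:
            self.head = block.next_block
            block.next_block = None
        return block


waiting = WaitingList()
first, second = ManagementBlock(thread_id=101), ManagementBlock(thread_id=102)
waiting.push(first)
waiting.push(second)

# Detach one block and fill in the call information for the next piece of work.
reused = waiting.pop()
reused.entry, reused.params, reused.running = print, ("hello",), True
```

The thread identifiers 101 and 102 are placeholder values; in a real system they would come from the thread-creation call.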

[0026] The user program 17 is a program that performs application processing; here it calls subroutines 1 to n and performs parallel processing. CPU1 to CPUn are a plurality of CPUs, the group of CPUs that process the user program 17 in parallel.

[0027] FIG. 2 is an explanatory diagram of the management of waiting threads in the present invention. FIG. 2(a) shows a plurality of management blocks 16 being managed in sequence by links from the waiting thread 14.

[0028] FIG. 2(b) shows a plurality of management blocks 16 being managed by pointers from a waiting-thread table. As described above, the management blocks 16 are managed either by links from the waiting thread 14 or by pointers from the waiting-thread table. When the parallel call processing 13 assigns a parallel-executable program portion to a thread 15 and a waiting thread 15 exists, the portion is assigned to that thread, which performs (resumes) parallel processing; after the processing ends, the thread is returned and managed as a waiting thread 15. Because making a thread 15 wait and resuming it requires less processing than creating and deleting a thread 15, the overhead at the start and end of parallel processing can be reduced. To create a thread 15, the system call (pthread create) is issued, and to delete a thread 15, the system call (pthread delete) is issued. In the present invention, by contrast, the system call (pthread cond signal) is issued to resume a thread 15, and the system call (pthread cond wait) is issued to make a thread 15 wait. Because resuming and waiting require less processing than creating and deleting, the overhead of starting and ending parallel processing is reduced as a result. The details are described in order below.
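The wait and resume mechanism can be illustrated with a condition variable. The sketch below uses Python's `threading.Condition`, whose `wait` and `notify_all` play roles analogous to the condition-variable wait and signal calls named above; it is not the patent's implementation, and the class name `ParkedWorker` and its fields are assumptions introduced for the example.

```python
import threading


class ParkedWorker:
    """One thread, created once, that waits on a condition variable between tasks."""

    def __init__(self):
        self.cond = threading.Condition()
        self.task = None      # pending callable, or None while the thread waits
        self.result = None
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()   # the only thread creation

    def _loop(self):
        while True:
            with self.cond:
                while self.task is None:
                    self.cond.wait()          # thread enters the waiting state
                task = self.task
            value = task()
            with self.cond:
                self.result = (value, threading.get_ident())
                self.task = None
                self.cond.notify_all()        # tell the caller the task finished

    def run(self, task):
        with self.cond:
            self.task = task
            self.cond.notify_all()            # resume the waiting thread
            while self.task is not None:
                self.cond.wait()
            return self.result


worker = ParkedWorker()
first = worker.run(lambda: 2 + 3)
second = worker.run(lambda: 6 * 7)
```

Both tasks report the same thread identifier, showing that the second task resumed the existing thread rather than creating a new one.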

[0029] The operation of the configuration of FIGS. 1 and 2 is described in detail following the order shown in the flowcharts of FIGS. 3 and 4. In FIG. 3, S1 performs initialization: the various initial settings of the apparatus are made and it is brought into an operable state.

[0030] S2 determines whether there is a parallel processing instruction. The program (object) of FIG. 6 is read and executed line by line, and it is determined whether there is, for example, the parallel call instruction at (3) in FIG. 6. If YES, the parallel call processing 13 is called in S3 and the flow proceeds to S4. If NO, the line is not a parallel processing instruction, so the next line is read and executed in S10 and S2 is repeated.

[0031] S4 determines whether the instruction is a parallel call. The parallel call processing 13 called in S2 determines whether the instruction on the program line about to be executed is a parallel call. If YES, S5 determines whether a waiting thread is available; if YES there, the processing of S6 to S9 resumes the waiting thread 15 and the flow proceeds to S9, and if NO, a new thread 15 is created and the flow proceeds to S9. If S4 is NO, the instruction is not a parallel call but, for example, END PARCALL (end of parallel execution), so the flow proceeds to S15 at (C) in FIG. 4.

[0032] In S6, since S5 was YES and a waiting thread 15 was found, its management block is detached (see FIG. 2). In S7, the call information is set in the management block detached in S6.

[0033] S8 turns ON the running flag of the management block. S9 starts parallel execution. Through S6 to S9, when there is a waiting thread 15, its management block is detached, the call information is set in the management block, the running flag is turned ON, and the actual parallel processing begins. Because processing by the waiting thread is resumed rather than a new thread being created and started, the amount of processing is reduced. When the parallel processing ends, the flow returns to S4, which yields NO, and proceeds to the end-of-parallel-processing steps from S15 at (C) in FIG. 4.

[0034] In S11, since S5 was NO and no waiting thread 15 was found, an area for a management block is acquired and initialized (see FIG. 2). In S12, the call information is set in the management block.

[0035] S13 turns ON the running flag of the management block. S14 requests the OS to create a thread. In S9, the thread newly created in S14 executes the parallel processing.

[0036] Through S11 to S14 and S9, when there is no waiting thread 15, a new management block is created, the call information is set in it, the running flag is turned ON, a new thread 15 is created, and the actual parallel processing begins. Thus, when there is no waiting thread, parallel processing can be started by creating a new thread.

[0037] S15 in FIG. 4 is reached when S4 in FIG. 3 is NO, that is, at the end of parallel processing, and determines whether a subroutine has finished. If the running flag of the thread that was executing the subroutine is OFF, the subroutine has finished. If YES, the flow proceeds to S16. If NO, it waits for the subroutine to finish.

[0038] S16 returns the management block to the waiting thread 14. Since S15 was YES and the thread 15 that performed the parallel processing has finished its subroutine, this thread 15 is returned to the waiting thread 14 of FIG. 2 in preparation for the next parallel processing.

[0039] S17 determines whether all of the subroutines called in parallel have finished. If YES, post-processing is performed in S18 and the flow ends. If NO, the steps from S15 are repeated.

[0040] As described above, parallel execution is performed; each time a called subroutine finishes, its management block is returned to the waiting thread 14; the flow waits until all of the subroutines called in parallel have finished, and when all have finished, post-processing is performed and the flow ends. In this way, threads 15 that have performed parallel processing are linked to and managed through management blocks, and at the next parallel processing these threads 15 are taken out and made to execute (resume) parallel processing. It is therefore unnecessary to create and delete threads every time parallel processing starts and ends, so the amount of processing at the start and end of parallel processing is reduced and the overhead becomes smaller.

[0041] Next, parallel execution by a separate thread is described with reference to S21 to S24 of FIG. 4. In S21 of FIG. 4, the separate thread waits until parallel execution is started in S9.

[0042] S22 retrieves the call information from the management block. S23 calls the user subroutine. S24 turns OFF the running flag of the management block.

[0043] As described above, the separate thread waits until another thread starts parallel execution in S9. When parallel execution starts, it retrieves the call information from the management block, calls the user subroutine on the basis of this information to have the processing performed, and then turns OFF the running flag in its own management block, asynchronously with respect to the caller. The flow then proceeds to S15, and once the called subroutine has finished, the management block is returned to the waiting thread.
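The flow of S5 to S17 on the calling side and S21 to S24 on the worker side can be sketched together. This is a minimal Python model under stated assumptions: the class names `Block` and `ParallelCall`, the methods `start` and `finish`, the `created` counter, and the per-block condition variable are illustrative choices, not details given in the patent.

```python
import threading


class Block:
    """Stand-in for a management block 16."""

    def __init__(self):
        self.cond = threading.Condition()
        self.next_block = None    # chaining on the waiting list
        self.running = False      # running flag
        self.call = None          # call information: (entry, params)
        self.result = None


class ParallelCall:
    def __init__(self):
        self.lock = threading.Lock()
        self.waiting = None       # head of the chain of waiting blocks
        self.created = 0          # number of threads actually created

    def start(self, entry, *params):
        with self.lock:                           # S5: is a thread waiting?
            block = self.waiting
            if block is not None:
                self.waiting = block.next_block   # S6: detach its block
                block.next_block = None
        if block is not None:
            with block.cond:
                block.call = (entry, params)      # S7: set call information
                block.running = True              # S8: running flag ON
                block.cond.notify_all()           # S9: resume the waiting thread
        else:
            block = Block()                       # S11: new management block
            block.call = (entry, params)          # S12
            block.running = True                  # S13
            with self.lock:
                self.created += 1
            threading.Thread(target=self._worker, args=(block,),
                             daemon=True).start() # S14: request a new thread
        return block

    def _worker(self, block):
        while True:
            with block.cond:
                while not block.running:          # S21: wait for a start
                    block.cond.wait()
                entry, params = block.call        # S22: fetch call information
            block.result = entry(*params)         # S23: call the subroutine
            with block.cond:
                block.running = False             # S24: running flag OFF
                block.cond.notify_all()

    def finish(self, blocks):
        results = []
        for block in blocks:
            with block.cond:
                while block.running:              # S15: has the subroutine finished?
                    block.cond.wait()
                results.append(block.result)
            with self.lock:                       # S16: return block to the waiting chain
                block.next_block = self.waiting
                self.waiting = block
        return results                            # S17: all subroutines have finished


def square(x):
    return x * x


pc = ParallelCall()
outputs = []
for _ in range(5):                                # parallel region inside a loop
    blocks = [pc.start(square, 3), pc.start(square, 4)]
    outputs.append(pc.finish(blocks))
```

In this sketch only the first loop iteration creates threads; the remaining four iterations resume the two waiting threads, so `pc.created` stays at 2.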

[0044] FIG. 5 is a conceptual diagram of the operation of a parallel processing program according to the present invention. Here, (1) to (7) correspond to (1) to (7) in FIG. 6. As illustrated, MAIN of the parallel processing program starts parallel processing at (1): a thread 15 is created to execute subroutine S1 and another thread 15 is created to execute subroutine S2, and the two start in parallel.

[0045] Within subroutine S1, which has started executing in parallel, parallel processing is started again at (3): a thread 15 is created to execute subroutine S3 and another thread 15 is created to execute subroutine S3, and the two start in parallel. Each executes the parallel processing of (7) and finishes; at (4) the parallel processing of the subroutines S3 ends, and control returns to the processing of subroutine S1.

[0046] Similarly, within subroutine S2, which has started executing in parallel, parallel processing is started again at (5): a thread 15 is created to execute subroutine S3 and another thread 15 is created to execute subroutine S3, and the two start in parallel. Each executes the parallel processing of (7) and finishes; at (6) the parallel processing of the subroutines S3 ends, and control returns to the processing of subroutine S2.

[0047] Next, at (2), the parallel processing of subroutines S1 and S2 ends and control returns to MAIN. In this sequence, at (1), (3), and (5), where parallel execution starts, threads 15 are created and executed in parallel when no waiting thread 15 exists; at (4), (2), and (6), where parallel execution ends, the threads 15 are managed as waiting; and at (1), (3), and (5) in the second and subsequent passes of a repeated loop or the like, the waiting threads 15 are assigned and used. Compared with the conventional approach of creating and deleting threads each time, the present approach of keeping threads 15 waiting and resuming them for parallel processing reduces the amount of processing, and in particular reduces the overhead of a parallel call.

[0048] FIG. 6 shows an example program of the present invention. It expresses the conceptual operation diagram of FIG. 5 in the form of an actual program, with (1) to (7), MAIN, S1, S2, and S3 corresponding to the respective items in FIG. 6.

[0049] In FIG. 6, the parallel call processing 13 is called at the start of parallel execution and returns at the end of parallel execution. At (1), (3), and (5) of FIG. 5 it assigns threads 15 (a waiting thread 15 if one exists, otherwise a newly created thread 15) and has the subroutines executed in parallel.

[0050] MAIN is the main program; it calls the parallel call processing 13 to have subroutines S1 and S2 executed in parallel. S1 is a subroutine that is here called in parallel from MAIN; it calls the parallel call processing 13 to have two instances of subroutine S3 executed in parallel.

[0051] S2 is a subroutine that is here called in parallel from MAIN; it calls the parallel call processing 13 to have two instances of subroutine S3 executed in parallel.

[0052] S3 is a subroutine that forms the core of the computation; it is called in parallel from subroutines S1 and S2 and performs the parallel execution processing.
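The nested call structure of MAIN, S1, S2, and S3 can be modeled as follows. Since the program of FIG. 6 is not reproduced in this text, the sketch is a Python approximation with assumed names: `ReusePool` is a compact queue-based stand-in for the parallel call processing, and the `threading.Barrier` exists only to force the four S3 calls to overlap so that the thread counts in the example are deterministic.

```python
import queue
import threading


class ReusePool:
    """Compact stand-in for the parallel call processing: reuse waiting threads, else create."""

    def __init__(self):
        self.lock = threading.Lock()
        self.idle = []            # inboxes of threads that are waiting
        self.created = 0

    def _worker(self, inbox):
        while True:
            fn, done = inbox.get()        # wait here until work is assigned
            fn()
            with self.lock:
                self.idle.append(inbox)   # back to the waiting state
            done.set()

    def _start(self, fn):
        done = threading.Event()
        with self.lock:
            inbox = self.idle.pop() if self.idle else None
            if inbox is None:
                self.created += 1
        if inbox is None:                 # no waiting thread: create one
            inbox = queue.Queue()
            threading.Thread(target=self._worker, args=(inbox,), daemon=True).start()
        inbox.put((fn, done))
        return done

    def parallel_call(self, *fns):
        for done in [self._start(fn) for fn in fns]:
            done.wait()                   # return when every subroutine has finished


pool = ReusePool()
all_s3_running = threading.Barrier(4)     # illustration only: makes the four S3 calls overlap
calls = []
calls_lock = threading.Lock()


def s3():
    all_s3_running.wait()
    with calls_lock:
        calls.append("S3")                # (7): the core computation would go here


def s1():
    pool.parallel_call(s3, s3)            # (3) ... (4)


def s2():
    pool.parallel_call(s3, s3)            # (5) ... (6)


def main(rounds):
    history = []
    for _ in range(rounds):
        pool.parallel_call(s1, s2)        # (1) ... (2)
        history.append(pool.created)
    return history


history = main(3)
```

Under these assumptions the first pass creates six threads (two for S1 and S2, four for the S3 calls), and later passes create none because every thread is waiting when the next pass begins.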

[0053] FIG. 7 is a conceptual diagram of threads and subroutines in the present invention. In FIG. 7, the process is a process (job) created by the OS.

[0054] t1 is thread 1, which is newly created at the outset. t2 is a thread created by a system call made while thread 1 (t1) is executing. Here, thread 2 (t2) is deleted by a system call at the end of parallel execution. (When a loop is formed and the sequence repeats, the thread is instead kept as a waiting thread 15 as described above, and from the next iteration onward this waiting thread 15 is assigned to execute (resume) the parallel processing.)

[0055]

[Effects of the Invention] As described above, according to the present invention, threads 15 are not terminated at the end of a parallel processing portion but are placed in a waiting state, and when the next parallel processing starts, waiting threads 15 are assigned and resumed to perform the parallel processing instead of new threads 15 being created. The overhead of starting and ending parallel operation can therefore be reduced.

[Brief description of drawings]

[FIG. 1] A block diagram showing the principle of the present invention.

[FIG. 2] An explanatory diagram of the management of waiting threads in the present invention.

[FIG. 3] A flowchart (part 1) explaining the operation of the present invention.

[FIG. 4] A flowchart (part 2) explaining the operation of the present invention.

[FIG. 5] A conceptual diagram of the operation of a parallel processing program of the present invention.

[FIG. 6] An example program of the present invention.

[FIG. 7] A conceptual diagram of threads and subroutines in the present invention.

[FIG. 8] A hardware configuration diagram for parallel processing.

[FIG. 9] An example of the structure of a parallel program.

[Explanation of symbols]

11: main storage device
12: OS (operating system)
13: parallel call processing
14: waiting thread
15: thread
16: management block
17: user program

Claims

[Claims]

1. A multi-thread processing system in which a plurality of threads perform parallel processing, comprising: a thread (15) that takes in a parallel-executable program portion and executes it; and parallel call processing (13) that is called at the start of parallel processing in a program, assigns a parallel-executable program portion to the thread (15) for execution, and manages waiting threads (15); wherein the parallel call processing (13), when called at the start of parallel processing in the program, assigns the parallel-executable program portion to a waiting thread (15) if one exists and, if no waiting thread (15) exists, creates a new thread (15), assigns the parallel-executable program portion to this thread (15), and has it execute; and wherein, when the processing ends, the thread is managed as a waiting thread (15).