JP2001249821A

JP2001249821A - Job scheduling method

Info

Publication number: JP2001249821A
Application number: JP2000067146A
Authority: JP
Inventors: Naoki Utsunomiya; 直樹宇都宮; Koji Sonoda; 浩二薗田; Hiroyuki Kumazaki; 裕之熊崎; Hiroyuki Takatsu; 弘幸高津
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-03-07
Filing date: 2000-03-07
Publication date: 2001-09-14

Abstract

PROBLEM TO BE SOLVED: To reduce the wasted time that processes spend waiting for events in a job that performs a computation with a plurality of processes. SOLUTION: For a scheduler that switches processes in units of jobs, when the cause of an event awaited by a process in a job lies outside that job, the scheduler preferentially switches to the job that is the cause of the event. In addition, when a job waits for an event while executing a collective function, the scheduler is instructed to switch jobs. Furthermore, in a multiprocessor environment, the processor that executes the collective function is separated from the processor that executes the next job. Switching jobs in this way eliminates useless event waiting time and raises the processor utilization of all jobs. In addition, using the processors selectively prevents the parallelization efficiency of the jobs from dropping.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

TECHNICAL FIELD OF THE INVENTION

The present invention relates to a method of scheduling program execution in which a set of processes distributed over a plurality of nodes operates as a single job, and in particular to a job scheduling method that allows each job to be executed efficiently when a plurality of such jobs exist.

[0002]

DESCRIPTION OF THE RELATED ART

In recent years, parallel computers capable of parallel processing with a plurality of processors have become the mainstay of machines built to meet the demand for high-speed processing of enormous volumes of data, as typified by scientific and engineering computation and image processing. There are two ways to combine multiple processors: a tightly coupled multiprocessor configuration, in which a relatively small number of processors are connected by a high-speed bus and each processor is allowed direct memory access, and a loosely coupled multiprocessor configuration, in which a processor and memory form a node and a plurality of such nodes are connected by a high-speed network. A user program performs parallel processing using a plurality of execution entities provided by the OS (processes or threads; hereinafter the term "process" is used for both). The OS controls the execution of the parallel processing corresponding to each user program in units of jobs, each of which groups a plurality of these processes.

[0003] Normally, the OS scheduler controls program execution in units of processes. When a job consists of a plurality of processes, synchronization waits between the processes therefore grow long unless the processes execute in a coordinated manner. To avoid this increase in synchronization waiting time, the OS is provided with a gang scheduler. When there are a plurality of jobs to be executed, a continuous execution time (quantum time) is set for the jobs. When a job has used up its quantum time, the gang scheduler suspends the job by suspending all the processes belonging to it at once, selects another executable job, and resumes that job by resuming the execution of all the processes belonging to it.
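The quantum-based switching described above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the scheduler of any particular OS: the names `Job` and `gang_round_robin`, the job and process IDs, and the time units are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """A gang: a group of processes that is suspended and resumed together."""
    job_id: str
    process_ids: list


def gang_round_robin(jobs, quantum, total_time):
    """Return a timeline of (start_time, job_id, processes that run).

    Every process of the selected job runs for one quantum. When the
    quantum expires, the whole job is suspended and the next job in the
    run queue is resumed.
    """
    run_queue = deque(jobs)
    timeline = []
    now = 0
    while now < total_time and run_queue:
        job = run_queue.popleft()      # select the next executable job
        timeline.append((now, job.job_id, tuple(job.process_ids)))
        now += quantum                 # the job uses up its quantum
        run_queue.append(job)          # all of its processes are suspended at once
    return timeline


# Hypothetical jobs "A" (two processes) and "B" (one process).
timeline = gang_round_robin(
    [Job("A", ["a1", "a2"]), Job("B", ["b1"])], quantum=100, total_time=300
)
```

In this sketch the two jobs simply alternate every quantum, and the processes of a job always appear in the timeline together, which is the property that removes synchronization waits inside a job.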

[0004] For example, Japanese Patent Application Laid-Open No. 9-128351 assumes a loosely coupled multiprocessor system and, in order to perform gang scheduling efficiently, provides a mechanism in which the system monitors the frequency of inter-node communication and uses that frequency as the trigger for starting job switching.

[0005]

PROBLEMS TO BE SOLVED BY THE INVENTION

In the prior art described above, introducing a gang scheduler solves the problem of synchronization waiting time for synchronization between processes within a job. The quantum time of a job in a gang scheduler is set longer than the quantum time in an ordinary process scheduler. This is because gang scheduling suspends and resumes all the processes included in a job, so its scheduling overhead is larger than that of process scheduling, which handles the suspension and resumption of a single process. In particular, in a loosely coupled multiprocessor system where the processes belonging to a job span a plurality of nodes, communication delays are added and the scheduling overhead increases further.

[0006] However, in a system with a conventional gang scheduler, when synchronization is required between different jobs, gang scheduling alone cannot solve the problem of synchronization waiting time. FIG. 4 shows different jobs 100 and 104 acquiring a lock in order to access a certain resource. The lock is currently held by process 102 of job 104 and is being requested by process 101 of job 100. The gang scheduler is currently allowing job 100 to execute, and the other jobs, including job 104, are linked to run queue 103. The jobs are switched in order in the direction of the arrows of the run queue. Process 101 in the currently executing job 100 requests the lock, but process 102 in job 104 holds the lock and is not scheduled, so job 101 must wait until job 104 resumes operation and releases the lock. In gang scheduling, however, the quantum time of a job is long, so job 100 is not switched to the next job immediately after the lock request, which is a problem.

[0007] Furthermore, because a conventional gang scheduler switches contexts asynchronously with the user jobs, it allocates a quantum time to a user job even when all the processes in that job are stopped waiting for some event. A job that does not use the processors is thus allocated processors, which lowers processor utilization. As a consequence, if, for example, all the jobs in the run queue of FIG. 4 are waiting for the lock, processors may be allocated uselessly to the intervening jobs (105 to 108) until job 104 is scheduled.
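As a rough worked example of this cost, the sketch below counts the quanta that are handed to lock-waiting jobs before the lock holder gets to run. The function name `wasted_quanta` and the quantum length are assumptions made for illustration; only the queue order (jobs 105 to 108 ahead of job 104) follows the description of FIG. 4.

```python
def wasted_quanta(run_queue, lock_holder, waiting_jobs):
    """Count quanta given to lock-waiting jobs before the holder runs.

    run_queue    -- job IDs in the order the gang scheduler will run them
    lock_holder  -- ID of the job that holds the lock
    waiting_jobs -- IDs of jobs whose processes are all waiting for the lock
    """
    wasted = 0
    for job_id in run_queue:
        if job_id == lock_holder:
            break                  # the holder runs and can release the lock
        if job_id in waiting_jobs:
            wasted += 1            # a full quantum is spent only waiting
    return wasted


# Jobs 105 to 108 precede job 104 in the run queue, as described for FIG. 4.
run_queue = [105, 106, 107, 108, 104]
quantum_ms = 100                   # assumed value, for illustration only
wasted = wasted_quanta(run_queue, 104, {105, 106, 107, 108})
wasted_ms = wasted * quantum_ms
```

Under these assumed numbers, four full quanta pass with the processors allocated to jobs that cannot make progress.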

[0008] The prior art described above detects the timing for switching by monitoring network packets. For a job whose processes are distributed over a plurality of nodes, however, counting only the packets addressed to the processes on one node does not make it possible to determine whether all the processes have entered a waiting state, so this does not solve the above problem. Nor does it solve the problem when a job performs no communication.

[0009] Parallel computing programs also have a characteristic feature called a collective function, in which all the processes in a job execute the same function. When a collective function involves waiting for the function to complete, all the processes in the job stop executing during the wait. A conventional gang scheduler cannot detect this occasion, so each process uselessly occupies a processor even while waiting, which is a problem.

[0010] Another problem is that, when the jobs being switched use different numbers of processes, the synchronization waiting time between processes grows if fewer processors are allocated than the number of processes. Furthermore, if the maximum number of processors is always reserved, processor utilization drops whenever the number of processors actually used is smaller than that maximum.

[0011] An object of the present invention is to suppress the drop in processor utilization caused by event waits by having the gang scheduler use, as a trigger for scheduling, an event wait of a process in a user job that is caused by a process outside that user job.

[0012] Another object of the present invention is to use the wait events that occur when executing a collective function, which is characteristic of parallel computing programs, as a trigger for gang scheduling.

[0013] Still another object of the present invention is to separate the processors on which the OS that performs the gang scheduler's switching runs from the processors on which user jobs run, and to perform parallel processing while changing the configuration of the two from job to job, thereby suppressing any loss of the jobs' parallelization benefit, preventing an increase in synchronization waiting time, and raising processor utilization.

[0014]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, when one of the plurality of processes constituting a user job issues an event wait instruction, the OS uses event source identification means to determine whether the process that generates the awaited event is included in that user job. If it is not included in the user job, the OS switches from the processes of that user job to another executable job.

[0015] To achieve the other object described above, when a user job executes a collective function, the request for the collective function is used as the trigger for switching from the user job to another executable job.

[0016] To achieve the other object described above, each node has a multiprocessor configuration and a function for separating the user execution processors, on which user jobs run, from the service execution processors, on which the OS and service programs run. When a user job is switched to another executable job upon some event, the configuration of the user execution processors and the service execution processors is changed.

[0017]

EMBODIMENTS OF THE INVENTION

FIG. 1 shows the configuration of an embodiment of the present invention. Reference numerals 1 to 6 denote nodes. Each node consists of a plurality of processors (25 to 28, 13) and a main memory, and each processor can directly access the main memory within its node. The nodes are connected by a high-speed network (18), and a processor on one node can normally access the memory of another node through this network and OS processing. Disk devices (7, 9) are connected to some of the nodes (2, 6). A plurality of disk devices 8 are connected to node 5. The other nodes (1, 3, 4) can access these disk devices through the OS.

[0018] A functionally equivalent OS exists on each node. In each OS, the process (12) that performs the OS processing runs on a processor (13) dedicated to the OS. In this embodiment, no distinction is made between processes and threads unless otherwise noted. For example, when the OS is made up of a plurality of threads, the threads can share data within the OS, whereas when it is made up of a plurality of processes, each process has an independent virtual address space and some other mechanism is needed to access shared data. OS processing may also be performed with a suitable mix of processes and threads. These methods of accessing shared data are merely implementation differences as far as the present invention is concerned, so the invention is essentially applicable to either approach.

[0019] A user program is executed as a job. A job consists of a plurality of processes, and a process consists of a plurality of threads. The threads within a process share the same virtual address space. In FIG. 1, jobs 10 and 11 exist, and job 10 holds processes 17 to 20. When the processes are executed as threads, 17 and 18, for example, become threads of a process on node 1. Because threads belonging to the same process share a virtual address space, threads on different nodes belong to different processes. With this point in mind, a process can always be replaced by a process and its threads, so hereafter no distinction is made between processes and threads, and this embodiment can readily be applied to a thread-based implementation as well. Processes belonging to a user job are executed on the user processors (14) and are never executed on the OS-dedicated processor.

[0020] Each OS has a process scheduler, which schedules processes, and a gang scheduler, which schedules jobs consisting of a plurality of processes. The process scheduler is an intra-node scheduler that schedules each process according to the priority of the processes on the node it manages. The gang scheduler is a scheduler that spans nodes and schedules each job according to the priority assigned to the job. Process priorities and job priorities are independent of each other, and normally neither affects the other.

[0021] The OS on each node manages the user jobs to be executed using a job management table (16), which is described in detail later. The OS also has event source identification means, which, when a user program requests a wait for some event, can identify the process that generates that event.

[0022] Next, the overall operation is described with reference to FIG. 2, taking as an example a user job that displays characters on a screen. FIG. 2 shows the part of the overall operation of this embodiment that concerns node 1 and node 2. For each job in FIG. 1 (10, 11), the processes included in the job are all switched at once at a certain basic interval (the quantum time). Assume that job 10 is currently running and that job 11 is executable but has no processors allocated and is waiting to run.

[0023] Process 17 in job 11 of FIG. 2 attempts to acquire the access lock for the device corresponding to the screen in order to display characters on the screen (210). This lock is not a so-called spin lock, which keeps polling for the lock; if acquisition fails, the process gives up its processor and waits for the lock. The OS on node 1 receives the request and checks whether the lock can be acquired (211). If the lock is acquired, control returns to the user program, which starts outputting characters. If not, processing proceeds to the next step. In the next step, 212, an event wait request for the lock release event is made in order to obtain the lock. The OS then uses the event source identification means to identify the process that will generate the lock release event (213) and checks whether that process belongs to job 11, the job of the process that issued the lock request (214). If it belongs to job 11, processing proceeds to step 216 and process 17 enters the event wait state. Otherwise, a job switch request is made to the gang scheduler on node 1 (215). The gang scheduler on node 1 places processes 17 to 20, which belong to job 11, in the suspended state. After the job switch, process 17 enters the event wait state at 216. Process 18 is on node 1, so the gang scheduler on node 1 updates its state to suspended. For processes 19 and 20, a job switch request is issued to the gang scheduler on node 2 (215). On accepting the job switch request (217), the gang scheduler on node 2 updates the states of processes 19 and 20 to suspended.
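The decision made in steps 210 to 217 can be sketched as follows. This is a minimal single-address-space model under stated assumptions, not the OS implementation: `Process`, `Lock`, `request_lock`, the state names, and the job and process IDs ("A", "B", "a1", and so on) are hypothetical, and the inter-node job switch request is reduced to returning the list of nodes whose gang scheduler would be contacted.

```python
from dataclasses import dataclass, field
from typing import Optional

RUNNING, WAITING, SUSPENDED = "running", "waiting", "suspended"


@dataclass
class Process:
    pid: str
    node: int
    job_id: str
    state: str = RUNNING


@dataclass
class Lock:
    holder: Optional[Process] = None
    waiters: list = field(default_factory=list)


def request_lock(lock, proc, processes):
    """Handle a lock request; return the nodes asked to switch jobs."""
    if lock.holder is None:                  # step 211: the lock is free
        lock.holder = proc
        return []
    lock.waiters.append(proc)                # step 212: wait for the release event
    source = lock.holder                     # step 213: identify the event source
    if source.job_id == proc.job_id:         # step 214: source is in the same job
        proc.state = WAITING                 # step 216: only this process waits
        return []
    notified_nodes = []                      # step 215: job switch request
    for p in processes:
        if p.job_id == proc.job_id:
            p.state = SUSPENDED              # the whole job gives up its processors
            if p.node not in notified_nodes:
                notified_nodes.append(p.node)
    proc.state = WAITING                     # step 216: the requester waits for the event
    return notified_nodes


# Hypothetical job "A" spans nodes 1 and 2; the lock is held by job "B".
job_a = [Process("a1", 1, "A"), Process("a2", 1, "A"),
         Process("a3", 2, "A"), Process("a4", 2, "A")]
holder = Process("b1", 2, "B", state=SUSPENDED)
lock = Lock(holder=holder)
switched_nodes = request_lock(lock, job_a[0], job_a + [holder])
```

When the holder is in another job, every process of the requesting job is taken off its processor; when the holder is in the same job, only the requesting process waits and no job switch is requested.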

[0024] Through the above process, all the processes of job 10 are suspended, triggered by the lock acquisition event, and the processors they were using are released. The gang scheduler then allocates the freed processors to the processes of another job if one is executable; otherwise it leaves the processors idle. In FIG. 1, for example, if job 11 is executable, processors are allocated to the processes of that job and their execution is resumed.

[0025] The above example describes the case in which the event source identification means 15 resides in the OS, but the means may instead be managed at the user program level together with the jobs and processes. In that case, the event-generating job is identified at the user program level, and information on the event-generating job is attached to the event wait request.

[0026] With this procedure, the event-waiting job is suspended and another job is switched in only when the cause of the event lies in a job other than the one waiting for the event. When the cause lies within the event-waiting job itself, no switch is made, which prevents job switching from being invoked unnecessarily. In other words, when the cause is in the same job, letting the job continue to run without switching makes it easier for the process that requested the lock to acquire it.

[0027] FIG. 3 shows how an event wait completes. In this example, assume that a process on node 2 holds the lock. When that process releases the lock (220), it issues a release request to the OS on node 2. On receiving the lock release request (221), the OS on node 2 uses the process management table (230 in FIG. 5) to identify, from the process waiting for the lock, the job that is waiting for the lock (222). The process management table is described later. In this example, job 10, the job to which lock-waiting process 17 belongs, is identified. Next, the gang scheduler on node 2 makes the lock-waiting job 10 executable (223). The gang scheduler determines that the processes belonging to job 10 are processes 17 to 20 and makes all of them executable. For the gang scheduler on node 2, processes 19 and 20 are on the same node, so it makes them executable directly. Processes 17 and 18 are on a different node, so it sends a job start request to the gang scheduler on node 1. On receiving the start request (225), the gang scheduler on node 1 makes processes 17 and 18, which reside on that node, executable (226).
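The wake-up path of FIG. 3 can be sketched in the same spirit. The class `GangScheduler`, the dictionary `PROCESS_TABLE`, and the function `release_lock` are hypothetical names; the inter-node job start request is modeled as a direct method call, which is an assumption made to keep the example self-contained.

```python
from collections import defaultdict

# Simplified process management table: pid -> (node number, job ID).
PROCESS_TABLE = {
    17: (1, 10), 18: (1, 10), 19: (2, 10), 20: (2, 10),
}


class GangScheduler:
    """Per-node gang scheduler; `peers` maps node number -> scheduler."""

    def __init__(self, node, peers):
        self.node = node
        self.peers = peers
        self.runnable = set()          # pids on this node made executable
        peers[node] = self

    def make_job_runnable(self, job_id):
        """Step 223: wake every process of the job, local or remote."""
        remote = defaultdict(list)
        for pid, (node, jid) in PROCESS_TABLE.items():
            if jid != job_id:
                continue
            if node == self.node:
                self.runnable.add(pid)         # same node: wake directly
            else:
                remote[node].append(pid)       # other node: ask its scheduler
        for node, pids in remote.items():
            self.peers[node].on_start_request(pids)   # job start request

    def on_start_request(self, pids):
        """Steps 225 and 226: wake the listed processes on this node."""
        self.runnable.update(pids)


def release_lock(waiting_pid, local_scheduler):
    """Steps 221 to 223: map the waiting process to its job and wake the job."""
    _, job_id = PROCESS_TABLE[waiting_pid]     # step 222
    local_scheduler.make_job_runnable(job_id)  # step 223
    return job_id


peers = {}
node1 = GangScheduler(1, peers)
node2 = GangScheduler(2, peers)
woken_job = release_lock(17, node2)    # the lock is released on node 2
```

After the call, node 2 has woken processes 19 and 20 itself and node 1 has woken processes 17 and 18 in response to the start request, so the whole job becomes executable together.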

[0028] With the above processing, the whole of job 10, to which the process that issued the lock request belongs, is suspended when the lock is requested and becomes executable when the lock is acquired. As a result, job 10 no longer wastes processors until the end of its quantum time.

[0029] Next, the method of identifying the event-generating process is described in detail. FIG. 5 shows the tables used to identify the event-generating process, which are managed by a management program. The management program resides either in the OS or in the user program. Each entry in the process management table 230 consists of process ID, node number, wait event, and job ID fields. The process ID is an ID unique to each process. The node number is the number of the node on which the process resides. The wait event field describes the event when the process is waiting for one. Each event is given a logical number (event number) by the OS. For example, process 17 is waiting for an event whose logical number is L1. The OS provides an interface that takes this event number as an argument and releases the processes waiting for the event, and an interface for waiting for an event that returns the event number corresponding to the event specified by the user and then starts the wait. The last field, job ID, holds the ID of the job to which the process belongs. In FIG. 5, processes 17 to 20 belong to the job with job ID 10, and process P1 belongs to the job with job ID J1.

[0030] The job management table manages the correspondence between jobs and the processes belonging to them. Each entry consists of job ID and process list fields. The process list holds the list of processes belonging to the job identified by the job ID. In this example, job 10 holds processes 17 to 20 as its process list.

[0031] Each entry in the event management table consists of an event, the process that generates the event, and a list of the processes waiting for the event. Entry 235 describes event L1 and indicates that the process that generates this event is P1 and that the process waiting for it is process 17. When this is applied to lock acquisition, for example, the event is the lock release event, the lock-holding process is the event-generating process, and the lock-waiting processes are registered in the list of waiting processes.
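The three tables can be pictured as the following data structures. The field layout follows the description; the class names, the use of string IDs, the node number given for process P1, and the empty wait-event fields of processes 18 to 20 are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProcessEntry:
    """One row of the process management table."""
    process_id: str
    node: int
    wait_event: Optional[str]      # event number, or None when not waiting
    job_id: str


@dataclass
class JobEntry:
    """One row of the job management table."""
    job_id: str
    process_list: list


@dataclass
class EventEntry:
    """One row of the event management table."""
    event: str
    source_process: str
    waiting_processes: list = field(default_factory=list)


process_table = {
    "17": ProcessEntry("17", 1, "L1", "10"),
    "18": ProcessEntry("18", 1, None, "10"),
    "19": ProcessEntry("19", 2, None, "10"),
    "20": ProcessEntry("20", 2, None, "10"),
    "P1": ProcessEntry("P1", 2, None, "J1"),   # node of P1 is assumed
}
job_table = {
    "10": JobEntry("10", ["17", "18", "19", "20"]),
    "J1": JobEntry("J1", ["P1"]),
}
event_table = {
    "L1": EventEntry("L1", source_process="P1", waiting_processes=["17"]),
}


def tables_consistent():
    """Check that the job lists and the process rows agree with each other."""
    return all(
        process_table[pid].job_id == job.job_id
        for job in job_table.values()
        for pid in job.process_list
    )
```

Keeping the job ID in both the process rows and the job rows makes each lookup a single table access, at the cost of having to keep the two tables in step, which `tables_consistent` checks here.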

[0032] For these management tables, the management program creates the relevant entries on all the relevant nodes when a process or job is created and when an event occurs.

[0033] Next, the event source identification procedure, which identifies the event-generating job, is described with reference to FIGS. 5 and 6. The event source identification means receives event number L1 as an argument, searches the event management table 231, and obtains the event-generating process P1 from entry 235 (250). Next, it searches the process management table 230 and obtains the job ID J1 from entry 236, which corresponds to process P1 (251). Obtaining the job ID completes the identification of the event-generating job (252). It then searches the job management table 16 and obtains the process list for job ID J1 (253). In this case, the process list consists of process P1 only.

[0034] By comparing the job ID and process list obtained by the event source identification means 15 with the process that requested the event wait, step 214 can determine whether the requesting process and the event-generating process belong to the same job.
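Steps 250 to 253 and the comparison of step 214 amount to three table lookups and a membership test. The sketch below uses plain dictionaries as stand-ins for the three tables; the function names and the dictionary layout are assumptions, while the values mirror the example of FIG. 5.

```python
def identify_event_source(event_no, event_table, process_table, job_table):
    """Steps 250 to 253: event number -> (job ID, process list) of the source."""
    source_pid = event_table[event_no]["source"]     # step 250
    source_job = process_table[source_pid]["job"]    # steps 251 and 252
    process_list = job_table[source_job]             # step 253
    return source_job, process_list


def same_job(waiting_pid, event_no, event_table, process_table, job_table):
    """Step 214: is the waiting process part of the event-generating job?"""
    _, process_list = identify_event_source(
        event_no, event_table, process_table, job_table
    )
    return waiting_pid in process_list


event_table = {"L1": {"source": "P1", "waiters": ["17"]}}
process_table = {
    "17": {"job": "10"}, "18": {"job": "10"},
    "19": {"job": "10"}, "20": {"job": "10"},
    "P1": {"job": "J1"},
}
job_table = {"10": ["17", "18", "19", "20"], "J1": ["P1"]}

source = identify_event_source("L1", event_table, process_table, job_table)
needs_switch = not same_job("17", "L1", event_table, process_table, job_table)
```

Here process 17 is not in the process list of job J1, so the check of step 214 fails and a job switch would be requested.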

[0035] Next, the job switching method is described in detail. In the method of FIG. 2, suspending a job that has entered the event wait state prevents the remainder of that job's quantum time from being spent waiting for the event. This reduces useless waiting time and improves the processor utilization of the system. However, when, as in FIG. 4, a plurality of executable jobs exist besides the event-waiting job 100 and all of those jobs in the run queue are waiting for the same lock release event, every job partway along the run queue waits uselessly for one quantum time. Therefore, when the event-waiting job 100 is suspended, the event-generating job 104 is scheduled preferentially, which avoids this useless waiting time.

[0036] There are several ways to schedule the event-generating job preferentially, as follows. (1) In FIG. 4, after suspending the event-waiting job 100, the OS not only makes the event-generating job 104 executable but also ignores the order of the run queue, allocates processors to job 104 ahead of jobs 105 to 108, and starts its execution.

[0037] (2) In FIG. 4, the OS has a function for raising a priority for one quantum time. After suspending job 100, it uses this function to raise the priority of the event-generating job 104 temporarily so that job 104 is scheduled preferentially. After one quantum time has elapsed, the priority of job 104 automatically returns to its original value.

[0038] (3) In method (2) above, when the priority of job 104 is raised temporarily, the priorities of all the processes belonging to that job are also raised temporarily.
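The three methods can be contrasted in a short sketch. Everything here is an illustrative assumption rather than the OS interface: the names `jump_queue` and `BoostingScheduler`, the convention that a larger number means a higher priority, and the concrete priority values.

```python
from collections import deque


def jump_queue(run_queue, source_job):
    """Method (1): run the event-generating job next, ignoring queue order."""
    run_queue.remove(source_job)
    run_queue.appendleft(source_job)


class BoostingScheduler:
    """Methods (2) and (3): raise priorities for exactly one quantum."""

    def __init__(self, job_priority, process_priority, job_processes):
        self.job_priority = job_priority            # job ID -> priority
        self.process_priority = process_priority    # pid -> priority
        self.job_processes = job_processes          # job ID -> [pid, ...]
        self._saved = []                            # priorities to restore

    def boost(self, job_id, amount, include_processes=False):
        self._saved.append((self.job_priority, job_id, self.job_priority[job_id]))
        self.job_priority[job_id] += amount
        if include_processes:                       # method (3)
            for pid in self.job_processes[job_id]:
                self._saved.append(
                    (self.process_priority, pid, self.process_priority[pid])
                )
                self.process_priority[pid] += amount

    def pick_next(self):
        """Select the job with the highest priority value."""
        return max(self.job_priority, key=self.job_priority.get)

    def end_of_quantum(self):
        """Restore every boosted priority once one quantum has elapsed."""
        for table, key, old in reversed(self._saved):
            table[key] = old
        self._saved.clear()


# Method (1) applied to the run queue order described for FIG. 4.
queue = deque([105, 106, 107, 108, 104])
jump_queue(queue, 104)

# Methods (2) and (3): job 104 and its process 102 are boosted for one quantum.
sched = BoostingScheduler(
    {104: 1, 105: 1, 106: 1, 107: 1, 108: 1}, {102: 1}, {104: [102]}
)
sched.boost(104, amount=10, include_processes=True)
next_job = sched.pick_next()
```

Method (1) changes the queue order directly, whereas methods (2) and (3) leave the queue alone and rely on the ordinary priority rule, so the boost disappears by itself when `end_of_quantum` runs.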

【００３９】次に、コレクティブ機能実行時の実施例に
ついて説明する。図７はコレクティブ機能の実行の様子
を示す図である。ユーザジョブ１０の各プロセスは一斉
にコレクティブ機能の実行を開始する。図７では、ユー
ザジョブ１０を構成するプロセスのうちプロセス１７〜
１９のみを示している。もう一つのプロセス２０も同じ
動作をする。Next, an embodiment for the execution of a collective function will be described. FIG. 7 shows how the collective function is executed. All processes of the user job 10 start executing the collective function at the same time. FIG. 7 shows only processes 17 to 19 of the processes constituting the user job 10. The remaining process 20 behaves in the same way.

【００４０】図７ではコレクティブ機能としてディスク
装置への入出力を行う。まず、ジョブ１０中の各プロセ
スはコレクティブ入出力実行を開始するために、各プロ
セスが存在するノード上のＯＳに対して、コレクティブ
入出力実行要求を発行する。図７ではプロセス１７，１
８はノード１上のＯＳ３０１に対して、コレクティブ入
出力要求を発行する。ノード２上でも同様に、プロセス
１９，２０が同上のＯＳに対してコレクティブ入出力要
求を発行し、ノード２上のＯＳ３０１が要求を受ける。
ＯＳ３００，３０１は、ジョブ中の全てのプロセスから
要求を受けた後、お互いに連絡を取りながら、入出力の
最適化を行い、その後にディスクが接続されたノード
（例えばノード５など）上のＯＳに対して入出力の実行
を開始する。In FIG. 7, input/output to a disk device is performed as the collective function. First, to start the collective input/output, each process in job 10 issues a collective input/output execution request to the OS on the node where that process resides. In FIG. 7, processes 17 and 18 issue a collective input/output request to the OS 301 on node 1. Similarly, on node 2, processes 19 and 20 issue a collective input/output request to the OS on that node, and the OS 301 on node 2 receives the request. After receiving requests from all the processes in the job, the OSs 300 and 301 optimize the input/output while communicating with each other, and then start executing the input/output on the OS of a node to which the disk is connected (for example, node 5).

【００４１】コレクティブ機能はジョブ中の全てのプロ
セスが同じ動作を行う為、待ち事象がある場合は、ジョ
ブ中の全てのプロセスが待ち状態となる。上記例では、
ジョブ１０中の全プロセス１７〜２０は全て入出力要求
後、入出力要求完了待ちとなる。従って、この時にジョ
ブを切り替える事により、各プロセスの無駄な待ち時間
を減らす事が可能となる。In the collective function, all processes in a job perform the same operation, so if there is an event to wait for, all processes in the job enter the wait state. In the above example, all processes 17 to 20 in job 10 wait for completion of the input/output request after issuing it. Switching jobs at this point therefore reduces the wasted waiting time of each process.

【００４２】図８は、コレクティブ機能実行時のユーザ
プログラムの動作とＯＳの動作を示す。全体の処理はユ
ーザプログラム処理とＯＳ処理とに分かれると共に、各
処理を実行するプロセッサもユーザ実行用プロセッサと
サービス実行用プロセッサとに分かれる。始めは、コレ
クティブ処理を行うユーザプログラムは、ユーザ実行用
プロセッサ上で動作している。また、このユーザプログ
ラムはジョブ１０上で動作するものとする。このプログ
ラムを実行しているプロセス１７は、始めに、コレクテ
ィブ要求を発行する（３００）。コレクティブ実行自体
は、ユーザプログラム中の別モジュールで実行する。あ
るいは、コレクティブ実行をＯＳが実行するような実装
もありえる。その場合、図８の処理３０１〜３０６、３
０７〜３０８はＯＳ内で実行する事になる。本実施例で
は、コレクティブ機能の実行はユーザプログラムとして
動作し、この場合、上記別モジュールは、ユーザプロセ
スの延長で動作する部分と、ユーザジョブには含まれず
に別プロセスとして動作する部分とに分かれる。この要
求を受けた当該モジュールは、ユーザプロセスの延長で
動作し、他のプロセス（１８〜２０）からの要求を収集
する（３０２）。要求を全て集め終えた後、コレクティ
ブ要求の完了待ちを行う（３０２）。それと同時に、サ
ービス実行用プロセッサ上で、別プロセスとして動作し
ているコレクティブ処理用プロセスに対して、コレクテ
ィブ機能の実行を指示する。以降、コレクティブ機能は
同プロセス上で、ユーザプロセスと並列に実行される
（３０７）。FIG. 8 shows the operation of the user program and the operation of the OS when the collective function is executed. The overall processing is divided into user program processing and OS processing, and the processors that execute them are likewise divided into user execution processors and service execution processors. Initially, the user program that performs the collective processing is running on a user execution processor. This user program is assumed to run as job 10. Process 17, which is executing this program, first issues a collective request (300). The collective execution itself is performed by a separate module in the user program. Alternatively, an implementation in which the OS performs the collective execution is also possible. In that case, processes 301 to 306 and 307 to 308 in FIG. 8 are executed inside the OS. In this embodiment, the collective function runs as a user program, and the separate module is divided into a part that runs as an extension of the user process and a part that is not included in the user job and runs as a separate process. On receiving the request, the module runs as an extension of the user process and collects the requests from the other processes (18 to 20) (302). After all the requests have been collected, it waits for the completion of the collective request (302). At the same time, it instructs the collective processing process, which runs as a separate process on the service execution processor, to execute the collective function. From then on, the collective function is executed in that process in parallel with the user processes (307).

【００４３】コレクティブ要求の完了待ちを行うプロセ
スは、ジョブ１０の一時停止要求をＯＳに対して発行す
る（３０３）。要求を受けたＯＳは他に実行可能ジョブ
があるかどうかを調べ（３０４）、ある場合は他のジョ
ブにプロセッサを割り付け、同ジョブを実行する（３０
５）。ない場合は、プロセッサをアイドル状態にし、ジ
ョブの起動待ちを行う。サービス実行用プロセッサで実
行中のコレクティブ機能の実行が完了すると、コレクテ
ィブ処理用プロセスは、コレクティブ機能実行終了通知
をＯＳに対して行う（３０８）。ＯＳは本事象待ちを行
っているジョブ１０を再び事項可能状態とする（３０
６）。The process that waits for the completion of the collective request issues a request to the OS to suspend job 10 (303). The OS that receives the request checks whether there is another executable job (304); if there is, it assigns the processors to that job and executes it (305). If there is not, it puts the processors in the idle state and waits for a job to be started. When the collective function running on the service execution processor completes, the collective processing process notifies the OS that execution of the collective function has ended (308). The OS then returns job 10, which has been waiting for this event, to the executable state (306).
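
The flow of FIG. 8 described above (collecting the requests, suspending the job, switching to another job, and making the job executable again on completion) can be sketched as follows. This is a simplified illustration, not the patented implementation: it models a single OS object rather than one OS per node, and the class names, method names, state strings, and the process numbers of job 11 are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    processes: tuple
    state: str = "executable"   # "running", "executable", or "suspended"


@dataclass
class NodeOS:
    jobs: dict = field(default_factory=dict)
    running: object = None      # None: user execution processors are idle

    def add(self, job):
        self.jobs[job.job_id] = job
        if job.state == "running":
            self.running = job

    def suspend(self, job_id):
        """Steps 303-305: suspend the job, then run another executable job if any."""
        self.jobs[job_id].state = "suspended"
        candidates = [j for j in self.jobs.values() if j.state == "executable"]
        self.running = candidates[0] if candidates else None
        if self.running is not None:
            self.running.state = "running"

    def collective_done(self, job_id):
        """Steps 308 and 306: the waiting job becomes executable again."""
        self.jobs[job_id].state = "executable"


class CollectiveModule:
    def __init__(self, node_os, job):
        self.node_os, self.job, self.requests = node_os, job, {}

    def request(self, pid, payload):
        """Collect one request; once all processes have asked, suspend the job."""
        self.requests[pid] = payload
        if set(self.requests) != set(self.job.processes):
            return False
        self.node_os.suspend(self.job.job_id)
        return True

    def run_on_service_processor(self):
        """Step 307: stand-in for the real collective I/O, then notify the OS."""
        result = [self.requests[pid] for pid in sorted(self.requests)]
        self.node_os.collective_done(self.job.job_id)
        return result
```

The point of the sketch is that the job switch is triggered by the last collected request rather than by a timer, so none of the remaining quantum of job 10 is spent waiting for the input/output to finish.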

【００４４】以上のように、コレクティブ機能の特性か
ら、コレクティブ機能が実行される場合は、同機能を要
求したプロセスは全て、同機能の実行完了待ちを行うの
で、これを契機としてＯＳに他のジョブへの切替えを指
示する事により、無駄な待ち時間を削減する。従って、
上記例ではコレクティブ機能としてディスク装置への入
出力について説明したが、コレクティブ通信についても
同様に容易に適用可能である。更に、ユーザプログラム
実行用のプロセッサと、サービス実行用のプロセッサを
分ける事により、事象待ちにより切り替えられた次のジ
ョブの実行を干渉する事を抑止する効果もある。As described above, because of the nature of the collective function, all processes that requested the function wait for its completion whenever it is executed. Using this as a trigger to instruct the OS to switch to another job reduces the wasted waiting time. Although the above example described input/output to a disk device as the collective function, the method is just as easily applicable to collective communication. Furthermore, separating the processors that execute user programs from the processors that execute services also keeps the collective processing from interfering with the execution of the next job that was switched in because of the event wait.

【００４５】また、次に実行するジョブがジョブ１１
で、ノード２上のプロセッサ１４のうち２７と２８がユ
ーザ実行用プロセッサではなく、サービス実行用のプロ
セッサに割り付けられていた場合は、ジョブ１１に切り
替える時点で、プロセッサの構成を変え、プロセッサ２
７，２８をユーザ実行用プロセッサに含める事で、ジョ
ブ１１の並列性を損なう事なく次のジョブに切り替え
る。If the job to be executed next is job 11, and processors 27 and 28 among the processors 14 on node 2 have been assigned as service execution processors rather than user execution processors, the processor configuration is changed at the time of switching to job 11 so that processors 27 and 28 are included among the user execution processors. This allows the switch to the next job to be made without impairing the parallelism of job 11.
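
The reconfiguration described above can be sketched as a small function that moves service execution processors into the user set until the next job has as many processors as it needs. The function name, the `required_user_cpus` parameter, and the processor numbers 25 and 26 are assumptions; only processors 27 and 28 come from the text. The sketch does not address where service processing runs once every service execution processor has been reassigned.

```python
def reconfigure(user_cpus, service_cpus, required_user_cpus):
    """Return new (user, service) processor lists for the job being switched in."""
    user = list(user_cpus)
    service = list(service_cpus)
    # Move service execution processors to the user set only while the next
    # job still lacks processors for its degree of parallelism.
    while len(user) < required_user_cpus and service:
        user.append(service.pop(0))
    return user, service


# Hypothetical node 2: processors 25 and 26 already serve user jobs,
# 27 and 28 are currently service execution processors.
user, service = reconfigure([25, 26], [27, 28], required_user_cpus=4)
```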

【００４６】[0046]

【発明の効果】本発明によれば、ユーザジョブ中のある
プロセスが事象待ち状態になった場合、その事象を起す
要因が当該ユーザジョブ以外にある時にのみ、ユーザジ
ョブ中の全プロセスを一時停止状態にし、次のジョブに
切り替える事により、当該ユーザジョブが無駄に事象待
ちを行う事を抑止する事ができるので、プロセッサの利
用率を上げる事ができる。また、同じ事象待ちのジョブ
が複数存在する場合、それらの事象待ちジョブよりも優
先して、事象発生ジョブをスケジュールする事により、
無駄な事象待ちを削減し、プロセッサ利用率を上げる事
が可能となる。According to the present invention, when a process in a user job enters the event waiting state, all processes in the user job are suspended and the next job is switched in only when the cause of the event lies outside that user job. This prevents the user job from waiting for the event in vain, so the processor utilization can be increased. In addition, when there are a plurality of jobs waiting for the same event, scheduling the event occurrence job in preference to those event waiting jobs reduces useless event waiting and increases the processor utilization.

【００４７】更に、ユーザジョブがコレクティブ機能を
実行する場合、コレクティブ機能処理中に発生する、ユ
ーザジョブ中の全プロセスの事象待ちを契機として、ジ
ョブを切り替える事により、無駄な事象待ちを削減し、
プロセッサ利用率を上げる事が可能となる。Furthermore, when a user job executes a collective function, switching jobs upon the event wait of all processes in the user job, which occurs during the collective function processing, reduces useless event waiting and increases the processor utilization.

【００４８】更に、別のユーザジョブへ切り替える場
合、プロセッサの構成を変える事により、次のジョブの
並列性が損なわれる事を抑止する事が可能となる。Further, when switching to another user job, by changing the configuration of the processor, it is possible to prevent the parallelism of the next job from being impaired.

[Brief description of the drawings]

【図１】本発明の一実施例の計算機システムの構成図で
ある。FIG. 1 is a configuration diagram of a computer system according to an embodiment of the present invention.

【図２】本発明において、ユーザジョブが事象待ちを行
う場合の処理手順を示す図である。FIG. 2 is a diagram showing a processing procedure when a user job waits for an event in the present invention.

【図３】本発明において、事象待ちを行うユーザジョブ
を再起動する場合の処理手順を示す図である。FIG. 3 is a diagram showing a processing procedure when a user job that waits for an event is restarted in the present invention.

【図４】本発明において、複数のジョブが事象待ちを行
う状態を示す図である。FIG. 4 is a diagram showing a state where a plurality of jobs wait for an event in the present invention.

【図５】本発明において、使用するテーブルを示す図で
ある。FIG. 5 is a diagram showing a table used in the present invention.

【図６】本発明において、事象発生特定の処理手順を示
す図である。FIG. 6 is a diagram showing a processing procedure for specifying occurrence of an event in the present invention.

【図７】本発明において、コレクティブ機能実行の概略
を示す図である。FIG. 7 is a diagram schematically showing execution of a collective function in the present invention.

【図８】本発明において、コレクティブ機能実行の処理
手順を示す図である。FIG. 8 is a diagram showing a processing procedure for executing a collective function in the present invention.

[Explanation of symbols]

１…ノード１、２…ノード２、１０…ユーザジョブ、１
１…ユーザジョブ、１５…事象発生特定手段、１６…ジ
ョブ管理テーブル。1: node 1, 2: node 2, 10: user job, 11: user job, 15: event occurrence specifying means, 16: job management table.

Continuation of front page
(72) Inventor: Hiroyuki Kumazaki, Software Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa
(72) Inventor: Hiroyuki Takatsu, Software Division, Hitachi, Ltd., 5030 Totsuka-cho, Totsuka-ku, Yokohama-shi, Kanagawa
F-term (reference): 5B045 AA03 CC06 EE08 EE13; 5B098 AA10 CC01 GA03 GA04 GA05 GA08 GC03 GC05 GD02 GD14 GD22

Claims

[Claims]

1. A job scheduling method in a computer system in which nodes each comprising a plurality of processors and a memory accessible from each processor are connected by a high-speed network, some of the nodes being connected to a plurality of input/output devices, wherein an OS having an equivalent function operates on each node, one or more user processes belonging to a user job composed of a plurality of user processes operate on each node, and, when a plurality of user jobs exist, the OS has a time-division job scheduler that suspends all user processes constituting a running user job at once and reschedules all processes belonging to another suspended user job at once, and event occurrence job specifying means for specifying, when a user job waits for an event, which job causes the event, the method comprising: a step in which, when a user job waits for an event to occur, a process in the user job issues a suspension request in order to wait for the event; a step in which the OS that has received the suspension request identifies the job that causes the event by using the event occurrence job specifying means; a step in which, if the job identified in the preceding step differs from the user job that issued the suspension request, the OS suspends all processes belonging to the job that issued the suspension request; and a step in which, when the event occurs, the OS sets all processes belonging to the user job waiting for the event to the executable state on each node, thereby making the job executable again.

2. The job scheduling method according to claim 1, wherein, if the OS is not provided with the means for specifying which job causes the event when a user job waits for an event, the method comprises: a step in which the user job, when it enters the event wait, specifies the job that causes the event; a step of issuing to the OS a request to suspend the job together with information on the event occurrence job; and a step in which the OS that has received the request uses the information to check whether the job causing the event awaited by the user job is that user job itself.

3. The job scheduling method according to claim 1, wherein when the event occurrence job is made executable, the job is preferentially scheduled for a specific time.

4. The job scheduling method according to claim 3, wherein, in order to schedule the event occurrence job preferentially for a specific time, a processor is assigned to the event occurrence job immediately after the event waiting job is stopped.

5. The job scheduling method according to claim 3, wherein the priority of the job is increased only for a specific time in order to schedule the job only for a specific time.

6. The job scheduling method according to claim 5, wherein when increasing the priority of the job, the priority of all processes belonging to the same job is increased.

7. A job scheduling method in a computer system in which nodes each including a plurality of processors and a memory accessible from each processor are connected by a high-speed network, some of the nodes being connected to a plurality of input/output devices, wherein an OS having an equivalent function operates on each node on a predetermined service execution processor, one or more user processes belonging to a user job composed of a plurality of user processes operate on each node on user execution processors other than the service execution processor, and, when a plurality of user jobs exist, the OS has a time-division job scheduler that suspends all user processes constituting a running user job at once and reschedules all processes belonging to another suspended user job at once, and event occurrence job specifying means for specifying, when a user job waits for an event, which job causes the event, the method comprising: a step in which the user job starts execution of a collective function in which all user processes in the job operate; a step in which the user job collects the start requests from the user processes; a step in which, after the collection is completed, the user process on each node issues a suspension request to the OS in order to wait for completion of the requested function; a step in which, if another job exists, the OS selects that job and resumes the user processes belonging to it on each node; a step in which the user program executes the collective function on the service execution processor; and a step of setting the user processes waiting for completion to the executable state when the function ends.

8. The job scheduling method according to claim 7, wherein a collective communication function for simultaneously communicating between processes belonging to the job is executed as a collective function.

9. The job scheduling method according to claim 8, wherein a collective file input / output function for simultaneously performing file input / output between processes belonging to the job is executed as a collective function.

10. The job scheduling method according to claim 7, wherein the service execution processor is a shared processor and the user execution processor is an exclusive processor, and the shared processor is allowed to execute a plurality of jobs in a time-division manner for each process. The OS has a scheduler that allows the exclusive processor to execute a plurality of jobs in a time-division manner on a job-by-job basis. A job scheduling method characterized by changing.