JPH09179754A

JPH09179754A - Task monitoring device and its method

Info

Publication number: JPH09179754A
Application number: JP7333508A
Authority: JP
Inventors: Yoshiyuki Baba; 儀之馬場
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 1995-12-21
Filing date: 1995-12-21
Publication date: 1997-07-11

Abstract

PROBLEM TO BE SOLVED: To allow a task itself to recognize its stop time, to make the stop state of tasks visually understandable, and to respond automatically to task execution delays. SOLUTION: An operating system (OS) part 2 that realizes multitask execution is provided with a task stop allowable time setting means 2a, a stop time accumulating means 2b, a stop state providing means 2c, and a stop time excess reporting means 2d. The OS part thereby manages the stop time of a user-defined task executed by the computer and, when the stop time exceeds the allowable time, informs the task of the event. A system management monitor 3, which runs as a task on the computer, is provided with a task stop record collecting means 3a, a stop state displaying means 3b, a bottleneck detecting means 3c, a fault informing means 3d, a task state transition displaying means 3e, and a task inputting means 3f. The monitor 3 obtains the task stop state recorded by the OS part 2 through an application interface and manages the performance and operation of the computer system.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to managing task stop times on a multitasking computer so that a task can recognize delays in its own processing and so that the load status of the computer system can be monitored. It further relates to a technique that enables performance management and operation management of a computer system.

[0002]

[Prior Art] For conventional multitasking computers, various evaluation tools have been created and provided for evaluating computer performance. For example, the UNIX ps command displays the management information held by the operating system (hereinafter, OS) for each process, the unit of execution that forms the individual environment of a program run under OS control. UNIX also provides the vmstat command as an evaluation tool for grasping how system resources are being used on the computer. The vmstat command displays data for detecting response degradation caused by excessive use of system resources. In addition, in multitasking execution environments, a transaction processing monitor is used in transaction processing to guarantee the response of the processing.

[0003] The ps command displays event information indicating why a process (program) is stopped, as a software address, for example in the WCHAN column of the command output shown in FIG. 13. A process with no address in the WCHAN column is either runnable or is actually a program that is part of the OS. Display data such as the address information shown in the WCHAN column is managed by the OS in a table allocated for each process. The ps command uses an application interface (hereinafter, API) to obtain the data in this table and displays it. However, a task itself could not evaluate this information.

[0004] The vmstat command produces output such as that shown in FIG. 14. In this output, the procs fields show the CPU-wait status, the memory fields show memory usage, the page fields show paging activity, the fault fields show interrupt occurrence, and the cpu fields show CPU utilization. This information is managed by the OS as internal OS data. The vmstat command likewise calls an API to obtain this internal OS data and displays the result.

[0005] In online transaction processing systems, a transaction processing monitor is used to organize the transactions submitted to the computer system and thereby improve system throughput. The transaction processing monitor provides a queuing function, which stores execution requests in a queue before passing them to the database application, and a scheduling function, which processes them in order of priority. With these functions, a computer system that uses a transaction processing monitor guarantees the response time of its processing.

[0006] However, all of the above information can be evaluated only by the OS itself, and a task itself could not use it. Even if an interface for retrieving this information were provided, the task could not interpret it.

[0007]

[Problems to Be Solved by the Invention] When a task is executed on a computer, disturbances can delay its execution. A disturbance here means any event that prevents the task from continuing to execute. Such events include, for example, waiting for completion of various I/O operations, waiting for computer resources such as memory or the CPU, communication with other tasks under multitask execution, communication with tasks on other computers, and stops set by the task itself. Consequently, a task on a computer that does not manage stop times cannot know about execution delays caused by disturbances.

[0008] Furthermore, because the OS does not manage task stop times, conventional system evaluation tools make it difficult to identify promptly the cause of response degradation, system hangs, and similar problems triggered by disturbances. For example, when frequent I/O retries degrade system response, the abnormality could not be detected from the time spent on a single I/O; it had to be detected from the I/O retry count or from the log of the software that provides the I/O service.

[0009] Also, with a conventional transaction processing monitor that manages a queue of execution requests, it was difficult to determine how much delay adding one execution request would cause to tasks through the overhead of OS processing.

[0010] The present invention was made to solve the above problems, and its object is to provide the following functions on a multitasking computer.
(1) In an OS that realizes multitask execution, the OS performs processing to notify a task of an execution delay, so that the task itself can recognize the delay and its own recovery processing can be automated.
(2) The system management monitor is realized as a single task and obtains, at fixed intervals, the distribution of stop times by stop cause for the tasks running in the system, which facilitates bottleneck analysis of the computer system.
(3) The system management monitor can also obtain the distribution of stop times by stop cause for any given task.
(4) The system management monitor monitors the stop time of any given task and, when the stop time exceeds the allowable value, automatically executes an action defined by the user.
(5) The system management monitor provides task state transition information that focuses on the stop causes and stop times of tasks.
(6) The system management monitor judges the load status from the stop times of running tasks and controls the tasks submitted to the computer system according to that load status.

[0011]

[Means for Solving the Problems] The task monitoring device according to the present invention is a computer on which a plurality of tasks are executed in parallel, in which the operating system that controls task execution is provided with the following elements.
(a) Allowable stop time setting means for setting, for each task, an allowable value of that task's stop time.
(b) Stop time accumulating means for recording, each time the operating state of a task changes and according to that transition, the stop status of each task, such as the cause of the stop, the time of the stop, the stop time, and, for each stop cause, the cause and the cumulative stop time.
(c) Comparing means for comparing the stop time of a task recorded by the stop time accumulating means with the allowable value for that task.
(d) Stop time excess reporting means for reporting to the task that its stop time has exceeded the allowable value when the comparing means determines that it has.
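
Elements (a) to (d) can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `StopTimeMonitor`, its methods, and the list used to collect reports are hypothetical, and timestamps are passed in as integer ticks rather than read from an OS timer.

```python
from collections import defaultdict


class StopTimeMonitor:
    """Hypothetical model of elements (a)-(d); all names are illustrative."""

    def __init__(self):
        self.allowed = {}                        # (a) task -> allowable stop time
        self.stopped_at = {}                     # task -> (time of stop, cause)
        self.total_by_cause = defaultdict(int)   # (b) (task, cause) -> cumulative stop time
        self.reports = []                        # (d) reports delivered to tasks

    def set_allowed(self, task, limit):
        # (a) set the allowable stop time for one task
        self.allowed[task] = limit

    def on_stop(self, task, cause, now):
        # (b) record the time and cause of the stop
        self.stopped_at[task] = (now, cause)

    def on_resume(self, task, now):
        # (b) compute the stop time and accumulate it per cause
        start, cause = self.stopped_at.pop(task)
        stop_time = now - start
        self.total_by_cause[(task, cause)] += stop_time
        # (c) compare with the allowable value, (d) report if it is exceeded
        limit = self.allowed.get(task)
        if limit is not None and stop_time > limit:
            self.reports.append((task, cause, stop_time))
        return stop_time


monitor = StopTimeMonitor()
monitor.set_allowed("task4", 5)
monitor.on_stop("task4", "io_wait", now=100)
monitor.on_resume("task4", now=102)   # stopped for 2 ticks, within the limit
monitor.on_stop("task4", "io_wait", now=110)
monitor.on_resume("task4", now=120)   # stopped for 10 ticks, exceeds the limit
print(monitor.reports)
```

In this sketch the report is only appended to a list; in the device described here the report would reach the task through whatever notification mechanism the OS offers.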

[0012] Further, in a computer on which a plurality of tasks are executed in parallel, the operating system that controls task execution is provided with the following elements:
(a) stop time accumulating means for recording, each time the operating state of a task changes and according to that transition, the stop status of each task, such as the cause of the stop, the time of the stop, the stop time, and, for each stop cause, the cause and the cumulative stop time;
(b) stop status providing means for providing the task stop status recorded by the stop time accumulating means;
and a system management monitor that operates at a predetermined interval is provided with the following elements:
(a) stop status collecting means for collecting the stop status provided by the stop status providing means;
(b) stop status display means for displaying the task load status as a frequency distribution based on the task stop status collected by the stop status collecting means.
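
The frequency-distribution display can be pictured with a short Python sketch. The function name `stop_time_histogram`, the bin width, and the sample records are assumptions made for illustration; the text does not say how the distribution is binned or drawn.

```python
from collections import Counter


def stop_time_histogram(records, bin_width):
    """Count stop records per (cause, stop-time bin)."""
    hist = Counter()
    for cause, stop_time in records:
        bin_start = (stop_time // bin_width) * bin_width
        hist[(cause, bin_start)] += 1
    return hist


# (cause, stop time) pairs as a monitor might collect them; values are invented
records = [("io", 3), ("io", 12), ("io", 14), ("cpu", 1), ("cpu", 2)]
hist = stop_time_histogram(records, bin_width=10)

# Text rendering: one row per cause and bin, one '#' per recorded stop
for (cause, bin_start), count in sorted(hist.items()):
    print(f"{cause:>4} from {bin_start:>3}: {'#' * count}")
```

A long bar in a high bin for one cause points to that cause as a likely bottleneck, which is the kind of reading the display is meant to support.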

[0013] Further, the stop status display means displays the task load status for a predetermined interval.

[0014] Further, the stop status display means displays the accumulated task load status.

[0015] Further, the stop status providing means provides, as part of the stop status, information on stops that are still in progress.

[0016] Further, in a computer on which a plurality of tasks are executed in parallel, the operating system that controls task execution is provided with the following elements:
(a) stop time accumulating means for recording, each time the operating state of a task changes and according to that transition, the stop status of each task, such as the cause of the stop, the time of the stop, the stop time, and, for each stop cause, the cause and the cumulative stop time;
(b) stop status providing means for providing the task stop status recorded by the stop time accumulating means;
and a system management monitor that operates at a predetermined interval is provided with the following elements:
(a) stop status collecting means for collecting the stop status provided by the stop status providing means;
(b) abnormal state definition information that defines, for each task, the abnormal states of that task;
(c) abnormal state detecting means for detecting an abnormal state of a task based on the stop status and the abnormal state definition information;
(d) abnormal state notifying means for notifying the task that it is in an abnormal state when the abnormal state detecting means determines that an abnormality exists.

[0017] Further, in a computer on which a plurality of tasks are executed in parallel, the operating system that controls task execution is provided with the following elements:
(a) stop time accumulating means for recording, each time the operating state of a task changes and according to that transition, the stop status of each task, such as the cause of the stop, the time of the stop, the stop time, and, for each stop cause, the cause and the cumulative stop time;
(b) stop status providing means for providing the task stop status recorded by the stop time accumulating means;
and a system management monitor that operates at a predetermined interval is provided with the following elements:
(a) stop status collecting means for collecting the stop status provided by the stop status providing means;
(b) abnormal state definition information that defines, for each task, the abnormal states of that task;
(c) abnormal state detecting means for detecting an abnormal state of a task based on the stop status and the abnormal state definition information;
(d) task input means for starting a task that handles the abnormal state when the abnormal state detecting means determines that an abnormality exists.
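
The two monitor variants above differ only in what happens once an abnormal state is detected: the affected task is notified, or a handler task is started. A minimal Python sketch can model both with a per-definition action. The contents of the abnormal state definition are not specified at this point in the text, so the fields of `AbnormalStateDef` (a threshold on cumulative stop time per cause) and the function `detect_abnormal` are purely illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AbnormalStateDef:
    """Hypothetical abnormal state definition for one task and stop cause."""
    task: str
    cause: str
    max_total_stop: int                       # assumed threshold on cumulative stop time
    action: Callable[[str, str, int], None]   # notify the task or start a handler task


def detect_abnormal(definitions, totals):
    """totals maps (task, cause) to cumulative stop time; returns triggered definitions."""
    triggered = []
    for d in definitions:
        total = totals.get((d.task, d.cause), 0)
        if total > d.max_total_stop:
            d.action(d.task, d.cause, total)
            triggered.append(d)
    return triggered


notified = []
record = lambda task, cause, total: notified.append((task, cause, total))
defs = [
    AbnormalStateDef("task4", "io", 100, record),
    AbnormalStateDef("task4", "cpu", 100, record),
]
totals = {("task4", "io"): 250, ("task4", "cpu"): 40}
triggered = detect_abnormal(defs, totals)
print(notified)
```

Keeping the action as a callable lets the same detection loop serve either variant without change.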

[0018] Further, system management monitor interface means is provided for updating the abnormal state definition information from a user task.

[0019] Further, the system management monitor is provided with task state transition display means for displaying the state transitions of a task based on the task stop status.

[0020] Further, in a computer on which a plurality of tasks are executed in parallel, the operating system that controls task execution is provided with the following elements:
(a) stop time accumulating means for recording, each time the operating state of a task changes and according to that transition, the stop status of each task, such as the cause of the stop, the time of the stop, the stop time, and, for each stop cause, the cause and the cumulative stop time;
(b) stop status providing means for providing the task stop status recorded by the stop time accumulating means;
system management monitor interface means is provided for holding the task input information set by a user task; and a system management monitor that operates at a predetermined interval is provided with the following elements:
(a) stop status collecting means for collecting the stop status provided by the stop status providing means;
(b) input permission judging means for judging whether a task may be input, based on the task stop status collected by the stop status collecting means;
(c) task input means for inputting (starting) a task based on the task input information obtained from the system management monitor interface means when the input permission judging means determines that input is permitted.
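
The input permission judgment can be sketched in Python as follows. The criterion used here (the average share of the monitoring interval that running tasks spent stopped, compared with a threshold) is an assumption chosen for illustration; the text only states that the judgment is based on the collected stop status. The names `can_admit` and `admit_next` are hypothetical.

```python
def can_admit(stop_totals, interval, max_stop_ratio):
    """Return True if the average stopped share of the interval is acceptable.

    stop_totals maps each running task to its stop time within the interval.
    """
    if not stop_totals:
        return True  # nothing is running, so nothing is being delayed
    avg_ratio = sum(stop_totals.values()) / (len(stop_totals) * interval)
    return avg_ratio <= max_stop_ratio


def admit_next(pending, stop_totals, interval, max_stop_ratio, launch):
    """Start at most one queued task per monitor interval if the load permits."""
    if pending and can_admit(stop_totals, interval, max_stop_ratio):
        task = pending.pop(0)
        launch(task)
        return task
    return None


started = []
queue = ["batch_job_1", "batch_job_2"]
light = {"task_a": 100, "task_b": 300}   # 20% of a 1000-tick interval on average
heavy = {"task_a": 900, "task_b": 700}   # 80% of the interval on average
admit_next(queue, light, 1000, 0.5, started.append)   # load is light: job starts
admit_next(queue, heavy, 1000, 0.5, started.append)   # load is heavy: job stays queued
print(started, queue)
```

Admitting one task per interval gives the next round of stop statistics a chance to reflect the added load before another task is started.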

[0021] The task monitoring method according to the present invention has the following steps:
(a) setting, for each task, an allowable value of that task's stop time;
(b) recording, when the task stops, the time of the stop and the stop cause;
(c) recording, when the task recovers from the stopped state, the stop time, together with the stop cause and the cumulative stop time;
(d) a comparing step of comparing the stop time with the allowable value;
(e) reporting to the task that a stop exceeding the allowable time has occurred when the comparing step determines that the stop time exceeds the allowable value.

[0022] A further method has the following steps:
(a) recording, when a task stops, the time of the stop and the stop cause;
(b) recording, when the task recovers from the stopped state, the task stop status, such as the stop time and the stop cause together with the cumulative stop time;
(c) displaying the task load status as a frequency distribution based on the stop status.

[0023] A further method has the following steps:
(a) defining, for each task, the abnormal states of that task;
(b) recording, when the task stops, the time of the stop and the stop cause;
(c) recording, when the task recovers from the stopped state, the task stop status, such as the stop time and the stop cause together with the cumulative stop time;
(d) an abnormal state detecting step of detecting the occurrence of a defined abnormal state based on the stop information;
(e) processing the abnormal state when the abnormal state detecting step detects an abnormality.

[0024] Further, the step of processing the abnormal state is a step of reporting to the task to which the abnormal state applies.

[0025] Further, the step of processing the abnormal state is a step of starting a task that handles the abnormal state.

[0026]

[Embodiments of the Invention] FIG. 1 shows the overall configuration of one embodiment of the task monitoring device according to the present invention. In the figure, 1 denotes the overall configuration of the software that makes up the task monitoring device, and 2 denotes the operating system (hereinafter, OS). The OS 2 has allowable stop time setting means 2a for setting the length of time a task may remain stopped, stop time and stop cause accumulating means 2b for stopped tasks, task stop status providing means 2c, and stop time excess reporting means 2d. 3 denotes a system management monitor that operates at a predetermined period and is realized as one task executed on the OS 2. The system management monitor 3 has task stop status collecting means 3a, stop status display means 3b, task abnormal state detecting means 3c, abnormal state notifying means 3d, task state transition display means 3e, and task input means 3f for starting tasks. 4 denotes a user-defined task (hereinafter, task) executed under the control of the OS 2, and 5 denotes system management monitor interface means (hereinafter, system management monitor I/F means) that provides the interface between the user-defined task 4 and the system management monitor 3.

[0027] Embodiment 1. Embodiment 1 describes an example in which the OS 2 manages the stop time of the task 4 and notifies the task 4 of an event when the stop time is exceeded. The task 4 can set a task-specific allowable stop time in the OS 2 through the allowable stop time setting means 2a. When the stop time of the task 4 exceeds this allowable value, the stop time excess reporting means 2d of the OS 2 delivers an event notification by a means that the task 4 can detect. One example of such a notification method is a signal, as in UNIX.
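
Signal-based notification, the example mentioned above, might look as follows from the task's side. This Python sketch runs only on POSIX systems and only in the main thread. The choice of `SIGUSR1` and the handler name are assumptions, and the process raises the signal on itself to stand in for the OS, since no existing kernel interface for stop time limits is assumed here.

```python
import signal

delay_reported = []


def on_stop_time_exceeded(signum, frame):
    # A real task would start its recovery processing here;
    # this sketch only records that the notification arrived.
    delay_reported.append(signum)


# SIGUSR1 is an arbitrary choice for this sketch (POSIX only).
signal.signal(signal.SIGUSR1, on_stop_time_exceeded)

# Stand-in for the OS reporting means: deliver the signal to this process.
signal.raise_signal(signal.SIGUSR1)

print(len(delay_reported))
```

Because the handler runs asynchronously with respect to the task's normal flow, it is usually safest to set a flag in the handler and perform the actual recovery from the main loop.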

[0028] FIG. 2 illustrates the accumulation of task stop times and the timing of stop time reports in Embodiment 1, and the description below refers to it. Graph 10 in FIG. 2 shows the state transitions of one task as a time series; the vertical axis represents the task state and the horizontal axis the passage of time. State 11a is the state in which the task 4 of interest is being executed by the CPU (hereinafter also called the processor). The transition to state 12a at the time of 11b indicates that the task 4 has stopped, for example because it requested a resource in its program. The transition to state 13a at the time of 11c indicates that the task 4's use of the CPU has been restricted. The period from 12a to 12b is the time the task 4 was stopped until it acquired the requested resource. The period from 13a to 13b is the time the task 4 was stopped until it obtained the right to execute on the CPU. From 11d onward, the task 4 is being executed again.
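
The timeline in FIG. 2 can be mimicked with a small Python sketch that derives stop intervals from a list of state transitions. The function name `stop_intervals`, the state labels, and the tick values are invented for illustration; only the sequence (running, resource wait, CPU wait, running) follows the figure description.

```python
def stop_intervals(transitions):
    """Return (cause, duration) for every non-running state in the timeline.

    transitions is a time-ordered list of (time, state) pairs, where state is
    'run' or the name of a stop cause.
    """
    intervals = []
    for (start, state), (end, _next_state) in zip(transitions, transitions[1:]):
        if state != "run":
            intervals.append((state, end - start))
    return intervals


# running (11a) -> resource wait (12a to 12b) -> CPU wait (13a to 13b) -> running (11d)
timeline = [(0, "run"), (5, "resource_wait"), (9, "cpu_wait"), (12, "run")]
print(stop_intervals(timeline))   # [('resource_wait', 4), ('cpu_wait', 3)]
```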

[0029] FIG. 3 shows the task management table 20 that the OS 2 maintains for each task 4 and the trace buffer 30 in which task stop status is traced. The task management table 20 is allocated by the OS 2 for each task 4 and holds the management data the OS 2 keeps about that task. Specifically, it stores the task stop status: the time 21 at which the task 4 last stopped, the cause 22 of the last stop, the allowable stop time 23, the total stop time 24, the allowable stop time 25 for each stop cause, the cumulative count 26 for each stop cause, the cumulative time 27 for each stop cause, and so on.
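
The fields of the task management table 20, and of the trace buffer entries with reference numerals 31 to 34 used later in the description, can be written down as Python dataclasses. The field names, types, and the use of dictionaries keyed by stop cause are assumptions; the text gives only the reference numerals and their meanings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TaskManagementTable:
    """Hypothetical layout of table 20; comments give the reference numerals."""
    last_stop_time: Optional[int] = None                              # 21
    last_stop_cause: Optional[str] = None                             # 22
    allowed_stop_time: Optional[int] = None                           # 23
    total_stop_time: int = 0                                          # 24
    allowed_by_cause: Dict[str, int] = field(default_factory=dict)    # 25
    count_by_cause: Dict[str, int] = field(default_factory=dict)      # 26
    time_by_cause: Dict[str, int] = field(default_factory=dict)       # 27


@dataclass
class TraceRecord:
    """Hypothetical layout of one entry in trace buffer 30."""
    timestamp: int              # 31
    task_id: int                # 32
    stop_time: Optional[int]    # 33, unknown until the task resumes
    stop_cause: str             # 34


table = TaskManagementTable(allowed_stop_time=50)
entry = TraceRecord(timestamp=100, task_id=4, stop_time=None, stop_cause="io")
print(table, entry)
```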

[0030] The OS 2 also writes task stop status into the trace buffer 30, which is provided in part of the data area of the OS 2. 40 denotes an API for retrieving data from the task management table 20 and the trace buffer 30, in which the OS 2 has recorded the task stop status; it can be called from a task program (task). The system management monitor 3 uses this API 40 to receive various data from the OS 2.

[0031] To notify the task 4 of events, the OS 2 accumulates and manages, for each task, the number of stops and the stop time for each stop cause. Task stop events are managed in the context save processing (the processing at 12a and 13a in FIG. 2) and the context restore processing (the processing at 12b and 13b in FIG. 2) that the OS 2 performs when switching the task 4, using a timer that records the time elapsed since the computer was started. At the same time, the OS 2 writes trace data, such as the current time stamp 31, the identifier 32 of the task 4 being restored, the stop time 33, and the stop cause 34, to the trace buffer 30 allocated in the storage device. Writing to the trace buffer 30 is cyclic (it wraps around to the start of the buffer), which allows the system management monitor 3 to obtain the data in time-series order. If the stop time of the task 4 exceeds the preset allowable value, the OS 2 notifies the task 4 of the event at timing 11d in FIG. 2 by a method the task 4 can detect, that is, through the stop time excess reporting means 2d.
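
If the writes to the trace buffer 30 are read as writes to a ring buffer, which is an interpretation rather than something the text spells out, the behavior can be sketched in Python with `collections.deque` and its `maxlen` argument. The capacity of 4 and the record contents are arbitrary.

```python
from collections import deque

# A bounded deque drops its oldest entry when a new one is appended at capacity.
trace_buffer = deque(maxlen=4)

for tick in range(6):
    trace_buffer.append({"timestamp": tick, "task_id": 4, "stop_cause": "io"})

# The reader sees the most recent entries, oldest first.
print([record["timestamp"] for record in trace_buffer])   # [2, 3, 4, 5]
```

A reader that polls less often than the buffer fills will miss entries, so the buffer size and the monitor's interval would have to be chosen together.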

[0032] The processing for managing the stop time of the task 4 in Embodiment 1 is described in detail below with reference to the flowchart in FIG. 4. Step S101 determines whether the task 4 stopped because it is waiting for an event, and the subsequent processing depends on the result. An event wait here means any factor that stops execution of the task 4 other than waiting to use the CPU. If the cause of the stop is not an event wait, step S120 saves the context and no event-wait processing is performed.

[0033] Step S102 saves the execution environment of the task 4 that has been running up to now, in preparation for a task switch. Step S103 records the time 21 of the last stop (the time at which the task began waiting for the event) and the cause 22 of the last stop in the task management table 20 managed by the OS 2. It also writes the task stop status to the trace buffer 30: the time (time stamp) 31 at which the task 4 began waiting for the event, the task identifier 32, the stop cause 34, and so on. Step S104 represents the state in which the task 4 is stopped waiting for the event. The multitasking OS 2 subsequently processes tasks 4 whose stop event has been resolved.

[0034] Step S105 determines whether there is a task 4 whose event wait has been resolved, and the subsequent processing depends on the result. Step S105 is also entered from the interrupt processing that signals the end of an event wait (the flow from B in FIG. 5).

[0035] In step S106, a task 4 whose event wait has been resolved is selected. Next, in step S107, logging is performed for the moment at which the stop cause is resolved and execution resumes. Here, the total stop time 24, the cumulative count 25 for each stop cause, and the cumulative time 27 for each stop cause in the task management table 20 managed by the OS 2 are updated. In addition, the time information (time stamp) 31 at which the event wait of the task 4 was resolved, the task identifier 32, the stop time 33 (obtained by subtracting the last stop time 21 from the time stamp 31), and the stop cause 34 are written to the trace buffer 30.
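
The logging of steps S103 and S107 can be sketched as follows. This is only an illustrative sketch, not the implementation of the OS 2: the names `TaskEntry`, `log_stop`, and `log_resume`, the dictionary-based table, and the use of integer milliseconds are all assumptions made for the example. The reference numerals in the comments point back to the items named in the text.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TaskEntry:
    """One row of the task management table 20 (field layout is assumed)."""
    last_stop_time: int = 0          # item 21
    last_stop_cause: str = ""        # item 22
    total_stop_time: int = 0         # item 24
    count_by_cause: dict = field(default_factory=lambda: defaultdict(int))  # item 25
    time_by_cause: dict = field(default_factory=lambda: defaultdict(int))   # item 27


def log_stop(table, trace, task_id, cause, now):
    """Step S103: record when and why the task stopped."""
    entry = table.setdefault(task_id, TaskEntry())
    entry.last_stop_time = now
    entry.last_stop_cause = cause
    trace.append({"timestamp": now, "task": task_id, "cause": cause})


def log_resume(table, trace, task_id, now):
    """Step S107: update the totals and write a resume record."""
    entry = table[task_id]
    cause = entry.last_stop_cause
    stop_time = now - entry.last_stop_time   # item 33 = time stamp 31 - stop time 21
    entry.total_stop_time += stop_time
    entry.count_by_cause[cause] += 1
    entry.time_by_cause[cause] += stop_time
    trace.append({"timestamp": now, "task": task_id,
                  "stop_time": stop_time, "cause": cause})
    return stop_time
```

Calling `log_stop` when a wait begins and `log_resume` when it ends leaves one pair of records per stop in the trace list, which plays the role of the trace buffer 30 in this sketch.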

[0036] Next, in step S108, logging is performed for the stop information indicating that the task 4 has started waiting for the CPU. This processing is the same as that of step S103, except that the stop cause recorded in the log differs. Next, in step S109, the task 4 is queued to wait for the CPU. Next, in step S110, it is determined whether there is a task 4 for which the event that had stopped it has been resolved, and the subsequent processing depends on the result. If there is such a task 4, the processing from step S105 is executed; otherwise, the processing proceeds to step S111. In step S111, it is determined whether there is a task 4 waiting for the CPU, and the subsequent processing depends on the result. If no task 4 is waiting for the CPU, the CPU enters the idle state. The idle state ends when an interrupt of some kind is received, at which point processing resumes from B (step S105: an event wait has been resolved) or from C (step S111: a scheduling interrupt after a fixed time).

[0037] If there is a task 4 waiting for the CPU, the processing proceeds to step S112, where the task to be executed is selected from the CPU wait queue. Next, in step S113, logging is performed for the moment at which the stop cause is resolved and execution resumes. This processing is the same as that of step S107, except that the stop cause recorded in the log differs. Next, in step S114, the execution environment of the task 4 that has obtained the right to use the CPU is restored. Then, in step S115, it is determined whether the stop time of the task 4 has exceeded the allowable time. If it has, the OS 2 sends an event notification to the task 4 in step S130.
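
The check of step S115 and the notification of step S130 amount to a single comparison. The sketch below is only an illustration under assumptions: the task is modeled as a dictionary, the keys `allowable_stop_ms` and `events` and the event name `stop_time_exceeded` are hypothetical, and the actual notification mechanism of the OS 2 is not specified here.

```python
def notify_if_exceeded(task, stop_time_ms):
    """Step S115: compare the stop time with the allowable time.

    Step S130: when the limit is exceeded, deliver an event to the task.
    The event list stands in for whatever notification the OS provides.
    """
    limit = task.get("allowable_stop_ms")   # None means no limit was set
    if limit is not None and stop_time_ms > limit:
        task["events"].append(("stop_time_exceeded", stop_time_ms))
        return True
    return False
```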

[0038] As described above, in the first embodiment, the stop time management of the task 4 in the OS 2 records information on the stop cause and the stop time before and after each stop of the task 4. Furthermore, if the stop time of the task 4 has exceeded the allowable value, the OS 2 sends an event notification to the task, so the task itself can recognize that its stop time has been exceeded.

[0039] Embodiment 2. This embodiment describes the load status display on the system management monitor 3 of the task monitoring device according to the present invention. In the present invention, the system management monitor 3, which is implemented and runs as one task on the computer, provides the following information about the stop events of all tasks 4 running on the computer.
(a) The frequency distribution of stop times, by stop cause, for a specific task.
(b) The frequency distribution of stop times, by stop cause, for all tasks within a single computer.
The stop causes of a task depend on how the OS is implemented. In UNIX, for example, the causes include the following.
・Waiting for an input/output device to complete. Examples: magnetic disk devices, terminal devices, network devices.
・Waiting because a work area in OS data is exhausted. Examples: swapping of data between main storage and secondary storage, management of various buffer memories.
・Waiting for an event in communication between processes (tasks), within or across computers. Examples: semaphores, inter-task message communication, inter-computer communication.
・Waiting because of restrictions on CPU use. Examples: waiting for execution because of process (task) priority, or suspension of execution because of a time limit.
・Voluntary waiting. Example: a program suspending itself for a fixed time by using a timer function.

[0040] The system management monitor 3 of the present invention provides, at fixed intervals, the frequency distribution of stop times by stop cause for a specific task 4 or for all tasks 4 running in the system. FIG. 5 is a schematic diagram of the frequency distribution of stop times by stop cause provided by the second embodiment. In FIG. 5, the vertical axis represents the number of stops of the task 4 within the interval, and the horizontal axis represents the stop time per stop. In the second embodiment, stop times are rounded to units of 10 ms when reported. The example in FIG. 5 shows that most stops for the stop cause of interest end within 10 to 20 ms, but that a stop occasionally lasts much longer.
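
A frequency distribution of this kind can be built by bucketing each stop time. The following is a minimal sketch under assumptions: the function name `stop_time_histogram` is hypothetical, the input is a list of (stop cause, stop time in ms) pairs, and "rounding" is taken to mean flooring to the start of a 10 ms bucket, which the text does not state explicitly.

```python
from collections import Counter, defaultdict

BIN_MS = 10  # assumed bucket width, following the 10 ms unit in the text


def stop_time_histogram(records, bin_ms=BIN_MS):
    """Return {cause: Counter({bucket_start_ms: number_of_stops})}."""
    hist = defaultdict(Counter)
    for cause, stop_ms in records:
        bucket = (stop_ms // bin_ms) * bin_ms   # floor to the bucket start
        hist[cause][bucket] += 1
    return hist
```

With records such as `[("disk_io", 12), ("disk_io", 18), ("disk_io", 240)]`, the two short stops fall into the 10 ms bucket and the long one appears as an isolated count far to the right, which is the shape described for FIG. 5.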

[0041] FIG. 6 is a flow chart showing the processing up to the display of the frequency distribution of stop times by task stop cause in the second embodiment. The operation of the second embodiment is described below with reference to FIG. 6. In step S201 (corresponding to the stop status acquisition means and the stop status providing means), the current stop status is obtained for all tasks 4 running on the computer system. The time at which this stop status was obtained is recorded at the same time. The stop status is acquired atomically by executing an API provided by the OS 2; the processing inside the OS 2 is described later in the fourth embodiment.

[0042] Next, in step S202, the snapshot collected in step S201 is processed into data that the system management monitor 3 uses to display the load status. Next, in step S203, it is determined whether to process the logs written in steps S103, S107, S108, and S113 of the flow chart of FIG. 4 described for the first embodiment, and the subsequent processing depends on the result. If there is no log, no log processing is performed. Even when no log is processed, a snapshot display of the status of the tasks in the computer system can still be obtained.

[0043] If there is a log, the processing proceeds to step S204, where one unprocessed log record is read. Next, in step S205, it is determined, based on the time information obtained in step S201, whether the record (log) that was read is appropriate to process as data within the interval. Next, in step S206, the record (log) read in step S204 is processed as load status display data. Next, in step S207, it is determined whether any unprocessed log remains; if so, the processing returns to step S204. If no unprocessed log remains, the processing proceeds to step S208, where the data processed in steps S202 and S206 is displayed as a frequency distribution chart of stop times for each stop cause.
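
The loop of steps S204 to S207 can be sketched as a filter over the log records. This is an illustration only: the function name `records_in_interval` and the record layout are hypothetical, and the rule that a record belongs to the interval when its time stamp lies within one interval length before the snapshot time of step S201 is an assumption, since the text does not define the criterion of step S205.

```python
def records_in_interval(log, snapshot_time, interval):
    """Select the log records that belong to the current interval."""
    start = snapshot_time - interval
    selected = []
    for record in log:                                    # S204: read one record
        if start < record["timestamp"] <= snapshot_time:  # S205: inside the interval?
            selected.append(record)                       # S206: keep for display
    return selected                                       # S207: no records left
```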

[0044] As described above, according to the second embodiment, the system management monitor displays, at fixed intervals, the stop times for each stop cause as a frequency distribution based on the stop status of the tasks, so the causes of task stops can be grasped visually.

[0045] Embodiment 3. The system management monitor 3 of the present invention can obtain the frequency distribution display of stop times for each stop cause of the task 4, described in the second embodiment, not only within an interval but also over a cumulative period. This embodiment describes how the system management monitor 3 provides the frequency distribution of stop times by stop cause over a cumulative period for a specific task 4 or for all tasks 4 running in the system.

[0046] FIG. 7 is a flow chart showing the processing for displaying the frequency distribution of stop times for each stop cause over a cumulative period in the third embodiment, and the description below refers to it. The processing in the third embodiment is almost the same as the flow of FIG. 6 used to describe the second embodiment; only the processing at the start differs. Specifically, the data processed in step S202 of FIG. 6 and used for the frequency distribution display is valid only within one measurement interval (the operating interval of the system management monitor 3), so it must be invalidated when data is accumulated in the next measurement interval. This invalidation is performed as step S300 at the start of the processing. In addition, the frequency distribution must be displayed in a form that allows the invalidation in step S300, and the stop information obtained in the new measurement interval must be superimposed on the information obtained in the previous measurement intervals. This processing is performed in step S308 (corresponding to step S208 in the second embodiment).
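
One way to satisfy both requirements is to keep the per-interval snapshot data separate from the accumulated data, so that the former can be discarded without touching the latter. The sketch below is an assumption about how this could be organized, not the method of FIG. 7 itself; the class name `CumulativeHistogram` and its methods are hypothetical, and stop times are represented by their bucket values.

```python
from collections import Counter


class CumulativeHistogram:
    """Keeps interval-only snapshot counts apart from accumulated counts."""

    def __init__(self):
        self.accumulated = Counter()  # completed stops, kept across intervals
        self.snapshot = Counter()     # tasks still stopped, valid for one interval

    def start_interval(self):
        """Step S300: invalidate the snapshot data of the previous interval."""
        self.snapshot.clear()

    def add_snapshot(self, buckets):
        """Count the buckets derived from the snapshot of step S202."""
        self.snapshot.update(buckets)

    def add_completed(self, buckets):
        """Count the buckets derived from the log records of step S206."""
        self.accumulated.update(buckets)

    def display_data(self):
        """Step S308: overlay the current snapshot on the accumulated data."""
        return self.accumulated + self.snapshot
```

Because the snapshot counts never enter `accumulated`, a task that is still stopped is not counted again in every later interval.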

[0047] As described above, according to the third embodiment, the system management monitor displays the frequency distribution of stop times for each stop cause over a cumulative period, so the load status of the tasks can be checked over a reasonably long time range.

[0048] Embodiment 4. This embodiment describes the API means by which the OS 2 provides the stop cause and stop time of the task 4. The API according to the present invention provides not only the stop cause and stop time of a task 4 that has resumed execution, but also the elapsed stop time, measured from the start of the stop, of a task 4 that is still stopped. Using the means described in this embodiment makes it possible to detect a task 4 that has been stopped for a long time.

[0049] FIG. 8 is a flow chart showing the processing of the means by which the OS 2 provides the stop status of the task 4 in the fourth embodiment, and the description below refers to it. First, in step S401, interrupt processing on the processor is disabled. This is done so that the series of processing shown in FIG. 8 is performed atomically and in a short time.

[0050] Next, in step S402, access to the task management table 20 is prohibited (locked). When the computer system is a tightly coupled multiprocessor system, this prevents the task management table 20 from being rewritten by another processor.

[0051] Next, in step S403, the time at which this processing is executed is collected. In step S404, the time data collected in step S403 and the snapshot information of the tasks 4 are passed to the task that requested this processing (the system management monitor 3). The snapshot of a task 4 consists of the stop cause and the time elapsed since the start of the stop for a task 4 that is currently stopped, derived from the task state 28, the last stop time 21, and the last stop cause 22 in the task management table 20.

[0052] Next, in step S405, the lock on the task management table 20 set in step S402 is released. Finally, in step S406, the processor interrupts disabled in step S401 are enabled again.
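
The sequence of steps S401 to S406 can be approximated in user-level code as follows. This is a sketch under clear limitations: a user-level program cannot disable processor interrupts, so a single `threading.Lock` stands in for both the interrupt masking and the table lock, and the function name `snapshot_stopped_tasks`, the dictionary layout, and the state name `"running"` are hypothetical.

```python
import threading
import time

table_lock = threading.Lock()  # stands in for S401/S402 (acquire) and S405/S406 (release)


def snapshot_stopped_tasks(task_table, clock=time.monotonic):
    """Return (time of snapshot, {task_id: stop cause and elapsed stop time})."""
    with table_lock:                 # S401, S402: no one may modify the table
        now = clock()                # S403: time at which the snapshot is taken
        stopped = {
            task_id: {
                "cause": entry["last_stop_cause"],                # item 22
                "stopped_for": now - entry["last_stop_time"],     # now - item 21
            }
            for task_id, entry in task_table.items()
            if entry["state"] != "running"                        # item 28
        }
    # S405, S406 happen when the with-block exits; S404: hand back the result
    return now, stopped
```

A task whose `stopped_for` value keeps growing from one snapshot to the next is a candidate for the long-stopped task that this embodiment aims to detect.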

[0053] As described above, according to the fourth embodiment, when the system management monitor collects the stop status of tasks, it can also obtain the stop status of tasks that are currently stopped, so tasks that have been stopped for a long time can be detected.

[0054] Embodiment 5. This embodiment describes the means by which the system management monitor 3 of the present invention detects an abnormal state of the computer system and detects a bottleneck from the detection result. In the present invention, a filter that defines an abnormal situation can be set in the system management monitor 3 on the measured task stop times, using at least one of the following conditions (1) to (3).
(1) A single stop time exceeded the allowable value.
(2) The stop time for a particular stop cause exceeded the allowable value.
(3) The cumulative stop time exceeded the allowable value.
After an abnormal state has been identified according to the filter settings, the system management monitor 3 can launch a user-defined task 4 chosen by the user, which automates processing such as recording, reporting, avoiding, and recovering from the abnormal state.

[0055] FIG. 9 is a flow chart showing the processing of the abnormal state detection means of the system management monitor 3 in the fifth embodiment, and the description below refers to it. The system management monitor 3 executes the processing shown in FIG. 9 at fixed time intervals. First, in step S501, it is determined whether the definition of the abnormal state has been changed. If it has, the processing proceeds to step S510, where the abnormal state defined by the user is set through the system management monitor I/F means.

[0056] If the definition of the abnormal state has not been changed, the processing proceeds to step S502, where the data representing the stop status (load status) of the computer system, collected by the stop status collection means 3a described in the second or third embodiment, is examined to diagnose whether a situation defined as an abnormal state has occurred. Next, in step S503, the result of the diagnosis in step S502 is evaluated. If a state judged to be abnormal exists, it is reported in step S520 to the task 4 defined by the user by the abnormal state notification means 3d. Alternatively, a user-defined task 4 that deals with the abnormal state is started by the task input means 3f to resolve the problem.
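
The diagnosis of step S502 against filter conditions (1) to (3) can be sketched as below. The names `AnomalyFilter` and `diagnose` are hypothetical, the input is assumed to be a list of (stop cause, stop time in ms) pairs, and condition (2) is interpreted here as the summed stop time per cause within the examined data, which is one possible reading of the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnomalyFilter:
    max_single_ms: Optional[int] = None      # condition (1)
    max_per_cause_ms: Optional[dict] = None  # condition (2): {cause: limit}
    max_total_ms: Optional[int] = None       # condition (3)


def diagnose(records, flt):
    """Return a list of (condition, cause, value) for every violated condition."""
    violations = []
    per_cause = {}
    total = 0
    for cause, stop_ms in records:
        per_cause[cause] = per_cause.get(cause, 0) + stop_ms
        total += stop_ms
        if flt.max_single_ms is not None and stop_ms > flt.max_single_ms:
            violations.append(("single", cause, stop_ms))
    for cause, limit in (flt.max_per_cause_ms or {}).items():
        if per_cause.get(cause, 0) > limit:
            violations.append(("per_cause", cause, per_cause[cause]))
    if flt.max_total_ms is not None and total > flt.max_total_ms:
        violations.append(("total", None, total))
    return violations
```

A non-empty result corresponds to the abnormal branch of step S503, after which the monitor would either notify the user-defined task or start a handler task.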

[0057] As described above, according to the fifth embodiment, the system management monitor detects an abnormal state of the computer system and either notifies the task itself so that it can deal with the abnormal state or starts an appropriate task, so the abnormal state can be handled appropriately. In addition, investigating the cause of the abnormal state, for example, reveals where the bottleneck of the system lies.

[0058] Embodiment 6. This embodiment describes the means by which the system management monitor 3 of the present invention provides trace information from which the state transitions of the tasks 4 can be grasped. FIGS. 10 and 11 illustrate the sixth embodiment: FIG. 10 is a task state transition diagram (trace diagram) provided by this embodiment, and FIG. 11 is a flow chart showing the processing of the system management monitor 3 in this embodiment.

[0059] The system management monitor 3 of the present invention uses the task state transition display means to present the state transitions of the tasks 4, based on the data collected by the stop status collection means 3a, as a trace diagram (state transition diagram) such as that in FIG. 10. The vertical axis of FIG. 10 represents the task state, of which there are three in the present invention: the running state, the CPU wait state, and the stopped state. The horizontal axis represents time. The system management monitor 3 creates the graph shown in FIG. 10 from the data of the log processing described for step S206 in the second or third embodiment. The source data for this graph is the time of each state transition, the states before and after the transition, and the task identifier. At the position of the transition time, an arrow is drawn from the state before the transition to the state after it. The stop causes before and after the transition are written below the arrow.

[0060] To supplement the description of FIG. 10: in the figure, [-] denotes the "running state", [CPU] denotes the "CPU wait state", and [ipc: internal process communication] denotes a "stopped state" whose cause is inter-task communication, one of the stop causes. For example, the first entry in the time series shown in FIG. 10 indicates that "task 1" transitioned from the "running state" to the "CPU wait state", and the next entry indicates that "task 2" transitioned from the "stopped state" to the "CPU wait state".

[0061] Next, the processing of the system management monitor 3 in the sixth embodiment is described with reference to the flow chart of FIG. 11. The processing from step S204 to step S207 is the same as the processing with the same step numbers in the second or third embodiment. What is specific to the sixth embodiment is the newly added processing of step S601 (the state transition display means), which displays a trace diagram such as that shown in FIG. 10.
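
A text-only stand-in for the trace display of step S601 can be sketched as follows. It is an illustration under assumptions: the function name `render_trace` and the state keys are hypothetical, each transition is given as (time, task, state before, state after), and the bracketed symbols follow the notation explained for FIG. 10. The actual figure is a graph with arrows, which this sketch does not reproduce.

```python
# Symbols as used in the explanation of FIG. 10; the dictionary keys are assumed names.
STATE_SYMBOLS = {"running": "-", "cpu_wait": "CPU", "stopped_ipc": "ipc"}


def render_trace(transitions):
    """Return one text line per state transition, ordered by time."""
    lines = []
    for time_, task, before, after in sorted(transitions):
        lines.append(
            f"{time_} {task}: [{STATE_SYMBOLS[before]}] -> [{STATE_SYMBOLS[after]}]"
        )
    return lines
```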

[0062] As described above, according to the sixth embodiment, the system management monitor provides trace information from which the state transitions of tasks can be grasped, so the dependencies between tasks in a multitask computer can be understood easily.

[0063] Embodiment 7. This embodiment describes the means by which the system management monitor 3 of the present invention controls the tasks 4 input to the computer system in order to improve the throughput of the tasks 4. The system management monitor 3 of the present invention includes the task input means 3f. The information on the task to be input is obtained from the user-defined task 4 through the system management monitor I/F means 5 of the present invention, and the task input means 3f inputs the task 4 to the computer system. This allows the system management monitor 3 to input tasks 4 to the computer system based on the task stop status collected by the task stop status collection means 3a, that is, on the load status of the computer. Consequently, multiple tasks 4 are not input to the computer system at once, which reduces the overhead of the OS 2 and thereby improves the throughput of the tasks 4.

[0064] FIGS. 12(a) and 12(b) are flow charts showing the processing of the task input means: FIG. 12(a) shows the processing in the user task 4, and FIG. 12(b) shows the processing in the system management monitor 3. In the seventh embodiment, the load status is judged from the stop times of the tasks 4, and the result determines whether a task should be input. The description below refers to the flow chart shown in FIG. 13.

[0065] In the user-defined task 4, in step S701, the information on the task to be input is set in the system management monitor I/F means 5 so that it can be passed to the system management monitor 3. The actual input of the task is left to the system management monitor 3. Various methods for passing information between different tasks, as needed here, are known to those skilled in the art and are not discussed further.

[0066] Steps S710 to S713 shown in FIG. 12(b) are the task input procedure on the system management monitor 3 side. The system management monitor 3 performs this series of processing when a new task 4 needs to be input to the computer system, for example when step S701 has been executed by the task input side or when a fixed time has elapsed since the previous processing.

[0067] In step S710 (task input permission determination means), it is determined whether a task 4 should be input to the computer system. For example, when the overhead of the OS 2 has increased because of long CPU wait times or the like, it is judged that this is not an appropriate time to input a new task 4, and the processing exits. Whether the timing is suitable for task input is judged by the same method as the diagnosis routine described in the fifth embodiment. If it is judged that a task can be input, then in step S711 the information on the task to be input, passed by the execution of step S701, is obtained from the system management monitor I/F means 5. Next, in step S712, the task is input to the computer system. Next, in step S713, it is determined whether any other task input request is pending; if so, the processing from step S710 is repeated for the pending task input request.
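
The loop of steps S710 to S713 can be sketched as follows. This is only an illustration under assumptions: the function name `submit_pending` is hypothetical, the pending requests are held in a `deque` that stands in for the system management monitor I/F means 5, and the load check and the launch operation are passed in as callables because the text does not fix how either is implemented.

```python
from collections import deque


def submit_pending(pending, load_is_acceptable, launch):
    """Input pending tasks while the load allows it; return how many were input."""
    launched = 0
    while pending:                      # S713: is another request pending?
        if not load_is_acceptable():    # S710: is this a suitable time?
            break                       # not suitable: exit and retry later
        task_info = pending.popleft()   # S711: obtain the task information
        launch(task_info)               # S712: input the task to the system
        launched += 1
    return launched
```

Requests that remain in `pending` are picked up the next time the monitor runs this procedure, for example after a fixed time has elapsed.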

[0068] As described above, according to the seventh embodiment, tasks are input through the system management monitor, so the tasks to be input can be limited according to the load status of the computer system, and the throughput of the tasks can be improved.

[0069]

[Effects of the Invention] As described above, according to the present invention, means for setting the allowable value of the task stop time is provided, and the OS manages the stop cause and stop time of each task, so that when the stop time of a task exceeds the allowable value, the OS can send an event notification to the task. The task itself can therefore recognize that its execution has been delayed, and recovery processing, such as processing that deals with the delay, can be automated within the task itself.

[0070] In addition, a system management monitor that operates at a predetermined cycle is provided, and this system management monitor displays the stop status of tasks, that is, the load status, as a frequency distribution, so the load status of the tasks can be grasped visually.

[0071] In addition, the stop status of tasks is displayed by stop cause at predetermined intervals, so stop causes that occur suddenly can be identified.

[0072] In addition, the stop status of tasks is displayed by stop cause as cumulative values, so the overall stop status of the tasks can be grasped.

[0073] In addition, information on tasks that are currently stopped can also be obtained, so tasks that have been stopped for a long time can be detected and the cause of the stop can be dealt with.

[0074] In addition, the system management monitor is provided with means for defining an abnormal state of a task, and when the defined abnormal state occurs, it is reported to the task, so the task itself can take subsequent action.

[0075] In addition, when an abnormal state occurs, the system management monitor automatically starts a task that deals with that abnormal state, so abnormality handling can be performed automatically.

[0076] In addition, means for updating the definition of the abnormal state is provided, so the system can be operated flexibly.

[0077] In addition, the system management monitor is provided with means for displaying the state transitions of tasks, so the dependencies between tasks can be grasped, and by analyzing these dependencies, tasks can be operated so that the system achieves optimum throughput.

[0078] Further, because tasks are input while the load status of the tasks is being monitored, the throughput of the system can be improved.

[0079] Further, because a permissible value of the task stop time is set in advance, the stop time of the task is measured, and the task itself is notified when the stop time exceeds the permissible value, a computer system can be provided in which the task itself can take countermeasures.
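The sequence described here (set a permissible value, measure the stop time, report an excess to the task) might be sketched as below. This is only an illustration under stated assumptions: `TaskRecord`, `task_stopped`, `task_resumed`, and the `on_excess` callback are hypothetical names, the callback stands in for whatever reporting mechanism the operating system actually uses, and timestamps are passed in explicitly rather than read from a clock.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class TaskRecord:
    """Per-task bookkeeping entry (hypothetical layout, not the patent's table)."""
    allowed_stop: float                       # permissible stop time in seconds
    on_excess: Callable[[str, float], None]   # stands in for "report to the task"
    stopped_at: Optional[float] = None        # timestamp of the current stop, if any
    stop_cause: Optional[str] = None
    cumulative: Dict[str, float] = field(default_factory=dict)  # stop time per cause


def task_stopped(rec: TaskRecord, cause: str, now: float) -> None:
    """Record when and why the task left the running state."""
    rec.stopped_at = now
    rec.stop_cause = cause


def task_resumed(rec: TaskRecord, now: float) -> float:
    """Measure the stop time, accumulate it per cause, and report any excess."""
    if rec.stopped_at is None or rec.stop_cause is None:
        raise ValueError("task was not stopped")
    duration = now - rec.stopped_at
    cause = rec.stop_cause
    rec.cumulative[cause] = rec.cumulative.get(cause, 0.0) + duration
    rec.stopped_at = None
    rec.stop_cause = None
    if duration > rec.allowed_stop:
        # The task learns of its own delay and can decide how to react.
        rec.on_excess(cause, duration)
    return duration
```

A task that registers a callback in this way would be told the cause and length of any stop exceeding its permissible value, and could then, for example, skip optional work to catch up.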

[0080] Further, because the stop status of the tasks is displayed as a frequency distribution, a computer system can be constructed in which the load status of the tasks can be grasped intuitively.
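One way to picture a frequency-distribution display of stop times is the short sketch below. The bin edges, the text rendering, and the function names `stop_time_histogram` and `render` are assumptions made for illustration; the specification does not prescribe them.

```python
from bisect import bisect_right
from typing import List, Sequence


def stop_time_histogram(durations: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count stop durations per bin.

    Bin i covers edges[i] <= d < edges[i + 1]; the last bin is open-ended.
    `edges` must be sorted in ascending order. Durations below edges[0]
    are ignored.
    """
    counts = [0] * len(edges)
    for d in durations:
        idx = bisect_right(edges, d) - 1  # index of the last edge <= d
        if idx >= 0:
            counts[idx] += 1
    return counts


def render(counts: Sequence[int], edges: Sequence[float]) -> str:
    """Render the histogram as one text bar per bin."""
    lines = []
    for i, count in enumerate(counts):
        upper = f"{edges[i + 1]:g}" if i + 1 < len(edges) else "inf"
        lines.append(f"[{edges[i]:g}, {upper}) s | " + "#" * count)
    return "\n".join(lines)


# Hypothetical stop durations in seconds, binned on a roughly logarithmic scale.
bins = [0.0, 0.1, 1.0, 10.0]
hist = stop_time_histogram([0.05, 0.2, 0.3, 1.5, 12.0], bins)
chart = render(hist, bins)
```

A long bar in a high-duration bin makes a heavily delayed task visible at a glance, which is the intuitive reading the paragraph above refers to.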

[0081] Further, because abnormal states of tasks are defined and a defined abnormal state is dealt with when it occurs, a computer system that performs abnormality handling automatically can be provided.

[0082] Further, because the task itself performs the processing for the abnormal state, the most appropriate action can be taken.

[0083] Further, because the processing for an abnormal state is performed by a task associated with it in advance, the abnormal state can be dealt with automatically.
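The two ways of handling a defined abnormal state described above (notifying the task itself, or activating a handler task associated in advance) can be illustrated with the following sketch. Everything in it is an assumption for illustration: the rule format `AbnormalDef`, the use of a cumulative stop time threshold as the abnormality criterion, and the `check_tasks` function are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class AbnormalDef:
    """One abnormal-state definition for a task (hypothetical rule format)."""
    cause: str                                        # stop cause the rule watches
    max_cumulative: float                             # seconds above which the state is abnormal
    handler: Optional[Callable[[str], None]] = None   # pre-associated handler task, if any


def check_tasks(
    stop_status: Dict[str, Dict[str, float]],
    defs: Dict[str, List[AbnormalDef]],
    notify: Callable[[str, str], None],
) -> List[Tuple[str, str]]:
    """Compare collected stop status with the definitions and act on each match.

    Returns the (task, cause) pairs that were judged abnormal. A rule with a
    handler launches it; a rule without one notifies the task itself.
    """
    fired: List[Tuple[str, str]] = []
    for task, rules in defs.items():
        per_cause = stop_status.get(task, {})
        for rule in rules:
            if per_cause.get(rule.cause, 0.0) > rule.max_cumulative:
                fired.append((task, rule.cause))
                if rule.handler is not None:
                    rule.handler(task)        # automatic handling by an associated task
                else:
                    notify(task, rule.cause)  # the task decides how to respond
    return fired
```

Because the definitions are plain data in this sketch, replacing an entry in `defs` at run time corresponds to updating the abnormal state definition without changing the monitor itself.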

[Brief description of the drawings]

FIG. 1 is a diagram showing the overall software configuration of a task management device according to the present invention.

FIG. 2 is a diagram showing task state transitions in time series in the first embodiment.

FIG. 3 is a diagram showing details of the task management table and the trace buffer in the first embodiment.

FIG. 4 is a flowchart showing the flow of task stop time management processing in the first embodiment.

FIG. 5 is a diagram showing an example of the per-interval frequency distribution displayed by the system management monitor in the second embodiment.

FIG. 6 is a flowchart showing the flow of processing performed by the system management monitor in the second embodiment.

FIG. 7 is a flowchart showing the flow of processing performed by the system management monitor in the third embodiment.

FIG. 8 is a flowchart showing the flow of processing performed by the API means in the fourth embodiment.

FIG. 9 is a flowchart showing the flow of processing performed by the bottleneck detecting means of the system management monitor in the fifth embodiment.

FIG. 10 is a diagram showing an example of a state transition diagram displayed by the system management monitor in the sixth embodiment.

FIG. 11 is a flowchart showing the flow of processing performed by the system management monitor in the sixth embodiment.

FIG. 12 is a flowchart showing the flow of processing performed by the task input means of the system management monitor in the seventh embodiment.

FIG. 13 is a diagram showing the output format of the ps command in UNIX.

FIG. 14 is a diagram showing the output format of the vmstat command in UNIX.

[Explanation of symbols]

1: software configuration diagram; 2: operating system; 2a: permissible stop time setting means; 2b: stop time accumulating means; 2c: stop status providing means; 2d: stop time excess reporting means; 3: system management monitor; 3a: stop status collecting means; 3b: stop status display means; 3c: abnormal state detecting means; 3d: abnormal state notifying means; 3e: task state transition display means; 3f: task input means; 4: user-defined task; 5: system management monitor I/F means; 20: task management table; 30: trace buffer.

Claims

[Claims]

1. A task monitoring device comprising a computer in which a plurality of tasks are executed in parallel, wherein an operating system that controls the execution of the tasks is provided with the following elements: (a) permissible stop time setting means for setting a permissible value of the stop time of a task; (b) stop time accumulating means for recording, for each task and each time the operating state of the task transitions, the stop status of the task according to the transition, such as the stop cause, the time at which the task stopped, the stop time, and the cumulative value of the stop time for each stop cause; (c) comparing means for comparing the stop time of the task recorded by the stop time accumulating means with the permissible value corresponding to the task; and (d) stop time excess reporting means for reporting to the task that the stop time has exceeded the permissible value when the comparing means determines that the stop time of the task exceeds the permissible value.

2. A task monitoring device comprising: a computer in which a plurality of tasks are executed in parallel; an operating system that controls the execution of the tasks and has the following elements: (a) stop time accumulating means for recording, for each task and each time the operating state of the task transitions, the stop status of the task according to the transition, such as the stop cause, the time at which the task stopped, the stop time, and the cumulative value of the stop time for each stop cause, and (b) stop status providing means for providing the stop status of the task recorded by the stop time accumulating means; and a system management monitor that operates at predetermined intervals and has the following elements: (a) stop status collecting means for collecting the stop status provided by the stop status providing means, and (b) stop status display means for displaying the load status of the task as a frequency distribution based on the stop status of the task collected by the stop status collecting means.

3. The task monitoring device according to claim 2, wherein the stop status display means displays the load status of the task at predetermined intervals.

4. The task monitoring device according to claim 2, wherein the stop status display means displays the load status of the task as cumulative values.

5. The task monitoring device according to claim 2, wherein the stop status providing means provides the stop status including information on tasks that are currently stopped.

6. A task monitoring device comprising: a computer in which a plurality of tasks are executed in parallel; an operating system that controls the execution of the tasks and has the following elements: (a) stop time accumulating means for recording, for each task and each time the operating state of the task transitions, the stop status of the task according to the transition, such as the stop cause, the time at which the task stopped, the stop time, and the cumulative value of the stop time for each stop cause, and (b) stop status providing means for providing the stop status of the task recorded by the stop time accumulating means; and a system management monitor that operates at predetermined intervals and has the following elements: (a) stop status collecting means for collecting the stop status provided by the stop status providing means, (b) abnormal state definition information defining an abnormal state for each task, (c) abnormal state detecting means for detecting an abnormal state of a task based on the stop status and the abnormal state definition information, and (d) abnormal state notifying means for notifying the task of the abnormal state when the abnormal state detecting means determines that the task is in an abnormal state.

7. A task monitoring device comprising: a computer in which a plurality of tasks are executed in parallel; an operating system that controls the execution of the tasks and has the following elements: (a) stop time accumulating means for recording, for each task and each time the operating state of the task transitions, the stop status of the task according to the transition, such as the stop cause, the time at which the task stopped, the stop time, and the cumulative value of the stop time for each stop cause, and (b) stop status providing means for providing the stop status of the task recorded by the stop time accumulating means; and a system management monitor that operates at predetermined intervals and has the following elements: (a) stop status collecting means for collecting the stop status provided by the stop status providing means, (b) abnormal state definition information defining an abnormal state for each task, (c) abnormal state detecting means for detecting an abnormal state of a task based on the stop status and the abnormal state definition information, and (d) task input means for activating a task that processes the abnormal state when the abnormal state detecting means determines that an abnormal state exists.

8. The task monitoring device according to claim 6 or 7, further comprising a system management monitor interface unit for updating the abnormal state definition information from a user task.

9. The task monitoring device according to claim 6, wherein the system management monitor further comprises task state transition display means for displaying the state transitions of tasks based on the stop status of the tasks.

10. A task monitoring device comprising: a computer in which a plurality of tasks are executed in parallel; an operating system that controls the execution of the tasks and has the following elements: (a) stop time accumulating means for recording, for each task and each time the operating state of the task transitions, the stop status of the task according to the transition, such as the stop cause, the time at which the task stopped, the stop time, and the cumulative value of the stop time for each stop cause, and (b) stop status providing means for providing the stop status of the task recorded by the stop time accumulating means; system management monitor interface means for holding task input information set by a user task; and a system management monitor that operates at predetermined intervals and has the following elements: (a) stop status collecting means for collecting the stop status provided by the stop status providing means, (b) task input availability determination means for determining whether a task can be input, based on the stop status of the tasks collected by the stop status collecting means, and (c) task input means for inputting a task, based on the task input information obtained from the system management monitor interface means, when the task input availability determination means determines that a task can be input.

11. A task monitoring method comprising the following steps: (a) setting, for each task, a permissible value of the stop time of the task; (b) recording, when the task stops, the time at which it stopped and the cause of the stop; (c) recording, when the task recovers from the stopped state, the stop time, the cause of the stop, and the cumulative value of the stop time; (d) a comparing step of comparing the stop time with the permissible value; and (e) reporting to the task that a stop exceeding the permissible time has occurred when the comparing step determines that the stop time exceeds the permissible value.

12. A task monitoring method comprising the following steps: (a) recording, when the task stops, the time at which it stopped and the cause of the stop; (b) recording, when the task recovers from the stopped state, the stop status of the task, such as the stop time, the cause of the stop, and the cumulative value of the stop time; and (c) displaying the load status of the task as a frequency distribution based on the stop status.

13. A task monitoring method comprising the following steps: (a) defining an abnormal state for each task; (b) recording, when the task stops, the time at which it stopped and the cause of the stop; (c) recording, when the task recovers from the stopped state, the stop status of the task, such as the stop time, the cause of the stop, and the cumulative value of the stop time; (d) an abnormal state detecting step of detecting, based on the stop status, the occurrence of the abnormal state defined in step (a); and (e) processing the abnormal state when an abnormality is detected in the abnormal state detecting step.

14. The task monitoring method according to claim 13, wherein the step of processing the abnormal state is a step of reporting to the task corresponding to the abnormal state.

15. The task monitoring method according to claim 13, wherein the step of processing the abnormal state is a step of activating a task for processing the abnormal state.