JPH09330302A

JPH09330302A - Method and system for monitoring performance of computer system

Info

Publication number: JPH09330302A
Application number: JP9029951A
Authority: JP
Inventors: Shunji Takubo; 俊二田窪; Nobutoshi Sagawa; 暢俊佐川; Tadashi Ota; 忠太田; Susumu Yamaga; 晋山賀
Original assignee: Hitachi ULSI Engineering Corp; Hitachi Ltd
Current assignee: Hitachi ULSI Engineering Corp; Hitachi Ltd
Priority date: 1996-02-14
Filing date: 1997-02-14
Publication date: 1997-12-22
Anticipated expiration: 2017-02-14
Also published as: JP3881739B2

Abstract

PROBLEM TO BE SOLVED: To enable a plurality of computers to monitor, via a network, the performance data of a monitored computer without increasing the load on that computer. SOLUTION: A sampling process 4 started on each node 2 of a parallel computer 1 samples performance data, and a gathering process 3 started on a specific node gathers the performance data and sends it to a receiving process 15 on a monitor computer 11. The receiving process 15 distributes the performance data to any display process 16 or storing process 17 already started on the same or a different monitor computer 11. The display process 16 displays some of the itemized performance data contained in the performance data on a display device 12. The storing process 17 stores all of the performance data in a storage device 13.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a performance monitoring method for collecting and displaying performance data in a computer system in which a plurality of computers are connected by a network, and to a computer system and a program storage medium therefor. In particular, it relates to a performance monitoring method suitable for a parallel computer or a distributed system.

[0002]

[Description of the Related Art] In a parallel computer or a distributed system, a plurality of computers called nodes cooperate and operate in parallel, so the operation of each node depends on the operation of other nodes, as exemplified by inter-node communication. The behavior of such a system is therefore far more complex than that of a sequential computer.

[0003] To use such a parallel computer effectively and obtain sufficient performance, it is necessary to grasp accurately its complex operating status, including not only the operation of individual nodes but also the causal relationships between node operations and the load balance, and to use that information to tune the running program.

[0004] Two methods have mainly been used as conventional techniques for helping to grasp the operating status of a computer. The first, used in products such as PerfView sold by Hewlett-Packard, measures, for each node of a distributed system, performance data on the operating status of that node, such as CPU activity, memory usage, and network communication frequency, and accumulates the data in a storage device included in that node, such as a magnetic disk storage device. The performance data accumulated in each node is then gathered into one computer connected to the distributed system and displayed, for example as graphs, so that it can be understood visually.

[0005] The second method, typified by IBM's Visualization Tool, starts a process that samples performance data on each node of a parallel computer. A display process started on a control computer connected to the parallel computer via a network receives the performance data from the processes on the nodes in real time and displays it. See, for example, "IBM Parallel Environment for AIX Operation and Use Version 2.1.0," pp. 263-265, 1995 (document number GC23-3891-00), published by International Business Machines Corp., USA.

[0006]

[Problems to Be Solved by the Invention] In general, a parallel computer or a distributed system is shared by a plurality of users. It is therefore desirable that a plurality of users be able to monitor the performance data of such a computer in real time via a network.

[0007] With the first method, however, performance data can be accumulated in a storage device for each node, but the data is analyzed and displayed only after measurement has finished, so the operating status cannot be grasped in real time.

[0008] With the second method, performance data can be received from the processes on each node and displayed in real time, but the above reference does not specifically show how a plurality of users can monitor the performance data of the same monitored computer.

[0009] When a plurality of users monitor the performance data of the same monitored computer, it is further desirable that the monitoring load on that computer not increase as the number of users increases.

[0010] Accordingly, an object of the present invention is to provide a computer performance monitoring method suitable for real-time monitoring of performance data by a plurality of users while suppressing any increase in the load on the computer being measured, together with a computer system and a program storage medium therefor.

[0011]

[Means for Solving the Problems] To achieve the above object, in the present invention the monitored computer repeatedly samples performance data and transmits it via a network to a computer used for monitoring the operating status. A receiving process is started on this monitor computer. The receiving process receives the transmitted performance data and distributes it to one or more user processes started on a plurality of computers that may be the same as or different from that computer. The measured performance data includes performance data for each of a plurality of measurement items. The user processes are, for example, a plurality of display processes or a plurality of storing processes.

[0012] When a display process receives performance data transferred by the receiving process, it displays the performance data for some of the measurement items on a display device connected to the computer on which that display process is running. When a storing process receives performance data transferred by the receiving process, it stores all of the transferred performance data in a storage device connected to the computer on which that storing process is running.
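The difference between the two kinds of user process can be illustrated with a minimal sketch. The item names (`cpu_util`, `net_sends`, and so on) and the helpers `display_view` and `store_sample` are hypothetical and chosen only for illustration. The patent states only that a display process shows some items and a storing process keeps all of them.

```python
def display_view(sample, selected_items):
    """Return only the measurement items that a display process chose to show."""
    return {name: sample[name] for name in selected_items}


def store_sample(sample, archive):
    """Append a copy of the complete sample, as a storing process would."""
    archive.append(dict(sample))


# One distributed sample with several measurement items (hypothetical names).
sample = {"cpu_util": 0.82, "mem_util": 0.40, "disk_accesses": 17, "net_sends": 5}

shown = display_view(sample, ["cpu_util", "net_sends"])  # subset for the screen
archive = []
store_sample(sample, archive)                            # everything is kept
```

Because both consumers work from the same distributed sample, choosing a different subset for display requires no change on the monitored computer.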

[0013] In a more preferable mode of the present invention, when the monitored computer is a parallel computer, the performance data sampled by the sampling processes started on the respective nodes of the parallel computer is gathered by a gathering process started on one predetermined node of the parallel computer that is connected to the network, and is transferred to the receiving process.

[0014]

[Embodiments of the Invention] The performance monitoring method according to the present invention is described below in more detail with reference to several embodiments shown in the drawings. In the following, the same reference numerals denote the same or similar elements. From the second embodiment onward, mainly the differences from the first embodiment are described.

[0015] <First Embodiment> In FIG. 12, the parallel computer 1 consists of a plurality of nodes 2, each comprising at least one processor 2A, a memory 2B, and so on, and an internal network 5 connecting those nodes. The memory 2B of each node holds the programs and data used by the processor 2A of that node. Each node, or some of the nodes, also has peripheral devices such as a magnetic storage device, which are not shown here for simplicity. One specific node of the parallel computer 1 is connected to an external network 21, to which a plurality of computers 11 are connected; that node and these computers can communicate with one another via the network 21. The parallel computer 1 is the monitored computer, and the two computers 11 connected to the external network 21 are examples of computers used for monitoring. Other computers connected to the network 21 are not shown for simplicity. Each computer 11 comprises a processor 11A and a memory 11B, and is connected to an input/output device 12, which includes a display device and a keyboard, and to a storage device 13 such as a magnetic disk storage device. Each processor is controlled by a suitable OS, for example UNIX, which is developed and licensed by X/Open Company Limited.

[0016] FIG. 1 shows the five types of processes that make up the monitoring system of this embodiment and the relationships among them. These processes result from five programs, recorded on a suitable program recording medium, being installed on the parallel computer 1 and each executed as a process. A sampling process 4 is placed on each node 2 of the parallel computer 1. Each sampling process 4 repeatedly samples the performance data of its node 2 at a fixed time interval. The gathering process 3 is placed on the specific node, among the nodes 2 of the parallel computer 1, that is connected to the external network 21. It gathers the performance data sampled on each node by that node's sampling process 4 and transmits it to one of the monitor computers 11.

[0017] On that monitor computer 11, meanwhile, a receiving process 15, a display process 16, and a storing process 17 are started. The receiving process 15 is started on only one of the monitor computers and exchanges data one-to-one with the gathering process 3. Display processes 16 and storing processes 17 are started by one or more users in whatever number is needed, and receive the performance data distributed by the receiving process 15. A display process 16 displays some of the itemized performance data contained in the performance data on the display device in the input/output device 12. A storing process 17 stores the whole of the performance data in the storage device 13. The display process 16 and the storing process 17 need not be started on the monitor computer 11 on which the receiving process 15 is running; they can also be started on other computers connected to that monitor computer by the network 21. A plurality of display processes 16 can be started on any one monitor computer 11, and likewise a plurality of storing processes 17 can be started on that computer.

[0018] Next, how the processes of this monitoring system operate in cooperation with one another is described using the flowcharts of FIGS. 6(a) and 6(b) and the internal configuration diagrams of the above five processes shown in FIGS. 2 to 5.

[0019] First, the gathering process 3 is started on the specific node 2 of the parallel computer 1 that is connected to the external network 21 (step 521 (FIG. 6(a))). In normal operation, the system administrator starts the gathering process 3 by issuing a gathering-process start command to the parallel computer 1. Once started, the gathering process 3 first performs initialization. For example, it reads the configuration definition file 208 (FIG. 3), which describes the configuration of the parallel computer 1, such as the number of nodes, the attributes of each node, and the number of nodes that acquire performance data. In this embodiment, all nodes are assumed to acquire performance data. The attributes of each node include information such as the address of the node and its peripheral device status, that is, whether peripheral devices such as a magnetic disk storage device are attached to the node. When initialization is complete, the gathering process 3 waits for a connection request from the receiving process 15 that will be started on one of the monitor computers 11.
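The patent does not specify the layout of the configuration definition file 208. As a minimal sketch, assuming a hypothetical plain-text layout with one node per line (an address followed by `disk` or `nodisk`), the initialization step might load the node list as follows. The `NodeConfig` type, the `parse_config` helper, and the file syntax are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeConfig:
    address: str    # node address, used later to start the sampling process
    has_disk: bool  # whether a magnetic disk device is attached to the node


def parse_config(text):
    """Parse the hypothetical layout: one '<address> disk|nodisk' pair per line."""
    nodes = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, disk_flag = line.split()
        if disk_flag not in ("disk", "nodisk"):
            raise ValueError(f"unknown peripheral flag: {disk_flag!r}")
        nodes.append(NodeConfig(address, disk_flag == "disk"))
    return nodes


example = """
# address   peripheral
node00      disk
node01      nodisk
"""
nodes = parse_config(example)
node_count = len(nodes)  # the node count follows from the number of entries
```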

[0020] Next, the receiving process 15 is started on one of the monitor computers 11 (step 541). In normal operation, the system administrator starts the receiving process 15 by issuing a receiving-process start command from the monitor computer 11. The receiving process 15 can be started on any computer connected to the parallel computer 1 by a network. However, because display processes started by a plurality of users on different computers may connect to this process, it is generally started on a specific computer that a plurality of users can access, such as the management workstation of the parallel computer. As arguments of the start command of the receiving process 15, the system administrator enters parameters such as the IP address information (or host name) of the parallel computer 1, which is needed when issuing a connection request to the gathering process 3, and the time interval for sampling performance data.
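The start command's arguments could be handled along the following lines. This is only a sketch: the program name `receiver`, the positional `host` argument, the `--interval` option, and its default of one second are assumptions, since the patent names the parameters but not the command syntax.

```python
import argparse


def parse_receiver_args(argv):
    """Parse the arguments of a hypothetical receiving-process start command."""
    parser = argparse.ArgumentParser(prog="receiver")
    parser.add_argument("host",
                        help="IP address or host name of the parallel computer")
    parser.add_argument("--interval", type=float, default=1.0,
                        help="performance data sampling interval in seconds")
    return parser.parse_args(argv)


# Example invocation with a documentation-range address (RFC 5737).
args = parse_receiver_args(["192.0.2.10", "--interval", "0.5"])
```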

[0021] Once started, the receiving process 15 issues a connection request to the gathering process 3 through the connection processing routine 305 of FIG. 4 (step 542). This connection request can be implemented with the connect system call in UNIX. The IP address information of the parallel computer given as an argument at startup is used as the identification information of the gathering process 3 to which the connection request is sent. In the gathering process 3 that receives the connection request, the connection processing routine 202 performs connection processing with the connection processing routine 305 (FIG. 4) in the receiving process 15, which enables the gathering process 3 to exchange data with the receiving process 15 (step 522).

[0022] After the connection processing is complete, the gathering process 3 transmits the configuration definition information of the parallel computer 1, read earlier from the configuration definition file 208, to the receiving process 15 (step 523). In this system, requests are conveyed and data is exchanged by passing messages between the processes that make up the system. In this embodiment, a message is a variable-length byte string. Its first byte is an identifier field that stores an identifier indicating the type of message, and data is stored in the data field that follows. A message may also consist of the identifier field alone, with no data field. To convey the configuration definition information in step 523, the internal processing routine 207 (FIG. 3) in the gathering process 3 builds a message containing the configuration definition information in the input/output buffer 203 provided for the receiving process 15. That is, it stores in the identifier field a 1-byte identifier indicating that the message contains configuration definition information, and stores the IP address of the monitoring computer 11 in the data field. The message built in this way is sent to the receiving process 15 via the network 21 using the UNIX send system call.
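The message layout described here (a 1-byte identifier followed by an optional data field) can be sketched as a pair of helpers. The function names and the identifier value `0x01` are hypothetical; the patent does not list the identifier values or say how message boundaries are delimited on the wire, so this sketch treats each message as an already-delimited byte string.

```python
CONFIG_DEFINITION = 0x01  # hypothetical identifier value, for illustration only


def encode_message(identifier, data=b""):
    """Build a message: a 1-byte identifier field followed by the data field."""
    if not 0 <= identifier <= 0xFF:
        raise ValueError("identifier must fit in one byte")
    return bytes([identifier]) + bytes(data)


def decode_message(message):
    """Split a message into (identifier, data); the data field may be empty."""
    if not message:
        raise ValueError("empty message")
    return message[0], message[1:]


with_data = encode_message(CONFIG_DEFINITION, b"payload")
identifier_only = encode_message(0x02)  # a message with no data field
```

A receiver would dispatch on the first element returned by `decode_message`, which mirrors the role the text assigns to the input analysis routines.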

[0023] When the receiving process 15 receives this message from the gathering process 3, the input analysis routine 306 (FIG. 4) detects from the identifier that the message contains the configuration definition information of the parallel computer 1. It then calculates, from the number of nodes included in the configuration definition information, the size of the input/output buffers 301 and 302 needed for transferring performance data. The calculated size is used later when those input/output buffers are allocated. Each input/output buffer 301 or 302 is allocated for one of the display processes 16 or storing processes 17 started later. The size of these buffers need only be at least the length of the performance data sampled by one sampling process 4 in one time step multiplied by the number of nodes in the parallel computer. Next, the receiving process 15 requests the gathering process 3 to start the sampling process 4 on each node (step 543). Specifically, as with the exchange of the computer configuration definition information above, the sampling process 4 sends the gathering process 3 a message containing a request identifier indicating that it is a start request, and the gathering process 3 that receives it identifies the content of the request from that identifier in the input analysis routine 205.
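The buffer sizing rule is a single multiplication, shown here as a minimal sketch. The function name and the example figures (64-byte records, 128 nodes) are assumptions for illustration; the patent gives only the rule itself.

```python
def min_buffer_size(record_length, node_count):
    """Smallest buffer that holds one time step of samples from every node."""
    if record_length < 0 or node_count < 0:
        raise ValueError("lengths and counts must be non-negative")
    return record_length * node_count


# Hypothetical figures: 64 bytes per node per time step, 128 nodes.
size = min_buffer_size(record_length=64, node_count=128)
```

A real implementation would presumably also reserve room for the identifier byte of the enclosing message; the text states the rule only as a lower bound.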

[0024] The gathering process 3, requested by the receiving process 15 to start the sampling processes 4, uses the node addresses in the configuration definition information received earlier to start a sampling process 4 on each node 2 (step 524). The UNIX remote shell facility is used for startup. When a sampling process 4 is started (step 501), it performs initial processing, such as connecting to the gathering process 3 through the connection processing routine 100 (FIG. 2) using the UNIX connect system call, and then waits for the gathering process 3 to request the start of monitoring. When the sampling processes 4 have been started and have finished connecting to the gathering process 3, the gathering process 3 notifies the receiving process 15 that startup is complete. On receiving this notification, the receiving process 15 requests the gathering process 3 to start monitoring (step 544). The message that the receiving process sends for this monitoring start request contains, in addition to the identifier indicating the content of the request, information on the performance data sampling interval.

[0025] The gathering process 3 that receives the monitoring start request forwards the start request message to the relevant sampling processes 4 (step 525). When a sampling process 4 is started, its counter control/readout routine 104 issues a select system call, and the sampling process 4 waits for a message to arrive. When the sampling process 4 detects the arrival of a message from the return value of the select system call, processing moves to the input analysis routine 102. The input analysis routine 102 issues a receive system call and reads the message into the input/output buffer 101. It confirms that the identifier at the head of the message is the monitoring-start-request identifier, extracts the performance data sampling interval contained in the message, and returns to the counter control/readout routine 104 with that interval as the return value. In the sampling process 4 that has thus received the sampling request, the counter control/readout routine 104 sets the sampling interval read from the request in the clock 107, and the clock 107 then interrupts the counter control/readout routine 104 at every set interval.
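The wait-then-parse sequence of the sampling process can be sketched with Python's `select` and `socket` modules, using a local socket pair in place of the internal network 5. The identifier value `0x03` and the encoding of the interval (a 4-byte big-endian count of milliseconds) are assumptions; the patent says only that the message carries the identifier and the sampling interval.

```python
import select
import socket
import struct

MONITOR_START = 0x03  # hypothetical identifier value


def wait_for_start(sock, timeout=None):
    """Wait in select() for a message, then return the sampling interval in seconds."""
    readable, _, _ = select.select([sock], [], [], timeout)
    if not readable:
        raise TimeoutError("no message arrived")
    message = sock.recv(64)
    if not message or message[0] != MONITOR_START:
        raise ValueError("not a monitoring start request")
    # Assumed encoding: 4-byte big-endian interval in milliseconds.
    (interval_ms,) = struct.unpack("!I", message[1:5])
    return interval_ms / 1000.0


# A connected socket pair stands in for the gatherer-to-sampler link.
gatherer_end, sampler_end = socket.socketpair()
gatherer_end.sendall(bytes([MONITOR_START]) + struct.pack("!I", 500))
interval = wait_for_start(sampler_end, timeout=1.0)
gatherer_end.close()
sampler_end.close()
```

The returned interval is what the text describes being set in the clock 107; a periodic timer would then trigger each sampling step.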

[0026] Each time it receives an interrupt from the clock, the counter control/readout routine 104 calls a function such as rstat, a UNIX system call for collecting performance data, and samples performance data from the OS 105 (step 502). The performance data consists of a plurality of itemized performance data, and the number and kinds of itemized performance data that can be sampled are determined in advance by the specification of the function. Typically, itemized performance data such as the CPU utilization, memory utilization, number of magnetic disk storage device accesses, and number of network communications (that is, the numbers of transmissions and receptions) per unit time can be obtained. The OS reads these performance data from the software counters 106 in the kernel or the hardware counters 108 in the node and returns them as output arguments of rstat.

[0027] Having sampled the performance data in this way, the counter control/readout routine 104 of the sampling process 4 stores the performance data in the input/output buffer 101 and passes control to the output control routine 103. The output control routine 103 issues a send system call, which transmits the contents of the input/output buffer 101 to the gathering process 3 via the internal network 5 (step 503). In the gathering process 3, the internal processing routine 207 (FIG. 3) uses the select function, an OS system call, to watch for the arrival of a message containing performance data from any of the sampling processes 4. On confirming that a message has arrived, the internal processing routine 207 invokes the input analysis routine 205. The input analysis routine 205 issues a receive system call and reads the performance data in the message into the input/output buffer 201. When it has finished reading the performance data, the input analysis routine 205 records which sampling process sent it, checks whether performance data has been sent from all of the sampling processes 4, and returns processing to the internal processing routine 207.

[0028] If at this point some sampling process 4 has not yet sent its performance data, the internal processing routine 207 issues a select system call and continues watching for the arrival of messages containing performance data. When performance data from all of the sampling processes 4 has arrived, the internal processing routine 207 copies the performance data from the input/output buffers 201, one of which is provided for each sampling process, to the input/output buffer 203 for the receiving process 15, and assembles the copied performance data for the nodes into a single message. The message consists of an identifier followed by the concatenated performance data sent from the sampling processes. The internal processing routine 207 then invokes the output control routine 206. The output control routine 206 issues a UNIX send system call and transmits the message containing this performance data to the receiving process 15 (step 526).
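The gathering step (hold each node's sample until every node has reported, then emit one combined message) can be sketched as follows. The `Gatherer` class, the identifier value `0x10`, and the choice to concatenate samples in configured node order are assumptions; the patent says only that the message is an identifier followed by the concatenated performance data.

```python
PERF_DATA = 0x10  # hypothetical identifier for a combined performance message


class Gatherer:
    """Holds one buffer per sampling process and emits one message per time step."""

    def __init__(self, node_ids):
        self.node_ids = list(node_ids)
        self.pending = {}

    def on_sample(self, node_id, payload):
        """Record a node's sample; return the combined message once all have reported."""
        if node_id not in self.node_ids:
            raise KeyError(f"unknown node: {node_id!r}")
        self.pending[node_id] = bytes(payload)
        if len(self.pending) < len(self.node_ids):
            return None  # keep waiting for the remaining sampling processes
        body = b"".join(self.pending[n] for n in self.node_ids)
        self.pending.clear()  # start the next time step with empty buffers
        return bytes([PERF_DATA]) + body


gatherer = Gatherer([0, 1, 2])
first = gatherer.on_sample(1, b"bb")
second = gatherer.on_sample(0, b"aa")
combined = gatherer.on_sample(2, b"cc")
```

Sending one combined message per time step means the external network carries a single stream from the parallel computer, regardless of how many nodes it has.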

[0029] In the receiving process 15, the internal processing routine 308 issues a UNIX select system call and waits for a message to arrive from the gathering process 3. On detecting the arrival of a message from the gathering process 3, the internal processing routine 308 passes processing to the input analysis routine 306. The input analysis routine 306 issues a UNIX receive system call and reads the performance data in the message into the input/output buffer 304 provided for the gathering process 3 (step 545). After receiving the performance data, the receiving process 15 checks whether any display process 16 or storing process 17 is connected. If no display process is connected, the performance data transfer processing described below is not performed. If at least one display process 16 is connected, the receiving process 15 copies the message that arrived in the input/output buffer 304 to the input/output buffer 301 provided one-to-one for that display process 16, thereby distributing the message to the display process 16 connected to that input/output buffer 301. If there are input/output buffers 301 to which a plurality of display processes 16 are connected, the performance data is distributed to all of those display processes 16 in exactly the same way. If a storing process 17 is connected, the input/output buffer 301 provided for that storing process is used. Details of this distribution processing are described later.
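The distribution step amounts to copying one received message into a buffer per attached consumer. The sketch below models that with an in-memory dictionary; the `Receiver` class, its method names, and the consumer labels are hypothetical, and the actual transmission to each display or storing process is omitted.

```python
class Receiver:
    """Keeps one buffer per attached display or storing process."""

    def __init__(self):
        self.buffers = {}

    def attach(self, consumer_id):
        self.buffers[consumer_id] = b""

    def detach(self, consumer_id):
        self.buffers.pop(consumer_id, None)

    def distribute(self, message):
        """Copy one received message into every consumer buffer; return the copy count."""
        for consumer_id in self.buffers:
            self.buffers[consumer_id] = bytes(message)
        return len(self.buffers)


receiver = Receiver()
idle_copies = receiver.distribute(b"\x10sample-1")  # nobody attached: nothing to do
receiver.attach("display-A")
receiver.attach("storing-B")
copies = receiver.distribute(b"\x10sample-2")
```

The copying happens entirely on the monitor computer, so attaching more consumers increases the work of the receiving process but not the number of messages the parallel computer has to send.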

【００３０】このようにして、採取プロセス４、収集プ
ロセス３、受信プロセス１５がシステム管理者によって
起動された後、本システムの使用者は表示プロセス１６
あるいは蓄積プロセス１７を起動すれば、これらのプロ
セスを使用して性能データをモニタリングをすることが
可能となる。この場合、図示された他のモニタ用の計算
機１１は、並列計算機１の一人の利用者が使用する机上
の計算機でもよい。こうして、システム管理者が管理す
る計算機１１からもあるいはそれから隔たった、利用者
が管理する個人用の汎用の計算機１１からも並列計算機
１の性能データをモニタ可能になる。なお、本明細書で
はこのように、性能データのモニタリング専用でない汎
用の計算機でも性能データのモニタリングに使用される
計算機をモニタ用の計算機と呼んでいる。Once the sampling processes 4, the collection process 3, and the receiving process 15 have been started by the system administrator in this way, a user of the system can monitor the performance data by starting a display process 16 or a storage process 17 and using these processes. The other monitor computer 11 shown in the figure may be a desktop computer used by an individual user of the parallel computer 1. The performance data of the parallel computer 1 can thus be monitored both from the computer 11 managed by the system administrator and from a separate general-purpose personal computer 11 managed by a user. In this specification, a general-purpose computer that is not dedicated to performance monitoring is also called a monitor computer when it is used to monitor performance data.

【００３１】本実施の形態では、受信プロセスを任意の
計算機に起動可能にするとともに、性能データを使用す
る表示プロセスあるいは蓄積プロセスも任意の計算機に
起動可能になっている。さらに、個々の表示プロセスが
要求する項目別性能データに依らないで、予め定めた複
数の項目別性能データを採取し、表示プロセスはその中
から項目別性能データを選択して利用する。これらの予
め定められた複数の項目が十分多ければ、利用者が他の
項目別性能データを要求した場合でも、通常は上の方法
によっても利用者の要求を満たすことができる。In this embodiment, the receiving process can be started on any computer, and the display processes and storage processes that use the performance data can likewise be started on any computer. In addition, a predetermined set of item-specific performance data is sampled regardless of which items each individual display process requires, and each display process selects the items it needs from that set. Provided the predetermined set of items is large enough, this approach can usually satisfy a user's request even when the user asks for different item-specific performance data.

【００３２】さらに、本実施の形態では、蓄積プロセス
では利用者が後で利用する項目別性能データを選択でき
るように、採取された全ての項目別性能データを記憶装
置に保持するようになっている。このためにも、利用者
の要求する項目別性能データが何であるかに依らないで
予め定めた一定数の項目別性能データを採取している。Further, in this embodiment, the storage process keeps all of the sampled item-specific performance data in the storage device so that the user can later choose which items to use. For this reason as well, a fixed, predetermined number of item-specific performance data are sampled regardless of which items the user requests.

【００３３】さらに、収集プロセス、採取プロセスと受
信プロセスは、表示プロセスあるいは蓄積プロセスの起
動とは独立に起動され、その後の表示プロセスあるいは
蓄積プロセスの起動に応じて採取した性能データをそれ
らのプロセスに分配している。これにより、起動された
表示プロセスの有無あるいはその数が変化しても、受信
プロセスと収集プロセス、採取プロセスは同じ処理を実
行すればよいことになる。Furthermore, the collection process, the sampling processes, and the receiving process are started independently of the display and storage processes, and the sampled performance data is distributed to display or storage processes as those processes are subsequently started. As a result, the receiving process, the collection process, and the sampling processes perform the same processing regardless of whether any display processes have been started or how many there are.

【００３４】表示プロセス１６の動作は以下の通りであ
る。表示プロセス１６はモニタリングシステムの使用者
が自分の使用するモニタ用計算機１１上で起動する（ス
テップ５６１（図６（ｂ）））。この時、使用者は受信
プロセス１５が起動された計算機のＩＰアドレスをあら
かじめ確認しておき、その表示プロセス１６の起動時の
引数として指定する。The operation of the display process 16 is as follows. The display process 16 is started by a user of the monitoring system on the monitor computer 11 that the user is using (step 561 in FIG. 6(b)). The user checks beforehand the IP address of the computer on which the receiving process 15 is running and specifies it as an argument when starting the display process 16.

【００３５】複数の利用者が本モニタリングシステムを
利用する場合には、通常は、利用者毎に異なるモニタ用
計算機を用いる。例えば、二人の利用者が、図１に示さ
れる２台のモニタ用計算機１１のそれぞれを用いる。本
実施の形態では、同一の利用者が同じ計算機１１上に複
数の表示プロセスを起動することができる。少なくとも
一つの表示プロセス（第１種の表示プロセス）は、後に
例示するように、その表示プロセスに対して定まった項
目別性能データを、その表示プロセスにより定められた
図形もしくはグラフでもって表示するように構成されて
いる。その他の表示プロセス（第２種の表示プロセス）
は利用者が選んだ項目別性能データを、その表示プロセ
スに対してあらかじめ定められたグラフにより表示する
ように構成される。後者の種類の表示プロセスとして、
それぞれ異なるグラフに対応して準備された複数の表示
プロセスが同一のモニタ用の計算機上に起動可能に構成
されている。さらにいずれの種類の表示プロセスも、起
動された後、利用者の指示により受信プロセスに性能デ
ータの転送を要求するように構成されている。このよう
に、表示可能なグラフごとに表示プロセスを起動するよ
うに表示プロセスを構成してあるので、各表示プロセス
の構造が簡単である。When a plurality of users use this monitoring system, each user normally uses a different monitor computer. For example, two users each use one of the two monitor computers 11 shown in FIG. 1. In this embodiment, the same user can start a plurality of display processes on the same computer 11. At least one display process (a first-type display process) is configured, as illustrated later, to display item-specific performance data fixed for that display process using a figure or graph determined by that display process. The other display processes (second-type display processes) are configured to display item-specific performance data selected by the user using a graph predetermined for the display process. For the latter type, a plurality of display processes, each prepared for a different graph, can be started on the same monitor computer. Display processes of either type are configured to request the receiving process to transfer performance data in response to a user instruction after being started. Because a separate display process is started for each displayable graph, the structure of each display process is simple.

【００３６】以上の結果、本実施の形態では性能データ
を同じ計算機上に表示するには、利用者は最小限一つの
表示プロセスを起動する必要があるが、一般には利用者
は複数の項目別性能データの表示を希望するので、同一
の利用者が同一の計算機上に複数の第２種の表示プロセ
スを起動することになる。その際、どのグラフで性能デ
ータを表示するかに応じて、起動すべき第２種の表示プ
ロセスを選択し、その選択された第２種の表示プロセス
を起動後に、その表示プロセスで表示すべき項目別性能
データを指示するようになっている。Consequently, in this embodiment a user must start at least one display process to display performance data on a given computer. In general, however, a user wants to display several items of performance data, so the same user will start a plurality of second-type display processes on the same computer. In doing so, the user selects which second-type display process to start according to the graph in which the performance data is to be shown and, after starting the selected process, specifies the item-specific performance data that it should display.

【００３７】起動された表示プロセス１６では、制御ル
ーチン４０３が表示画面の初期化や受信プロセス１５へ
接続するための初期化処理を行った後、接続処理ルーチ
ン４０６に処理を移して受信プロセス１５へ接続要求を
発行する（ステップ５６２）。In the started display process 16, the control routine 403 initializes the display screen and performs the initialization needed to connect to the receiving process 15, then passes control to the connection processing routine 406, which issues a connection request to the receiving process 15 (step 562).

【００３８】接続要求を受け取った受信プロセス１５の
接続処理ルーチン３０３は、その表示プロセス用に入出
力バッファ３０１ならびに要求フラグ３０２を生成し、
上記表示プロセスに対して接続処理を行う（ステップ５
４６）。上記受信プロセスに対する接続処理が終了し、
そのプロセスに対してデータの送受信が可能となった
ら、受信プロセス１５は今接続した表示プロセス１６に
対して、受信プロセス１５が先に収集プロセス３から受
信した、並列計算機１の構成定義情報を送信する。On receiving the connection request, the connection processing routine 303 of the receiving process 15 creates an input/output buffer 301 and a request flag 302 for that display process and performs the connection processing for it (step 546). When the connection processing is complete and data can be exchanged with that process, the receiving process 15 sends the newly connected display process 16 the configuration definition information of the parallel computer 1 that it previously received from the collection process 3.
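The connection handling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class names `Connection` and `Receiver`, the `outbox` list that stands in for a socket, and the shape of the configuration dictionary are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Per-client state kept by the receiving process (hypothetical layout)."""
    buffer: bytearray = field(default_factory=bytearray)  # role of input/output buffer 301
    request_flag: bool = False                             # role of request flag 302
    outbox: list = field(default_factory=list)             # stand-in for the socket to the client


class Receiver:
    def __init__(self, config_info: dict):
        # Configuration definition information received earlier from the collection process.
        self.config_info = config_info
        self.connections = {}

    def accept(self, client_id: str) -> Connection:
        """Create the buffer and flag for a new client, then send it the configuration."""
        conn = Connection()
        self.connections[client_id] = conn
        conn.outbox.append(("CONFIG", self.config_info))
        return conn


receiver = Receiver({"node_count": 4})
conn = receiver.accept("display-A")
```

The request flag starts cleared, so a newly connected client receives the configuration but no performance data until it asks for some.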

【００３９】構成定義情報を受信した表示プロセス１６
は、その情報を参照して性能データを蓄積するための送
受信バッファ４０４の確保や全ノードの性能データを表
示するためのグラフのレイアウトの計算などを行った上
で、ウィンドウを表示装置上に表示する（ステップ５６
３）。図７は、表示プロセス（第１種の表示プロセス）
により予め定められた複数の項目別性能データを表示す
るウィンドウの例である。二つの直方体および一組の矢
印（６０１，６０２，６０３）によって一つのノードの
稼働状況を表す。このような図形を複数のノードに対応
して複数個配置することによって並列計算機全体の稼働
状況を表す。２つの直方体６０１，６０２の高さはそれ
ぞれ一つのノードでのＣＰＵ利用率とメモリ利用率に対
応し、一組の矢印６０３の長さはネットワーク２１への
そのノードからの送信回数と送受信回数に対応する。表
示ウィンドウ６１０は、表示エリア６２０と制御エリア
６３０から構成され、使用者は制御エリア６３０に配置
されたボタン６４０，６５０などのオブジェクトを利用
して表示プロセスの制御を行う。ボタン６４０を操作す
ることによりデータ表示が開始され、ボタン６５０を操
作することにより表示が停止される。On receiving the configuration definition information, the display process 16 refers to it to allocate the transmission/reception buffer 404 for holding performance data and to compute the layout of the graph that shows the performance data of all nodes, and then displays a window on the display device (step 563). FIG. 7 shows an example of a window in which a display process (a first-type display process) displays a predetermined plurality of item-specific performance data. The operating status of one node is represented by two rectangular parallelepipeds and a pair of arrows (601, 602, 603). Arranging one such figure for each of a plurality of nodes represents the operating status of the parallel computer as a whole. The heights of the two rectangular parallelepipeds 601 and 602 correspond to the CPU utilization and the memory utilization of one node, respectively, and the lengths of the pair of arrows 603 correspond to the number of transmissions from that node to the network 21 and the number of transmissions and receptions. The display window 610 consists of a display area 620 and a control area 630, and the user controls the display process through objects such as the buttons 640 and 650 placed in the control area 630. Operating the button 640 starts the data display, and operating the button 650 stops it.

【００４０】図８は１画面に利用者が選択した１種類の
項目別性能データを表示する他の表示プロセス（第２種
の表示プロセス）により表示されるウィンドウの例であ
る。表示エリア６２０には棒グラフが表示される。グラ
フの横軸１７１１は異なるノードに対応し、縦軸１７１
２に一つの項目別性能データがマッピングされる。利用
者は、制御エリア６３０に与えられた性能データのリス
ト１７０４から、棒グラフの縦軸にマッピングしたい性
能データを選択する。ボタン６４０，６５０は図７の場
合と同様である。図７あるいは図８のいずれの表示プロ
セスの場合でも、使用者が表示ウィンドウ６１０または
１７００内のボタン６４０の操作によりデータ表示の開
始を指示すると、表示プロセス１６は受信プロセス１５
に対して性能データの転送要求メッセージを送信する
（ステップ５６４）。FIG. 8 shows an example of a window displayed by another display process (a second-type display process), which displays on one screen one kind of item-specific performance data selected by the user. A bar graph is displayed in the display area 620. The horizontal axis 1711 of the graph corresponds to the different nodes, and one item of performance data is mapped to the vertical axis 1712. From the list 1704 of performance data provided in the control area 630, the user selects the item to be mapped to the vertical axis of the bar graph. The buttons 640 and 650 are the same as in FIG. 7. With the display process of either FIG. 7 or FIG. 8, when the user instructs the start of data display by operating the button 640 in the display window 610 or 1700, the display process 16 sends a performance data transfer request message to the receiving process 15 (step 564).

【００４１】前述のように、受信プロセス１５は、内部
処理ルーチン３０８においてｓｅｌｅｃｔシステムコー
ルを発行し、入出力バッファ３０１および３０４へのメ
ッセージ到着待ちの状態にある。表示プロセス１６から
のデータ転送要求メッセージの到着によりｓｅｌｅｃｔ
システムコールはリターンし、内部処理ルーチン３０８
は、このメッセージの到着を検出すると、入力解析ルー
チン３０６に処理を移す。入力解析ルーチン３０６はｒ
ｅｃｅｉｖｅシステムコールを発行して要求メッセージ
を当該表示プロセスと接続された入出力バッファ３０１
に読み込む。次いで、出力制御ルーチン３０７は要求メ
ッセージの識別子をチェックし、メッセージがデータ転
送要求であることを判定して、その入出力バッファ３０
１に付随する要求フラグ３０２をセットする。なお、同
じ使用者または異なる使用者によって複数の表示プロセ
ス１６または蓄積プロセス１７が起動された場合には、
受信プロセス１５は上記の処理を繰り返し、それら全て
のプロセスと接続操作を行う。As described above, the receiving process 15 has issued a select system call in the internal processing routine 308 and is waiting for messages to arrive at the input/output buffers 301 and 304. When a data transfer request message arrives from the display process 16, the select system call returns. On detecting the arrival of this message, the internal processing routine 308 passes control to the input analysis routine 306. The input analysis routine 306 issues a receive system call and reads the request message into the input/output buffer 301 connected to that display process. The output control routine 307 then checks the identifier of the request message, determines that it is a data transfer request, and sets the request flag 302 associated with that input/output buffer 301. When a plurality of display processes 16 or storage processes 17 are started by the same user or by different users, the receiving process 15 repeats the above processing and performs the connection operation with every one of them.
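The wait-and-flag behaviour above can be sketched with Python's `select` and `socket` modules. This is an illustration under assumptions: the message identifier `DATA_REQUEST`, the function name `wait_and_handle`, and the use of a local socket pair in place of a network connection are all hypothetical, and `recv` stands in for the system call the text names "receive".

```python
import select
import socket

DATA_REQUEST = b"REQ"  # hypothetical message identifier for a data transfer request


def wait_and_handle(sockets, request_flags, timeout=1.0):
    """Wait in select() until a client message arrives, then set that client's request flag."""
    readable, _, _ = select.select(sockets, [], [], timeout)
    for sock in readable:
        message = sock.recv(16)        # read the request into a buffer
        if message == DATA_REQUEST:    # check the message identifier
            request_flags[sock] = True
    return readable


# A connected pair stands in for the receiving-process end and the display-process end.
receiver_end, display_end = socket.socketpair()
flags = {receiver_end: False}
display_end.sendall(DATA_REQUEST)      # the display process asks for data
ready = wait_and_handle([receiver_end], flags)
flag_set = flags[receiver_end]
ready_count = len(ready)
display_end.close()
receiver_end.close()
```

A single `select` call over all client sockets lets one process serve any number of display or storage processes without a thread per client.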

【００４２】このように表示プロセス１６が受信プロセ
ス１５に接続され、使用者の画面操作によって表示が開
始された後の装置動作について説明する。前述の手順に
より採取プロセス４から収集プロセス３を経由して受信
プロセス１５へ性能データの転送が行われる（ステップ
５０４，５２７，５４８）。このとき、受信プロセス１
５は内部処理ルーチン３０８においてｓｅｌｅｃｔシス
テムコールを発行し、入出力バッファ３０１，３０５へ
のメッセージ到着待ち状態にある。収集プロセス３から
の性能データを含むメッセージの到着を契機としてｓｅ
ｌｅｃｔシステムコールは内部処理ルーチン３０８にリ
ターンし、さらに内部処理ルーチン３０８は入力解析ル
ーチン３０６に制御を移す。入力解析ルーチン３０６は
ｒｅｃｅｉｖｅシステムコールを発行して到着したメッ
セージを入出力バッファ３０４に読み込み、メッセージ
の識別子をチェックしてこれが性能データを含むメッセ
ージであることを確認して内部処理ルーチン３０８に処
理を戻す。内部処理ルーチン３０８は表示プロセス１６
と接続された入出力バッファ３０１に付随する要求フラ
グ３０２を順次確認する（ステップ５４７）。The operation of the apparatus after the display process 16 has been connected to the receiving process 15 in this way and display has been started by the user's screen operation is described next. By the procedure described earlier, performance data is transferred from the sampling processes 4 through the collection process 3 to the receiving process 15 (steps 504, 527, 548). At this point the receiving process 15 has issued a select system call in the internal processing routine 308 and is waiting for messages to arrive at the input/output buffers 301 and 305. The arrival of a message containing performance data from the collection process 3 causes the select system call to return to the internal processing routine 308, which then passes control to the input analysis routine 306. The input analysis routine 306 issues a receive system call to read the arrived message into the input/output buffer 304, checks the message identifier to confirm that the message contains performance data, and returns control to the internal processing routine 308. The internal processing routine 308 then checks, one after another, the request flags 302 associated with the input/output buffers 301 connected to the display processes 16 (step 547).

【００４３】もしいずれかの入出力バッファ３０１に付
随する要求フラグ３０２がセットされていれば性能デー
タを含むメッセージを入出力バッファ３０４からその一
つの入出力バッファ３０１にメモリコピーし、出力制御
ルーチン３０７を起動する。出力制御ルーチン３０７は
ｓｅｎｄシステムコールを発行して入出力バッファに格
納された上記メッセージをその入出力バッファ３０１に
接続された表示プロセスへ転送し、その入出力バッファ
３０１に付随する要求フラグ３０２をクリアする（ステ
ップ５４９）。この処理を全ての接続されている表示プ
ロセス１６に対して設けられている全ての要求フラグ３
０２について繰り返すことにより、受信プロセス１５は
全ての表示プロセス１６に対して性能データを分配す
る。蓄積プロセスが入出力バッファ３０１に接続されて
いる場合も全く同様である。このように複数の表示プロ
セス１６に対して性能データを分配する処理を、モニタ
用計算機１１上にある受信プロセス１５が行うため、起
動された表示プロセス１６の数が増えた場合でも並列計
算機１上にある採取プロセス４や収集プロセス３にはそ
の影響はなく、監視対象の並列計算機の負荷が増加しな
い。If the request flag 302 associated with any input/output buffer 301 is set, the message containing the performance data is memory-copied from the input/output buffer 304 into that input/output buffer 301, and the output control routine 307 is started. The output control routine 307 issues a send system call to transfer the message stored in the input/output buffer to the display process connected to that input/output buffer 301, and then clears the request flag 302 associated with that input/output buffer 301 (step 549). By repeating this processing for every request flag 302 provided for the connected display processes 16, the receiving process 15 distributes the performance data to all of the display processes 16. The same applies when a storage process is connected to an input/output buffer 301. Because the distribution of performance data to a plurality of display processes 16 is performed by the receiving process 15 on the monitor computer 11, an increase in the number of started display processes 16 does not affect the sampling processes 4 or the collection process 3 on the parallel computer 1, and the load on the monitored parallel computer does not increase.
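The flag-driven distribution just described can be sketched as a single loop. The dictionary layout, the client names, and the `sent` list that stands in for the send system call are assumptions made for illustration only.

```python
def distribute(message: bytes, clients: list) -> int:
    """Copy the message to every client whose request flag is set, send it, and clear the flag.

    Returns the number of clients served.
    """
    served = 0
    for client in clients:
        if not client["request_flag"]:
            continue                              # no pending request: nothing is sent
        client["buffer"] = bytes(message)         # copy from buffer 304 into this client's buffer 301
        client["sent"].append(client["buffer"])   # stand-in for the send system call
        client["request_flag"] = False            # the request has been satisfied
        served += 1
    return served


clients = [
    {"name": "display-A", "request_flag": True, "buffer": b"", "sent": []},
    {"name": "display-B", "request_flag": False, "buffer": b"", "sent": []},
    {"name": "storage", "request_flag": True, "buffer": b"", "sent": []},
]
served = distribute(b"perf-step-1", clients)
```

Only the clients that asked for data receive it, and the fan-out cost is paid entirely on the monitor side, so the monitored machine sends each message once regardless of how many clients are attached.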

【００４４】受信プロセス１５から性能データの転送を
受けた表示プロセス１６は、受信した全てのデータのう
ち表示に必要な項目別性能データを参照して表示画面を
作成し、入出力装置１２に描画する（ステップ５６
６）。表示に必要なデータとは、図７のウィンドウを持
つ表示プロセスの場合には、ＣＰＵ利用率、メモリ利用
率、通信回数という、その表示プロセスで定められた３
つの項目別性能データであり、図８のウィンドウを持つ
表示プロセスの場合には、その表示プロセスに対して利
用者が指示した一つの項目別性能データである。On receiving performance data from the receiving process 15, the display process 16 builds the display screen by referring to those item-specific performance data, among all the data received, that are needed for display, and draws it on the input/output device 12 (step 566). For a display process with the window of FIG. 7, the data needed for display are the three item-specific performance data fixed by that display process: CPU utilization, memory utilization, and communication count. For a display process with the window of FIG. 8, it is the one item of performance data that the user has specified for that display process.
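The selection step above, in which each display process picks what it needs from the full set of received data, can be sketched as follows. The item names and values are invented for illustration, and `select_items` is a hypothetical helper.

```python
# All item-specific data for one node at one time step (names and values are made up).
sample = {
    "cpu_util": 0.72,
    "mem_util": 0.40,
    "comm_count": 15,
    "page_faults": 3,
}

# A first-type display process uses a set of items fixed by the process itself.
FIXED_ITEMS = ("cpu_util", "mem_util", "comm_count")


def select_items(data: dict, items) -> dict:
    """Keep only the items a display process needs; the rest is ignored."""
    return {name: data[name] for name in items}


first_type_view = select_items(sample, FIXED_ITEMS)
# A second-type display process uses the single item chosen by the user.
second_type_view = select_items(sample, ["page_faults"])
```

Because every client receives the same full record, changing the displayed item needs no change on the sampling side.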

【００４５】表示プロセス１６は、処理５６４から処理
５６６までの表示動作を繰り返すことによって、次々に
送られてくる性能データを表示していく。すなわち、表
示プロセス１６の制御ルーチン４０３は、データ転送要
求を受信プロセス１５に送信する（ステップ５６４）。
そのときの具体的な動作は後に説明するとおりである。
制御ルーチン４０３は、その後、利用者のウィンドウ操
作あるいは受信プロセスからの性能データを含むメッセ
ージの到着のいずれかを検出するための待ち状態にな
る。性能データを含むメッセージが到着すると、制御ル
ーチン４０３はｒｅｃｅｉｖｅシステムコールを発行し
てメッセージを送受信バッファ４０４に読み込む（ステ
ップ５６５）、処理を入力解析ルーチン４０１に移す。
入力解析ルーチン４０１は、受信プロセス１５から送ら
れたメッセージの識別子が、性能データを含むメッセー
ジの識別子であることを確認する。次いで描画処理ルー
チン４０２を起動し、描画処理ルーチン４０２は、ウィ
ンドウ上のグラフの高さを性能データに応じて変化させ
る（ステップ５６６）。描画が終了すると処理は制御ル
ーチン４０３に戻り、制御ルーチン４０３は、次の時間
ステップの性能データに対するデータ転送要求メッセー
ジを送受信バッファ４０４に作成し、ｓｅｎｄシステム
コールを発行してこのメッセージを受信プロセス１５に
送信する（ステップ５６４）。送信後、制御ルーチン４
０３は再び上記待ち状態に戻る。The display process 16 displays the performance data that arrives in succession by repeating the display operation of steps 564 through 566. Specifically, the control routine 403 of the display process 16 sends a data transfer request to the receiving process 15 (step 564); the details of this operation are described later. The control routine 403 then enters a wait state in which it detects either a window operation by the user or the arrival of a message containing performance data from the receiving process. When a message containing performance data arrives, the control routine 403 issues a receive system call to read the message into the transmission/reception buffer 404 (step 565) and passes control to the input analysis routine 401. The input analysis routine 401 confirms that the identifier of the message sent from the receiving process 15 is that of a message containing performance data. It then starts the drawing routine 402, which changes the height of the graph in the window according to the performance data (step 566). When drawing is finished, control returns to the control routine 403, which builds in the transmission/reception buffer 404 a data transfer request message for the performance data of the next time step and issues a send system call to transmit it to the receiving process 15 (step 564). After sending, the control routine 403 returns to the wait state.

【００４６】一方、表示の停止は、各時間ステップに対
する性能データの描画の後に送信している性能データ転
送要求メッセージの送信を中止することにより実現され
る。使用者が表示ウインド６１０内のボタン６０５を操
作すると、制御ルーチン４０３は上記待ち状態から抜け
て入力解析ルーチン４０１に制御を移す。入力解析ルー
チン４０１がその要求を解析し、押されたボタンが停止
ボタン６５０であることを制御ルーチン４０３に伝え
る。制御ルーチン４０３は、この状態が検出されると、
その後、受信プロセス１５に対して性能データ転送要求
メッセージの送信を行わない。データ転送要求が行われ
ない表示プロセス１６は対応する要求フラグ３０２をセ
ットしない。したがって、受信プロセス１５はそのよう
な表示プロセス１６には性能データを送信しない。した
がって、その表示プロセス１６の表示は停止する。Display is stopped, on the other hand, by ceasing to send the performance data transfer request message that is otherwise sent after the performance data for each time step has been drawn. When the user operates the button 605 in the display window 610, the control routine 403 leaves the wait state and passes control to the input analysis routine 401. The input analysis routine 401 analyzes the request and informs the control routine 403 that the pressed button is the stop button 650. Once this state is detected, the control routine 403 sends no further performance data transfer request messages to the receiving process 15. For a display process 16 that makes no data transfer request, the corresponding request flag 302 is not set, so the receiving process 15 sends no performance data to that display process 16, and its display therefore stops.
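The request, receive, and draw cycle and the way stopping falls out of it can be sketched with a toy model. The class `DisplayProcess`, the iterator used as a data source, and the `drawn` list that stands in for the window are assumptions for illustration. The request and the reply are collapsed into one call here, whereas the text describes separate send and receive steps.

```python
class DisplayProcess:
    """Toy model of the pull-based display cycle (all names are illustrative)."""

    def __init__(self, source):
        self.source = source   # iterator yielding performance-data messages
        self.running = True    # cleared when the stop button is pressed
        self.drawn = []        # stand-in for drawing in the window

    def press_stop(self):
        self.running = False

    def step(self) -> bool:
        """Request, receive, and draw one message. Returns False if no request was sent."""
        if not self.running:
            return False                 # no transfer request, so no data will arrive
        message = next(self.source)      # request and reply, collapsed into one call
        self.drawn.append(message)       # draw the received data
        return True


display = DisplayProcess(iter(["t0", "t1", "t2", "t3"]))
display.step()
display.step()
display.press_stop()
stepped_after_stop = display.step()
```

Stopping needs no dedicated message in this scheme: a client that stops asking simply stops being served.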

【００４７】表示プロセス１６の終了処理は、表示ウイ
ンド６１０のメニュー操作により実行される。停止処理
の場合と同様に、終了処理の場合も利用者のウィンドウ
操作を制御ルーチン４０３が検出して上記待ち状態から
抜け、入力解析ルーチン４０１に処理を移す。入力解析
ルーチン４０１はメニューから停止が選ばれたことを検
出し、制御ルーチン４０３に伝える。制御ルーチン４０
３は送受信バッファ４０４に終了通知メッセージを形成
し、ｓｅｎｄシステムコールによりこれを受信プロセス
１５に送出する（ステップ５６７）。Termination of the display process 16 is carried out by a menu operation in the display window 610. As with stopping, the control routine 403 detects the user's window operation, leaves the wait state, and passes control to the input analysis routine 401. The input analysis routine 401 detects that stop has been selected from the menu and informs the control routine 403. The control routine 403 builds a termination notification message in the transmission/reception buffer 404 and sends it to the receiving process 15 with a send system call (step 567).

【００４８】終了通知メッセージを受けた受信プロセス
１５は、その表示プロセス１６との接続状態を解消し、
割り当てていた入出力バッファならびに要求フラグを解
放する（ステップ５５０）。接続が解消されたら、表示
プロセス１６は終了処理（ステップ５６８）を行ったう
えで終了する（ステップ５６９）。全ての表示プロセス
１６との接続が解消されたら、システム管理者は、受信
プロセス１５の終了処理を開始することが可能となる。
受信プロセス１５の終了処理を開始するには、受信プロ
セス１５が起動されている計算機１１を制御するＯＳが
提供する割り込み機能を利用して受信プロセス１５に対
して割り込み信号を入力する。割り込み信号を受け取っ
た受信プロセス１５は収集プロセス３に対して終了要求
を発行する（ステップ５５１）。終了要求を受けた収集
プロセス３は、全ての採取プロセス４に対して終了要求
を発行し（ステップ５２８）、それを受けた採取プロセ
ス４は終了処理を行う（ステップ５０５）。全ての採取
プロセス４に対する接続が解消されたら、収集プロセス
３は受信プロセス１５との接続を解消する。その後、収
集プロセス３および受信プロセス１５は、各々独立に終
了処理を行い、本モニタリングシステム内の全てのプロ
セスが終了する（ステップ５２９，５５２）。On receiving the termination notification message, the receiving process 15 closes the connection with that display process 16 and releases the input/output buffer and request flag that had been allocated to it (step 550). Once the connection is closed, the display process 16 performs its termination processing (step 568) and exits (step 569). When the connections with all display processes 16 have been closed, the system administrator can begin terminating the receiving process 15. To do so, an interrupt signal is sent to the receiving process 15 using the interrupt facility provided by the OS that controls the computer 11 on which the receiving process 15 is running. On receiving the interrupt signal, the receiving process 15 issues a termination request to the collection process 3 (step 551). On receiving that request, the collection process 3 issues a termination request to all of the sampling processes 4 (step 528), and each sampling process 4 that receives it performs its termination processing (step 505). When the connections to all of the sampling processes 4 have been closed, the collection process 3 closes its connection with the receiving process 15. The collection process 3 and the receiving process 15 then each perform termination processing independently, and all processes in the monitoring system have ended (steps 529, 552).

【００４９】次に、蓄積プロセス１７が記憶装置１３に
性能データを蓄積し、その蓄積したデータを表示プロセ
ス１６で表示する方法について説明する。図５に示す蓄
積プロセス１７は、表示プロセス１６と同様に、使用者
によって起動される。蓄積プロセス１７が起動される
と、接続処理ルーチン４５５が受信プロセス１５へ接続
要求を発行し、表示プロセス１６が実行したのと同様の
手順で受信プロセス１５と接続し、その蓄積プロセス１
７の動作を使用者に制御させるための制御ウィンドウ
（図示せず）を表示する。制御ウィンドウは、表示プロ
セス１６の表示ウィンドウ、たとえば６１０（図７）と
同様であるが、表示エリア６２０はなく、制御エリア６
３０のみで構成される。制御エリア６３０で使用者が制
御できるのは、蓄積するデータ項目の選択、蓄積するフ
ァイル名の指定、蓄積の開始ならびに終了である。制御
エリア上の操作で蓄積するデータ項目ならびに蓄積する
ファイル名を設定した後、データ蓄積開始の操作を行う
と、蓄積プロセス１７は表示プロセス１６と同様に性能
データの受信を行う。入力解析ルーチン４５１は、受信
プロセス１５から送られたメッセージの識別子が、性能
データを含むメッセージの識別子であることを確認す
る。この性能データは送受信バッファ４５２に格納さ
れ、その後、この性能データは、データ整形ルーチン４
５３で蓄積用のデータ形式に整形され、出力ルーチン４
５４によって磁気ディスク記憶装置などの記憶装置１３
へ蓄積される。Next, the way in which the storage process 17 stores performance data in the storage device 13 and the stored data is displayed by a display process 16 is described. The storage process 17 shown in FIG. 5 is started by the user in the same way as a display process 16. When the storage process 17 is started, the connection processing routine 455 issues a connection request to the receiving process 15, connects to the receiving process 15 by the same procedure as the display process 16, and displays a control window (not shown) through which the user controls the operation of the storage process 17. The control window is similar to the display window of a display process 16, for example the window 610 (FIG. 7), but has no display area 620 and consists only of the control area 630. In the control area 630 the user can select the data items to be stored, specify the name of the file in which to store them, and start and end storage. After the data items and the file name have been set in the control area, starting data storage causes the storage process 17 to receive performance data in the same way as a display process 16. The input analysis routine 451 confirms that the identifier of the message sent from the receiving process 15 is that of a message containing performance data. The performance data is placed in the transmission/reception buffer 452, formatted into the storage data format by the data shaping routine 453, and written by the output routine 454 to the storage device 13, such as a magnetic disk storage device.

【００５０】蓄積するファイル１００４の形式は、図９
（ｄ）に示すように、まず最初に並列計算機のノード総
数、前述した各ノードの属性、データを採取したノード
一覧などといった並列計算機の構成定義情報を格納す
る。その情報の後に、複数のブロックが記憶される。各
ブロックは、複数のノードの一つと、複数の時間ステッ
プの一つに対応する。各ブロックは、複数のデータレコ
ードからなり、各データレコードは、同じ時間ステップ
に同じノードから取得された、異なる測定項目に対する
性能データを含む。ブロックの最初と最後には、それぞ
れ図９（ａ）のヘッダレコード１００１および図９
（ｂ）のブロック終了レコード１００３が置かれ、ブロ
ックの境界を示す。ヘッダレコードには、ブロック全体
の長さを示すブロック長やそのブロックに属する複数の
データレコードを生成したノードの番号およびそれらの
データレコードに含まれる性能データが取得された時間
ステップを表す時刻情報等が含まれている。さらに、全
てのレコードの先頭にはレコードの種類を示すタイプコ
ードおよびそのレコードの長さを示すレコード長があ
る。ブロックの並ぶ順は、まず、特定のノードと特定の
時間ステップに対応する一つのブロックが記憶され、次
に同じ時間ステップに対する他のノードに対する他の複
数のノードが記憶される。同じ時間ステップに対する全
てのノードに対する複数のブロックが蓄積された後に、
後続の複数の時間ステップに対する複数のブロックが並
ぶ。The format of the stored file 1004 is shown in FIG. 9(d). The file begins with the configuration definition information of the parallel computer, such as the total number of nodes, the attributes of each node described earlier, and the list of nodes from which data was sampled. This information is followed by a plurality of blocks. Each block corresponds to one of the nodes and one of the time steps. Each block consists of a plurality of data records, and each data record contains performance data for a different measurement item obtained from the same node at the same time step. The header record 1001 of FIG. 9(a) and the block end record 1003 of FIG. 9(b) are placed at the beginning and end of each block, respectively, to mark the block boundaries. The header record contains, among other things, the block length indicating the length of the whole block, the number of the node that generated the data records belonging to the block, and time information indicating the time step at which the performance data in those data records was obtained. In addition, every record begins with a type code indicating the kind of record and a record length indicating the length of that record. The blocks are ordered as follows: first, one block corresponding to a particular node and a particular time step is stored, followed by the blocks for the other nodes at the same time step. After the blocks for all nodes at that time step have been stored, the blocks for the subsequent time steps follow.
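The block layout described above can be sketched with Python's `struct` module. The text fixes only which fields exist (type code and record length on every record; block length, node number, and time information in the header). The field widths, byte order, type code values, and the item-id/value shape of a data record used below are assumptions chosen for illustration.

```python
import struct

TYPE_HEADER, TYPE_DATA, TYPE_END = 1, 2, 3   # hypothetical type codes
PREFIX = struct.Struct("<HH")                 # type code, record length (on every record)
HEADER_BODY = struct.Struct("<IId")           # block length, node number, time information
DATA_BODY = struct.Struct("<Hd")              # item id, value (assumed data record layout)


def pack_record(type_code: int, body: bytes = b"") -> bytes:
    """Prefix a record body with its type code and total record length."""
    return PREFIX.pack(type_code, PREFIX.size + len(body)) + body


def pack_block(node: int, timestamp: float, items: dict) -> bytes:
    """Build one block: header record, one data record per item, block end record."""
    data = b"".join(pack_record(TYPE_DATA, DATA_BODY.pack(i, v)) for i, v in items.items())
    end = pack_record(TYPE_END)
    block_length = PREFIX.size + HEADER_BODY.size + len(data) + len(end)
    header = pack_record(TYPE_HEADER, HEADER_BODY.pack(block_length, node, timestamp))
    return header + data + end


def unpack_block(buf: bytes, offset: int = 0):
    """Parse one block at offset; return (node, timestamp, items, offset of next block)."""
    type_code, length = PREFIX.unpack_from(buf, offset)
    if type_code != TYPE_HEADER:
        raise ValueError("block does not start with a header record")
    block_length, node, timestamp = HEADER_BODY.unpack_from(buf, offset + PREFIX.size)
    pos, end, items = offset + length, offset + block_length, {}
    while pos < end:
        type_code, length = PREFIX.unpack_from(buf, pos)
        if type_code == TYPE_DATA:
            item, value = DATA_BODY.unpack_from(buf, pos + PREFIX.size)
            items[item] = value
        pos += length                          # the record length lets a reader skip any record
    return node, timestamp, items, end


# Two blocks for the same time step: node 0 first, then node 1.
blob = pack_block(0, 1.0, {1: 0.5, 2: 0.25}) + pack_block(1, 1.0, {1: 0.75, 2: 0.125})
first = unpack_block(blob)
second = unpack_block(blob, first[3])
```

Storing the block length in the header lets a reader jump from block to block without parsing the data records in between, which is what counting blocks or seeking to a time step requires.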

【００５１】このようにして蓄積プロセス１７により蓄
積されたデータは、いずれかの表示プロセス１６で表示
することが可能となる。この表示に使用する表示プロセ
スは、すでに起動され、図７または図８に示す表示ウイ
ンドウに性能データを表示している表示プロセスでもよ
く、まだ起動されていない表示プロセスを使用してもよ
い。以下では、すでに起動され、性能データの表示に使
用されている表示プロセスを使用する場合について説明
する。The data stored by the storage process 17 in this way can be displayed by any display process 16. The display process used for this purpose may be one that has already been started and is displaying performance data in the display window of FIG. 7 or FIG. 8, or one that has not yet been started. The following describes the case of using a display process that has already been started and is being used to display performance data.

【００５２】図１０のフローチャートは表示プロセス１
６が蓄積データを読み込んで表示を行う手順を示したも
のである。まず、利用者が事前に入出力装置１２から表
示をする蓄積データが格納されたファイル名を入力する
（ブロック１１０１）。その後、利用者が表示プロセス
１６に対して蓄積データ表示モードへの切り替えの要求
を入力すると、表示プロセス１６は蓄積データ表示モー
ドへ切り替える（ステップ１１０２）。この時、表示プ
ロセス１６は、受信プロセスへのデータ転送要求の送信
を停止し、入力先切替ルーチン４０５（図５）がデータ
入力先を受信プロセス１５から記憶装置１３上の指定し
たファイルへ切り替える。The flowchart of FIG. 10 shows the procedure by which the display process 16 reads and displays stored data. First, the user enters in advance, from the input/output device 12, the name of the file holding the stored data to be displayed (block 1101). When the user then enters a request to the display process 16 to switch to the stored data display mode, the display process 16 switches to that mode (step 1102). At this point the display process 16 stops sending data transfer requests to the receiving process, and the input source switching routine 405 (FIG. 5) switches the data input source from the receiving process 15 to the specified file on the storage device 13.

【００５３】表示プロセス１６はこのファイルをオープ
ンすると（ステップ１１０４）、表示プロセス１６はフ
ァイルの先頭にある並列計算機の構成定義情報を読み込
む（ステップ１１０５）。この時、それまで使用してい
た構成定義情報をバッファ（図示せず）へ退避し、ファ
イルから読み込んだ構成定義情報をもとにグラフのレイ
アウト等を計算し直し、そのグラフを表示画面に表示す
る。次に、ファイルを最後まで読んでブロック数を数
え、先に読み込んだ構成定義情報のノード数でこのブロ
ック数を割ることにより、蓄積されている複数のブロッ
クが測定された時間ステップ数を計算する（ステップ１
１０６）。そして、蓄積データ表示機能の制御画面を表
示する（ステップ１１０７）。When the display process 16 opens this file (step 1104), it reads the configuration definition information of the parallel computer at the head of the file (step 1105). At this time, the configuration definition information used up to that point is saved in a buffer (not shown), the graph layout and the like are recalculated from the configuration definition information read from the file, and the graph is displayed on the display screen. Next, the display process reads the file to the end, counts the blocks, and divides the block count by the number of nodes given in the configuration definition information just read, thereby calculating the number of time steps over which the stored blocks were measured (step 1106). It then displays the control screen of the stored data display function (step 1107).
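The time-step calculation of step 1106 is a single division, sketched below. The function name is hypothetical, and the check for a non-zero remainder is an added assumption: the text does not say how a file with an incomplete final time step is handled.

```python
def time_step_count(block_count: int, node_count: int) -> int:
    """Number of stored time steps: one block exists per node per time step."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    steps, remainder = divmod(block_count, node_count)
    if remainder:
        # Assumption: treat a partial final time step as an error rather than guess.
        raise ValueError("file does not hold a whole number of time steps")
    return steps


steps = time_step_count(block_count=12, node_count=4)
```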

[0054] FIG. 11 shows an example of the control screen. The file name display field 902 shows the name of the file currently being read, and the data count display field 903 shows the number of time steps of accumulated data calculated in processing block 1106. The display range input fields 904, 905, and 906 accept, respectively, the first time step of the range to be displayed, the last time step of that range, and the number of time steps to skip, that is, the interval in time steps at which data is displayed. The slider 907 indicates which time step is currently displayed, and moving it changes the displayed data. Buttons 908, 909, 910, and 911 specify, respectively, stepping back one time step, stopping continuous display, starting continuous display, and stepping forward one time step.
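As a rough illustration of how the three display range fields 904, 905, and 906 could determine which time steps are shown, the following Python sketch treats the third field as a stride. The 1-based numbering, the function name `steps_to_display`, and the validation rules are assumptions made for the example.

```python
from typing import List


def steps_to_display(first: int, last: int, stride: int, total: int) -> List[int]:
    """List the time steps selected by fields 904 (first), 905 (last), 906 (stride).

    Time steps are assumed to be numbered from 1 to total, where total is
    the value shown in the data count field 903.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    if not (1 <= first <= last <= total):
        raise ValueError("display range must lie within 1..total")
    return list(range(first, last + 1, stride))


if __name__ == "__main__":
    print(steps_to_display(first=1, last=10, stride=3, total=10))  # [1, 4, 7, 10]
```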

[0055] In step 1109, when the user uses the control screen 901 to specify the display time, that is, the time step of the data to be displayed, the display process 16 reads the contents of the file and searches for the data of that time step (step 1110). It then draws the data on the display device (step 1111). Alternatively, when the user presses button 910 on the control screen 901 in step 1109 to start continuous display, the display process repeatedly reads one time step of data following the data currently read (step 1113) and displays it (step 1114), until either the last time step of the display range specified in the display range input field 905 is reached or button 909 is used to stop continuous display (step 1112). The interval at which the display is repeated may be calculated from the time information in the header record 1001 stored in the file, or may be specified in advance. In either case, the clock generation routine 407 in the display process 16 of FIG. 5 measures the time, and drawing is performed once the specified time has elapsed.
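The continuous display loop of steps 1112 to 1114 can be sketched as follows. This is a simplified Python illustration: `read_step`, `draw`, and `stop_requested` are hypothetical callbacks standing in for the file reader, the drawing routine, and button 909, and a plain sleep stands in for the clock generation routine 407.

```python
import time
from typing import Any, Callable, Iterable


def play(
    steps: Iterable[int],
    read_step: Callable[[int], Any],
    draw: Callable[[Any], None],
    interval_s: float,
    stop_requested: Callable[[], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Display successive time steps until the range ends or a stop is requested.

    Returns the number of time steps actually drawn.
    """
    shown = 0
    for step in steps:
        if stop_requested():       # step 1112: stop button pressed?
            break
        draw(read_step(step))      # steps 1113 and 1114: read, then display
        shown += 1
        sleep(interval_s)          # wait until the next drawing time
    return shown
```

Passing the sleep function as a parameter keeps the loop testable without real delays; the interval itself could come from the header record time stamps or from a user setting, as the paragraph above notes.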

[0056] When the user instructs termination of the accumulated data display function from an input device such as a mouse (step 1108), termination processing is executed: the input destination switching routine 405 switches the input destination back to the receiving process, and the display process 16 returns to the normal data display mode (step 1115). At this time, the display process 16 discards the configuration definition information read from the file, restores the earlier configuration definition information that was saved in the buffer, and returns the graph in the display area to the real-time display state it had before switching to the accumulated data display mode. Thus, by keeping the storage process 17 and the display process 16 running, past data can be redisplayed while current data is being displayed in real time. Data accumulated beforehand can also be used to analyze the operating status of the parallel computer after the fact. Note that storage processes 17 can be started in parallel on the respective monitoring computers 11, so that each storage process stores the performance data, in parallel, in the storage device 13 of the computer on which it was started.

[0057] Furthermore, from the viewpoint of the receiving process 15, the storage process 17 is not distinguished from the display process 16. The fact that a plurality of display processes 16 can be started for one receiving process 15 therefore means that a plurality of storage processes 17 can be started as well. Accordingly, a plurality of users can use the data storage function at the same time by each starting a different storage process 17, and starting these storage processes does not increase the load on the parallel computer.
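The point that the receiving process treats display and storage processes alike can be illustrated with a small Python sketch of the fan-out. The class name `ReceivingProcess`, the method names, and the rule that a transfer request is consumed by one delivery are assumptions made for the example; the monitored computer sends each record only once, however many consumers are attached.

```python
from typing import Callable, Dict

Sink = Callable[[dict], None]  # a display or storage consumer, not distinguished


class ReceivingProcess:
    """Distributes each received record to every consumer that asked for data."""

    def __init__(self) -> None:
        self._pending: Dict[str, Sink] = {}

    def transfer_request(self, consumer_id: str, sink: Sink) -> None:
        # A consumer (display or storage process) asks for the next record.
        self._pending[consumer_id] = sink

    def on_performance_data(self, data: dict) -> int:
        # Called once per record arriving from the monitored computer.
        pending, self._pending = self._pending, {}
        for sink in pending.values():
            sink(data)
        return len(pending)  # number of consumers served


if __name__ == "__main__":
    shown, stored = [], []
    rx = ReceivingProcess()
    rx.transfer_request("display-1", shown.append)
    rx.transfer_request("storage-1", stored.append)
    print(rx.on_performance_data({"cpu": 0.42}))  # prints 2
```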

[0058] <Second Embodiment of the Invention> The present invention is also applicable to parallel computers that differ from the one used in the above embodiment. In the above embodiment, only one specific node of the parallel computer was connected to the external network. However, among the parallel computers already developed, there are some in which a plurality of specific nodes are connected to the external network. In such a parallel computer, a collection process can be placed on each of those specific nodes, so that the collection of performance data within the parallel computer and its transfer to the receiving process are divided among, and executed by, the plurality of collection processes.

[0059] That is, using communication between the receiving process and the collection process started on each specific node, each collection process instructs a subset of the nodes, including its own specific node, to sample performance data, gathers the performance data sampled by those nodes, and transfers it to the receiving process. The collection processes started on the specific nodes do this in parallel with one another. As a result, the load on each specific node in this second embodiment is lower than the load on the single specific node that, in the preceding embodiment, gathers the performance data sampled by all the nodes. In such a parallel computer as well, the load on the parallel computer does not increase even when a plurality of users monitor its performance data.
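One way to picture the division of work among several collection processes is the Python sketch below. The text only requires that each collection process handle a subset of nodes that includes its own node; the round-robin assignment and the function name `assign_nodes` are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence


def assign_nodes(node_ids: Sequence[int], collector_ids: Sequence[int]) -> Dict[int, List[int]]:
    """Split the nodes among the collector nodes.

    Each collector node always handles itself; the remaining nodes are
    dealt out round-robin (a hypothetical policy, not from the source).
    """
    if not collector_ids:
        raise ValueError("at least one collector node is required")
    groups: Dict[int, List[int]] = {c: [c] for c in collector_ids}
    others = [n for n in node_ids if n not in groups]
    for i, node in enumerate(others):
        groups[collector_ids[i % len(collector_ids)]].append(node)
    return groups


if __name__ == "__main__":
    groups = assign_nodes(node_ids=range(8), collector_ids=[0, 4])
    print({c: len(nodes) for c, nodes in groups.items()})  # {0: 4, 4: 4}
```

With a single collector the same call assigns all eight nodes to it, which corresponds to the first embodiment; with two collectors each one gathers from four nodes, which is the load reduction the paragraph describes.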

[0060] <Third Embodiment of the Invention> Some other parallel computers already developed are divided into a plurality of partitions, each containing a plurality of nodes. In this case, in each partition, the plurality of nodes in that partition can execute one job in parallel. It is of course also possible to execute a plurality of jobs in each partition. Nevertheless, this arrangement is suited to having one user occupy and use one partition. In such a computer system, it is effective to monitor the operating status partition by partition.

[0061] For example, in FIG. 13 the parallel computer 1 is divided into two partitions, partition 1 and partition 2 (100A, 100B). Of the plurality of nodes in each partition, one predetermined node is connected to the external network 21. In this case, the sampling process 4 is started on every node, and the collection process 3 is started on the specific node 2 of each partition that is connected to the external network 21. To monitor the performance data of partition 1, a first receiving process 15A is started on a first monitoring computer 11 and communicates with the collection process 3 started on the specific node of the first partition. In addition, a display process 16A or a storage process 17A is started on that monitoring computer 11 and connected to the first receiving process 15A. As in the first embodiment, the display process 16A or storage process 17A receives the performance data of partition 1 from the receiving process 15A. If another display process or storage process (not shown) is started on another computer (not shown), that process also receives the performance data through the first receiving process.

[0062] Similarly, for the other partition 2, a second receiving process 15B, a display process 16B or storage process 17B, and possibly further display or storage processes (not shown) are started. In this embodiment, performance data can be sampled for each partition independently of the other partitions. Furthermore, the load on the parallel computer does not increase either when a display process and a storage process that monitor the performance data of the same partition are both started on the same monitoring computer, or when a plurality of display processes that monitor the performance data of the same partition are started on different monitoring computers.

[0063] <Fourth Embodiment of the Invention> The most extreme case of the second embodiment is one in which every node of the parallel computer is connected to the external network. In this case, a process that combines the functions of the sampling process and the collection process is started on each node, and the receiving process communicates with this process on each node to receive that node's performance data. Because the same process for sampling performance data is started on every node, the load imposed by performance monitoring is distributed more evenly across nodes than in the first embodiment.

[0064] <Fifth Embodiment of the Invention> The present invention is equally applicable to a distributed computer system made up of a plurality of computers connected by an external network. In this case, a sampling process is placed on each of the plurality of computers belonging to the distributed system; a collection process is placed on one of those computers, for example the computer that manages the network; a receiving process is placed on one of the computers to be monitored; and a display process or storage process is placed either on some of the computers to be monitored or on a computer other than the computers to be monitored. The receiving process may also be started on a computer other than the computers to be monitored. This arrangement differs from the first embodiment in that the communication between each sampling process and the collection process, and the communication between the collection process and the receiving process, take place over the external network. It is the same as the first embodiment, however, in that the receiving process can distribute the performance data to a plurality of display processes or storage processes.

[0065] <Modifications> The present invention is not limited to the embodiments described above and encompasses various modifications, including those illustrated below.
(1) Use of a high-speed communication procedure inside the parallel computer
The embodiments were described on the assumption that communication inside the parallel computer, that is, the transfer of performance data between the sampling process 4 and the collection process 3, uses TCP/IP. However, many parallel computers use an internal network that can transfer multiple data items in parallel over different paths. For communication between nodes over such an internal network, many of them use an internal communication procedure that is lighter and faster than TCP/IP. In such a computer, when performance data sampled on other nodes is transferred to a collection process started on one or several nodes of the computer, as in the first, second, and third embodiments, using the parallel computer's own high-speed internal communication protocol for this inter-node communication is effective in speeding up the transfer of performance data.

[0066] In this case, in the collection process 3 shown in FIG. 3, the input analysis routine 205 and the output control routine 206 are replaced by two pairs of routines: an input analysis routine and an output control routine for internal communication, and an input analysis routine and an output control routine for external communication. One pair or the other is used depending on whether internal or external communication is being performed. The input/output buffer 201 is used by the former pair, and the input/output buffer 203 by the latter. For example, data sent from a sampling process 4 to the input/output buffer 201 by the high-speed internal communication procedure is received by the input analysis routine that handles high-speed internal communication with the sampling processes 4, and is first passed to the internal processing routine 207. The internal processing routine 207 calls the output control routine 206 that handles external network communication with the receiving process 15 and hands the data over to it. That output control routine 206 sends the data to the receiving process 15 using a protocol, such as TCP/IP, that can communicate with the monitoring computer 11. Compared with having every node of the parallel computer use a relatively heavy communication protocol such as TCP/IP, this modification reduces the monitoring load imposed on the parallel computer 1.

[0067] (2) Designation of the items to be sampled from the receiving process
In the first embodiment, the item-specific performance data sampled by the sampling process was determined in advance, but these items may instead be designated from the receiving process. To this end, when the system administrator starts the receiving process, the administrator enters information designating these data into the computer 11 on which the receiving process is started.

[0068] (3) Selection of some item-specific performance data in the receiving process
In the first embodiment, the multiple item-specific performance data sampled by the sampling process (for example, CPU utilization, memory utilization, number of magnetic disk storage accesses, and number of communications) were always transferred through the collection process 3 and the receiving process 15 all the way to the display process 16. If, instead, the receiving process 15 selects only the items needed for display and transfers them to the display process 16, the amount of data transferred can be reduced. To achieve this, immediately after a display process 16 is started by a user and its connection to the receiving process 15 is established (step 562 (FIG. 6(b))), the display process 16 notifies the receiving process 15 of the performance data it is to display. The receiving process 15 records which display process requires which item-specific performance data. When the internal processing routine 308 in the receiving process 15 copies performance data in memory from the input/output buffer 304 to the input/output buffer 301 connected to one of the display processes, it does not copy the entire received performance data but selects and copies only the item-specific performance data recorded as required.
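The selective copy described in modification (3) can be sketched in Python as follows. The record is modeled as a dictionary keyed by item name; the class name `FilteringReceiver`, the method names, and the item names are hypothetical and serve only to show the per-display selection.

```python
from typing import Dict, FrozenSet, Iterable


class FilteringReceiver:
    """Remembers which items each display process needs and copies only those."""

    def __init__(self) -> None:
        self._wanted: Dict[str, FrozenSet[str]] = {}

    def register(self, display_id: str, items: Iterable[str]) -> None:
        # Called right after the display process connects and reports its items.
        self._wanted[display_id] = frozenset(items)

    def select_for(self, display_id: str, record: Dict[str, float]) -> Dict[str, float]:
        # Copy only the registered items instead of the whole record.
        wanted = self._wanted[display_id]
        return {item: value for item, value in record.items() if item in wanted}


if __name__ == "__main__":
    rx = FilteringReceiver()
    rx.register("display-1", ["cpu_util"])
    record = {"cpu_util": 0.8, "mem_util": 0.5, "disk_accesses": 120.0, "messages": 42.0}
    print(rx.select_for("display-1", record))  # {'cpu_util': 0.8}
```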

[0069] (4) Selective sampling of item-specific performance data
In the first embodiment, the receiving process requested the collection process to gather performance data independently of whether any display process or storage process had been started. However, the receiving process can also be configured to issue this request only after some display process or storage process has been started. In that case, the display process or storage process can notify the receiving process of the item-specific performance data requested by the user, and the receiving process can request the monitored computer to sample the item-specific performance data notified by those started processes. With this method, when a new display process or storage process is started, it is desirable for the receiving process to update the item-specific performance data whose sampling it requests, so that the items required by the newly started process are sampled in addition. Because only the minimum necessary item-specific performance data is then sampled, the amount of performance data sampled is reduced.
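The update rule in modification (4) amounts to keeping the union of the items requested by all started consumers. The Python sketch below shows this under stated assumptions: `CollectionPlanner` and its methods are hypothetical names, and how the resulting set is sent to the monitored computer is left out.

```python
from typing import Dict, Iterable, Set


class CollectionPlanner:
    """Tracks which items the receiving process should ask the monitored computer to sample."""

    def __init__(self) -> None:
        self._by_consumer: Dict[str, Set[str]] = {}

    def consumer_started(self, consumer_id: str, items: Iterable[str]) -> Set[str]:
        # A newly started display or storage process reports the items it needs.
        self._by_consumer[consumer_id] = set(items)
        return self.items_to_collect()

    def items_to_collect(self) -> Set[str]:
        # Union over all consumers: the minimum set that satisfies everyone.
        return set().union(*self._by_consumer.values())


if __name__ == "__main__":
    planner = CollectionPlanner()
    planner.consumer_started("display-1", ["cpu_util"])
    print(sorted(planner.consumer_started("storage-1", ["cpu_util", "mem_util"])))
    # prints ['cpu_util', 'mem_util']
```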

[0070] (5) Batch transfer of data sampled at a plurality of time steps
In the first embodiment, data was sent from the sampling process 4 to the display process 16 at every time step at which data was sampled. As a modification, the sampling process 4 can accumulate the performance data of a plurality of time steps and send them together in a single data transmission. This modification is described below. In FIG. 2, when the output control routine 103 of the sampling process 4 receives data from the counter control/read routine 104, it stores the data at the current tail of the input/output buffer 101 and keeps track of the number of records stored in the input/output buffer 101. When the number of records in the input/output buffer 101 reaches a predetermined value, the output control routine 103 sends the data stored so far to the collection process 3 together as a single piece of data. The collection process 3 and the receiving process 15 operate in the same way as in the first embodiment. The display process 16 that finally receives this data refers to its internal clock 407 in FIG. 5 and, at fixed intervals, reads one time step of data from the transmit/receive buffer 404 and displays it. Once it has processed all the data in the transmit/receive buffer 404, the display process 16 sends a data request to the receiving process 15. The operation of the storage process 17 is unchanged; it stores the data as in the first embodiment. This reduces the number of transfers of sampled data and thus reduces the communication load.
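The buffering behavior of the output control routine in modification (5) can be sketched in Python as follows. The class name `BatchingSender` and the `send` callback are hypothetical stand-ins for the output control routine 103 and the transmission to the collection process 3.

```python
from typing import Any, Callable, List


class BatchingSender:
    """Accumulates records and sends them as one message once the batch is full."""

    def __init__(self, batch_size: int, send: Callable[[List[Any]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self._batch_size = batch_size
        self._send = send
        self._buffer: List[Any] = []

    def add(self, record: Any) -> None:
        # Append at the tail of the buffer; the list length is the record count.
        self._buffer.append(record)
        if len(self._buffer) >= self._batch_size:
            self._send(list(self._buffer))  # one transmission for the whole batch
            self._buffer.clear()

    @property
    def buffered(self) -> int:
        return len(self._buffer)


if __name__ == "__main__":
    sent: List[List[int]] = []
    sender = BatchingSender(batch_size=3, send=sent.append)
    for step in range(7):
        sender.add(step)
    print(len(sent), sender.buffered)  # prints 2 1
```

Seven time steps with a batch size of three produce two transmissions instead of seven, with one record still waiting in the buffer.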

【００７１】[0071]

[Effects of the Invention] According to the present invention, even if the number of monitoring processes (display processes or storage processes) that use the performance data sampled from the monitored computer increases, the load on the monitored computer itself hardly increases.

[Brief description of drawings]

FIG. 1 is an overall configuration diagram of a parallel computer performance monitoring system according to the present invention.

FIG. 2 is a module relationship diagram of the sampling process in the parallel computer performance monitoring system.

FIG. 3 is a module relationship diagram of the collection process in the parallel computer performance monitoring system.

FIG. 4 is a module relationship diagram of the receiving process in the parallel computer performance monitoring system.

FIG. 5 is a module relationship diagram of the display process and storage process in the parallel computer performance monitoring system.

FIG. 6(a) is a flowchart showing part of the processing procedure of the parallel computer performance monitoring method adopted in the system of FIG. 1. FIG. 6(b) is a flowchart showing another part of that processing procedure.

FIG. 7 is a diagram showing an example of a performance data display screen.

FIG. 8 is a diagram showing another example of a performance data display screen.

FIG. 9(a) is a diagram showing an example of the format of the header record of a block that the storage process stores in the storage device. FIG. 9(b) is a diagram showing an example of the format of an ordinary data record of such a block. FIG. 9(c) is a diagram showing an example of the format of the block end record of such a block. FIG. 9(d) is a diagram showing an example of the format of a file that the storage process stores in the storage device.

FIG. 10 is a flowchart showing the procedure for redisplaying accumulated data.

FIG. 11 is a diagram showing an example of a control screen for controlling the display function for accumulated log data.

FIG. 12 is an overall configuration diagram of a parallel computer to which the performance monitoring method according to the present invention is applied.

FIG. 13 is an overall configuration diagram of another parallel computer to which the performance monitoring method according to the present invention is applied.

─────────────────────────────────────────────────────

[Procedural Amendment]

[Submission date] April 8, 1997

[Procedural Amendment 1]

[Name of document to be amended] Specification

[Name of item to be amended] FIG. 6

[Method of amendment] Change

[Content of amendment]

FIG. 6 is a flowchart showing the processing procedure of the parallel computer performance monitoring method adopted in the system of FIG. 1.

───────────────────────────────────────────────────── Continuation of front page
(72) Inventor: Nobutoshi Sagawa, Central Research Laboratory, Hitachi, Ltd., 1-280 Higashi-Koigakubo, Kokubunji-shi, Tokyo
(72) Inventor: Tadashi Ota, Hitachi ULSI Engineering Corp., 5-20-1 Josuihon-cho, Kodaira-shi, Tokyo
(72) Inventor: Susumu Yamaga, Hitachi ULSI Engineering Corp., 5-20-1 Josuihon-cho, Kodaira-shi, Tokyo

Claims

[Claims]

1. A computer performance monitoring method in a computer network having a plurality of computers, including a computer system to be monitored, and a computer connection network connecting them, the method comprising the steps of: collecting, by the monitored computer system, performance data of that computer system at a plurality of timings; receiving the performance data collected by the monitored computer system, from the monitored computer system via the computer connection network, by a receiving process activated on one computer, among the plurality of computers, other than the monitored computer system; and transferring the performance data, by the receiving process, to each of a plurality of utilization processes for using the performance data, the utilization processes being activated on a plurality of computers, among the plurality of computers, that are other than the monitored computer system and are each the same as or different from the one computer.

2. The computer performance monitoring method, wherein the plurality of utilization processes include a plurality of display processes, each of which displays performance data on a display device connected to the computer on which that process is activated.

3. The computer performance monitoring method according to claim 2, wherein the performance data includes item-specific performance data for a plurality of items, the method further comprising the step of displaying, by each display process, a mutually different part of the plurality of item-specific performance data included in the performance data distributed to that display process, on the display device connected to the computer on which that display process is activated.

4. The computer performance monitoring method according to claim 3, further comprising the step of receiving, by the receiving process, a performance data transfer request sent from each display process, wherein the step of transferring the performance data from the receiving process to each display process is executed in response to the transfer request sent from that display process.

5. The computer performance monitoring method according to claim 4, wherein the step of transferring performance data to each display process comprises the steps of: each time the receiving process receives the performance data from the monitored computer system, detecting any display process that has already sent a performance data transfer request to the receiving process; when one such display process is detected, transferring the received performance data to the detected display process; and when a plurality of such display processes are detected, transferring the received performance data to each of the detected display processes.

6. The computer performance monitoring method according to claim 4, wherein the part of the item-specific performance data displayed by one of the plurality of display processes is item-specific performance data selected by a user of that display process.

7. The described computer performance monitoring method, wherein the part of the item-specific performance data displayed by one of the plurality of display processes is item-specific performance data predetermined for that display process.

8. The computer performance monitoring method according to claim 3, wherein the part of the item-specific performance data displayed by one of the plurality of display processes is item-specific performance data selected by a user of that display process.

9. The described computer performance monitoring method, wherein the part of the item-specific performance data displayed by one of the plurality of display processes is item-specific performance data predetermined for that display process.

10. The computer performance monitoring method according to claim 2, wherein the step of transferring performance data to each display process comprises the steps of: receiving, by the receiving process, a transfer request that is sent from each display process and requests transfer of a part of the item-specific performance data; each time the receiving process receives the performance data from the monitored computer system after that transfer request has been sent, selecting, by the receiving process, from the plurality of item-specific performance data included in the received performance data, the part requested by the transfer request; and transferring, by the receiving process, the selected part of the item-specific performance data to that display process.

11. The computer performance monitoring method according to claim 2, further comprising the step of sending a performance data transfer request from the receiving process to the monitored computer system, wherein the receiving step includes the step of receiving the performance data that the monitored computer system sends after the transfer request has been sent.

12. The computer performance monitoring method according to claim 11, wherein the step of transmitting the performance data transfer request is executed regardless of whether or not any of the plurality of display processes is activated.

13. The computer performance monitoring method according to claim 11, wherein the step of sending out the performance data transfer request is executed after any one of the plurality of display processes is activated.

14. The computer performance monitoring method according to claim 2, further comprising the step of notifying, from the receiving process, the monitored computer system of the plurality of item-specific performance data to be collected, before any of the plurality of display processes is activated.

15. The computer performance monitoring method according to claim 14, further comprising the step of inputting, by an operator, information for determining the plurality of item-specific performance data to be collected into the one computer on which the receiving process is started, wherein the notifying step is executed according to the input information.

16. The computer performance monitoring method according to claim 2, further comprising the steps of: receiving, by the receiving process, a transfer request that is sent when one of the plurality of display processes is first activated and that designates a plurality of item-specific performance data to be transferred; and requesting, from the receiving process, the monitored computer system to collect the plurality of item-specific performance data designated by the transfer request.

17. The computer performance monitoring method according to claim 1, wherein, when the monitored computer system includes a plurality of nodes each including at least one processor and at least one of the nodes is connected to the computer connection network, the receiving step includes the step of receiving performance data from that one node, and wherein the monitored computer system is programmed so that a sampling process started on each node samples the performance data of that node and a gathering process started on the one node gathers the performance data sampled by the sampling processes started on the respective nodes.
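The two-level arrangement in claim 17, with one sampling process per node and a single gathering process on the network-connected node, can be sketched as below. The function names, node numbering, and the `cpu_busy` item are hypothetical, and plain function calls stand in for inter-node transfers.

```python
from typing import Dict, List, Tuple

NodeData = Dict[str, float]


def sampling_process(node_id: int, counters: NodeData) -> Tuple[int, NodeData]:
    """Runs on every node: snapshot that node's own performance counters."""
    return node_id, dict(counters)


def gathering_process(samples: List[Tuple[int, NodeData]]) -> Dict[int, NodeData]:
    """Runs on the one network-connected node: merge all node samples into one message."""
    return {node_id: data for node_id, data in samples}


# Illustrative counters for a three-node system.
node_counters = {
    0: {"cpu_busy": 0.20},
    1: {"cpu_busy": 0.85},
    2: {"cpu_busy": 0.40},
}

samples = [sampling_process(node_id, c) for node_id, c in node_counters.items()]

# A single message leaves the monitored system, regardless of the node count.
message_for_receiver = gathering_process(samples)
```

With this structure the receiving process on the monitor computer talks to one node only, which is the point of gathering before sending.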

18. The computer performance monitoring method according to claim 17, wherein the monitored computer system further has an internal network that connects the plurality of nodes and is different from the computer connection network, and the performance data sampled at each node is transferred from the sampling process started on that node to the gathering process via the internal network.

19. The computer performance monitoring method according to claim 18, wherein the transfer of the performance data sampled at each node from the sampling process of that node to the gathering process is performed via the internal network according to an internal communication procedure defined for the monitored computer system, the internal communication procedure being different from the communication procedure defined for the computer connection network that connects the plurality of computers to each other.

20. The computer performance monitoring method as described, wherein, when the plurality of nodes are divided into a plurality of partitions each including a plurality of nodes and at least one of the nodes included in each partition is connected to the computer connection network, the receiving step collectively receives the performance data of the plurality of nodes included in one of the partitions from the one node included in that partition, and wherein the monitored computer system is programmed so that a sampling process started on each of the nodes belonging to each partition samples the performance data of that node and a gathering process started on the one node of each partition gathers the performance data sampled by the sampling processes started on the nodes belonging to that partition.
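The partition-wise gathering in claim 20 can be sketched as a grouping step. The function `gather_by_partition`, the partition labels, and the item name are hypothetical; the sketch assumes each node belongs to exactly one partition.

```python
from collections import defaultdict
from typing import Dict

NodeData = Dict[str, float]


def gather_by_partition(
    partition_of: Dict[int, str], samples: Dict[int, NodeData]
) -> Dict[str, Dict[int, NodeData]]:
    """Build one gathered message per partition from per-node samples."""
    messages: Dict[str, Dict[int, NodeData]] = defaultdict(dict)
    for node_id, data in samples.items():
        # Each sample goes to the gathering process of its own partition.
        messages[partition_of[node_id]][node_id] = data
    return dict(messages)


# Four nodes split into two hypothetical partitions.
partition_of = {0: "P0", 1: "P0", 2: "P1", 3: "P1"}
samples = {
    0: {"cpu_busy": 0.10},
    1: {"cpu_busy": 0.90},
    2: {"cpu_busy": 0.30},
    3: {"cpu_busy": 0.50},
}

per_partition = gather_by_partition(partition_of, samples)
```

Each entry of `per_partition` corresponds to what one receiving process would receive from the network-connected node of that partition.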

21. The computer performance monitoring method according to claim 20, further comprising the steps of: starting, on any one of the plurality of computers, another receiving process for receiving the performance data of another one of the plurality of partitions; starting, each on one of the plurality of computers, a plurality of other display processes for displaying the performance data of the other partition; receiving the performance data of the other partition by the other receiving process; and transferring the performance data of the other partition to each of the plurality of other display processes by the other receiving process.

22. The computer performance monitoring method according to claim 1, wherein the monitored computer system is a distributed computer system comprising a plurality of computers selected from the plurality of computers connected by the computer connection network, each of the plurality of nodes consisting of one of the selected computers; the receiving step collectively receives the performance data of each of the selected computers from one of the selected computers; and the distributed computer system is programmed so that a sampling process is started on each of the selected computers, a gathering process is started on the one of the selected computers, each sampling process samples the performance data of the computer on which it was started, and the gathering process gathers the performance data sampled by the sampling processes started on the selected computers.

23. The computer performance monitoring method as described, wherein, when the monitored computer system is a computer system including a plurality of nodes and each node is connected to the computer connection network, the receiving step includes the step of receiving the performance data of each node from the monitored computer system, and a sampling process started on each node is programmed to sample the performance data of that node.

24. The computer performance monitoring method according to claim 1, wherein, when the monitored computer system includes a plurality of processors connected to each other and at least one of the processors is connected to the computer connection network, the receiving step includes the step of receiving the performance data of each of the plurality of processors collectively from the one processor, and the monitored computer system is programmed so that a sampling process started on each of the processors samples the performance data of that processor and a gathering process started on the one processor gathers the performance data sampled by the sampling processes of the respective processors.

25. The computer performance monitoring method as described, wherein the plurality of utilization processes include a plurality of storing processes, each of which stores performance data in a storage device connected to the computer on which that process is started.

26. A computer monitoring system for monitoring a monitored computer system via a computer connection network, comprising: one computer connected to the computer connection network; and a receiving process started on the one computer, wherein the receiving process is programmed to receive collected performance data from the monitored computer system each time the monitored computer system collects its own performance data, the collection being repeated at different timings, and to transfer the received performance data to each of first and second display processes for displaying performance data, wherein the first display process is started on a first computer that is the same as or different from the one computer and is connected to the computer connection network, and the second display process is started on a second computer that is different from the first computer and is connected to the computer connection network.