JPH11203257A - Computer system

Info

Publication number: JPH11203257A
Application number: JP10004844A
Authority: JP
Inventors: Michio Suzuki; 道朗鈴木
Original assignee: MIYAGI OKI DENKI KK; Oki Electric Industry Co Ltd
Current assignee: MIYAGI OKI DENKI KK; Oki Electric Industry Co Ltd
Priority date: 1998-01-13
Filing date: 1998-01-13
Publication date: 1999-07-30

Abstract

PROBLEM TO BE SOLVED: To provide a highly reliable computer system that does not stop supplying services to clients. SOLUTION: The server of a computer system that performs large-scale process management uses duplexed CPUs 11 and 21, duplexed disks, and a duplexed network. The applications started on the server are broadly divided into communication control applications 13 and 23, which communicate with a client group 41; database applications 12 and 22, which access a database; and other data processing applications 14 and 24. A plurality of these applications reside on each of the CPUs 11 and 21. Failure detection covers at least confirmation of the existence of the applications, detection of whether the applications can communicate, and detection of whether the applications can access the database.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a client-server computer system, and more particularly to a method of detecting failures of server applications and a method of controlling communication after recovery from a failure.

[0002]

[Prior art] Client-server computer systems have recently become widespread. The applications on the server of such a system are broadly divided into applications that access a database, applications that control communication, and other data processing applications.

[0003] In a computer system that performs large-scale process management, hardware duplication is taken for granted, and it is common to group applications that have the same function in preparation for application failure and to communicate between the groups. The server also receives requests from clients that arrive asynchronously and differ in processing content, and performs communication control for them.

[0004]

[Problems to be solved by the invention] As a countermeasure against computer system failures, duplication of hardware such as the CPU (central processing unit) and disks has been handled within the operating system. The countermeasure against software failures has been a mechanism that changes the communication destination between applications. With this mechanism, however, when an application still existed on the operating system but did not function, the messages sent to it were lost.

[0005] Conventionally, there has been a function that automatically detects an application failure and restarts the application on its own node. However, when the failure has an environmental cause, for example when an application exceeds the memory allocated to it, the restart is repeated and the message in question is never processed. In addition, because the communication destination between applications is changed and messages are sent over multiple paths, whether each path functions correctly can be detected only when a client request is actually processed on that path, so the response to a problem is delayed.

[0006] Meanwhile, to speed up the response to client requests, messages have been sent to multiple destinations in turn. After recovery from a failure, however, the number of unprocessed messages and the total weight of those messages become unbalanced across the transmission paths, so a message sent earlier could be overtaken by a message sent later. An object of the present invention is to eliminate these problems and to provide a highly reliable computer system that does not stop services to clients.

[0007]

[Means for solving the problems] To achieve the above object, the present invention provides the following. [1] In a client-server computer system, the means for detecting a failure of an application existing on the server comprises means for confirming the existence of the application, means for detecting whether the application can communicate, and means for detecting whether the application can access the database.

[0008] [2] In the computer system according to [1] above, the means for responding when the application failure detection means detects a failure comprises means for restarting the abnormal application and means for restarting the applications in its group. [3] In the computer system according to [2] above, the means for searching for an optimal communication path after recovery from a failure comprises means for searching for a path with little communication waiting and changing the communication destination after recovery from the failure, means for determining the communication destination based on a message overtaking availability condition after recovery from the failure, and means for moving the application to its initial start node after recovery from the failure.

[0009]

[Embodiments of the invention] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a schematic system configuration diagram of a computer system according to an embodiment of the present invention. In this figure, 1 and 2 are databases (DB), 11 is a first CPU (central processing unit), 12 is a first DB application (group), 13 is a first communication control application (group), 14 is a first data processing application (group), and 15 is a first abnormality detection application.

[0010] Reference numeral 21 denotes a second CPU, 22 a second DB application (group), 23 a second communication control application (group), 24 a second data processing application (group), and 25 a second abnormality detection application. The first abnormality detection application 15 and the second abnormality detection application 25 are connected to an abnormality detection monitor 31.

[0011] A client group 41 (for example, personal computer terminals) is connected to the first communication control application 13 and the second communication control application 23. In this way, the server of a computer system that performs large-scale process management duplicates the CPUs 11 and 21, the disks, and the network.

[0012] The applications started on the server are broadly divided into the communication control applications 13 and 23, which communicate with the client group 41; the DB applications 12 and 22, which access the databases; and the other data processing applications 14 and 24. A plurality of each resides on each of the CPUs 11 and 21. The abnormality detection applications 15 and 25, which detect abnormalities in these applications, reside on the CPUs 11 and 21 respectively and display their detection results on the abnormality detection monitor 31.

[0013] FIG. 2 is an operation flowchart of the computer system according to the embodiment of the present invention. The operation of the computer system is described below with reference to FIG. 2. Application failures fall broadly into two types: failures in which the application disappears from the computer, and failures in which the application exists on the computer but does not function.

[0014] To distinguish these cases, the abnormality detection application first checks that each application exists. That is, it searches for the applications present on the OS (operating system) (step S1) and checks whether all applications exist (step S2). Next, to detect applications that exist on the computer but do not function, the abnormality detection application sends every application a message with a response request to test whether communication is possible (step S3), and checks whether a response arrives (step S4).
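The existence check in steps S1 and S2 can be illustrated with a minimal Python sketch. The application names and the way the list of running applications is obtained are assumptions made for illustration; the patent does not specify them.

```python
def find_missing_applications(expected, running):
    """Return the expected applications that are absent from the running set.

    `expected` is the list of applications that should exist (step S2);
    `running` is whatever the OS reports as present (step S1).
    """
    running_set = set(running)
    return [name for name in expected if name not in running_set]


# Hypothetical application names, not taken from the patent.
expected_apps = ["db_app_1", "comm_app_1", "data_app_1"]
running_apps = ["db_app_1", "data_app_1"]  # e.g. parsed from a process listing

missing = find_missing_applications(expected_apps, running_apps)
print(missing)
```

An application reported as missing here corresponds to the first failure type (the application has disappeared from the computer); the second type needs the response check described next.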

[0015] On receiving a message with a response request, each application returns a response to the sender. The abnormality detection application waits for the response for a fixed period; if the period is exceeded, it judges the application to be abnormal (step S7). In a database, moreover, a security misconfiguration, or access that exceeds the resources allocated to an application, can cause failures such as data being readable but not updatable.
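The response check with a timeout (steps S3, S4, and S7) might look like the following sketch. It uses in-process queues and threads as a stand-in for the real inter-application messaging, which the patent does not describe; `application`, `check_communication`, and `probe` are hypothetical names.

```python
import queue
import threading


def application(inbox, responsive=True):
    """Hypothetical application: handles one response-requested message."""
    message, reply_to = inbox.get()
    if responsive:
        reply_to.put(("ack", message))  # return a response to the sender


def check_communication(inbox, timeout_s):
    """Send a response-requested message (S3) and wait for the reply (S4).

    Returns False when no reply arrives within the period (S7).
    """
    reply_box = queue.Queue()
    inbox.put(("ping", reply_box))
    try:
        reply_box.get(timeout=timeout_s)
        return True
    except queue.Empty:
        return False


def probe(responsive):
    """Start one application thread and test whether it answers in time."""
    inbox = queue.Queue()
    worker = threading.Thread(
        target=application, args=(inbox, responsive), daemon=True
    )
    worker.start()
    return check_communication(inbox, timeout_s=0.2)
```

Calling `probe(True)` models a healthy application, and `probe(False)` models one that exists but does not function, which the check reports as abnormal once the waiting period expires.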

[0016] To detect this kind of failure, the abnormality detection application sends a message with a DB check response request to each application that accesses the DB (step S5). On receiving this message, each application performs reference, insertion, update, and deletion on a predetermined DB table and reports the access results to the abnormality detection application.

[0017] The abnormality detection application waits for the report for a fixed period. If the period is exceeded, it detects an abnormality; otherwise it judges from the contents of the report whether the application is abnormal or normal (step S6). Next, the abnormality detection application searches a table that defines the response for each application judged abnormal, and decides on the response.
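The DB check, in which an application runs reference, insertion, update, and deletion against a predetermined table and reports the results, can be sketched with Python's standard `sqlite3` module. The table name `health_check`, the report format, and the use of SQLite's `PRAGMA query_only` to simulate a "readable but not updatable" database are all assumptions for illustration.

```python
import sqlite3


def run_db_check(conn):
    """Try reference, insert, update, and delete on the check table and
    return a per-operation success report."""
    operations = [
        ("reference", "SELECT COUNT(*) FROM health_check"),
        ("insert", "INSERT INTO health_check (id, note) VALUES (1, 'probe')"),
        ("update", "UPDATE health_check SET note = 'probe2' WHERE id = 1"),
        ("delete", "DELETE FROM health_check WHERE id = 1"),
    ]
    report = {}
    for name, sql in operations:
        try:
            conn.execute(sql)
            report[name] = True
        except sqlite3.Error:
            report[name] = False
    conn.rollback()  # leave the check table unchanged
    return report


def is_db_access_normal(report):
    """The detector judges the application normal only if every access worked."""
    return all(report.values())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE health_check (id INTEGER PRIMARY KEY, note TEXT)")
conn.commit()
healthy_report = run_db_check(conn)

# Simulate a database that can be referenced but not updated.
conn.execute("PRAGMA query_only = ON")
readonly_report = run_db_check(conn)
```

In the read-only case the reference succeeds while the three write operations fail, which is the kind of partial failure that a simple response check would not reveal.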

[0018] This abnormality detection correspondence table defines, for example as shown in FIG. 3, the response method for each application that should exist. The response methods are restart on the node where the application was started and restart on another node. When data must be accessed at high speed, it is common to share data in memory, so applications with the same function are managed as one group.
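One possible in-memory form of the abnormality detection correspondence table is sketched below. The contents of FIG. 3 are not reproduced in the text, so the field names, application names, node names, and the two-node assumption are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    group: str         # applications with the same function share a group
    initial_node: str  # node on which the application is first started
    restart_on: str    # "start_node" or "other_node"


# Hypothetical entries; FIG. 3 itself is not available in the text.
CORRESPONDENCE_TABLE = {
    "db_app_a": TableEntry("db", "node1", "other_node"),
    "db_app_b": TableEntry("db", "node1", "other_node"),
    "comm_app_a": TableEntry("comm", "node1", "start_node"),
}

NODES = ("node1", "node2")  # assumed duplexed pair of CPUs


def restart_node(table, application):
    """Look up where an abnormal application should be restarted."""
    entry = table[application]
    if entry.restart_on == "start_node":
        return entry.initial_node
    return next(node for node in NODES if node != entry.initial_node)


def same_group(table, application):
    """List every application managed in the same group."""
    group = table[application].group
    return [name for name, entry in table.items() if entry.group == group]
```

Keeping the group in the same table as the restart policy lets one lookup answer both questions the recovery flow asks: where to restart, and which other applications must move together.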

[0019] The operation flow when a failed application exists is described below. FIG. 4 is an operation flowchart for the case where a failed application exists, according to the embodiment of the present invention. First, the failed application is searched for (step S11), and whether it has stopped is checked (step S12). To restart a failed application, it is first stopped if it has not already stopped (step S13).

[0020] Next, when a failure occurs in one application in a group, the whole group must be moved to another node. The abnormality detection correspondence table is therefore searched for applications in the same group (step S14), and any such applications are stopped (step S15).

[0021] Next, the restart node for each stopped application is looked up in the abnormality detection correspondence table (step S16), and the application is started (step S17). Finally, if such application abnormalities occur frequently on, for example, CPU 11 and the applications migrate to the node of CPU 21, the load becomes unbalanced and response degrades. Therefore, after removing the cause of the abnormality, the system administrator requests the abnormality detection application to move the applications back to their initial start nodes.
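The recovery flow of steps S11 to S17 can be sketched as a function that produces an ordered plan of stop and start actions. The table layout, the `running` map from application to current node, and all names are assumptions; actually stopping and starting processes is left out.

```python
def plan_recovery(failed_app, table, running):
    """Return ordered (action, application, node) steps for one failed application.

    `table` maps application -> {"group": ..., "restart_node": ...};
    `running` maps each application still present to its current node.
    """
    group = table[failed_app]["group"]
    members = [app for app, entry in table.items() if entry["group"] == group]
    steps = []

    # S12-S13: stop the failed application if it has not already stopped.
    if failed_app in running:
        steps.append(("stop", failed_app, running[failed_app]))

    # S14-S15: stop the other applications of the same group.
    for app in members:
        if app != failed_app and app in running:
            steps.append(("stop", app, running[app]))

    # S16-S17: look up each restart node and start the application there.
    for app in members:
        steps.append(("start", app, table[app]["restart_node"]))
    return steps


# Hypothetical table and state, for illustration only.
table = {
    "db_a": {"group": "db", "restart_node": "node2"},
    "db_b": {"group": "db", "restart_node": "node2"},
    "comm_a": {"group": "comm", "restart_node": "node1"},
}
running = {"db_a": "node1", "db_b": "node1", "comm_a": "node1"}

plan = plan_recovery("db_a", table, running)
```

Because the whole group is stopped before any member is started, applications that share in-memory data never run split across two nodes during the move.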

[0022] The operation of moving applications back to their initial start nodes is described below. FIG. 5 is a flowchart of the operation of moving to the initial start node according to the embodiment of the present invention. First, a request to move to the initial start node is received (step S21). Next, the applications present on the OS are searched for (step S22).

[0023] Next, the initial start node of each such application is looked up (step S23). The abnormality detection application compares each application's current node with its initial start node (step S24) and, if they differ, sends a request to restart the application on its initial start node (steps S25, S26).
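The comparison in steps S22 to S26 reduces to finding the applications whose current node differs from their initial start node. A minimal sketch follows, with hypothetical application and node names.

```python
def plan_return_to_initial_nodes(current_nodes, initial_nodes):
    """Return (application, initial_node) pairs for every application that is
    not running on its initial start node (steps S24-S26)."""
    return [
        (app, initial_nodes[app])
        for app, node in current_nodes.items()
        if app in initial_nodes and node != initial_nodes[app]
    ]


# Hypothetical state after a failover of the DB group to node2.
current_nodes = {"db_a": "node2", "db_b": "node2", "comm_a": "node1"}
initial_nodes = {"db_a": "node1", "db_b": "node1", "comm_a": "node1"}

restart_requests = plan_return_to_initial_nodes(current_nodes, initial_nodes)
print(restart_requests)
```

Applications already on their initial node produce no request, so issuing the move request repeatedly is harmless.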

[0024] Meanwhile, a processing weight is defined for each message type so that, at failure recovery, a lightly loaded path can be found according to the types and volume of client requests. A database generally consists of multiple tables, and normalization of the database is maintained through its transactions. For example, when the device state and the device history are managed in separate tables, a transaction that overtakes another can leave the state and the history inconsistent. A search request against a single table, by contrast, can be processed sooner by letting it overtake other messages.

[0025] For this reason, whether each message may overtake others is defined in advance, for example as shown in FIG. 6. After failure recovery, the communication control application maintains a communication load table by adding up, for each message queue, the processing weights of the waiting messages. FIG. 7 is a flowchart of the operation of searching for a path with little communication waiting and changing the communication destination after failure recovery in the computer system according to the embodiment of the present invention.
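A message definition table in the spirit of FIG. 6 and the resulting communication load table might be represented as follows. The message types, weights, overtaking flags, and destination names are invented for illustration; the figure's actual contents are not available in the text.

```python
# Hypothetical message definitions: processing weight and overtaking flag.
MESSAGE_TABLE = {
    "device_status_update": {"weight": 5, "overtaking_allowed": False},
    "device_history_insert": {"weight": 3, "overtaking_allowed": False},
    "single_table_search": {"weight": 1, "overtaking_allowed": True},
}


def build_load_table(pending):
    """Build the communication load table.

    `pending` maps each destination to the list of message types that are
    still waiting to be processed there.
    """
    return {
        destination: {
            "count": len(messages),
            "total_weight": sum(MESSAGE_TABLE[m]["weight"] for m in messages),
        }
        for destination, messages in pending.items()
    }


pending = {
    "db_app_12": ["device_status_update", "single_table_search"],
    "db_app_22": ["single_table_search"],
}
load_table = build_load_table(pending)
```

Tracking the total weight alongside the count matters because two queues of equal length can represent very different amounts of work.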

[0026] First, failure recovery is performed (step S31). Next, a message is received from a client (step S32). Next, the number of unprocessed messages is looked up (step S33). Next, the weights of the unprocessed messages are looked up (step S34). Next, the total weight of the unprocessed messages is calculated (step S35).

[0027] Next, whether overtaking is permitted is checked (step S36). If overtaking is permitted, the message is sent to the application with the smallest total weight of unprocessed messages (step S37). If overtaking is not permitted, the message is sent to the application with the largest total weight of unprocessed messages (step S38).

[0028] In this way, the unprocessed messages of each application are first looked up, then the weight of each unprocessed message is looked up and the total over the unprocessed messages is calculated. If the message received from the client may overtake others, it is sent to the path with the smallest total. If the message received from the client may not overtake others, it is sent to the path with the largest total.
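The destination rule of steps S36 to S38 can be sketched in a few lines. The function and destination names are hypothetical, and the tie-breaking behavior (the first destination in dictionary order wins) is an assumption, since the text does not say how equal totals are resolved.

```python
def choose_destination(total_weights, overtaking_allowed):
    """Pick the destination for a newly received client message.

    `total_weights` maps each destination to the total weight of its
    unprocessed messages. A message that may overtake goes to the smallest
    total (S37); one that may not goes to the largest total (S38).
    """
    if overtaking_allowed:
        return min(total_weights, key=total_weights.get)
    return max(total_weights, key=total_weights.get)


# Hypothetical totals taken from a communication load table.
totals = {"db_app_12": 6, "db_app_22": 1}

fast_path = choose_destination(totals, overtaking_allowed=True)
ordered_path = choose_destination(totals, overtaking_allowed=False)
```

Sending a non-overtakable message to the most heavily loaded path follows the rule as stated in the text: it keeps that message behind the work already queued, so it cannot pass earlier messages.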

[0029] Thus, according to this embodiment, providing a client-server computer system with means for detecting failures of applications existing on the server, and with means for responding to them, makes it possible to keep services to clients from stopping. In addition, providing means for determining how to restart an abnormal application prevents an application from repeatedly stopping and restarting because of environmental factors. Because abnormalities that occur during abnormality handling then need not be considered, development cost can be lowered through reduced development man-hours.

[0030] Furthermore, providing means that, after failure recovery, searches for a path with little communication waiting and changes the communication destination makes it possible to shorten the downtime caused by failures. The present invention is applicable to all client-server computers. The present invention is not limited to the above embodiment; various modifications are possible based on the spirit of the invention, and these are not excluded from its scope.

[0031]

[Effects of the invention] As described in detail above, the present invention provides the following effects. (1) According to the invention of claim 1, providing a client-server computer system with means for detecting failures of applications existing on the server, and with means for responding to them, makes it possible to keep services to clients from stopping.

[0032] (2) According to the invention of claim 2, providing means for restarting an abnormal application prevents an application from repeatedly stopping and restarting because of environmental factors. Because abnormalities that occur during abnormality handling then need not be considered, development cost can be lowered through reduced development man-hours. (3) According to the invention of claim 3, providing means that, after failure recovery, searches for a path with little communication waiting and determines the communication destination makes it possible to shorten the downtime caused by failures.

[Brief description of the drawings]

FIG. 1 is a schematic system configuration diagram of a computer system according to an embodiment of the present invention.

FIG. 2 is an operation flowchart of the computer system according to the embodiment of the present invention.

FIG. 3 is a diagram showing the abnormality detection correspondence table of the computer system according to the embodiment of the present invention.

FIG. 4 is an operation flowchart for the case where a failed application exists, according to the embodiment of the present invention.

FIG. 5 is a flowchart of the operation of moving to the initial start node according to the embodiment of the present invention.

FIG. 6 is a diagram showing the table that associates each message with its processing weight and overtaking availability in the computer system according to the embodiment of the present invention.

FIG. 7 is a flowchart of the operation of searching for a path with little communication waiting and changing the communication destination after failure recovery in the computer system according to the embodiment of the present invention.

[Explanation of symbols]

1, 2: Database (DB)
11: First CPU (central processing unit)
12: First DB application (group)
13: First communication control application (group)
14: First data processing application (group)
15: First abnormality detection application
21: Second CPU
22: Second DB application (group)
23: Second communication control application (group)
24: Second data processing application (group)
25: Second abnormality detection application
31: Abnormality detection monitor
41: Client group

Claims

[Claims]

1. A client-server type computer system comprising, as means for detecting a failure of an application existing on a server: (a) means for confirming the existence of the application; (b) means for detecting whether the application can communicate; and (c) means for detecting whether the application can access a database.

2. The computer system according to claim 1, comprising, as means for responding when the application failure detection means detects a failure: (a) means for restarting an abnormal application; and (b) means for restarting the applications in a group.

3. The computer system according to claim 2, comprising, as means for searching for an optimal communication path after recovery from a failure: (a) means for searching for a path with little communication waiting and changing the communication destination after recovery from the failure; (b) means for determining the communication destination based on a message overtaking availability condition after recovery from the failure; and (c) means for moving the application to its initial start node after recovery from the failure.