JP2001134545A - Server managing method

Info

Publication number: JP2001134545A
Application number: JP31181099A
Authority: JP
Inventors: Kazuhito Saito; 一仁斉藤
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-11-02
Filing date: 1999-11-02
Publication date: 2001-05-18

Abstract

PROBLEM TO BE SOLVED: Failures of the server machine itself and of the network can be detected, but it is not possible to detect abnormalities such as whether all the tasks a server application needs in order to start are actually running. SOLUTION: A status file 18 is provided that records and manages whether not only the server machine itself and the network function, but also the server application and OS modules, are functioning normally. When the self-diagnosis result of the machine, the result of monitoring the network function, the detected operating state of the server application, and the detected operating state of the OS modules used by the server application are all 'OK' (normal), the system is judged to be operating normally and is started. When one or more results are 'NG' (not normal), an 'error' is determined, so that an abnormality in a server application that appears to be operating normally is detected quickly.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a computer network system in which a plurality of computers are connected via a network, and more particularly to a server management method suited to efficiently managing failures that occur in a server in a client-server system built on a LAN (Local Area Network), a WAN (Wide Area Network), or the like.

[0002]

2. Description of the Related Art Servers in a client-server system, in which computers are connected to a network such as a LAN or WAN, include groupware servers, database servers, network servers, printer servers, and file servers. These servers are now used in a wide variety of operations and have become indispensable to mission-critical and day-to-day work.

[0003] A server going down (a failure) is therefore a serious problem for carrying out business. Handling such a failure requires a specialist system administrator or network system engineer. However, the system administrator is usually not near the server, and it is generally a client user who discovers the problem.

[0004] Techniques for automatically managing computer network systems such as client-server systems are described in, for example, JP-A-10-247169, JP-A-11-4222, and JP-A-11-31114.

[0005] JP-A-10-247169 describes a technique aimed at collecting management information from managed systems such as servers, terminals (clients), and routers in a short time while minimizing the load concentrated on the management system. The management system generates DM agents that collect management information according to a predetermined management scenario and moves them to each managed system. Each DM agent collects management information in the execution environment of its managed system according to the scenario and reports it to the management system.

[0006] JP-A-11-4222 describes a technique, for network management using SNMP (Simple Network Management Protocol), that shortens the time from the occurrence of a state change in a management agent to its detection without overloading the manager or the network.

[0007] JP-A-11-31114 describes a technique for managing a network centrally and efficiently by accessing a management server connected to the network from a remote location, controlling the starting and stopping of that management server, collecting the information to be managed, and displaying it in a Web browser.

[0008] However, these conventional techniques cannot quickly detect an abnormality in a server application (the application for each server) on a server that appears to be operating normally. Server-down events follow many unpredictable patterns and are difficult to handle with simple routine work.

[0009] For example, in a network configuration in which a groupware database server and groupware clients are connected to an intranet, the database server is one that is always used in daily work. There is no problem as long as this server runs 24 hours a day, but it occasionally goes down, and when it does, the following problems arise.

[0010]
(1) A groupware server going down sometimes cannot be detected quickly.
(2) When part of the server's OS (operating system) is down, the server application abnormality goes unnoticed.
(3) The server application function may be down even though the server's network function is normal, in which case the server failure cannot be detected from network management information.

[0011]
(4) When the network function is operating but there is no function for detecting that the server application is operating normally, the network administrator cannot detect the problem quickly.
(5) When part of the server's OS, for example the console function (keyboard, mouse, display device, and so on), is down, the problem can only be detected at the server itself.
(6) It cannot be detected whether the server's OS and the server application are functionally synchronized (that is, whether the required functions have been confirmed to be running).

[0012]
(7) When part of the server function is down, a client cannot detect the server failure (cannot discover the fault) unless it uses that function.
(8) A frequently occurring problem is that the server application fails to start correctly when started by an agent (a program that starts the server application automatically, without an operator). The agent has functions for shutting down and starting up the server application and thus controls its startup.

[0013] FIG. 9 is a flowchart showing an example of conventional server management control. When the main switch is turned on, a self-diagnosis test by POST (power-on test) and a network function test using SNMP or the like are performed in order. If both test results are normal (steps 901 to 904), the server application is started and its program is loaded (step 905), and the server system is started (step 906). If a fatal error is detected in the POST or the network function test (steps 907 and 908), the corresponding error is displayed (steps 909 and 910) and startup is halted.

[0014] However, normal results from the POST and the network function test alone do not reveal failures that only become apparent on use, such as a server application abnormality. A server may therefore appear to be operating normally while in fact being unable to provide service.

[0015]

PROBLEMS TO BE SOLVED BY THE INVENTION In the prior art, faults in the server machine and the network are managed by POST, a network monitoring protocol (SNMP), and the like, so those abnormalities can be detected. What cannot be detected is, for example, whether all the tasks required for the server application to start are running.

[0016] An object of the present invention is to solve these problems of the prior art by providing a server management method that monitors, as a whole, that the server machine itself is operating, that the network function is operating, and that the server application is functioning, so that failures can be detected and managed for the server as a whole. In particular, the method makes it possible to detect quickly a failure confined to the server application or to part of the OS, and to carry out automatically the handling appropriate to the nature of the fault.

[0017]

MEANS FOR SOLVING THE PROBLEMS To achieve the above object, the server management method of the present invention starts from the observation that, for the server function to operate correctly, the server machine itself must be operating, the network function must be operating, and the server application must be functioning. A "status file" is provided with which the whole server system can be managed. The self-diagnosis result of the server machine, the result of monitoring the network function by SNMP, the detected operating state of the server application, and the detected operating state of the OS modules used by the server application are each recorded in this status file. If every result is "OK" (normal), the system is regarded as operating normally and is started; if one or more results differ from the normal state ("NG"), an "error" is determined. In this way, an abnormality of an application on a server that appears to be operating normally can be detected quickly.

[0018] In the status file, all the functions that must be running for the server machine, the network, and so on each to operate correctly are recorded in advance, for example by function name as character codes, as "normal value data" for each category (server machine, network, and so on). When the server is actually started, the names of the functions that are brought up in turn are recorded in the status file as "current data". If every function name recorded in advance as normal value data also appears in the current data, that server machine or network is judged to have started normally; if even one is missing, an error is determined.
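The comparison of "normal value data" against "current data" amounts to a set-inclusion check per category. The following is a minimal sketch in Python, not the patent's implementation; the category keys and the `judge` helper are hypothetical, and the function names are the placeholder codes used in the specification.

```python
# Placeholder function names per category; the specification only gives
# codes such as "aaa", "bbb". The dictionary layout is an assumption.
NORMAL_VALUE_DATA = {
    "hardware": {"aaa", "bbb", "ccc"},
    "network": {"AAA", "BBB"},
}


def judge(category, current_names):
    """Compare the names recorded at startup with the expected names.

    Returns ("OK", set()) when every expected name is present, otherwise
    ("NG", missing) where `missing` holds the names that never came up.
    """
    missing = NORMAL_VALUE_DATA[category] - set(current_names)
    return ("OK" if not missing else "NG", missing)
```

Extra names in the current data are ignored in this sketch; only a missing expected name produces "NG", which matches the "if even one is missing" rule above.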

[0019] When an error is determined, the method checks which result is abnormal. For example, when only a particular server application is abnormal, that server application is shut down and restarted. This automatically resolves the startup failures that often occur when a server application is launched. It also means that abnormalities confined to the server application no longer go undiscovered, so faults that would otherwise only be noticed on use are resolved quickly.

[0020] When an error (failure, abnormality) is determined, the contents of the status file are also embedded in the logging database file of the server application or the like (the logging file in which the application's event processing since startup is recorded), so that, as long as the database access function is operating, client users can also judge the state. A server administrator or client user can thus check the situation from a client machine by accessing the logging file of the server application in question, without using the console of the server machine, that is, without going to where the server machine is installed.
[0021] If the determined error is a failure other than a network function failure, such as a failure of the server application or an OS module, the server administrator is notified of the server application abnormality using an e-mail function that is independent of the server application. An event that previously could not be detected without going to inspect the server can thus be detected by the arrival of an e-mail.
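As an illustration of the notification rule, the sketch below builds (but does not send) an alert message only when something is "NG" and the network itself is "OK". It is an assumption-laden example: the status keys, the `build_alert` name, and the sender address are hypothetical, and the specification does not say how the mail is composed or delivered.

```python
from email.message import EmailMessage


def build_alert(status, admin_addr, sender="server-monitor@example.com"):
    """Return an EmailMessage describing the fault, or None if none is due.

    `status` maps item names (hypothetical keys such as "network",
    "application", "os") to "OK" or "NG".
    """
    failed = [name for name, result in status.items() if result == "NG"]
    # Mail can only leave the machine while the network function is healthy.
    if not failed or status.get("network") != "OK":
        return None
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = admin_addr
    msg["Subject"] = "Server abnormality: " + ", ".join(failed)
    body = "\n".join(f"{name}: {result}" for name, result in status.items())
    msg.set_content("Status file contents:\n" + body)
    return msg
```

Handing the returned message to an SMTP client is left out on purpose, since the delivery path is outside what the specification describes.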

[0022] If the determined error is a failure of an OS module used by the server application, the console function of the server machine (keyboard, mouse, CRT, and so on) stops responding. In that case, the main switch (power switch) of the server machine is turned off and on (restarted). What previously required the server administrator to go to the server machine, judge that the system had locked up, and power-cycle the machine is thus done automatically.

[0023]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a flowchart showing an example of processing according to the server management method of the present invention. FIG. 2 is a block diagram showing a configuration example of a server machine that performs that processing. FIG. 3 is a block diagram showing a configuration example of a client-server system that performs the processing of FIG. 1. FIG. 4 is an explanatory diagram showing a configuration example of the status file created by the server machine of FIG. 3.

[0024] In FIG. 3, 1 is a server machine provided with a groupware server function, 2 to 4 are client machines (referred to as "clients" below and in the figures), and 5 is a LAN. Clients 2 to 4 access server machine 1 via LAN 5 and use the server application (groupware).

[0025] Server machine 1 normally runs 24 hours a day, whereas clients 2 to 4 are powered on only while their users are at work. In such a system, correct operation is achieved only when the hardware (the server machine itself), the OS (operating system), the network function, the server application, and the server's attached functions (backup device, power-failure protection device, remote-access modem, and others) are all running.

[0026] Of these, hardware abnormalities in the server machine have conventionally been detected by the power-on test (POST) performed at power-on. The network function has likewise conventionally been managed by a network monitoring protocol such as SNMP (Simple Network Management Protocol), so that abnormalities can be detected.

[0027] However, there is no reliable means of detecting OS abnormalities or server application abnormalities. Conventionally, the server administrator confirms operation by checking the hardware, hard disk, task state, memory state, and so on as needed.

[0028] In this example, server machine 1 is provided with a program that creates a status file 18, configured as shown in FIG. 4, for managing these operating states, and a program that monitors the contents of the status file. The current data is compared with the normal-state data to detect which part is abnormal.

[0029] As shown in FIG. 2, server machine 1 comprises a display device 1a such as a CRT (Cathode Ray Tube) or LCD (Liquid Crystal Display); an input device 1b such as a keyboard and mouse; an external storage device 1c such as a hard disk drive; an information processing device 1d that has a CPU (Central Processing Unit) 10, a main memory 11, and so on, and performs stored-program computer processing; an optical disk 1e on which the processing programs and data according to the present invention are recorded; a drive 1f that reads the optical disk 1e; and an uninterruptible power supply 1g (referred to as "UPS" below and in the figures) that supplies battery power when the commercial power supply is interrupted, for example by a power failure.

[0030] The processing programs and data recorded on optical disk 1e are installed, loaded into main memory 11, and executed by CPU 10, whereby the functions according to the present invention are implemented in information processing device 1d as a server management processing unit 12.

[0031] The server management processing unit 12 consists of a POST processing unit 13, a network test processing unit 14, an application test processing unit 15, an OS test processing unit 16, a result recording processing unit 17, the status file 18, a verification processing unit 19, and an error processing unit 20. Its operation is described below.

[0032] In server machine 1, the POST processing unit 13 of the server management processing unit 12 reads the current data for the "hardware state" from the POST self-diagnosis result information. The OS test processing unit 16 reads the current data for the "OS state" from the task manager's application state information, the network test processing unit 14 reads the current data for the "network state" from SNMP information, and the application test processing unit 15 reads the current data for the "application state" from the logging data.

[0033] These readings are each recorded in the status file 18 by the result recording processing unit 17. The status file 18 is a text file so that it can be viewed with general-purpose tools. The result recording processing unit 17 compares each reading with the data read during normal operation, judges the current operating state of each item, and records the resulting self-diagnosis result as "OK" or "NG".
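Because the status file is plain text, one simple layout is a single `name: OK` or `name: NG` line per monitored item. The sketch below shows such a layout in Python; the actual layout of FIG. 4 is not reproduced in this text, so the line format, the category keys, and both helper names are assumptions made for illustration.

```python
# Assumed item names, in the order the checks are described.
CATEGORIES = ("hardware", "network", "application", "os")


def render_status(results):
    """Render one 'category: OK|NG' line per category as plain text."""
    return "\n".join(f"{name}: {results[name]}" for name in CATEGORIES) + "\n"


def parse_status(text):
    """Parse text produced by render_status back into a dict."""
    parsed = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name.strip():
            parsed[name.strip()] = value.strip()
    return parsed
```

A line-oriented format like this can be read in any text viewer and appended to a log without further conversion.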

[0034] That is, as shown in FIG. 4, all the functions that must be running for the server machine and the network each to operate correctly are recorded in the status file 18 in advance as "normal value data", for example by function name as character codes (aaa, bbb, ccc, ..., AAA, BBB, ...), separately for the server machine and the network. When the server is actually started, the names of the functions brought up in turn are recorded in the status file 18 as "current data". If every function name recorded in advance as normal value data appears in the current data, the server machine (hardware state) or network state is judged to have come up normally ("OK"); if even one is missing, it is judged to be an error ("NG").

[0035] The verification processing unit 19 then judges the operating state from the self-diagnosis results ("OK" or "NG") that the result recording processing unit 17 wrote to the status file 18. If every item is "OK" (the same as normal operation), the system is judged to be operating normally; if one or more items are "NG", differing from the normal state, an "ERROR" is determined.

[0036] When an "ERROR" is determined, the error processing unit 20 of the server management processing unit 12 checks which item in the status file is abnormal and performs the error handling corresponding to it. For example, when only the server application is abnormal, the server application is shut down and restarted.

[0037] In this way, a server application abnormality can be found before clients 2 to 4 actually use the application, and the startup failures that often occur when the server application is launched can be resolved automatically.

[0038] The error processing unit 20 also embeds the status file 18 in the logging database file of the server application (the logging file in which the application's event processing since startup is recorded), of the kind shown in the sample below.

[0039] <Sample logging file>
・Server started. Running Release 4.5.5.
・Event Interceptor (Version 4.6)
・Event Interceptor started
・Database Replicator started
・Mail Router started for domain A
・Router: Internet SMTP host nts_s_pro07 in domain
・Router: Shutdown is in progress
・Router: Beginning mailbox file compaction
・Router: Completed mailbox file compaction
・Index update process started
・Stats agent started
・Agent manager started
・AMgr: Starting execution number 2
・AMgr: Starting execution number 1
・Query/Set Handler started
・Query/Set Handler (Version 4.6)
・Reporter started
・Reporter: Sending statistics to 'statrep.nsf' every 720 minutes. Running analysis daily.
・Event Dispatcher started.
・Database server started

[0040] As a result, as long as the database access function is operating, a server application abnormality can be detected by viewing the logging database from a client, without using the console of server machine 1 (keyboard, mouse, display device, and so on). That is, the users of clients 2 to 4 can also judge the state, and the server administrator can check the situation without going to where server machine 1 is installed.

[0041] Server machine 1 is also provided with an e-mail function that is independent of the server application. When the error processing unit 20, referring to the status file 18, judges that the network function is normal, it uses this e-mail function to notify the pre-registered e-mail address of the server administrator of the server application abnormality. The server administrator can thus detect a server abnormality without being at the location of server machine 1.

[0042] The error processing unit 20 also creates an agent that is started when an abnormality is determined from the status file 18. This agent collects the data and logging files needed for failure analysis of the server application, and if the abnormality is a partial failure of the server application or a failure of the OS, it uses the functions of UPS 1g to control the power of information processing device 1d (server machine 1), for example by switching it off and on.
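The error handling described so far (restart the application, notify by mail, power-cycle through the UPS) can be pictured as a mapping from the "NG" items to a list of actions. The sketch below is one possible reading, not the specification's own logic: the action names, the status keys, and the precedence (an OS fault leads to a power cycle, an application-only fault to an application restart) are assumptions.

```python
def choose_actions(status):
    """Map the recorded OK/NG results to an ordered list of action names.

    `status` maps hypothetical item names ("hardware", "network",
    "application", "os") to "OK" or "NG". The returned strings are labels
    only; nothing is executed here.
    """
    ng = {name for name, result in status.items() if result == "NG"}
    actions = []
    if not ng:
        return actions
    # The agent first gathers material for later failure analysis.
    actions.append("collect_logs")
    # Mail notification is only possible while the network is healthy.
    if status.get("network") == "OK":
        actions.append("notify_admin_by_mail")
    if "os" in ng:
        # The console is assumed unusable, so power is cycled via the UPS.
        actions.append("power_cycle_via_ups")
    elif ng == {"application"}:
        actions.append("restart_application")
    return actions
```

Returning labels rather than performing the actions keeps the decision step testable without touching real hardware.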

[0043] This makes it possible to respond automatically even to failures of the console function of server machine 1. The server administrator no longer needs to go to server machine 1, judge that the system has locked up, and power-cycle the machine. In particular, to keep such server downtime short, the server administrator previously had to check the server before working hours, which was a heavy workload; that workload can now be reduced.

[0044] Furthermore, if this operation is performed when the server application is started, and startup is normally scheduled for around 5 a.m., the server system can be brought into operation without problems by the start of work at, say, 8 a.m.

[0045] The server management control operation in server machine 1 is described below with reference to FIG. 1. When the main switch is turned on (step 101), the POST (power-on test) is performed first (step 102), and its result ("OK" or "NG") (step 103) is recorded in the hardware state field of the status file 18 (steps 104 and 105).

[0046] Next, the network function is checked by SNMP (step 106), and the result ("OK" or "NG") (step 107) is recorded in the network state field of the status file 18 (steps 108 and 109).

[0047] Further, the server application is started, its startup state is confirmed and its startup modules are checked (step 110), and the result, "OK" or "NG" (step 111), is recorded in the server application status column of the status file 18 (steps 112 and 113).

[0048] In addition, the operating state of the OS modules used by the server application is checked (step 114), and the result, "OK" or "NG" (step 115), is recorded in the OS status column of the status file 18 (steps 116 and 117).

[0049] The check results recorded in the status columns of the status file 18 in this way are then verified (step 118). If every status is recorded as "OK", the server machine is started as having no abnormality (step 119). If even one "NG" is recorded, error processing such as outputting an error notification is performed, or, as shown in FIGS. 5 to 8, it is determined which status is "NG" and the error processing corresponding to that status is performed (step 121).
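The flow of steps 102 to 118 can be illustrated with a minimal Python sketch. It is not the patented implementation: the column keys, the `key=value` text layout, and the function names `run_startup_checks`, `format_status_file`, and `verify` are all assumptions made for illustration, and the four checks are passed in as placeholder callables rather than real POST, SNMP, application, or OS probes.

```python
from typing import Callable, Dict

# Hypothetical keys for the four status columns of the status file 18.
FIELDS = ("hardware", "network", "server_application", "os")


def run_startup_checks(checks: Dict[str, Callable[[], bool]]) -> Dict[str, str]:
    """Run each check in order and record "OK" or "NG" (steps 102 to 117)."""
    status = {}
    for field in FIELDS:
        try:
            ok = bool(checks[field]())
        except Exception:
            # Assumption: a missing check or a check that raises counts as NG.
            ok = False
        status[field] = "OK" if ok else "NG"
    return status


def format_status_file(status: Dict[str, str]) -> str:
    """Render the status columns as plain text, one "key=value" line each."""
    return "".join(f"{field}={status[field]}\n" for field in FIELDS)


def verify(status: Dict[str, str]) -> bool:
    """Step 118: the system is normal only if every column is "OK"."""
    return all(status[field] == "OK" for field in FIELDS)
```

With a server application check that fails, `verify` returns `False` even though the hardware and network columns are "OK", which is the case the method is meant to catch.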

[0050] In this way, processing such as an error display is performed if an abnormality is found in any one of the POST, the network function test, the server application startup state test, and the OS startup test, so an abnormality in a server application that appears at first glance to be operating normally can be detected promptly.

[0051] FIG. 5 is a flowchart showing a first example of the error processing operation in the server management method of the present invention. This example shows one detailed example of the processing in step 121 of FIG. 1. If "NG" is recorded only for the server application status in the status file 18 shown in FIG. 4 (steps 501 to 504), the server application is shut down and restarted (step 505).

[0052] If the error lies anywhere other than the server application, an error is displayed and startup is halted (step 506). This makes it possible to deal automatically with the "startup failure" trouble that often occurs when a server application is started, and to resolve promptly an abnormality of the server application that would otherwise come to light only when the application is used or accessed.
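The decision of this first example can be sketched as a small Python function. This is an illustrative reading of the flow described for FIG. 5, not the patent's own code; the function name `first_error_action`, the column keys, and the returned action strings are hypothetical.

```python
def first_error_action(status):
    """Choose the action of the first error processing example.

    `status` maps a status column name to "OK" or "NG".
    """
    ng_columns = {name for name, result in status.items() if result == "NG"}
    if not ng_columns:
        return "start_server"
    if ng_columns == {"server_application"}:
        # Only the server application failed: shut it down and restart it (step 505).
        return "restart_application"
    # Any other error: display the error and halt startup (step 506).
    return "display_error_and_halt"
```

Comparing the set of "NG" columns against `{"server_application"}` keeps the "application only" condition explicit: an application failure combined with any other failure falls through to the error display.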

[0053] FIG. 6 is a flowchart showing a second example of the error processing operation in the server management method of the present invention. This example shows one detailed example of the processing in step 121 of FIG. 1. If "NG" is recorded for any status in the status file 18 shown in FIG. 4 (steps 114 to 119), it is determined which status is "NG" and what its condition is (step 601), and the result of that determination is written into the logging data of the server application (step 602).

[0054] As a result, the server administrator and client users can detect a server abnormality by viewing the logging data from a client, without using the console of the server machine, that is, without going to the installation location of the server machine.
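As a rough illustration of steps 601 and 602, the sketch below writes one log record per "NG" column. Python's standard `logging` module stands in for the server application's logging data, which the text does not specify in detail; the function name `log_ng_entries` and the message format are assumptions.

```python
import logging


def log_ng_entries(status, logger):
    """Determine which columns are "NG" (step 601) and log them (step 602).

    Returns the messages written so that a caller can inspect them.
    """
    messages = [
        f"startup check NG: {name}"
        for name, result in status.items()
        if result == "NG"
    ]
    for message in messages:
        logger.error(message)
    return messages
```

If the logger is attached to a file that clients can read, the administrator can see which check failed without going to the server console.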

[0055] FIG. 7 is a flowchart showing a third example of the error processing operation in the server management method of the present invention. This example shows one detailed example of the processing in step 121 of FIG. 1. If "NG" is recorded only for the server application status in the status file 18 shown in FIG. 4 (steps 501 to 504), which means that the network function is normal, the information in the status file 18 is converted to text and the system administrator is notified by e-mail that operates independently of the server application (step 701). The server application is then shut down and restarted (step 505).

[0056] If the "NG" recorded in the status file 18 shown in FIG. 4 is an OS error (steps 501 to 503), which again means that the network function is normal, the information in the status file 18 is converted to text and the system administrator is notified by e-mail that operates independently of the server application (step 702). After that notification, or if the error is a POST error or a network function error (steps 501 and 502), an error is displayed (step 506).
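The notification condition of this third example can be sketched with Python's standard `email.message.EmailMessage`. The sketch only builds the message; the function name `build_alert`, the column keys, the subject line, and the addresses are assumptions, and actually sending the message (for example with `smtplib.SMTP.send_message`) is left to the caller.

```python
from email.message import EmailMessage


def build_alert(status, sender, recipient):
    """Build a notification e-mail from the status columns, or return None.

    An e-mail is built only when the hardware and network columns are "OK"
    (so the mail can be delivered) and some other column is "NG".
    """
    if status["hardware"] != "OK" or status["network"] != "OK":
        return None  # POST or network error: display the error instead (step 506)
    if all(result == "OK" for result in status.values()):
        return None  # nothing to report
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = "Server startup check: NG detected"
    # The status file contents are sent as plain text (steps 701 and 702).
    message.set_content("\n".join(f"{k}: {v}" for k, v in status.items()))
    return message
```

Returning `None` when the network column is "NG" mirrors the text: a mail notification is attempted only when the network function is known to be normal.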

[0057] As a result, the server administrator can detect, from the arrival of an e-mail, a condition that conventionally could not be detected without going to inspect the server, and can therefore detect the occurrence of a failure promptly.

[0058] FIG. 8 is a flowchart showing a fourth example of the error processing operation in the server management method of the present invention. This example shows one detailed example of the processing in step 121 of FIG. 1. If "NG" is recorded only for the server application status in the status file 18 shown in FIG. 4 (steps 501 to 504), the server application is shut down and restarted (step 505). If the "NG" recorded is an OS error (steps 501 to 503), the main power switch of the server machine is turned off and on using the function of the UPS 1g of FIG. 2 to restart the machine (step 801).

[0059] If the "NG" recorded in the status file 18 shown in FIG. 4 is a POST error or a network function error (steps 501 and 502), an error is displayed (step 506).
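The branching of this fourth example can be sketched as follows. The ordering of the tests (hardware and network first, then OS, then application) is one reading of the step numbers 501 to 504 and is an assumption, as are the `Action` enum, the column keys, and the function name `fourth_error_action`. The actual UPS power control is not modelled.

```python
from enum import Enum


class Action(Enum):
    START = "start"
    DISPLAY_ERROR = "display_error"              # step 506
    POWER_CYCLE = "power_cycle_via_ups"          # step 801
    RESTART_APPLICATION = "restart_application"  # step 505


def fourth_error_action(status):
    """Map the recorded status columns to the action of the fourth example."""
    if status["hardware"] == "NG" or status["network"] == "NG":
        return Action.DISPLAY_ERROR
    if status["os"] == "NG":
        return Action.POWER_CYCLE
    if status["server_application"] == "NG":
        return Action.RESTART_APPLICATION
    return Action.START
```

Under this reading an OS error takes precedence over an application error, since restarting the application alone is unlikely to help when the OS itself has failed.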

[0060] This automates what was conventionally done by the server administrator, who went to the installation location of the server machine, judged that the system had locked up, and switched the server machine off and restarted it. When the OS fails, the keyboard, mouse, and other input devices of the server machine usually stop working and the operator can do nothing except switch the power off and on. In this example, the power off/on control function of the UPS is used to switch the server machine off and on automatically, based on the content of the error.

[0061] As described above with reference to FIGS. 1 to 8, the server management method of this example provides a status file 18 that can record and manage whether not only the server machine itself and the network function but also the server application and the OS modules are functioning normally. If the self-diagnosis result of the machine, the result of monitoring the network function by SNMP, the detected operating state of the server application, and the detected operating state of the OS modules used by the server application, as recorded in the status file 18, are all "OK (normal)", the system is judged to be operating normally and is started. If one or more of them differ from the normal state ("NG"), an "error" is determined. This makes it possible to detect promptly an abnormality in a server application that appears to be operating normally.

[0062] When an "error" is determined, it is determined which check result is abnormal. For example, when only the server application is abnormal, the server application is shut down and automatically restarted. This automatically resolves the startup failures that often occur when a server application is started.

[0063] When an "error" is determined, the contents of the status file are also embedded in the logging database file of the server application so that, when the database access function is operating, client users can also judge the state. The server administrator or a client user can thus check the situation by accessing, from a client machine, the logging file of the server application in question, without going to the installation location of the server machine and using its console.

[0064] If the determined "error" is a failure other than a network function failure, such as a failure of the server application or of an OS module, the server administrator is notified of the abnormality of the server application by an e-mail function that is independent of the server application. An event that conventionally could not be detected without going to inspect the server can thus be detected from the arrival of an e-mail.

[0065] If the determined "error" is a failure of an OS module used by the server application, the main switch (power switch) of the server machine is controlled off and on (restarted) using the function of the UPS. What the server administrator conventionally did by going to the installation location of the server machine, judging that the system had locked up, and controlling the power of the server machine (off and on) can thus be done automatically.

[0066] The present invention is not limited to the examples described with reference to FIGS. 1 to 8 and can be modified in various ways without departing from its gist. For example, although each specific kind of error processing is performed individually in this example, they may be combined and performed together. That is, the processing of steps 701 and 702 in FIG. 7 (mail notification) may be performed after the error processing of step 602 in FIG. 6 (setting the logging data) or the error processing of step 801 in FIG. 8 (switching the main switch off and on).

[0067] Although this example has been described using a groupware server, the method can also be applied to the management of database servers, file servers, and mail servers. In addition, although a LAN is used as the network in this example, a WAN may be used instead.

[0068] In this example, the status file 18 of FIG. 4 is recorded as a file in text format (character codes), but any other format that can be read by a wide range of applications may be used. Instead of character codes, bit flags, byte flags, image flags, or the like may also be used.
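As one possible illustration of the bit-flag alternative, the sketch below packs the four results into a single integer using Python's `enum.IntFlag`. The bit assignments, the convention that a set bit means "OK", and the names `Status`, `encode`, and `failed` are all assumptions; the text does not define a flag layout.

```python
from enum import IntFlag


class Status(IntFlag):
    # Hypothetical bit assignment: a set bit means the check was "OK".
    HARDWARE = 1
    NETWORK = 2
    SERVER_APPLICATION = 4
    OS = 8


def encode(results):
    """Pack a mapping of column name to bool into one integer."""
    flags = Status(0)
    for name, ok in results.items():
        if ok:
            flags |= Status[name.upper()]
    return int(flags)


def failed(flags):
    """List the columns whose bit is not set, in bit order."""
    return [member.name.lower() for member in Status if not (Status(flags) & member)]
```

Under this layout a fully normal system is the value 15, and any other value identifies the failed checks by the bits that are missing.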

[0069] In this example, the status file 18 is embedded in the logging file of the running server application, but any logging file may be used as long as it can be accessed easily. Also, in this example a UPS is provided and, when the operating system is abnormal, the function of the UPS is used to control the power of the server (information processing apparatus 1d) off and on; instead, the information processing apparatus 1d may be rebooted (restarted) using its own functions.

[0070]

[Effects of the Invention] According to the present invention, in addition to detecting failures of the server machine itself by POST and failures of the network function by a network monitoring protocol (SNMP), abnormality detection, such as whether all the tasks required to start the server application are running, is also performed at system startup. Because the method monitors as a whole that the server machine itself is operating, that the network function is operating, and that the server application is functioning, and detects and manages failures for the system as a whole, an abnormality in a server application that appears to be operating normally can be detected and dealt with promptly. This supports the efficient operation of server systems, such as groupware, that manage a network and a database in an integrated manner.

[0071] In particular, for each of the hardware, the network, the application, and the operating system that are subject to abnormality detection, the identification information of every function required for it to operate normally is registered in advance in text format as "normal data". The identification information of each function started when the power switch is turned on is recorded as "current data". The "current data" is compared with the "normal data", and an abnormality is detected if even one item does not match, so abnormality can be determined without being affected by the order of the function items.
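The order-independent comparison of "normal data" and "current data" can be illustrated with set difference in Python. This is a sketch under assumptions: the function name `find_missing` and the sample identifiers are hypothetical, and only required functions that are absent from the current data are treated as a mismatch (extra running functions are ignored), which the text does not state explicitly.

```python
def find_missing(normal_data, current_data):
    """Return, per category, the required identifiers absent from current data.

    Both arguments map a category name (for example "application") to an
    iterable of function identifiers. An empty result means no abnormality.
    """
    missing = {}
    for category, required in normal_data.items():
        absent = set(required) - set(current_data.get(category, ()))
        if absent:
            missing[category] = absent
    return missing
```

Because sets carry no ordering, the same result is obtained whatever order the functions happened to start in.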

[0072] In addition, processing appropriate to the content of the detected failure is performed automatically. For example, if the server application is abnormal, it is automatically restarted, so the startup failures that often occur when a server application is started are resolved automatically and the reliability of the server system can be improved.

[0073] Further, because the content of the failure is embedded in the logging database file of the server application, the situation can be checked by accessing that logging file from a client machine rather than from the server machine. The server administrator no longer needs to go to the installation location of the server machine, which reduces the workload.

[0074] If the failure is other than a network function failure, it is reported using the e-mail function, which makes it easier for the server administrator to detect the failure and speeds up the response to it.

[0075] If the failure is a failure of an OS module used by the server application, the main switch (power switch) of the server machine is controlled off and on to restart the server machine itself, and the system lock is released automatically, which reduces the workload of the server administrator.

[Brief description of the drawings]

FIG. 1 is a flowchart showing an example of the processing operation according to the server management method of the present invention.

FIG. 2 is a block diagram showing a configuration example of a server machine that performs the processing operation according to the server management method of the present invention.

FIG. 3 is a block diagram showing a configuration example of a client-server system that performs the processing operation of FIG. 1.

FIG. 4 is an explanatory diagram showing a configuration example of the status file created by the server machine of FIG. 3.

FIG. 5 is a flowchart showing a first example of the error processing operation in the server management method of the present invention.

FIG. 6 is a flowchart showing a second example of the error processing operation in the server management method of the present invention.

FIG. 7 is a flowchart showing a third example of the error processing operation in the server management method of the present invention.

FIG. 8 is a flowchart showing a fourth example of the error processing operation in the server management method of the present invention.

FIG. 9 is a flowchart showing an example of conventional server management control.

[Explanation of symbols]

1: server machine, 1a: display device, 1b: input device, 1c: external storage device, 1d: information processing apparatus, 1e: optical disk, 1f: drive device, 1g: UPS (uninterruptible power supply), 2 to 4: clients, 5: LAN, 10: CPU, 11: main memory, 12: server management processing unit, 13: POST processing unit, 14: network test processing unit, 15: application test processing unit, 16: OS test processing unit, 17: result recording processing unit, 18: status file, 19: verification processing unit, 20: error processing unit.

Claims

[Claims]

1. A server management method for detecting an abnormality of a server constituting a client-server system, characterized in that, when a power switch is turned on, detection of a hardware abnormality by a self-diagnosis unit, detection of a network function abnormality by a network management unit, detection of an application startup abnormality, and detection of an operating system abnormality are performed, the result of each detection is recorded, and an error is determined if any one of the detection results is abnormal.

2. The server management method according to claim 1, characterized in that identification information of all functions required for normal operation of each of the hardware, the network, the application, and the operating system is registered in advance in text format in association with each of them, the identification information of each function started when the power switch is turned on is recorded, the recorded identification information is compared with the previously registered identification information, and an error is detected if they do not match.

3. The server management method according to claim 1, characterized in that, when an error is determined, which of the detection results is abnormal is confirmed based on the record, and if only the application is abnormal, the application is shut down and restarted.

4. The server management method according to any one of claims 1 to 3, characterized in that, when an error is determined, each of the recorded detection results is written to a logging file.

5. The server management method according to any one of claims 1 to 4, characterized in that, when an error is determined, which detection result is abnormal is confirmed based on the record, and if the network function is normal, abnormality occurrence information is sent by electronic mail to a preset mail address.

6. The server management method according to any one of claims 1 to 5, characterized in that, when an error is determined, which detection result is abnormal is confirmed based on the record, and if the operating system is abnormal, the power switch is controlled off and on to restart the server.