JP2007265333A - Operation restoration support system

Info

Publication number: JP2007265333A
Application number: JP2006092977A
Authority: JP
Inventors: Naoya Fujimoto; 直也藤本
Original assignee: Hitachi Software Engineering Co Ltd
Current assignee: Hitachi Software Engineering Co Ltd
Priority date: 2006-03-30
Filing date: 2006-03-30
Publication date: 2007-10-11

Abstract

PROBLEM TO BE SOLVED: To map the system configuration and environment of a business operation management system onto another system at a remote location, so that when a failure occurs in the original system, operation continues automatically without changing the business definitions, even on machines whose host names differ from those of the original system, and business can be resumed quickly and easily.

SOLUTION: This operation restoration support system consists of a business operation management system, made up of a manager and a plurality of agents, and a standby business operation management system that takes over its functions when a failure occurs. Each host of the active system is mapped to a host of the standby system, and when an active host fails, its work is executed instead by the mapped standby host. When the configuration of the active or standby system changes, the mapping is set so that the load on the standby hosts is evenly distributed, and the configuration management information is updated.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a business recovery support system that enables business operations run on computers to be recovered quickly at a remote location when a system failure occurs.

A business operation management system that runs on computers is exposed to the risk of system failures from a variety of causes. For this reason, failure-response systems have been built for systems that run a company's core business.

In recent years, failure-response systems have gone beyond backing up important data to include redundant (cluster) configurations. For wide-area disasters as well, remote disk mirroring and data replication have made it possible to rebuild the system at another site and resume operations even when an entire site is destroyed.

The technique described in Patent Document 1 below uses a failure monitoring device to manage automatic failure recovery in a computer system with a redundant configuration in an integrated manner, and performs the recovery operation automatically according to the type of failure that occurred, thereby reducing the man-hours needed for system maintenance.

As a method for a remote data shadowing system that replicates data to a remote location, the one described in Patent Document 2 below is known, for example.

Patent Document 1: JP 2003-114811 A
Patent Document 2: JP-A-7-244597

However, although conventional techniques can, once configured, automatically transfer or save the necessary data to a remote location, they can hardly be said to make it quick and easy to rebuild the system configuration and recover after a failure. Rebuilding the system configuration requires either that the same system configuration as the original be prepared in advance, or that an environment able to run the same business as the original system be rebuilt by hand.

The technique of Patent Document 2 transfers data to a remote location; by itself it does not make it possible to restart the system there. The technique of Patent Document 1 can recover system components automatically according to the type of failure, but for the computer's OS it presupposes a redundant (cluster) configuration, so it cannot handle business recovery on an entirely different computer at a remote location. For example, when business operations are managed in a manager/agent configuration, the business definition information contains information on the agent machine that executes each business. To run the same business on another system, it must therefore be able to run under the host name (computer name) specified in the business definition. This requires preparing a machine environment identical to that of the original system. Otherwise, when business is resumed on another system, either the communication settings between machines must be changed so that each business runs without problems, or every business definition must be changed so that it can run on the other system.

An object of the present invention is to provide a technique that maps the system configuration and environment of a business operation management system onto another system, for example one at a remote location, so that when a failure occurs in the original system, operation continues automatically without changing the business definitions, even on machines whose host names (computer names) differ from those of the original system, and business can be resumed quickly and easily.

To achieve the above object, the present invention applies to a business operation management system that comprises a manager host and any number of agent hosts managed by that manager host, in which the agent hosts execute the actual business in response to requests from the manager host. For such a system, the invention provides a standby business operation management system that also comprises a manager host and any number of agent hosts managed by it, and that performs substitute operation when a failure occurs and the business operation management system stops. A configuration management table is prepared that defines in advance which manager host and which agent host of the standby business operation management system takes over when a failure occurs in the manager host or an agent host of the business operation management system. A host name resolution table is also prepared that stores the correspondence between the host names of the manager host and agent hosts of the business operation management system and the IP addresses identifying the hosts of the standby business operation management system that substitute for them. When the manager host of the business operation management system is detected to have stopped, the manager host of the standby business operation management system that substitutes for it is assigned from the configuration management table, and business continues. When an agent host of the business operation management system is detected to have stopped, the agent host of the standby business operation management system that substitutes for it is assigned from the host name resolution table, and business continues.

When a new agent host is added on the manager host of the business operation management system, the configuration management information is preferably updated automatically and the standby business operation management system is notified. The manager of the standby business operation management system then sets the configuration definitions so that, if the added agent stops, its business is executed instead by the agents of the standby business operation management system, with the load distributed automatically among them. Likewise, when a new agent host is added on the manager host of the standby business operation management system, the configuration management information is preferably updated automatically, the existing agent host that is defined as the substitute for the largest number of agent hosts is found, and the configuration definitions are set so that load is automatically redistributed to the newly added agent host.

According to the present invention, the system configuration and environment of a business operation management system are automatically mapped onto another system, for example one at a remote location. Operation can continue automatically, without changing configuration information such as host names or the business definitions, even on machines whose host names (computer names) differ from those of the original system, so business can be resumed quickly and easily even after a large-scale failure.

An embodiment of the present invention is described below in detail with reference to the drawings.

FIG. 1 is a block diagram showing the schematic configuration of a system according to one embodiment of the present invention. The remote business recovery support system of this embodiment consists of an active system 1 and a standby system 2. The active system 1 consists of site 3 and site 4. Site 3 has a manager 6, which holds the business definition information and controls business execution, and agents 7 and 8, which execute the actual business at the request of manager 6. Site 4 has an agent 9, which executes the actual business at the request of manager 6. The standby system 2 consists of site 5. Site 5 has a manager 10, which takes over and manages the business when manager 6 of the active system 1 fails, and agents 11 and 12, which take over and execute the business when agent 7, 8, or 9 of the active system 1 fails. The sites are connected by a network 13 (for example, a WAN (Wide Area Network)). Each manager and agent can be regarded as one machine.

FIG. 2 is a block diagram showing the mapping between the active system and the standby system. A mapping is a definition that associates each active machine with the standby machine that takes over its business when it fails. In the example of FIG. 2, manager 21 of the active system (6 in FIG. 1) is mapped to manager 25 of the standby system (10 in FIG. 1). If manager 21 stops because of a failure, manager 25 takes over and continues the business. Similarly, agent 22 (7 in FIG. 1) and agent 24 (9 in FIG. 1) are mapped to agent 26 (11 in FIG. 1), and agent 23 (8 in FIG. 1) is mapped to agent 27 (12 in FIG. 1). When any of these agents stops because of a failure, the manager requests an agent of the standby system to execute the business. Agent 26 is mapped to two active agents, 22 and 24; if both stop, agent 26 executes the business of both.

FIG. 3 shows the structure of the configuration management table that manages the mapping of FIG. 2. The table defines the host name 31 in the active system, the host name 32 of the standby host mapped to that active host, and the IP address 33 of the standby host. It also defines a type 34, indicating whether the host is a manager or an agent, and, for a manager, the start mode 35 of the standby system. The table is created in advance by an administrator and stored on both the active manager 21 and the standby manager 25. Updates to the table are made on the active manager 21 at the administrator's instruction. When the table in the active manager 21 is updated, the update is transferred to the standby manager 25, and the table there is updated in the same way.
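The table of FIG. 3 can be pictured as a list of records keyed by active host name. The following is a minimal Python sketch under assumptions: the class name, field names, helper functions, IP addresses, and the start mode value are all illustrative and do not come from the patent; only the host-to-host mapping follows FIG. 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConfigRecord:
    active_host: str           # host name 31 in the active system
    standby_host: str          # host name 32 of the mapped standby host
    standby_ip: str            # IP address 33 of the standby host
    kind: str                  # type 34: "manager" or "agent"
    start_mode: Optional[str]  # start mode 35 ("auto"/"manual"), managers only


# Mapping of FIG. 2. IP addresses and the start mode are illustrative placeholders.
CONFIG_TABLE = [
    ConfigRecord("manager21", "manager25", "2.0.0.1", "manager", "auto"),
    ConfigRecord("agent22", "agent26", "2.0.0.2", "agent", None),
    ConfigRecord("agent23", "agent27", "2.0.0.3", "agent", None),
    ConfigRecord("agent24", "agent26", "2.0.0.2", "agent", None),
]


def standby_for(table, active_host):
    """Return the record that names the standby host for active_host, or None."""
    return next((r for r in table if r.active_host == active_host), None)


def covered_by(table, standby_host):
    """List the active hosts whose work a given standby host would take over."""
    return [r.active_host for r in table if r.standby_host == standby_host]
```

With this data, `covered_by(CONFIG_TABLE, "agent26")` lists both agent 22 and agent 24, which reflects the many-to-one mapping described for FIG. 2.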

When business is switched to the standby system, the data in the configuration management table is read. If the start mode is "automatic", the business operation management system is started on the standby system and business is then resumed automatically, based on the business data that has been transferred to the standby system. If the start mode is "manual", the business operation management system is started on the standby system and business is left suspended. In that case, the execution status of the business is checked, for example in the execution log in the business data, and business is resumed manually. The switchover procedure is described in detail later.

FIG. 4 is a block diagram showing the transfer of business data between the active and standby systems. Manager 43 (6 in FIG. 1) of the active system 41 (for example, 1 in FIG. 1) is connected to a database 45 that stores business definition data, business execution status data, business execution result data, configuration management data, and so on. When database 45 is updated, the replication function of the database software reflects the data in database 45 into database 46, which is connected to manager 44 (10 in FIG. 1) of the standby system 42 (2 in FIG. 1). As a result, during business operation, database 46 in the standby system 42 holds the same data as database 45 in the active system 41.

FIG. 5 shows the structure of the host name resolution table, the data table used for host name resolution. It stores the host name 51 in the active system, the corresponding IP address 52, a flag area 53 holding a flag that indicates whether the host is currently in use (in the example of FIG. 5, 0 means in use and 1 means unused), and the priority 54 of the IP address. While the active system is operating normally, among the records whose host name 51 is that of a normally operating active host, the record whose IP address 52 is the IP address of that active host has flag area 53 set to 0 (in use). In the other records with the same host name, IP address 52 is the IP address of a standby host (one mapped to that active host) and flag area 53 is 1 (unused). Communication between the manager and the agents resolves addresses according to this table. When the manager resolves the address of the execution-target agent set in a job's definition information, it reads the record whose host name 51 is the target agent and whose flag area 53 is 0 (in use), and communicates using the IP address 52 in that record. When communicating with an agent, the manager tells the agent the manager's host name and IP address. Here too it consults the host name resolution table: it reads the record whose IP address 52 is the same as its own and whose flag area 53 is 0 (in use), and reports the host name 51 and IP address 52 in that record to the agent. The agent uses this host name and IP address to communicate with the manager.
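The two lookups just described (host name to IP address for outgoing requests, and own IP address to reported identity) can be sketched as follows. This is a minimal illustration in Python, not the patent's implementation; the record class and the function names `resolve` and `identity_for` are hypothetical.

```python
from dataclasses import dataclass

IN_USE, UNUSED = 0, 1  # flag values used in FIG. 5


@dataclass
class ResolutionRecord:
    host_name: str  # host name 51
    ip: str         # IP address 52
    flag: int       # flag area 53
    priority: int   # priority 54


def resolve(table, host_name):
    """Return the IP address of the in-use record for host_name, or None."""
    for rec in table:
        if rec.host_name == host_name and rec.flag == IN_USE:
            return rec.ip
    return None


def identity_for(table, own_ip):
    """Return the (host name, IP) pair a manager at own_ip reports to agents."""
    for rec in table:
        if rec.ip == own_ip and rec.flag == IN_USE:
            return rec.host_name, rec.ip
    return None
```

Because job definitions refer only to host names, flipping the flags in the table is enough to redirect traffic; the definitions themselves stay untouched.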

For example, the first record of the host name resolution table in FIG. 5 (assuming the active system is operating normally) has host name 51 "manager 21", IP address 52 "1.0.0.1", flag area 53 "0", and priority 54 "1"; it represents the active host that is currently operating normally. The next record has host name 51 "manager 21", IP address 52 "2.0.0.1", flag area 53 "1", and priority 54 "2"; it represents the standby host that is mapped to the active host with IP address "1.0.0.1" and takes over as "manager 21" when that active host fails. When the active host fails, the records for "manager 21" are read, flag area 53 of the active host's record is changed to 1 (unused), and flag area 53 of the standby host's record is changed to 0 (in use). Because the standby host's record is now in use, subsequent communication resolves the host name "manager 21" to the standby host's IP address, "2.0.0.1".

Priority 54 indicates the order in which standby hosts take over when a failure occurs (a lower value means a higher priority). When an active host fails, the IP address 52 with the highest priority 54 is obtained from among the standby IP addresses 52 that have the same host name 51 as the failed host, and the standby host at that IP address takes over from the failed active host. Flag area 53 of the record for the standby host that took over is then updated from 1 (unused) to 0 (in use).
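The flag swap driven by priority can be sketched in a few lines. This Python sketch makes simplifying assumptions: records are plain dictionaries with hypothetical keys, the function name `fail_over` is invented for illustration, and no liveness check is made on the standby host (the flowcharts add that check).

```python
IN_USE, UNUSED = 0, 1  # flag values used in FIG. 5


def fail_over(table, host_name):
    """Move the in-use flag for host_name to the next-priority record.

    Returns the IP address now in use, or None if no further record exists.
    """
    records = sorted((r for r in table if r["host"] == host_name),
                     key=lambda r: r["priority"])
    current = next((r for r in records if r["flag"] == IN_USE), None)
    if current is None:
        return None
    later = [r for r in records if r["priority"] > current["priority"]]
    if not later:
        return None
    current["flag"] = UNUSED
    later[0]["flag"] = IN_USE
    return later[0]["ip"]
```

Applied to the two "manager 21" records of FIG. 5, one call moves the in-use flag from "1.0.0.1" to "2.0.0.1"; a second call finds no lower-priority record and leaves the table unchanged.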

FIG. 6 is a flowchart outlining the recovery process in the standby manager when the manager of the active system stops.

The standby manager periodically communicates with the active manager (the one mapped to it, as described for FIG. 2) to check whether the active manager is running. If the active manager is running, the process ends; if it has stopped, the process continues to the next step (step 601). When the active manager has stopped, the standby manager reads the configuration management table described for FIG. 3 (step 602). It reads the records in the host name resolution table of FIG. 5 whose host name 51 is the manager's host name, sets flag area 53 of the active host's record to 1 (unused), and sets flag area 53 of the standby host's record to 0 (in use). From then on, the manager host IP address that the manager reports during communication is that of the standby host (step 603). The standby manager then checks whether the start mode 35 is automatic or manual in the configuration management table record whose active host name 31 is the stopped manager (step 604). If it is automatic, the standby manager starts the business system and resumes business (step 611).

If the start mode is manual in step 604, the first of the active-system agents registered under active host name 31 in the configuration management table of FIG. 3 becomes the first agent to be processed, and the process proceeds to step 605. The host name resolution table of FIG. 5 is consulted, and among the records with the same host name 51 as the agent being processed, the one with the highest priority 54 (the smallest value) becomes the first target record (step 605). Next, the standby manager checks whether the agent identified by IP address 52 of the target record is stopped or running (step 606). If it is stopped, flag area 53 of the target record is set to 1 (unused) (step 607), and the host name resolution table of FIG. 5 is searched for the record with the next priority after the target record, among those with the same host name 51 as the agent being processed (step 608). If there is one, it becomes the new target record and the process returns to step 606 (step 609). If step 609 finds no record with the next priority, no running host is available to serve as the agent being processed, so an error log is output (step 610) and the process ends.

If the agent identified by IP address 52 of the target record is running in step 606, flag area 53 of the target record is set to 0 (in use). When the manager later resolves the IP address from the host name of a job's execution-target agent, this record is referenced, and the job's execution target switches to the standby agent (step 612). The process then determines whether any of the active agents registered under active host name 31 in the configuration management table of FIG. 3 has not yet been processed (step 613). If so, that agent becomes the new agent to be processed and the process returns to step 605. If not, all agents registered under active host name 31 in the configuration management table of FIG. 3 have been processed, so the business system is started in standby mode (step 611).

Starting the business system in standby mode means starting the system and waiting while business execution is held back. In the automatic case, the process goes from step 604 to step 611, where the business system is started in automatic mode and business resumes immediately. In the manager's records in the host name resolution table of FIG. 5, flag area 53 of the stopped active manager must be set to 1 (unused) and flag area 53 of the standby manager that took over must be set to 0 (in use); this is done as part of the system start processing in step 611.
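The flow of FIG. 6 can be condensed into a short sketch. The Python below is illustrative only and rests on several assumptions: records are dictionaries with hypothetical keys, `is_alive` and `log` are caller-supplied callables standing in for the health check and the error log, the returned status strings are invented, and the manager flag swap is performed once at step 603 (the text also mentions it as part of step 611).

```python
IN_USE, UNUSED = 0, 1  # flag values used in FIG. 5


def switch_host(resolution, host_name, is_alive):
    """Steps 605-609 and 612: select the highest-priority running host."""
    records = sorted((r for r in resolution if r["host"] == host_name),
                     key=lambda r: r["priority"])
    for rec in records:
        if is_alive(rec["ip"]):      # step 606
            rec["flag"] = IN_USE     # step 612
            return rec["ip"]
        rec["flag"] = UNUSED         # step 607, then next priority (608-609)
    return None


def recover_manager(config, resolution, is_alive, log):
    """Run on the standby manager; returns a short status string."""
    mgr = next(c for c in config if c["kind"] == "manager")
    mgr_records = [r for r in resolution if r["host"] == mgr["active"]]
    in_use = next(r for r in mgr_records if r["flag"] == IN_USE)
    if is_alive(in_use["ip"]):                          # step 601
        return "active manager running"
    for r in mgr_records:                               # step 603
        r["flag"] = IN_USE if r["ip"] == mgr["standby_ip"] else UNUSED
    if mgr["start_mode"] == "auto":                     # step 604
        return "resumed"                                # step 611, automatic
    for c in config:                                    # loop closed by step 613
        if c["kind"] != "agent":
            continue
        if switch_host(resolution, c["active"], is_alive) is None:
            log("no running host for " + c["active"])   # step 610
            return "error"
    return "standby"                                    # step 611, standby mode
```

Passing the health check in as a callable keeps the sketch free of any networking; a real system would presumably probe each host over the WAN.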

In summary, in the manager recovery process, automatic start brings the business system up and resumes business immediately, while manual start checks the state of each agent, changes the configuration, starts the business system, and leaves it waiting for business to be resumed.

With automatic start, the running/stopped check for each agent and the configuration change are carried out when business actually resumes and a business execution request is made to the agent concerned, using the same processing as for manual start (steps 604 to 610, 612, and 613).

FIG. 7 is a flowchart outlining the recovery process when an agent of the active system stops. The process is started in the manager when the manager issues a business execution request to an agent and that agent is found to be stopped.

First, the host name resolution table of FIG. 5 is searched for the record whose IP address 52 matches the IP address of the stopped agent (step 701). Flag area 53 of that record is set to 1 (unused) (step 702). Next, the table is searched for the record whose host name 51 is the same as that of the stopped agent and whose priority 54 is the next highest after the stopped agent's (step 703). If there is no such record, no standby agent is available to take over from the stopped agent, so an error log is output and the process ends (steps 704 and 708). If there is, the manager checks whether the agent identified by IP address 52 of that record is running (step 705). If it is running, flag area 53 of the record is set to 0 (in use). When the manager later resolves the IP address from the host name of a job's execution-target agent, this record is referenced, and the job's execution target switches to the standby agent (step 706). The business execution request is then sent to the agent at that IP address (step 707). If the agent is stopped (step 705), the process returns to step 702.
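The loop of FIG. 7 can be sketched as below. This is a hedged Python illustration rather than the patent's code: the dictionary keys, the function name `recover_agent`, and the callables `is_alive`, `send_request`, and `log` are hypothetical. The sketch also takes the host name in addition to the stopped IP address, an assumption made because one standby IP address can appear under several host names.

```python
IN_USE, UNUSED = 0, 1  # flag values used in FIG. 5


def recover_agent(resolution, host_name, stopped_ip, is_alive, send_request, log):
    """Steps 701-708: re-route a job from a stopped agent to a running one."""
    same_host = [r for r in resolution if r["host"] == host_name]
    current = next(r for r in same_host if r["ip"] == stopped_ip)   # step 701
    while True:
        current["flag"] = UNUSED                                    # step 702
        later = [r for r in same_host if r["priority"] > current["priority"]]
        if not later:                                               # step 704
            log("no standby agent left for " + host_name)           # step 708
            return None
        candidate = min(later, key=lambda r: r["priority"])         # step 703
        if is_alive(candidate["ip"]):                               # step 705
            candidate["flag"] = IN_USE                              # step 706
            send_request(candidate["ip"])                           # step 707
            return candidate["ip"]
        current = candidate                                         # back to step 702
```

Each pass through the loop marks one more record unused, so the search terminates after at most as many iterations as there are records for the host name.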

In summary, in the agent recovery process, when an agent is stopped, the business is executed by another running agent mapped to it (assigned in descending order of priority). Notification of business execution results from an agent to the manager is handled in the same way, with the notification going to another running manager that is mapped.

FIG. 8 is a flowchart showing an outline of the configuration change process when an agent is added to the active system.

The active manager adds a new agent (step 801) and adds to the configuration management table of FIG. 3 a record that stores only the active host name 31 (step 802). As described with reference to FIG. 4, when a record in the configuration management table is added or changed on the active manager, the standby manager is notified. The standby manager adds the same record to its own copy of the configuration management table of FIG. 3 and reads, from the agent list table shown in FIG. 10, the entry whose current flag 104 is ON (step 803).

The agent list table of FIG. 10 is described here. The agent list table is held by the standby manager and has the fields agent ID 101, agent host name 102, IP address 103, current flag 104, and number of mapping agents 105. The agent ID 101 is a serial number in this embodiment. The agent host name 102 holds the standby host name. The IP address 103 is the IP address of the host named in the agent host name 102. The current flag 104 indicates the standby agent to be assigned next: when a standby agent is needed for a mapping, the agent whose current flag 104 is ON is used. Exactly one agent registered in the agent list table has its current flag 104 ON; the current flag 104 of every other agent is OFF. When the agent whose current flag 104 is ON is assigned as a standby agent, its current flag 104 is turned OFF and the current flag 104 of the agent with the next agent ID 101 is turned ON. After the last agent in the agent list table, assignment wraps around to the first agent, whose agent ID 101 is 1. The number of mapping agents 105 is the number of active agents that have this agent mapped as their standby.
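The round-robin use of the current flag 104 can be illustrated with a short Python sketch. The names `AgentEntry` and `assign_standby` are hypothetical, the list is assumed to be ordered by agent ID 101, and incrementing the number of mapping agents 105 at assignment time is an assumption, since the description does not say when that counter is raised.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AgentEntry:
    agent_id: int          # agent ID 101 (serial number)
    host_name: str         # agent host name 102
    ip_address: str        # IP address 103
    current: bool = False  # current flag 104
    mapped_count: int = 0  # number of mapping agents 105


def assign_standby(agents: List[AgentEntry]) -> AgentEntry:
    """Hand out the agent whose current flag is ON and advance the flag."""
    # Exactly one entry is expected to have its current flag ON.
    idx = next(i for i, a in enumerate(agents) if a.current)
    chosen = agents[idx]
    # Turn the flag OFF here and ON for the next ID, wrapping to the first.
    chosen.current = False
    agents[(idx + 1) % len(agents)].current = True
    # Assumption: the mapping count is raised when the agent is assigned.
    chosen.mapped_count += 1
    return chosen
```

Because the flag simply walks the table in ID order, each standby agent receives a new mapping in turn, which keeps the mapping counts within one of each other as long as agents are only added on the active side.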

Returning to FIG. 8, after step 803 the standby manager notifies the active manager of the host name 102 and IP address 103 of the agent whose current flag 104 is ON (step 804), turns that current flag 104 OFF, and turns ON the current flag 104 of the record with the next ID (step 805). Based on the agent information (host name and IP address) notified in step 804, the active manager updates the configuration management table of FIG. 3 and the host name resolution table of FIG. 5 (step 806). The updated information is then reflected to the standby manager as shown in FIG. 4.

As described above, when a new agent is added, it is automatically mapped to a standby agent and the mapping is reflected in the database. This configuration change function can also be configured so that it is performed only manually.

FIG. 9 is a flowchart showing an outline of the configuration change process when an agent is added to the standby system.

The standby manager adds a new agent (step 901) and adds a record for it to the agent list table of FIG. 10 (step 902). It reads from the agent list table the record with the largest number of mapping agents 105, decrements that value by 1, and sets the number of mapping agents 105 of the record added in step 902 to 1 (step 903). Next, it reads from the configuration management table of FIG. 3 the record for the agent whose number of mapping agents was decremented in step 903 and notifies the active manager of the data change (steps 904 and 905).

Based on the notified agent information, the active manager updates the mapped agent in the configuration management table and updates the corresponding IP address in the host name resolution table (step 906).
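The rebalancing in steps 903 to 906 can be sketched in Python as below. This is an illustration under assumptions rather than the described implementation: `StandbyAgent`, `ConfigRecord`, and `add_standby_agent` are hypothetical names, both managers' tables are collapsed into in-memory lists, the first matching configuration record is the one moved (the description does not say which active agent is chosen), and nothing is moved when the busiest standby agent has fewer than two mappings.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StandbyAgent:
    host_name: str         # agent host name 102
    ip_address: str        # IP address 103
    mapped_count: int = 0  # number of mapping agents 105


@dataclass
class ConfigRecord:
    active_host: str   # active host name 31
    standby_host: str  # standby host name 32
    standby_ip: str    # standby IP address 33


def add_standby_agent(agents: List[StandbyAgent],
                      config: List[ConfigRecord],
                      new_agent: StandbyAgent) -> Optional[ConfigRecord]:
    """Register a standby agent and move one mapping onto it."""
    # Step 903 (first half): find the most heavily mapped existing agent.
    donor = max(agents, key=lambda a: a.mapped_count) if agents else None
    # Steps 901 and 902: register the new agent in the agent list.
    agents.append(new_agent)
    if donor is None or donor.mapped_count < 2:
        return None  # assumption: nothing worth redistributing
    # Step 904: pick a configuration record currently mapped to the donor.
    moved = next((r for r in config
                  if r.standby_host == donor.host_name), None)
    if moved is None:
        return None
    # Step 903 (second half): shift one mapping from donor to new agent.
    donor.mapped_count -= 1
    new_agent.mapped_count = 1
    # Steps 905 and 906: the active manager re-points the mapping.
    moved.standby_host = new_agent.host_name
    moved.standby_ip = new_agent.ip_address
    return moved
```

The host name resolution table would be updated in the same step by replacing the donor's IP address with the new agent's address in the records for the moved active host; that part is omitted here to keep the sketch short.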

As described above, when a new agent is added to the standby system, the mapping to the active agents is updated, and the active agents mapped to a standby agent that has multiple agents mapped to it are automatically redistributed. This configuration change function can also be configured so that it is performed only manually.

In the above embodiment, the manager of the standby system detects that the manager of the active system has stopped and performs the process of FIG. 6. However, these processes may instead be performed by a separate device that has functions such as detecting that a manager has stopped and starting the system. The same applies to the process of FIG. 7.

FIG. 1 is a block diagram showing the configuration of an embodiment of the present invention.
FIG. 2 is a block diagram showing the mapping configuration between the active system and the standby system.
FIG. 3 is a data structure diagram for managing the mapping configuration.
FIG. 4 is a block diagram showing the transfer of business data between the active system and the standby system.
FIG. 5 is a data structure diagram for host name resolution.
FIG. 6 is a flowchart showing an outline of the recovery process when the manager of the active system stops.
FIG. 7 is a flowchart showing an outline of the recovery process when an agent of the active system stops.
FIG. 8 is a flowchart showing an outline of the configuration change process when an agent is added to the active system.
FIG. 9 is a flowchart showing an outline of the configuration change process when an agent is added to the standby system.
FIG. 10 is a data structure diagram for managing the agent list.

Explanation of symbols

1 ... active system, 2 ... standby system, 3, 4, 5 ... sites, 6 ... active system manager, 7, 8, 9 ... active system agents, 10 ... standby system manager, 11, 12 ... standby system agents, 13 ... public network, 21 ... active system manager, 22, 23, 24 ... active system agents, 25 ... standby system manager, 26, 27 ... standby system agents, 31 ... active host name, 32 ... standby host name, 33 ... standby IP address, 34 ... type, 35 ... startup mode, 41 ... active system, 42 ... standby system, 43 ... active system manager, 44 ... standby system manager, 45 ... business operation management system DB of the active system, 46 ... business operation management system DB of the standby system, 51 ... host name, 52 ... IP address, 53 ... flag area, 54 ... priority, 101 ... agent ID, 102 ... agent host name, 103 ... IP address, 104 ... current flag, 105 ... number of mapping agents.

Claims

A business recovery support system that supports business recovery when a failure occurs in a business operation management system comprising a manager host and an arbitrary number of agent hosts managed by the manager host, in which the agent hosts execute actual business in response to requests from the manager host, the business recovery support system comprising:
a standby business operation management system, comprising a manager host and an arbitrary number of agent hosts managed by that manager host, that performs business on behalf of the business operation management system when the business operation management system has stopped;
means for storing in advance a configuration management table that defines which manager host and which agent host of the standby business operation management system perform business on behalf of the manager host and agent hosts of the business operation management system when a failure occurs in them;
means for storing a host name resolution table that stores the correspondence between the host names of the manager host and agent hosts of the business operation management system and the IP addresses identifying the hosts of the standby business operation management system that substitute for those hosts;
means for, upon detecting that the manager host of the business operation management system has stopped, assigning from the configuration management table a manager host of the standby business operation management system to substitute for the stopped manager host; and
means for, upon detecting that an agent host of the business operation management system has stopped, assigning from the host name resolution table an agent host of the standby business operation management system to substitute for the stopped agent host, and continuing the business.