JP2007133795A - Cluster-structured business system

Info

Publication number: JP2007133795A
Application number: JP2005328252A
Authority: JP
Inventors: Takayuki Hamada; 貴之浜田; Tatsuro Ueda; 達郎植田; Junsuke Fujii; 淳介藤井; Koichi Morita; 宏一森田
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2005-11-14
Filing date: 2005-11-14
Publication date: 2007-05-31

Abstract

PROBLEM TO BE SOLVED: To avoid a situation in a cluster-structured queue system in which a failure of a server having a queue cannot be detected because the server that detects server failures has itself failed, so that the messages left in the queue cannot be taken over.

SOLUTION: In a cluster-structured queue system in which a plurality of servers share a queue, a message management program that detects a failure of another server spontaneously determines whether to take over the messages managed by the failed server. The message management program that decides to take over does so by updating the manager ID of the "unprocessed" messages in the queue that were managed by the failed server to its own ID and by storing copies of those messages in its own message cache area.

COPYRIGHT: (C)2007, JPO&INPIT

Description

The present invention relates to a business system that executes a predetermined business, specifically a cluster-structured queue system in which a plurality of servers share a single queue, and to a technique for preventing business processing requests from stalling without being processed. Such business systems include financial systems.

For cluster-structured queue systems that use queues, Patent Document 1 describes a method by which, when a failure occurs in some of the servers, another server takes over the processing that the failed server was performing. In Patent Document 1, a server that detects server failures notifies a server having a queue with the same name as that of the failed server of the failed server's status, and the notified server takes over the failed server's message processing as the takeover server. Further, in Patent Document 1, if the other-server reference state of the failed server's queue is "being referenced" while the failed server is restarting, the failed server waits without using the queue and begins using it only after the other-server reference state becomes "reference ended". This prevents the failed server and the takeover server from both processing the messages remaining in the failed server's queue.

Patent Document 1: JP 2004-86543 A

However, in the above prior art, if the server that detects server failures itself fails, a failure of a server having a queue can no longer be detected, and the messages remaining in the queue cannot be taken over. In addition, if the other-server reference state of the failed server's queue is "being referenced", the queue remains unusable even though the failed server has restarted, which makes immediate recovery from a system failure difficult.

Therefore, in the present invention, the plurality of servers that manage the queue confirm one another's survival, and a server that detects a failure of another server determines on its own, according to predetermined information, whether it is the takeover server. A server that determines that it is the takeover server takes over the messages that the failed server was managing. Further, in the present invention, the plurality of servers in the cluster configuration share a queue, and when a server registers a message in the queue, it assigns to the message an ID that identifies the message as one that the server itself manages. The takeover server takes over the messages by updating the ID assigned to the messages that the failed server was managing to its own ID. When the failed server restarts, it can use the shared queue.

According to the present invention, the plurality of servers constituting the cluster confirm one another's survival, and a server that detects a failure of another server can determine on its own whether to take over the failed server's messages. This prevents a failure of any particular server from making message takeover impossible. In addition, when the failed server restarts, it can use the queue immediately, which enables immediate recovery from a system failure.

Embodiments of the present invention will be described with reference to the drawings.
First, FIG. 1 shows the system configuration of the present embodiment. Server device 1 (11), server device 2 (21), and server device 3 (31) have the same configuration, and each is connected to server device 4 (41) via a network. Server device 1 (11), server device 2 (21), and server device 3 (31) are also connected to one another via the network so that they can exchange telegrams (54) for the purpose of confirming one another's survival. On server device 1 (11), a program (12) that performs message input processing, a program (13) that executes the business processing requested by a message, and a program (14) that manages messages operate. As shown in FIG. 2, the message management program (14) has an ID (15) that uniquely identifies it within the system, a message cache area (16), which is an area in memory that holds copies of messages, and a takeover information management table (17), which records the information used for message takeover when a server fails. The same programs as on server device 1 (11) operate on server device 2 (21) and server device 3 (31). On server device 4 (41), a DBMS (42) operates and implements a DB queue (43) that stores messages. The DB queue (43) is in substance a table of the DBMS (42).
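The per-server state of the message management program described above can be sketched as a small data structure. This is a minimal illustration in Python and not the patented implementation. The class name `MessageManager`, the field names, and the server IDs are hypothetical, and the layout assumed for the takeover information management table (a mapping from a failed server's ID to successor IDs in priority order) is an assumption, since the text does not give the contents of FIG. 8.

```python
from dataclasses import dataclass, field


@dataclass
class MessageManager:
    """Hypothetical model of the state held by a message management program (14)."""

    # ID (15): uniquely identifies this program within the system.
    manager_id: str
    # Message cache area (16): in-memory copies, message ID -> message body.
    cache: dict = field(default_factory=dict)
    # Takeover information management table (17). Assumed layout:
    # failed server ID -> list of successor IDs, highest priority first.
    takeover_table: dict = field(default_factory=dict)


# Three identically configured servers, as in FIG. 1 (IDs are made up).
managers = [MessageManager(f"server-{n}") for n in (1, 2, 3)]
```

Each instance gets its own cache and table because `default_factory` builds a fresh dictionary per object.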

The basic processing flow will be described with reference to FIG. 1. As described above, server device 1 (11), server device 2 (21), and server device 3 (31) have the same configuration and confirm one another's survival. The server devices, and the programs operating on them, have identical functions and operate in parallel. The basic processing flow is described below using server device 1 (11) as an example. The message input program (12) is assumed to be, for example, a program that accepts requests from a Web browser and generates messages, or a batch program that generates messages in bulk from input such as a CSV file. The message input program (12) transmits a generated message to the message management program (14) (51) and requests that the message be registered in the DB queue (43). The message management program (14) registers (52) the received message in the DB queue (43). The message management program (14) then transmits the message registered in the DB queue (43) to the message processing program (13) (53) and requests that the message be processed. The message processing program (13) executes the business processing in accordance with the message.

The processing of the present embodiment is described below with reference to FIGS. 4, 5, and 6. Server device 1 (11) is used as the example, but the same processing applies to server device 2 (21) and server device 3 (31). Elements other than processing steps are referred to by the reference numerals used in FIGS. 1, 2, 7, and 8.

First, message input processing will be described with reference to FIG. 4. The message input program (12) accepts a request from a Web browser, input from a CSV file, or the like, generates a message, and transmits the message to the message management program (14) (101). The message management program (14) holds a copy of the message transmitted from the message input program (12) in its own message cache area (16) (102) and then registers the message in the DB queue (43) (103). A message registered in the DB queue (43) is in substance one record of the DBMS (42) table that constitutes the DB queue (43), and has the items shown in FIG. 7. The message ID (61) uniquely identifies the message within the system; the message management program (14) generates and assigns a system-wide unique ID when it registers the message in the DB queue (43). The manager ID (62) uniquely identifies the manager of the message within the system and is set to the ID (15) of the message management program (14) that registered the message in the DB queue (43). The status (63) is a value representing the processing state of the message and is set to "unprocessed" when the message is registered. The message body (64) is the message itself as received from the message input program (12). The copy of the message held in the message cache area (16) has the same message body (64) and message ID (61) as the message registered in the DB queue (43).
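Steps 102 and 103 above can be illustrated with a short Python sketch. It is an illustration under stated assumptions and not the implementation in the patent: an in-memory SQLite database stands in for the DBMS (42), and the table name `db_queue`, the column names, the function names, and the use of `uuid4` for message IDs are all hypothetical. The sketch generates the message ID before caching so that the cached copy and the queue record share it.

```python
import sqlite3
import uuid


def create_db_queue(conn):
    """Create the table standing in for the DB queue (43); columns follow FIG. 7."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS db_queue ("
        "message_id TEXT PRIMARY KEY, "   # message ID (61)
        "manager_id TEXT NOT NULL, "      # manager ID (62)
        "status TEXT NOT NULL, "          # status (63)
        "body TEXT NOT NULL)"             # message body (64)
    )


def register_message(conn, cache, manager_id, body):
    """Cache a copy of the message (102), then register it in the queue (103)."""
    message_id = uuid.uuid4().hex  # assumed scheme for a system-wide unique ID
    cache[message_id] = body
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO db_queue VALUES (?, ?, 'unprocessed', ?)",
            (message_id, manager_id, body),
        )
    return message_id


conn = sqlite3.connect(":memory:")
create_db_queue(conn)
cache = {}
new_id = register_message(conn, cache, "server-1", "transfer 100")
```

After the call, the cache and the queue record hold the same message ID and body, and the record carries the registering server's ID as its manager.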

Next, message processing will be described with reference to FIG. 5. The message management program (14) takes one message copy from its own message cache area (16) and transmits it to the message processing program (13) (201). The message processing program (13) executes the business processing in accordance with the received message and, when the business processing ends, notifies the message management program (14) to that effect (202). On receiving the end notification from the message processing program (13), the message management program (14) updates the status (63) of that message in the DB queue (43) to "processed" (203) and then deletes the copy of the message from its own message cache area (16) (204).
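The processing steps 201 to 204 can be sketched as follows. This is a hedged Python illustration, not the patented implementation: SQLite stands in for the DBMS (42), the table and column names are hypothetical, and a plain callable `handler` stands in for the message processing program (13), with its normal return treated as the end notification.

```python
import sqlite3


def process_one(conn, cache, handler):
    """Process one cached message; return its ID, or None if the cache is empty."""
    if not cache:
        return None
    message_id, body = next(iter(cache.items()))  # take one copy (201)
    handler(body)  # business processing; returning stands in for the end notice (202)
    with conn:
        conn.execute(
            "UPDATE db_queue SET status = 'processed' WHERE message_id = ?",
            (message_id,),
        )  # mark the queue record as processed (203)
    del cache[message_id]  # drop the cached copy only after the update (204)
    return message_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_queue (message_id TEXT PRIMARY KEY, manager_id TEXT, "
    "status TEXT, body TEXT)"
)
conn.execute("INSERT INTO db_queue VALUES ('m1', 'server-1', 'unprocessed', 'transfer 100')")
cache = {"m1": "transfer 100"}
handled = []
done = process_one(conn, cache, handled.append)
```

If `handler` raises, the status stays "unprocessed" and the copy stays in the cache, so in this sketch the message remains eligible for a retry or for takeover.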

Finally, message takeover processing at the time of a server failure will be described with reference to FIG. 6. When the system is started (301), the message management programs (14, 24, 34) operating on server device 1 (11), server device 2 (21), and server device 3 (31) begin survival monitoring, in which each confirms that the others are operating (302). Survival monitoring is performed, for example, as follows: the message management program (14) of server device 1 (11) transmits a survival confirmation inquiry telegram to the message management program (24) of server device 2 (21); the message management program (24), on receiving the telegram, returns a survival response telegram to the message management program (14); and the message management program (14) confirms that the message management program (24) is alive by receiving that response. When the message management program (14) of server device 1 (11) cannot receive a survival response telegram from the message management program (24) of server device 2 (21), it detects that a failure has occurred in the message management program (24) of server device 2 (21).
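One round of the survival monitoring described above might look like the following Python sketch. The transport is deliberately abstracted away: `probe` is a hypothetical callable that sends a survival confirmation inquiry to a peer and returns `True` if a survival response arrives, and the choice to treat an `OSError` such as a timeout as "no response" is an assumption, since the text does not specify how a missing response is recognized.

```python
from typing import Callable, Iterable, Set


def detect_failed_peers(peer_ids: Iterable[str], probe: Callable[[str], bool]) -> Set[str]:
    """Run one round of survival checks and return the peers that did not answer."""
    failed = set()
    for peer_id in peer_ids:
        try:
            alive = probe(peer_id)
        except OSError:  # covers TimeoutError and ConnectionError
            alive = False
        if not alive:
            failed.add(peer_id)
    return failed


def fake_probe(peer_id: str) -> bool:
    """Stand-in transport: server-2 never answers, every other peer does."""
    if peer_id == "server-2":
        raise TimeoutError("no survival response")
    return True


failed = detect_failed_peers(["server-2", "server-3"], fake_probe)
```

In a running system this function would be called periodically by each message management program, with a non-empty result triggering the takeover decision.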

As long as no failure of another server is detected, the survival check of the other servers is repeated periodically. When a failure of another server is detected, the message management program (14) refers to its own takeover information table (17) and acquires the takeover information for the failed server (304). From that takeover information, the message management program (14) checks the priority of the servers that are to take over the messages managed by the failed server (305). If the message management program (14) itself has the highest priority, it refers to the DB queue (43) and searches for messages that were managed by the failed server and whose status (63) is "unprocessed" (307). The message management program (14) updates the manager ID (62) of each message found by the search to its own ID (15) and then holds a copy of the message in its own message cache area (16) (308). If, on the other hand, there is a server with a higher priority than itself, the message management program (14) checks whether that higher-priority server is alive (306). If even one higher-priority server is alive, the message management program (14) does nothing. If none of the higher-priority servers is alive, the message management program (14) determines that it is the highest-priority server and performs the message takeover processing (307, 308).
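The takeover decision (304 to 306) and the takeover itself (307, 308) can be sketched in Python as below. This is an illustration under assumptions rather than the patented implementation: the takeover information table is modeled as a mapping from a failed server's ID to successor IDs in descending priority, SQLite stands in for the DBMS (42), and all function, table, and column names are hypothetical.

```python
import sqlite3


def should_take_over(self_id, failed_id, takeover_table, is_alive):
    """Return True if no higher-priority successor of the failed server is alive."""
    priority = takeover_table[failed_id]  # takeover information (304, 305)
    if self_id not in priority:
        return False
    higher = priority[: priority.index(self_id)]
    return not any(is_alive(server) for server in higher)  # survival check (306)


def take_over(conn, cache, self_id, failed_id):
    """Re-own the failed server's unprocessed messages and cache copies of them."""
    rows = conn.execute(
        "SELECT message_id, body FROM db_queue "
        "WHERE manager_id = ? AND status = 'unprocessed'",
        (failed_id,),
    ).fetchall()  # search (307)
    with conn:
        conn.executemany(
            "UPDATE db_queue SET manager_id = ? WHERE message_id = ?",
            [(self_id, message_id) for message_id, _ in rows],
        )
    cache.update(rows)  # hold copies in the own cache area (308)
    return [message_id for message_id, _ in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_queue (message_id TEXT PRIMARY KEY, manager_id TEXT, "
    "status TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO db_queue VALUES (?, ?, ?, ?)",
    [
        ("m1", "server-2", "unprocessed", "a"),
        ("m2", "server-2", "processed", "b"),
        ("m3", "server-3", "unprocessed", "c"),
    ],
)
table = {"server-2": ["server-3", "server-1"]}  # server-3 outranks server-1
alive = {"server-1", "server-3"}
before = should_take_over("server-1", "server-2", table, lambda s: s in alive)
alive.discard("server-3")  # now the higher-priority successor is down too
after = should_take_over("server-1", "server-2", table, lambda s: s in alive)
cache = {}
taken = take_over(conn, cache, "server-1", "server-2") if after else []
```

Because ownership lives in the manager ID column of the shared table, a restarted server simply finds no unprocessed records under its own ID and can resume registering new messages at once.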

The present invention is applicable to social infrastructure systems, such as financial systems, in which stalled business processing is unacceptable.

FIG. 1 is a conceptual diagram explaining the basic processing flow in which messages are passed asynchronously using a queue.
FIG. 2 is a conceptual diagram explaining a configuration example of the message management program 14.
FIG. 3 is a conceptual diagram explaining a configuration example of the DB queue 43.
FIG. 4 is a flowchart explaining the flow of processing for inputting a message to the queue in the present invention.
FIG. 5 is a flowchart explaining the flow of processing a message that has been input to the queue in the present invention.
FIG. 6 is a flowchart explaining the flow of processing in which, when a failure occurs in a specific server in the system, a specific one of the remaining servers takes over the messages in the present invention.
FIG. 7 is a configuration example of the DB queue 43.
FIG. 8 is a configuration example of the takeover information table 17.

Explanation of symbols

11 Server device 1
12 Message input program
21 Server device 2
22 Message input program
23 Message processing program
24 Message management program 2
31 Server device 3
32 Message input program
33 Message processing program
34 Message management program 3
41 Server device 4
42 DBMS
51 Message ID
52 Administrator ID
53 Status
54 Message
61 Server ID
62 Takeover server

Claims

A cluster-structured business system for executing a predetermined business, in which a plurality of servers share a queue that temporarily stores a message that is request information requesting execution of the business, the system comprising:
means for holding a program for inputting the message, a program for managing the message, and a program for executing the business in accordance with the message;
means for detecting that a failure has occurred in a specific server among the plurality of servers; and
means by which a specific server among the plurality of servers in which no failure has occurred takes over the message managed by the specific server in which the failure has occurred.

The cluster-structured business system according to claim 1, wherein the means by which the specific server in which no failure has occurred takes over the message managed by the specific server in which the failure has occurred identifies the failed server using the means for detecting that a failure has occurred in a specific server among the plurality of servers, determines, based on predetermined information, which of the plurality of servers should take over the message that was managed by the failed server, and causes the server so determined to take over the message.