JP7209784B1 - Redundant system and redundant method

Info

Publication number: JP7209784B1
Application number: JP2021135452A
Authority: JP
Inventors: 俊也齋藤; 友博森近
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2021-08-23
Filing date: 2021-08-23
Publication date: 2023-01-20
Anticipated expiration: 2041-08-23
Also published as: JP2023030364A

Abstract

[Problem] To enable switching on a server-by-server basis while using software that assumes switching on a process-by-process basis. [Solution] A redundant system 100 includes a first server 10 operating as a main system and a second server 20 operating as a subordinate system. On the first server 10 and the second server 20, a processing unit, which is a process that implements a processing function, and a switching unit, which is a process that implements switching between the main system and the subordinate system, operate. When a failure occurs in at least one of the processing unit and the switching unit on the first server 10, the switching unit on the second server 20 causes the second server 20 to operate as the main system. [Selected drawing] FIG. 1

Description

The present disclosure relates to a redundancy technique that performs switching on a server-by-server basis.

For cost reasons, there is a demand to migrate the software used in existing systems from paid software to OSS (Open Source Software). For example, there is a demand to migrate from a paid database (hereinafter, DB) to an OSS DB.

Existing systems running on-premises are often operated with switching performed on a server-by-server basis. When OSS is used, the operation may end up switching on a process-by-process basis. As a result, the operation of the existing system may have to be changed.

Patent Literature 1 describes a distributed system that switches to a standby system for each business task rather than for each server.

Patent Literature 1: JP-A-9-293059

Operation that switches on a process-by-process basis makes management complicated. For this reason, for systems such as those running on-premises, switching on a server-by-server basis may be preferred over switching on a process-by-process basis. However, if OSS is chosen for its price and the chosen OSS switches on a process-by-process basis, switching on a server-by-server basis is not possible.
An object of the present disclosure is to enable switching on a server-by-server basis while using software that assumes switching on a process-by-process basis.

The redundant system according to the present disclosure is
a redundant system comprising a first server operating as a main system and a second server operating as a subordinate system, wherein
on the first server and the second server, a processing unit, which is a process that implements a processing function, and a switching unit, which is a process that implements switching between the main system and the subordinate system, operate, and
the switching unit on the second server causes the second server to operate as the main system when a failure occurs in at least one of the processing unit and the switching unit on the first server.

In the present disclosure, when a failure occurs in at least one of the processing unit and the switching unit on the first server operating as the main system, the second server is made to operate as the main system. As a result, switching on a server-by-server basis becomes possible even when the processing unit and the switching unit are implemented by software that assumes switching on a process-by-process basis.

FIG. 1 is a configuration diagram of the redundant system 100 according to Embodiment 1.
FIG. 2 is a configuration diagram of the first server 10 according to Embodiment 1.
FIG. 3 is a configuration diagram of the second server 20 according to Embodiment 1.
FIG. 4 is a schematic explanatory diagram of the processing of the redundant system 100 according to Embodiment 1.
FIG. 5 is a schematic explanatory diagram of the processing of the redundant system 100 according to Embodiment 1.
FIG. 6 is a flowchart of the first failure processing according to Embodiment 1.
FIG. 7 is a flowchart of the second failure processing according to Embodiment 1.
FIG. 8 is a flowchart of the third failure processing according to Embodiment 1.
FIG. 9 is a flowchart of the fourth failure processing according to Embodiment 1.
FIG. 10 is an explanatory diagram of the effects of the redundant system 100 according to Embodiment 1.
FIG. 11 is an explanatory diagram of the effects of the redundant system 100 according to Embodiment 1.

Embodiment 1.
*** Description of configuration ***
The configuration of the redundant system 100 according to Embodiment 1 will be described with reference to FIG. 1.
The redundant system 100 includes a first server 10, a second server 20, and an application server 30.
The first server 10 and the second server 20 are servers that implement the same function. In Embodiment 1, the first server 10 and the second server 20 are assumed to be database servers that implement a database function. The application server 30 uses a virtual IP (Internet Protocol) address to access whichever of the first server 10 and the second server 20 is operating as the main system. In Embodiment 1, in the initial state, the first server 10 operates as the main system and the second server 20 operates as the subordinate system.

The configuration of the first server 10 according to Embodiment 1 will be described with reference to FIG. 2.
The first server 10 is a computer.
The first server 10 includes, as hardware, a processor 11, a memory 12, a storage 13, and a communication interface 14. The processor 11 is connected to the other hardware via signal lines and controls the other hardware.

On the first server 10, a processing unit 111, a switching unit 112, and an internal monitoring unit 113 operate as functional components. The switching unit 112 includes an external monitoring unit 114, a switching execution unit 115, and a failure detection unit 116. The function of each functional component of the first server 10 is implemented by software.
The storage 13 stores a program that implements the function of each functional component of the first server 10. This program is read into the memory 12 by the processor 11 and executed by the processor 11. The function of each functional component of the first server 10 is thereby implemented.

The storage 13 stores an operation script 131. The operation script 131 includes a stop script 132, a promotion script 133, and a synchronization release script 134.

The configuration of the second server 20 according to Embodiment 1 will be described with reference to FIG. 3.
The second server 20 is a computer.
The second server 20 includes, as hardware, a processor 21, a memory 22, a storage 23, and a communication interface 24. The processor 21 is connected to the other hardware via signal lines and controls the other hardware.

On the second server 20, a processing unit 211, a switching unit 212, and an internal monitoring unit 213 operate as functional components. The switching unit 212 includes an external monitoring unit 214, a switching execution unit 215, and a failure detection unit 216. The function of each functional component of the second server 20 is implemented by software.
The storage 23 stores a program that implements the function of each functional component of the second server 20. This program is read into the memory 22 by the processor 21 and executed by the processor 21. The function of each functional component of the second server 20 is thereby implemented.

The storage 23 stores an operation script 231. The operation script 231 includes a stop script 232, a promotion script 233, and a synchronization release script 234.

The processing unit 111 and the processing unit 211 are processes that implement a processing function. In Embodiment 1, the processing function is a database function. The processing function is not limited to a database function and may be another function, such as one that implements some kind of business processing. The switching unit 112 and the switching unit 212 are processes that implement switching between the main system and the subordinate system for the processing unit 111 and the processing unit 211. The internal monitoring unit 113 and the internal monitoring unit 213 are processes that monitor the switching unit 112 and the switching unit 212, respectively.
The processing unit 111 and the switching unit 112, and the processing unit 211 and the switching unit 212, are implemented by OSS. In Embodiment 1, the processing unit 111 and the processing unit 211 are implemented by PostgreSQL, and the switching unit 112 and the switching unit 212 are implemented by pgpool. The internal monitoring unit 113 and the internal monitoring unit 213 may be implemented by OSS or by other software.

The processors 11 and 21 are ICs (Integrated Circuits) that perform processing. Specific examples of the processors 11 and 21 are a CPU (Central Processing Unit), a DSP (Digital Signal Processor), and a GPU (Graphics Processing Unit).

The memories 12 and 22 are storage devices that store data temporarily. Specific examples of the memories 12 and 22 are SRAM (Static Random Access Memory) and DRAM (Dynamic Random Access Memory).

The storages 13 and 23 are storage devices that retain data. Specific examples of the storages 13 and 23 are an HDD (Hard Disk Drive) and an SSD (Solid State Drive). The storages 13 and 23 may also be portable recording media such as an SD (registered trademark, Secure Digital) memory card, CF (CompactFlash, registered trademark), NAND flash, a flexible disk, an optical disc, a compact disc, a Blu-ray (registered trademark) disc, or a DVD (Digital Versatile Disk).

The communication interfaces 14 and 24 are interfaces for communicating with external devices. Specific examples of the communication interfaces 14 and 24 are Ethernet (registered trademark) and USB (Universal Serial Bus) ports.

FIG. 2 shows only one processor 11. However, there may be a plurality of processors 11, and the plurality of processors 11 may cooperate to execute the programs that implement the functions. Similarly, FIG. 3 shows only one processor 21. However, there may be a plurality of processors 21, and the plurality of processors 21 may cooperate to execute the programs that implement the functions.

*** Description of operation ***
The operation of the redundant system 100 according to Embodiment 1 will be described with reference to FIGS. 4 to 9.
The operating procedure of the redundant system 100 according to Embodiment 1 corresponds to the redundancy method according to Embodiment 1. The program that implements the operation of the redundant system 100 according to Embodiment 1 corresponds to the redundancy program according to Embodiment 1.

The operation of the redundant system 100 is divided into first failure processing, performed when a failure occurs in the switching unit 112 of the first server 10; second failure processing, performed when a failure occurs in the processing unit 111 of the first server 10; third failure processing, performed when a failure occurs in the switching unit 212 of the second server 20; and fourth failure processing, performed when a failure occurs in the processing unit 211 of the second server 20.

An overview of the processing of the redundant system 100 according to Embodiment 1 will be described with reference to FIGS. 4 and 5.
The processing unit 111 is switched upon receiving an instruction from the switching unit 112 or the switching unit 212 as to whether to operate as the main system or as the subordinate system. Similarly, the processing unit 211 is switched upon receiving an instruction from the switching unit 212 or the switching unit 112 as to whether to operate as the main system or as the subordinate system. The switching unit 112 also switches itself between operating as the main system and operating as the subordinate system. Similarly, the switching unit 212 switches itself between operating as the main system and operating as the subordinate system.
As shown in FIG. 4, the application server 30 accesses whichever of the processing unit 111 and the processing unit 211 is operating as the main system, via whichever of the switching unit 112 and the switching unit 212 is operating as the main system.

Here, the processing unit 111 is switched between operating as the main system and operating as the subordinate system independently of the switching unit 112. Similarly, the processing unit 211 is switched between operating as the main system and operating as the subordinate system independently of the switching unit 212. The switching unit 112 likewise switches between operating as the main system and operating as the subordinate system independently of the processing unit 111, and the switching unit 212 does so independently of the processing unit 211.
In other words, the processing unit 111 and the switching unit 112, and the processing unit 211 and the switching unit 212, switch between the main system and the subordinate system on a process-by-process basis. PostgreSQL and pgpool implement this kind of control.

Therefore, without any additional control, a state in which the application server 30 accesses the processing unit 111 via the switching unit 212 can occur, as shown in FIG. 5. The redundant system 100 according to Embodiment 1 performs control so that the application server 30 is in one of two states: accessing the processing unit 111 via the switching unit 112, or accessing the processing unit 211 via the switching unit 212. That is, the redundant system 100 according to Embodiment 1 performs control so that either both the processing unit 111 and the switching unit 112 are operating as the main system, or both the processing unit 211 and the switching unit 212 are operating as the main system.
When both the processing unit 111 and the switching unit 112 are operating as the main system, the first server 10 is said to be operating as the main system. Similarly, when both the processing unit 211 and the switching unit 212 are operating as the main system, the second server 20 is said to be operating as the main system.
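The server-level condition described above can be expressed as a small check. The following is only an illustrative sketch in Python, not part of the disclosed implementation: the `Role`, `Server`, and `is_server_level_state` names are hypothetical, and each process is reduced to a single role value.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MAIN = "main"
    SUBORDINATE = "subordinate"


@dataclass
class Server:
    name: str
    processing_unit: Role  # role of the processing-unit process
    switching_unit: Role   # role of the switching-unit process

    def is_main(self) -> bool:
        # A server counts as main only when both of its processes are main.
        return (self.processing_unit is Role.MAIN
                and self.switching_unit is Role.MAIN)

    def has_main_process(self) -> bool:
        return Role.MAIN in (self.processing_unit, self.switching_unit)


def is_server_level_state(first: Server, second: Server) -> bool:
    """True when one server holds both main processes and the other holds none."""
    return ((first.is_main() and not second.has_main_process())
            or (second.is_main() and not first.has_main_process()))


# State of FIG. 4: both main processes are on the first server.
normal = is_server_level_state(
    Server("first server 10", Role.MAIN, Role.MAIN),
    Server("second server 20", Role.SUBORDINATE, Role.SUBORDINATE),
)
# State of FIG. 5: main switching unit and main processing unit are split.
split = is_server_level_state(
    Server("first server 10", Role.MAIN, Role.SUBORDINATE),
    Server("second server 20", Role.SUBORDINATE, Role.MAIN),
)
print(normal, split)
```

The split state of FIG. 5 fails the check, which is the state the failure processing described below is meant to avoid.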

The first failure processing according to Embodiment 1 will be described with reference to FIG. 6.
A failure occurs in the switching unit 112 of the first server 10 (step S10).

(Step S11: first failure detection process)
The switching unit 212 of the second server 20 detects that a failure has occurred in the switching unit 112 of the first server 10. On detecting the failure of the switching unit 112, the switching unit 212 starts operating as the main system.
Specifically, the external monitoring unit 114 in the switching unit 112 of the first server 10 and the external monitoring unit 214 in the switching unit 212 of the second server 20 perform heartbeat communication, in which they periodically send each other information reporting their state. Here, the external monitoring unit 114 sends first state information, which indicates the states of the processing unit 111 and the switching unit 112, to the switching unit 212, and the external monitoring unit 214 sends second state information, which indicates the states of the processing unit 211 and the switching unit 212, to the switching unit 112. When a failure occurs in the switching unit 112, the external monitoring unit 114 stops sending the first state information. The external monitoring unit 214 determines that a failure has occurred in the switching unit 112 when the first state information has not arrived for a reference time or longer.
The first failure detection process is implemented by a function built into pgpool.
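The heartbeat-based determination in step S11 amounts to a timeout check against the reference time. The following Python sketch illustrates that idea only; it does not reproduce how pgpool implements it, and the `HeartbeatMonitor` class, its method names, and the use of plain numbers for timestamps are assumptions made for the example.

```python
class HeartbeatMonitor:
    """Tracks when state information was last received from the peer."""

    def __init__(self, reference_time: float, now: float = 0.0) -> None:
        self.reference_time = reference_time  # allowed silence, in seconds
        self.last_received = now

    def receive(self, now: float) -> None:
        # Called whenever state information arrives from the peer.
        self.last_received = now

    def peer_failed(self, now: float) -> bool:
        # The peer is judged to have failed once the silence has lasted
        # for the reference time or longer.
        return now - self.last_received >= self.reference_time


monitor = HeartbeatMonitor(reference_time=3.0)
monitor.receive(now=1.0)                # first state information arrives
print(monitor.peer_failed(now=3.5))     # silence of 2.5 s
print(monitor.peer_failed(now=4.0))     # silence of 3.0 s
```

Passing the current time explicitly keeps the sketch deterministic; a real monitor would read a monotonic clock instead.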

(Step S12: stop process)
The switching unit 212 of the second server 20 launches the stop script 232 in the operation script 231 to stop the processing unit 111 of the first server 10.
Specifically, the stop script 232 is a script that issues a stop command for stopping a program. The switching execution unit 215 in the switching unit 212 of the second server 20 stops the processing unit 111 by executing the stop script 232 with the processing unit 111 of the first server 10 specified as the target.
When the processing unit 111 and the processing unit 211 are implemented by PostgreSQL, the stop command for stopping the processing unit 111 or the processing unit 211 is "${PGPATH%/}/pg_ctl stop -D ${PGDATA}". ${PGPATH} is the path of the folder of the database function to be stopped. Here, therefore, ${PGPATH} is the path of the folder in which the program of the processing unit 111 is stored.
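As an illustration of how such a stop command might be assembled, the Python sketch below builds the argument list without executing it. It assumes the intended command is `pg_ctl stop -D <data directory>`; the `build_stop_command` helper and the example paths are hypothetical, and running the command against a server on another machine (for example over a remote shell) is outside the sketch.

```python
import shlex


def build_stop_command(pg_path: str, pg_data: str) -> list[str]:
    """Build the argument list for stopping a PostgreSQL instance."""
    # ${PGPATH%/} strips one trailing slash; removesuffix mirrors that.
    pg_ctl = pg_path.removesuffix("/") + "/pg_ctl"
    # -D names the data directory of the instance to stop.
    return [pg_ctl, "stop", "-D", pg_data]


# Hypothetical paths, used only for illustration.
command = build_stop_command("/opt/pgsql/bin/", "/opt/pgsql/data")
print(shlex.join(command))
```

Keeping the command as a list rather than a single string avoids shell quoting problems if a path contains spaces.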

(Step S13: second failure detection process)
The switching unit 212 of the second server 20 detects that a failure has occurred in the processing unit 111 of the first server 10.
Specifically, the failure detection unit 216 in the switching unit 212 of the second server 20 monitors the state of the processing unit 111 of the first server 10. When the processing unit 111 is stopped by the process of step S12, the failure detection unit 216 detects that a failure has occurred in the processing unit 111.
The second failure detection process is implemented by a function built into pgpool.

(Step S14: promotion process)
The switching unit 212 of the second server 20 launches the promotion script 233 in the operation script 231 to promote the processing unit 211 of the second server 20 to the main system.
Specifically, the promotion script 233 is a script that issues a promotion command for promoting the processing unit 111 or the processing unit 211 to the main system. The switching execution unit 215 in the switching unit 212 of the second server 20 promotes the processing unit 211 to the main system by executing the promotion script 233 with the processing unit 211 of the second server 20 specified as the target.
When the processing unit 111 and the processing unit 211 are implemented by PostgreSQL, the promotion command is "${PGPATH%/}/pg_ctl promote -D ${new_master_node_pgdata}". ${new_master_node_pgdata} is the path of the folder of the database function to be promoted. Here, therefore, ${new_master_node_pgdata} is the path of the folder in which the program of the processing unit 211 is stored.

The switching unit 212 becomes the main system in the process of step S11, and the processing unit 211 becomes the main system in the process of step S14. As a result, the second server 20 enters the state of operating as the main system. In other words, the server operating as the main system switches from the first server 10 to the second server 20.
When the processing function is a database, the synchronization process is implemented by a built-in function (of PostgreSQL), and the subordinate system synchronizes the data of the main system while it is running as the subordinate system. A subordinate system can only be read from, but once the promotion script has been executed it is promoted to the main system, so writing becomes possible in addition to reading.
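The sequence of steps S10 to S14 can be traced with a small simulation. This Python sketch is a simplified model under stated assumptions: the `Unit` and `Server` classes and the `first_failure_processing` function are hypothetical, and each detection step is reduced to a flag check rather than heartbeat or health-check traffic.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    role: str             # "main" or "subordinate"
    running: bool = True


@dataclass
class Server:
    processing: Unit      # processing unit (PostgreSQL in Embodiment 1)
    switching: Unit       # switching unit (pgpool in Embodiment 1)

    def is_main(self) -> bool:
        units = (self.processing, self.switching)
        return all(u.running and u.role == "main" for u in units)


def first_failure_processing(first: Server, second: Server) -> list[str]:
    """Trace steps S10 to S14 and return the steps that were carried out."""
    steps = []
    first.switching.running = False        # S10: switching unit 112 fails
    steps.append("S10")
    if not first.switching.running:        # S11: heartbeat loss is detected
        second.switching.role = "main"
        steps.append("S11")
        first.processing.running = False   # S12: stop script stops unit 111
        steps.append("S12")
    if not first.processing.running:       # S13: the stop is seen as a failure
        steps.append("S13")
        second.processing.role = "main"    # S14: promotion script promotes 211
        steps.append("S14")
    return steps


first = Server(Unit("main"), Unit("main"))
second = Server(Unit("subordinate"), Unit("subordinate"))
steps = first_failure_processing(first, second)
print(steps, first.is_main(), second.is_main())
```

After the trace, both processes on the second server are main and no process on the first server is running, which matches the server-level switch described above.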

The second failure processing according to Embodiment 1 will be described with reference to FIG. 7.
A failure occurs in the processing unit 111 of the first server 10 (step S20).

(Step S21: first failure detection process)
The switching unit 112 of the first server 10 detects that a failure has occurred in the processing unit 111 of the first server 10.
Specifically, the failure detection unit 116 in the switching unit 112 of the first server 10 monitors the state of the processing unit 111 of the first server 10. The failure detection unit 116 detects that a failure has occurred in the processing unit 111 when, for example, the processing unit 111 stops.
The first failure detection process is implemented by a function built into pgpool.

At this time, the switching unit 212 of the second server 20 also detects that a failure has occurred in the processing unit 111 of the first server 10, in the same way as the switching unit 112 of the first server 10.

(Step S22: stop process)
The switching unit 112 of the first server 10 launches the stop script 132 in the operation script 131 to stop the switching unit 112 of the first server 10.
Specifically, the stop script 132, like the stop script 232, is a script that issues a stop command for stopping a program. The switching execution unit 115 in the switching unit 112 of the first server 10 stops the switching unit 112 by executing the stop script 132 with the switching unit 112 of the first server 10 specified as the target.

(Step S23: second failure detection process)
The switching unit 212 of the second server 20 detects that a failure has occurred in the switching unit 112 of the first server 10. On detecting the failure of the switching unit 112, the switching unit 212 starts operating as the main system.
Specifically, because the switching unit 112 was stopped in the process of step S22, the external monitoring unit 114 in the switching unit 112 stops sending the first state information. The external monitoring unit 214 in the switching unit 212 of the second server 20 determines that a failure has occurred in the switching unit 112 when the first state information has not arrived for the reference time or longer.
The second failure detection process is implemented by a function built into pgpool.

(Step S24: promotion process)
As in the process of step S14 in FIG. 6, the switching unit 212 of the second server 20 launches the promotion script 233 in the operation script 231 to promote the processing unit 211 of the second server 20 to the main system.

The switching unit 212 becomes the main system in the process of step S23, and the processing unit 211 becomes the main system in the process of step S24. As a result, the second server 20 enters the state of operating as the main system. In other words, the server operating as the main system switches from the first server 10 to the second server 20.
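The second failure processing (steps S20 to S24) can be traced in the same spirit. The Python sketch below is illustrative only: the dictionary keys and the `second_failure_processing` function are hypothetical names, and the heartbeat timeout of step S23 is collapsed into a simple state check.

```python
def second_failure_processing(state: dict[str, str]) -> dict[str, str]:
    """Return the state after steps S20 to S24, leaving the input untouched."""
    state = dict(state)
    state["processing_111"] = "failed"           # S20: processing unit 111 fails
    if state["processing_111"] == "failed":      # S21: unit 116 detects it
        state["switching_112"] = "stopped"       # S22: unit 112 stops itself
    if state["switching_112"] == "stopped":      # S23: first state info stops
        state["switching_212"] = "main"
        state["processing_211"] = "main"         # S24: promotion of unit 211
    return state


initial = {
    "processing_111": "main",
    "switching_112": "main",
    "processing_211": "subordinate",
    "switching_212": "subordinate",
}
result = second_failure_processing(initial)
print(result)
```

The point of step S22 is visible here: stopping the still-healthy switching unit 112 is what triggers the second server to take over both roles, rather than only the processing role.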

The third failure processing according to Embodiment 1 will be described with reference to FIG. 8.
A failure occurs in the switching unit 212 of the second server 20 (step S30).

(Step S31: first failure detection process)
The internal monitoring unit 213 of the second server 20 detects that a failure has occurred in the switching unit 212 of the second server 20.
Specifically, the internal monitoring unit 213 monitors the state of the switching unit 212. The internal monitoring unit 213 detects that a failure has occurred in the switching unit 212 when, for example, the switching unit 212 stops.

(Step S32: stop process)
The internal monitoring unit 213 of the second server 20 launches the stop script 232 in the operation script 231 to stop the processing unit 211 of the second server 20.
Specifically, the internal monitoring unit 213 of the second server 20 stops the processing unit 211 by executing the stop script 232 with the processing unit 211 of the second server 20 specified as the target.

(Step S33: Second failure detection process)
The switching unit 112 of the first server 10 detects that a failure has occurred in the processing unit 211 of the second server 20.
Specifically, when a failure occurs in the switching unit 212, the external monitoring unit 214 no longer transmits the second state information. The external monitoring unit 114 in the switching unit 112 of the first server 10 determines that a failure has occurred in the switching unit 212 when transmission of the second state information has been interrupted for the reference time or longer.
The second failure detection process is realized by a function pre-implemented in pgpool.
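The timeout rule of step S33 can be illustrated with the following sketch. It does not reproduce the mechanism built into pgpool. The class `HeartbeatWatcher`, its method names, and the use of seconds as the unit are assumptions made only to show the rule "no state information for the reference time or longer means failure".

```python
from dataclasses import dataclass


@dataclass
class HeartbeatWatcher:
    """Hypothetical model of the external monitoring unit 114."""

    reference_time: float  # Tolerated silence, in seconds (assumed unit).
    last_received: float = 0.0

    def on_state_info(self, now: float) -> None:
        # Record the arrival time of the peer's state information.
        self.last_received = now

    def peer_failed(self, now: float) -> bool:
        # A failure is declared once the silence has lasted for the
        # reference time or longer.
        return now - self.last_received >= self.reference_time


# Usage: state information last arrived at t = 100 s.
watcher = HeartbeatWatcher(reference_time=10.0)
watcher.on_state_info(now=100.0)
```

With these values, `watcher.peer_failed(105.0)` is false and `watcher.peer_failed(110.0)` is true, because the comparison includes the boundary.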

(Step S34: Synchronization release process)
The switching unit 112 of the first server 10 starts the synchronization release script 134 in the operation script 131 to stop data synchronization with the processing unit 211 of the second server 20.
Specifically, the synchronization release script 134 is a script that stops data synchronization between the processing unit 111 of the first server 10 and the processing unit 211 of the second server 20. As a premise, data update information is transmitted from whichever of the processing unit 111 and the processing unit 211 is operating as the main system to the one operating as the slave system, and the data is thereby synchronized. This is a function pre-implemented in PostgreSQL. The switching unit 112 of the first server 10 stops data synchronization by executing the synchronization release script 134 with the processing unit 211 of the second server 20 specified as the target.

In the process of step S32, the processing unit 211 is stopped in addition to the switching unit 212 in which the failure occurred. As a result, processing on the second server 20 is stopped.

The fourth failure process according to the first embodiment will be described with reference to FIG. 9.
A failure occurs in the processing unit 211 of the second server 20 (step S40).

(Step S41: First failure detection process)
The switching unit 212 of the second server 20 detects that a failure has occurred in the processing unit 211 of the second server 20.
Specifically, the failure detection unit 216 in the switching unit 212 of the second server 20 monitors the state of the processing unit 211 of the second server 20. When the processing unit 211 stops or otherwise becomes abnormal, the failure detection unit 216 detects that a failure has occurred in the processing unit 211.
The first failure detection process is realized by a function pre-implemented in pgpool.

At this time, the switching unit 112 of the first server 10 also detects that a failure has occurred in the processing unit 211 of the second server 20, in the same way as the switching unit 212 of the second server 20.

(Step S42: Stop process)
The switching unit 212 of the second server 20 starts the stop script 232 in the operation script 231 to stop the switching unit 212 of the second server 20.
Specifically, the switching execution unit 215 in the switching unit 212 of the second server 20 stops the switching unit 212 by executing the stop script 232 with the switching unit 212 of the second server 20 specified as the target.

In the process of step S42, the switching unit 212 is stopped in addition to the processing unit 211 in which the failure occurred. As a result, processing on the second server 20 is stopped.

(Step S43: Synchronization release process)
The switching unit 112 of the first server 10 detects the failure of the processing unit 211. The switching unit 112 of the first server 10 then starts the synchronization release script 134 in the operation script 131 to stop data synchronization with the processing unit 211 of the second server 20.

*** Effect of Embodiment 1 ***
As described above, the redundant system 100 according to the first embodiment can realize switching on a server-by-server basis using software that performs switching on a process-by-process basis. As a result, when that software is OSS, for example, costs can be kept down while server-by-server switching prevents operation from becoming more complicated.

As shown in FIG. 10, if a redundant database system is realized simply by using PostgreSQL and pgpool, switching is performed on a process-by-process basis. FIG. 10 shows a case where a failure occurs in pgpool on server #1 while PostgreSQL and pgpool on server #1 are operating as the main system. When a failure occurs in pgpool on server #1, pgpool on server #2 switches to the main system. However, PostgreSQL on server #1 continues to operate as the main system. The application server 30 therefore accesses PostgreSQL on server #1 via pgpool on server #2.
In contrast, as shown in FIG. 11, in the redundant system 100 according to the first embodiment, when a failure occurs in pgpool on server #1, PostgreSQL on server #1 is stopped, and both PostgreSQL and pgpool on server #2 switch to the main system. As a result, the application server 30 accesses PostgreSQL on server #2 via pgpool on server #2.
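The difference between FIG. 10 and FIG. 11 can be restated as a small model of which servers the application server 30 ends up using after pgpool on server #1 fails. The function name `route_after_pgpool_failure` and the string labels are hypothetical, and the sketch covers only this one failure case.

```python
from typing import Tuple


def route_after_pgpool_failure(server_unit_switching: bool) -> Tuple[str, str]:
    """Route used by the application server after pgpool on server #1 fails.

    Returns (server hosting the active pgpool,
             server hosting the main-system PostgreSQL).
    """
    active_pgpool = "server2"  # pgpool on server #2 takes over in both cases.
    if server_unit_switching:
        # FIG. 11: PostgreSQL on server #1 is stopped and server #2 takes over.
        main_postgresql = "server2"
    else:
        # FIG. 10: PostgreSQL on server #1 keeps running as the main system.
        main_postgresql = "server1"
    return active_pgpool, main_postgresql


process_level = route_after_pgpool_failure(server_unit_switching=False)
server_level = route_after_pgpool_failure(server_unit_switching=True)
```

In the process-level case the route spans two servers, whereas in the server-level case both components are on server #2, which is the situation the embodiment aims for.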

As in the first failure process described with reference to FIG. 6, when the switching unit 212 of the second server 20 detects that a failure has occurred in the switching unit 112 of the first server 10, it stops the processing unit 111.
As in the second failure process described with reference to FIG. 7, when the switching unit 112 of the first server 10 detects that a failure has occurred in the processing unit 111 of the first server 10, the switching unit 112 stops itself. The switching unit 212 of the second server 20 then promotes the processing unit 211 to the main system.
As in the third failure process described with reference to FIG. 8, when the internal monitoring unit 213 of the second server 20 detects that a failure has occurred in the switching unit 212 of the second server 20, it stops the processing unit 211. The switching unit 112 then causes the processing unit 111 to stop data synchronization.
As in the fourth failure process described with reference to FIG. 9, when the switching unit 212 of the second server 20 detects that a failure has occurred in the processing unit 211 of the second server 20, the switching unit 212 stops itself. The switching unit 112 then causes the processing unit 111 to stop data synchronization.
As described above, in this embodiment, when a failure is detected in one process on the first server 10 or the second server 20, the other process on the same server is stopped. This enables switching on a server-by-server basis.
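The four failure processes share one rule, which the following sketch restates as a lookup of actions. It is an illustrative summary only: the function `recovery_actions` and its string labels are hypothetical, and the promotion step in the case where the switching unit of the first server fails is taken from the claims rather than from the summary above.

```python
from typing import List


def recovery_actions(failed_server: str, failed_process: str) -> List[str]:
    """Return the ordered actions taken when one process fails.

    failed_server: "first" (main system) or "second" (slave system).
    failed_process: "switching" or "processing".
    """
    if failed_server not in ("first", "second"):
        raise ValueError("failed_server must be 'first' or 'second'")
    if failed_process not in ("switching", "processing"):
        raise ValueError("failed_process must be 'switching' or 'processing'")

    # Stop the surviving process on the server where the failure occurred,
    # so that the server as a whole leaves service.
    other = "processing" if failed_process == "switching" else "switching"
    actions = [f"stop {other} unit on {failed_server} server"]

    if failed_server == "first":
        # Main system failed: the slave system takes over.
        actions.append("promote processing unit on second server")
    else:
        # Slave system failed: the main system stops replicating to it.
        actions.append("stop data synchronization on first server")
    return actions


for server in ("first", "second"):
    for process in ("switching", "processing"):
        print(server, process, recovery_actions(server, process))
```

The sketch leaves out which component performs each action (the peer switching unit, the switching unit itself, or the internal monitoring unit), since that differs between the four failure processes.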

Complication of operation will now be described.
Here, it is assumed that the system runs on-premises and is operated by the user. Because the user operates the system, the user also performs the first-line response when a failure occurs.
When switching is performed on a server-by-server basis, the user identifies which server has failed and responds by, for example, restarting the identified server. In contrast, when switching is performed on a process-by-process basis, the user needs to identify which process on which server has failed and then respond by, for example, restarting the identified process. Identifying the failed process may be difficult for the user, and restarting a process may also be difficult.
Thus, when switching is performed on a process-by-process basis, operation becomes complicated, and it may be difficult for the user to handle the response.

*** Other Configurations ***
<Modification 1>
In the first embodiment, the processing function realized by the processing unit 111 and the processing unit 211 is a database function. However, the processing function realized by the processing unit 111 and the processing unit 211 is not limited to a database function.
For example, a plurality of application servers 30 may be used for load distribution. In this case, the processing function realized by the processing unit 111 and the processing unit 211 may be the function of the application server 30. Even when the processing function is the function of the application server 30, server-by-server switching can in principle be realized with OSS that performs process-by-process switching, through the processes described in the first embodiment. However, in step S42 of FIG. 9, it is necessary to stop synchronization of session information between the functions of the application servers 30, rather than synchronization of data between databases.

The embodiment and modification of the present disclosure have been described above. Some of these may be implemented in combination, and any one or several of them may be implemented in part. The present disclosure is not limited to the above embodiment and modification, and various changes can be made as necessary.

100 redundant system, 10 first server, 11 processor, 12 memory, 13 storage, 14 communication interface, 15 electronic circuit, 111 processing unit, 112 switching unit, 113 internal monitoring unit, 114 external monitoring unit, 115 switching execution unit, 116 failure detection unit, 131 operation script, 132 stop script, 133 promotion script, 134 synchronization release script, 20 second server, 21 processor, 22 memory, 23 storage, 24 communication interface, 25 electronic circuit, 211 processing unit, 212 switching unit, 213 internal monitoring unit, 214 external monitoring unit, 215 switching execution unit, 216 failure detection unit, 231 operation script, 232 stop script, 233 promotion script, 234 synchronization release script, 30 application server.

Claims

1. A redundant system comprising a first server operating as a main system and a second server operating as a slave system, wherein
in each of the first server and the second server, a processing unit, which is a process that realizes a processing function, and a switching unit, which is a process that realizes switching between the main system and the slave system, operate,
the switching unit in the second server causes the second server to operate as the main system when a failure occurs in at least one of the processing unit and the switching unit in the first server, and
the switching unit in the second server, upon detecting a failure of the switching unit in the first server, stops the processing unit in the first server and thereby causes the second server to operate as the main system.

2. The redundant system according to claim 1, wherein
the switching unit in the first server stops upon detecting a failure of the processing unit in the first server, and
when the switching unit in the first server stops, the switching unit in the second server detects a failure of the switching unit in the first server and causes the second server to operate as the main system.

3. The redundant system according to claim 1, wherein
in the second server, an internal monitoring unit, which is a process that monitors the state of the switching unit in the second server, operates, and
the internal monitoring unit in the second server stops the processing unit in the second server upon detecting a failure of the switching unit in the second server.

4. The redundant system according to any one of claims 1 to 3, wherein the switching unit in the second server, when causing the second server to operate as the main system, issues a command for promotion to the main system to the processing unit in the second server.

5. The redundant system according to any one of claims 1 to 4, wherein
the processing function is a database function,
the switching unit in the second server stops upon detecting a failure of the database function in the second server, and
the switching unit in the first server stops synchronization of data with the database function in the first server upon detecting a failure of the database function in the second server.

6. The redundant system according to any one of claims 1 to 4, wherein
the processing function is an application server function, and
the switching unit in the second server, upon detecting a failure of the application server function in the second server, stops synchronization of session information with the application server function in the first server and then stops.

7. A redundancy method in a redundant system comprising a first server operating as a main system and a second server operating as a slave system, wherein
in each of the first server and the second server, a processing unit, which is a process that realizes a processing function, and a switching unit, which is a process that realizes switching between the main system and the slave system, operate,
when a failure occurs in at least one of the processing unit and the switching unit in the first server, the switching unit in the second server causes the second server to operate as the main system, and
upon detecting a failure of the switching unit in the first server, the switching unit in the second server stops the processing unit in the first server and thereby causes the second server to operate as the main system.