JPH08278909A

JPH08278909A - System and method for high reliability

Info

Publication number: JPH08278909A
Application number: JP7082175A
Authority: JP
Inventors: Masanori Hirano; 正則平野; Tsunemichi Shiozawa; 恒道塩澤; Yasuo Kinouchi; 康夫木ノ内; Takashi Suzuki; 孝至鈴木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-04-07
Filing date: 1995-04-07
Publication date: 1996-10-22

Abstract

PURPOSE: To limit the extent to which transaction processing is affected during failure handling, and to make the modules as a whole highly reliable, by providing a main memory, failure notification means, and central processing means. CONSTITUTION: Module 2 performs its own transaction processing at a processor 4 utilization of 50% or less, restores database 13 of module 1, and then performs transaction processing for module 1 with the remaining utilization of 50% or more. Processors 3 and 4 thus each handle half of the transaction processing at a utilization of 50% or less, so that if either processor fails, the other can back it up. Even while one module has failed and its database is being restored, the transaction processing of the normal module is unaffected, so the effect on transaction processing as a whole is small.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a high-reliability system and a high-reliability method in which, even if one of two modules fails, transaction processing for the failed module is backed up without affecting the remaining transaction processing, so that the system can be made economical while its high reliability is ensured.

[0002]

[Prior Art] Online real-time processing is a processing scheme in which data is entered at a terminal as soon as it arises, sent to a computer system over a communication line, processed immediately, and the result returned to the terminal or the like. Online real-time systems are used in banking systems, seat reservation systems for trains, and similar applications. A transaction is the unit in which a terminal or the like requests processing from the computer system in an online real-time system. Conventionally, the commonly adopted method of making transaction processing highly reliable is to provide two processors, have one of them process all transactions (hereinafter, the act processor), and keep the other waiting as a spare (hereinafter, the standby processor). In this method, when the act processor fails, the standby processor reads the checkpoint database and the log information from the semiconductor file device, uses them to restore the database as of the moment the act processor failed, and resumes transaction processing. However, all transaction processing is suspended during this restart processing. In addition, with this method the entire system (node) stops functioning in a large-scale disaster such as an earthquake or flood. To continue transaction processing even in such a case, spare processors must also be installed at a remote site. If two processors are installed at the remote site and kept waiting, four processors are installed in total but only one actually operates for transaction processing, so the equipment cost of achieving high reliability becomes enormous.

[0003]

FIG. 8 is a connection diagram of a conventional intra-node backup system using act and standby processors. In FIG. 8, 1 and 2 are processors, 3 is a semiconductor file device, 4 is a communication control unit (CCU), 55 and 56 are signal lines connecting processors 1 and 2, respectively, to communication control unit 4, 57 and 58 are signal lines connecting processors 1 and 2, respectively, to semiconductor file device 3, 59 is a signal line connecting processor 1 and processor 2, and 60 is the communication line over which transactions arrive. Assume that processor 1 performs transaction processing as the act processor and that processor 2 waits as the standby processor. A transaction arriving on communication line 60 is received by communication control unit 4 and passed to processor 1 via signal line 55. Processor 1 holds a database in its main memory; it processes the transaction according to the contents of this database and updates the database. When it updates the database, it writes the address within the database and the updated data to semiconductor file device 3 via signal line 57 as log information. It then sends the response to the transaction to communication control unit 4 via signal line 55. Transactions arriving over communication line 60 are processed one after another in this way. At a predetermined interval, processor 1 stores the database in main memory in semiconductor file device 3 as checkpoint information.
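The logging and checkpointing described above can be illustrated with a minimal Python sketch. The class names `FileDevice` and `ActProcessor`, the dictionary representation of the database, and the decision to discard log entries once a checkpoint has been taken are all assumptions made for illustration; the text only says that updates are logged and that the database is saved periodically.

```python
from dataclasses import dataclass, field


@dataclass
class FileDevice:
    """Stand-in for the semiconductor file device: one checkpoint image plus a log."""
    checkpoint: dict = field(default_factory=dict)
    log: list = field(default_factory=list)


@dataclass
class ActProcessor:
    """Holds the working database in main memory and journals every update."""
    storage: FileDevice
    db: dict = field(default_factory=dict)

    def update(self, address, value):
        # Update main memory, then record (address, updated data) as log information.
        self.db[address] = value
        self.storage.log.append((address, value))

    def take_checkpoint(self):
        # Save the whole in-memory database as checkpoint information.
        # Assumption: entries logged before the checkpoint are no longer needed.
        self.storage.checkpoint = dict(self.db)
        self.storage.log.clear()
```

With this arrangement the file device always holds enough information to rebuild the in-memory database: the last checkpoint image plus every update made since.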

[0004]

FIG. 9 is a connection diagram for explaining a conventional inter-node backup method. In FIG. 9, 1000 is a node at site A (for example, Tokyo) and 2000 is a backup node installed at site B (for example, Osaka). Sites A and B are remote from each other, so if the whole of node 1000 fails at site A because of a disaster such as an earthquake or flood, site B can provide backup. Reference numerals 1 to 10 in node 1000 denote the same elements as in FIG. 8, and 101 to 110 in node 2000 correspond to 1 to 10 in FIG. 8, respectively. 3000 is a signal line connecting communication control unit 4 in node 1000 and communication control unit 104 in node 2000. Processor 1 in node 1000 performs transaction processing as the act processor, and processor 2 waits as the standby processor to back up transaction processing if processor 1 fails. Processor 101 in node 2000 holds a copy of processor 1's database in its main memory. It does not perform transaction processing, but it updates the database in its main memory using the database-update log information sent from processor 1 via signal line 55, communication control unit 4, communication line 3000, communication control unit 104, and signal line 405, and it also writes the log information to semiconductor file device 103 via signal line 407. At a predetermined interval it also writes the database in main memory to semiconductor file device 103 as checkpoint information. Processor 102 waits in order to back up processor 101 if processor 101 fails.

[0005]

[Problems to be Solved by the Invention] Conventionally, as described above, the intra-node backup method shown in FIG. 8 has been used together with the inter-node backup method shown in FIG. 9. However, the backup methods of FIGS. 8 and 9 have the following problems. In FIG. 8, when processor 1 fails, processor 1 notifies standby processor 2 of the failure via signal line 59 so that processing can be resumed on standby processor 2. On receiving this notification, processor 2 reads the checkpoint database from semiconductor file device 3 into main memory and then uses the log information to overwrite it with the database updates made since the checkpoint. The database as of the moment processor 1 failed is thereby restored in the main memory of processor 2. When restoration is complete, processor 2 notifies communication control unit 4 via signal line 56. Communication control unit 4 then sends incoming transactions to processor 2 via signal line 56, and processor 2 resumes transaction processing. The problem with this method is that all transaction processing is suspended while processor 2 is performing the restart processing.
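The restoration step performed by the standby processor (load the checkpoint image, then overwrite it with the logged updates in order) can be sketched as follows. The function name `restore_database` and the representation of the log as a list of `(address, value)` pairs are illustrative assumptions, not taken from the patent.

```python
def restore_database(checkpoint, log):
    """Rebuild the database as of the failure from a checkpoint image and later log entries."""
    db = dict(checkpoint)          # load the checkpoint database into main memory
    for address, value in log:     # apply each update logged since the checkpoint, in order
        db[address] = value
    return db
```

Because later log entries overwrite earlier ones for the same address, replaying the log in order leaves each address holding its most recent value before the failure.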

[0006]

Next, in FIG. 9, when a large-scale disaster strikes node 1000 and transaction processing becomes impossible, the failure is detected by a management node (not shown) that is connected to communication lines 60 and 110 and monitors whether these nodes are operating normally, and the management node notifies the transaction senders that node 1000 has failed. Thereafter, transactions are sent to node 2000 via communication line 410 and processed by processor 101. However, to make transaction processing possible even in such a large-scale disaster, four processors are installed as shown in FIG. 9 while only one actually operates for transaction processing, so the equipment cost of achieving high reliability is extremely large.

[0007]

An object of the present invention is to solve these conventional problems and to provide a high-reliability system and method that minimize the extent to which transaction processing is affected during failure handling, make the modules as a whole highly reliable, and allow the processor-utilization margin to be shared for mutual backup between nodes at two different sites.

[0008]

[Means for Solving the Problems] To achieve the above object, the high-reliability system according to the present invention is a highly reliable information processing system in which two modules, each comprising a processor and a semiconductor file device accessed by the processor, are installed and transaction processing is performed using databases, the system being characterized by comprising: a main memory that stores the database allocated to each module such that the processor utilization of each module is 50% or less; failure notification means that, when a module has failed, has performed recovery processing by itself, and has found that it cannot recover, notifies the other module to that effect; and central processing means that, on receiving a notification from the failure notification means, accesses the semiconductor file device of the other module, reads the database as of the checkpoint into the main memory, reads the log information recorded after the checkpoint, overwrites the database with it, and thereby restores the other module's database as of the failure.

[0009]

The high-reliability method according to the present invention is a method in which two modules, each comprising a processor and a semiconductor file device accessed by the processor, are installed and transaction processing is performed using databases, characterized in that: each module, to which a partitioned database is allocated such that its utilization is 50% or less, stores all of its allocated database in main memory, performs transaction processing using the database, updates the database in main memory, writes the update history of the database to the semiconductor file device as log information, and at predetermined checkpoints writes the entire database in main memory to the semiconductor file device as checkpoint information; a module that fails during transaction processing reads the database as of the checkpoint from the semiconductor file device into main memory, reads the log information recorded after the checkpoint, overwrites the database with the log information, restores the database as of the failure, and resumes transaction processing, repeating the same processing if it fails again, and, if it has not recovered after a predetermined number of restart attempts, notifies the normal one of the two modules that the failure is a permanent (fixed) failure; and the normal module, while performing transaction processing for itself at 50% processor utilization, uses the remaining 50% to read the database as of the checkpoint from the semiconductor file device of the failed module into main memory, read the log information recorded after the checkpoint, overwrite the database with the log information, restore the other module's database as of the failure, and also process the transactions directed at the other module's database.

[0010]

The invention is further characterized in that two modules are installed at each of two different sites A and B and transaction processing is distributed among them; the first module at site A and the first module at site B each hold the other's database and transmit the log information of their own database over a communication line, and the module receiving the log information updates its copy of the partner module's database; the second module at site A and the second module at site B do the same; when one module fails at either site A or site B, the normal module at the same site continues the failed module's transaction processing; and when both modules at either site A or site B fail simultaneously, the two modules at the other site continue the transaction processing of the two modules at the failed site.
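The two failover rules in this paragraph (same-site partner for a single failure, paired remote module when a whole site is down) can be expressed as a small decision function. The sketch below is illustrative only: modules are named by hypothetical `(site, index)` tuples such as `("A", 1)`, and the case in which the chosen backup has itself failed is not handled because the text does not address it.

```python
def takeover_plan(failed):
    """Map each failed module to the module that continues its transactions.

    `failed` is a set of (site, index) tuples with site in {"A", "B"} and index in {1, 2}.
    """
    plan = {}
    for site, index in sorted(failed):
        local_partner = (site, 2 if index == 1 else 1)
        remote_peer = ("B" if site == "A" else "A", index)
        if local_partner not in failed:
            # Only one module at this site failed: the same-site module takes over.
            plan[(site, index)] = local_partner
        else:
            # Both modules at this site failed: the paired module at the other site takes over.
            plan[(site, index)] = remote_peer
    return plan
```

For example, `takeover_plan({("A", 1)})` assigns the work of module A1 to A2, while `takeover_plan({("A", 1), ("A", 2)})` assigns A1 to B1 and A2 to B2, matching the first-to-first and second-to-second pairing described above.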

[0011]

[Operation] In the present invention, the two modules each perform transaction processing for themselves within 50% processor utilization. When either module fails, the normal module, while performing its own transaction processing at 50% processor utilization, uses the remaining 50% to read the checkpoint database and the log information from the semiconductor file device of the failed module, restore the other module's database as of the failure, and take over transaction processing for the other module. This reduces the effect on transaction processing during failure handling and makes the modules as a whole highly reliable. In addition, two modules are installed at each of two different sites, each independently performing its own transaction processing within 50% processor utilization; modules at different sites hold each other's databases and send their database update history over a communication line as log information, enabling mutual backup between two modules at different sites. Because the same processor-utilization margin is shared between mutual backup among modules within a site and mutual backup between nodes at two different sites, an efficient high-reliability method can be realized.
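The reason for the 50% limit is simple arithmetic: a surviving module must be able to carry its own load plus the load of one failed module without exceeding its processor's capacity. A minimal sketch, with hypothetical function names and utilization expressed in whole percent:

```python
def load_after_takeover(own_pct, failed_pct):
    """Processor utilization (percent) once the survivor also serves the failed module."""
    return own_pct + failed_pct


def takeover_fits(own_pct, failed_pct, capacity_pct=100):
    """True if the survivor can carry both workloads without exceeding its capacity."""
    return load_after_takeover(own_pct, failed_pct) <= capacity_pct
```

With both modules at 50% or less the combined load never exceeds 100%, so `takeover_fits(50, 50)` holds, whereas a module already running at 60% could not absorb a 50% partner.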

[0012]

[Embodiments] Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a configuration diagram of a high-reliability transaction processing system showing a first embodiment of the present invention. In FIG. 1, 1 and 2 are modules; 3 and 4 are the processors in modules 1 and 2, respectively; 5 and 6 are the semiconductor file devices in modules 1 and 2, respectively; and 7 and 8 are the central processing units in processors 3 and 4, respectively, which execute instructions and perform input/output processing. Further, 9 and 10 are the main memories in processors 3 and 4, respectively; 11 and 12 are the failure detection/notification devices in processors 3 and 4, respectively; 13 and 14 are the databases stored in main memories 9 and 10, respectively; 15 and 16 are the checkpoint databases stored in semiconductor file devices 5 and 6, respectively; 17 and 18 are the log information stored in semiconductor file devices 5 and 6, respectively; and 19 is a communication control unit that receives transactions via communication line 26. In addition, 24 and 25 are signal lines that carry received transactions to central processing units 7 and 8, respectively; 20 and 21 are signal lines connecting central processing units 7 and 8 to semiconductor file devices 5 and 6, respectively; 22 and 23 are signal lines connecting central processing units 7 and 8 to semiconductor file devices 6 and 5, respectively; 27 is a signal line connecting failure detection/notification devices 11 and 12; 28 and 29 are signal lines connecting central processing units 7 and 8 to main memories 9 and 10, respectively; and 30 and 31 are signal lines connecting central processing units 7 and 8 to failure detection/notification devices 11 and 12, respectively.

[0013]

FIG. 2 shows flowcharts of the normal operation and the failure-detection operation of the processor in each module of the present invention. In FIG. 1, a transaction sent over communication line 26 is received by communication control unit 19. If the transaction is to be processed against database 13 in module 1, communication control unit 19 sends it to central processing unit 7 via signal line 24; if it is to be processed against database 14 in module 2, communication control unit 19 sends it to central processing unit 8 via signal line 25. The following description assumes that the transaction is to be processed against database 13 in module 1. As shown in FIG. 2, the transaction sent to central processing unit 7 is processed according to database 13 (steps 301 and 302), after which main memory 9 is accessed via signal line 28 and the contents of database 13 are rewritten (step 303). Central processing unit 7 also writes the address of the rewritten location in database 13 and the rewritten contents to log information 17 in semiconductor file device 5 via signal line 20 (step 304). Central processing unit 7 then sends the response to the transaction to communication control unit (CCU) 19 via signal line 24 (step 305), and communication control unit 19 sends the response to the sender of the transaction over communication line 26. Transactions arriving over communication line 26 are processed by module 1 or module 2 in the same way. The amount of data in databases 13 and 14 stored in modules 1 and 2 is adjusted so that the utilization of processors 3 and 4 is 50% or less. Also, at a predetermined interval, central processing units 7 and 8 write the contents of databases 13 and 14 as checkpoint information to checkpoint database areas 15 and 16 of semiconductor file devices 5 and 6 via signal lines 20 and 21.
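The routing performed by the communication control unit and steps 301 to 305 of normal processing can be sketched as below. Everything beyond the step order is an assumption for illustration: the class names, routing by which module's database contains the key, and modelling a transaction as adding `delta` to a stored number.

```python
class Module:
    def __init__(self, name, db):
        self.name = name
        self.db = dict(db)   # database held in main memory
        self.log = []        # log information on the module's file device

    def handle(self, key, delta):
        new_value = self.db[key] + delta      # steps 301-302: process against the database
        self.db[key] = new_value              # step 303: rewrite the database in main memory
        self.log.append((key, new_value))     # step 304: log the address and new contents
        return (self.name, key, new_value)    # step 305: response returned to the CCU


class CommunicationControlUnit:
    def __init__(self, modules):
        # Each key is owned by the module whose database holds it.
        self.owner = {key: m for m in modules for key in m.db}

    def dispatch(self, key, delta):
        # Forward the transaction to the owning module and relay its response.
        return self.owner[key].handle(key, delta)
```

A transaction for a key held by module 2 is handled and logged only there; module 1's database and log are untouched, which is what lets the two modules work independently.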

[0014]

In FIGS. 1 and 2, suppose that module 1 fails while transactions are being processed as described above and that the failure is detected by failure detection/notification device 11 in processor 3 (step 311). Failure detection/notification device 11 resets central processing unit 7 via signal line 30 (step 313). Central processing unit 7 then starts its program from the beginning, initializes the contents of main memory 9 via signal line 28 (step 314), and reads checkpoint database 15 in semiconductor file device 5 into database storage area 13 of main memory 9 via signal line 20 (step 315). Central processing unit 7 further reads log information 17 stored in semiconductor file device 5 via signal line 20 and rewrites database 13 in main memory 9 according to this log information (step 316). When this has been done for all log information recorded since the checkpoint (step 317), database 13 in main memory 9 has the contents it had immediately before the failure was detected. Once database 13 has been restored in this way, transaction processing in module 1 is resumed (step 318). If processor 3 fails again while database 13 is being recovered, failure detection/notification device 11 detects it and the same database recovery processing is performed. When failure detection/notification device 11 has detected failures a predetermined number of times (step 312), it regards processor 3 as having a permanent failure and notifies failure detection/notification device 12 in module 2 to that effect via signal line 27 (step 319). Failure detection/notification device 12 notifies central processing unit 8 via signal line 31 that module 1 has a permanent failure.
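The bounded self-recovery described in steps 311 to 319 can be sketched as a retry loop. This is a simplification under stated assumptions: `try_restore` and `notify_partner` are hypothetical callables, a repeated failure during recovery is modelled as a `RuntimeError`, and the count is kept as a number of recovery attempts rather than as a count of detected failures.

```python
def recover_with_retries(try_restore, max_attempts, notify_partner):
    """Attempt self-recovery up to max_attempts times; escalate if every attempt fails."""
    for _ in range(max_attempts):
        try:
            db = try_restore()      # steps 313-317: reset, reload checkpoint, replay log
        except RuntimeError:
            continue                # failed again during recovery: try once more
        return db                   # step 318: restored, transaction processing can resume
    # Step 319: treat the fault as permanent and tell the other module.
    notify_partner("permanent failure")
    return None
```

A transient fault that clears within the allowed number of attempts is handled locally and the partner never hears of it; only a fault that survives every attempt triggers the takeover path.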

[0015]

FIG. 3 is a flowchart of the operation of a module's processor when its partner fails, according to the present invention. Central processing unit 8 reads checkpoint database 15 from semiconductor file device 5 into main memory 10 via signal line 23 (step 321). Next, central processing unit 8 reads log information 17 stored in semiconductor file device 5 via signal line 23 (step 322) and, according to this log information, rewrites the checkpoint database 15 that was read into main memory 10 (step 323). When this has been done for all log information recorded since the checkpoint (step 324), the database of module 1 as it was immediately before the failure is restored in main memory 10. Central processing unit 8 notifies communication control unit (CCU) 19 via signal line 25 that module 1's database has been restored (step 325). Communication control unit 19 then also sends the transactions destined for module 1 to central processing unit 8 via signal line 25 (step 326). Transaction processing for module 1 is thereby resumed on module 2. Module 2 performs its own transaction processing within 50% processor utilization, uses the remaining 50% to restore module 1's database as described above, and then performs transaction processing for module 1 (step 327). Because processors 3 and 4 each handle half of the transaction processing within 50% processor utilization in this way, they can back each other up even if either processor fails. Moreover, even while one module has failed and its database is being restored, the transaction processing of the normal module is unaffected, which has the advantage that the effect on transaction processing as a whole is small.
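The key property claimed here, that the surviving module keeps serving its own transactions while it rebuilds the failed module's database, can be illustrated with a toy scheduler. The class `SurvivingModule` and the rule of handling one own transaction and one log entry per `tick` are assumptions standing in for the 50/50 split of processor time; the patent does not specify how the time is divided.

```python
from collections import deque


class SurvivingModule:
    def __init__(self, own_db, peer_checkpoint, peer_log):
        self.own_db = dict(own_db)
        self.peer_db = dict(peer_checkpoint)   # step 321: load the peer's checkpoint database
        self.pending_log = deque(peer_log)     # step 322: peer log entries still to apply
        self.own_queue = deque()               # this module's own incoming transactions
        self.served = []

    @property
    def peer_ready(self):
        # Step 324: restoration is complete once every log entry has been applied.
        return not self.pending_log

    def tick(self):
        """One scheduling round: one own transaction, then one step of peer recovery."""
        if self.own_queue:
            key, value = self.own_queue.popleft()
            self.own_db[key] = value
            self.served.append(key)
        if self.pending_log:
            address, value = self.pending_log.popleft()
            self.peer_db[address] = value      # step 323: apply the logged update
```

Once `peer_ready` becomes true, the module would notify the communication control unit (step 325) so that the failed module's transactions are routed to it as well.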

[0016] FIG. 4 is a configuration diagram of a high reliability system for transaction processing showing a second embodiment of the present invention. In FIG. 4, reference numerals 1 to 31 denote the same elements as in FIG. 1. Reference numerals 32 and 33 denote the checkpoint databases of the databases 14 and 13 in the processors 4 and 3, stored in the semiconductor file devices 5 and 6, respectively; 34 and 35 denote the log information stored in the semiconductor file devices 5 and 6, respectively. This embodiment differs from the embodiment of FIG. 1 in two respects. First, when the processor 3 performs transaction processing, the log information is stored not only in area 17 of the semiconductor file device 5 but also in area 35 of the semiconductor file device 6. Second, the checkpoint database of the database 13 in the processor 3 is stored not only in area 15 of the semiconductor file device 5 but also in area 33 of the semiconductor file device 6. Likewise, the log information from the processor 4 is stored in areas 18 and 34 of the semiconductor file devices 5 and 6, and the checkpoint database is stored in areas 16 and 32 of the semiconductor file devices 6 and 5. Because the log information and the checkpoint database are thus stored in duplicate in the two semiconductor file devices 5 and 6, even if one of the semiconductor file devices fails and its log information and checkpoint database are lost, the log information and checkpoint database can be read from the normal semiconductor file device to perform restart processing, which further improves reliability.
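The duplicated storage of the second embodiment can be sketched as follows. This is only an illustration under stated assumptions: the class `FileDevice`, the exception `DeviceFailed`, and the helpers `duplex_write` and `read_any` are hypothetical names, and the stored contents are toy values. The area numbers 15, 17, 33, and 35 follow the description of processor 3 above.

```python
class DeviceFailed(Exception):
    """Raised when a (hypothetical) semiconductor file device is unavailable."""


class FileDevice:
    def __init__(self, name: str) -> None:
        self.name = name
        self.failed = False
        self.areas: dict = {}  # area number -> stored content

    def write(self, area: int, content) -> None:
        if self.failed:
            raise DeviceFailed(self.name)
        self.areas[area] = content

    def read(self, area: int):
        if self.failed:
            raise DeviceFailed(self.name)
        return self.areas[area]


def duplex_write(targets, content) -> None:
    """Store the same content in every (device, area) pair."""
    for device, area in targets:
        device.write(area, content)


def read_any(targets):
    """Return the content from the first device that is still working."""
    for device, area in targets:
        try:
            return device.read(area)
        except DeviceFailed:
            continue
    raise DeviceFailed("all copies lost")


device5, device6 = FileDevice("device 5"), FileDevice("device 6")
log_copies = [(device5, 17), (device6, 35)]         # log information of processor 3
checkpoint_copies = [(device5, 15), (device6, 33)]  # checkpoint of database 13

duplex_write(log_copies, [("a", 3)])
duplex_write(checkpoint_copies, {"a": 1})

device5.failed = True  # device 5 fails; the copies in device 6 survive
recovered_log = read_any(log_copies)
recovered_checkpoint = read_any(checkpoint_copies)
```

After the simulated failure of device 5, both the log information and the checkpoint database are still readable from device 6, so restart processing can proceed.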

[0017] FIG. 5 is a configuration diagram of a high reliability system for transaction processing showing a third embodiment of the present invention. In FIG. 5, reference numerals 1 to 31 denote the same elements as in the embodiment of FIG. 1. Reference numerals 36 and 37 denote second semiconductor file devices provided in the modules 1 and 2, respectively; 38 and 39 denote the checkpoint databases in the semiconductor file devices 36 and 37, respectively; and 40 and 41 denote the log information in the semiconductor file devices 37 and 38, respectively. The embodiment of FIG. 5 differs from the embodiment of FIG. 1 in that each of the modules 1 and 2 is provided with two semiconductor file devices, and the checkpoint database and log information are duplicated and stored in the two semiconductor file devices 36 and 37. As a result, even if one of the semiconductor file devices fails and its checkpoint database and log information are lost, they can be read from the normal semiconductor file device to perform restart processing, which further improves reliability.

[0018] FIG. 6 is a configuration diagram of a high reliability system for transaction processing showing a fourth embodiment of the present invention. In FIG. 6, reference numerals 1 to 31 and 36 to 41 denote the same elements as in the embodiment of FIG. 5. Reference numeral 42 denotes the checkpoint database that the processor 3 has stored in the semiconductor file device 6, and 43 denotes the log information that the processor 3 has stored in the semiconductor file device 6. FIG. 6 shows the case where, during transaction processing in the state of FIG. 5, the semiconductor file device 5 fails (marked with ×), the checkpoint database 15 and the log information 17 in the semiconductor file device 5 are lost, and the processor 3 therefore stores the checkpoint database 42 and the log information 43 in the semiconductor file device 6. In this way, when one of the semiconductor file devices of either module fails, the checkpoint database and log information are written to a semiconductor file device of the other module, so that the checkpoint database and log information are always stored in duplicate in semiconductor file devices, which further improves overall reliability.
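The target selection described for the fourth embodiment can be sketched in a few lines. This is a hedged illustration, not the patent's procedure: `choose_targets` is a hypothetical helper, devices are represented only by their reference numerals, and the policy of borrowing exactly one device from the other module is an assumption drawn from the description of FIG. 6.

```python
def choose_targets(own_devices: dict, peer_device: int) -> list:
    """Pick two write targets so the data stays duplicated.

    `own_devices` maps each of the module's own device numbers to a
    healthy flag; `peer_device` is a device of the other module that is
    borrowed only when one of the module's own devices has failed.
    """
    healthy = [number for number, ok in own_devices.items() if ok]
    if len(healthy) < 2:
        healthy.append(peer_device)
    return healthy[:2]


# Module 1 owns devices 5 and 36; device 6 belongs to module 2.
normal_targets = choose_targets({5: True, 36: True}, peer_device=6)
degraded_targets = choose_targets({5: False, 36: True}, peer_device=6)
```

With both of its own devices healthy, module 1 writes to devices 5 and 36; once device 5 fails, the second copy goes to device 6 so that two copies still exist.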

[0019] FIG. 7 is a configuration diagram of a high reliability system between nodes for transaction processing showing a fifth embodiment of the present invention. In FIG. 7, reference numeral 1000 denotes a node at point A (for example, Tokyo), and 2000 denotes a node at point B (for example, Osaka). Points A and B are located far apart, so that if an entire node fails at either point because of a disaster such as an earthquake or flood, the other, normal node can back up the failed node. In the node 1000, reference numerals 1 to 31 denote the same elements as in FIG. 1, and reference numerals 101 to 131 in the node 2000 denote the same elements as 1 to 31 in FIG. 1, respectively. Reference numerals 201 and 202 denote copy databases of the databases 113 and 13 of the processors 103 and 3, respectively; 203 and 204 denote copy databases of the databases 114 and 14 of the processors 104 and 4, respectively; 205 and 206 denote checkpoint databases in which the copy databases 201 and 202 of the processors 3 and 103 are stored in the semiconductor file devices 5 and 105 at a predetermined cycle; 209 and 210 denote checkpoint databases in which the copy databases 203 and 204 of the processors 4 and 104 are stored in the semiconductor file devices 6 and 106 at a predetermined cycle; 207 and 208 denote log information in which the update histories of the copy databases 201 and 202 of the processors 3 and 103 are stored in the semiconductor file devices 5 and 105; and 211 and 212 denote log information in which the update histories of the copy databases 203 and 204 of the processors 4 and 104 are stored in the semiconductor file devices 6 and 106. Reference numeral 3000 denotes a communication line connecting the communication control device 19 and the communication control device 119.

[0020] In FIG. 7, the processors 3, 4, 103, and 104 perform transaction processing using the databases 13, 14, 113, and 114, respectively, at a processor usage rate within 50%. In the node 1000, the module 1 and the module 2 are in a mutual backup state, and in the node 2000, the module 101 and the module 102 are in a mutual backup state. The transaction processing in each node, and the backup processing between modules when one module in a node fails, are as described with reference to FIG. 1. The backup processing between nodes is described here with reference to FIG. 7. The module 1 of the node 1000 and the module 101 of the node 2000, and the module 2 of the node 1000 and the module 102 of the node 2000, are in a mutual backup state. The central processing unit 7 of the node 1000 performs transaction processing on the database 13 and, at a predetermined cycle, reads the log information 17 from the semiconductor file device 5 via the signal line 20 and sends it to the communication control device 19 via the signal line 24. The communication control device 19 sends the log information to the communication control device 119 via the communication line 3000. The communication control device 119 sends the log information to the central processing unit 107 via the communication line 124.

[0021] The central processing unit 107 rewrites the copy database 202 on the basis of the received log information. The central processing unit 107 also writes the received log information to the log information storage area 208 of the semiconductor file device 105 via the signal line 120. Under the control of the central processing unit 107, the copy database 202 of the processor 109 is written at a predetermined cycle to the checkpoint database storage area 206 of the semiconductor file device 105 via the signal line 120. In exactly the same way, the database 113 of the processor 103 of the node 2000 is reproduced as the copy database 201 in the processor 3 of the node 1000, and the checkpoint database 205 and the log information 207 of the copy database 201 are stored in the semiconductor file device 5. The above describes how the module 1 in the node 1000 and the module 101 in the node 2000 send their databases to each other so that each reproduces a copy of the counterpart module's database in its main memory, and how the checkpoint database and log information are stored in the semiconductor file devices. In exactly the same way, the module 2 in the node 1000 and the module 102 in the node 2000 are in a mutual backup state.
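The periodic transfer of log information between nodes can be sketched as below. This is a minimal illustration under several assumptions: the class `Standby` and the function `ship` are hypothetical names, the communication path is reduced to a direct method call, the database is a key-value map, and the checkpoint cycle is counted in received batches rather than in time.

```python
class Standby:
    """Hypothetical receiving module that keeps a copy database in main memory."""

    def __init__(self, checkpoint_every: int) -> None:
        self.copy_db: dict = {}            # copy database in main memory
        self.stored_log: list = []         # log information area in the file device
        self.stored_checkpoint: dict = {}  # checkpoint area in the file device
        self.checkpoint_every = checkpoint_every
        self.batches = 0

    def receive(self, log_batch: list) -> None:
        # Rewrite the copy database from the received log information.
        for key, value in log_batch:
            self.copy_db[key] = value
        # Keep the received log information in the file device as well.
        self.stored_log.extend(log_batch)
        # Write the copy database out as a checkpoint at a fixed cycle.
        self.batches += 1
        if self.batches % self.checkpoint_every == 0:
            self.stored_checkpoint = dict(self.copy_db)


def ship(primary_log: list, sent: int, standby: Standby) -> int:
    """Send the log records not yet shipped and return the new shipped count."""
    batch = primary_log[sent:]
    if batch:
        standby.receive(batch)
    return len(primary_log)


primary_db, primary_log = {}, []
for key, value in [("x", 1), ("y", 2)]:
    primary_db[key] = value
    primary_log.append((key, value))

standby = Standby(checkpoint_every=2)
sent = ship(primary_log, 0, standby)      # first cycle

primary_db["x"] = 5
primary_log.append(("x", 5))
sent = ship(primary_log, sent, standby)   # second cycle triggers a checkpoint
```

After each shipping cycle the copy database on the receiving side matches the sender's database as of the last shipped record, which is what lets the receiving node take over if the sending node is lost.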

[0022] In this state, if transaction processing at the node 2000 stops because of a disaster such as an earthquake or flood, the failure is detected by a management node (not shown in FIG. 7) that is connected to the communication lines 26 and 126 and monitors whether these nodes are normal, and the management node notifies the transaction sources that the node 2000 has failed. Thereafter, transactions are sent to the node 1000 via the communication line 26. On receiving a transaction, the communication control device 19 notifies the central processing units 7 and 8 via the signal lines 24 and 25. The central processing unit 7 performs transaction processing for its own module at a usage rate of 50% and, with the remaining 50% of the processor usage rate, also performs transaction processing for the module 101 using the database 201. Similarly, the central processing unit 8 performs transaction processing for its own module at a processor usage rate of 50% and, with the remaining 50%, processes transactions for the module 102 using the database 203. In normal operation, each module performs transaction processing for its own module within a processor usage rate of 50%. The remaining 50% is used as follows: when a module in the same node fails, the normal module backs up the transaction processing of the failed module; when an entire node fails, the two modules of the normal node back up the transaction processing of the two modules of the failed node. Because the margin of processor usage rate is thus shared between the mutual backup between modules within a node and the mutual backup between nodes, the system can be configured economically while maintaining high reliability.
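The shared 50% margin can be checked with a small calculation. The sketch below is illustrative only: `loads_after_failure` is a hypothetical helper, usage rates are given in percent, and the order of backup candidates (the module in the same node first, then the counterpart module in the other node) is an assumption based on the two cases described above.

```python
def loads_after_failure(own_load: dict, backups: dict, failed: set) -> dict:
    """Return the processor usage rate (percent) of each surviving module.

    `own_load` maps module -> usage rate for its own transactions.
    `backups` maps module -> ordered list of modules that may take it over.
    """
    loads = {m: load for m, load in own_load.items() if m not in failed}
    for m in sorted(failed):
        survivor = next((b for b in backups[m] if b not in failed), None)
        if survivor is None:
            raise RuntimeError(f"no surviving backup for module {m}")
        loads[survivor] += own_load[m]
    return loads


own_load = {1: 50, 2: 50, 101: 50, 102: 50}
backups = {
    1: [2, 101],     # same-node partner first, then the inter-node partner
    2: [1, 102],
    101: [102, 1],
    102: [101, 2],
}

module_failure = loads_after_failure(own_load, backups, {1})
node_failure = loads_after_failure(own_load, backups, {101, 102})
```

In both cases no surviving module exceeds 100%: a single module failure is absorbed inside the node, and a whole-node failure is absorbed by the two modules of the other node using the same spare capacity.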

[0023]

[Effects of the Invention] As described above, according to the present invention, in mutual backup between two modules at the same point, when either module fails, the transaction processing for the failed module can be backed up without affecting the transaction processing being performed by the normal module. In addition, making the system as a whole highly reliable against large-scale disasters such as earthquakes and floods requires mutual backup between two different points. By sharing the margin of processor usage rate reserved for mutual backup between modules within a node with the margin reserved for mutual backup between nodes, the system as a whole can be made economical while its high reliability is ensured.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a high reliability system for transaction processing showing a first embodiment of the present invention.

FIG. 2 is a flowchart of the normal operation and the failure-time operation of the active processor in FIG. 1.

FIG. 3 is a flowchart of the failure-time operation of the standby processor in FIG. 1.

FIG. 4 is a configuration diagram of a high reliability system for transaction processing showing a second embodiment of the present invention.

FIG. 5 is a configuration diagram of a high reliability system for transaction processing showing a third embodiment of the present invention.

FIG. 6 is a configuration diagram of a high reliability system for transaction processing showing a fourth embodiment of the present invention.

FIG. 7 is a configuration diagram of a high reliability system between nodes for transaction processing showing a fifth embodiment of the present invention.

FIG. 8 is a configuration diagram of a conventional high reliability system for transaction processing.

FIG. 9 is a configuration diagram of a conventional high reliability system between nodes for transaction processing.

[Explanation of symbols]

1, 2: module; 3, 4: processor; 5, 6: semiconductor file device; 7, 8: central processing unit; 9, 10: main memory; 11, 12: fault detection/notification device; 13, 14: database; 15, 16: checkpoint database; 17, 18: log information; 19: communication control device; 26: communication line; 20 to 25, 27: signal line; 32, 33: counterpart's checkpoint database in a semiconductor file device; 34, 35: counterpart's log information in a semiconductor file device; 36, 37: other semiconductor file device; 38, 39: checkpoint database in the other semiconductor file device; 40, 41: log information in the other semiconductor file device; 42, 43: counterpart's checkpoint database and log information; 101, 102: module; 103, 104: processor; 105, 106: semiconductor file device; 107, 108: central processing unit; 109, 110: main memory; 111, 112: fault detection/notification device; 113, 114, 115, 116: checkpoint database; 202, 204, 117, 118: log information.

Continuation of the front page: (72) Inventor Takashi Suzuki, c/o Nippon Telegraph and Telephone Corporation, 1-1-6 Uchisaiwaicho, Chiyoda-ku, Tokyo

Claims

[Claims]

1. A high reliability system in which two modules, each including a processor and a semiconductor file device accessed by the processor, are installed and transaction processing is performed using a database, the system comprising: a main memory that stores the database divided and allocated to each module so that the usage rate is 50% or less; fault notifying means that, when a module has failed, notifies the other module of the failure; and a central processing unit that, upon receiving the notification from the fault notifying means, accesses the semiconductor file device of the other module, reads the database as of the checkpoint into the main memory, reads the log information after the checkpoint, overwrites the database with the log information, and restores the database of the other module as of the time of the failure.

2. A high reliability method in which two modules, each having a processor and a semiconductor file device accessed by the processor, are installed and transaction processing is performed using a database, wherein: each module, to which the database is divided and allocated so that the usage rate is 50% or less, stores all of its allocated database in the main memory, performs transaction processing using the database, updates the database in the main memory, writes the update history of the database to the semiconductor file device as log information, and writes the entire database in the main memory to the semiconductor file device as checkpoint information at predetermined checkpoints; a module in which a failure occurs during transaction processing reads the database as of the checkpoint from the semiconductor file device into the main memory, reads the log information after the checkpoint, overwrites the database with the log information to restore the database as of the time of the failure, and restarts transaction processing; if a failure occurs again, the same processing is repeated; if the module does not recover even after the restart processing has been performed a predetermined number of times, the normal one of the two modules is notified that a fixed failure exists; and the normal module performs transaction processing for its own module at a processor usage rate of 50% and, with the remaining 50% usage rate, reads the database as of the checkpoint from the semiconductor file device of the failed module into the main memory, then reads the log information after the checkpoint, overwrites the database with the log information to restore the database of the other module as of the time of the failure, and also performs transaction processing on the database of the other module.

3. The high reliability method according to claim 2, wherein each module stores the database as of the checkpoint and the log information in duplicate in both the semiconductor file device in its own module and the semiconductor file device in the other module.

4. The high reliability method according to claim 2, wherein each of the modules is provided with two semiconductor file devices, and the database as of the checkpoint and the log information are stored in duplicate in the two semiconductor file devices.

5. The high reliability method wherein, when one of its two semiconductor file devices fails, each of the modules also stores the database as of the checkpoint and the log information in one of the semiconductor file devices of the other module, so that the database as of the checkpoint and the log information are always stored in duplicate in two semiconductor file devices.

6. The high reliability method according to claim 2, wherein: the two modules are installed at each of two different points A and B and perform transaction processing in a distributed manner; the first module at point A and the first module at point B each hold the database of the counterpart module and each transmit the log information of the database of its own module to the counterpart via a communication line; the module that has received the log information updates the database of the counterpart module; the second module at point A and the second module at point B perform the same processing; if one module fails at point A, the normal module at the same point continues the transaction processing of the failed module; and if the two modules at either point A or point B fail at the same time, the two modules at the other point continue the transaction processing of the two modules at the failed point.