JPH02266444A - Fault recovery system for distributed data base

Info

Publication number: JPH02266444A
Application number: JP1086660A
Authority: JP
Inventors: Norihiro Kato; 加藤　宣弘; Yojiro Morimoto; 森本　陽二郎; Koichi Sekiguchi; 幸一関口; Miho Muranaga; 村永　美帆; Yoshikazu Yamashita; 義和山下
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1989-04-05
Filing date: 1989-04-05
Publication date: 1990-10-31

Abstract

PURPOSE: To shorten the recovery time of a fault site by having the site to which data is saved take over the data processing transactions of the fault site and, when the fault is recovered, transferring the saved data to the fault site to restore the data of the fault site. CONSTITUTION: When a fault is generated, a data protecting means 18 in the fault site 11 protects the data of a main storage 16 for a fixed period. Simultaneously with the protection, a fault informing means 19 in the fault site 11 informs the other site 12 of the generation of the fault, and a data save determining means 21 determines which data in the main storage 16 are to be saved to which site. Based upon the determination, the data communication means of the respective sites transmit and receive data, and the data are saved to the determined site. Meanwhile, the site 12 receiving the data from the fault site 11 takes over the transaction that was being processed by the fault site when the fault was generated. When the fault of the fault site 11 is recovered, the internal data communication means 22 transmits the saved data to the fault site 11. Consequently, the recovery of the fault site is rapidly executed.

Description

[Detailed Description of the Invention] [Object of the Invention] (Industrial Application Field) The present invention relates to a failure recovery method for a distributed database management system.

(Prior Art) A conventional failure recovery method for distributed databases is described, for example, in "Principles of Database Systems" by J. D. Ullman, translated by Kunii and Ohbo (published May 25, 1985 by the Japan Computer Association, p. 544).

There, as shown in FIG. 3, each site (computer) 2A, 2B connected by a communication path 1 has a CPU 3, a main memory 4, a database 5, and a log 6. The example described works as follows: when a failure occurs at one site 2A, the failed site 2A notifies all other sites 2B that the failure has occurred; each site 2B that receives the notification records in its log 6 which site has failed; and when the failure is recovered, the log 6 is examined back to the time the failure occurred, and all data held in common with the failed site 2A is sent to the failed site 2A as the latest data.

However, in the failure recovery method described above, when a failure occurs, all data in the main memory 4 of the failed site 2A must be discarded, and all transactions that were processing that data must be aborted.

Because advances in VLSI technology have made the capacity of storage devices very large, a large amount of data in the middle of processing resides in the main memory 4 of the failed site 2A at the time a failure occurs, and the amount of data discarded as described above is enormous.

Therefore, after the failure at the failed site 2A is recovered, returning the data in the main memory 4 to its state before the failure requires reading the necessary data from the database 5 and re-executing from the beginning the transactions that were aborted when the failure occurred.

In addition, until the failure at the failed site 2A is recovered, the other sites 2B cannot execute transactions that process data existing only at the failed site 2A.

(Problems to be Solved by the Invention) Thus, the conventional failure recovery method has the problems that recovery from a failure takes time and, moreover, that the number of aborted transactions grows while the failure remains unrecovered.

Accordingly, an object of the present invention is to provide a failure recovery method for a distributed database that can shorten the recovery time of a failed site and prevent an increase in the number of aborted transactions.

[Structure of the Invention] (Means for Solving the Problems) To achieve the above object, the present invention provides a failure recovery method for a distributed database that handles data groups scattered across a plurality of computer sites connected to one another by communication paths, wherein each site comprises data protection means for protecting data in main memory for a certain period of time when a failure occurs, failure notification means for notifying other sites of the occurrence of and recovery from a failure, data evacuation determination means for determining the site to which data in main memory is to be evacuated, and data communication means for transmitting and receiving the data to be evacuated. When a failure occurs at any site, the data needed to take over the processing of the failed site is evacuated to another site, so that the site receiving the data takes over the data processing transactions of the failed site; when the failure is recovered, the evacuated data is transferred to the failed site to restore its data.

(Operation) With the above configuration, when a failure occurs at one site, the data protection means of the failed site protects the data in main memory for a certain period of time. At the same time, the failure notification means of the failed site notifies the other sites that a failure has occurred, and the data evacuation determination means determines which data in main memory is to be evacuated to which site. Based on this determination, the data communication means of the sites exchange data with one another, and the data is evacuated to the determined sites. This is the data that another site needs in order to take over the data processing of the failed site. The data in the main memory of the failed site remains protected until the above processing is complete.

Meanwhile, a site that has received data from the failed site takes over the transactions that were being processed at the failed site when the failure occurred. When the failure at the failed site is recovered, the failure notification means of the failed site notifies the other sites of the recovery. At each site that receives this notification, the internal data communication means sends the evacuated data back to the failed site, so that the same data as at the time of the failure is stored in the main memory of the failed site as the latest data.

(Embodiment) An embodiment of the present invention is described below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of the distributed database failure recovery method of the present invention. The figure shows an example in which three computer sites 11, 12, and 13 are connected to one another by a communication path 14.

Each of the sites 11, 12, and 13 comprises a CPU 15 that controls overall operation, a main memory 16, a database 17, data protection means 18, failure notification means 19, load status notification means 20, data evacuation determination means 21, and data communication means 22.

When a failure occurs, the data stored in the main memory 16 is protected for a certain period of time by the data protection means 18 under a command from the CPU 15.

The failure notification means 19 notifies the other sites 12 and 13, under a command from the CPU 15, that a failure has occurred, for example within the site 11. It likewise notifies the other sites 12 and 13 when the failure has been recovered.

When notified that a failure has occurred, the load status notification means 20 examines the load status of the CPU 15, the main memory 16, and so on within its own site 12 or 13, and reports the result to the failed site 11.

The data evacuation determination means 21 determines which data is to be evacuated to which site on the basis of the reports from the load status notification means 20 of the normal sites 12 and 13, and distributes the data to be evacuated according to how lightly each normal site 12, 13 is loaded.

The data communication means 22 transmits the data selected by the data evacuation determination means 21, as described above, to the data communication means 22 of the normal sites 12 and 13.

Next, the failure recovery method is described with reference to the flowchart shown in FIG. 2.

First, in step 201, suppose that a failure occurs at one site, for example the site 11.

Then, in step 202, the data protection means 18 in the failed site 11 protects the data stored in the main memory 16 for a certain period of time. This protection continues until the data in the main memory 16 of the failed site 11 has been completely evacuated to the other sites 12 and 13 in step 206.

In step 203, the other sites 12 and 13 are notified of the failure at the moment it occurs.

Next, in step 204, the load status notification means 20 of each of the sites 12 and 13 that received the failure notification examines the load status of the CPU 15, the main memory 16, and so on of its own site, and reports the result to the failed site 11.

In step 205, the data evacuation determination means 21 of the failed site 11 determines, on the basis of the load status of the other sites 12 and 13, which data in the main memory 16 of the failed site 11 is to be evacuated to which site. The data evacuated to the other sites 12 and 13 is the data those sites need in order to take over the processing.

In step 206, based on the evacuation decision made in step 205, the data communication means 22 of the failed site 11 transmits the data to the data communication means 22 of the other sites 12 and 13, so that the data in the main memory 16 of the failed site 11 is evacuated to the other sites 12 and 13. The evacuated data is stored in the main memory 16 of the other sites 12 and 13. During this time, the data in the main memory 16 of the failed site 11 is protected by the data protection means 18.

Next, in step 207, the other sites 12 and 13 that received the evacuated data take over the transactions of the failed site 11. That is, because the data in the main memory 16 of the failed site 11 has been evacuated to the main memory 16 of the other sites 12 and 13, the transactions that were being processed at the failed site 11 continue to be processed at the other sites 12 and 13. Furthermore, while the failure at the failed site 11 remains unrecovered, the other sites 12 and 13 execute not only transactions that process data in their own databases 17 but also transactions that process the data that was in the main memory 16 of the failed site 11 when the failure occurred.

Next, when the failure at the failed site 11 is recovered in step 208, the failure notification means 19 of the failed site 11 notifies the sites 12 and 13 of the recovery in step 209.

In step 210, on receiving this recovery notification, each of the sites 12 and 13 transmits the data it has been holding from its internal data communication means 22 to the data communication means 22 of the failed site 11. As a result, the same data as at the time of the failure is stored in the main memory 16 of the failed site 11 as the latest data.

Finally, in step 211, if data that was not in the main memory 16 of the failed site 11 when the failure occurred was updated while the failure remained unrecovered, the updated data is sent from the sites 12 and 13 to the failed site 11. With this, the failed site 11 has completely recovered from the failure.
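
The flow from evacuation through proxy execution to restoration can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Site` class, the function names, and the dictionary-based "main memory" are hypothetical, and the notification steps (203, 204, 209) and the final update transfer are omitted for brevity.

```python
class Site:
    """Hypothetical in-memory model of one computer site."""

    def __init__(self, site_id):
        self.site_id = site_id
        self.main_memory = {}   # data being processed locally
        self.evacuated = {}     # origin site id -> data held on its behalf
        self.failed = False

def evacuate(failed_site, plan, sites):
    """Steps 202-206: copy the protected main-memory data to the chosen sites."""
    failed_site.failed = True
    for key, dest_id in plan.items():
        held = sites[dest_id].evacuated.setdefault(failed_site.site_id, {})
        held[key] = failed_site.main_memory[key]
    # Only after every item has been sent is the protected copy released.
    failed_site.main_memory.clear()

def proxy_update(site, origin_id, key, value):
    """Step 207: a normal site continues a transaction on evacuated data."""
    site.evacuated[origin_id][key] = value

def recover(failed_site, sites):
    """Steps 208-210: return the evacuated data once the failure is recovered."""
    for site in sites.values():
        held = site.evacuated.pop(failed_site.site_id, {})
        failed_site.main_memory.update(held)
    failed_site.failed = False

sites = {i: Site(i) for i in (11, 12, 13)}
sites[11].main_memory = {"x": 1, "y": 2}
evacuate(sites[11], {"x": 12, "y": 13}, sites)   # site 11 fails
proxy_update(sites[12], 11, "x", 5)              # site 12 acts on its behalf
recover(sites[11], sites)                        # site 11 comes back
print(sites[11].main_memory)
```

In this sketch the recovered site receives the values as last updated by the proxy sites, which matches the statement that the returned data is treated as the latest data.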

The above description assumes that the failure occurs at the site 11, but the same applies when a failure occurs at either of the other sites 12 and 13.

Evacuating data takes communication time, but this time is very short compared with the time needed to read the same amount of data into main memory from a database stored on a magnetic disk or the like. For example, if $10^{9}$ bytes of data are sent over a LAN with a communication speed of $10^{10}$ bits/second, the communication takes approximately $10^{9} \times 8 \div 10^{10} = 0.8$ seconds. By contrast, if the same amount of data is read into main memory from a database stored on a magnetic disk or the like with an access time of $3 \times 10^{-2}$ seconds (when reading $10^{7}$ bytes or less of data), the time required is approximately $3 \times 10^{-2} \times (10^{9} \div 10^{7}) = 3$ seconds. The time needed to evacuate the data is therefore very short and poses no practical problem.
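
The comparison above can be checked with a few lines of arithmetic. The sketch reads the figures in the passage as 10^9 bytes of data, a 10^10 bit/second LAN, and a disk access time of 3 × 10^-2 seconds per read of up to 10^7 bytes; the variable names are illustrative.

```python
DATA_BYTES = 10**9          # amount of data to evacuate
LAN_BITS_PER_SEC = 10**10   # LAN communication speed
DISK_ACCESS_SEC = 3e-2      # access time for one disk read
DISK_READ_BYTES = 10**7     # maximum bytes fetched per disk read

# Time to send the data over the LAN (8 bits per byte).
lan_seconds = DATA_BYTES * 8 / LAN_BITS_PER_SEC

# Time to read the same data from disk, one access per 10^7-byte chunk.
disk_seconds = DISK_ACCESS_SEC * (DATA_BYTES // DISK_READ_BYTES)

print(f"LAN transfer: {lan_seconds:.1f} s, disk read: {disk_seconds:.1f} s")
```

Under these figures the LAN transfer takes under a second while the disk read takes about three seconds.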

As described above, this embodiment provides the following effects.

(1) When a failure occurs at any site, the in-process data in the main memory of the failed site is evacuated to other sites, so the sites that receive the data can take over the transactions that process the failed site's data. The system therefore continues to function effectively even when a failure occurs. It also effectively solves the problems of the conventional method: that a large amount of data is discarded; that recovering the failed site takes time because the necessary data must be read from the database and the transactions aborted at the time of the failure must be re-executed from the beginning; and that the number of aborted transactions grows while the failure remains unrecovered.

(2) Because transactions that process the evacuated data can also be executed while the failure remains unrecovered, the number of transactions that are aborted or kept waiting can be reduced compared with the conventional method.

Moreover, because the evacuated data is always kept up to date, no further data update is needed when the failure is recovered, and data processing can resume as is.

The present invention is not limited to the above embodiment and can be implemented with appropriate modifications without departing from its gist.

[Effects of the Invention] As described above, because the present invention is a distributed database failure recovery method as set forth in the claims, it can shorten the recovery time of a failed site and prevent an increase in the number of aborted transactions.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the distributed database failure recovery method of the present invention, FIG. 2 is a flowchart showing its operation, and FIG. 3 is an explanatory diagram showing a conventional failure recovery method.

11, 12, 13: computer sites; 14: communication path; 15: CPU; 16: main memory; 17: database; 18: data protection means; 19: failure notification means; 20: load status notification means; 21: data evacuation determination means; 22: data communication means

Claims

[Claims]

A failure recovery method for a distributed database that handles data groups scattered across a plurality of computer sites interconnected by communication paths, wherein each site comprises: data protection means for protecting data in main memory for a certain period of time when a failure occurs; failure notification means for notifying other sites of the occurrence of and recovery from a failure; data evacuation determination means for determining the site to which data in the main memory is to be evacuated; and data communication means for transmitting and receiving the data to be evacuated; and wherein, when a failure occurs at any site, the data needed to take over the processing of the failed site is evacuated to another site so that the site receiving the data takes over the data processing transactions of the failed site, and, when the failure is recovered, the evacuated data is transferred to the failed site to restore the data of the failed site.