JPH01263745A

JPH01263745A - Recovery of data base

Info

Publication number: JPH01263745A
Application number: JP1025964A
Authority: JP
Inventors: Jr William W Myre; ウイリアム・ウオルター・マイエ・ジユニア; Cheng-Fong Shih; チエング―フオング・シン
Original assignee: International Business Machines Corp
Current assignee: International Business Machines Corp
Priority date: 1988-04-08
Filing date: 1989-02-06
Publication date: 1989-10-20
Anticipated expiration: 2011-03-29
Also published as: EP0336549A3; EP0336549B1; DE68922431T2; EP0336549A2; BR8901649A; US5043866A; DE68922431D1; JPH0833860B2

Abstract

PURPOSE: To avoid the drawbacks of the shadow method while providing subpage concurrency and recovery from multi-page transactions, in a way that is fast and efficient in terms of runtime overhead and requires no analysis step at recovery time. CONSTITUTION: Functions MINBUFLSN and LOWTRANLSN, implemented in computerized routines, are defined and form the first and second components of a checkpoint. MINBUFLSN is functionally related to the first update to the earliest "dirty" data page in the RAM buffer. LOWTRANLSN is functionally related to the first update in the sequence, held in the transaction table, of updates that each correspond to an uncommitted transaction. The two components are taken periodically during write-ahead logging and stored in the log header as a function of logging activity. At recovery time, the checkpoint is retrieved and the recovery algorithm uses a functional comparison between its components.

Description

DETAILED DESCRIPTION OF THE INVENTION

A. Industrial Field of Application

This invention relates to computerized databases and, more particularly, to techniques for recovering data after a system crash.

B. Prior Art

It is well known that computerized database systems have found wide acceptance in numerous applications. The data accumulated in such a database system represents an enormous investment of money and effort and is extremely valuable, if not indispensable, to its users, so the loss of data can be very serious and costly.

Therefore, in addition to the more general functions of a database system in storing and manipulating data, a computerized database system must also provide an important data recovery function for the case of a system crash that halts normal processing. One of the difficulties in providing a recovery function has been the need to restore the database to a consistent state. A common example of the inconsistency problem can be found in financial applications. A crash of a bank's database system may occur after a debit has been made to a customer's account record on disk but before the corresponding credit has been made to the related account. The credit activity may be complete in the sense that it has been entered into main memory or a RAM buffer, but it has not actually been written to disk yet. The image of the database on the external permanent storage disk is therefore said to be in an inconsistent state.

To solve this problem, the concept of transaction boundaries was developed in the art. Transaction boundaries delimit separate sets of database activities such that the database is always in a consistent state when all activities between the boundaries, such as the debit and credit activities in this example, have completed. In other words, transaction control over a series of updates to the database ensures that either all of these updates complete or none of them do. If a problem occurs and the database must be recovered after a crash that happened before a transaction completed, operations on the database can be rolled back to these transaction boundaries.

In addition to the consistency problem addressed by the concepts of transaction boundaries and commit, there was a further difficulty associated with database recovery: storing prior data images for the case in which the database must be restored to a prior data image at recovery time. In the simple example above, this means preserving the original state of the two accounts when a partial or in-flight transaction consisting of the debit activity alone has occurred. The transaction can then be undone so that the database returns to the original consistent state it was in before being altered by the incomplete transaction.

One technique developed in the art for retaining prior image data for recovery is called the shadow method, or shadow paging. Early database systems using this technique include System R, developed by IBM, and the commercial database product SQL/DS.

In this technique, copies of the recorded data pages were retained. At transaction commit, a new copy containing the changes to the pages of the recorded copy was created; it became the new committed copy, and the previous copy was deleted.

Each time additional database changes were made, the current committed copy was copied, together with those changes, to a new committed copy of the data pages, and the previous corresponding data pages were deleted. Because the required switch from old data pages to new data pages merely involved changing pointers, this technique offered a performance benefit at commit time. There was no need to redo database activity, as there is with the newer write-ahead logging technique described next.
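The pointer switch at commit described above can be sketched roughly as follows. This is an illustration only, assuming a page map from logical page numbers to storage slots; the class `ShadowPagedStore` and all of its methods are hypothetical names and are not taken from System R or SQL/DS.

```python
class ShadowPagedStore:
    """Toy shadow-paging store: updates go to new slots, commit swaps pointers."""

    def __init__(self, pages):
        self.slots = dict(enumerate(pages))          # slot -> page contents
        self.page_map = {i: i for i in self.slots}   # logical page -> committed slot
        self.pending = {}                            # logical page -> uncommitted slot
        self._next_slot = len(self.slots)

    def write(self, page_no, contents):
        slot = self.pending.get(page_no)
        if slot is None:                 # first change to this page in the transaction
            slot = self._next_slot
            self._next_slot += 1
            self.pending[page_no] = slot
        self.slots[slot] = contents      # the committed copy is left untouched

    def read_committed(self, page_no):
        return self.slots[self.page_map[page_no]]

    def commit(self):
        for page_no, new_slot in self.pending.items():
            old_slot = self.page_map[page_no]
            self.page_map[page_no] = new_slot   # the pointer switch
            del self.slots[old_slot]            # the previous copy is discarded
        self.pending.clear()

    def abort(self):
        for slot in self.pending.values():
            del self.slots[slot]                # drop the uncommitted copies
        self.pending.clear()
```

Nothing is redone at commit or after an abort in this sketch, but every updated page temporarily exists twice, which is the space cost the following paragraphs describe.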

Nevertheless, a number of well-known drawbacks of the shadow technique became apparent, making it necessary to develop algorithms that support write-ahead logging.

The main drawbacks were the extra RAM and disk space needed to maintain shadow copies of the data and the overhead associated with them, because a second copy is required for every page in the database that is updated. Other drawbacks included expensive checkpoints, data fragmentation caused by disrupting the physical clustering of data, inefficient page maps, and the extra I/O needed for page map blocks.

In short, the disadvantages of shadow paging proved in use to be too great, so write-ahead logging was developed and came to be regarded as a better solution to the recovery problem. An early example of this technique is found in the SQL/DS database system mentioned above. Instead of keeping an entire second copy of the data, this technique simply keeps a linear record, or journal, of the state before and after each database activity. Activity reflected in the RAM buffers, even for completed transactions, is not necessarily written to disk. Such work is therefore lost in a system crash and, unlike with the shadow paging technique above, must be redone from the recovery log during recovery.

Under the write-ahead logging protocol, the log record corresponding to a change must generally be written to disk before the associated data containing the change is written to the data file. One important aspect of write-ahead logging concerns checkpointing, in which a point in the recovery log from which recovery is to begin in the event of a system crash is periodically determined and written out. One factor that affects the efficiency of checkpointing is the internal structure of the database and, more specifically, the granularity of locking. In a multi-user database, a well-known problem arises when two or more users attempt to access the same data. Returning to the earlier example, a second user may change the credit account while a first user is reading the debit account. The first user may then access the same credit account while the second user's transaction is still active, producing inconsistent results.

One solution to this problem was to restrict access to a portion of the database to a single user. Such a restriction is called a lock, and the size of the restricted portion of the database corresponds directly to the "granularity" of the lock. In the aforementioned SQL/DS database, for example, the locking granularity was at the level of the physical data page, which contains many records.

The significance of such coarse granularity for database recovery was that it simplified the recovery problem in checkpointing.

Arranging for recovery to begin at an optimal checkpoint was a simpler problem than in a system with subpage granularity, in which locking is done at the record level, for example.

To explain why coarse granularity simplifies the recovery problem in checkpointing: with record locking, a physical page that is brought in from disk and written out from the RAM buffer may contain updates from several transactions. Moreover, at the time of a system crash some of these transactions will have committed, some will have been aborted, and some will have been interrupted. By contrast, in a database system with coarse-granularity locking, such as DB2, a page is affected by at most one transaction at that point, so handling recovery for that page is relatively easy. Because subpage locking facilitates concurrent updates, however, many transactions may be operating on the same page, and recovery becomes more difficult because all the associated database changes must be applied in the proper order.

Recovery algorithms associated with locking granularity at the data page level are simpler; it is relatively easy, for example, to record whether a data page has been paged out or needs to be updated. Nevertheless, development of and interest in methods that provide subpage-granularity locking, and hence recovery for multiple transactions within a data page, have grown.

One reason such concurrency is sought is that page-level locking is tied to the physical size of the page. A database, however, operates on logical objects or entities rather than on arbitrary physical limits, so locking on logical data objects at the table or record level, that is, subpage granularity such as the record level, was highly desirable. Data pages may be brought into the database system for I/O purposes, but the locks themselves should preferably not be functionally restricted in that way. Accordingly, where write-ahead techniques are used for recovery, it was desirable for the sake of concurrency to allow multiple transactions to hold locks on records within a single page.

Accordingly, techniques for subpage-granularity locking, and recovery techniques for multiple transactions within a data page, were developed. One such technique provides write-ahead logging with a locking granularity finer than the size of a data page (that is, the unit used to bring data into the database or to write data out). In this technique, detailed information about the state of the buffer pool is written out, or "logged", during normal database operation. Specifically, the status of the pages in the RAM buffer pool (that is, which pages were "dirty", the activities that dirtied them, their associated log sequence numbers (LSNs), and so on) is periodically written to the recovery log in records containing this checkpoint information. In addition to this data needed for recovery, log records of changes to the actual data in the database are also written to the recovery log.

The log as a whole was therefore a series of records consisting of checkpoint information and actual data updates to the database.

At recovery time, an analysis pass is performed. In this phase, a forward pass is made over the log, and all log records, including the checkpoint information previously written to the log, are read and analyzed. During the analysis pass, the optimal checkpoint from which to begin recovery is then computed from this information.

More specifically, two LSNs are determined from the many checkpoint records: the LSN of the first update to the earliest "dirty" page in the buffer, and the LSN corresponding to the first update of the earliest transaction that is still active, or "uncommitted". The optimal recovery point corresponds to the smaller of these two LSNs. These LSNs computed during the analysis pass are to be understood in what follows as corresponding to the MINBUFLSN and LOWTRANLSN of the present invention, which are determined and stored periodically during normal forward processing according to the invention.

One problem with such prior art was that the entire log had to be scanned, reading every log record (that is, an analysis pass was required), merely to compute these two LSN values for determining the optimal recovery point. This contrasts with the present invention, in which, as described above, the values of MINBUFLSN and LOWTRANLSN are periodically determined and written to the log, so that no analysis pass is needed.

In the prior art, the actual instantaneous values of MINBUFLSN and LOWTRANLSN are more recent than the values last written to the log according to the present invention and therefore yield a somewhat more optimal recovery point. The values of the present invention, however, require no analysis pass and are available immediately at recovery time.

C. Problems to Be Solved by the Invention

With the above in mind, it is readily apparent that a new method for flexible checkpoint processing was desired. A technique was sought that avoids the drawbacks of the shadow method described above while providing subpage concurrency and recovery from multi-page transactions on data pages. Further, a system and method for database recovery was desired that is particularly fast and efficient in terms of the runtime overhead needed to support the technique, and that requires no analysis phase at recovery time.

D. Means for Solving the Problems

Functions MINBUFLSN and LOWTRANLSN, implemented in computerized routines, are defined and constitute the first and second components of a checkpoint. MINBUFLSN is functionally related to the first update to the earliest "dirty" data page in the RAM buffer. LOWTRANLSN is functionally related to the first update in the sequence, held in the transaction table, of updates that each correspond to an uncommitted transaction. These two components are taken periodically during write-ahead logging and stored in the log header as a function of logging activity. At recovery time the checkpoint is retrieved, and the recovery algorithm uses a functional comparison between its components. The analysis pass over the recovery log used in the prior art becomes unnecessary, overhead during logging is reduced, and recovery becomes more efficient.

E. Embodiment

To describe the invention, some of the terms and concepts used below are first introduced, together with a brief overview of the system and method employed. A more detailed description of the operation of the invention then follows with reference to the drawings.

In the art of databases that use a write-ahead logging protocol, the following terms are commonly used.

Data page: This term refers to a fixed-size block of storage that contains user data. Such data pages can be paged in to some form of secondary storage, such as a RAM buffer (where they are lost in the event of a system crash), or paged out to a conventional form of primary storage, such as a hard disk file (where they are preserved even in the event of a system crash). A data page in the buffer contains the most recent updates made to it, whereas the older copy of that page on the hard disk may not.

Log record: The term log record refers to a record that contains both the before image and the after image of a single data change. A log record is normally used to make it possible to redo or undo the data change using the information it contains.
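As a rough illustration of this definition, a log record carrying both images might be modelled as below. This is a sketch only: the class name `LogRecord`, its fields, and the dictionary standing in for a data page are assumptions made for illustration and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    lsn: int         # log sequence number identifying this record
    txn_id: int      # transaction that made the change (hypothetical field)
    page_id: int     # data page the change applies to (hypothetical field)
    key: str         # item within the page (hypothetical field)
    before: object   # image before the change, used for undo
    after: object    # image after the change, used for redo


def redo(page: dict, rec: LogRecord) -> None:
    """Reapply the change using the after image."""
    page[rec.key] = rec.after


def undo(page: dict, rec: LogRecord) -> None:
    """Reverse the change using the before image."""
    page[rec.key] = rec.before
```

Keeping both images in one record is what lets the same log serve for redoing committed work and for undoing incomplete transactions.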

Recovery log: This refers to a series of log records (as just described) that record all changes made to the data pages of the database, in the order in which they were made. By referring to the recovery log, recovery processing can restore the database to a consistent state.

Log sequence number: Each log record in the recovery log is identified by a unique log sequence number (LSN).

The LSN of a log record is the logical byte offset of the first byte of the log record from the logical beginning of the log.
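The byte-offset definition can be illustrated with a small in-memory log. This is a sketch under stated assumptions: the class `RecoveryLog` and the 4-byte length prefix are hypothetical choices for illustration, not the record layout of the patent.

```python
class RecoveryLog:
    """Append-only log in which a record's LSN is its starting byte offset."""

    def __init__(self) -> None:
        self._data = bytearray()

    def next_lsn(self) -> int:
        # The next free LSN is the current logical end of the log.
        return len(self._data)

    def append(self, payload: bytes) -> int:
        lsn = self.next_lsn()
        # Assumed layout: 4-byte big-endian length prefix, then the payload.
        self._data += len(payload).to_bytes(4, "big") + payload
        return lsn

    def read(self, lsn: int) -> bytes:
        size = int.from_bytes(self._data[lsn:lsn + 4], "big")
        return bytes(self._data[lsn + 4:lsn + 4 + size])
```

Because offsets only grow as records are appended, LSNs defined this way are unique and increase in the order the records were written.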

Write-ahead logging: When a data page is updated, or "dirtied", the update generates a log record with a corresponding identifying LSN, and the data page is also stamped with this LSN. Every "dirty" data page thus contains the LSN of the log record that recorded the last update made to that page. In this way the data page is associated with a particular point in the log, identified by its LSN, and no log record later than this LSN refers to that particular page. When a "dirty" data page is to be written to disk, the write-ahead logging protocol requires that the recovery log first be written to disk up to and including the log record identified by the LSN on the page. This procedure guarantees that a data page is never written to disk before all log records recording changes to that page have been written to disk. In the event of recovery, the database can therefore be restored to a consistent state.
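The rule that the log must reach disk before the page does can be sketched as follows. This is a toy model for illustration: `WalStore`, its lists standing in for the in-memory and on-disk log, and the small integer LSNs are all assumptions rather than anything specified in the patent.

```python
class WalStore:
    """Toy store that enforces the write-ahead rule when a page is written out."""

    def __init__(self) -> None:
        self.pending_log = []   # (lsn, contents) records still only in RAM
        self.disk_log = []      # records already forced to disk
        self.disk_pages = {}    # page_id -> (page_lsn, contents)
        self.next_lsn = 1

    def update(self, page: dict, contents: str) -> int:
        lsn = self.next_lsn
        self.next_lsn += 1
        self.pending_log.append((lsn, contents))
        page["contents"] = contents
        page["lsn"] = lsn       # the page carries the LSN of its last update
        return lsn

    def force_log(self, up_to_lsn: int) -> None:
        # Move every pending record with lsn <= up_to_lsn to the disk log.
        keep = []
        for rec in self.pending_log:
            (self.disk_log if rec[0] <= up_to_lsn else keep).append(rec)
        self.pending_log = keep

    def write_page(self, page_id: int, page: dict) -> None:
        # Write-ahead rule: force the log up to the page's LSN first.
        self.force_log(page["lsn"])
        self.disk_pages[page_id] = (page["lsn"], page["contents"])
```

Writing out a page whose last update has LSN 3 forces records 1 to 3 to disk, while a later record with LSN 4 can stay in memory.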

MINBUFLSN: At any time, the database RAM buffer contains zero or more "dirty" pages.

If the buffer contains one or more "dirty" pages, one of them is the oldest, that is, the page that was updated before any other "dirty" page. The LSN of the first update to this page is defined herein as MINBUFLSN. Note that MINBUFLSN is not necessarily the same as the LSN written on that page; it is the LSN of the first update to the page. (The two are the same until the second update.) The MINBUFLSN parameter is important because it identifies the earliest point in the log from which logged operations need to be redone: the data pages associated with earlier log records have already been written to disk and are no longer in the buffer. When there are no "dirty" pages in the buffer, the next available, or free, LSN is used as MINBUFLSN, because this LSN is the first place at which a write to a buffer page can occur.
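A minimal sketch of this bookkeeping is given below, assuming the buffer manager remembers, for each dirty page, the LSN of the update that first dirtied it. The class `BufferPool` and its method names are hypothetical and serve only to show the distinction between the first-update LSN and the LSN stamped on the page.

```python
class BufferPool:
    """Tracks, per dirty page, the first-update LSN and the latest-update LSN."""

    def __init__(self) -> None:
        self.first_lsn = {}   # page_id -> LSN of the update that first dirtied the page
        self.page_lsn = {}    # page_id -> LSN of the most recent update (stamped on page)

    def record_update(self, page_id: int, lsn: int) -> None:
        # The first-update LSN is kept unchanged until the page is written out.
        self.first_lsn.setdefault(page_id, lsn)
        self.page_lsn[page_id] = lsn

    def page_written(self, page_id: int) -> None:
        # Once written to disk, the page is clean and no longer constrains MINBUFLSN.
        self.first_lsn.pop(page_id, None)

    def min_buf_lsn(self, next_free_lsn: int) -> int:
        # With no dirty pages, fall back to the next free LSN, as described above.
        return min(self.first_lsn.values(), default=next_free_lsn)
```

In this sketch a page updated at LSN 20 and again at LSN 40 carries 40 as its page LSN but still contributes 20 to MINBUFLSN.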

LOWTRANLSN: This is the LSN of the first log record written by the oldest active transaction. In other words, it is the smallest LSN of any log record written by a transaction that is still active. LOWTRANLSN identifies the point in the log before which no active transaction appears.
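The definition amounts to taking a minimum over the active transactions that have written at least one log record. The sketch below assumes a table mapping each active transaction to the LSN of its first log record, with `None` for transactions that have not yet written one. The function name, the table layout, and the sample values are hypothetical, and returning `None` when no active transaction has written anything is an assumption, since the text above does not cover that case.

```python
from typing import Dict, Optional


def low_tran_lsn(first_lsn_by_txn: Dict[str, Optional[int]]) -> Optional[int]:
    """Smallest first-record LSN among active transactions, or None if there is none."""
    lsns = [lsn for lsn in first_lsn_by_txn.values() if lsn is not None]
    return min(lsns) if lsns else None


# Hypothetical table: T1 and T4 have only read so far and have no LSN.
active = {"T1": None, "T2": 120, "T3": 95, "T4": None}
oldest = low_tran_lsn(active)
```

Because each transaction's first record has the smallest LSN of all the records it writes, the minimum over first-record LSNs is also the minimum over every record written by an active transaction.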

With the above in mind, the overall concept of the invention is now outlined. MINBUFLSN and LOWTRANLSN are determined periodically and written to the RAM version of the log file header. The LSN of the last log record written to disk is also updated there. The log file header is then written to disk. These activities constitute the normal runtime overhead of the checkpoint system and method of the invention. Note that maintaining MINBUFLSN and LOWTRANLSN requires relatively little overhead, and that the occasional disk I/O for writing the log file header containing the checkpoint (MINBUFLSN, LOWTRANLSN) likewise requires little overhead.
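One way to picture the checkpoint is as three integers kept at the front of the log file. The sketch below is illustrative only: the header layout (three unsigned 64-bit big-endian fields), the function names, and the use of an in-memory `io.BytesIO` object as a stand-in for the log file are all assumptions and not the format used by the patent.

```python
import io
import struct

# Assumed header layout: MINBUFLSN, LOWTRANLSN, LSN of the last log record on disk.
HEADER = struct.Struct(">QQQ")


def write_checkpoint(log_file, min_buf_lsn: int, low_tran_lsn: int,
                     last_disk_lsn: int) -> None:
    """Overwrite the log file header with the current checkpoint values."""
    log_file.seek(0)
    log_file.write(HEADER.pack(min_buf_lsn, low_tran_lsn, last_disk_lsn))
    log_file.flush()


def read_checkpoint(log_file):
    """Return (MINBUFLSN, LOWTRANLSN, last LSN on disk) from the header."""
    log_file.seek(0)
    return HEADER.unpack(log_file.read(HEADER.size))


# Usage with an in-memory stand-in for the log file.
log_file = io.BytesIO(bytes(HEADER.size))
write_checkpoint(log_file, min_buf_lsn=5, low_tran_lsn=10, last_disk_lsn=42)
```

A checkpoint of this shape costs one small fixed-size write, which is consistent with the low overhead noted above.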

When recovery of the database becomes necessary, the starting point in the log, STARTLSN, is determined as the minimum of LOWTRANLSN and MINBUFLSN. If STARTLSN is less than the last LSN written to disk, recovery is carried out; otherwise no recovery is necessary. Furthermore, if LOWTRANLSN is less than MINBUFLSN, there is no need to read in data pages during recovery between LOWTRANLSN and MINBUFLSN, because the logged updates in that range have already been applied. This phase of the invention is referred to hereinafter as "mini-analysis". Once MINBUFLSN is reached, normal recovery redo processing resumes.
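The decision logic in this paragraph can be sketched as a small planning function. The names `RecoveryPlan` and `plan_recovery` and the representation of the mini-analysis span as a half-open LSN range are assumptions for illustration; only the comparisons themselves are taken from the text.

```python
from typing import NamedTuple, Tuple


class RecoveryPlan(NamedTuple):
    needed: bool                     # whether any recovery work is required
    start_lsn: int                   # STARTLSN = min(LOWTRANLSN, MINBUFLSN)
    mini_analysis: Tuple[int, int]   # [from, to) read without fetching data pages
    redo_from: int                   # LSN at which normal redo processing begins


def plan_recovery(min_buf_lsn: int, low_tran_lsn: int,
                  last_disk_lsn: int) -> RecoveryPlan:
    start = min(low_tran_lsn, min_buf_lsn)
    if not start < last_disk_lsn:
        # STARTLSN is not below the last LSN on disk: nothing to recover.
        return RecoveryPlan(False, start, (start, start), start)
    if low_tran_lsn < min_buf_lsn:
        # Updates in this span are already on disk, so pages need not be read.
        mini = (low_tran_lsn, min_buf_lsn)
    else:
        mini = (start, start)        # empty span, redo starts immediately
    return RecoveryPlan(True, start, mini, redo_from=min_buf_lsn)
```

For example, with MINBUFLSN 50, LOWTRANLSN 30 and a last LSN on disk of 90, the log is read from 30, data pages are left untouched up to 50, and redo proceeds from 50.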

Referring first to FIG. 1, recall that according to the invention a checkpoint consisting of MINBUFLSN and LOWTRANLSN is created periodically, as shown at blocks 10 and 12, and is then written out, again periodically, to the log file header on persistent storage such as a hard file or fixed disk, as shown at block 14. Line 16 is intended to indicate conceptually the functional distinction between the processing above the line, which takes and stores checkpoints, and the processing below the line, which uses such checkpoints to facilitate recovery of the database after a system crash. The latter recovery processing, in which a previously taken and stored checkpoint is retrieved and used in the recovery process described in more detail below, is shown generically at block 18.

Referring next to FIG. 2, the concept of MINBUFLSN is illustrated conceptually. Secondary storage of the data processing system, in the form of RAM buffer 20, contains a plurality of data pages 22 and 24. Some of these pages, 22, are "clean"; the others are "dirty" pages 24, to which updates have been written along with associated log records showing the before and after images of the changes. Each "dirty" page has associated with it the unique LSN 28 of the log write that changed the status of the page from "clean" to "dirty". As noted above, MINBUFLSN is the LSN of the oldest page, that is, the LSN of the page that was updated before any other "dirty" page. When there are no "dirty" pages in buffer 20, the next available, or free, LSN is used as MINBUFLSN, because it is the first place at which a write to a buffer page can occur.

Thus, in the example of FIG. 2, the "dirty" pages 24 have associated LSNs of 5, 7, 10, 11 and 16, in ascending order. Because LSNs are assigned in ascending order over time, the smallest LSN identifies the earliest such page.

According to the definition of MINBUFLSN, indicated by reference numeral 28, MINBUFLSN in this example is the lowest sequence number, namely 5, indicated by reference numeral 30. If the "dirty" page 24 associated with LSN 5 is written to disk, that page accordingly disappears from RAM buffer 20. According to the process shown in block 10 of FIG. 1, when a new MINBUFLSN is periodically derived (as shown in FIG. 5, described in detail below), and before the next checkpoint is written to disk, the list of LSNs corresponding to pages in buffer 20 (i.e., 5, 7, 10, 11 and 16 in this example) is periodically scanned to find the smallest LSN associated with a "dirty" page in buffer 20, i.e., MINBUFLSN 28. Therefore, if the "dirty" page associated with LSN 5 is written out and disappears from buffer 20, then on the basis of this scan LSN 7 replaces LSN 5 as the new MINBUFLSN 28, and the "dirty" page 24 associated with LSN 7 is marked as the newly designated oldest "dirty" page in buffer 20.
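By way of illustration only, the scan described above can be sketched in Python. The page identifiers, the function name `min_buf_lsn` and the next free LSN of 17 are assumptions made for the example; only the LSN values 5, 7, 10, 11 and 16 come from FIG. 2.

```python
def min_buf_lsn(dirty_page_lsns, next_free_lsn):
    """Return the smallest first-dirtying LSN among the dirty pages.

    If no page is dirty, fall back to the next free LSN in the log.
    """
    return min(dirty_page_lsns.values(), default=next_free_lsn)


# Hypothetical page ids mapped to the LSN of the log write that first dirtied them.
dirty = {"A": 5, "B": 7, "C": 10, "D": 11, "E": 16}
before = min_buf_lsn(dirty, next_free_lsn=17)

# The page tagged with LSN 5 is written to disk and leaves the dirty list.
del dirty["A"]
after = min_buf_lsn(dirty, next_free_lsn=17)

# With no dirty pages left, the next free LSN is used instead.
dirty.clear()
empty = min_buf_lsn(dirty, next_free_lsn=17)
```

In this sketch `before` corresponds to the value 5 at reference numeral 30, and `after` to the value 7 that replaces it once the oldest "dirty" page has been written out.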

Referring now to FIG. 3, a schematic diagram of transaction table 32 is shown. The figure illustrates conceptually that the transaction management function of a database system commonly keeps track of the log records written on behalf of transactions that are still active. One of the values normally kept in transaction table 32 is LSN 34. When a transaction writes its first log record, the LSN of that log record is written to and stored in transaction table 34. (Log records are written by a transaction only to record updates to the database. Some transactions never update and only read. A transaction is called a "read" transaction until it performs an update. Because a read transaction has written no log records, it has no LSN associated with it. In the example of FIG. 3, the first-transaction-write column 34 holds "R", for read (reference numeral 36), for such transactions, because no LSN is associated with them.)

Continuing with the illustration of FIG. 3, a particular transaction writes its first log record, and LSN 38, with a value of 14, is associated with it in table 34. Another transaction likewise writes its first log record and, as shown in table 34, has LSN 38 with a value of 18 associated with it. Recall from the above that LOWTRANLSN is the LSN of the first log record written by the oldest active transaction; since LSNs are assigned in ascending order over time, it is the lowest LSN of any log record written by that transaction. Therefore, with respect to FIG. 3, LOWTRANLSN 40 is by definition the minimum of all LSNs in transaction table 34, that is, the minimum of the sequence numbers 10, 14 and 18, each associated with a write transaction that has written a log record. LOWTRANLSN 40 is therefore the value 10, shown at reference numeral 42, which is the minimum of all LSNs in column 34, the column holding the first LSN written for each transaction recorded in table 34.
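By way of illustration only, the FIG. 3 computation can be sketched in Python. The transaction identifiers, the use of `None` to stand for an "R" (read) entry, and the next free LSN of 19 are assumptions for the example; the LSN values 10, 14 and 18 come from the figure.

```python
from typing import Dict, Optional


def low_tran_lsn(first_lsn_by_txn: Dict[str, Optional[int]], next_free_lsn: int) -> int:
    """Return the smallest first-log-record LSN among active write transactions.

    Read transactions carry no LSN (None here) and are ignored. If no write
    transaction is active, the next free LSN in the log is returned.
    """
    write_lsns = [lsn for lsn in first_lsn_by_txn.values() if lsn is not None]
    return min(write_lsns, default=next_free_lsn)


# Hypothetical table: two read transactions and three write transactions.
table = {"T1": None, "T2": 14, "T3": 10, "T4": None, "T5": 18}
result = low_tran_lsn(table, next_free_lsn=19)

# A table that holds only read transactions falls back to the next free LSN.
only_readers = low_tran_lsn({"T1": None, "T4": None}, next_free_lsn=19)
```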

If no LSN is present in buffer 20 or in table 34 at a particular time, LOWTRANLSN and MINBUFLSN are assigned the value of the next available log record. Because LSNs increase monotonically, any MINBUFLSN or LOWTRANLSN determined later according to the procedures of FIGS. 4 and 5 is of course equal to or greater than this value.

In summary, with reference to FIGS. 2 and 3, a log record is created whenever a write is made in the database to a buffer page (the unit of I/O transferred to and from disk). Associated with this log record is an LSN that uniquely identifies the record to the database system and uniquely identifies that particular write to the data page. LSNs are then used to "tag" "dirty" pages, and a list of the LSNs associated with "dirty" pages is built. An LSN is created when a page is first "dirtied" (i.e., when the first write is made to it). (An LSN is of course created whenever a write is made, but there is always a first such write.) The LSN associated with this first write is entered in the list of "dirty" page LSNs for buffer 20. The list therefore holds, for every "dirty" page currently in the buffer pool, an identifier indicating where in the log the first write to that page was recorded; such an identifier, or LSN, is associated with each "dirty" page.

Because LSNs are assigned sequence numbers in ascending order over time, any existing LSN must by definition be smaller than any newly allocated LSN. The value of MINBUFLSN is therefore equal to or less than any newly allocated LSN. When a buffer page is written out to disk, its associated LSN is removed from the list of "dirty" page LSNs for buffer 20. Likewise, when a "clean" page, possibly one newly read into buffer 20, receives its first write, an LSN is assigned to that page accordingly. The list of LSNs associated with "dirty" pages thus changes from time to time as pages in the buffer pool are modified or added.

Referring now to FIG. 4, the derivation and maintenance of the LOWTRANLSN component of the checkpoint during normal database operation will be described in more detail.

The method for deriving and maintaining the value of LOWTRANLSN during normal database operation is as follows. First, at 44, LOWTRANLSN is initialized to the next available LSN in the log.

Note that at database startup the list of LSNs of active write transactions is empty.

LOWTRANLSN is therefore initialized to the next available LSN in the log. When a transaction arrives and writes a log record, an LSN identifying that log record is created. Accordingly, at 46, the subroutine waits for such a transaction to write its first log record. When this event occurs, the new LSN so defined is added at 48 to the list of write-transaction LSNs, as in the example of FIG. 3. The newly added LSN must of course be equal to or greater than LOWTRANLSN, because LSNs are allocated sequentially in ascending order. No recalculation is therefore needed.

Once the list of LSNs shown in FIG. 3 is no longer empty, the next change to the list is either the removal of an LSN, when the transaction that gave rise to it ends, or the addition of a new LSN, when a transaction writes its first log record.

Therefore, at 50, the process waits for the next addition of an LSN to the list or deletion of an LSN from it. At 52, a check is made whether the change to the LSN list is an addition; if so, the process loops at 54 back to 48 and adds the new LSN to the list. If, on the other hand, the change at 52 is a deletion, that deletion may have emptied the LSN list again, which is determined at block 58. If the list has become empty, the process loops back at 60 and, at 44, reinitializes LOWTRANLSN to the next available LSN.

If, on the other hand, the list has not been emptied, the process leaves decision block 58 at 62 and proceeds to block 64. At block 64, where the LSN list still contains members after the deletion, the value of LOWTRANLSN is updated by determining the lowest of all LSNs currently in the list; this lowest LSN becomes the new LOWTRANLSN. The process then returns at 65 to 50 to wait for the next addition to or deletion from the list.
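By way of illustration only, the FIG. 4 loop can be sketched as an event-driven Python class. The class and method names, the transaction identifiers and the next-free-LSN values passed in are assumptions for the example; the step numbers in the comments refer to the blocks of FIG. 4 as described above.

```python
class LowTranLsnTracker:
    """Maintains LOWTRANLSN as write transactions start and end."""

    def __init__(self, next_free_lsn):
        self.first_lsns = {}               # txn id -> LSN of its first log record
        self.low_tran_lsn = next_free_lsn  # block 44: initialize

    def first_log_write(self, txn_id, lsn):
        # Blocks 46/48: add the new LSN. It cannot be smaller than the
        # current LOWTRANLSN, so no recomputation is needed.
        self.first_lsns.setdefault(txn_id, lsn)

    def transaction_end(self, txn_id, next_free_lsn):
        # Block 52: a deletion. Read transactions were never listed.
        if txn_id not in self.first_lsns:
            return
        del self.first_lsns[txn_id]
        if self.first_lsns:
            # Block 64: list not empty, take the lowest remaining LSN.
            self.low_tran_lsn = min(self.first_lsns.values())
        else:
            # Blocks 58/60/44: list empty, reinitialize.
            self.low_tran_lsn = next_free_lsn


tracker = LowTranLsnTracker(next_free_lsn=10)
tracker.first_log_write("T3", 10)
tracker.first_log_write("T2", 14)
tracker.first_log_write("T5", 18)

history = [tracker.low_tran_lsn]
tracker.transaction_end("T3", next_free_lsn=19)
history.append(tracker.low_tran_lsn)
tracker.transaction_end("T2", next_free_lsn=19)
history.append(tracker.low_tran_lsn)
tracker.transaction_end("T5", next_free_lsn=20)
history.append(tracker.low_tran_lsn)
```

The same shape would serve for the "dirty" page list, with a page write-out taking the place of a transaction ending.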

Referring now to FIG. 5, the method for maintaining the MINBUFLSN value during normal database operation will be described in more detail. At database startup, the list of LSNs corresponding to "dirty" buffer pages, as illustrated for RAM buffer 20 in FIG. 2, is empty. Therefore, as shown at 66, MINBUFLSN must be initialized to the next available LSN in the log. When a buffer page is written to, i.e., "dirtied", for the first time, a corresponding LSN is created that identifies the log record for the write that first "dirtied" the page. Accordingly, at 68, the routine waits for a transaction to "dirty" the first buffer page and generate the corresponding LSN. An LSN newly added to the list shown in FIG. 2 must be greater than or equal to any previous MINBUFLSN, because LSNs are allocated sequentially in ascending order, so no recalculation is needed. At 70, the new LSN is added to the list of "dirty" buffer page LSNs.

Once the LSN list is no longer empty, the next change to the list arises either from the removal of an LSN, when the corresponding "dirty" buffer page is written out, or from the addition of a new LSN, when a page in the buffer is "dirtied". Thus, at 72, the process waits for the next such addition to, or deletion from, the list of LSNs corresponding to "dirty" pages. At 74, a check is made whether the change to the list is an addition or a deletion. If it is an addition, the routine returns at 76 to 70 to add the new LSN to the list. If the change is a deletion, the process leaves the check at 74 via 78 and proceeds to the next check, 80. As in the processing for LOWTRANLSN, a deletion may leave the list empty again, which is checked at 80. If the list is empty, the routine exits at 84 and returns to the initial state at 66. If the list is not empty, the process leaves check 80 via 82 and proceeds to block 86. If the LSN list still contains members after the deletion, the value of MINBUFLSN may need to be updated. Therefore, at 86, the minimum LSN in the list corresponding to "dirty" pages is determined and assigned to MINBUFLSN as its new value, and the process loops back at 88 to wait at 72 for the next change to the list.

With the values of MINBUFLSN and LOWTRANLSN maintained at 10 and 12 of FIG. 1 by the related processes detailed in connection with FIGS. 5 and 4, it is desirable to store these components of the checkpoint periodically during normal operation, as shown at 14 in FIG. 1. Writing the checkpoint to disk is described in more detail next, with reference to FIG. 6.

When the database is first started, at 90, the values of LOWTRANLSN and MINBUFLSN are initialized to the next available LSN in the log. It is a feature of the invention that a checkpoint consisting of MINBUFLSN and LOWTRANLSN is stored from time to time and made available for recovery; such storage typically takes the form of periodic writes to disk. At 92, the subroutine waits for n log writes, one per log record, to occur before this storage step, where n is chosen so that checkpoint efficiency is optimal. For example, one data page update for which a single log record is written counts as one log write (n = 1).

If another transaction then occurs and writes a corresponding log record, the count is two (n = 2). During normal database operation, the process simply counts the number of such log writes, or equivalently the number of LSNs, until the value n is reached. At 94, the values of LOWTRANLSN and MINBUFLSN are written to the log file header. The process then loops at 96 back to 92 and begins accumulating the count of n log writes again before the next newly derived checkpoint (i.e., the then-current values of MINBUFLSN and LOWTRANLSN) is written to the log file header at 94.
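By way of illustration only, the FIG. 6 loop can be sketched in Python. The class name, the dictionary standing in for the log file header, and the choice n = 3 are assumptions for the example; a real system would write the two values to the header of the log file on disk.

```python
class CheckpointWriter:
    """Stores (LOWTRANLSN, MINBUFLSN) in the log file header every n log writes."""

    def __init__(self, n, header):
        self.n = n
        self.header = header  # dict standing in for the on-disk log file header
        self.count = 0

    def on_log_write(self, low_tran_lsn, min_buf_lsn):
        """Call once per log record written; returns True if a checkpoint was stored."""
        self.count += 1                      # block 92: count log writes
        if self.count < self.n:
            return False
        self.header["LOWTRANLSN"] = low_tran_lsn  # block 94: write the checkpoint
        self.header["MINBUFLSN"] = min_buf_lsn
        self.count = 0                       # block 96: start counting again
        return True


header = {}
writer = CheckpointWriter(n=3, header=header)
first_two = [writer.on_log_write(10, 5), writer.on_log_write(10, 5)]
header_before = dict(header)
third = writer.on_log_write(10, 7)
```

Counting bytes written to the log instead of log writes would only change what `on_log_write` accumulates.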

The general concept, therefore, is to take a checkpoint from time to time as the log continues to grow and to write that checkpoint to disk. As the log grows, the checkpoint values in the log file header become stale and less than optimal. Clearly, however, if no additional writes are made to the log, the checkpoint values remain just as optimal.

Regarding the choice of n, it is desirable to use some measure of log file write activity to determine the interval at which checkpoints are taken. In the embodiment described herein, this is done by counting the number of log writes, n, and this value determines the interval between successive determinations of the checkpoint and writes of the checkpoint to the log file header. It should be understood, however, that the invention is not limited to any particular metric of log file activity, and other measures may be used in place of n. For example, the metric of log activity could instead simply be a count of the number of bytes written to the log.

In that case the interval is more precise, but the cost and overhead of maintaining the value increase.

Thus, although log file activity must be measured in some way, such as by the number of I/Os or the number of bytes written to the log, the invention is not limited to any particular measure of log activity and contemplates the use of any of several such techniques well known in the art. Furthermore, it is contemplated that in some applications it may be desirable for the function that triggers the periodic storing of checkpoints to be selected externally. In one such case, the user may change configuration parameters at will through a user interface, such as the value of n or, more precisely, what n measures. The basic concept is that checkpoints are periodically derived and stored as a function of some measure of log activity.

Referring again to FIG. 1, once MINBUFLSN and LOWTRANLSN have been determined at reference numerals 10 and 12 according to the corresponding procedures of FIGS. 5 and 4, and a checkpoint containing these MINBUFLSN and LOWTRANLSN values has been written out at 14 according to the procedure detailed in connection with FIG. 6, the benefit of such a logged checkpoint lies in its later use for system recovery after a crash. Such use of the checkpoint, shown at reference numeral 18 in FIG. 1, is therefore described next in more detail with reference to the process of FIG. 7.

Recall that, according to the procedure for writing out a checkpoint (FIG. 6), the corresponding values of MINBUFLSN and LOWTRANLSN were written to the log file header. After a system crash, these values of LOWTRANLSN and MINBUFLSN are retrieved from the log file header, as shown at 98. According to the invention, a significant advantage of being able to obtain the checkpoint parameters by reading the log file header is that it eliminates the need for the analysis pass over the log file that accompanies conventional log recovery techniques.

Continuing with FIG. 7, the starting LSN, i.e., the minimum of LOWTRANLSN and MINBUFLSN, is determined at recovery. At 102, a check is made whether this starting LSN is less than the last LSN written to disk.

If so, the routine exits at 104 to 118 and the forward recovery procedure ends. If the starting LSN is not less than the last LSN written to disk, the procedure follows path 106 to 108, where a check is made whether LOWTRANLSN is less than MINBUFLSN. If LOWTRANLSN is less than MINBUFLSN, all updates recorded in the log records from LOWTRANLSN up to the last update before MINBUFLSN are known to have actually been applied to the disk version of the database, so no data pages need be read in for checking. It is only necessary to note the transactions that start in this interval. The process therefore exits at 110 and, at 114, performs a "mini-analysis" only on the log records from LOWTRANLSN up to the log record immediately before MINBUFLSN. This step is described in more detail below. Once the log records pass the LSN associated with MINBUFLSN (as indicated by path 112 from block 108 or by the exit from block 114), data pages must be read from disk to learn whether each update was actually applied; this step is called full forward recovery 116 from MINBUFLSN. The process then proceeds to 118 and forward recovery ends.
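By way of illustration only, the split between the "mini-analysis" range and the full forward recovery range can be sketched in Python. The function name and the sample LSNs are assumptions for the example, and the end-of-log test at 102 is not modeled; the sketch simply ignores log records that precede the starting LSN.

```python
def plan_forward_recovery(low_tran_lsn, min_buf_lsn, log_lsns):
    """Split log records into (mini_analysis, full_redo) lists of LSNs.

    Records in [LOWTRANLSN, MINBUFLSN) need no page I/O; records at or
    beyond MINBUFLSN require the data page to be read and checked.
    """
    start = min(low_tran_lsn, min_buf_lsn)
    mini_analysis, full_redo = [], []
    for lsn in sorted(log_lsns):
        if lsn < start:
            continue                   # before the starting LSN: not processed
        if lsn < min_buf_lsn:
            mini_analysis.append(lsn)  # block 114: only note transaction starts
        else:
            full_redo.append(lsn)      # block 116: read page, compare LSNs
    return mini_analysis, full_redo


log = range(1, 21)  # hypothetical log holding LSNs 1 through 20

# LOWTRANLSN < MINBUFLSN: part of the log is handled without page I/O.
mini_a, full_a = plan_forward_recovery(10, 16, log)

# MINBUFLSN <= LOWTRANLSN: everything from MINBUFLSN on needs full redo.
mini_b, full_b = plan_forward_recovery(14, 5, log)
```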

Returning to the steps involved in the "mini-analysis" shown at 114, it follows from the above that there is a series of log records whose LSNs are equal to or greater than LOWTRANLSN and precede MINBUFLSN. Moreover, all updates logged in the log records from LOWTRANLSN up to, but not including, the log record identified by MINBUFLSN are known to have been written to disk. If the smallest buffer LSN (i.e., MINBUFLSN, determined at the same time as LOWTRANLSN) is greater than LOWTRANLSN, MINBUFLSN is the LSN identifying the first data change not yet written to disk. All logged operations from LOWTRANLSN up to MINBUFLSN have been written out to disk. Were that not so, those operations would appear in the list of "dirty" page LSNs used to compute MINBUFLSN, and MINBUFLSN would then be less than or equal to LOWTRANLSN. When MINBUFLSN is greater than LOWTRANLSN, the process of FIG. 7 can therefore perform the "mini-analysis" phase, in which, unlike conventional write-ahead-log checkpointing methods, no data pages need be read, because the updates referenced by those log records are known to be on disk.

When MINBUFLSN is less than LOWTRANLSN, this fact is not known, and the process is forced to read in data pages.

A data page contains the LSN of the last update applied to that page. If the LSN of a log record is greater than the LSN on the data page, the logged update is known not to have been applied to the page on disk and must be reapplied. If the LSN on the data page is equal to or greater than the LSN of the log record currently being processed, the update is known to have actually been applied. This is referred to in the art as a full redo, and it is expensive because page I/O is required for each log record processed in the forward recovery, or redo, phase.
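By way of illustration only, the page-LSN comparison used in a full redo can be sketched in Python. The in-memory `pages` dictionary stands in for reading data pages from disk, and the record layout `(lsn, page_id, after_image)` is an assumption made for the example.

```python
def full_redo(pages, log_records):
    """Reapply logged updates whose LSN is greater than the LSN on the page.

    pages: page id -> {"lsn": int, "data": object}
    log_records: iterable of (lsn, page_id, after_image) in ascending LSN order
    Returns the list of LSNs that had to be reapplied.
    """
    reapplied = []
    for lsn, page_id, after_image in log_records:
        page = pages[page_id]          # stands in for one page read per record
        if lsn > page["lsn"]:
            page["data"] = after_image  # update was not on disk: reapply it
            page["lsn"] = lsn
            reapplied.append(lsn)
        # otherwise the page LSN is >= the record LSN: already applied
    return reapplied


pages = {"P": {"lsn": 16, "data": "v16"}}
log_records = [(16, "P", "v16"), (17, "P", "v17"), (20, "P", "v20")]
reapplied = full_redo(pages, log_records)
```

Every iteration touches a page, which is the per-record I/O cost that the text describes as expensive.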

In contrast, the technique of the present invention eliminates the need for such I/O until MINBUFLSN is reached.

In many cases there are a large number of log records between LOWTRANLSN and MINBUFLSN. Therefore, when LOWTRANLSN is less than MINBUFLSN, page I/O can be eliminated while those log records are examined during redo. No I/O need be performed; it is only necessary to look for the starts of new transactions. In the analysis phase of conventional checkpointing techniques, state information was examined to determine the starting points of new transactions.

In the present invention, storing MINBUFLSN and LOWTRANLSN makes the analysis phase unnecessary, and several of its benefits are largely obtained without actually running it (hence the term "mini-analysis"). Under this method, the log is processed.

Also, in certain cases where MINBUFLSN is equal to or less than LOWTRANLSN, a full redo with its attendant page I/O is required. The invention nevertheless allows for the possibility that no expensive I/O is needed over a particular log interval.

For clarity, the programs that call the routines of FIGS. 4 and 5 for maintaining LOWTRANLSN and MINBUFLSN have been omitted. It will be appreciated, however, that a suitable transaction management facility, well known in the art, is provided as part of the database manager, so that these subroutines are driven as part of the transaction management functions that add, delete and update transaction state records.

Similarly, the programs that call the routines for adding to, deleting from and maintaining the LSN list of "dirty" pages and the list of LSNs in the transaction table, shown in FIGS. 2 and 3, have also been omitted. Conventional buffer pool management functions, well known in the art, are likewise provided to add, delete and maintain these "dirty" page LSN lists and LSN values as buffers are brought in and paged out. Thus, for example, when a new checkpoint is written according to block 94 of FIG. 6, the corresponding MINBUFLSN and LOWTRANLSN values can be retrieved from the storage locations maintained by the buffer pool management function.

Finally, referring to FIG. 8, there is shown a database system 120 of the present invention, comprising a microprocessor with associated software for carrying out the processes described herein. The system is preferably configured in the manner of a conventional personal computer architecture, such as that of the IBM Personal System/2 family of computers. Such a computer system is provided with a microprocessor 132, such as an Intel 80386 microprocessor. A bus 134 is provided to connect microprocessor 132 with several associated functions implemented by memory 126, ROM 128 and various input/output devices 130. Although the bus is shown functionally as a single line, it will be appreciated that it includes the usual address, data and control lines, for purposes well known in the art.

For simplicity, devices 126, 128 and 130 are shown without adapter interfaces to microprocessor 132; such interfaces are provided, for well-known purposes, either as part of the Personal System/2 or as plug-in options.

Read-only memory (ROM) 128 contains the basic input/output operating system (BIOS), which is executed by microprocessor 132 and controls the basic operation of system 120. An operating system 124, such as OS/2, is shown functionally and runs in the usual manner with the BIOS in ROM 128, although the software for operating system 124 is typically stored in other memory 126. Input/output devices 130 may include a keyboard, mouse and the like for operator input, and a display such as an IBM PS/2 Color Display 8514 for providing visual output of the system to the user. Memory 126 may take the form of various media, such as disks and associated disk drives, one or more disk files, and the like.

More detailed information on hardware and operating system software suitable for implementing the teachings of the present invention can be found in the following documents.

Iacobucci, "OS/2 Programmer's Guide", McGraw-Hill, 1988; "Technical Reference Manual, Personal System/2 (Model 50, 60 Systems)", IBM Corporation, Part No. 68X2224, Document No. 588X2224; and "IBM Operating System/2 Version 1.0 Standard Edition Technical Reference", IBM Corporation, Part No. 6280201, Document No. 5871-AA. These documents are incorporated herein by reference.

In accordance with the present invention, an application program 122, such as the database program shown, is loaded into memory 126 in addition to operating system 124. Database program 122 includes a number of relational database routines, such as those well known in the art, as well as computer program routines 136, 138, 140 and 142 for implementing the processes described above in connection with FIGS. 5, 4, 7 and 6. Database program 122 generally provides instructions to microprocessor 132 that cause system 120 to perform relational database functions. Accordingly, a user can interact with the system through the various input/output devices 130 in response to the output that execution of program 122 produces on the display input/output device 130.

With respect to computerized functions 136-142, the values of MINBUFLSN and LOWTRANLSN are determined periodically by routines 136 and 138 during forward processing. These values are periodically written out to memory 126 in response to checkpoint write subroutine 142.
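The forward-processing step just described can be sketched roughly as follows. This is a minimal illustration, not the patented routines themselves: the table layouts, the function names, the `Checkpoint` type, and the use of `None` when a table is empty are all assumptions made for the example. It takes MINBUFLSN to be the lowest first-update LSN among dirty buffer pages and LOWTRANLSN to be the lowest first-update LSN among uncommitted transactions, as the abstract describes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Checkpoint:
    """Hypothetical two-component checkpoint (MINBUFLSN, LOWTRANLSN)."""
    minbuflsn: Optional[int]
    lowtranlsn: Optional[int]


def min_buf_lsn(dirty_pages: Dict[int, int]) -> Optional[int]:
    """dirty_pages maps page id -> LSN of the first update that dirtied it.

    Returns the lowest such LSN, or None if no page is dirty (assumption).
    """
    return min(dirty_pages.values(), default=None)


def low_tran_lsn(active_transactions: Dict[int, int]) -> Optional[int]:
    """active_transactions maps transaction id -> LSN of its first update.

    Only uncommitted transactions are expected in the table.
    """
    return min(active_transactions.values(), default=None)


def write_checkpoint(log_header: dict,
                     dirty_pages: Dict[int, int],
                     active_transactions: Dict[int, int]) -> Checkpoint:
    """Compute both components and store them in a (hypothetical) log header."""
    cp = Checkpoint(min_buf_lsn(dirty_pages), low_tran_lsn(active_transactions))
    log_header["checkpoint"] = cp
    return cp


if __name__ == "__main__":
    header: dict = {}
    cp = write_checkpoint(header, {7: 120, 3: 95}, {1: 110, 2: 140})
    print(cp)
```

Because both values are simple minima over tables the system already maintains, writing a checkpoint in this sketch adds little work to forward processing.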

In addition to these LSN values being logged in memory 126, it should be clear that changes and updates to the database in memory 126 of system 120 (entered via input/output devices 130) are also logged in a recovery log stored in memory 126. During recovery, system 120 uses the checkpoint routine 140 portion of database program 122 to retrieve the values of MINBUFLSN and LOWTRANLSN in the manner described above in connection with FIG. 7, and causes microprocessor 132 to control recovery of the database at the appropriate locations in memory 126 as a function of the MINBUFLSN and LOWTRANLSN values.
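One way to picture how recovery might depend on the two checkpoint values is the sketch below. It is an illustrative assumption, not the procedure of FIG. 7: the log record layout, the function names, and the rule that redo applies from MINBUFLSN onward while transaction tracking applies from LOWTRANLSN onward are a plausible reading of the comparison between the two components, chosen for the example.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical log record: (lsn, transaction id, page id)
LogRecord = Tuple[int, int, int]


def recovery_start_lsn(minbuflsn: Optional[int],
                       lowtranlsn: Optional[int]) -> Optional[int]:
    """The log scan can begin at the lower of the two checkpoint values."""
    candidates = [v for v in (minbuflsn, lowtranlsn) if v is not None]
    return min(candidates, default=None)


def plan_recovery(log: Iterable[LogRecord],
                  minbuflsn: Optional[int],
                  lowtranlsn: Optional[int]) -> Tuple[List[int], List[int]]:
    """Return (LSNs to redo, LSNs to track in the transaction table)."""
    redo: List[int] = []
    track: List[int] = []
    start = recovery_start_lsn(minbuflsn, lowtranlsn)
    if start is None:
        return redo, track  # nothing dirty, nothing uncommitted
    for lsn, _txn, _page in log:
        if lsn < start:
            continue  # older than both components: already durable and committed
        if minbuflsn is not None and lsn >= minbuflsn:
            redo.append(lsn)   # the page may not have reached disk
        if lowtranlsn is not None and lsn >= lowtranlsn:
            track.append(lsn)  # may belong to an uncommitted transaction
    return redo, track


if __name__ == "__main__":
    log = [(90, 1, 3), (95, 1, 3), (100, 2, 7), (110, 2, 7), (120, 3, 7)]
    print(plan_recovery(log, minbuflsn=95, lowtranlsn=110))
```

In this reading, no separate analysis pass over the log is needed to find the starting point, because the checkpoint already bounds both the oldest unwritten page update and the oldest uncommitted transaction.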

[Brief Description of the Drawings]

FIG. 1 is a block diagram illustrating the major functional components of the system and method of the present invention.
FIG. 2 is a conceptual diagram of a database RAM buffer, showing the relationship between MINBUFLSN and the LSNs associated with the data pages in the buffer.
FIG. 3 is a conceptual diagram of a transaction table associated with a database, showing the relationship between LOWTRANLSN and the LSNs associated with the transactions represented in the transaction table.
FIG. 4 is a flow diagram illustrating a method for deriving and maintaining LOWTRANLSN during normal database operation, as shown in block 12 of FIG. 1.
FIG. 5 is a flow diagram illustrating a method for deriving and maintaining MINBUFLSN during normal database operation, as shown in block 10 of FIG. 1.
FIG. 6 is a flow diagram illustrating a method for storing, as shown in block 14 of FIG. 1, the checkpoint (MINBUFLSN, LOWTRANLSN) created by the methods shown in FIGS. 4 and 5, which correspond to blocks 12 and 10 of FIG. 1, respectively.
FIG. 7 is a flow diagram illustrating in further detail the initiation of database recovery using the checkpoint created in blocks 10 and 12 of FIG. 1 and stored in block 14 of FIG. 1.
FIG. 8 is a schematic diagram of a computerized database system equipped with the recovery function of the present invention.

120: database system; 122: database program; 124: operating system; 126: memory; 130: input/output devices; 132: microprocessor.

Claims

[Scope of Claims] A method for restoring a database, comprising: storing data corresponding to a plurality of updated data pages; deriving from the data a first indicator functionally related to the update time of a corresponding one of the updated pages; storing a log record corresponding to each of the operational database transactions; deriving from the stored log records a second indicator functionally related to the time of one of the operational transactions; deriving a checkpoint as a function of the first and second indicators; storing the checkpoint; and restoring the database in response to the checkpoint.