JP2004062759A

JP2004062759A - Database log management method, its device and its program

Info

Publication number: JP2004062759A
Application number: JP2002223295A
Authority: JP
Inventors: Hiroshi Yamakawa; 山川　　洋
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2002-07-31
Filing date: 2002-07-31
Publication date: 2004-02-26

Abstract

PROBLEM TO BE SOLVED: To reduce the amount of log file data collected for recovering a database.
SOLUTION: When the database is updated, a pre-update log creation part 112 of a log acquisition part 109 acquires pre-update data from a database management control part 107 and outputs this pre-update data to the log file. A post-update log creation part 114 acquires the pre-update data and the post-update data. A difference information creation part 113 creates difference data of the post-update data relative to the pre-update data. The post-update log creation part 114 outputs the difference data to the log file when the difference data is smaller than the post-update data, and outputs the post-update data to the log file otherwise.
COPYRIGHT: (C)2004,JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、データベースログの管理技術に係わり、特にログファイルのサイズを削減するデータベースログの管理技術に関するものである。
【０００２】
【従来の技術】
一般的にＲＤＢＭＳ（リレーショナル・データベース管理システム）では、ＯＳ、一時記憶装置や外部記憶装置などの障害に備えて、データベースの更新情報を記録している。これらのデータベースの更新記録は、ログもしくはジャーナルと呼ばれ、ユーザデータとは別に外部記憶装置に書き出している。ログもしくはジャーナルは、データベースの回復に備えるものであるため、更新情報をログとして記憶する場合、メモリなど一時記憶装置上に一時的なキャッシングを行わず、更新処理の完了と同時に外部記憶装置に書き出すのが一般的である。
【０００３】
ログの中で更新処理を再実行するためのログをリドゥーログもしくは更新後ログと呼び、更新処理を無効にするためのログをアンドゥーログもしくは更新前ログと呼ぶ。前者は当該データの更新対象データの更新前のデータの値を取得し、後者は更新処理後のデータの値を取得する。ログレコードを外部記憶装置に書き出したファイルをログファイルと呼ぶ。データベースに対して、更新が発生するごとに更新前ログ、更新後ログの書き出しが発生するので、その回数に応じてログファイル容量は増大してゆく。
【０００４】
通常、データベースに対するアクセスは、トランザクションという一連の処理を一まとまりとして論理的にとらえ、データベースの内容の整合性が取れるようにしている。データベースに対する更新処理は、あるトランザクションに含まれ、トランザクションは、コミット要求で同一トランザクション内の一連の処理がデータベースに反映され、ロールバック要求で同一トランザクション内の一連の処理はデータベースに反映されず無効になる。更新が発生した場合の更新前ログ、更新後ログは、トランザクションの完了、無効に関わらずログファイルに出力される仕組みであることが多い。このため更新処理のロールバック、ロールフォワードに必要な情報であるかどうかは、実際にデータベースに障害が発生し、データベースの回復の必要があるとログファイルから必要な情報を選択し利用するので、必要でない情報もログファイルに格納されることがあり、ログファイル容量が増加していた。
【０００５】
この問題に対する解決策として、特開平６−８３６８２号公報記載の技術が挙げられる。ここでは該当処理がコミットされている情報を含んでいない場合のみ更新前情報を更新前ログとして、また更新後情報を更新後ログとしてログファイルに書き込みだけを行い、データベースには、コミットされていないデータを含まない場合のみ書き出すことで、ログファイル容量の削減を実現している。しかしこの方法でも、コミットされていないデータ更新が大量に発生する長大トランザクション（トランザクションを規定する時間が長いケース）の場合は、更新前ログ、更新後ログのログファイルへの書き出しが大量に発生し、ログファイル容量が増大してしまう。
【０００６】
これまでＲＤＢＭＳは、ＯＬＴＰ（オンライン・トランザクション処理）などの定型業務に利用されることが多く、格納するデータは数値データや文字データなど比較的データ量が少ない場合に限定されていた。近年、ＯＬＡＰ（オンライン解析処理）などの情報系システムにＲＤＢＭＳが利用されるケースが増え、データベースに文書データ、画像や音声といったマルチメディアデータなど膨大なデータ量のデータが含まれるケースが増えてきている。これらのマルチメディアデータに対する更新前ログ、更新後ログなどのログファイルは、データ量が膨大になる。この場合、上記公知技術を利用しても、データベースに対してデータ更新が頻繁に発生する場合には、ログファイル容量は増大する。
【０００７】
ＲＤＢＭＳに文書データ、画像や音声といったマルチメディアデータをＢＬＯＢ（Ｂｉｎａｒｙ　Ｌａｒｇｅ　ＯＢｊｅｃｔ）といったデータ型として格納する場合、当該データのごく一部の変更にもかかわらず、変更後データの物理イメージを更新後データとしてすべてログファイルに出力していることが多い。このため当該データに対して更新が多発すると、更新前データと更新後データの物理イメージをそれぞれ更新前ログおよび更新後ログとして取得する必要があり、ログ格納領域は元データに対して約２倍の容量が必要となる。また更新前ログ、および更新後ログそれぞれのデータ量も多くなるため、それらのログファイルへの書き出しの入出力による性能劣化も懸念される。
【０００８】
【発明が解決しようとする課題】
前述のように、データベースに対して更新が多発すると、更新前データおよび更新後データを更新前ログおよび更新後ログとしてそれぞれログファイルに格納する必要があるため、データベースに格納するデータのデータ長が長い場合、外部記憶装置のログファイル格納領域が大量に必要となる。
【０００９】
本発明の目的は、データベース回復のために採取するログデータを格納するログ格納領域のデータ量を削減することにある。
【００１０】
【課題を解決するための手段】
本発明は、データベースの回復のためにデータベースログを採取するデータベースログの管理技術であって、データベースの更新に際して更新前データを取得し、ログファイルにこの更新前データを出力し、更新前データに対応する更新後データを取得し、更新前データに対する更新後データの変更された部分である差分データを作成し、この差分データのサイズと更新後データのサイズとを比較して差分データのサイズが更新後データのサイズより小さい場合にはログファイルにこの差分データを出力し、差分データのサイズが更新後データのサイズ以上の場合にはログファイルにこの更新後データを出力するデータベースログの管理技術を特徴とする。
【００１１】
【発明の実施の形態】
図１は、本実施形態のデータベースへのアクセスを伴うトランザクション処理システムの構成図である。システムは、データベースを管理する装置であるデータベースサーバ１００、クライアント１０１及び両者を接続するネットワーク１０２から構成される。クライアント１０１は、計算機であるとともにデータベースサーバ１００にアクセスする端末であり、実行されるＵＡＰ（Ｕｓｅｒ　Ａｐｐｌｉｃａｔｉｏｎ　Ｐｒｏｇｒａｍ）及びネットワーク１０２を介してデータベースサーバ１００が管理するデータベースにアクセスする。データベースサーバ１００は、サーバ計算機であり、ＣＰＵ１０３、主記憶装置１０４、データベース格納領域１０６およびログファイル格納領域１０７を有する。データベース格納領域１０６はデータベースを格納する記憶領域、ログファイル格納領域１０７はログファイルを格納する記憶領域である。
【００１２】
主記憶装置１０４は、データベースに対するアクセスを処理するデータベース管理制御部１０７，データベースに対する入出力処理を行うデータベース入出力管理部１０８、ログレコードの取得処理を行うログ取得部１０９、ログレコードからデータベースの回復を行うデータベース回復部１１０およびログファイルへの入出力処理を行うログ入出力管理部１１１から構成される。これらすべての処理部は、ＣＰＵ１０３によって実行されるプログラムである。
【００１３】
ログ取得部１０９は、データベース管理制御部１０７からデータベースに対する更新処理情報を取得し、ログレコードを作成し、ログ入出力管理部１１１を介してログファイルに書き込む。ログ取得部１０９は、データベースの更新前のデータをログファイルに格納する更新前ログ作成部１１２、データベースの更新前データと更新後データの差分情報を作成する差分情報作成部１１３およびデータベースの更新後データをログファイルに格納する更新後ログ作成部１１４から構成される。
【００１４】
データベース回復部１１０は、データベースに障害が発生し、データベースの回復要求があった場合、取得したログファイルを利用してデータベースの回復を行う。データベース回復部１１０は、データベースの更新前のデータをログレコードに格納されている更新前データから回復する更新前データ回復部１１５とデータベースの更新後のデータをログレコードに格納されている更新後データから回復する更新後データ回復部１１６を有している。
【００１５】
データベースサーバ１００は、クライアント１０１からの要求をデータベース管理制御部１０７で受け付け、データベースに対するアクションの要求を実行する。データベース管理制御部１０７は、クライアント１０１により要求されたデータが主記憶上のバッファ領域にあれば、そこから取得し、主記憶上に要求されたデータが存在しなければ、データベース入出力管理部１０８を通して、データベース格納領域１０５からレコードデータを取得する。データベースに対する処理結果のデータベース格納領域１０５への書き出しはアクセス要求に同期してもしくは非同期に行われる。
【００１６】
データベース管理制御部１０７は、データベース更新が発生するごとにログ取得部１０９を起動し、データベース更新に同期してログレコードの記録を行わしめる。またデータベースに障害が発生し、データベースの回復の必要が生じた場合、データベースアクセス管理制御部１０７は、データベース回復部１１０にデータベース回復要求を送る。データベース回復部１１０は、ログファイル格納領域１０６からデータベースの回復に必要なログレコードを取得し、回復されたレコードデータを作成してデータベース管理制御部１０７に引き渡す。データベース管理制御部１０７は、データベース入出力管理部１０８を介してこのレコードデータをデータベース格納領域１０５の該当する場所に書き込む。
【００１７】
図２は、ログレコードのデータ構成を示す図である。ログレコードは、管理情報として、ログ種別、トランザクションＩＤ、ログレコード番号および差分情報を保持している。ログ種別は、当該ログレコードがデータベースに対するどのようなアクションの記録であるかを示す属性であり、削除、追加、更新などが指定可能である。本実施形態によるログファイルの更新後データの差分情報管理方式は、データベースに対する更新のアクションに対して適用されるものである。トランザクションＩＤは、当該ログレコードがどのトランザクションに属するものであるかを一意に決定する属性である。ログレコード番号は、データベース管理システム内でログレコードに対して連続的に割り振られた番号である。
【００１８】
トランザクションＩＤとログレコード番号は、データベースに対するロールフォワード処理において重要な意味を持つ識別子である。データベース管理システムでは、通常、トランザクションの完了もしくは無効化とは別にシンクポイントというデータベースへの更新情報の書き出しを保証する同期点を設けている。このシンクポイント設定時に、メモリ上にバッファリングされているデータを外部記憶装置に書き出し、ログファイルとは別にシンクポイントダンプファイルを作成し、データベースについて行ったレコード更新に対応するトランザクションＩＤ、ログレコード番号などの情報を書き出す。ログレコード番号は、書き出すごとに番号が付与される。したがってシンクポイントダンプファイルに書き出されているログレコード番号より値が小さいログレコード番号をもつログレコードは、データベースへの書き出しが保証されているので、同一トランザクション内でのデータベースに対するロールフォワード処理を行う必要がない。ロールフォワード処理は、シンクポイントファイルに書き出されているログレコード番号より大きい番号のログレコードに対して行えばよい。したがってログファイル上のトランザクションＩＤとログレコード番号の組み合わせとシンクポイントダンプファイル上の該当する値を比較することにより、ロールフォワード処理の作業量を削減することができる。
【００１９】
更新前データは、データベースを更新する前のレコードデータの全体である。更新後データは、データベースを変更した後の当該レコードデータの全体か又は更新前データに対する変更部分のデータである。その変更部分データには、更新前データの先頭からのオフセットと必要に応じてデータのサイズが付加される。更新前データと変更部分データとから更新後の当該レコードデータ全体を復元することが可能である。更新後データフラグは、更新後データがレコードデータ全体か変更部分データかを区別するフラグである。
【００２０】
図３は、長大データとしてＢＬＯＤ型のデータ、つまりＢＬＯＢデータを一例として取り上げ、ＢＬＯＤデータを繰り返し更新する例を示している。初期ＢＬＯＢデータ（データ挿入時のＢＬＯＤデータ）は、Ａ〜Ｈまでの８つのブロックから構成されているデータベース中のレコードデータである。ここでＡ〜Ｈは各ブロックの内容情報を区別するために付与された記号である。Ａ〜Ｈまでの各ブロックのデータは任意のデータ長を持つ。あるトランザクションにより、初期ＢＬＯＢデータに対してブロックＤが内容Ｘに、ブロックＧが内容Ｙに更新されて更新１後（更新１の処理後）のＢＬＯＢデータとなる。すなわちここではブロックＸとブロックＹが更新前データ（ここでは初期ＢＬＯＤデータ）に対する差分のブロックということになる。更新１後のＢＬＯＢデータは、さらにブロックＨが内容Ｚに更新されて更新２後のＢＬＯＢデータとなる。また更新２後のＢＬＯＢデータは、ブロックＸが内容Ｄに、ブロックＹが内容Ｇに、ブロックＺが内容Ｈに更新されて更新３後のＢＬＯＢデータとなるケースを示している。
【００２１】
図４は、図３に示すような更新後ＢＬＯＢデータの各々に対して作成されるログレコードの一例を示している。更新１のログレコードは、ログ種別が更新、トランザクションＩＤが１０、ログレコード番号が１００、更新後データフラグが変更部分を示し、更新前データは、内容Ａ、Ｂ、Ｃ、Ｄ、Ｅ、Ｆ、Ｇ、Ｈが格納され、更新後データには、オフセット１と更新後レコードデータ中のブロックＸ、オフセット２と更新後レコードデータ中のブロックＹが格納されている。
【００２２】
更新２のログレコードは、ログ種別が更新、トランザクションＩＤが１０、ログレコード番号が２００、更新後データフラグが変更部分を示し、更新前データは、内容Ａ、Ｂ、Ｃ、Ｘ、Ｅ、Ｆ、Ｙ、Ｈが格納され、更新後データには、オフセット３と更新後レコードデータ中のブロックＺが格納されている。更新２ログレコードのトランザクションＩＤは、更新１ログレコードと同一であるため、同一トランザクションでの更新であり、またログレコード番号が更新１ログレコードのログレコード番号より大きいため、後の更新であることがわかる。
【００２３】
更新３のログレコードは、ログ種別が更新、トランザクションＩＤが２０、ログレコード番号が１００、更新後データフラグが変更部分を示し、更新前データは、内容Ａ、Ｂ、Ｃ、Ｘ、Ｅ、Ｆ、Ｙ、Ｚが格納され、更新後データには、オフセット１とブロックＤ，オフセット２とブロックＧ，オフセット３とブロックＨが格納されている。更新３ログレコードは、トランザクションＩＤが更新１ログレコードおよび更新２ログレコードのものと異なるため、別のトランザクションであることがわかる。
【００２４】
なお上記実施例は、データベース中のレコード（行方向のデータ）を更新する場合の例を示したが、特定のフィールド（列方向のデータ）を更新する場合についても上記のようにデータをブロックに分割することができ、上記方式をそのまま適用できる。
【００２５】
このようにして、更新前データと更新後データをログレコードに格納することができる。以上のように更新後データに差分を用いることにより、ログファイルの容量を削減することができる。次に具体的なログ格納手順について述べる。
【００２６】
図５は、ログ取得部１０９の処理手順を示すフローチャートである。まずログ取得部１０９は、データベース管理制御部１０７よりデータベース更新情報を受け取る（ステップ５００）。更新情報は、更新前レコードデータの全体と更新後レコードデータの全体を含んでいる。更新前データ書き出し要求であれば（ステップ５０１ＹＥＳ）、更新前ログ作成部１１２を呼び出して管理情報と更新前データを記録する（ステップ５０２）。そうでなければ（ステップ５０１ＮＯ）、次の処理に移る。更新後データ書き出し要求であれば（ステップ５０３ＹＥＳ）、更新後ログ作成部１１４を呼び出して更新後データを記録する（ステップ５０４）。そうでなければ（ステップ５０３ＮＯ）、次の処理に移る。これらの一連の処理のあとに、次のログレコード作成のためにカウンタ中のログレコード番号を増加させる（ステップ５０５）。
【００２７】
図６は、更新前ログ作成部１１２の処理手順を示すフローチャートである更新前ログ作成部１１２は、受け取った更新情報から更新前データを取得する（ステップ６０１）。次に当該ログレコードの管理情報と取得した更新前データをログファイルに書き出し、処理を終了する（ステップ６０２）。管理情報のログ種別には「更新」を設定し、トランザクションＩＤには指定されたトランザクションＩＤを設定し、ログレコード番号にはカウンタに保存されたログレコード番号を設定する。更新後データフラグは未定のフィールドとする。
【００２８】
図７は、更新後ログ作成部１１４の処理手順を示すフローチャートである。更新後ログ作成部１１４は、まず更新前データを取得する（ステップ７０１）。さらに更新後データを取得する（ステップ７０２）。次に差分情報作成部１１３は、取得した更新前データと更新後データから図２で示した差分情報管理方式に従って、更新前データに対する更新後データの差分情報、すなわち変更部分データを作成する（ステップ７０３）。差分情報を作成するに当たり公知の技術が適用できるので、詳細説明を省略する。更新前データと差分情報から矛盾なく更新後データを復元できることが保証されていればよい。
【００２９】
次に更新後ログ作成部１１４は、取得した更新後データのサイズと差分情報データのサイズを取得する（ステップ７０４，７０５）。差分情報データのサイズには、付加されるオフセットなど付加情報のサイズも含まれる。さらに取得した更新後データのサイズと差分情報データのサイズを比較する（ステップ７０６）。更新後データに比べて差分情報データのサイズが更新後データのサイズに所定の係数値Ｎ（０＜Ｎ≦１）をかけたものより小さければ（すなわち所定の割合よりも小さければ）、差分情報データを更新後データとして、ログファイルに格納する（ステップ７０７）。Ｎの値は、復元データ作成に要する時間、更新後データから差分情報を差し引いたデータ量の書き出しに要する時間、データベースに格納されるデータ形式などの物理データ特性などを考慮して設定する。特にＮ＝１の場合には上記の時間や物理データ特性を考慮しないことになるが、ステップ７０６の大まかな判定基準として適用可能であり、この場合も本実施例の１つである。そうでなければ（ステップ７０６ＮＯ）、取得した更新後データをそのまま更新後データとしてログファイルに格納し、処理を終了する（ステップ７０８）。ステップ７０７，７０８ともに更新後データはその更新前データと同一ログレコードに格納される。またステップ７０７では「更新後データフラグ」に変更部分のみを示すフラグを設定し、ステップ７０８では「更新後データフラグ」にレコードデータ全体を示すフラグを設定する。またステップ７０７で書き込む差分情報データには差分のブロックごとに更新前データを基にしたフセットと必要に応じてデータのサイズが含められる。
【００３０】
図８は、更新処理を行うトランザクションの処理シーケンスの途中で障害が発生した場合の例を表している。トランザクション１は、同一トランザクション内でデータベースの更新１と更新２の処理を行う。ここで更新２とは、図３に示すような同一レコードデータに対する更新処理とする。またトランザクション２は、そのトランザクション内で更新３を行うものとする。このようなトランザクションの実行中に、更新１と更新２の間でシンクポイントが発生したものとする。このような状況でトランザクション２を実行中に障害が発生した場合を想定する。
【００３１】
更新１はシンクポイント前であるため、その更新処理はデータベースに反映されている。しかし更新２はシンクポイント後の処理であるため、データベースに反映されている保証はない。トランザクション１は、障害発生前にコミットしているためロールバックする必要はないが、ロールフォワードする必要がある。該当データは更新１の処理が行われた状態となっているため、図４に示す更新２のログ情報を使用して更新後データを復元し、データベース管理制御部１０７に引き渡し、データベースに反映する。
【００３２】
トランザクション２は、コミットする前に障害が発生しているため処理をすべて無効化する必要がある。通常、ロールバック処理を行う場合、該当トランザクションの更新後ログを基にしてロールフォワード処理をしてデータベースの該当レコードを更新３後のレコードにした後、ロールバック処理を行う必要がある。この場合、該当データは更新３のログを用いてロールフォワード処理が行われ、その後、ロールバック処理が行われる。
【００３３】
以下に、取得したログデータからデータベースの回復を行う手順について説明する。図９は、データベース回復部１１０の処理手順を示すフローチャートである。データベース回復部１１０は、データベース管理制御部１０７よりデータベース回復要求を受け取る（ステップ９００）。更新前データ回復要求であれば（ステップ９０１ＹＥＳ）、更新前データ回復部１１５を呼び出して更新前データ回復処理を行う（ステップ９０２）。そうでなければ（ステップ９０１ＮＯ）、ステップ９０３の処理に移る。更新後データ回復要求であれば（ステップ９０３ＹＥＳ）、更新後データ回復部１１６を呼び出し更新後データ回復処理を行う（ステップ９０４）。そうでなければ（ステップ９０３ＮＯ）、処理を終了する。
【００３４】
図１０は、更新前データ回復部１１５の処理手順を示すフローチャートである。更新前データ回復部１１５は、ログファイルの指定されたトランザクションＩＤとログレコード番号をもつログレコードから更新前データを取得する（ステップ１００１）。次に取得した更新前データをデータベースの該当するレコードデータに上書きし、処理を終了する（ステップ１００２）。
【００３５】
図１１は、更新後データ回復部１１６の処理手順を示すフローチャートである。更新後データ回復部１１６は、ログファイルの指定されたトランザクションＩＤとログレコード番号をもつログレコードから更新後データを取得する（ステップ１１０１）。次に該当する管理情報中の更新後データフラグを参照して取得した更新後データは差分情報であるか否か判定する（ステップ１１０２）。更新後データが差分情報である場合、ログファイルの当該ログレコードの更新前データを取得する（ステップ１１０３）。次に取得した更新前データと差分情報データから更新後レコードデータの全体を復元する（ステップ１１０４）。すなわち差分情報データから各差分ブロックを切り出し、そのオフセットに基づいて更新前データの対応するブロックを差分ブロックで書き換えることによって更新後レコードデータの全体が復元される。次に復元した更新後データをデータベース管理制御部１０７に渡し、データベースの該当するレコードデータに上書きし、そのレコードデータの回復を行う（ステップ１１０５）。更新後データが差分情報でなくレコードデータの全体である場合は、ログファイルから取得した更新後データを直接、データベース管理制御部１０７に渡し、データベースの該当するレコードデータに上書きし、そのレコードデータの回復を行う。以上により、本実施例の差分を用いた更新後データの場合にもデータの回復処理を実現することができる。
【発明の効果】
以上述べたように本発明によれば、データベース回復のために採取されるログファイルのデータ量を削減することができる。
【図面の簡単な説明】
【図１】実施形態のトランザクション処理システムの構成図である。
【図２】ログレコードのデータ構成例を示す図である。
【図３】繰り返し更新されるＢＬＯＢデータの例を示す図である。
【図４】更新後ＢＬＯＢデータの各々に対して作成されるログレコードの例を示す図である。
【図５】実施形態のログレコード作成手順を示すフローチャートである。
【図６】実施形態の更新前ログの作成手順を示すフローチャートである。
【図７】実施形態の更新後ログの作成手順を示すフローチャートである。
【図８】更新処理を行うトランザクションの処理シーケンス中で障害が発生した場合の例を説明する図である。
【図９】実施形態のデータベース回復部１１０の処理手順を示すフローチャートである。
【図１０】実施形態の更新前データ回復部１１５の処理手順を示すフローチャートである。
【図１１】実施形態の更新後データ回復部１１６の処理手順を示すフローチャートである。
【符号の説明】
１００：データベースサーバ、１０５：データベース格納領域、１０６：ログファイル格納領域、１０７：データベース管理制御部、１０９：ログ取得部、１１０：データベース回復部、１１２：更新前ログ作成部、１１３：差分情報作成部、１１４：更新後ログ作成部
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a database log management technique, and more particularly to a database log management technique for reducing the size of a log file.
[0002]
[Prior art]
Generally, an RDBMS (relational database management system) records database update information in preparation for failures of the OS, temporary storage devices, external storage devices, and the like. These update records are called logs or journals and are written to an external storage device separately from user data. Because the log or journal exists to support database recovery, update information stored as a log is generally not cached temporarily in a temporary storage device such as memory; instead, it is written to the external storage device as soon as the update process completes.
[0003]
Within the log, a record used to re-execute an update is called a redo log or post-update log, and a record used to invalidate an update is called an undo log or pre-update log. The former holds the value of the updated data after the update, and the latter holds its value before the update. A file in which log records have been written to an external storage device is called a log file. Each time the database is updated, a pre-update log and a post-update log are written, so the log file grows with the number of updates.
[0004]
Normally, access to a database is treated logically as a transaction, a series of operations handled as a single unit, so that the contents of the database remain consistent. Every update to the database belongs to some transaction. On a commit request, the series of operations in the transaction is reflected in the database; on a rollback request, the series of operations is not reflected and is invalidated. In many systems, the pre-update log and post-update log for an update are written to the log file regardless of whether the transaction completes or is invalidated. Whether a given log entry is actually needed for rollback or roll forward is decided only when the database fails and must be recovered, at which point the necessary information is selected from the log file. As a result, unneeded information is also stored in the log file, and the log file capacity increases.
[0005]
One solution to this problem is the technique described in Japanese Patent Application Laid-Open No. 6-83682. In that technique, only when the process in question contains no committed information are the pre-update information and the post-update information written to the log file, as the pre-update log and the post-update log respectively, without writing to the database; data is written to the database only when it contains no uncommitted data. This reduces the log file capacity. However, even with this method, a long transaction in which a large number of uncommitted data updates occur (a case where the period that defines the transaction is long) causes a large number of pre-update and post-update log writes, and the log file capacity increases.
[0006]
Until now, RDBMSs have mostly been used for routine tasks such as OLTP (online transaction processing), and the stored data has been limited to data of relatively small size, such as numerical data and character data. In recent years, RDBMSs have increasingly been used in information systems such as OLAP (online analytical processing), and databases increasingly contain very large data such as document data and multimedia data such as images and sound. Log files holding the pre-update and post-update logs for such multimedia data become enormous. In this case, even if the above-described known technique is used, the log file capacity increases when data in the database is updated frequently.
[0007]
When document data or multimedia data such as images and sound is stored in an RDBMS as a data type such as BLOB (Binary Large OBject), the entire physical image of the changed data is often output to the log file as post-update data even though only a small part of the data has changed. Consequently, when such data is updated frequently, the physical images of both the pre-update data and the post-update data must be acquired as the pre-update log and the post-update log, and the log storage area requires about twice the capacity of the original data. In addition, because the amount of data in each pre-update log and post-update log is large, there is a concern that the input/output needed to write them to the log file will degrade performance.
[0008]
[Problems to be solved by the invention]
As described above, when the database is updated frequently, the pre-update data and the post-update data must be stored in the log file as a pre-update log and a post-update log, respectively. When the data stored in the database is long, a large log file storage area is therefore required on the external storage device.
[0009]
An object of the present invention is to reduce the data amount of a log storage area for storing log data collected for database recovery.
[0010]
[Means for Solving the Problems]
The present invention is a database log management technique for collecting a database log used to recover a database. When the database is updated, the technique acquires the pre-update data and outputs it to a log file; acquires the post-update data corresponding to the pre-update data; creates difference data, which is the part of the post-update data that changed relative to the pre-update data; and compares the size of the difference data with the size of the post-update data. If the difference data is smaller than the post-update data, the difference data is output to the log file; if the difference data is equal to or larger than the post-update data, the post-update data is output to the log file.
[0011]
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a configuration diagram of a transaction processing system involving access to a database according to the present embodiment. The system includes a database server 100, which is a device for managing a database, a client 101, and a network 102 connecting the two. The client 101 is a computer that serves as a terminal for accessing the database server 100; it accesses the database managed by the database server 100 through a UAP (User Application Program) that it executes and through the network 102. The database server 100 is a server computer that has a CPU 103, a main storage device 104, a database storage area 105, and a log file storage area 106. The database storage area 105 is a storage area for storing the database, and the log file storage area 106 is a storage area for storing log files.
[0012]
The main storage device 104 holds a database management control unit 107 that processes access to the database, a database input/output management unit 108 that performs input/output processing on the database, a log acquisition unit 109 that performs log record acquisition processing, a database recovery unit 110 that recovers the database from log records, and a log input/output management unit 111 that performs input/output processing on the log file. All of these processing units are programs executed by the CPU 103.
[0013]
The log acquisition unit 109 acquires update processing information for the database from the database management control unit 107, creates a log record, and writes the log record to the log file via the log input/output management unit 111. The log acquisition unit 109 consists of a pre-update log creation unit 112 that stores the pre-update data of the database in the log file, a difference information creation unit 113 that creates difference information between the pre-update data and the post-update data of the database, and a post-update log creation unit 114 that stores the post-update data of the database in the log file.
[0014]
When a failure occurs in the database and a database recovery request is issued, the database recovery unit 110 recovers the database using the acquired log file. The database recovery unit 110 has a pre-update data recovery unit 115, which recovers the pre-update state of the database from the pre-update data stored in a log record, and a post-update data recovery unit 116, which recovers the post-update state of the database from the post-update data stored in a log record.
[0015]
The database server 100 receives a request from the client 101 at the database management control unit 107 and executes the requested action on the database. If the data requested by the client 101 is in the buffer area in main memory, the database management control unit 107 acquires the data from there. If the requested data does not exist in main memory, the database management control unit 107 acquires the record data from the database storage area 105 through the database input/output management unit 108. Processing results are written to the database storage area 105 either synchronously or asynchronously with the access request.
[0016]
The database management control unit 107 activates the log acquisition unit 109 every time a database update occurs and has it record a log record in synchronization with the database update. When a failure occurs in the database and the database needs to be recovered, the database management control unit 107 sends a database recovery request to the database recovery unit 110. The database recovery unit 110 acquires the log records necessary for database recovery from the log file storage area 106, creates the recovered record data, and passes it to the database management control unit 107. The database management control unit 107 writes the record data to the corresponding location in the database storage area 105 via the database input/output management unit 108.
[0017]
FIG. 2 is a diagram illustrating the data configuration of a log record. As management information, the log record holds a log type, a transaction ID, a log record number, and difference information. The log type is an attribute indicating what kind of action on the database the log record describes; deletion, addition, update, and the like can be specified. The difference information management method for post-update data in the log file according to the present embodiment is applied to update actions on the database. The transaction ID is an attribute that uniquely identifies the transaction to which the log record belongs. The log record number is a number assigned sequentially to log records within the database management system.
[0018]
The transaction ID and the log record number are identifiers that play an important role in roll forward processing on the database. Separately from the completion or invalidation of transactions, a database management system normally provides sync points, which are synchronization points that guarantee that update information has been written to the database. When a sync point is set, the data buffered in memory is written to the external storage device, and a sync point dump file is created separately from the log file; information such as the transaction ID and log record number corresponding to the record updates performed on the database is written to it. A log record number is assigned each time a log record is written. Therefore, a log record whose log record number is smaller than the log record number written in the sync point dump file is guaranteed to have been written to the database, and no roll forward processing is needed for it within the same transaction. Roll forward processing need only be performed on log records whose numbers are greater than the log record number written in the sync point dump file. Thus, by comparing the combination of transaction ID and log record number in the log file with the corresponding values in the sync point dump file, the amount of roll forward work can be reduced.
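The selection of roll forward candidates described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `LogEntry` and `records_to_roll_forward` are hypothetical, and the sync point dump file is reduced to a single log record number.

```python
from typing import Iterable, List, NamedTuple


class LogEntry(NamedTuple):
    """Hypothetical, reduced view of a log record's management information."""
    transaction_id: int
    log_record_number: int


def records_to_roll_forward(log: Iterable[LogEntry], syncpoint_lrn: int) -> List[LogEntry]:
    """Keep only entries numbered after the value stored at the sync point.

    Entries at or below syncpoint_lrn are assumed to be already reflected in
    the database, so they are skipped during roll forward.
    """
    pending = [e for e in log if e.log_record_number > syncpoint_lrn]
    # Replay in the order the records were written.
    return sorted(pending, key=lambda e: e.log_record_number)


log = [LogEntry(10, 100), LogEntry(10, 200), LogEntry(20, 300)]
pending = records_to_roll_forward(log, syncpoint_lrn=100)
```

With a sync point recorded at log record number 100, only the two later entries remain to be replayed.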
[0019]
The pre-update data is the entire record data before updating the database. The post-update data is the entire record data after the database is changed or the data of the changed part with respect to the pre-update data. An offset from the head of the pre-update data and the size of the data as necessary are added to the changed part data. It is possible to restore the entire updated record data from the pre-update data and the changed partial data. The post-update data flag is a flag for distinguishing whether the post-update data is the entire record data or the changed partial data.
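As a rough illustration of the record layout described above, the sketch below models one log record as a Python dataclass. The field names, the use of byte offsets, and the `(offset, bytes)` representation of a changed part are assumptions made for this example; the source only states which pieces of information the record carries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple, Union


class LogType(Enum):
    INSERT = "insert"
    DELETE = "delete"
    UPDATE = "update"


# One changed part: (offset from the head of the pre-update data, new bytes).
# The data size is implied by the length of the bytes object in this sketch.
Patch = Tuple[int, bytes]


@dataclass
class LogRecord:
    log_type: LogType
    transaction_id: int
    log_record_number: int
    post_update_is_diff: bool                    # the "post-update data flag"
    pre_update_data: bytes                       # always the entire record
    post_update_data: Union[bytes, List[Patch]]  # entire record or changed parts


# Example values in the spirit of update 1 of FIG. 4, with one byte per block.
update1 = LogRecord(
    log_type=LogType.UPDATE,
    transaction_id=10,
    log_record_number=100,
    post_update_is_diff=True,
    pre_update_data=b"ABCDEFGH",
    post_update_data=[(3, b"X"), (6, b"Y")],
)
```

Keeping the flag next to the payload lets a reader of the log decide how to interpret `post_update_data` without inspecting its contents.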
[0020]
FIG. 3 takes BLOB-type data, that is, BLOB data, as an example of long data and shows a case in which the BLOB data is updated repeatedly. The initial BLOB data (the BLOB data at the time of data insertion) is record data in a database composed of eight blocks A to H. Here, A to H are symbols assigned to distinguish the content of each block. The data of each block from A to H has an arbitrary data length. A certain transaction updates block D of the initial BLOB data to content X and block G to content Y, producing the BLOB data after update 1 (after the processing of update 1). In other words, block X and block Y are the difference blocks relative to the pre-update data (here, the initial BLOB data). The BLOB data after update 1 is further updated so that block H becomes content Z, producing the BLOB data after update 2. Finally, the BLOB data after update 2 is updated so that block X becomes content D, block Y becomes content G, and block Z becomes content H, producing the BLOB data after update 3.
[0021]
FIG. 4 shows an example of the log records created for each of the updated BLOB data shown in FIG. 3. The log record for update 1 has log type "update", transaction ID 10, and log record number 100, and its post-update data flag indicates changed parts. Its pre-update data holds contents A, B, C, D, E, F, G, and H, and its post-update data holds offset 1 with block X of the post-update record data and offset 2 with block Y of the post-update record data.
[0022]
The log record for update 2 has log type "update", transaction ID 10, and log record number 200, and its post-update data flag indicates changed parts. Its pre-update data holds contents A, B, C, X, E, F, Y, and H, and its post-update data holds offset 3 with block Z of the post-update record data. Because the transaction ID of the update 2 log record is the same as that of the update 1 log record, the update belongs to the same transaction; and because its log record number is larger than that of the update 1 log record, it can be seen to be the later update.
[0023]
The log record for update 3 has log type "update", transaction ID 20, and log record number 100, and its post-update data flag indicates changed parts. Its pre-update data holds contents A, B, C, X, E, F, Y, and Z, and its post-update data holds offset 1 with block D, offset 2 with block G, and offset 3 with block H. Because the transaction ID of the update 3 log record differs from those of the update 1 and update 2 log records, it can be seen to belong to a different transaction.
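The three updates of FIG. 3 and FIG. 4 can be replayed with a short sketch. It assumes one byte per block, so that "offset 1", "offset 2", and "offset 3" of the figures become the byte positions 3, 6, and 7 of blocks D, G, and H; the helper name `apply_patches` is hypothetical.

```python
from typing import List, Tuple

Patch = Tuple[int, bytes]  # (offset into the pre-update data, replacement bytes)


def apply_patches(before: bytes, patches: List[Patch]) -> bytes:
    """Overwrite the patched ranges of the pre-update image."""
    buf = bytearray(before)
    for offset, data in patches:
        buf[offset:offset + len(data)] = data
    return bytes(buf)


initial = b"ABCDEFGH"                          # blocks A to H
update1 = [(3, b"X"), (6, b"Y")]               # D -> X, G -> Y
update2 = [(7, b"Z")]                          # H -> Z
update3 = [(3, b"D"), (6, b"G"), (7, b"H")]    # X -> D, Y -> G, Z -> H

after1 = apply_patches(initial, update1)
after2 = apply_patches(after1, update2)
after3 = apply_patches(after2, update3)
```

Each `afterN` equals the pre-update data recorded in the next log record, and update 3 brings the record back to its initial contents.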
[0024]
The above embodiment describes an example in which a record (data in the row direction) in a database is updated. However, when a specific field (data in the column direction) is updated, the data can likewise be divided into blocks as described above, and the above method can be applied as it is.
[0025]
In this way, the pre-update data and the post-update data can be stored in the log record. By using the difference for the updated data as described above, the capacity of the log file can be reduced. Next, a specific log storage procedure will be described.
[0026]
FIG. 5 is a flowchart illustrating the processing procedure of the log acquisition unit 109. First, the log acquisition unit 109 receives database update information from the database management control unit 107 (step 500). The update information includes the entire pre-update record data and the entire post-update record data. If the request is a pre-update data write request (step 501 YES), the pre-update log creation unit 112 is called to record the management information and the pre-update data (step 502). Otherwise (step 501 NO), the process moves on. If the request is a post-update data write request (step 503 YES), the post-update log creation unit 114 is called to record the post-update data (step 504). Otherwise (step 503 NO), the process moves on. After this series of steps, the log record number held in the counter is incremented in preparation for creating the next log record (step 505).
[0027]
FIG. 6 is a flowchart showing the processing procedure of the pre-update log creation unit 112. The pre-update log creation unit 112 acquires pre-update data from the received update information (step 601). Next, the management information of the log record and the acquired pre-update data are written to a log file, and the process ends (step 602). "Update" is set for the log type of the management information, the specified transaction ID is set for the transaction ID, and the log record number stored in the counter is set for the log record number. The post-update data flag is an undetermined field.
[0028]
FIG. 7 is a flowchart illustrating the processing procedure of the post-update log creation unit 114. The post-update log creation unit 114 first obtains the pre-update data (step 701) and then obtains the post-update data (step 702). Next, the difference information creation unit 113 creates, from the acquired pre-update data and post-update data, the difference information of the post-update data relative to the pre-update data, that is, the changed part data, in accordance with the difference information management method shown in FIG. 2 (step 703). Because a known technique can be applied to create the difference information, a detailed description is omitted. It is sufficient that the post-update data is guaranteed to be restorable without inconsistency from the pre-update data and the difference information.
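The source leaves the difference algorithm to known techniques. As one possible stand-in, the sketch below compares two equal-length images byte by byte and emits each run of changed bytes together with its offset. The function name `make_diff` and the equal-length restriction are assumptions of this example, not part of the described method.

```python
from typing import List, Tuple

Patch = Tuple[int, bytes]  # (offset into the pre-update data, changed bytes)


def make_diff(before: bytes, after: bytes) -> List[Patch]:
    """Return the runs of bytes in `after` that differ from `before`."""
    if len(before) != len(after):
        raise ValueError("this sketch only handles equal-length records")
    patches: List[Patch] = []
    start = None  # start offset of the changed run currently being scanned
    for i, (old, new) in enumerate(zip(before, after)):
        if old != new and start is None:
            start = i                                 # a changed run begins
        elif old == new and start is not None:
            patches.append((start, after[start:i]))   # the run ended at i
            start = None
    if start is not None:                             # run reaches the end
        patches.append((start, after[start:]))
    return patches


diff1 = make_diff(b"ABCDEFGH", b"ABCXEFYH")
diff2 = make_diff(b"ABCXEFYH", b"ABCXEFYZ")
```

Applying each `(offset, bytes)` pair to the pre-update image reproduces the post-update image, which is the only property the description requires of the difference information.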
[0029]
Next, the post-update log creation unit 114 obtains the size of the acquired post-update data and the size of the difference information data (steps 704 and 705). The size of the difference information data includes the size of additional information such as the attached offsets. The two sizes are then compared (step 706). If the size of the difference information data is smaller than the size of the post-update data multiplied by a predetermined coefficient N (0 < N ≤ 1), that is, smaller than a predetermined proportion of the post-update data, the difference information data is stored in the log file as the post-update data (step 707). The value of N is set in consideration of the time required to create the restored data, the time required to write the amount of data obtained by subtracting the difference information from the post-update data, physical data characteristics such as the format of the data stored in the database, and the like. In particular, when N = 1, these times and physical data characteristics are not taken into account, but N = 1 can still serve as a rough criterion for step 706, and this case is also one form of the present embodiment. Otherwise (step 706 NO), the acquired post-update data is stored in the log file as is, and the process ends (step 708). In both steps 707 and 708, the post-update data is stored in the same log record as its pre-update data. In step 707, the post-update data flag is set to indicate changed parts only; in step 708, it is set to indicate the entire record data. For each difference block, the difference information data written in step 707 includes an offset relative to the pre-update data and, as necessary, the size of the data.
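The size comparison of step 706 can be sketched as follows. The per-block overhead of 4 bytes for the offset and 4 bytes for the size is an assumption chosen for illustration, as are the names `diff_size` and `choose_post_update_payload`; the source only says that the overhead is counted.

```python
from typing import List, Tuple, Union

Patch = Tuple[int, bytes]

OFFSET_BYTES = 4  # assumed encoding size of one offset
LENGTH_BYTES = 4  # assumed encoding size of one data-size field


def diff_size(patches: List[Patch]) -> int:
    """Size of the difference data including the per-block overhead."""
    return sum(OFFSET_BYTES + LENGTH_BYTES + len(data) for _, data in patches)


def choose_post_update_payload(
    after: bytes, patches: List[Patch], n: float = 1.0
) -> Tuple[bool, Union[bytes, List[Patch]]]:
    """Return (post-update data flag, payload) following steps 706 to 708."""
    if not 0 < n <= 1:
        raise ValueError("N must satisfy 0 < N <= 1")
    if diff_size(patches) < len(after) * n:
        return True, patches   # step 707: log only the changed parts
    return False, after        # step 708: log the entire record


big_after = b"ABCDEFGH" * 100                         # an 800-byte record
small = choose_post_update_payload(big_after, [(3, b"X")])
tiny = choose_post_update_payload(b"ABCXEFYH", [(3, b"X"), (6, b"Y")])
strict = choose_post_update_payload(big_after, [(3, b"X")], n=0.01)
```

For the 800-byte record the 9-byte difference is logged. For the 8-byte toy record the difference costs 18 bytes with overhead, so the full image is logged. With N = 0.01 the threshold drops to 8 bytes and the full image is chosen even for the large record.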
[0030]
FIG. 8 illustrates an example in which a failure occurs in the middle of the processing sequence of transactions that perform update processing. Transaction 1 performs update 1 and update 2 on the database within the same transaction. Here, update 2 is an update of the same record data, as illustrated in the drawings. Transaction 2 performs update 3 within that transaction. It is assumed that a synchronization point occurs between update 1 and update 2 during execution of these transactions, and that a failure then occurs during the execution of transaction 2.
[0031]
Since update 1 precedes the synchronization point, its update processing is reflected in the database. Update 2, however, follows the synchronization point, so there is no guarantee that it is reflected in the database. Transaction 1 does not need to be rolled back because it was committed before the failure occurred, but it must be rolled forward. Since the corresponding data has already undergone the processing of update 1, the post-update data is restored using the log information of update 2 shown in FIG. 4 and passed to the database management control unit 107 to be reflected in the database.
[0032]
All processing of transaction 2 must be invalidated because the failure occurred before the transaction was committed. Normally, before rollback processing is performed, it is necessary to perform roll-forward processing based on the post-update log of the corresponding transaction, bringing the corresponding record in the database to its state after update 3, and only then to perform the rollback processing. In this case as well, the corresponding data is first rolled forward using the log of update 3 and is then rolled back.
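The handling of the FIG. 8 scenario can be summarized in a small sketch. The `LoggedUpdate` type and the `plan_recovery` function are hypothetical names introduced here. The text describes only the three cases of update 1, update 2, and update 3; treating an uncommitted update that precedes the synchronization point as "roll back only" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LoggedUpdate:
    name: str
    transaction: str
    after_sync_point: bool   # True if the update was logged after the sync point


def plan_recovery(updates: List[LoggedUpdate],
                  committed: Set[str]) -> Dict[str, List[str]]:
    """Return, per update, the recovery steps implied by the FIG. 8 discussion."""
    plan: Dict[str, List[str]] = {}
    for u in updates:
        steps: List[str] = []
        # Updates after the synchronization point may not have reached the
        # database, so they are rolled forward first.
        if u.after_sync_point:
            steps.append("roll forward")
        # Updates of a transaction that never committed are then invalidated.
        if u.transaction not in committed:
            steps.append("roll back")
        plan[u.name] = steps
    return plan


updates = [
    LoggedUpdate("update 1", "transaction 1", after_sync_point=False),
    LoggedUpdate("update 2", "transaction 1", after_sync_point=True),
    LoggedUpdate("update 3", "transaction 2", after_sync_point=True),
]
for name, steps in plan_recovery(updates, committed={"transaction 1"}).items():
    print(name, steps)
```

Update 1 needs no action, update 2 is rolled forward, and update 3 is rolled forward and then rolled back, matching the three cases discussed above.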
[0033]
Hereinafter, the procedure for recovering the database from the acquired log data will be described. FIG. 9 is a flowchart illustrating the processing procedure of the database recovery unit 110. The database recovery unit 110 receives a database recovery request from the database management control unit 107 (step 900). If the request is a pre-update data recovery request (YES in step 901), the pre-update data recovery unit 115 is called to perform the pre-update data recovery process (step 902). If not (NO in step 901), the process proceeds to step 903. If the request is a post-update data recovery request (YES in step 903), the post-update data recovery unit 116 is called to perform the post-update data recovery process (step 904). Otherwise (NO in step 903), the process ends.
[0034]
FIG. 10 is a flowchart illustrating a processing procedure of the pre-update data recovery unit 115. The pre-update data recovery unit 115 acquires pre-update data from the log record having the specified transaction ID and log record number in the log file (step 1001). Next, the acquired pre-update data is overwritten on the corresponding record data in the database, and the processing is terminated (step 1002).
[0035]
FIG. 11 is a flowchart illustrating the processing procedure of the post-update data recovery unit 116. The post-update data recovery unit 116 acquires the post-update data from the log record having the specified transaction ID and log record number in the log file (step 1101). Next, the "updated data flag" in the corresponding management information is referenced to determine whether the acquired post-update data is difference information (step 1102). If the post-update data is difference information, the pre-update data of that log record in the log file is acquired (step 1103). Next, the entire post-update record data is restored from the acquired pre-update data and the difference information data (step 1104). That is, each difference block is cut out from the difference information data, and the corresponding block of the pre-update data is rewritten with the difference block based on its offset, thereby restoring the entire post-update record data. Next, the restored post-update data is passed to the database management control unit 107 and overwritten on the corresponding record data in the database to recover the record data (step 1105). If the post-update data is not difference information but the entire record data, the post-update data acquired from the log file is passed directly to the database management control unit 107 and overwritten on the corresponding record data in the database to recover the record data. As described above, according to the present embodiment, the data recovery process can be realized even when the post-update data is recorded as a difference.
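Steps 1102 to 1104 can be sketched as below. The `LogRecord` layout and the function name `recover_post_update` are assumptions for illustration, and the sketch further assumes that an update does not change the record length, so every difference block overwrites an existing range of the pre-update data.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

DiffBlock = Tuple[int, bytes]   # (offset, changed bytes); assumed layout


@dataclass
class LogRecord:
    pre_update: bytes
    post_update_is_diff: bool                    # stands in for the "updated data flag"
    post_update: Union[bytes, List[DiffBlock]]


def recover_post_update(record: LogRecord) -> bytes:
    """Return the full post-update record data described by a log record."""
    if not record.post_update_is_diff:
        # The entire record data was logged; use it as is.
        return record.post_update
    # Steps 1103 and 1104: start from the pre-update data and overwrite
    # each changed block at its recorded offset.
    restored = bytearray(record.pre_update)
    for offset, block in record.post_update:
        restored[offset:offset + len(block)] = block
    return bytes(restored)


diff_record = LogRecord(b"AAAABBBBCCCC", True, [(4, b"XXXX")])
full_record = LogRecord(b"old!", False, b"new!")
print(recover_post_update(diff_record))
print(recover_post_update(full_record))
```

In an actual system the returned bytes would be handed to the database management control unit 107 for writing back (step 1105); here they are simply printed.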
[Effect of the invention]
As described above, according to the present invention, the data amount of a log file collected for database recovery can be reduced.
[Brief description of the drawings]
FIG. 1 is a configuration diagram of a transaction processing system according to an embodiment.
FIG. 2 is a diagram illustrating a data configuration example of a log record.
FIG. 3 is a diagram illustrating an example of BLOB data that is repeatedly updated.
FIG. 4 is a diagram illustrating an example of a log record created for each of updated BLOB data.
FIG. 5 is a flowchart illustrating a log record creation procedure according to the embodiment.
FIG. 6 is a flowchart illustrating a procedure of creating a pre-update log according to the embodiment.
FIG. 7 is a flowchart illustrating a procedure for creating a post-update log according to the embodiment.
FIG. 8 is a diagram illustrating an example of a case where a failure occurs during a processing sequence of a transaction for performing an update process.
FIG. 9 is a flowchart illustrating a processing procedure of a database recovery unit 110 according to the embodiment.
FIG. 10 is a flowchart illustrating a processing procedure of a pre-update data recovery unit 115 according to the embodiment.
FIG. 11 is a flowchart illustrating a processing procedure of a post-update data recovery unit 116 according to the embodiment.
[Explanation of symbols]
100: database server, 105: database storage area, 106: log file storage area, 107: database management control unit, 109: log acquisition unit, 110: database recovery unit, 112: pre-update log creation unit, 113: difference information creation unit, 114: post-update log creation unit

Claims

What is claimed is: 1. A database log management method for collecting a database log for recovery of a database, comprising: acquiring pre-update data when updating the database and outputting the pre-update data to a log file; acquiring post-update data corresponding to the pre-update data; creating difference data that is the changed part of the post-update data with respect to the pre-update data; comparing the size of the difference data with the size of the post-update data; outputting the difference data to the log file if the size of the difference data is smaller than the size of the post-update data; and outputting the post-update data to the log file if the size of the difference data is equal to or larger than the size of the post-update data.

2. The database log management method according to claim 1, wherein the pre-update data and the post-update data are one of a row direction record and a column direction field in the database.

3. The database log management method according to claim 1, wherein, instead of comparing the size of the difference data with the size of the post-update data, the size of the difference data is compared with a value obtained by multiplying the size of the post-update data by a predetermined coefficient N.

4. The database log management method according to claim 1, wherein, in recovering the database: when the pre-update data is used for recovery, the relevant pre-update data is acquired from the log file and reflected in the database; and when the post-update data is used for recovery, the relevant pre-update data and the recorded post-update data or difference data are acquired, the post-update data is reflected in the database if the post-update data is recorded, and, if the difference data is recorded, the post-update data is restored from the pre-update data and the difference data and reflected in the database.

5. A database log management method for collecting a database log for recovery of a database, comprising: acquiring pre-update data when updating the database and outputting the pre-update data to a log file; acquiring post-update data corresponding to the pre-update data; creating difference data composed of the changed blocks obtained when the pre-update data is divided into a plurality of blocks; comparing the size of the difference data with the size of the post-update data; outputting to the log file the difference data and a flag indicating that the difference data has been recorded if the size of the difference data is smaller than the size of the post-update data; and outputting to the log file the post-update data and a flag indicating that the post-update data has been recorded if the size of the difference data is equal to or larger than the size of the post-update data.

6. The database log management method according to claim 5, wherein, instead of comparing the size of the difference data with the size of the post-update data, the size of the difference data is compared with a value obtained by multiplying the size of the post-update data by a predetermined coefficient N.

7. The database log management method according to claim 5, wherein, in recovering the database: when the pre-update data is used for recovery, the relevant pre-update data is acquired from the log file and reflected in the database; and when the post-update data is used for recovery, the relevant pre-update data and the post-update data or difference data recorded together with the flag are acquired, and, according to the flag, either the post-update data is reflected in the database or the post-update data is restored from the pre-update data and the difference data and reflected in the database.

8. A computer for managing a database log, which collects a database log for recovery of a database, comprising: database management control means for controlling access to the database; and log acquisition means, wherein the log acquisition means comprises: means for acquiring pre-update data from the database management control means when the database is updated and outputting the pre-update data to a log file; means for acquiring post-update data corresponding to the pre-update data from the database management control means; means for creating difference data that is the changed part of the post-update data with respect to the pre-update data; means for comparing the size of the difference data with the size of the post-update data and outputting the difference data to the log file if the size of the difference data is smaller than the size of the post-update data; and means for outputting the post-update data to the log file if the size of the difference data is equal to or larger than the size of the post-update data.

9. The computer for managing a database log according to claim 8, further comprising database recovery means, wherein the database recovery means comprises: means for acquiring the relevant pre-update data from the log file and reflecting it in the database when the pre-update data is used for recovery; means for acquiring the relevant pre-update data and the recorded post-update data or difference data when the post-update data is used for recovery; means for reflecting the post-update data in the database when the post-update data is recorded; and means for restoring the post-update data from the pre-update data and the difference data and reflecting it in the database when the difference data is recorded.

10. A program for causing a computer to realize a function of collecting a database log for recovery of a database, the program causing the computer to realize: a function of acquiring pre-update data when updating the database and outputting the pre-update data to a log file; a function of acquiring post-update data corresponding to the pre-update data; a function of creating difference data that is the changed part of the post-update data with respect to the pre-update data; a function of comparing the size of the difference data with the size of the post-update data and outputting the difference data to the log file if the size of the difference data is smaller than the size of the post-update data; and a function of outputting the post-update data to the log file if the size of the difference data is equal to or larger than the size of the post-update data.

11. The program according to claim 10, further causing the computer to realize, in recovering the database: a function of acquiring the relevant pre-update data from the log file and reflecting it in the database when the pre-update data is used for recovery; a function of acquiring the relevant pre-update data and the recorded post-update data or difference data when the post-update data is used for recovery; a function of reflecting the post-update data in the database when the post-update data is recorded; and a function of restoring the post-update data from the pre-update data and the difference data and reflecting it in the database when the difference data is recorded.