JP4280306B2

JP4280306B2 - Log-based data architecture for transaction message queuing systems

Info

Publication number: JP4280306B2
Application number: JP52274998A
Authority: JP
Inventors: デビッド・ダブリュ・エッチ・ワング; デレク・エル・シュベンケ
Original assignee: Mitsubishi Electric Research Laboratories Inc
Current assignee: Mitsubishi Electric Research Laboratories Inc
Priority date: 1996-11-14
Filing date: 1997-11-11
Publication date: 2009-06-17
Anticipated expiration: 2017-11-11
Also published as: WO1998021654A1; EP1015973A1; AU5177198A; JP2001502455A; EP1015973A4

Abstract

A message queuing system is provided that saves and stores messages and their state in an efficient single file on a single disk to enable rapid recovery from server failures. The single-disk, single-file storage system in which messages and their states are stored eliminates writes to three different disks: the data disk, the index structure disk, and the log disk. Single-disk, single-file storage is made possible by clustering all information together in a contiguous space on the same disk. The result is that all writes are contained in one sweeping motion of the write head, in which the head moves in one direction only, and only once, to find the area where it needs to start writing messages and their states. To keep track of the clustered information, a unique Queue Entry Map Table (100) is used, which includes control information (100), message blocks (102) and log records (104), in conjunction with single-file disk storage in which the write head never has to back up to traverse saved data when writing new records. The system also permits locating damaged files without scanning entire log files.

Description

技術分野
この発明は、メッセージキューイングに関し、より詳しくはクライアントサーバおよびモバイルエージェントアプリケーションのための迅速で信頼性のあるトランザクションメッセージキューイングシステムに関し、さらに、当該システムのためのログベースデータアーキテクチャに関する。
背景技術
メッセージキューイングは、その本来の同期処理および非同期処理をいずれも可能にするという柔軟性により、それぞれ異なるコンピュータシステム上のアプリケーション間で最も基本となる通信範例である。メッセージキューイングミドルウェア基幹は、一般的なクライアントサーバならびにモバイルエージェント計算処理、すなわちワークフロー計算処理、オブジェクトメッセージ伝送、トランザクションメッセージ伝送およびデータ複写サービスのいずれにおいても膨大なアプリケーションドメインについて非常に柔軟性のある骨組みである。
トランザクションメッセージ伝送の場面ではデータが転送中に紛失することがたびたびある。金融業界では、ある場所から別の場所に転送される銀行の取引記録が、サーバ不良、送信回線の不具合またはその他人為的なものによって紛失する可能性があるが、これが金融業界で起これば大惨事になる。エラーが発生したことを迅速に突き止めることができ、かつデータが有効であった既知のポイントからデータの再構築を可能にすることがシステムマネージャに課せられる。
クラッシュ前の当該システムの最新状況を再構築するために当該システムがいわゆるログファイル全体を走査することによって、過去においてエラーが発生した地点を確定する。関連するタイムスタンプを有するログファイルを常時利用して、メッセージおよびログファイルに含まれるデータを識別する。しかしながら、最新状況を確認するためにログファイル全体を走査するには、１，０００ものログレコードを走査する必要がある。
エラーの発生した地点を突き止め、かつその発生地点からファイルを再構築するログレコード全体をそうさすることは、非能率な方法というだけでなく、従来のシステムでは、２種類のディスクファイルが必要であった。このうち、ひとつはデータファイルとして、他方はログファイルとして機能する。
さらに、ログエントリとデータファイルまたはセクタの間の相互関係は、従来のセクタが識別性のない順序で格納され、ログファイルとセクタとの間のマッピングが所要時間の多少かかるプロセスであるたため複雑である。
さらに技術背景では、一つのポイントから別のポイントに伝送されるデータレコードが損なわれることのないような格納装置を提供可能にするようなメッセージキューイングが一般的に使用されることは理解されよう。たとえば、ある場所でエラーが発生しデータが紛失した場合でも、メッセージキューイング本来の格納装置が機能してデータを第２の場所で再構築可能である。
例として、特に株取引では、取引中の割込みは数時間というよりは数分に極限されることが望ましい。しかし、時に、システムサーバが停止した場合、その時のシステムでの取引数により、回復に２時間から８時間もかかることがある。したがって、停止時間や破損ファイルの見つけ出しと再構築に所要する時間と費用を最小限に抑える必要がある。
なお、ここで用いられるキューファイルは、伝送中のメッセージの物理的格納装置を表す。キューファイルは、未完了操作のための保持用セルと称することもできる。すなわち、基本的には、所定のメッセージを受信する受け手がそこにいない場合、メッセージをキューファイルに保持し、後で送出できるようにすることを意味する。したがって、キューファイルは送信された情報の保持に信頼性を与える。
さらに、従来のシステムにおいて、回復データは、キューファイルそれ自体により提供されるものではない。したがって、エラーまたはデータの紛失が発生した場合、キューファイルはファイル状態を確認するために利用されていない。すなわち、以前に不正処理されていないデータからデータファイルを再構築するためにキューファイルは使用されていない。従来のシステムでは、キューファイルそれ自体により回復データが提供されることはない。
メッセージキューイングの実世界のアプリケーションへの応用例についての別の例では、メッセージキューイング基幹のモバイルエージェントを用いたリアルタイムオンライントランザクション処理の支援の仕方に絡んでいる。本例では、顧客はたとえば地理的に分散した支店を有する銀行である。顧客の口座が作られ、その口座が開設された地方支店で保管される。例示の目的のために、これを口座の本拠地支店（home branch）と称する。各口座の写しが本店においても保管される。口座の読出し取り操作を地方支店または本店のいずれからも行うことが可能である。しかしながら、本拠地支店にある写しと本店の写しとが同等に更新することが要求される。
更新の要求が本拠地支店で発生すると、地方支店にあるコピーを更新しなければならない。この更新によって、次にキューに加える（enqueue）要求をキューマネージャまたはキューサーバに自動的に送出するエージェントを始動することが可能である。このキューマネージャは、広域ネットワークを介して別のキューマネージャに対しての要求をキューから外し（dequeue）、このキューマネージャが、今度は、ミラーオフィスにある口座のデータベースサーバに対しての更新要求をキューから外す。
メッセージキューは、本例では非同期の信頼性のある処理を提供する。非同期処理は、ある位置でのデータベースの更新によって起動するエージェントから始まる。エージェントは、更新要求をメッセージキューマネージャに対して非同期で送出するが、応答を待つ必要はない。メッセージキューマネージャは、要求者が応答を待つ必要なく処理を継続できるように要求についての保持セルとして機能する。さらにメッセージキューマネージャは、本例では、更新の要求の受け手が、トランザクションメッセージキューとして当該業界において知られている二相コミットプロトコル（Two Phase Commit Protocol）と呼ばれる周知のハンドシェークプロトコルを介して受信状況をひとつひとつ確認するまで、更新要求のコピーをキューで保持することで信頼性を提供する。
これらフタイプのメッセージキューイングシステムは、これまで信頼可能に動作したが、メッセージキューに添付されるメッセージを格納するために別個のキューデータおよびログレコードファイルを使用するデータ構造に依存するものであった。このような構造は、サーバのクラッシュ時における迅速な修復を妨げ、２種類の格納用ディスクを必要とする。一つはデータのためのディスクであり、他方はログレコードのディスクである。さらに、従来のメッセージキューイング構造は、通常、効率よく作業するための予備のハードウェアを必要とせずには書込み動作が最適化されることはない。また、メッセージ滞留時間の短い高性能のスループットシステムには適切ではない。上述の別個のキューデータおよびログファイルにもまた、非信頼性が必要以上のレベルで取り入れられている。これは、ファイル不正処理および媒体不良の二点が潜在的に含まれているからである。さらに、メッセージキューイングシステムの業務管理担当者のために最初から回復に要する仕事量が予め定められる手段は通常存在しない。
なお、上述のシステムは、Digital Equipment Corporation社のDECmessageQ、IBM社のMQシリーズおよびTransarc社のEncina RQSとして市販化されている。
発明の概要
この発明において提供されるメッセージキューイングシステムは、従来のメッセージキューイングが有する課題を解決するために、メッセージおよびその状況を単一ディスク上の効率的な単一ファイルに保存し格納することによって、サーバ不良からの回復を迅速に行うことが可能となる。メッセージおよびその状況を格納する単一ディスク単一ファイル格納装置システムによって、３種類のそれぞれ異なるディスク、すなわちデータディスク、インデックス構造ディスクおよびログディスクへの書込みが消去される。単一ディスク単一ファイル格納装置は、同一ディスク上の隣接スペースにおいてすべての情報を集束させることによって可能となる。この結果、すべての書込みは、書込みヘッドの一つの掃引動作に含まれ、書込みヘッドは一方向にのみ一度だけ移動して、メッセージの書込み開始を必要とし、その状態を格納する領域を見つけ出す。集束された情報を追従するために使用される固有のキューエントリマップテーブル（Queue Entry Map Table）は、制御情報と、メッセージブロックと、ログレコードと、新規のレコードを書込む際に保存されたデータをトラバースするために書込みヘッドがバックアップする必要が全くない単一ファイルディスク格納装置とを同時に含む。さらに当該システムは、ログファイル全体を走査する必要なく破損ファイルを見つけることができる。
最新の有効データを見つけるために、制御チェックポイント間隔システムを利用して最新の不正処理されていないデータを見つけることができる。走査を行い、最新のチェックポイント間隔を見つけることによって、最後のキューをすぐに認識できる。チェックポイント後にログレコードの走査を行い、すべてのメッセージの最新状況を設定する。上述のシステムによって従前のシステムよりも少ない時間の位数でデータ回復を行うことができる。同時に、効率的なフォワード方向への書込み方法を確立させることによって、順序づけられていないセクタを介して検索する必要がなくなる。
一実施態様によると、最後尾のセクタに新規のレコードを追加することによって先行のセクタを更新し、ファイル状態が変更されたことを示す巡回循環バッファ用システム（circular wrap around buffering system）を用いることによって、開放され、有効メッセージおよび／またはログレコードをもはや保持していない先行のブロックを再利用する。
したがって、この発明は、トランザクションメッセージキューイングシステムのためのログベースデータ構造（アーキテクチャ）を提供するものであり、当該システムは、メッセージキューデータおよびログレコードの組合わせオンディスクファイル構造を利用するものである。この発明の一実施態様によると、単一ディスクのキューデータ／ログレコード組合わせファイルでは、書込み動作の性能および信頼性が向上し、同時に使用ディスク数が減少する。
上述のように、システムクラッシュの回復は、エラーの発生場所を突き止める際にすべてのログレコードを通して検索する必要のないキューエントリマップテーブルを使用することによって加速される。さらに、キューエントリマップテーブルを使用することによって、システム業務管理担当者に対して拡張性および柔軟性をもたらすキューデータファイルに要件の数を当初から割り当てることが可能である。
さらに上述したように、当該システムは、格納装置の再利用のためにキューデータファイルの循環（ラップアラウンド）が潜在的に存在することを暗示する巡回キューを利用するものである。これによって、キューが循環する（ラップアラウンド）場合、まだ有効かもしれないキューデータおよび／またはログレコードが次の書込み動作によって確実に上書きされないように予約表（リザベーションテーブル）または自由空間（フリースペース）ヒープを維持することを要求される。
一実施態様によると、キューデータ格納装置構造（アーキテクチャ）は、キューの固定サイズに基づいてキューマネージャを最初に初期化する場合に作成される単一フラットファイルからなる。初期キューの作成は、メッセージキューイングシステムにおけるピーク負荷、たとえば時に所定の時点でメッセージキューに予想される入力の最大数についてのシステム業務管理担当者の感覚に基づいて行われる。キューデータファイルにおける各メッセージは、メッセージヘッダおよびメッセージ本文を含む。メッセージ本文は、メッセージの内容を含み、メッセージヘッダに続く次の隣接ブロックのディスクに格納される。
上記実施態様では、キューデータファイルは、実行時に拡張可能な所定数の論理セグメントまたはセクタに分割される。各セグメントは、キューエントリマップテーブル（QEMT）のコピーを包含し、各セグメントの冒頭にこれが格納される。QEMTは、キューファイル全体に格納されたキューエントリおよびログレコード情報についての制御情報を包含する。メッセージヘッダ、メッセージ本文およびログレコードは、QEMTの後にメッセージデータおよびログレコードブロックの潜在的な混合と共に格納される。
理解されるように、QEMTのサイズは、ユーザがキュー作成時に定義するキューエントリの予測最大数に依存する。ログレコードは、決定論的バイト数を取るため、キューデータファイルは、ログレコードと、メッセージヘッダと、メッセージ本文と、QEMTの混合のデータタイプから構成される。
新規セグメントがキューデータファイルに到達すると、その新規セグメントの冒頭に新規QEMテーブルがディスクに書込まれ、メッセージおよびログレコードがQEMテーブルに続く。最も小さいオンディスクデータのタイプはログレコードであるため、１ブロックがログレコードのサイズである場合、キューデータファイルのセグメントは、ブロックからなるように定義される。このように実施性を高めることは、検索アルゴリズムの開発を容易にする。
トランザクションメッセージキューイングシステムの状態は、QEMTに包含される制御情報によって捕獲される。QEMTは、各自コピーを維持する各スレッドよりむしろ多重スレッドを作動させることが可能な静的データ構造として定義される。
ログベースデータ構造（アーキテクチャ）の結果、当該発明は、既存のトランザクションメッセージキューイングデータ構造（アーキテクチャ）において多数の改良を提供するものである。書込み動作の性能が既存のメッセージキューイング構造（アーキテクチャ）において改良され、この発明をベースにしたメッセージキューイングシステムによって、高速化の銀行処理アプリケーション等メッセージ滞留時間を短縮した高性能化スループットシステムが確実に達成される。さらに当該システムは、様々な帯域幅を有するネットワークおよび／または信頼性のないネットワークを介してエージェントの搬送の際に根底をなす信頼性のあるメッセージ送信基幹（インフラストラクチャ）についても適用可能である。
さらに、メッセージデータおよびログレコードの書込み動作は、常にフォワード方向に行われ、これらはいずれも同一のディスクファイルに格納されることができる。
また、本システムは、トランザクションメッセージキューイングシステムの信頼性を向上させるものである。当該ログベースデータ構造（アーキテクチャ）において、ファイルの不正処理が分割キューデータおよびログレコードファイルとの２つの潜在的なファイル不正処理の場面に対して起こり得る一つの場所が存在する。また、使用されるディスクファイルが少なくなるため、信頼性も高まる。キューデータ／ログレコードの組合わせファイルは、公知のACID特性の原始性（Atomicity）、一貫性（Consistency）および隔離性（Isolation）のそれぞれの特性に忠実である。さらに、明らかなように、既存のRAID技術を駆使して透過性のある二重書込みを行うことが可能である。
当該システムにおいて、これによって得られたメッセージキューイングシステムによって、先入れ先出し（First In First Out）、後入れ先出し（Last In First Out）または優先順位ベースのメッセージデータアクセスを含むいずれのメッセージデータアクセス方法も支援可能となる。また、同時にシステムクラッシュからの回復に要する時間を短縮することができる。従来のアプローチではログレコードのファイル全体の全データを走査するが、当該システムでは、最新のチェックポイントを決定するために一部のキューエントリマップテーブルをまずテストすることのみ要求される。そして次にそのセグメント内にあるログレコードに走査を進める。
さらに、この発明によって、業務管理担当者は、キューデータファイルのセグメント数、続いてチェックポイント間隔の数を最初から予め定めることによって、システム回復に要する仕事量を調整できるため、当該システムは、メッセージキューイングシステムの業務管理に対して拡張性および柔軟性を提供する。したがって、システム業務管理担当者は、チェックポイントの書込みの合計金額を先に支払うため、回復時に拡張ログレコードの走査を行う場合の高額の支払いを防ぐ。このトレードオフ（交換条件）を調整および微調整することによって、アプリケーションの要件およびドメインを適合させることができる。
上記の利点は、キュー制御情報と、メッセージデータと、メッセージ動作のトランザクションログレコードとを包含する予め割り当てられたオンディスクキューバッファを使用することによる。オンディスクキューバッファは、多数のセグメントまたはセクタから構成される。各セグメントは、同一の所定のブロック数から構成される。各セグメントの冒頭には上述のキューエントリマップテーブルがあり、個別のキューエントリの状況に関する制御情報データと、ディスク上にありメッセージが物理的に格納されるポインタオフセットとを包含する。キューエントリマップテーブルは、メッセージキューイングシステム全体についての固定のチェックポイント間隔として機能する。メッセージ動作のメッセージおよびトランザクションログレコードは、メッセージブロックとログレコードブロックが組み合わせられるようなセグメント内のブロックに格納される。また、ある特定のメッセージのログレコードを当該メッセージに隣接するように格納することは要求されない。
当該発明の特徴として、ディスクヘッドに対して常にフォワード方向になるようにメッセージデータ書込み操作を行う。また、ポインタがトラバースする必要なく連続してメッセージをディスクに格納する。さらに、ログレコード書込み動作は、常にディスクヘッドに対してフォワード方向になるように行う。ログレコードは、二相コミットプロトコルに基づいてメッセージ操作の状態の変化が書込まれる。したがって、ログレコードは、準備（Prepare）、準備完了（Prepared）、コミット（Commit）、アボート（Abort）そして確認（Acknowledge）の各メッセージが遠隔のキューマネージャから書込まれることができる。
別の固有の特徴として、キュー全体がシングルパスで走査されることができる。さらに、オンディスクの不要データの集積は常に線形プロセスである。さらに、多数のキューエントリマップテーブルが、キューマネージャの適切なシャットダウン時にディスクに格納される最新のテーブルの固有のシーケンス番号と共に、同一ファイルに存在している。
重要なこととして、読出し動作は、先出し先入れ、後入れ先出しまたは優先順位ベースの方策に従い、これらの方策のいずれかを実行するために特別な規定を必要としない。
さらに、回復の手順は、キューエントリマップテーブルのタイムスタンプのみを検索することによって加速される。これは、最新のキューエントリマップテーブルが回復プロセスの開始状態として機能するからである。当該テーブルに続くログレコードは順に読み出され、最後の既知のチェックポイント後に行われる変更に反映ために、この最新のキューエントリマップテーブルのインメモリコピーが変更される。
【図面の簡単な説明】
図１は、メッセージが本店からその支店に流れる、当該システムを利用した典型的な銀行取引アプリケーションのブロック図、
図２は、データを一つのファイルに記録し、ログを別のファイルに記録し、前記データを非連続セクタに格納し、最新状況を再構築するためにログファイル全体を走査することを要求しながら、回復プロセスが当該システムの全メッセージの完全な状態を得るためにデータファイルおよびログファイルの両方を含む、２つのファイルシステムを表す線図、
図３は、単一ファイルを利用して、データおよびQEMTマッピングテーブルを格納することにより、ハードウェアを最小にし、データの回復に要する走査時間を短縮して紛失データを迅速に回復可能にする当該システムを表す線図、
図４は、単一書込み方向の巡回ファイルを示した、図３のファイル内にあるデータブロックの格納装置を表す線図、
図５は、QEMT制御ブロックを利用することによって有効データの位置および／または場所を容易に確認できることを示した、ファイル内の各種の周知の位置またはオフセットにおける可能なQEMT制御ブロックを表す線図、
図６は、ファイルをフォワード方向に書き込めるように、メッセージデータブロックによって状態変化のログレコードが分散することを示す線図、
図７は、本QEMT構造を示し、タイムスタンプとして機能し、かつ当該システムを再格納するために必要とされる増分のチェックポイント情報を有した、QEMTシーケンス番号を含む表、
図８は、個別のメッセージ状況を再格納するための情報を提供する表、
図９は、巡回キューを実行する循環システムにおいてフォワード方向へのデータの流れを示す線図、
図１０は、図６のログ入力によって、増分ログレコードに格納された情報を示す表、
図１１は、キューからメッセージを取り出すための手順を示すフローチャート、
図１２は、メッセージをキューに書込むための手順を図示するフローチャート、
図１３は、完全に再格納された状態をもたらす最新のQEMTを識別後にログレコードの次の読出しを行い、最新のQEMTが初期走査によって識別される回復プロセスを示すフローチャートである。
発明を実施するための最良の形態
図１を参照すると、更新された口座情報を支店から本店へ伝送する目的のために、メッセージキューイングシステム１０が銀行の支店１２と本店１４との間に設置される。また、銀行の異なる支店のそれぞれの端末機１６、１８および２０にデータがそれぞれ入力される。このデータは、各支店のローカルデータベースサーバ２２、２４および２６にそれぞれ格納され、各データベースサーバは独自のローカル格納装置を有し、ここでは参照符号２８によって示される。
データベースサーバの出力は、一連のメッセージキューイングサーバ３０、３２および３４にそれぞれ接続され、それぞれが独自の格納装置を保有し、ここでは参照符号３６で付されている。
メッセージキューイングサーバは、広域ネットワーク４０に対してその出力を行う。該ネットワークは、この出力を本店にあるメッセージキューイングサーバ４２に接続し、このサーバは、図示のように、各格納装置４４をそれぞれ有している。メッセージキューイングサーバ３０、３２および３４は、図示のように、接続された格納装置５２を有するデータベースサーバ５０と広域に通信を行う。メッセージキューイングサーバ４２の出力は、図示のように、接続された格納装置５２を備えるデータベースサーバ５０と接続される。このデータベースの情報は、本店の端末機５４において閲覧が可能である。
メッセージキューイングシステムの目的は、更新された口座情報を本店に備えるために支店から信頼して伝送することを可能にすることである。また、本社と直接接続しているかどうかに関係なく、支店におけるトランザクションが進行できることも重要である。
次に、図２を参照すると、従来、６０および６２で図示されるようなメッセージおよびヘッダは、データディスク６４のセクタ６６、６８、７０および７２に格納され、このセクタにおいてメッセージおよび随伴するヘッダがランダムに配置されていた。
同時に、メッセージ状態の情報は、上記データディスクに格納されている各メッセージについてのレコードを含む、ログディスク８０に格納され、着信順位およびデータディスクにおける位置を含んでいた。さらに、トランザクションの状況は、メッセージおよび対応するヘッダそれぞれについてログディスク８０にログインされていた。
「Ｘ」８２で示されるように、伝送が割り込まれる場合、従来、ここでは８４で図示されるように、伝送割込み直前のデータディスクファイルの最新状態が再構築できるようにログファイル全体を走査することが要求されていた。前述したように、これは時間のかかるプロセスであり、クラッシュする直前のシステムの状態を再構築するためには、ログファイル全体が走査されなければならない。メッセージおよびヘッダの情報がデータディスクの非連続セクタに格納されるため、状況ははるかに複雑になる。また、伝送割込み時に不正処理されないメッセージを見つけるためにログファイルとデータファイルとの相互通信が要求される。
次に、図３を参照すると、当該システムにおいてメッセージデータ６０およびメッセージヘッダ情報６２は、単一ディスク格納装置９０の、ここでは９２、９４、９６および９８で図示される連続するセクタに格納される。キューエントリマネジメントテーブルを利用することによってアクセス可能な順番にメッセージおよびヘッダの情報が格納されることはこの発明の特徴であり、後述するチェックポイントシステムを介してメッセージデータを見つける。
メッセージデータがセクタすべてに格納されず、むしろ上述のように連続して格納されることは明らかである。
ファイル９０に格納されたデータにアクセス可能とするために、キューエントリマネジメントテーブル（すなわちQEMT）は、制御情報１００の入力と、メッセージブロック１０２とログレコード１０４とを含むセクタ情報を含む。これらはすべて関連データおよびヘッダが見つけられるセクタを固有に特定するように設計される。したがって、QEMTは、このようにしてシステム状態を特定する。
図４、図５および図６と関連して明らかなように、キューエントリマネジメントテーブルは、メッセージデータとヘッダ情報との間に分散されたファイル９０に格納される。
次に、図４を参照する。一実施態様では、ファイル９０は、隣接するセクタがここでは１０６で図示される情報ブロックを有するように構成され、情報ブロックは、矢印１０８で図示されるように左から入り、左から入るブロック番号１および右から出るブロック番号１３によって図示されるようにファイルを左から右へトラバースする。隣接するブロックおよびファイルを通してのフローは、変化しないいわゆる書込み方向を作成することが理解されよう。
次に、図５を参照する。上述のQEMT制御情報ブロック１００は、QEMT制御情報ブロック１００の位置がファイル９０を通して周知のオフセットでのチェックポイントを特定するように、その他の隣接ブロック１０６間に分散される。
QEMT制御ブロックを一定の間隔で分散する目的は、場合によって、特定のメッセージデータおよびヘッダ情報を含む完全なシステム状態を、チェックポイント番号またはチェックポイント間隔を単に指定することによって迅速に見つけることである。その結果、有効情報を有する最後であるとしてチェックポイント間隔が識別されると、QEMTブロックがどこで有効データが見つけられるか、ならびにその身元および位置を指定した後に隣接ブロックが書き込まれるように、制御QEMT制御ブロックのいずれかの側にメッセージデータおよびログレコードブロックを有することができる。
別の説明として、QEMT制御ブロックは、周知の位置を回復プロセスに提供して当該システムの状態を調査する。
次に、図６を参照すると、ブロック１０６は１１０で図示されるように、メッセージデータブロックとして、または１１２で図示されるように、増分ログブロックとしてブロック１１２を図３中のログレコード１０４に対応させながら利用される。これらのログレコードは、隣接する下流ブロックでのメッセージへの状態変化を記録する。なお、制御ブロックは、ファイルの調査の開始に一部の既知のポイントのみ与え、一方ログレコードはファイルにおける個別のメッセージに関する情報を提供する。
図３に戻って参照すると、ログレコード１０４は、開始点がＱＥＭＴ制御ブロックで表されるデータに関連する多数の連続ログレコードの一つでしかないことが理解されよう。これらのログレコードは、先行のメッセージブロックにおける情報への変更を、その特定のメッセージブロックへの変更についての経過を完全に付するように、記録する。
再び図６に戻ると、所定の数のメッセージブロックは、チェックポイント後に生じた追加メッセージデータブロックを特定するQEMT制御ブロックによって境界が示されることに留意する。このセクタ内にトランザクションログレコード１１２がある。ログレコードＴ１は、メッセージブロックのいずれか一つにおいての変更を記載できることが明らかである。矢印１１４からわかるように、情報は左から右へ流れる。この場合、トランザクションログレコードＴ１は、当該システムにおけるいずれのメッセージについても状態の変化を記述するが、これは、メッセージは受信されてもはや保持する必要がないという確認、またはメッセージは送信されたがまだ受信されていない、または確認されていないという確認の場合もある。さらに、上記は、この種のシステムにおいてメッセージを確実に伝送するために２パスハンドシェーキング技術（two pass handshaking technique）を反映している。
たとえば、トランザクションログレコードＴ１は、新規メッセージがその特定のポイントでファイルに追加されたことを示してもよい。ログレコードの作成時にログレコードの位置が書込みヘッドによって決定されることは理解されよう。そこで、ログレコードが時間Ｔ１で作成される際、書込みヘッドはファイル中のある特定のポイントに存在するが、ログレコードは、全ファイル構造の中のいずれの位置においてもトランザクションおよびメッセージ照会することができる。
同様に、トランザクションログレコードＴ２、Ｔ３およびＴ４は、これらのログレコードを時に連続して投稿しながら、これらのメッセージが状態を変えたことを反映する。
QEMTブロックおよびログレコードブロックは、単一ファイル構造に挿入可能であり、さらに単一ファイル構造は、一実施態様において一方向に情報の流れを有するため、先行技術の２ファイル構造を完全に排除することが可能である。さらに、QEMTブロックおよびトランザクションログレコードブロックを利用することによって、不正処理されないこれらのメッセージを固有に特定して情報割込みの影響を早期診断でき、システム故障後のシステム状態を素早く回復できる。
次に図７を参照すると、キューエントリマネジメントテーブルのヘッダの構成が１２０に図示されている。これによって明らかであるように、一実施態様によると、ヘッダは、キューファイルのセグメント数１２２と、セグメントサイズ１２４と、QEMTシーケンス番号もしくはタイムスタンプ１２６と、前回のセグメントにおける最後尾のログレコードのシーケンス番号１２８と、現行セグメント数１３０と、キューヘッドポインタ１３２と、キューテイルポインタ１３４と、現行のセグメントにおいて次に利用可能なブロック１３６と、QEMT入力の一覧１３８と、ディスクブロックの予約表１４０と、調整者（コーディネータ）として動作する係属中のトランザクション一覧１４２、および受け手として動作する係属中のトランザクション一覧１４４とを含む。
ヘッダに含まれる情報は、回復プロセスの支援情報であることが理解されよう。
次に図８を参照すると、QEMI入力１３８は、それぞれシーケンス番号１４６と、メッセージＩＤ１４８と、QputまたはQgetのいずれかであるメッセージ動作モード１５０と、メッセージ受け手のノード名１５２と、メッセージ受け手のサーバ名１５４と、「アクティブ」、「ペンディング」、「アボート」または「コミット」のいずれかであるトランザクション状況１５６と、受信者によって受信された最後の既知の応答である参加者２ＰＣ投票（vote）１５８と、一組の追加フラグ１６０と、メッセージのポインタオンディスク位置１６２とを含む。
したがって、キューエントリマネジメントテーブルは、ファイルの状況に関する正確な情報を提供し、より詳しくは任意のキューエントリに関する情報を提供する。
次に図９を参照すると、ここでは、単一メッセージは隣接ブロックに格納されるため、再処理は、隣接ブロックを読み出し返す（read back）ことを含む。この結果、読出し操作中のヘッドの動作を軽減する。
要するに、先行技術では、読出しを実行するために隣接しないブロックを読出しヘッドがトラバースすることを要求し、したがって、相当時間を要する可能性があった。当該システムにおいては、メッセージは隣接ブロックに格納されているため、読出し操作時にこれらの隣接ブロックをトラバースすることのみ必要となる。同様に、続いて起こる書込み操作ではヘッドは限定されたファイル量のみトラバースする。
要するに、次の書込みについてフォワード方向の流れがあり、循環するため、データは隣接ブロックに構成され、ここから上記の利点が生じる。
次に図１０を参照すると、図６のトランザクションログレコード１１２は、一実施態様における特殊なログレコードマーカ１６２を含む。本実施態様によると、シーケンス番号１６４は、QgetまたはQputのいずれかの操作を言及するメッセージ操作モード１６６とともに提供される。メッセージＩＤ１６８と、一組の操作フラグ１７０と、「アクティブ」、「ペンディング」「アボード」または「コミット」の状況を含むトランザクション状況１７２と、上述の参加者２ＰＣ投票１７４と、キューファイルにおけるメッセージのオンディスク位置を指すポインタ１７６が含まれる。
次に図１１を参照すると、書込みまたはQput操作のフローチャートが示されている。本フローチャートでは、１８０で図示されるように開始した後、他のユーザがヘッド入力にアクセスできないようにブロックキューヘッドポインタ１８２が効果的に一覧のヘッドをロックする。その後、システムはキューヘッドポインタを増加し、トランザクション状況を「アクティブ読出し」に設定する。これは、ハンドシェークプロセスの開始を示す。
１８６で図示されるように、システムはその後キューヘッドポインタのロックを解除し、その後１８８で図示されるように、メッセージをオンディスクキューファイルから読出す。次にQEMテーブルは、１９０で図示されるようにロックされ、ログレコードは、その後１９２で図示されるように書込まれ、QEMテーブルは、１９４で図示されるようにロックを解除される。QEMテーブルのロック解除ステップの出力は、メッセージ伝送がトランザクションのものであるかどうかを確認する決定ブロック１９６に当てはまる。トランザクションのものである場合、１９８で図示されるように、システムは二相「コミット」プロトコルを作動し、ハンドシェークを許容する。これによってQputまたは書込み操作が完了する。
次に図１２を参照して、Qgetまたは読出し操作について説明する。２００で図示されるように開始されて、２０２で図示されるようにキューテイルポインタがロックされ、新規のQEM入力が、２０４で図示されるようにキューテイルポインタを増加させて作成される。その後、２０６で図示されるように、システムはQEM入力制御情報を記入し、トランザクション状況を「アクティブ制御」に設定する。その後、２０８で図示されるように、キューテイルポインタはロックを解除され、QEMテーブルは、２１０で図示されるようにロックされる。
次に、２１２で図示されるように、決定ブロック２１４で示されるセグメントの境界線を横切るブロックと共に、システムはオンディスクブロックを予約表から割り当てる。ブロックがセグメント境界線を横切る場合、２１６で図示されるように、システムはQEMTチェックポイントのディスクへの書込みを強制する。これは、メモリ内コピーをディスクに書込むことを指す。ブロック２０６は、QEMテーブルの状態のメモリ内コピーおよびこれによって得られるQEM入力を更新することが理解される。
２１８で図示されるようにQEMTチェックポイントがディスクへの書込みを強制された後、当該システムは、メッセージデータをディスクに書込みQEMテーブルのロックを解除する。決定ブロック２２０は、メッセージがトランザクションのものであるかどうかを確定し、トランザクションのものである場合には、２２１で図示されるように二相コミットプロトコルを作動してハンドシェークを促進させる。書込みシーケンスの終了は２２２で図示される。ブロック２２０は、ハンドシェーキングプロトコルを動作している受け手末端を指す。
次に図１３を参照すると、回復シーケンスが図示され、２３０で図示されるような開始後、キューテーブルポインタは２３２で図示されるようにロックされ、システムはその後、２３４で図示されるようにグローバルデータ構造を再格納する。このことは、システムの状態を全体的に初期化する。その後、２３６で図示されるように、システムは最新のQEMTについてキューファイルにおける各QEMTを走査する。これによって、通信割込み前に最新のチェックポイントが確立される。その後、２３８で図示されるように、システムは最新のQEMTを有するログレコードについてこのセグメントのログレコードを走査する。これは、セグメントのログレコードがQEMTにおいての入力によって照会されるメッセージに適用されることを意味する。
決定ブロック２４０で図示されるように、システムは走査すべきログレコードがさらにあるかどうかを確認する。当該QEMTに関連するポインタに続いてQEMTは最新のログレコードを特定することが理解されよう。しかしながら、その後走査する必要のある次のログレコードが実際にある可能性がある。この場合、当該システムは、２４２で図示されるようにメッセージのトランザクション状況について参加者とコンタクトを取る。一例では、受信者はメッセージを受信したかどうかについて質問される。その後、システムは二相「コミット」プロトコルを呼出して、２４４で図示されるようにトランザクションを解決する。これは、ハンドシェーキングプロセスが２パスプロセスであることを示す。したがって、ある人が受信者から受信を返されるような状況であっても、この状況を利用して当該システムが停止した地点でハンドシェーキングプロセスを再開する。
２４６で明らかなように、当該システムは、予約表の状況を更新し新規ファイルポインタ位置を決定する。したがって、現行のセグメント番号１３０および現行のセグメント１３６において次に利用可能なブロックによって新規ファイルのポインタ位置を決定しながら、セクション全体が走査され、予約表１４０の状況が更新される。
２４８で図示されるように、システムは、２５０で図示されるように、回復が完了した地点で新規QEMTの状況をディスクに書き出す。
Ｃ言語で書かれたこの発明の１つの実施例のためのプログラミング・リスト出力が次に記載される。

Technical field
The present invention relates to message queuing, and more particularly to a rapid and reliable transaction message queuing system for client server and mobile agent applications, and further to a log-based data architecture for the system.
Background art
Message queuing is the most fundamental communication paradigm between applications on different computer systems, because of its inherent flexibility in supporting both synchronous and asynchronous processing. A message queuing middleware infrastructure is a very flexible framework for a vast range of application domains in general client-server and mobile agent computing, namely workflow computing, object messaging, transaction messaging, and data replication services.
In transaction messaging, data is often lost in transit. In the financial industry, bank transaction records transferred from one location to another may be lost through server failure, transmission line faults, or other causes, and such a loss would be a disaster. It is up to the system manager to determine quickly that an error has occurred and to make it possible to reconstruct the data from a known point at which the data was valid.
In the past, the point at which an error occurred was determined by having the system scan an entire so-called log file in order to reconstruct the most recent state of the system before the crash. Log files with associated time stamps are routinely used to identify messages and the data they contain. However, scanning the entire log file to establish the most recent state can mean scanning as many as 1,000 log records.
Scanning all the log records to locate the error and reconstruct the files from that point is not only inefficient; conventional systems also required two kinds of disk file. One serves as a data file and the other as a log file.
In addition, the correlation between log entries and the data file or its sectors is complex, because conventional sectors are stored in no identifiable order and mapping between the log file and the sectors is a time-consuming process.
By way of further background, it will be appreciated that message queuing is commonly used to provide storage such that data records transmitted from one point to another are not lost. For example, even if an error occurs at one location and data is lost there, the storage inherent in message queuing allows the data to be reconstructed at a second location.
As an example, especially in stock trading, it is desirable that interruptions during trading be limited to minutes rather than hours. However, sometimes, if the system server goes down, recovery may take 2 to 8 hours depending on the number of transactions in the system at that time. Therefore, there is a need to minimize downtime and the time and expense required to find and reconstruct corrupted files.
As used here, a queue file is the physical storage for messages in transit. The queue file can also be described as a holding cell for incomplete operations. In essence, if no recipient is available to receive a given message, the message is held in the queue file so that it can be delivered later. The queue file therefore makes the retention of transmitted information reliable.
Furthermore, in conventional systems the recovery data is not provided by the queue file itself. When an error or data loss occurs, the queue file is not used to check the state of the files. That is, the queue file is not used to reconstruct the data file from earlier, uncorrupted data.
Another example of applying message queuing to real-world applications concerns the use of mobile agents built on a message queuing infrastructure to support real-time online transaction processing. In this example the customer is a bank with geographically distributed branches. A customer's account is created and kept at the local branch where it was opened; for purposes of illustration, this is called the home branch of the account. A copy of each account is also kept at the head office. Read operations on an account can be performed from either the local branch or the head office. However, the copy at the home branch and the copy at the head office must be updated consistently.
When an update request arises at the home branch, the copy at the local branch must be updated. This update can trigger an agent that automatically sends an enqueue request to a queue manager or queue server. That queue manager dequeues the request to another queue manager over the wide area network, and this second queue manager in turn dequeues the update request to the database server holding the accounts at the mirror office.
In this example the message queue provides asynchronous, reliable processing. Asynchronous processing begins with an agent that is triggered by a database update at some location. The agent sends the update request to the message queue manager asynchronously and does not need to wait for a response. The message queue manager acts as a holding cell for the request, so that the requester can continue processing without waiting. The message queue manager also provides reliability: it keeps a copy of the update request in the queue until the recipient of the request acknowledges receipt, message by message, through the well-known handshake protocol called the Two Phase Commit Protocol. This arrangement is known in the industry as transaction message queuing.
Message queuing systems of this type have operated reliably, but they rely on a data architecture that uses separate queue data and log record files to store the messages appended to the message queue. Such an architecture hinders rapid recovery from a server crash and requires two storage disks, one for data and one for log records. Furthermore, conventional message queuing architectures generally cannot optimize write operations without additional hardware. They are also ill suited to high-throughput systems with short message residence times. The separate queue data and log files also introduce more unreliability than necessary, because they create two potential points of file corruption and media failure. Finally, there is usually no way for the administrator of a message queuing system to determine in advance the amount of work that recovery will require.
Systems of the kind described above are commercially available as DECmessageQ from Digital Equipment Corporation, MQSeries from IBM, and Encina RQS from Transarc.
Summary of the Invention
To solve the problems of conventional message queuing, the message queuing system provided by this invention saves and stores messages and their state in an efficient single file on a single disk, enabling rapid recovery from server failures. The single-disk, single-file storage system in which messages and their states are stored eliminates writes to three different disks: the data disk, the index structure disk, and the log disk. Single-disk, single-file storage is made possible by clustering all information together in a contiguous space on the same disk. As a result, all writes are contained in one sweeping motion of the write head, in which the head moves in one direction only, and only once, to find the area where it needs to start writing messages and their states. To keep track of the clustered information, a unique Queue Entry Map Table is used, which includes control information, message blocks, and log records, in conjunction with single-file disk storage in which the write head never has to back up to traverse saved data when writing new records. The system can also locate corrupted files without scanning entire log files.
To find the most recent valid data, a control checkpoint interval system is used to locate the most recent uncorrupted data. A scan finds the most recent checkpoint interval, from which the latest state of the queue can be identified immediately. The log records written after that checkpoint are then scanned to establish the current state of every message. With this system, data recovery takes an order of magnitude less time than in earlier systems. At the same time, establishing an efficient forward-only write method removes the need to search through unordered sectors.
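The two-step recovery described above (find the newest checkpoint, then replay the log records that follow it) can be sketched as follows. This is a minimal in-memory illustration, not the patent's implementation: the `Segment` class, its field names, and the status strings are hypothetical, and a real system would read these structures from the queue file.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One segment of the queue file: a checkpoint plus the log records after it."""
    qemt_seq: int        # sequence number stamped on this segment's QEMT (assumed name)
    qemt_state: dict     # message id -> status as of the checkpoint
    log_records: list = field(default_factory=list)  # (message id, new status) pairs


def recover(segments):
    """Rebuild the latest message states from the newest checkpoint."""
    # Step 1: inspect only the checkpoints and keep the most recent one.
    latest = max(segments, key=lambda s: s.qemt_seq)
    # Step 2: start from the state captured by that checkpoint.
    state = dict(latest.qemt_state)
    # Step 3: replay, in order, the log records written after the checkpoint.
    for msg_id, status in latest.log_records:
        state[msg_id] = status
    return state


segments = [
    Segment(7, {"m1": "commit", "m2": "pending"}, [("m2", "commit")]),
    Segment(9, {"m3": "active"}, [("m3", "pending"), ("m4", "active")]),
    Segment(8, {"m2": "commit"}, []),
]
print(recover(segments))
```

Only the checkpoint sequence numbers are compared across segments; log records are read from a single segment, which is where the claimed saving over a full log scan comes from.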
According to one embodiment, earlier sectors are updated by appending new records at the tail of the file to indicate that the file state has changed, and a circular wrap-around buffering system is used to reuse earlier blocks that have been freed and no longer hold valid messages and/or log records.
Accordingly, the present invention provides a log-based data architecture for a transaction message queuing system that uses a combined on-disk file structure for message queue data and log records. According to one embodiment, the combined queue data and log record file on a single disk improves the performance and reliability of write operations while reducing the number of disks used.
As noted above, recovery from a system crash is accelerated by the queue entry map table, which removes the need to search through all the log records to locate the error. Furthermore, the queue entry map table allows the requirements of the queue data file to be assigned from the outset, which gives system administrators scalability and flexibility.
Further, as described above, the system uses a circular queue, which implies that the queue data file may wrap around so that storage is reused. Consequently, a reservation table or free-space heap must be maintained to ensure that, when the queue wraps around, queue data and/or log records that may still be valid are not overwritten by subsequent write operations.
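The role of the reservation table can be illustrated with a small sketch. It assumes one flag per block and a single forward-moving write cursor; the class and method names are hypothetical and chosen only for this example.

```python
class ReservationTable:
    """Tracks which blocks of a circular queue file still hold valid data."""

    def __init__(self, total_blocks):
        self.reserved = [False] * total_blocks
        self.cursor = 0  # next block the writer will try

    def allocate(self):
        """Return the next free block at or after the cursor, wrapping around once."""
        n = len(self.reserved)
        for step in range(n):
            block = (self.cursor + step) % n
            if not self.reserved[block]:
                self.reserved[block] = True
                self.cursor = (block + 1) % n
                return block
        raise RuntimeError("queue file is full: no block can be reused")

    def release(self, block):
        """Mark a block reusable once its message or log record is no longer needed."""
        self.reserved[block] = False


table = ReservationTable(4)
first_pass = [table.allocate() for _ in range(4)]  # fills blocks 0..3
table.release(1)                                   # block 1 no longer holds valid data
reused = table.allocate()                          # wraps around, skips block 0, reuses block 1
```

After the wrap, block 0 is skipped because it is still reserved, so valid data is never overwritten; a further `allocate()` call here raises because every block is in use.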
According to one embodiment, the queue data storage architecture consists of a single flat file that is created, with a fixed queue size, when the queue manager is first initialized. The initial queue is sized according to the system administrator's estimate of the peak load on the message queuing system, for example the maximum number of entries expected in the message queue at any given time. Each message in the queue data file comprises a message header and a message body. The message body holds the content of the message and is stored on disk in the contiguous blocks immediately following the message header.
In this embodiment, the queue data file is divided into a predetermined number of logical segments or sectors, which can be extended at run time. Each segment contains a copy of the queue entry map table (QEMT), stored at the beginning of the segment. The QEMT contains control information about the queue entries and log records stored in the whole queue file. Message headers, message bodies, and log records are stored after the QEMT, potentially with message data blocks and log record blocks intermixed.
As will be appreciated, the size of the QEMT depends on the expected maximum number of queue entries, which the user defines when the queue is created. A log record occupies a deterministic number of bytes, and the queue data file is made up of a mixture of data types: log records, message headers, message bodies, and QEMTs.
When a new segment is reached in the queue data file, a new QEMT is written to disk at the beginning of that segment, and messages and log records follow it. Because the log record is the smallest on-disk data type, a segment of the queue data file is defined as consisting of blocks, where one block is the size of a log record. This simplifies implementation and facilitates the development of search algorithms.
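Because every block has the size of one log record, block positions reduce to simple arithmetic. The sketch below shows this under stated assumptions: the field list follows the log record the description gives for Figure 10 (marker, sequence number, operation mode, message ID, flags, transaction status, 2PC vote, on-disk pointer), but the field widths, the `b"LOGR"` marker value, and all function names are invented for illustration.

```python
import struct

# Assumed field widths; the text names the fields but not their sizes.
# marker, sequence number, operation mode, message id, flags,
# transaction status, 2PC vote, on-disk block pointer
LOG_RECORD = struct.Struct("<4sIB16sHBBI")
BLOCK_SIZE = LOG_RECORD.size  # one block == one log record


def pack_log_record(seq, mode, msg_id, flags, status, vote, pointer):
    """Serialize one log record into exactly one block."""
    return LOG_RECORD.pack(b"LOGR", seq, mode, msg_id, flags, status, vote, pointer)


def blocks_needed(num_bytes):
    """Blocks occupied by a message header or body of the given length."""
    return -(-num_bytes // BLOCK_SIZE)  # ceiling division


def block_offset(segment, block, blocks_per_segment):
    """Byte offset of a block; block 0 of each segment is where its QEMT starts."""
    return (segment * blocks_per_segment + block) * BLOCK_SIZE


record = pack_log_record(1, 0, b"msg-0001", 0, 2, 1, 40)
```

With a fixed block size, a search routine can step through a segment block by block and recognize log records by their marker, without an external index.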
The state of the transaction message queuing system is captured by the control information contained in the QEMT. The QEMT is defined as a static data structure that multiple threads can operate on, rather than each thread maintaining its own copy.
As a result of the log-based data architecture, the invention provides a number of improvements over existing transaction message queuing architectures. Write performance is improved relative to existing message queuing architectures, so a message queuing system based on this invention reliably supports high-throughput systems with short message residence times, such as high-speed banking applications. The system is also applicable as the underlying reliable messaging infrastructure for transporting agents over networks of varying bandwidth and/or unreliable networks.
Further, the message data and log record writing operations are always performed in the forward direction, and both can be stored in the same disk file.
The system also improves the reliability of the transaction message queuing system. In the log-based data architecture there is only one place where file corruption can occur, compared with the two potential sites of corruption when queue data and log records are kept in separate files. Reliability also improves because fewer disk files are used. The combined queue data and log record file adheres to the Atomicity, Consistency, and Isolation properties of the well-known ACID properties. Furthermore, existing RAID technology can evidently be used to perform transparent duplicate writes.
The resulting message queuing system supports any message data access method, including first-in-first-out, last-in-first-out, or priority-based message data access. It becomes possible. At the same time, the time required for recovery from a system crash can be reduced. The conventional approach scans all data in the entire log record file, but the system only requires that some queue entry map tables be first tested to determine the latest checkpoint. The scan then proceeds to the log record in that segment.
Furthermore, according to the present invention, the business manager can adjust the workload required for system recovery by predetermining the number of queue data file segments and then the number of checkpoint intervals from the beginning. Provides scalability and flexibility for business management of queuing systems. Accordingly, the system operation manager in charge pays the total amount of writing of the checkpoints first, thereby preventing a high payment when scanning the extended log record at the time of recovery. By adjusting and fine-tuning this trade-off (exchange conditions), application requirements and domains can be adapted.
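The trade-off described above can be made concrete with a rough cost model. The following Python sketch is illustrative only and is not part of the disclosed embodiment. It assumes that each queue entry map table occupies one block and that recovery reads one table per segment and then every remaining block of the latest segment; the function and parameter names are hypothetical.

```python
def recovery_reads(total_blocks: int, segments: int) -> int:
    """Worst-case number of block reads needed to recover (assumed model)."""
    if segments <= 0 or total_blocks % segments != 0:
        raise ValueError("segments must evenly divide total_blocks")
    blocks_per_segment = total_blocks // segments
    # One read per checkpoint table, plus the rest of the latest segment.
    return segments + (blocks_per_segment - 1)


def checkpoint_writes_per_pass(segments: int) -> int:
    """Checkpoint tables written each time the writer traverses the whole file."""
    return segments


for segments in (4, 32, 256):
    print(segments, recovery_reads(1024, segments), checkpoint_writes_per_pass(segments))
```

Under this model, a small number of segments means few checkpoint writes but a long log scan at recovery, while a large number of segments shifts the cost toward writing and reading checkpoint tables.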
These advantages stem from the use of a pre-allocated on-disk queue buffer that contains queue control information, message data, and transaction log records of message operations. The on-disk queue buffer consists of a number of segments, or sectors, each made up of the same predetermined number of blocks. At the beginning of each segment is the queue entry map table described above, which contains control information on the status of the individual queue entries together with pointer offsets to the locations on disk where the messages are physically stored. The queue entry map tables serve as checkpoints at fixed intervals for the entire message queuing system. Message data and the transaction log records of message operations are stored in the blocks of a segment, with message blocks and log record blocks intermixed. The log record for a particular message need not be stored adjacent to that message.
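Because every segment has the same number of blocks and begins with a queue entry map table, locations in the queue buffer reduce to simple offset arithmetic. The following Python sketch shows one way this could look. The block size, the number of blocks per segment, and the assumption that the table occupies exactly the first block of each segment are hypothetical values chosen for illustration, as are all function names.

```python
BLOCK_SIZE = 512          # assumed block size in bytes (hypothetical)
BLOCKS_PER_SEGMENT = 64   # assumed fixed number of blocks per segment
QEMT_BLOCKS = 1           # assumed number of blocks reserved for the QEMT


def qemt_offset(segment: int) -> int:
    """Byte offset of the QEMT checkpoint at the start of a segment."""
    return segment * BLOCKS_PER_SEGMENT * BLOCK_SIZE


def block_offset(segment: int, block: int) -> int:
    """Byte offset of a message or log record block within a segment."""
    if not QEMT_BLOCKS <= block < BLOCKS_PER_SEGMENT:
        raise ValueError("block index falls outside the payload area")
    return qemt_offset(segment) + block * BLOCK_SIZE


def segment_of(offset: int) -> int:
    """Segment number that contains a given byte offset."""
    return offset // (BLOCKS_PER_SEGMENT * BLOCK_SIZE)
```

With fixed offsets of this kind, a recovery process can jump directly to each checkpoint without following pointers.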
As a feature of the present invention, message data is always written in the forward direction of the disk head, and messages are stored contiguously on disk so that no pointers need to be traversed. Log records are likewise always written in the forward direction of the disk head. A log record is written whenever the status of a message operation changes under the two-phase commit protocol; thus log records may be written in response to Prepare, Prepared, Commit, Abort, and Acknowledge messages from a remote queue manager.
Another distinctive feature is that the entire queue can be scanned in a single pass. Furthermore, on-disk garbage collection is always a linear process. In addition, multiple queue entry map tables exist in the same file, and the unique sequence number of the latest table is stored on disk when the queue manager shuts down properly.
Importantly, read operations may follow a first-in first-out, last-in first-out or priority-based strategy, and no special provisions are required to implement any of these strategies.
Furthermore, the recovery procedure is accelerated by examining only the time stamps in the queue entry map tables, because the latest queue entry map table serves as the starting state of the recovery process. The log records following that table are read in order, and the in-memory copy of the latest queue entry map table is modified to reflect the changes made after the last known checkpoint.
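The recovery idea in the preceding paragraph can be sketched in a few lines of Python. This is a simplified in-memory illustration, not the disclosed implementation: the `Checkpoint` class, its fields, and the representation of a log record as a (message ID, status) pair are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    sequence_number: int                          # QEMT sequence number / time stamp
    entries: dict = field(default_factory=dict)   # message ID -> status


def latest_checkpoint(checkpoints):
    """Pick the table with the highest sequence number (only headers are compared)."""
    return max(checkpoints, key=lambda c: c.sequence_number)


def replay(checkpoint, log_records):
    """Apply the log records that follow the checkpoint to a copy of its state."""
    state = dict(checkpoint.entries)
    for message_id, status in log_records:
        state[message_id] = status
    return state


tables = [Checkpoint(3, {"m1": "active"}),
          Checkpoint(7, {"m1": "commit", "m2": "active"}),
          Checkpoint(5)]
start = latest_checkpoint(tables)
recovered = replay(start, [("m2", "pending"), ("m3", "active")])
```

Only the log records written after the chosen checkpoint are replayed; earlier segments are never scanned.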
[Brief description of the drawings]
FIG. 1 is a block diagram of a typical banking application using the system, in which messages flow from the head office to its branches;
FIG. 2 is a diagram of a conventional two-file system comprising a data file and a log file, in which data is recorded in one file and logs in another, the data is stored in non-contiguous sectors, and the entire log file must be scanned during recovery to reconstruct the complete, latest state of all messages in the system;
FIG. 3 is a diagram of the present system, which uses a single file to store the data and the QEMT mapping table, thereby minimizing hardware, reducing the scan time required for data recovery, and recovering lost data quickly;
FIG. 4 is a diagram showing the arrangement of data blocks in the file of FIG. 3, illustrating a circular file with a single write direction;
FIG. 5 is a diagram showing possible QEMT control blocks at various known positions or offsets in the file, illustrating that the location of valid data can readily be ascertained by using the QEMT control blocks;
FIG. 6 is a diagram showing that log records of state changes are interspersed among the message data blocks so that the file can be written in the forward direction;
FIG. 7 is a table showing the QEMT structure of the present system, including the QEMT sequence number, which serves as a time stamp, and the incremental checkpoint information needed to restore the system;
FIG. 8 is a table showing the information used to restore the status of individual messages;
FIG. 9 is a diagram showing the forward flow of data in a circular system that implements a circular queue;
FIG. 10 is a table showing information stored in the incremental log record by the log input of FIG.
FIG. 11 is a flowchart showing a procedure for retrieving a message from the queue;
FIG. 12 is a flowchart illustrating a procedure for writing a message to a queue;
FIG. 13 is a flowchart of the recovery process, in which the latest QEMT is identified by an initial scan and the log records following it are then read, resulting in a fully restored state.
BEST MODE FOR CARRYING OUT THE INVENTION
Referring to FIG. 1, a message queuing system 10 is installed between a bank branch 12 and a head office 14 for the purpose of transmitting updated account information from the branch to the head office. In addition, data is input to each of the terminals 16, 18 and 20 at different branches of the bank. This data is stored in the local database servers 22, 24 and 26 of each branch, respectively, each database server having its own local storage device, here designated by reference numeral 28.
The output of the database server is connected to a series of message queuing servers 30, 32 and 34, each having its own storage device, here designated by reference numeral 36.
Each message queuing server outputs its messages to the wide area network 40. The network delivers this output to a message queuing server 42 at the head office, which has a storage device 44 as shown. The message queuing servers 30, 32 and 34 thus communicate over the wide area network, by way of message queuing server 42, with a database server 50 to which a storage device 52 is connected, the output of message queuing server 42 being connected to database server 50 as shown. The information in this database can be viewed on the terminal 54 at the head office.
The purpose of the message queuing system is to allow updated account information to be transmitted reliably from the branches to the head office. It is also important that transactions at a branch can proceed whether or not the branch is currently connected to the head office.
Referring now to FIG. 2, in the prior art, messages and headers, illustrated at 60 and 62, were stored in sectors 66, 68, 70 and 72 of a data disk 64, where the messages and their associated headers were arranged at random.
At the same time, message status information was stored on a log disk 80, which held a record for each message stored on the data disk, including its arrival order and its position on the data disk. In addition, the transaction status of each message and its corresponding header was logged on log disk 80.
If transmission was interrupted, as indicated by the "X" at 82, the prior art required the entire log file to be scanned, as illustrated at 84, in order to reconstruct the latest state of the data disk file immediately before the interruption. As noted above, this is a time-consuming process, since the entire log file must be scanned to reconstruct the state of the system just prior to the crash. The situation is further complicated because the message and header information is stored in non-contiguous sectors of the data disk. In addition, the log file and the data file must be cross-referenced to find the messages that were uncorrupted at the time of the interruption.
Referring now to FIG. 3, in the present system message data 60 and message header information 62 are stored in contiguous sectors of a single disk storage device 90, here illustrated at 92, 94, 96 and 98. It is a feature of the present invention that, through the use of the queue entry map table, message and header information is stored in an accessible order, and message data is located through a checkpoint system described below.
It will be apparent that the message data is not scattered across the sectors but is stored contiguously as described above.
To make the data stored in file 90 accessible, the queue entry map table (QEMT) includes control information entries 100 and sector information comprising message blocks 102 and log records 104, all of which are designed to uniquely identify the sector in which the associated data and header are found. The QEMT thereby identifies the state of the system.
As will be seen in connection with FIGS. 4, 5 and 6, the queue entry map tables are stored in file 90, interspersed among the message data and header information.
Reference is now made to FIG. In one embodiment, file 90 is organized into adjacent sectors containing information blocks, illustrated here at 106. Information blocks enter from the left, as illustrated by arrow 108, and traverse the file from left to right, as illustrated by block number 1 entering at the left and block number 13 exiting at the right. It will be appreciated that this flow through adjacent blocks of the file establishes a write direction that never changes.
Reference is now made to FIG. The QEMT control information blocks 100 described above are interspersed among the other adjacent blocks 106, so that their locations define checkpoints at known offsets throughout file 90.
The purpose of distributing the QEMT control blocks at regular intervals is to allow the complete system state, including specific message data and header information, to be found quickly, in some cases simply by specifying a checkpoint number or checkpoint interval. Once a checkpoint interval has been identified as the last one holding valid information, the QEMT control block for that interval indicates where the valid data is to be found, and the neighboring blocks can be accessed once their identity and location have been established. Message data blocks and log record blocks may lie on either side of a control block.
Put another way, the QEMT control blocks give the recovery process known locations at which to examine the state of the system.
Referring now to FIG. 6, a block 106 is used either as a message data block, as illustrated at 110, or as an incremental log block, as illustrated at 112, the latter corresponding to the log record 104 of FIG. 3. These log records record state changes to messages in adjacent downstream blocks. Note that the control blocks merely provide known starting points for a search of the file, whereas the log records provide information about the individual messages in the file.
Referring back to FIG. 3, it will be appreciated that the log record 104 is only one of a number of consecutive log records associated with the data represented by a QEMT control block. These log records record changes to information in the preceding message block, so that the full history of changes to that particular message block can be traced.
Returning again to FIG. 6, note that a predetermined number of message blocks is bounded by QEMT control blocks, each of which identifies the additional message data blocks written after its checkpoint. Within this sector are transaction log records 112. The log record T1 can describe a change to any one of the message blocks. As indicated by arrow 114, information flows from left to right. Transaction log record T1 may describe a change of state for any message in the system: for example, confirmation that a message has been received and no longer needs to be retained, or an indication that a message has been sent but has not yet been received or acknowledged. This reflects the two-pass handshaking technique used to ensure message delivery in this type of system.
For example, transaction log record T1 may indicate that a new message has been added to the file at that particular point. It will be appreciated that the position of a log record is determined by where the write head is when the log record is created. Thus, a log record created at time T1 is written at the point in the file where the write head then happens to be, but it can refer to transactions and messages located anywhere in the file structure.
Similarly, transaction log records T2, T3 and T4, posted at their respective times, reflect further changes in the state of messages.
Because QEMT blocks and log record blocks can be inserted into a single file structure, which in one embodiment has a one-way information flow, the prior-art two-file structure can be eliminated entirely. Furthermore, by using the QEMT blocks and transaction log record blocks, the messages that are uncorrupted can be uniquely identified, so that the effect of an interruption can be diagnosed promptly and the system state can be recovered quickly after a system failure.
Referring next to FIG. 7, the structure of the queue entry map table header is shown at 120. In one embodiment, the header includes the number of queue file segments 122, the segment size 124, the QEMT sequence number or time stamp 126, the sequence number 128 of the last log record in the previous segment, the current segment number 130, the queue head pointer 132, the queue tail pointer 134, the next available block 136 in the current segment, the list of QEMT entries 138, the disk block reservation table 140, a list 142 of pending transactions for which the system acts as coordinator, and a list 144 of pending transactions for which it acts as receiver.
It will be understood that the information contained in the header is support information for the recovery process.
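For illustration, the header fields listed for FIG. 7 could be modeled as a record type. The Python sketch below is not taken from the embodiment: the field names and types are assumptions, field 122 is read here as a count of segments, and the reference numerals from the figure appear only as comments.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QemtHeader:
    segment_count: int        # 122 (assumed to be the number of segments)
    segment_size: int         # 124
    sequence_number: int      # 126, doubles as a time stamp
    last_log_sequence: int    # 128, last log record of the previous segment
    current_segment: int      # 130
    queue_head: int           # 132
    queue_tail: int           # 134
    next_free_block: int      # 136, next available block in the current segment
    entries: List[dict] = field(default_factory=list)                # 138
    reservation_table: List[bool] = field(default_factory=list)      # 140
    pending_as_coordinator: List[str] = field(default_factory=list)  # 142
    pending_as_receiver: List[str] = field(default_factory=list)     # 144

    def is_newer_than(self, other: "QemtHeader") -> bool:
        """Compare two checkpoints by sequence number, as recovery would."""
        return self.sequence_number > other.sequence_number


old = QemtHeader(8, 64, 41, 900, 2, 10, 17, 5)
new = QemtHeader(8, 64, 42, 955, 3, 12, 20, 1)
```

Comparing `sequence_number` alone is enough to order checkpoints, which is why recovery needs to read only the headers during its initial scan.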
Referring now to FIG. 8, each QEMT entry 138 includes a sequence number 146, a message ID 148, a message operation mode 150 that is either Qput or Qget, the node name 152 of the message receiver, the server name 154 of the message receiver, a transaction status 156 that is one of “active”, “pending”, “abort” or “commit”, a participant 2PC vote 158, which is the last known response received from the recipient, a set of additional flags 160, and a pointer 162 to the on-disk location of the message.
Thus, the queue entry map table provides accurate information about the status of the file and, more specifically, about any individual queue entry.
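The entry fields of FIG. 8 can likewise be sketched as a small record with enumerated operation modes and transaction statuses. This Python fragment is an illustrative assumption: the class, field, and helper names are hypothetical, and treating "active" and "pending" as the unresolved states is an interpretation made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Operation(Enum):
    QPUT = "Qput"
    QGET = "Qget"


class TxStatus(Enum):
    ACTIVE = "active"
    PENDING = "pending"
    ABORT = "abort"
    COMMIT = "commit"


@dataclass
class QemtEntry:
    sequence_number: int     # 146
    message_id: str          # 148
    operation: Operation     # 150
    receiver_node: str       # 152
    receiver_server: str     # 154
    status: TxStatus         # 156
    participant_vote: str    # 158, last known response from the recipient
    flags: int               # 160
    disk_location: int       # 162, pointer to the message on disk


def needs_resolution(entry: QemtEntry) -> bool:
    """Assumed rule: entries not yet committed or aborted are still in doubt."""
    return entry.status in (TxStatus.ACTIVE, TxStatus.PENDING)


entry = QemtEntry(1, "msg-0001", Operation.QPUT, "branch-12", "mq-30",
                  TxStatus.PENDING, "prepared", 0, 4096)
```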
Referring now to FIG. 9, because a single message is stored in adjacent blocks, retrieving it involves reading back adjacent blocks. As a result, head movement during a read operation is reduced.
In short, the prior art required the read head to traverse non-adjacent blocks in order to perform a read, which could take a considerable amount of time. In the present system, since messages are stored in adjacent blocks, only those adjacent blocks need to be traversed during a read operation. Similarly, in subsequent write operations, the head traverses only a limited portion of the file.
In short, because writes always proceed forward and wrap around circularly, the data is organized into adjacent blocks, which is the source of the advantages described above.
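The forward, circular write order can be expressed with modular arithmetic. The short Python sketch below is only an illustration; the function name and the flat block numbering are assumptions.

```python
def forward_blocks(start: int, count: int, total_blocks: int) -> list:
    """Block numbers touched by a write of `count` blocks beginning at `start`.

    The block number only ever advances; on reaching the end of the file it
    wraps to block 0, mirroring a circular file with one write direction.
    """
    return [(start + i) % total_blocks for i in range(count)]
```

For example, in an 8-block file a 4-block write starting at block 6 covers blocks 6, 7, 0 and 1, in that order.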
Referring now to FIG. 10, in one embodiment the transaction log record 112 of FIG. 6 includes a special log record marker 162, a sequence number 164, a message operation mode 166 indicating either a Qget or a Qput operation, a message ID 168, a set of operation flags 170, a transaction status 172 that is one of "active", "pending", "abort" or "commit", the participant 2PC vote 174 described above, and a pointer 176 to the on-disk location of the message in the queue file.
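One way to picture such a record on disk is as a fixed-size binary structure. The following Python sketch uses the standard `struct` module; the marker value, field widths, byte order, and numeric encodings are all assumptions for illustration and do not come from the patent.

```python
import struct

LOG_MARKER = 0x4C4F4752   # hypothetical marker value identifying a log record
# marker, sequence number, operation, message ID, flags, status, vote, pointer
LOG_FORMAT = "<IQB16sIBBQ"
OPERATIONS = ("Qget", "Qput")
STATUSES = ("active", "pending", "abort", "commit")


def pack_log_record(seq, operation, message_id, flags, status, vote, pointer):
    """Serialize one log record into a fixed-size byte string."""
    return struct.pack(LOG_FORMAT, LOG_MARKER, seq, OPERATIONS.index(operation),
                       message_id, flags, STATUSES.index(status), vote, pointer)


def unpack_log_record(raw: bytes) -> dict:
    """Parse a block; the marker distinguishes log records from message data."""
    marker, seq, op, msg_id, flags, status, vote, pointer = struct.unpack(LOG_FORMAT, raw)
    if marker != LOG_MARKER:
        raise ValueError("block does not start with the log record marker")
    return {"sequence": seq, "operation": OPERATIONS[op],
            "message_id": msg_id.rstrip(b"\x00"), "flags": flags,
            "status": STATUSES[status], "vote": vote, "pointer": pointer}


raw = pack_log_record(9, "Qput", b"msg-0001", 0, "commit", 1, 4096)
record = unpack_log_record(raw)
```

A leading marker of this kind lets a scanner tell log record blocks apart from message data blocks while reading the file in one forward pass.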
Referring now to FIG. 11, a flowchart for a read, or Qget, operation is shown. After starting as illustrated at 180, the queue head pointer is locked as illustrated at 182, which effectively locks the head of the list so that no other user can access the head entry. The system then increments the queue head pointer and sets the transaction status to “active read”, which marks the start of the handshaking process.
The system then unlocks the queue head pointer, as illustrated at 186, and reads the message from the on-disk queue file, as illustrated at 188. The QEM table is then locked as illustrated at 190, the log record is written as illustrated at 192, and the QEM table is unlocked as illustrated at 194. After the QEM table is unlocked, decision block 196 checks whether the message is transactional. If so, as illustrated at 198, the system invokes the two-phase “commit” protocol to carry out the handshaking. This completes the Qget, or read, operation.
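The locking order in these steps can be sketched as follows. This Python fragment is a simplified in-memory model under stated assumptions: a list stands in for the on-disk queue file, another list stands in for appended log records, and the two-phase commit step is reduced to a placeholder. All class and method names are hypothetical.

```python
import threading


class QueueSketch:
    def __init__(self, messages):
        self.disk = list(messages)    # stands in for the on-disk queue file
        self.head = 0                 # queue head pointer
        self.status = {}              # entry index -> transaction status
        self.log = []                 # stands in for appended log records
        self.head_lock = threading.Lock()
        self.qemt_lock = threading.Lock()

    def qget(self, transactional=False):
        with self.head_lock:          # lock, advance, then unlock the head pointer
            if self.head >= len(self.disk):
                raise IndexError("queue is empty")
            index = self.head
            self.head += 1
            self.status[index] = "active read"
        message = self.disk[index]    # read the message outside the head lock
        with self.qemt_lock:          # lock the table only while logging
            self.log.append((index, "Qget", self.status[index]))
        if transactional:
            self.two_phase_commit(index)
        return message

    def two_phase_commit(self, index):
        # Placeholder: a real system would exchange Prepare/Commit messages here.
        self.status[index] = "commit"
        with self.qemt_lock:
            self.log.append((index, "Qget", "commit"))


q = QueueSketch(["a", "b"])
first = q.qget()
second = q.qget(transactional=True)
```

Holding the head-pointer lock only while the pointer is advanced keeps the slower disk read outside the critical section.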
Next, the Qput, or write, operation will be described with reference to FIG. 12. Beginning as illustrated at 200, the queue tail pointer is locked as illustrated at 202, and a new QEM entry is created by incrementing the queue tail pointer as illustrated at 204. Thereafter, as illustrated at 206, the system fills in the control information of the QEM entry and sets the transaction status to “active control”. Thereafter, as illustrated at 208, the queue tail pointer is unlocked, and the QEM table is locked as illustrated at 210.
Next, as illustrated at 212, the system allocates on-disk blocks from the reservation table, and decision block 214 determines whether the allocated blocks cross a segment boundary. If they do, the system forces a QEMT checkpoint to be written to disk, as illustrated at 216; that is, the in-memory copy of the QEMT is written to disk. It will be understood that block 206 updates the in-memory copy of the QEM table state and the resulting QEM entry.
After any forced QEMT checkpoint has been written, the system writes the message data to disk and releases the lock on the QEM table, as illustrated at 218. Decision block 220 determines whether the message is transactional and, if so, the two-phase commit protocol is run, as illustrated at 221, to carry out the handshaking. The end of the write sequence is illustrated at 222. Block 220 refers to the handshaking protocol being run at the receiving end.
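The forced checkpoint at a segment boundary is the key step of this write path. The Python sketch below models only that step and omits locking and the commit protocol. It assumes that block 0 of every segment holds the QEMT and that a message that would not fit in the current segment is placed at the start of the next one; both are simplifications, and the names are hypothetical.

```python
class WriterSketch:
    def __init__(self, blocks_per_segment: int):
        self.blocks_per_segment = blocks_per_segment
        self.segment = 0
        self.next_block = 1            # block 0 is assumed to hold the QEMT
        self.sequence = 0              # QEMT sequence number
        self.file = [("QEMT", 0)]      # records in the order they reach the disk

    def allocate(self, blocks_needed: int):
        if blocks_needed > self.blocks_per_segment - 1:
            raise ValueError("message larger than a segment payload")
        if self.next_block + blocks_needed > self.blocks_per_segment:
            # Crossing a segment boundary: force a checkpoint to disk first.
            self.segment += 1
            self.sequence += 1
            self.file.append(("QEMT", self.sequence))
            self.next_block = 1
        start = self.next_block
        self.next_block += blocks_needed
        return self.segment, start

    def qput(self, message, blocks_needed: int = 1):
        position = self.allocate(blocks_needed)
        self.file.append(("MSG", message))   # message data follows, always forward
        return position


w = WriterSketch(blocks_per_segment=4)
positions = [w.qput(m) for m in ("a", "b", "c", "d")]
```

In this run the fourth message does not fit in segment 0, so a new checkpoint is appended before it, and the write order on disk stays strictly forward.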
Referring now to FIG. 13, the recovery sequence is illustrated. After starting as illustrated at 230, the queue table pointer is locked as illustrated at 232, and the system then restores the global data structures as illustrated at 234, which initializes the overall state of the system. Thereafter, as illustrated at 236, the system scans each QEMT in the queue file to find the latest QEMT, which establishes the most recent checkpoint before the interruption. Then, as illustrated at 238, the system scans the log records in the segment headed by the latest QEMT. The log records of that segment apply to the messages referenced by the entries in that QEMT.
As illustrated by decision block 240, the system determines whether there are more log records to scan. It will be understood that the QEMT, through its associated pointer, identifies the latest log record known at the checkpoint; however, there may in fact be later log records that still need to be scanned. For such a record, the system contacts the participant regarding the transaction status of the message, as illustrated at 242; in one example, the recipient is asked whether the message has been received. The system then invokes the two-phase “commit” protocol to resolve the transaction, as illustrated at 244. This reflects the fact that handshaking is a two-pass process: even if an acknowledgment from the recipient had already been returned, the handshaking process is resumed, using the recorded status, from the point at which the system stopped.
As illustrated at 246, the system updates the state of the reservation table to determine the new file pointer position. That is, the entire segment is scanned and the state of reservation table 140 is updated, and the new file pointer position is established by the current segment number 130 and the next available block 136 in the current segment.
As illustrated at 248, the system writes the new QEMT state to disk, at which point recovery is complete, as illustrated at 250.
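The last two stages of this sequence, resolving transactions that are still in doubt and re-establishing the write position, can be sketched as below. This Python fragment is an assumption-laden illustration: the participant is modeled as a callback that returns "commit" or "abort", "active" and "pending" are treated as the unresolved states, and the write position is derived under the simplifying assumption that the file has not wrapped around.

```python
def resolve_in_doubt(state: dict, ask_participant) -> dict:
    """Ask the participant about every message whose transaction is unresolved."""
    resolved = {}
    for message_id, status in state.items():
        if status in ("active", "pending"):
            status = ask_participant(message_id)   # stands in for steps 242 and 244
        resolved[message_id] = status
    return resolved


def new_write_position(used_blocks, blocks_per_segment: int):
    """Return (current segment, next available block) after scanning used blocks."""
    next_free = max(used_blocks, default=-1) + 1
    return divmod(next_free, blocks_per_segment)


state = resolve_in_doubt({"m1": "commit", "m2": "pending"}, lambda message_id: "abort")
position = new_write_position([0, 1, 2, 8, 9], blocks_per_segment=8)
```

Once every entry has a final status and the write position is known, a fresh checkpoint can be written and normal operation can resume.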
A program listing of one embodiment of the present invention, written in the C language, will now be given.

Claims

1. A message queuing system comprising:
means for transmitting a transaction message having a combined message queue that includes a queue state, message queue data, and log records; and
means, at a receiving site, for storing the transaction message queue data in a single file on a single disk, using an on-disk file structure in which the message queue data and the log records are combined,
wherein the single file has a plurality of control blocks, distributed at regular intervals in the file, storing control information about the status of the queue and the log records; a plurality of adjacent data blocks, arranged between one control block and the next, storing the message queue data; and log record blocks, arranged between groups of adjacent data blocks, storing the log records, which indicate state changes to messages in adjacent downstream blocks.

2. The message queuing system of claim 1, further comprising a read/write head for accessing the single disk and means for driving the head in a single forward direction during a write operation.

3. The message queuing system of claim 1, further comprising a queue entry map table disposed at a preselected location on the disk and having a control information block, at least one message block, and at least one log record.

4. The message queuing system of claim 3, wherein the preselected location corresponds to a fixed offset from the start of the file, whereby the latest status of the message queue data can be quickly identified.

5. The message queuing system of claim 4, wherein the receiving site further includes means, responsive to the latest queue entry map table written before an interruption of the transmission, for recovering the message queue upon the interruption, whereby the most recent valid information received and stored is recovered from the information contained in the latest queue entry map table.

6. The message queuing system of claim 5, wherein the file is divided into a plurality of sectors and the offset places a queue entry map table at the beginning of each sector, so that each table constitutes a checkpoint for locating the sector that holds valid information, whereby the most recent valid information before the interruption is found quickly by identifying the sector containing the latest table.

7. The message queuing system of claim 2, wherein the queue entry map table is written to adjacent blocks, yielding a system with a single file, a forward write direction and adjacent message queue data blocks, which minimizes seek time and promotes complete and rapid recovery from transmission interruptions.