JPWO2003090089A1 - Cache device

Info

Publication number: JPWO2003090089A1
Application number: JP2003586765A
Authority: JP
Inventors: 後藤 正徳; 新開 慶武
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-04-22
Filing date: 2002-04-22
Publication date: 2005-08-25
Also published as: WO2003090089A1

Abstract

The present invention provides a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores that data in the secondary storage device. The cache device that receives data from the access host stores the data in its own cache memory and transmits it to the other cache device. The other cache device receives the transmitted data and stores it in its own cache memory. Either cache device outputs the data held in its cache memory to the secondary storage device, where it is stored. After the data has been stored in the secondary storage device, the cache device that performed the store transmits a flush message to the other cache device. Because the data is stored in both cache devices, the bottleneck of a single cache device is eliminated, and data loss is prevented even if one of the cache devices fails.

Description

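Technical Field
The present invention relates to a cache device, a cache system, and a cache method for temporarily storing data. In particular, it relates to each cache device in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores the data from the access host in the secondary storage device, as well as to that cache system and cache method.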
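Background Art
Conventionally, a secondary storage device such as a hard disk was used by only one access host. In recent years, however, secondary storage device groups that consist of multiple secondary storage devices and are shared by multiple access hosts over a storage network or the like have come into wide use. Access hosts read data from and write data to these secondary storage device groups over the storage network.
Because of its structure, such a secondary storage device often fails to deliver its full throughput and response performance, depending on the access pattern requested. In addition, when RAID (Redundant Array of Inexpensive Disks) 5 is used, the overhead required for writes is large.
For this reason, it is common practice to improve performance by using a cache device (cache memory) to temporarily hold relatively frequently used data and data to be written to the secondary storage device.
A cache device uses fast primary storage to temporarily hold data read from, or to be written to, the secondary storage device, and it contributes greatly to the performance of a computer system.
In particular, when traffic between the shared secondary storage device group and the access hosts must pass through a device of the type that intercepts data in the middle, such as an address translation device for storage virtualization or a file system appliance, using a cache device on that device is an effective way to improve performance.
However, a single cache device often becomes a bottleneck and degrades performance. With a single cache device, all accesses to the secondary storage device group pass through that one device. Consequently, even if access hosts and secondary storage devices are added, sufficient throughput cannot be secured at the cache device, and overall performance is determined by the performance limit of the cache device.
In addition, with a single cache device, a failure of that device causes data loss, which in turn leads to problems such as increased downtime.
That is, a state in which the cache device holds data not yet written to the secondary storage device group, and that data exists only in the cache device, can easily arise at any time. If the cache device fails in such a state, the data that exists only in the cache device is lost and cannot be recovered.
Increased system downtime due to cache device failures also damages the system in terms of both cost and reliability.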
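One conceivable solution to the bottleneck problem is to provide multiple cache devices. This approach assigns multiple cache devices to multiple access hosts in an attempt to remove the bottleneck and secure scalability.
However, if one access host writes data to one cache device and another access host writes data with the same address but different contents to the other cache device, the two cache devices end up caching different data. If this state is left as it is, the cached data for that address remains inconsistent between the cache devices, and consistency is lost.
Also, if one cache device stops because of a failure, the other cache device holds different data, so transparency is no longer maintained.
Furthermore, as long as the cache devices operate independently of each other, the problem that written cache data is lost through a single failure and cannot be recovered remains unsolved.
There are methods for keeping written data consistent across multiple cache devices.
A common method uses tokens. Multiple cache devices exchange information called tokens and guarantee data consistency through mutual exclusion. However, token control has a high message cost (the cost of communicating tokens) and takes a relatively long time because of the token communication time. In a device such as a cache device, which is provided to increase processing speed, this method therefore risks becoming a bottleneck itself.
There are also systems that use memory shared among multiple cache devices and perform mutual exclusion on this memory. A cache device that uses such shared memory loses its cache data if the shared cache memory device and its management unit fail. Moreover, if volatile memory is used for the cache memory, data may also be lost through a power failure. Nonvolatile memory must therefore be used, which raises the cost.
Thus, although consistency and transparency must be maintained for written data when cache devices are used, many existing methods have problems such as high cost.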
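Disclosure of the Invention
An object of the present invention is to eliminate the bottleneck of a single cache device.
Another object of the present invention is to prevent loss of the data stored in a cache device even when the cache device fails.
A cache device according to the present invention is each cache device in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores the data from the access host in the secondary storage device. The cache device comprises: a data input unit that inputs first data supplied from the access host; a data receiving unit that receives second data that the other cache device has input from the access host and transmitted to the own cache device; a cache storage unit that stores both or either of the first data and the second data; a cache management unit that manages the cache storage unit; a data transmitting unit that transmits the first data to the other cache device; and a data output unit that outputs the first data or the second data to the secondary storage device.
A cache system according to the present invention has two cache devices, stores data supplied from an access host or a secondary storage device, and stores the data from the access host in the secondary storage device. Each of the two cache devices comprises: a data input unit that inputs first data supplied from the access host; a data receiving unit that receives second data that the other cache device has input from the access host and transmitted to the own cache device; a cache storage unit that stores both or either of the first data and the second data; a cache management unit that manages the cache storage unit; a data transmitting unit that transmits the first data to the other cache device; and a data output unit that outputs the first data or the second data to the secondary storage device.
A cache method according to the present invention is a cache method in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores the data from the access host in the secondary storage device. The cache device that receives data from the access host stores the data in its own cache memory and transmits it to the other cache device; the other cache device receives the transmitted data and stores it in its own cache memory; and one or the other cache device outputs the data to the secondary storage device.
According to the present invention, two cache devices are provided in the cache system. Each cache device inputs, from an access host, data to be stored in the secondary storage device and outputs (stores) it to the secondary storage device. Twice the processing capacity of a single cache device can therefore be obtained, which eliminates the bottleneck of a single cache device.
Also according to the present invention, the cache device that receives data from the access host stores the data in its own cache memory and transmits the received data to the other cache device. The other cache device stores the transmitted data in its own cache memory. This makes data nonvolatile within the cache system. Therefore, even if one cache device fails while the data has not yet been stored in the secondary storage device, the data can be obtained from the other cache device, and data loss due to the failure is prevented.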
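Preferably, the cache device further comprises: a collision detection unit that, when the data receiving unit receives the second data after the data input unit inputs the first data and before the data transmitting unit completes transmission of the first data, determines whether the first data and the second data collide, based on the address ranges of both on the secondary storage device and on the contents of both; a collision detection message transmitting unit that, when the collision detection unit detects a collision, transmits a collision detection message indicating the collision to the other cache device; and a collision detection message receiving unit that receives the collision detection message from the other cache device.
The collision detection unit also detects a collision when the collision detection message receiving unit receives the collision detection message from the other cache device.
Thus, even when two pieces of data with the same address range and different contents are received from access hosts by the two cache devices respectively, the collision between them can be detected.
When a collision is detected, the cache management unit can treat as valid whichever of the first data and the second data has priority under a predetermined priority order, and treat the other as invalid.
Alternatively, when a collision is detected, the cache management unit can compare the time at which the first data was input to the data input unit with the time at which the second data was input to the data input unit of the other cache device, treat the data with the earlier time as valid, and treat the data with the later time as invalid.
Further, the cache device may comprise a data/message retransmitting unit that, when a collision is detected, transmits the first data, or a retransmission message indicating retransmission of the first data, after a random time has elapsed, and a data/message receiving unit that receives the second data, or the retransmission message for the second data, transmitted from the data/message retransmitting unit of the other cache device. The cache management unit can then treat as valid the data corresponding to the earlier of the retransmission time at the data/message retransmitting unit and the reception time at the data/message receiving unit, and treat the data corresponding to the later time as invalid.
Any of these resolves the collision state and secures the consistency and transparency of data between the two cache devices.
Preferably, the data transmitting unit transmits, together with the first data, first flush authority information indicating which cache device is to output the first data to the secondary storage device; the data receiving unit receives, together with the second data, second flush authority information indicating which cache device is to output the second data to the secondary storage device; and the data output unit outputs the first data to the secondary storage device when the first flush authority information indicates the own cache device, and outputs the second data to the secondary storage device when the second flush authority information indicates the own cache device.
This enables load balancing between the two cache devices.
Preferably, the cache device further comprises a failure monitoring unit that monitors the other cache device for failure and, on detecting a failure, controls the data output unit so that it outputs the following to the secondary storage device: those first data in the cache storage unit for which neither the data output unit nor the data output unit of the other cache device has completed output to the secondary storage device and for which the data transmitting unit has completed transmission to the other cache device; and those second data in the cache storage unit for which neither the data output unit nor the data output unit of the other cache device has completed output to the secondary storage device.
Thus, even when one cache device fails, data that exists in a cache device but not in the secondary storage device can be reliably flushed (stored) to the secondary storage device.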
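Best Mode for Carrying Out the Invention
FIG. 1 is a block diagram showing the overall configuration of a secondary storage device access system that uses a cache system according to an embodiment of the present invention. This secondary storage device access system (hereinafter simply the "access system") comprises cache system 3, access host group 4, secondary storage device group 5, access network 6, and storage network 7.
Cache system 3 has two cache devices 1 and 2, which have the same configuration. Cache device 1 has cache management unit 11, cache memory 12, input/output units 13 and 14, message communication unit 15, and failure monitoring unit 16. Cache device 2 has cache management unit 21, cache memory 22, input/output units 23 and 24, message communication unit 25, and failure monitoring unit 26.
Access host group 4 has n access hosts 4_1 to 4_n (n is an integer of 2 or more). Each access host writes data (hereinafter "storage data") to secondary storage device group 5 via cache system 3, and reads storage data stored in secondary storage device group 5 (or in cache system 3) via cache system 3. Each access host is, for example, a computer.
Secondary storage device group 5 is secondary storage shared by the access hosts of access host group 4 and has m secondary storage devices 5_1 to 5_m (m is an integer of 2 or more). Each secondary storage device is assigned a device number (for example, a serial number) that uniquely identifies it. By specifying this device number, access host group 4 and cache system 3 can identify one secondary storage device in secondary storage device group 5, and by further specifying an address (or block number), they can identify storage data on that device. Each secondary storage device is, for example, a hard disk, a magneto-optical disk (MO), or an optical disk (for example, DVD-RAM).
Access network 6 is, for example, a SCSI network, Fibre Channel, or a LAN (Ethernet). Storage network 7 is, for example, Fibre Channel.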
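Access hosts 4_1 to 4_n, input/output unit 13 of cache device 1, and input/output unit 23 of cache device 2 are connected to access network 6. Access hosts 4_1 to 4_n can thus send storage data to cache device 1 or 2, and receive storage data from cache device 1 or 2, over access network 6.
Secondary storage devices 5_1 to 5_m, input/output unit 14 of cache device 1, and input/output unit 24 of cache device 2 are connected to storage network 7. Cache devices 1 and 2 can thus send (write) storage data to secondary storage device group 5, and receive (read) storage data from it, over storage network 7.
Cache devices 1 and 2 are configured independently so that a failure of one does not affect the other. Because cache devices 1 and 2 accept input and output from access host group 4 independently, the system can, as described later, deliver twice the performance of a single cache device while maintaining the transparency of storage data.
Input/output unit 13 (23) (reference numerals in parentheses denote the corresponding components of cache device 2; the same applies below) performs communication processing, such as protocol processing, on storage data sent and received over access network 6. Input/output unit 14 (24) performs communication processing on storage data sent and received over storage network 7.
Cache memory 12 (22) is a storage device (such as RAM) that can be accessed (read and written) faster than the secondary storage devices of secondary storage device group 5.
Cache management unit 11 (21) holds a cache control table (described later) and manages the storage data held in cache memory 12 (22) based on this table. Cache management unit 11 (21) also controls input/output units 13 (23) and 14 (24), failure monitoring unit 16 (26), and message communication unit 15 (25). It writes storage data input from input/output unit 13 (23) or 14 (24) or from message communication unit 15 (25) into cache memory 12 (22) and sends it from the input/output units; sends storage data read from cache memory 12 (22) via input/output unit 13 (23) or 14 (24); and exchanges control messages and storage data with the other cache device via message communication unit 15 (25).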
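FIG. 2 shows an example configuration of the cache control table held by cache management unit 11 (21). Separate cache control tables are provided for cache memory 12 and cache memory 22; the former is held by cache management unit 11 and the latter by cache management unit 21.
The cache control table has a cache control list for each piece of storage data currently stored in cache memory 12 (22). Each cache control list has the following data items: element number, device number, device start address, data length, cache start address, state, and flags.
The "element number" is the element number of storage data currently stored in cache memory 12 (22). One element corresponds to one unit of storage data written by, or read by, an access host of access host group 4. One element may correspond to 1 byte of storage data or to multiple bytes.
For example, if an access host 4_i (i is an integer from 1 to n) sends storage data in 512-byte blocks and one such block is stored in cache memory 12 (22), one element corresponds to one block (512 bytes) of storage data. If another access host 4_j (j is an integer from 1 to n) sends two blocks of storage data and these two blocks are stored in cache memory 12 (22), one element corresponds to two blocks of storage data.
The "device number" is the device number (for example, the serial number) of the secondary storage device in secondary storage device group 5 in which the storage data is to be stored. The "device start address" is the start (first) address at which the storage data is stored on the secondary storage device with that device number. The "data length" is the length (number of bytes) of the storage data.
When storage data is read from and written to each secondary storage device in blocks of multiple bytes (for example, 512 bytes), the device start address may instead hold the start block number, at which the first block is written, and the data length may hold the end block number, at which the last block is written. For example, when two blocks of storage data are stored in the storage areas of block 5 and block 6 of a secondary storage device, the start block number can be 5 and the end block number 6.
The "cache start address" is the start (first) address at which the storage data is stored in cache memory 12 (22).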
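The "state" indicates the state of the storage data. The states are "received", "dirty", "non-volatile", "flushing", "flushed msg", "clean", and "invalidated".
The received state lasts from when the storage data is received from an access host until transmission of the copy message (described later) for this storage data is complete, or until the first acknowledgment message (described later) for this copy message is received.
The dirty state is the state in which the first acknowledgment message for the storage data has been received but the storage data has not yet been written to the secondary storage device.
The non-volatile state is the state in which storage data contained in a copy message sent from the other cache device has been stored in the cache memory and has not yet been written to the secondary storage device group.
The flushing state, unlike the dirty state, is the state during which the storage data is being written to the secondary storage device group.
The flushed msg state is the state in which writing (flushing) of the storage data to the secondary storage device is complete and a flush message (described later) is being sent to the other cache device so that it changes the state of its copy of the same storage data from non-volatile to clean.
The clean state is the state in which writing of the storage data to the secondary storage device has finished and the other cache device has been notified of this. In this state the storage data in cache memory 12 (22) may be discarded (overwritten, erased, and so on) at any time, but it is kept in cache memory 12 (22) in case access host group 4 reads it.
The invalidated state is the state in which storage data has been invalidated after the clean state. Storage data in this state is discarded (overwritten, erased, and so on) immediately or after a predetermined time.
The "flags" indicate whether the cache device holds flush authority, whether the write is in a collision state, whether a host acknowledgment message (ACK) has already been returned to the access host, and so on. Flush authority, the write collision state, and acknowledgment messages are described later.
When cache management unit 11 (21) receives and processes data from access host group 4 or secondary storage device group 5, it refers to the cache control table it holds and, as a result of the processing, updates the table as needed.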
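One entry of the cache control table (FIG. 2) can be pictured as a small record. The following is a minimal sketch, not the patent's implementation: the class and field names (`CacheControlList`, `element_no`, `device_end`, and so on) are hypothetical, and the three flags are modelled as separate booleans purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    """The seven states a cache control list can be in."""
    RECEIVED = "received"
    DIRTY = "dirty"
    NON_VOLATILE = "non-volatile"
    FLUSHING = "flushing"
    FLUSHED_MSG = "flushed msg"
    CLEAN = "clean"
    INVALIDATED = "invalidated"


@dataclass
class CacheControlList:
    """One entry of the cache control table (field names are assumptions)."""
    element_no: int
    device_no: int        # which secondary storage device
    device_start: int     # start address on that device, in bytes
    length: int           # data length, in bytes
    cache_start: int      # start address in cache memory
    state: State
    flush_authority: bool = False   # flag: this device performs the flush
    collided: bool = False          # flag: write is in a collision state
    host_acked: bool = False        # flag: host ACK already returned

    def device_end(self) -> int:
        """First address past the end of the data on the device."""
        return self.device_start + self.length


# Two 512-byte blocks stored at blocks 5 and 6 of device 7.
entry = CacheControlList(element_no=1, device_no=7, device_start=5 * 512,
                         length=2 * 512, cache_start=0, state=State.RECEIVED)
```

The cache control table itself would then be a collection of such entries, one per piece of storage data currently held in cache memory.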
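Message communication units 15 and 25 are connected to each other by communication line L and exchange storage data, control messages, and the like. Communication line L is, for example, a PCI bus (Peripheral Component Interconnect Bus) or Gigabit Ethernet, and its communication speed is preferably higher than that of access network 6.
FIG. 3 shows an example data structure of the control messages exchanged between message communication units 15 and 25. A control message has a header part, which holds control information, and a data part, which holds storage data. The header part contains the following data items: type, device number, device start address, data length, sequence number, state, and flush authority.
The "type" is the type of the control message. The types include "COPY" for a copy message; "C-ACK" for the acknowledgment of a copy message (the first acknowledgment message); "FLUSHED" for a flush message; "F-ACK" for the acknowledgment of a flush message (the second acknowledgment message); "COLLISION" for a collision detection message; "COL-ACK" for the acknowledgment of a collision detection message (the third acknowledgment message); and "MONITOR" for a failure monitoring message. Each of these control messages is described later.
The "device number" is, for a copy message, the device number of the secondary storage device in which the storage data in the data part is to be stored; for a flush message, the device number of the secondary storage device in which the flushed storage data was stored; for a collision detection message, the device number of the secondary storage device in which the storage data involved in the collision is to be stored; and for an acknowledgment message, the same device number as in the corresponding copy message, flush message, or collision detection message.
The "device start address" is, for a copy message, the start address on the secondary storage device at which the storage data in the data part is to be stored; for a flush message, the start address at which the flushed storage data was stored; for a collision detection message, the start address at which the storage data involved in the collision is to be stored; and for an acknowledgment message, the same start address as in the corresponding copy message, flush message, or collision detection message.
The "data length" is, for a copy message, the length (in bytes, blocks, or the like) of the storage data in the data part; for a flush message, the length of the flushed storage data; and for a collision detection message, the length of the storage data involved in the collision. The data length of an acknowledgment message is set to 0.
The "state" is the same data as the state in the cache control table described above. The "sequence number" is a serial number attached to each control message sent. Cache management units 11 and 21 each assign serial numbers, starting from 1 for example, to the control messages they send, in order of transmission. This makes the time order of the sent control messages explicit. By tracking the sequence numbers of received control messages, the receiving cache management unit 11 (21) can determine the exact transmission order even if control messages arrive in a different order from that in which they were sent.
The "flush authority" indicates whether cache device 1 or cache device 2 performs the flush (the processing that stores storage data held in cache memory 12 (22) in the secondary storage device). When flush authority is not specified, this field is set to a value other than those denoting cache devices 1 and 2 (for example, Null).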
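The header and data parts of a control message (FIG. 3), together with per-sender sequence numbering, might be modelled as follows. This is an illustrative sketch only: the `ControlMessage` and `Sender` names and the in-memory representation are assumptions, and no wire format is implied. `None` stands in for the "Null" flush authority value.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional

# Message types listed in the description of FIG. 3.
MESSAGE_TYPES = {"COPY", "C-ACK", "FLUSHED", "F-ACK",
                 "COLLISION", "COL-ACK", "MONITOR"}


@dataclass(frozen=True)
class ControlMessage:
    msg_type: str
    device_no: int
    device_start: int
    length: int                 # 0 for acknowledgment messages
    seq: int                    # serial number assigned by the sender
    state: str
    flush_authority: Optional[int] = None  # 1, 2, or None when unspecified
    data: bytes = b""           # data part (storage data), if any


class Sender:
    """Assigns sequence numbers 1, 2, 3, ... in order of transmission."""

    def __init__(self) -> None:
        self._seq = count(1)

    def make(self, msg_type: str, device_no: int, device_start: int,
             length: int = 0, state: str = "",
             flush_authority: Optional[int] = None,
             data: bytes = b"") -> ControlMessage:
        if msg_type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {msg_type}")
        return ControlMessage(msg_type, device_no, device_start, length,
                              next(self._seq), state, flush_authority, data)


sender = Sender()
copy_msg = sender.make("COPY", 7, 2560, length=4, state="received", data=b"abcd")
ack_msg = sender.make("C-ACK", 7, 2560)

# Even if messages arrive out of order, sorting by sequence number
# recovers the order in which they were sent.
in_send_order = sorted([ack_msg, copy_msg], key=lambda m: m.seq)
```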
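Failure monitoring unit 16 (26) monitors the other cache device for failure and, when it detects one, performs the processing needed for recovery. For example, failure monitoring units 16 and 26 send failure detection messages to each other at fixed intervals via message communication units 15 and 25. If one failure monitoring unit does not receive a failure detection message from the other within a predetermined time, it determines that the other cache device has failed.
On detecting a failure of the other cache device, failure monitoring unit 16 (26) rewrites, as needed, the states in the cache control table held by cache management unit 11 (21) of its own cache device 1 (2) and executes failure recovery processing. This failure recovery processing is described in detail later.
In an access system configured in this way, access host group 4 writes storage data to secondary storage device group 5 via cache system 3 and reads storage data from secondary storage device group 5 via cache system 3. The write and read processing, and the failure recovery processing performed when one cache device of cache system 3 fails, are described in detail below.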
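＜Storage data write processing＞
The following describes the processing when an access host 4_i of access host group 4 writes storage data to secondary storage device group 5 via cache device 1 or 2.
FIG. 4 is a sequence diagram showing the flow of processing for writing storage data sent from access host 4_i to secondary storage device group 5.
First, access host 4_i sends storage data to cache device 1 or 2 over access network 6. Which of the two it sends to is set in access host 4_i in advance. The host may be set to send to only one of cache devices 1 and 2, or to send to them alternately. Alternatively, access host 4_i may send a write request signal to both cache devices; whichever is idle returns a data-receivable message to access host 4_i, and transmission of the storage data then starts (if both cache devices return a data-receivable message, access host 4_i selects one). The description below takes as an example the case in which access host 4_i sends storage data to cache device 1.
Access host 4_i sends storage data and control data to input/output unit 13 of cache device 1 over access network 6. The control data is attached, for example, as a header of the storage data. It contains the device number of the secondary storage device in which the storage data is to be stored, the device start address, and the data length (hereinafter, the device number, device start address, and data length are collectively called the "address range").
Input/output unit 13 passes the control data and storage data sent from access host 4_i to cache management unit 11.
Cache management unit 11 determines, based on the address ranges in the cache control table for cache memory 12, whether cache memory 12 already holds storage data (storage data b) whose address range overlaps that of the storage data sent from access host 4_i (storage data a).
If no storage data b with an overlapping address range exists, cache management unit 11 generates a cache control list for storage data a and adds it to the cache control table. It then writes storage data a into the memory cells of cache memory 12 starting at the cache start address of the generated cache control list (S1).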
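Address ranges can overlap in the following ways, shown in FIGS. 7A to 7E: (1) the address ranges of storage data a and storage data b are exactly the same (FIG. 7A); (2) the address range of storage data a contains that of storage data b (FIG. 7B); (3) the address range of storage data a is contained in that of storage data b (FIG. 7C); and (4) the address ranges of storage data a and storage data b each have a part that overlaps and a part that does not (FIGS. 7D and 7E).
In case (1), cache management unit 11 updates the device number, device start address, and other items of the cache control list for storage data b to those of storage data a. Alternatively, cache management unit 11 may generate a cache control list for storage data a, add it to the cache control table, and either set the state of the cache control list for storage data b to "invalidated" or delete that cache control list. Cache management unit 11 then writes storage data a into the area of cache memory 12 starting at the cache start address of the cache control list (S1).
The area of cache memory 12 into which storage data a is written may be the same as the area in which storage data b was stored, or another free area. In the former case, storage data b is overwritten by storage data a. In the latter case, storage data b remains in cache memory 12, but because its cache control list has been set to "invalidated" or deleted from the cache control table (including by overwriting), it is not treated as valid storage data. Storage data b is therefore later overwritten by other storage data. The same applies to cases (2) to (4) below.
Case (2) is processed in the same way as case (1) (S1). The cache control list of storage data b is therefore removed from the cache control table.
In case (3), for the overlapping part of the address range, cache management unit 11 generates a cache control list for storage data a, adds it to the cache control table, and writes storage data a into cache memory 12 (S1). For the two parts of storage data b that remain after removing the part overlapping storage data a, it generates (or updates) a cache control list for each and adds them to the cache control table.
In case (4), cache management unit 11 generates a cache control list for storage data a, adds it to the cache control table, and writes storage data a into cache memory 12 (S1). For the single part of storage data b that remains after removing the part overlapping storage data a, it generates or updates a cache control list and adds it to the cache control table.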
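The four overlap cases, and the parts of storage data b that survive in cases (3) and (4), can be sketched as two small functions. This is an illustration under stated assumptions: both ranges are on the same secondary storage device, ranges are half-open byte intervals `[start, start + length)`, and the function names are hypothetical.

```python
from typing import List, Tuple


def classify_overlap(a_start: int, a_len: int, b_start: int, b_len: int) -> str:
    """Classify how new data a overlaps cached data b (same device assumed)."""
    a_end, b_end = a_start + a_len, b_start + b_len
    if a_end <= b_start or b_end <= a_start:
        return "none"
    if (a_start, a_end) == (b_start, b_end):
        return "identical"        # case (1), FIG. 7A
    if a_start <= b_start and b_end <= a_end:
        return "a_contains_b"     # case (2), FIG. 7B
    if b_start <= a_start and a_end <= b_end:
        return "b_contains_a"     # case (3), FIG. 7C
    return "partial"              # case (4), FIGS. 7D and 7E


def remaining_parts_of_b(a_start: int, a_len: int,
                         b_start: int, b_len: int) -> List[Tuple[int, int]]:
    """Return (start, length) of the parts of b not covered by a.

    Each returned part would get its own cache control list.
    """
    a_end, b_end = a_start + a_len, b_start + b_len
    parts = []
    if b_start < a_start:                       # part of b before a
        head_end = min(a_start, b_end)
        parts.append((b_start, head_end - b_start))
    if a_end < b_end:                           # part of b after a
        tail_start = max(a_end, b_start)
        parts.append((tail_start, b_end - tail_start))
    return parts
```

For example, with b covering bytes 0 to 7 and a covering bytes 2 to 5, `remaining_parts_of_b(2, 4, 0, 8)` yields the two leftover pieces of b on either side of a, matching case (3).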
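"received" is written to the state of the cache control list for storage data a, indicating that storage data a is in the received state.
Next, cache management unit 11 sends a copy message (COPY) to message communication unit 25 of cache device 2 via message communication unit 15 and communication line L (S2). The device number, device start address, and data length of the cache control list for storage data a are written to the corresponding fields in the header part of this copy message, and storage data a is placed in its data part.
Message communication unit 25 passes the copy message sent from cache device 1 to cache management unit 21. On receiving the copy message, cache management unit 21 performs, on storage data a in the data part, the same processing as step S1 of cache management unit 11 described above.
That is, cache management unit 21 updates or generates a cache control list based on the cache control table for cache memory 22 and writes storage data a into cache memory 22 (S7). Once all of storage data a has been written into cache memory 22, storage data a is not lost even if one of cache devices 1 and 2 stops because of a failure. In other words, storage data a has been made nonvolatile. To represent this, "non-volatile" is written to the state of the cache control list for storage data a held by cache management unit 21.
After writing storage data a into cache memory 22, cache management unit 21 sends a first acknowledgment message (C-ACK), indicating that the write completed normally, to message communication unit 15 via message communication unit 25 and communication line L (S8).
Message communication unit 15 passes the first acknowledgment message to cache management unit 11. On receiving it, cache management unit 11 sends an acknowledgment message for the access host (a host acknowledgment message) to access host 4_i via input/output unit 13 and access network 6 (S3). From this host acknowledgment message, access host 4_i learns that storage data a has been stored in cache system 3 (cache devices 1 and 2) and made nonvolatile.
When it sends the host acknowledgment message, cache management unit 11 updates the state of the cache control list for storage data a from "received" to "dirty".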
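Then, at a suitable time, cache management unit 11 updates the state of the cache control list for storage data a in cache memory 12 to "flushing" and sends storage data a to a secondary storage device (call it 5_k) of secondary storage device group 5 via input/output unit 14 and storage network 7 (S4). Secondary storage device 5_k is the device that corresponds to the device number in the cache control list for storage data a. The storage data a sent is written to the area of secondary storage device 5_k starting at the device start address (S4).
When transmission (writing) of storage data a to secondary storage device 5_k is finished, cache management unit 11 updates the state in the cache control table to "flushed msg" and sends a flush message (FLUSHED) to message communication unit 25 via message communication unit 15 and communication line L (S5). The flush message is passed from message communication unit 25 to cache management unit 21. Cache management unit 21 thereby learns that the flush (that is, making storage data a nonvolatile on the secondary storage device) is complete and that the copy of storage data a in cache memory 22 can be safely erased.
Next, cache management unit 21 updates the state of the cache control list for storage data a in cache memory 22 to "clean" and, to notify cache device 1 of the transition to the clean state, sends a second acknowledgment message (F-ACK) to message communication unit 15 via message communication unit 25 and communication line L (S9).
On receiving the second acknowledgment message from message communication unit 15, cache management unit 11 updates the state of its cache control list to "clean" so that storage data a is treated as clean.
Cache management units 11 and 21 can update the state of a cache control list in the clean state to "invalidated" at any time (S6, S10). For example, they can put storage data a into the invalidated state when clean storage data a is no longer needed, or when cache memory 12 or 22 runs out of free space and storage data a must be erased. They can also use the LRU (Least Recently Used) algorithm to invalidate the storage data that was read least recently.
A cache control list in the invalidated state is later deleted from the cache control table, or overwritten by the cache control list of new storage data, for example when new storage data is written to the cache memory. The cache memory area holding invalidated storage data is likewise overwritten by new storage data.
This completes the write processing of cache system 3. Cache devices 1 and 2 of cache system 3 can each receive storage data from the access host group and execute write processing independently. Cache system 3 therefore has nearly twice the processing capacity of a system with only one cache device, which eliminates the bottleneck of a single cache device. Moreover, because storage data is held in at least one cache device or in secondary storage device group 5, it is not lost even if one cache device fails.
When cache device 2 receives storage data from access host 4_i, the same processing is executed with the roles of cache devices 1 and 2 swapped.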
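The state transitions of the write sequence (steps S1 to S9 of FIG. 4) can be traced with a deliberately simplified, synchronous model. It is a sketch only: the `Cache` class is hypothetical, messages over communication line L are modelled as direct method calls whose return stands in for the acknowledgment, a single address key replaces the full address range, and a plain dictionary plays the role of the secondary storage device.

```python
class Cache:
    """Toy model of one cache device; message passing is a method call."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.peer = None      # the other cache device
        self.data = {}        # address -> storage data
        self.state = {}       # address -> state string

    def host_write(self, addr: int, payload: bytes) -> str:
        self.data[addr] = payload
        self.state[addr] = "received"        # S1: store in own cache memory
        self.peer.on_copy(addr, payload)     # S2: COPY; returning models C-ACK
        self.state[addr] = "dirty"           # S3: host ACK is sent at this point
        return "HOST-ACK"

    def on_copy(self, addr: int, payload: bytes) -> None:
        self.data[addr] = payload
        self.state[addr] = "non-volatile"    # S7: copy stored; S8: C-ACK

    def flush(self, addr: int, disk: dict) -> None:
        self.state[addr] = "flushing"        # S4: write to secondary storage
        disk[addr] = self.data[addr]
        self.state[addr] = "flushed msg"     # S5: FLUSHED sent to peer
        self.peer.on_flushed(addr)           # returning models F-ACK
        self.state[addr] = "clean"           # after F-ACK is received

    def on_flushed(self, addr: int) -> None:
        self.state[addr] = "clean"           # S9: peer copy may now be discarded


cache1, cache2 = Cache("1"), Cache("2")
cache1.peer, cache2.peer = cache2, cache1
disk = {}

ack = cache1.host_write(0x100, b"a")
states_after_write = (cache1.state[0x100], cache2.state[0x100])
cache1.flush(0x100, disk)
states_after_flush = (cache1.state[0x100], cache2.state[0x100])
```

Note that the host acknowledgment is returned only after the peer holds a copy, which is the point at which the data is considered nonvolatile.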
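Note that in step S7, cache device 2 may have received from access host 4_j another piece of storage data (storage data c) with the same address range as storage data a sent from cache device 1 over communication line L, and storage data c may be in the received state in cache device 2. That is, cache devices 1 and 2 may receive storage data a and c, which have the same address range but different contents, from different access hosts at almost the same time. The processing in this case is described later under the processing for write data collisions.
In step S1, storage data b that is overwritten and erased by storage data a may be in the received state or the flushed msg state, with a first or second acknowledgment message for storage data b in transit from cache device 2 to cache device 1. In this case, cache device 1 (cache management unit 11) ignores these acknowledgment messages from cache device 2. That is, even when cache device 1 receives them, it simply discards them and does not execute the processing that normally accompanies receipt of an acknowledgment message.
Whether to ignore an acknowledgment message is decided from the sequence number contained in the message. For example, if sequence numbers start at 1 and increase by one, and cache device 1 receives two first acknowledgment messages, the one with the lower (smaller) sequence number is the response for storage data b. In this case, the first acknowledgment message with the lower sequence number is ignored.
In addition, storage data a, which has the same address range as but different contents from storage data (storage data d) that cache device 2 has flushed and holds in the flushed msg state, may be sent from an access host to cache device 1 and received by it. In this case, cache device 1 ignores the flush message from cache device 2, and cache device 2 replaces the already flushed storage data d with the storage data a contained in the copy message from cache device 1. This resolves the mismatch between storage data for the same address range.
If, before storage data a in cache device 1 enters the flushing state, cache device 1 receives from an access host new storage data (storage data e) with the same address range as storage data a, cache device 1 (cache management unit 11) can cancel the write (flush) of storage data a to secondary storage device 5_k and write only storage data e to secondary storage device 5_k, which avoids duplicate writes.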
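＜Other forms of storage data write processing＞
When the storage data a sent from access host 4_i to cache device 1 is block data consisting of multiple bytes, the block data may be divided into multiple parts and a separate copy message sent for each part. In this case, a first acknowledgment message is likewise sent for each part. The state of the cache control list for this block data is updated only after reception, writing to cache memory 22, and so on have finished for the whole of the block data. The host acknowledgment message to access host 4_i is likewise sent only after the first acknowledgment messages for the whole of the block data have been received.
The flush message sent in step S5 can be sent separately from other messages, or piggybacked on another control message. That is, it can be sent immediately after step S4 as an individual message separate from other messages, or sent later, piggybacked on a copy message for other storage data.
If storage data b held in cache memory 12 and storage data a received from access host 4_i have identical contents, cache device 1 may immediately send an acknowledgment message to access host 4_i and skip the write (update) to the secondary storage device (and to cache memories 12 and 22). This reduces the cost of writing.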
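＜Processing when write data collides＞
A "write data collision" is a state in which multiple pieces of storage data with the same address range but different contents have been written to cache system 3 from the access host group and are all in the received state in cache system 3.
Collisions can occur within a single cache device or between the two cache devices 1 and 2. The processing for each case is described below.
(1) Processing when a collision occurs in one cache device
If cache device 1 receives storage data A1 from access host 4_i and then receives from access host 4_j storage data A2, which has the same address range as storage data A1 but different contents, storage data A1 and A2 collide in cache device 1. Access hosts 4_i and 4_j may be the same host or different hosts.
In this case, cache management unit 11 of cache device 1 either overwrites the cache control list of storage data A1 with that of storage data A2, or deletes the cache control list of storage data A1 or sets it to the invalidated state, generates a new cache control list for storage data A2, and adds it to the cache control table. Cache management unit 11 also writes storage data A2 into cache memory 12, in the same area as storage data A1 or in a different area.
If cache management unit 11 receives from cache device 2 the first acknowledgment message for storage data A1, it ignores that message. The two first acknowledgment messages can be told apart by their sequence numbers, as described above; the one with the smaller (lower) sequence number is ignored.
Similarly, cache management unit 21 of cache device 2 either overwrites the cache control list generated from the copy message for storage data A1 (the first copy message) based on the copy message for storage data A2 (the second copy message), or deletes the cache control list of storage data A1 or sets it to the invalidated state, generates a new cache control list for storage data A2, and adds it to the cache control table. Cache management unit 21 also writes storage data A2 into cache memory 22, in the same area as storage data A1 or in a different area.
If communication line L is, for example, Gigabit Ethernet, the second copy message, although sent later, may reach cache device 2 before the first copy message. Even then, cache management unit 21 can use the sequence numbers of the two copy messages to ignore (discard) the first copy message and store the storage data of the second copy message in cache memory 22.
The above describes a collision of storage data in cache device 1; the same processing is executed when a similar collision occurs in cache device 2.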
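(2) Collision detection when a collision occurs between the two cache devices
FIGS. 5A to 5C are sequence diagrams showing the flow of collision detection when a collision occurs between the two cache devices. In these figures, storage data A1 and storage data A2 have the same address range and different contents.
In each of FIGS. 5A to 5C, storage data A1 sent from access host 4_i has been received by cache device 1 and is in the received state, and storage data A2 sent from access host 4_j has been received by cache device 2 and is in the received state. Storage data A1 and A2 are therefore in a collision state.
Cache device 1 may receive storage data A1 at the same time as cache device 2 receives storage data A2, or either may be earlier than the other.
If, in this collision state, storage data A1 and A2 move to the dirty or non-volatile state in both cache devices, the consistency and transparency of the data held by the two cache devices can no longer be maintained. To maintain them, the cache devices must first detect the collision state. The following processing is executed for this purpose.
FIG. 5A shows the flow of collision detection when one cache device (cache device 2) receives the copy message for storage data A1 from the other cache device (cache device 1) before sending its own copy message for storage data A2.
On receiving storage data A1 from access host 4_i and storage data A2 from access host 4_j respectively, cache devices 1 and 2 each try to send a copy message (S11, S13). However, if cache device 2 receives the copy message for storage data A1 from cache device 1 before it sends its own copy message, cache device 2 (cache management unit 21) detects that a collision has occurred for storage data A2 (S14).
Specifically, cache device 2 (cache management unit 21): (a) compares the address range in the header part of the copy message for storage data A1 (that is, the device number, device start address, and data length) with the address range of storage data A2 and finds them to be the same; (b) compares the contents of storage data A1 in the data part of the copy message with the contents of storage data A2 and finds them to differ; and (c) finds from the cache control list of storage data A2 that storage data A2 is in the received state, and from the state in the header part of the copy message that storage data A1 is in the received state. From (a) to (c), cache device 2 detects that storage data A1 and A2 collide.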
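Checks (a) to (c) above amount to a three-part predicate. A minimal sketch follows, assuming a hypothetical `WriteInfo` record that carries the address range, state, and contents for both the locally held data and the incoming copy message.

```python
from dataclasses import dataclass


@dataclass
class WriteInfo:
    """Address range, state, and contents of one write (names are assumptions)."""
    device_no: int
    device_start: int
    length: int
    state: str
    data: bytes


def is_collision(local: WriteInfo, copy_msg: WriteInfo) -> bool:
    """True when an incoming copy message collides with locally held data."""
    # (a) same address range: device number, start address, and length
    same_range = ((local.device_no, local.device_start, local.length) ==
                  (copy_msg.device_no, copy_msg.device_start, copy_msg.length))
    # (b) different contents
    different_content = local.data != copy_msg.data
    # (c) both pieces of data are still in the received state
    both_received = local.state == "received" and copy_msg.state == "received"
    return same_range and different_content and both_received


a2_local = WriteInfo(7, 2560, 4, "received", b"BBBB")      # held by cache device 2
a1_copy = WriteInfo(7, 2560, 4, "received", b"AAAA")       # COPY from cache device 1
collided = is_collision(a2_local, a1_copy)
```

When the predicate holds, the receiving device would answer with a COLLISION message in place of C-ACK.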
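On detecting the collision, cache device 2 cancels transmission of the copy message for storage data A2 and sends a collision detection message (COLLISION) to cache device 1 in place of the first acknowledgment message (S14).
On receiving the collision detection message from cache device 2, cache device 1 (cache management unit 11) detects that a collision has occurred for storage data A1 (S12). That is, cache device 1 (cache management unit 11) detects the collision for storage data A1 from the fact that the message is a collision detection message and from the address range in the header part of that message.
If the roles of cache devices 1 and 2 are swapped, that is, if cache device 2 sends the copy message for storage data A2 to cache device 1 before cache device 1 sends its own copy message, cache device 1 sends the collision detection message to cache device 2.
In this way, when one cache device receives, for storage data that it holds in the received state, a copy message carrying storage data with the same address range but different contents, it sends the other cache device a collision detection message notifying it of the collision instead of its own copy message, so that both cache devices can detect the collision.
FIG. 5B shows the flow of collision detection when both cache devices send and receive copy messages.
When cache device 1 sends the copy message for storage data A1 (S21), cache device 2 sends the copy message for storage data A2 (S24), and each cache device receives the other's copy message, each detects the collision from the other's copy message (S22, S25).
In this case too, as in FIG. 5A, a cache device that detects a collision sends a collision detection message to the other in place of the first acknowledgment message (S22, S25). In FIG. 5B, therefore, both cache devices send collision detection messages to each other, and each thus detects again a collision that it has already detected.
Thus, when both cache devices send and receive copy messages, both can detect the collision from the copy messages. In addition, by both sending collision detection messages, the two cache devices can notify each other that they have detected the collision.
FIG. 5C shows the flow of collision detection when, as in FIG. 5B, both cache devices send copy messages, but the copy message for storage data A1 is received by cache device 2 after the collision detection message from cache device 1.
In this case, cache device 2 detects the collision from the collision detection message sent by cache device 1 (S35) and then detects the already detected collision again from the copy message for storage data A1 received afterward (S36). The collision has thus been detected in both cache devices (S32, S35, S36).
As shown by the broken line in FIG. 5C, cache device 2 may send a third acknowledgment message (COL-ACK) to cache device 1 after detecting the already detected collision. If the third acknowledgment message is sent, cache device 1 detects the already detected collision again from this message (S33).
The reverse of FIG. 5C can also occur, namely when the copy message for storage data A2 arrives late at cache device 1. In this case too, the same processing is performed with the roles of cache devices 1 and 2 swapped.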
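(3) Collision recovery between the two cache devices
When a collision between the two cache devices is detected as described above, there are the following three methods for resolving (recovering from) it.
(a) First method
In the first method, which of cache devices 1 and 2 has priority when a collision is detected is decided in advance, and the storage data received by the prioritized cache device is made valid. Only the storage data of the prioritized cache device is treated as valid; the storage data of the non-prioritized cache device is treated as invalid.
Here, "storage data is treated as valid" means that its cache control list exists in the cache control table in a state other than invalidated and the storage data is stored in the cache memory.
"Storage data is treated as invalid" means that its cache control list is deleted from the cache control table (including being overwritten by another cache control list) or that the state of its cache control list is set to invalidated. The storage data itself may still exist in the cache memory or may have been erased from it (including being overwritten by other storage data).
For example, if cache device 1 has priority in FIG. 5A, cache management unit 11 of cache device 1 need not update the cache control list of storage data A1 or cache memory 12 even when it detects the collision (S12). That is, in cache device 1, storage data A1 is treated as valid and storage data A2 as invalid.
Meanwhile, after detecting the collision, cache management unit 21 of cache device 2 adds a cache control list for storage data A1 to its cache control table, using the information in the header part of the copy message for storage data A1, and writes storage data A1 into cache memory 22. The cache control list of storage data A2 is deleted or set to the invalidated state. Thus in cache device 2 as well, storage data A1 is treated as valid and storage data A2 as invalid.
If cache device 2 has priority in FIG. 5A, cache management unit 21 of cache device 2 need not update the cache control list of storage data A2 or cache memory 22 after detecting the collision. Cache management unit 21 then sends the copy message for storage data A2 to cache device 1. This copy message may be sent separately from the collision detection message or piggybacked on it.
Cache management unit 11 of cache device 1 adds a cache control list for storage data A2 to its cache control table, using the information in the header part of this copy message, and writes storage data A2 into cache memory 12. The cache control list of storage data A1 is deleted or set to the invalidated state. Thus in cache device 1 as well, storage data A2 is treated as valid and storage data A1 as invalid.
Cache device 1 may return a first acknowledgment message in response to the copy message from cache device 2, but preferably does not, in order to reduce message communication cost. If a first acknowledgment message is returned, cache device 2 simply ignores it.
In FIGS. 5B and 5C, both cache devices hold both storage data A1 and storage data A2, so after detecting the collision each treats the storage data received by the prioritized cache device as valid and the storage data received by the non-prioritized cache device as invalid.
The collision state is thus resolved, and the consistency and transparency of storage data are secured in cache devices 1 and 2.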
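(b) Second method
The second method compares the time at which cache device 1 received its storage data with the time at which cache device 2 received its storage data, and gives priority to, and makes valid, the storage data with the earlier reception time.
In the second method, after detecting the collision, cache devices 1 and 2 notify each other of their reception times, either by exchanging them over communication line L or by carrying them in the copy message or collision detection message. The storage data received at the earlier time has priority and is treated as valid; the storage data received at the later time is treated as invalid. When reception times are carried in copy messages or collision detection messages, a field for the time is provided in the header part or data part of these messages.
The second method assumes that the clocks of cache devices 1 and 2 are synchronized. As in the first method, if cache device 2 has priority in FIG. 5A, a copy message for storage data A2 is sent from cache device 2 to cache device 1. In FIGS. 5B and 5C, both cache devices hold storage data A1 and A2 regardless of which has priority, so no new copy message needs to be sent.
The second method likewise resolves the collision state and secures the consistency and transparency of storage data in cache devices 1 and 2.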
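The first and second resolution methods each reduce to a small deterministic rule that both cache devices must evaluate identically. The sketch below is illustrative: the function names are hypothetical, timestamps are assumed to come from synchronized clocks, and the `tie_breaker` argument is an assumption of this sketch, since the text does not say what happens when the two reception times are equal.

```python
def resolve_by_priority(data_1: bytes, data_2: bytes,
                        preferred_device: int = 1) -> bytes:
    """First method: the data received by the pre-agreed device wins."""
    return data_1 if preferred_device == 1 else data_2


def resolve_by_receive_time(data_1: bytes, t1: float,
                            data_2: bytes, t2: float,
                            tie_breaker: int = 1) -> bytes:
    """Second method: the data with the earlier reception time wins.

    On equal timestamps this sketch falls back to a fixed device
    (an assumption, not something the source specifies).
    """
    if t1 == t2:
        return data_1 if tie_breaker == 1 else data_2
    return data_1 if t1 < t2 else data_2


# Cache device 1 received A1 at t=10.0, cache device 2 received A2 at t=9.5.
valid_by_priority = resolve_by_priority(b"A1", b"A2", preferred_device=1)
valid_by_time = resolve_by_receive_time(b"A1", 10.0, b"A2", 9.5)
```

Because both devices apply the same rule to the same inputs, they reach the same decision without further negotiation; the losing data is then invalidated on both sides.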
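(c) Third method
In the third method, after detecting a collision, each cache device waits a random time and then sends a copy message again (a retransmitted copy message) or sends a retransmission indication message.
The cache device that sends its retransmitted copy message or retransmission indication message first has priority, and the storage data that this cache device received from an access host is treated as valid.
The random time is, for example, a time derived from pseudorandom numbers generated by cache management units 11 and 21 respectively.
The message sent after the random time may be a retransmitted copy message containing storage data A1 or A2. However, if the other cache device already holds storage data A1 and A2, it is preferably a retransmission indication message, which contains no storage data and merely indicates retransmission, in order to reduce communication cost.
To distinguish it from a normal copy message, a retransmitted copy message has its header type set to "RE-COPY", which denotes a retransmitted copy message. The rest of the header part is the same as for a normal copy message.
A retransmission indication message has only a header part and no data part. Its header type is "RETX", which denotes a retransmission indication message, and the address range in its header is the same as that of the storage data sent earlier. This lets the receiving cache device identify it as a retransmission message for the storage data it received earlier.
For example, in FIG. 5A, cache device 1 (cache management unit 11) sends a retransmission indication message to cache device 2 over communication line L after a random time has elapsed. Cache device 2 (cache management unit 21) sends a retransmitted copy message containing storage data A2 to cache device 1 over communication line L after a random time has elapsed.
If the retransmission indication message from cache device 1 is sent before the retransmitted copy message from cache device 2, cache device 1 has priority, and storage data A1 is therefore treated as valid.
Conversely, if the retransmitted copy message from cache device 2 is sent before the retransmission indication message from cache device 1, cache device 2 has priority, and storage data A2 is therefore treated as valid. In this case, cache device 1 does not hold storage data A2, so cache device 2 sends storage data A2 to cache device 1 in a copy message or the like.
If the random times of cache devices 1 and 2 are the same, so that the retransmission indication message from cache device 1 and the retransmitted copy message from cache device 2 are sent and received simultaneously, new random times are timed and the same processing is repeated.
The same processing is executed in the cases of FIGS. 5B and 5C.
Before the random time elapses, another piece of storage data with the same address range may be sent from an access host and received by cache device 1 or 2. In this case, the cache device that received it sends this new storage data to the other cache device in a new copy message.
The third method likewise resolves the collision state and secures the consistency and transparency of storage data in cache devices 1 and 2.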
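The random backoff of the third method can be simulated in a few lines. This is a sketch under simplifying assumptions: both delays are drawn in one place (in the real system each cache device draws its own), message transit time is ignored so that the shorter delay always means the earlier retransmission, and the function name and the delay range are hypothetical.

```python
import random
from typing import Callable, Tuple


def resolve_by_random_backoff(draw: Callable[[], int]) -> Tuple[int, int]:
    """Return (winning device, number of rounds needed).

    Each round, both devices pick a random delay. The device with the
    shorter delay retransmits first and wins. Equal delays are treated
    as simultaneous transmission, so the round is repeated.
    """
    rounds = 0
    while True:
        rounds += 1
        delay_1, delay_2 = draw(), draw()
        if delay_1 != delay_2:
            return (1 if delay_1 < delay_2 else 2), rounds


rng = random.Random()
winner, rounds = resolve_by_random_backoff(lambda: rng.randint(1, 10))
```

The winner's storage data is treated as valid. The number of rounds is unbounded in principle but a tie becomes less likely the wider the range of possible delays.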
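When at least one of the storage data received by cache device 1 from an access host and the storage data received by cache device 2 from an access host has multiple bytes, a collision may occur in only part of the multi-byte storage data. In this case, the collision detection and recovery processing described above is executed for the part in which the collision occurred.
＜Storage data read processing＞
The following describes the read processing performed when access host group 4 requests cache system 3 to read storage data. Here, access host 4_i issues the read request to cache device 1.
Access host 4_i sends a read request containing the address range of the storage data to be read to input/output unit 13 of cache device 1 over access network 6. The read request is passed from input/output unit 13 to cache management unit 11.
Cache management unit 11 determines from the cache control table whether the storage data for the address range in the read request exists in cache memory 12. Storage data in any state other than received and invalidated is judged to exist in cache memory 12.
If all or part of the storage data for the address range is not stored in cache memory 12, cache management unit 11 reads the missing part from the corresponding secondary storage device of secondary storage device group 5 via input/output unit 14 and storage network 7 and stores it in cache memory 12. It also generates a cache control list for the storage data thus stored from the secondary storage device and adds it to the cache control table. "clean" is written to the state of this cache control list.
Cache management unit 11 then reads the storage data for the read request from cache memory 12 and sends it to access host 4_i via input/output unit 13 and access network 6.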
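The read path can be sketched as a lookup followed by a fill from secondary storage on a miss. The model below is illustrative: addresses are single keys rather than ranges, dictionaries stand in for cache memory, the cache control table, and the secondary storage device, and the function name is hypothetical. Leaving a `received` entry untouched while serving the read from secondary storage is a choice made in this sketch; the text only says that such data is not regarded as present in cache memory.

```python
# States in which data is regarded as present in cache memory
# (everything except "received" and "invalidated").
READABLE = {"dirty", "non-volatile", "flushing", "flushed msg", "clean"}


def read_block(addr: int, cache: dict, states: dict, disk: dict) -> bytes:
    """Serve a read request for one address."""
    state = states.get(addr)
    if state in READABLE:
        return cache[addr]                 # cache hit
    value = disk[addr]                     # miss: read from secondary storage
    if state in (None, "invalidated"):
        cache[addr] = value                # fill the cache
        states[addr] = "clean"             # data read from storage is clean
    return value


cache = {1: b"pending", 3: b"hot"}
states = {1: "received", 3: "dirty"}
disk = {1: b"old", 2: b"cold", 3: b"stale"}

hit = read_block(3, cache, states, disk)        # served from cache memory
miss = read_block(2, cache, states, disk)       # filled from secondary storage
pending = read_block(1, cache, states, disk)    # received data is not served
```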
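In this embodiment, even if the storage data that cache device 1 read from the secondary storage device already exists in cache memory 22 of cache device 2, the storage data read from the secondary storage device and the storage data in cache memory 22 are guaranteed to have the same contents. Cache devices 1 and 2 therefore do not need to exchange confirmation messages or the like to check the consistency of storage data, which saves the communication overhead and cost that such an exchange would incur.
If storage data requested by access host 4_i does not exist in cache memory 12 but exists in cache memory 22, cache management unit 11 may obtain this storage data from cache device 2 (cache memory 22).
If the time order of the storage data read and received matters to access host 4_i, the order can be guaranteed by using a general control synchronization mechanism or the like provided in access host 4_i.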
＜負荷分散処理＞
上記の説明では，アクセスホストから記憶データを受信したキャッシュ装置が２次記憶装置に記憶データの書き込み（フラッシュ）を行っているが，このフラッシュを２つのキャッシュ装置間で分散させることもできる。
たとえば，一方のキャッシュ装置の２次記憶装置へのアクセス頻度が他方のキャッシュ装置のそれよりも高い場合等には，書き込み処理を他方のキャッシュ装置に分担させることができる。また，２つのキャッシュ装置の処理能力や性能に差がある場合も，能力や性能の高いキャッシュ装置により多くの書き込み処理を行わせることもできる。これにより，キャッシュ装置１および２の間で負荷分散が図られる。
このような負荷分散を行うために，キャッシュ管理部１１および２１は，ともに負荷を計測し，制御メッセージ（上述したコピーメッセージ，確認応答メッセージ等とは異なる他の制御メッセージ）により定期的に負荷を相互に通知し合う。計測される負荷としては，たとえば２次記憶装置群５への書き込み回数や書き込んだデータ量（バイト数，ブロック数）等が含まれる。そして，負荷の低いキャッシュ装置がフラッシュを実行する。
図６は，負荷分散が行われた場合の記憶データの書き込み処理の流れを示すシーケンス図である。
キャッシュ装置１がアクセスホスト４_ｉから記憶データを受信すると，キャッシュ管理部１１は，図４のステップＳ１と同様に，キャッシュ制御リストを生成するとともに，キャッシュメモリ１２に記憶データを記憶する（Ｓ４０）。
続いて，キャッシュ管理部１１は，コピーメッセージをキャッシュ装置２に送信する（Ｓ４１）。ここで，キャッシュ管理部１１は，キャッシュ装置１の負荷がキャッシュ装置２の負荷よりも高い場合には，コピーメッセージのヘッダ部のフラッシュ権限にキャッシュ装置２を指定する。
キャッシュ管理部２１は，フラッシュ権限にキャッシュ装置２が指定されたコピーメッセージを受信すると，キャッシュ制御リストを生成するとともに，生成したキャッシュリストの状態を，通常の場合に設定されるｎｏｎ−ｖｏｌａｔｉｌｅ状態ではなく，ｄｉｒｔｙ状態に設定し，コピーメッセージに含まれる記憶データをキャッシュメモリ２２に記憶する（Ｓ４２）。
続いて，キャッシュ管理部２１は，第１の確認応答メッセージをキャッシ装置１に返信する（Ｓ８）。
キャッシュ管理部１１は，第１の確認応答メッセージを受信すると，キャッシュ制御リストの状態ｄｉｒｔｙ状態ではなく，ｎｏｎ−ｖｏｌａｔｉｌｅ状態に設定し，また，ホスト確認応答メッセージをアクセスホスト４_ｉに送信する（Ｓ４３）。
その後，キャッシュ管理部２１は，適当なタイミングでｄｉｒｔｙ状態の記憶データをｆｌｕｓｈｉｎｇ状態にして，２次記憶装置に書き込む（Ｓ４４）。以降のステップＳ４５〜Ｓ４８の処理は，キャッシュ管理部１１がキャッシュ制御リストの状態をｎｏｎ−ｖｏｌａｔｉｌｅからｃｌｅａｎにし，キャッシュ管理部２１がキャッシュ制御リストの状態をｆｌｕｓｈｉｎｇからｆｌｕｓｈｅｄｍｓｇおよびｃｌｅａｎにする点が異なることを除いて，図４に示すステップＳ５〜Ｓ１０の対応する処理と同じである。
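In this way, load distribution is achieved by transferring the flush authority for stored data in an appropriate address range to the other cache device.
In the above, the flush authority is given to the partner cache device, but a cache device can also notify the partner cache device, by a control message such as the copy message, that it holds the flush authority itself.
The load value can also be the ratio of load to the performance of each cache device. For example, the cache device 1 divides its own load value by its own performance value (that is, (load value of cache device 1) ÷ (performance value of cache device 1)) and transmits the quotient to the cache device 2, and the cache device 2 likewise transmits its own quotient to the cache device 1. The flush authority can then be given to the cache device with the lower of the two ratios.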
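The ratio-based selection of the flush authority can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function names, the numeric device identifiers, and the rule that a tie leaves the authority with the device that received the data from the access host are all introduced here and are not specified in the embodiment.

```python
def load_ratio(load: float, performance: float) -> float:
    """Load normalized by device performance (load / performance)."""
    if performance <= 0:
        raise ValueError("performance must be positive")
    return load / performance


def choose_flush_authority(own_id: int, own_ratio: float,
                           peer_id: int, peer_ratio: float) -> int:
    """Return the id of the device that should flush.

    The device with the lower ratio gets the authority. On a tie the
    authority stays with the own device (an assumption of this sketch).
    """
    return peer_id if own_ratio > peer_ratio else own_id
```

For example, a device with load 80 and performance 100 (ratio 0.8) would hand the authority to a peer with load 90 and performance 200 (ratio 0.45).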
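<Failure monitoring and recovery device>
As described above, the failure monitoring units 16 and 26 monitor the operating status of the other cache device and determine whether a failure has occurred. For example, the failure monitoring units 16 and 26 send failure detection messages to each other's cache device over the communication line L at fixed time intervals. If a failure monitoring unit does not receive the failure detection message sent from the other cache device within a predetermined time, it determines that a failure has occurred in the other cache device.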
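The timeout-based detection described above can be illustrated with a short Python sketch. The class name `PeerMonitor`, its methods, and the use of explicit `now` timestamps (instead of a real clock and a real communication line) are assumptions made so that the example is self-contained.

```python
class PeerMonitor:
    """Tracks when the last failure detection message arrived from the peer."""

    def __init__(self, timeout_s: float, now: float):
        self.timeout_s = timeout_s  # the "predetermined time"
        self.last_seen = now

    def on_message(self, now: float) -> None:
        # Called whenever a failure detection message is received.
        self.last_seen = now

    def peer_failed(self, now: float) -> bool:
        # The peer is judged to have failed once no message has arrived
        # for longer than the timeout.
        return now - self.last_seen > self.timeout_s
```

In a real device `now` would come from a monotonic clock, and `peer_failed` would be polled periodically.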
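The recovery processing performed when a failure is detected is described below, taking as an example the case where a failure occurs in the cache device 1 and the failure monitoring unit 26 detects it.
When a failure of the cache device 1 is detected, the cache device 2 must write to the secondary storage device, in place of the cache device 1, the dirty stored data that the cache device 1 was going to write from the cache memory 12 to the secondary storage device group 5.
For this purpose, after detecting the failure, the failure monitoring unit 26 rewrites the state of stored data in the non-volatile state to the dirty state, based on the cache control table held by the cache management unit 21 (that is, the cache control table for the stored data stored in the cache memory 22). As a result, the stored data changed to the dirty state is written (flushed) to the secondary storage device group 5 by the cache management unit 21.
For stored data in the flushing state, which the cache device 1 was in the middle of writing to the secondary storage device group 5, it is also impossible to determine how much of it has already been written. Stored data that is in the flushing state in the cache device 1 is likewise held in the cache memory 22 of the cache device 2 as stored data in the non-volatile state. Therefore, this stored data is also changed from the non-volatile state to the dirty state in the cache device 2 in the same way, and is thereby reliably stored in the secondary storage device by the cache device 2.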
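The state rewrite performed by the surviving cache device can be sketched as follows. This is a hedged Python illustration: representing the cache control table as a list of dictionaries with a `"state"` key, and the function name `recover_from_peer_failure`, are assumptions of the sketch.

```python
def recover_from_peer_failure(control_table: list) -> int:
    """Mark every non-volatile entry as dirty and return how many changed.

    Entries that the failed peer held as dirty or flushing are held here
    as non-volatile, so promoting them to dirty makes this device flush them.
    """
    changed = 0
    for entry in control_table:
        if entry["state"] == "non-volatile":
            entry["state"] = "dirty"
            changed += 1
    return changed
```

Entries in any other state are left untouched, since they are either already scheduled for flushing by this device or already written.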
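When the cache device 1 recovers from the failure, or when the cache device 1 is replaced by another backup cache device, a situation can arise in which dirty stored data exists in only one of the cache devices. To prevent this, the timing of the return of the cache device 1 is controlled so that the cache device 1 is not brought back until all cache data stored in the cache memory 22 is in the clean state. Alternatively, the stored data that was in the dirty state before the return of the cache device 1 may be transmitted to the cache device 1 after it returns.
If a failure occurs in the cache device 2, the failure monitoring unit 16 of the cache device 1 executes the same processing as the failure monitoring unit 26 described above.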
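Industrial applicability
The present invention can be used in a cache system placed between access hosts (an access host group) such as computers and secondary storage devices (a secondary storage device group).
According to the present invention, the two cache devices each independently receive and process stored data from the access hosts, so nearly twice the processing capacity of a single cache device can be obtained and the bottleneck of a single cache device can be eliminated.
In addition, because the same stored data is stored in both cache devices, the stored data in the cache system can be made non-volatile. As a result, loss of stored data is prevented even if a failure occurs in one of the cache devices.
Furthermore, because collision resolution processing is executed between the two cache devices, the consistency and transparency of stored data between them is ensured. It is therefore unnecessary for both cache devices to check the consistency of stored data each time it is read, and the overhead of such checking is eliminated.
As a result, consistency and transparency of stored data can be achieved while performance is improved.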
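Brief description of the drawings
FIG. 1 is a block diagram showing the overall configuration of a secondary storage device access system using a cache system according to an embodiment of the present invention.
FIG. 2 shows a configuration example of the cache control table held by the cache management unit.
FIG. 3 shows an example of the data structure of a control message communicated between the message communication units.
FIG. 4 is a sequence diagram showing the flow of processing for writing stored data transmitted from an access host to the secondary storage device group.
FIGS. 5A to 5C are sequence diagrams showing the flow of collision detection processing when a collision occurs between the two cache devices.
FIG. 6 is a sequence diagram showing the flow of stored-data write processing when load distribution is performed.
FIGS. 7A to 7E show cases in which the address ranges of two pieces of stored data overlap.
Technical field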
The present invention relates to a cache device, a cache system, and a cache method for temporarily storing data. In particular, it relates to each cache device in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores data from the access host in the secondary storage device, and also to that cache system and cache method.
Background art
Conventionally, a secondary storage device such as a hard disk was used by only one access host. In recent years, however, secondary storage device groups that consist of multiple secondary storage devices and are shared by multiple access hosts via a storage network or the like have come into wide use. Access hosts read data from and write data to these secondary storage device groups via the storage network.
Because of their structure, such secondary storage devices often cannot deliver their full throughput and response performance, depending on the requested access pattern. Further, when RAID (Redundant Array of Inexpensive Disks) 5 is used, the overhead required for writing is large.
Therefore, performance is commonly improved by using a cache device (cache memory) to hold data that is used relatively frequently or data that is to be written to the secondary storage device.
The cache device uses a high-speed primary storage device to temporarily store data read from or written to the secondary storage device, and has been highly effective in improving the performance of computer systems.
Using a cache device is an especially effective way to improve performance when data must pass through an intermediate device between the shared secondary storage device group and the access hosts, such as a storage virtualization address translation device or a file system appliance. In that case the cache device is placed on the intermediate device.
However, a single cache device becomes a bottleneck and often degrades performance. With a single cache device, all access to the secondary storage device group passes through that one device. Therefore, even if the number of access hosts and secondary storage devices increases, the cache device cannot deliver sufficient throughput, and overall performance is capped by the performance limit of the cache device.
Further, with a single cache device, a failure of the cache device causes problems such as data loss and increased downtime.
That is, data that has not yet been written to the secondary storage device group and therefore exists only in the cache device can be present at any time. If the cache device fails in such a state, the data that exists only in the cache device is lost and cannot be recovered.
In addition, if system downtime increases because of the cache device failure, the system may suffer economic loss and loss of trust.
One conceivable solution to the bottleneck problem is to provide a plurality of cache devices. Allocating the cache devices among the access hosts eliminates the bottleneck and ensures scalability.
However, if one access host writes data to one cache device and another access host writes data with the same address but different contents to the other cache device, the two cache devices will cache data with different contents. If this state is left as it is, the cache data for that address remains inconsistent between the cache devices, and consistency cannot be maintained.
Further, when one cache device stops because of a failure, the other cache device holds different cached data, so transparency is not maintained.
Furthermore, as long as the cache devices are operating independently, the problem that the written cache data is lost due to a single failure and becomes unrecoverable remains unresolved.
On the other hand, there is a method for maintaining the consistency of data written between a plurality of cache devices.
A common way to maintain this consistency is to use tokens. In this method, information called a token is communicated between a plurality of cache devices, and data consistency is guaranteed by exclusive control. However, token control increases the message cost (token communication cost) and requires a relatively long time due to the token communication time. Therefore, this method has a problem that a device such as a cache device provided for improving the processing speed may become a bottleneck.
In addition, there is a system that uses a memory shared by a plurality of cache devices and performs exclusive control on the memory. A cache device using such a shared memory has a problem that cache data is lost when a failure occurs in the shared cache memory device and its management unit. In addition, when a volatile memory is used as the cache memory, data may be lost due to a power failure. For this reason, it is necessary to use a non-volatile memory, but there is a problem that the cost increases accordingly.
Thus, although consistency and transparency of written data must be maintained when cache devices are used, many existing methods have problems such as high cost.
Disclosure of the invention
An object of the present invention is to eliminate bottlenecks in a single cache device.
Another object of the present invention is to prevent loss of data stored in a cache device even when a failure occurs in the cache device.
A cache device according to the present invention is each cache device in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores data from the access host in the secondary storage device. The cache device has: a data input unit that inputs first data supplied from the access host; a data receiving unit that receives second data that the other cache device has input from the access host and transmitted to the own cache device; a cache storage unit that stores both or either of the first data and the second data; a cache management unit that manages the cache storage unit; a data transmission unit that transmits the first data to the other cache device; and a data output unit that outputs the first data or the second data to the secondary storage device.
A cache system according to the present invention is a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores data from the access host in the secondary storage device. Each of the two cache devices has: a data input unit that inputs first data supplied from the access host; a data receiving unit that receives second data that the other cache device has input from the access host and transmitted to the own cache device; a cache storage unit that stores both or either of the first data and the second data; a cache management unit that manages the cache storage unit; a data transmission unit that transmits the first data to the other cache device; and a data output unit that outputs the first data or the second data to the secondary storage device.
A cache method according to the present invention is a cache method in a cache system that has two cache devices, stores data supplied from an access host or a secondary storage device, and stores data from the access host in the secondary storage device. In this method, the one cache device that has received data from the access host stores the data in its own cache memory and transmits the data to the other cache device; the other cache device receives the data transmitted from the one cache device and stores it in its own cache memory; and the one cache device or the other cache device outputs the data to the secondary storage device.
According to the present invention, two cache devices are provided in the cache system. Each cache device receives data to be stored in the secondary storage device from the access host, and outputs (stores) the data to the secondary storage device. Therefore, it is possible to obtain twice the processing capacity as compared with the case where there is one cache device, and it is possible to eliminate the bottleneck when there is a single cache device.
According to the present invention, one cache device to which data is input from the access host stores the data in its own cache memory and transmits the input data to the other cache device. The other cache device stores the data transmitted from one cache device in its own cache memory. This achieves data non-volatility in the cache system. Therefore, even when a failure occurs in one cache device in a state where the data is not stored in the secondary storage device, data can be obtained from the other cache device, and loss of data due to the failure is prevented.
Preferably, the cache device further has: a collision detection unit that, when the data receiving unit receives the second data between the time the data input unit inputs the first data and the time the data transmission unit completes transmission of the first data, determines whether the first data and the second data collide, based on the address ranges of both data on the secondary storage device and on the contents of both data; a collision detection message transmission unit that, when the collision detection unit detects a collision, transmits a collision detection message indicating that a collision has occurred to the other cache device; and a collision detection message receiving unit that receives a collision detection message from the other cache device.
The collision detection unit also detects a collision when the collision detection message reception unit receives the collision detection message from the other cache device.
Thereby, even when two pieces of data having different contents in the same address range are received from the access host by both cache devices, a collision between these two pieces of data can be detected.
When a collision is detected, the cache management unit can treat, of the first data and the second data, the data given priority under a predetermined priority order as valid and the other data as invalid.
Alternatively, when a collision is detected, the cache management unit can compare the time at which the first data was input to the data input unit with the time at which the second data was input to the data input unit of the other cache device, and treat the data with the earlier time as valid and the data with the later time as invalid.
Further, the cache device may have: a data/message retransmission unit that, when a collision is detected, transmits the first data, or a retransmission message indicating retransmission of the first data, after a random time has elapsed; and a data/message receiving unit that receives the second data, or a retransmission message for the second data, transmitted from the data/message retransmission unit of the other cache device. The cache management unit can then treat as valid the data corresponding to the earlier of the retransmission time of the data/message retransmission unit and the reception time of the data/message receiving unit, and treat as invalid the data corresponding to the later time.
Any of these approaches resolves the collision state and ensures the consistency and transparency of data between the two cache devices.
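The time-based resolution can be sketched in Python as follows. This is an illustration under stated assumptions rather than the claimed method: the function name `valid_writer`, the numeric device identifiers, the use of the lower device identifier as the predetermined priority for breaking ties, and the premise that input times from the two devices are directly comparable are all introduced here.

```python
def valid_writer(time_a: float, device_a: int,
                 time_b: float, device_b: int) -> int:
    """Return the id of the device whose data is treated as valid.

    The data with the earlier input time wins. If the times are equal,
    the lower device id wins (a tie-break assumed for this sketch), so
    both devices reach the same decision independently.
    """
    # Tuples compare element by element: first by time, then by device id.
    return min((time_a, device_a), (time_b, device_b))[1]
```

Because the result does not depend on argument order, each cache device can evaluate the function locally and still agree with the other.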
Preferably, the data transmission unit transmits, together with the first data, first flush authority information indicating which cache device is to output the first data to the secondary storage device; the data receiving unit receives, together with the second data, second flush authority information indicating which cache device is to output the second data to the secondary storage device; and the data output unit outputs the first data to the secondary storage device when the first flush authority information indicates the own cache device, and outputs the second data to the secondary storage device when the second flush authority information indicates the own cache device.
As a result, load distribution can be performed between the two cache devices.
Preferably, the cache device further has a failure monitoring unit that monitors the other cache device for the occurrence of a failure and, when a failure is detected, controls the data output unit so that it outputs the following data to the secondary storage device: of the first data stored in the cache storage unit, the data for which neither the data output unit nor the data output unit of the other cache device has completed output to the secondary storage device and for which the data transmission unit has completed transmission to the other cache device; and, of the second data stored in the cache storage unit, the data for which neither the data output unit nor the data output unit of the other cache device has completed output to the secondary storage device.
Thus, even when a failure occurs in one of the cache devices, data that exists in the cache device but does not exist in the secondary storage device can be reliably flushed (stored) in the secondary storage device.
BEST MODE FOR CARRYING OUT THE INVENTION
FIG. 1 is a block diagram showing the overall configuration of a secondary storage device access system using a cache system according to an embodiment of the present invention. This secondary storage device access system (hereinafter simply referred to as “access system”) includes a cache system 3, an access host group 4, a secondary storage device group 5, an access network 6, and a storage network 7.
The cache system 3 has two cache devices 1 and 2, which have the same configuration. The cache device 1 includes a cache management unit 11, a cache memory 12, input/output units 13 and 14, a message communication unit 15, and a failure monitoring unit 16. The cache device 2 includes a cache management unit 21, a cache memory 22, input/output units 23 and 24, a message communication unit 25, and a failure monitoring unit 26.
The access host group 4 has n access hosts 4_1 to 4_n (n is an integer of 2 or more). Each access host writes data (hereinafter referred to as “stored data”) to the secondary storage device group 5 via the cache system 3, and reads stored data held in the secondary storage device group 5 (or in the cache system 3) via the cache system 3. Each access host is, for example, a computer.
The secondary storage device group 5 is shared by the access hosts of the access host group 4 and has m secondary storage devices 5_1 to 5_m (m is an integer of 2 or more). Each secondary storage device is assigned a device number (for example, a serial number) that uniquely identifies it. By specifying the device number, the access host group 4 and the cache system 3 can identify one secondary storage device in the secondary storage device group 5, and by specifying an address (or block number) they can identify stored data on that device. Each secondary storage device is, for example, a hard disk, a magneto-optical disk (MO), or an optical disk (for example, DVD-RAM).
The access network 6 is, for example, a SCSI network, Fibre Channel, or a LAN (Ethernet). The storage network 7 is, for example, Fibre Channel.
The access hosts 4_1 to 4_n, the input/output unit 13 of the cache device 1, and the input/output unit 23 of the cache device 2 are connected to the access network 6. The access hosts 4_1 to 4_n can thus transmit stored data to the cache device 1 or 2 via the access network 6 and receive stored data from the cache device 1 or 2.
The secondary storage devices 5_1 to 5_m, the input/output unit 14 of the cache device 1, and the input/output unit 24 of the cache device 2 are connected to the storage network 7. The cache devices 1 and 2 can thus transmit (write) stored data to the secondary storage device group 5 via the storage network 7 and receive (read) stored data from the secondary storage device group 5.
The cache devices 1 and 2 are configured independently so that a failure of one does not affect the other. Because the cache devices 1 and 2 each independently accept input/output from the access host group 4, the system can obtain twice the performance of a single cache device while maintaining the transparency of stored data, as described later.
The input/output unit 13 (23) (reference numerals in parentheses denote the corresponding components of the cache device 2; the same applies hereinafter) executes communication processing, such as protocol processing, for stored data transmitted and received via the access network 6. The input/output unit 14 (24) executes communication processing for stored data transmitted and received via the storage network 7.
The cache memory 12 (22) is configured by a storage device (RAM or the like) that can be accessed (read and written) at a higher speed than the secondary storage devices of the secondary storage device group 5.
The cache management unit 11 (21) holds a cache control table (described later) and manages the stored data in the cache memory 12 (22) based on this table. The cache management unit 11 (21) also controls the input/output units 13 (23) and 14 (24), the failure monitoring unit 16 (26), and the message communication unit 15 (25). It writes stored data input from the input/output unit 13 (23) or 14 (24) or from the message communication unit 15 (25) to the cache memory 12 (22), transmits stored data read from the cache memory 12 (22) via the input/output unit 13 (23) or 14 (24), and exchanges control messages and stored data with the other cache device via the message communication unit 15 (25).
FIG. 2 shows a configuration example of a cache control table held by the cache management unit 11 (21). The cache control tables are provided for the cache memory 12 and for the cache memory 22, respectively. The former is held by the cache management unit 11 and the latter is held by the cache management unit 21, respectively.
The cache control table has a cache control list of each storage data currently stored in the cache memory 12 (22). Each cache control list has, as data items, an element number, a device number, a device start address, a data length, a cache start address, a state, and a flag.
The “element number” is an element number of stored data currently stored in the cache memory 12 (22). Here, one element corresponds to a piece of stored data written from an access host in the access host group 4 or read from a certain access host. One element may correspond to 1-byte stored data or may correspond to a plurality of bytes of stored data.
For example, when an access host 4_i (i is an integer from 1 to n) transmits stored data with 512 bytes as one block, and this one block of stored data is stored in the cache memory 12 (22), one element corresponds to one block (512 bytes) of stored data. When another access host 4_j (j is an integer from 1 to n) transmits two blocks of stored data and these two blocks are stored in the cache memory 12 (22), one element corresponds to two blocks of stored data.
The “device number” is a device number (for example, serial number) of the secondary storage device of the secondary storage device group 5 in which stored data is to be stored. The “device start address” is a storage start address (first address) of stored data in the secondary storage device with the corresponding device number. “Data length” is the length (number of bytes) of the corresponding stored data.
When stored data is read from and written to each secondary storage device in units of blocks of multiple bytes (for example, 512 bytes), the device start address can instead hold the start block number, where the first block is written, and the data length can instead hold the end block number, where the last block is written. For example, when two blocks of stored data are stored in the fifth and sixth block areas of a secondary storage device, the start block number is 5 and the end block number is 6.
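The block-number form of the address range can be illustrated with a small Python sketch. The helper names `block_count` and `block_range` and the default block size of 512 bytes are assumptions for this example; the embodiment does not prescribe how the numbers are computed.

```python
def block_count(length_bytes: int, block_size: int = 512) -> int:
    """Number of blocks needed to hold `length_bytes` (rounded up)."""
    return -(-length_bytes // block_size)


def block_range(start_block: int, blocks: int) -> tuple:
    """Return (start block number, end block number), both inclusive."""
    if blocks < 1:
        raise ValueError("at least one block is required")
    return start_block, start_block + blocks - 1
```

With these helpers, 1024 bytes of stored data starting at block 5 occupy blocks 5 through 6, matching the example in the text.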
The “cache start address” is a storage start address (first address) of storage data in the cache memory 12 (22).
The “state” indicates the state of the stored data. The possible states are “received”, “dirty”, “non-volatile”, “flushing”, “flushed msg”, “clean”, and “invalidated”.
The received state is the state from when the stored data is received from the access host until transmission of the copy message (described later) for the stored data is completed, or until the first acknowledgment message (described later) for this copy message is received.
The dirty state is a state in which the first confirmation response message relating to the stored data has been received and the stored data has not yet been written to the secondary storage device.
The non-volatile state is a state in which the storage data included in the copy message transmitted from the other cache device is stored in the cache memory, and the storage data has not yet been written to the secondary storage device group.
Unlike the dirty state, the flushing state refers to a state while the stored data is being written to the secondary storage device group.
The flushed msg state is the state in which writing (flushing) of the stored data to the secondary storage device has been completed and a flush message (described later) is being sent to the other cache device so that the state of the same stored data held in the other cache device is changed from the non-volatile state to the clean state.
The clean state is the state in which writing of the stored data to the secondary storage device has been completed and the other cache device has been notified of this. Stored data in this state in the cache memory 12 (22) may be discarded at any time by overwriting, erasing, or the like, but it is retained in the cache memory 12 (22) in case it is read by the access host group 4.
The invalidated state is a state in which the stored data is invalidated after the clean state. The stored data in this state is then subjected to a discarding process such as overwriting or erasing immediately or after a predetermined time has elapsed.
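The states above can be summarized as a small transition table. The following Python sketch reflects one reading of the state descriptions in the normal write flow; the names `State`, `NEXT`, and `can_transition` are assumptions, and transitions used only in special cases (such as load distribution or failure recovery) are deliberately left out.

```python
from enum import Enum


class State(Enum):
    RECEIVED = "received"
    DIRTY = "dirty"
    NON_VOLATILE = "non-volatile"
    FLUSHING = "flushing"
    FLUSHED_MSG = "flushed msg"
    CLEAN = "clean"
    INVALIDATED = "invalidated"


# Allowed next states in the normal flow (an interpretation, not a spec).
NEXT = {
    State.RECEIVED: {State.DIRTY},         # first acknowledgment received
    State.DIRTY: {State.FLUSHING},         # write to secondary storage starts
    State.FLUSHING: {State.FLUSHED_MSG},   # write done, flush message sent
    State.FLUSHED_MSG: {State.CLEAN},      # peer has been notified
    State.NON_VOLATILE: {State.CLEAN},     # peer reported the flush
    State.CLEAN: {State.INVALIDATED},      # entry may now be discarded
    State.INVALIDATED: set(),
}


def can_transition(src: State, dst: State) -> bool:
    """Return True if `dst` directly follows `src` in the normal flow."""
    return dst in NEXT[src]
```

The device that received the data from the access host walks the chain from received to clean, while the device holding the copy goes from non-volatile to clean.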
“Flag” indicates, for example, whether the device holds the flush authority, whether the write is in a collision state, and whether a host acknowledgment message (ACK) has already been returned to the access host. The flush authority, the write collision state, and the acknowledgment messages are described later.
When the cache management unit 11 (21) receives data from the access host group 4 or the secondary storage device group 5 and processes it, it refers to the cache control table that it holds and updates the contents of the table.
The message communication units 15 and 25 are connected to each other by a communication line L, and transmit / receive stored data, control messages, and the like. For example, a PCI (Peripheral Component Interconnect Bus), Gigabit Ethernet, or the like is used for the communication line L that connects the two, and the communication speed is preferably higher than that of the access network 6.
FIG. 3 shows an example of the data structure of a control message communicated between the message communication units 15 and 25. The control message has a header part, in which control information is stored, and a data part, in which stored data is stored. The header part contains the data items type, device number, device start address, data length, sequence number, state, and flush authority.
“Type” is the type of the control message. The types include “COPY” indicating a copy message, “C-ACK” indicating an acknowledgment message for the copy message (first acknowledgment message), “FLUSHED” indicating a flush message, “F-ACK” indicating an acknowledgment message for the flush message (second acknowledgment message), “COLLISION” indicating a collision detection message, “COL-ACK” indicating an acknowledgment message for the collision detection message (third acknowledgment message), and “MONITOR” indicating a failure monitoring message. Each of these control messages is described later.
The “device number” is, for a copy message, the device number of the secondary storage device in which the stored data carried in the data part is to be stored; for a flush message, the device number of the secondary storage device in which the flushed stored data was stored; and for a collision detection message, the device number of the secondary storage device in which the stored data for which the collision was detected is to be stored. For an acknowledgment message, it is the same as the device number of the corresponding copy message, flush message, or collision detection message.
The “device start address” is, for a copy message, the start address on the secondary storage device at which the stored data carried in the data part is to be stored; for a flush message, the start address on the secondary storage device at which the flushed stored data was stored; and for a collision detection message, the start address on the secondary storage device at which the stored data for which the collision was detected is to be stored. For an acknowledgment message, it is the same as the start address of the corresponding copy message, flush message, or collision detection message.
The “data length” is, for a copy message, the length (number of bytes, number of blocks, or the like) of the stored data carried in the data part; for a flush message, the length of the flushed stored data; and for a collision detection message, the length of the stored data for which the collision was detected. The data length of an acknowledgment message is 0.
The “state” holds the same data as the state in the cache control table described above. The “sequence number” is a serial number attached to each transmitted control message. Each of the cache management units 11 and 21 numbers the control messages it transmits in order of transmission, starting from 1, for example. This makes the time order of the transmitted control messages explicit. By tracking the sequence numbers of received control messages, the receiving cache management unit 11 (21) can therefore determine the transmission order accurately even if the messages arrive in a different order.
“Flush authority” indicates whether the cache device that performs the flush (the processing of storing the stored data held in the cache memory 12 (22) in the secondary storage device) is the cache device 1 or 2. When no flush authority is specified, this field is set to a value other than those indicating the cache devices 1 and 2 (for example, Null).
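The header items listed above can be pictured as a simple record. The Python sketch below is only an illustration: the class name `ControlMessage`, the field names and types, the use of `None` for an unspecified flush authority, and the helper `in_transmission_order` are assumptions and do not reflect an actual wire format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ControlMessage:
    msg_type: str                          # e.g. "COPY", "C-ACK", "FLUSHED"
    device_number: int
    device_start_address: int
    data_length: int                       # 0 for acknowledgment messages
    sequence_number: int                   # assigned by the sender, from 1
    state: str = ""
    flush_authority: Optional[int] = None  # 1, 2, or None if unspecified
    data: bytes = b""                      # data part (stored data)


def in_transmission_order(messages: List[ControlMessage]) -> List[ControlMessage]:
    """Reorder messages from one sender by their sequence numbers."""
    return sorted(messages, key=lambda m: m.sequence_number)
```

Sorting by `sequence_number` recovers the sender's transmission order even when messages from that sender arrive out of order.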
The failure monitoring unit 16 (26) monitors the failure of the other cache device, and when a failure is detected, performs processing necessary for recovery from the failure. For example, the failure monitoring unit 16 (26) transmits and receives a failure detection message to each other via the message communication unit 15 (25) at regular time intervals. When one failure monitoring unit does not receive the failure detection message from the other failure monitoring unit even after a predetermined time has elapsed, it determines that a failure has occurred in the other cache device.
When the failure monitoring unit 16 (26) detects a failure in the other cache device, it rewrites, as necessary, the states in the cache control table held by the cache management unit 11 (21) of the cache device 1 (2) to which it belongs, and executes failure recovery processing. Details of the failure recovery processing are described later.
In an access system with this configuration, the access host group 4 writes stored data to the secondary storage device group 5 via the cache system 3 and reads stored data from the secondary storage device group 5 via the cache system 3. The write processing and read processing, and the failure recovery processing performed when a failure occurs in one of the cache devices of the cache system 3, are described in detail below.
<Writing process of stored data>
The processing performed when an arbitrary access host 4_i in the access host group 4 writes stored data to the secondary storage device group 5 via the cache device 1 or 2 will be described below.
FIG. 4 is a sequence diagram showing the flow of the process of writing storage data transmitted from the access host 4_i to the secondary storage device group 5.
First, the access host 4_i transmits the stored data to the cache device 1 or 2 via the access network 6. Whether the stored data is transmitted to the cache device 1 or 2 is preset in the access host 4_i. The access host 4_i may be set to transmit only to one of the cache devices 1 and 2, or to transmit alternately to the cache devices 1 and 2. Alternatively, the access host 4_i may send a write request signal to the cache devices 1 and 2, and start transmitting the stored data when a cache device in the idle state returns a data receivable message to the access host 4_i (when a data receivable message is returned from both cache devices 1 and 2, the access host 4_i selects one of them). In the following, the case where the stored data is transmitted from the access host 4_i to the cache device 1 will be described as an example.
The access host 4_i transmits the stored data and control data to the input/output unit 13 of the cache device 1 via the access network 6. The control data is added, for example, as a header of the stored data. The control data includes the device number of the secondary storage device in which the stored data is to be stored, the device start address, and the data length (hereinafter, the device number, device start address, and data length are collectively referred to as the “address range”).
The input/output unit 13 passes the control data and storage data transmitted from the access host 4_i to the cache management unit 11.
The cache management unit 11 determines, based on the address ranges in the cache control table of the cache memory 12, whether stored data (stored data b) having an address range that overlaps the address range of the stored data (stored data a) transmitted from the access host 4_i is already stored in the cache memory 12.
When there is no storage data b having an address range that overlaps the address range of the storage data a, the cache management unit 11 generates a cache control list of the storage data a and adds it to the cache control table. Then, the cache management unit 11 writes the storage data a into the memory cell of the cache memory 12 starting from the cache start address of the generated cache control list (S1).
On the other hand, when the address ranges overlap, there are four cases, as shown in FIGS. 7A to 7E: (1) the address range of the stored data a and the address range of the stored data b are exactly the same (FIG. 7A); (2) the address range of the stored data a includes the address range of the stored data b (FIG. 7B); (3) the address range of the stored data a is included in the address range of the stored data b (FIG. 7C); and (4) the address range of the stored data a and the address range of the stored data b partially overlap, each also having a non-overlapping range (FIGS. 7D and 7E).
In the case of (1), the cache management unit 11 updates the device number, device start address, etc. in the cache control list of the stored data b to those of the stored data a. Alternatively, the cache management unit 11 may generate a cache control list for the stored data a, add it to the cache control table, and either set the state of the cache control list for the stored data b to “invalidated” or delete that cache control list. Then, the cache management unit 11 writes the storage data a into the area of the cache memory 12 starting from the cache start address of the cache control list (S1).
The area on the cache memory 12 where the stored data a is written may be the same as the area on the cache memory 12 where the stored data b is stored, or may be another free area. In the former case, the stored data b is overwritten by the stored data a. In the latter case, the stored data b remains on the cache memory 12, but the cache control list is "invalidated" or erased (including overwriting) from the cache control table. Therefore, it is not handled as valid stored data. Therefore, the stored data b is thereafter overwritten with other stored data. The same applies to the following cases (2) to (4).
In the case of (2), the same processing as in the case of (1) is performed (S1). Therefore, the cache control list of the stored data b is deleted from the cache control table.
In the case of (3), the cache management unit 11 generates a cache control list of the stored data a for the part where the address ranges overlap, adds the cache control list to the cache control table, and writes the stored data a to the cache memory 12 (S1). In addition, the cache management unit 11 generates (or updates) a cache control list for each of the two parts of the stored data b excluding the part overlapping the stored data a, and adds these cache control lists to the cache control table.
In the case of (4), the cache management unit 11 generates a cache control list of the storage data a, adds it to the cache control table, and writes the storage data a to the cache memory 12 (S1). In addition, the cache management unit 11 generates or updates a cache control list for a portion (one) of the stored data b excluding the portion overlapping with the stored data a, and adds the cache control list to the cache control table.
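The four overlap cases (1) to (4) and the handling of the non-overlapping remainder of the stored data b can be illustrated with the following minimal sketch. The names `AddressRange`, `classify`, and `remainders` are hypothetical, and the end address is assumed to be exclusive (start address plus data length); the specification does not prescribe this representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddressRange:
    device: int   # device number
    start: int    # device start address
    length: int   # data length

    @property
    def end(self) -> int:
        return self.start + self.length   # exclusive end address (assumption)


def classify(a: AddressRange, b: AddressRange) -> str:
    """Return how new stored data a relates to already cached stored data b."""
    if a.device != b.device or a.end <= b.start or b.end <= a.start:
        return "none"                     # no overlap
    if a.start == b.start and a.end == b.end:
        return "identical"                # case (1), FIG. 7A
    if a.start <= b.start and b.end <= a.end:
        return "a_contains_b"             # case (2), FIG. 7B
    if b.start <= a.start and a.end <= b.end:
        return "a_inside_b"               # case (3), FIG. 7C
    return "partial"                      # case (4), FIGS. 7D and 7E


def remainders(a: AddressRange, b: AddressRange) -> list[AddressRange]:
    """Parts of b not covered by a; each would keep its own cache control list."""
    if classify(a, b) == "none":
        return [b]
    parts = []
    if b.start < a.start:
        parts.append(AddressRange(b.device, b.start, a.start - b.start))
    if a.end < b.end:
        parts.append(AddressRange(b.device, a.end, b.end - a.end))
    return parts
```

For cases (1) and (2) `remainders` returns an empty list, which corresponds to deleting or invalidating the cache control list of the stored data b.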
“Received” is written in the state of the cache control list of the stored data a, indicating that the stored data a is in the received state.
Subsequently, the cache management unit 11 transmits a copy message (COPY) to the message communication unit 25 of the cache device 2 via the message communication unit 15 and the communication line L (S2). The device number, device start address, and data length of the cache control list of the stored data a are written in the device number, device start address, and data length of the header portion of this copy message. Also, stored data a is placed in the data part of the copy message.
The message communication unit 25 gives the copy message transmitted from the cache device 1 to the cache management unit 21. When the cache management unit 21 receives the copy message, the cache management unit 21 performs the same process as step S1 of the cache management unit 11 described above on the storage data a of the data part of the copy message.
That is, the cache management unit 21 updates or generates a cache control list based on the cache control table of the cache memory 22, and writes the stored data a to the cache memory 22 (S7). Once all of the stored data a has been written to the cache memory 22, the stored data a is not lost even if one of the cache devices 1 and 2 stops due to a failure. In other words, the stored data a has been made non-volatile. To represent this, “non-volatile” is written in the state of the cache control list of the storage data a held by the cache management unit 21.
After writing the stored data a to the cache memory 22, the cache management unit 21 transmits a first confirmation response message (C-ACK), indicating that the writing has been completed normally, to the message communication unit 15 via the message communication unit 25 and the communication line L (S8).
The message communication unit 15 gives the first confirmation response message to the cache management unit 11. When the cache management unit 11 receives the first confirmation response message from the message communication unit 15, it transmits a confirmation response message for the access host (host confirmation response message) to the access host 4_i via the input/output unit 13 and the access network 6 (S3). By receiving this host confirmation response message, the access host 4_i learns that the stored data a has been stored in the cache system 3 (cache devices 1 and 2) and made non-volatile.
When transmitting the host confirmation response message, the cache management unit 11 updates the state of the cache control list of the data a from “received” to “dirty”.
Subsequently, at an appropriate timing, the cache management unit 11 updates the state of the cache control list corresponding to the storage data a stored in the cache memory 12 to “flushing”, and transmits the storage data a to the secondary storage device 5_k of the secondary storage device group 5 via the input/output unit 14 and the storage network 7 (S4). This secondary storage device 5_k is the secondary storage device corresponding to the device number in the cache control list of the stored data a. The transmitted storage data a is written to the area of the secondary storage device 5_k starting from the device start address (S4).
When the transmission (writing) of the stored data a to the secondary storage device 5_k is completed, the cache management unit 11 updates the state of the cache control list of the stored data a to “flushed msg”, and transmits a flush message (FLUSHED) to the message communication unit 25 via the message communication unit 15 and the communication line L (S5). This flush message is given from the message communication unit 25 to the cache management unit 21. Thereby, the cache management unit 21 learns that the flush (that is, making the storage data a non-volatile in the secondary storage device) is completed, and recognizes that the storage data a stored in the cache memory 22 can be safely erased.
Subsequently, the cache management unit 21 updates the state of the cache control list of the stored data a stored in the cache memory 22 to “clean”, and transmits a second confirmation response message (F-ACK), which notifies the cache device 1 of the transition to the clean state, to the message communication unit 15 via the message communication unit 25 and the communication line L (S9).
When the cache management unit 11 receives the second confirmation response message from the message communication unit 15, the cache management unit 11 updates the state of the cache control list to “clean” in order to treat the stored data a as the clean state.
The cache management units 11 and 21 can update the state of a cache control list in the clean state to “invalidated” at any time (S6, S10). For example, the stored data a in the clean state can be placed in the invalidated state when it becomes unnecessary, or when the cache memory 12 or 22 runs out of free space and the stored data a needs to be erased. In addition, the least recently used storage data can be placed in the invalidated state by using an LRU (Least Recently Used) algorithm.
The invalidated cache control list is subsequently deleted from the cache control table when new storage data is written to the cache memory, or is overwritten by the cache control list of other newly stored data. In addition, the area of the cache memory holding the stored data in the invalidated state is also overwritten with other new stored data.
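As one possible illustration of selecting clean stored data for invalidation by LRU, the following sketch tracks clean entries in access order. The class name `CleanLru` and its methods are hypothetical; the specification only states that an LRU algorithm may be used.

```python
from collections import OrderedDict


class CleanLru:
    """Hypothetical tracker of clean entries, ordered from least to most recently used."""

    def __init__(self):
        self._clean = OrderedDict()   # key: address range of a clean entry

    def touch(self, address_range) -> None:
        # Record that a clean entry was just read (or has just become clean).
        self._clean[address_range] = None
        self._clean.move_to_end(address_range)

    def invalidate_one(self):
        # Return the least recently used clean entry to be set to "invalidated",
        # or None if no clean entry exists.
        if not self._clean:
            return None
        address_range, _ = self._clean.popitem(last=False)
        return address_range
```

Only entries in the clean state are tracked, since entries in other states have not yet been confirmed as flushed and are not candidates for invalidation here.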
Thus, the storage data writing process of the cache system 3 is completed. The cache devices 1 and 2 of the cache system 3 can receive the stored data independently from the access host group and execute the writing process independently. Therefore, the cache system 3 has almost twice the processing capacity as compared with the case where only one cache device exists. Thereby, the bottleneck in the case of one cache device can be eliminated. Further, since the stored data is held in at least one cache device or the secondary storage device group 5, even if a failure occurs in one cache device, the stored data is not lost.
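The state changes of the cache control list during the writing process can be summarized as a small transition table. The sketch below is illustrative only: the state names and message names (C-ACK, FLUSHED, F-ACK) follow the description, while the event labels `start_flush`, `flush_done`, `copy_written`, and `invalidate`, and the initial state `copying` on the copy-receiving side, are hypothetical labels introduced for the example.

```python
# (current state, event) -> next state

# Cache device that received the stored data from the access host.
RECEIVER = {
    ("received", "C-ACK"): "dirty",              # host confirmation response sent (S3)
    ("dirty", "start_flush"): "flushing",        # writing to secondary storage begins (S4)
    ("flushing", "flush_done"): "flushed msg",   # FLUSHED sent to the other device (S5)
    ("flushed msg", "F-ACK"): "clean",
    ("clean", "invalidate"): "invalidated",      # S6
}

# Cache device that received the copy message (COPY).
PEER = {
    ("copying", "copy_written"): "non-volatile",  # S7; C-ACK is then returned (S8)
    ("non-volatile", "FLUSHED"): "clean",         # F-ACK is then returned (S9)
    ("clean", "invalidate"): "invalidated",       # S10
}


def step(table: dict, state: str, event: str) -> str:
    """Return the next state; an event that does not apply leaves the state unchanged."""
    return table.get((state, event), state)


def run(table: dict, events: list, state: str) -> str:
    for event in events:
        state = step(table, state, event)
    return state
```

Ignoring events that do not apply to the current state mirrors the way stale confirmation responses are simply discarded in the description.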
When the cache device 2 receives the stored data from the access host 4_i, the same processing as described above is executed with the roles of the cache device 1 and the cache device 2 exchanged.
In step S7, the cache device 2 may already have received, from an access host 4_j, other storage data (stored data c) having the same address range as the stored data a transmitted from the cache device 1 via the communication line L, with the stored data c in the received state in the cache device 2. That is, there are cases where the cache devices 1 and 2 receive storage data a and c having different contents within the same address range from different access hosts almost simultaneously. The processing in this case will be described later under the processing performed when write data collides.
In step S1, the storage data b that is overwritten and erased by the storage data a may be in the received state or the flushed msg state, and the first confirmation response message or the second confirmation response message for the storage data b may subsequently be transmitted from the cache device 2 to the cache device 1. In this case, the cache device 1 (cache management unit 11) ignores these confirmation response messages transmitted from the cache device 2. That is, even if the cache device 1 receives these confirmation response messages from the cache device 2, it simply discards them and does not execute the processing associated with their reception.
Whether or not to ignore a confirmation response message is determined based on the sequence number included in the message. For example, suppose the sequence number increases by one starting from 1, and the cache device 1 receives two first confirmation response messages. Of the two, the message with the smaller (younger) sequence number is the response corresponding to the stored data b. Therefore, in this case, the first confirmation response message having the smaller sequence number is ignored.
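The sequence-number check can be sketched as follows. This is a minimal illustration under the assumption that each confirmation response carries the sequence number of the copy message it answers; the class name `AckFilter` and its methods are hypothetical.

```python
class AckFilter:
    """Hypothetical filter that discards responses for overwritten stored data."""

    def __init__(self):
        # address range -> sequence number of the newest copy message sent for it
        self.latest = {}

    def sent_copy(self, address_range, seq: int) -> None:
        # Remember the sequence number of the most recent copy message.
        self.latest[address_range] = seq

    def accept(self, address_range, seq: int) -> bool:
        # A response is processed only if it matches the newest sequence number;
        # responses with smaller (younger) numbers are ignored.
        return self.latest.get(address_range) == seq
```

For example, after copy messages with sequence numbers 1 (stored data b) and 2 (stored data a) have been sent for the same address range, `accept` returns `False` for a response carrying 1 and `True` for a response carrying 2.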
In addition, storage data a having different contents in the same address range as storage data (storage data d) that has been flushed by the cache device 2 and is in the flushed msg state may be transmitted from an access host to the cache device 1 and received by it. In this case, the cache device 1 ignores the flush message from the cache device 2, and the cache device 2 replaces the already flushed stored data d with the stored data a included in the copy message transmitted from the cache device 1. This eliminates the mismatch of stored data in the same address range.
If the cache device 1 receives new storage data (stored data e) in the same address range as the storage data a from the access host before the storage data a in the cache device 1 enters the flushing state, the cache device 1 (cache management unit 11) can cancel the writing (flushing) of the storage data a to the secondary storage device 5_k and write only the storage data e to the secondary storage device 5_k, thereby avoiding duplicate writes.
<Other forms of storage data write processing>
If the stored data a sent from the access host 4_i to the cache device 1 is block data consisting of a plurality of bytes, the block data may be divided into a plurality of parts and a copy message sent for each part. In this case, a first confirmation response message is also transmitted for each part. The state of the cache control list of the block data is updated after all of the block data has been received and written to the cache memory 22. In addition, the host confirmation response message to the access host 4_i is sent after the second confirmation response messages for all of the block data have been received.
The flush message transmitted in step S5 can be transmitted separately from other messages, or can be piggybacked on another control message. That is, immediately after step S4, the flush message can be transmitted as an individual message separate from other messages, or can be added to a copy message for other stored data and transmitted as a piggyback.
When the storage data b stored in the cache memory 12 and the storage data a received from the access host 4_i have the same content, the cache device 1 may immediately send a confirmation response message to the access host 4_i without writing (updating) the secondary storage device (or the cache memories 12 and 22). This can reduce the cost required for writing.
<Processing when write data collides>
“Write data collision” refers to a state in which a plurality of storage data having different contents are written in the same address range from the access host group to the cache system 3 and the plurality of storage data are in the received state in the cache system 3.
This collision includes a collision in only one cache device and a collision in both the two cache devices 1 and 2. In the following, processing when a collision occurs in these two cases will be described.
(1) Processing when a collision occurs in one cache device
When the cache device 1 receives certain storage data A1 from the access host 4_i and then receives, from the access host 4_j, storage data A2 having different contents in the same address range as the storage data A1, a collision between the stored data A1 and A2 occurs in the cache device 1. The access hosts 4_i and 4_j may be the same or different.
In this case, the cache management unit 11 of the cache device 1 overwrites the cache control list of the storage data A1 with the cache control list of the storage data A2, or erases or invalidates the cache control list of the storage data A1. Then, a new cache control list of the storage data A2 is generated and added to the cache control table. In addition, the cache management unit 11 writes the storage data A2 in the cache memory 12 in the same area as the storage data A1 or in a different area.
In addition, when the cache management unit 11 receives the first confirmation response message for the storage data A1 from the cache device 2, the cache management unit 11 ignores it. Note that the two first confirmation response messages can be distinguished by the magnitude of their sequence numbers in the same manner as described above; for example, the message with the smaller (younger) sequence number is ignored.
Similarly, the cache management unit 21 of the cache device 2 overwrites the cache control list generated based on the copy message (first copy message) of the storage data A1 with a cache control list based on the copy message (second copy message) of the storage data A2, or deletes or invalidates the cache control list of the storage data A1, generates a new cache control list of the storage data A2, and adds it to the cache control table. Further, the cache management unit 21 writes the storage data A2 in the cache memory 22, in the same area as the storage data A1 or in a different area.
When the communication line L is, for example, Gigabit Ethernet, the second copy message, although transmitted later, may be received by the cache device 2 before the first copy message. Even in such a case, the cache management unit 21 can ignore (discard) the first copy message based on the sequence numbers of the first and second copy messages, and store the stored data of the second copy message in the cache memory 22.
Although the collision of stored data in the cache device 1 has been described, the same processing is executed when a similar collision occurs in the cache device 2.
(2) Collision detection processing when a collision occurs between both cache devices
FIG. 5A to FIG. 5C are sequence diagrams showing the flow of processing for collision detection when a collision occurs between both cache devices. In these figures, storage data A1 and storage data A2 are storage data having the same address range and different contents.
In each of FIGS. 5A to 5C, the storage data A1 transmitted from the access host 4_i has been received by the cache device 1 and is in the received state, and the storage data A2 transmitted from the access host 4_j has been received by the cache device 2 and is in the received state. Therefore, the stored data A1 and the stored data A2 are in a collision state.
Note that the reception time of the storage data A1 of the cache device 1 and the reception time of the storage data A2 of the cache device 2 may be the same, or one may be earlier or later than the other.
If the stored data A1 and A2 shift to the dirty state or the non-volatile state in both the cache devices 1 and 2 in this collision state, the consistency and transparency of the data held by both cache devices cannot be maintained. Therefore, in order to maintain consistency and transparency, both cache devices must first detect a collision condition. For this reason, the following processing is executed.
FIG. 5A shows collision detection when one cache device (cache device 2) receives a copy message of storage data A1 from the other cache device (cache device 1) before sending the copy message of storage data A2. The flow of processing is shown.
The cache device 1 receives the stored data A1 from the access host 4_i, and the cache device 2 receives the stored data A2 from the access host 4_j; each then attempts to transmit a copy message (S11, S13). However, if the cache device 2 receives the copy message of the storage data A1 from the cache device 1 before sending its own copy message, the cache device 2 (cache management unit 21) detects that a collision with the storage data A2 has occurred (S14).
That is, the cache device 2 (cache management unit 21): (a) compares the address range (that is, the device number, device start address, and data length) included in the header part of the copy message of the stored data A1 with the address range of the stored data A2, and detects that the address ranges are the same; (b) compares the stored data A1 included in the data part of the copy message with the stored data A2, and detects that the contents differ; and (c) detects that the stored data A2 is in the received state from the cache control list of the stored data A2, and that the stored data A1 is in the received state from the state in the header part of the copy message. From (a) to (c), the cache device 2 detects the occurrence of a collision between the stored data A1 and the stored data A2.
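Conditions (a) to (c) can be expressed as a single predicate. The following is a minimal sketch; the names `Entry`, `is_collision`, and `reply_type`, and the representation of an address range as a tuple, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    address_range: tuple   # (device number, device start address, data length)
    data: bytes            # stored data held in the cache memory
    state: str             # state recorded in the cache control list


def is_collision(local: Entry, copy_range: tuple, copy_data: bytes, copy_state: str) -> bool:
    same_range = local.address_range == copy_range                           # condition (a)
    different = local.data != copy_data                                      # condition (b)
    both_received = local.state == "received" and copy_state == "received"   # condition (c)
    return same_range and different and both_received


def reply_type(local: Entry, copy_range: tuple, copy_data: bytes, copy_state: str) -> str:
    # A collision detection message (COLLISION) is returned in place of the
    # first confirmation response message (C-ACK) when a collision is found.
    if is_collision(local, copy_range, copy_data, copy_state):
        return "COLLISION"
    return "C-ACK"
```

Here `local` stands for the entry of the stored data A2 in the cache device 2, and the remaining arguments stand for the header and data parts of the copy message of the stored data A1.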
Upon detecting the collision, the cache device 2 stops the transmission of the copy message of the stored data A2, and transmits a collision detection message (COLLISION) to the cache device 1 instead of the first confirmation response message (S14).
The cache device 1 (cache management unit 11) detects that a collision has occurred for the stored data A1 by receiving the collision detection message from the cache device 2 (S12). That is, the cache device 1 (cache management unit 11) recognizes that the message is a collision detection message and, based on the address range included in its header part, that the collision concerns the stored data A1.
When the roles of the cache devices 1 and 2 are exchanged, that is, when the cache device 1 receives the copy message of the stored data A2 from the cache device 2 before sending its own copy message, the cache device 1 transmits the collision detection message to the cache device 2.
In this way, when one cache device holding storage data in the received state receives a copy message containing storage data with different contents in the same address range, it transmits, instead of its own copy message, a collision detection message notifying the other cache device of the collision, so that both cache devices can detect the occurrence of the collision.
FIG. 5B shows the flow of collision detection processing when both cache devices transmit and receive a copy message.
The cache device 1 transmits a copy message of the stored data A1 (S21). When the cache device 2 transmits a copy message of the stored data A2 (S24) and both cache devices receive the copy message of the other party, both cache devices respectively detect a collision by the copy message of the other party ( S22, S25).
Also in this case, as in FIG. 5A, the cache device that has detected a collision transmits a collision detection message to the other party instead of the first confirmation response message (S22, S25). That is, in FIG. 5B, both cache devices transmit a collision detection message to the other cache device. As a result, both cache apparatuses detect again a collision that has already been detected.
Thus, when both cache devices transmit and receive a copy message, both cache devices can detect a collision by this copy message. Furthermore, even in this case, both cache devices can notify each other that a collision has been detected by transmitting a collision detection message.
FIG. 5C shows the flow of collision detection processing when, as in FIG. 5B, both cache devices transmit a copy message, but the copy message of the stored data A1 is received by the cache device 2 after the collision detection message from the cache device 1.
In this case, the cache device 2 detects the collision by the collision detection message transmitted from the cache device 1 (S35), but then detects the detected collision again by the received copy message of the storage data A1. (S36). As a result, a collision is detected in both cache devices (S32, S35, S36).
Note that, as indicated by a broken line in FIG. 5C, the cache device 2 may transmit a third confirmation response message (COL-ACK) to the cache device 1 after detecting the collision. When the third confirmation response message is transmitted, the cache device 1 detects the already detected collision again by this message (S33).
The reverse situation of FIG. 5C can also occur. That is, this is a case where the copy message of the storage data A2 arrives late at the cache device 1. Also in this case, the same processing is performed only by exchanging the cache devices 1 and 2.
(3) Collision recovery processing between both cache devices
There are the following three methods for solving (recovering) the situation when a collision is detected between the two cache devices described above.
(A) First method
The first method is a collision resolution method in which which of the cache devices 1 and 2 is to be prioritized when a collision is detected is determined in advance, and the stored data received by the prioritized cache device is made valid. In this case, only the storage data of the prioritized cache device is treated as valid, and the storage data of the non-prioritized cache device is treated as invalid.
Here, “stored data is treated as valid” means that the cache control list of the stored data exists in the cache control table in a state other than the invalidated state, and that the stored data is stored in the cache memory.
“The stored data is treated as invalid” means that the cache control list of the stored data is erased from the cache control table (including the case where the cache is overwritten by another cache control list), or the cache This means that the state of the control list is set to the invalidated state. The stored data may exist in the cache memory or may be erased from the cache memory (including a case where it is overwritten by other stored data).
For example, in FIG. 5A, when the cache device 1 is prioritized, the cache management unit 11 of the cache device 1 updates the cache control list of the stored data A1 and the cache memory 12 even if a collision is detected (S12). There is no need. That is, in the cache device 1, the stored data A1 is handled as valid, and the stored data A2 is handled as invalid.
On the other hand, after the collision detection, the cache management unit 21 of the cache device 2 adds the cache control list of the storage data A1 to the cache control table based on the information in the header part of the copy message of the storage data A1, and writes the stored data A1 to the cache memory 22. The cache control list of the stored data A2 is erased or set to the invalidated state. As a result, the cache device 2 also treats the stored data A1 as valid, and treats the stored data A2 as invalid.
In FIG. 5A, when the cache device 2 is prioritized, the cache management unit 21 of the cache device 2 does not need to update the cache control list of the storage data A2 and the cache memory 22 after the collision is detected. Then, the cache management unit 21 transmits a copy message of the stored data A2 to the cache device 1. This copy message may be transmitted separately from the collision detection message, or may be transmitted together with the collision detection message as piggyback.
The cache management unit 11 of the cache device 1 adds the cache control list of the storage data A2 to the cache control table based on the information of the header part of the copy message, and writes the storage data A2 to the cache memory 12. The cache control list of the stored data A1 is erased or set to the invalidated state. As a result, the cache device 1 also treats the stored data A2 as valid, and treats the stored data A1 as invalid.
The cache device 1 may return the first confirmation response message in response to the copy message from the cache device 2, but it is preferable not to return it in order to reduce the communication cost of the message. When the first confirmation response message is returned, the cache device 2 may ignore the first confirmation response message.
In FIGS. 5B and 5C, both the cache devices 1 and 2 hold both the stored data A1 and the stored data A2. Therefore, after the collision is detected, both cache devices treat the stored data received by the prioritized cache device as valid, and treat the stored data received by the non-prioritized cache device as invalid.
In this way, the collision state is resolved, and the cache devices 1 and 2 ensure the consistency and transparency of the stored data.
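The first method can be sketched as a simple selection by a predetermined priority. In the sketch below, the function name `resolve_by_priority` and its arguments are hypothetical; `received` maps each cache device number to the stored data it received from its access host, and `held` maps each cache device number to the set of stored data it currently holds.

```python
def resolve_by_priority(priority: int, received: dict, held: dict) -> tuple:
    """Return (valid data, invalid data, whether a copy message is still needed)."""
    other = 2 if priority == 1 else 1
    valid = received[priority]      # data received by the prioritized cache device
    invalid = received[other]       # data received by the non-prioritized cache device
    # A copy message is needed only when the other device does not yet hold
    # the valid data (FIG. 5A with the cache device 2 prioritized).
    needs_copy = valid not in held[other]
    return valid, invalid, needs_copy
```

In the situation of FIG. 5A, the cache device 1 holds only A1 while the cache device 2 holds both A1 and A2, so a copy message is required only when the cache device 2 is prioritized. In FIGS. 5B and 5C both devices hold both data, and no copy message is required.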
(B) Second method
The second method is a collision resolution method in which the reception time of the storage data at the cache device 1 is compared with the reception time of the storage data at the cache device 2, and the storage data with the earlier reception time is prioritized and made valid.
In this second method, after the collision is detected, the cache devices 1 and 2 notify each other of their reception times, either by communicating them separately via the communication line L or by including them in a copy message or a collision detection message. The stored data with the earlier reception time is given priority and treated as valid, and the stored data with the later reception time is treated as invalid. When the reception time is notified by a copy message or a collision detection message, an area for storing the time is provided in the header part or data part of these messages.
In the second method, it is assumed that the clocks of the cache devices 1 and 2 are synchronized. As in the first method, in FIG. 5A, when the cache device 2 is prioritized, a copy message of the stored data A2 is transmitted from the cache device 2 to the cache device 1. In FIG. 5B and FIG. 5C, regardless of which of the cache devices 1 and 2 is prioritized, both hold the stored data A1 and A2, so there is no need to send a new copy message.
Also by this second method, the collision state is resolved, and the cache devices 1 and 2 ensure the consistency and transparency of the stored data.
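The comparison used in the second method can be sketched as follows. The function name `resolve_by_time` is hypothetical, and the `tie_winner` parameter is an assumption: the description does not state how identical reception times are handled.

```python
def resolve_by_time(time_1: float, time_2: float, tie_winner: int = 1) -> int:
    """Return the number of the cache device whose stored data is treated as valid."""
    if time_1 < time_2:
        return 1            # cache device 1 received its data earlier
    if time_2 < time_1:
        return 2            # cache device 2 received its data earlier
    return tie_winner       # assumed fallback for identical reception times
```

Both cache devices evaluate the same comparison on the same pair of reception times, so they reach the same decision provided their clocks are synchronized.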
(C) Third method
The third method is a collision resolution method in which both cache devices transmit a copy message (retransmission copy message) or a retransmission instruction message again after a random time has elapsed after collision detection.
The cache device that has transmitted the retransmission copy message or the retransmission instruction message earlier has priority, and the stored data received by the cache device from the access host is treated as valid.
The random time is, for example, a time obtained based on the pseudo random numbers generated by the cache management units 11 and 21, respectively.
The message transmitted after the random time has elapsed may be a retransmission copy message including the stored data A1 or A2. However, if the other cache device already holds the stored data A1 and A2, the message is preferably a retransmission instruction message, which indicates retransmission without including the stored data, in order to reduce the communication cost.
In order to distinguish a retransmission copy message from a normal copy message, the type of the header portion is “RE-COPY” indicating that it is a retransmission copy message. The other contents of the header part are the same as those of a normal copy message.
The retransmission instruction message has only a header part and no data part. The type of the header portion of the retransmission instruction message is “RETX” representing the retransmission instruction message, and the address range of the header is the same address range as the previously transmitted stored data. As a result, the cache device on the receiving side can identify that it is a retransmission message for the previously received stored data.
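The two retransmission message kinds described above can be modeled as in the following sketch. Only the type values "RE-COPY" and "RETX" come from the text; the class name, the field names, and the encoding of an address range as a `(start, end)` tuple are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

AddressRange = Tuple[int, int]  # (start, end): this encoding is an assumption


@dataclass(frozen=True)
class RetransmissionMessage:
    msg_type: str                 # "RE-COPY" or "RETX", the type values named in the text
    address_range: AddressRange   # same range as the previously transmitted stored data
    data: Optional[bytes] = None  # present only for "RE-COPY"


def make_retransmission_copy(address_range: AddressRange, data: bytes) -> RetransmissionMessage:
    """Retransmission copy message: header part plus the stored data."""
    return RetransmissionMessage("RE-COPY", address_range, data)


def make_retransmission_instruction(address_range: AddressRange) -> RetransmissionMessage:
    """Retransmission instruction message: header part only, no data part."""
    return RetransmissionMessage("RETX", address_range)


def refers_to(msg: RetransmissionMessage, previous_range: AddressRange) -> bool:
    """True if the message concerns previously received stored data with this range."""
    return msg.address_range == previous_range
```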
For example, in FIG. 5A, the cache device 1 (cache management unit 11) transmits a retransmission instruction message to the cache device 2 via the communication line L after a lapse of random time. The cache device 2 (cache management unit 21) transmits a retransmission copy message including the stored data A2 to the cache device 1 via the communication line L after a lapse of random time.
When the retransmission instruction message transmitted by the cache device 1 is transmitted before the retransmission copy message transmitted by the cache device 2, the cache device 1 is given priority, and therefore the stored data A1 is treated as valid.
On the other hand, when the retransmission copy message transmitted by the cache device 2 is transmitted before the retransmission instruction message transmitted by the cache device 1, the cache device 2 has priority, and therefore the stored data A2 is treated as valid. In this case, since the cache device 1 does not have the stored data A2, the cache device 2 transmits the stored data A2 to the cache device 1 by a copy message or the like.
When the random time of the cache device 1 and the random time of the cache device 2 are the same, so that the retransmission instruction message transmitted by the cache device 1 and the retransmission copy message transmitted by the cache device 2 are transmitted and received simultaneously, a random time is counted again and the same process is repeated.
Similar processing is executed also in the case of FIGS. 5B and 5C.
In addition, before the random time elapses, another storage data having the same address range may be transmitted from the access host and received by the cache device 1 or 2. In this case, the cache device 1 or 2 that has received the other stored data transmits the other stored data to the other cache device again by a copy message.
Also by this third method, the collision state is resolved, and the cache devices 1 and 2 ensure the consistency and transparency of the stored data.
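The random-delay arbitration of the third method, including the repetition when both delays coincide, can be sketched as below. This is a simplified model under stated assumptions: delays are integers drawn by two hypothetical callables `draw1` and `draw2`, message transit time is ignored, and the device with the shorter delay is taken to have transmitted first.

```python
from typing import Callable, Tuple


def resolve_by_random_backoff(draw1: Callable[[], int],
                              draw2: Callable[[], int]) -> Tuple[int, int]:
    """Return (winner, rounds): winner is 1 or 2, the device that retransmitted first."""
    rounds = 0
    while True:
        rounds += 1
        delay1, delay2 = draw1(), draw2()
        if delay1 != delay2:
            return (1 if delay1 < delay2 else 2), rounds
        # Equal delays: the messages would cross, so both devices draw again.
```

In practice each callable could wrap an independent pseudo random number generator, for example `random.Random(seed).randrange` with a different seed on each device.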
When at least one of the stored data received from the access host by the cache device 1 and the stored data received from the access host by the cache device 2 has a plurality of bytes, a collision may occur in only part of the multi-byte stored data. In this case, the collision detection process and the recovery process are executed for the part where the collision occurs.
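Finding the part of two stored data items where a collision can occur amounts to intersecting their address ranges. The sketch below assumes half-open `(start, end)` ranges, which is an encoding chosen for illustration and not taken from the text; it returns only the overlapping range and leaves the comparison of contents to the caller.

```python
from typing import Optional, Tuple

AddressRange = Tuple[int, int]  # half-open (start, end): an assumed encoding


def overlapping_part(a: AddressRange, b: AddressRange) -> Optional[AddressRange]:
    """Return the address range shared by a and b, or None if they do not overlap."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start < end else None
```

For example, stored data covering addresses 0 to 8 and stored data covering addresses 4 to 12 share only the range 4 to 8, so collision detection and recovery would be limited to that part.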
<Read processing of stored data>
A stored data read process performed when the access host group 4 issues a read request for stored data to the cache system 3 will be described. Here, a case where the access host 4_i issues a read request for stored data to the cache device 1 will be described.
The access host 4_i transmits a read request including the address range of the stored data to be read to the input/output unit 13 of the cache device 1 via the access network 6. This read request is passed from the input/output unit 13 to the cache management unit 11.
The cache management unit 11 determines whether storage data in the address range included in the read request exists in the cache memory 12 based on the cache control table. Except for the storage data in the received state and the invalidated state, it is determined that the storage data in other states exists in the cache memory 12.
When all or part of the stored data corresponding to the address range is not stored in the cache memory 12, the cache management unit 11 reads the part not stored in the cache memory 12 from the corresponding secondary storage device of the secondary storage device group 5 via the input/output unit 14 and the storage network 7, and stores it in the cache memory 12. Along with this, the cache management unit 11 generates a cache control list for the stored data read from the secondary storage device into the cache memory 12, and adds it to the cache control table. Note that "clean" is written in the state of the cache control list.
Subsequently, the cache management unit 11 reads the stored data corresponding to the read request from the cache memory 12 and sends it to the access host 4_i via the input/output unit 13 and the access network 6.
At this time, in the present embodiment, even if the stored data read from the secondary storage device by the cache device 1 already exists in the cache memory 22 of the cache device 2, the stored data read from the secondary storage device and the stored data in the cache memory 22 are guaranteed to have the same content. Therefore, there is no need to exchange a confirmation message or the like between the cache devices 1 and 2 to confirm the consistency of the stored data. The communication overhead and communication cost of such an exchange are thereby avoided.
When the stored data that the access host 4_i requests to read does not exist in the cache memory 12 but exists in the cache memory 22, the cache management unit 11 may receive the stored data from the cache device 2 (cache memory 22).
If the time order of the stored data read and received by the access host 4_i is important, the time order can be guaranteed by using a general control synchronization mechanism or the like provided by the access host 4_i.
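The read path described in this section can be sketched as follows. The sketch is a simplification under several assumptions: the cache memory, the cache control table, and the secondary storage device are modeled as dictionaries keyed by a single address, the state names are spelled as they appear in the translated text, and the function name `read_range` is hypothetical.

```python
from typing import Dict, Tuple

# States for which stored data is treated as not present in the cache memory.
NOT_READABLE = {"received", "invalidated"}


def read_range(address_range: Tuple[int, int],
               cache: Dict[int, bytes],
               state: Dict[int, str],
               secondary: Dict[int, bytes]) -> Dict[int, bytes]:
    """Serve a read request, filling cache misses from secondary storage."""
    start, end = address_range  # half-open range: an assumed encoding
    result = {}
    for addr in range(start, end):
        hit = addr in cache and state.get(addr) not in NOT_READABLE
        if not hit:
            # Miss: read from the secondary storage device and record it as clean.
            cache[addr] = secondary[addr]
            state[addr] = "clean"
        result[addr] = cache[addr]
    return result
```

Note that no message is exchanged with the other cache device on this path, which reflects the point made above that no consistency confirmation is needed on reads.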
<Load distribution processing>
In the above description, the cache device that has received the stored data from the access host writes (flushes) the stored data to the secondary storage device, but this flush can also be distributed between the two cache devices.
For example, when the access frequency of one cache device to the secondary storage device is higher than that of the other cache device, the write processing can be shared by the other cache device. In addition, even when there is a difference in processing capability and performance between the two cache devices, it is possible to perform a large number of write processes with a cache device having high capability and performance. Thereby, load distribution is achieved between the cache devices 1 and 2.
In order to perform such load distribution, the cache management units 11 and 21 each measure their own load and periodically notify each other of the measured load by a control message (a control message different from the above-described copy message, confirmation response message, and the like). The measured load includes, for example, the number of writes to the secondary storage device group 5 and the amount of data written (number of bytes, number of blocks). The cache device with the lower load executes the flush.
FIG. 6 is a sequence diagram showing the flow of storage data write processing when load distribution is performed.
When the cache device 1 receives stored data from the access host 4_i, the cache management unit 11 generates a cache control list and stores the stored data in the cache memory 12 (S40), as in step S1 of FIG. 4.
Subsequently, the cache management unit 11 transmits a copy message to the cache device 2 (S41). Here, when the load on the cache device 1 is higher than the load on the cache device 2, the cache management unit 11 designates the cache device 2 as the flush authority in the header part of the copy message.
When the cache management unit 21 receives the copy message in which the cache device 2 is designated as the flush authority, the cache management unit 21 generates a cache control list, sets the state of the generated cache control list to the dirty state instead of the non-volatile state that is set in the normal case, and stores the stored data included in the copy message in the cache memory 22 (S42).
Subsequently, the cache management unit 21 returns a first confirmation response message to the cache device 1 (S8).
When the cache management unit 11 receives the first confirmation response message, the cache management unit 11 sets the state of the cache control list to the non-volatile state instead of the dirty state, and sends the host confirmation response message to the access host 4_i (S43).
After that, the cache management unit 21 sets the stored data in the dirty state to the flushing state at an appropriate timing and writes it to the secondary storage device (S44). The subsequent steps S45 to S48 are the same as the corresponding processing in steps S5 to S10 shown in FIG. 4, except that the cache management unit 11 changes the state of the cache control list from non-volatile to clean, and the cache management unit 21 changes the state of the cache control list from flushing to flushed msg and then to clean.
In this way, load distribution is achieved by transferring the authority to flush stored data in an appropriate address range to the other cache device.
In the above description, the other cache device is given the flush authority. However, the other cache device can be notified of the flush authority by a control message such as a copy message.
Also, the load value can be the ratio of the load to the performance of each cache device. For example, the cache device 1 divides its own load value by its own performance value (that is, (the load value of the cache device 1) / (the performance value of the cache device 1)) and transmits the division result to the cache device 2, and the cache device 2 likewise transmits its own division result to the cache device 1. The flush authority can then be given to the cache device having the lower of the two ratios.
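The ratio-based choice of flush authority can be sketched as below. The function name, the numeric types, and the rule that a tie goes to cache device 1 are assumptions; the text only states that the device with the lower load-to-performance ratio may be given the flush authority.

```python
def choose_flush_authority(load1: float, perf1: float,
                           load2: float, perf2: float) -> int:
    """Return 1 or 2: the cache device with the lower load/performance ratio."""
    ratio1 = load1 / perf1
    ratio2 = load2 / perf2
    # Assumption: on equal ratios, cache device 1 keeps the flush authority.
    return 1 if ratio1 <= ratio2 else 2
```

With a load of 100 on a device of performance 2.0 and a load of 80 on a device of performance 1.0, the ratios are 50 and 80, so the first device would flush even though its raw load is higher.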
<Fault monitoring and recovery processing>
As described above, the failure monitoring units 16 and 26 monitor the operation status of the other cache device and determine whether or not a failure has occurred. For example, the failure monitoring units 16 and 26 transmit failure detection messages to the other cache device via the communication line L at regular time intervals. If a failure monitoring unit does not receive a failure detection message from the other cache device even after a predetermined time has elapsed, it determines that a failure has occurred in the other cache device.
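The timeout check performed by each failure monitoring unit can be sketched as follows. The function and parameter names are hypothetical, and times are plain numbers in arbitrary units; how the predetermined time relates to the transmission interval is not specified in the text.

```python
def peer_has_failed(last_message_at: float, now: float, timeout: float) -> bool:
    """True if no failure detection message arrived within the predetermined time."""
    return (now - last_message_at) > timeout
```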
In the following, the recovery process when a failure is detected will be described, taking as an example the case where a failure occurs in the cache device 1 and the failure monitoring unit 26 detects this failure.
When a failure of the cache device 1 is detected, the stored data in the dirty state that the cache device 1 was going to write from the cache memory 12 to the secondary storage device group 5 must be written to the secondary storage device by the cache device 2.
Therefore, after the failure is detected, the failure monitoring unit 26 rewrites the state of the stored data in the non-volatile state to the dirty state in the cache control table held by the cache management unit 21 (that is, the cache control table for the stored data stored in the cache memory 22). As a result, the stored data changed to the dirty state is written (flushed) to the secondary storage device group 5 by the cache management unit 21.
For stored data in the flushing state that the cache device 1 was in the middle of writing to the secondary storage device group 5, it cannot be determined how much of the data has already been written. However, stored data in the flushing state in the cache device 1 is also stored in the cache memory 22 as stored data in the non-volatile state in the cache device 2. Therefore, by changing it from the non-volatile state to the dirty state in the cache device 2 as described above, the stored data in the flushing state is also reliably written to the secondary storage device by the cache device 2.
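The state rewrite performed by the surviving cache device can be sketched as follows. The cache control table is modeled as a list of dictionaries with hypothetical keys `address_range` and `state`; the state names follow the translated text.

```python
from typing import Dict, List, Tuple


def promote_after_peer_failure(control_table: List[Dict]) -> List[Tuple[int, int]]:
    """Rewrite non-volatile entries to dirty and return their address ranges."""
    promoted = []
    for entry in control_table:
        if entry["state"] == "non-volatile":
            # The failed peer can no longer flush this data, so this device must.
            entry["state"] = "dirty"
            promoted.append(entry["address_range"])
    return promoted
```

Entries already in the clean or dirty state are left unchanged; only the data whose flush was the responsibility of the failed device is rescheduled.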
When the cache device 1 recovers from a failure, or when the cache device 1 is replaced by another backup cache device, a situation may occur in which stored data in the dirty state exists in only one cache device. To prevent such a situation, the return timing of the cache device 1 is controlled so that the cache device 1 is not returned to service until all the stored data in the cache memory 22 is in the clean state. Alternatively, the stored data that was in the dirty state before the restoration of the cache device 1 may be transmitted to the cache device 1 after restoration.
When a failure occurs in the cache device 2, the failure monitoring unit 16 of the cache device 1 executes the same processing as the failure monitoring unit 26 described above.
Industrial applicability
The present invention can be used for a cache system arranged between an access host (group) such as a computer and a secondary storage device (group).
According to the present invention, each of the two cache devices independently receives and processes the stored data from the access host, so that almost twice the processing capacity of a single cache device can be obtained and the bottleneck of a single cache device can be eliminated.
In addition, since the same storage data is stored in both cache devices, the storage data in the cache system can be made non-volatile. As a result, even if a failure occurs in one of the cache devices, loss of stored data is prevented.
Furthermore, since the conflict resolution processing is executed between both cache devices, the consistency and transparency of stored data between both cache devices is ensured. Therefore, it is not necessary to check the consistency of the stored data in both cache devices every time the stored data is read, and the overhead for checking can be eliminated.
As a result, it is possible to achieve consistency and transparency of stored data while improving performance.
[Brief description of the drawings]
FIG. 1 is a block diagram showing the overall configuration of a secondary storage device access system using a cache system according to an embodiment of the present invention.
FIG. 2 shows a configuration example of a cache control table held by the cache management unit.
FIG. 3 shows an example of the data structure of a control message communicated between message communication units.
FIG. 4 is a sequence diagram showing a flow of a process of writing storage data transmitted from the access host to the secondary storage device group.
FIG. 5A to FIG. 5C are sequence diagrams showing the flow of processing for collision detection when a collision occurs between both cache devices.
FIG. 6 is a sequence diagram showing the flow of storage data write processing when load distribution is performed.
FIG. 7A to FIG. 7E show cases where the address ranges of two stored data overlap.

Claims

Each cache device in a cache system having two cache devices, storing data given from an access host or a secondary storage device, and storing data from the access host in the secondary storage device,
A data input unit for inputting first data provided from the access host;
A data receiving unit for receiving second data input from the access host by the other cache device and transmitted to the own cache device;
A cache storage unit that stores both or either of the first data and the second data;
A cache management unit for managing the cache storage unit;
A data transmission unit for transmitting the first data to the other cache device;
A data output unit for outputting the first data or the second data to the secondary storage device;
A cache device.

In claim 1,
When, before the data transmission unit completes transmission of the first data, the data input unit receives from the access host third data that has the same address range on the secondary storage device as the first data and differs in content from the first data, the cache management unit stores the third data in the cache storage unit as valid and treats the first data as invalid,
Cache device.

In claim 1 or 2,
A collision detection unit that, when the data receiving unit receives the second data between the time the data input unit inputs the first data and the time the data transmission unit completes transmission of the first data, determines the presence or absence of a collision between the first data and the second data based on the address ranges of both data on the secondary storage device and the contents of both data;
A collision detection message transmission unit that transmits a collision detection message indicating that a collision has occurred to the other cache device when the collision detection unit detects a collision;
A collision detection message receiving unit for receiving the collision detection message from the other cache device;
A cache device further comprising:

In claim 3,
The collision detection unit also detects a collision when the collision detection message receiving unit receives the collision detection message from the other cache device,
Cache device.

In claim 3 or 4,
When the collision detection unit detects a collision, the cache management unit treats as valid whichever of the first data and the second data is prioritized based on a predetermined priority order, and treats the other as invalid,
Cache device.

In claim 3 or 4,
When the collision detection unit detects a collision, the cache management unit, of the time at which the first data was input to the data input unit and the time at which the second data was input to the data input unit of the other cache device, treats the data with the earlier time as valid and treats the data with the later time as invalid,
Cache device.

In claim 3 or 4,
A data/message retransmission unit that, when the collision detection unit detects a collision, transmits the first data or a retransmission message indicating retransmission of the first data after a random time has elapsed;
A data/message receiving unit that receives the second data or a retransmission message of the second data transmitted from the data/message retransmission unit of the other cache device;
are further provided, and
The cache management unit treats as valid the data corresponding to the earlier of the retransmission time by the data/message retransmission unit and the reception time by the data/message receiving unit, and treats the data corresponding to the later time as invalid,
Cache device.

In claim 7,
The data / message retransmission unit repeats retransmission after a lapse of random time if the retransmission time by the data / message retransmission unit and the reception time by the data / message reception unit are the same time,
Cache device.

In any one of claims 1 to 8,
A flush message transmission unit that, after the data output unit outputs the first data or the second data to the secondary storage device, transmits a flush message indicating that the output is complete to the other cache device;
A flush message receiving unit for receiving a flush message transmitted from the other cache device;
A cache device further comprising:

In claim 9,
The cache management unit treats the first data or the second data corresponding to the flush message as invalid after the flush message is transmitted by the flush message transmission unit or after the flush message receiving unit receives the flush message,
Cache device.

In any one of claims 1 to 10,
The data transmission unit transmits first flush authority information indicating which cache device performs output of the first data to the secondary storage device together with the first data;
The data receiving unit receives, together with the second data, second flush authority information indicating which cache device performs output of the second data to the secondary storage device,
The data output unit outputs the first data to the secondary storage device when the first flush authority information indicates the own cache device, and the second flush authority information is the own cache device. If the device is indicated, the second data is output to the secondary storage device.
Cache device.

In claim 11,
A load information transmission unit that measures the load of the own cache device and transmits the measured load value to the other cache device;
A load information receiving unit for receiving a load value of the other cache device transmitted from the other cache device;
A flush authority setting unit that compares the load value of the own cache device with the load value of the other cache device and sets the flush authority for the first data or the second data in the cache device having the smaller load value;
A cache device further comprising:

In claim 12,
A cache device, wherein the value of the load is a ratio of the load to the performance of each cache device.

In any one of claims 1 to 13,
Further comprising a failure monitoring unit that monitors the other cache device for the occurrence of a failure and, when the occurrence of a failure is detected, controls the data output unit to output to the secondary storage device those of the first data stored in the cache storage unit whose output to the secondary storage device by the data output unit or by the data output unit of the other cache device has not been completed and whose transmission to the other cache device by the data transmission unit has been completed, and those of the second data stored in the cache storage unit that have not been output to the secondary storage device by the data output unit or by the data output unit of the other cache device,
Cache device.

In any one of claims 1 to 14,
A read request input unit for receiving a data read request including an address range of the secondary storage device from the access host;
A data reading unit that reads the data corresponding to the address range included in the read request from the cache storage unit when the data is stored as valid data in the cache storage unit, and reads the data from the secondary storage device when the data does not exist in the cache storage unit or is not stored as valid data;
A read data transmission unit for transmitting data read by the data read unit to the access host;
A cache device further comprising:

In claim 15,
The data management unit stores the data read from the secondary storage device by the data reading unit as valid data in the cache storage unit;
Cache device.

A cache system having two cache devices, storing data supplied from an access host or a secondary storage device, and storing data from the access host in the secondary storage device;
Each of the two cache devices has:
A data input unit for inputting first data provided from the access host;
A data receiving unit for receiving second data input from the access host by the other cache device and transmitted to the own cache device;
A cache storage unit that stores both or either of the first data and the second data;
A cache management unit for managing the cache storage unit;
A data transmission unit for transmitting the first data to the other cache device;
A data output unit for outputting the first data or the second data to the secondary storage device;
Cache system.

A cache method in a cache system having two cache devices, storing data provided from an access host or a secondary storage device, and storing data from the access host in the secondary storage device,
One cache device that has received data from the access host stores the data in its own cache memory, and transmits the data to the other cache device,
The other cache device receives the data transmitted from the one cache device, stores the data in its own cache memory,
The one cache device or the other cache device outputs the data to the secondary storage device;
Cache method.