JP3614946B2

JP3614946B2 - Memory buffer device

Info

Publication number: JP3614946B2
Application number: JP23765495A
Authority: JP
Inventors: 淳河井
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 1995-08-23
Filing date: 1995-08-23
Publication date: 2005-01-26
Anticipated expiration: 2015-08-23
Also published as: JPH0962571A

Description

【０００１】
[Technical field of the invention]
The present invention relates to a memory buffer device for memory reads and memory writes in a computer.
【０００２】
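[Prior art]
In conventional computers, memory operations basically read data (load instruction execution) and write data (store instruction execution) at arbitrary memory addresses in units of one word, a half word, or a quarter word. A small-scale computer therefore provides a one-word memory register: when a load instruction is executed, the data read from the memory device is stored in this register and the CPU (central processing unit) takes it in; when a store instruction is executed, the data to be written is first stored in the register and then written to the memory device.
As a result, when a load instruction is executed, the memory access time is reflected directly in the instruction execution time for every CPU request; when a store instruction is executed, the time taken to transfer the write data from the CPU to the register is reflected in the instruction execution time. However, if a load or store instruction is executed immediately after a store instruction, the next memory operation must wait for the duration of the memory write caused by the preceding store instruction.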
【０００３】
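A large-scale computer provides a cache memory and executes load and store instructions through it. The CPU exchanges data with the cache memory one word at a time, while the memory device exchanges with the cache memory multiple words of data located at consecutive memory addresses (hereinafter, a data block). Consequently, when the data requested by a load instruction is present in the cache memory (hereinafter, a load cache hit), the load instruction completes in a short time thanks to the fast response from the cache memory; and when the data block containing the data to be written by a store instruction is present in the cache memory (hereinafter, a store cache hit), the store instruction can be completed simply by writing the data into the cache memory.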
【０００４】
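When a load instruction does not produce a load cache hit (hereinafter, a load cache miss), or a store instruction does not produce a store cache hit (hereinafter, a store cache miss), the access time of the memory device is reflected in the instruction execution time or in the immediately following memory operation, just as in a small-scale computer.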
【０００５】
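[Problems to be solved by the invention]
As described above, in memory operations on conventional computers, a small-scale computer suffers a large performance loss from the execution time of memory operation instructions: the access time of the memory device is reflected directly in the execution time of every load instruction, and a load or store instruction that follows a store instruction must wait for the memory write caused by that store. Moreover, because memory is operated on one word at a time, the improvement in memory throughput offered by multi-word operations, such as the fast page access provided in DRAM devices, goes unused.
In particular, with recent CPUs running at high operating frequencies, the gap between the execution time of instructions executed entirely inside the CPU and that of memory operation instructions tends to widen, which is why program execution performance fails to improve despite higher CPU operating frequencies.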
【０００６】
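A large-scale computer, on the other hand, bridges this gap between CPU operation and memory operation by providing a cache memory. However, building the cache memory into the same LSI as the CPU increases chip size, chip cost, and power consumption, while placing the cache memory outside the CPU LSI increases device cost, device volume, and power consumption.
Furthermore, as CPU operating frequencies rise, the access time of the cache memory cannot keep up, so the number of clock cycles needed to execute memory operation instructions increases even though a cache memory is installed.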
【０００７】
For these reasons, there has been a demand for a memory buffer device that lowers cost and power consumption while improving program execution performance.
【０００８】
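[Means for solving the problems]
To solve the above problems, the memory buffer device of the first invention comprises a load buffer unit for reading data from a memory device and a store buffer unit for writing to the memory device. The load buffer unit consists of a load data buffer unit having a plurality of independently readable and writable word registers for storing one memory block of data, and a load buffer control unit which, in response to a data read request from the central processing unit to the memory device, returns the requested data to the central processing unit if it is present in one of the registers and, if it is present in none of the registers, reads from the memory device the data from the memory address of the requested data continuously through the last address of the same memory block and stores them sequentially in the registers. The store buffer unit consists of a store data buffer unit having a first-in first-out buffer that stores a plurality of words as one memory block of data, and a store buffer control unit which, in response to a write request from the central processing unit to the memory device, updates the requested data and moves its storage position to the tail if that data is present in the first-in first-out buffer; stores the requested data at the tail if data of the same memory block is present; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the requested data, writes all the data stored in the first-in first-out buffer to the memory device and stores the requested data at the head position of the first-in first-out buffer.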
【０００９】
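With the memory buffer device of the first invention configured in this way, the load buffer unit responds to a data read request from the central processing unit to the memory device by returning the requested data to the central processing unit if it is present in one of the registers. If the data is present in none of the registers, the load buffer unit reads from the memory device the data from the memory address of the requested data continuously through the last address of the same memory block and stores them sequentially in the registers.
The store buffer unit responds to a write request from the central processing unit to the memory device as follows. If the requested data is present in the first-in first-out buffer, it is updated and its storage position is moved to the tail.
If data of the same memory block as the requested data is present in the first-in first-out buffer, the requested data is stored at the tail.
If the data in the first-in first-out buffer belongs to a memory block different from that of the requested data, all the data stored in the first-in first-out buffer is written to the memory device and the requested data is stored at the head position of the first-in first-out buffer.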
【００１０】
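To solve the above problems, the memory buffer device of the second invention comprises a load buffer unit for reading data from a memory device and a store buffer unit for writing to the memory device. The load buffer unit consists of a load data buffer unit having a plurality of independently readable and writable word registers for storing one memory block of data, and a load buffer control unit which, in response to a data read request from the central processing unit to the memory device, returns the requested data to the central processing unit if it is present in one of the registers and, if it is present in none of the registers, reads from the memory device the data of the entire memory block containing the memory address of the requested data and stores them sequentially in the registers. The store buffer unit consists of a store data buffer unit having a first-in first-out buffer that stores a plurality of words as one memory block of data, and a store buffer control unit which, in response to a write request from the central processing unit to the memory device, updates the requested data and moves its storage position to the tail if that data is present in the first-in first-out buffer; stores the requested data at the tail if data of the same memory block is present; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the requested data, writes all the data stored in the first-in first-out buffer to the memory device and stores the requested data at the head position of the first-in first-out buffer.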
【００１１】
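With the memory buffer device of the second invention configured in this way, the load buffer unit responds to a data read request from the central processing unit to the memory device by returning the requested data to the central processing unit if it is present in one of the registers. If the data is present in none of the registers, the load buffer unit reads from the memory device the data of the entire memory block containing the memory address of the requested data and stores them sequentially in the registers.
The store buffer unit operates in the same way as in the first invention.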
【００１２】
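To solve the above problems, the memory buffer device of the third invention is the memory buffer device of the first or second invention further characterized by a load buffer control unit which, in response to a data write request from the central processing unit to the memory device, updates the requested data if it is present in one of the registers in the load data buffer unit.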
【００１３】
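With the third invention configured in this way, when the central processing unit issues a data write request to the memory device and the requested data is present in one of the registers in the load data buffer unit, the data is not written to the memory device each time; only the data in the load data buffer unit is updated.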
【００１４】
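[Embodiments of the invention]
Embodiments of the present invention are described in detail below with reference to the drawings.
《Embodiment 1》
[Configuration]
FIG. 1 is a block diagram showing Embodiment 1 of the memory buffer device of the present invention, and also Embodiment 2 described later.
The figure shows the load buffer unit 100 and the store buffer unit 200, which make up the memory buffer device, together with the central processing unit (hereinafter, CPU) 300 and the memory device 400.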
【００１５】
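The load buffer unit 100 operates mainly when the CPU 300 executes a load instruction (memory read instruction), and the store buffer unit 200 operates when the CPU 300 executes a store instruction (memory write instruction).
Specifically, the load buffer unit 100 consists of a load data buffer unit 101 having a plurality of independently readable and writable word registers for storing one memory block of data, and a load buffer control unit 102 which, in response to a memory read request from the CPU 300, returns the requested data to the CPU 300 if it is present in one of the registers and, if it is present in none of them, reads from the memory device 400 the data from the memory address of the requested data continuously through the last address of the same memory block and stores them sequentially in the registers.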
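The hit and miss behavior of the load buffer unit 100 described above can be sketched in software as follows. This is a minimal behavioral model under stated assumptions, not the circuit itself: the class name `LoadBuffer`, the list-based `memory`, and the `memory_reads` counter are hypothetical names introduced for illustration, and a block of four words is assumed as in Embodiment 1.

```python
class LoadBuffer:
    """Behavioral sketch of the Embodiment 1 load buffer (hypothetical model)."""

    WORDS = 4  # words per memory block, as in Embodiment 1

    def __init__(self, memory):
        self.memory = memory              # list of words indexed by word address
        self.tag = None                   # block address of the buffered block
        self.valid = [False] * self.WORDS # one validity flag per register
        self.data = [0] * self.WORDS      # the word registers
        self.memory_reads = 0             # accesses made to the memory device

    def load(self, addr):
        block, offset = divmod(addr, self.WORDS)
        if self.tag == block and self.valid[offset]:
            return self.data[offset]      # hit: no memory access
        # Miss: restart the buffer on this block, then read from the
        # missed word through the last word of the same block.
        self.tag = block
        self.valid = [False] * self.WORDS
        for i in range(offset, self.WORDS):
            self.data[i] = self.memory[block * self.WORDS + i]
            self.valid[i] = True
            self.memory_reads += 1
        return self.data[offset]
```

In this model a load of the third word of a block costs two memory reads, and a later load of the fourth word of the same block costs none.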
【００１６】
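When a load instruction is executed, the CPU 300 asserts the request signal to the load buffer unit 100 (outputs "1"), negates the read/write signal (outputs "0"), and outputs the address set to the desired memory address. In response, the load buffer unit 100 asserts the acknowledge signal and outputs the load data requested by the CPU 300 to the CPU 300. When the load instruction requires data to be read from the memory device 400, the load buffer unit 100 asserts the memory request to the memory device 400, negates the memory read/write signal, and outputs the memory address set to the desired memory address.
In response, the memory device 400 asserts the memory acknowledge signal and outputs the read data requested by the load buffer unit 100 to the load buffer unit 100.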
【００１７】
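The store buffer unit 200 comprises a store data buffer unit 201 having a first-in first-out buffer that stores a plurality of words as one memory block of data, and a store buffer control unit 202 which, in response to a write request from the CPU 300 to the memory device 400, updates the requested data and moves its storage position to the tail if that data is present in the first-in first-out buffer; stores the requested data at the tail if data of the same memory block is present; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the requested data, writes all the data stored in the first-in first-out buffer to the memory device 400 and stores the requested data at the head position of the first-in first-out buffer.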
【００１８】
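When a store instruction is executed, the CPU 300 asserts the request signal to the store buffer unit 200 (outputs "1"), asserts the read/write signal (outputs "1"), outputs the data to be written, i.e., the store data, and outputs the address set to the desired memory address. In response, the store buffer unit 200 asserts the acknowledge signal and outputs it to the CPU 300. When the store instruction requires data to be written to memory, the store buffer unit 200 asserts the memory request to the memory device 400, asserts the memory read/write signal, outputs the write data set to the data to be written, and outputs the memory address set to the desired memory address. In response, the memory device 400 asserts the memory acknowledge signal and outputs it to the load buffer unit 100.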
【００１９】
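FIG. 2 is a block diagram of the load buffer unit 100.
As described above, the load buffer unit 100 consists of the load data buffer unit 101 and the load buffer control unit 102.
In Embodiment 1, the load data buffer unit 101 consists of four one-word data registers (LB0 to LB3), a load buffer input selector 101a, and a load buffer output selector 101b. In the present invention, however, the number of data words is not limited to four and may be any number of one or more.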
【００２０】
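In Embodiments 1 and 2, the load data buffer unit 101 can temporarily hold four words of memory data, and stores either one word read from the memory device 400 or two to four words read consecutively. When the CPU 300 executes a load instruction and the data held in data registers LB0 to LB3 includes a copy of the data stored at the memory address specified by the address from the CPU 300, no data is read from the memory device 400. Instead, the load buffer output selector 101b selects, from the outputs of the four data registers LB0 to LB3, the one holding the data specified by the CPU 300, and outputs it to the CPU 300 as load data.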
【００２１】
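When a store instruction is executed and data has previously been read from the same memory address as that of the store data and is held in data registers LB0 to LB3, the new store data must also be stored in the corresponding register to keep the data consistent. In this embodiment this is called the store bypass operation. The load buffer input selector 101a selects either the read data from a normal read of the memory device 400 or the store data from the CPU 300 during the store bypass operation, and outputs it to data registers LB0 to LB3.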
【００２２】
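The load buffer control unit 102 handles the input and output of control signals to and from the CPU 300 and the memory device 400, and also controls the load data buffer unit 101. It outputs to the load data buffer unit 101 a store bypass signal, LB0 to LB3 set signals, and a load address offset signal. The store bypass signal is the input select signal for the load buffer input selector 101a: when it is "0" the read data is selected, and when it is "1" the store data is selected, for output to LB0 to LB3. The LB0 to LB3 set signals are the data set signals for data registers LB0 to LB3, respectively; the input data is newly stored on the rising edge of the signal. The lower 2 bits of the address from the CPU 300 (hereinafter, the small address) serve as the select signal for the load buffer output selector 101b and select one of the LB0 to LB3 outputs.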
【００２３】
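FIG. 3 is a block diagram of the load buffer control unit 102.
FIGS. 4 and 5 are block diagrams of the load data buffer valid flag unit and the LB0 to LB3 set signal generation unit in FIG. 3, respectively.
The load buffer control unit 102 handles the input and output of control signals to and from the CPU 300 and the memory device 400 and controls the load data buffer unit 101. It consists of a memory address match check unit 102a, a load data buffer valid flag (hereinafter, V flag) unit 102b, an LB0 to LB3 set signal generation unit 102c, and a sequence control unit.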
【００２４】
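Specifically, the load buffer control unit 102 shown in these figures comprises AND circuits 1 to 21, OR circuits 31 to 35, buffers 41 to 44, a block address buffer 45, an offset address buffer 46, a tag register 51, V0 register (V0) to V3 register (V3), a memory request register 52, a small address selector 61, a V flag selector 62, a small address decoder 71, an offset address decoder 72, a small address decoder 73, an offset address counter 81, and a comparator 91.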
【００２５】
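In FIG. 3, A1 denotes acknowledge, A2 the large address (block address), A3 read/write, A4 request, and A5 the small address. B1 denotes the memory block address, B2 the memory offset address, B3 memory request, B4 memory read/write, B5 memory busy, B6 store bypass, B7 memory acknowledge, and B8 to B11 the LB0 set to LB3 set signals.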
【００２６】
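The memory address match check unit 102a consists of the tag register 51 and the comparator 91.
The tag register 51 temporarily stores the signal obtained by removing the lower 2 bits from the address input from the CPU 300 (the large address), and holds the block address at which the one to four words of data stored in the load data buffer unit 101 are located in the memory device 400. The comparator 91 is an Ex-NOR circuit; it compares the large address (block address) stored in the tag register 51 with the value obtained by removing the lower 2 bits from the address input from the CPU 300, and checks whether the two match. In other words, when a load or store instruction is executed, it checks whether a copy of some or all (four words) of the data located at the block address input from the CPU 300 is stored in the load data buffer unit 101. Hereinafter, the case where this check finds a match is called a load data buffer hit, and the opposite case a load data buffer miss.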
【００２７】
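As shown in FIG. 4, the V flag unit 102b consists of the V0 to V3 registers (V0 to V3), the small address selector 61, the small address decoder 71, AND circuits 3 to 6, and the V flag selector 62.
The V0 to V3 registers hold the V0 to V3 flags, which indicate whether each copy of memory data stored in LB0 to LB3 of the load data buffer unit 101 is valid. On a load data buffer hit, if the V flag corresponding to the data register specified by the lower 2 bits of the address input from the CPU 300 (the small address), i.e., one of LB0 to LB3, is "1", the contents of that data register are valid.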
【００２８】
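Conversely, if the corresponding V flag is "0", the data requested by the CPU 300 is not stored in data registers LB0 to LB3 even on a load data buffer hit, and the memory device 400 must be accessed to read it. The small address selector 61 selects the 2-bit select signal that picks which of the V0 to V3 registers is to have its V flag set: when data read from the memory device 400 is stored in data registers LB0 to LB3, it selects the output of the offset address counter 81 (in the sequence control unit), and during the store bypass operation it selects the small address input from the CPU 300. The small address decoder 71 decodes the output of the small address selector 61. AND circuits 3 to 6 each take the logical AND of one output of the small address decoder 71 (exactly one of its four outputs is "1") and the output of OR circuit 31 in the sequence control unit (described later); their outputs are connected to the set inputs of the V0 to V3 registers, respectively.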
【００２９】
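When the output of OR circuit 31 is asserted, exactly one of the outputs of AND circuits 3 to 6 becomes "1", and on the rising edge of that signal the corresponding one of the V0 to V3 registers is set to "1". The other three registers keep their previous contents. The reset inputs of the V0 to V3 registers are connected to the output of AND circuit 9 in the sequence control unit (the set input of the memory request register 52, described later); on the rising edge at which the output of AND circuit 9 is asserted, the contents of V0 to V3 all become "0".
The V flag selector 62 selects and outputs one of the four outputs of the V0 to V3 registers, using the small address input from the CPU 300 as the select signal. On a load data buffer hit, if the output of the V flag selector 62 is "1", a copy of the memory data requested by the load instruction of the CPU 300 is stored in the corresponding register of the load data buffer unit 101.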
【００３０】
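As shown in FIG. 5, the LB0 to LB3 set signal generation unit 102c consists of the offset address decoder 72, the small address decoder 73, AND circuits 14 to 21, and OR circuits 32 to 35, and generates the data set signals for load data buffers LB0 to LB3.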
【００３１】
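The offset address decoder 72 decodes the 2-bit output of the offset address counter 81 in the sequence control unit. The offset address counter 81 supplies the lower 2 bits of the memory address when the memory device 400 is read, and indicates the position within the memory block from which data is actually read (hereinafter, the memory offset).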
【００３２】
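AND circuits 14 to 17 each take the logical AND of one output of the offset address decoder 72 (exactly one of its four outputs is "1") and the output of AND circuit 11 in the sequence control unit (described later), and generate the set signals for storing the read data from the memory device 400 in the corresponding one of data registers LB0 to LB3.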
【００３３】
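The small address decoder 73 decodes the 2-bit small address input from the CPU 300 and selects the one of data registers LB0 to LB3 that corresponds to the data write position within the memory block (the memory offset) during the store bypass operation.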
【００３４】
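AND circuits 18 to 21 each take the logical AND of one output of the small address decoder 73 (exactly one of its four outputs is "1") and the output of AND circuit 13 in the sequence control unit (described later), and generate the set signals for bypassing the store data input from the CPU 300 into the corresponding one of data registers LB0 to LB3 during the store bypass operation.
OR circuits 32 to 35 each take the logical OR of the outputs of AND circuits 14 to 17 and AND circuits 18 to 21, and their outputs are supplied to data registers LB0 to LB3 as the LB0 to LB3 set signals, respectively.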
【００３５】
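The sequence control unit handles the input and output of control signals to and from the CPU 300 and the memory device 400, controls read operations on the memory device 400, and controls the operation of the load data buffer unit 101. It consists of AND circuits 1, 2, and 7 to 13, the memory request register 52, buffers 41 to 44, the block address buffer 45, the offset address buffer 46, OR circuit 31, and the offset address counter 81 shown in FIG. 3.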
【００３６】
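In the sequence control unit, AND circuit 1 first takes the logical AND of the request signal from the CPU 300 and the inverse of the read/write signal to generate the read request signal. The read/write signal indicates a read when "0" and a write when "1". Accordingly, when the read request signal is "1", the unit recognizes that the CPU 300 is executing a load instruction.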
【００３７】
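AND circuit 2 takes the logical AND of the request signal from the CPU 300, the read/write signal, and the output of comparator 91 in the memory address match check unit 102a (hereinafter, the tag match signal) to generate the store bypass signal. When the store bypass signal is asserted, the store bypass operation is performed. That is, the store bypass operation is performed because a write is indicated and the block addresses match.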
【００３８】
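AND circuit 7 takes the logical AND of the output of comparator 91 in the address match check unit 102a and the valid signal, which is the output of the V flag selector 62 in the V flag unit 102b. When the output of AND circuit 7 is "1", a copy of the data in the memory device 400 located at the address input from the CPU 300 is present in the corresponding one of data registers LB0 to LB3. That is, there is a load data buffer hit and the V flag is "1".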
【００３９】
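AND circuit 8 takes the logical AND of the output of AND circuit 7 and the output of AND circuit 1 (the read request signal). When the output of AND circuit 8 is "1", no data is read from the memory device 400; this signal is driven by buffer 41 and output to the CPU 300 as the acknowledge signal. At this time, the load buffer output selector 101b in the load buffer unit 100 of FIG. 2 selects the output of the data register, among LB0 to LB3, that is specified by the small address input from the CPU 300, and outputs it to the CPU 300 as load data.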
【００４０】
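The acknowledge signal and the request signal form a handshake. The CPU 300 asserts the request signal and the memory buffer device performs its operation. When the operation completes, the memory buffer device asserts the acknowledge signal. Seeing the acknowledge signal asserted, the CPU 300 negates the request signal. Seeing the request signal negated, the memory buffer device negates the acknowledge signal. The CPU 300 asserts the next request only while the acknowledge signal is negated.
In this way, the request signal and the acknowledge signal are used to synchronize the operation of the CPU 300 and the memory buffer device for each one-word data access.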
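The four phases of this request/acknowledge handshake can be written out as a short trace generator. This is only an illustrative sketch: the function name `handshake_trace` and the representation of signal levels as `(request, acknowledge)` pairs are assumptions, and timing between phases is ignored.

```python
def handshake_trace(num_words):
    """Return the (request, acknowledge) levels for num_words one-word accesses."""
    trace = []
    for _ in range(num_words):
        trace.append((1, 0))  # CPU asserts request; buffer device starts work
        trace.append((1, 1))  # work complete: buffer device asserts acknowledge
        trace.append((0, 1))  # CPU sees acknowledge and negates request
        trace.append((0, 0))  # buffer device then negates acknowledge
    return trace
```

Each access returns both signals to the negated state before the next request, which is why the two signals toggle once per one-word access.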
【００４１】
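AND circuit 9 takes the logical AND of the inverse of the output of AND circuit 7, the read request signal output by AND circuit 1, and the inverse of the memory busy signal. Its output indicates that the data to be read by the load instruction from the CPU 300 is not present in the load data buffer unit 101 and that the memory device 400 is currently accessible.
The memory busy signal is shared by the load buffer unit 100 and the store buffer unit 200, and is asserted while either of them is accessing the memory device 400. On the rising edge of the output of AND circuit 9, the memory request register 52 is set to "1" (hereinafter, being set to "1" is simply called being set, and being set to "0" is called being reset). That is, because of the load data buffer miss, the data requested by the CPU 300 must be read from the memory device 400, and setting the memory request register 52 starts the access to the memory device 400.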
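The combinational conditions formed by AND circuits 1, 2, 7, 8, and 9 can be summarized as Boolean expressions. The sketch below is an illustrative restatement, assuming each signal is modeled as a Python `bool`; the function name and the dictionary keys are hypothetical labels, not names used in the figures.

```python
def load_control_signals(request, read_write, tag_match, valid, memory_busy):
    """Combinational sketch of AND circuits 1, 2, 7, 8 and 9 (read_write: True = write)."""
    read_request = request and not read_write             # AND circuit 1
    store_bypass = request and read_write and tag_match   # AND circuit 2
    hit = tag_match and valid                              # AND circuit 7
    acknowledge = hit and read_request                     # AND circuit 8
    # AND circuit 9: a read that missed, with the memory device free
    start_read = read_request and not hit and not memory_busy
    return {
        "read_request": read_request,
        "store_bypass": store_bypass,
        "hit": hit,
        "acknowledge": acknowledge,
        "start_read": start_read,
    }
```

Note that a tag match with the V flag at "0" is treated the same as a tag mismatch: `hit` is false, so a memory read is started once the memory device is not busy.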
【００４２】
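The output of AND circuit 9 is also connected to the clock input of the tag register 51, the reset inputs of the V0 to V3 registers, and the load input of the offset address counter 81. When the output of AND circuit 9 is asserted, i.e., when the memory request register 52 is set, the tag register 51 newly stores the large address input from the CPU 300.
While the M request signal output by the memory request register 52 is asserted, the output of the tag register 51 is driven by the block address buffer 45 and output to the memory device 400 as the memory block address. The V0 to V3 registers are all cleared to "0" when a new memory access to the memory device 400 starts, i.e., when the output of AND circuit 9 is asserted.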
【００４３】
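When the output of AND circuit 9 is asserted, the offset address counter 81 is loaded with the 2-bit small address input from the CPU 300. The output of the offset address counter 81 is driven by the offset address buffer 46 and output to the memory device 400 as the memory offset address. The same signal is input to the offset address decoder 72, where it selects which of LB0 to LB3 receives the read data from the memory device 400 in the load data buffer unit 101. It is also input to the small address selector 61, where it selects which of the V0 to V3 registers has its V flag set when a read access to the memory device 400 is performed.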
【００４４】
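The M request signal output by the memory request register 52 is ANDed in AND circuit 12 with the inverse of the memory acknowledge signal, driven by buffer 42, and output to the memory device 400 as the memory request signal. The memory acknowledge signal is input from the memory device 400; when it is asserted, the requested operation in the memory device 400 has completed. The memory request signal is shared by the load buffer unit 100 and the store buffer unit 200, and indicates that one of them is requesting access to the memory device 400.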
【００４５】
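The memory request signal and the memory acknowledge signal form a handshake. The memory buffer device (load buffer unit 100 or store buffer unit 200) asserts the memory request signal and the memory device 400 performs its operation. When the operation completes, the memory device 400 asserts the memory acknowledge signal. Seeing the memory acknowledge signal asserted, the memory buffer device negates the memory request signal. Seeing the memory request signal negated, the memory device 400 negates the memory acknowledge signal. The memory buffer device asserts the next memory request only while the memory acknowledge signal is negated. In this way, the memory request signal and the memory acknowledge signal are used to synchronize the operation of the memory buffer device and the memory device 400 for each one-word data access. These two signals therefore toggle on and off every memory cycle.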
【００４６】
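The inverse of the M request signal is driven by buffer 43 and output to the memory device 400 as the memory read/write signal. The memory read/write signal is "0" while the M request signal is "1". The M request signal is also driven by buffer 44 to assert the memory busy signal. The memory read/write signal and the memory busy signal do not toggle every memory cycle; they remain driven for as long as the M request signal is asserted. They stay asserted for one to four memory cycles so that consecutive accesses within a memory block can be performed between the memory buffer device and the memory device 400.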
【００４７】
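The M request signal is also connected to an input of AND circuit 11 and to the select input of the small address selector 61. AND circuit 11 takes the logical AND of the M request signal and the memory acknowledge signal. Its output is the read acknowledge signal; when asserted, it indicates that the read access in the memory device 400 has completed. The small address selector 61 selects which one of the V0 to V3 registers is to be set: during a read access to the memory device 400 it selects the output of the offset address counter 81, and during a store bypass it selects the small address input from the CPU 300, and it outputs the selection to the small address decoder 71. The M request signal is used as the select signal for this purpose.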
【００４８】
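The read acknowledge signal output by AND circuit 11 is connected to an input of AND circuit 10, an input of OR circuit 31, the clock input of the offset address counter 81, and inputs of AND circuits 14 to 17. AND circuit 10 takes the logical AND of the read acknowledge signal and output 3 of the offset address decoder 72 (asserted when the input is "11"). The output of AND circuit 10 is the read done signal; its assertion indicates that the consecutive read accesses to the memory device 400 have completed, starting from the data requested by the CPU 300 and continuing through the highest memory address of that memory block, i.e., the address whose lower 2 bits are "11".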
【００４９】
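The read done signal is input to the reset input of the memory request register 52, and the memory request register 52 is reset on the rising edge of this signal. Its output, the M request signal, is then negated.
OR circuit 31 takes the logical OR of the read acknowledge signal and the output of AND circuit 13. The output of OR circuit 31 is input to AND circuits 3 to 6 of the V flag unit 102b and provides the timing at which the set inputs of the V0 to V3 registers are asserted.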
【００５０】
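The read acknowledge signal output by AND circuit 11 is used as the V flag set timing signal when a read access to the memory device 400 is performed. The offset address counter 81 generates the offset address output to the memory device 400 and is also used to count the number of cycles during consecutive memory accesses. In Embodiment 1, the memory access reads consecutively the data located from the memory address that caused the load buffer miss through the highest address of that memory block, and stores them in the load data buffer unit 101.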
【００５１】
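AND circuit 13 takes the logical AND of the store bypass signal and the acknowledge signal. The acknowledge signal in this case is driven and asserted by the store buffer unit 200 when a store instruction is executed. The output of AND circuit 13 is connected to an input of OR circuit 31 and to inputs of AND circuits 18 to 21 in the LB0 to LB3 set signal generation unit 102c. At OR circuit 31 it serves as the V flag set timing signal for a store bypass. At AND circuits 18 to 21 it serves as the timing signal for asserting the LB0 to LB3 set signals that bypass the store data input from the CPU 300 into the load data buffer unit 101 during a store bypass.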
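The effect of the store bypass on the load data buffer can be illustrated with a few lines of code. This is a behavioral sketch only, assuming a four-word block and modeling the registers and V flags as Python lists; the function name `store_bypass` and its parameters are hypothetical.

```python
def store_bypass(tag, valid, data, addr, value, words=4):
    """Mirror a CPU store into the load data buffer when the block addresses match.

    Returns True if the bypass took place, False on a tag mismatch.
    """
    block, offset = divmod(addr, words)
    if tag != block:
        return False          # tag mismatch: store bypass signal is not asserted
    data[offset] = value      # LBn set signal (via AND circuits 18 to 21)
    valid[offset] = True      # Vn set signal (timing from OR circuit 31)
    return True
```

Because the V flag is set as part of the bypass, a word written by a store becomes readable from the load data buffer even if it had not previously been fetched from memory.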
【００５２】
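FIG. 6 is a block diagram of the store buffer unit 200.
As described above, the store buffer unit 200 consists of the store data buffer unit 201 and the store buffer control unit 202.
The store data buffer unit 201 is configured as a queue (FIFO: first-in first-out memory). In this embodiment it consists of a queue section, in which four one-word data registers (SB0 to SB3), four small address registers (2 bits each: SB0A to SB3A), and three data selectors (SB1B to SB3B) are connected in series to hold store data together with the lower 2 bits of its destination memory address, i.e., the small address, and matchers SB0C to SB3C, which compare the small address input from the CPU 300 with the small addresses already queued in small address registers SB0A to SB3A. In the present invention, however, the number of data words is not limited to four and may be any number of one or more.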
【００５３】
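When the large address input from the CPU 300 on execution of a store instruction, i.e., the memory address without its lower 2 bits, equals the address of the destination memory block of the data held in the queue, i.e., the memory block address, no data is written to the memory device 400. Instead, the new store data and its offset address within the destination memory block (hereinafter simply the offset address), i.e., the small address input from the CPU 300, are written at the tail of the queue.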
【００５４】
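At this time, the data and offset addresses already in the queue are shifted one word toward the head of the queue. If both the memory block address and the offset address equal the large address and small address input from the CPU 300, i.e., if store data for the same memory address as that given by the store instruction is already in the queue, that data is overwritten by the subsequent store data. In other words, the earlier data is never actually written to the memory device 400, and the new store data for the same memory address is stored in the queue.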
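The three store cases (same word already queued, same block but new word, different block) can be modeled with an ordered list. The sketch below is a simplified behavioral model rather than the shift-register circuit: the class name `StoreBuffer`, the `queue` list with the head at index 0, and the `memory_writes` counter are assumptions made for illustration, and a four-word block is assumed.

```python
class StoreBuffer:
    """Behavioral sketch of the store data buffer queue (hypothetical model)."""

    WORDS = 4  # words per memory block

    def __init__(self, memory):
        self.memory = memory     # list of words indexed by word address
        self.tag = None          # block address of the queued data
        self.queue = []          # (offset, data) pairs; index 0 is the head
        self.memory_writes = 0   # accesses made to the memory device

    def flush(self):
        # Write every queued word to memory in first-in first-out order.
        for offset, data in self.queue:
            self.memory[self.tag * self.WORDS + offset] = data
            self.memory_writes += 1
        self.queue = []

    def store(self, addr, value):
        block, offset = divmod(addr, self.WORDS)
        if self.tag is not None and block != self.tag:
            self.flush()         # different block: write out the old block first
        self.tag = block
        # An older entry for the same word is superseded and never reaches memory.
        self.queue = [(o, d) for (o, d) in self.queue if o != offset]
        self.queue.append((offset, value))  # new data goes to the tail
```

Repeated stores to the same word therefore cost at most one memory write, and stores to different words of one block are written out together when a store to another block arrives.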
【００５５】
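The match check between the large address and the memory block address is performed in the store buffer control unit 202, and the match check between the small address and the memory offset address is performed by matchers SB0C to SB3C. If any of matchers SB0C to SB3C finds a match, the output of the matcher corresponding to the matching store buffer SB (a collective term for data registers SB0 to SB3 and small address registers SB0A to SB3A) is asserted.
When new data is stored in the queue, data selectors SB1B to SB3B select, for the input of the store buffer SB that receives it, the store data and small address input from the CPU 300, and for the other store buffers SB, the output of the store buffer SB one position behind.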
【００５６】
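The store buffer control unit 202 handles the input and output of control signals to and from the CPU 300 and the memory device 400, and controls the store data buffer unit 201.
The store buffer control unit 202 outputs SB0 to SB3 clock signals and SB1 to SB3 set signals to the store data buffer unit 201, and the store data buffer unit 201 outputs SB0 to SB3 match signals to the store buffer control unit 202. The SB0 to SB3 clock signals are the clock signals for data registers SB0 to SB3 and small address registers SB0A to SB3A, which store their inputs on the rising edge of the clock signal. The SB1 to SB3 set signals are the select signals for data selectors SB1B to SB3B: when "1", the store data and small address input from the CPU 300 are selected; when "0", the output of the following register in the series connection is selected. The SB0 to SB3 match signals are the outputs of matchers SB0C to SB3C and become "1" when the matcher inputs are equal.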
【００５７】
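FIG. 7 is a block diagram of the store buffer control unit 202.
FIG. 8 is a block diagram of the store buffer signal generation unit 202b in FIG. 7.
In these figures, the store buffer control unit 202 handles the input and output of control signals to and from the CPU 300 and the memory device 400 and controls the store data buffer unit 201. It consists of a memory address match check unit 202a, the store buffer signal generation unit 202b, and a sequence control unit.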
【００５８】
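Specifically, the store buffer control unit 202 shown in these figures comprises AND circuits 1 to 18, OR circuits 31 to 40, buffers 41 to 43, a block address buffer 44, a tag register 51, a memory request register 52, selectors 63 to 66, a write buffer tail decoder 74, a write buffer pointer 82, and a comparator 91.
In FIG. 7, A1 denotes the large address, A2 acknowledge, A3 request, A4 read/write, and A5 to A8 the SB0 match to SB3 match signals; B1 denotes the memory block address, B2 memory read/write, B3 memory busy, B4 memory request, B5 memory acknowledge, B6 to B8 the SB1 set to SB3 set signals, and B9 to B12 the SB0 clock to SB3 clock signals.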
【００５９】
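The memory address match check unit 202a has the same configuration and operation as the memory address match check unit 102a of the load buffer control unit 102 shown in FIG. 3, so the illustration of its configuration, consisting of the tag register 51 and the comparator 91, and the description of its operation are omitted.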
【００６０】
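The store buffer signal generation unit 202b is the circuit that generates the SB0 to SB3 clock signals and the SB1 to SB3 set signals. As shown in FIG. 8, it consists of the write buffer tail decoder 74, AND circuits 12 to 14, OR circuits 35 to 40, selectors 63 to 66, and AND circuits 15 to 18.
The write buffer tail decoder 74 is a 2-to-4 decoder. It decodes the 2-bit input signal, counted by the write buffer pointer 82 in the sequence control unit, that indicates the tail position of the write data buffer queue formed by the store buffers SB; the store buffer SB corresponding to the asserted one of its four outputs is the tail of the queue.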
【００６１】
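AND circuits 12 to 14 generate the SB1 to SB3 set signals. Each takes the logical AND of one of outputs 1 to 3 of the write buffer tail decoder 74 (asserted when the input is "01", "10", and "11", respectively) and the output of AND circuit 1 in the sequence control unit (described later). When one of the outputs of AND circuits 12 to 14 is "1", the corresponding data selector among SB1B to SB3B in FIG. 6 selects the store data and small address input from the CPU 300, which are then input to the corresponding registers among data registers SB1 to SB3 and small address registers SB1A to SB3A.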
【００６２】
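Data register SB0 and small address register SB0A, on the other hand, sit at the entrance of the queue, so no following register is connected to their inputs. Therefore, whenever data is stored in data register SB0 and small address register SB0A, the input is always the store data and small address from the CPU 300.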
【００６３】
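OR circuits 35 to 37 form the logic that selects which SB0 to SB3 clock signals are generated for the store buffers SB when new data is stored in the queue and when the queue contents are shifted out and written to the memory device 400. When new data is stored in the queue, clock signals are supplied to the store buffers SB connected ahead of the register indicated by the value of the write buffer pointer 82, i.e., those toward the head of the queue. Likewise, when the queue contents are shifted out and written to the memory device 400, only the data already in the queue is shifted toward the head unless new write data is input from the CPU 300. In this case too, clock signals are therefore supplied to the store buffers SB ahead of the register indicated by the write buffer pointer 82, i.e., toward the head of the queue. With this logic in OR circuits 35 to 37, no invalid data is stored in the queue even while the queue contents are being written to the memory device 400.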
【００６４】
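OR circuits 38 to 40 select which SB0 to SB3 clock signals are generated when a store instruction targets the same memory address as the destination address in the memory device 400 of data already in the queue. Because the queue contents are overwritten in this case, the SB0 to SB3 clock signals are generated according to which one to four of the SB0 to SB3 match signals input from the store data buffer unit 201 are asserted.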
【００６５】
That is, when only the SB0 match is asserted, only the SB0 clock is generated. When only SB1, or both SB0 and SB1, are asserted, the SB0 and SB1 clocks are generated. When only SB2, both SB2 and SB1, or all three of SB2, SB1, and SB0 are asserted, the SB0, SB1, and SB2 clocks are generated. When only SB3, both SB3 and SB2, the three of SB3, SB2, and SB1, or all of SB3, SB2, SB1, and SB0 are asserted, all of the SB0, SB1, SB2, and SB3 clocks are generated.
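The clock selection for the overwrite case amounts to clocking every store buffer from SB0 up to the highest-numbered one whose match signal is asserted. The sketch below expresses this as three OR terms plus the SB3 match passed through unchanged; this mapping onto OR circuits 38 to 40 is an inference from the cases listed above, and the function name and argument names are hypothetical.

```python
def overwrite_clock_enables(m0, m1, m2, m3):
    """Map the SB0..SB3 match signals to SB0..SB3 clock enables (overwrite case)."""
    clk0 = m0 or m1 or m2 or m3   # assumed three-level OR structure
    clk1 = m1 or m2 or m3
    clk2 = m2 or m3
    clk3 = m3                     # SB3 match is used directly
    return (clk0, clk1, clk2, clk3)
```

For example, a match in SB1 alone enables the SB0 and SB1 clocks, so the entries behind the matching one shift forward over it while SB2 and SB3 keep their contents.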
【００６６】
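As a result, the newly written data is stored at the tail of the queue indicated by the write buffer tail signal (the output of the write buffer pointer 82), and the original store data is overwritten by the store data that followed it.
Selectors 63 to 66 choose between the signals generated by these two SB0 to SB3 clock selection circuits. When the output of AND circuit 3 in the sequence control unit (described later) is "1", i.e., when store data in the queue is being overwritten, the outputs of OR circuits 38 to 40 and the SB3 match signal are selected and output to AND circuits 15 to 18. When the output of AND circuit 3 is "0", i.e., when new store data is stored in the queue or when the store data in the queue is written consecutively to the memory device 400, output 0 of the write buffer tail decoder 74 and the outputs of OR circuits 35 to 37 are selected and input to AND circuits 15 to 18. AND circuits 15 to 18 take the logical AND of the outputs of selectors 63 to 66 and the output of OR circuit 34 in the sequence control unit (described later) to generate the SB0 to SB3 clocks output to the store data buffer unit 201. The output of OR circuit 34 determines the timing at which the clocks supplied to the store buffers SB are asserted.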
【００６７】
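The sequence control unit handles the input and output of control signals to and from the CPU 300 and the memory device 400, controls write operations on the memory device 400, and controls the operation of the store data buffer unit. It consists of AND circuits 1 to 11, OR circuits 31 to 34, the memory request register 52, buffers 41 to 43, the block address buffer 44, and the write buffer pointer 82 shown in FIG. 7.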
【００６８】
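In the sequence control unit, AND circuit 1 first takes the logical AND of the request signal from the CPU 300 and the read/write signal to generate the write request signal. When the write request signal is "1", the unit recognizes that the CPU 300 is executing a store instruction. OR circuit 31 takes the logical OR of all the SB0 to SB3 match signals input from the store data buffer unit 201. Its output is the offset address match signal, which is asserted when the store data buffer unit 201 (the queue) already holds store data for the same address within the memory block as the lower 2 bits of the memory address that the CPU 300 is newly requesting to write, i.e., the small address, or in other words for the same memory offset address.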
【００６９】
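AND circuit 2 takes the logical AND of the output of comparator 91 in the memory address match check unit 202a and the inverse of the offset address match signal output by OR circuit 31. An asserted output of AND circuit 2 means that the tag match signal is "1", i.e., the memory block address stored in the tag register 51 equals the large address input from the CPU 300, but the small address does not match any memory offset address stored in small address registers SB0A to SB3A of the queue. That is, a store instruction has been executed that writes to a new offset address within the same memory block as the store data held in the queue. In this case the input store data and small address are stored in the queue. The destination store buffer SB is indicated by the write buffer tail signal output by the write buffer pointer 82.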
【００７０】
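AND circuit 3 takes the logical AND of the output of comparator 91 in the memory address match check unit 202a and the offset address match signal. The output of AND circuit 3 is asserted as the address match signal when the address input from the CPU 300 matches the memory address held in the tag register 51 and the queue. In this case the store data already in the queue is overwritten, and the new store data input from the CPU 300 is stored at the tail of the queue designated by the write buffer tail signal.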
【００７１】
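AND circuit 4 takes the logical AND of the inverse of the output of comparator 91 in the memory address match check unit 202a, the write request signal output by AND circuit 1, and the memory busy signal. As explained for the load buffer unit 100, the memory busy signal is shared by the load buffer unit 100 and the store buffer unit 200.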
【００７２】
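AND circuit 5 takes the logical AND of the outputs of AND circuit 1 and AND circuit 2 and outputs the enqueue signal. AND circuit 6 takes the logical AND of the outputs of AND circuit 1 and AND circuit 3 and outputs the overwrite signal. The output of AND circuit 3 is used as the select signal for selectors 63 to 66 of the store buffer signal generation unit 202b. The enqueue signal is output to OR circuit 34 and OR circuit 32. The overwrite signal is input to OR circuit 34. OR circuit 34 takes the logical OR of these two signals and the output of AND circuit 10 (described later) and outputs the shift timing signal.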
【００７３】
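The shift timing signal is input to AND circuits 15 to 18 of the store buffer signal generation unit 202b and provides the assertion timing of the SB0 to SB3 clock signals. The shift timing signal is also output to AND circuit 7. AND circuit 7 takes the logical AND of the write request signal and the shift timing signal; the result is driven by buffer 41 and output to the CPU 300 as the acknowledge signal. The output of AND circuit 4 is output to the set input (S input) of the memory request register 52 and to OR circuit 32. The memory request register 52 is set on the rising edge of its set input and asserts the M request signal. Like the M request signal of the load buffer control unit 102, this M request signal is the start signal for a write access to the memory device 400.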
【００７４】
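The M request signal serves as the enable signal of the block address buffer 44, which drives the output of the tag register 51 and outputs the memory large address to the memory device 400. It is also driven by buffer 42 and output to the memory device 400 as the memory read/write signal. In addition, it is output to AND circuit 11, where it is ANDed with the inverse of the memory acknowledge signal input from the memory device 400; the result is driven by buffer 43 and output to the memory device 400 as the memory request signal.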
【００７５】
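As described for the load buffer unit 100, the memory request signal and the memory acknowledge signal are handshake signals. The M request signal is also output to AND circuit 10. AND circuit 10 takes the logical AND of the M request signal and the memory acknowledge signal and outputs the write acknowledge signal. The write acknowledge signal indicates that the operation for the memory write access request issued by the store buffer unit 200 has completed; the memory operation is recognized as complete on the rising edge of this signal.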
【００７６】
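The output of AND circuit 10, i.e., the write acknowledge signal, is output to OR circuit 34, AND circuit 8, and AND circuit 9. OR circuit 34 generates the shift timing signal. When the write acknowledge signal is asserted and input to OR circuit 34, a write access to the memory device 400 has completed; the shift timing signal is then asserted and output to AND circuits 15 to 18 to provide the assertion timing of the SB0 to SB3 clock signals. The shift timing signal is also output to AND circuit 7.
When the assertion of the write acknowledge signal asserts the shift timing signal and the AND with the write request in AND circuit 7 is asserted, the acknowledge signal for a store instruction that involves a write to the memory device 400 is asserted, driven by buffer 41, and output to the CPU 300. In this case the acknowledge signal is asserted when the first word at the head of the store data buffer (queue) has been written to the memory device 400.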
【００７７】
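The output of AND circuit 10, i.e., the write acknowledge signal, is ANDed in AND circuit 8 with output 3 of the write buffer tail decoder 74, i.e., the last access signal, and in AND circuit 9 with the inverse of the last access signal. The output of AND circuit 8 is asserted on completion of the last memory access of the write to the memory device 400 of the old memory data block held in the store data buffer (queue), i.e., the block that differs from the destination memory data block of the data the CPU 300 is newly requesting to write. The output of AND circuit 9 is asserted on completion of each memory access of that write other than the last. The output of AND circuit 8 is the write done signal and is output to the reset input of the memory request register 52 and to OR circuit 33. The output of AND circuit 9 is output to OR circuit 32.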
【００７８】
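The memory request register 52 is reset on the rising edge of its reset input. Resetting the memory request register 52 negates the M request signal, which ends the one to four memory writes to the memory device 400 and negates the memory busy signal. OR circuit 32 takes the logical OR of the output of AND circuit 4, i.e., the signal that is the set input to the memory request register 52, and the output of AND circuit 9; its output is connected to the up input of the write buffer pointer 82. OR circuit 33 takes the logical OR of the output of AND circuit 5, i.e., the enqueue signal, and the output of AND circuit 8, i.e., the write done signal; its output is connected to the down input of the write buffer pointer 82.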
【００７９】
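The write buffer pointer 82 generates the 2-bit write buffer tail signal that indicates the tail of the store data buffer (queue). Its counting operation is described below. The store data buffers are numbered 3, 2, 1, 0 in order from the head position (exit). The structural entrance of the queue is therefore 0 (this is not necessarily the tail; the tail is the entrance of the queue in operation). The tail of the queue normally indicates the SB into which data can be written next. Accordingly, the write buffer pointer is updated when data is inserted (the number decreases by one) in preparation for the next insertion, and is updated when data is written out to the memory device 400 (the number increases by one).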
【００８０】
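[a. Initial state, i.e., immediately after power-on]
No data has been written, so the pointer is 3.
[b. On a store instruction to the same memory block as the contents of the tag register]
(b-1. The small address matches one of small address registers SB0A to SB3A): the data is overwritten, so the pointer does not change.
(b-2. The small address matches none of small address registers SB0A to SB3A): data is inserted, so the pointer is decremented by 1.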
【００８１】
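[c. On a store instruction to a memory block different from the contents of the tag register]
(1) First, the pointer is incremented by 1 (pointer preparation for inserting data into the queue and shifting the existing queue contents toward the head, i.e., for shifting one word of the queue contents out).
(2) Write accesses to the memory device 400 are performed, and the pointer is incremented by 1 each time memory acknowledge is asserted (pointer preparation for shifting out the queue contents; the write data for the new memory block is inserted). The increments continue until the pointer value reaches 3.
(3) When the memory acknowledge signal is asserted for the memory write access made while the pointer value is 3, the pointer is decremented by 1 to 2 and the memory write access ends. That is, all the write data for the previous memory block has been written to the memory device 400, the write data for the new memory block has advanced to the head of the queue, and the pointer value becomes 2 in preparation for the next insertion (indicating SB2, second from the head of the queue).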
【００８２】
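In the above description, in case b-2 the output of AND circuit 5, i.e., the enqueue signal, is asserted; in case c-(1) the output of AND circuit 4 is asserted; in case c-(2) the output of AND circuit 9 is asserted; and in case c-(3) the output of AND circuit 8, i.e., the write done signal, is asserted. Each is passed through OR circuit 32 or OR circuit 33 to the up input or down input of the write buffer pointer 82, which performs the required counting.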
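The counting rules a through c above can be traced with a small model of the pointer. This is a sketch under stated assumptions: the class name `WriteBufferPointer` and its method names are hypothetical, a plain integer stands in for the 2-bit counter, and the different-block case assumes at least one word is queued when it is invoked.

```python
class WriteBufferPointer:
    """Sketch of the tail pointer counting in cases a, b and c (hypothetical model)."""

    def __init__(self):
        self.value = 3                 # case a: empty queue, tail at the head SB3

    def store_same_block(self, offset_already_queued):
        if not offset_already_queued:
            self.value -= 1            # case b-2: insertion moves the tail back
        # case b-1: overwrite leaves the pointer unchanged

    def store_other_block(self):
        """Apply steps c-(1) to c-(3); return the number of memory writes."""
        writes = 0
        self.value += 1                # c-(1): prepare for the first shift-out
        while self.value < 3:          # c-(2): one write per memory acknowledge
            writes += 1
            self.value += 1
        writes += 1                    # c-(3): final write while the pointer is 3
        self.value -= 1                # pointer ends at 2, ready for the next insert
        return writes
```

With two words queued (pointer at 1), a store to another block produces two memory writes and leaves the pointer at 2, which matches the old block being written out and the new word sitting at the head.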
【００８３】
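[Operation]
FIGS. 9 to 11 are explanatory diagrams showing the operation of Embodiment 1; FIG. 9 shows the operation on a load, and FIGS. 10 and 11 show the operation on a store.
When a load instruction is executed, the operation depends on whether the contents of the tag register 51 of the load buffer unit 100 match the large address input from the CPU 300 and, if the large address matches, on the contents of the V flag ("1" means valid, "0" means invalid).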
【００８４】
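When a store instruction is executed, the operation depends on the combination of whether the contents of the tag register 51 of the store buffer unit 200 match the large address input from the CPU 300 and whether a memory offset address held in the queue matches the small address input from the CPU 300. The operation further depends on whether the contents of the tag register 51 of the load buffer unit 100 match the large address input from the CPU 300 when the store instruction is executed. FIGS. 9 to 11 list each of these combinations and the corresponding operation.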
【００８５】
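FIGS. 12 and 13 are time charts for the execution of a load instruction in Embodiment 1 and in Embodiment 2 described later.
FIG. 12 shows the operation on a buffer hit. In this case no read access is made to the memory device 400; the required data in the load data buffer unit 101 is selected and output to the CPU 300. In FIG. 12, the request signal and the acknowledge signal form a handshake. The large address, small address, and read/write signal hold correct values while the request signal is asserted. The load data holds the correct value at the rising edge of the acknowledge signal.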
【００８６】
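FIG. 13 shows the operation on a buffer miss. In this case a read access is made to the memory device 400, and the data requested by the CPU 300 is output to the CPU 300 and stored in the load data buffer unit 101. In addition, the data following the memory address for which the read access was made, through the last data of that memory block (four words in Embodiment 1 and in Embodiment 2 described later), are read consecutively from the memory device 400 and stored in the corresponding registers of the load data buffer unit 101. The assertion of the acknowledge signal to the CPU 300 and the output of the load data are settled when the data read from the memory device 400 by the first memory read access is input, i.e., when the memory acknowledge signal is asserted in response to the first assertion of the memory request signal.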
【００８７】
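FIG. 13 shows the operation when the small address is "00", i.e., when a load instruction is executed for the data located at the start of the memory block. In this case the read accesses start from the data located at the start of the memory block that caused the load buffer miss, the entire four-word memory block is read consecutively, and the words are stored in order in the corresponding registers of the load data buffer unit 101. In FIG. 13, the request signal and the acknowledge signal form a handshake. The large address, small address, and read/write signal hold correct values while the request signal is asserted. The memory block address, memory offset address, and memory read/write signal hold correct values while the memory request signal is asserted. The memory read data holds the correct value at the rising edge of the memory acknowledge signal and at the rising edge of the LB0 to LB3 set signals. The load data holds the correct value at the rising edge of the acknowledge signal.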
【００８８】
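FIGS. 14 and 15 are time charts for the execution of a store instruction in Embodiment 1 and in Embodiment 2 described later.
FIG. 14 shows a buffer hit. In this case no write access is made to the memory device 400, and the only operation is to store the write data in the store data buffer (queue). In FIG. 14, the request signal and the acknowledge signal form a handshake. The large address, small address, read/write signal, and store data hold correct values while the request signal is asserted.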
【００８９】
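FIG. 15 shows the operation on a buffer miss. In this case consecutive write accesses are made to the memory device 400: the write data already in the queue is written to the corresponding memory block of the memory device 400, and the store data input from the CPU 300 is newly stored in the queue. The assertion of the acknowledge signal to the CPU 300 is settled when the data at the head of the queue has been written to the memory device 400 by the first memory write access, i.e., when the memory acknowledge signal is asserted in response to the first assertion of the memory request signal. In FIG. 15, the request signal and the acknowledge signal form a handshake. The large address, small address, read/write signal, and store data hold correct values while the request signal is asserted. The memory block address, memory offset address, memory read/write signal, and memory write data hold correct values while the memory request signal is asserted and at the rising edge of the SB0 to SB3 clocks.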
【００９０】
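[Effects]
As described above, in Embodiment 1 the memory buffer device is built from a multi-word load data buffer unit 101, a multi-word store data buffer unit 201 (queue), and their control units. Compared with a cache memory, for example, it needs no separate memory section for data storage, so the memory buffer device can be built from a small circuit. As a result, an operating speed that can fully keep up with higher CPU 300 operating frequencies is secured, and the chip area of an LSI containing the CPU 300 and the memory buffer device of the present invention can be greatly reduced compared with a conventional LSI made up of a CPU and a cache memory, lowering cost and power consumption.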
【００９１】
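In addition, when a buffer miss occurs during execution of a load instruction, multiple words are accessed from the memory device 400 in one operation. This reduces how often the CPU 300 accesses the memory device 400 and improves memory throughput, and thus contributes to better program execution performance.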
【００９２】
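Furthermore, in Embodiment 1, when a buffer miss occurs during execution of a load instruction, read accesses are made from that memory address continuously through the last address of the same memory block and the data is stored in the load buffer unit 100, which has the following effect.
By consecutively accessing only the data beyond the memory address that caused the miss (in the direction of increasing memory address), the number of memory access cycles can be reduced, compared with always reading the whole block, for programs with a high frequency of forward references, such as array operations. When memory references are likely to continue from the address of the previous reference or to fall a few words ahead of it (in the direction of increasing memory address), accesses to data behind the address of the previous reference (in the direction of decreasing memory address) are rare, and such data is not read from the memory device 400 unnecessarily.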
【００９３】
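《Embodiment 2》
[Configuration]
In Embodiment 2, the overall configuration is the same in the drawings as the configurations shown in FIG. 1 and FIG. 2. The store buffer unit is also the same as the configuration shown in FIGS. 6, 7, and 8, so the description of these configurations is omitted.
This embodiment differs from Embodiment 1 in the configuration of the load buffer control unit, which is as follows.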
【００９４】
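FIG. 16 is a block diagram of the load buffer control unit in Embodiment 2.
This embodiment differs from the load buffer control unit 102 of Embodiment 1 in that a memory access counter 83 is newly provided. The rest of the configuration is the same as in Embodiment 1, so corresponding parts are given the same reference numerals and their description is omitted.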
【００９５】
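The only difference between Embodiment 2 and Embodiment 1 is the operation when a buffer miss occurs during execution of a load instruction. In that case, Embodiment 1 reads consecutively from the memory device 400 the data from the memory address where the miss occurred through the last address of that memory block, and stores it in the corresponding registers of the load data buffer unit 101. Embodiment 2, in contrast, reads from the memory device 400 the data from the memory address where the miss occurred through the last address of that memory block, and then the data from the first address of that memory block through the memory address immediately before the one where the miss occurred, i.e., the data of the entire memory block, and stores it in the corresponding registers of the load data buffer unit 101.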
【００９６】
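For this purpose, two counters are provided as shown in FIG. 16: the offset address counter 81 and the memory access counter 83. The offset address counter 81 generates the offset address within the memory block being read, and the number of access cycles needed to read one memory block of data is also counted. The count clock input of the memory access counter 83 is the same signal as the count clock input of the offset address counter 81, namely the read acknowledge signal of the load buffer control unit 102 shown in FIG. 3.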
【００９７】
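The load input of the memory access counter 83 is the same signal as the load input of the offset address counter 81, namely the set input to the memory request register 52 of the load buffer control unit 102 shown in FIG. 3. At this time, "00" is always loaded into the memory access counter 83. The carry output of the memory access counter 83 is asserted when this 2-bit counter overflows, i.e., when the count reaches 4. This indicates that the four words making up one memory block have been read consecutively from the memory device 400. This signal is the last access signal; AND circuit 10 takes its logical AND with the read acknowledge signal, and the result becomes the reset signal that resets the memory request register 52. The last access signal plays the same role as in the load buffer control unit 102 of Embodiment 1 shown in FIG. 3.
By providing the memory access counter 83, a buffer miss always results in consecutive accesses to the memory device 400 for the entire memory block in which the miss occurred, i.e., four words.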
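The difference between the two embodiments on a load buffer miss comes down to the sequence of block offsets that is read. The sketch below lists that sequence for each case, assuming the wrap-around order described in the comparison of the two embodiments (from the missed word to the end of the block, then from the start of the block); the function name and the `whole_block` flag are hypothetical.

```python
def miss_read_offsets(miss_offset, words=4, whole_block=False):
    """Offsets within the block that are read from memory after a load buffer miss."""
    if not whole_block:
        # Embodiment 1: from the missed word through the end of the block
        return list(range(miss_offset, words))
    # Embodiment 2: the whole block, starting at the missed word and wrapping,
    # with a separate access count of `words` ending the sequence
    return [(miss_offset + i) % words for i in range(words)]
```

For a miss on offset 2 this gives two reads in Embodiment 1 and four in Embodiment 2, which is the trade-off between fewer memory cycles and fewer later misses discussed in the effects of each embodiment.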
【００９８】
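FIGS. 17 to 19 are explanatory diagrams showing the operation of Embodiment 2; FIG. 17 shows the operation on a load, and FIGS. 18 and 19 show the operation on a store.
The operation of the memory buffer device of Embodiment 2 differs from that of Embodiment 1 shown in FIGS. 9 to 11 only when a load instruction is executed.
In Embodiment 2, on a load buffer miss the entire memory block (four words) containing the memory address that missed is always read from the memory device 400 and stored in the corresponding registers of the load data buffer unit 101, so in subsequent memory accesses the load V flag is always "1". Therefore, whenever the contents of the tag register 51 of the load buffer unit 100 match the large address input from the CPU 300, the result is always a buffer hit, and the data is output to the CPU 300 from the corresponding register of the load data buffer unit 101.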
【００９９】
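If, on the other hand, the contents of the tag register 51 do not match the large address input from the CPU 300, the result is a buffer miss. One memory block of data is then read from the memory device 400 consecutively, starting from the memory address that missed; the first read data is output to the CPU 300, and the memory block of data (four words) is stored in the corresponding registers of the load data buffer unit 101. The operation when a store instruction is executed is identical to that of the memory buffer device of Embodiment 1 shown in FIGS. 9 to 11.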
【０１００】
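The timing of each signal when load and store instructions are executed in Embodiment 2 is the same as shown in FIGS. 12 to 15, so its description is omitted. In Embodiment 2, however, whatever the value of the small address from "00" to "11", the read accesses start from the data located at the start of the memory block that caused the load buffer miss, the entire four-word memory block is read consecutively, and the words are stored in order in the corresponding registers of the load data buffer.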
【０１０１】
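[Effects]
As described above, Embodiment 2 lowers cost and power consumption in the same way as Embodiment 1. In addition, when a buffer miss occurs during execution of a load instruction, the data of the entire memory block containing the memory address that missed is read and stored in the load buffer unit 100, which has the following effect.
By consecutively accessing the entire memory block containing the memory address that caused the miss, the total number of buffer misses can be reduced, by always reading the whole block, for programs in which forward and backward references are about equally frequent, such as stack-based operations. When memory references are about equally likely to continue from the address of the previous reference or to fall a few words ahead of it (in the direction of increasing memory address) or behind it (in the direction of decreasing memory address), memory throughput can be maximized by accessing from the memory device 400, in one consecutive sequence, the maximum number of words that can be stored in the buffer at once.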
【０１０２】
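[Effects of the invention]
As described above, the memory buffer device of the first invention is built from a load data buffer unit having multi-word registers, a store data buffer unit having a multi-word first-in first-out buffer, and their control units. Compared with a cache memory, for example, it needs no separate memory section for data storage, so the memory buffer device can be built from a small circuit. As a result, an operating speed that can fully keep up with higher CPU operating frequencies is secured, and the chip area of an LSI containing the CPU and the memory buffer device of the present invention can be greatly reduced compared with a conventional LSI made up of a CPU and a cache memory, lowering cost and power consumption.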
【０１０３】
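In addition, when a buffer miss occurs during execution of a load instruction, multiple words are accessed from the memory device in one operation. This reduces how often the CPU accesses the memory device and improves memory throughput, and thus contributes to better program execution performance.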
【０１０４】
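Furthermore, when a buffer miss occurs during execution of a load instruction, read accesses are made from that memory address continuously through the last address of the same memory block and the data is stored in the load buffer unit, which has the following effect.
By consecutively accessing only the data beyond the memory address that caused the miss (in the direction of increasing memory address), the number of memory access cycles can be reduced, compared with always reading the whole block, for programs with a high frequency of forward references, such as array operations.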
【０１０５】
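The memory buffer device of the second invention, like that of the first invention, is built from a load data buffer unit having multi-word registers, a store data buffer unit having a multi-word first-in first-out buffer, and their control units, and therefore has the same effects as the first invention. In addition, when a buffer miss occurs during execution of a load instruction, the data of the entire memory block containing the memory address that missed is read and stored in the load buffer unit, which has the following effect. By consecutively accessing the entire memory block containing the memory address that caused the miss, the total number of buffer misses can be reduced, by always reading the whole block, for programs in which forward and backward references are about equally frequent, such as stack-based operations.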
【０１０６】
第３発明のメモリバッファ装置によれば、中央処理装置からのデータ書き込み要求に対して、該当データが、ロードデータバッファ部内のレジスタのいずれかに存在する場合は、このデータを更新するようにしたので、このような場合は、メモリ装置にアクセスする必要がなく、従って、プログラム実行性能を向上させることができる。
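[Brief Description of the Drawings]
FIG. 1 is a configuration diagram showing Embodiments 1 and 2 of the memory buffer device of the present invention.
FIG. 2 is a configuration diagram of the load buffer unit of the memory buffer device of the present invention.
FIG. 3 is a configuration diagram of the load buffer control unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 4 is a configuration diagram of the load data buffer valid flag unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 5 is a configuration diagram of the LB0 to LB3 set signal generation unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 6 is a configuration diagram of the store buffer unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 7 is a configuration diagram of the store buffer control unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 8 is a configuration diagram of the store buffer signal generation unit in Embodiment 1 of the memory buffer device of the present invention.
FIG. 9 is an explanatory diagram of the operation (load) in Embodiment 1 of the memory buffer device of the present invention.
FIG. 10 is an explanatory diagram (part 1) of the operation (store) in Embodiment 1 of the memory buffer device of the present invention.
FIG. 11 is an explanatory diagram (part 2) of the operation (store) in Embodiment 1 of the memory buffer device of the present invention.
FIG. 12 is a time chart of load instruction execution (buffer hit) in the memory buffer device of the present invention.
FIG. 13 is a time chart of load instruction execution (buffer miss) in the memory buffer device of the present invention.
FIG. 14 is a time chart of store instruction execution (buffer hit) in the memory buffer device of the present invention.
FIG. 15 is a time chart of store instruction execution (buffer miss) in the memory buffer device of the present invention.
FIG. 16 is a configuration diagram of the load buffer control unit in Embodiment 2 of the memory buffer device of the present invention.
FIG. 17 is an explanatory diagram of the operation (load) in Embodiment 2 of the memory buffer device of the present invention.
FIG. 18 is an explanatory diagram (part 1) of the operation (store) in Embodiment 2 of the memory buffer device of the present invention.
FIG. 19 is an explanatory diagram (part 2) of the operation (store) in Embodiment 2 of the memory buffer device of the present invention.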
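[Explanation of Reference Numerals]
100 Load buffer unit
101 Load data buffer unit
102 Load buffer control unit
200 Store buffer unit
201 Store data buffer unit
202 Store buffer control unit
[0001]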
[Technical Field of the Invention]
The present invention relates to a memory buffer device for memory reading and memory writing in a computer.
[0002]
[Prior art]
In computers based on existing technology, memory operations basically read data (load instruction execution) from, and write data (store instruction execution) to, an arbitrary memory address in units of one word, a half word, or a quarter word. For this reason, a small-scale computer is provided with a one-word memory register. When a load instruction is executed, the data read from the memory device is stored in this register and the CPU (central processing unit) takes it in; when a store instruction is executed, the data to be written is first stored in the register and then written to the memory device.
Consequently, in load instruction execution the memory access time is reflected directly in the instruction execution time for every CPU request, and in store instruction execution the time taken to store the write data from the CPU into the register is reflected in the instruction execution time. However, when a load or store instruction is executed immediately after a store instruction, the next memory operation must wait for the memory write time of the preceding store instruction.
[0003]
In a large-scale computer, a cache memory is installed, and load instructions and store instructions are executed through it. The CPU exchanges data with the cache memory in units of one word, and the memory device exchanges with the cache memory data of a plurality of words arranged at contiguous memory addresses (hereinafter referred to as a data block). As a result, when the data requested by a load instruction exists in the cache memory (hereinafter referred to as a load cache hit), the load instruction completes in a short time thanks to the fast data response from the cache memory. Likewise, when the data block containing the data to be written by a store instruction exists in the cache memory (hereinafter referred to as a store cache hit), the store instruction can be completed by writing the data to the cache memory only.
[0004]
When load instruction execution does not result in a load cache hit (hereinafter referred to as a load cache miss), or store instruction execution does not result in a store cache hit (hereinafter referred to as a store cache miss), the access time of the memory device is reflected in the instruction execution time or in the immediately following memory operation, just as in a small-scale computer.
[0005]
[Problems to be solved by the invention]
As described above, in memory operations of prior-art computers, a small-scale computer has the access time of the memory device reflected directly in the instruction execution time of every load instruction, and a load or store instruction that follows a store instruction must wait for the memory write operation of that store instruction. The performance degradation caused by the execution time of memory operation instructions is therefore a serious problem. Moreover, because memory is operated on one word at a time, the improvement in memory throughput offered by memory operations spanning consecutive words, such as the fast page access provided by DRAM devices, cannot be exploited.
In particular, in recent CPUs with high operating frequencies, the gap between the execution time of instructions executed entirely inside the CPU and the execution time of memory operation instructions tends to widen, which is why program execution performance does not improve in step with the CPU operating frequency.
[0006]
In a large-scale computer, on the other hand, a cache memory is installed to fill this gap between CPU operation and memory operation. However, building the cache memory into the same LSI as the CPU increases chip size, chip cost, and power consumption, while installing the cache memory outside the CPU LSI increases device cost, device volume, and power consumption.
Furthermore, as CPU operating frequencies rise, the access time of the cache memory cannot keep up, so the number of clock cycles needed to execute a memory operation instruction increases even though a cache memory is installed.
[0007]
In view of the above, it has been desired to realize a memory buffer device that can reduce cost and power consumption and can improve program execution performance.
[0008]
[Means for Solving the Problems]
In order to solve the above-described problem, the memory buffer device of the first invention comprises a load buffer unit for reading data from a memory device and a store buffer unit for writing data to the memory device. The load buffer unit comprises a load data buffer unit having independently readable and writable registers for a plurality of words, which store data for one memory block, and a load buffer control unit. In response to a data read request from the central processing unit to the memory device, the load buffer control unit returns the corresponding data to the central processing unit if it exists in any of the registers; if the corresponding data does not exist in any register, it continuously reads from the memory device the data from the memory address of the requested data up to the last address of the same memory block and stores them sequentially in the registers. The store buffer unit comprises a store data buffer unit having a first-in first-out buffer that stores a plurality of words as data for one memory block, and a store buffer control unit. In response to a write request from the central processing unit to the memory device, the store buffer control unit updates the corresponding data and makes its storage position the tail if that data exists in the first-in first-out buffer; stores the corresponding data at the tail if data of the same memory block exists in the buffer; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the corresponding data, writes all the data stored in the first-in first-out buffer to the memory device and stores the corresponding data at the head position of the first-in first-out buffer.
[0009]
Since the memory buffer device of the first invention is configured in this way, in response to a data read request from the central processing unit to the memory device, the load buffer unit returns the corresponding data to the central processing unit if it exists in any of the registers. If the corresponding data does not exist in any register, the data from the memory address of the requested data up to the last address of the same memory block is read from the memory device and stored sequentially in the registers.
In response to a write request from the central processing unit to the memory device, the store buffer unit updates the corresponding data if it exists in the first-in first-out buffer and makes its storage position the tail.
If data in the same memory block as the write request data exists in the first-in first-out buffer, the corresponding data is stored at the tail.
If, on the other hand, the data in the first-in first-out buffer belongs to a memory block different from that of the corresponding data, all the data stored in the first-in first-out buffer is written to the memory device, and the corresponding data is stored at the head position of the first-in first-out buffer.
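The three store cases above can be sketched in Python as follows. This is only an illustrative behavioural model, not the circuit of the invention: the class name `StoreBufferSketch`, the dictionary standing in for the memory device, and the 4-word block size (taken from the embodiments) are assumptions, and moving an updated entry to the tail is one reading of "makes its storage position the tail".

```python
BLOCK_WORDS = 4  # words per memory block, as in the embodiments


class StoreBufferSketch:
    """Illustrative model of the store buffer policy (hypothetical names)."""

    def __init__(self, memory):
        self.memory = memory  # dict: address -> word, stands in for the memory device
        self.fifo = []        # list of (address, data); index 0 is the head

    def store(self, address, data):
        block = address // BLOCK_WORDS
        # Different memory block: write back everything first, then start at the head.
        if self.fifo and self.fifo[0][0] // BLOCK_WORDS != block:
            self.flush()
        # Same address already buffered: drop the old entry so the update sits at the tail.
        self.fifo = [(a, d) for (a, d) in self.fifo if a != address]
        self.fifo.append((address, data))

    def flush(self):
        # Drain in first-in first-out order.
        for a, d in self.fifo:
            self.memory[a] = d
        self.fifo.clear()
```

With this model, repeated stores into one block never reach the memory device until a store to another block arrives, which is the behaviour the description relies on to reduce write traffic.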
[0010]
In order to solve the above-described problem, the memory buffer device of the second invention comprises a load buffer unit for reading data from a memory device and a store buffer unit for writing data to the memory device. The load buffer unit comprises a load data buffer unit having independently readable and writable registers for a plurality of words, which store data for one memory block, and a load buffer control unit. In response to a data read request from the central processing unit to the memory device, the load buffer control unit returns the corresponding data to the central processing unit if it exists in any of the registers; if the corresponding data does not exist in any register, it reads from the memory device the entire memory block containing the memory address of the requested data and stores the data sequentially in the registers. The store buffer unit comprises a store data buffer unit having a first-in first-out buffer that stores a plurality of words as data for one memory block, and a store buffer control unit. In response to a write request from the central processing unit to the memory device, the store buffer control unit updates the corresponding data and makes its storage position the tail if that data exists in the first-in first-out buffer; stores the corresponding data at the tail if data of the same memory block exists in the buffer; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the corresponding data, writes all the data stored in the first-in first-out buffer to the memory device and stores the corresponding data at the head position of the first-in first-out buffer.
[0011]
Since the memory buffer device of the second invention is configured as described above, in response to a data read request from the central processing unit to the memory device, the load buffer unit returns the corresponding data to the central processing unit if it exists in any of the registers. If the corresponding data does not exist in any register, the entire memory block containing the memory address of the requested data is read from the memory device and stored sequentially in the registers.
The operation of the store buffer unit is the same as in the first invention.
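The first and second inventions differ only in which words are fetched on a load buffer miss. A minimal Python sketch of that difference follows; the function name `fill_addresses` and the 4-word block size are assumptions made for illustration, with word addresses treated as plain integers.

```python
BLOCK_WORDS = 4  # words per memory block, as in the embodiments


def fill_addresses(miss_address, whole_block):
    """Return the word addresses read from memory on a load buffer miss.

    whole_block=False models the first invention (miss address to end of block);
    whole_block=True models the second invention (entire block).
    """
    base = miss_address - miss_address % BLOCK_WORDS
    start = base if whole_block else miss_address
    return list(range(start, base + BLOCK_WORDS))
```

For a miss at address 14, for example, the first policy reads two words and the second reads four, which reflects the trade-off between fewer memory cycles and fewer later misses.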
[0012]
In order to solve the above-described problem, the memory buffer device of the third invention is the memory buffer device of the first or second invention further provided with a load buffer control unit that, in response to a data write request from the central processing unit to the memory device, updates the corresponding data when it exists in any of the registers in the load data buffer unit.
[0013]
Since the third invention is configured as described above, when the central processing unit issues a data write request to the memory device and the corresponding data exists in any of the registers in the load data buffer unit, only the data in the load data buffer unit is updated each time, without writing to the memory device.
[0014]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
Embodiment 1
[Configuration]
FIG. 1 is a block diagram showing Embodiment 1 of the memory buffer device of the present invention and Embodiment 2 described later.
The figure shows the load buffer unit 100 and the store buffer unit 200, which constitute the memory buffer device, together with a central processing unit (hereinafter referred to as a CPU) 300 and a memory device 400.
[0015]
The load buffer unit 100 mainly works when executing a load instruction (memory read instruction) from the CPU 300, and the store buffer unit 200 works when executing a store instruction (memory write instruction) from the CPU 300.
That is, the load buffer unit 100 comprises a load data buffer unit 101 having independently readable and writable registers for a plurality of words, which store data for one memory block, and a load buffer control unit 102. In response to a memory read request from the CPU 300, the load buffer control unit 102 returns the corresponding data to the CPU 300 if it exists in any of the registers; if it exists in no register, it continuously reads from the memory device 400 the data from the memory address of the requested data up to the last address of the same memory block and stores them sequentially in the registers.
[0016]
When executing a load instruction, the CPU 300 asserts the request signal to the load buffer unit 100 (outputs “1”), negates the read/write signal (outputs “0”), and outputs an address specifying the desired memory address. In response, the load buffer unit 100 asserts the acknowledge signal and outputs the load data requested by the CPU 300 to the CPU 300. When data is read from the memory device 400 during execution of the load instruction, the load buffer unit 100 asserts the memory request to the memory device 400, negates the memory read/write signal, and outputs a memory address specifying the desired memory address.
In response, the memory device 400 asserts the memory acknowledge signal and outputs the read data requested by the load buffer unit 100 to the load buffer unit 100.
[0017]
The store buffer unit 200 comprises a store data buffer unit 201 having a first-in first-out buffer that stores a plurality of words as data for one memory block, and a store buffer control unit 202. In response to a write request from the CPU 300 to the memory device 400, the store buffer control unit 202 updates the corresponding data and makes its storage position the tail if that data exists in the first-in first-out buffer; stores the corresponding data at the tail if data of the same memory block exists in the buffer; and, if the data in the first-in first-out buffer belongs to a memory block different from that of the corresponding data, writes all the data stored in the first-in first-out buffer to the memory device 400 and stores the corresponding data at the head position of the first-in first-out buffer.
[0018]
When executing a store instruction, the CPU 300 asserts the request signal to the store buffer unit 200 (outputs “1”), asserts the read/write signal (outputs “1”), outputs the data to be written, that is, the store data, and outputs an address specifying the desired memory address. In response, the store buffer unit 200 asserts the acknowledge signal and outputs it to the CPU 300. When data is written to the memory during execution of the store instruction, the store buffer unit 200 asserts the memory request to the memory device 400, asserts the memory read/write signal, outputs the write data to be written, and outputs a memory address specifying the desired memory address. In response, the memory device 400 asserts the memory acknowledge signal and outputs it to the load buffer unit 100.
[0019]
FIG. 2 is a configuration diagram of the load buffer unit 100.
The load buffer unit 100 includes the load data buffer unit 101 and the load buffer control unit 102 as described above.
In the first embodiment, the load data buffer unit 101 consists of data registers for four words (LB0 to LB3), a load buffer input selector 101a, and a load buffer output selector 101b. In the present invention, however, the number of data words is not limited to four and may be any number of one or more.
[0020]
In the first and second embodiments, the load data buffer unit 101 can temporarily hold four words of memory data. It stores one word of data read from the memory device 400, or two to four words of data read continuously. When a load instruction is executed by the CPU 300 and the data held in the data registers LB0 to LB3 includes a copy of the data stored at the memory address specified by the address from the CPU 300, no data is read from the memory device 400. Instead, the load buffer output selector 101b selects, from the outputs of the four data registers LB0 to LB3, the one holding the data specified by the CPU 300 and outputs it to the CPU 300 as load data.
[0021]
Also, when a store instruction is executed and data read from the same memory address as that of the store data is held in the data registers LB0 to LB3, the new store data must also be stored in the corresponding register to maintain data consistency. In the present embodiment, this is hereinafter referred to as a store bypass operation. The load buffer input selector 101a selects either the read data, for an ordinary read from the memory device 400, or the store data from the CPU 300, for a store bypass operation, and outputs the selected data to the data registers LB0 to LB3.
[0022]
The load buffer control unit 102 inputs and outputs control signals to and from the CPU 300 and the memory device 400 and controls the load data buffer unit 101. A store bypass signal, LB0 to LB3 set signals, and a load address offset signal are output from the load buffer control unit 102 to the load data buffer unit 101. The store bypass signal is the input selection signal of the load buffer input selector 101a: when it is “0”, the read data is selected, and when it is “1”, the store data is selected and output to LB0 to LB3. The LB0 to LB3 set signals are the data set signals of the data registers LB0 to LB3, respectively, and the input data is newly stored at the rising edge of each signal. The lower two bits of the address from the CPU 300 (hereinafter referred to as the small address) serve as the selection signal of the load buffer output selector 101b and select one of the outputs of LB0 to LB3.
[0023]
FIG. 3 is a configuration diagram of the load buffer control unit 102.
FIGS. 4 and 5 are configuration diagrams of the load data buffer valid flag unit and the LB0 to LB3 set signal generation unit in FIG. 3, respectively.
The load buffer control unit 102 inputs and outputs control signals to and from the CPU 300 and the memory device 400 and controls the load data buffer unit 101. It consists of a memory address match checking unit 102a, a load data buffer valid flag (hereinafter referred to as the V flag) unit 102b, an LB0 to LB3 set signal generation unit 102c, and a sequence control unit.
[0024]
That is, the load buffer control unit 102 shown in these drawings comprises AND circuits 1 to 21, OR circuits 31 to 35, buffers 41 to 44, a block address buffer 45, an offset address buffer 46, a tag register 51, V0 to V3 registers (V0 to V3), a memory request register 52, a small address selector 61, a V flag selector 62, a small address decoder 71, an offset address decoder 72, a small address decoder 73, an offset address counter 81, and a comparator 91.
[0025]
In FIG. 3, A1 denotes acknowledge, A2 the large address (block address), A3 read/write, A4 request, and A5 the small address. B1 denotes the memory block address, B2 the memory offset address, B3 memory request, B4 memory read/write, B5 memory busy, B6 store bypass, B7 memory acknowledge, and B8 to B11 LB0 set to LB3 set.
[0026]
The memory address match checking unit 102a consists of the tag register 51 and the comparator 91.
The tag register 51 temporarily stores the address input from the CPU 300 with its lower 2 bits removed (the large address), and thus holds the block address at which the one to four words of data stored in the load data buffer unit 101 are located in the memory device 400. The comparator 91 is an Ex-NOR circuit that compares the large address (block address) stored in the tag register 51 with the address input from the CPU 300 excluding its lower 2 bits, and checks whether the two match. In other words, it checks whether a copy of part or all (four words) of the data located at the block address input from the CPU 300 during execution of a load or store instruction is stored in the load data buffer unit 101. Hereinafter, the case where the two match is referred to as a load data buffer hit, and the case where they do not is referred to as a load data buffer miss.
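The address split and the tag comparison can be sketched as follows. This is a minimal Python illustration under the assumption that addresses are plain non-negative integers; the function names `split_address` and `tag_match` are hypothetical and do not appear in the patent.

```python
SMALL_BITS = 2  # the lower 2 bits select one of the four words in a block


def split_address(address):
    """Split an address into (large address, small address)."""
    large = address >> SMALL_BITS
    small = address & ((1 << SMALL_BITS) - 1)
    return large, small


def tag_match(tag_register, address):
    """Stand-in for comparator 91: compare the held block address with the request."""
    large, _ = split_address(address)
    return tag_register == large
```

A match here only says that the requested word belongs to the buffered block; whether that particular word is actually held is decided separately by the V flags.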
[0027]
As shown in FIG. 4, the V flag unit 102b consists of the V0 to V3 registers (V0 to V3), the small address selector 61, the small address decoder 71, the AND circuits 3 to 6, and the V flag selector 62.
The V0 to V3 registers (V0 to V3) hold the V0 to V3 flags, which indicate whether the copies of memory data stored in LB0 to LB3 of the load data buffer unit 101 are valid. On a load data buffer hit, if the V flag corresponding to the data register specified by the lower 2 bits of the address input from the CPU 300 (hereinafter referred to as the small address), that is, one of LB0 to LB3, is “1”, the contents of that data register are valid.
[0028]
On the other hand, if the corresponding V flag is “0”, the data requested by the CPU 300 is not held in the data registers LB0 to LB3 even on a load data buffer hit, and the data must be read by accessing the memory device 400. The small address selector 61 selects the 2-bit selection signal that determines which of the V0 to V3 registers has its V flag set. When read data from the memory device 400 is stored in LB0 to LB3, it selects the output of the offset address counter 81 (sequence control unit); during a store bypass operation, it selects the small address input from the CPU 300. The small address decoder 71 decodes the output of the small address selector 61. The AND circuits 3 to 6 each take the logical product of one output of the small address decoder 71 (exactly one of the four outputs is “1”) and the output of the OR circuit 31 of the sequence control unit (described later), and their outputs are connected to the set inputs of the V0 to V3 registers, respectively.
[0029]
When the output of the OR circuit 31 is asserted, the output of one of the AND circuits 3 to 6 becomes “1”, and the corresponding one of the V0 to V3 registers is set to “1” at the rising edge of this signal. The other three V registers keep their previous contents. The reset inputs of the V0 to V3 registers are connected to the output of the AND circuit 9 of the sequence control unit (the set input of the memory request register 52, described later), and the contents of V0 to V3 all become “0” when the output of the AND circuit 9 is asserted.
The V flag selector 62 selects and outputs one of the four outputs of the V0 to V3 registers, using the small address input from the CPU 300 as its selection signal. If the output of the V flag selector 62 is “1” on a load buffer hit, a copy of the memory data requested by the CPU 300 with the load instruction is held in the corresponding register of the load data buffer unit 101.
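The role of the V flags can be illustrated with a small Python sketch. It is not taken from the patent: the class name `VFlags`, its method names, and the helper `is_hit` are hypothetical, and the rising-edge timing of the real registers is ignored. It only shows why a tag match alone is not enough when the block was filled starting from a nonzero offset.

```python
class VFlags:
    """Behavioural stand-in for the V0 to V3 registers (illustrative names)."""

    def __init__(self):
        self.flags = [0, 0, 0, 0]

    def reset_all(self):
        # Corresponds to the output of AND circuit 9 being asserted.
        self.flags = [0, 0, 0, 0]

    def set_flag(self, index):
        # Only the register picked by the decoder is set; the others keep their value.
        self.flags[index] = 1

    def selected(self, small_address):
        # Corresponds to the V flag selector 62.
        return self.flags[small_address]


def is_hit(tag_matches, v_flags, small_address):
    """Data is usable only if the tag matches and the selected V flag is 1."""
    return bool(tag_matches and v_flags.selected(small_address))


v = VFlags()
v.reset_all()          # a new memory access begins
for offset in (2, 3):  # block filled from offset 2 up to offset 3
    v.set_flag(offset)
```

After this partial fill, a request for word 3 of the same block can be served from the buffer, while a request for word 0 still needs a memory access.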
[0030]
As shown in FIG. 5, the LB0 to LB3 set signal generation unit 102c consists of the offset address decoder 72, the small address decoder 73, the AND circuits 14 to 21, and the OR circuits 32 to 35, and generates the data set signals (LB0 set to LB3 set) for the data registers LB0 to LB3 of the load data buffer.
[0031]
The offset address decoder 72 decodes the 2-bit output of the offset address counter 81 of the sequence control unit. The output of the offset address counter 81 is the lower two bits of the memory address used when reading data from the memory device 400, and indicates the position in the memory block from which data is actually being read (hereinafter referred to as the memory offset).
[0032]
The AND circuits 14 to 17 each take the logical product of one output of the offset address decoder 72 (exactly one of the four outputs is “1”) and the output of the AND circuit 11 of the sequence control unit (described later), and thereby generate the set signals for storing the read data from the memory device 400 in the corresponding one of the data registers LB0 to LB3.
[0033]
The small address decoder 73 decodes the 2-bit small address input from the CPU 300 and, during a store bypass operation, selects the one of the data registers LB0 to LB3 that corresponds to the data write position (memory offset) in the memory block.
[0034]
The AND circuits 18 to 21 each take the logical product of one output of the small address decoder 73 (exactly one of the four outputs is “1”) and the output of the AND circuit 13 of the sequence control unit (described later), and thereby generate the set signals for bypassing the store data input from the CPU 300 during a store bypass operation and storing it in the corresponding one of the data registers LB0 to LB3.
The OR circuits 32 to 35 take the logical sum of the outputs of the AND circuits 14 to 17 and the AND circuits 18 to 21, respectively. The outputs of the OR circuits 32 to 35 are output to the data registers LB0 to LB3 as the LB0 to LB3 set signals, respectively.
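The gating just described can be written out as a short Python sketch. It is only a combinational illustration: the function names are hypothetical, signals are modelled as the integers 0 and 1, and `bypass_strobe` stands for the output of the AND circuit 13, whose definition comes later in the description and is therefore treated here as an opaque input.

```python
def decode2(value):
    """2-to-4 one-hot decoder: exactly one of the four outputs is 1."""
    return [1 if value == i else 0 for i in range(4)]


def lb_set_signals(offset_count, read_ack, small_address, bypass_strobe):
    """Return [LB0 set, LB1 set, LB2 set, LB3 set]."""
    # AND circuits 14 to 17: offset decoder outputs gated by the read acknowledge.
    from_memory = [d & read_ack for d in decode2(offset_count)]
    # AND circuits 18 to 21: small address decoder outputs gated by the bypass strobe.
    from_store = [d & bypass_strobe for d in decode2(small_address)]
    # OR circuits 32 to 35: either source can set a register.
    return [m | s for m, s in zip(from_memory, from_store)]
```

Because the memory fill path and the store bypass path share the same four set lines, the input selector 101a decides which data each asserted line actually latches.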
[0035]
The sequence control unit inputs and outputs control signals to and from the CPU 300 and the memory device 400, controls read operations on the memory device 400, and controls the operation of the load data buffer unit 101. It consists of the AND circuits 1 to 2 and 7 to 13, the memory request register 52, the buffers 41 to 44, the block address buffer 45, the offset address buffer 46, the OR circuit 31, and the offset address counter 81.
[0036]
In the sequence control unit, first, the AND circuit 1 performs a logical product of the request signal from the CPU 300 and the inverted signal of the read / write signal to generate a read request signal. The read / write signal indicates read when “0” and write when “1”. Therefore, when the read request signal is “1”, it is recognized that the load instruction is executed from the CPU 300.
[0037]
The AND circuit 2 takes the logical product of the request signal from the CPU 300, the read/write signal, and the output of the comparator 91 of the memory address match checking unit 102a (hereinafter referred to as the tag match signal) to generate the store bypass signal. When the store bypass signal is asserted, a store bypass operation is performed; that is, the store bypass operation takes place when a write is requested and the block address matches.
[0038]
The AND circuit 7 performs an AND operation between the output of the comparator 91 of the address match checking unit 102a and the valid signal that is the output of the V flag selector 62 of the V flag unit 102b. When the output of the AND circuit 7 is “1”, it indicates that a copy of the data in the memory device 400 arranged at the address input from the CPU 300 exists in the corresponding registers of the data registers LB0 to LB3. That is, the load data buffer hits and the V flag is “1”.
[0039]
The AND circuit 8 takes the logical product of the output of the AND circuit 7 and the output of the AND circuit 1 (read request signal). When the output of the AND circuit 8 is “1”, no data is read from the memory device 400, and this signal is driven by the buffer 41 and output to the CPU 300 as the acknowledge signal. At this time, the load buffer output selector 101b in the load buffer unit 100 of FIG. 2 selects, from the data registers LB0 to LB3, the output of the data register specified by the small address input from the CPU 300 and outputs it to the CPU 300 as load data.
[0040]
Here, the acknowledge signal and the request signal are in a handshake relationship. That is, the request signal is asserted by the CPU 300, and the memory buffer device operates. When this is completed, an acknowledge signal is asserted in the memory buffer device. The CPU 300 negates the request signal when the acknowledge signal is asserted. In the memory buffer device, the acknowledge signal is negated when the request signal is negated. In the state where the acknowledge signal is negated, the CPU 300 asserts the next memory request.
As described above, the request signal and the acknowledge signal are used to synchronize each one-word data access between the CPU 300 and the memory buffer device.
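The handshake described above follows a four-phase pattern, which can be traced with a few lines of Python. This is a sketch only: the function name `handshake_trace` is hypothetical, and each tuple simply records the levels of the two signals after each step, without modelling real timing.

```python
def handshake_trace(words):
    """Return the (request, acknowledge) levels for a run of one-word transfers."""
    trace = []
    for _ in range(words):
        trace.append((1, 0))  # requester asserts the request
        trace.append((1, 1))  # responder finishes its work and asserts acknowledge
        trace.append((0, 1))  # requester sees acknowledge and negates the request
        trace.append((0, 0))  # responder sees the request negated and negates acknowledge
    return trace
```

Each word therefore costs one full assert and negate cycle of both signals before the next request can be raised.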
[0041]
In the AND circuit 9, the logical product of the inverted signal of the output of the AND circuit 7, the read request signal output from the AND circuit 1, and the inverted signal of the memory busy is obtained. This output indicates that the data that needs to be read by the load instruction from the CPU 300 does not exist in the load data buffer unit 101 and the memory device 400 is accessible at that time.
The memory busy signal is common to the load buffer unit 100 and the store buffer unit 200 and is asserted while either of them is accessing the memory device 400. The memory request register 52 is set to “1” when the output of the AND circuit 9 is asserted (hereinafter, setting a register to “1” is simply called setting it, and setting it to “0” is called resetting it). In other words, because a load data buffer miss has occurred, the data requested by the CPU 300 must be read from the memory device 400, and the memory device 400 is activated by setting the memory request register 52.
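The gates of the sequence control unit described so far (AND circuits 1, 2, 7, 8, and 9) can be summarised as boolean expressions. The Python sketch below is an illustration under simplifying assumptions: signals are the integers 0 and 1, timing and the output buffers are ignored, and the function and key names are hypothetical.

```python
def sequence_gates(request, read_write, tag_match, valid, memory_busy):
    """Evaluate the described gates; every input and output is 0 or 1."""
    read_request = request & (read_write ^ 1)        # AND circuit 1
    store_bypass = request & read_write & tag_match  # AND circuit 2
    buffer_hit = tag_match & valid                   # AND circuit 7
    acknowledge = buffer_hit & read_request          # AND circuit 8
    # AND circuit 9: a read that misses while the memory device is free.
    start_memory_read = (buffer_hit ^ 1) & read_request & (memory_busy ^ 1)
    return {
        "read_request": read_request,
        "store_bypass": store_bypass,
        "buffer_hit": buffer_hit,
        "acknowledge": acknowledge,
        "start_memory_read": start_memory_read,
    }
```

Note that a read which misses while memory busy is asserted produces neither an acknowledge nor a new memory access in this model; the request simply stays pending until the busy signal is negated.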
[0042]
The output of the AND circuit 9 is further connected to the clock input of the tag register 51, the reset input of the V0-3 registers V0 to V3, and the load input of the offset address counter 81. The tag register 51 newly stores a large address input from the CPU 300 when the output of the AND circuit 9 is asserted, that is, when the memory request register 52 is set.
The output of the tag register 51 is driven by the block address buffer 45 while the M request signal output from the memory request register 52 is asserted, and is output to the memory device 400 as a memory block address. The V0-3 registers are all cleared when a new memory access to the memory device 400 is started, that is, when the output of the AND circuit 9 is asserted, and the contents thereof become “0”.
[0043]
The offset address counter 81 is loaded with the 2-bit small address input from the CPU 300 when the output of the AND circuit 9 is asserted. The output of the offset address counter 81 is driven by the offset address buffer 46 and output to the memory device 400 as the memory offset address. This signal is also input to the offset address decoder 72, where it serves as the selection signal for LB0 to LB3 when read data from the memory device 400 is stored in the load data buffer unit 101. In addition, it is input to the small address selector 61, where it serves as the selection signal for the V0 to V3 registers when a V flag is set during read access to the memory device 400.
[0044]
The M request signal that is the output of the memory request register 52 is ANDed with the inverted signal of the memory acknowledge in the AND circuit 12, is driven by the buffer 42, and is output to the memory device 400 as a memory request signal. The memory acknowledge signal is a signal input from the memory device 400. When this signal is asserted, it indicates that a predetermined operation in the memory device 400 is completed. The memory request signal is a signal common to the load buffer unit 100 and the store buffer unit 200, and indicates that either of them requests access to the memory device 400.
[0045]
The memory request signal and the memory acknowledge signal have a handshake relationship. That is, the memory request signal is asserted in the memory buffer device (load buffer unit 100 or store buffer unit 200), and the memory device 400 operates. When this is completed, the memory device 400 asserts a memory acknowledge signal. In the memory buffer device, when the memory acknowledge signal is asserted, the memory request signal is negated. The memory device 400 negates the memory acknowledge signal when the memory request signal is negated. The memory buffer device asserts the next memory request in a state where the memory acknowledge signal is negated. As described above, the memory request signal and the memory acknowledge signal are used to synchronize operations for each data access of one word between the memory buffer device and the memory device 400. Therefore, the two signals are turned on and off every memory cycle.
[0046]
The inverted signal of the M request is driven by the buffer 43 and output to the memory device 400 as a memory read / write signal. The memory read / write signal is “0” while the M request signal is “1”. The M request signal is driven by the buffer 44 and asserts a memory busy signal. The memory read / write signal and the memory busy signal are not turned on / off every memory cycle, and are continuously driven while the M request signal is asserted. These are asserted for 1 to 4 memory cycles in order to make consecutive memory accesses in the memory block between the memory buffer device and the memory device 400.
[0047]
The M request signal is further connected to an input of the AND circuit 11 and to the selection input of the small address selector 61. The AND circuit 11 takes the logical product of the M request signal and the memory acknowledge signal. Its output is the read acknowledge signal, and assertion of this signal indicates that the read access operation in the memory device 400 is completed. The small address selector 61 determines which of the V0 to V3 registers is selected and set: at the time of read access to the memory device 400 it selects the output of the offset address counter 81, and at the time of store bypass it selects the small address input from the CPU 300, and it outputs the selected value to the small address decoder 71. The M request signal is used as the selection signal for this purpose.
[0048]
The read acknowledge signal, which is the output of the AND circuit 11, is connected to the input of the AND circuit 10, the input of the OR circuit 31, the clock input of the offset address counter 81, and the inputs of the AND circuits 14 to 17. The AND circuit 10 takes the logical product of the read acknowledge signal and output 3 of the offset address decoder 72 (asserted when the input signal is “11”). The output of the AND circuit 10 is the read signal. Assertion of this signal indicates that the read access to the memory device 400, which starts from the data requested by the CPU 300, has been completed up to the maximum memory address of the memory block, that is, the address whose lower 2 bits are “11”.
[0049]
The read signal is input to the reset input of the memory request register 52, and the memory request register 52 is reset at the rising edge of this signal, so that its output, the M request signal, is negated.
In the OR circuit 31, the logical sum of the read acknowledge signal and the output of the AND circuit 13 is taken. The output of the OR circuit 31 is input to the AND circuits 3 to 6 of the V flag unit 102b, and becomes a timing signal for asserting the set input to the V0 to 3 registers V0 to V3.
[0050]
The read acknowledge signal that is the output of the AND circuit 11 is used as a V flag set timing signal when the memory device 400 is read-accessed. The offset address counter 81 is used to generate an offset address to be output to the memory device 400 and to count the number of cycles when memory access is continuously performed. In the first embodiment, in memory access, data arranged from the memory address in which a load buffer miss occurs to the maximum address of the memory block is continuously read-accessed and stored in the load data buffer unit 101.
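The continuous read on a load buffer miss can be illustrated with the following sketch. The function name `load_miss_burst` is hypothetical, and the sketch assumes that all V flags are clear before the burst starts, which this passage does not state.

```python
def load_miss_burst(small_address):
    """Model the read burst that follows a load buffer miss.

    Returns (offsets, v_flags): the memory offset addresses accessed in order,
    and the V0..V3 flags set by this burst (assumed clear beforehand).
    """
    counter = small_address & 0b11   # offset address counter 81 loaded with the CPU small address
    v_flags = [0, 0, 0, 0]
    offsets = []
    while True:
        offsets.append(counter)      # memory offset address output for this memory cycle
        v_flags[counter] = 1         # V flag set on the read acknowledge
        if counter == 0b11:          # decoder output 3 AND read acknowledge: burst complete
            break
        counter += 1                 # counter clocked by the read acknowledge
    return offsets, v_flags
```

For a miss at small address “10”, for example, only offsets 2 and 3 are read, so the burst takes two memory cycles instead of four.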
[0051]
The AND circuit 13 takes the logical product of the store bypass signal and the acknowledge signal. The acknowledge signal at this time is driven and asserted by the store buffer unit 200 when the store instruction is executed. The output of the AND circuit 13 is connected to the input of the OR circuit 31 and to the inputs of the AND circuits 18 to 21 of the LB0 to 3 set signal generation unit 102c. At the OR circuit 31, it is input as the V flag set timing signal during store bypass. At the AND circuits 18 to 21, it serves as the timing signal for asserting the LB0 to 3 set signals, so that during store bypass the store data input from the CPU 300 is bypassed and stored in the load data buffer unit 101.
[0052]
FIG. 6 is a configuration diagram of the store buffer unit 200.
As described above, the store buffer unit 200 includes the store data buffer unit 201 and the store buffer control unit 202.
The store data buffer unit 201 includes a queue (FIFO: first-in first-out memory). In this embodiment, the queue is composed of two parts. The first is a part in which a 4-word data register (SB0-3), four small address registers (2 bits each: SB0-3A), and three data selectors (SB1-3B) are connected in series, and which stores the store data together with the lower 2 bits of the write destination memory address, that is, the small address. The second is the matchers SB0 to 3C, which compare the small address input from the CPU 300 with the small addresses already stored in the small address registers SB0 to 3A. In the present invention, however, the number of data words is not limited to four and may be any number of one or more.
[0053]
When the large address input from the CPU 300 at the time of executing a store instruction, that is, the memory address excluding its lower 2 bits, is the same as the address of the write destination memory block of the data stored in the queue, that is, the memory block address, no data is written to the memory device 400. Instead, the new store data and its offset address within the write destination memory block (hereinafter simply referred to as the offset address), that is, the contents of the small address input from the CPU 300, are written at the tail of the queue.
[0054]
At this time, the data and offset addresses already stored in the queue are shifted toward the head of the queue by one word. When both the memory block address and an offset address are equal to the large address and small address input from the CPU 300, that is, when store data for the same memory address as the one given by the store instruction is already stored in the queue, that data is overwritten with the subsequent store data. In other words, new store data for the same memory address is held in the queue without a write to the memory device 400 actually taking place.
[0055]
The match check between the large address and the memory block address is performed by the store buffer control unit 202, and the match check between the small address and the memory offset address is performed by the matchers SB0 to 3C. When a match is confirmed, the output of whichever of the matchers SB0 to 3C corresponds to the matching store buffer SB (a collective term for the data registers SB0 to 3 and the small address registers SB0 to 3A) is asserted.
When new data is stored in the queue, the data selectors SB0 to 3B select the store data and small address input from the CPU 300 as the input to the store buffer SB that receives the new data, and, for the other store buffers SB, select and output the output of the subsequent store buffer SB.
[0056]
The store buffer control unit 202 inputs / outputs control signals to / from the CPU 300 and the memory device 400 and controls the store data buffer unit 201.
The store buffer control unit 202 outputs the SB0-3 clock signals and the SB1-3 set signals to the store data buffer unit 201, and the store data buffer unit 201 outputs the SB0-3 match signals to the store buffer control unit 202. The SB0-3 clock signals are the clock signals for the data registers SB0-3 and the small address registers SB0-3A. The data registers SB0-3 and the small address registers SB0-3A store their inputs at the rising edge of the clock signal. The SB1-3 set signals are the selection signals for the data selectors SB1-3B. When a set signal is “1”, the store data and small address input from the CPU 300 are selected; when it is “0”, the output of the subsequent register connected in series is selected. The SB0-3 match signals are the output signals of the matchers SB0-3C and are “1” when the matcher inputs match.
[0057]
FIG. 7 is a configuration diagram of the store buffer control unit 202.
FIG. 8 is a configuration diagram of the store buffer signal generation unit 202b in FIG.
In these figures, the store buffer control unit 202 inputs / outputs control signals to / from the CPU 300 and the memory device 400 and controls the store data buffer unit 201. It comprises the memory address match check unit 202a, the store buffer signal generation unit 202b, and a sequence control unit.
[0058]
That is, the store buffer control unit 202 shown in these drawings includes AND circuits 1 to 18, OR circuits 31 to 40, buffers 41 to 43, block address buffer 44, tag register 51, memory request register 52, selectors 63 to 66, A write buffer tail decoder 74, a write buffer pointer 82, and a comparator 91 are provided.
In FIG. 7, A1 is a large address, A2 is an acknowledge, A3 is a request, A4 is a read / write, A5 to A8 are SB0 match to SB3 match, B1 is a memory block address, B2 is a memory read / write, B3 is a memory busy, B4 is a memory request, B5 is a memory acknowledge, B6 to B8 are SB1 set to SB3 set, and B9 to B12 are SB0 clock to SB3 clock.
[0059]
The memory address match checking unit 202a has the same configuration and operation as the memory address match check unit 102a of the load buffer control unit 102 shown in FIG., so its description is omitted.
[0060]
The store buffer signal generation unit 202b is a circuit that generates the SB0-3 clocks and the SB1-3 set signals. As shown in FIG. 8, it comprises the write buffer tail decoder 74, AND circuits 12 to 14, OR circuits 35 to 40, selectors 63 to 66, and AND circuits 15 to 18.
The write buffer tail decoder 74 is a 2-4 decoder, and decodes a 2-bit input signal indicating the last position of the write data buffer queue composed of the store buffer SB counted by the write buffer pointer 82 of the sequence control unit. Therefore, the store buffer SB corresponding to one of the four outputs asserted is the tail of the queue.
[0061]
The AND circuits 12 to 14 generate the SB1 to 3 set signals by taking the logical product of outputs 1 to 3 of the write buffer tail decoder 74 (asserted when the input is “01”, “10”, and “11”, respectively) and the output (described later) of the AND circuit 1 of the sequence control unit. When any of the outputs of the AND circuits 12 to 14 is “1”, the store data and small address input from the CPU 300 are selected by the corresponding one of the data selectors SB1 to 3B of FIG. and are input to the corresponding registers among the data registers SB1 to SB3 and the small address registers SB1 to 3A.
[0062]
On the other hand, since the data register SB0 and the small address register SB0A are installed at the entrance of the queue, the subsequent registers are not connected to the input. Accordingly, when data is stored in the data register SB0 and the small address register SB0A, store data and a small address input from the CPU 300 are always input.
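As an illustration, the generation of the SB1 to 3 set signals can be sketched in Python as follows. The function name `sb_set_signals` and the modelling of every signal as a 0/1 integer are assumptions made for this sketch and do not appear in the specification.

```python
def sb_set_signals(tail, write_request):
    """Return [SB1 set, SB2 set, SB3 set] for a 2-bit tail value (0..3).

    `tail` stands for the write buffer tail signal and `write_request` for
    the output of AND circuit 1 of the sequence control unit.
    """
    # 2-to-4 decoder (write buffer tail decoder 74): exactly one output is 1
    decoded = [1 if tail == i else 0 for i in range(4)]
    # AND circuits 12 to 14 gate decoder outputs 1 to 3 with the write request.
    # SB0 has no set signal because it always takes its input from the CPU.
    return [decoded[k] & write_request for k in (1, 2, 3)]
```

With `tail = 2` and an active write request, only the SB2 set signal is “1”, so only the selector for SB2 picks up the CPU data.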
[0063]
The OR circuits 35 to 37 form the logic that selects which of the SB0 to 3 clock signals are generated and output to the store buffers SB, both when new data is stored in the queue and when the contents of the queue are shifted out and written to the memory device 400. When new data is stored in the queue, the clock signal is supplied to the store buffers SB connected ahead of the register indicated by the value of the write buffer pointer 82, that is, in the head direction of the queue. When the contents of the queue are shifted out and written to the memory device 400, only the data already stored in the queue is shifted in the head direction, unless new write data is input from the CPU 300. In this case too, therefore, the clock signal is supplied to the store buffers SB connected ahead of the register indicated by the value of the write buffer pointer 82, that is, in the head direction of the queue. With this logic in the OR circuits 35 to 37, invalid data is not stored in the queue even when the contents of the queue are written to the memory device 400.
[0064]
The OR circuits 38 to 40 select which of the SB0 to 3 clock signals are generated when a store instruction is executed for the same memory address as the write destination, in the memory device 400, of data already stored in the queue. In this case, in order to overwrite the queue contents, the SB0 to 3 clock signals are generated according to which one to four of the SB0 to 3 match signals input from the store data buffer unit 201 are asserted.
[0065]
That is, when only the match for the store buffer SB0 is asserted, only the SB0 clock is generated. When only the match for the store buffer SB1, or the matches for both the store buffers SB0 and SB1, are asserted, the SB0 clock and the SB1 clock are generated. When only the match for the store buffer SB2, the matches for both the store buffers SB2 and SB1, or the matches for the three store buffers SB2, SB1, and SB0 are asserted, the three clocks SB0, SB1, and SB2 are generated. When only the match for the store buffer SB3, the matches for both the store buffers SB3 and SB2, the matches for the three store buffers SB3, SB2, and SB1, or the matches for all of the store buffers SB3, SB2, SB1, and SB0 are asserted, all of the SB0, SB1, SB2, and SB3 clocks are generated.
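The cases listed above are consistent with a simple cascade in which each clock is the logical sum of its own match signal and the clock of the next higher-numbered store buffer. The sketch below assumes that cascade; the function name `overwrite_clocks` is hypothetical, and which of the OR circuits 38 to 40 drives which clock is not stated in the text.

```python
def overwrite_clocks(matches):
    """Return [SB0, SB1, SB2, SB3] clock enables for an overwrite.

    `matches` is [SB0 match, SB1 match, SB2 match, SB3 match], each 0 or 1.
    """
    m0, m1, m2, m3 = matches
    clk3 = m3            # the SB3 match signal is used directly
    clk2 = m2 | clk3     # three OR gates stand in for OR circuits 38 to 40
    clk1 = m1 | clk2
    clk0 = m0 | clk1
    return [clk0, clk1, clk2, clk3]
```

A match at SBk therefore clocks SBk and every lower-numbered store buffer down to the queue entrance SB0.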
[0066]
As a result, newly written data is stored at the tail of the queue indicated by the write buffer tail signal (the output of the write buffer pointer 82), and the original store data is overwritten by the subsequent store data.
The selectors 63 to 66 are circuits that select between the signals generated by the two systems of SB0 to 3 clock generation selection circuits. When the output (described later) of the AND circuit 3 of the sequence control unit is “1”, that is, when store data stored in the queue is overwritten, the outputs of the OR circuits 38 to 40 and the SB3 match signal are selected and output to the AND circuits 15 to 18. On the other hand, when the output of the AND circuit 3 is “0”, that is, when new store data is stored in the queue or when the store data stored in the queue is sequentially written to the memory device 400, output 0 of the write buffer tail decoder 74 and the outputs of the OR circuits 35 to 37 are selected and input to the AND circuits 15 to 18. The AND circuits 15 to 18 take the logical product of the outputs of the selectors 63 to 66 and the output (described later) of the OR circuit 34 of the sequence control unit, and generate the SB0 to 3 clocks to be output to the store data buffer unit 201. The output of the OR circuit 34 is the signal that determines the assert timing of the clocks supplied to the store buffers SB.
[0067]
The sequence control unit performs input / output of control signals to / from the CPU 300 and the memory device 400, controls writing operation to the memory device 400, and controls operation of the store data buffer unit. The sequence control unit includes AND circuits 1 to 11, OR circuits 31 to 34, a memory request register 52, buffers 41 to 43, a block address buffer 44, and a write buffer pointer 82 shown in FIG.
[0068]
In the sequence control unit, first, the AND circuit 1 calculates the logical product of the request signal from the CPU 300 and the read / write signal, and generates the write request signal. When the write request signal is “1”, it is recognized that a store instruction is being executed by the CPU 300. In addition, the OR circuit 31 calculates the logical sum of all the SB0-3 match signals input from the store data buffer unit 201. The output of the OR circuit 31 is the offset address match signal. This signal is asserted when store data for the same in-block address as the lower 2 bits of the memory address newly requested for writing by the CPU 300, that is, the small address, is already stored in the store data buffer unit 201 (queue).
[0069]
The AND circuit 2 takes the logical product of the output of the comparator 91 of the memory address match checking unit 202a and the inverted offset address match signal output from the OR circuit 31. Assertion of the output of the AND circuit 2 means that the tag match signal is “1”, that is, the memory block address stored in the tag register 51 and the large address input from the CPU 300 are the same, but the small address does not match any of the memory offset addresses stored in the small address registers SB0 to 3A of the queue. That is, this is the case where a store instruction is executed that writes to a new in-block offset address of the same memory block as the store data stored in the queue. In this case, the input store data and the contents of the small address are stored in the queue. The destination store buffer SB at this time is indicated by the write buffer tail signal, which is the output of the write buffer pointer 82.
[0070]
The AND circuit 3 takes the logical product of the output of the comparator 91 of the memory address match checking unit 202a and the offset address match signal. The output of the AND circuit 3 is asserted as an address match signal when the address input from the CPU 300 matches the memory address stored in the tag register 51 and the queue. In this case, the store data already stored in the queue is overwritten, and the store data newly input from the CPU 300 is stored at the end of the queue specified by the write buffer tail signal.
[0071]
The AND circuit 4 takes a logical product of the inverted signal of the output of the comparator 91 of the memory address coincidence inspection unit 202a, the write request signal that is the output of the AND circuit 1, and the memory busy signal. The memory busy signal is a signal common to the load buffer unit 100 and the store buffer unit 200 as described in the load buffer unit 100.
[0072]
The AND circuit 5 calculates the logical product of the output of the AND circuit 1 and the output of the AND circuit 2, and outputs an enqueue signal. The AND circuit 6 calculates the logical product of the output of the AND circuit 1 and the output of the AND circuit 3, and outputs an overwrite signal. The output of the AND circuit 3 is used as a selection signal for the selectors 63 to 66 of the store buffer signal generation unit 202b. The enqueue signal is output to the OR circuit 34 and the OR circuit 32. The overwrite signal is input to the OR circuit 34. The OR circuit 34 takes a logical sum of these two signals and an output (described later) of the AND circuit 10, and outputs a shift timing signal.
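The combinational logic of AND circuits 1, 2, 3, 5, and 6 and OR circuits 31 and 34 can be summarised by the sketch below. It is a hedged illustration rather than the circuit itself: the function name `store_control`, the dictionary keys, and the convention that the read / write signal is 1 for a write (which is what a plain logical product in AND circuit 1 implies) are assumptions, and the memory request path through AND circuit 4 is left out.

```python
def store_control(request, read_write, tag_match, sb_matches, write_ack=0):
    """Derive the store-side control signals from 0/1 inputs.

    tag_match  : output of comparator 91
    sb_matches : [SB0 match, ..., SB3 match]
    write_ack  : output of AND circuit 10 (write acknowledge)
    """
    write_request = request & read_write          # AND circuit 1
    offset_match = 1 if any(sb_matches) else 0    # OR circuit 31
    and2 = tag_match & (offset_match ^ 1)         # same block, new offset
    and3 = tag_match & offset_match               # same block, same offset
    enqueue = write_request & and2                # AND circuit 5
    overwrite = write_request & and3              # AND circuit 6
    shift_timing = enqueue | overwrite | write_ack  # OR circuit 34
    return {
        "enqueue": enqueue,
        "overwrite": overwrite,
        "select_overwrite_clocks": and3,          # selection input of selectors 63 to 66
        "shift_timing": shift_timing,
    }
```

For example, a write request to the tagged block with no matching offset yields an enqueue, while the same request with one asserted match signal yields an overwrite; both assert the shift timing signal.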
[0073]
The shift timing signal is input to the AND circuits 15 to 18 of the store buffer signal generation unit 202b, and gives the assert timing of the SB0 to 3 clock signals. The shift timing signal is further output to the AND circuit 7. The AND circuit 7 calculates the logical product of the write request signal and the shift timing signal, is driven by the buffer 41, and outputs the result as an acknowledge signal to the CPU 300. The output of the AND circuit 4 is output to the set input (S input) of the memory request register 52 and the OR circuit 32. The memory request register 52 is set at the rising edge of the set input and asserts the M request signal. Similar to the M request signal of the load buffer control unit 102, the M request signal is an activation signal for performing write access to the memory device 400.
[0074]
The M request signal serves as the enable signal of the block address buffer 44, which drives the output of the tag register 51 to the memory device 400 as the memory large address. The M request signal is also driven by the buffer 42 and output to the memory device 400 as the memory read / write signal. In addition, it is input to the AND circuit 11, ANDed with the inverted memory acknowledge signal input from the memory device 400, driven by the buffer 43, and output to the memory device 400 as the memory request signal.
[0075]
The memory request signal and the memory acknowledge signal are handshake signals as described in the load buffer unit 100. The M request signal is further output to the AND circuit 10. The AND circuit 10 calculates the logical product of the M request signal and the memory acknowledge signal, and outputs a write acknowledge signal. The write acknowledge signal is a signal indicating that the operation for the request signal for the memory write access request output from the store buffer unit 200 is completed, and it is recognized that the memory operation is completed at the rising edge of this signal.
[0076]
The output of the AND circuit 10, that is, the write acknowledge signal is output to the OR circuit 34, the AND circuit 8, and the AND circuit 9. The OR circuit 34 generates a shift timing signal. When the write acknowledge signal is asserted and input to the OR circuit 34, it indicates that the write access to the memory device 400 is completed. The shift timing signal is asserted and output to the AND circuits 15 to 18 to give the assert timing of the SB0 to 3 clock signals. The shift timing signal is output to the AND circuit 7.
When the write acknowledge signal is asserted, the shift timing signal is asserted. When its logical product with the write request is then asserted in the AND circuit 7, the acknowledge signal for the execution of a store instruction that involves writing to the memory device 400 is asserted, driven by the buffer 41, and output to the CPU 300. The acknowledge signal in this case is asserted when the first word of the store data buffer (queue) has been completely written to the memory device 400.
[0077]
The output of the AND circuit 10, that is, the write acknowledge signal, is ANDed in the AND circuit 8 with the output of the write buffer tail decoder 74, that is, the last access signal, and is ANDed in the AND circuit 9 with the inverted last access signal. The output of the AND circuit 8 is asserted upon completion of the last memory access in writing the old memory data block stored in the store data buffer (queue) to the memory device 400 (this block differs from the write destination memory data block of the data newly requested by the CPU 300). The output of the AND circuit 9 is asserted upon completion of each memory access of the memory write other than the last. The output of the AND circuit 8 is the write done signal, and is output to the reset input of the memory request register 52 and to the OR circuit 33. The output of the AND circuit 9 is output to the OR circuit 32.
[0078]
The memory request register 52 is reset at the rising edge of the reset input. When the memory request register 52 is reset, the M request signal is negated, which completes the one to four memory writes to the memory device 400 and negates the memory busy signal. The OR circuit 32 takes the logical sum of the output of the AND circuit 4, that is, the set input to the memory request register 52, and the output of the AND circuit 9, and its output is connected to the up input of the write buffer pointer 82. The OR circuit 33 takes the logical sum of the output of the AND circuit 5, that is, the enqueue signal, and the output of the AND circuit 8, that is, the write done signal, and its output is connected to the down input of the write buffer pointer 82.
[0079]
The write buffer pointer 82 generates the 2-bit write buffer tail signal, which indicates the tail of the store data buffer (queue). The counting operation of the write buffer pointer 82 is as follows. The store data buffers are numbered 3, 2, 1, 0 in order from the head position (exit). The structural entrance of the queue is therefore number 0 (this is not necessarily the tail; the tail is the entry position of the queue during operation). The tail of the queue normally indicates the SB to which data can be written next. Accordingly, the write buffer pointer is updated when data is inserted (decreased by one) in preparation for the next data insertion, and is updated when data is written to the memory device 400 (increased by one).
[0080]
[A. Initial state, ie immediately after power-on]
Since no data is written, it becomes 3.
[B. When executing a store instruction to the same memory block as the contents of the tag register]
(B-1. When the small address is the same as one of the small address registers SB0 to 3A): Since data is overwritten, there is no increase or decrease.
(B-2. When the small address is not the same as any of the small address registers SB0 to 3A): 1 is subtracted because data is inserted.
[0081]
[C. When executing a store instruction to a memory block different from the contents of the tag register]
(1) First, it is incremented by 1 (the data is inserted into the queue and the original queue contents are shifted toward the head, that is, the pointer is prepared for shifting out one word of the queue contents).
(2) Write access to the memory device 400 is performed, and the pointer is incremented by 1 each time the memory acknowledge is asserted (the pointer is prepared for shifting out the queue contents; the write data for the new memory block has already been inserted). The increase continues until the pointer value becomes 3.
(3) When the memory acknowledge signal is asserted during memory write access while the pointer value is 3, 1 is subtracted to give 2, and the memory write access is completed. That is, all the write data for the previous memory block has been written to the memory device 400, the write data for the new memory block has advanced to the head of the queue, and the pointer value becomes 2 in preparation for the insertion of the next data (indicating SB2, the second store buffer from the head of the queue).
[0082]
In the above description, the output of the AND circuit 5, that is, the enqueue signal, is asserted in B-2, the output of the AND circuit 4 is asserted in C-(1), the output of the AND circuit 9 is asserted in C-(2), and the output of the AND circuit 8, that is, the write done signal, is asserted in C-(3). Each of these is input to the up input or the down input of the write buffer pointer 82 via the OR circuit 32 or the OR circuit 33, and the write buffer pointer 82 performs the necessary counting operation.
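The counting rules A to C can be traced with a small Python model of the pointer. The class and method names are hypothetical, and treating the pointer as a 2-bit counter that wraps around is an assumption of this sketch, not something the text states.

```python
class WriteBufferPointer:
    """Up/down counter standing in for the write buffer pointer 82."""

    def __init__(self):
        self.value = 3                      # A: initial state after power-on

    def _up(self):
        self.value = (self.value + 1) & 0b11    # assumed 2-bit wrap-around

    def _down(self):
        self.value = (self.value - 1) & 0b11

    def overwrite(self):
        """B-1: same block, same small address. The pointer does not move."""

    def insert(self):
        """B-2: same block, new small address. The pointer decreases by 1."""
        self._down()

    def miss(self):
        """C: store to a different block. Returns the number of memory
        acknowledges (one per word written to the memory device)."""
        self._up()                          # C-(1)
        acks = 0
        while True:
            acks += 1                       # one memory acknowledge arrives
            if self.value == 3:             # C-(3): last access, pointer becomes 2
                self._down()
                return acks
            self._up()                      # C-(2): more words remain
```

Starting from the initial value 3, two insertions bring the pointer to 1; a subsequent miss then takes it through 2 and 3 and leaves it at 2 after two memory acknowledges.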
[0083]
[Operation]
FIGS. 9 to 11 are explanatory diagrams showing the operation of the first embodiment. FIG. 9 shows the operation during loading, and FIGS. 10 and 11 show the operation during storing.
When a load instruction is executed, the operation differs depending on whether the contents of the tag register 51 of the load buffer unit 100 and the large address input from the CPU 300 match, and, when the large address matches, on whether the V flag is valid (“1”) or invalid (“0”).
[0084]
When a store instruction is executed, the operation differs depending on whether the contents of the tag register 51 of the store buffer unit 200 and the large address input from the CPU 300 match, and on whether the memory offset addresses stored in the queue and the small address input from the CPU 300 match. The operation further differs depending on whether the contents of the tag register 51 of the load buffer unit 100 and the large address input from the CPU 300 match. FIGS. 9 to 11 show these combinations and the corresponding operations.
[0085]
FIGS. 12 and 13 are time charts at the time of execution of a load instruction in the first embodiment and in a second embodiment to be described later.
FIG. 12 shows an operation in the case of a buffer hit. In this case, the memory device 400 is not read-accessed, and necessary data in the load data buffer unit 101 is selected and output to the CPU 300. In FIG. 12, the request signal and the acknowledge signal have a handshake relationship. The large address, small address, and read / write signal are correct values while the request signal is asserted. The load data is correct at the rising edge of the acknowledge signal.
[0086]
FIG. 13 shows the operation in the case of a buffer miss. In this case, the memory device 400 is read-accessed, and the data requested by the CPU 300 is output to the CPU 300 and stored in the load data buffer unit 101. Further, following the memory address requested by the CPU 300, read access to the memory device 400 continues up to the last data of the memory block (consisting of four words in the first embodiment and in the second embodiment described later), and the data read is stored in the corresponding registers of the load data buffer unit 101. The acknowledge signal to the CPU 300 is asserted and the load data is output when the data read from the memory device 400 is input in the first memory read access, that is, when the memory acknowledge signal is asserted in response to the first assertion of the memory request signal.
[0087]
FIG. 13 shows the operation when the content of the small address is “00”, that is, when a load instruction is executed for the data arranged at the beginning of the memory block. In this case, read access starts from the data at the beginning of the memory block for which the load buffer miss occurred, and all four words of the memory block are read continuously and stored in sequence in the corresponding registers of the load data buffer unit 101. In FIG. 13, the request signal and the acknowledge signal have a handshake relationship. The large address, small address, and read / write signal are correct values while the request signal is asserted. Further, the memory block address, the memory offset address, and the memory read / write signal are correct values while the memory request signal is asserted. The memory read data has a correct value at the rise time of the memory acknowledge signal and at the rise time of the LB0 to 3 set signals. The load data is correct at the rising edge of the acknowledge signal.
[0088]
FIGS. 14 and 15 are time charts at the time of execution of a store instruction according to the first embodiment and the second embodiment to be described later.
FIG. 14 shows a case of a buffer hit. In this case, the write access to the memory device 400 is not performed, and only the write data is stored in the store data buffer (queue). In FIG. 14, the request signal and the acknowledge signal have a handshake relationship. The large address, small address, read / write signal, and store data are correct values while the request signal is asserted.
[0089]
FIG. 15 shows an operation in the case of a buffer miss. In this case, write access is continuously made to the memory device 400, and write data already stored in the queue is written to the memory block of the memory device 400. In addition, store data input from the CPU 300 is newly stored in the queue. The assertion of the acknowledge signal to the CPU 300 is determined when the data at the head of the queue is completely written to the memory device 400 by the first memory write access, that is, when the memory acknowledge signal is asserted in response to the assertion of the first memory request signal. In FIG. 15, the request signal and the acknowledge signal have a handshake relationship. The large address, small address, read / write signal, and store data are correct values while the request signal is asserted. Further, the memory block address, the memory offset address, the memory read / write signal, and the memory write data are correct values while the memory request signal is asserted and at the rising edge of the SB0-3 clocks.
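The store-side behaviour for a buffer hit and a buffer miss can be summarised by a behavioural sketch that ignores all signal timing. The class `StoreQueueModel`, its attribute names, and the use of a dictionary in place of the memory device 400 are assumptions introduced only for illustration.

```python
class StoreQueueModel:
    """Behavioural model of the store queue for one memory block (no timing)."""

    def __init__(self):
        self.block = None      # memory block address held in the tag register
        self.entries = []      # [offset, data] pairs, head of the queue first
        self.memory = {}       # (block, offset) -> data, stand-in for memory device 400

    def store(self, large, small, data):
        if self.block == large:
            for entry in self.entries:
                if entry[0] == small:           # same address: overwrite in the queue
                    entry[1] = data
                    return "overwrite"
            self.entries.append([small, data])  # new offset: add at the tail
            return "enqueue"
        # Buffer miss: write the queued words out head first, then hold the new word.
        for offset, old in self.entries:
            self.memory[(self.block, offset)] = old
        self.block = large
        self.entries = [[small, data]]
        return "miss"
```

In this model a hit never touches `memory`; only a store to a different block causes the queued words to be written out, after which the new word is the sole queue entry.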
[0090]
[Effect]
As described above, in the first embodiment, the memory buffer device is composed of the multi-word load data buffer unit 101, the multi-word store data buffer unit 201 (queue), and their control units. Compared with a cache memory, for example, no separate memory unit for storing data is required, so the memory buffer device can be configured with a small-scale circuit. As a result, an operating speed that can sufficiently follow increases in the operating frequency of the CPU 300 can be secured, and the chip area of an LSI comprising the CPU 300 and the memory buffer device of the present invention can be made significantly smaller than that of a prior-art LSI comprising a CPU and a cache memory, which lowers cost and reduces power consumption.
[0091]
In addition, when a buffer miss occurs during execution of a load instruction, a memory access operation for a plurality of words is performed on the memory device 400. This reduces the frequency with which the CPU 300 accesses the memory device 400 and improves memory throughput, so that program execution performance can be improved.
[0092]
Further, in the first embodiment, when a buffer miss occurs during execution of a load instruction, the data from the missed memory address to the last address of the same memory block is read and stored in the load buffer unit 100. This has the following effect.
By consecutively accessing only the data ahead of the memory address at which the miss occurred (in the direction of increasing memory address), the number of memory access cycles can be reduced, for programs with a high frequency of forward references such as array operations, compared with always reading the entire block. That is, when the next memory reference is highly likely to target the address immediately following, or a few words ahead of (in the direction of increasing memory address), the target address of the immediately preceding memory reference, accesses to data behind that address (in the direction of decreasing memory address) are rare, and such data is not read needlessly from the memory device 400.
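The partial fill of the first embodiment can be illustrated with a short behavioral sketch. The class name `LoadBufferE1`, the list standing in for the memory device 400, the `words_read` counter, and the choice to clear all valid flags before each refill are assumptions made for illustration; only the policy of reading from the missed address to the end of the block is taken from the text.

```python
BLOCK_WORDS = 4  # assumption: four words per memory block


class LoadBufferE1:
    """Behavioral sketch of the first embodiment's load buffer (names are hypothetical)."""

    def __init__(self, memory):
        self.memory = memory                 # list standing in for memory device 400
        self.tag = None                      # contents of the tag register
        self.valid = [False] * BLOCK_WORDS   # load V flags, one per register
        self.regs = [0] * BLOCK_WORDS
        self.words_read = 0                  # memory read accesses performed so far

    def load(self, large, small):
        if self.tag == large and self.valid[small]:
            return self.regs[small]          # buffer hit: no memory access
        # Buffer miss: read from the missed offset to the last offset of the block.
        self.tag = large
        self.valid = [False] * BLOCK_WORDS
        for off in range(small, BLOCK_WORDS):
            self.regs[off] = self.memory[large * BLOCK_WORDS + off]
            self.valid[off] = True
            self.words_read += 1
        return self.regs[small]
```

For example, a miss at offset 2 costs two memory reads instead of four, and the following load at offset 3 hits. A later backward reference to offset 0 of the same block misses again and reads all four words.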
[0093]
<< Embodiment 2 >>
[Constitution]
In the second embodiment, the overall configuration is the same as that shown in FIG. 1 and the configuration shown in FIG. Further, the store buffer unit is the same as the configuration shown in FIGS.
The present embodiment differs from the first embodiment in the configuration of the load buffer control unit, which is as follows.
[0094]
FIG. 16 is a configuration diagram of a load buffer control unit according to the second embodiment.
That is, the load buffer control unit of the present embodiment differs from the load buffer control unit 102 of the first embodiment in that a memory access counter 83 is added. Since the configuration is otherwise the same as that of the first embodiment, the corresponding parts are denoted by the same reference numerals and their description is omitted.
[0095]
The second embodiment differs from the first embodiment only in the operation performed when a buffer miss occurs during execution of a load instruction. In that case, the first embodiment reads from the memory device 400, consecutively, the data from the memory address at which the miss occurred to the last address of the memory block, and stores it in the corresponding registers of the load data buffer unit 101. In contrast, the second embodiment reads the data from the memory address at which the miss occurred to the last address of the memory block, followed by the data from the head address of the memory block to the address immediately before the memory address at which the miss occurred. That is, the entire data of one memory block is read from the memory device 400 and stored in the corresponding registers of the load data buffer unit 101.
[0096]
For this purpose, two counters, an offset address counter 81 and a memory access counter 83, are provided as shown in FIG. 16. The offset address counter 81 generates the offset address within the memory block being read, and the memory access counter 83 counts the number of access cycles needed to read the data of one memory block. The count clock input of the memory access counter 83 is the same signal as the count clock input of the offset address counter 81, and is the same as the read acknowledge signal of the load buffer control unit 102 shown in FIG.
[0097]
The load input of the memory access counter 83 is the same signal as the load input of the offset address counter 81 and is the same as the set input to the memory request register 52 of the load buffer control unit 102 shown in FIG. At this time, the memory access counter 83 is always loaded with “00”. The carry output of the memory access counter 83 is asserted when the 2-bit counter overflows, that is, when the count reaches 4. This indicates that four words, the data of one memory block, have been read consecutively from the memory device 400. The carry output serves as the last access signal: it is ANDed with the read acknowledge signal by the AND circuit 10, and the output of the AND circuit resets the memory request register 52. The last access signal plays the same role as in the load buffer control unit 102 of the first embodiment shown in FIG.
By installing the memory access counter 83, in the case of a buffer miss, the entire memory block in which the miss has occurred, that is, data access for four words is continuously performed on the memory device 400.
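The interaction of the two counters can be sketched as follows. This is an illustrative model only: the function name `block_read_offsets` is hypothetical, and it is assumed that the offset address counter 81 is loaded with the small address of the miss and wraps around as a 2-bit counter, which matches the read order described for this embodiment (missed address first, then the head of the block).

```python
def block_read_offsets(miss_offset):
    """Return the offsets read on a load buffer miss in the second embodiment.

    Assumption: the offset address counter starts at the missed small address
    and wraps modulo 4; the memory access counter starts at 0 ("00").
    """
    offset_counter = miss_offset & 0b11   # 2-bit offset address counter 81
    access_counter = 0                    # memory access counter 83, loaded with "00"
    offsets = []
    last_access = False
    while not last_access:
        offsets.append(offset_counter)                  # one read access at this offset
        offset_counter = (offset_counter + 1) & 0b11    # 2-bit wrap-around
        access_counter += 1                             # clocked by read acknowledge
        last_access = access_counter == 4               # carry: four words have been read
    return offsets
```

With a miss at offset 2, the sketch yields the read order 2, 3, 0, 1. Whatever the starting offset, exactly four read accesses are generated before the last access signal ends the sequence.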
[0098]
FIGS. 17 to 19 are explanatory diagrams showing the operation of the second embodiment. FIG. 17 shows the operation during loading, and FIGS. 18 and 19 show the operation during storage.
The operation of the memory buffer device according to the second embodiment is different from the operation of the memory buffer device according to the first embodiment shown in FIGS.
In the second embodiment, on a load buffer miss, the entire memory block (4 words) containing the missed memory address is always read from the memory device 400 and stored in the corresponding registers of the load data buffer unit 101. After this memory access, the load V flags are therefore always “1”. Consequently, whenever the contents of the tag register 51 of the load buffer unit 100 match the large address input from the CPU 300, a buffer hit always results, and the data in the corresponding register of the load data buffer unit 101 is output to the CPU 300.
[0099]
On the other hand, if the contents of the tag register 51 do not match the large address input from the CPU 300, a buffer miss occurs. In that case, the data of one memory block is read consecutively from the memory device 400, starting at the memory address at which the miss occurred. The first word read is output to the CPU 300, and the data of the whole memory block (4 words) is stored in the corresponding registers of the load data buffer unit 101. The operation at the time of executing the store instruction is the same as the operation of the memory buffer device of the first embodiment shown in FIGS.
[0100]
Further, the timing of each signal at the time of execution of the load instruction and the execution of the store instruction in the second embodiment is the same as the state shown in FIGS. However, in the second embodiment, whatever value from “00” to “11” the small address holds, all four words of the memory block in which the load buffer miss occurred, including the data at the head of the block, are read consecutively and stored in the corresponding registers of the load data buffer.
[0101]
[Effect]
As described above, the second embodiment lowers cost and power consumption in the same way as the first embodiment. In addition, when a buffer miss occurs during execution of a load instruction, the data of the entire memory block containing the missed memory address is read and stored in the load buffer unit 100, which has the following effect.
Because the entire memory block containing the missed memory address is always read consecutively, the total number of buffer misses can be reduced for programs in which both forward and backward references are frequent, such as operations using a stack. That is, when a memory reference is about equally likely to target an address ahead of (in the direction of increasing memory address) or behind (in the direction of decreasing memory address) the target address of the immediately preceding memory reference, either adjacent to it or a few words away, memory throughput can be maximized by reading from the memory device 400, in one consecutive access, the maximum number of words that can be accessed consecutively and held in the buffer.
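The trade-off between the two fill policies can be made concrete with a small trace simulation. The function name `simulate`, the word-addressed traces, and the way addresses are split into block and offset are assumptions for illustration; the sketch only counts buffer misses and words read and does not model timing.

```python
BLOCK_WORDS = 4  # assumption: four words per memory block


def simulate(trace, whole_block):
    """Count (misses, words_read) for a list of word addresses.

    whole_block=False models the first embodiment (read from the missed offset
    to the end of the block); whole_block=True models the second embodiment.
    """
    tag, valid = None, set()
    misses = words_read = 0
    for addr in trace:
        large, small = divmod(addr, BLOCK_WORDS)
        if tag == large and small in valid:
            continue                          # buffer hit
        misses += 1
        tag = large
        start = 0 if whole_block else small   # first offset fetched on this miss
        valid = set(range(start, BLOCK_WORDS))
        words_read += BLOCK_WORDS - start
    return misses, words_read


forward_trace = [2, 3, 4, 5, 6, 7]   # array-like forward scan
stack_trace = [6, 5, 4, 5, 6]        # stack-like backward then forward references
```

On the forward scan both policies miss twice, but the first embodiment reads 6 words against 8 for the second. On the stack-like trace the first embodiment misses three times and reads 9 words, while the second misses once and reads 4 words.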
[0102]
[Effect of the invention]
As described above, the memory buffer device of the first invention consists of a load data buffer unit having registers for a plurality of words, a store data buffer unit having a first-in first-out buffer for a plurality of words, and their control units. Therefore, unlike a cache memory, no separate memory unit for storing data is required, and the memory buffer device can be built from a small-scale circuit. As a result, an operating speed that sufficiently follows increases in the operating frequency of the CPU can be secured. In addition, the chip area of an LSI containing the CPU and the memory buffer device of the present invention can be made significantly smaller than that of a prior-art LSI containing a CPU and a cache memory, which lowers cost and reduces power consumption.
[0103]
In addition, when a buffer miss occurs during the execution of the load instruction, the memory access operation of a plurality of words is performed from the memory device, so that the access frequency of the CPU to the memory device can be reduced, and the memory throughput can be improved. Therefore, it can contribute to the improvement of the program execution performance.
[0104]
Furthermore, when a buffer miss occurs during execution of a load instruction, the data from the missed memory address to the last address of the same memory block is read and stored in the load buffer unit, which has the following effect.
By consecutively accessing only the data ahead of the memory address at which the miss occurred (in the direction of increasing memory address), the number of memory access cycles can be reduced, for programs with a high frequency of forward references such as array operations, compared with always reading the entire block.
[0105]
The memory buffer device of the second invention, like that of the first invention, consists of a load data buffer unit having registers for a plurality of words, a store data buffer unit having a first-in first-out buffer for a plurality of words, and their control units, and therefore has the same effects as the first invention. Further, when a buffer miss occurs during execution of a load instruction, the data of the entire memory block containing the missed memory address is read and stored in the load buffer unit, which has the following effect. Because the entire memory block containing the missed memory address is always read consecutively, the total number of buffer misses can be reduced for programs in which both forward and backward references are frequent, such as operations using a stack.
[0106]
According to the memory buffer device of the third invention, in response to a data write request from the central processing unit, if the corresponding data exists in any of the registers in the load data buffer unit, this data is updated. Therefore, in such a case, it is not necessary to access the memory device, and therefore the program execution performance can be improved.
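The update described for the third invention can be sketched as follows. The class `LoadDataBuffer`, its `fill` helper, and the method name `on_store` are hypothetical names introduced for illustration; the sketch assumes that a word counts as present when the tag matches and its valid flag is set.

```python
BLOCK_WORDS = 4  # assumption: four one-word registers, as in the embodiments


class LoadDataBuffer:
    """Behavioral sketch of the load data buffer seen from the store path."""

    def __init__(self):
        self.tag = None                       # block address held in the tag register
        self.valid = [False] * BLOCK_WORDS    # load V flags
        self.regs = [0] * BLOCK_WORDS

    def fill(self, large, words):
        """Hypothetical helper: load a whole block, as after a buffer miss."""
        self.tag = large
        self.regs = list(words)
        self.valid = [True] * BLOCK_WORDS

    def on_store(self, large, small, data):
        """Update the buffered copy if the stored word is present; report whether it was."""
        if self.tag == large and self.valid[small]:
            self.regs[small] = data
            return True
        return False
```

In this sketch, a later load of the same word finds the updated value in the register, so it can be answered from the buffer rather than from the memory device.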
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing first and second embodiments of a memory buffer device of the present invention.
FIG. 2 is a configuration diagram of a load buffer unit of the memory buffer device of the present invention.
FIG. 3 is a configuration diagram of a load buffer control unit according to the first embodiment of the memory buffer device of the present invention;
FIG. 4 is a configuration diagram of a load data buffer valid flag unit in the first embodiment of the memory buffer device of the present invention;
FIG. 5 is a configuration diagram of an LB0 to 3 set signal generation unit in the first embodiment of the memory buffer device of the present invention;
FIG. 6 is a configuration diagram of a store buffer unit in the first embodiment of the memory buffer device of the present invention;
FIG. 7 is a configuration diagram of a store buffer control unit in the first embodiment of the memory buffer device of the present invention;
FIG. 8 is a configuration diagram of a store buffer signal generation unit in the first embodiment of the memory buffer device of the present invention;
FIG. 9 is an explanatory diagram of an operation (when loading) in the first embodiment of the memory buffer device of the present invention;
FIG. 10 is an explanatory diagram (part 1) of an operation (during storage) in the first embodiment of the memory buffer device of the present invention;
FIG. 11 is an explanatory diagram (part 2) of the operation (during storage) in the first embodiment of the memory buffer device of the present invention;
FIG. 12 is a time chart (buffer hit) when a load instruction is executed in the memory buffer device of the present invention;
FIG. 13 is a time chart (buffer miss) when a load instruction is executed in the memory buffer device of the present invention.
FIG. 14 is a time chart (buffer hit) when a store instruction is executed in the memory buffer device of the present invention.
FIG. 15 is a time chart (buffer miss) when a store instruction is executed in the memory buffer device of the present invention;
FIG. 16 is a configuration diagram of a load buffer control unit in the second embodiment of the memory buffer device of the present invention;
FIG. 17 is an explanatory diagram of an operation (when loading) in the second embodiment of the memory buffer device of the present invention;
FIG. 18 is an explanatory diagram (part 1) of an operation (during storage) in the second embodiment of the memory buffer device of the present invention;
FIG. 19 is an explanatory diagram (part 2) of the operation (during storage) in the second embodiment of the memory buffer device of the present invention;
[Explanation of symbols]
100 Load buffer section
101 Load data buffer
102 Load buffer control unit
200 Store buffer
201 Store data buffer
202 Store buffer control unit

Claims

A load buffer unit for reading data from the memory device, and a store buffer unit for writing to the memory device,
The load buffer unit
A load data buffer unit having a plurality of independently read / write registers for storing data for one memory block;
In response to a data read request from the central processing unit to the memory device, if the corresponding data exists in any of the registers, return it to the central processing unit,
If no corresponding data exists in any register, data from the memory device to the last address of the same memory block is read continuously from the memory address of the data of the read request, and these are sequentially stored in the register. The load buffer control unit
The store buffer unit
A store data buffer unit having a first-in first-out buffer for storing a plurality of words as data for one memory block;
In response to a write request from the central processing unit to the memory device, if the corresponding data exists in the first-in first-out buffer, it is updated, and the storage position is the last, and the same memory block of the corresponding data is stored. If data exists, store the data at the end,
When the data in the first-in first-out buffer is a memory block different from the memory block of the corresponding data, all the data stored in the first-in first-out buffer is written to the memory device and the corresponding data is stored in the first-in first-out buffer. A memory buffer device comprising: a store buffer control unit storing at a head position.

A load buffer unit for reading data from the memory device, and a store buffer unit for writing to the memory device,
The load buffer unit
A load data buffer unit having a plurality of independently read / write registers for storing data for one memory block;
In response to a data read request from the central processing unit to the memory device, if the corresponding data exists in any of the registers, return it to the central processing unit,
When there is no corresponding data in any register, the load buffer control unit reads out the entire memory block data including the memory address of the data of the read request from the memory device, and stores them sequentially in the register And consist of
The store buffer unit
A store data buffer unit having a first-in first-out buffer for storing a plurality of words as data for one memory block;
In response to a write request from the central processing unit to the memory device, if the corresponding data exists in the first-in first-out buffer, it is updated, and the storage position is the last, and the same memory block of the corresponding data is stored. If data exists, store the data at the end,
When the data in the first-in first-out buffer is a memory block different from the memory block of the corresponding data, all the data stored in the first-in first-out buffer is written to the memory device and the corresponding data is stored in the first-in first-out buffer. A memory buffer device comprising: a store buffer control unit storing at a head position.

The memory buffer device according to claim 1 or 2,
In response to a data write request from the central processing unit to the memory device, when the corresponding data exists in any of the registers in the load data buffer unit, a load buffer control unit is provided for updating the data. A memory buffer device.