JP3952856B2

JP3952856B2 - Caching method

Info

Publication number: JP3952856B2
Application number: JP2002153586A
Authority: JP
Inventors: 仁小柳; 邦夫中野; 直人山本; 友則駒坂
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2002-05-28
Filing date: 2002-05-28
Publication date: 2007-08-01
Anticipated expiration: 2022-05-28
Also published as: JP2003347930A

Description

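[0001]
[Technical Field of the Invention]
The present invention relates to a caching method for a computer system that includes a programmable logic circuit.
[0002]
[Prior Art]
There are broadly two approaches to building a high-speed computer system. The first, shown in FIG. 8, is the multiprocessor approach, in which a plurality of CPUs 2 are provided and operated in parallel. The second, shown in FIG. 9, achieves higher speed by assisting the operation of the CPU 2.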
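[0003]
Typical examples of multiprocessor systems include parallel supercomputers and high-performance servers. More recently, multiprocessor systems that raise performance by specializing in particular applications have also been proposed, such as systems equipped with a JAVA (R) processor or with a DSP (Digital Signal Processor).
[0004]
Configurations that assist the operation of the CPU 2 include those equipped with a coprocessor and those equipped with an ASIC (Application Specific Integrated Circuit).
[0005]
A typical coprocessor is the FPU (Floating-point Processing Unit), which performs floating-point arithmetic at high speed. An FPU provides a speed-up when the CPU cannot perform floating-point operations in hardware and would otherwise have to carry them out in software. Another example is the vector coprocessor, which contains hardware for executing regular operations such as matrix arithmetic at high speed. With such a coprocessor, workloads that contain large numbers of regular operations, such as scientific and engineering computations, can be processed faster than by the CPU alone.
[0006]
An ASIC, on the other hand, enables high-speed operation by implementing some or all of the application-specific functions in hardware; the gate array is a well-known example. In embedded systems, mounting an ASIC makes it possible to build a faster and smaller system. However, because an ASIC has functions specialized for one application, it has the drawback that it cannot be adapted to a variety of applications.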
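[0007]
To overcome this drawback, devices with rewritable regions, such as FPGAs (Field Programmable Gate Arrays) and PLDs (Programmable Logic Devices), have been developed in recent years (see, for example, U.S. Pat. No. 4,700,187). These devices are hereinafter collectively called programmable logic circuits. As shown in FIG. 10, such a programmable logic circuit is an array of basic cells 8, each consisting of a LUT (LookUp Table) 8a and a flip-flop 8b, and its internal hardware logic can be changed by rewriting the LUTs 8a. Because the hardware logic can be rewritten to suit the application, these devices are used as control devices in special-purpose equipment and in products with short life cycles.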
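The minimal Python sketch below illustrates the mechanism just described. It models a basic cell 8 as a 2-input LUT 8a (a four-entry truth table) feeding a flip-flop 8b. The class name `BasicCell` and the 2-input LUT size are illustrative assumptions, not details from this patent. The point is only that rewriting the table changes the cell's logic function without any change in wiring.

```python
class BasicCell:
    """Hypothetical model of basic cell 8: a 2-input LUT (8a) feeding a D flip-flop (8b)."""

    def __init__(self, truth_table):
        self.lut = list(truth_table)  # lut[2*a + b] is the output for inputs (a, b)
        self.q = 0                    # flip-flop state

    def rewrite(self, truth_table):
        # Reconfiguration: replace the LUT contents so the cell computes a new function.
        self.lut = list(truth_table)

    def combinational(self, a, b):
        # Look up the output for inputs (a, b) in the truth table.
        return self.lut[2 * a + b]

    def clock(self, a, b):
        # On a clock edge the flip-flop captures the LUT output.
        self.q = self.combinational(a, b)
        return self.q


AND = [0, 0, 0, 1]
XOR = [0, 1, 1, 0]

cell = BasicCell(AND)
print([cell.combinational(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 0, 0, 1]
cell.rewrite(XOR)
print([cell.combinational(a, b) for a in (0, 1) for b in (0, 1)])  # [0, 1, 1, 0]
```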
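[0008]
[Problems to Be Solved by the Invention]
In general, a computer system can be made faster by using more hardware. In the multiprocessor system of FIG. 8, for example, overall performance improves as the number of CPUs 2 increases. Likewise, in a system equipped with a coprocessor or ASIC as in FIG. 9, overall performance improves if hardware is configured to take over more of the work of the CPU 2. These approaches, however, increase the number of components in the system, making it more expensive and larger. Moreover, because the functions provided by the hardware in these systems are limited, they cannot offer functions that serve a wide range of applications.
[0009]
With the programmable logic circuit of FIG. 10, by contrast, circuit information for the processes an application needs can be stored in memory in advance. It is then read from memory and written into the rewritable region as needed, generating whichever circuit is required at that moment. A programmable logic circuit of small circuit scale can thus realize circuits larger than itself, which makes the computer system smaller and cheaper. However, a programmable logic circuit with a single rewritable region is dedicated to one application at any given moment, which limits how much faster it can be than a computer that runs multiple tasks.
[0010]
To achieve multitasking, several programmable logic circuits could be arranged side by side and driven independently. As with a multiprocessor system, however, this makes the system more expensive and larger, and the wiring between the programmable logic circuits becomes complex, which in the end slows processing.
[0011]
The present invention has been made in view of the above problems. Its main object is to provide a caching method that makes effective use of limited hardware to process a variety of applications, including multitasking workloads, at high speed.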
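[0012]
[Means for Solving the Problems]
To achieve the above object, the present invention provides a caching method that operates as follows. The rewritable logic section of a programmable logic circuit is divided into a plurality of slots of substantially equal data size, and slot numbers are assigned to the slots. Hardware logic, which software references in the form of data or files, is written sequentially into the regions of the logic section corresponding to those slot numbers. When more pieces of hardware logic are to be inserted than there are slots, the method refers to history information on the hardware logic inserted in each slot, selects a slot that is unlikely to be used, and evicts the selected hardware logic.
[0013]
In the present invention, the hardware logic may be compared with the capacity of a slot or with the address space allocated to the slot. When the hardware logic is larger than that capacity or address space, the hardware logic may be written across a plurality of slots.
[0014]
In the present invention, control means provided at the input/output stages of the slots may also arbitrate between the hardware logic written in a slot and external hardware or software.
[0015]
In the present invention, the interrupt signals of the hardware logic may also be assigned to interrupt ports provided on the slots. Interrupt control means provided at the output stage of the slots then notifies the processor of interrupts in the order of priority that the interrupt control means sets.
[0016]
In the present invention, isolation means provided in the slots and in the control means may also logically isolate a slot into which no hardware logic has been written, or a slot that is being rewritten.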
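[0024]
As described above, in the configuration of the present invention, software-managed hardware logic is written as needed into a plurality of slots of substantially equal data size, under the control of the arbitration, interrupt, and isolation means. The processing an application requires can therefore be performed at high speed in hardware.
[0025]
In addition, new hardware logic is compared with the hardware logic already written, with reference to history information such as the registration and usage information of the hardware logic in the slots. Lower-priority hardware logic is then evicted in turn, so hardware resources are used efficiently and processing delays caused by a shortage of slots are prevented.
[0026]
[Embodiments of the Invention]
In a preferred embodiment, the programmable logic circuit according to the present invention has its rewritable region divided into a plurality of substantially equal, individually accessible slots. At the input/output stages of the slots, it has a slot input control unit and a slot output control unit, which arbitrate between the hardware logic written in the slots and external hardware or software. It also has an interrupt control unit that handles interrupts from the hardware logic, and a status indicating the state of each slot. Software-managed hardware logic is written into each slot while these control units arbitrate. Processing needed during software execution can thus be performed at high speed in hardware, and the result is a general-purpose computer system that handles a variety of applications without enlarging the system as a whole.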
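[0027]
[Example]
To describe the above embodiment in more detail, a programmable logic circuit, a computer system, and a caching method according to one example of the present invention are described with reference to FIGS. 1 to 7. FIG. 1 shows the basic configuration of a computer system that includes the programmable logic circuit. FIG. 2 shows the input/output interface configuration of each slot. FIG. 3 shows how hardware logic is managed, and FIG. 4 shows an example of dividing the address space. FIG. 5 illustrates the replacement of hardware logic, and FIGS. 6 and 7 are flowcharts of the procedure for rewriting hardware logic.
[0028]
First, the basic concept of the present invention is described.
[0029]
Caching is one means of speeding up software. A cache gains speed by placing software programs and data where they can be accessed quickly, for example in cache memory. The basic idea of the present invention is to cache hardware functions rather than programs and data.
[0030]
The configuration for this is described with reference to FIG. 1. In the computer system according to one example of the present invention, a programmable logic circuit 1 including a rewritable region, a processor 2 such as a CPU, and a memory 3 are connected by a bus 4 such as PCI. The programmable logic circuit 1 comprises:
・a rewritable region made up of a plurality of slots 10, divided to a predetermined data size (data width) and number;
・a slot input control unit 11 and a slot output control unit 12, which arbitrate between the hardware logic written in each slot 10 and external hardware or software;
・an interrupt control unit 13, which controls interrupts from the hardware logic;
・a status 14, which indicates the state of each slot 10.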
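[0031]
The configuration in the figure is the basic configuration of the computer system of the present invention. A plurality of processors 2 or memories 3 may be connected, and circuits that assist the CPU, such as a coprocessor, may also be connected. The figure shows the programmable logic circuit 1, the CPU 2, and the memory 3 as separate ICs (LSIs), but in an actual configuration they may be contained in a single system LSI. For convenience, the programmable logic circuit 1 is described below separately from the CPU 2 and the memory 3. The term programmable logic circuit 1 nevertheless covers both a structure in which the rewritable region is provided with the input/output control units and the interrupt control unit, and an entire computer system that further includes the processor 2, the memory 3, and so on.
[0032]
A feature of the present invention is how it achieves high speed. Rather than providing multiple programmable logic circuits 1, it divides the rewritable region of a single programmable logic circuit 1 into units called slots 10. Each slot 10 is an independent region that operates through interface signals such as those shown in FIG. 2, and any hardware function can be defined in each slot 10.
[0033]
The interface signals comprise input-side signals (Address, Data_in, Control_in), received through the slot input control unit 11, and output-side signals (Data_out, Control_out, Interrupt), sent to the slot output control unit 12 or the interrupt control unit 13. Their roles are as follows.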
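[0034]
・Address: one or more bits, with the same bit count for every slot. Through the Address signal, the CPU 2 or software can access the logic embedded in the slot 10.
・Data_in: input data to the logic inside the slot 10, with a bit width common to all slots, for example 32 bits.
・Control_in: one or more bits; a control signal for controlling the slot. For example, it performs reads and writes to the logic inside the slot in synchronization with the Address signal.
・Data_out: the path that outputs computation results and the like to outside the slot 10, with a common bit width, for example 32 bits.
・Control_out: an output signal by which the logic inside the slot 10 notifies external logic of requests and the like.
・(Interrupt): an optional interrupt signal that notifies the CPU 2 or software of events such as the completion of a computation.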
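The sketch below gives a rough picture of how software sees a slot only through the FIG. 2 signals. It models the logic in one slot as a small register file that is driven by Address, Data_in, and Control_in and answers on Data_out. The 32-bit width follows the example in the text. The Control_in bit encodings (`CTRL_WRITE`, `CTRL_READ`) and the register-file behavior are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

DATA_WIDTH = 32                        # common Data_in/Data_out width (example value from the text)
DATA_MASK = (1 << DATA_WIDTH) - 1

# Hypothetical Control_in encodings; the text only says Control_in drives reads and writes.
CTRL_WRITE = 0b01
CTRL_READ = 0b10


@dataclass
class SlotOutputs:
    data_out: int = 0        # result path to outside the slot
    control_out: int = 0     # requests from the slot logic to external logic
    interrupt: bool = False  # optional completion notice to the CPU


@dataclass
class SlotLogic:
    """Logic in one slot, reachable only through the FIG. 2 interface signals."""
    registers: Dict[int, int] = field(default_factory=dict)

    def cycle(self, address: int, data_in: int, control_in: int) -> SlotOutputs:
        out = SlotOutputs()
        if control_in & CTRL_WRITE:
            # Write Data_in, truncated to the slot width, at Address.
            self.registers[address] = data_in & DATA_MASK
        if control_in & CTRL_READ:
            # Drive the value stored at Address onto Data_out.
            out.data_out = self.registers.get(address, 0)
        return out


slot = SlotLogic()
slot.cycle(address=0x4, data_in=0x1_0000_0005, control_in=CTRL_WRITE)
print(hex(slot.cycle(address=0x4, data_in=0, control_in=CTRL_READ).data_out))  # 0x5
```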
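[0035]
The figure shows an example with three slots 10, but the number of divisions and the data size (data width) of each slot 10 are arbitrary. They can be set by taking into account the data width of the hardware logic to be written into the slots 10, the number of logic functions to be processed in multitasking, and the scale and performance of the overall computer system. If the slots 10 have different data widths, however, the slot 10 to write into must be chosen according to the data width of the hardware logic, which can make rewriting the hardware logic cumbersome. To make the programmable logic circuit 1 of the present invention a highly versatile system, the slots 10 are therefore preferably divided into substantially equal data sizes.
[0036]
Next, hardware logic is written into each of the divided slots 10. As shown in FIG. 3, this hardware logic is kept under the management of software 5 in the form of files or data, so that it can be defined in, or replaced into, the slots 10 while the software 5 is running.
[0037]
Because each slot 10 has an address and a data width of a fixed number of bits, some hardware logic 6 may need more address space or a wider data path than one slot provides. In that case, multiple slots 10 can be used. For example, if the slots have a 32-bit data width and 64-bit hardware logic 6 is to be inserted, two slots 10 can be used.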
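With equal slot widths, as recommended above, the multi-slot rule reduces to a ceiling division. The sketch below computes how many slots a piece of hardware logic 6 needs from its data width alone. `slots_needed` is a hypothetical helper, and a fuller version would also compare the address space the logic requires.

```python
def slots_needed(logic_width_bits: int, slot_width_bits: int = 32) -> int:
    """Number of equal-width slots a piece of hardware logic must span (ceiling division)."""
    if logic_width_bits <= 0 or slot_width_bits <= 0:
        raise ValueError("widths must be positive")
    return (logic_width_bits + slot_width_bits - 1) // slot_width_bits


print(slots_needed(64))  # 2, the 64-bit example with 32-bit slots
print(slots_needed(32))  # 1
print(slots_needed(48))  # 2: a partly used slot is still occupied as a whole
```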
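[0038]
The data size (data width) of the hardware logic 6 is likewise arbitrary. If the data width is too large, however, the logic occupies many slots 10 at once, which may interfere with the processing of other hardware logic 6. In addition, if the data sizes of individual pieces of hardware logic 6 differ too much, some of them may not be interchangeable with others. Each piece of hardware logic 6 is therefore preferably designed with the capacity of a slot 10 in mind.
[0039]
The hardware logic 6 can operate independently, so it must be able to access the memory 3 and processors such as the CPU 2 on its own. To let the processor 2 access each slot individually, the address space is divided as shown in FIG. 4, for example, so that each slot can be controlled individually through its address (Slot1 offset, etc.).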
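The following minimal sketch shows one way to split the address space as in FIG. 4. It assumes that each slot receives an equal, contiguous window above a common base. `SLOT_BASE`, `SLOT_WINDOW`, and the three-slot count are hypothetical values chosen for illustration and do not reproduce FIG. 4 itself.

```python
SLOT_BASE = 0x4000_0000   # hypothetical base address of the slot region on bus 4
SLOT_WINDOW = 0x1000      # hypothetical size of the address window given to each slot
NUM_SLOTS = 3


def slot_offset(slot: int) -> int:
    """Start address of a slot's window (a 'SlotN offset'); slots are numbered from 1."""
    if not 1 <= slot <= NUM_SLOTS:
        raise ValueError(f"no such slot: {slot}")
    return SLOT_BASE + (slot - 1) * SLOT_WINDOW


def decode(address: int) -> tuple:
    """Map a bus address to (slot number, local Address inside that slot)."""
    rel = address - SLOT_BASE
    if not 0 <= rel < NUM_SLOTS * SLOT_WINDOW:
        raise ValueError(f"address {address:#x} is outside the slot region")
    return rel // SLOT_WINDOW + 1, rel % SLOT_WINDOW


print(hex(slot_offset(2)))     # 0x40001000
print(decode(0x4000_2010))     # (3, 16)
```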
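[0040]
In the structure of the present invention, each piece of hardware logic 6 operates individually, so arbitration with external hardware and software is required. Arbitration logic such as the slot input control unit 11 and the slot output control unit 12 is therefore provided in the programmable logic circuit 1 to coordinate with external hardware, software, and the like. The programmable logic circuit 1 also has an interrupt control unit 13, which accepts interrupt requests from each slot 10 and signals interrupts to the CPU 2, so that the completion of a requested process, a change in internal status, and similar events can be reported. The slot input control unit 11, the slot output control unit 12, and the interrupt control unit 13 may be formed programmably in the rewritable region or as fixed logic outside it.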
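The interrupt control unit 13 can be pictured as a priority selector over pending slot interrupts, in line with the priority-based notification described in [0015]. The sketch below assumes that a lower number means higher priority and that ties are broken by slot number. Both are assumptions, because the patent does not specify how priorities are encoded.

```python
class InterruptController:
    """Hypothetical model of interrupt control unit 13.

    Collects per-slot interrupt requests and forwards them to the CPU one at a time,
    highest priority first (lower number = higher priority, an assumed convention).
    """

    def __init__(self, priorities: dict):
        self.priorities = dict(priorities)  # slot number -> priority level
        self.pending = set()

    def request(self, slot: int) -> None:
        # Interrupt signal raised by the hardware logic in `slot`.
        self.pending.add(slot)

    def next_for_cpu(self):
        """Return the highest-priority pending slot and clear it, or None if none is pending."""
        if not self.pending:
            return None
        slot = min(self.pending, key=lambda s: (self.priorities.get(s, float("inf")), s))
        self.pending.remove(slot)
        return slot


ic = InterruptController({1: 2, 2: 0, 3: 1})
for s in (1, 3, 2):
    ic.request(s)
print([ic.next_for_cpu() for _ in range(4)])  # [2, 3, 1, None]
```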
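[0041]
Each slot 10 can be rewritten individually, but the other slots must keep operating while one slot is being rewritten. For this reason, each slot and its arbitration logic (the slot input control unit 11 and the slot output control unit 12) are equipped with logic (not shown) that logically isolates the slot 10 being rewritten.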
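The isolation logic can be sketched as a gate in the arbitration path. While a slot is empty or being rewritten, accesses to it are blocked, and the other slots keep operating. `SlotArbiter` and its methods are hypothetical names, and slot logic is modeled as a Python function.

```python
class SlotArbiter:
    """Hypothetical model of the isolation logic in the slot input/output control units."""

    def __init__(self, num_slots: int):
        self.isolated = {s: True for s in range(1, num_slots + 1)}  # empty slots start isolated
        self.logic = {}  # slot -> function standing in for the slot's hardware logic

    def begin_rewrite(self, slot: int) -> None:
        self.isolated[slot] = True    # cut the slot off before reconfiguring it
        self.logic.pop(slot, None)

    def finish_rewrite(self, slot: int, logic) -> None:
        self.logic[slot] = logic
        self.isolated[slot] = False   # reconnect once the new logic is in place

    def access(self, slot: int, data_in: int):
        """Forward data to a slot; an isolated slot is never driven and yields None."""
        if self.isolated[slot]:
            return None
        return self.logic[slot](data_in)


arb = SlotArbiter(3)
arb.finish_rewrite(1, lambda x: x + 1)
arb.finish_rewrite(2, lambda x: x * 2)
arb.begin_rewrite(2)          # slot 2 is being reconfigured
print(arb.access(1, 10))      # 11: slot 1 keeps running
print(arb.access(2, 10))      # None: slot 2 is isolated
```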
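[0042]
Furthermore, because the programmable logic circuit 1 of the present invention is a hardware cache, hardware logic 6 that is no longer needed must be evicted. The present invention therefore records a history of which slot 10 each piece of hardware logic 6 was placed in, when it was used, and so on. The hardware logic to evict can then be selected from this history information.
[0043]
Specifically, as shown in FIG. 5, suppose that the hardware logic 6 for a music player is written in slot (1), a paint tool in slot (2), and game 1 in slot (3), and that each piece of hardware logic 6 is in use during the shaded periods in the figure. The music player logic in slot (1) was used once and has remained unused since, so evicting it from slot (1) should cause no problem. When new hardware logic 6 is requested (at the point marked by the arrow in the figure), the music player is therefore evicted and game 2 is loaded in its place.
[0044]
In this way, hardware logic that is unlikely to be used is evicted as appropriate. The decision takes into account history information on the hardware logic 6 written in each slot 10, such as its write time, usage frequency, and idle period, together with the number of remaining slots 10 and the number of logic functions scheduled for processing. This keeps slots 10 free, so that writing the next hardware logic 6 does not have to wait, and processing delays are prevented.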
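The patent names the inputs to the eviction decision (write time, usage frequency, idle period) but gives no formula. The sketch below assumes one plausible ordering: longest idle period first, then lowest use count, then oldest write time. This behaves much like least-recently-used (LRU) replacement. The sketch replays the FIG. 5 scenario with hypothetical timestamps.

```python
from dataclasses import dataclass


@dataclass
class SlotHistory:
    logic: str
    written_at: int   # time the hardware logic was written into the slot
    last_used: int    # time of the most recent use
    use_count: int    # usage frequency


def choose_victim(history: dict, now: int) -> int:
    """Pick the slot whose logic seems least likely to be used again.

    Assumed policy: longest idle period first, then lowest use count, then oldest write.
    """
    def key(slot):
        h = history[slot]
        return (-(now - h.last_used), h.use_count, h.written_at)
    return min(history, key=key)


# FIG. 5 scenario with hypothetical times: the music player was used once, early on.
history = {
    1: SlotHistory("music player", written_at=0, last_used=1, use_count=1),
    2: SlotHistory("paint tool", written_at=2, last_used=9, use_count=4),
    3: SlotHistory("game 1", written_at=3, last_used=10, use_count=6),
}
victim = choose_victim(history, now=11)
print(victim, history[victim].logic)  # 1 music player
history[victim] = SlotHistory("game 2", written_at=11, last_used=11, use_count=1)
```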
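[0045]
As described above, to cache hardware functions, the rewritable region is divided into slots 10 of general-purpose data size and number. The slot input control unit 11, the slot output control unit 12, the interrupt control unit 13, and the isolation logic are provided to arbitrate between the hardware logic 6 written in each slot 10 and external hardware or software. Software-managed hardware logic 6 can thus be written into the slots 10 and operated as needed. High-speed processing is achieved without enlarging the overall system, as providing multiple programmable logic circuits would.
[0046]
In addition, the usage of the hardware logic 6 written in each slot 10 is monitored, and hardware logic is evicted from the slots 10 with reference to the history information. This prevents processing from stalling for lack of a slot 10 into which the next hardware logic 6 can be written, and speeds up processing in the system as a whole.
[0047]
Next, the procedure for writing hardware logic 6 into the programmable logic circuit 1 configured as above is described concretely.
[0048]
As described above, the programmable logic circuit 1 of the present invention caches hardware. When the application side issues a request (for example, a function call), the hardware logic 6 corresponding to the request is written into a free slot 10, the processing is performed in hardware, and the result is returned to the application. A means of mediating between the application and the programmable logic circuit 1 is therefore needed; here, this control is performed by a hard macro library manager.
[0049]
The hard macro library manager is software that holds, as a library, the hardware logic 6 to be placed in the slots 10. On request from the application side, it inserts the required hardware logic 6 into a slot 10 and runs it. This software controls the rewriting of hardware logic 6 by accessing the CPU 2 and the slot input control unit 11, slot output control unit 12, and status 14 of the programmable logic circuit 1.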
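[0050]
The processing performed by the application, the hard macro library manager, and the programmable logic circuit 1 is described below with reference to FIGS. 6 and 7. First, as shown in FIG. 6, when an application (for example, an image processing program) needs a computation such as an image processing function, it issues a function call to the hard macro library manager. In response, the hard macro library manager refers to the registration information for the hardware logic 6 and checks whether the hardware logic 6 that performs this function is already inserted in a slot 10.
[0051]
If the corresponding hardware logic is not inserted in any slot 10, the hard macro library manager accesses the status 14 of the programmable logic circuit 1, examines the state of each slot 10, and determines whether any slot 10 is free.
[0052]
If no slot 10 were free, the new hardware logic 6 could not be written, the function could not be computed, and processing would stall. In the programmable logic circuit 1 of the present invention, however, unused hardware logic is evicted as appropriate with reference to the history information for the hardware logic written in each slot 10. Free slots 10 can therefore be secured, and stalls are avoided.
[0053]
After confirming that a slot 10 is free, the hard macro library manager retrieves the corresponding hardware logic 6 from the library. It then starts writing the logic into that slot 10 through the slot input control unit 11, using the input interface signals shown in FIG. 2. At this point, the hardware logic 6 is compared with the capacity of the slot 10; if, for example, the data width of the hardware logic 6 exceeds that of the slot 10, the logic is written into multiple slots 10. During writing, the isolation logic in the slot 10 and in the arbitration logic (slot input control unit 11 and slot output control unit 12) logically isolates the slot 10 being written, so that the other slots 10 keep operating undisturbed. When writing to the slot 10 is complete, the slot output control unit 12 sends a write-complete signal to the hard macro library manager.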
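The miss path of [0051] to [0053] can be sketched as follows. `Device` stands in for status 14 and the per-slot isolation state, and `load` is a hypothetical manager routine. The eviction policy is passed in as a function because the text leaves its exact form open. Writing the configuration data is reduced to recording which logic occupies each slot.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

SLOT_WIDTH = 32  # assumed common slot data width in bits


@dataclass
class LogicImage:
    """Hardware logic 6 as held in the library; only name and data width are modeled."""
    name: str
    width: int


@dataclass
class Device:
    """Toy stand-in for status 14 (slot occupancy) and the per-slot isolation state."""
    num_slots: int
    occupant: Dict[int, str] = field(default_factory=dict)  # slot -> logic name; absent = free
    isolated: Set[int] = field(default_factory=set)

    def free_slots(self) -> List[int]:
        return [s for s in range(1, self.num_slots + 1) if s not in self.occupant]


def load(dev: Device, image: LogicImage, evict: Callable[[Device], int]) -> List[int]:
    """Write `image` into free slots, evicting first if necessary; return the slots used.

    `evict(dev)` must return an occupied slot; the whole logic in that slot is removed.
    """
    need = (image.width + SLOT_WIDTH - 1) // SLOT_WIDTH  # wider than a slot -> several slots
    if need > dev.num_slots:
        raise ValueError(f"{image.name} needs {need} slots; only {dev.num_slots} exist")
    while len(dev.free_slots()) < need:
        victim_name = dev.occupant[evict(dev)]
        for s in [s for s, n in dev.occupant.items() if n == victim_name]:
            del dev.occupant[s]                          # free every slot that logic spans
    targets = dev.free_slots()[:need]
    for s in targets:
        dev.isolated.add(s)            # logically isolate the slot being written
        dev.occupant[s] = image.name   # stands in for transferring the configuration data
        dev.isolated.discard(s)        # write complete: reconnect the slot
    return targets                     # the manager would now receive the write-complete signal


def lowest_occupied(d: Device) -> int:
    return min(d.occupant)  # hypothetical stand-in eviction policy, for illustration only


dev = Device(num_slots=3, occupant={1: "music player", 2: "paint tool"})
print(load(dev, LogicImage("filter", 64), lowest_occupied))  # [1, 3]
```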
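[0054]
Next, the hard macro library manager sends a control signal to Control_in of the slot 10 via the slot input control unit 11 to start processing, and sends the data required for the function's computation to Data_in. The programmable logic circuit 1 then performs the computation with the hardware logic 6. When the computation finishes, the circuit sends the result from Data_out to the hard macro library manager, and sends an interrupt signal from Interrupt to the CPU 2 via the interrupt control unit 13 to report completion. The hard macro library manager then passes the result to the application, which completes the sequence.
[0055]
The flow above applies when the hardware logic 6 for the function call has not been inserted into a slot 10. When the hardware logic 6 that computes the function is already in a slot 10, as shown in FIG. 7, the hard macro library manager does not check the status 14 for a free slot 10. Instead, it immediately sends a control signal to Control_in of the slot 10 via the slot input control unit 11 to start processing, and sends the data required for the computation to Data_in.
[0056]
The programmable logic circuit 1 then performs the computation with the hardware logic. When the computation finishes, the circuit sends the result from Data_out to the hard macro library manager, and sends an interrupt signal from Interrupt to the CPU 2 via the interrupt control unit 13 to report completion. The hard macro library manager then passes the result to the application, which completes the sequence.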
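A hard macro library manager that combines the FIG. 6 (miss) and FIG. 7 (hit) paths might look like the sketch below. Everything in it is hypothetical. Hardware logic is modeled as Python functions, and the slot write and the Control_in/Data_in/Interrupt/Data_out handshake are reduced to comments. Eviction uses least-recently-used order as one possible reading of the history-based policy.

```python
class HardMacroLibraryManager:
    """Hypothetical sketch of the manager's call path in FIGS. 6 and 7."""

    def __init__(self, library: dict, num_slots: int):
        self.library = library      # function name -> hardware logic (held as files/data)
        self.num_slots = num_slots
        self.registered = {}        # registration info: function name -> slot
        self.last_used = {}         # history: slot -> time of last use
        self.clock = 0
        self.loads = 0              # number of slot writes performed (for illustration)

    def _load(self, name: str) -> int:
        # FIG. 6: consult the slot status for a free slot.
        used = set(self.registered.values())
        free = [s for s in range(1, self.num_slots + 1) if s not in used]
        if free:
            slot = free[0]
        else:
            # No free slot: evict the least recently used logic (assumed policy).
            slot = min(used, key=lambda s: self.last_used[s])
            victim = next(n for n, s in self.registered.items() if s == slot)
            del self.registered[victim]
        self.registered[name] = slot  # write the logic; isolation during the write is not modeled
        self.loads += 1
        return slot

    def call(self, name: str, *args):
        self.clock += 1
        slot = self.registered.get(name)
        if slot is None:              # miss: the logic is not in any slot yet
            slot = self._load(name)
        # FIG. 7 (hit) joins here: start via Control_in, send operands on Data_in,
        # wait for the Interrupt, then read the result from Data_out.
        self.last_used[slot] = self.clock
        return self.library[name](*args)


mgr = HardMacroLibraryManager(
    {"add": lambda a, b: a + b, "mul": lambda a, b: a * b, "neg": lambda a: -a},
    num_slots=2,
)
print(mgr.call("add", 2, 3))   # 5  (miss: written into slot 1)
print(mgr.call("add", 4, 5))   # 9  (hit: no status check, no rewrite)
print(mgr.call("mul", 3, 3))   # 9  (miss: written into slot 2)
print(mgr.call("neg", 7))      # -7 (miss with no free slot: "add" is evicted from slot 1)
print(mgr.registered)          # {'mul': 2, 'neg': 1}
```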
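[0057]
Thus, in the programmable logic circuit 1 of the present invention, whenever an application issues a function call, the system automatically checks whether the hardware logic 6 that computes the function is inserted in a slot 10 and whether a slot 10 is free. If the logic is not inserted, it is written into a slot 10, the computation is performed in hardware, and the result is returned. Function calls are therefore handled quickly and smoothly.
[0058]
[Effects of the Invention]
As described above, the programmable logic circuit, computer system, and caching method of the present invention provide the following effects.
[0059]
The first effect of the present invention is that processing, including multitasking, can be performed at high speed without enlarging the overall system.
[0060]
This is because the rewritable region is divided into multiple slots and software-managed hardware logic is written into the slots as needed for processing, so the rewritable region is used efficiently. It is also because arbitration means such as the slot input control unit and the slot output control unit, together with the interrupt control unit, arbitrate with external hardware and software, so that their respective processing proceeds smoothly.
[0061]
The second effect of the present invention is that a general-purpose programmable logic circuit can be built.
[0062]
This is because, when the rewritable region is divided into slots, the slot data width and number of slots are set to suit typical hardware logic, and hardware logic larger than a slot can be written across multiple slots. Any hardware logic can therefore be accommodated.
[0063]
The third effect of the present invention is that processing delays are prevented and computation requests are handled promptly.
[0064]
This is because unused hardware logic is evicted with reference to the history information for the hardware logic written in each slot. Slots are not left occupied, and waiting for a write is avoided.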
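[Brief Description of the Drawings]
[FIG. 1] A block diagram showing the basic configuration of a computer system according to one example of the present invention.
[FIG. 2] A diagram showing the interface configuration of a slot in the programmable logic circuit according to one example of the present invention.
[FIG. 3] A diagram showing how the hardware logic written into the slots is managed.
[FIG. 4] A diagram showing an example of dividing the address space among the slots of the programmable logic circuit according to one example of the present invention.
[FIG. 5] A diagram showing the replacement of hardware logic.
[FIG. 6] A flowchart showing the procedure for writing hardware logic into the programmable logic circuit according to one example of the present invention.
[FIG. 7] A flowchart showing the procedure for writing hardware logic into the programmable logic circuit according to one example of the present invention.
[FIG. 8] A diagram showing the basic configuration of a conventional multiprocessor system.
[FIG. 9] A diagram showing the basic configuration of a conventional processor-assist system.
[FIG. 10] A diagram showing the basic configuration of a conventional programmable logic circuit.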
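[Explanation of Reference Numerals]
1 Programmable logic circuit
2 CPU
3 Memory
4 Bus
5 Software
6 Hardware logic
7 Coprocessor or ASIC
8 Unit cell
8a LUT
8b Flip-flop
10 Slot
11 Slot input control unit
12 Slot output control unit
13 Interrupt control unit
14 Status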
BACKGROUND OF THE INVENTION
  The present inventionIn a computer system including a programmable logic circuitIt relates to the cache method.
[0002]
[Prior art]
There are roughly two methods for configuring a high-speed computer system. The first method is a multiprocessor system in which a plurality of CPUs 2 are provided and operate in parallel as shown in FIG. 8, and the second method supports the operation of the CPU 2 as shown in FIG. In this way, high speed is achieved.
[0003]
Typical examples of multiprocessors include parallel supercomputers and high-performance servers. Recently, multiprocessor systems have been proposed that can improve performance by specializing in specific applications, such as a case in which a JAVA (R) processor is installed or a case in which a DSP (Digital Signaling Processor) is installed. Yes.
[0004]
Further, as a configuration for supporting the operation of the CPU 2, there are a configuration in which a coprocessor is mounted, a configuration in which an ASIC (Application Specific Integrated Circuit) is mounted, and the like.
[0005]
A typical example of a coprocessor is an FPU (Floating-point Processing Unit) that performs floating-point arithmetic at high speed. By using this FPU, it is possible to increase the speed when the CPU cannot perform a floating point calculation by hardware and must be processed by software. Another example is a vector coprocessor for performing matrix operations at high speed. This is equipped with hardware for performing regular operations such as matrix operations at high speed. By using such a coprocessor, when a large amount of regular operations are generated by scientific and technological calculations, it is possible to perform processing at a higher speed than processing by the CPU alone.
[0006]
On the other hand, an ASIC enables high-speed operation by configuring a part or all of functions specialized for an application with hardware, and a gate array is known as a typical example. In an embedded system, it is possible to construct a system that is faster and smaller than that of an ASIC. However, since this ASIC has application-specific functions, it cannot be applied to various applications. There are drawbacks.
[0007]
In order to make up for this drawback, in recent years, development of devices (hereinafter collectively referred to as programmable logic circuits) having rewritable areas such as field programmable gate arrays (FPGAs) and programmable logic devices (PLDs) has been performed. (Eg, US Pat. No. 4,700,187). As shown in FIG. 10, this programmable logic circuit is an array of basic cells 8 composed of a LUT (LookUp Table) 8a and a flip-flop 8b, and the internal hardware logic is changed by rewriting the LUT 8a. I can do it. Therefore, since the hardware logic can be rewritten according to the application, it is used as a control device for a special purpose or a device with a short cycle.
[0008]
[Problems to be solved by the invention]
In general, a large amount of hardware may be used to increase the speed of a computer system. For example, in the multiprocessor system of FIG. 8, the overall performance improves as the number of CPUs 2 increases. Further, in the system in which the coprocessor and ASIC shown in FIG. 9 are mounted, the overall performance can be improved by configuring hardware that can take over the work of more CPUs 2. However, such a method increases the number of parts that constitute the system, leading to an increase in the price and scale of the system. Moreover, in these systems, since the functions provided by the hardware are limited, it is not possible to provide functions that can handle various applications.
[0009]
On the other hand, in the programmable logic circuit shown in FIG. 10, circuit information of a plurality of processes necessary for an application is stored in a memory in advance, and is read from the memory and written in a rewritable area as needed. It is possible to generate the necessary circuit. Therefore, in this method, it is possible to realize a circuit larger than the circuit scale by using a programmable logic circuit having a small circuit scale, and it is possible to realize a reduction in size and cost of a computer system. However, a programmable logic circuit composed of a single rewritable area is specialized for one application when viewed in a certain time, and there is a limit in speeding up as compared with a computer operating in multitasking.
[0010]
Therefore, in order to realize multitasking, it is possible to have a configuration in which a plurality of programmable logic circuits are arranged in parallel and driven independently of each other. Increase in size and scale, and the wiring connecting programmable logic circuits becomes complicated, resulting in a slow processing.
[0011]
  The present invention has been made in view of the above problems, and its main purpose is to provide a cache method capable of processing various applications including multitasking at high speed by utilizing limited hardware. There is to do.
[0012]
[Means for Solving the Problems]
  To achieve the above object, the present inventionDivides the rewritable logic part of the programmable logic circuit into a plurality of slots having substantially the same data size, assigns a slot number to the slot, and assigns data to the area of the logic part corresponding to the slot number. Or a cache method for sequentially writing hardware logic referred to by software in the form of a file, wherein the hardware logic inserted in each slot is inserted when more hardware logic than the number of slots is inserted. Referring to the history information, select a slot that is unlikely to be used, and evict the selected hardware logicIs.
[0013]
  In the present invention,The hardware logic is compared with the capacity of the slot or the address space assigned to the slot, and when the hardware logic is larger than the capacity of the slot or the address space assigned to the slot, a plurality of The hardware logic may be written to the slot.
[0014]
  In the present invention,The control means provided at the input / output stage of the slot arbitrates between the hardware logic written in the slot and external hardware or software.It can be configured.
[0015]
  In the present invention,The hardware logic interrupt signal is assigned to an interrupt port provided in the slot, and the interrupt control means provided in the output stage of the slot notifies the processor of the interrupt according to the priority set by the interrupt control means.It can also be configured.
[0016]
  In the present invention,The slot in which hardware logic is not written or the slot in which rewriting is performed is logically separated by the detaching means provided in the slot and the control means.It can also be configured.
[0024]
As described above, according to the configuration of the present invention, the hardware logic placed under the management of software in the plurality of slots divided into substantially equal data sizes is controlled by the arbitrating means, the interrupting means, and the disconnecting means. Since data is written as necessary, processing required by the application can be processed at high speed on hardware.
[0025]
In addition, referring to historical information such as hardware logic registration information and usage information written in the slot, the new hardware logic and the already written hardware logic are compared, and the hardware logic with a low priority is compared. Are sequentially expelled, hardware resources can be used effectively, and processing delays due to lack of slots can be prevented.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
In a preferred embodiment of the programmable logic circuit according to the present invention, the rewritable area is substantially equally divided into a plurality of individually accessible slots, and the hardware logic to be written to the slot and the external logic are written in the input / output stage of the slot. Slot input control unit and slot output control unit that perform arbitration with hardware and software, an interrupt control unit that enables interrupt control of hardware logic, and a status indicating the status of each slot, By writing hardware logic managed by software in each slot while arbitrating in this control unit, processing necessary during software execution can be processed at high speed on the hardware, For various applications without increasing the scale It is possible to provide a general-purpose computer system that can respond.
[0027]
【Example】
In order to describe the above-described embodiment of the present invention in more detail, a programmable logic circuit, a computer system, and a cache method according to an embodiment of the present invention will be described with reference to FIGS. FIG. 1 is a diagram showing a basic configuration of a computer system including a programmable logic circuit, and FIG. 2 is a diagram showing an input / output interface configuration of each slot. FIG. 3 is a diagram showing a management form of hardware logic, and FIG. 4 is a diagram showing an example of dividing an address space. FIG. 5 is a diagram showing how hardware logic is replaced, and FIGS. 6 and 7 are flowcharts showing the procedure of hardware logic rewrite processing.
[0028]
First, the basic concept of the present invention will be described.
[0029]
There is a cache as a means of speeding up software. The cache is a technique for increasing the speed by placing software programs and data in a place (for example, a cache memory) where the software programs and data can be accessed at high speed. The basic idea of the present invention is to propose a method for caching hardware functions rather than programs and data.
[0030]
A configuration for this will be described with reference to FIG. In a computer system according to an embodiment of the present invention, a programmable logic circuit 1 including a rewritable area, a processor 2 such as a CPU, and a memory 3 are connected by a bus 4 such as PCI. A rewritable area composed of a plurality of slots 10 divided into a predetermined data size (data width) and quantity, and slot input for arbitrating between the hardware logic written in each slot 10 and external hardware or software A control unit 11 and a slot output control unit 12, an interrupt control unit 13 that performs hardware logic interrupt control, and a status 14 that indicates the status of each slot 10 are provided.
[0031]
The configuration shown in the figure is a basic configuration of the computer system of the present invention, and a plurality of processors 2 and memories 3 may be connected, or a circuit that supports the operation of a CPU such as a coprocessor may be connected. Good. In the figure, the programmable logic circuit 1, the CPU 2, and the memory 3 are described as separate ICs (LSIs). However, in an actual configuration, these may be included in one system LSI. In the following description, the programmable logic circuit 1 will be described as being separated from the CPU 2 and the memory 3 for convenience, but the programmable logic circuit 1 has a structure including an input / output control unit and an interrupt control unit in a rewritable area, and further the processor 2. Any structure of the entire computer system including the memory 3 and the like is also included.
[0032]
First, in the present invention, as a method for realizing high speed, a plurality of programmable logic circuits 1 are not provided, but a rewritable area of one programmable logic circuit 1 is divided into units of a plurality of slots 10. Yes. The slot 10 is an independent area that functions in accordance with an interface signal as shown in FIG. 2, and each slot 10 can freely define a hardware function.
[0033]
The interface signal includes an input side interface signal (Address, Data_in, Control_in) input via the slot input control unit 11 and an output side interface signal (Data_out, Control_out) output to the slot output control unit 12 or the interrupt control unit 13. , Interrupt), each of which plays the following role.
[0034]
Address: 1 bit or a plurality of bits (the number of bits common to the slots). By this Address signal, the CPU 2 or software can access the logic embedded in the slot 10.
Data_in: Input data for the logic in the slot 10, for example, having a common number of bits such as 32 bits.
Control_in: A control signal consisting of 1 bit or a plurality of bits for controlling the slot. For example, read and write to the logic inside the slot in synchronization with the Adoress signal,
Data_out: a path for outputting an operation result or the like to the outside of the slot 10, for example, having a common number of bits such as 32 bits.
Control_out: an output signal for the logic in the slot 10 to notify the external logic of a request, etc.
(Interrupt): This is provided as an option, and is an interrupt signal for notifying the CPU 2 or software of the end of calculation.
[0035]
Although the figure shows an example in which the slot 10 is divided into three, the number of divisions and the data size (data width) of each slot 10 are arbitrary, and the hardware logic written in the slot 10 is arbitrary. It can be set in consideration of the data width, the number of logics processed by multitasking, the scale and performance of the entire computer system, and the like. However, when the data widths of the slots 10 are different, the slot 10 to be written must be selected in consideration of the data width of the hardware logic, and rewriting of the hardware logic may be complicated. In order to make the programmable logic circuit 1 of the invention a highly versatile system, the slot 10 is preferably equally divided into substantially equal data sizes.
[0036]
Next, the hardware logic is written in each of the divided slots 10, and this hardware logic is placed under the management of the software 5 in the form of a file or data as shown in FIG. To allow definition / replacement to slot 10.
[0037]
Here, since the slot 10 has an address and data width of a fixed number of bits, the space and data width may be insufficient depending on the hardware logic 6. In that case, the problem can be solved by using a plurality of slots 10. For example, if the slot has a data width of 32 bits and it is desired to insert the 64-bit hardware logic 6, two slots 10 may be used.
[0038]
Note that the data size (data width) of the hardware logic 6 is also arbitrary, but if the data width is too large, many slots 10 are occupied at one time, which hinders the processing of other hardware logic 6. May cause In addition, if the data size of each hardware logic 6 is too different, there may be a case where the hardware logic 6 cannot be replaced. Accordingly, each hardware logic 6 is preferably configured in consideration of the capacity of the slot 10.
[0039]
The hardware logic 6 is a part that can operate independently, and it is necessary to be able to independently access the memory 3 and access to a processor such as the CPU 2. Therefore, in order to enable individual access from the processor 2, for example, as shown in FIG. 4, the address space is divided so that individual control can be performed using addresses (Slot1 offset, etc.). Yes.
[0040]
In the structure of the present invention, since each hardware logic 6 operates individually, it is necessary to perform arbitration with external hardware and software. Therefore, arbitration logic such as the slot input control unit 11 and the slot output control unit 12 is provided in the programmable logic circuit 1 so as to perform overall adjustment with external hardware, software, and the like. In addition, an interrupt control unit 13 that receives an interrupt request from each slot 10 and notifies the CPU 2 of an interrupt is provided in order to notify the end of a designated process, a change in internal status, and the like. The slot input control unit 11, the slot output control unit 12, and the interrupt control unit 13 may be formed programmable in the rewritable area or may be formed as fixed logic outside the rewritable area.
[0041]
Each slot 10 can be individually rewritten, but other slots need to be operating while a certain slot is rewriting. For this reason, each slot and its arbitration logic (slot input control unit 11 and slot output control unit 12) are equipped with logic (not shown) that logically separates the slot 10 being rewritten.
[0042]
Furthermore, since the programmable logic circuit 1 of the present invention is a hardware cache, it is necessary to evict the hardware logic 6 that has become unnecessary. Therefore, in the present invention, a history such as which slot 10 each hardware logic 6 enters and when it is used is taken, and the hardware logic to be evicted can be selected from the history information.
[0043]
More specifically, as shown in FIG. 5, a music player is written in slot (1), a paint tool is written in slot (2), and hardware logic 6 of game 1 is written in slot (3). Assuming that the logic 6 is used in the filled area in the figure, since the hardware logic 6 of the music player in the slot (1) is once used and is not used, it is evicted from the slot (1). It seems that there is no problem. Therefore, when the new hardware logic 6 is requested (at the time of the arrow in the figure), the music player is driven out, and the game 2 is replaced instead.
[0044]
Thus, considering the history information such as the writing time, usage frequency, and non-use period of the hardware logic 6 written in each slot 10, the remaining number of slots 10 and the number of logic scheduled to be processed are taken into consideration. Thus, by expelling hardware logic that is unlikely to be used, an empty slot 10 is secured, and a delay in processing is prevented so that no waiting time is generated for writing of the next hardware logic 6. is doing.
[0045]
As described above, in order to cache the hardware functions, the rewritable area is divided into general-purpose data size and quantity slots 10 and the hardware logic 6 written in each slot 10 and the external By providing a slot input control unit 11, a slot output control unit 12, an interrupt control unit 13 and a disconnection logic for mediation with hardware or software, the hardware logic 6 placed under the management of the software is necessary. Therefore, the high-speed processing can be realized without increasing the scale of the entire system as in the case of providing a plurality of programmable logic circuits.
[0046]
Also, since the usage status of the hardware logic 6 written in each slot 10 is monitored and the hardware logic is driven out of the slot 10 by referring to the history information, the slot 10 in which the next hardware logic 6 is to be written. This prevents the problem of processing delays and speeds up the processing of the entire system.
[0047]
Next, a procedure for writing the hardware logic 6 in the programmable logic circuit 1 having the above configuration will be specifically described.
[0048]
The programmable logic circuit 1 of the present invention caches hardware as described above, and after a request (for example, a function call) from the application side is transmitted, the hardware logic 6 corresponding to this request is empty. The data is written in the slot 10 and processed on the hardware, and the processing result is returned to the application. Therefore, a means for mediating the application and the programmable logic circuit 1 is required. Here, this control is performed by the hard macro library management unit.
[0049]
The hard macro library management unit is software that holds, as a library, the hardware logic 6 to be inserted into the slots 10. In response to a request from the application side, it inserts the required hardware logic 6 into a slot 10 and has it run. To control the rewriting of the hardware logic 6, it accesses the CPU 2 as well as the slot input control unit 11, the slot output control unit 12, and the status 14 of the programmable logic circuit 1.
[0050]
Hereinafter, the processing performed by the application, the hard macro library management unit, and the programmable logic circuit 1 will be described with reference to FIGS. 6 and 7. First, as shown in FIG. 6, an application (for example, an image processing program) may need to perform an operation such as an image processing function. In that case, it makes a function call to the hard macro library management unit. In response, the hard macro library management unit refers to the registration information of the hardware logic 6. It checks whether hardware logic 6 that performs this function has already been inserted into a slot 10.
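The hit check against the registration information can be pictured as a table from function name to the slots currently holding that function's hardware logic. The patent does not say how the registration information is stored. The class `Registry` and its methods are hypothetical names used only for this sketch.

```python
from __future__ import annotations

from typing import Optional


class Registry:
    """Hypothetical registration table: function name -> slots holding its logic."""

    def __init__(self) -> None:
        self._slots_by_func: dict[str, list[int]] = {}

    def register(self, func: str, slots: list[int]) -> None:
        """Record that `func` now occupies `slots` (after a completed write)."""
        self._slots_by_func[func] = list(slots)

    def unregister(self, func: str) -> list[int]:
        """Forget `func` (e.g. after eviction) and return the slots it used."""
        return self._slots_by_func.pop(func, [])

    def lookup(self, func: str) -> Optional[list[int]]:
        """Return the slots for `func` on a hit, or None on a miss."""
        return self._slots_by_func.get(func)
```

A `None` result sends the manager down the miss path (status check, possible eviction, write). A list of slots lets it start processing immediately.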
[0051]
If the corresponding hardware logic 6 has not been inserted into a slot 10, the hard macro library management unit accesses the status 14 of the programmable logic circuit 1. It checks the state of each slot 10 and determines whether any slot 10 is empty.
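Checking the status 14 for an empty slot can be sketched as scanning a per-slot state word. The patent does not give the layout of status 14. Encoding it as one bit per slot (1 = occupied) and the function name `free_slots` are assumptions made for illustration.

```python
def free_slots(status: int, num_slots: int) -> list:
    """Return slot numbers whose status bit is 0.

    Assumed encoding: bit n of `status` is 1 when slot n holds hardware logic.
    """
    return [n for n in range(num_slots) if not (status >> n) & 1]
```

For a four-slot device whose status reads 0b1011, only slot 2 is free. An empty result means the eviction step described next must run before any write.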
[0052]
If no slot 10 is free, new hardware logic 6 cannot be written, so the function cannot be computed and processing is delayed. The programmable logic circuit 1 of the present invention avoids this problem. It refers to the history information of the hardware logic written in each slot 10 and evicts unused hardware logic as appropriate, thereby securing a slot 10.
[0053]
Next, once it is confirmed that a slot 10 is empty, the hard macro library management unit takes the corresponding hardware logic 6 out of the library. Using the input interface signals shown in FIG. , it begins writing that logic into the slot 10. At this point, the size of the hardware logic 6 is compared with the capacity of the slot 10. For example, when the data width of the hardware logic 6 is larger than the data width of the slot 10, the logic is written across a plurality of slots 10. The write must not disturb the operation of the other slots 10. The slot 10 being written is therefore logically separated by the separation logic provided in the slot 10 and by the arbitration logic (the slot input control unit 11 and the slot output control unit 12). When writing to the slot 10 is complete, the slot output control unit 12 sends a write-end signal to the hard macro library management unit.
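The capacity comparison and multi-slot write in this step reduce to a ceiling division. The separation during writing can be modelled as a per-slot state that the input arbitration checks before forwarding Control in / Data in. This is a hypothetical software model, not the circuit itself. The names `SlotState`, `SlotArray`, `slots_needed`, `begin_write`, `finish_write`, and `accept_input` are all illustrative.

```python
from __future__ import annotations

from enum import Enum


class SlotState(Enum):
    EMPTY = "empty"
    WRITING = "writing"  # logically separated while being rewritten
    READY = "ready"


def slots_needed(logic_width: int, slot_width: int) -> int:
    """Number of slots a hardware logic block occupies (ceiling division)."""
    return -(-logic_width // slot_width)


class SlotArray:
    """Hypothetical model of the slots plus their separation state."""

    def __init__(self, num_slots: int) -> None:
        self.state = [SlotState.EMPTY] * num_slots

    def begin_write(self, logic_width: int, slot_width: int) -> list[int]:
        """Reserve enough empty slots and separate them for writing."""
        need = slots_needed(logic_width, slot_width)
        empty = [n for n, s in enumerate(self.state) if s is SlotState.EMPTY]
        if len(empty) < need:
            raise RuntimeError("not enough empty slots; evict first")
        chosen = empty[:need]
        for n in chosen:
            self.state[n] = SlotState.WRITING
        return chosen

    def finish_write(self, slots: list[int]) -> None:
        """Models the write-end signal: reconnect the written slots."""
        for n in slots:
            self.state[n] = SlotState.READY

    def accept_input(self, slot: int) -> bool:
        """Input arbitration: only a READY slot receives control or data."""
        return self.state[slot] is SlotState.READY
```

For example, a 96-bit-wide logic block on 32-bit slots occupies three slots. Those slots refuse input until `finish_write` is called.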
[0054]
Next, the hard macro library management unit sends a control signal via the slot input control unit 11 to the Control in port of the slot 10 to start processing. It also sends the data needed to compute the function to Data in. In the programmable logic circuit 1, the hardware logic 6 computes the function. When the computation finishes, the result is sent from Data out to the hard macro library management unit. An interrupt signal is also sent from the slot's interrupt port to the CPU 2 via the interrupt control unit 13 to notify it that the computation is complete. The hard macro library management unit then passes the result to the application, ending the sequence.
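Completion is signalled by an interrupt routed through the interrupt control unit 13. Claim 4 adds that the interrupt control means notifies the processor according to a priority it holds. A priority queue is one way to model that ordering in software. The class name, the per-slot priority table, the default priority, and the convention that a lower number means higher priority are all assumptions.

```python
from __future__ import annotations

import heapq
import itertools


class InterruptController:
    """Hypothetical model of interrupt control unit 13: queues completion
    interrupts from slots and delivers them to the CPU in priority order."""

    def __init__(self, priority_by_slot: dict[int, int]) -> None:
        self._priority = priority_by_slot
        self._pending: list = []
        self._seq = itertools.count()  # FIFO order among equal priorities

    def raise_irq(self, slot: int, result: object) -> None:
        """Called when a slot's hardware logic finishes and Data out is valid."""
        prio = self._priority.get(slot, 99)  # assumed default: lowest priority
        heapq.heappush(self._pending, (prio, next(self._seq), slot, result))

    def next_irq(self):
        """Return (slot, result) for the highest-priority pending interrupt, or None."""
        if not self._pending:
            return None
        _, _, slot, result = heapq.heappop(self._pending)
        return slot, result
```

The sequence counter guarantees that two interrupts with the same priority never fall through to comparing their result payloads, and that they are delivered in arrival order.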
[0055]
The above flow is the procedure for the case in which the hardware logic 6 corresponding to the function call has not been inserted into a slot 10. A different procedure applies, as shown in FIG. 5, when the hardware logic 6 for computing the function is already inserted in a slot 10. The hard macro library management unit then does not check the status 14 for an empty slot 10. It sends a control signal via the slot input control unit 11 to the Control in port of the slot 10 to start processing. At the same time, it sends the data needed to compute the function to Data in.
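The miss path (FIG. 6) and the hit path described here can be combined into a single dispatch routine. In this sketch the hardware is replaced by Python callables, and "writing" a slot just records which function it holds. Least-recently-used order stands in for the history-based eviction policy, which the patent does not pin down. All names are hypothetical.

```python
from __future__ import annotations

from typing import Callable


class HardMacroLibraryManager:
    """Hypothetical software model of the hard macro library management unit."""

    def __init__(self, library: dict[str, Callable], num_slots: int) -> None:
        self.library = library                    # function name -> "hardware logic"
        self.slots: list[str | None] = [None] * num_slots
        self.registry: dict[str, int] = {}        # registration information
        self.last_used: dict[int, int] = {}       # history: last use time per slot
        self.clock = 0
        self.writes = 0                           # number of slot writes (misses)

    def call(self, func: str, *args):
        self.clock += 1
        slot = self.registry.get(func)            # hit check
        if slot is None:                          # miss: find or free a slot, write
            slot = self._free_slot()
            self.slots[slot] = func
            self.registry[func] = slot
            self.writes += 1
        self.last_used[slot] = self.clock
        return self.library[func](*args)          # Control in / Data in -> Data out

    def _free_slot(self) -> int:
        for n, occupant in enumerate(self.slots):  # consult status
            if occupant is None:
                return n
        victim = min(self.last_used, key=self.last_used.get)  # evict LRU slot
        del self.registry[self.slots[victim]]
        self.slots[victim] = None
        return victim
```

With two slots, calling `add`, `add`, `mul`, then `neg` writes three times. The second `add` is a hit, and `neg` evicts `add` because it was used least recently.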
[0056]
The programmable logic circuit 1 then computes the function with the hardware logic. When the computation finishes, the result is sent from Data out to the hard macro library management unit. An interrupt signal is also sent from the slot's interrupt port to the CPU 2 via the interrupt control unit 13 to notify it that the computation is complete. The hard macro library management unit then passes the result to the application, ending the sequence.
[0057]
As described above, in the programmable logic circuit 1 of the present invention, a function call from the application is computed in hardware and its result is returned automatically. If hardware logic 6 for the function is already inserted in a slot 10, it is used directly. If it is not, the hardware logic 6 is first written into an empty slot 10. Function calls can therefore be processed quickly and smoothly.
[0058]
【Effects of the Invention】
As described above, according to the programmable logic circuit, the computer system, and the cache method of the present invention, the following effects can be obtained.
[0059]
The first effect of the present invention is that processing including multitasking can be performed at high speed without increasing the scale of the entire system.
[0060]
This is because the rewritable area is divided into a plurality of slots, and the hardware logic managed by software is written into the slots only as processing requires. The rewritable area is therefore used efficiently. In addition, arbitration means (the slot input control unit, the slot output control unit, and the interrupt control unit) mediate between the slots and external hardware and software. Processing on both sides therefore proceeds smoothly.
[0061]
The second effect of the present invention is that a general-purpose programmable logic circuit can be constructed.
[0062]
This is because, when the rewritable area is divided into slots, the slot data width and the number of divisions are chosen to suit typical hardware logic. In addition, hardware logic larger than one slot can be written across a plurality of slots, so any hardware logic can be accommodated.
[0063]
In addition, the third effect of the present invention is that it is possible to prevent processing delays and respond quickly to calculation requests.
[0064]
This is because unused hardware logic is evicted with reference to the history information of the hardware logic written in each slot. An empty slot can therefore be secured whenever new hardware logic must be written.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a basic configuration of a computer system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an interface configuration of slots constituting a programmable logic circuit according to one embodiment of the present invention.
FIG. 3 is a diagram showing a management form of hardware logic written in a slot.
FIG. 4 is a diagram illustrating an example of dividing an address space for a slot of a programmable logic circuit according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating a hardware logic replacement operation.
FIG. 6 is a flowchart showing a procedure for writing hardware logic into a programmable logic circuit according to one embodiment of the present invention.
FIG. 7 is a flowchart showing a procedure for writing hardware logic into a programmable logic circuit according to one embodiment of the present invention.
FIG. 8 is a diagram showing a basic configuration of a conventional multiprocessor system.
FIG. 9 is a diagram showing a basic configuration of a conventional processor support system.
FIG. 10 is a diagram showing a basic configuration of a conventional programmable logic circuit.
[Explanation of symbols]
1 Programmable logic circuit
2 CPU
3 Memory
4 Bus
5 Software
6 Hardware logic
7 Coprocessor or ASIC
8 Basic cell
8a LUT
8b Flip-flop
10 Slot
11 Slot input control unit
12 Slot output control unit
13 Interrupt control unit
14 Status

Claims

1. A cache method in which a rewritable logic part of a programmable logic circuit is divided into a plurality of slots of substantially equal data size. A slot number is assigned to each slot. Hardware logic referenced by software in the form of data or a file is written sequentially into the area of the logic part corresponding to each slot number.
The cache method is characterized in that, when more hardware logic is to be inserted than there are slots, the history information of the hardware logic inserted in each slot is referred to, a slot whose hardware logic is unlikely to be used is selected, and the hardware logic in the selected slot is evicted.

2. The cache method according to claim 1, wherein the hardware logic is compared with the capacity of the slot or with the address space assigned to the slot. When the hardware logic is larger than that capacity or address space, the hardware logic is written across a plurality of slots.

3. The cache method according to claim 1, wherein arbitration between the hardware logic written in the slot and external hardware or software is performed by control means provided at the input/output stage of the slot.

4. The cache method according to any one of claims 1 to 3, wherein an interrupt signal of the hardware logic is assigned to an interrupt port provided in the slot. Interrupt control means provided at the output stage of the slot notifies the processor of the interrupt according to a priority set in the interrupt control means.

5. The cache method according to any one of the preceding claims, wherein a slot into which no hardware logic has been written, or a slot being rewritten, is logically separated by separation means provided in the slot and by the control means.