JP2004500608A - Universal resource access controller

Info

Publication number: JP2004500608A
Application number: JP2000582896A
Authority: JP
Inventors: ストラコブスキー・ヘンリー; シャベルスキー・ピオトル
Original assignee: Infineon Technologies AG
Current assignee: Infineon Technologies AG
Priority date: 1998-11-16
Filing date: 1999-11-15
Publication date: 2004-01-08
Also published as: WO2000029955A8; CN1354854A; GB2361561A; DE19983738T1; KR100710531B1; GB2361561B; WO2000029955A1; KR20010086034A; GB0111925D0; CN1311357C; WO2000029955A9

Abstract

[Problem]
[Solution] A universal resource access controller (104) is coupled to a requesting system (102) and a resource (108). When the requesting system (102) requires access to the resource (108), it generates a resource access request and passes it to the universal resource controller (104). The universal resource controller (104) then uses the specific operating characteristic parameters of the requested resource, together with the current state of the requested resource, to generate a corresponding sequenced universal access request command suitable for accessing the resource (108) as required by the requesting system (102).
[Selected drawing] FIG. 1A

Description

【０００１】
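[Technical Field of the Invention]
The present invention relates generally to computer systems. More specifically, it relates to access to shared resources in computer systems such as multiprocessor computer systems. In particular, an apparatus and method for providing universal access to shared resources are described.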
【０００２】
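[Background Art]
In a basic computer system, a central processing unit (CPU) operates according to a predetermined program or set of instructions stored in an associated memory. In addition to the stored instruction set or program that defines the processor's operation, memory space is provided, either in the processor memory or in an associated additional memory, to facilitate the central processor's manipulation of information during processing. The additional memory stores information created by the processor and also stores information that the processor uses temporarily, as a "scratch pad", while executing the program. The associated memory further provides locations in which the output information of the processor executing the instruction set is placed, so that it is available to the system's output devices.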
【０００３】
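In systems where many components (processors, hard drives, and so on) must share a common bus to access the existing memory, conflicts over memory access become more likely. In multiprocessor computer systems in particular, systems using different processors operate simultaneously, which complicates access to memory or other shared resources. Because each processor or processor system is likely to request access to the same memory at the same time, conflicts between processors are generally unavoidable. In essence, the operation of two or more processors or processor systems in a multiprocessor computer system results in intermittently overlapping memory commands directed to the shared memory or other shared resources.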
【０００４】
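Conventional approaches to the problem of conflicting memory access requests to a shared memory have, in some cases, involved complete duplication of the memory used by each processor and isolation of the processor systems. This approach, however, often defeats the intended advantage of a multiprocessor system. Such multiprocessor systems are most effective when the processors perform parallel computations on the same data, with one processor supporting the operation of the other. Conventionally, such processor systems have either been time shared, with the processors competing for access to a shared resource such as memory, or dual ported, with each processor having, for example, its own memory bus, so that one processor waits while the other is granted access.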
【０００５】
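Various approaches have been used to avoid these conflicts. In one approach, conflicts are avoided by operating the processors sequentially or by time sharing them: the processors simply take turns accessing the shared resource. Commonly used systems of this kind include "passing the ring" or "token" systems, in which the system polls the potentially conflicting processors according to a predetermined sequence, similar to passing a ring around a group of users.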
【０００６】
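Unfortunately, sequential processor access imposes a significant limitation on the overall operation of the computer system, because the system spends considerable time polling the conflicting processors. Moreover, even when only a single processor is operating and requesting access to, for example, the shared memory, the system still steps through the sequence, so a delay is incurred on every memory cycle between processor accesses to the shared resource.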
【０００７】
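Another common approach to conflict avoidance relies on prioritization among the processors in the computer system. Each processor is assigned a priority according to its importance within the system hierarchy. Whenever a conflict occurs, the memory controller simply grants access to the processor with the higher priority. Consider, for example, a system with two processors, a first processor and a second processor, that both access a shared memory. The shared memory is typically a dynamic RAM (DRAM) type memory device, which requires periodic refreshing of the data it holds. In general, a DRAM type memory is refreshed by a separate, independent refresh system. In such a multiprocessor system, both the processors and the refresh system compete for access to the shared memory, and the system memory controller handles conflicting memory access requests and commands according to the priorities assigned to the processors and the refresh system. Although such a system resolves conflicts and is more efficient than simple sequential access, it still lacks flexibility.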
【０００８】
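Another conventional approach to conflict resolution is decision-making capability built into the memory controller. Unfortunately, because the decision-making portion of the memory controller operates under the control and timing of a clock system, considerable time passes before the decision is actually made and the memory controller can grant access to the shared memory.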
【０００９】
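Unfortunately, this problem of actually carrying out the decision substantially degrades the ability of a conventional memory controller to grant access to multi-bank memory systems. In a multi-bank memory system, the actual memory core is partitioned into specific regions, or banks, in which the data to be read is stored. Although this arrangement can provide faster and more efficient memory access, a conventional memory controller needs complex logic to support a multi-bank memory device, and as a result the access speed of the overall system drops significantly.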
【００１０】
In view of the foregoing, a universal device access controller is desirable.
【００１１】
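[Summary of the Invention]
In accordance with the present invention, a universal resource access controller is provided for controlling access to an associated resource, such as a synchronous link DRAM (SLDRAM). The invention includes a universal resource access controller coupled to a requesting system and to a resource, so that when the requesting system requires access to the resource, it generates a resource access request and passes it to the universal resource controller. The universal resource controller then uses the specific operating characteristic parameters of the requested resource, together with its current state, to generate a corresponding sequenced universal access request command suitable for accessing the resource as required by the requesting system.
In accordance with another embodiment of the invention, an apparatus is disclosed for controlling access by any of a plurality of requesting systems to any of a plurality of accessible devices. The apparatus includes a universal controller unit and an address space controller unit coupled to the universal controller unit. The universal controller unit decodes a system address and a system command received from a requesting system. It then generates an associated device address and a corresponding device command based on device parameters stored in, and provided by, the address space controller. The address space controller is configured so that each of the plurality of devices is given its own address region within the address space controller.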
【００１２】
The nature and advantages of the present invention may be further understood by reference to the remaining portions of the specification and the drawings.
【００１３】
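[Detailed Description of the Embodiments]
In systems in which a number of devices, such as processors, share the same resource, various approaches have been used to avoid the conflicts that typically arise when more than one device requests access to the shared resource. In one approach, conflicts are avoided by operating the processors sequentially or by time sharing them: the processors simply take turns accessing the shared resource. Commonly used systems of this kind include "passing the ring" or "token" systems, in which the system polls the potentially conflicting processors according to a predetermined sequence, similar to passing a ring around a group of users.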
【００１４】
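Unfortunately, sequential processor access of this kind imposes a significant limitation on the overall operation of the computer system, because the system spends considerable time polling the conflicting processors.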
【００１５】
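Another common approach to conflict avoidance relies on prioritization among the processors in the computer system, with each processor assigned a priority according to its importance within the system hierarchy. Although such a system resolves conflicts and is more efficient than simple sequential access, it still lacks flexibility.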
【００１６】
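Yet another common approach to conflict avoidance involves decision-making logic built into a controller type device. Unfortunately, the complexity of the decision-making logic means that considerable time passes before the decision is actually made and the controller can grant access to the shared memory.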
【００１７】
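This problem, in which complex logic slows system operation, is even more pronounced in multi-chip module type memory systems, where memory is distributed among a number of interconnected memory devices, each with different operating characteristics. Because conventional logic cannot be configured to compensate for the different access characteristics inherent in the various memory devices, the conventional solution is to accept reduced overall system performance.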
【００１８】
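Broadly speaking, as shown in FIG. 1A, the invention can be described in terms of a system 100 having requesting devices 102, each coupled to a universal device controller 104 by a system bus 106 suitably configured to provide access to any number and type of shared resources 108. In one embodiment, the system bus 106 is coupled to the universal controller 104 by an associated system interface layer 110, and the universal controller 104 is in turn coupled to the shared resource 108 by a shared resource interface 109. Broadly, the universal controller 104 is configured to determine the state of the shared resource 108 based on a shared resource request from any of the requesting systems 102 and on shared resource operating characteristic parameters 113.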
【００１９】
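When the requesting system 102 is one of the processors in a multiprocessor system and requests access to a shared resource 108 in the form of a memory device 108 that is also shared by the other processors coupled to it, the universal controller 104 determines the sequence of operations to be performed in order to complete the requested resource access. If, for example, the memory device 108 is an SDRAM, the operations typically include precharge, page close, page open, and page read or page write.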
【００２０】
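Once a particular sequence of operations has been determined, the universal controller 104 determines the appropriate time intervals between the sequenced operations in order to avoid, for example, data collisions or other types of conflict. In a preferred embodiment, the intervals are based in part on the operating characteristics of the shared memory device, stored for example in a look-up table. The properly sequenced access command is then output by the universal controller and subsequently responded to by the shared memory.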
【００２１】
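In the following detailed description, numerous specific embodiments are set forth to provide a thorough understanding of the invention. As will be apparent to those skilled in the art, however, the invention may be practiced without these specific details or by using alternative elements or processes. In other instances, well-known processes, procedures, components, and circuits are not described in detail so as not to obscure the essence of the invention.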
【００２２】
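The invention is described below in terms of a memory controller configured to act as a liaison between a processor and a shared memory. It should be noted, however, that the invention can also be implemented as a universal controller capable of controlling access to any resource, shared or not. Such a resource need not be a memory; in fact, the invention can also be used to control access to a shared system bus, for example by controlling information traffic in a multiprocessor system so as to increase the effective bandwidth of the system bus by reducing bus access latency.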
【００２３】
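Turning to FIG. 1B, a system 100 is shown having a requesting device 102, such as a processor, coupled to the universal controller 104 by the system bus 106. The controller 104 is in turn coupled to the shared resource 108, for example a memory 108 that can take many forms, such as DRAM, SDRAM, SLDRAM, EDO, FPM, or RDRAM. In the illustrated embodiment, the system bus 106 includes a unidirectional address bus 106-1 that carries memory address requests output by the processor 102 to the universal controller 104. The system bus 106 also includes a unidirectional command bus 106-2 that, in conjunction with the address bus 106-1, carries the commands associated with the memory addresses. For example, when the processor 102 requires an executable instruction stored at a particular memory location in the memory 108, it outputs a read request (referred to as a system command) on the command bus 106-2 and, at substantially the same time, outputs the corresponding memory address request (referred to as a system address) on the address bus 106-1. Both the system address and the system command are received by a configurable system interface 110 included in the controller 104. Here, configurable means that the system interface 110 can be set up to process the received system command and system address into whatever manner and form the memory 108 requires. As a result, the processor 102 does not need to issue individual requests to each memory device, and the data it needs can be stored in any number and type of memory devices coupled to the controller 104.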
【００２４】
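In the illustrated embodiment, the system interface 110 is configured to convert the received system command and system address into what is referred to as a universal command 200, an example of which is shown in FIG. 2A. In one implementation, when the shared resource is a DRAM type memory device (including SLDRAM, SDRAM, EDO DRAM, and the like), the universal command 200 is formed of five data fields that together cover all the operations needed to carry out any memory access request to the memory 108. These operations include a precharge operation, indicated by a data precharge field 202, which shows whether a particular row needs to be precharged. The other operations are indicated by a data activate field 204, a data read field 206, a data write field 208, and a data refresh field 210. Suppose, for example, that the memory 108 has memory page 1 currently active in memory bank 1 (that is, open after a read or write) and that the next processor command requests that data stored on page 2 of memory bank 1 be read and output to the processor 102. To carry out the command requested by the processor 102, page 1 must be closed (that is, precharged) and page 2 must be activated; once activation is complete, the read from page 2 is performed. Accordingly, the universal command 212 shown in FIG. 2B is generated by the universal command generator 110 with data fields 202, 204, 206, 208, and 210, of which fields 202, 204, and 206 are set to "1", indicating "perform the associated operation", and fields 208 and 210 are set to "0", indicating "do not perform the associated operation" (that is, "NOP").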
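The five-field universal command can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and function names are hypothetical, and the rule "precharge only if another page is open" is inferred from the page 1 / page 2 example above rather than taken from FIG. 2A or 2B.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UniversalCommand:
    """One bit per operation; 1 = perform, 0 = NOP (hypothetical layout)."""
    precharge: int  # field 202
    activate: int   # field 204
    read: int       # field 206
    write: int      # field 208
    refresh: int    # field 210


def build_read_command(open_page, requested_page):
    """Build the universal command for a page read.

    open_page is the page currently open in the bank, or None if the
    bank has no open page (an assumption made for this sketch).
    """
    if open_page == requested_page:
        # Page already open: only the read is needed.
        return UniversalCommand(0, 0, 1, 0, 0)
    needs_precharge = 1 if open_page is not None else 0
    return UniversalCommand(needs_precharge, 1, 1, 0, 0)


# The example from the text: page 1 is open, page 2 is requested.
cmd = build_read_command(open_page=1, requested_page=2)
```

With page 1 open and page 2 requested, the sketch sets the precharge, activate, and read bits and leaves write and refresh at zero, matching the field values described for universal command 212.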
【００２５】
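Returning to FIG. 1B: because access to the memory 108 is shared by a number of different requesting devices, it is highly dynamic, and the state of the memory 108 changes constantly. The memory state matters because a particular operation can be performed correctly at a particular memory location only if the state of that location is known. For example, if a particular memory page is closed, it must first be opened before a read operation can be performed. Therefore, to track the current state of a particular address location, the most recent operation performed on that location is identified by a resource tag 300, shown in FIG. 3. In one embodiment of the invention, the resource tag 300 includes an address field 302 used to identify a particular memory address location, a last issued command field 304 used to identify the last command issued to the address identified by the address field 302, and a last command issue time data field 306. For example, resource tag 308 for memory address ADD5 indicates that a page read command was issued at time 5φ (representing 5 system clock cycles), and resource tag 310 indicates that, for the same memory address ADD5, a page write is to be performed on that memory page at time 10φ. By tracking the state of memory address ADD5, the universal controller 104 knows that the memory page at ADD5 is already open and that a page open operation is therefore unnecessary.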
【００２６】
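Based on the resource state information provided by the tags 300 stored in a resource tag buffer 114, a command sequencer 116 coupled to the configurable system interface 110 provides the appropriate time intervals between the command components 202 to 210 of the universal command 200. The result is a sequenced command 220, shown in FIG. 2C, in which command components 202 and 204 are separated by a time interval t1 and command components 204 and 206 by a time interval t2. Because command components 208 and 210 are "NOP" type fields, the sequenced command 220 contains no reference to them and requires only a period substantially equal to the clock cycles needed for components 202 to 206 plus a period substantially equal to t1 + t2. In this way, the command sequencer 116 can provide an optimal flow of commands and data between the processor 102 and the memory 108.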
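As a rough illustration of how the sequenced command's duration follows from the active components and the intervals t1 and t2, the sketch below lays the components out on a cycle timeline. The function name, the one-cycle-per-component cost, and the interval values t1 = 3 and t2 = 2 are assumptions chosen for the example, not values from the patent.

```python
def sequence_command(fields, gap_after, cycles_per_component=1):
    """Return ([(component, start_cycle), ...], total_cycles).

    fields:    ordered (name, bit) pairs; components with bit 0 are NOPs
               and are dropped from the sequenced command.
    gap_after: {(previous, next): idle cycles required between them}.
    """
    active = [name for name, bit in fields if bit]
    schedule, t = [], 0
    for i, name in enumerate(active):
        schedule.append((name, t))
        t += cycles_per_component
        if i + 1 < len(active):
            t += gap_after[(name, active[i + 1])]
    return schedule, t


fields = [("precharge", 1), ("activate", 1), ("read", 1),
          ("write", 0), ("refresh", 0)]
gaps = {("precharge", "activate"): 3,  # t1, placeholder value
        ("activate", "read"): 2}       # t2, placeholder value
schedule, total = sequence_command(fields, gaps)
```

Here the total is the three component cycles plus t1 + t2, and the NOP fields contribute nothing, which is the property the paragraph above describes.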
【００２７】
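In another embodiment of the invention, when the shared resource 108 is a multi-bank memory device such as an SDRAM, or a multi-device memory such as a multi-chip module, the resource tag buffer 114 can store resource tags for, for example, all pages open in a particular bank or device. In one implementation, a comparator (not shown) detects the bank number or device identifier in the system address and compares the page address and system address with the contents of the tag buffer 114. If the comparison is not a "hit" (that is, the addresses do not match), the universal controller 104 must close the old page using the address from the tag buffer 114 and then open the new page based on the new system command.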
【００２８】
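When the universal controller 104 serves a number of different devices, it is desirable to be able to select the operating parameters associated only with the particular device to which the incoming system address relates. FIG. 1C shows an address space controller 120 coupled to the universal controller 104 for the case in which the universal controller serves a number of different devices. In the illustrated embodiment, the address space controller 120 is able to select only those device-specific parameters for the one device associated with the incoming system address. In the specific implementation shown in FIG. 1D, the address space controller 120 includes a comparator 122 that compares the incoming system address with the contents of a region address range buffer 124, which identifies the device (or, equivalently, the memory region) associated with the incoming address. Once the particular device or region has been identified, one register is selected from a group of device parameter registers 126 and 128, each of which is coupled to the buffer 124 and holds the parameters specific to a particular device. The selected device parameter register then supplies the specific operating parameters for the device corresponding to the system address. In another embodiment, the contents of the selected device parameter register are input to the LUT 118. In this way, any number of different devices can be served by the universal controller 104, with each device's specific operating parameters identified and used to sequence the corresponding universal command optimally.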
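The region lookup performed by the comparator 122 and the range buffer 124 amounts to an address range match followed by a register select. A minimal sketch follows; the address ranges, parameter names, and timing values are all invented for illustration.

```python
# Each entry pairs an inclusive address range (the role of buffer 124)
# with a device parameter register (the role of registers 126, 128).
# Ranges and parameter values below are placeholders.
REGIONS = [
    (0x0000, 0x7FFF, {"device": "SDRAM", "cas_latency": 2}),
    (0x8000, 0xFFFF, {"device": "SLDRAM", "cas_latency": 4}),
]


def select_device_params(regions, system_address):
    """Return the parameter set for the region containing the address."""
    for start, end, params in regions:
        if start <= system_address <= end:
            return params
    raise LookupError(f"address {system_address:#x} is not mapped")


params = select_device_params(REGIONS, 0x9000)
```

Hardware would compare against all ranges in parallel; the sequential loop here only models the outcome.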
【００２９】
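It should be noted that, when one of the devices coupled to the universal controller is busy and cannot accept a new command, it is advantageous to be able to select any other command waiting in the command queue. In another embodiment of the invention, each response from a device and each request from the universal controller has an associated identification number 150, which in the illustrated embodiment is a 5-bit data word, as shown in FIG. 1E. The identification number 150 is formed of a 2-bit group selector field 152 and a 3-bit request number field 153. The group selector (GS) determines the group (for example, a processor) to which a particular system request belongs, and the request number (RN) gives the number of the request or response within the group identified by the group selector field 152. Consecutive requests from the same transceiver have consecutive request numbers.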
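The 5-bit identification number can be modelled as simple bit packing. In the sketch below the group selector is placed in the two high-order bits; the text gives only the field widths, so that ordering and the function names are assumptions.

```python
GS_BITS, RN_BITS = 2, 3  # field widths given in the text


def pack_id(group_selector, request_number):
    """Pack GS and RN into a 5-bit identification number (GS high)."""
    if not 0 <= group_selector < (1 << GS_BITS):
        raise ValueError("group selector must fit in 2 bits")
    if not 0 <= request_number < (1 << RN_BITS):
        raise ValueError("request number must fit in 3 bits")
    return (group_selector << RN_BITS) | request_number


def unpack_id(ident):
    """Split a 5-bit identification number into (GS, RN)."""
    return ident >> RN_BITS, ident & ((1 << RN_BITS) - 1)


ident = pack_id(group_selector=2, request_number=5)
```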
【００３０】
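In another embodiment, a group priority selector register 154 contains a priority value for each response group or request group, and a response or request group with a higher priority value takes precedence over a group with a lower priority value. Thus, when a request or response with a lower priority value cannot be processed in the next clock cycle, a request or response with a higher priority value can be processed ahead of it. To prevent so-called livelock, a livelock counter register 156 holds the number of consecutive higher-priority requests (or responses) that may bypass a lower-priority request (or response). This ensures that a lower-priority request (or response) is not left waiting for a large number of clock cycles.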
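One way to picture the interaction between the priority register and the livelock counter is the small arbiter below. It is a sketch only: the class name, the group names, the per-group bypass counters, and the tie-breaking rule are assumptions, since the text specifies only that a limit on consecutive bypasses exists.

```python
class GroupArbiter:
    """Pick the next group to serve, bounding how often a group is bypassed."""

    def __init__(self, priority, livelock_limit):
        self.priority = priority              # group -> priority value
        self.limit = livelock_limit           # max consecutive bypasses
        self.bypassed = {g: 0 for g in priority}

    def choose(self, waiting):
        """waiting: non-empty list of groups that have a pending request."""
        starved = [g for g in waiting if self.bypassed[g] >= self.limit]
        if starved:
            # A group that hit the limit is served regardless of priority.
            winner = max(starved, key=lambda g: self.bypassed[g])
        else:
            winner = max(waiting, key=lambda g: self.priority[g])
        for g in waiting:
            if g != winner:
                self.bypassed[g] += 1
        self.bypassed[winner] = 0
        return winner


arbiter = GroupArbiter({"cpu": 3, "dma": 1}, livelock_limit=2)
order = [arbiter.choose(["cpu", "dma"]) for _ in range(4)]
```

With a limit of two, the low-priority group is served on the third cycle even though the high-priority group is still waiting.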
【００３１】
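It should also be noted that, to optimize control of both the command flow and the data flow, each shared resource has an associated set of operating characteristics (for a DRAM type device, for example, access time and CAS latency). When the universal controller 104 serves more than one shared resource, each resource has a different set of operating characteristics, which in another embodiment are stored in a look-up table (LUT) 118 coupled to the command sequencer 116. The command sequencer 116 uses the information provided by the LUT 118, together with the resource tags stored in the resource tag buffer 114, to sequence the command components 202 to 210 properly and form the sequenced command 220. This is especially relevant when the shared resource is a collection of memory devices, such as a multi-chip module, each with substantially different operating characteristics.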
【００３２】
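FIG. 4 is a flowchart detailing a process 400 by which the universal controller accesses a shared resource in accordance with an embodiment of the invention. The process 400 begins at 402, where the system requests access to the shared resource. When the shared resource is a DRAM type memory device, the operations include precharge, refresh, close, open, read, and write. For example, a processor requests a memory page stored in the shared memory by generating a system command (that is, a page read) and an associated system address indicating where in memory the requested page is stored. In a preferred embodiment, the state of the resource is determined at 404 using, for example, the resource tags associated with the active memory locations in the shared memory. Next, at 406, the sequence of operations needed to carry out the desired request on the shared resource is determined. At 408, the universal controller generates a universal command based on that sequence of operations. To perform a page read, for example, the previously open page must be closed, the new page activated, and then the read performed, all of which is captured in a single universal command structure. Once the universal controller has formed the universal command using the resource tags and the specific operating characteristics of the shared resource, it determines, at 410, the appropriate time intervals between the command components of the universal command. At 412, the sequenced command is issued to the shared resource; in another embodiment, a physical stage is used for this. Finally, at 414, the shared resource responds to the sequenced command, for example by providing the data stored at the location indicated by the system address.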
【００３３】
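In another embodiment of the invention, the universal controller determines the state of the resource (404) and the order of the operations to be performed (406) using a process 500, shown in FIG. 5. The process 500 begins at 502 by comparing a resource partition identifier (that is, a memory address register) with a resource identifier (that is, the resource tag address field 302). If a "hit" is found at 504 (that is, the address of the new command matches the current tag address field), the next command is issued at 506. If, on the other hand, the new command address does not match the current tag address field (a miss), a determination is made at 508 as to whether the old page is still open. If the old page is open, it is closed at 510 and the new page is opened at 512. If the old page is found at 508 not to be open, the new page is opened at 512 directly. In either case, once the new page is open, the next command (the data operation) is issued at 506.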
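The decision flow of process 500 is compact enough to express directly. The following sketch mirrors the steps 502 to 512 described above; the function name and the string labels for operations are hypothetical.

```python
def plan_operations(tag_address, old_page_open, new_address, data_op):
    """Return the ordered operations for one request (steps 502 to 512)."""
    ops = []
    if tag_address != new_address:       # 504: miss
        if old_page_open:                # 508: is the old page still open?
            ops.append("close_page")     # 510
        ops.append("open_page")          # 512
    ops.append(data_op)                  # 506: issue the data operation
    return ops


hit = plan_operations(0x40, True, 0x40, "read")
miss_open = plan_operations(0x40, True, 0x80, "read")
miss_closed = plan_operations(0x40, False, 0x80, "write")
```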
【００３４】
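In another embodiment of the invention, the universal controller determines the appropriate time intervals between the sequenced operations (410) using a process 600, shown in FIG. 6. The process 600 begins at 602 with the universal controller comparing the first command of a new command sequence for a particular resource with the last command of the most recent command sequence issued so far to the same resource. At 604, the universal controller determines the timing constraints between universal command components by comparing the first command component of the new universal command with the last command component of the most recent previous command. In another embodiment, the universal controller uses a two-index look-up table (LUT) in the form of a two-dimensional array, as shown in Table 1, in which the first row of the array represents the old (that is, most recent previous) command and the first column represents the new command. Referring to Table 1, for example, if the old command was a page read and the new command is a page close, the entry at the intersection of the new command (page close) and the old command (page read) gives the minimum amount of time allowed between the two operations (that is, the minimum physical issue time). The information stored in the LUT is normally provided by the manufacturer of the shared resource.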
【００３５】
[Table 1]

【００３６】
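Once the physical constraints of the resource have been determined for a particular universal command component, a determination is made at 606 as to whether the same universal command contains further command components. If it does not, the universal command and the associated component timing details are stored at 608. If it does, control returns to 604, where the corresponding physical timing constraint is determined for that component.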
【００３７】
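However, tracking the state of the physical pages in, for example, a shared memory 108 with many memory banks requires a very large number of resource tags and, accordingly, a very large cache memory for the resource tag buffer 114. Retrieving the specific resource tag for a particular memory page, each of which may reside in a remotely located memory, would then take a great deal of time and would reduce the overall operating speed of the universal controller 104. In another embodiment, shown in FIG. 7A, a page hit/miss controller 702 is included in the universal controller 104, and the number M of page registers 704 is set to be less than the number N of memory banks in a multi-bank memory 706, so that not every bank is represented in the M page registers 704. In operation, each of the M page registers 704 stores the address and status data of an open page. A random page register number generator 708 generates a random integer no greater than M, corresponding to one of the page registers, whose contents are then replaced with the address of the newly opened page. A comparator 710 compares the incoming system address with the bank numbers and page addresses of all M registers in parallel, with one of the following four possible results.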
【００３８】
1) If the comparator 710 indicates a hit, the desired page of the requested bank is open and ready for access.
【００３９】
2) If the comparator 710 indicates a bank hit and a page miss, the universal controller 104 must close the old page using the page address from the page register and open the new page using the page address from the system address.
【００４０】
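3) If the comparator 710 indicates a miss on both bank and page, the universal controller 104 must close any old page of the bank whose number is given by the random page register number generator and open the new page using the system address. The desired bank is then accessed.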
【００４１】
4) If both bank and page miss but at least one page register is unused, that register is used and the new page is opened.
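The four outcomes listed above can be modelled with a small register file and random replacement. This is an illustrative sketch, not the circuit of FIG. 7A: the class name, the tuple encoding of operations, and the use of Python's `random.Random` as the random register number generator are assumptions.

```python
import random


class PageHitMissController:
    """M page registers tracking open (bank, page) pairs, M < N banks."""

    def __init__(self, num_registers, rng=None):
        self.regs = [None] * num_registers
        self.rng = rng or random.Random()

    def access(self, bank, page):
        """Return the close/open operations needed before the access."""
        for i, entry in enumerate(self.regs):
            if entry is not None and entry[0] == bank:
                if entry[1] == page:
                    return []                          # case 1: hit
                self.regs[i] = (bank, page)            # case 2: bank hit, page miss
                return [("close", bank, entry[1]), ("open", bank, page)]
        if None in self.regs:                          # case 4: free register
            self.regs[self.regs.index(None)] = (bank, page)
            return [("open", bank, page)]
        victim = self.rng.randrange(len(self.regs))    # case 3: full miss
        old_bank, old_page = self.regs[victim]
        self.regs[victim] = (bank, page)
        return [("close", old_bank, old_page), ("open", bank, page)]


ctl = PageHitMissController(num_registers=2, rng=random.Random(0))
first = ctl.access(0, 1)    # free register
again = ctl.access(0, 1)    # hit
moved = ctl.access(0, 2)    # bank hit, page miss
other = ctl.access(1, 5)    # free register
evict = ctl.access(2, 7)    # full miss, random victim
```

Swapping the random victim choice for a least-recently-used choice would change only the line that picks `victim`.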
【００４２】
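In another embodiment, shown in FIG. 7B, the random page register number generator 708 is replaced by a least recently used (LRU) comparator 712, which determines which of the M registers 704 has gone unused the longest (that is, was least recently used).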
【００４３】
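In addition to tracking the state of the physical pages in the multi-bank memory 706, a bank access controller 800, shown in FIG. 8, includes N bank registers 802 corresponding to the N memory banks in the multi-bank memory 706. Each bank register 802 stores information about its associated bank and includes a bank number field 804 that defines the bank's identification number. The bank register 802 also includes a bank status field 806 indicating the status of the particular bank identified by the bank number in field 804. In a specific embodiment, the bank status field 806 can take the values shown in Table 2.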
【００４４】
[Table 2]

【００４５】
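With the development of packet-oriented high-speed memories, such as synchronous link DRAM (SLDRAM), that transfer bus data at speeds in the range of 400 to 800 Mb/s/pin, the problems caused by memory access conflicts have grown considerably. Referring first to FIG. 9A, an exemplary SLDRAM-based multiprocessor system 900 in accordance with an embodiment of the invention is shown. The multiprocessor system 900 includes processors 902 connected to a universal controller 904 by a system bus 906. The universal controller 904 is in turn connected to a synchronous link DRAM (SLDRAM) 908 and an SLDRAM 910 by an SLDRAM bus composed of a unidirectional command bus 912 and a bidirectional data bus 914. Although only two SLDRAMs are shown in FIG. 9A, any number of SLDRAMs can be connected to the universal controller 904 by the buses 912 and 914. In some cases, the SLDRAMs can take the form of a buffered module containing any suitable number of SLDRAMs such as the SLDRAM 908. An initialization/synchronization (I/S) bus 916 connecting the universal controller 904 to each of the SLDRAMs 908 and 910 provides a signal path for the initialization and synchronization signals generated by the universal controller 904.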
【００４６】
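In another embodiment of the invention, packetized command, address, and control information from the universal controller 904 is selectively sent to the SLDRAMs 908 and 910 on the command bus 912. The data bus 914 is configured to carry packetized write data from the universal controller 904 to a selected one of the SLDRAMs 908 and 910, and also to return packetized read data from a selected one of the SLDRAMs 908 and 910 to the universal controller 904. Note that the command bus 912 and the data bus 914 normally operate at the same rate, for example 400 MB/s/p, 600 MB/s/p, or 800 MB/s/p.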
【００４７】
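The control signals generated by the universal controller 904 and carried by the command bus 912 include, for example, a differential free-running clock signal (CCLK), a FLAG signal, a command address signal CA, a LISTEN signal, a LINKON signal, and a RESET signal. A packet command normally consists of four consecutive 10-bit words, and the first word of a command is marked by a "1" in the first bit of the FLAG signal. In a preferred embodiment, both edges of the differential free-running clock CCLK are used by the SLDRAMs 908 and 910 to latch command words. The SLDRAMs 908 and 910 respond to a high LISTEN signal by monitoring the command bus 912 for incoming commands, and respond to a low LISTEN signal by entering a power-saving standby mode. The LINKON and RESET signals are used, respectively, to shut down and to power up a selected one of the SLDRAMs 908 and 910 into a known, desired state.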
【００４８】
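For the remainder of this discussion, only the SLDRAM 908 is discussed, with the understanding that any number of SLDRAMs deemed appropriate can be connected to the universal controller. As discussed above, a typical SLDRAM device such as the SLDRAM 908 is organized hierarchically by memory bank, column, row, and bit, as well as by memory region. It should be noted that each of these hierarchical levels can in practice have operating characteristics that differ from one another. Such operating characteristics include, but are not limited to, parameters such as memory access time, chip enable time, and data retrieval time. It should also be noted that the banks within a multi-bank memory usually have the same operating characteristics, whereas regions are defined as different devices, such as different memory types or memory groups, each with different command and data latencies. For example, a local memory group can be connected directly to the memory controller, while a second, non-local memory group is located on a board where intervening drivers increase the command and data latencies relative to the local memory group. In other cases, each of the memory chips that make up a multi-chip module can be treated as a different memory region.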
【００４９】
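More specifically, with reference to the system of FIG. 9A, the SLDRAM 908 is a multi-chip module with four memory chips A, B, C, and D, each individually accessible via the command bus 912, the data bus 914, and the I/S bus 916. Each of the memory chips A to D can have different operating characteristics (normally supplied by the manufacturer), and to schedule command and data packets optimally, the universal controller 904 can use the operating characteristics of the particular hierarchical level and/or of the corresponding memory region.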
【００５０】
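By way of example, FIG. 9B shows a timing diagram of a representative SLDRAM bus transaction for the multiprocessor system shown in FIG. 9A. During operation, a processor typically generates processor command packets, such as a read command 950 or a write command 952, to which the appropriate memory bank (or banks) of the SLDRAM 908 responds. The read command 950 and the write command 952 are normally pipelined on the system bus 906 according to the particular requirements of the processor 902 that generates them, not for optimal SLDRAM performance. A system clock CLKsys (not shown) provides the necessary timing signals.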
【００５１】
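In this example, a processor 902a generates the read command 950, with a memory address MA1 located in memory chip A of the SLDRAM 908, and a processor 902b generates the write command 952, with a memory address MA2 also located in memory chip A of the SLDRAM 908. The read command 950 is output to the system bus 906 before the write command 952. The universal controller 904 receives the read command 950 first and begins processing it, based on the command itself and the command address MA1, using destination address specific information stored in the universal controller 904. Once the minimum issue time has been determined, the universal controller 904 generates an SLDRAM command packet READ 960 corresponding to the received processor command 950 and places it on the command bus 912.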
【００５２】
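In general, an SLDRAM command packet is organized as four 10-bit words, as shown in Table 3 for a 64M SLDRAM with 8 banks, 1024 row addresses, and 128 column addresses. As shown, the bank address (BNK) is 3 bits, the row address (ROW) is 10 bits, and the column address (COL) is 7 bits. Many other organizations and densities are possible and can be accommodated within the 40-bit format shown, or within any other format that may be defined as appropriate. During power-up, the universal controller 904 organizes the command packets based on polling the SLDRAMs for factors such as the number of banks, rows, and columns and the associated operating characteristics, which are then stored by the universal controller 904.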
【００５３】
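The first word of the command packet contains the chip ID bits. An SLDRAM ignores any command that does not match its local ID. Chip IDs are assigned by the universal controller 904 at power-on, using the initialization and synchronization signals. In this way, the universal controller 904 uniquely addresses each SLDRAM in the multiprocessor system 900 without generating separate chip enable signals or glue logic.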
【００５４】
[Table 3]

【００５５】
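Because the read command 950 and the write command 952 are pipelined, the universal controller 904 receives the write command 952 (or may already hold it in a buffer) some time after receiving the read command 950, and then issues an SLDRAM command packet WRITE 962 corresponding to the write command 952. Because both commands access the same bank (A), the universal controller 904 uses the characteristic data specific to MA2, together with the issue time of the READ command packet 960, to generate the minimum issue time and the data offset for WRITE 962, thereby avoiding interference with the previously issued READ command packet 960.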
【００５６】
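In this way, the universal controller 904 can dynamically schedule the issue of SLDRAM command packets based at least on the current state of the command and data packet streams and on the operating characteristics of the particular destination address device.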
【００５７】
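Reference is now made to FIG. 10, a block diagram of a memory controller 1000 in accordance with an embodiment of the invention. It should be noted that the memory controller 1000 is only one possible embodiment of the universal controller 104 shown in FIG. 1 and should not be construed as limiting the scope of the invention. The memory controller 1000 includes a system interface 1002 that connects the processor 902, via the system bus 906, to a memory scheduler 1006 (referred to as the scheduler). In one embodiment of the invention, the system interface 1002 is arranged to transfer both the memory command packets generated by the processor 902 and their associated write data packets to the memory command packet scheduler 1004. When the scheduler 1006 indicates that its internal buffers are full and cannot accept new commands, the system interface 1002 holds any new command until the scheduler 1006 indicates that it is ready to accept it.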
【００５８】
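A synchronous link media access controller (SLiMAC) 1008 provides the physical interface between the scheduler 1006 and the SLDRAM 908. More specifically, the SLiMAC 1008 includes a command interface 1010 and a data interface 1012 that connect it to the SLDRAM 908 via the command bus 912 and the data bus 914, respectively. In a preferred embodiment of the invention, the command interface 1010 transfers memory commands from the SLiMAC 1008 to the SLDRAM 908 together with the associated command clock CCLK. In some embodiments, the SLiMAC 1008 incorporates a clock doubler that uses an interface clock signal ICLK (which can run at approximately 100 MHz) to generate the command clock signal CCLK, which normally runs at 200 MHz.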
【００５９】
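In one embodiment of the invention, the data interface 1012 both receives and transmits data on the data bus 914. Note that the data bus 914 can be made wide enough to support as many SLDRAMs as required, and the SLiMAC can include as many data interfaces as are needed to supply the required bandwidth. For example, if the data bus 914 is 32 bits wide (16 bits per SLDRAM, say), the SLiMAC 1008 can include two data interfaces, each controlling the 16 bits associated with one SLDRAM. In this way, the size of the data interfaces included in the SLiMAC 1008 can be matched closely to the particular configuration of the SLDRAMs connected to it.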
【００６０】
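In much the same way as for the command interface 1010, the SLiMAC 1008 can supply a data clock signal DCLK that accompanies the read data sent from the SLDRAM 908 to the SLiMAC 1008. In one embodiment of the invention, the data clock DCLK is generated using the clock doubler to increase the interface clock ICLK frequency from approximately 100 MHz to approximately 1000 MHz. Note also that the interface clock signal ICLK, the command clock signal CCLK, and the data clock signal DCLK are all phase synchronous.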
【００６１】
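In a preferred embodiment of the invention, the scheduler 1006 includes a restriction block 1016 arranged to receive system commands and their associated system address data from the system interface 1002 to which it is connected. The restriction block 1016 supplies SLDRAM command packet data and the associated timing information to a reordering block 1018. A write buffer 1020 receives write data from the system interface 1002. As directed by the scheduler 1006, read data is transferred from the data interface 1012 through a read buffer 1022, which is connected to the data bus 914 and arranged to supply the read data to the system interface 1002. An I/S block 1024, connected to the initialization/synchronization (I/S) bus 916, supplies the appropriate initialization and/or synchronization signals to the SLDRAM 908 as required.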
【００６２】
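In operation, the scheduler 1006 receives the pipelined memory command packets generated by the processor 902. A memory command packet normally consists of a memory command and its associated memory address. In one embodiment of the invention, the scheduler 1006 decodes the memory address associated with a newly received command to determine the destination address to which the memory command and its associated data packet (if any) are directed. Once the address is decoded, the scheduler 1006 uses the destination address specific device characteristic data stored in it, together with information about the previously issued memory command, to issue a new SLDRAM command packet. The new SLDRAM command packet is output to the command bus 912 and ultimately to the SLDRAM identified by the chip ID contained in the packet.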
【００６３】
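As part of the scheduling process, the scheduler 1006 determines the minimum amount of time that must elapse after the previously issued command before the new command is issued. As noted above, each hierarchical level of an SLDRAM, such as a memory bank, can have different operating characteristics (normally supplied by the manufacturer), so during initialization the scheduler 1006 polls each of the SLDRAMs it serves. In some embodiments, if a connected memory device does not allow polling to determine its operating characteristics, the memory-specific parameters (such as timing) can be written directly into the restriction block registers 1016. Once the SLDRAMs have been polled, the scheduler 1006 stores the device-specific information, which it later uses to develop the appropriate scheduling protocol. In this way, the scheduler 1006 can provide a scheduling service that adapts to any number and type of SLDRAMs without hardwiring and without additional time-consuming or costly procedures.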
【００６４】
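FIG. 11 schematically illustrates a restriction block 1100 in accordance with an embodiment of the invention. Note that the restriction block 1100 is not the only possible embodiment of the restriction block shown in FIG. 10 and should not be construed as limiting. The restriction block 1100 includes an address decoder 1102, connected to the system interface 1002, that is arranged to decode the new address signal received with a new memory command generated by the processor 902. The decoded new address signal is input to an array tag register 1104, which stores the status and other relevant information for all, or in some cases only a subset, of the relevant SLDRAM memory banks. The array tag register 1104 supplies input to a selector 1106, which, based on the decoded new command, passes the data for the selected virtual bank to a look-up table (LUT) 1108.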
【００６５】
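The restriction block 1100 also includes a region comparator 1110, likewise connected to the system interface 1002, that uses the received new address signal to supply a region identifier indicating the memory region in which the new command address resides. In this way, the restriction block 1100 can provide the best scheduling protocol for the new memory command based, at least in part, on memory region specific characteristic data. The region comparator 1110 supplies the region identifier, together with the new command signal, as input to the LUT 1108. The LUT 1108 in turn supplies the minimum delta issue time and the data offset used to convert the new command and its associated new address into an SLDRAM command packet. Note that the minimum delta issue time is the delta time (in clock cycles) at which the new command may be issued relative to the previously issued old command. The data offset is the delta time, in clock cycles, between issuing the new command and receiving the read data packet associated with it.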
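The LUT lookup described here takes a region identifier, the old command, and the new command, and returns a minimum delta issue time and a data offset. A minimal sketch is given below. The table contents are placeholders, except that the three-cycle gap between a page open and a read in region 1 follows the FIG. 12 walk-through in this description; the function names and the data offset values are assumptions.

```python
# (region, old_command, new_command) -> (min_delta_issue, data_offset),
# both in command clock cycles. Values are illustrative placeholders.
TIMING_LUT = {
    (1, "open", "read"): (3, 4),
    (1, "read", "read"): (1, 4),
    (1, "read", "write"): (2, 0),
}


def lookup_timing(region, old_command, new_command):
    """Model of the LUT: return (min_delta_issue, data_offset)."""
    return TIMING_LUT[(region, old_command, new_command)]


def earliest_issue(last_issue_cycle, region, old_command, new_command):
    """Return (issue_cycle, data_cycle) for the new command."""
    min_delta, data_offset = lookup_timing(region, old_command, new_command)
    issue_cycle = last_issue_cycle + min_delta
    return issue_cycle, issue_cycle + data_offset


issue_cycle, data_cycle = earliest_issue(10, 1, "open", "read")
```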
【００６６】
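In one embodiment of the invention, the restriction block 1100 includes 16 array tag bank registers, and the LUT 1108 can store four different parameter sets, one for each of four timing regions, each with 16 associated registers.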
【００６７】
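FIG. 12 is a timing diagram 1200 of SLDRAM bus signals in response to received processor commands, in accordance with one embodiment of the invention. Note that Table 4 summarizes the scheduling process performed by the restriction block by identifying the various signals generated. Note also that memory commands take the form {command, address}, where "command" is the instruction to be executed and "address" is the associated memory location.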
【００６８】
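Referring now to Table 4 and FIG. 12: during system clock cycle Φ1, the first command, {open page, 1000}, is received at the address decoder 1102 and, in parallel, at the region comparator 1110. In this example, the address decoder 1102 decodes the open page command address "1000" as "100" and "400", which the region comparator 1110 determines to lie within memory region 0. Because the open page command is the first command received, there is no "hit" in any of the virtual banks B0-13, and the corresponding replacement counter is set to "0". In this embodiment, the replacement counter is updated using a pseudo-random counting scheme; other embodiments use random counting or another suitable scheme. Because the first command, {open page, 1000}, is an open type command, it has no associated minimum delta issue time or data offset, and the page at address 1000 is therefore opened in the first command clock cycle ΦC1.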
【００６９】
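During the next system clock cycle, Φ2, a {read, 1000} command (that is, a read of the page at memory address location 1000 opened in the previous clock cycle) is received at the restriction block 1100. The address decoder 1102 decodes it as 100 and 400, and these values cause the region comparator 1110 to set the region identifier to region 1. In this case, however, the previous, or "old", command stored in the B0 register produces a "hit" at B0, so the selector outputs "read" as the "old command" input to the LUT 1108. The other inputs are the region identifier "region 1" issued by the region comparator 1110 and the "new command" input, which is a read. Using the stored characteristic data, the LUT 1108 generates a minimum delta issue time of three command clock cycles (Φ3), meaning that at least three command clock cycles must separate the issue of the {page open, 1000} command from that of the associated {read, 1000} command.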
【００７０】
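In this way, each memory command packet received at the restriction block 1100 is processed according to the characteristic data stored in the LUT 1108 and based, at least in part, on the previously issued command.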
【００７１】
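Reordering of the commands received from the restriction block, in accordance with a specific embodiment of the invention, is described next. FIGS. 13A to 13C show timelines 1302, 1304, and 1306, which use a simple command reordering example to illustrate some of the benefits of reordering memory commands in accordance with a specific embodiment of the invention. Each timeline shows four read commands directed to two different memory banks. CMD0 and CMD1 are read commands directed to bank 1 of the associated memory, and CMD2 and CMD3 are read commands directed to bank 2. Timeline 1302 shows the memory commands placed on the command bus connecting the memory controller to the memory in the order in which the memory controller receives them from the system processor: CMD0 occupies time slot 0, CMD1 time slot 3, CMD2 time slot 4, and CMD3 time slot 7. Each time slot represents one clock cycle.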
【００７２】
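As mentioned above, commands to the same memory bank require a minimum delay between issues so that the previously issued command can be processed. In FIG. 13A this is represented by the two time slots between each pair of commands. As can be seen, if the four read commands are sent to the memory in the order shown in FIG. 13A, the command bus goes unused for four available clock cycles, namely time slots 1, 2, 5, and 6. As discussed below, at least some of this inefficiency can be eliminated by reordering commands in accordance with the invention.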
【００７３】
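Timelines 1304 and 1306 of FIGS. 13B and 13C, respectively, illustrate the reordering of the commands of FIG. 13A in accordance with a specific embodiment of the invention, and at least some of the resulting benefits. For simplicity, data bus conflicts are not considered in this example; as discussed below, however, they must be taken into account for effective reordering of memory commands. Because CMD2 and CMD3 are directed to a different memory bank from CMD0 and CMD1, the memory access latency between the two pairs of commands is irrelevant and can be ignored. The commands can therefore be rearranged as shown in timeline 1304, with CMD2 placed in time slot 1, immediately after CMD0, and CMD3 placed in time slot 4, immediately after CMD1. No delay is needed between the issue of CMD0 and CMD2, or between CMD1 and CMD3, because they are directed to different memory banks. As shown in FIG. 13C, however, a minimum delay, for example two clock cycles, must still be maintained between commands directed to the same bank; that is, command reordering does not attempt to reduce the delay between consecutive commands to the same memory bank.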
【００７４】
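The result of the reordering is shown in FIG. 13C, where the four commands are issued within five clock cycles and only time slot 2 goes unused. A fifth memory command, directed to yet another memory bank, could of course be inserted in time slot 2 to further increase the efficiency with which the command bus is used.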
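The slot assignments in this example can be reproduced with a small simulation. The sketch below compares strict in-order issue with a greedy "oldest ready command first" policy; the greedy policy and the function names are assumptions made for illustration, and data bus conflicts are ignored just as in the example.

```python
def in_order_slots(banks, gap):
    """Issue commands strictly in arrival order; return each issue slot."""
    slots, ready, t = [], {}, 0
    for bank in banks:
        t = max(t, ready.get(bank, 0))   # wait until the bank is ready
        slots.append(t)
        ready[bank] = t + 1 + gap        # same-bank commands need `gap` idle slots
        t += 1
    return slots


def reordered_slots(banks, gap):
    """Each cycle, issue the oldest pending command whose bank is ready."""
    pending = list(enumerate(banks))
    ready, slots, t = {}, {}, 0
    while pending:
        for item in pending:
            index, bank = item
            if ready.get(bank, 0) <= t:
                slots[index] = t
                ready[bank] = t + 1 + gap
                pending.remove(item)
                break                    # at most one command per slot
        t += 1
    return [slots[i] for i in range(len(banks))]


banks = [1, 1, 2, 2]                     # CMD0, CMD1 -> bank 1; CMD2, CMD3 -> bank 2
before = in_order_slots(banks, gap=2)
after = reordered_slots(banks, gap=2)
```

In-order issue places the commands in slots 0, 3, 4, and 7, while the reordered schedule places CMD0 to CMD3 in slots 0, 3, 1, and 4, leaving only slot 2 idle.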
【００７５】
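FIG. 14 is a block diagram of part of a memory controller configured in accordance with a specific embodiment of the invention. A reordering circuit 1400 receives an incoming sequence of memory commands, namely 1, 2, 3, from the system processor. According to a specific embodiment, the memory commands are passed to the reordering circuit 1400 through a restriction circuit (not shown), which, as described above, imposes issue time constraints on selected commands relative to other commands directed to the same logical bank of the associated memory. The commands are reordered in a command queue 1402, from which they are issued to the memory. In this example, the commands are reordered into the sequence 1, 3, 2.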
【００７６】
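The original memory command sequence, namely 1, 2, 3, is stored in a FIFO memory 1404 in a data read circuit 1406. The sequence in the FIFO 1404 is used to reorder the data received from the memory so that it corresponds to the order in which the commands were originally received by the memory controller. This is needed because a processor "expects" to receive data in an order corresponding to the order in which it originally sent the commands to the memory controller. It should be noted, however, that because some processors expect in-order data while others expect out-of-order data, either type of data ordering can be supported by switching the FIFO 1404 on or off as required.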
【００７７】
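Furthermore, because data from the memory may be received by the memory controller in a sequence that does not correspond to the original sequence in which the processor sent the memory commands, a third sequence is stored in a data queue 1408. This sequence (3, 1, 2 in this example) represents the order in which the data corresponding to the command sequence 1, 3, 2 will be received by the data read circuit 1406. The data queue sequence is computed by the reordering circuit 1400 from the command queue sequence and the known latencies associated with the various logical banks of the memory. When the memory sends data to the memory controller in the sequence stored in the data queue 1408 (that is, 3, 1, 2), the data is stored in a read data buffer 1410 and reordered, based on the information in the FIFO 1404 and the data queue 1408, so that it is sent to the processor in an order corresponding to the original command sequence, namely 1, 2, 3.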
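The interplay of the FIFO, the data queue, and the read data buffer can be sketched as follows. The function name and the use of a dictionary as the read data buffer are assumptions; the sequences 1, 2, 3 and 3, 1, 2 are the ones used in the example.

```python
from collections import deque


def deliver_in_order(fifo_order, arrivals):
    """Return read data in original command order.

    fifo_order: command IDs in the order the processor issued them.
    arrivals:   (command_id, data) pairs in the order the memory returns them.
    """
    fifo = deque(fifo_order)
    buffer = {}                       # stands in for the read data buffer
    delivered = []
    for command_id, data in arrivals:
        buffer[command_id] = data
        # Release data as long as the oldest outstanding command is buffered.
        while fifo and fifo[0] in buffer:
            head = fifo.popleft()
            delivered.append((head, buffer.pop(head)))
    return delivered


result = deliver_in_order([1, 2, 3], [(3, "c"), (1, "a"), (2, "b")])
```

Data for command 3 arrives first but is held until the data for commands 1 and 2 has been delivered.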
【００７８】
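FIG. 15 is a block diagram of a reordering circuit 1500 in a memory controller configured in accordance with a specific embodiment of the invention. The reordering circuit 1500 includes a command queue 1502 that stores and reorders the commands received from the system processor. The command queue 1502 computes the issue time of each command using the command issue time constraints associated with commands directed to the same logical bank of the memory and the data bus usage constraints, issues the commands, and removes issued commands from the queue.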
【００７９】
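A data queue 1504 stores data elements representing the data occurrence times corresponding to issued memory commands, computes a new data occurrence time for each new entry in the queue, and removes a queue entry when the corresponding memory transaction completes.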
【００８０】
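A comparator matrix 1506 performs the collision detection function, in which the data occurrence time of a command in the command queue 1502 that is ready to issue (conveyed through a multiplexer 1508) is compared with the data occurrence times of the previously issued commands represented in the data queue 1504. If a collision is detected, issue of the command is postponed.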
【００８１】
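FIG. 16 is a more detailed block diagram of the reordering circuit 1500 of FIG. 15. The command queue 1502 contains six command queue elements 1602, each of which, as shown in the diagram of FIG. 17, stores 61 bits of information about a particular memory command. A command field 1702 contains the 40-bit memory command packet that specifies the memory command. A command issue time (Cd) field 1704 is a 6-bit field giving the delta time, in clock cycles, before the command can be issued. The value of field 1704 is determined by the restriction circuit described above and relates to the most recent memory command directed to the same logical bank of the memory; that is, the value of the Cd field represents the latency between two commands to the same bank. Information about the latency required for each bank is stored in the restriction circuit and is determined largely by the physical characteristics of the memory. Within the command queue, the Cd field is decremented on each clock cycle, with some exceptions. For example, the latency between consecutive commands to the same logical bank cannot be changed. Therefore, if the Cd field of a command directed to a particular bank reaches zero and the command is not issued, the Cd fields of other commands to the same bank are not decremented until the first command has been issued.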
【００８２】
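A data occurrence time (Dd) field 1706 is a 6-bit field giving the delta time, in clock cycles, between issue of the memory command from the command queue and transfer of the corresponding data. The Dd field 1706 must not be changed while in the command queue. A command ID field 1708 is a 5-bit field that uniquely identifies the command in the command packet 1702. This information is used with the corresponding information in the FIFO and the data queue to keep track of which packet is which, and which data corresponds to which packet, so that command and data reordering can take effect. A logical bank (B) field 1710 is a 3-bit field identifying the logical bank of the memory to which the command packet is directed. Finally, a burst indicator (Db) field 1712 is a 1-bit field indicating whether the requested or written data occupies one or two clock cycles.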
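The field widths of a command queue element (40 + 6 + 6 + 5 + 3 + 1 bits) add up to the stated 61 bits, which the following sketch checks by packing the fields into a single integer. The field order within the word and the function names are assumptions; FIG. 17 is not reproduced here.

```python
# (name, width in bits) for fields 1702, 1704, 1706, 1708, 1710, 1712.
FIELDS = (("packet", 40), ("cd", 6), ("dd", 6),
          ("cmd_id", 5), ("bank", 3), ("db", 1))
TOTAL_BITS = sum(width for _, width in FIELDS)


def pack_element(**values):
    """Pack the six fields into one integer, first field in the high bits."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_element(word):
    """Inverse of pack_element."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out


element = dict(packet=0xABCDEF0123, cd=5, dd=9, cmd_id=17, bank=6, db=1)
word = pack_element(**element)
```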
【００８３】
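Returning to FIG. 16, operation of the command queue is controlled by a command queue controller 1604. The controller 1604 keeps track of which command queue elements 1602 are available and, via a free position indicator 1606, controls the insertion of incoming commands into particular queue elements. The controller 1604 also facilitates insertion of command queue element information into the data queue 1504 when the corresponding command is issued. According to a specific embodiment, commands are inserted into the command queue 1502 regardless of whether free time slots are available on the command bus or data bus.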
【００８４】
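A command can be issued to the command bus from any one of the command queue elements 1602, via a multiplexer 1608, if its Cd count is zero and there is no collision on the data bus. That is, a free time slot must be identified on the command bus and/or the data bus. If the command is neither a read nor a write (and so needs no data bus resources), only a command bus time slot is needed. If the command is a read or a write, slots are needed on both the command bus and the data bus.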
【００８５】
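A zero comparator 1610 in the controller 1604 makes the first determination, namely whether Cd = 0. A subtractor 1612 subtracts "1" from the Cd count of each command queue element 1602 on every clock cycle, except in the case noted above, in which a particular command has Cd = 0 but cannot be issued. In that case, the queue controller 1604 uses the Cd and B fields of all the queue elements to generate a mask signal (M) that prevents the Cd counts of all commands to the same logical bank from being decremented.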
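The masked decrement can be illustrated with a per-cycle update over the queue elements. In this sketch the function is assumed to run after the issue attempt for the cycle, so any element still in the queue with Cd = 0 is one that could not be issued; the dictionary representation and the function name are illustrative.

```python
def tick(queue):
    """Decrement Cd once per clock for every element whose bank is not masked.

    queue: list of dicts with 'bank' and 'cd' keys, holding only commands
    that have not been issued yet.
    """
    # Banks with an unissued Cd == 0 command are masked (signal M).
    masked_banks = {e["bank"] for e in queue if e["cd"] == 0}
    for element in queue:
        if element["cd"] > 0 and element["bank"] not in masked_banks:
            element["cd"] -= 1


queue = [
    {"bank": 1, "cd": 0},   # ready but blocked, e.g. by a data bus collision
    {"bank": 1, "cd": 2},   # same bank: held by the mask
    {"bank": 2, "cd": 2},   # other bank: counts down normally
]
tick(queue)
```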
【００８６】
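According to a specific embodiment, if two queue elements have Cd = 0, the one with the highest priority (for example, the oldest) is issued. An address shifter 1614 determines the priority of the commands in the queue, as discussed in detail below with reference to FIG. 18. According to another specific embodiment, if a new command arrives at the command queue with a Cd count that is already zero, it is passed directly to the memory via the multiplexer 1608. The new command is stored in a command queue element 1602 if its Cd count is not zero or if the command queue holds another command with Cd = 0 and a higher priority. If the command queue is empty, however, the new command is issued immediately (provided Cd equals zero).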
【００８７】
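For read and write commands, collisions are detected using the Dd and Db fields of the command queue element 1602 containing the command that is ready to issue. The data occurrence time and duration corresponding to the command are passed to the comparator matrix 1506 via the multiplexer 1508, which is controlled by the queue controller 1604. That is, the queue controller 1604 controls the multiplexer 1508 so as to pass on the data occurrence time and duration (one or two clock cycles) of the queue element whose command issue time Cd is zero. The duration is one or two clock cycles and is obtained by an adder 1616 adding the Db bit to the data occurrence time Dd, where a Db of "0" indicates one clock cycle and a Db of "1" indicates two clock cycles. The data occurrence time and duration are then compared in the comparator matrix 1506 with those of the five previously issued commands stored in the data queue 1504. According to a specific embodiment, the comparator matrix 1506 includes a 2 × 10 matrix of parallel comparators.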
【００８８】
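FIG. 18 is a block diagram of a specific embodiment of the address shifter 1614 of FIG. 16. As noted above, the address shifter 1614 determines the priority of the commands. Also as noted above, new commands are inserted into any free command queue element 1602 according to the free position indicator 1606. The address of the command queue element 1602 into which a new command is inserted is placed in the first free position (A0 to A5) with the highest priority. As a result, position A0 of the address shifter holds the queue element address of the oldest unissued command. When a command is issued from the command queue, the corresponding entry in the address shifter 1614 is removed and the addresses of lower-priority commands are shifted into higher-priority positions. As described above, a command is issued when its Cd count in the command queue reaches zero. If more than one command has Cd = 0, however, the oldest command, that is, the command with the highest priority as indicated by the position of its address in the address shifter 1614, is issued.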
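The address shifter behaves like an age-ordered list of queue element addresses. The sketch below models it with a Python list in which index 0 plays the role of position A0; the class and method names are hypothetical.

```python
class AddressShifter:
    """Keeps queue element addresses ordered from oldest (A0) to newest."""

    def __init__(self):
        self.positions = []              # index 0 corresponds to A0

    def insert(self, element_address):
        # A new command takes the first free (lowest-priority) position.
        self.positions.append(element_address)

    def issue(self, cd_by_element):
        """Remove and return the oldest element with Cd == 0, or None."""
        for pos, address in enumerate(self.positions):
            if cd_by_element[address] == 0:
                del self.positions[pos]  # younger entries shift up
                return address
        return None


shifter = AddressShifter()
for address in (4, 0, 2):                # commands arrive in elements 4, 0, 2
    shifter.insert(address)
first = shifter.issue({4: 1, 0: 0, 2: 0})
second = shifter.issue({4: 0, 2: 0})
```

Elements 0 and 2 both have Cd = 0 on the first call, and the older of the two (element 0) is chosen.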
【００８９】
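Data queue 1504 of FIG. 16 includes five queue elements 1652, each of which holds 12 bits of information about a previously issued memory command, as illustrated in FIG. 19. Data occurrence time (Dd) field 1902 is a 6-bit field giving the delta time, in clock cycles, between issuance of a command from the command queue and reception of the corresponding data. The Dd count of each data queue element 1652 is decremented on every clock cycle by one of subtracters 1654 until it reaches zero. When Dd = 0, the corresponding data is on the data bus. It will therefore be understood that at any given time only one data queue element 1652 has Dd = 0. After the Dd count reaches zero, the information in the corresponding data queue element is removed from data queue 1504.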
【００９０】
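Command ID field 1904 is a 5-bit field that uniquely identifies the issued command to which the data corresponds. This information is used to reorder the data so that it matches the original order in which the commands were first sent to the memory controller. Finally, burst indicator (Db) field 1906 is a 1-bit field indicating whether the data occupies one or two clock cycles.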
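The 12-bit data queue element (6-bit Dd, 5-bit command ID, 1-bit Db) can be illustrated with simple bit packing. The field widths come from the text; the bit ordering within the word is an assumption made for this sketch, as are the function names.

```python
DD_BITS, ID_BITS, DB_BITS = 6, 5, 1  # widths stated for fields 1902, 1904, 1906

def pack_element(dd, cmd_id, db):
    """Pack Dd, command ID and Db into one 12-bit word (assumed order: Dd|ID|Db)."""
    if not (0 <= dd < 1 << DD_BITS and 0 <= cmd_id < 1 << ID_BITS and db in (0, 1)):
        raise ValueError("field out of range")
    return (dd << (ID_BITS + DB_BITS)) | (cmd_id << DB_BITS) | db

def unpack_element(word):
    """Recover (Dd, command ID, Db) from a packed 12-bit word."""
    db = word & 1
    cmd_id = (word >> DB_BITS) & ((1 << ID_BITS) - 1)
    dd = word >> (ID_BITS + DB_BITS)
    return dd, cmd_id, db
```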
【００９１】
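Returning to FIG. 16, and as described above, the data occurrence time (Dd) and duration of each data queue element 1652 are compared in comparator matrix 1506 with the Dd and duration of the command that is ready to issue, i.e., the command in command queue 1502 with Cd = 0. The duration is one or two clock cycles and is derived by adder 1616 adding the Db bit to the data occurrence time Dd, where Db is "0" for one clock cycle and "1" for two clock cycles. If the comparison shows that there is no collision on the data bus, the command is issued from the command queue.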
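The comparison performed by the comparator matrix amounts to an overlap test on data bus cycles. The sketch below assumes each transfer is described by a `(Dd, Db)` pair and occupies cycle Dd, plus cycle Dd + 1 when Db is set; the function names are hypothetical, and the sets stand in for what the hardware does with parallel comparators.

```python
def occupied_cycles(dd, db):
    """Data bus cycles (relative to now) used by a transfer."""
    return {dd, dd + 1} if db else {dd}

def collides(candidate, issued):
    """True if the candidate (Dd, Db) overlaps any already-issued (Dd, Db)."""
    cand = occupied_cycles(*candidate)
    return any(cand & occupied_cycles(dd, db) for dd, db in issued)
```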
【００９２】
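Data queue controller 1658 controls the operation of data queue 1504. Free position indicator 1660, together with command queue controller 1604, facilitates insertion of new data queue element information into data queue elements 1652. The free position indicator also facilitates removal of information from a data queue element 1652 when the corresponding memory access is complete. Zero comparator 1662 and burst indicator 1664 are used to determine when Dd for any data queue element 1652 reaches zero, when the data transfer no longer occupies the data bus, and therefore when the corresponding information should be removed from the data queue.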
【００９３】
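According to another specific embodiment of the invention, collision detection is made more sophisticated through the use of a two-dimensional array of comparators and multiplexers. This approach is more silicon-intensive than the one-dimensional approach described above and looks at all elements in the command queue, not just the one holding the command that is ready to issue. It schedules commands not only with respect to previously issued commands but also with respect to the order of data packets on the data bus.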
【００９４】
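To insert a new command, each pair of consecutive stages in the to-be-issued portion of the command pipe must be compared to see whether the new command can be inserted between them. This comparison in effect determines the range within which the command can be inserted. The range is as follows.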
【００９５】
CLENX = command length (of element X)
Tcstart = tcA + CLENA    (1)
Tcend = tcB    (2)
【００９６】
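Here tcA and tcB are the issue times of consecutive pipeline elements A and B. Pipeline element A precedes pipeline element B, so its issue time is the lower of the two. For an insertion to be possible, there must of course be at least one open slot between elements A and B. Thus: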
【００９７】
N = Tcend - Tcstart + 1    (3)
(where N = number of issue slots between elements A and B)
LEN <= tcB - tcA - CLENA    (4)
【００９８】
In hardware, the following condition is simple to implement:
【００９９】
(tcB - LEN) - (tcA + CLENA) >= 0    (5)
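The insertion test of inequality (4), rearranged into a single non-negativity check, is easy to express in software. The following sketch assumes the pending command pipe is a list of `(issue_time, length)` pairs sorted by issue time; the function names and this representation are illustrative and not taken from the specification.

```python
def fits_between(tc_a, clen_a, tc_b, new_len):
    """Inequality (4) rearranged: LEN <= tcB - tcA - CLENA."""
    return (tc_b - new_len) - (tc_a + clen_a) >= 0

def first_gap(pipe, new_len):
    """Return the index before which a command of new_len cycles fits, or None."""
    for i in range(len(pipe) - 1):
        (tc_a, clen_a), (tc_b, _) = pipe[i], pipe[i + 1]
        if fits_between(tc_a, clen_a, tc_b, new_len):
            return i + 1
    return None
```

Scanning from the front of the pipe returns the earliest usable gap, which matches the stated goal of choosing the earliest possible slot; the associated data slot check is not modeled here.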
【０１００】
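The start and end points of the range also determine the possible range of the associated data slot. This range must be compared with each pair of consecutive elements in the data pipe to see whether there is any overlap and what the new range would be. Five distinct cases exist for this comparison.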
【０１０１】
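Case 0:
In this case, the range bounded by data slots tdA and tdB lies entirely outside the range of two consecutive elements M and N. In this case, therefore:
【０１０２】
tdA + CLENA >= tdN    (6)
or, where DLENX = data length,
tdB <= tdM + DLENM    (7)
【０１０３】
There is no possible data slot between the pair M and N.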
【０１０４】
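Case 1:
In this case, the range bounded by data slots tdA and tdB lies entirely inside the range of two consecutive elements M and N. In this case, therefore:
【０１０５】
tdA + CLENA >= tdM + DLENM    (8)
and
tdB - CLEN + DLEN <= tdN    (9)
(where CLEN is the length of the new command in the slot and DLEN is the length of the new data in the slot)
【０１０６】
The earliest possible data slot time in this case is tdA + CLENA, with a corresponding command issue time of tcA + CLENA.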
【０１０７】
Case 2:
In this case, the range bounded by data slots tdA and tdB spans element M. In this case, therefore:
【０１０８】
tdA + CLENA < tdM + DLENM    (10)
and
tdB - CLEN + DLEN > tdM + DLENM, and tdB - CLEN + DLEN < tdN    (11)
【０１０９】
The earliest possible data slot time in this case is tdM + DLENM + 1, with a corresponding command issue time of tcM + CLENM - DATA_OFFSET, where DATA_OFFSET is the time between command issue and data occupancy.
【０１１０】
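Case 3:
In this case, the range bounded by data slots tdA and tdB spans element N. In this case, therefore:
【０１１１】
tdA + CLENA > tdM + DLENM    (12)
and
tdA + CLENA + DLEN < tdN    (13)
【０１１２】
The earliest possible data slot time in this case is therefore tdA + CLENM, with a corresponding command issue time of tcA + CLENA + 1. Note that this case also includes Case 1.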
【０１１３】
Case 4:
In this case, the range bounded by data slots tdA and tdB encompasses the range defined by elements M and N. In this case, therefore:
【０１１４】
tdA + CLENA < tdM + DLENM    (14)
and
tdB - LEN > tdN    (15)
【０１１５】
The earliest possible data slot time in this case is therefore tdM + DLENM, with a corresponding command issue time of tcM + CLENA + DATA_OFFSET, where DATA_OFFSET = tdA - tcA.
【０１１６】
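Given that the goal of scheduling is always to choose the earliest possible slot, it is clear that Cases 1 and 3 are identical; the combined case is Case 3. Likewise, Cases 2 and 4 are identical, because the required result is tdM + DLENM. In this case it need only be shown that tdM falls within the range given by tdA and tdB. In addition, the earliest possible issue time (tc) and data slot (td) of the incoming command must be taken into account. For each command pipe pair, the comparisons that must be made at each data pipe are as follows: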
【０１１７】
【数１】

【０１１８】
The operations required for the command pipe are therefore:
【０１１９】
【数２】

【０１２０】
Similarly, the operations required for the data pipe are:
【０１２１】
【数３】

【０１２２】
This decision logic therefore consists of the matrix of comparators defined above. The optimum choice is the earliest command issue time, which is determined by a simple priority encoder.
【０１２３】
The reordering pipe control logic must dynamically determine which operation is to be performed on each element of the command pipe and the data pipe.
【０１２４】
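In the pending command pipe, each pipe element can perform one of four operations: read from the previous element (the pipe advances), hold its current contents (the pipe holds), read from the next element (the pipe backs up), or read from the incoming command bus. Multiple sets of conditions can exist at various points in the pipe, as defined by the four cases below. The element that issues to the SLiMAC is defined as element 0, and the element farthest from issue is defined as element M. If the reordering decision logic finds that the optimum insertion point in the current pipeline is between elements N-1 and N, the insertion is made at element N.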
【０１２５】
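Case 1 - Hold:
If there is no issue to the SLiMAC and no insertion of a new command, the pipe holds.
【０１２６】
Case 2 - Hold and insert:
In this case there is no issue to the SLiMAC, but a new command is inserted into the pipe. If the insertion occurs at element N, the pipe holds elements 0 through N-1, inserts at element N, and backs up elements N+1 through M.
【０１２７】
Case 3 - Issue:
In this case element 0 issues to the SLiMAC and the rest of the pipe advances, so that element 0 takes the contents of element 1, element 1 takes the contents of element 2, and so on until element M-1 takes the contents of element M.
【０１２８】
Case 4 - Issue and insert:
In this case element 0 issues to the SLiMAC and there is an insertion at element N. Elements 0 through N-2 advance, element N-1 performs the insert, and elements N through M hold. An element that advances stores the data from the element immediately after it. Because the new element goes between elements N-1 and N of the current pipe, an insertion at element N actually means that the inserted element ends up at position N-1 of the updated pipe.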
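The four pipe-control cases can be checked against a small list-based model in which index 0 is the element that issues. This is only a behavioral sketch: the function name and arguments are hypothetical, and real hardware would apply the per-element advance, hold, back-up, and load operations in parallel rather than copy a list.

```python
def update_pipe(pipe, issue=False, new_cmd=None, n=None):
    """Return (issued_command, updated_pipe).

    n is the insertion point expressed in the *current* pipe, i.e. the new
    command goes between current elements n-1 and n.
    """
    nxt = list(pipe)
    issued = nxt.pop(0) if issue and nxt else None  # Cases 3 and 4: advance
    if new_cmd is not None:                         # Cases 2 and 4: insert
        lowest = 1 if issued is not None else 0
        if n is None or n < lowest or n > len(pipe):
            raise ValueError("invalid insertion point")
        # After an issue, everything has moved up one position.
        nxt.insert(n - 1 if issued is not None else n, new_cmd)
    return issued, nxt
```

With `issue=True` and `n=2`, the inserted command lands at index 1 of the updated pipe, which is the N-1 position described in Case 4.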
【０１２９】
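FIG. 20 illustrates collision detection system 2000, which is another implementation of collision detection system 1500 shown in FIG. 15. In this embodiment, collision detection system 2000 reorders commands to obtain an optimum command sequence based on target response restrictions and determines the optimum slot for data transfer between the initiator controller and the target subsystem. Because command reordering must not cause collisions between different data packets on the data bus, a collision detector 2002 is required that inhibits issuance of a particular command if the data transfer associated with that command would cause a data conflict. In this embodiment, collision detection system 2000 includes collision detector 2002 coupled to command queue 2004.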
【０１３０】
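In this embodiment, collision detector 2002 detects all possible data collisions between "to be issued" commands (stored in command queue 2004) and "already issued" commands (stored in data queue 2006). In this embodiment there are N command queues 2004, each coupled to multiplexer 2008. Each of the N command queues 2004 is configured to store the commands to be issued, a time factor "d-timeND" indicating when the data transfer will appear on the data bus between universal controller 104 and target device (i.e., shared resource) 108 after the command has been issued to the target device, a burst bit (bND) indicating a burst data transfer, and a read/write bit (rwND). In this embodiment, data queue 2006 stores, for requests already issued to the target device, a time factor "d-timeD" indicating when the data transfer will appear on the data bus between universal controller 104 and target device (i.e., shared resource) 108. Data queue 2006 also stores a burst bit (bND) and a read/write bit (rwND).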
【０１３１】
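In a more preferred embodiment, collision detection system 2000 includes queue and link controller unit 2010, which is configured to store and reorder the commands to be issued. Queue and controller unit 2010 also calculates the issue time of a new command and the time at which its data will appear on the data bus. Queue and controller unit 2010 also transfers the data for an issued command from the command queue to the data queue and removes the command from the command queue once it has been issued. In addition, it removes data elements from the data queue once the access to memory is complete.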
【０１３２】
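Referring to FIG. 21, every read/write command to the target device is associated with a data packet transfer. Before a command is issued to the target device, the new data packet ND (New Data) is checked against its timing information to see whether it can be inserted into the data queue without causing a collision. In the example shown in FIG. 21, issued data packet D already occupies a place in the data queue, and new data packet ND is compared against it. Note that both issued data packet D and new data packet ND represent burst accesses. In this example, therefore, there are two ways in which new data packet ND can be placed relative to issued data packet D without causing a data collision: to the left of issued data packet D or to the right of it.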
【０１３３】
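This particular example illustrates collision detection for a memory controller that supports both non-burst data transfers and burst data transfers (i.e., four data streams). Because the data bus is bidirectional, one clock cycle must be inserted between consecutive read-write or write-read transfers.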
【０１３４】
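Note that there are many possible outcomes, some of which are listed below.
【０１３５】
1) No collision occurs if ND is placed after or before D.
【０１３６】
2) One clock cycle must be inserted between consecutive read-write or write-read data transfers. Every element of the command queue and the data queue stores an "rw" bit indicating whether the operation is "read data" (rw = 0) or "write data" (rw = 1).
【０１３７】
3) A data packet consists of one data stream (non-burst transfer) or four data streams (burst transfer). Every element of the command queue and the data queue stores a "burst" bit indicating whether the operation is a "burst transfer" (burst = 1) or a "non-burst transfer" (burst = 0).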
【０１３８】
For each pair of a to-be-issued data packet and an issued data packet, the comparisons to be made for the to-be-issued command are as follows:
【０１３９】
【数４】

【０１４０】
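In yet another embodiment of the invention, an apparatus and method for predicting the time between two consecutive memory accesses are disclosed. The apparatus and method allow fast calculation of the earliest "command issue time" for a new command. FIG. 22 illustrates prediction system 2200, which has N page timers 2202 that store the time between the last command issued to a particular page and the predicted next access to that memory. The next access to the same page may be a "close", "open", "read", or "write". An incoming new command (e.g., a read) selects one particular page timer, which indicates how long an access to that particular page must wait before it is issued. The new command then selects from timing lookup table 2204 the appropriate contents to be inserted between this command (read) and the possible next accesses (close, open, write, read) to the same page. The timer resolution is one clock cycle.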
【０１４１】
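Timing lookup table - data stores the time indicating how many cycles after command issue the data on the data bus is valid. If the new command is inactive, the values of all page timers are counted down on every cycle until they reach "0".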
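One way to picture the page timers is as per-page countdowns that are reloaded from the lookup table whenever a command is issued to that page. Everything concrete in the sketch below is an assumption made for illustration: the class and method names, the choice to keep one countdown per possible next command, and above all the numeric values in `TIMING_LUT`, which are invented and do not describe any real device.

```python
COMMANDS = ("close", "open", "read", "write")

# Hypothetical gaps, in clock cycles, between a command just issued to a page
# (outer key) and each possible next access to the same page (inner key).
TIMING_LUT = {
    "close": {"close": 1, "open": 3, "read": 5, "write": 5},
    "open":  {"close": 4, "open": 6, "read": 2, "write": 2},
    "read":  {"close": 2, "open": 5, "read": 1, "write": 3},
    "write": {"close": 3, "open": 6, "read": 3, "write": 1},
}

class PageTimers:
    def __init__(self, n_pages):
        self.timers = [dict.fromkeys(COMMANDS, 0) for _ in range(n_pages)]

    def wait_cycles(self, page, command):
        """Cycles the given command must still wait before issuing to the page."""
        return self.timers[page][command]

    def issue(self, page, command):
        """Reload the page's timer from the lookup table for the issued command."""
        self.timers[page] = dict(TIMING_LUT[command])

    def tick(self):
        """One clock with no new command: count every timer down toward zero."""
        for timer in self.timers:
            for command in COMMANDS:
                if timer[command] > 0:
                    timer[command] -= 1
```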
【０１４２】
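Referring next to FIG. 23, in yet another embodiment of the invention, device controller 2300 having device access prioritizer 2302 in accordance with an embodiment of the invention is shown. In this embodiment, prioritizer 2302 includes request queue 2303, which is suitable for receiving and storing any number of device requests and is coupled to request controller unit 2304. Request controller unit 2304 is used, in part, to fetch a particular request from any position in request queue 2303 and to pass the fetched request to the appropriate one of the plurality of shared devices 108. In this embodiment, prioritizer 2302 also includes response queue 2306. The response queue is configured to receive and store responses from any of shared devices 108 and is coupled to response controller unit 2308, which is used to select, from the stored responses, the particular response to be sent to requesting device 102.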
【０１４３】
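In a preferred embodiment, as shown in FIG. 1E, each response and each request is associated with an ID number 150 such that a request and its associated response have the same ID number 150. As described above, ID number 150 includes five data bits, the first and second of which form group selector field 152, which identifies the group of requesting devices (such as a group of processors in a multiprocessor computing environment) to which the particular response/request belongs. Also as described above, request number field (RN) 153 represents the number of the request and/or response associated with the group of requesting devices identified by group selector field 152, such that, for example, consecutive requests from the same requesting device have consecutive request number fields 153.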
【０１４４】
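In operation, request controller 2304 and response controller 2308 each incorporate group priority selector register 154, livelock counter register 156, and reordering selector 2312. Group priority selector register 154 contains priority information for the particular request/response group identified by group selector field 152. In one embodiment, a value of "3" represents the highest priority and a value of "0" the lowest, and a higher-priority request can bypass a lower-priority request.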
【０１４５】
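To avoid a livelock situation, livelock counter register 156 contains information on how many consecutive higher-priority requests (or responses) may bypass a lower-priority request (or response). Note that livelock counter register 156 is active only in situations where a higher-priority request bypasses a lower-priority request. Indeed, if there is no lower-priority request (or response) in the appropriate queue, livelock counter register 156 is inactive.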
【０１４６】
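Although only a few embodiments of the invention have been described in detail, it will be understood that the invention may be embodied in many other specific forms without departing from its spirit or scope. The examples given here are therefore to be regarded as illustrative and not restrictive, and the invention is not limited to the details given herein but may be modified within the scope of the appended claims.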
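[Brief description of the drawings]
[FIG. 1A]
A diagram showing a general example of use of a universal controller according to an embodiment of the invention.
[FIG. 1B]
A diagram showing a specific example of use of the universal controller shown in FIG. 1A.
[FIG. 1C]
A diagram showing an address space controller coupled to a universal controller according to an embodiment of the invention.
[FIG. 1D]
A diagram showing a specific example of use of the address space controller shown in FIG. 1C.
[FIG. 1E]
A diagram showing an exemplary request/response identification number according to an embodiment of the invention.
[FIG. 2A]
A diagram showing a general universal command according to an embodiment of the invention.
[FIG. 2B]
A diagram showing a specific universal command of the kind shown in FIG. 2A, suitable for a memory page read command.
[FIG. 2C]
A diagram showing an example of a sequence of commands formed by providing appropriate time intervals between the command components of the exemplary command of FIG. 2B.
[FIG. 3]
A diagram showing a resource tag according to an embodiment of the invention.
[FIG. 4]
A flowchart detailing a process by which a universal controller accesses a shared resource according to an embodiment of the invention.
[FIG. 5]
A diagram showing a process by which a universal controller determines the state of a resource and the order of operations to be performed, according to an embodiment of the invention.
[FIG. 6]
A diagram showing a process by which a universal controller determines appropriate time intervals between consecutive operations, according to an embodiment of the invention.
[FIG. 7A]
A diagram showing a page hit/miss controller according to an embodiment of the invention.
[FIG. 7B]
A diagram showing a page hit/miss controller according to an embodiment of the invention.
[FIG. 8]
A diagram showing a bank access controller according to an embodiment of the invention.
[FIG. 9A]
A diagram showing an exemplary SLDRAM type multiprocessor system according to an embodiment of the invention.
[FIG. 9B]
A timing diagram showing an exemplary SLDRAM bus transaction flow in the multiprocessor system shown in FIG. 9A.
[FIG. 10]
A block diagram of a memory controller according to an embodiment of the invention.
[FIG. 11]
A block diagram of a restriction block according to an embodiment of the invention.
[FIG. 12]
A timing diagram of an exemplary SLDRAM command according to an embodiment of the invention.
[FIG. 13A]
A diagram showing the flow of memory command reordering according to a specific embodiment of the invention.
[FIG. 13B]
A diagram showing the flow of memory command reordering according to a specific embodiment of the invention.
[FIG. 13C]
A diagram showing the flow of memory command reordering according to a specific embodiment of the invention.
[FIG. 14]
A block diagram showing part of a memory controller configured according to a specific embodiment of the invention.
[FIG. 15]
A block diagram of a reordering circuit configured according to a specific embodiment of the invention.
[FIG. 16]
A more detailed block diagram of the reordering circuit of FIG. 15.
[FIG. 17]
A diagram showing the contents of a command queue element according to a specific embodiment of the invention.
[FIG. 18]
A block diagram showing a specific embodiment of the address shifter.
[FIG. 19]
A diagram showing the contents of a data queue element according to a specific embodiment of the invention.
[FIG. 20]
A diagram showing a collision detection system that is another implementation of the collision detection system shown in FIG. 15.
[FIG. 21]
An exemplary timing diagram showing how each read/write command to the target device is associated with a data packet transfer.
[FIG. 22]
A diagram showing a prediction system having N page timers that store the time between the last command issued to a particular page and the predicted next access to that memory.
[FIG. 23]
A diagram showing a device controller having a device access prioritizer according to an embodiment of the invention.
[FIG. 24]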
A diagram showing Table 4, which summarizes the scheduling process carried out by the restriction block according to an embodiment of the invention.
[0001]
TECHNICAL FIELD OF THE INVENTION
The present invention relates generally to computer systems. More specifically, the present invention relates to accessing shared resources in a computer system, such as a multiprocessor computer system. In particular, an apparatus and method for providing universal access to shared resources is described.
[0002]
[Prior art]
In a basic computer system, a central processing unit, or CPU, operates according to a predetermined program or set of instructions stored in an associated memory. In addition to the stored instruction set or program that defines the operation of the processor, memory space is provided, either in the processor memory or in associated additional memory, to facilitate the central processor's manipulation of information during processing. The additional memory provides storage for information created by the processor, as well as for information that the processor uses temporarily, as a "scratch pad", while processing the program. In addition, the associated memory provides a place for the output information of the processor executing the instruction set, making that information available to the output devices of the system.
[0003]
In systems where many components (processors, hard drives, etc.) must share a common bus to access the existing memory, collisions over memory access can arise. Access to memory or other shared resources becomes complicated particularly in multiprocessor computer systems, in which systems using different processors operate simultaneously. Because each processor or processor system is likely to request access to the same memory at the same time, collisions between processors are generally unavoidable. Basically, the operation of two or more processors or processor systems in a multiprocessor computer system results in intermittently overlapping memory commands to shared memory or other shared resources.
[0004]
Conventional approaches to the problem of conflicting memory access requests to a shared memory have included, in some cases, complete duplication of the memory used by each processor and isolation of the processor systems. However, this approach often negates the intended advantage of a multiprocessor system. Multiprocessor systems are most effective when the processors perform computations in parallel on the same data, with one processor assisting the operation of another. Conventionally, such processor systems have either been time-shared, with the processors competing for access to shared resources such as memory, or dual-ported, with each processor having its own memory bus and one processor waiting while the other is granted access.
[0005]
Various measures have been taken to avoid the collision problem described above. In one approach, collisions are avoided by operating the processors sequentially, or by time-sharing them. In this method, the processors simply access the shared resource in turn. Commonly used systems of this kind include "ring passing" or "token" systems, in which the potentially conflicting processors are polled by the system in a predetermined sequence, much like passing a ring around a group of users.
[0006]
Unfortunately, sequential processor access places significant limitations on the overall operation of the computer system, because the system spends considerable time polling the conflicting processors. Moreover, even when only a single processor is operating and requesting access to, for example, a shared memory, a delay is incurred between that processor's accesses to the shared resource on every memory cycle as the system steps through the sequence.
[0007]
Another common strategy for collision avoidance is prioritization among the processors in the computer system. In this approach, each processor is assigned a priority according to a hierarchy of system importance. Whenever a collision occurs, the memory controller simply grants access to the processor with the higher priority. Consider, for example, a system with two processors, a first processor and a second processor, that access a shared memory. The shared memory is typically a dynamic RAM (DRAM) type memory device, which requires periodic refreshing of the data stored in it. In general, a DRAM type memory is refreshed by a separate, independent refresh system. In such a multiprocessor system, both processors and the refresh system compete for access to the shared memory, and the system memory controller processes conflicting memory access requests and commands according to the priorities assigned to the processors and the refresh system. Although such systems resolve collisions and are more efficient than purely sequential collision avoidance, they still lack flexibility.
[0008]
Another conventional approach to conflict resolution is decision-making capability built into the memory controller. Unfortunately, because the decision-making portion of the memory controller is governed by the control and timing of the clock system, considerable time elapses before the decision is actually made and the memory controller can grant access to the shared memory.
[0009]
Unfortunately, this decision-making problem substantially reduces the ability of conventional memory controllers to grant access to multi-bank memory systems. In a multi-bank memory system, the actual memory core is divided into specific regions, or banks, in which the data to be read is stored. Although such a memory can provide faster and more efficient access, a conventional memory controller needs complex logic to handle a multi-bank memory device, and as a result the overall access speed of the system is significantly reduced.
[0010]
From the above, it can be seen that a universal device access controller is desired.
[0011]
Summary of the Invention
In accordance with the present invention, a universal resource access controller is provided for controlling access to an associated resource such as, for example, a synchronous link DRAM (SLDRAM). The universal resource access controller is coupled to a requesting system and to the resource, so that when the requesting system needs access to the resource it generates a resource access request and passes it to the universal resource controller. The universal resource controller then uses specific operating characteristic parameters of the requested resource, together with the current state of the requested resource, to generate a corresponding ordered universal access request command suitable for the resource access required by the requesting system.
According to another embodiment of the present invention, an apparatus for controlling access by any of a plurality of requesting systems to any of a plurality of accessible devices is disclosed. The apparatus comprises a universal controller unit and an address space controller unit coupled to the universal controller unit. The universal controller unit decodes the system address and the system command received from the requesting system. The universal controller unit then generates an associated device address and a corresponding device command based on device parameters that are stored in, and provided by, the address space controller. The address space controller is configured so that each of the plurality of devices has its own address region within the address space controller.
[0012]
A better understanding of the nature and advantages of the present invention may be gained from the remaining portions of the specification and the drawings.
[0013]
BEST MODE FOR CARRYING OUT THE INVENTION
In systems where multiple devices, such as processors, share the same resource, various strategies have been employed to avoid the collisions that typically occur when more than one device requests access to the shared resource. One approach is to operate the processors sequentially, or to time-share them, so that collisions are avoided. In this method, the processors simply access the shared resource in turn. Commonly used systems of this kind include "ring passing" or "token" systems, in which the potentially conflicting processors are polled by the system in a predetermined sequence, much like passing a ring around a group of users.
[0014]
Unfortunately, the sequential access methodology by such processors imposes significant limitations on the overall operation of the computer system, as the system spends considerable time polling for conflicting processors.
[0015]
Another common strategy for collision avoidance is by prioritizing among the processors in the computer system. In such a manner, each processor is prioritized according to a hierarchy of system importance. While such systems solve the collision problem and are more effective than simple sequential access collision avoidance systems, they are still inflexible.
[0016]
Yet another general strategy for collision avoidance involves decision logic embedded in a controller type device. Unfortunately, the complexity of the decision logic means that considerable time is spent before the decision is actually made and the controller can grant access to the shared memory.
[0017]
This problem of complex logic slowing system operation is even more pronounced in multi-chip module type memory systems, in which the memory is spread across a number of interconnected memory devices, each with different operating characteristics. Because conventional logic cannot be configured to compensate for the different access characteristics inherent in the various memory devices, overall system performance is compromised.
[0018]
Broadly, as shown in FIG. 1A, the present invention can be described in terms of a system 100 having requesting devices 102, each coupled by a system bus 106 to a universal device controller 104 that is suitably configured to provide access to any number and type of shared resources 108. In one embodiment, system bus 106 is coupled to universal controller 104 by an associated system interface layer 110, and universal controller 104 is in turn coupled to shared resource 108 by shared resource interface 109. In a broad sense, universal controller 104 is configured to determine the state of shared resource 108 based on shared resource requests from any of requesting systems 102 and on shared resource operating characteristic parameters 113.
[0019]
If requesting system 102 is one processor in a multiprocessor system and requests access to a shared resource 108 in the form of a memory device that is also shared by the other processors coupled to it, universal controller 104 determines the sequence of operations to be performed to complete the requested resource access. For example, if memory device 108 is an SDRAM, the operations typically include precharge, page close, page open, and page read or page write.
[0020]
Once a particular sequence of operations has been determined, universal controller 104 determines the appropriate time interval between the operations in the sequence, for example to avoid data collisions or other types of conflicts. In a preferred embodiment, the time interval is determined in part from the operating characteristics of the shared memory device, which are stored, for example, in a look-up table. The properly ordered access commands are then output by the universal controller and responded to by the shared memory.
[0021]
In the following detailed description of the invention, a number of specific embodiments are set forth in order to facilitate a thorough understanding of the invention. However, it will be apparent to one skilled in the art that the present invention may be practiced without these specific details or by utilizing other elements or steps. In other instances, well-known processes, procedures, components, and circuits have not been described in detail so as not to obscure the present invention.
[0022]
Hereinafter, the present invention will be described in terms of a memory controller configured to function as a communication mechanism between a processor and a shared memory. It should be noted, however, that the present invention can also be implemented as a universal controller capable of controlling access to any resource, whether shared or not. Such a resource need not be memory; in fact, the present invention can also be used to control access to a shared system bus, for example to control information traffic in a multiprocessor system so as to increase the effective bandwidth of the system bus by reducing bus access latency.
[0023]
Referring to FIG. 1B, a system 100 is shown having a requesting device 102, such as a processor, coupled to universal controller 104 by system bus 106. Controller 104 is in turn coupled to shared resource 108, which is, for example, a memory 108 that can take various forms, such as DRAM, SDRAM, SLDRAM, EDO, FPM, or RDRAM. In the illustrated embodiment, system bus 106 includes unidirectional address bus 106-1, which carries memory address requests output by processor 102 to universal controller 104. System bus 106 also includes unidirectional command bus 106-2, which carries the commands associated with the memory addresses on address bus 106-1. For example, if processor 102 requests an executable instruction stored at a particular memory location in memory 108, the processor outputs a read request (referred to as a system command) on command bus 106-2 and, substantially simultaneously, a corresponding memory address request (referred to as a system address) on address bus 106-1. Both the system address and the system command are received by configurable system interface 110 included in controller 104. Here, "configurable" means that system interface 110 can be arranged to process the received system command and system address in whatever manner and form memory 108 requires. As a result, processor 102 need not issue individual requests tailored to each memory device, and the data required by processor 102 can be stored in any number and type of memory devices coupled to controller 104.
[0024]
In the exemplary embodiment, system interface 110 is configured to convert the received system command and system address into what is referred to as a universal command 200, an example of which is shown in FIG. 2A. In one embodiment, if the shared resource is a DRAM type memory device (including SLDRAM, SDRAM, EDO DRAM, etc.), universal command 200 is formed of five data fields that together cover all the operations needed to carry out any memory access request to memory 108. These operations include a precharge operation, indicated by data precharge field 202, which is used to indicate whether a particular row needs to be precharged. The other operations are indicated by data activation field 204, data read field 206, data write field 208, and data refresh field 210. Suppose, for example, that memory 108 has memory page 1 of memory bank 1 currently active (i.e., open after a read or write) and that a subsequent processor command requests that data stored on page 2 of memory bank 1 be read and output to processor 102. In this case, to carry out the command requested by processor 102, page 1 must be closed (i.e., page 1 is precharged) and page 2 must be activated; once activation is complete, the read from page 2 is performed. Accordingly, universal command 212 shown in FIG. 2B is generated by universal command generation unit 110 with data fields 202, 204, 206, 208, and 210, in which data fields 202, 204, and 206 are set to "1", indicating "perform the associated operation", and data fields 208 and 210 are set to "0", indicating "do not perform the associated operation" (i.e., "NOP").
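The page 1 to page 2 example above can be written down as a small sketch of the five-field universal command. The class and function names are hypothetical, and two of the three branches (a read that hits the open page, and a read to a bank with no open page) are assumptions added for completeness; only the page-miss case is described in the text.

```python
from dataclasses import dataclass, astuple

@dataclass
class UniversalCommand:
    precharge: int = 0  # field 202
    activate: int = 0   # field 204
    read: int = 0       # field 206
    write: int = 0      # field 208
    refresh: int = 0    # field 210

def build_read(open_page, requested_page):
    """Build the universal command for a page read on one bank."""
    if open_page == requested_page:
        return UniversalCommand(read=1)               # assumed: page hit
    if open_page is None:
        return UniversalCommand(activate=1, read=1)   # assumed: no page open
    # Page miss, as in the example: close the old page, open the new one, read.
    return UniversalCommand(precharge=1, activate=1, read=1)
```

For the example in the text, `astuple(build_read(1, 2))` gives `(1, 1, 1, 0, 0)`, matching fields 202 to 206 set and fields 208 and 210 left as NOP.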
[0025]
Returning to FIG. 1B, because memory 108 is shared by a number of different requesting devices, accesses to it are highly dynamic and the state of memory 108 is constantly changing. The state of the memory matters because the state of a particular memory location must be known before a particular operation can be performed on it. For example, if a particular memory page is closed, that page must first be opened before a read operation can be performed. Accordingly, so that the current state of a particular address location can be determined, the most recent operation performed on that memory location is identified by resource tag 300, shown in FIG. 3. In one embodiment of the present invention, resource tag 300 includes address field 302, used to identify a particular memory address location; a field used to identify the last command issued to the address identified by address field 302; and last command issue time data field 306. For example, resource tag 308 for memory address ADD5 indicates that a page read command was issued at time 5φ (denoting 5 system clock cycles), and resource tag 310 indicates that a page write was performed on the memory page at the same memory address ADD5 at time 10φ. By tracking the state of memory address ADD5, universal controller 104 knows that the memory page at ADD5 is already open and that no page open operation is necessary.
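The ADD5 example can be mirrored with a minimal resource tag record and a lookup that answers "is this page already open?". The class name, the command strings, and the rule that a page is open if its most recent command was a page open, read, or write are assumptions made for this sketch.

```python
from dataclasses import dataclass

@dataclass
class ResourceTag:
    address: str       # address field 302
    last_command: str  # last command issued to that address
    issued_at: int     # time of last command (field 306), in clock cycles

OPEN_STATES = ("page_open", "page_read", "page_write")  # assumed encoding

def page_is_open(tags, address):
    """True if the most recent tag for the address leaves the page open."""
    latest = max(
        (t for t in tags if t.address == address),
        key=lambda t: t.issued_at,
        default=None,
    )
    return latest is not None and latest.last_command in OPEN_STATES
```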
[0026]
Based on the resource state information provided by tags 300 stored in resource tag buffer 114, command orderer 116, which is coupled to configurable system interface 110, inserts appropriate time intervals between command components 202-210 of universal command 200, so that command components 202 and 204, and command components 204 and 206, are separated by time intervals t1 and t2, respectively. The result is ordered command 220, shown in FIG. 2C. Because command components 208 and 210 are "NOP" type fields, ordered command 220 contains no reference to them and requires only a time period substantially equal to the clock cycles needed for components 202-206 plus t1 + t2. In this way, command orderer 116 can provide an optimal flow of commands and data between processor 102 and memory 108.
[0027]
In another embodiment of the invention, if shared resource 108 is a multi-bank memory device, such as an SDRAM, or a multi-device memory, such as a multi-chip module, resource tag buffer 114 can store resource tags for, e.g., all open pages in a particular bank or device. In one embodiment, a comparator (not shown) detects the bank number or device identifier in the system address and compares the page address and the system address with the contents of tag buffer 114. If the result of the comparison is not a "hit" (i.e., the addresses do not match), universal controller 104 must close the old page using the address from tag buffer 114 and open the new page based on the new system command.
[0028]
Where a number of different devices are supported by universal controller 104, it is desirable to be able to select the operating parameters that relate only to the particular device associated with the incoming system address. FIG. 1C shows address space controller 120 coupled to universal controller 104 for the case in which the universal controller supports a number of different devices. In the exemplary embodiment, address space controller 120 is able to select only the device-specific parameters for the one device associated with the incoming system address. In the specific embodiment shown in FIG. 1D, address space controller 120 includes comparator 122, which compares the incoming system address with the contents of region address range buffer 124, which identify the device (or, equivalently, the memory region) associated with the incoming address. Once a particular device or region has been identified, one register of a group of device parameter registers 126 and 128 (each coupled to buffer 124 and containing the parameters specific to a particular device) is selected. The selected device parameter register then supplies the specific operating parameters for the device corresponding to the system address. In another embodiment, the contents of the selected device parameter register are input to LUT 118. In this way, any number of different devices can be supported by universal controller 104, with the specific operating parameters of each device identified and used to optimally sequence the corresponding universal command.
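In software terms, the comparator and region address range buffer behave like a range lookup that returns the parameter set for the matching device. The table below is purely illustrative: the address ranges, device names, and parameter values are invented for the sketch, and `device_parameters` is a hypothetical function name.

```python
# Hypothetical contents of the region address range buffer, paired with the
# device parameter register selected for each region. All values are made up.
REGIONS = [
    (0x0000_0000, 0x0400_0000, {"device": "SDRAM", "access_cycles": 3}),
    (0x0400_0000, 0x0800_0000, {"device": "SLDRAM", "access_cycles": 5}),
]

def device_parameters(system_address):
    """Return the parameters of the device whose region contains the address."""
    for start, end, params in REGIONS:
        if start <= system_address < end:  # role of comparator 122
            return params
    raise LookupError(f"no device mapped at {system_address:#x}")
```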
[0029]
It should also be noted that when one of the devices coupled to the universal controller is busy and cannot accept a new command, it is advantageous to be able to select any other command waiting in the command queue. In another embodiment of the invention, each response by a device and each request by the universal controller has an associated identification number 150, which in the exemplary embodiment is a data word 5 bits long, as shown in FIG. 1E. Identification number 150 comprises group selector field 152, which is 2 bits long, and request number field 153, which is 3 bits long. The group selector (GS) determines the group (e.g., of processors) to which a particular system request belongs, and the request number (RN) is the number of the request or response within the group identified by group selector field 152, such that consecutive requests from the same requesting device have consecutive request numbers.
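The 5-bit identification number can be illustrated with two small helpers. The field widths (2-bit group selector, 3-bit request number) come from the text; placing the group selector in the upper bits and wrapping the request number modulo 8 are assumptions made for this sketch, and the function names are hypothetical.

```python
GS_BITS, RN_BITS = 2, 3  # widths of fields 152 and 153

def make_id(group, request_number):
    """Combine a group selector and request number into a 5-bit ID (GS in upper bits)."""
    if not (0 <= group < 1 << GS_BITS and 0 <= request_number < 1 << RN_BITS):
        raise ValueError("field out of range")
    return (group << RN_BITS) | request_number

def split_id(id_number):
    """Return (group selector, request number) for a 5-bit ID."""
    return id_number >> RN_BITS, id_number & ((1 << RN_BITS) - 1)

def next_request_number(rn):
    """Consecutive requests get consecutive numbers; wrap-around is assumed."""
    return (rn + 1) % (1 << RN_BITS)
```

A response carrying the same ID as its request can then be matched by a plain equality test on the 5-bit value.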
[0030]
In another embodiment, a group priority selector register 154 holds a priority value for each response or request group, and a response or request group with a higher priority value takes precedence over a group with a lower priority value. A request or response with a higher priority value can therefore be processed ahead of one with a lower priority value when the lower-priority request or response cannot be processed in the next clock cycle. To prevent so-called livelock, a livelock counter register 156 holds the number of consecutive high-priority requests (or responses) that may precede a low-priority request (or response). As a result, a low-priority request (or response) is not left waiting for a large number of clock cycles.
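The interaction between group priority and the livelock counter can be sketched as follows. This is a simplified model with only two priority levels; the function name `arbitrate` and the parameter `livelock_limit` are hypothetical stand-ins for the behaviour attributed to registers 154 and 156.

```python
from collections import deque


def arbitrate(high, low, livelock_limit):
    """Return the service order of two request queues.

    High-priority requests go first, but after `livelock_limit`
    consecutive high-priority grants a waiting low-priority request
    is served (the role attributed to livelock counter register 156).
    """
    high, low = deque(high), deque(low)
    order = []
    consecutive_high = 0
    while high or low:
        take_low = bool(low) and (not high or consecutive_high >= livelock_limit)
        if take_low:
            order.append(low.popleft())
            consecutive_high = 0
        else:
            order.append(high.popleft())
            consecutive_high += 1
    return order


print(arbitrate(["H0", "H1", "H2", "H3"], ["L0"], livelock_limit=2))
# ['H0', 'H1', 'L0', 'H2', 'H3']
```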
[0031]
Another point to keep in mind is that each shared resource has an associated set of operating characteristics (for example, access time, or CAS latency for DRAM devices) that is used to optimize control of both command and data flow. Where the universal controller 104 serves more than one shared resource, each with a different set of operating characteristics, the operating characteristics are, in another embodiment, stored in a look-up table (LUT) 118 coupled to the command orderer 116. The command orderer 116 uses the information provided by the LUT 118 together with the resource tags stored in the resource tag buffer 114 to order the command components 202-210 appropriately and form an ordered command 220. This is particularly useful when the shared resource is in fact a collection of memory devices with substantially different operating characteristics, such as a multi-chip module.
[0032]
Next, FIG. 4 is a flowchart detailing a process 400 by which a universal controller accesses a shared resource in accordance with an embodiment of the invention. The process 400 begins at 402, where the system requests access to the shared resource. If the shared resource is a DRAM-type memory device, its operations include precharge, refresh, close, open, read, and write. For example, a processor requests a memory page stored in shared memory by generating a system command (i.e., page read) and an associated system address indicating the location in memory where the requested page is stored. In the preferred embodiment, at 404 the state of the resource is determined using, for example, a resource tag associated with the active memory locations in the shared memory. Next, at 406, the order of operations needed to carry out the desired request on the shared resource is determined. At 408, the universal controller generates a universal command based on that sequence of operations. For example, to execute a page read, the previously opened page must be closed, the new page activated, and then the read performed, and all of these steps are captured in a single universal command structure. Once the universal controller has formed the universal command using the resource tags and the operating characteristics specific to the shared resource, at 410 it determines the appropriate time interval between the command components of the universal command. Then, at 412, the ordered command is issued to the shared resource; in another embodiment, a physical stage is used for this purpose. Finally, at 414, the shared resource responds to the ordered command, for example by presenting the data stored at the location indicated by the system address.
[0033]
In another embodiment of the invention, the universal controller determines the state of the resources (404) and the order of operations to be performed (406) using the process 500 shown in FIG. The process 500 begins by comparing a resource partition identifier (ie, a memory address register) with a resource identifier (ie, a resource tag address field 202) at 502. If the occurrence of a "match" is confirmed at 504 (i.e., the address of the new command matches the current tag address field), then at 506 the next command is issued. On the other hand, if the new command address does not match the current tag address field (ie, if there is a mismatch), then a determination is made at 508 whether the old page is still open. If an old page is open, the page is closed at 510 and a new page is opened at 512. However, if it is determined at 508 that the old page is not open, then at 512 a new page is opened. In either case, once a new page is opened, the next command (data operation) is issued at 506.
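The decision flow of process 500 can be summarized in a short Python sketch. The function `plan_access` and its tuple representation of operations are hypothetical; the sketch only mirrors the branches at 504, 508, 510, 512, and 506.

```python
def plan_access(new_page, open_page, operation="read"):
    """Return the operations needed to perform `operation` on `new_page`.

    `open_page` is the page address held in the current tag, or None
    if no page is open. Each step is an (operation, page) tuple.
    """
    if open_page == new_page:
        # 504: match, so the data operation can be issued directly (506).
        return [(operation, new_page)]
    steps = []
    if open_page is not None:
        # 508/510: an old page is still open and must be closed first.
        steps.append(("close", open_page))
    # 512: open the new page, then issue the data operation (506).
    steps.append(("open", new_page))
    steps.append((operation, new_page))
    return steps


print(plan_access(5, 5))     # [('read', 5)]
print(plan_access(5, 3))     # [('close', 3), ('open', 5), ('read', 5)]
print(plan_access(5, None))  # [('open', 5), ('read', 5)]
```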
[0034]
In another embodiment of the invention, the universal controller determines the appropriate time interval between successive operations (410) using the process 600 shown in FIG. 6. The process 600 begins at 602, where the universal controller compares the first command of a new set of commands for a particular resource with the last command of the most recent set of commands issued to the same resource. At 604, the universal controller determines the timing constraint between universal command components by comparing the first command component of the new universal command with the last command component of the most recent previous command. In another embodiment, the universal controller uses a two-index look-up table (LUT) in the form of a two-dimensional array, as shown in Table 1, in which the first row of the array represents the old (that is, most recent) command and the first column represents the new command. For example, referring to Table 1, if the old command is page read and the new command is page close, the entry at the intersection of the new page-close command and the old page-read command gives the minimum amount of time allowed between the two operations (that is, the minimum physical issue time). Typically, the information stored in the LUT is provided by the manufacturer of the shared resource.
[0035]
[Table 1]

[0036]
Once the resource constraints for a particular universal command component have been determined, a determination is made at 606 as to whether there are additional command components for the same universal command. If there are no more command components, at 608 details regarding the time interval of the universal command and associated components are stored. On the other hand, if the universal command includes additional command components, control is returned to 604, for which corresponding physical time constraints are determined.
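The loop of process 600 over the command components, with a two-index look-up table keyed by old and new command, might be sketched as follows. The cycle counts in `MIN_GAP` are placeholders invented for this example; they are not the values of Table 1, which would come from the resource manufacturer. The function name `schedule_components` is likewise hypothetical.

```python
# (old command, new command) -> minimum clock cycles between them.
# All values are illustrative placeholders, not data from Table 1.
MIN_GAP = {
    ("open", "read"): 3,
    ("read", "close"): 2,
    ("close", "open"): 2,
    ("read", "read"): 1,
}


def schedule_components(components, last_command=None, last_time=0):
    """Assign an issue time to each command component in order.

    `last_command` and `last_time` describe the most recent command
    already issued to the same resource (step 602); each following
    component is delayed by the LUT entry for its predecessor (604).
    """
    times = []
    prev, t = last_command, last_time
    for cmd in components:
        if prev is not None:
            t += MIN_GAP[(prev, cmd)]
        times.append((cmd, t))
        prev = cmd
    return times


print(schedule_components(["close", "open", "read"],
                          last_command="read", last_time=10))
# [('close', 12), ('open', 14), ('read', 17)]
```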
[0037]
However, observing the state of the physical pages in a shared memory 108 that has a plurality of memory banks, for example, requires a very large number of resource tags, and therefore a very large amount of cache memory for the resource tag buffer 114. Retrieving the specific resource tag for a particular memory page of remotely located memory would then take a significant amount of time, reducing the overall operating speed of the universal controller 104. In another embodiment, shown in FIG. 7A, a page hit/miss controller 702 is included in the universal controller 104, and the number M of page registers 704 is set to be less than the number N of memory banks in the multi-bank memory 706, so that not every bank is represented in the M page registers 704. In operation, each of the M page registers 704 stores the address and status data of an open page. A random page register number generation unit 708 generates a random integer no greater than M that identifies the page register whose contents are to be replaced with the address of the newly opened page. A comparator 710 compares the input system address with the bank numbers and page addresses of all M registers in parallel and produces one of the four possible results shown below.
[0038]
1) If the comparator 710 indicates a hit (match), the desired page of the requested bank is opened and ready for access.
[0039]
2) If the comparator 710 indicates a bank hit (match) and a page miss (mismatch), the universal controller 104 must close the old page using the page address from the page register and open a new page using the page address from the system address.
[0040]
3) If the comparator 710 indicates a miss in both bank and page, the universal controller 104 must close the old page of the bank held in the page register whose number is provided by the random page register number generator, and open a new page using the system address. The desired bank is then accessed.
[0041]
4) If both the bank and page are missed, but at least one page register is unused, that register is used to open a new page.
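The four outcomes listed above can be modelled with a small Python sketch of the M page registers. The function `access_page`, the outcome labels, and the representation of a register as a `(bank, page)` tuple or `None` are assumptions made for illustration; Python's `random` module stands in for the random page register number generation unit 708.

```python
import random


def access_page(registers, bank, page, rng=random):
    """Classify an access against the page registers and update them.

    `registers` is a list of M entries, each (bank, page) or None if
    unused. Returns (outcome, register index, operations to perform).
    """
    for i, entry in enumerate(registers):
        if entry is not None and entry[0] == bank:
            if entry[1] == page:
                return "hit", i, []                       # outcome 1
            registers[i] = (bank, page)                   # outcome 2
            return "page_miss", i, [("close", bank, entry[1]),
                                    ("open", bank, page)]
    for i, entry in enumerate(registers):
        if entry is None:                                 # outcome 4
            registers[i] = (bank, page)
            return "free_register", i, [("open", bank, page)]
    victim = rng.randrange(len(registers))                # outcome 3
    old_bank, old_page = registers[victim]
    registers[victim] = (bank, page)
    return "bank_miss", victim, [("close", old_bank, old_page),
                                 ("open", bank, page)]


regs = [None, None]                  # M = 2 page registers
print(access_page(regs, 0, 7))       # uses a free register
print(access_page(regs, 0, 7))       # hit
print(access_page(regs, 0, 8))       # bank hit, page miss
```

Replacing `rng.randrange` with a least-recently-used choice would correspond to the alternative embodiment that uses an LRU comparator instead of the random generator.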
[0042]
In another embodiment, shown in FIG. 7B, the random page register number generator 708 is replaced by a least recently used (LRU) comparator 712, which determines which of the M registers 704 has gone unused for the longest time (that is, which was least recently used).
[0043]
In addition to observing the state of the physical page in the multi-bank memory 706, the bank access controller 800 shown in FIG. 8 includes N bank registers 802 corresponding to the number N of memory banks included in the multi-bank memory 706. The bank register 802 in which the information of the related bank is stored includes a bank number field 804 that defines the identification number of the bank. In addition, bank register 802 includes a bank status field 806 indicating the status of a particular bank identified by the bank number in bank number field 804. In particular embodiments, bank status field 806 may take on the values as shown in Table 2.
[0044]
[Table 2]

[0045]
With the development of packet-oriented high-speed memories such as synchronous link DRAMs (SLDRAMs), which transmit bus data at speeds in the range of 400 to 800 Mb/s/pin, problems caused by memory access collisions are increasing. Referring first to FIG. 9A, an exemplary SLDRAM-based multiprocessor system 900 is shown in accordance with an embodiment of the invention. The multiprocessor system 900 includes a processor 902 connected to a universal controller 904 by a system bus 906. The universal controller 904 is in turn connected to a synchronous link DRAM (SLDRAM) 908 and an SLDRAM 910 by an SLDRAM bus, which includes a unidirectional command bus 912 and a bidirectional data bus 914. Note that although only two SLDRAMs are shown in FIG. 9A, any number of SLDRAMs may be connected to the universal controller 904 by the buses 912 and 914. In other cases, the SLDRAM may take the form of a buffered module that includes any suitable number of SLDRAMs, such as the SLDRAM 908. An initialization/synchronization (I/S) bus 916 that connects the universal controller 904 to each of the SLDRAMs 908 and 910 provides a signal path for the initialization and synchronization signals generated by the universal controller 904.
[0046]
In another embodiment of the invention, packetized command, address, and control information from the universal controller 904 is selectively sent on the command bus 912 to the SLDRAMs 908 and 910. The data bus 914 is arranged to transmit packetized write data from the universal controller 904 to a selected one of the SLDRAMs 908 and 910. Alternatively, the data bus 914 is arranged to send packetized read data from the selected one of the SLDRAMs 908 and 910 back to the universal controller 904. It should be noted that the command bus 912 and the data bus 914 typically operate at the same speed, for example 400 Mb/s/pin, 600 Mb/s/pin, or 800 Mb/s/pin.
[0047]
The control signals generated by the universal controller 904 and carried on the command bus 912 include, for example, a free-running clock signal (CCLK), a FLAG signal, a command address signal CA, a LISTEN signal, a LINKON signal, and a RESET signal. Typically, a packet command consists of four consecutive 10-bit words, with the first word of the command indicated by a "1" in the first bit of the FLAG signal. In a preferred embodiment, both edges of the free-running clock signal CCLK are used by the SLDRAMs 908 and 910 to latch the command words. The SLDRAMs 908 and 910 respond to a high-level LISTEN signal by monitoring the command bus 912 for incoming commands, and respond to a low-level LISTEN signal by entering a power-saving standby mode. The LINKON and RESET signals are used, respectively, to shut down or start up a selected one of the SLDRAMs 908 and 910 into a known desired state.
[0048]
For the remainder of this discussion, only the SLDRAM 908 is discussed, with the understanding that any number of SLDRAMs deemed appropriate may be connected to the universal controller. As discussed above, a typical SLDRAM device such as the SLDRAM 908 is organized hierarchically into memory banks, columns, rows, and bits, as well as memory areas. It should be noted that each of these hierarchical levels can have operating characteristics that differ from one another. Such operating characteristics include, but are not limited to, parameters such as memory access time, chip enable time, and data retrieval time. It should also be noted that while areas are defined for different devices, such as different memory types and memory groups, each with different command and data latencies, the banks within a multi-bank memory typically share the same operating characteristics. For example, one local memory group can be connected directly to the memory controller while a second, non-local memory group is located on a board whose intervening drivers increase command and data latency relative to the local memory group. In other cases, each of the memory chips forming a multi-chip module is treated as a different memory area.
[0049]
In FIG. 9A, the SLDRAM 908 is a multi-chip module having four memory chips A, B, C, and D, each individually accessible by the command bus 912, the data bus 914, and the I/S bus 916. Because each of the memory chips A-D may have different operating characteristics (typically provided by the manufacturer), the universal controller 904 can use the operating characteristics of the particular hierarchical level and/or the corresponding memory area to schedule command and data packets optimally.
[0050]
By way of example, FIG. 9B shows a timing diagram of a typical SLDRAM bus transaction for the multiprocessor system 900. During operation, a processor typically generates processor command packets, such as a read command 950 or a write command 952, to which the appropriate memory bank(s) of the SLDRAM 908 respond. Typically, the read command 950 and the write command 952 are pipelined on the system bus 906 according to the specific requirements of the processor 902 that generated them, not in an order suited to optimal SLDRAM performance. A system clock CLKsys (not shown) provides the necessary timing signals.
[0051]
In this example, a processor 902a generates the read command 950, whose memory address MA1 is located on memory chip A of the SLDRAM 908, while a processor 902b generates the write command 952, whose memory address MA2 is also located on memory chip A of the SLDRAM 908. In this example, the read command 950 is output to the system bus 906 before the write command 952. The universal controller 904 receives the read command 950 first and begins processing it based on the command itself and the command address MA1, using the destination-address-specific information stored in the universal controller 904. Once the minimum issue time is determined, the universal controller 904 generates an SLDRAM command packet read 960 in response to the received processor command 950 and sends it out on the command bus 912.
[0052]
In general, an SLDRAM command packet is organized as four 10-bit words, as shown in Table 3 for a 64M SLDRAM with 8 banks, 1024 row addresses, and 128 column addresses. As shown, the bank address (BNK) is 3 bits, the row address (ROW) is 10 bits, and the column address (COL) is 7 bits. Many other organizations and densities are possible and can be accommodated within the 40-bit format shown, or within any other format defined as appropriate. At power-up, the universal controller 904 organizes command packets based on polling the SLDRAMs for factors such as the number of banks, rows, and columns and the associated operating characteristics, which the universal controller 904 then stores.
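The field widths given above (3 + 10 + 7 = 20 address bits for 8 banks, 1024 rows, and 128 columns) can be illustrated with a small Python sketch that splits a flat 20-bit address into bank, row, and column. The ordering of the fields within the address (bank highest, column lowest) and the function name `split_address` are assumptions made for this example; the sketch does not reproduce the bit placement of the 40-bit command packet of Table 3.

```python
BANK_BITS, ROW_BITS, COL_BITS = 3, 10, 7   # widths stated in the text


def split_address(addr):
    """Split a flat 20-bit address into (bank, row, column).

    Assumed ordering: bank in the top bits, then row, then column.
    """
    if not 0 <= addr < (1 << (BANK_BITS + ROW_BITS + COL_BITS)):
        raise ValueError("address must fit in 20 bits")
    col = addr & ((1 << COL_BITS) - 1)
    row = (addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = addr >> (COL_BITS + ROW_BITS)
    return bank, row, col


# 8 banks x 1024 rows x 128 columns is exactly the 20-bit address space.
assert 8 * 1024 * 128 == 1 << 20

addr = (5 << 17) | (513 << 7) | 99
print(split_address(addr))   # (5, 513, 99)
```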
[0053]
The first word of the command packet contains the chip ID bits. An SLDRAM ignores any command that does not match its local ID. The chip IDs are assigned by the universal controller 904 at power-up using the initialization and synchronization signals. In this way, the universal controller 904 addresses each SLDRAM of the multiprocessor system 900 individually, without generating separate chip enable signals or glue logic.
[0054]
[Table 3]

[0055]
Because the read command 950 and the write command 952 are pipelined, the universal controller 904 receives the write command 952 some time after receiving the read command 950 (or may hold it in a buffer), and then issues an SLDRAM command packet write 962 corresponding to the write command 952. Since both commands access the same bank (A), the universal controller 904 uses the MA2-specific characteristic data together with the issue time of the previously issued read command 960 to generate the minimum issue time and the data offset of the write 962, so that the write does not interfere with the read command 960.
[0056]
In this manner, the universal controller 904 can dynamically schedule issuance of SLDRAM command packets based at least on the current state of the stream of commands and data packets and the operating characteristics of the particular destination address device.
[0057]
Reference is now made to FIG. 10, which is a block diagram of a memory controller 1000 according to an embodiment of the present invention. It should be noted that the memory controller 1000 is only one possible embodiment of the universal controller 104 shown in FIG. 1 and should not be taken as limiting the scope of the present invention. The memory controller 1000 includes a system interface 1002 that connects the processor 902, via the system bus 906, to a memory scheduler 1006 (referred to below as the scheduler). In one embodiment of the present invention, the system interface 1002 is arranged to transmit both the memory command packets generated by the processor 902 and their associated write data packets to the memory command packet scheduler 1004. When the scheduler 1006 indicates that its internal buffers are full and cannot accept a new command, the system interface 1002 holds the new command until the scheduler 1006 indicates that it is ready to accept it.
[0058]
A synchronous link media access controller (SLiMAC) 1008 provides the physical interface between the scheduler 1006 and the SLDRAM 908. More specifically, the SLiMAC 1008 includes a command interface 1010 and a data interface 1012 that connect the SLiMAC 1008 to the SLDRAM 908 via the command bus 912 and the data bus 914, respectively. In a preferred embodiment of the present invention, the command interface 1010 transmits memory commands from the SLiMAC 1008 to the SLDRAM 908 along with the associated command clock CCLK. In some embodiments, the SLiMAC 1008 incorporates a clock doubler that uses an interface clock signal ICLK (which can operate at approximately 100 MHz) to generate the command clock signal CCLK, which typically operates at 200 MHz.
[0059]
In one embodiment of the invention, data interface 1012 both receives and transmits data on data bus 914. It should be noted that the width of the data bus 914 can be large enough to support the required number of SLDRAMs. Thus, as many as necessary data interfaces may be included in the SLiMAC to provide the required bandwidth. As an example, if the data bus 914 is 32 bits wide (eg, 16 bits per SLDRAM), then the SLiMAC 1008 may have two data interfaces that can each control the 16 bits associated with an individual SLDRAM. it can. In this way, the size of the data interface included in the SLiMAC 1008 can be strictly adapted to the particular configuration of the SLDRAM connected to it.
[0060]
In much the same way as using the command interface 1010, the SLiMAC 1008 can provide a data clock signal DCLK associated with the read data transmitted from the SLDRAM 908 to the SLiMAC 1008. In one embodiment of the invention, the data clock DCLK is generated using a clock doubler that increases the interface clock ICLK frequency from about 100 MHz to about 1000 MHz. It should also be noted that the interface clock signal ICLK, command clock signal CCLK, and data clock signal DCLK are all phase synchronous.
[0061]
In a preferred embodiment of the present invention, the scheduler 1006 includes a restriction block 1016 arranged to receive system commands and their associated system address data from the system interface 1002 to which it is connected. The restriction block 1016 provides timing information associated with the SLDRAM command packet data to a reordering block 1018. A write buffer 1020 receives write data from the system interface 1002. As directed by the scheduler 1006, read data is transmitted from the data interface 1012, through a read buffer 1022 that is connected to the data bus 914, and on to the system interface 1002. An I/S block 1024 connected to the initialization/synchronization (I/S) bus 916 provides the appropriate initialization and/or synchronization signals to the SLDRAM 908 as required.
[0062]
In operation, scheduler 1006 receives pipelined memory command packets generated by processor 902. Normally, a memory command packet is composed of a memory command and a memory address associated therewith. In one embodiment of the present invention, the scheduler 1006 is associated with the new command received to determine the destination address to which the memory command and its associated data packet, if any, were directed. Decode a memory address. Once decoded, scheduler 1006 uses the destination address specific device characteristic data stored therein, along with information associated with the previously sent memory command, to send out a new SLDRAM command packet. The new SLDRAM command packet is output to the command bus 912, and finally to the SLDRAM identified by the chip ID included in the SLDRAM command packet.
[0063]
As part of the scheduling process, before issuing a new command the scheduler 1006 determines the minimum amount of time that must elapse after the issuance of the previously issued command. As described above, because each hierarchical level of the SLDRAM, such as a memory bank, can have different operating characteristics (typically provided by the manufacturer), the scheduler 1006 polls each of the SLDRAMs it serves. In some embodiments, if a connected memory device does not allow polling to determine its operating characteristics, the memory-specific parameters (such as timing) can be written directly into the registers of the restriction block 1016. Once the SLDRAMs have been polled, the scheduler 1006 stores the device-specific information, which it later uses to develop the appropriate scheduling protocol. In this way, the scheduler 1006 can provide a scheduling service that adapts to any number or type of SLDRAM without hard wiring and without additional time-consuming and costly procedures.
[0064]
FIG. 11 schematically illustrates a restriction block 1100 according to an embodiment of the present invention. It should be noted that the restriction block 1100 is only one possible embodiment of the restriction block shown in FIG. 10 and should not be taken as limiting. The restriction block 1100 includes an address decoder 1102, connected to the system interface 1002, that is arranged to decode the received new address signal associated with a new memory command generated by the processor 902. The decoded new address signal provides an input to an array tag register 1104, which stores status and other relevant information for all, or possibly only a subset, of the associated SLDRAM memory banks. The array tag register 1104 provides an input to a selector 1106, which passes the data associated with the selected virtual bank to a look-up table (LUT) 1108 based on the decoded new command.
[0065]
The restriction block 1100 also includes an area comparator 1110 connected to the system interface 1002. The area comparator 1110 uses the received new address signal to supply an area identifier indicating the memory area in which the new command address is located. In this way, the restriction block 1100 can provide the best scheduling protocol for a new memory command based at least in part on the characteristic data specific to that memory area. The area comparator 1110 provides the area identifier as an input to the LUT 1108 along with the new command signal. The LUT 1108 then provides the minimum delta issue time and the data offset used to translate the new command and its associated new address into an SLDRAM command packet. It should be noted that the minimum delta issue time is the delta time (in clock cycles) after the previously issued old command at which the new command may be issued. The data offset is the delta time, in clock cycles, between issuing the new command and receiving the read data packet associated with it.
[0066]
In one embodiment of the present invention, the restriction block 1100 includes 16 array tag bank registers, and the LUT 1108 can store four different parameter sets, one for each of four timing regions, each with 16 associated registers.
[0067]
FIG. 12 is a timing diagram 1200 of SLDRAM bus signals generated in response to received processor commands according to one embodiment of the present invention. It should be noted that Table 4 summarizes the scheduling process performed by the restriction block by identifying the various generated signals. It should also be noted that a memory command takes the form {command, address}, where "command" indicates the instruction to be executed and "address" indicates the associated memory location.
[0068]
Reference is now made to Table 4 and the timing diagram 1200. During system clock cycle Φ1, the first command {open page, 1000} is received by the address decoder 1102 and, in parallel, by the area comparator 1110. In this example, the address decoder 1102 decodes the open page command address "1000" as "100" and "400", and the area comparator 1110 determines that the address lies in memory area 0. Since the open page command is the first command to be received, there is no "hit" in any of the virtual banks B0-13, and the corresponding replacement counter is set to "0". In this embodiment, the replacement counter is updated based on a pseudo-random counting method; in other embodiments, random counting or another suitable method is used. Since the first command {open page, 1000} is an open-type command, there is no associated minimum delta issue time or data offset, so the page at address 1000 is opened in the first command clock cycle ΦC1.
[0069]
During the next system clock cycle Φ2, a {read, 1000} command is received at the restriction block 1100, and the address decoder 1102 decodes its address as 100 and 400 (that is, the page at memory address 1000 that was opened in the previous clock cycle). These values cause the area comparator 1110 to set the area identifier to area 1. In this case, however, there is a "hit" at B0, because the previous command, in other words the "old command", is stored in the B0 register, and this causes the selector to output "read" as the "old command" input to the LUT 1108. The other inputs are the area indicator "area 1" issued by the area comparator 1110 and the "new command" input, which is a read. Using the stored characteristic data, the LUT 1108 generates a minimum delta issue time of three command clock cycles (Φ3). This indicates that at least three command clock cycles must separate the issuance of the {page open, 1000} command and the associated {read, 1000} command.
[0070]
In this manner, each memory command packet received in the restriction block 1100 is processed, at least in part, based on the most recently issued command according to the characteristic data stored in the LUT 1108.
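A simplified model of this processing is sketched below in Python. The class `RestrictionBlock`, its method `constrain`, and the dictionary-based LUT are hypothetical. The minimum delta issue time of 3 cycles for an open followed by a read follows the example in the text; the data offset of 4 is a placeholder, since the text gives no value.

```python
# (area, old command, new command) -> (min delta issue time, data offset)
LUT = {
    (1, "open", "read"): (3, 4),   # delta 3 per the example; offset assumed
}


class RestrictionBlock:
    """Toy model: per-bank tags plus a LUT of timing constraints."""

    def __init__(self, lut):
        self.lut = lut
        self.tags = {}   # bank -> (last command, cycle at which it issues)

    def constrain(self, command, bank, area, now):
        """Return (earliest issue cycle, data offset) for a new command."""
        old = self.tags.get(bank)
        if old is None:
            # No "hit": nothing was issued to this bank, so no constraint.
            issue, offset = now, 0
        else:
            delta, offset = self.lut[(area, old[0], command)]
            issue = max(now, old[1] + delta)
        self.tags[bank] = (command, issue)
        return issue, offset


rb = RestrictionBlock(LUT)
print(rb.constrain("open", bank=0, area=1, now=1))   # (1, 0)
print(rb.constrain("read", bank=0, area=1, now=2))   # (4, 4)
```

In the second call the read cannot issue before cycle 4, three cycles after the open that was issued in cycle 1.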
[0071]
Next, the reordering of commands received from the restriction block according to certain embodiments of the present invention will be described. FIGS. 13A-13C show timetables 1302, 1304, and 1306, which help illustrate, through a simple command reordering example, some of the benefits realized by memory command reordering in accordance with certain embodiments of the present invention. Each timetable shows four read commands directed to two different memory banks. CMD0 and CMD1 are read commands directed to bank 1 of the associated memory. CMD2 and CMD3 are read commands directed to bank 2 of the associated memory. The timetable 1302 shows the memory commands placed on the command bus connecting the memory controller and the memory in the order in which the memory controller received them from the system processor. CMD0 occupies time slot 0, CMD1 occupies time slot 3, CMD2 occupies time slot 4, and CMD3 occupies time slot 7. Each time slot represents one clock cycle.
[0072]
As mentioned above, commands to the same memory bank require a minimum delay between issuances so that the previously issued command can be processed. This is represented in FIG. 13A by the two time slots between each pair of commands. As can be seen, if the four read commands are sent to memory in the order shown in FIG. 13A, the command bus goes unused for four available clock cycles, namely time slots 1, 2, 5, and 6. As discussed below, at least some of this inefficiency is removed by command reordering in accordance with the present invention.
[0073]
The timetables 1304 and 1306 of FIGS. 13B and 13C, respectively, illustrate the reordering of the commands of FIG. 13A and at least some of the benefits obtained in accordance with certain embodiments of the present invention. In this example, data bus conflicts are not considered, for simplicity. As discussed below, however, such considerations must be taken into account for effective reordering of memory commands. Because CMD2 and CMD3 are directed to a different memory bank from CMD0 and CMD1, the memory access latency between the two pairs of commands is not an issue and may be ignored. That is, the commands can be rearranged as shown in timetable 1304, with CMD2 placed in time slot 1 immediately after CMD0 and CMD3 placed in time slot 4 immediately after CMD1. No delay is required between the issuance of CMD0 and CMD2, or between the issuance of CMD1 and CMD3, because they are directed to different memory banks. However, as shown in FIG. 13C, a minimum delay, for example two clock cycles, must still be maintained between each pair of commands directed to the same bank. That is, command reordering does not attempt to reduce the delay between successive commands to the same memory bank.
[0074]
The result of the command reordering is shown in FIG. 13C. Here, the four commands are issued within five clock cycles, and only time slot 2 goes unused. Of course, a fifth memory command directed to yet another memory bank could be inserted in time slot 2, further increasing the efficiency with which the command bus is used.
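The example of FIGS. 13A-13C can be reproduced with a short Python sketch. The greedy policy used here (in each clock cycle, issue the oldest pending command whose bank is ready) is an assumption chosen because it yields the timetables described above; it is not presented as the scheduling policy of the reordering block. The function name `assign_slots` and the parameter `same_bank_gap` are hypothetical, and data bus conflicts are ignored, as in the text.

```python
def assign_slots(commands, same_bank_gap=3, reorder=True):
    """Assign a clock cycle to each (name, bank) command.

    `same_bank_gap` = 3 leaves two idle cycles between commands to the
    same bank. With reorder=False, commands issue strictly in arrival
    order; with reorder=True, a later command to a ready bank may pass
    an earlier command that is still waiting.
    """
    pending = list(commands)
    bank_ready = {}   # bank -> first cycle at which it accepts a command
    slots = {}
    t = 0
    while pending:
        candidates = pending if reorder else pending[:1]
        for cmd in candidates:
            name, bank = cmd
            if bank_ready.get(bank, 0) <= t:
                slots[name] = t
                bank_ready[bank] = t + same_bank_gap
                pending.remove(cmd)
                break   # at most one command per clock cycle
        t += 1
    return slots


cmds = [("CMD0", 1), ("CMD1", 1), ("CMD2", 2), ("CMD3", 2)]
print(assign_slots(cmds, reorder=False))
# {'CMD0': 0, 'CMD1': 3, 'CMD2': 4, 'CMD3': 7}
print(assign_slots(cmds, reorder=True))
# {'CMD0': 0, 'CMD2': 1, 'CMD1': 3, 'CMD3': 4}
```

The reordered schedule finishes in five clock cycles and leaves only cycle 2 idle, matching the result described for FIG. 13C.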
[0075]
FIG. 14 is a block diagram of a portion of a memory controller configured in accordance with certain embodiments of the present invention. A reordering circuit 1400 receives an incoming memory command sequence from the system processor, in this example the command sequence 1, 2, 3. According to a particular embodiment, the memory commands are transmitted to the reordering circuit 1400 via a restriction circuit (not shown) which, as described above, attaches issue time constraints to a selected command in response to other commands directed to the same logical bank of the memory. The commands are reordered in a command queue 1402, from which they are issued to memory. In this example, the commands are reordered into the order 1, 3, 2.
[0076]
The original memory command sequence, i.e., 1, 2, 3, is stored in FIFO memory 1404 in the data reading circuit 1406. The sequence in FIFO 1404 is used to reorder the data received from the memory so that it corresponds to the order in which the commands were originally received by the memory controller. This is necessary because the processor "expects" to receive the data in an order corresponding to the order in which the commands were originally transmitted to the memory controller. It should be noted, however, that some processors expect in-order data while others expect out-of-order data, so turning FIFO 1404 on and off as needed allows either type of data ordering to be supported.
[0077]
Further, a third sequence is stored in data queue 1408, because data from the memory may be received by the memory controller in a sequence that does not correspond to the sequence in which the processor transmitted the memory commands. This sequence (3, 1, 2 in this example) represents the order in which the data corresponding to the command sequence 1, 3, 2 will be received by data reading circuit 1406. The data queue sequence is calculated by the reordering circuit 1400 based on the command queue sequence and the known latencies associated with the various logical banks of the memory. When the memory transmits data to the memory controller in the sequence stored in data queue 1408 (i.e., 3, 1, 2), the data is stored in read data buffer 1410 and, based on the information in FIFO 1404 and data queue 1408, is reordered for transmission to the processor in an order corresponding to the original command sequence, i.e., 1, 2, 3.
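The role of the FIFO and the read data buffer can be sketched as follows. This is an illustrative model only: the function name `restore_order` and the use of a dictionary as the read data buffer are assumptions, and the arrival order 3, 1, 2 is taken from the example in the text.

```python
from collections import deque


def restore_order(original_order, arrivals):
    """Return data in the order in which the commands were first received.

    original_order: command IDs as stored in the FIFO (e.g. [1, 2, 3]).
    arrivals: (command_id, data) pairs in the order the memory returns them.
    """
    fifo = deque(original_order)   # order expected by the processor
    read_buffer = {}               # command ID -> data held back so far
    delivered = []
    for command_id, data in arrivals:
        read_buffer[command_id] = data
        # Release data while the head of the FIFO is already buffered.
        while fifo and fifo[0] in read_buffer:
            delivered.append(read_buffer.pop(fifo.popleft()))
    return delivered


print(restore_order([1, 2, 3], [(3, "data3"), (1, "data1"), (2, "data2")]))
```

Switching the FIFO off, as mentioned above for processors that expect out-of-order data, would correspond to delivering each item as soon as it arrives.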
[0078]
FIG. 15 is a block diagram of a reordering circuit 1500 in a memory controller configured according to a particular embodiment of the present invention. The reordering circuit 1500 includes a command queue 1502 for storing and reordering commands received from the system processor. The command queue 1502 calculates the issue time of each command using the command issue time constraints associated with commands directed to the same logical bank in the memory and the data bus usage constraints, issues the commands, and removes issued commands from the queue.
[0079]
The data queue 1504 stores data elements indicating the data occurrence times corresponding to issued memory commands, calculates a new data occurrence time for each new entry in the queue, and removes a queue entry when the corresponding memory transaction is completed.
[0080]
The comparator matrix 1506 performs a collision detection function. In this function, the data occurrence time of a command that is ready to be issued from the command queue 1502 (transmitted via the multiplexer 1508) is compared with the data occurrence times of the previously issued commands represented in the data queue 1504. If a collision is detected, issuance of the command is postponed.
[0081]
FIG. 16 is a more detailed block diagram of the reordering circuit 1500 of FIG. 15. As shown in the diagram of FIG. 17, command queue 1502 includes six command queue elements 1602, each of which stores 61 bits of information regarding a particular memory command. Command field 1702 contains a 40-bit memory command packet specifying the memory command. The command issue time (Cd) field 1704 is a 6-bit field that indicates the delta time, in clock cycles, before the command may be issued. The value of field 1704 is determined by the limiting circuit described above and is relative to the most recent memory command directed to the same logical bank in the memory. That is, the value of the Cd field represents the latency between two commands to the same bank. Information about the latency required for each bank is stored in the limiting circuit and is largely determined by the physical characteristics of the memory. In the command queue, the Cd field is decremented once per clock cycle, with some exceptions. For example, the latency between successive commands to the same logical bank is not altered. Thus, if the Cd field of a command directed to a particular bank reaches zero and the command is not issued, the Cd fields of other commands to the same bank are not decremented until the first command is issued.
[0082]
The data occurrence time (Dd) field 1706 is a 6-bit field that indicates the delta time, in clock cycles, between the issuance of a memory command from the command queue and the transfer of the corresponding data. Dd field 1706 is not altered while the command is in the command queue. The command ID field 1708 is a 5-bit field that uniquely identifies the command in the command packet 1702. This information is used together with the corresponding information in the FIFO and the data queue to keep track of which data corresponds to which command packet, so that command and data reordering can work. The logical bank (B) field 1710 is a 3-bit field that identifies the logical bank in the memory to which the command packet is directed. Finally, the burst indicator (Db) field 1712 is a one-bit field that indicates whether the data requested or written occupies one or two clock cycles.
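The 61-bit command queue element can be modeled as a packed bit field, as in the sketch below. The field widths come from the text (40 + 6 + 6 + 5 + 3 + 1 = 61); the packing order, the field names, and the helper names `pack` and `unpack` are assumptions made for illustration.

```python
# Field widths are from the text; the packing order is an assumption.
FIELDS = [("command", 40), ("cd", 6), ("dd", 6),
          ("cmd_id", 5), ("bank", 3), ("db", 1)]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 61


def pack(values):
    """Pack a dict of field values into one 61-bit integer."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a packed 61-bit integer back into its fields."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


element = {"command": 0xABCDEF0123, "cd": 5, "dd": 12,
           "cmd_id": 17, "bank": 6, "db": 1}
assert unpack(pack(element)) == element
```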
[0083]
Referring back to FIG. 16, the operation of the command queue is controlled by the command queue controller 1604. Controller 1604 uses the empty position identification unit 1606 to keep track of which command queue elements 1602 are available, and thereby controls insertion of an incoming command into a specific queue element. Controller 1604 also facilitates insertion of command queue element information into data queue 1504 when the corresponding command is issued. According to a particular embodiment, commands are inserted into the command queue 1502 regardless of the availability of free time slots on the command bus or data bus.
[0084]
A command may be issued to the command bus from any one of the command queue elements 1602 via multiplexer 1608 if its Cd count is zero and there is no collision on the data bus. That is, free time slots on the command bus and / or data bus must be identified. If the command is not a read or write (and therefore does not require data bus resources), only a command bus time slot is needed. If the command is a read or a write, slots for both the command bus and the data bus are required.
[0085]
The zero comparator 1610 of the controller 1604 is used to make the first determination, i.e., whether Cd = 0. Subtractor 1612 is used to subtract "1" from the Cd count of each command queue element 1602 at each clock cycle, except in the case noted above, i.e., when Cd = 0 for a particular command that cannot be issued. In that case, the queue controller 1604 uses the Cd and B fields of all queue elements to generate a mask signal (M) that prevents the Cd counts of all commands to the same logical bank from being decremented.
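The decrement rule and its exception can be sketched as a per-cycle update. The representation of a queue element as a dictionary and the function name `tick` are assumptions; the set of masked banks plays the role of the mask signal (M).

```python
def tick(elements, issued=None):
    """Advance all Cd counters by one clock cycle.

    elements: list of dicts with 'cd' (cycles left) and 'bank' keys.
    issued: index of the element issued this cycle, or None.
    Returns the set of banks whose counters were masked this cycle.
    """
    # A bank is masked if it holds a ready command (Cd == 0) that
    # was not issued in this cycle.
    masked = {e["bank"] for i, e in enumerate(elements)
              if e["cd"] == 0 and i != issued}
    for i, e in enumerate(elements):
        if i == issued:
            continue  # the caller removes the issued element
        if e["cd"] > 0 and e["bank"] not in masked:
            e["cd"] -= 1
    return masked


queue = [{"cd": 0, "bank": 1}, {"cd": 2, "bank": 1}, {"cd": 2, "bank": 2}]
tick(queue)              # element 0 is stalled, so bank 1 is masked
print(queue)
```

In this example only the bank 2 counter decreases while the ready command to bank 1 remains unissued.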
[0086]
According to a particular embodiment, if there are two queue elements with Cd = 0, the one with the highest priority (e.g., the oldest) is issued. The address shifter 1614 prioritizes the commands in the queue, as discussed in detail below with reference to FIG. 18. According to another particular embodiment, if a new command arrives at the command queue and its Cd count is already zero, it is transmitted directly to the memory via multiplexer 1608. The new command is stored in a command queue element 1602 if its Cd count is not zero, or if another command with higher priority and Cd = 0 is already stored in the command queue. However, if the command queue is empty, a new command is issued immediately (provided its Cd is zero).
[0087]
For read and write commands, collisions are detected using the Dd and Db fields of the command queue element 1602 containing the command that is ready to be issued. The data occurrence time and duration corresponding to that command are transmitted to comparator matrix 1506 via multiplexer 1508, which is controlled by queue controller 1604. That is, queue controller 1604 controls multiplexer 1508 to transmit the data occurrence time and duration of the queue element whose command issue time, i.e., Cd, is zero. The duration is either one or two clock cycles, and is determined by adding the Db bit to the data occurrence time Dd using adder 1616; that is, either "0" (representing one clock cycle) or "1" (representing two clock cycles) is added. The data occurrence time and duration are then compared in comparator matrix 1506 with the data occurrence times and durations of the five previously issued commands stored in data queue 1504. According to a particular embodiment, comparator matrix 1506 comprises a 2 × 10 matrix of parallel comparators.
[0088]
FIG. 18 is a block diagram of a specific embodiment of the address shifter 1614 of FIG. 16. As described above, the address shifter 1614 determines the priority of the commands. As also described above, a new command is inserted into any free command queue element 1602 as indicated by the empty position identification unit 1606. The address of the command queue element 1602 into which the new command is inserted is placed in the first empty address shifter position (A0 to A5) with the highest priority. As a result, position A0 of the address shifter stores the queue element address of the oldest unissued command. When a command is issued from the command queue, the corresponding entry in address shifter 1614 is removed and the addresses of the lower priority commands are shifted to higher priority positions. As described above, a command is issued when its Cd count in the command queue reaches zero. However, when more than one command has Cd = 0, the oldest command, i.e., the command with the highest priority as indicated by the position of its address in the address shifter 1614, is issued.
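The behavior of the address shifter can be sketched with an ordered list in which index 0 stands for position A0. The class and method names below are hypothetical; only the six positions and the oldest-first rule come from the text.

```python
class AddressShifter:
    """Keeps command queue element addresses in age order (A0 is oldest)."""

    def __init__(self, size=6):
        self.size = size
        self.positions = []          # positions[0] corresponds to A0

    def insert(self, element_address):
        """Record the queue element that received a new command."""
        if len(self.positions) >= self.size:
            raise OverflowError("command queue is full")
        self.positions.append(element_address)

    def select(self, ready_addresses):
        """Return the oldest address whose command has Cd == 0, or None."""
        for address in self.positions:
            if address in ready_addresses:
                return address
        return None

    def issue(self, element_address):
        """Remove an issued entry; younger entries move up in priority."""
        self.positions.remove(element_address)


shifter = AddressShifter()
for address in (4, 0, 2):            # commands stored in elements 4, 0, 2
    shifter.insert(address)
chosen = shifter.select({0, 2})      # both are ready; element 0 is older
shifter.issue(chosen)
print(chosen, shifter.positions)
```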
[0089]
The data queue 1504 of FIG. 16 includes five queue elements 1652, each of which contains 12 bits of information regarding a previously issued memory command, as illustrated in FIG. 19. Data occurrence time (Dd) field 1902 is a 6-bit field indicating the delta time, in clock cycles, between the issuance of a command from the command queue and the reception of the corresponding data. The Dd count of each data queue element 1652 is decremented every clock cycle using one of the subtractors 1654 until its value reaches zero. When Dd = 0, the corresponding data is on the data bus. Thus, it will be appreciated that there is only one data queue element 1652 with Dd = 0 at any given time. After the Dd count reaches zero, the information in the corresponding data queue element is removed from data queue 1504.
[0090]
The command ID field 1904 is a 5-bit field that uniquely identifies the issued command to which the data corresponds. This information is used to reorder the data so that it corresponds to the order in which the commands were originally transmitted to the memory controller. Finally, the burst indicator (Db) field 1906 is a one-bit field that indicates whether the data occupies one or two clock cycles.
[0091]
Returning to FIG. 16, as discussed above, the data occurrence time (Dd) and duration of each of the data queue elements 1652 are compared in comparator matrix 1506 with the Dd and duration of the command that is ready to be issued, i.e., the command in command queue 1502 with Cd = 0. The duration is either one or two clock cycles, and is determined by adding the Db bit to the data occurrence time Dd using adder 1616; that is, either "0" (representing one clock cycle) or "1" (representing two clock cycles) is added. If the comparison shows no collision on the data bus, the command is issued from the command queue.
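The one-dimensional collision check can be sketched as an interval comparison. The sketch assumes that a transfer occupies the data bus from cycle Dd through cycle Dd + Db, counted from the current clock cycle; the function names and tuple layout are hypothetical.

```python
def data_window(dd, db):
    """Return the (first, last) data bus cycles used by a transfer.

    Db = 0 means one cycle and Db = 1 means two cycles, so the last
    occupied cycle is Dd + Db (the adder result described in the text).
    """
    return dd, dd + db


def collides(candidate, data_queue):
    """Check a ready command against all previously issued commands.

    candidate: (Dd, Db) of the command with Cd == 0.
    data_queue: list of (Dd, Db) pairs for issued commands.
    """
    first, last = data_window(*candidate)
    for entry in data_queue:
        q_first, q_last = data_window(*entry)
        if first <= q_last and q_first <= last:   # windows overlap
            return True
    return False


issued = [(5, 1)]                    # occupies cycles 5 and 6
print(collides((4, 1), issued))      # cycles 4 and 5 overlap with 5
print(collides((7, 0), issued))      # cycle 7 is free
```

A command for which `collides` returns `True` would be held back, corresponding to the postponed issuance described above.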
[0092]
Data queue controller 1658 controls the operation of data queue 1504. The empty position identification unit 1660, together with command queue controller 1604, facilitates insertion of new data queue element information into the data queue elements 1652. It also facilitates removal of information from a data queue element 1652 when the corresponding memory access is completed. Zero comparator 1662 and burst identifier 1664 are used to determine when Dd for any of the data queue elements 1652 reaches zero and when the data transfer no longer occupies the data bus, and therefore when the corresponding information should be removed from the data queue.
[0093]
According to another particular embodiment of the present invention, collision detection is made more sophisticated through the use of a two-dimensional array of comparators and multiplexers. This approach is more silicon intensive than the one-dimensional approach described above, and it examines all elements in the command queue, not just the one element holding the ready-to-issue command. It schedules commands taking into account not only previously issued commands but also the order of data packets on the data bus.
[0094]
In order to insert a new command, each combination of two consecutive stages in the to-be-issued portion of the command pipe must be compared to see if a new command can be inserted between them. This comparison actually determines the range in which the command can be inserted. This range is as shown below.
[0095]
CLENX = command length
Tcstart = tcA + CLENA (1)
Tcend = tcB (2)
[0096]
Here, tcA and tcB are the issue times of consecutive pipeline elements A and B. Pipeline element A precedes pipeline element B, so its issue time is the lower of the two. For an insertion to be possible, there must of course be at least one open slot between elements A and B. Therefore:
[0097]
N = Tcend−Tcstart + 1 (3)
(Where N = number of issue slots between elements AB)
CLEN <= tcB - tcA - CLENA (4)
[0098]
In hardware, it is easy to simply implement the following conditions:
[0099]
(tcB - CLEN) - (tcA + CLENA) >= 0 (5)
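The insertion test of condition (4) can be written out as a small sketch. It assumes that issue times and command lengths are measured in command bus slots; the function and parameter names are hypothetical.

```python
def insertion_range(tc_a, clen_a, tc_b):
    """Equations (1) and (2): start and end of the candidate range."""
    tc_start = tc_a + clen_a     # first slot after element A
    tc_end = tc_b                # slot at which element B issues
    return tc_start, tc_end


def can_insert(tc_a, clen_a, tc_b, clen_new):
    """Condition (4): the new command fits between elements A and B."""
    return clen_new <= tc_b - tc_a - clen_a


# Element A issues at slot 0 and is one slot long; B issues at slot 2.
print(insertion_range(0, 1, 2))      # candidate range starts at slot 1
print(can_insert(0, 1, 2, 1))        # a one-slot command fits
print(can_insert(0, 1, 1, 1))        # A and B are adjacent: no room
```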
[0100]
The start and end points of the range also specify the possible range of the associated data slot. This range must be compared to each successive element in the datapipe to see if there is any overlap and what the new range will be. There are five different cases in this comparison.
[0101]
Case 0:
In this case, the range delimited by data slots tdA and tdB is completely outside the range of two consecutive elements M and N. In this case, therefore:
[0102]
tdA + CLENA >= tdN (6)
Or, if DLENX = length of data,
tdB <= tdM + DLENM (7)
[0103]
There are no possible data slots between the M and N pairs.
[0104]
Case 1:
In this case, the range defined by data slots tdA and tdB is completely inside the range of two consecutive elements M and N. In this case, therefore:
[0105]
tdA + CLENA >= tdM + DLENM (8)
And,
tdB-CLEN + DLEN <= tdN (where CLEN is the length of a new command in a slot and DLEN is the length of new data in a slot) (9)
[0106]
The earliest possible data slot time in this case is tdA + CLENA, with a corresponding command issue time of tcA + CLENA.
[0107]
Case 2:
In this case, the range defined by data slots tdA and tdB corresponds to the length of element M. In this case, therefore:
[0108]
tdA + CLENA < tdM + DLENM (10)
And,
tdB - CLEN + DLEN > tdM + DLENM, and tdB - CLEN + DLEN < tdN (11)
[0109]
The earliest possible data slot time in this case is tdM + DLENM + 1, with a corresponding command issue time of tcM + CLENM - DATA-OFFSET. Here, DATA-OFFSET is the time between the command issue time and the data occupation time.
[0110]
Case 3:
In this case, the range defined by data slots tdA and tdB corresponds to the length of element N. In this case, therefore:
[0111]
tdA + CLENA > tdM + DLENM (12)
And,
tdA + CLENA + DLEN < tdN (13)
[0112]
Thus, the earliest possible data slot time in this case is tdA + CLENA, with a corresponding command issue time of tcA + CLENA + 1. It should be noted that this case also includes Case 1.
[0113]
Case 4:
In this case, the range defined by data slots tdA and tdB includes the range defined by elements M and N. In this case, therefore:
[0114]
tdA + CLENA < tdM + DLENM (14)
And,
tdB - CLEN > tdN (15)
[0115]
Therefore, the earliest possible data slot time in this case is tdM + DLENM, with a corresponding command issue time of tcM + CLENA + DATA-OFFSET. Here, DATA-OFFSET = tdA - tcA.
[0116]
For the purpose of scheduling such that the earliest possible slot is always chosen, it is clear that Case 1 and Case 3 are identical. The combined case is therefore Case 3. Similarly, Case 2 and Case 4 are identical, because the required result in both is tdM + DLENM. In this case, it must be explicitly shown that tdM falls within the range given by tdA and tdB. In addition, the earliest possible issue time (tc) and data slot (td) for the incoming command must be considered. For each command pipe pair, the comparison that must be made against each data pipe pair is as follows:
[0117]
(Equation 1)

[0118]
Therefore, the operations required for the command pipe are:
[0119]
(Equation 2)

[0120]
Similarly, the operations required for a data pipe are:
[0121]
(Equation 3)

[0122]
The decision logic therefore consists of the matrix of comparators defined above. The best choice is the earliest command issuance time, which is determined by a simple priority encoder.
[0123]
The reordering pipe control logic must dynamically determine which operation is to be performed on each element of the command pipe and the data pipe.
[0124]
In the to-be-issued command pipe, each pipe element can perform one of four operations: read from the previous element (the pipe advances), hold its current contents (the pipe holds), read from the next element (the pipe backs up), or read from the incoming command bus. At various points in the pipe, defined by the four cases below, several sets of conditions can exist. The element that issues to the SLiMAC is defined as element 0, and the element furthest from issuance is defined as element M. If the reordering logic finds that the optimal insertion point in the current pipeline is between elements N-1 and N, the insertion is made into element N.
[0125]
Case 1-Hold:
If there is no issue to the SLiMAC or insertion of a new command, the pipe is held.
[0126]
Case 2-Hold & Insert:
In this case, there is no issue to the SLiMAC, but there is an insertion of a new command into the pipe. If the insertion occurs at element N, the pipe holds elements 0 through N-1 and inserts into element N, backing up elements N + 1 through M.
[0127]
Case 3-Issue:
In this case, there is an issue from element 0 to the SLiMAC, and the rest of the pipe advances so that element 0 contains the contents of element 1, element 1 contains the contents of element 2, and so on, until element M-1 contains the contents of element M.
[0128]
Case 4-Issue & Insert:
In this case, there is an issue from element 0 to the SLiMAC and an insertion at element N. Elements 0 through N-2 perform an advance operation, element N-1 performs an insertion operation, and elements N through M hold. Because an advancing element stores the data of the element immediately following it, an insertion at element N (i.e., between elements N-1 and N of the current pipe) in effect means that the inserted element ends up at location N-1 of the updated pipe.
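The four pipe control cases can be illustrated with a list in which index 0 is the element that issues to the SLiMAC. This is a behavioral sketch only: the hardware performs per-element advance, hold, back-up, and load operations, whereas the hypothetical function `update_pipe` below simply computes the resulting pipe contents.

```python
def update_pipe(pipe, issue=False, insert_at=None, new_cmd=None):
    """Return (issued_command, updated_pipe) for one clock cycle.

    pipe: list of commands, index 0 being the next to issue.
    issue: True if element 0 issues this cycle.
    insert_at: index N in the current pipe where new_cmd is inserted
    (between elements N-1 and N), or None for no insertion.
    """
    updated = list(pipe)
    if insert_at is not None:
        # Elements N..M back up by one position to make room.
        updated.insert(insert_at, new_cmd)
    issued = updated.pop(0) if issue else None   # the rest advances
    return issued, updated


pipe = ["a", "b", "c", "d"]
print(update_pipe(pipe))                                   # Case 1: hold
print(update_pipe(pipe, insert_at=2, new_cmd="X"))         # Case 2
print(update_pipe(pipe, issue=True))                       # Case 3
print(update_pipe(pipe, issue=True, insert_at=2, new_cmd="X"))  # Case 4
```

In the last call the inserted command ends up at index 1, i.e., at location N-1 of the updated pipe, as described for Case 4.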
[0129]
FIG. 20 illustrates a collision detection system 2000, which is another implementation of the collision detection system 1500 shown in FIG. 15. In this embodiment, the collision detection system 2000 reorders commands to obtain an optimal command sequence based on target response restrictions, and determines the optimal slot for data transfer between the initiator controller and the target subsystem. Because the reordering of commands must not cause collisions between different data packets on the data bus, a collision detector 2002 is required that prevents the issuance of a particular command if the data transfer associated with that command would cause a data conflict. In this embodiment, the collision detection system 2000 includes the collision detector 2002 coupled to a command queue 2004.
[0130]
In this embodiment, the collision detector 2002 detects all possible data collisions between a "to be issued" command (stored in the command queue 2004) and "already issued" commands (stored in the data queue 2006). In this embodiment, there are N command queues 2004, each coupled to a multiplexer 2008. Each of the N command queues 2004 is configured to store the commands to be issued, a time factor "d-timeND" indicating when the data transfer will appear on the data bus between the universal controller 104 and the target device (i.e., shared resource) 108 after the command is issued to the target device, a burst bit (bND) indicating a burst data transfer, and a read/write bit (rwND). The data queue 2006 stores a time factor "d-timeD" indicating when the data transfer will appear on the data bus between the universal controller 104 and the target device (i.e., shared resource) 108 for requests that have already been issued to the target device. The data queue 2006 also stores a burst bit (bD) and a read/write bit (rwD).
[0131]
In a more preferred embodiment, the collision detection system 2000 includes a queue and link controller unit 2010 configured to store and reorder the commands to be issued. The queue and link controller unit 2010 also calculates the issue time of a new command and the time at which its data appears on the data bus. After a command is issued, the queue and link controller unit 2010 transfers the issued element from the command queue to the data queue and, at the same time, removes it from the command queue. The queue and link controller unit 2010 also removes the data element from the data queue after the access to the memory is completed.
[0132]
Referring to FIG. 21, every read/write command to the target device is associated with a data packet transfer. Before a command is issued to the target device, its new data packet ND (New Data) is checked against its timing information to see whether it can be inserted into the data queue without a collision. In the example shown in FIG. 21, an issued data packet D already occupies a position in the data queue, and the new data packet ND is compared against the issued data packet D. Note that both the issued data packet D and the new data packet ND represent burst accesses. In this example, therefore, there are two ways in which the new data packet ND can be positioned relative to the issued data packet D without causing a data collision: the new data packet ND can occupy a position to the left or to the right of the issued data packet D.
[0133]
This particular example illustrates collision detection for a memory controller that supports both non-burst data transfers and burst data transfers (ie, four data streams). Due to the bidirectional nature of the data bus, one clock cycle must be inserted between successive read-write or write-read transfers.
[0134]
It should be noted that there are many possible results, some of which are listed below.
[0135]
1) If ND is placed after or before D, no collision occurs.
[0136]
2) One clock cycle must be inserted between successive read-write or write-read data transfers. All elements of the command queue and the data queue store an "rw" bit indicating whether the operation is "read data" (rw = 0) or "write data" (rw = 1).
[0137]
3) A data packet is composed of one data stream (non-burst transfer) or four data streams (burst transfer). All elements of the command queue and the data queue store a "burst" bit indicating whether the operation is "burst transfer" (burst = 1) or "non-burst transfer" (burst = 0).
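The three observations above can be combined into a pairwise check between a new data packet ND and an issued data packet D. The sketch below is not the comparison of Equation 4; it is an assumed model in which each data stream occupies one clock cycle, so that a burst packet lasts four cycles and a non-burst packet lasts one. The function names and the tuple layout are hypothetical.

```python
def packet_length(burst):
    """burst = 1: four data streams; burst = 0: one data stream."""
    return 4 if burst else 1


def packets_collide(nd, d):
    """Check a new packet ND against an issued packet D.

    nd, d: (start_cycle, burst, rw) tuples, where rw = 0 means read
    and rw = 1 means write. One idle cycle is required between a read
    and a write because the data bus is bidirectional.
    """
    nd_start, nd_burst, nd_rw = nd
    d_start, d_burst, d_rw = d
    gap = 1 if nd_rw != d_rw else 0
    nd_end = nd_start + packet_length(nd_burst)  # first cycle after ND
    d_end = d_start + packet_length(d_burst)     # first cycle after D
    # No collision only if ND lies entirely before D or entirely after D.
    return not (nd_end + gap <= d_start or d_end + gap <= nd_start)


d = (10, 1, 0)                          # burst read on cycles 10 to 13
print(packets_collide((6, 1, 0), d))    # burst read just before D
print(packets_collide((6, 1, 1), d))    # burst write needs a gap cycle
```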
[0138]
The comparison to be made for the to-be-issued command in each of the to-be-issued and issued data packet pairs is as follows:
[0139]
(Equation 4)

[0140]
In yet another embodiment of the present invention, an apparatus and method for predicting the time between two consecutive memory accesses is disclosed. This apparatus and method allows fast calculation of the earliest "command issue time" for a new command. Referring to FIG. 22, there is illustrated a prediction system 2200 having N page timers 2202 that store the time between the most recent command issued to a particular page and a predicted next access to that memory. The next access to the same page may be "close", "open", "read", or "write". An incoming new command (e.g., a read) selects one particular page timer, which indicates how long the access to that page must wait before being issued. The new command then selects the appropriate contents of the timing lookup table 2204, which are inserted into the page timer for the next possible accesses (close, open, write, read) to the same page as this command (read). The resolution of the timers is one clock cycle.
[0141]
The "timing lookup table - data" stores a time indicating how many cycles after the command is issued the data on the data bus will be valid. When no new command is active, the value of every page timer is decremented each cycle until it reaches "0".
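The page timer mechanism can be sketched as follows. The sketch assumes that each page timer holds one countdown per possible next access type, and the cycle counts in `TIMING` are invented placeholder values, not figures from the specification. The class and method names are likewise hypothetical.

```python
ACCESSES = ("close", "open", "read", "write")

# Hypothetical timing lookup table: cycles that must elapse after a
# command of the row type before an access of the column type may be
# issued to the same page. The values are placeholders.
TIMING = {
    "open":  {"close": 4, "open": 6, "read": 2, "write": 2},
    "read":  {"close": 3, "open": 5, "read": 1, "write": 2},
    "write": {"close": 3, "open": 5, "read": 2, "write": 1},
    "close": {"close": 1, "open": 3, "read": 4, "write": 4},
}


class PageTimers:
    def __init__(self, pages):
        self.timers = [dict.fromkeys(ACCESSES, 0) for _ in range(pages)]

    def wait_for(self, page, access):
        """Cycles the given access to this page must still wait."""
        return self.timers[page][access]

    def issue(self, page, access):
        """Load the timer with the table row selected by the command."""
        self.timers[page] = dict(TIMING[access])

    def tick(self):
        """One clock cycle: every non-zero timer counts down toward 0."""
        for timer in self.timers:
            for access in ACCESSES:
                if timer[access] > 0:
                    timer[access] -= 1


timers = PageTimers(pages=2)
timers.issue(0, "open")
print(timers.wait_for(0, "read"))    # wait taken from the table row
timers.tick()
print(timers.wait_for(0, "read"))    # one cycle less after a tick
```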
[0142]
Referring now to FIG. 23, in yet another embodiment of the present invention, there is shown a device controller 2300 having a device access priority determiner 2302. In this embodiment, the priority determiner 2302 includes a request queue 2303, suitable for receiving and storing any number of device requests, coupled to a request controller unit 2304. The request controller unit 2304 is used, in part, to fetch a particular request from any location in the request queue 2303 and to communicate the fetched request to the appropriate one of the plurality of shared devices 108. In this embodiment, the priority determiner 2302 also includes a response queue 2306, configured to receive and store responses from any of the shared devices 108, coupled to a response controller unit 2308. The response controller unit 2308 is used to select, from the stored responses, a particular response to be sent to the requesting device 102.
[0143]
In a preferred embodiment, each response and each request is associated with an ID number 150, such that a request and its associated response have the same ID number 150, as shown in FIG. 1E. As described above, the ID number 150 includes five data bits, the first and second of which form a group selector field 152 that identifies the group of requesting devices to which the particular response/request belongs (e.g., a group of processors in a multiprocessor computing environment). Further, as described above, the request number field (RN) 153 represents the number of the request and/or response associated with the group of requesting devices identified by the group selector field 152, such that, for example, consecutive requests from the same requesting device have consecutive request number fields 153.
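The five-bit ID number can be illustrated with a small encoding sketch. It assumes that the two group selector bits are the most significant bits and that the remaining three bits form the request number; the bit positions and the function names are assumptions rather than details given in the text.

```python
GROUP_BITS = 2      # group selector field 152
REQUEST_BITS = 3    # request number field 153 (assumed remaining bits)


def make_id(group, request_number):
    """Combine group selector and request number into a 5-bit ID."""
    if not 0 <= group < (1 << GROUP_BITS):
        raise ValueError("group selector must fit in 2 bits")
    if not 0 <= request_number < (1 << REQUEST_BITS):
        raise ValueError("request number must fit in 3 bits")
    return (group << REQUEST_BITS) | request_number


def split_id(id_number):
    """Return (group, request_number) for a 5-bit ID."""
    return id_number >> REQUEST_BITS, id_number & ((1 << REQUEST_BITS) - 1)


# A request and its response carry the same ID number.
request_id = make_id(group=2, request_number=5)
print(request_id, split_id(request_id))
```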
[0144]
In operation, the request controller 2304 and the response controller 2308 each incorporate a group priority selector register 154, a dynamic lock counter register 156, and a reordering selector 2312. The group priority selector register 154 contains priority information for the particular request/response group identified by the group selector field 152. In one embodiment, a value of "3" represents the highest priority and a value of "0" represents the lowest priority, and a higher priority request can jump over a lower priority request.
[0145]
To avoid a dynamic lock situation, the dynamic lock counter register 156 contains information about how many consecutive higher priority requests (or responses) can jump over lower priority requests (or responses). It should be noted that the dynamic lock counter register 156 is active only in situations where a higher priority request jumps over a lower priority request. Indeed, if there is no lower priority request (or response) in the appropriate queue, the dynamic lock counter register 156 is inactive.
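The interaction between group priorities and the dynamic lock counter can be sketched as a selection function. This is one possible reading of the text: the oldest request may be jumped over by a higher priority request at most `lock_limit` consecutive times. The function name, the queue representation, and the tie-breaking rule (oldest first among equal priorities) are assumptions.

```python
def select_next(queue, group_priority, lock_limit, skipped):
    """Choose which queued request to forward next.

    queue: list of group numbers, oldest request first.
    group_priority: dict mapping group -> priority (3 highest, 0 lowest).
    lock_limit: how many consecutive times the oldest request may be
    jumped over by higher priority requests.
    skipped: how many times it has been jumped over so far.
    Returns (index_of_selected_request, new_skipped_count).
    """
    oldest_priority = group_priority[queue[0]]
    # Highest priority wins; among equal priorities the oldest wins.
    best = max(range(len(queue)),
               key=lambda i: (group_priority[queue[i]], -i))
    if group_priority[queue[best]] > oldest_priority and skipped < lock_limit:
        return best, skipped + 1     # higher priority request jumps over
    return 0, 0                      # serve the oldest; counter resets


priorities = {0: 0, 1: 3}
print(select_next([0, 1, 1], priorities, lock_limit=2, skipped=0))
print(select_next([0, 1, 1], priorities, lock_limit=2, skipped=2))
```

When the queue holds no lower priority request, the function always returns the oldest entry with a count of zero, which corresponds to the inactive state of the counter.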
[0146]
While several embodiments of the invention have been described in detail, it will be appreciated that the invention may be embodied in many other specific forms without departing from the spirit and scope of the invention. Therefore, the examples shown here are for explanation and not for limitation. The invention is not limited to the details shown here, but may vary within the scope of the appended claims.
[Brief description of the drawings]
FIG. 1A
FIG. 4 is a diagram illustrating a general use example of a universal controller according to an embodiment of the present invention.
FIG. 1B
FIG. 1B is a diagram showing a specific usage example of the universal controller shown in FIG. 1A.
FIG. 1C
FIG. 4 illustrates an address space controller coupled to a universal controller according to an embodiment of the present invention.
FIG. 1D
FIG. 2 is a diagram showing a specific usage example of the address space controller shown in FIG. 1C.
FIG. 1E
FIG. 4 illustrates an exemplary request / response identification number according to an embodiment of the present invention.
FIG. 2A
FIG. 4 is a diagram illustrating a general universal command according to an embodiment of the present invention.
FIG. 2B
FIG. 2B is a diagram showing a specific universal command suitable for a memory page read command, which is the same kind of universal command as in FIG. 2A.
FIG. 2C
FIG. 3B illustrates an example of a series of commands formed by providing appropriate time intervals between command components in the example command of FIG. 2B.
FIG. 3
FIG. 4 illustrates a resource tag according to an embodiment of the present invention.
FIG. 4
5 is a flowchart illustrating details of a process for a universal controller to access a shared resource according to an embodiment of the present invention;
FIG. 5
FIG. 4 illustrates a process by which a universal controller determines the state of resources and the order of operations to be performed, according to an embodiment of the present invention.
FIG. 6
FIG. 4 illustrates a step of determining an appropriate time interval between successive operations by a universal controller based on a step according to an embodiment of the present invention.
FIG. 7A
FIG. 3 illustrates a page hit / miss controller according to an embodiment of the present invention.
FIG. 7B
FIG. 3 illustrates a page hit / miss controller according to an embodiment of the present invention.
FIG. 8
FIG. 4 is a diagram illustrating a bank access controller according to an embodiment of the present invention.
FIG. 9A
FIG. 1 illustrates an exemplary SLDRAM-type multiprocessor system according to an embodiment of the present invention.
FIG. 9B
FIG. 9B is a timing diagram illustrating a process flow of the exemplary SLDRAM bus by the multiprocessor system shown in FIG. 9A.
FIG. 10
A block diagram of a memory controller according to an embodiment of the present invention.
FIG. 11
A block diagram of a restriction block according to an embodiment of the present invention.
FIG.
A timing diagram of an exemplary SLDRAM command according to an embodiment of the present invention.
FIG. 13A
A diagram illustrating the flow of memory command reordering according to a specific embodiment of the present invention.
FIG. 13B
A diagram illustrating the flow of memory command reordering according to a specific embodiment of the present invention.
FIG. 13C
A diagram illustrating the flow of memory command reordering according to a specific embodiment of the present invention.
FIG. 14
A block diagram showing part of a memory controller configured according to a specific embodiment of the present invention.
FIG.
A block diagram of a reordering circuit configured according to a specific embodiment of the present invention.
FIG.
A more detailed block diagram of the reordering circuit of FIG.
FIG.
A diagram illustrating the contents of a command queue element according to a specific embodiment of the present invention.
FIG.
A block diagram illustrating a specific embodiment of an address shifter.
FIG.
A diagram illustrating the contents of a data queue element according to a specific embodiment of the present invention.
FIG.
A diagram illustrating a collision detection system that is another implementation of the collision detection system illustrated in FIG. 15.
FIG. 21
An exemplary timing diagram illustrating how each read/write command to a target device is related to the transmission of a data packet.
FIG.
A diagram illustrating a prediction system having N page timers that store the time between the last command issued to a particular page and the predicted next access to that memory.
FIG. 23
A diagram illustrating a device controller having a device access priority determiner according to an embodiment of the present invention.
FIG. 24
Table 4, summarizing the scheduling process performed by the restriction block according to an embodiment of the present invention.

Claims

A universal resource access controller coupled to a requesting system and a resource, wherein
The requesting system generates a resource access request that is passed to the universal resource controller when the requesting system desires access to the resource; and
The universal resource controller uses specific operating characteristic parameters of the requested resource and a current state of the requested resource to generate a corresponding ordered universal access request command suitable for the access to the resource required by the requesting system.

The universal resource access controller according to claim 1, wherein
The universal resource controller comprises:
A configurable system interface coupled to the requesting system and configured to both receive the resource access request and generate a corresponding universal command;
A universal command orderer coupled to the configurable system interface;
A resource tag buffer coupled to the command orderer and configured to store a resource tag configured to identify the current state of the requested resource; and
An operating characteristic parameter buffer coupled to the command orderer and configured to store the operating characteristic parameter associated with the requested resource,
wherein
The universal command orderer uses the resource tag identifying the current state of the requested resource and the operating characteristic parameter associated with the requested resource to generate the ordered universal command.
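The interplay between the resource tag and the operating characteristic parameters in claim 2 can be illustrated with a minimal sketch. It is not the claimed implementation: the names `ResourceTag`, `OperatingParams` and `order_commands`, the command names, and the cycle-count timing model are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResourceTag:
    """Hypothetical resource tag: the current state of one resource."""
    open_page: Optional[int] = None  # page currently open, if any
    ready_at: int = 0                # earliest cycle for the next command


@dataclass
class OperatingParams:
    """Hypothetical operating characteristic parameters, in cycles."""
    t_close: int
    t_open: int
    t_access: int


def order_commands(page, now, tag, params):
    """Return (command, issue_cycle) pairs for an access to `page`."""
    steps = []
    if tag.open_page != page:
        if tag.open_page is not None:
            steps.append(("CLOSE_PAGE", params.t_close))
        steps.append(("OPEN_PAGE", params.t_open))
    steps.append(("ACCESS", params.t_access))

    # Never issue before the resource is ready for its next command.
    cycle = max(now, tag.ready_at)
    schedule = []
    for name, duration in steps:
        schedule.append((name, cycle))
        cycle += duration

    # Update the tag so the next request sees the new state.
    tag.open_page, tag.ready_at = page, cycle
    return schedule
```

In this sketch the tag decides which commands are needed, and the parameters decide how far apart they are issued.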

The universal resource access controller according to claim 2, wherein
The requesting system is one of a plurality of processors included in a multiprocessor computer system;
An access controller, wherein the configurable system interface is configurable to accept and process resource requests from any of the plurality of processors.

The universal resource access controller according to claim 3, wherein
The requested resource is one of a plurality of shared resources;
Each of the plurality of shared resources is associated with an operation characteristic parameter stored in a corresponding operation characteristic parameter buffer,
An access controller, wherein each of the plurality of shared resources is associated with a resource tag indicating a current state of each of the plurality of resources.

The universal resource access controller according to claim 4, further comprising:
An address space controller coupled to the command orderer and configured to store each of the operational characteristic parameter buffers associated with each of the plurality of shared resources;
The access controller, wherein the resource tag buffer stores a resource tag associated with each of the plurality of shared resources.

The universal resource access controller according to claim 5, wherein
When a particular requesting system desires to access a particular one of the plurality of shared resources, the requesting system is identified to the universal resource access controller;
In response, the universal resource access controller configures the configurable system interface to receive the specific shared resource request generated by the requesting system and then to generate a corresponding universal command; and
The command orderer generates an ordered universal command specific to the requested shared resource, based on the current state of the requested resource indicated by the corresponding resource tag and the operating characteristic parameter associated with the requested resource.

The universal resource access controller according to claim 6, wherein
An access controller, wherein the requesting system is a processor configured to execute executable instructions.

The universal resource access controller according to claim 7, wherein
An access controller, wherein the shared resources are peripheral buses used to interconnect a plurality of computer system peripheral devices.

The universal resource access controller according to claim 8, wherein:
When the processor desires to access the peripheral bus, the processor generates a peripheral bus request, which the universal resource access controller translates into a corresponding peripheral bus access request based on the current state of the peripheral bus and the operating characteristics of the peripheral bus.

The universal resource access controller according to claim 9, wherein
The access controller, wherein the processor is one of a plurality of processors included in a multiprocessor computer system.

An access control device for controlling access to any of a plurality of accessible devices by any of a plurality of requesting systems, comprising:
A universal controller unit; and
An address space controller unit coupled to the universal controller unit,
wherein
The universal controller unit decodes a system address and a system command input from the requesting system and, based on device parameters stored and provided by the address space controller, generates a relevant device address and a corresponding device command, and
Each of the plurality of devices has its own address area provided in the address space controller.

The access control device according to claim 11, wherein
The control device, wherein the device parameters include a device setting set, a device access protocol, and device access timing information.

The access control device according to claim 12, wherein
The address space controller comprises:
A plurality of address range registers each corresponding to a particular one of the plurality of accessible devices;
A comparator coupled to the plurality of address range registers; and
A multiplexer coupled to the comparator,
wherein
The input system address is compared with the contents of each of the plurality of address range registers, and if the input system address matches the contents of one of the address range registers, that system address is identified as an active system address.

The access control device according to claim 13,
The control device, wherein the comparator provides an output signal corresponding to the active system address indicating an active address area corresponding to a requested one of the accessible devices.

The access control device according to claim 14, wherein
The controller, wherein the multiplexer selects a device parameter corresponding to the requested accessible device.
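The register, comparator and multiplexer arrangement of claims 13 to 15 can be sketched in software as follows. This is only an illustrative model: the class names, the inclusive start/end representation of an address range, and the dictionary of device parameters are assumptions, not details taken from the claims.

```python
from dataclasses import dataclass


@dataclass
class AddressRange:
    """One address range register and the device parameters it selects."""
    start: int           # first system address of the area (inclusive)
    end: int             # last system address of the area (inclusive)
    device_params: dict  # hypothetical: settings, protocol, timing


class AddressSpaceController:
    def __init__(self, ranges):
        self.ranges = list(ranges)

    def compare(self, system_address):
        """Comparator: one match flag per address range register."""
        return [r.start <= system_address <= r.end for r in self.ranges]

    def select(self, system_address):
        """Multiplexer: parameters of the active address area, or None."""
        for matched, r in zip(self.compare(system_address), self.ranges):
            if matched:
                return r.device_params
        return None
```

A system address that falls inside no registered range yields `None`, which a caller could treat as an access to an unmapped device.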

A multiprocessor computer system, comprising:
A system bus;
A plurality of processors each capable of issuing processor commands and associated data bursts to the system bus;
A memory controller connected to the plurality of processors by the system bus and configured to receive the processor commands and the associated data bursts issued by the processors and to issue corresponding shared memory commands;
A shared memory device;
A command bus configured to connect the shared memory device and the memory controller and to carry the issued shared memory commands according to a minimum issue time generated by the memory controller; and
A data bus configured to connect the shared memory device and the memory controller and to carry data read from the shared memory device to the memory controller based on a data offset generated by the memory controller.

The multiprocessor computer system according to claim 16, wherein:
The computer system, wherein the shared memory is an SDRAM.

The multiprocessor computer system according to claim 16, wherein:
A computer system, wherein the shared memory is an SLDRAM module having a plurality of SDRAMs.

An access control method for controlling access to a memory device, wherein
The memory device is coupled to a requesting system by a universal memory access controller,
The universal memory access controller comprising:
A configurable system interface coupled to the requesting system and suitably configured to both receive a resource access request and generate a corresponding universal command;
A universal command orderer coupled to the configurable system interface;
A resource tag buffer coupled to the command orderer and configured to store a resource tag configured to identify a current state of the memory device; and
An operating characteristic parameter buffer coupled to the command orderer and configured to store an operating characteristic parameter associated with the memory device,
The access control method comprising:
Identifying the requesting system;
Configuring the configurable system interface to conform to the identified requesting system;
Generating a memory access request by the requesting system;
Generating a universal command by the configurable system interface based on the memory access request;
Ordering the universal command by the command orderer, based on the current state of the memory device indicated by the corresponding resource tag and the operating characteristics of the memory device indicated by the corresponding operating characteristic parameter, to produce an ordered universal command; and
Accessing the memory device using the ordered universal command.

The access control method according to claim 19, wherein
The requesting system is one of a plurality of processors included in a multiprocessor computer system;
The control method, wherein the configurable system interface is configured to accept and process a memory access request from any of the plurality of processors.

The access control method according to claim 20, wherein
The memory device is one of a plurality of memory devices,
Each of the plurality of memory devices is associated with an operation characteristic parameter stored in a corresponding operation characteristic parameter buffer,
The control method, wherein each of the plurality of memory devices is associated with a resource tag indicating a current state of each of the plurality of memory devices.

The access control method according to claim 21, wherein
The control method, wherein the memory access request includes a memory command and an associated memory address corresponding to a specific memory page.

23. The access control method according to claim 22, further comprising:
Determining whether a memory address corresponding to a particular memory page associated with a current system request matches any of the previously requested and stored memory addresses for the particular memory page;
If the latest memory address matches at least one of the stored memory addresses, determining whether the memory page corresponding to the matched address is open; and
Issuing a next system request if the requested memory page is determined to be open.

The access control method according to claim 23, further comprising:
If it is determined that none of the stored memory addresses matches the latest memory address, closing the old page and opening a new page if the old page is determined to be open, and otherwise opening the new page, in response to the system request.
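The page hit/miss decision of claims 23 and 24 can be summarized in a short sketch. The function name `page_commands`, the command names, and the choice to track one open row per bank in a dictionary are illustrative assumptions rather than details from the claims.

```python
def page_commands(open_pages, bank, row):
    """Return the command sequence for an access to (bank, row).

    `open_pages` maps a bank number to the row currently open in it;
    a bank that is absent from the map has no open page.
    """
    current = open_pages.get(bank)
    if current == row:
        return ["ACCESS"]               # page hit: the page is already open

    commands = []
    if current is not None:
        commands.append("CLOSE_PAGE")   # miss while an old page is open
    commands.append("OPEN_PAGE")        # open the newly requested page
    commands.append("ACCESS")
    open_pages[bank] = row              # remember the page that is now open
    return commands
```

A hit costs a single command, a miss on an idle bank costs two, and a miss that must first close an old page costs three, which is why tracking the open page pays off.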

The access control method according to claim 19, wherein
The control method, wherein the memory device is a multi-bank type memory device.

The access control method according to claim 24, wherein:
The control method, wherein the system address is converted into a bank address, a row address, and a column address.
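As an illustration of converting a system address into bank, row and column addresses, the sketch below slices a flat address into three bit fields. The field widths and their order (bank in the most significant bits, column in the least) are assumptions chosen for the example; the claims do not specify them.

```python
# Assumed field widths, for illustration only.
BANK_BITS, ROW_BITS, COL_BITS = 2, 12, 10


def split_address(system_address):
    """Split a flat system address into (bank, row, column)."""
    col = system_address & ((1 << COL_BITS) - 1)
    row = (system_address >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (system_address >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col
```

With these widths, consecutive system addresses fall in the same row until the 10-bit column field wraps, so sequential accesses tend to hit an open page.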

The access control method according to claim 26, wherein
The control method, wherein the memory device includes a virtual channel.

The access control method according to claim 27, wherein:
The control method, wherein the system address further includes a segment address.

29. The access control method according to claim 28,
The control method, wherein the universal command includes five data bits.

The access control method according to claim 29, wherein:
The control method, wherein the memory device is an SDRAM.

31. The access control method according to claim 30, wherein, of the five data bits:
The first bit is a precharge bit,
The second bit is an open page bit,
The third bit is a close page bit,
The fourth bit is a read page bit, and
The fifth bit is a write page bit.
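One way to picture the five-bit universal command of claim 31 is as a set of flags packed into a single word. In the sketch below, mapping the first bit to the least significant position is an assumption, as are the names `UniversalCommand`, `encode` and `decode`.

```python
from enum import IntFlag


class UniversalCommand(IntFlag):
    """Five command bits; the bit positions are assumed, not specified."""
    PRECHARGE = 1 << 0   # first bit
    OPEN_PAGE = 1 << 1   # second bit
    CLOSE_PAGE = 1 << 2  # third bit
    READ_PAGE = 1 << 3   # fourth bit
    WRITE_PAGE = 1 << 4  # fifth bit


def encode(*flags):
    """Pack the given command flags into one five-bit integer."""
    word = UniversalCommand(0)
    for flag in flags:
        word |= flag
    return int(word)


def decode(word):
    """List the names of the command bits set in `word`."""
    return [flag.name for flag in UniversalCommand if word & flag]
```

For example, `encode(UniversalCommand.OPEN_PAGE, UniversalCommand.READ_PAGE)` packs an open-then-read request into one word, and `decode` recovers the individual operations from it.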

The multiprocessor computer system according to claim 32, wherein:
The computer system, wherein the shared memory is an SDRAM.

34. The multiprocessor computer system according to claim 33,
A computer system, wherein the shared memory is an SLDRAM module having a plurality of SDRAMs.