JP4125675B2

JP4125675B2 - Glitch-free logic system and method insensitive to timing

Info

Publication number: JP4125675B2
Application number: JP2003521985A
Authority: JP
Inventors: ピン−シェンセン，; シャロンシャウ−ピンリン，; クインシークン−スーシェン，
Original assignee: ベリシティーデザイン，インコーポレイテッド
Priority date: 2001-08-14
Filing date: 2001-08-14
Publication date: 2008-07-30
Anticipated expiration: 2021-08-14
Also published as: KR20040028599A; IL154480A0; CN100578510C; CN1491394A; IL154480A; JP2005500625A; CA2420022A1; EP1417605A4; EP1417605A1

Description

【０００１】
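(Related U.S. Application)
This application is a continuation-in-part of U.S. patent application Ser. No. 08/850,136, filed with the United States Patent and Trademark Office (USPTO) on May 2, 1997.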
【０００２】
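(Background of the Invention)
(Field of the Invention)
The present invention relates generally to electronic design automation (EDA). More particularly, the present invention relates to digital logic devices that resolve hold-time and clock-glitch problems for various applications, including simulation, hardware acceleration, and coverification.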
【０００３】
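(Description of Related Art)
In general, electronic design automation (EDA) is a set of computer-based tools, configured on various workstations, that gives designers automated or semi-automated means of designing and verifying a user's custom circuit designs. EDA is generally used for creating, analyzing, and editing any electronic design for the purpose of simulation, emulation, prototyping, execution, or computing. The term EDA can also apply to developing systems (i.e., target systems) that will use the user-designed subsystem or component. The end result of EDA is typically a modified and enhanced design, usually in the form of discrete integrated circuits or printed circuit boards, that improves on the original design while maintaining its spirit.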
【０００４】
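The value of software that simulates a circuit design before hardware emulation is recognized in the various industries that benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because these processes are separate and independent by nature. For example, within a single debug/test session a user may want to simulate or debug a circuit design in software for part of the time, use those results to accelerate the simulation with a hardware model during another part of the time, inspect various register and combinational logic values inside the circuit at selected times, and then return to software simulation. Furthermore, because internal register and combinational logic values change as simulation time advances, the user should be able to monitor these changes even when they occur in the hardware model during the hardware acceleration/emulation process.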
【０００５】
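Co-simulation arose from the need to address some of the problems caused by the cumbersome nature of using two separate and independent processes, pure software simulation and pure hardware emulation/acceleration, and to make the overall system more user-friendly. However, co-simulation still has a number of drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.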
【０００６】
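First, partitioning between software and hardware is done manually rather than automatically, which further burdens the user. In essence, co-simulation requires the user to partition the design (starting at the behavioral level, then RTL, and then gate level) and to test the models themselves between software and hardware in very large functional blocks. Such a constraint demands a certain degree of sophistication from the user.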
【０００７】
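Second, co-simulation systems use two loosely coupled and independent engines, which raises inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires synchronizing two different verification engines: software simulation and hardware emulation. Even when the software simulator side is coupled to the hardware accelerator side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit at the register and combinational logic level are not available for easy inspection and downloading from one side to the other, which limits the utility of these co-simulator systems. Typically, a user who switches from software simulation to hardware acceleration and back may have to re-simulate the entire design. Thus, if the user wants to switch between software simulation and hardware emulation/acceleration during a single debug session while inspecting register and combinational logic values, a co-simulation system does not provide this capability.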
【０００８】
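Third, co-simulation speed is as slow as simulation speed. Co-simulation requires synchronizing two different verification engines: software simulation and hardware emulation. Each engine has its own control mechanism for driving the simulation or emulation. This implies that synchronization between software and hardware pushes overall performance down to a speed as low as that of software simulation. The overhead of coordinating the operation of the two engines adds to the slow speed of co-simulation systems.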
【０００９】
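Fourth, co-simulation systems encounter setup, hold-time, and clock-glitch problems caused by race conditions among clock signals. Co-simulators use hardware-driven clocks, which may arrive at the inputs of different logic elements at different times because of different wire lengths. When these logic elements should evaluate their data together, some evaluate the data in one time period and others in a different time period, which raises the level of uncertainty in the evaluation results.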
【００１０】
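Accordingly, a need exists in the industry for a system or method that addresses the problems raised above by currently known simulation systems, hardware emulation systems, hardware accelerators, co-simulation systems, and coverification systems.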
【００１１】
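(Summary of the Invention)
The present invention provides a solution to the problems described above in the form of a flexible and fast simulation/emulation system, referred to herein as the "SEmulation system", the "SEmulator system", or the coverification system, which includes a reconfigurable computing system (or RCC computing system) and a reconfigurable hardware array (or RCC hardware array).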
【００１２】
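The SEmulation system and method of the present invention give users the ability to turn their electronic system designs into software and hardware representations for simulation. In general, the SEmulation system is a software-controlled emulator or a hardware-accelerated simulator, together with the methods used therein. Thus, pure software simulation is possible, but the simulation can also be accelerated through use of the hardware model. Hardware acceleration is possible with software control for starting, stopping, asserting values, and inspecting values. An in-circuit emulation mode is also available to test the user's circuit design in the environment of the circuit's target system. Again, software control is available.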
【００１３】
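At the core of the system is a software kernel that controls both the software model and the hardware model, giving the user greater run-time flexibility by allowing the user to start, stop, assert values, inspect values, and switch among the various modes. The kernel controls the various modes by controlling data evaluation in the hardware via the enable inputs to the registers.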
【００１４】
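The SEmulation system and method according to the present invention provide four modes of operation: (1) software simulation, (2) simulation via hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. At a high level, the present invention may be embodied in each of these four modes or in various combinations of them, as follows: (1) software simulation alone; (2) simulation via hardware acceleration alone; (3) in-circuit emulation (ICE) alone; (4) post-simulation analysis alone; (5) software simulation and simulation via hardware acceleration; (6) software simulation and ICE; (7) simulation via hardware acceleration and ICE; (8) software simulation, simulation via hardware acceleration, and ICE; (9) software simulation and post-simulation analysis; (10) simulation via hardware acceleration and post-simulation analysis; (11) software simulation, simulation via hardware acceleration, and post-simulation analysis; (12) ICE and post-simulation analysis; (13) software simulation, ICE, and post-simulation analysis; (14) simulation via hardware acceleration, ICE, and post-simulation analysis; and (15) software simulation, simulation via hardware acceleration, ICE, and post-simulation analysis. Other combinations are possible and within the scope of the present invention.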
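The fifteen enumerated embodiments are exactly the non-empty subsets of the four modes (2^4 - 1 = 15). A minimal Python sketch that enumerates them; the mode strings and the function name are illustrative labels, not terms defined by the specification:

```python
from itertools import combinations

# Illustrative labels for the four operating modes named in the text.
MODES = (
    "software simulation",
    "simulation via hardware acceleration",
    "ICE",
    "post-simulation analysis",
)

def mode_combinations(modes=MODES):
    """Return every non-empty subset of the operating modes."""
    return [
        combo
        for size in range(1, len(modes) + 1)
        for combo in combinations(modes, size)
    ]

ALL_COMBINATIONS = mode_combinations()
```

The ordering produced here (by subset size) differs from the ordering of the list in the text; only the set of combinations is the same.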
【００１５】
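Each mode or combination of modes provides the following features or combinations of features: (1) switching among modes, manually or automatically; (2) usage: the user can switch among modes and can start, stop, assert values, inspect values, and single-step cycle by cycle through the simulation or emulation process; (3) a compilation process that generates the software models and hardware models; (4) a software kernel that controls all modes with a main control loop that, in one embodiment, includes the steps of initializing the system, evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present; (5) component type analysis for generating hardware models; (6) mapping of hardware models to reconfigurable boards through, in one embodiment, clustering, placement, and routing; (7) software clock setup to avoid race conditions through, in one embodiment, gated clock logic analysis and gated data logic analysis; (8) software clock implementation through, in one embodiment, clock edge detection in the software model to trigger an enable signal in the hardware model, sending a signal from the primary clock to the clock input of the clock edge register in the hardware model via the gated clock logic, sending a clock enable signal to the enable input of the hardware model's registers, sending data from the primary clock register to the hardware model's registers via the gated data logic, and resetting the clock edge register to disable the clock enable signal to the enable input of the hardware model's registers; (9) logging of selected data for debug sessions and post-simulation analysis; (10) combinational logic regeneration; (11) in one embodiment, a basic building block that is a D-type register with asynchronous and synchronous inputs; (12) address pointers in each chip; (13) a multiplexed cross-chip address pointer chain; (14) an array of FPGA chips and its interconnect scheme; (15) banks of FPGA chips with a bus that tracks the performance of the PCI bus system; (16) FPGA banks that allow expansion via piggyback boards; and (17) time-division multiplexed (TDM) circuits for optimal pin usage. Through its various embodiments, the present invention provides other features, described herein, that may not be listed above.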
【００１６】
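One embodiment of the present invention is a simulation system. The simulation system operates in a host computer system to simulate the behavior of a circuit. The host computer system includes a central processing unit (CPU), main memory, and a local bus that couples the CPU to the main memory and allows communication between them. The circuit has a structure and a function specified in a hardware language such as HDL, which can describe the circuit as component types and connections. The simulation system includes a software model, software control logic, and a hardware logic element.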
【００１７】
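The software model of the circuit is coupled to the local bus. Typically, it resides in main memory. The software control logic is coupled to the software model and the hardware logic element to control their operation. The software control logic includes interface logic that can receive input data and a clock signal from an external process, and clock detection logic for detecting an active edge of the clock signal and generating a trigger signal. The hardware logic element is also coupled to the local bus and includes a hardware model of at least a portion of the circuit, based on component type, and clock enable logic for evaluating data in the hardware model in response to the trigger signal.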
【００１８】
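The hardware logic element also includes an array of, or a plurality of, field-programmable devices coupled together. Each field-programmable device includes a portion of the hardware model of the circuit, so that the combination of all the field-programmable devices includes the entire hardware model. A plurality of interconnections also couple the portions of the hardware model together. Each interconnection represents a direct connection between any two field-programmable devices located in the same row or column. The shortest distance between any two field-programmable devices is at most two interconnections, or "hops".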
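The two-hop bound follows from the interconnect rule: if any two devices in the same row or column are directly connected, a device at (r1, c1) can reach one at (r2, c2) through the device at (r1, c2). A small Python sketch of that argument; the (row, column) coordinates and the function names are assumptions made for illustration:

```python
def hop_count(a, b):
    """Hops between devices a and b, each given as a (row, column) pair."""
    if a == b:
        return 0
    if a[0] == b[0] or a[1] == b[1]:
        return 1  # same row or same column: one direct interconnection
    return 2      # otherwise route through the device at (a's row, b's column)

def worst_case_hops(rows, cols):
    """Largest hop count over all device pairs in a rows x cols array."""
    devices = [(r, c) for r in range(rows) for c in range(cols)]
    return max(hop_count(a, b) for a in devices for b in devices)
```

For any array with at least two rows and two columns, `worst_case_hops` evaluates to 2, matching the bound stated above.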
【００１９】
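Another embodiment of the present invention is a system and method of simulating a circuit, in which the circuit is modeled in software and at least a portion of the circuit is modeled in hardware. Data evaluation occurs in the hardware but is controlled in software via a software clock. Data to be evaluated propagates to the hardware model and stabilizes. When the software model detects an active clock edge, it sends an enable signal to the hardware model to activate data evaluation. The hardware model evaluates the data and then waits for new incoming data, which may be evaluated at the next active clock edge detected in the software model.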
【００２０】
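Another embodiment of the present invention includes a software kernel that controls the operation of the software model and the hardware model. The software kernel includes the steps of evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present.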
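The kernel steps listed above can be pictured as a single loop. The following Python sketch is only an illustration under simplifying assumptions: one clock named `clk`, rising-edge registers, test-bench activity represented as a list of timed stimulus events, and hypothetical callback names (`next_state`, `combinational`). It does not model the hardware side of the system.

```python
def run_kernel(stimulus, registers, next_state, combinational, half_period=1):
    """Run the main control loop until no test-bench events remain.

    stimulus: list of (time, {signal: value}) test-bench events.
    registers: dict of register values, updated in place.
    """
    signals = {"clk": 0}
    pending = sorted(stimulus, key=lambda event: event[0])
    time = 0
    while pending:  # continue while test-bench activity remains
        # 1. Evaluate active test-bench processes.
        while pending and pending[0][0] <= time:
            signals.update(pending.pop(0)[1])
        # 2. Evaluate clock components.
        previous_clk = signals["clk"]
        signals["clk"] = (time // half_period) % 2
        # 3. Detect clock edges.
        rising_edge = previous_clk == 0 and signals["clk"] == 1
        # 4. Update registers (and memories) on an active edge.
        if rising_edge:
            registers.update(next_state(registers, signals))
        # 5. Propagate combinational components.
        signals.update(combinational(registers, signals))
        # 6. Advance simulation time.
        time += half_period
    return registers, signals

# Usage: a counter that increments on each rising edge while "en" is 1.
regs, sigs = run_kernel(
    stimulus=[(0, {"en": 1}), (4, {"en": 0}), (8, {})],
    registers={"count": 0},
    next_state=lambda r, s: {"count": r["count"] + s["en"]},
    combinational=lambda r, s: {"is_even": r["count"] % 2 == 0},
)
```

In this run, rising edges occur at times 1, 3, 5, and 7; the enable is high only for the first two, so the counter ends at 2.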
【００２１】
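A further embodiment of the present invention is a method of simulating a circuit, where the circuit has a structure and a function specified in a hardware language (e.g., HDL). The hardware language is also capable of describing or reducing the circuit into components. The method includes the steps of (1) determining the component types in the hardware language, (2) generating a model of the circuit based on component type, and (3) simulating the behavior of the circuit with the model by providing input data to the model. The step of generating the model may include (1) generating a software model of the circuit and (2) generating a hardware model of the circuit based on component type.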
【００２２】
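In another embodiment, the present invention is a method of simulating a circuit. The steps include (1) generating a software model of the circuit, (2) generating a hardware model of the circuit, (3) simulating the behavior of the circuit with the software model by providing input data to the software model, (4) selectively switching to the hardware model, (5) providing input data to the hardware model, and (6) simulating the behavior of the circuit with the hardware model by accelerating the simulation in the hardware model. The method may further include the steps of (1) selectively switching to the software model and (2) simulating the behavior of the circuit with the software model by providing input data to the software model. The simulation can also be stopped with the software model.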
【００２３】
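For the in-circuit emulation mode, the method includes the steps of (1) generating a software model of the circuit, (2) generating a hardware model of at least a portion of the circuit, (3) supplying input signals from the target system to the hardware model, (4) supplying output signals from the hardware model to the target system, and (5) simulating the behavior of the circuit with the hardware model, where the software model is capable of controlling the simulation/emulation cycle by cycle.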
【００２４】
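For post-simulation analysis, the method of simulating the circuit includes the steps of (1) generating a model of the circuit, (2) simulating the behavior of the circuit with the model by providing input data to the model, and (3) logging selected input data and selected output data from the model as log points. A software model and a hardware model may be generated. The method may further include the steps of (1) selecting a desired time-dependent point in the simulation, (2) selecting a log point at or before the selected time-dependent point, (3) providing input data to the hardware model, and (4) simulating the behavior of the circuit with the hardware model from the selected log point.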
【００２５】
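A further embodiment of the present invention is a method of generating models for a simulation system for simulating a circuit. The steps include (1) generating a software model of the circuit, (2) generating a hardware model of at least a portion of the circuit based on component type, and (3) generating a clock generation circuit in the hardware model to trigger data evaluation in the hardware model in response to clock edge detection in the software model.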
【００２６】
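Various embodiments of the present invention solve the problems described above with specially designed logic devices that replace the flip-flops and latches of standard designs. One embodiment of the present invention is a timing-insensitive glitch-free (TIGF) logic device. The TIGF logic device can take the form of any latch or edge-triggered flip-flop. In one embodiment of the present invention, a trigger signal is provided to update the TIGF logic device. The trigger signal is provided during a short trigger period that occurs at a time adjacent to the evaluation period.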
【００２７】
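In latch form, the TIGF latch includes a flip-flop that holds the current state of the TIGF latch until the trigger signal is received. A multiplexer is also provided to receive the new input value and the old stored value. The enable signal functions as the selector signal for the multiplexer. Because the trigger signal controls the updating of the TIGF latch, the data at the D input and the control data at the enable input of the TIGF latch can arrive in any order without suffering hold-time violations. Likewise, because the trigger signal controls the updating of the TIGF latch, the enable signal can glitch without adversely affecting the proper operation of the TIGF latch.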
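A behavioral Python sketch of the TIGF latch as described above: a multiplexer chooses between the new D value and the stored value under control of the enable signal, and the stored value changes only when the trigger arrives. The class and method names are hypothetical, and the model ignores gate-level timing.

```python
class TIGFLatch:
    """Behavioral model: state changes only on trigger(), never during evaluation."""

    def __init__(self, initial=0):
        self.q = initial          # value held by the internal flip-flop
        self._mux_out = initial   # multiplexer output seen by the flip-flop

    def evaluate(self, d, enable):
        """Evaluation period: D and enable may arrive in any order or glitch."""
        self._mux_out = d if enable else self.q

    def trigger(self):
        """Trigger period: the flip-flop captures the settled multiplexer output."""
        self.q = self._mux_out
        return self.q

# Usage: a glitch on enable during evaluation leaves the stored value intact.
latch = TIGFLatch()
latch.evaluate(d=1, enable=1)   # transient glitch on enable
latch.evaluate(d=1, enable=0)   # inputs settle before the trigger
after_glitch = latch.trigger()
latch.evaluate(d=1, enable=1)   # genuine enable
after_enable = latch.trigger()
```

Only the input values present when the trigger arrives matter, which is the property the paragraph above relies on.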
【００２８】
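In flip-flop form, the TIGF flip-flop includes a first flip-flop that holds the new input value, a second flip-flop that holds the currently stored value, and a clock edge detector. All three components are controlled by the trigger signal that updates the TIGF flip-flop. A multiplexer is also provided, with the edge detector signal serving as its selector signal. Because a dedicated first flip-flop stores the new input value, effectively blocking inputs that change during evaluation, hold-time violations are avoided. Because the trigger signal controls the updating of the TIGF flip-flop, clock glitches do not affect hardware models of user design circuits that use the TIGF flip-flop as the emulated flip-flop.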
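A behavioral Python sketch of the TIGF flip-flop described above. It assumes a rising-edge device whose edge detector compares the clock level seen at the current trigger with the level seen at the previous trigger; that detail, and all names, are assumptions for illustration rather than the circuit of the specification.

```python
class TIGFFlipFlop:
    """Behavioral model: every internal element updates only on trigger()."""

    def __init__(self):
        self.new_value = 0  # first flip-flop: input sampled at the previous trigger
        self.q = 0          # second flip-flop: currently stored (output) value
        self.prev_clk = 0   # edge-detector state: clock level at the previous trigger

    def trigger(self, d, clk):
        """Apply one trigger pulse with the settled values of D and the clock."""
        edge = self.prev_clk == 0 and clk == 1
        # Multiplexer: on an edge take the previously sampled input,
        # otherwise keep the stored value.
        next_q = self.new_value if edge else self.q
        # All three elements update together from their pre-trigger values.
        self.q, self.new_value, self.prev_clk = next_q, d, clk
        return self.q

# Usage: D changes in the same evaluation in which the clock rises.
ff = TIGFFlipFlop()
q0 = ff.trigger(d=1, clk=0)  # no edge; D=1 is sampled into the first flip-flop
q1 = ff.trigger(d=0, clk=1)  # rising edge; output takes the earlier sample (1)
q2 = ff.trigger(d=0, clk=1)  # clock stays high: no new edge, output holds
```

The second call shows the hold-time property: the output takes the value D had before the clock edge, even though D has already moved on.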
【００２９】
These and other embodiments are fully discussed and illustrated in the following sections of this specification.
【００３０】
The accompanying drawings, which illustrate several different aspects and embodiments of the present invention, are described below.
【００３１】
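(Detailed Description of the Preferred Embodiments)
This specification describes the various embodiments of the present invention through, and within the context of, a system called the "SEmulator" or "SEmulation" system. Throughout the specification, the terms "SEmulation system", "SEmulator system", "SEmulator", or simply "system" may be used. These terms refer to the various apparatus and methods according to the present invention for any combination of four operating modes: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective setup or preprocessing stages. At other times the term "SEmulation" may be used; it refers to the novel processes described herein.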
【００３２】
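Similarly, terms such as "reconfigurable computing (RCC) array system" or "RCC computing system" refer to that portion of the simulation/coverification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as "reconfigurable hardware array" or "RCC hardware array" refer to that portion of the simulation/coverification system that contains the hardware model of the user design and, in one embodiment, the reconfigurable logic elements.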
【００３３】
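This specification also refers to a "user" and the user's "circuit design" or "electronic design". The "user" is a person who uses the SEmulation system through its interfaces and may be the designer of the circuit or a tester/debugger who played little or no part in the design process. The "circuit design" or "electronic design" is a custom-designed system or component, whether software or hardware, that can be modeled by the SEmulation system for test/debug purposes. In many cases, the "user" also designed the "circuit design" or "electronic design".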
【００３４】
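This specification also uses the terms "wire", "wire line", "wire/bus line", and "bus". These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between points. The terms are interchangeable in that a "wire" may comprise one or more conducting lines and a "bus" may also comprise one or more conducting lines.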
【００３５】
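This specification is presented in outline form. First, it presents a general overview of the SEmulation system, including an overview of the four operating modes and the hardware implementation schemes. Second, it provides a detailed discussion of the SEmulation system. In some cases, one figure may present a variation of an embodiment shown in a previous figure; in these cases, like reference numerals are used for like components/units/processes. The outline of the specification is as follows.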
【００３６】
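I. Overview
A. Simulation/Hardware Acceleration Modes
B. Emulation with Target System Mode
C. Post-Simulation Analysis Mode
D. Hardware Implementation Schemes
E. Simulation Server
F. Memory Simulation
G. Coverification System
II. System Description
III. Simulation/Hardware Acceleration Modes
IV. Emulation with Target System Mode
V. Post-Simulation Analysis Mode
VI. Hardware Implementation Schemes
A. Overview
B. Address Pointers
C. Gated Data/Clock Network Analysis
D. FPGA Array and Control
E. Alternate Embodiment Using Denser FPGA Chips
F. TIGF Logic Devices
VII. Simulation Server
VIII. Memory Simulation
IX. Coverification System
X. Examples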
−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−
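I. Overview
The various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. The various embodiments include the system and method of these modes with at least some of the following features: (1) software and hardware models with a single tightly coupled simulation engine, a software kernel, which controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) the ability to switch, cycle by cycle, among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software combinational component regeneration; (5) double-buffered clock modeling with software clocks and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate or hardware-accelerate the user's circuit design from any selected point in a past simulation session. The end result is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.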
【００３７】
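A. Simulation/Hardware Acceleration Modes
Through automatic component type analysis, the SEmulation system can model the user's custom circuit design in software and hardware. The entire user circuit design is modeled in software, whereas the evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.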
【００３８】
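The software kernel, which resides in the main memory of the general-purpose processor system, serves as the SEmulator system's main program and controls the overall operation and execution of its various modes and features. As long as any test-bench process is active, the kernel evaluates active test-bench components, evaluates clock components, detects clock edges to update registers and memories as well as to propagate combinational logic data, and advances the simulation time. This software kernel provides the tightly coupled nature of the simulator engine and the hardware acceleration engine. For the software/hardware boundary, the SEmulation system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).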
【００３９】
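The SEmulation system can selectively switch among the four modes of operation. The user of the system can start the simulation, stop the simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to software simulation mode.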
【００４０】
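In general, the SEmulation system gives the user the ability to "see" every modeled component, regardless of whether it is modeled in software or hardware. For a variety of reasons, combinational components are not as "visible" as registers, and obtaining combinational component data is therefore difficult. One reason is that the FPGAs used in the reconfigurable board to model the hardware portion of the user's circuit design typically model combinational components as look-up tables (LUTs) rather than as actual combinational components. Accordingly, the SEmulation system reads the register values and then regenerates the combinational components. Because some overhead is needed to regenerate the combinational components, this regeneration process is not performed all the time; rather, it is done only at the user's request.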
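Regeneration can be pictured as re-evaluating the software model's combinational nodes from register values read back from the hardware. A minimal Python sketch, assuming the combinational nodes are supplied in topological order as (name, function) pairs; the netlist and all names below are made up for illustration:

```python
def regenerate_combinational(register_values, combinational_logic):
    """Recompute combinational node values from register values.

    combinational_logic: list of (name, function) pairs in topological order;
    each function maps the dict of known values to the node's value.
    """
    values = dict(register_values)
    for name, function in combinational_logic:
        values[name] = function(values)
    return values

# Usage: three register values read back from hardware, three derived nodes.
registers_read_back = {"a": 1, "b": 0, "c": 1}
netlist = [
    ("n1", lambda v: v["a"] & v["b"]),
    ("n2", lambda v: v["n1"] | v["c"]),
    ("out", lambda v: v["n2"] ^ v["a"]),
]
visible = regenerate_combinational(registers_read_back, netlist)
```

Because this pass costs time proportional to the size of the combinational logic, running it only on request, as the text describes, keeps the per-cycle overhead low.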
【００４１】
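Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock, which drives the enable inputs of the various registers in the hardware model. Timing is strictly controlled through a double-buffered circuit implementation, so that the software clock enable signal enters the register models before the data to these models. Once the data inputs to these register models have stabilized, the software clock gates the data synchronously to ensure that all data values are gated together without any risk of hold-time violations.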
【００４２】
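Software simulation is also fast because the system logs all input values but only selected register values/states, so overhead is minimized by reducing the number of I/O operations. The user can select the logging frequency.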
【００４３】
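B. Emulation with Target System Mode
The SEmulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model also outputs data to the target system. In addition, because the software kernel controls the operation of this mode, the user still has the option to start, stop, assert values, inspect values, single-step, and switch from one mode to another.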
【００４４】
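C. Post-Simulation Analysis Mode
Logs provide the user with a historical record of the simulation session. Unlike known simulation systems, the SEmulation system does not log every single value, internal state, or value change during the simulation process. The SEmulation system logs only selected values and states based on a logging frequency (i.e., one record every N cycles). During the post-simulation stage, if the user wants to examine various data around point X in the simulation session just completed, the user goes to one of the logged points, say logged point Y, that is closest to point X and precedes it in time. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.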
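The logging scheme above trades log size against replay time: a record is kept every N cycles, and the state at an arbitrary point X is recovered by restarting from the nearest earlier logged point Y and re-applying the logged inputs. A minimal Python sketch, assuming an immutable state value and a hypothetical `step(state, input)` function that stands in for the circuit model:

```python
def simulate(step, state, inputs, start, stop, log_every=None):
    """Advance state from cycle start to cycle stop, optionally logging every N cycles."""
    checkpoints = {}
    for cycle in range(start, stop):
        if log_every and cycle % log_every == 0:
            checkpoints[cycle] = state   # state before this cycle's input is applied
        state = step(state, inputs[cycle])
    return state, checkpoints

def replay_to(step, checkpoints, inputs, target_cycle):
    """Return (Y, state at target_cycle), restarting from the nearest earlier log point Y."""
    y = max(cycle for cycle in checkpoints if cycle <= target_cycle)
    state, _ = simulate(step, checkpoints[y], inputs, y, target_cycle)
    return y, state

# Usage: an accumulator "circuit", logged once every 10 cycles.
accumulate = lambda state, value: state + value
logged_inputs = list(range(100))
final_state, log = simulate(accumulate, 0, logged_inputs, 0, 100, log_every=10)
point_y, state_at_x = replay_to(accumulate, log, logged_inputs, target_cycle=37)
```

Here point X is cycle 37, the chosen log point Y is cycle 30, and only seven cycles are re-simulated instead of thirty-seven.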
【００４５】
A VCD on-demand system is also described. The VCD on-demand system allows the user to view any simulation target range (i.e., simulation time) on demand without rerunning the simulation.
【００４６】
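D. Hardware Implementation Schemes
The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread across these 16 chips. The interconnect scheme allows each chip to access any other chip within two "jumps" or links.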
【００４７】
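Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). The address pointers associated with a particular address space are all chained together. During a data transfer, word data is therefore selected sequentially, one word at a time for the selected address space in each chip and one chip at a time, from/to the main FPGA bus and the PCI bus, until the desired word data has been accessed for that address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in one chip and then propagates to the address pointer in the next chip, continuing until the last chip is reached or the system initializes the address pointers.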
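The propagation order of the word selection signal can be sketched as a generator that walks every word of one address space in the first chip, then moves on to the next chip in the chain. This Python sketch covers only the ordering, not the bus protocol; the per-chip word lists are made-up example data.

```python
def word_select_chain(chips):
    """Yield (chip_index, word_index, word) in the order the select signal visits them.

    chips: list of per-chip word lists for one address space (e.g. REG),
    given in chain order.
    """
    for chip_index, words in enumerate(chips):
        for word_index, word in enumerate(words):
            yield chip_index, word_index, word

# Usage: three chips holding 2, 1, and 3 words of the same address space.
reg_space = [[0xA0, 0xA1], [0xB0], [0xC0, 0xC1, 0xC2]]
transfer_order = list(word_select_chain(reg_space))
```

Exactly one word is selected at a time, so a single shared bus can carry the whole address space without per-word address decoding in every chip.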
【００４８】
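The FPGA bus system on the reconfigurable hardware board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the wider bus. The throughput of this FPGA bus system can track the throughput of the PCI bus system, so performance is not lost by reducing the bus speed. Expansion is possible through piggyback boards that extend the bank length.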
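As a quick check of the throughput claim, let W be the PCI bus width and f its clock rate (both symbols are introduced here for illustration; the passage gives no numeric width). Doubling the width while halving the clock leaves the product unchanged:

```latex
T_{\mathrm{FPGA}} = (2W)\cdot\frac{f}{2} = W f = T_{\mathrm{PCI}}
```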
【００４９】
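In another embodiment of the present invention, denser FPGA chips are used. Examples of such denser chips are the Altera 10K130V and 10K250V chips. Use of these chips changes the board design so that only four FPGA chips are used per board instead of eight less-dense FPGA chips (Altera 10K100V).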
【００５０】
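The FPGA array in the simulation system is provided on the motherboard through a particular board interconnect structure. Each chip may have up to eight sets of interconnections, excluding the local bus connections, arranged within a single board and across different boards as adjacent direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]). Each chip can be interconnected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, left, or right. In the X direction (east-west) the array is a torus. In the Y direction (north-south) the array is a mesh.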
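The torus/mesh distinction determines which direct neighbors exist at the array edges: east-west links wrap around, north-south links do not. A small Python sketch, assuming (row, column) coordinates with row 0 as the northernmost row; the coordinate convention and function name are assumptions, and the one-hop (NH, SH, XH) links are not modeled.

```python
def direct_neighbors(row, col, rows, cols):
    """Direct-neighbor chips of (row, col): torus in X (east-west), mesh in Y (north-south)."""
    neighbors = {
        "W": (row, (col - 1) % cols),   # wraps around at the west edge
        "E": (row, (col + 1) % cols),   # wraps around at the east edge
    }
    if row > 0:
        neighbors["N"] = (row - 1, col)  # no wrap-around in the Y direction
    if row < rows - 1:
        neighbors["S"] = (row + 1, col)
    return neighbors

# Usage: the north-west corner chip of a 2 x 4 array.
corner = direct_neighbors(0, 0, rows=2, cols=4)
```

The corner chip has a west neighbor through the wrap-around link but no north neighbor.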
【００５１】
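The interconnects can couple logic devices and other components within a single board. Inter-board connectors, however, are provided to couple these boards and their interconnects together across different boards, carrying signals between (1) the PCI bus, via the motherboard, and the array boards, and (2) any two array boards.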
【００５２】
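A motherboard connector connects the board to the motherboard and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while boards 2, 4, and 6 rely on their neighbor boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder side to component side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors of these boards. Placed solder side to component side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.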
【００５３】
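E. Simulation Server
In another embodiment of the present invention, a simulation server allows multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit designs. Access is accomplished via a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the scheduled users. In one scenario, each user accesses the server for the first time to map his or her separate user design to the reconfigurable hardware model, in which case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs the place-and-route operation, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware portion of the user's design. When one user has accelerated his design using the hardware model and downloaded the hardware state to his own memory for software simulation, the hardware unit can be released for access by another user.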
【００５４】
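The server allows multiple users or processes to access the reconfigurable hardware unit for acceleration and hardware state swapping purposes. The simulation server includes a scheduler, one or more device drivers, and the reconfigurable hardware unit. The scheduler in the simulation server is based on a preemptive round-robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention facilitates both non-network multiprocessing environments and network multi-user environments, in which previous checkpoint state data can be downloaded and the entire simulation state associated with that checkpoint can be restored for playback debugging or cycle-by-cycle stepping.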
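A toy Python sketch of a preemptive round-robin scheduler with the three parts named above: a job queue, a priority sorter, and a job swapper. The time-slice length, the job tuple layout, and the rule that priority only fixes the initial queue order are assumptions for illustration; the specification's actual scheduling policy is described elsewhere in the document.

```python
from collections import deque

def round_robin(jobs, time_slice):
    """Return the sequence of (job name, cycles run) slices.

    jobs: list of (name, priority, cycles_needed); higher priority runs first.
    """
    # Priority sorter: order the job queue once, highest priority first.
    queue = deque(sorted(jobs, key=lambda job: -job[1]))
    schedule = []
    while queue:
        name, priority, remaining = queue.popleft()
        run = min(time_slice, remaining)
        schedule.append((name, run))
        if remaining > run:
            # Job swapper: preempt the job and requeue it with its remaining work.
            queue.append((name, priority, remaining - run))
    return schedule

# Usage: two users sharing the hardware unit in slices of two cycles.
slices = round_robin([("A", 1, 5), ("B", 2, 3)], time_slice=2)
```

In a real server, each preemption would also save the outgoing job's hardware state and restore the incoming job's state before it resumes.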
【００５５】
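F. Memory Simulation
The memory simulation, or memory mapping, aspect of the present invention gives the simulation system an efficient way to manage the various memory blocks in the configured hardware model of the user's design, which is programmed into the array of FPGA chips in the reconfigurable hardware unit. The memory simulation aspect provides a structure and scheme in which the many memory blocks associated with the user's design are mapped into the SRAM memory devices of the simulation system rather than into the logic devices used to configure and model the user's design. The memory simulation system includes a memory state machine, an evaluation state machine, and their associated logic to control and interface with (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA buses in the simulation system, and (3) the FPGA logic devices that contain the configured and programmed user design being debugged. The operation of the memory simulation system according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access.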
【００５６】
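The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N to interface with the user's own memory interface in the user design, in order to handle (1) data evaluations among the FPGA logic devices and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.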
【００５７】
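G. Coverification System
One embodiment of the present invention is a coverification system that includes a reconfigurable computing system (hereinafter "RCC computing system") and a reconfigurable computing hardware array (hereinafter "RCC hardware array"). In some embodiments, the target system and external I/O devices are not necessary because they can be modeled in software. In other embodiments, the target system and external I/O devices are actually coupled to the coverification system to obtain speed and to use actual data rather than simulated test-bench data. A coverification system can thus incorporate the RCC computing system and the RCC hardware array along with the functionality to debug the software portion and the hardware portion of a user's design while using the actual target system and/or I/O devices.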
【００５８】
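The RCC computing system also contains clock logic (for clock edge detection and software clock generation), test-bench processes for testing the user design, and device models for any I/O device that the user decides to model in software instead of using an actual physical I/O device. Of course, the user may decide to use both actual I/O devices and modeled I/O devices in one debug session. The software clock is provided to the external interface to function as the external clock source for the target system and the external I/O devices. Use of this software clock provides the synchronization necessary to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base for the debug session, simulated and hardware-accelerated data are synchronized with any data delivered between the coverification system and the external interface.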
【００５９】
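When the target system and the external I/O devices are coupled to the coverification system, pin-out data must be provided between the coverification system and its external interface. The coverification system contains control logic that provides traffic control between (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system has a model of the entire design in software, including the portion of the user design modeled in the RCC hardware array, the RCC computing system must also have access to all data that passes between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to these data.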
【００６０】
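II. System Description
FIG. 1 shows a high-level overview of one embodiment of the present invention. A workstation 10 is coupled to a reconfigurable hardware model 20 and an emulation interface 30 via a PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via the PCI bus 50 as well as via cable 61. A target system 40 is coupled to the emulation interface via cables 60. In other embodiments, the in-circuit emulation setup 70 (shown in the dotted-line box), which includes the emulation interface 30 and the target system 40, is not provided when emulation of the user's circuit design within the target system's environment is not desired during a particular test/debug session. Without the in-circuit emulation setup 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.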
【００６１】
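In combination with the in-circuit emulation setup 70, the reconfigurable hardware model 20 imitates or mimics the user's circuit design of some electronic subsystem in the target system. To ensure the correct operation of the user's circuit design of the electronic subsystem within the target system's environment, input and output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Hence, the input and output signals of the target system 40 to/from the reconfigurable hardware model 20 are delivered via cables 60 through the emulation interface 30 and the PCI bus 50. Alternatively, the input/output signals of the target system 40 can be delivered to the reconfigurable hardware model 20 via the emulation interface 30 and cable 61.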
【００６２】
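The control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the entire SEmulation system and must have read/write access to the reconfigurable hardware model 20.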
【００６３】
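The workstation 10, complete with a computer, keyboard, mouse, monitor, and appropriate bus/network interface, allows a user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include a Sun Microsystems SPARC or ULTRA-SPARC workstation or an Intel/Microsoft-based computing station. As known to those skilled in the art, the workstation 10 includes a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and main memory 15. The software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithms embodied in software are stored in main memory 15 during a test/debug session and executed by the CPU 11 under the control of the workstation's operating system.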
【００６４】
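As known to those skilled in the art, after the operating system is loaded into the memory of workstation 10 by the start-up firmware, control passes to its initialization code, which sets up the necessary data structures and loads and initializes device drivers. Control is then passed to the command line interpreter (CLI), which prompts the user for the program to be run. The operating system then determines the amount of memory needed to run the program, locates or allocates a block of memory, and accesses the memory either directly or through the BIOS. After the memory loading process is complete, the application program begins execution.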
【００６５】
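One embodiment of the present invention is a particular application program for SEmulation. During the course of its execution, the application program may require numerous services from the operating system, including, but not limited to, reading from and writing to disk files, performing data communications, and interfacing with the display/keyboard/mouse.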
【００６６】
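The workstation 10 has an appropriate user interface that allows the user to enter circuit design data, edit circuit design data, monitor the progress of simulations and emulations while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible menu-driven options and command sets that can be entered with the keyboard and mouse and viewed on the monitor. Typically, the user uses a computing station 80 with a keyboard 90.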
【００６７】
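The user typically creates a particular circuit design for an electronic system and enters the HDL (typically structured RTL level) code description of the designed system into the workstation 10. The SEmulation system of the present invention performs component type analysis, among other operations, to partition the modeling between software and hardware. The SEmulation system models behavioral, RTL, and gate-level code in software. For hardware modeling, the system can model RTL and gate-level code; however, the RTL level must be synthesized to gate level before hardware modeling. The gate-level code can be processed directly into a usable source design database format for hardware modeling. Using the RTL and gate-level code, the system automatically performs component type analysis to complete the partition step. Based on the partitioning analysis during software compile time, the system maps some portion of the circuit design into hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for real-environment in-circuit emulation. Because the software simulation and hardware acceleration engines are tightly coupled through the software kernel, the user can then simulate the overall circuit design using software simulation, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to the simulation portion, and return to hardware acceleration until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration cycle by cycle and at the user's discretion is one of the valuable features of this embodiment. This feature is particularly useful in the debug process because it allows the user to reach a particular point or cycle very quickly using hardware acceleration mode and then use software simulation to examine various points and debug the circuit design. Moreover, the SEmulation system makes all components visible to the user, whether a component's internal realization is in hardware or in software. When the user requests such a read, the SEmulation system accomplishes this by reading the register values from the hardware model and then rebuilding the combinational components using the software model. These and other features are discussed more fully later in this specification.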
【００６８】
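The workstation 10 is coupled to a bus system 50. The bus system can be any available bus system that allows various agents, such as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, to be operably coupled together. Preferably, the bus system is fast enough to provide real-time or near-real-time results to the user. One such bus system is the bus system described in the Peripheral Component Interconnect (PCI) standard, which is incorporated herein by reference. Currently, revision 2.0 of the PCI standard provides for a 33 MHz bus speed. Revision 2.1 supports a 66 MHz bus speed. Accordingly, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30 may comply with the PCI standard.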
【００６９】
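In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may be found on this bus system. Such devices may be coupled to the PCI bus at the same level as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, or at other levels. Each PCI bus at a different level, such as PCI bus 52, is coupled to another PCI bus level, such as PCI bus 50, if it exists at all, through a PCI-to-PCI bridge 51. At PCI bus 52, two PCI devices 53 and 54 may be coupled together.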
【００７０】
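The reconfigurable hardware model 20 comprises an array of field-programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment the hardware model is reconfigurable; that is, the hardware can be reconfigured to suit the particular computation or user circuit design at hand. For example, if many adders or multipliers are required, the system is configured to include many adders and multipliers. As other computing elements or functions are needed, they may also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. Reconfigurable systems are also flexible, so that users can work around minor hardware defects that occur during manufacture, testing, or use. In one embodiment, the reconfigurable hardware model 20 includes a two-dimensional array of computing elements consisting of FPGA chips to provide computational resources for various user circuit designs and applications. Further details of the hardware configuration process are provided later.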
【００７１】
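Two such FPGA chips include those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is reconfigurable through the use of field-programmable devices. However, other embodiments of the present invention may be implemented using application-specific integrated circuit (ASIC) technology. Still other embodiments may take the form of a custom integrated circuit.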
【００７２】
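In a typical test/debug scenario, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made before the actual prototype is manufactured. In some other instances, however, an actual ASIC or custom integrated circuit can be used, although this deprives the user of the ability to change a possibly non-functional circuit design quickly and cost-effectively for re-simulation and re-emulation. At times, though, such an ASIC or custom IC has already been manufactured and is readily available, so that emulation with an actual non-reconfigurable chip may be preferable.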
【００７３】
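In accordance with the present invention, software in the workstation, along with its integration with an external hardware model, gives the end user a greater degree of flexibility, control, and performance than existing systems. To run the simulation and emulation, a model of the circuit design and the relevant parameters (e.g., input test-bench stimulus, overall system output, intermediate results) are determined and provided to the simulation software system. The user can use either schematic capture tools or synthesis tools to define the system circuit design. The user typically starts with a circuit design of the electronic system in schematic form, which is later converted to HDL form using synthesis tools. HDL can also be written directly by the user. Exemplary HDL languages include Verilog and VHDL; however, other languages are also available. A circuit design represented in HDL contains many concurrent components. Each component is a sequence of code that either defines the behavior of a circuit element or controls the execution of the simulation.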
【００７４】
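The SEmulation system analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. Thereafter, the user can use the SEmulation system of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If the circuit does not behave as planned during simulation, the user redefines the circuit by modifying the circuit schematic or the HDL file.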
【００７５】
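The use of this embodiment of the present invention is shown in the flowchart of FIG. 2. The algorithm starts at step 100. After loading the HDL file into the system, the system compiles, partitions, and maps the circuit design to appropriate hardware models. The compilation, partition, and mapping steps are discussed in more detail below.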
【００７６】
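Before the simulation runs, the system must run a reset sequence to remove all the unknown "x" values in software before the hardware acceleration model can function. One embodiment of the present invention uses a 2-bit-wide data path to provide a four-state value for the bus signal: "00" is logic low, "01" is logic high, "10" is "z", and "11" is "x". As known to those skilled in the art, software models can handle "0", "1", "x" (bus conflict or unknown value), and "z" (no driver or high impedance). In contrast, hardware cannot handle the unknown value "x", so the reset sequence, which varies depending on the particular applicable code, resets the register values to all "0" or all "1".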
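The two-bit encoding above can be sketched in Python as follows. The code values come from the text ("00" low, "01" high, "10" z, "11" x); the packing order (first symbol in the most significant bits) and the helper names are assumptions for illustration.

```python
# Two-bit codes for the four-state values, as given in the text.
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: symbol for symbol, code in ENCODE.items()}

def pack_bus(symbols):
    """Pack a string such as '01zx' into an integer, two bits per signal."""
    word = 0
    for symbol in symbols:
        word = (word << 2) | ENCODE[symbol]
    return word

def unpack_bus(word, width):
    """Recover the four-state string for a bus of the given width."""
    return "".join(
        DECODE[(word >> (2 * index)) & 0b11] for index in reversed(range(width))
    )

def needs_reset(symbols):
    """True if any signal is still unknown ('x'), which hardware cannot represent."""
    return "x" in symbols
```

For example, the four-signal bus "01zx" packs to the bit pattern 00 01 10 11, and `needs_reset` reports that this state could not be handed to the hardware model as is.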
【００７７】
At step 105, the user decides whether to simulate the circuit design. Typically, the user starts the system with software simulation first. Thus, if the decision at step 105 is "YES", software simulation begins at step 110.
【００７８】
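The user can stop the simulation to inspect values, as shown in step 115. Indeed, the user can stop the simulation at any time during the test/debug session, as indicated by the dotted lines extending from step 115 to various nodes in the hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.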
【００７９】
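After stopping, the system kernel reads back the state of the hardware register components to regenerate the entire software model, including the combinational components, if the user wants to inspect combinational component values. After the entire software model has been restored, the user can inspect any signal value in the system. After stopping and inspecting, the user can continue to run in simulation-only mode or in hardware model acceleration mode. As shown in the flowchart, step 115 branches to the stop/value inspect routine, which starts at step 160. At step 165, the user must decide whether to stop the simulation at this point and inspect values. If the decision at step 165 is "YES", step 170 stops the simulation that may currently be under way and inspects various values to check the correctness of the circuit design. At step 175, the algorithm returns to the point from which it branched, at step 115. Here the user can continue to simulate and stop/inspect values for the remainder of the test/debug session, or proceed to the in-circuit emulation step.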
【００８０】
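Similarly, if the decision at step 105 is "NO", the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to accelerate the test/debug process by accelerating the simulation through the hardware portion of the modeled circuit design. If the decision at step 120 is "YES", hardware model acceleration occurs at step 125. During the system compilation process, the SEmulation system mapped some portions of the design into a hardware model. Here, when hardware acceleration is desired, the system moves the register and combinational components into the hardware model and moves the input and evaluation values to the hardware model. Thus, during hardware acceleration, evaluation takes place in the hardware model for a long period of time at the accelerated speed. The kernel writes test-bench output to the hardware model, updates the software clock, and then reads the hardware model output values cycle by cycle. If the user desires, values from the entire software model of the user's circuit design, which is the entire circuit design, can be made available by reading out the register values and regenerating the combinational components from them. Because software intervention is needed to regenerate these combinational components, values for the entire software model are not output at every cycle; rather, such values are provided only when the user requests them. The combinational component regeneration process is discussed later in this specification.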
【００８１】
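Again, the user can stop the hardware acceleration mode at any time, as indicated by step 115. If the user wants to stop, the algorithm proceeds to steps 115 and 160 to branch to the stop/value inspect routine. Here, as in step 115, the user can stop the hardware-accelerated simulation process at any time and inspect the values resulting from the simulation process, or the user can continue with the hardware-accelerated simulation process. The stop/value inspect routine branches to steps 160, 165, 170, and 175, which were discussed above in the context of stopping the simulation. Returning to the main routine after step 125, the user can decide at step 135 to continue with hardware-accelerated simulation or to perform pure simulation instead. If the user wants to simulate further, the algorithm proceeds to step 105. If not, the algorithm proceeds to post-simulation analysis at step 140.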
【００８２】
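At step 140, the SEmulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 record/cycle, output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the selected logging frequency directly affects SEmulation speed, the user selects it with care. A higher logging frequency decreases SEmulation speed because the system must spend time and resources recording the output data through I/O operations to memory before further simulation can be performed.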
【００８３】
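For post-simulation analysis, the user selects a particular point at which simulation is desired. The user can then analyze the results after SEmulation by running the software simulation with the logged inputs to the hardware model to compute the value changes and internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logging point in order to analyze the simulation results. This post-simulation analysis method can link to any simulation waveform viewer for post-simulation analysis. It is described in more detail later.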
【００８４】
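At step 145, the user can choose to emulate the simulated circuit design within its target system environment. If the decision at step 145 is "NO", the algorithm ends and the SEmulation process terminates at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain the system I/O from the target system. The system I/O from the target system includes the signals between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the SEmulation system for further processing, and outputs the processed signals to the target system. Conversely, the emulated circuit design sends output signals to the target system, which processes them and possibly outputs the processed signals back to the emulated circuit design. In this way, the performance of the circuit design can be evaluated in its native target system environment. After emulation with the target system, the user has results that either validate the circuit design or reveal non-functional aspects. At this point, the user can simulate/emulate again as indicated at step 135, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.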
【００８５】
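III. Simulation/Hardware Acceleration Modes
FIG. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time in accordance with one embodiment of the present invention. FIG. 3 shows two sets of information: one set distinguishes the operations performed during compile time from those performed during simulation/emulation run time; the other set shows the partitioning between the software models and the hardware models. At the outset, the SEmulation system according to one embodiment of the present invention needs the user's circuit design as input data 200. The user's circuit design is in some form of HDL file (e.g., Verilog, VHDL). The SEmulation system parses the HDL file so that behavioral-level code, register transfer level code, and gate-level code can be reduced to a form usable by the SEmulation system. The system generates a source design database in front-end processing step 205. The processed HDL file is now usable by the SEmulation system. The parsing process converts ASCII data to an internal binary data structure and is known to those skilled in the art. See ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988), which is incorporated herein by reference.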
【００８６】
コンパイル時間がプロセス２２５によって表され、走行時間がプロセス／要素２３０によって表される。プロセス２２５によって示されるようなコンパイル時間の間に、Ｓエミュレーションシステムは処理されたＨＤＬファイルをコンポーネントタイプ解析を実行することによってコンパイルする。コンポーネントタイプ解析は、ＨＤＬコンポーネントを組み合わせコンポーネント、レジスタコンポーネント、クロックコンポーネント、メモリコンポーネントおよびテストベンチコンポーネントに分類する。本質的に、システムはユーザ回路設計を制御および評価コンポーネントにパーティションする。
【００８７】
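The SEmulation compiler 210 essentially maps the control components of the simulation to software and the evaluation components to software and hardware. The compiler 210 generates a software model for all HDL components. The software model is cast in code 215. In addition, the SEmulation compiler 210 uses the component type information of the HDL file, selects or generates hardware logic blocks/elements from a library or module generator, and generates a hardware model for certain HDL components. The end result is a so-called "bitstream" configuration file 220.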
【００８８】
走行時間に備えて、コード形式のソフトウエアモデルは、本発明の１実施形態によるＳエミュレーションプログラムに関連したアプリケーションプログラムが格納されるメインメモリに格納される。このコードは汎用プロセッサまたはワークステーション２４０で処理される。実質的に現時点で、ハードウエアモデル用の構成ファイル２２０は、ユーザ回路設計を再構成ハードウエアボード２５０にマッピングするために用いられる。ここで、ハードウエア内でモデリングされてきた回路設計のこれらの部分は、再構成可能ハードウエアボード２５０のＦＰＧＡチップにマッピングされ、パーティションされる。
【００８９】
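As mentioned above, user test-bench stimuli, test vector data, and other test-bench resources 235 are applied to the general-purpose processor or workstation 240 for simulation purposes. Furthermore, the user can perform emulation of the circuit design via software control. The reconfigurable hardware board 250 contains the user's emulated circuit design. The SEmulation system lets the user switch selectively between software simulation and hardware emulation, and stop either the simulation or the emulation process at any time, cycle by cycle, to inspect values from every component in the model, whether register or combinational. Thus, the SEmulation system passes data between the test bench 235 and the processor/workstation 240 for simulation, and between the test bench 235 and the reconfigurable hardware board 250 via data bus 245 and the processor/workstation 240 for emulation. If a user target system 260 is involved, emulation data can pass between the reconfigurable hardware board 250 and the target system 260 via the emulation interface 255 and data bus 245. Because the kernel resides in the software simulation model in the memory of the processor/workstation 240, data passes as needed between the processor/workstation 240 and the reconfigurable hardware board 250 via data bus 245.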
【００９０】
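FIG. 4 shows a flowchart of the compilation process in accordance with one embodiment of the present invention. The compilation process is represented as processes 205 and 210 in FIG. 3. The compilation process of FIG. 4 begins at step 300. Step 301 processes the front-end information, where gate-level HDL code is generated. The user converts the initial circuit design into HDL form either by handwriting the code directly or by using some form of schematic or synthesis tool to generate a gate-level HDL representation of the code. The SEmulation system parses the HDL file (in ASCII format) into a binary format so that behavioral-level code, register transfer level (RTL) code, and gate-level code can be reduced to an internal data structure form usable by the SEmulation system. The system generates a source design database containing the parsed HDL code.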
【００９１】
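Step 302 performs component type analysis by classifying HDL components into combinational components, register components, clock components, memory components, and test-bench components, as shown in component type resource 303. The SEmulation system generates hardware models for register and combinational components (some exceptions are discussed below). Test-bench and memory components are mapped into software. Some clock components (e.g., derived clocks) are modeled in hardware, and others reside at the software/hardware boundary (e.g., software clocks).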
【００９２】
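Combinational components are stateless logic components whose output values are a function of the current input values and do not depend on the history of input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selectors, adders, multipliers, shifters, and bus drivers.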
【００９３】
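Register components are simple storage components. The state transition of a register is controlled by a clock signal. One form of register is edge-triggered, which may change state when an edge is detected. Examples include flip-flops (D-type, JK-type) and level-sensitive latches.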
【００９４】
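Clock components deliver periodic signals to logic devices to control their activities. Generally, clock signals control the update of registers. Primary clocks are generated from self-timed test-bench processes. For example, a typical test-bench process for clock generation in Verilog is as follows:
    always begin
        Clock = 0;
        #5;
        Clock = 1;
        #5;
    end
According to this code, the clock signal is initially at logic "0". After 5 time units, the clock signal changes to logic "1". After another 5 time units, the clock signal reverts to logic "0". Generally, primary clock signals are generated in software, and only a few (i.e., 1-10) primary clocks are found in a typical user circuit design. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are found in a typical user circuit design.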
【００９５】
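Memory components are block storage components with address and control lines to access individual data in specific memory locations. Examples include ROMs, asynchronous RAMs, and synchronous RAMs.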
【００９６】
テストベンチコンポーネントは、シミュレーションプロセスを制御し、モニタリングするために用いられるソフトウエアプロセスである。したがって、これらのコンポーネントは、テストの下では、ハードウエア回路設計の一部ではない。テストベンチコンポーネントは、クロック信号を生成し、シミュレーションデータを初期化し、ディスク／メモリからシミュレーションテストベクトルパターンを読み出すことによってシミュレーションを制御する。テストベンチコンポーネントはまた、値の変化をチェックし、値変化ダンプを実行し、信号値関係（ｓｉｇｎａｌｖａｌｕｅｒｅｌａｔｉｏｎ）のアサートされた制限をチェックし、ディスク／メモリに出力テストベクトルを書き込み、種々の波形ビューワおよびデバッガとインターフェースをとることによってシミュレーションをモニタリングする。
【００９７】
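The SEmulation system performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system can characterize or classify elements as one of the above component types. Continuous assignment statements are classified as combinational components. Primitive gates are either of combinational type or of latch-form register type, according to the language definition. Initialization code is treated as test-bench of initialization type.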
【００９８】
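An always process that drives nets without using (reading) the nets is a test-bench of driver type. An always process that reads nets without driving the nets is a test-bench of monitor type. An always process with delay controls or multiple event controls is a test-bench of general type.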
【００９９】
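An always process with a single event control that drives a single net can be one of the following: (1) if the event control is an edge-triggered event, then the process is an edge-triggered type register component; (2) if the net driven in the process is not defined on all possible execution paths, then the net is a latch-type register; (3) if the net driven in the process is defined on all possible execution paths, then the net is a combinational component.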
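The three-way rule above for an always process with a single event control and a single driven net can be sketched as a small decision function. This is only an illustration: the `AlwaysProcess` record and its two boolean fields are hypothetical names introduced here, and a real compiler would derive them from the parsed source design database.

```python
from dataclasses import dataclass


@dataclass
class AlwaysProcess:
    """Hypothetical summary of one single-event, single-net always process."""
    edge_triggered: bool        # event control is an edge-triggered event
    defined_on_all_paths: bool  # driven net is assigned on every execution path


def classify(proc: AlwaysProcess) -> str:
    """Apply rules (1)-(3) in the order they are listed in the text."""
    if proc.edge_triggered:
        return "register (edge-triggered)"
    if not proc.defined_on_all_paths:
        return "register (latch)"
    return "combinational"
```

Checking the edge-triggered rule first mirrors the order of the list; the text does not say how the rules interact beyond that ordering, so the precedence here is an assumption.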
【０１００】
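An always process with a single event control that drives multiple nets can be decomposed into several processes, each driving one net separately, so that the respective component type of each can be derived. The decomposed processes can then be used to determine component types.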
【０１０１】
工程３０４は、コンポーネントタイプに関わらず、すべてのＨＤＬコンポーネントに対してのソフトウエアモデルを生成する。適切なユーザ駆動インターフェースによって、ユーザは、完全なソフトウエアモデルを用いて回路設計全体のシミュレーションをできる。テストベンチプロセスは、刺激入力を駆動させ、ベクトルパターンをテストし、シミュレーション全体を制御し、シミュレーションプロセスをモニタリングするために用いられる。
【０１０２】
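Step 305 performs clock analysis. Clock analysis includes two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step maps the user's register components into the hardware register model of the SEmulation system and then extracts clock signals from the system's hardware register components. The clock network analysis step determines primary clocks and derived clocks based on the extracted clock signals and separates the gated clock network from the gated data network. A more detailed description is provided with respect to FIG. 16.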
【０１０３】
工程３０６は常駐（ｒｅｓｉｄｅｎｃｅ）選択を実行する。システムは、ユーザに関連して、ハードウエアモデルのためのコンポーネントを選択する。すなわち、ユーザの回路設計のハードウエアモデルにて実現され得る可能なハードウエアコンポーネントの一般的なものであり、いくつかのハードウエアコンポーネントは種々の理由からハードウエアにてモデリングされない。これらの理由は、コンポーネントタイプ、ハードウエアリソース制限（すなわち、浮動点動作および大規模乗算動作がソフトウエアに存在している）、シミュレーションおよび通信オーバーヘッド（すなわち、テストベンチプロセス間の小さいブリッジ論理がソフトウエアに存在しており、テストベンチプロセスによってモニタリングされる信号がソフトウエアに存在している）およびユーザの嗜好を含む。性能およびシミュレーションモニタリングを含む種々の理由から、ユーザは、さもなければ、ハードウエアにてモデリングされる所定のコンポーネントをソフトウエアに存在するように課すことができる。
【０１０４】
工程３０７は、再構成可能ハードウエアエミュレーションボードに選択されたハードウエアモデルをマッピングする。特に、工程３０７は、ネットリストを取り出してマッピングし、回路設計を特定のＦＰＧＡチップにマッピングする。この工程は、論理素子を共にグループ分けまたはクラスタリングを行うことを包含する。次いで、システムは、唯一的なＦＰＧＡチップに各グループを割り当てるか、いくつかのグループを単一のＦＰＧＡチップに割り当てる。システムはまた、異なるＦＰＧＡチップにグループを割り当て得る。一般に、システムは、ＦＰＧＡチップにグループを割り当てる。さらに詳細な説明が図６に関して下記に提供される。システムは、内部チップ通信オーバーヘッドを最小化するためにハードウエアモデルコンポーネントをＦＰＧＡチップのメッシュに配置する。１実施形態において、アレイは、ＦＰＧＡの４×４アレイ、ＰＣＩインターフェースユニットおよびソフトウエアクロック制御ユニットを含む。ＦＰＧＡのアレイは、このソフトウエアコンパイルプロセスの工程３０２−３０６にて以上で決定したようにユーザのハードウエア回路設計の一部を実現する。ＰＣＩインターフェースユニットは、再構成可能ハードウエアエミュレーションモデルがＰＣＩバスを介してワークステーションと通信することを可能にする。ソフトウエアクロックは、ＦＰＧＡのアレイに対する種々のクロック信号の競合条件を避ける。さらに、工程３０７は、ハードウエアモデル間の通信スケジュールによってＦＰＧＡチップにルーティングを行う。
【０１０５】
ステップ３０８は制御回路を挿入する。これらの制御回路は、ＤＭＡエンジンと通信するための、シミュレータへのＩ／Ｏ回路アドレスポインタおよびデータバス論理（図１１、図１２、および図１４を参照して以下で説明される）、ならびに、ハードウエア状態遷移およびワイヤマルチプレクシングを制御するための評価制御論理（図１９および図２０を参照して以下に説明される）を含む。当業者に公知のように、ダイレクトメモリアクセス（ＤＭＡ）ユニットは、周辺機器とメインメモリとの間のさらなるデータチャンネルを提供し、この周辺機器は、ＣＰＵを介することなくメインメモリに直接アクセス（すなわち、読み出し、書き込み）し得る。各ＦＰＧＡチップにおけるアドレスポインタは、バスのサイズ制限を考慮して、ソフトウエアモデルとハードウエアモデルとの間でデータを移動させることを可能にする。評価制御論理は、実質的には、クロックおよびデータ入力がこれらのレジスタに入力する前に、アサートされるべきレジスタにクロックイネーブルが入力することを確実にする有限状態機械である。
【０１０６】
ステップ３０９は、ハードウエアモデルをＦＰＧＡチップにマッピングするための構成ファイルを生成する。本質的には、ステップ３０９は、回路設計コンポーネントを各チップにおける特定のセルまたはゲートレベルコンポーネントに割り当てる。ステップ３０７が、ハードウエアモデル群を特定のＦＰＧＡチップにマッピングすることを決定するが、ステップ３０９は、このマッピング結果を獲得し、各ＦＰＧＡチップに対する構成ファイルを生成する。
【０１０７】
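Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the overall SEmulation system. The kernel cannot be generated until this point because portions of the code require updating and evaluating hardware components. Only after step 309 have the appropriate mapping to the hardware model and the FPGA chips taken place. A more detailed discussion is provided below with reference to FIG. 5. Compilation ends at step 311.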
【０１０８】
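As described above with reference to FIG. 4, the software kernel code is generated in step 310 after the software and hardware models have been determined. The kernel is the part of the software in the SEmulation system that controls the operation of the overall system. The kernel controls the execution of both the software simulation and the hardware emulation. Furthermore, because the kernel resides centrally in the hardware model, the simulator is integrated with the emulator. In contrast to other known co-simulation systems, the SEmulation system in accordance with one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in FIG. 5.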
【０１０９】
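Referring to FIG. 5, the kernel begins at step 330. Step 331 evaluates the initialization code. Beginning at step 332 and bounded by decision step 339, the control loop begins and cycles repeatedly until the system observes no active test-bench processes, at which point the simulation or emulation session has completed. Step 332 evaluates the active test-bench components for the simulation or emulation.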
【０１１０】
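Step 333 evaluates clock components. These clock components are from the test-bench process. Usually, the user dictates what type of clock signal will be generated for the simulation system. In one example (discussed above with respect to component type analysis and reproduced here), a clock component designed by the user in the test-bench process is as follows: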
【０１１１】
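    always begin
        Clock = 0;
        #5;
        Clock = 1;
        #5;
    end
In this clock component example, the user has decided that a logic "0" signal is generated first and then, after 5 simulation times, a logic "1" signal is generated. This clock generation process cycles continually until stopped by the user. The simulation times are advanced by the kernel.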
【０１１２】
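Decision step 334 inquires whether any active clock edge is detected, which would result in some kind of logic evaluation in the software model and possibly in the hardware model (if emulation is running). The clock signal that the kernel uses to detect an active clock edge is the clock signal from the test-bench process. If decision step 334 evaluates to "NO", the kernel proceeds to step 337. If decision step 334 evaluates to "YES", the kernel proceeds to step 335, which updates registers and memories, and then to step 336, which propagates combinational components. Step 336 essentially takes care of combinational logic, which needs some time to propagate values through the combinational logic network after a clock signal has been asserted. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.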
【０１１３】
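Note that registers and combinational components are also modeled in hardware, and thus the kernel controls the emulator portion of the SEmulation system. Indeed, the kernel can accelerate the evaluation of the hardware model in steps 334 and 335 whenever any active clock edge is detected. Hence, unlike the prior art, the SEmulation system in accordance with one embodiment of the present invention can accelerate the hardware emulator through the software kernel and based on component type (e.g., register, combinational). Furthermore, the kernel controls the execution of the software and hardware models cycle by cycle. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general-purpose processor that runs the simulation kernel. The coprocessor speeds up the simulation task.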
【０１１４】
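Step 337 evaluates active test-bench components. Step 338 advances the simulation time. Step 339 provides the boundary for the control loop that begins at step 332. Step 339 determines whether any test-bench processes are active. If so, the simulation and/or emulation is still running and more data should be evaluated; the kernel therefore loops back to step 332 to evaluate any active test-bench components. If no test-bench processes are active, the simulation and emulation processes have completed. Step 340 ends the simulation/emulation process. In sum, the kernel is the main control loop that controls the operation of the overall SEmulation system. As long as any test-bench processes are active, the kernel evaluates active test-bench components, evaluates clock components, detects clock edges to update registers and memories and to propagate combinational logic data, and advances the simulation time.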
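The control loop of steps 330 to 340 can be illustrated with a toy software-only model. The sketch below is not the kernel of the described system: the design (one counter register fed by an incrementer), the function name `run_kernel`, and the stop condition are all assumptions chosen to keep the example small. The clock follows the test-bench process shown earlier (logic "0" for 5 time units, then logic "1" for 5 time units).

```python
def run_kernel(stop_time: int, half_period: int = 5) -> list[tuple[int, int]]:
    """Toy kernel loop; returns (time, counter) after each active clock edge."""
    time = 0
    clock = 0        # step 331: initialization code
    counter = 0      # a register component
    next_value = 1   # output of a combinational component (counter + 1)
    trace = []
    testbench_active = True
    while testbench_active:                    # loop bounded by step 339
        # Step 332: evaluate active test-bench components.
        # The only test-bench process in this sketch is the clock generator.
        # Step 333: evaluate clock components.
        new_clock = (time // half_period) % 2
        rising_edge = clock == 0 and new_clock == 1
        clock = new_clock
        if rising_edge:                        # step 334: active clock edge?
            counter = next_value               # step 335: update registers
            next_value = counter + 1           # step 336: propagate combinational logic
            trace.append((time, counter))
        # Step 337: evaluate active test-bench components (nothing to do here).
        time += 1                              # step 338: advance simulation time
        testbench_active = time < stop_time    # step 339: any process still active?
    return trace                               # step 340: end
```

With `stop_time=30`, rising edges occur at times 5, 15, and 25, so the counter register is updated three times. In the described system, steps 335 and 336 are where evaluation could be handed to the hardware model; this sketch keeps everything in software.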
【０１１５】
図６は、ハードウエアモデルの再構成可能な基板（ｂｏａｒｄ）への自動的マッピングのための方法の一実施形態を示す。ネットリストファイルは、ハードウエア実現プロセスへの入力を提供する。このネットリストは論理機能およびその相互接続を説明する。ハードウエアモデル／ＦＰＧＡ実現プロセスは３つの独立したタスク（マッピング、配置、およびルーティング）を含む。一般的にこのツールは、「配置およびルーティング」ツールと呼ばれる。使用される設計ツールは、ＶｉｅｗｌｏｇｉｃＶｉｅｗｄｒａｗ、スキマティックキャプチャシステム、ＸｉｌｉｎｘＸａｃｔ配置およびルーティングソフトウエア、あるいは、ＡｌｔｅｒａＭＡＸ＋ＰＬＵＳＩＩシステムであり得る。
【０１１６】
マッピングタスクは回路設計を論理ブロック、Ｉ／Ｏブロック、および他のＦＰＧＡリソースに分割する。フリップフロップおよびバッファのようないくつかの論理機能が対応するＦＰＧＡリソースに直接にマッピングされ得るが、組み合わせ論理等の他の論理機能は、マッピングアルゴリズムを用いて論理ブロックにおいて実現されなければならない。通常、ユーザは、最適な密度または最適な性能のためにマッピングを選択し得る。
【０１１７】
配置タスクは、マッピングタスクから論理ブロックおよびＩ／Ｏブロックを取り出すことと、および論理ブロックおよびＩ／ＯブロックをＦＰＧＡアレイ内部の物理領域に割り当てることを含む。現在のＦＰＧＡツールは通常、３つの技術（最小カット（ｍｉｎｃｕｔ）、シミュレートアニーリング、および汎用フォースダイレクティッド緩和（ｇｅｎｅｒａｌｆｏｒｃｅ−ｄｉｒｅｃｔｅｄｒｅｌａｘａｔｉｏｎ：ＧＦＤＲ））のいくつかの組み合わせを使用する。実質的に、これらの技術は、他の変数間において、相互接続の全ネット長または臨界信号経路のセットに沿う遅延に依存する種々のコスト関数に基づいて最適な配置を決定する。ＸｉｌｉｎｘＸＣ４０００シリーズのＦＰＧＡツールは、最初の配置に対する最小カット技術配置の後に続く配置の緻密な改良のためのＧＦＤＲ技術の改変体を使用する。
【０１１８】
このルーティングタスクは、種々のマッピングされたブロックおよび配置されたブロックを相互接続するために使用されたルーティング経路を決定することを含む。１つのこのようなルータ（迷路（ｍａｚｅ）ルータと呼ばれる）は、２点間の最短の経路を探し出す。ルーティングタスクは、チップ間の直接的な相互接続を提供するため、チップに関する回路の配置は重要である。
【０１１９】
初めにおいて、ハードウエアモデルは、ゲートネットリスト３５０またはＲＴＬ３５７のいずれかにおいて説明され得る。ＲＴＬレベルコードは、ゲートレベルネットリストにさらに合成され得る。マッピングプロセスの間、合成器サーバ３６０（ＡｌｔｅｒａＭＡＸ＋ＰＬＵＳＩＩプログラム可能な論理開発ツールシステムおよびソフトウエア等）を使用して、マッピング目的のための出力ファイルを生成し得る。合成器サーバ３６０は、ユーザの回路設計コンポーネントとライブラリ３６１において見出された任意の標準的な既存の論理素子（例えば、標準的な加算器または標準的な乗算器）とを一致させ、任意のパラメータ化されかつ頻繁に使用された論理モジュール３６２（例えば、標準的ではないマルチプレクサまたは標準的ではない加算器）を生成し、そして、ランダム論理素子３６３（例えば、カスタマイズされた論理機能を実現するルックアップテーブルに基づく論理）を合成する能力を有する。さらに合成器サーバは、冗長論理および使用されていない論理を取り除く。実質的に出力ファイルは、ユーザの回路設計によって必要とされた論理を合成または最適化する。
【０１２０】
ＨＤＬのいくつかまたは全てがＲＴＬレベルである場合、回路設計コンポーネントは、Ｓエミュレーションシステムが、Ｓエミュレーションレジスタまたはコンポーネントを用いてこれらのコンポーネントを容易にモデル化し得るような高い十分なレベルにおいて存在する。ＨＤＬのいくつかまたは全てがゲートネットリストレベルにおいて存在する場合、回路設計コンポーネントは、より回路設計特有になり得、ユーザ回路設計コンポーネントのＳエミュレーションコンポーネントへのマッピングをより困難にする。従って、シンセサイザサーバは、標準的な論理素子またはランダム論理素子の改変体に基づいて任意の論理素子を生成することを可能にする。標準的な論理素子またはランダム論理素子の改変体は、これらの改変体またはライブラリ標準論理素子において任意の並列性（ｐａｒａｌｌｅｌ）を有し得ない。
【０１２１】
回路設計がゲートネットリスト形態である場合、Ｓエミュレーションシステムは、グループ化またはクラスタリング動作３５１を最初に実行する。ハードウエアモデル構成は、クラスタリングプロセスに基づく。なぜなら、組み合わせ論理およびレジスタがクロックから分離されるためである。従って、共通の一次クロックまたはゲートクロック信号を共有する論理素子は、この素子を互いにグループ化し、チップ上に共に配置することによってより良好に提供され得る。クラスタリングアルゴリズムは、接続性駆動（ｃｏｎｎｅｃｔｉｖｉｔｙｄｒｉｖｅｎ）、階層的な抽出、および規則構造抽出に基づく。この記述が構造化されたＲＴＬ３５８において存在する場合、Ｓエミュレーションシステムは、論理機能分解動作３５９によって提示されるように、機能をより小さなユニットに分解し得る。任意の段において、論理合成または論理最適化が必要とされる場合、合成器サーバ３６０は、回路設計を、ユーザの命令の基づくより効率的な表示を変換することに利用可能である。クラスタリング動作３５１に対して、合成器サーバに対するリンクは、点線矢印３６４によって示される、構造化されたＲＴＬ３５８について、合成器サーバ３６０へのリンクは、矢印３６５によって示される。論理機能分解動作３５９に対して、合成器サーバ３６０へのリンクが矢印３６６によって示される。
【０１２２】
クラスタリング動作３５１は、機能およびサイズに基づいて選択された態様で論理コンポーネントを共にグループ化する。このクラスタリングは、小さい回路設計に対して１つのみのクラスタまたは大きな回路設計に対していくつかのクラスタを含み得る。にもかかわらず、以後のステップにおいて、論理素子のクラスタが使用されて、このクラスタを設計されたＦＰＧＡチップにマッピングする。すなわち、あるクラスタが特定のチップに照準を定め、別のクラスタは、異なるチップ、または恐らく第１のクラスタと同一のチップに照準を定める。通常、クラスタ内の論理素子は、チップにおけるクラスタと共に存在するが、最適化目的のために、クラスタは１つ以上のチップに分割される必要があり得る。
【０１２３】
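After the clusters are formed in clustering operation 351, the system performs placement and routing operations. First, a coarse-grain placement operation 352 of the clusters onto the FPGA chips is performed. The coarse-grain placement operation 352 initially places clusters of logic elements onto selected FPGA chips. If necessary, the system makes the synthesizer server 360 available to the coarse-grain placement operation 352, as indicated by arrow 367. A fine-grain placement operation is performed after the coarse-grain placement operation to fine-tune the initial placement. The SEmulation system uses a cost function based on pin usage requirements, gate usage requirements, and gate-to-gate hops to determine the optimal placement for both the coarse-grain and fine-grain placement operations.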
【０１２４】
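The determination of how clusters are placed in a given chip is based on placement cost, which is calculated through a cost function f(P, G, D) for two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective locations in the array of FPGA chips, where P is generally the pin usage/availability, G is generally the gate usage/availability, and D is the distance or number of gate-to-gate "hops" as defined by a connectivity matrix M (shown in FIG. 7 in conjunction with FIG. 8). The user's circuit design that is modeled in the hardware model comprises the total combination of the circuits CKTQ. Each cost function is defined such that the computed placement cost tends to favor: (1) a minimum number of "hops" between any two circuits CKTN-1 and CKTN in the FPGA array, and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array such that pin usage is minimized.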
【０１２５】
In one embodiment, the cost function f(P, G, D) is defined as follows:
【０１２６】
【数１】

【０１２７】
This equation can be simplified to the following form:
【０１２８】
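f(P, G, D) = C0*P + C1*G + C2*D
The first term (i.e., C0*P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1*G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2*D) generates a placement cost value based on the number of hops present between the various interconnecting gates in the circuits CKTQ (i.e., CKT1, CKT2, ..., CKTN). The total placement cost value is generated by iteratively summing these three placement cost values. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value generated from this cost function toward the factor or factors (i.e., pin usage, gate usage, or gate-to-gate hops) that are most important during any iterative placement cost calculation.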
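The equation image for the full cost function is not reproduced in this text. Reading the simplified form together with the description's own explanations of P (the maximum per-chip ratio of used to available pins), G (in one of the described embodiments, the maximum per-chip ratio of used to available gates), and D (the sum of hops over gate pairs that need chip-to-chip interconnection), the expanded form can plausibly be written as below. This is a reconstruction under those assumptions, not a copy of the original equation; the operator name Dist and the index set notation are chosen here for illustration.

```latex
\[
f(P,G,D) \;=\;
C_0 \cdot \max_{\text{FPGA chips}} \frac{P_{\text{used}}}{P_{\text{available}}}
\;+\;
C_1 \cdot \max_{\text{FPGA chips}} \frac{G_{\text{used}}}{G_{\text{available}}}
\;+\;
C_2 \cdot \sum_{(i,j)\,\in\,\text{CKT}} \mathrm{Dist}(\text{FPGA}_i,\ \text{FPGA}_j)
\]
```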
【０１２９】
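The placement cost is calculated repeatedly as the system selects different relative values for the weighting constants C0, C1, and C2. Thus, in one embodiment, during the coarse-grain placement operation, the system selects large values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing pin usage/availability and gate usage/availability is more important than optimizing gate-to-gate hops in the initial placement of the circuits CKTQ in the array of FPGA chips. In a subsequent iteration, the system selects small values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing gate-to-gate hops is more important than optimizing pin usage/availability and gate usage/availability.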
【０１３０】
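During the fine-grain placement operation, the system uses the same cost function. In one embodiment, the iterative steps for selecting C0, C1, and C2 are the same as those for the coarse-grain operation. In another embodiment, the fine-grain placement operation involves having the system select small values for C0 and C1 relative to C2.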
【０１３１】
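An explanation of these variables and equations follows. In determining whether to place a certain circuit CKTQ in FPGA chip x or FPGA chip y (among other FPGA chips), the cost function examines pin usage/availability P, gate usage/availability G, and gate-to-gate hops D. Based on the cost function variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing circuit CKTQ in a particular location in the FPGA array.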
【０１３２】
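Pin usage/availability P also represents the I/O capacity. P_used is the number of pins used by the circuits CKTQ for each FPGA chip. P_available is the number of pins available in the FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnections/chip), while in another embodiment, P_available is 265 (44 pins × 6 interconnections/chip + 1 extra pin). However, the specific number of available pins depends on the type of FPGA chip used, the total number of interconnections used per chip, and the number of pins used for each interconnection. Thus, P_available can vary considerably. So, to evaluate the first term of the cost function f(P, G, D) (i.e., C0*P), the ratio P_used/P_available is calculated for each FPGA chip. Thus, for a 4×4 array of FPGA chips, sixteen ratios P_used/P_available are calculated. The more pins are used for a given number of available pins, the higher the ratio. Of the sixteen calculated ratios, the ratio yielding the highest number is selected. The first placement cost value is calculated from the first term C0*P by multiplying the selected maximum ratio P_used/P_available by the weighting constant C0. Because this first term depends on the calculated ratio P_used/P_available and on the particular maximum ratio among the ratios calculated for each FPGA chip, the placement cost value will be higher for higher pin usage, all other factors being equal. The system selects the placement yielding the lowest placement cost. The particular placement yielding a maximum ratio P_used/P_available that is the lowest among all the maximums calculated for the various placements is generally considered the optimum placement in the FPGA array, all other factors being equal.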
【０１３３】
ゲート使用／利用可能性Ｇは各ＦＰＧＡチップによって許されるゲートの数に基づく。一実施形態では、アレイ中の回路ＣＫＴＱの位置に基づいて、各チップにおいて使用されたゲートＧ_ｕｓｅｄの数が所定の閾値よりも高い場合、この結果、この第２の配置コスト（Ｃ１＊Ｇ）は、配置が実現可能でないことを示す値を割り当てられる。同様に、回路ＣＫＴＱを含む各チップにおいて使用されたゲートの数が所定の閾値または所定の閾値よりも小さい場合、この結果、この第２の項（Ｃ１＊Ｇ）は、配置が実現可能であることを示す値に割り当てられる。従って、システムが特定のチップに回路ＣＫＴ１を配置することを望み、そのチップが回路ＣＫＴ１に収容させるのに十分なゲートを有しない場合、この結果システムは、この特定の配置が実現不可能であることをコスト関数によって結論し得る。一般的には、Ｇが大きい数（例えば、無限大）であることは、回路ＣＫＴＱの所望の配置が実現不可能であり、代替の配置が決定されるべきであることを示す高い配置コスト値を生成することを確実にする。
【０１３４】
別の実施形態では、アレイにおける回路ＣＫＴＱの位置に基づいて、比Ｇ_ｕｓｅｄ／Ｇ_{ａｖａｉｌａｂｌｅ}が各チップに対して計算される。ただし、Ｇ_ｕｓｅｄは、各ＦＰＧＡチップにおける回路ＣＫＴＱによって使用されたゲートの数であり、Ｇ_{ａｖａｉｌａｂｌｅ}は、各チップにおいて利用可能なゲートの数である。一実施形態では、システムは、ＦＰＧＡアレイに対してＦＬＥＸ１０Ｋ１００チップを使用する。ＦＬＥＸ１０Ｋ１００チップは、約１００，０００ゲートを含む。従って、この実施形態では、Ｇ_{ａｖａｉｌａｂｌｅ}は、１００，０００ゲートに等しい。従って、ＦＰＧＡチップの４×４アレイに対して、１６の比Ｇ_ｕｓｅｄ／Ｇ_{ａｖａｉｌａｂｌｅ}が計算される。所与の数の利用可能なゲートに対して使用されるゲートが多くなると、この比がより大きくなる。１６の計算された比の中で、最も大きい数を生成する比が選択される。第２の配置コスト値が、選択された最大比Ｇ_ｕｓｅｄ／Ｇ_{ａｖａｉｌａｂｌｅ}と重み定数Ｃ１とを乗算することによって、第２の項Ｃ１＊Ｇから計算される。この第２項は、計算された比Ｇ_ｕｓｅｄ／Ｇ_{ａｖａｉｌａｂｌｅ}および各ＦＰＧＡチップに対して計算された比の中で特定の最大比に依存するため、配置コスト値は、全ての他のファクタが等しくても、より高いゲート使用に対してより大きくなる。システムは、最も低い配置コストを生成する回路配置を選択する。種々の配置に対して計算された全最大値の中で最も小さい最大比Ｇ_ｕｓｅｄ／Ｇ_{ａｖａｉｌａｂｌｅ}を生成する特定の配置は、一般的に、全ての他のファクタが等しくても、ＦＰＧＡアレイにおける最適な配置として考慮される。
【０１３５】
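In another embodiment, the system initially selects some value for C1. If the ratio G_used/G_available is greater than "1", then this particular placement is infeasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). As a result, the system modifies C1 to a very high number (e.g., infinity), and accordingly the second term C1*G will also be a very high number and the total placement cost value f(P, G, D) will also be very high. On the other hand, if the ratio G_used/G_available is less than or equal to "1", then this particular placement is feasible (i.e., each chip has enough gates to support the circuit implementation). As a result, the system does not modify C1, and the second term C1*G resolves to a particular number.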
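A minimal sketch of how the three terms could be combined for one candidate placement follows, assuming the max-ratio reading of P and G and the infeasibility rule for gate ratios above 1. The function name, argument layout, and default weights are hypothetical; the hop total is taken as a precomputed input.

```python
import math


def placement_cost(pins_used, pins_available, gates_used, gates_available,
                   total_hops, c0=1.0, c1=1.0, c2=1.0):
    """Cost of one candidate placement; lower is better, math.inf is infeasible.

    pins_used / gates_used hold one entry per FPGA chip; the *_available
    arguments are per-chip capacities, assumed equal for every chip.
    """
    p = max(used / pins_available for used in pins_used)    # worst pin ratio
    g = max(used / gates_available for used in gates_used)  # worst gate ratio
    if g > 1.0:
        # At least one chip lacks gates: treat C1 as infinite.
        return math.inf
    return c0 * p + c1 * g + c2 * total_hops
```

For example, with 264 available pins and 100,000 available gates per chip, a two-chip placement using 132 and 66 pins, 50,000 and 25,000 gates, and 3 hops in total has P = 0.5 and G = 0.5, so the cost with unit weights is 4.0. Raising `c2` relative to `c0` and `c1` would make the hop count dominate, as in the later placement iterations described in the text.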
【０１３６】
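The third term C2*D represents the number of hops between all gates that require interconnection. The number of hops also depends on the connectivity matrix. The connectivity matrix provides the foundation for determining circuit paths between any two gates that need chip-to-chip interconnection. Not every gate needs a gate-to-gate interconnection. Based on the user's original circuit design and the partitioning of clusters to certain chips, some gates need no interconnection whatsoever because the logic elements connected to their respective inputs and outputs are located in the same chip. Other gates, however, need interconnection because the logic elements connected to their respective inputs and outputs are located in different chips.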
【０１３７】
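To understand "hops", refer to the connectivity matrix shown in table form in FIG. 7 and in pictorial form in FIG. 8. Each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.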
【０１３８】
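Using this interconnection scheme, data can pass from one chip to another within two "hops" or "jumps". Thus, data can pass from chip F11 to chip F12 in one hop via interconnection 601, and data can pass from chip F11 to chip F33 in two hops via either interconnections 600 and 606 or interconnections 603 and 610. These exemplary hops are the shortest-path hops between these sets of chips. In some instances, signals may be routed through various chips such that the number of hops between a gate in one chip and a gate in another chip exceeds the shortest-path hops. The only circuit paths that must be examined in determining the number of gate-to-gate hops are those that need interconnection.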
【０１３９】
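The connectivity is represented by the sum of all hops between the gates that need inter-chip interconnection. The shortest path between any two chips can be represented by one or two "hops" using the connectivity matrix of FIGS. 7 and 8. However, for certain hardware model implementations, I/O capacity may limit the number of direct shortest-path connections between any two gates in the array, and hence these signals must be routed through longer paths (and therefore more than two hops) to reach their destinations. Accordingly, the number of hops may exceed two for some gate-to-gate connections. Generally, all things being equal, a smaller number of hops results in a lower placement cost.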
【０１４０】
The third term (i.e., C2*D) is reproduced in long form as follows:
【０１４１】
【数２】

【０１４２】
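The third term is the product of the weighting constant C2 and a summation component (Σ ...). The summation component is essentially the sum of all hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As discussed above, not all gates need inter-chip interconnection. For those gates i and j that do need inter-chip interconnection, the number of hops is determined. The hops for all such gates i and j are added together to give the total number of hops.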
【０１４３】
The distance calculation can also be defined as follows:
【０１４４】
【数３】

【０１４５】
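Here, M is the connectivity matrix. One embodiment of the connectivity matrix is shown in FIG. 7. The distance is calculated for each gate-to-gate connection requiring interconnection. Thus, for each gate i and gate j comparison, the connectivity matrix M is examined. More specifically,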
【０１４６】
【数４】

【０１４７】
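A matrix is set up with all chips in the array such that each chip is identifiably numbered. These identification numbers are set up at the top of the matrix as column headers. Likewise, these identification numbers are set up along the side of the matrix as row headers. A particular entry at the intersection of a row and a column in this matrix provides the direct connectivity data between the chip identified by the row and the chip identified by the column at which the intersection occurs. For any distance calculation between chip i and chip j, the entry in the matrix M_i,j contains either a "1" for a direct connection or a "0" for no direct connection. The index k refers to the number of hops necessary to interconnect any gate in chip i to any gate in chip j that requires interconnection.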
【０１４８】
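First, the connectivity matrix M_i,j for k = 1 should be examined. If the entry is "1", a direct connection exists from this gate in chip i to the selected gate in chip j. Thus, the index or hop count k = 1 is designated as the result for M_i,j, and this result is the distance between these two gates. At this point, another gate-to-gate connection can be examined. However, if the entry is "0", no direct connection exists.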
【０１４９】
If no direct connection exists, the next k should be examined. The matrix for this new k (i.e., k = 2) can be computed by multiplying the matrix M by itself; in other words, M^2 = M*M, where k = 2.
【０１５０】
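This process of multiplying M by itself continues until the computed result for the particular row and column entry for chip i and chip j is "1", at which point the index k is selected as the number of hops. The operation includes ANDing the matrices M together and then ORing the ANDed results. If the AND operation between matrix entries m_i,l and m_l,j results in a logic "1" value, then a connection exists between the selected gate in chip i and the selected gate in chip j through some chip l within hop k. If not, no connection exists within this particular hop k and further computation is necessary. The matrices m_i,l and m_l,j are the connectivity matrix M as defined for this hardware modeling. For any given gate i and gate j requiring interconnection, the row containing the FPGA chip for gate i in matrix m_i,l is logically ANDed with the column containing the FPGA chip for gate j in m_l,j. The individual ANDed components are then ORed to determine whether the resulting M_i,j value for index or hop k is "1" or "0". If the result is "1", a connection exists and the index k is designated as the number of hops. If the result is "0", no connection exists.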
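The hop search described above amounts to raising the connectivity matrix to successive boolean powers (AND in place of multiplication, OR in place of addition) until the entry for chips i and j becomes 1. The sketch below shows that idea for two distinct chips; the function name `hops` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def hops(m, i, j):
    """Smallest k such that the boolean power M^k has a 1 at (i, j), else None.

    m is a square 0/1 connectivity matrix given as a list of lists;
    i and j are indices of two distinct chips.
    """
    n = len(m)
    power = [row[:] for row in m]  # M^1
    for k in range(1, n):          # a shortest path needs at most n - 1 hops
        if power[i][j]:
            return k
        # M^(k+1)[a][b] = OR over l of (M[a][l] AND M^k[l][b])
        power = [
            [int(any(m[a][l] and power[l][b] for l in range(n)))
             for b in range(n)]
            for a in range(n)
        ]
    return None
```

As a usage example, a three-chip chain with links 0-1 and 1-2 but no direct 0-2 link has `M = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]`; chips 0 and 1 are one hop apart and chips 0 and 2 are two hops apart. A breadth-first search would find the same distances more cheaply; the matrix form is used here only because it is the one the text describes.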
【０１５１】
以下の例がこれらの原理を説明する。図３５（Ａ）〜図３５（Ｄ）を参照して、図３５（Ａ）は、クラウド１０９０として示されたユーザの回路設計を示す。この回路設計１０９０は単純または複雑であり得る。回路設計１０９０の一部は、ＯＲゲート１０９１および２つのＡＮＤゲート１０９２および１０９３を含む。ＡＮＤゲート１０９２および１０９３の出力は、ＯＲゲート１０９１の入力に接続される。さらに、これらのゲート１０９１、１０９２、および１０９３は、回路設計１０９０の他の部分に接続され得る。
【０１５２】
図３５（Ｂ）を参照して、３つのゲート１０９１、１０９２、および１０９３を含む部分を含む回路１０９０のコンポーネントは、ＦＰＧＡチップ１０９４、１０９５、および１０９６に構成および配置され得る。ＦＰＧＡチップの特定の例示的なアレイは、示されるように相互接続スキームを有する。すなわち、相互接続１０９７のセットは、チップ１０９４とチップ１０９５とを接続し、相互接続１０９８の別のセットは、チップ１０９５とチップ１０９６とを接続する。直接的な相互接続がチップ１０９４とチップ１０９６との間に設けられない。この回路設計１０９０のコンポーネントをチップに配置する場合、システムは、予め設計された相互接続スキームを使用して、異なるチップにわたって回路経路を接続する。
【０１５３】
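Referring to FIG. 35(C), one possible configuration and placement is OR gate 1091 placed in chip 1094, AND gate 1092 placed in chip 1095, and AND gate 1093 placed in chip 1096. Other portions of circuit 1090 are not shown for pedagogic purposes. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because they are placed in different chips, so the set of interconnections 1097 is used. The number of hops for this interconnection is "1". The connection between OR gate 1091 and AND gate 1093 also requires interconnection, so the sets of interconnections 1097 and 1098 are used. The number of hops is "2". For this placement example, the total number of hops is "3", discounting the contributions from other gates and interconnections in the remainder of circuit 1090 that are not shown.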
【０１５４】
図３５（Ｄ）は、別の配置の例を示す。ここで、ＯＲゲート１０９１は、チップ１０９４に配置され、ＡＮＤゲート１０９２および１０９３は、チップ１０９５に配置される。再び、回路１０９０の他の部分は、教示目的のために示されない。ＯＲゲート１０９１とＡＮＤゲート１０９２との間の接続は、相互接続を要求する。なぜなら、異なるチップ内に配置され、相互接続１０９７のセットが使用される。この接続に対するホップの数は「１」である。さらに、ＯＲゲート１０９１とＡＮＤゲート１０９３との間の接続もまた、相互接続を要求し、相互接続１０９７のセットが使用される。さらにホップの数は「１」である。この配置の例に対して、ホップの全体の数は、「２」であり、他のゲートからの寄与および図示されない回路１０９０の残りにおける相互接続を差し引く。このようにして、距離Ｄパラメータのみに基づき、他の全てのファクタが等しいと仮定すると、コスト関数は図３５（Ｃ）の配置の例よりも、図３５（Ｄ）の配置の例の方がより低いコスト関数を計算する。しかし、他の全てのファクタが等しくない。恐らく、図３５（Ｄ）に対するコスト関数はまた、ゲート使用／利用可能性Ｇに基づく。図３５（Ｄ）では、図３５（Ｃ）において同一のチップ内で使用されたゲートよりも、さらに１つ多いのゲートがチップ１０９５内で使用される。さらに、図３５（Ｃ）に示された配置の例においてチップ１０９５に対するピン使用／利用可能性Ｐは、図３５（Ｄ）に示される他の配置の例において同じチップに対するピン使用／利用可能性より大きい。
【０１５５】
粗いグレイン配置の後で、平坦化されたクラスタの配置の緻密な調整が配置結果をさらに最適化する。この緻密なグレイン配置動作３５３は、粗いグレイン配置動作３５２によって最初に選択された配置を改良する。ここで、このような構成が最適化を増加させる場合、最初のクラスタは分解され得る。例えば、論理素子ＸおよびＹがクラスタＡのもとの一部であり、ＦＰＧＡチップ１に対して指定されることを仮定する。緻密なグレイン配置動作３５３によると、論理構成素子ＸおよびＹは、今や別々のクラスタＢとして指定され得るか、または別のクラスタＣの一部を形成し、ＦＰＧＡチップ２における配置に対して指定される。ユーザの回路設計を特定のＦＰＧＡに接続するＦＰＧＡネットリスト３５４が生成される。
【０１５６】
クラスタがどれくらい分割されるかおよび所定のチップにどれくらい配置されるかの決定は、また配置コストに基づき、そしてこの配置コストは、回路ＣＫＴＱに対するコスト関数ｆ（Ｐ，Ｇ，Ｄ）によって計算される。一実施形態では、緻密なグレイン配置プロセスに対して使用されるコスト関数は、粗いグレイン配置プロセスに対して使用されたコスト関数と同一である。２つの配置プロセス間の差のみが、プロセス自身の配置ではなく配置されたクラスタのサイズである。粗いグレイン配置プロセスは、緻密なグレイン配置プロセスよりもより大きいクラスタを使用する。他の実施形態では、粗いグレイン配置プロセスおよび緻密なグレイン配置プロセスに対するコスト関数は、選択重み定数Ｃ０、Ｃ１、およびＣ２に関して上述したように互いに異なる。
【０１５７】
配置が終了すると、チップ間のルーティングタスク３５５が実行される。異なるチップにおいて配置された回路を接続するルーティングワイヤの数が、回路間ルーティングに対して割り当てられたこれらのＦＰＧＡチップにおいて利用可能なピンを超える場合、時分割多重化（ＴＤＭ）回路が用いられ得る。例えば、各ＦＰＧＡチップが２つの異なるＦＰＧＡチップにおいて配置された回路を接続するために４４ピンのみを可能でかつ、特定のモデルの実現は、チップ間に４５ワイヤを必要とする場合、特定の時分割多重化回路は、さらに各チップ内に実現され得る。この特定のＴＤＭ回路は少なくとも２つのワイヤと共に接続する。ＴＤＭ回路の１つの実施形態は、図９Ａ、図９Ｂ、および図９Ｃに示され、そしてこれらは以後説明される。従って、ルーティングタスクが常に完成される。なぜなら、このピンはこれらのチップの中から時分割多重化形態に構成され得るためである。
【０１５８】
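Once the placement and routing are determined for each FPGA, each FPGA can be configured into an optimized working circuit, and accordingly the system generates a "bitstream" configuration file 356. In Altera terminology, the system generates one or more Programmer Object Files (.pof). Other generated files include SRAM Object Files (.sof), JEDEC Files (.jed), Hexadecimal (Intel-Format) Files (.hex), and Tabular Text Files (.ttf). The Altera MAX+PLUS II Programmer uses POF, SOF, and JEDEC files together with Altera hardware programming devices to program the FPGA array. Alternatively, the system generates one or more raw binary files (.rbf). The CPU receives the .rbf files and programs the FPGA array via the PCI bus.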
【０１５９】
この点において、構成されたハードウエアは、ハードウエアスタートアップ３７０のために準備中である。これは再構成可能な基板上のハードウエアモデルの自動構成を終了する。
【０１６０】
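Returning to the TDM circuit, which allows a group of pin outputs to be time-division multiplexed together so that only one pin output is actually used: the TDM circuit is essentially a multiplexer with at least two inputs (for two wires), one output, and a set of registers configured in a loop as the selector signal. If the SEmulation system requires more wires to be grouped together, more inputs and loop registers can be provided. As the selector signal to this TDM circuit, several registers configured in a loop provide the appropriate signals to the multiplexer so that at one time period one of the inputs is selected as the output, and at another time period another input is selected as the output. Thus, the TDM circuit manages to use only one output wire between chips so that, in this example, the hardware model of the circuit implemented in a particular chip can be accomplished using 44 pins instead of 45 pins. Hence, the routing task can always be completed because pins can be arranged in a time-division multiplexed fashion among chips.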
【０１６１】
図９Ａは、ピンアウト問題の概略図を示す。これはＴＤＭ回路を要求するため、図９Ｂは、送信側のためのＴＤＭ回路を提供し、図９Ｃは、受信側のためのＴＤＭ回路を提供する。これらの図は、Ｓエミュレーションシステムがチップ間の２つのワイヤの代わりに１つのワイヤを要求する１つの特定の例のみを示す。２つ以上のワイヤが、時間多重化された構成において共に接続しなければならない場合、当業者は、以下の教示を考慮して適切な改変を可能にし得る。
【０１６２】
図９Ａは、ＳエミュレーションシステムがＴＤＭ構成において２つのワイヤを接続するＴＤＭ回路の一実施形態を示す。２つのチップ９９０および９９１が設けられる。完全なユーザ回路設計の一部である回路９６０がチップ９９１内にモデル化され、配置される。完全なユーザ回路設計の一部である回路９７３がチップ９９０内にモデル化され、配置される。相互接続９９４、相互接続９９２、および相互接続９９３の群を含むいくつかの相互接続が回路９６０と回路９７３との間に設けられる。この例では、相互接続の数は全部で４５である。一実施形態において、各チップはこれらの相互接続に対してせいぜい４４ピンのみを提供する場合、本発明の一実施形態は、時間多重化される相互接続のうち少なくとも２つを提供し、これらのチップ９９０と９９１との間で１つのみの相互接続を要求する。
【０１６３】
この例では、相互接続９９４のグループは、４３ピンの使用を継続する。第４４番目および最後のピンに対して、本発明の一実施形態によるＴＤＭ回路は、時分割多重化された形態と共に相互接続９９２および９９３を接続するために使用され得る。
【０１６４】
図９Ｂは、ＴＤＭ回路の一実施形態を示す。ＦＰＧＡチップ９９１内のモデル化された回路（またはその一部）９６０は、ワイヤ９６６および９６７上の２つの信号を供給する。回路９６０に対して、これらのワイヤ９６６および９６７が出力する。通常、これらの出力は、チップ９９０（図９Ａおよび図９Ｃを参照）においてモデル化された回路９７３に接続される。しかし、これらの２つの出力ワイヤ９６６および９６７に対して１つのピンのみの利用可能性は、直接的なピン間接続を除外する。出力９６６および９６７が、単一方向に他のチップに伝達されるため、適切な送信および受信器ＴＤＭ回路はこれらのラインを共に接続するために設けられなければならない。送信側のＴＤＭ回路の一実施形態は、図９Ｂに示される。
【０１６５】
送信側のＴＤＭ回路は、ＡＮＤゲート９６１および９６２を含み、そのそれぞれの出力９７０および９７１は、ＯＲゲート９６３の入力に接続される。ＯＲゲート９６３の出力９７２は、ピンに割り当てられ、別のチップ９９０に接続されたチップの出力である。ＡＮＤゲート９６１および９６２への入力９６６および９６７の１つのセットは、それぞれ回路モデル９６０によって提供される。入力９６８および９６９の他のセットは、ループ化されたレジスタスキームによって提供され、そのスキームは、時分割多重化セレクタ信号として機能する。
【０１６６】
ループ化されたレジスタスキームはレジスタ９６４および９６５を含む。レジスタ９６４の出力９９５は、レジスタ９６５の入力およびＡＮＤゲート９６１の入力９６８に提供される。レジスタ９６５の出力９９６は、レジスタ９６４の入力およびＡＮＤゲート９６２の入力９６９に供給される。各レジスタ９６４および９６５は、共通のクロックソースによって制御される。任意の所与の瞬間において、出力９９５または９９６の１つのみが論理「１」を供給する。他は論理「０」である。従って、各クロックエッジの後、論理「１」は、出力９９５と出力９９６との間でシフトする。次に、これは、ＡＮＤゲート９６１またはＡＮＤゲート９６２のいずれかに「１」を供給し、ワイヤ９６６またはワイヤ９６７のいずれかの信号を「選択する」。従って、ワイヤ９７２上のデータは、ワイヤ９６６またはワイヤ９６７のいずれかの回路９６０から生じる。
【０１６７】
ＴＤＭ回路の受信側の一実施形態は、図９Ｃに示される。チップ９９１のワイヤ９６６およびワイヤ９６７上の回路９６０（図９Ａおよび図９Ｂ）からの信号は、図９Ｃにおける回路９７３への適切なワイヤ９８５または９８６に接続されなければならない。チップ９９１からの時分割多重信号は、ワイヤ／ピン９７８から入力する。受信機側ＴＤＭ回路は、ワイヤ／ピン９７８上のこれらの信号を回路９７３への適切なワイヤ９８５および９８６に接続し得る。
【０１６８】
ＴＤＭ回路は、入力レジスタ９７４および９７５を含む。ワイヤ／ピン９７８上の信号は、ワイヤ９７９および９８０それぞれを介してこれらの入力レジスタ９７４および９７５に供給される。入力レジスタ９７４の出力９８５は、回路９７３における適切なポートに供給される。同様に、入力レジスタ９７５の出力９８６は、回路９７３内の適切なポートに供給される。これらの入力レジスタ９７４および９７５はループされたレジスタ９７６および９７７によって制御される。
【０１６９】
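Output 984 of register 976 is coupled to the input of register 977 and to the clock input 981 of register 974. Output 983 of register 977 is coupled to the input of register 976 and to the clock input 982 of register 975. Registers 976 and 977 are both controlled by a common clock source. At any given instant, only one of the enable inputs 981 or 982 is at logic "1"; the other is at logic "0". Thus, after each clock edge, the logic "1" shifts between enable input 981 and enable input 982. This in turn "selects" the signal on either wire 979 or wire 980. Thus, data on wire 978 from circuit 960 is appropriately coupled to circuit 973 via wire 985 or wire 986.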
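The behavior of the sender and receiver TDM circuits can be checked with a small cycle-level model. The sketch below is an illustration under stated assumptions, not the circuit itself: it assumes the two looped-register pairs start in the same one-hot state and share a clock, and that each pair of source values is held stable for two clock ticks (one time slot per wire). The function and variable names are hypothetical; the reference numerals in the comments are those used in the text.

```python
def tdm_link(samples_a, samples_b):
    """Carry two 1-bit wires over one pin and return what the receiver latches."""
    tx_sel = [1, 0]  # looped registers 964/965 (one-hot selector, sender)
    rx_sel = [1, 0]  # looped registers 976/977, assumed in step with the sender
    rx_a = rx_b = 0  # input registers 974/975
    received = []
    for a, b in zip(samples_a, samples_b):
        for _ in range(2):  # two time slots per pair of source values
            # AND gates 961/962 feeding OR gate 963 drive the single pin.
            pin = (a & tx_sel[0]) | (b & tx_sel[1])
            # Only the enabled input register captures the pin value.
            if rx_sel[0]:
                rx_a = pin
            else:
                rx_b = pin
            # Clock edge: the logic "1" shifts around both loops.
            tx_sel.reverse()
            rx_sel.reverse()
        received.append((rx_a, rx_b))
    return received
```

After each pair of slots the receiver holds the same two values the sender presented, which is the property that lets 45 logical wires fit on 44 pins in the example. If the two loops were out of step, the values would arrive swapped, which is why the sketch states synchronization as an explicit assumption.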
【０１７０】
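The address pointer in accordance with one embodiment of the present invention, as discussed briefly with reference to FIG. 4, is now discussed in greater detail. To reiterate, several address pointers are located in each FPGA chip in the hardware model. Generally, the primary purpose of implementing the address pointers is to enable the system to deliver data between the software model 315 and the specific FPGA chips in the hardware model 325 via the 32-bit PCI bus 328 (see FIG. 10). More specifically, the primary purpose of the address pointers is to selectively control data delivery between each of the address spaces at the software/hardware boundary (i.e., REG, S2H, H2S, and CLK) and each FPGA chip among the banks 326a-326d of FPGA chips, in light of the bandwidth limitation of the 32-bit PCI bus. Even if a 64-bit PCI bus is implemented, these address pointers are still needed to control the data delivery. Thus, if the software model has five address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), each FPGA chip has five address pointers corresponding to these five address spaces. Each FPGA needs these five address pointers because the particular selected word in the selected address space being processed may reside in any one or more of the FPGA chips.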
【０１７１】
ＦＰＧＡＩ／Ｏコントローラ３８１は、ＳＰＡＣＥインデックスを用いることによってソフトウエア／ハードウエア境界に対応する特定のアドレス空間（すなわち、ＲＥＧ、Ｓ２Ｈ、Ｈ２Ｓ、およびＣＬＫ）を選択する。一旦、アドレス空間が選択されると、各ＦＰＧＡチップにおいて選択されたアドレス空間に対応する特定のアドレスポインタが、その選択されたアドレス空間における同じワードに対応する特定のワードを選択する。ソフトウエア／ハードウエア境界におけるアドレス空間の最大のサイズおよび各ＦＰＧＡチップにおけるアドレスポインタは、選択されたＦＰＧＡチップのメモリ／ワード容量に依存する。例えば、本発明の一実施形態では、ＦＰＧＡチップのＡｌｔｅｒａＦＬＥＸ１０Ｋファミリを使用する。従って、各アドレス空間に対する推定された最大のサイズは、ＲＥＧ、３０００ワード、ＣＬＫ、１ワード、Ｓ２Ｈ、１０ワード、およびＨ２Ｓ、１０ワードである。各ＦＰＧＡチップは、約１００ワード保持することが可能である。
【０１７２】
さらに、Ｓエミュレータシステムは、Ｓエミュレーションプロセスの任意の時間において、ユーザが起動、停止、入力値のアサート、値の検査を可能にする機能を有する。シミュレータの柔軟性を提供するために、さらにＳエミュレータは、コンポーネントの内部実現がソフトウエアまたはハードウエアに存在するかどうかにかかわらず、全コンポーネントをユーザに見えるようにしなければならない。ソフトウエアでは、組み合わせのコンポーネントがモデル化され、値がシミュレーションプロセスの間に計算される。従って、これらの値は、シミュレーションプロセスの間に任意の時間において、ユーザがアクセスすることを明確に「見ることができる」。
【０１７３】
しかし、ハードウエアモデルの組み合わせの値は直接的に「見ることができる」ことはない。レジスタがソフトウエアカーネルによって容易にかつ直接的にアクセス可能（すなわち、読み出し／書き込み）であるが、組み合わせのコンポーネントは、決定することがより困難である。ＦＰＧＡにおいて、ほとんどの組み合わせコンポーネントは、高いゲート機能を達成するために、ルックアップテーブルとしてモデル化される。結果として、ルックアップテーブルマッピングは、効率的なハードウエアモデリングを提供するが、ほとんどの組み合わせ論理信号の可観性を損失する。
【０１７４】
組み合わせコンポーネントの可観性の欠如を有するこれらの問題にかかわらず、シミュレーションシステムは、ハードウエア加速モードの後に、ユーザによる検査のために組み合わせコンポーネントを再構成または再生成し得る。ユーザの回路設計が、組み合わせコンポーネントおよびレジスタコンポーネントのみを有する場合、全ての組み合わせコンポーネントの値は、レジスタコンポーネントから導かれ得る。すなわち、組み合わせコンポーネントは、回路設計によって要求された特定の論理機能に従って、種々の構成のレジスタから構築されるか、またはこのレジスタを含む。Ｓエミュレータは、レジスタコンポーネントおよび組み合わせコンポーネントだけのハードウエアモデルを有し、そして結果としてＳエミュレータは、ハードウエアモデルから全てのレジスタ値を読み出し、次いで全ての組み合わせコンポーネントを再構成または再生成する。この再生成プロセスを実行するように要求されたオーバーヘッドのため、組み合わせコンポーネント再生が全ての時間において実行されない。むしろ、ユーザによるリクエストに応じてのみ実行される。実際には、ハードウエアモデルを用いる利益の１つは、Ｓエミュレーションプロセスを加速することである。各サイクル（またはほとんどのサイクルでさえも）における組み合わせコンポーネントを決定することは、さらにシミュレーションのスピードを低減する。いずれのイベントにおいても、レジスタ値のみの検査は、ほとんどのシミュレーション解析に対して十分であるべきである。
【０１７５】
レジスタ値から組み合わせコンポーネント値を再生成するプロセスは、Ｓエミュレーションシステムがハードウエア加速モードまたはＩＣＥモードにあったと仮定する。そうでなければ、ソフトウエアシミュレーションは、既に組み合わせコンポーネント値をユーザに提供する。Ｓエミュレーションシステムは、ハードウエア加速の開始の前に、ソフトウエアモデルにおいて常駐していた組み合わせコンポーネント値およびレジスタ値を維持する。これらの値は、システムによるさらなる上書き動作までにソフトウエアモデルにおいて保持する。ソフトウエアモデルは、ハードウエア加速動作の開始直前の時間からレジスタ値および組み合わせコンポーネント値を既に有するため、組み合わせコンポーネント再生成プロセスは、更新された入力レジスタ値に応じてソフトウエアモデルのこれらの値のいくつかまたは全てを更新することを含む。
【０１７６】
組み合わせコンポーネント再生成プロセスは以下のようである。第１に、ユーザによってリクエストされた場合、ソフトウエアカーネルは、ＦＰＧＡチップからＲＥＧバッファにハードウエアレジスタコンポーネントの全ての出力値を読み出す。このプロセスは、アドレスポインタのチェインを介してＦＰＧＡチップのレジスタ値をＲＥＧアドレス空間に転送することを含む。ハードウエアモデルにあったレジスタ値をＲＥＧバッファ（ソフトウエア／ハードウエア境界にある）に配置することは、ソフトウエアモデルをさらなる処理のためにデータにアクセスすることを可能にする。
【０１７７】
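Second, the software kernel compares register values before and after the hardware acceleration run. If the register values before the hardware acceleration run are the same as the values after it, the values of the combinational components have not changed. Instead of expending time and resources to regenerate the combinational components, these values can be read from the software model, which already has the combinational component values stored from the time immediately before the hardware acceleration run. On the other hand, if one or more register values have changed, one or more combinational components that depend on the changed register values may also change values. These combinational components must then be regenerated through the following third step.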
【０１７８】
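Third, for registers whose values differ in the pre-acceleration and post-acceleration comparison, the software kernel schedules their fan-out combinational components into the event queue. Here, the registers that changed values during the acceleration run have detected an event. More than likely, the combinational components that depend on these changed register values will produce different values. Regardless of any change in the values of these combinational components, the system ensures that they evaluate the changed register values in the next step.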
【０１７９】
第４に、次いでソフトウエアカーネルは、標準的なイベントシミュレーションアルゴリズムを実行して、ソフトウエアモデルにおいてレジスタから全ての組み合わせコンポーネントまで変化する値を伝達する。言い換えると、加速前から加速後の時間間隔の間に変化するレジスタ値がこれらのレジスタ値に依存する全ての組み合わせコンポーネントのダウンストリームに伝達される。次いで、これらの組み合わせコンポーネントは、これらの新しいレジスタ値を評価しなければならない。展開および伝達原理に従って、次に変化されたレジスタ値に直接依存する第１のレベルの組み合わせコンポーネントからダウンストリームに配置された他の第２のレベルの組み合わせコンポーネントは、もしあれば、さらに変化されたデータを評価する。レジスタ値を影響を与え得る他のコンポーネントダウンストリームに伝達するこのプロセスは、展開ネットワークの末端まで継続する。従って、ダウンストリームに配置され、そして変化されたレジスタによって影響されるこれらの組み合わせコンポーネントのみがソフトウエアモデルにおいて更新される。全ての組み合わせコンポーネントが影響を受けるわけではない。従って、加速前から加速後の時間間隔の間に変化された１つのみのレジスタ値および１つのみの組み合わせコンポーネントがこのレジスタ値の変化によって影響される場合、次に、この組み合わせコンポーネントのみがこの変化されたレジスタ値を考慮してその値を再評価する。このモデル化された回路の他の部分は影響されない。この小さな変化に対して、組み合わせコンポーネント再生成プロセスが比較的高速で発生する。
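The second through fourth steps of the regeneration process (compare register values, schedule the fan-out of changed registers, then propagate events) can be sketched as a small event-driven update. Everything named here is hypothetical: `values` stands for the software model's stored values, `new_regs` for the register values read back from the hardware model, `fanout` for the fan-out lists, and `functions` for each combinational component's inputs and logic function.

```python
from collections import deque


def regenerate(values, new_regs, fanout, functions):
    """Update `values` in place; return the set of re-evaluated components."""
    queue = deque()
    for reg, new in new_regs.items():
        if values[reg] != new:                 # compare pre- and post-acceleration
            values[reg] = new
            queue.extend(fanout.get(reg, []))  # schedule fan-out components
    evaluated = set()
    while queue:                               # standard event propagation
        node = queue.popleft()
        inputs, fn = functions[node]
        result = fn(*(values[name] for name in inputs))
        evaluated.add(node)
        if result != values[node]:             # only real changes travel downstream
            values[node] = result
            queue.extend(fanout.get(node, []))
    return evaluated
```

For instance, with `g1 = r1 AND r2`, `g2 = NOT r3`, and `g3 = g1 AND g2`, changing only `r1` from 0 to 1 (with `r2 = 1`, `r3 = 0`) re-evaluates `g1` and `g3` and leaves `g2` untouched, which matches the point in the text that only components downstream of changed registers are updated.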
【０１８０】
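Finally, when event propagation is complete, the system is ready for any mode of operation. Usually, the user wishes to inspect values after a long run. After the combinational component regeneration process, the user continues with pure software simulation for debug/test purposes. At other times, however, the user wishes to continue with hardware acceleration to the next desired point. In still other cases, the user wishes to proceed further to ICE mode.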
【０１８１】
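In sum, combinational component regeneration involves using the register values to update the component values in the software model. If any register values have changed, the changed register values are propagated through the fan-out network of the registers as the values are updated. If no register values have changed, the values in the software model also do not change, and the system therefore does not need to regenerate the combinational components. Usually, a hardware acceleration run occurs over some length of time. As a result, many register values may change, affecting many combinational component values located downstream in the fan-out networks of the registers with changed values. In this case, the combinational component regeneration process may be relatively slow. In other cases, only a few register values may change after a hardware acceleration run. The fan-out network for the registers with changed values may be small, and thus the combinational component regeneration process may be relatively fast.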
【０１８２】
IV. Emulation with Target System Mode
図１０は、本発明の一実施形態によるＳエミュレーションシステムアーキテクチャを示す。さらに、図１０は、システムがインサーキットエミュレーションモードで動作する場合、ソフトウエアモデル、ハードウエアモデル、エミュレーションインターフェイス、およびターゲットシステム間の関係を示す。上述したように、Ｓエミュレーションシステムは、汎用マイクロプロセッサ、およびＰＣＩバス等の高速バスによって相互接続された再構成可能なハードウエア基板を含む。Ｓエミュレーションシステムは、ユーザの回路設計をコンパイルし、ハードウエアモデルへの再構成可能なボードマッピングプロセスのためのエミュレーションハードウエア構成データを生成する。次いで、ユーザは、汎用プロセッサを介して回路をシミュレートし、シミュレーションプロセスをハードウエア加速し、エミュレーションインターフェイスを介してターゲットシステムを用いて回路設計をエミュレートし、そしてその後で、ポストシミュレーション解析を実行する。
【０１８３】
ソフトウエアモデル３１５およびハードウエアモデル３２５は、コンパイルプロセスの間に決定される。さらにエミュレーションインターフェイス３８２およびターゲットシステム３８７は、インサーキットエミュレーションモードのためのシステムにおいて提供される。ユーザの判断において、エミュレーションインターフェイスおよびターゲットシステムは、初めにシステムに接続される必要がない。
【０１８４】
ソフトウエアモデル３１５は、全システムを制御するカーネル３１６、およびソフトウエア／ハードウエア境界（ＲＥＧ、Ｓ２Ｈ、Ｈ２Ｓ、およびＣＬＫ）に対する４つのアドレス空間を含む。Ｓエミュレーションシステムは、異なるコンポーネントタイプおよび制御機能に従って、ハードウエアモデルをメインメモリにおける４つのアドレス空間にマッピングする。ＲＥＧスペース３１７は、レジスタコンポーネントに対して指定される。ＣＬＫ空間３２０は、ソフトウエアクロックに対して指定される。Ｓ２Ｈ空間３１８は、ソフトウエアテストベンチコンポーネントにハードウエアモデルへの出力に対して指定される。Ｈ２Ｓ空間３１９は、ハードウエアモデルのソフトウエアテストベンチコンポーネントへの出力に対して指定される。これらの特定用途のＩ／Ｏバッファ空間は、システム初期化時間の間にカーネルのメインメモリ空間にマッピングされる。
【０１８５】
ハードウエアモデルは、ＦＰＧＡチップのいくつかのバンク３２６ａ〜３２６ｄおよびＦＰＧＡＩ／Ｏコントローラ３２７を含む。各バンク（例えば、３２６ｂ）は、少なくとも１つのＦＰＧＡチップを含む。一実施形態では、各バンクは４つのＦＰＧＡチップを含む。ＦＰＧＡチップの４×４アレイでは、バンク３２６ｂおよび３２６ｄは、低いバンクであり得、バンク３２６ａおよび３２６ｃは、高いバンクであり得る。特定のチップおよびその相互接続に対する特定のハードウエアモデルのユーザ回路設計素子のマッピング、配置、およびルーティングは、図６を参照して説明される。ソフトウエアモデル３１５とハードウエアモデル３２５との間の相互接続３２８は、ＰＣＩバスシステムである。さらにハードウエアモデルは、ＦＰＧＡＩ／Ｏコントローラ３２７を含み、ＦＰＧＡＩ／Ｏコントローラ３２７は、ＰＣＩバスとＦＰＧＡチップのバンク３２６ａ〜３２６ｄとの間のデータトラフィックを制御しつつ、ＰＣＩバスのスループットを維持するためのＰＣＩインターフェイス３８０および制御ユニット３８１を含む。各ＦＰＧＡチップは、いくつかのアドレスポインタをさらに含み、各アドレスポインタは、ソフトウエア／ハードウエア境界の各アドレス空間（すなわち、ＲＥＧ、Ｓ２Ｈ、Ｈ２Ｓ、およびＣＬＫ）に対応し、これらのアドレス空間のそれぞれとＦＰＧＡチップのバンク３２６ａ〜３２６ｄにおける各ＦＰＧＡチップとの間のデータを接続する。
【０１８６】
ソフトウエアモデル３１５とハードウエアモデル３２５との間の通信は、ハードウエアモデルのＤＭＡエンジンまたはアドレスポインタを介して発生する。あるいは、さらに通信は、ハードウエアモデルのＤＭＡエンジンおよびアドレスポインタの両方を介して発生する。カーネルは、直接マッピングされたＩ／Ｏ制御レジスタを介して評価リクエストと共にＤＭＡ転送を開始する。ＲＥＧ空間３１７、ＣＬＫ空間３２０、Ｓ２Ｈ空間３１８、およびＨ２Ｓ空間３１９は、ソフトウエアモデル３１５とハードウエアモデル３２５との間のデータ送達のために、Ｉ／Ｏデータパス経路３２１、３２２、３２３、および３２４それぞれを使用する。
【０１８７】
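Double buffering is required for all primary inputs to the S2H and CLK spaces, because these spaces take several clock cycles to complete the update process. Double buffering avoids disturbing the internal hardware model state, which could otherwise cause race conditions.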
【０１８８】
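The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds substantially all register components and combinational components of the user's circuit design. In addition, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test-bench components, and evaluates clock components. When the kernel detects any clock edge, the registers and memories are updated and the values are propagated through the combinational components. Thus, when hardware acceleration mode is selected, any change of value in these spaces triggers the hardware model to change its logic state.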
【０１８９】
インサーキットエミュレーションモードに対してエミュレーションインターフェイス３８２は、ＰＣＩバス３２８に接続され、エミュレーションインターフェイスは、ハードウエアモデル３２５およびソフトウエアモデル３１５と通信し得る。ハードウエア加速シミュレーションモードおよびインサーキットエミュレーションモードの間、カーネル３１６は、ソフトウエアモデルおよびハードウエアモデルを制御する。さらに、エミュレーションインターフェイス３８２は、ケーブル３９０を介してターゲットシステム３８７に接続される。さらに、エミュレーションインターフェイス３８２は、インターフェイスポート３８５、エミュレーションＩ／Ｏ制御３８６、ターゲット−ハードウエアＩ／Ｏバッファ（Ｔ２Ｈ）３８４、およびハードウエア−ターゲットＩ／Ｏバッファ（Ｈ２Ｔ）３８３を含む。
【０１９０】
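The target system 387 includes a connector 389, a signal-in/signal-out interface socket 388, and other modules or chips that are part of the target system 387. For example, the target system 387 may be an EGA video controller, and the user's circuit design may be one particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is completely modeled in the software model 315 and partially modeled in the hardware model 325.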
【０１９１】
さらに、ソフトウエアモデル３１５のカーネル３１６は、インサーキットエミュレーションモードを制御する。エミュレーションクロックの制御は、ソフトウエアクロック、ゲートクロック論理、およびゲートデータ論理を介してそのソフトウエアの中に依然として存在し、セットアップおよび保持時間の問題がインサーキットエミュレーションモードの間には生じない。従って、ユーザは、開始し、停止し、単一処理（ｓｉｎｇｌｅ−ｓｔｅｐ）し、値をアサートし、およびインサーキットエミュレーションプロセスにおける任意の時間において値を検査し得る。
【０１９２】
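To accomplish this, all clock nodes between the target system and the hardware model are identified. The clock generators in the target system are disabled, the clock ports from the target system are disconnected, or the clock signals from the target system are otherwise prevented from reaching the hardware model. Instead, the clock signal originates from a test-bench process or another form of software-generated clock, so that the software kernel can detect active clock edges and trigger data evaluation. Hence, in ICE mode, the SEmulation system uses the software clock, instead of the target system's clock, to control the hardware model.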
【０１９３】
ターゲットシステムの環境内でユーザの回路設計の動作をシミュレートするために、ターゲットシステム４０とモデル化された回路設計との間の一次入力（入信号）および出力（出信号）信号は、評価のためにハードウエアモデル３２５に供給される。これは、２つのバッファ（ターゲット／ハードウエアバッファ（Ｔ２Ｈ）３８４およびハードウエア／ターゲットバッファ（Ｈ２Ｔ）３８３）を介して達成される。ターゲットシステム３８７は、Ｔ２Ｈバッファ３８４を使用して、入力信号をハードウエアモデル３２５に適用する。ハードウエアモデル３２５は、Ｈ２Ｔバッファ３８３を使用して、出力信号をターゲットシステム３８７に送達する。データを評価するためにソフトウエアモデル３１５のテストベンチプロセスに代わりに、このインサーキットエミュレーションモードでは、Ｓ２ＨおよびＨ２Ｓバッファの代わりにＴ２ＨおよびＨ２Ｔバッファを介してＩ／Ｏ信号を送受信する。なぜなら、システムは現在、ターゲットシステム３８７を使用しているためである。ターゲットシステムはソフトウエアシミュレーションの速度よりも実質的に大きい速度で実行するため、インサーキットエミュレーションモードはまた、より大きい速度で実行する。これらの入力および出力信号の伝達は、ＰＣＩバス３２８上で発生する。
【０１９４】
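In addition, a bus 61 is provided between the emulation interface 382 and the hardware model 325. This bus is analogous to the bus 61 of FIG. 1. The bus 61 allows the emulation interface 382 and the hardware model 325 to communicate via the T2H buffer 384 and the H2T buffer 383.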
【０１９５】
典型的には、ターゲットシステム３８７は、ＰＣＩバスに接続されない。しかし、エミュレーションインターフェイス３８２がターゲットシステム３８７の設計に組み込まれる場合、このような接続は実現可能であり得る。この設定では、ケーブル３９０は存在しない。ターゲットシステム３８７とハードウエアモデル３２５との間の信号は、エミュレーションインターフェイスを通過する。
【０１９６】
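V. Post-Emulation Analysis Mode
The simulation system of the present invention can support value change dump (VCD), a simulator function widely used for post-simulation analysis. In essence, the VCD provides a history record of all inputs to the hardware model and of selected register outputs. Later, during post-simulation analysis, the user can review the various inputs and the resulting outputs of the simulation process. To support VCD, the system logs all inputs to the hardware model. For outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 record/cycle, output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis; the lower the logging frequency, the less information is stored. Because the selected logging frequency has a causal relationship to the SEmulation speed, the user should select the logging frequency with care. A higher logging frequency reduces the SEmulation speed, because the system must spend time and resources performing I/O operations to record the output data in memory before further simulation can be performed.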
【０１９７】
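For post-simulation analysis, the user selects a particular point at which simulation is desired. If the logging frequency is 1/500 record/cycle, register values are recorded every 500 cycles, at points 0, 500, 1000, 1500, and so on. If, for example, the user wants results at point 610, the user selects point 500, where register values are recorded, and simulates forward in time until the simulation reaches point 610. During this analysis stage, the analysis speed is the same as the simulation speed, because the user first accesses the data for point 500 and then simulates forward to point 610. Note that at a higher logging frequency, more data is stored for post-simulation analysis. Thus, for a logging frequency of 1/300 record/cycle, data is recorded every 300 cycles, at points 0, 300, 600, 900, and so on. To obtain results at point 610, the user first selects point 600, where register values are recorded, and simulates forward to point 610 (10 cycles instead of 110). During post-simulation analysis the system can therefore reach the desired point 610 faster when the logging frequency is 1/300 rather than 1/500. This is not always the case, however. The particular analysis point, together with the logging frequency, determines how quickly the analysis point is reached. For example, the system can reach point 523 faster if the VCD logging frequency is 1/500 rather than 1/300 (23 cycles from point 500 instead of 223 cycles from point 300).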
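The arithmetic in the preceding paragraph can be checked with a short sketch. The function name `replay_plan` is hypothetical; it simply returns the most recent logged point at or before the desired point and the number of cycles that must be re-simulated from there, assuming that logging starts at point 0.

```python
def replay_plan(target, period):
    """Return (checkpoint, cycles_to_replay) for a target cycle.

    `period` is the number of cycles between logged register snapshots,
    i.e. the reciprocal of the logging frequency (1/500 record/cycle -> 500).
    """
    checkpoint = (target // period) * period   # last logged point at or before target
    return checkpoint, target - checkpoint

plan_610_at_500 = replay_plan(610, 500)
plan_610_at_300 = replay_plan(610, 300)
plan_523_at_500 = replay_plan(523, 500)
plan_523_at_300 = replay_plan(523, 300)
```

For point 610 the 1/300 log leaves fewer cycles to replay, whereas for point 523 the 1/500 log does, which is why a higher logging frequency does not always reach the analysis point sooner.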
【０１９８】
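The user can then perform post-SEmulation analysis by running a software simulation with the input logs to the hardware model, thereby computing the value change dump of all hardware components. The user can also select any register log point in time and start the value change dump from that log point forward in time. This value change dump method can link to any simulation waveform for post-simulation analysis.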
【０１９９】
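VI. Hardware Implementation Schemes
(A. Overview)
The SEmulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the SEmulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, a 4x4 array of 16 chips may model a large circuit spread out across these 16 chips. The interconnect scheme allows each chip to access any other chip within two "jumps" or links.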
【０２００】
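Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). The address pointers associated with a particular address space are all chained together. During data transfer, word data in each chip is therefore selected sequentially from, or to, the main FPGA bus and the PCI bus, one word at a time and one chip at a time for the selected address space, until the desired word data has been accessed for that address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in a chip, is then propagated to the address pointer in the next chip, and continues on to the last chip or until the system initializes the address pointers.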
【０２０１】
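The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the wider bus. The throughput of this FPGA bus system can track the throughput of the PCI bus system, so that performance is not lost by reducing the bus speed. Expansion is possible through larger boards containing more FPGA chips, or through piggyback boards that extend the bank length.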
【０２０２】
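(B. Address Pointer)
FIG. 11 shows one embodiment of the address pointer of the present invention. All I/O operations go through DMA streaming. Because the system has only one bus, the system accesses data sequentially, one word at a time. Thus, one embodiment of the address pointer uses a shift register chain to sequentially access the selected words in these address spaces. The address pointer 400 includes flip-flops 401-405, an AND gate 406, and a pair of control signals, INITIALIZE 407 and MOVE 408.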
【０２０３】
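Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) for selecting a word out of n possible words in each FPGA chip corresponding to the same word in the selected address space. Depending on the particular user circuit design being modeled, the number of words n may vary from circuit design to circuit design and, for a given circuit design, n may vary from FPGA chip to FPGA chip. In FIG. 11, the address pointer 400 is only a 5-word (i.e., n=5) address pointer. Thus, this particular FPGA chip, which contains this 5-word address pointer for a particular address space, has only five words to select. Needless to say, the address pointer 400 can implement any number n of words. The output signals Wn are also called word selection signals. When the word selection signal reaches the output of the last flip-flop in this address pointer, it is called the OUT signal, which is to be propagated to the input of the address pointer of the next FPGA chip.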
【０２０４】
初期化信号がアサートされる場合、アドレスポインタが初期化される。第１のフリップフロップ４０１が「１」に設定され、他の全てのフリップフロップ４０２〜４０５が「０」に設定される。この点において、アドレスポインタの初期化は、任意のワード選択を可能にしない。すなわち、初期化の後、全てのＷｎ出力が「０」のままである。またアドレスポインタ初期化手順が図１２を参照して説明される。
【０２０５】
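The MOVE signal controls the advance of the pointer for word selection. The MOVE signal is derived from the read, write, and SPACE index control signals from the FPGA I/O controller. Because every operation is essentially a read or a write, the SPACE index signal essentially determines which address pointer the MOVE signal is applied to. Thus, the system activates only one address pointer, associated with the selected I/O address space, at a time, and during that time the system applies the MOVE signal only to that address pointer. The generation of the MOVE signal is discussed further with respect to FIG. 13. Referring to FIG. 11, when the MOVE signal is asserted, it is provided to an input of the AND gate 406 and to the enable inputs of the flip-flops 401-405. The logic "1" thus moves from word output Wi to Wi+1 on every system clock cycle; that is, the pointer advances from Wi to Wi+1 and selects a particular word in each cycle. When the shifting word selection signal makes its way to the output 413 of the last flip-flop 405 (labeled "OUT" herein), this OUT signal makes its way to the next FPGA chip via the multiplexed cross-chip address pointer chain described with reference to FIGS. 14 and 15, unless the address pointer is initialized again.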
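The shifting behavior of the address pointer can be illustrated with a behavioral sketch. This is not a gate-level model of FIG. 11: the class name, the `pos` state variable, and the exact cycle on which W0 first appears are assumptions. The sketch only reproduces the behavior stated in the text, namely that no word is selected after initialization, that the selection advances by one word per clock cycle while MOVE is asserted, and that OUT coincides with the last word output.

```python
class AddressPointer:
    """Behavioral model of an n-word one-hot address pointer (assumed semantics)."""

    def __init__(self, n):
        self.n = n
        self.pos = None          # None: idle or finished; -1: initialized; 0..n-1: word index

    def initialize(self):
        self.pos = -1            # token loaded, but no word selected yet

    def clock(self, move):
        """Advance one system clock cycle; the token moves only while MOVE is asserted."""
        if move and self.pos is not None:
            self.pos = self.pos + 1 if self.pos + 1 < self.n else None
        return self.word_select()

    def word_select(self):
        """Return the word selection outputs [W0, ..., Wn-1]."""
        return [int(self.pos == i) for i in range(self.n)]

    @property
    def out(self):
        """OUT is active while the last word Wn-1 is selected."""
        return self.pos == self.n - 1

pointer = AddressPointer(5)
pointer.initialize()
after_init = pointer.word_select()
first = pointer.clock(move=1)
held = pointer.clock(move=0)     # MOVE deasserted: selection does not advance
for _ in range(4):
    last = pointer.clock(move=1)
```

With n = 5, as in FIG. 11, five asserted MOVE cycles walk the selection from W0 to W4, at which point `out` becomes true and the selection would be handed to the next chip.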
【０２０６】
アドレスポインタ初期化手順が説明される。図１２は、図１１のアドレスポインタに対するアドレスポインタ初期化の状態遷移図である。最初に状態４６０は、アイドル状態である。ＤＡＴＡ＿ＸＳＦＲが「１」に設定される場合、システムは状態４６１に進む。ここでアドレスポインタは初期化される。ここで初期化信号はアサートされる。各アドレスポインタにおける第１のフリップフロップが「１」に設定され、アドレスポインタにおける全ての他のフリップフロップが「０」に設定される。この点において、アドレスポインタの初期化は、いずれのワード選択もイネーブルしない。すなわち、Ｗｎ出力の全てが「０」のままである。ＤＡＴＡ＿ＸＳＦＲは「１」のままである間、次の状態は待機状態４６２である。ＤＡＴＡ＿ＸＳＦＲが「０」である場合、アドレスポインタ初期化手順が終了し、システムはアイドル状態４６０に戻る。
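The state transition diagram of FIG. 12 can be sketched as a small transition function. The state numbers 460, 461, and 462 follow the text; the behavior when DATA_XSFR falls to "0" while still in the initialization state 461 is not stated and is assumed here to return to idle.

```python
IDLE, INIT, WAIT = 460, 461, 462

def next_state(state, data_xsfr):
    """One transition of the address pointer initialization state machine."""
    if state == IDLE:
        return INIT if data_xsfr else IDLE
    # From INIT or WAIT: wait while DATA_XSFR stays "1", return to idle when it is "0".
    # (Leaving INIT directly for IDLE is an assumption, see the note above.)
    return WAIT if data_xsfr else IDLE

def trace(inputs, state=IDLE):
    """Return the sequence of states visited for a sequence of DATA_XSFR values."""
    visited = []
    for bit in inputs:
        state = next_state(state, bit)
        visited.append(state)
    return visited

states = trace([0, 1, 1, 1, 0])
```

The INITIALIZE signal would be asserted only in state 461, so each transfer initializes the pointers exactly once regardless of how long DATA_XSFR stays high.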
【０２０７】
アドレスポインタに対して種々の移動信号を生成するための移動信号発生器がここで説明される。ＦＰＧＡＩ／Ｏコントローラ（図１０におけるアイテム３２７、図２２）によって生成された空間インデックスは、特定のアドレス空間（すなわち、ＲＥＧ読み出し、ＲＥＧ書き込み、Ｓ２Ｈ読み出し、Ｈ２Ｓ書き込み、およびＣＬＫ書き込み）を選択する。このアドレス空間内において、本発明のシステムはアクセスされるべき特定のワードを連続的に選択する。この連続的なワード選択は、移動信号によって各アドレスポインタにおいて達成される。
【０２０８】
移動信号発生器の一実施形態が図１３に示される。各ＦＰＧＡチップ４５０は、種々のソフトウエア／ハードウエア境界アドレス空間（すなわちＲＥＧ、Ｓ２Ｈ、Ｈ２Ｓ、およびＣＬＫ）に対応するアドレスポインタを有する。ＦＰＧＡチップ４５０においてモデル化され実現されたアドレスポインタおよびユーザの回路設計に加えて、移動信号発生器４７０は、ＦＰＧＡチップ４５０に設けられる。移動信号発生器４７０は、アドレス空間デコーダ４５１およびいくつかのＡＮＤゲート４５２〜４５６を含む。入力信号は、ワイヤ線４５７上のＦＰＧＡ読み出し信号（Ｆ＿ＲＤ）、ワイヤ線４５８上のＦＰＧＡ書き込み信号（Ｆ＿ＷＲ）、およびアドレス空間信号４５９である。どのアドレス空間のアドレスポインタが利用可能であるかに依存して、各アドレスポインタに対する出力移動信号は、ワイヤ線４６４上のＲＥＧＲ移動、ワイヤ線４６５上のＲＥＧＷ移動、ワイヤ線４６６上のＳ２Ｈ移動、ワイヤ線４６７上のＨ２Ｓ移動、およびワイヤ線４６８上のＣＬＫ移動に対応する。これらの出力信号は、ワイヤ線４０８上の移動信号に対応する（図１１）。
【０２０９】
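The address space decoder 451 receives a 3-bit input signal 459. It may also receive a 2-bit input signal. A 2-bit signal provides for four possible address spaces, whereas a 3-bit input provides for eight. In one embodiment, CLK is assigned "00", S2H is assigned "01", H2S is assigned "10", and REG is assigned "11". Depending on the input signal 459, the address space decoder outputs a "1" on one of the wire lines 460-463, corresponding to REG, H2S, S2H, and CLK, respectively, while the remaining wire lines are set to "0". Thus, if any of these wire lines 460-463 is "0", the corresponding outputs of the AND gates 452-456 are "0". Similarly, if any of these wire lines 460-463 is "1", the corresponding output of the AND gates 452-456 is "1" when the matching read (F_RD) or write (F_WR) signal is also asserted. For example, if the address space signal 459 is "10", the address space H2S is selected. Wire line 461 is "1" while the remaining wire lines 460, 462, and 463 are "0". Accordingly, with F_WR asserted, the H2S MOVE signal (wire line 467 in the assignment given above) is "1" while the remaining MOVE outputs are "0". Similarly, if wire line 460 is "1", the REG space is selected, and either the REGR-move signal on wire line 464 or the REGW-move signal on wire line 465 becomes "1", depending on whether a read (F_RD) or a write (F_WR) operation is selected.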
【０２１０】
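As described above, the SPACE index is generated by the FPGA I/O controller. In code, the MOVE controls are:
REG space read pointer: REGR-move = (SPACE-index == #REG) & READ;
REG space write pointer: REGW-move = (SPACE-index == #REG) & WRITE;
S2H space read pointer: S2H-move = (SPACE-index == #S2H) & READ;
H2S space write pointer: H2S-move = (SPACE-index == #H2S) & WRITE;
CLK space write pointer: CLK-move = (SPACE-index == #CLK) & WRITE;
This is the code equivalent of the logic diagram of the MOVE signal generator in FIG. 13.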
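The MOVE controls listed above can be expressed as a runnable sketch. The 2-bit encoding (CLK = 00, S2H = 01, H2S = 10, REG = 11) is the one given for one embodiment; the function name `move_signals` and the dictionary return type are assumptions made for illustration.

```python
# 2-bit SPACE index encoding from the embodiment described in the text.
CLK, S2H, H2S, REG = 0b00, 0b01, 0b10, 0b11

def move_signals(space_index, f_rd, f_wr):
    """Decode the SPACE index and gate it with the FPGA read/write strobes."""
    rd, wr = bool(f_rd), bool(f_wr)
    return {
        "REGR-move": int(space_index == REG and rd),
        "REGW-move": int(space_index == REG and wr),
        "S2H-move": int(space_index == S2H and rd),
        "H2S-move": int(space_index == H2S and wr),
        "CLK-move": int(space_index == CLK and wr),
    }

h2s_write = move_signals(H2S, f_rd=0, f_wr=1)
reg_read = move_signals(REG, f_rd=1, f_wr=0)
```

At most one MOVE signal is active for a given SPACE index and strobe, which matches the statement that only one address pointer is driven at a time.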
【０２１１】
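As described above, each FPGA chip has as many address pointers as there are address spaces at the software/hardware boundary. If the software/hardware boundary has four address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has four address pointers corresponding to these four address spaces. Each FPGA needs these four address pointers because the particular selected word being processed in the selected address space may reside in any one or more of the FPGA chips, or because the data in the selected address space affects various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed with the appropriate circuit elements in the appropriate FPGA chips, each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, and CLK) is "chained" together across several FPGA chips. The shifting or propagating word selection mechanism driven by the MOVE signal, described above with reference to FIG. 11, is still used, except that in this "chain" embodiment the address pointer associated with a particular address space in one FPGA chip is "chained" to the address pointer associated with the same address space in the next FPGA chip.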
【０２１２】
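Implementing four input pins and four output pins to chain the address pointers would accomplish the same purpose. However, this implementation is too costly in terms of efficient use of resources: four wires would be needed between two chips, and four input pins and four output pins would be needed in each chip. One embodiment of the system of the present invention uses a multiplexed cross-chip address pointer chain, which allows the hardware model to use only one wire between chips and only one input pin and one output pin (two I/O pins) in each chip. One embodiment of the multiplexed cross-chip address pointer chain is shown in FIG. 14.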
【０２１３】
図１４に示される実施形態では、ユーザの回路設計は、再構成可能なハードウエア基板４７０において３つのＦＰＧＡチップ４１５〜４１７にマッピングされ分割されている。このアドレスポインタは、ブロック４２１〜４３２のように示される。ワードＷｎの数（フリップフロップの数）は、どれくらいの数のワードがユーザのカスタム回路設計に対して各チップにおいて実現されるかに応じて変動し得ることを除いて、各アドレスポインタ（例えばアドレスポインタ４２７）は、図１１に示されるアドレポインタと同様な構造および機能を有する。
【０２１４】
ＲＥＧＲアドレス空間に対して、ＦＰＧＡチップ４１５はアドレスポインタ４２１を有し、ＦＰＧＡチップ４１６はアドレスポインタ４２５を有し、そしてＦＰＧＡチップ４１７はアドレスポインタ４２９を有する。ＲＥＧＷアドレス空間に対して、ＦＰＧＡチップ４１５はアドレスポインタ４２２を有し、ＦＰＧＡチップ４１６はアドレスポインタ４２６を有し、そしてＦＰＧＡチップ４１７はアドレスポインタ４３０を有する。Ｓ２Ｈアドレス空間に対して、ＦＰＧＡチップ４１５はアドレスポインタ４２３を有し、ＦＰＧＡチップ４１６はアドレスポインタ４２７を有し、そしてＦＰＧＡチップ４１７はアドレスポインタ４３１を有する。Ｈ２Ｓアドレス空間に対して、ＦＰＧＡチップ４１５はアドレスポインタ４２４を有し、ＦＰＧＡチップ４１６はアドレスポインタ４２８を有し、そしてＦＰＧＡチップ４１７はアドレスポインタ４３２を有する。
【０２１５】
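Each chip 415-417 has a multiplexer 418-420, respectively. Note that these multiplexers 418-420 are models; as is known in the art, the actual implementation may be a combination of registers and logic elements. For example, the multiplexer may be several AND gates feeding into an OR gate, as shown in FIG. 15. The multiplexer 487 includes four AND gates 481-484 and one OR gate 485. The inputs to the multiplexer 487 are the OUT and MOVE signals from each address pointer in the chip. The output 486 of the multiplexer 487 is the chain-out signal, which is passed to the input of the next FPGA chip.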
【０２１６】
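In FIG. 15, this particular FPGA chip has four address pointers 475-478, corresponding to the I/O address spaces. The outputs of the address pointers (the OUT and MOVE signals) are the inputs to the multiplexer 487. For example, the address pointer 475 has an OUT signal on wire line 479 and a MOVE signal on wire line 480. These signals are the inputs to the AND gate 481. The output of the AND gate 481 is an input to the OR gate 485, and the output of the OR gate 485 is the output of the multiplexer 487. In operation, the OUT signal at the output of each address pointer 475-478, in combination with the corresponding MOVE signal and SPACE index, serves as a selector signal for the multiplexer 487; that is, both the OUT signal and the MOVE signal (which is derived from the SPACE index signal) must be asserted active (e.g., logic "1") to propagate the word selection signal out of the multiplexer onto the chain-out wire line. The MOVE signal is asserted periodically to move the word selection signal through the flip-flops in the address pointer, so that it can be characterized as the input MUX data signal.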
【０２１７】
図１４に戻って、これらのマルチプレクサ４１８〜４２０は、４つのセットの入力と１つの出力を有する。入力の各セットは、（１）特定のアドレス空間に関連付けられたアドレスポインタに対する最後の出力Ｗｎ−１ワイヤライン（例えば図１１に示されたアドレスポインタにおけるワイヤライン４１３）で見出されるアウト信号、および（２）移動信号を含む。マルチプレクサ４１８〜４２０のそれぞれの出力はチェインアウト信号である。各アドレスポインタにおけるフリップフロップを介したワード選択信号Ｗｎは、この信号がアドレスポインタにおける最後のフリップフロップの出力に到達する場合にアウト信号になる。ワイヤライン４３３〜４３５上のチェインアウト信号は、同じアドレスポインタに関連付けられたアウト信号および移動信号が両方ともアクティブに（例えば「１」にアサートされる）アサートされた場合のみ「１」になる。
【０２１８】
マルチプレクサ４１８に対して、入力は、アドレスポインタ４２１〜４２４からのアウト信号および移動信号にそれぞれ対応する移動信号４３６〜４３９およびアウト信号４４０〜４４３である。マルチプレクサ４１９に対して、入力は、アドレスポインタ４２５〜４２８からのアウト信号および移動信号にそれぞれ対応する移動信号４４４〜４４７およびアウト信号４５２〜４５５である。マルチプレクサ４２０に対して、入力は、アドレスポインタ４２９〜４３２からのアウト信号および移動信号にそれぞれ対応する移動信号４４８〜４５１およびアウト信号４５６〜４５９である。
【０２１９】
動作時に、ワードＷｎの任意の所与のシフトに対して、ソフトウエア／ハードウエア境界において選択されたＩ／Ｏアドレス空間に関連付けられたこれらのアドレスポインタまたはチェーンアドレスポインタのみがアクティブになる。従って、図１４では、アドレス空間ＲＥＧＲ、ＲＥＧＷ、Ｓ２Ｈ、またはＨ２Ｓの内の１つに関連付けられた、チップ４１５、４１６、および４１７におけるアドレスポインタのみが所与のシフトに対してアクティブである。あるいは、フリップフロップを通過するワード選択信号Ｗｎの所与のシフトに対して、選択されたワードはバス帯域幅に関する制限のために連続的にアクセスされる。一実施形態では、バスは３２ビットの幅であり、ワードは３２ビットであり、そのため、１つのワードのみが一度にアクセスされ、適切なリソースに送達され得る。
【０２２０】
アドレスポインタがフリップフロップを介してワード選択信号を伝達またはシフトしている途中である場合、出力チェインアウト信号がアクティブにされず（例えば「１」ではない）、従って、このチップのマルチプレクサはワード選択信号を次のＦＰＧＡチップに伝達する準備がまだできていない。アウト信号がアクティブ（例えば「１」）にアサートされた場合、システムが次のＦＰＧＡチップにワード選択信号を伝達またはシフトする準備ができていることを示すチェインアウト信号は、アクティブ（例えば「１」）にアサートされる。従って、アクセスが一度に１つのチップで発生する。すなわち、ワード選択シフト動作が別のチップに対して実行される前にワード選択信号は、１つのチップのフリップフロップを介してシフトされる。実際には、チェインアウト信号は、ワード選択信号が各チップにおけるアドレスポインタの末端に到達する場合のみアサートされる。コードでは、チェインアウト信号は、
Chain-out = (REGR-move & REGR-out) | (REGW-move & REGW-out) | (S2H-move & S2H-out) | (H2S-move & H2S-out);
要するに、システム内のＩ／Ｏアドレス空間（すなわち、ＲＥＧ、Ｈ２Ｓ、Ｓ２Ｈ、ＣＬＫ）の数Ｘに対して、各ＦＰＧＡはＸのアドレスポインタ（各アドレス空間に対する１つのアドレスポインタ）を有する。各アドレスポインタのサイズは、各ＦＰＧＡチップにおけるユーザのカスタム回路設計をモデル化するために必要とするワードの数に依存する。特定のＦＰＧＡチップに対してｎワード（すなわち、アドレスポインタに対してｎワード）を仮定すると、この特定のアドレスポインタは、ｎの出力（すなわち、Ｗ０，Ｗ１，Ｗ２，．．．，Ｗｎ−１）を有する。これらの出力Ｗｉはさらにワード選択信号と呼ばれる。特定のワードＷｉが選択される場合、Ｗｉ信号がアクティブ（すなわち「１」）にアサートされる。このワード選択信号がこのチップのアドレスポインタの末端に到達するまで、このワード選択信号は、このチップのアドレスポインタにシフトまたは伝達する。この点において、この信号は、次のチップのアドレスポインタを介してワード選択信号Ｗｉの伝達を開始するチェインアウト信号の生成をトリガする。このように、所与のＩ／Ｏアドレス空間に関連付けられたアドレスポインタのチェインは、この再構成可能なハードウエア基板のＦＰＧＡチップの全てにわたって実現され得る。
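The chain-out logic and the resulting chip-by-chip access order can be sketched as follows. The function names and the dictionary-based signal representation are assumptions for illustration; `chain_out` follows the AND/OR structure described for the multiplexer, and `access_order` assumes each chip holds at least the listed number of words for the selected address space.

```python
def chain_out(move, out):
    """OR over the address pointers of one chip of (MOVE AND OUT)."""
    return int(any(move[space] and out[space] for space in move))

def access_order(words_per_chip):
    """Yield (chip, word, chain_out) as the word selection signal walks the chain."""
    for chip, n in enumerate(words_per_chip):
        for word in range(n):
            # Chain-out is asserted only when the selection reaches the last word of a chip.
            yield chip, word, int(word == n - 1)

move = {"REGR": 1, "REGW": 0, "S2H": 0, "H2S": 0}
active = chain_out(move, {"REGR": 1, "REGW": 1, "S2H": 0, "H2S": 0})
inactive = chain_out(move, {"REGR": 0, "REGW": 1, "S2H": 0, "H2S": 0})
order = list(access_order([2, 3]))
```

In the second call the REGW pointer has reached its last word, but because its MOVE signal is not asserted the chain-out stays at "0"; only the pointer of the selected address space can hand the selection to the next chip.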
【０２２１】
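(C. Gated Data/Clock Network Analysis)
Various embodiments of the present invention perform clock analysis in conjunction with gated data logic and gated clock logic analysis. The determination of the gated clock logic (or clock network) and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation in the hardware model during emulation. As described with reference to FIG. 4, clock analysis is performed at step 305. To elaborate on this clock analysis process, FIG. 16 shows a flowchart according to one embodiment of the present invention. FIG. 16 also shows the gated data analysis.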
【０２２２】
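The SEmulation system has a complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. These hardware portions include the clock components, especially the derived clocks. Clock delivery timing issues arise because of this boundary between software and hardware. Because the complete model is in software, the software can detect clock edges that affect register values. In addition to the software model of the registers, these registers are physically located in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at the D input to the Q output), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. The software clock essentially controls the enable inputs of the hardware registers rather than the clock inputs of the hardware register components. This software clock avoids race conditions, so that precise timing control to avoid hold-time violations is not needed. The clock network and gated data logic analysis process shown in FIG. 16 provides a way of modeling and implementing the clock and data delivery system to the hardware registers such that race conditions are avoided and a flexible software/hardware boundary implementation is provided.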
【０２２３】
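As described above, primary clocks are clock signals from test-bench processes. All other clocks, such as clock signals derived from combinational components, are derived or gated clocks. A primary clock can derive both gated clocks and gated data signals. For the most part, only a few (e.g., 1-10) derived or gated clocks are present in the user's circuit design. These derived clocks can be implemented as software clocks and stay in software. If a relatively large number (e.g., more than 10) of derived clocks is present in the circuit design, the SEmulation system models them in hardware to reduce I/O overhead and maintain the performance of the SEmulation system. Gated data is a data or control input of a register, other than the clock, that is derived from the primary clock through some combinational logic.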
【０２２４】
ゲートデータ／クロック解析プロセスはステップ５００で開始する。ステップ５０１は、ＨＤＬコードから発生された利用可能なソース設計データベースコードを獲得し、ユーザのレジスタ素子をＳエミュレーションシステムのレジスタコンポーネントにマッピングする。ユーザレジスタのＳエミュレーションシステムへの一対一マッピングは、以後のモデル化プロセスを容易にする。いくつかの場合、このマッピングは、特定の未処理物（ｐｒｉｍｉｔｉｖｅ）を用いてレジスタ素子を説明するユーザ回路設計を処理するために必要である。従って、ＲＴＬレベルコードに対して、Ｓエミュレーションレジスタは、かなり容易に使用され得るため、ＲＴＬレベルコードは、十分に高いレベルにおいて存在し、より低いレベルの実現を変更することを可能にする。ゲートレベルネットリストに対して、Ｓエミュレーションシステムは、コンポーネントのセルライブラリにアクセスし、特定の回路設計に特有の論理素子に適応するようにこのコンポーネントを修正する。
【０２２５】
ステップ５０２は、ハードウエアモデルのレジスタコンポーネントからのクロック信号を抽出する。このステップは、システムが一次クロックおよび発生したクロックを決定することを可能にする。さらにこのステップは、回路設計における種々のコンポーネントによって必要とされる全てのクロック信号を決定する。このステップからの情報は、ソフトウエア／ハードウエアクロックモデル化ステップを容易にする。
【０２２６】
ステップ５０３は、一次クロックおよび発生したクロックを決定する。一次クロックはテストベンチコンポーネントから発生し、ソフトウエアのみでモデル化される。発生したクロックは組み合わせ論理から発生され、このクロックは次に一次クロックによって駆動される。デフォルトによって、本発明のＳエミュレーションシステムは、発生したクロックをソフトウエア内で保持する。発生したクロックの数（例えば１０未満）が小さい場合、これらの発生したクロックはソフトウエアクロックとしてモデル化され得る。これらの発生したクロックを生成する組み合わせコンポーネントの数は小さく、そのため有意なＩ／Ｏオーバーヘッドは、これらの組み合わせコンポーネントをソフトウエア内に常駐させ続けることによって与えられない。しかし、発生したクロックの数が大きい（例えば１０より大きい）場合、これらの発生したクロックはハードウエアにモデル化され、Ｉ／Ｏオーバーヘッドを最小化し得る。時には、ユーザの回路設計が一次クロックから発生した多くの発生したクロックコンポーネントを使用する。従って、システムは、ハードウエアにおけるクロックを構築し、ソフトウエアクロックの数を小さく保持する。
【０２２７】
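Decision step 504 requires the system to determine whether any derived clocks are found in the user's circuit design. If no derived clock is found, step 504 resolves to "NO" and the clock analysis ends at step 508, because all the clocks in the user's circuit design are primary clocks, and these clocks are simply modeled in software. If derived clocks are found in the user's circuit design, step 504 resolves to "YES" and the algorithm proceeds to step 505.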
【０２２８】
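Step 505 determines the fan-out combinational components from the primary clocks to the derived clocks. In other words, this step traces the clock signal data paths from the primary clocks through the combinational components. Step 506 determines the fan-in combinational components from the derived clocks. In other words, this step traces the clock signal data paths from the combinational components to the derived clocks. Determining the fan-out sets and fan-in sets in the system is done recursively in software. The fan-in set of a net N is as follows.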
【０２２９】
【数５】

【０２３０】
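The gated clock or data logic network is determined by recursively determining the fan-in set and the fan-out set of net N and their intersection. The ultimate goal here is to determine the so-called fan-in set of net N. Typically, net N is a clock input node for determining the gated clock logic from the fan-in perspective. For determining the gated data logic from the fan-in perspective, net N is the clock input node associated with the data input at hand. If the node is on a register, net N is the clock input of that register for the data input associated with that register. The system finds all the components that drive net N. For each component X that drives net N, the system determines whether component X is a combinational component. If none of the components X is a combinational component, the fan-in set of net N has no combinational components and net N is a primary clock.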
【０２３１】
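If, however, at least one component X is a combinational component, the system then determines the input nets Y of component X. Here, the system looks further back in the circuit design by finding the input nodes to component X. For each input net Y of each component X, a fan-in set W may exist that is coupled to net Y. This fan-in set W of net Y is added to the fan-in set of net N, and then component X is added to the set of net N.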
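The recursive fan-in determination described in the two preceding paragraphs can be sketched as follows. The netlist dictionaries (`drivers`, `inputs`), the set `combinational`, and the example circuit are hypothetical. The `_seen` guard against revisiting components is an addition not mentioned in the text, and non-combinational drivers are simply skipped rather than terminating the whole search.

```python
def fan_in_set(net, drivers, inputs, combinational, _seen=None):
    """Return the combinational components in the fan-in cone of `net`."""
    seen = set() if _seen is None else _seen
    result = set()
    for comp in drivers.get(net, []):
        if comp not in combinational or comp in seen:
            continue  # stop at registers and test bench; skip components already visited
        seen.add(comp)
        result.add(comp)                       # add component X itself
        for in_net in inputs.get(comp, []):    # add the fan-in set W of each input net Y
            result |= fan_in_set(in_net, drivers, inputs, combinational, seen)
    return result

# Hypothetical circuit: clk (from the test bench) -> g1 -> n1 -> g2 -> gclk
drivers = {"gclk": ["g2"], "n1": ["g1"], "clk": ["testbench"]}
inputs = {"g2": ["n1", "sel"], "g1": ["clk", "en"]}
combinational = {"g1", "g2"}

gated = fan_in_set("gclk", drivers, inputs, combinational)
primary = fan_in_set("clk", drivers, inputs, combinational)
```

An empty fan-in set, as obtained for `clk`, corresponds to the case in which no driving component is combinational and net N is a primary clock.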
【０２３２】
正味のＮのファンアウトセットは同様に決定される。正味のＮファンアウトセットは以下のように決定される。
【０２３３】
【数６】

【０２３４】
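Again, the gated clock or data logic network is determined by recursively determining the fan-in set and fan-out set of net N and their intersection. The ultimate goal here is to determine the so-called fan-out set of net N. Typically, net N is a clock output node for determining the gated clock logic from the fan-out perspective. Thus, the set of all logic elements using net N is determined. For determining the gated data logic from the fan-out perspective, net N is the clock output node associated with the data output at hand. If the node is on a register, net N is the output of that register for the primary-clock-driven input associated with that register. The system finds all the components that use net N. For each component X that uses net N, the system determines whether component X is a combinational component. If none of the components X is a combinational component, the fan-out set of net N has no combinational components and net N is a primary clock.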
【０２３５】
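If, however, at least one component X is a combinational component, the system then determines the output nets Y of component X. Here, the system looks further forward from the primary clock in the circuit design by finding the output nodes from component X. For each output net Y of each component X, a fan-out set W may exist that is coupled to net Y. This fan-out set W of net Y is added to the fan-out set of net N, and then component X is added to the set of net N.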
【０２３６】
Step 507 determines the clock network, or gated clock logic. The clock network is the intersection of the fan-in combinational components and the fan-out combinational components.
【０２３７】
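Likewise, the same fan-in and fan-out principle can be used to determine the gated data logic. Like the gated clocks, gated data is a data or control input of a register (other than the clock) that is driven by a primary clock through some combinational logic. The gated data logic is the intersection of the fan-in of the gated data and the fan-out from the primary clock. Thus, the clock analysis and gated data analysis yield a gated clock network/gated clock logic and a gated data logic, each passing through some combinational logic. As described below, the determination of the gated clock network and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation in the hardware model during emulation. The clock/data network analysis ends at step 508.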
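The intersection of the fan-in of a gated clock (or gated data input) with the fan-out of a primary clock can be sketched as follows. All names and the example netlist are hypothetical, and an iterative traversal is used here in place of the recursive formulation purely for brevity.

```python
def collect(net, comps_of, nets_of, combinational):
    """Combinational components reachable from `net` via comps_of and nets_of."""
    found, stack = set(), [net]
    while stack:
        for comp in comps_of.get(stack.pop(), []):
            if comp in combinational and comp not in found:
                found.add(comp)
                stack.extend(nets_of.get(comp, []))
    return found

def gated_logic(primary_clk, gated_net, drivers, loads, inputs, outputs, combinational):
    """Intersection of the fan-in of `gated_net` and the fan-out of `primary_clk`."""
    fan_in = collect(gated_net, drivers, inputs, combinational)     # walk backwards
    fan_out = collect(primary_clk, loads, outputs, combinational)   # walk forwards
    return fan_in & fan_out

# Hypothetical circuit: clk -> g1 -> n1 -> g2 -> gclk, with g3 (fed by d) driving sel of g2.
drivers = {"n1": ["g1"], "gclk": ["g2"], "sel": ["g3"], "clk": ["testbench"]}
loads = {"clk": ["g1"], "en": ["g1"], "n1": ["g2"], "sel": ["g2"], "d": ["g3"]}
inputs = {"g1": ["clk", "en"], "g2": ["n1", "sel"], "g3": ["d"]}
outputs = {"g1": ["n1"], "g2": ["gclk"], "g3": ["sel"]}
combinational = {"g1", "g2", "g3"}

clock_network = gated_logic("clk", "gclk", drivers, loads, inputs, outputs, combinational)
```

Component `g3` lies in the fan-in of `gclk` but is not driven by the primary clock, so the intersection excludes it and leaves only the logic that actually carries the clock.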
【０２３８】
図１７は、本発明の一実施形態によるハードウエアモデルの基本的な構成ブロックを示す。レジスタコンポーネントに対して、Ｓエミュレーションシステムは、非同期負荷制御を用いて、エッジトリガ（すなわちフリップフロップ）およびレベルに敏感な（すなわちラッチ）レジスタハードウエアモデルを構築するために、基本的なブロックとしてＤタイプフリップフロップを使用する。このレジスタモデル構築ブロックは以下のポートを有する。すなわち、Ｑ（出力状態）、Ａ＿Ｅ（非同期イネーブル）、Ａ＿Ｄ（非同期データ）、Ｓ＿Ｅ（同期イネーブル）、Ｓ＿Ｄ（同期データ）およびもちろんＳｙｓｔｅｍ．ｃｌｋ（システムクロック）である。
【０２３９】
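The SEmulation register model is triggered by a positive edge of the system clock or a positive level of the asynchronous enable (A_E) input. When either of these two triggering events occurs, the register model looks at the asynchronous enable (A_E) input. If the asynchronous enable (A_E) input is enabled, the output Q takes on the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, the output Q takes on the value of the synchronous data (S_D). If, on the other hand, neither the asynchronous enable (A_E) nor the synchronous enable (S_E) input is enabled, the output Q is not evaluated despite the detection of a positive edge of the system clock. In this way, the inputs to these enable ports control the operation of this basic building-block register model.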
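The priority between the asynchronous and synchronous ports can be captured in a small behavioral sketch. The class and method names are assumptions; the port names Q, A_E, A_D, S_E, and S_D follow the text, and `clk_posedge` stands for the detection of a positive edge of the system clock.

```python
class RegisterModel:
    """Behavioral sketch of the D-type building-block register with asynchronous load."""

    def __init__(self, q=0):
        self.q = q

    def evaluate(self, a_e, a_d, s_e, s_d, clk_posedge):
        if a_e:
            self.q = a_d              # asynchronous load: level-sensitive, takes priority
        elif clk_posedge and s_e:
            self.q = s_d              # synchronous load on a system clock edge
        # Otherwise Q is not evaluated, even if a clock edge occurred.
        return self.q

reg = RegisterModel()
no_enable = reg.evaluate(a_e=0, a_d=1, s_e=0, s_d=1, clk_posedge=True)
sync_load = reg.evaluate(a_e=0, a_d=0, s_e=1, s_d=1, clk_posedge=True)
no_edge = reg.evaluate(a_e=0, a_d=0, s_e=1, s_d=0, clk_posedge=False)
async_load = reg.evaluate(a_e=1, a_d=0, s_e=1, s_d=1, clk_posedge=False)
```

Driving the software clock into S_E makes the model behave as a flip-flop enabled on system clock edges, whereas driving it into A_E makes the model follow its data input like a latch.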
【０２４０】
システムは、特定のイネーブルレジスタであるソフトウエアクロックを使用して、これらのレジスタモデルのイネーブル入力を制御する。複雑なユーザ回路設計では、数１００万の素子が回路設計において見出され、従ってＳエミュレーションシステムは、ハードウエアモデルにおける数１００万の素子を実現する。これらの素子の全てを個別に制御することは高価である。なぜなら、ハードウエアモデルに数１００万の制御信号を送信することに対するオーバーヘッドがソフトウエア内のこれらの素子を評価するよりもより長い時間がかかるためである。しかし、この複雑な回路設計が数クロックのみ（１〜１０）を要求し、レジスタおよび組み合わせコンポーネントのみによってシステムの状態変化を制御するのに十分である。Ｓエミュレーションシステムのハードウエアモデルは、レジスタおよび組み合わせコンポーネントのみ使用する。Ｓエミュレーションシステムは、さらにソフトウエアクロックによってハードウエアモデルの評価を制御する。Ｓエミュレータシステムにおいて、レジスタに対するハードウエアモデルは、他のハードウエアコンポーネントに直接接続されたクロックを有さない。むしろ、ソフトウエアカーネルは全クロックの値を制御する。少しのクロック信号を制御することによって、カーネルは、コプロセッサ処理オーバーヘッドの無視できる量を用いてハードウエアモデルの評価にわたる全ての制御を有する。
【０２４１】
レジスタモデルがラッチまたはフリップフロップとして用いられるかどうかに依存して、ソフトウエアクロックは、非同期イネーブル（Ａ＿Ｅ）または同期イネーブル（Ｓ＿Ｅ）ワイヤ線のいずれかに入力される。ソフトウエアモデルからハードウエアモデルへのソフトウエアクロックの用途は、クロックコンポーネントのエッジ検出によってトリガされる。ソフトウエアカーネルがクロックコンポーネントのエッジを検出する場合、ソフトウエアカーネルは、ＣＬＫアドレス空間を介してクロックエッジレジスタを設定する。このクロックエッジレジスタはハードウエアレジスタモデルに対して、クロック入力ではなくイネーブル入力を制御する。グローバルシステムクロックは、クロック入力をハードウエアレジスタモデルにさらに供給する。しかし、クロックエッジレジスタはソフトウエアクロック信号を、二重バッファインターフェイスを介してハードウエアレジスタモデルに供給する。以下に説明するように、ソフトウエアクロックからハードウエアモデルまでの二重バッファインターフェイスは、全てのレジスタモデルがグローバルシステムクロックに関して同期的に更新されることを確実にする。従って、ソフトウエアクロックの使用は、保持時間の超過の危険を取り除く。
【０２４２】
図１８Ａおよび図１８Ｂは、ラッチおよびフリップフロップに対するビルディングブロックレジスタモデルの実現を示す。これらのレジスタモデルは適切なイネーブル入力を介してソフトウエアクロック制御される。レジスタモデルがフリップフロップまたはラッチとして使用されるかどうかに応じて、非同期ポート（Ａ＿Ｅ、Ａ＿Ｄ）および同期ポート（Ｓ＿Ｅ、Ｓ＿Ｄ）は、ソフトウエアクロックまたはＩ／Ｏ動作のいずれかのために使用される。図１８Ａは、ラッチとして使用される場合のレジスタモデルの実現を示す。ラッチは、レベルに敏感である。すなわち、クロック信号がアサートされた（例えば「１」）限り、出力Ｑは、入力Ｄに従う。ここで、ソフトウエアクロック信号は、非同期イネーブル（Ａ＿Ｅ）入力に供給され、データ入力は、非同期データ（Ａ＿Ｄ）入力に供給される。Ｉ／Ｏ動作に対して、ソフトウエアカーネルは、同期イネーブル（Ｓ＿Ｅ）および同期データ（Ｓ＿Ｄ）入力を使用して、値をＱポートにダウンロードする。このＳ＿ＥポートがＲＥＧ空間アドレスポインタとして使用され、Ｓ＿Ｄは、データをローカルデータバスに／ローカルデータバスからアクセスするために使用される。
【０２４３】
図１８Ｂは、設計フリップフロップとして使用される場合のレジスタモデルの実現を示す。設計フリップフロップは、次の状態論理（データＤ、セット（Ｓ）、リセット（Ｒ）、およびイネーブルＥ）を決定するために以下のポートを使用する。設計フリップフロップの次の状態論理の全ては、同期データ（Ｓ＿Ｄ）入力に供給されるハードウエア組み合わせコンポーネントに分解される。ソフトウエアクロックは、同期イネーブル（Ｓ＿Ｅ）入力へ入力される。Ｉ／Ｏ動作に対して、ソフトウエアカーネルは、非同期イネーブル（Ａ＿Ｅ）および非同期データ（Ａ＿Ｄ）入力を使用して、値をＱポートにダウンロードする。Ａ＿Ｅポートは、ＲＥＧ空間書き込みアドレスポインタとして使用され、Ａ＿Ｄポートが使用されて、データをローカルデータバスに／ローカルデータバスからアクセスする。
【０２４４】
ここで、ソフトウエアクロックが説明される。本発明のソフトウエアクロックの一実施形態は、ハードウエアレジスタモデルへのクロックイネーブル信号であり、これらのハードウエアレジスタモデルへの入力におけるデータがシステムクロックと共におよびシステムクロックと同期して評価される。これはレース条件および保持時間超過を取り除く。ソフトウエアクロック論理の一実施形態では、ソフトウエアにおけるクロックエッジ検出論理を含む。このクロックエッジ検出論理は、クロックエッジ検出に応じてハードウエアにおけるさらなる論理をトリガする。このようなイネーブル信号論理は、データのこれらのハードウエアレジスタモデルへの到達の前に、ハードウエアレジスタモデルへのイネーブル入力にイネーブル信号を生成する。ゲートクロックネットワークおよびゲートデータネットワーク決定は、ハードウエア加速モードの間、ソフトウエアクロックおよびハードウエアモデルにおける論理評価の成功した実現に対して重要である。上述したように、クロックネットワークまたはゲートクロック論理は、ゲートクロックのファンインおよび一次クロックのファンアウトの交点である。同様に、ゲートデータ論理はさらに、ゲートデータのファンインおよびデータ信号に対する一次クロックのファンアウトの交点である。これらのファンインおよびファンアウトの概念は図１６を参照して説明される。
【０２４５】
上述したように、一次クロックは、ソフトウエアのテストベンチプロセスによって生成される。発生したクロックまたはゲートクロックは、組み合わせ論理のネットワークおよび次いで一次クロックによって駆動されたレジスタから生成される。デフォルトによって、本発明のＳエミュレーションシステムは、発生したクロックをソフトウエア内で保持する。発生したクロックの数（例えば１０未満）が小さい場合、これらの発生したクロックはソフトウエアクロックとしてモデル化され得る。これらの発生したクロックを生成する組み合わせコンポーネントの数が小さく、そのため有意なＩ／Ｏオーバーヘッドは、これらの組み合わせコンポーネントをソフトウエア内にモデル化することによって与えられない。しかし、発生したクロックの数が大きい（例えば１０より大きい）場合、これらの発生したクロックおよびこの組み合わせコンポーネントはハードウエアにモデル化され、Ｉ／Ｏオーバヘッドを最小化し得る。
【０２４６】
最終的には、本発明の一実施形態に従って、ソフトウエアにおいて発生する（一次クロックへの入力を介して）クロックエッジ検出が、ハードウエアにおけるクロック検出に変換され得る（クロックエッジレジスタへの入力を介して）。ソフトウエアにおけるクロックエッジ検出は、ハードウエアにおけるイベントをトリガし、ハードウエアモデルにおけるレジスタは、データ信号の前のクロックイネーブル信号を受け取り、データ信号の評価がシステムクロックとの同期において発生し、保持時間超過を回避することを確実にする。
【０２４７】
上述のように、Ｓエミュレーションシステムは、ソフトウエアにおけるユーザの回路設計の完全なモデルおよびハードウエアにおけるユーザの回路設計の幾つかの部分を有する。カーネルにおいて特定されたように、ソフトウエアはハードウエアレジスタ値に影響を与えるクロックエッジを検出し得る。さらにハードウエアレジスタがその各入力を評価することを確実にするために、ソフトウエア／ハードウエア境界はソフトウエアクロックを含む。ソフトウエアクロックは、ハードウエアモデルにおけるレジスタがシステムクロックと同期して、そして任意の保持時間超過なしで評価することを確実にする。ソフトウエアクロックは、ハードウエアレジスタコンポーネントへのクロック入力を制御するのではなく、ハードウエアレジスタコンポーネントのイネーブル入力を実質的に制御する。ソフトウエアクロックを実現するための二重バッファアプローチは、レジスタがシステムクロックと同期させて評価することによってレース条件を回避し、保持時間超過を回避するための緻密なタイミング制御のための必要性を取り除くことを確実にする。
【０２４８】
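FIG. 19 shows one embodiment of the clock implementation system in accordance with the present invention. Initially, the gated clock logic and the gated data logic are determined by the SEmulator system, as described above with respect to FIG. 16. The gated clock logic and the gated data logic are then separated. When implementing the double buffer, the driving source and the double-buffered primary logic must also be separated. Accordingly, the gated data logic 513 and the gated clock logic 514 are separated out by the fan-in and fan-out analysis.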
【０２４９】
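The modeled primary clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. This primary clock is modeled in software, but the double-buffer implementation is modeled in both software and hardware. Clock edge detection occurs in the primary clock register 510 in software and triggers the hardware model to generate the software clock signal to the hardware model. Data and address enter the first buffer 511 on wire lines 519 and 520, respectively. The Q output of this first buffer 511, on wire line 521, is coupled to the D input of the second buffer 512. The Q output of this first buffer 511 is also provided on wire line 522 to the gated clock logic 514, to ultimately drive the clock input of the first buffer 516 of the clock edge register 515. The Q output of the second buffer 512, on wire line 523, ultimately drives, through the gated data logic 513, the input of the register 518 via wire line 530 in the user's custom-designed circuit model. The enable input to the second buffer 512 in the primary clock register 510 is the INPUT-EN signal from the state machine on wire line 533. This state machine determines the evaluation cycles and controls the various signals accordingly.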
【０２５０】
クロックエッジレジスタ５１５は、また、第１のバッファ５１６および第二のバッファ５１７を含む。クロックエッジレジスタ５１５は、ハードウェアでインプリメントされる。（一次クロックレジスタ５１０への入力を介して）ソフトウェアでクロックエッジ検出が起こる場合、このことは、（クロックエッジレジスタ５１５を介して）ハードウェアにあるハードウェアの同じクロックエッジ検出をトリガし得る。ワイヤライン５２４による第１のバッファ５１６へのＤ入力は、論理「１」に設定される。ワイヤライン５２５によるクロック信号は、ゲートクロック論理５１４から導かれ、最終的には、ワイヤライン５２２による第１のバッファ５１１の出力において一次クロックレジスタ５１０から導かれる。ワイヤライン５２５によるこのクロック信号は、ゲートクロック信号である。第１のバッファ５１６へのイネーブルワイヤライン５２６は、Ｉ／Ｏサイクルおよび評価サイクルを制御する状態マシンからの〜ＥＶＡＬ信号である（後に説明される）。第１のバッファ５１６はまた、ワイヤライン５２７によるＲＥＳＥＴ信号を有する。この同じリセット信号は、また、クロックエッジレジスタ５１５にある第二のバッファ５１７に提供される。第１のバッファ５１６のＱ出力は、ワイヤライン５２９によって第二のバッファ５１７のＤ入力に供給される。第二のバッファ５１７は、ＣＬＫ−ＥＮ信号に対するワイヤライン５２８によるイネーブル入力およびワイヤライン５２７によるリセット入力を有する。ワイヤライン５３２による第二のバッファ５１７のＱ出力は、ユーザカスタム設計された回路モデルのレジスタ５１８のイネーブル入力に提供される。レジスタ５１８と共にバッファ５１１、５１２および５１７は、システムクロックによってクロックされる。クロックエッジレジスタ５１５のバッファ５１６のみが、ゲートクロック論理５１４からのゲートクロックによってクロックされる。
【０２５１】
レジスタ５１８は、ハードウェアでモデル化され、かつ、ユーザカスタム回路設計の一部である典型的なＤ型のレジスタモデルである。本発明のクロックインプリメンテーションスキームのこの実施形態は、厳密に評価を制御する。このクロックセットアップの最終的な目標は、ワイヤライン５３２によるクロックイネーブル信号が、ワイヤライン５３０によるデータ信号の前にレジスタ５１８に到達することを保証することである。その結果、このレジスタによるデータ信号の評価は、レース（ｒａｃｅ）状態がなくシステムと同期される。
【０２５２】
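To reiterate, the primary clock register 510 is modeled in software, but its double-buffer implementation is modeled in both software and hardware. The clock edge register 515 is implemented in hardware. The gated data logic 513 and the gated clock logic 514 are separated out by the fan-in and fan-out analysis for modeling purposes. They may be modeled in software (if the gated data and gated clocks are few) or in hardware (if the gated data and gated clocks are numerous). Determining the gated clock network and the gated data network is critical to the successful implementation of the software clock and to the logic evaluation of the hardware model during hardware acceleration mode.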
【０２５３】
ソフトウェアクロックのインプリメンテーションは、主に、〜ＥＶＡＬ、ＩＮＰＵＴ−ＥＮ、ＣＬＫ−ＥＮおよびＲＥＳＥＴ信号のアサーションのタイミングに合わせて、図１９で示されるクロックセットアップに依存する。一次クロックレジスタ５１０は、クロックエッジを検出して、ハードウェアモデルに対するソフトウェアクロックの発生をトリガする。このクロックエッジ検出イベントは、ワイヤライン５２５によるクロック入力、ゲートクロック論理５１４、およびワイヤライン５２２を介してクロックエッジレジスタ５１５の「アクティベーション」をトリガする。これにより、クロックレジスタ５１５は、また、同じクロックエッジを検出する。このように、（一次クロックレジスタ５１０への入力５１９および５２０を介して）ソフトウェアで起こるクロック検出は、（クロックエッジレジスタ５１５への入力５２５を介して）ハードウェアにおけるクロックエッジ検出に転換され得る。この時点で、一次クロックレジスタ５１０にある第二のバッファ５１２へのＩＮＰＵＴ−ＥＮワイヤライン５３３、および、クロックエッジレジスタ５１５にある第二のバッファ５１７へのＣＬＫ−ＥＮワイヤライン５２８はアサートされておらず、従って、データは評価されない。次いで、クロックエッジは、データがハードウェアレジスタモデルで評価される前に検出される。この段階で、ワイヤライン５１９によるデータバスからのデータは、ゲートデータ論理５１３およびハードウェアモデル化ユーザレジスタ５１８へ伝搬すらされていないことに留意されたい。確かに、ワイヤライン５３３のＩＮＰＵＴ−ＥＮ信号がまだアサートされていないために、このデータは、一次クロックレジスタ５１０にある第二のバッファ５１２に到達すらしていない。
【０２５４】
Ｉ／Ｏ段階中に、ワイヤライン５２６の〜ＥＶＡＬ信号は、クロックエッジレジスタ５１５にある第１のバッファ５１６をイネーブルするようにアサートされる。〜ＥＶＡＬ信号が、ゲートクロック論理を通って、第１のバッファ５１６のワイヤライン５２５のクロック入力に方向付けるため、〜ＥＶＡＬ信号はまた、ゲートクロック論理５１４を通り、ゲートクロック信号を監視する。従って、四つの状態を評価する状態マシンに関して後に説明されるように、〜ＥＶＡＬ信号は、図１９で示されたシステムの一部を通るデータおよびクロック信号を安定化させるために必要である限り保持され得る。
【０２５５】
信号が安定化した場合、Ｉ／Ｏが完了した場合、そうでなければ、システムがデータを評価する準備が整った場合、〜ＥＶＡＬは、第１のバッファ５１６をディセーブルするようにディアサートされる。ＣＬＫ−ＥＮ信号は、アサートされて、第二のバッファ５１７をイネーブルするためにワイヤライン５２８を介して第二のバッファ５１７に適用され、そして、ワイヤライン５２９によって論理値「１」をレジスタ５１８の入力をイネーブルするために、ワイヤ線５３２によってＱ出力に送る。次に、レジスタ５１８は、イネーブルされて、ワイヤライン５３０にある任意のデータは、システムクロックによってレジスタ５１８内に同期してクロックされる。読み手（ｒｅａｄｅｒ）が理解し得るように、レジスタ５１８へのイネーブル信号は、このレジスタ５１８へのデータ信号の評価よりも速く伝わる。
【０２５６】
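The INPUT-EN signal on wire line 533 is now asserted to the second buffer 512. Also, the RESET edge register signal on wire line 527 is asserted to the buffers 516 and 517 in the clock edge register 515 to reset these buffers and ensure that their outputs are at logic "0". Now that the INPUT-EN signal has been asserted to the buffer 512, the data on wire line 521 propagates to the gated data logic 513 and on wire line 530 to the user's circuit register 518. Because the enable input to this register 518 is now at logic "0", the data on wire line 530 cannot be clocked into the register 518. The previous data, however, had already been clocked in by the previously asserted enable signal on wire line 532 before the RESET signal was asserted to disable the register 518. Thus, the input data to the register 518, as well as the inputs to the other registers in the user's hardware-modeled circuit design, stabilize at their respective register input ports. When a clock edge is subsequently detected in software, the primary clock register 510 and the clock edge register 515 in hardware activate the enable input of the register 518, so that the data waiting at the input of the register 518 and the data waiting at the inputs of the other registers are clocked in together, synchronously with the system clock.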
【０２５７】
上記されたように、ソフトウェアクロックインプリメンテーションは、主に、〜ＥＶＡＬ、ＩＮＰＵＴ−ＥＮ、ＣＬＫ−ＥＮ、および、ＲＥＳＥＴ信号をアサートするタイミングに合わせた図１９に示されたクロックセットアップに依存する。図２０は、本発明の一つの実施形態に従う図１９のソフトウェアクロック論理を制御する有限状態マシンの四つの状態を示す。
【０２５８】
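In state 540, the system is idle or some I/O operation is in progress. The EVAL signal is at logic "0". The EVAL signal, generated by the system controller, determines the evaluation cycle and lasts for as many clock cycles as are needed to stabilize the logic in the system. Usually, the duration of the EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the longest time-division-multiplexed wire (i.e., TDM circuit). During evaluation, the EVAL signal is at logic "1".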
【０２５９】
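In state 541, the clock is enabled. The CLK-EN signal is asserted at logic "1", and the enable signal to the hardware register model is thereby asserted. Here, the previously gated data at the hardware register model is evaluated synchronously, with no risk of hold-time violation.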
【０２６０】
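In state 542, new data is enabled when the INPUT-EN signal is asserted at logic "1". The RESET signal is also asserted to remove the enable signal from the hardware register model. The new data that has been enabled into the hardware register model through the gated data logic network, however, continues to propagate to, or arrives at, its intended destination in the hardware register model, where it waits to be clocked in if and when the enable signal is asserted again.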
【０２６１】
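In state 543, the propagating new data stabilizes in the logic while the EVAL signal remains at logic "1". The muxed-wire signal, described above in connection with FIGS. 9A, 9B, and 9C for the time-division-multiplexed (TDM) circuit, is also at logic "1". When the EVAL signal is deasserted, or set to logic "0", the system returns to the idle state 540 and waits to evaluate upon the detection of a clock edge by software.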
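The four states of the finite state machine of FIG. 20 can be sketched as a table of control signal values plus a transition function. The state numbers follow the text. The following are assumptions made for illustration and are not stated explicitly in the specification: that EVAL is already at logic "1" in states 541 and 542, that states 541 and 542 each last one step, and that the inputs are named `edge_detected` and `settled`.

```python
# Control signal values per state (EVAL in 541/542 is an assumption, see lead-in).
SIGNALS = {
    540: {"EVAL": 0, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},  # idle or I/O in progress
    541: {"EVAL": 1, "CLK_EN": 1, "INPUT_EN": 0, "RESET": 0},  # clock enabled
    542: {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 1, "RESET": 1},  # new data enabled
    543: {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},  # logic stabilizing
}

def next_state(state, edge_detected=False, settled=False):
    """One transition of the software clock control state machine."""
    if state == 540:
        return 541 if edge_detected else 540
    if state == 541:
        return 542
    if state == 542:
        return 543
    return 540 if settled else 543     # stay in 543 until the logic has stabilized

def run(events, state=540):
    """Apply a list of (edge_detected, settled) pairs and return the visited states."""
    visited = []
    for edge_detected, settled in events:
        state = next_state(state, edge_detected, settled)
        visited.append(state)
    return visited

cycle = run([(False, False), (True, False), (False, False),
             (False, False), (False, False), (False, True)])
```

The ordering is the point of the sketch: the register enable (CLK-EN) is asserted in state 541, before INPUT-EN releases new data in state 542, so previously stabilized data is clocked in before any new data can arrive.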
【０２６２】
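(D. FPGA Array and Control)
The SEmulator system first compiles the user's circuit design data into software and hardware models based on various controls, including component type. During the hardware compilation process, the system performs the mapping, placement, and routing process, described above with respect to FIG. 6, to optimally partition, place, and interconnect the various components that make up the user's circuit design. Using known programming tools, a bitstream configuration file or programmer object file (.pof) (or, alternatively, a raw binary file (.rbf)) is referenced to reconfigure the hardware board containing a number of FPGA chips. Each chip contains a portion of the hardware model corresponding to the user's circuit design.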
【０２６３】
In one embodiment, the SEmulator system uses a 4x4 array of FPGA chips (16 chips in total). Exemplary FPGA chips include the Xilinx XC4000 series family of FPGA logic devices and the Altera FLEX 10K devices.
【０２６４】
ＸＣ４０００、ＸＣ４０００Ａ、ＸＣ４０００Ｄ、ＸＣ４０００Ｈ、ＸＣ４０００Ｅ、ＸＣ４０００ＥＸ、ＸＣ４０００Ｌ、および、ＸＣ４０００ＸＬを含むＦＰＧＡのＸｉｌｉｎｘＸＣ４０００シリーズが用いられ得る。特定のＦＰＧＡは、ＸｉｌｉｎｘＸＣ４００５Ｈ、ＸＣ４０２５、および、Ｘｉｌｉｎｘ４０２８ＥＸを含む。ＸｉｌｉｎｘＸＣ４０２８ＥＸＦＰＧＡエンジンの容量は、単一のＰＣＩボード上で５０万ゲートまで近づいている。これらのＸｉｌｉｎｘＦＰＧＡの詳細は、それらのデータブック（Ｘｉｌｉｎｘ、ＴｈｅＰｒｏｇｒａｍｍａｂｌｅＬｏｇｉｃＤａｔａＢｏｏｋ（９／９６））から得ることができる。このデータブックは、本明細書中で参照として援用される。ＡｌｔｅｒａＦＰＧＡの場合、詳細は、これらのデータブック（Ａｌｔｅｒａ、Ｔｈｅ１９９６ＤａｔａＢｏｏｋ（１９９６年６月））で見つけることができる。このデータブックは、本明細書中で参照として援用される。
【０２６５】
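A brief general description of the XC4025 FPGA is provided. Each array chip consists of a 240-pin Xilinx chip. An array board populated with Xilinx XC4025 chips contains approximately 440,000 configurable gates and is capable of performing computationally intensive tasks. The Xilinx XC4025 FPGA consists of 1024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic and two strobed registers. On the periphery of the chip, unstrobed I/O registers are provided. An alternative to the XC4025 is the XC4005H, a relatively low-cost version of the array board with 120,000 configurable gates. The XC4005H devices have high-power 24 mA drive circuits but lack the input/output flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs can be obtained from their publicly available data sheets, which are incorporated herein by reference.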
【０２６６】
ＸｉｌｉｎｘＸＣ４０００シリーズＦＰＧＡの機能は、配置データを内部メモリセル内にロードすることによってカスタマイズされ得る。これらのメモリセルに格納された値は、ＦＰＧＡの論理関数および論理相互接続を決定する。これらのＦＰＧＡの配置データは、オンチップメモリに格納され得、外部メモリからロードされ得る。ＦＰＧＡが、外部シリアルＰＲＯＭまたは外部パラレルＰＲＯＭからの配置データを読み出し得るか、あるいは、配置データが、外部デバイスからＦＰＧＡ内に書き込まれ得るかのいずれかである。これらのＦＰＧＡは、特にハードウェアが動的に変化される場合、または、ユーザが、ハードウェアが異なるアプリケーションに適応されるように望む場合に、無制限に何回でも再プログラムされ得る。
【０２６７】
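Generally speaking, the XC4000 series FPGAs have up to 1024 CLBs. Each CLB has two levels of look-up tables: two four-input look-up tables (or function generators F and G) that provide some of the inputs to a third, three-input look-up table (or function generator H), together with two flip-flops or latches. The outputs of these look-up tables can be driven independently of these flip-flops or latches. The CLB can implement the following combinations of arbitrary Boolean functions: (1) any function of four or five variables; (2) any function of four variables, any second function of up to four unrelated variables, and any third function of up to three unrelated variables; (3) one function of four variables and another function of six variables; (4) any two functions of four variables; and (5) some functions of nine variables. Two D-type flip-flops or latches are available for registering CLB inputs or for storing look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either one of these two flip-flops or latches, and H1 can drive the other through the H function generator.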
【０２６８】
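Each of the four-input function generators in the CLB (i.e., F and G) contains dedicated arithmetic logic for the fast generation of carry and borrow signals. This dedicated arithmetic logic can be configured to implement a two-bit adder with carry-in and carry-out. These function generators can also be implemented as read/write random access memory (RAM), in which case the four input lines are used as the address lines of the RAM.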
【０２６９】
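The Altera FLEX 10K chips are somewhat similar in concept. These chips are SRAM-based programmable logic devices (PLDs) with multiple 32-bit buses. In particular, each FLEX 10K100 chip contains approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs), 8 logic elements (LEs) per LAB (or 4,992 LEs), 5,392 flip-flops or registers, 406 I/O pins, and 503 pins in total.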
【０２７０】
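The Altera FLEX 10K chips contain an embedded array of embedded array blocks (EABs) and a logic array of logic array blocks (LABs). An EAB can be used to implement various memory functions (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). For memory functions, an EAB provides 2,048 bits. For logic functions, an EAB provides 100 to 600 gates.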
【０２７１】
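A LAB, via its LEs, can be used to implement medium-sized blocks of logic. Each LAB represents approximately 96 logic gates and contains 8 LEs and a local interconnect. An LE contains a four-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be built include counters, address decoders, or small state machines.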
【０２７２】
ＡｌｔｅｒａＦＬＥＸ１０Ｋのより詳細な説明は、Ａｌｔｅｒａ、１９９６ＤＡＴＡＢＯＯＫ（１９９６年６月）に見出され得、本明細書中で参照として援用される。データブックは、また、支援プログラミングソフトウェアの詳細を含む。
【０２７３】
図８は、４×４ＦＰＧＡアレイ、および、それらの相互接続の一実施形態を示す。Ｓエミュレータのこの実施形態は、ＦＰＧＡチップに対してクロスバーまたは部分的クロスバー接続を用いていない。ＦＰＧＡチップは、第１行にチップＦ１１〜Ｆ１４、第二行にチップＦ２１〜Ｆ２４、第三行にチップＦ３１〜Ｆ３４、および、第四行にチップＦ４１〜Ｆ４４を含む。一実施形態において、各ＦＰＧＡ（例えば、チップＦ２３）は、ＳエミュレータシステムのＦＰＧＡＩ／Ｏコントローラへのインタフェースに対して以下のピンを有する。
【０２７４】
【表１】

【０２７５】
Thus, in one embodiment, each FPGA chip uses only 41 pins for interfacing with the SEmulator system. These pins are further described with respect to FIG. 22.
【０２７６】
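These FPGA chips are interconnected to each other through non-crossbar, non-partial-crossbar interconnections. Each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnection represents more than 44 pins. In still other embodiments, each interconnection represents fewer than 44 pins.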
【０２７７】
各チップは、６つの相互接続を有する。例えば、チップＦ１１は、相互接続６００〜６０５を有する。また、チップＦ３３は、相互接続６０６〜６１１を有する。これらの相互接続は、行に沿って水平に、そして、列に沿って垂直に走る。各相互接続は、行に沿った二つのチップ間の直接接続、または、列に沿った二つのチップ間の直接接続を提供する。従って、例えば、相互接続６００はチップＦ１１とチップＦ１３とを直接接続する；相互接続６０１はチップＦ１１とチップＦ１２とを直接接続する；相互接続６０２はチップＦ１１とチップＦ１４とを直接接続する；相互接続６０３はチップＦ１１とチップＦ３１とを直接接続する；相互接続６０４はチップＦ１１とチップＦ２１とを直接接続する；相互接続６０５はチップＦ１１とチップＦ４１とを直接接続する。
【０２７８】
同様に、アレイのエッジに位置（例えば、Ｆ１１）していないチップＦ３３に関して、相互接続６０６はチップＦ３３とチップＦ１３とを直接接続する；相互接続６０７はチップＦ３３とチップＦ２３とを直接接続する；相互接続６０８はチップＦ３３とチップＦ３４とを直接接続する；相互接続６０９はチップＦ３３とチップＦ４３とを直接接続する；相互接続６１０はチップＦ３３とチップＦ３１とを直接接続する；相互接続６１１はチップＦ３３とチップＦ３２とを直接接続する。
【０２７９】
チップＦ１１がチップＦ１３から１ホップ内に位置しているので、相互接続６００は、「１」と表示される。チップＦ１１がチップＦ１２から１ホップ内に位置しているので、相互接続６０１は、「１」として表示される。同様に、チップＦ１１がチップＦ１４から１ホップ内に位置しているので、相互接続６０２は、「１」として表示される。同様に、チップＦ３３に関して、全ての相互接続は、「１」として表示される。
【０２８０】
この相互接続スキームによって、各チップは、２回以内の「ジャンプ」、または、相互接続でアレイにある任意の他のチップと通信することができる。従って、チップＦ１１は、以下の二つの経路（（１）相互接続６００から相互接続６０６へ；または（２）相互接続６０３から相互接続６１０へ）のいずれかを通ってチップＦ３３に接続される。つまり、この経路は、（１）最初は行に沿って、次に列に沿って、または、（２）最初は列に沿って、次に行に沿って、のいずれかであり得る。
【０２８１】
図８は、水平および垂直の相互接続で４×４のアレイに配置されたＦＰＧＡチップを示すが、ボード上の実際の物理的インプリメンテーションは、拡張ピギーバックボードを有する低および高バンクを通っている。このようにして、一実施形態において、チップＦ４１〜Ｆ４４、およびＦ２１〜Ｆ２４は、低バンクにある。チップＦ３１〜Ｆ３４およびＦ１１〜Ｆ１４は、高バンクにある。ピギーバックボードは、チップＦ１１〜Ｆ１４、および、Ｆ２１〜Ｆ２４を含む。従って、アレイを拡張するために、多くの（例えば、８個の）チップを含むピギーバックボードが、バンクに（つまり、現在チップＦ１１〜Ｆ１４を含む行の上に）加えられる。他の実施形態において、ピギーバックボードは、現在チップＦ４１〜Ｆ４４を含む行の下にアレイを拡張する。さらなる実施形態は、チップＦ１４、Ｆ２４、Ｆ３４およびＦ４４の右側へ拡張することを可能にする。さらに他の実施形態は、チップＦ１１、Ｆ２１、Ｆ３１およびＦ４１の左側へ拡張することを可能にする。
【０２８２】
図７は、「１」または「０」に置き換えて表示された場合の図８の４×４のＦＰＧＡアレイに対する連結マトリクスを示す。この連結マトリクスは、ハードウェアマッピング、配置、および、このＳエミュレーションシステムに対するルーティングプロセスに用いられるコスト関数から生じる設置コストを生成するために用いられる。このコスト関数は、図６に関して上記で説明された。例として、チップＦ１１は、チップＦ１３から１ホップ内に位置し、従って、Ｆ１１〜Ｆ１３に関する連結マトリクスの入力は、「１」である。
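The interconnect scheme of FIG. 8 and the connectivity matrix of FIG. 7 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it takes the description literally (every chip has a direct interconnect to each chip in its own row and in its own column), it assumes that a matrix entry of "0" means "no direct interconnect", and it sets the diagonal to 0 because the text does not say how a chip's entry against itself is shown. The function names are hypothetical.

```python
from itertools import product

ROWS, COLS = 4, 4

def name(chip):
    """Label a (row, col) position in the F<row><col> style used for FIG. 8."""
    row, col = chip
    return f"F{row + 1}{col + 1}"

CHIPS = list(product(range(ROWS), range(COLS)))

def direct(a, b):
    """True when two distinct chips share a row or a column (one interconnect)."""
    return a != b and (a[0] == b[0] or a[1] == b[1])

def hops(a, b):
    """Number of interconnects traversed between two chips."""
    if a == b:
        return 0
    if direct(a, b):
        return 1
    via = (a[0], b[1])  # along a's row first, then along b's column
    assert direct(a, via) and direct(via, b)
    return 2

# 1 = direct interconnect, 0 = none (the diagonal is set to 0 by assumption).
matrix = {(name(a), name(b)): int(direct(a, b)) for a in CHIPS for b in CHIPS}
links_per_chip = {name(a): sum(direct(a, b) for b in CHIPS) for a in CHIPS}
max_hops = max(hops(a, b) for a in CHIPS for b in CHIPS)

print(links_per_chip["F11"], matrix[("F11", "F13")], matrix[("F11", "F33")], max_hops)
```

Under these assumptions every chip ends up with six interconnects and no pair of chips is more than two hops apart, which matches the description of chips F11 and F33. The route chosen in `hops` for F11 to F33 passes through F13, i.e. the "row first, then column" path.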
【０２８３】
図２１は、本発明の一実施形態に従う単一のＦＰＧＡチップに対する相互接続ピン−アウトを示す。ここで、各チップは、相互接続の六つのセットを有し、各セットは、特定の数のピンを含む。一実施形態において、各セットは４４個のピンを有する。各ＦＰＧＡチップの相互接続は、水平（東西）、および、垂直（南北）を向く。西向きの相互接続のセットは、Ｗ［４３：０］として表示される。東向きの相互接続のセットは、Ｅ［４３：０］として表示される。北向きの相互接続のセットは、Ｎ［４３：０］として表示される。南向きの相互接続のセットは、Ｓ［４３：０］として表示される。相互接続のこれらの完全なセットは、隣接するチップへの接続に関する。つまり、これらの相互接続は、任意のチップを越えて「ホップ」しない。例えば、図８において、チップＦ３３は、Ｎ［４３：０］に対する相互接続６０７、Ｅ［４３：０］に対する相互接続６０８、Ｓ［４３：０］に対する相互接続６０９、および、Ｗ［４３：０］に対する相互接続６１１を有する。
【０２８４】
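Returning to FIG. 21, two further sets of interconnections remain. One set is for the non-adjacent interconnections running vertically, YH[21:0] and YH[43:22]. The other set is for the non-adjacent interconnections running horizontally, XH[21:0] and XH[43:22]. Each set, YH[...] and XH[...], is divided in two, with each half containing 22 pins. This arrangement allows every chip to be manufactured identically. Each chip can thus be interconnected in one hop to the non-adjacent chips located above, below, to the left, and to the right. The FPGA chip also shows the pin(s) for global signals, the FPGA bus, and JTAG signals.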
【０２８５】
次に、ＦＰＧＡＩ／Ｏコントローラが説明される。このコントローラは、アイテム３２７として図１０で最初に簡潔に導入された。ＦＰＧＡＩ／Ｏコントローラは、データを管理し、ＰＣＩバスとＦＰＧＡアレイとの間のトラフィックを制御する。
【０２８６】
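FIG. 22 shows one embodiment of the FPGA I/O controller between the PCI bus and the FPGA array, along with the banks of FPGA chips. FPGA I/O controller 700 includes CTRL_FPGA unit 701, clock buffer 702, PCI controller 703, EEPROM 704, FPGA serial configuration interface 705, boundary scan test interface 706, and buffer 707. Appropriate power/voltage regulating circuitry, as known to those skilled in the art, is provided. Exemplary sources include V_CC coupled to a voltage detector/regulator and a sense amplifier that substantially maintains the voltage under various environmental conditions. V_CC is supplied to each FPGA chip through fast-acting thin-film fuses. V_CC-HI is provided to CONFIG# of all FPGA chips and to LINTI# of LOCAL_BUS 708.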
【０２８７】
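CTRL_FPGA unit 701 is the primary controller of FPGA I/O controller 700; it handles the various control and test functions and the reading and writing of data among the various units and buses. CTRL_FPGA unit 701 is coupled to the low bank and high bank of FPGA chips. FPGA chips F41 to F44 and F21 to F24 (i.e., the low bank) are coupled to low FPGA bus 718. FPGA chips F31 to F34 and F11 to F14 (i.e., the high bank) are coupled to high FPGA bus 719. These FPGA chips F11 to F14, F21 to F24, F31 to F34, and F41 to F44 correspond to the FPGA chips of FIG. 8 and retain their reference numbers.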
【０２８８】
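Between these FPGA chips F11 to F14, F21 to F24, F31 to F34, and F41 to F44 and the low bank bus 718 and high bank bus 719 are thick-film chip resistors for appropriate loading. The group of resistors 713 coupled to low bank bus 718 includes, for example, resistors 716 and 717. The group of resistors 712 coupled to high bank bus 719 includes, for example, resistors 714 and 715.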
【０２８９】
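If expansion is desired, more FPGA chips may be installed on low bank bus 718 and high bank bus 719 in the direction to the right of FPGA chips F11 and F21. In one embodiment, expansion is done through piggyback boards resembling piggyback board 720. Thus, if these banks of FPGA chips initially had only eight FPGA chips, F41 to F44 and F31 to F34, further expansion is possible by adding piggyback board 720, which contains FPGA chips F24 to F21 in the low bank and F14 to F11 in the high bank. Piggyback board 720 also includes the additional low and high bank buses and thick-film chip resistors.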
【０２９０】
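PCI controller 703 is the primary interface between FPGA I/O controller 700 and the 32-bit PCI bus 709. If the PCI bus expands to 64 bits and/or 66 MHz, appropriate adjustments can be made in this system without departing from the spirit and scope of the present invention; these adjustments are discussed below. One example of a PCI controller 703 that may be used in the system is the PCI9080 or 9060 from PLX Technology. The PCI9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI 9080 Data Sheet (ver. 0.93, Feb. 28, 1997) is incorporated herein by reference.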
【０２９１】
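PCI controller 703 passes data between CTRL_FPGA unit 701 and PCI bus 709 via LOCAL_BUS 708. LOCAL_BUS includes a control bus portion for control signals, an address bus portion for address signals, and a data bus portion for data signals. If the PCI bus is expanded to 64 bits, the data bus portion of LOCAL_BUS 708 can also be expanded to 64 bits. PCI controller 703 is connected to EEPROM 704, which contains the configuration data for PCI controller 703. An exemplary EEPROM 704 is the 93CS46 from National Semiconductor.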
【０２９２】
ＰＣＩバス７０９は、３３ＭＨｚのクロック信号をＦＰＧＡＩ／Ｏコントローラ７００に供給する。クロック信号は、同期化の目的のため、および低タイミングスキューのためにワイヤ線７１０を介してクロックバッファ７０２に提供される。このクロックバッファ７０２の出力は、ワイヤ線７１１を介して全てのＦＰＧＡチップに供給され、かつ、ワイヤ線７２１を介してＣＴＲＬ＿ＦＰＧＡユニット７０１に供給された３３ＭＨｚのグローバルクロック（ＧＬ＿ＣＬＫ）信号である。ＰＣＩバスが６６ＭＨｚに拡張される場合、クロックバッファはまた、システムに６６ＭＨｚを供給する。
【０２９３】
ＦＰＧＡシリアル構成インタフェース７０５は、ＦＰＧＡチップＦ１１〜Ｆ１４、Ｆ２１〜Ｆ２４、Ｆ３１〜Ｆ３４、およびＦ４１〜Ｆ４４を構成するために構成データを提供する。Ａｌｔｅｒａデータブック（Ａｌｔｅｒａ、１９９６データブック（１９９６年６月））は、構成デバイスおよびプロセッサの詳細な情報を提供する。ＦＰＧＡシリアル構成インタフェース７０５はまた、ＬＯＣＡＬ＿ＢＵＳ７０８およびパラレルポート７２１に結合される。さらに、ＦＰＧＡシリアル構成インタフェース７０５は、ＣＯＮＦ＿ＩＮＴＦワイヤ線７２３を介して、ＣＴＲＬ＿ＦＰＧＡユニット７０１およびＦＰＧＡチップＦ１１〜Ｆ１４、Ｆ２１〜Ｆ２４、Ｆ３１〜Ｆ３４、およびＦ４１〜Ｆ４４に結合される。
【０２９４】
境界スキャンテストインタフェース７０６は、ある特有のテストコマンドセットのＪＴＡＧ装置を提供して、プロセッサの論理ユニットまたはシステムの論理ユニットおよびソフトウェアによる回路部を外部からチェックする。このインタフェース７０６は、ＩＥＥＥＳｔｄ．１１４９．１−１９９０規格に準拠する。Ａｌｔｅｒａデータブック（Ａｌｔｅｒａ、１９９６データブック（１９９６年６月）およびアプリケーションノート３９（ＡｌｔｅｒａデバイスにおけるＪＴＡＧ境界スキャンテスト）を参照して、それらは共に、さらなる情報のために本明細書中参考として援用される。境界スキャンテストインタフェース７０６は、さらに、ＬＯＣＡＬ＿ＢＵＳ７０８およびパラレルポート７２２に結合される。さらに、境界スキャンテストインタフェース７０６は、ＢＳＴ＿ＩＮＴＦワイヤ線７２４を介してＣＴＲＬ＿ＦＰＧＡユニット７０１およびＦＰＧＡチップＦ１１〜Ｆ１４、Ｆ２１〜Ｆ２４、Ｆ３１〜Ｆ３４、およびＦ４１〜Ｆ４４に接続される。
【０２９５】
ＣＴＲＬ＿ＦＰＧＡユニット７０１は、バッファ７０７、低バンク３２ビットＦＤ［３１：０］のＦ＿ＢＵＳ７２５および高バンク３２ビットＦＤ［６３：３２］のＦ＿ＢＵＳ７２６と共に、低バンク３２ビットバス７１８を介して、ＦＰＧＡチップの低（チップＦ４１〜Ｆ４４およびＦ２１〜Ｆ２４）バンク、および高バンク３２ビットバス７１９を介して、ＦＰＧＡチップの高（チップＦ３１〜Ｆ３４およびＦ１１〜Ｆ１４）バンクに／からデータをそれぞれ通す。
【０２９６】
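In one embodiment, the throughput of PCI bus 709 is matched by low bank bus 718 and high bank bus 719 together. PCI bus 709 is 32 bits wide at 33 MHz, so its throughput is 132 MB/s (= 33 MHz × 4 bytes). Low bank bus 718 is 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz), and high bank bus 719 is likewise 32 bits wide at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The throughput of the combined 64-bit low and high bank buses is therefore also 132 MB/s (= 16.5 MHz × 8 bytes). The performance of the low and high bank buses thus tracks that of the PCI bus. In other words, the performance limitation lies in the PCI bus, not in the low and high bank buses.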
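The throughput comparison above is simple arithmetic and can be checked with a short Python sketch. It assumes peak throughput, i.e. one full-width transfer per clock with no protocol overhead; the function name is hypothetical.

```python
def throughput_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Peak throughput in MB/s, assuming one transfer of width_bits per clock."""
    return clock_mhz * (width_bits / 8)

# PCI bus 709: 32 bits wide at 33 MHz.
pci = throughput_mb_per_s(33.0, 32)

# Low bank bus 718 plus high bank bus 719: 32 + 32 bits at half the PCI clock.
banks = throughput_mb_per_s(33.0 / 2, 32 + 32)

print(pci, banks)
```

Both values come out equal, which is the point of the paragraph: halving the clock while doubling the combined width leaves the FPGA bank buses exactly as fast as the PCI bus, so the PCI bus remains the limiting factor.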
【０２９７】
本発明の一実施形態によるアドレスポインタは、さらに各ソフトウェア／ハードウェア境界アドレススペースの各ＦＰＧＡチップにおいて実施される。これらのアドレスポインタは、多重化されたクロスチップアドレスポインタチェーンを通して、いくつかのＦＰＧＡチップにわたってつながれる。図９、１１、１２、１４、および１５に関する上述のアドレスポインタの考察を参照すること。所与のアドレススペースに関連したアドレスポインタのチェーンおよびいくつかのチップにわたってワード選択信号を移動するために、チェーンアウト（ｃｈａｉｎ−ｏｕｔ）ワイヤ線が提供される必要がある。これらのチェーンアウトワイヤ線は、チップ間の矢印として示される。低バンクに対する１つのこのようなチェーンアウトワイヤ線は、チップＦ２３とＦ２２との間のワイヤ線７３０である。高バンクに対する別のこのようなチェーンアウトワイヤ線は、チップＦ３１とＦ３２との間のワイヤ線７３１である。低バンクチップＦ２１の端部におけるチェーンアウトワイヤ線７３２は、ＬＡＳＴ＿ＳＨＩＦＴ＿ＬとしてＣＴＲＬ＿ＦＰＧＡユニット７０１に結合される。高バンクチップＦ１１の端部におけるチェーンアウトワイヤ線７３３は、ＬＡＳＴ＿ＳＨＩＦＴ＿ＨとしてＣＴＲＬ＿ＦＰＧＡユニット７０１に結合される。これらの信号ＬＡＳＴ＿ＳＨＩＦＴ＿ＬおよびＬＡＳＴ＿ＳＨＩＦＴ＿Ｈは、ワード選択信号がＦＰＧＡチップを介して伝達されるように、それらの各バンクのためのワード選択信号である。これらの信号ＬＡＳＴ＿ＳＨＩＦＴ＿ＬおよびＬＡＳＴ＿ＳＨＩＦＴ＿Ｈのどちらかが、ＣＴＲＬ＿ＦＰＧＡユニット７０１に対して論理「１」を表すと、これは、ワード選択信号がチップのそれぞれのバンクの端部に伝わることを示す。
【０２９８】
ＣＴＲＬ＿ＦＰＧＡユニット７０１は、ワイヤ線７３４の書き込み信号（Ｆ＿ＷＲ）、ワイヤ線７３５の読み出し信号（Ｆ＿ＲＤ）、ワイヤ線７３６のＤＡＴＡ＿ＸＳＦＲ信号、ワイヤ線７３７のＥＶＡＬ信号、およびワイヤ線７３８のＳＰＡＣＥ［２：０］信号をＦＰＧＡチップに、およびＦＰＧＡチップから提供する。ＣＴＲＬ＿ＦＰＧＡユニット７０１は、ワイヤ線７３９のＥＶＡＬ＿ＲＥＱ＃信号を受け取る。書き込み信号（Ｆ＿ＷＲ）、読み出し信号（Ｆ＿ＲＤ）、ＤＡＴＡ＿ＸＳＦＲ信号、およびＳＰＡＣＥ［２：０］信号は、ＦＰＧＡチップにおけるアドレスポインタと共に機能する。書き込み信号（Ｆ＿ＷＲ）、読み出し信号（Ｆ＿ＲＤ）、およびＳＰＡＣＥ［２：０］信号を用いて、ＳＰＡＣＥインデックス（ＳＰＡＣＥ［２：０］）によって決定されるように、選択されたアドレススペースと関連するアドレスポインタのＭＯＶＥ信号を生成する。ＤＡＴＡ＿ＸＳＦＲ信号を用いて、アドレスポインタを初期化して、逐語的データ転送プロセスを始める。
【０２９９】
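The EVAL_REQ# signal is used to start the evaluation cycle all over again if any of the FPGA chips asserts it. For example, to evaluate data, the data is transferred or written from main memory in the host processor's computing station to the FPGAs via the PCI bus. At the end of the transfer, the evaluation cycle begins, including address pointer initialization and the operation of the software clocks, to facilitate the evaluation process. For a variety of reasons, however, a particular FPGA chip may need to evaluate the data all over again. That FPGA chip asserts the EVAL_REQ# signal, and CTRL_FPGA unit 701 starts the evaluation cycle all over again.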
【０３００】
図２３は、図２２のＣＴＲＬ＿ＦＰＧＡユニット７０１およびバッファ７０７のより詳細な図を示す。図２２に示されるＣＴＲＬ＿ＦＰＧＡユニット７０１に関する同様の入力／出力信号およびそれらに対応する参照符号はまた、図２３において保持され、使用される。しかし、図２２に示されないさらなる信号およびワイヤ／バスライン（例えば、ＳＥＭ＿ＦＰＧＡ出力イネーブル１０１６、ローカル割り込み出力（ＬｏｃａｌＩＮＴＯ）７０８ａ、ローカル読み出し／書き込み制御信号７０８ｂ、ローカルアドレスバス７０８ｃ、ローカル割り込み入力（ＬｏｃａｌＩＮＴＩ＃）７０８ｄ、およびローカルデータバス７０８ｅ）が、新しい参照符号と共に記載される。
【０３０１】
ＣＴＲＬ＿ＦＰＧＡユニット７０１は、転送完了チェッキング論理（ＸＳＦＲ＿ＤＯＮＥＬｏｇｉｃ）１０００、評価制御論理（ＥＶＡＬＬｏｇｉｃ）１００１、ＤＭＡ記述子ブロック１００２、制御レジスタ１００３、評価タイマー論理（ＥＶＡＬｔｉｍｅｒ）１００４、アドレス復号器１００５、書き込みフラグシーケンサ論理１００６、ＦＰＧＡチップ読み出し／書き込み制御論理（ＳＥＭ＿ＦＰＧＡＲ／ＷＬｏｇｉｃ）１００７、デマルチプレクサおよびラッチ（ＤＥＭＵＸｌｏｇｉｃ）１００８、および図２２のバッファ７０７に対応するラッチ１００９〜１０１２を含む。ワイヤ／バス７２１によるグローバルクロック信号（ＣＴＲＬ＿ＦＰＧＡ＿ＣＬＫ）は、ＣＴＲＬ＿ＦＰＧＡユニット７０１における全ての論理素子／ブロックに提供される。
【０３０２】
転送完了チェッキング論理（ＸＳＦＲ＿ＤＯＮＥ）１０００は、ＬＡＳＴ＿ＳＨＩＦＴ＿Ｈ７３３、ＬＡＳＴ＿ＳＨＩＦＴ＿Ｌ７３２、およびローカルＩＮＴＯ７０８ａを受け取る。ＸＳＦＲ＿ＤＯＮＥ論理１０００は、ワイヤ／バス１０１３によりＥＶＡＬ論理１００１に転送完了信号（ＸＳＦＲ＿ＤＯＮＥ）を出力する。ＬＡＳＴ＿ＳＨＩＦＴ＿Ｈ７３３およびＬＡＳＴ＿ＳＨＩＦＴ＿Ｌ７３２の受信に基づいて、ＸＳＦＲ＿ＤＯＮＥ論理１０００は、データ転送完了のチェックをして、所望ならば、評価サイクルが始まり得る。
【０３０３】
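In addition to the transfer done signal (XSFR_DONE) on wire/bus 1013, EVAL logic 1001 receives the EVAL_REQ# signal on wire/bus 739 and the WR_XSFR/RD_XSFR signal on wire/bus 1015. EVAL logic 1001 generates two output signals: Start EVAL on wire/bus 1014 and DATA_XSFR on wire/bus 736. The EVAL logic indicates when data transfer between the FPGA bus and the PCI bus should begin, in order to initialize the address pointers. It receives the XSFR_DONE signal when the data transfer is complete. The WR_XSFR/RD_XSFR signal indicates whether the transfer is a read or a write. Once the I/O cycle is complete (or before an I/O cycle begins), the EVAL logic can start the evaluation cycle by sending the Start EVAL signal to the EVAL timer. The EVAL timer dictates the duration of the evaluation cycle and ensures the successful operation of the software clock mechanism by keeping the evaluation cycle active for as long as is needed for data propagation to all registers and combinational components to stabilize.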
【０３０４】
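DMA descriptor block 1002 receives the local bus address on wire/bus 1019, the write enable signal on wire/bus 1020 from address decoder 1005, and local bus data on wire/bus 1029 via local data bus 708e. Its output is the DMA descriptor output on wire/bus 1046, which reaches DEMUX logic 1008 on wire/bus 1045. DMA descriptor block 1002 contains descriptor block information corresponding to that in host memory, including the PCI address, local address, transfer count, transfer direction, and address of the next descriptor block. The host also sets the address of the initial descriptor block in the descriptor pointer register of the PCI controller. The transfer can be started by setting a control bit. The PCI controller loads the first descriptor block and initiates the data transfer. The PCI controller continues to load descriptor blocks and transfer data until it detects that the end-of-chain bit is set in the next descriptor pointer register.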
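The chained-descriptor transfer described above can be pictured as a linked-list walk. The Python sketch below is a simplified model, not the register layout of any real PCI controller: the `Descriptor` class, its field names, the boolean encoding of the transfer direction, and the dictionary standing in for host memory are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Descriptor:
    pci_address: int       # PCI-side address of the data
    local_address: int     # local-bus-side address of the data
    transfer_count: int    # number of bytes to move
    to_local: bool         # transfer direction (hypothetical encoding)
    next_address: int      # host-memory address of the next descriptor block
    end_of_chain: bool     # set on the last descriptor block

def walk_chain(host_memory: Dict[int, Descriptor], first_address: int) -> List[Descriptor]:
    """Load descriptor blocks from first_address until end_of_chain is set."""
    processed: List[Descriptor] = []
    seen = set()
    address = first_address
    while True:
        if address in seen:
            raise ValueError("descriptor chain loops without an end-of-chain bit")
        seen.add(address)
        block = host_memory[address]
        processed.append(block)  # stand-in for performing the data transfer
        if block.end_of_chain:
            return processed
        address = block.next_address

host_memory = {
    0x100: Descriptor(0x1000, 0x00, 64, True, 0x200, False),
    0x200: Descriptor(0x2000, 0x40, 32, False, 0x000, True),
}
chain = walk_chain(host_memory, 0x100)
total_bytes = sum(d.transfer_count for d in chain)
print(len(chain), total_bytes)
```

The loop check is a defensive addition for the sketch; the description itself only says that loading continues until the end-of-chain bit is detected.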
【０３０５】
アドレス復号器１００５は、バス７０８ｂのローカルＲ／Ｗ制御信号を受け取って送信し、バス７０８ｃのローカルアドレス信号を受け取って送信する。アドレス復号器１００５は、ＤＭＡ記述子１００２へのワイヤ／バス１０２０による書き込みイネーブル信号、制御レジスタ１００３へのワイヤ／バス１０２１による書き込みイネーブル信号、ワイヤ／バス７３８によるＦＰＧＡアドレスＳＰＡＣＥインデックス、ワイヤ／バス１０２７による制御信号、およびＤＥＭＵＸ論理１００８によるワイヤ／バス１０２４による別の制御信号を生成する。
【０３０６】
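Control register 1003 receives the write enable signal on wire/bus 1021 from address decoder 1005 and data on wire/bus 1030 via local data bus 708e. Control register 1003 generates the WR_XSFR/RD_XSFR signal on wire/bus 1015 to EVAL logic 1001, the Set EVAL time signal on wire/bus 1041 to EVAL timer 1004, and the SEM_FPGA output enable signal on wire/bus 1016 to the FPGA chips. The system uses the SEM_FPGA output enable signal to selectively turn on or enable each FPGA chip. Typically, the system enables the FPGA chips one at a time.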
【０３０７】
ＥＶＡＬタイマー１００４は、ワイヤ／バス１０１４のスタートＥＶＡＬ信号、およびワイヤ／バス１０４１のセットＥＶＡＬタイムを受け取る。ＥＶＡＬタイマー１００４は、ワイヤ／バス７３７のＥＶＡＬ信号、ワイヤ／バス１０１７の評価完了（ＥＶＡＬ＿ＤＯＮＥ）信号、および書き込みフラグシーケンサ論理１００６へのワイヤ／バス１０１８のスタート書き込みフラグ信号を生成する。一実施形態において、ＥＶＡＬタイマーは６ビット長である。
【０３０８】
書き込みフラグシーケンサ論理１００６は、ＥＶＡＬタイマー１００４からワイヤ／バス１０１８のスタート書き込みフラグ信号を受け取る。書き込みフラグシーケンサ論理１００６は、ローカルＲ／Ｗのワイヤ／バス７０８ｂへのワイヤ／バス１０２２のローカルＲ／Ｗ制御信号、ローカルアドレスバス７０８ｃへのワイヤ／バス１０２３のローカルアドレス信号、ローカルデータバス７０８ｅへのワイヤ／バス１０２８のローカルデータ信号、およびワイヤ／バス７０８ｄのローカルＩＮＴＩ＃を生成する。スタート書き込みフラグ信号を受け取ると、書き込みフラグシーケンサ論理は、制御信号のシーケンスを始めて、ＰＣＩバスへのメモリ書き込みサイクルを始める。
【０３０９】
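SEM_FPGA R/W control logic 1007 receives the control signal on wire/bus 1027 from address decoder 1005 and the local R/W control signal on wire/bus 1047 via local R/W control bus 708b. SEM_FPGA R/W control logic 1007 generates the enable signal on wire/bus 1035 to latch 1009, the control signal on wire/bus 1025 to DEMUX logic 1008, the enable signal on wire/bus 1037 to latch 1011, the enable signal on wire/bus 1040 to latch 1012, the F_WR signal on wire/bus 734, and the F_RD signal on wire/bus 735. SEM_FPGA R/W control logic 1007 controls the various write and read data transfers to and from the low bank and high bank FPGA buses.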
【０３１０】
ＤＥＭＵＸ論理１００８は、マルチプレクサおよびラッチである。マルチプレクサおよびラッチは、入力信号の４つのセット受け取り、ローカルデータバス７０８ｅに対してワイヤ／バス１０２６の１つのセットの信号を出力する。セレクタ信号は、ＳＥＭ＿ＦＰＧＡＲ／Ｗ制御論理１００７からのワイヤ／バス１０２５の制御信号およびアドレス復号器１００５からワイヤ／バス１０２４の制御信号である。ＤＥＭＵＸ論理１００８は、ワイヤ／バス１０４２のＥＶＡＬ＿ＤＯＮＥ信号からの信号、ワイヤ／バス１０４３のＸＳＦＲ＿ＤＯＮＥ信号、およびワイヤ／バス１０４４のＥＶＡＬ信号の１つのセットを受け取る。この１つのセットの信号を参照符号１０４８として呼ぶ。任意のある周期において、これら３つの信号、ＥＶＡＬ＿ＤＯＮＥ、ＸＳＦＲ＿ＤＯＮＥ、およびＥＶＡＬのうち１つのみが選択を可能にするためにＤＥＭＵＸ論理１００８に提供される。ＤＥＭＵＸ論理１００８はまた、入力信号の他の３つのセットとして、ＤＭＡ記述子ブロック１００２からのワイヤ／バス１０４５のＤＭＡ記述子出力信号、ラッチ１０１２からのワイヤ／バス１０３９のデータ出力、およびラッチ１０１０からのワイヤ／バス１０３４の別のデータ出力を受け取る。
【０３１１】
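The data buffers between CTRL_FPGA unit 701 and the low and high FPGA bank buses comprise latches 1009 to 1012. Latch 1009 receives local bus data on wire/bus 1032 via wire/bus 1031 and local data bus 708e, and the enable signal on wire/bus 1035 from SEM_FPGA R/W control logic 1007. Latch 1009 outputs data on wire/bus 1033 to latch 1010.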
【０３１２】
ラッチ１０１０は、ラッチ１００９からのワイヤ／バス１０３３のデータ、ＳＥＭ＿ＦＰＧＡＲ／Ｗ制御論理１００７からのワイヤ／バス１０３７を介するワイヤ／バス１０３６のイネーブル信号を受け取る。ラッチ１０１０は、ＦＰＧＡの低バンクバスに対してワイヤ／バス７２５のデータ、およびワイヤ／バス１０３４を介してＤＥＭＵＸ論理１００８を出力する。
【０３１３】
ラッチ１０１１は、ローカルデータバス７０８ｅからのワイヤ／バス１０３１のデータ、およびＳＥＭ＿ＦＰＧＡＲ／Ｗ制御論理１００７からのワイヤ／バス１０３７のイネーブル信号を受け取る。ラッチ１０１１は、ＦＰＧＡの高バンクバスに対してワイヤ／バス７２６のデータ、およびラッチ１０１２に対してワイヤ／バス１０３８のデータを出力する。
【０３１４】
ラッチ１０１２は、ラッチ１０１１からワイヤ／バス１０３８のデータ、ＳＥＭ＿ＦＰＧＡＲ／Ｗ制御論理１００７からワイヤ／バス１０４０のイネーブル信号を受け取る。ラッチ１０１２は、ＤＥＭＵＸ１００８に対してワイヤ／バス１０３９により出力する。
【０３１５】
図２４は、４×４ＦＰＧＡアレイ、ＦＰＧＡバンクに対するその関係、および拡張性能を示す。図８のように、図２４は、同様の４×４アレイを示す。ＣＴＲＬ＿ＦＰＧＡユニット７４０がさらに示される。低バンクチップ（チップＦ４１〜Ｆ４４およびＦ２１〜Ｆ２４）および高バンクチップ（チップＦ３１〜Ｆ３４およびＦ１１〜Ｆ１４）は、代替の様態で構成される。従って、下の列から上の列にＦＰＧＡチップの列を特徴付ける（低バンク−高バンク−低バンク−高バンク）。データ転送チェーンは、所定の順番に従う。低バンクのデータ転送チェーンは、矢印７４１によって示される。高バンクのデータ転送チェーンは、矢印７４２によって示される。ＪＴＡＧ構成チェーンは、矢印７４３によって示される。矢印７４３は、Ｆ４１からＦ４４へ、Ｆ３４からＦ３１へ、Ｆ２１からＦ２４へ、およびＦ１４からＦ１１へ１６チップの全体のアレイを介して通り、ＣＴＲＬ＿ＦＰＧＡユニット７４０に戻る。
【０３１６】
拡張は、ピギーバックボードに達成され得る。ＦＰＧＡチップのオリジナルアレイが、Ｆ４１〜Ｆ４４およびＦ３１〜Ｆ３４を含むことを図２４において想定すると、チップＦ２１〜Ｆ２４およびＦ１１〜Ｆ１４の２つのさらなる列の追加がピギーバックボード７４５に達成され得る。ピギーバックボード７４５はまた、適したバスを含み、バンクを拡張する。アレイにおいて他の頂上に置かれたさらなるピギーバックボードと共にさらなる拡張が達成され得る。
【０３１７】
図２５は、ハードウェアの起動方法の一実施形態を示す。工程８００は、パワーオンまたはウォームブートシーケンスを開始する。工程８０１において、ＰＣＩコントローラは、初期化するためにＥＥＰＲＯＭを読み出す。工程８０２は、初期化シーケンスを考慮してＰＣＩコントローラレジスタを読み出し、ＰＣＩコントローラレジスタに書き込む。工程８０３の境界スキャンは、アレイにおいて全てのＦＰＧＡチップをテストする。工程８０４は、ＦＰＧＡＩ／ＯコントローラのＣＴＲＬ＿ＦＰＧＡユニットを構成する。工程８０５は、レジスタを読み出して、ＣＴＲＬ＿ＦＰＧＡユニットにおけるレジスタに書き込む。工程８０６は、ＤＭＡマスター読み出し／書き込みモードのためのＰＣＩコントローラを設定する。その後、データは、転送されて確認される。工程８０７は、テスト設計と共に全てのＦＰＧＡチップを構成し、それの正確さを確認する。工程８０８において、ハードウェアは、使用するための準備が整っている。この段階では、システムは、全ての工程がハードウェアの動作性を積極的に確認することを想定しており、そうでなければ、システムは、工程８０８に到達しないことになる。
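The start-up flow of FIG. 25 is a strictly ordered sequence in which the "ready" state (step 808) is reached only if every earlier step confirms correct operation. A minimal Python sketch of that control flow follows. The step descriptions paraphrase the text; the check functions are hypothetical placeholders, since the description does not say how each check is performed.

```python
from typing import Callable, List, Tuple

Step = Tuple[int, str, Callable[[], bool]]

def always_ok() -> bool:
    """Placeholder check; a real system would probe the hardware here."""
    return True

def always_fail() -> bool:
    """Placeholder for a step whose confirmation fails."""
    return False

def make_steps(failing: int = 0) -> List[Step]:
    """Build the step list of FIG. 25, optionally forcing one step to fail."""
    descriptions = [
        (801, "PCI controller reads EEPROM for initialization"),
        (802, "read and write PCI controller registers"),
        (803, "boundary scan test of all FPGA chips in the array"),
        (804, "configure CTRL_FPGA unit in the FPGA I/O controller"),
        (805, "read and write registers in the CTRL_FPGA unit"),
        (806, "set up PCI controller for DMA master read/write; verify data"),
        (807, "configure all FPGA chips with a test design and verify it"),
    ]
    return [(n, d, always_fail if n == failing else always_ok) for n, d in descriptions]

def run_startup(steps: List[Step]) -> Tuple[bool, List[int]]:
    """Run steps in order; stop at the first failed check."""
    completed: List[int] = []
    for number, _description, check in steps:
        if not check():
            return False, completed
        completed.append(number)
    return True, completed  # corresponds to step 808: hardware ready

ready, done = run_startup(make_steps())
failed_ready, failed_done = run_startup(make_steps(failing=803))
print(ready, done, failed_ready, failed_done)
```

With a forced failure at step 803 the sequence stops after steps 801 and 802 and never reports the hardware as ready, mirroring the statement that step 808 is not reached unless every step is positively confirmed.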
【０３１８】
（Ｅ．より高密度のＦＰＧＡチップを用いる代替の実施形態）
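In one embodiment of the present invention, the FPGA logic devices are provided on individual boards. If more FPGA logic devices are required to model the user's circuit design than are provided on an individual board, multiple boards with more FPGA logic devices can be provided. The ability to add more boards to the simulation system is a desirable feature of the present invention. In this embodiment, denser FPGA chips, such as the Altera 10K130V and 10K250V, are used. Use of these chips alters the board design so that only four FPGA chips are used per board, instead of eight less dense FPGA chips (e.g., Altera 10K100).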
【０３１９】
シミュレーションシステムのマザーボードに、これらのボードを結合するためには課題がある。相互接続および接続のスキームは、バックプレーン不足のための補正を行う必要がある。シミュレーションシステムにおけるＦＰＧＡアレイは、特定の相互接続構成を介してマザーボード上に提供される。相互接続が隣り合う直接隣接した相互接続（すなわち、Ｎ［７３：０］、Ｓ［７３：０］、Ｗ［７３：０］、Ｅ［７３：０］）、および１つ置きに隣接した相互接続（すなわち、ＮＨ［２７：０］、ＳＨ［２７：０］、ＸＨ［３６：０］、ＸＨ［７２：３７］）によって配置され、単一のボード内に、および異なるボードにわたってローカルバス接続を除外する場合には、各チップは、８セット以下の相互接続を有し得る。各チップは、隣接した隣り合うチップに直接、または隣接しない上、下、左、および右に位置されたチップに１つ置きで相互接続されることが可能である。Ｘ方向（東−西）において、アレイは環状（ｔｏｒｕｓ）になる。Ｙ方向（北−南）において、アレイはメッシュ状になる。
【０３２０】
相互接続は、単一のボード内に論理デバイスおよび他の構成要素を連結し得る。しかし、相互ボードコネクタを提供して、異なるボードにわたってこれらのボードと相互接続とを共に連結して、（１）マザーボードおよびアレイボードを介するＰＣＩバス、ならびに（２）任意の２つのアレイボード間に信号を伝える。各ボードは、それ自体ＦＰＧＡバスＦＤ［６３：０］を含む。ＦＰＧＡバスＦＤ［６３：０］により、ＦＰＧＡ論理デバイスは、ＳＲＡＭメモリデバイスおよびＣＴＲＬ＿ＦＰＧＡユニット（ＦＰＧＡＩ／Ｏコントローラ）と互いに通信可能である。ＦＰＧＡバスＦＤ［６３：０］は、複数のボードにわたって提供されない。しかし、ＦＰＧＡ相互接続は、複数のボードにわたってＦＰＧＡ論理デバイス間に接続性を提供する。しかし、これらの相互接続はＦＰＧＡバスに関係しない。一方、ローカルバスは、全てのボードにわたって提供される。
【０３２１】
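Motherboard connectors connect a board to the motherboard and hence to the PCI bus, power, and ground. For some boards, the motherboard connector is not used for direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighboring boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard. The interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder-side to component-side. PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied through the other motherboard connectors for those boards. Placed solder-side to component-side, the various inter-board connectors allow communication among the PCI bus components, FPGA logic devices, memory devices, and various simulation system control circuits.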
【０３２２】
図５６は、本発明の一実施形態によるＦＰＧＡチップ構成のアレイの高レベルブロック図を示す。ＣＴＲＬ＿ＦＰＧＡユニット１２００は、上述のように、ライン１２０９を介してバス１２１０に結合される。一実施形態において、ＣＴＲＬ＿ＦＰＧＡユニット１２００は、例えば、Ａｌｔｅｒａ１０Ｋ５０チップといった、ＦＰＧＡチップ形式のプログラム可能な論理デバイス（ＰＬＤ）である。バス１２１０により、ＣＴＲＬ＿ＦＰＧＡユニット１２００は、（もしあれば）他のシミュレーションアレイボード、および他のチップ（例えば、ＰＣＩコントローラ、ＥＥＰＲＯＭ、クロックバッファ）に結合されることが可能になる。図５６は、論理デバイスおよびメモリデバイス形式において他の主要な機能性ブロックを示す。一実施形態において、論理デバイスは、例えば、Ａｌｔｅｒａ１０Ｋ１３０Ｖまたは１０Ｋ２５０ＶチップといったＦＰＧＡチップ形式のプログラム可能論理デバイス（ＰＬＤ）である。１０Ｋ１３０Ｖまたは１０Ｋ２５０Ｖはピン互換性を持ち、両方とも５９９ピンＰＧＡパッケージである。従って、アレイの８つのＡｌｔｅｒａＦＬＥＸ１０Ｋ１００チップで上述された実施形態の代わりに、本実施形態は、ＡｌｔｅｒａのＦＬＥＸ１０Ｋ１３０の４つのチップのみを使用する。本発明の一実施形態は、これら４つの論理デバイスおよびこれらの相互接続を含むボードを記載する。
【０３２３】
ユーザの設計が、アレイにおけるこれらの任意の数の論理デバイスでかたどられ、構成されるので、内部ＦＰＧＡ論理デバイス通信は、ユーザの回路設計の一部を別の部分に接続するために必要である。さらに、内部構成情報および境界はまた、内部ＦＰＧＡ相互接続によってサポートされる。最終的に、必要なシミュレーションシステム制御信号は、シミュレーションシステムとＦＰＧＡ論理デバイスとの間をアクセス可能になる必要がある。
【０３２４】
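FIG. 36 shows the hardware architecture of an FPGA logic device used in the present invention. FPGA logic device 1500 includes 102 top I/O pins, 102 bottom I/O pins, 111 left I/O pins, and 110 right I/O pins, so the total number of interconnect pins is 425. A further 45 I/O pins are dedicated to GCLK, the FPGA bus FD[31:0] (FD[63:32] is dedicated to the high bank), F_RD, F_WR, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], EVAL, EVAL_REQ_N, DEVICE_OE (the signal from the CTRL_FPGA unit that turns on the output pins of the FPGA logic devices), and DEV_CLRN (the signal from the CTRL_FPGA unit that clears all internal flip-flops before simulation starts). Any data and control signals that cross between any two FPGA logic devices are therefore carried by these interconnections. The remaining pins are dedicated to power and ground.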
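The count of 45 dedicated I/O pins can be reproduced by tallying the listed signals. The Python sketch below assumes that every signal without a bit range occupies exactly one pin and that a bus occupies one pin per bit; the write strobe is named F_WR as in the description of FIG. 22.

```python
# Pin count per dedicated signal (assumption: one pin per bit).
DEDICATED_PINS = {
    "GCLK": 1,
    "FD[31:0]": 32,     # FPGA bus, low-bank half; FD[63:32] serves the high bank
    "F_RD": 1,
    "F_WR": 1,
    "DATAXSFR": 1,
    "SHIFTIN": 1,
    "SHIFTOUT": 1,
    "SPACE[2:0]": 3,
    "EVAL": 1,
    "EVAL_REQ_N": 1,
    "DEVICE_OE": 1,
    "DEV_CLRN": 1,
}

INTERCONNECT_PINS = 425  # total given for the inter-FPGA interconnects

dedicated_total = sum(DEDICATED_PINS.values())
signal_pins = INTERCONNECT_PINS + dedicated_total

print(dedicated_total, signal_pins)
```

Under these assumptions the dedicated signals account for exactly 45 pins, so interconnect and dedicated signals together use 470 pins; whatever remains on the package is left for power and ground, as the text states.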
【０３２５】
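FIG. 37 shows the FPGA interconnect pin-out for a single FPGA chip in accordance with one embodiment of the present invention. Each chip 1510 may have up to eight sets of interconnections, where each set comprises a particular number of pins. Some chips may have fewer than eight sets of interconnections, depending on their respective positions on the board. In the preferred embodiment, a chip may have seven sets of interconnections in total, although the specific sets used may vary from chip to chip depending on position on the board. The interconnections for each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of interconnections for the west direction is labeled W[73:0]. The set for the east direction is labeled E[73:0]. The set for the north direction is labeled N[73:0]. The set for the south direction is labeled S[73:0]. These complete sets of interconnections are for connections to adjacent chips; that is, these interconnections do not "hop" over any chip. For example, in FIG. 39, chip 1570 has interconnection 1540 for N[73:0], interconnection 1542 for W[73:0], interconnection 1543 for E[73:0], and interconnection 1545 for S[73:0]. This FPGA chip 1570, which is also the FPGA2 chip, has all four sets of adjacent interconnections (N[73:0], S[73:0], W[73:0], and E[73:0]). The west interconnection of FPGA0 connects to the east interconnection of FPGA3 through wire 1539 via a torus-style interconnection. Wire 1539 thus allows chips 1569 (FPGA0) and 1572 (FPGA3) to be directly coupled to each other, as if the west and east ends of the board were wrapped around to meet each other.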
【０３２６】
図３７に戻ると、４セットの「ホッピング（ｈｏｐｐｉｎｇ）」相互接続が提供される。垂直方向（ＮＨ［２７：０］およびＳＨ［２７：０］）に走る２セットの相互接続は、隣接しない相互接続である。例えば、図３９のＦＰＧＡ２チップ１５７０は、ＮＨ相互接続１５４１およびＳＨ相互接続１５４６を示す。図３７に戻ると、水平方向（ＸＨ［３６：０］およびＸＨ［７２：３７］）に走る他の２つのセットの相互接続は、隣接しない相互接続である。例えば、図３９のＦＰＧＡ２チップ１５７０は、ＸＨ相互接続１５４４を示す。
【０３２７】
図３７に戻ると、垂直方向のホッピング相互接続ＮＨ［２７：０］およびＳＨ［２７：０］はそれぞれ２８ピンを有する。水平方向の相互接続は、７３ピン、ＸＨ［３６：０］およびＸＨ［７２：３７］を有する。水平方向の相互接続ピン、ＸＨ［３６：０］およびＸＨ［７２：３７］は、西側（例えば、図３９におけるＦＰＧＡ３チップ１５７６の相互接続１６０５）および／または東側（例えば、図３９におけるＦＰＧＡ０チップ１５７３の相互接続１６０２）で使用され得る。この構成により、各チップは、同様に製造されることが可能になる。従って、各チップは、上部、下部、左部および右部に設置される隣接しないチップに対して１つ置きに相互接続されることが可能である。
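The widths of the eight interconnect sets of FIG. 37 follow directly from their bit ranges. The short Python sketch below derives each width from its `[msb:lsb]` label and adds them up; the helper name `width` is hypothetical.

```python
def width(msb: int, lsb: int) -> int:
    """Number of pins in a bus labeled [msb:lsb]."""
    return msb - lsb + 1

SETS = {
    "N[73:0]": width(73, 0),
    "S[73:0]": width(73, 0),
    "W[73:0]": width(73, 0),
    "E[73:0]": width(73, 0),
    "NH[27:0]": width(27, 0),
    "SH[27:0]": width(27, 0),
    "XH[36:0]": width(36, 0),
    "XH[72:37]": width(72, 37),
}

adjacent = sum(SETS[k] for k in ("N[73:0]", "S[73:0]", "W[73:0]", "E[73:0]"))
vertical_hop = SETS["NH[27:0]"] + SETS["SH[27:0]"]
horizontal_hop = SETS["XH[36:0]"] + SETS["XH[72:37]"]
total = adjacent + vertical_hop + horizontal_hop

print(adjacent, vertical_hop, horizontal_hop, total)
```

The horizontal hopping sets add up to the 73 pins stated in the text (37 + 36). The grand total over all eight sets comes to 425, which appears to agree with the total number of interconnect pins given for the FPGA logic device of FIG. 36, although the description does not spell out that derivation.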
【０３２８】
図３９は、本発明の一実施形態による単一のマザーボード上の６つのボードの直接隣接するＦＰＧＡアレイ、および１つ置きに隣接したＦＰＧＡアレイのレイアウトを示す。この図を用いて、２つの可能な構成（６ボードシステムおよび２ボードシステム）を示す。位置表示１５５０は、「Ｙ」方向が南北の方向であり、「Ｘ」方向が東西の方向であることを示す。Ｘ方向では、アレイは環状である。Ｙ方向では、アレイはメッシュ状である。図３９において、高レベルにおける、ボード、ＦＰＧＡ論理デバイス、相互接続、およびコネクタのみが示される。マザーボードおよび他のサポートする構成要素（例えば、ＳＲＡＭメモリデバイス）およびワイヤ線（例えば、ＦＰＧＡバス）が示されない。
【０３２９】
図３９がボードおよび他の構成要素、相互接続、およびコネクタのアレイ図を提供することに留意されたい。実際の物理的な構成および設定は、これらそれぞれの端部の部品面上のこれらのボードをはんだ面に置くことを含む。ボードの約半分が、マザーボードに直接接続され、残りの半分は、それらのそれぞれ隣接するボードに接続される。
【０３３０】
本発明の６つのボードの実施形態において、６つのボード１５５１（ボード１）、１５５２（ボード２）、１５５３（ボード３）、１５５４（ボード４）、１５５５（ボード５）、および１５５６（ボード６）が図１の再構成可能なハードウェアユニット２０の一部としてマザーボード（図示せず）上に提供される。各ボードは、構成要素およびコネクタのほとんど同等のセットを含む。従って、図示の目的のため、６つ目のボード１５５６は、ＦＰＧＡ論理デバイス１５６５〜１５６８、コネクタ１５５７〜１５６０および１５８１を含む。５つ目のボード１５５５は、ＦＰＧＡ論理デバイス１５６９〜１５７２およびコネクタ１５８２および１５８３を含み、４つ目のボード１５５４は、ＦＰＧＡ論理デバイス１５７３〜１５７６およびコネクタ１５８４および１５８５を含む。
【０３３１】
この６つのボード構成において、ボード１１５５１およびボード６１５５６は、「ブックエンド」ボードとして提供される。「ブックエンド」ボードは、例えば、ボード６１５５６上のＲ−パック端子１５５７〜１５６０およびボード１１５５１上のＲ−パック端子１５９１〜１５９４といったＹ−メッシュ端子を含む。中間に配置されたボード（すなわち、ボード１５５２（ボード２）、１５５３（ボード３）、１５５４（ボード４）、および１５５５（ボード５））もまたアレイを完成するために提供される。
【０３３２】
上述されるように、相互接続は、直接隣接した相互接続（すなわち、Ｎ［７３：０］、Ｓ［７３：０］、Ｗ［７３：０］、Ｅ［７３：０］）、および１つ置きに隣接した相互接続（すなわち、ＮＨ［２７：０］、ＳＨ［２７：０］、ＸＨ［３６：０］、ＸＨ［７２：３７］）によって構成され、単一のボード内に、および異なるボードにわたってローカルバス接続を除外する。相互接続は、単独で、単一のボード内で論理デバイスおよび他の構成要素を結合し得る。しかし、内部ボードコネクタ１５８１〜１５９０は、異なるボード（すなわち、ボード１〜ボード６）にわたってＦＰＧＡ論理デバイス間を通信することを可能にする。ＦＰＧＡバスは、内部ボードコネクタ１５８１〜１５９０の一部である。これらのコネクタ１５８１〜１５９０は、５２０の信号および８０の電力／グランド接続を、２つの隣接したアレイボード間に伝達する６００ピンコネクタである。
【０３３３】
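In FIG. 39, the various boards are positioned in a non-symmetrical manner with respect to inter-board connectors 1581 to 1590. For example, inter-board connectors 1589 and 1590 are provided between boards 1551 and 1552. Interconnect 1515 connects FPGA logic devices 1511 and 1577 together, and with respect to connectors 1589 and 1590 this connection is symmetrical. Interconnect 1603, however, is not symmetrical: it connects an FPGA logic device on the third board 1553 to FPGA logic device 1577 on board 1551, and with respect to connectors 1589 and 1590 such an interconnect is not symmetrical. Similarly, interconnect 1600 is not symmetrical with respect to connectors 1589 and 1590, because it connects FPGA logic device 1511 to termination 1591, which in turn connects to FPGA logic device 1577 via interconnect 1601. Other similar interconnects exist that further show the non-symmetry.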
【０３３４】
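As a result of this non-symmetry, interconnects are routed through the inter-board connectors in two different ways: one for symmetric interconnects such as interconnect 1515, and another for non-symmetric interconnects such as interconnects 1603 and 1600. The interconnect routing scheme is shown in FIGS. 40A and 40B.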
【０３３５】
図３９において、単一のボード内の直接隣接した接続の例は、ボード１５５５の東−西方向に沿って、論理デバイス１５７０を論理デバイス１５７１に結合する相互接続１５４３である。単一のボード内の直接隣接した別の接続の例は、論理デバイス１５７３をボード１５５４の論理デバイス１５７３に結合する相互接続１６０７である。２つの異なるボード間の直接隣接した接続の例は、北−南方向に沿って、コネクタ１５８３および１５８４を介して、ボード１５５５の論理デバイス１５７０をボード１５５４の論理デバイス１５７４に結合する相互接続１５４５である。ここで、２つの内部ボードコネクタ１５８３および１５８４を用いて、信号を向こう側へに転送する。
【０３３６】
例示の単一のボード内の１つ置きの相互接続は、東−西方向に沿って、論理デバイス１５７０をボード１５５５の論理デバイス１５７２に結合する相互接続１５４４である。例示の２つの異なるボード間の１つ置きの相互接続は、コネクタ１５８１〜１５８４を介して、ボード１５５６の論理デバイス１５６５をボード１５５４の論理デバイス１５７３に結合する。ここで、４つの内部ボードのコネクタ１５８１〜１５８４を用いて、信号を向こう側へ転送する。
【０３３７】
いくつかのボード、特にマザーボード上の北−南端に置かれたボードはまた、１０ΩＲ−パックを備えていくつかの接続を終端させる。従って、６つ目のボード１５５６は、１０ΩＲ−パックコネクタ１５５７〜１５６０を備えて、１つ目のボード１５５１は、１０ΩＲ−パックコネクタ１５９１〜１５９４を備える。６つ目のボード１５５６は、相互接続１９７０〜１９７１のためにＲ−パックコネクタ１５５７、相互接続１９７２および１５４１のためにＲ−パックコネクタ１５５８、相互接続１９７３および１９７４のためにＲ−パックコネクタ１５５９、ならびに相互接続１９７５および１９７６のためにＲ−パックコネクタ１５６０を備える。さらに、相互接続１５６１〜１５６４は、いずれにも接続されない。これらの北−南相互接続は、東−西環状タイプの相互接続とは違って、メッシュ状タイプの様態で配置される。
【０３３８】
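These mesh terminations increase the number of direct north-south interconnections. Otherwise, the interconnections at the north and south edges of the FPGA mesh would all be wasted. For example, FPGA logic devices 1511 and 1577 already have one set of direct interconnections 1515. Additional interconnections between these two FPGA logic devices are provided via R-pack 1591 and interconnects 1600 and 1601; that is, the R-pack connects interconnects 1600 and 1601 together. This increases the number of direct connections between FPGA logic devices 1511 and 1577.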
【０３３９】
内部ボード接続がさらに提供される。ボード１５５１上の論理デバイス１５７７、１５７８、１５７９、および１５８０は、相互接続１５１５、１５１６、１５１７、および１５１８ならびに内部ボードコネクタ１５８９および１５９０を介して論理デバイス１５１１、１５１２、１５１３、および１５１４に結合される。従って、相互接続１５１５は、ボード１５５２の論理デバイス１５１１を、コネクタ１５８９および１５９０を介してボード１５５１の論理デバイス１５７７に結合する。相互接続１５１６は、ボード１５５２の論理デバイス１５１２を、コネクタ１５８９および１５９０を介してボード１５５１の論理デバイス１５７８に結合する。相互接続１５１７は、ボード１５５２の論理デバイス１５１３を、コネクタ１５８９および１５９０を介してボード１５５１の論理デバイス１５７９に結合する。相互接続１５１８は、ボード１５５２の論理デバイス１５１４を、コネクタ１５８９および１５９０を介してボード１５５１の論理デバイス１５８０に結合する。
【０３４０】
例えば、１５９５、１５９６、１５９７、および１５９８といったいくつかの相互接続は、いずれにも結合されない。なぜなら、それらは使用されないからである。しかし、論理デバイス１５１１および１５７７に対して上述したように、Ｒ−パック１５９１は、相互接続１６００および１６０１を接続して、北−南相互接続の数を増やす。
【０３４１】
本発明の２つのボードの実施形態が図４４に示される。本発明の２つのボードの実施形態において、２つのボードのみが、シミュレーションシステムにおけるユーザの設計をかたどる必要がある。図３９の６つのボード構成のように、図４４の２つのボード構成は、「ブックエンド」の同じ２つのボード（ボード１１５５１およびボード６１５５６）を用いる。これらは、図１の再構成可能なハードウェアユニットの一部としてマザーボード上に提供される。図４４において、１つ目のブックエンドボードはボード１であり、２つ目のブックエンドボードはボード６である。ボード６は、図４４で用いられて、図３９のボード６と同様に示す。すなわち、ボード１およびボード６のようなブックエンドボードは、北−南メッシュ状接続の不可欠な終端を有するべきである。
【０３４２】
この２つのボード構成は、ボード１１５５１上の４つのＦＰＧＡ論理デバイス１５７７（ＦＰＧＡ０）、１５７８（ＦＰＧＡ１）、１５７９（ＦＰＧＡ２）、および１５８０（ＦＰＧＡ３）、ならびにボード６１５５６上の４つのＦＰＧＡ論理デバイス１５６５（ＦＰＧＡ０）、１５６６（ＦＰＧＡ１）、１５６７（ＦＰＧＡ２）、および１５６８（ＦＰＧＡ３）を含む。これら２つのボードは、内部ボードコネクタ１５８１および１５９０によって接続される。
【０３４３】
これらのボードは、いくつかの接続を終端させるために１０ΩＲ−パックを含む。２つのボードの実施形態に関して、ボードは共に、「ブックエンド」ボードである。ボード１５５１は、抵抗性の終端として１０ΩＲ−パックコネクタ１５９１、１５９２、１５９３、および１５９４を含む。２つ目のボード１５５６はまた、１０ΩＲ−パックコネクタ１５５７〜１５６０を含む。
【０３４４】
ボード１５５１はコネクタ１５９０を有し、ボード１５５６は、内部ボード通信のためのコネクタ１５８１を有する。例えば、相互接続１６００、１９７１、１９７７、１５４１、および１５４０といった、あるボードから別のボードに渡る相互接続は、これらのコネクタ１５９０および１５８１を通過する。言い換えると、内部ボードコネクタ１５９０および１５８１により、相互接続１６００、１９７１、１９７７、１５４１、および１５４０は、あるボード上のある構成要素および別のボード上の別の構成要素間の接続をうまく行くことを可能にする。内部ボードコネクタ１５９０および１５８１は、ＦＰＧＡバス上の制御データおよび制御信号を伝達する。
【０３４５】
４つのボード構成に関して、ボード１およびボード６は、ブックエンドボードを提供する。ボード２１５５２およびボード３１５５３（図３９を参照）は、中間のボードである。本発明（図３８Ａおよび図３８Ｂに関して論じられるように）に従ってマザーボードに接続されると、ボード１およびボード２は一組にされ、ボード３およびボード６は一組にされる。
【０３４６】
６つのボード構成に関して、ボード１およびボード６は、上述されるようにブックエンドボードを提供する。ボード２１５５２、ボード３１５５３、ボード４１５５４、およびボード５１５５５（図３９を参照）は、中間のボードである。本発明（図３８Ａおよび図３８Ｂに関して論じられるように）に従ってマザーボードに接続されると、ボード１およびボード２は一組にされ、ボード３およびボード４は一組にされ、ボード５およびボード６は一組にされる。
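The pairing rule for the four-board and six-board configurations can be written down compactly. The Python sketch below pairs consecutive boards in stacking order. Treating the first board of each pair as the one coupled directly to the motherboard is an assumption extrapolated from the six-board description (boards 1, 3, and 5) and the two-board description (first board only); the text does not state it for the four-board case. Function names are hypothetical.

```python
from typing import List, Tuple

def pair_boards(boards: List[int]) -> List[Tuple[int, int]]:
    """Pair consecutive boards in stacking order, e.g. [1, 2, 3, 6] -> (1, 2), (3, 6)."""
    if len(boards) < 2 or len(boards) % 2 != 0:
        raise ValueError("boards are added in pairs; an even count of at least 2 is expected")
    return [(boards[i], boards[i + 1]) for i in range(0, len(boards), 2)]

def directly_coupled(boards: List[int]) -> List[int]:
    """Assumed rule: the first board of each pair plugs into the motherboard."""
    return [first for first, _second in pair_boards(boards)]

two_board = [1, 6]
four_board = [1, 2, 3, 6]
six_board = [1, 2, 3, 4, 5, 6]

print(pair_boards(four_board), pair_boards(six_board), directly_coupled(six_board))
```

In every configuration the list begins with board 1 and ends with board 6, reflecting the requirement that the two "bookend" boards carry the terminations for the north-south mesh.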
【０３４７】
所望ならば、さらなるボードが提供され得る。しかし、システムに追加されるボードに関係なく、ブックエンドボード（例えば、ボード１およびボード６のように）は、メッシュ状アレイ接続を達成する抵抗性の終端を有するべきである。一実施形態において、最小の構成は、図４４の２つのボード構成である。２つのボードの追加によって、さらなるボードが追加され得る。一次の構成がボード１およびボード６を有するならば、４つのボード構成への将来の変更は、上述のように、ボード６をさらに除去することと、ボード１およびボード２を共に一組にすることと、次に、ボード３およびボード６を共に一組にすることを含む。
【０３４８】
上述されるように、各論理デバイスは、隣接した隣り合う論理デバイス、および隣接しない、１つ置きに隣り合う論理デバイスに結合される。従って、図３９および４４において、論理デバイス１５７７は、相互接続１５４７を介して、隣接する隣り合う論理デバイス１５７８に結合される。論理デバイス１５７７はまた、１つ置きの相互接続１５４８を介して、隣接しない論理デバイス１５７９に結合される。しかし、論理デバイス１５８０は、結合を提供する相互接続１５４９を有する包み込む環状構成のために、論理デバイス１５７７に隣接するように考慮される。
【０３４９】
図４２は、オンボード構成要素、および単一のボードのコネクタの上面図（構成要素側）を示す。本発明の一実施形態において、シミュレーションシステムにおいてユーザの設計をかたどるために、１つのボードだけが必要である。他の実施形態において、複数のボード（すなわち、少なくとも２つのボード）が必要である。従って、例えば、図３９は、多様な６００ピンコネクタ１５８１〜１５９０を介して共に結合される６つのボード１５５１〜１５５６を示す。上部および下部において、ボード１５５１は、１セットの１０ΩＲ−パックによって終端され、ボード１５５６は、別の１セットの１０ΩＲ−パックによって終端される。
【０３５０】
図４２に戻ると、ボード１８２０は、４つのＦＰＧＡ論理デバイス１８２２（ＦＰＧＡ０）、１８２３（ＦＰＧＡ１）、１８２４（ＦＰＧＡ２）、および１８２５（ＦＰＧＡ３）を含む。２つのＳＲＡＭメモリデバイス１８２８および１８２９が、さらに提供される。ＳＲＡＭメモリデバイス１８２８および１８２９を用いて、このボード上の論理デバイスからメモリブロックをマッピングする。つまり、本発明によるメモリシミュレーションは、このボードの論理デバイスからこのボードのＳＲＡＭメモリデバイスにメモリブロックをマッピングする。他のボードは、他の論理デバイスおよびメモリデバイスを含み、類似のマッピング動作を達成する。一実施形態において、メモリマッピングはボードに依存する。すなわち、ボード１のメモリマッピングは、他のボードを無視して、ボード１上の論理デバイスおよびメモリデバイスに限定される。他の実施形態において、メモリマッピングは、ボードに依存しない。従って、あまり多数ではないメモリデバイスを用いて、あるボードの論理デバイスから別のボードに置かれたメモリデバイスにメモリブロックをマッピングする。
【０３５１】
発光ダイオード（ＬＥＤ）１８２１がまた、いくつかの選択活性を視覚的に示すように提供される。ＬＥＤディスプレイは、本発明の一実施形態によって表Ａの通りである。
【０３５２】
【表２】

【０３５３】
ＰＬＸＰＣＩコントローラ１８２６およびＣＴＲＬ＿ＦＰＧＡユニット１８２７といった、多様な他の制御チップが内部ＦＰＧＡおよびＰＣＩ通信を制御する。システムで使用され得るＰＬＸＰＣＩコントローラ１８２６の一例は、ＰＬＸＴｅｃｈｎｏｌｏｇｙのＰＣＩ９０８０または９０６０である。このＰＣＩ９０８０は、適切なローカルバスインタフェース、制御レジスタ、ＦＩＦＯ、およびＰＣＩバスへのＰＣＩインタフェースである。データブックのＰＬＸＴｅｃｈｎｏｌｏｇｙ、ＰＣＩ９０８０Ｓｈｅｅｔ（ｖｅｒ．０．９３、１９９７年２月２８日）は、本明細書中に参考として援用される。ＣＴＲＬ＿ＦＰＧＡユニット１８２７の一例は、例えば、Ａｌｔｅｒａ１０Ｋ５０チップといった、ＦＰＧＡチップの形式でプログラム可能な論理デバイス（ＰＬＤ）である。複数のボード構成において、第１のボードだけがＰＣＩコントローラを含むＰＣＩバスに結合される。
【０３５４】
コネクタ１８３０は、ボード１８２０をマザーボード（図示せず）、従って、ＰＣＩバス、電力、およびグランドに接続する。いくつかのボードに関して、コネクタ１８３０は、マザーボードへの直接の接続のために使用されない。従って、２つのボード構成において、第１のボードのみが、マザーボードに直接結合される。６つのボード構成において、ボード１、３、および５だけが、マザーボードに直接接続され、残りのボード２、４、および６が、マザーボードの接続性のためにそれらの隣接したボードに依存する。内部ボードコネクタＪ１〜Ｊ２８がさらに提供される。名前で意味されるように、これらのコネクタＪ１〜Ｊ２８は、異なるボードにわたる接続が可能である。
【０３５５】
コネクタＪ１は、外部電力およびグランド接続用である。以下の表Ｂは、本発明の一実施形態による、外部電力のピンおよび対応する詳細を示す。
【０３５６】
【表３】

【０３５７】
コネクタＪ２は、パラレルポート接続用である。コネクタＪ１およびＪ２は、作業中に単独単一ボード境界スキャンテストのために用いられる。以下の表Ｃは、本発明の一実施形態によるパラレルＪＴＡＧポートコネクタＪ２のピンおよび対応する詳細を示す。
【０３５８】
【表４】

【０３５９】
コネクタＪ３およびＪ４は、ボードにわたるローカルバス接続用である。コネクタＪ５〜Ｊ１６は、ＦＰＧＡ相互接続の接続のあるセットである。コネクタＪ１７〜Ｊ２８は、第２のセットのＦＰＧＡ相互接続の接続である。部品面からはんだ面に設置されると、これらのコネクタは、あるボードのある構成要素と別のボードの別の構成要素との間の効果的な接続を提供する。以下の表ＤおよびＥは、本発明の一実施形態によるコネクタＪ１〜Ｊ２８の完成リストおよび詳細を提供する。
【０３６０】
【表５】

【０３６１】
影付きコネクタはスルーホールタイプである。表Ｄにおいて、ブラケット［］の中の数字は、ＦＰＧＡ論理デバイスの番号０〜３を表すことに留意されたい。従って、Ｓ［０］は、南方向の相互接続（すなわち、図３７のＳ［７３：０］）およびＦＰＧＡ０の７４ビットを示す。
【０３６２】
【表６】

【０３６３】
図４３は、図４１Ａ〜図４１Ｆおよび図４２におけるコネクタＪ１〜Ｊ２８の説明を示す。一般に、クリアなブロックは、表面取り付けタイプであるのに対し、灰色のブロックがスルーホールタイプである。さらに、輪郭が実線のブロックは、部品面に置かれたコネクタを表す。輪郭が点線のブロックは、はんだ面に置かれたコネクタを表す。従って、空白および輪郭が実線のブロック１８４０は、２×３０のヘッダー、表面取り付け、および部品面に置かれたことを表す。クリアなおよび輪郭が点線のブロック１８４１は、２×３０のレセプタクル、表面取り付け、およびボードのはんだ面に置かれたことを表す。灰色で満たされ、および輪郭が実線のブロック１８４２は、２×３０または２×４５のヘッダー、スルーホール、および部品面に置かれたことを表す。灰色および輪郭が点線のブロック１８４３は、２×４５または２×３０のレセプタクル、スルーホール、およびはんだ面に置かれたことを表す。一実施形態において、シミュレーションシステムは、表面取りつけおよびスルーホールタイプの両方の２×３０または２×４５のマイクロストリップコネクタのＳａｍｔｅｃのＳＦＭおよびＴＦＭシリーズを使用する。クロスハッチで満たされ、および実線を有するブロック１８４４は、Ｒ−パック、表面取りつけ、およびボードの部品面に取り付けられている。クロスハッチで満たされ、および点線を有するブロック１８４５は、Ｒ−パック、表面取りつけ、およびはんだ面に取り付けられている。ＳａｍｔｅｃのウェブサイトのカタログからのＳａｍｔｅｃの仕様は、本明細書中に参考として援用される。図４２に戻ると、コネクタＪ３〜Ｊ２８は、図４３の説明として示されるようなタイプである。
【０３６４】
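FIGS. 41A to 41F show top views of each board and its respective connectors. FIG. 41A shows the connectors for board 6. Thus, board 1660 contains connectors 1661 to 1681 along with motherboard connector 1682. FIG. 41B shows the connectors for board 5. Thus, board 1690 contains connectors 1691 to 1708 along with motherboard connector 1709. FIG. 41C shows the connectors for board 4. Thus, board 1715 contains connectors 1716 to 1733 along with motherboard connector 1734. FIG. 41D shows the connectors for board 3. Thus, board 1740 contains connectors 1741 to 1758 along with motherboard connector 1759. FIG. 41E shows the connectors for board 2. Thus, board 1765 contains connectors 1766 to 1783 along with motherboard connector 1784. FIG. 41F shows the connectors for board 1. Thus, board 1790 contains connectors 1791 to 1812 along with motherboard connector 1813. As indicated in the legend of FIG. 43, these connectors for the six boards are various combinations of (1) surface mount or through hole, (2) component side or solder side, and (3) header, receptacle, or R-pack.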
【０３６５】
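In one embodiment, these connectors are used for inter-board communication. Related buses and signals are grouped together and supported by these inter-board connectors for routing signals between any two boards. Also, only half of the boards are directly coupled to the motherboard. In FIG. 41A, board 6 1660 contains connectors 1661 to 1668 designated for one set of FPGA interconnects; connectors 1669 to 1674, 1676, and 1679 designated for another set of FPGA interconnects; and connector 1681 designated for the local bus. Because board 6 1660 is positioned as one of the boards at an end of the motherboard (along with board 1 1790 of FIG. 41F at the other end), connectors 1675, 1677, 1678, and 1680 are designated for the 10 Ω R-pack connections of certain north-south interconnects. In addition, motherboard connector 1682 is not used for board 6 1660, as shown in FIG. 38B, where the sixth board 1535 is coupled to the fifth board 1534 rather than directly to motherboard 1520.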
【０３６６】
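In FIG. 41B, board 5 1690 contains connectors 1691 to 1698 designated for one set of FPGA interconnects, connectors 1699 to 1706 designated for another set of FPGA interconnects, and connectors 1707 and 1708 designated for the local bus. Connector 1709 is used to couple board 5 1690 to the motherboard.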
【０３６７】
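In FIG. 41C, board 4 1715 contains connectors 1716 to 1723 designated for one set of FPGA interconnects, connectors 1724 to 1731 designated for another set of FPGA interconnects, and connectors 1732 and 1733 designated for the local bus. Connector 1734 is not used to couple board 4 1715 directly to the motherboard. This configuration is also shown in FIG. 38B, where the fourth board 1533 is not directly connected to motherboard 1520 but is instead coupled to the third board 1532 and the fifth board 1534.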
【０３６８】
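In FIG. 41D, board 3 1740 contains connectors 1741 to 1748 designated for one set of FPGA interconnects, connectors 1749 to 1756 designated for another set of FPGA interconnects, and connectors 1757 and 1758 designated for the local bus. Connector 1759 is used to couple board 3 1740 to the motherboard.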
【０３６９】
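In FIG. 41E, board 2 1765 contains connectors 1766 to 1773 designated for one set of FPGA interconnects, connectors 1774 to 1781 designated for another set of FPGA interconnects, and connectors 1782 and 1783 designated for the local bus. Connector 1784 is not used to couple board 2 1765 directly to the motherboard. This configuration is also shown in FIG. 38B, where the second board 1525 is not directly coupled to motherboard 1520 but is instead coupled to the third board 1532 and the first board 1526.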
【０３７０】
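In FIG. 41F, board 1 1790 contains connectors 1791 to 1798 designated for one set of FPGA interconnects; connectors 1799 to 1804, 1806, and 1809 designated for another set of FPGA interconnects; and connectors 1811 and 1812 designated for the local bus. Connector 1813 is used to couple board 1 1790 to the motherboard. Because board 1 1790 is positioned as one of the boards at an end of the motherboard (along with board 6 1660 of FIG. 41A at the other end), connectors 1805, 1807, 1808, and 1810 are designated for the 10 Ω R-pack connections of certain north-south interconnects.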
【０３７１】
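In one embodiment of the present invention, multiple boards are coupled to the motherboard and to each other in a unique manner. The boards are coupled together component-side to solder-side. One of the boards (e.g., the first board) is coupled to the motherboard, and hence to the PCI bus, via a motherboard connector. In addition, the FPGA interconnect bus on the first board is coupled to the FPGA interconnect bus of another board (e.g., the second board) via a pair of FPGA interconnect connectors. The FPGA interconnect connector on the first board is on its component side, and the FPGA interconnect connector on the second board is on its solder side. The component-side connector on the first board and the solder-side connector on the second board, respectively, allow the FPGA interconnect buses to be coupled together.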
【０３７２】
同様に、２つのボード上のローカルバスは、ローカルバスコネクタを介して共に結合される。第１のボード上のローカルバスコネクタは、部品面上にあり、第２のボード上のローカルバスコネクタは、はんだ面上にある。従って、第１のボード上の部品面のコネクタおよび第２のボード上のはんだ面のコネクタのそれぞれにより、ローカルバスは、共に結合されることを可能にする。
【０３７３】
さらなるボードが追加され得る。第３のボードは、第２のボードの部品面に第３のボードのはんだ面を追加し得る。類似のＦＰＧＡ相互接続およびローカルバスの内部ボード接続が、さらに追加される。第３のボードはまた、別のコネクタを介してマザーボードに結合されるが、このコネクタは、以下にさらに記載されるように、単に電力およびグラウンドを第３のボードに提供するだけである。
【０３７４】
２つのボード構成における部品面からはんだ面へのコネクタは、図３８Ａに参考として論じられる。この図は、本発明の一実施形態によるマザーボード上のＦＰＧＡボードの接続の側面図を示す。図３８Ａは、２つのボード構成を示し、名称が意味するように、２つのボードが利用される。図３８Ａにおいてこれら２つのボード１５２５（ボード２）および１５２６（ボード１）は、図３９において２つのボード１５５２および１５５１と一致する。ボード１５２５および１５２６の部品面は、参照符号１９８９によって表示される。ボード１５２５および１５２６のはんだ面は、参照符号１９８８によって表示される。図３８Ａに示されるように、これら２つのボード１５２５および１５２６は、マザーボードコネクタ１５２３を介してマザーボード１５２０に結合される。他のマザーボードコネクタ１５２１、１５２２、および１５２４はまた、拡張の目的のために提供され得る。ＰＣＩバスとボード１５２５と１５２６との間の信号は、マザーボードコネクタ１５２３を介してルーティングされる。ＰＣＩ信号は、まず、第１のボード１５２６を介して２つのボード構造とＰＣＩバスとの間をルーティングされる。従って、ＰＣＩバスからの信号は、それらの信号が第２のボード１５２５に移動する前に、まず、第１のボード１５２６と接触する。同様に、２つのボード構造からのＰＣＩバスへの信号は、第１のボード１５２６から送られる。電力はまた、マザーボードコネクタ１５２３を介して電源（図示せず）からボード１５２５および１５２６に印加される。
【０３７５】
図３８Ａに示されるように、ボード１５２６は、いくつかの構成要素およびコネクタを含む。１つのこのような構成要素は、ＦＰＧＡ論理デバイス１５３０である。コネクタ１５２８Ａおよび１５３１Ａは、さらに提供される。同様に、ボード１５２５は、いくつかの構成要素およびコネクタを含む。１つのこのような構成要素は、ＦＰＧＡ論理デバイス１５２９である。コネクタ１５２８Ｂおよび１５３１Ｂは、さらに提供される。
【０３７６】
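In one embodiment, connectors 1528A and 1528B are the inter-board connectors for the FPGA bus, such as 1590 and 1581 (FIG. 44). These inter-board connectors provide inter-board connectivity for the various FPGA interconnects (e.g., N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0], and XH[72:37]), excluding the local bus connections.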
【０３７７】
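Furthermore, connectors 1531A and 1531B are the inter-board connectors for the local bus. The local bus handles the signals between the PCI bus (via the PCI controller) and the FPGA bus (via the FPGA I/O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary scan test information between the PCI controller and the FPGA logic devices and the FPGA I/O controller (CTRL_FPGA) unit.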
【０３７８】
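In sum, the motherboard connector couples one board of a pair of boards to the PCI bus and to power. One set of connectors couples the FPGA interconnects from the component side of one board to the solder side of the other board. Another set of connectors couples the local bus from the component side of one board to the solder side of the other board.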
【０３７９】
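In another embodiment of the present invention, more than two boards are used. Indeed, FIG. 38B shows a six-board configuration. This configuration is analogous to that of FIG. 38A: every other board is directly connected to the motherboard, and the interconnects and local buses of the boards are coupled together via inter-board connectors that are placed solder side to component side.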
【０３８０】
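FIG. 38B shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board), and 1535 (sixth board). These six boards are coupled to the motherboard 1520 via the connectors on boards 1526 (first board), 1532 (third board), and 1534 (fifth board). The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520; rather, they are indirectly coupled to the motherboard through their respective connections to their neighboring boards.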
【０３８１】
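When placed solder side to component side, the various inter-board connectors enable communication among the PCI bus components, the FPGA logic devices, memory devices, and various simulation system control circuits. The first set of inter-board connectors 1990 corresponds to connectors J5 to J16 of FIG. 42. The second set of inter-board connectors 1991 corresponds to connectors J17 to J28 of FIG. 42. The third set of inter-board connectors 1992 corresponds to connectors J3 and J4 of FIG. 42.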
【０３８２】
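Motherboard connectors 1521 to 1524 are provided on motherboard 1520 to couple the motherboard (and hence the PCI bus) to the six boards. As mentioned above, boards 1526 (first board), 1532 (third board), and 1534 (fifth board) are directly coupled to connectors 1523, 1522, and 1521, respectively. The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520. Because only one PCI controller is needed for all six boards, only the first board 1526 contains a PCI controller. Also, the motherboard connector 1523, which is coupled to the first board 1526, provides access to and from the PCI bus. Connectors 1522 and 1521 are coupled only to power and ground. The center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm in one embodiment.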
【０３８３】
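For boards 1526 (first board), 1532 (third board), and 1534 (fifth board), which are directly coupled to the motherboard connectors 1523, 1522, and 1521, respectively, the J5 to J16 connectors are located on the component side, the J17 to J28 connectors are located on the solder side, and the J3 to J4 local bus connectors are located on the component side. For the other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board), which are not directly coupled to the motherboard connectors 1523, 1522, and 1521, the J5 to J16 connectors are located on the solder side, the J17 to J28 connectors on the component side, and the J3 to J4 local bus connectors on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), parts of the J17 to J28 connectors are 10 Ω R-pack terminations.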
【０３８４】
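FIGS. 40A and 40B show array connections across different boards. To facilitate the manufacturing process, a single layout design is used for all boards. As explained above, a board connects to other boards through connectors, without a backplane. FIG. 40A shows two exemplary boards 1611 (board 2) and 1610 (board 1). The component side of board 1610 faces the solder side of board 1611. Board 1611 contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1611 are represented by nodes A' (reference numeral 1612) and B' (reference numeral 1614). Node A' is coupled to connector pad 1616 via PCB trace 1620. Similarly, node B' is coupled to connector pad 1617 via PCB trace 1623.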
【０３８５】
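Likewise, board 1610 also contains numerous FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components on board 1610 are represented by node A (reference numeral 1613) and node B (reference numeral 1615). Node A is coupled to connector pad 1618 via PCB trace 1625. Similarly, node B is coupled to connector pad 1619 via PCB trace 1622.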
【０３８６】
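The routing of signals between nodes located on different boards using surface-mount connectors will now be discussed. In FIG. 40A, the desired connections are between (1) node A and node B', as indicated by imaginary path 1620, 1621, and 1622, and (2) node B and node A', as indicated by imaginary path 1623, 1624, and 1625. These connections are for paths such as the asymmetric interconnect 1600 between board 1551 and board 1552 in FIG. 39. Other asymmetric interconnects include the NH to SH interconnects 1977, 1979, and 1981 on both sides of connectors 1589 and 1590.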
【０３８７】
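A-A' and B-B' correspond to symmetric interconnects such as interconnect 1515 (N, S). N and S interconnects use through-hole connectors, whereas NH and SH asymmetric interconnects use SMD connectors (see Table D).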
【０３８８】
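The actual implementation using surface-mount connectors will now be discussed with reference to FIG. 40B, using like numbers for like items. In FIG. 40B, board 1611 shows node A' on the component side coupled to component-side connector pad 1636 via PCB trace 1620. The component-side connector pad 1636 is coupled to the solder-side connector pad 1639 via conductive path 1651. Solder-side connector pad 1639 is coupled to the component-side connector pad 1642 of board 1610 via conductive path 1648. Finally, component-side connector pad 1642 is coupled to node B via PCB trace 1622. Thus, node A' on board 1611 is coupled to node B on board 1610.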
【０３８９】
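Likewise, in FIG. 40B, board 1611 shows node B' on the component side coupled to component-side connector pad 1638 via PCB trace 1623. The component-side connector pad 1638 is coupled to the solder-side connector pad 1637 via conductive path 1650. Solder-side connector pad 1637 is coupled to the component-side connector pad 1640 via conductive path 1645. Finally, component-side connector pad 1640 is coupled to node A via PCB trace 1625. Thus, node B' on board 1611 can be coupled to node A on board 1610. Because these boards share the same layout, conductive paths 1652 and 1653 can be used in the same manner as conductive paths 1650 and 1651 for other boards placed adjacent to board 1610. A unique inter-board connectivity scheme is thus provided using surface-mount and through-hole connectors, without any switching components.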
【０３９０】
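(F. Timing-Insensitive Glitch-Free Logic Devices)
One embodiment of the present invention solves both the hold time and clock glitch problems. During configuration of the user design into the reconfigurable computing system, standard logic devices (e.g., latches, flip-flops) found in the user design are replaced with emulation logic devices, namely timing-insensitive glitch-free (TIGF) logic devices, in accordance with one embodiment of the present invention. In one embodiment, a trigger signal incorporated into the EVAL signal is used to update the values stored in these TIGF logic devices. After waiting for the various input and other signals to propagate through the hardware model of the user design and reach steady state during the evaluation period, the trigger signal is provided to update the values stored or latched by the TIGF logic devices. Thereafter, a new evaluation period begins. This cycle of evaluation period and trigger period is periodic in one embodiment.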
【０３９１】
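The hold time problem mentioned above will now be briefly discussed. As known to those skilled in the art, a common and pervasive problem in logic circuit design is hold time violation. Hold time is defined as the minimum amount of time that the data input(s) of a logic element must be held stable after the control input changes to latch, capture, or store the value indicated by the data input(s); otherwise, the logic element will fail to work properly.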
【０３９２】
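A shift register example will now be discussed to illustrate the hold time requirement. FIG. 75A shows an exemplary shift register in which three D-type flip-flops are connected serially; that is, the output of flip-flop 2400 is coupled to the input of flip-flop 2401, whose output is in turn coupled to the input of flip-flop 2402. The overall input signal S_in is coupled to the input of flip-flop 2400, and the overall output signal S_out is generated from the output of flip-flop 2402. All three flip-flops receive a common clock signal at their respective clock inputs. This shift register design is based on the following assumptions: (1) the clock signal reaches all the flip-flops at the same time, and (2) after detecting the edge of the clock signal, the input of each flip-flop does not change for the duration of the hold time.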
【０３９３】
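The timing diagram of FIG. 75B illustrates the hold time assumption for a system that does not violate the hold time requirement. The hold time varies from one logic element to the next but is always specified in the specification sheets. The clock input changes from logic 0 to logic 1 at time t_0. As shown in FIG. 75A, the clock input is provided to each flip-flop 2400 to 2402. From this clock edge at t_0, the input S_in must remain stable for the duration of the hold time T_H, which lasts from time t_0 to time t_1. Similarly, the inputs to flip-flops 2401 (i.e., D_2) and 2402 (i.e., D_3) must also remain stable for the duration of the hold time from the triggering edge of the clock signal. Because this requirement is satisfied in FIGS. 75A and 75B, input S_in is shifted into flip-flop 2400, the input at D_2 (logic 0) is shifted into flip-flop 2401, and the input at D_3 (logic 1) is shifted into flip-flop 2402. As known to those skilled in the art, after the clock edge has been triggered, the new values at the inputs of flip-flop 2401 (logic 1 at input D_2) and flip-flop 2402 (logic 0 at input D_3) will be shifted into or stored in the next flip-flop at the next clock cycle, assuming the hold time requirement is satisfied. The following table summarizes the operation of the shift register for these exemplary values.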
【０３９４】
【表７】

【０３９５】
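In an actual implementation, the clock signal will not reach all the logic elements simultaneously; rather, the circuit is designed so that the clock signal reaches all the logic elements nearly or substantially simultaneously. The circuit must be designed so that the clock skew (i.e., the timing difference between the clock signals reaching each flip-flop) is much smaller than the hold time requirement. All the logic elements then capture the appropriate input values. In the example illustrated in FIGS. 75A and 75B above, a hold time violation caused by the clock signal arriving at flip-flops 2400 to 2402 at different times may result in some flip-flops capturing the old input values while other flip-flops capture the new input values. As a result, the shift register will not operate properly.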
【０３９６】
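In a reconfigurable logic device (e.g., FPGA) implementation of the same shift register design, if the clock is generated directly from a primary input, the circuit can be designed so that a low-skew network distributes the clock signal to all the logic elements (i.e., the logic elements detect the clock edge at substantially the same time). Primary clocks are generated from self-timed test-bench processes. Usually, the primary clock signals are generated in software, and only a few (i.e., 1 to 10) primary clocks are found in a typical user circuit design.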
【０３９７】
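However, if the clock signal is generated from internal logic instead of a primary input, hold time becomes more of an issue. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (i.e., 1,000 or more) derived clocks are found in a typical user circuit design.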
【０３９８】
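Without extra precautions or additional controls, these clock signals may reach each logic element at different times, and the clock skew may be longer than the hold time. This can result in failure of the circuit design, as with the shift register circuit illustrated in FIGS. 75A and 75B.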
【０３９９】
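Hold time violation will now be discussed using the same shift register circuit illustrated in FIG. 75A. This time, however, the individual flip-flops of the shift register circuit are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in FIG. 76A. The first FPGA chip 2411 contains the internally derived clock logic 2410, which feeds its clock signal CLK to some of the FPGA chips 2412 to 2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400 to 2402 of the shift register circuit. Chip 2412 contains flip-flop 2400, chip 2415 contains flip-flop 2401, and chip 2416 contains flip-flop 2402. Two other chips 2413 and 2414 are provided to illustrate the hold time violation concept.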
【０４００】
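Clock logic 2410 in chip 2411 receives a primary clock input (or possibly another derived clock signal) and generates the internal clock signal CLK. This internal clock signal CLK travels to chip 2412, where it is labeled CLK1. The internal clock signal CLK from clock logic 2410 also travels to chip 2415, as CLK2, via chips 2413 and 2414. As shown, CLK1 is the clock input to flip-flop 2400 and CLK2 is the clock input to flip-flop 2401. Both CLK1 and CLK2 experience wire trace delays, so the edges of CLK1 and CLK2 are delayed from the edge of the internal clock signal CLK. Furthermore, CLK2 experiences additional delay because it travels through the two other chips 2413 and 2414.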
【０４０１】
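Referring to the timing diagram of FIG. 76B, the internal clock signal CLK is generated and triggered at time t_2. Because of wire trace delays, CLK1 does not reach flip-flop 2400 in chip 2412 until time t_3, which is a delay of T1 after t_2. As shown in the table above, the output at Q_1 (or input D_2) is at logic 0 before the arrival of the clock edge of CLK1. After the edge of CLK1 is sensed at flip-flop 2400, the input at D_1 must remain stable for the requisite hold time H2 (i.e., until time t_4). At this point, flip-flop 2400 shifts in or stores the input logic 1, so the output at Q_1 (or D_2) is at logic 1.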
【０４０２】
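While this is occurring for flip-flop 2400, clock signal CLK2 is making its way to flip-flop 2401 in chip 2415. The delay T2 caused by chips 2413 and 2414 makes CLK2 arrive at flip-flop 2401 at time t_5. By this time the input at D_2 is at logic 1, and after the hold time of flip-flop 2401 has been satisfied, this logic 1 value appears at the output Q_2 (or D_3). Thus, the output Q_2 was at logic 1 before the arrival of CLK2 and remains at logic 1 after the arrival of CLK2. This is an incorrect result: the shift register should have shifted in logic 0. Although flip-flop 2400 correctly shifted in its input value (logic 1), flip-flop 2401 incorrectly shifted in the new input value (logic 1) instead of the old one (logic 0). This incorrect operation typically occurs when the clock skew (or timing delay) is greater than the hold time. In this example, T2 > T1 + H2. In sum, as shown in FIG. 76A, hold time violations are likely to occur where a clock signal is generated in one chip and distributed to logic elements residing in different chips, unless some precautionary measures are taken.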
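The failure condition above can be illustrated with a minimal Python sketch. It is only a behavioral model under simplifying assumptions (data-path wire delay and clock-to-Q delay are ignored, and flip-flop 2400 is taken to update its output at T1 + H2, as in the description of FIG. 76B); the function names are hypothetical and do not come from the specification.

```python
def hold_time_violated(t1: float, t2: float, h2: float) -> bool:
    """True when CLK2 (delay t2) arrives after flip-flop 2400 has already
    updated its output at t1 + h2, i.e. when T2 > T1 + H2."""
    return t2 > t1 + h2


def clock_two_stages(s_in: int, q1: int, q2: int,
                     t1: float, t2: float, h2: float) -> tuple:
    """Apply one clock edge to two cascaded D flip-flops with skewed clocks.

    q1 and q2 are the outputs before the edge (q1 also drives D_2).
    Returns the outputs (Q_1, Q_2) after both flip-flops have been clocked.
    """
    new_q1 = s_in  # flip-flop 2400 always captures S_in
    # Flip-flop 2401 should capture the old Q_1; with too much skew it
    # captures the value flip-flop 2400 has just produced instead.
    d2_seen = new_q1 if hold_time_violated(t1, t2, h2) else q1
    return new_q1, d2_seen


# Values from the example: S_in = 1, Q_1 (D_2) = 0, Q_2 (D_3) = 1.
print(clock_two_stages(1, 0, 1, t1=1.0, t2=1.5, h2=1.0))  # small skew
print(clock_two_stages(1, 0, 1, t1=1.0, t2=5.0, h2=1.0))  # T2 > T1 + H2
```

With small skew the second stage stores the old value of D_2 (logic 0); once T2 exceeds T1 + H2 it stores logic 1, which is the incorrect result described above.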
【０４０３】
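The clock glitch problem mentioned above will now be discussed with reference to FIGS. 77A and 77B. Generally, when the inputs of a circuit change, the outputs change to some random value for a very brief time before they settle to the correct value. If another circuit inspects the output at just the wrong time and reads the random value, the results can be incorrect and difficult to debug. This random value that adversely affects another circuit is called a glitch. In typical logic circuits, one circuit may generate the clock signal for another circuit. If uncompensated timing delays exist in one or both circuits, a clock glitch (i.e., an unplanned occurrence of a clock edge) may occur and cause an incorrect result. As with hold time violations, clock glitches occur because certain logic elements in the circuit design change values at different times.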
【０４０４】
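FIG. 77A shows an exemplary logic circuit in which some logic elements generate a clock signal for another set of logic elements; that is, D-type flip-flop 2420, D-type flip-flop 2421, and exclusive-OR (XOR) gate 2422 generate a clock signal (CLK3) for D-type flip-flop 2423. Flip-flop 2420 receives its data input at D_1 on line 2425 and outputs data at Q_1 on line 2427. It receives its clock input (CLK1) from clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK1 refers to the same signal delayed by the time it takes to reach flip-flop 2420.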
【０４０５】
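Flip-flop 2421 receives its data input at D_2 on line 2426 and outputs data at Q_2 on line 2428. It receives its clock input (CLK2) from clock logic 2424. As mentioned above, CLK refers to the clock signal originally generated by clock logic 2424, and CLK2 refers to the same signal delayed by the time it takes to reach flip-flop 2421.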
【０４０６】
The output of flip-flop 2420 on line 2427 and the output of flip-flop 2421 on line 2428 are the inputs to XOR gate 2422. XOR gate 2422 outputs the signal labeled CLK3 to the clock input of flip-flop 2423. Flip-flop 2423 also receives data at D_3 on line 2429 and outputs data at Q_3.
【０４０７】
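The clock glitch problem that may arise in this circuit will now be discussed with reference to the timing diagram illustrated in FIG. 77B. The CLK signal is triggered at time t_0. By the time this clock signal (i.e., CLK1) reaches flip-flop 2420, it is already time t_1. CLK2 does not reach flip-flop 2421 until time t_2.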
【０４０８】
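Assume that the inputs to D_1 and D_2 are both at logic 1. When CLK1 reaches flip-flop 2420 at time t_1, the output at Q_1 goes to logic 1 (as shown in FIG. 77B). CLK2 arrives at flip-flop 2421 a little later, at time t_2, so the output Q_2 on line 2428 remains at logic 0 from time t_1 to time t_2. During the period between t_1 and t_2, XOR gate 2422 therefore generates a logic 1 as CLK3 at the clock input of flip-flop 2423, even though the desired signal is logic 0 (1 XOR 1 = 0). The generation of CLK3 during this period between t_1 and t_2 is a clock glitch. Accordingly, whatever logic value is present at D_3 on input line 2429 of flip-flop 2423 is stored, whether desired or not, and flip-flop 2423 is then ready for the next input on line 2429. In a properly designed circuit, the time delays of CLK1 and CLK2 are minimized so that no clock glitch is generated or, at the very least, the clock glitch lasts for such a short time that it cannot affect the rest of the circuit. In the latter case, if the clock skew between CLK1 and CLK2 is short enough, the XOR gate delay is long enough to filter out the glitch, and the rest of the circuit is not affected.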
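As an illustration of the glitch window, the following Python sketch samples CLK3 = Q_1 XOR Q_2 over time. It is a simplified model that assumes ideal flip-flops that update exactly when their clock edge arrives, no gate delay, and both outputs initially at logic 0; the function names are hypothetical.

```python
def clk3(t: float, t1: float, t2: float,
         d1: int = 1, d2: int = 1,
         q1_old: int = 0, q2_old: int = 0) -> int:
    """Value of CLK3 = Q_1 XOR Q_2 at time t.

    Flip-flop 2420 captures d1 when CLK1 arrives at t1, and flip-flop 2421
    captures d2 when CLK2 arrives at t2.
    """
    q1 = d1 if t >= t1 else q1_old
    q2 = d2 if t >= t2 else q2_old
    return q1 ^ q2


def glitch_samples(t1: float, t2: float, times) -> list:
    """Sample times at which CLK3 is (unintentionally) at logic 1."""
    return [t for t in times if clk3(t, t1, t2) == 1]


print(glitch_samples(1, 3, range(6)))  # CLK1 at t=1, CLK2 at t=3
print(glitch_samples(2, 2, range(6)))  # no skew between CLK1 and CLK2
```

In this model the spurious logic 1 on CLK3 lasts exactly as long as the skew between CLK1 and CLK2, and disappears when both edges arrive together.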
【０４０９】
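Two known solutions to the hold time violation problem are (1) timing adjustment and (2) timing resynthesis. Timing adjustment, as described in U.S. Pat. No. 5,47830, requires the insertion of sufficient delay elements in certain signal paths to extend the hold time of the logic elements. For example, adding sufficient delay on inputs D_2 and D_3 in the shift register circuit above may avoid hold time violations. Thus, FIG. 78 shows the same shift register circuit with delay elements 2430 and 2431 added to inputs D_2 and D_3, respectively. As a result, delay element 2430 can be designed so that time t_4 occurs after time t_5, giving T2 < T1 + H2 (FIG. 76B), and no hold time violation occurs.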
【０４１０】
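A potential problem with the timing adjustment solution is that it relies heavily on the specification sheets of the FPGA chips. As known in the art, reconfigurable logic chips such as FPGA chips implement logic elements with look-up tables. The delay of the look-up tables in a chip is provided in the specification sheets, and designers using the timing adjustment method to avoid hold time violations rely on this specified delay. However, this delay is only an estimate and varies from chip to chip. Another potential problem with the timing adjustment method is that designers must also compensate for the wiring delays present throughout the circuit design. Although this is not an impossible task, estimating wiring delay is time-consuming and prone to error. Moreover, the timing adjustment method does not solve the clock glitch problem.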
【０４１１】
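Another solution is timing resynthesis, a technique introduced by IKOS's VirtualWires technology. The timing resynthesis concept involves transforming a user's circuit design into a functionally equivalent design while strictly controlling the timing of clock and pin-out signals through finite state machines and registers. Timing resynthesis retimes a user's circuit design by introducing a single high-speed clock. It also converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop based, single-clock synchronous design. Thus, timing resynthesis uses registers at the input and output pin-outs of each chip to control precise inter-chip signal movement so that no inter-chip hold time violation occurs. Timing resynthesis also uses a finite state machine in each chip to schedule inputs from other chips, schedule outputs to other chips, and schedule updates of internal flip-flops based on the reference clock.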
【０４１２】
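Using the same shift register circuit introduced in the discussion of FIGS. 75A, 75B, 76A, and 76B above, FIG. 79 shows one example of a timing resynthesis circuit. The basic three-flip-flop shift register design has been transformed into a functionally equivalent circuit. Chip 2430 includes the original internal clock generating logic 2435 coupled to register 2443 via line 2448. The clock logic 2435 generates the CLK signal. A first finite state machine 2438 is also coupled to register 2443 via line 2449. Both register 2443 and the first finite state machine 2438 are controlled by a design-independent global reference clock.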
【０４１３】
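The CLK signal is also delivered across chips 2432 and 2433 before it reaches chip 2434. In chip 2432, a second finite state machine 2440 controls register 2445 via line 2462. The CLK signal travels from register 2443 to register 2445 via line 2461. Register 2445 outputs the CLK signal to the next chip 2433 via line 2463. Chip 2433 includes a third finite state machine 2441, which controls register 2446 via line 2464. Register 2446 outputs the CLK signal to chip 2434.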
【０４１４】
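Chip 2431 includes the original flip-flop 2436. Register 2444 receives the input S_in and outputs it to the D_1 input of flip-flop 2436 via line 2452. The Q_1 output of flip-flop 2436 is coupled to register 2466 via line 2454. A fourth finite state machine 2439 controls register 2444 via line 2451, register 2466 via line 2455, and flip-flop 2436 via the latch enable line 2453. The fourth finite state machine 2439 also receives the original clock signal CLK from chip 2430 via line 2450.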
【０４１５】
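Chip 2434 includes the original flip-flop 2437, which receives at its D_2 input the signal from register 2466 in chip 2431 via line 2456. The Q_2 output of flip-flop 2437 is coupled to register 2447 via line 2457. A fifth finite state machine 2442 controls register 2447 via line 2459 and flip-flop 2437 via the latch enable line 2458. The fifth finite state machine 2442 also receives the original clock signal CLK from chip 2430 via chips 2432 and 2433.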
【０４１６】
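With timing resynthesis, the finite state machines 2438 to 2442, registers 2443 to 2447 and 2466, and the single global reference clock are used to control signal flow across multiple chips and to update internal flip-flops. Thus, in chip 2430, the distribution of the CLK signal to other chips is scheduled by the first finite state machine 2438 via register 2443. Likewise, in chip 2431, the fourth finite state machine 2439 schedules the delivery of the input S_in to flip-flop 2436 via register 2444 as well as the delivery of the Q_1 output via register 2466. The latching function of flip-flop 2436 is also controlled by a latch enable signal from the fourth finite state machine 2439. The same principle holds for the logic in the other chips 2432 to 2434. With such tight control of the inter-chip input delivery schedule, the inter-chip output delivery schedule, and internal flip-flop state updates, inter-chip hold time violations are eliminated.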
【０４１７】
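However, the timing resynthesis technique requires the transformation of the user's circuit design into a much larger functionally equivalent circuit that includes additional finite state machines and registers. Typically, the additional logic necessary to implement this technique takes up to 20% of the usable logic in each chip. Furthermore, this technique is not immune to clock glitch problems. To avoid clock glitches, designers using the timing resynthesis technique must take additional precautionary steps. One conservative design approach is to design the circuit so that the inputs to a logic device that uses gated clocks do not change at the same time. An aggressive approach uses gate delays to filter the glitches so that they do not affect the rest of the circuit. As stated above, however, timing resynthesis requires some additional non-trivial measures to avoid clock glitches.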
【０４１８】
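Various embodiments of the present invention that solve both the hold time and clock glitch problems will now be discussed. During configuration mapping of the user design into the software model of the RCC computing system and the hardware model of the RCC array, the latch shown in FIG. 18A is emulated with a timing-insensitive glitch-free (TIGF) latch in accordance with one embodiment of the present invention. Similarly, the design flip-flop shown in FIG. 18B is emulated with a TIGF flip-flop in accordance with one embodiment of the present invention. These TIGF logic devices, whether in the form of a latch or a flip-flop, can also be called emulation logic devices. The updates of the TIGF latches and flip-flops are controlled with a global trigger signal.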
【０４１９】
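In one embodiment of the present invention, not all of the logic devices found in the user design circuit are replaced with TIGF logic devices. A user design circuit includes portions that are enabled or clocked by the primary clocks and other portions that are controlled by gated or derived clocks. Because hold time violations and clock glitches are an issue for the latter case, where logic devices are controlled by gated or derived clocks, only those particular logic devices controlled by gated or derived clocks are replaced with TIGF logic devices in accordance with the present invention. In other embodiments, all logic devices found in the user design circuit are replaced with TIGF logic devices.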
【０４２０】
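Before discussing the TIGF latch and flip-flop embodiments of the present invention, the global trigger signal will be discussed. Generally, the global trigger signal is used to allow the TIGF latches and flip-flops to keep their state (i.e., keep the old input value) during the evaluation period and to update their state (i.e., store the new input value) during the short trigger period. In one embodiment, the global trigger signal, shown in FIG. 82, is separate from and derived from the EVAL signal discussed above. In this embodiment, the global trigger signal has a long evaluation period followed by a short trigger period. The global trigger signal tracks the EVAL signal during the evaluation period, and at the conclusion of the EVAL cycle a short trigger signal is generated to update the TIGF latches and flip-flops. In another embodiment, the EVAL signal is the global trigger signal, where the EVAL signal is at one logic state (e.g., logic 0) during the evaluation period and at another logic state (e.g., logic 1) during non-evaluation or TIGF latch/flip-flop update periods.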
【０４２１】
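As discussed above with respect to the RCC computing system and the RCC hardware array, the evaluation period is used to propagate all the changes in the primary inputs and flip-flop/latch devices through the entire user design, one simulation cycle at a time. During this propagation, the RCC system waits until all the signals in the system achieve steady state. The evaluation period is calculated after the user design has been mapped and placed into the appropriate reconfigurable logic devices (e.g., FPGA chips) of the RCC array. Accordingly, the evaluation period is design-specific; that is, the evaluation period for one user design may differ from the evaluation period for another user design. The evaluation period must be long enough to ensure that all the signals in the system have propagated through the entire system and reached steady state before the next short trigger period.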
【０４２２】
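As shown in FIG. 82, the short trigger period occurs adjacent in time to the evaluation period. In one embodiment, the short trigger period occurs after the evaluation period. Prior to this short trigger period, the input signals propagate throughout the hardware-model-configured portion of the user design circuit during the evaluation period. The short trigger period, marked by a change in the logic state of the EVAL signal in accordance with one embodiment of the present invention, controls all the TIGF latches and flip-flops in the user design so that, once steady state has been achieved, they can be updated with the new values propagated during the evaluation period. The short trigger signal is distributed globally over a low-skew network and can be as short as the reconfigurable logic devices allow for proper operation (i.e., the durations t_0 to t_1 and t_2 to t_3 shown in FIG. 82). During this short trigger period, new primary inputs are sampled at each input stage of the TIGF latches and flip-flops, and the old stored values of the same TIGF latches and flip-flops are output to the next stage in the RCC hardware model of the user design. In the following discussion, the portion of the global trigger signal that occurs during the short trigger period is referred to as the TIGF trigger, TIGF trigger signal, trigger signal, or simply the trigger.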
【０４２３】
FIG. 80A shows the latch 2470 originally shown in FIG. 18A. The latch operates as follows.
【０４２４】
if (#S), Q ← 1
else if (#R), Q ← 0
else if (en), Q ← D
else Q keeps the old value.
Because this latch is level-sensitive and asynchronous, the output Q follows the input D as long as the clock input is enabled and the latch enable input is enabled.
【０４２５】
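FIG. 80B shows a TIGF latch in accordance with one embodiment of the present invention. Like the latch of FIG. 80A, the TIGF latch has a D input, an enable input, set (S), reset (R), and an output Q. In addition, the TIGF latch has a trigger input. The TIGF latch includes a D flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.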
【０４２６】
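D flip-flop 2471 receives its input from the output of AND gate 2474 via line 2476. The D flip-flop is also triggered at its clock input by a trigger signal on line 2477. This trigger signal is distributed globally by the RCC system according to a strict schedule that depends on the evaluation cycle. The output of D flip-flop 2471 is coupled to one of the inputs of multiplexer 2472 via line 2478. The other input of multiplexer 2472 is coupled to the D input of the TIGF latch on line 2475. The multiplexer is controlled by an enable signal on line 2484. The output of multiplexer 2472 is coupled to one of the inputs of OR gate 2473 via line 2479. The other input of OR gate 2473 is coupled to the set (S) input on line 2480. The output of OR gate 2473 is coupled to one of the inputs of AND gate 2474 via line 2481. The other input of AND gate 2474 is coupled to the reset (R) signal on line 2482. The output of AND gate 2474 is fed back to the input of D flip-flop 2471 via line 2476, as mentioned above.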
【０４２７】
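The operation of this TIGF latch embodiment of the present invention will now be discussed. In this embodiment, D flip-flop 2471 holds the current state (i.e., old value) of the TIGF latch. Line 2476 at the input of D flip-flop 2471 presents the new input value that has yet to be latched into the TIGF latch. Line 2476 presents the new value because the main input (D input) of the TIGF latch on line 2475 makes its way from the input of multiplexer 2472 (once the proper enable signal is presented on line 2484), through OR gate 2473 and then through AND gate 2474, onto line 2483, which feeds the new input signal of the TIGF latch back to D flip-flop 2471 on line 2476. A trigger signal on line 2477 updates the TIGF latch by clocking the new input value on line 2476 into D flip-flop 2471. Thus, the output on line 2478 of D flip-flop 2471 indicates the current state (i.e., old value) of the TIGF latch, while the input on line 2476 indicates the new input value that has yet to be latched by the TIGF latch.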
【０４２８】
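Multiplexer 2472 receives the current state from D flip-flop 2471 and the new input value on line 2475. The enable line 2484 functions as the selector signal for multiplexer 2472. Because the TIGF latch is not updated (i.e., the new input value is not stored) until the trigger signal is provided on line 2477, the D input of the TIGF latch on line 2475 and the enable input on line 2484 can arrive at the TIGF latch in any order. If this TIGF latch (as well as other TIGF latches in the hardware model of the user design) encounters a situation that would normally cause a hold time violation in a circuit that uses conventional latches, as discussed above with respect to FIGS. 76A and 76B (where one clock signal arrives much later than another clock signal), this TIGF latch functions properly by holding the proper old value until the trigger signal is provided on line 2477.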
【０４２９】
このトリガ信号は、低いスキューグローバルネットワークを介して分配される。^＊＊＊
さらにこのＴＩＧＦラッチは、クロックグリッチ問題を解決する。ＴＩＧＦラッチにおいてクロック信号がイネーブル信号によって置換されることに留意されたい。ライン２４８４上のイネーブル信号は、評価期間の間にしばしばグリッチし得るが、ＴＩＧＦラッチは、現在の状態を必ず保持するように継続する。ＴＩＧＦラッチが更新され得る機構のみがトリガ信号により存在し、この信号が安定状態に達した場合、一実施形態では、このトリガ信号が評価期間の後に供給される。
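The structure of FIG. 80B can be summarized in a short behavioral sketch in Python. It is only an illustration, not the patented implementation: the class and method names are hypothetical, and the signal polarities (set active high, reset active low) are assumptions read from the OR/AND gate arrangement described above.

```python
class TIGFLatch:
    """Behavioral model of the TIGF latch (one bit)."""

    def __init__(self, state: int = 0) -> None:
        self.state = state  # D flip-flop 2471: current (old) value

    def next_value(self, d: int, en: int, s: int = 0, r: int = 1) -> int:
        """Combinational value on line 2476 during the evaluation period."""
        mux = d if en else self.state  # multiplexer 2472, selected by enable
        return (mux | s) & r           # OR gate 2473, then AND gate 2474

    def trigger(self, d: int, en: int, s: int = 0, r: int = 1) -> int:
        """Global trigger: clock the value on line 2476 into flip-flop 2471."""
        self.state = self.next_value(d, en, s, r)
        return self.state


latch = TIGFLatch(state=0)
for glitchy_enable in (1, 0, 1, 0):      # enable glitches during evaluation
    latch.next_value(d=1, en=glitchy_enable)
print(latch.state)                       # stored state is untouched
print(latch.trigger(d=1, en=1))          # updated only by the trigger
```

However often the enable input toggles between triggers, the stored state changes only when `trigger` is called with the steady-state inputs.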
【０４３０】
FIG. 81A shows the flip-flop 2490 originally shown in FIG. 18B. The flip-flop operates as follows.
【０４３１】
if (#S), Q ← 1
else if (#R), Q ← 0
else if (positive edge of CLK), Q ← D
else Q keeps the old value.
Because this flip-flop is edge-triggered, the output Q follows the input D at the positive edge of the clock signal, as long as the flip-flop enable input is enabled.
【０４３２】
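FIG. 81B shows a TIGF D-type flip-flop in accordance with one embodiment of the present invention. Like the flip-flop of FIG. 81A, the TIGF flip-flop has a D input, a clock input, set (S), reset (R), and an output Q. In addition, the TIGF flip-flop has a trigger input. The TIGF flip-flop includes three D flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.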
【０４３３】
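Flip-flop 2491 receives the TIGF D input on line 2498 and a trigger input on line 2499, and provides a Q output on line 2500. This output line 2500 also serves as one of the inputs to multiplexer 2493. The other input to multiplexer 2493 comes from the Q output of flip-flop 2492 via line 2503. The output of multiplexer 2493 is coupled to one of the inputs of OR gate 2494 via line 2505. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is coupled to one of the inputs of AND gate 2495 via line 2507. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495 (which is also the overall TIGF output Q) is coupled to the input of flip-flop 2492 via line 2501. Flip-flop 2492 also has a trigger input on line 2502.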
【０４３４】
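Returning to multiplexer 2493, its selector input is coupled to the output of AND gate 2497 via line 2509. AND gate 2497 receives one of its inputs from the CLK signal on line 2510 and the other input from the output of flip-flop 2496 via line 2512. Flip-flop 2496 in turn receives its input from the CLK signal on line 2511 and its trigger input on line 2513.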
【０４３５】
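The operation of the TIGF flip-flop embodiment of the present invention will now be discussed. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: D flip-flop 2491 via line 2499, D flip-flop 2492 via line 2502, and D flip-flop 2496 via line 2513.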
【０４３６】
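The TIGF flip-flop stores the input value only when an edge of the clock signal has been detected. In accordance with one embodiment of the present invention, the required edge is the positive edge of the clock signal. To detect the positive edge of the clock signal, an edge detector 2515 is provided. The edge detector 2515 includes D flip-flop 2496 and AND gate 2497. The edge detector 2515 is also updated via the trigger signal on line 2513 of D flip-flop 2496.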
【０４３７】
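D flip-flop 2491 holds the new input value of the TIGF flip-flop and resists any changes to the D input on line 2498 until the trigger signal is provided on line 2499. Thus, before each evaluation period of the TIGF flip-flop, the new value is stored in D flip-flop 2491. By pre-storing the new value until the TIGF flip-flop is updated by the trigger signal, the TIGF flip-flop avoids hold time violations.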
【０４３８】
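D flip-flop 2492 holds the current value (i.e., old value) of the TIGF flip-flop until the trigger signal is provided on line 2502. This value is the state of the emulated TIGF flip-flop after it has been updated and before the next evaluation period. The input to D flip-flop 2492 on line 2501 holds the new value (which is the same as the value on line 2500 for a significant duration of the evaluation period).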
【０４３９】
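Multiplexer 2493 receives the new input value on line 2500 and, on line 2503, the old value currently stored in the TIGF flip-flop. Based on the selector signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. Until all the signals propagating through the hardware model of the user design approach steady state, this output may change with any clock glitch. By the end of the evaluation period, however, the input on line 2501 presents the new value stored in flip-flop 2491. When the trigger signal is received by the TIGF flip-flop, flip-flop 2492 stores the new value that was present on line 2501, and flip-flop 2491 stores the next new value on line 2498. Thus, the TIGF flip-flop in accordance with one embodiment of the present invention is not negatively affected by clock glitches.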
【０４４０】
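To elaborate, this TIGF flip-flop also provides some immunity against clock glitches. One skilled in the art will realize that by replacing flip-flops 2420, 2421, and 2423 shown in FIG. 77A with the TIGF flip-flop embodiment of FIG. 81B, clock glitches will not affect any circuit that uses this TIGF flip-flop. Referring to FIGS. 77A and 77B for a moment, the clock glitch negatively affected the circuit of FIG. 77A because, between times t_1 and t_2, flip-flop 2423 clocked in a new value when it should not have. The skew between the CLK1 and CLK2 signals caused XOR gate 2422 to generate a logic 1 state between times t_1 and t_2, which drove the clock line of the next flip-flop 2423. With the TIGF flip-flop in accordance with the embodiment of the present invention, clock glitches do not affect the clocking in of new values. With flip-flop 2423 replaced by the TIGF flip-flop, once the signals have attained steady state during the evaluation period, the trigger signal during the short trigger period enables the TIGF flip-flop to store the new value in flip-flop 2491 (FIG. 81B). Thereafter, any clock glitch, such as the clock glitch of FIG. 77B during the time interval from t_1 to t_2, will not clock in a new value. The TIGF flip-flop updates only with the trigger signal, and this trigger signal is not presented to the TIGF flip-flop until after the evaluation period, when the signals propagating through the circuit have attained steady state.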
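The behavior described for FIG. 81B can be sketched as a small Python model. This is an illustrative reading of the text rather than the circuit itself: the names are hypothetical, set is assumed active high and reset active low, and the edge detector is assumed to AND the current CLK with the inverted value held in flip-flop 2496 (the text does not state the inversion explicitly).

```python
class TIGFFlipFlop:
    """Behavioral model of the TIGF D-type flip-flop (one bit)."""

    def __init__(self, state: int = 0) -> None:
        self.new_d = 0       # flip-flop 2491: D value sampled at the last trigger
        self.state = state   # flip-flop 2492: current (old) value
        self.prev_clk = 0    # flip-flop 2496: CLK value at the last trigger

    def output(self, clk: int, s: int = 0, r: int = 1) -> int:
        """Output Q (line 2501) during the evaluation period."""
        edge = clk & (1 - self.prev_clk)          # edge detector 2515 (assumed)
        mux = self.new_d if edge else self.state  # multiplexer 2493
        return (mux | s) & r                      # OR 2494, AND 2495

    def trigger(self, d: int, clk: int, s: int = 0, r: int = 1) -> int:
        """Global trigger: all three internal flip-flops update together."""
        q = self.output(clk, s, r)
        self.state, self.new_d, self.prev_clk = q, d, clk
        return q


ff = TIGFFlipFlop()
ff.trigger(d=1, clk=0)        # D = 1 is pre-stored; CLK is low
print(ff.output(clk=1))       # rising edge seen: new value selected
print(ff.output(clk=0))       # glitch back to low: old value selected
print(ff.state)               # stored state unchanged by the glitch
print(ff.trigger(d=0, clk=1)) # steady state CLK = 1 commits the new value
```

The output may wander while CLK glitches during evaluation, but the stored state only changes at the trigger, using the D value that was sampled at the previous trigger.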
【０４４１】
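Although this particular embodiment of the TIGF flip-flop is a D-type flip-flop, other flip-flops (e.g., T, JK, SR) are within the scope of the present invention. Other types of edge-triggered flip-flops can be derived from the D flip-flop by adding some AND/OR logic before the D input.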
【０４４２】
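(VII. Simulation Server)
A simulation server in accordance with another embodiment of the present invention is provided to allow multiple users to access the same reconfigurable hardware unit and to effectively simulate and accelerate the same or different user designs in a time-shared manner. A high-speed simulation scheduler and state swapping mechanisms are employed to feed the simulation server with active simulation processes, which results in high throughput. The server provides multiple users or multiple processes with access to the reconfigurable hardware unit for acceleration and hardware state swapping purposes. Once the acceleration has been accomplished or the hardware state has been accessed, each user or process can simulate in software only, thus releasing control of the reconfigurable hardware unit to other users or processes.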
【０４４３】
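In the simulation server portion of this specification, terms such as "job" and "process" are used. In this specification, the terms "job" and "process" are generally used interchangeably. In the past, batch systems executed "jobs" and time-shared systems stored and executed "processes" or programs. In today's systems, these jobs and processes are similar. Thus, in this specification, the term "job" is not limited to batch-type systems and "process" is not limited to time-shared systems. Rather, at one extreme, a "job" is equivalent to a "process" if the "process" can be executed within a time slice or without interruption by any other time-shared intervenor. At the other extreme, a "job" is a subset of a "process" if the "process" requires multiple time slices to complete. So, if a "process" requires multiple time slices to execute to completion because of the presence of other equal-priority users/processes, the "process" is divided into "jobs." Moreover, if a "process" does not require multiple time slices to execute to completion, because it is the sole high-priority user or because the process is short enough to complete within a time slice, the "process" is equivalent to a "job." Thus, a user can interact with one or more "processes" or programs that have been loaded and executed in the simulation system, and each "process" may require one or more "jobs" to complete in a time-shared system.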
【０４４４】
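In one system configuration, multiple users at remote terminals can use the same multiprocessor workstation in a non-network environment to access the same reconfigurable hardware unit and review/debug the same or different user circuit designs. In a non-network environment, remote terminals are connected to a main computing system for access to its processing functions. This non-network configuration allows multiple users to share access to the same user design for parallel debugging purposes. The access is accomplished via a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In other instances, multiple users may access the same reconfigurable hardware unit via the server for their own separate and different user designs for debugging purposes. In this configuration, the multiple users or processes share the multiple microprocessors in the workstation with the operating system. In another configuration, multiple users or processes on separate microprocessor-based workstations can access the same reconfigurable hardware unit to review/debug the same or different user circuit designs across a network. Similarly, the access is accomplished via a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In a network environment, the scheduler listens for network requests through UNIX® socket system calls. The operating system uses sockets to send commands to the scheduler.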
【０４４５】
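As mentioned above, the simulation scheduler uses a preemptive multiple priority round robin algorithm. In other words, higher priority users or processes are served first, until the user or process completes the job and ends the session. Among users or processes of equal priority, a preemptive round robin algorithm is used in which each user or process is assigned an equal time slice in which to execute its operations until completed. The time slice is short enough that multiple users or processes do not have to wait a long time before being served. The time slice is also long enough that sufficient operations are executed before the scheduler of the simulation server interrupts one user or process to swap in and execute the new user's job. In one embodiment, the default time slice is 5 seconds and is user-settable. In one embodiment, the scheduler makes specific calls to the operating system's built-in scheduler.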
【０４４６】
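FIG. 45 shows a non-network environment with a multiprocessor workstation in accordance with one embodiment of the present invention. FIG. 45 is a variation of FIG. 1, and accordingly like reference numerals are used for like components/units. Workstation 1100 includes local bus 1105, a host/PCI bridge 1106, memory bus 1107, and main memory 1108. A cache memory subsystem (not shown) may also be provided. Other user interface units (e.g., monitor, keyboard) are also provided but are not shown in FIG. 45. Workstation 1100 also includes multiple microprocessors 1101, 1102, 1103, and 1104 coupled to the local bus 1105 via a scheduler 1117 and connections/paths 1118. As known to those skilled in the art, an operating system 1121 provides the user-hardware interface foundation for the entire computing environment, managing files and allocating resources for the various users, processes, and devices in the computing environment. For conceptual purposes, the operating system 1121 is shown along with a bus 1122. References on operating systems can be found in Abraham Silberschatz and James L. Peterson, OPERATING SYSTEM CONCEPTS (1998), and William Stallings, MODERN OPERATING SYSTEMS (1996), which are incorporated herein by reference.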
【０４４７】
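In one embodiment, the workstation 1100 is a Sun Microsystems Enterprise 450 system that employs UltraSPARC II processors. Instead of memory access via the local bus, the Sun 450 system allows the multiprocessors to access memory via dedicated buses to the memory through a crossbar switch. Thus, multiple processes can run on multiple microprocessors executing their respective instructions and can access memory without going through the local bus. The Sun 450 system along with the Sun UltraSPARC multiprocessor specifications are incorporated herein by reference. The Sun Ultra 60 system is another example of a multiprocessor system, although it allows only two processors.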
【０４４８】
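The scheduler 1117 provides time-shared access to the reconfigurable hardware unit 20 via the device driver 1119 and connections/paths 1120. The scheduler 1117 is implemented mostly in software, to interact with the operating system of the host computing system, and partially in hardware, to interact with the simulation server by supporting simulation job interruption and the swapping in and out of simulation sessions. The scheduler 1117 and device driver 1119 are discussed in more detail below.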
【０４４９】
Each microprocessor 1101 to 1104 is capable of processing independently of the other microprocessors in workstation 1100. In one embodiment of the present invention, the workstation 1100 operates under a UNIX®-based operating system, although in other embodiments the workstation 1100 can operate under a Windows®-based or Macintosh-based operating system. For UNIX®-based systems, the user interface is equipped with X-Windows to manage programs, tasks, and files as necessary. For details on the UNIX® operating system, reference is made to Maurice J. Bach, THE DESIGN OF THE UNIX® OPERATING SYSTEM (1986).
【０４５０】
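In FIG. 45, multiple users can access workstation 1100 via remote terminals. At times, each user may use a particular CPU to run its processes. At other times, each user uses different CPUs depending on resource limitations. Usually, the operating system 1121 determines such accesses, and indeed the operating system itself may jump from one CPU to another to accomplish its tasks. To handle the time-sharing process, the scheduler listens for network requests through socket system calls, makes system calls to the operating system 1121, and then handles interrupts to the reconfigurable hardware unit 20 by initiating the generation of interrupt signals by the device driver 1119. Such interrupt signal generation is one of many steps in the scheduling algorithm, which includes stopping the current job, saving state information for the currently interrupted job, swapping jobs, and executing the new job. The server scheduling algorithm is discussed below.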
【０４５１】
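Sockets and socket system calls will now be discussed briefly. In one embodiment, the UNIX® operating system can operate in a time-sharing mode. The UNIX® kernel allocates the CPU to a process for a period of time (e.g., a time slice) and, at the end of the time slice, preempts the process and schedules another one for the next time slice. The process preempted in the previous time slice is rescheduled for execution in a later time slice.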
【０４５２】
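One scheme for enabling and facilitating interprocess communication and allowing the use of sophisticated network protocols is sockets. The kernel has three layers that function in the context of the client-server model: the socket layer, the protocol layer, and the device layer. The top layer, the socket layer, provides the interface between the system calls and the lower layers (the protocol layer and the device layer). Typically, a socket has end points that couple client processes with server processes. The socket end points can be on different machines. The middle layer, the protocol layer, provides the protocol modules for communication, such as TCP and IP. The bottom layer, the device layer, contains the device drivers that control the network devices. One example of a device driver is an Ethernet® driver over an Ethernet®-based network.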
【０４５３】
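Processes communicate using the client-server model, in which a server process listens to a socket at one end point and a client process communicates with the server process over another socket at the other end point of a two-way communication path. The kernel maintains internal connections among the three layers of each client and server and routes data from client to server as needed.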
【０４５４】
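Sockets involve several system calls, including the socket system call, which establishes the end points of a communication path. Many processes use the socket descriptor sd in many system calls. The bind system call associates a name with a socket descriptor. Some other exemplary system calls include the connect system call, which requests that the kernel make a connection to a socket; the close system call, which closes sockets; the shutdown system call, which closes a socket connection; and the send and recv system calls, which transmit data over a connected socket.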
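For illustration, the system calls named above map directly onto Python's standard `socket` module. The sketch below runs a one-shot server and a client in the same process over the loopback interface; the helper names, the upper-casing "service", and the use of a thread are assumptions made only to keep the example self-contained.

```python
import socket
import threading


def serve_once(server_sock: socket.socket) -> None:
    """Accept one client, read until it stops sending, reply in upper case."""
    conn, _ = server_sock.accept()
    with conn:
        chunks = []
        while True:
            data = conn.recv(1024)            # recv system call
            if not data:                      # client shut down its write side
                break
            chunks.append(data)
        conn.sendall(b"".join(chunks).upper())  # send system call


def round_trip(message: bytes) -> bytes:
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server.bind(("127.0.0.1", 0))     # bind: associate a name (port 0 = any free port)
    server.listen(1)
    port = server.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server,))
    worker.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))   # connect
    client.sendall(message)               # send
    client.shutdown(socket.SHUT_WR)       # shutdown: no more data from the client
    reply = b""
    while True:
        data = client.recv(1024)          # recv
        if not data:
            break
        reply += data
    client.close()                        # close
    worker.join()
    server.close()
    return reply


print(round_trip(b"queue status"))
```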
【０４５５】
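FIG. 46 shows another embodiment in accordance with the present invention, in which multiple workstations share a single simulation system on a time-shared basis across a network. The multiple workstations are coupled to the simulation system via a scheduler 1117. Within the computing environment of the simulation system, a single CPU 11 is coupled to the local bus 12 in workstation 1110. Multiple CPUs may also be provided in this system. As known to those skilled in the art, an operating system 1121 is also provided, and nearly all processes and applications reside on top of the operating system. For conceptual purposes, the operating system 1121 is shown along with a bus 1122.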
【０４５６】
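In FIG. 46, workstation 1110 includes the components/units found in FIG. 1, along with a scheduler 1117 and scheduler bus 1118 coupled to the local bus 12 via the operating system 1121. Scheduler 1117 controls the time-shared access of the user stations 1111, 1112, and 1113 by making socket calls to the operating system 1121. Scheduler 1117 is implemented mostly in software and partially in hardware.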
【０４５７】
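In this figure, only three users are shown accessing the simulation system across the network. Of course, other system configurations provide for more than three users or fewer than three users. Each user accesses the system via remote station 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to the scheduler 1117 via network connections 1114, 1115, and 1116, respectively.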
【０４５８】
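As known to those skilled in the art, device driver 1119 is coupled between the PCI bus 50 and the reconfigurable hardware unit 20. A connection or electrically conductive path 1120 is provided between the device driver 1119 and the reconfigurable hardware unit 20. In this network multi-user embodiment of the present invention, the scheduler 1117 interfaces with the device driver 1119 via the operating system 1121 to communicate with and control the reconfigurable hardware unit 20 for hardware acceleration and for simulation after hardware state restoration.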
【０４５９】
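Again, in one embodiment, the simulation workstation 1100 is a Sun Microsystems Enterprise 450 system that employs UltraSPARC II multiprocessors. Instead of tying up the local bus for memory access, the Sun 450 system allows the multiprocessors to access memory via dedicated memory buses through a crossbar switch.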
【０４６０】
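FIG. 47 shows a high-level structure of the simulation server in accordance with the network embodiment of the present invention. Here, the operating system is not explicitly shown but, as known to those skilled in the art, it is always present for file management and resource allocation to serve the various users, processes, and devices in the simulation computing environment. Simulation server 1130 includes the scheduler 1137, one or more device drivers 1138, and the reconfigurable hardware unit 1139. Although not expressly shown as a single integral unit in FIGS. 45 and 46, the simulation server comprises the scheduler 1117, device driver 1119, and reconfigurable hardware unit 20. Returning to FIG. 47, the simulation server 1130 is coupled to three workstations (or users) 1131, 1132, and 1133 via network connections/paths 1134, 1135, and 1136, respectively. As stated above, more than three or fewer than three workstations may be coupled to the simulation server 1130.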
【０４６１】
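The scheduler in the simulation server is based on a preemptive round robin algorithm. In essence, the round robin scheme allows several users or processes to execute sequentially to completion with a cyclic executive. Thus, each simulation job (which is associated with a workstation in a network environment or with a user/process in a multiprocessing non-network environment) is assigned a priority level and a fixed time slice in which to execute.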
【０４６２】
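Generally, the higher priority jobs execute first to completion. At one extreme, if different users each have different priorities, the user with the highest priority is served first until this user's job(s) is completed, and the user with the lowest priority is served last. Here, no time slice is used because each user has a different priority and the scheduler merely serves users according to priority. This scenario is analogous to having only one user access the simulation system until completion.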
【０４６３】
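At the other extreme, the different users have equal priority. Thus, the time slice concept with a first-in first-out (FIFO) queue is employed. Among equal priority jobs, each job executes until it completes or the fixed time slice expires, whichever comes first. If the job does not execute to completion during its time slice, the simulation image associated with whatever tasks it has completed must be saved for later restoration and execution. This job is then placed at the end of the queue. The saved simulation image, if any, for the next job is then restored and executed in the next time slice.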
【０４６４】
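A higher priority job can preempt a lower priority job. In other words, jobs of equal priority run in round robin fashion until they execute through their time slices to completion. Thereafter, jobs of lower priority run in round robin fashion. If a job of higher priority is inserted in the queue while a lower priority job is running, the higher priority job preempts the lower priority job and runs until it executes to completion. Thus, jobs of higher priority run to completion before jobs of lower priority begin execution. If a lower priority job has already begun execution, it will not run further until the higher priority job has run to completion.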
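The scheduling policy described in the preceding paragraphs can be sketched in Python as follows. This is a simplified simulation, not the server's scheduler: the function name and job tuple layout are hypothetical, a lower number is assumed to mean higher priority, and preemption by a newly arrived job is assumed to take effect at the next time slice boundary.

```python
from collections import deque


def schedule(jobs, time_slice=5):
    """Simulate preemptive multiple priority round robin scheduling.

    jobs: list of (name, priority, run_time, arrival_time) tuples.
    Returns the execution trace as a list of (name, time_used) tuples.
    """
    pending = sorted(jobs, key=lambda job: job[3])  # by arrival time
    queues = {}      # priority -> FIFO queue of job names
    remaining = {}   # job name -> run time still needed
    trace = []
    now = 0
    i = 0
    while i < len(pending) or any(queues.values()):
        # Admit every job that has arrived by now to the end of its queue.
        while i < len(pending) and pending[i][3] <= now:
            name, priority, run_time, _ = pending[i]
            queues.setdefault(priority, deque()).append(name)
            remaining[name] = run_time
            i += 1
        ready = [p for p, q in queues.items() if q]
        if not ready:
            now = pending[i][3]   # nothing to run: jump to the next arrival
            continue
        queue = queues[min(ready)]            # highest priority level first
        name = queue.popleft()
        used = min(time_slice, remaining[name])
        remaining[name] -= used
        now += used
        trace.append((name, used))
        if remaining[name] > 0:               # unfinished: back of the FIFO
            queue.append(name)
    return trace


# Two equal-priority jobs share the unit in round robin fashion.
print(schedule([("A", 1, 10, 0), ("B", 1, 5, 0)]))
# A high-priority job arriving at t=5 preempts the low-priority job.
print(schedule([("L", 1, 12, 0), ("H", 0, 7, 5)]))
```

In a real server, each entry in the trace would correspond to restoring a saved simulation image, running for one slice, and saving the image again if the job is unfinished.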
【０４６５】
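In one embodiment, the UNIX® operating system provides the basic, foundational preemptive round robin scheduling algorithm. The simulation server's scheduling algorithm in accordance with one embodiment of the present invention works in conjunction with the operating system's scheduling algorithm. In UNIX®-based systems, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To enable the time-sharing scheme, the simulation scheduler uses a preemptive multiple priority round robin algorithm on top of the operating system's own scheduling algorithm.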
【０４６６】
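The relationship between the multiple users and the simulation server in accordance with one embodiment of the present invention follows a client-server model, where the multiple users are clients and the simulation server is the server. Communication between the user clients and the server occurs via socket calls. Referring briefly to FIG. 55, the client includes client program 1109, a socket system call component 1123, UNIX® kernel 1124, and a TCP/IP protocol component 1125. The server includes a TCP/IP protocol component 1126, a UNIX® kernel 1127, a socket system call component 1128, and the simulation server 1129. Multiple clients may request that simulation jobs be simulated in the server through UNIX® socket calls from the client application program.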
【０４６７】
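In one embodiment, a typical sequence of events includes multiple clients sending requests to the server via the UNIX® socket protocol. For each request, the server acknowledges the request and indicates whether the command was executed successfully. For a request for the server queue status, however, the server replies with the current queue state so that it can be properly displayed to the user. Table F below lists the relevant socket commands from the client.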
【０４６８】
【表８】

【０４６９】
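In each socket call, each command, encoded as an integer, may be followed by additional parameters such as <design>, which represents the design name. The response from the simulation server is "0" if the command is executed successfully or "1" if the command fails. For command "5", which requests the queue status, one embodiment of the command's reply is ASCII text, terminated by a "\0" character, for display on the user's screen. With these system socket calls, the appropriate communication protocol signals are transmitted to and received from the reconfigurable hardware unit via the device drivers.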
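The request/response convention just described can be sketched in Python. Only the facts stated in the text are used: commands are integers, "0" means success, "1" means failure, and command 5 returns ASCII queue text terminated by "\0". Everything else is hypothetical, including the space-separated wire format, the function names, and command number 1 in the usage example (Table F is not reproduced here).

```python
from typing import Callable, Dict, Optional

QUEUE_STATUS = 5  # the only command number stated in the text


def encode_request(command: int, design: Optional[str] = None) -> bytes:
    """Encode an integer command, optionally followed by a design name."""
    parts = [str(command)]
    if design is not None:
        parts.append(design)
    return " ".join(parts).encode("ascii")


def handle_request(raw: bytes,
                   handlers: Dict[int, Callable[[Optional[str]], bool]],
                   queue_text: str) -> bytes:
    """Return b"0" on success, b"1" on failure, or NUL-terminated queue text."""
    try:
        fields = raw.decode("ascii").split(maxsplit=1)
        command = int(fields[0])
    except (UnicodeDecodeError, IndexError, ValueError):
        return b"1"                      # malformed request
    design = fields[1] if len(fields) > 1 else None
    if command == QUEUE_STATUS:
        return queue_text.encode("ascii") + b"\0"
    handler = handlers.get(command)
    ok = handler is not None and handler(design)
    return b"0" if ok else b"1"


# Hypothetical command 1 succeeds only for a design the server knows about.
handlers = {1: lambda design: design == "cpu_core"}
print(handle_request(encode_request(1, "cpu_core"), handlers, ""))
print(handle_request(encode_request(QUEUE_STATUS), handlers, "job1 job2"))
```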
【０４７０】
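FIG. 48 shows one embodiment of the architecture of the simulation server in accordance with the present invention. As explained above, multiple users or multiple processes may be served by a single simulation server for simulation and hardware acceleration of the users' designs in a time-shared manner. Thus, users/processes 1147, 1148, and 1149 are coupled to the simulation server 1140 via interprocess communication paths 1150, 1151, and 1152, respectively. The interprocess communication paths 1150, 1151, and 1152 may reside in the same workstation for multiprocessor configuration and operation, or in the network for multiple workstations. Each simulation session contains software simulation states along with hardware states for communication with the reconfigurable hardware unit. Interprocess communication among the software sessions is performed using UNIX® sockets or system calls, which make it possible for a simulation session to reside on the same workstation in which the simulator plug-in card is installed or on a separate workstation connected via a TCP/IP network. Communication with the simulation server is initiated automatically.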
【０４７１】
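In FIG. 48, simulation server 1140 includes the server monitor 1141, a simulation job queue table 1142, a priority sorter 1143, a job swapper 1144, device driver(s) 1145, and the reconfigurable hardware unit 1146. The simulation job queue table 1142, priority sorter 1143, and job swapper 1144 make up the scheduler 1137 shown in FIG. 47.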
【０４７２】
サーバモニタ１１４１は、システムの管理者にユーザインターフェース機能を提供する。ユーザは、キューにおけるシミュレーションジョブ、スケジューリング優先度、使用履歴、およびシミュレーションジョブスワップ効率を表示するようにシステムに命令することによって、シミュレーションサーバのステータスをモニタし得る。他のユティリティ機能には、ジョブ優先度の編集、シミュレーションジョブの削除、およびシミュレーションサーバ状態のリセットが含まれる。
【０４７３】
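The simulation job queue table 1142 holds a list of all outstanding simulation requests that the scheduler has inserted into the queue. Each table entry includes a job number, a software simulation number, a software simulation image, a hardware simulation image file, a design configuration file, a priority number, a hardware size, a software size, the accumulated simulation run time, and an owner identification. The job queue is implemented as a first-in, first-out (FIFO) queue, so a newly requested job is placed at the end of the queue.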
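A minimal sketch of such a queue table follows. The class and field names (`SimulationJob`, `JobQueueTable`, and so on) are hypothetical; only a subset of the table entries listed above is modeled, and the FIFO behavior is the property being illustrated.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class SimulationJob:
    # Field names are illustrative; they mirror some of the table entries in the text.
    job_number: int
    software_image: str
    hardware_image_file: str
    design_config_file: str
    priority: int
    owner: str
    accumulated_run_time: float = 0.0


class JobQueueTable:
    """FIFO list of outstanding simulation requests."""

    def __init__(self):
        self._jobs = deque()

    def enqueue(self, job):
        # A newly requested job always goes to the end of the queue.
        self._jobs.append(job)

    def pending(self):
        # Jobs in arrival order, oldest first.
        return list(self._jobs)
```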
【０４７４】
優先度分類器１１４３は、キューにおけるいずれのシミュレーションジョブが実行されるかを決定する。一実施形態において、シミュレーションジョブ優先度方式は、ユーザにより定義可能（すなわち、システム管理者によって制御可能、且つ定義可能）であり、いずれのシミュレーションプロセスが、現在の実行について優先度を有するか制御する。一実施形態において、優先度レベルは、特定のプロセスまたは特定のユーザの重要度に基づいて、固定される。他の実施形態において、優先度レベルは動的であり、シミュレーション中に変更され得る。好適な実施形態において、優先度は、ユーザＩＤに基づく。典型的には、１人のユーザの優先度が高く、他の全てのユーザの優先度は、低いが等しい。
【０４７５】
優先度レベルは、システム管理者によって設定可能である。シミュレータサーバは、全てのユーザ情報を、典型的には、「／ｅｔｃ／ｐａｓｓｗｄ」と呼ばれる、ＵＮＩＸ（登録商標）ユーザファイルにおいて見出される、ＵＮＩＸ（登録商標）設備から入手する。新たなユーザを追加することは、新たなユーザをＵＮＩＸ（登録商標）システム内に追加するプロセスと整合する。全てのユーザを定義した後、シミュレータサーバモニタは、ユーザの優先度レベルを調節するために用いられ得る。
【０４７６】
ジョブスワップ器１１４４は、一時的に、スケジューラに関してプログラムされた優先度決定に基づいて、あるプロセスまたはあるワークステーションに関連する、あるシミュレーションジョブを、他のプロセスまたはワークステーションに関連する、他のシミュレーションジョブと取り換える。複数のユーザが、同じ設計をシミュレートする場合、ジョブスワップ器は、シミュレーションセッションについて、格納されたシミュレーション状態のみを取り換える。しかし、複数ユーザが複数の設計をシミュレートする場合、ジョブスワップ器が、シミュレーション状態において交換される前に、ハードウェア設定用の設計をロードする。一実施形態において、ジョブ交換は、再構成可能ハードウェア装置アクセスについてのみ行われる必要があるので、ジョブ交換メカニズムは、本発明の時分割実施形態の性能を向上させる。従って、１つのユーザが、ある程度の期間、ソフトウェアシミュレーションを必要とする場合、サーバは、他のユーザの他のジョブを交換して、この他のユーザが、ハードウェア促進用の再構成可能なハードウェア装置にアクセスし得るようにする。ジョブ交換の頻度は、ユーザによる調節およびプログラムが可能である。デバイスドライバは、ジョブを交換する、再構成可能なハードウェア装置と通信する。
【０４７７】
次に、シミュレーションサーバの動作を説明する。図４９は、動作中のシミュレーションサーバのフローチャートである。初期的には、工程１１６０で、システムはアイドルである。システムが工程１１６０でアイドルである場合、シミュレーションサーバは、必ずしも、イナクティブであるわけではないし、シミュレーションタスクが実行していないわけでもない。実際には、アイドルとは、下記のうちの１つを意味する。（１）シミュレーションが実行されていない。（２）１つのユーザ／ワークステーションのみが、１つのプロセッサ環境においてアクティブであり、時分割が必要とされない。あるいは、（３）マルチプロセス環境において１つのユーザ／ワークステーションのみがアクティブであるが、１つのプロセスのみが実行されている。従って、上記の状態２および３は、シミュレーションサーバが、１つしか処理するジョブを有さず、従って、ジョブをキューに並べ、優先度を決定し、ジョブをスワップすることが、必要、且つ、本質的でないことを示し、シミュレーションサーバは、他のワークステーションまたはプロセスから、リクエスト（イベント１１６１）を受け取らないので、アイドルである。
【０４７８】
シミュレーションリクエストが、複数のユーザ環境のワークステーションから、または複数のプロセッサ環境のマイクロプロセッサからの１つ以上のリクエスト信号に起因して発生する場合、シミュレーションサーバは、工程１１６２で、入来するシミュレーションジョブ（単数または複数）をキューに並べる。スケジューラは、全ての処理中のシミュレーションリクエストをそのキューに挿入して、全ての処理中のシミュレーションリクエストをリストに挙げるように、シミュレーションジョブキューテーブルを保持する。バッチシミュレーションジョブについて、サーバにおけるスケジューラは、全ての入来シミュレーションリクエストをキューに並べ、人間の介入なしで、タスクを自動的に処理する。
【０４７９】
その後、シミュレーションサーバは、キューに並べられたジョブを分類して、工程１１６３において、優先度を決定する。この工程は、複数のジョブについて、再構成可能なハードウェア装置へのアクセスを提供するため、サーバがその間で優先順位を付ける必要がある場合、特に重要である。優先度分類器は、キューにおけるいずれのシミュレーションジョブが実行されるかを決定する。一実施形態において、シミュレーションジョブ優先度方式は、リソース競合が存在する場合、現在の実行について、いずれのシミュレーションプロセスが優先度を有するかを制御するように、ユーザにより定義可能（すなわち、システム管理者によって制御可能、且つ定義可能）である。
【０４８０】
工程１１６３における優先度の分類の後、サーバは、必要に応じて、工程１１６４において、シミュレーションジョブを交換する。この工程は、サーバにおいてスケジューラに関してプログラムされた優先度決定に基づいて、一時的に、あるプロセスまたはあるワークステーションに関連するあるシミュレーションジョブを、他のプロセスまたはワークステーションに関連する他のシミュレーションジョブと置き換える。複数のユーザが、同じ設計をシミュレートする場合、ジョブスワップ器は、シミュレーションセッションについて、格納されたシミュレーション状態のみを取り換える。しかし、複数のユーザが複数の設計をシミュレートする場合、ジョブスワップ器が、まず、シミュレーション状態において交換される前に、ハードウェア設定用の設計をロードする。ここで、デバイスドライバは、ジョブを交換するように、再構成可能なハードウェア装置とも通信する。
【０４８１】
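In one embodiment, job swapping needs to occur only for access to the reconfigurable hardware unit, so the job swapping mechanism improves the performance of the time-shared embodiment of the present invention. Thus, if one user needs software simulation for some period of time, the server swaps in another user's job so that the other user can access the reconfigurable hardware unit for hardware acceleration. For example, suppose two users, user 1 and user 2, are coupled to the simulation server for access to the reconfigurable hardware unit. At some point user 1 has access to the system so that debugging can be performed on user 1's design. If user 1 is debugging in software mode only, the server can release the reconfigurable hardware unit so that user 2 can access it. The server swaps in user 2's job, and user 2 can then either run software simulation of the model or use hardware acceleration. Depending on the priorities of user 1 and user 2, user 2 may continue to access the reconfigurable hardware unit for some predetermined time, or, if user 1 needs the reconfigurable hardware unit for acceleration, the server may preempt user 2's job so that user 1's job can be swapped in for hardware acceleration on the reconfigurable hardware unit. The predetermined time governs preemption of simulation jobs when multiple requests have the same priority. In one embodiment, the default time is five minutes, but this time is user-configurable. The five-minute setting represents one form of timeout timer. The simulation system of the present invention uses the timeout timer to stop execution of the current simulation job when the system determines that the current job is taking too long and that other pending jobs of equal priority need access to the reconfigurable hardware model.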
【０４８２】
工程１１６４においてジョブスワップ工程が完了する場合、サーバ内のデバイスドライバが、再構成可能なハードウェア装置をロックするので、現在スケジューリングされているユーザまたはプロセスのみが、シミュレートし、ハードウェアモデルを用いることができる。ロックおよびシミュレーション工程は、工程１１６５において発生する。
【０４８３】
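When, at event 1166, the current simulation session either completes or is suspended, the server returns to the priority sorting step 1163 to determine the priority of the pending simulation jobs and, if necessary, swap simulation jobs. Similarly, at event 1167, the server may preempt the currently active simulation job, which also returns the server to the priority sorting state 1163. Preemption occurs only under certain conditions. One such condition is that a higher-priority task or job is pending. Another is that the system is currently running a computation-intensive simulation task. In that case, the scheduler can be programmed to use a timeout timer to preempt the currently running job and schedule a task or job of equal priority. In one embodiment, the timeout timer is set to five minutes: if the current job has run for five minutes, the system preempts it and swaps in the pending job, even though the pending job has the same priority level.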
【０４８４】
図５０は、ジョブスワッププロセスのフローチャートである。ジョブスワップ機能は、図４９の工程１１６４において行われ、図４８のジョブスワップ器１１４４として、シミュレーションサーバハードウェア内に示される。図５０において、シミュレーションジョブが他のシミュレーションジョブとスワップされる必要がある場合、ジョブスワップ器は、工程１１８０において、再構成可能なハードウェア装置に割り込みを送信する。再構成可能なハードウェア装置が、現在あらゆるジョブを実行していない（すなわち、システムがアイドルであるか、または、ユーザが、任意のハードウェア促進介入のみがないソフトウェアシミュレーションモードで操作している）場合、割り込みは、直ちに、再構成可能なハードウェア装置をジョブスワップに備えて準備する。しかし、再構成可能なハードウェア装置が、現在、ジョブを実行している場合、命令を実行しているか、または、データを処理している最中で、割り込み信号が認識されるが、再構成可能な装置は、現在保留中の命令の実行、および現在のジョブのデータの処理を継続する。現在のシミュレーションジョブが命令の実行またはデータの処理の最中でないときに、再構成可能なハードウェア装置が割り込み信号を受信する場合、割り込み信号は、直ちに、再構成可能なハードウェア装置の動作を実質的に終わらせる。
【０４８５】
工程１１８１において、シミュレーションシステムは、現在のシミュレーションイメージ（すなわち、ハードウェアおよびソフトウェア状態）を保存する。このイメージを保存することによって、ユーザは、後で、保存された時点までシミュレーション全体を再実行することなく、シミュレーション実行を復元し得る。
【０４８６】
工程１１８２において、シミュレーションシステムは、新たなユーザ設計を用いて、再構成可能なハードウェア装置を設定する。この設定工程は、新たなジョブが、設定済みであり、再構成可能なハードウェア装置にロードされた設計とは異なるユーザ設計と関連し、実行がちょうど割り込まれたところである場合にのみ必要である。設定後、保存されたハードウェアシミュレーションイメージは、工程１１８３において再ロードされ、保存されたソフトウェアシミュレーションイメージは、工程１１８４において再ロードされる。新たなシミュレーションジョブが同じ設計と関連する場合、さらなる設定は必要とされない。同じ設計について、シミュレーションシステムは、工程１１８３におけるその同じ設計の新たなシミュレーションジョブと関連する、所望のハードウェアシミュレーションイメージを、新たなジョブのシミュレーション設計が、ちょうど割り込まれたところのジョブのシミュレーションイメージとは恐らく異なるので、ロードする。設定工程の細部は、この特許明細書中で提供される。その後、関連するソフトウェアシミュレーションイメージは、工程１１８４において、再ロードされる。ハードウェアおよびソフトウェアシミュレーションイメージの再ロードの後、工程１１８５において、この新たなジョブについて、シミュレーションが開始し得、以前に割り込まれたジョブは、しばらくは、再構成可能なハードウェア装置へのアクセスがないので、ソフトウェアシミュレーションモードのみで進み得る。
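The job swap flow of FIG. 50 (steps 1180 to 1185) can be summarized in a short sketch. The function name `swap_job_steps` and the step strings are hypothetical; the sketch only shows that the configuration step 1182 is performed when the incoming job uses a different design from the one currently loaded, and is skipped otherwise.

```python
from typing import List


def swap_job_steps(loaded_design: str, new_design: str) -> List[str]:
    """Return the ordered steps of one job swap, labeled with the FIG. 50 step numbers."""
    steps = [
        "1180 interrupt reconfigurable hardware",
        "1181 save current simulation image (hardware and software state)",
    ]
    if new_design != loaded_design:
        # Reconfiguration is needed only when the incoming job uses a different design.
        steps.append("1182 configure hardware with new user design")
    steps += [
        "1183 reload hardware simulation image",
        "1184 reload software simulation image",
        "1185 start simulation of new job",
    ]
    return steps
```

Two users simulating the same design therefore exchange only simulation state, which is the cheaper of the two cases.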
【０４８７】
図５１は、デバイスドライバと再構成可能なハードウェア装置との間の信号を示す図である。デバイスドライバ１１７１は、スケジューラ１１７０と再構成可能なハードウェア装置１１７２との間のインターフェースを提供する。また、デバイスドライバ１１７１は、図４５および４６に示すように、計算環境全体（すなわち、単数または複数のワークステーション、ＰＣＩバス、ＰＣＩデバイス）と、再構成可能なハードウェア装置１１７２との間のインターフェースを提供するが、図５１には、シミュレーションサーバ部分のみを示す。デバイスドライバと再構成可能なハードウェア装置との間の信号には、双方向通信ハンドシェイク信号と、計算環境から、スケジューラを介して再構成可能なハードウェア装置へと送られる一方向設計構成情報と、スワップして用いられるシミュレーション状態情報と、スワップして用いられなくなったシミュレーション状態情報と、デバイスドライバから、再構成可能なハードウェア装置へと送られ、シミュレーションジョブがスワップされ得る割り込み信号とが含まれる。
【０４８８】
ライン１１７３は、双方向通信ハンドシェイク信号を搬送する。これらの信号およびハンドシェイクプロトコルは、図５３および５４を参照しながら、さらに説明される。
【０４８９】
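Line 1174 carries the unidirectional design configuration information from the computing environment, through the scheduler 1170, to the reconfigurable hardware unit 1172. The initial configuration information may be sent over this line 1174 to the reconfigurable hardware unit 1172 for modeling. In addition, when users are modeling and simulating different user designs, the configuration information must be sent to the reconfigurable hardware unit 1172 during the time slice. When different users are modeling the same user design, no new design configuration is needed; instead, the different simulation hardware states associated with that same design may need to be sent to the reconfigurable hardware unit 1172 for the different simulation runs.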
【０４９０】
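Line 1175 carries the swapped-in simulation state information to the reconfigurable hardware unit 1172. Line 1176 carries the swapped-out simulation state information from the reconfigurable hardware unit to the computing environment (that is, ordinary memory). The swapped-in simulation state information includes the previously saved hardware model state information and hardware memory state that the reconfigurable hardware unit 1172 needs for acceleration. The swapped-in state information is sent at the beginning of a time slice so that the currently scheduled user can access the reconfigurable hardware unit 1172 for acceleration. The swapped-out simulation state information includes the hardware model and memory state information that must be saved to memory at the end of a time slice, when the reconfigurable hardware unit 1172 receives an interrupt signal and moves on to the next time slice, which is associated with a different user/process. Saving the state information allows the current user/process to restore this state later, for example in the next time slice assigned to it.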
【０４９１】
ライン１１７７は、割り込み信号を、デバイスドライバ１１７１から、再構成可能なハードウェア装置に送信し、シミュレーションジョブがスワップされ得る。この割り込み信号は、タイムスライスとタイムスライスとの間に送信されて、現在のタイムスライスの現在のシミュレーションジョブがスワップされて用いられなくなり、新たなタイムスライス用の新たなシミュレーションジョブにスワップされる。
【０４９２】
次に、本発明の一実施形態による通信ハンドシェイクプロトコルは、図５３および５４を参照しながら説明される。図５３に、デバイスドライバと、再構成可能なハードウェア装置との間の、ハンドシェイク論理インターフェースを介する通信ハンドシェイク信号を示す。図５４に、通信プロトコルの状態図を示す。図５１に、ライン１１７３上の通信ハンドシェイク信号を示す。図５３は、デバイスドライバ１１７１と再構成可能なハードウェア装置１１７２との間の通信ハンドシェイク信号の詳細な図である。
【０４９３】
図５３において、ハンドシェイク論理インターフェース１２３４が、再構成可能なハードウェア装置１１７２に設けられている。あるいは、ハンドシェイク論理インターフェース１２３４は、再構成可能なハードウェア装置１１７２の外部にインストールされ得る。４組の信号が、デバイスドライバ１１７１と、ハンドシェイク論理インターフェース１２３４との間に提供される。これらの信号は、ライン１２３０上の３ビットのＳＰＡＣＥ信号であり、ライン１２３１上の１ビットの読み出し／書き込み信号であり、ライン１２３２上の４ビットのＣＯＭＭＡＮＤ信号であり、ライン１２３３上の１ビットのＤＯＮＥ信号である。ハンドシェイク論理インターフェースは、これらの信号を処理して、再構成可能なハードウェア装置を、行われる必要がある様々な操作に適したモードにする論理回路を含む。インターフェースは、ＣＴＲＬ＿ＦＰＧＡ装置（または、ＦＰＧＡＩ／Ｏコントローラ）に結合される。
【０４９４】
３ビットのＳＰＡＣＥ信号について、ＰＣＩバスを介する、シミュレーションシステムの計算環境と再構成可能なハードウェア装置との間のデータ転送は、ソフトウェア／ハードウェア境界における、ある特定のＩ／Ｏアドレススペース、すなわち、ＲＥＧ（レジスタ）、ＣＬＫ（ソフトウェアクロック）、Ｓ２Ｈ（ソフトウェアからハードウェア）、およびＨ２Ｓ（ハードウェアからソフトウェア）用に指定される。上述したように、シミュレーションシステムは、ハードウェアモデルを、異なるコンポーネントのタイプおよび制御機能に従って、メインメモリ内の４つのアドレススペースにマッピングする。ＲＥＧスペースは、レジスタコンポーネント用に指定される。ＣＬＫスペースは、ソフトウェアクロック用に指定される。Ｓ２Ｈスペースは、ハードウェアモデルへのソフトウェアテストベンチコンポーネントの出力用に指定される。Ｈ２Ｓスペースは、ソフトウェアテストベンチコンポーネントへのハードウェアモデルの出力用に指定される。これらの専用のＩ／Ｏバッファスペースは、システム初期化の間、カーネルのメインメモリスペースにマッピングされる。
【０４９５】
以下の表Ｇに、ＳＰＡＣＥ信号の各々の記述を提供する。
【０４９６】
【表９】

【０４９７】
ライン１２３１上の読み出し／書き込み信号は、データ転送が読み出しであるか、または書き込みであるかを示す。ライン１２３３上のＤＯＮＥ信号は、ＤＭＡデータ転送期間の完了を示す。
【０４９８】
４ビットのＣＯＭＭＡＮＤは、データ転送操作が、書き込みであるか、読み出しであるか、再構成可能なハードウェア装置への新たなユーザ設計の設定であるか、または、シミュレーションの割り込みであるかを示す。下記の表Ｈに示すように、ＣＯＭＭＡＮＤプロトコルは、以下の通りである。
【０４９９】
【表１０】

【０５００】
次に、図５４上の状態を示す図を参照しながら、通信ハンドシェイクプロトコルが説明される。状態１４００において、シミュレーションシステムは、デバイスドライバにおいてアイドルである。新たなコマンドが提示されない限り、システムは、経路１４０１によって示されるように、アイドルであり続ける。新たなコマンドが提示される場合、コマンドプロセッサは、状態１４０２において、新たなコマンドを処理する。一実施形態において、コマンドプロセッサは、ＦＰＧＡＩ／Ｏコントローラである。
【０５０１】
ＣＯＭＭＡＮＤ＝００００、または、ＣＯＭＭＡＮＤ＝０００１である場合、システムは、工程１４０３において、ＳＰＡＣＥインデックスによって示されるように、指定されたスペースから読み出すか、または指定されたスペースに書き込む。ＣＯＭＭＡＮＤ＝００１０である場合、システムは、ユーザ設計を用いて、再構成可能なハードウェア装置においてＦＰＧＡを初期的に設定するか、または、状態１４０４における新たなユーザ設計を用いて、ＦＰＧＡを設定する。システムは、全てのＦＰＧＡのシステムの設定情報に順序を付けて、ハードウェアにモデリングされ得るユーザ設計の一部をモデリングする。しかし、ＣＯＭＭＡＮＤ＝００１１である場合、システムは、状態１４０５において、再構成可能なハードウェア装置に割り込み、新たなシミュレーション状態において新たなユーザ／プロセスにスワップするようにタイムスライスがタイムアウトになるので、シミュレーションシステムに割り込む。これらの状態１４０３、１４０４、または１４０５の完了において、シミュレーションシステムは、ＤＯＮＥ状態１４０６に進んで、ＤＯＮＥ信号を生成し、その後、状態１４００に戻って、新たなコマンドが提示されるまでアイドルになる。
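The state diagram of FIG. 54 can be sketched as a function that returns the sequence of states visited for one command. The function name `handshake_path` is hypothetical. Because Table H is not reproduced here, the sketch does not say which of COMMAND = 0000 and COMMAND = 0001 is the read and which is the write; both are simply treated as an access to the space selected by the SPACE index.

```python
# State numbers follow FIG. 54.
IDLE, PROCESS, SPACE_ACCESS, CONFIGURE, INTERRUPT, DONE = 1400, 1402, 1403, 1404, 1405, 1406

_ACTION = {
    0b0000: SPACE_ACCESS,  # read or write of the designated space
    0b0001: SPACE_ACCESS,  # read or write of the designated space
    0b0010: CONFIGURE,     # configure the FPGAs with a (new) user design
    0b0011: INTERRUPT,     # interrupt the simulation for a job swap
}


def handshake_path(command=None):
    """Return the states visited for one 4-bit COMMAND value (None = no new command)."""
    if command is None:
        return [IDLE]  # path 1401: remain idle
    if command not in _ACTION:
        raise ValueError("COMMAND value not described in the text: %r" % (command,))
    # After the action completes, DONE is generated and the system returns to idle.
    return [IDLE, PROCESS, _ACTION[command], DONE, IDLE]
```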
【０５０２】
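Next, the time-sharing function of the simulation server, which handles multiple jobs with different priority levels, is described. FIG. 52 shows an example. Four jobs (job A, job B, job C, and job D) are the incoming jobs in the simulation job queue. Their priority levels differ: jobs A and B are assigned the high priority I, while jobs C and D are assigned the lower priority II. As the timeline chart of FIG. 52 shows, use of the time-shared reconfigurable hardware unit depends on the priority levels of the queued incoming jobs. At time 1190, simulation starts with job A, which is given access to the reconfigurable hardware unit. At time 1191, job A is preempted by job B: because job B has the same priority as job A, the scheduler gives the two jobs equal time-shared access. Job B now has access to the reconfigurable hardware unit. At time 1192, job A preempts job B and runs to completion at time 1193. At time 1193, job B takes over and runs to completion at time 1194. At time 1194, job C, which is next in the queue but has a lower priority level than jobs A and B, gains access to the reconfigurable hardware unit for execution. At time 1195, job D preempts job C for time-shared access because it has the same priority level as job C. Job D has access until time 1196, when it is preempted by job C. Job C runs to completion at time 1197. Job D then takes over at time 1197 and runs to completion at time 1198.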
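The timeline of FIG. 52 can be reproduced with a small multi-priority round-robin sketch. The function name `time_share` is hypothetical, and two simplifying assumptions are made that the text does not state: all four jobs are already queued at time 1190, and each job needs exactly two time slices to finish.

```python
from collections import deque


def time_share(jobs):
    """Order in which jobs receive hardware time slices.

    jobs: list of (name, priority, slices_needed) in queue order;
    a lower priority number means a higher priority.
    """
    remaining = {name: need for name, _, need in jobs}
    order = []
    for level in sorted({prio for _, prio, _ in jobs}):
        # Round robin among the jobs that share this priority level.
        ring = deque(name for name, prio, _ in jobs if prio == level)
        while ring:
            name = ring.popleft()
            order.append(name)
            remaining[name] -= 1
            if remaining[name] > 0:
                ring.append(name)  # preempted at timeout, goes to the back
    return order


queue = [("A", 1, 2), ("B", 1, 2), ("C", 2, 2), ("D", 2, 2)]
print(time_share(queue))  # ['A', 'B', 'A', 'B', 'C', 'D', 'C', 'D']
```

The eight slices correspond to the intervals between times 1190 and 1198 in the figure.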
【０５０３】
（ＶＩＩＩ．メモリシミュレーション）
本発明のメモリシミュレーションまたはメモリマッピング局面は、シミュレーションシステムがユーザの設計の構成ハードウェアモデルに関連する種々のメモリブロックを管理するための有効な方法を提供する。その構成ハードウェアモデルは再構成可能なハードウェア部におけるＦＰＧＡのアレイ中へプログラミングされた。本発明の実施形態を実施することによって、メモリシミュレーションスキームは、メモリアクセスを処理するためのＦＰＧＡチップにおける専用ピンを全く必要としない。
【０５０４】
本明細書中で使用される用語「メモリアクセス」は、ユーザの設計が構成されるＦＰＧＡ論理回路とユーザの設計に関連するすべてのメモリブロックを格納するＳＲＡＭメモリデバイスとの間の書き込みアクセスまたは読み出しアクセスのいずれかを示す。したがって、書き込み動作はＦＰＧＡ論理デバイスからＳＲＡＭメモリデバイスへのデータ転送を含み、他方読み出し操作はＳＲＡＭメモリデバイスからＦＰＧＡ論理デバイスへのデータ転送を含む。図５６を参照する。ＦＰＧＡ論理デバイスは１２０１（ＦＰＧＡ１）、１２０２（ＦＰＧＡ３）、１２０３（ＦＰＧＡ０）、および１２０４（ＦＰＧＡ２）を含む。ＳＲＡＭメモリはメモリデバイス１２０５および１２０６を含む。
【０５０５】
また、用語「ＤＭＡデータ転送」は、当業者間で共通な使用法に加えて、計算システムとシミュレーションシステムとの間のデータ転送を示す。計算システムは、図１、４５、および４６においてシミュレーションシステムをサポートするメモリを有するＰＣＩ系システム全体として示され、ソフトウェアおよび再構成可能ハードウェア部中に常駐する。選択されたデバイスドライバ、オペレーティングシステムへ／からのソケット／システムコールはまた、オペレーティングシステムおよび再構成可能ハードウェア部と適切なインタフェースを可能にするシミュレーションシステムの一部である。本発明の１実施形態において、ＤＭＡ読み出し転送は、ＦＰＧＡ論理デバイス（および初期化およびメモリ内容ダンプのためのＦＰＧＡＳＲＡＭメモリデバイス）からホスト計算システムへのデータの転送を含む。ＤＭＡ書き込み転送は、ホスト計算システムからＦＰＧＡ論理デバイス（および初期化およびメモリ内容ダンプのためのＦＰＧＡＳＲＡＭメモリデバイス）へのデータの転送を含む。
【０５０６】
用語「ＦＰＧＡデータバス」、「ＦＰＧＡバス」、「ＦＤバス」およびそれらの変形は、デバッグされるべき構成およびプログラムされたユーザの設計を含むＦＰＧＡ論理デバイスとＳＲＡＭメモリデバイスとを結合する高バンクバスＦＤ［６３：３２］および低バンクバスＦＤ［３１：０]を示す。
【０５０７】
メモリシミュレーションシステムは、以下を制御しかつ以下とインタフェースをとるためのメモリ状態マシン、評価状態マシン、およびそれらに関連の論理を含む：（１）主計算システムおよびその関連のメモリシステム、（２）シミュレーションシステムにおけるＦＰＧＡに結合されたＳＲＡＭメモリ、および（３）デバッグにおける構成およびプログラムされたユーザの設計を含むＦＰＧＡ論理デバイス。
【０５０８】
メモリシミュレーションシステムのＦＰＧＡ論理デバイス側は、以下を処理するためにユーザの設計においてユーザの所有するメモリインタフェースとインタフェースをとるための各メモリブロックＮごとに評価状態マシン、ＦＰＧＡバスドライバ、および論理インタフェースを含む：（１）ＦＰＧＡ論理デバイス間のデータ評価、および（２）ＦＰＧＡ論理デバイスとＳＲＡＭメモリデバイスとの間の書き込み／読み出しメモリアクセス。ＦＰＧＡ論理デバイス側と併用して、ＦＰＧＡＩ／Ｏコントローラ側は、以下の間のＤＭＡ、書き込み、および読み出し動作を処理するためのメモリ状態マシンおよびインタフェース論理を含む：（１）主計算システムとＳＲＡＭメモリデバイス、および（２）ＦＰＧＡ論理デバイスとＳＲＡＭメモリデバイス。
【０５０９】
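The operation of the memory simulation system according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates that the DMA data transfer period is in progress. During the DMA data transfer period, the computing system and the SRAM memory units transfer data to each other over the FPGA data bus, that is, the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213.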
【０５１０】
評価期間中は、各ＦＰＧＡ論理デバイスにおける論理回路はデータ評価のためのユーザの設計論理への適切なソフトウェアクロック、入力イネーブル、およびマルチプレクサイネーブル信号を生成する。ＦＰＧＡ論理デバイス間通信はこの期間中に発生する。
【０５１１】
メモリアクセス期間中は、メモリシミュレーションシステムは高および低バンクＦＰＧＡ論理デバイスがそれぞれのアドレスおよび制御信号をそれぞれのＦＰＧＡデータバスへ載せるのを待つ。これらのアドレスおよび制御信号はＣＴＲＬ＿ＦＰＧＡ部によってラッチインされる。動作が書き込みであれば、アドレス、制御、およびデータ信号がＦＰＧＡ論理デバイスからそれぞれのＳＲＡＭメモリデバイスへ転送される。動作が読み出しであれば、アドレスおよび制御信号が指定のＳＲＡＭメモリデバイスへ提供され、かつデータ信号がＳＲＡＭメモリデバイスからそれぞれのＦＰＧＡ論理デバイスへ転送される。すべてのＦＰＧＡ論理デバイスにおけるすべての所望のメモリブロックがアクセスされた後で、メモリシミュレーション書き込み／読み出しサイクルが完了し、そしてメモリシミュレーションシステムは次のメモリシミュレーション書き込み／読み出しサイクルの開始までアイドル状態である。
【０５１２】
図５６は、本発明の１実施形態にしたがうメモリシミュレーション構成の高レベルブロック図である。本発明のメモリシミュレーション局面に関連しない信号、接続、およびバスは図示されない。上記のＣＴＲＬ＿ＦＰＧＡ部１２００は、バス１２１０にライン１２０９を介して結合される。１実施形態において、ＣＴＲＬ＿ＦＰＧＡ部１２００はＡｌｔｅｒａ１０Ｋ５０チップなどのＦＰＧＡチップの形態であるプログラム可能論理デバイス（ＰＬＤ）である。ローカルバス１２１０は、ＣＴＲＬ＿ＦＰＧＡ部１２００が（あれば）他のシミュレーションアレイボードおよび他のチップ（例えば、ＰＣＩコントローラ、ＥＥＰＲＯＭ、クロックバッファ）に結合されるのを可能にする。ライン１２０９は、シミュレーションＤＭＡデータ転送期間の完了を示すＤＯＮＥ信号を伝送する。
【０５１３】
図５６は、論理デバイスおよびメモリデバイスの形態の他の主要な機能ブロックを示す。１実施形態において、論理デバイスはＡｌｔｅｒａ１０Ｋ１３０または１０Ｋ２５０チップなどのＦＰＧＡチップの形態であるプログラム可能論理デバイス（ＰＬＤ）である。したがって、アレイ中に８つのＡｌｔｅｒａＦＬＥＸ１０Ｋ１００チップを有する上記実施形態の代わりに、この実施形態はＡｌｔｅｒａのＦＬＥＸ１０Ｋ１３０のチップ４つだけ使用する。メモリデバイスは、Ｃｙｐｒｅｓｓ１２８Ｋｘ３２ＣＹ７Ｃ１３３５またはＣＹ７Ｃ１３３６チップなどの同期パイプライン化キャッシュＳＲＡＭである。論理デバイスは、１２０１（ＦＰＧＡ１）、１２０２（ＦＰＧＡ３）、１２０３（ＦＰＧＡ０）、および１２０４（ＦＰＧＡ２）を含む。ＳＲＡＭチップは、低バンクメモリデバイス１２０５（Ｌ＿ＳＲＡＭ）および高バンクメモリデバイス１２０６（Ｈ＿ＳＲＡＭ）を含む。
【０５１４】
これらの論理デバイスおよびメモリデバイスは、ＣＴＲＬ＿ＦＰＧＡ部１２００に高バンクバス１２１２（ＦＤ［６３：３２］）および低バンクバス（ＦＤ［３１：０]）を介して結合される。論理デバイス１２０１（ＦＰＧＡ１）および１２０２（ＦＰＧＡ３）は、高バンクバス１２１２にそれぞれバス１２２３およびバス１２２５を介して結合され、他方論理デバイス１２０３（ＦＰＧＡ０）および１２０４（ＦＰＧＡ２）は、低バンクデータバス１２１３にそれぞれバス１２２４およびバス１２２６を介して結合される。高バンクメモリデバイス１２０６は高バンクバス１２１２にバス１２２０を介して結合され、他方低バンクメモリデバイス１２０５は低バンクバス１２１３にバス１２１９を介して結合される。デュアルバンクバス構造は、シミュレーションシステムが高バンク上のデバイスおよび低バンク上のデバイスに並列に改善されたスループットレートでアクセスすることを可能にする。デュアルバンクデータバス構造は、シミュレーション書き込み／読み出しサイクルが制御され得るように制御およびアクセス信号などの他の信号をサポートする。
【０５１５】
図６１を簡単に参照しておくと、各シミュレーション書き込み／読み出しサイクルは、ＤＭＡデータ転送期間、評価期間、およびメモリアクセス期間を含む。種々の制御信号の組み合わせはシミュレーションシステムがある期間中にあって他ではないかどうかを制御しかつ示す。再構成可能ハードウェア部におけるホストコンピュータシステムと論理デバイス１２０１〜１２０４との間のＤＭＡデータ転送はＰＣＩバス（例えば、図４６のバス５０）、ローカルバス１２１０および１２３６、ならびにＦＰＧＡバス１２１２（ＦＤ［６３：３２］）およびＦＰＧＡバス１２１３（ＦＤ［３１：０]）を介して発生する。メモリデバイス１２０５および１２０６は、初期化およびメモリ内容ダンプのためのＤＭＡデータ転送に関与する。再構成可能ハードウェア部における論理デバイス１２０１〜１２０４間の評価データ転送は、相互接続（前出）ならびにＦＰＧＡバス１２１２（ＦＤ［６３：３２］）およびＦＰＧＡバス１２１３（ＦＤ［３１：０]）を介して発生する。論理デバイス１２０１〜１２０４とメモリデバイス１２０５および１２０６との間のメモリアクセスは、ＦＰＧＡバス１２１２（ＦＤ［６３：３２］）およびＦＰＧＡバス１２１３（ＦＤ［３１：０]）を介して発生する。
【０５１６】
図５６を再度参照する。ＣＴＲＬ＿ＦＰＧＡ部１２００は、多くの制御およびアドレス信号を提供および受信してシミュレーション書き込み／読み出しサイクルを制御する。ＣＴＲＬ＿ＦＰＧＡ部１２００は、ライン１２１１上のＤＡＴＡＸＳＦＲおよびＥＶＡＬ信号を、それぞれライン１２２１を介して論理デバイス１２０１および１２０３へ、それぞれライン１２２２を介して論理デバイス１２０２および１２０４へ提供する。ＣＴＲＬ＿ＦＰＧＡ部１２００はまた、メモリアドレス信号ＭＡ[１８：２]を低バンクメモリデバイス１２０５および高バンクメモリデバイス１２０６にそれぞれバス１２２９および１２１４を介して提供する。これらのメモリデバイスのモードを制御するために、ＣＴＲＬ＿ＦＰＧＡ部１２００はチップ選択書き込み（および読み出し）信号を低バンクメモリデバイス１２０５および高バンクメモリデバイス１２０６にそれぞれライン１２１６および１２１５を介して提供する。ＤＭＡデータ転送の完了を示すために、メモリシミュレーションシステムはライン１２０９上のＤＯＮＥ信号をＣＴＲＬ＿ＦＰＧＡ部１２００および計算システムに送信および受信し得る。
【０５１７】
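As described above with reference to FIGS. 9, 11, 12, 14, and 15, the logic devices 1201 to 1204 are connected together by, among other things, a multiplexed cross-chip address pointer chain, represented in FIG. 56 by two sets of SHIFTIN/SHIFTOUT lines: lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217. Each set is initialized by Vcc on lines 1207 and 1208 at the start of the chain. The SHIFTIN signal is sent from the preceding FPGA logic device in the bank and starts memory access for the current FPGA logic device. When the shift through a given chain completes, the last logic device generates a LAST signal (that is, LASTL or LASTH) to the CTRL_FPGA unit 1200. For the high bank, logic device 1202 generates the LASTH shift-out signal on line 1218 to the CTRL_FPGA unit 1200; for the low bank, logic device 1204 generates the LASTL signal on line 1217 to the CTRL_FPGA unit 1200.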
【０５１８】
ボード実装および図５６に関して、本発明の１実施形態は構成要素（例えば、論理デバイス１２０１〜１２０４、メモリデバイス１２０５〜１２０６、およびＣＴＲＬ＿ＦＰＧＡ部１２００）およびバス（例えば、ＦＰＧＡバス１２１２〜１２１３およびローカルバス１２１０）を１ボード中に内蔵する。この１ボードはマザーボードにマザーボードコネクタを介して結合される。したがって、１ボード中に、４つの論理デバイス（各バンク中に２つ）、２つのメモリデバイス（各バンク中に１つ）、およびバスが提供される。第２ボードは、その補完として論理デバイス（通常４つ）、メモリデバイス（通常２つ）、ＦＰＧＡＩ／Ｏコントローラ（ＣＴＲＬ＿ＦＰＧＡ部）およびバスを含み得る。しかし、ＰＣＩコントローラは第１のボードのみに設置され得る。ボード間コネクタは、上記のように、ボード間に提供され、すべてのボードにおける論理デバイスがまとめて接続され、そして評価期間中に互いに通信し、かつローカルバスがすべてのボードにわたって提供されるようにする。ＦＰＧＡバスＦＤ［６３：０]は、各ボード中のみに提供され、複数のボードにわたっては提供されない。
【０５１９】
このボード構成において、シミュレーションシステムは各ボードにおける論理デバイスとメモリデバイスとの間のメモリマッピングを実行する。異なるボードにわたるメモリマッピングは提供されない。したがって、ボード５における論理デバイスはメモリブロックをボード５中のみのメモリデバイスにマッピングし、他のボード上のメモリデバイスにはマッピングしない。しかし、他の実施形態において、シミュレーションシステムは、メモリブロックを１ボード上の論理デバイスから別のボード上のメモリデバイスへマッピングする。
【０５２０】
本発明の１実施形態のメモリシミュレーションシステムの動作は一般に以下のとおりである。シミュレーション書き込み／読み出しサイクルは３つの期間に分割される−ＤＭＡデータ転送、評価、およびメモリアクセス。シミュレーション書き込み／読み出しサイクルの完了を示すために、メモリシミュレーションシステムはライン１２０９上のＤＯＮＥ信号をＣＴＲＬ＿ＦＰＧＡ部１２００および計算システムに対して送信および受信し得る。バス１２１１上のＤＡＴＡＸＳＦＲ信号はＤＭＡデータ転送期間の発生を示す。ＤＭＡデータ転送期間において、計算システムおよびＦＰＧＡ論理デバイス１２０１〜１２０４は、ＦＰＧＡデータバス、高バンクバス（ＦＤ［６３：３２］）１２１２および低バンクバス（ＦＤ［３１：０]）１２１３を介して互いにデータを転送している。一般に、ＤＭＡ転送はホスト計算システムとＦＰＧＡ論理デバイスとの間で発生する。初期化およびメモリ内容ダンプのために、ＤＭＡ転送はホスト計算システムとＳＲＡＭメモリデバイス１２０５および１２０６との間で発生する。
【０５２１】
評価期間中に、各ＦＰＧＡ論理デバイス１２０１〜１２０４における論理回路はデータ評価のためにユーザの設計論理への適切なソフトウェアクロック、入力イネーブル、およびマルチプレクサイネーブル信号を生成する。ＦＰＧＡ論理デバイス間通信はこの期間中に発生する。ＣＴＲＬ＿ＦＰＧＡ部１２００はまた、評価カウンタを開始して評価期間の持続時間を制御する。カウントの数、およびしたがって評価期間の持続時間は、信号の最長の経路を決定することによってシステムにより設定される。経路長は、特定のステップ数と関連する。システムは、ステップ情報を使用し、そして評価サイクルを実行して完了させるのに必要なカウント数を計算する。
【０５２２】
メモリアクセス期間中は、メモリシミュレーションシステムは高および低バンクＦＰＧＡ論理デバイス１２０１〜１２０４がそれぞれのアドレスおよび制御信号をそれぞれのＦＰＧＡデータバスへ載せるのを待つ。これらのアドレスおよび制御信号はＣＴＲＬ＿ＦＰＧＡ部１２００によってラッチインされる。動作が書き込みであれば、アドレス、制御、およびデータ信号がＦＰＧＡ論理デバイス１２０１〜１２０４からそれぞれのＳＲＡＭメモリデバイス１２０５および１２０６へ転送される。動作が読み出しであれば、アドレスおよび制御信号がＦＰＧＡ論理デバイス１２０１〜１２０４からそれぞれのＳＲＡＭメモリデバイス１２０５および１２０６へ転送され、かつデータ信号はＳＲＡＭメモリデバイス１２０５および１２０６からそれぞれのＦＰＧＡ論理デバイス１２０１〜１２０４へ転送される。ＦＰＧＡ論理デバイス側では、ＦＤバスドライバがメモリブロックのアドレスおよび制御信号をＦＰＧＡデータバス（ＦＤバス）へ載せる。動作が書き込みであれば、書き込みデータがそのメモリブロックのためのＦＤバスへ載せられる。動作が読み出しであれば、ダブルバッファがＳＲＡＭメモリデバイスからのＦＤバス上のメモリブロックのためのデータをラッチインする。この動作は、各ＦＰＧＡ論理デバイスにおける各メモリブロックに対して一度に１メモリブロックずつ順番に続けられる。ＦＰＧＡ論理デバイスにおけるすべての所望のメモリブロックがアクセスされた後で、メモリシミュレーションシステムは各バンクにおける次のＦＰＧＡ論理デバイスに進み、そしてそのＦＰＧＡ論理デバイスにおけるメモリブロックのアクセスを開始する。すべてのＦＰＧＡ論理デバイス１２０１〜１２０４におけるすべての所望のメモリブロックがアクセスされた後で、メモリシミュレーション書き込み／読み出しサイクルが完了し、そしてメモリシミュレーションシステムは次のメモリシミュレーション書き込み／読み出しサイクルの開始までアイドル状態である。
【０５２３】
図５７は、本発明のメモリシミュレーション局面のより詳細なブロック図を示し、ＣＴＲＬ＿ＦＰＧＡ１２００およびメモリシミュレーションに関連する各論理デバイスのより詳細な構成図を含む。図５７は、ＣＴＲＬ＿ＦＰＧＡ１２００および論理デバイス１２０３の一部（他の論理デバイス１２０１、１２０２、および１２０４の一部と構造が類似する）を示す。ＣＴＲＬ＿ＦＰＧＡ１２００は、メモリ有限状態マシン（ＭＥＭＦＳＭ）１２４０、ＡＮＤゲート１２４１、評価（ＥＶＡＬ）カウンタ１２４２、低バンクメモリアドレス／制御ラッチ１２４３、低バンクアドレス／制御マルチプレクサ１２４４、アドレスカウンタ１２４５、高バンクメモリアドレス／制御ラッチ１２４７、および高バンクアドレス／制御マルチプレクサ１２４６を含む。図５７において示される論理デバイス１２０３などの各論理デバイスは、評価有限状態マシン（ＥＶＡＬＦＳＭｘ）１２４８、データバスマルチプレクサ（ＦＰＧＡ０論理デバイス１２０３のためのＦＤＯ＿ＭＵＸｘ）１２４９を含む。ＥＶＡＬＦＳＭの端に付加された「ｘ」表記は、ＥＶＡＬＦＳＭに関連する特定の論理デバイス（ＦＰＧＡ０、ＦＰＧＡ１、ＦＰＧＡ２、ＦＰＧＡ３）を識別する。この例において、「ｘ」は０〜３の番号である。したがって、ＥＶＡＬＦＳＭ０はＦＰＧＡ０論理デバイス１２０３に関連する。一般に、各論理デバイスは、ある番号ｘと関連し、かつＮ論理デバイスが使用されると、「ｘ」は０〜Ｎ−１の番号である。
【０５２４】
各論理デバイス１２０１〜１２０４において、多くのメモリブロックが構成およびマッピングされたユーザの設計に関連する。したがって、ユーザ論理におけるメモリブロックインタフェース１２５３は、計算システムがＦＰＧＡ論理デバイスのアレイにおける所望のメモリブロックにアクセスするための手段を提供する。メモリブロックインタフェース１２５３はまた、バス１２９５上のメモリ書き込みデータをＦＰＧＡデータバスマルチプレクサ（ＦＤＯ＿ＭＵＸｘ）１２４９へ提供し、かつバス１２９７上のメモリ読み出しデータをメモリ読み出しデータダブルバッファ１２５１から読み出す。
【０５２５】
メモリブロックデータ／論理インタフェース１２９８は、各ＦＰＧＡ論理デバイス中に提供される。これらのメモリブロックデータ／論理インタフェース１２９８の各々は、ＦＰＧＡデータバスマルチプレクサ（ＦＤＯ＿ＭＵＸｘ）１２４９、評価有限状態マシン（ＥＶＡＬＦＳＭｘ）１２４８、およびＦＰＧＡバスＦＤ［６３：０]に結合される。メモリブロックデータ／論理インタフェース１２９８は、メモリ読み出しデータバッファ１２５１、アドレスオフセット部１２５０、メモリモデル１２５２、および各メモリブロックＮ（ｍｅｍ＿ｂｌｏｃｋ＿Ｎ）１２５３のためのメモリブロックインタフェースを含む。これらはすべて各メモリブロックＮについていずれの所与のＦＰＧＡ論理デバイス１２０１〜１２０４においても繰り返される。したがって、５つのメモリブロックに対して、５セットのメモリブロックデータ／論理インタフェース１２９８が提供される。すなわち、５セットのメモリ読み出しデータバッファ１２５１、アドレスオフセット部１２５０、メモリモデル１２５２、および各メモリブロックＮ（ｍｅｍ＿ｂｌｏｃｋ＿Ｎ）１２５３のためのメモリブロックインタフェースが提供される。
【０５２６】
ＥＶＡＬＦＳＭｘと同様に、ＦＤＯ＿ＭＵＸｘにおける「ｘ」は、ＦＤＯ＿ＭＵＸｘが関連する特定の論理デバイス（ＦＰＧＡ０、ＦＰＧＡ１、ＦＰＧＡ２、ＦＰＧＡ３）を識別する。この例において、「ｘ」は０〜３の番号である。ＦＤＯ＿ＭＵＸｘ１２４９の出力はバス１２８２上に提供される。バス１２８２は、どのチップ（ＦＰＧＡ０、ＦＰＧＡ１、ＦＰＧＡ２、ＦＰＧＡ３）がＦＤＯ＿ＭＵＸｘ１２４９に関連するかに依存して、高バンクバスＦＤ［６３：３２］または低バンクバスＦＤ［３１：０]に結合される。図５７において、ＦＤＯ＿ＭＵＸｘは、低バンク論理デバイスＦＰＧＡ０１２０３に関連するＦＤＯ＿ＭＵＸ０である。したがって、バス１２８２上の出力は低バンクバスＦＤ［３１：０]に提供される。バス１２８３の部分は、メモリ読み出しデータダブルバッファ１２５１への入力のために、読み出しデータを高バンクバスＦＤ［６３：３２］または低バンクバスＦＤ［３１：０]から読み出しバス１２８３へ転送するために使用される。したがって、書き込みデータはＦＤＯ＿ＭＵＸ０１２４９を介して各論理デバイス１２０１〜１２０４におけるメモリブロックから高バンクバスＦＤ［６３：３２］または低バンクバスＦＤ［３１：０]バスへ出力転送され、かつ読み出しデータはメモリ読み出しデータダブルバッファ１２５１へ高バンクバスＦＤ［６３：３２］または低バンクバスＦＤ［３１：０]バスから読み出しバス１２８３を介して入力転送される。メモリ読み出しデータダブルバッファはダブルバッファ機構を提供して第１バッファにおいてデータをラッチし、次いで再度バッファリングして同時にラッチされたデータを出力して歪み（ｓｋｅｗ）を低減する。このメモリ読み出しデータダブルバッファ１２５１は以下により詳細に記載される。
【０５２７】
メモリモデル１２５２に戻る。メモリモデル１２５２はユーザメモリタイプをメモリシミュレーションシステムのＳＲＡＭタイプに変換する。ユーザの設計におけるメモリタイプは１つのタイプから別のタイプへ変化するので、このメモリブロックインタフェース１２５３はまたユーザの設計に対してユニークであり得る。例えば、ユーザメモリタイプはＤＲＡＭ、フラッシュメモリ、またはＥＥＰＲＯＭであり得る。しかし、メモリブロックインタフェース１２５３のすべての変形において、メモリアドレスおよび制御信号（例えば、読み出し、書き込み、チップ選択、ｍｅｍ＿ｃｌｋ）が提供される。本発明のメモリシミュレーション局面の１実施形態は、ユーザメモリタイプをメモリシミュレーションシステム中で使用されるＳＲＡＭタイプへ変換する。ユーザメモリタイプがＳＲＡＭならば、ＳＲＡＭタイプメモリモデルへの変換は全く簡単である。したがって、メモリアドレスおよび制御信号は、変換を行うメモリモデル１２５２へバス１２９６上で提供される。
【０５２８】
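The memory model 1252 provides the memory block address on bus 1293 and control information on bus 1292. The address offset unit 1250 receives the address information of the various memory blocks and derives a modified offset address on bus 1291 from the original address on bus 1293. The offset is needed because the addresses of some memory blocks overlap one another. For example, one memory block may occupy the space 0-2K while another occupies the space 0-3K. Because both blocks overlap in the space 0-2K, addressing each one individually would be difficult without some kind of address offset mechanism. With offsetting, the first memory block can occupy the space 0-2K while the second memory block occupies the space from 2K up to 5K. The offset address from the address offset unit 1250 and the control signals on bus 1292 are combined and provided on bus 1299 to the FPGA bus multiplexer (FDO_MUXx) 1249.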
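The offset example in this paragraph (a 2K block and a 3K block that both start at local address 0) can be worked through in a few lines. The function names are hypothetical, and packing the blocks back to back is an assumption of this sketch; the text only requires that the offset ranges do not overlap.

```python
def assign_offsets(block_sizes):
    """Give each memory block a base address so that the blocks no longer overlap."""
    offsets = []
    base = 0
    for size in block_sizes:
        offsets.append(base)
        base += size  # next block starts where this one ends
    return offsets


def to_sram_address(offsets, block, local_address):
    """Translate a block-local address into the offset address used on the SRAM side."""
    return offsets[block] + local_address


K = 1024
offsets = assign_offsets([2 * K, 3 * K])
print(offsets)                                   # [0, 2048]
print(to_sram_address(offsets, 1, 3 * K - 1))    # 5119, the last word below 5K
```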
【０５２９】
ＦＰＧＡデータバスマルチプレクサＦＤＯ＿ＭＵＸｘは、バス１２８９上のＳＰＡＣＥ２データ、バス１２９０上のＳＰＡＣＥ３データ、バス１２９９上のアドレス／制御信号、およびバス１２９５上のメモリ書き込みデータを受信する。上記のように、ＳＰＡＣＥ２およびＳＰＡＣＥ３は特定の空間インデックスである。ＦＰＧＡＩ／Ｏコントローラ（図１０における項目３２７；図２２）によって生成されるＳＰＡＣＥインデックスは特定のアドレス空間（すなわち、ＲＥＧ読み出し、ＲＥＧ書き込み、Ｓ２Ｈ読み出し、Ｈ２Ｓ書き込み、およびＣＬＫ書き込み）を選択する。このアドレス空間内で、本発明のシステムはアクセスすべき特定のワードを逐次選択する。ＳＰＡＣＥ２は、ハードウェア対ソフトウェアＨ２ＳデータのためのＤＭＡ読み出し転送に専用のメモリ空間を示す。ＳＰＡＣＥ３は、ＲＥＧＩＳＴＥＲ＿ＲＥＡＤデータのためのＤＭＡ読み出し転送に専用のメモリ空間を示す。前出表Ｇを参照のこと。
【０５３０】
出力として、ＦＤＯ＿ＭＵＸｘ１２４９は、バス１２８２上のデータを低バンクバスまたは高バンクバスのいずれかに提供する。セレクタ信号は、ＥＶＡＬＦＳＭｘ部１２４８からのライン１２８４上の出力イネーブル（ｏｕｔｐｕｔ＿ｅｎ）信号およびライン１２８５上の選択信号である。ライン１２８４上の出力イネーブル信号は、ＦＤＯ＿ＭＵＸｘ１２４９の動作を使用可能（または使用不可能）にする。ＦＰＧＡバスを介するデータアクセスのために、出力イネーブル信号はＦＤＯ＿ＭＵＸｘが機能できるように使用可能にされる。ライン１２８５上の選択信号は、ＥＶＡＬＦＳＭｘ部１２４８によって生成され、バス１２８９上のＳＰＡＣＥ２、バス１２９０上のＳＰＡＣＥ３、バス１２９９上のアドレス／制御信号、およびバス１２９５上のメモリ書き込みデータから複数の入力を選択する。ＥＶＡＬＦＳＭｘ部１２４８による選択信号の生成は、以下にさらに記載される。
【０５３１】
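The EVALFSMx unit 1248 is at the heart of the operation of each logic device 1201 to 1204 as far as the memory simulation system is concerned. As inputs, the EVALFSMx unit 1248 receives the SHIFTIN signal on line 1279, the EVAL signal from the CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. It outputs the SHIFTOUT signal on line 1280, the read latch signal rd_latx to the memory read data double buffer 1251, the output enable signal on line 1284 to FDO_MUXx 1249, the select signal on line 1285 to FDO_MUXx 1249, and three signals to the user logic on line 1281 (input_en, mux_en, and clk_en).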
【０５３２】
本発明の１実施形態のメモリシミュレーションシステムのためのＦＰＧＡ論理デバイス１２０１〜１２０４の動作は一般に以下のとおりである。ＥＶＡＬが論理１にある場合、ＦＰＧＡ論理デバイス１２０１〜１２０４内のデータ評価が発生する。そうでなければ、シミュレーションシステムはＤＭＡデータ転送またはメモリアクセスのいずれかを行う。ＥＶＡＬ＝１において、ＥＶＡＬＦＳＭｘ部１２４８はｃｌｋ＿ｅｎ信号、ｉｎｐｕｔ＿ｅｎ信号、およびｍｕｘ＿ｅｎ信号を生成してそれぞれユーザ論理が論理デバイスを介するデータ、ラッチ関連データ、および多重信号を評価できるようにする。ＥＶＡＬＦＳＭｘ部１２４８は、ｃｌｋ＿ｅｎ信号を生成してユーザの設計論理におけるすべてのクロックエッジレジスタフリップフロップの第２のフリップフロップを使用可能にする（図１９参照)。ｃｌｋ＿ｅｎ信号は他にもソフトウェアクロックとして公知である。ユーザメモリタイプが同期である場合、ｃｌｋ＿ｅｎはまた各メモリブロックにおけるメモリ読み出しデータダブルバッファ１２５１の第２クロックを使用可能とする。ＥＶＡＬＦＳＭｘ部１２４８は、ユーザの設計論理へのｉｎｐｕｔ＿ｅｎ信号を生成してＤＭＡ転送によってＣＰＵからユーザ論理へ送信される入力信号をラッチする。ｉｎｐｕｔ＿ｅｎ信号は、イネーブル入力を主クロックレジスタにおける第２フリップフロップへ提供する（図１９参照）。最後に、ＥＶＡＬＦＳＭｘ部１２４８は、ｍｕｘ＿ｅｎ信号を生成して各ＦＰＧＡ論理デバイスにおける多重化回路をオンにし、アレイにおける他のＦＰＧＡ論理デバイスとの通信を開始する。
【０５３３】
その後、ＦＰＧＡ論理デバイス１２０１〜１２０４は少なくとも１つのメモリブロックを含む場合、メモリシミュレーションシステムは、選択されたデータが選択されたＦＰＧＡ論理デバイスへシフトされるのを待ち、そして次いでＦＰＧＡデータバスドライバのためのｏｕｔｐｕｔ＿ｅｎおよび選択信号を生成してメモリブロックインタフェース１２５３（ｍｅｍ＿ｂｌｏｃｋ＿Ｎ）のアドレスおよび制御信号をＦＤバス上に載せる。
【０５３４】
ライン１２８７上の書き込み信号ｗｒｘが使用可能となると（すなわち、論理１）、選択およびｏｕｔｐｕｔ＿ｅｎ信号が使用可能とされ書き込みデータを、どのバンク上でＦＰＧＡチップが結合されるかに依存して、低または高バンクバスのいずれか上へ載せる。図５７において、論理デバイス１２０３はＦＰＧＡ０であり、かつ低バンクバスＦＤ［３１：０］に結合される。ライン１２８７上の書き込み信号ｗｒｘが使用不可能とされると（すなわち、論理０）、選択およびｏｕｔｐｕｔ＿ｅｎ信号は使用不可能とされ、かつライン１２８６上の読み出しラッチ信号ｒｄ＿ｌａｔｘは、どのバンク上でＦＰＧＡチップが結合されるかに依存して、低または高バンクバスのいずれかを介して、メモリ読み出しデータダブルバッファ１２５１にＳＲＡＭからの選択されたデータをラッチおよびダブルバッファ化させる。ｗｒｘ信号は、ユーザの設計論理のメモリインタフェースから得られるメモリ書き込み信号である。実際に、ライン１２８７上のｗｒｘ信号はメモリモデル１２５２から制御バス１２９２を介して来る。
【０５３５】
データの読み出しまたは書き込みのこの処理は、各ＦＰＧＡ論理デバイスに対して発生する。すべてのメモリブロックがＳＲＡＭアクセスを介して処理された後で、ＥＶＡＬＦＳＭｘ部１２４８はＳＨＩＦＴＯＵＴ信号を生成してチェーンにおける次のＦＰＧＡ論理デバイスによるＳＲＡＭアクセスを可能にする。なお、高および低バンク上のデバイスのためのメモリアクセスは並列に発生する。あるバンクのためのメモリアクセスが他のバンクのためのメモリアクセスの前に完了することもある。これらのアクセスのすべてについて、論理が準備完了しかつデータが利用可能な場合にのみ論理がデータを処理するように適切な待ちサイクルが挿入される。
【０５３６】
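On the CTRL_FPGA unit 1200 side, the MEMFSM 1240 is at the heart of the memory simulation aspect of the present invention. The MEMFSM 1240 sends and receives a number of control signals to control the start of the memory simulation write/read cycle and the various operations that the cycle supports. The MEMFSM 1240 receives the DATAXSFR signal on line 1260 via line 1258. This signal is also provided to each logic device on line 1273. When DATAXSFR goes low (logic low), the DMA data transfer period ends and the evaluation and memory access periods begin.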
【０５３７】
ＭＥＭＦＳＭ１２４０はまた、ライン１２５４上のＬＡＳＴＨ信号およびライン１２５５上のＬＡＳＴＬ信号を受信して、選択されたアドレス空間に関連する選択されたワードが計算システムとシミュレーションシステムとの間でＰＣＩバスおよびＦＰＧＡバスを介してアクセスされたことを示す。このシフトアウト処理に関連するＭＯＶＥ信号は、所望のワードがアクセスされ、かつＭＯＶＥ信号がチェーンの終わりに最終的にＬＡＳＴ信号（すなわち、高バンクに対してＬＡＳＴＨおよび低バンクに対してＬＡＳＴＬ）となるまで各論理デバイス（例えば、論理デバイス１２０１〜１２０４）を介して伝送される。ＥＶＡＬＦＳＭ１２４８（すなわち、図５７はＦＰＧＡ０論理デバイス１２０３に対するＥＶＡＬＦＳＭ０を示す）において、対応するＬＡＳＴ信号はライン１２８０上のＳＨＩＦＴＯＵＴ信号である。特定の論理デバイス１２０３は図５６に示すように低バンクチェーンにおいて最後の論理デバイスではないので（論理デバイス１２０４が低バンクチェーンにおける最後の論理デバイスである）、ＥＶＡＬＦＳＭ０のためのＳＨＩＦＴＯＵＴ信号はＬＡＳＴ信号ではない。ＥＶＡＬＦＳＭ１２４８が図５６のＥＶＡＬＦＳＭ２に対応する場合、ライン１２８０上のＳＨＩＦＴＯＵＴ信号はＭＥＭＦＳＭへのライン１２５５へ提供されるＬＡＳＴＬ信号である。そうでなければ、ライン１２８０上のＳＨＩＦＴＯＵＴ信号は論理デバイス１２０４へ提供される（図５６参照）。同様に、ライン１２７９上のＳＨＩＦＴＩＮ信号は、ＦＰＧＡ０論理デバイス１２０３（図５６参照）のためのＶｃｃを表す。
【０５３８】
ＬＡＳＴＬおよびＬＡＳＴＨ信号はＡＮＤゲート１２４１へそれぞれライン１２５６および１２５７を介して入力される。ＡＮＤゲート１２４１はオープンドレインを提供する。ＡＮＤゲート１２４１の出力はライン１２５９上のＤＯＮＥ信号を生成する。ＤＯＮＥ信号は計算システムおよびＭＥＭＦＳＭ１２４０へ提供される。したがって、ＬＡＳＴＬおよびＬＡＳＴＨ信号の両方が論理ハイ（ｈｉｇｈ）であってシフトアウトチェーンプロセスの終了を示す場合のみ、ＡＮＤゲートは論理ハイを出力する。
【０５３９】
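The MEMFSM 1240 generates a start signal for the EVAL counter 1242 on line 1261. As the name implies, the start signal starts the EVAL counter 1242 and is sent after the DMA data transfer period completes. The start signal is generated on the high-to-low (1 to 0) transition of the DATAXSFR signal. The EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles. The duration of the programmed count determines the duration of the evaluation period. The output of the EVAL counter 1242 on line 1274 is logic 1 or logic 0 depending on whether the counter is counting. While the EVAL counter 1242 is counting, the output on line 1274 is logic 1, which is provided to each FPGA logic device 1201 to 1204 through EVALFSMx 1248. When EVAL = 1, the FPGA logic devices 1201 to 1204 perform inter-FPGA communication to evaluate the data in the user design. The output of the EVAL counter 1242 is also fed back on line 1262 to the MEMFSM unit 1240 for its own tracking purposes. At the end of the programmed count, the EVAL counter 1242 drives logic 0 on lines 1274 and 1262 to indicate the end of the evaluation period.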
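The behavior of the programmable EVAL counter can be modeled in software as follows. This is a behavioral sketch only, not a description of the actual circuit; the class name `EvalCounter` and the helper `falling_edge` are hypothetical.

```python
class EvalCounter:
    """Programmable counter whose output (EVAL) is 1 while it is counting."""

    def __init__(self, programmed_count):
        self.programmed_count = programmed_count
        self.remaining = 0

    def start(self):
        # Issued by the MEMFSM after the DMA data transfer period has ended.
        self.remaining = self.programmed_count

    def tick(self):
        # One clock cycle.
        if self.remaining > 0:
            self.remaining -= 1

    @property
    def eval(self):
        return 1 if self.remaining > 0 else 0


def falling_edge(previous, current):
    """True on a 1-to-0 transition, which is when the start signal is generated."""
    return previous == 1 and current == 0


counter = EvalCounter(programmed_count=3)
if falling_edge(1, 0):
    counter.start()
trace = []
for _ in range(5):
    trace.append(counter.eval)
    counter.tick()
print(trace)  # [1, 1, 1, 0, 0]
```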
【０５４０】
メモリアクセスが所望されない場合、ライン１２７２上のＭＥＭ＿ＥＮは、論理０にアサートされ、かつＭＥＭＦＳＭユニット１２４０に提供される。この場合、メモリシミュレーションシステムは別のＤＭＡデータ転送期間のあいだ待機する。メモリアクセスが所望される場合、ライン１２７２上のＭＥＭ＿ＥＮ信号は論理１にアサートされる。実質的には、ＭＥＭ＿ＥＮ信号は、オンボードＳＲＡＭメモリデバイスがＦＰＧＡ論理デバイスにアクセスすることを可能にするためのＣＰＵからの制御信号である。ここで、ＭＥＭＦＳＭユニット１２４０は、ＦＰＧＡ論理デバイス１２０１〜１２０４がアドレスおよび制御信号をＦＰＧＡバスＦＤ［６３：３２］およびＦＤ［３１：０］に入力するのを待機する。
【０５４１】
残りの機能ユニットならびにそれらの関連制御信号およびラインは、データの書き込みおよび読み出しのためのＳＲＡＭメモリデバイスにアドレス／制御情報を提供するためのものである。これらの部分は、低バンクに対するメモリアドレス／制御ラッチ１２４３、低バンクに対するアドレス制御ｍｕｘ１２４４、高バンクに対するメモリアドレス／制御ラッチ１２４７、高バンクに対するアドレス制御マルチプレクサ１２４６、およびアドレスカウンタ１２４５を含む。
【０５４２】
低バンクに対するメモリアドレス／制御ラッチ１２４３は、バス１２１３に一致するＦＰＧＡバスＦＤ［３１：０］１２７５からのアドレスおよび制御信号、ならびにライン１２６３上のラッチ信号を受信する。ラッチ１２４３は、ライン１２６４上にｍｅｍ＿ｗｒ＿Ｌ信号を生成し、かつＦＰＧＡバスＦＤ［３１：０］からアドレス制御ｍｕｘ１２４４へバス１２６６を介して入力アドレス／制御信号を提供する。このｍｅｍ＿ｗｒ信号は、チップ選択書き込み信号と同じである。
【０５４３】
アドレス／制御マルチプレクサ１２４４は、入力としてバス１２６６上のアドレスおよび制御情報ならびにアドレスカウンタ１２４５からバス１２６８を介してのアドレス情報を受信する。出力として、アドレス／制御マルチプレクサ１２４４は、バス１２７６上でアドレス／制御情報を低バンクＳＲＡＭメモリデバイス１２０５へ送信する。ライン１２６５上の選択信号は、ＭＥＭＦＳＭユニット１２４０から適切な選択信号を提供する。バス１２７６上のアドレス／制御情報は、図５６におけるバス１２２９および１２１６上のＭＡ［１８：２］およびチップ選択読み出し／書き込み信号に対応する。
【０５４４】
アドレスカウンタ１２４５はＳＰＡＣＥ４およびＳＰＡＣＥ５からバス１２６７を介して情報を受信する。ＳＰＡＣＥ４はＤＭＡ書き込み転送情報を含む。ＳＰＡＣＥ５はＤＭＡ読み出し転送情報を含む。これらのＤＭＡ転送は計算システム（ワークステーションＣＰＵを介するキャッシュ／メインメモリ）とシミュレーションシステム（ＳＲＡＭメモリデバイス１２０５、１２０６）との間でＰＣＩバスを介して発生する。アドレスカウンタ１２４５はその出力をアドレス／制御マルチプレクサ１２４４および１２４６へのバス１２８８および１２６８に提供する。低バンクに対するライン１２６５上の適切な選択信号を用いて、アドレス／制御マルチプレクサ１２４４は、バス１２７６上に、ＳＲＡＭデバイス１２０５とＦＰＧＡ論理デバイス１２０３、１２０４との間の書き込み／読み出しメモリアドレスに対するバス１２６６上のアドレス／制御情報、または、バス１２６７上のＳＰＡＣＥ４またはＳＰＡＣＥ５からのＤＭＡ書き込み／読み出し転送データのいずれかを入力する。
【０５４５】
メモリアクセス期間中に、ＭＥＭＦＳＭユニット１２４０は、ライン１２６３上のラッチ信号をメモリアドレス／制御ラッチ１２４３に提供してＦＰＧＡバスＦＤ［３１：０］から入力をフェッチする。ＭＥＭＦＳＭユニット１２４０は、ＦＤ［３１：０］上のアドレス／制御信号からｍｅｍ＿ｗｒ＿Ｌ制御情報をさらなる制御のために抽出する。ライン１２６４上のｍｅｍ＿ｗｒ＿Ｌ信号が論理１である場合、書き込み動作が所望され、かつライン１２６５上の適切な選択信号はＭＥＭＦＳＭユニット１２４０によってアドレス／制御マルチプレクサ１２４４に生成され、バス１２６６上のアドレスおよび制御信号はバス１２７６上の低バンクＳＲＡＭへ送信される。その後、ＦＰＧＡ論理デバイスからＳＲＡＭメモリデバイスへの書き込みデータ転送が発生する。ライン１２６４上のｍｅｍ＿ｗｒ＿Ｌ信号が論理０である場合、読み出し動作が所望されるので、シミュレーションシステムは、ＳＲＡＭメモリデバイスによってそこに配置されるＦＰＧＡバスＦＤ［３１：０］上のデータを待機する。データが準備完了するとすぐに、ＳＲＡＭメモリデバイスからＦＰＧＡ論理デバイスへの読み出しデータ転送が発生する。
【０５４６】
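A similar structure and operation are provided for the high bank. The memory address/control latch 1247 for the high bank receives the address and control signals from the FPGA bus FD[63:32] 1278, which corresponds to bus 1212, and the latch signal on line 1270. The latch 1247 generates the mem_wr_H signal on line 1271 and provides the incoming address/control signals from the FPGA bus FD[63:32] to the address/control multiplexer 1246 via bus 1239.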
【０５４７】
アドレス／制御マルチプレクサ１２４６は、入力としてバス１２３９上のアドレス／制御情報およびバス１２６８上のアドレスカウンタ１２４５からアドレス情報を受信する。出力として、アドレス／制御マルチプレクサ１２４６は、バス１２７７上でアドレス／制御情報を高バンクＳＲＡＭメモリデバイス１２０６へ送信する。ライン１２６９上の選択信号は、ＭＥＭＦＳＭユニット１２４０から適切な選択信号を提供する。バス１２７７上のアドレス／制御情報は、図５６におけるバス１２１４および１２１５上のＭＡ［１８：２］およびチップ選択読み出し／書き込み信号に対応する。
【０５４８】
アドレスカウンタ１２４５は、上記のように、ＳＰＡＣＥ４およびＳＰＡＣＥ５からバス１２６７を介して情報をＤＭＡ書き込みおよび読み出し転送のために受信する。アドレスカウンタ１２４５はその出力をアドレス／制御マルチプレクサ１２４４および１２４６へのバス１２８８および１２６８に提供する。高バンクに対するライン１２６９上の適切な選択信号を用いて、アドレス／制御マルチプレクサ１２４６は、バス１２７７上に、ＳＲＡＭデバイス１２０６とＦＰＧＡ論理デバイス１２０１、１２０２との間の書き込み／読み出しメモリアドレスに対するバス１２３９上のアドレス／制御情報、または、バス１２６７上のＳＰＡＣＥ４またはＳＰＡＣＥ５からのＤＭＡ書き込み／読み出し転送データのいずれかを入力する。
【０５４９】
メモリアクセス期間中に、ＭＥＭＦＳＭユニット１２４０は、ライン１２７０上のラッチ信号をメモリアドレス／制御ラッチ１２４７に提供してＦＰＧＡバスＦＤ［６３：３２］からの入力をフェッチする。ＭＥＭＦＳＭユニット１２４０は、ｍｅｍ＿ｗｒ＿Ｈ制御情報をＦＤ［６３：３２］上のアドレス／制御信号をさらなる制御のために抽出する。ライン１２７１上のｍｅｍ＿ｗｒ＿Ｈ信号が論理１である場合、書き込み動作が所望され、かつライン１２６９上の適切な選択信号はＭＥＭＦＳＭユニット１２４０によってアドレス／制御マルチプレクサ１２４６に生成され、バス１２３９上のアドレスおよび制御信号はバス１２７７上の高バンクＳＲＡＭへ送信される。その後、ＦＰＧＡ論理デバイスからＳＲＡＭメモリデバイスへの書き込みデータ転送が発生する。ライン１２７１上のｍｅｍ＿ｗｒ＿Ｈ信号が論理０である場合、読み出し動作が所望されるので、シミュレーションシステムは、ＳＲＡＭメモリデバイスによってそこに配置されるＦＰＧＡバスＦＤ［６３：３２］上のデータを待機する。データが準備完了するとすぐに、ＳＲＡＭメモリデバイスからＦＰＧＡ論理デバイスへの読み出しデータ転送が発生する。
【０５５０】
図５７に示すように、アドレスおよび制御信号は低バンクＳＲＡＭメモリデバイスおよび高バンクメモリデバイスにそれぞれバス１２７６および１２７７を介して提供される。低バンクに対するバス１２７６は図５６におけるバス１２２９および１２１６の組み合せに対応する。同様に、高バンクに対するバス１２７７は図５６におけるバス１２１４および１２１５の組み合わせに対応する。
【０５５１】
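The operation of the CTRL_FPGA unit 1200 for the memory simulation system of one embodiment of the present invention is generally as follows. The DONE signal on line 1259, which is provided to the computing system and to the MEMFSM unit 1240 in the CTRL_FPGA unit 1200, indicates completion of a simulation write/read cycle. The DATAXSFR signal on line 1260 indicates that the DMA data transfer period of the simulation write/read cycle is in progress. The memory address/control signals on the FPGA buses FD[31:0] and FD[63:32] are provided to the memory address/control latches 1243 and 1247 for the low bank and the high bank, respectively. For either bank, the MEMFSM unit 1240 generates a latch signal (on line 1263 or 1270) to latch the address and control information. This information is then provided to the SRAM memory devices. The mem_wr signal is used to determine whether a write or a read operation is desired. If a write is desired, data is transferred from the FPGA logic devices 1201 to 1204 to the SRAM memory devices. If a read is desired, the simulation system waits for the SRAM memory to place the requested data on the FPGA bus for transfer from the SRAM memory to the FPGA logic devices. For DMA data transfers in SPACE4 and SPACE5, the select signals on lines 1265 and 1269 can select the output of the address counter 1245 for the data to be transferred between the main computing system and the SRAM memory devices in the simulation system. For all of these accesses, appropriate wait cycles are inserted so that the logic processes data only when the logic is ready and the data is available.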
【０５５２】
図６０は、メモリ読み出しデータダブルバッファ１２５１（図５７）のより詳細な図を示す。各ＦＰＧＡ論理デバイスにおける各メモリブロックＮは、異なる時間に入力され得る関連データをラッチインし、そして次いでこのラッチされたデータを同時に最後にバッファ出力するためのダブルバッファを有する。図６０において、メモリブロック０に対するダブルバッファ１３９１は、２つのＤ型フリップフロップ１３４０および１３４１を含む。第１のＤフリップフロップ１３４０の出力１３４３は第２のフリップフロップ１３４１の入力に結合される。第２のＤフリップフロップ１３４１の出力１３４４はダブルバッファの出力であり、これはユーザの設計論理におけるメモリブロックＮインタフェースに提供される。グローバルクロック入力は、ライン１３９３上で第１のフリップフロップ１３４０およびライン１３９４上で第２のフリップフロップ１３４１に提供される。
【０５５３】
第１のＤフリップフロップ１３４０はライン１３４２上でそのデータ入力をＳＲＡＭメモリデバイスからバス１２８３および高バンクに対するＦＰＧＡバスＦＤ［６３：３２］ならびに低バンクに対するＦＤ［３１：０］を介して受信する。イネーブル入力は、各ＦＰＧＡ論理デバイスに対するＥＶＡＬＦＳＭｘユニットからｒｄ＿ｌａｔｘ（例えば、ｒｄ＿ｌａｔ０）を受信するライン１３４５に結合される。したがって、読み出し動作（すなわち、ｗｒｘ＝０）に対して、ＥＶＡＬＦＳＭｘユニットはｒｄ＿ｌａｔｘ信号を生成して、ライン１３４２上のデータをライン１３４３にラッチインする。すべてのメモリブロックのすべてのダブルバッファに対する入力データは、異なる時間に入力され得る。ダブルバッファは、データのすべてが最初にラッチされることを確実にする。一旦すべてのデータがＤフリップフロップ１３４０にラッチインされると、ｃｌｋ＿ｅｎ信号（すなわち、ソフトウェアクロック）は、ライン１３４６上にＤフリップフロップ１３４１へのクロック入力として提供される。ｃｌｋ＿ｅｎ信号がアサートされると、ライン１３４３上のラッチされたデータがライン１３４４に対するＤフリップフロップ中１３４１にバッファされる。
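The purpose of the double buffer, capturing data that arrives at different times and releasing it at the same moment, can be shown with a behavioral sketch. The class name `DoubleBuffer` is hypothetical, and both stages are modeled as enable-gated registers on a shared global clock (rd_latx gating the first stage, clk_en gating the second); this is a simplifying assumption, since the text also describes clk_en as the clock input of the second flip-flop.

```python
class DoubleBuffer:
    """Two-stage read-data buffer for one memory block (behavioral model)."""

    def __init__(self):
        self.first = None  # output of the first D flip-flop (line 1343 for block 0)
        self.out = None    # output of the second D flip-flop (line 1344 for block 0)

    def clock(self, data, rd_lat, clk_en):
        # Both stages sample on the same edge, so the second stage sees the
        # value the first stage held before this edge.
        new_out = self.first if clk_en else self.out
        new_first = data if rd_lat else self.first
        self.first, self.out = new_first, new_out


block0, block1 = DoubleBuffer(), DoubleBuffer()
# Read data for the two blocks arrives on different clock edges.
block0.clock(0xAA, rd_lat=1, clk_en=0); block1.clock(0x00, rd_lat=0, clk_en=0)
block0.clock(0x00, rd_lat=0, clk_en=0); block1.clock(0xBB, rd_lat=1, clk_en=0)
# A single clk_en (software clock) edge releases both values together.
block0.clock(0x00, rd_lat=0, clk_en=1); block1.clock(0x00, rd_lat=0, clk_en=1)
print(hex(block0.out), hex(block1.out))  # 0xaa 0xbb
```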
【０５５４】
次のメモリブロック１に対して、ダブルバッファ１３９１に実質的に等価な別のダブルバッファ１３９２が提供される。ＳＲＡＭメモリデバイスからのデータはライン１３９６上へ入力される。グローバルクロック信号はライン１３９７上へ入力される。ｃｌｋ＿ｅｎ（ソフトウェアクロック）信号は、ライン１３９８上でダブルバッファ１３９２における第２のフリップフロップ（図示せず）に入力される。これらのラインは、メモリブロック０に対する第１のダブルバッファ１３９１および他のメモリブロックＮに対する他のダブルメモリに対して類似信号ラインに結合される。出力されたダブルバッファ化データはライン１３９９上に提供される。
【０５５５】
第２のダブルバッファ１３９２に対するｒｄ＿ｌａｔｘ信号（例えば、ｒｄ＿ｌａｔ１）は、ライン１３９５上に、他のダブルバッファに対する他のｒｄ＿ｌａｔｘ信号とは独立に提供される。より多くのダブルバッファが他のメモリブロックＮに対して提供される。
【０５５６】
ここで、ＭＥＭＦＳＭユニット１２４０の状態図を本発明の実施形態にしたがって説明する。図５８は、ＣＴＲＬ＿ＦＰＧＡユニットにおけるＭＥＭＦＳＭユニットの有限状態マシンのそのような状態図を示す。図５８における状態図は、シミュレーション書き込み／読み出しサイクル内の３つの期間がまた、それらに対応する状態を有して示されるように構成されている。したがって、状態１３００〜１３０１はＤＭＡデータ転送期間に対応する；状態１３０２〜１３０４は評価期間に対応する；および状態１３０５〜１３１４はメモリアクセス期間に対応する。以下の説明において図５７を図５８と併せて参照する。
【０５５７】
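In general, a sequence of signals is set for DMA transfer, evaluation, and memory access. In one embodiment, the sequence is as follows. DATA_XSFR initiates the DMA data transfer, if any. The LAST signals for both the high bank and the low bank are generated at the completion of the DMA data transfer and trigger the DONE signal, which indicates the completion of the DMA data transfer period. The XSFR_DONE signal is then generated, and the EVAL cycle then begins. At the end of EVAL, memory read/write can begin.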
【０５５８】
図５８の上部に戻ると、状態１３００は、ＤＡＴＡＸＳＦＲ信号が論理０の場合はいつもアイドル状態である。このことはその場合にＤＭＡデータ転送が発生しないことを示す。ＤＡＴＡＸＳＦＲ信号が論理１の場合、ＭＥＭＦＳＭユニット１２４０は状態１３０１に進む。ここで、計算システムは、計算システム（図１、４５、および４６におけるメインメモリ）とシミュレーションシステム（図５６におけるＦＰＧＡ論理デバイス１２０１〜１２０４またはＳＲＡＭメモリデバイス１２０５、１２０６）との間のＤＭＡデータ転送を必要とする。適切な待機サイクルが、ＤＭＡデータ転送が完了するまで挿入される。ＤＭＡ転送が完了した場合、ＤＡＴＡＸＳＦＲ信号が論理０に戻る。
【０５５９】
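When the DATAXSFR signal returns to logic 0, the MEMFSM unit 1240 generates a start signal at state 1302. The start signal starts the EVAL counter 1242, a programmable counter. As long as the EVAL counter is counting at state 1303, the EVAL signal is asserted at logic 1 and provided to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. At the end of the count, the EVAL counter presents a logic 0 EVAL signal to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. When the MEMFSM unit 1240 receives the logic 0 EVAL signal, it turns on the EVAL_DONE flag at state 1304. The MEMFSM uses the EVAL_DONE flag to indicate that the evaluation period has ended and that the memory access period, if desired, can now proceed. The CPU checks EVAL_DONE and XSFR_DONE by reading the XSFR_EVAL register (see Table K below) to confirm that the DMA transfer and EVAL have completed successfully before it proceeds with the next DMA transfer.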
【０５６０】
しかし、いくつかの場合に、シミュレーションシステムは、その時点でメモリアクセスを実行したくないこともある。ここで、シミュレーションシステムはメモリイネーブル信号ＭＥＭ＿ＥＮを０に保持する。この使用不可にされた（論理０）ＭＥＭ＿ＥＮ信号は、ＭＥＭＦＳＭユニットをアイドル状態１３００に保持する。ここで、ＭＥＭＦＳＭユニットは、ＤＭＡデータ転送またはＦＰＧＡ論理デバイスによるデータの評価を待機する。他方、メモリイネーブル信号ＭＥＭ＿ＥＮが論理１であると、シミュレーションシステムはメモリアクセスの実行が所望されることを示す。
【０５６１】
図５８において状態１３０４より下では、状態図は、平行に進行する２つのセクションに分割される。１つのセクションは低バンクメモリアクセスのための状態１３０５、１３０６、１３０７、１３０８、および１３０９を含む。他のセクションは、高バンクメモリアクセスのための状態１３１１、１３１２、１３１３、１３１４、および１３０９を含む。
【０５６２】
状態１３０５において、シミュレーションは、現在選択されているＦＰＧＡ論理デバイスがアドレスおよび制御信号をＦＰＧＡバスＦＤ［３１：０］に入力するために１サイクル待機する。状態１３０６において、ＭＥＭＦＳＭは、メモリアドレス／制御ラッチ１２４３に対するライン１２６３上にラッチ信号を生成して、ＦＤ［３１：０］から入力をフェッチする。この特にフェッチされたアドレスおよび制御信号に対応するデータは、ＳＲＡＭメモリデバイスから読み出されるか、またはＳＲＡＭメモリへ書き込まれるかのいずれかである。シミュレーションシステムが読み込み動作または書き込み動作を必要としているかを判断するために、低バンクに対するメモリ書き込み信号ｍｅｍ＿ｗｒ＿Ｌがアドレスおよび制御信号から抽出され得る。ｍｅｍ＿ｗｒ＿Ｌ＝０の場合、読み出し動作がリクエストされる。ｍｅｍ＿ｗｒ＿Ｌ＝１の場合、書き込み動作がリクエストされる。上記のように、このｍｅｍ＿ｗｒ信号はチップ選択書き込み信号と等価である。
【０５６３】
状態１３０７において、アドレス／制御マルチプレクサ１２４４に対する適切な選択信号が生成され、アドレスおよび制御信号を低バンクＳＲＡＭへ送信する。ＭＥＭＦＳＭユニットは、ｍｅｍ＿ｗｒ信号およびＬＡＳＴＬ信号をチェックする。ｍｅｍ＿ｗｒ＿Ｌ＝１かつＬＡＳＴＬ＝０の場合、書き込み動作はリクエストされるが、ＦＰＧＡ論理デバイスのチェーンにおける最後のデータはまだ外へシフトされていない。したがって、シミュレーションシステムは状態１３０５に戻る。状態１３０５で、シミュレーションシステムは、ＦＰＧＡ論理デバイスがさらなるアドレスおよび制御信号をＦＤ［３１：０］に入力するために１サイクル待機する。この動作は、最後のデータがＦＰＧＡ論理デバイスの外へシフトされるまで継続する。しかし、ｍｅｍ＿ｗｒ＿Ｌ＝１かつＬＡＳＴＬ＝１の場合、最後のデータはＦＰＧＡ論理デバイスの外へシフトされた。
【０５６４】
同様に、読み出し動作を示すｍｅｍ＿ｗｒ＿Ｌ＝０の場合、ＭＥＭＦＳＭは状態１３０８へ進む。状態１３０８において、シミュレーションシステムは、ＳＲＡＭメモリデバイスがデータをＦＰＧＡバスＦＤ［３１：０］に入力するために１サイクル待機する。ＬＡＳＴＬ＝０の場合、ＦＰＧＡ論理デバイスのチェーンの最後のデータはまだ外へシフトされていない。したがって、シミュレーションシステムは状態１３０５に戻る。状態１３０５で、シミュレーションシステムは、ＦＰＧＡ論理デバイスがさらなるアドレスおよび制御信号をＦＤ［３１：０］に入力するために１サイクル待機する。この処理は、最後のデータがＦＰＧＡ論理デバイスの外へシフトされるまで継続する。なお、書き込み動作（ｍｅｍ＿ｗｒ＿Ｌ＝１）および読み出し動作（ｍｅｍ＿ｗｒ＿Ｌ＝０）はＬＡＳＴＬ＝１までインターリーブまたはそうでなければ交番され得る。
【０５６５】
ＬＡＳＴＬ＝１の場合、ＭＥＭＦＳＭは状態１３０９に進む。状態１３０９でＭＥＭＦＳＭはＤＯＮＥ＝０のあいだ待機する。ＤＯＮＥ＝１になると、ＬＡＳＴＬおよびＬＡＳＴＨの両方が論理１となり、かつ、したがって、シミュレーション書き込み／読み出しサイクルが完了した。次いで、シミュレーションシステムは状態１３００に進む。状態１３００でシミュレーションシステムはＤＡＴＡＸＳＦＲ＝０の場合はいつもアイドル状態のままである。
【０５６６】
同じ処理が高バンクに対して適用可能である。状態１３１１において、シミュレーションシステムは、現在選択されているＦＰＧＡ論理デバイスがアドレスおよび制御信号をＦＰＧＡバスＦＤ［６３：３２］に入力するために１サイクル待機する。状態１３１２において、ＭＥＭＦＳＭはラッチ信号をライン１２７０上にメモリアドレス／制御ラッチ１２４７に対して生成して、ＦＤ［６３：３２］から入力をフェッチする。この特定のフェッチされたアドレスおよび制御信号に対応するデータは、ＳＲＡＭメモリデバイスから読み出されるか、またはＳＲＡＭメモリデバイスへ書き込まれるかのいずれかであり得る。シミュレーションシステムが読み込み動作または書き込み動作を必要としているかを判断するために、高バンクに対するメモリ書き込み信号ｍｅｍ＿ｗｒ＿Ｈがアドレスおよび制御信号から抽出され得る。ｍｅｍ＿ｗｒ＿Ｈ＝０の場合、読み出し動作がリクエストされる。ｍｅｍ＿ｗｒ＿Ｈ＝１の場合、書き込み動作がリクエストされる。
【０５６７】
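At state 1313, the appropriate select signal for the address/control multiplexer 1246 is generated to send the address and control signals to the high bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTH signal. If mem_wr_H=1 and LASTH=0, a write operation is requested but the last data in the chain of FPGA logic devices has not yet been shifted out. The simulation system therefore returns to state 1311, where it waits one cycle for the FPGA logic device to place further address and control signals on FD[63:32]. This operation continues until the last data has been shifted out of the FPGA logic devices. If, however, mem_wr_H=1 and LASTH=1, the last data has been shifted out of the FPGA logic devices.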
【０５６８】
同様に、読み出し動作を示すｍｅｍ＿ｗｒ＿Ｈ＝０の場合、ＭＥＭＦＳＭは状態１３１４へ進む。状態１３１４において、シミュレーションシステムは、ＳＲＡＭメモリデバイスがデータをＦＰＧＡバスＦＤ［６３：３２］に入力するために１サイクル待機する。ＬＡＳＴＨ＝０の場合、ＦＰＧＡ論理デバイスのチェーンの最後のデータはまだ外へシフトされていない。したがって、シミュレーションシステムは状態１３１１に戻る。状態１３１１で、シミュレーションシステムは、ＦＰＧＡ論理デバイスがさらなるアドレスおよび制御信号をＦＤ［６３：３２］に入力するために１サイクル待機する。この処理は、最後のデータがＦＰＧＡ論理デバイスの外へシフトされるまで継続する。なお、書き込み動作（ｍｅｍ＿ｗｒ＿Ｈ＝１）および読み出し動作（ｍｅｍ＿ｗｒ＿Ｈ＝０）はＬＡＳＴＨ＝１までインターリーブまたはそうでなければ交番され得る。
【０５６９】
ＬＡＳＴＨ＝１の場合、ＭＥＭＦＳＭは状態１３０９に進む。状態１３０９でＭＥＭＦＳＭはＤＯＮＥ＝０のあいだ待機する。ＤＯＮＥ＝１になると、ＬＡＳＴＬおよびＬＡＳＴＨの両方が論理１となり、かつ、したがって、シミュレーション書き込み／読み出しサイクルが完了した。次いで、シミュレーションシステムは状態１３００に進む。状態１３００でシミュレーションシステムはＤＡＴＡＸＳＦＲ＝０の場合はいつもアイドル状態のままである。
【０５７０】
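Alternatively, for both the high bank and the low bank, states 1309 and 1310 are not implemented in another embodiment of the present invention. Thus, in the low bank, the MEMFSM can proceed directly to state 1300 after passing state 1308 (LASTL=1) or state 1307 (MEM_WR_L=1 and LASTL=1). In the high bank, the MEMFSM can proceed directly to state 1300 after passing state 1314 (LASTH=1) or state 1313 (MEM_WR_H=1 and LASTH=1).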
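The low bank memory access states of the MEMFSM described above can be summarized as a transition function. The following is a minimal sketch based only on the state numbers and conditions given in the text for the embodiment that includes state 1309; the function names `memfsm_low_next` and `trace_low_bank` are hypothetical, and DONE is assumed to be already asserted when state 1309 is reached.

```python
def memfsm_low_next(state, mem_wr_l, last_l, done):
    """Return the next low bank MEMFSM state for the memory access period."""
    if state == 1305:          # wait one cycle for address/control on FD[31:0]
        return 1306
    if state == 1306:          # latch address/control from FD[31:0]
        return 1307
    if state == 1307:          # drive the SRAM, then check mem_wr_L and LASTL
        if mem_wr_l == 0:
            return 1308        # read: wait for SRAM data
        return 1309 if last_l else 1305
    if state == 1308:          # read data cycle
        return 1309 if last_l else 1305
    if state == 1309:          # wait while DONE=0
        return 1300 if done else 1309
    raise ValueError(f"unexpected state {state}")


def trace_low_bank(accesses):
    """accesses: list of (mem_wr_L, LASTL) pairs; the final pair must have LASTL=1."""
    pending = list(accesses)
    mem_wr_l, last_l = pending.pop(0)
    state, visited = 1305, [1305]
    while state != 1300:
        state = memfsm_low_next(state, mem_wr_l, last_l, done=True)
        if state == 1305:      # looping back: fetch the next access
            mem_wr_l, last_l = pending.pop(0)
        visited.append(state)
    return visited


# One write (not last) followed by one read (last).
states = trace_low_bank([(1, 0), (0, 1)])
```

The high bank path follows the same shape with states 1311 to 1314, mem_wr_H, and LASTH.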
【０５７１】
ここでＥＶＡＬＦＳＭユニット１２４８の状態図を本発明の一実施形態にしたがって説明する。図５９は、各ＦＰＧＡチップにおけるＥＶＡＬＦＳＭｘ有限状態マシンのそのような状態図を示す。図５８と同様に、図５９における状態図は、シミュレーション書き込み／読み出しサイクル内の２つの期間がまたそれらに対応する状態を有して示されるように構成された。したがって、状態１３２０〜１３２６Ａは評価期間に対応し、かつ状態１３２６Ｂ〜１３３６はメモリアクセス期間に対応する。以下の説明において図５７を図５９と併せて参照する。
【０５７２】
ＥＶＡＬＦＳＭｘユニット１２４８は、ＣＴＲＬ＿ＦＰＧＡユニット１２００からライン１２７４上でＥＶＡＬ信号を受信する（図５７参照）。ＥＶＡＬ＝０の間、ＦＰＧＡ論理デバイスによるデータの評価は発生しない。したがって、状態１３２０において、ＥＶＡＬＦＳＭｘはＥＶＡＬ＝０のあいだアイドル状態である。ＥＶＡＬ＝１の場合、ＥＶＡＬＦＳＭｘは状態１３２１へ進む。
【０５７３】
状態１３２１、１３２２、および１３２３は、ＦＰＧＡ間通信に関する。ＦＰＧＡ間通信では、データがユーザ設計によってＦＰＧＡ論理デバイスを介して評価される。ここで、ＥＶＡＬＦＳＭｘは、信号ｉｎｐｕｔ＿ｅｎ、ｍｕｘ＿ｅｎ、およびｃｌｋ＿ｅｎ（図５７における項目１２８１）をユーザの論理に対して生成する。状態１３２１において、ＥＶＡＬＦＳＭｘはｃｌｋ＿ｅｎ信号を生成する。ｃｌｋ＿ｅｎ信号は、すべてのクロックエッジレジスタフリップフロップの第２のフリップフロップがこのサイクルにおけるユーザの設計論理において使用可能にする（図１９参照）。ｃｌｋ＿ｅｎ信号は別にソフトウェアクロックとして公知である。ユーザメモリタイプが同期である場合、ｃｌｋ＿ｅｎはまた、各メモリブロックにおけるメモリ読み出しデータダブルバッファ１２５１の第２のクロックを使用可能にする。各メモリブロックのためのＳＲＡＭデータ出力は、このサイクルにおいてユーザの設計論理に送信される。
【０５７４】
状態１３２２において、ＥＶＡＬＦＳＭｘはｉｎｐｕｔ＿ｅｎ信号をユーザの設計論理に対して生成して、ＤＭＡ転送によるＣＰＵからユーザの論理へ送信される入力信号をラッチする。ｉｎｐｕｔ＿ｅｎ信号は、イネーブル信号を一次クロックレジスタにおける第２のフリップフロップへ提供する（図１９参照）。
【０５７５】
状態１３２３において、ＥＶＡＬＦＳＭｘはｍｕｘ＿ｅｎ信号を生成して各ＦＰＧＡ論理デバイスにおける多重化回路をオンにしてアレイにおける他のＦＰＧＡ論理デバイスとの通信を開始する。上記のように、ＦＰＧＡ間ワイヤラインは、各ＦＰＧＡ論理デバイスチップにおける限られたピンリソースを効率的に使用するように多重化されることが多い。
【０５７６】
状態１３２４において、ＥＶＡＬＦＳＭはＥＶＡＬ＝１である限り待機する。ＥＶＡＬ＝０の場合、評価期間が完了し、そしてそのため状態１３２５はＥＶＡＬＦＳＭｘがｍｕｘ＿ｅｎ信号をオフにすることを必要とする。
【０５７７】
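If the number of memory blocks M (where M is an integer, including 0) is zero, the EVALFSMx returns to state 1320, where it remains idle as long as EVAL=0. In most cases M>0, and the EVALFSMx therefore proceeds to state 1326A/1326B. "M" is the number of memory blocks in the FPGA logic device. M is a constant determined by the user design as mapped and configured in the FPGA logic device; it is not counted down. If M>0, the right portion of FIG. 59 (the memory access period) can be configured in the FPGA logic device. If M=0, only the left portion of FIG. 59 (the EVAL period) can be configured.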
【０５７８】
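State 1327 keeps the EVALFSMx in a wait state as long as SHIFTIN=0. When SHIFTIN=1, the previous FPGA logic device has completed its memory accesses and the current FPGA logic device is now ready to perform its memory access tasks. Alternatively, the current FPGA logic device is the first logic device in the bank and its SHIFTIN input line is coupled to Vcc. Regardless, receipt of the SHIFTIN=1 signal indicates that the current FPGA logic device is ready to perform memory accesses. At state 1328, the memory block number N is set to N=1. This number N is incremented on each pass through the loop so that memory access can be accomplished for that particular memory block N. Initially N=1, so the EVALFSMx proceeds to access the memory for memory block 1.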
【０５７９】
状態１３２９において、ＥＶＡＬＦＳＭｘは、ＦＰＧＡバスドライバＦＤＯ＿ＭＵＸｘ１２４９に対してライン１２８５上に選択信号を生成し、かつライン１２８４上にｏｕｔｐｕｔ＿ｅｎ信号を生成して、Ｍｅｍ＿Ｂｌｏｃｋ＿Ｎインタフェース１２５３のアドレスおよび制御信号をＦＰＧＡバスＦＤ［６３：３２］またはＦＤ［３１：０］に入力する。書き込み動作が要求される場合、ｗｒ＝１である。そうでなければ、読み出し動作が要求され、そこでｗｒ＝０となる。ＥＶＡＬＦＳＭｘはその入力の１つとしてライン１２８７上のｗｒ信号を受信する。このｗｒ信号に基づいて、ライン１２８５上の適切な選択信号がアサートされ得る。
【０５８０】
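If wr=1, the EVALFSMx proceeds to state 1330. The EVALFSMx generates the select and out_en signals for the FD bus driver to place the write data of Mem_Block_N 1253 on the FPGA bus FD[63:32] or FD[31:0]. Thereafter, the EVALFSMx waits one cycle to allow the SRAM memory device to complete the write cycle. The EVALFSMx then proceeds to state 1335, where the memory block number N is incremented by one; that is, N=N+1.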
【０５８１】
しかし、状態１３２９においてｗｒ＝０の場合、読み出し動作が要求され、そしてＥＶＡＬＦＳＭｘが状態１３３２に進む。ＥＶＡＬＦＳＭｘは、状態１３３２で１サイクル待機し、かつ次いで状態１３３３へ進んで別のサイクルのあいだ待機する。状態１３３４において、ＥＶＡＬＦＳＭｘはｒｄ＿ｌａｔｃｈ信号をライン１２８６上に生成して、メモリブロックＮのメモリ読み出しデータダブルバッファ１２５１がＳＲＡＭデータをＦＤバス上へフェッチするようにさせる。次いで、ＥＶＡＬＦＳＭｘは状態１３３５へ進む。状態１３３５でメモリブロック数Ｎは１だけインクリメントされる。すなわち、Ｎ＝Ｎ＋１である。したがって、インクリメント状態１３３５の前にＮ＝１の場合、Ｎはここで２となり、その結果のメモリアクセスはメモリブロック２に対して適用可能となり得る。
【０５８２】
現在のメモリブロックＮの数がユーザの設計におけるメモリブロックＭの総数以下の場合（すなわち、Ｎ≦Ｍ）、ＥＶＡＬＦＳＭｘは状態１３２９に進む。状態１３２９でＥＶＡＬＦＳＭｘは、動作が書き込みかまたは読み出しかに依存してＦＤバスドライバのための特定の選択およびｏｕｔ＿ｅｎ信号を生成する。次いで、この次のメモリブロックＮのための書き込みまたは読み出し動作が発生し得る。
【０５８３】
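If, however, the current memory block number N is greater than the total number of memory blocks M in the user design (that is, N>M), the EVALFSMx proceeds to state 1336, where it turns on the SHIFTOUT output signal so that the next FPGA logic device in the bank can access the SRAM memory devices. Thereafter, the EVALFSMx proceeds to state 1320, where it remains idle until the simulation system requests data evaluation among the FPGA logic devices (that is, EVAL=1).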
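The per-block loop of the EVALFSMx memory access period (states 1328 to 1336) can be sketched as follows. This is an illustrative model only: the function name `evalfsm_memory_access` and the `'w'`/`'r'` encoding of the wr signal are assumptions, the wait for SHIFTIN at state 1327 is taken as already satisfied, and the one-cycle wait that follows state 1330 is not given its own state number because the text does not name one.

```python
def evalfsm_memory_access(block_ops):
    """block_ops[i] is 'w' (wr=1) or 'r' (wr=0) for memory block i+1.

    Returns the visited (state, N) pairs, with N the memory block number.
    """
    m = len(block_ops)            # M: total memory blocks in this FPGA logic device
    steps = []
    if m == 0:
        return steps              # M=0: no memory access period is configured
    n = 1                         # state 1328: N is set to 1
    steps.append((1328, n))
    while n <= m:                 # loop while N <= M
        steps.append((1329, n))   # place address/control on the FD bus
        if block_ops[n - 1] == "w":
            steps.append((1330, n))               # place write data on the FD bus
        else:
            steps += [(1332, n), (1333, n)]       # two wait cycles
            steps.append((1334, n))               # latch read data (rd_lat)
        steps.append((1335, n))   # increment N
        n += 1
    steps.append((1336, n))       # N > M: assert SHIFTOUT for the next device
    return steps


# A device with two memory blocks: block 1 is written, block 2 is read.
visited = evalfsm_memory_access(["w", "r"])
```

The loop exits only when N exceeds M, which is why SHIFTOUT is asserted with N equal to M+1.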
【０５８４】
図６１は、本発明の一実施形態のシミュレーション書き込み／読み出しサイクルを示す。図６１は、参照番号１３６６において、シミュレーション書き込み／読み出しサイクルにおける３つの期間（ＤＭＡデータ転送期間、評価期間、およびメモリアクセス期間）を示す。図示しないが、先行のＤＭＡ転送、評価、およびメモリアクセスが発生し得たことを暗に示す。さらに、低バンクＳＲＡＭへ／からのデータ転送に対するタイミングは、高バンクＳＲＡＭと異なる。簡単のため、図６１は、低バンクおよび高バンクに対するアクセス時間が同一である１つの例を示す。グローバルクロックＧＣＬＫ１３５０は、システムにおけるすべての構成要素に対してクロック信号を提供する。
【０５８５】
ＤＡＴＡＸＳＦＲ信号１３５１は、ＤＭＡデータ転送期間の発生を示す。トレース１３６７においてＤＡＴＡＸＳＦＲ＝１の場合、ＤＭＡデータ転送は、メイン計算システムとＦＰＧＡ論理デバイスまたはＳＲＡＭメモリデバイスとの間で発生している。したがって、データはＦＰＧＡ高バンクバスＦＤ［６３：３２］１３５９：およびトレース１３６９、ならびにＦＰＧＡ低バンクバスＦＤ［３１：０］１３５８およびトレース１３６８上に提供される。ＤＯＮＥ信号１３６４は、論理０対１信号（トレース１３９０）によってメモリアクセス期間の完了を示すか、またはそうでなければ、論理０を用いてシミュレーション書き込み／読み出しサイクルの持続期間（例えば、１３７０のエッジおよびトレース１３９０のエッジの組み合わせ）を示す。ＤＭＡ転送期間の間、ＤＯＮＥ信号は論理０である。
【０５８６】
ＤＭＡ転送期間の終了時に、ＤＡＴＡＸＳＦＲ信号は論理１から０へ遷移する。これにより、評価期間の開始がトリガされる。したがって、ＥＶＡＬ１３５２は、トレース１３７１によって示されるように論理１である。論理１でのＥＶＡＬ信号の持続期間は、予め決定され、かつプログラム可能であり得る。この評価期間の間、ユーザの設計論理におけるデータは、トレース１３７２によって示されるような論理１であるｃｌｋ＿ｅｎ信号１３５３、またトレース１３７３によって示されるような論理１であるｉｎｐｕｔ＿ｅｎ信号１３５４、およびまたトレース１３７４によって示されるようなｃｌｋ＿ｅｎおよびｉｎｐｕｔ＿ｅｎよりも長い持続時間のあいだ論理１であるｍｕｘ＿ｅｎ信号１３５５を用いて評価される。データは、この特定のＦＰＧＡ論理デバイス内で評価されている。ｍｕｘ＿ｅｎ信号１３５５はトレース１３７４で論理１から０へ遷移し、かつ少なくとも１つのメモリブロックがＦＰＧＡ論理デバイスにおいて存在する場合、評価期間が終了しかつメモリアクセス期間が開始する。
【０５８７】
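The SHIFTIN signal 1356 is asserted to logic 1 at trace 1375. This indicates that the preceding FPGA has completed its evaluation and that all desired data has been accessed to or from this preceding FPGA logic device. The next FPGA logic device in the bank is now ready to begin its memory accesses.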
【０５８８】
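In traces 1377 to 1386, the following nomenclature is used. ACj_k indicates the address and control signals associated with FPGAj and memory block k, where j and k are integers, including 0. WDj_k indicates the write data for FPGAj and memory block k. RDj_k indicates the read data for FPGAj and memory block k. Thus, AC3_1 indicates the address and control signals associated with FPGA3 and memory block 1. The low bank SRAM accesses and the high bank SRAM accesses 1361 are shown as trace 1387.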
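The trace labels used in FIG. 61 follow a regular pattern, so they can be decoded mechanically. The helper below is a hypothetical illustration (the name `parse_label` is not from the patent) that splits a label into its kind, FPGA index j, and memory block index k.

```python
import re

KINDS = {
    "AC": "address/control",
    "WD": "write data",
    "RD": "read data",
}


def parse_label(label):
    """Decode a label such as 'AC3_1' into (kind, fpga_index, memory_block_index)."""
    match = re.fullmatch(r"(AC|WD|RD)(\d+)_(\d+)", label)
    if match is None:
        raise ValueError(f"not a valid trace label: {label!r}")
    kind, j, k = match.groups()
    return KINDS[kind], int(j), int(k)


decoded = parse_label("AC3_1")
```

For the example given in the text, `AC3_1` decodes to the address and control signals for FPGA3 and memory block 1.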
【０５８９】
次の数トレース１３７７〜１３８７は、どのようにメモリアクセスが達成されるかを示す。ＥＶＡＬＦＳＭｘへのｗｒｘ信号およびその結果のＭＥＭＦＳＭへのｍｅｍ＿ｗｒ信号の論理レベルに基づいて、書き込みまたは読み出し動作のいずれかが実行され得る。書き込み動作が所望の場合、ユーザのメモリブロックＮインタフェースを有するメモリモデルインタフェース（図５７におけるＭｅｍ＿Ｂｌｏｃｋ＿Ｎインタフェース１２５３）はｗｒｘをその制御信号の１つとして提供する。この制御信号ｗｒｘは、ＦＤバスドライバおよびＥＶＡＬＦＳＭｘユニットに提供される。ｗｒｘが論理１の場合、適切な選択信号およびｏｕｔｐｕｔ＿ｅｎ信号は、ＦＤバスドライバに提供され、メモリ書き込みデータをＦＤバスに入力する。ここでＦＤバス上にあるこの同じ制御信号は、ＣＴＲＬ＿ＦＰＧＡユニットにおけるメモリアドレス／制御ラッチによってラッチされ得る。メモリアドレス／制御ラッチは、アドレスおよび制御信号をＳＲＡＭにＭＡ［１８：２］／制御バスを介して送信される。論理１であるｗｒｘ制御信号は、ＦＤバスから抽出され、かつ書き込み動作がリクエストされるので、ＦＤバス上のアドレスおよび制御信号に関連付けられたデータはＳＲＡＭメモリデバイスへ送信される。
したがって、図６１に示されるように、この次のＦＰＧＡ論理デバイス（低バンクにおける論理デバイスＦＰＧＡ０）は、トレース１３７７によって示されるようにＡＣ０＿０をＦＤ［３１：０］に入力する。シミュレーションシステムは、ＷＤ０＿０に対して書き込み動作を実行する。次いで、ＡＣ０＿１は、ＦＤ［３１：０］に入力される。しかし、読み出し動作がリクエストされた場合、ＡＣ０＿１をＦＤバスＦＤ［３１：０］に入力した後にいくらかの時間遅延が続き、その後ＡＣ０＿０に対応するＷＤ０＿０の代わりにＲＤ０＿０がＳＲＡＭメモリデバイスによってＦＤバスに入力される。
【０５９０】
なお、トレース１３８３によって示されるようにＡＣ０＿０をＭＡ［１８：２］／制御バスに入力することは、アドレス、制御、およびデータをＦＤバスに入力することよりわずかに遅れる。これは、ＭＥＭＦＳＭユニットがアドレス／制御信号をＦＤバスからラッチインし、ｍｅｍ＿ｗｒ信号を抽出し、かつ適切な選択信号をアドレス／制御マルチプレクサに対して生成してアドレス／制御信号がＭＡ［１８：２］／制御バスに入力され得るようにするのに時間を必要とするからである。さらに、アドレス／制御信号をＳＲＡＭメモリデバイスに対してＭＡ［１８：２］／制御バスに入力した後で、シミュレーションシステムは、ＳＲＡＭメモリからの対応のデータがＦＤバスに入力されるのを待機しなければならない。１つの例は、トレース１３８４とトレース１３８１との間の時間オフセットである。ここでＲＤ１＿１は、ＡＣ１＿１がＭＡ［１８：２］／制御バスに入力された後で、ＦＤバスに入力される。
【０５９１】
高バンク上で、ＦＰＧＡ１は、ＦＤ［６３：３２］にＡＣ１＿０を入力し、次いでＷＤ１＿０が続く。その後、ＡＣ１＿１がＦＤ［６３：３２］に入力される。これは、トレース１３８０によって示される。ＡＣ１＿１がＦＤバスに入力される場合、制御信号はこの例において読み出し動作を示す。したがって、上記のように、ＡＣ１＿１がトレース１３８４によって示されるようにＭＡ［１８：２］／制御バス上にあるので、論理０である適切なｗｒｘおよびｍｅｍ＿ｗｒ信号がＥＶＡＬＦＳＭｘおよびＭＥＭＦＳＭユニットへのアドレス／制御信号において存在する。シミュレーションシステムはこれが読み出し動作であることを知っているので、書き込みデータはＳＲＡＭメモリへ伝送されない。むしろ、ＡＣ１＿１に関連付けられた読み出しデータが、ユーザの設計論理によってシミュレーションメモリブロックインタフェースを介する後の読み出しのために、ＳＲＡＭメモリによってＦＤバスに入力される。これは高バンク上でトレース１３８１によって示される。低バンク上において、ＲＤ０＿１は、トレース１３７８によって示されるようにＦＤバスに入力され、続いてＡＣ０＿１がＭＡ［１８：２］／制御バスに入力される（図示せず）。
【０５９２】
ユーザの設計論理によるシミュレーションメモリブロックインタフェースを介する読み出し動作は、ＥＶＡＬＦＳＭｘがトレース１３８８によって示されるようにシミュレーションシステムにおけるメモリ読み出しデータダブルバッファに対してｒｄ＿ｌａｔ０信号１３６２を生成する場合に達成される。このｒｄ＿ｌａｔ０信号は、低バンクＦＰＧＡ０および高バンクＦＰＧＡ１の両方に提供される。
【０５９３】
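Thereafter, the next memory block for each FPGA logic device is placed on the FD bus. AC2_0 is placed on the low bank FD bus while AC3_0 is placed on the high bank FD bus. If a write operation is desired, WD2_0 is placed on the low bank FD bus and WD3_0 is placed on the high bank FD bus. AC3_0 is placed on the MA[18:2]/control bus as shown on trace 1385. This process continues for the next memory blocks for write and read operations. Note that the write and read operations for the low bank and the high bank can occur at different times and speeds; FIG. 61 shows one particular example in which the timing for the low bank and the high bank is the same. In addition, in FIG. 61 the write operations for the low and high banks occur together, followed by the read operations on both banks. This is not always the case. The existence of a low bank and a high bank allows the devices coupled to these banks to operate in parallel; that is, activity on the low bank is independent of activity on the high bank. Other scenarios can be envisioned, such as the low bank performing a series of write operations while the high bank performs a series of read operations in parallel.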
【０５９４】
各バンクに対して最後のＦＰＧＡ論理デバイスにおける最後のデータとなった場合、ＳＨＩＦＴＯＵＴ信号１３５７はトレース１３７６によって示されるようにアサートされる。読み出し動作に対して、低バンク上のＦＰＧＡ２および高バンク上のＦＰＧＡ３に対応するｒｄ＿ｌａｔ１信号１３６３は、トレース１３８９によって示されるようにアサートされ、トレース１３７９上のＲＤ２＿１およびトレース１３８２上のＲＤ３＿１を読み出す。最後のＦＰＧＡユニットに対する最後のデータがアクセスされたので、シミュレーション書き込み／読み出しサイクルの完了がトレース１３９０によって示されるようにＤＯＮＥ信号１３６４によって示される。
【０５９５】
以下の表Ｈは、シミュレーションシステムボード上の種々の構成要素、および対応のレジスタ／メモリ、ＰＣＩメモリアドレス、およびローカルアドレスをリストおよび記載する。
【０５９６】
【表１１】

【０５９７】
構成ファイルに対するデータ形式は、本発明の一実施形態にしたがって表Ｊにおいて以下に示される。ＣＰＵは、各時間にＰＣＩバスを介して１ワードを送信して、すべてのオンボードＦＰＧＡに対する１ビットを並列に構成する。
【０５９８】
【表１２】

【０５９９】
以下の表Ｋは、ＸＳＦＲ＿ＥＶＡＬレジスタをリストする。ＸＳＦＲ＿ＥＶＡＬレジスタはすべてのボード上に存在する。ＸＳＦＲ＿ＥＶＡＬレジスタは、ホスト計算システムによって使用され、ＥＶＡＬ期間をプログラムし、ＤＭＡ読み出し／書き込みを制御し、かつＥＶＡＬ＿ＤＯＮＥおよびＸＳＦＲ＿ＤＯＮＥフィールドのステータスを読み出す。ホスト計算システムはまた、このレジスタを使用してメモリアクセスを使用可能にする。このレジスタに対するシミュレーションシステムの動作は図６２および６３を参照して以下に説明される。
【０６００】
【表１３】

【０６０１】
以下の表ＬはＣＯＮＦＩＧ＿ＪＴＡＧ［６：１］レジスタの内容をリストする。ＣＰＵはＦＰＧＡ論理デバイスを構成し、かつこのレジスタを介してＦＰＧＡ論理デバイスに対して境界スキャンテストを実行する。各ボードは１つの専用レジスタを有する。
【０６０２】
【表１４】

【０６０３】
図６２および６３は本発明の別の実施形態のタイミング図を示す。これら２つの図は、ＸＳＦＲ＿ＥＶＡＬレジスタに対するシミュレーションシステムの動作を示す。ＸＳＦＲ＿ＥＶＡＬレジスタは、ホスト計算システムによって使用され、ＥＶＡＬ期間をプログラムし、ＤＭＡ読み出し／書き込みを制御し、かつＥＶＡＬ＿ＤＯＮＥおよびＸＳＦＲ＿ＤＯＮＥフィールドのステータスを読み出す。ホスト計算システムはまた、このレジスタを使用して、メモリアクセスを可能にする。２つの図の主な違いの１つは、ＷＡＩＴ＿ＥＶＡＬフィールドのステータスである。ＷＡＩＴ＿ＥＶＡＬフィールドが「０」に設定された場合（図６２の場合）、ＤＭＡ読み出し転送はＣＬＫ＿ＥＮの後に開始する。ＷＡＩＴ＿ＥＶＡＬフィールドが「１」に設定された場合（図６３の場合）、ＤＭＡ読み出し転送はＥＶＡＬ＿ＤＯＮＥの後に開始する。
【０６０４】
図６２において、ＷＲ＿ＸＳＦＲ＿ＥＮおよびＲＤ＿ＸＳＦＲ＿ＥＮの両方が「１」に設定される。これら２つのフィールドはＤＭＡ書き込み／読み出し転送を使用可能とし、かつＸＳＦＲ＿ＤＯＮＥによってクリアされる。２つのフィールドが「１」に設定されるので、ＣＴＲＬ＿ＦＰＧＡユニットは自動的にＤＭＡ書き込み転送をまず実行し、かつ次いでＤＭＡ読み出し転送を実行する。しかし、ＷＡＩＴ＿ＥＶＡＬフィールドは「０」に設定され、ＤＭＡ読み出し転送がＣＬＫ＿ＥＮのアサートの後（かつＤＭＡ書き込み動作の完了の後）で開始する。したがって、図６２において、ＤＭＡ読み出し動作は、ＣＬＫ＿ＥＮ信号（ソフトウェアクロック）が検出され次第、ＤＭＡ書き込み動作の完了後ほとんど直ちに発生する。ＤＭＡ読み出し転送はＥＶＡＬ期間の完了を待機しない。
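The effect of the WR_XSFR_EN, RD_XSFR_EN, and WAIT_EVAL fields on the order of DMA operations can be stated compactly. The sketch below is an assumption-labeled illustration: the function name `dma_sequence` and the returned strings are hypothetical, and only the ordering rules given in the text are modeled (write first, then read; the read is triggered by CLK_EN when WAIT_EVAL is 0 and by EVAL_DONE when WAIT_EVAL is 1).

```python
def dma_sequence(wr_xsfr_en, rd_xsfr_en, wait_eval):
    """Return the ordered DMA steps implied by the XSFR_EVAL register fields."""
    steps = []
    if wr_xsfr_en:
        steps.append("DMA write")           # the write transfer always goes first
    if rd_xsfr_en:
        trigger = "EVAL_DONE" if wait_eval else "CLK_EN"
        steps.append(f"DMA read after {trigger}")
    return steps


fig62 = dma_sequence(wr_xsfr_en=1, rd_xsfr_en=1, wait_eval=0)
fig63 = dma_sequence(wr_xsfr_en=1, rd_xsfr_en=1, wait_eval=1)
```

With WAIT_EVAL cleared, the read can overlap the EVAL period; with WAIT_EVAL set, it is deferred until evaluation is nearly complete.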
【０６０５】
タイミング図の開始において、複数のＦＰＧＡ論理デバイスが注意（ａｔｔｅｎｔｉｏｎ）を得ようと競う際にＥＶＡＬ＿ＲＥＱ＿Ｎ信号は競合する。上記のように、ＥＶＡＬ＿ＲＥＱ＿Ｎ（またはＥＶＡＬ＿ＲＥＱ＃）信号は、ＦＰＧＡ論理デバイスのいずれかがこの信号をアサートした場合に、評価サイクルを開始するように使用される。データ転送の終了時に、アドレスポインタ初期化および評価処理を容易にするためのソフトウェアクロックの操作を含む評価サイクルが開始する。
【０６０６】
ＤＯＮＥ信号はまた、ＤＭＡデータ転送期間の終結時に生成され、複数のＬＡＳＴ信号（各ＦＰＧＡ論理デバイスの出力におけるｓｈｉｆｔｉｎおよびｓｈｉｆｔｏｕｔ信号からのもの）がＣＴＲＬ＿ＦＰＧＡユニットに対して生成および提供される際に競合する。すべてのＬＡＳＴ信号が受信および処理される場合、ＤＯＮＥ信号が生成され、かつＤＭＡデータ転送動作が開始し得る。ＥＶＡＬ＿ＲＥＱ＿Ｎ信号およびＤＯＮＥ信号は、以下に記載の方法で、時分割式に同じワイヤを使用する。
【０６０７】
システムは、時間１４０９におけるＷＲ＿ＸＳＦＲ信号によって示されるように自動的にＤＭＡ書き込み転送をまず開始する。一実施形態において、ＷＲ＿ＸＳＦＲ信号の初期部分は、ＰＣＩコントローラ、ＰＣＩ９０８０または９０６０に関連する所定のオーバヘッドを含む。その後、ホスト計算システムはＤＭＡ書き込み動作をローカルバスＬＤ［３１：０］およびＦＰＧＡバスＦＤ［６３：０］を介して、ＦＰＧＡバスＦＤ［６３：０］に結合されたＦＰＧＡ論理デバイスに対して実行する。
【０６０８】
時間１４１２において、ＷＲ＿ＸＳＦＲ信号は非アクティブにされ、ＤＭＡ書き込み動作の完了を示す。ＥＶＡＬ信号は時間１４１２から１４１０の間の所定時間のあいだアクティブにされる。ＥＶＡＬＴＩＭＥの持続期間がプログラム可能であり、かつ８＋Ｘに初期設定される。ここでＸは最長の信号トレースパスから得られる。ＸＳＦＲ＿ＤＯＮＥ信号はまた、短時間アクティブにされ、このＤＭＡ転送動作（現在の動作はＤＭＡ書き込み）の完了を示す。
【０６０９】
また、時間１４１２において、ＥＶＡＬ＿ＲＥＱ＿Ｎ信号間の競合は停止するがＤＯＮＥ信号を伝送するワイヤはここでＥＶＡＬ＿ＲＥＱ＿Ｎ信号をＣＴＲＬ＿ＦＰＧＡユニットに送達する。３クロックサイクルの間、ＥＶＡＬ＿ＲＥＱ＿Ｎ信号は、ＤＯＮＥ信号を伝送するワイヤを介して処理される。３クロックサイクルの後、ＥＶＡＬ＿ＲＥＱ＿Ｎ信号はもはやＦＰＧＡ論理デバイスによって生成されないが、前回にＣＴＲＬ＿ＦＰＧＡユニットに送達されたＥＶＡＬ＿ＲＥＱ＿Ｎ信号が処理され得る。ＥＶＡＬ＿ＲＥＱ＿Ｎ信号がもはやゲート化クロックのためのＦＰＧＡ論理デバイスによって生成されない最大時間はおよそ２３クロックサイクルである。この期間よりも長いＥＶＡＬ＿ＲＥＱ＿Ｎ信号は無視され得る。
【０６１０】
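At time 1413, approximately two clock cycles after time 1412 (the end of the DMA write operation), the CTRL_FPGA unit sends the write address strobe signal WPLXADS_N to the PCI controller (for example, the PLX PCI9080) to initiate the DMA read transfer. Approximately 24 clock cycles after time 1413, the PCI controller can begin the DMA read transfer process, and the DONE signal is also generated. At time 1414, before the PCI controller begins the DMA read process, the RD_XSFR signal is activated to enable the DMA read transfer. Certain PLX overhead data is transmitted and processed first. At time 1415, while this overhead data is being processed, the DMA read data is placed on the FPGA bus FD[63:0] and the local bus LD[31:0]. At the end of the 24 clock cycles following time 1413, when the DONE signal is activated and the FPGA logic devices generate the EVAL_REQ_N signal, the PCI controller processes the DMA read data by transmitting it from the FPGA bus FD[63:0] and the local bus LD[31:0] to the host computer system.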
【０６１１】
時間１４１０において、ＤＭＡ読み出しデータは、ＥＶＡＬ信号が非アクティブ化され、かつＥＶＡＬ＿ＤＯＮＥ信号がアクティブ化されてＥＶＡＬサイクルの完了を示す間、処理され続け得る。ＦＰＧＡ論理デバイス間の競合はまた、ＥＶＡＬ＿ＲＥＱ＿Ｎ信号を生成する際に開始する。
【０６１２】
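At time 1417, just before the completion of the DMA read period at time 1416, the host computer system polls the PLX interrupt register to determine whether the end of the DMA cycle is near. The PCI controller knows how many cycles are needed to complete the DMA data transfer process. After a predetermined number of cycles, the PCI controller sets a particular bit in its interrupt register. The CPU in the host computer system polls this interrupt register in the PCI controller. If the bit is set, the CPU knows that the DMA period is almost over. The CPU in the host system does not poll the interrupt register continuously, because doing so would occupy the PCI bus with read cycles. Thus, in one embodiment of the present invention, the CPU in the host computer system is set up to poll the interrupt register only after waiting a predetermined number of cycles. A short time later, at time 1416, the RD_XSFR signal is deactivated, which ends the DMA read period, and the DMA read data is no longer on the FPGA bus FD[63:0] or the local bus LD[31:0]. Also at time 1416, the XSFR_DONE signal is activated and contention among the LAST signals to generate the DONE signal begins.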
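The reason for delaying the first poll can be shown with a small count of bus reads. The sketch below is illustrative only: `poll_after_delay` and the `read_interrupt_bit` callback are hypothetical names, one poll per cycle is an assumption, and the cycle numbers in the example are invented for the illustration rather than taken from the patent.

```python
def poll_after_delay(read_interrupt_bit, wait_cycles, max_polls):
    """Wait wait_cycles, then poll once per cycle until the bit is set.

    read_interrupt_bit(cycle) returns True once the controller has set its bit.
    Returns (cycle_at_which_bit_was_seen, number_of_polls_issued).
    """
    cycle = wait_cycles
    for polls in range(1, max_polls + 1):
        if read_interrupt_bit(cycle):
            return cycle, polls
        cycle += 1
    return None, max_polls


def bit_set_from_cycle_100(cycle):
    """Toy controller: the interrupt bit is set from cycle 100 onward."""
    return cycle >= 100


delayed = poll_after_delay(bit_set_from_cycle_100, wait_cycles=98, max_polls=10)
eager = poll_after_delay(bit_set_from_cycle_100, wait_cycles=0, max_polls=200)
```

Both strategies observe the bit at the same cycle, but the delayed strategy issues far fewer read cycles on the bus.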
【０６１３】
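Throughout the DMA period, from the generation of the WR_XSFR signal at time 1409 until time 1417, the CPU in the host computer system does not access the simulation hardware system. In one embodiment, this period is the sum of (1) twice the overhead time of the PCI controller, (2) the number of words of WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (for example, a Sun ULTRASparc). The first access after the DMA period occurs at time 1419, when the CPU polls the interrupt register in the PCI controller.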
【０６１４】
時間１４１１、すなわち時間１４１６から約３クロックサイクル後に、ＭＥＭ＿ＥＮ信号が活性化されることによりオンボードＳＲＡＭメモリデバイスがイネーブルになり、それにより、ＦＰＧＡ論理デバイスとＳＲＡＭメモリデバイスとの間のメモリアクセスが開始し得る。メモリアクセスは時間１４１９まで続き、一実施形態ではアクセス毎に５クロックサイクルを必要とする。ＤＭＡ読み出し転送が必要でない場合、メモリアクセスは時間１４１１ではなく、より早い時間１４１０に開始し得る。
【０６１５】
メモリアクセスがＦＰＧＡ論理デバイスとＳＲＡＭメモリデバイスとの間でＦＰＧＡバスＦＤ［６３：０］を介して起こる一方、ホストコンピュータシステム内のＣＰＵは、時間１４１８から時間１４２９までＰＣＩコントローラおよびＣＴＲＬ＿ＦＰＧＡユニットと、ローカルバスＬＤ［３１：０］を介して通信し得る。これは、ＣＰＵがＰＣＩコントローラのインタラプトレジスタのポーリングを完了した後に起こる。ＣＰＵは、次のデータ転送の準備として様々なレジスタにデータを書き込む。この期間は、４μｓｅｃよりも長い。メモリアクセスがこの期間よりも短い場合、ＦＰＧＡバスＦＤ［６３：０］にはコンフリクトが起こらない。時間１４２９において、ＸＳＦＲ＿ＤＯＮＥ信号が非活性化される。
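The no-conflict condition above can be checked with simple arithmetic. The sketch below uses the two figures given in the text (five clock cycles per memory access and a CPU register-write period of about 4 microseconds); the function name `memory_access_fits` is hypothetical, and the 33 MHz clock in the example is purely an assumption, since the text does not state the clock frequency.

```python
CYCLES_PER_ACCESS = 5        # from the text: five clock cycles per access in one embodiment
CPU_WINDOW_SECONDS = 4e-6    # from the text: the CPU period is longer than 4 microseconds


def memory_access_fits(num_accesses, clock_hz):
    """Return True if all memory accesses finish within the 4 microsecond window."""
    duration_seconds = num_accesses * CYCLES_PER_ACCESS / clock_hz
    return duration_seconds < CPU_WINDOW_SECONDS


ASSUMED_CLOCK_HZ = 33e6      # assumption for illustration only

fits_26 = memory_access_fits(26, ASSUMED_CLOCK_HZ)   # 130 cycles, about 3.94 microseconds
fits_27 = memory_access_fits(27, ASSUMED_CLOCK_HZ)   # 135 cycles, about 4.09 microseconds
```

Under the assumed clock, 26 accesses fit inside the window and 27 do not; a different clock frequency shifts that boundary proportionally.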
【０６１６】
図６３のタイミング図は図６２のものとは幾分異なる。なぜなら図６３では、ＷＡＩＴ＿ＥＶＡＬフィールドが「１」に設定されているからである。換言すると、ＤＭＡ読み出し転送期間はＥＶＡＬ＿ＤＯＮＥ信号が活性化されほとんど完了した後に開始する。ＤＭＡ読み出し転送期間は、ＤＭＡ書き込み動作の完了直後ではなく、ＥＶＡＬ期間がほぼ完了するまで待ってから開始される。ＥＶＡＬ信号は時間１４１２から時間１４１０という所定期間中、活性化される。時間１４１０で、ＥＶＡＬ＿ＤＯＮＥ信号が活性化されることによりＥＶＡＬ期間の完了が示される。
【０６１７】
図６３において、時間１４１２にＤＭＡ書き込み動作が終了してから時間１４２０までの間、ＣＴＲＬ＿ＦＰＧＡユニットはＰＣＩコントローラへの書き込みアドレスストローブ信号ＷＰＬＸＡＤＳ＿Ｎを発生しない。時間１４２０は、ＥＶＡＬ期間の終了よりも約１６クロックサイクル前である。ＸＳＦＲ＿ＤＯＮＥ信号はさらに時間１４２３まで延長される。時間１４２３において、ＸＳＦＲ＿ＤＯＮＥフィールドが設定され、ＤＭＡ読み出しプロセスを開始するためにＷＰＬＸＡＤＳ＿Ｎ信号が発生し得る。
【０６１８】
時間１４２０、すなわちＥＶＡＬ＿ＤＯＮＥ信号の活性化より約１６クロックサイクル前に、ＣＴＲＬ＿ＦＰＧＡユニットが書き込みアドレスストローブＷＰＬＸＡＤＳ＿Ｎ信号をＰＣＩコントローラ（例えば、ＰＬＸＰＣＩ９０８０）に送信することにより、ＤＭＡ読み出し転送を開始する。時間１４２０から約２４クロックサイクル後に、ＰＣＩコントローラがＤＭＡ読み出し転送プロセスを開始し、ＤＯＮＥ信号も発生する。時間１４２１、すなわちＰＣＩコントローラによるＤＭＡ読み出しプロセスの開始前に、ＲＤ＿ＸＳＦＲ信号が活性化され、それによりＤＭＡ読み出し転送がイネーブルになる。いくらかのＰＬＸオーバーヘッドデータがまず送信され処理される。時間１４２２において、このオーバーヘッドデータが処理されている間に、ＤＭＡ読み出しデータがＦＰＧＡバスＦＤ［６３：０］およびローカルバスＬＤ[３１：０]上に載る。２４クロックサイクルが終わる時間１４２４に、ＰＣＩコントローラが、ＦＰＧＡバスＦＤ［６３：０］およびローカルバスＬＤ[３１：０]からホストコンピュータシステムにＤＭＡ読み出しデータを伝送することにより処理する。タイミング図の残りの部分は図６２と同等である。
【０６１９】
このように図６３では図６２よりも後にＲＤ＿ＸＳＦＲ信号が活性化される。図６３において、ＲＤ＿ＸＳＦＲ信号はＥＶＡＬ期間がほぼ完了した後に活性化され、これによりＤＭＡ読み出し動作が遅延される。図６２において、ＲＤ＿ＸＳＦＲ信号は、ＤＭＡ書き込み転送が完了した後にＣＬＫ＿ＥＮ信号が検出されるのを待って活性化される。
【０６２０】
（ＩＸ．コ−ベリフィケーションシステム）
本発明のコ−ベリフィケーションシステムは、ソフトウェアシミュレーションのフレキシビリティと、ハードウェアモデルを用いることから生じるより高い速度とを設計者に提供することにより、設計／開発サイクルを加速し得る。設計のハードウェア部分およびソフトウェア部分は両方とも、ＡＳＩＣの作成に先立ち、かつエミュレータベースのコ−ベリフィケーションツールに対する制限なくベリファイされ得る。デバッグ機能は向上し、デバッグ時間全体が大幅に低減され得る。
【０６２１】
（テストデバイスとしてＡＳＩＣを用いた従来のコ−ベリフィケーションツール）
図６４は、ビデオ、マルチメディア、イーサネット（登録商標）、またはＳＣＳＩカードなどのＰＣＩアドオンカードとして具現化された典型的な最終的設計を示す。このカード２０００は、他の周辺デバイスとの通信を可能にする直接インターフェースコネクタ２００２を含む。コネクタ２００２は、バス２００１に接続されて、ＶＣＲ、カメラまたはＴＶチューナからのビデオ信号、モニタまたはスピーカへのビデオおよびオーディオ出力、ならびに通信またはディスクドライブインターフェースへの信号を伝送する。ユーザ設計に依存して、当業者は他のインターフェースに対する要件を予測し得る。設計の多くの機能はチップ２００４内にあり、チップ２００４は、バス２００３を介してインターフェースコネクタ２００２と接続され、バス２００７を介してローカルクロック信号を発生する局部発振器２００５と接続され、バス２００８を介してメモリ２００６と接続されている。アドオンカード２０００はさらに、ＰＣＩコネクタ２００９を含み、ＰＣＩバス２０１０と接続されている。
【０６２２】
この設計を図６４に示すアドオンカードとして実施する前に、この設計はテストのためにＡＳＩＣ形態に変更される。従来のハードウェア／ソフトウェアコ−ベリフィケーションツールを図６５に示す。図６５において、ユーザ設計は、テストデバイス（または「ＤＵＴ」）２０２４として示すＡＳＩＣの形態で具現化されている。ＡＳＩＣのインターフェース先として設計された様々なソースからの刺激を得るために、テストデバイス２０２４はターゲットシステム２０２０内に配置される。ターゲットシステム２０２０は、マザーボード上の中央演算システム２０２１といくつかの周辺デバイスとの組み合わせである。ターゲットシステム２０２０は中央演算システム２０２１を含み、中央演算システム２０２１は、ＣＰＵおよびメモリを含む。ターゲットシステム２０２０は、ＭｉｃｒｏｓｏｆｔＷｉｎｄｏｗｓ（登録商標）またはＳｕｎＭｉｃｒｏＳｙｓｔｅｍのＳｏｌａｒｉｓなどのいくつかのオペレーティングシステム下で動作して複数のアプリケーションを実行させる。当業者には公知であるが、ＳｕｎＭｉｃｒｏＳｙｓｔｅｍのＳｏｌａｒｉｓは、インターネット、イントラネットおよび企業内コンピューティングをサポートする動作環境兼ソフトウェア製品セットである。Ｓｏｌａｒｉｓ動作環境は、業界標準であるＵＮＩＸ（登録商標）システムＶリリース４に基づいており、配信されたネットワーキング環境でクライアント−サーバアプリケーション用に設計され、相対的に小さいワークグループ用の適切なリソースを提供し、電子商取引に必要なＷｅｂＴｏｎｅを提供する。
【０６２３】
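A device driver 2022 for the test device 2024 is included in the central computing system 2021 and enables communication between the operating system (and any applications) and the test device 2024. As known to those skilled in the art, a device driver is specific software that controls a hardware component or peripheral device of a computer system. A device driver is responsible for accessing the hardware registers of the device and often includes an interrupt handler to service the interrupts generated by the device. Device drivers often form part of the lowest level of the operating system kernel, with which they are linked when the kernel is built. Some more recent systems have loadable device drivers that can be installed from files after the operating system is running.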
【０６２４】
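The test device 2024 and the central computing system 2021 are connected to the PCI bus 2023. Other peripheral devices in the target system 2020 include an Ethernet (registered trademark) PCI add-on card 2025, used to connect the target system to a network 2030 via bus 2034; a SCSI PCI add-on card 2026, connected to SCSI drives 2027 and 2031 via buses 2036 and 2035; a VCR 2028 connected to the test device 2024 via bus 2032 (if required by the design of the test device 2024); and a monitor and/or speaker 2029 connected to the test device 2024 via bus 2033 (if required by the design of the test device 2024). As known to those skilled in the art, "SCSI" stands for "Small Computer Systems Interface," a processor-independent standard for system-level interfacing between a computer and many intelligent devices, including hard disks, floppy (registered trademark) disks, CD-ROMs, printers, and scanners.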
【０６２５】
このターゲットシステム環境において、テストデバイス２０２４は、中央演算システム（すなわち、オペレーティングシステム、アプリケーション）から周辺デバイスまでの様々な刺激を用いて検査され得る。時間的に問題がなく設計者が単純に成功か失敗かを知るためのテストを求めている場合、このコ−ベリフィケーションツールはそのニーズを満たすように適切に変更されるべきである。しかしほとんどの場合、設計プロジェクトは予算面および製品としてリリースされるまでのスケジュール面で厳しく制限されている。上述したように、この特定のＡＳＩＣベースのコ−ベリフィケーションツールは満足できるものではない。なぜなら、デバッグ機能が存在しないからである。（設計者は、高度な技術なくしては「失敗」したテストの原因を特定することができず、検出された各バグの「修正手段」の数がプロジェクトの開始時に予測できない。従って、スケジュールおよび予算が予測不能となる。）
（テストデバイスとしてエミュレータを用いた従来のコ−ベリフィケーションツール）
図６６は、エミュレータを用いた従来のコ−ベリフィケーションを示す。図６４に示し上述した設定とは異なり、テストデバイスは、ターゲットシステム２０４０といくつかの周辺デバイスとテストワークステーション２０５２とに接続されたエミュレータ２０４８内でプログラムされる。エミュレータ２０４８は、エミュレーションクロック２０６６とエミュレータ内でプログラムされたテストデバイスとを含む。
【０６２６】
エミュレータ２０４８は、ＰＣＩバスブリッジ２０４４とＰＣＩバス２０５７と制御線２０５６とを介してターゲットシステム２０４０に接続されている。ターゲットシステム２０４０は、マザーボード上の中央演算システム２０４１といくつかの周辺デバイスとの組み合わせである。ターゲットシステム２０４０は、中央演算システム２０４１を含み、中央演算システム２０４１は、ＣＰＵおよびメモリを含む。ターゲットシステム２０４０は、ＭｉｃｒｏｓｏｆｔＷｉｎｄｏｗｓ（登録商標）またはＳｕｎＭｉｃｒｏＳｙｓｔｅｍのＳｏｌａｒｉｓなどのいくつかのオペレーティングシステム下で動作して複数のアプリケーションを実行させる。テストデバイス用のデバイスドライバ２０４２は、中央演算システム２０４１に含まれ、オペレーティングシステム（および任意のアプリケーション）とエミュレータ２０４８内のテストデバイスとの間の通信をイネーブルにする。エミュレータ２０４８および、この演算環境の一部分である他のデバイスと通信するために、中央演算システム２０４１はＰＣＩバス２０４３に接続されている。ターゲットシステム２０４０内の他の周辺デバイスは、バス２０５８を介してターゲットシステムをネットワーク２０４９に接続するために用いられるイーサネット（登録商標）ＰＣＩアドオンカード２０４５、ならびにバス２０６０および２０５９を介してＳＣＳＩドライブ２０４７および２０５０に接続されているＳＣＳＩＰＣＩアドオンカード２０４６を含む。
【０６２７】
エミュレータ２０４８はさらにバス２０６２を介してテストワークステーション２０５２に接続されている。テストワークステーション２０５２は、その機能を果たすためにＣＰＵおよびメモリを含む。テストワークステーション２０５２はさらに、テストケース２０６１および、モデル化されているが物理的にはエミュレータ２０４８に接続されていない他のデバイス用のデバイスモデル２０６８を含み得る。
【０６２８】
最後にエミュレータ２０４８は、バス２０６１を介して、フレームバッファまたはデータストリーム記録／再生システム２０５１などのいくつかの他の周辺デバイスに接続されている。このフレームバッファまたはデータストリーム記録／再生システム２０５１はさらに、バス２０６３を介して通信デバイスまたはチャネル２０５３に接続され得、バス２０６４を介してＶＣＲ２０５４に接続され得、バス２０６５を介してモニタおよび／またはスピーカ２０５５に接続され得る。
【０６２９】
当業者には公知であるが、エミュレーションクロックは、実際のターゲットシステムの速度よりもはるかに遅い速度で動作する。従って、図６６のなかで塗りつぶされている部分はエミュレーション速度で実行され、他の塗りつぶされていない部分は実際のターゲットシステムの速度で実行される。
【０６３０】
上述したように、このエミュレータを用いたコ−ベリフィケーションツールにはいくつかの限界がある。論理アナライザまたはサンプルホールドデバイスを用いてテストデバイスの内部状態を得る場合、設計者は、デバッグ目的のために検査したい関連信号がサンプリング用出力ピン上に存在するように設計をコンパイルしなければならない。設計者が設計の別の部分をデバッグしたい場合、設計者は、その部分が論理アナライザまたはサンプルホールドデバイスによってサンプリングされ得る出力信号を有することを確認しなければならない。あるいは設計者は、これらの信号がサンプリング目的の出力ピン上に提供され得るようにエミュレータ２０４８内の設計をコンパイルしなおさなければならない。このような再コンパイルは数日または数週間かかり得、これは時間が重要である設計／開発スケジュールには長すぎる遅延であり得る。さらに、このコ−ベリフィケーションツールは信号を用いるため、これらの信号をデータに変換するために、または、何らかの信号から信号へのタイミング制御を提供するために、高度な回路が提供されなければならない。さらに、サンプリングすることが望まれる各信号に必要な多くのワイヤ２０６１および２０６２を用いる必要性が、デバッグ設定時の負荷および時間を増加させる。
【０６３１】
（再構成可能演算アレイによるシミュレーション）
本明細書で上述した本発明の単エンジン再構成可能演算（ＲＣＣ）アレイシステムの高レベル構成を、簡単に再検討するために図６７に示す。本発明の一実施形態において、この単エンジンＲＣＣシステムがコ−ベリフィケーションシステムに組み込まれる。
【０６３２】
図６７において、ＲＣＣアレイシステム２０８０は、ＲＣＣ演算システム２０８１と、再構成可能演算（ＲＣＣ）ハードウェアアレイ２０８４と、これらを接続するＰＣＩバス２０８９とを含む。重要なことは、ＲＣＣ演算システム２０８１がユーザ設計のモデル全体をソフトウェア内に含み、ＲＣＣハードウェアアレイ２０８４がユーザ設計のハードウェアモデルを含むことである。ＲＣＣ演算システム２０８１は、ＣＰＵと、メモリと、オペレーティングシステムと、単エンジンＲＣＣシステム２０８０を実行させるために必要なソフトウェアとを含む。ＲＣＣ演算システム２０８１内のソフトウェアモデルおよびＲＣＣハードウェアアレイ２０８４内のハードウェアモデルの漏れのない制御をイネーブルにするために、ソフトウェアクロック２０８２が設けられる。ＲＣＣ演算システム２０８１内にはテストベンチデータ２０８３がさらに格納されている。
【０６３３】
ＲＣＣハードウェアアレイシステム２０８４は、ＰＣＩインターフェース２０８５と、ＲＣＣハードウェアアレイボードセット２０８６と、インターフェース用の様々なバスとを含む。ＲＣＣハードウェアアレイボードセット２０８６は、ユーザ設計のうち少なくともハードウェア内でモデル化された部分（すなわちハードウェアモデル２０８７）とテストベンチデータ用のメモリ２０８８とを含む。一実施形態では、このハードウェアモデルの様々な部分が、構成時間中に、複数の再構成可能論理要素（例えばＦＰＧＡチップ）間に分散される。より多くの再構成可能論理要素またはチップが用いられるにつれて、より多くのボードが必要となり得る。一実施形態では、４つの再構成可能論理要素が単一のボード上に設けられる。他の実施形態では、８つの再構成可能論理要素が単一のボード上に設けられる。４つのチップを有するボード内での再構成可能論理要素の容量および性能は、８つのチップを有するボード内での再構成可能論理要素の容量および性能とは大幅に異なり得る。
【０６３４】
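Bus 2090 provides the various clocks for the hardware model from the PCI interface 2085 to the hardware model 2087. Bus 2091 provides other I/O data between the PCI interface 2085 and the hardware model 2087 via connector 2093 and internal bus 2094. Bus 2092 functions as the PCI bus between the PCI interface 2085 and the hardware model 2087. Test bench data can also be stored in memory in the hardware model 2087. As described above, the hardware model 2087 includes, in addition to the hardware model of the user design, other structures and functions needed to allow the hardware model to interface with the RCC computing system 2081.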
【０６３５】
このＲＣＣシステム２０８０は、単一のワークステーションとして提供されてもよいし、あるいはワークステーションのネットワークに接続されてもよい。後者の場合、各ワークステーションは時間分割ベースでＲＣＣシステム２０８０へのアクセスを提供される。実際、ＲＣＣアレイシステム２０８０は、シミュレーションスケジューラおよび状態スワッピングメカニズムを有するシミュレーションサーバとして作用する。サーバは、ワークステーションの各ユーザが、より高速な加速およびハードウェア状態スワッピングという目的のためにＲＣＣハードウェアアレイ２０８４にアクセスすることを可能にする。加速および状態スワッピングの後、各ユーザは、ユーザ設計をソフトウェア内で局所的にシミュレートする一方で、他のワークステーションの他のユーザにＲＣＣハードウェアアレイ２０８４の制御をリリースすることができる。このネットワークモデルは、以下に述べるコ−ベリフィケーションシステムにも用いられる。
【０６３６】
ＲＣＣアレイシステム２０８０は、設計全体をシミュレートするパワーとフレキシビリティ、選択されたサイクル中に再構成可能演算アレイ内でハードウェアモデルを介してテストポイントの一部を加速するパワーとフレキシビリティ、および設計者の設計の実質的に任意の部分の内部状態情報を随時取得するパワーとフレキシビリティを、設計者に与える。実際、単エンジン再構成可能演算アレイ（ＲＣＣ）システムは、概してハードウェア加速型シミュレータと呼ぶことができ、１回のデバッグセッションで以下のタスクを行うために用いられ得る。（１）シミュレーションのみ、（２）ユーザが随時、設計を開始し、停止し、値をアサートし、内部状態の調査を行い得る、ハードウェア加速を用いたシミュレーション、（３）シミュレーション後の分析、および（４）回路内エミュレーション。ソフトウェアモデルおよびハードウェアモデルは両方とも、ソフトウェアクロックを介して単エンジンの厳しい制御下にあるため、再構成可能演算アレイ内のハードウェアモデルはソフトウェアシミュレーションモデルに緊密に接続されている。このことにより、設計者は、価値のある内部状態情報を得るために、サイクル毎にデバッグすること、および複数のサイクルを介してハードウェアモデルを加速および減速することが可能になる。さらに、このシミュレーションシステムは信号ではなくデータを扱うため、信号からデータへの複雑な変換／タイミング回路が不要である。さらに典型的なエミュレーションシステムとは異なり、設計者が異なるノードセットを検査したいと考えた場合に、再構成可能演算アレイ内のハードウェアモデルを再コンパイルする必要はない。さらなる詳細については、上記を参照されたい。
【０６３７】
（外部Ｉ／Ｏを用いないコ−ベリフィケーションシステム）
本発明の一実施形態は、実際の物理的外部Ｉ／Ｏデバイスおよびターゲットアプリケーションを用いないコ−ベリフィケーションシステムである。従って、本発明の一実施形態によるコ−ベリフィケーションシステムは、ユーザ設計のソフトウェア部分およびハードウェア部分を、実際のターゲットシステムまたはＩ／Ｏデバイスを用いることなくデバッグするために、他の機能と共にＲＣＣシステムを組み込み得る。ターゲットシステムおよび外部Ｉ／Ｏデバイスは、ＲＣＣ演算システム内のソフトウェア内でモデル化される。
【０６３８】
図６８を参照すると、コ−ベリフィケーションシステム２１００は、ＲＣＣ演算システム２１０１と、ＲＣＣハードウェアアレイ２１０８と、これらを接続するＰＣＩバス２１１４とを含む。重要なことは、ＲＣＣ演算システム２１０１がユーザ設計のモデル全体をソフトウェア内に含み、再構成可能演算アレイ２１０８がユーザ設計のハードウェアモデルを含むことである。ＲＣＣ演算システム２１０１は、ＣＰＵと、メモリと、オペレーティングシステムと、単エンジンコ−ベリフィケーションシステム２１００を実行させるために必要なソフトウェアとを含む。ＲＣＣ演算システム２１０１内のソフトウェアモデルおよび再構成可能演算アレイ２１０８内のハードウェアモデルの漏れのない制御をイネーブルにするために、ソフトウェアクロック２１０４が設けられる。ＲＣＣ演算システム２１０１内にはテストケース２１０３がさらに格納されている。
【０６３９】
本発明の一実施形態によると、ＲＣＣ演算システム２１０１はさらに、ターゲットアプリケーション２１０２、ユーザ設計のハードウェアモデルのドライバ２１０５、デバイス（例えば、ビデオカード）のモデルとデバイスモデルのソフトウェア内のドライバ（２１０６で示す）、および別のデバイス（例えば、モニタ）のモデルとこれもまたソフトウェア内にあるデバイスモデルのドライバ（２１０７で示す）を含む。実質的にＲＣＣ演算システム２１０１は、実際のターゲットシステムおよび他のＩ／Ｏデバイスがこの演算環境の一部であることを、ユーザ設計のソフトウェアモデルおよびハードウェアモデルに伝えるために必要なデバイスモデルおよびドライバを、必要な数だけ含む。
【０６４０】
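The RCC hardware array 2108 includes a PCI interface 2109, a set of RCC hardware array boards 2110, and various buses for interfacing. The set of RCC hardware array boards 2110 includes at least the portion of the user design that is modeled in hardware 2112 and a memory 2113 for test bench data. As described above, each board contains a plurality of reconfigurable logic elements or chips.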
【０６４１】
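Bus 2115 provides the various clocks for the hardware model from the PCI interface 2109 to the hardware model 2112. Bus 2116 provides other I/O data between the PCI interface 2109 and the hardware model 2112 via connector 2111 and internal bus 2118. Bus 2117 functions as the PCI bus between the PCI interface 2109 and the hardware model 2112. Test bench data can also be stored in memory in the hardware model 2112. As described above, the hardware model includes, in addition to the hardware model of the user design, other structures and functions needed to allow the hardware model to interface with the RCC computing system 2101.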
【０６４２】
図６８のコ−ベリフィケーションシステムを従来のエミュレータベースのコ−ベリフィケーションシステムと比較するために、図６６は、ターゲットシステム２０４０、いくつかのＩ／Ｏデバイス（例えば、フレームバッファ、またはデータストリーム記録／再生システム２０５１）およびワークステーション２０５２に接続されたエミュレータ２０４８を示す。このエミュレータ構成は、設計者に、多くの問題と設定上の論点を提示する。エミュレータは、エミュレータ内でモデル化されるユーザ設計の内部状態を測定するために、論理アナライザまたはサンプルホールドデバイスを必要とする。論理アナライザおよびサンプルホールドデバイスは信号を必要とするため、信号からデータへの複雑な変換回路が必要である。さらに、信号から信号への複雑なタイミング制御回路が必要である。エミュレータの内部状態を測定するために用いられる各信号に必要な多くのワイヤが、設定中のユーザにさらに負荷を与える。デバッグセッション中、ユーザは、異なるセットの内部論理回路を検査したいと考える毎にエミュレータを再コンパイルして、論理アナライザまたはサンプルホールドデバイスによる測定および記録用の出力として、適切な信号が論理回路から提供されるようにしなければならない。再コンパイルにかかる長い時間は大きすぎる損失である。
【０６４３】
外部Ｉ／Ｏデバイスが接続されていない本発明のコ−ベリフィケーションシステムでは、ターゲットシステムおよび他のＩ／Ｏデバイスがソフトウェア内でモデル化されており、それにより実際の物理的ターゲットシステムおよびＩ／Ｏデバイスが不要となっている。ＲＣＣ演算システム２１０１はデータを処理するため、信号からデータへの複雑な変換回路も信号から信号へのタイミング制御システムも不要である。ワイヤの数も信号の数と無関係であり、従って設定は比較的単純である。さらに、ユーザ設計のハードウェアモデル内の論理回路の異なる部分をデバッグするのに再コンパイルを必要としない。なぜなら、コ−ベリフィケーションシステムはデータを処理するのであって、信号を処理するのではないからである。ＲＣＣ演算システムは、ソフトウェア制御クロック（すなわち、ソフトウェアクロックおよびクロックエッジ検出回路）を用いてＲＣＣハードウェアアレイを制御するため、ハードウェアモデルの起動および終了は容易になる。ハードウェアモデルからの読み出しも容易である。なぜなら、ユーザ設計全体のモデルがソフトウェア内にありソフトウェアクロックが同期をイネーブルにするからである。従って、ユーザはソフトウェアシミュレーションのみでデバッグを行い、ハードウェア内の設計の一部または全部を加速し、サイクル毎に様々な所望のテストポイントを行い、ソフトウェアおよびハードウェアモデルの内部状態（すなわち、レジスタおよび組み合わせ論理状態）を調査することができる。例えば、ユーザはいくつかのテストベンチデータで設計をシミュレートし、ハードウェアモデルに内部状態情報をダウンロードし、ハードウェアモデルでの様々なテストベンチデータで設計を加速し、得られたハードウェアモデルの内部状態値をレジスタ／組み合わせ論理再発生により調査し、ハードウェアモデルからソフトウェアモデルに値をロードすることができる。そしてユーザは最終的に、ハードウェアモデル加速型プロセスの結果を用いて、ソフトウェア内にあるユーザ設計の他の部分をシミュレートすることができる。
【０６４４】
しかし上述したように、デバッグセッション制御のために、ワークステーションがまだ必要である。ネットワーク構成において、ワークステーションは、デバッグデータに遠隔的にアクセスするために、コ−ベリフィケーションシステムに遠隔的に接続され得る。非ネットワーク構成においては、ワークステーションはコ−ベリフィケーションシステムに局所的に接続され得る。いくつかの実施形態では、ワークステーションは、デバッグデータが局所的にアクセスされ得るようにコ−ベリフィケーションシステムを内部に組み込み得る。
【０６４５】
（外部Ｉ／Ｏを用いたコ−ベリフィケーションシステム）
図６８において、様々なＩ／ＯデバイスおよびターゲットアプリケーションがＲＣＣ演算システム２１０１内でモデル化された。しかし、あまりに多くのＩ／ＯデバイスおよびターゲットアプリケーションがＲＣＣ演算システム２１０１内で実行しすぎると、全体の速度が低下する。ＲＣＣ演算システム２１０１内にＣＰＵが１つだけある場合、すべてのデバイスモデルおよびターゲットアプリケーションからの様々なデータを処理するのに、より長い時間が必要である。データのスループットを高めるために、実際のＩ／Ｏデバイスおよびターゲットアプリケーションが（これらのＩ／Ｏデバイスおよびターゲットアプリケーションのソフトウェアモデルに代えて）物理的にコ−ベリフィケーションシステムに接続され得る。
【０６４６】
本発明の一実施形態は、実際の物理的外部Ｉ／Ｏデバイスおよびターゲットアプリケーションを用い得るコ−ベリフィケーションシステムである。従って、コ−ベリフィケーションシステムは、実際のターゲットシステムおよび／またはＩ／Ｏデバイスを用いながら、ユーザ設計のソフトウェア部分およびハードウェア部分をデバッグするために、他の機能と共にＲＣＣシステムを組み込み得る。テストのために、コ−ベリフィケーションシステムはソフトウェアからのテストベンチデータと外部インターフェース（例えば、ターゲットシステムおよび外部Ｉ／Ｏデバイス）からの刺激を用い得る。テストベンチデータは、ユーザ設計のピン出力にテストデータを提供するためと、ユーザ設計内の内部ノードにテストデータを提供するためとに用いられ得る。外部Ｉ／Ｏデバイス（またはターゲットシステム）からの実際のＩ／Ｏ信号は、ユーザ設計のピン出力のみに向けられ得る。従って、外部インターフェース（例えば、ターゲットシステムまたは外部Ｉ／Ｏデバイス）からのテストデータと、ソフトウェア内のテストベンチプロセスとの間の１つの主要な相違は、テストベンチデータはピン出力および内部ノードに付与される刺激でユーザ設計をテストするために用いられ得るが、ターゲットシステムまたは外部Ｉ／Ｏデバイスからの実際のデータは、そのピン出力（またはピン出力を表すユーザ設計内のノード）を介してユーザ設計のみに付与され得るということである。コ−ベリフィケーションシステムの構造、およびターゲットシステムおよび外部Ｉ／Ｏデバイスに対する構成を以下に述べる。
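The distinction between the two stimulus sources can be expressed as a simple access rule: test bench data may drive pin-outs and internal nodes, while data from the target system or an external I/O device may drive pin-outs only. The sketch below is a hypothetical illustration; the `Stimulus` class, the `apply_stimulus` function, and the node names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stimulus:
    source: str   # "test_bench" or "external"
    target: str   # name of a pin-out or an internal node
    value: int


def apply_stimulus(stim, pinouts, internal_nodes, state):
    """Apply a stimulus to the design state, enforcing the source restriction."""
    if stim.target in pinouts:
        pass  # both sources may drive pin-outs
    elif stim.target in internal_nodes:
        if stim.source != "test_bench":
            raise PermissionError("external stimuli can reach pin-outs only")
    else:
        raise KeyError(stim.target)
    state[stim.target] = stim.value


pinouts = {"pci_ad0"}           # hypothetical pin-out name
internal_nodes = {"fifo_full"}  # hypothetical internal node name
state = {}
apply_stimulus(Stimulus("test_bench", "fifo_full", 1), pinouts, internal_nodes, state)
apply_stimulus(Stimulus("external", "pci_ad0", 0), pinouts, internal_nodes, state)
```

An external stimulus aimed at `fifo_full` would be rejected, which mirrors the restriction described in the text.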
【０６４７】
本発明の一実施形態によるコ−ベリフィケーションシステムは、図６６のシステム構成に比較すると、波線２０７０内の要素の構造および機能が異なる。換言すると、図６６は波線２０７０内のエミュレータおよびワークステーションを示すが、本発明の一実施形態は図６９に示すコ−ベリフィケーションシステム２１４０（および関連するワークステーション）を波線２０７０内のコ−ベリフィケーションシステム２１４０として含む。
【０６４８】
図６９を参照すると、本発明の一実施形態によるコ−ベリフィケーションシステム構成は、ターゲットシステム２１２０と、コ−ベリフィケーションシステム２１４０と、いくつかのオプションのＩ／Ｏデバイスと、これらを接続する制御／データバス２１３１および２１３２とを含む。ターゲットシステム２１２０は、中央演算システム２１２１を含み、中央演算システム２１２１はＣＰＵおよびメモリを含む。ターゲットシステム２１２０は、ＭｉｃｒｏｓｏｆｔＷｉｎｄｏｗｓ（登録商標）またはＳｕｎＭｉｃｒｏＳｙｓｔｅｍのＳｏｌａｒｉｓなどのいくつかのオペレーティングシステム下で動作して複数のアプリケーション２１２２およびテストケース２１２３を実行させる。ユーザ設計のハードウェアモデル用デバイスドライバ２１２４は、中央演算システム２１２１内に含まれて、オペレーティングシステム（および任意のアプリケーション）とユーザ設計との間の通信をイネーブルにする。コ−ベリフィケーションシステムおよびこの演算環境の一部である他のデバイスと通信するために、中央演算システム２１２１はＰＣＩバス２１２９に接続されている。ターゲットシステム２１２０内の他の周辺デバイスは、ターゲットシステムをネットワークに接続するために用いられるイーサネット（登録商標）ＰＣＩアドオンカード２１２５、バス２１３０を介してＳＣＳＩドライブ２１２８に接続されているＳＣＳＩＰＣＩアドオンカード２１２６、およびＰＣＩバスブリッジ２１２７を含む。
【０６４９】
コ−ベリフィケーションシステム２１４０は、ＲＣＣ演算システム２１４１、ＲＣＣハードウェアアレイ２１９０、外部Ｉ／Ｏ拡張部という形態の外部インターフェース２１３９、およびＲＣＣ演算システム２１４１とＲＣＣハードウェアアレイ２１９０とを接続するＰＣＩバス２１７１を含む。ＲＣＣ演算システム２１４１は、ＣＰＵと、メモリと、オペレーティングシステムと、単エンジンコ−ベリフィケーションシステム２１４０を実行させるために必要なソフトウェアとを含む。重要なことは、ＲＣＣ演算システム２１４１がユーザ設計全体をソフトウェア内に含み、ＲＣＣハードウェアアレイ２１９０がユーザ設計のハードウェアモデルを含むことである。
【０６５０】
上述したように、コ−ベリフィケーションシステムの単エンジンは、ＲＣＣ演算システム２１４１の主要メモリ内にある主要ソフトウェアカーネルから、パワーとフレキシビリティとを取得し、コ−ベリフィケーションシステム２１４０の動作および実行全体を制御する。テストベンチプロセスがアクティブであり且つ外界からの信号がすべてコ−ベリフィケーションシステムに提示される限り、カーネルはアクティブなテストベンチコンポーネントを評価し、クロックコンポーネントを評価し、クロックエッジを検出してレジスタおよびメモリを更新すると共に組み合わせ論理データを伝搬させ、シミュレーション時間を早める。この主要ソフトウェアカーネルのおかげで、ＲＣＣ演算システム２１４１とＲＣＣハードウェアアレイ２１９０とが緊密に接続されるという特徴が得られる。
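The kernel loop described above (evaluate test bench components, evaluate clock components, detect clock edges, update registers and propagate combinational logic, advance simulation time) can be sketched on a toy design. Everything in this example is an assumption made for illustration: the function name `run_kernel`, the single register that counts rising edges, the combinational value derived from it, and the clock that toggles once per time step.

```python
def run_kernel(num_steps):
    """Run a toy single-engine kernel loop and return a per-step log."""
    clk = 0          # software clock value
    reg = 0          # toy register, updated on rising clock edges
    comb = 0         # toy combinational output derived from reg
    sim_time = 0
    log = []
    while sim_time < num_steps:           # test bench process is still active
        enable = 1                        # 1. evaluate the test bench component (toy: constant)
        prev_clk, clk = clk, clk ^ 1      # 2. evaluate the clock component
        if prev_clk == 0 and clk == 1:    # 3. detect a rising clock edge
            reg = reg + enable            #    update the register
            comb = reg * 2                #    propagate combinational logic
        log.append((sim_time, clk, reg, comb))
        sim_time += 1                     # 4. advance simulation time
    return log


log = run_kernel(4)
```

In this toy model the register changes only on steps where a rising edge is detected, which is the property the software clock is meant to guarantee for the hardware model.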
【０６５１】
ソフトウェアカーネルは、ＲＣＣハードウェアアレイ２１９０および外部に提供されたソフトウェアクロックソース２１４２からソフトウェアクロック信号を生成する。クロックソース２１４２は、これらのソフトウェアクロックの宛先に依存して、異なる周波数の複数のクロックを生成し得る。概してソフトウェアクロックは、ユーザ設計のハードウェアモデル内のレジスタが、ホールド時間を乱すことなくシステムクロックと同期して評価することを保証する。ソフトウェアモデルは、ハードウェアモデルレジスタ値に影響を与えるソフトウェア内のクロックエッジを検出し得る。従って、クロック検出メカニズムは、主要ソフトウェアモデル内でのクロックエッジ検出がハードウェアモデルでのクロック検出として解釈され得ることを保証する。ソフトウェアクロックおよびクロックエッジ検出論理のより詳細な説明については、図１７〜図１９および本明細書の対応部分を参照されたい。
【０６５２】
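In accordance with one embodiment of the present invention, the RCC computing system 2141 may also include one or more models of a number of I/O devices, even though other actual physical I/O devices can be coupled to the coverification system. For example, the RCC computing system 2141 may include in software a model of a device (e.g., a speaker) along with its driver and test bench data (labeled 2143), and a model of another device (e.g., a graphics accelerator) along with its driver and test bench data (labeled 2144). The user decides which devices (and their respective drivers and test bench data) will be modeled and incorporated into the RCC computing system 2141 and which devices will actually be coupled to the coverification system.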
【０６５３】
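The coverification system contains control logic that provides traffic control between (1) the RCC computing system 2141 and the RCC hardware array 2190, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array 2190. Some data passes between the RCC hardware array 2190 and the RCC computing system 2141 because some I/O devices may be modeled in the RCC computing system. Furthermore, the RCC computing system 2141 holds a model of the entire design in software, including the portion of the user design that is modeled in the RCC hardware array 2190. As a result, the RCC computing system 2141 must also have access to all data that pass between the external interface and the RCC hardware array 2190. The control logic ensures that the RCC computing system 2141 has access to these data. The control logic is described in greater detail below.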
【０６５４】
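The RCC hardware array 2190 includes a number of array boards. In the particular embodiment shown in FIG. 69, the hardware array 2190 includes boards 2145-2149. Boards 2146-2149 contain the bulk of the configured hardware model. Board 2145 (i.e., board m1) contains a reconfigurable computing element (e.g., an FPGA chip) 2153, which the coverification system can use to configure at least a portion of the hardware model, and an external I/O controller 2152, which directs traffic and data between the external interface (target system and I/O devices) and the coverification system 2140. Through the external I/O controller, board 2145 allows the RCC computing system 2141 to have access to all data transported between the external world (i.e., target system and I/O devices) and the RCC hardware array 2190. This access is important because the RCC computing system 2141 in the coverification system contains a model of the entire user design in software, and the RCC computing system 2141 can also control the functionality of the RCC hardware array 2190.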
【０６５５】
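If stimulus from an external I/O device is provided to the hardware model, the software model must also have access to this stimulus so that the user of the coverification system can selectively control the next debug step, which may include examining the internal state values of the user's design as a result of the applied stimulus. As discussed above with respect to the board layout and interconnection scheme, the first and last boards are always included in the hardware array 2190. Thus, in a hardware array with eight boards (not counting board m1), board 1 (labeled board 2146) and board 8 (labeled board 2149) are included. In addition to these boards 2145-2149, a board m2 with chip m2 (not shown in FIG. 69 but shown in FIG. 74) may also be provided. Board m2 is similar to board m1 except that it has no external interface and can be used for expansion when additional boards are needed.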
【０６５６】
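The contents of these boards will now be discussed. Board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, a data chip (m1) 2153, memory 2154, and a multiplexer 2155. In one embodiment, the PCI controller is a PLX 9080. The PCI controller 2151 is coupled to the RCC computing system 2141 via bus 2171 and to a tri-state buffer 2179 via bus 2172.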
【０６５７】
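The main traffic controller between the external world (target system 2120 and I/O devices) and the RCC computing system 2141 in the coverification system is the external I/O controller 2152 (also called "CTRLXM" in FIGS. 69, 71, and 73). The external I/O controller 2152 is coupled to the RCC computing system 2141, the other boards 2146-2149 in the RCC hardware array, the target system 2120, and the actual external I/O devices. Of course, as described above, the main traffic controller between the RCC computing system 2141 and the RCC hardware array 2190 is always the combination of the individual internal I/O controllers (e.g., I/O controllers 2156 and 2158) in each array board 2146-2149 and the PCI controller 2151. In one embodiment, the individual internal I/O controllers, such as controllers 2156 and 2158, are the FPGA I/O controllers described above and illustrated in such exemplary figures as FIG. 22 (unit 700) and FIG. 56 (unit 1200).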
【０６５８】
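The external I/O controller 2152 is coupled to the tri-state buffer 2179 so that the external I/O controller can interface with the RCC computing system 2141. In one embodiment, the tri-state buffer 2179 in one instance allows data from the RCC computing system 2141 to pass to the local bus 2180 while preventing data from the local bus from passing to the RCC computing system 2141, and in another instance allows data to pass from the local bus 2180 to the RCC computing system 2141.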
【０６５９】
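The external I/O controller 2152 is also coupled to chip (m1) 2153 and memory/external buffer 2154 via data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, that can be used to configure at least a portion of the hardware model of the user design (or all of the hardware model if the user design is small enough). External buffer 2154 is a DRAM DIMM in one embodiment and can be used by chip 2153 for a variety of purposes. The external buffer 2154 provides more memory capacity than the individual SRAM memory devices coupled locally to each reconfigurable logic element (e.g., reconfigurable logic element 2157). This large memory capacity allows the RCC computing system to store large amounts of data, such as test bench data, embedded code for microcontrollers (if the user design is a microcontroller), and a large look-up table in one memory device. The external buffer 2154 can also be used to store data needed for hardware modeling, as described above. In effect, the external buffer 2154 functions in part like the other high-bank or low-bank SRAM memory devices described above and illustrated in, for example, FIG. 56 (SRAM 1205 and 1206), but with more memory. External buffer 2154 is also used by the coverification system to store data received from the target system 2120 and the external I/O devices so that these data can later be retrieved by the RCC computing system 2141. Chip m1 2153 and external buffer 2154 also contain the memory mapping system described in the section of this specification entitled "Memory Simulation."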
【０６６０】
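To access desired data in the external buffer 2154, both the chip 2153 and the RCC computing system 2141 (via the external I/O controller 2152) can deliver the address of the desired data. The chip 2153 supplies an address on address bus 2182, and the external I/O controller 2152 supplies an address on address bus 2177. These address buses 2182 and 2177 are inputs to multiplexer 2155, which provides the selected address on output line 2178 coupled to the external buffer 2154. The select signal for multiplexer 2155 is provided by the external I/O controller 2152 via line 2181.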
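The address selection just described behaves like a 2:1 multiplexer. The following Python sketch is illustrative only: the function name and the select polarity (0 for the chip's bus 2182, 1 for the controller's bus 2177) are assumptions, since the text does not state how the select signal on line 2181 is encoded.

```python
def select_buffer_address(chip_addr: int, ctrl_addr: int, select: int) -> int:
    """Model multiplexer 2155: choose which address reaches the external buffer.

    select == 0 -> address from chip m1 (bus 2182)              [assumed polarity]
    select == 1 -> address from the external I/O controller (bus 2177)
    """
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return ctrl_addr if select else chip_addr
```

Only one of the two address sources reaches the buffer at a time, which is why the controller, not the chip, owns the select line.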
【０６６１】
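The external I/O controller 2152 is also coupled to the other boards 2146-2149 via bus 2180. In one embodiment, bus 2180 is the local bus described above and illustrated in such exemplary figures as FIG. 22 (local bus 708) and FIG. 56 (local bus 1210). In this embodiment, only five boards (including board 2145 (board m1)) are used. The actual number of boards is determined by the complexity and size of the user design to be modeled in hardware. A hardware model of a user design of modest complexity requires fewer boards than a hardware model of a more complex user design.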
【０６６２】
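To enable scalability, boards 2146-2149 are substantially identical to each other except for some inter-board interconnect lines. These interconnect lines allow the portion of the hardware model of the user design in one chip (e.g., chip 2157 in board 2146) to communicate with another portion of the hardware model of the same user design that physically resides in another chip (e.g., chip 2161 in board 2148). For the interconnect structure of this coverification system, refer briefly to FIGS. 74, 8, and 36-44 and the accompanying text.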
【０６６３】
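Board 2148 is a representative board. Board 2148 is the third board in this four-board layout (not counting board 2145 (board m1)). It is therefore not an end board, which would require appropriate terminations for the interconnect lines. Board 2148 includes an internal I/O controller 2158, several reconfigurable logic elements (e.g., FPGA chips) 2159-2166, high-bank FD bus 2167, low-bank FD bus 2168, high-bank memory 2169, and low-bank memory 2170. As stated above, in one embodiment the internal I/O controller 2158 is the FPGA I/O controller described above and illustrated in such exemplary figures as FIG. 22 (unit 700) and FIG. 56 (unit 1200). Similarly, the high- and low-bank memory devices 2169 and 2170 are the SRAM memory devices described above and illustrated in, for example, FIG. 56 (SRAM 1205 and 1206). The high- and low-bank FD buses 2167 and 2168 are, in one embodiment, the FD buses or FPGA buses described above and illustrated in such exemplary figures as FIG. 22 (FPGA buses 718 and 719), FIG. 56 (FD buses 1212 and 1213), and FIG. 57 (FD bus 1282).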
【０６６４】
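To couple the coverification system 2140 to the target system 2120 and other I/O devices, an external interface 2139 in the form of an external I/O expander is provided. On the target system side, the external I/O expander 2139 is coupled to the PCI bridge 2127 via secondary PCI bus 2132 and control line 2131, which is used to deliver the software clock. On the I/O device side, the external I/O expander 2139 is coupled to various I/O devices via buses 2136-2138 for pinout data and control lines 2133-2135 for the software clock. The number of I/O devices that can be coupled to the I/O expander 2139 is determined by the user. In any event, the external I/O expander 2139 is provided with as many data buses and software clock control lines as are needed to couple that many I/O devices to the coverification system 2140 and run a successful debug session.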
【０６６５】
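On the coverification system 2140 side, the external I/O expander 2139 is coupled to the external I/O controller 2152 via data bus 2175, software clock control line 2174, and scan control line 2173. Data bus 2175 is used to pass pinout data between the external world (target system 2120 and external I/O devices) and the coverification system 2140. Software clock control line 2174 is used to deliver the software clock data from the RCC computing system 2141 to the external world.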
【０６６６】
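The software clock present on control lines 2174 and 2131 is generated by the main software kernel in the RCC computing system 2141. The RCC computing system 2141 delivers the software clock to the external I/O expander 2139 via PCI bus 2171, PCI controller 2151, bus 2172, tri-state buffer 2179, local bus 2180, external I/O controller 2152, and control line 2174. From the external I/O expander 2139, the software clock is provided as the clock input to the target system 2120 (via PCI bridge 2127) and to the other external I/O devices via control lines 2133-2135. Because the software clock serves as the main clock source, the target system 2120 and the I/O devices run at a slower speed. However, the data provided to the target system 2120 and the external I/O devices are synchronized to the software clock speed, like the software model in the RCC computing system 2141 and the hardware model in the RCC hardware array 2190. Similarly, data from the target system 2120 and the external I/O devices are delivered to the coverification system 2140 in synchronization with the software clock.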
【０６６７】
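Thus, I/O data passed between the external interface and the coverification system are synchronized to the software clock. In effect, each time data pass between the external I/O devices and target system on one side and the coverification system on the other, the software clock synchronizes the operation of the external I/O devices and the target system with that of the coverification system (in the RCC computing system and the RCC hardware array). The software clock is used for both data-in and data-out operations. For data-in operations, when a pointer (discussed later) latches the software clock from the RCC computing system 2141 to the external interface, other pointers latch these I/O data from the external interface into selected internal nodes in the hardware model of the RCC hardware array 2190. The pointers latch these I/O data one at a time during the cycle in which the software clock is delivered to the external interface. When all the data have been latched, the RCC computing system can generate another software clock and, if desired, latch more data in another software clock cycle. For data-out operations, the RCC computing system delivers the software clock to the external interface and then, with the aid of pointers, controls the gating of data from the internal nodes of the hardware model in the RCC hardware array 2190 to the external interface. Again, the pointers gate data from the internal nodes to the external interface one at a time. If more data need to be delivered to the external interface, the RCC computing system can generate another software clock and activate selected pointers to gate data out to the external interface. The generation of the software clock is strictly controlled, which allows the coverification system to synchronize data delivery and data evaluation between the coverification system and any external I/O devices coupled to the external interface.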
【０６６８】
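The scan control line 2173 allows the coverification system 2140 to scan the data buses 2132, 2136, 2137, and 2138 for any data that may be present. The logic in the external I/O controller 2152 that supports the scan signal is pointer logic, in which the various inputs are provided as the output for a specific time period before a MOVE signal shifts to the next input. This logic is analogous to the scheme shown in FIG. 11. In effect, the scan signal functions like the select signal of a multiplexer, except that the scan signal selects the various inputs to the multiplexer in round-robin order. Thus, in one time period, the scan signal on scan control line 2173 samples data bus 2132 for data that may be coming from the target system 2120. In the next time period, the scan signal samples data bus 2136 for data that may be coming from an external I/O device coupled to it. In the next time period, data bus 2137 is sampled, and so on, so that the coverification system 2140 can receive and process all pinout data originating from the target system 2120 or the external I/O devices during this debug session. Any data received by the coverification system 2140 from sampling the data buses 2132, 2136, 2137, and 2138 are transported to the external buffer 2154 via the external I/O controller 2152.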
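The round-robin sampling performed by the scan signal can be sketched in a few lines of Python. This is a behavioral illustration under stated assumptions, not the pointer logic of FIG. 11: the function name, the `read_bus` callback, and the convention that an idle bus returns `None` are all hypothetical.

```python
from itertools import cycle

def scan_buses(bus_order, read_bus, periods):
    """Sample one data bus per time period, in fixed round-robin order.

    bus_order: bus identifiers in scan order, e.g. [2132, 2136, 2137, 2138]
    read_bus:  callable returning the data present on a bus, or None if idle
    periods:   number of time periods (MOVE steps) to simulate
    """
    samples = []
    pointer = cycle(bus_order)
    for _ in range(periods):
        bus = next(pointer)          # MOVE: shift to the next input
        data = read_bus(bus)
        if data is not None:         # only buses with data produce a sample
            samples.append((bus, data))
    return samples
```

In this sketch, the returned list plays the role of the data forwarded to the external buffer 2154.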
【０６６９】
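Note that the configuration illustrated in FIG. 69 assumes that the target system 2120 contains the primary CPU and that the user design is some peripheral device, such as a video controller, network adapter, graphics adapter, mouse, or some other support device, card, or logic. Thus, the target system 2120 contains the target applications (including the operating system) coupled to the primary PCI bus 2129, and the coverification system 2140 contains the user design and is coupled to the secondary PCI bus 2132. The configuration may be quite different depending on the subject of the user design. For example, if the user design were a CPU, the target applications would run in the RCC computing system 2141 of the coverification system 2140, and the target system 2120 would no longer contain the central computing system 2121. Indeed, bus 2132 would now be the primary PCI bus and bus 2129 would be the secondary PCI bus. In effect, instead of the user design being one of the peripheral devices supporting the central computing system 2121, the user design is now the main computing center and the other peripheral devices support the user design.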
【０６７０】
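The control logic for transporting data between the external interface (external I/O expander 2139) and the coverification system 2140 is found in each board 2145-2149. The primary portion of the control logic is found in the external I/O controller 2152, but other portions are found in the various internal I/O controllers (e.g., 2156 and 2158) and the reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). For instructional purposes, it suffices to show a representative portion of this control logic rather than the same repeated logic structure for every chip on every board. The portion of the coverification system 2140 within dotted line 2150 of FIG. 69 contains one subset of the control logic. This control logic will now be discussed in greater detail with reference to FIGS. 70-73.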
【０６７１】
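The components in this particular subset of the control logic include the external I/O controller 2152, the tri-state buffer 2179, the internal I/O controller 2156 (CTRL 1), the reconfigurable logic element 2157 (chip0_1, denoting chip 0 of board 1), and the various buses and control lines coupled to these components. Specifically, FIG. 70 illustrates the portion of the control logic used for data-in cycles, in which data from the external interface (external I/O expander 2139) and the RCC computing system 2141 are delivered to the RCC hardware array 2190. FIG. 72 is the timing diagram for the data-in cycles. FIG. 71 illustrates the portion of the control logic used for data-out cycles, in which data from the RCC hardware array 2190 are delivered to the RCC computing system 2141 and the external interface (external I/O expander 2139). FIG. 73 is the timing diagram for the data-out cycles.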
【０６７２】
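(Data-In)
The data-in control logic in accordance with one embodiment of the present invention is responsible for handling the data delivered from either the RCC computing system or the external interface to the RCC hardware array. One particular subset 2150 (see FIG. 69) of the data-in control logic is shown in FIG. 70 and includes the external I/O controller 2200, the tri-state buffer 2202, the internal I/O controller 2203, the reconfigurable logic element 2204, and the various buses and control lines that allow data transport among them. The external buffer 2201 is also shown for this data-in embodiment. This subset illustrates the logic necessary for data-in operations, in which data from the external interface and the RCC computing system are delivered to the RCC hardware array. The data-in control logic of FIG. 70 and the data-in timing diagram of FIG. 72 will be discussed together.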
【０６７３】
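Two types of data cycles are used in this data-in embodiment of the present invention: the global cycle and the software-to-hardware (S2H) cycle. The global cycle is used for any data directed to all the chips in the RCC hardware array, such as clocks and resets, and for certain other S2H data directed to many different nodes in the RCC hardware array. For these latter "global" S2H data, sending them via the global cycle is more practical than sending them as sequential S2H data.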
【０６７４】
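The software-to-hardware cycle is used to send data from the test bench processes in the RCC computing system to the RCC hardware array sequentially, from one chip to the next, across all the boards. Because the hardware model of the user design is distributed across several boards, the test bench data must be provided to every chip for data evaluation. The data are therefore delivered sequentially to each internal node in each chip, one internal node at a time. Because the hardware model is distributed among several chips, this sequential delivery allows data designated for a particular internal node to be processed by all the chips in the RCC hardware array.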
【０６７５】
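For this data evaluation, the coverification system provides two address spaces: S2H and CLK. As described above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model holds substantially all the register components and the combinational components of the user's circuit design. Furthermore, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. If any clock edge is detected by the kernel, registers and memories are updated and values are propagated through the combinational components. Thus, if the hardware acceleration mode is selected, any change of values in these spaces triggers the hardware model to change its logic state.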
【０６７６】
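During data transfer, the DATA_XSFR signal is at logic "1". During this time, the local buses 2222-2230 are used by the coverification system to transport data with the following data cycles: (1) global data from the RCC computing system to the RCC hardware array and the CLK space; (2) global data from the external interface to the RCC hardware array and the external buffer; and (3) S2H data from the RCC computing system to the RCC hardware array, one chip at a time in each board. Thus, the first two data cycles are part of the global cycle and the last data cycle is part of the S2H cycle.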
【０６７７】
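In the first part of the data-in global cycle, in which global data from the RCC computing system are sent to the RCC hardware array, the external I/O controller 2200 drives the CPU_IN signal on line 2255 to logic "1". Line 2255 is coupled to the enable input of tri-state buffer 2202. With logic "1" on line 2255, the tri-state buffer 2202 allows data on local bus 2222 to pass to local bus lines 2223-2230 on the other side of the tri-state buffer 2202. In this particular example, local bus lines 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond to LD3, LD4 (from the external I/O controller 2200), LD6 (from the external I/O controller 2200), LD1, LD6, LD4, LD5, and LD7, respectively.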
【０６７８】
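The global data propagate from these local bus lines to bus lines 2231-2235 in the internal I/O controller 2203 and then to FD bus lines 2236-2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond to FD bus lines FD1, FD6, FD4, FD5, and FD7, respectively.
【０６７９】
These FD bus lines 2236-2240 are coupled to the inputs of latches 2208-2213 in the reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip0_1 (i.e., chip 0 in board 1). FD bus line 2236 is coupled to latch 2208, FD bus line 2237 is coupled to latches 2209 and 2211, FD bus line 2238 is coupled to latch 2210, FD bus line 2239 is coupled to latch 2212, and FD bus line 2240 is coupled to latch 2213.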
【０６８０】
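The enable input of each of these latches 2208-2213 is coupled to one of several global pointers and software-to-hardware (S2H) pointers. The enable inputs of latches 2208-2211 are coupled to global pointers, and the enable inputs of latches 2212-2213 are coupled to S2H pointers. Exemplary global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Exemplary S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs of these latches are coupled to these pointers, no latch can latch data into its intended destination node in the hardware model of the user design without the proper pointer signal.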
【０６８１】
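These global and S2H pointer signals are generated by a data-in pointer state machine 2214 on output 2254. The data-in pointer state machine 2214 is controlled by the DATA_XSFR and F_WR signals on line 2253, which are generated by the internal I/O controller 2203. DATA_XSFR is at logic "1" whenever a data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. The F_WR signal, in contrast to the F_RD signal, is at logic "1" whenever a write to the RCC hardware array is desired; a read via the F_RD signal involves delivering data from the RCC hardware array to either the RCC computing system or the external interface. When both the DATA_XSFR and F_WR signals are at logic "1", the data-in pointer state machine can generate the proper global or S2H pointer signals in the proper programmed sequence.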
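A behavioral sketch of such a pointer state machine is given below in Python. It is an illustration only: the class name, the default pointer order (taken from the six pointers named in this example), and the choice to rewind when DATA_XSFR falls are assumptions, not details disclosed for state machine 2214.

```python
DATA_IN_ORDER = ["GLB_PTR0", "GLB_PTR1", "GLB_PTR2", "GLB_PTR3",
                 "S2H_PTR0", "S2H_PTR1"]

class DataInPointerFSM:
    """Asserts one pointer per F_WR pulse while DATA_XSFR is high."""

    def __init__(self, order=DATA_IN_ORDER):
        self.order = list(order)   # the programmed pointer sequence
        self.index = 0

    def step(self, data_xsfr, f_wr):
        """Return the name of the pointer asserted for this pulse, or None."""
        if not data_xsfr:
            self.index = 0         # assumption: sequence rewinds between transfers
            return None
        if not f_wr or self.index >= len(self.order):
            return None            # no write strobe, or sequence exhausted
        pointer = self.order[self.index]
        self.index += 1
        return pointer
```

Each returned pointer name corresponds to the single latch-enable line that is active for that write pulse; every other latch holds its value.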
【０６８２】
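The outputs 2247-2252 of these latches are coupled to various internal nodes in the hardware model of the user design. Some of these internal nodes correspond to input pinouts of the user design. The user design has other internal nodes that are normally not accessible via pinouts, but these non-pinout internal nodes serve other debugging purposes: they give flexibility to a designer who wants to apply stimulus to various internal nodes of the user design, whether or not they are input pinouts. For stimulus applied by the external interface to the hardware model of the user design, the data-in logic and the internal nodes corresponding to input pinouts come into play. For example, if the user design is a CRTC 6845 video controller, some input pinouts may be as follows: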
【０６８３】
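LPSTB - light pen strobe pin
~RESET - low-level signal to reset the 6845 controller
RS - register select
E - enable
CLK - clock
~CS - chip select
Other input pinouts are also available on this video controller. Based on the number of input pinouts that interface to the external world, the number of nodes, and hence the number of latches and pointers, can be readily determined. A hardware model configured in the RCC hardware array may have, for example, thirty separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 x 6). In other designs, more global pointers, such as GLB_PTR4 through GLB_PTR30, may be used as needed. Similarly, more S2H pointers, such as S2H_PTR2 through S2H_PTR30, may be used as needed. These pointers and their corresponding latches depend on the requirements of the hardware model of each user design.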
【０６８４】
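Returning to FIGS. 70 and 72, the data on the FD bus lines make their way to these internal nodes only if the latches are enabled with the proper global pointer or S2H pointer signal. Otherwise, these nodes are not driven by any data on the FD bus. When F_WR is at logic "1" during the first half of the CPU_IN=1 period, GLB_PTR0 is at logic "1" and drives the data on FD1 to its corresponding internal node via line 2247. If other latches depend on GLB_PTR0 for their enable, those latches also latch data into their corresponding internal nodes. During the second half of the CPU_IN=1 period, F_WR goes to logic "1" again, which triggers GLB_PTR1 to rise to logic "1". This drives the data on FD6 to the internal node coupled to line 2248. It also causes the software clock signal on line 2223 to be latched onto line 2216 by latch 2205, with GLB_PTR1 on enable line 2215. This software clock is delivered to the external clock inputs of the target system and the other external I/O devices. Because GLB_PTR0 and GLB_PTR1 are used only for the first part of the data-in global cycle, CPU_IN then returns to logic "0", which completes the delivery of global data from the RCC computing system to the RCC hardware array.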
【０６８５】
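The second part of the data-in global cycle, in which global data from the external interface are delivered to the RCC hardware array and the external buffer, will now be discussed. Again, the various input pinout signals from the target system or the external I/O devices that are directed to the user design must be provided to both the hardware model and the software model. These data can be delivered to the hardware model by using the appropriate pointers and latches to drive them to the internal nodes. They are delivered to the software model by first storing them in the external buffer 2201 for later retrieval by the RCC computing system, which then updates the internal states of the software model.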
【０６８６】
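At this point, CPU_IN is at logic "0" and EXT_IN is at logic "1". The tri-state buffer 2206 in the external I/O controller 2200 is therefore enabled and places data on PCI bus lines such as bus lines 2217 and 2218. These PCI bus lines are also coupled to FD bus lines 2219 for storage in the external buffer 2201. During the first half of the period in which the EXT_IN signal is at logic "1", GLB_PTR2 is at logic "1". This latches the data on FD4 (via bus lines 2217 and 2224 and local bus line 2228 (LD4)) into the internal node of the hardware model coupled to line 2249.
【０６８７】
During the second half of the period in which the EXT_IN signal is at logic "1", GLB_PTR3 is at logic "1". This latches the data on FD6 (via bus lines 2218 and 2225 and local bus line 2227 (LD6)) into the internal node of the hardware model coupled to line 2250.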
【０６８８】
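As stated above, these data from the target system or some other external I/O device are also delivered to the software model by first storing them in the external buffer 2201 for later retrieval by the RCC computing system, which then updates the internal states of the software model. The data on bus lines 2217 and 2218 are provided to the external buffer 2201 via FD bus FD[63:0] 2219. The particular memory address at which each datum is stored in the external buffer 2201 is provided by memory address counter 2207 to the external buffer 2201 via bus 2220. To enable such storage, the WR_EXT_BUF signal is provided to the external buffer 2201 via line 2221. Before the external buffer 2201 becomes full, the RCC computing system reads the contents of the external buffer 2201 so that the appropriate updates can be made to the software model. Any data delivered to the various internal nodes of the hardware model in the RCC hardware array will likely cause some internal state change in the hardware model. Because the RCC computing system holds a model of the entire user design in software, these internal state changes in the hardware model should also be reflected in the software model. This concludes the data-in global cycle.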
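The store-then-drain behavior of the external buffer can be sketched as follows in Python. This is a simplified model under stated assumptions: the class and method names are hypothetical, the address counter is assumed to reset when the buffer is read out, and a write to a full buffer is treated as an error because the text requires the RCC computing system to read the buffer before it fills.

```python
class ExternalBuffer:
    """Models external buffer 2201 with its memory address counter 2207."""

    def __init__(self, depth):
        self.mem = [None] * depth
        self.addr = 0                      # memory address counter

    def write(self, word, wr_ext_buf):
        """Store one word at the current address when WR_EXT_BUF is asserted."""
        if not wr_ext_buf:
            return False
        if self.addr >= len(self.mem):
            raise OverflowError("buffer full: it must be drained first")
        self.mem[self.addr] = word
        self.addr += 1                     # counter advances after each store
        return True

    def drain(self):
        """Read out everything stored so far, as the RCC computing system would."""
        words = self.mem[:self.addr]
        self.addr = 0                      # assumption: counter resets on drain
        return words
```

The drained words are what the software model would consume to bring its internal state in line with the hardware model.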
【０６８９】
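The S2H cycle will now be discussed. The S2H cycle is used to deliver test bench data from the RCC computing system to the RCC hardware array and then to move those data sequentially from one chip to the next for each board. The CPU_IN signal goes to logic "1" while the EXT_IN signal goes to logic "0", indicating a data transfer between the RCC computing system and the RCC hardware array; the external interface is not involved. The CPU_IN signal also enables the tri-state buffer 2202 to let data pass from local bus 2222 to the internal I/O controller 2203.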
【０６９０】
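At the start of the CPU_IN=1 period, S2H_PTR0 goes to logic "1". This latches the data on FD5 (via local bus 2222, local bus line 2229, bus line 2234, and FD bus line 2239) into the internal node of the hardware model coupled to line 2251. During the second part of the CPU_IN=1 period, S2H_PTR1 goes to logic "1". This latches the data on FD7 (via local bus 2222, local bus line 2230, bus line 2235, and FD bus line 2240) into the internal node of the hardware model coupled to line 2252. During the sequential data evaluation, data from the RCC computing system are delivered first to chip m1, then to chip0_1 (i.e., chip 0 on board 1), chip1_1 (i.e., chip 1 on board 1), and so on until the last chip on the last board, chip7_8 (i.e., chip 7 on board 8). If chip m2 is available, the data are also delivered to that chip.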
【０６９１】
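At the end of this data transfer, DATA_XSFR returns to logic "0". Note that I/O data from the external interface are treated as global data and handled during the global cycle. This concludes the discussion of the data-in control logic and the data-in cycle.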
【０６９２】
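(Data-Out)
The data-out control logic embodiment of the present invention will now be discussed. The data-out control logic is responsible for handling the data delivered from the RCC hardware array to the RCC computing system and the external interface. In the course of processing data in response to stimulus (external or otherwise), the hardware model generates certain output data that the target application or some I/O devices may need. These output data may be substantive data, addresses, control information, or other relevant information that another application or device may need for its own processing. These output data, destined for the RCC computing system (which may hold models of other external I/O devices in software), the target system, or external I/O devices, are provided on various internal nodes. As discussed above with respect to the data-in logic, some of these internal nodes correspond to output pinouts of the user design. The user design has other internal nodes that are normally not accessible via pinouts, but these non-pinout internal nodes serve other debugging purposes: they give flexibility to a designer who wants to read out and analyze the values at various internal nodes of the user design, whether or not they are output pinouts. For data delivered from the hardware model of the user design to the external interface, the data-out logic and the internal nodes corresponding to output pinouts come into play.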
【０６９３】
For example, if the user design is a CRTC 6845 video controller, some output pinouts may be as follows:
【０６９４】
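MA0-MA13 - memory address
D0-D7 - data bus
DE - display enable
CURSOR - cursor position
VS - vertical synchronization
HS - horizontal synchronization
Other output pinouts are also available on this video controller. Based on the number of output pinouts that interface to the external world, the number of nodes, and hence the number of gating logic elements and pointers, can be readily determined. For instance, output pinouts MA0-MA13 on the video controller provide the memory addresses for the video RAM. The VS output pinout provides the signal for vertical synchronization and thus causes a vertical retrace on the monitor. Pinouts D0-D7 are the eight terminals that form the bidirectional data bus through which the CPU in the target system accesses the internal 6845 registers. These output pinouts correspond to certain internal nodes in the hardware model. Of course, the number and nature of these internal nodes vary with the user design.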
【０６９５】
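The data from these output pinout internal nodes must be provided to the RCC computing system because the RCC computing system contains a model of the entire user design in software, and any event that occurs in the hardware model must be communicated to the software model so that the corresponding changes can be made. In this way, the software model holds information consistent with that in the hardware model. In addition, the RCC computing system may hold device models of I/O devices that the user or designer decided to model in software rather than connect as actual devices to one of the ports on the external I/O expander. For example, the user may decide that it is easier and more effective to model a monitor or speaker in software than to plug an actual monitor or speaker into one of the external I/O expander ports. Furthermore, the data from these internal nodes in the hardware model must be provided to the target system and any other external I/O devices. So that the data at these output pinout internal nodes can be delivered to the RCC computing system as well as to the target system and other external I/O devices, the data-out control logic of one embodiment of the present invention is provided in the coverification system.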
【０６９６】
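The data-out control logic uses data-out cycles, which involve the transfer of data from the RCC hardware array to the RCC computing system 2141 and the external interface (external I/O expander 2139). In FIG. 69, the control logic for transporting data between the external interface (external I/O expander 2139) and the coverification system 2140 is found in each board 2145-2149. The primary portion of the control logic is found in the external I/O controller 2152, but other portions are found in the various internal I/O controllers (e.g., 2156 and 2158) and the reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). Again, for instructional purposes, it suffices to show a representative portion of this control logic rather than the same repeated logic structure for every chip on every board. The portion of the coverification system 2140 within dotted line 2150 of FIG. 69 contains one subset of the control logic. This control logic will now be discussed in greater detail with reference to FIGS. 71 and 73. FIG. 71 illustrates the portion of the control logic used for data-out cycles, and FIG. 73 is the timing diagram for the data-out cycles.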
【０６９７】
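One particular subset of the data-out control logic is shown in FIG. 71 and includes the external I/O controller 2300, the tri-state buffer 2301, the internal I/O controller 2302, the reconfigurable logic element 2303, and the various buses and control lines that allow data transport among them. This subset illustrates the logic necessary for data-out operations, in which data from the RCC hardware array are delivered to the RCC computing system and the external interface. The data-out control logic of FIG. 71 and the data-out timing diagram of FIG. 73 will be discussed together.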
【０６９８】
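In contrast to the two cycle types of the data-in cycles, the data-out cycle has only one type of cycle. The data-out control logic requires that data from the RCC hardware model be delivered sequentially (1) to the RCC computing system, and then (2) to both the RCC computing system and the external interface (to the target system and the external I/O devices). Specifically, the data-out cycle requires that data from the internal nodes of the hardware model in the RCC hardware array be delivered first to the RCC computing system and then to both the RCC computing system and the external interface, one chip at a time in each board and one board at a time.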
【０６９９】
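As in the data-in logic, pointers are used to select (or gate) data from the internal nodes to the RCC computing system and the external interface. In the embodiment illustrated in FIGS. 71 and 73, a data-out pointer state machine 2319 generates five pointers H2S_PTR[4:0] on bus 2359 for both hardware-to-software data and hardware-to-external-interface data. The data-out pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358, which are generated by the internal I/O controller 2302. DATA_XSFR is at logic "1" whenever a data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. The F_RD signal, in contrast to the F_WR signal, is at logic "1" whenever a read from the RCC hardware array is desired. When both the DATA_XSFR and F_RD signals are at logic "1", the data-out pointer state machine 2319 can generate the proper H2S pointer signals in the proper programmed sequence. Other embodiments may use more (or fewer) pointers as the user design requires.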
【０７００】
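These H2S pointer signals are provided to the gating logic. One set of inputs 2353-2357 to the gating logic is directed to AND gates 2314-2318. The other set of inputs 2348-2352 is coupled to the internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the proper H2S_PTR pointer signal, an internal node cannot be driven to either the RCC computing system or the external interface.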
【０７０１】
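The respective outputs 2343-2347 of these AND gates 2314-2318 are coupled to OR gates 2310-2313. Thus, AND gate output 2343 is coupled to an input of OR gate 2310, AND gate output 2344 is coupled to an input of OR gate 2311, AND gate output 2345 is coupled to an input of OR gate 2311, AND gate output 2346 is coupled to an input of OR gate 2312, and AND gate output 2347 is coupled to an input of OR gate 2313. Note that output 2344 of AND gate 2315 does not have an OR gate to itself; rather, it is coupled to OR gate 2311, which is also coupled to output 2345 of AND gate 2316. The other inputs 2360-2366 to OR gates 2310-2313 can be coupled to the outputs of other AND gates (not shown), which are themselves coupled to other internal nodes and H2S_PTR pointers. The use of these OR gates and their particular inputs depends on the user design and the configured hardware model. Thus, in other designs, more pointers may be used and output 2344 from AND gate 2315 may be coupled to an OR gate other than OR gate 2311.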
【０７０２】
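The outputs 2339-2342 of OR gates 2310-2313 are coupled to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of the user design, only four output pinout signals can be delivered to the RCC computing system and the external interface. Thus, FD0 is coupled to the output of OR gate 2310, FD3 to the output of OR gate 2311, FD1 to the output of OR gate 2312, and FD4 to the output of OR gate 2313. These FD bus lines are coupled to local bus lines 2330-2333 via internal lines 2334-2338 in the internal I/O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.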
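The AND-OR gating described in the preceding paragraphs can be written out as a small Python sketch. The gate-to-gate and gate-to-bus mappings below are taken from the text; the function name and the list-based representation of node values and pointer signals are assumptions made for illustration, and the additional OR inputs 2360-2366 (not shown in the figure) are left out.

```python
# H2S_PTR index -> OR gate that receives the corresponding AND gate output
AND_TO_OR = {0: 2310, 1: 2311, 2: 2311, 3: 2312, 4: 2313}
# OR gate -> FD bus line driven by its output
OR_TO_FD = {2310: "FD0", 2311: "FD3", 2312: "FD1", 2313: "FD4"}

def drive_fd_bus(node_values, h2s_ptr):
    """Gate internal node bits onto the FD bus lines.

    node_values: five bits from the nodes on lines 2348-2352
    h2s_ptr:     five bits H2S_PTR0..H2S_PTR4 (normally one-hot)
    """
    fd = {name: 0 for name in OR_TO_FD.values()}
    for i, node in enumerate(node_values):
        gated = node & h2s_ptr[i]                 # AND gate 2314 + i
        fd[OR_TO_FD[AND_TO_OR[i]]] |= gated       # shared OR gate
    return fd
```

Because H2S_PTR1 and H2S_PTR2 are never asserted together, their two nodes can share OR gate 2311 and the single line FD3 without contention.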
【０７０３】
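To allow the data on these local bus lines 2330-2333 to be delivered to the RCC computing system, the local bus lines are coupled to tri-state buffer 2301. In its normal state, the tri-state buffer 2301 allows data to pass from local bus lines 2330-2333 to local bus 2320. In contrast, during data-in, data are allowed to pass from the RCC computing system to the RCC hardware array only when the CPU_IN signal is provided to the tri-state buffer 2301.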
【０７０４】
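To allow the data on these local bus lines 2330-2333 to be delivered to the external interface, lines 2321-2324 are provided. Line 2321 is coupled to line 2330 and to a latch (not shown) in the external I/O controller 2300, line 2323 is coupled to line 2332 and to latch 2305 in the external I/O controller 2300, and line 2324 is coupled to line 2333 and to latch 2306 in the external I/O controller 2300.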
【０７０５】
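The output of each of these latches 2305 and 2306 is coupled to a buffer and then to the external interface, which in turn is coupled to the appropriate output pinout of the target system or an external I/O device. Thus, the output of latch 2305 is coupled to buffer 2307 and line 2327, and the output of latch 2306 is coupled to buffer 2308 and line 2328. Another output of another latch (not shown) can be coupled to line 2329. In this example, lines 2327-2329 correspond to wire1, wire4, and wire3, respectively, of the target system or some external I/O device. Finally, for data transfer from the hardware model to the external interface, the hardware model of the user design is configured so that the internal node coupled to line 2350 corresponds to wire3 on line 2329, the internal node coupled to line 2351 corresponds to wire1 on line 2327, and the internal node coupled to line 2352 corresponds to wire4 on line 2328. Likewise, wire3 corresponds to LD3 on line 2331, wire1 corresponds to LD1 on line 2332, and wire4 corresponds to LD4 on line 2333.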
【０７０６】
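A look-up table 2309 is configured to enable the inputs of these latches 2305 and 2306. The look-up table 2309 is controlled by the F_RD signal on line 2367. The F_RD signal triggers the operation of the look-up table address counter 2304. With each counter increment, a pointer enables a particular row in the look-up table 2309. If an entry (or bit) in that particular row is at logic "1", the LUT output line coupled to that entry in the look-up table 2309 enables its corresponding latch, and the data are delivered to the external interface and ultimately to the desired destination in the target system or some external I/O device. For example, LUT output line 2325 is coupled to the enable input of latch 2305, and LUT output line 2326 is coupled to the enable input of latch 2306.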
【０７０７】
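In this example, rows 0-3 of the look-up table 2309 are programmed to enable the latches corresponding to the output pinout wires for the internal nodes in chip m1. Similarly, rows 4-6 are programmed to enable the latches corresponding to the output pinout wires for the internal nodes in chip0_1 (i.e., chip 0 in board 1). In row 4, bit 3 is at logic "1". In row 5, bit 1 is at logic "1". In row 6, bit 4 is at logic "1". All other entries or bit positions are at logic "0". For any given bit position (column) in the look-up table, only one entry is at logic "1", because one output pinout wire cannot drive multiple I/O devices. In other words, an output pinout internal node in the hardware model can provide data to only one wire coupled to the external interface.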
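The row contents and the one-entry-per-column rule can be checked with a short Python sketch. Only rows 4-6 are populated because the text gives no bit values for rows 0-3; the table width of five bits, the dictionary representation, and both function names are assumptions made for illustration.

```python
LUT_WIDTH = 5  # assumed number of LUT output lines (bits 0..4)

# Rows 4-6 as stated for chip0_1; rows 0-3 (chip m1) are not given in the text.
LUT_ROWS = {4: 1 << 3, 5: 1 << 1, 6: 1 << 4}

def enabled_bits(lut, row, width=LUT_WIDTH):
    """Return the bit positions whose latch is enabled when `row` is selected."""
    word = lut.get(row, 0)
    return [bit for bit in range(width) if (word >> bit) & 1]

def columns_one_hot(lut, width=LUT_WIDTH):
    """Check that each bit position is set in at most one row."""
    for bit in range(width):
        if sum((word >> bit) & 1 for word in lut.values()) > 1:
            return False
    return True
```

The bit positions match the wire numbering used in this example: row 4 enables the latch for wire3, row 5 the latch for wire1, and row 6 the latch for wire4.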
【０７０８】
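As mentioned above, the data-out control logic requires that data in each reconfigurable logic element in each chip of the RCC hardware model be delivered sequentially (1) to the RCC computing system, and then (2) to both the RCC computing system and the external interface (to the target system and the external I/O devices). The RCC computing system needs these data because it holds models of some I/O devices in software, and for data directed to one of these modeled I/O devices, the RCC computing system needs to monitor them so that its internal states remain consistent with the state of the hardware model in the RCC hardware array. In the example illustrated in FIGS. 71 and 73, only seven internal nodes are driven for output to the RCC computing system and the external interface. Two of these internal nodes are in chip m1, and the other five are in chip0_1 (i.e., chip 0 in board 1). Of course, other internal nodes in these and other chips may be needed for a particular user design, but FIGS. 71 and 73 show only these seven nodes.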
【０７０９】
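During data transfer, the DATA_XSFR signal is at logic "1". During this time, the local bus lines 2330-2333 are used by the coverification system to transport data sequentially from each chip in each board of the RCC hardware array to both the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data-out pointer state machine, which generates the proper pointer signals H2S_PTR[4:0] for the appropriate gates of the output pinout internal nodes. The F_RD signal also controls the look-up table address counter 2304 for delivering internal node data to the external interface.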
【０７１０】
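The internal nodes in chip m1 are handled first. When F_RD rises to logic "1" at the start of the data transfer cycle, H2S_PTR0 in chip m1 rises to logic "1". This allows the data at those internal nodes in chip m1 that depend on H2S_PTR0 to be delivered to the RCC computing system via tri-state buffer 2301 and local bus 2320. The look-up table address counter 2304 counts and points to row 0 of the look-up table 2309 to latch the appropriate data from chip m1 to the external interface. When the F_RD signal rises to logic "1" again, the data at the internal nodes driven by H2S_PTR1 are delivered to the RCC computing system and the external interface: H2S_PTR1 rises to logic "1", and in response to this second F_RD signal, the look-up table address counter 2304 counts and points to row 1 of the look-up table 2309 to latch the appropriate data from chip m1 to the external interface.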
【０７１１】
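The five internal nodes in the reconfigurable logic element 2303 (i.e., chip0_1, or chip 0 in board 1) are handled next. In this example, the data from the two internal nodes associated with H2S_PTR0 and H2S_PTR1 are delivered to the RCC computing system only. The data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 are delivered to both the RCC computing system and the external interface.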
【０７１２】
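When F_RD rises to logic "1", H2S_PTR0 in chip 2303 goes to logic "1". This allows the data at those internal nodes in chip 2303 that depend on H2S_PTR0 to be delivered to the RCC computing system via tri-state buffer 2301 and local bus 2320. In this example, the internal node coupled to line 2348 depends on H2S_PTR0 on line 2353. When the F_RD signal goes to logic "1" again, the data at the internal node driven by H2S_PTR1 are delivered to the RCC computing system. Here, the internal node coupled to line 2349 is affected. These data are delivered on LD3 via lines 2331 and 2322.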
【０７１３】
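When the F_RD signal goes to logic "1" again, H2S_PTR2 goes to logic "1" and the data at the internal node coupled to line 2350 are provided on LD3. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, the data are provided on LD3 via lines 2331 and 2322 by the enabling H2S_PTR2 signal. In response to the F_RD signal, the look-up table address counter 2304 counts and points to row 4 of the look-up table 2309 to latch the data from this internal node, coupled to line 2350, to line 2329 (wire3) at the external interface.
【０７１４】
When the F_RD signal goes to logic "1" again, H2S_PTR3 goes to logic "1" and the data at the internal node coupled to line 2351 are provided on LD1. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, the data are provided on LD1 via lines 2332 and 2323 by the enabling H2S_PTR3 signal. In response to the F_RD signal, the look-up table address counter 2304 counts and points to row 5 of the look-up table 2309 to latch the data from this internal node, coupled to line 2351, to line 2327 (wire1) at the external interface.
【０７１５】
When the F_RD signal goes to logic "1" again, H2S_PTR4 goes to logic "1" and the data at the internal node coupled to line 2352 are provided on LD4. These data are provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to pass to local bus 2320 and then to the RCC computing system. For the external interface, the data are provided on LD4 via lines 2333 and 2324 by the enabling H2S_PTR4 signal. In response to the F_RD signal, the look-up table address counter 2304 counts and points to row 6 of the look-up table 2309 to latch the data from this internal node, coupled to line 2352, to line 2328 (wire4) at the external interface.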
【０７１６】
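This process of delivering data at the internal nodes of a chip first to the RCC computing system and then to both the RCC computing system and the external interface continues sequentially for the other chips. First, the internal nodes of chip m1 were driven. Second, the internal nodes of chip0_1 (chip 2303) were driven. Next, the internal nodes of chip1_1, if any, are driven. This continues until the last node in the last chip of the last board has been driven; thus, the internal nodes of chip7_8, if any, are driven. Finally, the internal nodes of chip m2, if any, are driven.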
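The visiting order described above (chip m1, then every chip on every board, then chip m2) can be generated with a short Python sketch. The function name and parameters are hypothetical; the defaults of eight boards with chips 0 through 7 are inferred from the chip7_8 label used in this example and would differ for other array sizes.

```python
def chip_order(boards=8, chips_per_board=8, has_m2=True):
    """List chips in the order their internal nodes are driven.

    Labels follow the text's "chip<c>_<b>" convention: chip c on board b.
    """
    order = ["m1"]                          # chip m1 is always handled first
    for board in range(1, boards + 1):      # one board at a time
        for chip in range(chips_per_board): # one chip at a time
            order.append(f"{chip}_{board}")
    if has_m2:
        order.append("m2")                  # chip m2, if present, comes last
    return order
```

With the defaults, the list begins with m1, 0_1, 1_1 and ends with 7_8, m2.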
【０７１７】
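Although FIG. 71 shows the data-out control logic for driving internal nodes in chip 2303 only, other chips may also have internal nodes that need to be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data-out control logic delivers data from the internal nodes of one chip to the RCC computing system and then, in another cycle, drives a different set of internal nodes in the same chip to both the RCC computing system and the external interface. The data-out control logic then moves on to the next chip and performs the same two-step operation: first delivering the data designated for the RCC computing system, and then delivering the data designated for the external interface to both the RCC computing system and the external interface. Even when data are intended for the external interface, the RCC computing system must know about them, because it holds a model of the entire user design in software whose internal state information must be consistent with that of the hardware model in the RCC hardware array.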
【０７１８】
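(Board Layout)
The board layout of the coverification system in accordance with one embodiment of the present invention will now be discussed with reference to FIG. 74. The boards are installed in the RCC hardware array. The board layout is similar to that illustrated in FIGS. 8 and 36-44 and described in the accompanying text.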
【０７１９】
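In one embodiment, the RCC hardware array includes six boards. Board m1 is coupled to board 1, and board m2 is coupled to board 8. The coupling and arrangement of board 1, board 2, board 3, and board 8 have been described above with reference to FIGS. 8 and 36-44.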
【０７２０】
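Board m1 contains chip m1. The interconnect structure between board m1 and the other boards is such that chip m1 is coupled to the South interconnects of chip 0, chip 2, chip 4, and chip 6 of board 1. The interconnect structure between board m2 and the other boards is such that chip m2 is coupled to the South interconnects of chip 0, chip 2, chip 4, and chip 6 of board 8.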
【０７２１】
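(X. Examples)
To illustrate the operation of one embodiment of the present invention, a hypothetical user circuit design will be used. In structured register transfer level (RTL) HDL code, the exemplary user circuit design is as follows: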
【０７２２】
【数７】

【０７２３】
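This code is reproduced in FIG. 26. The particular functional details of this circuit design are not necessary to understand the present invention. The reader should understand, however, that the user generates this HDL code to design a circuit for simulation. The circuit represented by this code performs some function, as designed by the user, to respond to input signals and generate outputs.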
【０７２４】
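FIG. 27 shows the circuit diagram for the HDL code discussed with reference to FIG. 26. In most cases, the user may actually draw a circuit diagram of this nature before representing it in HDL form. Some schematic capture tools allow a circuit diagram to be entered and, after processing, generate usable code.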
【０７２５】
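As shown in FIG. 28, the simulation system performs a component type analysis. The HDL code originally presented in FIG. 26 as representing the user's particular circuit design has now been analyzed. The first few lines of the code, beginning with "module register (clock, reset, d, q);", ending with "endmodule", and further identified by reference number 900, form the register definition section.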
【０７２６】
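The next few lines of code (reference number 907) represent some wire interconnection information. Wire variables in HDL, as known to those skilled in the art, are used to represent physical connections between structural entities such as gates. Because HDL is used primarily to model digital circuits, wire variables are necessary variables. Usually, "q" (e.g., q1, q2, q3) represents output wire lines and "d" (e.g., d1, d2, d3) represents input wire lines.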
【０７２７】
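Reference number 908 shows "sigin", which is a test bench output. Reference number 909 shows "sigout", which is a test bench input.
【０７２８】
Reference number 901 shows register components S1, S2, and S3. Reference number 902 shows combinational components S4, S5, S6, and S7. Note that combinational components S4-S7 have output variables d1, d2, and d3, which are inputs to register components S1-S3. Reference number 903 shows clock component S8.
【０７２９】
The next series of code lines shows the test bench components. Reference number 904 shows test bench component (driver) S9. Reference number 905 shows test bench components (initialization) S10 and S11. Reference number 906 shows test bench component (monitor) S12.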
【０７３０】
The component type analysis is summarized in the following table:
【０７３１】
【表１５】

【０７３２】
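Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1-S3 are register components and S4-S7 are combinational components. These components are modeled in hardware so that the user of the SEmulation system can either simulate the entire circuit in software, or simulate in software and selectively accelerate in hardware. In either case, the user controls the simulation and hardware acceleration modes. In addition, the user can emulate the circuit with a target system while still retaining software control to start, stop, inspect values, and assert input values cycle by cycle.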
【０７３３】
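FIG. 29 shows a signal network analysis of the same structured RTL-level HDL code. As illustrated, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test bench process that generates the sigin signals, and S12 is essentially the test bench monitor process that receives the sigout signals. In this example, S9 generates a random sigin to simulate the circuit. Registers S1-S3 and combinational components S4-S7, however, are modeled in both hardware and software.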
【０７３４】
For the software/hardware boundary, the system allocates memory space for the various residence signals (i.e., q1, q2, q3, CLK, sigin, sigout) used to interface the software model with the hardware model.
【０７３５】
【表１６】

【０７３６】
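FIG. 30 shows the software/hardware partition result for this exemplary circuit design; it is a more realistic illustration of the software/hardware partition. The software side 910 is coupled to the hardware side 912 through the software/hardware boundary 911 and the PCI bus 913.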
【０７３７】
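The software side 910 contains, and is controlled by, the software kernel. In general, the kernel is the main control loop that controls the operation of the SEmulation system. As long as any test bench processes are active, the kernel evaluates test bench components, evaluates clock components, detects clock edges to update registers and memories, propagates combinational logic data, and advances the simulation time. Although the kernel resides on the software side, some of its operations or statements are executed in hardware because hardware models exist for those statements and operations. Thus, the software controls both the software and the hardware models.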
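The main control loop just described can be sketched in Python as follows. This is a schematic illustration, not the kernel's actual code: the function signature, the use of plain callables for test bench, clock, register-update, and propagation steps, and the restriction to rising edges with unit time steps are all assumptions.

```python
def run_kernel(testbenches, clock, update_registers, propagate, max_time):
    """Run the control loop while any test bench process is active.

    testbenches:      callables tb(t) -> bool, True while still active
    clock:            callable clock(t) -> 0 or 1 (the clock component)
    update_registers: called on each detected rising clock edge
    propagate:        called every step to settle combinational logic
    Returns the simulation times at which a clock edge was detected.
    """
    edge_times = []
    t = 0
    prev_clk = clock(t)
    while t < max_time:
        active = [tb(t) for tb in testbenches]  # evaluate test bench components
        if not any(active):
            break
        clk = clock(t)                          # evaluate clock component
        if prev_clk == 0 and clk == 1:          # detect clock edge
            update_registers()                  # update registers and memories
            edge_times.append(t)
        propagate()                             # propagate combinational data
        prev_clk = clk
        t += 1                                  # advance simulation time
    return edge_times
```

In hardware acceleration mode, the `update_registers` and `propagate` steps would be the ones carried out by the hardware model rather than in software.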
【０７３８】
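The software side 910 includes the entire model of the user's circuit, including S1-S12. The software/hardware boundary portion on the software side includes the I/O buffers or address spaces S2H, CLK, H2S, and REG. Note that the driver test bench process S9 is coupled to the S2H address space, the monitor test bench process S12 is coupled to the H2S address space, and the clock generator S8 is coupled to the CLK address space. The output signals q1-q3 of registers S1-S3 are assigned to the REG space.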
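The assignment of boundary signals to address spaces can be summarized as a small lookup in Python. The mapping follows the couplings stated above (S9 drives sigin, S12 receives sigout, S8 generates CLK); the dictionary layout and function name are hypothetical, and no actual addresses are shown because the text does not give them here.

```python
# Address space -> residence signals assigned to it
ADDRESS_SPACES = {
    "S2H": ("sigin",),            # driven by test bench driver S9
    "CLK": ("CLK",),              # driven by clock generator S8
    "H2S": ("sigout",),           # read by test bench monitor S12
    "REG": ("q1", "q2", "q3"),    # outputs of registers S1-S3
}

def space_of(signal):
    """Return the address space that holds the given residence signal."""
    for space, signals in ADDRESS_SPACES.items():
        if signal in signals:
            return space
    raise KeyError(signal)
```

The six signals covered are exactly the residence signals listed for the software/hardware boundary: q1, q2, q3, CLK, sigin, and sigout.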
【０７３９】
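The hardware model 912 has a model of the combinational components S4-S7, which resides on the pure hardware side. On the software/hardware boundary of the hardware model 912, sigout, sigin, the register outputs q1-q3, and the software clock 916 are implemented.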
【０７４０】
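In addition to the model of the user's custom circuit design, the system generates the software clock and address pointers. The software clock provides the signal that enables the inputs of registers S1-S3. As discussed above, the software clock of the present invention eliminates race conditions and hold-time violation problems. When a clock edge is detected in software by the primary clock, the detection logic triggers the corresponding detection logic in hardware. The clock edge register 916 then generates an enable signal to the register enable inputs to gate in any data residing at the register inputs.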
【０７４１】
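Address pointer 914 is also shown for illustrative and conceptual purposes. Address pointers are actually implemented in each FPGA chip and allow data to be delivered selectively and sequentially to their destinations.
【０７４２】
The combinational components S4-S7 are also coupled to the register components S1-S3, sigin, and sigout. These signals travel on the I/O bus 915 to and from the PCI bus 913.
【０７４３】
The complete hardware model, prior to the mapping, placement, and routing steps, is shown in FIG. 31 (without the address pointers). The system has not yet mapped the model to specific chips. Registers S1-S3 are provided coupled to the I/O bus and the combinational components S4-S6. Sigin, sigout, and the software clock 920 are also modeled.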
【０７４４】
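Once the hardware model has been determined, the system maps, places, and routes the model onto one or more chips. This particular example could actually be implemented on a single Altera FLEX 10K chip, but for instructional purposes the example will assume that two chips are needed to implement this hardware model. FIG. 32 shows one particular hardware model-to-chip partition result for this example.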
【０７４５】
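The complete model shown in FIG. 32 (without the I/O and clock edge register) includes the chip boundary, represented by a dotted line. This result is produced by the SEmulation system's compiler before the final configuration file is generated. The hardware model thus requires at least three wires between the two chips, for wire lines 921, 922, and 923. To reduce the number of pins/wires between these two chips (chip 1 and chip 2), either another model-to-chip partition should be generated or a multiplexing scheme should be used.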
【０７４６】
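Analyzing the particular partition result shown in FIG. 32, the number of wires between these two chips could be reduced to two by moving the sigin wire line 923 from chip 2 to chip 1. Indeed, FIG. 33 illustrates this partition. Although the particular partition in FIG. 33 appears to be a better partition than that of FIG. 32 based solely on the number of wires, this example will assume that the SEmulation system selected the partition of FIG. 32 after the mapping, placement, and routing operations were performed. The partition result of FIG. 32 will be used as the basis for generating the configuration file.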
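Counting the wires that cross a chip boundary is a simple cut-size computation, sketched below in Python. The netlist is entirely hypothetical: the block names A, B, C, and SIGIN and their connections are placeholders chosen so that three nets cross the boundary before the move and two after, mirroring the counts in the text; they are not taken from FIG. 32 or FIG. 33.

```python
def cut_size(nets, chip_of):
    """Count nets whose endpoints sit on more than one chip."""
    return sum(1 for pins in nets.values()
               if len({chip_of[p] for p in pins}) > 1)

# Hypothetical netlist: block names are placeholders, not from the figures.
nets = {"w921": ["A", "C"], "w922": ["B", "C"], "w923": ["SIGIN", "A"]}
before = {"A": 1, "B": 1, "C": 2, "SIGIN": 2}   # three nets cross the boundary
after = dict(before, SIGIN=1)                   # move the sigin source to chip 1
```

Evaluating `cut_size(nets, before)` and `cut_size(nets, after)` gives the inter-chip wire count for each candidate partition, which is the quantity a compiler would weigh against placement and routing constraints.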
【０７４７】
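FIG. 34 shows the logic patching operation for the same hypothetical example, in which the final realization in two chips is shown. The system used the partition result of FIG. 32 to generate the configuration file. The address pointers, however, are not shown for simplicity. Two FPGA chips 930 and 940 are shown. Chip 930 includes, among other elements, a partitioned portion of the user's circuit design, a TDM unit 931 (receiver side), the software clock 932, and the I/O bus 933. Chip 940 includes, among other elements, a partitioned portion of the user's circuit design, a TDM unit 941 for the transmission side, the software clock 942, and the I/O bus 943. The TDM units 931 and 941 were discussed with reference to FIGS. 9A, 9B, and 9C.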
【０７４８】
これらのチップ９３０および９４０は、ハードウェアモデルをまとめて結合する相互接続ワイヤ９４４および９４５を有する。これらの２つの相互接続ワイヤは図８に示す相互接続の一部である。図８を参照すると、１つのそのような相互接続は、チップＦ３２とＦ３３との間に位置する相互接続６１１である。１つの実施形態において、各相互接続に対してワイヤ／ピンの最大数は４４である。図３４において、モデル化された回路はチップ９３０および９４０の間にワイヤ／ピンを２つだけ必要とする。
【０７４９】
これらのチップ９３０および９４０は、バンクバス９５０に結合される。２つのチップだけが実装されるので、両方のチップは同じバンク中にあるか、または各チップは異なるバンク中に常駐する。必要に応じて、片方のチップは１つのバンクバスに結合され、かつ他方のチップは別のバンクバスに結合されて、ＦＰＧＡインタフェースでのスループットがＰＣＩインタフェースでのスループットと同じになることを確実にする。
【０７５０】
本発明の好適な実施形態の上記記載は例示および記載を目的として提示された。本発明を説明し尽くしたわけではなく、開示の形態に厳密に限定されることを意図しない。多くの修正および改変は、当業者に明らかである。本明細書中に記載の用途は、本発明の精神および範囲を逸脱せずに他の用途に置き換えられ得る。したがって、本発明は請求項の範囲にのみ限定されるべきである。
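[Brief Description of the Drawings]
[FIG. 1] FIG. 1 shows a high-level overview of one embodiment of the present invention, including a workstation, a reconfigurable hardware emulation model, an emulation interface, and a target system coupled to a PCI bus.
[FIG. 2] FIG. 2 shows a flowchart of one particular usage of the present invention.
[FIG. 3] FIG. 3 shows a high-level diagram of the software compilation and hardware configuration during compile time and run time according to one embodiment of the present invention.
[FIG. 4] FIG. 4 shows a flowchart of the compilation process, which includes generating the software/hardware models and the software kernel code.
[FIG. 5] FIG. 5 shows the software kernel that controls the overall S emulation system.
[FIG. 6] FIG. 6 shows a method of mapping the hardware model to the reconfigurable boards through mapping, placement, and routing.
[FIG. 7] FIG. 7 shows the connectivity matrix for the FPGA array shown in FIG. 8.
[FIG. 8] FIG. 8 shows one embodiment of the 4×4 FPGA array and its interconnections.
[FIG. 9A] FIG. 9A illustrates one embodiment of the time division multiplexed (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed fashion so that one pin, instead of a plurality of pins, can be used for this group on a chip. FIG. 9A presents an overview of the pin-out problem.
[FIG. 9B] FIG. 9B illustrates one embodiment of the time division multiplexed (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed fashion so that one pin, instead of a plurality of pins, can be used for this group on a chip. FIG. 9B provides the TDM circuit for the transmission side.
[FIG. 9C] FIG. 9C illustrates one embodiment of the time division multiplexed (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed fashion so that one pin, instead of a plurality of pins, can be used for this group on a chip. FIG. 9C provides the TDM circuit for the receiver side.
[FIG. 10] FIG. 10 shows the S emulation system architecture according to one embodiment of the present invention.
[FIG. 11] FIG. 11 shows one embodiment of the address pointer of the present invention.
[FIG. 12] FIG. 12 shows a state transition diagram of the address pointer initialization for the address pointer of FIG. 11.
[FIG. 13] FIG. 13 shows one embodiment of the MOVE signal generator that derivatively generates the various MOVE signals for the address pointer.
[FIG. 14] FIG. 14 shows the chain of multiplexed address pointers in each FPGA chip.
[FIG. 15] FIG. 15 shows one embodiment of the multiplexed cross-chip address pointer chain according to one embodiment of the present invention.
[FIG. 16] FIG. 16 shows a flowchart of the clock/data network analysis that is critical to the software clock implementation and to the evaluation of logic components in the hardware model.
[FIG. 17] FIG. 17 shows the basic building blocks of the hardware model according to one embodiment of the present invention.
[FIG. 18A] FIG. 18A shows the register model implementation for latches and flip-flops.
[FIG. 18B] FIG. 18B shows the register model implementation for latches and flip-flops.
[FIG. 19] FIG. 19 shows one embodiment of the clock edge detection logic according to one embodiment of the present invention.
[FIG. 20] FIG. 20 shows a four-state finite state machine that controls the clock edge detection logic of FIG. 19 according to one embodiment of the present invention.
[FIG. 21] FIG. 21 shows the interconnect, JTAG, FPGA bus, and global signal pin designations for each FPGA chip according to one embodiment of the present invention.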
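[FIG. 22A] FIG. 22A shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.
[FIG. 22B] FIG. 22B shows one embodiment of the FPGA controller between the PCI bus and the FPGA array.
[FIG. 23A] FIG. 23A shows a more detailed illustration of the CTRL_FPGA unit and data buffer discussed with respect to FIG. 22.
[FIG. 23B] FIG. 23B shows a more detailed illustration of the CTRL_FPGA unit and data buffer discussed with respect to FIG. 22.
[FIG. 24] FIG. 24 shows the 4×4 FPGA array, its relationship to the FPGA banks, and the expansion capability.
[FIG. 25] FIG. 25 shows one embodiment of the hardware start-up method.
[FIG. 26] FIG. 26 shows the HDL code for one example of a user circuit design to be modeled and simulated.
[FIG. 27] FIG. 27 shows a circuit diagram that symbolically represents the circuit design of the HDL code in FIG. 26.
[FIG. 28] FIG. 28 shows the component type analysis for the HDL code of FIG. 26.
[FIG. 29] FIG. 29 shows the signal network analysis of the structured RTL HDL code based on the user's custom circuit design shown in FIG. 26.
[FIG. 30] FIG. 30 shows the software/hardware partition result for the same hypothetical example.
[FIG. 31] FIG. 31 shows the hardware model for the same hypothetical example.
[FIG. 32] FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
[FIG. 33] FIG. 33 shows another hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
[FIG. 34] FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design.
[FIG. 35] FIGS. 35(A) to 35(D) illustrate the principle of "hops" and the FPGA board connection scheme with two examples.
[FIG. 36] FIG. 36 shows an overview of the FPGA chips used in the present invention.
[FIG. 37] FIG. 37 shows the FPGA interconnect of the FPGA chips.
[FIG. 38A] FIG. 38A shows a side view of the FPGA board connection scheme according to one embodiment of the present invention.
[FIG. 38B] FIG. 38B shows a side view of the FPGA board connection scheme according to one embodiment of the present invention.
[FIG. 39] FIG. 39 shows a direct-neighbor and one-hop six-board interconnect layout of the FPGA array according to one embodiment of the present invention.
[FIG. 40A] FIG. 40A shows the FPGA inter-board interconnect scheme.
[FIG. 40B] FIG. 40B shows the FPGA inter-board interconnect scheme.
[FIG. 41A] FIG. 41A shows a top view of the board interconnect connectors.
[FIG. 41B] FIG. 41B shows a top view of the board interconnect connectors.
[FIG. 41C] FIG. 41C shows a top view of the board interconnect connectors.
[FIG. 41D] FIG. 41D shows a top view of the board interconnect connectors.
[FIG. 41E] FIG. 41E shows a top view of the board interconnect connectors.
[FIG. 41F] FIG. 41F shows a top view of the board interconnect connectors.
[FIG. 42] FIG. 42 shows the on-board connectors and some components of a representative FPGA board.
[FIG. 43] FIG. 43 shows a legend of the connectors in FIGS. 41A to 41F and FIG. 42.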
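[FIG. 44] FIG. 44 shows a direct-neighbor and one-hop dual-board interconnect layout of the FPGA array according to another embodiment of the present invention.
[FIG. 45] FIG. 45 shows a workstation with multiprocessors according to another embodiment of the present invention.
[FIG. 46] FIG. 46 shows an environment according to another embodiment of the present invention in which multiple users share a single simulation/emulation system on a time-shared basis.
[FIG. 47] FIG. 47 shows a high-level structure of the simulation server according to one embodiment of the present invention.
[FIG. 48] FIG. 48 shows the architecture of the simulation server according to one embodiment of the present invention.
[FIG. 49] FIG. 49 shows a flowchart of the simulation server.
[FIG. 50] FIG. 50 shows a flowchart of the job swapping process.
[FIG. 51] FIG. 51 shows the signals between the device driver and the reconfigurable hardware unit.
[FIG. 52] FIG. 52 shows the time-sharing feature of the simulation server for handling multiple jobs with different levels of priority.
[FIG. 53] FIG. 53 shows the communication handshake signals between the device driver and the reconfigurable hardware unit.
[FIG. 54] FIG. 54 shows the state diagram of the communication handshake protocol.
[FIG. 55] FIG. 55 shows an overview of the client-server model of the simulation server according to one embodiment of the present invention.
[FIG. 56] FIG. 56 shows a high-level block diagram of the simulation system that implements memory mapping according to one embodiment of the present invention.
[FIG. 57] FIG. 57 shows a more detailed block diagram of the memory mapping aspect of the simulation system, with supporting components for the memory finite state machine (MEMFSM) and the evaluation finite state machine for each FPGA logic device (EVALFSMx).
[FIG. 58] FIG. 58 shows a state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit according to one embodiment of the present invention.
[FIG. 59] FIG. 59 shows a state diagram of the finite state machine in each FPGA chip according to one embodiment of the present invention.
[FIG. 60] FIG. 60 shows the memory read data double buffer.
[FIG. 61] FIG. 61 shows the simulation write/read cycle according to one embodiment of the present invention.
[FIG. 62] FIG. 62 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs after the CLK_EN signal.
[FIG. 63] FIG. 63 shows a timing diagram of the simulation data transfer operation when the DMA read operation occurs near the end of the EVAL period.
[FIG. 64] FIG. 64 shows a typical user design implemented as a PCI add-on card.
[FIG. 65] FIG. 65 shows a typical hardware/software co-verification system using an ASIC as the device under test.
[FIG. 66] FIG. 66 shows a typical co-verification system using an emulator, where the device under test is programmed into the emulator beforehand.
[FIG. 67] FIG. 67 shows a simulation system according to one embodiment of the present invention.
[FIG. 68] FIG. 68 shows a co-verification system without external I/O devices according to one embodiment of the present invention, where the RCC computing system contains software models of the various I/O devices and the target system.
[FIG. 69] FIG. 69 shows a co-verification system with actual external I/O devices and the target system according to another embodiment of the present invention.
[FIG. 70] FIG. 70 shows a more detailed logic diagram of the data-in portion of the control logic according to one embodiment of the present invention.
[FIG. 71] FIG. 71 shows a more detailed logic diagram of the data-out portion of the control logic according to one embodiment of the present invention.
[FIG. 72] FIG. 72 shows the timing diagram of the data-in portion of the control logic.
[FIG. 73] FIG. 73 shows the timing diagram of the data-out portion of the control logic.
[FIG. 74] FIG. 74 shows a board layout of the RCC hardware array according to one embodiment of the present invention.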
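[FIG. 75A] FIG. 75A shows an exemplary shift register circuit used to explain the hold time and clock glitch problems.
[FIG. 75B] FIG. 75B shows a timing diagram of the shift register circuit shown in FIG. 75A, illustrating a hold time violation.
[FIG. 76A] FIG. 76A shows the same shift register circuit of FIG. 75A placed across multiple FPGA chips.
[FIG. 76B] FIG. 76B shows a timing diagram of the shift register circuit shown in FIG. 76A, illustrating hold time.
[FIG. 77A] FIG. 77A shows an exemplary logic circuit used to illustrate the clock glitch problem.
[FIG. 77B] FIG. 77B shows a timing diagram of the logic circuit of FIG. 77A, illustrating the clock glitch problem.
[FIG. 78] FIG. 78 shows a prior-art timing adjustment technique for solving the hold time violation problem.
[FIG. 79] FIG. 79 shows a prior-art timing synthesis technique for solving the hold time violation problem.
[FIG. 80A] FIG. 80A shows an original latch, according to one embodiment of the present invention.
[FIG. 80B] FIG. 80B shows a timing-insensitive and glitch-free latch, according to one embodiment of the present invention.
[FIG. 81A] FIG. 81A shows an original design flip-flop, according to one embodiment of the present invention.
[FIG. 81B] FIG. 81B shows a timing-insensitive and glitch-free design-type flip-flop, according to one embodiment of the present invention.
[FIG. 82] FIG. 82 shows a timing diagram of the trigger mechanism of the timing-insensitive and glitch-free design-type flip-flop according to one embodiment of the present invention.
[0001]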
(Related US application)
This application is a continuation-in-part of US patent application Ser. No. 08/850,136, filed with the US Patent and Trademark Office (USPTO) on May 2, 1997.
[0002]
(Background of the Invention)
(Field of Invention)
The present invention relates generally to electronic design automation (EDA). More particularly, the present invention relates to a digital logic device that solves hold time and clock glitch problems for various applications, including simulation, hardware acceleration, and co-verification.
[0003]
(Description of related technology)
In general, electronic design automation (EDA) is a computer-based tool configured on various workstations to provide designers with automated or semi-automated tools for designing and verifying a user's custom circuit designs. EDA is commonly used to create, analyze, and edit any electronic design for simulation, emulation, prototyping, execution, or computing purposes. EDA technology can also be used to develop systems (i.e., target systems) that use the user-designed subsystem or component. The end result of EDA is a modified and enhanced design, usually in the form of discrete integrated circuits or printed circuit boards, that improves on the original design while maintaining its spirit.
[0004]
The value of simulating a circuit design in software before hardware emulation is recognized in the various industries that benefit from EDA technology. Nevertheless, current software simulation and hardware emulation/acceleration are cumbersome for the user because these processes are separate and independent in nature. For example, within a single debug/test session a user may want to simulate or debug a circuit design in software for part of the time, use those results to accelerate the simulation with a hardware model for another part of the time, inspect various register and combinational logic values in the circuit at selected times, and then return to software simulation. In addition, as internal register and combinational logic values change over simulation time, the user should be able to monitor these changes even when they occur in the hardware model during the hardware acceleration/emulation process.
[0005]
Co-simulation arose from the need to address some of the problems caused by the cumbersome nature of using two separate and independent processes, pure software simulation and pure hardware emulation/acceleration, and to make the overall system easier to use. However, co-simulation still has a number of drawbacks: (1) co-simulation systems require manual partitioning, (2) co-simulation uses two loosely coupled engines, (3) co-simulation speed is as slow as software simulation speed, and (4) co-simulation systems encounter race conditions.
[0006]
First, partitioning between software and hardware is done manually rather than automatically, which further burdens the user. In essence, co-simulation requires the user to partition the design (starting at the behavioral level, then RTL, and then the gate level) in very large functional blocks and to test the models themselves across software and hardware. Such a constraint requires some degree of sophistication from the user.
[0007]
Second, co-simulation systems use two loosely coupled and independent engines, which raise inter-engine synchronization, coordination, and flexibility issues. Co-simulation requires the synchronization of two different verification engines: software simulation and hardware emulation. Even though the software simulator side is coupled to the hardware accelerator side, only external pin-out data is available for inspection and loading. Values inside the modeled circuit, at the register and combinational logic level, are not available for easy inspection and downloading from one side to the other, which limits the utility of these co-simulator systems. Typically, the user may have to re-simulate the entire design after switching from software simulation to hardware acceleration and back. Thus, if the user wants to switch between software simulation and hardware emulation/acceleration during a single debug session while inspecting register and combinational logic values, co-simulator systems do not provide this capability.
[0008]
Third, co-simulation speed is as slow as software simulation speed. Co-simulation requires the synchronization of two different verification engines: software simulation and hardware emulation. Each engine has its own control mechanism for driving the simulation or emulation. This implies that the synchronization between software and hardware pushes the overall performance down to a speed as low as that of software simulation. The additional overhead of coordinating the operation of these two engines adds to the slowness of co-simulation systems.
[0009]
Fourth, co-simulation systems encounter setup, hold time, and clock glitch problems caused by race conditions among clock signals. Co-simulators use hardware-driven clocks, which may arrive at the inputs of different logic elements at different times because of different wireline lengths. This raises the uncertainty of the evaluation results: when two logic elements should evaluate their data together, one logic element may evaluate its data in one time period while the other evaluates its data in a different time period.
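The effect of a late-arriving clock can be illustrated with a small Python sketch. It is a toy model under stated assumptions, not a circuit from the specification: a two-stage shift register is updated stage by stage in the order in which each stage sees the clock edge, and the clock skew is assumed to exceed the data propagation delay. The function name `clock_edge` and its parameters are hypothetical.

```python
def clock_edge(q, sig_in, arrival_order):
    """Update a 2-stage shift register whose stages see the clock edge in arrival_order.

    Each stage samples its input at the moment the edge reaches it, so a stage that
    is clocked late may sample a value its predecessor has already overwritten.
    """
    q = list(q)
    for stage in arrival_order:
        q[stage] = sig_in if stage == 0 else q[stage - 1]
    return q


# Stage 1 samples before stage 0 changes: equivalent to a simultaneous edge.
on_time = clock_edge([0, 0], 1, arrival_order=[1, 0])
# The edge reaches stage 1 after stage 0 has already updated: hold-time violation.
skewed = clock_edge([0, 0], 1, arrival_order=[0, 1])

print(on_time)  # [1, 0]
print(skewed)   # [1, 1]
```

In the skewed case the input value passes through both stages on a single clock edge, which is the kind of evaluation uncertainty described above.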
[0010]
Accordingly, there is a need in the industry for a system or method that addresses the problems raised above by currently known simulation systems, hardware emulation systems, hardware accelerators, co-simulation systems, and co-verification systems.
[0011]
(Summary of the Invention)
The present invention provides solutions to the problems described above in the form of a flexible and fast simulation/emulation system, referred to herein as the “S emulation system”, “S emulator system”, or co-verification system, which includes a reconfigurable computing system (or RCC computing system) and a reconfigurable hardware array (or RCC hardware array).
[0012]
The S emulation system and method of the present invention give the user the ability to turn the design of an electronic system into software and hardware representations for simulation. In general, the S emulation system is a software-controlled emulator or a hardware-accelerated simulator, together with the methods used therein. Thus, pure software simulation is possible, but the simulation can also be accelerated through the use of the hardware model. Hardware acceleration is available with software control for starting, stopping, asserting values, and inspecting values. In-circuit emulation mode is further available to test the user's circuit design in the environment of the circuit's target system. Again, software control is available.
[0013]
At the heart of the system is a software kernel that controls both the software and hardware models, giving the user greater run-time flexibility by allowing the user to start, stop, assert values, inspect values, and switch among the various modes. The kernel controls the various modes by controlling data evaluation in the hardware via the enable inputs to the registers.
[0014]
The S emulation system and method according to the present invention provide four modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. At a high level, the present invention may be embodied in each of these four modes or in various combinations of them, as follows: (1) software simulation alone; (2) simulation through hardware acceleration alone; (3) in-circuit emulation (ICE) alone; (4) post-simulation analysis alone; (5) software simulation and simulation through hardware acceleration; (6) software simulation and ICE; (7) simulation through hardware acceleration and ICE; (8) software simulation, simulation through hardware acceleration, and ICE; (9) software simulation and post-simulation analysis; (10) simulation through hardware acceleration and post-simulation analysis; (11) software simulation, simulation through hardware acceleration, and post-simulation analysis; (12) ICE and post-simulation analysis; (13) software simulation, ICE, and post-simulation analysis; (14) simulation through hardware acceleration, ICE, and post-simulation analysis; and (15) software simulation, simulation through hardware acceleration, ICE, and post-simulation analysis. Other combinations are possible and are within the scope of the present invention.
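The fifteen listed combinations are exactly the non-empty subsets of the four modes, since 2^4 - 1 = 15. A short Python sketch can enumerate them; the names `MODES` and `mode_combinations` are illustrative only, and the enumeration order differs from the numbering used in the text.

```python
from itertools import combinations

MODES = (
    "software simulation",
    "simulation through hardware acceleration",
    "in-circuit emulation (ICE)",
    "post-simulation analysis",
)


def mode_combinations(modes=MODES):
    """Return every non-empty subset of the operating modes, smallest subsets first."""
    return [
        subset
        for size in range(1, len(modes) + 1)
        for subset in combinations(modes, size)
    ]


combos = mode_combinations()
print(len(combos))  # 15
```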
[0015]
Each mode or combination of modes provides the following features or combinations of features: (1) switching among modes, manually or automatically; (2) usage, in which the user can switch among modes and can start, stop, assert values, inspect values, and single-step through the cycles of the simulation or emulation process; (3) a compilation process that generates the software models and hardware models; (4) a software kernel that controls all modes with a main control loop that includes, in one embodiment, the steps of initializing the system, evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present; (5) component type analysis for generating hardware models; (6) mapping of the hardware models to reconfigurable boards through, in one embodiment, clustering, placement, and routing; (7) software clock setup to avoid race conditions through, in one embodiment, gated clock logic analysis and gated data logic analysis; (8) software clock implementation through, in one embodiment, clock edge detection in the software model to trigger an enable signal in the hardware model, sending a signal from the primary clock to the clock input of the clock edge register in the hardware model via the gated clock logic, sending a clock enable signal to the enable input of the hardware model's registers, sending data from the primary clock register to the hardware model's registers via the gated data logic, and resetting the clock edge register to disable the clock enable signal to the enable input of the hardware model's registers; (9) logging of selected data for debug sessions and post-simulation analysis; (10) combinational logic regeneration; (11) in one embodiment, a basic building block that is a D-type register with asynchronous and synchronous inputs; (12) address pointers in each chip; (13) a multiplexed cross-chip address pointer chain; (14) an array of FPGA chips and their interconnection scheme; (15) banks of FPGA chips with a bus that tracks the performance of the PCI bus system; (16) FPGA banks that allow expansion via piggyback boards; and (17) a time division multiplexed (TDM) circuit for optimal pin usage. Through its various embodiments, the present invention also provides other features described herein that may not appear in the above list.
[0016]
One embodiment of the present invention is a simulation system. The simulation system operates in a host computer system for simulating circuit behavior. The host computer system includes a central processing unit (CPU), a main memory, and a local bus that couples the CPU to the main memory and enables communication between the CPU and the main memory. This circuit has a structure and a function specified in a hardware language such as HDL. This language makes it possible to describe circuits as component types and connections. The simulation system includes a software model, software control logic, and hardware logic elements.
[0017]
The software model of the circuit is coupled to the local bus. Typically, this model is resident in main memory. Software control logic is coupled to the software model and hardware logic elements to control the operation of the software model and hardware logic elements. The software control logic includes interface logic that enables reception of input data and clock signals from external processes and clock detection logic for detection of active edges of clock signals and generation of trigger signals. Further, the hardware logic element is coupled to the local bus and includes a hardware model of at least a portion of the circuit based on the component type and clock enable logic for evaluating data in the hardware model in response to the trigger signal.
[0018]
Further, the hardware logic elements include an array or plurality of field programmable devices coupled to each other. Each field programmable device includes at least a portion of the hardware model of the circuit, so that the combination of all the field programmable devices includes the entire hardware model. A plurality of interconnections also connect the portions of the hardware model to each other. Each interconnect represents a direct connection between any two field programmable devices arranged in the same row or column. The shortest distance between any two field programmable devices is at most two interconnects or “hops”.
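The two-hop bound can be checked with a brief Python sketch. It assumes, as the paragraph states, that every pair of devices in the same row or the same column is directly connected, and it uses the 4×4 array size mentioned for FIG. 8. The helper names `neighbors` and `hops` are hypothetical, and the breadth-first search is only one way to measure distance.

```python
from collections import deque
from itertools import product


def neighbors(node, rows, cols):
    """Devices directly connected to `node`: all others in its row or column."""
    r, c = node
    same_row = [(r, k) for k in range(cols) if k != c]
    same_col = [(k, c) for k in range(rows) if k != r]
    return same_row + same_col


def hops(src, dst, rows, cols):
    """Breadth-first search for the minimum number of interconnects from src to dst."""
    distance = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return distance[node]
        for nxt in neighbors(node, rows, cols):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    raise ValueError("destination unreachable")


ROWS = COLS = 4
chips = list(product(range(ROWS), range(COLS)))
worst = max(hops(a, b, ROWS, COLS) for a in chips for b in chips)
print(worst)  # 2
```

Any two devices that share neither a row nor a column are reached through the device at the intersection of one device's row and the other's column, which is why the worst case is two hops.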
[0019]
Another embodiment of the invention is a system and method for simulating a circuit, where the circuit is modeled in software and at least a portion of the circuit is modeled in hardware. Data evaluation occurs in hardware but is controlled by software via a software clock. The data to be evaluated is communicated to the hardware model and stabilized. If the software model detects an active clock edge, the software model sends an enable signal to the hardware model to initiate data evaluation. The hardware model evaluates the data and waits for new incoming data that can be evaluated at the next active clock edge signal detection in the software model.
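The ordering described here (data settles first, then a software-detected clock edge enables evaluation) can be sketched in Python. This is a toy model under stated assumptions: the hardware model is a hypothetical three-stage shift register, and the class `HardwareModel` and function `run` are names invented for the example, not interfaces from the specification.

```python
class HardwareModel:
    """Toy hardware model: a 3-stage shift register whose registers share one enable."""

    def __init__(self):
        self.q = [0, 0, 0]  # register outputs
        self.d = [0, 0, 0]  # register inputs, settled before the enable arrives

    def propagate(self, sig_in):
        """Let data settle at the register inputs; nothing is latched yet."""
        self.d = [sig_in, self.q[0], self.q[1]]

    def enable(self):
        """Software-clock enable: every register latches its settled input at once."""
        self.q = list(self.d)


def run(clock_samples, data_samples):
    hw = HardwareModel()
    prev_clk = 0
    trace = []
    for clk, sig_in in zip(clock_samples, data_samples):
        hw.propagate(sig_in)            # 1. communicate data and let it stabilize
        if prev_clk == 0 and clk == 1:  # 2. software detects the active (rising) edge
            hw.enable()                 # 3. enable signal starts evaluation in hardware
        prev_clk = clk
        trace.append(tuple(hw.q))
    return trace


trace = run([0, 1, 0, 1, 0, 1], [1, 1, 0, 0, 1, 1])
print(trace[-1])  # (1, 0, 1)
```

Because all registers latch from inputs that settled before the enable, the result does not depend on the order in which individual registers are updated.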
[0020]
Another embodiment of the present invention includes a software kernel that controls the operation of the software model and the hardware model. The software kernel includes the steps of evaluating active test-bench processes/components, evaluating clock components, detecting clock edges, updating registers and memories, propagating combinational components, advancing simulation time, and continuing the loop as long as active test-bench processes are present.
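The kernel steps listed above can be arranged as a main control loop. The Python sketch below is only an outline under stated assumptions: every callback (`testbench`, `clock`, `update_registers`, `propagate`) is a hypothetical stand-in for machinery the specification describes elsewhere, and the toy design (a counter with a parity output) is invented for the example.

```python
def run_kernel(testbench, clock, update_registers, propagate, max_time=1000):
    """Main control loop; all arguments are hypothetical callbacks for illustration."""
    time, prev_clk, log = 0, 0, []
    while testbench.active(time) and time < max_time:
        stimulus = testbench.evaluate(time)  # evaluate active test-bench processes
        clk = clock(time)                    # evaluate clock components
        if prev_clk == 0 and clk == 1:       # detect an active clock edge
            update_registers(stimulus)       # update registers and memories
        log.append((time, propagate()))      # propagate combinational components
        prev_clk = clk
        time += 1                            # advance simulation time
    return log


class Testbench:
    """Toy test bench that stays active until stop_time and drives a constant 1."""

    def __init__(self, stop_time):
        self.stop_time = stop_time

    def active(self, time):
        return time < self.stop_time

    def evaluate(self, time):
        return 1


state = {"count": 0}


def update_registers(stimulus):
    state["count"] += stimulus


def propagate():
    return state["count"] % 2 == 0  # combinational output: parity of the counter


log = run_kernel(Testbench(6), lambda t: t % 2, update_registers, propagate)
print(state["count"], log[-1])  # 3 (5, False)
```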
[0021]
A further embodiment of the invention is a method of simulating a circuit, where the circuit has a structure and a function specified in a hardware language (e.g., HDL). The hardware language allows the circuit to be described as, or reduced to, components. The method includes the steps of (1) determining the component types in the hardware language, (2) generating a model of the circuit based on the component types, and (3) simulating the behavior of the circuit with the model by providing input data to the model. Generating the model may include (1) generating a software model of the circuit and (2) generating a hardware model of the circuit based on the component types.
[0022]
In another embodiment, the present invention is a method of simulating a circuit. The steps include (1) generating a software model of the circuit, (2) generating a hardware model of the circuit, (3) simulating the behavior of the circuit with the software model by providing input data to the software model, (4) selectively switching to the hardware model, (5) providing input data to the hardware model, and (6) simulating the behavior of the circuit with the hardware model by accelerating the simulation in the hardware model. The method may further include the steps of (1) selectively switching to the software model and (2) simulating the behavior of the circuit with the software model by providing input data to the software model. The simulation can also be stopped with the software model.
[0023]
For in-circuit emulation mode, the method includes the steps of (1) generating a software model of the circuit, (2) generating a hardware model of at least a portion of the circuit, (3) supplying input signals from the target system to the hardware model, (4) supplying output signals from the hardware model to the target system, and (5) simulating the behavior of the circuit with the hardware model, where the software model can control the simulation/emulation cycle by cycle.
[0024]
For post-simulation analysis, the method of simulating a circuit includes the steps of (1) generating a model of the circuit, (2) simulating the behavior of the circuit with the model by providing input data to the model, and (3) logging selected input data and selected output data from the model as log points. A software model and a hardware model can be generated. The method may further include the steps of (1) selecting a desired time-dependent point in the simulation, (2) selecting a log point at or before the selected time-dependent point, (3) providing input data to the hardware model, and (4) simulating the behavior of the circuit with the hardware model from the selected log point.
[0025]
A further embodiment of the present invention is a method of generating models for a simulation system for simulating a circuit. The steps include (1) generating a software model of the circuit, (2) generating a hardware model of at least a portion of the circuit based on component type, and (3) generating a clock generation circuit in the hardware model to trigger data evaluation in the hardware model in response to clock edge detection in the software model.
[0026]
Various embodiments of the present invention solve the above problems with specially designed logic devices that replace standard design flip-flops and latches. One embodiment of the present invention is a timing-insensitive glitch-free (TIGF) logic device. The TIGF logic device may take the form of any latch or edge-triggered flip-flop. In one embodiment of the invention, a trigger signal is provided to update the TIGF logic device. The trigger signal is provided during a short trigger period that occurs at a time adjacent to the evaluation period.
[0027]
In the latch form, the TIGF latch includes a flip-flop that holds the current state of the TIGF latch until a trigger signal is received. A multiplexer is also provided to receive the new input value and the old stored value. The enable signal functions as the selector signal for the multiplexer. Because the trigger signal controls the update of the TIGF latch, the data at the D input to the TIGF latch and the control data at the enable input can arrive in any order without causing hold-time violations. Also, because the trigger signal controls the update of the TIGF latch, the enable signal can glitch without negatively affecting the proper operation of the TIGF latch.
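A behavioral Python sketch may help show why arrival order and enable glitches do not matter in the latch form. It is an illustration under stated assumptions, not the circuit of FIG. 80B: the class `TIGFLatch` and helper `evaluate` are hypothetical names, and the model only captures the flip-flop, the multiplexer, and the trigger described in the paragraph.

```python
class TIGFLatch:
    """Behavioral model: a state flip-flop fed by a 2-to-1 multiplexer, updated on trigger."""

    def __init__(self, initial=0):
        self.q = initial   # flip-flop holding the current state
        self.d = 0         # data input
        self.enable = 0    # enable input, used as the multiplexer selector

    def mux(self):
        """Select the new value when enabled, otherwise the old stored value."""
        return self.d if self.enable else self.q

    def trigger(self):
        """Trigger pulse: the only moment at which the stored state changes."""
        self.q = self.mux()
        return self.q


def evaluate(latch, events):
    """Apply (signal, value) events in their arrival order, then pulse the trigger."""
    for signal, value in events:
        setattr(latch, signal, value)
    return latch.trigger()


data_first = evaluate(TIGFLatch(), [("d", 1), ("enable", 1)])
enable_first = evaluate(TIGFLatch(), [("enable", 1), ("d", 1)])
# Enable glitches high and returns low before the trigger: the glitch is ignored.
glitch = evaluate(TIGFLatch(), [("d", 1), ("enable", 1), ("enable", 0)])

print(data_first, enable_first, glitch)  # 1 1 0
```

Only the values that have settled by the time of the trigger are used, so the two arrival orders give the same result and the transient enable pulse leaves the stored value unchanged.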
[0028]
In the flip-flop form, the TIGF flip-flop includes a first flip-flop that holds the new input value, a second flip-flop that holds the currently stored value, and a clock edge detector. All three components are controlled by the trigger signal to update the TIGF flip-flop. A multiplexer is also provided, with the edge detector signal functioning as its selector signal. The dedicated first flip-flop stores the new input value, effectively blocking inputs that change during evaluation and thus avoiding hold-time violations. Because the trigger signal controls the update of the TIGF flip-flop, clock glitches do not affect the hardware model of a user design circuit that uses the TIGF flip-flop as its emulated flip-flop.
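The flip-flop form can be sketched behaviorally in Python as well. The model below is one interpretation under stated assumptions rather than the circuit of FIG. 81B: the first flip-flop is assumed to hold the D value sampled at the previous trigger, the edge detector is assumed to compare the clock level at this trigger with the level at the previous trigger, and the names `TIGFFlipFlop` and `shift_register` are hypothetical.

```python
class TIGFFlipFlop:
    """Behavioral model of an edge-triggered TIGF flip-flop updated only on trigger."""

    def __init__(self):
        self.new_value = 0  # first flip-flop: D as sampled at the previous trigger
        self.q = 0          # second flip-flop: currently stored value
        self.prev_clk = 0   # edge-detector state: CLK as sampled at the previous trigger

    def trigger(self, d, clk):
        rising = self.prev_clk == 0 and clk == 1  # edge detector output
        if rising:
            self.q = self.new_value               # multiplexer selects the held new value
        self.new_value = d                        # capture D for the next evaluation
        self.prev_clk = clk
        return self.q


def shift_register(samples):
    """Two TIGF flip-flops in series; samples are (clk, sig_in) per evaluation period."""
    ff1, ff2 = TIGFFlipFlop(), TIGFFlipFlop()
    outputs = []
    for clk, sig_in in samples:
        d2 = ff1.q                # value settled at ff2's input before the trigger
        ff1.trigger(sig_in, clk)
        ff2.trigger(d2, clk)
        outputs.append((ff1.q, ff2.q))
    return outputs


# In the second period sig_in changes together with the rising clock edge.
out = shift_register([(0, 1), (1, 0), (0, 0), (1, 0)])
print(out)  # [(0, 0), (1, 0), (1, 0), (0, 1)]
```

In the second period the first stage still captures the value that was stable before the edge (1), not the racing new input (0), and the bit reaches the second stage one clock edge later, as a shift register should behave.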
[0029]
These and other embodiments are fully discussed and shown in the following sections of this specification.
[0030]
Several different aspects and embodiments of the invention are described below with reference to the accompanying drawings.
[0031]
(Detailed description of preferred embodiments)
Various embodiments of the present invention are described herein through, and in relation to, a system referred to as the “S emulator” or “S emulation” system. Throughout this specification, the terms “S emulation system”, “S emulator system”, “S emulator”, or simply “system” may be used. These terms refer to the various apparatus and methods according to the present invention for any combination of the four modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis, including their respective setup or pre-processing stages. The term “S emulation” may also be used in other cases; it refers to the novel process described herein.
[0032]
Similarly, terms such as “reconfigurable computing (RCC) array system” or “RCC computing system” refer to that portion of the simulation/co-verification system that contains the main processor, the software kernel, and the software model of the user design. Terms such as “reconfigurable hardware array” or “RCC hardware array” refer to that portion of the simulation/co-verification system that contains the hardware model of the user design and, in one embodiment, the array of reconfigurable logic elements.
[0033]
This specification also refers to a “user” and to the user's “circuit design” or “electronic design”. A “user” is a person who uses the S emulation system through its interfaces and may be the circuit designer or a tester/debugger who played little or no part in the design process. A “circuit design” or “electronic design” is a custom-designed system or component, whether software or hardware, that can be modeled by the S emulation system for test/debug purposes. In many cases, the “user” also designed the “circuit design” or “electronic design”.
[0034]
This specification also uses the terms “wire”, “wireline”, “wire/bus line”, and “bus”. These terms refer to various electrically conducting lines. Each line may be a single wire between two points or several wires between multiple points. In these terms, a “wire” may include one or more conductors, and a “bus” may also include one or more conductors.
[0035]
This specification is presented in outline form. First, it presents a general overview of the S emulation system, including an overview of the four modes of operation and the hardware implementation scheme. Second, it provides a detailed description of the S emulation system. In some instances, one figure may provide a variation of an embodiment shown in another figure. In these cases, the same reference numbers are used for the same components/units/processes. The outline of this specification is as follows.
[0036]
I. Overview
A. Simulation / Hardware acceleration mode
B. Emulation in target system mode
C. Post simulation analysis mode
D. Hardware implementation scheme
E. Simulation server
F. Memory simulation
G. Co-verification system
II. System description
III. Simulation / Hardware acceleration mode
IV. Emulation in target system mode
V. Post simulation analysis mode
VI. Hardware implementation scheme
A. Overview
B. Address pointer
C. Gated data/clock network analysis
D. FPGA array and control
E. Another embodiment using highly integrated FPGA chip
F. TIGF logic device
VII. Simulation server
VIII. Memory simulation
IX. Co-verification system
X. Examples
-----------------------------------
I. Overview
Various embodiments of the present invention have four general modes of operation: (1) software simulation, (2) simulation through hardware acceleration, (3) in-circuit emulation (ICE), and (4) post-simulation analysis. The various embodiments include systems and methods for these modes with at least some of the following features: (1) software and hardware models under a single tightly coupled simulation engine, the software kernel, which controls the software and hardware models cycle by cycle; (2) automatic component type analysis during the compilation process for software and hardware model generation and partitioning; (3) the ability to switch, cycle by cycle, among software simulation mode, simulation through hardware acceleration mode, in-circuit emulation mode, and post-simulation analysis mode; (4) full hardware model visibility through software combinational component regeneration; (5) double-buffered clock modeling with a software clock and gated clock/data logic to avoid race conditions; and (6) the ability to re-simulate or hardware-accelerate the user's circuit design from any selected point in a past simulation session. The ultimate goal is a flexible and fast simulator/emulator system and method with full HDL functionality and emulator execution performance.
[0037]
A. Simulation / Hardware acceleration mode
The S emulation system can model the user's custom circuit design in both software and hardware through automatic component type analysis. The entire user circuit design is modeled in software, while the evaluation components (i.e., register components and combinational components) are modeled in hardware. Hardware modeling is facilitated by the component type analysis.
[0038]
The software kernel, which resides in the main memory of the general-purpose processor system, serves as the main program of the S emulator system and controls the overall operation and execution of its various modes and features. As long as any test-bench process is active, the kernel evaluates the active test-bench components, evaluates the clock components, detects clock edges in order to update registers and memories and to propagate combinational logic data, and advances the simulation time. This software kernel provides the tight coupling between the simulator engine and the hardware acceleration engine. For the software/hardware boundary, the S emulation system provides a number of I/O address spaces: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software).
[0039]
The S emulation system has the ability to switch selectively among the four modes of operation. The user of the system can start a simulation, stop a simulation, assert input values, inspect values, single-step cycle by cycle, and switch back and forth among the four different modes. For example, the system can simulate the circuit in software for a period of time, accelerate the simulation through the hardware model, and then return to software simulation mode.
[0040]
In general, the S emulation system gives the user the ability to “see” every modeled component, regardless of whether the component is modeled in software or hardware. For various reasons, combinational components are not as “visible” as registers, and it is therefore difficult to obtain combinational component data. One reason is that the FPGAs used in the reconfigurable board to model the hardware portion of the user's circuit design typically model combinational components as look-up tables (LUTs) rather than as actual combinational components. Accordingly, the S emulation system reads the register values and then regenerates the combinational components. Because regenerating the combinational components incurs some overhead, this regeneration is not performed all the time; it is performed only at the user's request.
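Regeneration of combinational values from register values can be sketched as a small software evaluation pass. The Python example below is a minimal illustration under stated assumptions: the signal names q1, q2, q3, and sigin echo the example circuit in this specification, but the gate functions, the net names n1, n2, and out, and the function `regenerate` are hypothetical.

```python
def regenerate(register_values, primary_inputs, comb_nodes):
    """Recompute combinational signals in software from values read back from hardware.

    comb_nodes is a list of (name, function, input_names) in topological order, so
    each node's inputs are already known when the node is evaluated.
    """
    signals = {**register_values, **primary_inputs}
    for name, fn, inputs in comb_nodes:
        signals[name] = fn(*(signals[i] for i in inputs))
    return {name: signals[name] for name, _, _ in comb_nodes}


# Hypothetical combinational logic between the registers and the output.
comb_nodes = [
    ("n1", lambda a, b: a & b, ("q1", "q2")),
    ("n2", lambda a, b: a ^ b, ("n1", "q3")),
    ("out", lambda a, b: a | b, ("n2", "sigin")),
]

# Register values as they might be read from the REG space on user request.
values = regenerate({"q1": 1, "q2": 1, "q3": 0}, {"sigin": 0}, comb_nodes)
print(values)  # {'n1': 1, 'n2': 1, 'out': 1}
```

Running this pass only when the user asks to inspect a combinational signal keeps the regeneration overhead out of the normal simulation loop.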
[0041]
Because the software kernel resides on the software side, a clock edge detection mechanism is provided to trigger the generation of a so-called software clock, which drives the enable inputs of the various registers in the hardware model. Timing is strictly controlled through a double-buffered circuit implementation, so that the software clock enable signal arrives at the register models before the data does. Once the data inputs to these register models have stabilized, the software clock gates the data synchronously, ensuring that all data values are gated together with no risk of hold-time violations.
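As a rough software analogy of this mechanism, the sketch below gates a set of registers in two phases so that no register observes a neighbor's new output while it is still sampling its own input. This is only one plausible reading of the double-buffer idea; the `Register` class, its field names, and `software_clock_tick` are hypothetical and do not describe the actual circuit.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Register:
    """One register of the hardware model (hypothetical structure)."""
    d: int = 0       # data input, assumed stable when the clock is applied
    staged: int = 0  # first buffer stage
    q: int = 0       # visible output, updated only by the software clock


def rising_edge(previous: int, current: int) -> bool:
    """Clock edge detection performed on the software side."""
    return previous == 0 and current == 1


def software_clock_tick(registers: List[Register]) -> None:
    """Gate all registers together in two phases.

    Phase 1 captures every data input; phase 2 publishes the captured
    values. Outputs therefore change only after all inputs were sampled.
    """
    for reg in registers:
        reg.staged = reg.d
    for reg in registers:
        reg.q = reg.staged


# Two-stage shift register: r1 samples the output of r0.
r0, r1 = Register(d=1), Register()
prev_clk = 0
for clk in (0, 1, 0, 1):
    r1.d = r0.q  # combinational path from r0.q to r1.d settles first
    if rising_edge(prev_clk, clk):
        software_clock_tick([r0, r1])
    prev_clk = clk
```

With a single-phase update, r1 could pick up the new value of r0 within the same tick; the two-phase update ensures the value needs one tick per stage.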
[0042]
Software simulation is also fast because the system logs all input values but only selected register values/states; reducing the number of I/O operations keeps the overhead to a minimum. The user can select the logging frequency.
[0043]
B. Target system mode emulation
The S emulation system can emulate the user's circuit within its target system environment. The target system outputs data to the hardware model for evaluation, and the hardware model also outputs data to the target system. In addition, the software kernel controls the operation of this mode, so that the user still has the option to start, stop, assert values, examine values, single-step, and switch from one mode to another.
[0044]
C. Post simulation analysis mode
The log provides the user with a historical record of the simulation session. Unlike known simulation systems, the S emulation system does not log every single value, internal state, or value change during the simulation process. The S emulation system logs only selected values and states, based on the logging frequency (i.e., one record is logged every N cycles). During the post-simulation stage, if the user wishes to examine various data near point X of the just-completed simulation session, the user goes to one of the logged points, for example logged point Y, which is the closest logged point located before point X in time. The user then simulates from the selected logged point Y to the desired point X to obtain the simulation results.
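Selecting logged point Y for a requested point X amounts to finding the latest logged cycle that does not come after X. The following Python sketch shows one way to do this with a binary search; the function name and the example cycle numbers are illustrative assumptions only.

```python
from bisect import bisect_right


def nearest_logged_point(logged_cycles, x):
    """Return the latest logged cycle Y with Y <= x.

    logged_cycles must be sorted in ascending order.
    """
    i = bisect_right(logged_cycles, x)
    if i == 0:
        raise ValueError("no logged point at or before the requested cycle")
    return logged_cycles[i - 1]


# Assumed example: one record every 10,000 cycles over a 100,000-cycle session.
logged = list(range(0, 100_001, 10_000))
x = 34_567
y = nearest_logged_point(logged, x)  # 30000
cycles_to_replay = x - y             # 4567
```

Only the cycles between Y and X have to be simulated again, so the replay cost is bounded by the logging interval rather than by the length of the session.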
[0045]
A VCD on-demand system is also described. The VCD on-demand system allows the user to view any simulation target range (ie, simulation time) on demand without rerunning the simulation.
[0046]
D. Hardware implementation scheme
The S emulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the S emulation system partitions, maps, places, and routes each selected portion of the user circuit design onto the FPGA chip. Thus, for example, a 4 × 4 array of 16 chips can model a large circuit spread across these 16 chips. The interconnection scheme allows each chip to access another chip in two “jumps” or links.
[0047]
Each FPGA chip implements an address pointer for each I/O address space (i.e., REG, CLK, S2H, H2S). All the address pointers associated with a particular address space are chained together. Thus, during data transfer, word data is sequentially selected from/to the main FPGA bus and the PCI bus, one word at a time for the selected address space within each chip and one chip at a time, until the desired word data for that selected address space has been accessed. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer of a chip and then propagates to the address pointer of the next chip, and this continues until the last chip is reached or the system initializes the address pointer.
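The visiting order produced by the propagating word selection signal can be sketched in Python as a generator that walks one address space chip by chip and word by word. The per-chip word counts and the function name are hypothetical; only the four address space names come from the text.

```python
from typing import Dict, Iterator, List, Tuple

ADDRESS_SPACES = ("REG", "CLK", "S2H", "H2S")


def word_selection_order(
    chips: List[Dict[str, int]], space: str
) -> Iterator[Tuple[int, int]]:
    """Yield (chip_index, word_index) in the order the select signal visits them.

    chips[i][space] is the number of words chip i holds in that address
    space (an assumed representation); a missing entry means zero words.
    """
    if space not in ADDRESS_SPACES:
        raise ValueError(f"unknown address space: {space}")
    for chip_index, chip in enumerate(chips):
        for word_index in range(chip.get(space, 0)):
            yield chip_index, word_index


# Assumed example: three chips with different numbers of REG words.
chips = [{"REG": 2, "S2H": 1}, {"S2H": 2}, {"REG": 3}]
reg_order = list(word_selection_order(chips, "REG"))
```

A chip with no words in the selected space is simply passed over, which corresponds to the selection signal propagating straight on to the next chip in the chain.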
[0048]
The FPGA bus system of the reconfigurable hardware board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks so as to make use of the larger-bandwidth bus. The throughput of this FPGA bus system can track the throughput of the PCI bus system, so performance is not lost by reducing the bus speed. Expansion is possible with piggyback boards that extend the bank length.
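The claim that throughput is preserved can be checked with simple arithmetic, reading "bandwidth" here as bus width. The figures below (a 32-bit PCI bus at 33 MHz) are assumptions chosen for illustration and are not stated in this passage.

```python
def throughput_bytes_per_second(width_bits: int, clock_hz: float) -> float:
    """Peak throughput of a bus that transfers one word per clock."""
    return width_bits / 8 * clock_hz


# Assumed PCI parameters for illustration only.
PCI_WIDTH_BITS = 32
PCI_CLOCK_HZ = 33e6

# FPGA bus as described: twice the width, half the speed.
FPGA_WIDTH_BITS = 2 * PCI_WIDTH_BITS
FPGA_CLOCK_HZ = PCI_CLOCK_HZ / 2

pci_bps = throughput_bytes_per_second(PCI_WIDTH_BITS, PCI_CLOCK_HZ)
fpga_bps = throughput_bytes_per_second(FPGA_WIDTH_BITS, FPGA_CLOCK_HZ)
```

Under these assumptions both buses peak at the same number of bytes per second, since doubling the width exactly offsets halving the clock.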
[0049]
In another embodiment of the invention, more highly integrated FPGA chips are used. Examples of such highly integrated chips are the Altera 10K130V and 10K250V. Using these chips changes the board design so that only four FPGA chips are used per board, instead of eight less highly integrated FPGA chips (Altera 10K100V).
[0050]
The FPGA array of the simulation system is provided on the motherboard via a specific board interconnect structure. Each chip may have up to eight sets of interconnects, excluding the local bus connections, arranged within a single board and across different boards as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]). Each chip can be interconnected directly to its adjacent neighbor chips, or in one hop to a non-adjacent chip located above, below, to the left, or to the right. In the X direction (east-west) the array is a torus; in the Y direction (north-south) the array is a mesh.
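The difference between the torus (east-west) and mesh (north-south) directions shows up when computing a chip's direct neighbors: column indices wrap around, row indices do not. The Python sketch below assumes that row 0 is the northern edge and that chips are addressed by (row, column); both conventions, and the function name, are illustrative.

```python
def direct_neighbors(row: int, col: int, rows: int, cols: int) -> dict:
    """Direct N/S/W/E neighbors on an array that wraps east-west only."""
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError("chip position outside the array")
    neighbors = {
        "W": (row, (col - 1) % cols),  # torus: wraps around the X direction
        "E": (row, (col + 1) % cols),
    }
    if row > 0:                        # mesh: no wrap in the Y direction
        neighbors["N"] = (row - 1, col)
    if row < rows - 1:
        neighbors["S"] = (row + 1, col)
    return neighbors


corner = direct_neighbors(0, 0, rows=4, cols=4)
inner = direct_neighbors(1, 2, rows=4, cols=4)
```

In a 4 × 4 array, the corner chip therefore still has west and east neighbors but lacks a north neighbor, whereas an inner chip has all four.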
[0051]
The interconnects alone can couple logic devices and other components within a single board. Inter-board connectors, however, are provided to couple the boards and their interconnects together across different boards, so that signals are carried between (1) the PCI bus, via the motherboard, and the array boards, and (2) any two array boards.
[0052]
The motherboard connector connects the board to the motherboard, and thus to the PCI bus, power supply, and ground. For some boards, the motherboard connector is not used to connect directly to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are directly connected to the motherboard, while the remaining boards 2, 4, and 6 rely on their neighboring boards for motherboard connectivity. Thus, every other board is directly connected to the motherboard, and the interconnects and local buses of the boards are coupled together via inter-board connectors arranged solder side to component side. The PCI signals are routed through only one of the boards (usually the first board). Power and ground are applied to the other motherboard connectors of these boards. Arranged solder side to component side, the various inter-board connectors allow communication among the PCI bus components, the FPGA logic devices, the memory devices, and the various simulation system control circuits.
[0053]
E. Simulation server
In another embodiment of the invention, a simulation server allows multiple users to access the same reconfigurable hardware unit. In one system configuration, multiple workstations across a network, or multiple users/processes in a non-network environment, can access the same server-based reconfigurable hardware unit to review/debug the same or different user circuit designs. This access is achieved through a time-sharing process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware model access among the competing scheduled users. In one scenario, each user accesses the server to map his or her own user design to the reconfigurable hardware model for the first time; in this case the system compiles the design to generate the software and hardware models, performs the clustering operation, performs place-and-route operations, generates a bitstream configuration file, and reconfigures the FPGA chips in the reconfigurable hardware unit to model the hardware part of the user's design. When one user has accelerated his design using the hardware model and has downloaded the hardware state to his own memory for software simulation, the hardware unit can be released for access by another user.
[0054]
The server allows multiple users or processes to access the reconfigurable hardware unit for acceleration and hardware state swapping purposes. The simulation server includes a scheduler, one or more device drivers, and the reconfigurable hardware unit. The simulation server scheduler is based on a preemptive round-robin algorithm. The server scheduler includes a simulation job queue table, a priority sorter, and a job swapper. The restore and playback function of the present invention supports both non-network multiprocessing environments and network multi-user environments, in which previous checkpoint state data can be downloaded and the entire simulation state associated with that checkpoint can be restored for playback debugging or cycle-by-cycle stepping.
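A preemptive round-robin scheduler with a priority sorter and a job queue can be sketched in a few lines of Python. This is a generic illustration of the named algorithm under stated assumptions, not the server's actual scheduler: the `Job` fields, the convention that a lower number means higher priority, and the fixed time slice are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Job:
    name: str
    priority: int     # lower number = served earlier (assumed convention)
    cycles_left: int  # remaining hardware cycles the job needs


def run_round_robin(jobs: List[Job], time_slice: int) -> List[Tuple[str, int]]:
    """Return (job name, cycles run) in the order jobs hold the hardware."""
    if time_slice <= 0:
        raise ValueError("time_slice must be positive")
    queue = deque(sorted(jobs, key=lambda j: j.priority))  # priority sorter
    trace = []
    while queue:
        job = queue.popleft()
        ran = min(time_slice, job.cycles_left)
        job.cycles_left -= ran
        trace.append((job.name, ran))
        if job.cycles_left > 0:
            queue.append(job)  # preempted: state is swapped out, job requeued
    return trace


trace = run_round_robin(
    [Job("C", 3, 250), Job("A", 1, 150), Job("B", 2, 100)], time_slice=100
)
```

Each time a job is requeued in this sketch, a real server would save that user's hardware state and load the next user's state, which is the role the text assigns to the job swapper.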
[0055]
F. Memory simulation
The memory simulation, or memory mapping, aspect of the present invention provides an efficient way for the simulation system to manage the various memory blocks associated with the configured hardware model of the user's design (which is programmed into the array of FPGA chips of the reconfigurable hardware unit). The memory simulation aspect of the present invention provides a structure and scheme in which the many memory blocks associated with the user's design are mapped to the SRAM memory devices of the simulation system instead of to the logic devices used to configure and model the user's design. The memory simulation system includes a memory state machine, an evaluation state machine, and associated logic for controlling and interfacing with: (1) the main computing system and its associated memory system, (2) the SRAM memory devices coupled to the FPGA bus of the simulation system, and (3) the FPGA logic devices that contain the configured and programmed user design being debugged. The overall operation of the memory simulation system according to one embodiment of the present invention is as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access.
[0056]
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and, for each memory block N, a logic interface to the user's own memory interface in the user design, in order to handle: (1) data evaluation among the FPGA logic devices, and (2) write/read memory access between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between: (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
[0057]
G. Co-verification system
One embodiment of the present invention is a co-verification system that includes a reconfigurable computing system (hereinafter “RCC computing system”) and a reconfigurable computing hardware array (hereinafter “RCC hardware array”). In some embodiments, the target system and external I/O devices are not necessary because they can be modeled in software. In other embodiments, the target system and external I/O devices are actually coupled to the co-verification system in order to gain speed and to use actual data rather than simulated test bench data. Thus, the co-verification system can incorporate the RCC computing system and the RCC hardware array, along with other functionality, to debug the software and hardware portions of the user's design while using the actual target system and/or I/O devices.
[0058]
The RCC computing system also includes clock logic (for clock edge detection and software clock generation), test bench processes for testing the user design, and device models for any I/O device that the user decides to model in software instead of using an actual physical I/O device. Of course, the user may decide to use both actual and modeled I/O devices in one debug session. The software clock is provided to the external interface to function as the external clock source for the target system and the external I/O devices. The use of this software clock provides the synchronization necessary to process incoming and outgoing data. Because the software clock generated by the RCC computing system is the time base of the debug session, simulated and hardware-accelerated data is synchronized with any data transmitted between the co-verification system and the external interface.
[0059]
If the target system and external I/O devices are coupled to the co-verification system, pin-out data must be provided between the co-verification system and its external interface. The co-verification system includes control logic that provides traffic control between (1) the RCC computing system and the RCC hardware array, and (2) the external interface (which is coupled to the target system and the external I/O devices) and the RCC hardware array. Because the RCC computing system has a model of the entire design in software (including the part of the user design modeled in the RCC hardware array), the RCC computing system must also have access to all data that passes between the external interface and the RCC hardware array. The control logic ensures that the RCC computing system has access to this data.
[0060]
II. System description
FIG. 1 shows a high-level overview of one embodiment of the present invention. The workstation 10 is coupled to the reconfigurable hardware model 20 and the emulation interface 30 via the PCI bus system 50. The reconfigurable hardware model 20 is coupled to the emulation interface 30 via the PCI bus 50 as well as via the cable 61. The target system 40 is coupled to the emulation interface 30 via the cable 60. In other embodiments, the in-circuit emulation setup 70 (shown by the dotted-line box), which includes the emulation interface 30 and the target system 40, is not provided when emulation of the user's circuit design within the environment of the target system is not desired during a particular test/debug session. Without the in-circuit emulation setup 70, the reconfigurable hardware model 20 communicates with the workstation 10 via the PCI bus 50.
[0061]
In combination with the in-circuit emulation setup 70, the reconfigurable hardware model 20 mimics the user's circuit design of some electronic subsystem of the target system. To ensure correct operation of the user's circuit design of the electronic subsystem within the environment of the target system, the input/output signals between the target system 40 and the modeled electronic subsystem must be provided to the reconfigurable hardware model 20 for evaluation. Therefore, the input/output signals of the target system 40 to and from the reconfigurable hardware model 20 are transmitted via the cable 60 through the emulation interface 30 and the PCI bus 50. Alternatively, the input/output signals of the target system 40 can be communicated to the reconfigurable hardware model 20 via the emulation interface 30 and the cable 61.
[0062]
Control data and some substantive simulation data pass between the reconfigurable hardware model 20 and the workstation 10 via the PCI bus 50. Indeed, the workstation 10 runs the software kernel that controls the operation of the overall S emulation system, and it must have access (read/write) to the reconfigurable hardware model 20.
[0063]
A workstation 10 with a computer, keyboard, mouse, monitor, and appropriate bus/network interface allows the user to enter and modify data describing the circuit design of an electronic system. Exemplary workstations include Sun Microsystems SPARC or ULTRA-SPARC workstations or Intel/Microsoft-based computing stations. As known to those skilled in the art, the workstation 10 includes a CPU 11, a local bus 12, a host/PCI bridge 13, a memory bus 14, and a main memory 15. The various software simulation, simulation by hardware acceleration, in-circuit emulation, and post-simulation analysis aspects of the present invention are provided in the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30. The algorithm embodied in software is stored in the main memory 15 during a test/debug session and is executed by the CPU 11 via the workstation's operating system.
[0064]
As known to those skilled in the art, after the operating system has been loaded into the memory of the workstation 10 by the startup firmware, control passes to its initialization code, which sets up the necessary data structures and loads and initializes the device drivers. Control is then transferred to the command line interpreter (CLI), which prompts the user for the program to run. The operating system then determines the amount of memory required to run the program, locates or allocates a block of memory, and accesses the memory either directly or through the BIOS. After the memory loading process is complete, the application program begins to execute.
[0065]
One embodiment of the present invention is a specific application program for S emulation. During the course of its execution, this application program may request a number of services from the operating system. These services include, but are not limited to, reading from a disk file, writing to a disk file, performing data communication, and interfacing with the display/keyboard/mouse.
[0066]
The workstation 10 has a suitable user interface that allows the user to enter circuit design data, edit the circuit design data, monitor the progress of simulation and emulation while obtaining results, and essentially control the simulation and emulation process. Although not shown in FIG. 1, the user interface includes user-accessible menu-driven options and command sets, which can be entered with the keyboard and mouse and viewed on the monitor. Typically, the user uses a computing station 80 with a keyboard 90.
[0067]
A user typically creates a specific circuit design for an electronic system and enters the HDL (usually structured RTL level) code description of the designed system into the workstation 10. The S emulation system of the present invention performs component type analysis, among other operations, in order to partition the modeling between software and hardware. The S emulation system models behavior, RTL, and gate level code in software. For hardware modeling, the system can model RTL and gate level code; however, the RTL level must be synthesized to the gate level prior to hardware modeling. Gate level code can be processed directly into a usable source design database format for hardware modeling. Using the RTL and gate level code, the system automatically performs component type analysis and completes the partition step. Based on the partition analysis during software compile time, the system maps certain parts of the circuit design to hardware for fast simulation via hardware acceleration. The user can also couple the modeled circuit design to the target system for real-environment in-circuit emulation. Because the software simulation and hardware acceleration engines are tightly coupled through the software kernel, the user can simulate the entire circuit design using software simulation, accelerate the test/debug process by using the hardware model of the mapped circuit design, return to the simulation portion, and return to hardware acceleration, until the test/debug process is complete. The ability to switch between software simulation and hardware acceleration on a cycle-by-cycle basis and at the user's discretion is one of the valuable features of this embodiment. This feature is particularly useful in the debugging process, because it allows the user to go to a specific point or cycle very quickly using hardware acceleration mode and then to use software simulation to inspect various points thereafter in order to debug the circuit design.
In addition, the S emulation system allows the user to see all components, regardless of whether the internal implementation of a component is in hardware or software. When the user requests such a read, the S emulation system accomplishes this by reading the register values from the hardware model and regenerating the combinational components using the software model. These and other features are discussed more fully later in this document.
[0068]
The workstation 10 is coupled to the bus system 50. The bus system may be any available bus system that allows various agents, such as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, to be operatively coupled together. Preferably, the bus system is fast enough to provide the user with real-time or near-real-time results. One such bus system is the bus system described in the Peripheral Component Interconnect (PCI) standard (incorporated herein by reference). Currently, revision 2.0 of the PCI standard provides a 33 MHz bus speed. Revision 2.1 supports a 66 MHz bus speed. Thus, the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30 can comply with the PCI standard.
[0069]
In one embodiment, communication between the workstation 10 and the reconfigurable hardware model 20 is handled on the PCI bus. Other PCI-compliant devices may be found in this bus system. These devices may be coupled to the PCI bus at the same level as the workstation 10, the reconfigurable hardware model 20, and the emulation interface 30, or at other levels. Each PCI bus at a different level, such as the PCI bus 52, is coupled to another PCI bus level, such as the PCI bus 50, where one exists, via a PCI-to-PCI bridge 51. Two PCI devices 53 and 54 may be coupled to the PCI bus 52.
[0070]
The reconfigurable hardware model 20 includes field programmable gate array (FPGA) chips that can be programmably configured and reconfigured to model the hardware portion of the user's electronic system design. In this embodiment, the hardware model is reconfigurable; that is, the hardware can be reconfigured to suit the particular computation or user circuit design at hand. For example, if many adders or multipliers are required, the system is configured to include many adders and multipliers. If other computing elements or functions are required, they can also be modeled or formed in the system. In this way, the system can be optimized to perform specialized computations or logic operations. A reconfigurable system is also flexible, so that the user can work around minor hardware defects that arise during manufacturing, testing, or use. In one embodiment, the reconfigurable hardware model 20 includes a two-dimensional array of computing elements made up of FPGA chips to provide computational resources for various user circuit designs and applications. Further details of the hardware configuration process are provided later.
[0071]
Two examples of such FPGA chips are those sold by Altera and Xilinx. In some embodiments, the reconfigurable hardware model is reconfigurable through the use of field programmable devices. However, other embodiments of the present invention may be implemented using application specific integrated circuit (ASIC) technology. Furthermore, other embodiments may be in the form of custom integrated circuits.
[0072]
In a typical test/debug scenario, reconfigurable devices are used to simulate/emulate the user's circuit design so that appropriate changes can be made prior to actual prototype production. In some other cases, however, an actual ASIC or custom integrated circuit can be used, although the user then loses the ability to change a possibly non-functional circuit design quickly and cost-effectively for re-simulation and re-emulation. Sometimes such an ASIC or custom IC has already been manufactured and is readily available, so that emulation with an actual non-reconfigurable chip may be preferable.
[0073]
In accordance with the present invention, the workstation software, together with its integration with the external hardware model, provides the end user with a greater degree of flexibility, control, and performance than existing systems. In order to run simulation and emulation, a model of the circuit design and the related parameters (e.g., input test bench stimuli, overall system output, intermediate results) are determined and provided to the simulation software system. The user may use either a schematic capture tool or a synthesis tool to define the system circuit design. The user typically begins the circuit design of an electronic system with a schematic diagram, which is later converted to HDL form using a synthesis tool. The HDL can also be written directly by the user. Exemplary HDL languages include Verilog and VHDL; however, other languages are available. A circuit design expressed in HDL includes many concurrent components. Each component is a code sequence that either defines the behavior of a circuit element or controls the execution of the simulation.
[0074]
The S emulation system analyzes these components to determine their component types, and the compiler uses this component type information to build different execution models in software and hardware. The user can then use the S emulation system of the present invention. The designer can verify the accuracy of the circuit through simulation by applying various stimuli, such as input signals and test vector patterns, to the simulated model. If the circuit does not behave as planned during simulation, the user redefines the circuit by changing the schematic diagram or HDL file of the circuit.
[0075]
The use of this embodiment of the invention is illustrated in the flowchart of FIG. The algorithm begins at step 100. After loading the HDL file into the system, the system compiles, partitions, and maps the circuit design into the appropriate hardware model. The compilation, partitioning and mapping process is described in more detail below.
[0076]
Before the simulation runs, the system must run a reset sequence to remove all of the unknown “x” values in software before the hardware acceleration model can function. One embodiment of the present invention uses a four-state value for each bus signal: “00” is logic low, “01” is logic high, “10” is “z”, and “11” is “x”. As known to those skilled in the art, a software model can handle “0”, “1”, “x” (bus conflict or unknown value), and “z” (no driver or high impedance). In contrast, hardware cannot handle the unknown value “x”; the reset sequence, which varies depending on the specific application code, therefore resets the register values to all “0” or all “1”.
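The four-state encoding given above can be written down directly as a lookup table. The Python sketch below encodes and decodes a bus and models the effect of the reset sequence on register contents; the function names are hypothetical, and only the four code assignments come from the text.

```python
from typing import List

# Encoding stated in the text: 00 = low, 01 = high, 10 = z, 11 = x.
ENCODE = {"0": 0b00, "1": 0b01, "z": 0b10, "x": 0b11}
DECODE = {code: state for state, code in ENCODE.items()}


def encode_bus(states: str) -> List[int]:
    """Encode a string of four-state values, one two-bit code per signal."""
    return [ENCODE[s] for s in states]


def decode_bus(codes: List[int]) -> str:
    """Inverse of encode_bus."""
    return "".join(DECODE[c] for c in codes)


def reset_registers(states: str, reset_value: str = "0") -> str:
    """Model the reset sequence: every register bit takes one known value."""
    if reset_value not in ("0", "1"):
        raise ValueError("registers can only be reset to all 0 or all 1")
    return reset_value * len(states)


codes = encode_bus("01zx")
after_reset = reset_registers("x1x0xx")
```

After the reset no register holds “x”, which is the precondition the text gives for handing the state over to the hardware model.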
[0077]
In step 105, the user determines whether to perform a circuit design simulation. Typically, the user first starts the system for software simulation. Accordingly, if the determination at step 105 is “YES”, the software simulation begins at step 110.
[0078]
As shown in step 115, the user can stop the simulation to examine values. In fact, the user can stop the simulation at any time during the test/debug session, as indicated by the dotted lines extending from step 115 to various nodes in hardware acceleration mode, ICE mode, and post-simulation mode. Executing step 115 takes the user to step 160.
[0079]
After stopping, if the user wishes to examine combinational component values, the system kernel reads back the state of the hardware register components in order to regenerate the entire software model, including the combinational components. After the entire software model has been restored, the user can examine any signal value in the system. After stopping and examining values, the user can continue to run in simulation-only mode or in hardware model acceleration mode. As shown in the flowchart, step 115 branches to the stop/value check routine. The stop/value check routine begins at step 160. At step 165, the user must decide whether to stop the simulation at this point and examine values. If the determination at step 165 is “YES”, step 170 stops the simulation that may currently be in progress and examines various values to check the correctness of the circuit design. At step 175, the algorithm returns to the branch point at step 115. Here, the user may continue to simulate and stop/examine values for the remainder of the test/debug session, or proceed to the in-circuit emulation step.
[0080]
Similarly, if the determination at step 105 is “NO”, the algorithm proceeds to the hardware acceleration decision step 120. At step 120, the user decides whether to accelerate the test/debug process by accelerating the simulation through the hardware portion of the modeled circuit design. If the determination at step 120 is “YES”, hardware model acceleration is performed at step 125. During the system compilation process, the S emulation system mapped some portions of the design to the hardware model. Here, if hardware acceleration is desired, the system moves the register and combinational components to the hardware model and moves the input values and evaluation values to the hardware model. Thus, during hardware acceleration, evaluation is performed in the hardware model for an extended period of time at the accelerated speed. The kernel writes the test bench output to the hardware model, updates the software clock, and then reads the hardware model output values, cycle by cycle. If the user so desires, values from the entire software model of the user's circuit design (that is, the entire circuit design) can be made available by outputting the register values and regenerating the combinational components from those register values. Because software intervention is needed to regenerate these combinational components, the values of the entire software model are not output every cycle; rather, such values are provided only when the user wants them. The combinational component regeneration process is described later in this specification.
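The per-cycle sequence performed by the kernel during acceleration (write test bench output, update the software clock, read hardware outputs) can be sketched as a simple loop. The `HardwareModelStub` class below is a hypothetical stand-in that models an enabled counter; it exists only so that the loop can run and does not describe the real hardware interface.

```python
from typing import Dict, Iterable, List


class HardwareModelStub:
    """Stand-in for the reconfigurable hardware model (hypothetical)."""

    def __init__(self) -> None:
        self.enable = 0
        self.count = 0

    def write_inputs(self, inputs: Dict[str, int]) -> None:
        self.enable = inputs["enable"]

    def pulse_software_clock(self) -> None:
        # Registers update only when the software clock is applied.
        if self.enable:
            self.count += 1

    def read_outputs(self) -> Dict[str, int]:
        return {"count": self.count}


def accelerate(hw: HardwareModelStub, stimulus: Iterable[Dict[str, int]]) -> List[Dict[str, int]]:
    """Run one kernel iteration per cycle of test bench stimulus."""
    outputs = []
    for inputs in stimulus:
        hw.write_inputs(inputs)         # 1. test bench output to hardware
        hw.pulse_software_clock()       # 2. update the software clock
        outputs.append(hw.read_outputs())  # 3. read hardware outputs
    return outputs


results = accelerate(
    HardwareModelStub(), [{"enable": 1}, {"enable": 0}, {"enable": 1}]
)
```

Only the hardware outputs are read each cycle in this loop; a full dump of combinational values would be a separate, on-request step, consistent with the text.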
[0081]
Again, the user can stop the hardware acceleration mode at any time, as indicated by step 115. If the user wishes to stop, the algorithm proceeds to steps 115 and 160 and branches to the stop/value check routine. Here, as in step 115, the user can stop the hardware-accelerated simulation process at any time and examine the values resulting from the simulation process, or the user can continue the hardware-accelerated simulation process. The stop/value check routine branches to steps 160, 165, 170, and 175, described above in connection with stopping the simulation. Returning to the main routine after step 125, the user may decide at step 135 to continue the hardware-accelerated simulation or to perform a pure simulation instead. If the user wishes to perform further simulation, the algorithm proceeds to step 105. If the user does not wish to perform further simulation, the algorithm proceeds to post-simulation analysis at step 140.
[0082]
At step 140, the S emulation system provides a number of post-simulation analysis features. The system logs all inputs to the hardware model. For hardware model outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 record/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 record/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. Because the selected logging frequency directly affects the S emulation speed, the user selects the logging frequency with care. A higher logging frequency reduces the S emulation speed, because the system must spend time and resources recording the output data by performing I/O operations to memory before further simulation can be performed.
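The trade-off between logging frequency and later replay effort can be made concrete with a small calculation. The Python sketch below assumes that a record is taken at cycle 0 and at every N-th cycle thereafter; that starting convention and the function name are assumptions for illustration.

```python
from typing import Tuple


def logging_cost(total_cycles: int, cycles_per_record: int) -> Tuple[int, int]:
    """Return (records_written, worst_case_replay_cycles).

    Assumes records at cycles 0, N, 2N, ... up to total_cycles inclusive.
    The worst case replay is the largest distance from any cycle back to
    the nearest earlier record.
    """
    if total_cycles < 0 or cycles_per_record <= 0:
        raise ValueError("invalid cycle counts")
    records = total_cycles // cycles_per_record + 1
    worst_replay = min(cycles_per_record - 1, total_cycles)
    return records, worst_replay


sparse = logging_cost(1_000_000, 10_000)  # 1/10,000 record/cycle
dense = logging_cost(1_000_000, 1_000)    # 1/1,000 record/cycle
```

Raising the logging frequency tenfold writes roughly ten times as many records (more I/O during the run) while cutting the worst-case replay distance by about the same factor.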
[0083]
For post-simulation analysis, the user selects a specific point at which simulation is desired. The user can then perform post-S emulation analysis by running the software simulation with the logged inputs to the hardware model, in order to calculate the value changes and internal states of all hardware components. Note that the hardware accelerator is used to simulate the data from the selected logged point in order to analyze the simulation results. This post-simulation analysis method can be linked to any simulation waveform viewer for post-simulation analysis. Further details are described below.
[0084]
At step 145, the user can elect to emulate the simulated circuit design within its target system environment. If the determination at step 145 is “no”, the algorithm ends and the S emulation process ends at step 155. If emulation with the target system is desired, the algorithm proceeds to step 150. This step involves activating the emulation interface board, plugging the cable and chip pin adapter into the target system, and running the target system to obtain the system I/O from the target system. The system I/O from the target system includes signals between the target system and the emulation of the circuit design. The emulated circuit design receives input signals from the target system, processes them, sends them to the S emulation system for further processing, and possibly outputs the processed signals to the target system. Conversely, the emulated circuit design sends output signals to the target system, which processes them and possibly outputs the processed signals back to the emulated circuit design. In this way, the performance of the circuit design can be evaluated in its native target system environment. After emulation with the target system, the user has results that either validate the circuit design or reveal its non-functional aspects. At this point, the user may simulate/emulate again as indicated at step 135, stop altogether to modify the circuit design, or proceed to integrated circuit fabrication based on the validated circuit design.
[0085]
III. Simulation / Hardware acceleration mode
FIG. 3 shows a high level block diagram of the software compilation and hardware configuration during compile time and run time according to one embodiment of the present invention. FIG. 3 shows two sets of information: one set distinguishes the operations performed at compile time from those performed at simulation/emulation run time, and the other shows the partitioning between the software model and the hardware model. At the outset, the S emulation system according to one embodiment of the present invention requires the user circuit design as input data 200. The user circuit design is in some form of HDL file (e.g., Verilog, VHDL). The S emulation system parses the HDL file so that behavioral-level code, register transfer level (RTL) code, and gate-level code can be reduced to a form usable by the S emulation system. The system generates a source design database in front end processing step 205. The processed HDL file is then usable by the S emulation system. The parsing process converts ASCII data into an internal binary data structure and is known to those skilled in the art. See ALFRED V. AHO, RAVI SETHI, AND JEFFREY D. ULLMAN, COMPILERS: PRINCIPLES, TECHNIQUES, AND TOOLS (1988), which is incorporated herein by reference.
[0086]
Compile time is represented by process 225 and run time is represented by process/element 230. During compile time, as indicated by process 225, the S emulation system compiles the processed HDL file by performing component type analysis. Component type analysis classifies HDL components into combinational components, register components, clock components, memory components, and test bench components. In essence, the system partitions the user circuit design into control and evaluation components.
[0087]
The S emulation compiler 210 essentially maps the simulation control components to software and the evaluation components to software and hardware. The compiler 210 generates a software model for all HDL components. The software model is cast in code 215. In addition, the S emulation compiler 210 uses the component type information in the HDL file to select or generate hardware logic blocks/elements from a library or module generator, and generates a hardware model for the given HDL components. The final result is a so-called “bit stream” configuration file 220.
[0088]
In preparation for run time, the software model in code form is stored in the main memory, where the application program associated with the S emulation program according to one embodiment of the present invention is also stored. This code is processed by a general purpose processor or workstation 240. In essence, the configuration file 220 for the hardware model is used to map the user circuit design to the reconfigurable hardware board 250. Here, those portions of the circuit design that have been modeled in hardware are mapped and partitioned onto the FPGA chips of the reconfigurable hardware board 250.
[0089]
As described above, user test bench stimuli, test vector data, and other test bench resources 235 are applied to the general purpose processor or workstation 240 for simulation purposes. Furthermore, the user can perform emulation of the circuit design under software control. The reconfigurable hardware board 250 contains the user's emulated circuit design. The S emulation system allows the user to switch selectively between software simulation and hardware emulation, and to stop either the simulation or the emulation process at any time, cycle by cycle, in order to inspect the values of every component in the model, whether register or combinational. Thus, the S emulation system passes data between the test bench 235 and the processor/workstation 240 for simulation, and between the processor/workstation 240 and the reconfigurable hardware board 250 via data bus 245 for emulation. When a user target system 260 is included, emulation data can pass between the reconfigurable hardware board 250 and the target system via the emulation interface 255 and the data bus 245. Because the kernel resides in the software simulation model in the memory of the processor/workstation 240, data can pass between the processor/workstation 240 and the reconfigurable hardware board 250 whenever necessary.
[0090]
FIG. 4 shows a flowchart of the compilation process according to one embodiment of the invention. The compilation process is represented as processes 205 and 210 in FIG. 3. In the compilation process of FIG. 4, step 301 processes front end information. Here, gate level HDL code is generated. The user can convert the initial circuit design to HDL form either by handwriting the code directly or by using some form of schematic or synthesis tool to generate a gate level HDL representation of the design. The S emulation system parses the HDL file (in ASCII format) into a binary format so that behavioral-level code, register transfer level (RTL) code, and gate-level code can be reduced to an internal data structure form usable by the S emulation system. The system generates a source design database that contains the parsed HDL code.
[0091]
Step 302 performs component type analysis by classifying HDL components into combinational components, register components, clock components, memory components, and test bench components, as shown in component type resource 303. The S emulation system generates hardware models for register and combinational components (some exceptions are described below). Test bench and memory components are mapped to software. Some clock components (e.g., derived clocks) are modeled in hardware, while others reside at the software/hardware boundary (e.g., software clocks).
[0092]
A combinational component is a stateless logic component whose output values are a function of its current input values and do not depend on the history of the input values. Examples of combinational components include primitive gates (e.g., AND, OR, XOR, NOT), selectors, adders, multipliers, shifters, and bus drivers.
[0093]
A register component is a simple storage component. The state transition of a register is controlled by a clock signal. One form of register, the edge-triggered register, can change state when a clock edge is detected. Examples include flip-flops (D type, JK type) and level-sensitive latches.
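The difference between a stateless combinational component and a clocked register component can be sketched in a few lines of Python. This is only an illustrative software model under simple assumptions (single-bit signals, rising-edge triggering); the names `xor_gate` and `DFlipFlop` are hypothetical and do not come from the S emulation system.

```python
def xor_gate(a, b):
    """Combinational: the output depends only on the current inputs."""
    return a ^ b


class DFlipFlop:
    """Register: the stored value changes only on a rising clock edge."""

    def __init__(self):
        self.q = 0            # stored state
        self._prev_clk = 0    # previous clock level, used for edge detection

    def tick(self, clk, d):
        # Capture d only when the clock goes from 0 to 1.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q


ff = DFlipFlop()
outputs = [ff.tick(clk, d) for clk, d in [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]]
```

The flip-flop ignores changes on `d` while the clock stays high or low, whereas `xor_gate` responds to every input change.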
[0094]
A clock component is a component that delivers periodic signals to a logic device, thereby controlling the behavior of the logic device. Normally, the clock signal controls the updating of the register. The primary clock is generated from a self-timed test-bench process. For example, a typical test bench process for clock generation in Verilog is as follows:
always begin
    Clock = 0;  // drive the clock low
    #5;         // wait 5 time units
    Clock = 1;  // drive the clock high
    #5;         // wait 5 time units
end
According to this code, the clock signal is initially a logic “0”. After 5 time units, the clock signal changes to logic “1”. After another 5 time units, the clock signal returns to logic “0”. Typically, the primary clock signals are generated in software, and only a few (i.e., 1-10) primary clocks are present in a typical user circuit design. Derived or gated clocks are generated from a network of combinational logic and registers that is in turn driven by the primary clocks. A large number (i.e., over 1,000) of derived clocks exist in a typical user circuit design.
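To make the timing of the Verilog process above concrete, the following Python sketch lists the transitions that such a free-running clock would produce. It is a plain software model, not simulator code; the function name `primary_clock` and its parameters are hypothetical.

```python
def primary_clock(half_period=5, total_time=20):
    """Return (time, value) pairs for each clock transition before total_time."""
    events = []
    time, value = 0, 0          # the clock starts at logic 0
    while time < total_time:
        events.append((time, value))
        time += half_period     # corresponds to the "#5" delay
        value ^= 1              # toggle between logic 0 and logic 1
    return events


waveform = primary_clock()
```

With a half period of 5 time units, the clock toggles at times 0, 5, 10, and 15, giving a full period of 10 time units.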
[0095]
A memory component is a block storage component with address and control lines for accessing individual data at a particular memory location. Examples include ROM, asynchronous RAM, and synchronous RAM.
[0096]
A test bench component is a software process used to control and monitor the simulation process. Therefore, these components are not part of the hardware circuit design under test. Test bench components control the simulation by generating clock signals, initializing simulation data, and reading simulation test vector patterns from disk/memory. Test bench components also monitor the simulation by checking for value changes, performing value change dumps, checking asserted constraints on signal value relationships, writing output test vectors to disk/memory, and interfacing with various waveform viewers and debuggers.
[0097]
The S emulation system performs component type analysis as follows. The system examines the binary source design database. Based on the source design database, the system can characterize or classify each element as one of the above component types. Continuous assignment statements are classified as combinational components. A primitive gate is either a combinational type or a latch form of the register type, according to the language definition. Initialization code is treated as a test bench of the initialization type.
[0098]
An always process that drives nets without using them is a test bench of the driver type. An always process that reads nets without driving them is a test bench of the monitor type. An always process with delay controls or multiple event controls is a test bench of the general type.
[0099]
An always process with a single event control that drives a single net may be one of the following: (1) if the event control is an edge-triggered event, the process is a register component of the edge-triggered type; (2) if the net driven in the process is not defined in all possible execution paths, the net is a register of the latch type; (3) if the net driven in the process is defined in all possible execution paths, the net is a combinational component.
[0100]
An always process with a single event control that drives multiple nets can be decomposed into several processes, each driving one net, so that the component type of each can be derived individually. The decomposed processes can then be used to determine the component types.
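The classification rules in the preceding paragraphs can be sketched as a small decision procedure. This is a minimal Python sketch, not the S emulation implementation: the `Process` record, its field names, the order in which the rules are tested, and the returned labels are all assumptions made for illustration, and the decomposition step simply copies the parent's properties to each single-net process.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Process:
    """Hypothetical summary of one HDL process from the source design database."""
    drives: frozenset                  # nets assigned by the process
    reads: frozenset                   # nets read by the process
    event_controls: int                # number of event controls
    has_delay: bool = False            # contains a delay control such as "#5"
    edge_triggered: bool = False       # the event control is an edge event
    defined_on_all_paths: bool = True  # driven net assigned on every path


def classify(p):
    """Classify a process; the rule precedence used here is an assumption."""
    if p.has_delay or p.event_controls > 1:
        return "test bench (general)"
    if p.drives and not p.reads:
        return "test bench (driver)"
    if p.reads and not p.drives:
        return "test bench (monitor)"
    if p.event_controls == 1 and len(p.drives) > 1:
        return "needs decomposition"
    if p.event_controls == 1:
        if p.edge_triggered:
            return "register (edge-triggered)"
        if not p.defined_on_all_paths:
            return "register (latch)"
        return "combinational"
    return "unclassified"


def decompose(p):
    """Split a multi-net process into one single-net process per driven net."""
    return [replace(p, drives=frozenset({net})) for net in sorted(p.drives)]


flip_flop = Process(frozenset({"q"}), frozenset({"d"}), 1, edge_triggered=True)
clock_gen = Process(frozenset({"Clock"}), frozenset(), 0, has_delay=True)
latch = Process(frozenset({"q"}), frozenset({"d", "en"}), 1,
                defined_on_all_paths=False)
two_regs = Process(frozenset({"q1", "q2"}), frozenset({"d"}), 1,
                   edge_triggered=True)
```

A multi-net process such as `two_regs` is first passed through `decompose`, and each resulting single-net process is then classified on its own.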
[0101]
Step 304 generates a software model for all HDL components regardless of component type. With the appropriate user interface, the user can simulate the entire circuit design using the complete software model. Test bench processes are used to drive stimulus inputs, apply test vector patterns, control the overall simulation, and monitor the simulation process.
[0102]
Step 305 performs clock analysis. Clock analysis involves two general steps: (1) clock extraction and sequential mapping, and (2) clock network analysis. The clock extraction and sequential mapping step maps the user's register components to the hardware register model of the S emulation system and then extracts the clock signals from the system's hardware register components. The clock network analysis step determines the primary clocks and derived clocks based on the extracted clock signals and separates the gated clock network from the gated data network. A more detailed description is provided in FIG.
[0103]
Step 306 performs residence selection. The system, in conjunction with the user, selects the components for the hardware model. That is, of the universe of possible hardware components that could be implemented in the hardware model of the user's circuit design, some are not modeled in hardware for various reasons. These reasons include component type, hardware resource limitations (i.e., floating point operations and large multiply operations stay in software), simulation and communication overhead (i.e., small bridge logic between test bench processes stays in software, and signals monitored by test bench processes stay in software), and user preferences. For a variety of reasons, including performance and simulation monitoring, the user can force certain components that would otherwise be modeled in hardware to stay in software.
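Residence selection can be pictured as a filter over the classified components. The sketch below is a simplified Python illustration under stated assumptions: component types are plain strings, only type and user preference are considered (resource limits and communication overhead are omitted), and the names `HARDWARE_CANDIDATES` and `select_residence` are hypothetical.

```python
# Component types that are candidates for a hardware model, per the
# component type analysis (derived clocks are modeled in hardware).
HARDWARE_CANDIDATES = {"register", "combinational", "derived_clock"}


def select_residence(components, forced_software=frozenset()):
    """Map each component name to the set of models it resides in.

    components: dict of component name -> component type string.
    forced_software: names the user forces to stay in software.
    """
    residence = {}
    for name, kind in components.items():
        models = {"software"}   # every component gets a software model
        if kind in HARDWARE_CANDIDATES and name not in forced_software:
            models.add("hardware")
        residence[name] = models
    return residence


design = {
    "alu": "combinational",
    "acc": "register",
    "fp_mult": "combinational",   # e.g., a large multiply kept in software
    "ram0": "memory",
    "tb_driver": "test_bench",
}
residence = select_residence(design, forced_software={"fp_mult"})
```

Here `fp_mult` would otherwise qualify for hardware, but the user preference keeps it in software only.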
[0104]
Step 307 maps the selected hardware model to the reconfigurable hardware emulation board. In particular, step 307 takes the netlist and maps the circuit design to specific FPGA chips. This step involves grouping or clustering logic elements together. The system then assigns each group to a unique FPGA chip or assigns several groups to a single FPGA chip. The system may also split a group across different FPGA chips. In general, the system assigns groups to FPGA chips. A more detailed description is provided below with respect to FIG. 6. The system places the hardware model components in the mesh of FPGA chips so as to minimize inter-chip communication overhead. In one embodiment, the array includes a 4 × 4 array of FPGAs, a PCI interface unit, and a software clock control unit. The array of FPGAs implements the portion of the user's hardware circuit design determined in steps 302-306 of this software compilation process. The PCI interface unit allows the reconfigurable hardware emulation model to communicate with the workstation via the PCI bus. The software clock avoids race conditions among the various clock signals to the array of FPGAs. Further, step 307 routes the FPGA chips according to the communication schedule among the hardware models.
[0105]
Step 308 inserts control circuits. These control circuits include I/O address pointers and data bus logic for communicating with the DMA engine to the simulator (described below with reference to FIGS. 11, 12, and 14), and evaluation control logic to control hardware state transitions and wire multiplexing (described below with reference to FIGS. 19 and 20). As is known to those skilled in the art, a direct memory access (DMA) unit provides an additional data channel between a peripheral device and main memory, through which the peripheral can directly access (i.e., read, write) main memory without going through the CPU. The address pointer in each FPGA chip allows data to be moved between the software model and the hardware model within the bus size limitations. The evaluation control logic is essentially a finite state machine that ensures that the clock enable inputs to the registers are asserted before the clock and data inputs enter these registers.
[0106]
Step 309 generates the configuration files for mapping the hardware model to the FPGA chips. In essence, step 309 assigns circuit design components to specific cells or gate level components in each chip. Whereas step 307 determines the mapping of hardware model groups to specific FPGA chips, step 309 takes this mapping result and generates a configuration file for each FPGA chip.
[0107]
Step 310 generates the software kernel code. The kernel is a sequence of software code that controls the entire S emulation system. The kernel cannot be generated before this point because portions of the code require updating and evaluating hardware components. Only after step 309 has the proper mapping to the hardware model and the FPGA chips taken place. A more detailed discussion is provided below with reference to FIG. 5. Compilation ends at step 311.
[0108]
As described above with reference to FIG. 4, the software kernel code is generated in step 310 after the software model and hardware model have been determined. The kernel is the piece of software in the S emulation system that controls the operation of the entire system. The kernel controls both the execution of the software simulation and the execution of the hardware emulation. In addition, because the kernel resides centrally in the software model, the simulator is integrated with the emulator. In contrast to other known co-simulation systems, the S emulation system according to one embodiment of the present invention does not require the simulator to interact with the emulator from the outside. One embodiment of the kernel is the control loop shown in FIG. 5.
[0109]
Referring to FIG. 5, the kernel starts at step 330. Step 331 evaluates the initialization code. The control loop begins at step 332, is bounded by decision step 339, and cycles repeatedly until the system no longer observes any active test bench process, at which point the simulation or emulation session is complete. Step 332 evaluates the active test bench components for the simulation or emulation.
[0110]
Step 333 evaluates the clock components. These clock components arise from the test bench process. Typically, the user dictates what type of clock signal is generated for the simulation system. In one example (described above with respect to component type analysis and reproduced here), the clock component designed by the user in the test bench process is as follows:
[0111]
always begin
    Clock = 0;  // drive the clock low
    #5;         // wait 5 time units
    Clock = 1;  // drive the clock high
    #5;         // wait 5 time units
end
In this clock component example, the user has determined that a logic “0” signal is generated first, and that a logic “1” signal is generated after 5 simulation time units. This clock generation process cycles continuously until stopped by the user. The simulation time is advanced by the kernel.
[0112]
Decision step 334 queries whether any active clock edge is detected; such an edge triggers some type of logic evaluation in the software model and possibly in the hardware model (if emulation is running). The clock signal that the kernel uses to detect active clock edges is the clock signal from the test bench process. If decision step 334 evaluates to “no”, the kernel proceeds to step 337. If decision step 334 evaluates to “yes”, the kernel proceeds to step 335, which updates registers and memory, and then to step 336, which propagates values through the combinational components. Step 336 essentially accounts for the time that combinational logic requires to propagate values through the combinational logic network after the clock signal is asserted. Once the values have propagated through the combinational components and stabilized, the kernel proceeds to step 337.
[0113]
Note that registers and combinational components are also modeled in hardware, so the kernel controls the emulator portion of the S emulation system. In fact, whenever any active clock edge is detected, the kernel may accelerate the evaluation of the hardware model in steps 334 and 335. Thus, unlike the prior art, an S emulation system according to one embodiment of the present invention can accelerate the hardware emulator through the software kernel and on the basis of component type (e.g., register, combinational). In addition, the kernel controls the execution of the software model and the hardware model on a cycle-by-cycle basis. In essence, the emulator hardware model can be characterized as a simulation coprocessor to the general purpose processor that executes the simulation kernel. This coprocessor speeds up the simulation task.
[0114]
Step 337 evaluates the active test bench components. Step 338 advances the simulation time. Step 339 provides the boundary for the control loop that starts at step 332. Step 339 determines whether any test bench process is active. If any test bench process is active, further simulation and/or emulation should be performed and more data should be evaluated. Thus, the kernel loops back to step 332 to evaluate any active test bench components. If no test bench process is active, the simulation and emulation process is complete. Step 340 ends the simulation/emulation process. In short, the kernel is the main control loop that controls the operation of the entire S emulation system. As long as any test bench process is active, the kernel evaluates the active test bench components, evaluates the clock components, detects clock edges in order to update registers and memory and propagate combinational logic data, and advances the simulation time.
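The control loop of steps 330 through 340 can be sketched in Python as follows. This is a toy software-only model under stated assumptions: the "design" is a hypothetical one-register counter with a combinational parity output, the test bench is considered active until a fixed end time, and time advances one unit per iteration. None of these names or choices come from the S emulation system itself.

```python
def run_kernel(end_time=40, half_period=5):
    """Toy kernel loop; returns a trace of (time, clock, count, is_odd)."""
    # Step 331: evaluate initialization code.
    time, prev_clock = 0, 0
    count = 0     # register component (hypothetical counter)
    is_odd = 0    # combinational component derived from the register
    trace = []

    # Step 339 bounds the loop: continue while a test bench process is active.
    while time < end_time:
        # Step 332: evaluate active test bench components (none beyond the
        # clock generator in this toy model).
        # Step 333: evaluate clock components (0 for 5 units, 1 for 5 units).
        clock = (time // half_period) % 2

        # Step 334: detect an active (rising) clock edge.
        if prev_clock == 0 and clock == 1:
            count += 1            # Step 335: update registers and memory.
            is_odd = count % 2    # Step 336: propagate combinational logic.
        prev_clock = clock

        # Step 337: evaluate active test bench components (monitor).
        trace.append((time, clock, count, is_odd))

        # Step 338: advance simulation time.
        time += 1

    # Step 340: end of the simulation/emulation session.
    return trace


trace = run_kernel()
```

In the actual system, steps 335 and 336 are where the kernel could hand evaluation to the hardware model; in this sketch they are ordinary assignments.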
[0115]
FIG. 6 illustrates one embodiment of a method for automatic mapping of a hardware model to a reconfigurable board. A netlist file provides the input to the hardware implementation process. The netlist describes the logic functions and their interconnections. The hardware model/FPGA implementation process includes three independent tasks: mapping, placement, and routing. The tools that perform these tasks are generally referred to as “placement and routing” tools. The design tools used may be Viewlogic Viewdraw (a schematic capture system) with Xilinx Xact placement and routing software, or the Altera MAX + PLUS II system.
[0116]
The mapping task divides the circuit design into logic blocks, I / O blocks, and other FPGA resources. Although some logic functions such as flip-flops and buffers can be mapped directly to the corresponding FPGA resources, other logic functions such as combinatorial logic must be implemented in the logic block using a mapping algorithm. Typically, the user can select a mapping for optimal density or optimal performance.
[0117]
The placement task takes the logic blocks and I/O blocks from the mapping task and assigns them to physical locations within the FPGA array. Current FPGA tools typically use some combination of three techniques: minimum cut, simulated annealing, and general force-directed relaxation (GFDR). In effect, these techniques determine the optimal placement based on various cost functions that depend on, among other variables, the total net length of the interconnections or the delay along a set of critical signal paths. The Xilinx XC4000 series FPGA tools use a variation of the minimum cut technique for initial placement, followed by a variation of the GFDR technique for fine refinement of the placement.
[0118]
The routing task determines the routing paths used to interconnect the various mapped and placed blocks. One such router, called a maze router, finds the shortest path between two points. Because the routing task provides for direct interconnection among the chips, the placement of the circuits with respect to the chips is important.
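The text names the maze router but does not spell out its algorithm. A common textbook form is a breadth-first wavefront search over a grid (often attributed to Lee), and the following Python sketch assumes that form. The grid encoding (`#` for blocked cells) and the function name `maze_route` are hypothetical.

```python
from collections import deque


def maze_route(grid, start, goal):
    """Return a shortest path of (row, col) cells from start to goal, or None.

    grid is a list of equal-length strings; '#' marks a blocked cell.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}       # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None                  # no route exists


grid = ["..#",
        "..#",
        "..."]
path = maze_route(grid, (0, 0), (2, 2))
```

Because breadth-first search expands cells in order of distance, the first time the goal is reached the recovered path has the minimum number of steps.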
[0119]
Initially, the hardware model can be described either as a gate netlist 350 or in RTL 357. RTL level code can be further synthesized into a gate level netlist. During the mapping process, a synthesizer server 360 (such as the Altera MAX + PLUS II programmable logic development tool system and software) may be used to generate output files for mapping purposes. The synthesizer server 360 has the ability to match the user's circuit design components with any standard existing logic elements found in the library 361 (e.g., standard adders or standard multipliers), to generate parameterized and frequently used logic modules 362 (e.g., non-standard multiplexers or non-standard adders), and to synthesize random logic elements 363 (e.g., look-up table based logic that implements a customized logic function). In addition, the synthesizer server removes redundant logic and unused logic. In effect, the output file synthesizes or optimizes the logic required by the user's circuit design.
[0120]
If some or all of the HDL is at the RTL level, the circuit design components exist at a high enough level that the S emulation system can easily model them using S emulation registers or components. If some or all of the HDL is at the gate netlist level, the circuit design components can be more specific to the circuit design, making it more difficult to map the user's circuit design components to S emulation components. Thus, the synthesizer server is able to generate any logic element based on variations of the standard logic elements or on random logic elements that may have no parallel among these variations or the library's standard logic elements.
[0121]
If the circuit design is in the form of a gate netlist, the S emulation system first performs a grouping or clustering operation 351. The hardware model construction is based on the clustering process because the combinational logic and registers are separated from the clock. Thus, logic elements that share a common primary clock or gated clock signal may be better served by being grouped together and placed together on one chip. The clustering algorithm is based on connectivity-driven, hierarchical extraction, and regular structure extraction. If the description is in structured RTL 358, the S emulation system may decompose the function into smaller units, as represented by the logic function decomposition operation 359. If logic synthesis or logic optimization is required at any stage, the synthesizer server 360 is available to convert the circuit design into a more efficient representation based on the user's directives. For clustering operation 351, the link to the synthesizer server is indicated by dotted arrow 364. For structured RTL 358, the link to the synthesizer server 360 is indicated by arrow 365. For logic function decomposition operation 359, the link to the synthesizer server 360 is indicated by arrow 366.
[0122]
Clustering operation 351 groups logic components together in a manner selected on the basis of function and size. The clustering may yield only one cluster for a small circuit design or several clusters for a large circuit design. In either case, the clusters of logic elements are used in subsequent steps to map them to the designated FPGA chips. That is, one cluster is targeted at a particular chip and another cluster is targeted at a different chip, or perhaps at the same chip as the first cluster. Usually, the logic elements of a cluster stay together with the cluster on one chip, but for optimization purposes a cluster may have to be split across more than one chip.
[0123]
After the clusters are formed in clustering operation 351, the system performs placement and routing operations. First, a coarse grain placement operation 352 places the clusters of logic elements on selected FPGA chips. If necessary, the system makes the synthesizer server 360 available to the coarse grain placement operation 352, as indicated by arrow 367. After the coarse grain placement operation, a fine grain placement operation is performed to fine tune the initial placement. The S emulation system uses a cost function based on pin usage requirements, gate usage requirements, and inter-gate hops to determine the optimal placement for both the coarse grain and fine grain placement operations.
[0124]
The determination of how clusters are placed on particular chips is based on placement cost, which is calculated by a cost function f(P, G, D) for two or more circuits (i.e., CKTQ = CKT1, CKT2, ..., CKTN) and their respective positions in the array of FPGA chips. Here, P is generally the pin usage/availability, G is generally the gate usage/availability, and D is the distance or number of “hops” between gates, as defined by the connectivity matrix M (shown in FIG.). The user's circuit design modeled in the hardware model comprises the total combination of the circuits CKTQ. Each cost function is defined such that the calculated placement cost values generally tend to favor: (1) a minimum number of “hops” between any two circuits CKTN-1 and CKTN in the FPGA array, and (2) a placement of circuits CKTN-1 and CKTN in the FPGA array such that pin usage is minimized.
[0125]
In one embodiment, the cost function f(P, G, D) is defined as follows:
[0126]
[Expression 1]

[0127]
This equation can be simplified to the following form:
[0128]
f (P, G, D) = C0 * P + C1 * G + C2 * D
The first term (i.e., C0 * P) generates a first placement cost value based on the number of pins used and the number of pins available. The second term (i.e., C1 * G) generates a second placement cost value based on the number of gates used and the number of gates available. The third term (i.e., C2 * D) generates a third placement cost value based on the number of hops present between the various interconnections among the circuits CKTQ (i.e., CKT1, CKT2, ..., CKTN). The total placement cost value is generated by iteratively adding these three placement cost values. The constants C0, C1, and C2 are weighting constants that selectively skew the total placement cost value generated by this cost function toward the factor or factors (i.e., pin usage, gate usage, or inter-gate hops) that are most important during any iterative placement cost calculation.
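As a worked instance of the simplified cost function, the Python sketch below adds the three weighted terms for one candidate placement. The function name `placement_cost`, the default weights, and the sample values of P, G, and D are illustrative assumptions only.

```python
def placement_cost(p, g, d, c0=1.0, c1=1.0, c2=1.0):
    """Evaluate f(P, G, D) = C0 * P + C1 * G + C2 * D for one placement."""
    pin_cost = c0 * p     # first term: pin usage/availability
    gate_cost = c1 * g    # second term: gate usage/availability
    hop_cost = c2 * d     # third term: inter-gate hops
    return pin_cost + gate_cost + hop_cost


# Example: P = 0.5, G = 0.25, D = 3 hops with weights C0 = 2, C1 = 4, C2 = 1
# gives 2 * 0.5 + 4 * 0.25 + 1 * 3.
cost = placement_cost(p=0.5, g=0.25, d=3, c0=2.0, c1=4.0, c2=1.0)
```

Each term contributes independently, so raising one weight makes the corresponding factor dominate the comparison between candidate placements.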
[0129]
The placement cost is calculated repeatedly as the system selects different relative values for the weighting constants C0, C1, and C2. Thus, in one embodiment, during the coarse grain placement operation, the system selects large values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing pin usage/availability and gate usage/availability is more important than optimizing inter-gate hops in the initial placement of the circuits CKTQ in the array of FPGA chips. In a subsequent iteration, the system selects small values for C0 and C1 relative to C2. In this iteration, the system determines that optimizing inter-gate hops is more important than optimizing pin usage/availability and gate usage/availability.
[0130]
During the fine grain placement operation, the system uses the same cost function. In one embodiment, the iterative steps for selecting C0, C1, and C2 are the same as for the coarse grain operation. In another embodiment, the fine grain placement operation involves having the system select small values for C0 and C1 relative to C2.
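The effect of changing the relative weights between iterations can be seen with two hypothetical candidate placements. In the Python sketch below, the candidate values, the weight settings, and the helper names are assumptions chosen only to show that the preferred placement can change when the emphasis moves from pin and gate usage to inter-gate hops.

```python
def placement_cost(p, g, d, c0, c1, c2):
    """f(P, G, D) = C0 * P + C1 * G + C2 * D."""
    return c0 * p + c1 * g + c2 * d


def best_placement(candidates, c0, c1, c2):
    """Return the name of the candidate with the lowest placement cost."""
    return min(candidates,
               key=lambda name: placement_cost(*candidates[name], c0, c1, c2))


# Hypothetical candidates as (P, G, D) tuples.
candidates = {
    "A": (0.9, 0.8, 2),   # heavy pin and gate usage, few hops
    "B": (0.5, 0.4, 6),   # light pin and gate usage, many hops
}

# Early iteration: C0 and C1 large relative to C2.
coarse_choice = best_placement(candidates, c0=10, c1=10, c2=1)
# Later iteration: C0 and C1 small relative to C2.
later_choice = best_placement(candidates, c0=1, c1=1, c2=10)
```

With the first weight set the lightly loaded placement B wins (15 versus 19), while the second weight set favors the low-hop placement A (21.7 versus 60.9).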
[0131]
These variables and equations are now explained. When deciding whether to place a given circuit CKTQ in FPGA chip x or FPGA chip y (among other FPGA chips), the cost function examines pin usage/availability (P), gate usage/availability (G), and inter-gate hops (D). Based on the cost function variables P, G, and D, the cost function f(P, G, D) generates a placement cost value for placing the circuit CKTQ at a specific location in the FPGA array.
[0132]
Pin usage/availability P also represents I/O capacity. P_used is the number of pins used by the circuits CKTQ in each FPGA chip. P_available is the number of pins available in the FPGA chip. In one embodiment, P_available is 264 (44 pins × 6 interconnects/chip), while in another embodiment, P_available is 265 (44 pins × 6 interconnects/chip + 1 extra pin). However, the specific number of available pins depends on the type of FPGA chip used, the total number of interconnects used per chip, and the number of pins used for each interconnect. Thus, P_available can vary considerably. To evaluate the first term (i.e., C0 * P) of the cost function f(P, G, D), the ratio P_used/P_available is calculated for each FPGA chip. Thus, for a 4 × 4 array of FPGA chips, 16 ratios P_used/P_available are calculated. The more pins used for a given number of available pins, the higher the ratio. Of the 16 calculated ratios, the largest is selected. The first placement cost value is calculated from the first term C0 * P by multiplying the selected maximum ratio P_used/P_available by the weighting constant C0. Because this first term depends on the calculated ratios P_used/P_available and, in particular, on the maximum ratio among those calculated for each FPGA chip, the placement cost value is higher for higher pin usage, all other factors being equal. The system selects the placement that yields the lowest placement cost. The particular placement that yields the lowest maximum ratio P_used/P_available among all the maxima calculated for the various placements is generally considered the optimum placement in the FPGA array, all other factors being equal.
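The maximum-ratio calculation for the pin term can be sketched as follows. This Python example assumes the 4 × 4 array with P_available = 264 from the embodiment above; the per-chip pin counts, the placement names, and the helper functions are hypothetical.

```python
def pin_term(pins_used_per_chip, pins_available=264):
    """Return P: the largest P_used / P_available ratio over all chips."""
    return max(used / pins_available for used in pins_used_per_chip)


def best_by_pin_usage(placements, pins_available=264):
    """Pick the placement whose maximum pin ratio is lowest."""
    return min(placements,
               key=lambda name: pin_term(placements[name], pins_available))


# Two hypothetical placements of the same circuits on 16 chips.
placements = {
    "balanced": [132] * 16,          # every chip uses half of its pins
    "lopsided": [66] * 15 + [264],   # one chip uses all of its pins
}
choice = best_by_pin_usage(placements)
```

Although the lopsided placement uses fewer pins on most chips, its single saturated chip sets the maximum ratio to 1.0, so the balanced placement (maximum ratio 0.5) is preferred when all other factors are equal.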
[0133]
Gate usage/availability G is based on the number of gates allowed by each FPGA chip. In one embodiment, based on the position of the circuit CKTQ in the array, if the number of gates used G_used in each chip exceeds a predetermined threshold, the second placement cost (C1 * G) is assigned a value indicating that the placement is not feasible. Similarly, if the number of gates used in each chip containing the circuit CKTQ is at or below the predetermined threshold, this second term (C1 * G) is assigned a value indicating that the placement is feasible. Therefore, if the system wants to place the circuit CKT1 in a particular chip, and that chip does not have enough gates to accommodate the circuit CKT1, the system can conclude through the cost function that this particular placement is not feasible. In general, a very large value for G (e.g., infinity) ensures that the cost function generates a high placement cost value, indicating that the desired placement of the circuit CKTQ is not feasible and that an alternative placement should be determined.
[0134]
In another embodiment, based on the position of the circuit CKTQ in the array, the ratio G_used/G_available is calculated for each chip, where G_used is the number of gates used by the circuit CKTQ in each FPGA chip and G_available is the number of gates available in each chip. In one embodiment, the system uses FLEX 10K100 chips for the FPGA array. The FLEX 10K100 chip contains approximately 100,000 gates. Therefore, in this embodiment, G_available is equal to 100,000 gates. Thus, for a 4 × 4 array of FPGA chips, 16 ratios G_used/G_available are calculated. The more gates used for a given number of available gates, the higher the ratio. Of the 16 calculated ratios, the ratio that yields the highest number is selected. The second placement cost value is calculated from the second term C1 * G by multiplying the selected maximum ratio G_used/G_available by the weighting constant C1. Because this second term depends on the calculated ratios G_used/G_available and on the particular maximum ratio among the ratios calculated for each FPGA chip, the placement cost value will be higher for higher gate usage, all other factors being equal. The system selects the circuit placement that yields the lowest placement cost. The particular placement that yields the lowest of the maximum ratios G_used/G_available calculated for the various placements is generally considered the optimum placement in the FPGA array, all other factors being equal.
[0135]
In another embodiment, a value is first selected for C1. If the ratio G_used/G_available is greater than “1”, this particular placement is not feasible (i.e., at least one chip does not have enough gates for this particular placement of circuits). As a result, the system sets C1 to a very large number (e.g., infinity), so that the second term C1 * G, and hence the total placement cost value f(P, G, D), is also very large. On the other hand, if the ratio G_used/G_available is less than or equal to “1”, this particular placement is feasible (i.e., each chip has enough gates to support the circuit implementation). As a result, the system does not change C1, so the second term C1 * G resolves to a specific number.
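The pin and gate terms described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the implementation of any embodiment: the function names and the default weights of 1.0 are hypothetical, the three terms are assumed to be summed, and the hop total D is passed in as a precomputed number.

```python
import math


def max_ratio(used, available):
    """Return the largest used/available ratio over all chips in the array."""
    return max(u / a for u, a in zip(used, available))


def placement_cost(pins_used, pins_available, gates_used, gates_available,
                   hops, c0=1.0, c1=1.0, c2=1.0):
    """Hypothetical f(P, G, D), assumed here to be C0*P + C1*G + C2*D."""
    p = max_ratio(pins_used, pins_available)
    g = max_ratio(gates_used, gates_available)
    if g > 1.0:
        # At least one chip lacks enough gates: C1 becomes "infinite",
        # so the whole placement cost is infinite (placement not feasible).
        return math.inf
    return c0 * p + c1 * g + c2 * hops


# Two-chip example with 264 available pins and 100,000 available gates per chip.
feasible = placement_cost([132, 66], [264, 264],
                          [50_000, 20_000], [100_000, 100_000], hops=3)
infeasible = placement_cost([132, 66], [264, 264],
                            [120_000, 20_000], [100_000, 100_000], hops=3)
```

A placement search would evaluate this cost for each candidate placement and keep the candidate with the lowest value.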
[0136]
The third term C2 * D represents the number of hops between all gates that require interconnection. The number of hops in turn depends on the connectivity matrix. The connectivity matrix provides the basis for determining the circuit path between any two gates that require chip-to-chip interconnection. Not all gates require inter-gate connections. Depending on how the user's original circuit design is partitioned and clustered into particular chips, some gates do not require any interconnection, because the logic elements connected to their inputs and outputs are located in the same chip. Other gates, however, do require interconnection, because the logic elements connected to their inputs or outputs are located in different chips.
[0137]
To understand “hops”, refer to the connectivity matrix shown in tabular form in FIG. 7 and illustrated schematically in FIG. Each interconnection between chips, such as interconnection 602 between chip F11 and chip F14, represents 44 pins or 44 wires. In other embodiments, each interconnection represents more than 44 pins. In yet other embodiments, each interconnection represents fewer than 44 pins.
[0138]
In this interconnection scheme, data can pass from one chip to another within two “hops” or “jumps”. Thus, data can pass from chip F11 to chip F12 in one hop via interconnect 601, and data can pass from chip F11 to chip F33 in two hops via either interconnects 600 and 606 or interconnects 603 and 610. These exemplary hops are the shortest path hops between these sets of chips. In some cases, a signal may be routed through various chips such that the number of hops between a gate in one chip and a gate in another chip exceeds the shortest path hops. The only circuit paths that must be examined in determining the number of hops between gates are those that require interconnection.
[0139]
Connectivity is represented by the sum of all hops between gates that require inter-chip interconnection. The shortest path between any two chips can be represented by one or two “hops” using the connectivity matrix of FIGS. However, in a given hardware model implementation, the I/O capacity can limit the number of direct shortest-path connections between any two gates in the array, so that some signals must be routed through longer paths (and therefore more than two hops) to reach their destination. Thus, the number of hops can exceed two for some inter-gate connections. In general, all other factors being equal, a smaller number of hops results in a lower placement cost.
[0140]
The third term (i.e., C2 * D) is expressed as:
[0141]
[Expression 2]

[0142]
The third term is the product of the weighting constant C2 and the summation component (Σ ...). The summation component is essentially the sum of all hops between each gate i and gate j in the user's circuit design that require chip-to-chip interconnection. As mentioned above, not all gates necessarily require inter-chip interconnection. For those gates i and j that do require inter-chip interconnection, the number of hops is determined. The hop counts for all such pairs of gates i and j are then added together.
[0143]
The distance calculation can also be defined as:
[0144]
[Equation 3]

[0145]
Here, M is a connectivity matrix. One embodiment of a connectivity matrix is shown in FIG. This distance is calculated for each inter-gate connection that requires interconnection. Therefore, the connectivity matrix M is examined for each gate i and gate j comparison. More specifically,
[0146]
[Expression 4]

[0147]
A matrix is set up for all the chips in the array, with each chip assigned an identifying number. These identification numbers are placed along the top of the matrix as column headers. Similarly, these identification numbers are placed along the side of the matrix as row headers. The particular entry at the intersection of a row and a column in this matrix provides the direct connection data between the chip identified by the row and the chip identified by the column. For any distance calculation between chip i and chip j, the entry in matrix M_{i,j} is either “1” for a direct connection or “0” for no direct connection. The index k indicates the number of hops needed to interconnect a gate in chip i with a gate in chip j where interconnection is required.
[0148]
First, the connectivity matrix M_{i,j} for k = 1 should be examined. If the entry is “1”, there is a direct connection from the gate in chip i to the selected gate in chip j. Thus, the index or hop count k = 1 is designated as the result for M_{i,j}, and this result is the distance between these two gates. At this point, another inter-gate connection can be examined. However, if the entry is “0”, there is no direct connection.
[0149]
If there is no direct connection, the next k should be examined. The matrix for the new k (i.e., k = 2) can be calculated by multiplying the matrix M_{i,j} by itself. In other words, M² = M * M (where k = 2).
[0150]
This process of multiplying M by itself continues until the calculated entry at the particular row and column for chip i and chip j is “1”, at which point the index k is selected as the number of hops. The operation consists of performing AND operations between entries of the matrices M and then performing an OR operation on the results of the AND operations. If the AND operation between matrix entries m_{i,l} and m_{l,j} results in a logic “1” value, then a connection exists between the selected gate in chip i and the selected gate in chip j through some chip l within hop count k. Otherwise, no connection exists within this particular hop count k and further computation is required. The matrices m_{i,l} and m_{l,j} are the connectivity matrix M as defined for this hardware modeling. For any given gate i and gate j that require interconnection, the row of matrix m_{i,l} containing the FPGA chip for gate i is logically ANDed with the column of matrix m_{l,j} containing the FPGA chip for gate j. The individual ANDed components are ORed together to determine whether the resulting M_{i,j} value for index or hop count k is “1” or “0”. If the result is “1”, a connection exists and the index k is designated as the number of hops. If the result is “0”, no connection exists.
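The hop calculation described above can be sketched in Python. The sketch assumes the connectivity matrix is given as nested lists of 0/1 values; the function names and the four-chip chain used as input are hypothetical and are not the connectivity matrix of the figures.

```python
def bool_matmul(a, b):
    """AND row i of a with column j of b entry by entry, then OR the results."""
    n = len(a)
    return [[int(any(a[i][l] and b[l][j] for l in range(n)))
             for j in range(n)]
            for i in range(n)]


def hop_count(m, i, j):
    """Smallest k for which the (i, j) entry of the k-th power of m is 1.

    Returns 0 when both gates sit in the same chip and None when chip j
    cannot be reached from chip i at all.
    """
    if i == j:
        return 0
    power = m  # k = 1: the connectivity matrix itself
    for k in range(1, len(m) + 1):
        if power[i][j]:
            return k
        power = bool_matmul(power, m)  # move on to the next k
    return None


# Hypothetical array: four chips connected in a chain 0 - 1 - 2 - 3.
M = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
direct = hop_count(M, 0, 1)    # direct connection, one hop
two_hops = hop_count(M, 0, 2)  # found in M * M
far = hop_count(M, 0, 3)       # found in M * M * M
```

Because the first k at which the entry becomes “1” is returned, the result is the shortest-path hop count between the two chips.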
[0151]
The following example illustrates these principles. Referring to FIGS. 35(A) to 35(D), FIG. 35(A) shows a user's circuit design represented as cloud 1090. This circuit design 1090 can be simple or complex. Part of the circuit design 1090 includes an OR gate 1091 and two AND gates 1092 and 1093. The outputs of AND gates 1092 and 1093 are connected to the inputs of OR gate 1091. These gates 1091, 1092, and 1093 can also be connected to other parts of the circuit design 1090.
[0152]
With reference to FIG. 35(B), the components of circuit 1090, including the portion containing the three gates 1091, 1092, and 1093, can be configured and placed in FPGA chips 1094, 1095, and 1096. This particular exemplary array of FPGA chips has the interconnection scheme shown. That is, a set of interconnects 1097 connects chip 1094 and chip 1095, and another set of interconnects 1098 connects chip 1095 and chip 1096. There is no direct interconnection between chip 1094 and chip 1096. When placing components of this circuit design 1090 in the chips, the system uses the pre-designed interconnection scheme to connect circuit paths across different chips.
[0153]
Referring to FIG. 35(C), one possible configuration and placement has OR gate 1091 located in chip 1094, AND gate 1092 located in chip 1095, and AND gate 1093 located in chip 1096. Other parts of the circuit 1090 are not shown for teaching purposes. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because these gates are located in different chips, and the set of interconnects 1097 is used. The number of hops for this interconnection is “1”. The connection between OR gate 1091 and AND gate 1093 also requires interconnection, and the sets of interconnects 1097 and 1098 are used. The number of hops is “2”. For this placement example, the total number of hops is “3”, discounting the contribution from other gates and interconnections in the rest of the circuit 1090 not shown.
[0154]
FIG. 35(D) illustrates another placement example. Here, OR gate 1091 is located in chip 1094, and AND gates 1092 and 1093 are located in chip 1095. Again, other parts of the circuit 1090 are not shown for teaching purposes. The connection between OR gate 1091 and AND gate 1092 requires an interconnection because these gates are located in different chips, and the set of interconnects 1097 is used. The number of hops for this connection is “1”. The connection between OR gate 1091 and AND gate 1093 also requires interconnection, and the set of interconnects 1097 is used. Again, the number of hops is “1”. For this placement example, the total number of hops is “2”, discounting the contribution from other gates and interconnections in the rest of the circuit 1090 not shown. Thus, based on the distance parameter D alone and assuming that all other factors are equal, the cost function calculates a lower cost for the placement example of FIG. 35(D) than for the placement example of FIG. 35(C). However, not all other factors are equal. More likely, the cost function for FIG. 35(D) also depends on the gate usage/availability G. In FIG. 35(D), one more gate is used in chip 1095 than in the same chip in FIG. 35(C). Furthermore, the pin usage/availability P for chip 1095 in the placement example of FIG. 35(C) is greater than the pin usage/availability P for the same chip in the placement example of FIG. 35(D).
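The hop totals for the two placement examples can be checked with a short Python sketch. The chip and gate reference numerals are used as dictionary keys purely for readability, the names are hypothetical, and a breadth-first search stands in for the matrix-multiplication procedure; both yield the shortest hop count.

```python
from collections import deque

# Interconnection scheme of FIG. 35(B): 1094 - 1095 - 1096, no 1094 - 1096 link.
LINKS = {1094: [1095], 1095: [1094, 1096], 1096: [1095]}

# Gate-to-gate connections from FIG. 35(A): each AND output drives OR gate 1091.
NETS = [(1092, 1091), (1093, 1091)]


def hops(src_chip, dst_chip):
    """Shortest number of hops between two chips (breadth-first search)."""
    seen = {src_chip}
    queue = deque([(src_chip, 0)])
    while queue:
        chip, dist = queue.popleft()
        if chip == dst_chip:
            return dist
        for nxt in LINKS[chip]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("chips are not connected")


def total_hops(placement):
    """Sum of hops over all connections (the D parameter for these gates only)."""
    return sum(hops(placement[a], placement[b]) for a, b in NETS)


placement_c = {1091: 1094, 1092: 1095, 1093: 1096}  # FIG. 35(C)
placement_d = {1091: 1094, 1092: 1095, 1093: 1095}  # FIG. 35(D)
d_c = total_hops(placement_c)
d_d = total_hops(placement_d)
```

On the distance parameter alone, the second placement is cheaper; the gate and pin terms would still have to be weighed before choosing between the two.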
[0155]
After coarse grain placement, fine tuning of the placement of the flattened clusters further optimizes the placement result. This fine grain placement operation 353 refines the placement initially selected by the coarse grain placement operation 352. Here, an initial cluster can be split up if doing so improves the optimization. For example, assume that logic elements X and Y are originally part of cluster A and designated for FPGA chip 1. As a result of the fine grain placement operation 353, logic elements X and Y can now be designated as a separate cluster B, or made part of another cluster C, and designated for placement in FPGA chip 2. An FPGA netlist 354 that ties the user's circuit design to specific FPGAs is then generated.
[0156]
The determination of how clusters are split up and placed in particular chips is also based on the placement cost, which is calculated by the cost function f(P, G, D) for the circuit CKTQ. In one embodiment, the cost function used for the fine grain placement process is the same as the cost function used for the coarse grain placement process. The only difference between the two placement processes is the size of the clusters placed, not the process itself. The coarse grain placement process uses larger clusters than the fine grain placement process. In other embodiments, the cost functions for the coarse grain and fine grain placement processes differ from each other, as described above with respect to the selection of the weighting constants C0, C1, and C2.
[0157]
When the placement is completed, a chip-to-chip routing task 355 is executed. A time division multiplexing (TDM) circuit can be used if the number of routing wires connecting circuits located in different chips exceeds the number of pins available in these FPGA chips for inter-circuit routing. For example, if each FPGA chip provides only 44 pins for connecting circuits located in two different FPGA chips, and a particular model implementation requires 45 wires between the chips, a time division multiplexing circuit is implemented in each chip. This particular TDM circuit groups at least two wires together. One embodiment of a TDM circuit is shown in FIGS. 9A, 9B, and 9C and is described below. The routing task can therefore always be completed, because the pinouts can be arranged among the chips in time division multiplexed form.
[0158]
Once the placement and routing of each FPGA is determined, each FPGA can be configured into an optimized, operational circuit, and thus the system generates a “bitstream” configuration file 356. In Altera terminology, the system generates one or more programmer object files (.pof). Other generated files include SRAM object files (.sof), JEDEC files (.jed), hexadecimal (Intel format) files (.hex), and tabular text files (.ttf). The Altera MAX+PLUS II programmer uses the POF, SOF, and JEDEC files to program the FPGA array with an Altera hardware programmer device. Alternatively, one or more raw binary files (.rbf) are generated. The CPU receives the .rbf files and programs the FPGA array via the PCI bus.
[0159]
At this point, the configured hardware is ready for hardware start-up 370. This completes the automatic configuration of the hardware model on the reconfigurable board.
[0160]
Returning to the TDM circuit, which allows a group of pin outputs to be time-division multiplexed together so that only one pin output is actually used: the TDM circuit is essentially a multiplexer with at least two inputs (for the two wires), one output, and a pair of registers configured in a loop as the selector signal. If the S emulation system requires more wires to be grouped together, more inputs and loop registers may be provided. As the selector signal to this TDM circuit, several registers configured in a loop provide the appropriate signals to the multiplexer, so that in one period one of the inputs is selected as the output and in another period another input is selected as the output. The TDM circuit therefore manages with only one output wire between the chips, so that, in this example, the hardware model of the circuit implemented in a particular chip can be achieved with 44 pins instead of 45 pins. Thus, the routing task can always be completed, because the pinouts can be arranged among the chips in time-division multiplexed form.
[0161]
FIG. 9A shows a schematic diagram of the pinout problem. Because this problem calls for a TDM circuit, FIG. 9B provides a TDM circuit for the transmitting side, and FIG. 9C provides a TDM circuit for the receiving side. These figures show only one specific example, in which the S emulation system requires one wire instead of two wires between chips. If more than two wires must be grouped together in a time multiplexed configuration, one skilled in the art can make the appropriate modifications in view of the following teachings.
[0162]
FIG. 9A shows one embodiment of a TDM arrangement in which the S emulation system couples two wires in a TDM configuration. Two chips 990 and 991 are provided. A circuit 960 that is part of the complete user circuit design is modeled and placed in chip 991. A circuit 973 that is part of the complete user circuit design is modeled and placed in chip 990. Several interconnects are provided between circuit 960 and circuit 973, including a group of interconnects 994, interconnect 992, and interconnect 993. In this example, the total number of interconnections is 45. If, in one embodiment, each chip provides at most 44 pins for these interconnects, one embodiment of the present invention time-multiplexes at least two of the interconnects so that they require only one interconnection between chips 990 and 991.
[0163]
In this example, the group of interconnects 994 continues to use 43 pins. For the 44th and last pin, a TDM circuit according to one embodiment of the present invention can be used to couple interconnects 992 and 993 together in a time division multiplexed configuration.
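The pin arithmetic of this example can be written out as a small Python sketch. It assumes the simplest allocation, in which all pins but one carry a wire directly and any excess wires share the last pin; the function name and this allocation policy are hypothetical and only one of several possible groupings.

```python
def plan_pins(wires_needed, pins_available):
    """Return (direct_wires, wires_sharing_last_pin) for one chip-to-chip link."""
    if wires_needed <= pins_available:
        # Every wire gets its own pin; no multiplexing is needed.
        return wires_needed, 0
    direct = pins_available - 1      # pins that each carry one wire
    shared = wires_needed - direct   # wires time-multiplexed onto the last pin
    return direct, shared


direct, shared = plan_pins(wires_needed=45, pins_available=44)
```

With 45 wires and 44 pins this gives 43 directly connected wires and 2 wires sharing the 44th pin, matching the example above.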
[0164]
FIG. 9B shows one embodiment of a TDM circuit. A modeled circuit (or part thereof) 960 in the FPGA chip 991 provides two signals on wires 966 and 967. These wires 966 and 967 are outputs of circuit 960. Normally, these outputs would be connected to circuit 973 modeled in chip 990 (see FIGS. 9A and 9C). However, the availability of only one pin for these two output wires 966 and 967 precludes direct pin-to-pin connections. Because outputs 966 and 967 are transmitted in a single direction to the other chip, appropriate transmitting and receiving TDM circuits must be provided to couple these lines together. One embodiment of a transmitting TDM circuit is shown in FIG. 9B.
[0165]
The transmitting TDM circuit includes AND gates 961 and 962, whose respective outputs 970 and 971 are connected to the inputs of an OR gate 963. The output 972 of the OR gate 963 is the output of the chip, assigned to a pin and connected to the other chip 990. One set of inputs 966 and 967 to AND gates 961 and 962, respectively, is provided by circuit model 960. The other set of inputs 968 and 969 is provided by a looped register scheme that serves as the time division multiplexed selector signal.
[0166]
The looped register scheme includes registers 964 and 965. The output 995 of register 964 is provided to the input of register 965 and to the input 968 of AND gate 961. The output 996 of register 965 is provided to the input of register 964 and to the input 969 of AND gate 962. Registers 964 and 965 are each controlled by a common clock source. At any given moment, only one of the outputs 995 or 996 provides a logic “1”. The other is at logic “0”. Thus, after each clock edge, the logic “1” shifts between output 995 and output 996. This in turn provides a “1” to either AND gate 961 or AND gate 962, “selecting” the signal on either wire 966 or wire 967. Thus, the data on wire 972 originates from circuit 960 on either wire 966 or wire 967.
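The behavior of the transmitting circuit can be modeled cycle by cycle in Python. This is a behavioral sketch, not a netlist: signals are single bits, one list position stands for one clock period, and the initial state in which register 964 holds the logic “1” is an assumption.

```python
def tdm_transmit(samples_966, samples_967):
    """Return the bit on output 972 for each clock period."""
    reg_964, reg_965 = 1, 0  # one-hot loop registers (assumed initial state)
    out_972 = []
    for a, b in zip(samples_966, samples_967):
        out_970 = a & reg_964                # AND gate 961: wire 966 gated by 968
        out_971 = b & reg_965                # AND gate 962: wire 967 gated by 969
        out_972.append(out_970 | out_971)    # OR gate 963 drives the shared pin
        reg_964, reg_965 = reg_965, reg_964  # clock edge: the "1" moves around the loop
    return out_972


# Wire 966 is selected in even periods, wire 967 in odd periods.
stream = tdm_transmit([1, 1, 0, 0], [0, 1, 0, 1])
```

In this run the shared pin carries the value of wire 966 in periods 0 and 2 and the value of wire 967 in periods 1 and 3.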
[0167]
One embodiment of the receiving side of the TDM circuit is shown in FIG. 9C. Signals from circuit 960 (FIGS. 9A and 9B) on wire 966 and wire 967 of chip 991 must be coupled to the appropriate wire 985 or 986 to circuit 973 in FIG. 9C. The time division multiplexed signal from chip 991 enters through wire/pin 978. The receiving TDM circuit couples the signals on wire/pin 978 to the appropriate wires 985 and 986 to circuit 973.
[0168]
The TDM circuit includes input registers 974 and 975. The signal on wire/pin 978 is provided to these input registers 974 and 975 via wires 979 and 980, respectively. The output 985 of input register 974 is provided to the appropriate port in circuit 973. Similarly, the output 986 of input register 975 is provided to the appropriate port in circuit 973. These input registers 974 and 975 are controlled by looped registers 976 and 977.
[0169]
The output 984 of register 976 is connected to the input of register 977 and to the clock input 981 of register 974. The output 983 of register 977 is connected to the input of register 976 and to the clock input 982 of register 975. Registers 976 and 977 are each controlled by a common clock source. At any given moment, only one of the enable inputs 981 or 982 is a logic “1”. The other is at logic “0”. Thus, after each clock edge, the logic “1” shifts between enable input 981 and enable input 982. This in turn “selects” the signal on either wire 979 or wire 980. Accordingly, data on wire 978 from circuit 960 is appropriately coupled to circuit 973 via wire 985 or wire 986.
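The receiving side can be modeled in the same behavioral style. The sketch assumes that the loop registers start with register 976 holding the logic “1”, that the receiver is in phase with the transmitter, and that both input registers reset to 0; none of these details is stated in the text.

```python
def tdm_receive(stream_978):
    """Return (output 985, output 986) after each clock period."""
    reg_976, reg_977 = 1, 0      # one-hot loop registers (assumed initial state)
    out_985 = out_986 = 0        # input registers 974 and 975 (assumed reset value)
    history = []
    for bit in stream_978:
        if reg_976:              # input 981 active: register 974 captures wire 979
            out_985 = bit
        if reg_977:              # input 982 active: register 975 captures wire 980
            out_986 = bit
        history.append((out_985, out_986))
        reg_976, reg_977 = reg_977, reg_976  # clock edge: the "1" moves around the loop
    return history


# Multiplexed stream on pin 978: even periods belong to 985, odd periods to 986.
received = tdm_receive([1, 0, 0, 1])
```

Each input register holds its last captured value while the other register is selected, so circuit 973 always sees a value on both wires 985 and 986.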
[0170]
The address pointer according to one embodiment of the present invention is described in detail below, as briefly described with reference to FIG. To reiterate, several address pointers are placed in each FPGA chip in the hardware model. In general, the primary purpose of implementing address pointers is to enable the system to deliver data between the software model 315 and a specific FPGA chip in the hardware model 325 via the 32-bit PCI bus 328 (see FIG. 10). More specifically, in view of the bandwidth limitations of the 32-bit PCI bus, the primary purpose of the address pointers is to selectively control data delivery between each of the address spaces (i.e., REG, S2H, H2S, and CLK) at the software/hardware boundary and each FPGA chip in the FPGA banks 326a-326d. Even if a 64-bit PCI bus is implemented, these address pointers are still needed to control data delivery. Therefore, the software model has five address spaces (i.e., REG read, REG write, S2H read, H2S write, and CLK write), and each FPGA chip has five address pointers corresponding to these five address spaces. Each FPGA requires these five address pointers because the particular word being processed in the selected address space can reside in any one or more FPGA chips.
[0171]
The FPGA I/O controller 381 selects a specific address space (i.e., REG, S2H, H2S, or CLK) at the software/hardware boundary by using the SPACE index. Once an address space is selected, the address pointer corresponding to the selected address space in each FPGA chip selects the specific word corresponding to the same word in the selected address space. The maximum size of the address spaces at the software/hardware boundary and of the address pointers in each FPGA chip depends on the memory/word capacity of the selected FPGA chip. For example, one embodiment of the present invention uses the Altera FLEX 10K family of FPGA chips. Thus, the estimated maximum size for each address space is: REG, 3000 words; CLK, 1 word; S2H, 10 words; and H2S, 10 words. Each FPGA chip can hold about 100 words.
[0172]
Furthermore, the S emulator system has features that allow the user to start, stop, assert input values, and inspect values at any time during the S emulation process. To provide the flexibility of a simulator, the S emulator must also make all components visible to the user, regardless of whether the internal implementation of a component exists in software or hardware. In software, the combinational components are modeled and their values are calculated during the simulation process. Thus, these values are clearly “visible” for the user to access at any time during the simulation process.
[0173]
However, the combinational component values in the hardware model are not directly “visible”. Although the registers are easily and directly accessible by the software kernel (i.e., read/write), the combinational components are more difficult to determine. In FPGAs, most combinational components are modeled as look-up tables to achieve high gate utilization. As a result, look-up table mapping provides efficient hardware modeling but loses the visibility of most combinational logic signals.
[0174]
Despite these problems with the lack of visibility of the combinational components, the S emulation system can rebuild or regenerate the combinational components for inspection by the user after the hardware acceleration mode. If the user's circuit design has only combinational and register components, the values of all combinational components can be derived from the register components. That is, the combinational components are constructed from, or comprise, registers in various arrangements according to the specific logic function required by the circuit design. The S emulator has a hardware model containing only register and combinational components, and as a result, the S emulator reads all register values from the hardware model and then rebuilds or regenerates all combinational components. Because of the overhead required to perform this regeneration process, combinational component regeneration is not performed all the time. Rather, it is performed only at the user's request. Indeed, one benefit of using the hardware model is to accelerate the S emulation process. Determining the combinational component values at every cycle (or even most cycles) would further reduce the speed of the simulation. In any event, inspecting the register values alone should be sufficient for most simulation analyses.
[0175]
The process of regenerating the combinational component values from the register values assumes that the S emulation system was in hardware acceleration mode or ICE mode. Otherwise, the software simulation already provides the combinational component values to the user. The S emulation system retains the combinational component values and register values that were resident in the software model before the start of hardware acceleration. These values are retained in the software model until they are overwritten by the system. Because the software model already holds the register values and combinational component values from the time immediately before the start of the hardware acceleration run, the combinational component regeneration process involves updating some or all of these values in the software model in response to the updated input register values.
[0176]
The combinational component regeneration process is as follows. First, when requested by the user, the software kernel reads all output values of the hardware register components from the FPGA chips into the REG buffer. This process involves transferring the register values in the FPGA chips to the REG address space via the address pointer chain. Placing the register values that were in the hardware model into the REG buffer (at the software/hardware boundary) allows the software model to access the data for further processing.
[0177]
Second, the software kernel compares the register values before and after the hardware acceleration run. If the register values before the hardware acceleration run are the same as the values after the run, the values of the combinational components have not changed. Instead of expending time and resources to regenerate the combinational components, these values can be read from the software model, which still holds the combinational component values stored immediately before the hardware acceleration run. On the other hand, if one or more register values have changed, one or more combinational components that depend on the changed register values may also have changed value. These combinational components must then be regenerated by the following third step.
[0178]
Third, for registers whose values differ in the comparison of pre-acceleration and post-acceleration values, the software kernel schedules their fanout combinational components in the event queue. Here, the registers that changed value during the acceleration run are those that detected an event. Presumably, the combinational components that depend on these changed register values will produce different values. Regardless of any change in the values of these combinational components, the system ensures that these combinational components evaluate the changed register values in the next step.
[0179]
Fourth, the software kernel then executes a standard event simulation algorithm to propagate the value changes from the registers to all combinational components in the software model. In other words, register values that changed between the pre-acceleration and post-acceleration time interval are propagated downstream to all combinational components that depend on these register values. These combinational components must then evaluate the new register values. In accordance with fanout and propagation principles, second-level combinational components located downstream of the first-level combinational components that depend directly on the changed register values must in turn evaluate the changed data, if any. This process of propagating register values downstream to other components that may be affected continues to the end of the fanout network. Thus, only those combinational components that are located downstream of, and affected by, the changed registers are updated in the software model. Not all combinational components are affected. Thus, if only one register value changed between the pre-acceleration and post-acceleration time interval, and only one combinational component is affected by this register value change, only this combinational component re-evaluates its value in view of the changed register value. Other parts of the modeled circuit are not affected. For such a small change, the combinational component regeneration process completes relatively quickly.
[0180]
Finally, when event propagation ends, the system is ready to operate in any mode. Usually, the user wants to inspect values after a long run. After the combinational component regeneration process, the user can continue with pure software simulation for debugging/testing purposes. In other cases, however, the user may wish to continue hardware acceleration to the next desired point. In still other cases, the user may wish to proceed further to ICE mode.
[0181]
In short, combinational component regeneration involves updating the component values in the software model using the register values. If any register value changed, the changed register value is propagated through the register's fanout network as values are updated. If no register value changed, the values in the software model do not change either, so the system does not need to regenerate the combinational components. Typically, hardware acceleration runs for some time. As a result, many register values may change, affecting many combinational component values located downstream in the fanout networks of the registers whose values changed. In this case, the combinational component regeneration process may be relatively slow. In other cases, only a few register values may have changed after a hardware acceleration run. The fanout networks of the registers whose values changed may be small, so the combinational component regeneration process may be relatively fast.
[0182]
IV. Emulation using target system mode
FIG. 10 illustrates an S emulation system architecture according to one embodiment of the present invention. FIG. 10 also shows the relationship among the software model, hardware model, emulation interface, and target system when the system operates in in-circuit emulation mode. As described above, the S emulation system includes a general purpose microprocessor and a reconfigurable hardware board interconnected by a high speed bus such as a PCI bus. The S emulation system compiles the user's circuit design and generates the emulation hardware configuration data for the process of mapping the hardware model onto the reconfigurable board. The user can then simulate the circuit via the general purpose processor, hardware accelerate the simulation process, emulate the circuit design with the target system via the emulation interface, and later perform post-simulation analysis.
[0183]
Software model 315 and hardware model 325 are determined during the compilation process. In addition, an emulation interface 382 and a target system 387 are provided in the system for in-circuit emulation mode. At the user's discretion, the emulation interface and target system need not be initially connected to the system.
[0184]
The software model 315 includes a kernel 316 that controls the entire system and four address spaces for the software/hardware boundary (REG, S2H, H2S, and CLK). The S emulation system maps the hardware model into four address spaces in main memory according to different component types and control functions. The REG space 317 is designated for the register components. The CLK space 320 is designated for the software clock. The S2H space 318 is designated for the outputs of the software test bench components to the hardware model. The H2S space 319 is designated for the outputs of the hardware model to the software test bench components. These dedicated I/O buffer spaces are mapped into the kernel's main memory space during system initialization.
[0185]
The hardware model includes several banks 326a-326d of FPGA chips and an FPGA I/O controller 327. Each bank (e.g., 326b) includes at least one FPGA chip. In one embodiment, each bank includes four FPGA chips. In a 4 × 4 array of FPGA chips, banks 326b and 326d can be the low banks and banks 326a and 326c can be the high banks. The mapping, placement, and routing of specific hardware-modeled user circuit design elements to specific chips and their interconnections are described with reference to FIG. The interconnection 328 between the software model 315 and the hardware model 325 is a PCI bus system. The FPGA I/O controller 327 includes a PCI interface 380 and a control unit 381, which control the data traffic between the PCI bus and the FPGA chip banks 326a-326d while maintaining the throughput of the PCI bus. Each FPGA chip further includes several address pointers, each corresponding to one of the software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK), which couple data between that address space and each FPGA chip in banks 326a-326d.
[0186]
Communication between the software model 315 and the hardware model 325 occurs via a DMA engine or the address pointers in the hardware model. Alternatively, communication occurs via both the DMA engine and the address pointers in the hardware model. The kernel initiates a DMA transfer, together with an evaluation request, via directly mapped I/O control registers. REG space 317, CLK space 320, S2H space 318, and H2S space 319 use I/O data paths 321, 322, 323, and 324, respectively, for data delivery between the software model 315 and the hardware model 325.
[0187]
Double buffering is required for all primary inputs to the S2H and CLK spaces because these spaces take several clock cycles to complete the update process. Double buffering avoids disturbing the internal state of the hardware model, which could otherwise cause race conditions.
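As a rough illustration of why double buffering helps here, the following Python sketch keeps a staging copy that is filled word by word over several cycles and an active copy that the modeled logic reads. The class name DoubleBufferedSpace and its methods are hypothetical; the text does not describe the buffering mechanism at this level, so this is a generic double-buffering sketch rather than the system's implementation.

```python
class DoubleBufferedSpace:
    """Generic double buffer for a multi-word input space (hypothetical model)."""

    def __init__(self, n_words):
        self.staging = [0] * n_words  # filled one word at a time over several cycles
        self.active = [0] * n_words   # the only copy the modeled logic reads

    def write_word(self, index, value):
        # A partial update touches the staging copy only.
        self.staging[index] = value

    def commit(self):
        # All words become visible together once the update is complete.
        self.active = list(self.staging)


s2h = DoubleBufferedSpace(3)
s2h.write_word(0, 5)
s2h.write_word(1, 7)
visible_mid_update = list(s2h.active)  # logic still sees the old, consistent state
s2h.commit()
visible_after_commit = list(s2h.active)
```

Because the logic never observes a half-written set of inputs, a multi-cycle update cannot expose an intermediate state.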
[0188]
The S2H and CLK spaces are the primary inputs from the kernel to the hardware model. As described above, the hardware model holds substantially all the register components and combinational components of the user's circuit design. In addition, the software clock is modeled in software and provided in the CLK I/O address space to interface with the hardware model. The kernel advances simulation time, looks for active test bench components, and evaluates clock components. When the kernel detects any clock edge, the registers and memories are updated and the values are propagated through the combinational components. Thus, if hardware acceleration mode is selected, any change in value in these spaces triggers the hardware model to change its logic state.
[0189]
For in-circuit emulation mode, the emulation interface 382 is connected to the PCI bus 328 so that it can communicate with the hardware model 325 and the software model 315. During the hardware accelerated simulation mode and the in-circuit emulation mode, the kernel 316 controls both the software model and the hardware model. The emulation interface 382 is also connected to the target system 387 via a cable 390. The emulation interface 382 further includes an interface port 385, an emulation I/O control 386, a target-to-hardware I/O buffer (T2H) 384, and a hardware-to-target I/O buffer (H2T) 383.
[0190]
The target system 387 includes a connector 389, a signal input/signal output interface socket 388, and other modules or chips that are part of the target system 387. For example, the target system 387 can be an EGA video controller, and the user's circuit design can be one particular I/O controller circuit. The user's circuit design of the I/O controller for the EGA video controller is modeled completely in software model 315 and partially in hardware model 325.
[0191]
Furthermore, the kernel 316 of the software model 315 controls the in-circuit emulation mode. Control of the emulation clock still exists in the software via the software clock, gate clock logic, and gate data logic, and setup and hold time problems do not occur during in-circuit emulation mode. Thus, the user can start, stop, single-step, assert a value, and examine the value at any time in the in-circuit emulation process.
[0192]
To accomplish this, all clock nodes between the target system and the hardware model are identified. The clock generator in the target system is disabled, the clock port from the target system is disconnected, or the clock signal from the target system is otherwise prevented from reaching the hardware model. Instead, the clock signal originates from a test bench process or another form of software-generated clock. As a result, the software kernel can detect active clock edges and trigger data evaluation. Thus, in ICE mode, the S emulation system uses the software clock, rather than the target system's clock, to control the hardware model.
[0193]
To simulate the operation of the user's circuit design within the environment of the target system, the primary input (input signal) and output (output signal) signals between the target system 40 and the modeled circuit design are provided to the hardware model 325 for evaluation. This is accomplished through two buffers: the target-to-hardware buffer (T2H) 384 and the hardware-to-target buffer (H2T) 383. The target system 387 uses the T2H buffer 384 to apply input signals to the hardware model 325. The hardware model 325 uses the H2T buffer 383 to deliver output signals to the target system 387. In this in-circuit emulation mode, the hardware model sends and receives I/O signals through the T2H and H2T buffers instead of the S2H and H2S buffers, because the system now uses the target system 387, rather than the test bench processes in the software model 315, to evaluate the data. Since the target system runs at a speed substantially higher than the speed of software simulation, the in-circuit emulation mode also runs at a higher speed. Transmission of these input and output signals occurs on the PCI bus 328.
[0194]
Further, the bus 61 is provided between the emulation interface 382 and the hardware model 325. This bus is similar to the bus 61 of FIG. The bus 61 communicates with the emulation interface 382 and the hardware model 325 via the T2H buffer 384 and the H2T buffer 383.
[0195]
Typically, the target system 387 is not connected to the PCI bus. However, such connections may be feasible if the emulation interface 382 is incorporated into the design of the target system 387. In this setting, the cable 390 does not exist. Signals between the target system 387 and the hardware model 325 pass through the emulation interface.
[0196]
V. Post-emulation analysis mode
The simulation system of the present invention can support value change dump (VCD), a widely used simulator function for post-simulation analysis. In essence, VCD provides a historical record of all inputs to the hardware model and selected register outputs, so that the various inputs and resulting outputs of the simulation process can be reviewed later during post-simulation analysis. To support VCD, the system logs all inputs to the hardware model. For outputs, the system logs all values of the hardware register components at a user-defined logging frequency (e.g., 1/10,000 records/cycle). The logging frequency determines how often the output values are recorded. For a logging frequency of 1/10,000 records/cycle, the output values are recorded once every 10,000 cycles. The higher the logging frequency, the more information is recorded for later post-simulation analysis. The lower the logging frequency, the less information is stored for later post-simulation analysis. Because the selected logging frequency affects the S emulation speed, the user should select it with care. A higher logging frequency reduces the S emulation speed because the system must spend time and resources performing I/O operations to record the output data in memory before further simulation can proceed.
[0197]
For post-simulation analysis, the user selects a particular point from which simulation is desired. If the logging frequency is 1/500 records/cycle, register values are recorded every 500 cycles, at points 0, 500, 1000, 1500, and so on. If, for example, the user wants the results at point 610, the user selects point 500, where register values were recorded, and simulates forward in time until the simulation reaches point 610. During this analysis stage, the analysis speed is the same as the simulation speed, because the user first accesses the data recorded for point 500 and then simulates forward to point 610. Note that at a higher logging frequency, more data are stored for post-simulation analysis. Thus, for a logging frequency of 1/300 records/cycle, data are recorded every 300 cycles, at points 0, 300, 600, 900, and so on. To obtain the results at point 610, the user first selects point 600, where register values were recorded, and then simulates forward to point 610. Note that the system reaches the desired point 610 sooner during post-simulation analysis when the logging frequency is 1/300 rather than 1/500. However, this is not always the case. The particular analysis point, together with the logging frequency, determines how quickly the post-simulation analysis point is reached. For example, the system reaches point 523 sooner when the VCD logging frequency is 1/500 rather than 1/300.
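The arithmetic behind these examples can be written out as a short Python sketch. The function name replay_plan and its parameter names are hypothetical; the sketch only assumes, as the text states, that register values are recorded at every multiple of the logging interval starting from point 0.

```python
def replay_plan(target_point, cycles_per_record):
    """Return (nearest earlier log point, cycles to simulate forward from it)."""
    log_point = (target_point // cycles_per_record) * cycles_per_record
    return log_point, target_point - log_point


plan_610_at_500 = replay_plan(610, 500)  # logging frequency 1/500
plan_610_at_300 = replay_plan(610, 300)  # logging frequency 1/300
plan_523_at_500 = replay_plan(523, 500)
plan_523_at_300 = replay_plan(523, 300)
```

For point 610, the 1/300 log requires simulating only 10 cycles forward from point 600, against 110 cycles from point 500. For point 523, the 1/500 log requires 23 cycles from point 500, against 223 cycles from point 300, which is why the higher logging frequency is not always the faster one.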
[0198]
The user can then perform post-S emulation analysis by running a software simulation with the logged inputs to the hardware model to compute the value change dump of all hardware components. The user can also select any register log point in time and start the value change dump from that log point forward in time. This value change dump method can be linked to any simulation waveform viewer for post-simulation analysis.
[0199]
VI. Hardware implementation scheme
(A. Overview)
The S emulation system implements an array of FPGA chips on a reconfigurable board. Based on the hardware model, the S emulation system partitions, maps, places, and routes each selected portion of the user's circuit design onto the FPGA chips. Thus, for example, 16 chips in a 4 × 4 array can model a large circuit spread across these 16 chips. The interconnect scheme allows each chip to reach any other chip within two “jumps” or links.
[0200]
Each FPGA chip implements an address pointer for each of the I/O address spaces (i.e., REG, CLK, S2H, H2S). All the address pointers associated with a particular address space are chained together. Thus, during data transfer, the word data in each chip are selected sequentially from/to the main FPGA bus and the PCI bus, one word at a time for the selected address space in each chip, and one chip at a time, until the desired word data have been accessed for the selected address space. This sequential selection of word data is accomplished by a propagating word selection signal. The word selection signal travels through the address pointer in a chip, then propagates to the address pointer in the next chip, and continues until the last chip or until the system initializes the address pointers.
[0201]
The FPGA bus system on the reconfigurable board operates at twice the PCI bus bandwidth but at half the PCI bus speed. The FPGA chips are therefore separated into banks to make use of the larger-bandwidth bus. The throughput of this FPGA bus system can keep up with the throughput of the PCI bus system, so no performance is lost by reducing the bus speed. Expansion is possible through larger boards containing more FPGA chips or through piggyback boards that extend the bank length.
[0202]
(B. Address pointer)
FIG. 11 shows one embodiment of the address pointer of the present invention. All I/O operations go through DMA streaming. Because the system has only one bus, the system accesses data sequentially, one word at a time. Thus, one embodiment of the address pointer uses a shift register chain to sequentially access the selected words in these address spaces. Address pointer 400 includes flip-flops 401-405, an AND gate 406, and a pair of control signals, initialization 407 and move 408.
[0203]
Each address pointer has n outputs (W0, W1, W2, ..., Wn-1) for selecting a word out of n possible words in each FPGA chip corresponding to the same word in the selected address space. Depending on the particular user circuit design being modeled, the number of words n may vary from circuit design to circuit design, and, for a given circuit design, n may vary from FPGA chip to FPGA chip. In FIG. 11, the address pointer 400 is only a 5-word (i.e., n = 5) address pointer. Thus, the particular FPGA chip that contains this 5-word address pointer for a particular address space has only 5 words to select. Needless to say, the address pointer 400 can implement any number of words n. The output signals Wn are also called word selection signals. When the word selection signal reaches the output of the last flip-flop in the address pointer, it is called the OUT signal, which is propagated to the input of the address pointer of the next FPGA chip.
[0204]
When the initialization signal is asserted, the address pointer is initialized. The first flip-flop 401 is set to “1”, and all the other flip-flops 402-405 are set to “0”. At this point, the initialization of the address pointer does not enable any word selection. That is, after initialization, all the Wn outputs remain at “0”. The address pointer initialization procedure is described with reference to FIG.
[0205]
The move signal controls the advance of the pointer for word selection. This move signal is derived from the read, write, and spatial index control signals from the FPGA I/O controller. Since every operation is essentially a read or a write, the spatial index signal essentially determines which address pointer receives the move signal. Thus, the system activates only one address pointer, associated with the selected I/O address space, at a time, and during that time the system applies the move signal only to that address pointer. The generation of the move signal is described further with reference to FIG. Referring to FIG. 11, when the move signal is asserted, it is supplied to an input of AND gate 406 and to the enable inputs of flip-flops 401-405. Hence, a logic “1” moves from word output Wi to Wi+1 every system clock cycle. That is, the pointer moves from Wi to Wi+1 to select a particular word every cycle. When the shifting word selection signal makes its way to the output 413 of the last flip-flop 405 (labeled herein as “OUT”), this OUT signal is directed to the next FPGA chip via the multiplexed cross-chip address pointer chain, as described with reference to FIG. 14 and FIG. 15, unless the address pointer is reinitialized.
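The behavior of a single address pointer can be approximated in Python as follows. This is a behavioral sketch only, not a gate-level model of FIG. 11: the class AddressPointer and its methods are hypothetical, and it is an assumption of this sketch that the first word W0 becomes selected on the first clock cycle in which the move signal is asserted after initialization.

```python
class AddressPointer:
    """Behavioral model of an n-word address pointer (hypothetical, not gate-level)."""

    def __init__(self, n_words):
        self.n_words = n_words
        self.position = -1  # -1: initialized, no word selected yet

    def initialize(self):
        self.position = -1

    def clock(self, move):
        # The selection advances one word per system clock, only while move is asserted.
        if move and self.position < self.n_words:
            self.position += 1

    def word_select(self):
        # One-hot outputs W0..Wn-1; all zero right after initialization.
        return [int(i == self.position) for i in range(self.n_words)]

    def out(self):
        # OUT is taken from the last word output, W(n-1).
        return self.word_select()[-1]


pointer = AddressPointer(5)
pointer.initialize()
after_init = pointer.word_select()
pointer.clock(move=1)
first_word = pointer.word_select()
pointer.clock(move=0)  # move deasserted: the pointer holds its position
held = pointer.word_select()
for _ in range(4):
    pointer.clock(move=1)
last_word = pointer.word_select()
```

After the final shift the selection sits on the last word and out() returns 1, which is the condition under which the selection is handed on to the next chip.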
[0206]
The address pointer initialization procedure is now described. FIG. 12 is a state transition diagram of the address pointer initialization for the address pointer of FIG. 11. Initially, state 460 is idle. When DATA_XSFR is set to “1”, the system proceeds to state 461, where the address pointer is initialized. Here, the initialization signal is asserted. The first flip-flop in each address pointer is set to “1”, and all other flip-flops in the address pointer are set to “0”. At this point, the initialization of the address pointer does not enable any word selection. That is, all the Wn outputs remain at “0”. While DATA_XSFR remains “1”, the next state is the wait state 462. When DATA_XSFR is “0”, the address pointer initialization procedure ends and the system returns to the idle state 460.
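The state transitions just described can be sketched as a small Python state machine. The names State, next_state, and initialize_asserted are hypothetical. The text does not say what happens if DATA_XSFR drops to “0” while still in state 461, so returning to idle in that case is an assumption of this sketch.

```python
from enum import Enum

class State(Enum):
    IDLE = 460
    INITIALIZE = 461
    WAIT = 462

def next_state(state, data_xsfr):
    if not data_xsfr:
        # Assumption: DATA_XSFR == 0 returns to idle from any state.
        return State.IDLE
    if state is State.IDLE:
        return State.INITIALIZE
    # DATA_XSFR stays at 1: move to, or remain in, the wait state.
    return State.WAIT

def initialize_asserted(state):
    # The initialization signal is asserted only in state 461.
    return state is State.INITIALIZE


state = State.IDLE
trace = []
for data_xsfr in [1, 1, 1, 0]:
    state = next_state(state, data_xsfr)
    trace.append(state)
```

The trace for the input sequence 1, 1, 1, 0 passes through the initialize state once, waits while the transfer continues, and then returns to idle.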
[0207]
The move signal generator, which generates the various move signals for the address pointers, is now described. The spatial index, generated by the FPGA I/O controller (item 327 in FIG. 10, FIG. 22), selects a particular address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within this address space, the system of the present invention sequentially selects the particular words to be accessed. This sequential word selection is accomplished in each address pointer by the move signal.
[0208]
One embodiment of the move signal generator is shown in FIG. Each FPGA chip 450 has address pointers corresponding to the various software/hardware boundary address spaces (i.e., REG, S2H, H2S, and CLK). In addition to the address pointers and the user circuit design modeled and implemented in the FPGA chip 450, the move signal generator 470 is provided in the FPGA chip 450. The move signal generator 470 includes an address space decoder 451 and several AND gates 452-456. The input signals are the FPGA read signal (F_RD) on wire line 457, the FPGA write signal (F_WR) on wire line 458, and the address space signal 459. Depending on which address space's address pointer is applicable, the output move signal for each address pointer corresponds to REGR-move on wire line 464, REGW-move on wire line 465, S2H-move on wire line 466, H2S-move on wire line 467, or CLK-move on wire line 468. These output signals correspond to the move signal on wire line 408 (FIG. 11).
[0209]
Address space decoder 451 receives a 3-bit input signal 459. Alternatively, it may receive a 2-bit input signal. A 2-bit signal provides four possible address spaces, while a 3-bit input provides eight possible address spaces. In one embodiment, CLK is assigned to “00”, S2H is assigned to “01”, H2S is assigned to “10”, and REG is assigned to “11”. Depending on the input signal 459, the address space decoder outputs a “1” on one of the wire lines 460-463, corresponding to REG, H2S, S2H, and CLK, respectively, while the remaining wire lines are set to “0”. Thus, if any of these output wire lines 460-463 is “0”, the corresponding outputs of AND gates 452-456 are “0”. Similarly, if any of these wire lines 460-463 is “1”, the corresponding outputs of AND gates 452-456 are “1”, provided the matching read or write signal is asserted. For example, when the address space signal 459 is “10”, the address space H2S is selected. Wire line 461 is “1” while the remaining wire lines 460, 462, and 463 are “0”. Accordingly, wire line 466 is “1” while the remaining wire lines 464, 465, 467, and 468 are “0”. Similarly, if wire line 460 is “1”, the REG space is selected and, depending on whether a read (F_RD) or write (F_WR) operation is selected, either the REGR move signal on wire line 464 or the REGW move signal on wire line 465 becomes “1”.
[0210]
As described above, the spatial index is generated by the FPGA I/O controller. In code, the move controls are:
REG space read pointer: REGR-move = (SPACE-index == #REG) & READ;
REG space write pointer: REGW-move = (SPACE-index == #REG) & WRITE;
S2H space read pointer: S2H-move = (SPACE-index == #S2H) & READ;
H2S space write pointer: H2S-move = (SPACE-index == #H2S) & WRITE;
CLK space write pointer: CLK-move = (SPACE-index == #CLK) & WRITE;
This code is equivalent to the logic diagram of the move signal generator in FIG.
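For readers who want to experiment with these equations, they can be transcribed into Python roughly as follows. The 2-bit space codes come from the embodiment described above (CLK “00”, S2H “01”, H2S “10”, REG “11”); the function name move_signals and the dictionary layout are hypothetical choices made for this sketch.

```python
# 2-bit address space codes from the embodiment described in the text.
SPACE_CODES = {"CLK": 0b00, "S2H": 0b01, "H2S": 0b10, "REG": 0b11}

def move_signals(space_index, read, write):
    """Return the five move signals for one spatial index and read/write pair."""
    # One-hot decode of the spatial index (the role of the address space decoder).
    selected = {name: int(space_index == code) for name, code in SPACE_CODES.items()}
    return {
        "REGR": selected["REG"] & read,
        "REGW": selected["REG"] & write,
        "S2H": selected["S2H"] & read,
        "H2S": selected["H2S"] & write,
        "CLK": selected["CLK"] & write,
    }


h2s_write = move_signals(0b10, read=0, write=1)
reg_read = move_signals(0b11, read=1, write=0)
s2h_write = move_signals(0b01, read=0, write=1)  # S2H responds to reads only
```

At most one move signal is active for any combination of spatial index and operation, which matches the statement that only one address pointer is driven at a time.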
[0211]
As described above, each FPGA chip has as many address pointers as there are address spaces at the software/hardware boundary. If the software/hardware boundary has four address spaces (i.e., REG, S2H, H2S, and CLK), each FPGA chip has four address pointers corresponding to these four address spaces. Each FPGA chip needs these four address pointers because the particular selected word being processed in the selected address space may reside in any one or more of the FPGA chips, or because the data in the selected address space affect various circuit elements modeled and implemented in each FPGA chip. To ensure that the selected word is processed by the appropriate circuit element(s) in the appropriate FPGA chip(s), each set of address pointers associated with a given software/hardware boundary address space (i.e., REG, S2H, H2S, and CLK) is “chained” together across several FPGA chips. In this “chained” embodiment, the particular shifting or propagating word selection mechanism via the move signal, as described above with reference to FIG. 11, is still used, except that the address pointer associated with a particular address space in one FPGA chip is “chained” to the address pointer associated with the same address space in the next FPGA chip.
[0212]
Implementing four input pins and four output pins to chain the address pointers would accomplish the same purpose. However, this implementation would be too costly in terms of efficient use of resources; that is, four wires would be needed between two chips, and four input pins and four output pins would be needed on each chip. One embodiment of the system of the present invention uses a multiplexed cross-chip address pointer chain, which allows the hardware model to use only one wire between chips and only one input pin and one output pin on each chip (two I/O pins per chip). One embodiment of the multiplexed cross-chip address pointer chain is shown in FIG.
[0213]
In the embodiment shown in FIG. 14, the user's circuit design is mapped and partitioned into three FPGA chips 415-417 on the reconfigurable hardware board 470. The address pointers are shown as blocks 421-432. The number of words Wn (and hence the number of flip-flops) in each address pointer may vary depending on how many words are implemented in each chip for the user's custom circuit design; in all other respects, each address pointer (e.g., address pointer 427) has the same structure and function as the address pointer shown in FIG.
[0214]
For the REGR address space, the FPGA chip 415 has an address pointer 421, the FPGA chip 416 has an address pointer 425, and the FPGA chip 417 has an address pointer 429. For the REGW address space, the FPGA chip 415 has an address pointer 422, the FPGA chip 416 has an address pointer 426, and the FPGA chip 417 has an address pointer 430. For the S2H address space, the FPGA chip 415 has an address pointer 423, the FPGA chip 416 has an address pointer 427, and the FPGA chip 417 has an address pointer 431. For the H2S address space, the FPGA chip 415 has an address pointer 424, the FPGA chip 416 has an address pointer 428, and the FPGA chip 417 has an address pointer 432.
[0215]
Each chip 415-417 has a multiplexer 418-420, respectively. Note that these multiplexers 418-420 are models; as is known, the actual implementation may be a combination of registers and logic elements. For example, the multiplexer may be several AND gates feeding an OR gate, as shown in FIG. Multiplexer 487 includes four AND gates 481-484 and one OR gate 485. The inputs to multiplexer 487 are the out and move signals from each address pointer in the chip. The output 486 of multiplexer 487 is the chain-out signal, which is passed to the input of the next FPGA chip.
[0216]
In FIG. 15, this particular FPGA chip has four address pointers 475-478 corresponding to the I/O address spaces. The outputs of the address pointers (the out and move signals) are the inputs to multiplexer 487. For example, address pointer 475 has an out signal on wire line 479 and a move signal on wire line 480. These signals are the inputs to AND gate 481. The output of AND gate 481 is an input to OR gate 485. The output of OR gate 485 is the output of multiplexer 487. In operation, the out signal at the output of each address pointer 475-478, in combination with its corresponding move signal and the spatial index, functions as a selector signal for multiplexer 487. That is, both the out signal and the move signal (derived from the spatial index signal) must be asserted active (e.g., logic “1”) for the word selection signal to be propagated out of the multiplexer onto the chain-out wire line. The move signal is asserted periodically to move the word selection signal through the flip-flops in the address pointer, so it may be characterized as the input MUX data signal.
[0217]
Returning to FIG. 14, these multiplexers 418-420 have four sets of inputs and one output. Each set of inputs includes (1) the out signal found on the last output Wn-1 wire line (e.g., wire line 413 in the address pointer shown in FIG. 11) of the address pointer associated with a particular address space, and (2) the move signal. The output of each multiplexer 418-420 is the chain-out signal. The word selection signal Wn shifting through the flip-flops in each address pointer becomes the out signal when it reaches the output of the last flip-flop in the address pointer. The chain-out signal on wire lines 433-435 becomes “1” only when both the out signal and the move signal associated with the same address pointer are asserted active (e.g., asserted to “1”).
[0218]
The inputs to the multiplexer 418 are movement signals 436 to 439 and out signals 440 to 443 corresponding to the out signals and movement signals from the address pointers 421 to 424, respectively. For multiplexer 419, the inputs are move signals 444-447 and out signals 452-455 corresponding to the out and move signals from address pointers 425-428, respectively. For multiplexer 420, the inputs are move signals 448-451 and out signals 456-459 that correspond to the out and move signals from address pointers 429-432, respectively.
[0219]
In operation, for any given shift of word Wn, only the address pointers, or chain of address pointers, associated with the selected I/O address space at the software/hardware boundary are active. Thus, in FIG. 14, only the address pointers in chips 415, 416, and 417 associated with one of the address spaces REGR, REGW, S2H, or H2S are active for a given shift. Also, for a given shift of the word selection signal Wn through the flip-flops, the selected words are accessed sequentially because of the limited bus bandwidth. In one embodiment, the bus is 32 bits wide and a word is 32 bits, so only one word can be accessed at a time and delivered to the appropriate resource.
[0220]
While an address pointer is in the midst of propagating or shifting the word selection signal through its flip-flops, the output chain-out signal is not activated (e.g., not “1”), and thus the chip's multiplexer is not yet ready to propagate the word selection signal to the next FPGA chip. When the out signal is asserted active (e.g., “1”), the chain-out signal is asserted active (e.g., “1”), indicating that the system is ready to propagate or shift the word selection signal to the next FPGA chip. Thus, access occurs on one chip at a time; that is, the word selection signal is shifted through the flip-flops in one chip before the word selection shift operation is performed on another chip. Indeed, the chain-out signal is asserted only when the word selection signal reaches the end of the address pointer in each chip. In code, the chain-out signal is:
Chain-out = (REGR-move & REGR-out) | (REGW-move & REGW-out) | (S2H-move & S2H-out) | (H2S-move & H2S-out) | (CLK-move & CLK-out);
In short, for X I/O address spaces (i.e., REG, H2S, S2H, CLK) in the system, each FPGA chip has X address pointers, one for each address space. The size of each address pointer depends on the number of words required to model the user's custom circuit design in each FPGA chip. Assuming n words for a particular FPGA chip, and hence n words for the address pointer, this particular address pointer has n outputs (i.e., W0, W1, W2, ..., Wn-1). These outputs Wi are also called word selection signals. When a particular word Wi is selected, the Wi signal is asserted active (i.e., “1”). The word selection signal shifts or propagates through the address pointer of the chip until it reaches the end of the address pointer in that chip, at which point it triggers the generation of a chain-out signal that starts the propagation of the word selection signal Wi through the address pointer in the next chip. In this way, a chain of address pointers associated with a given I/O address space can be implemented across all the FPGA chips on this reconfigurable hardware board.
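The chaining behavior summarized above can be illustrated with a short Python sketch. The functions chain_out and chained_access_order, and the per-chip word counts used in the example, are hypothetical; the sketch abstracts away the flip-flops and simply lists the order in which words are selected for one address space while its move signal is asserted.

```python
def chain_out(move, out):
    """OR over all address spaces of (move AND out), as in the chain-out equation."""
    return int(any(move[space] & out[space] for space in move))

def chained_access_order(words_per_chip):
    """List (chip, word, chain_out) in the order the word selection signal visits them."""
    order = []
    for chip, n_words in enumerate(words_per_chip):
        for word in range(n_words):
            # Chain-out is asserted only when the selection reaches the last word of the chip.
            order.append((chip, word, word == n_words - 1))
    return order


# Hypothetical partition: three chips holding 2, 1, and 3 words of one address space.
order = chained_access_order([2, 1, 3])
active = chain_out({"REGR": 1, "S2H": 0}, {"REGR": 1, "S2H": 1})
inactive = chain_out({"REGR": 0, "S2H": 0}, {"REGR": 1, "S2H": 1})
```

Every word of one chip is visited before the first word of the next, which is the one-chip-at-a-time access pattern described in the text.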
[0221]
(C. Gate data / clock network analysis)
Various embodiments of the present invention perform clock analysis in conjunction with gate data logic and gate clock logic analysis. Determining the gate clock logic (or clock network) and the gate data network is critical to the successful implementation of the software clock and to logic evaluation in the hardware model during emulation. As described with reference to FIG. 4, clock analysis is performed at step 305. To further elaborate on the clock analysis process, FIG. 16 shows a flow chart according to one embodiment of the present invention. FIG. 16 also shows the gate data analysis.
[0222]
The S emulation system has a complete model of the user's circuit design in software and some portions of the user's circuit design in hardware. These hardware portions include the clock components, especially the derived clocks. Clock delivery timing issues arise because of this boundary between software and hardware. Since the complete model is in software, the software can detect clock edges that affect register values. In addition to the software model of the registers, these registers are physically located in the hardware model. To ensure that the hardware registers also evaluate their respective inputs (i.e., move the data at the D input to the Q output), the software/hardware boundary includes a software clock. The software clock ensures that the registers in the hardware model evaluate correctly. The software clock essentially controls the enable input of the hardware register rather than the clock input to the hardware register component. This software clock avoids race conditions, and accordingly, precise timing control to avoid hold time violations is not needed. The clock network and gate data logic analysis process shown in FIG. 16 provides a way of modeling and implementing the clock and data delivery system to the hardware registers such that race conditions are avoided and a flexible software/hardware boundary implementation is provided.
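The idea that the software clock drives the enable input rather than the clock input can be illustrated with a small Python sketch. The class HardwareRegister and the method system_clock_tick are hypothetical names; the sketch assumes a free-running system clock and a register that copies D to Q only on ticks where the enable input is asserted.

```python
class HardwareRegister:
    """Register whose enable input, not its clock input, is driven by the software clock."""

    def __init__(self):
        self.d = 0
        self.q = 0

    def system_clock_tick(self, enable):
        # Q follows D only on ticks where the software clock has asserted enable.
        if enable:
            self.q = self.d


reg = HardwareRegister()
reg.d = 1
reg.system_clock_tick(enable=0)  # system clock runs, but no software clock edge
q_without_enable = reg.q
reg.system_clock_tick(enable=1)  # software clock edge detected: D moves to Q
q_with_enable = reg.q
```

Because every register shares the same system clock and only the enable differs, the data input can settle before enable is asserted, which is one way to see why the text says precise timing control is not needed.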
[0223]
As described above, the primary clocks are the clock signals from the test bench process. All other clocks, such as clock signals generated by combinational components, are generated clocks or gate clocks. A primary clock may drive both gate clock and gate data signals. In most cases, only a few (e.g., 1-10) generated clocks or gate clocks are present in the user's circuit design. These generated clocks can be implemented as software clocks and remain in software. If a relatively large number (e.g., more than 10) of generated clocks are present in the circuit design, the S emulation system models them in hardware to reduce I/O overhead and maintain performance. Gate data is a data or control input of a register, other than the clock, that is driven by the primary clock through some combinational logic.
[0224]
The gate data/clock analysis process begins at step 500. Step 501 takes the usable source design database code generated from the HDL code and maps the user's register elements to the register components of the S emulation system. This one-to-one mapping of user registers to S emulation registers facilitates the subsequent modeling steps. In some cases, this mapping is necessary to handle user circuit designs that describe register elements with specific primitives. For RTL-level code, S emulation registers can be used fairly readily because RTL-level code is at a high enough level to allow for varying lower-level implementations. For a gate-level netlist, the S emulation system accesses the cell library of components and modifies them to suit the logic elements specific to the particular circuit design.
[0225]
Step 502 extracts the clock signals from the register components of the hardware model. This step allows the system to determine the primary clocks and the generated clocks. It also determines all of the clock signals required by the various components in the circuit design. Information from this step facilitates the software/hardware clock modeling step.
[0226]
Step 503 determines the primary clocks and the generated clocks. A primary clock originates from a test bench component and is modeled only in software. A generated clock is produced by combinational logic, which is in turn driven by a primary clock. By default, the S emulation system of the present invention keeps the generated clocks in software. If the number of generated clocks is small (e.g., fewer than 10), these generated clocks can be modeled as software clocks. The number of combinational components that produce these generated clocks is then small, so keeping these combinational components resident in software adds no significant I/O overhead. However, if the number of generated clocks is large (e.g., greater than 10), these generated clocks can be modeled in hardware to minimize I/O overhead. Sometimes the user's circuit design uses many generated clock components derived from the primary clock. In that case, the system builds the clocks in hardware and keeps the number of software clocks small.
[0227]
Decision step 504 requires the system to determine whether any generated clocks are found in the user's circuit design. If no generated clocks are found, step 504 resolves to “no” and clock analysis ends at step 508, because all clocks in the user's circuit design are then primary clocks, which are simply modeled in software. If generated clocks are found in the user's circuit design, step 504 resolves to “yes” and the algorithm proceeds to step 505.
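Steps 502 through 504 can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function names, the dictionary-based inputs, and the cut-off constant (taken from the "e.g., 10" figure in the text, with exactly 10 treated as "small") are hypothetical and are not part of the described system.

```python
# Hypothetical cut-off; the text only gives "e.g., 10" as a rough figure.
GENERATED_CLOCK_LIMIT = 10


def classify_clocks(register_clock_nets, net_driver_kind):
    """Steps 502-503: collect clock nets and split primary from generated.

    register_clock_nets: register name -> net on its clock pin
    net_driver_kind:     net name -> "testbench" or "combinational"
    """
    clock_nets = sorted(set(register_clock_nets.values()))
    primary = [n for n in clock_nets if net_driver_kind[n] == "testbench"]
    generated = [n for n in clock_nets if net_driver_kind[n] != "testbench"]
    return primary, generated


def plan_clock_modeling(generated, limit=GENERATED_CLOCK_LIMIT):
    """Step 504 plus the default placement rule for generated clocks."""
    if not generated:
        return "done"      # only primary clocks: analysis ends (step 508)
    if len(generated) <= limit:
        return "software"  # few generated clocks: keep them in software
    return "hardware"      # many generated clocks: model them in hardware


registers = {"r1": "clk", "r2": "gclk", "r3": "clk"}
drivers = {"clk": "testbench", "gclk": "combinational"}
primary, generated = classify_clocks(registers, drivers)
print(primary, generated, plan_clock_modeling(generated))
```

In this sketch, a result of "software" or "hardware" means that generated clocks exist, so the analysis would continue with the fan-out and fan-in steps (505 onward).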
[0228]
Step 505 determines the fan-out combinational components from the primary clocks to the generated clocks. In other words, this step traces the clock signal data paths from the primary clocks through the combinational components. Step 506 determines the fan-in combinational components of the generated clocks. In other words, this step traces the clock signal data paths from the combinational components to the generated clocks. The system determines the fan-out and fan-in sets recursively in software. The fan-in set of a net N is determined as follows.
[0229]
[Equation 5]

[0230]
The gate clock or data logic network is determined by recursively determining the fan-in and fan-out sets of net N and then taking their intersection. The ultimate goal here is to determine the so-called fan-in set of net N. Typically, net N is a clock input node when the gate clock logic is determined from the fan-in perspective. When the gate data logic is determined from the fan-in perspective, net N is the clock input node associated with the data input at hand. If the node is on a register, net N is the clock input of that register for the data input associated with that register. The system finds all components that drive net N. For each component X that drives net N, the system determines whether component X is a combinational component. If no component X is a combinational component, then the fan-in set of net N contains no combinational components and net N is a primary clock.
[0231]
However, if at least one component X is a combinational component, the system determines the input nets Y of component X. Here, the system looks further back in the circuit design by finding the input nodes of component X. For each input net Y of each component X, there may be a fan-in set W coupled to net Y. This fan-in set W of net Y is added to the fan-in set of net N, and component X is then added to the fan-in set of net N as well.
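The recursive fan-in determination described in the two preceding paragraphs can be sketched in Python as follows. The netlist representation (a dictionary of components with `kind`, `inputs`, and `outputs` fields), the component names, and the loop guard are assumptions made for illustration; the text does not specify a data structure.

```python
def fan_in_set(net, components, _visited=None):
    """Return the combinational components in the fan-in of `net`.

    components: name -> {"kind": str, "inputs": [nets], "outputs": [nets]}
    An empty result means no combinational component drives the net,
    which the text treats as the primary clock case.
    """
    visited = set() if _visited is None else _visited
    if net in visited:              # loop guard: an added safety assumption
        return set()
    visited.add(net)

    result = set()
    for name, comp in components.items():
        if net not in comp["outputs"]:
            continue                # component X does not drive net N
        if comp["kind"] != "combinational":
            continue                # stop at registers and the test bench
        for input_net in comp["inputs"]:
            # add the fan-in set W of each input net Y to the set of N
            result |= fan_in_set(input_net, components, visited)
        result.add(name)            # then add component X itself
    return result


# Hypothetical example: clk -> and1 -> gclk -> inv1 -> gclk_n -> r1 clock pin
netlist = {
    "tb":   {"kind": "testbench",     "inputs": [],              "outputs": ["clk"]},
    "and1": {"kind": "combinational", "inputs": ["clk", "en"],   "outputs": ["gclk"]},
    "inv1": {"kind": "combinational", "inputs": ["gclk"],        "outputs": ["gclk_n"]},
    "r1":   {"kind": "register",      "inputs": ["d", "gclk_n"], "outputs": ["q"]},
}
print(sorted(fan_in_set("gclk_n", netlist)))
print(sorted(fan_in_set("clk", netlist)))
```

For the clock pin net `gclk_n` the sketch collects both combinational components between the test bench and the register, while for `clk` it returns an empty set.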
[0232]
The fan-out set of net N is determined similarly, as follows.
[0233]
[Equation 6]

[0234]
Again, the gate clock or data logic network is determined by recursively determining the fan-in and fan-out sets of net N and then taking their intersection. The ultimate goal here is to determine the so-called fan-out set of net N. Typically, net N is a clock output node when the gate clock logic is determined from the fan-out perspective. Thus, the set of all logic elements that use net N is determined. When the gate data logic is determined from the fan-out perspective, net N is the clock output node associated with the data output at hand. If the node is on a register, net N is the output of that register for the primary clock-driven input associated with that register. The system finds all components that use net N. For each component X that uses net N, the system determines whether component X is a combinational component. If no component X is a combinational component, then the fan-out set of net N contains no combinational components and net N is a primary clock.
[0235]
However, if at least one component X is a combinational component, the system determines the output nets Y of component X. Here, the system looks further forward from the primary clock in the circuit design by finding the output nodes of component X. For each output net Y of each component X, there may be a fan-out set W coupled to net Y. This fan-out set W of net Y is added to the fan-out set of net N, and component X is then added to the fan-out set of net N as well.
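The fan-out determination mirrors the fan-in case, following output nets instead of input nets. A minimal Python sketch is given below; the netlist dictionary layout, the component names, and the loop guard are again illustrative assumptions rather than anything specified in the text.

```python
def fan_out_set(net, components, _visited=None):
    """Return the combinational components in the fan-out of `net`.

    components: name -> {"kind": str, "inputs": [nets], "outputs": [nets]}
    """
    visited = set() if _visited is None else _visited
    if net in visited:              # loop guard: an added safety assumption
        return set()
    visited.add(net)

    result = set()
    for name, comp in components.items():
        if net not in comp["inputs"]:
            continue                # component X does not use net N
        if comp["kind"] != "combinational":
            continue                # stop at registers
        for output_net in comp["outputs"]:
            # add the fan-out set W of each output net Y to the set of N
            result |= fan_out_set(output_net, components, visited)
        result.add(name)            # then add component X itself
    return result


# Hypothetical example: clk feeds a gated clock path and a data path.
netlist = {
    "tb":   {"kind": "testbench",     "inputs": [],                "outputs": ["clk"]},
    "and1": {"kind": "combinational", "inputs": ["clk", "en"],     "outputs": ["gclk"]},
    "inv1": {"kind": "combinational", "inputs": ["gclk"],          "outputs": ["gclk_n"]},
    "mux1": {"kind": "combinational", "inputs": ["clk", "a", "b"], "outputs": ["d"]},
    "r1":   {"kind": "register",      "inputs": ["d", "gclk_n"],   "outputs": ["q"]},
}
print(sorted(fan_out_set("clk", netlist)))
```

Starting from the primary clock net `clk`, the traversal stops at register `r1` and reports only the combinational components along the way.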
[0236]
Step 507 determines the clock network, or gate clock logic. The clock network is the intersection of the fan-in and fan-out combinational components.
[0237]
Similarly, the same fan-in and fan-out principles can be used to determine the gate data logic. Like the gate clock, gate data is a data or control input of a register (other than the clock) that is driven by the primary clock through some combinational logic. Gate data logic is the intersection of the fan-in of the gate data and the fan-out of the primary clock. Thus, clock analysis and gate data analysis yield a gate clock network (gate clock logic) and gate data logic, each made up of some combinational logic. As described below, determining the gate clock network and the gate data network is important for successfully implementing the software clock and the logic evaluation in the hardware model during emulation. The clock/data network analysis ends at step 508.
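Once the fan-in and fan-out sets are available, the intersection step is a plain set operation. The sketch below assumes the sets have already been computed and uses hypothetical component names; it only illustrates how the same intersection yields either gate clock logic or gate data logic depending on which register pin the fan-in set belongs to.

```python
def gate_logic(fan_in_of_pin, fan_out_of_primary_clock):
    """Combinational components lying between the primary clock and a pin."""
    return fan_in_of_pin & fan_out_of_primary_clock


# Hypothetical, precomputed sets for one register.
fan_out_of_clk = {"and1", "inv1", "mux1"}
fan_in_of_clock_pin = {"and1", "inv1"}
fan_in_of_data_pin = {"mux1", "xor1"}   # xor1 is not driven by the primary clock

gate_clock_logic = gate_logic(fan_in_of_clock_pin, fan_out_of_clk)
gate_data_logic = gate_logic(fan_in_of_data_pin, fan_out_of_clk)
print(sorted(gate_clock_logic), sorted(gate_data_logic))
```

Component `xor1` drops out of the gate data logic because it is in the fan-in of the data pin but not in the fan-out of the primary clock.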
[0238]
FIG. 17 shows the basic building block of the hardware model according to an embodiment of the present invention. For register components, the S emulation system uses a D-type flip-flop with asynchronous load control as the basic block for building both edge-triggered (i.e., flip-flop) and level-sensitive (i.e., latch) register hardware models. This register model building block has the following ports: Q (output state), A_E (asynchronous enable), A_D (asynchronous data), S_E (synchronous enable), S_D (synchronous data), and, of course, System.clk (system clock).
[0239]
The S emulation register model is triggered by the positive edge of the system clock or the positive level of the asynchronous enable (A_E) input. When either of these two trigger events (positive edge or positive level) occurs, the register model looks at the asynchronous enable (A_E) input. If the asynchronous enable (A_E) input is enabled, the output Q takes the value of the asynchronous data (A_D); otherwise, if the synchronous enable (S_E) input is enabled, the output Q takes the value of the synchronous data (S_D). On the other hand, if neither the asynchronous enable (A_E) nor the synchronous enable (S_E) input is enabled, the output Q is not evaluated despite the detection of the positive edge of the system clock. Thus, the inputs to these enable ports control the operation of this basic building block register model.
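The priority between the asynchronous and synchronous ports can be made concrete with a small behavioral model in Python. The class and method names are hypothetical, and the model abstracts the system clock into a boolean "positive edge seen" flag; it is a sketch of the behavior described above, not the hardware implementation.

```python
class RegisterModel:
    """Behavioral sketch of the building-block register of FIG. 17."""

    def __init__(self, q=0):
        self.q = q

    def evaluate(self, a_e, a_d, s_e, s_d, sysclk_posedge):
        """Apply one possible trigger event and return the Q output."""
        triggered = sysclk_posedge or a_e   # clock edge or A_E positive level
        if not triggered:
            return self.q
        if a_e:
            self.q = a_d                    # asynchronous port has priority
        elif s_e:
            self.q = s_d                    # synchronous load on the clock edge
        # neither enable asserted: Q keeps its value despite the clock edge
        return self.q


reg = RegisterModel()
reg.evaluate(a_e=0, a_d=0, s_e=1, s_d=1, sysclk_posedge=False)  # no trigger
reg.evaluate(a_e=0, a_d=0, s_e=1, s_d=1, sysclk_posedge=True)   # Q takes S_D
reg.evaluate(a_e=1, a_d=0, s_e=1, s_d=1, sysclk_posedge=False)  # Q takes A_D
print(reg.q)
```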
[0240]
The system uses the software clock, which is a special enable register, to control the enable inputs of these register models. In a complex user circuit design, millions of elements may be found, so the S emulation system implements millions of elements in the hardware model. Controlling all of these elements individually is expensive, because the overhead of sending millions of control signals to the hardware model takes longer than evaluating these elements in software. However, even such a complex circuit design calls for only a few clocks (1-10), and clocks are sufficient to control the state changes of a system that consists only of registers and combinational components. The hardware model of the S emulation system uses only registers and combinational components. The S emulation system also controls the evaluation of the hardware model through the software clock. In the S emulator system, the hardware models of the registers do not have their clocks connected directly to other hardware components. Rather, the software kernel controls the values of all clocks. By controlling a few clock signals, the kernel has full control over the evaluation of the hardware model with a negligible amount of coprocessor overhead.
[0241]
Depending on whether the register model is used as a latch or as a flip-flop, the software clock is input to either the asynchronous enable (A_E) or the synchronous enable (S_E) wire. The application of the software clock from the software model to the hardware model is triggered by edge detection on the clock components. When the software kernel detects an edge of a clock component, the software kernel sets the clock edge register via the CLK address space. This clock edge register controls the enable input, not the clock input, of the hardware register model. The global system clock still provides the clock input to the hardware register model. The clock edge register, however, provides the software clock signal to the hardware register model via a double buffer interface. As explained below, the double buffer interface from the software clock to the hardware model ensures that all register models are updated synchronously with respect to the global system clock. Thus, the use of the software clock removes the risk of hold time violations.
[0242]
FIGS. 18A and 18B show the implementation of the building block register model for latches and flip-flops. These register models are controlled by the software clock via the appropriate enable inputs. Depending on whether the register model is used as a flip-flop or as a latch, the asynchronous ports (A_E, A_D) and synchronous ports (S_E, S_D) are used for either the software clock or I/O operations. FIG. 18A shows the implementation of the register model when it is used as a latch. A latch is level sensitive; that is, output Q follows input D as long as the clock signal is asserted (e.g., “1”). Here, the software clock signal is supplied to the asynchronous enable (A_E) input, and the data input is supplied to the asynchronous data (A_D) input. For I/O operations, the software kernel uses the synchronous enable (S_E) and synchronous data (S_D) inputs to download values to the Q port. The S_E port is used as the REG space address pointer, and the S_D port is used to access data to and from the local data bus.
[0243]
FIG. 18B shows the implementation of the register model when it is used as a design flip-flop. The design flip-flop uses the following ports to determine the next-state logic: data (D), set (S), reset (R), and enable (E). All of the next-state logic of the design flip-flop is factored into a hardware combinational component that feeds the synchronous data (S_D) input. The software clock is input to the synchronous enable (S_E) input. For I/O operations, the software kernel uses the asynchronous enable (A_E) and asynchronous data (A_D) inputs to download values to the Q port. The A_E port is used as the REG space write address pointer, and the A_D port is used to access data to and from the local data bus.
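The complementary use of the two port pairs in FIGS. 18A and 18B can be summarized as a lookup table. The following Python sketch uses hypothetical role names (`software_clock`, `design_data`, `io_address_pointer`, `io_data`); only the port names A_E, A_D, S_E, and S_D come from the text.

```python
# Port roles for the building-block register, per the latch and
# flip-flop descriptions; the role names are illustrative only.
PORT_ROLES = {
    "latch": {
        "software_clock": "A_E",
        "design_data": "A_D",
        "io_address_pointer": "S_E",
        "io_data": "S_D",
    },
    "flip-flop": {
        "software_clock": "S_E",
        "design_data": "S_D",
        "io_address_pointer": "A_E",
        "io_data": "A_D",
    },
}


def port_roles(kind):
    """Return the port assignment for a register used as `kind`."""
    try:
        return PORT_ROLES[kind]
    except KeyError:
        raise ValueError(f"unknown register kind: {kind!r}") from None


print(port_roles("latch")["software_clock"])
print(port_roles("flip-flop")["software_clock"])
```

Whichever port pair carries the software clock and design data, the other pair remains free for the software kernel's I/O access to the Q port.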
[0244]
The software clock will now be described. One embodiment of the software clock of the present invention is a clock enable signal to the hardware register models, such that the data at the inputs of these hardware register models are evaluated together and in synchronization with the system clock. This eliminates race conditions and hold time violations. One embodiment of the software clock logic includes clock edge detection logic in software. This clock edge detection logic triggers additional logic in hardware upon detecting a clock edge. This enable signal logic generates an enable signal at the enable inputs of the hardware register models before the data reaches these hardware register models. Determining the gate clock network and the gate data network is critical to the successful implementation of the software clock and of logic evaluation in the hardware model during hardware acceleration mode. As described above, the clock network or gate clock logic is the intersection of the fan-in of the gate clock and the fan-out of the primary clock. Similarly, the gate data logic is the intersection of the fan-in of the gate data and the fan-out of the primary clock for the data signal. These fan-in and fan-out concepts will be described with reference to FIG.
[0245]
As described above, the primary clock is generated by the software test bench process. A generated clock or gate clock is produced by a network of combinational logic that is in turn driven by the primary clock. By default, the S emulation system of the present invention keeps the generated clocks in software. If the number of generated clocks is small (e.g., fewer than 10), these generated clocks can be modeled as software clocks. The number of combinational components that produce these generated clocks is then small, so modeling these combinational components in software adds no significant I/O overhead. However, if the number of generated clocks is large (e.g., greater than 10), these generated clocks and their combinational components can be modeled in hardware to minimize I/O overhead.
[0246]
Ultimately, according to one embodiment of the present invention, clock edge detection occurring in software (via the input to the primary clock) can be translated into clock edge detection in hardware (via the input to the clock edge register). Clock edge detection in software triggers an event in hardware such that the registers in the hardware model receive the clock enable signal before the data signal. This ensures that the evaluation of the data signal occurs in synchronization with the system clock and that hold time violations are avoided.
[0247]
As mentioned above, the S emulation system has a complete model of the user's circuit design in software and some parts of the user's circuit design in hardware. As specified in the kernel, the software can detect clock edges that affect hardware register values. In addition, the software/hardware boundary includes a software clock to ensure that the hardware registers evaluate their respective inputs. The software clock ensures that the registers in the hardware model are evaluated synchronously with the system clock and without hold time violations. The software clock does not control the clock input of the hardware register component; rather, it essentially controls the enable input of the hardware register component. The double buffer approach to implementing the software clock avoids race conditions by evaluating the registers in synchronization with the system clock, and it removes the need for precise timing control to avoid hold time violations.
[0248]
FIG. 19 illustrates one embodiment of a clock implementation system according to the present invention. Initially, the gate clock logic and the gate data logic are determined by the S emulator system as described above with respect to FIG. The gate clock logic and the gate data logic are then separated from each other. When the double buffer is implemented, the driving source and the double-buffered primary logic must also be separated. Accordingly, the gate data logic 513 and the gate clock logic 514 are separated on the basis of the fan-in and fan-out analysis.
[0249]
The modeled primary clock register 510 includes a first buffer 511 and a second buffer 512, both of which are D registers. The primary clock itself is modeled in software, but the double buffer implementation is modeled in both software and hardware. Clock edge detection occurs in the primary clock register 510 in software and triggers the hardware model to generate the software clock signal to the hardware model. Data and addresses enter the first buffer 511 on wirelines 519 and 520, respectively. The Q output of this first buffer 511, on wireline 521, is coupled to the D input of the second buffer 512. The Q output of the first buffer 511 is also provided to the gate clock logic 514 via wireline 522 and ultimately drives the clock input of the first buffer 516 of the clock edge register 515. The Q output of the second buffer 512, on wireline 523, passes through the gate data logic 513 and ultimately drives the input of register 518, via wireline 530, in the user's custom-designed circuit model. The enable input of the second buffer 512 in the primary clock register 510 is the INPUT-EN signal from the state machine via wireline 533. This state machine determines the evaluation cycles and controls the various signals accordingly.
[0250]
The clock edge register 515 also includes a first buffer 516 and a second buffer 517. The clock edge register 515 is implemented in hardware. When clock edge detection occurs in software (via the input to the primary clock register 510), it can trigger the same clock edge detection in hardware (via the clock edge register 515). The D input of the first buffer 516, on wireline 524, is set to logic “1”. The clock signal on wireline 525 is derived from the gate clock logic 514 and ultimately from the primary clock register 510 at the output of the first buffer 511 on wireline 522. This clock signal on wireline 525 is the gate clock signal. The enable wireline 526 of the first buffer 516 carries the ~EVAL signal from the state machine that controls the I/O and evaluation cycles (discussed later). The first buffer 516 also has a RESET signal on wireline 527. This same RESET signal is also provided to the second buffer 517 in the clock edge register 515. The Q output of the first buffer 516 is supplied via wireline 529 to the D input of the second buffer 517. The second buffer 517 has an enable input via wireline 528 for the CLK-EN signal and a reset input via wireline 527. The Q output of the second buffer 517, on wireline 532, is provided to the enable input of register 518 in the user's custom-designed circuit model. Buffers 511, 512, and 517, along with register 518, are clocked by the system clock. Only buffer 516 of the clock edge register 515 is clocked by the gate clock from the gate clock logic 514.
[0251]
Register 518 is a typical D-type register model that is modeled in hardware and is part of the user's custom circuit design. This embodiment of the clock implementation scheme of the present invention strictly controls its evaluation. The ultimate goal of this clock setup is to ensure that the clock enable signal on wireline 532 reaches register 518 before the data signal on wireline 530. As a result, the evaluation of the data signal by this register is synchronized with the system clock and free of race conditions.
[0252]
To repeat, the primary clock register 510 is modeled in software, while the double buffer implementation is modeled in both software and hardware. The clock edge register 515 is implemented in hardware. The gate data logic 513 and the gate clock logic 514 are separated on the basis of the fan-in and fan-out analysis for modeling purposes. The gate data logic 513 and the gate clock logic 514 can be modeled in software (when the number of gate data signals and gate clocks is small) or in hardware (when the number of gate data signals and gate clocks is large). Determining the gate clock network and the gate data network is important for successfully implementing the software clock and the logic evaluation of the hardware model during hardware acceleration mode.
[0253]
The implementation of the software clock depends primarily on the clock setup shown in FIG. 19 together with the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. The primary clock register 510 detects the clock edge and triggers the generation of the software clock for the hardware model. This clock edge detection event triggers the “activation” of the clock edge register 515 through its clock input on wireline 525, via wireline 522 and the gate clock logic 514. The clock edge register 515 thereby detects the same clock edge. In this way, clock edge detection occurring in software (via inputs 519 and 520 to the primary clock register 510) can be translated into clock edge detection in hardware (via input 525 to the clock edge register 515). At this point, the INPUT-EN wireline 533 to the second buffer 512 in the primary clock register 510 and the CLK-EN wireline 528 to the second buffer 517 in the clock edge register 515 are not asserted, so no data is evaluated. The clock edge is thus detected before the data is evaluated in the hardware register model. Note that at this stage, the data from the data bus on wireline 519 has not even propagated to the gate data logic 513 and the hardware-modeled user register 518. Indeed, this data has not even reached the second buffer 512 in the primary clock register 510, because the INPUT-EN signal on wireline 533 has not yet been asserted.
[0254]
During the I/O phase, the ~EVAL signal on wireline 526 is asserted to enable the first buffer 516 in the clock edge register 515. The ~EVAL signal also monitors the gate clock signal as it makes its way through the gate clock logic 514 to the clock input, on wireline 525, of the first buffer 516. Thus, as will be described later with respect to the four-state evaluation state machine, the ~EVAL signal can be held as long as necessary to stabilize the data and clock signals through the portion of the system shown in FIG. 19.
[0255]
When the signals have stabilized, when I/O has concluded, or when the system is otherwise ready to evaluate the data, the ~EVAL signal is deasserted to disable the first buffer 516. The CLK-EN signal is then asserted and applied to the second buffer 517 via wireline 528 to enable the second buffer 517, which passes the logic “1” value on wireline 529 to its Q output on wireline 532 and thus to the enable input of register 518. Register 518 is now enabled, and any data on wireline 530 is clocked into register 518 synchronously with the system clock. As the reader can appreciate, the enable signal reaches register 518 before the data signal to be evaluated by this register 518.
[0256]
The INPUT-EN signal on wireline 533 is now asserted to the second buffer 512. In addition, the RESET signal on wireline 527 is asserted to buffers 516 and 517 of the clock edge register 515 to reset these buffers and ensure that their outputs are logic “0”. Since the INPUT-EN signal is asserted to buffer 512, the data on wireline 521 now propagates to the gate data logic 513 and on to the user circuit register 518 via wireline 530. The enable input of this register 518 is now logic “0”, so the data on wireline 530 cannot be clocked into register 518. The previous data, however, was already clocked in by the previously asserted enable signal on wireline 532, before the RESET signal was asserted and register 518 was disabled. Thus, the input data to register 518, as well as the inputs to the other registers in the user's hardware-modeled circuit design, stabilize at their respective register input ports. When a clock edge is subsequently detected by software, the primary clock register 510 and the clock edge register 515 in hardware activate the enable input of register 518. As a result, the data waiting at the input of register 518, and the other data waiting at the inputs of their respective registers, are clocked in together and synchronously with the system clock.
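The ordering of events in FIG. 19 can be illustrated with a highly simplified behavioral model in Python. Several assumptions are made that are not in the text: the gate clock logic 514 and gate data logic 513 are modeled as plain wires, only rising edges are detected, the primary inputs are reduced to one clock bit and one data bit, and all class and method names are hypothetical. The point of the sketch is only that register 518 is evaluated with the data that was present before the clock edge, even when the test bench changes clock and data at the same time.

```python
class ClockSetup:
    """Behavioral stand-in for the FIG. 19 arrangement (simplified)."""

    def __init__(self):
        self.buf511 = {"clk": 0, "d": 0}  # first buffer of primary clock register 510
        self.buf512 = {"clk": 0, "d": 0}  # second buffer, feeds gate data logic 513
        self.buf516 = 0                   # first buffer of clock edge register 515
        self.buf517 = 0                   # second buffer, drives enable of register 518
        self.reg518 = 0                   # user register in the hardware model

    def gate_data(self):
        # Assumption: gate data logic 513 is a plain wire from the "d" bit.
        return self.buf512["d"]

    def io_phase(self, clk, d):
        """~EVAL asserted: software writes new values into buffer 511."""
        rising = self.buf511["clk"] == 0 and clk == 1
        self.buf511 = {"clk": clk, "d": d}
        if rising:
            # Assumption: gate clock logic 514 is a plain wire, so the edge
            # clocks the constant "1" on the D input into buffer 516.
            self.buf516 = 1

    def clk_en_phase(self):
        """CLK-EN asserted: the enable reaches 518 before any new data."""
        self.buf517 = self.buf516
        if self.buf517:
            self.reg518 = self.gate_data()  # evaluated on the system clock

    def input_en_phase(self):
        """INPUT-EN and RESET asserted: new data moves on, enable is cleared."""
        self.buf512 = dict(self.buf511)
        self.buf516 = 0
        self.buf517 = 0

    def evaluate(self, clk, d):
        self.io_phase(clk, d)
        self.clk_en_phase()
        self.input_en_phase()
        return self.reg518


setup = ClockSetup()
# Clock and data change together on the first call (the race scenario).
history = [setup.evaluate(1, 1), setup.evaluate(0, 1), setup.evaluate(1, 0)]
print(history)
```

On the first rising edge the register captures the old data value, and the value written together with that edge is only captured on the following rising edge.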
[0257]
As described above, the software clock implementation relies primarily on the clock setup shown in FIG. 19 together with the timing of the assertion of the ~EVAL, INPUT-EN, CLK-EN, and RESET signals. FIG. 20 illustrates the four states of a finite state machine that controls the software clock logic of FIG. 19 according to one embodiment of the invention.
[0258]
In state 540, the system is idle or some I/O operations are in progress. The EVAL signal is at logic “0”. The EVAL signal determines the evaluation cycle, is generated by the system controller, and lasts for as many clock cycles as are needed to stabilize the logic in the system. Usually, the duration of the EVAL signal is determined by the placement scheme during compilation and is based on the length of the longest direct wire and the length of the longest segmented multiplexed wire (i.e., TDM circuit). During evaluation, the EVAL signal is at logic “1”.
[0259]
In state 541, the clock is enabled. The CLK-EN signal is asserted at logic “1”, and thus the enable signal to the hardware register models is asserted. Here, the previous gate data at the hardware register models is evaluated synchronously, with no risk of hold time violations.
[0260]
In state 542, the new data is enabled when the INPUT-EN signal is asserted at logic “1”. The RESET signal is also asserted to remove the enable signal from the hardware register models. However, the new data that has been enabled toward the hardware register models through the gate data logic network continues to propagate to its intended hardware register model destinations, or has already reached them and is waiting to be clocked into the hardware register models if and when the enable signal is asserted again.
[0261]
In state 543, the propagating new data stabilizes in the logic while the EVAL signal remains at logic “1”. The multiplexed wire, described above for the time division multiplexing (TDM) circuit in connection with FIGS. 9A, 9B, and 9C, is also at logic “1”. When the EVAL signal is deasserted, or set to logic “0”, the system returns to the idle state 540 and waits to evaluate upon the detection of a clock edge by software.
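The four states of FIG. 20 can be sketched as a small state machine in Python. The state numbers and signal names come from the text; the enum member names, the transition conditions, the number of stabilization cycles, and any signal level not explicitly stated in the description (assumed to be 0 here) are illustrative assumptions.

```python
from enum import Enum


class State(Enum):
    IDLE = 540
    CLOCK_ENABLE = 541
    INPUT_ENABLE = 542
    STABILIZE = 543


# Control signal levels per state; levels not stated in the text are assumed 0.
OUTPUTS = {
    State.IDLE:         {"EVAL": 0, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},
    State.CLOCK_ENABLE: {"EVAL": 1, "CLK_EN": 1, "INPUT_EN": 0, "RESET": 0},
    State.INPUT_ENABLE: {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 1, "RESET": 1},
    State.STABILIZE:    {"EVAL": 1, "CLK_EN": 0, "INPUT_EN": 0, "RESET": 0},
}


def next_state(state, edge_detected, stable):
    """Assumed transition rule: one system clock per step."""
    if state is State.IDLE:
        return State.CLOCK_ENABLE if edge_detected else State.IDLE
    if state is State.CLOCK_ENABLE:
        return State.INPUT_ENABLE
    if state is State.INPUT_ENABLE:
        return State.STABILIZE
    return State.IDLE if stable else State.STABILIZE


def run_evaluation(stabilize_cycles):
    """Trace one evaluation, holding STABILIZE for `stabilize_cycles` clocks."""
    trace = [State.IDLE]
    state = next_state(State.IDLE, edge_detected=True, stable=False)
    remaining = stabilize_cycles
    while state is not State.IDLE:
        trace.append(state)
        if state is State.STABILIZE:
            remaining -= 1
        state = next_state(state, edge_detected=False, stable=remaining <= 0)
    trace.append(state)
    return trace


print([s.value for s in run_evaluation(2)])
```

In this sketch the length of the stabilization phase stands in for the EVAL duration that, per the text, is fixed at compile time from the longest wire delays.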
[0262]
(D. FPGA array and control)
The S emulator system first compiles the user's circuit design data into software and hardware models based on various controls, including component type. During the hardware compilation process, the system runs the mapping, placement, and routing processes described above with respect to FIG. 6 to optimally partition, place, and interconnect the various components that make up the user's circuit design. Using known programming tools, a bitstream configuration file or programmer object file (.pof) (or, alternatively, a raw binary file (.rbf)) is referenced to reconfigure a hardware board containing a number of FPGA chips. Each chip contains a portion of the hardware model corresponding to the user's circuit design.
[0263]
In one embodiment, the S emulator system uses a 4×4 array of FPGA chips (16 chips in total). Exemplary FPGA chips include the Xilinx XC4000 series family of FPGA logic devices and the Altera FLEX 10K devices.
[0264]
The Xilinx XC4000 series of FPGAs, including the XC4000, XC4000A, XC4000D, XC4000H, XC4000E, XC4000EX, XC4000L, and XC4000XL, can be used. Specific FPGAs include the Xilinx XC4005H, XC4025, and XC4028EX. The capacity of the Xilinx XC4028EX FPGA engines approaches 500,000 gates on a single PCI board. Details of these Xilinx FPGAs can be obtained from their data book (Xilinx, The Programmable Logic Data Book (9/96)), which is incorporated herein by reference. For the Altera FPGAs, details can be found in their data book (Altera, The 1996 Data Book (June 1996)), which is incorporated herein by reference.
[0265]
A brief general description of the XC4025 FPGA follows. Each chip of the array is a 240-pin Xilinx chip. An array board populated with Xilinx XC4025 chips contains about 440,000 configurable gates and can perform computationally intensive tasks. The Xilinx XC4025 FPGA consists of 1024 configurable logic blocks (CLBs). Each CLB can implement 32 bits of asynchronous SRAM, or a small amount of general Boolean logic and two strobed registers. Unstrobed I/O registers are provided around the periphery of the chip. An alternative to the XC4025 is the XC4005H, which gives a relatively low-cost version of the array board with 120,000 configurable gates. The XC4005H device has high-power 24 mA drive circuits but lacks the input/output flip-flops of the standard XC4000 series. Details of these and other Xilinx FPGAs can be obtained from their publicly available data sheets, which are incorporated herein by reference.
[0266]
The functionality of the Xilinx XC4000 series FPGAs can be customized by loading configuration data into internal memory cells. The values stored in these memory cells determine the logic functions and interconnections of the FPGA. The configuration data of these FPGAs can be stored in on-chip memory and can be loaded from external memory. The FPGA can either read its configuration data from an external serial or parallel PROM, or the configuration data can be written into the FPGA from an external device. These FPGAs can be reprogrammed an unlimited number of times, which is especially useful when the hardware is changed dynamically or when users want the hardware to adapt to different applications.
[0267]
Generally speaking, XC4000 series FPGAs have up to 1024 CLBs. Each CLB has two levels of look-up tables, with two four-input look-up tables (or function generators F and G) providing some of the inputs to a third, three-input look-up table (or function generator H), as well as two flip-flops or latches. The outputs of these look-up tables can be driven independently of the flip-flops or latches. A CLB can implement the following combinations of arbitrary Boolean functions: (1) any function of four or five variables; (2) any function of four variables, any second function of up to four unrelated variables, and any third function of up to three unrelated variables; (3) one function of four variables and another function of six variables; (4) any two functions of four variables; and (5) some functions of nine variables. Two D-type flip-flops or latches are available for registering CLB inputs or for storing look-up table outputs. These flip-flops can be used independently of the look-up tables. DIN can be used as a direct input to either one of these two flip-flops or latches, and H1 can drive the other through the H function generator.
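As an illustration of the two-level look-up table structure, the following Python sketch models F and G as 16-entry truth tables feeding a three-input H table, and uses them to build one of the "some functions of nine variables", namely nine-input parity. The helper names are hypothetical, and the assumption that H takes the F output, the G output, and H1 as its three inputs is a simplification of the actual CLB input multiplexing.

```python
def make_lut(func, n_inputs):
    """Tabulate `func` over all combinations of n_inputs bits (bit 0 first)."""
    return [
        func(*[(index >> bit) & 1 for bit in range(n_inputs)])
        for index in range(1 << n_inputs)
    ]


def read_lut(table, *bits):
    """Look up the table entry addressed by the given input bits."""
    index = sum(bit << pos for pos, bit in enumerate(bits))
    return table[index]


# F and G: four-input function generators; H: three-input function generator.
F_TABLE = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
G_TABLE = make_lut(lambda a, b, c, d: a ^ b ^ c ^ d, 4)
H_TABLE = make_lut(lambda f, g, h1: f ^ g ^ h1, 3)


def clb_parity9(bits):
    """Nine-variable parity built from F, G, and H in a single CLB."""
    f_out = read_lut(F_TABLE, *bits[0:4])
    g_out = read_lut(G_TABLE, *bits[4:8])
    return read_lut(H_TABLE, f_out, g_out, bits[8])


print(clb_parity9([1, 0, 1, 1, 0, 0, 1, 0, 1]))
```

Only functions that decompose in this F/G/H fashion fit in one CLB, which is why the text speaks of "some" rather than "any" functions of nine variables.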
[0268]
Each of the four-input function generators of the CLB (i.e., F and G) includes dedicated arithmetic logic for the fast generation of carry and borrow signals. This dedicated arithmetic logic can be configured to implement a 2-bit adder with carry-in and carry-out. These function generators can also be configured as read/write random access memory (RAM). In that case, the four input lines are used as the address lines of the RAM.
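The RAM use of a function generator follows directly from its look-up table nature: four address lines select one of sixteen one-bit cells. The Python sketch below illustrates this with a hypothetical class; the bit ordering of the address lines is an assumption.

```python
class FunctionGeneratorRam:
    """16 x 1 RAM view of a four-input function generator (illustrative)."""

    def __init__(self):
        self.cells = [0] * 16          # one bit per input combination

    @staticmethod
    def _address(a0, a1, a2, a3):
        # The four input lines act as address lines (a0 is the low bit here).
        return a0 | (a1 << 1) | (a2 << 2) | (a3 << 3)

    def write(self, a0, a1, a2, a3, bit):
        self.cells[self._address(a0, a1, a2, a3)] = bit & 1

    def read(self, a0, a1, a2, a3):
        return self.cells[self._address(a0, a1, a2, a3)]


ram = FunctionGeneratorRam()
ram.write(1, 0, 1, 0, 1)               # store a 1 at address 5
print(ram.read(1, 0, 1, 0), ram.read(0, 0, 0, 0))
```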
[0269]
The Altera FLEX 10K chips are somewhat similar in concept. These chips are SRAM-based programmable logic devices (PLDs) with multiple 32-bit buses. In particular, each FLEX 10K100 chip has approximately 100,000 gates, 12 embedded array blocks (EABs), 624 logic array blocks (LABs), 8 logic elements (LEs) per LAB (or 4,992 LEs in total), 5,392 flip-flops or registers, 406 I/O pins, and 503 pins in total.
[0270]
The Altera FLEX 10K chip includes an embedded array of embedded array blocks (EABs) and a logic array of logic array blocks (LABs). An EAB can be used to implement various memory functions (e.g., RAM, ROM, FIFO) and complex logic functions (e.g., digital signal processors (DSPs), microcontrollers, multipliers, data transformation functions, state machines). To implement a memory function, an EAB provides 2,048 bits. To implement a logic function, an EAB provides 100 to 600 gates.
[0271]
A LAB can be used, via its LEs, to implement medium-sized blocks of logic. Each LAB represents approximately 96 logic gates and includes 8 LEs and a local interconnect. An LE includes a four-input look-up table, a programmable flip-flop, and dedicated signal paths for carry and cascade functions. Typical logic functions that can be created include counters, address decoders, or small state machines.
[0272]
A more detailed description of the Altera FLEX 10K chips can be found in Altera, 1996 DATA BOOK (June 1996), which is incorporated herein by reference. The data book also includes details of the supporting programming software.
[0273]
FIG. 8 illustrates one embodiment of the 4 × 4 FPGA array and its interconnections. This embodiment of the S emulator does not use crossbar or partial crossbar connections for the FPGA chips. The FPGA chips include chips F11 to F14 in the first row, chips F21 to F24 in the second row, chips F31 to F34 in the third row, and chips F41 to F44 in the fourth row. In one embodiment, each FPGA chip (e.g., chip F23) has the following pins for interfacing with the FPGA I/O controller of the S emulator system.
[0274]
[Table 1]

[0275]
Thus, in one embodiment, each FPGA chip uses only 41 pins to interface with the S emulator system. These pins are further described with respect to FIG.
[0276]
These FPGA chips are interconnected with one another via non-crossbar, non-partial-crossbar interconnects. Each interconnect between chips, such as interconnect 602 between chip F11 and chip F14, represents 44 pins or 44 wire lines. In other embodiments, each interconnect represents more than 44 pins. In still other embodiments, each interconnect represents fewer than 44 pins.
[0277]
Each chip has six interconnects. For example, chip F11 has interconnects 600-605, and chip F33 has interconnects 606-611. These interconnects run horizontally along the rows and vertically along the columns. Each interconnect provides a direct connection between two chips in the same row or between two chips in the same column. Thus, for example, interconnect 600 directly connects chip F11 and chip F13; interconnect 601 directly connects chip F11 and chip F12; interconnect 602 directly connects chip F11 and chip F14; interconnect 603 directly connects chip F11 and chip F31; interconnect 604 directly connects chip F11 and chip F21; and interconnect 605 directly connects chip F11 and chip F41.
[0278]
Similarly, for chip F33, which, unlike chip F11, is not located at the edge of the array, interconnect 606 directly connects chip F33 and chip F13; interconnect 607 directly connects chip F33 and chip F23; interconnect 608 directly connects chip F33 and chip F34; interconnect 609 directly connects chip F33 and chip F43; interconnect 610 directly connects chip F33 and chip F31; and interconnect 611 directly connects chip F33 and chip F32.
[0279]
Since chip F11 is located within one hop of chip F13, interconnect 600 is labeled "1". Since chip F11 is located within one hop of chip F12, interconnect 601 is labeled "1". Similarly, since chip F11 is located within one hop of chip F14, interconnect 602 is labeled "1". Likewise, for chip F33, all interconnects are labeled "1".
[0280]
This interconnect scheme allows each chip to communicate with any other chip in the array within at most two "jumps," that is, two interconnects. Thus, chip F11 is connected to chip F33 through either of the following two paths: (1) interconnect 600 to interconnect 606; or (2) interconnect 603 to interconnect 610. That is, the path can be either (1) first along a row, then along a column, or (2) first along a column, then along a row.
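The two alternative paths can be sketched as follows, assuming (as an illustrative model only) that every chip is directly connected to every other chip in its row and in its column. The helper names `chip` and `two_hop_paths` are hypothetical.

```python
def chip(row, col):
    """Name a chip the way FIG. 8 does, e.g. chip(1, 1) gives 'F11'."""
    return f"F{row}{col}"


def two_hop_paths(src, dst):
    """Return the intermediate chips of the two possible two-hop paths.

    src and dst are (row, col) tuples that differ in both row and column.
    """
    (r1, c1), (r2, c2) = src, dst
    row_first = chip(r1, c2)  # travel along the source row, then along the column
    col_first = chip(r2, c1)  # travel along the source column, then along the row
    return row_first, col_first


# F11 to F33: via F13 (row first) or via F31 (column first).
paths = two_hop_paths((1, 1), (3, 3))
```

In this model the row-first path through F13 corresponds to interconnects 600 and 606, and the column-first path through F31 corresponds to interconnects 603 and 610.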
[0281]
Although FIG. 8 shows the FPGA chips arranged in a 4 × 4 array with horizontal and vertical interconnects, the actual physical implementation on a board is through low and high banks with an expansion piggyback board. Thus, in one embodiment, chips F41-F44 and F21-F24 are in the low bank, and chips F31-F34 and F11-F14 are in the high bank. The piggyback board contains chips F11-F14 and F21-F24. Thus, to expand the array, piggyback boards containing a number (e.g., eight) of chips are added to the banks (i.e., above the row that currently contains chips F11-F14). In other embodiments, the piggyback board extends the array below the row that currently contains chips F41-F44. Further embodiments allow expansion to the right of chips F14, F24, F34, and F44. Still other embodiments allow expansion to the left of chips F11, F21, F31, and F41.
[0282]
FIG. 7 shows the connectivity matrix for the 4 × 4 FPGA array of FIG. 8, expressed in terms of "1" or "0". This connectivity matrix is used to generate the placement cost that results from the cost function used in the hardware mapping, placement, and routing process for this S emulation system. This cost function has been described above with respect to FIG. As an example, chip F11 is located within one hop of chip F13, so the connectivity matrix entry for F11-F13 is "1".
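As a hedged sketch of how such a matrix could be built, the Python below marks an entry "1" when two chips share a row or a column (one hop) and "0" otherwise. The function name `connected` and the choice of 0 on the diagonal are assumptions made for illustration; the actual contents of FIG. 7 are not reproduced here.

```python
N = 4  # 4 x 4 array as in FIG. 8

# Chip names F11 ... F44, row digit first, column digit second.
chips = [f"F{r}{c}" for r in range(1, N + 1) for c in range(1, N + 1)]


def connected(a, b):
    """Return 1 if chips a and b are within one hop of each other, else 0."""
    if a == b:
        return 0  # assumption: a chip is not counted as connected to itself
    same_row = a[1] == b[1]
    same_col = a[2] == b[2]
    return 1 if (same_row or same_col) else 0


# Nested dictionary used as the connectivity matrix.
matrix = {a: {b: connected(a, b) for b in chips} for a in chips}
```

With this model each row of the matrix contains six entries equal to 1, matching the six interconnects per chip, and a placement cost function could penalize nets between chips whose entry is 0.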
[0283]
FIG. 21 illustrates the interconnect pin-out for a single FPGA chip according to one embodiment of the present invention. Each chip has six sets of interconnects, each set including a certain number of pins. In one embodiment, each set has 44 pins. The interconnects of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of west-facing interconnects is labeled W[43:0]. The set of east-facing interconnects is labeled E[43:0]. The set of north-facing interconnects is labeled N[43:0]. The set of south-facing interconnects is labeled S[43:0]. These complete sets of interconnects are for connections to adjacent chips; that is, these interconnects do not "hop" over any chip. For example, in FIG. 8, chip F33 has interconnect 607 for N[43:0], interconnect 608 for E[43:0], interconnect 609 for S[43:0], and interconnect 611 for W[43:0].
[0284]
Returning to FIG. 21, two additional sets of interconnects remain. One set is for the non-adjacent interconnects that run vertically (YH[21:0] and YH[43:22]). The other set is for the non-adjacent interconnects that run horizontally (XH[21:0] and XH[43:22]). Each set, YH[...] and XH[...], is divided into two halves of 22 pins each. With this arrangement, each chip can be manufactured identically. Thus, each chip can be interconnected in one hop to the non-adjacent chips located above, below, to the left, and to the right. The FPGA chip is also shown with pin(s) for global signals, the FPGA bus, and JTAG signals.
[0285]
Next, the FPGA I/O controller will be described. This controller was first briefly introduced as item 327 in FIG. The FPGA I/O controller manages data and control traffic between the PCI bus and the FPGA array.
[0286]
FIG. 22 illustrates one embodiment of the FPGA I/O controller between the PCI bus and the FPGA array, along with the banks of FPGA chips. The FPGA I/O controller 700 includes a CTRL_FPGA unit 701, a clock buffer 702, a PCI controller 703, an EEPROM 704, an FPGA serial configuration interface 705, a boundary scan test interface 706, and a buffer 707. Appropriate power/voltage regulating circuitry, as known to those skilled in the art, is provided. Exemplary sources include V_CC coupled to a voltage detector/regulator and a sense amplifier that substantially maintains the voltage under various environmental conditions. V_CC is supplied to each FPGA chip with a fast-acting thin-film fuse in between. V_CC-HI is provided to CONFIG# of all FPGA chips and to LINTI# of LOCAL_BUS 708.
[0287]
The CTRL_FPGA unit 701 is the primary controller of the FPGA I/O controller 700; it handles the various control, test, and read/write operations on substantive data among the various units and buses. The CTRL_FPGA unit 701 is coupled to the low bank and the high bank of FPGA chips. FPGA chips F41-F44 and F21-F24 (i.e., the low bank) are coupled to the low FPGA bus 718. FPGA chips F31-F34 and F11-F14 (i.e., the high bank) are coupled to the high FPGA bus 719. These FPGA chips F11 to F14, F21 to F24, F31 to F34, and F41 to F44 correspond to the FPGA chips of FIG. 8 and retain their reference numbers.
[0288]
Between these FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44 and the low bank bus 718 and high bank bus 719 are thick-film chip resistors for proper loading. The group of resistors 713 coupled to the low bank bus 718 includes, for example, resistors 716 and 717. The group of resistors 712 coupled to the high bank bus 719 includes, for example, resistors 714 and 715.
[0289]
If expansion is desired, more FPGA chips can be installed on the low bank bus 718 and the high bank bus 719, to the right of the F11 and F12 FPGA chips. In one embodiment, expansion is accomplished through piggyback boards similar to the piggyback board 720. Thus, if these banks of FPGA chips initially have only the eight FPGA chips F41-F44 and F31-F34, further expansion is possible by adding the piggyback board 720. The piggyback board 720 includes FPGA chips F24 to F21 in the low bank and FPGA chips F14 to F11 in the high bank. The piggyback board 720 also includes the additional low and high bank buses and thick-film chip resistors.
[0290]
The PCI controller 703 is the primary interface between the FPGA I/O controller 700 and the 32-bit PCI bus 709. If the PCI bus is extended to 64 bits and/or 66 MHz, appropriate adjustments can be made in this system without departing from the spirit and scope of the present invention. These adjustments are described below. One example of a PCI controller 703 that may be used in this system is the PCI 9080 or 9060 from PLX Technology. The PCI 9080 has the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The data book PLX Technology, PCI 9080 Data Sheet (ver. 0.93, February 28, 1997) is incorporated herein by reference.
[0291]
The PCI controller 703 passes data between the CTRL_FPGA unit 701 and the PCI bus 709 via the LOCAL_BUS 708. The LOCAL_BUS includes a control bus portion for control signals, an address bus portion for address signals, and a data bus portion for data signals. If the PCI bus is extended to 64 bits, the data bus portion of LOCAL_BUS 708 can also be extended to 64 bits. The PCI controller 703 is coupled to the EEPROM 704, which contains the configuration data for the PCI controller 703. An exemplary EEPROM 704 is the National Semiconductor 93CS46.
[0292]
The PCI bus 709 supplies a 33 MHz clock signal to the FPGA I/O controller 700. The clock signal is provided to clock buffer 702 via wire line 710 for synchronization purposes and for low timing skew. The output of the clock buffer 702 is the 33 MHz global clock (GL_CLK) signal, which is supplied to all FPGA chips via wire line 711 and to the CTRL_FPGA unit 701 via wire line 721. If the PCI bus is extended to 66 MHz, the clock buffer also supplies 66 MHz to the system.
[0293]
The FPGA serial configuration interface 705 provides configuration data to configure the FPGA chips F11-F14, F21-F24, F31-F34, and F41-F44. The Altera data book (Altera, 1996 DATA BOOK (June 1996)) provides detailed information on the configuration devices and processes. The FPGA serial configuration interface 705 is also coupled to LOCAL_BUS 708 and parallel port 721. Further, the FPGA serial configuration interface 705 is coupled to the CTRL_FPGA unit 701 and the FPGA chips F11 to F14, F21 to F24, F31 to F34, and F41 to F44 via CONF_INTF wire line 723.
[0294]
The boundary scan test interface 706 provides a JTAG implementation of a specified set of test commands so that the logic units and circuits of a processor or system can be checked externally by software. This interface 706 conforms to the IEEE Std. 1149.1-1990 standard. For further information, refer to the Altera data book (Altera, 1996 DATA BOOK (June 1996)) and Application Note 39 (JTAG Boundary-Scan Testing in Altera Devices), both of which are incorporated herein by reference. The boundary scan test interface 706 is also coupled to the LOCAL_BUS 708 and the parallel port 722. Further, the boundary scan test interface 706 is coupled to the CTRL_FPGA unit 701 and the FPGA chips F11 to F14, F21 to F24, F31 to F34, and F41 to F44 via BST_INTF wire line 724.
[0295]
The CTRL_FPGA unit 701, together with the buffer 707, the F_BUS 725 for the low bank 32 bits FD[31:0], and the F_BUS 726 for the high bank 32 bits FD[63:32], passes data to and from the low (chips F41 to F44 and F21 to F24) and high (chips F31 to F34 and F11 to F14) banks of FPGA chips via the low bank 32-bit bus 718 and the high bank 32-bit bus 719, respectively.
[0296]
In one embodiment, the throughput of the PCI bus 709 is duplicated in the low bank bus 718 and the high bank bus 719. The PCI bus 709 is 32 bits wide at 33 MHz. Its throughput is therefore 132 MB/s (= 33 MHz × 4 bytes). The low bank bus 718 is 32 bits wide and runs at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The high bank bus 719 is also 32 bits wide and runs at half the PCI bus frequency (33/2 MHz = 16.5 MHz). The throughput of the combined 64-bit low bank and high bank buses is also 132 MB/s (= 16.5 MHz × 8 bytes). Thus, the performance of the low bank and high bank buses tracks the performance of the PCI bus. In other words, the performance limitation lies in the PCI bus, not in the low bank and high bank buses.
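The arithmetic above can be checked with a few lines of Python. This is only a worked restatement of the figures given in the text; the variable names are illustrative, and the results are in millions of bytes per second.

```python
PCI_CLOCK_MHZ = 33
PCI_WIDTH_BYTES = 4                 # 32-bit PCI bus

BANK_CLOCK_MHZ = PCI_CLOCK_MHZ / 2  # each bank bus runs at half the PCI clock
BANK_WIDTH_BYTES = 4                # each bank bus is 32 bits wide

# PCI bus: 33 MHz x 4 bytes
pci_throughput = PCI_CLOCK_MHZ * PCI_WIDTH_BYTES

# Low and high banks together form a 64-bit path: 16.5 MHz x 8 bytes
banks_throughput = BANK_CLOCK_MHZ * (2 * BANK_WIDTH_BYTES)
```

Both expressions evaluate to 132, which is why halving the clock while doubling the width leaves the bank buses no slower than the PCI bus.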
[0297]
Address pointers according to one embodiment of the present invention are also implemented in each FPGA chip for each software/hardware boundary address space. These address pointers are chained across several FPGA chips through a multiplexed cross-chip address pointer chain. See the address pointer discussion above with respect to FIGS. 9, 11, 12, 14, and 15. To move the word selection signal across the chain of address pointers associated with a given address space and across several chips, chain-out wire lines must be provided. These chain-out wire lines are shown as arrows between the chips. One such chain-out wire line for the low bank is wire line 730 between chips F23 and F22. Another such chain-out wire line for the high bank is wire line 731 between chips F31 and F32. The chain-out wire line 732 at the end of low bank chip F21 is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_L. The chain-out wire line 733 at the end of high bank chip F11 is coupled to the CTRL_FPGA unit 701 as LAST_SHIFT_H. These signals LAST_SHIFT_L and LAST_SHIFT_H are the word selection signals for their respective banks after the word selection signals have propagated through the FPGA chips. When either of these signals LAST_SHIFT_L and LAST_SHIFT_H presents a logic "1" to the CTRL_FPGA unit 701, this indicates that the word selection signal has propagated to the end of its respective bank of chips.
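For illustration only, the propagation of a word selection signal to the end of a bank can be modeled as a single "1" shifting through a chain of one-bit stages. The function `simulate_bank`, the per-chip word counts, and the one-stage-per-word assumption are all hypothetical simplifications and are not taken from the figures.

```python
def simulate_bank(words_per_chip):
    """Shift a word-select token through a bank and report LAST_SHIFT.

    words_per_chip lists how many address-pointer positions each chip in
    the chain holds (illustrative values, at least one word in total).
    Returns the order in which words were selected and the final value
    seen on the last chain-out wire line.
    """
    total_words = sum(words_per_chip)
    chain = [0] * total_words
    chain[0] = 1              # initialization selects the first word
    selected = []
    last_shift = 0
    while last_shift == 0:
        selected.append(chain.index(1))
        # MOVE: the bit leaving the last stage drives LAST_SHIFT,
        # and every other bit advances one position along the chain.
        last_shift = chain[-1]
        chain = [0] + chain[:-1]
    return selected, last_shift


order, last_shift = simulate_bank([2, 1])
```

In this sketch every word is selected exactly once, in chain order, before the final stage reports logic 1, which mirrors how the controller can tell that the transfer has reached the end of the bank.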
[0298]
The CTRL_FPGA unit 701 provides a write signal (F_WR) on wire line 734, a read signal (F_RD) on wire line 735, a DATA_XSFR signal on wire line 736, an EVAL signal on wire line 737, and a SPACE[2:0] signal on wire line 738 to and from the FPGA chips. The CTRL_FPGA unit 701 receives the EVAL_REQ# signal on wire line 739. The write signal (F_WR), read signal (F_RD), DATA_XSFR signal, and SPACE[2:0] signal work together with the address pointers in the FPGA chips. The write signal (F_WR), read signal (F_RD), and SPACE[2:0] signal are used to generate the MOVE signal for the address pointers associated with the selected address space, as determined by the SPACE index (SPACE[2:0]). The DATA_XSFR signal is used to initialize the address pointers and begin the word-by-word data transfer process.
[0299]
The EVAL_REQ# signal is used to start the evaluation cycle all over again if any of the FPGA chips asserts it. For example, to evaluate data, the data is transferred or written from the main memory of the host processor computing station to the FPGAs via the PCI bus. When the transfer is complete, the evaluation cycle begins, including address pointer initialization and operation of the software clocks to facilitate the evaluation process. However, for various reasons, a particular FPGA chip may need to evaluate the data all over again. That FPGA chip asserts the EVAL_REQ# signal, and the CTRL_FPGA unit 701 starts the evaluation cycle all over again.
[0300]
FIG. 23 shows a more detailed view of the CTRL_FPGA unit 701 and buffer 707 of FIG. 22. The input/output signals and corresponding reference numbers for the CTRL_FPGA unit 701 shown in FIG. 22 are retained and used in FIG. 23. However, additional signals and wire/bus lines not shown in FIG. 22 (e.g., SEM_FPGA output enable 1016, local interrupt output (Local INTO) 708a, local read/write control signals 708b, local address bus 708c, local interrupt input (Local INTI#) 708d, and local data bus 708e) are described with new reference numbers.
[0301]
The CTRL_FPGA unit 701 includes transfer completion checking logic (XSFR_DONE logic) 1000, evaluation control logic (EVAL logic) 1001, a DMA descriptor block 1002, a control register 1003, evaluation timer logic (EVAL timer) 1004, an address decoder 1005, write flag sequencer logic 1006, FPGA chip read/write control logic (SEM_FPGA R/W logic) 1007, demultiplexer and latch (DEMUX logic) 1008, and latches 1009-1012, which correspond to buffer 707 of FIG. 22. The global clock signal (CTRL_FPGA_CLK) on wire/bus 721 is provided to all the logic elements/blocks in the CTRL_FPGA unit 701.
[0302]
The transfer completion checking logic (XSFR_DONE logic) 1000 receives LAST_SHIFT_H 733, LAST_SHIFT_L 732, and local INTO 708a. The XSFR_DONE logic 1000 outputs a transfer completion signal (XSFR_DONE) to the EVAL logic 1001 via wire/bus 1013. Based on the receipt of LAST_SHIFT_H 733 and LAST_SHIFT_L 732, the XSFR_DONE logic 1000 checks for completion of the data transfer so that an evaluation cycle can begin, if desired.
[0303]
The EVAL logic 1001 receives the EVAL_REQ # signal of the wire / bus 739 and the WR_XSFR / RD_XSFR signal of the wire / bus 1015 in addition to the wire / bus 1013 transfer completion signal (XSFR_DONE). EVAL logic 1001 generates two output signals (Start EVAL on wire / bus 1014 and DATA_XSFR on wire / bus 736). The EVAL logic indicates when a data transfer between the FPGA bus and the PCI bus begins to initialize the address pointer. It receives an XSFR_DONE signal when the data transfer is complete. The WR_XSFR / RD_XSFR signal indicates whether the transfer is a read or a write. Once the I / O cycle is complete (or before the start of the I / O cycle), the EVAL logic may begin an evaluation cycle with an EVAL timer for the start EVAL signal t. The EVAL timer affects the duration of the evaluation cycle and ensures that the software clock mechanism is fully operational. This is done by keeping the evaluation cycle active while it is necessary to stabilize data transmission to all registers and combinational components.
[0304]
The DMA descriptor block 1002 receives the local bus address on wire/bus 1019, the write enable signal on wire/bus 1020 from the address decoder 1005, and local bus data on wire/bus 1029 via the local data bus 708e. Its output is the DMA descriptor output on wire/bus 1046, which goes to the DEMUX logic 1008 on wire/bus 1045. The DMA descriptor block 1002 contains descriptor block information corresponding to that in host memory, including the PCI address, local address, transfer count, transfer direction, and address of the next descriptor block. The host also sets the address of the initial descriptor block in the descriptor pointer register of the PCI controller. A transfer can be initiated by setting a control bit. The PCI controller loads the first descriptor block and starts the data transfer. The PCI controller continues to load descriptor blocks and transfer data until it detects that the end-of-chain bit is set in the next descriptor pointer register.
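The chained-descriptor behavior described above can be sketched as a linked list that is walked until an end-of-chain flag is seen. This is a minimal model under stated assumptions: the `Descriptor` fields mirror the items listed in the text, but the field names, the byte unit for the transfer count, the example addresses, and the `walk_chain` helper are hypothetical and do not represent the register layout of any particular PCI controller.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Descriptor:
    pci_address: int
    local_address: int
    transfer_count: int          # assumed to be in bytes
    to_local: bool               # transfer direction: True means PCI to local bus
    next_address: Optional[int]  # address of the next descriptor block
    end_of_chain: bool = False


def walk_chain(memory: Dict[int, Descriptor], first: int) -> List[Descriptor]:
    """Load descriptor blocks one after another until end of chain."""
    processed = []
    address = first
    while True:
        desc = memory[address]
        processed.append(desc)   # a real controller would move the data here
        if desc.end_of_chain:
            break
        address = desc.next_address
    return processed


# Two illustrative descriptor blocks placed at made-up host addresses.
host_memory = {
    0x100: Descriptor(0x8000, 0x00, 64, True, 0x120),
    0x120: Descriptor(0x8040, 0x40, 32, True, None, end_of_chain=True),
}
transfers = walk_chain(host_memory, first=0x100)
```

Here the value passed as `first` plays the role of the initial descriptor address that the host writes into the descriptor pointer register before setting the control bit.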
[0305]
The address decoder 1005 receives and transmits local R/W control signals on bus 708b, and receives and transmits local address signals on bus 708c. The address decoder 1005 generates a write enable signal on wire/bus 1020 to the DMA descriptor block 1002, a write enable signal on wire/bus 1021 to the control register 1003, the FPGA address SPACE index on wire/bus 738, a control signal on wire/bus 1027, and another control signal on wire/bus 1024 to the DEMUX logic 1008.
[0306]
The control register 1003 receives the write enable signal on wire/bus 1021 from the address decoder 1005 and data on wire/bus 1030 via the local data bus 708e. The control register 1003 generates the WR_XSFR/RD_XSFR signal on wire/bus 1015 to the EVAL logic 1001, the Set EVAL time signal on wire/bus 1041 to the EVAL timer 1004, and the SEM_FPGA output enable signal on wire/bus 1016 to the FPGA chips. The system uses the SEM_FPGA output enable signal to selectively turn on or enable each FPGA chip. Typically, the system enables each FPGA chip at the same time.
[0307]
The EVAL timer 1004 receives the Start EVAL signal on wire/bus 1014 and the Set EVAL time signal on wire/bus 1041. The EVAL timer 1004 generates the EVAL signal on wire/bus 737, the evaluation complete (EVAL_DONE) signal on wire/bus 1017, and the start write flag signal on wire/bus 1018 to the write flag sequencer logic 1006. In one embodiment, the EVAL timer is 6 bits long.
[0308]
The write flag sequencer logic 1006 receives the start write flag signal on wire/bus 1018 from the EVAL timer 1004. The write flag sequencer logic 1006 generates a local R/W control signal on wire/bus 1022 to the local R/W wire/bus 708b, a local address signal on wire/bus 1023 to the local address bus 708c, a local data signal on wire/bus 1028 to the local data bus 708e, and local INTI# on wire/bus 708d. Upon receipt of the start write flag signal, the write flag sequencer logic begins a sequence of control signals to initiate a memory write cycle to the PCI bus.
[0309]
The SEM_FPGA R/W control logic 1007 receives the control signal on wire/bus 1027 from the address decoder 1005 and the local R/W control signal on wire/bus 1047 via the local R/W control bus 708b. The SEM_FPGA R/W control logic 1007 generates an enable signal on wire/bus 1035 to latch 1009, a control signal on wire/bus 1025 to the DEMUX logic 1008, an enable signal on wire/bus 1037 to latch 1011, an enable signal on wire/bus 1040 to latch 1012, the F_WR signal on wire/bus 734, and the F_RD signal on wire/bus 735. The SEM_FPGA R/W control logic 1007 controls the various write and read data transfers to and from the FPGA low bank and high bank buses.
[0310]
The DEMUX logic 1008 is a multiplexer and a latch. It receives four sets of input signals and outputs one set of signals on wire/bus 1026 to the local data bus 708e. The selector signals are the control signal on wire/bus 1025 from the SEM_FPGA R/W control logic 1007 and the control signal on wire/bus 1024 from the address decoder 1005. The DEMUX logic 1008 receives one set of inputs consisting of the EVAL_DONE signal on wire/bus 1042, the XSFR_DONE signal on wire/bus 1043, and the EVAL signal on wire/bus 1044. This set of signals is labeled with reference number 1048. In any given period, only one of these three signals (EVAL_DONE, XSFR_DONE, and EVAL) is provided to the DEMUX logic 1008 for possible selection. As its other three sets of input signals, the DEMUX logic 1008 also receives the DMA descriptor output signal on wire/bus 1045 from the DMA descriptor block 1002, a data output on wire/bus 1039 from latch 1012, and another data output on wire/bus 1034 from latch 1010.
[0311]
The data buffer between the CTRL_FPGA unit 701 and the low and high FPGA bank buses comprises latches 1009-1012. Latch 1009 receives local bus data on wire/bus 1032 via wire/bus 1031 and local data bus 708e, and an enable signal on wire/bus 1035 from the SEM_FPGA R/W control logic 1007. Latch 1009 outputs data to latch 1010 via wire/bus 1033.
[0312]
Latch 1010 receives data on wire/bus 1033 from latch 1009 and an enable signal on wire/bus 1036 via wire/bus 1037 from the SEM_FPGA R/W control logic 1007. Latch 1010 outputs data on wire/bus 725 to the FPGA low bank bus and to the DEMUX logic 1008 via wire/bus 1034.
[0313]
Latch 1011 receives data on wire/bus 1031 from the local data bus 708e and an enable signal on wire/bus 1037 from the SEM_FPGA R/W control logic 1007. Latch 1011 outputs data on wire/bus 726 to the FPGA high bank bus and data on wire/bus 1038 to latch 1012.
[0314]
Latch 1012 receives data on wire/bus 1038 from latch 1011 and an enable signal on wire/bus 1040 from the SEM_FPGA R/W control logic 1007. Latch 1012 outputs data on wire/bus 1039 to the DEMUX logic 1008.
[0315]
FIG. 24 shows the 4 × 4 FPGA array, its relationship to the FPGA banks, and its expansion capability. Like FIG. 8, FIG. 24 shows the same 4 × 4 array. A CTRL_FPGA unit 740 is also shown. The low bank chips (chips F41 to F44 and F21 to F24) and the high bank chips (chips F31 to F34 and F11 to F14) are arranged in an alternating manner. Thus, from the bottom row to the top row, the rows of FPGA chips are: low bank, high bank, low bank, high bank. The data transfer chain follows the banks in a predetermined order. The low bank data transfer chain is indicated by arrow 741. The high bank data transfer chain is indicated by arrow 742. The JTAG configuration chain is indicated by arrow 743, which passes through the entire array of 16 chips, from F41 to F44, F34 to F31, F21 to F24, and F14 to F11, and back to the CTRL_FPGA unit 740.
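The serpentine order of the JTAG chain listed above can be reproduced with a short Python sketch. The function name `jtag_chain_order` and its parameters are hypothetical; the sketch only encodes the chip order stated in the text and says nothing about the electrical implementation.

```python
def jtag_chain_order(rows=(4, 3, 2, 1), cols=4):
    """List chips in chain order, starting at the bottom row of the array.

    The first row visited runs left to right, the next right to left,
    and so on, so the chain snakes through the array.
    """
    order = []
    for i, row in enumerate(rows):
        columns = range(1, cols + 1)
        if i % 2 == 1:
            columns = reversed(columns)  # every second row runs right to left
        order.extend(f"F{row}{col}" for col in columns)
    return order


chain = jtag_chain_order()
```

The resulting list starts at F41, turns around at F44 to F34, and ends at F11, after which the chain would return to the controller.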
[0316]
Expansion can be achieved with piggyback boards. Assuming in FIG. 24 that the original array of FPGA chips includes F41-F44 and F31-F34, two further rows of chips, F21-F24 and F11-F14, can be added with the piggyback board 745. The piggyback board 745 also includes the appropriate buses to extend the banks. Further expansion can be achieved with additional piggyback boards placed one on top of another in the array.
[0317]
FIG. 25 illustrates one embodiment of the hardware start-up method. Step 800 initiates the power-on or warm boot sequence. In step 801, the PCI controller reads the EEPROM for initialization. Step 802 reads and writes the PCI controller registers in light of the initialization sequence. Step 803 performs boundary scan tests on all the FPGA chips in the array. Step 804 configures the CTRL_FPGA unit of the FPGA I/O controller. Step 805 reads and writes the registers in the CTRL_FPGA unit. Step 806 sets up the PCI controller for DMA master read/write modes, after which data is transferred and verified. Step 807 configures all FPGA chips with a test design and verifies its correctness. At step 808, the hardware is ready for use. At this point, the system assumes that all preceding steps positively confirmed the operability of the hardware; otherwise, the system would never reach step 808.
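As a rough sketch of the control flow only, the start-up method can be expressed as an ordered list of steps in which step 808 is reached only if every earlier step succeeds. The `run_step` callback and the `start_up` function are hypothetical stand-ins for the actual hardware operations.

```python
STARTUP_STEPS = [
    (800, "power on or warm boot"),
    (801, "PCI controller reads EEPROM for initialization"),
    (802, "read and write PCI controller registers"),
    (803, "boundary scan test of all FPGA chips"),
    (804, "configure CTRL_FPGA unit"),
    (805, "read and write CTRL_FPGA registers"),
    (806, "set up PCI controller for DMA master read/write, transfer and verify data"),
    (807, "configure all FPGA chips with test design and verify"),
]


def start_up(run_step):
    """Run the steps in order.

    run_step(step_number) must return True on success. The function
    returns 808 when the hardware is ready, or the number of the first
    step that failed.
    """
    for number, _description in STARTUP_STEPS:
        if not run_step(number):
            return number
    return 808
```

For example, `start_up(lambda n: True)` returns 808, while a failure in the boundary scan step, modeled as `start_up(lambda n: n != 803)`, stops at 803 and never reaches the ready state.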
[0318]
(E. Alternative embodiment using higher density FPGA chip)
In one embodiment of the present invention, the FPGA logic devices are provided on individual boards. If more FPGA logic devices are needed to model the user's circuit design than are provided on an individual board, multiple boards with more FPGA logic devices can be provided. The ability to add boards to the simulation system is a desirable feature of the present invention. In this embodiment, higher density FPGA chips (e.g., Altera 10K130V and 10K250V) are used. The use of these chips changes the board design so that only four FPGA chips are used per board instead of eight lower density FPGA chips (e.g., Altera 10K100).
[0319]
Coupling these boards to the motherboard of the simulation system presents a challenge. The interconnection and connection schemes must compensate for the lack of a backplane. The FPGA array in the simulation system is provided on the motherboard through a particular board interconnect structure. Apart from the local bus connections, each chip may have up to eight sets of interconnects, within a single board and across different boards, arranged as direct-neighbor interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and one-hop neighbor interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]). Each chip can be interconnected either directly to adjacent neighbor chips or in one hop to non-adjacent chips located above, below, to the left, and to the right. In the X direction (east-west), the array is a torus. In the Y direction (north-south), the array is a mesh.
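The difference between the torus in X and the mesh in Y can be sketched as follows. This is an illustrative model only: the coordinate convention (column x, row y, with row 0 at the north edge), the array dimensions, and the function name `neighbors` are assumptions and are not taken from the figures.

```python
def neighbors(x, y, width, height):
    """Return the direct neighbors of the chip at column x, row y.

    East-west connections wrap around (torus); north-south connections
    stop at the edges (mesh).
    """
    result = {
        "W": ((x - 1) % width, y),  # wraps from the first column to the last
        "E": ((x + 1) % width, y),  # wraps from the last column to the first
    }
    if y > 0:
        result["N"] = (x, y - 1)    # no wrap: the top row has no north neighbor
    if y < height - 1:
        result["S"] = (x, y + 1)    # no wrap: the bottom row has no south neighbor
    return result
```

For a four-column array, `neighbors(0, 0, 4, 2)` gives the chip in the last column as the west neighbor of the first column, and omits the north neighbor because the Y direction does not wrap.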
[0320]
The interconnects alone can couple logic devices and other components within a single board. However, inter-board connectors are provided to couple these boards and interconnects together across different boards, carrying signals (1) between the PCI bus, via the motherboard, and the array boards, and (2) between any two array boards. Each board includes its own FPGA bus FD[63:0]. The FPGA bus FD[63:0] allows the FPGA logic devices to communicate with the SRAM memory devices and the CTRL_FPGA unit (FPGA I/O controller). The FPGA bus FD[63:0] is not provided across multiple boards. The FPGA interconnects, however, provide connectivity between FPGA logic devices across multiple boards, although these interconnects are not related to the FPGA bus. The local bus, on the other hand, is provided across all boards.
[0321]
The motherboard connector connects the board to the motherboard and thus to the PCI bus, power supply, and ground. For some boards, the motherboard connector is not used for a direct connection to the motherboard. In a six-board configuration, only boards 1, 3, and 5 are connected directly to the motherboard, and the remaining boards 2, 4, and 6 depend on their neighboring boards for motherboard connectivity. Thus, every other board is connected directly to the motherboard, and the interconnects and local buses of these boards are coupled together via inter-board connectors arranged solder side to component side. The PCI signals are routed through only one of the boards (typically the first board). Power and ground are applied to the other motherboard connectors for these boards. Placed solder side to component side, the various inter-board connectors allow communication among the PCI bus components, FPGA logic devices, memory devices, and various simulation system control circuits.
[0322]
FIG. 56 shows a high-level block diagram of the array of FPGA chips according to one embodiment of the present invention. The CTRL_FPGA unit 1200, described above, is coupled to the bus 1210 via line 1209. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. Bus 1210 allows the CTRL_FPGA unit 1200 to be coupled to other simulation array boards (if any) and to other chips (e.g., PCI controller, EEPROM, clock buffer). FIG. 56 shows other major functional blocks in the form of logic devices and memory devices. In one embodiment, the logic device is a programmable logic device (PLD) in the form of an FPGA chip, for example an Altera 10K130V or 10K250V chip. The 10K130V and 10K250V are pin compatible, and both come in 599-pin PGA packages. Thus, instead of the embodiment described above with eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. One embodiment of the present invention describes a board that includes these four logic devices and their interconnections.
[0323]
Since the user's design is modeled and configured with any number of these logic devices in the array, internal FPGA logic device communication is necessary to connect one part of the user's circuit design to another part. Furthermore, internal configuration information and boundaries are also supported by the internal FPGA interconnects. Finally, the necessary simulation system control signals need to be accessible between the simulation system and the FPGA logic devices.
[0324]
FIG. 36 shows the hardware architecture of the FPGA logic device used in the present invention. The FPGA logic device 1500 includes 102 upper I/O pins, 102 lower I/O pins, 111 left I/O pins, and 110 right I/O pins. Therefore, the total number of interconnect pins is 425. In addition, a further 45 I/O pins are dedicated to GCLK, FPGA bus FD[31:0] (FD[63:32] is dedicated for the high bank), F_RD, F_WD, DATAXSFR, SHIFTIN, SHIFTOUT, SPACE[2:0], EVAL, EVAL_REQ_N, DEVICE_OE (a signal from the CTRL_FPGA unit that turns on the output pins of the FPGA logic device), and DEV_CLRN (a signal from the CTRL_FPGA unit that clears all internal flip-flops before simulation starts). Thus, any data and control signals that pass between any two FPGA logic devices are carried by these interconnections. The remaining pins are dedicated to power and ground.
[0325]
FIG. 37 illustrates the FPGA interconnect pinout of a single FPGA chip according to one embodiment of the present invention. Each chip 1510 may have up to eight sets of interconnects, where each set includes a specific number of pins. Depending on its position on the board, a chip may have fewer than eight sets of interconnects. In a preferred embodiment, each chip has a total of seven sets of interconnects, but the particular sets used can vary depending on the location of the chip on the board. The interconnects of each FPGA chip are oriented horizontally (east-west) and vertically (north-south). The set of westward interconnects is labeled W[73:0]. The set of eastward interconnects is labeled E[73:0]. The set of northward interconnects is labeled N[73:0]. The set of southward interconnects is labeled S[73:0]. These complete sets of interconnects connect to adjacent chips; that is, they do not “hop” across any chip. For example, in FIG. 39, chip 1570 has N[73:0] interconnect 1540, W[73:0] interconnect 1542, E[73:0] interconnect 1543, and S[73:0] interconnect 1545. This FPGA chip 1570, which is also the FPGA2 chip, has all four sets of adjacent interconnects (N[73:0], S[73:0], W[73:0], and E[73:0]). The west interconnect of FPGA0 connects to the east interconnect of FPGA3 through wire 1539 in a wrap-around (torus-style) connection. Thus, wire 1539 allows chips 1569 (FPGA0) and 1572 (FPGA3) to be directly coupled to each other, as if the west and east edges of the board were wrapped around to meet.
[0326]
Returning to FIG. 37, four sets of “hopping” interconnections are provided. Two sets of interconnects running in the vertical direction (NH[27:0] and SH[27:0]) are non-adjacent interconnects. For example, FPGA2 chip 1570 of FIG. 39 shows NH interconnect 1541 and SH interconnect 1546. Returning to FIG. 37, the other two sets of interconnects, running in the horizontal direction (XH[36:0] and XH[72:37]), are also non-adjacent interconnects. For example, the FPGA2 chip 1570 of FIG. 39 shows the XH interconnect 1544.
[0327]
Returning to FIG. 37, the vertical hopping interconnects NH[27:0] and SH[27:0] each have 28 pins. The horizontal hopping interconnects XH[36:0] and XH[72:37] together have 73 pins. The horizontal interconnect pins XH[36:0] and XH[72:37] can be used on the west side (e.g., interconnect 1605 of FPGA3 chip 1576 in FIG. 39) and/or the east side (e.g., interconnect 1602 of FPGA0 chip 1573 in FIG. 39). This configuration allows every chip to be manufactured identically. Thus, each chip can be interconnected, one hop away, with the non-adjacent chips located above, below, to the left, and to the right.
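The pin counts given for FIGS. 36 and 37 can be cross-checked with a little arithmetic. The following Python sketch simply sums the widths of the interconnect sets and of the dedicated signals named in the text. The dictionary and function names are illustrative only and are not part of the described system.

```python
# Widths of the interconnect sets named in the text (bus[hi:lo] has hi - lo + 1 pins).
INTERCONNECT_SETS = {
    "N[73:0]": 74, "S[73:0]": 74, "W[73:0]": 74, "E[73:0]": 74,  # directly adjacent
    "NH[27:0]": 28, "SH[27:0]": 28,                              # vertical hopping
    "XH[36:0]": 37, "XH[72:37]": 36,                             # horizontal hopping
}

# Dedicated (non-interconnect) I/O signals and their widths, as listed for FIG. 36.
DEDICATED_SIGNALS = {
    "GCLK": 1, "FD[31:0]": 32, "F_RD": 1, "F_WD": 1, "DATAXSFR": 1,
    "SHIFTIN": 1, "SHIFTOUT": 1, "SPACE[2:0]": 3, "EVAL": 1,
    "EVAL_REQ_N": 1, "DEVICE_OE": 1, "DEV_CLRN": 1,
}


def total_pins(signals):
    """Sum the pin widths of a group of signals."""
    return sum(signals.values())


interconnect_pins = total_pins(INTERCONNECT_SETS)
dedicated_pins = total_pins(DEDICATED_SIGNALS)
print(interconnect_pins, dedicated_pins)  # 425 45
```

Both totals agree with the 425 interconnect pins and the 45 dedicated pins stated for FIG. 36.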
[0328]
FIG. 39 shows the layout of the directly adjacent and one-hop (every-other) FPGA array interconnections for six boards on a single motherboard according to an embodiment of the present invention. This figure is used to illustrate two possible configurations: a six board system and a two board system. The position indicator 1550 shows that the “Y” direction is north-south and the “X” direction is east-west. In the X direction, the array is annular (a torus). In the Y direction, the array is a mesh. FIG. 39 shows only the boards, FPGA logic devices, interconnects, and connectors at a high level. The motherboard, other supporting components (e.g., SRAM memory devices), and wire lines (e.g., the FPGA bus) are not shown.
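The combination of a wrap-around X direction and a mesh Y direction can be illustrated with a small neighbor calculation. The sketch below is a simplified model, not the patent's implementation: the 0-based coordinates, the function name, and the choice that the board index grows toward the south are all assumptions made for illustration.

```python
FPGAS_PER_BOARD = 4  # FPGA0..FPGA3 on each board, per the text


def neighbors(x, y, n_boards, hop=1):
    """Neighbors of FPGA `x` on board index `y` (0-based, hypothetical numbering).

    hop=1 gives the directly adjacent chips, hop=2 the chips one hop away.
    X (east-west) wraps around; Y (north-south) is a mesh, so a link that
    would leave the array is reported as None (terminated or unused).
    Assumption: the board index grows toward the south.
    """
    result = {
        "east": ((x + hop) % FPGAS_PER_BOARD, y),
        "west": ((x - hop) % FPGAS_PER_BOARD, y),
    }
    for name, step in (("north", -hop), ("south", hop)):
        ny = y + step
        result[name] = (x, ny) if 0 <= ny < n_boards else None
    return result


# FPGA0 on the first board of a six board array.
print(neighbors(0, 0, 6))
```

In this model the west neighbor of FPGA0 is FPGA3 on the same board, which mirrors the wrap-around coupling described for the X direction, while the missing north neighbor corresponds to an edge of the Y mesh.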
[0329]
Note that FIG. 39 provides an array view of the boards and their components, interconnects, and connectors. The actual physical configuration and installation involves standing these boards on their respective edges, component side to solder side. About half of the boards are connected directly to the motherboard, and the other half are connected to their respective adjacent boards.
[0330]
In the six board embodiment of the present invention, six boards 1551 (board 1), 1552 (board 2), 1553 (board 3), 1554 (board 4), 1555 (board 5), and 1556 (board 6) are provided on a motherboard (not shown) as part of the reconfigurable hardware unit 20 of FIG. Each board contains an almost identical set of components and connectors. For purposes of illustration, the sixth board 1556 includes FPGA logic devices 1565-1568 and connectors 1557-1560 and 1581. The fifth board 1555 includes FPGA logic devices 1569-1572 and connectors 1582 and 1583, and the fourth board 1554 includes FPGA logic devices 1573-1576 and connectors 1584 and 1585.
[0331]
In this six board configuration, board 1 1551 and board 6 1556 are provided as “bookend” boards. The bookend boards include the Y-mesh terminations, such as R-pack terminations 1557-1560 on board 6 1556 and R-pack terminations 1591-1594 on board 1 1551. Intermediate boards (i.e., boards 1552 (board 2), 1553 (board 3), 1554 (board 4), and 1555 (board 5)) are also provided to complete the array.
[0332]
As described above, the interconnects include the directly adjacent interconnects (i.e., N[73:0], S[73:0], W[73:0], E[73:0]) and the one-hop interconnects (i.e., NH[27:0], SH[27:0], XH[36:0], XH[72:37]), both within a single board and across different boards, and exclude the local bus connections. The interconnects alone can couple logic devices and other components within a single board. However, the internal board connectors 1581-1590 allow communication between FPGA logic devices across different boards (i.e., board 1 through board 6). The FPGA bus is part of the internal board connectors 1581-1590. These connectors 1581-1590 are 600 pin connectors that carry 520 signals and 80 power/ground connections between two adjacent array boards.
[0333]
In FIG. 39, the various boards are arranged asymmetrically with respect to the internal board connectors 1581-1590. For example, internal board connectors 1589 and 1590 are provided between boards 1551 and 1552. Interconnect 1515 couples FPGA logic devices 1511 and 1577 together, and with respect to connectors 1589 and 1590 this connection is symmetric. However, interconnect 1603 is not symmetric. It connects the FPGA logic device 1553 of the third board 1553 to the FPGA logic device 1577 of board 1551. With respect to connectors 1589 and 1590, such an interconnect is not symmetric. Similarly, interconnect 1600 is not symmetric with respect to connectors 1589 and 1590, because interconnect 1600 connects FPGA logic device 1577 to terminal 1591. Terminal 1591 connects to FPGA logic device 1577 via interconnect 1601. Other similar interconnects exist that further illustrate the asymmetry.
[0334]
As a result of this asymmetry, the interconnects are routed through the internal board connectors in two different ways: one for symmetric interconnects such as interconnect 1515, and another for asymmetric interconnects such as interconnects 1603 and 1600. The interconnect routing scheme is shown in FIGS. 40A and 40B.
[0335]
In FIG. 39, an example of a directly adjacent connection within a single board is interconnect 1543, which couples logic device 1570 to logic device 1571 along the east-west direction on board 1555. Another example of a directly adjacent connection within a single board is interconnect 1607, which couples logic device 1573 to logic device 1573 on board 1554. An example of a directly adjacent connection between two different boards is interconnect 1545, which couples logic device 1570 on board 1555 to logic device 1574 on board 1554 via connectors 1583 and 1584 along the north-south direction. Here, the signal is carried across using the two internal board connectors 1583 and 1584.
[0336]
An example of a one-hop interconnect within a single board is interconnect 1544, which couples logic device 1570 to logic device 1572 on board 1555 along the east-west direction. An example of a one-hop interconnect between two different boards couples logic device 1565 on board 1556 to logic device 1573 on board 1554 via connectors 1581-1584. Here, the signals are carried across using the four internal board connectors 1581 to 1584.
[0337]
Some boards, especially those located at the north and south ends on the motherboard, also include 10 Ω R-packs to terminate some connections. Accordingly, the sixth board 1556 includes 10 Ω R-pack connectors 1557 to 1560, and the first board 1551 includes 10 Ω R-pack connectors 1591 to 1594. The sixth board 1556 includes R-pack connector 1557 for interconnects 1970 and 1971, R-pack connector 1558 for interconnects 1972 and 1541, R-pack connector 1559 for interconnects 1973 and 1974, and R-pack connector 1560 for interconnects 1975 and 1976. Further, interconnects 1561-1564 are not connected to anything. Unlike the annular (torus) east-west interconnects, these north-south interconnects are arranged as a mesh.
[0338]
These mesh terminations increase the number of north-south direct interconnects. Otherwise, the interconnects at the north and south edges of the FPGA mesh would all be wasted. For example, FPGA logic devices 1511 and 1577 already share one set of direct interconnects 1515. Additional interconnects are provided between these two FPGA logic devices via R-pack 1591 and interconnects 1600 and 1601. That is, the R-pack connects interconnects 1600 and 1601 together. This increases the number of direct connections between FPGA logic devices 1511 and 1577.
[0339]
Internal board connections are further provided. Logic devices 1577, 1578, 1579, and 1580 on board 1551 are coupled to logic devices 1511, 1512, 1513, and 1514 via interconnects 1515, 1516, 1517, and 1518 and internal board connectors 1589 and 1590. Accordingly, interconnect 1515 couples logic device 1511 on board 1552 to logic device 1577 on board 1551 via connectors 1589 and 1590. Interconnect 1516 couples logic device 1512 on board 1552 to logic device 1578 on board 1551 via connectors 1589 and 1590. Interconnect 1517 couples logic device 1513 on board 1552 to logic device 1579 on board 1551 via connectors 1589 and 1590. Interconnect 1518 couples logic device 1514 on board 1552 to logic device 1580 on board 1551 via connectors 1589 and 1590.
[0340]
Some interconnects, such as 1595, 1596, 1597, and 1598, are not coupled to anything because they are not used. However, as described above for logic devices 1511 and 1577, R-pack 1591 connects interconnects 1600 and 1601 to increase the number of north-south interconnects.
[0341]
A two board embodiment of the present invention is shown in FIG. In the two board embodiment of the present invention, only two boards are needed to model the user's design in the simulation system. Like the six board configuration of FIG. 39, the two board configuration of FIG. 44 uses the same two “bookend” boards (board 1 1551 and board 6 1556). These are provided on the motherboard as part of the reconfigurable hardware unit of FIG. In FIG. 44, the first bookend board is board 1 and the second bookend board is board 6. Board 6 is used in FIG. 44 to show its similarity to board 6 of FIG. That is, bookend boards such as board 1 and board 6 must have the requisite terminations for the north-south mesh connections.
[0342]
The two board configuration includes four FPGA logic devices 1577 (FPGA0), 1578 (FPGA1), 1579 (FPGA2), and 1580 (FPGA3) on board 1 1551, and four FPGA logic devices 1565 (FPGA0), 1566 (FPGA1), 1567 (FPGA2), and 1568 (FPGA3) on board 6 1556. These two boards are connected by internal board connectors 1581 and 1590.
[0343]
These boards include 10 Ω R-packs to terminate some connections. In the two board embodiment, both boards are “bookend” boards. Board 1551 includes 10 Ω R-pack connectors 1591, 1592, 1593, and 1594 as resistive terminations. The second board 1556 also includes 10 Ω R-pack connectors 1557-1560.
[0344]
Board 1551 has connector 1590 and board 1556 has connector 1581 for internal board communication. Interconnects that run from one board to another, such as interconnects 1600, 1971, 1977, 1541, and 1540, pass through these connectors 1590 and 1581. In other words, internal board connectors 1590 and 1581 enable interconnects 1600, 1971, 1977, 1541, and 1540 to connect a component on one board to a component on another board. Internal board connectors 1590 and 1581 carry the control data and control signals of the FPGA bus.
[0345]
For the four board configuration, board 1 and board 6 serve as the bookend boards. Board 2 1552 and board 3 1553 (see FIG. 39) are intermediate boards. When connected to a motherboard in accordance with the present invention (as discussed with respect to FIGS. 38A and 38B), board 1 and board 2 are paired, and board 3 and board 6 are paired.
[0346]
For a six board configuration, board 1 and board 6 serve as the bookend boards as described above. Board 2 1552, board 3 1553, board 4 1554, and board 5 1555 (see FIG. 39) are intermediate boards. When connected to a motherboard in accordance with the present invention (as discussed with respect to FIGS. 38A and 38B), board 1 and board 2 are paired, board 3 and board 4 are paired, and board 5 and board 6 are paired.
[0347]
Additional boards can be provided if desired. However, regardless of how many boards are added to the system, the bookend boards (e.g., board 1 and board 6) must have resistive terminations to complete the mesh array connections. In one embodiment, the minimum configuration is the two board configuration of FIG. Additional boards are added in increments of two. If the initial configuration has board 1 and board 6, a later change to the four board configuration involves moving board 6 further out, pairing board 1 with board 2, and then pairing board 3 with board 6, as described above.
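The pairing rule for the two, four, and six board configurations can be summarized in a few lines. The following Python sketch is only an illustration of the pairings stated above; the function names are hypothetical, and the sketch assumes that intermediate boards are numbered consecutively from 2 and that board 6 always closes the stack.

```python
def board_stack(n_boards):
    """Board numbers in stacking order for the 2, 4, or 6 board configurations."""
    if n_boards not in (2, 4, 6):
        raise ValueError("the text describes only 2, 4, and 6 board configurations")
    # Board 1 opens the stack and board 6 (the other bookend) always closes it;
    # intermediate boards 2, 3, ... fill the positions in between (assumption).
    return list(range(1, n_boards)) + [6]


def board_pairs(n_boards):
    """Group the stack into the pairs described in the text."""
    stack = board_stack(n_boards)
    return [(stack[i], stack[i + 1]) for i in range(0, len(stack), 2)]


for n in (2, 4, 6):
    print(n, board_pairs(n))
# 2 [(1, 6)]
# 4 [(1, 2), (3, 6)]
# 6 [(1, 2), (3, 4), (5, 6)]
```

In each pair, the first board is the one that would sit directly on a motherboard connector, which matches the statement that boards 1, 3, and 5 are directly connected in the six board configuration.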
[0348]
As described above, each logic device is coupled to its directly adjacent logic devices and to its non-adjacent logic devices one hop away. Thus, in FIGS. 39 and 44, logic device 1577 is coupled to the directly adjacent logic device 1578 via interconnect 1547. Logic device 1577 is also coupled to the non-adjacent logic device 1579 via one-hop interconnect 1548. Logic device 1580, however, is considered adjacent to logic device 1577 because of the wrap-around annular configuration, with interconnect 1549 providing the coupling.
[0349]
FIG. 42 shows a top view (component side) of on-board components and connectors of a single board. In one embodiment of the present invention, only one board is required to model a user's design in a simulation system. In other embodiments, multiple boards (ie, at least two boards) are required. Thus, for example, FIG. 39 shows six boards 1551-1556 coupled together via various 600 pin connectors 1581-1590. At the top and bottom, board 1551 is terminated by one set of 10 Ω R-packs and board 1556 is terminated by another set of 10 Ω R-packs.
[0350]
Returning to FIG. 42, the board 1820 includes four FPGA logic devices 1822 (FPGA0), 1823 (FPGA1), 1824 (FPGA2), and 1825 (FPGA3). Two SRAM memory devices 1828 and 1829 are further provided. SRAM memory devices 1828 and 1829 are used to map memory blocks from the logic devices on this board. In other words, the memory simulation according to the present invention maps memory blocks from the logic devices of this board to the SRAM memory devices of this board. Other boards include other logic devices and memory devices to perform similar mapping operations. In one embodiment, the memory mapping is board dependent. That is, the memory mapping of board 1 is limited to the logic devices and memory devices on board 1 and ignores the other boards. In other embodiments, the memory mapping is board independent. In that case, a small number of memory devices are used to map memory blocks from a logic device on one board to a memory device located on another board.
[0351]
A light emitting diode (LED) 1821 is also provided to visually show some selective activity. The LED display is as shown in Table A according to one embodiment of the present invention.
[0352]
[Table 2]

[0353]
Various other control chips, such as PLX PCI controller 1826 and CTRL_FPGA unit 1827, control internal FPGA and PCI communications. An example of a PLX PCI controller 1826 that may be used in the system is the PLX Technology PCI 9080 or 9060. The PCI 9080 provides the appropriate local bus interface, control registers, FIFOs, and PCI interface to the PCI bus. The PLX Technology PCI 9080 Data Sheet (ver. 0.93, February 28, 1997) is hereby incorporated by reference. An example of a CTRL_FPGA unit 1827 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. In a multiple board configuration, only the first board, which is coupled to the PCI bus, includes the PCI controller.
[0354]
Connector 1830 connects board 1820 to a motherboard (not shown) and thus to the PCI bus, power, and ground. For some boards, connector 1830 is not used for direct connection to the motherboard. Thus, in the two board configuration, only the first board is directly coupled to the motherboard. In a six board configuration, only boards 1, 3, and 5 are connected directly to the motherboard, and the remaining boards 2, 4, and 6 rely on their adjacent boards for motherboard connectivity. Internal board connectors J1-J28 are further provided. As the name implies, these connectors J1-J28 provide connections across different boards.
[0355]
The connector J1 is for external power and ground connection. Table B below shows external power pins and corresponding details according to one embodiment of the present invention.
[0356]
[Table 3]

[0357]
The connector J2 is for parallel port connection. Connectors J1 and J2 are used for single board boundary scan testing during operation. Table C below shows the pins and corresponding details of the parallel JTAG port connector J2 according to one embodiment of the present invention.
[0358]
[Table 4]

[0359]
Connectors J3 and J4 are for local bus connections across boards. Connectors J5-J16 are one set of connectors for the FPGA interconnects. Connectors J17-J28 are a second set of connectors for the FPGA interconnects. When arranged component side to solder side, these connectors provide an effective connection between a component on one board and a component on another board. Tables D and E below provide a complete list and details of connectors J1-J28 according to one embodiment of the present invention.
[0360]
[Table 5]

[0361]
The shaded connectors are of the through-hole type. Note that in Table D, the number in brackets [] represents the FPGA logic device number 0-3. Thus, S[0] denotes the 74-bit southward interconnect (i.e., S[73:0] in FIG. 37) of FPGA0.
[0362]
[Table 6]

[0363]
FIG. 43 is a legend for the connectors J1 to J28 in FIGS. 41A to 41F and 42. Generally, a clear block is a surface mount type, while a gray block is a through-hole type. Further, a block with a solid outline represents a connector placed on the component side, and a block with a dotted outline represents a connector placed on the solder side. Thus, the clear block with a solid outline 1840 represents a 2 × 30 header, surface mount, placed on the component side. The clear block with a dotted outline 1841 represents a 2 × 30 receptacle, surface mount, placed on the solder side of the board. The gray block with a solid outline 1842 represents a 2 × 30 or 2 × 45 header, through hole, placed on the component side. The gray block with a dotted outline 1843 represents a 2 × 45 or 2 × 30 receptacle, through hole, placed on the solder side. In one embodiment, the simulation system uses Samtec's SFM and TFM series of 2 × 30 or 2 × 45 microstrip connectors, in both surface mount and through-hole types. The cross-hatched block with a solid outline 1844 represents an R-pack, surface mount, placed on the component side of the board. The cross-hatched block with a dotted outline 1845 represents an R-pack, surface mount, placed on the solder side. The Samtec specifications from Samtec's website catalog are hereby incorporated by reference. Returning to FIG. 42, the connectors J3 to J28 are of the types shown in the legend of FIG. 43.
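The legend of FIG. 43 amounts to a lookup from two visual attributes (fill and outline) to a mounting type and a board side. The sketch below encodes that lookup in Python purely for illustration; the table and function names are hypothetical, and the dotted outline assigned to block 1843 is inferred from its solder side placement.

```python
# Fill pattern -> mounting type, outline style -> board side, per the legend.
MOUNT_BY_FILL = {
    "clear": "surface mount",
    "gray": "through hole",
    "cross-hatch": "surface mount",  # R-packs are described as surface mount
}
SIDE_BY_OUTLINE = {"solid": "component side", "dotted": "solder side"}

# Reference numeral -> (fill, outline, part). Block 1843's dotted outline is
# an inference from its solder side placement, not a quotation.
LEGEND = {
    1840: ("clear", "solid", "2x30 header"),
    1841: ("clear", "dotted", "2x30 receptacle"),
    1842: ("gray", "solid", "2x30 or 2x45 header"),
    1843: ("gray", "dotted", "2x45 or 2x30 receptacle"),
    1844: ("cross-hatch", "solid", "R-pack"),
    1845: ("cross-hatch", "dotted", "R-pack"),
}


def describe(block):
    """Return 'part, mounting type, board side' for a legend block."""
    fill, outline, part = LEGEND[block]
    return f"{part}, {MOUNT_BY_FILL[fill]}, {SIDE_BY_OUTLINE[outline]}"


for numeral in sorted(LEGEND):
    print(numeral, describe(numeral))
```

Keeping fill and outline as separate lookups mirrors the legend itself: the two attributes vary independently, so each block is fully described by one entry from each table plus the part type.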
[0364]
FIGS. 41A-41F show top views of the boards and their respective connectors. FIG. 41A shows the connectors for board 6. Thus, board 1660 includes connectors 1661-1681 along with motherboard connector 1682. FIG. 41B shows the connectors for board 5. Thus, board 1690 includes connectors 1691-1708 along with motherboard connector 1709. FIG. 41C shows the connectors for board 4. Thus, board 1715 includes connectors 1716-1733 along with motherboard connector 1734. FIG. 41D shows the connectors for board 3. Thus, board 1740 includes connectors 1741-1758 along with motherboard connector 1759. FIG. 41E shows the connectors for board 2. Thus, board 1765 includes connectors 1766-1783 along with motherboard connector 1784. FIG. 41F shows the connectors for board 1. Thus, board 1790 includes connectors 1791-1812 along with motherboard connector 1813. As indicated in the description of FIG. 43, these connectors on the six boards are various combinations of (1) surface mount or through hole, (2) component side or solder side, and (3) header, receptacle, or R-pack.
[0365]
In one embodiment, these connectors are used for internal board communication. Related buses and signals are grouped together and supported by these internal board connectors for routing signals between any two boards. In addition, only half of the boards are directly coupled to the motherboard. In FIG. 41A, board 6 1660 has connectors 1661-1668 designated for one set of FPGA interconnects, connectors 1669-1674, 1676, and 1679 designated for another set of FPGA interconnects, and connector 1681 designated for the local bus. Since board 6 1660 is installed as one of the boards at the ends of the motherboard (along with board 1 1790 of FIG. 41F at the other end), connectors 1675, 1677, 1678, and 1680 are designated for the 10 Ω R-pack connections of certain north-south interconnects. Further, motherboard connector 1682 is not used for board 6 1660, as shown in FIG. 38B, where the sixth board 1535 is coupled to the fifth board 1534 rather than directly to the motherboard 1520.
[0366]
In FIG. 41B, board 5 1690 has connectors 1691-1698 designated for one set of FPGA interconnects, connectors 1699-1706 designated for another set of FPGA interconnects, and connectors 1707 and 1708 designated for the local bus. Connector 1709 is used to couple board 5 1690 to the motherboard.
[0367]
In FIG. 41C, board 4 1715 includes connectors 1716-1723 designated for one set of FPGA interconnects, connectors 1724-1731 designated for another set of FPGA interconnects, and connectors 1732 and 1733 designated for the local bus. Connector 1734 is not used to couple board 4 1715 directly to the motherboard. This configuration is also shown in FIG. 38B, where the fourth board 1533 is not directly connected to the motherboard 1520 but is coupled to the third board 1532 and the fifth board 1534.
[0368]
In FIG. 41D, board 3 1740 includes connectors 1741-1748 designated for one set of FPGA interconnects, connectors 1749-1756 designated for another set of FPGA interconnects, and connectors 1757 and 1758 designated for the local bus. Connector 1759 is used to couple board 3 1740 to the motherboard.
[0369]
In FIG. 41E, board 2 1765 includes connectors 1766-1773 designated for one set of FPGA interconnects, connectors 1774-1781 designated for another set of FPGA interconnects, and connectors 1782 and 1783 designated for the local bus. Connector 1784 is not used to couple board 2 1765 directly to the motherboard. This configuration is also shown in FIG. 38B, where the second board 1525 is not directly coupled to the motherboard 1520 but is coupled to the third board 1532 and the first board 1526.
[0370]
In FIG. 41F, board 1 1790 has connectors 1791-1798 designated for one set of FPGA interconnects, connectors 1799-1804, 1806, and 1809 designated for another set of FPGA interconnects, and connectors 1811 and 1812 designated for the local bus. Connector 1813 is used to couple board 1 1790 to the motherboard. Since board 1 1790 is positioned as one of the boards at the ends of the motherboard (along with board 6 1660 of FIG. 41A at the other end), connectors 1805, 1807, 1808, and 1810 are designated for the 10 Ω R-pack connections of certain north-south interconnects.
[0371]
In one embodiment of the present invention, the plurality of boards are coupled to the motherboard and to each other in a unique manner. The boards are coupled together component side to solder side. One of the boards (e.g., the first board) is coupled to the motherboard via the motherboard connector, and thus to the PCI bus. Further, the FPGA interconnect bus of the first board is coupled to the FPGA interconnect bus of another board (e.g., the second board) via a pair of FPGA interconnect connectors. The FPGA interconnect connector on the first board is on the component side, and the FPGA interconnect connector on the second board is on the solder side. The component side connector on the first board and the solder side connector on the second board together allow the FPGA interconnect buses to be coupled.
[0372]
Similarly, the local buses on the two boards are coupled together via a local bus connector. The local bus connector on the first board is on the component side and the local bus connector on the second board is on the solder side. Thus, the local buses can be coupled together by a component side connector on the first board and a solder side connector on the second board, respectively.
[0373]
Additional boards can be added. A third board may be added with its solder side facing the component side of the second board. Similar FPGA interconnect and local bus internal board connections are made. The third board is also coupled to the motherboard via another connector, which, as described further below, simply provides power and ground to the third board.
[0374]
The component side to solder side connection in the two board configuration is discussed with reference to FIG. 38A. This figure shows a side view of the connection of the FPGA boards on a motherboard according to an embodiment of the present invention. FIG. 38A shows the two board configuration, in which, as the name implies, two boards are used. In FIG. 38A, these two boards 1525 (board 2) and 1526 (board 1) correspond to the two boards 1552 and 1551 in FIG. The component sides of boards 1525 and 1526 are denoted by reference numeral 1989. The solder sides of boards 1525 and 1526 are denoted by reference numeral 1988. As shown in FIG. 38A, these two boards 1525 and 1526 are coupled to the motherboard 1520 via the motherboard connector 1523. Other motherboard connectors 1521, 1522, and 1524 may also be provided for expansion purposes. Signals between the PCI bus and boards 1525 and 1526 are routed through motherboard connector 1523. PCI signals are routed between the two board structure and the PCI bus via the first board 1526. Thus, signals from the PCI bus reach the first board 1526 before they travel to the second board 1525. Similarly, signals from the two board structure to the PCI bus are sent from the first board 1526. Power is also applied to boards 1525 and 1526 from a power source (not shown) via motherboard connector 1523.
[0375]
As shown in FIG. 38A, the board 1526 includes several components and connectors. One such component is an FPGA logic device 1530. Connectors 1528A and 1531A are further provided. Similarly, the board 1525 includes a number of components and connectors. One such component is an FPGA logic device 1529. Connectors 1528B and 1531B are further provided.
[0376]
In one embodiment, connectors 1528A and 1528B are FPGA bus internal board connectors, such as connectors 1590 and 1581 (FIG. 44). These internal board connectors provide internal board connectivity for the various FPGA interconnects (e.g., N[73:0], S[73:0], W[73:0], E[73:0], NH[27:0], SH[27:0], XH[36:0], and XH[72:37]), excluding the local bus connections.
[0377]
Further, connectors 1531A and 1531B are local bus internal board connectors. The local bus handles signals between the PCI bus (via the PCI controller) and the FPGA bus (via the FPGA I / O controller (CTRL_FPGA) unit). The local bus also handles configuration and boundary scan test information between the PCI controller and FPGA logic device and the FPGA I / O controller (CTRL_FPGA) unit.
[0378]
That is, the motherboard connector couples one board of a set of boards to the PCI bus and power. One set of connectors couples the FPGA interconnect to the solder side of another board through the component side of one board. Another set of connectors couples the local bus to the solder side of another board via the component side of one board.
[0379]
In another embodiment of the invention, more than two boards are used. FIG. 38B shows a six board configuration. This configuration is similar to that of FIG. 38A: every other board is connected directly to the motherboard, and the interconnects and local buses of these boards are coupled together via internal board connectors arranged solder side to component side.
[0380]
FIG. 38B shows six boards: 1526 (first board), 1525 (second board), 1532 (third board), 1533 (fourth board), 1534 (fifth board), and 1535 (sixth board). These six boards are coupled to the motherboard 1520 via connectors on boards 1526 (first board), 1532 (third board), and 1534 (fifth board). The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520; rather, they are indirectly coupled to the motherboard via their respective connections to adjacent boards.
[0381]
When arranged solder side to component side, the various internal board connectors allow communication between the PCI bus components, FPGA logic devices, memory devices, and various simulation system control circuits. The first set of internal board connectors 1990 corresponds to the connectors J5 to J16 in FIG. The second set of internal board connectors 1991 corresponds to the connectors J17 to J28 in FIG. The third set of internal board connectors 1992 corresponds to connectors J3 and J4 in FIG.
[0382]
Motherboard connectors 1521-1524 are provided on the motherboard 1520 and couple the motherboard (and hence the PCI bus) to the six boards. As described above, boards 1526 (first board), 1532 (third board), and 1534 (fifth board) are directly coupled to connectors 1523, 1522, and 1521, respectively. The other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board) are not directly coupled to the motherboard 1520. Since only one PCI controller is required for all six boards, only the first board 1526 includes a PCI controller. Further, the motherboard connector 1523 coupled to the first board 1526 provides access to and from the PCI bus. Connectors 1522 and 1521 are coupled only to power and ground. The center-to-center spacing between adjacent motherboard connectors is approximately 20.32 mm in one embodiment.
[0383]
For boards 1526 (first board), 1532 (third board), and 1534 (fifth board), which are directly coupled to motherboard connectors 1523, 1522, and 1521, respectively, the connectors J5 to J16 are installed on the component side, the connectors J17 to J28 are installed on the solder side, and the local bus connectors J3 to J4 are installed on the component side. For the other boards 1525 (second board), 1533 (fourth board), and 1535 (sixth board), which are not directly coupled to motherboard connectors 1523, 1522, and 1521, the connectors J5 to J16 are installed on the solder side, the connectors J17 to J28 are installed on the component side, and the local bus connectors J3 to J4 are installed on the solder side. For the end boards 1526 (first board) and 1535 (sixth board), some of the connectors at J17 to J28 are 10 Ω R-pack terminations.
[0384]
FIGS. 40A and 40B show array connections between different boards. To facilitate the manufacturing process, a single layout design is used for all boards. As described above, each board connects to other boards via connectors without a backplane. FIG. 40A shows two example boards 1611 (board 2) and 1610 (board 1). The component side of board 1610 faces the solder side of board 1611. Board 1611 includes a number of FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components of board 1611 are denoted by node A' (reference numeral 1612) and node B' (reference numeral 1614). Node A' is coupled to connector pad 1616 via PCB trace 1620. Similarly, node B' is coupled to connector pad 1617 via PCB trace 1623.
[0385]
Similarly, board 1610 also includes a number of FPGA logic devices, other components, and wire lines. Particular nodes of these logic devices and other components of board 1610 are denoted by node A (reference numeral 1613) and node B (reference numeral 1615). Node A is coupled to connector pad 1618 via PCB trace 1625. Similarly, node B is coupled to connector pad 1619 via PCB trace 1622.
[0386]
The routing of signals between nodes on different boards using surface mount connectors will now be discussed. In FIG. 40A, the desired connections are between (1) node A and node B', as indicated by imaginary paths 1620, 1621, and 1622; and (2) node B and node A', as indicated by imaginary paths 1623, 1624, and 1625. These connections are intended for paths such as the asymmetrical interconnection 1600 between boards 1551 and 1552 of FIG. Other asymmetrical interconnections include the NH-to-SH interconnections 1977, 1979, and 1981 on both sides of connectors 1589 and 1590.
[0387]
A-A' and B-B' correspond to symmetric interconnections, such as interconnect 1515 (N, S). The N and S interconnections use through-hole connectors, whereas the asymmetric NH and SH interconnections use SMD connectors (see Table D).
[0388]
An actual implementation that uses surface mount connectors will now be discussed with reference to FIG. 40B, using like numbers for like items. In FIG. 40B, board 1611 shows node A' on the component side coupled to component-side connector pad 1636 via PCB trace 1620. The component-side connector pad 1636 is coupled to the solder-side connector pad 1639 via conductive path 1651. The solder-side connector pad 1639 is coupled to the component-side connector pad 1642 of board 1610 via conductive path 1648. Finally, component-side connector pad 1642 is coupled to node B via PCB trace 1622. Thus, node A' on board 1611 is coupled to node B on board 1610.
[0389]
Similarly, in FIG. 40B, board 1611 shows node B' on the component side coupled to component-side connector pad 1638 via PCB trace 1623. The component-side connector pad 1638 is coupled to the solder-side connector pad 1637 via conductive path 1650. The solder-side connector pad 1637 is coupled to the component-side connector pad 1640 via conductive path 1645. Finally, component-side connector pad 1640 is coupled to node A via PCB trace 1625. Thus, node B' on board 1611 is coupled to node A on board 1610. Because these boards share the same layout, conductive paths 1652 and 1653 can be used in the same manner as conductive paths 1650 and 1651 for other boards installed adjacent to board 1610. Thus, a unique inter-board connectivity scheme is provided using surface mount and through-hole connectors, without the use of switching components.
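The two cross-board routes described for FIG. 40B can be written down as a small hop table and traced mechanically. The following Python sketch is illustrative only: the reference numerals come from the text, while the dictionary layout and the `trace` helper are assumptions introduced for this example.

```python
# Hops described for FIG. 40B: node -> (next node, element joining them).
# Reference numerals are from the text; this data layout is hypothetical.
HOPS = {
    "A'": ("pad 1636", "PCB trace 1620"),
    "pad 1636": ("pad 1639", "conductive path 1651"),
    "pad 1639": ("pad 1642", "conductive path 1648"),
    "pad 1642": ("B", "PCB trace 1622"),
    "B'": ("pad 1638", "PCB trace 1623"),
    "pad 1638": ("pad 1637", "conductive path 1650"),
    "pad 1637": ("pad 1640", "conductive path 1645"),
    "pad 1640": ("A", "PCB trace 1625"),
}


def trace(start):
    """Follow hops from a start node until no further hop is listed."""
    node, route = start, [start]
    while node in HOPS:
        node, _via = HOPS[node]
        route.append(node)
    return route
```

Tracing from node A' ends at node B, and tracing from node B' ends at node A, which is the asymmetric (crossed) connection the passage describes.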
[0390]
(F. Timing-insensitive glitch-free logic device)
One embodiment of the present invention solves both the hold time and clock glitch problems. While a user design is being configured into the reconfigurable computing system, standard logic devices (e.g., latches, flip-flops) found in the user design are replaced with emulation logic devices, i.e., timing-insensitive glitch-free (TIGF) logic devices, according to one embodiment of the invention. In one embodiment, a trigger signal incorporated into the EVAL signal is used to update the values stored in these TIGF logic devices. After the various input signals and other signals have propagated through the hardware model of the user design and reached steady state during the evaluation period, the trigger signal is provided to update the values stored or latched by the TIGF logic devices. A new evaluation period then begins. The cycle of evaluation period and trigger period is periodic in one embodiment.
[0391]
The hold time problem mentioned above is briefly discussed here. As known to those skilled in the art, a common and pervasive problem in logic circuit design is hold time violation. Hold time is defined as the minimum time for which the data input(s) of a logic element must be held stable after the control input (e.g., the clock) changes to latch, capture, or store the value indicated by the data input(s); otherwise, the logic element cannot operate properly.
[0392]
An example of a shift register will now be discussed to explain the hold time requirement. FIG. 75A shows an exemplary shift register in which three D-type flip-flops are connected in series. That is, the output of flip-flop 2400 is coupled to the input of flip-flop 2401, and the output of flip-flop 2401 is in turn coupled to the input of flip-flop 2402. The overall input signal S_in is coupled to the input of flip-flop 2400, and the overall output signal S_out is generated from the output of flip-flop 2402. All three flip-flops receive a common clock signal at their respective clock inputs. The design of this shift register relies on the following assumptions: (1) the clock signal reaches all the flip-flops at the same time, and (2) after the edge of the clock signal is detected, the input of each flip-flop does not change for the duration of the hold time.
[0393]
Referring to the timing diagram of FIG. 75B, the hold time requirement is illustrated for a system that does not violate it. The hold time varies from one logic element to the next but is always specified in the data sheet. The clock input changes from logic 0 to logic 1 at time t₀. As shown in FIG. 75A, the clock input is provided to each of flip-flops 2400-2402. From this clock edge at t₀, the input S_in must be steady for the duration of the hold time T_H, which lasts from time t₀ to time t₁. Similarly, the inputs to flip-flop 2401 (i.e., D₂) and flip-flop 2402 (i.e., D₃) must also be steady for the duration of the hold time from the triggering edge of the clock signal. This requirement is satisfied in FIGS. 75A and 75B, so the input S_in is shifted into flip-flop 2400, the input at D₂ (logic 0) is shifted into flip-flop 2401, and the input at D₃ (logic 1) is shifted into flip-flop 2402. As known to those skilled in the art, after the clock edge has been triggered, the new values at the inputs of flip-flop 2401 (logic 1 at input D₂) and flip-flop 2402 (logic 0 at input D₃) will be shifted into or stored in the next flip-flop at the next clock cycle, assuming the hold time requirement is met. The following table outlines shift register operation for these example values.
[0394]
[Table 7]

[0395]
In an actual implementation, the clock signal does not reach all the logic elements simultaneously. Rather, the circuit is designed so that the clock signal reaches all the logic elements at nearly or substantially the same time. The clock skew (i.e., the timing difference between the clock signals reaching each flip-flop) must be designed to be much shorter than the hold time requirement so that all logic elements capture the appropriate input values. In the example of FIGS. 75A and 75B, a hold time violation caused by the clock signal reaching flip-flops 2400-2402 at different times results in some flip-flops capturing the old input value while other flip-flops capture the new input value. As a result, the shift register does not operate properly.
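The difference between a clock edge that reaches every flip-flop together and one that reaches them in sequence can be illustrated with a small behavioral model. This is a sketch under stated assumptions rather than the patent's circuit: the function names are hypothetical, the initial value of the third flip-flop output is assumed to be 0, and the skewed case assumes the worst situation in which each downstream flip-flop is clocked only after its upstream neighbor has already changed its output.

```python
def shift_simultaneous(s_in, state):
    """Ideal case: all three flip-flops capture their inputs on the same edge."""
    q1, q2, _q3 = state
    # Each stage stores the OLD output of the stage before it.
    return (s_in, q1, q2)


def shift_skewed(s_in, state):
    """Assumed worst case: the clock arrives at each stage after the previous
    stage has already updated, so every stage sees the NEW upstream value."""
    q1, q2, q3 = state
    q1 = s_in   # flip-flop 2400 updates first
    q2 = q1     # flip-flop 2401 captures the already-updated Q1
    q3 = q2     # flip-flop 2402 captures the already-updated Q2
    return (q1, q2, q3)


# Example values from the text: S_in = 1, D2 (= Q1) = 0, D3 (= Q2) = 1.
# The initial Q3 = 0 is an assumption made for this illustration.
before = (0, 1, 0)
after_ok = shift_simultaneous(1, before)
after_bad = shift_skewed(1, before)
```

With these values the ideal register moves to (1, 0, 1), whereas the skewed one ends at (1, 1, 1): the logic 0 that should have been shifted into the second stage is lost.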
[0396]
In a reconfigurable logic (e.g., FPGA) implementation of the same shift register design, if the clock is generated directly from a primary input, the circuit can be designed so that a low-skew network distributes the clock signal to all logic elements, which then detect the clock edge at substantially the same time. Primary clocks are generated from self-timed test-bench processes. Usually, the primary clock signals are generated in software, and only a few (i.e., 1-10) primary clocks are found in a typical user circuit design.
[0397]
However, hold time violations become even more problematic when the clock signal is generated from internal logic instead of a primary input. Derived or gated clocks are generated from a network of combinational logic and registers that are in turn driven by the primary clocks. Many (i.e., over 1000) derived clocks are found in a typical user circuit design.
[0398]
Without special care or further control, these clock signals reach each logic element at different times, and the clock skew can be longer than the hold time. This can result in circuit design failure, such as the shift register circuit illustrated in FIGS. 75A and 75B.
[0399]
Using the same shift register circuit illustrated in FIG. 75A, a hold time violation will now be discussed. This time, however, the individual flip-flops of the shift register circuit are spread across multiple reconfigurable logic chips (e.g., multiple FPGA chips), as shown in FIG. 76A. The first FPGA chip 2411 includes internally derived clock logic 2410. The clock logic 2410 supplies the clock signal CLK to several components of the FPGA chips 2412-2416. In this example, the internally generated clock signal CLK is provided to flip-flops 2400-2402 of the shift register circuit. Chip 2412 includes flip-flop 2400, chip 2415 includes flip-flop 2401, and chip 2416 includes flip-flop 2402. To illustrate the concept of hold time violations, two other chips 2413 and 2414 are provided.
[0400]
The clock logic 2410 of chip 2411 receives a primary clock input (or possibly another derived clock input) and generates the internal clock signal CLK. This internal clock signal CLK travels to chip 2412, where it is labeled CLK1. The internal clock signal CLK from clock logic 2410 also travels to chip 2415 via chips 2413 and 2414, where it is labeled CLK2. As shown, CLK1 is an input to flip-flop 2400 and CLK2 is an input to flip-flop 2401. Both CLK1 and CLK2 experience wire trace delays, so the edges of CLK1 and CLK2 are delayed relative to the edge of the internal clock signal CLK. Furthermore, because CLK2 travels through the two other chips 2413 and 2414, CLK2 experiences additional delay.
[0401]
Referring to the timing diagram of FIG. 76B, the internal clock signal CLK is generated and triggered at time t₂. Due to wire trace delays, CLK1 does not reach flip-flop 2400 of chip 2412 until time t₃, which is a delay of time T1. As shown in the table above, before the arrival of the clock edge of CLK1, the output at Q₁ (or input D₂) is at logic 0. After the edge of CLK1 is sensed at flip-flop 2400, the input at D₁ must remain steady for the required hold time H2 (i.e., until time t₄). At this point, flip-flop 2400 has shifted in or stored the input logic 1. Therefore, the output at Q₁ (or D₂) is at logic 1.
[0402]
While this is occurring for flip-flop 2400, clock signal CLK2 is making its way to flip-flop 2401 of chip 2415. Due to the delay T2 caused by chips 2413 and 2414, CLK2 arrives at flip-flop 2401 at time t₅. By now, the input at D₂ is at logic 1, and after the hold time for flip-flop 2401 has been satisfied, this logic 1 appears at the output Q₂ (or D₃). Thus, the output Q₂ was at logic 1 before the arrival of CLK2 and remains at logic 1 after the arrival of CLK2. This is an incorrect result: the shift register should have shifted in logic 0. Whereas flip-flop 2400 correctly shifts in the old input value (logic 1), flip-flop 2401 erroneously shifts in the new input value (logic 1). This erroneous operation typically occurs when the clock skew (or timing delay) is longer than the hold time. In this example, T2 > T1 + H2. In sum, as shown in FIG. 76A, hold time violations are likely when a clock signal is generated on one chip and distributed to logic elements residing on different chips, unless precautions are taken.
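The event times of FIG. 76B follow from three delays, and the violation condition T2 > T1 + H2 can be checked directly. The sketch below is a minimal illustration; the function name, the dictionary keys, and the numeric values are hypothetical and are not taken from the figure.

```python
def fig_76b_timeline(t2, T1, H2, T2):
    """Derive the event times named in the text from the three delays.

    t2: time at which the internal clock CLK is triggered
    T1: delay of CLK1 to flip-flop 2400
    H2: hold time after the CLK1 edge
    T2: delay of CLK2 to flip-flop 2401 (through chips 2413 and 2414)
    """
    t3 = t2 + T1          # CLK1 edge reaches flip-flop 2400
    t4 = t3 + H2          # hold time ends; Q1 (= D2) now carries the new value
    t5 = t2 + T2          # CLK2 edge reaches flip-flop 2401
    return {
        "t3": t3,
        "t4": t4,
        "t5": t5,
        # Flip-flop 2401 samples the NEW D2 when its clock arrives after t4,
        # which is the same condition as T2 > T1 + H2.
        "hold_violation": t5 > t4,
    }


# Hypothetical delays in arbitrary time units.
late = fig_76b_timeline(t2=0, T1=2, H2=1, T2=5)
on_time = fig_76b_timeline(t2=0, T1=2, H2=1, T2=2)
```

In the first call CLK2 arrives at t₅ = 5, after t₄ = 3, so the model reports a violation; in the second call CLK2 arrives before t₄ and the old value is captured as intended.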
[0403]
The clock glitch problem mentioned above will now be described with reference to FIGS. 77A and 77B. In general, when the inputs of a circuit change, the output may change to some random value for a very short time before it settles to the correct value. If another circuit examines the output at just the wrong time and reads the random value, the result is incorrect and can be difficult to debug. This random value that adversely affects another circuit is called a glitch. In a typical logic circuit, one circuit may generate a clock signal for another circuit. If uncompensated timing delays are present in one or both circuits, clock glitches (i.e., unplanned occurrences of a clock edge) can occur and produce erroneous results. As with hold time violations, clock glitches arise because certain logic elements in the circuit design change values at different times.
[0404]
FIG. 77A shows an exemplary logic circuit in which some logic elements generate a clock signal for another set of logic elements. That is, D-type flip-flop 2420, D-type flip-flop 2421, and exclusive OR (XOR) gate 2422 generate a clock signal (CLK3) for D-type flip-flop 2423. Flip-flop 2420 receives its data input at D₁ on line 2425 and outputs data at Q₁ on line 2427. Flip-flop 2420 receives its clock input (CLK1) from the clock logic 2424. CLK refers to the clock signal originally generated by clock logic 2424, and CLK1 refers to the same signal as delayed by the time it reaches flip-flop 2420.
[0405]
Flip-flop 2421 receives its data input at D₂ on line 2426 and outputs data at Q₂ on line 2428. Flip-flop 2421 receives its clock input (CLK2) from the clock logic 2424. As described above, CLK refers to the clock signal originally generated by clock logic 2424, and CLK2 refers to the same signal as delayed by the time it reaches flip-flop 2421.
[0406]
The output of flip-flop 2420 on line 2427 and the output of flip-flop 2421 on line 2428 are the inputs to XOR gate 2422. The XOR gate 2422 outputs data, labeled CLK3, to the clock input of flip-flop 2423. Flip-flop 2423 also receives its data input at D₃ on line 2429 and outputs data at Q₃.
[0407]
The clock glitch problem that arises for this circuit will now be described with reference to the timing diagram illustrated in FIG. 77B. The CLK signal is triggered at time t₀. By the time this clock signal (i.e., CLK1) reaches flip-flop 2420, it is already time t₁. CLK2 does not reach flip-flop 2421 until time t₂.
[0408]
Assume that the inputs to D₁ and D₂ are both at logic 1. When CLK1 reaches flip-flop 2420 at time t₁, the output at Q₁ goes to logic 1 (as shown in FIG. 77B). CLK2 arrives at flip-flop 2421 slightly later, at time t₂, so the output Q₂ on line 2428 remains at logic 0 from time t₁ to time t₂. During the time period between t₁ and t₂, XOR gate 2422 generates a logic 1, presented as CLK3, at the clock input of flip-flop 2423, even though the desired signal is logic 0 (1 XOR 1 = 0). This generation of CLK3 between time t₁ and time t₂ is a clock glitch. Consequently, whatever logic value is present at D₃ on input line 2429 of flip-flop 2423 is stored, whether desired or not, and flip-flop 2423 is then ready for the next input on line 2429. In a proper design, the time delays of CLK1 and CLK2 would be minimized so that no clock glitch is generated or, at the very least, so that the clock glitch is too short to affect the rest of the circuit. In the latter case, if the clock skew between CLK1 and CLK2 is short enough, the XOR gate delay is long enough to filter out the glitch, and the rest of the circuit is unaffected.
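The glitch on CLK3 can be reproduced with a few lines of Python that evaluate Q₁ XOR Q₂ at discrete time steps. This is a simplified sketch: time is modeled in integer steps, both flip-flop outputs are assumed to start at logic 0, gate delay is ignored, and the helper names and arrival times are hypothetical.

```python
def q_at(t, clock_arrival, d):
    """Flip-flop output at time t: 0 (assumed initial value) until its clock
    edge arrives, then the captured data value d."""
    return d if t >= clock_arrival else 0


def clk3(t, t1, t2, d1=1, d2=1):
    """Output of XOR gate 2422 at time t, ignoring gate delay."""
    return q_at(t, t1, d1) ^ q_at(t, t2, d2)


def glitch_times(t1, t2, d1=1, d2=1, horizon=10):
    """Time steps at which CLK3 differs from its settled value d1 XOR d2."""
    settled = d1 ^ d2
    return [t for t in range(horizon) if clk3(t, t1, t2, d1, d2) != settled]


# CLK1 arrives at step 2 and CLK2 at step 5 (hypothetical values).
skewed = glitch_times(t1=2, t2=5)
aligned = glitch_times(t1=3, t2=3)
```

With skewed arrivals CLK3 is spuriously at logic 1 during steps 2, 3, and 4, which is the interval between t₁ and t₂ in the text; when both clocks arrive together there is no glitch.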
[0409]
Two known solutions to the hold time violation problem are (1) timing adjustment and (2) timing resynthesis. Timing adjustment, as described in US Pat. No. 5,478,830, requires the insertion of sufficient delay elements in the appropriate signal paths to extend the hold time of the logic elements. For example, adding sufficient delay at the inputs D₂ and D₃ of the shift register circuit described above can prevent hold time violations. Accordingly, FIG. 78 shows the same shift register circuit with delay elements 2430 and 2431 added at the inputs D₂ and D₃, respectively. As a result, delay element 2430 can be designed so that time t₄ occurs after time t₅, i.e., T2 < T1 + H2 (FIG. 76B), so that no hold time violation occurs.
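The sizing of such a delay element can be sketched as a shortfall calculation: how far the late clock overshoots the point at which the upstream data changes. This is an illustrative simplification under the same timing model as FIG. 76B, not the method of the cited patent; the function name and numbers are hypothetical.

```python
def hold_shortfall(T1, H2, T2):
    """Amount by which T2 exceeds T1 + H2, or 0 if it does not.

    A delay element on the data path must add more than this shortfall
    (plus design margin) so that the data changes only after the late
    clock edge has arrived.
    """
    return max(0, T2 - (T1 + H2))


# Hypothetical delays in arbitrary time units.
needed = hold_shortfall(T1=2, H2=1, T2=5)
already_safe = hold_shortfall(T1=2, H2=2, T2=3)
```

Because the real look-up table and wiring delays are estimates that vary from chip to chip, any padding computed this way would need margin on top of the shortfall.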
[0410]
A potential problem with the timing adjustment solution is its strong dependence on the FPGA chip specification. As known to those skilled in the art, a reconfigurable logic chip, such as an FPGA chip, implements logic elements using look-up tables. The look-up table delay of a chip is given in its specification, and designers using the timing adjustment method rely on this specified delay to avoid hold time violations. However, this delay is only an estimate and varies from chip to chip. Another potential problem with the timing adjustment method is that the designer must also compensate for the wiring delays present throughout the circuit design. This is not an impossible task, but estimating wiring delays is time consuming and error prone. Furthermore, the timing adjustment method does not solve the clock glitch problem.
[0411]
Another solution is timing resynthesis, a concept introduced by IKOS's VirtualWires technology. Timing resynthesis transforms the user's circuit design into a functionally equivalent design while strictly controlling the timing of clock and pin-out signals through finite state machines and registers. It retimes the user's circuit design by introducing a single high-speed clock, and it converts latches, gated clocks, and multiple synchronous and asynchronous clocks into a flip-flop-based, single-clock synchronous design. Timing resynthesis thus uses registers at the input and output pin-outs of each chip to control inter-chip signal movement precisely so that no inter-chip hold time violations occur. It also uses a finite state machine on each chip to schedule inputs from other chips, schedule outputs to other chips, and schedule updates of internal flip-flops based on a reference clock.
[0412]
Using the same shift register circuit introduced above with respect to FIGS. 75A, 75B, 76A, and 76B, FIG. 79 shows an example of a timing resynthesis circuit. The basic three flip-flop shift register design has been converted into a functionally equivalent circuit. Chip 2430 includes the original internal clock generation logic 2435, which is coupled to register 2443 via line 2448. Clock logic 2435 generates the CLK signal. Further, the first finite state machine 2438 is coupled to register 2443 via line 2449. Register 2443 and the first finite state machine 2438 are controlled by an independently designed global reference clock.
[0413]
In addition, the CLK signal is delivered across chips 2432 and 2433 before it reaches chip 2434. At chip 2432, the second finite state machine 2440 controls register 2445 via line 2462. The CLK signal travels from register 2443 to register 2445 via line 2461. Register 2445 outputs the CLK signal to the next chip 2433 via line 2463. Chip 2433 includes a third finite state machine 2441 that controls register 2446 via line 2464. Register 2446 outputs the CLK signal to chip 2434.
[0414]
Chip 2431 includes the original flip-flop 2436. Register 2444 receives the input S_in and outputs it via line 2452 to the D₁ input of flip-flop 2436. The Q₁ output of flip-flop 2436 is coupled to register 2466 via line 2454. The fourth finite state machine 2439 controls register 2444 via line 2451, register 2466 via line 2455, and flip-flop 2436 via latch enable line 2453. Further, the fourth finite state machine 2439 receives the original clock signal CLK from chip 2430 via line 2450.
[0415]
Chip 2434 includes the original flip-flop 2437, which receives the signal from register 2466 of chip 2431 at its D₂ input via line 2456. The Q₂ output of flip-flop 2437 is coupled to register 2447 via line 2457. The fifth finite state machine 2442 controls register 2447 via line 2459 and flip-flop 2437 via latch enable line 2458. Further, the fifth finite state machine 2442 receives the original clock signal CLK from chip 2430 via chips 2432 and 2433.
[0416]
With timing resynthesis, finite state machines 2438-2442, registers 2443-2447 and 2466, and a single global reference clock are used to control signal flow across multiple chips and to update internal flip-flops. Thus, at chip 2430, the distribution of the CLK signal to other chips is scheduled by the first finite state machine 2438 via register 2443. Similarly, at chip 2431, the fourth finite state machine 2439 schedules delivery of the input S_in to flip-flop 2436 via register 2444 and delivery of the Q₁ output via register 2466. Further, the latch function of flip-flop 2436 is controlled by a latch enable signal from the fourth finite state machine 2439. The same principle applies to the logic in the other chips 2432-2434. Such tight control of inter-chip input delivery schedules, inter-chip output delivery schedules, and internal flip-flop state updates eliminates inter-chip hold time violations.
[0417]
However, timing resynthesis requires the user's circuit design to be converted into a much larger functionally equivalent circuit that includes added finite state machines and registers. Typically, the additional logic required to implement this technique accounts for up to 20% of the logic available on each chip. Furthermore, this technique is not immune to the clock glitch problem. To avoid clock glitches, designers using timing resynthesis must take additional precautionary steps. One conservative design approach is to design the circuit so that the inputs to logic devices driven by gated clocks do not change simultaneously. A more aggressive approach uses gate delays to filter out glitches so that they do not affect the rest of the circuit. In either case, as noted above, timing resynthesis requires additional non-trivial measures to avoid clock glitches.
[0418]
Various embodiments of the present invention that solve both the hold time and clock glitch problems will now be described. While the user design is being configured and mapped into the software model of the RCC computing system and the hardware model of the RCC array, the latch shown in FIG. 18A is emulated using a timing-insensitive glitch-free (TIGF) latch according to one embodiment of the invention. Similarly, the design flip-flop shown in FIG. 18B is emulated using a TIGF flip-flop according to one embodiment of the present invention. These TIGF logic devices, whether in the form of latches or flip-flops, can also be referred to as emulation logic devices. The updating of the TIGF latches and flip-flops is controlled by a global trigger signal.
[0419]
In one embodiment of the present invention, not all of the logic devices found in the user design circuit are replaced with TIGF logic devices. A user design circuit includes portions that are enabled or clocked by the primary clocks and other portions that are controlled by gated or derived clocks. Because hold time violations and clock glitches are an issue only in the latter case, where logic devices are controlled by gated or derived clocks, only those particular logic devices are replaced with TIGF logic devices according to the present invention. In other embodiments, all logic devices found in the user design circuit are replaced with TIGF logic devices.
[0420]
Before describing the TIGF latch and flip-flop embodiments of the present invention, the global trigger signal will be described. In general, the global trigger signal is used to make the TIGF latches and flip-flops hold their state (i.e., keep the old input value) during the evaluation period and update their state (i.e., store the new input value) during the short trigger period. In one embodiment, the global trigger signal, shown in FIG. 82, is separate from and derived from the EVAL signal described above. In this embodiment, the global trigger signal has a long evaluation period followed by a short trigger period. The global trigger signal tracks the EVAL signal during the evaluation period, and at the conclusion of the EVAL cycle, a short trigger signal is generated to update the TIGF latches and flip-flops. In another embodiment, the EVAL signal itself is the global trigger signal: the EVAL signal is at one logic state (e.g., logic 0) during the evaluation period and at another logic state (e.g., logic 1) during the non-evaluation, or TIGF latch/flip-flop update, period.
[0421]
As described above with respect to the RCC computing system and RCC hardware array, the evaluation period is used to propagate all changes of the primary inputs and flip-flop/latch devices through the entire user design, one simulation cycle at a time. During this propagation, the RCC system waits until all signals in the system reach steady state. The evaluation period is calculated after the user design has been mapped and placed onto the appropriate reconfigurable logic devices (e.g., FPGA chips) in the RCC array. The evaluation period is therefore design specific; that is, the evaluation period for one user design may differ from the evaluation period for another user design. The evaluation period must be long enough to ensure that all signals propagate through the entire system and reach steady state before the next short trigger period.
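The text does not specify how the evaluation period is calculated. One plausible reading, offered here only as an assumption, is a worst-case settling time: the longest accumulated delay along any path through the placed design, plus a margin. The sketch below uses a hypothetical netlist representation (`delays`, `fanout`, `sources`) that is not taken from the patent and assumes the combinational network is acyclic.

```python
from functools import lru_cache


def evaluation_period(delays, fanout, sources, margin=0):
    """Longest source-to-sink delay in an acyclic network, plus a margin.

    delays:  {node: propagation delay through that node}
    fanout:  {node: iterable of nodes it drives}
    sources: nodes where changes originate (primary inputs, stored values)
    """
    @lru_cache(maxsize=None)
    def settle(node):
        # Time for a change entering `node` to settle everything downstream.
        downstream = max((settle(n) for n in fanout.get(node, ())), default=0)
        return delays[node] + downstream

    return max(settle(s) for s in sources) + margin


# Hypothetical four-node design, delays in arbitrary units.
delays = {"in": 0, "g1": 3, "g2": 2, "ff_d": 1}
fanout = {"in": ("g1", "g2"), "g1": ("ff_d",), "g2": ("ff_d",)}
period = evaluation_period(delays, fanout, sources=["in"], margin=1)
```

Here the slowest path is `in` to `g1` to `ff_d` (0 + 3 + 1), so the sketch returns 5 with the margin of 1. A different placement would change the delays and therefore the period, which is consistent with the evaluation period being design specific.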
[0422]
As shown in FIG. 82, the short trigger period is adjacent in time to the evaluation period. In one embodiment, the short trigger period occurs after the evaluation period. Prior to this short trigger period, the input signals propagate across the hardware model components of the user design circuit during the evaluation period. The short trigger period, which in one embodiment of the present invention is marked by a change in the logic state of the EVAL signal, controls all the TIGF latches and flip-flops of the user design so that they can be updated with the new values propagated during the evaluation period once steady state has been achieved. The short trigger signal is distributed globally over a low-skew network, and its duration may be as short as is necessary for the reconfigurable logic devices to operate properly (i.e., the durations t₀ to t₁ and t₂ to t₃ shown in FIG. 82). During this short trigger period, the new primary inputs are sampled at each input stage of the TIGF latches and flip-flops, and the old values stored in the same TIGF latches and flip-flops are output to the next stage of the RCC hardware model of the user design. In the following description, the part of the global trigger signal that occurs during the short trigger period is referred to as the TIGF trigger, the TIGF trigger signal, the trigger signal, or simply the trigger.
[0423]
FIG. 80A shows the latch 2470 originally shown in FIG. 18A. The latch operation is as follows.
[0424]
if (#S), Q ← 1
else if (#R), Q ← 0
else if (en), Q ← D
else Q keeps the old value.
Since this latch is level sensitive and asynchronous, output Q tracks input D as long as the clock input is enabled and the latch enable input is enabled.
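The priority rules listed above translate directly into a next-state function. The Python sketch below is a behavioral illustration only; the function name is hypothetical, and the set and reset inputs are modeled simply as asserted (true) or not, without regard to their electrical polarity.

```python
def latch_next(q, d, en, s=False, r=False):
    """Next output of the conventional latch, following the listed priority:
    set, then reset, then enable (transparent), otherwise hold."""
    if s:
        return 1
    if r:
        return 0
    if en:
        return d      # level sensitive: Q follows D while enabled
    return q          # hold the old value


# A momentary pulse on the enable input is enough to change the stored value.
q = 0
q = latch_next(q, d=1, en=False)   # holds 0
held = q
q = latch_next(q, d=1, en=True)    # glitch on enable: captures 1
captured = q
```

Because the latch is transparent whenever the enable is asserted, even a brief unintended pulse on the enable input changes its state, which is the sensitivity that the TIGF replacement is meant to remove.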
[0425]
FIG. 80B illustrates a TIGF latch according to one embodiment of the present invention. Like the latch of FIG. 80A, the TIGF latch has a D input, an enable input, a set (S) input, a reset (R) input, and an output Q. In addition, the TIGF latch has a trigger input. The TIGF latch includes a D-type flip-flop 2471, a multiplexer 2472, an OR gate 2473, an AND gate 2474, and various interconnections.
[0426]
D-type flip-flop 2471 receives its input from the output of AND gate 2474 via line 2476. This D-type flip-flop is also triggered by a trigger signal on line 2477 at its clock input. This trigger signal is distributed globally by the RCC system according to a strict schedule that depends on the evaluation cycle. The output of D-type flip-flop 2471 is coupled to one of the inputs of multiplexer 2472 via line 2478. The other input of multiplexer 2472 is coupled to the D input of the TIGF latch on line 2475. This multiplexer is controlled by an enable signal on line 2484. The output of multiplexer 2472 is coupled to one of the inputs of OR gate 2473 via line 2479. The other input of OR gate 2473 is coupled to the set (S) input on line 2480. The output of OR gate 2473 is connected via line 2481 to one of the inputs of AND gate 2474. The other input of AND gate 2474 is connected to the reset (R) signal on line 2482. The output of the AND gate 2474 is fed back to the input of the D-type flip-flop 2471 via the line 2476 as described above.
[0427]
The operation of this TIGF latch embodiment of the present invention will now be described. In this embodiment of the TIGF latch, the D-type flip-flop 2471 holds the current state (i.e., old value) of the TIGF latch. Line 2476 at the input of D-type flip-flop 2471 presents the new input value that has yet to be latched into the TIGF latch. Line 2476 presents the new value because the primary input (D input) of the TIGF latch on line 2475 makes its way from the input of multiplexer 2472 (given the proper enable signal on line 2484, which will ultimately be presented), through OR gate 2473, and finally through AND gate 2474 onto line 2483, which feeds the new input signal of the TIGF latch back to D-type flip-flop 2471 on line 2476. The trigger signal on line 2477 updates the TIGF latch by clocking the new input value on line 2476 into D-type flip-flop 2471. Thus, the output on line 2478 of D-type flip-flop 2471 indicates the current state (i.e., old value) of the TIGF latch, while the input on line 2476 indicates the new input value that has yet to be latched by the TIGF latch.
[0428]
Multiplexer 2472 receives the current state from D-type flip-flop 2471 and the new input value on line 2475. The enable line 2484 functions as the selector signal for multiplexer 2472. Because the TIGF latch is not updated (i.e., the new input value is not stored) until the trigger signal is applied on line 2477, the D input of the TIGF latch on line 2475 and the enable input on line 2484 can arrive at the TIGF latch in any order. If this TIGF latch (and the other TIGF latches in the user-design hardware model) encounters a situation that would normally cause a hold-time violation in a circuit using conventional latches, such as the situation described above with respect to FIGS. 76A and 76B in which one clock signal arrives much later than another, the TIGF latch still functions properly by holding the appropriate old value until the trigger signal is provided on line 2477.
[0429]
This trigger signal is distributed through a low-skew global network.
Furthermore, this TIGF latch solves the clock glitch problem. Note that the clock signal is replaced by an enable signal in the TIGF latch. The enable signal on line 2484 may glitch often during the evaluation period, but the TIGF latch continues to hold the current state without fail. The only mechanism by which the TIGF latch can be updated is the trigger signal, which, in one embodiment, is provided after the evaluation period, once the signals have reached steady state.
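The latch behavior described above can be sketched as a small behavioral model. This is an illustrative Python sketch, not the circuit itself: the class and method names are hypothetical, and the reset line feeding AND gate 2474 is assumed to be active-low (a 0 on that line forces the latch to 0).

```python
class TigfLatch:
    """Behavioral model of the TIGF latch (names are illustrative)."""

    def __init__(self, state=0):
        self.state = state  # D-type flip-flop 2471: current (old) value

    def new_value(self, d, enable, s=0, r_n=1):
        """Combinational value on line 2476 for the present inputs."""
        mux = d if enable else self.state   # multiplexer 2472
        return (mux | s) & r_n              # OR gate 2473, then AND gate 2474

    def trigger(self, d, enable, s=0, r_n=1):
        """Trigger on line 2477: clock the settled new value into the latch."""
        self.state = self.new_value(d, enable, s, r_n)
        return self.state


latch = TigfLatch(state=0)
# The enable input glitches during evaluation; the stored state never moves.
seen = []
for enable in (0, 1, 0):
    latch.new_value(d=1, enable=enable)
    seen.append(latch.state)
after_glitch = latch.trigger(d=1, enable=0)   # enable settled low
after_enable = latch.trigger(d=1, enable=1)   # enable settled high
```

In this model only `trigger` writes the stored state, which mirrors the point made in the text: whatever the enable line does during evaluation, the old value is kept until the trigger arrives.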
[0430]
FIG. 81A shows the flip-flop 2490 originally shown in FIG. 18B. This flip-flop operates as follows.
[0431]
if (#S), Q ← 1
else if (#R), Q ← 0
else if (positive edge of CLK), Q ← D
else Q keeps the old value.
Because this flip-flop is edge-triggered, the output Q follows the input D at the positive edge of the clock signal for as long as the flip-flop enable input is enabled.
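The priority rules listed above can be written as a next-state function. The following Python sketch is illustrative only; the function name and the convention of passing the previous and current clock levels are assumptions made for the example.

```python
def dff_next(q, d, clk_prev, clk, s=0, r=0):
    """Next value of Q for the conventional flip-flop of FIG. 81A."""
    if s:                              # set has the highest priority
        return 1
    if r:                              # then reset
        return 0
    if clk_prev == 0 and clk == 1:     # positive edge of CLK
        return d
    return q                           # otherwise Q keeps the old value


# A glitch on CLK is indistinguishable from a genuine rising edge here,
# so the flip-flop captures D even though it should not.
q_after_glitch = dff_next(q=0, d=1, clk_prev=0, clk=1)
```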
[0432]
FIG. 81B shows a TIGF D-type flip-flop according to an embodiment of the present invention. Similar to the flip-flop of FIG. 81A, the TIGF flip-flop has a D input, a clock input, a set (S), a reset (R), and an output Q. In addition, the TIGF flip-flop has a trigger input. The TIGF flip-flop includes three D-type flip-flops 2491, 2492, and 2496, a multiplexer 2493, an OR gate 2494, two AND gates 2495 and 2497, and various interconnections.
[0433]
Flip-flop 2491 receives the TIGF D input on line 2498 and the trigger input on line 2499, and provides its Q output on line 2500. The output line 2500 also functions as one input to multiplexer 2493. The other input to multiplexer 2493 comes from the Q output of flip-flop 2492 via line 2503. The output of multiplexer 2493 is coupled via line 2505 to one of the inputs of OR gate 2494. The other input of OR gate 2494 is the set (S) signal on line 2506. The output of OR gate 2494 is coupled via line 2507 to one of the inputs of AND gate 2495. The other input of AND gate 2495 is the reset (R) signal on line 2508. The output of AND gate 2495 (which is also the overall TIGF output Q) is coupled to the input of flip-flop 2492 via line 2501. In addition, flip-flop 2492 has a trigger input on line 2502.
[0434]
Returning to multiplexer 2493, its selector input is coupled to the output of AND gate 2497 via line 2509. AND gate 2497 receives one input from the CLK signal on line 2510 and the other input from the output of flip-flop 2496 via line 2512. Flip-flop 2496 in turn receives its input from the CLK signal on line 2511 and its trigger input on line 2513.
[0435]
The operation of an embodiment of the TIGF flip-flop of the present invention will now be described. In this embodiment, the TIGF flip-flop receives the trigger signal at three different points: D-type flip-flop 2491 via line 2499, D-type flip-flop 2492 via line 2502, and D-type flip-flop 2496 via line 2513.
[0436]
The TIGF flip-flop stores an input value only when an edge of the clock signal is detected. According to one embodiment of the invention, the requested edge is the positive edge of the clock signal. An edge detector 2515 is provided to detect the positive edge of the clock signal. The edge detector 2515 includes a D-type flip-flop 2496 and an AND gate 2497. In addition, edge detector 2515 is updated via a trigger signal on line 2513 of D-type flip-flop 2496.
[0437]
D-type flip-flop 2491 holds the new input value of the TIGF flip-flop until a trigger signal is provided on line 2499, and it is unaffected by any changes on the D input on line 2498 in the meantime. Therefore, before each evaluation period of the TIGF flip-flop, the new value is stored in D-type flip-flop 2491. The TIGF flip-flop thus avoids hold-time violations by pre-storing the new value until the TIGF flip-flop is updated by the trigger signal.
[0438]
D-type flip-flop 2492 holds the current value (i.e., the old value) of the TIGF flip-flop until a trigger signal is provided on line 2502. This value is the state of the emulated TIGF flip-flop after it has been updated and before the next evaluation period. The input to D-type flip-flop 2492 on line 2501 holds the new value (which is the same as the value on line 2500 for a significant portion of the evaluation period).
[0439]
Multiplexer 2493 receives the new input value on line 2500 and the old value currently stored in the TIGF flip-flop on line 2503. Based on the selector signal on line 2504, the multiplexer outputs either the new value (line 2500) or the old value (line 2503) as the output of the emulated TIGF flip-flop. Until the signals propagating through the user-design hardware model approach steady state, this output may change with any clock glitch. By the end of the evaluation period, however, the input on line 2501 presents the new value stored in flip-flop 2491. When the trigger signal is received by the TIGF flip-flop, flip-flop 2492 stores the new value present on line 2501 and flip-flop 2491 stores the next new value on line 2498. Therefore, the TIGF flip-flop according to an embodiment of the present invention is not adversely affected by clock glitches.
[0440]
To elaborate further, this TIGF flip-flop also provides some immunity to clock glitches. Those skilled in the art will recognize that if flip-flops 2420, 2421, and 2423 shown in FIG. 77A are replaced with the TIGF flip-flop embodiment of FIG. 81B, clock glitches will not affect any circuit that uses this TIGF flip-flop. Referring to FIGS. 77A and 77B for a moment, the clock glitch adversely affects the circuit of FIG. 77A because, during the interval from time t₁ to time t₂, flip-flop 2423 clocks in a new value when it should not. The skew between the CLK1 and CLK2 signals causes XOR gate 2422 to generate a logic 1 during the interval from t₁ to t₂, which drives the clock line of the next flip-flop 2423. With the TIGF flip-flop according to embodiments of the present invention, the clock glitch does not affect the clocking in of the new value. When flip-flop 2423 is replaced with a TIGF flip-flop, once the signals have reached steady state during the evaluation period, the trigger signal during the short trigger period allows the TIGF flip-flop to store the new value in flip-flop 2491 (FIG. 81B). Thereafter, a clock glitch such as that of FIG. 77B during the interval from t₁ to t₂ does not clock in a new value. The TIGF flip-flop is updated only by the trigger signal, and the trigger signal is not presented to the TIGF flip-flop until after the evaluation period, when the signals propagating through the circuit have reached steady state.
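The glitch immunity described above can be illustrated with a behavioral model of the TIGF flip-flop of FIG. 81B. This Python sketch rests on two assumptions that are not stated in the text: the edge detector is modeled as CLK AND NOT (previously sampled CLK), and the reset line into AND gate 2495 is treated as active-low. All names are hypothetical.

```python
class TigfFlipFlop:
    """Behavioral model of the TIGF D-type flip-flop (illustrative only)."""

    def __init__(self):
        self.d_held = 0    # flip-flop 2491: pre-stored new input value
        self.state = 0     # flip-flop 2492: current (old) value
        self.clk_held = 0  # flip-flop 2496: CLK level at the last trigger

    def q(self, clk, s=0, r_n=1):
        """Output Q (line 2501) for the present CLK level."""
        edge = clk & (1 - self.clk_held)          # edge detector 2515 (assumed)
        mux = self.d_held if edge else self.state  # multiplexer 2493
        return (mux | s) & r_n                     # OR 2494, then AND 2495

    def trigger(self, d, clk, s=0, r_n=1):
        """Trigger: all three internal flip-flops update together."""
        new_state = self.q(clk, s, r_n)
        self.state, self.d_held, self.clk_held = new_state, d, clk
        return self.state


ff = TigfFlipFlop()
ff.trigger(d=1, clk=0)                      # pre-store D = 1 while CLK is low
glitch_outputs = [ff.q(1), ff.q(0)]         # CLK glitches high, then settles low
state_after_glitch = ff.trigger(d=1, clk=0)
output_on_edge = ff.q(1)                    # genuine rising edge, CLK stays high
state_after_edge = ff.trigger(d=0, clk=1)
```

The output may wobble while the clock glitches, but the stored state is taken only from the settled value at trigger time, so the glitch leaves no trace while the genuine edge is captured.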
[0441]
A particular embodiment of a TIGF flip-flop is a D-type flip-flop, but other flip-flops (eg, T, JK, SR) are within the scope of the present invention. Other types of edge-triggered flip-flops can be generated from D-type flip-flops by adding some AND / OR logic before the D input.
[0442]
(VII. Simulation server)
A simulation server according to another embodiment of the present invention is provided so that multiple users can access the same reconfigurable hardware unit to efficiently simulate and accelerate the same or different user designs in a time-shared manner. A high-speed simulation scheduler and state swapping mechanism are used to feed the simulation server with active simulation processes, which results in high throughput. The server allows multiple users or multiple processes to access the reconfigurable hardware unit for acceleration and hardware state swapping purposes. Once the acceleration has been achieved or the hardware state has been accessed, each user or process can then simulate in software only, thereby releasing control of the reconfigurable hardware unit to other users or processes.
[0443]
In the simulation server portion of this specification, terms such as “job” and “process” are used. As used herein, the terms “job” and “process” are generally used interchangeably. In the past, batch systems executed “jobs” and time-shared systems stored and executed “processes” or programs. In today's systems, these jobs and processes are similar. Therefore, in this specification, the term “job” is not limited to batch-type systems and “process” is not limited to time-shared systems. Rather, at one extreme, a “job” is equivalent to a “process” if the “process” can be executed within a time slice or without interruption by any other time-shared intervenor. At the other extreme, a “job” is a subset of a “process” if the “process” requires multiple time slices to complete. Thus, if a “process” requires multiple time slices to run to completion because of the presence of other equal-priority users/processes, the “process” is divided into “jobs”. Conversely, if the “process” does not require multiple time slices to run to completion, either because it is the sole high-priority user or because it is short enough to complete within a time slice, the “process” is equivalent to a “job”. Thus, a user can interact with one or more “processes” or programs loaded and executed in the simulation system, and each “process” may require one or more “jobs” to complete on a time-shared system.
[0444]
In one system configuration, multiple users at remote terminals can use the same microprocessor workstation in a non-network environment to access the same reconfigurable hardware unit in order to review/debug the same user circuit design or different user circuit designs. In a non-network environment, the remote terminals are connected to a main computing system to access its processing functions. This non-network configuration allows multiple users to share access to the same user design for parallel debugging purposes. The access is achieved through a time-shared process in which a scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In another example, multiple users may access the same reconfigurable hardware unit via a server, each for his or her own separate user design, for debugging purposes. In this configuration, the multiple users or processes share the multiple microprocessors of a workstation with the operating system. In another configuration, multiple users or processes at separate microprocessor-based workstations can access the same reconfigurable hardware unit across a network to review/debug the same user circuit design or different user circuit designs. Similarly, this access is achieved through a time-shared process in which the scheduler determines access priorities for the multiple users, swaps jobs, and selectively locks hardware unit access among the scheduled users. In a network environment, the scheduler listens for network requests via UNIX socket system calls. The operating system uses sockets to send commands to the scheduler.
[0445]
As described above, the simulation scheduler uses a preemptive multi-priority round robin algorithm. In other words, higher-priority users or processes are served first, until the user or process completes the job and ends the session. Among users or processes of equal priority, a preemptive round robin algorithm is used in which each user or process is assigned an equal time slice in which to perform its operations until completion. The time slice is short enough that multiple users or processes do not have to wait a long time before being served. The time slice is also long enough that a sufficient number of operations are executed before the simulation server's scheduler interrupts one user or process in order to swap in and execute a new user's job. In one embodiment, the default time slice is 5 seconds and is user configurable. In one embodiment, the scheduler makes specific calls to the operating system's built-in scheduler.
[0446]
FIG. 45 illustrates a non-network environment using a multiprocessor workstation according to one embodiment of the invention. FIG. 45 is a modification of FIG. 1, and therefore like reference numerals are used for like components/units. The workstation 1100 includes a local bus 1105, a host/PCI bridge 1106, a memory bus 1107, and a main memory 1108. A cache memory subsystem (not shown) may further be provided. Other user interface units (e.g., monitor, keyboard) are also provided but are not shown in the figure. The workstation 1100 further includes a plurality of microprocessors 1101, 1102, 1103, and 1104 connected to the local bus 1105 via a scheduler 1117 and a connection/path 1118. As is well known, the operating system 1121 provides the basis of the user-hardware interface for the entire computing environment, managing files and allocating resources for the various users, processes, and devices in the computing environment. For conceptual purposes, the operating system 1121 is shown with a bus 1122. References on operating systems include Abraham Silberschatz and James L. Peterson, OPERATING SYSTEM CONCEPTS (1998), and William Stallings, MODERN OPERATING SYSTEMS (1996), which are incorporated herein by reference.
[0447]
In one embodiment, workstation 1100 is a Sun Microsystems Enterprise 450 system that uses UltraSPARC II processors. Instead of accessing memory via the local bus, the Sun 450 system allows the multiple processors to access memory over dedicated buses to memory through a crossbar switch. Thus, multiple processes may run on multiple microprocessors, each executing its own instructions, and access memory without going through the local bus. The specification of the Sun 450 system with Sun UltraSPARC multiprocessors is incorporated herein by reference. The Sun Ultra 60 system is another example of a multiprocessor system, but it allows only two processors.
[0448]
The scheduler 1117 provides time-shared access to the reconfigurable hardware unit 20 via the device driver 1119 and connection/path 1120. The scheduler 1117 is implemented mostly in software, which interacts with the operating system of the host computing system, and partially in hardware, which interacts with the simulation server by supporting the interruption of simulation jobs and the swapping in and out of simulation sessions. The scheduler 1117 and device driver 1119 are described in more detail below.
[0449]
Each of the microprocessors 1101-1104 can process independently of the other microprocessors in workstation 1100. In one embodiment of the present invention, workstation 1100 operates under a UNIX-based operating system, while in other embodiments workstation 1100 can operate under a Windows-based operating system, a Macintosh-based operating system, or any other operating system. For UNIX-based systems, the user interface includes X-Windows for managing programs, tasks, and files as needed. For details regarding the UNIX operating system, see Maurice J. Bach, THE DESIGN OF THE UNIX (R) OPERATING SYSTEM (1986).
[0450]
In FIG. 45, multiple users may access workstation 1100 via remote terminals. At some times, each user may run its processes on a particular CPU. At other times, each user uses different CPUs depending on resource limitations. Normally, the operating system 1121 determines such access, and indeed the operating system itself may jump from one CPU to another to accomplish its tasks. To handle the time-sharing process, the scheduler listens for network requests via socket system calls and makes system calls to the operating system 1121, which in turn handles preemption by initiating the generation of interrupt signals by the device driver 1119 to the reconfigurable hardware unit 20. Such interrupt signal generation is one of many steps in a scheduling algorithm that includes stopping the current job, saving state information for the currently interrupted job, swapping jobs, and executing the new job. The server scheduling algorithm is described below.
[0451]
Sockets and socket system calls are now briefly described. In one embodiment, the UNIX operating system may operate in a time-sharing mode. The UNIX kernel allocates the CPU to a process for a period of time (e.g., a time slice), preempts the process at the end of the time slice, and schedules another process for the next time slice. A process preempted at the end of an earlier time slice is rescheduled for execution in a later time slice.
[0452]
One scheme that enables and facilitates inter-process communication and allows the use of sophisticated network protocols is sockets. The kernel has three layers that function in the context of the client-server model: the socket layer, the protocol layer, and the device layer. The top layer (socket layer) provides the interface between system calls and the lower layers (protocol layer and device layer). Typically, a socket has end points that connect a client process and a server process. The socket end points may be on different machines. The middle layer (protocol layer) provides the protocol modules for communication, such as TCP and IP. The bottom layer (device layer) contains the device drivers that control the network devices. One example of a device driver is an Ethernet (registered trademark) driver for an Ethernet (registered trademark) based network.
[0453]
Processes communicate using a client-server model. Here, the server process listens on a socket at one end point, and the client process communicates with the server process through another socket at the other end point of the two-way communication path. The kernel maintains the internal connections among the three layers of each client and server, and routes data from the client to the server as needed.
[0454]
The socket mechanism comprises a number of system calls, including the socket system call, which establishes the end points of a communication path. Many processes use the socket descriptor sd in many system calls. The bind system call associates a name with a socket descriptor. Other example system calls include the connect system call, which asks the kernel to make a connection to a socket; the close system call, which closes a socket; the shutdown system call, which closes a socket connection; and the send and recv system calls, which transmit data over a connected socket.
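The sequence of system calls named above can be illustrated with Python's standard `socket` module, which wraps the same calls. This is a minimal sketch for illustration only: it runs client and server in one process over the loopback interface, and the function name and payload are hypothetical.

```python
import socket


def loopback_round_trip(payload: bytes) -> bytes:
    """Send payload from a client socket to a server socket on one machine."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # socket
    server.bind(("127.0.0.1", 0))       # bind: port 0 lets the kernel choose
    server.listen(1)                    # server end point starts listening

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(server.getsockname())  # connect to the server end point
    conn, _addr = server.accept()

    client.sendall(payload)             # send data on the connected socket
    client.shutdown(socket.SHUT_WR)     # shutdown: no more data will follow

    chunks = []
    while True:
        data = conn.recv(1024)          # recv until the peer has shut down
        if not data:
            break
        chunks.append(data)

    conn.close()                        # close every descriptor
    client.close()
    server.close()
    return b"".join(chunks)
```

Calling `loopback_round_trip(b"status")` exercises each call once. A real server would accept connections in a loop and would normally run in a separate process from its clients.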
[0455]
FIG. 46 is a diagram illustrating another embodiment according to the present invention in which multiple workstations share a single simulation system on a time-shared basis across a network. The multiple workstations are coupled to the simulation system via the scheduler 1117. In the computing environment of the simulation system, a single CPU 11 is coupled to a local bus 12 within workstation 1110. Multiple CPUs may also be provided in this system. As is known to those skilled in the art, an operating system 1121 is also provided, and almost all processes and applications reside on top of the operating system. For conceptual purposes, the operating system 1121 is shown with a bus 1122.
[0456]
In FIG. 46, workstation 1110 includes the components/devices shown in FIG. 1 along with scheduler 1117 and scheduler bus 1118, which are coupled to local bus 12 via operating system 1121. The scheduler 1117 controls the time-shared access of the user stations 1111, 1112, and 1113 by making socket calls to the operating system 1121. The scheduler 1117 is implemented mostly in software and partially in hardware.
[0457]
In this figure, only three users are shown accessing the simulation system across the network. Of course, other system configurations provide for more than three users or fewer than three users. Each user accesses the system via a remote station 1111, 1112, or 1113. Remote user stations 1111, 1112, and 1113 are coupled to scheduler 1117 via network connections 1114, 1115, and 1116, respectively.
[0458]
As is known to those skilled in the art, the device driver 1119 is coupled between the PCI bus 50 and the reconfigurable hardware device 20. A connection or conductive path 1120 is provided between the device driver 1119 and the reconfigurable hardware device 20. In this network multi-user embodiment of the present invention, scheduler 1117 interfaces with device driver 1119 to communicate with and control reconfigurable hardware device 20 for hardware acceleration and for simulation after hardware state restoration. As described above, this interface is made through the operating system 1121.
[0459]
Again, in one embodiment, the simulation workstation 1100 is a Sun Microsystems Enterprise 450 system that uses UltraSPARC II multiprocessors. Instead of accessing memory via the local bus and tying up that bus, the Sun 450 system allows the multiprocessors to access memory over dedicated memory buses through a crossbar switch.
[0460]
FIG. 47 is a diagram illustrating a high-level structure of a simulation server according to a network embodiment of the present invention. Here, the operating system is not shown, but as is known to those skilled in the art, an operating system is always present for file management and resource allocation to serve the various users, processes, and devices in the simulation computing environment. The simulation server 1130 includes a scheduler 1137, one or more device drivers 1138, and a reconfigurable hardware device 1139. In FIGS. 45 and 46, although not explicitly shown as a single integrated device, the simulation server includes the scheduler 1117, the device driver 1119, and the reconfigurable hardware device 20. Returning to FIG. 47, the simulation server 1130 is coupled to three workstations (or users) 1131, 1132, and 1133 via network connections/paths 1134, 1135, and 1136, respectively. As described above, more than three workstations or fewer than three workstations may be coupled to the simulation server 1130.
[0461]
The scheduler in the simulation server is based on a preemptive round robin algorithm. In essence, the round robin scheme allows several users or processes to execute in turn, cyclically, until each runs to completion. Thus, each simulation job (a job associated with a workstation in a network environment or with a user/process in a multiprocessing non-network environment) is assigned a priority level and a fixed time slice in which to execute.
[0462]
In general, higher-priority jobs are served first and run to completion. At one extreme, if every user has a different priority, the scheduler serves the highest-priority user first until that user's job completes, and serves the lowest-priority user last. Here, because each user has a different priority and the scheduler simply serves users in priority order, no time slice is used. This scenario is analogous to having only one user access the simulation system until completion.
[0463]
At the other extreme, the different users have equal priority. In this case, the time slice concept is used together with a first-in first-out (FIFO) queue. Among jobs of equal priority, each job executes until it completes or until its fixed time slice ends, whichever comes first. If the job does not run to completion during its time slice, the simulation image associated with the tasks it has completed so far must be saved for later restoration and execution. The job is then placed at the end of the queue. If a saved simulation image exists for the next job, it is restored and executed in the next time slice.
[0464]
A higher-priority job can preempt a lower-priority job. That is, jobs of equal priority run in round-robin fashion, time slice by time slice, until they complete. Thereafter, jobs of lower priority run in round-robin fashion. If a higher-priority job is inserted into the queue while a lower-priority job is running, the higher-priority job preempts the lower-priority job and runs until it completes. Thus, a higher-priority job runs to completion before a lower-priority job begins execution. If the lower-priority job has already begun execution, it does not resume until the higher-priority job has run to completion.
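The scheduling policy described in the preceding paragraphs can be sketched as a small simulation. This Python sketch is illustrative and makes several assumptions that are not in the text: a larger number means higher priority, time advances in one-second steps, and a preempted job returns to the end of its own priority queue just as a job whose time slice expired would. All names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int       # larger value = served first (assumption)
    remaining: int      # seconds of work left
    arrival: int = 0    # time at which the request reaches the server


def schedule(jobs, time_slice=5):
    """Return the execution trace as a list of (job name, seconds run)."""
    pending = deque(sorted(jobs, key=lambda j: j.arrival))
    queues = {}         # priority level -> FIFO queue of ready jobs
    trace, clock = [], 0

    def admit():
        while pending and pending[0].arrival <= clock:
            job = pending.popleft()
            queues.setdefault(job.priority, deque()).append(job)

    while pending or any(queues.values()):
        admit()
        ready = [p for p, q in queues.items() if q]
        if not ready:                       # nothing to run yet: skip ahead
            clock = pending[0].arrival
            continue
        top = max(ready)
        job = queues[top].popleft()
        ran = 0
        while ran < time_slice and job.remaining > 0:
            job.remaining -= 1
            ran += 1
            clock += 1
            admit()
            if any(p > top and q for p, q in queues.items()):
                break                       # preempted by a higher priority
        trace.append((job.name, ran))
        if job.remaining > 0:               # state saved, back of the queue
            queues[top].append(job)
    return trace


trace = schedule([Job("A", 1, 7), Job("B", 1, 3), Job("C", 2, 4, arrival=2)])
```

In this run, job A is preempted after two seconds when the higher-priority job C arrives; C runs to completion, and only then do the equal-priority jobs B and A share the hardware in round-robin order.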
[0465]
In one embodiment, the UNIX operating system provides the underlying preemptive round robin scheduling algorithm. The simulation server scheduling algorithm according to one embodiment of the present invention works in conjunction with the operating system's scheduling algorithm. In systems using UNIX, the preemptive nature of the scheduling algorithm allows the operating system to preempt user-defined schedules. To enable the time-sharing scheme, the simulation scheduler uses a preemptive multi-priority round robin algorithm on top of the operating system's own scheduling algorithm.
[0466]
The relationship between multiple users and the simulation server according to one embodiment of the present invention follows a client-server model in which the multiple users are clients and the simulation server is the server. Communication between the user clients and the server occurs via socket calls. Referring briefly to FIG. 55, the client includes a client program 1109, a socket system call component 1123, a UNIX (registered trademark) kernel 1124, and a TCP/IP protocol component 1125. The server includes a TCP/IP protocol component 1126, a UNIX (registered trademark) kernel 1127, a socket system call component 1128, and a simulation server 1129. Multiple clients may request, via UNIX socket calls from a client application program, that a simulation job be simulated at the server.
[0467]
In one embodiment, a typical sequence of events includes multiple clients sending requests to the server via the UNIX socket protocol. For each request, the server acknowledges the request by indicating whether the command was executed successfully. For a server queue status request, however, the server responds with the current queue status so that it can be displayed appropriately to the user. Table F below lists the relevant socket commands from the client.
[0468]
[Table 8]

[0469]
In each socket call, each command, encoded as an integer, may be followed by additional parameters, for example <design>, which represents the design name. The response from the simulation server is “0” if the command is executed successfully and “1” if the command fails. For command “5”, which requests the queue status, one embodiment of the command response is ASCII text terminated by the “\0” character, which is displayed on the user's screen. With these socket system calls, the appropriate communication protocol signals are sent to and received from the reconfigurable hardware device via the device driver.
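The request and response convention described above can be sketched as follows. This Python sketch is hypothetical in most of its details: only command 5 (queue status) and the “0”/“1”/“\0” response conventions come from the text. The space-separated ASCII wire format, the function names, and the use of command number 1 for a job request in the usage example are assumptions made for illustration, and the socket transport is left out.

```python
QUEUE_STATUS = 5  # the only command number given in the text


def encode_request(command: int, *params: str) -> bytes:
    """Integer command followed by optional parameters (assumed format)."""
    return " ".join([str(command), *params]).encode("ascii")


def handle_request(raw: bytes, queue: list, handlers: dict) -> bytes:
    """Server side: return b"0" on success, b"1" on failure."""
    fields = raw.decode("ascii").split()
    command, params = int(fields[0]), fields[1:]
    if command == QUEUE_STATUS:
        # Queue status is returned as ASCII text terminated by "\0".
        return "\n".join(queue).encode("ascii") + b"\0"
    handler = handlers.get(command)
    ok = handler is not None and handler(*params)
    return b"0" if ok else b"1"


# Usage with a hypothetical command 1 that queues a design for simulation.
queue = []
handlers = {1: lambda design: queue.append(design) or True}
accepted = handle_request(encode_request(1, "cpu_core"), queue, handlers)
rejected = handle_request(encode_request(9), queue, handlers)
status = handle_request(encode_request(QUEUE_STATUS), queue, handlers)
```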
[0470]
FIG. 48 shows one embodiment of a simulation server architecture according to the present invention. As described above, a single simulation server can serve multiple users or multiple processes in a time-shared manner for user design simulation and hardware acceleration. Accordingly, users/processes 1147, 1148, and 1149 are coupled to the simulation server 1140 via inter-process communication paths 1150, 1151, and 1152, respectively. Inter-process communication paths 1150, 1151, and 1152 can reside within the same workstation for multiprocessor configuration and operation, or on a network for multiple workstations. Each simulation session includes a software simulation state along with a hardware state for communication with the reconfigurable hardware device. Inter-process communication among the software sessions is performed using UNIX (registered trademark) sockets or system calls, which make it possible for a simulation session to reside either on the same workstation where the simulator plug-in card is installed or on another workstation connected via a TCP/IP network. Communication with the simulation server is started automatically.
[0471]
In FIG. 48, the simulation server 1140 includes a server monitor 1141, a simulation job queue table 1142, a priority classifier 1143, a job swapper 1144, device driver(s) 1145, and a reconfigurable hardware device 1146. The simulation job queue table 1142, the priority classifier 1143, and the job swapper 1144 constitute the scheduler 1137 shown in FIG.
[0472]
The server monitor 1141 provides a user interface function to the system administrator. A user may monitor the status of the simulation server by instructing the system to display simulation jobs in the queue, scheduling priorities, usage history, and simulation job swap efficiency. Other utility functions include job priority editing, simulation job deletion, and simulation server state reset.
[0473]
The simulation job queue table 1142 holds a list of all outstanding simulation requests in the queue, which are inserted by the scheduler. The table entries include the job number, software simulation number, software simulation image, hardware simulation image file, design configuration file, priority number, hardware size, software size, cumulative execution time of the simulation, and requirements identification. The job queue is implemented as a first-in first-out (FIFO) queue. Thus, when a new job is requested, it is placed at the end of the queue.
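One way to picture a queue table entry and its FIFO behavior is the following Python sketch. The field names and types are assumptions chosen to mirror the entries listed above, and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class JobEntry:
    """One row of the simulation job queue table (illustrative fields)."""
    job_number: int
    sw_sim_number: int
    sw_image: str              # software simulation image
    hw_image_file: str         # hardware simulation image file
    design_config_file: str
    priority: int
    hw_size: int
    sw_size: int
    cumulative_run_time: float = 0.0
    identification: str = ""


class JobQueueTable:
    """FIFO queue of outstanding simulation requests."""

    def __init__(self):
        self._queue = deque()

    def insert(self, entry: JobEntry) -> None:
        self._queue.append(entry)        # new requests go to the end

    def next_job(self) -> JobEntry:
        return self._queue.popleft()     # oldest request is served first

    def __len__(self) -> int:
        return len(self._queue)


table = JobQueueTable()
table.insert(JobEntry(1, 101, "a.img", "a.hw", "a.cfg", 1, 4, 2))
table.insert(JobEntry(2, 102, "b.img", "b.hw", "b.cfg", 1, 8, 3))
first = table.next_job()
```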
[0474]
The priority classifier 1143 determines which simulation job in the queue is to be executed. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) and controls which simulation process has priority for the current run. In one embodiment, the priority level is fixed based on the importance of a particular process or a particular user. In other embodiments, the priority level is dynamic and can change during the simulation. In the preferred embodiment, the priority is based on the user ID. Typically, one user has high priority, and all other users have lower but equal priority.
[0475]
The priority level can be set by the system administrator. The simulation server obtains all user information from the UNIX (R) facility, typically found in the UNIX (R) user file called “/etc/passwd”. Adding a new user follows the same process as adding a new user to the UNIX system. After all users have been defined, the simulation server monitor can be used to adjust the users' priority levels.
[0476]
The job swapper 1144 temporarily replaces one simulation job, associated with one process or workstation, with another simulation job, associated with another process or workstation, based on the priority determination programmed into the scheduler. If multiple users are simulating the same design, the job swapper swaps only the stored simulation state for the simulation session. If multiple users are simulating multiple designs, however, the job swapper loads the design for hardware configuration before swapping in the simulation state. In one embodiment, the job swapping mechanism improves the performance of the time-sharing embodiment of the present invention because job swapping needs to be done only for reconfigurable hardware device access. Thus, if one user needs software simulation for a certain period of time, the server swaps in another job for another user so that the other user can access the reconfigurable hardware device for hardware acceleration. The frequency of job swapping can be adjusted and programmed by the user. The device driver also communicates with the reconfigurable hardware device to swap jobs.
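The swapping decision described above, in which the hardware is reconfigured only when the incoming job uses a different design, can be sketched as a function that lists the steps for one swap. The function name, parameters, and step wording are hypothetical; this is an illustration of the rule, not the implementation.

```python
def swap_steps(outgoing_design, incoming_design, incoming_has_saved_state):
    """Return the ordered steps for swapping one simulation job for another."""
    steps = ["save simulation state of outgoing job"]
    if incoming_design != outgoing_design:
        # Different design: the hardware must be configured first.
        steps.append("load incoming design into reconfigurable hardware")
    if incoming_has_saved_state:
        steps.append("restore saved simulation state of incoming job")
    steps.append("run incoming job")
    return steps


same_design = swap_steps("cpu_core", "cpu_core", True)
new_design = swap_steps("cpu_core", "dsp_block", False)
```

When both users simulate the same design, the plan contains no reconfiguration step, which is why swapping between sessions of one design is the cheaper case.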
[0477]
Next, the operation of the simulation server will be described. FIG. 49 is a flowchart of the simulation server in operation. Initially, at step 1160, the system is idle. An idle system at step 1160 does not necessarily mean that the simulation server is inactive or that no simulation task is running. In practice, idle means one of the following: (1) no simulation is running; (2) only one user/workstation is active in a single-processor environment, so time-sharing is not required; or (3) only one user/workstation is active in a multiprocessing environment, but only one process is running. Thus, conditions (2) and (3) indicate that the simulation server has only one job to process, so queuing jobs, determining priorities, and swapping jobs are not necessary. The simulation server is essentially idle because it receives no request (event 1161) from another workstation or process.
[0478]
When a simulation request arrives as one or more request signals from workstations in a multi-user environment or from microprocessors in a multiprocessor environment, the simulation server queues the incoming simulation job or jobs at step 1162. The scheduler maintains a simulation job queue table so that all outstanding simulation requests are inserted into the queue and listed. For batch simulation jobs, the scheduler at the server queues all incoming simulation requests and automatically processes the tasks without human intervention.
[0479]
Thereafter, the simulation server sorts the queued jobs and determines their priority at step 1163. This step is particularly important when the server must prioritize among multiple jobs to provide access to the reconfigurable hardware device. The priority classifier determines which simulation job in the queue is to be executed. In one embodiment, the simulation job priority scheme is user definable (i.e., controllable and definable by the system administrator) and controls which simulation process has priority for the current execution when there is resource contention.
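The priority classification of step 1163 can be illustrated with a minimal sketch in which priority is looked up by user ID and arrival order is preserved within a priority level. The function name, the tuple layout of a job, the numeric levels, and the default level are assumptions made for illustration only and do not appear in the specification.

```python
def sort_by_priority(queue, user_priority, default_level=2):
    """Order queued jobs by the priority level of their user.

    queue         -- list of (user_id, design_name) tuples in arrival order
    user_priority -- dict mapping user_id to a level; a lower number means
                     a higher priority (assumed convention)
    default_level -- level given to users without an explicit entry
    """
    # sorted() is stable, so jobs at the same level keep their arrival order.
    return sorted(queue, key=lambda job: user_priority.get(job[0], default_level))


queue = [("bob", "design1"), ("alice", "design2"), ("carol", "design1")]
ordered = sort_by_priority(queue, {"alice": 1})
```

In this sketch one user holds the high level and all other users share the same lower level, mirroring the typical arrangement described for the preferred embodiment.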
[0480]
After the priority classification in step 1163, the server swaps simulation jobs in step 1164, if necessary. This step temporarily replaces one simulation job associated with one process or workstation with another simulation job associated with another process or workstation, based on the priority determination programmed for the scheduler at the server. If multiple users simulate the same design, the job swapper swaps in only the stored simulation state for the simulation session. However, when multiple users simulate multiple designs, the job swapper first loads the design for hardware configuration before swapping in the simulation state. Here, the device driver also communicates with the reconfigurable hardware device to swap jobs.
[0481]
In one embodiment, the job swap mechanism improves the performance of the time-sharing embodiment of the present invention because job swapping only needs to be done for accesses to the reconfigurable hardware device. Thus, if one user needs software simulation for a certain period of time, the server swaps in another job for another user so that the other user can access the reconfigurable hardware device for hardware acceleration. For example, assume that two users, user 1 and user 2, are coupled to a simulation server for access to a reconfigurable hardware device. At some point, user 1 has access to the system, so debugging can be done on user 1's design. If user 1 is debugging only in software mode, the server may release the reconfigurable hardware device so that user 2 can access it. The server swaps in user 2's job, and user 2 can then either perform a software simulation of the model or use hardware acceleration. Depending on the priority between user 1 and user 2, user 2 may continue to access the reconfigurable hardware device for a predetermined time, or, if user 1 needs the reconfigurable hardware device for acceleration, the server can preempt user 2's job so that user 1's job can be swapped in for hardware acceleration using the reconfigurable hardware device. The predetermined time relates to the preemption of simulation jobs based on multiple requests having the same priority. In one embodiment, the default time is 5 minutes, but this time is configurable by the user. This 5-minute setting represents one form of a timeout timer. The simulation system of the present invention uses the timeout timer to stop execution of the current simulation job when that job is excessively time consuming and the system determines that other pending jobs of equal priority need to gain access to the reconfigurable hardware model.
[0482]
When the job swap process is completed at step 1164, the device driver in the server locks the reconfigurable hardware device so that only the currently scheduled user or process can simulate and use the hardware model. The lock and simulation process occurs at step 1165.
[0483]
At event 1166, when either the simulation completes or a pause occurs in the current simulation session, the server returns to the priority classification step 1163 to determine the priority of the pending simulation jobs and swaps simulation jobs accordingly. Similarly, at event 1167, the server preempts execution of the currently active simulation job and returns to the priority classification state 1163. Preemption occurs only under certain conditions. One such condition is when a task or job with a higher priority is pending. Another such condition is when the system is currently executing a computationally intensive simulation task. In this case, the scheduler can be programmed to preempt the currently executing job and schedule a task or job with equal priority by using a timeout timer. In one embodiment, the timeout timer is set to 5 minutes; if the current job has run for 5 minutes, the system preempts the current job and swaps in the pending job even though it is at the same priority level.
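The conditions under which the running job is displaced at event 1167 can be summarized in a short sketch. The function name, the argument layout, and the convention that a lower number means a higher priority are assumptions; only the two conditions and the 5-minute default come from the description.

```python
TIMEOUT_SECONDS = 5 * 60  # default from the description; user configurable


def should_preempt(current_level, pending_levels, elapsed_seconds,
                   timeout=TIMEOUT_SECONDS):
    """Decide whether the currently running job should be swapped out.

    current_level   -- priority level of the running job (lower = higher priority)
    pending_levels  -- priority levels of the jobs waiting in the queue
    elapsed_seconds -- how long the running job has held the hardware
    """
    if not pending_levels:
        return False                      # nothing is waiting
    best_pending = min(pending_levels)
    if best_pending < current_level:
        return True                       # a higher-priority job is pending
    # An equal-priority job displaces the running job only after the timeout.
    return best_pending == current_level and elapsed_seconds >= timeout
```

A lower-priority pending job never displaces the running job in this sketch, regardless of elapsed time.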
[0484]
FIG. 50 is a flowchart of the job swap process. The job swap function is performed in step 1164 of FIG. 49 and is represented in the simulation server hardware by the job swapper 1144. In FIG. 50, if a simulation job needs to be swapped with another simulation job, the job swapper sends an interrupt to the reconfigurable hardware device at step 1180. If the reconfigurable hardware device is not currently executing any job (i.e., the system is idle or the user is operating in software simulation mode only, without any hardware acceleration), the interrupt immediately prepares the reconfigurable hardware device for job swapping. However, if the reconfigurable hardware device is currently executing a job and is in the middle of executing an instruction or processing data, the interrupt signal is recognized, but the reconfigurable hardware device continues to execute the currently pending instruction and process the data for the current job. If the reconfigurable hardware device receives the interrupt signal when the current simulation job is not in the middle of instruction execution or data processing, the interrupt signal essentially terminates the operation of the reconfigurable hardware device immediately.
[0485]
In step 1181, the simulation system saves the current simulation image (i.e., hardware and software state). By saving this image, the user can later restore the simulation run without re-running the entire simulation up to the saved point.
[0486]
In step 1182, the simulation system configures the reconfigurable hardware device with the new user design. This configuration step is only necessary if the new job is associated with a user design different from the design already loaded on the reconfigurable hardware device, whose execution has just been interrupted. After configuration, the saved hardware simulation image is reloaded at step 1183 and the saved software simulation image is reloaded at step 1184. If the new simulation job is associated with the same design, no further configuration is required. For the same design, the simulation system loads, at step 1183, the desired hardware simulation image associated with the new simulation job, because the simulation image of the new job is probably different from the simulation image of the job that was just interrupted. Details of the configuration step are provided in this patent specification. The associated software simulation image is then reloaded at step 1184. After the hardware and software simulation images are reloaded, the simulation can begin for the new job in step 1185, while the previously interrupted job can proceed only in software simulation mode because it has no access to the reconfigurable hardware device for the time being.
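The flow of FIG. 50 (steps 1180 to 1185) can be sketched as follows. The dictionary layout for the hardware and the jobs, the function name, and the reset state given to a job without a saved image are all hypothetical; the sketch only shows that the configuration step is skipped when the incoming job uses the design that is already loaded.

```python
def swap_in(hw, outgoing, incoming, images):
    """Swap `incoming` onto the hardware and return the steps performed.

    hw       -- {"design": name, "state": hardware simulation state}
    outgoing -- job dict {"id", "design", "sw_state"} or None if hardware is free
    incoming -- job dict of the same shape
    images   -- dict mapping job id to a saved (hw_state, sw_state) image
    """
    steps = ["1180 interrupt"]
    if outgoing is not None:
        # Step 1181: save the hardware and software state of the interrupted job.
        images[outgoing["id"]] = (hw["state"], outgoing["sw_state"])
        steps.append("1181 save image")
    if hw["design"] != incoming["design"]:
        # Step 1182: reconfigure only when the design differs.
        hw["design"] = incoming["design"]
        steps.append("1182 configure design")
    saved = images.get(incoming["id"])
    # Step 1183: reload the hardware image (None stands for a fresh job; assumption).
    hw["state"] = saved[0] if saved else None
    steps.append("1183 reload hardware image")
    if saved:
        incoming["sw_state"] = saved[1]   # Step 1184: reload the software image.
    steps.append("1184 reload software image")
    steps.append("1185 simulate")
    return steps


hw = {"design": "cpu", "state": "hwA"}
job_a = {"id": "A", "design": "cpu", "sw_state": "swA"}
job_b = {"id": "B", "design": "cpu", "sw_state": "swB0"}
images = {"B": ("hwB", "swB")}
steps_same_design = swap_in(hw, job_a, job_b, images)
```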
[0487]
FIG. 51 is a diagram illustrating signals between a device driver and a reconfigurable hardware device. Device driver 1171 provides an interface between scheduler 1170 and reconfigurable hardware device 1172. In addition, as shown in FIGS. 45 and 46, the device driver 1171 is an interface between the entire computing environment (i.e., one or more workstations, PCI buses, PCI devices) and the reconfigurable hardware device 1172. FIG. 51 shows only the simulation server portion. The signals between the device driver and the reconfigurable hardware device include the bidirectional communication handshake signals, the one-way design configuration information sent from the computing environment to the reconfigurable hardware device via the scheduler, the swapped-in simulation state information, the swapped-out simulation state information, and the interrupt signal sent from the device driver to the reconfigurable hardware device so that simulation jobs can be swapped.
[0488]
Line 1173 carries a two-way communication handshake signal. These signals and handshake protocols are further described with reference to FIGS.
[0489]
Line 1174 carries the one-way design configuration information from the computing environment via the scheduler 1170 to the reconfigurable hardware device 1172. Initialization information may be sent over this line 1174 to the reconfigurable hardware device 1172 for modeling. Further, if different users are modeling and simulating different user designs, the configuration information needs to be sent to the reconfigurable hardware device 1172 during the time slice. If different users model the same user design, a new design configuration is not required; rather, the different simulation hardware states associated with the same design may need to be transmitted to the reconfigurable hardware device 1172 for the different simulation runs.
[0490]
Line 1175 carries the swapped-in simulation state information to the reconfigurable hardware device 1172. Line 1176 carries the swapped-out simulation state information from the reconfigurable hardware device to the computing environment (i.e., ordinary memory). The swapped-in simulation state information includes the previously saved hardware model state information and the hardware memory state that the reconfigurable hardware device 1172 needs for acceleration. The swapped-in state information is transmitted at the start of the time slice so that the scheduled current user can access the reconfigurable hardware device 1172 for acceleration. The swapped-out simulation state information includes the hardware model and memory state information that must be saved in memory at the end of the time slice, when the reconfigurable hardware device 1172 receives an interrupt signal and moves to the next time slice associated with a different user/process. Saving the state information allows the current user/process to restore this state at a later time, e.g., in the next time slice assigned to the current user/process.
[0491]
Line 1177 carries the interrupt signal from the device driver 1171 to the reconfigurable hardware device so that simulation jobs can be swapped. This interrupt signal is transmitted between time slices to swap out the current simulation job of the current time slice and swap in a new simulation job for the new time slice.
[0492]
Next, a communication handshake protocol according to one embodiment of the present invention will be described with reference to FIGS. FIG. 53 shows a communication handshake signal between the device driver and the reconfigurable hardware device via the handshake logic interface. FIG. 54 shows a state diagram of the communication protocol. FIG. 51 shows a communication handshake signal on line 1173. FIG. 53 is a detailed diagram of a communication handshake signal between the device driver 1171 and the reconfigurable hardware device 1172.
[0493]
In FIG. 53, a handshake logic interface 1234 is provided in the reconfigurable hardware device 1172. Alternatively, the handshake logic interface 1234 can be installed outside of the reconfigurable hardware device 1172. Four sets of signals are provided between the device driver 1171 and the handshake logic interface 1234. These signals are a 3-bit SPACE signal on line 1230, a 1-bit read/write signal on line 1231, a 4-bit COMMAND signal on line 1232, and a 1-bit DONE signal on line 1233. The handshake logic interface includes logic that processes these signals to place the reconfigurable hardware device into a mode suitable for the various operations that need to be performed. The interface is coupled to a CTRL_FPGA device (or FPGA I/O controller).
[0494]
With respect to the 3-bit SPACE signal, data transfers over the PCI bus between the simulation system's computing environment and the reconfigurable hardware device are designated for specific I/O address spaces at the software/hardware boundary: REG (register), CLK (software clock), S2H (software to hardware), and H2S (hardware to software). As described above, the simulation system maps the hardware model into four address spaces in main memory according to different component types and control functions. The REG space is designated for the register components. The CLK space is designated for the software clock. The S2H space is designated for the output of the software test bench components to the hardware model. The H2S space is designated for the output of the hardware model to the software test bench components. These dedicated I/O buffer spaces are mapped to the kernel's main memory space during system initialization.
[0495]
Table G below provides a description of each of the SPACE signals.
[0496]
[Table 9]

[0497]
The read / write signal on line 1231 indicates whether the data transfer is a read or a write. The DONE signal on line 1233 indicates the completion of the DMA data transfer period.
[0498]
The 4-bit COMMAND indicates whether the data transfer operation is a write, a read, a configuration of a new user design into the reconfigurable hardware device, or a simulation interrupt. As shown in Table H below, the COMMAND protocol is as follows.
[0499]
[Table 10]

[0500]
Next, the communication handshake protocol will be described with reference to the diagram showing the state on FIG. In state 1400, the simulation system is idle in the device driver. Unless a new command is presented, the system will remain idle as indicated by path 1401. If a new command is presented, the command processor processes the new command in state 1402. In one embodiment, the command processor is an FPGA I / O controller.
[0501]
If COMMAND = 0000 or COMMAND = 0001, the system reads from or writes to the specified space, as indicated by the SPACE index, in state 1403. If COMMAND = 0010, the system in state 1404 either configures the FPGAs in the reconfigurable hardware device with the user design for the first time or configures them with a new user design. The system sequences the configuration information for all the FPGAs to model the portion of the user design that can be modeled in hardware. If, however, COMMAND = 0011, the system in state 1405 interrupts the reconfigurable hardware device, and thus the simulation, because the time slice has timed out and a new user/process is to be swapped in with a new simulation state. Upon completion of states 1403, 1404, or 1405, the simulation system proceeds to the DONE state 1406 to generate a DONE signal and then returns to state 1400, where it remains idle until a new command is presented.
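The handshake state sequence can be traced with a small sketch that uses the state numbers from the description as constants. The function name and the use of `None` to represent "no new command" are assumptions. Because the text groups COMMAND = 0000 and COMMAND = 0001 together without saying here which is the read and which is the write, the sketch maps both to state 1403 and does not distinguish them.

```python
IDLE, PROCESS, READ_WRITE, CONFIGURE, INTERRUPT, DONE = (
    1400, 1402, 1403, 1404, 1405, 1406)

# COMMAND value -> state entered after the command processor (state 1402).
DISPATCH = {
    0b0000: READ_WRITE,
    0b0001: READ_WRITE,
    0b0010: CONFIGURE,
    0b0011: INTERRUPT,
}


def handshake_trace(command):
    """Return the list of states visited for one COMMAND value.

    command -- 4-bit integer, or None when no new command is presented.
    """
    if command is None:
        return [IDLE]                      # path 1401: remain idle
    if command not in DISPATCH:
        raise ValueError(f"COMMAND {command:04b} is not described in the text")
    # Idle -> command processor -> operation -> DONE -> back to idle.
    return [IDLE, PROCESS, DISPATCH[command], DONE, IDLE]
```

For example, `handshake_trace(0b0010)` walks through states 1400, 1402, 1404, 1406, and back to 1400.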
[0502]
Next, the time-sharing function of the simulation server for processing a plurality of jobs having different priority levels will be described. FIG. 52 shows an example. Four jobs (job A, job B, job C, and job D) are incoming jobs in the simulation job queue. However, the priority levels of these four jobs differ: high priority I is assigned to jobs A and B, and low priority II is assigned to jobs C and D. As shown in the time series chart of FIG. 52, use of the time-shared reconfigurable hardware device depends on the priority level of the queued incoming jobs. At time 1190, the simulation begins with job A given access to the reconfigurable hardware device. At time 1191, job A is preempted by job B, because job B has the same priority as job A and the scheduler provides equal time-shared access to the two jobs. Job B now has access to the reconfigurable hardware device. At time 1192, job A preempts job B, and job A executes to completion at time 1193. At time 1193, job B takes over and executes to completion at time 1194. At time 1194, job C, which is next in the queue but has a lower priority level than jobs A and B, gains access to the reconfigurable hardware device for execution. At time 1195, job D preempts job C for time-shared access because it has the same priority level as job C. Job D has access until time 1196, when it is preempted by job C. Job C executes to completion at time 1197. Thereafter, at time 1197, job D takes over and executes to completion at time 1198.
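The order in which jobs A to D gain access in FIG. 52 is consistent with round-robin time slicing inside each priority level, with a lower level served only after the higher level has finished. The following sketch reproduces that order under assumptions that are not in the specification: all four jobs are queued from the start, each job needs two time slices of work, and the function and variable names are illustrative.

```python
from collections import deque


def time_share(jobs, quantum):
    """Return the sequence of (job_name, run_time) slices.

    jobs    -- list of (name, priority_level, work) in queue order;
               a lower level number means a higher priority (assumption)
    quantum -- length of one time slice (plays the role of the timeout timer)
    """
    timeline = []
    for level in sorted({lvl for _, lvl, _ in jobs}):
        # Jobs of equal priority alternate access, one quantum at a time.
        ring = deque([name, work] for name, lvl, work in jobs if lvl == level)
        while ring:
            entry = ring.popleft()
            run = min(quantum, entry[1])
            entry[1] -= run
            timeline.append((entry[0], run))
            if entry[1] > 0:
                ring.append(entry)         # unfinished: back of the ring
    return timeline


jobs = [("A", 1, 2), ("B", 1, 2), ("C", 2, 2), ("D", 2, 2)]
order = [name for name, _ in time_share(jobs, quantum=1)]
```

With these hypothetical workloads, `order` lists the jobs in the same sequence as the intervals between times 1190 and 1198.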
[0503]
(VIII. Memory simulation)
The memory simulation or memory mapping aspect of the present invention provides an effective way for the simulation system to manage the various memory blocks associated with the configured hardware model of the user's design, which is programmed into the array of FPGAs in the reconfigurable hardware section. By implementing the embodiments of the present invention, the memory simulation scheme does not require any dedicated pins in the FPGA chips to handle memory accesses.
[0504]
As used herein, the term “memory access” refers to either a write access or a read access between an FPGA logic device in which the user design is configured and an SRAM memory device that stores all memory blocks associated with the user design. Thus, a write operation involves data transfer from the FPGA logic device to the SRAM memory device, while a read operation involves data transfer from the SRAM memory device to the FPGA logic device. Refer to FIG. The FPGA logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM memory includes memory devices 1205 and 1206.
[0505]
The term “DMA data transfer” also refers to data transfer between a computing system and a simulation system, in addition to common usage among those skilled in the art. The computing system is shown in FIGS. 1, 45, and 46 as an entire PCI-based system with a memory that supports the simulation system and resides in software and reconfigurable hardware. The selected device drivers, sockets / system calls to / from the operating system are also part of the simulation system that allows proper interfacing with the operating system and the reconfigurable hardware portion. In one embodiment of the present invention, the DMA read transfer includes the transfer of data from the FPGA logical device (and FPGA SRAM memory device for initialization and memory content dump) to the host computing system. DMA write transfers include the transfer of data from the host computing system to the FPGA logical device (and to the FPGA SRAM memory device for initialization and memory content dump).
[0506]
The terms “FPGA data bus”, “FPGA bus”, “FD bus” and variations thereof refer to the high bank bus FD[63:32] and the low bank bus FD[31:0], which couple the FPGA logic devices, containing the configured and programmed user design to be debugged, to the SRAM memory devices.
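As a small illustration of the bank notation, a 64-bit value on FD[63:0] can be separated into its high bank half FD[63:32] and low bank half FD[31:0] with a shift and a mask. The function names are hypothetical, and the sketch says nothing about how the hardware actually drives the two buses.

```python
MASK32 = 0xFFFFFFFF


def split_fd(word):
    """Split a 64-bit FD[63:0] value into (FD[63:32], FD[31:0])."""
    if not 0 <= word < (1 << 64):
        raise ValueError("FD bus value must fit in 64 bits")
    return (word >> 32) & MASK32, word & MASK32


def join_fd(high, low):
    """Recombine the high bank and low bank halves into one 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


high, low = split_fd(0x1234567890ABCDEF)
```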
[0507]
The memory simulation system includes a memory state machine, an evaluation state machine, and associated logic for controlling and interfacing with: (1) the main computing system and its associated memory system; (2) the SRAM memory devices coupled to the FPGA buses in the simulation system; and (3) the FPGA logic devices containing the configured and programmed user design to be debugged.
[0508]
The FPGA logic device side of the memory simulation system includes an evaluation state machine, an FPGA bus driver, and a logic interface for each memory block N for interfacing with the user's own memory interface in the user design, in order to handle the following: (1) data evaluation among the FPGA logic devices, and (2) write/read memory accesses between the FPGA logic devices and the SRAM memory devices. In conjunction with the FPGA logic device side, the FPGA I/O controller side includes a memory state machine and interface logic to handle DMA, write, and read operations between: (1) the main computing system and the SRAM memory devices, and (2) the FPGA logic devices and the SRAM memory devices.
[0509]
The operation of the memory simulation system according to one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. The DATAXSFR signal indicates the occurrence of a DMA data transfer period. In the DMA data transfer period, the computing system and the SRAM memory units transfer data to each other via the FPGA data bus, that is, the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213.
[0510]
During the evaluation period, the logic circuitry in each FPGA logic device generates the appropriate software clock, input enable, and multiplexer enable signals to the user's design logic for data evaluation. Communication between FPGA logic devices occurs during this period.
[0511]
During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices to place their respective address and control signals on their respective FPGA data buses. These addresses and control signals are latched in by the CTRL_FPGA unit. If the operation is a write, address, control, and data signals are transferred from the FPGA logic device to the respective SRAM memory device. If the operation is a read, address and control signals are provided to the designated SRAM memory device and data signals are transferred from the SRAM memory device to the respective FPGA logic device. After all desired memory blocks in all FPGA logic devices have been accessed, the memory simulation write / read cycle is complete and the memory simulation system is idle until the start of the next memory simulation write / read cycle.
[0512]
FIG. 56 is a high-level block diagram of a memory simulation configuration according to one embodiment of the present invention. Signals, connections, and buses not related to the memory simulation aspect of the present invention are not shown. The CTRL_FPGA unit 1200 described above is coupled to the local bus 1210 via line 1209. In one embodiment, the CTRL_FPGA unit 1200 is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K50 chip. The local bus 1210 allows the CTRL_FPGA unit 1200 to be coupled to other simulation array boards (if any) and other chips (e.g., PCI controller, EEPROM, clock buffer). Line 1209 carries a DONE signal indicating the completion of the simulation DMA data transfer period.
[0513]
FIG. 56 shows other major functional blocks in the form of logic devices and memory devices. In one embodiment, each logic device is a programmable logic device (PLD) in the form of an FPGA chip, such as an Altera 10K130 or 10K250 chip. Thus, instead of the above embodiment having eight Altera FLEX 10K100 chips in the array, this embodiment uses only four Altera FLEX 10K130 chips. The memory device is a synchronous pipelined cache SRAM, such as a Cypress 128Kx32 CY7C1335 or CY7C1336 chip. The logic devices include 1201 (FPGA1), 1202 (FPGA3), 1203 (FPGA0), and 1204 (FPGA2). The SRAM chips include a low bank memory device 1205 (L_SRAM) and a high bank memory device 1206 (H_SRAM).
[0514]
These logic devices and memory devices are coupled to the CTRL_FPGA unit 1200 via a high bank bus 1212 (FD[63:32]) and a low bank bus 1213 (FD[31:0]). Logic devices 1201 (FPGA1) and 1202 (FPGA3) are coupled to the high bank bus 1212 via bus 1223 and bus 1225, respectively, while logic devices 1203 (FPGA0) and 1204 (FPGA2) are coupled to the low bank data bus 1213 via bus 1224 and bus 1226, respectively. High bank memory device 1206 is coupled to high bank bus 1212 via bus 1220, while low bank memory device 1205 is coupled to low bank bus 1213 via bus 1219. The dual bank bus structure allows the simulation system to access devices on the high bank and devices on the low bank in parallel, which improves throughput. The dual bank data bus structure also supports other signals, such as control and access signals, so that the simulation write/read cycle can be controlled.
[0515]
Referring briefly to FIG. 61, each simulation write/read cycle includes a DMA data transfer period, an evaluation period, and a memory access period. Various combinations of control signals control and indicate whether the simulation system is in one period or another. DMA data transfer between the host computer system and the logic devices 1201 to 1204 in the reconfigurable hardware section occurs via the PCI bus (for example, bus 50 in FIG. 46), local buses 1210 and 1236, and the FPGA bus 1212 (FD[63:32]) and the FPGA bus 1213 (FD[31:0]). Memory devices 1205 and 1206 take part in DMA data transfer for initialization and memory content dump. Evaluation data transfer among the logic devices 1201 to 1204 in the reconfigurable hardware section occurs via the interconnect (described above) and the FPGA bus 1212 (FD[63:32]) and the FPGA bus 1213 (FD[31:0]). Memory accesses between the logic devices 1201-1204 and the memory devices 1205 and 1206 occur via the FPGA bus 1212 (FD[63:32]) and the FPGA bus 1213 (FD[31:0]).
[0516]
Reference is again made to FIG. The CTRL_FPGA unit 1200 provides and receives a number of control and address signals to control the simulation write/read cycle. The CTRL_FPGA unit 1200 provides the DATAXSFR and EVAL signals on line 1211 to the logic devices 1201 and 1203 via line 1221 and to the logic devices 1202 and 1204 via line 1222, respectively. The CTRL_FPGA unit 1200 also provides the memory address signal MA[18:2] to the low bank memory device 1205 and the high bank memory device 1206 via buses 1229 and 1214, respectively. To control the mode of these memory devices, the CTRL_FPGA unit 1200 provides chip select write (and read) signals to the low bank memory device 1205 and the high bank memory device 1206 via lines 1216 and 1215, respectively. To indicate the completion of the DMA data transfer, the memory simulation system may send and receive the DONE signal on line 1209 to the CTRL_FPGA unit 1200 and the computing system.
[0517]
As described above with reference to FIGS. 9, 11, 12, 14, and 15, the logic devices 1201-1204 are connected together by, among other facilities, a multiplexed cross-chip address pointer chain, represented in the figure by two sets of SHIFTIN/SHIFTOUT lines: lines 1207, 1227, and 1218, and lines 1208, 1228, and 1217. These sets are initialized by Vcc on lines 1207 and 1208 at the start of the chain. The SHIFTIN signal is transmitted from the preceding FPGA logic device in the bank and initiates memory access for the current FPGA logic device. Upon completion of the shift through a given set of chains, the last logic device generates a LAST signal (i.e., LASTL or LASTH) to the CTRL_FPGA unit 1200. For the high bank, logic device 1202 generates a LASTH shift-out signal on line 1218 to the CTRL_FPGA unit 1200, and for the low bank, logic device 1204 generates a LASTL signal on line 1217 to the CTRL_FPGA unit 1200.
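The way the shift chain passes memory access from one logic device to the next in a bank, ending with a LAST signal, can be sketched as a simple traversal. The function name, the per-device block counts, and the representation of the LAST signal as a trailing string are assumptions for illustration; the device ordering per bank follows the description (FPGA1 then FPGA3 on the high bank, FPGA0 then FPGA2 on the low bank).

```python
def walk_chain(bank_devices, blocks_per_device, last_signal):
    """Return the access order for one bank, followed by its LAST signal.

    bank_devices      -- device names in chain order (first one sees Vcc on SHIFTIN)
    blocks_per_device -- dict: device name -> number of memory blocks to access
    last_signal       -- name of the signal raised after the final device
    """
    order = []
    for device in bank_devices:
        # SHIFTIN reaches this device, so its memory blocks are accessed in turn.
        for block in range(blocks_per_device.get(device, 0)):
            order.append((device, block))
        # SHIFTOUT of this device becomes SHIFTIN of the next one in the chain.
    order.append(last_signal)              # generated by the last device
    return order


high_bank = walk_chain(["FPGA1", "FPGA3"], {"FPGA1": 2, "FPGA3": 1}, "LASTH")
low_bank = walk_chain(["FPGA0", "FPGA2"], {"FPGA0": 1, "FPGA2": 0}, "LASTL")
```

The two banks are walked independently here, which matches the presence of a separate chain and a separate LAST signal for each bank.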
[0518]
With respect to board implementation and FIG. 56, one embodiment of the present invention incorporates the components (e.g., logic devices 1201-1204, memory devices 1205-1206, and CTRL_FPGA unit 1200) and buses (e.g., FPGA buses 1212-1213 and local bus 1210) in one board. This board is coupled to the motherboard via a motherboard connector. Thus, one board carries four logic devices (two in each bank), two memory devices (one in each bank), and the buses. A second board may include its own complement of logic devices (usually 4), memory devices (usually 2), an FPGA I/O controller (CTRL_FPGA unit), and buses. However, the PCI controller is installed only on the first board. Board-to-board connectors are provided between boards as described above, so that the logic devices on all boards are connected together and communicate with each other during the evaluation period, and so that the local bus is provided across all boards. The FPGA bus FD[63:0] is provided only within each board and is not provided across multiple boards.
[0519]
In this board configuration, the simulation system performs memory mapping between logic devices and memory devices on each board. Memory mapping across different boards is not provided. Thus, the logic devices on board 5 map memory blocks to memory devices only in board 5 and not to memory devices on other boards. However, in other embodiments, the simulation system maps memory blocks from logical devices on one board to memory devices on another board.
[0520]
The operation of the memory simulation system of one embodiment of the present invention is generally as follows. The simulation write/read cycle is divided into three periods: DMA data transfer, evaluation, and memory access. To indicate the completion of the simulation write/read cycle, the memory simulation system may send and receive the DONE signal on line 1209 to the CTRL_FPGA unit 1200 and the computing system. The DATAXSFR signal on bus 1211 indicates the occurrence of a DMA data transfer period. In the DMA data transfer period, the computing system and the FPGA logic devices 1201 to 1204 transfer data to each other via the FPGA data bus, that is, the high bank bus (FD[63:32]) 1212 and the low bank bus (FD[31:0]) 1213. In general, DMA transfers occur between the host computing system and the FPGA logic devices. For initialization and memory content dump, DMA transfers occur between the host computing system and the SRAM memory devices 1205 and 1206.
[0521]
During the evaluation period, the logic circuitry in each FPGA logic device 1201-1204 generates the appropriate software clock, input enable, and multiplexer enable signals to the user design logic for data evaluation. Communication between FPGA logic devices occurs during this period. The CTRL_FPGA unit 1200 also starts an evaluation counter to control the duration of the evaluation period. The number of counts, and thus the duration of the evaluation period, is set by the system by determining the longest path of the signal. The path length is associated with a specific number of steps. The system uses the step information to calculate the number of counts necessary to execute and complete the evaluation cycle.
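The description does not state how the step information is converted into a count, so the following is only a sketch under stated assumptions: the signal paths are modeled as an acyclic graph, one hop along an edge counts as one step, and each step costs a fixed number of counter ticks (`clocks_per_step`, a hypothetical parameter). Under those assumptions the evaluation count follows from the longest path.

```python
from functools import lru_cache


def longest_path_steps(edges):
    """Return the number of hops on the longest path of an acyclic graph.

    edges -- dict mapping each node to a list of its successor nodes
    """
    @lru_cache(maxsize=None)
    def depth(node):
        # Longest chain of hops starting at `node`; 0 for a node with no successors.
        return max((1 + depth(nxt) for nxt in edges.get(node, ())), default=0)

    return max((depth(node) for node in edges), default=0)


def eval_count(edges, clocks_per_step=1):
    """Counter value needed to cover the longest signal path (assumed linear)."""
    return longest_path_steps(edges) * clocks_per_step


# Hypothetical signal graph: a -> b -> d -> e and a -> c -> d -> e.
paths = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
count = eval_count(paths, clocks_per_step=2)
```

In this example the longest path has three hops, so two ticks per step give a count of six.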
[0522]
During the memory access period, the memory simulation system waits for the high and low bank FPGA logic devices 1201-1204 to place their respective address and control signals on their respective FPGA data buses. These address and control signals are latched in by the CTRL_FPGA unit 1200. If the operation is a write, address, control, and data signals are transferred from the FPGA logic devices 1201-1204 to the respective SRAM memory devices 1205 and 1206. If the operation is a read, address and control signals are transferred from the FPGA logic devices 1201-1204 to the respective SRAM memory devices 1205 and 1206, and data signals are transferred from the SRAM memory devices 1205 and 1206 to the respective FPGA logic devices 1201-1204. On the FPGA logic device side, the FD bus driver places the memory block address and control signals onto the FPGA data bus (FD bus). If the operation is a write, the write data is placed on the FD bus for that memory block. If the operation is a read, the double buffer latches in the data for the memory block from the SRAM memory device on the FD bus. This operation continues in sequence, one memory block at a time, for each memory block in each FPGA logic device. After all desired memory blocks in an FPGA logic device have been accessed, the memory simulation system proceeds to the next FPGA logic device in each bank and begins accessing the memory blocks in that FPGA logic device. After all desired memory blocks in all FPGA logic devices 1201-1204 have been accessed, the memory simulation write/read cycle is complete and the memory simulation system is idle until the start of the next memory simulation write/read cycle.
[0523]
FIG. 57 shows a more detailed block diagram of the memory simulation aspect of the present invention, including a more detailed block diagram of the CTRL_FPGA 1200 and of each logic device associated with memory simulation. FIG. 57 shows a portion of CTRL_FPGA 1200 and logic device 1203 (which is similar in structure to the other logic devices 1201, 1202, and 1204). CTRL_FPGA 1200 includes a memory finite state machine (MEMFSM) 1240, an AND gate 1241, an evaluation (EVAL) counter 1242, a low bank memory address / control latch 1243, a low bank address / control multiplexer 1244, an address counter 1245, a high bank memory address / control latch 1247, and a high bank address / control multiplexer 1246. Each logic device, such as logic device 1203 shown in FIG. 57, includes an evaluation finite state machine (EVALFSMx) 1248 and a data bus multiplexer (FDO_MUXx for FPGA0 logic device 1203) 1249. The “x” notation appended to the end of EVALFSM identifies the specific logic device (FPGA0, FPGA1, FPGA2, FPGA3) associated with EVALFSM. In this example, “x” is a number from 0 to 3. Thus, EVALFSM0 is associated with FPGA0 logic device 1203. In general, each logic device is associated with a number x, and if N logic devices are used, “x” is a number from 0 to N-1.
[0524]
In each logic device 1201-1204, a number of memory blocks are associated with the configured and mapped user design. Accordingly, the memory block interface 1253 in the user logic provides a means for the computing system to access a desired memory block in the array of FPGA logic devices. The memory block interface 1253 also provides memory write data on bus 1295 to the FPGA data bus multiplexer (FDO_MUXx) 1249 and receives memory read data on bus 1297 from the memory read data double buffer 1251.
[0525]
A memory block data / logic interface 1298 is provided in each FPGA logic device. Each of these memory block data / logic interfaces 1298 is coupled to the FPGA data bus multiplexer (FDO_MUXx) 1249, the evaluation finite state machine (EVALFSMx) 1248, and the FPGA bus FD [63:0]. The memory block data / logic interface 1298 includes a memory read data double buffer 1251, an address offset unit 1250, a memory model 1252, and a memory block interface for each memory block N (mem_block_N) 1253. These are all repeated in any given FPGA logic device 1201-1204 for each memory block N. Thus, for five memory blocks, five sets of memory block data / logic interfaces 1298 are provided; that is, five sets of the memory read data double buffer 1251, the address offset unit 1250, the memory model 1252, and the memory block interface (mem_block_N) 1253.
[0526]
Similar to EVALFSMx, the “x” in FDO_MUXx identifies the specific logic device (FPGA0, FPGA1, FPGA2, FPGA3) with which FDO_MUXx is associated. In this example, “x” is a number from 0 to 3. The output of FDO_MUXx 1249 is provided on bus 1282. Bus 1282 is coupled to the high bank bus FD [63:32] or the low bank bus FD [31:0], depending on which chip (FPGA0, FPGA1, FPGA2, FPGA3) is associated with FDO_MUXx 1249. In FIG. 57, FDO_MUXx is FDO_MUX0, associated with low bank logic device FPGA0 1203. Therefore, the output on bus 1282 is provided to the low bank bus FD [31:0]. A portion of the bus is also used to transfer read data from the high bank bus FD [63:32] or the low bank bus FD [31:0] to the read bus 1283 for input to the memory read data double buffer 1251. Therefore, write data is transferred out of the memory blocks in each of the logic devices 1201 to 1204 to the high bank bus FD [63:32] or the low bank bus FD [31:0] via FDO_MUXx 1249, and read data is transferred into the memory read data double buffer 1251 from the high bank bus FD [63:32] or the low bank bus FD [31:0] via the read bus 1283. The memory read data double buffer provides a double buffering mechanism that latches data in the first buffer and then buffers it again so that all latched data is output at the same time, reducing skew. The memory read data double buffer 1251 is described in more detail below.
[0527]
Returning to the memory model 1252: the memory model 1252 converts the user memory type to the SRAM type of the memory simulation system. Because the memory type in the user's design can vary from one design to another, the memory block interface 1253 can also be unique to the user's design. For example, the user memory type can be DRAM, flash memory, or EEPROM. However, in all variants of the memory block interface 1253, memory addresses and control signals (e.g., read, write, chip select, mem_clk) are provided. One embodiment of the memory simulation aspect of the present invention converts the user memory type to the SRAM type used in the memory simulation system. If the user memory type is SRAM, conversion to the SRAM type memory model is quite simple. The memory address and control signals are thus provided on bus 1296 to the memory model 1252, which performs the conversion.
[0528]
Memory model 1252 provides memory block addresses on bus 1293 and control information on bus 1292. The address offset unit 1250 receives the address information of the various memory blocks and provides, on bus 1291, an offset address modified from the original address on bus 1293. The offset is necessary because the address ranges of different memory blocks overlap. For example, one memory block may occupy address space 0-2K, while another memory block occupies address space 0-3K. Since both memory blocks overlap in space 0-2K, addressing them individually would be difficult without some form of address offset mechanism. With the offset, the first memory block may occupy space 0-2K, while the second memory block occupies the space from about 2K up to 5K. The offset address from the address offset unit 1250 and the control signals on bus 1292 are combined and provided on bus 1299 to the FPGA bus multiplexer (FDO_MUXx) 1249.
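The address offset described above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that blocks are packed back to back in the shared SRAM address space; the function names `assign_offsets` and `to_sram_address` are hypothetical and do not appear in the specification.

```python
K = 1024

def assign_offsets(block_sizes):
    """Give each memory block a base offset so that no two blocks overlap.

    Assumption: blocks are simply packed one after another.
    """
    offsets = []
    base = 0
    for size in block_sizes:
        offsets.append(base)
        base += size
    return offsets

def to_sram_address(offsets, block, local_addr):
    """Translate a block-local address into an offset (shared) address."""
    return offsets[block] + local_addr

# Two user memory blocks whose local ranges are 0-2K and 0-3K.
sizes = [2 * K, 3 * K]
offsets = assign_offsets(sizes)
first_of_block1 = to_sram_address(offsets, 1, 0)
last_of_block1 = to_sram_address(offsets, 1, 3 * K - 1)
```

With these sizes the first block keeps the range 0-2K and the second block is moved to the range 2K-5K, matching the example in the text.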
[0529]
The FPGA data bus multiplexer FDO_MUXx receives SPACE2 data on bus 1289, SPACE3 data on bus 1290, address / control signals on bus 1299, and memory write data on bus 1295. As mentioned above, SPACE2 and SPACE3 are specific SPACE indices. The SPACE index generated by the FPGA I / O controller (item 327 in FIG. 10; FIG. 22) selects a specific address space (i.e., REG read, REG write, S2H read, H2S write, and CLK write). Within this address space, the system of the present invention sequentially selects the specific words to be accessed. SPACE2 indicates a memory space dedicated to DMA read transfers of hardware-to-software (H2S) data. SPACE3 indicates a memory space dedicated to DMA read transfers of REGISTER_READ data. See Table G above.
[0530]
As an output, FDO_MUXx 1249 provides the data on bus 1282 to either the low bank bus or the high bank bus. The selector signals are the output enable (output_en) signal on line 1284 and the select signal on line 1285, both from the EVALFSMx unit 1248. The output enable signal on line 1284 enables (or disables) the operation of FDO_MUXx 1249. For data access via the FPGA bus, the output enable signal is enabled so that FDO_MUXx can function. The select signal on line 1285 is generated by the EVALFSMx unit 1248 and selects among the multiple inputs: SPACE2 data on bus 1289, SPACE3 data on bus 1290, address / control signals on bus 1299, and memory write data on bus 1295. The generation of the select signal by the EVALFSMx unit 1248 is further described below.
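The behavior of FDO_MUXx described above can be sketched as a four-input multiplexer gated by output_en. This is only an illustrative model: the numeric encoding of the select signal is an assumption, since the specification does not give it, and the function name `fdo_mux` is hypothetical.

```python
# Hypothetical select encoding; the source does not give the actual values.
SEL_SPACE2, SEL_SPACE3, SEL_ADDR_CTRL, SEL_WRITE_DATA = range(4)

def fdo_mux(output_en, select, space2, space3, addr_ctrl, write_data):
    """Return the word driven onto bus 1282, or None when the mux is disabled."""
    if not output_en:
        return None  # output_en (line 1284) deasserted: nothing is driven
    inputs = (space2, space3, addr_ctrl, write_data)
    return inputs[select]

# Drive the address / control word, then disable the output.
driven = fdo_mux(1, SEL_ADDR_CTRL, 0xA, 0xB, 0xC, 0xD)
idle = fdo_mux(0, SEL_WRITE_DATA, 0xA, 0xB, 0xC, 0xD)
```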
[0531]
The EVALFSMx unit 1248 is at the center of the operation of each logic device 1201-1204 with respect to the memory simulation system. The EVALFSMx unit 1248 receives as inputs the SHIFTIN signal on line 1279, the EVAL signal from the CTRL_FPGA unit 1200 on line 1274, and the write signal wrx on line 1287. The EVALFSMx unit 1248 outputs the SHIFTOUT signal on line 1280, the read latch signal rd_latx to the memory read data double buffer 1251, the output enable signal on line 1284 to FDO_MUXx 1249, the select signal on line 1285 to FDO_MUXx 1249, and three signals (input_en, mux_en, and clk_en) on line 1281 to the user logic.
[0532]
The operation of the FPGA logic devices 1201-1204 for the memory simulation system of one embodiment of the present invention is generally as follows. If EVAL is at logic 1, data evaluation within the FPGA logic devices 1201-1204 occurs. Otherwise, the simulation system performs either DMA data transfer or memory access. At EVAL = 1, the EVALFSMx unit 1248 generates the clk_en, input_en, and mux_en signals so that the user logic can evaluate data, latch the relevant data, and multiplex signals across the logic devices, respectively. The EVALFSMx unit 1248 generates the clk_en signal to enable the second flip-flop of all clock edge register flip-flops in the user's design logic (see FIG. 19). The clk_en signal is also known as the software clock. If the user memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. The EVALFSMx unit 1248 generates the input_en signal to the user design logic to latch the input signals sent from the CPU to the user logic by DMA transfer. The input_en signal provides the enable input to the second flip-flop in the main clock register (see FIG. 19). Finally, the EVALFSMx unit 1248 generates the mux_en signal to turn on the multiplexing circuit in each FPGA logic device and initiate communication with the other FPGA logic devices in the array.
[0533]
Thereafter, if the FPGA logic devices 1201-1204 include at least one memory block, the memory simulation system waits for the selected data to be shifted into the selected FPGA logic device, and then generates the output_en and select signals for the FPGA data bus driver so that the address and control signals of the memory block interface 1253 (mem_block_N) are placed on the FD bus.
[0534]
When the write signal wrx on line 1287 is enabled (i.e., logic 1), the select and output_en signals are enabled and the write data is placed on either the low or high bank bus, depending on which bank the FPGA chip is coupled to. In FIG. 57, logic device 1203 is FPGA0 and is coupled to the low bank bus FD [31:0]. When the write signal wrx on line 1287 is disabled (i.e., logic 0), the select and output_en signals are disabled, and the read latch signal rd_latx on line 1286 causes the memory read data double buffer 1251 to latch and double buffer the selected data from the SRAM via either the low or high bank bus, depending on which bank the FPGA chip is coupled to. The wrx signal is the memory write signal obtained from the memory interface of the user's design logic. In fact, the wrx signal on line 1287 comes from memory model 1252 via control bus 1292.
[0535]
This process of reading or writing data occurs for each FPGA logic device. After all memory blocks have been processed via SRAM access, EVALFSMx section 1248 generates a SHIFTOUT signal to allow SRAM access by the next FPGA logic device in the chain. Note that memory accesses for devices on the high and low banks occur in parallel. A memory access for one bank may be completed before a memory access for another bank. For all of these accesses, appropriate wait cycles are inserted so that the logic processes the data only when the logic is ready and the data is available.
[0536]
On the CTRL_FPGA unit 1200 side, the MEMFSM 1240 is at the center of the memory simulation aspect of the present invention. The MEMFSM 1240 sends and receives a number of control signals to control the activation of the memory simulation write / read cycle and the various operations supported by the cycle. The MEMFSM 1240 receives the DATAXSFR signal on line 1260 via line 1258. This signal is also provided to each logic device on line 1273. When DATAXSFR goes low (logic low), the DMA data transfer period ends and the evaluation and memory access periods begin.
[0537]
The MEMFSM 1240 also receives the LASTH signal on line 1254 and the LASTL signal on line 1255, which indicate that the selected word associated with the selected address space has been accessed between the computing system and the simulation system via the PCI bus and the FPGA bus. The MOVE signal associated with this shift-out process is passed through each of the logic devices (for example, logic devices 1201 to 1204) until the desired word has been accessed, and the MOVE signal eventually becomes the LAST signal at the end of the chain (i.e., LASTH for the high bank and LASTL for the low bank). In EVALFSM 1248 (FIG. 57 shows EVALFSM0 for FPGA0 logic device 1203), the corresponding LAST signal is the SHIFTOUT signal on line 1280. Because the specific logic device 1203 is not the last logic device in the low bank chain as shown in FIG. 56 (logic device 1204 is the last logic device in the low bank chain), the SHIFTOUT signal for EVALFSM0 is not the LAST signal. If EVALFSM 1248 corresponded to EVALFSM2 in FIG. 56, the SHIFTOUT signal on line 1280 would be the LASTL signal provided on line 1255 to the MEMFSM. Otherwise, the SHIFTOUT signal on line 1280 is provided to logic device 1204 (see FIG. 56). Similarly, the SHIFTIN signal on line 1279 represents Vcc for FPGA0 logic device 1203 (see FIG. 56).
[0538]
The LASTL and LASTH signals are input to AND gate 1241 via lines 1256 and 1257, respectively. The AND gate 1241 provides an open-drain output. The output of AND gate 1241 generates the DONE signal on line 1259. The DONE signal is provided to the computing system and to the MEMFSM 1240. Thus, the AND gate outputs a logic high only when both the LASTL and LASTH signals are logic high, indicating the end of the shift-out chain process.
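The relationship between the per-bank shift chains and the DONE signal can be sketched as follows. This is a simplified model under stated assumptions: each device is assumed to assert SHIFTOUT once its SHIFTIN is high and its own memory blocks have been processed, and the first device's SHIFTIN is tied high (Vcc). The function names `bank_last` and `done_signal` are hypothetical.

```python
def bank_last(devices_finished):
    """Return the LAST signal for one bank.

    devices_finished lists, in chain order, whether each device has
    finished its own accesses. Assumption: SHIFTOUT = SHIFTIN and finished.
    """
    shift = True  # SHIFTIN of the first device in the chain is tied to Vcc
    for finished in devices_finished:
        shift = shift and finished  # this device's SHIFTOUT feeds the next SHIFTIN
    return shift

def done_signal(lastl, lasth):
    """Model of AND gate 1241: DONE is high only when both banks are finished."""
    return lastl and lasth

# Low bank chain has finished; high bank chain is still busy in its second device.
lastl = bank_last([True, True])
lasth = bank_last([True, False])
done = done_signal(lastl, lasth)
```

Because the two banks are accessed in parallel and may finish at different times, DONE stays low here until the high bank chain also reports LASTH.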
[0539]
The MEMFSM 1240 generates a start signal on line 1261 for the EVAL counter 1242. As the name implies, the start signal is sent to start the EVAL counter 1242 after the DMA data transfer period is complete. The start signal is generated when the DATAXSFR signal transitions from high to low (1 to 0). The EVAL counter 1242 is a programmable counter that counts a predetermined number of clock cycles. The programmed count in EVAL counter 1242 determines the duration of the evaluation period. The output of EVAL counter 1242 on line 1274 is either logic level 1 or 0, depending on whether the counter is counting. When the EVAL counter 1242 is counting, the output on line 1274 is a logic 1, which is provided to the EVALFSMx unit 1248 in each FPGA logic device 1201-1204. When EVAL = 1, the FPGA logic devices 1201 to 1204 perform inter-FPGA communication to evaluate data in the user's design. The output of the EVAL counter 1242 is also fed back on line 1262 to the MEMFSM unit 1240 for its own tracking purposes. At the end of the programmed count, EVAL counter 1242 generates a logic 0 on lines 1274 and 1262 to indicate the end of the evaluation period.
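The EVAL counter behavior described above can be modeled with a short sketch: the counter is loaded with a programmed count, started, and its output stays at logic 1 for exactly that many clock cycles. The class name `EvalCounter` and its method names are hypothetical, and the cycle-level timing is an assumption made for illustration.

```python
class EvalCounter:
    """Programmable counter whose output is 1 while it is counting."""

    def __init__(self, programmed_count):
        self.programmed_count = programmed_count
        self.remaining = 0  # not counting until started

    def start(self):
        """Model of the start signal issued when DATAXSFR goes from 1 to 0."""
        self.remaining = self.programmed_count

    @property
    def eval_signal(self):
        """EVAL output: 1 while counting, 0 otherwise."""
        return 1 if self.remaining > 0 else 0

    def tick(self):
        """Advance one clock cycle."""
        if self.remaining > 0:
            self.remaining -= 1

counter = EvalCounter(programmed_count=3)
counter.start()
trace = []
for _ in range(5):
    trace.append(counter.eval_signal)
    counter.tick()
```

In this model the evaluation period lasts three cycles, after which EVAL returns to 0 and the memory access period may begin.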
[0540]
If memory access is not desired, the MEM_EN signal on line 1272 is set to logic 0 and provided to the MEMFSM unit 1240. In this case, the memory simulation system waits for another DMA data transfer period. If memory access is desired, the MEM_EN signal on line 1272 is set to logic 1. In essence, the MEM_EN signal is a control signal from the CPU that enables accesses between the on-board SRAM memory devices and the FPGA logic devices. Here, the MEMFSM unit 1240 waits for the FPGA logic devices 1201 to 1204 to place address and control signals on the FPGA buses FD [63:32] and FD [31:0].
[0541]
The remaining functional units and their associated control signals and lines provide address / control information to the SRAM memory devices for writing and reading data. These units include the memory address / control latch 1243 for the low bank, the address / control multiplexer 1244 for the low bank, the memory address / control latch 1247 for the high bank, the address / control multiplexer 1246 for the high bank, and the address counter 1245.
[0542]
The memory address / control latch 1243 for the low bank receives the address and control signals from the FPGA bus FD [31:0] 1275, which corresponds to bus 1213, and the latch signal on line 1263. Latch 1243 generates the mem_wr_L signal on line 1264 and provides the input address / control signals from the FPGA bus FD [31:0] to the address / control multiplexer 1244 via bus 1266. This mem_wr signal is the same as the chip select write signal.
[0543]
Address / control multiplexer 1244 receives as input the address and control information on bus 1266 and the address information from address counter 1245 via bus 1268. As an output, address / control multiplexer 1244 sends address / control information over bus 1276 to low bank SRAM memory device 1205. The select signal on line 1265 provides an appropriate select signal from the MEMFSM unit 1240. Address / control information on bus 1276 corresponds to MA [18:2] and chip select read / write signals on buses 1229 and 1216 in FIG.
[0544]
Address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267. SPACE4 includes DMA write transfer information. SPACE5 includes DMA read transfer information. These DMA transfers occur via the PCI bus between the computing system (cache / main memory via the workstation CPU) and the simulation system (SRAM memory devices 1205, 1206). Address counter 1245 provides its output on buses 1288 and 1268 to address / control multiplexers 1244 and 1246. With the appropriate select signal on line 1265 for the low bank, address / control multiplexer 1244 places on bus 1276 either the address / control information on bus 1266, for write / read memory access between SRAM device 1205 and FPGA logic devices 1203 and 1204, or the DMA write / read transfer data from SPACE4 or SPACE5 on bus 1267.
[0545]
During the memory access period, the MEMFSM unit 1240 provides a latch signal on line 1263 to the memory address / control latch 1243 to fetch the input from the FPGA bus FD [31:0]. The MEMFSM unit 1240 extracts the mem_wr_L control information from the address / control signals on FD [31:0] for further control. If the mem_wr_L signal on line 1264 is a logic 1, a write operation is desired, and the MEMFSM unit 1240 generates the appropriate select signal on line 1265 to the address / control multiplexer 1244 so that the address and control signals on bus 1266 are sent to the low bank SRAM on bus 1276. Thereafter, write data transfer from the FPGA logic device to the SRAM memory device occurs. If the mem_wr_L signal on line 1264 is a logic 0, a read operation is desired, so the simulation system waits for the data placed on the FPGA bus FD [31:0] by the SRAM memory device. As soon as the data is ready, a read data transfer from the SRAM memory device to the FPGA logic device occurs.
[0546]
A similar configuration and operation are provided for the high bank. The memory address / control latch 1247 for the high bank receives the address and control signals from the FPGA bus FD [63:32] 1278, which corresponds to bus 1212, and the latch signal on line 1270. Latch 1247 generates the mem_wr_H signal on line 1271 and provides the input address / control signals from the FPGA bus FD [63:32] to the address / control multiplexer 1246 via bus 1239.
[0547]
Address / control multiplexer 1246 receives as input the address / control information on bus 1239 and the address information from address counter 1245 on bus 1268. As an output, address / control multiplexer 1246 sends address / control information over bus 1277 to high bank SRAM memory device 1206. The select signal on line 1269 provides an appropriate select signal from the MEMFSM unit 1240. The address / control information on bus 1277 corresponds to MA [18:2] and chip select read / write signals on buses 1214 and 1215 in FIG.
[0548]
Address counter 1245 receives information from SPACE4 and SPACE5 via bus 1267 for DMA write and read transfers, as described above. Address counter 1245 provides its output on buses 1288 and 1268 to address / control multiplexers 1244 and 1246. With the appropriate select signal on line 1269 for the high bank, address / control multiplexer 1246 places on bus 1277 either the address / control information on bus 1239, for write / read memory access between SRAM device 1206 and FPGA logic devices 1201 and 1202, or the DMA write / read transfer data from SPACE4 or SPACE5 on bus 1267.
[0549]
During the memory access period, the MEMFSM unit 1240 provides a latch signal on line 1270 to the memory address / control latch 1247 to fetch the input from the FPGA bus FD [63:32]. The MEMFSM unit 1240 extracts the mem_wr_H control information from the address / control signals on FD [63:32] for further control. If the mem_wr_H signal on line 1271 is a logic 1, a write operation is desired, and the MEMFSM unit 1240 generates the appropriate select signal on line 1269 to the address / control multiplexer 1246 so that the address and control signals on bus 1239 are sent to the high bank SRAM on bus 1277. Thereafter, write data transfer from the FPGA logic device to the SRAM memory device occurs. If the mem_wr_H signal on line 1271 is a logic 0, a read operation is desired, so the simulation system waits for the data placed on the FPGA bus FD [63:32] by the SRAM memory device. As soon as the data is ready, a read data transfer from the SRAM memory device to the FPGA logic device occurs.
[0550]
As shown in FIG. 57, address and control signals are provided to the low bank SRAM memory device and the high bank SRAM memory device via buses 1276 and 1277, respectively. Bus 1276 for the low bank corresponds to the combination of buses 1229 and 1216 in FIG. Similarly, bus 1277 for the high bank corresponds to the combination of buses 1214 and 1215 in FIG.
[0551]
The operation of the CTRL_FPGA unit 1200 for the memory simulation system of one embodiment of the present invention is generally as follows. The DONE signal on line 1259, provided to the computing system and to the MEMFSM unit 1240 in the CTRL_FPGA unit 1200, indicates the completion of the simulation write / read cycle. The DATAXSFR signal on line 1260 indicates the occurrence of a DMA data transfer period in the simulation write / read cycle. Memory address / control signals on the FPGA buses FD [31:0] and FD [63:32] are provided to the memory address / control latches 1243 and 1247 for the low and high banks, respectively. For either bank, the MEMFSM unit 1240 generates a latch signal (on line 1263 or 1270) to latch the address and control information. This information is then provided to the SRAM memory device. The mem_wr signal is used to determine whether a write or read operation is desired. If a write is desired, data is transferred from the FPGA logic devices 1201-1204 to the SRAM memory device. If a read is desired, the simulation system waits for the SRAM memory device to place the requested data onto the FPGA bus for transfer from the SRAM memory device to the FPGA logic device. For SPACE4 and SPACE5 DMA data transfers, the select signals on lines 1265 and 1269 can select the output of address counter 1245 as the data to be transferred between the main computing system and the SRAM memory devices in the simulation system. For all of these accesses, appropriate wait cycles are inserted so that the logic processes the data only when the logic is ready and the data is available.
[0552]
FIG. 60 shows a more detailed view of the memory read data double buffer 1251 (FIG. 57). Each memory block N in each FPGA logic device has a double buffer that latches in the relevant data, which may arrive at different times, and then buffers out all of the latched data simultaneously. In FIG. 60, a double buffer 1391 for memory block 0 includes two D-type flip-flops 1340 and 1341. The output 1343 of the first D flip-flop 1340 is coupled to the input of the second D flip-flop 1341. The output 1344 of the second D flip-flop 1341 is the output of the double buffer, which is provided to the memory block N interface in the user's design logic. The global clock input is provided to the first flip-flop 1340 on line 1393 and to the second flip-flop 1341 on line 1394.
[0553]
The first D flip-flop 1340 receives its data input on line 1342 from the SRAM memory device via bus 1283 and the FPGA bus (FD [63:32] for the high bank and FD [31:0] for the low bank). The enable input is coupled to line 1345, which receives rd_latx (e.g., rd_lat0) from the EVALFSMx unit of each FPGA logic device. Thus, for a read operation (i.e., wrx = 0), the EVALFSMx unit generates the rd_latx signal to latch the data on line 1342 onto line 1343. Input data for the double buffers of the various memory blocks can arrive at different times. The double buffer ensures that all of the data is latched first. Once all the data is latched into D flip-flop 1340, the clk_en signal (i.e., the software clock) is provided on line 1346 as the clock input to D flip-flop 1341. When the clk_en signal is asserted, the latched data on line 1343 is buffered by D flip-flop 1341 onto line 1344.
[0554]
Another double buffer 1392, substantially equivalent to the double buffer 1391, is provided for the next memory block 1. Data from the SRAM memory device is input on line 1396. The global clock signal is input on line 1397. The clk_en (software clock) signal is input on line 1398 to a second flip-flop (not shown) in double buffer 1392. These lines are coupled to the corresponding signal lines of the first double buffer 1391 for memory block 0 and of the other double buffers for the other memory blocks N. The double buffered output data is provided on line 1399.
[0555]
The rd_latx signal for the second double buffer 1392 (eg, rd_lat1) is provided on line 1395 independently of the other rd_latx signals for the other double buffers. More double buffers are provided for other memory blocks N.
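The double buffering described for FIG. 60 can be illustrated with a behavioral Python sketch: each block's first stage captures data whenever its own rd_latx is asserted, and a shared clk_en later moves every first-stage value to the second stage at once. The class name `DoubleBuffer` and its method names are hypothetical, and the global clock is abstracted away.

```python
class DoubleBuffer:
    """Two-stage buffer: latch on rd_lat, release on clk_en."""

    def __init__(self):
        self.first = None   # output of the first flip-flop (e.g., line 1343)
        self.second = None  # output of the second flip-flop (e.g., line 1344)

    def latch(self, data, rd_lat):
        """Capture data in the first stage when this block's rd_latx is asserted."""
        if rd_lat:
            self.first = data

    def clock(self, clk_en):
        """Transfer the latched value to the output when the software clock fires."""
        if clk_en:
            self.second = self.first

# Two memory blocks receive their read data at different times.
buffers = [DoubleBuffer(), DoubleBuffer()]
buffers[0].latch(0x11, rd_lat=1)  # block 0 data arrives first
outputs_before = [b.second for b in buffers]
buffers[1].latch(0x22, rd_lat=1)  # block 1 data arrives later

# clk_en is shared, so both outputs update together.
for b in buffers:
    b.clock(clk_en=1)
outputs_after = [b.second for b in buffers]
```

The user logic therefore never sees block 0's data before block 1's data is also ready, which is the purpose of the second stage.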
[0556]
Here, a state diagram of the MEMFSM unit 1240 will be described according to an embodiment of the present invention. FIG. 58 shows such a state diagram of the finite state machine of the MEMFSM unit in the CTRL_FPGA unit. The state diagram in FIG. 58 is configured such that three periods in the simulated write / read cycle are also shown having their corresponding states. Thus, states 1300 to 1301 correspond to DMA data transfer periods; states 1302 to 1304 correspond to evaluation periods; and states 1305 to 1314 correspond to memory access periods. In the following description, FIG. 57 is referred to in conjunction with FIG.
[0557]
In general, a sequence of signals for DMA transfer, evaluation, and memory access is set up. In one embodiment, the sequence is as follows: DATAXSFR initiates the DMA data transfer, if any. The LAST signals for both the high and low banks are generated upon completion of the DMA data transfer and trigger the DONE signal to indicate completion of the DMA data transfer period. The XSFR_DONE signal is then generated, and the EVAL cycle begins. At the end of EVAL, memory read / write may begin.
[0558]
Returning to the top of FIG. 58, state 1300 is the idle state, which is maintained whenever the DATAXSFR signal is a logic 0. This indicates that no DMA data transfer is occurring. If the DATAXSFR signal is a logic 1, the MEMFSM unit 1240 proceeds to state 1301. Here, the computing system requires a DMA data transfer between the computing system (main memory in FIGS. 1, 45, and 46) and the simulation system (FPGA logic devices 1201-1204 or SRAM memory devices 1205, 1206 in FIG. 56). Appropriate wait cycles are inserted until the DMA data transfer is complete. When the DMA transfer is complete, the DATAXSFR signal returns to logic 0.
[0559]
When the DATAXSFR signal returns to logic 0, the MEMFSM unit 1240 generates the start signal in state 1302. The start signal starts the EVAL counter 1242 (a programmable counter). As long as the EVAL counter is counting in state 1303, the EVAL signal is asserted at logic 1 and provided to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. At the end of the count, the EVAL counter presents a logic 0 EVAL signal to the EVALFSMx in each FPGA logic device and to the MEMFSM unit 1240. When the MEMFSM unit 1240 receives the logic 0 EVAL signal, it turns on the EVAL_DONE flag in state 1304. The EVAL_DONE flag is used by the MEMFSM to indicate that the evaluation period has ended and that the memory access period may now proceed, if desired. The CPU checks EVAL_DONE and XSFR_DONE by reading the XSFR_EVAL register (see Table K below) to confirm that the DMA transfer and EVAL have completed successfully before the next DMA transfer.
[0560]
However, in some cases, the simulation system may not want to perform a memory access at that time. Here, the simulation system holds the memory enable signal MEM_EN at 0. This disabled (logic 0) MEM_EN signal keeps the MEMFSM unit in the idle state 1300. Here, the MEMFSM unit waits for DMA data transfer or data evaluation by the FPGA logic device. On the other hand, if the memory enable signal MEM_EN is logic 1, the simulation system indicates that it is desired to perform memory access.
[0561]
In FIG. 58, below state 1304, the state diagram is divided into two sections running in parallel. One section includes states 1305, 1306, 1307, 1308, and 1309 for low bank memory access. The other section includes states 1311, 1312, 1313, 1314, and 1309 for high bank memory access.
[0562]
In state 1305, the simulation system waits for one cycle for the currently selected FPGA logic device to place address and control signals on the FPGA bus FD [31:0]. In state 1306, the MEMFSM generates a latch signal on line 1263 to the memory address / control latch 1243 to fetch the input from FD [31:0]. The data corresponding to this fetched address and control signal is either read from the SRAM memory device or written to the SRAM memory device. To determine whether the simulation system requires a read or a write operation, the memory write signal mem_wr_L for the low bank is extracted from the address and control signals. If mem_wr_L = 0, a read operation is requested. If mem_wr_L = 1, a write operation is requested. As described above, this mem_wr signal is equivalent to the chip select write signal.
[0563]
In state 1307, the appropriate selection signal for address / control multiplexer 1244 is generated and sends the address and control signal to the low bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTL signal. If mem_wr_L = 1 and LASTL = 0, a write operation is requested, but the last data in the chain of FPGA logic devices has not yet been shifted out. Accordingly, the simulation system returns to state 1305. In state 1305, the simulation system waits for one cycle for the FPGA logic device to input additional address and control signals to FD [31: 0]. This operation continues until the last data is shifted out of the FPGA logic device. However, if mem_wr_L = 1 and LASTL = 1, the last data was shifted out of the FPGA logic device.
[0564]
Similarly, if mem_wr_L = 0 indicating a read operation, the MEMFSM proceeds to state 1308. In state 1308, the simulation system waits for one cycle for the SRAM memory device to input data to the FPGA bus FD [31: 0]. If LASTL = 0, the last data in the chain of FPGA logic devices has not yet been shifted out. Accordingly, the simulation system returns to state 1305. In state 1305, the simulation system waits for one cycle for the FPGA logic device to input additional address and control signals to FD [31: 0]. This process continues until the last data is shifted out of the FPGA logic device. Note that the write operation (mem_wr_L = 1) and the read operation (mem_wr_L = 0) can be interleaved until LASTL = 1 or otherwise alternated.
[0565]
If LASTL = 1, the MEMFSM proceeds to state 1309. In state 1309, the MEMFSM waits while DONE = 0. When DONE = 1, both LASTL and LASTH are logic 1 and, therefore, the simulation write / read cycle is complete. The simulation system then proceeds to state 1300. In state 1300, the simulation system always remains idle when DATAXSFR = 0.
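The low bank transitions described above can be summarized as a small transition function. The following Python sketch is illustrative only: the function name memfsm_low_next and its argument names are assumptions rather than identifiers from the specification, and it models only the state numbers of FIG. 58 together with the mem_wr_L, LASTL, and DONE conditions stated in the text. The transition from state 1307 to state 1309 on a final write is inferred from the description and should be read as an assumption.

```python
def memfsm_low_next(state, mem_wr_l=0, lastl=0, done=0):
    """Return the next low bank MEMFSM state (state numbers from FIG. 58)."""
    if state == 1305:
        # One wait cycle while the FPGA drives address/control onto FD[31:0].
        return 1306
    if state == 1306:
        # Address/control has been latched from FD[31:0].
        return 1307
    if state == 1307:
        if mem_wr_l == 1:
            # Write: loop until the last data has been shifted out.
            # Going to 1309 when LASTL = 1 is an assumption (see lead-in).
            return 1309 if lastl == 1 else 1305
        # Read: wait for the SRAM to drive data in state 1308.
        return 1308
    if state == 1308:
        return 1309 if lastl == 1 else 1305
    if state == 1309:
        # Hold here while DONE = 0; return to idle once DONE = 1.
        return 1300 if done == 1 else 1309
    raise ValueError(f"state {state} is not part of the low bank section")
```

The high bank section (states 1311 to 1314) would follow the same shape with mem_wr_H and LASTH in place of mem_wr_L and LASTL.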
[0566]
The same process can be applied to high banks. In state 1311, the simulation system waits for one cycle for the currently selected FPGA logic device to input address and control signals to the FPGA bus FD [63:32]. In state 1312, the MEMFSM generates a latch signal on line 1270 to the memory address / control latch 1247 to fetch the input from FD [63:32]. The data corresponding to this particular fetched address and control signal can be either read from the SRAM memory device or written to the SRAM memory device. In order to determine if the simulation system requires a read or write operation, the memory write signal mem_wr_H for the high bank can be extracted from the address and control signals. If mem_wr_H = 0, a read operation is requested. If mem_wr_H = 1, a write operation is requested.
[0567]
In state 1313, the appropriate selection signal for address / control multiplexer 1246 is generated and sends the address and control signal to the high bank SRAM. The MEMFSM unit checks the mem_wr signal and the LASTH signal. If mem_wr_H = 1 and LASTH = 0, a write operation is requested, but the last data in the chain of FPGA logic devices has not yet been shifted out. Therefore, the simulation system returns to state 1311. In state 1311, the simulation system waits for one cycle for the FPGA logic device to input additional address and control signals to FD [63:32]. This operation continues until the last data is shifted out of the FPGA logic device. However, if mem_wr_H = 1 and LASTH = 1, the last data was shifted out of the FPGA logic device.
[0568]
Similarly, if mem_wr_H = 0 indicating a read operation, the MEMFSM proceeds to state 1314. In state 1314, the simulation system waits for one cycle for the SRAM memory device to input data to the FPGA bus FD [63:32]. If LASTH = 0, the last data in the chain of FPGA logic devices has not yet been shifted out. Therefore, the simulation system returns to state 1311. In state 1311, the simulation system waits for one cycle for the FPGA logic device to input additional address and control signals to FD [63:32]. This process continues until the last data is shifted out of the FPGA logic device. Note that the write operation (mem_wr_H = 1) and the read operation (mem_wr_H = 0) can be interleaved until LASTH = 1 or otherwise alternated.
[0569]
If LASTH = 1, the MEMFSM proceeds to state 1309. In state 1309, the MEMFSM waits while DONE = 0. When DONE = 1, both LASTL and LASTH are logic 1 and, therefore, the simulation write / read cycle is complete. The simulation system then proceeds to state 1300. In state 1300, the simulation system always remains idle when DATAXSFR = 0.
[0570]
Alternatively, in another embodiment of the present invention, states 1309 and 1310 are not implemented for either the high or the low bank. Thus, in the low bank, the MEMFSM may go directly to state 1300 after passing state 1308 (LASTL = 1) or 1307 (MEM_WR_L = 1 and LASTL = 1). In the high bank, the MEMFSM may go directly to state 1300 after passing state 1314 (LASTH = 1) or 1313 (MEM_WR_H = 1 and LASTH = 1).
[0571]
A state diagram of the EVALFSM unit 1248 will now be described according to one embodiment of the present invention. FIG. 59 shows such a state diagram of the EVALFSMx finite state machine in each FPGA chip. Similar to FIG. 58, the state diagram in FIG. 59 was configured such that two periods in the simulated write / read cycle are also shown having their corresponding states. Therefore, states 1320 to 1326A correspond to the evaluation period, and states 1326B to 1336 correspond to the memory access period. In the following description, FIG. 57 is referred to in conjunction with FIG.
[0572]
The EVALFSMx unit 1248 receives the EVAL signal on the line 1274 from the CTRL_FPGA unit 1200 (see FIG. 57). While EVAL = 0, no evaluation of data by the FPGA logic device occurs. Thus, in state 1320, EVALFSMx is idle during EVAL = 0. If EVAL = 1, EVALFSMx proceeds to state 1321.
[0573]

States 1321, 1322, and 1323 relate to inter-FPGA communication. In inter-FPGA communication, data is evaluated by the user design via the FPGA logic devices. Here, EVALFSMx generates signals input_en, mux_en, and clk_en (item 1281 in FIG. 57) for the user's logic. In state 1321, EVALFSMx generates a clk_en signal. The clk_en signal enables the second flip-flop of all clock edge register flip-flops to be used in the user's design logic during this cycle (see FIG. 19). The clk_en signal is also known as a software clock. If the user memory type is synchronous, clk_en also enables the second clock of the memory read data double buffer 1251 in each memory block. The SRAM data output for each memory block is sent to the user design logic in this cycle.
[0574]
In state 1322, EVALFSMx generates an input_en signal for the user design logic and latches the input signal sent from the CPU to the user logic by DMA transfer. The input_en signal provides an enable signal to the second flip-flop in the primary clock register (see FIG. 19).
[0575]
In state 1323, EVALFSMx generates a mux_en signal to turn on the multiplexing circuit in each FPGA logic device and initiate communication with the other FPGA logic devices in the array. As described above, inter-FPGA wirelines are often multiplexed to efficiently use the limited pin resources in each FPGA logic device chip.
[0576]
In state 1324, EVALFSMx waits as long as EVAL = 1. When EVAL = 0, the evaluation period is complete, and in state 1325 EVALFSMx turns off the mux_en signal.
[0577]
If the number of memory blocks M (where M is an integer including 0) is zero, EVALFSMx returns to state 1320. In state 1320, EVALFSMx remains idle while EVAL = 0. In most cases, M > 0 and therefore EVALFSMx proceeds to state 1326A / 1326B. “M” is the number of memory blocks in the FPGA logic device. M is a constant determined by the user design that is mapped and configured in the FPGA logic device; it is not a counter that is decremented. When M > 0, the right part of FIG. 59 (memory access period) is configured in the FPGA logic device. When M = 0, only the left part (EVAL period) of FIG. 59 is configured.
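The evaluation period states 1320 to 1325 can be traced cycle by cycle. The Python sketch below is a simplified model under stated assumptions: the function name eval_period_trace, the one-state-per-cycle timing, and the tuple layout of the trace are illustrative choices, not details given in the specification. It records which enable signals are asserted in each state and reports where EVALFSMx goes after state 1325 depending on M.

```python
def eval_period_trace(eval_samples, m):
    """Trace EVALFSMx through states 1320-1325.

    eval_samples: per-cycle values of the EVAL input (0 or 1).
    m: number of memory blocks M in this FPGA logic device.
    Returns (trace, next_label); each trace entry is
    (state, clk_en, input_en, mux_en).
    """
    state, mux_en, trace = 1320, 0, []
    for ev in eval_samples:
        clk_en = 1 if state == 1321 else 0      # software clock
        input_en = 1 if state == 1322 else 0    # latch inputs from DMA
        if state == 1323:
            mux_en = 1                          # start inter-FPGA multiplexing
        elif state == 1325:
            mux_en = 0                          # evaluation period is over
        trace.append((state, clk_en, input_en, mux_en))

        if state == 1320:
            state = 1321 if ev == 1 else 1320   # idle while EVAL = 0
        elif state in (1321, 1322, 1323):
            state += 1
        elif state == 1324:
            state = 1324 if ev == 1 else 1325   # wait as long as EVAL = 1
        else:  # state 1325
            # M = 0: back to idle; M > 0: on to the memory access period.
            return trace, ("1320" if m == 0 else "1326A/1326B")
    return trace, str(state)
```

For example, eval_period_trace([0, 1, 1, 1, 1, 1, 0, 0], m=2) visits 1320, 1320, 1321, 1322, 1323, 1324, 1324, 1325 and then hands over to the memory access period.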
[0578]
State 1327 holds EVALFSMx in a standby state as long as SHIFTIN = 0. If SHIFTIN = 1, the previous FPGA logical device has completed its memory access, and the current FPGA logical device is now ready to perform its memory access task. Alternatively, the current FPGA logic device is the first logic device in the bank and the SHIFTIN input line is coupled to Vcc. Nevertheless, receipt of the SHIFTIN = 1 signal indicates that the current FPGA logic device is ready to perform a memory access. In state 1328, the number N of memory blocks is set to N = 1. This number N can be incremented at the occurrence of each loop to achieve memory access to that particular memory block N. Initially, N = 1, and then EVALFSMx may proceed to access the memory for memory block 1.
[0579]
In state 1329, EVALFSMx generates a select signal on line 1285 and an output_en signal on line 1284 to the FPGA bus driver FDO_MUXx 1249, and sends the address and control signal of the Mem_Block_N interface 1253 to the FPGA bus FD [63:32] or FD [31: 0]. If a write operation is required, wr = 1. Otherwise, a read operation is requested, where wr = 0. EVALFSMx receives the wr signal on line 1287 as one of its inputs. Based on this wr signal, an appropriate select signal on line 1285 can be asserted.
[0580]
If wr = 1, EVALFSMx proceeds to state 1330. EVALFSMx generates a selection for the FD bus driver and an out_en signal, and the write data of Mem_Block_N 1253 is input to the FPGA bus FD [63:32] or FD [31: 0]. The EVALFSMx then waits for one cycle to allow the SRAM memory device to complete the write cycle. EVALFSMx then proceeds to state 1335. In state 1335, the memory block number N is incremented by one. That is, N = N + 1.
[0581]
However, if wr = 0 in state 1329, a read operation is requested and EVALFSMx proceeds to state 1332. EVALFSMx waits for one cycle in state 1332 and then proceeds to state 1333 to wait for another cycle. In state 1334, EVALFSMx generates an rd_latch signal on line 1286, causing the memory read data double buffer 1251 of memory block N to fetch the SRAM data onto the FD bus. EVALFSMx then proceeds to state 1335. In state 1335, the memory block number N is incremented by one. That is, N = N + 1. Thus, if N = 1 before increment state 1335, N is now 2 and the resulting memory access may be applicable to memory block 2.
[0582]
If the current number of memory blocks N is less than or equal to the total number of memory blocks M in the user's design (ie, N ≦ M), EVALFSMx proceeds to state 1329. In state 1329, EVALFSMx generates a specific selection and out_en signal for the FD bus driver depending on whether the operation is a write or a read. A write or read operation for this next memory block N may then occur.
[0583]
However, if the current number of memory blocks N is greater than the total number of memory blocks M in the user's design (ie, N > M), EVALFSMx proceeds to state 1336. In state 1336, EVALFSMx turns on the SHIFTOUT output signal, allowing the next FPGA logic device in the bank to access the SRAM memory device. Thereafter, EVALFSMx proceeds to state 1320. In state 1320, EVALFSMx remains idle until the simulation system requests data evaluation between FPGA logic devices (ie, EVAL = 1).
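The per-block loop of the memory access period (states 1327 to 1336) can be sketched as follows. This Python example is a minimal model, assuming M > 0 and assuming that SHIFTIN = 1 has already been received; the function name memory_access_sequence and the list-of-flags input are hypothetical. The one-cycle wait after state 1330 is not given its own state number in the text, so it is not listed separately here.

```python
def memory_access_sequence(wr_flags):
    """List the EVALFSMx states visited during one memory access period.

    wr_flags[k - 1] is the wr signal for memory block k
    (1 = write, 0 = read); M is len(wr_flags).
    """
    m = len(wr_flags)
    if m == 0:
        raise ValueError("the memory access part is only configured when M > 0")
    visited = [1327, 1328]        # SHIFTIN = 1 seen, then N is set to 1
    n = 1
    while n <= m:                 # loop while N <= M
        visited.append(1329)      # drive address/control onto the FD bus
        if wr_flags[n - 1] == 1:
            visited.append(1330)  # drive write data, then wait one cycle
        else:
            visited.extend([1332, 1333, 1334])  # two waits, then rd_latch
        visited.append(1335)      # N = N + 1
        n += 1
    visited.append(1336)          # N > M: assert SHIFTOUT
    return visited
```

With wr_flags = [1, 0] (a write to memory block 1 followed by a read from memory block 2), the sequence is 1327, 1328, 1329, 1330, 1335, 1329, 1332, 1333, 1334, 1335, 1336.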
[0584]
FIG. 61 illustrates a simulation write / read cycle of one embodiment of the present invention. FIG. 61 shows at reference numeral 1366 three periods (DMA data transfer period, evaluation period, and memory access period) in the simulation write / read cycle. Although not shown, it is implied that a prior DMA transfer, evaluation, and memory access may have occurred. Furthermore, the timing for data transfer to / from the low bank SRAM may differ from that of the high bank SRAM. For simplicity, FIG. 61 shows one example where the access times for the low and high banks are the same. Global clock GCLK 1350 provides a clock signal to all components in the system.
[0585]
A DATAXSFR signal 1351 indicates the occurrence of a DMA data transfer period. When DATAXSFR = 1 at trace 1367, DMA data transfer is occurring between the main computing system and the FPGA logic device or SRAM memory device. Thus, data is present on FPGA high bank bus FD [63:32] 1359 (trace 1369) and on FPGA low bank bus FD [31: 0] 1358 (trace 1368). The DONE signal 1364 indicates the completion of the memory access period by a logic 0-to-1 transition (trace 1390), and otherwise remains at logic 0 for the duration of the simulation write / read cycle (eg, between the edge of trace 1370 and the edge of trace 1390). During the DMA transfer period, the DONE signal is a logic zero.
[0586]
At the end of the DMA transfer period, the DATAXSFR signal transitions from logic 1 to 0. This triggers the start of the evaluation period. Thus, EVAL 1352 is a logic 1 as indicated by trace 1371. The duration of the EVAL signal at logic 1 can be predetermined and programmable. During this evaluation period, data in the user's design logic is evaluated with a clk_en signal 1353 that is a logic 1 as indicated by trace 1372, an input_en signal 1354 that is a logic 1 as indicated by trace 1373, and a mux_en signal 1355 that is a logic 1 for a longer duration than clk_en and input_en, as indicated by trace 1374. Data is being evaluated within this particular FPGA logic device. When the mux_en signal 1355 transitions from logic 1 to 0 at trace 1374, and if at least one memory block is present in the FPGA logic device, the evaluation period ends and the memory access period begins.
[0587]
The SHIFTIN signal 1356 is asserted to a logic 1 at trace 1375. This is because the previous FPGA has completed its evaluation and all desired data has been accessed to / from this previous FPGA logic device. Now, the next FPGA logic device in the bank is ready to start memory access.
[0588]
In traces 1377 to 1386, the following nomenclature is used. ACj_k indicates the address and control signal associated with FPGAj and memory block k. Here, j and k are integers including 0. WDj_k indicates the write data for FPGAj and memory block k. RDj_k indicates the read data for FPGAj and memory block k. Therefore, AC3_1 indicates the address and control signal associated with FPGA 3 and memory block 1. Low bank SRAM access and high bank SRAM access 1361 are shown as trace 1387.
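The trace nomenclature lends itself to a small parser. The following Python sketch decodes a label such as AC3_1 into its kind, FPGA index j, and memory block index k; the function name parse_trace_label and the descriptive strings it returns are illustrative assumptions, not part of the specification.

```python
import re

# ACj_k, WDj_k, RDj_k with non-negative integer j (FPGA) and k (memory block).
_LABEL = re.compile(r"^(AC|WD|RD)(\d+)_(\d+)$")

_KINDS = {
    "AC": "address/control",
    "WD": "write data",
    "RD": "read data",
}


def parse_trace_label(label):
    """Return (kind, fpga_index, memory_block_index) for a trace label."""
    match = _LABEL.match(label)
    if match is None:
        raise ValueError(f"not a valid trace label: {label!r}")
    prefix, j, k = match.groups()
    return _KINDS[prefix], int(j), int(k)
```

For instance, parse_trace_label("AC3_1") yields ("address/control", 3, 1), matching the example given in the text.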
[0589]
The next few traces 1377-1387 show how memory access is achieved. Based on the logic level of the wrx signal to EVALFSMx and the resulting mem_wr signal to MEMFSM, either a write or read operation can be performed. If a write operation is desired, the memory model interface (Mem_Block_N interface 1253 in FIG. 57) with the user's memory block N interface provides wrx as one of its control signals. This control signal wrx is provided to the FD bus driver and the EVALFSMx unit. When wrx is logic 1, an appropriate select signal and output_en signal are provided to the FD bus driver to input memory write data to the FD bus. This same control signal now on the FD bus can be latched by the memory address / control latch in the CTRL_FPGA unit. The memory address / control latch sends address and control signals to the SRAM via the MA [18: 2] / control bus. Since the wrx control signal, which is logic 1, is extracted from the FD bus and a write operation is requested, the data associated with the address and control signal on the FD bus is transmitted to the SRAM memory device.
Therefore, as shown in FIG. 61, this next FPGA logic device (logic device FPGA0 in the low bank) inputs AC0_0 to FD [31: 0] as shown by trace 1377. The simulation system performs a write operation on WD0_0. Next, AC0_1 is input to FD [31: 0]. However, if a read operation is requested, there will be some time delay after AC0_1 is input to FD bus FD [31: 0], after which RD0_0, instead of WD0_0, is input to the FD bus by the SRAM memory device in correspondence with AC0_0.
[0590]
Note that inputting AC0_0 to the MA [18: 2] / control bus, as shown by trace 1383, is slightly delayed from inputting address, control, and data to the FD bus. This is because it takes time for the MEMFSM unit to latch in the address / control signal from the FD bus, extract the mem_wr signal, and generate an appropriate selection signal to the address / control multiplexer so that the address / control signal can be input to the MA [18: 2] / control bus. Further, after the address / control signals are input to the MA [18: 2] / control bus to the SRAM memory device, the simulation system must wait for the corresponding data from the SRAM memory to be input to the FD bus. One example is the time offset between trace 1384 and trace 1381. Here, RD1_1 is input to the FD bus after AC1_1 is input to the MA [18: 2] / control bus.
[0591]
On the high bank, FPGA1 inputs AC1_0 to FD [63:32], followed by WD1_0. Thereafter, AC1_1 is input to FD [63:32]. This is indicated by trace 1380. When AC1_1 is input to the FD bus, the control signal indicates a read operation in this example. Thus, as described above, once AC1_1 is on the MA [18: 2] / control bus as indicated by trace 1384, the appropriate wrx and mem_wr signals, at logic 0, are present in the address / control signals to the EVALFSMx and MEMFSM units. Since the simulation system knows that this is a read operation, no write data is transmitted to the SRAM memory. Rather, the read data associated with AC1_1 is input to the FD bus by the SRAM memory for later read through the simulation memory block interface by the user design logic. This is indicated by trace 1381 on the high bank. On the low bank, RD0_1 is input to the FD bus as shown by trace 1378, after AC0_1 is input to the MA [18: 2] / control bus (not shown).
[0592]
A read operation through the simulation memory block interface with the user's design logic is achieved when EVALFSMx generates the rd_lat0 signal 1362 for the memory read data double buffer in the simulation system as indicated by trace 1388. This rd_lat0 signal is provided to both the low bank FPGA0 and the high bank FPGA1.
[0593]
Thereafter, the next memory block for each FPGA logic device is input to the FD bus. AC2_0 is input to the low bank FD bus while AC3_0 is input to the high bank FD bus. If a write operation is desired, WD2_0 is input to the low bank FD bus and WD3_0 is input to the high bank FD bus. AC3_0 is input to the MA [18: 2] / control bus as shown on trace 1385. This process continues for the next memory block for write and read operations. Note that the write and read operations for the low and high banks may occur at different times and speeds; FIG. 61 shows a specific example where the timing for the low and high banks is the same. In addition, in FIG. 61 write operations for the low and high banks occur together, followed by read operations on both banks. This is not always the case. The presence of a low bank and a high bank allows the devices coupled to these banks to operate in parallel. That is, activity on the low bank is independent of activity on the high bank. Other scenarios are conceivable in which the low bank performs a series of write operations while the high bank performs a series of read operations.
[0594]
The SHIFTOUT signal 1357 is asserted as indicated by trace 1376 when it is the last data in the last FPGA logic device for each bank. For a read operation, the rd_lat1 signal 1363 corresponding to FPGA2 on the low bank and FPGA3 on the high bank is asserted as indicated by trace 1389, reading RD2_1 on trace 1379 and RD3_1 on trace 1382. Since the last data for the last FPGA unit has been accessed, the completion of the simulation write / read cycle is indicated by the DONE signal 1364 as indicated by trace 1390.
[0595]
Table H below lists and describes the various components on the simulation system board and the corresponding registers / memory, PCI memory addresses, and local addresses.
[0596]
[Table 11]

[0597]
The data format for the configuration file is shown below in Table J according to one embodiment of the present invention. The CPU sends one word over the PCI bus at each time to configure one bit for all onboard FPGAs in parallel.
[0598]
[Table 12]
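The statement that one word configures one bit for all onboard FPGAs in parallel can be illustrated with a bit-slicing sketch. The Python example below is only an assumption about how such words might be assembled: the assignment of FPGA n to bit position n of each word, the function name pack_config_words, and the list-based bitstream representation are all hypothetical, and the actual layout is the one defined in Table J.

```python
def pack_config_words(bitstreams):
    """Interleave per-FPGA configuration bitstreams into parallel words.

    bitstreams: one list of 0/1 values per FPGA, all of equal length.
    Word i carries bit i of every bitstream; FPGA n is placed at bit
    position n of the word (an assumed layout, see lead-in).
    """
    if not bitstreams:
        return []
    length = len(bitstreams[0])
    if any(len(bits) != length for bits in bitstreams):
        raise ValueError("all bitstreams must have the same length")
    words = []
    for i in range(length):
        word = 0
        for fpga, bits in enumerate(bitstreams):
            word |= (bits[i] & 1) << fpga
        words.append(word)
    return words
```

Under this assumed layout, two FPGAs with bitstreams [1, 0, 1] and [0, 1, 1] would be configured by the three words 1, 2, and 3.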

[0599]
Table K below lists the XSFR_EVAL register. The XSFR_EVAL register is present on all boards. The XSFR_EVAL register is used by the host computing system to program the EVAL period, control DMA reads / writes, and read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. The operation of the simulation system for this register is described below with reference to FIGS. 62 and 63.
[0600]
[Table 13]

[0601]
Table L below lists the contents of the CONFIG_JTAG [6: 1] register. The CPU configures the FPGA logical device, and executes a boundary scan test on the FPGA logical device via this register. Each board has one dedicated register.
[0602]
[Table 14]

[0603]
FIGS. 62 and 63 show timing diagrams for another embodiment of the present invention. These two figures show the operation of the simulation system for the XSFR_EVAL register. The XSFR_EVAL register is used by the host computing system to program the EVAL period, control DMA reads / writes, and read the status of the EVAL_DONE and XSFR_DONE fields. The host computing system also uses this register to enable memory access. One of the main differences between the two figures is the status of the WAIT_EVAL field. When the WAIT_EVAL field is set to “0” (in the case of FIG. 62), the DMA read transfer starts after CLK_EN. When the WAIT_EVAL field is set to “1” (in the case of FIG. 63), the DMA read transfer starts after EVAL_DONE.
[0604]
In FIG. 62, both WR_XSFR_EN and RD_XSFR_EN are set to “1”. These two fields enable DMA write / read transfers and are cleared by XSFR_DONE. Since the two fields are set to “1”, the CTRL_FPGA unit automatically performs a DMA write transfer first and then performs a DMA read transfer. However, the WAIT_EVAL field is set to “0” and the DMA read transfer starts after the assertion of CLK_EN (and after the completion of the DMA write operation). Therefore, in FIG. 62, the DMA read operation occurs almost immediately after the completion of the DMA write operation as soon as the CLK_EN signal (software clock) is detected. The DMA read transfer does not wait for the EVAL period to complete.
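The effect of the WR_XSFR_EN, RD_XSFR_EN, and WAIT_EVAL fields on the ordering of DMA transfers can be sketched as a small decision function. This Python example is illustrative only; the function name dma_transfer_plan and the phase strings are assumptions, and the sketch models only the ordering rules stated in the text, not the register layout of Table K.

```python
def dma_transfer_plan(wr_xsfr_en, rd_xsfr_en, wait_eval):
    """Return the ordered DMA phases implied by the XSFR_EVAL fields.

    A DMA write, if enabled, is always performed first. A DMA read, if
    enabled, is gated by CLK_EN when WAIT_EVAL = 0 and by EVAL_DONE
    when WAIT_EVAL = 1.
    """
    phases = []
    if wr_xsfr_en:
        phases.append("DMA write")
    if rd_xsfr_en:
        gate = "EVAL_DONE" if wait_eval else "CLK_EN"
        phases.append(f"DMA read after {gate}")
    return phases
```

With both enable fields set, WAIT_EVAL = 0 gives the FIG. 62 ordering (write, then read after CLK_EN), and WAIT_EVAL = 1 gives the FIG. 63 ordering (write, then read after EVAL_DONE).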
[0605]
At the start of the timing diagram, the EVAL_REQ_N signal is subject to contention when multiple FPGA logic devices compete for attention. As described above, the EVAL_REQ_N (or EVAL_REQ #) signal is used to initiate an evaluation cycle when any of the FPGA logic devices asserts this signal. At the end of the data transfer, an evaluation cycle is initiated that includes software pointer manipulation to facilitate address pointer initialization and evaluation processing.
[0606]
The DONE signal is also generated at the end of the DMA data transfer period and is subject to contention when multiple LAST signals (from the shiftin and shiftout signals at the output of each FPGA logic device) are generated and provided to the CTRL_FPGA unit. If all LAST signals are received and processed, a DONE signal is generated and a DMA data transfer operation can begin. The EVAL_REQ_N signal and the DONE signal use the same wire in a time division manner in the manner described below.
[0607]
The system first automatically initiates a DMA write transfer as indicated by the WR_XSFR signal at time 1409. In one embodiment, the initial portion of the WR_XSFR signal includes a predetermined overhead associated with the PCI controller, PCI 9080 or 9060. The host computing system then performs a DMA write operation, via the local bus LD [31: 0] and FPGA bus FD [63: 0], on the FPGA logic device coupled to the FPGA bus FD [63: 0].
[0608]
At time 1412, the WR_XSFR signal is deactivated, indicating the completion of the DMA write operation. The EVAL signal is activated for a predetermined time between times 1412 and 1410. The duration of EVALTIME is programmable and is initialized to 8 + X, where X is obtained from the longest signal trace path. The XSFR_DONE signal is also activated for a short time, indicating the completion of this DMA transfer operation (current operation is DMA write).
[0609]
Also, at time 1412, contention between EVAL_REQ_N signals stops, but the wire carrying the DONE signal now delivers the EVAL_REQ_N signal to the CTRL_FPGA unit. During three clock cycles, the EVAL_REQ_N signal is processed over the wire carrying the DONE signal. After 3 clock cycles, the EVAL_REQ_N signal is no longer generated by the FPGA logic device, but the EVAL_REQ_N signal previously delivered to the CTRL_FPGA unit can be processed. The maximum time that the EVAL_REQ_N signal is no longer generated by the FPGA logic device for the gated clock is approximately 23 clock cycles. EVAL_REQ_N signals longer than this period can be ignored.
[0610]
At time 1413, approximately two clock cycles after time 1412 (at the end of the DMA write operation), the CTRL_FPGA unit sends a write address strobe WPLX ADS_N signal to the PCI controller (eg, PLX PCI9080) to initiate a DMA read transfer. At approximately 24 clock cycles from time 1413, the PCI controller can begin the DMA read transfer process and a DONE signal is also generated. At time 1414, before the start of the DMA read process by the PCI controller, the RD_XSFR signal is activated to enable the DMA read transfer. Predetermined PLX overhead data is first transmitted and processed. At time 1415, while this overhead data is processed, the DMA read data is input to FPGA bus FD [63: 0] and local bus LD [31: 0]. At the end of the 24 clock cycles from time 1413, when the DONE signal is activated and the FPGA logic devices generate the EVAL_REQ_N signal, the PCI controller processes the DMA read data by transmitting it from the FPGA bus FD [63: 0] and local bus LD [31: 0] to the host computer system.
[0611]
At time 1410, the DMA read data may continue to be processed while the EVAL signal is deactivated and the EVAL_DONE signal is activated to indicate the completion of the EVAL cycle. Contention between FPGA logic devices also begins when generating the EVAL_REQ_N signal.
[0612]
At time 1417, just prior to completion of the DMA read period at time 1416, the host computer system polls the PLX interrupt register to determine if the end of the DMA cycle is near. The PCI controller identifies how many cycles are required to complete the DMA data transfer process. After a predetermined number of cycles, the PCI controller sets a specific bit in its interrupt register. The CPU in the host computer system polls this interrupt register in the PCI controller. If the bit is set, the CPU identifies that the DMA period is almost over. The CPU in the host system does not poll the interrupt register continuously, because polling occupies the PCI bus with read cycles. Thus, in one embodiment of the present invention, the CPU in the host computer system waits for a predetermined number of cycles before polling the interrupt register. Shortly thereafter, at time 1416, RD_XSFR is deactivated, the DMA read period ends, and DMA read data is no longer on the FPGA bus FD [63: 0] or on the local bus LD [31: 0]. Further, at time 1416, the XSFR_DONE signal is activated, and contention between the LAST signals for generating the DONE signal begins.
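The polling strategy described here, waiting a predetermined number of cycles before reading the interrupt register so that the PCI bus is not tied up, can be sketched in software. The Python example below is a simulation under stated assumptions: FakeInterruptRegister is a hypothetical stand-in for the PCI controller's interrupt register, cycles are modeled as plain integers, and poll_after_delay is an illustrative name rather than a routine from the specification.

```python
class FakeInterruptRegister:
    """Hypothetical stand-in for the PCI controller's interrupt register."""

    def __init__(self, set_at_cycle):
        self.set_at_cycle = set_at_cycle

    def read(self, cycle):
        # The "almost done" bit reads as 1 from set_at_cycle onward.
        return 1 if cycle >= self.set_at_cycle else 0


def poll_after_delay(register, wait_cycles, max_polls):
    """Wait wait_cycles, then poll once per cycle until the bit is set.

    Returns (cycle_at_which_bit_was_seen, number_of_polls).
    """
    cycle = wait_cycles            # no bus traffic during the initial wait
    for polls in range(1, max_polls + 1):
        if register.read(cycle):
            return cycle, polls
        cycle += 1
    raise TimeoutError("interrupt bit was not set within max_polls reads")
```

If the bit is set at cycle 100 and the CPU waits 98 cycles first, only three reads are issued instead of one hundred and one.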
[0613]
Throughout the DMA period from generation of the WR_XSFR signal at time 1409 to time 1417, the CPU in the host computer system does not access the simulation hardware system. In one embodiment, this period is the sum of (1) the overhead period of the PCI controller multiplied by 2, (2) the number of words in WR_XSFR and RD_XSFR, and (3) the PCI overhead of the host computer system (eg, Sun ULTRASPARC). The first access after the DMA period occurs at time 1419 when the CPU polls the interrupt register in the PCI controller.
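The three-term sum for the DMA period can be written out as simple arithmetic. The Python sketch below assumes that every term is expressed in clock cycles, that each transferred word costs one cycle, and that the first term is twice the PCI controller overhead; these are interpretive assumptions, and the function name dma_period_cycles and the sample numbers are hypothetical.

```python
def dma_period_cycles(pci_controller_overhead, wr_words, rd_words,
                      host_pci_overhead):
    """Estimate the DMA period in clock cycles.

    (1) twice the PCI controller overhead,
    (2) the number of words in WR_XSFR and RD_XSFR (one cycle per word
        is an assumption),
    (3) the PCI overhead of the host computer system.
    """
    return (2 * pci_controller_overhead
            + wr_words + rd_words
            + host_pci_overhead)
```

With made-up values of 10 cycles of controller overhead, 16 write words, 8 read words, and 5 cycles of host overhead, the estimate is 2 × 10 + 16 + 8 + 5 = 49 cycles.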
[0614]
After about 14 clock cycles from time 1411, ie, time 1416, the on-board SRAM memory device is enabled by activating the MEM_EN signal, so that memory access between the FPGA logic device and the SRAM memory device can begin. Memory accesses continue until time 1419, and in one embodiment, require 5 clock cycles per access. If a DMA read transfer is not required, memory access may begin at an earlier time 1410 rather than at time 1411.
[0615]
While memory access occurs between the FPGA logic device and the SRAM memory device via the FPGA bus FD [63: 0], the CPU in the host computer system can communicate with the PCI controller and the CTRL_FPGA unit via the local bus LD [31: 0] from time 1418 to time 1429. This occurs after the CPU completes polling the PCI controller interrupt register. The CPU writes data to various registers in preparation for the next data transfer. This period is longer than 4 μsec. When the memory access is shorter than this period, no conflict occurs on the FPGA bus FD [63: 0]. At time 1429, the XSFR_DONE signal is deactivated.
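Whether a given number of memory accesses fits within the CPU's register-setup window can be checked with a short calculation. The Python sketch below uses the 5 clock cycles per access and the 4 μsec figure from the text; the clock frequency is not stated in this passage, so clock_mhz is a hypothetical parameter, as are the function name and the sample values.

```python
def memory_access_fits(num_accesses, clock_mhz,
                       cycles_per_access=5, window_us=4.0):
    """Return (duration_us, fits) for a run of SRAM memory accesses.

    duration_us is num_accesses * cycles_per_access clock cycles
    converted to microseconds; fits is True when that duration is
    shorter than the CPU register-setup window.
    """
    if clock_mhz <= 0:
        raise ValueError("clock_mhz must be positive")
    duration_us = num_accesses * cycles_per_access / clock_mhz
    return duration_us, duration_us < window_us
```

For example, at an assumed 33 MHz clock, 16 accesses take about 2.4 μsec and fit within the window, whereas 40 accesses take about 6.1 μsec and do not.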
[0616]
The timing diagram of FIG. 63 is somewhat different from that of FIG. 62, because in FIG. 63 the WAIT_EVAL field is set to “1”. In other words, the DMA read transfer period starts not after the completion of the DMA write operation but after waiting until the EVAL period is almost complete. The EVAL signal is activated during a predetermined period from time 1412 to time 1410. At time 1410, the EVAL_DONE signal is activated to indicate the completion of the EVAL period.
[0617]
In FIG. 63, the CTRL_FPGA unit does not generate the write address strobe signal WPLX ADS_N to the PCI controller until the time 1420 after the DMA write operation ends at the time 1412. Time 1420 is approximately 16 clock cycles before the end of the EVAL period. The XSFR_DONE signal is further extended to time 1423. At time 1423, the XSFR_DONE field is set and a WPLX ADS_N signal may be generated to initiate the DMA read process.
[0618]
At time 1420, approximately 16 clock cycles before the activation of the EVAL_DONE signal, the CTRL_FPGA unit sends a write address strobe WPLX ADS_N signal to the PCI controller (eg, PLX PCI 9080) to initiate a DMA read transfer. Approximately 24 clock cycles after time 1420, the PCI controller initiates the DMA read transfer process and also generates a DONE signal. Prior to the start of the DMA read process by the PCI controller, at time 1421, the RD_XSFR signal is activated, thereby enabling DMA read transfers. Some PLX overhead data is first transmitted and processed. At time 1422, while this overhead data is being processed, the DMA read data is on the FPGA bus FD [63: 0] and the local bus LD [31: 0]. At time 1424, when the 24 clock cycles end, the PCI controller processes the DMA read data by transmitting it from the FPGA bus FD [63: 0] and local bus LD [31: 0] to the host computer system. The rest of the timing diagram is equivalent to FIG. 62.
[0619]
Thus, in FIG. 63, the RD_XSFR signal is activated later than in FIG. 62. In FIG. 63, the RD_XSFR signal is activated after the EVAL period is almost completed, thereby delaying the DMA read operation. In FIG. 62, the RD_XSFR signal is activated once the CLK_EN signal is detected following completion of the DMA write transfer.
[0620]
(IX. Co-verification system)
The co-verification system of the present invention may accelerate the design / development cycle by providing the designer with the flexibility of software simulation and the higher speed that results from using hardware models. Both the hardware and software portions of the design can be verified prior to the creation of the ASIC and without the limitations of emulator-based co-verification tools. The debugging function can be improved and the overall debugging time can be significantly reduced.
[0621]
(Conventional co-verification tool using ASIC as test device)
FIG. 64 shows a typical final design embodied as a PCI add-on card, such as a video, multimedia, Ethernet, or SCSI card. The card 2000 includes a direct interface connector 2002 that allows communication with other peripheral devices. Connector 2002 is connected to bus 2001 and carries video signals from a VCR, camera or TV tuner, video and audio output to a monitor or speaker, and signals to a communication or disk drive interface. Depending on the user design, one skilled in the art can anticipate requirements for other interfaces. Many of the functions of the design are in chip 2004, which is connected to interface connector 2002 via bus 2003, to local oscillator 2005 (which generates local clock signals) via bus 2007, and to memory 2006 via bus 2008. The add-on card 2000 further includes a PCI connector 2009 and is connected to the PCI bus 2010.
[0622]
Prior to implementing this design as the add-on card shown in FIG. 64, the design is reduced to ASIC form for testing. A conventional hardware/software co-verification tool is shown in FIG. 65, in which the user design is embodied in the form of an ASIC, shown as a test device (or "DUT") 2024. The test device 2024 is placed in the target system 2020 to obtain stimuli from the various sources with which the ASIC is designed to interface. The target system 2020 is a combination of a central processing system 2021 on the motherboard and several peripheral devices. The central processing system 2021 includes a CPU and a memory. The target system 2020 runs under an operating system, such as Microsoft Windows (registered trademark) or Sun Microsystems Solaris, to execute multiple applications. As known to those skilled in the art, Sun Microsystems Solaris is a set of operating environments and software products that support Internet, intranet, and enterprise computing. The Solaris operating environment is based on the industry standard UNIX (R) System V Release 4, is designed for client-server applications in a distributed networking environment, provides adequate resources for relatively small workgroups, and provides the WebTone required for electronic commerce.
[0623]
A device driver 2022 for the test device 2024 is included in the central processing system 2021 and enables communication between the operating system (and any application) and the test device 2024. As known to those skilled in the art, a device driver is specific software that controls a hardware component or peripheral device of a computer system. The device driver is responsible for accessing the hardware registers of the device and often includes an interrupt handler to service interrupts generated by the device. Device drivers often form the lowest-level part of the operating system kernel, with which they are linked when the kernel is built. Some more recent systems have loadable device drivers that can be installed from a file after the operating system is running.
[0624]
The test device 2024 and the central processing system 2021 are connected to the PCI bus 2023. Other peripheral devices in the target system 2020 include an Ethernet PCI add-on card 2025 used to connect the target system to the network 2030 via the bus 2034, a SCSI PCI add-on card 2026 connected to SCSI drives 2027 and 2031 via the buses 2036 and 2035, a VCR 2028 connected to the test device 2024 via the bus 2032 (if necessary for the design of the test device 2024), and a monitor and/or speaker 2029 connected to the test device 2024 via the bus 2033 (if necessary for the design of the test device 2024). As is known to those skilled in the art, "SCSI" is an abbreviation for Small Computer Systems Interface, a processor-independent standard for system-level interfaces between computers and intelligent devices including hard disks, floppy disks, CD-ROMs, printers, and scanners.
[0625]
In this target system environment, the test device 2024 can be tested using various stimuli from the central processing system (i.e., operating system, applications) and the peripheral devices. If time is not an issue and the designer simply wants a pass/fail test, this co-verification tool should be adequate to meet that need. In most cases, however, design projects are severely limited in terms of budget and schedule for release as a product. As mentioned above, this particular ASIC-based co-verification tool is not satisfactory because it has no debugging function. (The designer cannot identify the cause of a "failed" test without advanced techniques, and the number of "fixes" for each detected bug cannot be predicted at the start of the project, so the schedule and budget become unpredictable.)
(Conventional co-verification tool using an emulator as a test device)
FIG. 66 shows a conventional co-verification tool using an emulator. Unlike the setup shown in FIG. 64 and described above, the test device is programmed in an emulator 2048 connected to the target system 2040, several peripheral devices, and a test workstation 2052. Emulator 2048 includes an emulation clock 2066 and the test device programmed within the emulator.
[0626]
The emulator 2048 is connected to the target system 2040 via the PCI bus bridge 2044, the PCI bus 2057, and the control line 2056. The target system 2040 is a combination of a central processing system 2041 on the motherboard and several peripheral devices. The central processing system 2041 includes a CPU and memory. Target system 2040 runs under an operating system, such as Microsoft Windows® or Sun Microsystems Solaris, to run multiple applications. A device driver 2042 for the test device is included in the central processing system 2041 to enable communication between the operating system (and any application) and the test device in the emulator 2048. Central processing system 2041 is connected to PCI bus 2043 for communication with emulator 2048 and other devices that are part of this computing environment. Other peripheral devices in target system 2040 include an Ethernet PCI add-on card 2045 used to connect the target system to network 2049 via bus 2058, and a SCSI PCI add-on card 2046 connected to SCSI drives 2047 and 2050 via buses 2060 and 2059.
[0627]
The emulator 2048 is further connected to the test workstation 2052 via the bus 2062. Test workstation 2052 includes a CPU and memory to perform its functions. Test workstation 2052 may further include a test case 2061 and a device model 2068 for other devices that are modeled but not physically connected to emulator 2048.
[0628]
Finally, emulator 2048 is connected via bus 2061 to several other peripheral devices such as a frame buffer or data stream recording/playback system 2051. This frame buffer or data stream recording/playback system 2051 can further be connected to a communication device or channel 2053 via bus 2063, to VCR 2054 via bus 2064, and to monitor and/or speaker 2055 via bus 2065.
[0629]
As is known to those skilled in the art, the emulation clock operates at a speed much slower than that of the actual target system. Therefore, the filled portion in FIG. 66 is executed at the emulation speed, and the other unfilled portions are executed at the actual target system speed.
[0630]
As described above, the co-verification tool using this emulator has some limitations. When using a logic analyzer or sample-and-hold device to obtain the internal state of the test device, the designer must compile the design so that the relevant signals to be examined for debugging purposes are present on the sampling output pins. If the designer wants to debug another part of the design, the designer must make sure that the part has an output signal that can be sampled by a logic analyzer or sample-and-hold device. Otherwise, the designer must recompile the design in emulator 2048 so that these signals can be provided on output pins for sampling purposes. Such recompilation can take days or weeks, which can be too long a delay for a time-critical design/development schedule. Furthermore, since this co-verification tool uses signals, advanced circuitry must be provided to convert these signals to data or to provide some signal-to-signal timing control. Further, the need to use as many wires 2061 and 2062 as necessary for each signal that it is desired to sample increases the burden and time during debug setup.
[0631]
(Simulation with reconfigurable computing array)
A high level configuration of the inventive single engine reconfigurable computing (RCC) array system described herein above is shown in FIG. 67 for a brief review. In one embodiment of the present invention, this single engine RCC system is incorporated into a co-verification system.
[0632]
In FIG. 67, the RCC array system 2080 includes an RCC computing system 2081, a reconfigurable computing (RCC) hardware array 2084, and a PCI bus 2089 connecting them. Importantly, the RCC computing system 2081 includes the model of the entire user design in software, and the RCC hardware array 2084 includes the hardware model of the user design. The RCC computing system 2081 includes a CPU, memory, operating system, and software required to execute the single engine RCC system 2080. A software clock 2082 is provided to enable tight control of the software model in the RCC computing system 2081 and the hardware model in the RCC hardware array 2084. Test bench data 2083 is further stored in the RCC computing system 2081.
[0633]
The RCC hardware array system 2084 includes a PCI interface 2085, an RCC hardware array board set 2086, and various buses for the interface. The RCC hardware array board set 2086 includes at least a portion of the user design modeled in hardware (i.e., hardware model 2087) and a memory 2088 for test bench data. In one embodiment, various portions of this hardware model are distributed among multiple reconfigurable logic elements (e.g., FPGA chips) during configuration time. As more reconfigurable logic elements or chips are used, more boards may be required. In one embodiment, four reconfigurable logic elements are provided on a single board. In other embodiments, eight reconfigurable logic elements are provided on a single board. The capacity and performance of a reconfigurable logic element in a board with 4 chips can be significantly different from the capacity and performance of a reconfigurable logic element in a board with 8 chips.
[0634]
Bus 2090 provides various clocks for the hardware model from PCI interface 2085 to hardware model 2087. The bus 2091 provides other I/O data between the PCI interface 2085 and the hardware model 2087 via the connector 2093 and the internal bus 2094. The bus 2092 functions as a PCI bus between the PCI interface 2085 and the hardware model 2087. In addition, test bench data may be stored in memory within the hardware model 2087. As described above, the hardware model 2087 includes, in addition to the hardware model of the user design itself, the structures and functions necessary to allow the hardware model to interface with the RCC computing system 2081.
[0635]
The RCC system 2080 may be provided as a single workstation or connected to a workstation network. In the latter case, each workstation is provided access to the RCC system 2080 on a time division basis. In fact, the RCC array system 2080 acts as a simulation server with a simulation scheduler and a state swapping mechanism. The server allows each user of the workstation to access the RCC hardware array 2084 for the purposes of faster acceleration and hardware state swapping. After acceleration and state swapping, each user can release control of the RCC hardware array 2084 to other users of other workstations while simulating the user design locally in software. This network model is also used in the co-verification system described below.
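For illustration only, time-division access to a shared hardware array can be sketched as follows. This is a minimal sketch that assumes a round-robin policy with a fixed time slice; the specification does not state the scheduling policy, and the names `schedule`, `requests`, and `slice_cycles` are hypothetical.

```python
from collections import deque


def schedule(requests, slice_cycles):
    """Share one hardware array among users in fixed time slices.

    requests: list of (user, cycles_needed) pairs (hypothetical format).
    slice_cycles: maximum cycles a user may hold the array per turn (assumed).
    Returns the order in which users hold the array as (user, cycles_run).
    """
    queue = deque(requests)
    timeline = []
    while queue:
        user, remaining = queue.popleft()
        run = min(slice_cycles, remaining)
        timeline.append((user, run))
        if remaining > run:
            # The user's hardware state would be swapped out here, and the
            # user goes to the back of the queue for another turn.
            queue.append((user, remaining - run))
    return timeline
```

In this sketch, a user whose request exceeds one slice releases the array to the next user and resumes later, which mirrors the release-and-resume behavior described above.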
[0636]
The RCC array system 2080 gives the designer the power and flexibility to simulate the entire design, to accelerate some of the test points through the hardware model within the reconfigurable computing array during selected cycles, and to acquire internal state information from virtually any part of the design at any time. Indeed, a single engine reconfigurable computing array (RCC) system can generally be referred to as a hardware accelerated simulator and can be used to perform the following tasks in a single debug session: (1) simulation only, (2) simulation using hardware acceleration that allows the user to start, stop, assert values, and investigate internal states at any time, (3) analysis after simulation, and (4) in-circuit emulation. Since both the software model and the hardware model are under strict control of a single engine via a software clock, the hardware model in the reconfigurable computing array is closely connected to the software simulation model. This allows the designer to debug every cycle and to accelerate and decelerate the hardware model through multiple cycles to obtain valuable internal state information. Furthermore, since this simulation system handles data rather than signals, complex signal/data conversion/timing circuits are not required. Further, unlike typical emulation systems, it is not necessary to recompile the hardware model in the reconfigurable computing array if the designer wishes to examine a different set of nodes. See above for further details.
[0637]
(Co-verification system without external I/O)
One embodiment of the present invention is a co-verification system that does not use an actual physical external I/O device and target application. Thus, the co-verification system according to one embodiment of the present invention may incorporate the RCC system, along with other functions, to debug the software and hardware portions of the user design without using the actual target system or I/O devices. The target system and external I/O devices are instead modeled in software within the RCC computing system.
[0638]
Referring to FIG. 68, the co-verification system 2100 includes an RCC computing system 2101, an RCC hardware array 2108, and a PCI bus 2114 connecting them. Importantly, the RCC computing system 2101 includes the model of the entire user design in software, and the reconfigurable computing array 2108 includes the hardware model of the user design. The RCC computing system 2101 includes a CPU, a memory, an operating system, and software necessary for executing the single engine co-verification system 2100. A software clock 2104 is provided to enable tight control of the software model in the RCC computing system 2101 and the hardware model in the reconfigurable computing array 2108. A test case 2103 is further stored in the RCC computing system 2101.
[0639]
According to one embodiment of the present invention, the RCC computing system 2101 further includes a target application 2102, a driver 2105 for the hardware model of the user design, a model of a device (e.g., a video card) and its driver in software (shown at 2106), and a model of another device (e.g., a monitor) and its driver, also in software (shown at 2107). In essence, the RCC computing system 2101 includes as many device models and drivers as are necessary to convey to the software and hardware models of the user design that an actual target system and other I/O devices are part of this computing environment.
[0640]
The RCC hardware array 2108 includes a PCI interface 2109, an RCC hardware array board set 2110, and various buses for the interface. The RCC hardware array board set 2110 includes at least a part of the user design modeled in the hardware 2112 and a test bench data memory 2113. As described above, each board includes a plurality of reconfigurable logic elements or chips.
[0641]
The bus 2115 provides various clocks for the hardware model from the PCI interface 2109 to the hardware model 2112. The bus 2116 provides other I/O data between the PCI interface 2109 and the hardware model 2112 via the connector 2111 and the internal bus 2118. The bus 2117 functions as a PCI bus between the PCI interface 2109 and the hardware model 2112. In addition, test bench data can be stored in the memory 2113. As described above, the hardware model includes, in addition to the hardware model of the user design itself, the structures and functions necessary to enable the hardware model to interface with the RCC computing system 2101.
[0642]
To compare the co-verification system of FIG. 68 with a conventional emulator-based co-verification system, recall that FIG. 66 shows an emulator 2048 connected to the target system 2040, some I/O devices (e.g., a frame buffer or data stream recording/playback system 2051), and a workstation 2052. This emulator configuration presents many problems and configuration issues to the designer. The emulator requires a logic analyzer or sample-and-hold device to measure the internal state of the user design that is modeled within the emulator. Since logic analyzers and sample-and-hold devices require signals, complex signal-to-data conversion circuits are required. Furthermore, complex signal-to-signal timing control circuits are required. The many wires required for each signal used to measure the internal state of the emulator add a further burden on the user during setup. During a debugging session, the user must recompile the emulator each time he wants to examine a different set of internal logic, so that the appropriate signals from that logic are provided as outputs for measurement and recording by a logic analyzer or sample-and-hold device. The long time taken for recompilation is too costly.
[0643]
In the co-verification system of the present invention where no external I/O devices are connected, the target system and other I/O devices are modeled in software, so that the actual physical target system and I/O devices are no longer needed. Since the RCC computing system 2101 processes data, neither a complex signal-to-data conversion circuit nor a signal-to-signal timing control system is required. The number of wires is also independent of the number of signals, so the setup is relatively simple. Furthermore, no recompilation is required to debug different parts of the logic circuit within the hardware model of the user design, because the co-verification system processes data, not signals. Since the RCC computing system controls the RCC hardware array using a software-controlled clock (i.e., software clock and clock edge detection circuit), the hardware model can be easily started and stopped. Reading from the hardware model is also easy, because the model of the entire user design is in software and the software clock enables synchronization. Thus, users can debug with software simulation only, accelerate some or all of the design in hardware, step through various desired test points on a cycle-by-cycle basis, and maintain the internal state of the software and hardware models (i.e., registers and combinational logic states). For example, the user can simulate the design with some test bench data, download internal state information to the hardware model, accelerate the design with various test bench data in the hardware model, examine the resulting internal state values of the hardware model by register/combinational logic regeneration, and load values from the hardware model into the software model. Finally, the user can use the results of the hardware model accelerated process to simulate other parts of the user design in software.
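For illustration only, the example debug flow in the preceding paragraph (simulate, download state, accelerate, load values back) can be sketched in Python. This is a toy sketch under stated assumptions: the `Model` class, the `acc` register, and `run_session` are hypothetical names that do not appear in this specification, and the "design" is a single accumulator rather than a real user design.

```python
from dataclasses import dataclass, field


@dataclass
class Model:
    """Register values for one model of the user design (hypothetical)."""
    registers: dict = field(default_factory=dict)

    def step(self, stimulus: int) -> None:
        # Toy design: one accumulator register updated once per cycle.
        self.registers["acc"] = self.registers.get("acc", 0) + stimulus


def run_session(sw_stimuli, hw_stimuli):
    software = Model()
    hardware = Model()
    # 1. Simulate in software with some test bench data.
    for s in sw_stimuli:
        software.step(s)
    # 2. Download the internal state to the hardware model.
    hardware.registers = dict(software.registers)
    # 3. Accelerate further test bench data in the hardware model.
    for s in hw_stimuli:
        hardware.step(s)
    # 4. Load the resulting values back into the software model.
    software.registers = dict(hardware.registers)
    return software.registers
```

Because both models hold the same register set, state can be copied in either direction at a cycle boundary, which is the property the single software clock is described as providing.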
[0644]
However, as mentioned above, a workstation is still needed for debug session control. In a network configuration, the workstation can be remotely connected to the co-verification system for remote access to debug data. In a non-network configuration, the workstation can be locally connected to the co-verification system. In some embodiments, the workstation may incorporate a co-verification system so that debug data can be accessed locally.
[0645]
(Co-verification system using external I/O)
In FIG. 68, various I/O devices and target applications have been modeled within the RCC computing system 2101. However, if too many I/O device models and target applications run in the RCC computing system 2101, the overall speed is reduced. If there is only one CPU in the RCC computing system 2101, a longer time is required to process the various data from all the device models and target applications. To increase data throughput, actual I/O devices and target applications may be physically connected to the co-verification system (instead of software models of these I/O devices and target applications).
[0646]
One embodiment of the present invention is a co-verification system that can use actual physical external I/O devices and target applications. Thus, the co-verification system may incorporate the RCC system with other functions to debug the software and hardware portions of the user design while using the actual target system and/or I/O devices. For testing, the co-verification system may use test bench data from software and stimuli from external interfaces (e.g., target systems and external I/O devices). The test bench data can be used to provide test data to the pin outputs of the user design and to internal nodes within the user design. The actual I/O signals from an external I/O device (or target system) can be directed only to the pin outputs of the user design. Thus, one major difference between test data from an external interface (e.g., target system or external I/O device) and the test bench process in software is that test bench data can be used to test the user design with stimuli applied to both pin outputs and internal nodes, whereas actual data from the target system or external I/O device can be applied to the user design only via its pin outputs (or the nodes in the user design that represent the pin outputs). The structure of the co-verification system and its configuration with respect to the target system and external I/O devices are described below.
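For illustration only, the distinction drawn above between test bench stimuli and external stimuli can be expressed as a small routing check. The node names and the `apply_stimulus` function are hypothetical and are not taken from this specification.

```python
PIN_OUTPUTS = {"pci_ad", "video_in"}      # hypothetical pin names
INTERNAL_NODES = {"fifo_count", "state"}  # hypothetical internal nodes


def apply_stimulus(design: dict, source: str, node: str, value: int) -> None:
    """Drive a node of the design, enforcing the pin-only rule for external data.

    source: "test_bench" (may drive pins and internal nodes) or
            "external" (target system or I/O device; may drive pins only).
    """
    if source not in ("test_bench", "external"):
        raise ValueError(f"unknown stimulus source: {source}")
    if node not in PIN_OUTPUTS | INTERNAL_NODES:
        raise KeyError(f"unknown node: {node}")
    if source == "external" and node not in PIN_OUTPUTS:
        raise PermissionError("external stimuli may drive pin outputs only")
    design[node] = value
```

A test bench call such as `apply_stimulus(d, "test_bench", "state", 3)` succeeds, while the same call with `"external"` as the source is rejected.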
[0647]
The co-verification system according to one embodiment of the present invention differs from the system configuration of FIG. 66 in the structure and function of the elements within the broken line 2070. In other words, while FIG. 66 shows an emulator and workstation within the broken line 2070, one embodiment of the present invention instead includes within that line the co-verification system 2140 (and associated workstation) shown in FIG. 69.
[0648]
Referring to FIG. 69, a co-verification system configuration according to an embodiment of the present invention includes a target system 2120, a co-verification system 2140, some optional I/O devices, and control/data buses 2131 and 2132 connecting them to each other. The target system 2120 includes a central processing system 2121, which includes a CPU and memory. Target system 2120 runs under an operating system, such as Microsoft Windows® or Sun Microsystems Solaris, to run multiple applications 2122 and test cases 2123. A device driver 2124 for the hardware model of the user design is included within central processing system 2121 to enable communication between the operating system (and any application) and the user design. Central processing system 2121 is connected to PCI bus 2129 for communication with the co-verification system and other devices that are part of this computing environment. Other peripheral devices in the target system 2120 include an Ethernet PCI add-on card 2125 used to connect the target system to the network, a SCSI PCI add-on card 2126 connected to the SCSI drive 2128 via the bus 2130, and a PCI bus bridge 2127.
[0649]
The co-verification system 2140 includes an RCC computing system 2141, an RCC hardware array 2190, an external interface 2139 in the form of an external I/O expansion unit, and a PCI bus 2171 that connects the RCC computing system 2141 and the RCC hardware array 2190. The RCC computing system 2141 includes a CPU, a memory, an operating system, and software necessary to execute the single engine co-verification system 2140. Importantly, the RCC computing system 2141 includes the entire user design in software, and the RCC hardware array 2190 includes a hardware model of the user design.
[0650]
As described above, the single engine of the co-verification system derives its power and flexibility from the main software kernel, which resides in the main memory of the RCC computing system 2141 and controls the overall operation and execution of the co-verification system 2140. As long as the test bench process is active and external signals are presented to the co-verification system, the kernel evaluates the active test bench components, evaluates the clock components, detects clock edges in order to update registers and propagate combinational logic data, and advances the simulation time. Thanks to this main software kernel, the RCC computing system 2141 and the RCC hardware array 2190 are closely connected.
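For illustration only, the kernel activities listed above (evaluate inputs, detect clock edges, update registers, propagate data, advance time) can be sketched as a loop over simulation time steps. This is a toy sketch, not the kernel of the specification: it assumes a single D-type register clocked on rising edges, and `run_kernel` and its parameters are hypothetical names.

```python
def run_kernel(clock_waveform, d_inputs):
    """Toy kernel loop for one register clocked by a software clock.

    clock_waveform: clock level (0 or 1) at each simulation time step.
    d_inputs: value on the register's data input at each time step.
    Returns the register value after every time step.
    """
    q = 0         # register state
    prev_clk = 0  # clock level seen at the previous time step
    trace = []
    for clk, d in zip(clock_waveform, d_inputs):
        # Evaluate test bench and clock components (here: read the inputs).
        rising_edge = prev_clk == 0 and clk == 1  # detect the clock edge
        if rising_edge:
            q = d                                 # update the register
        trace.append(q)                           # propagate and record output
        prev_clk = clk                            # advance simulation time
    return trace
```

The register changes only at time steps where an edge is detected, so a data input that changes between edges has no effect on the recorded output.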
[0651]
The software kernel generates a software clock signal from a software clock source 2142, and this signal is provided to the RCC hardware array 2190 and to the outside world. The clock source 2142 may generate multiple clocks of different frequencies depending on the destination of these software clocks. In general, the software clock ensures that registers in the hardware model of the user design evaluate in synchronization with the system clock and without hold time violations. The software model may detect clock edges in software that affect hardware model register values. Therefore, the clock detection mechanism ensures that clock edge detection in the main software model can be translated into clock detection in the hardware model. For a more detailed description of the software clock and clock edge detection logic, refer to FIGS. 17-19 and corresponding portions of this specification.
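For illustration only, a clock source that produces several clocks of different frequencies, together with software edge detection, can be sketched as follows. The square-wave shape, the even-period restriction, and the names `generate_clocks` and `rising_edges` are assumptions made for this sketch and are not taken from FIGS. 17-19.

```python
def generate_clocks(periods, steps):
    """Return one square wave per clock name.

    periods: mapping of clock name -> period in time steps (assumed even).
    steps: number of time steps to generate.
    """
    waves = {}
    for name, period in periods.items():
        if period < 2 or period % 2:
            raise ValueError("period must be an even number of at least 2")
        half = period // 2
        # Low for the first half of each period, high for the second half.
        waves[name] = [1 if (t % period) >= half else 0 for t in range(steps)]
    return waves


def rising_edges(wave):
    """Return the time steps at which the wave goes from 0 to 1."""
    return [t for t in range(1, len(wave)) if wave[t - 1] == 0 and wave[t] == 1]
```

Each detected edge time marks a point at which registers driven by that clock would be evaluated.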
[0652]
According to one embodiment of the invention, the RCC computing system 2141 may further include one or more models of I/O devices, even though other actual physical I/O devices can be connected to the co-verification system. For example, the RCC computing system 2141 may include a model of a device (e.g., a speaker) and its driver and test bench data in software (shown at 2143), and a model of another device (e.g., a graphics accelerator) and its driver and test bench data in software (shown at 2144). The user decides which devices (and their respective drivers and test bench data) to model and incorporate into the RCC computing system 2141 and which devices are actually connected to the co-verification system.
[0653]
The co-verification system includes control logic that provides traffic control between (1) the RCC computing system 2141 and the RCC hardware array 2190, and (2) the external interface (connected to a target system and external I/O devices) and the RCC hardware array 2190. Some data passes between the RCC hardware array 2190 and the RCC computing system 2141, because some I/O devices can be modeled within the RCC computing system. Further, the RCC computing system 2141 has a model of the entire design in software that includes the portion of the user design modeled in the RCC hardware array 2190. As a result, the RCC computing system 2141 must also have access to all data passing between the external interface and the RCC hardware array 2190. The control logic ensures that the RCC computing system 2141 has access to these data. The control logic is described in detail below.
[0654]
The RCC hardware array 2190 includes a plurality of array boards. In the particular embodiment shown in FIG. 69, the hardware array 2190 includes boards 2145-2149. Boards 2146-2149 include most of the configured hardware model. Board 2145 (i.e., board m1) includes a reconfigurable computing element (e.g., FPGA chip) 2153 that can be used by the co-verification system to configure at least a portion of the hardware model, and an external I/O controller 2152 that directs traffic and data between the external interface (target system and I/O devices) and the co-verification system 2140. Via the external I/O controller, the board 2145 allows the RCC computing system 2141 to have access to all data transmitted between the outside world (i.e., the target system and I/O devices) and the RCC hardware array 2190. This access is important because the RCC computing system 2141 in the co-verification system includes a model of the entire user design in software, and the RCC computing system 2141 can further control the functions of the RCC hardware array 2190.
[0655]
If a stimulus from an external I/O device is provided to the hardware model, the software model must also have access to this stimulus. This allows the user of the co-verification system to selectively control the next debugging step. The next debugging step may involve examining the internal state values of the user design that result from this applied stimulus. As described above with respect to the board layout and interconnection scheme, the first and last boards are included in the hardware array 2190. Therefore, board 1 (shown as board 2146) and board 8 (shown as board 2149) are included in a hardware array (excluding board m1) having 8 boards. In addition to these boards 2145 to 2149, a board m2 having a chip m2 (not shown in FIG. 69 but shown in FIG. 74) may be further provided. The board m2 is similar to the board m1 except that it does not have an external interface, and it can be used for expansion if additional boards are required.
[0656]
The contents of these boards will now be described. The board 2145 (board m1) includes a PCI controller 2151, an external I/O controller 2152, a data chip (m1) 2153, a memory 2154, and a multiplexer 2155. In one embodiment, the PCI controller is a PLX9080. The PCI controller 2151 is connected to the RCC computing system 2141 via the bus 2171 and is connected to the tristate buffer 2179 via the bus 2172.
[0657]
The main traffic controller in the co-verification system between the outside world (target system 2120 and I/O devices) and the RCC computing system 2141 is the external I/O controller 2152 (also referred to as "CRTLXM" in FIGS. 69, 71 and 73). The external I/O controller 2152 is connected to the RCC computing system 2141, the other boards 2146-2149 in the RCC hardware array, the target system 2120, and the actual external I/O devices. Of course, as noted above, the primary traffic controller between the RCC computing system 2141 and the RCC hardware array 2190 is always the combination of the individual internal I/O controllers (e.g., I/O controllers 2156 and 2158) within each array board 2146-2149 and the PCI controller 2151. In one embodiment, the individual internal I/O controllers such as controllers 2156 and 2158 are the FPGA I/O controllers described above and illustrated in exemplary drawings such as FIG. 22 (unit 700) and FIG. 56 (unit 1200).
[0658]
External I/O controller 2152 is connected to tristate buffer 2179 to allow the external I/O controller to interface with RCC computing system 2141. In one embodiment, the tristate buffer 2179 in one instance allows data from the RCC computing system 2141 to pass to the local bus 2180 while preventing data from the local bus from entering the RCC computing system 2141, and in another instance allows data to pass from the local bus 2180 to the RCC computing system 2141.
[0659]
The external I/O controller 2152 is further connected to a chip (m1) 2153 and a memory/external buffer 2154 via a data bus 2176. In one embodiment, chip (m1) 2153 is a reconfigurable computing element, such as an FPGA chip, that can be used to configure at least a portion of the hardware model of the user design (or the entire hardware model if the user design is sufficiently small). The external buffer 2154 is a DRAM DIMM in one embodiment and can be used by the chip 2153 for various purposes. External buffer 2154 provides a larger memory capacity than the individual SRAM memory devices locally connected to each reconfigurable logic element (e.g., reconfigurable logic element 2157). This large memory capacity allows the RCC computing system to store large amounts of data, such as test bench data, embedded code for a microcontroller (if the user design is a microcontroller), and a large look-up table, in one memory device. The external buffer 2154 can further be used to store data necessary for hardware modeling as described above. In essence, this external buffer 2154 functions in part like the other high bank or low bank SRAM memory devices described above and shown, for example, in FIG. 56 (SRAM 1205 and 1206), but with more memory. External buffer 2154 is further used by the co-verification system to store data received from target system 2120 and external I/O devices, so that these data can later be retrieved by the RCC computing system 2141. Chip m1 2153 and external buffer 2154 further include a memory mapping system as described in the section "Memory Simulation" herein.
[0660]
To access the desired data in external buffer 2154, both chip 2153 and RCC computing system 2141 may deliver the address of the desired data (via external I / O controller 2152). Chip 2153 provides an address on address bus 2182 and external I / O controller 2152 provides an address on address bus 2177. These address buses 2182 and 2177 are inputs to multiplexer 2155, which provides the selected address on output line 2178 connected to external buffer 2154. The selection signal for multiplexer 2155 is provided by external I / O controller 2152 via line 2181.
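By way of illustration only, the address selection performed by multiplexer 2155 can be sketched in Python as follows. The function name `select_buffer_address` and the polarity of the selection signal (0 selecting the chip's address, 1 selecting the controller's address) are assumptions made for this sketch; the specification does not state the polarity of the signal on line 2181.

```python
def select_buffer_address(chip_addr: int, ctrl_addr: int, select: int) -> int:
    """Model multiplexer 2155: forward one of two address buses to line 2178.

    chip_addr -- address driven by chip 2153 on address bus 2182
    ctrl_addr -- address driven by external I/O controller 2152 on address bus 2177
    select    -- selection signal on line 2181 (assumed polarity:
                 0 picks the chip's address, 1 picks the controller's)
    """
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return ctrl_addr if select else chip_addr
```

Only one of the two address sources reaches external buffer 2154 at a time, which is the property the multiplexer provides.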
[0661]
The external I / O controller 2152 is further connected to other boards 2146 to 2149 via the bus 2180. In one embodiment, bus 2180 is the local bus described above and illustrated in the exemplary drawings such as FIG. 22 (local bus 708) and FIG. 56 (local bus 1210). In this embodiment, only five boards (including board 2145 (board m1)) are used. The actual number of boards is determined by the complexity and size of the user design that is modeled in hardware. A user-designed hardware model with normal complexity requires fewer boards than a more complex user-designed hardware model.
[0662]
In order to enable scalability, the boards 2146-2149 are substantially identical to each other except for a few interconnect lines between the boards. These interconnect lines allow one portion of the hardware model of the user design residing in one chip (e.g., chip 2157 in board 2146) to communicate with another portion of the same hardware model physically located in another chip (e.g., chip 2161 in board 2148). For the interconnect structure of this co-verification system, refer briefly to FIGS. 8, 36-44, and 74 and the corresponding portions of this specification.
[0663]
Board 2148 is a representative board. The board 2148 is the third board in this 4-board layout (excluding the board 2145 (board m1)). Therefore, it is not an end board that requires proper termination for the interconnect lines. Board 2148 includes internal I / O controller 2158, several reconfigurable logic elements (e.g., FPGA chips) 2159-2166, high bank FD bus 2167, low bank FD bus 2168, high bank memory 2169, and low bank memory 2170. As described above, in one embodiment, internal I / O controller 2158 is the FPGA I / O controller described above and shown in exemplary drawings such as FIG. 22 (unit 700) and FIG. 56 (unit 1200). Similarly, high and low bank memory devices 2169 and 2170 are the SRAM memory devices described above and shown, for example, in FIG. 56 (SRAM 1205 and 1206). High and low bank FD buses 2167 and 2168 are, in one embodiment, the FD bus or FPGA bus described above and shown in exemplary drawings such as FIG. 22 (FPGA buses 718 and 719), FIG. 56 (FD buses 1212 and 1213), and FIG. 57 (FD bus 1282).
[0664]
An external interface 2139 in the form of an external I / O expansion unit is provided to connect the co-verification system 2140 to the target system 2120 and other I / O devices. On the target system side, the external I / O expansion unit 2139 is connected to the PCI bridge 2127 via the secondary PCI bus 2132 and the control line 2131. Control line 2131 is used to deliver a software clock. On the I / O device side, the external I / O expansion unit 2139 is connected to various I / O devices via pin output data buses 2136 to 2138 and software clock control lines 2133 to 2135. The number of I / O devices that can be connected to the I / O expansion unit 2139 is determined by the user. In any case, the external I / O expansion unit 2139 provides as many data buses and software clock control lines as are needed to connect the required I / O devices to the co-verification system 2140 and run a debugging session successfully.
[0665]
On the co-verification system 2140 side, an external I / O expansion unit 2139 is connected to an external I / O controller 2152 via a data bus 2175, a software clock control line 2174, and a scan control line 2173. A data bus 2175 is used to pass pin output data between the outside world (target system 2120 and external I / O devices) and the co-verification system 2140. Software clock control line 2174 is used to deliver software clock data from RCC computing system 2141 to the outside world.
[0666]
The software clock present on control lines 2174 and 2131 is generated by the main software kernel in RCC computing system 2141. The RCC computing system 2141 delivers the software clock to the external I / O expansion unit 2139 via PCI bus 2171, PCI controller 2151, bus 2171, tri-state buffer 2179, local bus 2180, external I / O controller 2152, and control line 2174. From the external I / O expansion unit 2139, the software clock is provided as a clock input to target system 2120 (via PCI bridge 2127) and to other external I / O devices via control lines 2133-2135. Because the software clock functions as the primary clock source, the target system 2120 and the I / O devices run at a slower speed. However, the data provided to the target system 2120 and external I / O devices is synchronized to the software clock speed, as are the software model in the RCC computing system 2141 and the hardware model in the RCC hardware array 2190. Similarly, data from target system 2120 and external I / O devices is delivered to co-verification system 2140 in synchronization with the software clock.
[0667]
Therefore, the I / O data passed between the external interface and the co-verification system is synchronized with the software clock. In effect, the software clock synchronizes the operation of the external I / O devices and the target system with the operation of the co-verification system (in the RCC computing system and the RCC hardware array) each time data passes between them. The software clock is used for both data-in and data-out operations. For a data-in operation, when a pointer (described later) latches the software clock from the RCC computing system 2141 to the external interface, other pointers latch the I / O data from the external interface to selected internal nodes in the hardware model in the RCC hardware array 2190. These pointers latch the I / O data one by one during the cycle in which the software clock is delivered to the external interface. Once all the data is latched, the RCC computing system may generate another software clock and latch additional data in another software clock cycle if desired. For a data-out operation, the RCC computing system delivers the software clock to the external interface and then, with the aid of pointers, controls the gating of data from the internal nodes of the hardware model in the RCC hardware array 2190 to the external interface. The pointers likewise gate the data from the internal nodes to the external interface one by one. If additional data needs to be delivered to the external interface, the RCC computing system can generate another software clock and activate selected pointers to gate that data to the external interface. Because the generation of the software clock is strictly controlled, the co-verification system can synchronize data delivery and data evaluation between itself and any external I / O device connected to the external interface.
[0668]
Scan control line 2173 is used to allow the co-verification system 2140 to scan data buses 2132, 2136, 2137, and 2138 for any data that may be present. The logic in the external I / O controller 2152 that supports the scan signal is pointer logic, in which each of the various inputs is provided as the output for a specific period of time before the logic moves to the next input via the MOVE signal. This logic is similar to the scheme shown in FIG. In effect, the scan signal functions like a select signal for a multiplexer, except that the scan signal selects the various inputs to the multiplexer in sequence. Accordingly, in one period, the scan signal on the scan control line 2173 samples the data bus 2132 for data that may be input from the target system 2120. In the next period, the scan signal on scan control line 2173 samples data bus 2136 for data that may be input from an external I / O device coupled to it. In the following period, data bus 2137 is sampled, and so on, so that the co-verification system 2140 can receive and process all pinout data from target system 2120 or external I / O devices during this debug session. Any data received by the co-verification system 2140 from sampling the data buses 2132, 2136, 2137, and 2138 is transmitted to the external buffer 2154 via the external I / O controller 2152.
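As a rough illustration of this sequential scan, the following Python sketch samples one data bus per period and advances to the next bus on each step, which stands in for the MOVE signal. The function name `scan_buses`, the dictionary representation of the buses, and the use of `None` for "no data present" are assumptions for the sketch and do not come from the specification.

```python
from itertools import cycle

# Data bus reference numerals scanned by the signal on scan control line 2173.
SCANNED_BUSES = (2132, 2136, 2137, 2138)

def scan_buses(bus_data: dict, periods: int) -> list:
    """Sample one bus per period in a fixed round-robin order.

    bus_data maps a bus numeral to the value present on it (absent or None
    means no data). Returns the (bus, value) pairs that would be forwarded
    to the external buffer, in the order they were sampled.
    """
    samples = []
    pointer = cycle(SCANNED_BUSES)   # each next() models one MOVE
    for _ in range(periods):
        bus = next(pointer)
        value = bus_data.get(bus)
        if value is not None:
            samples.append((bus, value))
    return samples
```

Unlike an ordinary multiplexer select, the caller never chooses which bus is sampled; the order is fixed and every bus gets its turn.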
[0669]
In FIG. 69, assume that the target system 2120 includes a primary CPU and that the user design is a peripheral device of some kind, such as a video controller, a network adapter, a graphics adapter, a mouse, or some other support device, card, or logic. Accordingly, target system 2120 includes a target application (including an operating system) coupled to primary PCI bus 2129, and co-verification system 2140 includes the user design and is coupled to secondary PCI bus 2132. The configuration can be quite different depending on the subject of the user design. For example, if the user design is a CPU, the target application is executed in the RCC computing system 2141 of the co-verification system 2140, and the target system 2120 no longer includes the central computing system 2121. Indeed, bus 2132 may then be the primary PCI bus and bus 2129 the secondary PCI bus. In essence, instead of the user design being one of the peripheral devices that support the central computing system 2121, the user design is now the main computing center and the other peripheral devices support the user design.
[0670]
Control logic for transmitting data between the external interface (external I / O expander 2139) and the co-verification system 2140 is included in each board 2145-2149. The primary part of the control logic is contained in the external I / O controller 2152, while other parts are contained in the various internal I / O controllers (e.g., 2156 and 2158) and reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). For purposes of illustration, only a portion of this control logic need be shown, rather than the same logic structure repeated for all chips on all boards. The portion of co-verification system 2140 within dotted line 2150 of FIG. 69 includes one subset of the control logic. This control logic is described in more detail below.
[0671]
The components in this particular subset of control logic are external I / O controller 2152, tristate buffer 2179, internal I / O controller 2156 (CTRL1), reconfigurable logic element 2157 (chip 0_1, indicating chip 0 in board 1), and various buses and control lines coupled to these components. In particular, FIG. 70 shows the portion of the control logic used for the data-in cycle, in which data from the external interface (external I / O expander 2139) and the RCC computing system 2141 is delivered to the RCC hardware array 2190. FIG. 72 is a timing diagram of the data-in cycle. FIG. 71 shows the portion of the control logic used for the data-out cycle, in which data from the RCC hardware array 2190 is delivered to the RCC computing system 2141 and the external interface (external I / O expander 2139). FIG. 73 is a timing diagram of the data-out cycle.
[0672]
(Data in)
Data-in control logic according to one embodiment of the invention is responsible for processing data delivered from either the RCC computing system or an external interface to the RCC hardware array. One particular subset 2150 of data-in control logic (see FIG. 69) is shown in FIG. 70 and includes an external I / O controller 2200, a tri-state buffer 2202, an internal I / O controller 2203, a reconfigurable logic element 2204, and various buses and control lines that allow data transmission among them. An external buffer 2201 is also shown for this data-in embodiment. This subset illustrates the logic required for data-in operations, in which data from the external interface and RCC computing system is delivered to the RCC hardware array. The data-in control logic of FIG. 70 and the data-in timing diagram of FIG. 72 will be described together.
[0673]
In this data-in embodiment of the invention, two types of data cycles are used: global cycles and software-to-hardware (S2H) cycles. The global cycle is used for any data destined for all chips in the RCC hardware array, such as clocks, resets, and certain other S2H data destined for many different nodes in the RCC hardware array. For these latter “global” S2H data, it is more practical to transmit the data via a global cycle than via sequential S2H cycles.
[0674]
Software-to-hardware cycles are used to transmit data sequentially from chip to chip on all boards from the test bench process in the RCC computing system to the RCC hardware array. Since user-designed hardware models are distributed across several boards, test bench data must be provided to each chip for data evaluation. Thus, data is delivered sequentially to each internal node in each chip at a rate of one internal node at a time. Sequential delivery allows data specified for a particular internal node to be processed by all chips in the RCC hardware array. This is because the hardware model is distributed among a plurality of chips.
[0675]
For this data evaluation, the co-verification system provides two address spaces (S2H and CLK). As mentioned above, the S2H and CLK spaces are the primary inputs from the kernel to the hardware model. The hardware model holds virtually all register components and combinational components of the user's circuit design. In addition, the software clock is modeled in software and provided in the CLK I / O address space to interface with the hardware model. The kernel advances the simulation time, finds the active test bench components, and evaluates the clock components. If any clock edge is detected by the kernel, the registers and memories are updated and the values are propagated through the combinational components. Thus, if the hardware acceleration mode is selected, any change in the values in these spaces triggers the hardware model to change its logic state.
[0676]
During the data transfer, the DATA_XSFR signal is a logic “1”. During this time, the local buses 2222-2230 are used by the co-verification system to transmit data using the following data cycles: (1) global data from the RCC computing system to the RCC hardware array and CLK space; (2) global data from the external interface to the RCC hardware array and the external buffer; and (3) S2H data from the RCC computing system to the RCC hardware array (one chip at a time on each board). Thus, the first two data cycles are part of the global cycle and the last data cycle is part of the S2H cycle.
[0677]
In the first part of the data-in global cycle, in which global data is sent from the RCC computing system to the RCC hardware array, the external I / O controller 2200 drives the CPU_IN signal on line 2255 to logic “1”. Line 2255 is coupled to the enable input of tristate buffer 2202. With a logic “1” on line 2255, tristate buffer 2202 allows data on local bus 2222 to pass to local buses 2223 to 2230 on the other side of tristate buffer 2202. In this particular example, local buses 2223, 2224, 2225, 2226, 2227, 2228, 2229, and 2230 correspond to LD3, LD4 (from external I / O controller 2200), LD6 (from external I / O controller 2200), LD1, LD6, LD4, LD5, and LD7, respectively.
[0678]
Global data propagates from these local bus lines to bus lines 2231 to 2235 in internal I / O controller 2203 and then to FD bus lines 2236 to 2240. In this example, FD bus lines 2236, 2237, 2238, 2239, and 2240 correspond to FD bus lines FD1, FD6, FD4, FD5, and FD7, respectively.
[0679]
These FD bus lines 2236-2240 are coupled to inputs to latches 2208-2213 in reconfigurable logic element 2204. In this example, the reconfigurable logic element corresponds to chip 0_1 (i.e., chip 0 in board 1). FD bus line 2236 is coupled to latch 2208, FD bus line 2237 is coupled to latches 2209 and 2211, FD bus line 2238 is coupled to latch 2210, FD bus line 2239 is coupled to latch 2212, and FD bus line 2240 is coupled to latch 2213.
[0680]
The enable input for each of these latches 2208-2213 is coupled to a number of global pointers and software to hardware (S2H) pointers. The enable input to latches 2208-2211 is coupled to the global pointer, and the enable input to latches 2212-2213 is coupled to the S2H pointer. Some examples of global pointers include GLB_PTR0 on line 2241, GLB_PTR1 on line 2242, GLB_PTR2 on line 2243, and GLB_PTR3 on line 2244. Some example S2H pointers include S2H_PTR0 on line 2245 and S2H_PTR1 on line 2246. Because the enable inputs to these latches are coupled to these pointers, the respective latches cannot latch data to their intended destination node in the user-designed hardware model without an appropriate pointer signal.
[0681]
These global and S2H pointer signals are generated on output 2254 by data in pointer state machine 2214. The data in pointer state machine 2214 is controlled by DATA_XSFR and F_WR on line 2253. Internal I / O controller 2203 generates DATA_XSFR and F_WR on line 2253. DATA_XSFR is always a logic “1” whenever data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. In contrast to the F_RD signal, the F_WR signal is a logic “1” whenever writing to the RCC hardware array is desired. Reading via the F_RD signal requires delivery of data from the RCC hardware array to either the RCC computing system or the external interface. If both the DATA_XSFR and F_WR signals are logic “1”, the data-in pointer state machine may generate the appropriate global or S2H pointer signals in the appropriate programmed order.
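The behavior of the data-in pointer state machine can be pictured with the following minimal Python sketch. The class name `DataInPointerStateMachine`, the modeling of each F_WR pulse as one call to `step`, and the restart of the sequence when DATA_XSFR falls are assumptions for the sketch; the specification states only that pointers are generated in a programmed order while both DATA_XSFR and F_WR are logic “1”.

```python
class DataInPointerStateMachine:
    """Emit pointer names in a programmed order, one per F_WR pulse."""

    def __init__(self, order):
        self.order = list(order)   # programmed pointer sequence
        self.index = 0

    def step(self, data_xsfr: int, f_wr: int):
        """Return the pointer asserted for this pulse, or None if none is."""
        if not data_xsfr:
            self.index = 0         # assumption: sequence restarts after a transfer
            return None
        if not f_wr or self.index >= len(self.order):
            return None            # no write requested, or sequence exhausted
        pointer = self.order[self.index]
        self.index += 1
        return pointer
```

For example, a machine programmed with `["GLB_PTR0", "GLB_PTR1"]` yields GLB_PTR0 on the first write pulse and GLB_PTR1 on the second, and yields nothing while F_WR is low.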
[0682]
The outputs 2247-2252 of these latches are coupled to various internal nodes in the hardware model of the user design. Some of these internal nodes correspond to input pinouts of the user design. User designs typically have other internal nodes that are not accessible through pinouts, but these non-pinout internal nodes serve other debugging purposes: they give flexibility to designers who wish to apply stimuli to various internal nodes in the user design, whether or not those nodes are input pinouts. For the data-in logic and the stimuli applied by the external interface to the hardware model of the user design, the internal nodes that correspond to input pinouts are the relevant ones. For example, if the user design is a CRTC 6845 video controller, some input pinouts may be as follows:
[0683]
LPSTB - light pen strobe pin
~RESET - low-level signal to reset the 6845 controller
RS - register selection
E - enable
CLK - clock
~CS - chip selection
Other input pinouts are also available in this video controller. Based on the number of input pinouts that interface with the outside world, the number of nodes, and thus the number of latches and pointers, can be readily determined. One hardware model configured in the RCC hardware array can have, for example, 30 separate latches associated with each of GLB_PTR0, GLB_PTR1, GLB_PTR2, GLB_PTR3, S2H_PTR0, and S2H_PTR1, for a total of 180 latches (= 30 × 6). In other designs, more global pointers such as GLB_PTR4 to GLB_PTR30 may be used as needed. Similarly, more S2H pointers such as S2H_PTR2 to S2H_PTR30 may be used as needed. These pointers and their corresponding latches are based on the hardware model requirements of each user design.
[0684]
Returning to the data-in control logic and timing diagram, data on the FD bus lines is transferred to these internal nodes only when the latches are enabled by the appropriate global pointer or S2H pointer signal. Otherwise, these nodes are not driven by any data on the FD bus. When F_WR is logic “1” in the first half of the CPU_IN = 1 period, GLB_PTR0 is logic “1”, and the data on FD1 is latched to the corresponding internal node via line 2247. Any other latches whose enable inputs depend on GLB_PTR0 also latch data to their corresponding internal nodes. In the second half of the CPU_IN = 1 period, F_WR goes to logic “1” again, which triggers GLB_PTR1 to rise to logic “1”. The data on FD6 is thereby latched to the internal node coupled to line 2248. This also causes the software clock signal delivered on line 2223 to be latched to line 2216 by latch 2205, which is enabled by GLB_PTR1 on enable line 2215. This software clock is delivered to the external clock inputs of the target system and other external I / O devices. Since GLB_PTR0 and GLB_PTR1 are used only for the first part of the data-in global cycle, CPU_IN returns to logic “0”, which completes the delivery of global data from the RCC computing system to the RCC hardware array.
[0685]
The second part of the data-in global cycle will now be described. In this part, global data from the external interface is delivered to the RCC hardware array and the external buffer. Again, various input pinout signals from the target system or external I / O devices that are directed to the user design must be provided to both the hardware model and the software model. These data can be delivered to the hardware model by using the appropriate pointers and latches to drive the internal nodes. These data are also delivered to the software model by first storing the data in an external buffer 2201 for later retrieval by the RCC computing system to update the internal state of the software model.
[0686]
Here, CPU_IN is logic “0” and EXT_IN is logic “1”. Therefore, the tristate buffer 2206 in the external I / O controller 2200 is enabled and passes the data on PCI bus lines such as bus lines 2217 and 2218. These PCI bus lines are also coupled to the FD bus line 2219 for storage in the external buffer 2201. In the first half of the period in which the EXT_IN signal is logic “1”, GLB_PTR2 is logic “1”. This causes the data on FD4 (via bus lines 2217 and 2224 and local bus line 2228 (LD4)) to be latched to the internal node in the hardware model coupled to line 2249.
[0687]
In the second half of the period in which the EXT_IN signal is logic “1”, GLB_PTR3 is logic “1”. This causes the data on FD6 (via bus lines 2218 and 2225 and local bus line 2227 (LD6)) to be latched to the internal node in the hardware model coupled to line 2250.
[0688]
As noted above, these data from the target system or some other external I / O device are also delivered to the software model by first storing the data in an external buffer 2201 for later retrieval by the RCC computing system, which then updates the internal state of the software model. Data on the bus lines 2217 and 2218 is provided to the external buffer 2201 via the FD bus FD[63:0] 2219. The specific memory address at which each item of data is stored in the external buffer 2201 is provided to the external buffer 2201 via the bus 2220 by the memory address counter 2207. To enable such storage, a WR_EXT_BUF signal is provided to the external buffer 2201 via line 2221. Before the external buffer 2201 becomes full, the RCC computing system reads the contents of the external buffer 2201 so that it can make the appropriate updates to the software model. Any data delivered to the various internal nodes of the hardware model in the RCC hardware array will probably cause some internal state change in the hardware model. Since the RCC computing system has a model of the entire user design in software, these internal state changes in the hardware model should also be reflected in the software model. This completes the data-in global cycle.
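The storage path into the external buffer can be sketched in Python as follows. The class name `ExternalBuffer`, the `drain` method, the reset of the address counter after a read, and the error raised on overflow are assumptions made for illustration; the specification states only that memory address counter 2207 supplies the address, that WR_EXT_BUF enables the write, and that the RCC computing system reads the contents before the buffer fills.

```python
class ExternalBuffer:
    """Toy model of external buffer 2201 with memory address counter 2207."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.memory = [None] * capacity
        self.address = 0                   # value of the address counter

    def write(self, fd_word: int, wr_ext_buf: int) -> bool:
        """Store one FD-bus word when WR_EXT_BUF is asserted."""
        if not wr_ext_buf:
            return False                   # write strobe low: nothing stored
        if self.address >= self.capacity:
            raise OverflowError("buffer full; contents must be read out first")
        self.memory[self.address] = fd_word
        self.address += 1                  # counter advances to the next slot
        return True

    def drain(self) -> list:
        """Return the stored words for the software model and reset the counter.

        Resetting the counter here is an assumption of this sketch.
        """
        words = self.memory[:self.address]
        self.address = 0
        return words
```

A host loop would call `drain` before `address` reaches `capacity` and apply the returned words to the software model.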
[0689]
The S2H cycle will now be described. The S2H cycle is used to deliver test bench data from the RCC computing system to the RCC hardware array, moving that data sequentially from one chip to the next on each board. The CPU_IN signal is a logic “1”, while the EXT_IN signal is a logic “0”. This indicates data transfer between the RCC computing system and the RCC hardware array. No external interface is involved. The CPU_IN signal also enables the tristate buffer 2202 to transfer data from the local bus 2222 to the internal I / O controller 2203.
[0690]
At the start of the CPU_IN = 1 period, S2H_PTR0 goes to logic “1”. This causes the data on FD5 (via local bus 2222, local bus line 2229, bus line 2234, and FD bus 2239) to be latched to the internal node in the hardware model coupled to line 2251. In the second part of the CPU_IN = 1 period, S2H_PTR1 goes to logic “1”. This causes the data on FD7 (via local bus 2222, local bus line 2230, bus line 2235, and FD bus 2240) to be latched to the internal node in the hardware model coupled to line 2252. During the sequential data evaluation, the data from the RCC computing system is delivered first to chip m1, then to chip 0_1 (i.e., chip 0 on board 1), chip 1_1 (i.e., chip 1 on board 1), and so on, until the last chip on the last board, chip 7_8 (i.e., chip 7 on board 8). If chip m2 is available, data is delivered to this chip as well.
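The chip-by-chip delivery order can be enumerated with a short Python sketch. It assumes the naming convention chip X_Y for chip X on board Y, boards numbered from 1, chips numbered from 0, and chip m2 (when present) visited last; the function name `s2h_delivery_order` and the position of chip m2 in the sequence are assumptions, since the specification does not say where m2 falls in the order.

```python
def s2h_delivery_order(num_boards: int, chips_per_board: int,
                       has_m2: bool = False) -> list:
    """List chips in the order S2H data is assumed to reach them."""
    order = ["m1"]                                  # chip m1 is served first
    for board in range(1, num_boards + 1):          # one board at a time
        for chip in range(chips_per_board):         # one chip at a time
            order.append(f"{chip}_{board}")
    if has_m2:
        order.append("m2")                          # assumed to come last
    return order
```

With eight boards of eight chips each, the list starts with m1, 0_1, 1_1 and ends with 7_8, matching the example in the text.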
[0691]
At the end of this data transfer, DATA_XSFR returns to logic “0”. Note that I / O data from the external interface is treated as global data and is handled during the global cycle. This completes the explanation of the data-in control logic and the data-in cycle.
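To summarize the data-in latching described above, the following Python sketch models the six example latches of chip 0_1. The table pairs each latch with its FD line, enabling pointer, and output line; the pairing of latch numerals with output lines follows the order in which they are listed in the text and is an inference, and the function name `latch_step` is hypothetical.

```python
# latch numeral -> (FD line, enabling pointer, output line to internal node)
LATCHES = {
    2208: ("FD1", "GLB_PTR0", 2247),
    2209: ("FD6", "GLB_PTR1", 2248),
    2210: ("FD4", "GLB_PTR2", 2249),
    2211: ("FD6", "GLB_PTR3", 2250),
    2212: ("FD5", "S2H_PTR0", 2251),
    2213: ("FD7", "S2H_PTR1", 2252),
}

def latch_step(nodes: dict, fd_bus: dict, active_pointer: str) -> dict:
    """Return node values after one pointer pulse.

    Only latches enabled by active_pointer capture their FD line;
    every other internal node keeps its previous value.
    """
    updated = dict(nodes)
    for fd_line, pointer, out_line in LATCHES.values():
        if pointer == active_pointer:
            updated[out_line] = fd_bus[fd_line]
    return updated
```

Note that latches 2209 and 2211 share FD6, so the same bus line reaches two different internal nodes at different times depending on which pointer is asserted.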
[0692]
(Data out)
A data-out control logic embodiment of the present invention will now be described. The data-out control logic of this embodiment is responsible for processing data delivered from the RCC hardware array to the RCC computing system and the external interface. In the course of processing data in response to stimuli (external or otherwise), the hardware model generates certain output data that may be required by the target application or by some I / O devices. These output data can be substantive data, addresses, control information, or other relevant information that another application or device may need for its own processing. These output data, destined for the RCC computing system (which may have models of other external I / O devices in software), the target system, or external I / O devices, are provided on various internal nodes. As described above for the data-in logic, some of these internal nodes correspond to output pinouts of the user design. User designs typically have other internal nodes that are not accessible through pinouts, but these non-pinout internal nodes serve other debugging purposes: they give flexibility to designers who want to read and analyze data at various internal nodes in the user design, whether or not those nodes are output pinouts. For the data-out logic and the external interface, the internal nodes that correspond to output pinouts are the relevant ones.
[0693]
For example, if the user design is a CRTC 6845 video controller, some pinouts may be as follows:
[0694]
MA0 to MA13 - memory address
D0 to D7 - data bus
DE - display enable
CURSOR - cursor position
VS - vertical sync
HS - horizontal sync
Other output pinouts are also available in this video controller. Based on the number of output pinouts that interface with the outside world, the number of nodes, and hence the number of gate logic elements and pointers, can be readily determined. For instance, output pinouts MA0-MA13 on the video controller provide a memory address for the video RAM. The VS output pinout provides a signal for vertical synchronization and therefore causes vertical retrace on the monitor. Output pinouts D0-D7 are eight terminals that form a bi-directional data bus through which the CPU in the target system accesses the internal 6845 registers. These output pinouts correspond to certain internal nodes in the hardware model. Of course, the number and nature of these internal nodes will vary depending on the user design.
[0695]
Data from these output pinout internal nodes must be provided to the RCC computing system, because the RCC computing system contains a model of the entire user design in software, and any event that occurs in the hardware model must be communicated to the software model so that the corresponding changes can be made. In this way, the software model holds information consistent with that in the hardware model. In addition, the RCC computing system may hold models of devices that the user or designer has decided to model in software rather than connect as actual devices to one of the ports on the external I / O expander. For example, the user may decide that it is easier and more effective to model a monitor or speaker in software than to plug an actual monitor or speaker into one of the external I / O expander ports. Furthermore, data from these internal nodes in the hardware model must be provided to the target system and any other external I / O devices. To deliver the data at these output pinout internal nodes to the RCC computing system as well as to the target system and other external I / O devices, the data-out control logic of one embodiment of the present invention is provided in the co-verification system.
[0696]
The data-out control logic uses a data-out cycle that includes the transfer of data from the RCC hardware array to the RCC computing system 2141 and the external interface (external I / O expander 2139). In FIG. 69, each of the boards 2145 to 2149 has control logic for transferring data between the external interface (external I / O expander 2139) and the co-verification system 2140. The main part of the control logic is in the external I / O controller 2152, while the other parts are in various internal I / O controllers (e.g., 2156 and 2158) and reconfigurable logic elements (e.g., FPGA chips 2159 and 2165). Again, for purposes of illustration, it is sufficient to show only a portion of this control logic rather than the same logic structure repeated for all chips on all boards. The portion of co-verification system 2140 within dotted line 2150 of FIG. 69 includes one subset of the control logic. This control logic is now described in more detail. FIG. 71 illustrates the portion of the control logic used for the data-out cycle. FIG. 73 is a timing diagram of the data-out cycle.
[0697]
One particular subset of data-out control logic is shown in FIG. 71 and includes external I / O controller 2300, tri-state buffer 2301, internal I / O controller 2302, reconfigurable logic element 2303, and various buses and control lines that enable data transfer among them. This subset illustrates the logic required for data-out operations, in which data from the RCC hardware array is delivered to the RCC computing system and the external interface. The data-out control logic of FIG. 71 and the data-out timing diagram of FIG. 73 will be described together.
[0698]
In contrast to the two cycle types of the data-in cycle, the data-out cycle includes only one type of cycle. The data-out control logic sequentially delivers data from the RCC hardware model (1) to the RCC computing system, and then (2) to the RCC computing system and the external interface (to the target system and external I / O devices). That is, the data-out cycle requires data from the internal nodes of the hardware model in the RCC hardware array to be delivered first to the RCC computing system and then to the RCC computing system and the external interface, one chip at a time on each board and one board at a time.
[0699]
Similar to the data-in logic, pointers are used to select (or gate) data from internal nodes to the RCC computing system and external interface. In one embodiment illustrated in FIGS. 71 and 73, the data-out pointer state machine 2319 generates five pointers H2S_PTR[4:0] on the bus 2359 for both hardware-to-software data and hardware-to-external-interface data. Data-out pointer state machine 2319 is controlled by the DATA_XSFR and F_RD signals on line 2358. Internal I / O controller 2302 generates the DATA_XSFR and F_RD signals on line 2358. DATA_XSFR is a logic “1” whenever data transfer between the RCC hardware array and either the RCC computing system or the external interface is desired. In contrast to the F_WR signal, F_RD is a logic “1” whenever a read from the RCC hardware array is desired. If both the DATA_XSFR and F_RD signals are logic “1”, the data-out pointer state machine 2319 may generate the appropriate H2S pointer signals in the appropriate programmed order. Other embodiments may use more pointers (or fewer pointers) as the user design requires.
[0700]
These H2S pointer signals are provided to the gating logic. One set of inputs 2353 to 2357 to the gating logic is directed to AND gates 2314 to 2318. The other set of inputs 2348 to 2352 is coupled to internal nodes of the hardware model. Thus, AND gate 2314 has input 2348 from an internal node and input 2353 from H2S_PTR0; AND gate 2315 has input 2349 from an internal node and input 2354 from H2S_PTR1; AND gate 2316 has input 2350 from an internal node and input 2355 from H2S_PTR2; AND gate 2317 has input 2351 from an internal node and input 2356 from H2S_PTR3; and AND gate 2318 has input 2352 from an internal node and input 2357 from H2S_PTR4. Without the appropriate H2S_PTR pointer, an internal node is not driven to either the RCC computing system or the external interface.
[0701]
The respective outputs 2343 to 2347 of AND gates 2314 to 2318 are coupled to OR gates 2310 to 2313. Accordingly, AND gate output 2343 is coupled to an input of OR gate 2310, AND gate output 2344 is coupled to an input of OR gate 2311, AND gate output 2345 is coupled to an input of OR gate 2311, AND gate output 2346 is coupled to an input of OR gate 2312, and AND gate output 2347 is coupled to an input of OR gate 2313. Note that output 2344 of AND gate 2315 does not have an OR gate of its own; it shares OR gate 2311 with output 2345 of AND gate 2316. The other inputs 2360-2366 to OR gates 2310-2313 may be coupled to the outputs of other AND gates (not shown), which are themselves coupled to other internal nodes and H2S_PTR pointers. The use of these OR gates and their specific inputs depends on the user design and the configured hardware model. Thus, in other designs, more pointers may be used and output 2344 of AND gate 2315 may be coupled to an OR gate other than OR gate 2311.
[0702]
Outputs 2339 to 2342 of OR gates 2310 to 2313 are coupled to FD bus lines FD0, FD3, FD1, and FD4. In this particular example of user design, only four output pinout signals may be delivered to the RCC calculation system and external interface. Thus, FD0 is coupled to the output of OR gate 2310, FD3 is coupled to the output of OR gate 2311, FD1 is coupled to the output of OR gate 2312, and FD4 is coupled to the output of OR gate 2313. These FD bus lines are coupled to local bus lines 2330 to 2333 via internal lines 2334 to 2338 in the internal I / O controller 2302. In this embodiment, local bus line 2330 is LD0, local bus line 2331 is LD3, local bus line 2332 is LD1, and local bus line 2333 is LD4.
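The AND/OR gating onto the FD bus can be sketched as follows. The wiring tables are transcribed from the text above; the function name `drive_fd_bus` and the dictionary-based representation are illustrative assumptions, and the additional OR inputs 2360-2366 (fed by AND gates not shown) are omitted.

```python
# AND gate wiring from the text: pointer index -> internal-node input line.
AND_GATES = {0: 2348, 1: 2349, 2: 2350, 3: 2351, 4: 2352}

# OR gate sharing from the text: pointer index -> FD bus line.
# Pointers 1 and 2 share OR gate 2311 and therefore share FD3.
FD_LINE = {0: "FD0", 1: "FD3", 2: "FD3", 3: "FD1", 4: "FD4"}


def drive_fd_bus(node_values, pointer_word):
    """Return the FD line values for one pointer word.

    node_values maps an input line number (e.g. 2350) to its bit value.
    pointer_word[i] models H2S_PTR[i].
    """
    fd = {"FD0": 0, "FD1": 0, "FD3": 0, "FD4": 0}
    for ptr, line in AND_GATES.items():
        gated = node_values[line] & pointer_word[ptr]  # AND gate
        fd[FD_LINE[ptr]] |= gated                      # OR gate
    return fd


nodes = {2348: 1, 2349: 1, 2350: 1, 2351: 0, 2352: 1}
fd_ptr2 = drive_fd_bus(nodes, [0, 0, 1, 0, 0])  # only H2S_PTR2 asserted
fd_none = drive_fd_bus(nodes, [0, 0, 0, 0, 0])  # no pointer asserted
```

Because only one pointer is active at a time, the two nodes that share FD3 never contend for the line.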
[0703]
These local bus lines are coupled to a tristate buffer 2301 to allow data on these local bus lines 2330-2333 to be delivered to the RCC computing system. Tri-state buffer 2301 allows data to be transferred from local bus lines 2330-2333 to local bus 2320 in its normal state. In contrast, during data-in, data can only be transferred from the RCC computing system to the RCC hardware array when the CPU_IN signal is provided to the tri-state buffer 2301.
[0704]
Lines 2321-2324 are provided to allow data on these local bus lines 2330-2333 to be delivered to the external interface. Line 2321 is coupled to line 2330 and to a predetermined latch (not shown) in external I/O controller 2300, line 2323 is coupled to line 2332 and to latch 2305 in external I/O controller 2300, and line 2324 is coupled to line 2333 and to latch 2306 in external I/O controller 2300.
[0705]
Each output of latches 2305 and 2306 is coupled to a buffer and then to the external interface. The external interface is in turn coupled to the appropriate output pinout of the target system or external I/O device. Accordingly, the output of latch 2305 is coupled to buffer 2307 and line 2327, and the output of latch 2306 is coupled to buffer 2308 and line 2328. The output of another latch (not shown) may be coupled to line 2329. In this example, lines 2327-2329 correspond to wire 1, wire 4, and wire 3, respectively, of the target system or a given external I/O device. Finally, for data transfer from the hardware model to the external interface, the user-designed hardware model is configured so that the internal node coupled to line 2350 corresponds to wire 3 on line 2329, the internal node coupled to line 2351 corresponds to wire 1 on line 2327, and the internal node coupled to line 2352 corresponds to wire 4 on line 2328. Similarly, wire 3 corresponds to LD3 on line 2331, wire 1 corresponds to LD1 on line 2332, and wire 4 corresponds to LD4 on line 2333.
[0706]
Lookup table 2309 is configured to enable these latches 2305 and 2306. Lookup table 2309 is controlled by the F_RD signal on line 2367. The F_RD signal activates the operation of lookup table address counter 2304. For each counter increment, the pointer enables a particular row in lookup table 2309. If an entry (or bit) in that particular row is a logic “1”, the LUT output line coupled to that entry enables its corresponding latch, and the data is transferred to the external interface and finally to the desired destination in the target system or a given external I/O device. For example, LUT output line 2325 is coupled to the enable input of latch 2305 and LUT output line 2326 is coupled to the enable input of latch 2306.
[0707]
In this example, rows 0-3 of lookup table 2309 are programmed to enable the latch corresponding to the output pinout wire for the internal node in chip m1. Similarly, rows 4-6 are programmed to enable latches corresponding to output pinout wires for internal nodes in chip 0_1 (ie, chip 0 in board 1). In row 4, bit 3 is a logic “1”. In row 5, bit 1 is a logic “1”. In row 6, bit 4 is a logic “1”. All other entries or bit positions are logic “0”. For any given bit position in the lookup table, only one entry is a logical “1”. This is because one output pin-out wire cannot drive a plurality of I / O devices. In other words, the output pinout internal node in the hardware model can provide data to only one wire coupled to the external interface.
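The lookup table programming and its exclusivity rule can be sketched as follows. Rows 4-6 follow the text; the table width of five bits, the all-zero placeholder contents of rows 0-3, and the helper names `enabled_latches` and `columns_are_exclusive` are assumptions made only for illustration.

```python
NUM_BITS = 5  # assumed width; the text mentions bit positions 1, 3, and 4

# Rows 4-6 follow the text. Rows 0-3 (chip m1) are left as all-zero
# placeholders because their contents are not given.
LUT = {row: [0] * NUM_BITS for row in range(7)}
LUT[4][3] = 1
LUT[5][1] = 1
LUT[6][4] = 1


def enabled_latches(lut, row):
    """Bit positions whose latch is enabled when the counter points at row."""
    return [bit for bit, value in enumerate(lut[row]) if value == 1]


def columns_are_exclusive(lut):
    """True if every bit position holds a logic 1 in at most one row."""
    return all(
        sum(lut[row][bit] for row in lut) <= 1 for bit in range(NUM_BITS)
    )
```

A check such as `columns_are_exclusive` expresses the stated constraint that one output pin-out wire is fed by only one internal node, so no bit position may be set in two rows.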
[0708]
As described above, the data-out control logic is configured so that the data in each reconfigurable logic element in each chip of the RCC hardware model is sequentially delivered to (1) the RCC computing system, and then (2) the RCC computing system and the external interface (with the target system and the external I/O devices). The RCC computing system needs this data because it holds software models of several I/O devices; for data targeting one of these modeled I/O devices, the RCC computing system must monitor the data so that its internal state stays consistent with the state of the hardware model in the RCC hardware array. In the example illustrated in FIGS. 71 and 73, only seven internal nodes are driven for output to the RCC computing system and the external interface. Two of these internal nodes are in chip m1, and the other five are in chip 0_1 (i.e., chip 0 in board 1). Of course, a particular user design may require more internal nodes in these and other chips; FIGS. 71 and 73 show only these seven nodes.
[0709]
During the data transfer, the DATA_XSFR signal is a logic “1”. During this time, local bus lines 2330-2333 are used by the conversion system to sequentially transfer data from each chip on each board in the RCC hardware array to both the RCC computing system and the external interface. The DATA_XSFR and F_RD signals control the operation of the data-out pointer state machine, which generates the appropriate pointer signals H2S_PTR[4:0] to the gates for the output pinout internal nodes. The F_RD signal also controls lookup table address counter 2304 for delivery of internal node data to the external interface.
[0710]
The internal nodes in chip m1 are processed first. When F_RD rises to logic “1” at the start of the data transfer cycle, H2S_PTR0 in chip m1 rises to logic “1”. The data at the internal nodes in chip m1 that depend on H2S_PTR0 is thereby transferred to the RCC computing system via tri-state buffer 2301 and local bus 2320. Lookup table address counter 2304 counts and points to row 0 of lookup table 2309, which latches the appropriate data in chip m1 to the external interface. When the F_RD signal rises to logic “1” again, H2S_PTR1 goes to logic “1” and the data at the internal node driven by H2S_PTR1 is delivered to the RCC computing system and the external interface. In response to this second F_RD pulse, lookup table address counter 2304 counts and points to row 1 of lookup table 2309, which latches the appropriate data in chip m1 to the external interface.
[0711]
Next, the five internal nodes in reconfigurable logic element 2303 (i.e., chip 0_1, or chip 0 in board 1) are processed. In this example, data from the two internal nodes associated with H2S_PTR0 and H2S_PTR1 is delivered only to the RCC computing system. Data from the three internal nodes associated with H2S_PTR2, H2S_PTR3, and H2S_PTR4 is delivered to both the RCC computing system and the external interface.
[0712]
When F_RD rises to logic “1”, H2S_PTR0 in chip 2303 becomes logic “1”. The data at the internal nodes in chip 2303 that depend on H2S_PTR0 is thereby transferred to the RCC computing system via tri-state buffer 2301 and local bus 2320. In this example, the internal node coupled to line 2348 depends on H2S_PTR0 on line 2353. When the F_RD signal becomes logic “1” again, the data at the internal node driven by H2S_PTR1 is delivered to the RCC computing system. Here, the internal node coupled to line 2349 is affected. This data is transferred to LD3 via lines 2331 and 2322.
[0713]
When the F_RD signal becomes logic “1” again, H2S_PTR2 becomes logic “1” and the data at the internal node coupled to line 2350 is provided on LD3. This data is provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to be transferred to the local bus 2320 and then to the RCC computing system. For the external interface, the enabling H2S_PTR2 signal places this data on LD3 via lines 2331 and 2322. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 4 of lookup table 2309, which latches the data from the internal node coupled to line 2350 onto line 2329 (wire 3) at the external interface.
[0714]
When the F_RD signal becomes logic “1” again, H2S_PTR3 becomes logic “1” and the data at the internal node coupled to line 2351 is provided on LD1. This data is provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to be transferred to the local bus 2320 and then to the RCC computing system. For the external interface, the enabling H2S_PTR3 signal places this data on LD1 via lines 2332 and 2323. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 5 of lookup table 2309, which latches the data from the internal node coupled to line 2351 onto line 2327 (wire 1) at the external interface.
[0715]
When the F_RD signal becomes logic “1” again, H2S_PTR4 becomes logic “1” and the data at the internal node coupled to line 2352 is provided on LD4. This data is provided to both the RCC computing system and the external interface. The tri-state buffer 2301 allows the data to be transferred to the local bus 2320 and then to the RCC computing system. For the external interface, the enabling H2S_PTR4 signal places this data on LD4 via lines 2333 and 2324. In response to the F_RD signal, lookup table address counter 2304 counts and points to row 6 of lookup table 2309, which latches the data from the internal node coupled to line 2352 onto line 2328 (wire 4) at the external interface.
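The five F_RD pulses for chip 2303 walked through above can be replayed with a short sketch. The step table is transcribed from the text; the function name `run_data_out` and the tuple layout are hypothetical, and timing, the tri-state buffer, and the latches are abstracted away.

```python
# One entry per F_RD pulse for chip 2303, transcribed from the text:
# (pointer, internal-node line, local bus line, LUT row, external wire).
# A LUT row and wire of None mean the data goes to the RCC computing
# system only.
STEPS = [
    ("H2S_PTR0", 2348, "LD0", None, None),
    ("H2S_PTR1", 2349, "LD3", None, None),
    ("H2S_PTR2", 2350, "LD3", 4, "wire 3"),
    ("H2S_PTR3", 2351, "LD1", 5, "wire 1"),
    ("H2S_PTR4", 2352, "LD4", 6, "wire 4"),
]


def run_data_out(node_values):
    """Replay the pulses and return (software view, external view)."""
    to_software = []
    to_external = {}
    for pointer, line, bus, lut_row, wire in STEPS:
        value = node_values[line]
        # Every step reaches the RCC computing system over the local bus.
        to_software.append((pointer, bus, value))
        # Only steps with a programmed LUT row are latched externally.
        if wire is not None:
            to_external[wire] = value
    return to_software, to_external


nodes = {2348: 1, 2349: 0, 2350: 1, 2351: 0, 2352: 1}
software_view, external_view = run_data_out(nodes)
```

The software view contains all five nodes, while the external view contains only the three nodes mapped to wires, which reflects the requirement that the RCC computing system sees every value the external interface sees.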
[0716]
This process of transferring the data at the internal nodes of a chip first to the RCC computing system and then to both the RCC computing system and the external interface continues sequentially for the other chips. First, the internal nodes of chip m1 are driven. Second, the internal nodes of chip 0_1 (chip 2303) are driven. Next, the internal nodes of chip 1_1, if any, are driven. This operation continues until the last node in the last chip on the last board is driven; thus the internal nodes of chip 7_8, if any, are driven. Finally, the internal nodes of chip m2, if any, are driven.
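The chip visiting order can be sketched as a simple generator. The defaults of eight boards with eight chips each, and the assumption that all chips of one board are visited before moving to the next board, are inferred from the labels "chip 1_1" and "chip 7_8" in the text and are not stated explicitly; the function name `chip_visit_order` is hypothetical.

```python
def chip_visit_order(num_boards=8, chips_per_board=8):
    """Yield chip labels in the order the text describes.

    Labels use the text's "c_b" form (chip c on board b). The default
    sizes and the board-by-board traversal are assumptions.
    """
    yield "m1"
    for board in range(1, num_boards + 1):
        for chip in range(chips_per_board):
            yield f"{chip}_{board}"
    yield "m2"


order = list(chip_visit_order())
```

Under these assumptions the sequence starts with m1, 0_1, 1_1 and ends with 7_8, m2, matching the endpoints named in the text.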
[0717]
Although FIG. 71 shows the data-out control logic for driving internal nodes in chip 2303 only, other chips also have internal nodes that may need to be driven to the RCC computing system and the external interface. Regardless of the number of internal nodes, the data-out control logic first drives data from one set of internal nodes in a chip to the RCC computing system, and then, in another cycle, drives a different set of internal nodes in the same chip to both the RCC computing system and the external interface. The data-out control logic then moves to the next chip and performs the same two-step operation: it first transfers the data designated for the RCC computing system, and then transfers the data designated for the external interface to both the RCC computing system and the external interface. Even when the data is intended for the external interface, the RCC computing system must know the data. This is because the RCC computing system holds a model of the entire user design in software, whose internal state information must match the internal state information of the hardware model in the RCC hardware array.
[0718]
(Board layout)
Here, the board layout of the conversion system according to the embodiment of the present invention will be described with reference to FIG. The board is installed in the RCC hardware array. The board layout is similar to that illustrated in FIGS. 8 and 36-44 and described in the accompanying text.
[0719]
In one embodiment, the RCC hardware array includes six boards. Board m1 is coupled to board 1 and board m2 is coupled to board 8. The coupling and placement of board 1, board 2, board 3, and board 8 were described above with reference to FIGS. 8 and 36-44.
[0720]
Board m1 includes chip m1. The interconnect structure between board m1 and the other boards is such that chip m1 is coupled to the south interconnects of chip 0, chip 2, chip 4, and chip 6 of board 1. The interconnect structure between board m2 and the other boards is such that chip m2 is coupled to the south interconnects of chip 0, chip 2, chip 4, and chip 6 of board 8.
[0721]
(X. Example)
To illustrate the operation of one embodiment of the present invention, a hypothetical user circuit design is used. Expressed in structured register transfer level (RTL) HDL code, the example user circuit design is as follows.
[0722]
[Expression 7]

[0723]
This code is reproduced in FIG. The specific functional details of this circuit design are not necessary to understand the present invention. However, the reader should understand that the user generates this HDL code and designs a circuit for simulation. The circuit represented by this code performs a predetermined function as designed by the user, responds to the input signal, and generates an output.
[0724]
FIG. 27 shows a circuit diagram of the HDL code described with reference to FIG. In most cases, the user may actually draw a schematic of this nature before expressing it in HDL form. A number of schematic capture tools allow entry of the actual circuit diagram and, after processing, generate usable code.
[0725]
As shown in FIG. 28, the simulation system performs component type analysis. The HDL code originally presented in FIG. 26 as representing the user's specific circuit design was analyzed here. The first few lines of code starting with "module register (clock, reset, d, q);" and ending with "endmodule" and further identified by reference number 900 are the register definition section.
[0726]
The next few lines of code (reference number 907) represent predetermined wire interconnection information. Wire variables in HDL as known to those skilled in the art are used to represent physical connections between structural entities such as gates. Since HDL is mainly used to model digital circuits, wire variables are necessary variables. Usually, “q” (eg, q1, q2, q3) represents an output wireline and “d” (eg, d1, d2, d3) represents an input wireline.
[0727]
Reference numeral 908 indicates “sigin”, which is a test bench input. Reference numeral 909 indicates “sigout”, which is a test bench output.
[0728]
Reference numeral 901 indicates register components S1, S2, and S3. Reference numeral 902 indicates combinational components S4, S5, S6, and S7. The combinational components S4 to S7 have output variables d1, d2, and d3 that are inputs to the register components S1 to S3. Reference numeral 903 indicates clock component S8.
[0729]
The next series of reference numerals indicates the test bench components. Reference numeral 904 indicates test bench component (driver) S9. Reference numeral 905 indicates test bench components (initialization) S10 and S11. Reference numeral 906 indicates test bench component (monitor) S12.
[0730]
The component type analysis is summarized in the following table.
[0731]
[Table 15]

[0732]
Based on the component type analysis, the system generates a software model for the entire circuit and a hardware model for the register and combinational components. S1 to S3 are register components, and S4 to S7 are combinational components. These components are modeled in hardware, allowing the user of the S emulation system either to simulate the entire circuit in software or to simulate in software and selectively accelerate in hardware. In either case, the user controls the simulation and hardware acceleration modes. In addition, the user can emulate the circuit with the target system while still retaining software control to start, stop, inspect values, and assert input values at every cycle.
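The outcome of the component type analysis can be captured in a small sketch. The component groupings are taken from the text; the dictionary layout and the helper names `type_of` and `modeled_in_hardware` are illustrative assumptions and do not represent the compiler's internal data structures.

```python
# Component groupings as described in the text.
COMPONENT_TYPES = {
    "register": ["S1", "S2", "S3"],
    "combinational": ["S4", "S5", "S6", "S7"],
    "clock": ["S8"],
    "test bench (driver)": ["S9"],
    "test bench (initialization)": ["S10", "S11"],
    "test bench (monitor)": ["S12"],
}


def type_of(component):
    """Look up the component type of a statement label such as "S5"."""
    for kind, members in COMPONENT_TYPES.items():
        if component in members:
            return kind
    raise KeyError(component)


def modeled_in_hardware(component):
    """Register and combinational components also get a hardware model."""
    return type_of(component) in ("register", "combinational")


all_components = [c for members in COMPONENT_TYPES.values() for c in members]
hardware_components = [c for c in all_components if modeled_in_hardware(c)]
```

Every component appears in the software model, while `hardware_components` holds only S1 through S7, which is the split the text describes.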
[0733]
FIG. 29 shows a signal network analysis of the same structured RTL level HDL code. As shown, S8, S9, S10, and S11 are modeled or provided in software. S9 is essentially the test bench process that generates the sigin signal, and S12 is essentially the test bench monitor process that receives the sigout signal. In this example, S9 generates random sigin values to simulate the circuit. Registers S1-S3 and combinational components S4-S7, however, are modeled in both hardware and software.
[0734]
For the software/hardware boundary, the system allocates memory space for the various residence signals (i.e., q1, q2, q3, CLK, sigin, sigout) that are used to interface the software model with the hardware model.
[0735]
[Table 16]

[0736]
FIG. 30 shows the result of the software/hardware partition for this circuit design example. FIG. 30 is a more realistic illustration of the software/hardware partition. Software side 910 is coupled to hardware side 912 via software/hardware boundary 911 and PCI bus 913.
[0737]
Software side 910 contains and is controlled by a software kernel. In general, the kernel is the main control loop that controls the operation of the S emulation system. As long as any test bench process is active, the kernel evaluates the test bench components, evaluates the clock components, detects clock edges, updates registers and memory, propagates combinational logic data, and advances simulation time. The kernel resides on the software side, but some of its operations or statements are executed in hardware, because a hardware model exists for these statements and operations. Thus, the software controls both the software and hardware models.
[0738]
The software side 910 includes the entire model of the user's circuit, including S1-S12. The software/hardware boundary on the software side includes I/O buffers or address spaces S2H, CLK, H2S, and REG. The driver test bench process S9 is coupled to the S2H address space, the monitor test bench process S12 is coupled to the H2S address space, and the clock generator S8 is coupled to the CLK address space. The output signals q1-q3 of registers S1-S3 are assigned to the REG space.
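The assignment of boundary signals to address spaces can be sketched as follows. The space names and their signals follow the text; the offset numbering within each space and the function name `allocate` are hypothetical and chosen only to make the mapping concrete.

```python
# Address space -> signals assigned to it, following the text.
ADDRESS_SPACES = {
    "S2H": ["sigin"],            # driven by test bench driver S9
    "H2S": ["sigout"],           # read by test bench monitor S12
    "CLK": ["CLK"],              # driven by clock generator S8
    "REG": ["q1", "q2", "q3"],   # outputs of registers S1-S3
}


def allocate(spaces):
    """Assign each signal a (space, offset) slot; offsets are hypothetical."""
    return {
        signal: (space, offset)
        for space, signals in spaces.items()
        for offset, signal in enumerate(signals)
    }


slots = allocate(ADDRESS_SPACES)
```

The resulting six slots correspond to the six residence signals q1, q2, q3, CLK, sigin, and sigout for which the system allocates memory at the boundary.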
[0739]
The hardware model 912 contains a model of combinational components S4 to S7 and resides on the pure hardware side. On the software/hardware boundary of hardware model 912, sigout, sigin, register outputs q1-q3, and software clock 916 are implemented.
[0740]
In addition to the model of the user's custom circuit design, the system generates a software clock and address pointers. The software clock provides the signal that enables the inputs to registers S1-S3. As noted above, the software clock of the present invention eliminates race conditions and hold-time violation problems. When a clock edge is detected on the primary clock in software, the detection logic in software triggers the corresponding detection logic in hardware. The clock edge register 916 then generates an enable signal to the register enable inputs to gate in any data residing at the inputs of the registers.
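Why a single enable pulse avoids hold-time problems can be illustrated with a small sketch. The three-register shift chain used here is a hypothetical circuit chosen for simplicity, not the design of the example HDL code, and all function names are assumptions.

```python
def shift_chain_inputs(regs, sigin):
    """Hypothetical combinational logic: d[0] = sigin, d[i] = q[i-1]."""
    return [sigin] + regs[:-1]


def clocked_update(regs, sigin):
    """Two-phase update in the spirit of the software clock.

    Phase 1 lets the combinational logic settle on stable register
    outputs. Phase 2 models one enable pulse that latches every
    register input at the same time.
    """
    d_inputs = shift_chain_inputs(regs, sigin)  # phase 1: settle
    return list(d_inputs)                       # phase 2: latch all


def racy_update(regs, sigin):
    """Order-dependent update shown for contrast.

    Each register is overwritten before the next one samples it, so new
    data races through the chain, much like a hold-time violation.
    """
    regs = list(regs)
    regs[0] = sigin
    for i in range(1, len(regs)):
        regs[i] = regs[i - 1]
    return regs


good = clocked_update([1, 0, 0], sigin=0)
bad = racy_update([1, 0, 0], sigin=0)
```

With the two-phase update the stored 1 moves exactly one stage, whereas the order-dependent update loses it, which is the kind of error the enable-based scheme is meant to rule out.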
[0741]
Address pointer 914 is also shown for purposes of example and concept. An address pointer is actually implemented in each FPGA chip and allows data to be transferred selectively and sequentially to its destination.
[0742]
Combinational components S4-S7 are also coupled to register components S1-S3, sigin, and sigout. These signals travel on I/O bus 915 to and from PCI bus 913.
[0743]
The complete hardware model before the mapping, placement, and routing steps is shown in FIG. 31 (excluding the address pointers). The system has not yet mapped the model to a specific chip. Registers S1-S3 are provided, coupled to the I/O bus and to combinational components S4-S6. The sigin, sigout, and software clock 920 are also modeled.
[0744]
Once the hardware model is determined, the system maps, places, and routes the model into one or more chips. Although this particular example could actually be implemented on a single Altera FLEX 10K chip, for purposes of illustration this example assumes that two chips are needed to implement the hardware model. FIG. 32 shows the result of one specific hardware model-to-chip partition for this example.
[0745]
The complete model shown in FIG. 32 (except for the I/O and clock edge register) includes the chip boundary, represented by a dotted line. This result is produced by the compiler of the S emulation system before the final configuration file is generated. As shown, the hardware model requires at least three wires between the two chips, for wirelines 921, 922, and 923. To reduce the number of pins/wires between these two chips (chip 1 and chip 2), either another model-to-chip partition should be generated or a multiplexing scheme should be used.
[0746]
Analyzing the particular partition result shown in FIG. 32, the number of wires between these two chips can be reduced to two by moving the sigin wireline 923 from chip 2 to chip 1. Indeed, this partition is illustrated in FIG. Although the partition of FIG. 33 appears to be better than the partition of FIG. 32 based solely on the number of wires, this example assumes that the mapping, placement, and routing operations were performed before the S emulation system selected its partition. The partition result of FIG. 32 is used as the basis for generating the configuration file.
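Counting the wires that cross the chip boundary for a candidate partition can be sketched as follows. The netlist below is hypothetical and is not the circuit of FIG. 32; it is constructed only so that moving one element reduces the crossing count from three to two, as in the example. The function name `wires_between` is likewise an assumption.

```python
def wires_between(nets, chip_of):
    """Count nets whose endpoints sit on more than one chip."""
    return sum(
        1
        for endpoints in nets.values()
        if len({chip_of[e] for e in endpoints}) > 1
    )


# Hypothetical netlist: net name -> elements it connects.
NETS = {
    "n1": ["A", "C"],
    "n2": ["B", "D"],
    "n3": ["X", "A"],
}

# Partition 1: X sits on chip 2, so all three nets cross the boundary.
partition_1 = {"A": 1, "B": 1, "C": 2, "D": 2, "X": 2}
# Partition 2: moving X to chip 1 keeps net n3 inside one chip.
partition_2 = dict(partition_1, X=1)

crossings_1 = wires_between(NETS, partition_1)
crossings_2 = wires_between(NETS, partition_2)
```

A compiler could use a count like this as one cost term among several; as the text notes, wire count alone need not decide which partition is chosen.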
[0747]
FIG. 34 shows the logic patching operation for the same hypothetical example. It shows the final implementation on the two chips. The system generated a configuration file using the partition result of FIG. For simplicity, however, no address pointer is shown. Two FPGA chips 930 and 940 are shown. Chip 930 includes, among other things, a partitioned portion of the user's circuit design, a TDM unit 931 (for the receiver side), a software clock 932, and an I/O bus 933. Chip 940 includes, among other things, a partitioned portion of the user's circuit design, a TDM unit 941 (for the transmitter side), a software clock 942, and an I/O bus 943. TDM units 931 and 941 were described with reference to FIGS. 9A, 9B, and 9C.
[0748]
These chips 930 and 940 have two interconnect wires 944 and 945 that couple the hardware model together. These two interconnect wires are part of the interconnect shown in FIG. Referring to FIG. 8, one such interconnect is interconnect 611 located between chips F32 and F33. In one embodiment, the maximum number of wires/pins for each interconnect is 44. In FIG. 34, the modeled circuit requires only two wires/pins between chips 930 and 940.
[0749]
These chips 930 and 940 are coupled to bank bus 950. Since only two chips are implemented, both chips may be in the same bank, or each chip may reside in a different bank. If necessary, one chip is coupled to one bank bus and the other chip is coupled to another bank bus to ensure that the throughput at the FPGA interface matches the throughput at the PCI interface.
[0750]
The foregoing descriptions of preferred embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The applications described herein may be replaced with other applications without departing from the spirit and scope of the present invention. Accordingly, the invention should be limited only by the scope of the claims.
[Brief description of the drawings]
FIG. 1 shows a high-level overview of one embodiment of the present invention, including a target system coupled to a workstation, a reconfigurable hardware emulation model, an emulation interface, and a PCI bus.
FIG. 2 shows a specific usage flow chart of the present invention.
FIG. 3 shows a high-level schematic diagram of software compilation and hardware configuration for compile time and run time according to one embodiment of the present invention.
FIG. 4 shows a flowchart of a compilation process that includes generating software / hardware model and software kernel code.
FIG. 5 shows a software kernel that controls the entire S emulation system.
FIG. 6 illustrates a method for mapping a hardware model to a reconfigurable hardware board by mapping, placement and routing.
FIG. 7 shows a connectivity matrix for the FPGA array shown in FIG.
FIG. 8 illustrates one embodiment of a 4 × 4 FPGA array and interconnect.
FIG. 9A illustrates one embodiment of a time division multiplexing (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed manner so that one pin, rather than multiple pins, can be used for the group on one chip. FIG. 9A shows an overview of the pin-out problem.
FIG. 9B illustrates one embodiment of a time division multiplexing (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed manner so that one pin, rather than multiple pins, can be used for the group on one chip. FIG. 9B shows a TDM circuit for the transmitting side.
FIG. 9C illustrates one embodiment of a time division multiplexing (TDM) circuit, which allows a group of wires to be coupled together in a time-multiplexed manner so that one pin, rather than multiple pins, can be used for the group on one chip. FIG. 9C shows a TDM circuit for the receiving side.
FIG. 10 illustrates an S emulation system architecture according to one embodiment of the present invention.
FIG. 11 shows an embodiment of the address pointer of the present invention.
FIG. 12 shows a state transition diagram of address pointer initialization for the address pointer of FIG. 11.
FIG. 13 illustrates one embodiment of a MOVE signal generator that derivatively generates various MOVE signals for the address pointer.
FIG. 14 shows a chain of multiplexed address pointers of each FPGA chip.
FIG. 15 illustrates one embodiment of a cross-chip address chain multiplexed according to one embodiment of the present invention.
FIG. 16 shows a flowchart of clock / data network analysis important for software clock implementation and evaluation of the logical components of the hardware model.
FIG. 17 shows the basic building blocks of a hardware model according to one embodiment of the invention.
FIG. 18A shows a register model implementation with latches and flip-flops.
FIG. 18B shows a register model implementation with latches and flip-flops.
FIG. 19 illustrates one embodiment of clock edge detection logic in accordance with one embodiment of the present invention.
FIG. 20 illustrates a four-state finite state machine that controls the clock edge detection logic of FIG. 19 in accordance with one embodiment of the present invention.
FIG. 21 shows interconnections (JTAG, FPGA bus and global signal designator for each FPGA chip) according to one embodiment of the invention.
FIG. 22A illustrates one embodiment of an FPGA controller between a PCI bus and an FPGA array.
FIG. 22B shows one embodiment of an FPGA controller between the PCI bus and the FPGA array.
FIG. 23A shows a more detailed illustration of the CTRL_FPGA unit and data buffer described in FIG.
FIG. 23B shows a more detailed illustration of the CTRL_FPGA unit and data buffer described in FIG.
FIG. 24 shows the relationship between 4 × 4 FPGA arrays, FPGA banks, and expansion functions.
FIG. 25 illustrates one embodiment of a hardware start method.
FIG. 26 shows HDL code for an example of a user circuit design that is modeled and simulated.
FIG. 27 is a circuit diagram symbolically showing the circuit design of the HDL code of FIG.
FIG. 28 shows component type analysis of the HDL code of FIG.
FIG. 29 shows a signal network analysis of a structured RTL HDL code based on the user's custom circuit design shown in FIG.
FIG. 30 shows the software/hardware partition result for the same hypothetical example.
FIG. 31 shows the hardware model for the same hypothetical example.
FIG. 32 shows one particular hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
FIG. 33 shows another hardware model-to-chip partition result for the same hypothetical example of a user's custom circuit design.
FIG. 34 shows the logic patching operation for the same hypothetical example of a user's custom circuit design.
FIGS. 35 (A)-(D) illustrate the “hop” principle and FPGA board connection scheme by two examples.
FIG. 36 shows an outline of an FPGA chip used in the present invention.
FIG. 37 shows FPGA interconnections of the FPGA chip.
FIG. 38A shows a side view of an FPGA cord connection conceptual diagram according to one embodiment of the present invention.
FIG. 38B shows a side view of an FPGA cord connection conceptual diagram according to one embodiment of the present invention.
FIG. 39 illustrates a directly adjacent 1-hop 6-board interconnect layout of an FPGA array according to one embodiment of the present invention.
FIG. 40A shows an FPGA internal board interconnection scheme.
FIG. 40B shows an FPGA internal board interconnection scheme.
FIG. 41A shows a top view of a board interconnect connector.
FIG. 41B shows the top surface of the board interconnect connector.
FIG. 41C shows the top surface of the board interconnect connector.
FIG. 41D shows a top view of the board interconnect connector.
FIG. 41E shows a top view of the board interconnect connector.
FIG. 41F shows a top view of the board interconnect connector.
FIG. 42 shows a typical FPGA board on-board connector and some components.
FIG. 43 shows a legend for the connectors of FIGS. 41A-F and FIG. 42.
FIG. 44 illustrates a directly adjacent 1-hop two board interconnect layout of an FPGA array according to another embodiment of the present invention.
FIG. 45 shows a workstation with a multiprocessor according to another embodiment of the invention.
FIG. 46 illustrates an environment, according to another embodiment of the invention, in which multiple users share a single simulation/emulation system on a time-shared basis.
FIG. 47 shows a high-level structure of a simulation server according to an embodiment of the present invention.
FIG. 48 shows the architecture of a simulation server according to an embodiment of the present invention.
FIG. 49 shows a flowchart of the simulation server.
FIG. 50 shows a flowchart of a job swapping process.
FIG. 51 shows signals between a device driver and a reconfigurable hardware unit.
FIG. 52 shows a time-sharing function of a simulation server that handles a plurality of jobs having different levels of priority.
FIG. 53 shows a communication handshake signal between a device driver and a reconfigurable hardware unit.
FIG. 54 shows a state diagram of a communication handshake protocol.
FIG. 55 shows an overview of a client-server model of a simulation server according to an embodiment of the present invention.
FIG. 56 shows a high-level block diagram of a simulation system that implements memory mapping according to one embodiment of the invention.
FIG. 57 shows a more detailed block diagram of the memory mapping aspect of a simulation system, with a memory finite state machine (MEMFSM) support component and an evaluation finite state machine for each FPGA logic device (EVALFSMx).
FIG. 58 shows a state diagram of a finite state machine of a MEMFSM unit of a CTRL_FPGA unit according to one embodiment of the invention.
FIG. 59 shows a state diagram of a finite state machine for each FPGA chip according to one embodiment of the invention.
FIG. 60 shows a memory read data double buffer.
FIG. 61 illustrates a simulation write / read cycle according to one embodiment of the invention.
FIG. 62 shows a timing diagram of a simulation data transfer operation when a DMA read operation occurs after the CLK_EN signal.
FIG. 63 shows a timing diagram of a simulation data transfer operation when a DMA read operation occurs near the end of the EVAL period.
FIG. 64 shows an exemplary user design implemented as a PCI add-on card.
FIG. 65 shows an exemplary hardware / software co-verification system that uses an ASIC as the device under test.
FIG. 66 shows an exemplary co-verification system that uses an emulator, in which the device under test is programmed into the emulator.
FIG. 67 shows a simulation system according to an embodiment of the present invention.
FIG. 68 illustrates a co-verification system, according to one embodiment of the invention, that uses no external I/O devices; instead, the RCC computing system includes software models of the various I/O devices and the target system.
FIG. 69 shows a co-verification system with an actual external I / O device and target system according to another embodiment of the present invention.
FIG. 70 shows a more detailed logic diagram of the data-in portion of the control logic according to one embodiment of the present invention.
FIG. 71 shows a more detailed logic diagram of the data out part of the control logic according to one embodiment of the present invention.
FIG. 72 shows a timing diagram of the data-in part of the control logic.
FIG. 73 shows a timing diagram of the data out part of the control logic.
FIG. 74 shows a board layout for an RCC hardware array according to one embodiment of the invention.
FIG. 75A shows an exemplary shift register circuit used to explain the hold time and clock glitch problems.
FIG. 75B shows a timing diagram of the shift register circuit shown in FIG. 75A to illustrate a hold time violation.
FIG. 76A shows the same shift register circuit of FIG. 75A placed across a plurality of FPGA chips.
FIG. 76B shows a timing diagram of the shift register circuit shown in FIG. 76A to illustrate the hold time violation.
FIG. 77A shows an exemplary logic circuit used to illustrate the clock glitch problem.
FIG. 77B shows a timing diagram of the logic circuit of FIG. 77A to illustrate the clock glitch problem.
FIG. 78 illustrates a prior art timing adjustment technique for solving the hold time violation problem.
FIG. 79 shows a prior art timing resynthesis technique for solving the hold time violation problem.
FIG. 80A shows an original latch, according to one embodiment of the present invention.
FIG. 80B shows a timing independent and glitchless latch, according to one embodiment of the invention.
FIG. 81A shows an originally designed flip-flop, according to one embodiment of the present invention.
FIG. 81B illustrates a timing-independent and glitch-free design type flip-flop, according to one embodiment of the present invention.
FIG. 82 shows a timing diagram of a trigger mechanism for a timing-independent and glitch-free design type flip-flop according to one embodiment of the present invention.
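
The hold time (retention time) violation that FIGS. 75A-76B illustrate for a shift register can be sketched numerically. The following is a minimal illustration and is not taken from the drawings: the function name `shift_once` and the parameters `data_delay` and `clock_skew` are hypothetical, and the model assumes an ideal two-stage shift register in which the second flip-flop captures the wrong bit whenever its clock edge arrives later than the first flip-flop's new output.

```python
def shift_once(q1, d_in, data_delay, clock_skew):
    """Return (new_q1, new_q2) after one clock edge of a two-stage shift register.

    q1         -- value held by the first flip-flop before the edge
    d_in       -- value at the input of the first flip-flop
    data_delay -- assumed time for the first flip-flop's new output to reach
                  the second flip-flop (hypothetical units)
    clock_skew -- assumed extra delay of the clock at the second flip-flop
    """
    new_q1 = d_in
    # If the second flip-flop is clocked after the new data has already
    # arrived, it captures the new bit instead of the old one. That is the
    # hold time violation: the bit races through both stages in one cycle.
    hold_violation = clock_skew > data_delay
    new_q2 = new_q1 if hold_violation else q1
    return new_q1, new_q2


# Small skew: the second stage correctly captures the old first-stage value.
print(shift_once(q1=0, d_in=1, data_delay=2, clock_skew=1))  # (1, 0)
# Large skew, e.g. a clock routed across several chips: the bit races through.
print(shift_once(q1=0, d_in=1, data_delay=2, clock_skew=5))  # (1, 1)
```

The delay values are arbitrary; only the comparison between the clock skew and the data delay matters in this simplified model.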

Claims

1. A logic device comprising:
a first logic circuit having a first input for receiving first data, a second input for receiving second data, a first output, and a control input for receiving a control signal, the first logic circuit selecting, in response to the control signal received at the control input, one of the first data received at the first input and the second data received at the second input, and presenting the selected data at the first output; and
a second logic circuit for storing a value, the second logic circuit having a first trigger input for receiving a trigger signal, a second logic input coupled to the first output, and a second logic output coupled to the second input of the first logic circuit,
wherein the second logic circuit presents the value as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein, regardless of the order in which the control signal arrives at the control input and the first data arrives at the first input of the first logic circuit, (1) the second logic circuit is updated to the first data by replacing the value via the first output of the first logic circuit and the second logic input of the second logic circuit, and (2) the second logic circuit, when the trigger signal is received at the trigger input, presents the first data as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein the logic device further comprises:
a third logic circuit having a fourth input for receiving new data, a second trigger input for receiving the trigger signal, and a third output coupled to the first input of the first logic circuit; and
an edge detector having a clock input for receiving a clock signal, a third trigger input for receiving the trigger signal, and a fourth output coupled to the control input of the first logic circuit;
wherein, when the trigger signal is applied to the second trigger input at a selected time, the third logic circuit presents the new data received at the fourth input to the first input of the first logic circuit as the first data via the third output of the third logic circuit; and
wherein, when the trigger signal is applied to the third trigger input at the selected time, the edge detector, in response to the clock signal being received at the clock input of the edge detector, presents an output signal as the control signal to the control input of the first logic circuit via the fourth output of the edge detector.

2. A debugging system for verifying proper operation of a user-designed circuit, the user-designed circuit including a plurality of logic devices and a plurality of circuit paths, the debugging system comprising:
a computing system for generating a software model of the user-designed circuit;
a reconfigurable hardware system for generating a hardware model of at least a portion of the user-designed circuit, the hardware model including a plurality of emulation logic devices that replace a plurality of logic circuits in the user-designed circuit; and
control means coupled to the computing system and the reconfigurable hardware system, the control means functioning as a slave to the computing system and controlling the operation of the hardware model in the reconfigurable hardware system,
wherein each of the plurality of emulation logic devices comprises:
a first logic circuit having a first input for receiving first data, a second input for receiving second data, a first output, and a control input for receiving a control signal, the first logic circuit selecting, in response to the control signal received at the control input, one of the first data received at the first input and the second data received at the second input, and presenting the selected data at the first output; and
a second logic circuit for storing a value, the second logic circuit having a first trigger input for receiving a trigger signal, a second logic input coupled to the first output, and a second logic output coupled to the second input of the first logic circuit,
wherein the second logic circuit presents the value as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein, regardless of the order in which the control signal arrives at the control input and the first data arrives at the first input of the first logic circuit, (1) the second logic circuit is updated to the first data by replacing the value via the first output of the first logic circuit and the second logic input of the second logic circuit, and (2) the second logic circuit, when the trigger signal is received at the trigger input, presents the first data as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein each of the plurality of emulation logic devices further comprises:
a third logic circuit having a fourth input for receiving new data, a second trigger input for receiving the trigger signal, and a third output coupled to the first input of the first logic circuit; and
an edge detector having a clock input for receiving a clock signal, a third trigger input for receiving the trigger signal, and a fourth output coupled to the control input of the first logic circuit;
wherein, when the trigger signal is applied to the second trigger input at a selected time, the third logic circuit presents the new data received at the fourth input to the first input of the first logic circuit as the first data via the third output of the third logic circuit; and
wherein, when the trigger signal is applied to the third trigger input at the selected time, the edge detector, in response to the clock signal being received at the clock input of the edge detector, presents an output signal as the control signal to the control input of the first logic circuit via the fourth output of the edge detector.

3. A logic device comprising:
a first logic circuit having a first input for receiving first data, a second input for receiving second data, a first output, and a control input for receiving a control signal, the first logic circuit selecting, in response to the control signal received at the control input, one of the first data received at the first input and the second data received at the second input, and presenting the selected data at the first output; and
a second logic circuit for storing a value, the second logic circuit having a first trigger input for receiving a trigger signal, a second logic input coupled to the first output, and a second logic output coupled to the second input of the first logic circuit,
wherein the second logic circuit presents the value as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein the second logic circuit is updated to the first data by replacing the value via the first output of the first logic circuit and the second logic input of the second logic circuit, and the second logic circuit, when the trigger signal is received at the trigger input, presents the first data as the second data to the second input of the first logic circuit via the second logic output of the second logic circuit;
wherein the trigger signal is received when the first data has reached a steady state;
wherein the logic device further comprises:
a third logic circuit having a fourth input for receiving new data, a second trigger input for receiving the trigger signal, and a third output coupled to the first input of the first logic circuit; and
an edge detector having a clock input for receiving a clock signal, a third trigger input for receiving the trigger signal, and a fourth output coupled to the control input of the first logic circuit;
wherein, when the trigger signal is applied to the second trigger input at a selected time, the third logic circuit presents the new data received at the fourth input to the first input of the first logic circuit as the first data via the third output of the third logic circuit; and
wherein, when the trigger signal is applied to the third trigger input at the selected time, the edge detector, in response to the clock signal being received at the clock input of the edge detector, presents an output signal as the control signal to the control input of the first logic circuit via the fourth output of the edge detector.

4. The logic device of claim 3, wherein the first logic circuit includes a multiplexer and the second logic circuit includes a D-type flip-flop.
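
The behavior recited in the claims above can be sketched as a small behavioral model. This is an illustrative sketch only, not the claimed circuit itself: the class name `TigfFlipFlop`, its method names, and the `run_cycle` helper are hypothetical, and the model assumes that data and clock events may arrive in any order within an evaluation period and that a single trigger closes the period. The multiplexer stands in for the first logic circuit, `q` for the value stored by the second logic circuit, `first_data` for the output of the third logic circuit, and `_edge` for the edge detector.

```python
class TigfFlipFlop:
    """Behavioral sketch of the claimed logic device (all names are illustrative)."""

    def __init__(self, initial=0):
        self.first_data = initial  # output of the third logic circuit (held new data)
        self.q = initial           # value stored by the second logic circuit
        self._d = initial          # raw data currently on the fourth input
        self._clk = 0              # last clock level seen by the edge detector
        self._edge = False         # rising clock edge seen since the last trigger

    def set_d(self, value):
        """New data arrives at the fourth input; nothing is stored yet."""
        self._d = value

    def set_clk(self, level):
        """Clock arrives at the edge detector; only a 0 -> 1 change is recorded."""
        if self._clk == 0 and level == 1:
            self._edge = True
        self._clk = level

    def trigger(self):
        """Apply the trigger signal once the inputs have settled."""
        control = self._edge
        # Multiplexer: pick the held new data if a clock edge was detected,
        # otherwise recirculate the currently stored value.
        selected = self.first_data if control else self.q
        self.q = selected            # second logic circuit is updated
        self.first_data = self._d    # third logic circuit captures the new data
        self._edge = False           # edge detector is re-armed
        return self.q


def run_cycle(order):
    """Run two evaluation periods; `order` lists the events of the second one."""
    ff = TigfFlipFlop(initial=0)
    ff.set_d(1)
    ff.trigger()                     # period 1: no clock edge, so q stays 0
    for event in order:
        if event == "clk":
            ff.set_clk(1)
        else:
            ff.set_d(0)
    return ff.trigger()              # period 2: the clock edge is consumed here


# The stored value is the same whichever of clock and data arrives first.
print(run_cycle(["clk", "data"]), run_cycle(["data", "clk"]))  # 1 1
```

In this sketch the stored value changes only at the trigger, so a data change that arrives in the same evaluation period as the clock edge cannot be captured early, whatever the arrival order.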