JP2008140405A

JP2008140405A - Co-validation method between electronic circuit and control program

Info

Publication number: JP2008140405A
Application number: JP2007336117A
Authority: JP
Inventors: Leslie J. French (レスリー・ジェイ・フレンチ)
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2000-10-17
Filing date: 2007-12-27
Publication date: 2008-06-19
Also published as: JP2002175344A; JP2005135436A; JP4422011B2

Abstract

PROBLEM TO BE SOLVED: To use a lightweight, message-based operating system as the basis for hardware/software co-simulation.

SOLUTION: Electronic circuits are mapped to software processes and, as hardware processes, communicate state changes through message-passing primitives. A signal maintainer 12 manages the states of all global signals. The clock signals used by the hardware processes and the instruction set simulator are generated by a clock generator 13, and the queues of processes waiting for events are managed by a queue maintainer 14. A scheduler 16 executes processes at every nanosecond tick, and a central control process 11 controls the execution of the processes.

COPYRIGHT: (C)2008, JPO&INPIT

Description

The present invention relates to a method and apparatus for hardware and software co-design and co-validation using a lightweight, message-based operating system. In particular, an electrical circuit design is mapped to software processes, and state changes between the mapped circuit elements are passed using the message-passing primitives of the lightweight operating system. An instruction set simulator running under the lightweight operating system allows the control software elements and the mapped circuit elements to be co-simulated so that their interaction can be evaluated. The invention is realized as a co-validation method that provides a software and hardware co-simulation environment, as a computer system that implements the co-simulation method, as a computer program product containing software instructions that implement the co-simulation method, and as a software environment that supports both software simulation of hardware elements and instruction set simulation. The invention is also realized as a method for developing an instruction set simulator for a target central processing unit (CPU).

The following references all provide useful background information on the indicated subject matter as it relates to the present invention.

PTOLEMY is a project that studies heterogeneous modeling, simulation and design of concurrent systems. Its current implementation is written in the JAVA (registered trademark) programming language. The PTOLEMY website at http://ptolemy.eecs.berkeley.edu contains more detailed information about PTOLEMY.

POLIS is a software program for hardware/software co-design of embedded systems. POLIS uses the framework developed for PTOLEMY. The POLIS website at http://www-cad.berkeley.edu/~polis/ contains more detailed information about POLIS.

Several topics that provide a suitable basis for understanding the present invention are described next.

In general, a lightweight operating system allows applications to run faster by removing many of the barriers between the hardware and application-level code. One direct consequence is that few, if any, protection mechanisms remain in the system. Such an environment is therefore suitable for embedded or dedicated applications, but not for applications that are poorly designed or that are intended to produce malicious results. Historically, this has restricted the use of microkernel systems to soft or hard real-time embedded code.

At present, the demand for good hardware-software co-simulation tools is driven by three main factors:

1. The growing size and complexity of computer systems (both hardware and software);
2. The demand for cost-effective SOC (system-on-a-chip) implementations;
3. The reuse of IP (intellectual property) to maximize the return on earlier investment.

It is now generally recognized that an effective solution to the above factors requires both hardware and software components. This has opened up the design space to "co-design" requirements. When many tasks can be performed either by a general-purpose central processing unit or by dedicated (or programmable) hardware, how is the most effective partitioning to be made? The design steps operate in the following feedback loop.

1. Make an initial partitioning into tasks and code the task algorithms;
2. Assign the tasks to hardware or software components and generate the appropriate executable code according to these assignments;
3. Perform a high-level simulation to confirm basic functionality and operating constraints;
4. Correct errors and repartition where possible so that the design criteria are met;
5. Perform full synthesis of the hardware components and optimization of the software;
6. Perform a low-level (timing-accurate) simulation of all components;
7. Iterate toward an acceptable or optimal solution.

A representative approach to this procedure, as adopted to date, is exemplified by the PTOLEMY/POLIS environment. PTOLEMY/POLIS claims to be a true co-design environment because it allows a task to be synthesized into either a software or a hardware implementation. The PTOLEMY/POLIS environment provides a framework for integrated simulation tools that allow a given partitioning to be examined, together with a final code generation phase that produces the synthesized result.

However, the architecture of the PTOLEMY/POLIS environment reflects the biases of its authors and its primary users (that is, hardware engineers). The environment offers only minimal support for the software components of the system design being co-simulated. This is evident from the requirement that all task specifications, including those destined for software implementation, be written in ESTEREL. ESTEREL is a reasonable language as a front end to state-machine-style VHDL, but it is a poor language from which to synthesize C for software.

From a programmer's point of view, even "good" ESTEREL produces "bad" C. In fact, ESTEREL generates code whose programming style (simple if-then tests and goto branches) is very close to FORTRAN. This places a heavy burden on the C compiler merely to generate reasonable machine code, and limits its ability to perform more global optimization. The requirement also forces software engineers to discard existing IP and to recode useful, tested (and trusted) algorithms. Because the high-level features of modern languages are missing, to say nothing of support for object-oriented techniques, the software partition becomes the underclass of this hierarchy.

Because no language has the expressive power of both a modern programming language (for example, C++) and a high-level hardware design language, there is considerable resistance even to attempting a unified co-design environment.

Other realizations of the PTOLEMY framework are possible and may help to resolve the problems above. Unless such a system can make use of existing code, including commercial IP that may not be available in source form, it may restrict the available design space and exclude the optimal result. If (for reasons of cost or time-to-market) the main issue is the reuse of existing "legacy" IP that has already been partitioned into hardware and software, the design space changes. It is easy to imagine components that exist in several forms, for example as C code and as VHDL. In that case, the questions become the following.

1. Is there an existing set of IP that meets the design constraints when used together?
2. If so, how must these processes be "glued" together so that control signals and data flow correctly?
3. If not, where are the weak points, and what approach is needed to resolve them?

These questions place the emphasis on the interfaces between existing designs and on the correct overall operation of the complete system, and therefore on co-simulation and co-verification.

If dynamic repartitioning is set aside for the moment, programmers are free to express their ideas in whichever language (or languages) best suits their needs. Additional "glue" can be provided as hardware entities (for example, address decoding logic) or as software modules (for example, drivers for hardware chips outside the CPU). What is needed at this stage is a rapid feedback system for evaluating the design.

Here again, commercial co-simulation tools do not deliver everything they promise. In some cases the "tool" is merely an integration framework into which commercial simulators can be plugged. This saves a significant amount of work, and if it provides a "plug and play" framework, the tool best suited to each requirement can be connected. However, if there is no direct communication between the individual components, and therefore no direct feedback, the approach may eventually break down as the design space grows. Such implementations are currently characterized by an iterative "successive approximation" method in which traces are fed from one tool to another while waiting for convergence.

One alternative approach is to develop an architectural framework and co-simulation environment that already contains the required feedback paths, and to populate it with re-targetable tools that may be useful across a whole range of problems.

When the co-simulation problem is examined from a software viewpoint rather than a hardware engineering viewpoint, and in particular when comparable applications of SOC integration are considered, several similarities emerge:

1a. Lightweight software systems tend to be written as a set of processes that communicate with one another through a small (sometimes real-time) scheduler;
1b. VHDL code tends to be written as a set of entities that communicate with one another through a set of wires.

2a. Software processes tend to be event driven. That is, a software process waits for some other process or an external signal, performs some processing, and then waits for the next event;
2b. Hardware entities tend to be event driven. That is, a hardware entity waits for some condition to appear on its inputs, performs some processing, and then waits for the next event.

3a. Events between software processes usually proceed by the sender preparing a "packet" (data area) of information and sending a "signal" to another process through the scheduler, which acts as the data synchronization point;
3b. Events between hardware processes usually proceed by the sender placing values on a set of "data" wires (for example, a bus) and sending a signal (for example, a chip-select signal) to another chip through glue logic (for example, a latch), which acts as the data synchronization point.

4a. In a single-CPU system there is no true software parallelism; apparent parallelism is produced by the operating system switching between tasks;
4b. In all but the most trivial hardware there is true parallelism, achieved by independent hardware units receiving a distributed clock and each performing its own task during each clock cycle.

5a. In a well-designed software system, individual processes operate independently, except through predefined inter-process communication paths (which may also be allocated dynamically);
5b. In a well-designed hardware system, individual instances of hardware processes operate in isolation, except through well-defined communication paths made of signals and wires (which may also be determined by settings in the "glue logic").
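The correspondence in items 2 and 3 can be illustrated with a small sketch in which a hardware latch is written as an event-driven software process. This is only an illustration under stated assumptions: the `Mailbox` class is a hypothetical stand-in for the message-passing primitive of a lightweight operating system, and the signal names `data`, `chip_select` and `q` are invented for the example.

```python
from collections import deque


class Mailbox:
    """Hypothetical stand-in for an OS message-passing primitive."""

    def __init__(self):
        self._queue = deque()

    def send(self, signal, value):
        self._queue.append((signal, value))

    def receive(self):
        # Returns None when no message is waiting.
        return self._queue.popleft() if self._queue else None


def latch_process(inbox, outbox):
    """A latch as a process: wait for an event, process it, wait again."""
    data = 0
    while True:
        message = inbox.receive()
        if message is None:
            yield  # nothing pending: hand control back to the scheduler
            continue
        signal, value = message
        if signal == "data":
            data = value  # value placed on the "data" wires
        elif signal == "chip_select" and value == 1:
            outbox.send("q", data)  # pass the latched value on


def run_latch_demo():
    inbox, outbox = Mailbox(), Mailbox()
    process = latch_process(inbox, outbox)
    next(process)  # start the process; it immediately waits for an event
    inbox.send("data", 0x5A)
    inbox.send("chip_select", 1)
    next(process)  # resume: both messages are consumed in order
    return outbox.receive()
```

Here the generator's `yield` plays the role of "wait for the next event", and the sender prepares the data before sending the selecting signal, as in items 3a and 3b.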

Tools for simulating VHDL code already exist (for example, VSIM from Model Technologies). Such tools work by emulating the behavior of the individual entities within the overall system and, relying on item 5b above, substitute software-style parallelism (that is, sequential execution) for hardware parallelism. This has to be done by checking the state of each individual process at the synchronization points (for example, delta or clock transitions) and applying the state changes at the end of each simulation cycle. Usually the entire simulation environment runs as a single process on a workstation, and an internal scheduler tracks the state of the VHDL processes inside it.
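The idea of evaluating processes sequentially while deferring state changes to the end of the simulation cycle can be sketched as follows. This is a minimal illustration, not the algorithm of any particular simulator; the function and signal names are assumptions made for the example.

```python
def simulate_cycle(signals, processes):
    """Run one simulation cycle with deferred (end-of-cycle) updates.

    Every process reads the same snapshot of the signals, so the result
    does not depend on the order in which the processes are evaluated.
    """
    snapshot = dict(signals)
    pending = {}
    for process in processes:
        pending.update(process(snapshot))  # collect requested changes
    changed = {k: v for k, v in pending.items() if signals.get(k) != v}
    signals.update(changed)  # commit all changes at the end of the cycle
    return changed


# Two hypothetical cross-coupled processes: each copies the other's signal.
def copy_b_to_a(s):
    return {"a": s["b"]}


def copy_a_to_b(s):
    return {"b": s["a"]}


def run_swap_demo():
    signals = {"a": 0, "b": 1}
    simulate_cycle(signals, [copy_b_to_a, copy_a_to_b])
    return signals
```

Because both processes see the values from before the cycle, the two signals swap, just as two registers clocked together would. Updating the signals immediately inside the loop would instead make the result depend on the evaluation order.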

FIG. 1 illustrates a conventional VHDL simulation. The simulation environment usually runs within a multiprocessing operating system (for example, UNIX (registered trademark)) 1. The components of the simulation are an instruction set simulator 4, a VHDL simulation 5, and an interface 6 for passing signals. The VSIM environment 2 uses an internal scheduler 3 to control the simulation components. As shown in FIG. 1, an interrupt 7 must be passed from the VHDL simulation 5 through the interface 6 to the instruction set simulator 4. Interaction with an external component 8 must be passed through the internal scheduler 3 to the VSIM environment 2, and the VSIM environment 2 must in turn pass the data through the UNIX system 1. Because the whole of the VSIM environment 2 appears to the operating system as a single process running in the UNIX operating system shell 1, the internal scheduler 3 is an essential component of the conventional simulation system. At the same time, the multiprocessing support of the UNIX operating system 1 is required so that this process can interact with the other, external components 8 of the system.

Real systems are often far more complex than a simple "sum of the parts" approach would suggest, and system integration problems can expose design flaws that were overlooked during the individual design phases. This is particularly evident when designing non-deterministic systems (for example, the processing of data received over a best-effort network link). The design space must also take into account the various errors that can occur (packet loss, data corruption, missed real-time deadlines, and so on) and the strategies for handling those errors when they do occur.

Another issue in real-world applications is that the periodicity of the data is often a fraction of a second or longer, rather than microseconds or nanoseconds. As an example, consider a display system for MPEG data. There is a basic frame period of 33 ms (for display at 30 frames per second), but this is not the true period of the data. MPEG clusters frames into GOP (group of pictures) segments, each consisting of a (single) full frame of data and a series of interpolated frames that modify it. A GOP is typically 15 frames, that is, on the order of 0.5 seconds of data. This layering limits the propagation of errors, because the loss of the full-frame data effectively renders the subsequent frames in that GOP meaningless.
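The figures quoted above follow from simple arithmetic, shown here as a short sketch. The 1 ns tick used in `ticks_for` is taken from the scheduler resolution mentioned in the abstract; the function names themselves are illustrative only.

```python
FRAME_RATE_HZ = 30        # frames per second, as in the example above
GOP_FRAMES = 15           # typical GOP length quoted above
NS_PER_SECOND = 1_000_000_000


def frame_period_ms(rate_hz=FRAME_RATE_HZ):
    """Duration of one frame in milliseconds."""
    return 1000 / rate_hz


def gop_duration_s(frames=GOP_FRAMES, rate_hz=FRAME_RATE_HZ):
    """Duration of one GOP in seconds."""
    return frames / rate_hz


def ticks_for(seconds, tick_ns=1):
    """Number of scheduler ticks needed to cover `seconds` of simulated time."""
    return int(seconds * NS_PER_SECOND) // tick_ns
```

With these values, `frame_period_ms()` is about 33.3 ms and `gop_duration_s()` is 0.5 s, so covering a single GOP at a 1 ns tick takes 500 million ticks, which gives a sense of why the length of the simulated run matters.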

Consequently, when designing such a system, a simulation that covers less than one complete GOP of data is likely to miss incorrect error-handling situations. Indeed, it is difficult to conceive of a test strategy that does not require data on the order of seconds and simulated runs of up to a minute. For such a system, any design will involve at least two "preliminary" steps. The first is to confirm that the design copes with the real-time constraints of full-speed MPEG data. The second is to examine how well it handles error conditions over a much longer period.

There is a general trade-off between a more powerful CPU (more costly in terms of, for example, price and energy) and additional custom hardware (for example, an FPGA or ASIC). Taking the MPEG example above, a MIPS processor without hardware assistance cannot display more than 10 frames per second. An Intel MMX-class processor, however, can display well over 30 frames per second without any additional hardware. This observation is a strong argument against using a single design language for both hardware and software, because such performance can be achieved only by coding the key routines directly in Intel assembly language. It makes little sense to "cripple" a candidate CPU by ignoring the very features that allow it to solve the problem at hand.

The present invention has been made in view of the above circumstances and overcomes the above problems and limitations of the prior art.

Additional advantages of the invention will be set forth in part in the description that follows, and in part will be obvious from the description or may be learned by practice of the invention. The advantages of the invention will be realized and attained by means of the instrumentalities and combinations particularly pointed out in the appended claims.

According to a first aspect of the present invention, there is provided a computer system that performs co-validation of an electronic circuit and a control program targeting that electronic circuit. The electronic circuit and the control program are simulated using a predetermined computer language executed in a lightweight operating system environment built on a microkernel that provides inter-process communication by messages. The computer system has signal management means for managing the states of all global signals; clock generation means for generating the clock signals used by a software model of the electronic circuit and by an instruction set simulator that executes part of the control program; queue management means for managing queues of processes waiting for events from the software model and the instruction set simulator; and scheduler means for executing, at each predetermined timing interval, the components including the software model and the instruction set simulator. The signal management means, the clock generation means, the queue management means and the scheduler means are subsystems controlled by a central control process executed by the computer system. The electronic circuit is mapped to processes executed under the lightweight operating system, and state changes between elements of the mapped electronic circuit are modeled by the message passing of the lightweight operating system.
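The division of work among these subsystems can be sketched in a few lines. The following is a simplified illustration under stated assumptions, not the patented implementation: all class and method names are hypothetical, a simulated process is modeled as a plain callable that returns its next wake-up tick, and a single clock named `clk` is assumed.

```python
import heapq


class SignalManager:
    """Holds the current value of every global signal."""

    def __init__(self):
        self._state = {}

    def get(self, name):
        return self._state.get(name, 0)

    def set(self, name, value):
        self._state[name] = value


class ClockGenerator:
    """Toggles one clock signal every `half_period` ticks."""

    def __init__(self, signals, name, half_period):
        self.signals, self.name, self.half_period = signals, name, half_period

    def tick(self, now):
        if now > 0 and now % self.half_period == 0:
            self.signals.set(self.name, 1 - self.signals.get(self.name))


class QueueManager:
    """Keeps the processes that are waiting for a wake-up tick."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker so that processes are never compared

    def wait_until(self, tick, process):
        heapq.heappush(self._heap, (tick, self._seq, process))
        self._seq += 1

    def due(self, now):
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready


class Scheduler:
    """Runs every process that is due at the current tick."""

    def __init__(self, signals, queues):
        self.signals, self.queues = signals, queues

    def step(self, now):
        for process in self.queues.due(now):
            wake = process(now, self.signals)
            if wake is not None:
                self.queues.wait_until(wake, process)


class CentralControl:
    """Owns the four subsystems and drives them tick by tick."""

    def __init__(self, half_period=5):
        self.signals = SignalManager()
        self.clock = ClockGenerator(self.signals, "clk", half_period)
        self.queues = QueueManager()
        self.scheduler = Scheduler(self.signals, self.queues)

    def add_process(self, process, first_tick=0):
        self.queues.wait_until(first_tick, process)

    def run(self, ticks):
        for now in range(ticks):
            self.clock.tick(now)
            self.scheduler.step(now)


def make_edge_counter(half_period):
    """Hypothetical hardware model: counts the clock's high phases."""
    def counter(now, signals):
        if signals.get("clk") == 1:
            signals.set("count", signals.get("count") + 1)
        return now + 2 * half_period  # sleep for one full clock period
    return counter
```

For example, creating `CentralControl(half_period=5)`, adding `make_edge_counter(5)` with `first_tick=5` and calling `run(30)` wakes the counter at ticks 5, 15 and 25, leaving the `count` signal at 3. In the system described above these subsystems communicate by operating system messages; direct method calls are used here only to keep the sketch short.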

According to a second aspect of the present invention, there is provided a method of deriving a cycle-accurate instruction set simulation of a target processor for the verification of a system that includes an electronic circuit and a control program that controls the electronic circuit. The method includes modeling the interaction between memory and cache as a sequence of signals on a data bus, where the target processor is stalled until the cache has been loaded by waiting for the memory operation to complete. The method further includes using a sequence of signals derived from an internal data flow model of the target processor, together with the internal bus widths and timing, to fill the instruction pipeline and to delay by the appropriate number of clock cycles. The method further includes executing the instruction decode cycle of the target processor to interpret the instructions available for execution. The method further includes determining whether a scheduled instruction implements a pipeline break and, if it does, stalling the scheduler for future instructions. The method further includes forwarding the scheduled instruction to the appropriate instruction pipeline or hardware component and calculating the cycle execution time for each instruction. The method further includes determining whether there is any non-deterministic timing. The method further includes outputting the results that are available at the end of the calculated execution cycle. The method further includes using the signal interface, together with the emulated control registers of the target processor, to determine whether an interrupt handler should be scheduled for the next cycle.
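The kind of cycle accounting described here can be illustrated with a toy model. The sketch below is not the claimed method: the instruction names, cycle costs, cache line size and penalties are all invented for the example, and the memory bus, the decode stage and the control registers are reduced to simple bookkeeping.

```python
# Hypothetical per-instruction base costs, in clock cycles.
CYCLES = {"add": 1, "mul": 3, "load": 1, "branch": 1}
CACHE_MISS_PENALTY = 10  # assumed stall while the cache line is loaded
PIPELINE_REFILL = 2      # assumed cost of a pipeline break
LINE_SIZE = 16           # assumed cache line size in bytes


def simulate(program, irq_cycles=()):
    """Return (total_cycles, interrupts) for a list of (op, address) pairs.

    `irq_cycles` lists the cycles at which an interrupt is raised; each
    interrupt is recorded as (raised_at, handler_scheduled_at), where the
    handler is scheduled at the first instruction boundary that follows.
    """
    cycle = 0
    cached_lines = set()
    pending = sorted(irq_cycles)
    interrupts = []
    for op, address in program:
        cost = CYCLES[op]
        if op == "load":
            line = address // LINE_SIZE
            if line not in cached_lines:
                cost += CACHE_MISS_PENALTY  # processor stalls on the miss
                cached_lines.add(line)
        if op == "branch":
            cost += PIPELINE_REFILL  # pipeline break delays later instructions
        cycle += cost
        while pending and pending[0] <= cycle:
            interrupts.append((pending.pop(0), cycle))
    return cycle, interrupts


PROGRAM = [("load", 0), ("load", 4), ("mul", None), ("branch", None), ("add", None)]
```

With these assumed costs, `simulate(PROGRAM)` takes 19 cycles: 11 for the first load (a cache miss), 1 for the second load (same line, a hit), 3 for the multiply, 3 for the branch including the refill, and 1 for the add. An interrupt raised at cycle 13 arrives during the multiply and is scheduled at cycle 15, the next instruction boundary.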

According to a third aspect of the present invention, there is provided an executable program for a computer system that performs co-validation of an electronic circuit and a control program targeting that electronic circuit. The electronic circuit and the control program are simulated using a predetermined computer language executed in a lightweight operating system environment built on a microkernel that provides inter-process communication by messages. The executable program has a first executable code portion that manages the states of a plurality of global signals, a second executable code portion that generates a plurality of clock signals, a third executable code portion that manages queues of processes waiting for events, and a fourth executable code portion that generates predetermined timing intervals. The executable program further has a fifth executable code portion that, when executed on a computer, controls the execution of at least the first, second, third and fourth executable code portions. The electronic circuit is mapped to processes executed under the lightweight operating system, and state changes between elements of the mapped electronic circuit are modeled by the message passing of the lightweight operating system.

The above features and other advantages of the present invention will become apparent from the following detailed description, taken with reference to the accompanying drawings.

Before the features of the present invention are described, some details concerning the prior art are explained in order to aid understanding of the invention and to set out the meaning of various terms.

In this specification, the term "computer system" is used in the broadest possible sense and includes, but is not limited to, stand-alone processors, networked processors, mainframe processors, and processors in a client/server relationship. The term "computer system" should be understood to include at least a memory and a processor. In general, the memory stores, at one time or another, at least part of the executable program code, and the processor executes the instructions that make up that executable program code.

In this specification, the term "embedded computer system" includes, but is not limited to, an embedded central processor and a memory containing object code instructions. Examples of embedded computer systems include, but are not limited to, personal digital assistants (PDAs), cellular telephones and digital cameras. In general, any device that uses a central processor, however primitive, to control its functions can be said to have an embedded computer system. The embedded central processor executes the object code instructions stored in the memory. An embedded computer system may have cache memory, input/output devices and other peripherals.

理解されるように、「所定オペレーション」という用語および「コンピュータシステムソフトウェア」という用語は、本明細書の目的では、実質的に同じものを意味する。本発明の実施にとって、メモリおよびプロセッサが物理的に同じ場所に位置する必要はない。すなわち、プロセッサおよびメモリが物理的に異なる装置にあることや、地理的に異なる場所にあることも予期される。 As will be appreciated, the terms “predetermined operation” and “computer system software” mean substantially the same for the purposes of this specification. For the implementation of the present invention, the memory and processor need not be physically located at the same location. That is, it is anticipated that the processor and memory will be in physically different devices or in different geographical locations.

本明細書において、当業者には理解されるように、「媒体」あるいは「コンピュータ可読媒体」は、ディスケット、テープ、コンパクトディスク、集積回路、カートリッジ、通信回線を通じてのリモート伝送、あるいはその他の同様の、コンピュータが使用可能な媒体を含むことが可能である。例えば、コンピュータシステムソフトウェアを分散させるため、供給業者（サプライヤ）は、ディスケットを提供することも可能であり、また、衛星伝送、直接電話リンク、あるいはインターネットを通じて何らかの形で所定オペレーションを実行する命令を伝送することも可能である。 As used herein, and as those skilled in the art will understand, "medium" or "computer-readable medium" can include a diskette, a tape, a compact disc, an integrated circuit, a cartridge, a remote transmission over a communication line, or any other similar computer-usable medium. For example, to distribute computer system software, a supplier can provide a diskette, or can transmit the instructions for performing the predetermined operations in some form over satellite transmission, a direct telephone link, or the Internet.

コンピュータシステムソフトウェアは、ディスケットに「書き込まれる」ことも、集積回路に「格納される」ことも、通信回線を通じて「伝送される」ことも可能であるが、理解されるように、本明細書の目的では、コンピュータ使用可能媒体は、所定オペレーションを実行する命令を「有する」ということにする。すなわち、「有する」という用語は、所定オペレーションを実行するための命令がコンピュータ使用可能媒体に関わる上記およびすべての等価な方法を含むことを意図するものである。 Computer system software can be “written” to a diskette, “stored” in an integrated circuit, or “transmitted” over a communication line, but it will be appreciated that, for the purposes of this specification, a computer-usable medium is said to “have” the instructions for performing the predetermined operations. That is, the term “have” is intended to cover the above and all equivalent ways in which instructions for performing the predetermined operations can be associated with a computer-usable medium.

したがって、簡単のため、「プログラム製品」という用語は、以下、任意の形式で所定オペレーションを実行するための命令を有する（上記で定義）コンピュータ使用可能媒体を指すために用いられる。 Thus, for simplicity, the term “program product” is used hereinafter to refer to a computer usable medium having instructions (as defined above) for performing a predetermined operation in any form.

次に、添付図面を参照して、本発明の特徴の詳細な説明を行う。 The features of the present invention will now be described in detail with reference to the accompanying drawings.

ＶＨＤＬプロセスと、軽量オペレーティングシステムで実行されるプロセスとの間の顕著な類似性のため、ＶＨＤＬプロセスを、個別のソフトウェアプロセスとして、軽量オペレーティングシステムを用いて書き、イベントをスケジューリングすることが可能となる。好ましくは、軽量オペレーティングシステムは、メッセージによるプロセス間通信を提供する小さいマイクロカーネルをもとに構築される。メッセージは、関連するパラメータを有するオペレーションコードであり、上記の項目３ａのイベント駆動モデルを表現する。好ましくは、メッセージは、プロセス間で受渡しされるシステム構造体にマッピングされるグローバルデータ定義である。イベントのセマンティクスはアプリケーションによって定義される。この場合、プロセス間メッセージは、ＶＨＤＬ信号値の変化についてプロセスに通知するために用いられることも可能である。好ましくは、軽量オペレーティングシステムは、信号線、アドレスバス、データバスおよびクロック分配方式が少数の基本メッセージから構築されることを可能にする一群のメッセージプリミティブを提供する。好ましくは、マイクロカーネルコアによって供給されるメッセージに加えて、プロセスは、アプリケーション固有のインタフェースをインプリメントするために自己のメッセージコードおよびパラメータを定義することも可能である。その場合、このインタフェースは、共有ライブラリにインプリメントされたルーチンコールのセットにまとめられる。 Because of the marked similarity between VHDL processes and processes running on a lightweight operating system, VHDL processes can be written as individual software processes, with the lightweight operating system used to schedule the events. Preferably, the lightweight operating system is built on a small microkernel that provides inter-process communication by messages. A message is an operation code with associated parameters, and expresses the event-driven model of item 3a above. Preferably, messages are global data definitions that are mapped onto system structures passed between processes. The semantics of an event are defined by the application; here, inter-process messages can be used to notify processes of changes in VHDL signal values. Preferably, the lightweight operating system provides a set of message primitives from which signal lines, address buses, data buses, and clock distribution schemes can be built out of a small number of basic messages. Preferably, in addition to the messages supplied by the microkernel core, a process can define its own message codes and parameters to implement an application-specific interface. Such an interface is then collected into a set of routine calls implemented in a shared library.

次に、図２を参照して、電子回路と、その回路をターゲットとするソフトウェアとのコバリデーションを行う本発明の特徴について簡単に説明する。電子回路および制御プログラムは、軽量コンピュータシステムオペレーティング環境で実行される所定のコンピュータ言語を用いてシミュレートされる。Ｓ０９００で、１つまたは複数の回路記述言語によってモデル化された、コンピュータハードウェアの動作の要素を、選択された軽量オペレーティングシステムの構文およびメソッドに翻訳する。Ｓ１０００で、電子回路設計の動作記述を、軽量コンピュータシステムオペレーティング環境をターゲットとする前記構文からなるソフトウェアモデルに翻訳する。Ｓ１１００で、電子回路設計をターゲットとする制御プログラムの部分を命令セットシミュレーションが実行するように、命令セットシミュレータを所定のコンピュータ言語で構成する。Ｓ１２００で、電子回路設計のソフトウェアモデルを命令セットシミュレーションに結合して、軽量コンピュータオペレーティングシステム環境で実行される試験対象システムを作成する。最後に、Ｓ１３００で、試験対象システムに刺激を入力することにより、試験対象システムに結果を出力させる。 Next, with reference to FIG. 2, the features of the present invention for co-validating an electronic circuit and the software targeting that circuit are briefly described. The electronic circuit and the control program are simulated using a predetermined computer language that runs in a lightweight computer system operating environment. In S0900, the elements of computer hardware behavior modeled by one or more circuit description languages are translated into the syntax and methods of the selected lightweight operating system. In S1000, the behavioral description of the electronic circuit design is translated into a software model, written in that syntax, that targets the lightweight computer system operating environment. In S1100, an instruction set simulator is built in the predetermined computer language so that the instruction set simulation executes the portion of the control program that targets the electronic circuit design. In S1200, the software model of the electronic circuit design is combined with the instruction set simulation to create a system under test that runs in the lightweight computer operating system environment. Finally, in S1300, stimuli are applied to the system under test, causing it to output results.

次に、電子回路および制御プログラムのコバリデーションを行う本発明の特徴についてさらに詳細に説明する。 Next, the features of the present invention for co-validating the electronic circuit and the control program are described in more detail.

ステップＳ０９００では、ハードウェア記述言語を調べる。好ましくはＶＨＤＬ言語がモデル化されるが、本発明の方法および応用は、特定の言語や特定の軽量オペレーティングシステムに限定されない。動作レベルＶＨＤＬの制御フローおよび様式は、以下の具体的な点でＣに容易にマッピングすることが可能である。 In step S0900, the hardware description language is examined. Preferably the VHDL language is modeled, but the method and its applications are not limited to a particular language or a particular lightweight operating system. The control flow and style of behavioral-level VHDL map readily onto C in the following specific respects.

ＶＨＤＬの変数はＣの変数に似ている。その値は、制御がプロセスを離れて戻ってきたときにも保持されている。しかし、その名前のスコープは、それがインスタンス化されたプロセスについてローカルである。したがって、本発明は、ＶＨＤＬの変数をＣのスタティック変数で置き換える。ビットベクタ型は、３２ビット幅（３２ビットプロセッサで動作する処理系の場合）までは、Ｃの符号なし整数型に含めることができる。個別ビットアクセスおよびスライス演算はいずれも、シフトおよびマスク演算で実行することができる。 VHDL variables resemble C variables. Their values are retained even after control leaves the process and later returns. However, the scope of a variable's name is local to the process in which it is instantiated. The present invention therefore replaces VHDL variables with C static variables. Bit-vector types up to 32 bits wide (for an implementation running on a 32-bit processor) can be held in a C unsigned integer type. Both individual bit access and slice operations can be performed with shift and mask operations.
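
The shift-and-mask treatment of bit vectors can be illustrated with a minimal sketch. The helper names (`bv_get_bit`, `bv_set_bit`, `bv_slice`, `bv_set_slice`) are hypothetical and do not appear in the specification; the sketch only assumes that a bit vector of up to 32 bits is held in a `uint32_t`, with bit 0 as the least significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Read bit `pos` (0 = least significant) of a bit vector held in a 32-bit word. */
static inline uint32_t bv_get_bit(uint32_t v, unsigned pos) {
    return (v >> pos) & 1u;
}

/* Return a copy of `v` with bit `pos` set to `bit` (0 or 1). */
static inline uint32_t bv_set_bit(uint32_t v, unsigned pos, uint32_t bit) {
    return (v & ~(UINT32_C(1) << pos)) | ((bit & 1u) << pos);
}

/* Mask with the low `width` bits set; width == 32 is handled separately
 * because shifting a 32-bit value by 32 is undefined in C. */
static inline uint32_t bv_mask(unsigned width) {
    return width >= 32 ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
}

/* Extract the slice v(hi downto lo), right-aligned in the result. */
static inline uint32_t bv_slice(uint32_t v, unsigned hi, unsigned lo) {
    return (v >> lo) & bv_mask(hi - lo + 1u);
}

/* Return a copy of `v` with the slice v(hi downto lo) replaced by `field`. */
static inline uint32_t bv_set_slice(uint32_t v, unsigned hi, unsigned lo,
                                    uint32_t field) {
    uint32_t m = bv_mask(hi - lo + 1u) << lo;
    return (v & ~m) | ((field << lo) & m);
}
```

Under these assumptions, a VHDL slice read such as `word(15 downto 8)` would correspond to `bv_slice(word, 15, 8)`.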

信号は、プロセス間で値を送信するために用いられる。信号の値のスコープは、システム全体を通じてグローバルであるが、その名前は、特定のプロセスについてローカルである。本発明では、プロセスの複数のコピーを、それぞれ相異なる信号（例えば、UART_A_Chip_Select、UART_B_Chip_Select）にアクセスする同一のローカル信号名（例えば、Chip_Select）でインスタンス化することが可能である。「０」や「１」のような単純な信号値と、スライス値とは、マシン内のビットパターンとして容易に表現される。ＶＨＤＬは、用いられる電気的モデルに依存して、「Ｚ」や「Ｗ」のようなこの範囲外の論理値をサポートする。 Signals are used to send values between processes. The scope of the signal value is global throughout the system, but its name is local to a particular process. In the present invention, multiple copies of a process can be instantiated with the same local signal name (eg, Chip_Select) that accesses different signals (eg, UART_A_Chip_Select, UART_B_Chip_Select). Simple signal values such as “0” and “1” and slice values are easily expressed as bit patterns in the machine. VHDL supports logic values outside this range, such as “Z” and “W”, depending on the electrical model used.

個々のＶＨＤＬプロセスは、一定のナノ秒数の間、または、その信号に関する条件のセットが満たされるまで、その実行をサスペンドすることが可能である。ＶＨＤＬプロセスが「動作可能」であるための条件を決定することは、言語構文のマッピングにおいてインプリメントされなければならない機能の１つである。 An individual VHDL process can suspend its execution for a fixed number of nanoseconds, or until a set of conditions on its signals is satisfied. Determining the conditions under which a VHDL process is “ready to run” is one of the functions that must be implemented in the mapping of the language constructs.

また、ＶＨＤＬ言語は、ハードウェアの実行のフローを制御する構文を提供する。これは、軽量オペレーティングシステムによって用いられるプログラミング言語（この場合はＣ）の等価な構文にマッピングされなければならない。 The VHDL language also provides a syntax that controls the flow of hardware execution. This must be mapped to the equivalent syntax of the programming language (in this case C) used by the lightweight operating system.

好ましくは、シミュレーションシステム内の中央プロセスは、各信号の値を追跡すること、および、その値をモニタするプロセス間に信号変化を分配することの処理を行う。以下の説明でこのプロセスに言及するときには、「ファブリック」プロセスと呼ぶことにする。本発明は、軽量オペレーティングシステムのメッセージ受渡しオペレーションを用いて、プロセス間で、信号値の変化やその他の構文の受渡しを行う。例えば、"wait for N ns"（Ｎナノ秒ウェイト）というＶＨＤＬ構文は、次のような２つのユーザ定義パラメータを有するメッセージにより表現される。 Preferably, a central process in the simulation system tracks the value of each signal and distributes signal changes among the processes that monitor that value. In the description below this process is called the “fabric” process. The present invention uses the message-passing operations of the lightweight operating system to pass signal value changes and other constructs between processes. For example, the VHDL construct "wait for N ns" (wait N nanoseconds) is represented by a message with the following two user-defined parameters.

Msg WAITFOR {int CLOCKT; int TICKS;}
TICKSパラメータは、ＶＨＤＬにおけるＮパラメータからナノ秒数を提供し、CLOCKTパラメータは、クロック相対同期ポイントを提供する。 The TICKS parameter carries the number of nanoseconds taken from the N parameter in the VHDL, and the CLOCKT parameter gives the clock-relative synchronization point.

メッセージインタフェースは、アプリケーションコードに直接には現れない。すべての相互作用は、マイクロカーネルコア機能を呼び出す共有ライブラリルーチンを通じてなされる。 The message interface does not appear directly in the application code. All interactions are through shared library routines that call microkernel core functions.

信号は文字列を用いて命名されるが、opaque型ポインタを用いて参照される。以下のインタフェースルーチンは、信号名と参照の間のマッピングをインプリメントする。さらに、このインタフェースは、信号値の変化について中央制御プロセッサに通知し、信号の値の変化をウェイトする、共有ライブラリルーチンを含む。 Signals are named using strings, but are referenced using opaque pointers. The following interface routines implement the mapping between signal names and references. In addition, the interface includes shared library routines that notify the central control processor about signal value changes and wait for signal value changes.

・void *fabinit(void)：このルーチンは、中央制御プロセスインタフェースライブラリを初期化する。このルーチンは、信号メンテナを使用することになる各プロセスによって１回呼び出されなければならない。 This routine initializes the central control process interface library. It must be called once by each process that will use the signal maintainer.

・void *fabric_find_signal(char *SIGNAL)：このルーチンは、（グローバルに命名された）SIGNALパラメータを参照するために使用可能なopaque型ポインタを返す。このルーチンは、すべての登録された名前に対する文字列比較を実行し、信号参照が存在すればそれを返す。信号がリスト内にまだない場合、初期値０で作成される。このようにして、プロセスは、特定の順序でインスタンス化あるいは初期化されることは不要である。 This routine returns an opaque pointer that can be used to refer to the (globally named) SIGNAL parameter. It performs a string comparison against every registered name and returns the signal reference if one exists. If the signal is not yet in the list, it is created with an initial value of 0. Processes therefore do not need to be instantiated or initialized in any particular order.

・uint fabric_read_signal(void *SYM)：このルーチンは、opaque型SYMパラメータによって参照される信号の現在値を、もしそれが定義された（すなわち、非浮遊トライステート）値を有していれば、返す。 This routine returns the current value of the signal referenced by the opaque SYM parameter, provided that the signal has a defined (i.e., non-floating tristate) value.

・void fabric_set_clock(char *NAME, int INITIAL, int PERIOD)：このルーチンは、このプロセスに対する同期オペレーションで用いられるクロック信号を定義する。このルーチンは、ファブリックプロセス内への共有ライブラリ呼出しであり、ファブリックプロセスがすべての遷移を内部的に処理する。INITIALパラメータは、信号の開始値（通常０）を指定し、PERIODパラメータは、遷移間の間隔（すなわち、クロック周期の半分）を指定する。これは信号を定義するため、クロックはグローバルであり、複数のプロセスが同じクロックを共有することができる。 This routine defines the clock signal used in synchronous operations for this process. It is a shared-library call into the fabric process, which handles all transitions internally. The INITIAL parameter specifies the starting value of the signal (usually 0), and the PERIOD parameter specifies the interval between transitions (i.e., half the clock period). Because the call defines a signal, the clock is global and several processes can share the same clock.

・void fabric_tristate(void *SYM, int VALUE)：このルーチンは、ＩＥＥＥ論理システムの拡張値の設定およびそれとの比較を可能にする多値論理信号をサポートする。 This routine supports multi-valued logic signals, allowing the extended values of the IEEE logic system to be set and compared against.

・void fabric_write_signal(void *SYM, uint VALUE)：このルーチンは、opaque型ポインタSYMによって参照される信号の値をVALUEにセットする。 This routine sets the value of the signal referenced by the opaque pointer SYM to VALUE.
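
The name-to-reference mapping described for fabric_find_signal can be sketched as a small lookup table. This is a single-process illustration only: in the described system these calls are forwarded to the fabric process as messages, whereas here the table lives in the caller. The `sketch_` names, the table capacity, and the name-length limit are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SIGNALS 64   /* hypothetical table capacity */
#define MAX_NAME    48   /* hypothetical limit on signal-name length */

typedef struct {
    char     name[MAX_NAME];
    unsigned value;
} sketch_signal;

static sketch_signal sketch_table[MAX_SIGNALS];
static size_t        sketch_count;

/* Return an opaque reference for `name`. A linear string comparison is made
 * against every registered name; a signal that is not yet registered is
 * created with initial value 0. Returns NULL if the table is full or the
 * name is too long. */
void *sketch_find_signal(const char *name) {
    for (size_t i = 0; i < sketch_count; i++)
        if (strcmp(sketch_table[i].name, name) == 0)
            return &sketch_table[i];
    if (sketch_count == MAX_SIGNALS || strlen(name) >= MAX_NAME)
        return NULL;
    sketch_signal *s = &sketch_table[sketch_count++];
    strcpy(s->name, name);
    s->value = 0;
    return s;
}

/* Read and write through the opaque reference. */
unsigned sketch_read_signal(void *sym) {
    return ((sketch_signal *)sym)->value;
}

void sketch_write_signal(void *sym, unsigned value) {
    ((sketch_signal *)sym)->value = value;
}
```

Because a lookup creates the signal on first use, whichever process asks for a name first registers it, which is why no particular instantiation order is required.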

ファブリックプロセスは、信号が、単純な２値ドメイン内で作用しているか、それとも、多値論理を使用しているかを知らなければならない。このドメインは、システムにわたり信号変化を送信するために用いられるインタフェースルーチンによって決定され、処理系は、多値論理ドメインが表現されることを可能にする。それぞれの多値論理ドメインに要求される拡張値の表現を含む個々のＣヘッダファイル（例えば、ieee.h）は、個々の論理ドメインをカプセル化することが可能である（例えば、std_logicパッケージ）。 The fabric process must know whether a signal operates in a simple binary domain or uses multi-valued logic. The domain is determined by the interface routine used to send the signal change across the system, and the implementation allows multi-valued logic domains to be represented. Individual C header files (e.g., ieee.h), containing the representations of the extended values required for each multi-valued logic domain, can encapsulate individual logic domains (e.g., the std_logic package).

以下の形のWaitルーチンはすべて、CLOCKTパラメータに次の３つの値のうちの１つをとる。０は、非同期オペレーションを意味する（指定された条件が満たされるとすぐにプロセスをスケジューリングする）。−１は、条件がプロセスのクロック信号の立下り端で満たされる場合にプロセスをスケジューリングし、＋１は、条件がプロセスのクロック信号の立上り端で満たされる場合にプロセスをスケジューリングする。 All Wait routines of the following form take one of the following three values for the CLOCKT parameter: 0 means asynchronous operation (schedule the process as soon as the specified condition is met). -1 schedules the process if the condition is met on the falling edge of the process clock signal, +1 schedules the process if the condition is met on the rising edge of the process clock signal.

・void WaitClk(int CLOCKT)：このルーチンは、（CLOCKTパラメータに従って）そのクロックの次の立上りまたは立下り端まで、現在のプロセスをサスペンドする。 This routine suspends the current process until the next rising or falling edge of its clock (according to the CLOCKT parameter).

・void WaitForc(uint TICKS, int CLOCKT)：このルーチンは、TICKS個のクロックティックの間、現在のプロセスをサスペンドする。 This routine suspends the current process for TICKS clock ticks.

・void WaitUntilc(void *SYM, int OP, uint VALUE, int CLOCKT)
・void WaitUntil2c(void *SYM1, int OP1, uint VALUE1, void *SYM2, int OP2, uint VALUE2, int CLOCKT)
・void WaitUntil3c(void *SYM1, int OP1, uint VALUE1, void *SYM2, int OP2, uint VALUE2, void *SYM3, int OP3, uint VALUE3, int CLOCKT)：これらのルーチンは、１、２または３個の条件のうちのいずれか１つ以上が（同時に）真になるのをウェイトする。それぞれの条件は、信号SYM、オペレーションOPおよびVALUEである。例えば、"EQ"オペレーションは、SIGNALとVALUEの間の等値性をテストするために用いられる。 These routines wait until any one or more of one, two, or three conditions become true (at the same time). Each condition consists of a signal SYM, an operation OP, and a VALUE. For example, the "EQ" operation is used to test for equality between SIGNAL and VALUE.

上記のルーチンは、プロセス間メッセージを形成し、それをファブリックプロセスに送る。このコントローラプロセスは、信号に対するすべての変化を追跡し、イベントをウェイトしているプロセスをいつ呼び起こすかを決定する。 The above routine forms an interprocess message and sends it to the fabric process. This controller process keeps track of all changes to the signal and determines when to wake up the process waiting for the event.
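
How a controller might decide when to wake a waiting process can be sketched as follows. This is an illustration under stated assumptions, not the fabric process itself: the `sketch_` names and the `NE` operation are hypothetical (only "EQ" is named in the text), the wait is taken to be satisfied when at least one condition holds, and the CLOCKT values 0, +1 and -1 are read as asynchronous, rising edge and falling edge as described above.

```c
#include <assert.h>
#include <stdbool.h>

enum sketch_op { EQ, NE };   /* NE is a hypothetical second operation */

/* One wait condition: a watched signal, an operation, and a value. */
typedef struct {
    const unsigned *signal;  /* points at the current value of the signal */
    enum sketch_op  op;
    unsigned        value;
} sketch_cond;

static bool cond_holds(const sketch_cond *c) {
    switch (c->op) {
    case EQ: return *c->signal == c->value;
    case NE: return *c->signal != c->value;
    }
    return false;
}

/* True when at least one of the n conditions currently holds. */
static bool any_cond_holds(const sketch_cond *conds, int n) {
    for (int i = 0; i < n; i++)
        if (cond_holds(&conds[i]))
            return true;
    return false;
}

/* CLOCKT gating: 0 = asynchronous, +1 = rising edge, -1 = falling edge. */
static bool clock_gate_open(int clockt, unsigned prev_clk, unsigned cur_clk) {
    if (clockt == 0)
        return true;
    if (clockt > 0)
        return prev_clk == 0 && cur_clk == 1;
    return prev_clk == 1 && cur_clk == 0;
}

/* Decide whether a process waiting on `conds` should be scheduled now. */
bool sketch_should_wake(const sketch_cond *conds, int n, int clockt,
                        unsigned prev_clk, unsigned cur_clk) {
    return any_cond_holds(conds, n) &&
           clock_gate_open(clockt, prev_clk, cur_clk);
}
```

The controller would re-evaluate such a predicate for each waiting process whenever a watched signal or the process clock changes.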

これらのルーチンは、共有ライブラリにインプリメントされ、Ｃヘッダファイルを通じてエクスポートされた。また、ヘッダファイルは、ファブリックプロセスによって管理されるグローバルナノ秒ティックカウンタもエクスポートするため、個々のプロセスからのトレース出力は、現在のタイムスタンプでタグ付けされることが可能である。 These routines were implemented in a shared library and exported through a C header file. The header file also exports a global nanosecond tick counter managed by the fabric process so that trace output from individual processes can be tagged with the current timestamp.

図３を参照すると、本発明は、それぞれの信号名を前処理する。Ｓ１０１０で、それぞれの信号に対して、本発明は、名前にプレフィクス句"signal_"を付けることによって、内部変数を生成する。したがって、信号CHIP_SELECTへのopaque型ポインタは"static void *signal_CHIP_SELECT"となる。その後、この新しい信号名は、プロセスがインスタンス化されるときに、信号参照に結合される。インスタンス化された信号名がその信号名と同一である必要はない。例えば、signal_CHIP_SELECTは次のようにインスタンス化されることも可能である。 Referring to FIG. 3, the present invention preprocesses each signal name. In S1010, for each signal, an internal variable is generated by prefixing the name with "signal_". The opaque pointer to the signal CHIP_SELECT is therefore "static void *signal_CHIP_SELECT". This new name is then bound to a signal reference when the process is instantiated. The name of the signal it is bound to need not be the same as the local signal name. For example, signal_CHIP_SELECT can be instantiated as follows.

signal_CHIP_SELECT = fabric_find_signal("UART_A_CHIP_SELECT");
ＶＨＤＬの本体で、プリプロセッサは、信号CHIP_SELECTへの参照を、変数signal_CHIP_SELECTを参照するように修正する。インスタンス化されるとき、変数signal_CHIP_SELECTは、UART_A_CHIP_SELECT信号に結合される。 In the VHDL body, the preprocessor rewrites references to the signal CHIP_SELECT so that they refer to the variable signal_CHIP_SELECT. When the process is instantiated, the variable signal_CHIP_SELECT is bound to the UART_A_CHIP_SELECT signal.

Ｓ１０１５で、前処理すべき信号名がさらにあるかどうかを判定する。ある場合、Ｓ１０１８で、前処理すべき次の信号名を取得し、プロセス制御フローはＳ１０１０に戻って、信号名の前処理を続ける。ない場合、プロセス制御フローはＳ１０２０に進む。 In S1015, it is determined whether there are more signal names to be preprocessed. If there is, the next signal name to be preprocessed is acquired in S1018, and the process control flow returns to S1010 to continue the preprocessing of the signal name. If not, the process control flow proceeds to S1020.

図３を参照して、プロセス制御フローがＳ１０２０〜Ｓ１０６０に到達すると、どのタイプの動作言語構文が用いられているか、および、その動作言語を、軽量コンピュータオペレーティングシステムをターゲットとする構文にどのように翻訳するかについて判定を行う。好ましくは、使用される動作言語はＶＨＤＬであり、翻訳のために用いられる構文はＣベースのものである。 Referring to FIG. 3, when the process control flow reaches S1020 to S1060, it is determined which type of behavioral-language construct is being used and how that construct is to be translated into syntax targeting the lightweight computer operating system. Preferably, the behavioral language used is VHDL and the syntax used for the translation is C-based.

Ｓ１０２０で、翻訳されるべき動作言語構文が内部変数への代入であるかどうかを判定する。内部変数への代入である場合、プロセス制御フローはＳ１０２５に進む。内部変数への代入でない場合、プロセス制御フローはＳ１０３０に進む。 In step S1020, it is determined whether the operation language syntax to be translated is assignment to an internal variable. In the case of assignment to an internal variable, the process control flow proceeds to S1025. If it is not an assignment to an internal variable, the process control flow proceeds to S1030.

Ｓ１０２５で、ＶＨＤＬ代入からＣ代入への翻訳が次のように行われる。 In S1025, the VHDL assignment is translated into a C assignment as follows.

VHDL: i_word := 1;
C: static uint i_word;
   i_word = 1;
この翻訳の終了後、プロセス制御フローはＳ１０２０に戻る。 After this translation is complete, the process control flow returns to S1020.

Ｓ１０３０で、翻訳されるべき動作言語構文が外部変数への代入であるかどうかを判定する。外部変数への代入である場合、プロセス制御フローはＳ１０３５に進む。外部変数への代入でない場合、プロセス制御フローはＳ１０４０に進む。 In S1030, it is determined whether or not the operation language syntax to be translated is assignment to an external variable. In the case of assignment to an external variable, the process control flow proceeds to S1035. If it is not assignment to an external variable, the process control flow proceeds to S1040.

ほとんどのインタフェースルーチンは、ＶＨＤＬ−Ｃ翻訳が読みやすくかつ理解しやすくなるように設計されたプリプロセッサマクロを通じてアクセスされる。これらのマクロは、信号名と変数名の間の変換を処理する。以下のマクロにおいて、"signal"はＶＨＤＬ信号の名前を意味する。これらの名前は、上記のopaque型参照へと前処理される。 Most interface routines are accessed through preprocessor macros designed to make VHDL-C translations easier to read and understand. These macros handle the conversion between signal names and variable names. In the following macro, “signal” means the name of the VHDL signal. These names are preprocessed to the opaque type reference above.

・uint GetSignal(signal NAME)：このマクロは、NAMEという信号の現在値を返す； This macro returns the current value of the signal NAME;
・void Signal(signal NAME, uint VALUE)：このマクロは、NAMEという信号のVALUEをその（３２ビットの）値にセットする； This macro sets the signal NAME to the (32-bit) value VALUE;
・void SignalZ(signal NAME)：このマクロは、NAMEという信号の値をＩＥＥＥのＺ値にセットする。 This macro sets the value of the signal NAME to the IEEE Z value.

Ｓ１０３５で、ＶＨＤＬ代入から外部信号への翻訳が次のように行われる： In S1035, a VHDL assignment to an external signal is translated as follows:
VHDL: a_busp <= '0';
Macro: Signal(a_busp, 0);
C: fabric_write_signal(signal_a_busp, 0);
生成される代入は、ランタイムインタフェースを通じてファブリックプロセスへの呼出しを生成することになる。この翻訳の終了後、プロセス制御フローはＳ１０２０に戻る。 The generated assignment results in a call to the fabric process through the runtime interface. After this translation is complete, the process control flow returns to S1020.
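
One way to obtain the macro-to-C expansion shown for S1035 is the C preprocessor's token-pasting operator. The sketch below assumes that mechanism; the specification does not say how its macros are written. The fabric routines are replaced by single-process stubs, `unsigned` stands in for `uint`, and `stub_signal` and `storage_a_busp` are hypothetical names.

```c
#include <assert.h>

/* Single-process stand-ins for the fabric routines named in the text. */
typedef struct { unsigned value; } stub_signal;

static unsigned fabric_read_signal(void *sym) {
    return ((stub_signal *)sym)->value;
}

static void fabric_write_signal(void *sym, unsigned value) {
    ((stub_signal *)sym)->value = value;
}

/* The ## operator pastes the VHDL signal name onto the "signal_" prefix, so
 * Signal(a_busp, 0) expands to fabric_write_signal(signal_a_busp, (0)). */
#define GetSignal(NAME)      fabric_read_signal(signal_##NAME)
#define Signal(NAME, VALUE)  fabric_write_signal(signal_##NAME, (VALUE))

/* The per-process opaque pointer. In the described system it would be bound
 * with fabric_find_signal(); here it simply points at local storage. */
static stub_signal storage_a_busp;
static void *signal_a_busp = &storage_a_busp;
```

With these definitions, the translated line `Signal(a_busp, 0);` compiles to the same call as the hand-written `fabric_write_signal(signal_a_busp, 0);`.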

Ｓ１０４０で、翻訳されるべき動作言語構文がウェイトポイントであるかどうかを判定する。ウェイトポイントである場合、プロセス制御フローはＳ１０４５に進む。ウェイトポイントでない場合、プロセス制御フローはＳ１０５０に進む。外部信号への代入と同様に、ウェイトポイントは、次のようなプリプロセッサマクロを用いたインタフェース呼出しにマッピングされる。 In S1040, it is determined whether the behavioral-language construct to be translated is a wait point. If it is a wait point, the process control flow proceeds to S1045. If it is not a wait point, the process control flow proceeds to S1050. As with assignments to external signals, wait points are mapped to interface calls through the following preprocessor macros.

・void WaitClk{F|R}(void)：これらのマクロはそれぞれ、プロセスクロック信号の次の立上りまたは立下り端をウェイトする； Each of these macros waits for the next rising or falling edge of the process clock signal;
・void WaitFor{F|R}(uint TICKS)：これらのマクロは、TICKSナノ秒間ウェイトする。プロセスは、そのティックにおける、または、次の適当なクロック遷移における実行のためにスケジューリングされる； These macros wait for TICKS nanoseconds. The process is scheduled for execution at that tick or at the next appropriate clock transition;
・void WaitUntil{F|R}(signal NAME, int OP, uint VALUE)
・void WaitUntil2{F|R}(signal NAME, int OP, uint VALUE, signal NAME2, int OP2, uint VALUE2)
・void WaitUntil3{F|R}(signal NAME, int OP, uint VALUE, signal NAME2, int OP2, uint VALUE2, signal NAME3, int OP3, uint VALUE3)：これらのマクロは、ある条件が１、２、または３個の信号に現れるまでプロセスがウェイトすること、および、要求されるクロック同期に従ってプロセスがスケジューリングされることを可能にする。 These macros allow a process to wait until a condition appears on one, two, or three signals, and to be scheduled according to the required clock synchronization.

Ｓ１０４５で、ＶＨＤＬ構文の翻訳は次のように行われる： In S1045, the VHDL construct is translated as follows:
VHDL: wait until (startn = '0');
Macro: WaitUntil(startn, EQ, 0);
C: WaitUntilc(signal_startn, EQ, 0, 0);
ウェイトポイントの翻訳の終了後、プロセス制御フローはＳ１０２０に戻る。 After the wait point has been translated, the process control flow returns to S1020.
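
The relationship between the wait macros and the CLOCKT argument can be sketched in the same way. The text confirms only that WaitUntil(startn, EQ, 0) becomes WaitUntilc(signal_startn, EQ, 0, 0); reading the R and F suffixes as "rising" (+1) and "falling" (-1) is an assumption, as are the stub WaitUntilc, which records its arguments instead of suspending the process, and the numeric encoding of EQ.

```c
#include <assert.h>

enum { EQ = 0 };   /* hypothetical encoding of the "EQ" operation */

/* Stub: records its arguments rather than suspending the calling process. */
static struct {
    void    *sym;
    int      op;
    unsigned value;
    int      clockt;
} last_wait;

static void WaitUntilc(void *sym, int op, unsigned value, int clockt) {
    last_wait.sym    = sym;
    last_wait.op     = op;
    last_wait.value  = value;
    last_wait.clockt = clockt;
}

/* Unsuffixed: asynchronous (CLOCKT = 0), as in the translation shown above.
 * R and F variants: assumed to select the rising (+1) or falling (-1) edge. */
#define WaitUntil(NAME, OP, VALUE)   WaitUntilc(signal_##NAME, (OP), (VALUE), 0)
#define WaitUntilR(NAME, OP, VALUE)  WaitUntilc(signal_##NAME, (OP), (VALUE), +1)
#define WaitUntilF(NAME, OP, VALUE)  WaitUntilc(signal_##NAME, (OP), (VALUE), -1)

/* Hypothetical local binding for the signal used in the example. */
static int   storage_startn;
static void *signal_startn = &storage_startn;
```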

Ｓ１０５０で、ＶＨＤＬ制御構文が、等価なＣ言語オペレーションに翻訳される。この翻訳は、追加のインタフェース呼出しを含むことがある。例として、'if'制御構文の翻訳は次のように行われる： In S1050, VHDL control constructs are translated into equivalent C-language operations. This translation may involve additional interface calls. As an example, the 'if' control construct is translated as follows:
VHDL: if (resetn = '0') then
Macro: if (GetSignal(resetn) == 0) {
C: if (fabric_read_signal(signal_resetn) == 0) {
制御構文の翻訳の終了後、プロセス制御フローはＳ１０２０に戻る。 After the control construct has been translated, the process control flow returns to S1020.

図４を参照すると、Ｓ１０６０で、翻訳する必要のある動作言語構文がさらにあるかどうかを判定する。さらにある場合、プロセス制御フローはＳ１０２０に進む。さらにない場合、変換プロセスは完了する。 Referring to FIG. 4, it is determined in step S1060 whether there are more operating language constructs that need to be translated. If there is more, the process control flow proceeds to S1020. If there are no more, the conversion process is complete.

これらの少数の基本変換により、動作ＶＨＤＬコードをＣに翻訳し、そのコードをコンパイルし、軽量オペレーティングシステム用にそれをリンクすることが可能である。ＣとＶＨＤＬの間のマッピングは一対一であり可逆であるため、もとの設計がＶＨＤＬで定式化されているか、それとも、翻訳可能Ｃで定式化されているかで、シミュレーションには差がない。翻訳が正しく実行される限り、結果は同じになる。この翻訳から生じるＣのサブセットは依然として強力なプログラミング言語である。これは、ポインタ間接参照や高度なデータ構造体のような特にＣＰＵベースのアルゴリズムを目的とするいくつかの機能を欠いているが、それ以外の点では、ターゲット変更可能設計の出発点として働くことが可能である。分割がわかったら、個々の解（ハードウェアまたはソフトウェア）の機能を十分に活用すべきである。 With this small number of basic transformations, behavioral VHDL code can be translated into C, compiled, and linked for the lightweight operating system. Because the mapping between C and VHDL is one-to-one and reversible, the simulation does not differ according to whether the original design was formulated in VHDL or in translatable C. As long as the translation is carried out correctly, the results are the same. The subset of C that results from this translation is still a powerful programming language. It lacks some features aimed specifically at CPU-based algorithms, such as pointer indirection and advanced data structures, but otherwise it can serve as the starting point for a retargetable design. Once the partitioning is known, the capabilities of each solution (hardware or software) should be fully exploited.

本発明の特徴によれば、設計のソフトウェアコンポーネントは、ターゲットプロセッサの命令セットシミュレータ（ＩＳＳ：Instruction-Set Simulator）を用いてシミュレートすることが可能である。一般に、このようなシミュレータは、次の３つの要素を有する。 According to a feature of the invention, the software component of the design can be simulated using an instruction-set simulator (ISS) of the target processor. In general, such a simulator has the following three elements.

１．ターゲットプロセッサのオペレーションおよびレジスタのエミュレーション； 1. Emulation of the target processor's operations and registers;
２．エミュレートされるプロセッサの状態を表示し、エミュレータのオペレーションを制御する、外部プログラムへのインタフェース、あるいはヒューマンインタフェース； 2. An interface to an external program, or a human interface, that displays the state of the emulated processor and controls the operation of the emulator;
３．試験対象システムに存在するデバイスを表す外部ハードウェアのエミュレーションへのインタフェース。 3. An interface to emulations of external hardware representing the devices present in the system under test.

シミュレータの高度化と、外部デバイスとの相互作用とに依存して、結果は、試験対象システムの基本機能のみを確認することもあり、あるいは、すべてのプロセスのサイクル精度のシミュレーションを与えることもある。非サイクル精度のＩＳＳの構成を、本発明の第１の特徴として以下で説明し、本発明のさらに他の特徴による変形についてはその後説明する。 Depending on the sophistication of the simulator and its interaction with external devices, the results may confirm only the basic functionality of the system under test, or may give a cycle-accurate simulation of all processes. The construction of a non-cycle-accurate ISS is described below as a first feature of the present invention; variations according to further features of the invention are described afterwards.

要素１は、ソフトウェアが、指定されたアーキテクチャによるプロセッサのレジスタをエミュレートすることを要求する。このようなレジスタには、汎用レジスタ、浮動小数点レジスタ、制御レジスタ、および、ユーザソフトウェアから直接にはアクセス可能でない「隠れた」レジスタが含まれるが、これらに限定されない。さらに、レジスタに対するオペレーションもまた、そのようなオペレーションによって引き起こされるステータスおよびエラー情報の生成を含めて、正しくエミュレートされなければならない。 Element 1 requires software to emulate the registers of a processor with a specified architecture. Such registers include, but are not limited to, general purpose registers, floating point registers, control registers, and “hidden” registers that are not directly accessible from user software. Furthermore, operations on registers must also be correctly emulated, including the generation of status and error information caused by such operations.

要素２は、外部エージェントへのインタフェースを要求する。好ましくは、ＩＳＳは、非常に単純なコマンドラインインタフェースを有し、使用されるコマンドは、図５に示したテーブル１にリストされている。 Element 2 requires an interface to an external agent. Preferably, the ISS has a very simple command-line interface; the commands used are listed in Table 1, shown in FIG. 5.

要素３は、プロセッサに接続されたメモリおよびその他の外部デバイスのエミュレーションを要求する。ＩＳＳは、２つの方法のうちの一方を用いてローカルＲＡＭをエミュレートする。ＲＡＭの量が小さい場合、ホストマシンのローカルメモリが用いられる。そうでない場合、ディスクスペースを用いて、エミュレートされるＲＡＭを保持する。これにより、実際のマシンより多くの物理メモリを有するマシンのエミュレーションが可能となる。また、これは、障害時のメモリの再検討可能なチェックポイントを提供する。予想されるように、外部ディスクを用いると、シミュレーションはかなり遅くなる。 Element 3 requires emulation of the memory and other external devices connected to the processor. The ISS emulates local RAM using one of two methods. If the amount of RAM is small, the host machine's local memory is used. Otherwise, disk space is used to hold the emulated RAM. This makes it possible to emulate a machine with more physical memory than the actual host machine. It also provides a checkpoint of memory that can be reviewed after a failure. As expected, using an external disk slows the simulation considerably.
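
The two RAM-emulation strategies can be sketched with a small byte-addressed store that picks its backing at open time. The `sketch_ram` type, the function names and the `host_limit` parameter are hypothetical; the sketch also assumes a POSIX-like host, where seeking past the end of a file and writing extends it with zero bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Emulated RAM held either in host memory or in a backing file. */
typedef struct {
    unsigned char *mem;    /* non-NULL when the RAM fits in host memory */
    FILE          *file;   /* non-NULL when the RAM is held on disk */
    long           size;   /* emulated RAM size in bytes */
} sketch_ram;

/* Use host memory for sizes up to host_limit bytes, otherwise a temporary
 * file. Returns 0 on success and -1 on failure. */
int ram_open(sketch_ram *r, long size, long host_limit) {
    r->mem  = NULL;
    r->file = NULL;
    r->size = size;
    if (size <= host_limit) {
        r->mem = calloc((size_t)size, 1);
        return r->mem ? 0 : -1;
    }
    r->file = tmpfile();
    return r->file ? 0 : -1;
}

/* Store one byte. Returns 0 on success, -1 on a bad address or I/O error. */
int ram_store8(sketch_ram *r, long addr, unsigned char value) {
    if (addr < 0 || addr >= r->size)
        return -1;
    if (r->mem) {
        r->mem[addr] = value;
        return 0;
    }
    if (fseek(r->file, addr, SEEK_SET) != 0)
        return -1;
    return fputc(value, r->file) == EOF ? -1 : 0;
}

/* Fetch one byte; locations never written read as 0. Returns -1 on a bad
 * address or seek error. */
int ram_fetch8(sketch_ram *r, long addr) {
    if (addr < 0 || addr >= r->size)
        return -1;
    if (r->mem)
        return r->mem[addr];
    if (fseek(r->file, addr, SEEK_SET) != 0)
        return -1;
    int c = fgetc(r->file);
    return c == EOF ? 0 : c;
}

/* Release whichever backing store is in use. */
void ram_close(sketch_ram *r) {
    free(r->mem);
    if (r->file)
        fclose(r->file);
    r->mem  = NULL;
    r->file = NULL;
}
```

A temporary file disappears when it is closed; keeping the memory image for review after a failure, as the text describes, would call for a named file instead.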

ＩＳＳが外部ハードウェアに応答するためには、そのハードウェアのモデルにアクセスすることができなければならない。本発明の第１の特徴によれば、外部ハードウェアの動作を表すモジュールのセットが、ＩＳＳ実行可能ファイルにリンクされる。外部デバイスは、外部グルーロジックを通じてＣＰＵアドレス空間にマッピングされると仮定され、ある範囲の（非キャッシュ）メモリアドレスがデバイスへのアクセスを引き起こすようになる。このアクセス範囲は、ハードウェアコマンドによって定義される。 In order for the ISS to respond to external hardware, it must be able to access the hardware model. According to a first aspect of the invention, a set of modules representing the operation of external hardware is linked to an ISS executable file. External devices are assumed to be mapped to the CPU address space through external glue logic, and a range of (non-cached) memory addresses will cause access to the device. This access range is defined by a hardware command.

例えば、一般的なＵＡＲＴインタフェースは、次の５つの機能を有することが可能である。 For example, a general UART interface can have the following five functions.

・void uart_reset(void)：このルーチンは、ハードウェアリセット信号の送信をシミュレートするために、ＩＳＳ初期化中に呼び出される。 This routine is called during ISS initialization to simulate the sending of a hardware reset signal.

・uint uart_fetch(int ADDRESS, int LENGTH)：このルーチンは、プロセッサからデバイスへのリード（読出し）要求をシミュレートする。デバイスは、その入力アドレスライン上にADDRESSパラメータが提示されたかのように、LENGTHパラメータによって指定される適当な数のビットを返さなければならない。 This routine simulates a read request from the processor to the device. The device must return the appropriate number of bits, as specified by the LENGTH parameter, as if the ADDRESS parameter had been presented on its input address lines.

・void uart_store(int ADDRESS, int LENGTH, uint VALUE)：このルーチンは、プロセッサからデバイスへのライト（書込み）要求をエミュレートする。このルーチンは、ADDRESSパラメータがその入力アドレスラインに提示され、かつ、VALUEパラメータがその入力データラインに提示されたかのように、作用しなければならない。LENGTHパラメータは、有効な値のバイト数を指定する。 This routine emulates a write request from the processor to the device. It must behave as if the ADDRESS parameter had been presented on the device's input address lines and the VALUE parameter on its input data lines. The LENGTH parameter specifies the number of valid bytes in the value.

・void uart_tick(void)：このルーチンは、プロセッサクロックティックごとに呼び出され、データストリーミングやタイマオペレーションをシミュレートするために使用可能である。 Void uart_tick (void): This routine is called every processor clock tick and can be used to simulate data streaming and timer operations.

・ハードウェアコンポーネントがプロセッサへの割込み信号を発生することを可能にする割込み発生ルーチン。 An interrupt generation routine that allows a hardware component to generate an interrupt signal to the processor.
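The five routines listed above can be illustrated with a minimal device model in C. This is only a sketch under stated assumptions: the register offsets, the status bit, the internal `uart` state, and the name `uart_raise_interrupt` are hypothetical and do not come from the specification, which gives only the routine signatures. In this sketch the Tx-Available interrupt is raised on the tick after a character is stored.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int uint;

/* Hypothetical register layout; the source does not define one. */
#define UART_REG_DATA        0      /* write: character to transmit */
#define UART_REG_STATUS      4      /* read: status flags */
#define UART_STATUS_TX_READY 0x1u   /* device can accept a character */

/* Internal device state (assumed for illustration). */
static struct {
    uint status;
    uint last_tx;
    bool tx_pending;
    bool irq_raised;
    int  tx_count;
} uart;

/* Stand-in for the interrupt generation routine (hypothetical name). */
void uart_raise_interrupt(void) { uart.irq_raised = true; }

/* Simulates the hardware reset signal. */
void uart_reset(void)
{
    uart.status = UART_STATUS_TX_READY;
    uart.last_tx = 0;
    uart.tx_pending = false;
    uart.irq_raised = false;
    uart.tx_count = 0;
}

/* Simulates a read request; only the status register is modeled. */
uint uart_fetch(int ADDRESS, int LENGTH)
{
    (void)LENGTH;
    return (ADDRESS == UART_REG_STATUS) ? uart.status : 0u;
}

/* Simulates a write request; a write to the data register starts a send. */
void uart_store(int ADDRESS, int LENGTH, uint VALUE)
{
    assert(LENGTH >= 1 && LENGTH <= 4);
    if (ADDRESS == UART_REG_DATA && (uart.status & UART_STATUS_TX_READY)) {
        uart.last_tx = VALUE & 0xFFu;
        uart.status &= ~UART_STATUS_TX_READY;
        uart.tx_pending = true;
    }
}

/* Called once per processor clock tick; completes a pending send. */
void uart_tick(void)
{
    if (uart.tx_pending) {
        uart.tx_pending = false;
        uart.tx_count++;
        uart.status |= UART_STATUS_TX_READY;
        uart_raise_interrupt();
    }
}
```

A driver under test would poll the status register through `uart_fetch`, write a character through `uart_store`, and then see the interrupt flag set after the next `uart_tick`.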

このエミュレーションは、タイミング情報を含まない。ＵＡＲＴは、イベントが提示されるのと同じクロックティック内に応答している。正確なタイミングが問題とはならない試験ソフトウェアの場合、この単純なモデルで十分であった。 This emulation does not include timing information. The UART is responding within the same clock tick that the event is presented. For test software where accurate timing is not an issue, this simple model was sufficient.

プロセッサは数百メガヘルツのクロックで動作することが可能であるが、このような速度は、命令およびデータがいずれも高速のオンチップキャッシュから来るときにのみ実現される。遅いキャッシュや外部メモリにアクセスすると、ＣＰＵの見かけのパフォーマンスは低下することになる。さらに、多くのコデザイン問題の場合、外部ハードウェアは、オフチップＲＡＭにのみ書込みが可能である。このこともまた、ソフトウェア設計に対する制約となる。このようなデータにアクセスするにはデータキャッシュをバイパスしなければならない（結果としてそのデータはキャッシュにロードされるかもしれないが）からである。もちろん、メモリマップドＩ／Ｏロケーションへのアクセスもまた、非キャッシュアクセスを通じて行われなければならない。 Processors can operate with hundreds of megahertz clocks, but such speeds are only realized when both instructions and data come from a fast on-chip cache. Accessing a slow cache or external memory will reduce the apparent performance of the CPU. Furthermore, for many co-design problems, external hardware can only write to off-chip RAM. This is also a restriction on software design. This is because the data cache must be bypassed to access such data (although that data may be loaded into the cache as a result). Of course, accesses to memory mapped I / O locations must also be made through non-cached accesses.

しかし、このインタフェースは、コンポーネントに対する異なるアクセス時刻をエミュレートしない。すべてのメモリトランザクションは、単一サイクルで完了すると仮定される。 However, this interface does not emulate different access times for components. All memory transactions are assumed to complete in a single cycle.

基本インタフェースは、必要に応じて個々の命令エミュレーションから呼び出される次の３つのルーチンを通じて提供される。 The basic interface is provided through three routines called from individual instruction emulations as needed:

・void fetch(iblock *IN, uint ADDRESS1, int NBYTE, uint *ADDRESS2)：このルーチンは、（エミュレーション）アドレスADDRESS1からNBYTEバイトをロードし、それを（マシン）アドレスADDRESS2に格納（ストア）する。命令ブロックINは、リソーススコアボーディングのために用いられる。このルーチンは、キャッシュ可能領域ではデータキャッシュ探索を行い、それ以外ではエミュレートされたＲＡＭやデバイスにアクセスする。 ・void fetch(iblock *IN, uint ADDRESS1, int NBYTE, uint *ADDRESS2): This routine loads NBYTE bytes from the (emulation) address ADDRESS1 and stores them at the (machine) address ADDRESS2. The instruction block IN is used for resource scoreboarding. The routine performs a data cache lookup for cacheable regions and otherwise accesses the emulated RAM or a device.

・void ifetch(uint ADDRESS1, int MAX, int IX)：このルーチンは、命令フェッチアルゴリズムをインプリメントする。これは、（エミュレーション）アドレスADDRESS1から始まるMAX個までのワードを、インデックスIXから始まる（４ワード）命令パイプにロードする。このルーチンは、まず、命令キャッシュを探索した後、データが見つからない場合に８ワードキャッシュラインを充填するためにメモリに進む。 ・void ifetch(uint ADDRESS1, int MAX, int IX): This routine implements the instruction fetch algorithm. It loads up to MAX words starting at the (emulation) address ADDRESS1 into the (4-word) instruction pipe starting at index IX. The routine first searches the instruction cache and, if the data is not found, goes to memory to fill an 8-word cache line.

・void store(block *IN, uint ADDRESS, int NBYTE, uint *VAL)：このルーチンは、（マシン）アドレスVALから（エミュレーション）アドレスADDRESSにNBYTEバイトを書き込む。また、このルーチンは、キャッシュ可能メモリのためにデータキャッシュを更新する。命令ブロックポインタINは、スコアボーディングのために用いられる。また、これらのルーチンは、キャッシュヒット数、キャッシュミス数、および非キャッシュメモリへのアクセス数についての統計を収集する。 ・void store(block *IN, uint ADDRESS, int NBYTE, uint *VAL): This routine writes NBYTE bytes from the (machine) address VAL to the (emulation) address ADDRESS. It also updates the data cache for cacheable memory. The instruction block pointer IN is used for scoreboarding. These routines also collect statistics on the number of cache hits, cache misses, and accesses to non-cached memory.

タイミング測定が不要な場合、ＣＰＵの最も単純（かつ最も高速）なシミュレーションは、命令を１個ずつフェッチおよびデコードし、次の命令に進む前に各命令の実行を完了することである。インテルＩ９６０プロセッサチップを例として用いると、ＩＳＳは、キャッシュ動作の十分なエミュレーションを提供する。これは、命令キャッシュおよびデータキャッシュの両方をサポートし、メモリコントローラレジスタ（mcon0〜mcon15）を用いて、データをキャッシュするかどうかを決定する。また、このインタフェースは、メモリマップドハードウェアデバイスにアクセスするためのアドレス空間デコーディングも提供する。ルーチンは、ハードウェアルーチンのうちの１つ（例えば、uart_fetch）を呼び出すか、または、直接にＲＡＭエミュレーションにアクセスする。 When timing measurements are not required, the simplest (and fastest) simulation of the CPU is to fetch and decode instructions one by one and complete the execution of each instruction before proceeding to the next instruction. Using the Intel I960 processor chip as an example, ISS provides sufficient emulation of cache operations. It supports both instruction cache and data cache and uses memory controller registers (mcon0-mcon15) to determine whether to cache data. This interface also provides address space decoding for accessing memory mapped hardware devices. The routine calls one of the hardware routines (eg, uart_fetch) or directly accesses RAM emulation.
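The address space decoding described above, where an access either reaches a hardware routine or goes directly to the RAM emulation, might look roughly like the following C sketch. The address window, RAM size, `mem_fetch`, `stub_uart_fetch`, and the `stats` counters are all hypothetical names and values chosen for illustration. Cache lookup and the mcon0 to mcon15 registers are deliberately left out.

```c
#include <assert.h>
#include <string.h>

typedef unsigned int uint;

#define RAM_SIZE   0x1000u       /* hypothetical size of emulated RAM */
#define UART_BASE  0x80000000u   /* hypothetical memory-mapped device window */
#define UART_LIMIT 0x80000010u

static unsigned char ram[RAM_SIZE];
static struct { unsigned long ram_accesses, device_accesses; } stats;

/* Stub standing in for a device routine such as uart_fetch. */
static uint stub_uart_fetch(int address, int length)
{
    (void)length;
    return 0xA5u + (uint)address;
}

/* Decode an emulated address and read nbyte bytes into dest.
 * Returns 0 on success, -1 if the address maps to nothing. */
int mem_fetch(uint address, int nbyte, void *dest)
{
    assert(nbyte > 0 && nbyte <= 4);
    if (address >= UART_BASE && address < UART_LIMIT) {
        uint v = stub_uart_fetch((int)(address - UART_BASE), nbyte);
        /* Copies the first nbyte bytes in host byte order. */
        memcpy(dest, &v, (size_t)nbyte);
        stats.device_accesses++;
        return 0;
    }
    if (address < RAM_SIZE && (uint)nbyte <= RAM_SIZE - address) {
        memcpy(dest, &ram[address], (size_t)nbyte);
        stats.ram_accesses++;
        return 0;
    }
    return -1;
}
```

Keeping the decode in one routine means every instruction emulation shares the same view of the memory map, and the access counters come for free.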

ＩＳＳは、軽量オペレーティングシステムの下で組込みアプリケーションとして動作する例示的な"hello world"プログラムを実行するために用いられた。この簡単なプログラムは、４個のプロセス、すなわち、ＵＡＲＴドライバ、コンソールプロセス、'hello'アプリケーション、およびアイドルプロセスを含む。また、このプログラムは、オペレーティングシステムのコアと、Ｃランタイムライブラリも含む。 ISS was used to run an exemplary “hello world” program that runs as an embedded application under a lightweight operating system. This simple program includes four processes: a UART driver, a console process, a 'hello' application, and an idle process. The program also includes an operating system core and a C runtime library.

ターゲットシステムは、開始エントリポイントから開始される。ターゲットシステムは、ブランク記憶領域をゼロにクリアし、制御されたプロセッサリセットを実行して、内部制御レジスタを設定する。初期化ルーチンが、４個のプロセスを作成し、ＵＡＲＴのためのハードウェア初期化ルーチンを呼び出す。このハードウェア初期化ルーチンは割込みハンドラをインストールする。 The target system is started from the starting entry point. The target system clears the blank storage area to zero, performs a controlled processor reset, and sets the internal control register. The initialization routine creates four processes and calls the hardware initialization routine for UART. This hardware initialization routine installs an interrupt handler.

コンソールプロセスは、ＵＡＲＴプロセスへのメッセージチャネルを開き、ＵＡＲＴキーボードインタフェースからの入力を受け取ることができるようにメッセージのキューをセットアップする。 The console process opens a message channel to the UART process and sets up a queue of messages so that it can receive input from the UART keyboard interface.

アプリケーションは、Ｃランタイムライブラリを用いて、通常のＩ／Ｏファイル（stdin、stdoutおよびstderr）をコンソールマルチプレクサプロセスに対して開いた後、printfルーチンを呼び出して、文字列をフォーマットしてメッセージをコンソールに送る。これは、コンソールへのコンテクストスイッチを引き起こし、文字列の前にプロセス名を付加してそれをＵＡＲＴプロセスに転送する。 The application uses the C runtime library to open the standard I/O files (stdin, stdout, and stderr) on the console multiplexer process, then calls the printf routine to format a string and send it as a message to the console. This causes a context switch to the console, which prepends the process name to the string and forwards it to the UART process.

ＵＡＲＴは、ハードウェアによって生成されるTx-Available割込みに応答して、文字列を一時に１文字ずつハードウェアに送信する。すべての文字が送られた後、メッセージがコンソールプロセスを通じてアプリケーションに返され、その後、アプリケーションは終了する。 In response to the Tx-Available interrupt generated by the hardware, the UART transmits a character string to the hardware one character at a time. After all characters have been sent, a message is returned to the application through the console process, after which the application is terminated.

ＩＳＳをこのレベルで動作させると、ソフトウェアが正しく動作しているというある程度の信頼性を示すことができる。例えば、アプリケーションとのＵＡＲＴの動作を制御するソフトウェアは、いつＣＰＵが文字を書くことができるかを示すためにチップによって供給されるステータスフラグを正しく解釈している。さらに、ルーチンは、正しいパラメータおよび連繋で、正しい順序で呼び出されている。また、検出可能なメモリ破損がなく、定義されたメモリ領域の外部へのアクセスもないと仮定することも妥当である。 Operating the ISS at this level can show a certain degree of reliability that the software is operating correctly. For example, software that controls the operation of a UART with an application correctly interprets status flags supplied by the chip to indicate when the CPU can write characters. Furthermore, the routines are called in the correct order with the correct parameters and linkage. It is also reasonable to assume that there is no detectable memory corruption and no access to the outside of the defined memory area.

しかし、このモードでシミュレータを動作させても、実行時間に関する実際の情報は得られない。あらゆる命令は１クロックティックしかかからないと仮定しているからである。命令タイミングモードを動作させ、パイプライントレースをプリントアウトすれば、命令サイクルがどこを進んでいるかが示される。 However, running the simulator in this mode yields no real information about execution time, because every instruction is assumed to take only one clock tick. Running in instruction timing mode and printing out a pipeline trace shows where the instruction cycles are going.

また、ＩＳＳを使用すると、回路内エミュレータを使用しても通常は入手できない情報にアクセスすることもできる。例えば、実際に使用されたレジスタキャッシュの最大深さを決定することや、キャッシュヒット統計とともにフレームこぼれ数をカウントすることも可能である。 Using ISS also allows access to information that is not normally available using an in-circuit emulator. For example, it is possible to determine the maximum depth of the register cache actually used, or to count the number of frame spills along with cache hit statistics.

また、それぞれの命令が実行された回数をカウントすることも可能である。このようなデータは、特に高集積ＳＯＣを製造するとき、プロセッサトレードオフを決定する際に非常に有益である。これは、例えば、より少数の分岐命令によって、ゲート数を少なくするためである。Ｉ９６０アーキテクチャは「予測分岐」命令を備えているが、コンパイラはそれらを使用しない。これらの命令を除去することも可能であり、あるいは、これらの機能を利用可能な別のコンパイラを検討することも可能である。 It is also possible to count the number of times each instruction is executed. Such data is very useful when deciding processor trade-offs, especially when manufacturing highly integrated SoCs, for example to reduce the gate count by supporting fewer branch instructions. The I960 architecture provides "predicted branch" instructions, but the compiler does not use them. These instructions could be removed, or a different compiler that can exploit them could be considered.

このレベルの詳細さでも、サイクル精度にはほど遠い。この場合、すべてのバストランザクションが（マルチワードオペレーションを含めて）同じサイクル内で完了するという１サイクルメモリモデルを仮定しているからである。このレベルでのタイミングを正確にモデル化するためには、システムデータバスとそのデバイスの精密なモデルが必要であり、これは、完全なコシミュレーション環境を生成することを必要とする。 Even this level of detail is far from cycle accuracy. This is because it assumes a one-cycle memory model where all bus transactions (including multiword operations) are completed within the same cycle. In order to accurately model timing at this level, a precise model of the system data bus and its devices is required, which requires creating a complete co-simulation environment.

次に、完全なサイクル精度の命令セットシミュレーションの構成について説明する。真のハードウェア／ソフトウェアコシミュレーションを可能にするためには、本発明の第１の特徴に要求されるＩＳＳの機能を少なくとも含む、選択されたＣＰＵに対する命令セットシミュレータが要求される。 Next, the configuration of a complete cycle accuracy instruction set simulation will be described. In order to enable true hardware / software co-simulation, an instruction set simulator for the selected CPU is required that includes at least the ISS functionality required for the first aspect of the invention.

しかし、この機能は、特定のＣＰＵを選択する際に次の３つの主要な要素を無視している。 However, this feature ignores the following three main elements when selecting a particular CPU:

１．例えば乗算や除算のように、多くの命令は、単一サイクルで実行を完了しない。 1. Many instructions, such as multiplication and division, do not complete execution in a single cycle.

２．ＣＰＵは、リソース使用衝突がないと仮定して、バスオペレーションがレジスタ間オペレーションと同時に実行可能なように、複数の専用ユニットにわたる内部並列処理をインプリメントしていることがある。 2. The CPU may implement internal parallel processing across multiple dedicated units so that bus operations can be performed concurrently with register-to-register operations, assuming no resource usage conflicts.

３．内部キャッシュメモリ、外部ＲＡＭメモリおよびデバイスへのアクセスは、互いに非常に異なる応答時間を有することがある。これによる通常の効果は単に実行が遅延されるだけであるが、ある重大な場合には（例えば、到来する割込みの相対的タイミングがあるタスクの処理順序を決定する場合）、全体のパフォーマンスにおいて多大な役割を演ずることがある。 3. Accesses to internal cache memory, external RAM, and devices may have very different response times. The usual effect is simply that execution is delayed, but in certain critical cases (for example, when the relative timing of incoming interrupts determines the order in which tasks are processed) it can play a major role in overall performance.

次に、電子回路とその電子回路を制御する制御プログラムとを有するシステムの検証のための、ターゲットプロセッサのサイクル精度の命令セットシミュレーションを導出する方法について詳細に説明する。 Next, a method for deriving a cycle-accurate instruction set simulation of a target processor for verification of a system having an electronic circuit and a control program for controlling the electronic circuit will be described in detail.

インテルＩ９６０プロセッサを例として用いると、ＩＳＳは、命令キャッシュに加えて、次の機能をインプリメントすることによって、Ｉ９６０命令スケジューラをモデル化する。 Using the Intel I960 processor as an example, the ISS models the I960 instruction scheduler by implementing the following functions in addition to the instruction cache.

・並列デコーディングを有する４ワード命令パイプ；
・５個のＩ９６０命令パスのエミュレーション；
・結果が利用可能になる前には使用されないことを保証するためのレジスタスコアボーディング；
・分岐や呼出し（コール）のようなパイプライン切断命令の実装；
・完了するのに単一クロック時間より長くかかる命令に対するパイプラインストール。
・A 4-word instruction pipe with parallel decoding;
・Emulation of the five I960 instruction paths;
・Register scoreboarding to ensure that results are not used before they are available;
・Implementation of pipeline-breaking instructions such as branches and calls;
・Pipeline stalls for instructions that take longer than a single clock period to complete.

図６を参照すると、命令パイプラインは、各プロセッサクロックティックの最初に検査される。パイプが完全に満たされてはいない場合、満たすためにifetchルーチンが呼び出される。パイプを満たすのに必要な命令が命令キャッシュに見つからない場合、ifetch擬似命令がスケジューリングされる。この命令は、ＣＰＵのバス制御ユニット（ＢＣＵ：Bus Control Unit）およびメモリパスにおいて実行される（そのため、これらのユニットがビジーである場合には遅延されることがある）。結果的に、この命令が実行されると、８ワードが命令キャッシュラインにロードされる。これらのワードのうちの４個までが、fill_Ipipeルーチンによって、命令パイプラインを満たすために使用される。使用されるワード数は、パイプの状態と、命令キャッシュバウンダリに関する現在の命令ポインタのアラインメントとに依存する。 Referring to FIG. 6, the instruction pipeline is examined at the beginning of each processor clock tick. If the pipe is not completely filled, the ifetch routine is called to fill it. If the instruction needed to fill the pipe is not found in the instruction cache, the ifetch pseudo-instruction is scheduled. This instruction is executed in the CPU bus control unit (BCU) and memory path (and therefore may be delayed if these units are busy). As a result, when this instruction is executed, 8 words are loaded into the instruction cache line. Up to four of these words are used by the fill_Ipipe routine to fill the instruction pipeline. The number of words used depends on the state of the pipe and the current instruction pointer alignment with respect to the instruction cache boundary.
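How many words a pipe fill can take from a freshly loaded cache line can be sketched as follows. This is a simplified illustration, not the patent's `fill_Ipipe` routine: the `ipipe` type, `fill_count`, and the word-addressed `ip_word` parameter are assumptions. It only captures the stated dependence on the pipe state and on the alignment of the fetch address relative to the 8-word cache line boundary.

```c
#include <assert.h>

#define PIPE_WORDS 4   /* 4-word instruction pipe */
#define LINE_WORDS 8   /* 8-word instruction cache line */

typedef unsigned int uint;

typedef struct {
    uint words[PIPE_WORDS];
    int  count;              /* number of valid words currently in the pipe */
} ipipe;

/* Words that can be moved into the pipe from the cache line that holds
 * word address ip_word: limited by free pipe slots and by the distance
 * from ip_word to the end of the line. */
int fill_count(const ipipe *p, uint ip_word)
{
    int free_slots   = PIPE_WORDS - p->count;
    int left_in_line = LINE_WORDS - (int)(ip_word % LINE_WORDS);
    return free_slots < left_in_line ? free_slots : left_in_line;
}

/* Copy the usable words from the line into the pipe; returns words used. */
int fill_ipipe(ipipe *p, const uint line[LINE_WORDS], uint ip_word)
{
    int n = fill_count(p, ip_word);
    for (int i = 0; i < n; i++)
        p->words[p->count + i] = line[(ip_word % LINE_WORDS) + (uint)i];
    p->count += n;
    assert(p->count <= PIPE_WORDS);
    return n;
}
```

For example, an empty pipe fetching from the start of a line takes four words, while a pipe holding one word and fetching from the seventh word of a line can take only two.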

命令パイプラインにデータがある場合、その値が命令へとデコードされる。一部のＩ９６０命令は複数ワードを占めるため、不完全な命令がパイプに存在する可能性があり、その場合（図６の第２のcallx命令の場合のように）、それはデコードすることができない。命令は一度だけデコードされ、デコードされた形がデータ構造体に格納される。また、デコーディングには、命令をスケジューリングするために必要な内部処理ユニット、パイプラインおよびレジスタリソースを識別することも関連する。 If there is data in the instruction pipeline, its values are decoded into instructions. Because some I960 instructions occupy multiple words, an incomplete instruction may be present in the pipe, in which case it cannot be decoded (as with the second callx instruction in FIG. 6). Each instruction is decoded only once, and the decoded form is stored in a data structure. Decoding also involves identifying the internal processing units, pipelines, and register resources needed to schedule the instruction.

ＣＰＵが使用中のリソースは、ティックごとに、スコアボード構造体によって表される。命令は、パイプラインから順に取り出され、１つずつスケジューリングされる。スケジューリングは、命令がこのティックの間にすでにコミットされたリソースを要求するとき、もしくは、命令がパイプライン切断を生じるとき、または、パイプライン内にもはやデコードされた命令がないときのいずれかに、停止する。 The resources in use by the CPU are represented, tick by tick, by a scoreboard structure. Instructions are taken from the pipeline in order and scheduled one at a time. Scheduling stops when an instruction requires a resource that has already been committed during this tick, when an instruction causes a pipeline break, or when no decoded instructions remain in the pipeline.
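The three stop conditions for per-tick scheduling can be sketched with a bit-mask scoreboard. The `slot` type, the resource bits, and `schedule_tick` are hypothetical names. The sketch also assumes that a pipeline-breaking instruction is itself issued before scheduling stops, which the text does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int uint;

/* Hypothetical resource bits: execution units and registers share one mask. */
enum { RES_AGU = 1, RES_MEM = 2, RES_REG = 4 };

typedef struct {
    uint resources;        /* units and registers the instruction needs */
    bool decoded;          /* false for an incomplete instruction */
    bool breaks_pipeline;  /* branch, call, and similar */
} slot;

/* Schedule instructions in pipe order for one tick.
 * Updates *scoreboard and returns the number of instructions issued. */
int schedule_tick(const slot *pipe, int n, uint *scoreboard)
{
    int issued = 0;
    assert(pipe != 0 && scoreboard != 0);
    for (int i = 0; i < n; i++) {
        if (!pipe[i].decoded)
            break;                          /* no more decoded instructions */
        if (pipe[i].resources & *scoreboard)
            break;                          /* resource already committed */
        *scoreboard |= pipe[i].resources;   /* commit for this tick */
        issued++;
        if (pipe[i].breaks_pipeline)
            break;                          /* pipeline break ends scheduling */
    }
    return issued;
}
```

Because instructions are considered strictly in order, the first one that cannot be issued blocks everything behind it for the rest of the tick.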

スケジューラがパイプライン内のスロットを消費した場合、残りのデータが繰り上がり、命令ポインタが更新される。 When the scheduler consumes slots in the pipeline, the remaining data is shifted up and the instruction pointer is updated.

スケジューリングが完了すると、現在のティックにおいてユニットに入る命令が実行される。完了するのに単一ティックより長くかかる命令は、パイプラインがストールすると、そのリソースを後続のティックに伝搬させる。 When scheduling is complete, the instructions that enter the unit are executed in the current tick. An instruction that takes longer than a single tick to complete propagates its resources to subsequent ticks when the pipeline stalls.

図６を参照すると、ＡＧＵ、ＭＥＭパス、ソースレジスタとしてのフレームポインタ、および、デスティネーションレジスタとしてのスタックポインタをロックして、lda命令が現在のティックにおいてスケジューリングされることが可能である。これは、次のcallx命令がスケジューリングされないようにするため、次のcallx命令はパイプにとどまる。lda命令は、実行するのに複数サイクルかかるため、ＡＧＵは次のティックでストールし、この命令からのリソースは、この命令がＡＧＵを占有する限り、スコアボードにコピーされる。ldaは、完了すると、そのリソースを解放し、callx命令が開始することができる。次のcallx（これはここで完全にデコードされることが可能となる）は、次の２つの理由で、スケジューリングされない。第１に、このcallxでのリソースクラッシュのためであるが、それだけでなく第２に、callx命令はパイプライン切断（および新しい命令ポインタへの転送）を引き起こすからである。 Referring to FIG. 6, the lda instruction can be scheduled in the current tick, locking the AGU, the MEM path, the frame pointer as a source register, and the stack pointer as the destination register. This prevents the following callx instruction from being scheduled, so it remains in the pipe. Because the lda instruction takes multiple cycles to execute, the AGU stalls on the next tick, and the resources of this instruction are copied into the scoreboard for as long as it occupies the AGU. When lda completes, it releases its resources and the callx instruction can start. The next callx (which can now be fully decoded) is not scheduled, for two reasons: first, because of a resource conflict with this callx, and second, because the callx instruction causes a pipeline break (and a transfer to a new instruction pointer).

図６を参照すると、lda命令のエミュレーションは特に複雑ではない。命令デコーディングルーチンは、デスティネーションレジスタおよびメモリオペランドをすでに評価しているため、ルーチンは、レジスタを保持している（ＩＳＳメモリ内の）ロケーションを単に更新するだけである。その結果は、検査のために外部観察者に対して表示することが可能であり、スコアボードは、この命令に対する全部で４ティックについてマークされる。 Referring to FIG. 6, the emulation of the lda instruction is not particularly complicated. Since the instruction decoding routine has already evaluated the destination register and memory operands, the routine simply updates the location (in ISS memory) holding the register. The result can be displayed to an external observer for examination, and the scoreboard is marked for a total of 4 ticks for this instruction.

エミュレーションにおいては、デスティネーションレジスタが命令の期間中にアクセス不能とマークされている限り、現実のＣＰＵの場合のように「最終」ティックまで結果の代入を遅延させる必要はない。この便法は、ハードウェアがコシミュレートされているときにはもはや成り立たない。 In emulation, as long as the destination register is marked inaccessible during the instruction, there is no need to delay the assignment of the result until the “final” tick as in a real CPU. This expedient no longer holds when the hardware is co-simulated.
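Carrying a multi-cycle instruction's resources forward from tick to tick can be sketched as below. The `inflight` type and `propagate` function are hypothetical. The sketch assumes the result has already been written when the instruction was issued, so only the resource locks need to survive until the final tick.

```c
#include <assert.h>

typedef unsigned int uint;

typedef struct {
    uint resources;   /* units and registers still locked by the instruction */
    int  ticks_left;  /* ticks the instruction still occupies, including the current one */
} inflight;

/* Called at the end of a tick: ages every in-flight instruction and
 * returns the scoreboard mask to start the next tick with. */
uint propagate(inflight *units, int n)
{
    uint next = 0;
    assert(units != 0 || n == 0);
    for (int i = 0; i < n; i++) {
        if (units[i].ticks_left > 0)
            units[i].ticks_left--;          /* account for the tick just simulated */
        if (units[i].ticks_left > 0)
            next |= units[i].resources;     /* still busy: keep resources locked */
        else
            units[i].resources = 0;         /* finished: release resources */
    }
    return next;
}
```

With `ticks_left` set to 4 at issue, the resources are marked for the issuing tick plus three propagated ticks, which matches the four-tick marking described for lda.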

図７〜図９を参照して、サイクル精度のターゲットプロセッサのモデル化のプロセスについて説明する。Ｓ１１０５で、ターゲットプロセッサがオンボードキャッシュメモリを有するかどうかを判定する。ターゲットプロセッサがオンボードキャッシュメモリを有する場合、プロセスフローはＳ１１１５に進み、オンボードキャッシュアクセスを制御するアルゴリズムがモデル化される。ターゲットプロセッサがオンボードキャッシュメモリを有しない場合、プロセスフローはＳ１１２０に進む。 The process of modeling the target processor with cycle accuracy will be described with reference to FIGS. In step S1105, it is determined whether the target processor has an onboard cache memory. If the target processor has on-board cache memory, the process flow proceeds to S1115, and an algorithm for controlling on-board cache access is modeled. If the target processor does not have on-board cache memory, the process flow proceeds to S1120.

Ｓ１１２０で、ターゲットプロセッサが命令パイプラインアーキテクチャを有するかどうかを判定する。ターゲットプロセッサが命令パイプラインアーキテクチャを有する場合、Ｓ１１３０で、命令パイプライン充填プロセスがモデル化される。そうでない場合、プロセスフローはＳ１１３５に進む。 In S1120, it is determined whether the target processor has an instruction pipeline architecture. If the target processor has an instruction pipeline architecture, the instruction pipeline filling process is modeled at S1130. Otherwise, the process flow proceeds to S1135.

Ｓ１１３５で、命令およびリソースの表現を作成する。ターゲットプロセッサのアーキテクチャに依存して、各命令の効果、その命令が要求しロックするターゲットプロセッサ内のリソース、および、その命令の継続時間または終了条件を表現するデータ構造体が定義され初期化される。 In S1135, an instruction and resource representation is created. Depending on the architecture of the target processor, the effect of each instruction, the resources in the target processor that the instruction requests and locks, and a data structure that represents the duration or termination condition of the instruction are defined and initialized. .

図８を参照すると、Ｓ１１４０で、上記で定義されたリソース定義と、現在のマシン状態における命令のアクションとに対するインタプリタを作成することによって、命令がシミュレートされる。 Referring to FIG. 8, at S1140, the instruction is simulated by creating an interpreter for the resource definition defined above and the action of the instruction in the current machine state.

Ｓ１１４５で、ターゲットプロセッサがマルチパスアーキテクチャを有するかどうかを判定する。ターゲットプロセッサがマルチパスアーキテクチャを有しない場合、プロセスフローはＳ１１６５に進む。これに対して、ターゲットプロセッサがマルチパスアーキテクチャを有する場合、Ｓ１１５５で、現在のマシン状態において、他の命令が並列に実行されるべきかどうかを判定するアルゴリズムが生成される。他の命令が並列に実行されるべきである場合、プロセスフローはＳ１１４０に戻り、並列命令が実行される。そうでない場合、プロセスフローはＳ１１６５に進む。 In S1145, it is determined whether the target processor has a multipath architecture. If the target processor does not have a multipath architecture, the process flow proceeds to S1165. On the other hand, if the target processor has a multipath architecture, an algorithm is generated in S1155 that determines whether other instructions should be executed in parallel in the current machine state. If other instructions are to be executed in parallel, the process flow returns to S1140 and the parallel instructions are executed. Otherwise, the process flow proceeds to S1165.

図８および図９を参照すると、Ｓ１１６５で、シミュレートされる命令がマルチサイクル命令であるかどうかを判定する。シミュレートされる命令がマルチサイクル命令である場合、Ｓ１１７０で、上記のようにスコアボードが用意される。次に、Ｓ１１７５で、上記のようにリソースが伝搬され、プロセスフローはＳ１１８０に進む。 Referring to FIGS. 8 and 9, in S1165, it is determined whether the simulated instruction is a multi-cycle instruction. If the simulated instruction is a multi-cycle instruction, a scoreboard is prepared at S1170 as described above. Next, in S1175, resources are propagated as described above, and the process flow proceeds to S1180.

Ｓ１１８０で、非決定性タイミングが存在するかどうかを判定する。非決定性タイミングが存在する場合、終了の評価を行う。Ｓ１１９０で、現在のマシン状態における非決定性タイミングオペレーションに対する終了条件を判定するアルゴリズムを生成する。ターゲットプロセッサのアーキテクチャに従って、これは、エミュレートされるマシンレジスタ内の値に基づく計算を含むこともあり、また、外部信号の状態を調べるコードを生成することを含むこともある。Ｓ１１９２で、終了条件がチェックされ、命令が終了していない場合、プロセスフローはＳ１１７５に進む。命令が終了している場合、前に伝搬されたリソースがクリアされ、命令のシミュレーションは完了する。 In S1180, it is determined whether non-deterministic timing exists. If non-deterministic timing exists, end is evaluated. In S1190, an algorithm is generated that determines termination conditions for non-deterministic timing operations in the current machine state. Depending on the architecture of the target processor, this may include computations based on values in the emulated machine registers, and may also include generating code that examines the state of external signals. In S1192, the end condition is checked, and if the instruction has not ended, the process flow proceeds to S1175. If the instruction is finished, the previously propagated resource is cleared and the instruction simulation is complete.
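The distinction between fixed-duration instructions and instructions with non-deterministic timing can be sketched with a descriptor that carries either a tick count or a termination predicate. All names here (`machine`, `insn_desc`, `insn_state`, `wait_for_bus`, `insn_tick`) are hypothetical, and the `bus_ready` flag merely stands in for whatever external signal or register value the termination condition examines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int uint;

/* Minimal emulated machine state (assumed for illustration). */
typedef struct machine {
    uint regs[16];
    bool bus_ready;       /* stands in for an external signal */
} machine;

/* Static description of an instruction. */
typedef struct {
    uint resources;                     /* resources requested and locked */
    int  duration;                      /* fixed tick count, unused if done != NULL */
    bool (*done)(const machine *m);     /* termination condition, or NULL */
} insn_desc;

/* Dynamic state of one instruction in flight. */
typedef struct {
    const insn_desc *desc;
    int elapsed;
} insn_state;

/* Example termination condition: wait until the bus reports ready. */
bool wait_for_bus(const machine *m) { return m->bus_ready; }

/* Advance one tick. Returns true when the instruction has finished,
 * at which point its propagated resources can be cleared. */
bool insn_tick(insn_state *s, const machine *m)
{
    assert(s != NULL && s->desc != NULL && m != NULL);
    s->elapsed++;
    if (s->desc->done != NULL)
        return s->desc->done(m);        /* non-deterministic timing */
    return s->elapsed >= s->desc->duration;
}
```

Evaluating the predicate once per tick mirrors the flow in which an unfinished instruction loops back to resource propagation.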

図１０〜図１３を参照して、電子回路と、その電子回路を制御する制御プログラムとを含むシステムのコバリデーションの実行について説明する。Ｓ１３１０で、刺激が試験対象システムに入力される。Ｓ１３１５で、ＣＰＵが「動作可能」であるかどうかを判定する。動作可能である場合、プロセスフローはＳ１３４０に進む。動作可能でない場合、Ｓ１３２０で、ターゲットプロセッサはストールされ、Ｓ１３２５で、エミュレートされているキャッシュメモリに命令がロードされる。Ｓ１３３０で、キャッシュメモリがロードされているかを判定する。メモリが十分にロードされている場合、Ｓ１３３５で、ターゲットプロセッサがアンストール（ストール解除）される。 With reference to FIGS. 10 to 13, the execution of co-validation of a system including an electronic circuit and a control program that controls the electronic circuit will be described. In S1310, a stimulus is input to the system under test. In S1315, it is determined whether the CPU is "ready to run". If it is, the process flow proceeds to S1340. If it is not, the target processor is stalled in S1320 and instructions are loaded into the emulated cache memory in S1325. In S1330, it is determined whether the cache memory has been loaded. If the memory is sufficiently loaded, the target processor is unstalled in S1335.

Ｓ１３４０で、命令パイプラインが、ターゲットプロセッサの内部データフローモデルから導出される信号のシーケンスを用いて充填される。図１１を参照すると、Ｓ１３４５で、ターゲットプロセッサの内部データバスのアクションをエミュレートするために、所定数のクロックサイクルに基づく遅延が実行される。この所定数は、内部データバスの幅に基づき、１〜数クロックサイクルの範囲とすることが可能である。 At S1340, the instruction pipeline is filled with a sequence of signals derived from the target processor's internal data flow model. Referring to FIG. 11, at S1345, a delay based on a predetermined number of clock cycles is performed to emulate the internal data bus action of the target processor. This predetermined number can range from one to several clock cycles based on the width of the internal data bus.

次に、Ｓ１３５０で、実行に利用可能な命令を解釈するために、ターゲットプロセッサの内部デコードサイクルが実行される。この命令デコードサイクルは、上記のように実行される。 Next, at S1350, an internal decode cycle of the target processor is executed to interpret the instructions available for execution. This instruction decode cycle is executed as described above.

ターゲットプロセッサの内部デコードサイクルが実行された後、Ｓ１３５５で、スケジューリングされた命令が上記のようにパイプライン切断をインプリメントしているかどうかを判定する。パイプライン切断が要求される場合、Ｓ１３６５で、命令スケジューラがストールされる。パイプライン切断が要求されない場合、プロセス制御フローはＳ１３７０に進む。 After the internal decode cycle of the target processor is executed, it is determined in S1355 whether the scheduled instruction implements pipeline disconnection as described above. If pipeline disconnection is requested, the instruction scheduler is stalled in S1365. If pipeline disconnection is not requested, the process control flow proceeds to S1370.

次に、Ｓ１３７０で、スケジューリングされた命令が、適当な命令パイプラインあるいはハードウェアコンポーネントに転送される。適当な命令パイプラインへの命令の転送後、Ｓ１３７５で、各命令のサイクル実行時間が計算される。 Next, in S1370, the scheduled instruction is transferred to the appropriate instruction pipeline or hardware component. After the instruction has been transferred to the appropriate instruction pipeline, the cycle execution time of each instruction is calculated in S1375.

図１２を参照すると、Ｓ１３８０で、非決定性タイミングが存在するかどうかを判定する。非決定性タイミングが存在する場合、Ｓ１３９０で、Ｓ１１９０で生成された終了条件を判定するアルゴリズムが、各クロックサイクルの終端で実行される。 Referring to FIG. 12, it is determined in S1380 whether non-deterministic timing exists. If non-deterministic timing exists, in S1390, an algorithm for determining the termination condition generated in S1190 is executed at the end of each clock cycle.

Ｓ１４００で、エミュレーションは、計算された命令時間に対するリソース割当てを伝搬させる。Ｓ１４０５で、計算された実行サイクルの終端で利用可能な結果が出力される。この結果は、エミュレートされたレジスタやフラグの値を更新することや、外部信号の値を変化させることを含むことがある。 In S1400, the emulation propagates the resource allocation for the calculated instruction time. In S1405, a usable result is output at the end of the calculated execution cycle. This result may include updating the value of the emulated register or flag, or changing the value of the external signal.

次に、Ｓ１４１０で、次の命令サイクルで割込みハンドラがスケジューリングされるべきかどうかを判定する。スケジューリングされるべきである場合、Ｓ１４２０で、次の命令サイクルで実行するために割込みハンドラがスケジューリングされる。 Next, in S1410, it is determined whether an interrupt handler should be scheduled in the next instruction cycle. If so, at S1420, the interrupt handler is scheduled for execution in the next instruction cycle.

図１３を参照すると、Ｓ１４２５で、刺激からの結果が試験対象システムから出力される。この出力は、外部観察者に対して示されるシミュレーションの進行の視覚的表示であることも可能であり、あるいは、後の検討や後処理のためのそのような変化のログ（例えば、図２０によって例示されるようなトレース出力）の生成であることも可能である。Ｓ１４３０で、試験対象システムに入力されるべき刺激がまだあるかどうかを判定し、もうない場合、プロセスは終了する。まだある場合、プロセス制御はＳ１３１０に進む。 Referring to FIG. 13, in S1425 the results of the stimulus are output from the system under test. This output can be a visual display of the progress of the simulation shown to an external observer, or it can be a log of such changes generated for later review and post-processing (for example, a trace output such as that illustrated in FIG. 20). In S1430, it is determined whether there are more stimuli to be input to the system under test; if there are none, the process ends. If there are, process control returns to S1310.

次に、試験対象システムへの刺激の入力により、試験対象システムに結果を出力させることについて詳細に説明する。 Next, a detailed description will be given of causing a test target system to output a result by inputting a stimulus to the test target system.

一般に、いくつかの異なるプロセスが、試験対象システムの異なる部分をシミュレートする。これらのプロセスは試験対象システムに固有であり、ケースバイケースで修正されなければならない。図１４を参照すると、設計された試験サイクルを継続的に繰り返して、コシミュレートされるハードウェアおよびソフトウェアを十分に作用させるために、例として、グローバルループコントローラ２３が用いられる。試験刺激プロセス２４は、「外界」から試験対象システムに刺激を提供する。試験刺激プロセス２４のもう１つの機能は、誤った入力がどのように処理されるかを見るために、誤入力を試験対象システムに入力することである。例えば、試験刺激プロセス２４は、データ転送トランザクションの応答側として作用し、試験対象システムの再試行メカニズムを試験するために、プロトコルエラーで応答することが可能である。バスアービトレーションプロセス２２は、試験対象システムと試験刺激プロセスの間のバスアービトレーションを提供する。試験対象回路２１は、単一のプロセスであることも可能であり、あるいは代替例では、回路設計の複雑さに依存して、数個のプロセスであることも可能である。 In general, several different processes simulate different parts of the system under test. These processes are specific to the system under test and must be modified on a case-by-case basis. Referring to FIG. 14, a global loop controller 23 is used, as an example, to repeat the designed test cycle continuously so that the co-simulated hardware and software are thoroughly exercised. The test stimulus process 24 provides stimuli from the "outside world" to the system under test. Another function of the test stimulus process 24 is to feed erroneous inputs to the system under test in order to see how they are handled. For example, the test stimulus process 24 can act as the responder in a data transfer transaction and respond with a protocol error to test the retry mechanism of the system under test. The bus arbitration process 22 provides bus arbitration between the system under test and the test stimulus process. The circuit under test 21 can be a single process or, alternatively, several processes, depending on the complexity of the circuit design.

さらに、信号表示プロセスにより、ステップ実行を行い、グローバル状態に対する個々のステートメントの効果をウォッチすることが可能である。 In addition, the signal display process allows stepping and watching the effect of individual statements on the global state.

次に、電子回路のハードウェアモデルを命令セットシミュレーションに結合して、軽量コンピュータオペレーティングシステム環境で実行される試験対象システムを作成することについて、さらに詳細に説明する。 Next, combining the hardware model of the electronic circuit with an instruction set simulation to create a system under test to be executed in a lightweight computer operating system environment will be described in further detail.

第１段階として、ハードウェアの動作記述を調べ、各ハードウェアプロセスが個別のソフトウェアプロセスによって表現されるプロセス構造が構成される。各プロセス内では、動作記述が、軽量オペレーティングシステムのプログラミング言語に翻訳される。命令セットシミュレータがこの言語で同様にコーディングされる。シミュレータに必要なファブリックプロセスおよび刺激生成プログラムも同様である。第２段階として、プログラミング言語ソースファイルは、軽量オペレーティングシステムが動作するプロセッサに対するオブジェクトコードにコンパイルされる。第３段階として、これらのファイルが、軽量オペレーティングシステムに対するオブジェクトコードとリンクされ、完全な実行環境が形成される。第４段階として、シミュレータが実行されるシステムに、実行可能ファイルがロードされる。第５段階として、シミュレータが、軽量オペレーティングシステムの制御下で実行され、出力が観察される。 In the first stage, the behavioral description of the hardware is examined, and a process structure is constructed in which each hardware process is represented by a separate software process. Within each process, the behavioral description is translated into the programming language of the lightweight operating system. The instruction set simulator is coded in the same language, as are the fabric process and the stimulus generation programs that the simulator requires. In the second stage, the programming language source files are compiled into object code for the processor on which the lightweight operating system runs. In the third stage, these files are linked with the object code of the lightweight operating system to form a complete execution environment. In the fourth stage, the executable file is loaded onto the system on which the simulator is to run. In the fifth stage, the simulator is run under the control of the lightweight operating system and its output is observed.

次に、図１４を参照して、電子回路と、その電子回路をターゲットとする制御プログラムとのコバリデーションのためのコンピュータシステムについて、簡単に説明する。電子回路および制御プログラムは、軽量コンピュータシステムオペレーティング環境で実行される所定のコンピュータ言語を用いてシミュレートされる。また、このコンピュータシステムは、上記のルーチンを使用する。Ｓ０９００の設計ステップの終了後、コンピュータシステムは、ファブリックプロセス１０を有する。ファブリックプロセス１０は、複数のグローバル信号の状態を管理する信号メンテナサブシステム１２と、ソフトウェアモデルおよび命令セットシミュレータによって使用されるクロック信号を生成するクロックジェネレータサブシステム１３と、ソフトウェアモデルおよび命令セットシミュレータからのイベントをウェイトするプロセスキューを管理するキューメンテナサブシステム１４と、所定のタイミング間隔で少なくとも１つのプロセスを実行するスケジューラサブシステム１６と、少なくとも信号メンテナサブシステム１２、クロック信号ジェネレータサブシステム１３、キューメンテナサブシステム１４およびスケジューラサブシステムの実行を制御する中央制御プロセス１１とを有する。 Next, with reference to FIG. 14, a computer system for co-validation of an electronic circuit and a control program targeting that electronic circuit will be briefly described. The electronic circuit and the control program are simulated using a predetermined computer language that runs in a lightweight computer operating system environment. The computer system also uses the routines described above. After completion of the design step S0900, the computer system has a fabric process 10. The fabric process 10 has a signal maintainer subsystem 12 that manages the states of a plurality of global signals; a clock generator subsystem 13 that generates the clock signals used by the software model and the instruction set simulator; a queue maintainer subsystem 14 that manages a queue of processes waiting for events from the software model and the instruction set simulator; a scheduler subsystem 16 that runs at least one process at predetermined timing intervals; and a central control process 11 that controls the execution of at least the signal maintainer subsystem 12, the clock signal generator subsystem 13, the queue maintainer subsystem 14, and the scheduler subsystem.

In addition to the microkernel support processes, the computer system has the fabric process 10, an instruction set simulation process, and processes representing the hardware components and the stimulus system, all of which operate in conjunction with one another. The central control process 11 of the co-simulation environment manages the overall function and interaction of the following five major subsystems.

(a) a signal maintainer 12 that manages the state of all global signals;
(b) a clock generator 13 that generates the clock signals;
(c) a queue maintainer 14 that manages the queues of processes waiting for events;
(d) a scheduler 16 that executes each active process on every nanosecond tick;
(e) a display generator 15 that drives a display showing signal changes to the user.

One example of a test stimulus is a signal indicating that data is available on an external interface. A stimulus can also simulate an error condition, such as a parity check error generated by ECC memory.

Referring to FIG. 15, the entire simulation environment runs as a set of processes within the lightweight operating system 30. The individual processes include a fabric process 31, an instruction set simulation process 32, and VHDL processes 33, 34. The VHDL processes 33, 34 correspond one-to-one with the instantiated hardware processes of the hardware being modeled. All signal interactions 35 pass through the fabric process 31 as messages scheduled by the operating system 30.

Within such a software-based simulation system, only apparent parallelism can be achieved, because a single-processor system has no true concurrency. This raises problems with "simultaneous" changes to signals and with the concept of time. There are three problems to consider.

• Problem 1: If two processes are runnable and both change the value of a signal during their execution, the result may depend on the order in which the processes run. For example, suppose the signal "test" has the value '1' at the start of execution:
Process 1: if (test = '1') then test <= '0'; end if;
Process 2: test <= '1';
If process 1 runs before process 2, the (naive) result is that "test" is set to '1' at the end of execution. If process 2 runs before process 1, the result is that "test" is set to '0' at the end of execution.

• Problem 2: If a process changes a signal and reads it within the same execution period, the result is non-deterministic. For example, consider the following code fragment:

Process 1: test <= '1';
if (test = '1') then output <= '1'; end if;
In a real system, this may or may not set output to '1'.

• Problem 3: Processes need an absolute time value for clock signals that run at a known frequency. Absolute time is also needed for more complex duty cycles and for constructs such as the following:

Clock process: clk <= !clk after 5 ns;
Conventional VHDL simulators usually handle the first two problems with the concept of a "delta", that is, a sub-tick synchronization point. With sub-tick synchronization points, all signal values are frozen at the start of the delta. These frozen values are returned in response to requests for signal values, and the values are updated only at the end of the delta. This has the advantage of making the simulation deterministic and independent of the order in which the processes are simulated. It also requires the VHDL programmer to insert extra wait statements into the code to force alignment to the next delta. For problem 2 above, a statement that forces a delta boundary is inserted to resolve the non-determinism:

Process 1: test <= '1';
wait for 0 ns;
if (test = '1') then output <= '1'; end if;
Process 1 will now consistently set output to '1'. However, there is no guarantee that the synthesized result will behave as expected.
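The delta mechanism can be illustrated with a two-phase signal store: reads during a delta see the value frozen at its start, writes go to a pending slot, and a commit at the end of the delta publishes them. The following C sketch is illustrative only. The names `delta_signal`, `sig_read`, `sig_write`, `delta_commit` and `run_problem2` are hypothetical and are not part of the fabric process described here, and the initial value '0' for both signals is an assumption.

```c
/* One signal with a frozen (current) value and a pending (next) value. */
typedef struct {
    char current;   /* value returned by every read during this delta */
    char next;      /* value scheduled by writes during this delta */
} delta_signal;

static void sig_init(delta_signal *s, char v) { s->current = v; s->next = v; }
static char sig_read(const delta_signal *s)   { return s->current; }
static void sig_write(delta_signal *s, char v) { s->next = v; }
static void delta_commit(delta_signal *s)     { s->current = s->next; }

/* Process 1 of Problem 2: test <= '1'; if (test = '1') then output <= '1'; */
static void process1(delta_signal *test, delta_signal *output, int wait_between)
{
    sig_write(test, '1');
    if (wait_between) {          /* models "wait for 0 ns": force a delta boundary */
        delta_commit(test);
        delta_commit(output);
    }
    if (sig_read(test) == '1')
        sig_write(output, '1');
}

/* Runs process 1 once from test = '0', output = '0'; returns the final output. */
static char run_problem2(int wait_between)
{
    delta_signal test, output;
    sig_init(&test, '0');
    sig_init(&output, '0');
    process1(&test, &output, wait_between);
    delta_commit(&test);         /* end of the delta: publish pending values */
    delta_commit(&output);
    return sig_read(&output);
}
```

Under these assumptions, `run_problem2(0)` leaves output at '0' because the read still sees the frozen value of test, while `run_problem2(1)` sets output to '1' because the forced boundary publishes the write before the read.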

In the current implementation of the fabric process, instantaneous values are returned (so this example behaves as if the wait had been inserted implicitly), but the result depends on the order in which the processes execute. As will be apparent to those skilled in the art, the system could be modified to detect multiple changes to one signal within a single "delta" and report them in the simulation output, or the implementation could be changed to mimic other behavior.

The simulator generates absolute time values. The clock subsystem of the fabric process currently uses a basic time slice of 1 nanosecond, which is sufficient to track current clock signals with the required accuracy. This may need to change as multi-GHz systems become more widespread. Support for clock signals was built into the central process to simplify the interface for common operations.
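With a 1 ns time slice, the level of a free-running clock such as `clk <= !clk after 5 ns` can be derived directly from the tick count. The following C sketch shows one way to do that. The function names are hypothetical, the clock is assumed to start low at tick 0, and `half_period_ns` is assumed to be greater than zero.

```c
/* Level of a clock that starts low and toggles every half_period_ns ticks. */
static int clock_level(unsigned long tick_ns, unsigned long half_period_ns)
{
    return (int)((tick_ns / half_period_ns) % 2UL);
}

/* Non-zero when the clock goes from low to high exactly at tick_ns. */
static int is_rising_edge(unsigned long tick_ns, unsigned long half_period_ns)
{
    if (tick_ns == 0)
        return 0;                /* no previous tick to compare against */
    return clock_level(tick_ns - 1, half_period_ns) == 0 &&
           clock_level(tick_ns, half_period_ns) == 1;
}

/* Counts the rising edges seen in ticks 0..last_tick_ns inclusive. */
static unsigned long count_rising_edges(unsigned long last_tick_ns,
                                        unsigned long half_period_ns)
{
    unsigned long t, n = 0;
    for (t = 0; t <= last_tick_ns; t++)
        n += (unsigned long)is_rising_edge(t, half_period_ns);
    return n;
}
```

With a half period of 5 ns the sketch produces one rising edge every 10 ns, so a coarser time slice than the half period could not represent this clock.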

Most behavioral VHDL is written with triggers that operate in one of two modes. In asynchronous mode, a signal change made the processes that react to that signal runnable in the next nanosecond. In synchronous (clocked) mode, signals were examined only at an edge (rising or falling) of the clock signal. These modes are determined during the behavioral translation in S1000 of FIG. 2. Whether a process was runnable was determined at the start of each nanosecond tick, based on the signal changes that occurred in the previous tick and the clock transitions occurring at that moment. A process that became runnable during a nanosecond tick was not started until the next tick, which provided an environment similar to the "delta" mechanism.

A notable feature of this interface method is that processes waiting for events are in a true wait state within the microkernel core. They are not checked on each clock tick and impose no overhead on the rest of the system. The fabric process handles all checking of conditions. By removing the overhead of a conventional operating system and running the simulation directly under the lightweight operating system, the invention not only runs faster, but also implements the representation of the emulated state directly through the scheduling core of the operating system itself.

Because each process executes translated code, trace output can be displayed on the console. If the system-wide ECHOVHDL variable is non-zero, the simulation environment makes changes in signal values visible to the observer by printing a string prefixed with the nanosecond-tick timestamp. Combined with the signal display subsystem of the co-simulation environment, this makes it possible to single-step and watch the effect of individual VHDL statements on the global state.
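A timestamped trace line of this kind might be produced as in the following C sketch. Only the ECHOVHDL variable and the tick-timestamp prefix come from the description above. The exact line format, the function `trace_signal` and its parameters are assumptions made for illustration.

```c
#include <stdio.h>

/* Stand-in for the system-wide ECHOVHDL variable (non-zero enables tracing). */
static int ECHOVHDL = 1;

/*
 * Formats "<tick> ns: <signal> = <value>" into buf when tracing is enabled.
 * Returns the number of characters written, or 0 when tracing is off or the
 * line does not fit in buf.
 */
static int trace_signal(char *buf, size_t len, unsigned long tick_ns,
                        const char *signal, unsigned value)
{
    int n;
    if (ECHOVHDL == 0 || len == 0)
        return 0;
    n = snprintf(buf, len, "%lu ns: %s = %u", tick_ns, signal, value);
    return (n < 0 || (size_t)n >= len) ? 0 : n;
}
```

A caller would pass the formatted buffer to the console output routine. Formatting into a buffer rather than printing directly keeps the sketch independent of how the console is reached.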

FIG. 16 shows an embodiment of a computer system having a processor 40, I/O devices 43 and a video display terminal 41. The I/O devices 43 include, but are not limited to, a keyboard and a mouse. Other devices, such as a touchpad, can also be used. The computer system also has a memory 42 (not shown, but incorporated in the processor 40) containing software instructions adapted to enable the computer system to perform the steps of the invention.

The computer system can also include a server 45 connected to the processor 40 by a data link 44. The data link 44 is a conventional data link (for example, Ethernet, twisted pair, FTP, HTTP). The server 45 provides access to a program library 46 connected to it. The program library 46 can also provide software instructions adapted to enable the computer system to perform the steps of the invention. As described above, the program library 46 can be embodied on any of a variety of media well known to those skilled in the art (for example, floppy (registered trademark) disk, hard disk, optical disk, cartridge, tape, CD-ROM, writable CD). In the computer system shown in FIG. 16, the software instructions in the memory 42 allow the processor 40 to access the program library 46 by accessing the server 45 through the data link 44. The computer system shown in FIG. 16 is not intended to be limiting in any way, and many different computer systems that implement the invention can be combined.

FIG. 17 shows the block-level design of an exemplary system modeled by the invention. The system has an I960 processor 35, glue logic 36 and a UART 37. This exemplary system does not limit the scope of the invention; it serves the description that follows. The sample system has an I960 instruction set simulator whose external data bus is connected to both the memory subsystem 38 and the address decoding glue logic 36. The arrows indicate whether each signal is unidirectional or bidirectional. The glue logic 36 drives the chip-select signals for the memory and the UART chip 37, and provides the UART 37 with the various signal presentations it requires. The design was deliberately divided into multiple timing domains by two clocks, a processor clock and a bus clock.

To demonstrate the co-simulation environment, the I960 instruction set simulator was linked to a data bus based on the MIPS architecture, using information from the User Guide for the NEC VR4300 processor. Because there were no external bus mastering devices on the bus, the system used a subset of the full MIPS bus. The selected set of signals was sufficient to implement devices with various response times as well as various burst capabilities.

The bus operates in synchronous mode, and activity is scheduled on the rising edge of the busclk signal. The signals and protocol are described below. The following bus signals are used.

• SysAD [31..0]: multiplexed system address and data bus;
• SysCmd [4..0]: command bus that identifies the data and the transaction type. It adds an 8-word burst to the read modes, but does not use the error indication commands or the external non-response mode;
• EOK: set low when the external device can accept a command;
• EValid: set low when the external device has placed valid data on the bus;
• PMaster: set high when the processor has released bus ownership (so that the external device can issue a read response).

In this simple model, which has no external bus mastering devices, no other protocol signals from the full MIPS bus were used.

From the processor's point of view, and with reference to FIG. 18, the bus is driven for a write in the following sequence.

1. The processor waits for EOK to go low (indicating that no external device is using the bus);
2. The processor drives the address onto SysAD and the appropriate Write command onto SysCmd. It also sets PValid low (indicating that there is valid data on the bus);
3. The processor waits for EOK to go high (indicating that the external device has accepted the command);
4. The processor drives the data words onto the SysAD bus in sequence, one per busclk cycle, and marks the last (or only) word with the 'no-more' flag;
5. At the end of the last data cycle, the processor sets PValid high (indicating that the processor is no longer driving the bus) and tri-states SysAD and SysCmd.
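The five write steps above can be sketched as a small state machine that advances once per busclk rising edge. This is an illustration under stated assumptions, not the simulator's own code. The command encodings (`CMD_WRITE`, `CMD_DATA`, `CMD_LAST`), the struct layouts and the function `write_step` are hypothetical, and tri-stating is modeled by a simple `bus_driven` flag.

```c
#include <stddef.h>

typedef enum { W_WAIT_IDLE, W_WAIT_ACCEPT, W_DATA, W_RELEASE, W_DONE } write_state;

typedef struct {
    int eok;            /* driven by the external device */
    int pvalid;         /* driven by the processor, active low */
    int bus_driven;     /* 0 while SysAD and SysCmd are tri-stated */
    unsigned sysad;
    unsigned syscmd;
} bus_signals;

typedef struct {
    write_state state;
    unsigned addr;
    const unsigned *data;
    size_t count;       /* number of data words, at least 1 */
    size_t sent;
} write_op;

#define CMD_WRITE 0x02u /* hypothetical encodings, not the real SysCmd values */
#define CMD_DATA  0x10u
#define CMD_LAST  0x11u /* data word carrying the 'no-more' flag */

/* Advances the write by one busclk rising edge. */
static void write_step(write_op *op, bus_signals *bus)
{
    switch (op->state) {
    case W_WAIT_IDLE:                    /* step 1: wait for EOK low */
        if (bus->eok == 0) {             /* step 2: drive address and command */
            bus->sysad = op->addr;
            bus->syscmd = CMD_WRITE;
            bus->pvalid = 0;
            bus->bus_driven = 1;
            op->state = W_WAIT_ACCEPT;
        }
        break;
    case W_WAIT_ACCEPT:                  /* step 3: wait for EOK high */
        if (bus->eok == 1)
            op->state = W_DATA;
        break;
    case W_DATA:                         /* step 4: one data word per cycle */
        bus->sysad = op->data[op->sent++];
        bus->syscmd = (op->sent == op->count) ? CMD_LAST : CMD_DATA;
        if (op->sent == op->count)
            op->state = W_RELEASE;
        break;
    case W_RELEASE:                      /* step 5: release the bus */
        bus->pvalid = 1;
        bus->bus_driven = 0;
        op->state = W_DONE;
        break;
    case W_DONE:
        break;
    }
}
```

Calling `write_step` on every rising edge of busclk, with the external device model updating `eok` between calls, walks a burst of any length through the protocol without blocking the caller.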

Referring to FIG. 19, a processor read proceeds as follows.

1. The processor waits for EOK to go low (indicating that no external device is using the bus);
2. The processor drives the address onto SysAD and the appropriate Read command onto SysCmd. It also sets PValid low (indicating that there is valid data on the bus);
3. The processor waits for EOK to go high (indicating that the external device has accepted the command);
4. The processor sets PValid high (indicating that the processor is no longer driving the bus) and sets PMaster high to release control of the bus;
5. The processor waits for EValid to be set low (indicating that the external device has placed valid data on the bus). It then reads the data words from the SysAD bus in sequence, one per busclk cycle;
6. The processor waits for EValid to be set high, then sets PMaster low to regain control of the bus.

The UART used the following very simple signal model.

• uartD{7..0} is a byte-wide bidirectional data bus that supports single-data (non-burst) accesses to and from the chip;
• uartA is a 1-bit "address" bus used to select one of two internal registers. Address '0' is the control register and address '1' is the data register;
• uartR is set low when the processor issues a read request;
• uartW is set low when the processor issues a write request;
• uartS is set low to select the UART. The address and data bus values must remain valid for as long as the chip is selected;
• uartI is set high to signal an interrupt from the UART to the processor.

busclk is the synchronous clock signal for the chip, and all transitions are sampled on the rising edge of the clock.

Because the UART requires the address and data values to be presented simultaneously, the glue logic must handle the translation from the multiplexed SysAD bus.

The control register is used to access the internal UART registers through a two-stage access: an index number is written first, followed by 8 bits of data. Control register 0 can be accessed directly by writing a control operation to the control address (a control operation has a binary value outside the range of the indirect register numbers). In this minimal model, only the control operations listed in Table 2 above were implemented. These were sufficient to provide output in interrupt-driven and polled modes (there is no input).
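The two-stage access can be sketched in C as follows. The register count `UART_INDIRECT_REGS`, the struct `uart_ctrl` and the function `uart_ctrl_write` are hypothetical. The sketch only assumes what the paragraph states: a value inside the indirect register range selects a register for the next data write, and a value outside that range is a direct control operation.

```c
#define UART_INDIRECT_REGS 8u   /* hypothetical number of indirect registers */

typedef struct {
    unsigned char regs[UART_INDIRECT_REGS];
    int awaiting_data;              /* 1 after an index has been written */
    unsigned char index;            /* register selected by the last index write */
    unsigned char last_control_op;  /* last direct control operation, 0 if none */
} uart_ctrl;

/* Handles one write to the control address (uartA = '0'). */
static void uart_ctrl_write(uart_ctrl *u, unsigned char value)
{
    if (u->awaiting_data) {                    /* second stage: 8 bits of data */
        u->regs[u->index] = value;
        u->awaiting_data = 0;
    } else if (value < UART_INDIRECT_REGS) {   /* first stage: index number */
        u->index = value;
        u->awaiting_data = 1;
    } else {                                   /* direct control operation */
        u->last_control_op = value;
    }
}
```

Because the data stage accepts any 8-bit value, the range check applies only to the first write of each pair.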

The system memory was assumed to support back-to-back burst-mode transfers over the SysAD lines at busclk speed. The memory controller was given access to the SysAD, SysCmd, PMaster and EValid signals on the processor bus and was clocked by busclk. The memory controller used the following two additional signals to the glue logic.

• memS is the memory chip-select, which is set low when the memory is the target of the command and address data on the bus;
• memEOK is set low by the memory controller when it can accept a command.

The internal details of a "real" controller (for example, RAS and CAS generation) were not modeled, and every memory access was answered in a single (bus) clock cycle.

The UART and memory implementations were both written directly in behavioral C, using the VHDL signal library to control their operation according to the architecture specification above. In addition, the UART implementation stored the character strings it was instructed to output in the ISS log file. This means that the simulator produced a visible record of the output of the simulated system. The glue logic was written as a single entity in VHDL that implements a state machine for address decoding and signal translation. The VHDL was translated into C according to the method of S1000.

Next, the integration of the instruction set simulator described above into the co-simulation environment is described. Integrating the ISS into the fabric environment required several changes to the ISS. Most of them concerned timing, because I/O operations were no longer of predictable duration.

The following code was added to the process initialization sequence to initialize the fabric interface, the synchronous clock of the process and its set of signals.

FabInit();
/* Specify the pclock period */
fabric_set_clock("pclk", 0, 5);
/* Initial state of the processor signals */
Signal(PMaster, 0);
Signal(PValid, 1);
/* Data and command buses float until driven */
SignalZ(SysAD);
SignalZ(SysCmd);
The routine that simulates a single processor clock tick was changed to wait for a rising-edge change on pclk from the central fabric process, by adding a call to
WaitClkR();
before each tick is executed. This also has the effect of synchronizing the processor with the other hardware components. Note that this sequence of operations is determined by the architecture of the selected data bus model and is therefore specific to the system being modeled. An integration step of this kind must be carried out for each design that is co-simulated.

The basic strategy for input/output (fetch and store) operations remained the same. Depending on the memory region descriptor, the data cache (or instruction cache) is consulted first, and if the data is not in the cache, one of the external devices is accessed. The main change was that all address decoding was done in the glue logic, and the "hardware" description and the device_store routine were no longer used.

The interface itself followed the description above. For example, a new operation was started as follows (assuming that the variable this points to the I/O request data structure).

if (GetSignal(EOK) == 0)
{
Signal(PValid, 0);
Signal(SysAD, this->addr);
Signal(SysCmd, this->op);
this->memstate = 1;
}
sbmarksb(this->inst, 2);
That is, once EOK has been set low, the signals are driven onto the bus. Note that whether or not the command was started, the unit is stalled for the next tick and the resource must be marked as unavailable. Note also that this routine polls all of its signals, because the wait point is in the 'step' routine.

With no external bus mastering devices and no error handling, the external interface was captured in a small state machine with fewer than 10 states. This machine also handled instruction fetch cycles.

Special care must be taken when integrating interrupt signals into such a system. In a single-process implementation, the interrupt-pending control register of the simulated processor can be modified directly. In the co-simulation system, this must be replaced by a call in the UART code to
Signal(uartI, 1);
and a corresponding test in the ISS process:
if (GetSignal(uartI) == 1)
{
IGEN(7);
}
This approach caused problems when the Bus Control Unit was busy and the processor could not service the interrupt immediately, or when an instruction fetch was required before the interrupt could be taken. Without special handling, the interrupt would be processed on every clock tick until the processor's priority register had been updated to the interrupt priority.

This was handled with a flag that is set when interrupt processing starts. The flag bypassed all pipeline scheduling for as long as there was activity in the bus control unit (including activity scheduled by the interrupt dispatcher). Setting this flag ensured that all I/O operations had completed and the processor priority had been updated before the operation that checks the interrupt-pending register was scheduled again.
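The effect of such a flag can be sketched in C as below. All names (`iss_state`, `iss_tick`, `in_dispatch`, `bcu_busy`) are hypothetical, and the comparison of the current priority with the interrupt priority is an assumption about how the interrupt test is gated. The sketch only shows that the interrupt is dispatched once, not once per tick, while the bus control unit drains.

```c
typedef struct {
    int in_dispatch;      /* flag set when interrupt processing starts */
    int bcu_busy;         /* bus control unit still has outstanding activity */
    unsigned priority;    /* current processor priority */
    unsigned dispatched;  /* how many times the interrupt was dispatched */
} iss_state;

/* One processor tick; uart_irq is the sampled level of the uartI signal. */
static void iss_tick(iss_state *s, int uart_irq, unsigned irq_priority)
{
    if (s->in_dispatch) {
        /* Bypass normal scheduling until all bus activity has completed. */
        if (!s->bcu_busy) {
            s->priority = irq_priority;   /* priority register is now updated */
            s->in_dispatch = 0;
        }
        return;
    }
    if (uart_irq == 1 && s->priority < irq_priority) {
        s->in_dispatch = 1;               /* start interrupt processing once */
        s->dispatched++;
    }
}
```

Without the `in_dispatch` check, every tick on which uartI is still high and the priority is still unchanged would dispatch the interrupt again.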

The processor clock was set to run at 100 MHz. Within the processor, the internal bus between the instruction cache and the instruction pipe was assumed to be a full 128 bits wide and to run at full speed. The instruction pipe is therefore filled from the cache with zero latency. The data cache was assumed to be level 1, that is, to run at full processor speed. Memory accesses ran at busclk and therefore required one (processor) wait state per access.

The UART was assumed to be connected to a 9600 baud external serial line, so one character was sent every 1 millisecond. Because there is no buffering inside the UART, the 'Tx Available' signal was delayed by 1 millisecond for each write operation.
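The 1 millisecond figure can be checked with a short calculation. The C sketch below converts a baud rate into 1 ns simulator ticks per character. The function name is hypothetical, and the frame of 10 bits per character (one start bit, eight data bits, one stop bit) is an assumption, since the frame format is not stated here.

```c
/* Nanosecond ticks needed to shift out one character at the given baud rate. */
static unsigned long char_time_ns(unsigned long baud, unsigned bits_per_char)
{
    return (unsigned long)((1000000000ULL * bits_per_char) / baud);
}
```

Under that assumption, `char_time_ns(9600, 10)` gives 1041666 ticks, about 1.04 ms, which is consistent with the rounded delay of 1 millisecond per write.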

This system was used to repeat the execution of the earlier code example, the "Hello World" program. The output shown in FIG. 20 demonstrates a working system, run on a development board containing a MIPS-III architecture CPU. This architecture differs from that of the CPU being simulated (Intel I960) and demonstrates how a more capable processor can be used for simulation.

If the target CPU itself has little processing power, it may be unable to support the simulation environment and may not produce results in a reasonable time. Because device drivers for the lightweight operating system are easy to write, a rapid prototyping and simulation environment can be provided. There is no reason to restrict the simulator to a single CPU.

Apart from the lightweight operating system core itself, the simulation system is independent of any particular machine architecture. Because of its very low overhead and memory requirements, the lightweight operating system allows the simulation to make maximum use of the available RAM without paging or swapping. Since such systems generally need only a minimal set of external peripherals, a high-performance CPU could also be used as a simulation engine or accelerator for a conventional workstation.

Referring to FIG. 16, according to another feature of the invention, the intended target CPU is located on a development board 50 connected to the simulation system 40 through a communication medium 51. The communication medium 51 can also be a data bus that coexists within the system 40. The target application runs directly on the CPU of the development board 50, which avoids the need for an instruction set simulator. This approach requires support from the CPU instruction set to maintain cycle-level accuracy, and its realization for any given CPU can be derived from the principles already described.

The foregoing description of various features of the invention has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed; various modifications and variations are possible in light of the above description or may be acquired from practice of the invention. The principles of the invention and its practical application were described so that those skilled in the art can use the invention in various embodiments, with various modifications suited to the particular use contemplated.

Accordingly, although only some features of the present invention have been specifically described, it will be apparent that various modifications can be made without departing from the spirit and scope of the present invention. Furthermore, acronyms are used merely to enhance the readability of this specification. It should be noted that such acronyms are not intended to narrow the generality of the terms used and should not be construed as limiting the scope of the claims.

FIG. 1 is a diagram showing a conventional simulation system having a multilayer simulation environment.
FIG. 2 is a diagram showing the basic process flow of a method for co-validation of an electronic circuit and software targeting that circuit, according to a feature of the present invention.
FIG. 3 is a diagram showing a detailed process flow of a method for translating a hardware behavioral description into a software model, according to a feature of the present invention.
FIG. 4 is a diagram showing a detailed process flow of a method for translating a hardware behavioral description into a software model, according to a feature of the present invention.
FIG. 5 is a diagram showing an exemplary set of commands for a command line interface to an instruction set simulator.
FIG. 6 is a diagram showing a typical instruction flow in an Intel i960 processor.
FIG. 7 is a diagram showing a process flow for determining the cycling of an instruction as it passes through a target microprocessor, according to a feature of the present invention.
FIG. 8 is a diagram showing a process flow for determining the cycling of an instruction as it passes through a target microprocessor, according to a feature of the present invention.
FIG. 9 is a diagram showing a process flow for determining the cycling of an instruction as it passes through a target microprocessor, according to a feature of the present invention.
FIG. 10 is a diagram showing the process flow of an emulated target processor when a co-validation simulation is executed in a lightweight computer operating system environment, according to a feature of the present invention.
FIG. 11 is a diagram showing the process flow of an emulated target processor when a co-validation simulation is executed in a lightweight computer operating system environment, according to a feature of the present invention.
FIG. 12 is a diagram showing the process flow of an emulated target processor when a co-validation simulation is executed in a lightweight computer operating system environment, according to a feature of the present invention.
FIG. 13 is a diagram showing the process flow of an emulated target processor when a co-validation simulation is executed in a lightweight computer operating system environment, according to a feature of the present invention.
FIG. 14 is a diagram showing exemplary software processes for co-validation in a lightweight computer system environment, according to a feature of the present invention.
FIG. 15 is a diagram showing signal flow paths in a lightweight computer operating system environment, according to a feature of the present invention.
FIG. 16 is a diagram showing an exemplary computer system for hardware and software co-validation of a target microprocessor and an electrical circuit, according to a feature of the present invention.
FIG. 17 is a diagram showing an exemplary system, consisting of a target microprocessor, glue logic, I/O hardware and memory, that is co-validated using the present invention.
FIG. 18 is a diagram showing the bus read timing of the exemplary system shown in FIG. 17.
FIG. 19 is a diagram showing the bus write timing of the exemplary system shown in FIG. 17.
FIG. 20 is a diagram showing the signal timing generated by the present invention for the exemplary system shown in FIG. 17.

Explanation of symbols

1 UNIX (R) system
2 VSIM environment
3 Internal scheduler
4 Instruction set simulator
5 VHDL simulation
6 Interface
7 Interrupt
8 External component
10 Fabric process
11 Central control process
12 Signal maintainer subsystem
13 Clock generator subsystem
14 Queue maintainer subsystem
15 Display generator
16 Scheduler subsystem
21 Circuit under test
22 Bus arbitration process
23 Global loop controller
24 Test stimulus process
30 Lightweight operating system
31 Fabric process
32 Instruction set simulation process
33 VHDL process
34 VHDL process
35 I960 processor
36 Glue logic
37 UART
38 Memory subsystem
40 Processor
41 Video display terminal
42 Memory
43 I/O device
44 Data link
45 Server
46 Program library
50 Development board
51 Communication medium

Claims

A computer system that performs co-validation of an electronic circuit and a control program that targets the electronic circuit, wherein
the electronic circuit and the control program are simulated using a predetermined computer language that is executed in a lightweight operating system environment built on a microkernel that provides interprocess communication via messages,
the computer system comprising:
signal management means for managing the state of all global signals;
clock generation means for generating a clock signal used by a software model of the electronic circuit and by an instruction set simulator that executes a part of the control program;
queue management means for managing a queue of processes that wait for an event from the software model and the instruction set simulator; and
scheduler means for executing components including the software model and the instruction set simulator at predetermined timing intervals,
wherein the signal management means, the clock generation means, the queue management means and the scheduler means are subsystems controlled by a central control process executed by the computer system, the electronic circuit is mapped to processes executed by the lightweight operating system, and a state change between elements of the mapped electronic circuit is modeled by message passing of the lightweight operating system.

3. The computer system according to claim 2, further comprising display control means for displaying a state change with respect to the global signal on a display.

The computer system according to claim 1, wherein the predetermined timing interval simulates a time interval of 1 nanosecond or more.

The computer system according to claim 1, wherein the predetermined timing interval simulates a time interval of less than 1 nanosecond.

An executable program for a computer system that performs co-validation of an electronic circuit and a control program that targets the electronic circuit, wherein
the electronic circuit and the control program are simulated using a predetermined computer language that is executed in a lightweight operating system environment built on a microkernel that provides interprocess communication via messages,
the executable program comprising:
a first executable code portion that, when executed on a computer, manages a plurality of global signal states;
a second executable code portion that, when executed on a computer, generates a plurality of clock signals;
a third executable code portion that, when executed on a computer, manages a queue of processes that wait for an event;
a fourth executable code portion that, when executed on a computer, generates a predetermined timing interval; and
a fifth executable code portion that, when executed on a computer, controls execution of at least the first, second, third and fourth executable code portions,
wherein the electronic circuit is mapped to a process executed by the lightweight operating system, and a state change between elements of the mapped electronic circuit is modeled by message passing of the lightweight operating system.
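
The subsystems recited in the claims can be pictured with a small sketch. The Python below is only a toy model under stated assumptions, not the implementation described in this specification: the class names mirror the signal maintainer, queue maintainer and clock generator, but the `post`/`wait`/`wake` methods, the `run` loop standing in for the central control process and scheduler, and the `counter` hardware process are all hypothetical. Plain function calls stand in for the message passing primitives of the lightweight operating system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class SignalMaintainer:
    """Holds the current state of every global signal."""
    signals: dict = field(default_factory=dict)
    history: list = field(default_factory=list)  # (tick, name, value) changes

    def post(self, tick, name, value):
        # Models a state-change message; record it only if the value changed.
        if self.signals.get(name) != value:
            self.signals[name] = value
            self.history.append((tick, name, value))
            return True
        return False


@dataclass
class QueueMaintainer:
    """Queues of processes waiting for an event on a named signal."""
    waiting: dict = field(default_factory=dict)

    def wait(self, name, process):
        self.waiting.setdefault(name, deque()).append(process)

    def wake(self, name):
        # Hand back every process that was waiting on this signal.
        return list(self.waiting.pop(name, ()))


class ClockGenerator:
    """Toggles a clock signal every half period, measured in ticks."""

    def __init__(self, name, half_period):
        self.name, self.half_period, self.level = name, half_period, 0

    def tick(self, now):
        if now > 0 and now % self.half_period == 0:
            self.level ^= 1
            return self.level
        return None  # no edge on this tick


def run(ticks, clock, hardware_process):
    """Toy central control loop: one scheduler step per simulated tick."""
    signals, queues = SignalMaintainer(), QueueMaintainer()
    queues.wait(clock.name, hardware_process)
    for now in range(ticks):
        level = clock.tick(now)
        if level is not None and signals.post(now, clock.name, level):
            for process in queues.wake(clock.name):
                process(now, signals)             # deliver the event
                queues.wait(clock.name, process)  # re-arm for the next edge
    return signals


def counter(now, signals):
    """Hypothetical hardware process: counts rising clock edges."""
    if signals.signals["clk"] == 1:
        signals.post(now, "count", signals.signals.get("count", 0) + 1)


result = run(ticks=10, clock=ClockGenerator("clk", half_period=2),
             hardware_process=counter)
```

With a half period of two ticks and ten simulated ticks, the clock rises at ticks 2 and 6, so the counter process is woken on every edge but increments `count` only twice. An instruction set simulator would, in this picture, be just another process registered with the queue maintainer.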