JP4083491B2 - Module-to-module interface automatic synthesis apparatus, synthesis method, program, and portable storage medium

Info

Publication number: JP4083491B2
Application number: JP2002211812A
Authority: JP
Inventors: 憲範富田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2002-07-19
Filing date: 2002-07-19
Publication date: 2008-04-30
Anticipated expiration: 2022-07-19
Also published as: JP2004054641A

Description

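[0001]
[Technical field of the invention]
The present invention relates to circuit design by high-level synthesis, and more particularly to the synthesis of an inter-module interface that serves as the interface between a plurality of circuit modules operating in parallel.
[0002]
[Prior art]
In recent years hardware has grown in scale, as typified by VLSI, and research on using computers to automate hardware design is under way to cope with this. In one hardware design method, the behavioral specification is written in a circuit description language such as VHDL or SFL without regard to the underlying hardware. When this description is input to a computer, the computer synthesizes the input logic of each module, performs logic compression, and then carries out placement and routing to produce the actual circuit. Among such methods of synthesis from a circuit description language, those that start from a particularly abstract circuit description are called high-level synthesis.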
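[0003]
FIG. 32 shows an example of a behavioral description in a circuit description language used in circuit design by high-level synthesis. In this example, part (A) describes processing that outputs data to an array variable, and part (B) describes processing that reads data from the array variable.
[0004]
In circuit design by high-level synthesis, it is sometimes desirable to execute in parallel parts (A) and (B), which as written in FIG. 32 are processed sequentially.
[0005]
In that case, the part that carries data between the circuit modules executed in parallel and serves as their interface is built as hardware, and this interface circuit usually has to include a buffer memory for timing adjustment. The circuit module that outputs data writes it to this buffer, and the circuit module that receives data reads it from this buffer.
[0006]
When circuit module A, which implements the processing described in part (A) of FIG. 32, and circuit module B, which implements the processing described in part (B), are both implemented and executed in parallel, circuit module A and circuit module B access the same array variable in the buffer.
[0007]
For the two circuit modules to run correctly in parallel, writes to and reads from the array variable must occur in the right order. That is, the following ordering must be guaranteed: circuit module B reads an element only after circuit module A has written it, and circuit module A must not write an element again before circuit module B has read it.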
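[0008]
One way to achieve this is to duplicate the array variable array and build hardware that provides memory areas for two array variables, arrayA and arrayB. Circuit module A writes to arrayA while circuit module B reads from arrayB. When circuit modules A and B have finished writing and reading to the end, circuit module A next writes to arrayB and circuit module B reads from arrayA. This operation is then repeated.
[0009]
With this method, the conditions that circuit module B reads after circuit module A has finished writing, and that circuit module A does not write before circuit module B has read, can be satisfied.
[0010]
This method doubles the array variable array, although in many cases the memory actually needed is less than twice the size of the original array variable. When the circuit is built on a semiconductor chip, a smaller memory is advantageous in many respects, including placement and routing, power consumption, and manufacturing cost.
[0011]
An alternative to duplicating the memory is for the circuit designer to analyze the behavioral description thoroughly and design by hand, circuit by circuit, an implementation that uses less memory than duplicating the area of the array variable.
[0012]
Separately from the inter-module interface circuit addressed by the present invention, high-level synthesis in general includes a scheduling process that assigns variables to memory (registers). That process performs an optimization called variable sharing, which reduces the amount of hardware generated by high-level synthesis by assigning variables whose lifetimes do not overlap to the same memory. This technique cannot be applied as is to the model structure of the high-level description and the array variables addressed by the present invention.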
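[0013]
[Problems to be solved by the invention]
An object of the present invention is to provide a mechanism that can automatically synthesize the circuit that serves as the interface between a plurality of circuits executed in parallel, as described above.
[0014]
A further object is to provide a mechanism that can determine the size of the buffer memory in the interface circuit and automatically synthesize the control circuit associated with the buffer memory.
[0015]
A further object is to provide a mechanism that models accesses to the array variable mathematically and automatically synthesizes an interface circuit that uses less memory than before and also has lower latency (the time from when data to be processed is input until the processing result comes out).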
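[0016]
[Means for solving the problems]
To solve the above problems, the synthesis apparatus according to the present invention, which synthesizes an inter-module interface that passes data between a plurality of circuit modules operating in parallel, comprises variable calculation means, lifetime calculation means, degenerate set calculation means, and circuit synthesis means.
[0017]
The variable calculation means analyzes the writing of data to and the reading of data from the inter-module interface by the circuit modules and, based on the result of the analysis, obtains the variables held by the inter-module interface.
[0018]
The lifetime calculation means obtains, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read.
[0019]
The degenerate set calculation means searches for a degenerate variable set: a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time.
[0020]
The circuit synthesis means uses the degenerate variable set to synthesize the inter-module interface based on the assignment of variables.
The present invention can also be realized as a CAD system comprising, as means, compiler means that analyzes a description in a circuit description language and outputs a circuit specification, together with the above synthesis apparatus.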
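[0021]
The scope of the present invention also covers a synthesis method characterized by: analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface; obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read; searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set.
[0022]
The scope of the present invention further covers a computer program and a portable storage medium. According to the present invention, an inter-module interface can be synthesized automatically.
[0023]
An inter-module interface that uses less memory can also be synthesized.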
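Other features of the invention are described with reference to FIGS. 24 and 25. The description assumes memory write and read operations in which each variable V[i] of the whole set of n variables V[0] to V[n−1] (V[0] to V[63] in FIG. 24) is written to some memory address at time Begin[i] (the time at which the # marks start in FIG. 24), is held for its lifetime T (the number of # marks in FIG. 24), and is read at time End[i] (the time following the last # in FIG. 24). Consider the combinations of i and j that satisfy two conditions: the write time Begin[j] of variable V[j] (for example V[4] in FIG. 24) comes after the lifetime T has elapsed from Begin[i], the time at which the write of variable V[i] (for example V[0] in FIG. 24) starts; and Begin[j] of variable V[j] must not come before the read time End[i] of variable V[i]. From the write and read sequences made up of such combinations, subsets (P0, P1, P2, P3 in FIG. 24) are obtained that are mutually congruent, that is, sets of variables having the same lifetimes offset by a time difference Tr. Within each subset the write times of the variables differ. When several subsets satisfying these conditions are found, each is called a subset degenerated with respect to the whole set.
[0024]
Because these degenerate subsets are congruent in the ordering of their variables, memory can be read and written within the address space corresponding to the variables of one degenerate subset (in FIG. 24, an address space of 16 corresponding to subset P0). In FIG. 24 the address space is thus reduced from 64 to 16.
[0025]
Furthermore, by dividing each degenerate variable set into further subsets and sharing variables, that is, placing several variables at one address, the above conditions can be satisfied with an address space smaller than the degenerate address space. For example, if V[0], V[4], and V[6] in FIG. 24 are taken as A, +, and B, they can all be assigned to memory address M[0] in FIG. 25. This is called variable sharing. In FIG. 25 the original address space of 64 is reduced to 10.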
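[0026]
[Embodiments of the invention]
FIG. 1 is a block diagram showing the configuration of a CAD system to which the technique of the present invention for automatically generating an inter-module interface is applied.
[0027]
The CAD system 1 in the figure comprises an input editing unit 11, a compiler unit 12, a library storage unit 13, an interface module generation unit 14, and an output unit 15. The figure shows only the parts of the CAD system that relate to the present invention.
[0028]
The input editing unit 11 handles the user's entry and editing of processing contents written in a circuit description language when designing a circuit. The compiler unit 12 generates the specification of a circuit that implements the contents described in the circuit description language, whether entered by the user through the input editing unit 11 or read from a file. The library storage unit 13 stores a plurality of circuit libraries; the compiler unit 12 refers to it when generating a circuit and retrieves the libraries it needs. The interface module generation unit 14 generates the inter-module interface between the circuit modules executed in parallel as described above. When the compiler unit 12, while generating a circuit, analyzes the description in the circuit description language and determines that an inter-module interface is needed, it requests the interface module generation unit 14 to generate one. The interface module generation unit 14 may be configured as part of the functions of the library storage unit 13. The output unit 15 outputs the circuit generated by the compiler unit 12 as a file or on a display device such as a display.
[0029]
This CAD system may be implemented on dedicated hardware or in software on a general-purpose computer.
The interface module generation unit 14 need not be part of the CAD system 1 or of the software that implements the CAD system; it may be configured as part of another apparatus or piece of software, for example a development tool such as a high-level synthesis tool or a compiler, or as a stand-alone apparatus or piece of software.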
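[0030]
FIG. 2 shows the processing performed by the interface module generation unit 14.
In response to a request from the compiler unit 12 to generate an interface module, the interface module generation unit 14 performs the following processing.
1. Analyze the write sequence and the read sequence.
2. Obtain the output delay from the write sequence and the read sequence.
3. Analyze the lifetimes of the variables.
4. Obtain the degenerate variable set.
5. Solve the variable sharing problem and assign the variables to memory.
6. Synthesize the inter-module interface.
[0031]
Each of these steps is described later.
The interface module generation unit 14 represents the circuit modules A and B operating in parallel, and the inter-module interface C between them, as processes.
[0032]
A process here is something that operates while communicating with other processes by writing to or reading from communication ports. A process may contain storage (memory) internally. Processes can perform communication operations freely without affecting one another (multiple processes operate in parallel).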
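[0033]
The following points are taken into account when generating the inter-module interface C.
・Multiple processes operate in parallel.
・The communication channel has no storage.
・If a write is performed before the other side has read, the previously written data is overwritten and destroyed.
・If a read is performed before new data is written, only the previously written data is read; the new data cannot be read.
[0034]
Each process has its own clock. The times shown by the clocks of the processes are, however, assumed to be expressible in terms of one another by linear functions. Separate clocks per process are used only to give a concise representation for the case where the circuit modules do not all run on the same clock signal and the clock frequency or phase differs from module to module. Because the times can be converted by linear functions, having several clocks does not make the model much more complex. When all circuit modules are known in advance to run on the same clock signal, the times of all circuit modules may be expressed with a single clock.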
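[0035]
FIG. 3 shows an example of the processes handled by the interface module generation unit 14.
In the figure, process C is the inter-module interface between processes A and B, which operate in parallel. When process A outputs data, it performs communication corresponding to a write to the memory of process C; when process B inputs data, it performs communication corresponding to a read from the memory of process C. The interface module generation unit 14 generates a module C that controls these accesses so that the ordering is guaranteed: circuit module B reads after circuit module A has completed its write, and circuit module A does not write before circuit module B has read.
[0036]
FIGS. 4 and 5 show example implementations of the inter-module interface generated by the interface module generation unit 14.
FIG. 4 is a configuration in which the array variable is assigned to RAM. For data writes from circuit module A and data reads by module B, a control circuit issues the address assigned to each array element so that the ordering is guaranteed.
[0037]
FIG. 5 is a configuration in which the array variable is assigned to registers. For a data write from circuit module A, the control circuit sends a load signal to the assigned register to make it store the data; for a data read by module B, it drives a selector with a control signal so that the data from the corresponding register is output. The ordering is guaranteed in this way.
[0038]
Next, the processing by the interface module generation unit 14 shown in FIG. 2 is described in detail.
(1) Analysis of the write sequence and read sequence
In this step, the output of circuit module A and the input of circuit module B are rewritten as the input and output of process C, so that the subsequent processing can be discussed purely in terms of process C.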
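[0039]
In the following, the times shown by the clocks of process A and process B are written t_A and t_B. The set of array indices that process A writes at time t_A is written ω(A, t_A), and the set of array indices that process B reads at time t_B is written γ(B, t_B). These are sets because this embodiment also allows several array elements to be accessed at once. Further, define
the write sequence W(A) of process A:
W(A) = {(t_A, ω(A, t_A)) | t_A is a time of process A}
the read sequence R(B) of process B:
R(B) = {(t_B, γ(B, t_B)) | t_B is a time of process B}
For example,
W(A) = {(0, {0, 32}), (1, {1, 33}), ..., (c, {j, 32 + j}), ...}, j = c mod 32
represents a process that writes elements 0 and 32 of the array variable at time 0, elements 1 and 33 at time 1, and so on, and
R(B) = {(0, {0, 1}), (1, {2, 3}), ..., (d, {2j, 2j + 1}), ...}, j = d mod 32
represents a process that reads elements 0 and 1 of the array variable at time 0, elements 2 and 3 at time 1, and so on.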
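As a minimal sketch, the two example sequences above can be generated as lists of index sets, one set per time step. The function names and the choice of Python lists of frozensets are illustrative assumptions and are not part of the specification.

```python
def write_sequence_a(num_steps):
    """W(A): at time c, process A writes elements j and 32 + j, with j = c mod 32."""
    return [frozenset({c % 32, 32 + c % 32}) for c in range(num_steps)]


def read_sequence_b(num_steps):
    """R(B): at time d, process B reads elements 2j and 2j + 1, with j = d mod 32."""
    return [frozenset({2 * (d % 32), 2 * (d % 32) + 1}) for d in range(num_steps)]


w_a = write_sequence_a(32)
r_b = read_sequence_b(32)
print(sorted(w_a[1]))  # elements written at time 1
print(sorted(r_b[1]))  # elements read at time 1
```

Indexing the list by time mirrors the idea of looking up the set of accessed elements for a given time; any other time-indexed container would serve equally well.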
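[0040]
FIG. 6 shows an example data structure for read and write sequences.
A read sequence or write sequence of a process can be implemented as a combination of a one-dimensional array and list structures, as in FIG. 6. In the figure, each array element points to the head of a list, and the data in each list is the set of indices of the variables accessed at that time. The example in the figure stores the sequence {(0, {0, 1, 2}), (1, {3, 4}), (3, {5, 6}), ..., (n−1, {30, 31})}.
[0041]
Further, define the write sequence W(A, t_A^b, t_A^e) of process A from time t_A^b to t_A^e:
W(A, t_A^b, t_A^e) = {(t_A, ω(A, t_A)) | t_A^b ≤ t_A < t_A^e}
and the set of write positions ω(A, t_A^b, t_A^e) of process A from time t_A^b to t_A^e:
[0042]
[Equation 1]
ω(A, t_A^b, t_A^e) = ∪_{t_A^b ≤ t_A < t_A^e} ω(A, t_A)
[0043]
The reads of process B are defined in the same way.
The read sequence R(B, t_B^b, t_B^e) of process B from time t_B^b to t_B^e:
R(B, t_B^b, t_B^e) = {(t_B, γ(B, t_B)) | t_B^b ≤ t_B < t_B^e}
The set of read positions γ(B, t_B^b, t_B^e) of process B from time t_B^b to t_B^e:
[0044]
[Equation 2]
γ(B, t_B^b, t_B^e) = ∪_{t_B^b ≤ t_B < t_B^e} γ(B, t_B)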
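[0045]
In FIG. 3 there is a communication channel from process A to process C, so the write sequence of A is also the read sequence of C. Only cases in which communication is reliable and nothing is dropped are handled here, so the clock of each process is set so that this is achieved.
[0046]
If the time t_A of process A is expressed in terms of the time t_C of process C as
t_A = α_Ca t_C + β_Ca, where α_Ca and β_Ca are constants (α_Ca corresponds to the ratio of the clock frequencies and β_Ca to the phase difference),
the set of read positions γ(C, t_C, t_C + 1) of process C is easily obtained from the write sequence of process A:
γ(C, t_C, t_C + 1) = ω(A, α_Ca t_C + β_Ca, α_Ca (t_C + 1) + β_Ca)
Similarly, if the time t_B of process B is expressed in terms of the time t_C of process C as
t_B = α_Cb t_C + β_Cb, where α_Cb and β_Cb are constants (α_Cb corresponds to the ratio of the clock frequencies and β_Cb to the phase difference),
the set of write positions ω(C, t_C, t_C + 1) of process C can be obtained from the read sequence of process B:
ω(C, t_C, t_C + 1) = γ(B, α_Cb t_C + β_Cb, α_Cb (t_C + 1) + β_Cb)
In the actual processing, the write sequence of process A and the read sequence of process B are the given information, and this step conversely derives the write sequence of process C from the read sequence of process B.
[0047]
In practice the processes mostly run on the same clock signal, that is, α_Ca = 1 and β_Ca = 0, but this generalized form is used so that the processing can also handle cases in which the clock signals (frequencies) differ.
[0048]
In this way the output of module A and the input of module B are both expressed solely as the input and output of process C, and from here on the synthesis of circuit module C can be discussed in terms of process C alone.
[0049]
When one process performs two or more writes while the other performs one read, some measure is needed, such as communicating through several ports. Conversely, when one process performs two or more reads while the other performs one write, some measure is needed, such as inserting cycles in which no read is performed.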
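[0050]
FIG. 7(a) is a flowchart of the processing that converts the write sequence of process A into the read sequence of process C. It converts the write sequence of process A from t_A = 0 up to t_A^E, the time of the last write in that sequence, into the read sequence of process C.
[0051]
When the processing starts, step S1 sets t_A = 0 and clears all of R_C, the list-structured data of FIG. 6 that stores the read sequence of process C.
[0052]
Next, step S2 assigns to t_C the value of α_aC t_A + β_aC (α_aC and β_aC are the constants of the time relation) rounded up to an integer, converting the time in process A into the time of process C. Step S3 then appends W_A[t_A], the list-structured data of FIG. 6 that holds the write sequence of process A at time t_A, to the end of R_C[t_C], the list-structured data that holds the read sequence of process C at time t_C.
[0053]
Next, step S4 increments t_A by one. If in step S5 t_A does not exceed t_A^E, the time of the last write (step S5, YES), the processing returns to step S2 and steps S2 to S5 are repeated. If in step S5 t_A exceeds t_A^E (step S5, NO), the processing ends.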
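The loop of FIG. 7(a) can be sketched as follows. This is an illustrative reading of steps S1 to S5, assuming that the write sequence of process A is given as a list of index sets indexed by t_A; the function name and the dictionary used for R_C are hypothetical choices.

```python
import math
from collections import defaultdict


def writes_of_a_to_reads_of_c(w_a, alpha_ac, beta_ac):
    """Map each write of process A at time t_A to a read of process C.

    w_a[t_A] is the set of indices A writes at time t_A. The time of C is
    ceil(alpha_ac * t_A + beta_ac), as in step S2.
    """
    r_c = defaultdict(set)                          # S1: R_C starts empty
    for t_a, indices in enumerate(w_a):             # S4/S5: t_A = 0 .. t_A^E
        t_c = math.ceil(alpha_ac * t_a + beta_ac)   # S2: round up
        r_c[t_c] |= set(indices)                    # S3: append W_A[t_A] to R_C[t_C]
    return dict(r_c)


w_a = [{0}, {1}, {2}]
slow_a = writes_of_a_to_reads_of_c(w_a, 2, 0)    # A runs at half the rate of C
fast_a = writes_of_a_to_reads_of_c(w_a, 0.5, 0)  # A runs at twice the rate of C
print(slow_a)
print(fast_a)
```

With a slower process A the reads of C are spread over every second cycle, and with a faster process A two writes land in the same cycle of C, which is the situation that calls for several ports.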
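[0054]
FIG. 7(b) shows the conversion of the write sequence of process A into the read sequence of process C by the processing of FIG. 7(a). The upper part shows the case where process A is slower than process C, and the lower part the case where process A is faster than process C.
[0055]
In the upper part, process A is slower, so the write at time t_A corresponds to the read of process C at time t_C, and the write at time t_A + 1 corresponds to the read of process C at time t_C + 2. In the lower part, process A is faster, so the write at time t_A corresponds to the read of process C at time t_C, and the writes at times t_A + 1 and t_A + 2 correspond to the read of process C at time t_C + 1.
[0056]
FIG. 8(a) is a flowchart of the processing that converts the read sequence of process B into the write sequence of process C. It converts the read sequence of process B from t_B = 0 up to t_B^E, the time of the last read in that sequence, into the write sequence of process C.
[0057]
When the processing starts, step S11 sets t_B = 0 and clears all of W_C, the list-structured data of FIG. 6 that stores the write sequence of process C.
Next, step S12 assigns to t_C the value of α_bC t_B + β_bC (α_bC and β_bC are the constants of the time relation) rounded down to an integer, converting the time in process B into the time of process C. Step S13 then appends R_B[t_B], the list-structured data of FIG. 6 that holds the read sequence of process B at time t_B, to the end of W_C[t_C], the list-structured data that holds the write sequence of process C at time t_C.
[0058]
Next, step S14 increments t_B. If in step S15 t_B does not exceed t_B^E, the time of the last read (step S15, YES), the processing returns to step S12 and steps S12 to S15 are repeated. If in step S15 t_B exceeds t_B^E (step S15, NO), the processing ends.
[0059]
FIG. 8(b) shows the conversion of the read sequence of process B into the write sequence of process C by the processing of FIG. 8(a). The upper part shows the case where process B is faster than process C, and the lower part the case where process B is slower than process C.
[0060]
In the upper part, process B is faster, so the write of process C at time t_C corresponds to the read of process B at either time t_B or t_B + 1, and the write of process C at time t_C + 1 corresponds to the read of process B at either time t_B + 2 or t_B + 3. In the lower part, process B is slower, so the write of process C at time t_C corresponds to the read of process B at time t_B, and the two writes of process C at times t_C + 1 and t_C + 2 correspond to the read of process B at time t_B + 1. In this case, if several data items are passed between the processes, several communication ports are needed.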
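(2) Obtaining the output delay
The output delay is the time from when process C reads a data item from process A until it writes that item out to process B, and the minimum output delay t_C^{odelay} is the smallest such time. In this embodiment, the minimum output delay t_C^{odelay} is chosen as the output delay of process C in order to reduce the memory used by module C.
[0061]
The minimum output delay t_C^{odelay} is the smallest integer value satisfying
∀t_C: γ(C, 0, t_C) ⊇ ω(C, 0, t_C − t_C^{odelay})
If process B starts reading when the minimum output delay t_C^{odelay} has elapsed after process A starts writing, the following ordering is satisfied:
・Process B reads an element of the array variable after process A has written it.
・Process B never reads an element of the array variable that process A has not yet written.
[0062]
An example algorithm for finding the minimum output delay t_C^{odelay} is shown below. The number of elements in the index sets γ and ω increases monotonically with time t_C and is bounded above by the size of the array variable. The algorithm therefore stops after looping at most as many times as the number of cycles process B needs to read all elements.
1. t = 0
2. If γ(C, 0, t_C) ⊇ ω(C, 0, t_C − t) holds for every t_C, set t_C^{odelay} = t and stop.
3. t = t + 1
4. Go back to 2.
FIG. 9 is a flowchart of the processing that finds the minimum output delay t_C^{odelay}.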
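[0063]
When the processing starts, step S21 initializes d, t1, and t2 to 0 and sets R to Rc[t1], the list-structured data of FIG. 6 holding the read sequence of process C at time t1.
[0064]
Next, step S22 determines whether t2 is less than or equal to t_C^E, the time at which process C performs its last write. If it is (step S22, YES), step S23 determines whether all elements of the list-structured data Wc[t2] are contained in the list-structured data R. If they are not (step S23, NO), d is incremented by one (step S24); if they are (step S23, YES), t2 is incremented by one (step S25).
[0065]
Then step S26 increments t1 by one and appends the list-structured data Rc[t1] to the end of R, and the processing returns to step S22. Steps S22 to S26 are repeated until t2 exceeds t_C^E.
[0066]
If t2 is greater than t_C^E in step S22 (step S22, NO), step S27 sets t_C^{odelay} to the value of d and the processing ends.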
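The four-step search for the minimum output delay can be sketched directly from the subset condition. The code below is a hedged illustration, not the flowchart of FIG. 9: it assumes the sequences of process C are lists of index sets, treats ω(C, 0, t) for t ≤ 0 as empty, and uses hypothetical function names. The data are the sequences of the third design example in this description (A writes element i at time i; B reads element (i mod 8) × 8 + [i/8] at its i-th step).

```python
def positions(seq, t_end):
    """Union of the index sets seq[0] .. seq[t_end - 1] (empty if t_end <= 0)."""
    out = set()
    for t in range(min(max(t_end, 0), len(seq))):
        out |= seq[t]
    return out


def min_output_delay(reads_c, writes_c):
    """Smallest d with gamma(C, 0, t) >= omega(C, 0, t - d) for every t."""
    all_read = positions(reads_c, len(reads_c))
    if not positions(writes_c, len(writes_c)) <= all_read:
        raise ValueError("process B reads an element that process A never writes")
    horizon = len(reads_c) + len(writes_c)
    d = 0                                               # step 1
    while not all(positions(reads_c, t) >= positions(writes_c, t - d)
                  for t in range(horizon + 1)):         # step 2
        d += 1                                          # steps 3 and 4
    return d


reads_c = [{i} for i in range(64)]                      # R(C) = W(A)
writes_c = [{(i % 8) * 8 + i // 8} for i in range(64)]  # W(C) = R(B)
delay = min_output_delay(reads_c, writes_c)
print(delay)
```

For these sequences the sketch yields a delay of 49, which is consistent with the lifetime L(v56) = (56, 56) reported for that example: element 56 arrives at time 56 and is needed at output step 7.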
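(3) Analysis of variable lifetimes
The lifetime L(v) = (begin(v), end(v)) of a variable v indicates that the first assignment of a value to v occurs at time begin(v) and the last reference to the value of v occurs at time end(v). This step analyzes the lifetimes of the array variable contained in module C.
[0067]
For an element v[i] of the array variable (i is the index), the assignment time is the time at which it is read from module A through the communication port, and the last reference time is the time at which it is written out through the communication port to module B.
[0068]
For variable v[i], the lifetime L(v[i]) = (begin(v[i]), end(v[i]))
is obtained as follows.
If i ∈ γ(C, t), then begin(v[i]) = t
If i ∈ ω(C, t'), then end(v[i]) = t'
FIG. 10 shows the data structure for the variable lifetimes begin(v[i]) and end(v[i]).
[0069]
begin(v[0]) to begin(v[n−1]) and end(v[0]) to end(v[n−1]), which give the lifetimes of variables v[0] to v[n−1], can be implemented as the one-dimensional arrays Begin[0] to Begin[n−1] and End[0] to End[n−1] shown in the figure.
[0070]
FIG. 11 is a flowchart of the processing that obtains the lifetimes L(v_i) = (begin(v[i]), end(v[i])) for all v_i.
The processing uses the one-dimensional arrays shown in FIG. 10.
[0071]
When the processing starts, step S31 sets the initial value t = 0.
Next, in step S32, for every element of Rc[t], the list-structured variable of FIG. 6 that stores the read sequence of process C at time t, t is assigned to the one-dimensional array entry Begin[i] of FIG. 10, where i is the element of Rc[t] (step S33).
[0072]
Next, in step S34, for every element of Wc[t], the list-structured variable of FIG. 6 that stores the write sequence of process C at time t, t is assigned to the one-dimensional array entry End[i] of FIG. 10, where i is the element of Wc[t] (step S35).
[0073]
Step S36 increments t by one, and step S37 determines whether this value is less than t_C^E, the time at which process C performs its last write. If t is less than t_C^E, the processing returns to step S32 and steps S32 to S36 are repeated. When t is no longer less than t_C^E in step S37 (step S37, NO), the processing ends.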
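A compact sketch of the lifetime analysis follows. It assumes that the write sequence of process C is given per output step and is shifted by the output delay to obtain times on the clock of C; that shift, the dictionary representation of Begin and End, and the delay of 49 used for the third design example are assumptions made for illustration.

```python
def lifetimes(reads_c, writes_c, delay):
    """Return (begin, end): first assignment and last reference time per index."""
    begin, end = {}, {}
    for t, indices in enumerate(reads_c):
        for i in indices:
            begin.setdefault(i, t)      # first assignment to v[i]
    for step, indices in enumerate(writes_c):
        for i in indices:
            end[i] = step + delay       # last reference to v[i]
    return begin, end


reads_c = [{i} for i in range(64)]                      # A writes element i at time i
writes_c = [{(i % 8) * 8 + i // 8} for i in range(64)]  # B reads in strided order
begin, end = lifetimes(reads_c, writes_c, delay=49)
print(begin[56], end[56])   # lifetime of v56
print(begin[0], end[0])     # lifetime of v0
```

A variable whose begin and end coincide, such as v56 here, never needs to be stored; the description treats such variables as passed straight from the input port to the output port.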
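(4) Obtaining the degenerate variable set
For a variable set V, subsets P and Q of V with P ⊆ V, Q ⊆ V, and |P| = |Q| (in this specification |X| denotes the number of elements of set X), and an integer Tr, the case in which ∀p ∈ P ∃q ∈ Q such that begin(p) + Tr = begin(q) and end(p) + Tr = end(q) and end(p) ≤ begin(q) holds is written as [External Character 1]. In this case P and Q are said to have
[0074]
[External Character 1]

[0075]
the same lifetime with time difference Tr.
If such a subset P exists, only the smaller set P need be considered instead of the whole variable set V. P is called here a degenerate variable set.
[0076]
This step obtains a degenerate variable set of the array variable contained in module C.
The processing for obtaining the degenerate variable set Vr is as follows.
Let Head(n, L) be a function that returns the first n elements of the ordered list L as a set.
1. From the variable set V, obtain the ordered list V_ord in which the variables v_i ∈ V are arranged in ascending order of begin(v_i).
2. Among all divisors (≠ 1) of the number of elements |V| of the variable set V, find the smallest divisor m that satisfies the following.
[0077]
[Equation 3]

[0078]
3. If m is found, take V_r = P0 = Head(m, V_ord) as the degenerate variable set.
4. If no such m exists, set Vr = V (no degeneration is possible).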
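The four steps above can be sketched as follows, using the pairwise condition begin(p) + Tr = begin(q), end(p) + Tr = end(q), end(p) ≤ begin(q) between each variable and the one m positions later in V_ord. This is a sketch under stated assumptions: candidate sizes m run over the divisors of |V| with 1 < m < |V|, lifetimes are given as plain lists, and the function name and the small data sets are hypothetical.

```python
def degenerate(begin, end):
    """Return (Vr, Tr): the degenerate variable set and its time offset.

    begin[i] and end[i] give the lifetime of variable i. If no divisor works,
    the whole ordered list is returned with Tr = None.
    """
    order = sorted(range(len(begin)), key=lambda i: begin[i])   # V_ord
    size = len(order)
    for m in range(2, size):                  # candidate sizes, smallest first
        if size % m != 0:
            continue
        tr = begin[order[m]] - begin[order[0]]
        if all(begin[order[j]] + tr == begin[order[j + m]]
               and end[order[j]] + tr == end[order[j + m]]
               and end[order[j]] <= begin[order[j + m]]
               for j in range(size - m)):
            return order[:m], tr              # Head(m, V_ord)
    return order, None


# Eight variables whose lifetimes repeat every 4 time units in groups of two.
vr, tr = degenerate([0, 1, 4, 5, 8, 9, 12, 13], [2, 4, 6, 8, 10, 12, 14, 16])
print(vr, tr)

# Four variables with no repeating pattern: no degeneration is possible.
vr_none, tr_none = degenerate([0, 1, 2, 3], [5, 3, 9, 4])
print(vr_none, tr_none)
```

Trying the smallest m first means the first success gives the largest reduction, since only m of the |V| variables then need their own memory.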
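[0079]
FIG. 12 shows degenerate variable sets.
The example in the figure shows degeneration to 1/4 (number of elements of the set V_ord / m = 4). The relation [External Character 2] holds among the variable sets P0 to P3, and P0 to P3
[0080]
[External Character 2]

[0081]
each have m elements.
Once such a degenerate variable set P0 is found, P0 can be considered as the degenerate variable set Vr in place of the whole variable set V.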
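[0082]
FIG. 13 is a flowchart of the processing that obtains the degenerate variable set Vr.
Before this processing, the set of variables of process C is placed in a one-dimensional array V, the list of the elements v_i ∈ V sorted in ascending order of Begin[i] in a one-dimensional array V_ord, and the list of the divisors of the number of elements |V| arranged in descending order in a one-dimensional array M.
[0083]
When the processing of FIG. 13 starts, step S41 first determines whether the array M is empty. If it is not empty (step S41, NO), n is set to the first element of M, that is, the largest value remaining in M, and this element is removed from M.
[0084]
Next, step S43 divides the elements of V_ord, in order from the head, into n equal parts P0, P1, ..., Pn−1. Steps S44 to S48 below then determine whether [External Character 3]
[0085]
[External Character 3]

[0086]
holds for P0, P1, ..., Pn−1.
First, step S44 sets the variable i to the initial value 0, and step S45 determines whether i < n−1, that is, whether any of P0, P1, ..., Pn−1 remain to be processed. If i < n−1 (step S45, YES), step S46 determines whether [External Character 4] holds, and if the result is YES
[0087]
[External Character 4]

[0088]
(step S46, YES), step S47 increments the variable i by one and the processing returns to step S45, and steps S45 to S47 are repeated. If step S45 finds that i < n−1 no longer holds (step S45, NO), [External Character 5] was found to hold in step S46 for all of P0, P1, ..., Pn−1, so step S48 sets
[0089]
[External Character 5]

[0090]
Vr = P0 and the processing ends.
If the result of step S46 becomes NO along the way (step S46, NO), the processing returns to step S41, the next element of the array M is taken as n, and the same processing is repeated. The determination in step S46 is described in detail later.
[0091]
When all elements of the array M have been processed and step S41 finds M empty (step S41, YES), the variable set V cannot be degenerated, so step S49 sets Vr = V_ord and the processing ends.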
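[0092]
FIG. 14 is a flowchart showing the details of the determination in step S46 of FIG. 13.
In the determination of step S46, the variable m is first set to |Pi|, the number of elements of Pi, the variable j to m*i, and the variable Tr to Begin[m] − Begin[0] (step S461).
[0093]
Next, step S462 determines whether j < n−1. If so (step S462, YES), step S463 determines whether Begin[j] + Tr = Begin[j+m] and End[j] + Tr = End[j+m] and End[j] ≤ Begin[j+m]. If these conditions are satisfied (step S463, YES), step S464 increments the variable j by one and the processing returns to step S462. Steps S462 to S464 are repeated until step S463 finds that the conditions are not satisfied (step S463, NO) or step S462 finds that j < n−1 is not satisfied (step S462, NO). When step S462 gives NO, [External Character 6] is judged to hold
[0094]
[External Character 6]

[0095]
and step S46 becomes YES; when step S463 gives NO, [External Character 7] is judged not to hold and step S46 becomes NO.
[0096]
[External Character 7]

[0097]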
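(5) Solving the variable sharing problem and assigning variables to memory
This step assigns memory to each element of the variable set Vr degenerated by step (4). Because Vr has been degenerated, an inter-module interface with a reduced memory size can be generated by assigning memory to each variable that is an element of Vr and having the other variables share memory with the corresponding variables in Vr.
[0098]
Solving the variable sharing problem for this variable set Vr and assigning the variables to memory based on the result makes the memory of the inter-module interface smaller still.
[0099]
The variable sharing problem can be solved with a modified version of the Left Edge algorithm described in the following reference.
Title: High-Level Synthesis
Authors: D. Gajski, N. Dutt, A. Wu and S. Lin
Publisher: Kluwer Academic Publishers, 1992
The modified Left Edge algorithm sorts the variables v in the variable group S under consideration in ascending order of begin, checks, starting from the first variable, whether the lifetimes of the other variables overlap with it, and assigns variables whose lifetimes do not overlap to the same memory.
[0100]
For the variable set V, let V' be the following set.
V' = {v' | v ∈ V, L(v') = (begin(v) + T, end(v) + T)}
Here T is the period of the process: if t1 is the time at which process C first writes an element of the array variable, and t2 is the time of its first write after it has finished writing all elements of the array variable, then T = t2 − t1. When the variable set V could not be degenerated, this V' is used in solving the variable sharing problem.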
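[0101]
FIG. 15 is a flowchart of the processing that solves the variable sharing problem.
The processing first distinguishes whether or not the variable set V could be degenerated. If it could not, the above V' is added to the variable set S, so that twice the original number of elements is considered. This is the same idea as doubling the memory when designing an inter-module interface and having the reading side and the writing side use the two halves alternately to prevent access conflicts.
[0102]
When the processing of FIG. 15 starts, step S51 first determines whether the degenerate variable set Vr is the same as V_ord, the ordered list of the variables of V in the order in which their first assignments occur. If they differ (step S51, NO), step S52 sets S = Vr and Tr = Begin[|Vr|] − Begin[0]. If they are the same (step S51, YES), S = V ∪ V' and Tr = T*2 are set.
[0103]
Next, step S54 sorts the set S in ascending order of Begin[i], i ∈ S, to give the ordered list S_ord.
Step S55 then sets the variables mem, last, and i to 0 and the variable next to ∞.
[0104]
Next, step S56 determines whether S_ord is empty. If it is not (step S56, NO), step S57 determines whether |S_ord| ≤ i, where |S_ord| is the number of elements of S_ord. If |S_ord| ≤ i (step S57, YES), step S58 increments mem by one and sets last and i to 0 and next to ∞, and the processing moves to step S59. If |S_ord| ≤ i does not hold (step S57, NO), the processing moves directly to step S59.
[0105]
Step S59 assigns the i-th element from the head of S_ord to v, and step S60 determines whether last ≤ Begin[v] and End[v] ≤ next.
[0106]
If these conditions are satisfied (step S60, YES), step S62 removes the element v from the ordered list S_ord,
and step S63 determines whether Begin[v] < End[v]. If Begin[v] < End[v] does not hold (step S63, NO), the processing returns to step S56.
[0107]
If Begin[v] < End[v] holds in step S63 (step S63, YES), step S64 assigns mem to assign[v], the array that records the assignment of variables to memory, sets last = End[v], and sets next to the smaller of next and Begin[v] + Tr, after which the processing returns to step S56. If the conditions are not satisfied in step S60 (step S60, NO), step S61 increments i by one and the processing returns to step S56.
[0108]
Steps S56 to S64 are repeated until the ordered list S_ord has no elements left, and when S_ord is empty (step S56, YES) the processing ends. The variables are then assigned to memory based on the result stored in the array assign.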
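The loop of steps S56 to S64 can be sketched as one pass over the remaining variables per memory. This is an illustrative reading of the flowchart, not a definitive implementation: the function name and dictionary-based lifetimes are hypothetical, and the data reproduce the third design example (V ∪ V' with T = 64, Tr = 2T) under the assumption that the output delay is 49.

```python
def share_variables(begin, end, tr):
    """Assign variables to memories with the modified Left Edge procedure.

    begin and end map each variable to its lifetime; tr is the offset after
    which the access pattern repeats. Returns {variable: memory number}.
    """
    pending = sorted(begin, key=lambda v: (begin[v], v))    # S54: S_ord
    assign, mem = {}, 0
    while pending:                                          # S56
        last, nxt = 0, float("inf")                         # S55 / S58
        rest = []
        for v in pending:
            if last <= begin[v] and end[v] <= nxt:          # S60
                if begin[v] < end[v]:                       # S63
                    assign[v] = mem                         # S64
                    last = end[v]
                    nxt = min(nxt, begin[v] + tr)
                # begin == end: passed straight through, no memory needed
            else:
                rest.append(v)                              # S61: keep for a later memory
        pending = rest
        mem += 1                                            # S58
    return assign


T = 64
begin = {i: i for i in range(T)}
end = {i: 8 * (i % 8) + i // 8 + 49 for i in range(T)}      # read step + assumed delay 49
begin.update({i + T: begin[i] + T for i in range(T)})       # V': shifted by T
end.update({i + T: end[i] + T for i in range(T)})
assign = share_variables(begin, end, tr=2 * T)
group0 = sorted(v for v, m in assign.items() if m == 0)
print(group0)
```

For this data the variables placed in memory 0 are v0, v49, v63, v112, and v121, the same group the description reports for the third example. The bound `nxt` keeps every variable in a memory from outliving the point where the first variable of that memory is written again one repetition later.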
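(6) Synthesizing the interface circuit
The steps so far have assigned the variables to memory and determined which variable is accessed at which time (= which memory is accessed at which time), so enough information is now available to implement process C as a circuit at RTL (register transfer level). All that remains is to build an FSM (Finite State Machine) from this information and create the circuit according to the FSM.
[0109]
Below, a notation modeled on SystemC is used as the grammar for writing RTL.
The RTL of process C consists of the following two subprocesses. The subprocesses start at the same time and run in parallel. Their timing is synchronized by "wait(number)" (described later).
・A subprocess that reads from the input ports and writes to memory
・A subprocess that reads data from memory or from an input port and writes it to the output ports
FIG. 16 shows the algorithm that generates RTL when the variable set could be degenerated. When it could not, twice the number of variables is considered, as in the processing of FIG. 15. In that case γ(C, t_C) and ω(C, t_C) are extended to handle twice the number of variables. Extending means making the following replacement.
[0110]
Originally γ(C, t_C) = γ(C, t_C + T), but γ(C, t_C) is replaced by the extended γ⁺(C, t_C) defined as follows.
・γ⁺(C, t_C) = γ(C, t_C) for 0 ≤ t_C < T
・γ⁺(C, t_C) = γ(C, t_C) with only the indices increased by |V|, for T ≤ t_C < 2T
ω(C, t_C) is likewise replaced by ω⁺(C, t_C).
[0111]
In FIG. 16, output means outputting RTL code to a file, a compiler, or the like; Pj denotes input port j, Pj.read() the data read from that input port, Qk output port k, and Qk.write(d) writing data d to output port k.
[0112]
Because γ(C, t_C) is a set, several data items may be taken in during one cycle. To allow this, process C has multiple input ports and multiple output ports. Which data arrives on which port is assumed to be given in advance.
[0113]
Assign[] is the value obtained from the variable sharing problem; Assign[v] gives the memory to which variable v was assigned. Assign[] is a constant when the RTL code is generated. Mem[a] = b means writing the value b to the a-th memory location. wait(n) means waiting n clocks.
[0114]
The upper subprocess in FIG. 16 is the algorithm that outputs the RTL description for data input: for each element of γ(C, t1) (t1 = 0 to T−1) it outputs Mem[Assign[j]] = Pj.read() (j being each element of γ(C, t1)).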
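[0115]
FIG. 17 shows an example of the RTL description output by this data input subprocess. In FIG. 17 and in FIGS. 18 and 19 described later, the comments output by the subprocess are omitted.
[0116]
Based on the assignment of variables to memory obtained by the preceding steps, the description in the figure shows the input data from the input ports being stored in memory locations 0, 1, 2, 3, 4, 5, 6, 7, 0, 4, 2, 6, 8 in turn. The control unit shown in FIG. 4 or FIG. 5 performs control according to this description by means of the write address or load signals, storing the data in the memory or registers.
[0117]
The lower subprocess in FIG. 16 is the algorithm that outputs the RTL description for data output: for each element of ω(C, t2) (t2 = 0 to T−1) it outputs Qk.write(Mem[Assign[k]]) (k being each element of ω(C, t2)).
[0118]
FIG. 18 shows an example of the RTL description output by this subprocess, which writes data to the output ports.
Based on the assignment of variables to memory obtained by the preceding steps, the description in the figure shows the data stored in memory locations 0, 2, 4, 6, 0, 2, 0, the data on the input port, and the data stored in memory locations 1, 3, 5, 7, 4, 6, 8, 9 being output to the output ports in turn. The control unit shown in FIG. 4 or FIG. 5 performs control according to this description by means of the read address or load selection signals, outputting data from the memory or registers to the output ports.
[0119]
FIG. 19 shows a schematic example of RTL code that combines the two subprocesses, the one that reads data in and the one that writes data to the output ports.
[0120]
Written together, the two subprocesses look as in the figure, which describes two sequential circuits as one.
Next, concrete design examples according to this embodiment are given.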
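[0121]
As Example 1, suppose processes A and B defined by the following ω(A, t_A) and γ(B, t_B) are given. This corresponds to passing an array variable of size 64 from A to B, and T = 64.
[0122]
[Equation 4]

[0123]
Here z^i denotes the value (0 or 1) of bit i of the binary representation of z, and [z^i, z^j, ...] denotes the value whose binary representation is these 0/1 values in this order. That is, z = [z^5, z^4, z^3, z^2, z^1, z^0], and y is the value obtained by rearranging bits 5 to 3 of z.
[0124]
With the constants of the time relations set to α_Ca = 1, α_Cb = 1, β_Ca = 0, β_Cb = 0, the read sequence R(C) and write sequence W(C) of process C are as follows.
[0125]

The minimum output delay is then t_C^{odelay} = 7.
[0126]
A precondition of this embodiment is that once process C starts outputting data, it keeps outputting in the fixed order and never pauses the output because input data has not yet arrived (once the flow starts, the output must keep flowing without stalling). If output starts too early, process C is therefore forced into a situation where it must output data that has not yet arrived. So it waits until enough data has accumulated before starting to output. This waiting time is the minimum output delay t_C^{odelay}.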
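[0127]
For example, consider starting output after waiting only 2 time units in the above example. FIG. 20 shows what happens.
For each time, the figure shows
・which data arrives as input,
・which data is output, and
・which data is held in the buffer memory.
[0128]
In the figure, arrows indicate movement of data and crosses indicate data being taken out of the buffer memory. The contents of the buffer memory are expressed as a set, and the order is irrelevant (it does not mean the data is stored in memory in this order). The contents of the buffer memory at time t_C can be expressed as a set difference:
(contents of buffer memory) = γ(C, 0, t_C) − ω(C, 0, t_C − 2)
In the figure, data item 3 must be output at time 5, but item 3 has not yet arrived, so the process is stuck (the output stalls).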
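The set-difference view of the buffer, and the stall that occurs when output starts too early, can be sketched as below. The sketch is illustrative only: the helper names are hypothetical, and because the sequences of Example 1 are not reproduced in this text it uses the strided sequences of the third design example instead (A writes element i at time i; B reads element (i mod 8) × 8 + [i/8] at step i), so the stall time differs from the one in FIG. 20.

```python
def positions(seq, t_end):
    """Union of the index sets seq[0] .. seq[t_end - 1] (empty if t_end <= 0)."""
    out = set()
    for t in range(min(max(t_end, 0), len(seq))):
        out |= seq[t]
    return out


def buffer_contents(reads_c, writes_c, t, delay):
    """gamma(C, 0, t) - omega(C, 0, t - delay): the data held at time t."""
    return positions(reads_c, t) - positions(writes_c, t - delay)


def first_stall(reads_c, writes_c, delay):
    """First time at which an element must be output before it has arrived."""
    for step, wanted in enumerate(writes_c):
        t = step + delay
        if not wanted <= positions(reads_c, t + 1):
            return t
    return None


reads_c = [{i} for i in range(64)]
writes_c = [{(i % 8) * 8 + i // 8} for i in range(64)]
print(first_stall(reads_c, writes_c, delay=2))    # output started too early
print(first_stall(reads_c, writes_c, delay=49))   # no stall
print(len(buffer_contents(reads_c, writes_c, t=51, delay=49)))
```

With a delay of 2 the second output step already asks for element 8, which has not arrived, so the output stalls at time 3; with a delay of 49 no stall occurs over the whole period.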
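[0129]
Next, consider outputting after waiting 7 time units. FIG. 21 shows this case.
FIG. 21 is drawn only up to time 28, but input and output of data can continue after that without the output stalling.
[0130]
Waiting longer than 7 time units before outputting has only disadvantages: (1) the required buffer memory increases, and (2) the time from input to output (latency) increases. Such cases are therefore outside the scope of this embodiment. This does not mean that they cannot be handled; to handle them, it suffices to use a value larger than the minimum output delay for t_C^{odelay}.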
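[0131]
FIG. 22 shows the result of analyzing the lifetimes of the array variable v of process C.
In the figure, Begin and End correspond to the one-dimensional arrays shown in FIG. 10: Begin is the time at which a value is first assigned to the variable, End is the time at which the last reference occurs, and the subscript is the index of the variable.
[0132]
FIG. 23 shows the lifetime of each variable graphically, based on FIG. 22.
The vertical axis is the index of the array variable of process C, and the lifetime of each variable is shown horizontally with "#" marks. Row i corresponds to element v[i] of the array variable, and # is plotted over the range Begin[i] to End[i] − 1.
[0133]
Normally each element of an array variable is assigned its own dedicated memory: variable v[0] to Mem[0], variable v[1] to Mem[1], and so on.
[0134]
However, if the lifetimes of variable v[0] and variable v[1] do not overlap, a single memory can hold the values of both variables without dedicated memory for each. For example, if the lifetime of v[0] is L(v[0]) = (3, 5) and the lifetime of v[1] is L(v[1]) = (10, 15), memory Mem[0] holds the value of v[0] during times 3 to 5 (strictly, the last read occurs at time 5, and from time 5 it can hold a new value) and the value of v[1] during times 10 to 15. L(v[1]) = (10, 15) means that after time 15 the value of v[1] is no longer used, so it does not matter if it is overwritten.
[0135]
The variable sharing problem is deciding how to assign several variables to the same memory. Degenerating the variable set can be regarded as first solving the variable sharing problem roughly.
[0136]
Obtaining the degenerate variable set Vr gives |V| = 64 and m = 16: the variable set V is divided into four equal parts, and Vr = {v0, v32, v1, v33, v2, v34, v3, v35, v4, v36, v5, v37, v6, v38, v7, v39} is obtained.
[0137]
FIG. 24 shows an example of the degenerate variable set obtained from the variable set of FIG. 23.
In the figure, the variable set V can be divided into four subsets, each having the same lifetimes offset by the time difference Tr. Here Tr = 16.
[0138]
The degenerate variable set Vr is 1/4 of the variable set V, so the memory required by module C is 1/4 of what it would be if each element of V were assigned its own memory.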
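[0139]
Next, as a second example, the variable sharing problem is solved by applying the modified Left Edge algorithm of FIG. 15 to the variable set Vr degenerated as in FIGS. 23 and 24, further reducing the memory required by module C.
[0140]
Among the variables in the degenerate variable set Vr of FIG. 24 as well, one memory can be shared by several variables if their lifetimes do not overlap.
Applying the processing of FIG. 15 to the degenerate variable set Vr of FIG. 24 and solving the variable sharing problem assigns the variables to memory as follows.
[0141]
Mem[0]: variables v0, v4, v6
Mem[1]: variable v32
Mem[2]: variables v1, v5
Mem[3]: variable v33
Mem[4]: variables v2, v36
Mem[5]: variable v34
Mem[6]: variables v3, v37
Mem[7]: variable v35
Mem[8]: variable v38
Mem[9]: variable v39
Because the variable set V has been degenerated, if variable v_i ∈ P0 is assigned to memory Mem[k], the corresponding variables in P1, P2, and P3 are assigned to the same memory as that variable in P0.
[0142]
FIG. 25 shows the variables assigned to memory after the variable sharing problem has been solved for the degenerate variable set of FIG. 24.
As FIG. 23 shows, there was originally an array variable of size 64, and module C would have needed a memory of up to size 128 to guarantee the ordering of reads and writes. With the processing of this embodiment it can be synthesized with a memory of size 10, as shown in FIG. 25.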
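[0143]
Next, a third example shows a case where the variable set V cannot be degenerated.
Suppose processes A and B defined by the following ω(A, t_A) and γ(B, t_B) are given. This example corresponds to passing an array variable of size 64 from process A to process B, and T = 64.
[0144]
W(A) = {(0, {0}), (1, {1}), (2, {2}), (3, {3}), ..., (63, {63})} = {(i, {i}) | 0 ≤ i ≤ 63}
R(B) = {(0, {0}), (1, {8}), (2, {16}), (3, {24}), ..., (63, {63})} = {(i, {j}) | 0 ≤ i ≤ 63, j = (i mod 8) × 8 + [i/8]} (here [i/8] denotes i divided by 8 with the fractional part discarded)
With the constants of the time relations set to α_Ca = 1, α_Cb = 1, β_Ca = 0, β_Cb = 0, the read sequence R(C) and write sequence W(C) of process C are R(C) = W(A) and W(C) = R(B).
[0145]
FIG. 26 shows the result of analyzing the lifetimes of the array variable v of process C.
In the figure, Begin and End correspond to the one-dimensional arrays shown in FIG. 10: Begin is the time at which a value is first assigned to the variable, End is the time at which the last reference occurs, and the subscript is the index of the variable.
[0146]
FIG. 27 shows the lifetime of each variable graphically, based on FIG. 26.
In this example, no subsets P and Q of V for which [External Character 8] holds can be found in the variable set V,
[0147]
[External Character 8]

[0148]
so the variable set V cannot be degenerated. Sorting the variables in ascending order of Begin[v] also leaves them as in FIG. 27.
Applying the modified Left Edge algorithm of FIG. 15 to this variable set V and solving the variable sharing problem assigns each variable to memory as shown in FIG. 28.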
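[0149]
FIG. 28 shows the memory assignment of the variables in Example 3.
According to the figure, variables v0, v49, v63, v112, and v121 are assigned to memory location 0. Put another way, Assign[0] = 0, Assign[49] = 0, Assign[63] = 0, Assign[112] = 0, and Assign[121] = 0.
[0150]
In Example 3 the variable set could not be degenerated, so a variable set of twice the size is handled. The original variables v0 to v63 and the additional variables v64 to v127, 128 variables in total, are handled together.
[0151]
FIG. 28 shows the memory size required by process C. There was originally an array variable of size 64, and preventing read/write conflicts by doubling the memory would have required a memory of up to size 128, but applying the present invention reduces this to a memory of size 56.
[0152]
In FIG. 26, the lifetime of variable v56 is L(v56) = (56, 56): the begin and end values are equal. Such a variable is passed straight from the input port to the output port without being stored in memory. This serves two purposes: (1) shortening the latency and (2) reducing the memory needed as a buffer. The same thing is seen for variable v7 in Example 1.
[0153]
FIG. 29 shows the memory assignment of the variables in the third example. It presents the assignment of variables to memory shown in FIG. 28 with memory on the vertical axis and time on the horizontal axis.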
図３０は、図１のＣＡＤシステムを汎用のコンピュータとして実現した場合のコンピュータのシステム環境図である。
同図のコンピュータは、ＣＰＵ２１、各プログラムのワークエリアとなる主記憶装置２２、各プログラムやデータベースが記録されるハードディスク等の補助記憶装置２３、ディスプレイ、キーボード等の入出力装置（Ｉ／Ｏ）２４、モデム等のネットワーク接続装置２５及びディスク、磁気テープなどの可搬記憶媒体から記憶内容を読み出す媒体読取り装置２６を有し、これらが互いにバス２８により接続される構成を備えている。
【０１５５】
ＣＡＤシステムとしての機能や上記したモジュール間インタフェースの自動生成の機能をソフトウエアによって実現した場合、ＣＰＵ２１がプログラムに基いて、主記憶装置２２をワークエリアとして、主記憶装置２２若しくは補助記憶装置２３上の領域に実現されたライブラリ格納部２１からデータを読み出して実現する。
【０１５６】
図３０のコンピュータでは、媒体読取り装置２６により磁気テープ、フレキシブルディスク、ＣＤ−ＲＯＭ、ＭＯ等の記憶媒体２７に記憶されているプログラム、データを読み出し、これを主記憶装置２２または補助記憶装置２３にダウンロードする。そして本実施形態による各処理は、ＣＰＵ２１がこのプログラムやデータを実行することにより、ソフトウエア的に実現させることが出来る。
【０１５７】
また、図３０のコンピュータでは、フレキシブルディスク等の記憶媒体２７を用いてアプリケーションソフトの交換が行われる場合がある。よって、本発明は、モジュール間インタフェースの自動合成装置、合成方法に限らず、コンピュータにより使用されたときに、上述した本発明の実施形態の機能をコンピュータに行わせるためのプログラムやコンピュータ読み出し可能な記憶媒体２７として構成することもできる。
【０１５８】
この場合、「記憶媒体」には、例えば図３１に示されるように、ＣＤ−ＲＯＭ、フレキシブルディスク（あるいはＭＯ、ＤＶＤ、リムーバブルハードディスク等であってもよい）等の媒体駆動装置３７に脱着可能な可搬記憶媒体３６や、ネットワーク回線３３経由で送信される外部の装置（サーバ等）内の記憶手段（データベース等）３２、あるいはコンピュータ３１の本体３４内のメモリ（ＲＡＭ又はハードディスク等）３５等が含まれる。可搬記憶媒体３６や記憶手段（データベース等）３２に記憶されているプログラムは、本体３４内のメモリ（ＲＡＭ又はハードディスク等）３５にロードされて、実行される。
【０１５９】
尚２つの回路モジュール間でデータの受け渡しを行うモジュール間インタフェースは、回路でなくても良い。例えば２つの回路モジュールが、並列実行される計算機で、共有メモリへのアクセスによってデータのやり取りを行う際、図１７，１８に対応するデータメモリへの読み込み／書き込みの情報を２つの計算機に与えることによって、これら計算機はこの情報に基いて共有メモリにアクセスすることによって、共有メモリのデータの受け渡しに用いる領域を小さくすることができる。
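[0160]
(Appendix 1) A synthesis apparatus that synthesizes an inter-module interface for passing data between a plurality of circuit modules operating in parallel, comprising:
variable calculation means for analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface;
lifetime calculation means for obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read;
degenerate set calculation means for searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and
circuit synthesis means for synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set;
this being an automatic synthesis apparatus for an inter-module interface.
[0161]
(Appendix 2) The automatic synthesis apparatus for an inter-module interface according to Appendix 1, further comprising variable assignment means for having the variables that are elements of the degenerate variable set share memory in such a way that their lifetimes do not overlap, and obtaining the assignment of those variables to the memory, wherein the circuit synthesis means synthesizes the inter-module interface based on the assignment.
[0162]
(Appendix 3) The automatic synthesis apparatus for an inter-module interface according to Appendix 1 or 2, further comprising minimum delay time calculation means for obtaining a minimum delay time, which is the minimum time from when the inter-module interface reads data until it writes the data, wherein the circuit synthesis means synthesizes the inter-module interface taking the minimum delay time into account.
[0163]
(Appendix 4) The automatic synthesis apparatus for an inter-module interface according to any one of Appendices 1 to 3, wherein the circuit synthesis means determines the size of the memory of the inter-module interface based on a value obtained from the degenerate variable set.
[0164]
(Appendix 5) A CAD system comprising:
compiler means for analyzing a description in a circuit description language and outputting a circuit specification; and
interface module generation means for synthesizing, when requested by the compiler means to synthesize an inter-module interface for passing data between a plurality of circuit modules operating in parallel, that inter-module interface,
wherein the interface module generation means has:
variable calculation means for analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface;
lifetime calculation means for obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read;
degenerate set calculation means for searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and
circuit synthesis means for synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set.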
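[0165]
(Appendix 6) A synthesis method for synthesizing an inter-module interface for passing data between a plurality of circuit modules operating in parallel, comprising:
analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface;
obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read;
searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and
synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set.
[0166]
(Appendix 7) A program executed by a computer that synthesizes an inter-module interface for passing data between a plurality of circuit modules operating in parallel, the program causing the computer to execute:
analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface;
obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read;
searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and
synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set.
[0167]
(Appendix 8) The program according to Appendix 7, causing the computer to execute: having the variables that are elements of the degenerate variable set share memory in such a way that their lifetimes do not overlap; obtaining the assignment of those variables to the memory; and synthesizing the inter-module interface based on the assignment.
[0168]
(Appendix 9) The program according to Appendix 8, wherein, based on the Left Edge method, the variables that are elements of the degenerate variable set are made to share memory in such a way that their lifetimes do not overlap, and the assignment of those variables to the memory is obtained.
[0169]
(Appendix 10) The program according to Appendix 8 or 9, causing the computer to execute determining the size of the memory of the inter-module interface based on the assignment to the memory.
[0170]
(Appendix 11) The program according to any one of Appendices 7 to 10, causing the computer to execute obtaining a minimum delay time, which is the minimum time from when the inter-module interface reads data until it writes the data, and synthesizing the inter-module interface taking the minimum delay time into account.
[0171]
(Appendix 12) The program according to Appendix 10, causing the computer to execute determining the size of the memory of the inter-module interface taking the minimum delay time into account.
[0172]
(Appendix 13) The program according to any one of Appendices 7 to 12, causing the computer to execute determining the size of the memory of the inter-module interface based on a value obtained from the degenerate variable set.
[0173]
(Appendix 14) The program according to any one of Appendices 7 to 13, causing the computer to execute: setting, for the circuit module that writes data to the inter-module interface, the circuit module that reads data from the inter-module interface, and the inter-module interface, times based on their respective synchronization signals; and, when analyzing the writing of data to and the reading of data from the inter-module interface, converting the times of the writing circuit module and the reading circuit module into the time of the inter-module interface.
[0174]
(Appendix 15) A computer-readable portable storage medium storing a program that, when used by a computer that synthesizes an inter-module interface for passing data between a plurality of circuit modules operating in parallel, causes the computer to execute:
analyzing the writing of data to and the reading of data from the inter-module interface by the circuit modules and obtaining, based on the result of the analysis, the variables held by the inter-module interface;
obtaining, for each variable held by the inter-module interface, a lifetime indicating the period from when data is first written until it is last read;
searching for a degenerate variable set, which is a subset obtained by dividing the set of the variables into an integer number of parts such that, for every variable in the subset, another subset obtained by the division contains a variable whose lifetime is the lifetime of that variable shifted by a specific time; and
synthesizing the inter-module interface based on the assignment of variables, using the degenerate variable set.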
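[0175]
[Effects of the invention]
According to the present invention, an inter-module interface that uses less memory than conventional methods can be synthesized automatically.
[0176]
Because the inter-module interface can be synthesized automatically, mistakes arising from manual design can also be prevented.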
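[Brief description of the drawings]
[FIG. 1] A diagram showing the configuration of the CAD system in this embodiment.
[FIG. 2] A diagram showing the processing performed by the interface module generation unit.
[FIG. 3] A diagram showing an example of the processes handled by the interface module generation unit.
[FIG. 4] A diagram (part 1) showing an example implementation of the interface circuit module.
[FIG. 5] A diagram (part 2) showing an example implementation of the interface circuit module.
[FIG. 6] A diagram showing an example data structure for read and write sequences.
[FIG. 7] (a) is a flowchart of the processing that converts the write sequence of process A into the read sequence of process C, and (b) is a diagram showing an example of the conversion.
[FIG. 8] (a) is a flowchart of the processing that converts the read sequence of process B into the write sequence of process C, and (b) is a diagram showing an example of the conversion.
[FIG. 9] A flowchart of the processing that finds the minimum output delay t_C^{odelay}.
[FIG. 10] A diagram showing the data structure for the variable lifetimes begin(v[i]) and end(v[i]).
[FIG. 11] A flowchart of the processing that obtains the lifetimes.
[FIG. 12] A diagram showing degenerate variable sets.
[FIG. 13] A flowchart of the processing that obtains the degenerate variable set Vr.
[FIG. 14] A flowchart showing the details of the determination in step S46.
[FIG. 15] A flowchart of the processing that solves the variable sharing problem.
[FIG. 16] A diagram showing the algorithm that generates RTL.
[FIG. 17] A diagram showing an example of the RTL code of the subprocess that reads the input ports.
[FIG. 18] A diagram showing an example of the RTL code of the subprocess that writes the output ports.
[FIG. 19] A diagram showing a schematic example of RTL code combining the two subprocesses.
[FIG. 20] An explanatory diagram (part 1) of the output delay.
[FIG. 21] An explanatory diagram (part 2) of the output delay.
[FIG. 22] A diagram showing the result of analyzing the lifetimes of the array variable in Example 1.
[FIG. 23] A diagram showing the lifetimes of the array variable in Example 1.
[FIG. 24] A diagram showing an example of the degenerate variable set obtained from the variable set in Example 1.
[FIG. 25] A diagram showing the assignment of variables to memory in Example 2.
[FIG. 26] A diagram showing the result of analyzing the lifetimes of the array variable in Example 3.
[FIG. 27] A diagram showing the lifetimes of the array variable in Example 3.
[FIG. 28] A diagram showing the result of assigning variables to memory in the third example.
[FIG. 29] A diagram showing the memory assignment of the variables in the third example.
[FIG. 30] A system environment diagram of the computer in this embodiment.
[FIG. 31] A diagram showing examples of media.
[FIG. 32] A diagram showing an example of a behavioral description in a circuit description language used in circuit design by high-level synthesis.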
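[Explanation of Symbols]
1 CAD system
11 Input editing unit
12 Compiler unit
13 Library storage unit
14 Interface module generation unit
15 Output unit
21 CPU
22 Main storage device
23 Auxiliary storage device
24 Input/output device
25 Network connection device
26 Medium reading device
27 Storage medium
28 Bus
31 Information processing apparatus
32 Storage means
33 Network line
34 Main body
35 Memory
36 Portable storage medium
[0001]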
BACKGROUND OF THE INVENTION
The present invention relates to a technique of circuit design by high level synthesis, and more particularly to synthesis of an inter-module interface that becomes an interface between a plurality of circuit modules operating in parallel.
[0002]
[Prior art]
In recent years, hardware has grown in scale, as typified by VLSI, and research on automating hardware design with computers is under way to cope with this. In one hardware design method, the behavioral specification is described in a circuit description language such as VHDL or SFL without regard to the hardware; when this description is input to a computer, the computer performs synthesis of the input logic of each module, logic compression, and so on, followed by placement and routing, to synthesize the actual circuit. Among such methods of synthesis from a circuit description language, synthesis from a circuit description with a particularly high degree of abstraction is called high-level synthesis.
[0003]
FIG. 32 shows an operation description example in a circuit description language used in circuit design by high-level synthesis. In the example shown in the figure, a process for outputting data to an array variable is described in part (A), and a process for reading data from the array variable is described in part (B).
[0004]
In circuit design by high-level synthesis, there is a case where it is desired that the parts (A) and (B) described as shown in FIG. 32 that are originally processed sequentially are executed in parallel.
[0005]
In such a case, the part that serves as the interface handling data transfer between the circuit modules executed in parallel is created as hardware, and in most cases the circuit of this interface part must be provided with a buffer memory for timing adjustment. The circuit module that outputs data writes the data to this buffer, and the circuit module that receives data reads the data from this buffer.
[0006]
When circuit module A, which realizes the process described in part (A) of FIG. 32, and circuit module B, which realizes the process described in part (B), are implemented and executed in parallel, circuit module A and circuit module B access the same array variable in the buffer.
[0007]
At this time, in order for these two circuit modules to perform parallel processing correctly, the order of writing to and reading from the array variables must be performed correctly. That is, it is necessary to guarantee an order relationship in which the circuit module B reads after the circuit module A writes, and the circuit module A must not write before the circuit module B reads.
[0008]
As one of the methods for realizing this, it is conceivable that the array variable array is duplicated and the hardware is formed as a configuration including a memory area for two array variables of array A and array B. In this case, the circuit module A writes to the array A, while the circuit module B reads from the array B. When the writing / reading of the circuit modules A and B is completed to the end, this time, the circuit module A writes to the array B, and the circuit module B reads from the array A. Thereafter, this operation is repeated.
[0009]
If this method is used, the condition that the circuit module B reads after completion of writing in the circuit module A and the circuit module A does not write before the circuit module B reads can be satisfied.
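The duplicated-array scheme described above can be illustrated in software as follows. This is only a minimal sketch of the general double-buffering idea, not circuitry from the specification; the class name `DoubleBuffer` and its methods are hypothetical names introduced for illustration.

```python
class DoubleBuffer:
    """Two copies of the array; writer and reader swap banks after each full pass."""

    def __init__(self, size):
        self.banks = [[None] * size, [None] * size]
        self.write_bank = 0  # module A writes here; module B reads the other bank

    def write(self, index, value):
        # Access by circuit module A.
        self.banks[self.write_bank][index] = value

    def read(self, index):
        # Access by circuit module B: always the bank A is NOT writing.
        return self.banks[1 - self.write_bank][index]

    def swap(self):
        # Called once both modules have finished their pass over the array.
        self.write_bank = 1 - self.write_bank


buf = DoubleBuffer(4)
for i in range(4):
    buf.write(i, i * 10)       # first pass: A fills one bank
buf.swap()
for i in range(4):
    buf.write(i, i * 100)      # A refills the other bank ...
    assert buf.read(i) == i * 10   # ... while B still sees the first pass
```

The ordering guarantee comes entirely from the two modules never touching the same bank during a pass, which is why this scheme costs twice the array size.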
[0010]
Although this method allocates two copies of the array variable, in many cases the memory used for the array variable array does not actually need to be twice the size of the original array variable. When a circuit is built in a semiconductor chip, a smaller memory is advantageous in many respects, such as circuit placement and routing, power consumption, and manufacturing cost.
[0011]
As an alternative to duplicating the memory, the circuit designer could thoroughly analyze the behavioral description and, for each circuit, manually devise and design an implementation that uses less memory than duplicating the array variable as described above.
[0012]
In general high-level synthesis, there is a scheduling process for assigning variables to memories (registers) separately from the inter-module interface circuit which is the subject of the present invention. There, an optimization process called variable sharing is performed. This is a technique for reducing the amount of hardware generated by high-level synthesis by assigning variables whose lifetimes do not overlap to the same memory. This technique cannot be applied to the structure and array variables of the high-level description model that is the subject of the present invention.
[0013]
[Problems to be solved by the invention]
An object of the present invention is to provide a mechanism capable of automatically synthesizing a circuit serving as an interface between a plurality of circuits executed in parallel as described above.
[0014]
It is another object of the present invention to provide a mechanism that can automatically determine the size of a buffer memory that constitutes a circuit serving as an interface and a control circuit related to the buffer memory.
[0015]
A further object is to provide a mechanism that mathematically models the accesses to array variables and automatically synthesizes an interface circuit that uses less memory than before and has a lower latency (the time from the input of the data to be processed until the processing result is output).
[0016]
[Means for Solving the Problems]
A synthesis apparatus according to the present invention, which solves the above problems, is premised on synthesizing an inter-module interface that transfers data between a plurality of circuit modules operating in parallel, and comprises variable calculation means, lifetime calculation means, degenerate set calculation means, and circuit synthesis means.
[0017]
The variable calculation means analyzes writing and reading of data to and from the inter-module interface by the circuit module, and obtains a variable that the inter-module interface has based on the result of the analysis.
[0018]
The lifetime calculation means obtains a lifetime indicating a period from when data is first written to when it is finally read, with respect to a variable included in the inter-module interface.
[0019]
The degenerate set calculation means searches for a degenerated variable set. This is a subset, obtained by dividing the set having the variables as elements into an integer number of parts, such that for every variable in the subset a variable whose lifetime is shifted by a specific time from the lifetime of that variable exists in each other subset obtained by the division.
[0020]
The circuit synthesis means synthesizes the inter-module interface based on the variable assignment using the degenerated variable set.
The present invention can also be realized as a CAD system including as a means a compiler means for analyzing a description in a circuit description language and outputting a circuit specification, and the synthesis apparatus.
[0021]
The scope of the present invention also includes a synthesis method that: analyzes the writing and reading of data to and from the inter-module interface by the circuit modules and obtains the variables of the inter-module interface based on the result of the analysis; obtains, for each variable of the inter-module interface, a lifetime indicating the period from when data is first written until it is last read; searches for a degenerated variable set, which is a subset obtained by dividing the set having the variables as elements into an integer number of parts, such that for every variable in the subset a variable whose lifetime is shifted by a specific time from the lifetime of that variable exists in each other subset obtained by the division; and synthesizes the inter-module interface based on variable assignment using the degenerated variable set.
[0022]
Further, the present invention includes a computer program and a portable storage medium. According to the present invention, the interface between modules can be automatically synthesized.
[0023]
It is also possible to synthesize an interface between modules that uses a small amount of memory.
Other features of the invention will be described with reference to the figures. The description assumes the following memory read/write operation: each variable V[i] of the entire set of n variables V[0] to V[n−1] (V[0] to V[63] in FIG. 24) is written to an address in the memory at time Begin[i] (in FIG. 24, the time at which the run of # begins), is held for the lifetime T (the number of # in FIG. 24), and is read at time End[i] (in FIG. 24, the time following the end of the run of #). Combinations of i and j are sought that satisfy the condition that the write time Begin[j] of a variable V[j] (for example, V[4] in FIG. 24), which follows Begin[i] of a variable V[i] (for example, V[0] in FIG. 24) by the lifetime T, must come after the read time End[i] of V[i]. From the write sequence and read sequence made up of such combinations, subsets (P0, P1, P2, P3 in FIG. 24) are obtained that are sets of variables having the same lifetime with a time difference Tr. If subsets satisfying the above exist, they are called degenerated subsets of the entire set.
[0024]
Since the degenerated subsets are congruent in the order of their variables, memory reads and writes can be performed within the address space corresponding to the variables of one degenerated subset (in FIG. 24, the 16-address space corresponding to subset P0). That is, in FIG. 24 the address space is reduced from 64 to 16.
[0025]
Furthermore, even if each degenerated variable set is divided into further subsets, the above condition can be obtained in a smaller address space than the above degenerated address space by sharing variables, that is, by putting a plurality of variables in one address. Can be met. For example, if V [0], V [4], and V [6] in FIG. 24 are A, +, and B, they can be assigned to the memory address M [0] in FIG. This is called variable sharing. In FIG. 25, the original address space 64 is degenerated to 10.
[0026]
DETAILED DESCRIPTION OF THE INVENTION
FIG. 1 is a block diagram showing a configuration of a CAD system when a technique for automatically generating an inter-module interface serving as an interface in the present invention is applied to a CAD system.
[0027]
The CAD system 1 shown in FIG. 1 includes an input editing unit 11, a compiler unit 12, a library storage unit 13, an interface module generation unit 14, and an output unit 15. In the figure, only the part related to the present invention in the CAD system is shown.
[0028]
The input editing unit 11 handles the input and editing, by the user, of a description of the processing contents in a circuit description language when designing a circuit. The compiler unit 12 generates the specification of a circuit that realizes the description entered by the user through the input editing unit 11 or a description in the circuit description language read from a file. The library storage unit 13 stores a plurality of circuit libraries and is referenced when the compiler unit 12 generates a circuit in order to extract the necessary library. The interface module generation unit 14 generates an inter-module interface that serves as an interface between a plurality of circuit modules executed in parallel. When generating a circuit, the compiler unit 12 analyzes the description in the circuit description language and, if it determines that an inter-module interface is necessary, requests the interface module generation unit 14 to generate it. The interface module generation unit 14 may be configured as a partial function of the library storage unit 13. The output unit 15 outputs the circuit generated by the compiler unit 12 as a file or on a display device.
[0029]
This CAD system may be realized on dedicated hardware, or may be realized by software using a general-purpose computer.
The interface module generation unit 14 need not be part of the CAD system 1 or of the software that implements the CAD system; it may be configured as part of another apparatus or software, for example a development tool such as a high-level synthesis tool or a compiler, or as a stand-alone apparatus or software.
[0030]
FIG. 2 is a diagram illustrating processing performed by the interface module generation unit 14.
In response to the interface module generation request from the compiler unit 12, the interface module generation unit 14 performs the following processing.
1. Analyze the write and read sequences.
2. Obtain the output delay from the write sequence and the read sequence.
3. Analyze the lifetimes of the variables.
4. Find the degenerated variable set.
5. Solve the variable sharing problem and assign the variables to memory.
6. Synthesize the inter-module interface.
[0031]
The above processes will be described later.
The interface module generation unit 14 processes the circuit modules A and B operating in parallel and the inter-module interface C of the circuit modules A and B as processes.
[0032]
A process here refers to an entity that operates while communicating with other processes by writing data to or reading data from communication ports. A process may contain memory. Processes perform their communication operations freely without affecting each other (a plurality of processes operate in parallel).
[0033]
Further, when generating the inter-module interface C, the following points are taken into consideration.
- A plurality of processes run in parallel.
- There is no memory in the communication path.
- If a write is performed before the partner reads, the previously written data is overwritten and destroyed.
- If a read is performed before new data is written, the previously written data is read again and the new data cannot be read.
[0034]
Each process has its own clock, but the times indicated by the clocks of the processes can be converted into one another by linear functions. Giving each process its own clock does not mean that the circuit modules never operate from the same clock signal; it is simply a way of expressing the case where the clock frequency and phase differ between circuit modules. Even when each process has its own clock, the times can be converted by linear functions, so the presence of multiple clocks does not complicate the model. When it is known in advance that all circuit modules operate from the same clock signal, the times of all the circuit modules may be expressed by a single clock.
[0035]
FIG. 3 is a diagram illustrating an example of a process handled by the interface module generation unit 14.
In the figure, process C is the inter-module interface between process A and process B, which operate in parallel. When outputting data, process A performs communication corresponding to a write operation to the memory of process C, and when inputting data, process B performs communication corresponding to a read operation from the memory of process C. The interface module generation unit 14 generates a module C that performs control so that the order relationship is guaranteed: circuit module B reads after circuit module A has finished writing, and circuit module A does not write before circuit module B has read.
[0036]
FIGS. 4 and 5 are diagrams illustrating implementation examples of the inter-module interface generated by the interface module generation unit 14.
FIG. 4 shows a configuration in which the array variable is allocated in a RAM. In response to data writes from circuit module A and data reads from module B, the control circuit issues the addresses allocated to the array variable and performs control so that the order relationship is guaranteed.
[0037]
FIG. 5 shows a configuration in which the array variable is assigned to registers. In response to a data write from circuit module A, the control circuit sends a load signal to the assigned register to write the data; in response to a data read from module B, the control circuit controls the selector with a control signal so that the data is output from the corresponding register. The order relationship is guaranteed in this way.
[0038]
Next, the processing by the interface module generation unit 14 shown in FIG. 2 will be described in detail.
(1) Analysis of write sequence / read sequence
In this processing, the output of the circuit module A and the input of the circuit module B are rewritten as the input / output of the process C so that the subsequent processing can be discussed only with the description for the process C.
[0039]
In the following description, the times indicated by the clocks of process A and process B are written $t_A$ and $t_B$. The set of subscripts of the array variable elements that process A writes at time $t_A$ is written $\Omega(A, t_A)$, and the set that process B reads at time $t_B$ is written $\Gamma(B, t_B)$. In the present embodiment these are sets because a plurality of array variable elements may be accessed at a time. Further, the following are defined.
Process A write sequence W(A):
$W(A) = \{(t_A, \Omega(A, t_A)) \mid t_A \text{ is a time of process A}\}$
Process B read sequence R(B):
$R(B) = \{(t_B, \Gamma(B, t_B)) \mid t_B \text{ is a time of process B}\}$
For example,
$W(A) = \{(0, \{0, 32\}), (1, \{1, 33\}), \ldots, (c, \{j, 32 + j\}), \ldots\},\quad j = c \bmod 32$
represents the process of writing the 0th and 32nd elements of the array variable at time 0, the 1st and 33rd at time 1, and so on, and
$R(B) = \{(0, \{0, 1\}), (1, \{2, 3\}), \ldots, (d, \{2j, 2j + 1\}), \ldots\},\quad j = d \bmod 32$
represents the process of reading the 0th and 1st elements of the array variable at time 0, the 2nd and 3rd at time 1, and so on.
[0040]
FIG. 6 is a diagram illustrating an example of a data structure of a read sequence / write sequence.
The read sequence and the write sequence of a process can be realized by a combination of a one-dimensional array and a list structure, as shown in the figure. In the figure, each element of the array points to the head of a list, and the data in each list represents the set of subscripts of the variables accessed at that time. In the example shown in the figure, the sequence {(0, {0, 1, 2}), (1, {3, 4}), (3, {5, 6}), ..., (n-1, {30, 31})} is stored.
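As a rough software analogue of this data structure, a sequence can be held as a Python list indexed by time, each entry being the set of subscripts accessed at that time. The sketch below builds the example sequences W(A) and R(B) given earlier; the function names are hypothetical, and using a `set` in place of the linked list is an assumption made for brevity.

```python
def write_sequence_a(cycles):
    """W(A): at time c, process A writes elements j and 32 + j, where j = c mod 32."""
    return [{c % 32, 32 + c % 32} for c in range(cycles)]


def read_sequence_b(cycles):
    """R(B): at time d, process B reads elements 2j and 2j + 1, where j = d mod 32."""
    return [{2 * (d % 32), 2 * (d % 32) + 1} for d in range(cycles)]


W_A = write_sequence_a(32)  # W_A[t] is the set written at time t
R_B = read_sequence_b(32)   # R_B[t] is the set read at time t
```

Over 32 cycles both sequences touch each of the 64 array elements exactly once, but in different orders, which is what makes the buffering problem non-trivial.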
[0041]
Write sequence $W(A, t_A^b, t_A^e)$ of process A from time $t_A^b$ to time $t_A^e$:
$W(A, t_A^b, t_A^e) = \{(t_A, \Omega(A, t_A)) \mid t_A^b \le t_A < t_A^e\}$
Write position set $\omega(A, t_A^b, t_A^e)$ of process A from time $t_A^b$ to time $t_A^e$:
[0042]
[Expression 1]

[0043]
The above defines writing by process A.
Similarly, reading by process B is defined as follows.
Read sequence $R(B, t_B^b, t_B^e)$ of process B from time $t_B^b$ to time $t_B^e$:
$R(B, t_B^b, t_B^e) = \{(t_B, \Gamma(B, t_B)) \mid t_B^b \le t_B < t_B^e\}$
Read position set $\gamma(B, t_B^b, t_B^e)$ of process B from time $t_B^b$ to time $t_B^e$:
[0044]
[Expression 2]

[0045]
In the case of FIG. 3, since there is a communication path from process A to process C, the write sequence of A is also the read sequence of C. Only cases in which communication is performed reliably and without omission are handled here, and the clock of each process is assumed to be set so that this is achieved.
[0046]
If the time $t_A$ of process A is expressed in terms of the time $t_C$ of process C as
$t_A = \alpha_{Ca} t_C + \beta_{Ca}$ ($\alpha_{Ca}$ and $\beta_{Ca}$ are constants; $\alpha_{Ca}$ corresponds to the frequency ratio of the clock signals and $\beta_{Ca}$ to the phase difference),
the read position set $\gamma(C, t_C, t_C + 1)$ of process C is easily obtained from the write sequence of process A:
$\gamma(C, t_C, t_C + 1) = \omega(A, \alpha_{Ca} t_C + \beta_{Ca}, \alpha_{Ca}(t_C + 1) + \beta_{Ca})$
Similarly, if the time $t_B$ of process B is expressed in terms of the time $t_C$ of process C as
$t_B = \alpha_{Cb} t_C + \beta_{Cb}$ ($\alpha_{Cb}$ and $\beta_{Cb}$ are constants; $\alpha_{Cb}$ corresponds to the frequency ratio of the clock signals and $\beta_{Cb}$ to the phase difference),
the write position set $\omega(C, t_C, t_C + 1)$ of process C can be obtained from the read sequence of process B:
$\omega(C, t_C, t_C + 1) = \gamma(B, \alpha_{Cb} t_C + \beta_{Cb}, \alpha_{Cb}(t_C + 1) + \beta_{Cb})$
In actual processing, however, the given information is the write sequence of process A and the read sequence of process B, and this processing obtains the write sequence of process C from the read sequence of process B.
[0047]
In practice, in most cases every process runs on the same clock signal, that is, $\alpha_{Ca} = 1$ and $\beta_{Ca} = 0$; the representation above is used, however, in order to handle cases where the clock signals are not necessarily the same (the frequency is the same).
[0048]
In this way, the output of module A and the input of module B are expressed solely as the input and output of process C, and from here on the synthesis method for circuit module C can be discussed in terms of process C alone.
[0049]
When one process performs a single read process and the other process performs a write process twice or more, it is necessary to devise communication using a plurality of ports. On the other hand, when one process performs one writing process and the other process performs two or more reading processes, it is necessary to devise a method such as providing a period for resting reading.
[0050]
FIG. 7A is a flowchart showing the process of converting the write sequence of process A into the read sequence of process C. The write sequence of process A from $t_A = 0$ up to the last write time $t_A^E$ is converted into a read sequence of process C.
[0051]
When the processing in the figure is started, first, in step S1, $t_A = 0$ is set and the list-structure data $R_C$ shown in FIG. 6 is cleared entirely.
[0052]
Next, in step S2, $t_C$ is set to the value obtained by rounding up $\alpha_{aC} t_A + \beta_{aC}$ ($\alpha_{aC}$ and $\beta_{aC}$ are the constants in the time relational expression), which converts the time in process A into the time in process C. In step S3, the list-structure data $W_A[t_A]$ for time $t_A$ is appended to the end of the list-structure data $R_C[t_C]$ for time $t_C$.
[0053]
Next, in step S4, 1 is added to $t_A$, and in step S5, if $t_A$ does not exceed the last write time $t_A^E$ (step S5, YES), the process returns to step S2 and steps S2 to S5 are repeated. If $t_A$ exceeds the last write time $t_A^E$ in step S5 (step S5, NO), the process is terminated.
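The loop of steps S1 to S5 can be sketched as below. This is an illustrative reading of the flowchart, not the specification's implementation: the function name and parameter names are hypothetical, the per-time lists are represented as Python sets held in a dictionary keyed by the time of process C, and floating-point constants are assumed to be exactly representable.

```python
import math


def to_read_sequence_c(w_a, alpha_ac, beta_ac):
    """Convert the write sequence of process A into the read sequence of process C.

    w_a[t_a] is the set of subscripts process A writes at time t_a.
    Each time t_a is mapped to t_c = ceil(alpha_ac * t_a + beta_ac) (step S2),
    and the subscripts are merged into the entry for t_c (step S3).
    """
    r_c = {}                                      # step S1: start from an empty R_C
    for t_a, written in enumerate(w_a):           # steps S4/S5: advance t_a to the end
        t_c = math.ceil(alpha_ac * t_a + beta_ac)
        r_c.setdefault(t_c, set()).update(written)
    return r_c


# Process A twice as fast as process C: two writes of A can land in one cycle of C.
fast_a = to_read_sequence_c([{0}, {1}, {2}], 0.5, 0)
# Process A half as fast as process C: every other cycle of C receives nothing.
slow_a = to_read_sequence_c([{0}, {1}], 2, 0)
```

Rounding up means a write is never attributed to a cycle of process C that begins before the write has happened.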
[0054]
FIG. 7B shows the conversion of the write sequence of process A into the read sequence of process C by the process of FIG. 7A. The upper part of the figure shows the case where process A is slower than process C, and the lower part shows the case where process A is faster than process C.
[0055]
In the upper part of the figure, since process A is slower, the write at time $t_A$ corresponds to the read of process C at time $t_C$, and the write at time $t_A + 1$ corresponds to the read of process C at time $t_C + 2$. In the lower part of the figure, since process A is faster, the write at time $t_A$ corresponds to the read of process C at time $t_C$, and the writes at times $t_A + 1$ and $t_A + 2$ correspond to the read of process C at time $t_C + 1$.
[0056]
FIG. 8A is a flowchart illustrating the process of converting the read sequence of process B into the write sequence of process C. The read sequence of process B from $t_B = 0$ up to the last read time $t_B^E$ is converted into a write sequence of process C.
[0057]
When the processing in the figure is started, first, in step S11, $t_B = 0$ is set and the list-structure data $W_C$ shown in FIG. 6 is cleared entirely.
Next, in step S12, $t_C$ is set to the value obtained by rounding down $\alpha_{bC} t_B + \beta_{bC}$ ($\alpha_{bC}$ and $\beta_{bC}$ are the constants in the time relational expression), which converts the time in process B into the time in process C. In step S13, the list-structure data $R_B[t_B]$ for time $t_B$ is appended to the end of the list-structure data $W_C[t_C]$ for time $t_C$.
[0058]
Next, in step S14, $t_B$ is incremented by 1, and in step S15, if the value of $t_B$ does not exceed the last read time $t_B^E$ (step S15, YES), the process returns to step S12 and steps S12 to S15 are repeated. If the value of $t_B$ exceeds the last read time $t_B^E$ in step S15 (step S15, NO), the process is terminated.
[0059]
FIG. 8B shows the conversion of the read sequence of process B into the write sequence of process C by the process of FIG. 8A. The upper part of the figure shows the case where process B is faster than process C, and the lower part shows the case where process B is slower than process C.
[0060]
In the upper part of the figure, since process B is faster, the write of process C at time $t_C$ corresponds to the read of process B at either time $t_B$ or $t_B + 1$, and the write of process C at time $t_C + 1$ corresponds to the read of process B at either time $t_B + 2$ or $t_B + 3$. In the lower part of the figure, since process B is slower, the write of process C at time $t_C$ corresponds to the read of process B at time $t_B$, and the two writes of process C at times $t_C + 1$ and $t_C + 2$ correspond to the read of process B at time $t_B + 1$. In this case, if a plurality of data items are transferred between the processes, a plurality of communication ports are required.
(2) Find output delay
The output delay is the time from when process C reads data from process A until it writes that data to process B, and the minimum output delay $t_C^{odelay}$ is the smallest value this time can take. In this embodiment, in order to reduce the amount of memory used in module C, this minimum output delay $t_C^{odelay}$ is selected as the output delay of process C.
[0061]
The minimum output delay $t_C^{odelay}$ is the smallest integer value that satisfies the following relationship.
$\forall t_C:\ \gamma(C, 0, t_C) \supseteq \omega(C, 0, t_C - t_C^{odelay})$
If reading by process B is started after the minimum output delay $t_C^{odelay}$ has elapsed since process A started writing, the following order relationship of processing is satisfied.
- Process B reads an array variable after process A has written it.
- Process B does not read an array variable before process A has written it.
[0062]
An example of an algorithm for obtaining the minimum output delay $t_C^{odelay}$ is shown below. The number of elements in the subscript sets γ and ω increases monotonically with time $t_C$, and its upper limit equals the size of the array variable. The algorithm therefore terminates after, at most, as many iterations as the number of cycles process B needs to read all the elements.
1. t = 0
2. If $\gamma(C, 0, t_C) \supseteq \omega(C, 0, t_C - t)$ for all $t_C$, set $t_C^{odelay} = t$ and end.
3. t = t + 1
4. Return to 2.
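The four steps above can be sketched directly. The code assumes that the position sets γ(C, 0, t) and ω(C, 0, t) are the unions of the per-time subscript sets over the half-open interval [0, t), in line with the interval used for the write sequence earlier; the expressions defining them are not reproduced in this text, so that reading, like the function names, is an assumption.

```python
def prefix_union(seq, t):
    """Union of the subscript sets seq[0] .. seq[t-1] (empty when t <= 0)."""
    result = set()
    for subscripts in seq[:max(t, 0)]:
        result |= subscripts
    return result


def min_output_delay(r_c, w_c):
    """Smallest t such that gamma(C,0,t_c) contains omega(C,0,t_c - t) for every t_c.

    r_c[t] is the set process C reads (from A) at time t;
    w_c[t] is the set process C writes (to B) at time t, before any delay.
    """
    if not prefix_union(w_c, len(w_c)) <= prefix_union(r_c, len(r_c)):
        raise ValueError("process C would have to write an element it never reads")
    t = 0                                          # step 1
    while True:
        # Step 2: beyond t_c = len(w_c) + t the right-hand side no longer grows.
        if all(prefix_union(r_c, t_c) >= prefix_union(w_c, t_c - t)
               for t_c in range(len(w_c) + t + 1)):
            return t
        t += 1                                     # steps 3 and 4


# C reads {0,2} then {1,3}, but must write {0,1} first: one cycle of delay is needed.
delay = min_output_delay([{0, 2}, {1, 3}], [{0, 1}, {2, 3}])
```

For the example sequences W(A) and R(B) given earlier, with all three processes on the same clock, this sketch yields a delay of 16 cycles.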
FIG. 9 is a flowchart showing the process of obtaining the minimum output delay $t_C^{odelay}$.
[0063]
When the processing in the figure is started, first, in step S21, d, t1, and t2 are initialized to 0, and the list-structure data R (of the form shown in FIG. 6) is set to Rc[t1].
[0064]
Next, in step S22, it is determined whether t2 is less than or equal to the time $t_C^E$ at which process C last writes. If it is (step S22, YES), it is determined in step S23 whether all the elements of the list-structure data Wc[t2] are included in the list-structure data R. If they are not (step S23, NO), 1 is added to d (step S24); if they are (step S23, YES), 1 is added to t2 (step S25).
[0065]
In step S26, t1 is incremented by 1 and the list-structure data Rc[t1] is appended to the end of the list-structure data R. The process then returns to step S22, and steps S22 to S26 are repeated until t2 exceeds $t_C^E$.
[0066]
If t2 is greater than $t_C^E$ in step S22 (step S22, NO), the value of d is set as $t_C^{odelay}$ in step S27 and the process is terminated.
(3) Analysis of variable lifetime
The lifetime L(v) = (begin(v), end(v)) of a variable v indicates that a value is first assigned to v at time begin(v) and that the last reference to the value of v occurs at time end(v). In this process, the lifetimes of the array variable held in module C are analyzed.
[0067]
For an element v[i] of the array variable (i is a subscript), the time of value assignment is the time at which the element is read from module A through the communication port, and the time of last reference is the time at which it is written to module B through the communication port.
[0068]
For the variable v[i], the lifetime L(v[i]) = (begin(v[i]), end(v[i])) is obtained as follows.
If $i \in \gamma(C, t)$, then begin(v[i]) = t.
If $i \in \omega(C, t')$, then end(v[i]) = t'.
FIG. 10 is a diagram illustrating the data structure of the variable lifetimes begin(v[i]) and end(v[i]).
[0069]
The values begin(v[0]) to begin(v[i]) and end(v[0]) to end(v[i]), which indicate the lifetimes of the variables v[0] to v[i], can be realized by the one-dimensional array Begin[0] to Begin[n−1] and the one-dimensional array End[0] to End[n−1] shown in the figure.
[0070]
FIG. 11 is a flowchart showing the process of obtaining the lifetime L(vi) = (begin(v[i]), end(v[i])) for every variable vi.
The process in the figure is assumed to be performed using the one-dimensional arrays shown in FIG. 10.
[0071]
When the process in the figure is started, t = 0 is set as the initial value in step S31.
Next, in step S32, for every element i of the list-structure variable Rc[t] (of the form shown in FIG. 6), t is assigned to the one-dimensional array variable Begin[i] shown in FIG. 10 (step S33).
[0072]
Next, in step S34, for every element i of the list-structure variable Wc[t] (of the form shown in FIG. 6), t is assigned to the one-dimensional array variable End[i] shown in FIG. 10 (step S35).
[0073]
In step S36, the value of t is incremented by 1, and it is determined whether this value is smaller than the time $t_C^E$ at which process C last writes (step S37). If t is smaller than $t_C^E$, the process returns to step S32 and steps S32 to S36 are repeated. If t is no longer smaller than $t_C^E$ in step S37 (step S37, NO), the process is terminated.
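A compact sketch of this lifetime analysis is given below. It assumes that the write sequence of process C has already been shifted by the chosen output delay; the helper `delay_writes` and the other names are hypothetical, and the loop simply runs over every time step present in either sequence rather than reproducing the exact termination test of the flowchart.

```python
def delay_writes(w_c, delay):
    """Shift the write sequence of process C later by `delay` cycles."""
    return [set() for _ in range(delay)] + [set(s) for s in w_c]


def lifetimes(r_c, w_c, n):
    """Return the arrays (Begin, End) for the n elements of the array variable.

    r_c[t]: subscripts process C reads from module A at time t  -> Begin[i] = t
    w_c[t]: subscripts process C writes to module B at time t   -> End[i] = t
    """
    begin = [None] * n
    end = [None] * n
    for t in range(max(len(r_c), len(w_c))):
        for i in (r_c[t] if t < len(r_c) else ()):
            begin[i] = t        # steps S32/S33
        for i in (w_c[t] if t < len(w_c) else ()):
            end[i] = t          # steps S34/S35
    return begin, end


r_c = [{0, 2}, {1, 3}]
w_c = delay_writes([{0, 1}, {2, 3}], 1)   # output delay of one cycle
begin, end = lifetimes(r_c, w_c, 4)
```

In this small example elements 0 and 1 are both released at time 1, while elements 2 and 3 stay live until time 2.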
(4) Find degenerate variable set
Let V be a variable set and let P and Q be subsets of V (P ⊆ V, Q ⊆ V) with |P| = |Q| (in this specification, |X| denotes the number of elements of a set X). When, for a certain integer Tr,
$\forall p \in P\ \exists q \in Q$ such that begin(p) + Tr = begin(q) and end(p) + Tr = end(q) and end(p) ≤ begin(q),
this is written as shown in Outside 1, and P and Q are said to have the same lifetime with time difference Tr.
[0074]
[Outside 1]

[0075]
If such a subset P exists, it is only necessary to consider a smaller set P without considering the entire variable set V. Here, P is referred to as a degenerated variable set.
[0076]
In this process, a variable set obtained by degenerating the array variable included in the module C is obtained.
A process for obtaining the degenerated variable set Vr will be described below.
Let Head(n, L) be a function that returns the set of the first n elements of list-structured ordered data L.
1. From the variable set V, obtain an ordered list $V_{ord}$ in which the variables vi ∈ V are arranged in ascending order of begin(vi).
2. Among all the divisors (≠ 1) of the number of elements |V| of the variable set V, find the smallest divisor m that satisfies the following.
[0077]
[Equation 3]

[0078]
3. If m is found, $V_r = P_0 = \mathrm{Head}(m, V_{ord})$ is the degenerated variable set.
4. If no such m exists, Vr = V (the set cannot be degenerated).
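The search can be sketched as follows, under several assumptions that go beyond what the text states: the condition on m is taken to be that the consecutive blocks of m variables in $V_{ord}$ pairwise have the same lifetime with one common time difference Tr, Tr is derived from the first variable of the first two blocks, and every divisor of |V| smaller than |V| is tried in ascending order (how the divisor 1 should be treated is left open here). All function names are hypothetical.

```python
def same_lifetime(p, q, tr, begin, end):
    """For every a in p there is some b in q with the lifetime of a shifted by tr,
    and a is released no later than b is first written."""
    return all(
        any(begin[a] + tr == begin[b] and end[a] + tr == end[b] and end[a] <= begin[b]
            for b in q)
        for a in p
    )


def degenerate_set(begin, end):
    """Return (Vr, Tr); Tr is None when the set cannot be degenerated."""
    n = len(begin)
    v_ord = sorted(range(n), key=lambda i: begin[i])       # step 1
    for m in range(1, n):                                  # step 2: smallest m first
        if n % m:
            continue
        parts = [v_ord[k:k + m] for k in range(0, n, m)]
        tr = begin[parts[1][0]] - begin[parts[0][0]]
        if all(same_lifetime(parts[k], parts[k + 1], tr, begin, end)
               for k in range(len(parts) - 1)):
            return parts[0], tr                            # step 3: Vr = Head(m, V_ord)
    return v_ord, None                                     # step 4: Vr = V


# Eight variables; variable i is written at time i and read at time i + 2.
vr, tr = degenerate_set(list(range(8)), [i + 2 for i in range(8)])
```

In this example two memory locations suffice for all eight variables, because each variable is released exactly when the variable two positions later is first written.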
[0079]
FIG. 12 is a diagram illustrating a degenerated variable set.
The example in the figure shows a case where the set is reduced to 1/4 (number of elements of V_ord / m = 4). The variable sets P0 to P3 satisfy the relation [Outside 2].
[0080]
[Outside 2]

[0081]
Each of P0 to P3 has m elements.
When such a degenerated variable set P0 is obtained, only the degenerated variable set Vr = P0 needs to be considered instead of the entire variable set V.
[0082]
FIG. 13 is a flowchart showing a process for obtaining the degenerated variable set Vr.
Before the processing shown in the figure is performed, the set of variables of process C is set in the one-dimensional array V, the list obtained by sorting the elements vi ∈ V in ascending order of Begin[i] is set in the one-dimensional array V_ord, and the list of divisors of the number of elements |V| arranged in descending order is set in the one-dimensional array M.
[0083]
When the processing of FIG. 13 is started, it is first determined in step S41 whether the array M is empty. If it is not empty (step S41, NO), in step S42 the first element of the one-dimensional array M, that is, the largest value among its remaining elements, is set in n, and that element is removed from the array M.
[0084]
Next, in step S43, the one-dimensional array V_ord is divided equally into n parts in order from the top, and these are defined as P0, P1, ..., Pn−1. Then, in the following steps S44 to S48, it is determined whether the relation [Outside 3] holds for P0, P1, ..., Pn−1.
[0085]
[Outside 3]

[0086]
First, in step S44, the variable i is set to the initial value 0. Next, in step S45, it is determined whether i < n−1, that is, whether all of P0, P1, ..., Pn−1 have yet to be processed. If i < n−1 (step S45, YES), it is determined in step S46 whether the relation [Outside 4] holds.
[0087]
[Outside 4]

[0088]
If it holds (step S46, YES), the variable i is incremented by one in step S47, and the process returns to step S45 to repeat steps S45 to S47. If it is determined in step S45 that i < n−1 is not satisfied (step S45, NO), the determination in step S46 has been satisfied for all of P0, P1, ..., Pn−1, so the relation [Outside 5] is determined to hold.
[0089]
[Outside 5]

[0090]
In step S48, Vr = P0 is set and the process ends.
On the other hand, if the result of the determination in step S46 is NO (step S46, NO), the process returns to step S41, the first element of the one-dimensional array M is set in n, and the same processing as described above is repeated. Details of the determination in step S46 are described later.
[0091]
If the processing has been completed for all the elements of the one-dimensional array M and M is determined to be empty in step S41 (step S41, YES), the variable set V cannot be degenerated, so Vr = V_ord is set and the process ends.
[0092]
FIG. 14 is a flowchart showing details of the determination processing in step S46 of FIG.
In the determination process of step S46, first, as initial values, the number of elements |Pi| of Pi is set in the variable m, m * i is set in the variable j, and Begin[m] − Begin[0] is set in the variable Tr (step S461).
[0093]
Next, in step S462, it is determined whether j < n−1. If j < n−1 (step S462, YES), it is determined in step S463 whether Begin[j] + Tr = Begin[j + m], End[j] + Tr = End[j + m], and End[j] ≦ Begin[j + m] all hold. If these conditions are satisfied (step S463, YES), the variable j is incremented by one in step S464 and the process returns to step S462. Steps S462 to S464 are repeated until it is determined in step S463 that the conditions are not satisfied (step S463, NO) or in step S462 that j < n−1 is not satisfied (step S462, NO). When the result of step S462 is NO, the relation [Outside 6] holds.
[0094]
[Outside 6]

[0095]
In that case the result of step S46 is YES. When the result of step S463 is NO, it is determined that the relation [Outside 7] does not hold, and the result of step S46 is NO.
[0096]
[Outside 7]
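The search of FIGS. 13 and 14 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the patented implementation: the function names `chunks_shifted` and `find_degenerated_set` are hypothetical, lifetimes are passed as plain `begin` and `end` lists indexed by variable number, and because [Equation 3] is not reproduced in this text, the check assumes that the variable at position j of V_ord must match the variable at position j + m, as in step S463.

```python
def chunks_shifted(order, begin, end, m, tr):
    """Check that every variable in V_ord matches the one m positions later."""
    for j in range(len(order) - m):
        p, q = order[j], order[j + m]
        same_shape = begin[p] + tr == begin[q] and end[p] + tr == end[q]
        if not (same_shape and end[p] <= begin[q]):
            return False
    return True


def find_degenerated_set(begin, end):
    """Return (Vr, Tr); Tr is None when the set cannot be degenerated."""
    # Step 1: V_ord, the variables sorted by ascending begin time.
    order = sorted(range(len(begin)), key=lambda v: begin[v])
    size = len(order)
    # Step 2: try the divisors m of |V| (m != 1) from smallest to largest.
    for m in range(2, size):
        if size % m != 0:
            continue
        tr = begin[order[m]] - begin[order[0]]  # Tr = Begin[m] - Begin[0]
        if chunks_shifted(order, begin, end, m, tr):
            return order[:m], tr  # Step 3: Vr = Head(m, V_ord)
    return order, None  # Step 4: Vr = V (cannot be degenerated)
```

Trying m in ascending order corresponds to trying the number of subsets n = |V| / m in descending order, so the first match is the smallest degenerated set.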

[0097]
(5) Solve variable sharing problems and assign variables to memory
In this process, memory is allocated to each element of the variable set Vr degenerated by process (4). Because the variable set has been degenerated, memory is allocated only to the variables that are elements of Vr, and the other variables share memory with the corresponding elements of Vr. This makes it possible to generate an inter-module interface with a reduced memory size.
[0098]
Further, by solving the variable sharing problem for the variable set Vr and allocating memory to the variable based on the result, the memory amount of the inter-module interface can be further reduced.
[0099]
This variable sharing problem can be solved with an algorithm modified from the Left Edge method described in the following document.
Title: High-Level Synthesis
Authors: D. Gajski, N. Dutt, A. Wu and S. Lin
Publisher: Kluwer Academic Publishers, 1992
In the algorithm modified from the Left Edge method, the variables v in the variable group S to be processed are rearranged in ascending order of begin, and, starting from the first variable, each variable is allocated to a memory after checking whether its lifetime overlaps with those of the other variables.
[0100]
For the variable set V, V ′ is set as follows.
V′ = {v′ | v ∈ V, L(v′) = (begin(v) + T, end(v) + T)}
In the above equation, T is the period of the process. If t1 is the time at which process C first writes to an element of the array variable, and t2 is the time at which process C has written to all the elements of the array variable for the first time, then T = t2 − t1. When the variable set V cannot be degenerated, the variable sharing problem is solved using this V′.
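As a small illustration of the definition of V′, the following Python sketch builds V ∪ V′ from a table of lifetimes. The function name `with_shifted_copy` is hypothetical, and the sketch assumes that the variables are numbered 0 to |V| − 1 and that the copy of variable v is numbered v + |V|.

```python
def with_shifted_copy(lifetimes, period):
    """Return V ∪ V′, where V′ holds the lifetimes of V shifted by T.

    lifetimes maps a variable number (assumed 0..|V|-1) to (begin, end).
    The copy of variable v is stored under v + |V| (an assumed numbering).
    """
    count = len(lifetimes)
    doubled = dict(lifetimes)
    for v, (begin, end) in lifetimes.items():
        doubled[v + count] = (begin + period, end + period)
    return doubled
```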
[0101]
FIG. 15 is a flowchart showing processing for solving the variable sharing problem.
This process first distinguishes the case where the variable set V can be degenerated from the case where it cannot. If the variable set V cannot be degenerated, V′ is added to the variable set S so that twice the original number of variables is taken into account. This is the same idea as designing an inter-module interface by doubling the memory and letting the reading side and the writing side use the two memories alternately to prevent access contention.
[0102]
When the process of FIG. 15 is started, it is first determined in step S51 whether the degenerated variable set Vr is the same as V_ord, the list in which the variables of the variable set V are arranged in the order in which their first value assignment occurs. If they differ (step S51, NO), S = Vr and Tr = Begin[|Vr|] − Begin[0] are set in step S52. If they are the same (step S51, YES), S = V ∪ V′ and Tr = T * 2 are set in step S53.
[0103]
Next, in step S54, the set S is sorted in ascending order of Begin[i], i ∈ S, and the result is taken as the ordered list S_ord.
In step S55, the variables mem, last, and i are set to 0, and the variable next is set to ∞.
[0104]
Next, in step S56, it is determined whether S_ord is empty. If it is not empty (step S56, NO), it is determined in step S57 whether the number of elements |S_ord| ≦ i. If |S_ord| ≦ i (step S57, YES), in step S58 the variable mem is incremented by one, the variables last and i are set to 0, and next is set to ∞. If |S_ord| ≦ i is not satisfied (step S57, NO), the process proceeds directly to step S59.
[0105]
In step S59, the i-th element from the top of S_ord is assigned to v, and in step S60 it is determined whether last ≦ Begin[v] and End[v] ≦ next are satisfied.
[0106]
If these conditions are satisfied (step S60, YES), the element v is removed from the ordered list S_ord in step S62.
In step S63, it is determined whether Begin [v] <End [v]. If the result is not Begin [v] <End [v] (NO in step S63), the process returns to step S56.
[0107]
If it is determined in step S63 that Begin[v] < End[v] (step S63, YES), then in step S64 mem is assigned to assign[v], the array indicating the allocation of variables to memory, last is set to End[v], next is set to the smaller of next and Begin[v] + Tr, and the process returns to step S56. If the above condition is not satisfied in step S60 (step S60, NO), i is incremented by one in step S61 and the process returns to step S56.
[0108]
The processing of steps S56 to S64 is repeated until the ordered list S_ord has no more elements. When S_ord is empty (step S56, YES), the process terminates. Memory is then allocated to the variables based on the result stored in the array variable assign.
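Steps S54 to S64 can be sketched in Python as follows. This is an illustrative reading of the flowchart description above, not a definitive implementation: the function name `allocate_memories` and the dictionary-based data layout are assumptions, and the step numbers in the comments refer to FIG. 15 as described in the text.

```python
import math


def allocate_memories(lifetimes, tr):
    """Assign variables to memories following steps S54 to S64.

    lifetimes maps each variable in S to its (begin, end) pair and tr is
    the time difference Tr. Variables with begin == end get no memory.
    """
    s_ord = sorted(lifetimes, key=lambda v: lifetimes[v][0])  # S54
    assign = {}
    mem, last, i, nxt = 0, 0, 0, math.inf  # S55
    while s_ord:  # S56
        if len(s_ord) <= i:  # S57
            mem, last, i, nxt = mem + 1, 0, 0, math.inf  # S58
        v = s_ord[i]  # S59
        begin, end = lifetimes[v]
        if last <= begin and end <= nxt:  # S60
            s_ord.pop(i)  # S62
            if begin < end:  # S63
                assign[v] = mem  # S64
                last = end
                nxt = min(nxt, begin + tr)
        else:
            i += 1  # S61
    return assign
```

The bound `next` keeps a memory free again by the time its first occupant reappears Tr later, which is what allows the allocation to be reused in every period.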
(6) Synthesize the interface circuit
The procedure so far has determined the assignment of variables to memory and which variable is accessed at which timing (that is, which memory is accessed at which timing). This means that enough information is available to implement process C as a circuit at RTL (register transfer level). After that, an FSM (Finite State Machine) is constructed from this information, and the circuit is generated according to the FSM.
[0109]
In the following, a format similar to SystemC is adopted as a grammar for describing RTL.
The RTL of process C is composed of the following two subprocesses. Sub-processes start executing simultaneously and are executed in parallel. The operation timing is synchronized with “wait (number)” (described later).
-Subprocess that reads from input port and writes to memory
-Subprocess that reads data from memory or input port and writes to output port
FIG. 16 shows an algorithm for generating RTL when the variable set can be degenerated. If the variable set cannot be degenerated, twice as many variables are considered, as in the variable sharing process described above. In this case, γ(C, t_C) and ω(C, t_C) are expanded to handle twice as many variables. "Expand" means the following replacement.
[0110]
Originally γ(C, t_C) = γ(C, t_C + T) holds, but γ(C, t_C) is replaced with γ⁺(C, t_C) defined as follows.
・ γ⁺(C, t_C) = γ(C, t_C) for 0 ≦ t_C < T
・ γ⁺(C, t_C) = γ(C, t_C) with each element increased by |V|, for T ≦ t_C < 2T
Similarly, ω(C, t_C) is replaced with ω⁺(C, t_C).
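The expansion can be illustrated with a short Python sketch. The function name `expand` and the dictionary representation (time mapped to a set of variable numbers) are assumptions made for this example; the second period reuses the first period's entries, relying on the periodicity γ(C, t_C) = γ(C, t_C + T).

```python
def expand(sequence, period, num_vars):
    """Expand a one-period sequence to two periods.

    sequence maps each time t (0 <= t < T) to the set of variable numbers
    handled at t. In the second period every number is raised by |V|.
    """
    expanded = {}
    for t in range(period):
        elems = set(sequence.get(t, ()))
        expanded[t] = elems
        expanded[t + period] = {v + num_vars for v in elems}
    return expanded
```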
[0111]
In FIG. 16, output means that the RTL code is emitted to a file or a compiler; Pj denotes input port j, Pj.read() reads data from that input port, Qk denotes output port k, and Qk.write(d) writes data d to output port k.
[0112]
Since γ(C, t_C) is a set, a plurality of data may be taken in during one cycle. To realize this, process C has a plurality of input ports and output ports. It is also assumed that which data arrives at which port is given in advance.
[0113]
Assign[] is the result obtained by solving the variable sharing problem, and Assign[v] indicates to which memory the variable v is assigned. Assign[] is a constant when the RTL code is generated. Mem[a] = b means that the value b is written to the a-th memory. wait(n) means waiting for n clocks.
[0114]
In FIG. 16, the upper sub-process is an algorithm that outputs an RTL description for reading data: for each element j of γ(C, t1) (t1 = 0 to T−1), it outputs Mem[Assign[j]] = Pj.read().
[0115]
FIG. 17 is a diagram showing an output example of the RTL description produced by the sub-process that reads data. In this figure and in FIGS. 18 and 19 described later, the annotations output by the sub-process are omitted.
[0116]
The description in the figure indicates that, based on the allocation relationship between the variables and the memory obtained by the processing so far, the input data from the input port is stored in memory locations 0, 1, 2, 3, 4, 5, 6, 7, 0, 4, 2, 6, and 8. The control unit shown in FIG. 4 or 5 performs control based on this description by means of a write address or a load signal, and stores the data in a memory or a register.
[0117]
In FIG. 16, the lower sub-process is an algorithm that outputs an RTL description for writing data: for each element k of ω(C, t2) (t2 = 0 to T−1), it outputs Qk.write(Mem[Assign[k]]).
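The two sub-processes can be sketched as Python generators of SystemC-like text. This is a hedged illustration only: FIG. 16 itself is not reproduced here, so the function names, the one `wait(1);` per time step, the trailing semicolons, and the treatment of variables without a memory (passed from the input port directly, as for variables whose begin equals end) are all assumptions.

```python
def emit_read_subprocess(gamma, assign, period):
    """Emit statements that copy input-port data into memory."""
    lines = []
    for t in range(period):
        for j in sorted(gamma.get(t, ())):
            if j in assign:  # variables without a memory are not stored
                lines.append(f"Mem[{assign[j]}] = P{j}.read();")
        lines.append("wait(1);")
    return lines


def emit_write_subprocess(omega, assign, period):
    """Emit statements that write memory or input-port data to output ports."""
    lines = []
    for t in range(period):
        for k in sorted(omega.get(t, ())):
            # Assumed pass-through: no memory means read the port directly.
            source = f"Mem[{assign[k]}]" if k in assign else f"P{k}.read()"
            lines.append(f"Q{k}.write({source});")
        lines.append("wait(1);")
    return lines
```

Because Assign[] is constant at generation time, the memory number is written into the emitted text rather than looked up at run time.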
[0118]
FIG. 18 is a diagram illustrating an output example of the RTL description by the sub-process that writes the data to the output port.
The description in the figure indicates that, based on the allocation relationship between each variable and the memory obtained by the processing so far, the data stored in memory locations 0, 2, 4, 6, 0, 2, 0, the data of the input port, and the data stored in memory locations 1, 3, 5, 7, 4, 6, 8, 9 are output to the output port. The control unit shown in FIG. 4 or FIG. 5 performs control based on this description by means of a read address or a load selection signal, and outputs data from the memory or register to the output port.
[0119]
FIG. 19 is a diagram showing a simulated example of an RTL code in which two subprocesses, that is, a subprocess that reads data and a subprocess that writes data to an output port are combined.
[0120]
If two subprocesses are described together as one, it is as shown in the figure, and two sequential circuits are described together as one.
Next, a specific design example according to this embodiment is shown.
[0121]
As a first example, assume that processes A and B defined by the following ω(A, t_A) and γ(B, t_B) are given. This corresponds to passing an array variable of size 64 from A to B, and T = 64.
[0122]
[Expression 4]

[0123]
Here, z^i represents the value (0 or 1) of the i-th bit when z is expressed in binary, and [z^i, z^j, ...] denotes the binary value whose bits are arranged in this order. That is, z = [z^5, z^4, z^3, z^2, z^1, z^0], and y is the value obtained by rearranging bits 5 to 3 of z.
[0124]
Here, when the constants in the time relational expressions are α_Ca = 1, α_Cb = 1, β_Ca = 0, and β_Cb = 0, the read sequence R(C) and the write sequence W(C) of process C are obtained as follows.
[0125]

The minimum output delay is obtained as t_C^odelay = 7.
[0126]
As a precondition of this embodiment, once process C starts outputting data, it continues to output in a predetermined order and cannot take any measure such as temporarily stopping output because input data has not yet arrived (once the output starts to flow, it must keep flowing without stalling). For this reason, if output starts too early, process C is driven into a situation where it must output data that has not yet arrived. Output therefore begins after waiting until a certain amount of data has accumulated. This waiting time is the minimum delay time t_C^odelay.
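The idea of the minimum delay can be illustrated with a short Python sketch. It is not the calculation procedure of this specification: the function name `minimum_output_delay` is hypothetical, and the sketch assumes that exactly one datum arrives and one datum is output per unit time, and that a datum may be output in the same unit time in which it arrives.

```python
def minimum_output_delay(read_order, write_order):
    """Smallest delay that lets every output find its datum already arrived.

    read_order[t] is the datum arriving at time t; write_order[k] is the
    k-th datum to output, emitted at time k + delay.
    """
    arrival = {datum: t for t, datum in enumerate(read_order)}
    worst = max(
        (arrival[datum] - k for k, datum in enumerate(write_order)),
        default=0,
    )
    return max(0, worst)
```

For instance, if the data arrive as 0, 1, 2, 3 but must leave as 3, 0, 1, 2, the first output has to wait for the last arrival, so the function returns 3.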
[0127]
For example, consider the case in the above example where output is started after waiting for two unit times. The state at that time is shown in the figure.
The figure shows, at each time:
- which data arrives as input,
- which data is output, and
- which data is stored in the buffer memory.
[0128]
In the figure, arrows indicate data movement, and crosses indicate that data is extracted from the buffer memory. The contents of the buffer memory are expressed as a set, and the order does not matter (it does not mean that the data are stored in memory in this order). The contents of the buffer memory at time t_C can be expressed as a set difference as follows.
(Contents of buffer memory) = γ(C, 0, t_C) − ω(C, 0, t_C − 2)
In the figure, the third data must be output at time 5, but since it has not yet arrived, the output stalls (is delayed).
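The set difference and the stall condition can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes one datum per unit time, inclusive time bounds for γ(C, 0, t) and ω(C, 0, t − delay), and that the k-th output is emitted at time k + delay.

```python
def buffer_contents(read_order, write_order, delay, t):
    """Contents of the buffer after time t: arrived data minus emitted data."""
    arrived = set(read_order[: t + 1])  # gamma(C, 0, t)
    emitted = set(write_order[: max(0, t - delay + 1)])  # omega(C, 0, t - delay)
    return arrived - emitted


def first_stall_time(read_order, write_order, delay):
    """First time an output is due before its datum arrives, or None."""
    arrival = {datum: t for t, datum in enumerate(read_order)}
    for k, datum in enumerate(write_order):
        if arrival[datum] > k + delay:
            return k + delay
    return None
```

A delay is usable only when `first_stall_time` returns None; a larger delay removes stalls at the cost of more data waiting in the buffer.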
[0129]
Next, consider the case where output is started after waiting for 7 unit times. The situation is as shown in the figure.
FIG. 21 shows only up to time 28, but after that, data input and output can continue without any delay in the output.
[0130]
Starting output after waiting for longer than 7 unit times has only disadvantages, namely (1) an increase in the required amount of buffer memory and (2) an increase in the time from input to output (latency). Such a case is therefore excluded in this embodiment. However, excluding it does not mean that it cannot be handled. To handle such a case, it is sufficient to adopt a value larger than the minimum output delay as the value of t_C^odelay.
[0131]
The lifetime analysis result of the array variable v of process C is shown in FIG.
In the figure, "Begin" and "End" correspond to the one-dimensional arrays shown in FIG. 10: "Begin" is the time when a value is first assigned to the variable, "End" is the time when the last reference occurs, and the index is the subscript of the variable.
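A lifetime table of this kind can be computed with a short Python sketch. The function name `analyze_lifetimes` is hypothetical, and the sketch assumes one datum per unit time, that the k-th output occurs at time k + delay, and that every datum that arrives is eventually output.

```python
def analyze_lifetimes(read_order, write_order, delay):
    """Return {variable: (Begin, End)} for the buffer variables.

    Begin is the time of the first assignment (arrival) and End is the
    time of the last reference (output at index k + delay).
    """
    begin, end = {}, {}
    for t, datum in enumerate(read_order):
        begin.setdefault(datum, t)  # keep the first assignment only
    for k, datum in enumerate(write_order):
        end[datum] = k + delay  # later references overwrite earlier ones
    return {datum: (begin[datum], end[datum]) for datum in begin}
```

A variable whose Begin equals End is needed only at the moment it arrives, so it never has to occupy memory.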
[0132]
FIG. 23 graphically represents the lifetime of each variable based on FIG.
In the figure, the subscript (index) of the array variable of the process C is taken in the vertical axis direction, and the lifetime of the variable is represented by “#” in the horizontal direction. In the figure, the i-th row corresponds to the element v [i] of the array variable, and the symbol # is plotted in the range from Begin [i] to End [i] -1.
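A chart of this form can be produced with the following Python sketch, in which row i carries the symbol # in columns Begin[i] to End[i] − 1. The function name `lifetime_chart` and the list-of-pairs input are assumptions for this example.

```python
def lifetime_chart(lifetimes):
    """Return one text row per variable, with '#' from Begin to End - 1."""
    rows = []
    for begin, end in lifetimes:
        rows.append(" " * begin + "#" * (end - begin))
    return rows
```

Rows whose # marks never share a column belong to variables that can share one memory.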
[0133]
Normally, each element of an array variable is assigned to a dedicated memory. For example, the variable v [0] is Mem [0], the variable v [1] is Mem [1], and so on.
[0134]
However, if the lifetimes of the variables v[0] and v[1] do not overlap, the values of both variables can be held in a single memory without allocating dedicated memories. For example, suppose the lifetime of the variable v[0] is L(v[0]) = (3, 5) and the lifetime of the variable v[1] is L(v[1]) = (10, 15). Then the memory Mem[0] holds the value of v[0] from time 3 to 5 (strictly speaking, the value is last read at time 5, and a new value may be written from time 5) and holds the value of v[1] from time 10 to 15. L(v[1]) = (10, 15) means that after time 15 the value of v[1] is no longer used and may be overwritten.
[0135]
The variable sharing problem is to decide how to assign several variables to the same memory. It can be said that reducing the variable set is equivalent to roughly solving the variable sharing problem first.
[0136]
When the degenerated variable set Vr is obtained, | V | = 64, m = 16, and the variable set V is divided into four equal parts, Vr = {v0, v32, v1, v33, v2, v34, v3, v35, v4, v36, v5, v37, v6, v38, v7, v39} are obtained.
[0137]
FIG. 24 is a diagram illustrating an example of a degenerated variable set obtained from the variable set illustrated in FIG.
In the figure, the variable set V can be divided into four subsets, and each subset has the same lifetime with a time difference Tr. Here, Tr = 16.
[0138]
The degenerated variable set Vr is 1/4 the size of the variable set V, so the memory required by module C can be 1/4 of the size needed when memory is allocated to every element of the variable set V.
[0139]
Next, as a second example, the variable set Vr degenerated as shown in FIGS. 23 and 24 is processed by the algorithm modified from the Left Edge method, further reducing the memory required by module C.
[0140]
Even among the variables in the degenerated variable set Vr shown in FIG. 24, one memory can be shared by a plurality of variables as long as their lifetimes do not overlap.
When the process shown in FIG. 15 is performed on the degenerated variable set Vr shown in FIG. 24 and the variable sharing problem is solved, variables are allocated to the memory as follows.
[0141]
Mem[0]: variables v0, v4, v6
Mem[1]: variable v32
Mem[2]: variables v1, v5
Mem[3]: variable v33
Mem[4]: variables v2, v36
Mem[5]: variable v34
Mem[6]: variables v3, v37
Mem[7]: variable v35
Mem[8]: variable v38
Mem[9]: variable v39
Here, since the variable set V is degenerated, if a variable vi ∈ P0 is assigned to the memory Mem[k], the corresponding variables in P1, P2, and P3 are assigned to the same memory as that variable in P0.
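Carrying the assignment of P0 over to the other subsets can be sketched in Python as follows. The function name `propagate_assignment` is hypothetical, and the sketch assumes that "corresponding" means the same position within each block of m consecutive entries of V_ord, as in the comparison of positions j and j + m in FIG. 14.

```python
def propagate_assignment(order, m, assign_p0):
    """Extend a memory assignment for P0 to every variable in V_ord.

    order is V_ord, m is |P0|, and assign_p0 maps variables of P0 to
    memory numbers. Variables of P0 without a memory stay unassigned,
    and so do their counterparts in the other subsets.
    """
    assign = {}
    for position, variable in enumerate(order):
        representative = order[position % m]  # counterpart inside P0
        if representative in assign_p0:
            assign[variable] = assign_p0[representative]
    return assign
```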
[0142]
FIG. 25 is a diagram illustrating a case where variables are allocated to a memory after the variable sharing problem is solved for the degenerated variable set of FIG.
As shown in FIG. 23, there is originally an array variable of size 64, and module C would need a memory of size 128 at the maximum to guarantee the order of reading and writing. As the figure shows, however, performing the processing of this embodiment makes it possible to synthesize a configuration with a memory of size 10.
[0143]
Next, a third example, in which the variable set V cannot be degenerated, will be described.
Assume that processes A and B defined by the following ω(A, t_A) and γ(B, t_B) are given. This example corresponds to passing an array variable of size 64 from process A to process B, and T = 64.
[0144]
W(A) = {(0, {0}), (1, {1}), (2, {2}), (3, {3}), ..., (63, {63})} = {(i, {i}) | 0 ≦ i ≦ 63}
R(B) = {(0, {0}), (1, {8}), (2, {16}), (3, {24}), ..., (63, {63})} = {(i, {j}) | 0 ≦ i ≦ 63, j = (i mod 8) × 8 + [i / 8]} (here, [i / 8] denotes the integer obtained by dividing i by 8 and discarding the fractional part)
Here, when the read sequence R(C) and the write sequence W(C) of process C are obtained with the constants in the time relational expressions set to α_Ca = 1, α_Cb = 1, β_Ca = 0, and β_Cb = 0, the result is R(C) = W(A) and W(C) = R(B).
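The two sequences of this example can be generated with a short Python sketch. The function name `example_sequences` and its parameters are hypothetical; the index formula j = (i mod 8) × 8 + [i / 8] is the one given for R(B) above.

```python
def example_sequences(size=64, side=8):
    """Return (W(A), R(B)) as lists of (time, {variable}) pairs."""
    w_a = [(i, {i}) for i in range(size)]
    # B reads in transposed order: j = (i mod side) * side + floor(i / side)
    r_b = [(i, {(i % side) * side + i // side}) for i in range(size)]
    return w_a, r_b
```

Process A writes the array row by row while process B reads it column by column, which is why an element written early may be read late and vice versa.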
[0145]
The analysis result of the lifetime of the array variable v of process C is shown in FIG.
In the figure, "Begin" and "End" correspond to the one-dimensional arrays shown in FIG. 10: "Begin" is the time when a value is first assigned to the variable, "End" is the time when the last reference occurs, and the index is the subscript of the variable.
[0146]
FIG. 27 graphically represents the lifetime of each variable based on FIG.
In this example, no subsets P and Q of V satisfying the relation [Outside 8] are found in the variable set V.
[0147]
[Outside 8]

[0148]
In other words, the variable set V cannot be degenerated. Furthermore, even if the variables are rearranged in ascending order of Begin[v], the order is unchanged from that of the lifetime analysis result.
When such a variable set V is processed by the algorithm modified from the Left Edge method shown in FIG. 15 to solve the variable sharing problem, each variable is allocated to memory as shown in the figure described next.
[0149]
FIG. 28 is a diagram illustrating a variable memory allocation state in the third example.
As the figure shows, the variables v0, v49, v63, v112, and v121 are assigned to the 0th memory. In other words, Assign[0] = 0, Assign[49] = 0, Assign[63] = 0, Assign[112] = 0, and Assign[121] = 0.
[0150]
In Example 3, since the variable set could not be degenerated, a variable set of twice the size is handled. For this reason, a total of 128 variables of the original variables v0 to v63 and additional variables v64 to v127 are handled simultaneously.
[0151]
FIG. 28 shows the memory size required by process C. Originally there was an array variable of size 64, and to prevent read/write contention the memory would have been doubled, requiring a memory of size 128 at the maximum. By applying the present invention, however, a memory of size 56 suffices.
[0152]
In FIG. 26, the lifetime of the variable v56 is L (v56) = (56, 56), and the value of begin is equal to the value of end. Such variables are output from the input port to the output port without being stored in the memory. This has two purposes: (1) to reduce the latency, and (2) to reduce the amount of memory required as a buffer. The same thing can be seen in the variable v7 in Example 1.
[0153]
FIG. 29 shows the state of variable memory allocation in the third example. This figure shows the allocation status of the variables shown in FIG. 24 to the memory with the vertical axis representing the memory and the horizontal axis representing the time.
[0154]
FIG. 30 is a system environment diagram of a computer when the CAD system of FIG. 1 is realized as a general-purpose computer.
The computer shown in the figure includes a CPU 21, a main storage device 22 serving as a work area for each program, an auxiliary storage device 23 such as a hard disk in which each program and database are recorded, an input/output device (I/O) 24 such as a display and a keyboard, a network connection device 25 such as a modem, and a medium reading device 26 that reads stored contents from a portable storage medium such as a disk or a magnetic tape. These are connected to each other via a bus 28.
[0155]
When the functions of the CAD system and the function of automatically generating the inter-module interface described above are realized by software, they are realized by the CPU 21, based on the program and using the main storage device 22 as a work area, reading data from the library storage unit 21 realized in an area of the main storage device 22 or the auxiliary storage device 23.
[0156]
In the computer of FIG. 30, a program and data stored in a storage medium 27 such as a magnetic tape, a flexible disk, a CD-ROM, or an MO are read by the medium reading device 26 and loaded into the main storage device 22 or the auxiliary storage device 23. Each process of this embodiment can then be realized in software by the CPU 21 executing this program and data.
[0157]
In the computer of FIG. 30, application software may be exchanged using a storage medium 27 such as a flexible disk. Therefore, the present invention is not limited to an automatic synthesis apparatus and a synthesis method for inter-module interfaces; it can also be configured as a program that, when used by a computer, causes the computer to perform the functions of the above-described embodiments of the present invention, and as a computer-readable storage medium 27 storing that program.
[0158]
In this case, as shown in FIG. 31, the "storage medium" includes a portable storage medium 36 that is detachable from a medium driving device 37, such as a CD-ROM or a flexible disk (or an MO, DVD, removable hard disk, or the like), storage means (a database or the like) 32 in an external device (a server or the like) from which the program is transmitted via the network line 33, and a memory (RAM, hard disk, or the like) 35 in the main body 34 of the computer 31. The program stored in the portable storage medium 36 or the storage means (database or the like) 32 is loaded into the memory (RAM, hard disk, or the like) 35 in the main body 34 and executed.
[0159]
The inter-module interface that exchanges data between two circuit modules need not be a circuit. For example, when two modules executed in parallel on computers exchange data by accessing a shared memory, the read/write information for the data memory corresponding to FIGS. 17 and 18 is given to the two computers. By accessing the shared memory based on this information, these computers can reduce the area of the shared memory used for data transfer.
[0160]
(Supplementary Note 1) An automatic synthesis apparatus for an inter-module interface, which synthesizes an inter-module interface for transferring data between a plurality of circuit modules operating in parallel, the apparatus comprising:
variable calculating means for analyzing the writing and reading of data to and from the inter-module interface by the circuit modules and obtaining the variables of the inter-module interface based on the result of the analysis;
lifetime calculation means for obtaining, for each variable of the inter-module interface, a lifetime indicating the period from when data is first written to when it is last read;
degenerated set calculation means for searching for a degenerated variable set, which is a subset obtained by dividing the set having the variables as elements into an integer number of parts such that, for every variable that is an element of the subset, a variable whose lifetime is shifted by a specific time from the lifetime of that variable exists in another subset obtained by the division; and
circuit synthesis means for synthesizing the inter-module interface based on variable assignment using the degenerated variable set.
[0161]
(Supplementary Note 2) The automatic synthesis apparatus for an inter-module interface according to Supplementary Note 1, further comprising variable allocation means for obtaining an allocation to memory of the variables that are elements of the degenerated variable set such that a memory is shared only by variables whose lifetimes do not overlap, wherein the circuit synthesis means synthesizes the inter-module interface based on the allocation.
[0162]
(Supplementary Note 3) The automatic synthesis apparatus for an inter-module interface according to Supplementary Note 1 or 2, further comprising minimum delay time calculation means for obtaining a minimum delay time, which is the minimum time from when the inter-module interface reads data until it writes the data, wherein the circuit synthesis means synthesizes the inter-module interface in consideration of the minimum delay time.
[0163]
(Supplementary Note 4) The automatic synthesis apparatus for an inter-module interface according to any one of Supplementary Notes 1 to 3, wherein the circuit synthesis means determines the size of the memory of the inter-module interface based on a value obtained from the degenerated variable set.
[0164]
(Supplementary Note 5) A CAD system comprising:
compiler means for analyzing a description in a circuit description language and outputting a circuit specification; and
interface module generating means for synthesizing an inter-module interface for transferring data between a plurality of circuit modules operating in parallel when requested to do so by the compiler means,
wherein the interface module generating means includes:
variable calculating means for analyzing the writing and reading of data to and from the inter-module interface by the circuit modules and obtaining the variables of the inter-module interface based on the result of the analysis;
lifetime calculation means for obtaining, for each variable of the inter-module interface, a lifetime indicating the period from when data is first written to when it is last read;
degenerated set calculation means for searching for a degenerated variable set, which is a subset obtained by dividing the set having the variables as elements into an integer number of parts such that, for every variable that is an element of the subset, a variable whose lifetime is shifted by a specific time from the lifetime of that variable exists in another subset obtained by the division; and
circuit synthesis means for synthesizing the inter-module interface based on variable assignment using the degenerated variable set.
[0165]
(Supplementary Note 6) A synthesis method for synthesizing an inter-module interface for transferring data between a plurality of circuit modules operating in parallel, the method comprising:
analyzing the writing and reading of data to and from the inter-module interface by the circuit modules, and obtaining the variables of the inter-module interface based on the result of the analysis;
obtaining, for each variable of the inter-module interface, a lifetime indicating the period from when data is first written to when it is last read;
searching for a degenerated variable set, which is a subset obtained by dividing the set having the variables as elements into an integer number of parts such that, for every variable that is an element of the subset, a variable whose lifetime is shifted by a specific time from the lifetime of that variable exists in another subset obtained by the division; and
synthesizing the inter-module interface based on variable assignment using the degenerated variable set.
[0166]
(Supplementary Note 7) A program executed by a computer that synthesizes an inter-module interface for transferring data between a plurality of circuit modules operating in parallel.
Analyzing the writing and reading of data to the inter-module interface by the circuit module, and determining the variables of the inter-module interface based on the results of the analysis,
For a variable of the inter-module interface, obtain a lifetime indicating a period from when data is first written to when it is finally read,
Search for a degenerated variable set, that is, a subset obtained by dividing the set having the variables as elements into an integer number of subsets, such that, for every variable that is an element of the subset, a variable whose lifetime is shifted by a specific time from the lifetime of that element exists in another subset obtained by the division,
Using the degenerated variable set, the inter-module interface is synthesized based on the variable assignment.
A program for causing the computer to execute the above.
[0167]
(Supplementary Note 8) The program according to Supplementary Note 7, which causes the computer to obtain an allocation to a memory of the variables that are elements of the degenerated variable set such that the memory is shared and the lifetimes of those variables do not overlap, and to synthesize the inter-module interface based on the allocation.
[0168]
(Supplementary Note 9) The program according to Supplementary Note 8, wherein the allocation to the memory of the variables that are elements of the degenerated variable set is obtained based on the Left Edge method, such that the memory is shared and the lifetimes of those variables do not overlap.
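The memory sharing described here can be sketched as interval packing. The Python example below is a first-fit variant of the left-edge idea, not necessarily the exact procedure of the embodiment: variables are visited in ascending order of begin time, and each one reuses the lowest address whose previous occupant has already ended. The function name `left_edge_allocate` and the sample lifetimes are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def left_edge_allocate(lifetimes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Assign each variable an address so that variables sharing an address
    have non-overlapping lifetimes (end of one <= begin of the next)."""
    # Visit variables in ascending order of their left edge (begin time).
    ordered = sorted(lifetimes.items(), key=lambda item: item[1])
    last_end: List[int] = []  # last_end[a]: end time of the latest variable at address a
    address: Dict[str, int] = {}
    for var, (begin, end) in ordered:
        for addr, busy_until in enumerate(last_end):
            if busy_until <= begin:  # address is free again, so reuse it
                last_end[addr] = end
                address[var] = addr
                break
        else:
            # No reusable address exists: open a new one.
            last_end.append(end)
            address[var] = len(last_end) - 1
    return address


lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4), "d": (3, 5)}
allocation = left_edge_allocate(lifetimes)
memory_size = max(allocation.values()) + 1  # number of addresses used in this sketch
print(allocation, memory_size)
```

Under this sketch, the number of distinct addresses used is one way to size the buffer; the sketch treats a lifetime that ends at time t and one that begins at time t as non-overlapping, which is an assumption about the timing model.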
[0169]
(Supplementary Note 10) The program according to Supplementary Note 8 or 9, which causes the computer to determine the size of the memory of the inter-module interface based on the allocation to the memory.
[0170]
(Supplementary Note 11) The program according to any one of Supplementary Notes 7 to 10, which causes the computer to obtain a minimum delay time, which is the minimum time from when the inter-module interface reads data until it writes the data, and to synthesize the inter-module interface in consideration of this minimum delay time.
[0171]
(Supplementary Note 12) The program according to Supplementary Note 10, which causes the computer to determine the size of the memory of the inter-module interface in consideration of the minimum delay time.
[0172]
(Supplementary Note 13) The inter-module interface automatic synthesis apparatus according to any one of Supplementary Notes 7 to 12, which causes the computer to determine the size of the memory of the inter-module interface based on a value obtained from the degenerated variable set.
[0173]
(Supplementary Note 14) The program according to any one of Supplementary Notes 7 to 13, wherein times based on respective synchronization signals are set for the circuit module that writes data to the inter-module interface, the circuit module that reads data from the inter-module interface, and the inter-module interface, and wherein, when analyzing the writing and reading of data to and from the inter-module interface, the program causes the computer to convert the time of the circuit module that performs the writing and the time of the circuit module that performs the reading into the time of the inter-module interface.
[0174]
(Supplementary Note 15) When used by a computer that synthesizes an inter-module interface for transferring data between a plurality of circuit modules operating in parallel,
Analyzing the writing and reading of data to the inter-module interface by the circuit module, and determining the variables of the inter-module interface based on the results of the analysis,
For a variable of the inter-module interface, obtain a lifetime indicating a period from when data is first written to when it is finally read,
Search for a degenerated variable set, that is, a subset obtained by dividing the set having the variables as elements into an integer number of subsets, such that, for every variable that is an element of the subset, a variable whose lifetime is shifted by a specific time from the lifetime of that element exists in another subset obtained by the division,
Using the degenerated variable set, the inter-module interface is synthesized based on the variable assignment.
A portable storage medium readable by the computer storing a program for causing the computer to execute the above.
[0175]
[Effects of the Invention]
According to the present invention, it is possible to automatically synthesize an interface between modules that uses less memory than the conventional method.
[0176]
In addition, since this inter-module interface can be automatically synthesized, mistakes due to manual design can be prevented.
[Brief description of the drawings]
FIG. 1 is a diagram illustrating a configuration of a CAD system according to an embodiment.
FIG. 2 is a diagram illustrating processing performed by an interface module generation unit.
FIG. 3 is a diagram illustrating an example of a process handled by an interface module generation unit.
FIG. 4 is a diagram (part 1) illustrating an implementation example of an interface circuit module.
FIG. 5 is a diagram (part 2) illustrating an implementation example of an interface circuit module.
FIG. 6 is a diagram illustrating an example of a data structure of a read sequence / write sequence.
FIG. 7A is a flowchart showing a process for converting a write sequence of process A into a read sequence of process C, and FIG. 7B is a diagram showing an example of the conversion.
FIG. 8A is a flowchart showing a process for converting a read sequence of process B into a write sequence of process C, and FIG. 8B is a diagram showing an example of the conversion.
FIG. 9 is a diagram illustrating the minimum output delay t_c^odelay.
FIG. 10 is a diagram illustrating the data structure of variable lifetimes begin (v [i]) and end (v [i]).
FIG. 11 is a flowchart showing a process for obtaining a lifetime.
FIG. 12 is a diagram showing a degenerated variable set.
FIG. 13 is a flowchart showing processing for obtaining a degenerated variable set Vr.
FIG. 14 is a flowchart showing details of determination processing in step S46.
FIG. 15 is a flowchart showing processing for solving a variable sharing problem.
FIG. 16 is a diagram illustrating an algorithm for performing RTL generation.
FIG. 17 is a diagram illustrating an example of RTL code of a sub-process that reads an input port.
FIG. 18 is a diagram illustrating an example of an RTL code of a sub-process that writes to an output port.
FIG. 19 is a diagram showing a simulated example of RTL code in which two sub-processes are collected.
FIG. 20 is an explanatory diagram (part 1) of the output delay.
FIG. 21 is an explanatory diagram (part 2) of the output delay.
FIG. 22 is a diagram showing the analysis result of the lifetime of the array variable in Example 1.
FIG. 23 is a diagram illustrating lifetimes of array variables in Example 1.
FIG. 24 is a diagram illustrating an example of a degenerated variable set obtained from the variable set in Example 1.
FIG. 25 is a diagram illustrating a situation indicating allocation of variables to memories in Example 2;
FIG. 26 is a diagram showing the analysis result of the lifetime of the array variable in Example 3.
FIG. 27 is a diagram illustrating lifetimes of array variables in Example 3.
FIG. 28 is a diagram illustrating variable memory allocation results in Example 3.
FIG. 29 is a diagram showing a variable memory allocation state in Example 3.
FIG. 30 is a system environment diagram of a computer according to the present embodiment.
FIG. 31 is a diagram illustrating an example of a medium.
FIG. 32 is a diagram illustrating an example of operation description in a circuit description language used in circuit design by high-level synthesis.
[Explanation of symbols]
1 CAD system
11 Input editor
12 Compiler section
13 Library storage
14 Interface module generator
15 Output section
21 CPU
22 Main memory
23 Auxiliary storage
24 I/O devices
25 Network connection device
26 Medium reader
27 Storage media
28 Bus
31 Information processing device
32 Storage means
33 Network line
34 Body
35 Memory
36 Portable storage media

Claims

A synthesis apparatus for synthesizing an inter-module interface that transfers data between a plurality of circuit modules operating in parallel, the inter-module interface having a buffer memory for storing variables input to and output from the inter-module interface, comprising:
Variable calculation means for analyzing writing and reading of data to and from the inter-module interface by the circuit module, and obtaining each variable vi of the inter-module interface based on a result of the analysis;
Lifetime calculation means for obtaining a lifetime indicating a period from when data is first written to when it is finally read, for a variable vi possessed by the inter-module interface;
Degenerate set calculation means for, where Head(n, L) denotes a function that returns the set of the first n elements of data L having an ordered list structure, and V_ord denotes the ordered list in which the variables vi ∈ V of the variable set V obtained by the variable calculation means are arranged in ascending order of begin(vi), obtaining, among all common divisors (≠ 1) of the number of elements |V| of the variable set V, the minimum common divisor m that satisfies

∀p ∈ P_j, ∃q ∈ P_{j+1} such that begin(p) + m = begin(q), end(p) + m = end(q), and end(p) ≤ begin(q),

and for using Head(m, V_ord) as a degenerated variable set of the variable set V;
Variable allocation means for obtaining an allocation to the buffer memory of the variables that are elements of the degenerated variable set such that the buffer memory is shared and the lifetimes of those variables do not overlap;
Circuit synthesis means for synthesizing the inter-module interface by allocating, based on the allocation, the same address of the buffer memory to a plurality of variables whose lifetimes do not overlap among the variables that are elements of the degenerated variable set;
A device for automatically synthesizing an interface between modules.
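The search for the degenerated variable set can be sketched in Python as follows. This is a minimal sketch under one stated assumption: P_j is taken to be the j-th block of m consecutive elements of V_ord, which the claim text does not spell out. The names `head`, `degenerate_set`, and `_shifted` are hypothetical helpers introduced for this example.

```python
from typing import Dict, List, Optional, Tuple

Lifetime = Tuple[int, int]  # (begin, end)


def head(n: int, ordered: List[str]) -> List[str]:
    """Head(n, L): the first n elements of the ordered list L."""
    return ordered[:n]


def _shifted(p_j: List[str], p_next: List[str], m: int,
             lifetimes: Dict[str, Lifetime]) -> bool:
    """For every p in P_j, some q in P_{j+1} has p's lifetime shifted by m,
    and p ends no later than q begins."""
    for p in p_j:
        bp, ep = lifetimes[p]
        if not any(lifetimes[q] == (bp + m, ep + m) and ep <= bp + m
                   for q in p_next):
            return False
    return True


def degenerate_set(lifetimes: Dict[str, Lifetime]) -> Optional[List[str]]:
    """Return Head(m, V_ord) for the smallest divisor m (!= 1) of |V| that
    satisfies the shift condition, or None if |V| < 2."""
    # V_ord: variables in ascending order of begin time (ties broken by end).
    v_ord = sorted(lifetimes, key=lambda v: lifetimes[v])
    size = len(v_ord)
    for m in range(2, size + 1):
        if size % m != 0:
            continue
        # Assumption: P_j is the j-th block of m consecutive elements of V_ord.
        blocks = [v_ord[i:i + m] for i in range(0, size, m)]
        if all(_shifted(blocks[j], blocks[j + 1], m, lifetimes)
               for j in range(len(blocks) - 1)):
            return head(m, v_ord)
    return None


lifetimes = {f"v{i}": (i, i + 2) for i in range(6)}
print(degenerate_set(lifetimes))
```

With m equal to |V| there is only one block, so the condition holds trivially and the whole set is returned; the sketch therefore falls back to the full variable set when no smaller divisor qualifies.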

A compiler means for analyzing a description in a circuit description language and outputting a circuit specification;
Interface module generation means for synthesizing, when requested by the compiler means, an inter-module interface that transfers data between a plurality of circuit modules operating in parallel and has a buffer memory for storing variables input to and output from the inter-module interface,
The interface module generation means includes
Variable calculation means for analyzing writing and reading of data to and from the inter-module interface by the circuit module, and obtaining each variable vi of the inter-module interface based on a result of the analysis;
Lifetime calculation means for obtaining a lifetime indicating a period from when data is first written to when it is finally read, for a variable vi possessed by the inter-module interface;
Degenerate set calculation means for, where Head(n, L) denotes a function that returns the set of the first n elements of data L having an ordered list structure, and V_ord denotes the ordered list in which the variables vi ∈ V of the variable set V obtained by the variable calculation means are arranged in ascending order of begin(vi), obtaining, among all common divisors (≠ 1) of the number of elements |V| of the variable set V, the minimum common divisor m that satisfies

∀p ∈ P_j, ∃q ∈ P_{j+1} such that begin(p) + m = begin(q), end(p) + m = end(q), and end(p) ≤ begin(q),

and for using Head(m, V_ord) as a degenerated variable set of the variable set V;
Variable allocation means for obtaining an allocation to the buffer memory of the variables that are elements of the degenerated variable set such that the buffer memory is shared and the lifetimes of those variables do not overlap;
Circuit synthesis means for synthesizing the inter-module interface by allocating, based on the allocation, the same address of the buffer memory to a plurality of variables whose lifetimes do not overlap among the variables that are elements of the degenerated variable set;
A CAD system characterized by comprising:

A synthesis method executed by a computer for synthesizing the inter-module interface having a buffer memory for storing a variable input / output to / from the inter-module interface that transfers data between a plurality of circuit modules operating in parallel,
Analyzing the writing and reading of data to the inter-module interface by the circuit module, and determining each variable vi of the inter-module interface based on the result of the analysis;
For a variable vi possessed by the inter-module interface, a lifetime indicating a period from when data is first written to when it is finally read is obtained.
Where begin(vi) denotes the time at which each variable vi is first written, end(vi) denotes the time at which it is last read, Head(n, L) denotes a function that returns the first n elements of data L having an ordered list structure, and V_ord denotes the ordered list in which the variables vi ∈ V of the variable set V possessed by the inter-module interface are arranged in ascending order of begin(vi), find, among all common divisors (≠ 1) of the number of elements |V| of the variable set V, the smallest common divisor m for which

∀p ∈ P_j, ∃q ∈ P_{j+1} such that begin(p) + m = begin(q), end(p) + m = end(q), and end(p) ≤ begin(q)

holds, and let Head(m, V_ord) be a degenerated variable set of the variable set V,
The buffer memory is shared so that the lifetimes of variables that are elements of the degenerated variable set do not overlap, and the allocation of variables that are elements of the degenerated variable set to the buffer memory is obtained,
Based on the allocation, the same address of the buffer memory is allocated to a plurality of variables whose lifetimes do not overlap among the variables that are elements of the degenerated variable set, and the inter-module interface is synthesized.
A synthesis method characterized by the above.
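The requirement that only variables with non-overlapping lifetimes share an address can be checked mechanically. The following Python sketch is an illustrative validator, not part of the claimed method; the name `allocation_is_valid` and the sample data are assumptions.

```python
from itertools import combinations
from typing import Dict, Tuple


def allocation_is_valid(lifetimes: Dict[str, Tuple[int, int]],
                        address: Dict[str, int]) -> bool:
    """True if no two variables mapped to the same address have overlapping lifetimes."""
    for u, v in combinations(lifetimes, 2):
        if address[u] != address[v]:
            continue
        (bu, eu), (bv, ev) = lifetimes[u], lifetimes[v]
        # Lifetimes are disjoint when one ends no later than the other begins.
        if not (eu <= bv or ev <= bu):
            return False
    return True


lifetimes = {"a": (0, 2), "b": (1, 3), "c": (2, 4)}
print(allocation_is_valid(lifetimes, {"a": 0, "b": 1, "c": 0}))
print(allocation_is_valid(lifetimes, {"a": 0, "b": 0, "c": 1}))
```

The comparison uses the same boundary convention as the condition end(p) ≤ begin(q): a variable may reuse an address at the very time step at which the previous occupant is last read.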

A program executed by a computer for synthesizing the inter-module interface having a buffer memory for storing variables input to and output from the inter-module interface that transfers data between a plurality of circuit modules operating in parallel,
Analyzing writing and reading of data to and from the inter-module interface by the circuit modules, and obtaining each variable vi possessed by the inter-module interface based on the result of the analysis,
For a variable vi possessed by the inter-module interface, a lifetime indicating a period from when data is first written to when it is finally read is obtained.
Where begin(vi) denotes the time at which each variable vi is first written, end(vi) denotes the time at which it is last read, Head(n, L) denotes a function that returns the first n elements of data L having an ordered list structure, and V_ord denotes the ordered list in which the variables vi ∈ V of the variable set V possessed by the inter-module interface are arranged in ascending order of begin(vi), find, among all common divisors (≠ 1) of the number of elements |V| of the variable set V, the smallest common divisor m for which

∀p ∈ P_j, ∃q ∈ P_{j+1} such that begin(p) + m = begin(q), end(p) + m = end(q), and end(p) ≤ begin(q)

holds, and let Head(m, V_ord) be a degenerated variable set of the variable set V,
The buffer memory is shared so that the lifetimes of variables that are elements of the degenerated variable set do not overlap, and the assignment of variables that are elements of the degenerated variable set to the buffer memory is obtained,
Based on the allocation, the same address of the buffer memory is allocated to a plurality of variables whose lifetimes do not overlap among the variables that are elements of the degenerated variable set, and the inter-module interface is synthesized.
A program for causing the computer to execute the above.

The program according to claim 4, which causes the computer to obtain a minimum delay time, which is the minimum time from when the inter-module interface reads data until it writes the data, and to synthesize the inter-module interface in consideration of the minimum delay time.

The program according to claim 4 or 5, wherein times based on respective synchronization signals are set for the circuit module that writes data to the inter-module interface, the circuit module that reads data from the inter-module interface, and the inter-module interface, and wherein, when analyzing the writing and reading of data to and from the inter-module interface, the program causes the computer to convert the time of the circuit module that performs the writing and the time of the circuit module that performs the reading into the time of the inter-module interface.