JP2020135743A

JP2020135743A - Circuit design support device, circuit design support method, and information processing device

Info

Publication number: JP2020135743A
Application number: JP2019031899A
Authority: JP
Inventors: 泰地久恒; Taiji Hisatsune; 泰輔植田; Yasusuke Ueda; 茂規早瀬; Shigenori Hayase
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-02-25
Filing date: 2019-02-25
Publication date: 2020-08-31
Anticipated expiration: 2039-02-25
Also published as: JP7100597B2

Abstract

To take into account the performance obtained when a logic circuit is called repeatedly from software, when part of a software description is converted into a logic circuit. SOLUTION: A circuit design support device includes: a measurement unit that, when part of a software description is converted into a logic circuit, detects processing that can be implemented as a parallel operation configuration in the logic circuit and is executed a plurality of times, lists the inputs used when the logic circuit performs that processing, and outputs, as a logic circuit candidate list, the performance obtained when the amount of input to the logic circuit is reduced; and a description insertion unit that, for a logic circuit candidate selected from the logic circuit candidate list, inserts into the software description a description that reduces the amount of input to the logic circuit. SELECTED DRAWING: Figure 3

Description

The present invention relates to a circuit design support technique for logic circuit design. In a specific example, it relates to a circuit design support device and method that support logic circuit design using high-level synthesis, which automatically generates a logic circuit description from a software description.

One method of accelerating software processing running on a CPU (Central Processing Unit) is to implement the processing that dominates the software's load on a logic circuit device and offload it there. In logic circuit devices such as an FPGA (field-programmable gate array), the internal logic circuit can be designed freely by means of a logic circuit description. Unlike a CPU, which processes sequentially, a logic circuit can execute multiple processes at the same time; when software processing is accelerated, constructing the operations in parallel on the logic circuit can therefore deliver higher performance than the CPU.

Conventionally, logic circuits had to be designed at the register transfer level, but in recent years high-level synthesis has made it possible to design them in a more abstract high-level language such as C. With high-level synthesis, the load-heavy code of software running on a CPU can be converted directly into a logic circuit and accelerated on a logic circuit device.

However, software descriptions are generally written to be processed sequentially on a CPU. If such a description is implemented as a logic circuit by high-level synthesis without modification, parallelization is not achieved well, and processing that is redundant for a logic circuit, such as accesses to external memory, arises; as a result, it is difficult to obtain the expected speedup.

Japanese Patent Application Laid-Open No. 2015-95130 (Patent Document 1) states that the invention is "characterized by having an evaluation unit that, in a parallelization candidate code block, sets a plurality of evaluation target ranges (ranges of code blocks to be evaluated) by varying their breadth around the critical code block, evaluates, for each evaluation target range, the characteristics of the data path corresponding to the code blocks in that range, and selects, from the plurality of evaluation target ranges, the evaluation target range to be subjected to circuit parallelization." As this indicates, it is necessary not merely to apply high-level synthesis to portions where parallel computation is expected to yield a large speedup, but also to identify the portions best suited to a logic circuit.

Patent Document 1: JP-A-2015-95130

In other words, Patent Document 1 is a technique that designates, as the target for logic circuit conversion, a portion of the software processing where a parallel circuit can achieve a speedup, and evaluates the performance of multiple logic circuit conversion cases that include a plurality of surrounding processes.

However, Patent Document 1 does not include in its evaluation the performance when the logic circuit is called repeatedly from the software. When part of a software description is converted into a logic circuit, it is therefore desirable to take into account the performance when the logic circuit is called repeatedly from the software.

A preferred aspect of the present invention is a circuit design support device including: a measurement unit having a function of, when part of a software description is converted into a logic circuit, detecting processing that is executed a plurality of times and allows a parallel operation configuration in the logic circuit, listing the inputs used when the logic circuit performs that processing, and outputting, as a logic circuit candidate list, the performance obtained when the amount of input to the logic circuit is reduced; and a description insertion unit having a function of inserting, into the software description, a description for reducing the amount of input to the logic circuit for a logic circuit candidate selected from the logic circuit candidate list.

Another preferred aspect of the present invention is a circuit design support method for converting part of a software description into a logic circuit, the method performing: an iterative process extraction process that extracts, from the software description, an iterative process to be converted into a logic circuit; an equivalence transfer array extraction process that extracts, in the extracted iterative process, an input array that repeatedly uses the same value; and a description insertion process that changes the software description so that the input array repeatedly using the same value is converted into an array having the same number of elements as the other input arrays.

Another preferred aspect of the present invention is an information processing device that includes a CPU, a logic circuit, and a memory, and that calls the logic circuit to execute part of the processing when the CPU executes software stored in the memory. In this device, when the CPU calls the logic circuit to execute part of the processing, and the input of the processing executed by the logic circuit includes a plurality of arrays, one of which repeatedly inputs the same value, the CPU converts the array that repeatedly uses the same value into an array having the same number of elements as the other arrays and transfers it to the logic circuit. The logic circuit stores the array that repeatedly uses the same value in a built-in memory and performs the processing using the array stored in that built-in memory.

This yields a system that, when part of a software description is converted into a logic circuit, takes into account the performance when the logic circuit is called repeatedly from the software.

FIG. 1 is a conceptual diagram showing an example of converting software processing into a logic circuit.
FIG. 2 is a conceptual diagram showing another example of converting software processing into a logic circuit.
FIG. 3 is a configuration diagram showing a design support device for operation descriptions to which the embodiment is applied.
FIG. 4 is a block diagram showing the configuration for high-level synthesis according to the example.
FIG. 5 is a flowchart showing the operation of the measurement unit in the embodiment.
FIG. 6 is a conceptual diagram showing an example of converting software processing into a logic circuit according to the example.
FIG. 7 is a flowchart showing the operation of the description insertion unit in the embodiment.
FIG. 8 is a table of the logic circuit candidate list in the embodiment.
FIG. 9 is a configuration diagram showing a hardware architecture to which the embodiment is applied.
FIG. 10 is a diagram of a code description example for converting software processing into a logic circuit.
FIG. 11 is a diagram of a code description example for converting software processing into a logic circuit according to the example.

The embodiments will be described in detail with reference to the drawings. However, the present invention should not be construed as being limited to the description of the embodiments below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the spirit or gist of the present invention.

In the configurations of the invention described below, the same reference numerals are used in common across different drawings for identical parts or parts having similar functions, and duplicate descriptions may be omitted.

When there are multiple elements having the same or similar functions, they may be described by attaching different subscripts to the same reference numeral. When the elements need not be distinguished, the subscripts may be omitted.

Notations such as "first", "second", and "third" in this specification are used to identify components and do not necessarily limit their number, order, or content. Numbers identifying components are used per context, and a number used in one context does not necessarily denote the same component in another context. A component identified by one number may also serve the function of a component identified by another number.

To facilitate understanding of the invention, the position, size, shape, range, and so on of each component shown in the drawings may not represent the actual position, size, shape, or range. The present invention is therefore not necessarily limited to the position, size, shape, range, and so on disclosed in the drawings.

This embodiment describes a design support device for operation descriptions. From the software processing, the device extracts iterative processes for which a parallel circuit can be constructed; when an extracted process is called multiple times from the software, the device takes those repeated calls into account to reduce the amount of input to the logic circuit and shorten the processing time.

In general, when part of the software processing is converted into a logic circuit by high-level synthesis, the synthesis optimizes the logic circuit for a single invocation of the process. The specific conversion performed in high-level synthesis is described below.

For example, consider high-level synthesis that accelerates part of a software description by converting it into a logic circuit, where the logic circuit is called multiple times, multiple arrays are transferred, and one of those arrays never changes value. If every input array is made an argument, the array holding the same values must be input again on every call. Therefore, the element count of the array whose values do not change is aligned with that of the array having the larger element count; this gives the arrays a common input interface, so that the array holding the same values needs to be input only once. More generally, taking a single array with the largest element count as the input to the logic circuit and aligning all input arrays to that element count makes the input interface common. Storing the array that receives the same values in memory inside the logic circuit allows it to be reused within the logic circuit without repeated transfers, which reduces transfer time and increases speed.
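The saving described above can be estimated with a simple element-count model. The Python sketch below is a hypothetical illustration, not the patent's own method: it assumes that under the common interface every transfer is one frame whose width is the largest array's element count, and that each unchanging array is padded to that width and sent once. `transfer_cost` is an illustrative name.

```python
def transfer_cost(n_calls, varying_lens, constant_lens):
    """Return (naive, store_once) element counts sent to the logic circuit.

    n_calls:       number of times the offloaded process is called
    varying_lens:  element counts of arrays whose values change on every call
    constant_lens: element counts of arrays whose values never change
    """
    # Common interface width: the largest element count among all input arrays.
    width = max(list(varying_lens) + list(constant_lens))
    # Naive: every call transfers every array at its own length.
    naive = n_calls * (sum(varying_lens) + sum(constant_lens))
    # Store-once: each varying array is one width-sized frame per call, and each
    # constant array is padded to one width-sized frame and sent a single time.
    store_once = n_calls * width * len(varying_lens) + width * len(constant_lens)
    return naive, store_once
```

For instance, `transfer_cost(100, [100], [10])` models one 100-element array that changes on every call and one 10-element array that does not, over 100 calls, and returns `(11000, 10100)`. The padding overhead is why the model charges a full frame for the constant array rather than its 10 real elements.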

FIG. 1 shows an example of a configuration in which part of the software processing has been converted into a logic circuit by high-level synthesis. When a process is converted into a logic circuit by high-level synthesis, the process becomes a function in the software, and the function's arguments are treated as the inputs and outputs of the logic circuit. In FIG. 1, of processes A, B, C, and D in the software description (100), process D (101D), which is called repeatedly 50 times, is converted into a logic circuit, and the input array A[100] for logic circuit D (200D) is transferred in sequence as array A1[100] (301), array A2[100] (302), and array A3[100] (303).

Arrays A1, A2, and A3 each denote an instance of array A[100], which has 100 elements, holding different values. The sequence continues for all 50 repetitions of process D (101D), each with different values, so that A1[100] through A50[100] are input to logic circuit D (200D). Logic circuit D (200D) expands the 100 elements of each received array A[100] in parallel and performs the parallel operation (201) on each of the 50 arrays, performs other operations (202) as needed, outputs the result, and returns control to the software processing. With this configuration, logic circuit D (200D) can operate on the 100 elements of array A[100] in parallel.
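The data flow of FIG. 1 can be mimicked in software to make the per-call transfer visible. In this minimal Python sketch, `LogicCircuitD` and `host_loop` are hypothetical names, the doubling stands in for the unspecified parallel operation (201), and the sum stands in for the other operations (202); the list comprehension merely models what the hardware would evaluate in parallel.

```python
class LogicCircuitD:
    """Hypothetical software model of logic circuit D (200D)."""

    def __init__(self):
        self.elements_received = 0  # running count of transferred elements

    def run(self, a):
        self.elements_received += len(a)
        # Parallel operation (201): hardware evaluates all lanes at once; the
        # comprehension only models that element-wise step (placeholder: x * 2).
        lanes = [x * 2 for x in a]
        # Other operations (202): placeholder reduction of the lane results.
        return sum(lanes)


def host_loop(circuit, arrays):
    """Call the circuit once per input array, as the software does for process D."""
    return [circuit.run(a) for a in arrays]
```

Feeding 50 arrays of 100 elements through `host_loop` leaves `elements_received` at 5,000: every call pays for a full transfer, which is harmless here because every A array carries new values.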

FIG. 2 shows an example in which multiple arrays are inputs. Of processes A, B, C, and D in the software description (100), converting process B (101B), which is called 100 times, into a logic circuit yields logic circuit B (200B).

Because process B (101B) takes multiple arrays (two in this example), A[100] and B[10], as inputs, input sets (501), (502), (503), and so on, each consisting of array A[100] and array B[10], are input to logic circuit B (200B) as well.

Logic circuit B (200B) implements the parallel operation (201), which is executed in turn on arrays A1[100], A2[100], A3[100], ..., A100[100] contained in input sets (501), (502), (503), and so on. In parallel with the parallel operation (201), logic circuit B (200B) also performs an operation (203) that takes array B[10] as input.

Suppose that array A[100] holds different values on every transfer, whereas array B[10] transfers the same values every time. Then, although transferring array B and performing operation (203) once would suffice, both are repeated on every call.

When a logic circuit is created by high-level synthesis, the number of arrays and the element count of each array input in a single call of the logic circuit are fixed. In this example, the input to logic circuit B is always exactly two arrays: one of 100 elements and one of 10 elements. In the example of FIG. 2, therefore, array A[100] and array B[10] must be input together as a set, and array B[10] cannot be transferred on its own. As a result, array B contained in input sets (502) and (503) must be input to logic circuit B (200B) even though its values are identical to those in input set (501). Repeatedly transferring the same values increases the amount of data transferred to the logic circuit and increases the processing time.
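As a worked check of the cost just described, assume every one of the 100 calls of process B transfers a full input set of A[100] and B[10]. The total number of elements transferred, and the portion that merely repeats array B after the first call, are:

```latex
\begin{aligned}
N_{\text{total}}  &= 100 \times (100 + 10) = 11\,000 \ \text{elements},\\
N_{\text{repeat}} &= (100 - 1) \times 10 = 990 \ \text{elements}.
\end{aligned}
```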

Examples that address this problem are described below. In the following examples, iterative processes for which a parallel circuit can be constructed are extracted from the software description to be accelerated. When an extracted operation is converted into a logic circuit and the process is called multiple times from the software, the device detects whether there is an input array that repeatedly transfers the same values (hereinafter also referred to as an equivalence transfer array). If an equivalence transfer array exists, a description is inserted so that the array is transferred in advance to the memory inside the logic circuit before the target logic circuit processes the parallel operation, which eliminates the processing time previously spent on repeated transfers.

In addition, for an array that essentially needs to be input only once, repeated transfer to the logic circuit is avoided by unifying its element count with that of another array and merging the two. In the example of FIG. 2, arrays A and B are integrated into a single type of input set that covers both array A and array B, so that either can be input to the logic circuit. Specifically, array B is reshaped to contain 100 elements so that, after high-level synthesis, it can be input in the same way as array A; the processing can then be executed by transferring arrays of a single type with 100 elements.

FIG. 3 shows an example in which the present invention is applied to a software description and illustrates the processing flow of the design support device for operation descriptions. The measurement unit (2), the description insertion unit (4), and the high-level synthesis (7) are functional blocks. Functions such as computation and control executed in the functional blocks are assumed to be realized by an information processing device, such as a server, having an input device, an output device, a processing device, and a storage device. Each functional block realizes its defined processing when the processing device executes a program stored in the storage device, in cooperation with other hardware. A program executed by a computer or the like, its function, or the means that realizes that function may be referred to as a "function", "means", "part", "unit", "module", or the like.

The software description (1), the logic circuit candidate list (3), the optimized operation description (5), and the software description outside the operation description (6) are data or software that the functional blocks process or produce. The optimized operation description (5) is converted into a logic circuit by the high-level synthesis (7) and implemented as a logic circuit (8), for example on an FPGA. The software description outside the operation description (6) is executed on the CPU by CPU processing (9).

The measurement unit (2) extracts, from the software description (1), processes that can be executed as parallel operations, and detects how many times each parallel operation is called and which equivalence transfer arrays each parallel operation contains. For each equivalence transfer array, it calculates the time saved if a description that transfers the array to the logic circuit in advance were inserted, and outputs the results as the logic circuit candidate list (3). The logic circuit candidate list (3) is output, for example, on a display device (such as a monitor) of the server or on a printer, so that the user (designer) can review it.

From the output logic circuit candidate list (3), the designer selects which processing portions to convert into logic circuits. For each equivalence transfer array contained in a selected portion, the description insertion unit (4) inserts a description that transfers the array to the logic circuit in advance, before the parallel operation. As a result, after the description insertion unit (4) inserts the optimizing description at the portions selected from the logic circuit candidate list (3), the software description (1) is split into the optimized operation description (5) and the software description outside the operation description (6).

The optimized operation description (5) is converted as-is into a logic circuit by the high-level synthesis (7) and implemented as a logic circuit (8) on a logic device. The software description outside the operation description (6) is processed on the CPU side.

FIG. 4 shows the measurement unit (2), the logic circuit candidate list (3), and the description insertion unit (4) in detail. This design support device for operation descriptions extracts portions that can be accelerated and gives them a common interface. The measurement unit (2) consists of an iterative process extraction unit (10), an equivalence transfer array extraction unit (11), and a reduction amount calculation unit (12).

The iterative process extraction unit (10) extracts, from the software description, iterative processes in which the same processing is executed multiple times. Specifically, it extracts from the software description the iterative processes that can be accelerated by parallel operation, and measures how many times each iterative process is called from the software.
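How the call counts are measured is not specified here. One hypothetical way, shown as a minimal Python sketch, is runtime instrumentation: once an iterative process has been wrapped in a function, a decorator counts its invocations. `count_calls`, `call_counts`, and `process_b` are illustrative names, and the body of `process_b` is a placeholder.

```python
import functools
from collections import Counter

call_counts = Counter()  # function name -> number of calls observed


def count_calls(func):
    """Decorator that records each invocation of a wrapped iterative process."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        call_counts[func.__name__] += 1
        return func(*args, **kwargs)
    return wrapper


@count_calls
def process_b(a, b):
    # Placeholder body standing in for process B (101B).
    return [x + b[i % len(b)] for i, x in enumerate(a)]
```

Calling `process_b` 100 times from the surrounding software leaves `call_counts["process_b"]` at 100. A static tool would instead derive the count from loop bounds, which is not always possible when bounds depend on runtime data.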

The equivalence transfer array extraction unit (11) extracts, from the input/output arrays representing the arguments referenced in an iterative process, those arrays into which the same values are input repeatedly. Specifically, for each iterative process extracted by the iterative process extraction unit (10), it detects the arrays that become inputs and outputs once the process is converted into a logic circuit, and among them extracts the arrays that would receive the same values again and again after conversion.
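A minimal sketch of the detection idea follows. It assumes, as a swapped-in trace-based approach rather than whatever analysis the device actually uses, that the arguments of each call have been recorded as a list of dicts. An array counts as an equivalence transfer array when it holds identical values on every recorded call; `find_equivalence_transfer_arrays` is a hypothetical name.

```python
def find_equivalence_transfer_arrays(trace):
    """Return names of arrays that hold identical values on every recorded call.

    trace: one dict per call, mapping argument name -> array (list of values).
    """
    if len(trace) < 2:
        return []  # with fewer than two calls there is no repeated transfer
    first = trace[0]
    return [name for name, values in first.items()
            if all(call[name] == values for call in trace[1:])]
```

For the FIG. 2 situation, a trace in which A changes on every call while B stays fixed yields `["B"]`.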

For each equivalence transfer array that satisfies the conditions, the reduction amount calculation unit (12) outputs the amount of transfer eliminated by removing the repeated transfers and the resulting speedup. Specifically, by executing the processing of steps 18 to 27 in FIG. 5 (described later) on each equivalence transfer array extracted by the equivalence transfer array extraction unit (11), it calculates the reducible transfer amount of the equivalence transfer array and the speedup after the reduction, and outputs these, together with the list of iterative processes and the equivalence transfer arrays, as the logic circuit candidate list (3).
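The reduction amount can be sketched as follows, under assumptions not stated in the source: an equivalence transfer array of `elements` values is naively sent on every call, the optimized scheme sends it once padded to the interface `width`, and the link moves a fixed `elements_per_cycle`. `Candidate` and its fields are hypothetical, loosely mirroring the columns of the logic circuit candidate list.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    process: str   # iterative process name
    calls: int     # number of times the process is called
    array: str     # equivalence transfer array name
    elements: int  # real element count of that array
    width: int     # element count of the common input interface

    def reduced_elements(self):
        # Naive: the array is sent on every call. Optimized: sent once, padded.
        return self.calls * self.elements - self.width

    def saved_cycles(self, elements_per_cycle):
        # Assumed link model: a fixed number of elements per clock cycle
        # (rounded down to whole cycles).
        return self.reduced_elements() // elements_per_cycle
```

For process B of FIG. 2, `Candidate("process_b", 100, "B", 10, 100).reduced_elements()` is 900. A negative value means the padding costs more than the repeated transfers it removes, as for a process called only once. Sorting rows by `saved_cycles` gives the designer a ranked list to choose from.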

For each iterative process, the logic circuit candidate list (3) shows the number of repetitions, the amount of input or output data after the equivalence transfer array is reduced, the amount of data eliminated, and the number of latency cycles saved. From the logic circuit candidate list (3), the designer selects the iterative processes to actually convert into logic circuits.

Based on the candidates selected in the logic circuit candidate list (3), the description insertion unit (4) inserts descriptions into the software description (1) so that the input of the equivalence transfer arrays is reduced. With these descriptions, the selected processes can be accelerated by giving them a common interface when they are converted into logic circuits. As a result, the designer obtains an optimized operation description (5), in which the equivalence transfer arrays are reduced, and a software description outside the operation description (6).

FIG. 5 is a flowchart showing the processing of the measurement unit (2) of the circuit design support device shown in FIG. 4. The operation according to FIG. 5 is as follows. Steps 13 to 16 are executed by the iterative process extraction unit (10).

Step 13: Read the software description (1) and extract the iterative processes it contains. Because an iterative process applies the same processing to different elements, a parallel circuit configuration is highly effective at accelerating it.
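Steps 13 and 14 can be sketched with Python's standard `ast` module. This is a stand-in: the software description would typically be C, which needs a C parser, so the sketch analyzes Python source instead and lists each `for` loop with its enclosing function and line number. `list_loops` and `LoopCollector` are hypothetical names, and `while` loops are ignored for brevity.

```python
import ast


class LoopCollector(ast.NodeVisitor):
    """Record (enclosing function, line number) for every for-loop."""

    def __init__(self):
        self.current = "<module>"
        self.loops = []

    def visit_FunctionDef(self, node):
        outer, self.current = self.current, node.name
        self.generic_visit(node)   # visit the body with this function as context
        self.current = outer

    def visit_For(self, node):
        self.loops.append((self.current, node.lineno))
        self.generic_visit(node)   # nested loops are listed separately


def list_loops(source):
    """Steps 13-14 sketch: parse source and list its for-loops."""
    collector = LoopCollector()
    collector.visit(ast.parse(source))
    return collector.loops
```

The returned list plays the role of the step 14 list of iterative processes; deciding which loops are actually parallelizable (no loop-carried dependences) is a separate analysis not shown here.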

Step 14: List the iterative processes extracted in step 13.

Step 15: Convert each iterative process listed in step 14, including its internal processing, into a function. At this point, all variables, arrays, and so on used in the function are made arguments, which makes explicit all external data that the internal processing references or assigns. The variables and arrays that become arguments here serve as the input and output values of the logic circuit when it is created.

Step 16: Enumerate the arguments of the functions created in step 15, separated into inputs and outputs.
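Steps 15 and 16 can likewise be sketched with Python's `ast` module, assuming the loop has already been wrapped in a Python function whose arrays are all parameters. Array parameters read through subscripts become inputs and those written through subscripts become outputs; an augmented assignment such as `x[j] += ...` counts as both. Scalars are not classified, and `classify_arguments` is a hypothetical name.

```python
import ast


def classify_arguments(source):
    """Step 16 sketch: split a function's array parameters into inputs and outputs."""
    func = ast.parse(source).body[0]          # assumes the first statement is a def
    params = [arg.arg for arg in func.args.args]
    read, written = set(), set()
    for node in ast.walk(func):
        # x[i] += ... both reads and writes x.
        if (isinstance(node, ast.AugAssign)
                and isinstance(node.target, ast.Subscript)
                and isinstance(node.target.value, ast.Name)):
            read.add(node.target.value.id)
        # Plain subscripts: Store context means written, otherwise read.
        if isinstance(node, ast.Subscript) and isinstance(node.value, ast.Name):
            if isinstance(node.ctx, ast.Store):
                written.add(node.value.id)
            else:
                read.add(node.value.id)
    inputs = [p for p in params if p in read]
    outputs = [p for p in params if p in written]
    return inputs, outputs
```

For `def process_b(a, b, c)` whose loop computes `c[i] = a[i] * b[i % 10]`, the sketch reports inputs `a` and `b` and output `c`; those inputs are what the equivalence transfer check then examines.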

Steps 17 to 26 detect equivalence transfer arrays and calculate the transfer amount that can be eliminated. Steps 17 to 21 are executed by the equivalence transfer array extraction unit (11), and steps 22 to 26 by the reduction amount calculation unit (12).

The specific processing in each step and its principle are explained with reference to FIG. 6. In this embodiment, adopting the form of FIG. 6 eliminates the repeated transfer of array B[10] contained in input sets (502) and (503) of FIG. 2 and in all subsequent input sets.

FIG. 6 shows an example in which the technique of this embodiment is applied when process B (101B), which repeats the same operation 100 times as in FIG. 2, is converted into a logic circuit. As in FIG. 2, process B (101B) takes multiple arrays, A[100] and B[10], as inputs, so logic circuit B (200) would normally also receive both arrays A[100] and B[10].

In this embodiment, the input to logic circuit B (200) is restricted to a single 100-element array shaped like array A[100], the target of the parallel operation (201), and array B[10] is widened to 100 elements to match A[100] (600). This can be done, for example, by appending 90 dummy elements to the original 10. After this transformation, each call to logic circuit B (200) takes exactly one 100-element array as input, and arrays A[100] and B[10] can each be transferred separately.
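The widening of B[10] to the 100-element interface (600) might look like the following C++ sketch. `IfArray`, `pad_to_interface`, and the choice of dummy value are assumptions for illustration; the text states only that dummy data fills the extra 90 slots.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kWidth = 100;  // element count of A, the widest input

// Hypothetical shared interface type: every call to the circuit carries
// exactly one 100-element array.
using IfArray = std::array<int, kWidth>;

// Copies a shorter array into the shared interface type and fills the
// remaining slots with dummy values (process 600 in FIG. 6).
IfArray pad_to_interface(const std::vector<int>& src, int dummy = 0) {
    assert(src.size() <= kWidth);
    IfArray buf;
    buf.fill(dummy);  // e.g. 90 dummy elements when src is B[10]
    for (std::size_t i = 0; i < src.size(); ++i) {
        buf[i] = src[i];  // original elements keep their positions
    }
    return buf;
}
```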

Because logic circuit B (200) cannot otherwise tell which array it has received, an input determination process (204), which identifies the array from a signal carried in a variable, is placed before the computation. The input array may instead be identified from its type or its values.

If the received array is B[10] (widened to B[100] at the interface), the input determination process (204) stores it in the memory (205) on logic circuit B (200); if it is A[100], the process passes it to the parallel operation (201). With this configuration, array B[10] is input once and stored in the memory (205) before the actual parallel operation. Thereafter, each time A[100] is input, the parallel operation (201) runs, and the operation (203) that needs B[10] fetches it from the memory (205) inside the logic circuit, which has lower latency than an input from the CPU. As a result, logic circuit B (200) can be designed to be equivalent to the one in FIG. 2 while array B[10] is input only once, at the start.
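The behavior of the input determination process (204) and the memory (205) can be simulated in ordinary C++. This is a host-side model under stated assumptions, not synthesizable code from the figures: the sig convention (0 for B, 1 for A) follows the description of FIG. 11, while `circuit_b`, `g_bram_B`, and the placeholder operation `A[i] + B[i % 10]` are hypothetical.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kWidth = 100;  // shared interface width (A[100])
constexpr std::size_t kBLen = 10;    // real length of B before padding

using IfArray = std::array<int, kWidth>;

// Stands in for memory (205): file-scope storage that survives between calls.
static std::array<int, kBLen> g_bram_B{};

// Hypothetical operation that needs B: here simply A[i] + B[i % 10].
static int op_with_B(int a, std::size_t i) { return a + g_bram_B[i % kBLen]; }

// Sketch of input determination (204): sig == 0 means the padded B array,
// which is stored; any other value (1 in FIG. 11) means A, which is processed
// element-wise. On the circuit this loop would become the parallel operation (201).
void circuit_b(const IfArray& input, int sig, IfArray& output) {
    if (sig == 0) {
        for (std::size_t j = 0; j < kBLen; ++j) g_bram_B[j] = input[j];  // dummies dropped
        return;
    }
    for (std::size_t i = 0; i < kWidth; ++i) output[i] = op_with_B(input[i], i);
}
```

Calling `circuit_b` once with the padded B and `sig == 0`, then repeatedly with fresh A arrays and `sig == 1`, reproduces the send-B-once pattern; only the first 10 slots of the padded array are kept, so the dummy elements never reach the computation.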

In this case, 99 transfers of array B[10] are eliminated, i.e. 10 × 99 = 990 array elements. Subtracting the 90 padding elements needed to match A[100] in the one remaining transfer leaves a net reduction of 900 elements. The processing described above reduces the amount of equivalence transfer array input and thereby speeds up processing.
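The 900-element figure can be checked by counting the elements sent in each scheme. In this sketch, an assumption-level model that counts array elements and ignores per-transfer overheads, a is the length of the equivalence transfer array, b the length of the array that changes every call, and L the number of calls; the function names are hypothetical.

```cpp
#include <algorithm>

// Elements sent from CPU to circuit when both arrays go on every one of L calls.
long baseline_elements(long a, long b, long L) { return L * (a + b); }

// Elements sent after the change: the equivalence transfer array, padded to the
// common width max(a, b), is sent once; then each of the L calls sends one
// array of that width.
long optimized_elements(long a, long b, long L) {
    const long width = std::max(a, b);
    return width + L * width;
}
```

With a = 10, b = 100, L = 100 the baseline sends 11000 elements and the modified scheme 100 + 100 × 100 = 10100, a difference of 900. In general the difference is bL - a when a > b and aL - b when b > a.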

In the flowchart of FIG. 5, the conditions under which this embodiment can be applied are checked as follows:
a) Multiple arrays are input to the logic circuit candidate process (step 18).
b) One of them is an array to which the same values are input repeatedly (step 19).
c) The repetition count of the candidate process is obtained (step 17) and denoted L. Let a be the number of elements of the equivalence transfer array and b the number of elements of the transfer array to be used in the parallel operation (step 21). If a > b && bL > a (step 22), a transfer amount of bL - a can be eliminated (step 23); if b > a && aL > b (step 24), a transfer amount of a(L - 1) can be eliminated (step 25). Here "x && y" denotes an operator that returns true when both x and y are true.
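Condition c) translates directly into a small function. This sketch implements the rule as stated, under the hypothetical name `reducible_amount`; it returns no value when neither condition holds, including the case a == b, which the flowchart description does not cover. For the FIG. 6 numbers (a = 10, b = 100, L = 100) the second branch gives a(L - 1) = 990, the number of B elements that are no longer re-sent; the 900 of the worked example additionally subtracts the b - a = 90 padding elements.

```cpp
#include <optional>

// Reducible transfer amount per condition c) of FIG. 5.
//   a: elements of the equivalence transfer array
//   b: elements of the array used in the parallel operation
//   L: repetition count of the candidate process
// Returns std::nullopt when neither condition holds.
std::optional<long> reducible_amount(long a, long b, long L) {
    if (a > b && b * L > a) return b * L - a;    // steps 22 and 23
    if (b > a && a * L > b) return a * (L - 1);  // steps 24 and 25
    return std::nullopt;
}
```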

If the input arrays satisfy these conditions, processing proceeds to step 26, and the reduced transfer amount and the resulting processing speed are merged with the per-candidate input/output array list compiled in step 20 (step 27). The speed-up obtained after optimizing the inputs of each candidate process is then output (step 28) and compiled into the logic circuit candidate list (3), which can be presented to the user.

FIG. 7 is a flowchart of the processing in the description insertion unit (4) of FIG. 3. By inserting or changing descriptions in the software description (1), the description insertion unit (4) adds a predetermined function to the logic circuit generated from the software description (1) by high-level synthesis. The operation is as follows.

Step 701: The designer selects from the logic circuit candidate list (3) which part to convert into a logic circuit, taking into account the resources of the device that will host the circuit and the target performance. This step may instead provide a function that automatically selects the candidate whose transfer time can be shortened the most.
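The optional automatic selection in step 701 could be as simple as picking the entry with the largest estimated saving. The `Candidate` record and its field are hypothetical stand-ins for a row of the candidate list; the text does not fix a data format.

```cpp
#include <string>
#include <vector>

// Hypothetical entry of the logic circuit candidate list (3).
struct Candidate {
    std::string name;           // e.g. the outlined function's name
    double transfer_saving_us;  // estimated reduction in transfer time
};

// Returns the index of the candidate whose transfer time shrinks the most,
// or -1 for an empty list. Ties keep the earliest entry.
int select_best(const std::vector<Candidate>& list) {
    int best = -1;
    for (int i = 0; i < static_cast<int>(list.size()); ++i) {
        if (best < 0 || list[i].transfer_saving_us > list[best].transfer_saving_us) {
            best = i;
        }
    }
    return best;
}
```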

Step 702: For the part selected in step 701, to shorten the transfer time caused by the equivalence transfer array detected by the measurement unit (2), the element counts of the other input arrays are aligned with that of the input array having the most elements, as in the example of FIG. 6 (600).

Step 703: The input to the logic circuit is restricted to a single array whose element count equals that of the largest array to be input. This unifies the input interface (IF) of the logic circuit.

Step 704: Insert the input determination process shown in FIG. 6 (204). Describe the branches so that an equivalence transfer array is stored in the memory (205) (step 705), whereas an array whose values differ on every call is passed to the parallel operation (201).

Step 706: Change the description of the operation (203) that uses the equivalence transfer array so that it uses the array stored in the memory (205) in step 705. The array can then be reused from the memory inside the logic circuit instead of being input repeatedly with the same values.

Step 707: Insert, in the operation description, a process that inputs the equivalence transfer array to the logic circuit (200) before the parallel operation (201) is performed. This allows the equivalence transfer array to be input ahead of the repeatedly called parallel operation.

FIG. 8 shows an example output of the logic circuit candidate list (3) of FIG. 3. Each candidate process is listed as an item together with the loop count and the inputs/outputs of the process when made into a function, its equivalence transfer arrays, and the reduction achieved by applying this embodiment (reduction in input data and in processing time).

FIG. 9 shows the architecture on which the target software of the embodiment runs. The software is executed by the CPU (901) reading a program from the RAM (Random Access Memory) (902), and part of the processing is implemented as a logic circuit on a PLD (Programmable Logic Device) (903). The PLD is, for example, an FPGA, on which the processing that burdens the software is implemented as a logic circuit; the logic circuit device can execute multiple operations in parallel. This configuration offloads part of the CPU's work to the PLD and makes processing more efficient. The devices (901 to 903) are connected by a bus (904).

The CPU (901) reads the program from the RAM (902), where it is stored, and executes it. To run the logic circuit, the CPU transfers the input arrays to the PLD (903) and then receives the result back.

For comparison, FIG. 10 shows example code for the logic circuit configuration of FIG. 2. The hardware function (1001) on lines 120 to 125 is the function converted into a logic circuit and corresponds to logic circuit B (200B) in FIG. 2. Inside the hardware function (1001), the input array B[10] is processed by func_once (1002), and array A[100] is processed by a 100-iteration loop (1003).

When high-level synthesis converts the function into a logic circuit, this loop (1003) becomes a parallel operation. The hardware function (1001) is called 100 times from the iterative process (1004) on lines 260 to 263. With high-level synthesis, each call to the hardware function (1001) hands processing over from the CPU to the logic circuit.

As defined on line 120, the inputs to the hardware function (1001) are the 100-element array A[100] and the 10-element array B[10], and the output is the 100-element array output[100]. In the iterative process (1004), the values of A[100] are updated on every iteration by the refresh function on line 261, before the hardware function is called on line 262. Array B is initialized by the initialize function on line 255 and is not assigned again afterwards.

Consequently, on every call to the hardware function (1001), array A[100] carries different values while array B[10] carries the same values. The configuration of FIG. 2 therefore incurs more overhead from repeated input than the configuration of FIG. 6.

FIG. 11 shows example code when this embodiment is applied, giving the processing structure of FIG. 6. The hardware_manager function (1101) on lines 120 to 127 describes logic circuit B (200) with the input determination process (204) and the memory (205) of FIG. 6 added.

As defined on line 120, the hardware_manager function (1101) takes as arguments a 100-element array input[100], a variable sig that serves as the signal identifying the array, and the output array output[100]. Of these, input[100] is the interface that merges the two arrays A[100] and B[10] of FIG. 10. Just as FIG. 6 uses a single 100-element array as the input to logic circuit B (200), both A[100] and B[10] are passed in through input[100].

Whether the array passed in input[100] is A[100] or B[10] is decided by the argument sig. Where the values themselves allow the decision, for example when the kind of array can be told from its type, the contents of the array may be used instead.

The conditional branches on lines 122 and 124 correspond to the input determination process (204) of FIG. 6. When array B is input (1103), sig = 0, so the array is stored in memory. This memory is the memory (205) inside the logic circuit device and is not initialized when the CPU calls the logic circuit. In high-level synthesis, an array declared inside a function is initialized on every call, but declaring it as a global array (1102) keeps it from being initialized.
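The persistence distinction relied on here has a direct analogue in plain C++, sketched below: an automatic local array is re-created and re-initialized on every call, while a file-scope array keeps its contents between calls. How a given HLS tool maps such a global to on-chip memory is tool-specific and is assumed here as the text describes; the function names are hypothetical.

```cpp
// Local array: re-created and zero-initialized on every call, so a value
// stored in one call is gone by the next (like a function-local array in HLS).
int store_then_read_local(int v, bool store) {
    int mem[10] = {};
    if (store) mem[0] = v;
    return mem[0];
}

// File-scope array: lives for the whole program, so a value stored in one
// call is still present on the next (like the global declaration (1102)).
static int g_mem[10] = {};

int store_then_read_global(int v, bool store) {
    if (store) g_mem[0] = v;
    return g_mem[0];
}
```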

In the code (1105) on lines 255 to 257, line 256 copies array B into the input array input with the function mem_set, and line 257 sets the variable sig to 0 and calls the hardware_manager function (1101), performing the transfer. This corresponds to the process (600) in FIG. 6. The code (1105) transfers only array B to the logic circuit. When the logic circuit (200) receives input[100], the input determination process (204) on line 122 checks sig; because sig = 0, the array is identified as B and is stored in the memory (205) on line 123.

In the code (1106) on lines 258 to 261, the refresh function and the hardware_manager function are called 100 times in a loop. The refresh function stores new values into array A[100] on every iteration. Line 260 calls the logic circuit with A[100] as the input and sig set to 1. Because A[100] has the same number of elements as input[100], it can be transferred directly. When A[100] is transferred, the hardware_manager function (1101) executes the call to the hardware function (1107) on line 125. This hardware function performs the same computation as the hardware function (1001) of FIG. 10.

The difference is that the hardware function called on line 262 of FIG. 10 receives arrays A and B as separate fixed arguments, as defined on line 120 of that figure, whereas in the call on line 125 of FIG. 11, arrays A and B share the single argument input, as defined on line 120 of that figure, and the values of bram B are read from the memory.

When the logic circuit (200) receives input[100], the input determination process (204) on line 124 confirms that sig is 1 and performs the parallel operation (201). The operation (202) is performed with the array A[100] just input to the logic circuit and the array B[10] already stored in the memory inside the logic circuit as arguments. Because B[10] is referenced from the memory (205) inside the logic circuit, the transfer time from the software, and with it the processing time of the logic circuit as a whole, is reduced.

Although the embodiment uses the variable sig to tell the arrays apart, other methods may be used, such as attaching a flag to only one of the arrays.

According to the embodiment described above, when part of a software description executed on a CPU is converted into a logic circuit, parts that the logic circuit can implement as parallel operations are detected, and where such a process is called multiple times in the software description, the input to the logic circuit is optimized with those repeated calls in mind. Reducing the transfer amount in this way speeds up processing. For example, when software calls a logic circuit many times, the amount of input to the FPGA is optimized, the transfer time across the whole application decreases, and the application runs faster.

Claims

1. A circuit design support device comprising:
a measurement unit having a function of, when part of a software description is converted into a logic circuit, detecting a process that can be configured for parallel operation in the logic circuit and is executed multiple times, listing the inputs used when the logic circuit performs the process, and outputting, as a logic circuit candidate list, the performance obtained when the amount of input to the logic circuit is reduced; and
a description insertion unit having a function of inserting into the software description, for a logic circuit candidate selected from the logic circuit candidate list, a description that reduces the amount of input to the logic circuit.

2. The circuit design support device according to claim 1, wherein
the description insertion unit has a function of changing the software description so that, of the multiple arrays included in the input to the logic circuit, an array to which the same values are repeatedly input is converted into an array having the same number of elements as another of the arrays, and the other array and the converted array are transferred to the logic circuit.

3. The circuit design support device according to claim 2, wherein
the description insertion unit has a function of changing the software description so as to add information that distinguishes the converted array from the other array when the converted array is merged with the other array and transferred to the logic circuit.

4. The circuit design support device according to claim 1, wherein
the measurement unit includes a reduction amount calculation unit, and the reduction amount calculation unit has a function of outputting, for each process in the software description, the loop count of the process, the inputs and outputs when the process is converted into a logic circuit, the amount of input to the logic circuit that can be reduced, and how much the processing time can be shortened by reducing that input.

5. The circuit design support device according to claim 4, wherein
the reduction amount calculation unit calculates, when the input to the logic circuit contains multiple arrays and one of them transfers the same values to the logic circuit repeatedly, how much processing time can be saved by reducing the input amount of that array.

6. The circuit design support device according to claim 5, wherein,
letting L be the repetition count of the process converted into a logic circuit, a the number of elements of the array that repeatedly transfers the same values, and b the number of elements of the array that transfers different values each time, the reduction amount calculation unit determines that a transfer amount of bL - a can be reduced when a > b and bL > a are satisfied, and that a transfer amount of a(L - 1) can be reduced when b > a and aL > b are satisfied.

7. The circuit design support device according to claim 2, wherein
the description insertion unit has a function of inserting into the software description a process that stores, in a memory in the logic circuit, the array to which the same values are repeatedly input.

8. The circuit design support device according to claim 7, wherein
the description insertion unit has a function of changing the software description so that a logic circuit that refers to the array to which the same values are repeatedly input refers to the memory instead of the input.

9. The circuit design support device according to claim 2, wherein
the description insertion unit has a function of changing the software description so that the array that repeatedly transfers the same values is input to the logic circuit in advance, before a parallel operation is called.

10. A circuit design support method for converting part of a software description into a logic circuit, the method performing:
an iterative process extraction step of extracting, from the software description, an iterative process to be converted into a logic circuit;
an equivalence transfer array extraction step of extracting, in the extracted iterative process, an input array that repeatedly uses the same values; and
a description insertion step of changing the software description so as to convert the input array that repeatedly uses the same values into an array having the same number of elements as another input array.

11. The circuit design support method according to claim 10, wherein,
when the input array that repeatedly uses the same values is converted into an array having the same number of elements as another input array, the software description is changed so as to add information that distinguishes the converted input array from the other input array.

12. The circuit design support method according to claim 10, wherein
the software description is changed so that the input array that repeatedly uses the same values is stored in a memory of the logic circuit before the iterative process is performed in the logic circuit.

13. The circuit design support method according to claim 12, wherein
the software description is changed so that, when the iterative process is performed in the logic circuit, the same values are read from the memory and used in the processing.

14. The circuit design support method according to claim 10, further performing:
an output step of outputting a logic circuit candidate list that displays each extracted iterative process in association with the change in performance resulting from the change to the software description; and
a selection step of selecting, from the logic circuit candidate list, an iterative process to be converted into a logic circuit.

15. An information processing device including a CPU, a logic circuit, and a memory, in which, when the CPU executes software stored in the memory, the CPU calls the logic circuit to execute part of the processing, wherein:
when calling the logic circuit to execute part of the processing, if the input of the processing executed by the logic circuit includes multiple arrays and one of them repeatedly inputs the same values, the CPU converts the array that repeatedly uses the same values into an array having the same number of elements as another of the arrays and transfers it to the logic circuit; and
the logic circuit stores the array that repeatedly uses the same values in a built-in memory and performs the processing using the array stored in the built-in memory.