JP6758470B2 - Information processing device

Info

Publication number: JP6758470B2
Application number: JP2019202147A
Authority: JP
Inventors: 雄介菅野; 鳥羽　忠信; 忠信鳥羽; 真佐圓; 豪一小野; 山岡　雅直; 雅直山岡
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-11-07
Filing date: 2019-11-07
Publication date: 2020-09-23
Anticipated expiration: 2035-08-20
Also published as: JP2020053059A

Description

The present invention relates to an information processing apparatus and an information processing system, for example, an information processing apparatus that includes programmable logic, typified by an FPGA (Field Programmable Gate Array), and an information processing system that includes such an apparatus.

Dynamic reconfiguration using field-programmable logic LSIs is a known technique for making computation faster and more capable. In this technique, data designed by the user in advance is loaded from a ROM into the configuration memory of an FPGA so that the FPGA executes the desired operation, and the data stored in the configuration memory is rewritten dynamically while the LSI is operating.

For example, Non-Patent Document 1 describes a technique for mounting an FPGA on each server that makes up a data center. With this technique, throughput and the like can be improved by dynamically reconfiguring the FPGAs as appropriate. Non-Patent Document 2 describes the configuration of a CMOS Ising chip that performs processing based on the Ising model; it is not specifically related to FPGAs.

Non-Patent Document 1: “A reconfigurable fabric for accelerating large-scale datacenter services,” ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), 2014.
Non-Patent Document 2: “20k-spin Ising chip for combinational optimization problem with CMOS annealing,” Solid-State Circuits Conference - (ISSCC), 2015 IEEE International, Session 24.3.

The architectures known today cannot be said to be optimal for speeding up information processing. This is because they merely implement in a logic LSI, and compute with, the better solution methods that humans have devised from among many possibilities. Because the best way to perform a computation depends on the problem being solved, the hardware configuration should ideally change so as to optimize itself for that problem. Fully realizing self-optimization is difficult at present, however, and it is more realistic to make changes starting from designed data.

Evolutionary algorithms such as genetic algorithms have so far been developed as software. Against this background, the present inventors considered whether such self-optimization techniques could be introduced into FPGA configuration. Changing the configuration means changing the logical connection information, so it can be expected to be faster than running an evolutionary algorithm as a program on a processor. In doing so, it is desirable that the configuration can be changed autonomously with the relationships among the operation data taken into account.

The present invention has been made in view of the above, and one of its objects is to improve computational efficiency in an information processing apparatus and an information processing system that include programmable logic.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

Among the inventions disclosed in the present application, an outline of a representative embodiment is briefly described as follows.

The information processing apparatus according to the present embodiment includes an FPGA fabric unit, a configuration memory, a data monitoring unit, a config control unit, and a CRAM controller. The FPGA fabric unit implements a user logic circuit, and the configuration memory holds the config data that defines the circuit configuration of that user logic circuit. The data monitoring unit monitors the operating state of the user logic circuit, and the config control unit determines, based on the monitoring result, whether the config data needs to be changed. The CRAM controller changes the config data in the configuration memory based on the determination made by the config control unit.

Briefly, the effect obtained by a representative embodiment of the inventions disclosed in the present application is that computational efficiency can be improved.

FIG. 1 is a block diagram showing a schematic configuration example of the entire information processing system according to the first embodiment of the present invention.
FIG. 2 is a block diagram showing a configuration example of the CRAM controller in FIG. 1.
FIG. 3 is a flow diagram showing an example of the control flow for changing the config data in the information processing apparatus of FIG. 1.
FIG. 4 is a sequence diagram showing an example of the input/output signals of the information processing apparatus of FIG. 1 during the flow of FIG. 3.
FIG. 5 is a schematic diagram showing a structural example of the CRAM address map in the information processing apparatus according to the second embodiment of the present invention.
FIGS. 6(a) and 6(b) are conceptual diagrams illustrating an example of a method of changing the config data in the information processing apparatus according to the third embodiment of the present invention.
FIG. 7 is a conceptual diagram illustrating an example of a method of changing the config data in the information processing system according to the fourth embodiment of the present invention.
FIG. 8 is a block diagram showing a schematic configuration example of the entire information processing system according to the fifth embodiment of the present invention.
FIG. 9 is a conceptual diagram illustrating an example of a method of changing the config data in the information processing apparatus according to the sixth embodiment of the present invention.
FIG. 10 is a block diagram showing a schematic configuration example of the main part of the information processing system according to the seventh embodiment of the present invention.
FIG. 11 is a circuit diagram showing a configuration example of a general LUT.
FIG. 12 is a schematic diagram showing a configuration example of the main part of the information processing apparatus according to the eighth embodiment of the present invention.
FIG. 13 is a waveform diagram showing an example of a specific operation sequence in FIG. 12.
FIG. 14 is a diagram showing an example of a PE (Primitive Element), the basic unit of the two-dimensional Ising model mapped onto the Ising chip.
FIG. 15 is a diagram illustrating an example of an EU (Execlusion Unit), the basic unit of operation in the PE.
FIG. 16 is a diagram showing a detailed configuration example of the EU of FIG. 15.
FIG. 17 is a schematic diagram showing a configuration example in which the PE of FIG. 15 is mapped onto an FPGA.
FIG. 18 is a diagram showing an example of the overall operation when a computation is performed using the configuration of FIG. 17.

In the following examples, the description is divided into multiple sections or examples where convenience requires. Unless otherwise stated, these are not unrelated to each other; one is a modification, a detail, a supplementary explanation, or the like of part or all of another. In addition, when the number of elements or the like (including counts, numerical values, quantities, and ranges) is mentioned, it is not limited to that specific number, and may be greater or less than it, unless otherwise stated or unless it is clearly limited to the specific number in principle.

Furthermore, in the following examples, it goes without saying that the components (including element steps and the like) are not necessarily essential, unless otherwise stated or unless they are clearly considered essential in principle. Similarly, when the shape, positional relationship, or the like of a component is mentioned, anything that substantially approximates or resembles that shape or the like is included, unless otherwise stated or unless this is clearly not the case in principle. The same applies to the numerical values and ranges mentioned above.

Hereinafter, examples of the present invention are described in detail with reference to the drawings. In all the drawings used to describe the examples, the same members are in principle given the same reference signs, and repeated descriptions of them are omitted.

<< Overview of FPGA >>

An FPGA is generally designed using a logic description language (for example, Verilog HDL (Hardware Description Language) or VHDL), and the logic circuit is described at what is called the register transfer level (RTL). In general, integration from RTL into an LSI follows this procedure: the logical connection information of the RTL is converted into a so-called netlist with the physical placement of the transistors integrated in the actual LSI taken into account; static timing analysis (STA) is then performed with the timing constraints of each signal line taken into account; and, after the placement is optimized, the design is integrated into the LSI.

An FPGA follows almost the same procedure. After the RTL is designed, logic synthesis is performed and, while the signal-line timing is adjusted, the result is converted into the data to be stored in the configuration memory of the FPGA (hereinafter called the CRAM). This conversion is hereinafter called mapping to the CRAM.

More specifically, mapping to the CRAM is achieved by converting the transistor-level connection information (netlist), which has been derived from the logic description, into a bit string (bitstream) for the CRAM of the FPGA, with the physical placement of the basic logic circuits and wiring connection switches integrated in the FPGA taken into account. Storing this bitstream in the CRAM of the FPGA makes the target FPGA perform the desired operation. The netlist converted from the RTL to realize the desired function on the FPGA corresponds one-to-one to the CRAM data string (bitstream) stored in the FPGA.

<< Overall configuration of information processing system >>

FIG. 1 is a block diagram showing a schematic configuration example of the entire information processing system according to the first embodiment of the present invention. The information processing system shown in FIG. 1 includes one or more information processing apparatuses (here, FPGA chips FPGA_CH), an evaluation unit interface EVAL_IF, an external config storage device STRG, and various signal lines connecting them. The signal lines include a control signal line (SCTL), a config signal line (SCFG), and a data signal line (SDAT).

The configuration of FIG. 1 can form a desired information processing system by itself, but it can also be applied to part of a processor system, for example, and used as a hardware accelerator for that processor system. Note that FIG. 1 omits the storage and network that a general information processing apparatus normally has for information processing, and shows only the main parts related to changing the FPGA configuration.

The FPGA chip FPGA_CH includes a plurality of FPGA fabric units FPGA_FAB, a CRAM, a CRAM controller CRAMC, a data monitoring unit CUNT, a chip control controller CHPC, an on-chip bus DBUS, and a config control unit CNFGC. The CRAM holds the config data that defines the circuit configuration of the user logic circuit. Each FPGA fabric unit FPGA_FAB implements a user logic circuit based on the config data in the CRAM. More specifically, an FPGA fabric unit FPGA_FAB includes a programmable logic unit LOG, which implements desired truth tables using a plurality of LUTs (lookup tables), and a switch unit SW, which connects the LUTs to one another arbitrarily using a plurality of switches.
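As a rough illustration of how config data can define a truth table, the following Python sketch models a k-input LUT whose behavior is given entirely by 2^k configuration bits. The class name `Lut`, the method `evaluate`, and the bit ordering (input 0 is the least significant address bit) are assumptions made for this example and are not taken from the patent.

```python
from typing import Sequence


class Lut:
    """Hypothetical k-input lookup table; its truth table is its config bits."""

    def __init__(self, config_bits: Sequence[int]) -> None:
        n = len(config_bits)
        if n < 2 or n & (n - 1):
            raise ValueError("number of config bits must be a power of two (>= 2)")
        self.config_bits = list(config_bits)
        self.num_inputs = n.bit_length() - 1

    def evaluate(self, inputs: Sequence[int]) -> int:
        """Use the inputs as an address into the config bits (input 0 is the LSB)."""
        if len(inputs) != self.num_inputs:
            raise ValueError("wrong number of inputs")
        address = sum((bit & 1) << i for i, bit in enumerate(inputs))
        return self.config_bits[address]


# Config bits 0,1,1,0 make a 2-input LUT behave as XOR;
# rewriting them to 0,0,0,1 turns the same structure into AND.
xor_lut = Lut([0, 1, 1, 0])
and_lut = Lut([0, 0, 0, 1])
```

In this toy model, changing the logic function means only rewriting the stored bits, which is the property that rewriting the CRAM relies on.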

The CRAM controller CRAMC reads and writes config data to and from the CRAM and, additionally, the external config storage device STRG. The on-chip bus DBUS carries the input and output signals of the FPGA fabric units FPGA_FAB. Specifically, the on-chip bus DBUS carries communication data between FPGA fabric units FPGA_FAB, or between each FPGA fabric unit FPGA_FAB and the data signal line (SDAT). The data monitoring unit CUNT monitors the operating state of the user logic circuit implemented in the FPGA fabric units FPGA_FAB, for example by measuring the data traffic. The chip control controller CHPC controls the operation of the entire chip.

In FIG. 1, for convenience, only the FPGA fabric units FPGA_FAB are shown as programmable logic, but parts of the data monitoring unit CUNT, the config control unit CNFGC, the CRAM controller CRAMC, and the chip control controller CHPC may also be built from programmable logic. The parts other than the FPGA fabric units FPGA_FAB may also be built using a hard-core processor embedded in the FPGA (processor hardware implemented in advance) or a soft-core processor (a processor built from programmable logic).

Also, in FIG. 1, the FPGA fabric units FPGA_FAB are drawn for convenience as if they were physically separate. This is to show clearly that the fabric is logically divided by the functional blocks mounted on the FPGA. In practice, of course, the FPGA fabric FPGA_FAB may be laid out uniformly inside the FPGA.

The data signal line (SDAT) is a signal line for transmitting and receiving the data signal SDAT between multiple FPGA chips FPGA_CH, and is realized by, for example, a high-speed serial link. The config signal line (SCFG) is a signal line for transmitting and receiving the CRAM config signal (config data) SCFG between multiple FPGA chips FPGA_CH, or between each FPGA chip FPGA_CH and the external config storage device STRG, which consists of a non-volatile memory, a hard disk drive, or the like. The control signal line (SCTL) is a signal line that conveys config data change requests, change notifications, and the like to each FPGA chip FPGA_CH. These signal lines are described separately here, but they can also be carried over shared wiring using time-division multiplexing or the like.

In the example of FIG. 1, the data monitoring unit CUNT is connected to the on-chip bus DBUS and monitors the amount of data traffic on the on-chip bus DBUS using, for example, a counter circuit. For example, when packet communication, I2C (Inter-Integrated Circuit) communication, or the like is used on the on-chip bus DBUS, the data monitoring unit CUNT refers to the source and destination information in the data and measures the traffic for each combination of source and destination. The data monitoring unit CUNT is not limited to this position, however, and may instead be provided inside each FPGA fabric unit FPGA_FAB (specifically, in its interface to the on-chip bus DBUS). In that case, the data monitoring unit CUNT refers to, for example, the source information in the received data and measures the traffic for each source.
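The per-pair counting described above can be sketched in software as follows. This is only a minimal model under stated assumptions: the class `TrafficMonitor`, its method names, the block names `FAB0` to `FAB2`, and the choice of bytes as the unit are all hypothetical, and an actual data monitoring unit would be a hardware counter circuit.

```python
from collections import Counter


class TrafficMonitor:
    """Hypothetical model of CUNT: counts bytes per (source, destination) pair."""

    def __init__(self) -> None:
        self.bytes_by_pair: Counter = Counter()

    def observe(self, source: str, destination: str, payload: bytes) -> None:
        """Record one transfer seen on the bus."""
        self.bytes_by_pair[(source, destination)] += len(payload)

    def bytes_from(self, source: str) -> int:
        """Total traffic sent by one source, as in the per-source variant."""
        return sum(n for (src, _), n in self.bytes_by_pair.items() if src == source)


monitor = TrafficMonitor()
monitor.observe("FAB0", "FAB1", b"\x00" * 64)
monitor.observe("FAB0", "FAB1", b"\x00" * 32)
monitor.observe("FAB2", "FAB1", b"\x00" * 16)
```

Keying the counter on the (source, destination) pair corresponds to the bus-side placement, while `bytes_from` corresponds to the variant placed at each fabric unit's bus interface.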

Based on the monitoring result of the data monitoring unit CUNT, the config control unit CNFGC refers to the threshold table TBL and determines whether the config data needs to be changed. When it determines that a change is needed, the config control unit CNFGC issues a config data change request to the CRAM controller CRAMC. The threshold table TBL stores, for example, statistical data on communication rates, and the config control unit CNFGC issues the config data change request when the communication rate obtained from the monitoring result, or the amount of fluctuation in that rate, exceeds a threshold based on the statistical data. On receiving the change request, the CRAM controller CRAMC rewrites the contents held in the CRAM (that is, the config data).
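The threshold comparison can be illustrated with a short sketch. The `Threshold` record, its two fields, and the function `needs_reconfiguration` are assumptions for illustration; the patent only states that thresholds based on communication-rate statistics are stored in the table TBL.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """Hypothetical TBL entry derived from communication-rate statistics."""
    max_rate: float         # upper bound on the communication rate
    max_rate_change: float  # upper bound on the change between two samples


def needs_reconfiguration(previous_rate: float, current_rate: float,
                          entry: Threshold) -> bool:
    """Return True when the rate or its fluctuation exceeds the table entry."""
    fluctuation = abs(current_rate - previous_rate)
    return current_rate > entry.max_rate or fluctuation > entry.max_rate_change
```

For example, with `Threshold(max_rate=200.0, max_rate_change=50.0)`, a step from 100.0 to 120.0 requests no change, while a step from 100.0 to 180.0 does, because the fluctuation of 80.0 exceeds 50.0.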

FIG. 2 is a block diagram showing a configuration example of the CRAM controller in FIG. 1. The CRAM controller CRAMC includes a CRAM access controller CRAMAC that controls reading and writing of the CRAM, an address map holding unit ADRMAP that holds the address map of the CRAM, and a port PT that exchanges data with the CRAM and with the external config storage device STRG. After changing the config data, the CRAM controller CRAMC has the code check unit CODECHK compute a check code, such as a CRC, for the config data received via the port PORT, and stores the computed code in the RAM. Using the code stored in the RAM, the config data in the CRAM can, for example, be read out periodically for error detection and correction.

In the method of the first embodiment, as described above, the config control unit CNFGC issues a config data change request based on the monitoring result of the data monitoring unit CUNT, and the CRAM controller CRAMC changes the config data in response. On receiving the config data change request, the CRAM access controller CRAMAC refers to the address map holding unit ADRMAP and reads data from the relevant CRAM address via the command/address signal CMD.

The config data (RD) read from the CRAM in response is input to the config update unit CNFG_MDFY via the port PORT. The config update unit CNFG_MDFY changes the config data and writes the changed config data (WT) to the CRAM via the port PORT. At this time, the config data (WT) is also input to the code check unit CODECHK via the port PORT. The code check unit CODECHK generates a check code for the config data (WT) and stores it in the RAM.
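The read, modify, write, and code-check sequence above can be sketched as follows. This is a simplified model under stated assumptions: the CRAM and the code RAM are represented as dictionaries keyed by address, CRC-32 from Python's `zlib` stands in for the unspecified check code, and the function names `update_config` and `scrub` are hypothetical.

```python
import zlib
from typing import Callable, Dict, List


def update_config(cram: Dict[int, bytes], code_ram: Dict[int, int], address: int,
                  modify: Callable[[bytes], bytes]) -> None:
    """Read config data (RD), change it, write it back (WT), and store its code."""
    read_data = cram[address]                   # RD: read from the CRAM
    write_data = modify(read_data)              # role of the config update unit
    cram[address] = write_data                  # WT: write back to the CRAM
    code_ram[address] = zlib.crc32(write_data)  # role of the code check unit


def scrub(cram: Dict[int, bytes], code_ram: Dict[int, int]) -> List[int]:
    """Return the addresses whose contents no longer match the stored code."""
    return [addr for addr, code in code_ram.items() if zlib.crc32(cram[addr]) != code]
```

A CRC as used here only detects corruption; the correction mentioned in the text would need an error-correcting code or a reload from the external config storage device, which this sketch does not model.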

The config data (WT) is also stored in the external config storage device STRG, for example together with its change history. In addition, as needed, the CRAM controller CRAMC can read config data from the external config storage device STRG and store it in the CRAM, or update the read config data in the config update unit CNFG_MDFY and then store it in the CRAM. In those cases as well, the code check unit CODECHK performs its processing.

As described above, one main feature of the method of the first embodiment is that the data monitoring unit CUNT and the config control unit CNFGC monitor and judge the operating state of each FPGA fabric unit FPGA_FAB, and the CRAM controller CRAMC updates the config data in the CRAM based on that judgment. The operating state of each FPGA fabric unit FPGA_FAB is, as described above, for example the amount of data traffic between FPGA fabric units FPGA_FAB, or the amount of data processed by each FPGA fabric unit FPGA_FAB. The amount of data processed by an FPGA fabric unit FPGA_FAB is normally considered proportional to the amount of data it inputs and outputs, and can therefore be obtained from, for example, the traffic between FPGA fabric units.

Another main feature of the method of the first embodiment is that it provides a mechanism (platform), such as the config update unit CNFG_MDFY of FIG. 2, that changes config data read from the CRAM or the external config storage device STRG and writes it to the CRAM and, additionally, to the external config storage device STRG. Although not particularly limited, one specific way to change the config data is to change some of its bits based on an evolutionary algorithm or the like. Such processing can be realized by keeping a history of the config data in the external config storage device STRG as appropriate. Another specific way is to reinforce critical functional blocks and the like based on the result of monitoring the data volume. Examples of specific methods of changing the config data are described later as appropriate.
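One way to picture changing some bits of the config data under an evolutionary scheme is the sketch below. It is an assumption-laden illustration, not the patent's method: it uses a simple (1+1) evolutionary strategy (mutate, keep the candidate only if it scores better), the functions `mutate` and `evolve` are hypothetical, and the `fitness` callback stands in for a score that a real device would derive from monitoring the running circuit.

```python
import random
from typing import Callable, List, Tuple


def mutate(config: bytes, rng: random.Random, flips: int = 1) -> bytes:
    """Flip `flips` randomly chosen bit positions of the config data."""
    data = bytearray(config)
    for _ in range(flips):
        position = rng.randrange(len(data) * 8)
        data[position // 8] ^= 1 << (position % 8)
    return bytes(data)


def evolve(config: bytes, fitness: Callable[[bytes], float], generations: int,
           seed: int = 0) -> Tuple[bytes, List[bytes]]:
    """Keep a mutated candidate only if it scores better; record the history."""
    rng = random.Random(seed)
    best, best_score = config, fitness(config)
    history = [config]  # plays the role of the history kept in external storage
    for _ in range(generations):
        candidate = mutate(best, rng)
        score = fitness(candidate)
        if score > best_score:
            best, best_score = candidate, score
            history.append(candidate)
    return best, history
```

The returned `history` lets a caller roll back to an earlier configuration, which is one reason for keeping the change history alongside the config data. On real hardware, arbitrary bit flips may produce invalid or harmful configurations, so any practical scheme would need to restrict which bits may change.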

In general dynamic reconfiguration, as shown for example in Non-Patent Document 1, multiple user logic circuits (in other words, sets of config data) are defined in advance and are switched one after another according to a predetermined sequence. For example, a user logic circuit A that executes operation A and a user logic circuit B that executes operation B are prepared in advance, and when processing switches from operation A to operation B according to the predetermined sequence, the user logic circuit A is reconfigured into the user logic circuit B. In this way, a general information processing apparatus (FPGA chip FPGA_CH) makes no particular judgment of its own and simply changes the user logic circuit in a non-autonomous manner according to a predetermined procedure.

In contrast, the method of the first embodiment starts from a given user logic circuit (in other words, config data) and, while monitoring its operating state, autonomously optimizes the user logic circuit so that the operating state improves. That is, the information processing apparatus (FPGA chip FPGA_CH) changes the user logic circuit autonomously based on its own judgment. Furthermore, when an evolutionary algorithm or the like is used, for example, the method of the first embodiment in effect starts from a given user logic circuit, monitors its operating state, searches for circuit locations where a change improves the operating state, and autonomously optimizes the user logic circuit based on the search result.

Providing a mechanism that can autonomously optimize a user logic circuit for the specific operation it executes thus makes it possible to improve the computational efficiency of the user logic circuit implemented in the information processing apparatus (FPGA chip FPGA_CH). The description here has mainly taken as an example the case of monitoring the amount of data between FPGA fabric units FPGA_FAB within an information processing apparatus, but the amount of data between information processing apparatuses within an information processing system can be monitored in the same way.

For example, when multiple information processing apparatuses (FPGA chips FPGA_CH) in FIG. 1 execute the same operation, it is desirable to evaluate the performance of the whole subsystem formed by those FPGA chips FPGA_CH. In this case, the evaluation unit interface EVAL_IF monitors the amount of data between the FPGA chips FPGA_CH and, like the config control unit CNFGC described above, determines whether the config data needs to be changed using a threshold table or the like. Based on the determination, the evaluation unit interface EVAL_IF then sends a config data change request or the like to the relevant FPGA chip FPGA_CH via the control signal line (SCTL). When judging the monitoring result, it is also possible, for example, to vary the parameters of the threshold table as appropriate and thereby identify the parameters that most affect overall performance.

<< Operation of information processing apparatus >>

図３は、図１の情報処理装置において、コンフィグデータの変更を行う際の制御フローの一例を示すフロー図である。図３において、情報処理装置（ＦＰＧＡチップＦＰＧＡ＿ＣＨ）は、各ＦＰＧＡファブリック部（機能ブロック）ＦＰＧＡ＿ＦＡＢで所定の演算処理を行わせながら、データ監視部ＣＵＮＴを用いて機能ブロック間のデータ流通を監視し、演算時のデータ流通を監視する（ステップＳ１０１）。この際に、データ監視部ＣＵＮＴは、前述したように、例えば、各機能ブロックへのデータ送受信回数や経過時間などをカウントする。 FIG. 3 is a flow diagram showing an example of the control flow for changing the config data in the information processing apparatus of FIG. 1. In FIG. 3, the information processing apparatus (FPGA chip FPGA_CH) has each FPGA fabric unit (functional block) FPGA_FAB perform predetermined arithmetic processing while using the data monitoring unit CUNT to monitor the data traffic between functional blocks during computation (step S101). At this time, as described above, the data monitoring unit CUNT counts, for example, the number of data transmissions and receptions for each functional block and the elapsed time.

情報処理装置は、例えば、データ送受信回数が所定の回数に達した際や、または所定の経過時間に達した際などに、コンフィグデータの書き換えが必要かどうかの判断を実施する（ステップＳ１０２）。具体的には、コンフィグ制御部ＣＮＦＧＣが、例えば、データ監視部ＣＵＮＴの監視結果に基づく通信レートと、閾値テーブルＴＢＬに格納される通信レートの閾値または通信レートの変動量の閾値等とを比較することなどで判断する。 The information processing apparatus determines whether the config data needs to be rewritten when, for example, the number of data transmissions and receptions reaches a predetermined count or a predetermined elapsed time is reached (step S102). Specifically, the config control unit CNFGC makes this determination by, for example, comparing the communication rate derived from the monitoring result of the data monitoring unit CUNT with the communication-rate threshold, or the threshold on the variation of the communication rate, stored in the threshold table TBL.
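
The determination in step S102 can be illustrated with a minimal Python sketch. It assumes that a rewrite is requested when the measured communication rate, or its change since the previous check, exceeds the stored threshold; the text only says the values are compared, so the direction of the comparison, the `Threshold` fields, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    """One entry of a threshold table (field names are assumptions)."""
    max_rate: float        # threshold on the communication rate
    max_rate_delta: float  # threshold on the variation of the rate


def communication_rate(transfer_count: int, elapsed: float) -> float:
    """Rate derived from the counters kept by the data monitoring unit."""
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    return transfer_count / elapsed


def needs_reconfig(rate: float, previous_rate: float, th: Threshold) -> bool:
    """Assumed rule: rewrite when either threshold is exceeded."""
    return rate > th.max_rate or abs(rate - previous_rate) > th.max_rate_delta
```

With `Threshold(100.0, 20.0)`, a rate of 50 following a rate of 45 would leave the configuration unchanged, while a jump from 50 to 80 would trigger a rewrite through the variation threshold.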

その結果、書き換えが不要な場合、情報処理装置は、ステップＳ１０１にもどり、監視を続行する。一方、書き換えが必要な場合、情報処理装置は、ステップＳ１０３へ進んでＣＲＡＭのコンフィグデータの変更処理に入る。具体的には、ＣＲＡＭアクセスコントローラＣＲＡＭＡＣおよびコンフィグ更新部ＣＮＦＧ＿ＭＤＦＹは、ＣＲＡＭからコンフィグデータを読み出したのち当該データを変更し（ステップＳ１０３）、新しいコンフィグデータをＣＲＡＭに書き込む（ステップＳ１０４）。 As a result, when rewriting is unnecessary, the information processing apparatus returns to step S101 and continues monitoring. On the other hand, when rewriting is required, the information processing apparatus proceeds to step S103 to start the CRAM config data change process. Specifically, the CRAM access controller CRAMAC and the config update unit CNFG_MDFY read the config data from the CRAM, change the data (step S103), and write the new config data to the CRAM (step S104).

次いで、ＣＲＡＭコントローラＣＲＡＭＣは、コードチェック部ＣＯＤＥＣＨＫを用いた符号化コードの生成やＲＡＭへの格納を行い（ステップＳ１０５）、さらに、新しいコンフィグデータを外部コンフィグ記憶装置ＳＴＲＧへ格納する処理を行う（ステップＳ１０６）。このような処理は、所定の演算処理が終了するまで繰り返し行われる（ステップＳ１０７）。なお、ここでは、ステップＳ１０５，Ｓ１０６の処理を順次行ったが、これらの処理をステップＳ１０４の処理のバックグランドで並行して行ってもよい。そうすることで、より一層の高速化が実現できる。 Next, the CRAM controller CRAMC generates a coded code using the code check unit CODECHK and stores it in the RAM (step S105), and further performs a process of storing the new config data in the external config storage device STRG (step). S106). Such processing is repeated until the predetermined arithmetic processing is completed (step S107). Although the processes of steps S105 and S106 are sequentially performed here, these processes may be performed in parallel in the background of the process of step S104. By doing so, even higher speed can be realized.

図４は、図３のフローに伴う、図１の情報処理装置の入出力信号の一例を示すシーケンス図である。まず、時刻Ｔ１で、データ監視部ＣＵＮＴは、ＣＮＴ＿Ｓ信号をアサート（ここでは‘Ｈ’レベルに制御）し、バスモニタ（ＤＢＵＳ）の監視結果をコンフィグ制御部ＣＮＦＧＣへ伝達する。これを受けて、コンフィグ制御部ＣＮＦＧＣは、コンフィグデータの変更要否を判定し、変更要の場合には、時刻Ｔ２で、ＲＣＯＮＦ＿Ｓ信号をアサートする。 FIG. 4 is a sequence diagram showing an example of the input/output signals of the information processing apparatus of FIG. 1 during the flow of FIG. 3. First, at time T1, the data monitoring unit CUNT asserts the CNT_S signal (here, drives it to the 'H' level) and transmits the monitoring result of the bus monitor (DBUS) to the config control unit CNFGC. In response, the config control unit CNFGC determines whether the config data needs to be changed and, if so, asserts the RCONF_S signal at time T2.

ＣＲＡＭコントローラＣＲＡＭＣは、ＲＣＯＮＦ＿Ｓ信号のアサートを受けて、時刻Ｔ３で、ＣＣＳ＿Ｓ信号をアサートする。これを受けて、チップ統括コントローラＣＨＰＣは、リコンフィグに伴うＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢの制御を実施する。具体的には、チップ統括コントローラＣＨＰＣは、ＦＦＳ＿Ｓ信号をＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢへ伝達する。 The CRAM controller CRAMC receives the assertion of the RCONF_S signal and asserts the CCS_S signal at time T3. In response to this, the chip control controller CHPC controls the FPGA fabric unit FPGA_FAB accompanying the reconfiguration. Specifically, the chip control controller CHPC transmits the FFS_S signal to the FPGA fabric unit FPGA_FAB.

ここで、ＦＦＳ＿Ｓ信号は、リセット信号（ＲＥＳＢ＿Ｓ信号）、論理演算実行信号（ＥＸＥ＿Ｓ信号）、状態ホールド信号（ＨＯＬＤ＿Ｓ信号）で構成される。時刻Ｔ４で、ＥＸＥ＿Ｓ信号がネゲートされると、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢに実装されるユーザ論理回路は、演算処理を中断する。次いで、時刻Ｔ５で、ＨＯＬＤ＿Ｓ信号がアサートされると、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢは、オンチップバスＤＢＵＳを介したデータ通信をホールドする。 Here, the FFS_S signal is composed of a reset signal (RESB_S signal), a logical operation execution signal (EXE_S signal), and a state hold signal (HOLD_S signal). When the EXE_S signal is negated at time T4, the user logic circuit mounted on the FPGA fabric unit FPGA_FAB interrupts the arithmetic processing. Then, at time T5, when the HOLD_S signal is asserted, the FPGA fabric unit FPGA_FAB holds the data communication via the on-chip bus DBUS.

次いで、オンチップバスＤＢＵＳを介した信号授受が停止した状態で、チップ統括コントローラＣＨＰＣは、時刻Ｔ７で、リコンフィグ開始可能信号（ＣＣＳＡ＿Ｓ信号）を発行し、これを受けて、ＣＲＡＭコントローラＣＲＡＭＣは、時刻Ｔ８で、リコンフィグ信号（ＣＲＡＭＣ＿Ｓ信号）をアサートする（すなわち、ＣＲＡＭのコンフィグデータを変更する）。また、時刻Ｔ８では、当該コンフィグデータ（ＣＲＡＭＤ＿Ｓ）の変更に応じて、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢにおけるユーザ論理回路の再構築が行われる。 Next, with signal exchange via the on-chip bus DBUS stopped, the chip control controller CHPC issues a reconfiguration-start-enable signal (CCSA_S signal) at time T7. In response, the CRAM controller CRAMC asserts the reconfig signal (CRAMC_S signal) at time T8 (that is, it changes the config data of the CRAM). Also at time T8, the user logic circuit in the FPGA fabric unit FPGA_FAB is reconstructed in accordance with the change in the config data (CRAMD_S).

その後、時刻Ｔ９で、ＣＲＡＭ（言い換えればＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢ）のコンフィギュレーションが完了すると、ＣＲＡＭＣ＿Ｓ信号がネゲートされ、時刻Ｔ１０で、リコンフィグ信号（ＲＣＯＮＦ＿Ｓ信号）がネゲートされる。引き続き、時刻Ｔ１１で、状態ホールド信号（ＨＯＬＤ＿Ｓ信号）がネゲートされ、時刻Ｔ１２で、リセット信号（ＲＥＳＢ＿Ｓ信号）がアサートされた後、時刻Ｔ１３で、論理演算実行信号（ＥＸＥ＿Ｓ信号）がアサートされる。これにより、再構築されたユーザ論理回路は、演算を再開する。オンチップバスＤＢＵＳにおいて、データ通信は、ここでは、時刻Ｔ１から時刻Ｔ１３までは中断され、時刻Ｔ１３で再開される。 Then, at time T9, when the configuration of the CRAM (in other words, FPGA fabric unit FPGA_FAB) is completed, the CRAMC_S signal is negated, and at time T10, the reconfig signal (RCONF_S signal) is negated. Subsequently, the state hold signal (HOLD_S signal) is negated at time T11, the reset signal (RESB_S signal) is asserted at time T12, and then the logical operation execution signal (EXE_S signal) is asserted at time T13. As a result, the reconstructed user logic circuit resumes the operation. In the on-chip bus DBUS, data communication is interrupted here from time T1 to time T13 and resumed at time T13.
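
The ordering of the signals described for FIG. 4 can be replayed with a short Python sketch. The event list contains only the transitions named in the text (time T6 is not described there). The idle state before T1 and the safety condition checked by `cram_write_is_safe` are assumptions made for illustration, and each signal is modeled simply as asserted or negated rather than as an electrical level.

```python
# Transitions named in the text for FIG. 4, in time order.
SEQUENCE = [
    ("T1", "CNT_S", "assert"),
    ("T2", "RCONF_S", "assert"),
    ("T3", "CCS_S", "assert"),
    ("T4", "EXE_S", "negate"),
    ("T5", "HOLD_S", "assert"),
    ("T7", "CCSA_S", "assert"),
    ("T8", "CRAMC_S", "assert"),
    ("T9", "CRAMC_S", "negate"),
    ("T10", "RCONF_S", "negate"),
    ("T11", "HOLD_S", "negate"),
    ("T12", "RESB_S", "assert"),
    ("T13", "EXE_S", "assert"),
]

# Assumed idle state before T1: the user logic is executing, nothing else asserted.
INITIAL = {
    "CNT_S": False, "RCONF_S": False, "CCS_S": False, "EXE_S": True,
    "HOLD_S": False, "CCSA_S": False, "CRAMC_S": False, "RESB_S": False,
}


def replay(sequence, initial):
    """Apply each transition and record the signal states after it."""
    levels = dict(initial)
    history = []
    for time, signal, action in sequence:
        levels[signal] = action == "assert"
        history.append((time, dict(levels)))
    return history


def cram_write_is_safe(history):
    """Assumed invariant: the CRAM is rewritten only while execution is
    stopped (EXE_S negated) and bus traffic is held (HOLD_S asserted)."""
    return all(
        not levels["EXE_S"] and levels["HOLD_S"]
        for _, levels in history
        if levels["CRAMC_S"]
    )
```

Replaying `SEQUENCE` from `INITIAL` satisfies the invariant, whereas a trace that asserts CRAMC_S before EXE_S is negated does not.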

なお、ここでは、簡単のため、ひとつのＦＰＧＡファブリック部（機能ブロック）ＦＰＧＡ＿ＦＡＢに対するコンフィギュレーション実行のシーケンスについて述べた。しかし、ＦＰＧＡチップＦＰＧＡ＿ＣＨ上には複数の機能ブロックが搭載されており、それらの一部が動作中でも図３および図４のような処理は可能である。そのため、図４の時刻Ｔ１から時刻Ｔ１３までの期間では、リコンフィギュレーションの非対象となる機能ブロック間でのオンチップバスＤＢＵＳを介したデータ通信が行われていてもよい。 Here, for the sake of simplicity, the sequence of configuration execution for one FPGA fabric part (functional block) FPGA_FAB has been described. However, a plurality of functional blocks are mounted on the FPGA chip FPGA_CH, and the processing as shown in FIGS. 3 and 4 is possible even when some of them are in operation. Therefore, during the period from time T1 to time T13 in FIG. 4, data communication may be performed between the functional blocks that are not subject to reconfiguration via the on-chip bus DBUS.

また、ここでは、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢのユーザ論理回路を変更する場合を例としたが、これに限らず、内部バスのバス幅の増減することや、ユーザ論理回路を新たに追加するようなことも、ＦＰＧＡチップＦＰＧＡ＿ＣＨ内リソースが許される範囲で実施可能である。例えば、図４の時刻Ｔ１に応じて機能ブロックの増加が必要と判断される場合があり、この際には、機能ブロックの増加と共に、バス幅を増やすことが望ましいことがある。あるいは、元々、同じ演算を行う複数の機能ブロックがあった場合でも、その機能ブロック間のデータ通信のバンド幅を増やすような判断がなされることがある。このような場合には、バス幅を増やすようなコンフィギュレーション変更が実施されるので、時刻Ｔ１３以降はバス幅が増えた形でのデータ通信が実施される。 Further, although the case of changing the user logic circuit of the FPGA fabric unit FPGA_FAB has been taken as an example here, the present invention is not limited to this. Increasing or decreasing the bus width of the internal bus, or adding a new user logic circuit, can also be carried out to the extent that the resources in the FPGA chip FPGA_CH allow. For example, it may be determined in response to time T1 in FIG. 4 that the number of functional blocks needs to be increased, and in that case it may be desirable to increase the bus width along with the number of functional blocks. Alternatively, even when there were originally a plurality of functional blocks performing the same operation, it may be determined that the bandwidth of data communication between those functional blocks should be increased. In such cases, a configuration change that increases the bus width is performed, so that data communication from time T13 onward is carried out with the increased bus width.

以上、本実施例１の情報処理装置および情報処理システムを用いることで、代表的には、演算効率の向上等が実現可能になる。 As described above, using the information processing device and the information processing system of the first embodiment typically makes it possible to improve calculation efficiency, among other benefits.

本実施例２では、ＣＲＡＭのビットストリームの構成について述べる。図５は、本発明の実施例２による情報処理装置において、ＣＲＡＭのアドレスマップの構造例を示す概略図である。実施例１で述べたようにして、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢのユーザ論理回路を変更するにあたり、例えば、ＬＵＴの構成を維持したまま、その数を増減させ、これに応じて各ＬＵＴ間の接続関係（すなわち図１のスイッチ部ＳＷの構成）を変えたいような場合がある。 In the second embodiment, the configuration of the CRAM bit stream will be described. FIG. 5 is a schematic view showing a structural example of the CRAM address map in the information processing apparatus according to the second embodiment of the present invention. When changing the user logic circuit of the FPGA fabric unit FPGA_FAB as described in the first embodiment, there are cases where it is desired, for example, to increase or decrease the number of LUTs while keeping the configuration of each LUT, and to change the connection relationship between the LUTs (that is, the configuration of the switch unit SW in FIG. 1) accordingly.

このような場合、図５に示すように、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢのスイッチ部ＳＷの構成を定めるＣＲＡＭアドレスと、プログラマブルロジック部ＬＯＧの構成を定めるＣＲＡＭアドレスとを明示的に分けておくことが望ましい。ここでは、スイッチ部ＳＷを構成する複数のスイッチと、プログラマブルロジック部ＬＯＧを構成する複数のＬＵＴとに、それぞれ２５６ＭｂｉｔのＣＲＡＭ容量が付与され、各スイッチおよび各ＬＵＴが１６ビットで構成される場合を例としている。ただし、勿論、ＣＲＡＭ容量や、各スイッチおよび各ＬＵＴのビット数は、これに限定されるものではない。 In such a case, as shown in FIG. 5, it is desirable to explicitly separate the CRAM addresses that determine the configuration of the switch unit SW of the FPGA fabric unit FPGA_FAB from the CRAM addresses that determine the configuration of the programmable logic unit LOG. The example here assumes that a CRAM capacity of 256 Mbit is assigned to the plurality of switches constituting the switch unit SW and another 256 Mbit to the plurality of LUTs constituting the programmable logic unit LOG, and that each switch and each LUT is configured with 16 bits. Of course, the CRAM capacity and the number of bits per switch and per LUT are not limited to these values.
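
The arithmetic behind this example can be sketched in Python. The 256 Mbit region size and the 16-bit entry width come from the text. Treating 1 Mbit as 2^20 bits, placing the switch region at bit address 0, and placing the logic region directly after it are assumptions, and the names `SW_BASE`, `LOG_BASE`, and `entry_bit_range` are hypothetical.

```python
REGION_BITS = 256 * 2**20  # 256 Mbit per region (binary Mbit assumed)
ENTRY_BITS = 16            # bits per switch or per LUT, from the text
ENTRIES_PER_REGION = REGION_BITS // ENTRY_BITS

# Assumed layout: switch region first, logic region immediately after it.
SW_BASE = 0
LOG_BASE = SW_BASE + REGION_BITS
BASES = {"SW": SW_BASE, "LOG": LOG_BASE}


def entry_bit_range(region: str, index: int) -> range:
    """Bit addresses occupied by one switch ("SW") or one LUT ("LOG")."""
    if region not in BASES:
        raise ValueError(f"unknown region: {region}")
    if not 0 <= index < ENTRIES_PER_REGION:
        raise ValueError("index outside the region")
    start = BASES[region] + index * ENTRY_BITS
    return range(start, start + ENTRY_BITS)
```

Under these assumptions each region holds 16,777,216 entries, and because the two regions do not overlap, the switch configuration can be rewritten without touching any LUT entry.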

本実施例３では、コンフィグデータの変更方法の一例について述べる。図６（ａ）および図６（ｂ）は、本発明の実施例３による情報処理装置において、コンフィグデータの変更方法の一例を説明する概念図である。ここでは、例えば、図１のコンフィグ制御部ＣＮＦＧＣによって機能ブロックＡの処理能力が不足すると判定され、これに応じて、機能ブロックＡを構成する回路ブロックＡを増強する場合を例とする。 In the third embodiment, an example of a method of changing the config data will be described. 6 (a) and 6 (b) are conceptual diagrams illustrating an example of a method of changing config data in the information processing apparatus according to the third embodiment of the present invention. Here, for example, a case where the config control unit CNFGC of FIG. 1 determines that the processing capacity of the functional block A is insufficient and the circuit block A constituting the functional block A is enhanced accordingly is taken as an example.

図６（ａ）では、回路ブロックＡ（ＣＲＣＩＴＢＫ＿Ａ）〜回路ブロックＣ（ＣＲＣＩＴＢＫ＿Ｃ）と、スペアの回路ブロックＸ（ＣＲＣＩＴＢＫ＿Ｘ）と、スイッチブロック１（ＳＷＢＫ１）およびスイッチブロック２（ＳＷＢＫ２）と、各回路ブロックおよび各スイッチブロックに対応するＣＲＡＭとが示される。この図では、簡単のため、左から右へデータ処理が流れる例を示している。 In FIG. 6A, circuit blocks A (CRCITBK_A) to circuit blocks C (CRCITBK_C), spare circuit blocks X (CRCITBK_X), switch blocks 1 (SWBK1) and switch blocks 2 (SWBK2), and each circuit block. And the CRAM corresponding to each switch block are shown. In this figure, for simplicity, an example is shown in which data processing flows from left to right.

ここで、回路ブロックＸ（ＣＲＣＩＴＢＫ＿Ｘ）の論理を生成する方法について述べる。このブロックは、本実施例３では、予めスペアとして準備しておいた領域であり、この部分に、回路ブロックＡ（ＣＲＣＩＴＢＫ＿Ａ）を生成することで、処理能力不足を解消することができる。具体的には、図１のＣＲＡＭコントローラＣＲＡＭＣは、まず、ＣＲＡＭを読み出し、ＦＰＧＡチップＦＰＧＡ＿ＣＨに設けられるユーザメモリ内に、変更する回路ブロックＡに対応するＣＲＡＭアドレスとデータを格納する。そして、この際に、図１のＣＲＡＭコントローラＣＲＡＭＣは、図６（ｂ）に示すように、回路ブロックＸに対応するＣＲＡＭアドレス（ＣＡＤＲ＿Ｘ）のデータを、回路ブロックＡと同じデータ（ＣＤＡＴＡ＿Ａ）に定める。 Here, a method of generating the logic of the circuit block X (CRCITBK_X) will be described. In the third embodiment, this block is an area prepared in advance as a spare, and generating a copy of the circuit block A (CRCITBK_A) in this area resolves the shortage of processing capacity. Specifically, the CRAM controller CRAMC of FIG. 1 first reads the CRAM and stores the CRAM address and data corresponding to the circuit block A to be changed in the user memory provided in the FPGA chip FPGA_CH. At this time, as shown in FIG. 6B, the CRAM controller CRAMC of FIG. 1 sets the data at the CRAM address (CADR_X) corresponding to the circuit block X to the same data (CDATA_A) as that of the circuit block A.

さらに、ＣＲＡＭコントローラＣＲＡＭＣは、図６（ｂ）に示すように、スイッチブロック１，２にそれぞれ対応するＣＲＡＭアドレス（ＣＡＤＲ＿ＳＷ１，ＣＡＤＲ＿ＳＷ２）のデータを、それぞれ所定のデータ（ＣＤＡＴＡ＿ＳＷ１’，ＣＤＡＴＡ＿ＳＷ２’）に定める。例えば、データＣＤＡＴＡ＿ＳＷ１’は、変更前のデータＣＤＡＴＡ＿ＳＷ１に含まれる回路ブロックＡに向けた結線データを、回路ブロックＡに加えて回路ブロックＸに向けた結線データに変更したものである。データＣＤＡＴＡ＿ＳＷ２’に関しても、これと同様である。 Further, as shown in FIG. 6B, the CRAM controller CRAMC sets the data at the CRAM addresses (CADR_SW1, CADR_SW2) corresponding to the switch blocks 1 and 2 to predetermined data (CDATA_SW1', CDATA_SW2'), respectively. For example, the data CDATA_SW1' is obtained from the pre-change data CDATA_SW1 by changing the connection data directed to the circuit block A into connection data directed to the circuit block X in addition to the circuit block A. The same applies to the data CDATA_SW2'.

ＣＲＡＭコントローラＣＲＡＭＣは、ユーザメモリ内でこのようにして定めた各コンフィグデータをＣＲＡＭへ書き込む。この際に、ＣＲＡＭコントローラＣＲＡＭＣは、実施例１で述べたように、ＥＣＣやＣＲＣ等の符号化コードの生成も行う。なお、このようなコンフィグデータの生成は、ＦＰＧＡチップＦＰＧＡ＿ＣＨ外部のメモリを活用してもよい。チップ内にＲＡＭからＣＲＡＭへの転送経路を設けることが望ましいが、それが存在しない場合は、一旦外部コンフィグ記憶装置ＳＴＲＧを介して転送することも可能である。 The CRAM controller CRAMC writes each piece of config data determined in this way in the user memory to the CRAM. At this time, the CRAM controller CRAMC also generates an encoding code such as ECC or CRC, as described in the first embodiment. Such config data may also be generated using memory outside the FPGA chip FPGA_CH. It is desirable to provide a transfer path from RAM to CRAM within the chip, but if none exists, the data can instead be transferred by way of the external config storage device STRG.
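
The procedure of the third embodiment can be summarized in a small Python sketch. It models the CRAM as a dictionary keyed by the address names used in FIG. 6 and models each switch block's data as the set of circuit blocks it routes to. Both representations, and the function name `duplicate_block`, are assumptions chosen for illustration; a real bitstream encoding would differ.

```python
def duplicate_block(cram, src_addr, spare_addr, src_name, spare_name, switch_addrs):
    """Return new config data in which the spare block mirrors the source block.

    The copy in `staged` stands in for the user memory where the new
    config data is prepared before being written back to the CRAM.
    """
    staged = dict(cram)
    # Spare block X receives the same data as block A (CDATA_A).
    staged[spare_addr] = staged[src_addr]
    # Every switch block that routes to A now also routes to X.
    for addr in switch_addrs:
        targets = staged[addr]
        if src_name in targets:
            staged[addr] = targets | {spare_name}
    return staged


cram = {
    "CADR_A": b"\x12\x34",               # placeholder data for block A
    "CADR_X": b"\x00\x00",               # spare block, initially unused
    "CADR_SW1": frozenset({"A", "B"}),   # placeholder routing data
    "CADR_SW2": frozenset({"A", "C"}),
}
updated = duplicate_block(cram, "CADR_A", "CADR_X", "A", "X",
                          ["CADR_SW1", "CADR_SW2"])
```

The original dictionary is left untouched, which mirrors the idea that the running configuration is only replaced once the staged data is complete.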

以上のように、本実施例３の方式を用いると、ＦＰＧＡチップＦＰＧＡ＿ＣＨ内のリソースを有効に活用し、最大限のパフォーマンスを出せるように自律的な制御が可能になる。 As described above, when the method of the third embodiment is used, the resources in the FPGA chip FPGA_CH can be effectively utilized, and autonomous control can be performed so as to maximize the performance.

本実施例４では、コンフィグデータの変更方法の他の一例について述べる。図７は、本発明の実施例４による情報処理システムにおいて、コンフィグデータの変更方法の一例を説明する概念図である。図７に示す情報処理システムは、ｎ＋１個のＦＰＧＡチップＦＰＧＡ＿ＣＨ０〜ＦＰＧＡ＿ＣＨｎで構成されるＦＰＧＡサブシステムＦＰＧＡ＿ＳＳＹＳと、図１に示した評価部インターフェースＥＶＡＬ＿ＩＦと、外部コンフィグ記憶装置ＳＴＲＧとを備える。ＦＰＧＡチップＦＰＧＡ＿ＣＨ０には、機能ブロックＦＵＮＣ＿Ａ０〜ＦＵＮＣ＿Ａ２が実装され、ＦＰＧＡチップＦＰＧＡ＿ＣＨ１には、機能ブロックＦＵＮＣ＿Ｂ０〜ＦＵＮＣ＿Ｂ２が実装され、ＦＰＧＡチップＦＰＧＡ＿ＣＨ２には、機能ブロックＦＵＮＣ＿Ｃ０〜ＦＵＮＣ＿Ｃ２が実装されている。 In the fourth embodiment, another example of the method of changing the config data will be described. FIG. 7 is a conceptual diagram illustrating an example of a method of changing config data in the information processing system according to the fourth embodiment of the present invention. The information processing system shown in FIG. 7 includes an FPGA subsystem FPGA_SSYS composed of n + 1 FPGA chips FPGA_CH0 to FPGA_CHn, an evaluation unit interface EVAL_IF shown in FIG. 1, and an external config storage device STRG. Functional blocks FUNC_A0 to FUNC_A2 are mounted on the FPGA chip FPGA_CH0, functional blocks FUNC_B0 to FUNC_B2 are mounted on the FPGA chip FPGA_CH1, and functional blocks FUNC_C0 to FUNC_C2 are mounted on the FPGA chip FPGA_CH2.

評価部インターフェースＥＶＡＬ＿ＩＦは、システムの最適化を行う最適化制御部ＣＴＬＰを備える。最適化制御部ＣＴＬＰは、図１に示したデータ監視部ＣＵＮＴ、コンフィグ制御部ＣＮＦＧＣおよびＣＲＡＭコントローラＣＲＡＭＣと同様の機能を備える。また、各ＦＰＧＡチップＦＰＧＡ＿ＣＨ０〜ＦＰＧＡ＿ＣＨｎ内にも当該機能に該当する最適化制御部ＣＴＬＣが備わっている。 The evaluation unit interface EVAL_IF includes an optimization control unit CTLP that optimizes the system. The optimization control unit CTLP has the same functions as the data monitoring unit CUNT, the config control unit CNFGC, and the CRAM controller CRAMC shown in FIG. Further, each FPGA chip FPGA_CH0 to FPGA_CHn is also provided with an optimization control unit CTLC corresponding to the function.

ここで、評価部インターフェースＥＶＡＬ＿ＩＦの最適化制御部ＣＴＬＰは、例えば、データ信号線（ＳＤＡＴ）の監視を行ったり、あるいは、各チップ内の最適化制御部ＣＴＬＣでの監視結果を取得すること等で、ＦＰＧＡサブシステムＦＰＧＡ＿ＳＳＹＳ内の演算処理能力を監視する。そして、最適化制御部ＣＴＬＰは、例えば、閾値テーブルを用いた統計処理によって演算処理能力が不足する傾向にあると判断した場合、新たな機能ブロックを生成する。 Here, the optimization control unit CTLP of the evaluation unit interface EVAL_IF monitors the arithmetic processing capacity within the FPGA subsystem FPGA_SSYS by, for example, monitoring the data signal line (SDAT) or acquiring the monitoring results of the optimization control unit CTLC in each chip. Then, when the optimization control unit CTLP determines, for example through statistical processing using the threshold table, that the arithmetic processing capacity tends to be insufficient, it generates a new functional block.

この例では、ＦＰＧＡチップＦＰＧＡ＿ＣＨ０の機能ブロックＦＵＮＣ＿Ａ０と、ＦＰＧＡチップＦＰＧＡ＿ＣＨ１の機能ブロックＦＵＮＣ＿Ｂ２と、ＦＰＧＡチップＦＰＧＡ＿ＣＨ２の機能ブロックＦＵＮＣ＿Ｃ１の使用頻度が高く、最適化制御部ＣＴＬＰは、これらのブロックの増強が効率的であると判断したものとする。この場合、最適化制御部ＣＴＬＰは、例えば、当該各機能ブロックＦＵＮＣ＿Ａ０，ＦＵＮＣ＿Ｂ２，ＦＵＮＣ＿Ｃ１のコンフィグデータを各チップからコンフィグ信号線（ＳＣＦＧ）を介して読み出し、当該コンフィグデータを実装するように、制御信号線（ＳＣＴＬ）を介してＦＰＧＡチップＦＰＧＡ＿ＣＨｎの最適化制御部ＣＴＬＣに指示する。 In this example, suppose that the functional block FUNC_A0 of the FPGA chip FPGA_CH0, the functional block FUNC_B2 of the FPGA chip FPGA_CH1, and the functional block FUNC_C1 of the FPGA chip FPGA_CH2 are used frequently, and that the optimization control unit CTLP has determined that reinforcing these blocks is efficient. In this case, the optimization control unit CTLP, for example, reads the config data of the functional blocks FUNC_A0, FUNC_B2, and FUNC_C1 from the respective chips via the config signal line (SCFG), and instructs the optimization control unit CTLC of the FPGA chip FPGA_CHn, via the control signal line (SCTL), to implement that config data.

なお、このようなコンフィグデータのコピーを行う際には、各機能ブロックのインターフェースが統一されていると便利であり、入出力ピン数などをそろえておくことが望ましい。ただし、入力ピン数がそろっていない場合でも、スイッチ部ＳＷにおいて、余ったピンに対して、‘Ｈ’レベル固定、‘Ｌ’レベル固定、オープン等の処理を実施すれば、問題なく実装できる。 When copying such config data, it is convenient if the interface of each functional block is unified, and it is desirable to have the same number of input / output pins. However, even if the number of input pins is not uniform, it can be mounted without any problem by performing processing such as "H" level fixing, "L" level fixing, and opening on the surplus pins in the switch unit SW.

以上のように、本実施例４の方式を用いると、情報処理システム内のリソースを有効に活用し、最大限のパフォーマンスを出せるように自律的な制御が可能になる。 As described above, when the method of the fourth embodiment is used, the resources in the information processing system can be effectively utilized and autonomous control can be performed so as to maximize the performance.

本実施例５では、図１に示した情報処理装置（ＦＰＧＡチップＦＰＧＡ＿ＣＨ）の変形例と共に、コンフィグデータの変更方法の更に他の一例について述べる。図８は、本発明の実施例５による情報処理システムにおいて、その全体の概略構成例を示すブロック図である。図１では、オンチップバスＤＢＵＳ上のデータ授受の統計的なデータを利用してコンフィグデータを変更する例を示したが、本実施例５では、データの流れではなく、データの処理に応じてコンフィグデータを変更する。 In the fifth embodiment, a modification of the information processing apparatus (FPGA chip FPGA_CH) shown in FIG. 1 and still another example of a method of changing the config data will be described. FIG. 8 is a block diagram showing a schematic configuration example of the entire information processing system according to the fifth embodiment of the present invention. FIG. 1 showed an example in which the config data is changed using statistical data on data transfers over the on-chip bus DBUS; in the fifth embodiment, the config data is changed according to the processing of data rather than the flow of data.

図８の情報処理装置（ＦＰＧＡチップＦＰＧＡ＿ＣＨ）では、図１と異なり、データ監視部ＣＵＮＴはオンチップバスＤＢＵＳに接続されているのではなく、各ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢに直接接続されている。データ監視部ＣＵＮＴは、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢからの演算処理の終了信号に応じて、コンフィグ切り替え信号をコンフィグ制御部ＣＮＦＧＣへ送信する。演算処理の終了信号は、ＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢによって所定の単位の演算処理を終了する毎に送信される。 In the information processing device (FPGA chip FPGA_CH) of FIG. 8, unlike FIG. 1, the data monitoring unit CUNT is not connected to the on-chip bus DBUS, but is directly connected to each FPGA fabric unit FPGA_FAB. The data monitoring unit CUNT transmits a config switching signal to the config control unit CNFGC in response to the completion signal of the arithmetic processing from the FPGA fabric unit FPGA_FAB. The end signal of the arithmetic processing is transmitted by the FPGA fabric unit FPGA_FAB every time the arithmetic processing of a predetermined unit is completed.

コンフィグ切り替えに際しては、演算処理の終了信号そのものを用いてもよいが、演算処理の開始から終了までの時間間隔や、演算処理の実行回数などを計測する（すなわち、演算処理の負荷の重さや演算処理の頻度を計測する）ことがより効果的である。また、例えば、プログラムの解析に基づいて予め演算処理の時系列な推移が予測できるような場合、タイマを内蔵し、設定時刻になったら切り替えることも可能である。 When switching the configuration, the end signal of the arithmetic processing itself may be used, but it is more effective to measure the time interval from the start to the end of the arithmetic processing, the number of times the arithmetic processing is executed, and so on (that is, to measure how heavy the processing load is and how frequently the processing occurs). Further, when the time-series transition of the arithmetic processing can be predicted in advance, for example from an analysis of the program, it is also possible to incorporate a timer and switch the configuration when the set time is reached.

さらに、割り込み処理の回数や、割り込み処理の種類に応じて、最適な論理構成になるように自律的にユーザ論理回路を変更することもできる。つまり、割り込み処理の演算が固定であり、しかもその頻度が高い場合は、ほとんど演算処理を実行していないＦＰＧＡファブリック部ＦＰＧＡ＿ＦＡＢに、割り込み処理を実行するユーザ論理回路を実装することが可能になる。 Further, the user logic circuit can be autonomously changed so as to have an optimum logic configuration according to the number of interrupt processing and the type of interrupt processing. That is, when the interrupt processing calculation is fixed and the frequency is high, it is possible to implement a user logic circuit that executes the interrupt processing in the FPGA fabric unit FPGA_FAB that hardly executes the calculation processing.

この割り込み処理の概念は、演算処理部の更新の概念にも拡張することができる。一般的に、割り込みというと、通常の制御シーケンスの中で、緊急的な処理のルーチンや追加の演算処理要求にこたえるために、ソフトウエアを分岐させるイメージが強い。一方、本実施例の方式は、そのような固定的ないわゆる設計されたプログラム分岐のみならず、設計当初は考慮していなかった追加機能の拡充という概念まで含めることができる。 This concept of interrupt processing can be extended to the concept of updating the arithmetic processing unit. Generally speaking, interrupts have a strong image of branching software in order to respond to urgent processing routines and additional arithmetic processing requests in a normal control sequence. On the other hand, the method of this embodiment can include not only such a fixed so-called designed program branching but also the concept of expansion of additional functions that were not considered at the beginning of the design.

例えば、割り込み処理と呼ぶ新規の演算処理は、設計当初において頻度が少ないもしくは想定範囲外ということで、低い優先順位で設計されているかもしくは当初の設計には入っていなかった場合でも、処理するサービスやアプリケーションの進化に応じて頻繁に発生するようになり得る。すなわち、新規の演算処理の優先順位が動的に高まるような事態が生じ得る。本実施例の方式を用いると、このような動的な優先順位の高まりを受けて、その状況変化に適応するように、ハードウエア処理を進化させていくことができる。このように専用のハードウエアで演算処理を行うことで、ソフトウエアで演算処理を行う場合と比較して、高速性や低消費電力化を実現できる。 For example, a new arithmetic process called interrupt processing is a service that processes even if it is designed with a low priority or is not included in the original design because it is infrequent or out of the expected range at the beginning of design. And can occur frequently as the application evolves. That is, a situation may occur in which the priority of new arithmetic processing is dynamically increased. By using the method of this embodiment, it is possible to evolve the hardware processing so as to adapt to the change in the situation in response to such a dynamic increase in priority. By performing the arithmetic processing with the dedicated hardware in this way, it is possible to realize high speed and low power consumption as compared with the case where the arithmetic processing is performed by the software.

以上のように、本実施例５の方式を用いると、情報処理装置内のリソースを有効に活用し、最大限のパフォーマンスを出せるように自律的な制御が可能になる。 As described above, when the method of the fifth embodiment is used, the resources in the information processing apparatus can be effectively utilized and autonomous control can be performed so as to maximize the performance.

本実施例６では、コンフィグデータの変更方法の更に他の一例について述べる。前述した実施例では、ＦＰＧＡチップＦＰＧＡ＿ＣＨに搭載したユーザ論理回路間のデータ通信回数（または演算処理回数）等に基づいて、主に、ユーザ論理回路（機能ブロック）のコンフィグデータを変更する例を示した。本実施例６では、スイッチ部のコンフィグデータを変更する例について説明する。図９は、本発明の実施例６による情報処理装置において、コンフィグデータの変更方法の一例を説明する概念図である。 In the sixth embodiment, still another example of the method of changing the config data will be described. The preceding embodiments mainly showed examples in which the config data of the user logic circuits (functional blocks) is changed based on, for example, the number of data communications (or the number of arithmetic operations) between the user logic circuits mounted on the FPGA chip FPGA_CH. In the sixth embodiment, an example of changing the config data of the switch unit will be described. FIG. 9 is a conceptual diagram illustrating an example of a method of changing config data in the information processing apparatus according to the sixth embodiment of the present invention.

図９に示す情報処理装置ＦＰＧＡ＿ＣＨは、複数の機能ブロックＦＵＮＣ＿Ａ〜ＦＵＮＣ＿Ｆと、各機能ブロック間を接続するスイッチ部ＳＷと、図７で述べたような最適化制御部ＣＴＬＣとを備える。ここでは、簡単のため、各機能ブロック間の信号は４本に制限して説明する。機能ブロックＦＵＮＣ＿Ａは、出力インタフェースＩＦＯを備え、当該出力インタフェースＩＦＯからの信号は、１本が機能ブロックＦＵＮＣ＿Ｄへ、１本が機能ブロックＦＵＮＣ＿Ｅへ、２本が機能ブロックＦＵＮＣ＿Ｆへ接続されている。 The information processing apparatus FPGA_CH shown in FIG. 9 includes a plurality of functional blocks FUNC_A to FUNC_F, a switch unit SW for connecting each functional block, and an optimization control unit CTLC as described in FIG. 7. Here, for the sake of simplicity, the number of signals between each functional block will be limited to four. The functional block FUNC_A includes an output interface IFO, and one signal from the output interface IFO is connected to the functional block FUNC_D, one is connected to the functional block FUNC_E, and two signals are connected to the functional block FUNC_F.

同様に機能ブロックＦＵＮＣ＿Ｂの出力インタフェースＩＦＯからの信号は、２本が機能ブロックＦＵＮＣ＿Ｄへ、１本が機能ブロックＦＵＮＣ＿Ｅへ、１本が機能ブロックＦＵＮＣ＿Ｆへ接続されている。さらに、機能ブロックＦＵＮＣ＿Ｃからの信号は、１本が機能ブロックＦＵＮＣ＿Ｄへ、１本が機能ブロックＦＵＮＣ＿Ｅへ、２本が機能ブロックＦＵＮＣ＿Ｆへ接続されている。 Similarly, two signals from the output interface IFO of the functional block FUNC_B are connected to the functional block FUNC_D, one to the functional block FUNC_E, and one to the functional block FUNC_F. Further, one signal from the functional block FUNC_C is connected to the functional block FUNC_D, one is connected to the functional block FUNC_E, and two are connected to the functional block FUNC_F.

ここで、最適化制御部ＣＴＬＣは、例えば、機能ブロックＦＵＮＣ＿Ａと機能ブロックＦＵＮＣ＿Ｄとの関係が強いと判断した場合、機能ブロックＦＵＮＣ＿Ａと機能ブロックＦＵＮＣ＿Ｄとの間の結合度を変更する。この例では、機能ブロックＦＵＮＣ＿Ａからの信号は、２本が機能ブロックＦＵＮＣ＿Ｄへ、１本が機能ブロックＦＵＮＣ＿Ｅへ、１本が機能ブロックＦＵＮＣ＿Ｆへ接続されるように変更される。この接続は、機能ブロック間のデータの連結度に依存して柔軟に変更できるので、限られたトランジスタおよび配線資産を活用しながら、チップ全体の演算速度の向上が期待できる。なお、このような変更を容易に行うためには、実施例２に示したようなＣＲＡＭのビットストリームを用いることが望ましい。 Here, for example, when the optimization control unit CTLC determines that the relationship between the functional block FUNC_A and the functional block FUNC_D is strong, the optimization control unit CTLC changes the degree of coupling between the functional block FUNC_A and the functional block FUNC_D. In this example, the signal from the functional block FUNC_A is changed so that two are connected to the functional block FUNC_D, one is connected to the functional block FUNC_E, and one is connected to the functional block FUNC_F. Since this connection can be flexibly changed depending on the degree of data connectivity between functional blocks, it is expected that the calculation speed of the entire chip will be improved while utilizing the limited transistor and wiring assets. In order to easily make such a change, it is desirable to use a bit stream of CRAM as shown in the second embodiment.
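
One way to picture this reallocation is the Python sketch below. The text does not state how the optimization control unit CTLC chooses the new wiring, so the greedy rule used here (every destination keeps at least one line, and each spare line goes to the destination with the most traffic per line) is purely an assumption, as are the traffic figures and the function name `allocate_lines`.

```python
def allocate_lines(traffic, total_lines):
    """Distribute a block's output lines across destination blocks.

    traffic: observed traffic per destination (arbitrary units, assumed).
    Every destination keeps one line; remaining lines go, one at a time,
    to the destination with the highest traffic per allocated line.
    """
    if total_lines < len(traffic):
        raise ValueError("not enough lines for one per destination")
    lines = {dst: 1 for dst in traffic}
    for _ in range(total_lines - len(traffic)):
        busiest = max(lines, key=lambda dst: traffic[dst] / lines[dst])
        lines[busiest] += 1
    return lines


# Hypothetical traffic in which FUNC_A communicates mostly with FUNC_D.
after = allocate_lines({"FUNC_D": 60, "FUNC_E": 20, "FUNC_F": 20}, 4)
```

With these example figures the four lines of FUNC_A end up split two, one, and one across FUNC_D, FUNC_E, and FUNC_F, which matches the changed wiring described for FIG. 9.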

以上のように、本実施例６の方式を用いると、情報処理装置内のリソースを有効に活用し、最大限のパフォーマンスを出せるように自律的な制御が可能になる。 As described above, when the method of the sixth embodiment is used, the resources in the information processing apparatus can be effectively utilized and autonomous control can be performed so as to maximize the performance.

本実施例７では、図１に示した情報処理システムの変形例と共に、コンフィグデータの変更方法の更に他の一例について述べる。本実施の形態では、機能ブロックのパフォーマンスや機能ブロック間のデータの授受等の情報を利用して、演算効率の向上や演算速度の向上を目指している。これまでの実施例は、主に、既存の設計データを再利用し、機能ブロック数の増減や接続関係の再定義と、運用後に新規に設計したハードウエアの自律的な組込みに関するものであった。本実施例７では、ＣＲＡＭのビット反転を故意に引き起こし、論理の変更によって、演算パフォーマンスを向上させる仕組みについて述べる。 In the seventh embodiment, a modified example of the information processing system shown in FIG. 1 and still another example of a method of changing the config data will be described. The present embodiments aim to improve calculation efficiency and calculation speed by using information such as the performance of the functional blocks and the exchange of data between functional blocks. The embodiments so far have mainly concerned reusing existing design data to increase or decrease the number of functional blocks and redefine connection relationships, and autonomously incorporating hardware newly designed after operation has begun. In the seventh embodiment, a mechanism is described that deliberately causes bit inversions in the CRAM and improves calculation performance through the resulting change in logic.

図１０は、本発明の実施例７による情報処理システムにおいて、その主要部の概略構成例を示すブロック図である。図１０に示す情報処理システムは、図７の場合と同様に、複数のＦＰＧＡチップＦＰＧＡ＿ＣＨ０〜ＦＰＧＡ＿ＣＨｎからなるＦＰＧＡサブシステムＦＰＧＡ＿ＳＳＹＳを備える。図７との差異は、サブシステムの進化を制御する進化ブロックＥＶＯＬが新たに設けられる点と、進化を促す信号線（ＳＥＶＬ）が追加される点と、各ＦＰＧＡチップに、必要に応じてコンフィグデータを進化させるためのコンフィグデータ生成部ＧＥＮが設けられる点にある。 FIG. 10 is a block diagram showing a schematic configuration example of a main part of the information processing system according to the seventh embodiment of the present invention. The information processing system shown in FIG. 10 includes an FPGA subsystem FPGA_SSYS including a plurality of FPGA chips FPGA_CH0 to FPGA_CHn, as in the case of FIG. 7. The difference from FIG. 7 is that an evolution block EVOL that controls the evolution of the subsystem is newly provided, a signal line (SEVL) that promotes evolution is added, and each FPGA chip is configured as necessary. The point is that a config data generation unit GEN for evolving data is provided.

各チップのコンフィグデータ生成部ＧＥＮは、進化ブロックＥＶＯＬからのイネーブル信号や設定情報を、信号ＳＥＶＬを介してそれぞれ独立に受信し、ＣＲＡＭを更新する。図１０の例では、ＦＰＧＡチップＦＰＧＡ＿ＣＨ０，ＦＰＧＡ＿ＣＨ１に対して、まず、同じ機能ブロックＦＵＮＣ＿Ａ０，ＦＵＮＣ＿Ａ１，ＦＵＮＣ＿Ａ２を実装し、その後の演算処理の最中にＦＰＧＡチップＦＰＧＡ＿ＣＨ１のコンフィグデータを故意に変化させて演算結果がどうなるかを調べるような動作が行われる。ＦＰＧＡチップＦＰＧＡ＿ＣＨ０は設計された状態でデータの処理を進める機能ブロックであり、ＦＰＧＡチップＦＰＧＡ＿ＣＨ１は設計されたデータから出発して、よりよい論理構成を求める評価ブロックになる。このように、本実施例７では、実データを活用しながらハードウエアをよりよい状態に変更していくための方式が示される。 The config data generation unit GEN of each chip independently receives the enable signal and the setting information from the evolution block EVOL via the signal SEVL, and updates the CRAM. In the example of FIG. 10, the same functional blocks FUNC_A0, FUNC_A1, and FUNC_A2 are first mounted on the FPGA chips FPGA_CH0 and FPGA_CH1, and the config data of the FPGA chip FPGA_CH1 is intentionally changed during the subsequent arithmetic processing. An action is taken to find out what the result will be. The FPGA chip FPGA_CH0 is a functional block that advances data processing in a designed state, and the FPGA chip FPGA_CH1 is an evaluation block that starts from the designed data and seeks a better logical configuration. As described above, in the seventh embodiment, a method for changing the hardware to a better state while utilizing the actual data is shown.

ＣＲＡＭの更新には、ランダムなビット反転や、遺伝的アルゴリズムによるＣＲＡＭの部分的な変更などが考えられる。ランダムなビット反転を用いる場合、例えば、図２のコンフィグ更新部ＣＮＦＧ＿ＭＤＦＹにコンフィグデータ生成部ＧＥＮを設ければよい。コンフィグ更新部ＣＮＦＧ＿ＭＤＦＹは、ＣＲＡＭから読み出したコンフィグデータに対して、乱数または擬似乱数を用いてビットをランダムに選択し、当該選択したビットを反転させたのちＣＲＡＭに書き戻す。 Random bit inversion and partial modification of the CRAM by a genetic algorithm can be considered for updating the CRAM. When random bit inversion is used, for example, the config data generation unit GEN may be provided in the config update unit CNFG_MDFY of FIG. The config update unit CNFG_MDFY randomly selects bits from the config data read from the CRAM using random numbers or pseudo-random numbers, inverts the selected bits, and then writes them back to the CRAM.
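
The random bit inversion described above can be sketched in Python as follows. The config data is modeled as a `bytes` object and the number of bits to invert is passed as a parameter; both choices, and the function name `flip_random_bits`, are assumptions, since the text does not specify how many bits are selected or how the data is laid out.

```python
import random


def flip_random_bits(config: bytes, n_flips: int, rng: random.Random) -> bytes:
    """Return a copy of the config data with n_flips distinct bits inverted."""
    total_bits = len(config) * 8
    if not 0 <= n_flips <= total_bits:
        raise ValueError("n_flips out of range")
    data = bytearray(config)
    # Choose distinct bit positions with a pseudo-random generator.
    for pos in rng.sample(range(total_bits), n_flips):
        data[pos // 8] ^= 1 << (pos % 8)
    return bytes(data)


original = bytes(8)  # placeholder config data read from the CRAM
mutated = flip_random_bits(original, 5, random.Random(0))
```

Passing a seeded `random.Random` keeps an experiment reproducible, which may help when a promising configuration needs to be regenerated.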

When a genetic algorithm is used, on the other hand, operations such as selection, crossover, and mutation are applied repeatedly to a plurality of chromosomes prepared in advance (here, config data) in order to search for the chromosome with the highest overall fitness (that is, the best operating state). Crossover is, for example, an operation that swaps some bits of one chromosome with some bits of another chromosome. Mutation is an operation that inverts some bits of a chromosome. Selection is an operation that keeps chromosomes with high fitness and eliminates those with low fitness.

After crossover and mutation have been performed, the surviving chromosomes are subjected to further crossover and mutation, so that the chromosome with the highest fitness (that is, the config data with the best operating state) can be found. In the course of these operations, config data that survives selection is stored in the external config storage device STRG, and config data that is eliminated is deleted from the external config storage device STRG.
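
The selection, crossover, and mutation loop can be illustrated with a small software sketch. Everything here is an assumption made for illustration: chromosomes are 16-bit integers, the list `pool` stands in for the config data kept in STRG, half of the pool survives each generation, and the fitness function `popcount` is a toy stand-in for the measured performance that the evaluation would supply in the actual system.

```python
import random

WIDTH = 16  # bits per chromosome (hypothetical; real config data is far larger)

def crossover(a: int, b: int, rng: random.Random) -> tuple[int, int]:
    """Swap the low-order bits of two chromosomes below a random cut point."""
    point = rng.randrange(1, WIDTH)
    low = (1 << point) - 1
    return (a & ~low) | (b & low), (b & ~low) | (a & low)

def mutate(c: int, rng: random.Random, rate: float = 0.05) -> int:
    """Invert each bit independently with probability `rate`."""
    for pos in range(WIDTH):
        if rng.random() < rate:
            c ^= 1 << pos
    return c

def evolve(population, fitness, generations, rng):
    """Repeat selection, crossover, and mutation; return the fittest chromosome."""
    pool = list(population)                    # stands in for config data kept in STRG
    for _ in range(generations):
        pool.sort(key=fitness, reverse=True)
        survivors = pool[: len(pool) // 2]     # selection: discard the less fit half
        children = []
        while len(survivors) + len(children) < len(pool):
            p1, p2 = rng.sample(survivors, 2)
            c1, c2 = crossover(p1, p2, rng)
            children += [mutate(c1, rng), mutate(c2, rng)]
        pool = survivors + children[: len(pool) - len(survivors)]
    return max(pool, key=fitness)

def popcount(c: int) -> int:
    """Toy fitness: number of 1 bits (a placeholder for measured performance)."""
    return bin(c).count("1")

rng = random.Random(1)
initial = [rng.getrandbits(WIDTH) for _ in range(8)]
best = evolve(initial, popcount, generations=20, rng=rng)
```

Because the fittest chromosomes always survive selection in this sketch, the best fitness in the pool never decreases from one generation to the next.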

In the example of FIG. 10, the evaluation unit interface EVAL_IF continually compares FPGA chip FPGA_CH0 with FPGA chip FPGA_CH1 (for example, in terms of data correctness and processing capability). Based on the comparison result, the evolution block EVOL performs, for example, the selection described above and instructs the config data generation unit GEN of FPGA chip FPGA_CH1 to update the CRAM. The config data generation unit GEN performs operations such as crossover and mutation on the config data that has not been eliminated from the external config storage device STRG, and updates the CRAM. Specific examples of the comparison performed by the evaluation unit interface EVAL_IF include comparing the time taken by the arithmetic processing and comparing the quality of the computation results.

The former, the time taken by the arithmetic processing, can be obtained simply by measuring the start and end of processing. Taking the design-time data as the baseline, trials are run one after another on FPGA chip FPGA_CH1, and the processing times before and after each trial are compared; this yields config data that enables faster processing (that is, config data with higher fitness). As for the latter, the quality of the computation results, one possible method when performing processing such as image recognition is to use the degree of improvement in the recognition rate as the threshold for the decision.

Such processing is accumulated while making use of real, commercial data, and better config data is saved in the external config storage device STRG. In effect, learning proceeds on live data in search of more efficient processing. It is desirable that the evaluation results from the evaluation unit interface EVAL_IF, the processing time, and information on the processed data can be stored at the same time. Although not illustrated, the evaluation by the evaluation unit interface EVAL_IF can also cover power efficiency by taking into account measurements from a power monitor or the like.

FIG. 11 is a circuit diagram showing a configuration example of a typical LUT. A typical LUT is configured so that the contents of the CRAM are selectively output through a selector; a 4-input LUT uses a 16-bit CRAM. For a single such LUT, the CRAM can be set to 2^16 = 65536 different values, and the inputs I0 to I3 can take 2^4 = 16 different values. Varying these independently gives 65536 × 16 = 1048576 combinations. An FPGA generally integrates on the order of one million LUTs, so the number of bits involved in data processing is enormous. The scheme of the seventh embodiment is one means of autonomously evolving functional blocks, designed through human experience and logic synthesis tools, beyond what humans would anticipate, in order to obtain better performance.
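
As a rough software model of the 4-input LUT, the 16-bit CRAM can be treated as an integer and the inputs as a 4-bit index into it. The function name `lut_output` and the bit ordering (I0 as the least significant index bit) are assumptions for illustration; the actual ordering depends on the selector wiring in FIG. 11.

```python
def lut_output(cram: int, i3: int, i2: int, i1: int, i0: int) -> int:
    """Select one bit of a 16-bit CRAM word using the four inputs as an index."""
    index = (i3 << 3) | (i2 << 2) | (i1 << 1) | i0  # assumed ordering: I0 is the LSB
    return (cram >> index) & 1

# A CRAM word in which only bit #15 is set behaves as a 4-input AND gate.
AND4 = 1 << 15

# Counts quoted in the text: CRAM settings, input patterns, and their product.
n_cram_settings = 2 ** 16
n_input_patterns = 2 ** 4
n_combinations = n_cram_settings * n_input_patterns
```

Changing the CRAM word changes the logic function, while changing the inputs selects which stored bit appears at the output.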

Here, a given FPGA chip is optimized based on comparison between FPGA chips within the information processing system, but a given FPGA fabric unit can likewise be optimized based on comparison between FPGA fabric units within a single FPGA chip. Also, a functional block that carries out the data processing and an evaluation block that carries out the optimization are provided separately here so that error-free data is reliably obtained. Where some errors can be tolerated, as in image or audio processing, a single block can carry out both the data processing and the optimization.

The embodiments so far improve the performance of the hardware as an arithmetic unit by evolving the CRAM. This section describes an embodiment that realizes a high-speed arithmetic chip, as a further benefit of frequently updating the config data. FIG. 12 is a schematic diagram showing a configuration example of the main part of an information processing apparatus according to an eighth embodiment of the present invention. The configuration shown in FIG. 12 implements an Ising chip with programmable logic and evolves the data stored in the CRAM.

In this example, the CRAM itself is treated like a data memory, and arithmetic processing is carried out using the FPGA structure. For simplicity, FIG. 12 shows four CRAMs of 16 bits each and four LUTs corresponding to them. The data to be processed is loaded into the CRAMs, and the corresponding inputs of the four LUTs are tied together. As a result, a given bit position (for example, bit #0) in each of the four CRAMs can be selected and output simultaneously through the selectors that make up the LUTs. Because there are 16 such bit positions, if the 4 bits selected together are defined as one plane, any of 16 planes can be selected. For a computer that advances its computation while changing this 4-bit information, for example, four LUTs can thus realize 16 kinds of operations.

FIG. 13 is a waveform diagram showing an example of a specific operation sequence for FIG. 12. The input signals I0 to I3 change in synchronization with a clock, and the clock is also supplied to the interaction determination circuit INTRAC and the data generation unit DGEN. In synchronization with the clock, the operations on the 16 CRAM planes are carried out while the CRAM is being configured. As the waveform diagram of FIG. 13 shows, the input signals I0 to I3 allow 16 operations to be performed. Calling each operation a state, the bit information of the four CRAMs corresponding to the four LUTs is output and processed state by state, in order from bit #0 to bit #15.
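
The state-by-state readout of the CRAM planes can be modeled as below. This is a simplified sketch: the clocking, INTRAC, and DGEN are omitted, the helper name `read_plane` and the CRAM contents are made up for the example, and state s is assumed to select bit #s of every CRAM.

```python
def read_plane(crams: list[int], state: int) -> list[int]:
    """Return bit #state of every CRAM, i.e. one plane seen through the shared inputs."""
    return [(cram >> state) & 1 for cram in crams]

# Four hypothetical 16-bit CRAM words, one per LUT.
crams = [0x0001, 0x0003, 0xFFFF, 0x8000]

# Stepping the shared inputs I0..I3 through 0..15 reads the 16 planes in order.
planes = [read_plane(crams, state) for state in range(16)]
```

Each entry of `planes` holds the four bits that the LUTs would output together in that state.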

In this example, the output signals of the four LUTs are also input to the interaction determination circuit INTRAC, which evaluates the correlation of the data stored in the CRAMs by determining the interaction among the four LUT output signals. The interaction determination circuit INTRAC, for example, applies correlation and weighting parameters between the bits and performs product-sum operations and averaging. Based on the result, the data generation unit DGEN decides whether to invert the bit information stored in the CRAM or leave it unchanged. The eighth embodiment thus updates the CRAM by detecting relationships among the bits of the CRAM.

FIGS. 14 to 16 are diagrams explaining the concept of the Ising chip. A technique called the Ising model has been proposed for computing the ground state of physical phenomena. It is a computational model that calculates the energy state of the particles making up a material, in particular as determined by the orientation of their spins, and finds a more stable state. For example, particles with spin are arranged on a plane, the interactions between nearest-neighbor spins are computed, and the system is driven toward a stable state. FIG. 14 shows, as an example, a two-dimensional Ising model being mapped onto an FPGA.

In FIG. 14, each Ising spin particle (called an Ising node, Ind) is drawn as a circle, and the nodes are arranged uniformly on a plane. In this figure they are grouped into 6 × 6 arrays, each defined as a PE (Primitive Element). An example of mapping this onto an FPGA is described below.

FIG. 15 shows a single PE extracted to illustrate the connections between neighboring Ising nodes Ind. Focusing on one Ising node Ind and considering its vertical, horizontal, and diagonal relationships, eight interaction lines Ib (interaction bonds) can be identified, as shown in FIG. 15. Computation is carried out for each basic unit joined by these eight interaction lines Ib (hereinafter called an EU, Execution Unit).

In this case, when computing with the data initially stored in the CRAM bits, there are 16 combinations of operations, so 16 EUs (EU0 to EU15) are defined within one PE, as shown in FIG. 15. For legibility, FIG. 15 draws them so that they do not overlap, representing all 16 combinations across the nine panels (A) to (I). In other words, handling one PE requires that 16 computations be possible. FIG. 16 shows the structure of an EU in detail. In this model, an EU consists of nine Ising nodes Ind0 to Ind8 and eight interaction lines Ib0 to Ib7. In the eighth embodiment, each Ising node is assigned to a LUT, and the interaction computation is performed by user logic.

FIG. 17 is a schematic diagram showing a configuration example in which the PE of FIG. 15 is mapped onto an FPGA. FIG. 17 shows nine LUTs (LUT0 to LUT8), the 4-bit input signals I0 to I3 that are input to each LUT, a switch circuit SWC that connects the input signals to the nine LUTs, flip-flops FF that temporarily hold computation data, a correlation evaluation circuit IEU, a memory RAM, an address conversion unit ADRCV, and a CRAM write circuit CRWT.

The correlation evaluation circuit IEU corresponds to the interaction determination circuit INTRAC and the data generation unit DGEN of FIG. 12. It computes the interactions of the Ising nodes and stores in the memory RAM both the result of the interaction (E) computation and the data of each Ising node (σi, σj) derived from that result. The memory RAM is, for example, a memory built into the FPGA in advance. The CRAM write circuit CRWT writes the Ising node result data held in the memory RAM back to the CRAM at a specified address. The address conversion unit ADRCV has the function of changing the address used when the CRAM write circuit CRWT performs the write-back. Each LUT has a corresponding CRAM; the eighth embodiment assumes 4-input LUTs, so one 16-bit CRAM is associated with each LUT. CRAM0 corresponds to LUT0 and has 16 bits, bit #0 to bit #15.

The configuration of FIG. 17 is now described in relation to FIG. 15. In FIG. 15, one EU contains nine interacting Ising nodes Ind and eight interaction lines Ib. The eight interaction lines are defined with respect to one Ising node Ind, and the nine Ising nodes Ind involved in these interactions are implemented with nine LUTs. In the example of FIG. 17, the four input signals I0 to I3 are input in common to each of the nine LUTs. As in FIG. 13, when the four input signals step through their 16 input combinations, each of the nine LUTs selects one bit from its corresponding CRAM and outputs it to the following flip-flop FF, and this is done 16 times.

The 16 EUs shown in FIG. 15, EU0 to EU15, correspond to bit #0 to bit #15, respectively, of the CRAM of each LUT. That is, EU0 of FIG. 15 corresponds to bit #0 of CRAM0 to CRAM8, and bit #0 of CRAM0 to CRAM8 corresponds, respectively, to the Ising nodes Ind0 to Ind8 (FIG. 16) of EU0. Likewise, EU1 of FIG. 15 corresponds to bit #1 of CRAM0 to CRAM8, and bit #1 of CRAM0 to CRAM8 corresponds, respectively, to the Ising nodes Ind0 to Ind8 (FIG. 16) of EU1.

In the Ising model, the interaction between Ising nodes (σi, σj) is computed. In a simple computation, the weight coefficient (interaction coefficient) wij is multiplied by σi and σj (each σ being −1 or 1) and the products are summed. Because the data stored in the CRAM is 0 or 1, a 0 is converted to −1 when the arithmetic is performed. The summation gives the increase or decrease in energy; by comparing it with the previous value, the spin orientation of the particle is changed in the direction that lowers the energy state. Since the spin orientations are described by σi and σj, the values of σi and σj updated by the computation are written to the memory RAM.
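
The weighted sum and spin update can be sketched for a single EU as follows. The text does not fix a sign convention, so this sketch assumes the conventional form in which the local energy of the center node is E = −σc · Σ wk·σk over its eight neighbors, and it flips the center spin only when that lowers E. The function names are hypothetical, and the circuit IEU is not claimed to work exactly this way.

```python
def to_spin(bit: int) -> int:
    """Convert a stored CRAM bit (0 or 1) to a spin value (-1 or +1)."""
    return 1 if bit else -1

def local_energy(center_bit: int, neighbor_bits: list[int], weights: list[float]) -> float:
    """Assumed convention: E = -sigma_c * sum(w_k * sigma_k) over the neighbors."""
    sigma_c = to_spin(center_bit)
    return -sigma_c * sum(w * to_spin(b) for w, b in zip(weights, neighbor_bits))

def update_center(center_bit: int, neighbor_bits: list[int], weights: list[float]) -> int:
    """Return the center bit after flipping it only if the flip lowers the energy."""
    current = local_energy(center_bit, neighbor_bits, weights)
    flipped = local_energy(1 - center_bit, neighbor_bits, weights)
    return 1 - center_bit if flipped < current else center_bit

# Eight neighbors all "up" with positive (ferromagnetic) weights pull the center up.
neighbors = [1] * 8
weights = [1.0] * 8
new_bit = update_center(0, neighbors, weights)
```

In this example the center bit 0 (spin −1) has energy +8, the flipped state has energy −8, and so the update returns 1.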

In the Ising model, the interactions among all particles are computed to find the state in which the energy is at its minimum or at a local minimum. The PEs of FIG. 14 therefore cannot be computed with the partition shown in that figure alone; PEs shifted up, down, left, or right by one Ising node must also be defined and computed. To do this, the basic hardware configuration of FIG. 17 is left unchanged, and instead the CRAM memory address at which each EU's Ising node information is stored is changed, which has the same effect as moving the positions of the Ising nodes being evaluated.

This makes it possible to compute the interactions between Ising nodes successively and exhaustively while keeping the basic hardware fixed. To achieve this, the updated Ising node data in the memory RAM is written back to the CRAM through the address conversion unit ADRCV. For example, data originally stored in bit #0 of CRAM0 may, after the computation, be stored in bit #1 of CRAM0. In the example of FIG. 14 or FIG. 15, the original Ising node is then shifted one bit to the right, so the EU shifts, in relative terms, one Ising node to the left. As a result, the relationships over which interactions are computed shift by one Ising node.
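
The write-back with address conversion can be illustrated with a minimal sketch. The text gives only the example of bit #0 moving to bit #1, so the rest is an assumption: here the conversion is modeled as a cyclic shift of bit addresses within one 16-bit CRAM, and the function name `write_back_shifted` is hypothetical. The real mapping between bit addresses and node positions, including transfers between CRAMs and PEs, is not specified at this level of detail.

```python
def write_back_shifted(updated: int, shift: int = 1, width: int = 16) -> int:
    """Store bit #k of `updated` at bit #((k + shift) % width) of the returned word."""
    mask = (1 << width) - 1
    shift %= width
    return ((updated << shift) | (updated >> (width - shift))) & mask

# Data that was in bit #0 ends up in bit #1 after one write-back with shift = 1.
before = 0b0000_0000_0000_0001
after = write_back_shifted(before)
```

Repeating the write-back with different shifts would let the same fixed hardware evaluate differently aligned groups of nodes, which is the effect the paragraph above describes.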

The example of FIG. 17 describes the computation of a single PE, but in practice many such PEs are laid out on the FPGA. The other PEs are computed in the same operation cycle, and after the results are stored in memory, address conversion is performed before they are stored in the CRAM again. Because the division into PEs is only a matter of convenience, each PE must also interact with adjacent PEs through their Ising nodes; that is, computation results must be exchanged with other PEs. FIG. 17 shows the signal paths for this in simplified form. In practice, the data may be exchanged by memory-to-memory transfer, since after the transfer it suffices to perform the address conversion for setting the CRAM and then store the data in the CRAM.

FIG. 17 shows an example in which the CRAM write circuit CRWT is placed inside the PE, but the CRAM write circuits CRWT may instead be gathered and integrated collectively within the chip. Consolidating them to some degree within the chip has the effect of raising the integration density of the logic circuits. In the scheme of the eighth embodiment, the FPGA can also still be used as an LSI integrating logic circuits in the ordinary sense, so general-purpose use poses no problem. As described above, the scheme of the eighth embodiment makes it possible to realize a high-performance LSI using general-purpose COTS (Commercial Off The Shelf) products.

FIG. 18 shows an example of the overall operation when computing with the configuration of FIG. 17. In FIG. 18, pattern A is the state in which PEs, each a group of 36 Ising nodes, are arranged in rows and columns. Pattern B is the state in which the PEs are defined with the grouping shifted right by one Ising node and arranged in rows and columns. Similarly, pattern C is the state in which the PEs are defined with the grouping shifted right by one further Ising node and arranged in rows and columns.

In the operation waveform diagram of FIG. 18, at time T1 all PEs are computed in the pattern A state, and the steps of storing the results in the memory RAM and then storing them in the CRAM with address conversion applied are performed sequentially. At time T2, all PE computations are performed in the pattern B state. At time T3, all PE computations are performed in the pattern C state. Thereafter, computations shifted up, down, left, or right by one Ising node are likewise performed multiple times, so that the computation is carried out between all Ising nodes.

The eighth embodiment has been described using 4-input LUTs as an example, but the number of LUT inputs is not limited to four. With five inputs, the number of EUs doubles to 32, so two PEs' worth can be implemented with nine LUTs, which allows higher integration. The scheme of the eighth embodiment can thus be applied with LUTs having a number of inputs other than four.

The invention made by the present inventors has been described specifically above based on its embodiments, but the present invention is not limited to those embodiments and can be modified in various ways without departing from its gist. For example, the embodiments above are described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations having all of the described elements. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to that of one embodiment. Furthermore, for part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

CNFGC: config control unit
CRAM: configuration memory
CRAMC: CRAM controller
CUNT: data monitoring unit
DBUS: on-chip bus
EVAL_IF: evaluation unit interface
FPGA_CH: FPGA chip
FPGA_FAB: FPGA fabric unit
STRG: external config storage device

Claims

1. An information processing device comprising:
K lookup tables, each having n inputs;
M configuration memories corresponding respectively to the K lookup tables;
a correlation evaluation circuit that determines the interaction of K output signals output from the K lookup tables and generates new data based on the determination result; and
a CRAM write circuit that writes the new data generated by the correlation evaluation circuit back to the M configuration memories at a specified address,
wherein each of the M configuration memories is composed of 2^n bits,
each of the K lookup tables selects and outputs one bit of the corresponding configuration memory according to an n-bit input signal,
the n-bit input signal is input in common to the K lookup tables, and
data of the computation result of each Ising node based on an Ising model is mapped to each bit of the M configuration memories.

2. The information processing device according to claim 1,
further comprising an address conversion unit that shifts the address specified by the CRAM write circuit.