JP5315517B2

JP5315517B2 - Information processing apparatus and virtual circuit writing method

Info

Publication number: JP5315517B2
Application number: JP2008184301A
Authority: JP
Inventors: 優年関根; 浩晃飯島; 一輝佐藤
Original assignee: NATIONAL UNIVERSITY CORPORATION TOKYO UNIVERSITY OF AGRICULTURE & TECHNOLOGY
Current assignee: NATIONAL UNIVERSITY CORPORATION TOKYO UNIVERSITY OF AGRICULTURE & TECHNOLOGY
Priority date: 2008-07-15
Filing date: 2008-07-15
Publication date: 2013-10-16
Anticipated expiration: 2028-07-15
Also published as: JP2010026607A

Abstract

PROBLEM TO BE SOLVED: To provide an inexpensive, high-performance computer for large-scale computation that individuals can use. SOLUTION: An information processing apparatus includes: a memory that stores an application program for performing a prescribed computation on a computation target; an arithmetic device array 40 composed of a plurality of arithmetic devices that are arranged in correspondence with the problem regions of the computation target and connected so that adjacent arithmetic devices can communicate data directly, in which the arithmetic circuits used to execute the application program are reconfigured to perform the computation for each problem region, and in which computation result data for each problem region is exchanged between adjacent arithmetic devices; a host processor that executes the application program, acquires the computation result data for each problem region from the arithmetic devices of the arithmetic device array 40, and calculates the result for the computation target; and a bus that carries data among the memory, the host processor, and the arithmetic device array 40. COPYRIGHT: (C)2010,JPO&INPIT

Description

The present invention relates to an information processing apparatus and an information processing method, and in particular to an information processing apparatus and a virtual circuit writing method suited to a computer that processes an enormous amount of computation using a rewritable programmable logic device (PLD).

In recent years, environments for large-scale computation have become highly developed, and computation based on HPC (high-performance computing) is now used in a wide range of fields such as biology, chemistry, astronomy, and engineering. HPC combines multiple computer systems and presents them as a single computing environment. Even if each individual computer system has modest performance, the system as a whole can compute at high speed.

A supercomputer is one example of HPC. Supercomputers were conventionally built from dedicated processors and dedicated architectures, but, as shown in FIG. 1, designs have appeared in which a large number of general-purpose microprocessors 1 are connected via a bus 2 and run in parallel.

HPC systems also include configurations that line up ICs (integrated circuits), such as grids of general-purpose PCs (personal computers) and grids that use GPGPUs (General Purpose Graphics Processing Units).

Research on high-performance computation has also appeared that uses architectures executing in parallel on reconfigurable semiconductor integrated circuits (LSI: Large Scale Integration) such as FPGAs (Field Programmable Gate Arrays). Cluster-type systems built for large-scale computation in particular are called RHPC (Reconfigurable High Performance Computing) or HPRCs (High-Performance Reconfigurable Computers). Hardware whose circuits can be reconfigured in this way is collectively called a programmable logic device (PLD) and is widely available, from small-volume products to mass-produced ones.

For example, the present applicant has proposed a hardware/software (hw/sw) complex that uses a PCI FPGA board called a hardware module (hwModule) (see, for example, Patent Document 1).

The hw/sw complex is a system that uses the FPGA of a hardware module as a virtual circuit (hwNet) and makes circuit resources easy to use by inheriting from a class called a hardware object (hwObject), which hides the detailed control of the virtual circuit. Its features include versatility, parallel distributed processing, and connection to external devices.

That is, software can directly control the virtual circuit in the FPGA, and by writing into the FPGA a virtual circuit created for the target application, each application's problem can be computed by the fastest virtual circuit.

In general, when the same computation is executed in software and in a hardware circuit, the hardware circuit is empirically known to be 30 to several hundred times faster. For example, current microprocessors run at 3 GHz, whereas 200 MHz is the best speed ordinarily obtained from an inexpensive FPGA. Relative to software running on a microprocessor, a virtual circuit on an FPGA therefore effectively corresponds to a microprocessor running at 6 GHz (200 MHz × 30) up to 20 to 40 GHz. Furthermore, placing multiple arithmetic circuits in the FPGA can improve performance by a further factor of ten or more, so performance equivalent to 60 GHz to 400 GHz can be expected.
Patent Document 1: Japanese Patent No. 3845021
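The equivalent-clock figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is only an illustration: the function name `equivalent_cpu_ghz` is hypothetical, and reading "a factor of ten or more" as exactly 10 circuits and "20 to 40 GHz" as speedups of 100 to 200 is an assumption made to match the quoted endpoints.

```python
# Back-of-the-envelope arithmetic for the figures in the paragraph above.
FPGA_CLOCK_MHZ = 200   # typical clock of an inexpensive FPGA (from the text)
SPEEDUP_LOW = 30       # lower bound of the empirical hardware-vs-software speedup
SPEEDUP_HIGH = 200     # assumption: the speedup that yields the quoted 40 GHz
CIRCUITS = 10          # assumption: "a factor of ten or more" taken as exactly 10


def equivalent_cpu_ghz(fpga_clock_mhz, speedup, circuits=1):
    """Clock rate (GHz) a microprocessor would need to match the FPGA."""
    return fpga_clock_mhz * speedup * circuits / 1000.0


single_low = equivalent_cpu_ghz(FPGA_CLOCK_MHZ, SPEEDUP_LOW)
single_high = equivalent_cpu_ghz(FPGA_CLOCK_MHZ, SPEEDUP_HIGH)
multi_low = equivalent_cpu_ghz(FPGA_CLOCK_MHZ, SPEEDUP_LOW, CIRCUITS)
multi_high = equivalent_cpu_ghz(FPGA_CLOCK_MHZ, SPEEDUP_HIGH, CIRCUITS)

print(single_low, single_high, multi_low, multi_high)
```

Under these assumptions the four values correspond to the 6 GHz, 40 GHz, 60 GHz, and 400 GHz figures in the text.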

A feature common to the HPC systems described above is that the application program is divided into parts that can run in parallel, which are then assigned to arithmetic devices capable of parallel execution. Such HPC systems are expensive and are used by many people for different problem domains, so they are designed with a general-purpose configuration, that is, one in which an arbitrary data transfer path can be realized between general-purpose microprocessors. Conceptually, the microprocessors 1 are connected via a common communication path such as the bus 2 shown in FIG. 1. However, every computation result is transferred over the bus 2 to the distributed microprocessors 1, and the bus becomes the performance bottleneck.

As described above, in the configuration of FIG. 1 the overhead of the bus 2, the common communication path, becomes large. To eliminate this overhead, it is common to speed up transfers with a switch box 5 as shown in FIG. 2. The switch box 5 selects the most suitable microprocessor and forwards data from any given microprocessor to it. Some systems use FPGAs instead of microprocessors, but the use of the bus 2 and the configuration of the switch box 5 and related parts are the same.

With these approaches, however, power consumption becomes enormous, and it has been difficult to achieve any further speed-up. Optical communication, which supports broadband data transfer, has been considered as a way to achieve high speed and low power consumption on the communication path, but it is expensive. Moreover, because such a supercomputer is expensive, it is hard to provide many units; many users reserve time and share the machine by time division, so the wait before the machine can be used becomes long.

For the application examples of the hw/sw complex described in Patent Document 1, research so far has addressed network support for the hwModule. However, some applications that demand large-scale parallelization, such as numerical simulation, need very fast data transfer between computers, and in those cases the bus became the bottleneck and the performance required of HPC could not always be met. That is, while internal computation speed improved, data transfer remained limited by the electrical characteristics of the communication path, so an inexpensive, high-speed communication path could not be obtained.

The present invention has been made in view of this situation, and makes it possible to provide an inexpensive, high-performance computer for large-scale computation that individuals can use.

In the information processing apparatus of the present invention, a plurality of arithmetic devices are arranged to match the structure of the computation target. During computation, data is transferred directly between the arithmetic devices without going through a bus; the bus is used only when computation results are taken from the arithmetic devices and sent to the host processor, or when data is input from the host processor.
Specifically, an information processing apparatus according to one aspect of the present invention includes a memory, an arithmetic device array, a host processor, and a bus.
The memory stores an application program that performs a prescribed computation on a computation target.
The arithmetic device array is composed of a plurality of arithmetic devices. These arithmetic devices are arranged in correspondence with the problem regions of the computation target and connected so that adjacent arithmetic devices can communicate data directly. In each of them, an arithmetic circuit that is used to execute the application program and performs the computation for the corresponding problem region is reconfigured, and the computation result data produced by the arithmetic circuit for that problem region is exchanged between adjacent arithmetic devices.
The host processor determines in advance, for each arithmetic device in the arithmetic device array, the adjacent arithmetic device to which computation result data is to be transferred; executes the application program; acquires the computation result data for each problem region from the arithmetic devices of the array; and calculates the result for the computation target.
The bus carries data among the memory, the host processor, and the arithmetic device array.
Based on virtual circuit write data that is sent from the host processor and contains the virtual circuit data for each problem region of the computation target, a virtual circuit data writing circuit for writing the arithmetic circuit is written into an arithmetic device. After the virtual circuit data writing circuit has been written, that arithmetic device transfers the virtual circuit write data to an adjacent arithmetic device in accordance with the write order contained in the virtual circuit write data.

According to the information processing apparatus of one aspect of the present invention, a plurality of arithmetic devices are arranged to match the structure of the computation target, and during computation data is transferred directly between the arithmetic devices without going through a bus. The results computed by each arithmetic device are then taken out and sent to the host processor via the bus. Every arithmetic device only has to transfer data to its adjacent arithmetic devices, so the data transfer distance is short.

In a virtual circuit writing method according to one aspect of the present invention, a virtual circuit data writing circuit for writing an arithmetic circuit is first written into each of a plurality of arithmetic devices, which are arranged in correspondence with the problem regions of the computation target and connected so that adjacent arithmetic devices can communicate data directly. The writing is based on virtual circuit write data that is sent from the host processor and contains the virtual circuit data for each problem region of the computation target.
Next, after the virtual circuit data writing circuit has been written into an arithmetic device, that arithmetic device transfers the virtual circuit write data to an adjacent arithmetic device in accordance with the write order contained in the virtual circuit write data.
Then, once the virtual circuit data writing circuit has been written into all the arithmetic devices, the arithmetic circuit is written into all the arithmetic devices, proceeding from the arithmetic device into which the virtual circuit data writing circuit was written last back to the one into which it was written first.

According to the virtual circuit writing method of one aspect of the present invention, the virtual circuit data writing circuit for writing the arithmetic circuit is first written into the plurality of arithmetic devices that make up the arithmetic device array, and the arithmetic circuit is then written into all the arithmetic devices, from the one into which the virtual circuit data writing circuit was written last back to the one into which it was written first. There is therefore no risk that writing of the arithmetic circuits becomes impossible partway through.
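The ordering of the two write phases can be illustrated with a small simulation. This is a minimal sketch, not the patent's implementation: the `Device` class, the `write_virtual_circuits` function, and the event log are hypothetical names introduced here, and the sketch models only the order of operations, not the FPGA configuration mechanism itself.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    has_writer: bool = False      # virtual circuit data writing circuit installed?
    has_arithmetic: bool = False  # arithmetic circuit installed?


def write_virtual_circuits(devices, write_order):
    """Simulate the two-phase write and return the sequence of events.

    devices: dict mapping a device name to its Device.
    write_order: device names in write order; each device is assumed to be
    adjacent to the previous one, so data can be relayed along the chain.
    """
    log = []
    # Phase 1: each device installs the writing circuit, then relays the
    # write data to the next device in the write order.
    for name in write_order:
        devices[name].has_writer = True
        log.append(("writer", name))
    # Phase 2: arithmetic circuits are written from the device that was
    # written last back to the device that was written first.
    for name in reversed(write_order):
        if not devices[name].has_writer:
            raise RuntimeError(f"{name} has no writing circuit")
        devices[name].has_arithmetic = True
        log.append(("arithmetic", name))
    return log


devices = {n: Device(n) for n in ("A", "B", "C")}
events = write_virtual_circuits(devices, ["A", "B", "C"])
for event in events:
    print(event)
```

For a three-device chain A, B, C, the log lists the writing circuits in the order A, B, C, followed by the arithmetic circuits in the order C, B, A.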

As described above, the present invention makes it possible to realize an inexpensive, high-performance computer for large-scale computation that individuals can use. As a result, even an individual can readily use a computer capable of large-scale computation for the task at hand, and waiting time is reduced.

An example of the best mode for carrying out the present invention is described below with reference to the accompanying drawings. The description proceeds in the following order.
1. Concept of the arithmetic device array according to one embodiment of the present invention
2. Connection with adjacent arithmetic devices
3. Overview of the information processing apparatus
4. Overall configuration of the information processing apparatus
5. Configuration of the host processor
6. Configuration of the arithmetic device (arithmetic board, write/input-output board)
7. Computation by the arithmetic device array
8. Write processing to the arithmetic devices
9. Arithmetic device array according to another embodiment of the present invention
10. Arithmetic device array according to still another embodiment of the present invention

[1. Concept of the arithmetic device array according to one embodiment of the present invention]
The information processing apparatus according to the present invention applies the technique described in Patent Document 1 (Japanese Patent Application Laid-Open No. 2003-208311) to HPC systems such as supercomputers. As a platform on which large-scale computation can be performed scalably, it forms a PLD array (arithmetic device array) in which a large number of PLDs are interconnected so as to reflect the structure of the computation target (the target problem). The arithmetic device array is made up of small FPGA cards (arithmetic devices), each carrying a PLD such as a large-scale FPGA and equipped with a large number of external IO (input/output) ports, arranged in a grid.

That is, when the computation target has a one-, two-, or three-dimensional structure, adjacent arithmetic devices are connected directly and arranged in one, two, or three dimensions to match that structure, and the numerical values arising in the computation target, or the changes in its data, are transferred directly to the adjacent arithmetic devices. Every arithmetic device therefore only has to transfer data to its adjacent arithmetic devices, the data transfer distance is short, and data transfer is kept from becoming the bottleneck of the apparatus's processing capability. Power consumption is also reduced.

Because an RC-LSI (Reconfigurable LSI) such as an FPGA allows a circuit to be configured to suit the computation target, high-performance computation using dedicated circuits can be obtained at relatively low cost. The embodiments described below use an FPGA as the arithmetic device, but the invention is not limited to this example: any reconfigurable PLD in the broad sense, such as a CPLD (Complex Programmable Logic Device), can be used.

FIG. 3 shows an example of the physical or logical structure of a computation target.
The computation target 6 is, for example, a fluid model or a liquid model, and can have any one-, two-, or three-dimensional structure. Specific computation targets for which demand has been raised include: in astronomy and physics, the ongoing ALMA project and gravitational wave detection, as well as reactor design for next-generation fusion demonstration reactors; in high-energy physics, simulations predicting the internal reactions of next-generation accelerators; in earth science, various visualization simulations (specifically, visualization of fault models and the like); and in meteorology, expanded simulation for medium- and long-range forecasting. In aerospace, demand also comes from areas such as the functional design of satellites, simulation for the International Space Station, construction of a future lunar base, and risk simulation for crewed Mars exploration.

For example, one equation widely used in electromagnetism and semiconductor engineering is the Poisson equation, an elliptic partial differential equation. Solving it by the finite difference method requires solving a system of simultaneous equations with as many unknowns as there are grid points in the region. The grid spacing can be chosen freely, but it determines the accuracy, so the question is how efficiently a large system of simultaneous equations can be solved. Narrowing the grid spacing over the region of interest increases the number of grid points and therefore calls for large-scale computation.
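To make the link between grid points and unknowns concrete, the following sketch solves a one-dimensional Poisson problem by finite differences. It is an illustration only: the patent does not name a solver, so Jacobi iteration, the function name `solve_poisson_1d`, the boundary conditions, and the test problem are all assumptions chosen for brevity. Each interior grid point contributes one equation, and each update uses only the two neighbouring points.

```python
def solve_poisson_1d(f, n, iterations=2000):
    """Jacobi iteration for u''(x) = -f(x) on (0, 1) with u(0) = u(1) = 0.

    n is the number of interior grid points, i.e. the number of unknowns.
    Returns n + 2 values, including the two boundary values.
    """
    h = 1.0 / (n + 1)                      # grid spacing
    u = [0.0] * (n + 2)
    for _ in range(iterations):
        new = u[:]
        for i in range(1, n + 1):
            # Central difference (u[i-1] - 2u[i] + u[i+1]) / h^2 = -f(x_i),
            # rearranged for u[i].
            new[i] = 0.5 * (u[i - 1] + u[i + 1] + h * h * f(i * h))
        u = new
    return u


# Test problem (assumed): f = 2 has the exact solution u(x) = x(1 - x).
n = 9
u = solve_poisson_1d(lambda x: 2.0, n)
h = 1.0 / (n + 1)
max_error = max(abs(u[i] - (i * h) * (1 - i * h)) for i in range(n + 2))
print(max_error)
```

Halving the grid spacing roughly doubles the number of unknowns in one dimension and multiplies it by about eight in three dimensions, which is the growth in computation the paragraph above refers to.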

The computation target 6 shown in FIG. 3 is an example of a three-dimensional structure. To process it with the information processing apparatus of the present invention, the computation target 6 is divided into a grid of problem regions 6-1. In this example it is divided into 4 (X direction) × 3 (Y direction) × 3 (Z direction) = 36 problem regions. Then, as shown in FIG. 4, a plurality of arithmetic devices 40-1 are arranged so that each corresponds to one problem region 6-1 of the computation target 6, forming the arithmetic device array 40. For example, the computation for the individual problem region 6-1 shown hatched is handled by the arithmetic device 40-1 at the corresponding position, also shown hatched. In this example, the arithmetic device array 40 therefore consists of 36 arithmetic devices 40-1, the same number as the problem regions 6-1.
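The one-to-one mapping between problem regions and arithmetic devices in the 4 × 3 × 3 example can be sketched as follows. The coordinate scheme and the names `device_coords` and `neighbours` are hypothetical, and the sketch assumes face-adjacent neighbours only, with no wrap-around at the edges of the array.

```python
from itertools import product

GRID = (4, 3, 3)  # problem regions along X, Y, Z (from the example in the text)


def device_coords(grid=GRID):
    """One arithmetic device per problem region, identified by (x, y, z)."""
    return list(product(*(range(n) for n in grid)))


def neighbours(coord, grid=GRID):
    """Face-adjacent devices of coord (assumption: no wrap-around)."""
    result = []
    for axis in range(3):
        for step in (-1, 1):
            c = list(coord)
            c[axis] += step
            if 0 <= c[axis] < grid[axis]:
                result.append(tuple(c))
    return result


coords = device_coords()
# Each direct link joins two devices, so halve the summed neighbour counts.
link_count = sum(len(neighbours(c)) for c in coords) // 2
print(len(coords), link_count)
```

A corner device has three neighbours and an interior device has six, so the number of direct links grows together with the number of devices rather than being fixed by a shared bus.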

Each problem region interacts with its adjacent problem regions, and physical quantities propagate into those adjacent regions, so the corresponding data transfer takes place between adjacent arithmetic devices in step with the physical phenomenon. Specifically, within one problem region a large number of data items to be computed sit on the grid points 6-1A, 6-1B, 6-1C, and so on, and when the computation by the corresponding arithmetic device reaches the region boundary, the result is transferred from that arithmetic device to the adjacent arithmetic device.
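The boundary transfer described above can be illustrated by splitting a one-dimensional array of grid points between two simulated arithmetic devices. This is a sketch under stated assumptions: the smoothing rule, the fixed outer boundary value of 0.0, and the function names `step`, `run_split`, and `run_single` are hypothetical, and the two devices here exchange their edge values on every step. The point is that exchanging only the boundary values with the neighbour reproduces the result of computing the whole region in one place.

```python
def step(values, left_ghost, right_ghost):
    """One smoothing step: each point becomes the mean of its two neighbours."""
    padded = [left_ghost] + values + [right_ghost]
    return [0.5 * (padded[i - 1] + padded[i + 1])
            for i in range(1, len(padded) - 1)]


def run_split(data, split, steps):
    """Two devices, each holding one sub-region, exchanging edge values."""
    a, b = data[:split], data[split:]
    for _ in range(steps):
        # Only the values at the region boundary cross the direct link.
        a_edge, b_edge = a[-1], b[0]
        a, b = step(a, 0.0, b_edge), step(b, a_edge, 0.0)
    return a + b


def run_single(data, steps):
    """Reference: the whole region computed in one place."""
    for _ in range(steps):
        data = step(data, 0.0, 0.0)
    return data


data = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0]
split_result = run_split(data, 3, 5)
single_result = run_single(data, 5)
print(split_result == single_result)
```

Each device sends one value per step regardless of how many grid points it holds, which is why the traffic on a link stays small compared with the computation inside a region.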

In this way, the present invention assigns arithmetic devices according to the physical shape and characteristics of the divided problem regions and transfers data directly between adjacent arithmetic devices, thereby maintaining high data processing performance.

In the conventional approach, by contrast, the physical configuration is fixed, data transfer on the communication path becomes the bottleneck, and the arithmetic devices cannot deliver their full performance on the computation target. In the usual case, moreover, data transfer between arithmetic devices becomes a bottleneck and the high-speed performance of the communication path cannot be exploited. Providing the highest-performance communication paths, such as optical cables, to remedy this inevitably makes the whole system expensive.

In the present invention, the communication paths that link the arithmetic devices grow scalably with the computation target, so there is no need for high-performance, high-priced communication paths. Because data is transferred only as far as the adjacent arithmetic device, the processing load on each communication path is small and power consumption can be reduced.

[2. Connection with adjacent arithmetic devices]
As described above, the information processing apparatus of the present invention is configured so that data can be transferred directly between adjacent arithmetic devices. The way adjacent arithmetic devices are connected is described below with reference to FIGS. 5 to 7.

FIG. 5 shows the connector arrangement of the arithmetic device (arithmetic board). FIG. 6 shows how arithmetic devices (arithmetic boards) are interconnected. FIG. 7 is an exploded perspective view of the arithmetic device.

As shown in FIGS. 5 to 7, the arithmetic device 40-1 in this embodiment consists of a pair of boards arranged one above the other: an arithmetic board 50 and a write/input-output board 60 (an example of the write board recited in the claims).
The arithmetic board 50 includes an FPGA 51, into which the arithmetic circuit and related circuits mainly responsible for computation (corresponding to the arithmetic circuit recited in the claims) are written; a memory 52 that functions as main storage; and a substrate 50A on which a plurality of connectors (connection terminals) are mounted. The write/input-output board 60 includes an FPGA 61, into which are written a write circuit for writing the arithmetic circuit and related circuits into the FPGA of the arithmetic board 50, an input/output circuit for data transfer, and the like; a memory 62 that functions as main storage; and a substrate 60A on which a plurality of connectors (connection terminals) are mounted. The connectors are provided at predetermined positions along the sides of the substrate 50A of the arithmetic board 50 and the substrate 60A of the write/input-output board 60. The functions of the arithmetic board 50 and the write/input-output board 60 are described in detail later.

The arithmetic boards 50 can be connected to one another through these connectors. For example, as shown in FIG. 6, an arithmetic board 50 becomes able to transfer data directly to the arithmetic board 50 in front of it when its front-connection connector 53F is coupled directly to the rear-connection connector 53B of that board. In the same way, the rear-connection connector 53B, left-connection connector 53L, right-connection connector 53R, upper-connection connector 53U, and lower-connection connector 53D connect the board for direct data transfer to the arithmetic boards 50 located behind, to the left of, to the right of, above, and below it, respectively.
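The connector pairing described above can be written down as a small lookup table. The sketch is illustrative only: the text states the 53F-to-53B pairing explicitly, while the left/right and upper/lower pairings are inferred from "in the same way"; the axis orientation of the offsets and the function name `mate` are assumptions.

```python
# connector -> (offset to the neighbouring board, connector it mates with)
# Assumed axes: +x is right, +y is front, +z is up.
CONNECTORS = {
    "53F": ((0, 1, 0), "53B"),    # front connector meets the rear connector
    "53B": ((0, -1, 0), "53F"),
    "53L": ((-1, 0, 0), "53R"),
    "53R": ((1, 0, 0), "53L"),
    "53U": ((0, 0, 1), "53D"),
    "53D": ((0, 0, -1), "53U"),
}


def mate(position, connector):
    """Return the neighbouring board's position and the mating connector."""
    offset, partner = CONNECTORS[connector]
    neighbour = tuple(p + o for p, o in zip(position, offset))
    return neighbour, partner


front_neighbour = mate((1, 1, 1), "53F")
print(front_neighbour)
```

Following a connector to the neighbouring board and then following the mating connector back returns to the starting board, which is the symmetry that lets identical boards be tiled in every direction.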

The arithmetic board 50 also has a connector 54 for connecting to the write/input-output board 60. Coupling the connector 54 directly to the connector 64 of the corresponding write/input-output board 60 enables direct data transfer between the two boards.

For connections in the front, rear, left, and right directions, the arithmetic boards 50 are coupled connector to connector, which enables high-speed transfer. The upper and lower terminals among the adjacent-connection terminals, that is, the upper and lower connectors, are laid out at the same corresponding location so that the boards can be stacked. In view of the waste heat from each board, however, it is preferable to connect the upper and lower terminals through a cable rather than directly.

In the substrate 60A of the write/input-output board 60, a notch 60A1 matching the shape of the connector 53D is formed at the position corresponding to the lower connection connector 53D of the corresponding arithmetic board 50. The side of the write/input-output board 60 in which the notch 60A1 is formed is shorter than the side on which the connectors 63B and 63F are provided. Through this notch 60A1, the lower connection connector 53D of the arithmetic board 50 can pass through the substrate 60A of the corresponding write/input-output board 60 and connect to the arithmetic board 50 placed below it.

When the notch 60A1 is formed in the write/input-output board 60, the side on which the connectors 63B and 63F are provided is short, so the area of the substrate 60A is smaller, which saves material and reduces cost. Since it is sufficient that the connector 53D of the arithmetic board 50 can pass through the write/input-output board 60, a through hole may be provided instead of the notch 60A1, for example in a write/input-output board 60 that has the same area and shape as the arithmetic board 50.

In the example shown in FIG. 7, the connectors 53F, 53R, and 54 are male connectors whose core-wire contacts are convex, and the connectors 53B, 53L, and 64 are female connectors whose core-wire contacts are concave; however, the configuration is not limited to this example.

[3. Overview of the information processing apparatus]
Next, an overview is given of an information processing apparatus that uses the arithmetic device array configured as described above.

As already stated, the information processing apparatus of the present invention is configured by applying the semiconductor circuit control device described in Patent Document 1. That is, the present invention is not concerned with the internal configuration of a PLD such as an FPGA, but uses RC-LSIs whose circuits (so-called "virtual circuits") can be rewritten dynamically. A plurality of RC-LSIs (corresponding to arithmetic devices) are arranged to match the structure of the calculation target, and adjacent RC-LSIs are directly connected to each other to form an arithmetic device array, so that data can be transferred directly between RC-LSIs without using a bus.

With this configuration, as in the semiconductor circuit control device described in Patent Document 1, a hardware object model in which a virtual circuit is wrapped in an object looks the same from the outside as an ordinary software object and can be used freely in a program. A hardware object library is prepared, following the approach of deriving from objects in an object library. Processing suited to dedicated circuits, such as image recognition and speech recognition, where parallelism matters, and processing that requires constant monitoring, is read from the library into an RC-LSI as a hardware object and executed there. Because a hardware object is a circuit, however, timing and synchronization must be controlled on the basis of a predetermined synchronization signal. In addition, memory and RC-LSIs are provided in the arithmetic devices so that as many objects wrapping virtual circuits as the calculation target requires can be secured.

FIG. 8 is a diagram showing an overview of an information processing apparatus according to an embodiment of the present invention.
The information processing apparatus according to the present embodiment includes, for example, a host processor 11 such as a CPU, a memory 12 that provides the real memory space, and PLDs 13 that provide the virtual circuit space.

The application (application program) 21 contains, in addition to objects implemented in software, hardware objects 26 that include circuits realizing optimum performance. In the computer that executes the application 21, the software portion of the application is placed in the memory (memory elements) 12 that realizes the memory space, and the memory is connected by a system bus to the host processor 11 that executes the software. A standard bus to which a plurality of PLDs 13 are connected is also connected to the host processor 11. These buses are collectively referred to as the bus 14.

The PLD 13 is a real hardware component: when the application has been started and a hardware object 26 is activated during operation, the hardware net 29 that forms the circuit portion of that object is written into it. As with memory elements, the circuit space is made up of many LSIs. The PLD 13 corresponds to the FPGA mounted on the hardware module 30 described later and to the FPGAs mounted on the arithmetic devices 40-1 constituting the arithmetic device array 40 (see FIG. 7).

When the application 21 is started, it is first placed in the memory 12. As the program runs and a constructor statement that creates a hardware object (hwObject) 26 is executed, the hardware object 26 is placed so that it spans the memory 12 and the PLD 13 (the two parts together form a set). Here, the constructor statements for hwObject-1 and hwObject-2 have been executed and two hardware objects 26 have been created. The application 21 contains hwObject-1 and hwObject-2. A hardware driver (hwDD) 27 and a hardware net 29 exist for each hardware object 26. The hardware driver 27 is created in the memory 12. The hardware net 29, on the other hand, is created in the PLD 13, which is the virtual circuit space, in such a way that data can be transferred directly, and it can be erased.

FIG. 9 is a diagram showing the outline of the control flow between the host processor and a hardware net.
A read or write request to a hardware object 26 is passed down in sequence from the application 21 to the hardware net 29, and processing continues until a response is returned. During this time the hardware nets 29 operate in parallel. The application 21 instructs a hardware object 26 to execute and, without waiting for a reply, goes on to the next processing, for example the execution of another hardware object 26. It then checks whether the processing of the hardware net 29 has finished and reads the values of the member variables of the hardware object 26. In this way, the parallelism of the hardware objects 26, that is, of embedded circuits including arithmetic circuits, can be exploited in a form that matches parallel processing at the level of the application 21.
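The issue-then-poll control flow described above can be imitated in software. The following is a minimal sketch, assuming a worker thread stands in for a hardware net 29; the class `HwObjectSketch` and its methods `execute`, `is_done`, and `read` are hypothetical names, not an interface defined in the source.

```python
from concurrent.futures import ThreadPoolExecutor

# Worker threads play the role of hardware nets that run in parallel.
_pool = ThreadPoolExecutor(max_workers=4)


class HwObjectSketch:
    """Hypothetical stand-in for a hardware object 26."""

    def __init__(self, func):
        self._func = func      # the "circuit" behavior
        self._future = None
        self.result = None     # plays the role of a member variable

    def execute(self, *args):
        # Start the "circuit" and return at once, without waiting for a reply.
        self._future = _pool.submit(self._func, *args)

    def is_done(self):
        # Check whether the hardware net has finished processing.
        return self._future is not None and self._future.done()

    def read(self):
        # Wait for completion if necessary, then read the member variable.
        self.result = self._future.result()
        return self.result


hw1 = HwObjectSketch(sum)
hw2 = HwObjectSketch(max)
hw1.execute([1, 2, 3])   # issue the first object ...
hw2.execute([4, 5, 9])   # ... and move on to the next without waiting
print(hw1.read(), hw2.read())
```

The application-level code issues both executions before reading either result, which is the ordering the paragraph above describes.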

The basic operation of the information processing apparatus is described below.
First, when the application 21 is started, the OS reserves the area required to run the application 21 for its exclusive use and passes control to the application 21. The startup and initialization program of the application 21 brings up the infrastructure needed for communication with the OS, such as the event management and message management used by the program, and for management within the application 21. The management control portion needed to manage the hardware objects 26 is also incorporated into the application at this time.

When a statement that creates a hardware object 26 is executed while the application is running, the hardware object 26 is created in the memory area together with the hardware driver 27, which is its I/O processing part. At the same time, the hardware net 29 is written into the PLD 13 as a circuit and is set up for exclusive use by the hardware object 26. The circuit data of the hardware net 29 can be prepared in advance and registered in a circuit library, for example by describing the operating specification of the hardware net 29 in a behavioral description language and generating the circuit data to be written to the PLD 13 with design automation tools, namely high-level synthesis, logic synthesis, and place-and-route tools. It is also readily conceivable to write the hardware net 29 in advance and simply activate the circuit when the hardware object 26 is created, thereby shortening the circuit write time.

Once created, a hardware object 26 continues to exist until a statement that destroys it is executed or the application 21 terminates. When a statement that reads from or writes to the hardware object 26 is executed while the application 21 is running, and the access targets a member variable or member function associated with the hardware net 29, I/O processing on the hardware net 29 is performed through the hardware driver 27.
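The lifecycle described above (write the net on construction, route member access through a driver, erase on destruction) can be sketched as follows. All class and method names are hypothetical, and the PLD is modeled as a simple list of slots; this is an illustration, not the implementation of the source.

```python
class PldSpaceSketch:
    """Hypothetical model of the virtual circuit space (PLD 13) as a list of slots."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots

    def write_net(self, net_name):
        # Find a free slot and reserve it exclusively for one hardware net.
        for i, slot in enumerate(self.slots):
            if slot is None:
                self.slots[i] = {"name": net_name, "regs": {}}
                return i
        raise RuntimeError("no free PLD slot")

    def erase_net(self, slot):
        self.slots[slot] = None


class HwDriverSketch:
    """Hypothetical I/O part (hardware driver 27) kept in main memory."""

    def __init__(self, pld, slot):
        self.pld, self.slot = pld, slot

    def write(self, reg, value):
        self.pld.slots[self.slot]["regs"][reg] = value

    def read(self, reg):
        return self.pld.slots[self.slot]["regs"][reg]


class HwObject:
    """Hypothetical hardware object spanning memory (driver) and PLD (net)."""

    def __init__(self, pld, net_name):
        self._pld = pld
        slot = pld.write_net(net_name)             # net is written on construction
        self._driver = HwDriverSketch(pld, slot)   # driver is created with the object

    def set(self, reg, value):
        self._driver.write(reg, value)             # member access goes through the driver

    def get(self, reg):
        return self._driver.read(reg)

    def destroy(self):
        self._pld.erase_net(self._driver.slot)     # net is erased on destruction

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self.destroy()
```

Python has no deterministic destructor, so the sketch uses a context manager (`with HwObject(...) as h:`) in place of the destructor statement mentioned in the text.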

[4. Overall configuration of the information processing apparatus]
FIG. 10 is a diagram showing the overall configuration of the information processing apparatus.
The information processing apparatus according to the present embodiment includes, for example, a host processor 11, a memory 12 serving as main storage, a hardware module 30 that is a virtual circuit space, and an arithmetic device array 40 that performs calculations on each problem region of the calculation target.
The arithmetic device array 40 is a kind of hardware module. It has a three-dimensional structure in the example shown in FIG. 4, but is drawn here as a two-dimensional structure for convenience of explanation. The host processor 11, the memory 12, and the hardware module 30 can transfer data at high speed over a bus 14 of a predetermined standard (such as the PCI standard).

As shown in the figure, most of the software program resides in the memory 12 and is executed by the host processor 11. Circuits for various kinds of control and processing are virtualized and, as hardware nets (hwNet), are written temporarily, at the point when the host processor 11 needs them, into a hardware component called the hardware module 30 placed on the bus 14. The virtual circuit written into the hardware module 30 then writes virtual circuits such as arithmetic circuits into the FPGA in each arithmetic device 40-1 of the arithmetic device array 40. The virtual circuits written into the hardware module 30 and into each arithmetic device 40-1 are erased, or initialized for reuse, when they are no longer needed.

The details of this creation and erasure procedure are described later; as with software objects, it is carried out with constructor and destructor operators. The hardware module 30 and the arithmetic device array 40 are recognized as memory-type devices, and reads and writes to a hardware object 26 are performed in the same way as for an object created in the memory 12. Just as objects can be created freely in the memory space, the hardware module 30 and the arithmetic device array 40 provide, as real components, a virtual circuit space in which hardware objects 26 can be created freely. For this reason, the arithmetic device array 40 in particular is provided in advance with a large number of arithmetic devices 40-1 so that it spans a virtual circuit space large enough for the calculation target. The computation model based on hardware objects 26 treats a virtual circuit as an object and provides an interface for embedding hardware nets 29 in the application 21.

In the present embodiment, the hardware module 30, a hardware board as shown in FIG. 10, is introduced as one element of a reconfigurable system. The hardware module 30 is configured, for example, by connecting a standard bus interface (BI) 31, a local memory (LM) 32, a local processor (LP) 33, an FPGA 34, and an input/output interface 35 so that data can be transferred among them.

The LP 33 is composed of an RC-LSI such as an FPGA and controls the inside of the hardware module 30. That is, the LP 33, using the LM 32, writes a hardware net (an example of a write circuit for writing virtual circuit data) into the FPGA 34 by the JTAG method. The FPGA 34 writes hardware nets 29 (virtual circuits such as arithmetic circuits) into the FPGA in each arithmetic device 40-1 of the arithmetic device array 40 through the input/output interface 35 and the communication path 36. Each of the virtual circuits written into the arithmetic devices 40-1 of the arithmetic device array 40 functions as a hardware net 29.

The BI 31 controls communication between the host processor 11 and the hardware nets 29 (FPGA 34, arithmetic device array 40) according to control commands from the hardware driver described later. The hardware module 30 (including the arithmetic device array 40) is recognized as a memory device and is incorporated into the information processing apparatus (computer) by connecting it to the bus 14 (for example, PCI) in the same manner as memory. At this time, the OS installs a device driver called the hardware module driver for each arithmetic device 40-1 of the arithmetic device array 40 and allocates a memory area. Just as many memory chips form a uniform, flat memory space, the RC-LSIs in the many hardware modules (arithmetic devices 40-1) form a virtual circuit space.

The LP 33 and the FPGA 34 also function as an input/output circuit that transfers data sent from the host processor 11 to the arithmetic device array 40 and transfers data sent from the arithmetic device array 40 to the host processor 11.

By transferring virtual circuit data from the host processor 11 to the FPGA of each arithmetic device 40-1 of the arithmetic device array 40 through the hardware module 30 shown in FIG. 10, a hardware net 29 can be placed in each arithmetic device 40-1. Details are given later, but as an example, the output of the first net, hwNet-1, written into the FPGA 51 of arithmetic device "00" is transferred to arithmetic device "01", where it is read by hwNet-2 written into the FPGA 51 of arithmetic device "01". In this way hwNet-1 and hwNet-2 (see FIG. 8) operate simultaneously and perform pipeline processing.
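The two-stage pipeline between hwNet-1 and hwNet-2 can be imitated with two threads joined by a queue. This is a software analogy under stated assumptions: the queues stand in for the direct link between arithmetic devices "00" and "01", and `stage` and `run_pipeline` are hypothetical names.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the data stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: read items, apply func, pass results downstream."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)   # propagate the end marker
            return
        outbox.put(func(item))


def run_pipeline(data, f1, f2):
    """Feed data through two concurrently running stages (f1 then f2)."""
    q_in, q_mid, q_out = queue.Queue(), queue.Queue(), queue.Queue()
    t1 = threading.Thread(target=stage, args=(f1, q_in, q_mid))   # role of hwNet-1
    t2 = threading.Thread(target=stage, args=(f2, q_mid, q_out))  # role of hwNet-2
    t1.start()
    t2.start()
    for x in data:
        q_in.put(x)
    q_in.put(_DONE)
    results = []
    while True:
        item = q_out.get()
        if item is _DONE:
            break
        results.append(item)
    t1.join()
    t2.join()
    return results
```

While the second stage processes one item, the first stage can already work on the next, which is the overlap that the pipeline processing in the text relies on.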

The LP 33 also reduces the load on the host processor 11. Specifically, a frequently used hardware net 29 can be written into the FPGA 51 of each arithmetic device 40-1 simply by sending a write command to the LP 33 through the BI 31.

[5. Configuration of the host processor]
FIG. 11 shows the hierarchical configuration of the host processor.
The host processor 11 includes applications 21, an object manager 22, a hardware module driver 23, an OS 24, and a bus 25. The example shows three applications 21, but any suitable number of applications may be present. Each application 21 has one or more sets, each consisting of a hardware object 26, a hardware driver 27, and an interface 28 that controls input and output between the application 21 and the hardware driver 27.

Each hardware driver 27 is defined for one hardware net 29 and controls the input and output operations of that hardware net 29 (through the hardware module 30). The hardware driver 27 incorporates, for example, hwNet terminal information; hwNet control information such as write, read, enable, and output enable; and communication control information such as the hwObject number, hwNet number, hwModule (arithmetic device) number, assigned PLD number (unnecessary when arithmetic devices and hwModules correspond one to one), hwNet assigned terminal number, local memory allocation address, local memory allocation size, hwNet state, hwNet command, hwNet communication area address in main memory, hwNet communication area size, and hwNet communication area current address. This information is stored in the hardware net library together with the circuit information of the hardware net. When a hardware object 26 is created and its hardware net 29 is loaded, the hwModule (arithmetic device) number, hwNet number, PLD number, and so on are obtained, and the hardware driver 27 is created as part of the hardware object 26.
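The information held by a hardware driver, as listed above, could be grouped into a single record. The sketch below is one possible layout; the field names, types, and default values are assumptions chosen for illustration, since the source lists only the kinds of information.

```python
from dataclasses import dataclass, field
from typing import Optional


def _default_control():
    # hwNet control information named in the text; all flags start cleared.
    return {"write": False, "read": False, "enable": False, "output_enable": False}


@dataclass
class HwDriverInfo:
    """Hypothetical record of the information incorporated in a hardware driver 27."""
    hw_object_no: int
    hw_net_no: int
    hw_module_no: int                   # hwModule (arithmetic device) number
    pld_no: Optional[int] = None        # unnecessary when devices and hwModules map one to one
    terminal_no: int = 0                # hwNet assigned terminal number
    local_mem_addr: int = 0             # local memory allocation address
    local_mem_size: int = 0             # local memory allocation size
    state: str = "idle"                 # hwNet state (value set is an assumption)
    command: str = ""                   # hwNet command
    comm_area_addr: int = 0             # hwNet communication area address in main memory
    comm_area_size: int = 0             # hwNet communication area size
    comm_area_current: int = 0          # hwNet communication area current address
    control: dict = field(default_factory=_default_control)
```

For example, `HwDriverInfo(hw_object_no=1, hw_net_no=2, hw_module_no=0)` leaves `pld_no` unset, matching the one-to-one case mentioned in the text.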

When the hardware module 30 and the arithmetic device array 40 are installed in the information processing apparatus (computer), the hardware module driver 23 shown in the figure is registered permanently with the OS 24 as a device driver on the basis of the supplied device information. The hardware module driver 23 controls communication with the hardware module 30 connected to the bus of the computer. For example, when a PCI bus is used as the bus 14, PCI device information and the like are assigned by the OS. The hardware module driver 23 is always loaded into the OS in advance whenever the information processing apparatus starts up. In contrast, the hardware driver 27 of a hardware object 26 is incorporated into the hardware module driver 23 only while its hardware net 29 exists, and the OS 24 is never aware of the hardware driver 27. The hardware object 26, for its part, accesses the hardware net 29 by treating the hardware driver 27 as if it were a device driver built into the OS. In doing so, it does not need to consider which hardware module 30 or which part of the arithmetic device array 40 is being accessed.
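The split between a permanently registered module driver and per-net drivers that come and go can be sketched as a small registry. The class `HwModuleDriverSketch` and its `attach`, `detach`, and `io` methods are hypothetical; the source does not define such an interface.

```python
class HwModuleDriverSketch:
    """Hypothetical model of the hardware module driver 23.

    It stays registered for the lifetime of the system, while the
    per-net handlers (role of hardware drivers 27) are attached only
    while the corresponding hardware net exists.
    """

    def __init__(self):
        self._net_drivers = {}   # hwNet number -> callable handling I/O requests

    def attach(self, hw_net_no, handler):
        # Called when a hardware net is loaded.
        self._net_drivers[hw_net_no] = handler

    def detach(self, hw_net_no):
        # Called when the hardware net is erased.
        self._net_drivers.pop(hw_net_no, None)

    def io(self, hw_net_no, request):
        # Route a request to the handler of the addressed hardware net.
        if hw_net_no not in self._net_drivers:
            raise LookupError(f"no hardware net {hw_net_no} is loaded")
        return self._net_drivers[hw_net_no](request)
```

In this model the caller names only the hardware net number; which physical module serves the request is hidden behind the registry, in line with the last sentence of the paragraph above.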

Thus, in the present invention, the OS 24 sees the hardware module driver 23 as a permanently installed device driver that it can control and monitor stably, so system stability can be ensured. From the side of the application 21, a hardware driver 27 is allocated only when it is needed, which greatly increases the freedom with which hardware nets 29 can be used.

[6. Configuration of the arithmetic device (arithmetic board, write/input-output board)]
FIG. 12 is a block diagram showing the functions of the arithmetic device (arithmetic board and write/input-output board) after the virtual circuits have been written. The arithmetic board 50 and the write/input-output board 60 are described below in that order with reference to FIG. 12.

The arithmetic board 50 includes at least an FPGA 51, in which a virtual circuit space is formed, and a memory 52, in which a memory space is formed. The memory 52 temporarily stores the virtual circuit data and calculation target data that correspond to the problem region of the calculation target and are sent from the host processor 11 through the hardware module 30, as well as the calculation result data produced by each circuit. The memory 52 may also temporarily hold information not needed for the current calculation, such as information on lattice points of a problem region that are not currently being calculated.

The internal structure of an FPGA is well known. For example, many logic blocks, each made up of a combinational circuit with about four inputs that can implement arbitrary logic and a sequential circuit, are arranged in a grid, the wiring between them is connected through simple switch blocks, and a virtual circuit with the desired function is configured by changing the switch settings. The logic blocks in which the virtual circuit has been configured then receive data and output calculation results through the input/output blocks (I/O).

Into the FPGA 51 of the present embodiment are written circuits including a memory control circuit 55, a parallel arithmetic circuit 56 (an example of an arithmetic circuit), a calculation target data input/output circuit 57, a calculation result data input/output circuit 58, and adjacent-FPGA data transfer circuits 59X, 59Y, and 59Z. All of these circuits, including the parallel arithmetic circuit 56, are written by the FPGA 61 mounted on the write/input-output board 60. Hereinafter, these circuits, including the arithmetic circuit, are collectively referred to as "arithmetic circuits and the like".

The memory control circuit 55 reads and writes the various data stored in the memory 52 and thereby implements data transfer between the memory 52 on the arithmetic board 50 and each virtual circuit.

The parallel arithmetic circuit 56 includes a plurality of logic blocks 56A, 56B, and so on, and performs the calculation corresponding to the problem region of the calculation target. For example, when the calculation target is a physical problem in a continuous system, a parallel circuit is configured that solves the simultaneous equations obtained by discretizing the differential equations defined at the lattice points of the problem region. When the calculation target is represented by a multi-branch or graph structure, a parallel circuit is configured that performs the calculation while following the branches between nodes.

When the calculation target is three-dimensional, the physical quantity f at a lattice point (i, j, k) in the problem region is expressed by the following equation, so that an equation given by the physical quantities on the adjacent lattice points can be defined.
f(i, j, k) = F(i-1, j-1, k-1, i, j, k, i+1, j+1, k+1)
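To make the neighbor dependence of the equation above concrete, the following sketch applies an update function F to every interior lattice point of a small 3-D grid. The source does not specify F; the six-neighbor average used here is purely an assumed example, and the function names are hypothetical.

```python
def stencil_step(f, F):
    """Return a new grid in which each interior point is replaced by F(f, i, j, k).

    f is a 3-D nested list; boundary points are copied unchanged.
    """
    nx, ny, nz = len(f), len(f[0]), len(f[0][0])
    new = [[[f[i][j][k] for k in range(nz)] for j in range(ny)] for i in range(nx)]
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            for k in range(1, nz - 1):
                new[i][j][k] = F(f, i, j, k)
    return new


def six_neighbor_average(f, i, j, k):
    """Assumed example of F: the mean of the six face-adjacent lattice points."""
    return (f[i - 1][j][k] + f[i + 1][j][k]
            + f[i][j - 1][k] + f[i][j + 1][k]
            + f[i][j][k - 1] + f[i][j][k + 1]) / 6.0
```

Each interior update reads only adjacent lattice points, so different points can be computed at the same time, which is what a parallel circuit exploits.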

Let N be the number of lattice points along one side of the problem region. The number of lattice points inside the region grows with the cube of N, whereas the number on the boundary surfaces grows with the square of N, so the amount of computation is proportional to N^3 for the lattice points and to 6N^2 for the boundary surfaces (six faces). The ratio of the two is therefore lattice points / boundary surfaces = N^3 / (6N^2) = N/6. To balance the computing resources against the amount of computation, the number of parallel arithmetic circuits for lattice points should be at least N/6 times the number for boundary surfaces. For example, if the logic blocks 56A handle the lattice points inside the region and the logic blocks 56B handle the boundary surfaces, the number of logic blocks 56A is set to at least N/6 times the number of logic blocks 56B.
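As a worked example of this ratio, the following sketch evaluates both workloads for a given N. The function name `block_ratio` and the sample value N = 60 are assumptions chosen for illustration.

```python
def block_ratio(n):
    """Return (interior work, boundary work, ratio) for n lattice points per side.

    Interior work is taken as n**3 and boundary work as 6 * n**2,
    following the proportionalities stated in the text.
    """
    interior = n ** 3
    boundary = 6 * n ** 2
    return interior, boundary, interior / boundary


interior, boundary, ratio = block_ratio(60)
print(interior, boundary, ratio)  # 216000 21600 10.0
```

With N = 60 the ratio is 10, so under this estimate roughly ten logic blocks for interior lattice points would be provided per logic block for the boundary surfaces.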

The calculation target data input/output circuit 57 receives, through the write/input-output board 60, the calculation target data sent from the hardware module 30 under the control of the host processor 11. The received calculation target data is stored in the memory 52 through the memory control circuit 55 and used for processing by the parallel arithmetic circuit 56 and other circuits. The calculation target data includes the parameters used in the calculation, such as the range of the calculation target, the problem region, the transfer destination of the calculation results, and the calculation conditions.

The parameters of the calculation target data (calculation conditions, calculation target, problem region, and so on) may change while the data is still being stored in the memory 52, that is, while it is still being sent from the host processor 11. In that case, the calculation target data input/output circuit 57 may attach to the calculation target data stored so far in the memory 52 the address information indicating where that data is stored in the memory 52, and send it to the host processor 11. The host processor 11 can then compare the updated calculation target data for each arithmetic device with the calculation target data returned from that arithmetic device and resend to the arithmetic device only the data that has actually changed, which saves communication path resources and shortens the time required for retransmission.
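The host-side comparison described above amounts to a difference between two address-keyed tables. A minimal sketch, assuming the data is represented as a dictionary from memory address to value; the function name `changed_entries` and this representation are illustrative assumptions.

```python
def changed_entries(updated, echoed):
    """Return the entries of `updated` that differ from what the device returned.

    updated: {address: value} after the parameter change (host side)
    echoed:  {address: value} as stored so far on the arithmetic device
    Addresses missing from `echoed` were never stored and must be sent as well.
    """
    return {addr: val for addr, val in updated.items()
            if addr not in echoed or echoed[addr] != val}


updated = {0x00: 1.0, 0x04: 2.5, 0x08: 3.0}
echoed = {0x00: 1.0, 0x04: 2.0}
to_resend = changed_entries(updated, echoed)  # only 0x04 and 0x08 need resending
```

Only the entries in `to_resend` would travel over the communication path again, which is where the saving described in the text comes from.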

The calculation result data input/output circuit 58 stores in the memory 52, via the memory control circuit 55, the calculation result data obtained after the parallel arithmetic circuit 56 and other circuits of the arithmetic board 50 have finished calculating, as well as data such as the parameters used in the calculation, and also sends such data to the corresponding write/input-output board 60. In addition to the result computed by the parallel arithmetic circuit 56, the calculation result data includes the number of the arithmetic board (arithmetic device) and the information on each grid point of the problem area together with the corresponding calculation result data. The calculation result data is transferred to a predetermined adjacent arithmetic board according to the "calculation result transfer destination" contained in the calculation target data.

A calculation may be stopped partway through, for example because a parameter of the calculation target data is changed during the calculation or because the simulation method is changed. In that case, the calculation result data input/output circuit 58 may append, to the calculation result data that has been partially stored in the memory 52, the address information indicating where that data is stored in the memory 52, and send the result to the host processor 11. The host processor 11 can then analyze the calculation target using the partial results from each arithmetic device. The partial results may also be read back from the host processor 11 (memory 12). Because the results computed so far can be reused, only the portions that newly require calculation need to be computed, which shortens the calculation time.

The adjacent-FPGA data transfer circuit 59X exchanges data, via the connectors, with the arithmetic boards 50 adjacent in the X direction (front and rear) and passes the data to the parallel arithmetic circuit 56. So that data can be sent and received across region boundaries, the data exchanged with an adjacent arithmetic board 50 includes, in addition to the calculation result of that board, the number of the source arithmetic board (arithmetic device) and the information on each grid point together with the corresponding calculation result data. Likewise, the adjacent-FPGA data transfer circuit 59Y exchanges data with the arithmetic boards 50 adjacent in the Y direction (left and right), and the adjacent-FPGA data transfer circuit 59Z exchanges data with the arithmetic boards 50 adjacent in the Z direction (up and down). Data transfers from the adjacent-FPGA data transfer circuits 59X, 59Y, and 59Z to the adjacent arithmetic devices 40-1 are performed with the same timing in all arithmetic devices 40-1, based on the clock signal (CLK) sent from the host processor 11 via the hardware module 30.

Next, the write/input-output board 60 is described. To connect adjacent arithmetic boards 50 to one another as shown in FIG. 6 and to write arithmetic circuits and the like into the rewritable semiconductor devices, a write circuit that performs the writing is required.

The write/input-output board 60 includes at least an FPGA 61 in which a virtual circuit space is formed and a memory 62 in which a memory space is formed. The memory 62 temporarily stores the virtual circuit data and calculation target data that correspond to the problem areas of the calculation target and are sent from the host processor 11 via the hardware module 30, as well as the calculation result data sent from the arithmetic board 50 and the calculation result data of the individual circuits.

In the present embodiment, a memory control circuit 65, a write circuit 66 (an example of a virtual circuit data write circuit), a virtual circuit data input/output circuit 67, and a various-data input/output circuit 68 are written into the FPGA 61 under the control of the FPGA 34 of the hardware module 30. Hereinafter, the various-data input/output circuit 68 is sometimes included in the term "arithmetic circuits and the like".

The memory control circuit 65 reads and writes the various data stored in the memory 62 and implements data transfer between the memory 62 on the write/input-output board 60 and each virtual circuit.

The write circuit 66 writes the virtual circuits into the FPGA 51 of the arithmetic board 50 based on the virtual circuit data that corresponds to the problem areas of the calculation target and is sent from the hardware module 30 under the control of the host processor 11. Depending on the instructions from the host processor 11, the virtual circuit written into the FPGA 51 of each arithmetic board 50 may differ from one problem area to another, or the logical configuration of the virtual circuit may be the same everywhere, with a different calculation result obtained for each problem area by changing the calculation target parameters.

The virtual circuit data input/output circuit 67 receives the virtual circuit data sent from the hardware module 30 and controls the transfer of that data to the designated write/input-output board 60.

The various-data input/output circuit 68 receives the virtual circuit data and calculation target data sent from the hardware module 30, stores them in the memory 62 via the memory control circuit 65, and transfers them to the corresponding arithmetic board 50. It also sends the calculation result data for the problem area, the calculation result data of the individual circuits, and similar data received from the corresponding arithmetic board 50 to the host processor 11 via the hardware module 30.

The write circuit 66 and the virtual circuit data input/output circuit 67 of the write/input-output board 60 may be erased once the arithmetic circuits and the like have been written to the corresponding arithmetic board 50. Doing so saves resources of the FPGA 61 of the write/input-output board 60. Of course, if the FPGA 61 still has resources to spare after the various-data input/output circuit 68 has been written, the write circuit 66 and the virtual circuit data input/output circuit 67 may be left in place.

[7. Arithmetic processing by the arithmetic device array]
Next, an overview of the arithmetic processing is given, taking the arithmetic device array 40 shown in FIG. 10 as an example.
As a premise, assume that virtual circuits, which are hwNet29 (see FIG. 12), have already been written to the arithmetic board 50 and the write/input-output board 60 of each arithmetic device 40-1 in the arithmetic device array 40, according to the problem area of the calculation target that each device is responsible for. The order in which the calculations are performed is determined by the physical phenomenon, characteristics, and so on of the calculation target. Here, assume that the calculation starts at arithmetic devices "00", "10", and "20", proceeds in order in the horizontal direction, and ends at the terminal arithmetic devices "03", "13", and "23".

In the arithmetic board 50 of each of the twelve arithmetic devices "00" through "23" that make up the arithmetic device array 40, the initial values of the calculation target parameters for the problem area assigned to that device are input and temporarily stored in the memory 52. The memory 52 also stores the "calculation result transfer destination" information that tells the device where to transfer its results.

The parallel arithmetic circuits 56 (see FIG. 12) on the arithmetic boards 50 of arithmetic devices "00", "10", and "20" start calculating for their respective problem areas, either on receiving an instruction from the host processor 11 via the hardware module 30 or at a timing triggered by a predetermined signal or condition. Each generated set of calculation result data is stored in the respective memory 52.

Next, arithmetic devices "00", "10", and "20" transfer the calculation result data stored in their memories 52, in synchronization with the clock signal (CLK) issued by the host processor 11 (or by a clock generator, not shown), from the adjacent-FPGA data transfer circuit 59X (see FIG. 12) via the connector 53R (see FIG. 7) to arithmetic devices "01", "11", and "21", which are their respective "calculation result transfer destinations".

In arithmetic devices "01", "11", and "21", each adjacent-FPGA data transfer circuit 59X receives the calculation result data from arithmetic devices "00", "10", and "20", respectively, via the connector 53L. The parallel arithmetic circuits 56 of arithmetic devices "01", "11", and "21" then generate new calculation result data from the received calculation result data, the calculation target parameter values, and the calculation results from the previous calculation time point stored in each device, and store the new data in the respective memories 52.

In the same way, arithmetic devices "01", "11", and "21" transfer their calculation result data to arithmetic devices "02", "12", and "22" according to the "calculation result transfer destination" information. In arithmetic devices "02", "12", and "22", the parallel arithmetic circuits 56 generate new calculation result data from the calculation result data received from arithmetic devices "01", "11", and "21", the calculation target parameter values, and the calculation results from the previous calculation time point stored in each device, and store the new data in the respective memories 52.

Arithmetic devices "02", "12", and "22" likewise transfer their calculation result data to arithmetic devices "03", "13", and "23" according to the "calculation result transfer destination" information. In arithmetic devices "03", "13", and "23", the parallel arithmetic circuits 56 generate new calculation result data from the calculation result data received from arithmetic devices "02", "12", and "22", the calculation target parameter values, and the calculation results from the previous calculation time point stored in each device, and store the new data in the respective memories 52.
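The column-by-column hand-off described above can be sketched as a small simulation. This is only an illustration: the update rule (a plain sum of the received result, the local parameter, and the previous local result) is a placeholder assumption, since the text does not specify the arithmetic, and the names `sweep`, `params`, and `previous` are hypothetical.

```python
ROWS, COLS = 3, 4  # arithmetic devices "00" .. "23", as in FIG. 10


def sweep(params, previous):
    """One left-to-right pass; returns the new result held by each device."""
    result = {}
    for col in range(COLS):          # columns are handled in transfer order
        for row in range(ROWS):      # devices in the same column work in parallel
            # Leftmost devices start the calculation and receive nothing.
            received = result[(row, col - 1)] if col > 0 else 0.0
            # Placeholder update rule (an assumption, not taken from the source).
            result[(row, col)] = received + params[(row, col)] + previous[(row, col)]
    return result


params = {(r, c): 1.0 for r in range(ROWS) for c in range(COLS)}
previous = {(r, c): 0.0 for r in range(ROWS) for c in range(COLS)}
new = sweep(params, previous)
```

Each device reads only from its left-hand neighbour, which mirrors the point that no shared bus is involved in these transfers.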

After arithmetic processing has finished in all of arithmetic devices "00" through "23", the calculation result data is sent to the host processor 11. First, the parallel arithmetic circuit 56 on the arithmetic board 50 of each of arithmetic devices "00" through "23", having generated the calculation result data for its problem area, transfers that data from the calculation result data input/output circuit 58 via the connector 54 to the corresponding write/input-output board 60.

On the write/input-output board 60 of each of arithmetic devices "00" through "23", the various-data input/output circuit 68 receives the calculation result data from the corresponding arithmetic board 50 via the connector 64. The various-data input/output circuit 68 then transfers the received calculation result data to the hardware module 30 over the communication path 36, in synchronization with the clock signal (CLK).

In the hardware module 30, the calculation result data received from arithmetic devices "00" through "23" of the arithmetic device array 40 is passed from the FPGA 34 to the local processor 33. The local processor 33 transfers the calculation result data from the standard bus interface 31 to the host processor 11 via the bus 14.

Having acquired the calculation result data for the individual problem areas from the arithmetic devices 40-1 of the arithmetic device array 40, the host processor 11 performs the calculation for the entire calculation target based on the calculation result data and the corresponding arithmetic device numbers. It then analyzes the physical structure, logical structure, characteristics, and so on of the entire calculation target, as shown in FIG. 3, and outputs the result to a display device (not shown).
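As a rough sketch of how a host might combine per-device results using the device number and grid-point information, consider the following. The message layout (a `device` label and a `points` mapping from grid point to value) and the function name `assemble` are hypothetical; the source only states that the results carry the device number and grid-point information.

```python
def assemble(messages):
    """Merge per-device results into one field keyed by grid point."""
    field = {}
    # Process messages in device-number order so the merge is deterministic.
    for msg in sorted(messages, key=lambda m: m["device"]):
        field.update(msg["points"])
    return field


# Hypothetical results arriving out of order from two devices.
messages = [
    {"device": "01", "points": {(0, 2): 0.5, (0, 3): 0.7}},
    {"device": "00", "points": {(0, 0): 0.1, (0, 1): 0.3}},
]
field = assemble(messages)
```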

With an arithmetic device 40-1 that includes the arithmetic board 50 and the write/input-output board 60 configured as described above, the calculation result data from the arithmetic board 50 of any arithmetic device 40-1 can be transferred directly to the arithmetic board 50 of an adjacent arithmetic device 40-1. In other words, data transfers performed for the calculation do not pass through a bus.

A bus (for example, the bus 14) is used only when calculation result data and the like from the arithmetic devices 40-1 of the arithmetic device array 40 are retrieved and sent to the host processor 11, and when virtual circuit data, calculation target data, and the like are received from the host processor 11.
This configuration prevents data transfer between adjacent arithmetic devices 40-1 from becoming a bottleneck for the processing capability of the information processing apparatus.

The transfer of calculation result data from each arithmetic device of the arithmetic device array 40 to the host processor 11 is preferably performed together with the process of storing the calculation result data in memory, once the device has finished calculating its problem area. In that case, calculation result data is transferred to the host processor 11 in order, starting with the devices that finish first, so data congestion is suppressed and the overhead on each communication path, including the bus 14, is reduced.

In an information processing apparatus prototyped by the applicant, an arithmetic device array built from 128 sets of arithmetic devices achieved approximately 0.7 teraflops (TFLOPS).

[8. Write processing to the arithmetic devices]
Next, the process of writing arithmetic circuits and the like to each arithmetic device of the arithmetic device array is described with reference to FIGS. 13 and 14. FIG. 13 is a flowchart of the process of writing arithmetic circuits and the like. FIG. 14 shows the state transitions during that process.

As shown in FIG. 10, the first write circuit resides on the host processor 11, and virtual circuits such as arithmetic circuits are written step by step from the host processor 11 into each arithmetic device 40-1 of the arithmetic device array 40. However, unless a write circuit is configured within the virtual circuit, a virtual circuit cannot be written to the next-stage (adjacent) arithmetic device 40-1. On the other hand, the write circuit is unnecessary during calculation, so it is desirable to delete it once the virtual circuits have been written. It is therefore necessary first to write a write circuit, or a connection circuit for writing, to every arithmetic device 40-1 so that writing is possible, and then to replace these with arithmetic circuits and the like, starting from the terminal arithmetic device 40-1. This processing is performed by software running on the host processor 11, following the flow shown in FIG. 13.

First, the host processor 11 sends the virtual circuit write data, which is based on the calculation target and stored in the memory 12, to the hardware module 30 via the bus 14 and the standard bus interface 31 (step S1). The virtual circuit write data contains at least the "arithmetic device number" corresponding to each problem area of the calculation target, the corresponding "virtual circuit data", and the "write order". At this stage, the arithmetic board 50 and the write/input-output board 60 of each arithmetic device 40-1 are in their initial state, with no virtual circuit written (FIG. 14A).
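The three pieces of information carried in the virtual circuit write data can be pictured as a simple record. The sketch below is an assumption for illustration only: the class name `VirtualCircuitWrite`, the field names, and the byte payloads are hypothetical, and the source does not describe the actual data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualCircuitWrite:
    device_number: str   # arithmetic device label, e.g. "00"
    circuit_data: bytes  # virtual circuit data for that device (placeholder payload)
    write_order: int     # position in the write sequence


# Hypothetical entries, deliberately listed out of order.
entries = [
    VirtualCircuitWrite("10", b"\x02", 1),
    VirtualCircuitWrite("00", b"\x01", 0),
    VirtualCircuitWrite("01", b"\x03", 1),
]

# Entries sharing a write order belong to the same stage of the sequence.
schedule = sorted(entries, key=lambda e: (e.write_order, e.device_number))
```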

The host processor 11 writes the virtual circuit write data to the local processor 33, which is implemented as an FPGA, in the hardware module 30 (step S2).

The local processor 33, now holding the virtual circuit write data, writes that data to the FPGA 34 in the hardware module 30 that controls the arithmetic device array (step S3).

The FPGA 34, now holding the virtual circuit write data, sends that data to a predetermined arithmetic device 40-1 of the arithmetic device array 40 through the input/output interface 35 and the communication path 36 (step S4).

The FPGA 34 reconfigures, in the FPGA 61 of the write/input-output board 60 of the arithmetic device 40-1, the write circuit 66 and the virtual circuit data input/output circuit 67 based on the virtual circuit data. It also reconfigures the various-data input/output circuit 68. The write/input-output board 60 into which the write circuit 66 and the virtual circuit data input/output circuit 67 have been written then transfers the input virtual circuit write data to the write/input-output board 60 of the adjacent arithmetic device 40-1, according to the write order contained in that data (step S5). At this point, the write circuit 66 and the virtual circuit data input/output circuit 67 have been written to the write/input-output board 60 (FIG. 14B).

Taking FIG. 10 as an example, the write circuit 66 and the like are written in an order such as arithmetic device "00" → arithmetic devices "10", "01" → arithmetic devices "20", "11", "22" → and so on. Although the arithmetic devices are arranged two-dimensionally in the example of FIG. 10, the write processing may be carried out in three dimensions if the arithmetic devices are arranged three-dimensionally.

Each time the write circuit 66 and the like are written to a write/input-output board 60, the host processor 11 obtains from that board an end signal (flag) indicating that writing is complete, in order to confirm operation. The host processor 11 then determines whether reconfiguration of the write circuits and the like has finished for the write/input-output boards 60 of all arithmetic devices 40-1 (step S6).

If writing of the write circuit 66 and the like has not finished for all write/input-output boards 60, the process returns to step S5 and the write processing continues.

If, on the other hand, writing has finished for all write/input-output boards 60, the write circuit 66 of the write/input-output board 60 at the end of the arithmetic device array 40 writes virtual circuits such as the parallel arithmetic circuit 56 (see FIG. 12) to the FPGA 51 of the corresponding arithmetic board 50 (step S7).

Then the write circuits 66 of all the write/input-output boards 60 write virtual circuits such as the parallel arithmetic circuit 56 to the FPGAs 51 of their corresponding arithmetic boards 50, according to the write order information contained in the input virtual circuit write data (step S8).

For example, the devices are replaced with the parallel arithmetic circuit 56 in the order arithmetic device "23" → arithmetic devices "22", "13" → arithmetic devices "21", "12", "03" → and so on. Although the arithmetic devices are arranged two-dimensionally in the example of FIG. 10, the write processing may likewise be carried out in three dimensions if the arithmetic devices are arranged three-dimensionally.
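The replacement order listed above is reproduced by grouping devices according to their grid distance from the terminal device "23". The sketch below assumes that rule (Manhattan distance on the 3 x 4 grid of FIG. 10); the source gives only the first three groups, so the later groups and the function name `waves_from` are assumptions.

```python
ROWS, COLS = 3, 4  # arithmetic devices "00" .. "23", as in FIG. 10


def waves_from(start_row, start_col):
    """Group device labels by Manhattan distance from the starting device."""
    waves = {}
    for r in range(ROWS):
        for c in range(COLS):
            distance = abs(r - start_row) + abs(c - start_col)
            waves.setdefault(distance, []).append(f"{r}{c}")
    # Within a group, list higher row numbers first to match the order in the text.
    return [sorted(waves[d], reverse=True) for d in sorted(waves)]


replacement_order = waves_from(2, 3)  # start at the terminal device "23"
```

The first three groups come out as `["23"]`, `["22", "13"]`, and `["21", "12", "03"]`, matching the sequence given in the text.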

Each time the parallel arithmetic circuit 56 and the like are written to an arithmetic board 50, the host processor 11 obtains from that board an end signal (flag) indicating that writing is complete. Based on the end signals, the host processor 11 determines whether writing of the parallel arithmetic circuit 56 and the like has finished for the arithmetic boards 50 of all arithmetic devices 40-1 (step S9).

If writing has not finished for all arithmetic boards 50, the process returns to step S8 and writing of the parallel arithmetic circuit 56 and the like continues.

If, on the other hand, writing has finished for all arithmetic boards 50, the writing of virtual circuits to the arithmetic devices 40-1 of the arithmetic device array 40 ends. At this point, the write circuit 66, the virtual circuit data input/output circuit 67, and the like have been written to the write/input-output boards 60, and the parallel arithmetic circuit 56 and the like have been written to the arithmetic boards 50 (FIG. 14C).

Rewriting one arithmetic device takes approximately 40 ms, so even the entire arithmetic device array can be rewritten in a few seconds.
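A quick check of the "few seconds" figure, assuming the devices are rewritten strictly one after another (a worst-case simplification; the text does not say how much of the rewriting overlaps). The 12-device count is the array of FIG. 10, and the 128-device count is the prototype array mentioned earlier.

```python
REWRITE_MS = 40  # approximate time to rewrite one arithmetic device (from the text)


def total_rewrite_ms(num_devices):
    """Total rewrite time in milliseconds if devices are rewritten sequentially."""
    return num_devices * REWRITE_MS


small = total_rewrite_ms(12)   # 3 x 4 array of FIG. 10
large = total_rewrite_ms(128)  # prototype-sized array
```

Under that assumption the 12-device array takes about 0.48 s and a 128-device array about 5.12 s.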

Because an FPGA is a gate array, its gate resources are limited. In addition, the write circuit is unnecessary during calculation, so it is desirable to delete it once the virtual circuits have been written. Therefore, while the parallel arithmetic circuit 56 and the like are being written to the arithmetic boards 50, the write circuit 66 and the virtual circuit data input/output circuit 67 written in the FPGA 61 of each write/input-output board 60 are erased in order, from the arithmetic device 40-1 to which the virtual circuit data write circuit was written last to the arithmetic device 40-1 to which it was written first. At this point, the various-data input/output circuit 68 remains on the write/input-output boards 60, and the parallel arithmetic circuit 56 and the like remain on the arithmetic boards 50 (FIG. 14D).
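The four states of FIG. 14 (A to D) can be traced with a small model of which circuits each board holds. This is a simplified sketch: the circuit labels and the function name `configure` are hypothetical, and the erasure is modeled as a separate final pass even though the text allows it to proceed alongside the writing of the arithmetic circuits.

```python
import copy

# Circuits that are only needed while virtual circuits are being written.
WRITER = {"write_circuit_66", "vc_data_io_67"}


def configure(device_ids):
    """Return snapshots of every board's contents at stages A, B, C and D."""
    boards = {d: {"io": set(), "arith": set()} for d in device_ids}
    stages = {"A": copy.deepcopy(boards)}          # initial state, nothing written

    for d in device_ids:                           # steps S5-S6, forward order
        boards[d]["io"] |= WRITER | {"data_io_68"}
    stages["B"] = copy.deepcopy(boards)

    for d in reversed(device_ids):                 # steps S7-S9, from the far end
        boards[d]["arith"].add("parallel_circuit_56")
    stages["C"] = copy.deepcopy(boards)

    for d in reversed(device_ids):                 # erase the writer circuits
        boards[d]["io"] -= WRITER
    stages["D"] = copy.deepcopy(boards)
    return stages


stages = configure(["00", "01", "02", "03"])
```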

The write circuit 66 and the virtual circuit data input/output circuit 67 written in the FPGA 61 of each write/input-output board 60 were described above as being erased in order from the arithmetic device 40-1 written last to the arithmetic device 40-1 written first, but the process is not limited to this example. It is sufficient that the host processor 11 can confirm that the writing of the arithmetic circuits and the like operated normally for every arithmetic device 40-1; the write circuit 66 and the virtual circuit data input/output circuit 67 of the write/input-output boards 60 may then be erased in any suitable order.

Furthermore, although the virtual circuit data input/output circuit 67 and the various-data input/output circuit 68 are configured separately in the example above, they may be combined into a single circuit, such as a data transfer circuit.

FIG. 15 shows the state of the arithmetic device array after virtual circuits such as arithmetic circuits have been written.
In this example, the arithmetic board 50-1 of the first arithmetic device is connected to the arithmetic board 50-2 of the adjacent second arithmetic device, and the write/input-output board 60-1 of the first arithmetic device is connected to the write/input-output board 60-2 of the adjacent second arithmetic device. The first arithmetic device is assumed to be connected to the hardware module 30 (on the host processor 11 side).

The calculation result data generated by the parallel arithmetic circuit 56 on the arithmetic board 50-1 of the first arithmetic device is sent from the various-data input/output circuit 68 of the corresponding write/input-output board 60-1 to the hardware module 30, and is transferred from the hardware module 30 to the host processor 11 via the bus 14.

On the other hand, the calculation result data generated by the parallel arithmetic circuit 56 of the arithmetic board 50-2 of the second arithmetic device is first transmitted from the various-data input/output circuit 68 of the corresponding write/input-output board 60-2 to the write/input-output board 60-2 of the second arithmetic device, which is close to the host processor 11 side. The various-data input/output circuit 68 of the write/input-output board 60-2 then transmits the calculation result data of the second arithmetic device to the hardware module 30 via the communication path 36, and the data is transferred from the hardware module 30 to the host processor 11 via the bus 14.

In this example, for convenience of explanation, the transfer of calculation result data between two adjacent arithmetic devices has been described. The underlying technical idea is the same for a larger number of arithmetic devices and for a plurality of arithmetic devices arranged in three dimensions.
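The relay toward the host described above can be pictured with a small sketch. It is only an illustration under stated assumptions: the function name `relay_path`, the numbering in which device 1 is the one nearest the host side, and the rule that each write/input-output board hands the data to the board of the next device toward the host are all hypothetical and are not taken from the specification.

```python
from typing import List


def relay_path(device_index: int) -> List[str]:
    """Return the hops that result data from one arithmetic device takes to the host.

    Assumption: device_index 1 is the arithmetic device nearest the host side.
    """
    if device_index < 1:
        raise ValueError("device_index must be 1 or greater")
    # The result starts on the arithmetic board of the producing device.
    hops = [f"arithmetic board 50-{device_index}"]
    # It is handed to the paired write/input-output board and then relayed,
    # board by board, toward the host side (assumed relay order).
    for i in range(device_index, 0, -1):
        hops.append(f"write/input-output board 60-{i}")
    # The board nearest the host passes the data on to the host processor.
    hops += ["hardware module 30", "bus 14", "host processor 11"]
    return hops
```

Under these assumptions, each additional device adds exactly one neighbor-to-neighbor hop, which is the sense in which every device only ever talks to an adjacent device.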

According to the present embodiment configured as described above, every arithmetic device (FPGA or the like) constituting the arithmetic device array only needs to transfer data such as calculation results to its adjacent arithmetic devices, so the data transfer processing can be prevented from becoming a bottleneck in the processing capability of the information processing apparatus.

In addition, because conventional arithmetic devices performed all data transfers via a bus, power consumption became enormous, which was an obstacle to achieving high-speed data transfer. In the above embodiment, data transfer between arithmetic devices is performed through directly coupled connectors rather than a bus, so power consumption can be reduced compared with the case where a bus is used.

Conventionally, optical communication has also been used to achieve high speed and low power consumption on the communication path. The present embodiment achieves improved processing capability and low power consumption without optical communication, so an information processing apparatus for large-scale computation can be provided at low cost. As a result, even an individual can use a computer for large-scale computation without waiting for a reservation.

[9. Arithmetic device array according to another embodiment of the present invention]
In the arithmetic device array 40 having a three-dimensional structure, the arithmetic board 50 of one arithmetic device 40-1 has three pairs of adjacent-FPGA data transfer circuits, as shown in FIG. 12. By electrically changing the connection destinations of these adjacent-FPGA data transfer circuits, a two-dimensional region can be folded into the three-dimensional region. That is, by electrically controlling each arithmetic device of an arithmetic device array built in three dimensions, the array can be used as either a two-dimensional or a three-dimensional arithmetic device array.

For example, the arithmetic device array 70 shown in FIG. 16 has a configuration in which four arithmetic device arrays 71, 72, 73, and 74, each arranged in an XY region, are connected in sequence. By connecting a terminal arithmetic device, for example one at an X end, to an arithmetic device in the Z direction, the XY region turns toward the Z direction and can then proceed in the opposite direction in the upper-layer arithmetic device array.

That is, the arithmetic devices 71-1, 71-2, and 71-3 located at the X end of the arithmetic device array 71 are connected by cables 76 to the corresponding arithmetic devices 72-10, 72-11, and 72-12 located at the X end of the lower-layer arithmetic device array 72. Between the arithmetic device array 72 and the arithmetic device array 73, on the other hand, the connection is made at the X end opposite to the connection between the arithmetic device array 71 and the arithmetic device array 72. Similarly, between the arithmetic device array 73 and the arithmetic device array 74, the connection is made at the X end opposite to the connection between the arithmetic device array 72 and the arithmetic device array 73.

As a result, the arithmetic devices 74-1, 71-2, and 71-3 of the arithmetic device array 74 in the lowermost layer are connected two-dimensionally, by way of the arithmetic device array 72 and the arithmetic device array 73, through to the arithmetic devices 71-10, 71-11, and 71-12 of the arithmetic device array 71 in the uppermost layer, so that the arithmetic device array 70 can be configured as a two-dimensional region.
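The folding of a long two-dimensional region onto stacked layers can be sketched as a serpentine coordinate mapping. This is a minimal illustration, not the patented wiring: the function name `fold_2d_into_3d`, the zero-based coordinates, and the convention that layer index 0 is the first layer of the folded strip are assumptions introduced here.

```python
from typing import Tuple


def fold_2d_into_3d(u: int, y: int, nx: int, layers: int) -> Tuple[int, int, int]:
    """Map a logical 2D position (u, y) to a physical position (x, y, z).

    u runs along the long, folded axis (0 <= u < nx * layers), nx is the number
    of arithmetic devices per layer along X, and layers is the number of
    stacked XY arrays. All names and conventions here are hypothetical.
    """
    if not 0 <= u < nx * layers:
        raise ValueError("u is outside the folded region")
    z, offset = divmod(u, nx)
    # Even layers run in +X; odd layers run back in -X, so the fold at each
    # X end only needs a link between vertically adjacent devices.
    x = offset if z % 2 == 0 else nx - 1 - offset
    return (x, y, z)
```

With this mapping, two positions that are neighbors along `u` always land on devices that are physical neighbors, either along X within a layer or along Z at an X end, which is why the folded array can still rely on neighbor-only transfers.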

These connection modes are determined by the host processor 11 according to the calculation target. That is, the host processor 11 determines which of the arithmetic devices constituting the arithmetic device array are to be connected to one another, and sends that decision to each arithmetic device as the "transfer destination of the calculation result".

In this way, even with an arithmetic device array that is physically arranged in three dimensions using connectors, writing the desired arithmetic circuits and the like to each arithmetic device, so that a lower-dimensional (two-dimensional or one-dimensional) problem area can be folded in and calculated, allows the connections to be changed electrically without changing the physical connections to adjacent arithmetic devices.

[10. Arithmetic device array according to still another embodiment of the present invention]
Next, an arithmetic device array for the case where the calculation target satisfies a periodic boundary condition will be described.
When the calculation target satisfies a periodic boundary condition, the arithmetic devices located at the periodic boundaries, among the plurality of hardware nets (that is, arithmetic devices) constituting the arithmetic device array, are electrically connected to one another to form a torus (ring).

For example, in the two-dimensional arithmetic device array 80 shown in FIG. 17, the final-stage arithmetic devices 80-10, 80-11, and 80-12, which correspond to one periodic boundary, are electrically connected via connectors 81 to the first-stage arithmetic devices 80-1, 80-2, and 80-3, which correspond to the other periodic boundary. At this time, the connectors of the arithmetic devices may of course be implemented with flexible wiring so that the delay time arising between the arithmetic devices is the same. By returning the calculation result data of each of the arithmetic devices 80-10, 80-11, and 80-12 to the arithmetic devices 80-1, 80-2, and 80-3, which perform the calculation first, iterative processing can be carried out. Although the example shown in FIG. 17 describes the case where a two-dimensional periodic boundary condition is satisfied, the same approach can of course be applied when a three-dimensional periodic boundary condition is satisfied.
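The wrap-around connection for a periodic boundary amounts to modular arithmetic on the stage index. The sketch below assumes a layout that the text only implies: three arithmetic devices per stage and four stages, numbered 80-1 to 80-12 stage by stage. The function name `feedback_target` and its parameters are hypothetical.

```python
def feedback_target(device_number: int, per_stage: int = 3, stages: int = 4) -> int:
    """Return the number of the device that receives the output of a given device.

    Devices are assumed to be numbered 1..per_stage*stages, stage by stage, so
    that 1..3 form the first stage and 10..12 the final stage (an assumption).
    """
    if not 1 <= device_number <= per_stage * stages:
        raise ValueError("device_number is outside the array")
    stage, row = divmod(device_number - 1, per_stage)
    # The modulo closes the ring: the final stage feeds back into the first.
    next_stage = (stage + 1) % stages
    return next_stage * per_stage + row + 1
```

Under this numbering, the output of device 80-10 is returned to device 80-1, that of 80-11 to 80-2, and that of 80-12 to 80-3, while every other device simply feeds the next stage in the same row.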

The embodiments described above are specific examples of preferred modes for carrying out the present invention, and various technically preferable limitations are therefore attached to them. However, the present invention is not limited to these embodiments unless the description of the embodiments above specifically states that the invention is so limited. For example, the materials used and their amounts, the processing times, the processing order, and the numerical conditions of each parameter given in the above description are merely preferred examples, and the dimensions, shapes, and positional relationships in the drawings used for the description are schematic illustrations of one example of the embodiments. Accordingly, the present invention is not limited to the examples of the embodiments described above, and various modifications and changes are possible without departing from the gist of the present invention.

For example, in the arithmetic device 40-1 shown in FIG. 12, the arithmetic board 50 and the write/input-output board 60 are provided with the memory 52 and the memory 62, respectively, but a single memory may serve for both. For example, the memory 62 may be omitted and only the memory 52 used. Alternatively, a large-scale memory board may be provided in place of the memory 52 and the memory 62, or a large-capacity storage device such as a hard disk may be provided.

FIG. 1 is a configuration diagram showing an outline of a conventional system.
FIG. 2 is a schematic diagram used to explain a switch box.
FIG. 3 is a schematic diagram showing an example of a calculation target.
FIG. 4 is a schematic diagram showing an arithmetic device array according to an embodiment of the present invention.
FIG. 5 is a diagram showing the connector arrangement of an arithmetic device (arithmetic board) according to an embodiment of the present invention.
FIG. 6 is a diagram used to explain the interconnection of arithmetic devices (arithmetic boards) according to an embodiment of the present invention.
FIG. 7 is an exploded perspective view of an arithmetic device according to an embodiment of the present invention.
FIG. 8 is a diagram showing an outline of an information processing apparatus according to an embodiment of the present invention.
FIG. 9 is a diagram showing an outline of control between a host processor and a hardware net according to an embodiment of the present invention.
FIG. 10 is a schematic diagram showing the overall configuration of an information processing apparatus according to an embodiment of the present invention.
FIG. 11 is a configuration diagram showing the hierarchy of a host processor according to an embodiment of the present invention.
FIG. 12 is a block diagram showing the functions of an arithmetic device (arithmetic board and write/input-output board) according to an embodiment of the present invention.
FIG. 13 is a flowchart showing the process of writing arithmetic circuits and the like according to an embodiment of the present invention.
FIGS. 14A to 14D are diagrams showing state transitions during the writing of arithmetic circuits and the like according to an embodiment of the present invention.
FIG. 15 is a diagram showing the state after the writing of arithmetic circuits and the like according to an embodiment of the present invention.
FIG. 16 is a diagram showing an arithmetic device array in which a two-dimensional region is folded into a three-dimensional region, according to another embodiment of the present invention.
FIG. 17 is a diagram showing an arithmetic device array for the case where a periodic boundary condition is satisfied, according to still another embodiment of the present invention.

Explanation of symbols

6-1A to 6-1C ... lattice points, 11 ... host processor, 12 ... memory, 13 ... PLD, 14 ... bus, 21 ... application, 22 ... object manager, 23 ... hardware module driver, 25 ... bus, 26 ... hardware object, 27 ... hardware driver, 28 ... interface, 29 ... hardware net, 30 ... hardware module, 31 ... standard bus interface, 32 ... local memory, 33 ... local processor, 34 ... FPGA, 35 ... input/output interface, 36 ... communication path, 40 ... arithmetic device array, 40-1 ... arithmetic device, 50 ... arithmetic board, 51 ... FPGA, 52 ... memory, 53F, 53B, 53L, 53R, 53U, 53D, 54 ... connectors, 55 ... memory control circuit, 56 ... parallel arithmetic circuit, 56A, 56B ... logic blocks, 57 ... calculation target data input/output circuit, 58 ... calculation result data input/output circuit, 59X, 59Y, 59Z ... adjacent-FPGA data transfer circuits, 60 ... write/input-output board, 60A1 ... notch, 61 ... FPGA, 62 ... memory, 63F, 63B, 64 ... connectors, 65 ... memory control circuit, 66 ... write circuit, 67 ... virtual circuit data input/output circuit, 68 ... various-data input/output circuit

Claims

An information processing apparatus comprising:
a memory for storing an application program for performing a predetermined calculation on a calculation target;
an arithmetic device array comprising a plurality of arithmetic devices that are arranged, corresponding to each problem area of the calculation target, so as to allow direct data communication between adjacent arithmetic devices, in which an arithmetic circuit used in executing the application program is reconfigured to perform the calculation corresponding to each problem area, and which transmit and receive, between adjacent arithmetic devices, calculation result data produced by the arithmetic circuit for the problem area;
a host processor that determines in advance, for each arithmetic device constituting the arithmetic device array, the adjacent arithmetic device serving as the destination of its processed data, executes the application program, acquires the calculation result data for each problem area from the arithmetic devices, and calculates a calculation result for the calculation target; and
a bus for communicating data among the memory, the host processor, and the arithmetic device array,
wherein a virtual circuit data writing circuit for writing the arithmetic circuit is written to the arithmetic device based on virtual circuit write data that is sent from the host processor and includes virtual circuit data corresponding to each problem area of the calculation target, and, after the virtual circuit data writing circuit has been written, the arithmetic device transfers the virtual circuit write data to an adjacent arithmetic device in accordance with the write order included in the virtual circuit write data.

The information processing apparatus according to claim 1, wherein, in the plurality of arithmetic devices, after writing of the virtual circuit data writing circuit to all the arithmetic devices has finished, the arithmetic circuit is written to all the arithmetic devices, from the arithmetic device to which the virtual circuit data writing circuit was written last to the arithmetic device to which it was written first.

The information processing apparatus according to claim 2, wherein the arithmetic device is configured as a pair consisting of an arithmetic board to which the arithmetic circuit is written and a writing board to which the virtual circuit data writing circuit is written.

The information processing apparatus according to claim 3, wherein, in the plurality of arithmetic devices constituting the arithmetic device array, the arithmetic circuit is written to the arithmetic board, and a data input/output circuit for transmitting and receiving data to and from the host processor is written to the writing board.

The information processing apparatus according to claim 4, wherein the arithmetic board of the arithmetic device comprises a first connector for connecting to the arithmetic boards adjacent in the vertical and horizontal directions and a second connector for connecting to the paired writing board, and the writing board comprises a third connector for connecting to the paired arithmetic board, and
wherein the arithmetic circuit of the arithmetic board transmits and receives the calculation result data to and from the arithmetic circuit of an adjacent arithmetic board via the first connector, and transmits and receives data to and from the write circuit and the data input/output circuit of the writing board via the second and third connectors.

The information processing apparatus according to claim 5, wherein the host processor performs control so that predetermined arithmetic boards among the plurality of arithmetic boards constituting the arithmetic device array are electrically connected via the first connector.

The information processing apparatus according to claim 5, wherein, when the calculation target satisfies a periodic boundary condition, the host processor electrically connects, via the first connector, the arithmetic devices corresponding to the periodic boundaries among the plurality of arithmetic devices constituting the arithmetic device array.

A virtual circuit writing method comprising:
a step of writing, to each of a plurality of arithmetic devices arranged so as to allow direct data communication between adjacent arithmetic devices corresponding to each problem area of a calculation target, a virtual circuit data writing circuit for writing an arithmetic circuit, based on virtual circuit write data that is sent from a host processor and includes virtual circuit data corresponding to each problem area of the calculation target;
a step in which, after the virtual circuit data writing circuit has been written to the arithmetic device, the arithmetic device transfers the virtual circuit write data to an adjacent arithmetic device in accordance with the write order included in the virtual circuit write data; and
a step of writing, after writing of the virtual circuit data writing circuit to all the arithmetic devices has finished, the arithmetic circuit to all the arithmetic devices, from the arithmetic device to which the virtual circuit data writing circuit was written last to the arithmetic device to which it was written first.
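
The order of operations recited in the method can be illustrated with a short sketch. It is an informal reading aid only: the function name `virtual_circuit_write_schedule`, the event names, and the representation of the write order as a list of device labels are hypothetical and do not define or limit the claimed method.

```python
from typing import List, Optional, Tuple

Event = Tuple[str, str, Optional[str]]


def virtual_circuit_write_schedule(write_order: List[str]) -> List[Event]:
    """List the events of the writing procedure for devices in the given write order."""
    events: List[Event] = []
    # Phase 1: each device receives the writing circuit, then forwards the
    # virtual circuit write data to the next (adjacent) device in the order.
    for i, device in enumerate(write_order):
        events.append(("write_writing_circuit", device, None))
        if i + 1 < len(write_order):
            events.append(("forward_write_data", device, write_order[i + 1]))
    # Phase 2: once every device holds a writing circuit, the arithmetic
    # circuits are written from the last-written device back to the first.
    for device in reversed(write_order):
        events.append(("write_arithmetic_circuit", device, None))
    return events
```

For a write order of two devices, the sketch yields the writing circuit on the first device, a forward to the second, the writing circuit on the second, and then the arithmetic circuits on the second and the first devices in that order.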