JP5610603B1

JP5610603B1 - VLSI circuit, parallel computing system and computer system

Info

Publication number: JP5610603B1
Application number: JP2013235925A
Authority: JP
Inventors: 隆治村上
Original assignee: 株式会社仲池上工房
Priority date: 2013-11-14
Filing date: 2013-11-14
Publication date: 2014-10-22
Anticipated expiration: 2033-11-14
Also published as: JP2015095223A

Abstract

Problem: To provide a scalable parallel computing system in which data can be transferred between arbitrary PEs, and to provide a computer system that uses such a parallel computing system so that radiosity can be computed on a small portable terminal device. Solution: HXNet is implemented on a VLSI, and additional BMs enable data transfer between VLSIs. This achieves scalability in which the number of VLSIs can be chosen freely, and makes radiosity computation possible on a small portable terminal device. Selected drawing: Figure 2

Description

The present invention relates to a VLSI circuit for parallel computing, a parallel computing system that uses the VLSI circuit, and a computer system that uses the parallel computing system.

Parallel computing systems, which aim to speed up computation by parallelizing it, have been researched and developed for decades.
In a parallel computing system, multiple computing elements called PEs (each corresponding to a CPU) are used; each PE computes independently and transfers its results to other PEs. Because data transfer requires transfer buses, hardware has been difficult to build for computations that require many data transfers.

For example, when obtaining a numerical solution of a differential equation describing the diffusion of a substance (that is, performing a Laplacian operation), each PE can be assigned the computation at one coordinate in space and needs to transfer data only to the PEs computing at adjacent coordinates (four in two dimensions, six in three dimensions), so a configuration that does not increase the number of buses was possible. In radiosity computation for image processing, however, which is now performed even on portable terminal devices, if each PE is assigned the computation for one facet (a small planar patch), data transfer may be required between all PEs. Conventional parallel computing systems therefore could not achieve sufficiently fast computation.

As one method of solving the inter-PE data transfer problem, Patent Document 1, for example, discloses a parallel computing system having a communication network dedicated to data transfer. However, building such a communication network is very costly.

Patent Document 2 discloses a parallel computing system in which the memory corresponding to each two-dimensionally arranged PE is connected to buses through three or more ports, and wide-ranging data transfer is performed through a third port that goes beyond the two dimensions. However, the concrete implementation of data transfer through the third port must be designed case by case.

HXNet (Non-Patent Document 1) is a known parallel computing system in which PEs are arranged two-dimensionally and data can be transferred between arbitrary PEs. With m² PEs denoted PE(i, j) (1 ≤ i ≤ m, 1 ≤ j ≤ m), data is transferred in the order PE(i, j) → PE(j, k) → PE(k, l), which makes data transfer between arbitrary PEs possible. HXNet is useful in that its implementability is guaranteed.
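The two-hop rule above can be sketched in a few lines of Python. This is only an illustration of the index pattern PE(i, j) → PE(j, k) → PE(k, l) as described here; the function name `hxnet_route` and the tuple representation of a PE are assumptions, and the buffer memories that actually carry the data are not modeled.

```python
def hxnet_route(src, dst):
    """Return the PE sequence PE(i, j) -> PE(j, k) -> PE(k, l).

    src is (i, j) and dst is (k, l), both 1-based. The intermediate PE
    takes its first index from the source and its second index from the
    destination.
    """
    i, j = src
    k, l = dst
    return [(i, j), (j, k), (k, l)]


m = 3
pes = [(i, j) for i in range(1, m + 1) for j in range(1, m + 1)]

# The intermediate PE(j, k) always exists because j and k both range over 1..m.
all_ok = all(hxnet_route(s, d)[1] in pes for s in pes for d in pes)
```

The check over all source and destination pairs reflects why the scheme works for exactly m² PEs: both indices share the range 1 to m.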

On the other hand, the number of PEs in an HXNet is limited to m², and multiple HXNets cannot be combined to form a larger HXNet. HXNet therefore lacked the scalability to start small and be scaled up later.

Japanese Unexamined Patent Application Publication No. 06-052125
Japanese Unexamined Patent Application Publication No. 06-075930

廉田浩, 超並列ＶＬＳＩコンピュータ (Massively Parallel VLSI Computer), 工業調査会

An object of the present invention is to provide a scalable parallel computing system in which data can be transferred between arbitrary PEs.
A further object is to provide a computer system that uses such a parallel computing system, so that radiosity can be computed on a small portable terminal device.

A small-scale HXNet is realized as a VLSI circuit. In addition to the BMs used by HXNet (BM stands for "buffer memory"), the VLSI circuit includes additional BMs for transferring data to other VLSI circuits. As a result, any number of VLSIs can be combined to form a parallel computing system, and data can be transferred between arbitrary PEs.

The VLSI circuit of the present invention is characterized by implementing
an HXNet including m² PEs and m³ BMs, and
m²(n−1) additional BMs.

HXNet is implemented on the VLSI, and additional BMs for data transfer to and from other VLSIs are also implemented. Here, m and n are integers of 2 or more.

The parallel computing system of the present invention is characterized in that:
it includes n VLSI circuits, each implementing an HXNet including m² PEs and m³ BMs, and m²(n−1) additional BMs;
m and n are integers of 2 or more;
in each VLSI circuit, the additional BMs comprise m² sets of (n−1) additional BMs, each set writable by one of the m² PEs in the HXNet;
the m² sets of additional BMs are ordered, and the PE that can write to the i-th set of additional BMs is called the i-th PE (i is an integer from 1 to m²);
the (n−1) additional BMs in each set are ordered; and
the k-th additional BM in the i-th set of the j-th VLSI circuit is readable by the i-th PE of the k-th VLSI circuit (if k < j) or of the (k+1)-th VLSI circuit (if k ≥ j).
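The reader rule in the last clause (the k-th additional BM of the j-th VLSI circuit is read by VLSI circuit k when k < j, and by VLSI circuit k+1 when k ≥ j) can be illustrated with a short Python sketch. The function names `abm_reader` and `abm_slot_for` are hypothetical, and the inverse mapping is derived from the rule rather than stated in the text.

```python
def abm_reader(j, k, n):
    """Index of the VLSI circuit whose i-th PE reads the k-th additional BM
    of a set on the j-th VLSI circuit (all indices 1-based)."""
    if not (1 <= j <= n and 1 <= k <= n - 1):
        raise ValueError("need 1 <= j <= n and 1 <= k <= n - 1")
    return k if k < j else k + 1


def abm_slot_for(j, target, n):
    """Inverse mapping (derived, not from the text): which additional BM a PE
    on VLSI circuit j writes so that VLSI circuit `target` can read it."""
    if not (1 <= j <= n and 1 <= target <= n) or target == j:
        raise ValueError("target must be a different VLSI circuit in 1..n")
    return target if target < j else target - 1


n = 3
# For every VLSI circuit j, its (n - 1) additional BMs per set reach exactly
# the other (n - 1) VLSI circuits, skipping j itself.
readers = {j: sorted(abm_reader(j, k, n) for k in range(1, n)) for j in range(1, n + 1)}
```

Skipping index j is what lets (n−1) additional BMs per PE cover all other VLSI circuits with one dedicated BM each.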

This fixes the destination bus for data transferred from each additional BM and reduces the number of buses between VLSIs.

The parallel computing system of the present invention is characterized in that, where PE(k, i) denotes the i-th PE of the k-th VLSI circuit, data transfer from PE(k₁, i₁) to PE(k₂, i₂) is performed as follows:
(1) if k₁ = k₂, by the HXNet within the k₁-th VLSI circuit;
(2) if k₁ ≠ k₂ and i₁ = i₂, by PE(k₁, i₁) writing the data to a predetermined additional BM and PE(k₂, i₁) reading the data;
(3) if k₁ ≠ k₂ and i₁ ≠ i₂, by PE(k₁, i₁) writing the data to a predetermined additional BM, PE(k₂, i₁) reading the data, and the HXNet within the k₂-th VLSI circuit transferring it to PE(k₂, i₂).

This provides a concrete data transfer procedure.

The computer system of the present invention is characterized by comprising a main CPU, the parallel computing system described above, and an interface circuit between the main CPU and the parallel computing system.

This allows the VLSI-based parallel computing system to be handled as a single device from the main CPU's point of view.

The computer system of the present invention is characterized in that it computes radiosity using the parallel computing system.

Radiosity computation involves a large amount of data transfer between PEs, so the computer system of the present invention is put to effective use.

The computer system of the present invention is characterized in that it runs on a portable terminal device and computes radiosity for image display in a game application.

Because the parallel computing system is scalable, it can be used in portable terminal devices, particularly in games.

The present invention provides a scalable parallel computing system in which data can be transferred between arbitrary PEs, and a computer system that uses such a parallel computing system.

FIG. 1 is a diagram showing a VLSI circuit.
FIG. 2 is a diagram showing a parallel computing system made up of multiple VLSI circuits.
FIG. 3 is a diagram showing the configuration of the computer.
FIG. 4 is a diagram showing the radiosity computation procedure.

An embodiment of the present invention is described below using the example m = 2, n = 3. Operation is the same for other values of m and n.

FIG. 1 shows a VLSI circuit. VLSI circuit 1 implements HXNet 2 and eight additional BMs (ABMs) 4. HXNet 2 has four (= 2²) PEs 3 and eight (= 2³) BMs. In addition, VLSI circuit 1 includes eight ABMs: two ABM(i, j) readable and writable by each of the four PE(i) (i = 1, ..., 4; j = 1, 2).
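The component counts for this embodiment follow directly from the formulas m², m³, and m²(n−1). A minimal Python sketch, with the hypothetical helper name `vlsi_resources`, reproduces them for m = 2, n = 3:

```python
def vlsi_resources(m, n):
    """Component counts for one VLSI circuit, per the formulas in the text:
    m^2 PEs, m^3 BMs inside the HXNet, and m^2 * (n - 1) additional BMs."""
    if m < 2 or n < 2:
        raise ValueError("m and n must be integers of 2 or more")
    return {"pe": m ** 2, "bm": m ** 3, "abm": m ** 2 * (n - 1)}


embodiment = vlsi_resources(2, 3)   # the m = 2, n = 3 example
larger = vlsi_resources(3, 4)       # a hypothetical larger configuration
```

For m = 2 and n = 3 this gives 4 PEs, 8 BMs, and 8 ABMs, matching the figures quoted for FIG. 1.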

The eight ABMs are grouped by j (that is, by their position within each PE's set, not by the PE i that reads and writes them), and each group has its buses. Each ABM of course has its own separate bus, but the buses of a group run in roughly the same direction (the path leading out of the VLSI circuit).

A single VLSI circuit constitutes an HXNet of 4 PEs and itself operates as a parallel computing system. This embodiment uses 4 PEs, but 9, 16, 25, or any other number of PEs may be used (provided the number is m² for an integer m of 2 or more).

FIG. 2 shows a parallel computing system made up of multiple VLSI circuits. This embodiment shows a 12-PE (4 PEs × 3) parallel computing system built from three VLSI circuits 1, but a parallel computing system with four or more VLSI circuits 1 can be built in the same way.

There are three VLSI circuits 1a, 1b, and 1c, together with bus 5ab connecting VLSI circuits 1a and 1b, bus 5bc connecting VLSI circuits 1b and 1c, and bus 5ca connecting VLSI circuits 1c and 1a.

Buses 5ab, 5bc, and 5ca each represent a bundle of four buses, one for each of four ABMs. The individual buses are connected as follows. For example, within bus 5ab, the bus from ABM(1, 2) of VLSI circuit 1a is connected to ABM(1, 1) and/or PE 1 of VLSI circuit 1b. When ABM(i₁, j₁) of one VLSI circuit is connected to ABM(i₂, j₂) or PE i₂ of the other VLSI circuit, the relationship i₁ = i₂ is maintained. In this way, PEs with the same number, or the ABMs belonging to them, are connected.

Data may either be copied between the ABMs connected by a bus (mirroring between ABMs) or be transferred to the PE without rewriting the destination ABM. The purpose is to deliver the data to the PE; whether it passes through an ABM is not essential.

Maintaining the relationship i₁ = i₂ is important here. In HXNet, the number of PEs is m² and the PEs are numbered by i and j, both of which take values from 1 to m, so when transferring from PE(i, j) to PE(j, k), PE(j, k) exists regardless of the value of j. In the present invention, however, if the number n of VLSI circuits is smaller than m, a "j-th VLSI" may not exist for the j-th PE of the i-th VLSI, and transfer from PE(i, j) to PE(j, k) is not guaranteed. For this reason, i is taken as the order of the PE within each VLSI, which guarantees that the destination PE exists.

As described above, any number of VLSI circuits 1, each implementing m²(n−1) additional BMs, can be connected scalably up to a maximum of n, and data transfer between arbitrary PEs can be realized. Data transfer between arbitrary PEs is described below. Where PE(k, i) denotes the i-th PE of the k-th VLSI circuit, data transfer from PE(k₁, i₁) to PE(k₂, i₂) is performed by the following procedure.

If k₁ = k₂, the transfer is performed by the HXNet within the k₁-th VLSI circuit. That is, it is an HXNet data transfer within a single VLSI.

If k₁ ≠ k₂ and i₁ = i₂, the transfer is performed by PE(k₁, i₁) writing the data to a predetermined additional BM and PE(k₂, i₁) reading it. That is, PEs that occupy the same position within their respective VLSIs can transfer data through an ABM and a bus.

If k₁ ≠ k₂ and i₁ ≠ i₂, the transfer is performed by PE(k₁, i₁) writing the data to a predetermined additional BM, PE(k₂, i₁) reading it, and the HXNet within the k₂-th VLSI circuit transferring it to PE(k₂, i₂). That is, the data is first transferred through an ABM and a bus, and then transferred by the HXNet within the destination VLSI.
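The three cases can be summarized as a small routing function. The Python sketch below only plans the hops; the step labels `"hxnet"` and `"abm"`, the function name `transfer_plan`, and the empty plan returned when source and destination coincide are illustrative assumptions, not part of the described procedure.

```python
def transfer_plan(src, dst):
    """Plan a transfer from PE(k1, i1) to PE(k2, i2).

    Each PE is (k, i): k is the VLSI circuit index, i the PE index within it.
    Returns a list of steps:
      ("hxnet", k, i_from, i_to)       transfer inside VLSI circuit k
      ("abm", (k1, i1), (k2, i1))      write to an additional BM, read by the
                                       same-numbered PE on the other VLSI
    """
    (k1, i1), (k2, i2) = src, dst
    if src == dst:
        return []  # assumption: nothing to do when source equals destination
    if k1 == k2:
        # Case (1): same VLSI circuit, use its HXNet.
        return [("hxnet", k1, i1, i2)]
    # Cases (2) and (3): cross to the other VLSI, keeping the PE index i1.
    steps = [("abm", (k1, i1), (k2, i1))]
    if i1 != i2:
        # Case (3): finish with the destination VLSI's HXNet.
        steps.append(("hxnet", k2, i1, i2))
    return steps
```

For example, `transfer_plan((1, 2), (3, 4))` yields an ABM hop from PE(1, 2) to PE(3, 2) followed by an HXNet transfer inside VLSI circuit 3, so any pair of PEs is connected by at most one inter-VLSI hop plus one HXNet transfer.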

The VLSI circuit and its connections have been described above. Next, the interface circuit for realizing the parallel computing system is described. The interface circuit may be the same as that disclosed in Non-Patent Document 1, and can be developed with ordinary knowledge in the technical field of the present invention without further explanation.

FIG. 3 shows the configuration of the computer. The computer includes a main CPU 6 and a VLSI circuit 1 for parallel computing, which are connected by an interface circuit 7. Here, VLSI circuit 1 may be a single VLSI or a combination of multiple VLSIs as described above.

Like data storage, the interface circuit 7 can be handled by the main CPU 6 as a single input/output device, because it is something that is given data and returns data.

The interface circuit 7 receives data (including program instructions) from the main CPU 6, has VLSI circuit 1 perform parallel computing, and returns the results to the main CPU 6. For this purpose it includes an IF-memory that stores data and an IF-CPU that controls the operation of VLSI circuit 1. Following the program instructions received from the main CPU 6, it supplies data to each PE of VLSI circuit 1 and instructs it to compute.

A computer capable of parallel computing has been described above. Next, the radiosity computation executed on such a computer is described.

Radiosity computation is a computation for producing an image. Unlike conventional ray tracing, it does not trace individual rays; instead, it computes the reflection at each facet representing an object's surface in relation to the other facets.

Let R_x be the reflectance of the x-th facet, F_xy the angular relationship between the x-th and y-th facets (the fraction of the light reflected by the x-th facet that reaches the y-th facet), B_x the light energy radiated from the x-th facet, and E_x the initial radiated light energy. Equation 1 holds among these quantities. R_x and F_xy may be computed separately for each hue, on the assumption that they differ by hue.
Here, R_x is a constant determined by the material of the object surface, and E_x is a constant determined by the initial radiated light (the light source). Computing F_xy and B_x is the core of the radiosity computation.
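Equation 1 is not reproduced in this text. Assuming it is the classical radiosity equation, it would read as follows in the symbols defined above; the form of the equation and the summation over all p facets are assumptions, not taken from the source.

```latex
B_x = E_x + R_x \sum_{y=1}^{p} F_{xy} \, B_y \qquad (x = 1, \dots, p)
```

Under this assumed form, the radiated energy of each facet is its own emission plus the reflected share of the energy arriving from every other facet, which is why every facet may need data from every other facet.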

FIG. 4 shows the radiosity computation procedure. F_xy is obtained first, and B_x afterwards. Parallel computation is effective in step 8a, which obtains F_xy, and in step 8b, which obtains B_x.

Because F_xy depends on the angular relationship between the x-th and y-th facets, it can be computed from information about those two facets alone. That is, with p facets, p(p−1)/2 values of F_xy are computed, and each can be computed independently in parallel. With a parallel computing system having m²n PEs, the computation time is expected to fall to 1/(m²n).
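One way to picture this step is to deal the p(p−1)/2 facet pairs out to the m²n PEs. The Python sketch below uses a simple round-robin assignment; the assignment policy and the function name `assign_pairs` are assumptions for illustration, since the text does not specify how pairs are allocated to PEs.

```python
from itertools import combinations


def assign_pairs(p, m, n):
    """Distribute the p * (p - 1) / 2 unordered facet pairs (x, y), x < y,
    over m^2 * n PEs in round-robin order (an assumed policy)."""
    num_pes = m * m * n
    buckets = [[] for _ in range(num_pes)]
    for idx, pair in enumerate(combinations(range(p), 2)):
        buckets[idx % num_pes].append(pair)
    return buckets


# Example: 10 facets on the m = 2, n = 3 system (12 PEs).
buckets = assign_pairs(10, 2, 3)
loads = [len(b) for b in buckets]
```

Because each pair is independent, the per-PE load differs by at most one pair under this policy, which is consistent with the expected 1/(m²n) reduction in computation time when p(p−1)/2 is large relative to the number of PEs.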

B_x can be computed either iteratively from Equation 1 above (substituting the result of the right-hand side into the left-hand side) or by treating Equation 1 as a system of linear equations and solving it by matrix inversion.

In the iterative method, B_x for each x can be computed independently of those for the other x, so parallel computation is effective.
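A minimal Python sketch of the iterative method follows. It assumes Equation 1 has the classical form B_x = E_x + R_x Σ_y F_xy B_y, which is not reproduced in this text; the function names, the fixed iteration count, and the two-facet example values are likewise illustrative assumptions.

```python
def radiosity_step(B, E, R, F):
    """One substitution of the right-hand side into the left-hand side.

    Each new B[x] depends only on the previous B values, so every x could be
    evaluated on a different PE in parallel (here it is a plain list
    comprehension)."""
    p = len(B)
    return [E[x] + R[x] * sum(F[x][y] * B[y] for y in range(p)) for x in range(p)]


def solve_radiosity(E, R, F, iterations=50):
    """Repeat the substitution a fixed number of times, starting from B = E."""
    B = list(E)
    for _ in range(iterations):
        B = radiosity_step(B, E, R, F)
    return B


# Two facets: facet 0 emits light, both reflect half of what they receive.
E = [1.0, 0.0]
R = [0.5, 0.5]
F = [[0.0, 0.5], [0.5, 0.0]]
B = solve_radiosity(E, R, F)
```

In this example the iteration settles near B = [16/15, 4/15], the exact solution of the two assumed equations. After each step the updated B values must be shared among the PEs, which is where transfer between arbitrary PEs is needed.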

In the matrix inversion method, parallel computation can be used effectively by the method described in Non-Patent Document 1. Since most values of F_xy are nonzero and the matrix to be inverted is dense, the benefit of parallel computation is large.
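For comparison, the linear-system view can be sketched as a direct solve. The Python code below is not the parallel method of Non-Patent Document 1; it is a plain sequential Gauss-Jordan elimination, and it assumes Equation 1 has the classical form, so that the system is (I − diag(R) F) B = E. The function name `solve_direct` and the example values are hypothetical.

```python
def solve_direct(E, R, F):
    """Solve (I - diag(R) F) B = E by Gauss-Jordan elimination with
    partial pivoting (sequential illustration only)."""
    p = len(E)
    # Augmented matrix [A | E] with A[x][y] = delta_xy - R[x] * F[x][y].
    A = [[(1.0 if x == y else 0.0) - R[x] * F[x][y] for y in range(p)] + [E[x]]
         for x in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                for c in range(col, p + 1):
                    A[r][c] -= factor * A[col][c]
    return [A[x][p] / A[x][x] for x in range(p)]


# Same two-facet example as an illustration: one emitter, two reflectors.
B_direct = solve_direct([1.0, 0.0], [0.5, 0.5], [[0.0, 0.5], [0.5, 0.0]])
```

The elimination touches every entry of the dense p × p matrix, which is the work that a parallel method would spread across the PEs.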

As described above, a scalable parallel computing system has been realized that can use any number of HXNet VLSI circuits 1, each having m² PEs. A computer system that uses this parallel computing system, and the radiosity computation executed on that computer system, have also been shown. This makes it possible to use a parallel computing system in portable terminals and similar devices.

Because the parallel computing system of the present invention is scalable, it can be used in small portable terminal devices (using VLSI circuits with small m), in large computers (using VLSI circuits with large m and n), and in game consoles and other equipment.

The invention is a scalable parallel computing system in which data can be transferred between arbitrary PEs, and a computer system that uses such a parallel computing system. Use by many portable terminal manufacturers and computer manufacturers is expected.

It also makes radiosity computation feasible on portable terminals and similar devices, so use by many software developers is expected as well.

1 ... VLSI circuit
2 ... HXNet
3 ... PE
4 ... Additional BM
5 ... Bus
6 ... Main CPU
7 ... Interface circuit
8 ... Parallelized step

Claims

1. A VLSI circuit implementing
an HXNet including m² PEs and m³ BMs, and
m²(n−1) additional BMs,
wherein m and n are integers of 2 or more.

2. A parallel computing system comprising n VLSI circuits, each implementing an HXNet including m² PEs and m³ BMs, and m²(n−1) additional BMs, wherein:
m and n are integers of 2 or more;
in each VLSI circuit, the additional BMs comprise m² sets of (n−1) additional BMs, each set writable by one of the m² PEs in the HXNet;
the m² sets of additional BMs are ordered, and the PE that can write to the i-th set of additional BMs is called the i-th PE (i is an integer from 1 to m²);
the (n−1) additional BMs in each set are ordered; and
the k-th additional BM in the i-th set of the j-th VLSI circuit is readable by the i-th PE of the k-th VLSI circuit (if k < j) or of the (k+1)-th VLSI circuit (if k ≥ j).

3. The parallel computing system according to claim 2, wherein, where PE(k, i) denotes the i-th PE of the k-th VLSI circuit, data transfer from PE(k₁, i₁) to PE(k₂, i₂) is performed:
(1) if k₁ = k₂, by the HXNet within the k₁-th VLSI circuit;
(2) if k₁ ≠ k₂ and i₁ = i₂, by PE(k₁, i₁) writing the data to a predetermined additional BM and PE(k₂, i₁) reading the data;
(3) if k₁ ≠ k₂ and i₁ ≠ i₂, by PE(k₁, i₁) writing the data to a predetermined additional BM, PE(k₂, i₁) reading the data, and the HXNet within the k₂-th VLSI circuit transferring it to PE(k₂, i₂).

4. A computer system comprising: a main CPU; the parallel computing system according to claim 3; and an interface circuit between the main CPU and the parallel computing system.

5. The computer system according to claim 4, wherein radiosity is computed using the parallel computing system.

6. The computer system according to claim 5, which runs on a portable terminal device
and computes radiosity for image display in a game application.