JP2000298654A - Processor element circuit, parallel calculation system and bus bridging method

Info

Publication number: JP2000298654A
Application number: JP11106099A
Authority: JP
Inventors: Shinjirou Inahata; 深二郎稲畑; Sou Yamada; 想山田; Nobuaki Miyagawa; 宣明宮川; Kazuaki Murakami; 和彰村上; Hajime Takashima; 一高島; Kazuyasu Kitamura; 一泰北村
Original assignee: Fuji Xerox Co Ltd; Taisho Pharmaceutical Co Ltd
Current assignee: Taisho Pharmaceutical Co Ltd; Fujifilm Business Innovation Corp
Priority date: 1999-04-14
Filing date: 1999-04-14
Publication date: 2000-10-24

Abstract

PROBLEM TO BE SOLVED: To reduce the design man-hours for a processor element and a bus bridge circuit when a dedicated system is constructed as a parallel computer system, by making an arithmetic processing mode and a bridge mode selectable so that the processor element and the bus bridge circuit share the same LSI. SOLUTION: The system comprises a host computer 11 and a plurality of boards 12, each carrying many processor elements 14. The host computer 11 and the boards 12 are connected by a system bus 13. Mode selection means selects between an arithmetic processing mode and a bridge mode. In the arithmetic processing mode, the processor element circuit is used as a processor element 14 of the parallel computing system; in the bridge mode, it is used as a bus bridge circuit 17, or a part of one, that performs protocol conversion between the different buses of the parallel computing system.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a bus bridge device for converting between bus protocols, used in a dedicated parallel computing system that performs enormous calculations such as ab initio molecular orbital calculations and gravitational many-body problems, and to a system using that bus bridge device.

[0002]

2. Description of the Related Art

In fields such as drug development and materials design, some experiments are very difficult to carry out with actual substances, so in recent years techniques for simulating chemical phenomena, such as those of pharmaceuticals, by computer have come into wide use. Because such simulations generally require an enormous amount of computation, chemists have devoted great effort to developing computational algorithms for fast simulation. At the same time, dedicated massively parallel computers have been devised to speed up the calculations.

[0003] As a result of these efforts, a simulation technique called ab initio molecular orbital calculation has advanced greatly in the field of computational chemistry in recent years. This technique is described below.

[0004] First, an outline of the ab initio molecular orbital calculation method is given. Its principles are described in, for example, Shigeru Fujinaga, "Molecular Orbital Method", Iwanami Shoten (1980), and Osamu Kikuchi, "Basic Quantum Chemistry", Asakura Shoten (1997). The method computes the energy of a molecule from quantum-chemical principles. By finding the interatomic distances and arrangement for which this energy is smallest, the properties of a material can be assessed and used for tasks such as drug design.

[0005] The method most often used for ab initio molecular orbital calculation is the Hartree-Fock method (HF method). An outline of the HF method follows.

[0006] The HF method is formulated as a method of solving the Fock equation by the SCF (self-consistent field) method described later. Let N be the total number of atomic orbitals contained in the molecule and m the total number of molecular orbitals expressed as linear combinations of atomic orbitals (LCAO). Applying the one-electron approximation and the LCAO approximation to the Schrödinger equation for the whole molecule yields the Fock equation in matrix form:

$F C = S C \varepsilon \qquad (1)$

Solving this equation gives the energy of the molecule, and that value indicates whether the molecule is in a stable state.

[0007] In equation (1), F is an N × N matrix called the Fock matrix, S is an N × N matrix called the overlap matrix, C is an N × m matrix of coefficients, and ε is an m × m diagonal matrix holding the energies of the electrons occupying each molecular orbital.

[0008] The element F_rs (r, s = 1 to N) of the Fock matrix is given by the following equation.

[0009]

$F_{rs} = h_{rs} + g_{rs} = h_{rs} + \sum_{t,u=1}^{N} P_{tu} \left[ (rs,tu) - \tfrac{1}{2}(rt,su) \right] \qquad (2)$

[0010] The term h_rs in equation (2) is an integral representing the one-electron energy. A number of these integrals proportional to N² is computed in each evaluation of equation (1).

[0011] In this specification, Σ[i, j = 1 to N] f(i, j), that is, $\sum_{i,j=1}^{N} f(i,j)$, denotes the sum of the function f(i, j) over i and j, each running from 1 to N. Likewise, Σ[i = 1 to N] f(i), that is, $\sum_{i=1}^{N} f(i)$, denotes the sum of the function f(i) over i from 1 to N.

[0012] P_tu in equation (2) is called the density matrix and is expressed in terms of the matrix C above as follows.

[0013]

$P_{tu} = \sum_{j=1}^{m} C_{tj} C_{uj} \qquad (3)$

The quantity (rs, tu) (r, s, t, u = 1 to N) in equation (2) is a physical quantity called a two-electron integral. It is expressed in terms of the atomic orbitals χ_i(r) (i = 1 to N; r is a coordinate) as follows.

[0014]

$(rs,tu) = \iint \chi_r(r_1)\, \chi_s(r_1)\, \frac{1}{r_{12}}\, \chi_t(r_2)\, \chi_u(r_2)\, dr_1\, dr_2 \qquad (4)$

[0015] Here r1 and r2 are two independent coordinate systems, and the double integral is taken over all space in each of them. r12 is the distance between the coordinate systems r1 and r2. Because each of r, s, t, and u ranges over the number of atomic orbitals, a number of two-electron integrals proportional to N⁴ is needed in each evaluation of equation (1).
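Equations (2) and (3) can be illustrated with a minimal Python sketch. It follows the equations exactly as written in this specification; the function names and the callable `eri(r, s, t, u)`, which stands in for the two-electron integral (rs, tu) of equation (4), are assumptions made for illustration and do not come from the specification.

```python
def density_matrix(C, m):
    """Equation (3): P[t][u] = sum over j = 1..m of C[t][j] * C[u][j]."""
    n = len(C)
    return [[sum(C[t][j] * C[u][j] for j in range(m)) for u in range(n)]
            for t in range(n)]


def fock_matrix(h, P, eri):
    """Equation (2): F[r][s] = h[r][s] + sum_{t,u} P[t][u] * ((rs,tu) - (rt,su) / 2).

    `eri` is a hypothetical callable returning the two-electron integral.
    The four nested loops make the N^4 growth of the integral count visible.
    """
    n = len(h)
    F = [row[:] for row in h]  # start from the one-electron part h
    for r in range(n):
        for s in range(n):
            g = 0.0
            for t in range(n):
                for u in range(n):
                    g += P[t][u] * (eri(r, s, t, u) - 0.5 * eri(r, t, s, u))
            F[r][s] += g
    return F
```

In a real calculation nearly all of the run time is spent inside `eri`, which is why the integrals are the natural unit of work to distribute across processor elements.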

[0016] Next, the element S_rs of the overlap matrix S is given by the following equation.

[0017]

$S_{rs} = \int \chi_r(r_1)\, \chi_s(r_1)\, dr_1 \qquad (5)$

With these definitions, the HF method becomes the problem of finding the m eigenvalues ε_i and eigenvectors C_i (i = 1 to m) of equation (1). However, as equations (2) and (3) show, the Fock matrix in equation (1) is itself computed from the coefficient vectors C_i, so F_rs cannot be evaluated without the C_i obtained by solving equation (1).

[0018] Therefore, suitable values are first chosen as the initial C_i, F_rs is computed from them, and the eigenvalue problem of equation (1) is solved to obtain new C_i. A new F is then calculated from these C_i and equation (1) is solved again. The calculation is repeated in this way and ends when there is almost no difference between the C_i used to calculate F and the C_i obtained. Physically, with each iteration the Coulomb field created by the electrons of the molecule and the Coulomb field created by the nuclei become more consistent with each other, which is why the method is called the SCF (self-consistent field) method. It is widely used in molecular orbital calculations.
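The iteration described above can be sketched as a fixed-point loop. This is only an outline under stated assumptions: `build_fock` and `solve_eigen` are hypothetical callables supplied by the caller (the first evaluates equation (2) from the current coefficients, the second solves equation (1) for a given F), and the tolerance and iteration limit are arbitrary illustrative values.

```python
def scf(initial_C, build_fock, solve_eigen, tol=1e-8, max_iter=100):
    """Repeat C -> F(C) -> new C until the coefficients stop changing.

    build_fock(C) -> F                 (hypothetical: evaluates equation (2))
    solve_eigen(F) -> (eps, new_C)     (hypothetical: solves F C = S C eps)
    Returns (eps, C, iterations_used).
    """
    C = initial_C
    for iteration in range(1, max_iter + 1):
        F = build_fock(C)
        eps, new_C = solve_eigen(F)
        # Largest element-wise difference between the C used to build F
        # and the C obtained from solving equation (1).
        change = max(abs(a - b)
                     for row_new, row_old in zip(new_C, C)
                     for a, b in zip(row_new, row_old))
        C = new_C
        if change < tol:
            return eps, C, iteration
    raise RuntimeError("SCF did not converge within max_iter iterations")
```

Because `build_fock` is called once per iteration, the two-electron integrals are either stored between iterations or recomputed every time; the loop itself is the same in both cases.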

[0019] The number of two-electron integrals required by equation (1) is proportional to the fourth power of the total number of atomic orbitals N. For a molecule of about 100 atoms, which is common in fields such as biology, N is about 1000, and the number of integrals, being proportional to its fourth power, reaches the order of 100 trillion. Integrals with small values are often identified and cut off before they are calculated, but about 100 million two-electron integrals still have to be computed, which remains an enormous number.

[0020] For this reason, although the same two-electron integrals are used in every SCF iteration, there is not enough memory to compute them once and store them, so the direct method, which recomputes the two-electron integrals in every iteration, is normally used. In molecular orbital calculation by the direct method, most of the computation time is spent on the two-electron integrals, so speeding up this part is important.

[0021] Gaussian functions, for which two-electron integrals can be obtained analytically, are normally used for the atomic orbitals χ_i in equation (4). A known fast method of computing two-electron integrals over Gaussian atomic orbitals is the one described in S. Obara and A. Saika, J. Chem. Phys. 84, 3963 (1986) (hereinafter Reference 1); it is referred to below as Obara's method.

[0022] Obara's method introduces a physical quantity called an auxiliary integral, which generalizes the two-electron integral, and is expressed as recurrence relations involving auxiliary integrals. Through these recurrences, a two-electron integral is expressed as a sum of products involving lower-order auxiliary integrals. To compute a given two-electron integral, it is first expanded by the recurrences into a form containing only the lowest-order auxiliary integrals, and the higher-order auxiliary integrals are then obtained one after another by multiply-accumulate operations. The computation of a two-electron integral thus reduces to a series of multiply-accumulate operations of a specific form.

[0023] Because molecular orbital calculation requires such an enormous amount of computation, the two-electron integral calculation and other steps have been parallelized to shorten the computation time. Known techniques for fast molecular orbital calculation include computing different two-electron integrals on a plurality of processors, as in Japanese Unexamined Patent Publication Nos. 9-50428 and 9-50458, and computing on a dedicated parallel computer, as in "Architecture of MOE, a Dedicated Machine for Ultra-High-Speed Molecular Orbital Calculation" (Tech. Rep. IEICE, CPSY96-46, 1996-05).

[0024] In molecular orbital calculation in particular, the two-electron integrals account for a large amount of computation, are independent of one another, and are relatively easy to parallelize, so a fast dedicated computer with many processor elements computes them faster than a supercomputer does. In recent years, calculation on dedicated computers for specific applications has thus become common.

[0025] FIG. 7 shows an example of a parallel computing system built as a dedicated system for molecular orbital calculation and similar tasks. The system comprises a host computer 1, built from a general-purpose workstation or PC (personal computer), and a plurality of boards 2 each carrying many processor elements 4. The host computer 1 and each board 2 are connected by a system bus 3.

[0026] Within each board 2, a local memory 5 is connected to each of the many processor elements 4. The processor elements 4 are connected to one another through a board bus 6. The board bus 6 and the system bus 3 are connected via a bus bridge 7.

[0027] In addition to high speed, the system bus 3 must be general-purpose, because it connects directly to the host computer 1. The processor elements 4, in turn, must compute at high speed using the parameters, control variables, and other data needed for the calculation, so the main requirement on the board bus 6 is high speed.

[0028] To make the board bus 6 fast, the load capacitance of the bus must be small, which requires reducing the number of processor elements 4 connected to one board bus 6 and shortening the board bus 6 itself. The board bus 6 on one board 2 must therefore be separate from the board bus 6 on any other board 2. Moreover, the board bus 6 is often a specially developed high-speed bus or a new high-speed bus built on the latest technology, and such buses are often not yet implemented in general-purpose PCs and workstations.

[0029] For this reason, each board needs a bus bridge 7 that relays between the system bus 3 and the board bus 6, performing protocol conversion and the like. An example of this kind of bus bridge circuit is described in Japanese Unexamined Patent Publication No. 6-37768.

[0030]

PROBLEMS TO BE SOLVED BY THE INVENTION

A dedicated system often requires a custom processor element to be designed. In particular, when an enormous amount of computation such as the molecular orbital method described above must be performed, the processor element inevitably becomes large, and designing it takes many man-hours.

[0031] Furthermore, as noted above, because a dedicated bus or the like is used for the board bus 6, no general-purpose product is available for the bus bridge circuit 7, and it must be newly designed. The design man-hours for the bus bridge 7 are therefore incurred in addition to those for the processor element 4.

[0032] The invention described in the above-mentioned Japanese Unexamined Patent Publication No. 6-37768 aims to reduce the capacity of the transfer buffer between buses, but with recent improvements in semiconductor processes, buffer capacity is no longer much of a problem. Rather, particularly for a dedicated system, the important issue is to suppress the increase in design man-hours described above.

[0033] In view of the above, an object of the present invention is to reduce the design man-hours for the processor element and the bus bridge circuit when a dedicated system is constructed as a parallel computing system.

[0034]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the processor element circuit according to the invention of claim 1 is a processor element circuit used in a parallel computing system that has a function of performing operations in parallel, has a plurality of buses, and has a bridge circuit for bridging between the plurality of buses, the processor element circuit comprising:

an internal memory for storing data;

an arithmetic unit that contains a first register, performs an operation on data read from the internal memory or the first register, and stores the result of the operation in the internal memory or the first register;

mode selection means for selecting whether the processor element circuit operates in an arithmetic processing mode or in a bridge mode;

interface means for inputting and outputting data between the processor element circuit and an external memory when the arithmetic processing mode is selected by the mode selection means, and between the processor element circuit and a peripheral circuit when the bridge mode is selected by the mode selection means;

processor means that, when the arithmetic processing mode is selected by the mode selection means, enters a standby state at startup and then an operating state upon a subsequent start operation, and that, when the bridge mode is selected by the mode selection means, enters a boot state at startup and then the operating state upon a subsequent transition operation, the processor means further containing a second register for storing data and, by operating according to an instruction sequence, having at least a function of controlling the arithmetic unit and a function of inputting and outputting data between, on one side, the external memory or the peripheral circuit, according to the selection made by the mode selection means, and, on the other side, the second register or the first register;

instruction selection means that, in the boot state, supplies a specific instruction sequence to the processor means, stores in the internal memory an instruction sequence read from a ROM by the specific instruction sequence, performs the transition operation on the processor means after supply of the specific instruction sequence has finished, and, when the processor means is in the operating state, reads an instruction sequence from the internal memory and supplies it to the processor means; and

external bus interface means that stores data input from an external bus in the internal memory or the external memory, reads data stored in the internal memory or the external memory and outputs it to the external bus, and performs the start operation on the processor means in response to input from the external bus.
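The mode-dependent startup behaviour of the processor means can be sketched as a small state machine. The class, method, and enum names are illustrative assumptions, and the choice to ignore an operation that does not apply to the current state is also an assumption; claim 1 only specifies the states and the two operations that lead to the operating state.

```python
from enum import Enum


class Mode(Enum):
    ARITHMETIC = "arithmetic processing mode"
    BRIDGE = "bridge mode"


class State(Enum):
    STANDBY = "standby"
    BOOT = "boot"
    OPERATING = "operating"


class ProcessorMeans:
    """Hypothetical model of the startup states described in claim 1."""

    def __init__(self, mode):
        self.mode = mode
        # At startup: standby in arithmetic processing mode, boot in bridge mode.
        self.state = State.STANDBY if mode is Mode.ARITHMETIC else State.BOOT

    def start(self):
        """Start operation, issued by the external bus interface means."""
        if self.state is State.STANDBY:  # assumption: ignored in other states
            self.state = State.OPERATING

    def transition(self):
        """Transition operation, issued by the instruction selection means
        once the boot instruction sequence has been supplied."""
        if self.state is State.BOOT:  # assumption: ignored in other states
            self.state = State.OPERATING

    def data_partner(self):
        """What the interface means exchanges data with in the selected mode."""
        if self.mode is Mode.ARITHMETIC:
            return "external memory"
        return "peripheral circuit"
```

A circuit in arithmetic processing mode therefore waits for the external bus before running, while one in bridge mode boots itself from ROM without outside help.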

[0035] In the invention of claim 2, in the processor element circuit of claim 1, the processor means includes means for processing interrupts; the interface means includes means for receiving an interrupt from the peripheral circuit and supplying it to the processor means when the bridge mode is selected by the selection input; and the external bus interface means includes means for supplying an interrupt to the processor means when specific data is input from the external bus.

[0036] The invention of claim 3 is a parallel computing system that has a function of performing operations in parallel, has a plurality of buses, and has a bridge circuit for bridging between the plurality of buses, the system comprising:

a host computer for controlling the entire system;

processor elements, each being the processor element circuit of claim 1 or claim 2 set to the arithmetic processing mode by the mode selection means;

a plurality of boards, each carrying a plurality of the processor elements;

a first bus for connecting, on each board, the external bus interface means of the plurality of processor elements to one another;

a second bus for connecting the host computer and the plurality of boards; and

a bus bridge that is the processor element circuit of claim 1 or claim 2 set to the bridge mode by the mode selection means, whose external bus interface means is connected to the first bus on each board and whose interface means is connected to the second bus through a conversion circuit, and which by those means performs protocol conversion in both directions between the first bus and the second bus.

[0037] The invention of claim 4 is a bus bridge method for performing protocol conversion, in the parallel computing system of claim 3, between the first bus and the second bus, which have different protocols, the method comprising: a step of receiving from the processor element, according to the write procedure of the first bus, write information including a write request, a write address, and write data for a node located outside the first bus; and a step of writing the write data to the write address according to the write procedure of the second bus, on the basis of the write information.

[0038] Further, the invention of claim 5 is a bus bridge method for performing protocol conversion, in the parallel computing system of claim 3, between the first bus and the second bus, which have different protocols, the method comprising: a step of receiving from the processor element, according to the write procedure of the first bus, read information including a read request for a node located outside the first bus, a read address, and the address of the processor element that is to store the read data; a step of reading data from the read address according to the read procedure of the second bus, on the basis of the read information; and a step of writing the data thus read to the processor element address contained in the read information, according to the write procedure of the second bus.
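The write and read bridging procedures of claims 4 and 5 can be sketched as follows. Everything here is a simplified assumption: a bus is modelled as an address-indexed store, the field names of the two information records are invented for illustration, and the procedure used to deliver read data back to the processor element is passed in as a parameter (`reply_write`) rather than fixed, since the sketch does not model the real bus protocols.

```python
from dataclasses import dataclass


@dataclass
class WriteInfo:
    """Write information received via the first bus's write procedure."""
    address: int  # write address of a node outside the first bus
    data: int     # write data


@dataclass
class ReadInfo:
    """Read information, also received via the first bus's WRITE procedure."""
    read_address: int   # address to read on the second bus
    reply_address: int  # processor element address that stores the result


class Bus:
    """Stand-in for a bus: an address-indexed store (illustrative only)."""

    def __init__(self):
        self.cells = {}

    def write(self, address, data):
        self.cells[address] = data

    def read(self, address):
        return self.cells[address]


def bridge_write(info, second_bus):
    """Claim 4: reissue the received write using the second bus's write procedure."""
    second_bus.write(info.address, info.data)


def bridge_read(info, second_bus, reply_write):
    """Claim 5: read on the second bus, then write the data to the requester.

    `reply_write(address, data)` is a hypothetical callable that performs the
    final write to the processor element address.
    """
    data = second_bus.read(info.read_address)
    reply_write(info.reply_address, data)
```

The point of the read procedure is that the processor element only ever issues writes: the read request travels to the bridge as a write, and the result comes back as another write.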

[0039]

OPERATION

In the invention of claim 1, the mode selection means allows the arithmetic processing mode or the bridge mode to be selected. In the arithmetic processing mode, the processor element circuit is used as a processor element of the parallel computing system; in the bridge mode, it can be used as a bus bridge circuit, or a part of one, that performs protocol conversion between the different buses of the parallel computing system.

[0040] According to the invention of claim 2, because the processor means performs interrupt processing, data exchanged with the buses at arbitrary times can be handled easily when the processor element circuit is used as a bus bridge circuit or a part of one.

[0041] According to the invention of claim 3, the processor elements and the bus bridge circuit of the parallel computing system can be built from the same processor element circuit, which reduces the development cost of the system and shortens its development period.

[0042] According to the invention of claim 4, when a bus bridge circuit is built from the processor element circuit, resources on the second bus that cannot be write-accessed directly from the first bus become accessible by using a write on the first bus. They can therefore be accessed from the first bus without adding any special circuit.

[0043] According to the invention of claim 5, when a bus bridge circuit is built from the processor element circuit, resources on the second bus that cannot be read-accessed directly from the first bus become accessible by using a write on the first bus. They can therefore be accessed from the first bus without adding any special circuit.

[0044]

EMBODIMENTS OF THE INVENTION

Embodiments of the present invention are described below with reference to the drawings.

[0045] [System Configuration] FIG. 2 is a block diagram showing the configuration of an embodiment of the parallel computing system according to the present invention. In this embodiment the parallel computing system is configured as a dedicated system. It comprises a host computer 11, built from a general-purpose workstation or PC (personal computer), and a plurality of boards 12 each carrying many processor elements 14. The host computer 11 and each board 12 are connected by a system bus 13.

[0046] Within each board 12, a local memory 15 is connected to each of the many processor elements 14. The processor elements 14 are connected to one another through a board bus 16. The board bus 16 and the system bus 13 are connected via a bus bridge circuit 17.
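The topology described above can be written down as a small data model, shown here as a sketch. The class and function names are assumptions made for illustration; only the reference numerals in the docstrings and strings come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorElement:
    """Processor element 14 with its own local memory 15."""
    local_memory: dict = field(default_factory=dict)


@dataclass
class Board:
    """Board 12: processor elements 14 on board bus 16, behind bus bridge circuit 17."""
    elements: list


@dataclass
class ParallelSystem:
    """Host computer 11 plus the boards 12 attached to system bus 13."""
    boards: list


def build_system(n_boards, elements_per_board):
    """Create a system with the given number of boards and elements per board."""
    return ParallelSystem(
        boards=[Board([ProcessorElement() for _ in range(elements_per_board)])
                for _ in range(n_boards)]
    )


def path_from_host(system, board_index, element_index):
    """List the hops a transfer from the host takes to reach one processor element."""
    board = system.boards[board_index]
    board.elements[element_index]  # raises IndexError if the element does not exist
    return [
        "host computer 11",
        "system bus 13",
        f"bus bridge circuit 17 (board {board_index})",
        f"board bus 16 (board {board_index})",
        f"processor element 14 (element {element_index})",
    ]
```

Every path from the host to a processor element crosses exactly one bus bridge circuit 17, which is where the two bus protocols meet.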

[0047] The host computer 11 controls the entire system of FIG. 2. The boards 12 perform the individual calculations of the system in parallel. Each processor element 14 on a board 12 contains a program-controlled floating-point arithmetic unit and executes the individual calculations (floating-point multiply-accumulate operations) assigned to its board 12. The local memory 15 stores the parameters, control variables, and other data used in the calculation. Each processor element 14 performs its calculation using the parameters and control variables stored in this local memory.

[0048] In this embodiment, to reduce the man-hours needed to develop the bus bridge circuit 17, the bus bridge circuit 17 and the processor elements 14 are built from a common LSI. Details are given later.

【００４９】 Next, the system bus 13 and the board bus 16 of the parallel computing system of FIG. 2 are described.

【００５０】 Because the system bus 13 connects to the host computer 11, which is a general-purpose PC or the like, a general-purpose bus is used, as stated earlier. Various general-purpose buses could serve as the system bus 13; this embodiment uses the IEEE 1394 serial bus. The IEEE 1394 serial bus currently has a maximum transfer rate of 400 Mbps, which is sufficiently fast even for a general-purpose bus.

【００５１】 A high-speed parallel bus or the like is used for the board bus 16. Various buses could serve as the board bus 16; this embodiment uses PPRAM-Link, which currently offers communication performance of up to 800 Mbps.

【００５２】 In FIG. 2, the bus bridge circuit 17 of each board 12 has two ports, and the system bus 13 is formed as an IEEE 1394 serial-bus daisy chain that uses these ports to connect the boards 12 in series, one after another.

【００５３】 In PPRAM-Link, to keep the load capacitance on the bus small, nodes are connected in a unidirectional ring by 17-bit signals, as described later. The board bus 16 shown in FIG. 2 is therefore also connected in this ring form.

【００５４】 The PPRAM-Link specification used for the board bus 16 is described in more detail below. PPRAM-Link is documented in, for example, "Overview of the PPRAM-Link Logical Layer Specification (Kyushu University Draft, Version 0.1)" (Yamazaki et al., Tech. Rep. IEICE, ICD97-24, 1997-05) and "Development of a PPRAM-Link Interface Core" (Hashimoto et al., Information Processing Society of Japan Research Report, ARC-125-11, 1997-08). First, FIG. 4 shows an example of a PPRAM-Link connection.

【００５５】 In FIG. 4, N1 to N5 are basic structural units called PPRAM nodes. Each of the PPRAM nodes N1 to N5 has a 17-bit input terminal InputLink and a 17-bit output terminal OutputLink. As in the example of FIG. 4, the input terminal InputLink and the output terminal OutputLink of adjacent PPRAM nodes are connected to each other, so that the nodes as a whole form a unidirectional ring.

【００５６】 Of the 17 bits of the InputLink or OutputLink signal, the upper 16 bits are the symbol signal and the lowest bit is the flag signal. These signals are transferred one per clock, in synchronization with a common clock signal (not shown). A single LSI may contain several PPRAM nodes, but in this embodiment one PPRAM node corresponds to one processor element 14.
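The 17-bit link word described above can be illustrated with a minimal Python sketch. The function names `pack_link_word` and `unpack_link_word` are hypothetical and do not come from the PPRAM-Link specification; the sketch assumes only what the text states, namely that the upper 16 bits carry the symbol and the lowest bit carries the flag.

```python
from typing import Tuple

SYMBOL_BITS = 16  # upper 16 bits of the 17-bit link word: symbol signal
LINK_BITS = 17    # total width of InputLink / OutputLink


def pack_link_word(symbol: int, flag: int) -> int:
    """Combine a 16-bit symbol and a 1-bit flag into one 17-bit link word."""
    if not 0 <= symbol < (1 << SYMBOL_BITS):
        raise ValueError("symbol must fit in 16 bits")
    if flag not in (0, 1):
        raise ValueError("flag must be 0 or 1")
    # The symbol occupies bits 16..1, the flag occupies bit 0.
    return (symbol << 1) | flag


def unpack_link_word(word: int) -> Tuple[int, int]:
    """Split a 17-bit link word back into (symbol, flag)."""
    if not 0 <= word < (1 << LINK_BITS):
        raise ValueError("link word must fit in 17 bits")
    return word >> 1, word & 1
```

One such word would be transferred per clock; a packet then corresponds to a run of consecutive words.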

【００５７】 Each of the PPRAM nodes N1 to N5 is assigned a node identifier that is unique across the whole system, and the memory and other resources inside each PPRAM node are assigned offset addresses. Accesses between PPRAM nodes use this node identifier and offset address.

【００５８】 Next, communication between two PPRAM nodes is described, taking as an example an access such as a read or a write from PPRAM node N1 in FIG. 4 to the memory or other resources of PPRAM node N3. A single access is called a transaction.

【００５９】 Such a transaction is carried out by exchanging packets between PPRAM nodes. A packet is one unit of communication in PPRAM-Link, and it is transferred continuously over several clocks through the 16-bit symbol signal contained in the InputLink and OutputLink signals described above. During the transfer, the flag signal is used for control purposes such as identifying packets. The 16 bits of data transferred through the symbol signal in one clock are called one symbol.

【００６０】 Because PPRAM-Link nodes are connected in a unidirectional ring, as described above, a packet sent from PPRAM node N1 to PPRAM node N3 follows the route N1 → N2 → N3 in FIG. 4, and a packet sent from PPRAM node N3 to PPRAM node N1 follows the route N3 → N4 → N5 → N1.
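The routing behaviour of the unidirectional ring can be sketched as follows. The function `ring_path` and the list representation of the ring are assumptions made for illustration only; the node order N1 to N5 follows FIG. 4 as described in the text.

```python
from typing import List


def ring_path(ring: List[str], src: str, dst: str) -> List[str]:
    """Return the nodes a packet visits going from src to dst.

    `ring` lists the nodes in link order; each node's output feeds the
    next node's input, and the last node feeds the first.
    """
    if src not in ring or dst not in ring:
        raise ValueError("both nodes must be on the ring")
    i = ring.index(src)
    path = [src]
    # Packets can only move forward, so step around the ring until dst.
    while path[-1] != dst:
        i = (i + 1) % len(ring)
        path.append(ring[i])
    return path


RING = ["N1", "N2", "N3", "N4", "N5"]
print(ring_path(RING, "N1", "N3"))
print(ring_path(RING, "N3", "N1"))
```

The two calls reproduce the two routes named in the text: the forward route passes through N2, while the return route has to continue around the ring through N4 and N5.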

【００６１】 FIG. 5 shows the communication procedure in this embodiment. It is a conceptual diagram of the procedure for one transaction in which, as described above, PPRAM node N1 (the requester) accesses the memory or other resources of PPRAM node N3 (the responder). Communication proceeds in the following steps.

【００６２】 (1) PPRAM node N1 sends a request transmission packet to PPRAM node N3.

【００６３】 (2) PPRAM node N3 sends a request reception packet to PPRAM node N1.

【００６４】 (3) PPRAM node N3 sends a response transmission packet to PPRAM node N1.

【００６５】 (4) PPRAM node N1 sends a response reception packet to PPRAM node N3.

【００６６】 Each packet contains header information. The header includes the node identifier of the node that issued the packet, the node identifier of the packet's destination, the offset address of the destination memory, and the transaction type (read, write, and so on). The requesting node and the responding node can therefore exchange packets on the basis of the header information.

【００６７】 For a read request from PPRAM node N1, PPRAM node N3 appends the data read from its memory, together with this header, to the response transmission packet and sends it.

【００６８】 For a write request from PPRAM node N1, PPRAM node N1 appends the data to be written to PPRAM node N3 to the request transmission packet and sends it.

【００６９】 The request reception packet and the response reception packet allow the PPRAM node that sent a transmission packet to confirm that the packet reached its destination without being lost along the transmission path. The transfer of a transmission packet together with the transfer of its reception packet (the request pair or the response pair) is collectively called a subaction.

【００７０】 In this way, one transaction in PPRAM-Link is carried out by the procedure described above.
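The four-packet transaction described above can be sketched in Python. This is an illustrative model under stated assumptions, not the PPRAM-Link packet format: the `Packet` class, its field names, the `kind` strings, and the dictionary standing in for the responder's memory are all hypothetical. Only the packet order, the header contents, and the placement of read and write data follow the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Packet:
    kind: str    # "request", "request_ack", "response" or "response_ack"
    src: str     # node identifier of the issuing node
    dst: str     # node identifier of the destination node
    offset: int  # offset address in the destination memory
    op: str      # transaction type: "read" or "write"
    data: Optional[int] = None


def run_transaction(requester: str, responder: str, op: str, offset: int,
                    memory: Dict[int, int],
                    data: Optional[int] = None) -> List[Packet]:
    """Return the four packets of one transaction, in the order they are sent."""
    if op not in ("read", "write"):
        raise ValueError("op must be 'read' or 'write'")
    packets = []
    # Request subaction: a write carries its data in the request packet.
    packets.append(Packet("request", requester, responder, offset, op,
                          data if op == "write" else None))
    packets.append(Packet("request_ack", responder, requester, offset, op))
    if op == "write":
        memory[offset] = data
        result = None
    else:
        result = memory[offset]
    # Response subaction: a read returns its data in the response packet.
    packets.append(Packet("response", responder, requester, offset, op, result))
    packets.append(Packet("response_ack", requester, responder, offset, op))
    return packets
```

In this model each subaction is a transmission packet followed by its reception (acknowledgment) packet, so a transaction is two subactions regardless of whether it is a read or a write.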

【００７１】 Next, the operation of the system of FIG. 2 is described, taking the molecular orbital calculation mentioned earlier as an example. An outline of the calculation is given first.

【００７２】 As stated earlier, in a molecular orbital calculation the amount of computation for the two-electron integrals is proportional to the fourth power of the number of basis functions N, and the individual integrals can be computed independently. The amount of computation for diagonalizing the Fock matrix given by equation (1) is proportional to the third power of N, but its individual calculations depend on one another. Two-electron integrals whose absolute values are sufficiently small are cut off, as described earlier; even so, for a basis of practical size, the two-electron integrals require overwhelmingly more computation than the diagonalization of the Fock matrix.

【００７３】 Therefore, when a molecular orbital calculation is run on the dedicated system of FIG. 2, the two-electron integrals, which involve a large amount of computation but are independent and so require little communication, are computed in parallel on multiple processor elements 14, while the diagonalization of the Fock matrix, which involves comparatively little computation and has interdependent steps, is performed on the host computer 11.

【００７４】 If the two-electron integrals computed by the processor elements 14 were sent to the host computer 11, and the host computer 11 computed the Fock matrix elements according to equation (2), an enormous volume of two-electron integral data would have to pass through the board bus 16 and the system bus 13. As a result, however fast the two-electron integrals are computed and the Fock matrix is diagonalized, the communication speed would limit system performance, and the speed of the molecular orbital calculation could not be improved.

【００７５】 To reduce this communication volume, each processor element 14 computes as far as grs in equation (2) and sends that value to the host computer 11. The host computer 11 then computes the Fock matrix elements from this value according to equation (2).

【００７６】 Furthermore, because the calculation is based on the SCF method described earlier, the density matrix elements are recomputed by equation (3) at every iteration. This calculation is also performed by the host computer 11.

【００７７】 Next, an outline of the specific calculation procedure on the system of FIG. 2 is given. It is assumed that the following are stored in the memory of the host computer 11 in advance: the molecular information and basis information needed to compute the two-electron integrals; the program that drives the floating-point arithmetic unit inside each processor element 14 to compute the two-electron integrals and grs; and parameters needed for other calculations, such as Taylor expansion coefficients.

【００７８】 It is also assumed that the density matrix element values needed to compute the grs used in the first SCF iteration have been computed beforehand by the host computer 11 from equation (3) and stored in its memory. In addition, the host computer 11 decides which two-electron integrals each processor element 14 is to compute, so that the calculation times of the processor elements 14 are equal and the system's total calculation time, considering both computation and communication time, is minimized.

【００７９】 After the system is first started, the molecular information, basis information, other parameters, and the program described above are transferred from the data stored in the host computer 11 to each processor element 14 and stored in the memory inside the processor element 14 or in the local memory 15 attached to it.

【００８０】 These data are first sent from the host computer 11, over the IEEE 1394 serial bus serving as the system bus 13, to the bus bridge circuit 17 on each board 12, and the bus bridge circuit 17 relays them to PPRAM-Link, the board bus 16. The data are then sent to each processor element 14 over the board bus 16. After that, the host computer 11 starts each processor element 14 through the same route, and each processor element 14 begins its calculation.

【００８１】 As described above, the host computer 11 decides which two-electron integrals a given processor element 14 is to compute. After a processor element 14 has been started, the host computer 11 therefore transfers to it the two-electron integral information for the integrals it is responsible for, as well as the density matrix information needed to compute grs from those integrals.

【００８２】 Each processor element 14 computes the two-electron integrals corresponding to the transferred two-electron integral information, following the program described above. When the two-electron integral calculation is finished, the processor element 14 multiplies the result by the density matrix that was transferred together with the two-electron integral information, and so computes grs.

【００８３】 Each processor element 14 stores the grs computed in this way in its attached local memory 15 and sends calculation-end information to the host computer 11. On receiving the calculation-end information from a processor element 14, the host computer 11 reads grs from the local memory 15.

【００８４】 The calculation-end information is first transferred from each processor element 14, over PPRAM-Link serving as the board bus 16, to the bus bridge circuit 17 on the board 12, and the bus bridge circuit 17 relays it to the IEEE 1394 serial bus serving as the system bus 13. It is then sent to the host computer 11 over the system bus 13. When the grs values are read from the local memory 15, the data reach the host computer 11 by the same route.

【００８５】 The host computer 11 then sends the processor element 14 that has finished its calculation the two-electron integral information and density matrix information for its next assignment; the processor element again computes as far as grs and sends it back to the host computer 11. This is repeated until all two-electron integrals have been computed, at which point the host computer 11 holds all the Fock matrix elements and performs the diagonalization.

【００８６】 Furthermore, because the SCF method computes the Fock matrix elements iteratively, if further iteration is judged necessary after the diagonalization of the Fock matrix, the host computer 11 recomputes the density matrix elements, and the processor elements 14 repeat the calculation of new grs values by the procedure described above until the final result is obtained.
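The way the host hands the next batch of two-electron integrals to whichever processor element finishes first can be sketched as a small scheduling simulation. This is an illustrative model only: the function `dispatch`, the abstract task costs, and the use of a priority queue are assumptions, and communication time over the buses is ignored. The patent does not specify the host's scheduling algorithm.

```python
import heapq
from typing import Dict, List, Tuple


def dispatch(task_costs: List[float],
             num_pes: int) -> Tuple[Dict[int, List[int]], float]:
    """Assign tasks, in order, to whichever processor element is idle first.

    Returns (tasks handled by each PE, time at which the last PE finishes).
    """
    # Heap of (time at which the PE becomes idle, PE index).
    idle = [(0.0, pe) for pe in range(num_pes)]
    heapq.heapify(idle)
    assignment: Dict[int, List[int]] = {pe: [] for pe in range(num_pes)}
    for task_id, cost in enumerate(task_costs):
        # The PE that reports "calculation end" earliest gets the next task.
        ready_at, pe = heapq.heappop(idle)
        assignment[pe].append(task_id)
        heapq.heappush(idle, (ready_at + cost, pe))
    makespan = max(t for t, _ in idle)
    return assignment, makespan


assignment, makespan = dispatch([3, 1, 1, 1], num_pes=2)
print(assignment, makespan)
```

In this example one processor element spends its time on the single long task while the other works through the three short ones, so both finish at the same time, which is the balance the host is described as aiming for.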

【００８７】 [Embodiment of the Processor Element Circuit] As explained in the section on the related art, in a dedicated system the processor element 14 and the bus bridge circuit 17 normally each use their own dedicated LSI. In this embodiment, as stated above, the processor element circuit used as the processor element 14 is given a minimal extension so that it can also be used as the bus bridge circuit 17, or as part of it, which eliminates the effort and cost of developing two kinds of LSI.

【００８８】 In the following description, the LSI chip of a processor element circuit that can be shared in this way between the processor element and the bus bridge circuit is called a shared chip.

【００８９】 The configuration of the shared chip is described below with reference to FIG. 1 and FIG. 3. FIG. 1 shows the configuration when the processor element 14 is built from the shared chip, and FIG. 3 shows the configuration when the bus bridge circuit 17 is built from the shared chip.

【００９０】 [Configuration of the Shared Chip] First, the configuration of the shared chip (processor element circuit) in this embodiment is described with reference to FIG. 1. FIG. 1 is a block diagram showing the internal configuration of the shared chip 20 of this embodiment together with the local memory 15 and the board bus 16.

【００９１】 As shown in FIG. 1, the shared chip 20 comprises a CPU 21, a floating-point arithmetic unit 22, an internal memory 23, a local memory controller 24, a peripheral circuit controller 25, a multiplexer 26, a load generator 27, a board bus interface 28, a processor element internal bus (PE bus) 30, a local memory internal bus (LM bus) 31, an instruction bus 32, a floating-point data bus 33, an input terminal 34 for a mode selection signal SEL, and an interrupt control signal line 35.

【００９２】 The CPU 21 controls the shared chip as a whole and, when the chip operates as a processor element, the calculation as a whole. In this example a 32-bit CPU is used. The CPU 21 contains a number of 32-bit integer registers, which are used for integer operations.

【００９３】 The floating-point arithmetic unit (hereinafter FPU) 22 contains a double-precision floating-point arithmetic unit and executes multiply-add and other operations at high speed. The FPU 22 contains a number of floating-point registers, each able to hold 64-bit double-precision floating-point data, which are used in the operations of the double-precision arithmetic unit.

【００９４】 The internal memory 23 stores the instructions executed by the CPU 21 and the results of calculations performed by the CPU 21 and the FPU 22. The internal memory 23 has three ports: port 1 is a 32-bit read-only port connected to the instruction bus 32 through the multiplexer 26; port 2 is a 32-bit read/write port connected to the processor element internal bus 30; and port 3 is a 64-bit read/write port for exchanging double-precision floating-point data with the FPU 22.

【００９５】 The local memory controller (hereinafter LMC) 24 generates the chip select signal, write signal, and other control signals for the external local memory connected to the shared chip 20. The peripheral circuit controller (hereinafter PEC) 25 generates control signals for peripheral circuits.

【００９６】 The multiplexer 26 selects which instruction, from the internal memory 23 or from the load generator 27, is input to the CPU 21. The load generator 27 generates an initialization program when the shared chip 20 is used in the bus bridge circuit 17.

【００９７】 The board bus interface 28 interfaces between the board bus 16 and the processor element circuit formed by the shared chip 20.

【００９８】 The processor element internal bus (hereinafter PE bus) 30 is a 32-bit bus connecting the board bus interface 28, the CPU 21, the FPU 22, and the internal memory 23.

【００９９】 The local memory internal bus (hereinafter LM bus) 31 is a 64-bit bus connecting the external local memory to the CPU 21 and the FPU 22.

【０１００】 The instruction bus 32 is a 32-bit bus through which the CPU 21 reads the program. The floating-point data bus 33 is the bus through which the FPU 22 exchanges the double-precision floating-point data stored in the internal memory 23.

【０１０１】 The input terminal 34 for the mode selection signal SEL is a selection input that determines whether the shared chip 20 is used as a processor element or as a bus bridge circuit.

【０１０２】 The interrupt signal line 35 supplies interrupt signals from the board bus interface 28 to the CPU 21.

【０１０３】 If the mode selection signal SEL is "0", the shared chip 20 is used as a processor element; if it is "1", the chip is used as a bus bridge circuit. For the circuit formed by the shared chip 20, the state in which SEL is "0" and the chip is used as a processor element is called the arithmetic processing mode, and the state in which SEL is "1" and the chip is used as a bus bridge circuit is called the bridge mode.

【０１０４】 [Configuration, Function, and Operation as a Processor Element] [Configuration as a Processor Element] Next, the configuration used when the shared chip 20 operates as the processor element 14 is described.

【０１０５】 As stated above, FIG. 1 shows the configuration when the shared chip 20 is used as the processor element 14, and the mode selection signal SEL supplied to the input terminal 34 has the value "0". The shared chip 20 is therefore in the arithmetic processing mode and is used as a processor element.

【０１０６】 When the shared chip 20 is in the arithmetic processing mode, the multiplexer 26 selects the output of the internal memory 23, and the instructions stored in the internal memory 23 are read out onto the instruction bus 32.
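The role of the mode selection signal SEL and the multiplexer 26 can be sketched as follows. The names `Mode` and `select_instruction` are hypothetical. The text states that SEL "0" means the arithmetic processing mode, SEL "1" means the bridge mode, and that the multiplexer passes the internal memory output in the arithmetic processing mode; that it passes the load generator output in the bridge mode is an assumption inferred from the description of the load generator 27.

```python
from enum import IntEnum


class Mode(IntEnum):
    ARITHMETIC = 0  # SEL = "0": chip acts as a processor element
    BRIDGE = 1      # SEL = "1": chip acts as a bus bridge circuit


def select_instruction(sel: int, internal_memory_word: int,
                       load_generator_word: int) -> int:
    """Model of multiplexer 26: choose the instruction source for the CPU."""
    mode = Mode(sel)  # raises ValueError for anything other than 0 or 1
    if mode is Mode.ARITHMETIC:
        return internal_memory_word
    # Assumed behaviour in bridge mode: take the initialization
    # program word produced by the load generator.
    return load_generator_word
```

Because the mode is fixed by a single input pin, the same silicon can be placed in either role on the board without any change to the chip itself.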

【０１０７】 In this mode, as shown in FIG. 1, the board bus interface 28 is connected to the board bus 16, and the LM bus 31 and the LMC 24 are connected to the local memory 15. The local memory 15 uses, for example, general-purpose SRAM: the LM bus 31 is connected to the SRAM data and address terminals, and the LMC 24 generates control signals such as the SRAM chip select and write enable signals. Because the LM bus 31 is 64 bits wide, as stated above, the local memory 15 consists of, for example, four SRAMs with a 16-bit data width connected in parallel.
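How a 64-bit LM bus word maps onto four 16-bit SRAMs in parallel can be shown with a short sketch. The functions `split_lanes` and `join_lanes` are hypothetical, and the assignment of bus bits to individual SRAM devices (lane 0 holding the least significant 16 bits) is an assumption; the text states only that four 16-bit devices together supply the 64-bit width.

```python
from typing import List

LANE_BITS = 16
NUM_LANES = 4  # 4 x 16 bits = 64-bit LM bus


def split_lanes(word64: int) -> List[int]:
    """Split one 64-bit word into four 16-bit values, one per SRAM."""
    if not 0 <= word64 < (1 << 64):
        raise ValueError("word must fit in 64 bits")
    mask = (1 << LANE_BITS) - 1
    return [(word64 >> (LANE_BITS * i)) & mask for i in range(NUM_LANES)]


def join_lanes(lanes: List[int]) -> int:
    """Reassemble the 64-bit word from the four 16-bit SRAM outputs."""
    if len(lanes) != NUM_LANES:
        raise ValueError("expected four lanes")
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFFFF) << (LANE_BITS * i)
    return word
```

All four devices share the same address and control signals, so one access reads or writes a full double-precision value.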

【０１０８】 When the shared chip 20 is in the arithmetic processing mode, the peripheral circuit controller 25 and the load generator 27 do not function.

【０１０９】 Next, among the internal units of the shared chip 20, the functions of the CPU 21 and the board bus interface 28 in the arithmetic processing mode are described, beginning with the CPU 21.

【０１１０】 The CPU 21 is an ordinary RISC processor and, as stated above, contains 32-bit integer registers. Apart from the integer registers, it contains control registers for controlling units inside and outside the CPU 21. Immediately after a reset, for example, the CPU 21 is in a start-wait (standby) state and does not operate until it is started from the board bus interface 28, as described later. The description below assumes that the CPU 21 has been started.

【０１１１】 When the CPU 21 is started from the board bus interface 28, it moves to the operating state and begins reading, from a specific address and over the instruction bus 32, the instructions stored beforehand in the internal memory 23. It then reads instructions from the internal memory 23 one at a time in order, incrementing the address, and decodes and executes them.

【０１１２】 The instructions of the CPU 21 form an instruction set of the kind found in an ordinary general-purpose RISC CPU and include at least the following: a NOP instruction, which performs no operation; arithmetic and logic instructions between 32-bit integer registers and control registers; shift instructions; load and store instructions between the internal memory 23 and the 32-bit integer registers inside the CPU 21 over the PE bus 30; load and store instructions between the local memory 15 and the 32-bit integer registers over the LM bus 31; load and store instructions between the local memory 15 and the floating-point registers inside the FPU 22 over the LM bus 31; load and store instructions between the internal memory 23 and the floating-point registers inside the FPU 22; jump and branch instructions; floating-point instructions that use the floating-point arithmetic unit inside the FPU 22 to operate on its floating-point registers; and data transfer instructions between the integer registers inside the CPU 21 and the floating-point registers inside the FPU 22 over a bus between the CPU 21 and the FPU 22 (not shown).

【０１１３】 When any of the load or store instructions between the local memory 15 or the internal memory 23 and the floating-point registers inside the FPU 22, the floating-point instructions, or the data transfer instructions is read into the CPU 21, the CPU 21 executes it while controlling the floating-point registers and floating-point arithmetic unit inside the FPU 22 through control signals (not shown).

【０１１４】また、ＣＰＵ２１が、ローカルメモリ１５
と、ＦＰＵ２２内部の浮動小数点レジスタまたはＣＰＵ
２１内部の整数レジスタとの間のロード、ストア命令を
実行するときには、ＣＰＵ２１は、ローカルメモリ１５
のアドレスを出力し、ＬＭバス３１を経由してローカル
メモリ１５に出力するとともに、ＬＭコントローラ２４
を制御して、ローカルメモリ１５のチップセレクト信
号、書き込み信号などを生成してローカルメモリ１５に
出力する。また、ローカルメモリ１５と、ＣＰＵ２１内
部の整数レジスタとの間のロードのときは、６４ビット
のＬＭバス３１を通して読み出す。When the CPU 21 executes a load or store instruction between the local memory 15 and a floating-point register inside the FPU 22 or an integer register inside the CPU 21, the CPU 21 outputs the address of the local memory 15 to the local memory 15 via the LM bus 31, and controls the LM controller 24 to generate the chip select signal, write signal, and other signals for the local memory 15 and output them to the local memory 15. A load from the local memory 15 into an integer register inside the CPU 21 is read through the 64-bit LM bus 31.

【０１１５】このとき、コントロールレジスタの設定に
より、ＣＰＵ２１内部の３２ビット整数レジスタには、
ＬＭバス３１の上位３２ビットを読み出すか、下位３２
ビットを読み出すかが選択される。また、この整数レジ
スタからローカルメモリ１５へのストアも、６４ビット
のＬＭバス３１を通して行われるため、コントロールレ
ジスタの設定によって、ローカルメモリ１５の上位３２
ビットに書き込むか、下位３２ビットに書き込むかが選
択されるものである。At this time, the setting of the control register selects whether the upper 32 bits or the lower 32 bits of the LM bus 31 are read into the 32-bit integer register inside the CPU 21. A store from this integer register to the local memory 15 is likewise performed through the 64-bit LM bus 31, so the setting of the control register selects whether the upper 32 bits or the lower 32 bits of the local memory 15 are written.
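
The half-word selection described above can be illustrated with a minimal Python sketch. The function names and the `select_upper` flag are hypothetical stand-ins for the control register setting, whose bit layout the text does not give, and the sketch assumes that a store leaves the unselected half of the 64-bit word unchanged.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def load_word(lm_word64: int, select_upper: bool) -> int:
    """Return the 32-bit half of a 64-bit LM bus word chosen by the control setting."""
    if select_upper:
        return (lm_word64 >> 32) & MASK32
    return lm_word64 & MASK32

def store_word(lm_word64: int, value32: int, select_upper: bool) -> int:
    """Return the 64-bit word after writing value32 into the chosen half.

    Assumption: the half that is not selected keeps its previous contents.
    """
    value32 &= MASK32
    if select_upper:
        return (value32 << 32) | (lm_word64 & MASK32)
    return (lm_word64 & ~MASK32 & MASK64) | value32
```

For example, with the word `0x1122334455667788`, `load_word(word, True)` yields the upper half and `load_word(word, False)` the lower half.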

【０１１６】さらに、ＣＰＵ２１は、割り込みを処理す
る機能も持ち、ボードバスインタフェース２８やプロセ
ッサエレメント回路の外部から、あるいはＦＰＵ２２か
ら、割り込み信号ライン３５などを通して割り込まれる
と、それまで実行していた作業を中断して、中断した命
令のアドレスを整数レジスタに保存して、割り込みハン
ドラにジャンプする。また、割り込みハンドラの実行が
終了すると、特定の整数レジスタに保存したアドレスに
戻り、もとの作業を再開する。Further, the CPU 21 also has a function of processing interrupts. When it is interrupted through the interrupt signal line 35 or the like by the board bus interface 28, by a source outside the processor element circuit, or by the FPU 22, it suspends the work it was executing, saves the address of the interrupted instruction in an integer register, and jumps to the interrupt handler. When the interrupt handler finishes, execution returns to the address saved in the specific integer register and the original work resumes.

【０１１７】次に、ボードバスインタフェース２８の機
能と構成の概略について述べる。Next, an outline of the function and configuration of the board bus interface 28 will be described.

【０１１８】ボードバスインタフェース２８は、前述の
ＰＰＲＡＭ−Ｌｉｎｋのプロトコルに従って動作するボ
ードバス１６と、共用チップ２０の内部との間をインタ
フェースする。このため、ボードバスインタフェース２
８は、内部メモリ２３へのアクセスや、前述したＰＰＲ
ＡＭ−Ｌｉｎｋのサブアクションに対応したパケットの
生成などを行う。このボードバスインタフェース２８
は、以下の（Ａ）〜（Ｆ）に記す６つの機能を持つ。The board bus interface 28 interfaces between the board bus 16, which operates according to the PPRAM-Link protocol described above, and the inside of the shared chip 20. To this end, the board bus interface 28 accesses the internal memory 23, generates packets corresponding to the PPRAM-Link subactions described above, and so on. The board bus interface 28 has the six functions (A) to (F) described below.

【０１１９】（Ａ）ボードバス１６を経由した他のＰＰ
ＲＡＭノードからの内部メモリ２３への書き込み要求に
従って、外部から送られてきたデータを、ＰＥバス３０
を経由して、内部メモリ２３に３２ビット単位で書き込
む。(A) In response to a write request to the internal memory 23 from another PPRAM node via the board bus 16, data sent from the outside is written to the internal memory 23 in 32-bit units via the PE bus 30.

【０１２０】（Ｂ）ボードバス１６を経由した他のＰＰ
ＲＡＭノードからのローカルメモリ１５への書き込み要
求に従って、外部から送られてきたデータを、ＬＭバス
３１を経由して、ローカルメモリ１５に６４ビット単位
で書き込む。(B) In response to a write request to the local memory 15 from another PPRAM node via the board bus 16, data sent from the outside is written to the local memory 15 in 64-bit units via the LM bus 31.

【０１２１】（Ｃ）ボードバス１６を経由した他のＰＰ
ＲＡＭノードからの内部メモリ２３の読み出し要求に従
って、ＰＥバス３０を経由して、内部メモリ２３から３
２ビット単位でデータを読み出し、ボードバス１６に出
力する。(C) In response to a read request for the internal memory 23 from another PPRAM node via the board bus 16, data is read from the internal memory 23 in 32-bit units via the PE bus 30 and output to the board bus 16.

【０１２２】（Ｄ）ボードバス１６を経由した他のＰＰ
ＲＡＭノードからのローカルメモリ１５の読み出し要求
に従って、ＬＭバス３１を経由して、ローカルメモリ１
５から６４ビット単位でデータを読み出し、ボードバス
１６に出力する。(D) In response to a read request for the local memory 15 from another PPRAM node via the board bus 16, data is read from the local memory 15 in 64-bit units via the LM bus 31 and output to the board bus 16.

【０１２３】（Ｅ）ＣＰＵ２１から他のＰＰＲＡＭノー
ドへの書き込み要求に従って、ボードバス１６に要求を
出力するとともに、内部メモリ２３に格納されているデ
ータを、ＰＥバス３０を経由して読み出し、ボードバス
１６に出力する。(E) In response to a write request from the CPU 21 to another PPRAM node, the request is output to the board bus 16, and data stored in the internal memory 23 is read out via the PE bus 30 and output to the board bus 16.

【０１２４】（Ｆ）ＣＰＵ２１から他のＰＰＲＡＭノー
ドへのデータ読み出し要求に従って、ボードバス１６に
要求を出力するとともに、ボードバス１６から入力され
てくるデータを、ＰＥバス３０を経由して、内部メモリ
２３に書き込む。(F) In response to a data read request from the CPU 21 to another PPRAM node, the request is output to the board bus 16, and the data that arrives from the board bus 16 is written to the internal memory 23 via the PE bus 30.

【０１２５】これらの機能を実現するために、ボードバ
スインタフェース２８は、内部メモリ２３やローカルメ
モリ１５にアクセスする機能とともに、上記（Ａ）〜
（Ｆ）の動作に対応して、図５に示したトランザクショ
ンを構成する、ＰＰＲＡＭ−Ｌｉｎｋのサブアクション
を生成する機能およびＣＰＵ２１と協調しながら動作す
る機能を備える。To realize these functions, the board bus interface 28 has, in addition to a function of accessing the internal memory 23 and the local memory 15, a function of generating the PPRAM-Link subactions that make up the transactions shown in FIG. 5 in correspondence with operations (A) to (F) above, and a function of operating in cooperation with the CPU 21.

【０１２６】次に、ボードバスインタフェース２８の構
成の概略について説明する。このボードバスインタフェ
ース２８のユニットの構成は、既に、前述の橋本らの文
献に示されている。そこで、ここでは、この構成をもと
にして、この実施の形態に基づいて、若干変更を加えた
ものをボードバスインタフェース２８の例として挙げ
る。Next, an outline of the configuration of the board bus interface 28 will be described. The unit configuration of this board bus interface 28 has already been shown in the above-mentioned document by Hashimoto et al. Here, therefore, a version of that configuration slightly modified for this embodiment is given as an example of the board bus interface 28.

【０１２７】図６は、このように橋本らの文献の構成を
もとにして変更を加えたボードバスインタフェース２８
の構成例を示したブロック図である。図６で、破線はブ
ロック間の主な制御信号を表す。FIG. 6 is a block diagram showing a configuration example of the board bus interface 28 as modified in this way from the configuration in the document by Hashimoto et al. In FIG. 6, broken lines represent the main control signals between blocks.

【０１２８】ボードバスインタフェース２８において
は、入力端子ＩｎｐｕｔＬｉｎｋから入力されたパケッ
トは、アドレスデコーダ４０によって、このノード宛の
パケットかどうかを判定する。この判定の結果、自分の
ノード宛のパケットである場合は、パケットは入力キュ
ー４３に入れられ、また、自分のノード宛のパケットで
ない場合は、出力端子ＯｕｔｐｕｔＬｉｎｋを通して隣
のノードに送るため、バイパスＦＩＦＯ４１に入れられ
る。In the board bus interface 28, the address decoder 40 determines whether a packet input from the input terminal InputLink is addressed to this node. If the packet is addressed to this node, it is placed in the input queue 43; if not, it is placed in the bypass FIFO 41 so that it can be sent on to the adjacent node through the output terminal OutputLink.
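
The routing decision made at the input side can be sketched as follows. This is only an illustrative model: the `Packet` fields and the `InputStage` class are hypothetical names, and real PPRAM-Link packets carry more header fields than the single destination identifier assumed here.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Packet:
    dest_node: int        # node identifier of the target PPRAM node (assumed field)
    payload: bytes = b""

class InputStage:
    """Toy model of address decoder 40 feeding input queue 43 or bypass FIFO 41."""

    def __init__(self, my_node: int):
        self.my_node = my_node
        self.input_queue = deque()   # packets addressed to this node
        self.bypass_fifo = deque()   # packets to forward to the adjacent node

    def receive(self, pkt: Packet) -> str:
        """Decode the destination of one packet from InputLink and route it."""
        if pkt.dest_node == self.my_node:
            self.input_queue.append(pkt)
            return "input_queue"
        self.bypass_fifo.append(pkt)
        return "bypass_fifo"
```

A node with identifier 3 would keep a packet whose `dest_node` is 3 and pass every other packet along toward OutputLink.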

【０１２９】また、入力されたパケットが、図５を用い
て説明したパケットのうち要求送出パケットや応答送出
パケットであるときには、前述したサブアクションの動
作により、必ず要求受領パケット、応答受領パケットを
送り返すため、アドレスデコーダ４０から受領パケット
ジェネレータ４４を制御して、これらの受領パケットが
生成される。さらに、プロセッサエレメント内部で要求
送出パケットや、応答送出パケットが生成されるときに
は、出力キュー４５に、送出するパケットが蓄積され
る。When the input packet is a request transmission packet or a response transmission packet among the packets described with reference to FIG. 5, a request reception packet or a response reception packet must always be sent back under the subaction operation described above, so the address decoder 40 controls the reception packet generator 44 to generate these reception packets. Furthermore, when a request transmission packet or a response transmission packet is generated inside the processor element, the packet to be sent is accumulated in the output queue 45.

【０１３０】そして、フローコントローラ４２には、そ
れぞれバイパスＦＩＦＯ４１、受領パケットジェネレー
タ４４、出力キュー４５の出力が接続されている。フロ
ーコントローラ４２は、これらのうちから１個のパケッ
トを選び出して、出力端子ＯｕｔｐｕｔＬｉｎｋから出
力する。フローコントローラ４２の詳細な動作について
は、前述の文献に記されているので省略する。The outputs of the bypass FIFO 41, the reception packet generator 44, and the output queue 45 are connected to the flow controller 42, respectively. The flow controller 42 selects one packet from these, and outputs it from the output terminal OutputLink. The detailed operation of the flow controller 42 is described in the above-mentioned document and will not be described.

【０１３１】また、割り込み制御ブロック４６は、ＣＰ
Ｕ２１によってパケット生成を行うときなどに、ＣＰＵ
２１に出力する割り込み信号Ｉｎｔｒを生成するための
ブロックである。The interrupt control block 46 is a block that generates the interrupt signal Intr output to the CPU 21, for example when the CPU 21 is to generate a packet.

【０１３２】また、メモリアクセスブロック４７は、外
部からの要求などに従って、ローカルメモリ１５や内部
メモリ２３に、書き込み／読み出しを行うためのブロッ
クである。The memory access block 47 is a block for writing / reading data to / from the local memory 15 and the internal memory 23 according to a request from the outside.

【０１３３】さらに、制御状態レジスタ４８は、制御レ
ジスタ４８ａと状態レジスタ４８ｂとの２個のレジスタ
からなる。このうち、制御レジスタ４８ａは、その格納
データ内容によって、ボードバスインタフェース２８の
内部ブロックを制御する。この制御レジスタ４８ａは、
ＰＥバス３０を通して、ＣＰＵ２１から書き込むことが
できるため、ボードバスインタフェース２８は、この制
御レジスタ４８ａを通してＣＰＵの制御を受ける。ま
た、状態レジスタ４８ｂは、ボードバスインタフェース
２８の内部状態を、ＣＰＵ２１に伝えるためのレジスタ
であり、ＰＥバス３０を通してＣＰＵ２１に読み出すこ
とができる。Further, the control status register 48 consists of two registers, a control register 48a and a status register 48b. The control register 48a controls the internal blocks of the board bus interface 28 according to the data it holds. Because the CPU 21 can write to the control register 48a through the PE bus 30, the board bus interface 28 is controlled by the CPU 21 through this control register 48a. The status register 48b conveys the internal state of the board bus interface 28 to the CPU 21 and can be read by the CPU 21 through the PE bus 30.

【０１３４】この構成を使って、前記（Ａ）〜（Ｆ）の
トランザクションを実行する時は以下のようにする。When the transactions (A) to (F) are executed using this configuration, the following is performed.

【０１３５】まず、前記（Ａ）、（Ｂ）のように、外部
からの書き込み要求に従ったトランザクションの動作に
ついて述べる。この場合は、まず、入力キュー４３に要
求送出パケットのヘッダとともに書き込むデータが入力
されるので、前述のように、受領パケットジェネレータ
４４によって受領パケットを生成するとともに、入力キ
ュー４３からメモリアクセスブロック４７を起動して、
ターゲットとなるメモリへデータを書き込む。データ書
き込みの後、割り込みブロック４６を起動して、ＣＰＵ
２１に割り込みを発生させる。First, the operation of a transaction that follows an external write request, as in (A) and (B) above, will be described. In this case, the data to be written is first input to the input queue 43 together with the header of the request transmission packet. As described above, the reception packet generator 44 generates the reception packet, and the input queue 43 activates the memory access block 47 to write the data to the target memory. After the data has been written, the interrupt block 46 is activated to generate an interrupt to the CPU 21.

【０１３６】ＣＰＵ２１は割り込みを受け付けると、割
り込みハンドラに飛んで、内部メモリ２３に応答送出パ
ケットのヘッダ情報を生成する。このように、割り込み
が来たときにＣＰＵが生成するパケットの種類は、いく
つかあるため、ボードバスインタフェース２８は、状態
レジスタ４８ｂには、外部からの書き込み要求が来たこ
とを示す情報を保持しておき、ＣＰＵ２１は、ＰＥバス
３０を通して状態レジスタ４８ｂを読み出すことによっ
て生成するパケットのヘッダの種類を判別する。When the CPU 21 receives the interrupt, it jumps to the interrupt handler and generates the header information of the response transmission packet in the internal memory 23. As described above, since there are several types of packets generated by the CPU when an interrupt is received, the board bus interface 28 holds information indicating that an external write request has been received in the status register 48b. The CPU 21 determines the type of the header of the packet to be generated by reading the status register 48b through the PE bus 30.

【０１３７】その後、ＣＰＵ２１は、ＰＥバス３０を通
して制御レジスタ４８ａに書き込みを行い、応答パケッ
トの出力をボードバスインタフェース２８に指示する。
すると、ボードバスインタフェース２８はメモリアクセ
スブロック４７を通して応答送出パケットのヘッダを読
み出し、出力キュー４５に格納することによって、応答
送出パケットをボードバス１６に出力する。Thereafter, the CPU 21 writes the data in the control register 48a through the PE bus 30, and instructs the board bus interface 28 to output a response packet.
Then, the board bus interface 28 reads the header of the response transmission packet through the memory access block 47 and stores it in the output queue 45, thereby outputting the response transmission packet to the board bus 16.
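
The sequence for an external write request described above (reception packet, memory write, interrupt, status read, header generation, control register write, response output) can be traced with a small Python sketch. All names are hypothetical, the registers are modeled as plain attributes, and the interrupt is modeled as a direct method call; it is meant only to make the ordering of the steps concrete.

```python
from collections import deque

class WriteRequestFlow:
    """Toy model of serving an external write request (functions (A)/(B))."""

    def __init__(self):
        self.memory = {}             # target memory (internal memory 23 or local memory 15)
        self.status = None           # stands in for status register 48b
        self.header_area = None      # response header the CPU builds in internal memory 23
        self.output_queue = deque()  # stands in for output queue 45
        self.sent = []               # packets placed on OutputLink, in order

    def on_request(self, src_node, addr, data):
        # Interface side: send the reception packet, write memory, interrupt the CPU.
        self.sent.append(("request_reception", src_node))
        self.memory[addr] = data
        self.status = ("external_write", src_node)
        self.cpu_interrupt_handler()

    def cpu_interrupt_handler(self):
        # CPU side: read the status register to decide which header to build.
        kind, src_node = self.status
        if kind == "external_write":
            self.header_area = ("response_transmission", src_node)
        self.write_control_register()

    def write_control_register(self):
        # Interface side: fetch the header and queue it for output.
        # The packet stays queued until its reception packet arrives.
        self.output_queue.append(self.header_area)
        self.sent.append(self.output_queue[0])
```

Calling `on_request(src_node=2, addr=0x10, data=0xDEAD)` leaves the data in `memory` and records a request reception packet followed by a response transmission packet in `sent`.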

【０１３８】また、ターゲットとなるノードが出力する
応答受領パケットを、さまざまな理由で受け取れなかっ
たことも考慮して、入力キュー４３に、応答受領パケッ
トが返ってくるまで、出力キュー４５に、応答送出パケ
ットは保持されるものとする。一定時間経過しても、応
答受領パケットが返ってこない場合は、出力キュー４５
は、応答送出パケットを再度出力する。Also, to allow for the case where the response reception packet output by the target node cannot be received for some reason, the response transmission packet is held in the output queue 45 until the response reception packet returns to the input queue 43. If the response reception packet has not returned after a fixed time, the output queue 45 outputs the response transmission packet again.
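
The hold-and-resend behavior of the output queue can be sketched as below. The packet identifier, the abstract time units, and the `RetransmitQueue` class are assumptions made for illustration; the text does not state how packets are matched to their reception packets or how long the fixed time is.

```python
class HeldPacket:
    def __init__(self, packet, sent_at):
        self.packet = packet
        self.sent_at = sent_at

class RetransmitQueue:
    """Holds each sent packet until its reception packet arrives."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.pending = {}   # packet id -> HeldPacket still awaiting a reception packet
        self.link = []      # record of (time, packet id) placed on the link

    def send(self, pkt_id, packet, now):
        self.pending[pkt_id] = HeldPacket(packet, now)
        self.link.append((now, pkt_id))

    def on_reception_packet(self, pkt_id):
        # The reception packet came back, so the held copy can be released.
        self.pending.pop(pkt_id, None)

    def tick(self, now):
        # Resend every held packet whose reception packet is overdue.
        for pkt_id, held in self.pending.items():
            if now - held.sent_at >= self.timeout:
                held.sent_at = now
                self.link.append((now, pkt_id))
```

With a timeout of 10 units, a packet sent at time 0 is resent at time 10 unless `on_reception_packet` has been called for it first.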

【０１３９】次に、前記（Ｃ）、（Ｄ）のように、外部
からの読み出し要求に従ったトランザクションの動作に
ついて述べる。この場合は、入力キュー４３に、要求送
出パケットのヘッダだけが格納されるので、前述のよう
に、受領パケットジェネレータ４４によって受領パケッ
トを生成するとともに、入力キュー４３は、割り込みブ
ロック４６を起動して、ＣＰＵ２１に割り込みをかけ
る。Next, the operation of a transaction that follows an external read request, as in (C) and (D) above, will be described. In this case, only the header of the request transmission packet is stored in the input queue 43, so the reception packet generator 44 generates the reception packet as described above, and the input queue 43 activates the interrupt block 46 to interrupt the CPU 21.

【０１４０】すると、ＣＰＵ２１は、前の場合と同様
に、状態レジスタ４８ｂの情報をもとに、読み出し要求
に対する応答パケットのヘッダを、内部メモリ２３に生
成する。その後、ＣＰＵ２１は、制御レジスタ４８ａに
書き込むことによって、読み出し要求に対する応答パケ
ットの出力を指示する。Then, the CPU 21 generates a header of a response packet to the read request in the internal memory 23 based on the information of the status register 48b, as in the previous case. Thereafter, the CPU 21 instructs the output of a response packet to the read request by writing to the control register 48a.

【０１４１】すると、ボードバスインタフェース２８
は、メモリアクセスブロック４７を通して応答送出パケ
ットのヘッダを読み出し、さらに、メモリアクセスブロ
ック４７は、ローカルメモリ１５または内部メモリ２３
にある、要求されたアドレスからデータを読み出して、
出力キュー４５に格納する。その後、前記（Ａ）、
（Ｂ）の場合と同様に、出力キュー４５から応答パケッ
トが出力される。Then, the board bus interface 28 reads the header of the response transmission packet through the memory access block 47, and the memory access block 47 further reads the data from the requested address in the local memory 15 or the internal memory 23 and stores it in the output queue 45. After that, the response packet is output from the output queue 45 as in cases (A) and (B) above.

【０１４２】次に、前記（Ｅ）のように、ＣＰＵ２１か
ら他のノードに対して書き込み要求を出すときのトラン
ザクションの様子について述べる。この場合は、まず、
ＣＰＵ２１は、書き込み要求パケットのヘッダと書き込
みたいデータとを、内部メモリ２３の特定のアドレスに
格納しておいた上で、制御レジスタ４８ａに書き込むこ
とによって書き込み要求を出力するようにボードバスイ
ンタフェース２８に指示する。Next, the transaction in which the CPU 21 issues a write request to another node, as in (E) above, will be described. In this case, the CPU 21 first stores the header of the write request packet and the data to be written at a specific address in the internal memory 23, and then writes to the control register 48a to instruct the board bus interface 28 to output the write request.

【０１４３】すると、ボードバスインタフェース２８
は、メモリアクセスブロック４７を通して、要求送出パ
ケットのヘッダとデータとを、内部メモリ２３の前記の
特定のアドレスから読み出して、出力キュー４５に格納
する。その後、応答送出パケットの場合と同様に、出力
キュー４５から要求送出パケットが出力される。Then, the board bus interface 28 reads the header and data of the request transmission packet from that specific address in the internal memory 23 through the memory access block 47 and stores them in the output queue 45. After that, the request transmission packet is output from the output queue 45 in the same way as a response transmission packet.

【０１４４】このとき、要求送出パケットの送出後、一
定時間経過しても、要求送出パケットに対する受領パケ
ットが返ってこなければ、要求送出パケットを再送する
ために、前述の（Ａ）、（Ｂ）の場合で説明した、応答
送出パケットに対する受領パケットを待つ場合と同様
に、受領パケットが返ってくるまで、パケットを出力キ
ュー４５に保持しておき、受領パケットを待つ。At this time, so that the request transmission packet can be resent if its reception packet has not returned within a fixed time after it was sent, the packet is held in the output queue 45 until the reception packet returns, in the same way as when waiting for the reception packet for a response transmission packet described for cases (A) and (B) above.

【０１４５】さらに、ＣＰＵ２１が出した要求サブアク
ションに対する応答サブアクションが返って来ない場合
も考慮して、割り込みブロック４６は、要求サブアクシ
ョンを出力してから一定時間経過しても入力キュー４３
に応答送出パケットが返ってこない場合は、ＣＰＵ２１
に対して割り込みを出す。Further, to allow for the case where no response subaction returns for a request subaction issued by the CPU 21, the interrupt block 46 issues an interrupt to the CPU 21 if no response transmission packet has returned to the input queue 43 within a fixed time after the request subaction was output.

【０１４６】ＣＰＵ２１は、この割り込みを受け付ける
と、割り込みハンドラに飛んで、以前に送付した要求送
出パケットを再送する。このとき、ＣＰＵ２１は、状態
レジスタ４８ｂからの情報によって、再送の動作を行う
ことを判断するものである。When the CPU 21 receives this interrupt, it jumps to the interrupt handler and retransmits the previously transmitted request transmission packet. At this time, the CPU 21 determines, based on the information from the status register 48b, that the retransmission operation is to be performed.

【０１４７】また、出力した要求サブアクションに対す
る応答サブアクションが返ってきて、入力キュー４３に
応答送出パケットが格納されると、前述と同様に、受領
パケットジェネレータ４４によって受領パケットを生成
するとともに、割り込みブロック４６のタイマをクリア
して、ＣＰＵ２１に、再送のための割り込みを出さない
ようにする。When the response subaction for the output request subaction returns and the response transmission packet is stored in the input queue 43, the reception packet generator 44 generates a reception packet as described above, and the timer of the interrupt block 46 is cleared so that no interrupt for retransmission is issued to the CPU 21.

【０１４８】また、状態レジスタ４８ｂには、ＣＰＵ２
１が発生させたトランザクションが終了したことを示す
フラグを保持するようになっているので、応答送出パケ
ットを受け取った後に、このフラグをセットする。ＣＰ
Ｕ２１は、このフラグがセットされるのを待って、次の
操作に移ることができる。The status register 48b holds a flag indicating that a transaction initiated by the CPU 21 has finished, and this flag is set after the response transmission packet is received. The CPU 21 can wait for this flag to be set and then move on to the next operation.
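
The interplay of the response timer in the interrupt block 46 and the flags in the status register 48b can be modeled roughly as follows. The class name, the flag names `retry_needed` and `transaction_done`, and the abstract time units are hypothetical; the text says only that the status register tells the CPU to resend and holds a transaction-finished flag.

```python
class RequestTracker:
    """Toy model of the response timer and the status flags seen by the CPU."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None   # timer of interrupt block 46; None means cleared
        self.status = {"retry_needed": False, "transaction_done": False}

    def request_sent(self, now):
        # Arm the timer whenever a request subaction goes out (first send or resend).
        self.deadline = now + self.timeout
        self.status["retry_needed"] = False
        self.status["transaction_done"] = False

    def tick(self, now):
        """Return True when an interrupt asking the CPU to resend should be raised."""
        if self.deadline is not None and now >= self.deadline:
            self.status["retry_needed"] = True
            self.deadline = None
            return True
        return False

    def response_received(self):
        # Clear the timer so that no resend interrupt fires, and mark completion.
        self.deadline = None
        self.status["transaction_done"] = True
```

In this model the CPU would resend on a `True` result from `tick`, and otherwise poll `status["transaction_done"]` before moving on to its next operation.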

【０１４９】最後に、前記（Ｆ）のように、ＣＰＵ２１
から他のノードに対して読み出し要求を出すときのトラ
ンザクションの様子について述べる。この場合には、ま
ず、ＣＰＵ２１は、読み出し送出パケットのヘッダのみ
を内部メモリ２３に書き込んで、前述と同様に制御レジ
スタ４８を通して、ボードバスインタフェース２８にパ
ケットの出力を指示する。その後、メモリアクセスブロ
ック４７によって、内部メモリから読み出されたパケッ
トのヘッダが、出力キュー４５に格納されて、要求送出
パケットが出力される。Finally, the transaction in which the CPU 21 issues a read request to another node, as in (F) above, will be described. In this case, the CPU 21 first writes only the header of the read transmission packet to the internal memory 23 and, as before, instructs the board bus interface 28 through the control register 48 to output the packet. The memory access block 47 then reads the packet header from the internal memory, the header is stored in the output queue 45, and the request transmission packet is output.

【０１５０】その後、要求受領パケットと応答送出パケ
ットが返ってくると、メモリアクセスブロック４７は、
応答送出パケットに含まれるデータを、内部メモリ２３
の特定の位置に書き込む。その後、前記（Ｅ）の場合と
同様に、トランザクションが終了したフラグをセットす
る。ＣＰＵ２１は、このフラグがセットされたことを確
認してから、内部メモリ２３に書き込まれたデータにア
クセスできる。After that, when the request reception packet and the response transmission packet return, the memory access block 47 writes the data contained in the response transmission packet to a specific location in the internal memory 23. Then, as in case (E) above, the flag indicating that the transaction has finished is set. The CPU 21 can access the data written to the internal memory 23 after confirming that this flag is set.

【０１５１】さらに、要求送出パケットに対する要求受
領パケット、応答送出パケットが返ってこない場合の操
作は、前述の（Ｅ）の場合と同様である。Further, the operation in the case where the request reception packet and the response transmission packet for the request transmission packet are not returned is the same as the case of the above (E).

【０１５２】ここで、前記（Ｅ）または（Ｆ）の場合に
おいて、ボード１２上の、あるプロセッサエレメント１
４からは、他のプロセッサエレメント１４またはバスブ
リッジ回路１７はＰＰＲＡＭ−Ｌｉｎｋのボードバス１
６上にあるためにＰＰＲＡＭノードと認識できるので、
このような書き込み、読み出し要求を送付することが可
能である。Here, in cases (E) and (F) above, a given processor element 14 on the board 12 can recognize the other processor elements 14 and the bus bridge circuit 17 as PPRAM nodes because they are on the PPRAM-Link board bus 16, so it can send them such write and read requests.

【０１５３】しかし、ホストコンピュータ１１上のメモ
リは、ＰＰＲＡＭ−Ｌｉｎｋの外側にあるために、ボー
ド１２上のプロセッサエレメント１４からは認識できな
い。例えば、あるプロセッサエレメント１４が、ホスト
コンピュータ１１上のメモリの特定のアドレスに書き込
みを行おうとしても、ホストコンピュータ１１には、Ｐ
ＰＲＡＭ−Ｌｉｎｋのノード識別子やオフセットアドレ
スが割り当てられていないので、書き込みができない。However, the memory on the host computer 11 is outside PPRAM-Link and therefore cannot be recognized from the processor elements 14 on the board 12. For example, even if a processor element 14 tries to write to a specific address in the memory on the host computer 11, the write cannot be performed because no PPRAM-Link node identifier or offset address is assigned to the host computer 11.

【０１５４】そこで、この実施の形態では、ＰＰＲＡＭ
−Ｌｉｎｋの書き込み要求を利用して、この問題を解決
する。In this embodiment, therefore, this problem is solved by using the PPRAM-Link write request.

【０１５５】あるプロセッサエレメント１４が、ホスト
コンピュータ１１上のメモリに、アクセスしたいときに
は、バスブリッジ回路１７に対してＰＰＲＡＭ−Ｌｉｎ
ｋの書き込み要求を送信する。このとき、要求送信パケ
ットに含める書き込みデータ中に、ホストコンピュータ
１１のメモリアドレスやトランザクションの種類、書き
込みたいデータなどを入れておく。When a processor element 14 wants to access the memory on the host computer 11, it sends a PPRAM-Link write request to the bus bridge circuit 17. In the write data included in the request transmission packet, it places the memory address on the host computer 11, the type of transaction, the data to be written, and so on.

【０１５６】バスブリッジ回路１７が、このパケットを
受け取ると、ホストコンピュータ１１に対して、送付さ
れた内容に従ってアクセスを行う。このアクセスの詳し
い様子については後述する。When the bus bridge circuit 17 receives this packet, it accesses the host computer 11 in accordance with the sent contents. Details of this access will be described later.
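
The idea of carrying a host memory access inside the data of an ordinary PPRAM-Link write request can be sketched as follows. The payload layout (a 1-byte transaction type followed by an 8-byte host address and then the data), the constants, and both function names are assumptions for illustration only; the text does not define the encoding.

```python
import struct

HOST_WRITE = 1   # hypothetical transaction type codes
HOST_READ = 2
_HEADER = struct.Struct(">BQ")   # assumed layout: 1-byte type, 8-byte host address

def encode_host_access(kind: int, host_addr: int, data: bytes = b"") -> bytes:
    """Build the write data that a processor element would send to the bus bridge."""
    return _HEADER.pack(kind, host_addr) + data

def bridge_handle(payload: bytes, host_memory: dict) -> None:
    """Bus bridge side: decode the payload and perform the access on host memory."""
    kind, host_addr = _HEADER.unpack_from(payload)
    data = payload[_HEADER.size:]
    if kind == HOST_WRITE:
        host_memory[host_addr] = data
```

Here `host_memory` is a plain dictionary standing in for the memory of the host computer 11 reached over the system bus 13.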

【０１５７】次に、ボードバスインタフェース２８から
ＣＰＵ２１への起動方法について説明する。この起動
は、ＰＥバス３０を通してＣＰＵ２１の内部レジスタ
に、書き込みを行うことによって行われるものとする。Next, the method by which the board bus interface 28 starts the CPU 21 will be described. The CPU 21 is assumed to be started by a write to an internal register of the CPU 21 through the PE bus 30.

【０１５８】ＣＰＵ２１の起動は、リセット後に、特定
の時間だけ待ってからボードバスインタフェース２８か
ら自動的に書き込みを行うようにしても良いし、ホスト
コンピュータ１１からシステムバス１３、ボードバス１
６を経由してボードバスインタフェース２８を介して行
っても良い。後者の起動方法の場合は、上述したよう
に、ＣＰＵ２１が応答送出パケットを生成しなければな
らないので、ＣＰＵ２１の起動後、完全にＣＰＵ２１が
立ち上がってから、割り込みを入れるようにする必要が
ある。あるいは、ＰＰＲＡＭ−Ｌｉｎｋには応答送出パ
ケットを返さないアクテイブメッセージというトランザ
クションが用意されているため、このアクテイブメッセ
ージを使用しても良い。The CPU 21 may be started by having the board bus interface 28 perform the write automatically after waiting a specific time following reset, or it may be started from the host computer 11 through the board bus interface 28 by way of the system bus 13 and the board bus 16. With the latter method the CPU 21 must generate a response transmission packet, as described above, so the interrupt must be raised only after the CPU 21 has been started and is fully up. Alternatively, PPRAM-Link provides a transaction called an active message that returns no response transmission packet, so this active message may be used instead.

【０１５９】［プロセッサエレメント１４としての動
作］さらに、分子軌道専用システムが前述の計算手順に
従って動作するときの、プロセッサエレメント１４の動
作について説明する。[Operation as Processor Element 14] The operation of the processor element 14 when the molecular orbital-only system operates according to the above-described calculation procedure will be described.

【０１６０】ここで、まず、システムバス１３から見た
場合、ホストコンピュータ１１の内部メモリと、各プロ
セッサエレメント１４の内部メモリ２３と、プロセッサ
エレメント１４に付属するローカルメモリ１５とは、シ
ステムバスであるＩＥＥＥ１３９４シリアルバスのアド
レス空間にマッピングされているため、それぞれ割り当
てられているアドレスを使って、アクセスできるように
されている。Here, first, when viewed from the system bus 13, the internal memory of the host computer 11, the internal memory 23 of each processor element 14, and the local memory 15 attached to the processor element 14 are the system bus. Since it is mapped in the address space of the IEEE 1394 serial bus, it can be accessed using the assigned address.

【０１６１】さらに、ボード１２上のボードバス１６か
ら見た場合には、前述のように、それぞれのプロセッサ
エレメント１４にはノード識別子が割り振られており、
また、プロセッサエレメント１４の内部メモリ２３およ
び付属するローカルメモリ１２には、オフセットアドレ
スが割り振られており、ボードバス１６上では、ＰＰＲ
ＡＭ−Ｌｉｎｋを通してノード識別子とオフセットアド
レスを使って、これら内部メモリ２３およびローカルメ
モリ１５がアクセスできるようにされている。Further, as seen from the board bus 16 on the board 12, a node identifier is assigned to each processor element 14 as described above, and offset addresses are assigned to the internal memory 23 of the processor element 14 and to its attached local memory 15. On the board bus 16, the internal memory 23 and the local memory 15 can therefore be accessed through PPRAM-Link using the node identifier and the offset address.

【０１６２】まず、システムの起動後、プロセッサエレ
メント１４には、図示を省略したリセット端子を通して
プロセッサエレメント１４の各内部ユニットに、リセッ
トがかけられる。First, after the system is started, each internal unit of the processor element 14 is reset through a reset terminal (not shown).

【０１６３】ボードバスインタフェース２８は、リセッ
ト直後から自動的に初期化を行い、後述するインタフェ
ース機能に従った動作が可能な状態になる。また、ＣＰ
Ｕ２１は、前述のようにリセットがかけられることによ
って、動作をしていない起動待ち状態（待機状態）にな
る。The board bus interface 28 initializes itself automatically immediately after reset and becomes able to operate according to the interface functions described later. The CPU 21, on being reset as described above, enters a start-waiting state (standby state) in which it does not operate.

【０１６４】前述したように、システムの起動後には、
ホストコンピュータ１１からシステムバス１３→バスブ
リッジ回路１７→ボードバス１６を経由して、書き込み
要求が転送され、分子情報、基底情報、その他のパラメ
ータ、およびプログラムがプロセッサエレメント１４に
転送される。As described above, after the system is started, write requests are transferred from the host computer 11 by way of the system bus 13, the bus bridge circuit 17, and the board bus 16, and the molecular information, basis information, other parameters, and the program are transferred to the processor elements 14.

【０１６５】このとき、上述したボードバスインタフェ
ース２８の（Ａ）、（Ｂ）の機能を使って、分子情報、
基底情報、その他のパラメータ情報は、ローカルメモリ
１５に書き込まれ、また、プログラムは、内部メモリ２
３に書き込まれる。また、その後、プロセッサエレメン
ト１４を起動するときには、ホストコンピュータ１１か
ら、前述のようにして、ＣＰＵ２１に対して起動がかけ
られる。このことにより、ＣＰＵ２１は起動待ち状態か
ら、動作状態に遷移する。At this time, using the functions (A) and (B) of the board bus interface 28 described above, molecular information and
The base information and other parameter information are written in the local memory 15, and the program is stored in the internal memory 2.
3 is written. Thereafter, when the processor element 14 is activated, the host computer 11 activates the CPU 21 as described above. As a result, the CPU 21 transitions from the start waiting state to the operating state.

【０１６６】ＣＰＵ２１が動作状態に遷移すると、ＣＰ
Ｕ２１は、内部メモリ２３の特定のアドレスから命令を
実行する。ここで、前述のプログラムは、ＣＰＵ２１が
動作を開始するアドレスからロードされるものとする。
したがって、ＣＰＵ２１は、ホストコンピュータ１１に
よって予め内部メモリ２３に書き込まれているプログラ
ムの実行を開始する。また、前述のように、リセット後
に、ＣＰＵ２１を起動待ち状態にすることによって、プ
ログラムが書き込まれる前の内部メモリ２３の内容を実
行して、ＣＰＵ２１が暴走してしまわないようにするこ
とができる。When the CPU 21 transitions to the operating state, it executes instructions starting from a specific address in the internal memory 23. The program mentioned above is assumed to be loaded starting at the address where the CPU 21 begins operation. The CPU 21 therefore starts executing the program that the host computer 11 has written to the internal memory 23 in advance. Also, as described above, placing the CPU 21 in the start-waiting state after reset prevents it from executing whatever the internal memory 23 contained before the program was written and running out of control.

【０１６７】次に、ＣＰＵ２１は、ホストコンピュータ
１１によって、そのプロセッサエレメント１４が担当す
る２電子積分情報、および密度行列要素が書き込まれる
まで、例えばＮＯＰ命令だけで構成されるループなどを
実行することによって待つ。その後、ホストコンピュー
タ１１は、システムバス１３、バスブリッジ回路１７、
ボードバス１６を通して、これらの情報の書き込み要求
を転送し、前述のボードバスインタフェース２８の
（Ｂ）の機能によってローカルメモリ１５に書き込む。Next, the CPU 21 waits, for example by executing a loop consisting only of NOP instructions, until the host computer 11 has written the two-electron integral information and the density matrix elements assigned to that processor element 14. The host computer 11 then transfers write requests for this information through the system bus 13, the bus bridge circuit 17, and the board bus 16, and the information is written to the local memory 15 by function (B) of the board bus interface 28 described above.

【０１６８】ＣＰＵ２１は、前記の必要な情報の書き込
みが終了したことを検知すると、ループ実行による待ち
状態を抜けて、２電子積分およびｇｒｓの計算を実行
し、ｇｒｓの計算結果をローカルメモリ１５に格納す
る。書き込みの検知方法（ホストコンピュータ１１とＣ
ＰＵ２１との同期）、およびこの計算のさらに詳しい内
容は後述する。When the CPU 21 detects that the writing of the necessary information has finished, it leaves the loop-based wait state, executes the two-electron integral and grs calculations, and stores the grs results in the local memory 15. The method of detecting the write (synchronization between the host computer 11 and the CPU 21) and further details of this calculation are described later.

【０１６９】その後、前述のようにプロセッサエレメン
ト１４は、計算終了情報を以下のようにしてホストコン
ピュータ１１に送付する。Thereafter, as described above, the processor element 14 sends the calculation end information to the host computer 11 as follows.

【０１７０】まず、計算終了情報のために、ホストコン
ピュータ１１は、その内部メモリなどに、各プロセッサ
エレメント１４に対応した特定の変数領域を設ける。こ
の変数を計算終了ビットと呼ぶ。それぞれのプロセッサ
エレメント１４に対応する計算終了ビットは、前述のよ
うに、ホストコンピュータ１１の内部メモリがシステム
バス１３のアドレス空間にマッピングされているため、
全てのプロセッサエレメント１４からアクセスすること
ができる。First, for the calculation end information, the host computer 11 provides a specific variable area for each processor element 14 in its internal memory or the like. This variable is called the calculation end bit. Because the internal memory of the host computer 11 is mapped into the address space of the system bus 13 as described above, the calculation end bit for each processor element 14 can be accessed from every processor element 14.

【０１７１】システムの起動時に、この計算終了ビット
を、例えば“０”にしておき、ホストコンピュータ１１
は、この計算終了ビットを監視して、“１”になってい
れば、対応するプロセッサエレメント１４での計算が終
了したと判断して、必要な操作を行う。When the system is started, this calculation end bit is set to, for example, “0”. The host computer 11 monitors the bit, and when it has become “1” the host computer 11 judges that the calculation in the corresponding processor element 14 has finished and performs the necessary operations.

【０１７２】そして、ＣＰＵ２１は、前述のホストコン
ピュータ１１上のメモリへの書き込み方法を使って、ボ
ードバス１６、バスブリッジ回路１７、システムバス１
３を経由してホストコンピュータ１１の内部メモリの計
算終了ビットへ、“１”を書き込むことによって、計算
終了情報を転送する。The CPU 21 then uses the method described above for writing to memory on the host computer 11 and transfers the calculation end information by writing “1” to the calculation end bit in the internal memory of the host computer 11 by way of the board bus 16, the bus bridge circuit 17, and the system bus 13.

【０１７３】このようにして計算終了情報が伝えられる
と、ホストコンピュータ１１は、その計算が終了したプ
ロセッサエレメントに接続されているローカルメモリ１
５に対する読み出し要求を送付することにより、ボード
バスインタフェース２８の機能（Ｄ）を使って、ここに
格納されているｇｒｓの値を読み出す。When the calculation end information has been conveyed in this way, the host computer 11 sends a read request for the local memory 15 connected to the processor element whose calculation has finished and reads out the grs values stored there using function (D) of the board bus interface 28.

【０１７４】その後、当該プロセッサエレメント１４の
計算終了ビットを、再度、“０”にクリアするととも
に、そのプロセッサエレメント１４で行われる次の計算
に対応する２電子積分情報、密度行列を、上で述べたよ
うな手順で送付する。以下同様にして、この手順が繰り
返される。Thereafter, the calculation end bit of the processor element 14 is cleared to “0” again, and the two-electron integration information and the density matrix corresponding to the next calculation performed by the processor element 14 are described above. And send it in the same way. Hereinafter, the same procedure is repeated.
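
The calculation end bit handshake between the host computer 11 and the processor elements 14 can be summarized in a short Python sketch. The data structures here are hypothetical: the host memory and each local memory 15 are modeled as ordinary Python objects, the bus transfers are modeled as direct assignments, and job descriptions are opaque values.

```python
class HostMemory:
    def __init__(self, n_pe):
        self.calc_done = [0] * n_pe   # one calculation end bit per processor element

def pe_finish(host, pe_id, local_memory, g_value):
    """Processor element side: store the result locally, then raise its end bit."""
    local_memory["grs"] = g_value     # result stays in the element's local memory 15
    host.calc_done[pe_id] = 1         # in hardware: board bus -> bus bridge -> system bus

def host_poll(host, local_memories, next_jobs):
    """Host side: collect finished results, clear the bits, hand out the next jobs."""
    results = {}
    for pe_id, bit in enumerate(host.calc_done):
        if bit == 1:
            results[pe_id] = local_memories[pe_id]["grs"]  # read request, function (D)
            host.calc_done[pe_id] = 0                      # clear for the next round
            if next_jobs:
                # Next two-electron integral information and density matrix elements.
                local_memories[pe_id]["job"] = next_jobs.pop(0)
    return results
```

Repeated calls to `host_poll` correspond to the host monitoring the bits; an element that has not finished is simply skipped until its bit becomes 1.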

【０１７５】次に、２電子積分およびｇｒｓの計算の内
容、およびホストコンピュータ１１と、プロセッサエレ
メント１４との同期の取り方について説明する。Next, the contents of the two-electron integration and the calculation of grs, and how to establish synchronization between the host computer 11 and the processor element 14 will be described.

【０１７６】［２電子積分、ｇｒｓの計算の動作概要］
まず、２電子積分およびｇｒｓの計算の概要について説
明する。[Outline of 2-electron integration and grs calculation]
First, an outline of two-electron integration and calculation of grs will be described.

【０１７７】前記（４）式で表される２電子積分は、前
述のように、小原の方法などを用いた計算によって、浮
動小数点積和演算を用いた計算手順に帰着される。そこ
で、システムの起動後にプロセッサエレメント１４に送
付されるプログラム中に、ＦＰＵ２２の浮動小数点演算
器による浮動小数点演算命令を使った２電子積分の計算
手順を、例えばサブルーチンの形で含めておき、計算を
行うときに、このサブルーチンを呼び出すようにする。As described above, the two-electron integral expressed by equation (4) is reduced, by a calculation using Obara's method or the like, to a calculation procedure using floating-point multiply-accumulate operations. Therefore, the program sent to the processor element 14 after system startup includes, for example in the form of a subroutine, the procedure for calculating the two-electron integral using floating-point instructions executed by the floating-point unit of the FPU 22, and this subroutine is called when the calculation is performed.

【０１７８】また、前述の小原の方法に関する文献に示
されているように、２電子積分の計算に必要なパラメー
タとしては、計算に関与する原子の座標や原子の種類や
使用する基底系などに応じて決まる軌道指数、さらには
軌道の種類に関する軌道量子ベクトル、２電子積分の重
ね合わせに用いる係数などがある。これらのパラメータ
は、ホストコンピュータから送付される２電子積分情報
に含めても良いし、あるいは、分子と使用する基底系が
決まれば、これらのパラメータも決まるため、システム
起動後に、全てのパラメータを各プロセッサエレメント
１４に付属するローカルメモリ１５に送付しておき、２
電子積分情報としては、これらのパラメータを指し示す
インデックスを送るようにしてもよい。As shown in the above-mentioned literature on Obara's method, the parameters needed to calculate a two-electron integral include the coordinates of the atoms involved in the calculation, the orbital exponents determined by the atom types, the basis set used, and the like, the orbital quantum vectors relating to the orbital types, and the coefficients used for the superposition of two-electron integrals. These parameters may be included in the two-electron integral information sent from the host computer. Alternatively, because these parameters are fixed once the molecule and the basis set are determined, all of them may be sent after system startup to the local memory 15 attached to each processor element 14, and indices pointing to these parameters may then be sent as the two-electron integral information.

【０１７９】また、ここで述べた軌道量子ベクトルは、
複数通りに分けられ、また、（４）式で表されるよう
に、４つの原子軌道のそれぞれに対応する。このため、
複数個の２電子積分が存在し、浮動小数点演算命令を使
った２電子積分の計算手順は、この種類だけ存在する。
このため、上記の方法では、サブルーチンが複数個必要
となる。したがって、これらのサブルーチンは、例えば
容量の限られた内部メモリ２３に格納するのではなく、
ローカルメモリ１５に格納するようにしても良い。The orbital quantum vectors described here fall into several types and, as equation (4) shows, one corresponds to each of the four atomic orbitals. Consequently there are multiple kinds of two-electron integrals, and there are as many calculation procedures using floating-point instructions as there are kinds. The above method therefore requires multiple subroutines. For this reason, these subroutines may be stored in the local memory 15 rather than, for example, in the internal memory 23, whose capacity is limited.

【０１８０】このとき、２電子積分の計算は、分岐など
の制御命令がない浮動小数点演算の連続で表される部分
を含むため、この部分をローカルメモリ１５に格納し、
さらに、ＦＰＵ２２の浮動小数点演算器に、ＣＰＵ２１
の制御を受けずに、ローカルメモリ１５から演算手順を
読み出して計算するモードを設け、このモードによって
計算するようにしても良い。In this case, because the calculation of a two-electron integral includes a portion expressed as a sequence of floating-point operations with no control instructions such as branches, this portion may be stored in the local memory 15. The floating-point unit of the FPU 22 may further be given a mode in which it reads the operation sequence from the local memory 15 and computes without control by the CPU 21, and the calculation may be performed in this mode.

【０１８１】このようにして２電子積分が計算されたの
ち、プロセッサエレメント１４では、前記（２）式に従
ってｇｒｓが計算される。この計算には、ホストコンピ
ュータ１から２電子積分情報とともに送付されてきた密
度行列情報が使用される。このときは、２電子積分の種
類によらず、一定の手順に従って計算が行われるため、
内部メモリ２３に格納されるプログラム中に、この手順
を含めておき、２電子積分が求められた後、この手順に
従った浮動小数点演算を実行して、ｇｒｓを求めれば良
い。After the two-electron integral has been calculated in this way, the processor element 14 calculates grs according to equation (2). This calculation uses the density matrix information sent from the host computer 11 together with the two-electron integral information. Because this calculation follows a fixed procedure regardless of the kind of two-electron integral, the procedure may be included in the program stored in the internal memory 23, and grs may be obtained by executing the floating-point operations of this procedure once the two-electron integral has been obtained.

【０１８２】また、ホストコンピュータ１１から送付さ
れてくる密度行列情報の中には、ボード１２上のプロセ
ッサエレメント１４間で共通に使用されるものもある。
このため、これらの密度行列は、ホストコンピュータ１
１から送付するのではなく、高速なボードバス１３だけ
を通して、プロセッサエレメント１４間で送付するよう
にしても良い。Some of the density matrix information sent from the host computer 11 is used in common by the processor elements 14 on a board 12. Such density matrices may therefore be passed between the processor elements 14 over the high-speed board bus 16 alone, instead of being sent from the host computer 11.

【０１８３】［ホストとプロセッサエレメントの同期］
次に、ホストコンピュータ１１による２電子積分情報や
密度行列情報の転送と、プロセッサエレメント１４にお
ける上記の計算の同期の取り方について説明する。[Synchronization of host and processor element]
Next, a description is given of how the transfer of two-electron integral information and density matrix information by the host computer 11 is synchronized with the above calculation in the processor element 14.

【０１８４】これらの間に同期がとれていないと、計算
に使用される２電子積分情報や密度行列が、ローカルメ
モリ１５に書き込まれる前に、ＣＰＵ２１が計算を開始
してしまう。あるいはＣＰＵ２１の計算が終了していな
いにもかかわらず、ホストコンピュータ１１が２電子積
分情報や密度行列を上書きしてしまう。これらの結果、
求められたｇｒｓの値が誤ったものになってしまう。If the two are not synchronized, the CPU 21 may start calculating before the two-electron integral information and density matrix to be used have been written to the local memory 15, or the host computer 11 may overwrite the two-electron integral information and density matrix even though the CPU 21 has not finished calculating. Either way, the resulting value of grs is wrong.

【０１８５】このような事態を避けるために、例えば、
特定の内部メモリ２３やローカルメモリ１５のアドレス
に、タグビットと呼ぶ変数領域を設ける。このタグビッ
トには、ホストコンピュータ１１と、ＣＰＵ２１との両
方から書き込むことができる。また、タグビットが
“０”の間は、ＣＰＵ２１は２電子積分およびｇｒｓの
計算を開始できず、タグビットが“１”になるまで、前
述のループを抜けられない。また、タグビットが“１”
の間は、ホストコンピュータ１１は、２電子積分情報、
および密度行列要素の書き込みができず、タグビットが
“０”になるまで待つ。To avoid this, a variable area called a tag bit is provided, for example, at a specific address in the internal memory 23 or the local memory 15. Both the host computer 11 and the CPU 21 can write to this tag bit. While the tag bit is “0”, the CPU 21 cannot start the two-electron integral and grs calculations and cannot leave the above-mentioned loop until the tag bit becomes “1”. While the tag bit is “1”, the host computer 11 cannot write two-electron integral information or density matrix elements and waits until the tag bit becomes “0”.

【０１８６】このような前提の下で、システム起動時
に、タグビットを“０”としておく。すると、最初に、
ホストコンピュータ１１から２電子積分情報、および密
度行列要素の書き込みが行われるまで、上で述べたよう
に、ＣＰＵ２１はループを繰り返す。これらの情報を書
き込み終ると、ホストコンピュータ１１は、タグビット
を“１”にする。すると、タグビットが“１”になった
ため、ＣＰＵ２１は、ループを抜けて２電子積分および
ｇｒｓの計算を開始する。さらに、ＣＰＵ２１が、ｇｒ
ｓの計算を終了すると、ＣＰＵ２１はタグビットを
“０”にする。On this basis, the tag bit is set to “0” at system startup. The CPU 21 then keeps looping, as described above, until the host computer 11 first writes the two-electron integral information and density matrix elements. When it has finished writing this information, the host computer 11 sets the tag bit to “1”. Because the tag bit is now “1”, the CPU 21 leaves the loop and starts calculating the two-electron integral and grs. When it finishes calculating grs, the CPU 21 sets the tag bit to “0”.

【０１８７】この間、ホストコンピュータ１１が、次の
２電子積分情報または密度行列情報を書き込もうとして
も、上で述べたように、ホストコンピュータ１１は待ち
状態に入っているため、ＣＰＵ２１が実行中の情報を上
書きすることはない。その後、ＣＰＵ２１によってタグ
ビットが“０”になると、ホストコンピュータ１１は、
次の２電子積分情報、密度行列情報を書き込む。During this time, even if the host computer 11 attempts to write the next two-electron integral information or density matrix information, it is in the waiting state as described above, so it does not overwrite the information that the CPU 21 is working on. When the CPU 21 subsequently sets the tag bit to “0”, the host computer 11 writes the next two-electron integral information and density matrix information.

【０１８８】このようにすることで、タグビットを使用
して、ホストコンピュータ１１とプロセッサエレメント
１４との同期をとることができる。このため、ＣＰＵ２
１が、書き込み前の２電子積分情報や密度行列、あるい
は上書きされてしまった後の２電子積分情報や密度行列
を使って、２電子積分やｇｒｓの計算を行うことを避け
ることができる。In this way, the tag bit can be used to synchronize the host computer 11 and the processor element 14. This prevents the CPU 21 from calculating two-electron integrals and grs using two-electron integral information or density matrices that have not yet been written, or that have already been overwritten.
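
The tag-bit handshake of paragraphs 0185 to 0188 can be sketched as follows. This is a minimal single-threaded Python model written for illustration only: the class `SharedRegion` and the functions `host_try_write` and `pe_try_compute` are hypothetical names that do not appear in the specification, and a real implementation would poll a memory location across the buses rather than call functions.

```python
class SharedRegion:
    """Hypothetical stand-in for the area shared by host 11 and CPU 21."""

    def __init__(self):
        self.tag = 0        # tag bit, "0" at system startup
        self.job = None     # two-electron integral info and density matrix
        self.result = None  # stands in for g_rs


def host_try_write(region, job):
    """Host side: may write only while the tag bit is 0, then sets it to 1."""
    if region.tag == 1:
        return False        # the CPU still owns the data, so the host waits
    region.job = job
    region.tag = 1
    return True


def pe_try_compute(region, compute):
    """Processor side: may compute only while the tag bit is 1, then clears it."""
    if region.tag == 0:
        return False        # nothing has been written yet, keep looping
    region.result = compute(region.job)
    region.tag = 0
    return True
```

In this model a second `host_try_write` call made before `pe_try_compute` has run returns `False`, which corresponds to the host waiting rather than overwriting data that is still in use.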

【０１８９】また、効率を上げるために、前述のホスト
コンピュータ１１のメモリに保持される、計算終了ビッ
ト、２電子積分や密度行列要素を書き込むためのローカ
ルメモリ１５上の領域、ｇｒｓを格納するためのローカ
ルメモリ１５上の領域、タグビットのそれぞれを、それ
ぞれ２個ずつ持ち、上述した書き込みや検査、読み出し
を交互に行うようにしても良い。このようにすれば、ホ
ストコンピュータ１１は、例えばプロセッサエレメント
１４が計算している途中で、２電子積分情報や密度行列
情報を書き込むことができるようになるため、計算の効
率が向上する。To improve efficiency, two instances each may be provided of the calculation end bit held in the memory of the host computer 11, the area in the local memory 15 for writing two-electron integrals and density matrix elements, the area in the local memory 15 for storing grs, and the tag bit, and the writing, checking, and reading described above may be performed on them alternately. The host computer 11 can then write two-electron integral information and density matrix information while, for example, the processor element 14 is still calculating, which improves calculation efficiency.
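
The two-bank arrangement described above amounts to double buffering. The sketch below is one possible model under stated assumptions: the class name `DoubleBuffer`, its methods, and the strict alternation between bank 0 and bank 1 are illustrative choices, and the calculation end bits that tell the host when to read grs are left out for brevity.

```python
class DoubleBuffer:
    """Hypothetical model with two tag bits, two input areas, two g_rs areas."""

    def __init__(self):
        self.tags = [0, 0]
        self.jobs = [None, None]
        self.results = [None, None]
        self.host_bank = 0   # bank the host fills next
        self.pe_bank = 0     # bank the processor element consumes next

    def host_write(self, job):
        """Write a job into the host's current bank if that bank is free."""
        bank = self.host_bank
        if self.tags[bank] == 1:
            return False     # both banks are in use, so the host waits
        self.jobs[bank] = job
        self.tags[bank] = 1
        self.host_bank ^= 1  # alternate between bank 0 and bank 1
        return True

    def pe_compute(self, compute):
        """Process the next ready bank; return its index, or None if idle."""
        bank = self.pe_bank
        if self.tags[bank] == 0:
            return None
        self.results[bank] = compute(self.jobs[bank])
        self.tags[bank] = 0
        self.pe_bank ^= 1
        return bank
```

With two banks the host can queue a second job while the first is still waiting to be processed; only a third write has to wait.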

【０１９０】［バスブリッジ回路としての構成、機能、
動作］［ブリッジ回路としての構成］次に、共用チップ２０
を、バスブリッジ回路１７の一部として用いるときの構
成について述べる。図３は、このバスブリッジ回路１７
の構成を示したブロック図であり、共用チップ２０と、
若干の付加回路とを用いて構成される。このとき、図３
に示すように、共用チップ２０の入力端子３５に入力さ
れるモード選択信号ＳＥＬは、“１”となっており、共
用チップ２０は、ブリッジモードで動作する。[Configuration, Function, and Operation as Bus Bridge Circuit] [Configuration as Bridge Circuit] Next, the configuration used when the shared chip 20 serves as part of the bus bridge circuit 17 is described. FIG. 3 is a block diagram showing the configuration of this bus bridge circuit 17, which is built from the shared chip 20 and a few additional circuits. As shown in FIG. 3, the mode selection signal SEL applied to the input terminal 35 of the shared chip 20 is “1” in this case, so the shared chip 20 operates in bridge mode.

【０１９１】共用チップ２０に付加される回路として
は、ブリッジモードの共用チップ２０のＣＰＵ２１で使
用されるプログラムを格納するＲＯＭ３６と、ＩＥＥＥ
１３９４シリアルバスのリンク層コントローラ（以下、
ＬＬＣと略称する）３７と、ＩＥＥＥ１３９４シリアル
バスの物理層コントローラ（以下、ＰＨＹと略称する）
３８とが設けられる。そして、ＬＬＣ３７から、共用チ
ップ２０へ外部割り込みライン３９が設けられる。外部
割り込みライン３９からの割り込みは、図示を省略した
信号線によってＣＰＵ２１に入力されるので、前述のよ
うに、外部割り込みによって、割り込みハンドラが起動
する。The circuits added to the shared chip 20 are a ROM 36 that stores the program used by the CPU 21 of the shared chip 20 in bridge mode, a link layer controller 37 for the IEEE 1394 serial bus (hereinafter, LLC), and a physical layer controller 38 for the IEEE 1394 serial bus (hereinafter, PHY). An external interrupt line 39 runs from the LLC 37 to the shared chip 20. An interrupt on the external interrupt line 39 is input to the CPU 21 through a signal line not shown in the figure, so the external interrupt starts the interrupt handler, as described earlier.

【０１９２】ここで、ＬＬＣ３７とＰＨＹ３８について
簡単に説明する。ＬＬＣ３７およびＰＨＹ３８は、ＩＥ
ＥＥ１３９４−１９９５標準にのっとったリンク層、物
理層の仕様に基づいた機能を有する回路である。Here, the LLC 37 and the PHY 38 are briefly described. The LLC 37 and the PHY 38 are circuits whose functions are based on the link layer and physical layer specifications of the IEEE 1394-1995 standard.

【０１９３】このうち、リンク層の機能を持つＬＬＣ３
７の、共用チップ２０との間のインタフェースの仕様と
しては、例えばＰＣＩバスなどの汎用のバスが考えられ
るが、ここでは、ＳＲＡＭとのインタフェースのよう
に、チップセレクト信号と書き込みパルス信号、あるい
はリクエスト信号とアクノリッジ信号を用いてアクセス
を制御する汎用インタフェースとする。Of these, the interface between the LLC 37, which provides the link layer function, and the shared chip 20 could be a general-purpose bus such as the PCI bus. Here, however, it is assumed to be a general-purpose interface that, like an SRAM interface, controls access using a chip select signal and a write pulse signal, or a request signal and an acknowledge signal.

【０１９４】このような汎用インタフェース機能を持つ
ＬＬＣ３７とＰＨＹ３８は、現在のところ、数社からＬ
ＳＩ化されており、ＬＬＣの例としてはテキサスインス
ツルメンツ社製のＴＳＢ１２ＬＶ３１や新日本製鉄社製
のＮＮ８Ａ２１３などが、またＰＨＹの例としてはテキ
サスインスツルメンツ社製のＴＳＢ２１ＬＶ０３や新日
本製鉄社製のＮＮ８Ａ２０２などがある。At present, LLCs and PHYs with such a general-purpose interface function have been implemented as LSIs by several companies. Examples of LLCs include the TSB12LV31 from Texas Instruments and the NN8A213 from Nippon Steel Corporation; examples of PHYs include the TSB21LV03 from Texas Instruments and the NN8A202 from Nippon Steel Corporation.

【０１９５】共用チップ２０とＬＬＣ３７とのインタフ
ェースについてさらに述べる。上に挙げたＬＬＣは、内
部にレジスタを持ち、共用チップ２０から、これらのレ
ジスタに対する書き込み／読み出しアクセスによってイ
ンタフェースされる。したがって、共用チップ２０から
は、上で述べた制御信号の他に、数ビットのアドレス信
号と、例えば１６ビットのデータ信号を用いて、ＳＲＡ
Ｍと同様の方法によってインタフェースできる。The interface between the shared chip 20 and the LLC 37 is described further. The LLCs listed above contain internal registers and are interfaced from the shared chip 20 through write and read accesses to these registers. The shared chip 20 can therefore interface with the LLC in the same way as with an SRAM, using a few address bits and, for example, a 16-bit data signal in addition to the control signals mentioned above.

【０１９６】この実施の形態では、ＬＬＣ３７の制御信
号は、ＰＥＣ２５が生成する。また、ＬＬＣ３７のアド
レス信号としては、演算処理モードではローカルメモリ
１５のアドレスに使用されていたＬＭバス３１から出力
されるアドレス信号の下位の数ビットが入力される。ま
た、ここでは、データ信号を１６ビットとして、このデ
ータ信号には、例えば演算処理モードではローカルメモ
リ１５のデータ入出力に使用されていたＬＭバス３１と
接続される６４ビットのデータ信号の上位１６ビットを
用いる。In this embodiment, the control signals for the LLC 37 are generated by the PEC 25. The address input of the LLC 37 receives the several low-order bits of the address signal output from the LM bus 31, which in arithmetic processing mode was used to address the local memory 15. The data signal is 16 bits wide here and uses, for example, the upper 16 bits of the 64-bit data signal connected to the LM bus 31, which in arithmetic processing mode was used for data input and output of the local memory 15.

【０１９７】さらに、ＬＬＣ３７は、特定のインタフェ
ースによってＰＨＹ３８に接続される。ＰＨＹ３８で
は、伝送線の制御などの物理層の処理が行われ、図３に
示したポート１、ポート２を通して通信が行われる。ポ
ート１およびポート２は、前述の図２に示したように、
システムバス１３に対して接続され、デイジーチェーン
の形態で、ボード１２間またはホストコンピュータ１１
との間で、互いに接続される。Further, the LLC 37 is connected to the PHY 38 through a dedicated interface. The PHY 38 performs physical layer processing such as transmission line control, and communication takes place through port 1 and port 2 shown in FIG. 3. As shown in FIG. 2, port 1 and port 2 are connected to the system bus 13 and link the boards 12 to one another and to the host computer 11 in a daisy chain.

【０１９８】また、共用チップ２０は、ＬＬＣ３７の他
に、例えばデータバス幅が１６ビットのＲＯＭ３６と接
続される。この実施の形態では、ＲＯＭ３６のチップセ
レクト信号などの制御信号は、演算処理モードのときと
同様にＬＭＣ２４が生成し、また、アドレス信号は、Ｌ
Ｍバス３１から出力されるアドレス信号とされる。さら
に、データ信号を１６ビットとして、このデータ信号用
には、例えばＬＭバス３１と入出力される６４ビットの
データ信号の下位１６ビットが接続される。In addition to the LLC 37, the shared chip 20 is connected to the ROM 36, which has, for example, a 16-bit data bus. In this embodiment, control signals for the ROM 36, such as the chip select signal, are generated by the LMC 24 as in arithmetic processing mode, and the address signal is the one output from the LM bus 31. The data signal is 16 bits wide and is connected, for example, to the lower 16 bits of the 64-bit data signal of the LM bus 31.

【０１９９】［バスブリッジ回路１７の機能、動作］次
に、バスブリッジ回路１７の機能と動作について説明す
る。[Function and Operation of Bus Bridge Circuit 17] Next, the function and operation of the bus bridge circuit 17 will be described.

【０２００】［ブリッジ回路の機能］まず、共用チップ
２０がブリッジモードのときの機能について、演算処理
モードとの違いを中心にして説明する。このブリッジモ
ードでは、システム起動時のリセット後の動作およびＣ
ＰＵ２１の動作に、演算処理モードとの違いがある。[Functions of Bridge Circuit] First, the functions of the shared chip 20 in bridge mode are described, focusing on the differences from arithmetic processing mode. Bridge mode differs from arithmetic processing mode in the behavior after reset at system startup and in the operation of the CPU 21.

【０２０１】まず、システム起動時のリセット後、演算
処理モードではＣＰＵ２１は起動待ち状態になっていた
のに対し、ブリッジモードでは、まず、ブート状態とな
ってＲＯＭ３６に格納されているプログラムを自動的に
内部メモリ２３に格納して、その後、自動的に動作状態
に移行する。After the reset at system startup, the CPU 21 is in the start-wait state in arithmetic processing mode. In bridge mode, by contrast, it first enters the boot state, automatically stores the program held in the ROM 36 into the internal memory 23, and then automatically moves to the operating state.

【０２０２】このことによって、システムバス１３であ
るＩＥＥＥ１３９４シリアルバスや、ボードバス１６で
あるＰＰＲＡＭ−Ｌｉｎｋが立ち上がる前に、通信の機
能を使うことなく、プログラムをロードして動作を開始
することができる。Thus, before the IEEE 1394 serial bus as the system bus 13 and the PPRAM-Link as the board bus 16 start up, the program can be loaded and the operation can be started without using the communication function. .

【０２０３】ブート状態では、内部メモリ２３に格納さ
れている命令の代わりに、マルチプレクサ２６は、ロー
ドジェネレータ２７の出力を選択して、命令バス３２を
通してＣＰＵ２１に供給する。In the boot state, the multiplexer 26 selects the output of the load generator 27 instead of the instructions stored in the internal memory 23 and supplies it to the CPU 21 through the instruction bus 32.

【０２０４】ロードジェネレータ２７は、「ＲＯＭ３６
に格納されている命令を、ＬＭバス３１を通して、１６
ビット毎にＣＰＵ２１内部の整数レジスタにロードし、
ロード結果を２個ずつまとめて３２ビットの命令を構成
し、その結果をさらに内部メモリ２３にストアするとい
う操作を、ＲＯＭ３６と内部メモリ２３のアドレスを増
加させながら所定の回数だけ繰り返す」という命令列を
自動的に生成する。The load generator 27 automatically generates an instruction sequence that does the following: it loads the instructions stored in the ROM 36 through the LM bus 31, 16 bits at a time, into integer registers in the CPU 21; combines each pair of loaded values into one 32-bit instruction; stores the result in the internal memory 23; and repeats this a predetermined number of times while incrementing the ROM 36 and internal memory 23 addresses.

【０２０５】さらに、このとき、ＣＰＵ２１は、ほぼ前
述の動作状態と同じように命令を実行する。この結果、
ＣＰＵ２１は、ブート状態で、ロードジェネレータ２７
で生成された命令を実行するため、ＲＯＭ３６に格納さ
れている命令が、全て内部メモリ２３に格納される。At this time, the CPU 21 executes instructions in much the same way as in the operating state described earlier. Because the CPU 21, in the boot state, executes the instructions generated by the load generator 27, all the instructions stored in the ROM 36 end up in the internal memory 23.
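
The effect of the instruction sequence produced by the load generator 27 can be illustrated with a short Python sketch. It only models the data movement (two 16-bit ROM words combined into one 32-bit instruction). The function name `boot_copy` is hypothetical, and the order in which the two halves are combined is an assumption exposed as the parameter `low_half_first`, since the text does not state which half is stored at the lower ROM address.

```python
def boot_copy(rom_words, n_instructions, low_half_first=True):
    """Copy n_instructions 32-bit instructions out of a 16-bit-wide ROM image.

    rom_words is a list of 16-bit values standing in for ROM 36; the returned
    list stands in for the internal memory 23.
    """
    internal_memory = []
    for i in range(n_instructions):
        first = rom_words[2 * i] & 0xFFFF        # first 16-bit load
        second = rom_words[2 * i + 1] & 0xFFFF   # second 16-bit load
        if low_half_first:
            instruction = (second << 16) | first
        else:
            instruction = (first << 16) | second
        internal_memory.append(instruction)       # store to internal memory
    return internal_memory
```

For example, `boot_copy([0x5678, 0x1234], 1)` yields `[0x12345678]` under the default assumption that the low half comes first.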

【０２０６】但し、ＲＯＭ３６のアクセスタイムは、一
般に、ＳＲＡＭで構成されるローカルメモリのアクセス
タイムよりも遅い場合が多いため、ブート状態における
ＲＯＭ３６からのロード命令の実行は、演算処理モード
の動作状態におけるロード命令よりも時間をかけて実施
する。However, the access time of the ROM 36 is in many cases longer than that of the SRAM-based local memory, so a load from the ROM 36 in the boot state is given more time to execute than a load in the operating state of arithmetic processing mode.

【０２０７】また、前述のようにＬＭバス３１を通し
て、ＣＰＵ２１の内部の整数レジスタへロードするとき
には、ＣＰＵ２１のコントロールレジスタの設定によっ
て、６４ビットのＬＭバス３１のどの部分をロードする
かが決められる。このため、ロード命令でＬＭバス３１
の下位３２ビットからロードされるように、ＣＰＵ２１
のコントロールレジスタのリセット直後の値が設定され
ている。As described above, when data is loaded through the LM bus 31 into an integer register in the CPU 21, a control register setting in the CPU 21 determines which part of the 64-bit LM bus 31 is loaded. The value of this control register immediately after reset is therefore set so that load instructions read from the lower 32 bits of the LM bus 31.

【０２０８】ブート状態でＲＯＭ３６の命令が内部メモ
リ２３に転送され終わると、上に述べたように、ＣＰＵ
２１は、自動的に動作状態に移行して、命令の実行を開
始する。この状態では、前述のマルチプレクサ２６が切
り替わり、演算処理モードと同様、内部メモリ２３から
の命令出力を選択する。それとともに、ＣＰＵ２１は、
ブート状態でプログラムが格納された内部メモリ２３の
アドレスから順番に命令を読み出しながら、実行する。When the instructions in the ROM 36 have all been transferred to the internal memory 23 in the boot state, the CPU 21 automatically moves to the operating state, as described above, and begins executing instructions. In this state, the multiplexer 26 switches over and, as in arithmetic processing mode, selects the instruction output from the internal memory 23. The CPU 21 then reads and executes instructions in order, starting from the internal memory 23 address at which the program was stored during the boot state.

【０２０９】ここで、動作状態における外部回路からの
ロード、ストア命令の動作は、演算処理モードと、この
ブリッジモードとでは異なる。演算処理モードでは、汎
用ＳＲＡＭで構成されたローカルメモリ１５に対して読
み出し／書き込みが行われていたのに対して、ブリッジ
モードでは、前述したＬＬＣ３７のインタフェースを通
して読み出し／書き込みが行われる。Here, the operation of the load / store instruction from the external circuit in the operation state is different between the arithmetic processing mode and the bridge mode. In the arithmetic processing mode, reading / writing is performed to / from the local memory 15 configured by a general-purpose SRAM, whereas in the bridge mode, reading / writing is performed through the interface of the LLC 37 described above.

【０２１０】このため、ブリッジモードでは、ＰＥＣ２
５によって、ＬＬＣ３７の制御信号を生成しながら、前
述の汎用インタフェースの仕様に基づいて、ＬＬＣ３７
に含まれている内部レジスタの読み出しや書き込みが行
われる。また、ＳＲＡＭとＬＬＣ３７のアクセスタイム
にも違いがあるため、ＣＰＵ２１は、演算処理モードの
ときとは異なるサイクル数をかけて、ＬＬＣ３７との間
のロード、ストアを実行する。In bridge mode, therefore, the PEC 25 generates the control signals for the LLC 37, and the internal registers of the LLC 37 are read and written in accordance with the general-purpose interface specification described above. Because the access times of the SRAM and the LLC 37 also differ, the CPU 21 takes a different number of cycles for loads and stores to the LLC 37 than in arithmetic processing mode.

【０２１１】また、このとき、ＬＭバス３１の下位１６
ビットからの書き込みは、ＣＰＵ２１のコントロールレ
ジスタを設定することによって禁止されるため、ストア
命令のときにも、ＲＯＭ３６に書き込みが行われること
はなく、ＲＯＭ３６の出力とデータ出力とが衝突するこ
とはない。At this time, writing from the lower 16 bits of the LM bus 31 is disabled by a setting in the control register of the CPU 21. Consequently, even a store instruction does not write to the ROM 36, and the output of the ROM 36 never collides with the data output.

【０２１２】ボードバスインタフェース２８など、上述
した部分以外の他のユニットは、ブリッジモードでも、
演算処理モードと同じ機能を持つ。The other units not discussed above, such as the board bus interface 28, have the same functions in bridge mode as in arithmetic processing mode.

【０２１３】［ＬＬＣ３７の動作概要］次に、共用チッ
プ２０から、ＬＬＣ３７を使用してＩＥＥＥ１３９４シ
リアルバスからなるシステムバス１３を経由した通信を
行うときの動作の概要について説明する。[Outline of Operation of LLC 37] Next, an outline of an operation when communication is performed from the shared chip 20 via the system bus 13 composed of an IEEE 1394 serial bus using the LLC 37 will be described.

【０２１４】ＩＥＥＥ１３９４シリアルバスの仕様で
は、非同期のアシンクロナス転送と、アシンクロナス転
送よりも高速であり、リアルタイム転送を行うためのア
イソクロナス転送が定義されているが、この実施の形態
では、簡単のためにＩＥＥＥ１３９４バス規格のアシン
クロナス転送を用いることとする。このとき、ＬＬＣ３
７による通信制御手順としては、一般的に、以下のよう
なものが考えられる。The IEEE 1394 serial bus specification defines asynchronous transfer and isochronous transfer, the latter being faster than asynchronous transfer and intended for real-time transfer. For simplicity, this embodiment uses the asynchronous transfer of the IEEE 1394 bus standard. In this case, a typical communication control procedure using the LLC 37 is as follows.

【０２１５】まず、ＬＬＣ３７には、トランスミット用
ＦＩＦＯ、レシーブ用ＦＩＦＯ、制御用レジスタ、状態
レジスタが含まれるものとする。トランスミット用ＦＩ
ＦＯは、共用チップ２０からＩＥＥＥ１３９４シリアル
バスであるシステムバス１３に、アシンクロナス転送の
パケットを送出するときに使用され、前述の汎用インタ
フェースに従って共用チップ２０から書き込みが行われ
る。また、制御用レジスタは、この送出を制御するため
に使用され、同じく、共用チップ２０から書き込みが行
われる。First, the LLC 37 is assumed to contain a transmit FIFO, a receive FIFO, a control register, and a status register. The transmit FIFO is used when the shared chip 20 sends an asynchronous transfer packet onto the system bus 13, which is an IEEE 1394 serial bus, and is written by the shared chip 20 through the general-purpose interface described above. The control register is used to control this transmission and is likewise written by the shared chip 20.

【０２１６】さらに、レシーブ用ＦＩＦＯは、システム
バスであるＩＥＥＥ１３９４シリアルバスからアシンク
ロナス転送のパケットを受信するときに使用され、前述
の汎用インタフェースに従って共用チップ２０から読み
出される。また、ＬＬＣ３７からＣＰＵ２１への割り込
み信号ライン３９も使用される。The receive FIFO is used when receiving asynchronous transfer packets from the IEEE 1394 serial bus, which is a system bus, and is read from the shared chip 20 in accordance with the above-described general-purpose interface. An interrupt signal line 39 from the LLC 37 to the CPU 21 is also used.

【０２１７】このとき、状態レジスタは、割り込みの状
態などを共用チップ２０に伝えるために使用され、共用
チップ２０から汎用インタフェースに従って読み出され
る。上で述べた各社から発表されているＬＬＣは、さま
ざまな通信制御手順を用いているが、ここで述べた一般
的な手順に対し、わずかな変更を加えるだけで対応する
ことが可能なので、以下この手順を前提にして説明す
る。The status register is used to report information such as the interrupt status to the shared chip 20 and is read by the shared chip 20 through the general-purpose interface. The LLCs released by the companies mentioned above use a variety of communication control procedures, but they can be accommodated with only minor changes to the general procedure described here, so the following description assumes this procedure.

【０２１８】ここで、ＩＥＥＥ１３９４のアシンクロナ
ス転送では、あるノードから別のノードに対して書き込
みトランザクション、読み出しトランザクションを行う
ときは、前述のＰＰＲＡＭ−Ｌｉｎｋの要求パケット送
出と同様、要求パケットの送出、および応答パケットの
受け取りによってトランザクションが構成される。In IEEE 1394 asynchronous transfer, when one node performs a write or read transaction with another node, the transaction consists of sending a request packet and receiving a response packet, as with the PPRAM-Link request packet transmission described earlier.

【０２１９】すなわち、要求側のノードは、アシンクロ
ナスの書き込み要求パケットまたは読み出し要求パケッ
トを送出する。その後、要求を受け付けたノードは、ア
シンクロナスの書き込み応答パケットまたは読み出し応
答パケットを要求側のノードに対して返す。また、ＰＰ
ＲＡＭ−Ｌｉｎｋと同様、書き込みの場合は書き込み要
求パケットにデータが付加され、読み出しの場合は読み
出し応答パケットにデータが付加される。That is, the requesting node sends an asynchronous write request packet or read request packet. The node that accepts the request then returns an asynchronous write response packet or read response packet to the requesting node. As with PPRAM-Link, the data is attached to the write request packet for a write and to the read response packet for a read.

【０２２０】ＩＥＥＥ１３９４シリアルバス規格のリン
ク層では、このようにパケットのレベルの処理が行われ
るので、アシンクロナスの要求パケットと応答パケット
の処理などのトランザクションレベルの処理は、共用チ
ップ２０のプログラムで行う必要がある。Because the link layer of the IEEE 1394 serial bus standard performs processing at the packet level in this way, transaction-level processing, such as handling asynchronous request and response packets, must be performed by the program on the shared chip 20.

【０２２１】ここで、共用チップ２０からＩＥＥＥ１３
９４シリアルバスを通してアシンクロナスの要求パケッ
トまたは応答パケットを送出するときは、まず、共用チ
ップ２０内のＣＰＵ２１を通して、前述のトランスミッ
ト用ＦＩＦＯに、これらのパケットを書き込み、その
後、制御用レジスタに特定の値を書き込むことによって
パケットの送出を指示する。すると、ＬＬＣ３７によっ
て、トランスミット用ＦＩＦＯに格納されたパケットが
自動的にＩＥＥＥ１３９４に送出される。To send an asynchronous request or response packet from the shared chip 20 over the IEEE 1394 serial bus, the CPU 21 in the shared chip 20 first writes the packet into the transmit FIFO and then writes a specific value to the control register to instruct that the packet be sent. The LLC 37 then automatically sends the packet stored in the transmit FIFO onto the IEEE 1394 bus.

【０２２２】また、ＩＥＥＥ１３９４シリアルバスから
アシンクロナスの要求パケットまたは応答パケットが受
け取られると、このパケットがＬＬＣ３７のレシーブ用
ＦＩＦＯに格納されるとともに、ＬＬＣ３７から前述の
外部割り込み信号ライン３９を通して、共用チップ２０
に割り込みが発生する。そこで、共用チップ２０内のＣ
ＰＵ２１は、割り込みハンドラを起動し、前述のレシー
ブ用ＦＩＦＯからパケットを読み出す。また、前述の状
態レジスタのデータを読み出すことによって、ＣＰＵ２
１は、割り込みを発生したときのパケットの種類などの
情報を得ることができる。When an asynchronous request or response packet is received from the IEEE 1394 serial bus, the packet is stored in the receive FIFO of the LLC 37, and the LLC 37 raises an interrupt to the shared chip 20 through the external interrupt signal line 39. The CPU 21 in the shared chip 20 then starts the interrupt handler and reads the packet from the receive FIFO. By reading the status register, the CPU 21 can also obtain information such as the type of packet that caused the interrupt.
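
The generic send and receive procedure described above can be modelled in a few lines of Python. This is a toy model of the generic LLC assumed in the text, not of any actual LLC device: the class `LLCModel`, the control value `SEND`, and the way the status register holds a packet-type string are all assumptions made for the example.

```python
from collections import deque


class LLCModel:
    """Toy model of the generic LLC 37: two FIFOs, control and status registers."""

    SEND = 1  # hypothetical control-register value meaning "send the packet"

    def __init__(self):
        self.tx_fifo = deque()
        self.rx_fifo = deque()
        self.status = None       # packet type reported to the CPU
        self.interrupt = False   # models external interrupt line 39
        self.bus = []            # packets that have been put on the serial bus

    def write_tx(self, word):
        """CPU writes one word of a packet into the transmit FIFO."""
        self.tx_fifo.append(word)

    def write_control(self, value):
        """Writing SEND makes the LLC transmit whatever the FIFO holds."""
        if value == self.SEND:
            self.bus.append(list(self.tx_fifo))
            self.tx_fifo.clear()

    def receive_from_bus(self, packet_type, words):
        """A packet arrives: fill the receive FIFO and raise the interrupt."""
        self.rx_fifo.extend(words)
        self.status = packet_type
        self.interrupt = True


def interrupt_handler(llc):
    """CPU side: read the status register, then drain the receive FIFO."""
    packet_type = llc.status
    words = list(llc.rx_fifo)
    llc.rx_fifo.clear()
    llc.interrupt = False
    return packet_type, words
```

Nothing reaches `bus` until `write_control(LLCModel.SEND)` is called, mirroring the two-step send (fill the FIFO, then write the control register) described in the text.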

【０２２３】［バスブリッジ回路としての動作］次に、
以上説明した機能によってバスブリッジ回路１７が動作
する様子について説明する。前述したように、システム
によって初期化された後、共用チップ２０内のＣＰＵ２
１は、ＲＯＭ３６から内部メモリ２３に読み出されたプ
ログラムに従って動作する。前に説明したシステムの動
作、あるいはプロセッサエレメント１４の動作に関わ
り、バスブリッジ回路１７では、このプログラムに基い
て以下の３通りの動作を行うことが必要である。[Operation as Bus Bridge Circuit] Next, how the bus bridge circuit 17 operates using the functions described above is explained. As mentioned earlier, after being initialized by the system, the CPU 21 in the shared chip 20 operates according to the program read from the ROM 36 into the internal memory 23. In connection with the system operation and the operation of the processor elements 14 described earlier, the bus bridge circuit 17 must perform the following three operations under this program.

【０２２４】（Ｇ）ホストコンピュータ１１から、各プ
ロセッサエレメント１４への、内部メモリ２３またはロ
ーカルメモリ１５に対するパラメータやプログラムの書
き込み要求の処理。(G) Processing of a request from the host computer 11 to each processor element 14 for writing a parameter or a program to the internal memory 23 or the local memory 15.

【０２２５】（Ｈ）各プロセッサエレメント１４から、
ホストコンピュータ１１の計算終了ビットへの書き込み
要求の処理。(H) Processing of write requests from each processor element 14 to the calculation end bit of the host computer 11.

【０２２６】（Ｉ）ホストコンピュータ１１から、各プ
ロセッサエレメント１４に付属のローカルメモリ１５か
らのｇｒｓのデータ読み出し要求の処理。(I) Processing of a read request for grs data from the local memory 15 attached to each processor element 14 from the host computer 11.

【０２２７】以下、これらの動作の内容について説明す
る。The contents of these operations will be described below.

【０２２８】まず、バスブリッジ回路１７のＣＰＵ２１
は、通常、自分が出したＩＥＥＥ１３９４バス規格のア
シンクロナスの要求送出パケットのうち、まだ、ＬＬＣ
３７を通して応答パケットが返って来ていないものに対
して、経過時間を監視するループを実行している。ＣＰ
Ｕ２１が、通常、この操作を行う理由は後述する。ま
た、前記（Ｇ）〜（Ｉ）のＣＰＵ２１の動作は、以下に
述べるように、割り込みによって起動されるため、殆ど
が割り込みハンドラの処理となる。First, the CPU 21 of the bus bridge circuit 17 normally runs a loop that monitors the elapsed time for those IEEE 1394 asynchronous request packets it has sent for which no response packet has yet come back through the LLC 37. The reason the CPU 21 normally does this is explained later. The operations (G) to (I) of the CPU 21 are triggered by interrupts, as described below, so most of them are carried out in the interrupt handler.
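
The bookkeeping behind such a monitoring loop might look like the sketch below. It is an illustration only: the class `PendingRequests`, the use of a per-request label as the key, and the timeout value passed in by the caller are assumptions, since the text does not say how outstanding requests are tracked or what time limit applies.

```python
class PendingRequests:
    """Hypothetical table of request packets still waiting for a response."""

    def __init__(self, timeout):
        self.timeout = timeout   # allowed waiting time, in arbitrary time units
        self.sent_at = {}        # request label -> time the request was sent

    def on_send(self, label, now):
        """Record that a request packet was sent at time `now`."""
        self.sent_at[label] = now

    def on_response(self, label):
        """A response packet arrived, so stop monitoring this request."""
        self.sent_at.pop(label, None)

    def expired(self, now):
        """Return the labels whose response is overdue at time `now`."""
        return sorted(
            label for label, t in self.sent_at.items() if now - t > self.timeout
        )
```

The main loop would call `expired` repeatedly, while the interrupt handler would call `on_response` whenever a response packet is read from the receive FIFO.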

【０２２９】前記（Ｇ）の動作のときは、バスブリッジ
回路１７は、ホストコンピュータ１１から書き込みデー
タを受け取り、書き込み先のプロセッサエレメント１４
に対して、ＰＰＲＡＭ−Ｌｉｎｋの書き込みトランザク
ションを行う。In operation (G), the bus bridge circuit 17 receives write data from the host computer 11 and performs a PPRAM-Link write transaction to the destination processor element 14.

【０２３０】このとき、上で述べたように、ホストコン
ピュータ１１からのアシンクロナスの書き込み要求パケ
ットが、ＬＬＣ３７のレシーブ用ＦＩＦＯに格納され
て、外部割り込み信号ライン３９を通して割り込みが発
生するので、バスブリッジ回路１７の共用チップ２０の
ＣＰＵ２１は外部割り込みを処理するための割り込みハ
ンドラに飛ぶ。At this time, as described above, the asynchronous write request packet from the host computer 11 is stored in the receive FIFO of the LLC 37 and an interrupt is generated through the external interrupt signal line 39. The CPU 21 of the shared chip 20 jumps to an interrupt handler for processing an external interrupt.

【０２３１】この割り込みハンドラでは、ＣＰＵ２１
は、まず、ＬＬＣ３７の状態レジスタを読み出して、書
き込み要求が来ていることを確認すると、この書き込み
要求パケットのヘッダ及び付加されている書き込みデー
タを、レシーブ用ＦＩＦＯから読み出し、内部メモリ２
３に書き込む。In this interrupt handler, the CPU 21 first reads the status register of the LLC 37. Once it has confirmed that a write request has arrived, it reads the header of the write request packet and the attached write data from the receive FIFO and writes them to the internal memory 23.

【０２３２】その後、ＣＰＵ２１は、書き込み要求パケ
ットのヘッダ情報に含まれているアドレスを、データを
書き込むプロセッサエレメント１４のＰＰＲＡＭノード
識別子とメモリアドレスオフセットに変換する。さら
に、これらの情報を基にして、ＰＰＲＡＭ−Ｌｉｎｋの
書き込み要求パケットのヘッダを生成して、上記データ
とともに、内部メモリ２３の所定の位置に格納する。After that, the CPU 21 converts the address contained in the header information of the write request packet into the PPRAM node identifier and the memory address offset of the processor element 14 into which the data is to be written. Further, a header of a write request packet of the PPRAM-Link is generated based on these pieces of information, and stored in a predetermined position of the internal memory 23 together with the data.
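
The address conversion step can be pictured with the following sketch. The text does not specify how a bus address maps to a PPRAM node identifier and a memory address offset, so the bit split used here (`OFFSET_BITS = 24`), the function names, and the dictionary layout of the header are all invented for illustration.

```python
OFFSET_BITS = 24  # hypothetical: low 24 bits are the memory offset


def translate_address(addr_1394):
    """Split an incoming address into (PPRAM node identifier, memory offset)."""
    node_id = addr_1394 >> OFFSET_BITS
    offset = addr_1394 & ((1 << OFFSET_BITS) - 1)
    return node_id, offset


def build_ppram_write_request(addr_1394, data):
    """Build a stand-in PPRAM-Link write request: a header plus the data."""
    node_id, offset = translate_address(addr_1394)
    header = {
        "type": "write_request",
        "node_id": node_id,
        "offset": offset,
        "length": len(data),
    }
    return header, list(data)
```

Under this assumed layout, address `0x05001234` would be routed to node 5 at offset `0x001234`.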

【０２３３】その後、プロセッサエレメント１４の機能
の説明で述べた、前記（Ｅ）と全く同じ操作を行うこと
によって、前記内部メモリ２３の所定の位置に格納され
たデータを、該当するプロセッサエレメント１４の内部
メモリ２３に書き込む。同時に、バスブリッジ回路１７
の共用チップ２０は、ＩＥＥＥ１３９４シリアルバスで
あるシステムバス１３に対して、アシンクロナスの書き
込み要求パケットに対する書き込み応答パケットを送出
する。この操作は、まず、ＣＰＵ２１によって書き込み
応答パケットを内部メモリ２３上に生成し、その生成し
た書き込み応答パケットを、前述のように、ＬＬＣ３７
のトランスミット用ＦＩＦＯ、制御レジスタへ書き込む
ことによって行うことができる。Then, by performing exactly the same operation as (E) described in the explanation of the functions of the processor element 14, the data stored at the predetermined location in the internal memory 23 is written to the internal memory 23 of the target processor element 14. At the same time, the shared chip 20 of the bus bridge circuit 17 sends, onto the system bus 13 (an IEEE 1394 serial bus), a write response packet for the asynchronous write request packet. This is done by having the CPU 21 first build the write response packet in the internal memory 23 and then, as described above, write it to the transmit FIFO of the LLC 37 and write to the control register.

[0234] The processing for operation (I) is similar to that for operation (G). On receiving a read request from the host computer 11, the bus bridge circuit 17 issues a PPRAM-Link read transaction to the processor element 14 to be read, receives the data, and then sends the data to the host computer 11 in a response packet.

[0235] At this time, the asynchronous read request packet from the host computer 11 is stored in the receive FIFO of the LLC 37 and an interrupt is raised through the external interrupt signal line 39, so the CPU 21 of the shared chip 20 of the bus bridge circuit 17 jumps to the interrupt handler that services external interrupts.

[0236] In the interrupt handler, the CPU 21 reads the status register of the LLC 37 and, on determining that a read request has arrived, reads the read request packet from the receive FIFO and writes it to the internal memory 23. The CPU 21 then converts the address contained in the header information of the read request packet into the PPRAM node identifier of the processor element from which the data is to be read and a memory address offset.

[0237] Based on this information, the CPU 21 generates the header of a PPRAM-Link read request packet and stores it at a predetermined location in the internal memory 23. Then, by performing exactly the same operation as (F) described in the explanation of the functions of the processor element, the data is read from the memory of the target processor element 14 and written to the internal memory 23 of the shared chip 20 of the bus bridge circuit 17.

[0238] At the same time, the shared chip 20 of the bus bridge circuit 17 sends a read response packet, answering the asynchronous read request packet, onto the system bus 13, which is an IEEE 1394 bus. Because data is attached to a read response packet, the CPU 21 first generates the header of the read response packet, writes this header and the read data stored in the internal memory 23 to the transmit FIFO of the LLC 37, and then writes to the control register as described above, thereby sending the read response packet.

[0239] In operation (H), unlike operations (G) and (I), the bus bridge circuit 17 must execute an asynchronous write transaction on the IEEE 1394 serial bus.

[0240] As described above, in this operation a processor element 14 issues a write request to the memory of the host computer 11, so the processor element 14 writes data in a special format to the internal memory 23 of the bus bridge circuit 17 using a PPRAM-Link write request. The special-format data contains information such as an indication that a write to the host computer 11 is to be performed, the memory address on the host computer 11 to be written, and the data to be written to that address. When a write request arrives from the processor element 14, the bus bridge circuit 17 uses this information to initiate a transaction on the IEEE 1394 serial bus and writes to the host computer 11. This operation is described in detail below.
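
As an illustration of the special-format data, the following sketch packs and unpacks the three items named above (operation indication, host memory address, write data). The field widths, byte order, and the `OP_HOST_WRITE` code are assumptions made for this example; the specification does not define them.

```python
import struct

# Assumed layout (big-endian): 1-byte operation code, 8-byte host memory
# address, 2-byte payload length, followed by the payload itself.
HEADER = struct.Struct(">BQH")
OP_HOST_WRITE = 1  # hypothetical code meaning "write to host memory"


def pack_host_write(host_address: int, data: bytes) -> bytes:
    """Processor element side: build the special-format data."""
    return HEADER.pack(OP_HOST_WRITE, host_address, len(data)) + data


def unpack_special(blob: bytes) -> tuple[int, int, bytes]:
    """Bridge side: recover operation code, host address, and payload."""
    op, host_address, length = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:HEADER.size + length]
    return op, host_address, payload


blob = pack_host_write(0x1000, b"abcd")
print(unpack_special(blob))
```

The bridge would use the unpacked address and payload to build the system-bus write request.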

[0241] The bus bridge circuit 17 first performs almost the same operation as (A) described above. That is, the board bus interface 28 in the shared chip 20 stores the data contained in the write request packet from the processor element 14 in the internal memory 23 and raises an interrupt to the CPU 21. On receiving the interrupt, the CPU 21 jumps to the interrupt handler that services interrupts from the board bus interface 28 and, by the operation of (A), issues a PPRAM-Link write response packet to the processor element 14.

[0242] The subsequent steps differ between operation (A) and this bridge-mode case. In operation (A), control returns immediately from the interrupt handler to the main routine and the operation ends. In bridge mode, however, the handler does not return; it goes on to issue an IEEE 1394 serial bus packet as follows.

[0243] At this point, the internal memory 23 holds the memory address on the host computer 11 and the data to be written, both delivered in the special-format data described above. Based on this information, the CPU 21 generates the header of an IEEE 1394 serial bus asynchronous write request packet in the internal memory 23.

[0244] Then, as described above, the CPU 21 writes this header and the data to the transmit FIFO of the LLC 37 and writes to the control register, thereby sending the write request packet onto the system bus 13, which is an IEEE 1394 serial bus. The CPU 21 then returns from the interrupt handler for the board bus interface 28 to the main routine.

[0245] This behavior can be implemented by taking the interrupt handler that services interrupts from the board bus interface 28 and performs operation (A), and adding, immediately before the return from the handler, a routine that issues an IEEE 1394 serial bus packet.
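
The handler extension described here can be sketched as below. This is a schematic model only: the callbacks `send_board_response` and `send_1394_write`, the `bridge_mode` flag, and the request dictionary are hypothetical stand-ins for the firmware routines and packet contents.

```python
def board_bus_interrupt_handler(request, bridge_mode,
                                send_board_response, send_1394_write):
    """Operation (A), plus the bridge-mode step added just before returning."""
    # Operation (A): answer the processor element on the board bus.
    send_board_response(request["source_node"])
    if bridge_mode:
        # Added step: forward the write to the host before the handler returns.
        send_1394_write(request["host_address"], request["data"])


log = []
request = {"source_node": 5, "host_address": 0x2000, "data": b"\x2a"}
board_bus_interrupt_handler(
    request, True,
    lambda node: log.append(("board_response", node)),
    lambda addr, data: log.append(("1394_write", addr, data)),
)
print(log)
```

With `bridge_mode` false, only the board-bus response is issued, which corresponds to plain operation (A).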

[0246] Furthermore, as described above, the LLC 37 handles asynchronous traffic only up to the packet level, so transaction-level processing must be performed by the bus bridge circuit 17. The bus bridge circuit 17 must therefore confirm that a write response packet comes back for the IEEE 1394 write request packet it has sent.

[0247] If no write response packet has come back within a fixed time after the write request packet was issued, measures such as retransmitting the write request packet must be taken. As stated immediately before the description of (G), this is done by the program that the CPU 21 normally executes, that is, the program running before an interrupt occurs.

[0248] The CPU 21 keeps in the internal memory 23 every request packet for which no write response packet has yet come back from the LLC 37. For each such write request packet it also maintains a variable such as a counter, for example in an integer register inside the CPU 21, and increments this counter at fixed intervals. When a counter reaches or exceeds a predetermined value, the CPU 21 takes an action such as retransmitting the write request packet. The CPU 21 repeats this operation indefinitely as its normally executed program.

[0249] When a write response packet comes back, the LLC 37 raises an interrupt to the shared chip 20 of the bus bridge circuit 17 through the external interrupt signal line 39, as described above, and control jumps to the same external-interrupt handler used for (G) and (I). The following operation is added to this interrupt handler.

[0250] If the packet read from the receive FIFO into the internal memory 23 is a write response packet, the handler removes that packet, together with the corresponding write request packet held in the internal memory 23 as described above, and returns from the interrupt handler. Because the program normally executed by the CPU 21 counts only for write request packets whose response has not come back, this handler processing removes the request from the set being counted.
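
The transaction-level bookkeeping described in the preceding paragraphs (hold unanswered requests, count at fixed intervals, retransmit at a threshold, drop the entry when the response arrives) can be modeled roughly as follows. The class and method names are illustrative, matching responses to requests by a label is an assumption, and restarting the count after a retransmission is a choice made for this sketch rather than something the specification states.

```python
class PendingRequests:
    """Tracks sent request packets that have not yet been answered."""

    def __init__(self, timeout_ticks, resend):
        self.timeout_ticks = timeout_ticks
        self.resend = resend      # callback that retransmits a packet
        self.pending = {}         # label -> [packet, counter]

    def sent(self, label, packet):
        """Record a request at the moment it is transmitted."""
        self.pending[label] = [packet, 0]

    def tick(self):
        """Called at fixed intervals by the normally executed program."""
        for entry in self.pending.values():
            entry[1] += 1
            if entry[1] >= self.timeout_ticks:
                self.resend(entry[0])
                entry[1] = 0      # assumed: restart the count after a resend

    def response_received(self, label):
        """Called from the external-interrupt handler when a response arrives."""
        self.pending.pop(label, None)


resent = []
table = PendingRequests(timeout_ticks=3, resend=resent.append)
table.sent(1, "write A")
table.sent(2, "write B")
table.tick()
table.tick()
table.response_received(1)   # A is answered before the threshold
table.tick()                 # B reaches the threshold and is retransmitted
print(resent)
```

In this run only request B is retransmitted, because the entry for A was removed by the response handler before its counter reached the threshold.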

[0251] The bridge circuit operates as described above. Because IEEE 1394 serial bus processing is performed by the CPU 21, speed might be a concern. However, the CPU used is a fast one originally developed to perform floating-point multiply-accumulate operations at high speed in a processor element, so sufficient performance can be obtained for applications with relatively little communication, such as molecular orbital calculation.

[0252] As described above, according to this embodiment, the configuration required for the processor element 14 is extended as follows: the mode selection signal SEL switches between the arithmetic processing mode and the bridge mode; a PEC (peripheral circuit controller) 25 is added that interfaces with the outside while the CPU 21 is in the operating state in bridge mode; the program is loaded from the loader generator 27 at initialization; and the timing of the external interface of the CPU 21 is changed. With these changes the same LSI 20 can be used as part of the bus bridge circuit 17, so the man-hours for circuit development can be reduced dramatically.

[0253] In the embodiment described above, the density matrix and similar data were sent by transactions originated by the host computer 11, so the operations of the bus bridge circuit 17 were limited to (G) through (I). To make the system more flexible, however, a function may be added whereby a processor element 14 originates a transaction that reads the density matrix held in the memory of the host computer 11. In that case, the bus bridge circuit 17 additionally needs the following operation (J).

[0254] (J) Processing of requests from each processor element 14 to read data held in the memory on the host computer 11.

[0255] This processing can be implemented by the following operations.

[0256] First, as in case (H), the processor element 14 issues a PPRAM-Link write transaction to write special-format data, indicating a request to read from the host computer 11, into the internal memory 23 of the shared chip 20 of the bus bridge circuit 17. The special format indicates that the operation is a read from the memory of the host computer and contains the memory address to be read.

[0257] Accordingly, as in the processing of (H), the bus bridge circuit 17 first performs the processing of (A), then initiates a read transaction by sending an IEEE 1394 serial bus read request packet through the LLC 37, and exits the interrupt handler for the board bus interface 28. As described above, the program normally executed by the CPU 21 monitors for the response packet to this read request packet.

[0258] When the read response packet comes back from the host computer 11 through the LLC 37, the LLC 37 raises an external interrupt as described above, so a section that processes read response packets is also added to the handler that services external interrupts. This processing includes steps that are not present in the processing of (H).

[0259] First, in this transaction the read data is attached to the read response packet, so the interrupt handler writes the header of the read response packet and the data to the internal memory 23. Then, by the same procedure as (C) or (D), it issues a PPRAM-Link write transaction to a specific memory address of the processor element 14 that originally issued the read request. The address information for the processor element is assumed to be included in the special format described above. As a result, the data read from the host computer 11 and stored in the internal memory 23 of the bus bridge circuit 17 is written to the memory of the processor element 14. Issuing this new PPRAM-Link write transaction is the step that has no counterpart in case (H).
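
The two halves of operation (J) can be modeled with plain dictionaries standing in for the memories involved. Everything here is a simplified assumption for illustration: the function names, the use of a label to pair the response with the stored destination, and the direct dictionary update that stands in for the PPRAM-Link write transaction.

```python
host_memory = {0x100: 3.5}   # stand-in for the memory of the host computer
pe_memory = {7: {}}          # node id -> memory of that processor element
bridge_memory = {}           # stand-in for the bridge's internal memory


def on_special_read_request(label, host_address, pe_node, pe_address):
    """Board-bus handler: remember the destination, then ask the host."""
    bridge_memory[label] = (pe_node, pe_address)
    return ("1394_read_request", label, host_address)


def on_read_response(label, data):
    """External-interrupt handler: forward the data with a board-bus write."""
    pe_node, pe_address = bridge_memory.pop(label)
    pe_memory[pe_node][pe_address] = data   # models the PPRAM-Link write


_, label, address = on_special_read_request(
    label=1, host_address=0x100, pe_node=7, pe_address=0x40)
on_read_response(label, host_memory[address])
print(pe_memory)
```

After the response is handled, the value from the host appears at the requested address of processor element 7, and the bridge no longer holds an entry for that request.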

[0260] In this interrupt handler, as in operation (H), the request is of course also removed from the set of packets whose responses are monitored by the program normally executed by the CPU 21.

[0261] In the embodiment, when a processor element 14 finished calculating grs, it notified the host computer 11 and the host computer 11 then read the value. Alternatively, operation (H) of the bus bridge circuit 17 may be used to send the value of grs directly from the processor element 14 to the host computer 11.

[0262] In the embodiment described above, the board bus interface 28 raised an interrupt to the CPU 21 when a request packet arrived from the PPRAM-Link. Instead of raising an interrupt, a register may be provided and written each time a request packet arrives, thereby informing the CPU 21 of the arrival.

[0263] If the LLC 37 is likewise configured not to raise interrupts and the CPU 21 reads the status register periodically, the functions described above can still be realized. This makes interrupt handling by the CPU 21 unnecessary and reduces the functional verification effort during development of the shared chip 20.
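
A polling variant of this kind might look like the following sketch. The bit assignments `RX_PACKET` and `BOARD_REQUEST`, and the idea of presenting both sources in a single status word, are assumptions made for illustration; the specification only says that registers are read periodically instead of using interrupts.

```python
RX_PACKET = 0x1       # hypothetical bit: packet waiting in the receive FIFO
BOARD_REQUEST = 0x2   # hypothetical bit: request arrived on the board bus


def poll_once(read_status, handlers):
    """Read the status word once and run the handler for every bit that is set."""
    status = read_status()
    serviced = []
    for bit, handler in handlers.items():
        if status & bit:
            handler()
            serviced.append(bit)
    return serviced


events = []
handlers = {
    RX_PACKET: lambda: events.append("1394 packet"),
    BOARD_REQUEST: lambda: events.append("board request"),
}
# Simulated status register values seen on three successive polls.
for status in (0x0, RX_PACKET, RX_PACKET | BOARD_REQUEST):
    poll_once(lambda s=status: s, handlers)
print(events)
```

The main loop would call `poll_once` at regular intervals alongside its other periodic work, so no interrupt logic is exercised.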

[0264] In the embodiment described above, the internal memory 23 in the processor element 14 has multiple ports and stores integer data and floating-point data at the same time. Of course, a memory for integer data and a memory for floating-point data may instead be provided separately.

[0265] The embodiment handles 64-bit double-precision floating-point numbers, but the floating-point data format is of course not limited to this. For example, the LM bus and the local memory may be made 80 bits wide and a floating-point format of 80 bits or fewer may be used.

[0266] The embodiment uses an IEEE 1394 serial bus as the system bus 13 and PPRAM-Link as the board bus 16, but the invention is not limited to these and may be configured with an entirely different combination of buses.

[0267] Although the embodiment was described using molecular orbital calculation as an example, the invention can of course also be applied to other applications.

【０２６８】[0268]

[Effects of the Invention] As described above, the present invention provides a processor element circuit in which the configuration needed for a processor element is extended by the minimum necessary. A mode selection signal selects between the arithmetic processing mode and the bridge mode, so that the same LSI can serve both as a processor element and as part of a bus bridge circuit. This dramatically reduces the man-hours for circuit development and makes it possible to shorten the system development period and reduce cost.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of the processor element circuit according to the present invention when used as a processor element of a parallel computing system.

FIG. 2 is a block diagram showing the configuration of an embodiment of the parallel computing system according to the present invention.

FIG. 3 is a block diagram of an embodiment of the processor element circuit according to the present invention when used as part of a bus bridge circuit of a parallel computing system.

FIG. 4 is a diagram showing the configuration of the PPRAM-Link used for the board bus of the parallel computing system according to the embodiment.

FIG. 5 is a diagram showing the transactions of the PPRAM-Link used for the board bus of the parallel computing system according to the embodiment.

FIG. 6 is a block diagram showing the internal configuration of the board bus interface in the embodiment of the processor element circuit.

FIG. 7 is a block diagram showing an example of a dedicated system for parallel computation.

[Explanation of symbols]

1, 11: host computer
2, 12: board
3, 13: system bus
4, 14: processor element
5, 15: local memory
6, 16: board bus
7, 17: bus bridge circuit
20: shared chip
21: CPU
22: floating-point arithmetic unit
23: internal memory
24: local memory controller
25: peripheral circuit controller
26: multiplexer
27: load generator
28: board bus interface
30: processor element internal bus
31: local memory internal bus
32: instruction bus
33: floating-point data bus
34: input terminal for mode selection signal SEL
35: interrupt signal line
36: ROM
37: link layer controller
38: physical layer controller
39: external interrupt signal line
40: address decoder
41: bypass FIFO
42: flow controller
43: input queue
44: acknowledgment packet generator
45: output queue
46: interrupt generation block
47: memory access block
48: control and status register

Continuation of front page
(72) Inventor: Sou Yamada, c/o Fuji Xerox Co., Ltd., Green Tech Nakai, 430 Sakai, Nakai-machi, Ashigarakami-gun, Kanagawa
(72) Inventor: Nobuaki Miyagawa, c/o Fuji Xerox Co., Ltd., Green Tech Nakai, 430 Sakai, Nakai-machi, Ashigarakami-gun, Kanagawa
(72) Inventor: Kazuaki Murakami, 4-1-2-307 Kasuga-koen, Kasuga-shi, Fukuoka
(72) Inventor: Hajime Takashima, c/o Taisho Pharmaceutical Co., Ltd., 3-24-1 Takada, Toshima-ku, Tokyo
(72) Inventor: Kazuyasu Kitamura, c/o Taisho Pharmaceutical Co., Ltd., 3-24-1 Takada, Toshima-ku, Tokyo
F-terms (reference): 5B045 AA07 BB11 BB12 BB14 BB42 FF02 FF06 GG11 HH01 KK04; 5B056 BB00 BB31 FF15; 5B061 FF01 FF02 GG01 GG13

Claims

[Claims]

1. A processor element circuit for use in a parallel computing system that has a function of performing arithmetic operations in parallel, a plurality of buses, and a bridge circuit for bridging the plurality of buses, the processor element circuit comprising: an internal memory for storing data; an arithmetic unit that has a first register therein, performs an operation on data read from the internal memory or the first register, and stores the operation result in the internal memory or the first register; mode selection means for selecting whether the processor element circuit operates in an arithmetic processing mode or in a bridge mode; interface means for inputting and outputting data between the processor element circuit and an external memory when the arithmetic processing mode is selected by the mode selection means, and between the processor element circuit and a peripheral circuit when the bridge mode is selected by the mode selection means; processor means that, when the arithmetic processing mode is selected by the mode selection means, enters a standby state at startup and enters an operating state upon a subsequent start operation, and, when the bridge mode is selected by the mode selection means, enters a boot state at startup and enters the operating state upon a subsequent transition operation, the processor means having therein a second register for storing data, operating in accordance with an instruction sequence, and having at least a function of controlling the arithmetic unit and a function of inputting and outputting data between the external memory or the peripheral circuit, according to the selection made by the mode selection means, and the second register or the first register; instruction selection means that, in the boot state, supplies a specific instruction sequence to the processor means so that an instruction sequence read from a ROM by the specific instruction sequence is stored in the internal memory, performs the transition operation on the processor means after supply of the specific instruction sequence is complete, and, when the processor means is in the operating state, reads an instruction sequence from the internal memory and supplies it to the processor means; and external bus interface means that stores data input from an external bus in the internal memory or the external memory, reads data stored in the internal memory or the external memory and outputs it to the external bus, and performs the start operation on the processor means in response to an input from the external bus.

2. The processor element circuit according to claim 1, wherein the processor means includes means for processing interrupts, the interface means includes means for receiving an interrupt from the peripheral circuit when the bridge mode is selected and supplying the interrupt to the processor means, and the external bus interface means includes means for supplying an interrupt to the processor means when specific data is input from the external bus.

3. A parallel computing system having a function of performing arithmetic operations in parallel, a plurality of buses, and a bridge circuit for bridging the plurality of buses, the parallel computing system comprising: a host computer for controlling the entire system; a plurality of boards, each carrying a plurality of processor elements, each processor element being the processor element circuit according to claim 1 or 2 set to the arithmetic processing mode by the mode selection means; a first bus connecting the external bus interface means of the plurality of processor elements on each of the boards; a second bus connecting the host computer and the plurality of boards; and a bus bridge provided on each of the boards, the bus bridge being the processor element circuit according to claim 1 or 2 set to the bridge mode by the mode selection means, having its external bus interface means connected to the first bus and its interface means connected to the second bus through a conversion circuit, and performing protocol conversion between the first bus and the second bus.

4. A bus bridging method for performing protocol conversion, in the parallel computing system according to claim 3, between the first bus and the second bus, which have different protocols, the method comprising: a step of receiving, from the processor element, write information including a write request directed to a node located outside the first bus, a write address, and write data; and a step of writing the write data to the write address, on the basis of the write information, in accordance with the write procedure of the second bus.

5. A bus bridging method for performing protocol conversion, in the parallel computing system according to claim 3, between the first bus and the second bus, which have different protocols, the method comprising: a step of receiving, from the processor element and in accordance with the write procedure of the first bus, read information including a read request directed to a node located outside the first bus, a read address, and the address of the processor element in which the read data is to be stored; a step of reading data from the read address, on the basis of the read information, in accordance with the read procedure of the second bus; and a step of writing the read data to the address of the processor element included in the read information, in accordance with the write procedure of the second bus.