JPH0895898A - Direct memory access device and data transfer device

Info

Publication number: JPH0895898A
Application number: JP22940794A
Authority: JP
Inventors: Ichiro Okabayashi; 一郎岡林
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1994-09-26
Filing date: 1994-09-26
Publication date: 1996-04-12

Abstract

PURPOSE: To provide a direct memory access device that is versatile and expandable and that reduces the overhead of data transfer, and a data transfer device equipped with the direct memory access device. CONSTITUTION: A DMAC 250 calculates the addresses of the acquisition source and transfer destination of data to be transferred by sequentially executing an instruction string stored in an instruction memory 201. When a decoder 202 decodes a standby and branch instruction, it outputs the standby period and the branch conditions specified by that instruction to a standby counter 208 and a condition decision part 209, respectively. The standby counter 208 counts the standby period, and the condition decision part 209 evaluates the branch conditions. If the branch conditions are satisfied during the standby period, a program counter 205 branches to the branch destination address stored in a branch address part 210; otherwise, it advances to the address of the instruction following the standby and branch instruction.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a direct memory access device that is suitably provided in, for example, a parallel computer system in which a plurality of processor elements share the processing of a user program, and that transfers data directly without the control of the processor that executes the user program (hereinafter simply referred to as the processor), and to a data transfer device including the direct memory access device.

[0002]

[Prior Art] In recent years, the processing speed of individual processors has improved dramatically. In a system as a whole, however, the data transfer speed between processors and the communication speed with external devices are slow compared with the processing speed of the processor, so an improvement in the speed of the processor alone does not translate directly into an improvement in the speed of the whole system. To improve overall system performance, it is therefore also necessary to improve the performance and functions of communication with memory and external devices. This part, however, is harder to improve than the processor itself, and it is currently the subject of much research.

[0003] FIG. 9 is a block diagram showing the schematic configuration of a conventional parallel computer system. This parallel computer system has a plurality of processor elements (hereinafter abbreviated as PEs) 901(1) to 901(n) connected to a network 930. Since the PEs 901(1) to 901(n) are equivalent to one another, only the PE 901(1) is described.

[0004] The PE 901(1) includes a processor 910, a memory 911, a local bus 912, and a data transfer device 913. The processor 910, the memory 911, and the data transfer device 913 transfer data to one another via the local bus 912. For example, the PE 901(1) shares computations such as array operations with the other PEs 901(2), 901(3), ..., 901(n). When it needs the results computed by the other PEs 901(2), 901(3), ..., 901(n), the data are exchanged via the network 930 described later. The processor 910 performs arithmetic processing on the data in the memory 911 in accordance with a program in the memory 911.

[0005] FIG. 10 is a block diagram showing the configuration of the network 930, which carries data between the PEs 901(1) to 901(n). The network 930 conveys the data transferred among the PEs 901(1) to 901(n). Specifically, in the network 930, buffers F11 to Fnn, each consisting of a FIFO (First In First Out) memory, are arranged in a grid and interconnect the PEs 901(1) to 901(n).

[0006] The data transfer device 913 sends data to be transferred to the other PEs 901(2), ..., 901(n) through the network 930, and receives data from the other PEs 901(2), ..., 901(n) through the network 930 and stores them in the memory 911. More specifically, when sending, it transfers data from the memory 911 to any one of the buffers F11 to F1n; when receiving, it transfers data from any one of the buffers F11 to Fn1 to the memory 911.
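
As an editorial illustration only, the grid of FIFO buffers described above can be modeled in a few lines of Python. The sketch assumes that buffer Fij carries data from PE i to PE j, which is an inference from the statement that PE 901(1) sends into F11 to F1n (its row) and receives from F11 to Fn1 (its column). The class name `GridNetwork`, its methods, and the 0-based indices are hypothetical and do not appear in the source.

```python
from collections import deque


class GridNetwork:
    """Toy model of an n x n grid of FIFO buffers (0-based indices)."""

    def __init__(self, n):
        self.n = n
        # fifo[src][dst] stands in for buffer F(src+1)(dst+1); assumed to
        # carry data from PE src to PE dst.
        self.fifo = [[deque() for _ in range(n)] for _ in range(n)]

    def send(self, src, dst, word):
        # A sending PE writes into one buffer of its own row.
        self.fifo[src][dst].append(word)

    def receive(self, dst):
        # A receiving PE scans its own column for the first non-empty FIFO.
        for src in range(self.n):
            if self.fifo[src][dst]:
                return src, self.fifo[src][dst].popleft()
        return None  # no data waiting in this column
```

With three PEs, `send(0, 2, "a")` followed by `receive(2)` returns the sender index and the word, and a further `receive(2)` on an empty column returns `None`.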

[0007] In a parallel computer system such as the one described above, data transfers are frequent, so efficient data transfer is required. For this purpose, DMA (direct memory access), which transfers data between an external device and the memory 911 without going through the processor 910, is known. DMA requires a separate DMAC (direct memory access device). The DMAC generates a HOLD request signal for taking exclusive control of the local bus 912, bus control signals that control writing to and reading from (R/W) the memory 911, the addresses of the data source and the transfer destination, and so on. A DMAC is generally implemented in dedicated hardware, but in that case the DMAC has low expandability and it is difficult to adapt flexibly to changes in the specifications of the parallel computer system. As a result, mass production cannot be expected and cost reduction is difficult. Configuring a programmable DMAC using a general-purpose microprocessor is therefore considered.

[0008] In general, address calculation is performed by the following procedure. The instruction sequence below is written in a high-level language (Fortran) and is not the instruction sequence actually executed by a microprocessor, but it is given as an example because the algorithm is the same.

for i = 0, ii
  for j = 0, jj
    address = BA + OFS * i + j

Here, BA is the base address, OFS is the offset, and ii and jj indicate the count ranges of the counters. The offset is a predetermined value used to jump to a different address region.

[0009] "for i = 0, ii" means that the instructions that follow are repeated while the value of i is counted up from 0 to ii. Likewise, "for j = 0, jj" means that the instructions that follow are repeated while the value of j is counted up from 0 to jj. "address = BA + OFS * i + j" means that the address is obtained by adding to the base address the product of the offset and i, and the value of j.

[0010] Thus, one address is calculated each time the instruction "address = BA + OFS * i + j" is executed. For example, when BA = 1000, ii = 4, jj = 3, and OFS = 100, the addresses 1000, 1001, 1002, 1003, 1100, 1101, 1102, 1103, 1200, 1201, ... are calculated in sequence. In this way, address generation in a DMAC involves repetitive processing at high frequency.
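
The nested-loop address calculation above can be checked with a short Python sketch. This is an editorial illustration, not part of the disclosure; the function name `generate_addresses` is hypothetical. Both loop bounds are treated as inclusive, which is what the listed sequence (four addresses per value of i when jj = 3) implies.

```python
def generate_addresses(ba, ofs, ii, jj):
    """Yield BA + OFS * i + j for i = 0..ii and j = 0..jj (bounds inclusive)."""
    for i in range(ii + 1):
        for j in range(jj + 1):
            yield ba + ofs * i + j


# Parameters taken from the example in the text.
addresses = list(generate_addresses(ba=1000, ofs=100, ii=4, jj=3))
print(addresses[:10])
```

With these parameters the first ten values match the sequence given in the text, and the full run produces (ii + 1) * (jj + 1) = 20 addresses.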

[0011] Furthermore, except when interrupt processing is triggered by data reception signals from the buffers F11 to Fnn on the network 930, the DMAC sequentially generates the addresses of the buffers F11 to Fnn and checks whether the buffer designated by each address holds data. If the checked buffer holds data to be transferred, the DMAC generates the destination address and transfers the data. If there is no data to be transferred, the DMAC must generate the address of another buffer and check that buffer for data. This corresponds to branch processing in instruction sequence execution.

[0012] FIG. 11 shows instruction sequences of a microprocessor. FIG. 11(a) shows an instruction sequence for branch processing, and FIG. 11(b) shows an instruction sequence for repetitive processing. In each, the left side is the static instruction sequence as written in the program, and the right side is the dynamic instruction sequence showing the order of execution. In the branch case shown in FIG. 11(a), the branch instruction is "BEQ L1". "BEQ L1" means that execution jumps to L1 if the condition holds, and proceeds to the next instruction without jumping if it does not. The condition is determined by the result of the instruction preceding the BEQ instruction, that is, the SUB instruction. Here, if the result of the SUB instruction is not 0, the condition does not hold and the following AND instruction is executed next. If the result of the SUB instruction is 0, the condition holds and the OR instruction at the branch destination is executed next.

[0013] In the repetition case shown in FIG. 11(b), the DBcc instruction controls the loop. "DBcc D1 L1" executes D1 - 1 and branches to L1 if the result is 0 or greater. D1 is the content of a data register. D1 - 1 means subtracting 1 from the content of D1 and storing the result in D1. To repeat the processing a desired number of times, "the desired number of repetitions minus 1" is stored in D1 in advance and the label L1 is attached to the head of the loop. Here, L1 is attached to the AND instruction and 2 is stored in D1. As shown on the right of FIG. 11(b), the three instructions AND, OR, and DBcc are thereby repeated three times.
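
The loop behavior described above can be traced with a small Python sketch. It is an editorial illustration that follows only the semantics stated in the text (decrement D1, branch to L1 if the result is 0 or greater); the three-instruction program and the function name `run_dbcc_loop` are assumptions made for the example, and the AND and OR instructions are treated as opaque operations.

```python
def run_dbcc_loop(initial_d1):
    """Return the dynamic instruction trace of the loop AND, OR, DBcc D1 L1."""
    program = ["AND", "OR", "DBcc"]  # label L1 is attached to program[0]
    d1 = initial_d1
    pc = 0
    trace = []
    while pc < len(program):
        op = program[pc]
        trace.append(op)
        if op == "DBcc":
            d1 -= 1                        # D1 - 1, result stored back in D1
            pc = 0 if d1 >= 0 else pc + 1  # branch to L1 while result >= 0
        else:
            pc += 1
    return trace


print(run_dbcc_loop(2))
```

Starting from D1 = 2, the trace contains the three instructions three times over, and one of every three executed instructions is the loop-control instruction itself.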

[0014]

[Problems to Be Solved by the Invention] However, when address generation for direct memory access is performed by a dedicated microprocessor that executes instruction sequences such as those described above, any address generation operation that cannot be described with the predefined instruction sequences cannot be executed. In other words, conditional branching and repetitive processing can be performed only through instruction sequences such as those shown in the conventional microprocessor example.

[0015] For example, to make such a processor wait for a fixed period and branch as soon as a condition is satisfied during that period, one must program an instruction sequence that repeats the SUB and BEQ instructions of FIG. 11(a) over a time interval corresponding to the waiting period. With a branch instruction, however, the condition is evaluated only at the single point in time when the SUB instruction is executed; even if the condition becomes true between one SUB instruction and the next, the branch cannot be taken until the next SUB instruction is executed. A data transfer device often handles data that arrive asynchronously. In that case the data transfer device must wait for the data it is to fetch to be stored at the source, but with the waiting operation described above an overhead arises between the moment the condition is satisfied and the execution of the next SUB instruction, which lowers data transfer efficiency. A data transfer device therefore needs a form of conditional branching different from the conventional one.
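
The overhead described above can be made concrete with a rough cycle model. The sketch below is an editorial illustration under stated assumptions: time is counted in abstract cycles, the software loop evaluates the condition only at cycles that are multiples of `poll_period` (one SUB/BEQ pair per period), and the function names are hypothetical. None of these figures come from the source.

```python
def software_poll_detect(arrival_cycle, poll_period):
    """Cycle at which a SUB/BEQ polling loop first observes the condition."""
    polls = -(-arrival_cycle // poll_period)  # ceiling division
    return polls * poll_period


def detection_delay(arrival_cycle, poll_period):
    """Cycles lost between the condition becoming true and the next SUB."""
    return software_poll_detect(arrival_cycle, poll_period) - arrival_cycle


# Data arriving at cycle 9 with one poll every 4 cycles is seen at cycle 12.
print(detection_delay(9, 4))
```

In this model the delay ranges from 0 to `poll_period - 1` cycles per transfer, whereas a mechanism that evaluates the condition on every cycle of the waiting period would have no such gap.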

[0016] In addition, particularly in a direct memory access device used in a parallel computer, repetitive processing occurs frequently in address calculation, so this repetitive processing must also be performed in a short time. For example, when repetition is implemented with the DBcc instruction shown in FIG. 11(b), the DBcc instruction is executed on every iteration, as shown on the right of FIG. 11(b). The DBcc instruction should therefore execute in as short a time as possible. However, the DBcc instruction contains two operations, the computation D1 - 1 and the test of whether D1 is 0 or greater, so it requires a longer execution time than other, simpler arithmetic instructions. Consequently, when repetition is programmed in this way, an overhead is incurred on every iteration.

[0017] In view of the above problems, an object of the present invention is to provide a programmable direct memory access device and a data transfer device that are highly versatile and expandable and that can reduce the overhead associated with data transfer.

[0018]

[Means for Solving the Problems] To achieve the above object, the direct memory access device of the present invention according to claim 1 includes: an instruction memory that stores a transfer program; a decoder that reads instructions one at a time from the instruction memory and decodes them and that, on decoding a predetermined branch instruction, outputs the waiting period, branch condition, and branch destination specified by that branch instruction; and an execution unit that executes the instructions decoded by the decoder. When the branch condition is satisfied during the waiting period, the execution unit executes instructions sequentially starting from the instruction at the branch destination in the instruction memory; when the branch condition is not satisfied during the waiting period, the execution unit, after the waiting period ends, executes instructions sequentially starting from the instruction following the branch instruction.

[0019] The direct memory access device of the present invention according to claim 2 includes: an instruction memory that stores a transfer program; a decoder that reads instructions one at a time from the instruction memory and decodes them and that, on decoding a predetermined repeat instruction, outputs the repetition count and the instruction count specified by that repeat instruction; an execution unit that executes the instructions decoded by the decoder; an instruction number counter that counts the number of executed instructions up to the instruction count specified by the repeat instruction; an instruction fetch unit that points sequentially to the instructions in the instruction memory and that, while the repeat instruction is being executed, indicates to the decoder the instruction to be read next on the basis of the count of the instruction number counter; and a repetition counter that counts the number of times the instruction number counter finishes counting, up to the repetition count specified by the repeat instruction. Each time the instruction number counter finishes counting, the instruction fetch unit points to the instruction immediately following the repeat instruction; when the repetition counter finishes counting, the instruction fetch unit points to the instruction that follows the predetermined number of instructions succeeding the repeat instruction.

[0020] Further, the direct memory access device of the present invention according to claim 3 includes: an instruction memory that stores a transfer program; a decoder that reads instructions one at a time from the instruction memory and decodes them, that on decoding a predetermined branch instruction outputs the waiting period, branch condition, and branch destination specified by that branch instruction, and that on decoding a predetermined repeat instruction outputs a repetition count and an instruction count; an execution unit that executes the instructions decoded by the decoder; an instruction number counter that counts the number of executed instructions up to the instruction count specified by the repeat instruction; an instruction fetch unit that points sequentially to the instructions in the instruction memory and that indicates to the decoder the instruction to be read next, on the basis of the branch condition while the branch instruction is being executed and on the basis of the count of the instruction number counter while the repeat instruction is being executed; and a repetition counter that counts the number of times the instruction number counter finishes counting, up to the repetition count specified by the repeat instruction. While the branch instruction is being executed, the instruction fetch unit points to instructions sequentially starting from the instruction at the branch destination in the instruction memory if the branch condition is satisfied during the waiting period; if the branch condition is not satisfied during the waiting period, then after the waiting period ends it points to instructions sequentially starting from the instruction following the branch instruction. While the repeat instruction is being executed, the instruction fetch unit points to the instruction immediately following the repeat instruction each time the instruction number counter finishes counting and, when the repetition counter finishes counting, points to the instruction that follows the predetermined number of instructions succeeding the repeat instruction.

[0021] Further, in the direct memory access device of the present invention according to claim 4, the execution unit of claim 1, claim 2, or claim 3 includes a register, a calculation unit that performs address calculation, and a timing counter that counts the number of addresses calculated by the calculation unit and outputs a timing signal at every predetermined count.

[0022] Furthermore, the data transfer device of the present invention according to claim 5 includes: a first port; a second port; a third port; a first buffer that temporarily stores data input via the third port; a second buffer that temporarily stores data input via the second port; a third direct memory access device that generates the address in a memory at which data to be transferred are stored and outputs control signals so that the data are read from that address via the third port and stored in the first buffer, and that generates the address in the memory to which the data stored in the second buffer are to be written and outputs control signals so that the data are written to that address via the third port; a second direct memory access device that generates the address of a device in which data to be transferred are stored and outputs control signals so that the data are read from the device via the second port and stored in the second buffer; and a first direct memory access device that generates the address of a buffer to which the data stored in the first buffer are to be written and outputs control signals so that the data are written to that buffer via the first port. The first, second, and third direct memory access devices are each the direct memory access device according to claim 1, claim 2, claim 3, or claim 4.
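
The data paths in the claim 5 arrangement can be followed more easily in a toy model. The Python sketch below is an editorial illustration only: it shows which buffer each direct memory access device fills or drains, and it omits address generation programs, bus arbitration, and control signals entirely. The class name, method names, and the use of a dict and deques for the memory, the source device, and the destination buffer are all hypothetical.

```python
from collections import deque


class DataTransferDevice:
    """Toy data-path model: three ports, two buffers, three DMA roles."""

    def __init__(self, memory, source_device, destination_buffer):
        self.memory = memory                          # reached via the third port
        self.source_device = source_device            # reached via the second port
        self.destination_buffer = destination_buffer  # reached via the first port
        self.buffer1 = deque()  # holds data that entered through the third port
        self.buffer2 = deque()  # holds data that entered through the second port

    def dmac3_read_memory(self, address):
        # Third DMA device: memory -> first buffer.
        self.buffer1.append(self.memory[address])

    def dmac1_write_out(self):
        # First DMA device: first buffer -> external buffer via the first port.
        self.destination_buffer.append(self.buffer1.popleft())

    def dmac2_read_device(self):
        # Second DMA device: external device -> second buffer.
        self.buffer2.append(self.source_device.popleft())

    def dmac3_write_memory(self, address):
        # Third DMA device: second buffer -> memory.
        self.memory[address] = self.buffer2.popleft()
```

In this model an outgoing word passes through `dmac3_read_memory` and then `dmac1_write_out`, while an incoming word passes through `dmac2_read_device` and then `dmac3_write_memory`, so the two buffers decouple the memory side from the external side.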

[0023]

[Operation] According to the present invention, in the direct memory access device according to claim 1, the decoder reads the transfer program stored in the instruction memory one instruction at a time and decodes it, and the execution unit executes the instructions decoded by the decoder. On decoding a predetermined branch instruction, the decoder outputs the waiting period, branch condition, and branch destination specified by that branch instruction. When the branch condition is satisfied during the waiting period, the execution unit executes instructions sequentially starting from the instruction at the branch destination in the instruction memory; when it is not satisfied during the waiting period, the execution unit, after the waiting period ends, executes instructions sequentially starting from the instruction following the branch instruction.

[0024] Thus, for example, an instruction sequence that calculates the address of one device is stored in the instruction memory up to the point immediately before the branch instruction, so that the execution unit calculates the address of the source of the data to be transferred. The branch condition is set to "the device designated by the address just calculated holds data to be transferred." In addition, an instruction sequence that calculates the address of another device is stored immediately after the branch instruction, and an instruction sequence that calculates the address of the transfer destination of the data is stored at the branch destination. This makes the following operation possible: "when one device is checked for data and holds none, wait a predetermined time for data to arrive; if data arrive during the waiting time, read them and transfer them to the destination; otherwise, go on to check another device for data."
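
The behavior of the branch instruction with a waiting period can be summarized in a short Python sketch. It is an editorial illustration under stated assumptions: the waiting period is counted in abstract cycles, the condition is sampled once per cycle, and the function name `wait_and_branch` and its parameters are hypothetical rather than taken from the source.

```python
def wait_and_branch(pc, wait_cycles, branch_target, condition_at):
    """Return (next_pc, cycles_waited) for one wait-and-branch instruction.

    condition_at(cycle) reports whether the branch condition holds at the
    given cycle of the waiting period, for example "the polled device holds
    data to be transferred".
    """
    for cycle in range(wait_cycles):
        if condition_at(cycle):
            return branch_target, cycle  # branch as soon as the condition holds
    return pc + 1, wait_cycles           # period expired: fall through


# Data arrive at cycle 3 of a 10-cycle waiting period: branch after 3 cycles.
print(wait_and_branch(pc=5, wait_cycles=10, branch_target=20,
                      condition_at=lambda c: c >= 3))
```

If the condition never holds, the same call returns `pc + 1` after the full waiting period, which corresponds to moving on to check another device.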

[0025] According to the present invention, in the direct memory access device according to claim 2, on decoding a predetermined repeat instruction the decoder outputs the repetition count and the instruction count specified by that repeat instruction. The instruction number counter counts the number of executed instructions up to the instruction count specified by the repeat instruction. The repetition counter counts the number of times the instruction number counter finishes counting, up to the repetition count specified by the repeat instruction. Each time the instruction number counter finishes counting, the instruction fetch unit points to the instruction immediately following the repeat instruction; when the repetition counter finishes counting, it points to the instruction that follows the predetermined number of instructions succeeding the repeat instruction. The execution unit executes the instructions decoded by the decoder as directed by the instruction fetch unit.

[0026] As described above, the instruction number counter counts the executed instructions up to the number of instructions to be repeated, and the repetition counter counts the number of times the instruction number counter finishes counting, up to the repetition count. The direct memory access device therefore needs to execute the predetermined repeat instruction only once in order to execute a predetermined number of instructions a predetermined number of times, without the execution unit having to count either the instructions or the repetitions. The overhead of repetitive processing in the direct memory access device is thus reduced, and address calculation and data transfer can be performed with high effective efficiency.
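
The interplay of the two counters can be sketched in Python as follows. This is an editorial illustration, not the disclosed circuit: the tuple encoding `("REPEAT", times, n_instr)`, the function name `run_with_repeat`, and the use of down-counters are assumptions. The sketch also assumes a repetition count and an instruction count of at least 1 and no nested repeat instructions.

```python
def run_with_repeat(program):
    """Return the dynamic trace of a program containing REPEAT instructions."""
    trace = []
    pc = 0
    instr_left = 0  # stands in for the instruction number counter
    reps_left = 0   # stands in for the repetition counter
    body_start = 0
    body_len = 0
    while pc < len(program):
        op = program[pc]
        if isinstance(op, tuple):          # ("REPEAT", times, n_instr)
            _, reps_left, body_len = op
            body_start = pc + 1
            instr_left = body_len
            pc = body_start                # REPEAT itself is executed only once
            continue
        trace.append(op)
        pc += 1
        if reps_left > 0:
            instr_left -= 1
            if instr_left == 0:            # instruction number counter finished
                reps_left -= 1
                if reps_left > 0:
                    pc = body_start        # back to the instruction after REPEAT
                    instr_left = body_len
                # otherwise pc already points past the repeated body
    return trace


print(run_with_repeat([("REPEAT", 3, 2), "AND", "OR", "ADD"]))
```

In contrast to a loop closed by a decrement-and-branch instruction, the trace contains only the body instructions and the instruction after the loop; no loop-control instruction appears inside the repeated body.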

[0027] Further, according to the present invention, in the direct memory access device according to claim 3, the decoder reads the transfer program stored in the instruction memory one instruction at a time and decodes it. On decoding a predetermined branch instruction, it outputs the waiting period, branch condition, and branch destination specified by that branch instruction; on decoding a predetermined repeat instruction, it outputs the repetition count and the instruction count. The execution unit executes the instructions decoded by the decoder.

[0028] The instruction fetch unit points to the instruction that the decoder is to read next. While the branch instruction is being executed, it does so on the basis of the branch condition; while the repeat instruction is being executed, it does so on the basis of the count of the instruction number counter. Otherwise, it points to the instructions in the instruction memory in sequence.

[0029] The instruction number counter counts the number of executed instructions up to the instruction count specified by the repeat instruction. The repetition counter counts the number of times the instruction number counter finishes counting, up to the repetition count specified by the repeat instruction. While the branch instruction is being executed, the instruction fetch unit points to instructions sequentially starting from the instruction at the branch destination in the instruction memory if the branch condition is satisfied during the waiting period; if the branch condition is not satisfied during the waiting period, then after the waiting period ends it points to instructions sequentially starting from the instruction following the branch instruction. While the repeat instruction is being executed, the instruction fetch unit points to the instruction immediately following the repeat instruction each time the instruction number counter finishes counting and, when the repetition counter finishes counting, points to the instruction that follows the predetermined number of instructions succeeding the repeat instruction.

【００３０】As a result, the direct memory access device can perform the branch operation of the direct memory access device according to claim 1 and the repeat operation of the direct memory access device according to claim 2 individually, and can also perform the two operations in combination. Further, according to the present invention, in the direct memory access device according to claim 4, within the execution unit according to claim 1, claim 2 or claim 3, the calculation unit performs address calculation using a register. The timing counter counts the number of addresses calculated by the calculation unit and outputs a timing signal every predetermined count. Conventionally, the processor that processes the user program counted the number of data transfers of the direct memory access devices in order to synchronize the direct memory access devices in the parallel computer system. According to the present invention, the timing counter counts the number of addresses calculated by the calculation unit instead, so the load on the processor can be reduced and the effective efficiency of the processor can be improved.

【００３１】Furthermore, according to the present invention, in the data transfer device according to claim 5, the first buffer temporarily stores data input via the third port. The second buffer temporarily stores data input via the second port. The third direct memory access device generates the address in the memory at which the data to be transferred is stored. At the same time, it outputs a control signal so that the data is read from that address via the third port and stored in the first buffer. The third direct memory access device also generates the address in the memory to which the data stored in the second buffer should be written. At the same time, it outputs a control signal so that the data is written to that address in the memory via the third port. The second direct memory access device generates the address of the device in which the data to be transferred is stored. At the same time, it outputs a control signal so that the data is read from the device via the second port and stored in the second buffer. The first direct memory access device generates the address of the buffer to which the data stored in the first buffer should be written. At the same time, it outputs a control signal so that the data is written to that buffer via the first port.

【００３２】Since the first, second and third direct memory access devices described above are each the direct memory access device according to claim 1, claim 2, claim 3 or claim 4, the data transfer device can perform its data transfer operation under the control of the direct memory access device according to claim 1, claim 2, claim 3 or claim 4.

【００３３】[0033]

DESCRIPTION OF THE PREFERRED EMBODIMENTS Next, a direct memory access device (hereinafter referred to as a DMAC) 250 according to a first embodiment of the present invention will be described with reference to FIGS. 1 and 2. FIG. 1 is a block diagram showing the configuration of a data transfer device 150 including the DMAC 250 of the first embodiment of the present invention. FIG. 2 is a block diagram showing the configuration of the DMAC 250 of the first embodiment of the present invention. The DMACs 110, 120, 130 and 140 shown in FIG. 1 are each a DMAC 250.

【００３４】The data transfer device 150 is provided in the parallel computer system in place of the data transfer device 913 shown in FIG. 9. The data transfer device 150 comprises the DMACs 110, 120, 130 and 140, ports 101, 102 and 103, an external address bus 104, an external data bus 105, an internal bus 106, data buffers 151 and 152, a selector 161, and the like.

【００３５】The data buffers 151 and 152 temporarily store the data to be transferred between the PEs 901, specifically array data consisting of physical data such as coordinates, pressure, stress, velocity and temperature, as well as control information, image information, audio information and the like. The external data bus 105 and the external address bus 104 correspond to the local bus 912 in FIG. 9, and the processor 910 and the memory 911 in the PE 901 that contains the data transfer device 150 are connected to the external data bus 105 and the external address bus 104. The data transfer device 150 starts a data transfer in response to a data transfer request from each processor 910 or from an external device.

【００３６】The DMAC 110 performs reception control for the port 101, the DMAC 120 performs transmission control for the port 102, the DMAC 130 performs transmission control for the port 103, and the DMAC 140 performs reception control for the port 103. The DMACs 110 and 120 generate addresses of the buffers F11 to Fnn, which are external devices. The address here is the device address of a buffer F11 to Fnn, that is, it indicates which of the buffers F11 to Fnn is selected. The DMACs 130 and 140 generate addresses of the memory 911.

【００３７】In addition, each of the DMACs 110, 120, 130 and 140 generates bus control signals for reading and writing and a timing notification signal that allows the DMACs to operate synchronously. First, the initial setting will be described. Data is stored in the instruction memories 111, 121, 131 and 141 and in the data memories 114, 124, 134 and 144 via the external data bus 105 and the internal bus 106. This is achieved by the processor 910 writing directly to each instruction memory and each data memory.

【００３８】Next, the transfer from the port 103 to the data buffer 152 will be described.

【００３９】The DMAC 140 generates the address of the storage area in the memory 911 where the data to be transferred is stored, sends it to the address line 174, and from there sends it to the external address bus 104 via the selector 161. At the same time, it sends a bus control signal instructing a read. The DMAC 140 reads the data to be transferred from the memory 911 at that address and stores it in the data buffer 152 via the external data bus 105.

【００４０】Next, the DMAC 120 generates the device address of the destination buffer among the buffers F11 to F1n and sends it to the address line 172, together with a bus control signal instructing a write. The device address is sent to the buffers F11 to F1n, and one of the buffers F11 to F1n is selected. At the same time, the DMAC 120 sends the data to be transferred from the data buffer 152 to the data line 182. The data sent out is thereby stored in the selected buffer among F11 to F1n.

【００４１】Next, the transfer from the port 101 to the port 103 will be described.

【００４２】The DMAC 110 generates the device address of the buffer, among the buffers F11 to Fn1, in which the data to be transferred is stored, and sends it to the address line 171 together with a bus control signal instructing a read. The device address is sent to the buffers F11 to Fn1, and one of the buffers F11 to Fn1 is selected. At the same time, the data to be transferred is read from the selected buffer onto the data line 181 and stored in the data buffer 151.

【００４３】Next, the DMAC 130 generates the address of the storage area in the memory 911 to which the data should be written, sends it to the address line 173, and from there sends it to the external address bus 104 via the selector 161, together with a bus control signal instructing a write. At the same time, it sends the data to be transferred onto the external data bus 105 and writes it to the memory 911.

【００４４】As shown in FIG. 2, the DMAC 250 comprises an instruction memory 201, a decoder 202, an execution unit 203, a data memory 204, a program counter 205, a branch control unit 230, and the like. The branch control unit 230 in turn comprises a gate 207, a standby counter 208, a condition determination unit 209, a branch address unit 210, and the like.

【００４５】The instruction memory 201 stores the sequence of arithmetic instructions used for address generation. The decoder 202 reads the instruction in the instruction memory 201 at the address indicated by the program counter 205, decodes it, and gives instructions to the execution unit 203. Further, when the instruction read from the instruction memory 201 is a wait & branch instruction, the decoder 202 sends a stop instruction signal to the stop instruction line 216 to pause the program counter 205.

【００４６】The execution unit 203 is controlled by the decoder 202. The execution unit 203 performs operations while exchanging data with the data memory 204, and generates addresses, bus control signals and timing notification signals. Each time the execution unit 203 sends out one set of an address and a bus control signal, one item of data is transferred between the data transfer device 150 and the memory 911 or between the data transfer device 150 and the buffers F11 to Fnn. The address calculated by the execution unit 203 is sent to the address line 232, the bus control signal to the bus control line 233, and the timing notification signal to the timing notification line 234.

【００４７】The data memory 204 stores base addresses, counter values, offsets and the like. The program counter 205 counts the address of the instruction to be read next. The gate 207 sends the logical AND of the counting signal and the condition satisfaction signal to the branch instruction line 223 as the branch instruction signal.

【００４８】The standby counter 208 counts down the standby time at predetermined intervals, for example once per instruction cycle. Until the standby time times out, the standby counter 208 keeps the counting signal active and sends it to the counting line 215. The standby counter 208 is reset by the condition satisfaction signal from the condition satisfaction line 222.

【００４９】The condition determination unit 209 determines whether the branch condition signals input via the plurality of branch condition lines 231 satisfy the condition, and when they do, it activates the condition satisfaction signal and sends it to the condition satisfaction line 222. The branch condition lines 231 are connected to, for example, the buffers F11 to Fnn, the data buffers 151 and 152, the external address bus 104 and the external data bus 105; they indicate the presence or absence of data and the overflow state in the buffers F11 to Fnn or the data buffers 151 and 152, as well as the usage state of the external address bus 104 and the external data bus 105. For example, the condition determination unit 209 treats the condition as not satisfied while there is no data in the source buffer among F11 to Fnn, and as satisfied at the moment the data arrives.

【００５０】The branch address unit 210 stores the branch destination address in the instruction memory 201 to which a branch instruction branches. In the configuration described above, the DMAC 250 performs a three-stage pipeline operation consisting of instruction fetch and decode in the decoder 202 and instruction execution in the execution unit 203. A pipeline operation is a mode of operation in which the stages process in parallel, and once each stage has passed its result to the next stage, all stages move on to their next piece of work simultaneously.
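The overlap between the three stages can be illustrated with a minimal Python sketch. It is only an illustration under simplifying assumptions: the function name `pipeline_schedule` and the instruction names are hypothetical, and stalls such as the pause caused by a wait & branch instruction are not modeled.

```python
def pipeline_schedule(instructions):
    """Return, for each cycle, the instruction in (fetch, decode, execute).

    Assumption: an ideal pipeline with no stalls, so instruction k is
    fetched in cycle k, decoded in cycle k + 1 and executed in cycle k + 2.
    """
    stages = 3  # fetch, decode, execute
    total_cycles = len(instructions) + stages - 1
    schedule = []
    for cycle in range(total_cycles):
        row = []
        for stage in range(stages):
            index = cycle - stage
            # A stage is empty (None) while the pipeline fills or drains.
            if index in range(len(instructions)):
                row.append(instructions[index])
            else:
                row.append(None)
        schedule.append(tuple(row))
    return schedule


if __name__ == "__main__":
    for cycle, row in enumerate(pipeline_schedule(["a", "b", "c"])):
        print(f"C{cycle}: fetch={row[0]} decode={row[1]} execute={row[2]}")
```

In cycle 2 of this sketch all three stages are busy at once, which is the parallelism the paragraph above describes.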

【００５１】FIG. 3 is a timing chart showing the operation of the DMAC 250 of the first embodiment. In FIG. 3, the cycles C0, C1, C2, ... each denote a cycle in which one instruction is executed. Here, following a wait & branch instruction, the DMAC 250 waits for a predetermined time; if the condition is satisfied within the standby time, it continues execution from the instruction at the branch destination, and otherwise it continues from the instruction following the wait & branch instruction. The wait & branch instruction is written as "wait&bra N L" and means that execution branches to L if the condition is satisfied within N cycles.

【００５２】The decoder 202 decodes the wait & branch instruction in cycle C4 and sends the standby count N, the condition and the branch destination address L to the decoder output line 220. The standby counter 208, the condition determination unit 209 and the branch address unit 210 respectively capture these values in cycle C5. The wait & branch instruction "wait&bra 5 L1" shown in FIG. 3 means that if the condition is satisfied within 5 cycles, execution continues from the instruction "i" at the branch destination address L1, and otherwise the next instruction "e" is executed; the standby counter 208 is set to "5". The condition is satisfied when the branch condition signal is "1".

【００５３】When the decoder 202 decodes the wait & branch instruction, it issues a stop instruction signal to the program counter 205 to instruct it to pause. The program counter 205 resumes counting in response to the count end signal when the condition is not satisfied, and in response to the branch instruction signal when the condition is satisfied. The case where the condition is not satisfied is described first.

【００５４】The standby counter 208 counts down "5", "4", "3", "2", "1" from cycle C5 to cycle C9. During this time the counting signal is active at "1", but because the condition satisfaction signal remains "0", the branch instruction signal, which is their logical AND, is "0". Therefore, no branch is taken. In cycle C9 the count end signal becomes "1", and the program counter 205 causes the instruction "e" at the address following the wait & branch instruction to be executed. The resulting executed instruction sequence is "a", "b", "c", "wait&bra", "e", "f".

【００５５】The case where the condition is satisfied is described next. The standby counter 208 starts counting down from cycle C5, and in cycle C6 the condition satisfaction signal is activated and becomes "1". As a result, the branch instruction signal "1" is sent in cycle C6, and the standby counter 208 is reset in cycle C8. In response to the branch instruction signal, the program counter 205 captures the branch address "L1" and sends it to the instruction memory 201. Here, from cycle C7, execution moves to the instructions "i", "j", "k", ... starting at the branch destination "L1". The resulting executed instruction sequence is "a", "b", "c", "wait&bra", "i", "j", "k", ....
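The two cases above can be reproduced with a small behavioral model. The following Python sketch is not the circuit of FIG. 2; it is a simplified illustration that assumes the condition signal is sampled once per instruction cycle and that ignores pipeline latency. The function names `wait_and_branch` and `executed_sequence` are hypothetical.

```python
def wait_and_branch(wait_cycles, condition_signal):
    """Model a single "wait&bra N L" instruction.

    wait_cycles: the count N loaded into the standby counter.
    condition_signal: iterable of 0/1 samples, one per cycle of the wait.
    Returns ("branch", k) if the condition was 1 in the k-th wait cycle
    (0-based), or ("fall_through", N) if the standby counter timed out.
    """
    samples = iter(condition_signal)
    counter = wait_cycles
    elapsed = 0
    while counter > 0:
        counting = 1                         # active until time-out
        condition_met = next(samples, 0)     # missing samples count as 0
        branch = counting & condition_met    # logical AND, as in gate 207
        if branch:
            return ("branch", elapsed)
        counter -= 1
        elapsed += 1
    return ("fall_through", elapsed)


def executed_sequence(condition_signal):
    """Instruction trace for the FIG. 3 example "wait&bra 5 L1"."""
    trace = ["a", "b", "c", "wait&bra"]
    outcome, _ = wait_and_branch(5, condition_signal)
    if outcome == "branch":
        return trace + ["i", "j", "k"]       # instructions at L1
    return trace + ["e", "f"]                # instructions after wait&bra


if __name__ == "__main__":
    print(executed_sequence([0, 0, 0, 0, 0]))  # condition never satisfied
    print(executed_sequence([0, 1]))           # satisfied in the 2nd cycle
```

A condition that becomes "1" only after the counter has expired has no effect in this model, because the counting signal is no longer active at that point.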

【００５６】Whether or not the condition is satisfied, the subsequent instruction has not yet been decoded immediately after "wait&bra" is executed, so for about two cycles the decoder 202 issues "NOP" (No Operation: do nothing) to the execution unit 203, and during this time the decoder 202 fetches and decodes the next instruction. As described above, this embodiment makes it possible to "wait for a certain time and branch if a condition is satisfied in the meantime". Specifically, it enables the following operation: "when one buffer is checked for data and contains none, wait for a predetermined time for data to arrive; if data arrives during the standby time, read it and transfer it to the destination; otherwise go on to check another buffer for data".
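The buffer-checking behavior quoted above can be sketched as a polling loop. This Python fragment is a hedged illustration only: the function name `first_buffer_with_data`, the per-cycle sample lists and the buffer contents are assumptions made for the example, not part of the described hardware.

```python
def first_buffer_with_data(buffers, wait_cycles, has_data):
    """Visit buffers in order, waiting up to wait_cycles at each one.

    buffers: buffer names in the order they are checked.
    has_data: dict mapping a buffer name to a list of 0/1 samples,
        one per cycle spent waiting at that buffer (assumed input).
    Returns (name, cycle) for the first buffer whose data arrives within
    the wait, or None if every wait times out.
    """
    for name in buffers:
        samples = has_data.get(name, [])
        # Only samples that fall inside the standby period are considered.
        for cycle, present in enumerate(samples[:wait_cycles]):
            if present:
                return (name, cycle)
    return None


if __name__ == "__main__":
    arrivals = {"F11": [0, 0, 0, 1], "F12": [0, 1]}
    # F11 receives data too late for a 3-cycle wait, so F12 is served.
    print(first_buffer_with_data(["F11", "F12"], 3, arrivals))
```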

【００５７】Next, the execution unit 203 will be described. FIG. 4 is a block diagram showing the configuration of the execution unit 203 of the first embodiment. The execution unit 203 includes a general-purpose register 425, a calculation unit 426 and a timing counter 427. The general-purpose register 425 temporarily stores data used in calculations by the calculation unit 426. The calculation unit 426 is an arithmetic unit that, as instructed by the decoder, exchanges data with the general-purpose register 425 and the data memory 204 to perform operations and generates addresses and bus control signals. Each time it generates an address, the calculation unit 426 sends an address generation notification signal to the timing counter 427. The timing counter 427 counts the address generation notification signals and sends a timing notification signal to the timing notification line 234 when the count reaches a predetermined value. Each of the DMACs 110, 120, 130 and 140 stops operating at the point when its timing counter 427 sends the timing notification signal.
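The role of the timing counter can be illustrated with a short Python sketch. The class name `TimingCounter` and its method are hypothetical, and the choice to restart the count after each notification is an assumption made so that a signal is produced every predetermined count.

```python
class TimingCounter:
    """Counts address generation notifications (illustrative model)."""

    def __init__(self, threshold):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold  # the predetermined count
        self.count = 0

    def address_generated(self):
        """Record one generated address.

        Returns True when the predetermined count is reached, which
        stands for asserting the timing notification signal.
        """
        self.count += 1
        if self.count == self.threshold:
            self.count = 0          # assumption: restart for the next batch
            return True
        return False


if __name__ == "__main__":
    counter = TimingCounter(3)
    print([counter.address_generated() for _ in range(7)])
```

The point of the design is that this counting happens inside the DMAC, so the processor does not need to track the number of transfers itself.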

【００５８】FIG. 5 is a diagram for explaining the synchronous operation of the DMACs 250 in the parallel computer system. FIG. 5(a) shows the configuration of the parallel computer system. FIG. 5(b) is a timing chart showing the operation of the DMACs 250. To simplify the explanation, each PE 901 is assumed here to have a single DMAC 250.

【００５９】In the DMAC 250 in each PE 901, the timing counter 427 sends a timing notification signal to the timing notification line 234 when the corresponding calculation unit 426 has generated, for example, 10 addresses. The timing notification lines 234 from the PEs 901 are connected to the inputs of a gate 501. The gate 501 takes the logical AND of the timing notification signals and sends a stop notification signal to the stop notification line 520. The stop notification signal therefore becomes "1" when all the DMACs 250 in the parallel computer system have stopped. The stop notification line 520 is connected to the processor 910 of each PE 901. When a processor 910 detects the stop notification signal "1", it sets the parameters that instruct the DMAC 250 in its PE 901 to perform the next transfer operation, and writes the address generation instructions and address data required for the next transfer operation into the instruction memory 201, the data memory 204 and the like. This allows the processor 910 to switch the data transfer operation of the data transfer device 150 according to the user program being processed by the parallel computer system as a whole. Each PE 901 then sends a stop release signal to the stop release line 530 to restart the transfer operation of each DMAC 250.
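The synchronization through the gate 501 behaves like a barrier: no processor reprograms its DMAC until every DMAC has finished its batch. The Python sketch below illustrates this under stated assumptions; the function names and the notion of a fixed number of addresses generated per cycle for each DMAC are hypothetical simplifications.

```python
def stop_notification(timing_signals):
    """Logical AND of all timing notification signals, as in gate 501."""
    return int(all(timing_signals))


def cycles_until_all_stopped(rates, threshold=10):
    """Cycles until every DMAC has generated `threshold` addresses.

    rates: addresses generated per cycle by each DMAC (assumed constant).
    A DMAC that reaches the threshold stops and holds its signal at 1.
    """
    if not rates or min(rates) < 1:
        raise ValueError("each DMAC must generate at least one address per cycle")
    generated = [0] * len(rates)
    cycle = 0
    while True:
        cycle += 1
        # A stopped DMAC stays at the threshold; the others keep going.
        generated = [min(threshold, g + r) for g, r in zip(generated, rates)]
        timing_signals = [int(g == threshold) for g in generated]
        if stop_notification(timing_signals):
            return cycle


if __name__ == "__main__":
    # The slowest DMAC determines when the stop notification becomes 1.
    print(cycles_until_all_stopped([1, 2, 5]))
```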

【００６０】As described above, because each processor 910 programs the DMAC 250, the data transfer device 150 can search the buffers F11 to Fnn evenly when the communication pattern is random and uniform, as in general numerical simulations. When data transfers are concentrated between highly correlated PEs 901, as in the particle simulation described below, it can instead search predetermined buffers among F11 to Fnn preferentially to check whether there is data to be transferred.

【００６１】Specifically, in a particle simulation, the influence of a particle is stronger the closer one is to that particle, so the communication pattern between the PEs 901 becomes uneven. In this case, the efficiency of data transfer can be improved by giving priority to reception from the PEs 901 in charge of regions that are considered to influence each other strongly. More specifically, in a particle simulation, the region to be processed is subdivided into blocks, and each PE 901 is in charge of the processing for one block. Assume that PE 901(n) is in charge of region n, where n is a natural number, and that the regions adjacent to region 20 are regions 19, 21, 10 and 30. In such a case, the processor 910(20) programs address generation instructions into the DMAC 250(20) so that data from PE 901(19), PE 901(21), PE 901(10) and PE 901(30) is received preferentially. This allows data to be transferred more efficiently than when the buffers F11 to Fnn are simply searched in sequence without regard to priority.

【００６２】Even when reception from predetermined buffers among F11 to Fnn is prioritized, the priority operation can be programmed in various ways. For example, all the buffers F11 to Fnn may be searched in sequence, with a longer standby time set for the high-priority buffers. Alternatively, taking the case where the DMAC 250(1) searches the buffer F13 preferentially, the high-priority buffer may be searched in between the sequential searches of the buffers F11 to F1n, as in F11→F13→F12→F13→F13→F13→F14→F13→F15→.... In this way, the optimum data reception priority operation can be selected according to the user program being executed by the parallel computer system.
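The interleaved search order in the example above can be generated mechanically: the buffers are visited in sequence and the priority buffer is inserted between every two consecutive visits. The Python sketch below is an illustration of that reading; the function name `interleaved_order` is hypothetical.

```python
def interleaved_order(buffers, priority):
    """Sequential search order with a priority buffer in every gap.

    buffers: buffer names in their normal sequential order.
    priority: the buffer that should be searched between each pair of
        consecutive sequential searches.
    """
    order = []
    for position, name in enumerate(buffers):
        if position > 0:
            order.append(priority)  # extra visit between sequential steps
        order.append(name)          # the regular sequential visit
    return order


if __name__ == "__main__":
    sequence = interleaved_order(["F11", "F12", "F13", "F14", "F15"], "F13")
    print("→".join(sequence))
```

With F13 as the priority buffer, three consecutive visits to F13 appear in the middle of the sequence because its regular turn falls between two of its extra visits.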

【００６３】Further, as described above, the execution unit 203 of this embodiment counts the address generation notification signals in the timing counter 427 and sends out the timing notification signal. The processor 910 therefore no longer needs to count the number of data transfers of the DMAC 250 in order to synchronize the DMACs 250, as it did conventionally, and the load on the processor 910 can be reduced.

【００６４】A DMAC 650 according to a second embodiment of the present invention will now be described with reference to the drawings. FIG. 6 is a block diagram showing the configuration of the DMAC 650 of the second embodiment of the present invention. The data transfer device of this embodiment is the data transfer device 150 shown in FIG. 1 in which the DMACs 110, 120, 130 and 140 are each a DMAC 650. The DMAC 650 includes an adder 612 and a repeat control unit 631 in place of the branch control unit 230 of the DMAC 250. In the following, components that are the same as in the first embodiment are given the same reference numerals, and only the adder 612 and the repeat control unit 631 are described.

【００６５】As shown in FIG. 6, the repeat control unit 631 includes a number counter 613, an instruction number counter 614, and the like. The adder 612 is an arithmetic unit. As instructed by the decoder 202, after the program counter 205 has stopped, it multiplies the instruction offset from the instruction number counter 614 (described below) by a predetermined value to calculate the relative address of the instruction to be executed next, taking the address of the repeat instruction as the reference, adds this to the output of the program counter 205, and sends the result to the instruction memory 201.

【００６６】The number counter 613 counts the number of repetitions in the repeat processing. The number counter 613 counts down by one for each count end signal from the instruction number counter 614. When it finishes counting, the number counter 613 sends the count end signal "1" to the count end line 622 and restarts the operation of the program counter 205.

【００６７】The instruction number counter 614 counts the number of instructions executed in the repeat processing and sends a count end signal to the count end line 621 each time it finishes counting. In this example, the instruction number counter 614 counts down by one from its initial value each time one of the repeated instructions is executed. The instruction number counter 614 also calculates the instruction offset from its count value and sends it to the instruction offset line 625. The instruction offset is the number of instructions from the repeat instruction to the instruction to be executed next. When it finishes counting, the instruction number counter 614 checks the count end signal of the number counter 613; if that signal is "0", it resets its count value to the initial value and continues counting down. If the count end signal of the number counter 613 is "1", the instruction number counter 614 sends the instruction offset to the instruction offset line 625 and ends the countdown. At this point, the program counter captures and outputs the current output of the adder 612 and starts counting from the next instruction cycle.

[0068] The basic operation of the DMAC 650 configured as above is the same as in the first embodiment. The operation of the DMAC 650 is therefore described with reference to FIG. 7, focusing on the repetition control unit 631 and the adder 612. FIG. 7 is a timing chart showing the repetitive operation of the DMAC 650 of this embodiment. In FIG. 7, cycles C0, C1, C2, ... each denote the cycle required to execute one process.

[0069] Here, a repeat instruction is written as "loop N M" and indicates that the N consecutive instructions following the repeat instruction are executed M times, for a total of N*M instruction executions. The decoder 202 decodes the repeat instruction in cycle C3 and sends the number of repetitions and the number of instructions to the decoder output line 220. The count counter 613 and the instruction number counter 614 respectively take in these values in cycle C3.

[0070] The repeat instruction "loop 2 3" used here indicates that the following two instructions are repeated three times. On decoding the repeat instruction, the decoder 202 sends a stop instruction signal to the stop instruction line 216, instructing the program counter 205 to pause. The count counter 613 and the instruction number counter 614 each start counting down from cycle C3. Here, the count counter 613 counts down "3", "2", "1", and the instruction number counter 614 counts down "2", "1".

[0071] Because the program counter 205 is stopped during this period, the adder 612 outputs the sum of the output of the program counter 205 and the instruction offset sent from the instruction number counter 614. Here, "1" and "2" are sent alternately as the instruction offset, so that the instructions "c" and "d" are read out successively from the instruction memory 201.

[0072] The count counter 613 counts down on each count end signal from the instruction number counter 614. In FIG. 7, this occurs in cycles C5, C7, and C9. Finally, in cycle C9, the count end signal of the count counter 613 is sent, and the instruction number counter 614 ends its countdown. The instruction offset at this time is "3". The program counter 205 takes in the output of the adder 612 at this time and, from cycle C10, counts the addresses of the instruction "e" and onward.
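
The address generation described for the "loop 2 3" example can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the circuit itself: the function name `loop_fetch_addresses`, the six-entry program layout, and the exact mapping from count value to instruction offset are hypothetical, chosen so that the offsets come out as 1, 2 during the loop and 3 at the end, as in the text.

```python
def loop_fetch_addresses(loop_addr, n_instructions, n_repeats):
    """Model the adder output while the program counter is held at loop_addr.

    Returns the addresses fetched during the loop and the address that the
    program counter takes in once the loop has finished.
    """
    fetched = []
    for _ in range(n_repeats):                       # count counter: one pass per repetition
        for count in range(n_instructions, 0, -1):   # instruction number counter: N, ..., 1
            offset = n_instructions - count + 1      # assumed mapping: count N -> 1, count 1 -> N
            fetched.append(loop_addr + offset)       # adder: held program counter + offset
    resume_addr = loop_addr + n_instructions + 1     # final offset N + 1 skips the loop body
    return fetched, resume_addr


# Hypothetical instruction memory contents matching the example in the text.
program = ["a", "b", "loop 2 3", "c", "d", "e"]
fetched, resume_addr = loop_fetch_addresses(loop_addr=2, n_instructions=2, n_repeats=3)
trace = [program[addr] for addr in fetched]
```

In this model the program counter value never changes during the loop; only the offset added to it varies, which is why no per-iteration branch instruction is needed.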

[0073] Writing out again the instruction sequence executed according to the timing chart of FIG. 7 gives "a", "b", "loop", "c", "d", "c", "d", "c", "d", "e". In the conventional example shown in FIG. 11, a "DBcc" instruction is inserted between each "c", "d" pair and the next. If the instruction sequence of this embodiment were executed on the conventional processor, it would become "a", "b", "c", "d", "DBcc", "c", "d", "DBcc", "c", "d", "DBcc", "e", and the DBcc instruction would incur overhead on every repetition. In other words, address calculation by a conventional processor incurs overhead every time one address is calculated, which is a disadvantage in a direct memory access device that requires high-speed address calculation. In this embodiment, the repeat instruction is executed only once, as the initial loop instruction.
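
The difference between the two instruction sequences can be made concrete with a small Python sketch. The helper names `trace_with_loop` and `trace_with_dbcc` are hypothetical, and the sketch counts executed control instructions only; it does not model cycle timing.

```python
def trace_with_loop(prologue, body, repeats, epilogue):
    """Executed sequence when a single repeat instruction precedes the body."""
    return prologue + ["loop"] + body * repeats + epilogue


def trace_with_dbcc(prologue, body, repeats, epilogue):
    """Executed sequence when a DBcc instruction closes every pass of the body."""
    return prologue + (body + ["DBcc"]) * repeats + epilogue


loop_trace = trace_with_loop(["a", "b"], ["c", "d"], 3, ["e"])
dbcc_trace = trace_with_dbcc(["a", "b"], ["c", "d"], 3, ["e"])

# Control instructions executed in each case: one in total versus one per repetition.
loop_overhead = loop_trace.count("loop")
dbcc_overhead = dbcc_trace.count("DBcc")
```

For M repetitions, the repeat instruction contributes one control instruction in total, whereas the DBcc approach contributes M.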

[0074] As described above, according to this embodiment, repeat instructions can be executed with almost no overhead, and the embodiment is well suited to a direct memory access device, in which repetitive processing is frequently required for address calculation.

[0075] The first embodiment described conditional branch processing and the second embodiment described repetitive processing; a device having both is also possible and is described here. A direct memory access device according to a third embodiment of the present invention is described below with reference to the drawings.

[0076] FIG. 8 is a block diagram showing the configuration of the DMAC 850 of the third embodiment of the present invention. The data transfer device of this embodiment is the data transfer device 150 shown in FIG. 1 in which the DMACs 110, 120, 130, and 140 are each the DMAC 850. The DMAC 850 includes both the branch control unit 230 of the first embodiment and the adder 612 and repetition control unit 631 of the second embodiment.

[0077] The DMAC 850 and the data transfer device configured as above can realize both the conditional branch processing shown in the first embodiment and the repetitive processing shown in the second embodiment. Branching is controlled by the branch control unit 230, and repetition by the repetition control unit 631. These operations are the same as described for the first and second embodiments.

[0078] With the above configuration, the wait-and-branch processing of the first embodiment and the repetitive processing of the second embodiment can be performed in combination. A more appropriate transfer operation can thus be selected for different conditions, for example, for the way transfer data is generated, which varies with the processing content of the user program.

【００７９】[0079]

[Effects of the Invention] As described above, according to the invention of claim 1, when the branch condition is satisfied during the waiting period, the execution unit executes instructions sequentially from the branch destination instruction in the instruction memory; when the branch condition is not satisfied during the waiting period, the execution unit executes instructions sequentially from the instruction following the branch instruction after the waiting period ends.

[0080] Accordingly, for example, an instruction sequence that calculates the address of one device is stored in the instruction memory up to the position immediately before the branch instruction, so that the execution unit calculates the address from which the data to be transferred is acquired. The branch condition is set to "the device specified by the address calculated immediately before holds data to be transferred". Further, an instruction sequence that calculates the address of another device is stored immediately after the branch instruction, and an instruction sequence that calculates the address of the data transfer destination is stored at the branch destination. The direct memory access device can thereby realize the following data transfer operation: "when one device is checked for data and holds none, wait a predetermined time for data to arrive; if data arrives during the waiting time, read that data and transfer it to the transfer destination; otherwise, go on to check another device for data".

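The wait-and-branch transfer operation described above can be sketched in Python as follows. This is a minimal behavioral model under stated assumptions, not the transfer program itself: the function names `wait_and_branch` and `select_source`, the cycle-based notion of time, and the representation of each device by the cycle at which its data arrives (or `None` if none arrives) are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple


def wait_and_branch(condition: Callable[[int], bool],
                    start_cycle: int,
                    wait_cycles: int) -> Tuple[bool, int]:
    """Evaluate the branch condition once per cycle for up to wait_cycles cycles.

    Returns (branch_taken, next_cycle). The branch is taken as soon as the
    condition holds; otherwise execution falls through after the waiting period.
    """
    for cycle in range(start_cycle, start_cycle + wait_cycles):
        if condition(cycle):
            return True, cycle + 1
    return False, start_cycle + wait_cycles


def select_source(arrival_cycles: Sequence[Optional[int]],
                  wait_cycles: int) -> Optional[int]:
    """Check each device in turn and return the index of the first one
    whose data arrives within its waiting period, or None if none does."""
    cycle = 0
    for index, arrival in enumerate(arrival_cycles):
        def has_data(now: int, arrival: Optional[int] = arrival) -> bool:
            # Assumed branch condition: the device holds data to be transferred.
            return arrival is not None and now >= arrival

        taken, cycle = wait_and_branch(has_data, cycle, wait_cycles)
        if taken:
            return index   # branch destination: transfer from this device
    return None            # fell through past every device


# Device 0 never receives data; device 1 receives data at cycle 5.
chosen = select_source([None, 5], wait_cycles=4)
```

With a waiting period of four cycles, the model gives up on device 0 after cycle 3 and then finds data on device 1 at cycle 5, so `chosen` is 1.
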
[0081] Moreover, not only with the branch instruction described above, the direct memory access device of the present invention controls data transfer on the basis of the transfer program stored in the instruction memory. The environment can therefore be configured freely through the contents of the transfer program, giving excellent expandability and versatility. For example, priorities can be assigned to the transfer order, and the number of transfer destination devices can be set freely.

[0082] As described above, according to the invention of claim 2, the instruction number counter counts the number of executed instructions up to the number of instructions to be repeated, and the count counter counts the number of times the instruction number counter has finished counting up to the number of repetitions. The direct memory access device therefore needs to execute the predetermined repeat instruction only once, and can repeatedly execute a predetermined number of instructions a predetermined number of times without the execution unit counting the number of instructions or the number of repetitions. The direct memory access device of the present invention can thus reduce the overhead of repetitive processing and, in turn, perform address calculation and data transfer operations efficiently.

[0083] As described above, according to the invention of claim 3, the direct memory access device can perform the branch operation of the direct memory access device of claim 1 and the repetitive operation of the direct memory access device of claim 2 individually, and can also perform these operations in combination. As described above, according to the invention of claim 4, whereas conventionally the processor that processes the user program counted the number of data transfers of the direct memory access device in order to synchronize the direct memory access devices in the parallel computer system, in the direct memory access device of the present invention the timing counter counts the number of addresses calculated by the calculation unit. The load on the processor can therefore be reduced, and the effective efficiency of the processor can be improved.
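
The role of the timing counter can be illustrated with a short Python sketch. The class name `TimingCounter`, its `period` parameter, and the method `address_generated` are hypothetical; the sketch assumes only what the text states, namely that the counter counts calculated addresses and emits a timing signal each time a predetermined count is reached.

```python
class TimingCounter:
    """Counts calculated addresses and signals every `period` addresses."""

    def __init__(self, period: int) -> None:
        if period <= 0:
            raise ValueError("period must be a positive integer")
        self.period = period
        self.count = 0

    def address_generated(self) -> bool:
        """Record one calculated address; return True when the timing signal fires."""
        self.count += 1
        if self.count == self.period:
            self.count = 0      # start counting the next group of addresses
            return True
        return False


counter = TimingCounter(period=3)
signals = [counter.address_generated() for _ in range(6)]
```

Because the signal is derived inside the direct memory access device, the processor in this model does not have to count transfers itself in order to know when a group of transfers has completed.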

[0084] As described above, according to the invention of claim 5, the first, second, and third direct memory access devices are each the direct memory access device of claim 1, claim 2, claim 3, or claim 4, so the data transfer device can perform data transfer operations under the control of the direct memory access device of claim 1, claim 2, claim 3, or claim 4.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of a data transfer device 150 including the DMAC 250 of the first embodiment of the present invention.

FIG. 2 is a block diagram showing the configuration of the DMAC 250 of the first embodiment of the present invention.

FIG. 3 is a timing chart showing the operation of the DMAC 250 of the first embodiment.

FIG. 4 is a block diagram showing the configuration of the execution unit 203 of the first embodiment.

FIG. 5 is a diagram for explaining the synchronous operation of the DMAC 250 in a parallel computer system.

FIG. 6 is a block diagram showing the configuration of the DMAC 650 of the second embodiment of the present invention.

FIG. 7 is a timing chart for explaining the repetitive operation of the DMAC 650 of the second embodiment.

FIG. 8 is a block diagram showing the configuration of the DMAC 850 of the third embodiment of the present invention.

FIG. 9 is a block diagram showing the schematic configuration of a conventional parallel computer system.

FIG. 10 is a block diagram showing the configuration of a network 930 that transmits data between PE 901(1) through PE 901(n).

FIG. 11 is a diagram showing an instruction sequence of a conventional microprocessor.

[Explanation of symbols]

201 instruction memory
202 decoder
203 execution unit
204 data memory
205 program counter
207 gate
208 standby counter
209 condition determination unit
210 branch address unit
230 branch control unit

Claims

[Claims]

1. A direct memory access device for use in a data transfer device that, in a parallel computer system in which a plurality of processor elements each including a processor, a memory, and a data transfer device are connected via a network, transfers data between the memory in the processor element to which it belongs and a plurality of devices on the network, the direct memory access device comprising: an instruction memory that stores a transfer program; a decoder that reads and decodes instructions one at a time from the instruction memory and, on decoding a predetermined branch instruction, outputs a waiting period, a branch condition, and a branch destination specified by the branch instruction; and an execution unit that executes the instructions decoded by the decoder, wherein, when the branch condition is satisfied during the waiting period, the execution unit executes instructions sequentially from the branch destination instruction in the instruction memory, and, when the branch condition is not satisfied during the waiting period, the execution unit executes instructions sequentially from the instruction following the branch instruction after the waiting period ends.

2. A direct memory access device for use in a data transfer device that, in a parallel computer system in which a plurality of processor elements each including a processor, a memory, and a data transfer device are connected via a network, transfers data between the memory in the processor element to which it belongs and a plurality of devices on the network, the direct memory access device comprising: an instruction memory that stores a transfer program; a decoder that reads and decodes instructions one at a time from the instruction memory and, on decoding a predetermined repeat instruction, outputs the number of repetitions and the number of instructions specified by the repeat instruction; an execution unit that executes the instructions decoded by the decoder; an instruction number counter that counts the number of executed instructions up to the number of instructions specified by the repeat instruction; an instruction fetch unit that points sequentially to instructions in the instruction memory and, while the repeat instruction is being executed, indicates to the decoder the instruction to be read next on the basis of the count of the instruction number counter; and a count counter that counts the number of times the instruction number counter has finished counting, up to the number of repetitions specified by the repeat instruction, wherein the instruction fetch unit points to the instruction following the repeat instruction each time the count of the instruction number counter ends and, when the count of the count counter ends, points to the instruction following the predetermined number of instructions that follow the repeat instruction.

3. A direct memory access device for use in a data transfer device that, in a parallel computer system in which a plurality of processor elements each including a processor, a memory, and a data transfer device are connected via a network, transfers data between the memory in the processor element to which it belongs and a plurality of devices on the network, the direct memory access device comprising: an instruction memory that stores a transfer program; a decoder that reads and decodes instructions one at a time from the instruction memory, outputs a waiting period, a branch condition, and a branch destination specified by a predetermined branch instruction on decoding that branch instruction, and outputs the number of repetitions and the number of instructions on decoding a predetermined repeat instruction; an execution unit that executes the instructions decoded by the decoder; an instruction number counter that counts the number of executed instructions up to the number of instructions specified by the repeat instruction; an instruction fetch unit that points sequentially to instructions in the instruction memory and indicates to the decoder the instruction to be read next, on the basis of the branch condition while the branch instruction is being executed and on the basis of the count of the instruction number counter while the repeat instruction is being executed; and a count counter that counts the number of times the instruction number counter has finished counting, up to the number of repetitions specified by the repeat instruction, wherein, while the branch instruction is being executed, the instruction fetch unit points sequentially to instructions from the branch destination instruction in the instruction memory when the branch condition is satisfied during the waiting period, and points sequentially to instructions from the instruction following the branch instruction after the waiting period ends when the branch condition is not satisfied during the waiting period, and, while the repeat instruction is being executed, the instruction fetch unit points to the instruction following the repeat instruction each time the count of the instruction number counter ends and, when the count of the count counter ends, points to the instruction following the predetermined number of instructions that follow the repeat instruction.

4. The direct memory access device according to claim 1, claim 2, or claim 3, wherein the execution unit includes a register, a calculation unit that performs address calculation, and a timing counter that counts the number of addresses calculated by the calculation unit and outputs a timing signal each time a predetermined count is reached.

5. A data transfer device that, in a parallel computer system in which a plurality of processor elements each including a processor, a memory, and a data transfer device are connected via a network, transfers data between the memory in the processor element to which it belongs and a plurality of devices on the network, the data transfer device comprising: a first port, a second port, and a third port; a first buffer that temporarily stores data input via the third port; a second buffer that temporarily stores data input via the second port; a third direct memory access device that generates the address in the memory at which data to be transferred is stored and outputs a control signal so that the data is read from that address via the third port and stored in the first buffer, and that generates the address in the memory to which the data stored in the second buffer is to be written and outputs a control signal so that the data is written to that address in the memory via the third port; a second direct memory access device that generates the address of a device storing data to be transferred and outputs a control signal so that the data is read from the device via the second port and stored in the second buffer; and a first direct memory access device that generates the address to which the data stored in the first buffer is to be written and outputs a control signal so that the data is written to that address via the first port, wherein the first, second, and third direct memory access devices are each the direct memory access device according to claim 1, claim 2, claim 3, or claim 4.