JPH05290080A - Parallel processor

Info

Publication number: JPH05290080A
Application number: JP8583692A
Authority: JP
Inventors: Tatsuya Nagasawa; 達也長沢; Hideyuki Iino; 秀之飯野
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1992-04-08
Filing date: 1992-04-08
Publication date: 1993-11-05

Abstract

PURPOSE: To provide a highly reliable parallel processor at low cost by reducing the amount of circuitry. CONSTITUTION: A parallel processor having N data processing parts is provided with: N first storage means 10 that have mutually common addresses and operate independently; a second storage means 20 that has N stages of storage parts including an input stage and a final stage, receives at the input stage address data that designates addresses of the N first storage means 10 and is input serially in correspondence with the N data processing parts, shifts that data toward the final stage, and supplies the output of the storage part of each stage to the corresponding one of the N first storage means 10 as an address signal; and a first selection means 30 that outputs the data read from each of the N first storage means 10, based on the address signals from the respective storage parts of the second storage means 20, to the one of the N data processing parts to which the address corresponds.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an improvement of a parallel processing device used in a vector processor or the like. In recent years, the processing capability of data processing devices has improved dramatically owing to advances in technology and the adoption of speed-up techniques such as pipelining. To improve it further, supercomputers such as vector computers widely adopt a scheme in which a plurality of independently operating arithmetic units are provided and the memory is divided into a plurality of independently operating banks, so that data are processed in parallel. Accordingly, for computer systems in which data stored in a banked memory are processed in parallel by a plurality of independent arithmetic units, an efficient and economical parallel processing device is desired.

[0002]

[Prior Art] FIG. 4 is a block diagram of the main part of a vector computing device. The command buffer unit sets in an instruction buffer the instruction code that has been read from the main memory or a buffer (or cache) memory (not shown) and output onto the data bus, and outputs it to the control unit. The control unit decodes the instruction code and, based on that code, controls the address unit and the vector unit, thereby performing pipeline control of instruction fetch and execution. The address unit performs dynamic address translation (DAT) as necessary and generates a physical address for the main memory. The vector unit is the device to which the present invention applies; it executes vector operations efficiently by operating a memory having a plurality of banks and a plurality of arithmetic units simultaneously in parallel.

[0003] FIG. 5 is an explanatory diagram of the technical background of the present invention, in which the arrangement of data in the banks is modeled for ease of understanding. The memory consists of independently operable banks 0-3, and data are stored in banks 0-3 in an arrangement peculiar to vector computers and the like. That is, the data to be supplied to arithmetic units A-D are placed in advance by the compiler program in memory areas A-D, each extending across banks 0-3, as shown in the figure. For example, let a be a particular one of the physical addresses assigned to each bank, and write that address, taking banks 0-3 into account, as 0/a to 3/a. Then the addresses of the data supplied to arithmetic unit A are 0/a, 1/a, 2/a, 3/a, 0/(a+i), 1/(a+i), 2/(a+i), 3/(a+i), 0/(a+2i), 1/(a+2i), and so on. Here i is a predetermined value: after banks 0 through 3 have been accessed at a given address, the address is updated by adding the increment i. The addresses of the data supplied to arithmetic units B-D are expressed in the same way by replacing a with b, c, and d, respectively.
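The access order described in this paragraph can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `unit_addresses` and the sample values (start address 100, increment 8) are assumptions chosen for the example and do not come from the specification.

```python
from itertools import islice


def unit_addresses(start, increment, num_banks=4):
    """Yield (bank, address) pairs in the order one arithmetic unit consumes them."""
    address = start
    while True:
        # Visit banks 0..num_banks-1 at the same in-bank address.
        for bank in range(num_banks):
            yield bank, address
        # After the last bank, advance the address by the increment i.
        address += increment


# First six accesses for a unit whose area starts at a = 100 with i = 8
# (hypothetical values): 0/a, 1/a, 2/a, 3/a, 0/(a+i), 1/(a+i).
first_six = list(islice(unit_addresses(100, 8), 6))
print(first_six)
```

The sequences for units B to D follow by calling the same generator with their own start addresses and increments.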

[0004] FIG. 6 is a block diagram of a parallel processing circuit showing a conventional example; it represents, for example, the part of a vector computer system that performs four-way parallel processing. Throughout the drawings, the same reference numerals denote the same or similar components.

[0005] The memory 10a consists of independent banks B0-B3 that can be read and written simultaneously in parallel. The counters CNTA-CNTD hold the addresses in memory 10a of the data to be supplied to arithmetic units 1A-1D, respectively. Each counter loads and holds the externally input load address signal LADR at the timing of the load signal LDA-LDD corresponding to its arithmetic unit 1A-1D. When access has proceeded from bank B0 through B3 and then returns to bank B0, each counter counts up by the increment (Ia-Id) given by the externally input ENA-END signals, and access then continues.

[0006] The selectors SEL1A-SEL1D select the addresses to be input to banks B0-B3, respectively, according to the value of the selection signal SEL (which consists of, for example, 2 bits and cyclically repeats the decimal values 0, 1, 2, 3). That is, in FIG. 6, when the selection signal SEL takes one of the circled values (0-3) marked on the inputs of selectors SEL1A-SEL1D, that input is selected and supplied to the corresponding bank B0-B3. For example, when the selection signal SEL is 0, 1, 2, 3, selector SEL1A selects the contents of counters CNTA, CNTB, CNTC, CNTD in that order and supplies them to bank B0.

[0007] In the same manner as selectors SEL1A-SEL1D, the selectors SEL2A-SEL2D each select one of the set of data read from banks B0-B3 according to the value of the selection signal SEL and supply it to arithmetic units 1A-1D, respectively. For example, when the selection signal SEL is 0, 1, 2, 3, selector SEL2A selects the data read from banks B0, B1, B2, B3 in that order and supplies it to arithmetic unit 1A.

[0008] FIG. 7 is a timing diagram of the conventional parallel processing circuit.
(1) Address data a-d on the load address signal LADR are input together with the load signals LDA-LDD.
(2) Address data a-d are set in counters CNTA-CNTD at the falling edge of the clock CLK when the load signals LDA-LDD are on.
(3) The values of counters CNTA-CNTD are thereafter increased by adding the increments Ia-Id given by the externally input ENA-END signals.
(4) As the selection signal SEL changes through 0, 1, 2, 3, selector SEL1A selects the content a of counter CNTA and supplies it as an address to banks B0, B1, B2, B3, respectively. In the figure, these addresses are labeled per bank as 0/a, 1/a, 2/a, 3/a. Likewise, as the selection signal SEL changes through 1, 2, 3, 0, selector SEL1B selects the content b of counter CNTB and supplies addresses 0/b, 1/b, 2/b, 3/b to banks B0, B1, B2, B3, respectively. Selectors SEL1C and SEL1D operate in the same way.
(5) Selectors SEL2A-SEL2D operate in the same way: they select the data read in turn from the banks B0-B3 to which selectors SEL1A-SEL1D have supplied addresses in turn, and supply it to arithmetic units 1A-1D, respectively. For example, selector SEL2A selects the data read in turn from addresses 0/a-3/a, which selectors SEL1A-SEL1D supplied in turn to banks B0-B3, and supplies it to arithmetic unit 1A.
(6) Thereafter, operation returns to (3) and steps (3) to (5) are repeated. When the external load signals LDA-LDD turn on, operation returns to (1) and the same steps are repeated.
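The rotating selection performed by the two selector sets of the conventional circuit can be modeled with modular arithmetic. The sketch below is an assumption-laden illustration: the formula `(sel - index) % 4` is inferred from the SEL1A, SEL1B, and SEL2A examples given in the text, while the actual wiring of the remaining selectors is defined only by the circled values in FIG. 6, which is not reproduced here. The function names are hypothetical.

```python
NUM = 4  # four banks B0-B3 and four arithmetic units 1A-1D


def counter_for_bank(sel, bank):
    """Index of the counter (0=CNTA .. 3=CNTD) whose address is routed to `bank`."""
    return (sel - bank) % NUM


def bank_for_unit(sel, unit):
    """Index of the bank (0=B0 .. 3=B3) whose data is routed to `unit` (0=1A .. 3=1D)."""
    return (sel - unit) % NUM


# SEL1A feeds bank B0: SEL = 0, 1, 2, 3 selects CNTA, CNTB, CNTC, CNTD.
sel1a = [counter_for_bank(sel, 0) for sel in range(NUM)]
# SEL2A feeds unit 1A: SEL = 0, 1, 2, 3 selects B0, B1, B2, B3.
sel2a = [bank_for_unit(sel, 0) for sel in range(NUM)]

# In every cycle the two selector sets agree: the unit whose counter
# addresses a bank is the unit that receives that bank's data.
consistent = all(
    bank_for_unit(sel, counter_for_bank(sel, bank)) == bank
    for sel in range(NUM)
    for bank in range(NUM)
)
print(sel1a, sel2a, consistent)
```

Under this model each of the eight selectors needs a full four-way input, which is the per-unit duplication that the following section identifies as the problem.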

[0009]

[Problems to Be Solved by the Invention] As described above, the conventional method requires counters for supplying addresses to the memory banks and two sets of selectors, each provided independently in a number corresponding to the number of arithmetic units or memory banks. Consequently, the amount of circuitry increases, which raises the cost of the device, and the number of parts increases, which lowers reliability.

[0010] An object of the present invention is to provide a low-cost, highly reliable parallel processing device by reducing the amount of circuitry.

[0011]

[Means for Solving the Problems] FIG. 1 is a block diagram of the principle of the present invention. Reference numeral 1 denotes a plurality of N data processing units. 10 denotes N first storage means that have mutually common addresses and operate independently. 2i denotes address data by which data designating addresses of the N first storage means 10 are input serially in correspondence with the N data processing units 1. 20 denotes a second storage means that has N stages of storage units including an input stage and a final stage, receives the address data 2i at the input stage and shifts it toward the final stage, and supplies the output of the storage unit of each stage as an address signal to the corresponding one of the N first storage means 10. 30 denotes a first selection means that outputs the data read from each of the N first storage means 10, based on the address signal from each storage unit of the second storage means 20, to the one of the N data processing units 1 to which that address corresponds.

[0012]

[Operation] According to the present invention, in a device that performs parallel processing with a plurality of N data processing units 1, the N first storage means 10 have mutually common addresses and operate independently. The address data 2i carries data designating addresses of the N first storage means 10, input serially in correspondence with the N data processing units 1. The second storage means 20 has N stages of storage units including an input stage and a final stage, receives the address data 2i at the input stage and shifts it toward the final stage, and supplies the output of the storage unit of each stage as an address signal to the corresponding one of the N first storage means 10. The first selection means 30 outputs the data read from each of the N first storage means 10, based on the address signal from each storage unit of the second storage means 20, to the one of the N data processing units 1 to which that address corresponds. Accordingly, the N first storage means 10 each operate in parallel and read out data using the addresses individual to the N data processing units 1, which are given by the address data 2i and shifted through the second storage means 20, and the first selection means 30 distributes the read data to the corresponding N data processing units 1, so that the N data processing units 1 can perform parallel processing.

[0013]

[Embodiment] FIG. 2 is a block diagram of a parallel processing circuit showing an embodiment of the present invention. Throughout the drawings, the same reference numerals denote the same or similar components.

[0014] The arithmetic units 1A-1D each operate independently and are, for example, a multiplier, an adder, a divider, and a load/store unit (or an interface circuit that outputs data to an external circuit).

[0015] The memory 10b (which may also be a group of registers) consists of banks B0-B3. Banks B0-B3 are assigned mutually common physical addresses and form a memory that can be read and written simultaneously in parallel.

[0016] The selection signal SEL consists of 2 bits and cyclically repeats the decimal values 0, 1, 2, 3, one per clock cycle. The load address signal LADR is a physical address common to the banks that is input from outside (for example, from a bus control circuit that controls the bus connecting the components of the vector computer); when each bank has a storage capacity of 256 words, for example, it consists of 8 bits. When the vector computer starts a predetermined unit of processing, the start addresses of the areas of memory 10b that hold the data to be supplied to arithmetic units 1A-1D are input serially, one per clock cycle.
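The signal widths quoted above follow directly from the number of values each signal must distinguish. As a small worked check (the helper name `bits_needed` is an assumption introduced for this illustration):

```python
def bits_needed(count):
    """Smallest number of bits that can encode `count` distinct values."""
    return max(1, (count - 1).bit_length())


address_bits = bits_needed(256)  # 256 words per bank -> width of LADR
select_bits = bits_needed(4)     # 4 banks B0-B3 -> width of SEL
print(address_bits, select_bits)
```

For a bank of 256 words this gives the 8-bit address mentioned in the text, and for four banks the 2-bit selection signal.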

[0017] The ENA-END signals are input from outside in correspondence with arithmetic units 1A-1D and give the increments (Ia-Id) used when an address is updated. When banks B0 through B3 have been accessed at the same address (for example, a) and access starts again from bank B0 with an advanced address (referred to as an address update), the address is updated by the increment (Ia-Id) corresponding to the arithmetic unit 1A-1D and access continues. Each of the ENA-END signals consists of, for example, 2 bits, so that Ia-Id can take values from 0 to 3.

[0018] The registers REG0-REG3 each hold a memory address at which data to be supplied to arithmetic units 1A-1D is stored. They are configured as a shift register that shifts data in the direction from register REG0 to register REG3. The outputs of registers REG0-REG3 are connected to banks B0-B3, respectively, and supply the addresses stored in registers REG0-REG3 to banks B0-B3.

[0019] When the SE signal (the logical OR of the load signals LDA-LDD produced by the OR circuit OR) is on, selector SEL4 selects the load address signal LADR (the address signals corresponding to arithmetic units 1A-1D, for example a-d) and supplies it to register REG0. At an address update, when the SE signal is off, it selects the output of the adder ADD, described below, and supplies it to register REG0.

[0020] At an address update, the adder ADD updates the address by adding the increment (Ia-Id) given by the ENA-END signals to the contents of register REG3. Selector SEL5 selects and outputs one of the ENA-END signals according to the value (0-3) of the selection signal SEL.

[0021] The selectors SEL2A-SEL2D each select one of the set of data read from banks B0-B3 according to the value of the selection signal SEL and supply it to arithmetic units 1A-1D, respectively. That is, in FIG. 6, when the selection signal SEL takes one of the circled values (0-3) marked on the inputs of selectors SEL2A-SEL2D, that input is selected and supplied to the corresponding arithmetic unit 1A-1D. For example, when the selection signal SEL is 0, 1, 2, 3, selector SEL2A selects the data read from banks B0, B1, B2, B3 in that order and supplies it to arithmetic unit 1A.

[0022] FIG. 3 is a timing diagram of the parallel processing circuit according to the embodiment of the present invention.
(1) The selection signal SEL cyclically repeats the values 0, 1, 2, 3, one per clock cycle.
(2) Address data a-d on the load address signal LADR are input together with the load signals LDA-LDD. Address data a-d indicate the start addresses of the memory areas that hold the data to be supplied to arithmetic units 1A-1D.
(3) When the SE signal (the logical OR of the load signals LDA-LDD produced by the OR circuit OR) turns on, address data a on the load address signal LADR is set in register REG0 at the falling edge of the clock CLK while load signal LDA is on. At the same time, bank B0 is accessed using the content a of register REG0 as the address.
(4) The read data is selected by selector SEL3A according to the value of the SEL signal (0 at this point) and supplied to arithmetic unit 1A. That is, as indicated by 0/a in the figure, the data stored at address a of bank B0 is read and selected by selector SEL3A.
(5) Next, at the falling edge of the clock CLK while load signal LDB is on, the content a of register REG0 is shifted to register REG1 and address data b on the load address signal LADR is set in register REG0. At the same time, bank B1 is accessed using the content a of register REG1 as the address, and bank B0 is accessed using the content b of register REG0 as the address.
(6) The read data 1/a and 0/b are selected by selector SEL3A and selector SEL3B, respectively, according to the value of the SEL signal, and supplied to arithmetic unit 1B and arithmetic unit 1A.
(7) Subsequently, addresses c and d are set in register REG0 in turn from the load address signal LADR, and the shift operation, bank access, selection of the read data, and supply to the arithmetic units described above are performed in the same way.
(8) When address data a has been shifted into register REG3, the adder ADD adds Ia of the ENA signal, selected by selector SEL5, to address data a. Because the SE signal is off, selector SEL4 selects and outputs the output of the adder ADD. The contents of registers REG0-REG2 are shifted to registers REG1-REG3, respectively, and at the same time the value a+Ia is set in register REG0.
(9) Bank B0 is accessed using a+Ia, set in register REG0, as the address.
(10) The read data is selected by selector SEL3A according to the value of the SEL signal (0 at this point) and supplied to arithmetic unit 1A. That is, as indicated by 0/a+Ia in the figure, the data stored at address a+Ia of bank B0 is read and selected by selector SEL3A. At the same time, the same operations are performed for the addresses d, c, b in registers REG1, REG2, REG3.
(11) Next, address b+Ib is set in register REG0 in the same way, and the same operations are repeated until the SE signal next turns on.
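The register movement in steps (3) to (11) can be traced with a short simulation. This is a sketch under stated assumptions, not the circuit itself: it assumes one start address is loaded per clock cycle during the first N cycles, it tags every register entry with the unit it was loaded for (a modeling shortcut that stands in for selector SEL5 choosing the matching increment), and it attributes each access to that unit. The function name `simulate` and the sample addresses and increments are hypothetical.

```python
def simulate(starts, increments, cycles):
    """Return, per clock cycle, the list of (bank, unit, address) accesses."""
    n = len(starts)
    regs = [None] * n  # REG0..REG(n-1); each entry is (unit, address) or None
    trace = []
    for t in range(cycles):
        if t < n:
            # SE on: SEL4 passes the start address for unit t from LADR.
            new_entry = (t, starts[t])
        else:
            # SE off: SEL4 passes the adder output, i.e. the final-stage
            # address plus the increment of the unit that owns it.
            unit, address = regs[-1]
            new_entry = (unit, address + increments[unit])
        # Clock edge: shift toward the final stage and load REG0.
        regs = [new_entry] + regs[:-1]
        # Register k supplies the address for bank k.
        trace.append(
            [(bank, *entry) for bank, entry in enumerate(regs) if entry is not None]
        )
    return trace


# Units 1A-1D are numbered 0-3; a, b, c, d = 10, 20, 30, 40 and
# Ia, Ib, Ic, Id = 1, 2, 3, 0 are sample values only.
trace = simulate(starts=[10, 20, 30, 40], increments=[1, 2, 3, 0], cycles=5)
for t, accesses in enumerate(trace):
    print(t, accesses)
```

In the fifth cycle the trace shows a+Ia entering bank B0 while d, c, and b address banks B1, B2, and B3, which matches step (10). A single adder and a single shift path serve all four units, which is where the saving over per-unit counters comes from.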

[0023] The data supplied to arithmetic units 1A-1D need not be divided into areas A-D as shown in FIG. 5; the areas may overlap one another. The increments Ia-Id may differ among arithmetic units 1A-1D, and for a single arithmetic unit the increment may also change at each update that follows a pass through banks B0-B3.

[0024] In this way, registers REG0-REG3 shift the address data corresponding to arithmetic units 1A-1D, which is input serially from outside, while supplying addresses to banks B0-B3, so that data are read sequentially from a common address across banks B0-B3. Selectors SEL3A-SEL3D select the read data and supply it to the corresponding arithmetic units 1A-1D. Banks B0-B3 therefore operate simultaneously, and arithmetic units 1A-1D also operate independently and in parallel, so that data stored in memory 10b in the arrangement peculiar to vector computers and the like can be processed. As for the amount of circuitry, the present invention reduces it by replacing the counters CNTA-CNTD of the conventional example with the shift register REG0-REG3, and also eliminates the four selectors SEL1A-SEL1D that the conventional example required, each of which takes, for example, four sets of 8-bit addresses as input. As a result, for the illustrated circuit excluding arithmetic units 1A-1D and banks B0-B3, the amount of circuitry could be roughly halved.

[0025]

[Effects of the Invention] As described above, according to the present invention, the N first storage means 10 each operate in parallel, and the N data processing units 1 also operate independently and in parallel, so that data stored in the N first storage means 10 in the arrangement peculiar to vector computers and the like can be processed. Furthermore, the circuit configuration of the present invention reduces the amount of circuitry, which lowers the cost of the parallel processing device and improves its reliability.

[Brief description of drawings]

FIG. 1 is a block diagram of the principle of the present invention.

FIG. 2 is a block diagram of a parallel processing circuit showing an embodiment of the present invention.

FIG. 3 is a timing diagram of the parallel processing circuit according to the embodiment of the present invention.

FIG. 4 is a block diagram of the main part of a vector computing device.

FIG. 5 is an explanatory diagram of the technical background of the present invention.

FIG. 6 is a block diagram of a parallel processing circuit showing a conventional example.

FIG. 7 is a timing diagram of the conventional parallel processing circuit.

[Explanation of symbols]

1: N data processing units
10: N first storage means
20: second storage means
30: first selection means
2i: address signal
10b: memory
ADD: adder
B0-B3: banks
OR: OR circuit
REG0-REG3: registers
SEL3A-SEL3D, SEL4, SEL5: selectors

Claims

[Claims]

1. A parallel processing device that performs parallel processing with a plurality of N data processing units (1), comprising: N first storage means (10) that have mutually common addresses and operate independently; address data (2i) by which data designating addresses of the N first storage means (10) are input serially in correspondence with the N data processing units (1); second storage means (20) that has N stages of storage units including an input stage and a final stage, receives the address data (2i) at the input stage and shifts it toward the final stage, and supplies the output of the storage unit of each stage as an address signal to the corresponding one of the N first storage means (10); and first selecting means (30) that outputs the data read from each of the N first storage means (10), based on the address signal from each storage unit of the second storage means (20), to the one of the N data processing units (1) to which that address corresponds.

2. The parallel processing device according to claim 1, further comprising: N sets of external address signals (4i) input externally in correspondence with the N data processing units (1); adding means (50) for adding predetermined data (51) to the address data in the final-stage storage unit of the second storage means (20); and second selecting means (40) for selecting between the N sets of address data input by the external address signals (4i) and the output data of the adding means (50) and outputting the selected data as the address signal (2i).

3. The parallel processing device according to claim 2, further comprising: increment signals (6i) input externally in correspondence with the N data processing units (1); and third selecting means (60) for selecting, from among the increment signals (6i), the increment signal of the one of the N data processing units (1) to which the address data in the final-stage storage unit of the second storage means (20) corresponds, and outputting it as the predetermined data (51).