JP2771912B2

JP2771912B2 - Arithmetic control method of vector arithmetic processing unit

Info

Publication number: JP2771912B2
Application number: JP3201739A
Authority: JP
Inventors: 浩二黒田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1991-08-12
Filing date: 1991-08-12
Publication date: 1998-07-02
Anticipated expiration: 2013-07-02
Also published as: JPH0546655A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of Industrial Application] The present invention relates to an arithmetic control method for a vector arithmetic processing unit composed of a plurality of arithmetic units.

[0002]

[Prior Art] A conventional example will be described with reference to FIGS. 10 to 14. FIG. 10 is a configuration diagram of a vector arithmetic processing unit to which the present invention and the conventional example apply, FIG. 11 is a specific example of the vector register in that configuration, FIG. 12 is a configuration diagram of a conventional arithmetic unit, FIG. 13 is an operation timing chart of the conventional example, and FIG. 14 is a proposed configuration of a conventional arithmetic unit.

[0003] First, the configuration of the vector arithmetic processing unit will be described with reference to FIG. 10. For ease of explanation, FIG. 10 shows the case of four arithmetic units. Reference numeral 11 denotes an arithmetic unit, which executes arithmetic processing according to instructions from the processor 15. Reference numeral 13 denotes a vector register, in which the data to be processed by the arithmetic unit 11 is recorded. Reference numeral 14 denotes a system bus. Reference numeral 12 denotes a bus that transfers data to the adjacent arithmetic unit.

[0004] As shown in FIG. 11, the vector register 13 is composed of memory addressed by an X address and a Y address, and the registers at a given X address are accessed by the arithmetic unit corresponding to that X address. Various kinds of arithmetic processing are performed in the vector arithmetic processing unit, and processing to obtain the sum of the processing results is also performed frequently. For example, the sum of all the data D₀-0 to D₃-4 shown in FIG. 11 is often required. In this case, each arithmetic unit obtains the sum of the data at its own X address in the vector register 13 connected to it. That is, arithmetic unit 11-3 obtains the sum of data D₃-0 to D₃-4 at its X address. The sums obtained by the individual arithmetic units are then transferred to arithmetic unit 11-0 over the bus 12, which transfers data to the adjacent arithmetic unit, and the total sum is computed in arithmetic unit 11-0. The present invention relates to an arithmetic control method used when this summation is executed.
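The two-stage summation described in this paragraph can be sketched in Python as follows. This is only an illustrative model, assuming four arithmetic units and five Y addresses as in FIG. 11. The names `vector_register`, `partial_sums` and `total_sum` are hypothetical, and the data values are arbitrary samples rather than values from the patent.

```python
# vector_register[x][y] stands for data D_x-y of FIG. 11 (sample values only).
vector_register = [
    [1, 2, 3, 4, 5],                 # X address 0, read by arithmetic unit 11-0
    [10, 20, 30, 40, 50],            # X address 1, read by arithmetic unit 11-1
    [100, 200, 300, 400, 500],       # X address 2, read by arithmetic unit 11-2
    [1000, 2000, 3000, 4000, 5000],  # X address 3, read by arithmetic unit 11-3
]


def partial_sums(register):
    """Stage 1: every arithmetic unit sums the data at its own X address."""
    return [sum(column) for column in register]


def total_sum(register):
    """Stage 2: the partial sums are gathered and added in arithmetic unit 11-0."""
    sums = partial_sums(register)
    total = sums[0]            # arithmetic unit 11-0 starts from its own sum
    for incoming in sums[1:]:  # sums arriving from the other units over bus 12
        total += incoming
    return total
```

In the hardware, stage 1 runs in parallel across the units, while stage 2 is serialised through the narrow inter-unit bus, which is the part the invention aims to speed up.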

[0005] Next, a configuration example of each conventional arithmetic unit will be described with reference to FIG. 12. R13, R14, R24, R5, R6 and R7 are registers each capable of recording m items of k-byte data, that is, registers that can record 8km bits of data. For ease of explanation, however, the description below assumes k = 1 and m = 4.

[0006] S1, S2 and S3 are selectors, which select the data to be input to a register. Data is passed between the vector register 13 and the arithmetic unit 11 in units of km bytes, but the bus 12 that transfers data to the adjacent arithmetic unit is a dedicated k-byte bus. For this reason, register R13 is configured as a km-byte memory divided into m parts of k bytes each, so that it can store the incoming data.

[0007] Register R7 is divided into register R7a, which stores the upper (m-1)k bytes of data of register R6, and register R7b, which stores k bytes of data from selector S3. Selector S3 divides the km bytes of data in register R6 into m pieces of k bytes, selects one item from among these k-byte pieces and the k-byte data from register R13, and outputs it to register R7b.

[0008] Reference numeral 1 denotes an arithmetic logic unit (ALU). Next, the operation of obtaining the sum of the data stored in the vector register shown in FIG. 11 will be described following the operation timing chart shown in FIG. 13. First, the CPU 15 sends to all arithmetic units an instruction to add the data recorded in the vector register 13 connected to each arithmetic unit, and each arithmetic unit executes the addition. FIG. 13 shows the addition using arithmetic unit 0 as a representative.

[0009] On receiving the add instruction from the CPU 15, arithmetic unit 0 sequentially reads data D₀-0 to D₀-4 of the vector register shown in FIG. 11 at times T₀ to T₄ and stores them in register R13, then moves them to register R14 at the next timing (T₁ to T₅). At the following timing, ALU 1 adds the data of registers R24 and R14 and stores the result in register R5 (T₂ to T₆). At the timing after that, the data of register R5 is moved to register R24 (T₃ to T₆). In the additions in ALU 1, the data of register R24 is set to 0 for the first and second additions (at T₁ and T₂).

[0010] Accordingly, at time T₆, register R5 holds the sum of data D₀-0, D₀-2 and D₀-4, and register R24 holds the sum of data D₀-1 and D₀-3. At time T₇ the data of register R5 is moved to register R14, at the next time T₈ the data of registers R5 and R14 are added and stored in register R5, and at the following time T₉ the result is moved to register R24.
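The register pipeline of the conventional arithmetic unit (R13 to R14, ALU addition of R24 and R14 into R5, then R5 back to R24) can be checked with a small cycle-by-cycle model. The sketch below is an assumption-laden illustration, not the circuit itself: it treats all registers as latching on the same clock edge, uses `None` for a register that holds no valid data yet, and the function name `accumulate` is hypothetical. Under those assumptions it reproduces the state described for time T₆, with the even-indexed data summed in R5 and the odd-indexed data summed in R24.

```python
def accumulate(data):
    """Model T0..T6 of the conventional addition for five data items."""
    r13 = r14 = r5 = None  # None marks a register with no valid data yet
    r24 = 0                # R24 reads as 0 for the first two additions
    for t in range(len(data) + 2):  # T0 .. T6 when len(data) == 5
        # All registers latch together, so derive the next values from
        # the current ones before updating any register.
        next_r13 = data[t] if t < len(data) else None    # read from vector register
        next_r14 = r13                                   # R13 -> R14
        next_r5 = r24 + r14 if r14 is not None else r5   # ALU: R24 + R14 -> R5
        next_r24 = r5 if r5 is not None else 0           # R5 -> R24
        r13, r14, r5, r24 = next_r13, next_r14, next_r5, next_r24
    return r5, r24


r5, r24 = accumulate([1, 2, 3, 4, 5])
# r5 holds 1 + 3 + 5 and r24 holds 2 + 4; adding the two gives the unit's sum.
```

Because the adder result needs two cycles to return to its own input, two interleaved partial sums build up, which is why one further addition is needed after the last data item has been read.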

[0011] The same addition is executed simultaneously in the arithmetic units other than arithmetic unit 0, but in those units the data of register R5 is moved to register R6 at time T₉. When the addition in each arithmetic unit has finished, the CPU sends an instruction to obtain the total of the addition results of the arithmetic units, and each arithmetic unit starts the summation processing.

[0012] First, arithmetic unit 1 starts processing and moves the k-byte pieces of register R6, which is divided into four (m = 4), sequentially to register R7b (T₁₀ to T₁₃). At the next timing in each case, the data of register R7b is transferred to register R13 of arithmetic unit 0 over the dedicated bus 12-1 (T₁₁ to T₁₄). That is, by T₁₄ the addition data stored in register R6 of arithmetic unit 1 has been moved to register R13 of arithmetic unit 0.
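The piecewise transfer from register R6 of one arithmetic unit to register R13 of its neighbour can be modelled as a short loop. The following is a minimal sketch under stated assumptions: the function name `transfer_word` is hypothetical, the pieces are sent in byte order from index 0 upward (the source does not state the order), and one loop step stands for one time slot.

```python
def transfer_word(r6_word: bytes, k: int, m: int):
    """Move a km-byte word through R7b into the neighbour's R13, k bytes per slot."""
    pieces = [r6_word[i * k:(i + 1) * k] for i in range(m)]  # R6 divided into m parts
    r7b = None             # k-byte staging register in the sending unit
    r13 = [None] * m       # m-part register in the receiving unit
    slots = 0
    for step in range(m + 1):
        if r7b is not None:
            r13[step - 1] = r7b  # receiver latches what R7b held in the previous slot
        r7b = pieces[step] if step < m else None  # selector S3 feeds the next piece
        slots += 1
    return b"".join(r13), slots


word, slots = transfer_word(b"\x01\x02\x03\x04", k=1, m=4)
# With k = 1 and m = 4 this takes five slots, matching T10 to T14 in the text.
```

The point of the model is that a km-byte result costs m + 1 slots per hop on the k-byte bus, so the gather phase grows with both m and the number of arithmetic units.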

[0013] In arithmetic unit 0, the data of register R13 is moved to register R14 at time T₁₅, and at T₁₆ it is added to register R24, which holds the addition data of arithmetic unit 0, and the result is recorded in register R5. The data of register R5 is moved to register R24 at T₁₇ in preparation for the next addition.

[0014] In parallel with the above operation, arithmetic unit 2 moves the data of register R6 to register R7b at times T₁₂ to T₁₅, and at the next timing in each case the data is moved to the part of register R13 of arithmetic unit 1 that stores the least significant k bytes (T₁₃ to T₁₆). At the next timing, the data from arithmetic unit 2 held in the least-significant-byte part of register R13 passes through selector S3 to register R7b (T₁₄ to T₁₇), and at the timing after that it is moved to register R13 of arithmetic unit 0 (T₁₅ to T₁₈), where the same addition as described for the data from arithmetic unit 1 is executed.

[0015] Arithmetic unit 3 also starts operating at time T₁₄ and proceeds in the same way as described for arithmetic unit 2. Its data passes through registers R13 and R7b of arithmetic units 2 and 1 and is transferred to register R13 of arithmetic unit 0. At T₂₄ the total sum is stored in register R5, after which it is moved to registers R6 and R7, and the summation processing ends at time T₂₆.

[0016] The conventional configuration described above has four arithmetic units, but a typical vector arithmetic processing unit is composed of many more. As the number of arithmetic units increases, the time required to transfer data from each arithmetic unit to arithmetic unit 0 (T₉ to T₂₂ in FIG. 13) becomes longer. To shorten this transfer time, the configuration proposal shown in FIG. 14 is conceivable, in which registers R13 and R7b, which take part in the data transfer, are separated out as registers FR and TR that operate at high speed.

[0017]

[Problems to be Solved by the Invention] As described above, when the sum of the data recorded in the vector registers is obtained in a conventional vector arithmetic processing unit, transferring the addition result of each arithmetic unit to the arithmetic unit that computes the total takes a very long time.

[0018] One conceivable way to shorten this transfer time is to separate the registers involved in data transfer into registers that operate at high speed, but the separate configuration increases the amount of hardware and makes the device complex and expensive. An object of the present invention is to provide an arithmetic control method for a vector arithmetic processing unit that is improved so as to shorten the time needed to transfer the data of each arithmetic unit to a particular arithmetic unit.

[0019]

[Means for Solving the Problems] The means adopted by the present invention to solve the above problems will be described with reference to FIG. 1, which is a diagram of the principle of the present invention. The vector arithmetic processing unit has a plurality of arithmetic units, each comprising at least: a register R13 that receives a first operand from the vector register, each of the m divided parts of register R13 being able to receive k-byte data from the subsequent-stage arithmetic unit; a register R6 that receives the result of an operation on a second operand and the data of register R13; a selector S3 that selects one item from among the m divided pieces of data of register R6 and the k-byte data of register R13; and a register R7 divided into a register R7a that receives k(m-1) bytes of data from register R6 and a register R7b that receives k-byte data from selector S3. In this vector arithmetic processing unit there are provided: timing switching means A 100 which, in response to a specific instruction, switches the clock timing that operates register R13 to a high-speed clock timing; timing switching means B 101 which switches the clock timing that operates register R7b to a high-speed clock timing; and selector switching means 102 which switches the generation of the selector signal that operates selector S3 to a high-speed clock timing.

[0020]

[Operation] In response to the specific instruction from the CPU of the vector arithmetic processing unit, timing switching means A 100, timing switching means B 101 and selector switching means 102 start operating. Timing switching means A 100 operates register R13 at high speed, timing switching means B 101 operates register R7b at high speed, and selector switching means 102 sends out a signal that makes selector S3 select at high speed, thereby switching the selector.

[0021] As described above, the specific instruction from the CPU causes the registers and the selector involved in data transfer from a subsequent-stage arithmetic unit to the preceding-stage arithmetic unit to operate at high speed, so the data transfer time can be made very short.

[0022]

[Embodiment] One embodiment of the present invention will be described with reference to FIGS. 2 to 9. FIG. 2 shows the configuration of the embodiment, FIG. 3 is a specific example of the timing control signal generation circuit of the embodiment, FIG. 4 is a specific example of timing switching circuit A, FIG. 5 is a specific example of timing switching circuit B, FIG. 6 is a specific example of the select signal generation circuit, FIG. 7 is a specific example of the selector switching circuit, FIG. 8 is an operation timing chart of the embodiment, and FIG. 9 is a timing chart of the embodiment.

[0023] In FIG. 2, registers R13, R14, R24, R5, R6 and R7, selectors S1, S2 and S3, and ALU 1 are as described with reference to FIG. 12, and timing switching means A 100, timing switching means B 101 and selector switching means 102 are as described with reference to FIG. 1.

[0024] In the embodiment, timing switching means A 100 is composed of counter (Fc) 2, timing control signal generation circuit 3 and timing switching circuit A 4; timing switching means B 101 is composed of timing switching circuit B 5; and selector switching means 102 is composed of select signal generation circuit 6 and selector switching circuit 7.

[0025] As shown in FIG. 3, timing control signal generation circuit 3 is composed of OR circuits 31a to 31f and AND circuits 32a to 32e. The n in the signal Fcn input to the OR circuits corresponds to count value n of counter (Fc) 2, and "1" is input when the count value is n. For the AND circuit inputs "+arithmetic unit 0" and "-arithmetic unit 0", arithmetic unit 0 receives "1" and "0" respectively at the corresponding input terminals, and the other arithmetic units receive the inverse signals, that is, "0" in place of "1" and "1" in place of "0". "SIG" is described later.

[0026] As shown in FIG. 4, timing switching circuit A 4 is composed of AND circuits 41a to 41f and OR circuits 42a to 42d. When the count value of counter (Fc) 2 is 0, the output of AND circuit 41f is the normal clock and the output of AND circuit 41e is "0", so a signal based on the normal clock appears at the outputs of OR circuits 42a to 42d and is supplied as the write timing of register R13. When the count value of counter (Fc) 2 is other than 0, write timing is supplied to registers R13-0 to R13-3 at the timing of the double clock, based on the control signals (R13-0 to R13-3 control) from timing control signal generation circuit 3.
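The behaviour of timing switching circuit A can be summarised as a clock multiplexer. The sketch below is a simplified truth-level model, assuming boolean clock levels and a list of four control bits. The function name `r13_write_timing` and its parameters are hypothetical and do not correspond to named signals in FIG. 4.

```python
def r13_write_timing(fc_count, normal_clock, double_clock, r13_control):
    """Return the write-timing level for each part R13-0..R13-3.

    fc_count     -- current value of counter (Fc) 2
    normal_clock -- level of the normal clock (bool)
    double_clock -- level of the double clock (bool)
    r13_control  -- per-part control bits from the timing control signal circuit
    """
    if fc_count == 0:
        # Normal operation: every part of R13 is written on the normal clock.
        return [normal_clock] * len(r13_control)
    # Transfer mode: a part is written on the double clock only while
    # its control bit is "1".
    return [double_clock and ctrl for ctrl in r13_control]
```

For example, `r13_write_timing(3, False, True, [True, False, False, True])` enables writes to parts 0 and 3 only, whereas any call with `fc_count == 0` ignores the control bits and follows the normal clock.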

[0027] As shown in FIG. 5, timing switching circuit B 5 is composed of AND circuits 51a and 51b and OR circuit 52. The write timing to register R7a is therefore unchanged, but the write timing to register R7b is switched between the normal clock and the double clock signal based on the R7b control signal from timing control signal generation circuit 3.

[0028] As shown in FIG. 6, select signal generation circuit 6 is composed of AND circuits 61a to 61c, RS flip-flop 62, counter 63 and 5-decoder 64. The signal Fc(n) input to AND circuit 61a corresponds to the number n of the arithmetic unit in which the select signal generation circuit 6 is mounted; when the count value of counter (Fc) 2 is n, "1" is input, RS-FF 62 is set, and counter 63 starts counting on the double clock. The 5-decoder 64 detects that the count value of counter 63 has reached 5 and blocks the double clock signal that is input to counter 63 through AND circuit 61c. The S3 control signal output from counter 63 therefore takes values from 0 to 5. RS-FF 62 and counter 63 are reset to 0 when the count value of counter (Fc) 2 is 7.
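The select signal generation circuit behaves like a start-triggered counter that saturates at 5 and is cleared when counter (Fc) reaches 7. The following toy model illustrates that behaviour. It is a sketch under assumptions: the class name `SelectSignalGenerator`, the `start_count` parameter and the helper `run` are hypothetical, the counter is assumed to advance on the same double-clock edge that sets the flip-flop, and two double-clock edges are assumed per value of counter (Fc).

```python
class SelectSignalGenerator:
    """Toy model of RS-FF 62, counter 63 and 5-decoder 64."""

    def __init__(self, start_count):
        self.start_count = start_count  # Fc value that sets RS-FF 62 for this unit
        self.ff_set = False             # state of RS-FF 62
        self.counter = 0                # value of counter 63 (the S3 control signal)

    def tick(self, fc):
        """Advance by one double-clock edge and return the S3 control value."""
        if fc == 7:                     # Fc = 7 resets the flip-flop and counter
            self.ff_set = False
            self.counter = 0
            return self.counter
        if fc == self.start_count:
            self.ff_set = True
        if self.ff_set and self.counter < 5:  # 5-decoder gates the clock at 5
            self.counter += 1
        return self.counter


def run(start_count):
    """Trace the S3 control value over one Fc cycle (Fc = 0..7)."""
    gen = SelectSignalGenerator(start_count)
    return [gen.tick(fc) for fc in range(8) for _ in range(2)]
```

Calling `run(1)` shows the value climbing from 0 to 5 in five double-clock edges, holding at 5, and returning to 0 when counter (Fc) reaches 7.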

[0029] As shown in FIG. 7, selector switching circuit 7 is composed of AND circuits 71a and 71b and OR circuit 72. The selector signal to selector S3 is a numeric signal corresponding to the select terminal number; a decoder (not shown) inside selector S3 decodes the value of this signal and selects the corresponding terminal. When the count value of counter (Fc) 2 is 0, the normal control signal is applied to selector S3; when the count value is other than 0, the S3 control signal output from select signal generation circuit 6 is applied to selector S3.

[0030] Next, the operation of the embodiment will be described with reference to FIGS. 8 and 9, for the same vector register summation that was described for the conventional example. In FIG. 8, during times T₀ to T₈, each arithmetic unit adds the data of its own vector register in the same way as described for FIG. 13 of the conventional example. During times T₉ to T₁₅, at twice the rate of the timing described for FIG. 13, the data added in each arithmetic unit and recorded in register R6 is transferred sequentially from the subsequent-stage arithmetic units to the preceding-stage arithmetic units, and the total is obtained in arithmetic unit 0. During times T₁₆ to T₁₉, the same processing is performed as in times T₂₃ to T₂₆ described for FIG. 13 of the conventional example.

[0031] The data transfer on the double clock during times T₉ to T₁₅ begins when counter (Fc) 2 starts counting. Counter (Fc) 2 wraps around at count value 7, and it starts counting in response to an instruction from the CPU 15. When each arithmetic unit has finished adding the data of its vector register 13 (T₈), the CPU issues an instruction to obtain the total of the addition results of the arithmetic units. In response to this instruction, counter (Fc) 2 starts counting (T₉).

[0032] In FIG. 9, the normal clock is the clock timing at which the arithmetic units normally operate, the double clock is a clock timing with twice the frequency of the normal clock, and SIG is a pulse with a 50% duty cycle relative to the normal clock, whose output is "1" during the periods drawn as solid lines.

[0033] First, the operation of timing control signal generation circuit 3 of arithmetic unit 0 will be described. For arithmetic unit 0, the signal "+arithmetic unit 0" input to AND circuits 32a to 32d in FIG. 3 is "1" and "0" is input to AND circuit 32e. The output of AND circuit 32a (R13-0 control) therefore carries a pulse during the periods in which the count value of counter (Fc) is 1, 3 or 5, ANDed with signal SIG. In the same way, AND circuits 32b and 32c and OR circuit 31e output the pulses shown as R13-1 to R13-3 control in FIG. 9, respectively.

[0034] In the arithmetic units other than arithmetic unit 0, as described above, "0" is input to "+arithmetic unit 0" and "1" to "-arithmetic unit 0". No pulse is therefore sent to the outputs corresponding to AND circuits 32a to 32c (R13-0 to R13-2), and only OR circuit 31e outputs the pulse shown as R13-3 control, which is "1" while the count value of counter (Fc) is 2 to 6.

[0035] The control signal to register R7b (R7b control) is output as "1" in all arithmetic units while the count value of counter (Fc) is 1 to 7. The R13-0 to R13-3 control signals generated in this way are input to timing switching circuit A 4 and, as described above, cause the data write timing to register R13 to run at double rate. The R7b control signal is input to timing switching circuit B 5 and, as described above, causes the data write timing to register R7b to run at double rate.
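The control signals whose active periods are stated explicitly in the text can be written as simple predicates on the value of counter (Fc). This sketch covers only those three signals; the R13-1 to R13-3 control signals of arithmetic unit 0 are defined by FIG. 9 and are not modelled here. The function names are hypothetical.

```python
def r13_0_control_unit0(fc, sig):
    """R13-0 control in arithmetic unit 0: Fc is 1, 3 or 5, ANDed with SIG."""
    return fc in (1, 3, 5) and sig


def r13_3_control_other_units(fc):
    """R13-3 control in the units other than unit 0: "1" while Fc is 2 to 6."""
    return 2 <= fc <= 6


def r7b_control(fc):
    """R7b control in every arithmetic unit: "1" while Fc is 1 to 7."""
    return 1 <= fc <= 7
```

Each predicate returns the logic level that gates the double clock in the corresponding timing switching circuit, so a `True` result means the register part may be written at double rate in that interval.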

[0036] In select signal generation circuit 6, RS-FF 62 is set when the count value of counter (Fc) matches the value corresponding to the arithmetic unit number n, and counter 63 shown in FIG. 6 starts counting on the double clock. For example, select signal generation circuit 6 of arithmetic unit 3 starts counting on the double clock when the count value of counter (Fc) 2 is 1, and stops at count value 5. It is reset to 0 when the count value of counter (Fc) is 7. The S3 control signal output from counter 63 is input to selector switching circuit 7, which makes selector S3 select the terminal with the corresponding number at high speed and output the data to register R7b.

[0037] As a result of the operation described above, the data transfer from each arithmetic unit to arithmetic unit 0, which took from time T₉ to T₂₂ in the conventional example as shown in FIG. 13, is completed in times T₉ to T₁₄ as shown in FIG. 8, so the processing time can be made very short. Although the embodiment described above has four arithmetic units, the number is not limited to four, and the present invention applies to any plurality of arithmetic units.
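The saving can be put in numbers using only the time indices quoted in the description. The short calculation below is a sketch that assumes every time slot T has the same length and counts slots inclusively; the helper name `slots` is hypothetical.

```python
def slots(first, last):
    """Number of time slots from T_first to T_last inclusive."""
    return last - first + 1


conventional_transfer = slots(9, 22)  # T9 to T22 in FIG. 13 -> 14 slots
improved_transfer = slots(9, 14)      # T9 to T14 in FIG. 8  -> 6 slots

conventional_total = slots(0, 26)     # summation ends at T26 in FIG. 13 -> 27 slots
improved_total = slots(0, 19)         # processing ends at T19 in FIG. 8 -> 20 slots

saved = conventional_total - improved_total  # 7 slots saved with four units
```

Under these assumptions the transfer phase shrinks from 14 slots to 6 for four arithmetic units, and the whole summation from 27 slots to 20.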

[0038] Although one embodiment of the present invention has been described above, the present invention is not limited to this embodiment, and various modifications in accordance with the gist of the invention are possible.

[0039]

[Effects of the Invention] As described above, the present invention provides the following effect. The specific instruction from the CPU causes the registers and the selector involved in data transfer from a subsequent-stage arithmetic unit to the preceding-stage arithmetic unit to operate at high speed, so the data transfer time can be made very short.

[Brief description of the drawings]

FIG. 1 is a diagram of the principle of the present invention.

FIG. 2 is a configuration diagram of one embodiment of the present invention.

FIG. 3 is a specific example of the timing control signal generation circuit of the embodiment.

FIG. 4 is a specific example of timing switching circuit A of the embodiment.

FIG. 5 is a specific example of timing switching circuit B of the embodiment.

FIG. 6 is a specific example of the select signal generation circuit of the embodiment.

FIG. 7 is a specific example of the selector switching circuit of the embodiment.

FIG. 8 is an operation timing chart of the embodiment.

FIG. 9 is a timing chart of the embodiment.

FIG. 10 is a configuration diagram of a vector arithmetic processing unit to which the present invention and the conventional example apply.

FIG. 11 is a specific example of the vector register.

FIG. 12 is a configuration diagram of a conventional arithmetic unit.

FIG. 13 is an operation timing chart of the conventional example.

FIG. 14 is a proposed configuration of a conventional arithmetic unit.

[Explanation of symbols]

100: timing switching means A
101: timing switching means B
102: selector switching means
1: arithmetic logic unit (ALU)
2: counter (Fc)
3: timing control signal generation circuit
4: timing switching circuit A
5: timing switching circuit B
6: select signal generation circuit
7: selector switching circuit
11: arithmetic unit
12: dedicated bus
13: vector register
14: system bus
15: processor (CPU)
31, 42, 52, 72: OR circuits
32, 41, 51, 61, 71: AND circuits
62: RS flip-flop (RS-FF)
63: counter
64: 5-decoder
R13, R14, R24, R5, R6, R7, FR, TR: registers
S1, S2, S3: selectors

Claims

(57) [Claims]

1. An arithmetic control method for a vector arithmetic processing unit having a plurality of arithmetic units, each arithmetic unit comprising at least: a register (R13) that receives a first operand from a vector register, each of the registers obtained by dividing the register (R13) into m being able to receive k-byte data from a subsequent-stage arithmetic unit; a register (R6) that receives the result of an operation on a second operand and the data of the register (R13); a selector (S3) that selects one item from among the m-divided data of the register (R6) and the k-byte data of the register (R13); and a register (R7) divided into a register (R7a) that receives k(m-1)-byte data from the register (R6) and a register (R7b) that receives k-byte data from the selector (S3), the method being characterized by comprising: timing switching means A (100) for switching, in response to a specific instruction, the clock timing that operates the register (R13) to a high-speed clock timing; timing switching means B (101) for switching the clock timing that operates the register (R7b) to a high-speed clock timing; and selector switching means (102) for switching the generation of the select signal that operates the selector (S3) to a high-speed clock timing.

2. The arithmetic control method for a vector arithmetic processing unit according to claim 1, wherein the specific instruction is an instruction for transferring the data of each arithmetic unit to the preceding-stage arithmetic unit.