JPS6122831B2

Info

Publication number: JPS6122831B2
Application number: JP18705180A
Authority: JP
Inventors: Shoji Nakatani; Hiroshi Tamura
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-12-27
Filing date: 1980-12-27
Publication date: 1986-06-03
Also published as: JPS57111667A

Description

[Detailed Description of the Invention] The present invention relates to a data processing circuit, and in particular to a data processing circuit that can transfer data of multiple formats, such as double-precision floating-point data and fixed-point data, through a single align circuit using a small number of buses.

In a data processing device that handles vector data, the portion of the large volume of vector data in main memory that is needed for a vector operation is first transferred to a fast-access store such as a vector register. The data in the vector register is then accessed sequentially at high speed so that the operation can be performed at high speed.

The operation results obtained in the vector register are transferred to main memory as necessary.

Placing the vector register between the main memory and the pipelined arithmetic unit in this way means that the arithmetic unit need only exchange data with the vector register, and the main memory likewise need only exchange data with the vector register. The main memory can therefore operate without regard to the arithmetic unit, and the arithmetic unit without regard to the main memory. This arrangement is used as a way to simplify control and achieve high-speed processing.

For example, as shown in FIG. 1, memory 1 consists of four parts, MM-A to MM-D. The elements e₀, e₁, e₂, ... of vector data may be laid out in memory 1 in various ways: at consecutive addresses, at addresses separated by a fixed distance, or at irregular addresses. The arithmetic unit (not shown) reads the vector register sequentially in element order to execute operations, so vector data is stored in the vector register in element order.

Consequently, when vector data stored in memory 1 is loaded into vector registers VR-1, VR-2, etc. of data processing unit 2, the data is transferred on the basis of the read address in memory 1 corresponding to each element. Which of the read data buses BI-0 to BI-3 an element is read out on is therefore determined by that element's address information, whereas which of the data buses BO-0 to BO-3 feeding vector registers VR-1 and VR-2 it must use is determined by its element number, because the vector registers are organized in element order. A matrix of buses must therefore be provided between input registers IR-0 to IR-3 and output registers OR-0 to OR-3; this is align circuit 3. Through align circuit 3, every input bus BI-0 to BI-3 is connected to every output bus BO-0 to BO-3. As a result, each element e₀, e₁, e₂, ... can be accessed regardless of which memory unit MM-A to MM-D it resides in.
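The routing requirement described above can be illustrated with a minimal Python sketch. The interleaving rule (consecutive 8-byte words rotating across the four memory units) and the function names are assumptions made for illustration; the patent only states that the read bus follows from the address and the register-side bus from the element number.

```python
# Sketch only: the interleaving rule below is an assumption, not taken from the patent.
NUM_UNITS = 4    # memory units MM-A to MM-D, one read bus BI-0..BI-3 each
WORD_BYTES = 8   # assumed: consecutive 8-byte words rotate across the units


def input_bus(address: int) -> int:
    """Read bus (BI-n) an element arrives on, fixed by its memory address."""
    return (address // WORD_BYTES) % NUM_UNITS


def output_bus(element_number: int) -> int:
    """Bus (BO-n) into the vector register, fixed by the element number."""
    return element_number % NUM_UNITS


def routes(base: int, stride: int, count: int) -> list[tuple[int, int]]:
    """(BI, BO) pair needed for each element of a strided vector."""
    return [(input_bus(base + e * stride), output_bus(e)) for e in range(count)]


if __name__ == "__main__":
    print(routes(0, 8, 4))   # [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(routes(0, 24, 4))  # [(0, 0), (3, 1), (2, 2), (1, 3)]
```

With a stride of 24 bytes the required (BI, BO) pairs no longer line up one-to-one, which is why every input bus has to be able to reach every output bus.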

The arithmetic unit performs operations on double-precision floating-point data, single-precision floating-point data, fixed-point data, and so on, so the size of the data handled differs from operation to operation, and the data stored in the vector register takes the same format as the data handled by the arithmetic unit. That is, there is double-precision floating-point data as shown in FIG. 2(a), single-precision floating-point data as shown in FIG. 2(b), fixed-point data as shown in FIG. 2(c), and so on.

In main memory, these data are stored as 8-byte or 4-byte items. Align circuit 3 therefore handles data in three ways: as 8-byte data (72 bits), for example double-precision floating-point data; as a 4-byte (36-bit) upper part, for example single-precision floating-point data; or as a 4-byte lower part, for example fixed-point data. When data is handled as the 4-byte upper part, the lower part is of course filled with zeros by a zero generator (not shown); likewise, when data is handled as the 4-byte lower part, the upper part is filled with zeros.
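The zero filling can be sketched as follows. This is an illustration only: it assumes the upper part is the leading four bytes of the 8-byte register, it ignores the extra bits implied by the 36-bit and 72-bit widths, and the function names are hypothetical.

```python
HALF = 4  # bytes in the upper or lower part of an 8-byte register


def as_upper(data: bytes) -> bytes:
    """Place 4-byte data in the upper part; the lower part is zero-filled."""
    if len(data) != HALF:
        raise ValueError("expected exactly 4 bytes")
    return data + bytes(HALF)


def as_lower(data: bytes) -> bytes:
    """Place 4-byte data in the lower part; the upper part is zero-filled."""
    if len(data) != HALF:
        raise ValueError("expected exactly 4 bytes")
    return bytes(HALF) + data


if __name__ == "__main__":
    print(as_upper(b"\x01\x02\x03\x04").hex())  # 0102030400000000
    print(as_lower(b"\x01\x02\x03\x04").hex())  # 0000000001020304
```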

Thus data is handled either as 8-byte data or as a 4-byte upper part and a 4-byte lower part. When data is transferred as 8-byte data, the 8-byte buses of the align circuit are laid out as shown in FIG. 3, but when data is transferred as 4-byte upper and lower parts, individual 4-byte buses must be provided as shown in FIG. 4. Of course, when 4-byte buses are provided as in FIG. 4, the 8-byte transfer of FIG. 3 is also possible; in other words, FIG. 4 subsumes FIG. 3.

The conventional align circuit therefore has the following problem: when there are four input registers IR-0 to IR-3 and four output registers OR-0 to OR-3 as shown in FIG. 4, dividing each register into 4-byte units requires as many as 64 4-byte transfer buses.
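The figure of 64 follows from connecting every 4-byte half of every input register to every 4-byte half of every output register. A short Python enumeration makes the count explicit; labels such as `IR-0.upper` are illustrative names, not reference signs from the patent.

```python
from itertools import product

REGISTERS = 4
HALVES = ("upper", "lower")

# Each 8-byte register is split into two 4-byte halves.
inputs = [f"IR-{r}.{h}" for r in range(REGISTERS) for h in HALVES]
outputs = [f"OR-{r}.{h}" for r in range(REGISTERS) for h in HALVES]

# Conventional align circuit: one 4-byte bus per (input half, output half) pair.
conventional_buses = list(product(inputs, outputs))

if __name__ == "__main__":
    print(len(inputs), len(outputs), len(conventional_buses))  # 8 8 64
```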

Conversely to the load case, operation results are first written into vector registers VR-1 and VR-2, and the result data written there is then stored in main memory 1, as shown in FIG. 5. In this case too, each element from the vector register is transferred on the basis of its write address in main memory 1, so which of the write data buses BO-0′ to BO-3′ is used is determined by the write address in memory 1, while the order in which the align input buses BI-0′ to BI-3′ from data processing unit 2 to align circuit 3′ are driven is determined by the element number. Writing to memory therefore also requires an align circuit 3′ in which every align input bus BI-0′ to BI-3′ is connected to every write bus BO-0′ to BO-3′.

When the data output from data processing unit 2 and input to align circuit 3′ is 8-byte data, 8-byte buses as shown in FIG. 6 are needed; when data is transferred as 4-byte upper and lower parts, 4-byte buses must be provided as shown in FIG. 7. When buses are provided as in FIG. 7, the 8-byte transfer of FIG. 6 is also possible, so FIG. 7 subsumes FIG. 6. Thus, for writes as for loads, when there are four input registers IR-0′ to IR-3′ and four output registers OR-0′ to OR-3′, dividing each register into 4-byte units again requires as many as 64 4-byte transfer buses.

The object of the present invention is therefore to remedy these problems by providing a data processing circuit that reduces the number of buses and can transfer data of various formats with less hardware and without lowering the transfer rate.

To that end, the invention applies to a data processing device comprising: a plurality of memories having one or more independently operable read data buses and one or more write data buses; a data processing unit having a plurality of input buses and a plurality of output buses; and an align circuit that allows part or all of each read data bus of the memories to be transferred to part or all of each input bus of the data processing unit, and part or all of each output bus of the data processing unit to be transferred to the write data buses of the memories.

The data processing circuit of the invention is characterized as follows. Each of the read data buses from the memories, the input buses to the data processing unit, the output buses from the data processing unit, and the write data buses to the memories in the align circuit is divided into a plurality of segments. When the data of a given segment of a read data bus from the memories is transferred to some of the segments of the input buses of the data processing unit, at least one of the segments of the divided read data buses is selected in advance for each input bus of the data processing unit, and the selected segment data can be transferred to each segment of the corresponding input bus. When the data of a given segment of an output bus from the data processing unit is transferred to some of the segments of the write data buses of the memories, at least one of the segments of the divided output buses is selected in advance for each output bus, and the selected segment data can be transferred to any segment of the divided write data buses to the memories.

An embodiment of the present invention is described below with reference to FIGS. 8 to 10.

FIGS. 8 to 10 show the case in which vector data is loaded from memory into the data processing unit.

FIG. 8 shows the 4-byte data transfer buses of the invention, FIG. 9 shows the buses added to FIG. 8 in order to transfer 8-byte data, and FIG. 10 shows the configuration of an embodiment of the invention that combines them.

In the present invention, as shown in FIG. 8, input segments MI-0 to MI-7 are provided by dividing each of input registers IR-0 to IR-3 into an upper part and a lower part, and their outputs are transferred to output segments MO-1, MO-3, MO-5, and MO-7. For an instruction that handles 4-byte data, the data is set in input registers IR-0 to IR-3, taken out through input segments MI-0 to MI-7, sent to output segments MO-1, MO-3, MO-5, and MO-7, and set in the upper or lower part of output registers OR-0 to OR-3. With this arrangement, align circuit 3 needs only 32 4-byte data transfer buses between segments. In addition, one 4-byte transfer bus is needed from each of output segments MO-1, MO-3, MO-5, and MO-7 to the upper part of output registers OR-0 to OR-3, four in total.

Furthermore, as shown in FIG. 9, output segments MO-0, MO-2, MO-4, and MO-6 are provided for 8-byte data transfer, and four 4-byte buses are needed between each of them and input segments MI-0, MI-2, MI-4, and MI-6, 16 in total. As a result, as shown in FIG. 10, align circuit 3 needs only 32 + 4 + 16 = 52 4-byte transfer buses. Compared with the 64 4-byte transfer buses required in FIG. 4, the number of buses is reduced substantially without lowering the transfer rate, and because each bus is 4 bytes wide (for example, 36 bits), the amount of hardware is reduced substantially as well.
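The total of 52 can be checked by enumerating the three groups of connections described for FIGS. 8 to 10. The sketch below is a reading of the text, not a reproduction of the drawings; in particular, the pairing of MO-1 with OR-0, MO-3 with OR-1, and so on is an assumption, since the text only says that each of the four odd output segments has one bus to an upper part.

```python
from itertools import product

mi_all = [f"MI-{i}" for i in range(8)]
mi_even = [f"MI-{i}" for i in (0, 2, 4, 6)]
mo_odd = [f"MO-{i}" for i in (1, 3, 5, 7)]
mo_even = [f"MO-{i}" for i in (0, 2, 4, 6)]

# FIG. 8: every input segment can reach every odd-numbered output segment.
four_byte = set(product(mi_all, mo_odd))

# FIG. 8: one bus from each odd output segment to the upper part of an
# output register (the pairing used here is assumed).
to_upper = {(f"MO-{2 * k + 1}", f"OR-{k}.upper") for k in range(4)}

# FIG. 9: buses added so that the other half of an 8-byte item can be moved.
eight_byte = set(product(mi_even, mo_even))

load_total = len(four_byte) + len(to_upper) + len(eight_byte)

if __name__ == "__main__":
    print(len(four_byte), len(to_upper), len(eight_byte), load_total)  # 32 4 16 52
```

Against the 64 buses of the full 8-by-8 arrangement, this is 12 fewer buses, a reduction of 18.75 percent.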

Another embodiment of the invention, in which a vector store instruction stores data from a data processing unit such as a vector register into memory, is described with reference to FIGS. 11 to 13.

FIG. 11 shows the 4-byte data transfer buses of the invention for a store, FIG. 12 shows the buses added to FIG. 11 in order to transfer 8-byte data, and FIG. 13 shows the configuration of an embodiment that combines them. As shown in FIG. 11, input segments MI-1′, MI-3′, MI-5′, and MI-7′ are provided at the lower parts of input registers IR-0′ to IR-3′. Input segment MI-1′ is arranged so that 4-byte data entering from either the upper or the lower segment of input register IR-0′ can be passed to each of output segments MO-0′ to MO-7′. For an instruction that handles 4-byte data in FIG. 11, the data is set in input registers IR-0′ to IR-3′, taken out through input segments MI-1′, MI-3′, MI-5′, and MI-7′, and sent to the destination among output segments MO-0′ to MO-7′. With this arrangement, align circuit 3 needs only 32 4-byte data transfer buses between segments. In addition, one 4-byte transfer bus is needed from the upper part of each of input registers IR-0′ to IR-3′ to input segments MI-1′, MI-3′, MI-5′, and MI-7′, four in total.

As shown in FIG. 12, input segments MI-0′, MI-2′, MI-4′, and MI-6′ are provided for 8-byte data transfer, and four 4-byte buses are needed between each of them and output segments MO-0′, MO-2′, MO-4′, and MO-6′, 16 in total. As a result, as shown in FIG. 13, align circuit 3′ needs only 32 + 4 + 16 = 52 4-byte transfer buses, so the amount of hardware is reduced substantially without lowering the transfer rate.

A third embodiment of the invention is described with reference to FIGS. 14 to 16. FIG. 14 shows the case in which the align circuit used for vector load instructions, which read data from memory, and the align circuit used for vector store instructions, which write data to memory, are shared. FIG. 15 shows the conventional align circuit used in that case, and FIG. 16 shows the configuration of an embodiment of the invention.

That is, FIG. 16 merges FIG. 10, for loading data from memory into the vector register, with FIG. 18, for storing from the vector register into memory. FIG. 18 is an alternative embodiment to FIG. 17, and FIG. 17 is a redrawn version of FIG. 13.

In FIG. 14, when a vector load instruction is executed, data read from memory units 1-0 to 1-3 of memory 1 passes through input registers R0 to R3 of align circuit 3″ and gates G₁, G₃, G₅, and G₇ to align section 4, and is then sent via output registers R4 to R7 to input registers r₆ and r₇ of data processing unit 2-0 or input registers r₂ and r₃ of data processing unit 2-1. When a vector store instruction is executed, data sent from, for example, registers r₄ and r₅ of data processing unit 2-0 passes through gates G₀ and G₂ to align section 4 and is then stored in memory units 1-0 to 1-3 via output registers R12 to R15. Similarly, data sent from registers r₀ and r₁ of data processing unit 2-1 passes through gates G₄ and G₆ to align section 4 and is then stored in memory units 1-0 to 1-3 via output registers R12 to R15. As shown in FIG. 15, the conventional method therefore needs eight 4-byte transfer buses from each of input segments m-0 to m-7 to output segments m-8 to m-15, 64 4-byte transfer buses in total.

According to the present invention, however, as shown in FIG. 16, when a vector store instruction is executed, one of input segments M-0 and M-0′ is selected and transferred to input segment m-0′, and likewise the outputs of input segments M-1 to M-3′ are transferred to input segments m-2′, m-4′, and m-6′, respectively. When a vector load instruction is executed, on the output side of the align section, one of input segments M-0 to M-3′ is selected onto segment m-9′, which is one of the segments of each output data bus; either the selected output segment m-9′ or output segment m-8′ is then chosen so that output segment m-9′ can be transferred to output segment O-0, and output segment m-9′ is also output to output segment O-1. Similarly, the result of selecting between output segments m-11′, m-13′, and m-15′ and output segments m-10′, m-12′, and m-14′, respectively, is transferred to output segments O-2, O-4, and O-6, and output segments m-11′, m-13′, and m-15′ are transferred to output segments O-3, O-5, and O-7.

This arrangement makes it possible to omit the 4-byte transfer buses from input segments m-1′, m-3′, m-5′, and m-7′ to output segments m-8′, m-10′, m-12′, and m-14′. Thus, whereas the conventional align circuit shown in FIG. 15 needs 64 4-byte transfer buses, the invention needs 32 + 16 + 4 + 4 = 56, even counting the buses that select or left-shift data from input segments M-0′, M-1′, M-2′, and M-3′ to input segments m-0′, m-2′, m-4′, and m-6′ and the buses that select or left-shift data from output segments m-9′, m-11′, m-13′, and m-15′ to output segments O-0, O-2, O-4, and O-6. The number of buses, and hence the hardware, is reduced without lowering the data transfer rate.
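The count of 56 can be reproduced with a short tally. The grouping below is one reading of the text and is labeled as an assumption: the 32 + 16 is taken to be the 48 buses that remain of the 8-by-8 arrangement once the 4-by-4 block from m-1′, m-3′, m-5′, m-7′ to m-8′, m-10′, m-12′, m-14′ is omitted, and each 4 is one group of select/shift buses.

```python
SEGMENTS_PER_SIDE = 8

# Conventional shared circuit (FIG. 15): every input segment to every output segment.
full_crossbar = SEGMENTS_PER_SIDE * SEGMENTS_PER_SIDE

# Buses the text says can be omitted: 4 input segments x 4 output segments (assumed reading).
omitted = 4 * 4

# Select/shift buses added on the input side and on the output side.
input_select = 4
output_select = 4

remaining_crossbar = full_crossbar - omitted
shared_total = remaining_crossbar + input_select + output_select
saving = 1 - shared_total / full_crossbar

if __name__ == "__main__":
    print(remaining_crossbar, shared_total, f"{saving:.1%}")  # 48 56 12.5%
```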

The above description took as its example the buses that select or left-shift data from input segments M-0′, M-1′, M-2′, and M-3′ to input segments m-0′, m-2′, m-4′, and m-6′, and from output segments m-9′, m-11′, m-13′, and m-15′ to output segments O-0, O-2, O-4, and O-6. The same naturally applies when data is selected or right-shifted from input segments M-0, M-1, M-2, and M-3 to input segments m-1′, m-3′, m-5′, and m-7′, and from output segments m-8′, m-10′, m-12′, and m-14′ to output segments O-1, O-3, O-5, and O-7.

In that case, for example, the buses shown by dotted lines in FIG. 16 are provided using input segments M-0′ to M-3′, and buses as shown by dotted lines are provided from output segments m-8′, m-10′, m-12′, and m-14′. This in turn makes it possible to omit the buses connecting input segments m-0′, m-2′, m-4′, and m-6′ to output segments m-9′, m-11′, m-13′, and m-15′. In other words, FIG. 16 merges FIG. 10 with FIG. 18, which is more advantageous than merging FIG. 10 with FIG. 17.

As described above, the invention substantially reduces the number of buses in the align circuit, and hence the amount of hardware, without lowering the transfer rate. As a result, the circuit is not only easier to manufacture but also much easier to control.

Needless to say, the invention is not limited to the data sizes used in the description, such as 8 bytes and 4 bytes, to the number of data buses, or to the presence of input and output registers; it can be configured freely and applied accordingly.

[Brief Description of the Drawings]

FIG. 1 is a diagram explaining data transfer during a load in the data processing device. FIG. 2 is a diagram explaining the data formats. FIGS. 3 and 4 show conventional align circuits for loading. FIG. 5 is a diagram explaining data transfer during a store in the data processing device. FIGS. 6 and 7 show conventional align circuits for storing. FIGS. 8 to 10 are diagrams explaining an align circuit for loading that is one embodiment of the invention. FIGS. 11 to 13 are diagrams explaining an align circuit for storing that is another embodiment of the invention. FIGS. 14 and 15 are diagrams explaining a data processing device having an align circuit shared by loads and stores. FIG. 16 is a diagram explaining a shared load/store align circuit that is another embodiment of the invention. FIG. 17 is a redrawn version of FIG. 13. FIG. 18 is a diagram explaining another embodiment of the align circuit for storing.

In the figures, 1 denotes the memory, 2 the data processing unit, 3, 3′, and 3″ the align circuits, and 4 the align section.

Claims

[Claims]

1. A data processing circuit in a data processing device comprising: a plurality of memories having one or more independently operable read data buses and one or more write data buses; a data processing unit having a plurality of input buses and a plurality of output buses; and an align circuit that allows part or all of each read data bus of the memories to be transferred to part or all of each input bus of the data processing unit and allows part or all of each output bus of the data processing unit to be transferred to the write data buses of the memories, characterized in that: each of the read data buses from the memories, the input buses to the data processing unit, the output buses from the data processing unit, and the write data buses to the memories in the align circuit is divided into a plurality of segments; when the data of a given segment of a read data bus from the memories is transferred to some of the segments of the input buses of the data processing unit, at least one of the segments of the divided read data buses from the memories is selected in advance for each input bus of the data processing unit, and the selected segment data can be transferred to each segment of the corresponding input bus of the data processing unit; and when the data of a given segment of an output bus from the data processing unit is transferred to some of the segments of the write data buses of the memories, at least one of the segments of the divided output buses from the data processing unit is selected in advance for each output bus, and the selected segment data can be transferred to any segment of the divided write data buses to the memories.