JPH0677241B2

JPH0677241B2 - Processor

Info

Publication number: JPH0677241B2
Application number: JP63069056A
Authority: JP
Inventors: 洋一佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-03-23
Filing date: 1988-03-23
Publication date: 1994-09-28
Anticipated expiration: 2009-09-28
Also published as: JPH01241646A

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to an arithmetic processing unit that forms part of an information processing apparatus, and more particularly to a technique for transferring data between a cache memory and LSI chips in an arithmetic processing unit composed of a cache memory and a plurality of LSI chips.

[Conventional technology]

In recent years, the integration of electronic devices has advanced remarkably, and even high-performance arithmetic processing units have come to be realized with just a few LSI chips.

In such high-performance arithmetic processing units, a cache memory is adopted to speed up processing further. When there are multiple LSI chips, however, the read destinations and write sources of the cache memory are spread across several LSI chips, and providing an individual data path for each would make the pin count of the cache memory enormous. For this reason, the data path is generally implemented as a bus shared by all the LSI chips so that the pin-count limit is respected.

[Problems to Be Solved by the Invention]

As described above, the conventional arithmetic processing unit reduces the pin count of the cache memory by making the data path for cache memory access a bus. This approach, however, has drawbacks such as the following:

(1) As the number of LSI chips connected to the bus grows, the bus wiring becomes longer, and the increased capacitance lengthens the signal delay on the bus, so the cache memory cannot be accessed at high speed.

(2) Because a bus is used, all LSI chips must have the same data width, and an LSI chip with a different data width needs a data alignment circuit in front of its input/output terminals.

In particular, in an arithmetic processing unit whose cache memory access is pipelined, an increase in the cache read time directly prevents shortening of the machine cycle and therefore lowers the performance of the unit, so countering the first drawback (the bus delay) was an important problem. The second drawback (the data alignment circuits) also increases the amount of hardware, so reducing that hardware was likewise an important issue.

The present invention has been proposed in view of the above points, and its object is to provide an arithmetic processing unit that can access the cache memory at high speed while reducing hardware.

[Means for Solving the Problems]

To achieve the above object, the present invention provides an arithmetic processing unit composed of a cache memory and a plurality of LSI chips with non-uniform data widths, in which data is transferred between the cache memory and two or more of the LSI chips, wherein the cache memory and the two or more LSI chips are connected via a chip that has a crossbar switch function capable of connecting arbitrary input/output terminals and a function of aligning data to convert the data width.

[Action]

In the arithmetic processing unit of the present invention, data is transferred between the cache memory and the LSI chips via the chip having the crossbar switch function, and the data width is converted as necessary to match the data width of the destination LSI chip.

[Example]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing one embodiment of an information processing apparatus that includes the arithmetic processing unit of the present invention. In FIG. 1, reference numeral 90 denotes the arithmetic processing unit that is the subject of the invention; it is connected via a system bus 94 to a main storage device 91, an input/output control device 92, and a system control device 93. Although not shown in FIG. 1, in a multiprocessor configuration several additional arithmetic processing units are connected to the system bus 94, and when the main storage capacity is increased, multiple main storage devices are connected to the system bus 94.

The arithmetic processing unit 90 consists of the LSI chips that make up an instruction control circuit 10, an address translation control circuit 20, a bus control circuit 30, an arithmetic control circuit 40, a high-speed arithmetic circuit 50, and a control storage circuit 60; a control store 85 composed of several random access memories (RAMs); cache memories 83 and 84; an address array (AA) 81; a copy address array (CAA) 82; and a crossbar switch 70 composed of several LSI chips.

Next, the read operation on the cache memories 83, 84 and the main storage device 91 is described. First, a read request for an instruction or operand and the read address are transferred from the instruction control circuit 10 to the address translation control circuit 20 via connection 102. If the read address is a virtual address, it is translated into a real address within the address translation control circuit 20. The address translation control circuit 20 outputs the real read address on connections 201, 202, 203, and 204. The address array 81 stores the correspondence between the cache memories 83, 84 and the main storage device 91, that is, the registration information of the cache memories 83, 84, and determines whether an entry is registered. Based on the signal returned from the address array 81 via connection 202′, the address translation control circuit 20 determines whether there is a cache hit (entry registered). On a cache hit, the data read from cache memory 83 or cache memory 84 is treated as valid and returned through the crossbar switch 70 to the destination LSI chip. The destination is generally the instruction control circuit 10 for an instruction read and the arithmetic control circuit 40 for an operand read, but in special operations it may be the address translation control circuit 20 or the high-speed arithmetic circuit 50. If there is no cache hit (called a cache miss or NFB), the bus control circuit 30 sends a block transfer request to the main storage device 91 via the system bus 94. The data returned from the main storage device 91 passes through the bus control circuit 30 and is then written into cache memory 83 or cache memory 84 via connection 307, the crossbar switch 70, and connection 837 or connection 847. The first data returned from the main storage device 91 is also forwarded from the crossbar switch 70 to the destination. The read operation is carried out in this way.

Next, the write operation on the cache memories 83, 84 and the main storage device 91 is described. First, the write request and write address are generated in the instruction control circuit 10, either when it decodes an instruction that requires a write operation or when a microprogram performs a write operation, and are sent to the address translation control circuit 20 via connection 102. If the write address is a virtual address, it is translated into a real address by the address translation control circuit 20 and then held in a register in that circuit that holds the write address. Once the write data has been prepared in the high-speed arithmetic circuit 50 or elsewhere, the write to cache memory 83 or cache memory 84 is performed, and the write request, write address, and write data for the main storage device 91 are sent to the bus control circuit 30. The write to cache memory 83 or cache memory 84 takes place only if the corresponding address is registered in cache memory 83 or cache memory 84. The bus control circuit 30 then performs the write to the main storage device 91 via the system bus 94. The write data is prepared in the arithmetic control circuit 40, mainly under microprogram control, and sent via connection 405 to a register in the high-speed arithmetic circuit 50 that holds write data. It is then synchronized with the write address, sent to the crossbar switch 70 via connection 507, and transferred to the bus control circuit 30 and to cache memory 83 or cache memory 84. The write operation is carried out in this way.
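By way of illustration only, the read and write policies described above (registration in the cache on a read miss, write-through to main storage on every write, and a cache write only when the address is already registered) can be sketched in Python. The class name, the method names, and the use of a dictionary keyed by address are assumptions made for this sketch; the embodiment works on 32-byte blocks and separate hardware circuits, which the sketch does not model.

```python
class WriteThroughCacheModel:
    """Hypothetical, word-granular model of the read/write policy in the text."""

    def __init__(self, main_storage):
        self.main_storage = main_storage  # dict: address -> value
        self.registered = {}              # cache contents: address -> value

    def read(self, address):
        """Return (value, hit). A miss fetches from main storage and registers it."""
        if address in self.registered:
            return self.registered[address], True
        value = self.main_storage[address]
        self.registered[address] = value  # registered in the cache on a miss
        return value, False

    def write(self, address, value):
        """Always write main storage; update the cache only if already registered."""
        if address in self.registered:
            self.registered[address] = value
        self.main_storage[address] = value
```

In this sketch a write to an unregistered address leaves the cache untouched, so the following read of that address is a miss that picks up the new value from main storage.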

The read and write operations on the cache memories 83, 84 and the main storage device 91 are carried out as described above. As shown in the figure, every data line that carries data is laid out as a one-to-one connection between the LSI chips that make up the circuits. Connections other than the one selected by the crossbar switch 70 have no influence, and the LSI chips can be mounted on the package so that the wire length of each access path is as short as possible, so the delay caused by the data lines on the package can be reduced substantially. To put it in terms of the embodiment of FIG. 1: in a conventional unit, connections 207, 107, 407, 507, 307, 837, and 847 would be joined in parallel as a bus, so the total wire length is long, the capacitance increases, and the delay in data transfer becomes large. According to the present invention, only the capacitance of the connection selected by the crossbar switch 70 matters and the access path can be made the shortest possible, so the delay due to capacitance can be reduced substantially.

Next, FIG. 2 is a block diagram showing an example of the internal configuration of the crossbar switch 70 in FIG. 1. In FIG. 2, 847, 837, 307, 207, 507, 407, and 107 are, as shown in FIG. 1, the connections to cache memory 84, cache memory 83, the bus control circuit 30, the address translation control circuit 20, the high-speed arithmetic circuit 50, the arithmetic control circuit 40, and the instruction control circuit 10, respectively. Although drawn in simplified form in the figure, connections 847, 837, 307, 207, 507, and 107 have a data width of, for example, 8 bytes (64 bits). Only connection 407 has a different data width, for example 4 bytes. Selectors 710 to 716 and input/output drivers are provided for connections 847, 837, 307, 207, 507, 407, and 107, respectively. Connection 205, the control line of the crossbar switch 70, supplies select signals 205-S0 to 205-S6 for selectors 710 to 716 and output enable signals 205-E0 to 205-E4 for the drivers, so that the address translation control circuit 20 controls each of the selectors 710 to 716 independently. For example, to read data from cache memory 83 to the instruction control circuit 10, selector 716 connects connection 107 to connection 837.
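As a rough behavioral sketch only, the arrangement of one independently controlled selector per connection can be modeled in Python as follows. The port numbers are the connection numbers of FIG. 2; the class, its methods, and the use of `None` to stand for a disabled output driver are assumptions of this sketch and do not describe the actual select or enable signal encoding.

```python
# Illustrative model only: names are hypothetical, not taken from the patent.
PORTS = (847, 837, 307, 207, 507, 407, 107)  # connection numbers of FIG. 2


class Crossbar:
    """One selector per port; each selector picks which port drives it."""

    def __init__(self):
        # None means the output driver of that port is disabled.
        self.select = {port: None for port in PORTS}

    def connect(self, dst, src):
        """Set the selector of port `dst` so that it is driven by port `src`."""
        if dst not in self.select or src not in self.select:
            raise ValueError("unknown port")
        self.select[dst] = src

    def disconnect(self, dst):
        """Disable the output of port `dst`."""
        self.select[dst] = None

    def transfer(self, inputs):
        """Map {source port: data} to {destination port: data} for one cycle."""
        return {
            dst: inputs[src]
            for dst, src in self.select.items()
            if src is not None and src in inputs
        }
```

The example in the text, a read from cache memory 83 to the instruction control circuit 10, corresponds to `Crossbar().connect(107, 837)` followed by `transfer({837: data})`. Because each selector is set independently, a second pair such as `connect(307, 507)` can carry data in the same cycle.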

As another feature of the present invention, the crossbar switch 70 has a function of converting the data width, so that LSI chips with non-uniform data widths can be coupled to one another. For example, consider a data read to the arithmetic control circuit 40 (as noted above, only connection 407 has a different data width, for example 4 bytes). On a cache access, selector 715 selects the input data of connection 837 or connection 847 according to the read address, and the select signal 205-S5 further causes it to select either the upper 4 bytes or the lower 4 bytes of the 8 bytes according to the read address. In this way the 8-byte data is returned to the arithmetic control circuit 40 as 4-byte data. For a data read to another LSI chip, for example the instruction control circuit 10, the data width of connection 107 is 8 bytes, the same as the cache memories 83, 84 and so on, so no 4-byte selection is needed.
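The 8-byte to 4-byte selection performed for connection 407 can be sketched as below. The text states only that the half is chosen according to the read address; which address bit is used and how "upper" maps to byte positions are not given, so this sketch assumes that address bit 2 (value 4) selects the half and that the first four bytes of the sequence are the upper half. The function name is hypothetical.

```python
def narrow_to_4_bytes(data8: bytes, read_address: int) -> bytes:
    """Return the 4-byte half of an 8-byte cache word chosen by the read address.

    Assumptions (not stated in the source): address bit 2 picks the half, and
    bytes 0..3 of `data8` are the upper half.
    """
    if len(data8) != 8:
        raise ValueError("expected an 8-byte word")
    use_upper_half = (read_address & 0x4) == 0
    return data8[:4] if use_upper_half else data8[4:]
```

For an 8-byte destination such as connection 107 this step is simply skipped, which is why only the selector for connection 407 needs the extra select information.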

Next, FIG. 3 shows part of the internal configuration of the address translation control circuit 20 in FIG. 1. In FIG. 3, the request code is a code supplied from the instruction control circuit 10 that contains information specifying a read operation, a write operation, or the like, and the request address is the read or write address supplied from the instruction control circuit 10 (after translation to a real address if the supplied address is a virtual address).

The operation is described below. First, when a request code and a request address are supplied on connection 20-101 and connection 20-201, the request code is set in request code register 20-10 and the request address is set in real address register 20-20. In the normal state, when a request is accepted and the request address is set in real address register 20-20, part of the request address is simultaneously set in AA address register 20-30 and in DA address register 20-40 or DA address register 20-41. For a read or write operation, addresses are supplied from AA address register 20-30 and DA address registers 20-40 and 20-41 on connections 202 to 204, the address array 81 and cache memory 83 or cache memory 84 are read, and the address array 81 is checked for a cache hit. For a read operation, on a cache hit the data read from cache memory 83 or cache memory 84 is returned to the destination via the crossbar switch 70. Which of cache memory 83 and cache memory 84 supplies the read data is determined by the value of one predetermined bit in the request address: cache memory 83 (bank #0) is selected when the bit is "0", and cache memory 84 (bank #1) is selected when it is "1". If there is no cache hit (a cache miss), the block transfer address for the main storage device 91 is sent from real address register 20-20 through selector 20-23 to the bus control circuit 30 on connection 201. When the first piece of the block transfer data read by the bus control circuit 30 is returned, it is returned to the destination via the crossbar switch 70 and at the same time registered in cache memory 83 or cache memory 84. With a block size of 32 bytes and a data transfer width of 8 bytes, a block transfer consists of four 8-byte transfers. If the banks of the cache memories 83, 84 are divided at the fifth bit from the low-order end of the address, that is, on 16-byte boundaries, the block transfer data is written twice (16 bytes) to each of cache memory 83 and cache memory 84.
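The arithmetic in the preceding paragraph can be checked with a short sketch. The constants are the example values from the text (32-byte block, 8-byte transfer width, bank split at the fifth bit from the low-order end). The function names are hypothetical, and the ascending order of the four transfers is an assumption; the text does not state the order in which the block is returned.

```python
BLOCK_SIZE = 32      # bytes per block (example value from the text)
TRANSFER_WIDTH = 8   # bytes per transfer (example value from the text)
BANK_BIT = 4         # fifth bit from the low-order end: 16-byte boundary


def bank_of(address: int) -> int:
    """0 selects cache memory 83 (bank #0), 1 selects cache memory 84 (bank #1)."""
    return (address >> BANK_BIT) & 1


def block_transfers(miss_address: int):
    """List (address, bank) for each 8-byte transfer of the block holding the miss.

    Ascending order is assumed here purely for illustration.
    """
    base = miss_address & ~(BLOCK_SIZE - 1)
    return [
        (base + offset, bank_of(base + offset))
        for offset in range(0, BLOCK_SIZE, TRANSFER_WIDTH)
    ]
```

For any miss address the list has four entries, two with bank 0 and two with bank 1, which matches the statement that each cache memory is written twice (16 bytes) per block.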

When a write operation is set in request code register 20-10, the address array 81 is referenced and cache memory 83 or cache memory 84 is read; the request address (write address) is then set from real address register 20-20 into real address register 20-22, and the data read from cache memory 83 or cache memory 84 is set in data register 20-50. Information on whether there was a cache hit is input to decoder 20-11 and set in request code register 20-12. For a write operation, processing thus moves from the first stage (request code register 20-10 and real address register 20-20) to the second stage (request code register 20-12 and real address register 20-22), which frees the first stage so that subsequent requests can be accepted. This arrangement is possible because a write operation has to wait for its write data.

The request code and request address of the write operation set in the second-stage request code register 20-12 and real address register 20-22 wait until write data has been prepared in the write data register in the high-speed arithmetic circuit 50, and the write is performed once the data is ready. Although it is not directly part of the present invention, in this embodiment, on a cache hit, even a partial write that does not rewrite all the data within the data width (for example 8 bytes) is turned into a full write that rewrites all the data within the data width, which in particular improves the processing speed of writes to the main storage device 91. Specifically, once the address array 81 has been referenced and cache memory 83 or cache memory 84 has been read, the data read from cache memory 83 or cache memory 84 is held in data register 20-50 via connection 207. When the write data is ready, the crossbar switch 70 receives both the write data transferred from the high-speed arithmetic circuit 50 via connection 507 and the pre-write data transferred from data register 20-50 of the address translation control circuit 20 via selector 20-51 and connection 207, swaps data in byte units, and creates new write data. A write mask is provided with one bit per byte (8 bits for an 8-byte data width), and only the bytes whose mask bit is "1" replace the pre-write data: for a byte whose write mask bit is "1" the write data on connection 507 is selected, and for a byte whose write mask bit is "0" the pre-write data on connection 207 is selected. The write mask is sent to the crossbar switch 70 on connection 507 together with the write data; after being received by write mask receiving unit 720, it is used to control the selectors in the same way as the control signals on connection 205. Through this operation, on a cache hit even a write operation that is not a full write can be presented to the bus control circuit 30 and the main storage device 91 as a full write. On a cache hit the contents of data register 20-50 are the pre-write data, so the above processing is possible; on a cache miss the contents are undefined (only parity is guaranteed), so the write cannot be converted into a full write. In that case a 2-byte write, for example, is sent to the bus control circuit 30 as a 2-byte partial write as is, and no write to the cache memories 83, 84 is performed. In general, the main storage device 91 has an error correction code (ECC) for each 8 bytes and corrects single-bit read errors. For a write other than a full 8-byte write, such as a 2-byte partial write, it must therefore read the corresponding 8-byte-aligned data, replace only the 2 bytes of write data, regenerate the error correction code over the 8 bytes, and write it together with the data, which can take longer than a full write. To make up for this delay, the above processing is performed in advance in the cache memories 83, 84 within the arithmetic processing unit 90, and the write is presented to the main storage device 91 as a full write, shortening the processing time of the main storage device 91.
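The byte-wise merge controlled by the write mask, and the difference between the hit and miss cases, can be sketched as follows. The function names and the return format are hypothetical. To avoid assuming a bit order for the 8-bit mask, the sketch represents the mask as a sequence of eight 0/1 values, one per byte.

```python
def merge_write(pre_write: bytes, write_data: bytes, mask) -> bytes:
    """Per byte: mask 1 selects the new write data, mask 0 keeps the pre-write data."""
    if not (len(pre_write) == len(write_data) == len(mask)):
        raise ValueError("data and mask lengths must match")
    return bytes(w if m else p for p, w, m in zip(pre_write, write_data, mask))


def prepare_store(cache_hit: bool, pre_write: bytes, write_data: bytes, mask):
    """Return ("full", merged word) on a hit, ("partial", masked bytes) on a miss.

    On a miss the pre-write data is undefined, so no merge is attempted and
    only the bytes selected by the mask are passed on as a partial write.
    """
    if cache_hit:
        return "full", merge_write(pre_write, write_data, mask)
    return "partial", bytes(w for w, m in zip(write_data, mask) if m)
```

With a mask of `[1, 1, 0, 0, 0, 0, 0, 0]`, a hit yields a full 8-byte word in which only the first two bytes are new, while a miss yields just those two bytes as a partial write.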

Although it is also not directly part of the present invention, in FIG. 3 the request code register and the real address register each have two stages, so that a write and a read can be performed simultaneously on the cache memories 83, 84, which are divided into two banks. The operation is described below for the case where a write operation is set in the second-stage request code register 20-12 and real address register 20-22 and a read operation is set in the first-stage request code register 20-10 and real address register 20-20. The behavior differs depending on which cache memory banks the write and the read target. As noted above, the bank is selected by the value of one predetermined bit in the request address.

(1) Same bank

In this case the second-stage write operation takes priority. Part of the write address (the contents of real address register 20-22) is set in DA address register 20-40 or DA address register 20-41 via selectors 20-23, 20-42, and 20-43, the write address for cache memory 83 or cache memory 84 is secured, and the write is performed. The first-stage read operation waits until the write operation has finished.

(2) Different banks

In this case, if for example the write targets bank #0 (cache memory 83) and the read targets bank #1 (cache memory 84), part of the write address is set in DA address register 20-40, and part of the read address is set in AA address register 20-30 and DA address register 20-41. The second stage therefore secures the address of cache memory 83 through DA address register 20-40, creates the write data via connections 507 and 207, writes the data to cache memory 83 via connection 837, and at the same time sends the write data to the bus control circuit 30 via connection 307 to write to the main storage device 91. In parallel, the first stage secures the addresses of the address array 81 and cache memory 84 through AA address register 20-30 and DA address register 20-41 and reads the data of cache memory 84 via connection 847. If the read destination is the instruction control circuit 10 or the arithmetic control circuit 40, the read data can be returned. The high-speed arithmetic circuit 50 and the address translation control circuit 20, however, are in use by the second-stage write operation, so a read destined for either of them is not possible.
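The two cases above amount to a simple rule for when a first-stage read may proceed alongside a second-stage write. A minimal sketch follows, assuming the bank bit is the fifth bit from the low-order end as in the earlier example, and identifying read destinations by the reference numerals of FIG. 1. The function name and the constant names are hypothetical.

```python
BANK_BIT = 4  # assumed bank-select bit (16-byte boundary, as in the example)

# Circuits whose crossbar connections are occupied by the second-stage write:
# high-speed arithmetic circuit 50 and address translation control circuit 20.
BUSY_DURING_WRITE = {50, 20}


def read_can_overlap(write_address: int, read_address: int, read_destination: int) -> bool:
    """True if a first-stage read may run in parallel with a second-stage write."""
    write_bank = (write_address >> BANK_BIT) & 1
    read_bank = (read_address >> BANK_BIT) & 1
    if write_bank == read_bank:
        return False  # case (1): the write has priority and the read waits
    # case (2): different banks, but the destination must not be in use
    return read_destination not in BUSY_DURING_WRITE
```

For instance, a write to bank #0 and a read from bank #1 destined for the instruction control circuit 10 can overlap, whereas the same read destined for the high-speed arithmetic circuit 50 cannot.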

[Effect of the Invention]

As described above, the arithmetic processing unit of the present invention does not use a bus for the data path for reading and writing the cache memory; instead, the cache memory and the LSI chips are connected directly through a chip having a crossbar switch function. The total wire length of the data path over which data is transferred can therefore be made as short as possible, which makes high-speed cache memory access achievable. In addition, because the chip having the crossbar switch function converts the data width as needed, a mix of LSI chips with different data widths can be handled simply by changing the control signals. No special data alignment circuit is required, so the amount of hardware can be reduced.

[Brief description of drawings]

FIG. 1 is a block diagram of an information processing apparatus that includes the arithmetic processing unit of the present invention, FIG. 2 is an internal block diagram of the crossbar switch in FIG. 1, and FIG. 3 is a diagram showing part of the internal configuration of the address translation control circuit in FIG. 1.

In the figures: 90: arithmetic processing unit; 91: main storage device; 92: input/output control device; 93: system control device; 94: system bus; 10: instruction control circuit; 20: address translation control circuit; 30: bus control circuit; 40: arithmetic control circuit; 50: high-speed arithmetic circuit; 60: control storage circuit; 70: crossbar switch; 81: address array; 82: copy address array; 83, 84: cache memories; 85: control store.

Claims

[Claims]

1. An arithmetic processing unit comprising a cache memory and a plurality of LSI chips having non-uniform data widths, in which data is transferred between the cache memory and two or more of the LSI chips, characterized in that the cache memory and the two or more LSI chips are connected via a chip that has a crossbar switch function capable of connecting arbitrary input/output terminals and a function of aligning data to convert the data width.