JPH0719227B2

JPH0719227B2 - Processor

Info

Publication number: JPH0719227B2
Application number: JP63069057A
Authority: JP
Inventors: 洋一佐藤
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1988-03-23
Filing date: 1988-03-23
Publication date: 1995-03-06
Anticipated expiration: 2010-03-06
Also published as: JPH01241647A

Description

[Detailed Description of the Invention]

[Industrial Field of Application]

The present invention relates to an arithmetic processing unit that forms part of an information processing apparatus, and in particular to a technique for data transfer between a cache memory and LSI chips in an arithmetic processing unit composed of a cache memory and a plurality of LSI chips.

[Conventional technology]

In recent years, the integration of electronic devices has advanced remarkably, and even high-performance arithmetic processing units can now be implemented with only a few LSI chips.

Such high-performance arithmetic processing units adopt a cache memory to further increase processing speed. When there are multiple LSI chips, however, the read destinations and write sources of the cache memory span multiple LSI chips, and providing a separate data path for each would make the pin count of the cache memory enormous. The data path is therefore generally made into a bus shared by the LSI chips so that the pin count stays within its limit.

[Problems to be Solved by the Invention]

As described above, the conventional arithmetic processing unit reduced the pin count of the cache memory by making the data path for cache access a bus. This approach, however, had the following drawbacks.

(1) As the number of LSI chips connected to the bus grows, the bus wiring becomes longer, and the increased capacitance lengthens the delay of signals on the bus, so the cache memory cannot be accessed at high speed.

(2) For writes to the main storage device connected to the arithmetic processing unit, it is preferable for processing speed to write the entire data width, so-called full writing. Conventionally, however, a separate circuit was needed to convert writes into full writes, which increased the hardware.

In particular, in an arithmetic processing unit that pipelines cache memory access, a longer cache read time directly prevents the machine cycle from being shortened and thus lowers the performance of the unit, so countering the bus delay was an important problem. The separate circuit for full writing also increases the hardware, so reducing it was an important issue as well.

The present invention has been proposed in view of the above points. Its object is to provide an arithmetic processing unit that can access the cache memory at high speed and that reduces the hardware needed for full writing.

[Means for Solving the Problems]

To achieve the above object, the present invention is an arithmetic processing unit composed of a cache memory and a plurality of LSI chips, in which data is transferred between the cache memory and two or more of the LSI chips, configured as follows. The cache memory and the two or more LSI chips are connected via a chip having a crossbar switch function. By means of selectors, this chip can connect any of its input/output terminals to one another, and, for the byte positions indicated by a write mask, it can replace, byte by byte, the contents of data input from one input/output terminal with the contents of data input from another input/output terminal and output the result. An address translation control circuit, formed by one of the LSI chips, contains a data register that, when data is written to the cache memory, reads out through the crossbar chip and holds the pre-write data that hit in the cache memory. The write data obtained as the result of an operation, the write mask indicating the byte positions of that write data to be replaced, and the pre-write data held in the data register are input to the crossbar chip to generate full-write data for writing to the main storage device and the cache memory, and the generated full-write data is written to the main storage device and the cache memory.

[Action]

In the arithmetic processing unit of the present invention, data is transferred between the cache memory and the LSI chips via the chip having the crossbar switch function. When a partial write of write data obtained as the result of an operation hits in the cache memory, the pre-write data that hit is read into the data register in the address translation control circuit and held there. The write data, the write mask indicating the byte positions of the write data to be replaced, and the pre-write data held in the data register are then input to the crossbar chip, which generates full-write data for the main storage device and the cache memory, and this full-write data is written to both.

[Embodiment]

Embodiments of the present invention are described in detail below with reference to the drawings.

FIG. 1 is a block diagram showing an embodiment of an information processing apparatus that includes the arithmetic processing unit of the present invention. In FIG. 1, reference numeral 90 denotes the arithmetic processing unit that is the subject of the invention; it is connected via a system bus 94 to a main storage device 91, an input/output control device 92, and a system control device 93. Although not shown in FIG. 1, in a multiprocessor configuration several additional arithmetic processing units are connected to the system bus 94, and when more main storage capacity is needed, multiple main storage devices are connected to the system bus 94.

The arithmetic processing unit 90 comprises the LSI chips that form an instruction control circuit 10, an address translation control circuit 20, a bus control circuit 30, an arithmetic control circuit 40, a high-speed arithmetic circuit 50, and a control storage circuit 60; a control store 85 made up of several random access memories (RAMs); cache memories 83 and 84; an address array (AA) 81; a copy address array (CAA) 82; and a crossbar switch 70 made up of several LSI chips.

Next, the read operation for the cache memories 83 and 84 and the main storage device 91 is described. First, a read request for an instruction or operand and its read address are transferred from the instruction control circuit 10 to the address translation control circuit 20 via connection 102. If the read address is a virtual address, the address translation control circuit 20 translates it into a real address. The address translation control circuit 20 outputs the real read address on connections 201, 202, 203, and 204. The address array 81 stores the correspondence between the cache memories 83 and 84 and the main storage device 91, that is, the registration information of the cache memories, and determines whether an address is registered; from the signal it returns via connection 202′, the address translation control circuit 20 determines whether there is a cache hit (the address is registered). On a cache hit, the data read from cache memory 83 or cache memory 84 is treated as valid and returned via the crossbar switch 70 to the destination LSI chip. The destination is generally the instruction control circuit 10 for an instruction read and the arithmetic control circuit 40 for an operand read, although in special operations it may be the address translation control circuit 20 or the high-speed arithmetic circuit 50.

When there is no cache hit (called a cache miss or NFB), the bus control circuit 30 sends a block transfer request to the main storage device 91 via the system bus 94. The data returned from the main storage device 91 passes through the bus control circuit 30 and is written to cache memory 83 or cache memory 84 via connection 307, the crossbar switch 70, and connection 837 or connection 847. The first data returned from the main storage device 91 is also forwarded from the crossbar switch 70 to the destination. The read operation is carried out in this way.

Next, the write operation for the cache memories 83 and 84 and the main storage device 91 is described. First, the write request and write address are generated in the instruction control circuit 10, either when it decodes an instruction that requires a write operation or when a microprogram executes a write operation, and are sent to the address translation control circuit 20 via connection 102. If the write address is a virtual address, it is translated into a real address by the address translation control circuit 20 and then held in a write-address register inside that circuit. Once the write data has been prepared in the high-speed arithmetic circuit 50 or elsewhere, the write to cache memory 83 or cache memory 84 is performed, and the write request, write address, and write data for the main storage device 91 are sent to the bus control circuit 30. The write to cache memory 83 or cache memory 84 takes place only if the address is registered in that cache memory. The bus control circuit 30 then performs the write to the main storage device 91 via the system bus 94.

The write data is prepared in the arithmetic control circuit 40, mainly under microprogram control, and sent via connection 405 to a write-data register in the high-speed arithmetic circuit 50. It is then sent, in synchronization with the write address, via connection 507 to the crossbar switch 70 and transferred to the bus control circuit 30 and to cache memory 83 or cache memory 84. The write operation is carried out in this way.

Data read and write operations for the cache memories 83 and 84 and the main storage device 91 are carried out as described above. As shown in the figure, all data lines over which data is transferred are laid out as one-to-one connections between the LSI chips that form the circuits. Connections other than the one selected by the crossbar switch 70 have no effect, and the LSI chips can be mounted on the package so that the access paths are as short as possible, so the delay caused by the data lines on the package can be greatly reduced. Applied to the embodiment of FIG. 1, a conventional device would have connections 207, 107, 407, 507, 307, 837, and 847 joined in parallel as a bus, so the total wiring length would be long and the increased capacitance would lengthen the data transfer delay. In the present invention, only the capacitance of the connections selected by the crossbar switch 70 matters and the access path can be made the shortest possible, so the delay due to capacitance is greatly reduced.

Next, FIG. 2 is a block diagram showing an example of the internal configuration of the crossbar switch 70 in FIG. 1. In FIG. 2, connections 847, 837, 307, 207, 507, 407, and 107 are, as shown in FIG. 1, connected to cache memory 84, cache memory 83, the bus control circuit 30, the address translation control circuit 20, the high-speed arithmetic circuit 50, the arithmetic control circuit 40, and the instruction control circuit 10, respectively. Although simplified in the figure, connections 847, 837, 307, 207, 507, and 107 each have a data width of, for example, 8 bytes (64 bits). Only connection 407 has a different data width, for example 4 bytes. Selectors 710 to 716 and input/output drivers are provided for connections 847, 837, 307, 207, 507, 407, and 107, respectively. Connection 205, the control line of the crossbar switch 70, supplies the select signals 205-S0 to 205-S6 for selectors 710 to 716 and the output enable signals 205-E0 to 205-E4 for the drivers, so that the address translation control circuit 20 controls each of the selectors 710 to 716 independently. For example, to read data from cache memory 83 to the instruction control circuit 10, selector 716 connects connection 107 to connection 837.
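The per-port selector arrangement can be illustrated with a small software model. This is only a sketch under stated assumptions: the class `CrossbarModel` and its methods are hypothetical names, the output enable signals are not modeled, and only the connection and selector numbers come from the text.

```python
from dataclasses import dataclass, field

# Connection numbers from the text, keyed to the selector provided for each one.
SELECTOR_FOR_PORT = {847: 710, 837: 711, 307: 712, 207: 713,
                     507: 714, 407: 715, 107: 716}


@dataclass
class CrossbarModel:
    """Hypothetical software model: one independently controlled selector per port."""
    # source[dst] = src: the selector for port dst forwards whatever arrives on port src
    source: dict = field(default_factory=dict)

    def connect(self, src: int, dst: int) -> int:
        """Point the selector of port dst at port src; return that selector's number."""
        if src not in SELECTOR_FOR_PORT or dst not in SELECTOR_FOR_PORT:
            raise ValueError("unknown connection number")
        if src == dst:
            raise ValueError("a port cannot be connected to itself")
        self.source[dst] = src
        return SELECTOR_FOR_PORT[dst]

    def transfer(self, inputs: dict) -> dict:
        """Return the data driven onto each connected output port."""
        return {dst: inputs[src] for dst, src in self.source.items() if src in inputs}


xbar = CrossbarModel()
# Cache memory 83 (connection 837) -> instruction control circuit 10 (connection 107)
selector = xbar.connect(src=837, dst=107)
outputs = xbar.transfer({837: bytes(range(8))})
```

In this model only the selected source reaches the destination port, which mirrors the point that unselected connections do not load the active path.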

Although not directly part of the present invention, the crossbar switch 70 also has a data width conversion function, so LSI chips with different data widths can be connected to each other. For example, when data is read to the arithmetic control circuit 40 (as noted above, only connection 407 has a different data width, for example 4 bytes), on a cache access selector 715 selects the input data of connection 837 or connection 847 according to the read address, and the select signal 205-S5 is set so that either the upper 4 bytes or the lower 4 bytes of the 8 bytes are also selected according to the read address. The 8-byte data can thus be returned to the arithmetic control circuit 40 as 4-byte data. For reads to other LSI chips, for example the instruction control circuit 10, connection 107 has the same 8-byte width as the cache memories 83 and 84, so no 4-byte selection is needed.
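The 8-byte to 4-byte narrowing can be sketched as follows. The text says only that the half is chosen according to the read address; which address bit is used, and which half counts as "upper", are assumptions made here for illustration, as is the function name `narrow_to_4_bytes`.

```python
def narrow_to_4_bytes(data8: bytes, read_address: int) -> bytes:
    """Pick the upper or lower 4 bytes of an 8-byte cache word."""
    if len(data8) != 8:
        raise ValueError("expected an 8-byte word")
    # Assumption: address bit 2 (value 4) chooses the half,
    # and "upper" means bytes 0-3 of the word.
    if read_address & 0x4:
        return data8[4:8]
    return data8[0:4]


word = bytes([0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17])
upper = narrow_to_4_bytes(word, 0x1000)  # address bit 2 clear
lower = narrow_to_4_bytes(word, 0x1004)  # address bit 2 set
```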

Next, FIG. 3 shows part of the internal configuration of the address translation control circuit 20 in FIG. 1. In FIG. 3, the request code is a code supplied by the instruction control circuit 10 that contains information specifying a read operation, a write operation, and so on. The request address is the read or write address supplied by the instruction control circuit 10 (or, if that address is a virtual address, the address after translation into a real address).

The operation is described below. First, when a request code and a request address are supplied on connections 20-101 and 20-201, the request code is set in the request code register 20-10 and the request address is set in the real address register 20-20. In the normal state, when a request is accepted, part of the request address is set in the AA address register 20-30 and in DA address register 20-40 or DA address register 20-41 at the same time as the request address is set in the real address register 20-20. For a read or write operation, addresses are supplied from the AA address register 20-30 and the DA address registers 20-40 and 20-41 on connections 202 to 204, the address array 81 and cache memory 83 or cache memory 84 are read, and the address array 81 determines whether there is a cache hit.

For a read operation on a cache hit, the data read from cache memory 83 or cache memory 84 is returned to the destination via the crossbar switch 70. Which of the two cache memories supplies the read data is decided by a predetermined bit of the request address: when this bit is "0", cache memory 83 (bank #0) is selected, and when it is "1", cache memory 84 (bank #1) is selected.

When there is no cache hit (a cache miss), the block transfer address for the main storage device 91 is sent from the real address register 20-20 through selector 20-23 to the bus control circuit 30 over connection 201. When the first part of the block transfer data read by the bus control circuit 30 is returned, that data is returned to the destination via the crossbar switch 70 and at the same time registered in cache memory 83 or cache memory 84. With a block size of 32 bytes and a data transfer width of 8 bytes, a block transfer consists of four 8-byte transfers. If the banks of the cache memories 83 and 84 are divided by the fifth bit from the low-order end of the address, that is, at 16-byte boundaries, the block transfer data is written twice (16 bytes) to cache memory 83 and twice (16 bytes) to cache memory 84.
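The block transfer arithmetic above can be checked with a short sketch. The constants come from the example figures in the text (32-byte block, 8-byte transfers, bank chosen by the fifth bit from the low-order end, counted here as bit index 4). The function names are hypothetical, and the ascending transfer order is an assumption; the text does not state the order in which the four transfers arrive.

```python
BLOCK_SIZE = 32      # bytes per block (example value from the text)
TRANSFER_WIDTH = 8   # bytes per transfer (example value from the text)
BANK_BIT = 4         # fifth bit from the low-order end, i.e. a 16-byte boundary


def bank_of(address: int) -> int:
    """Return 0 for cache memory 83 (bank #0) or 1 for cache memory 84 (bank #1)."""
    return (address >> BANK_BIT) & 1


def block_transfers(address: int) -> list:
    """List (transfer address, bank) for every 8-byte transfer of the enclosing block."""
    base = address & ~(BLOCK_SIZE - 1)  # align down to the 32-byte block
    # Assumption: transfers are listed in ascending address order.
    return [(base + offset, bank_of(base + offset))
            for offset in range(0, BLOCK_SIZE, TRANSFER_WIDTH)]


transfers = block_transfers(0x1238)
writes_per_bank = [sum(1 for _, bank in transfers if bank == b) for b in (0, 1)]
```

Under these assumptions each block yields four transfers, two landing in each bank, which matches the 16 bytes per bank stated in the text.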

On the other hand, when a write operation is set in the request code register 20-10, the address array 81 is consulted and cache memory 83 or cache memory 84 is read; the request address (write address) is then moved from the real address register 20-20 to the real address register 20-22, and the data read from cache memory 83 or cache memory 84 is set in the data register 20-50. The cache hit/miss information is input to the decoder 20-11 and set in the request code register 20-12. In this way, a write operation moves from the first stage (request code register 20-10 and real address register 20-20) to the second stage (request code register 20-12 and real address register 20-22), which frees the first stage to accept the following request. This handling is possible because a write operation has to wait for its write data.

The request code and request address of the write operation set in the second-stage request code register 20-12 and real address register 20-22 wait until write data has been prepared in the write-data register in the high-speed arithmetic circuit 50, and the write is performed once the write data is ready.

As another feature of the present invention, in this embodiment, on a cache hit even a partial write, which does not rewrite all the data within the data width (for example 8 bytes), is converted into a full write that rewrites all the data within the data width, which in particular improves the processing speed of writes to the main storage device 91. Specifically, once the address array 81 has been consulted and cache memory 83 or cache memory 84 has been read, the data read from the cache memory is held in the data register 20-50 via connection 207. When the write data is ready, the crossbar switch 70 receives the write data transferred from the high-speed arithmetic circuit 50 over connection 507 and the pre-write data transferred from the data register 20-50 of the address translation control circuit 20 through selector 20-51 and connection 207, swaps data byte by byte, and creates new write data. A write mask with one bit per byte (8 bits for an 8-byte data width) is provided, and only the bytes whose mask bit is "1" are replaced in the pre-write data. That is, for a byte whose mask bit is "1" the write data on connection 507 is selected, and for a byte whose mask bit is "0" the pre-write data on connection 207 is selected. The write mask is sent to the crossbar switch 70 over connection 507 together with the write data; after being received by the write mask receiving section 720, it is used to control the selectors in the same way as the control signals on connection 205. With this operation, on a cache hit even a write operation that is not a full write can be presented to the bus control circuit 30 and the main storage device 91 as a full write. In other words, conversion to full writing becomes possible.

On a cache hit the contents of the data register 20-50 are the pre-write data, so the processing above is possible. On a cache miss the contents are undefined (only parity is guaranteed), so conversion to a full write cannot be performed. In that case a 2-byte write, for example, is sent to the bus control circuit 30 unchanged as a 2-byte partial write, and no write to the cache memories 83 and 84 is performed.

The main storage device 91 generally holds an error correction code (ECC) for every 8 bytes so that single-bit read errors can be corrected. For any write other than an 8-byte full write, such as a 2-byte partial write, it must therefore read the corresponding 8-byte-aligned data, replace only the 2 bytes being written, regenerate the error correction code for the 8 bytes, and write it together with the data, which can take longer than a full write. To avoid this delay, the processing above is carried out in advance in the cache memories 83 and 84 inside the arithmetic processing unit 90, so that the main storage device 91 sees a full write and its processing time is shortened.
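The byte-wise merge controlled by the write mask can be sketched as below. The function names are hypothetical, and the mapping of mask bits to byte positions (most significant mask bit for byte 0) is an assumption; the text states only that there is one mask bit per byte, that "1" selects the write data, and that "0" selects the pre-write data.

```python
def merge_full_write(write_data: bytes, pre_write: bytes, mask: int) -> bytes:
    """Build 8-byte full-write data from partial write data and pre-write data."""
    if len(write_data) != 8 or len(pre_write) != 8:
        raise ValueError("both operands must be 8 bytes")
    merged = bytearray(8)
    for i in range(8):
        # Assumption: the most significant mask bit corresponds to byte 0.
        take_new = (mask >> (7 - i)) & 1
        merged[i] = write_data[i] if take_new else pre_write[i]
    return bytes(merged)


def store_request(hit: bool, write_data: bytes, pre_write: bytes, mask: int) -> tuple:
    """Return (kind, data, mask) as sent toward the bus control circuit."""
    if hit:
        # Cache hit: the held pre-write data is valid, so the write becomes a full write.
        return ("full", merge_full_write(write_data, pre_write, mask), 0xFF)
    # Cache miss: the held data is undefined, so the partial write is passed on unchanged.
    return ("partial", write_data, mask)


full = store_request(True, b"ABCDEFGH", b"abcdefgh", 0b11000000)
partial = store_request(False, b"ABCDEFGH", b"abcdefgh", 0b11000000)
```

With a full write, the main storage device can regenerate the ECC for the 8 bytes directly, without first reading the old data.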

Although not directly part of the present invention, in FIG. 3 the request code register and the real address register each have two stages, so a write and a read can be performed simultaneously on the cache memories 83 and 84, which are divided into two banks. The operation is described below for the case where a write operation is set in the second-stage request code register 20-12 and real address register 20-22 and a read operation is set in the first-stage request code register 20-10 and real address register 20-20. The operation differs depending on which cache memory banks are written and read. As described above, the bank is selected by a predetermined bit of the request address.

(1) Same bank

In this case the second-stage write operation takes priority. Part of the write address (the contents of the real address register 20-22) is set in DA address register 20-40 or DA address register 20-41 through selectors 20-23, 20-42, and 20-43, which secures the write address for cache memory 83 or cache memory 84, and the write is performed. The first-stage read operation waits until the write operation has finished and is then performed.

(2) When the operations target different banks, for example a write to bank #0 (cache memory 83) and a read from bank #1 (cache memory 84), part of the write address is set in the DA address register 20-40, and part of the read address is set in the AA address register 20-30 and the DA address register 20-41. In the second stage, therefore, the DA address register 20-40 secures the address of the cache memory 83, the write data is created via the connections 507 and 207, and the data is written to the cache memory 83 via the connection 837. At the same time, the write data is sent to the bus control circuit 30 via the connection 307 and written to the main memory 91. In parallel with this, in the first stage, the AA address register 20-30 and the DA address register 20-41 secure the addresses of the address array 81 and the cache memory 84, and the data of the cache memory 84 is read out via the connection 847. At this time, if the read destination is the instruction control circuit 10 or the arithmetic control circuit 40, the read data can be returned. However, because the high-speed arithmetic circuit 50 and the address translation control circuit 20 are in use by the second-stage write operation, data cannot be read out to them.
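
The two cases above can be summarized in a small sketch. It is an illustration only, not the circuit of FIG. 3: the bank-select bit position, the destination labels, and the function names are all assumptions introduced here, and treating a read to a busy destination as "wait for the write" is one possible reading of the statement that such a read is not possible.

```python
from typing import List

# Hypothetical bit position. The text only says that one predetermined
# bit of the request address selects the bank.
BANK_SELECT_BIT = 4

# Illustrative labels for the read destinations that stay free while the
# second-stage write is in progress (circuits 10 and 40 in the text).
FREE_DURING_WRITE = {"instruction_control_10", "arithmetic_control_40"}


def bank_of(address: int) -> int:
    """Return 0 for bank #0 (cache memory 83) or 1 for bank #1 (cache memory 84)."""
    return (address >> BANK_SELECT_BIT) & 1


def schedule(write_addr: int, read_addr: int, read_dest: str) -> List[str]:
    """Return the sequence of steps; each list entry is one step in time."""
    wb, rb = bank_of(write_addr), bank_of(read_addr)
    if wb == rb:
        # Case (1): same bank. The write has priority and the read waits.
        return [f"write bank{wb}", f"read bank{rb}"]
    if read_dest in FREE_DURING_WRITE:
        # Case (2): different banks and a free destination, so both overlap.
        return [f"write bank{wb} + read bank{rb}"]
    # Case (2) with a destination used by the write (circuits 50 or 20).
    # Assumption: the read is deferred until the write has finished.
    return [f"write bank{wb}", f"read bank{rb}"]
```

With these assumed values, `schedule(0x00, 0x10, "instruction_control_10")` yields a single overlapped step, while the same addresses with a destination outside `FREE_DURING_WRITE` yield two sequential steps.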

[Effects of the Invention]

As described above, the arithmetic processing unit of the present invention does not use a bus for the data paths that read and write the cache memory. Instead, a chip having a crossbar switch function connects the cache memory directly to the LSI chips. This makes it possible to minimize the total line length of the data path over which a data transfer takes place, and thus to achieve high-speed cache memory access. In addition, the chip having the crossbar switch function uses the partial write data and the pre-write data to create the full write data for the main memory, which makes a full write possible. A separate circuit for this purpose is therefore no longer needed, as it was in the conventional design, and the amount of hardware can be reduced.
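
The creation of full write data from partial write data and pre-write data can be sketched as a byte-wise merge under a write mask. This is a minimal illustration, not the circuit of FIG. 2: the function name, the 4-byte word in the usage note, and the mask encoding (bit i set means byte i comes from the new write data) are assumptions, since the text only states that the mask indicates byte positions.

```python
def merge_full_write(pre_write: bytes, write_data: bytes, write_mask: int) -> bytes:
    """Build full write data from pre-write data and partial write data.

    Assumed encoding: if bit i of write_mask is set, byte i of the result
    is taken from write_data; otherwise it is kept from pre_write.
    """
    if len(pre_write) != len(write_data):
        raise ValueError("pre-write data and write data must have the same width")
    return bytes(
        new if (write_mask >> i) & 1 else old
        for i, (old, new) in enumerate(zip(pre_write, write_data))
    )
```

For example, with pre-write data `11 22 33 44`, write data `AA BB CC DD`, and mask `0b0101`, bytes 0 and 2 are replaced and the merged word is `AA 22 CC 44`. The same merged word would then go to both the cache memory and the main memory, as the paragraph above describes.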

[Brief description of drawings]

FIG. 1 is a block diagram of an information processing apparatus that includes the arithmetic processing unit of the present invention, FIG. 2 is a diagram of the internal configuration of the crossbar switch in FIG. 1, and FIG. 3 is a diagram showing part of the internal configuration of the address translation control circuit in FIG. 1.

In the figures: 90 ... arithmetic processing unit, 91 ... main memory, 92 ... input/output control device, 93 ... system control device, 94 ... system bus, 10 ... instruction control circuit, 20 ... address translation control circuit, 30 ... bus control circuit, 40 ... arithmetic control circuit, 50 ... high-speed arithmetic circuit, 60 ... control storage circuit, 70 ... crossbar switch, 81 ... address array, 82 ... copy address array, 83, 84 ... cache memory, 85 ... control storage, 20-50 ... data register.

Claims

[Claims]

1. An arithmetic processing unit comprising a cache memory and a plurality of LSI chips, wherein the cache memory and the two or more of the LSI chips with which data is transferred are connected to each other via a chip having a crossbar switch function, the chip being capable of connecting its input/output terminals to one another and capable of outputting data input from one input/output terminal with its contents replaced, in byte units at the byte positions indicated by a write mask, by the contents of data input from another input/output terminal;
wherein an address translation control circuit, which is one of the LSI chips, is provided with a data register that, when data is written to the cache memory, reads out through the chip having the crossbar switch function, and holds, the pre-write data that was hit in the cache memory; and wherein the write data obtained as a result of an operation, a write mask indicating the byte positions at which the write data is to be substituted, and the pre-write data held in the data register are input to the chip having the crossbar switch function to generate full write data for writing to a main memory and to the cache memory, and the generated full write data is written to the main memory and the cache memory.