JP4674865B2 - Semiconductor integrated circuit

Info

Publication number: JP4674865B2
Application number: JP2006293469A
Authority: JP
Inventors: 道明中山; 秀樹榊原; 徹小林; 修一宮岡; 勇治横山; 英雄澤本; 正二久米
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2006-10-30
Filing date: 2006-10-30
Publication date: 2011-04-20
Anticipated expiration: 2020-02-03
Also published as: JP2007080283A

Description

The present invention relates to a semiconductor integrated circuit having memory blocks, and more particularly to a technique for improving the throughput of data read operations performed in response to read access requests. It is effective when applied, for example, to a semiconductor integrated circuit for a cache memory in which DRAM is embedded together with logic circuits.

The storage hierarchy of a storage device, designed around the temporal and spatial locality of information references, generally consists of multiple levels of memory that differ in access speed and storage capacity. DRAM (dynamic random access memory), which has a low cost per bit, is used for the main memory, and a cache memory built from SRAM (static random access memory) or the like is placed at a level close to the processor or CPU (central processing unit). The cache memory holds data that is temporally and spatially local to the data the processor has recently used, and thereby makes it possible to achieve higher throughput than a data read operation from a lower level.

After completing the present invention, the inventors were informed of the existence of Japanese Patent Laid-Open Nos. Hei 2-297791 and Hei 6-195261. These documents describe providing a dynamic memory (DRAM) and a static memory (SRAM) on a single-chip semiconductor substrate and using the DRAM and the SRAM as a cache memory. However, they do not describe the object of the present invention or its configuration.

Japanese Patent Laid-Open No. Hei 2-297791
Japanese Patent Laid-Open No. Hei 6-195261

The inventors studied embedding a large number of DRAM modules, which have relatively slow access speeds, together with logic circuits so that they can be used as a cache memory. For example, they studied a DRAM-embedded semiconductor integrated circuit that can be used as a level 3 (L3) cache memory for a microprocessor with built-in level 1 (L1) and level 2 (L2) cache memories.

According to the inventors' study, when many DRAM modules are embedded and made operable in parallel in order to shorten the apparent memory read cycle, contention caused by the parallel operation, such as contention between data output operations, must be avoided. When a data buffer is adopted to avoid such data contention, the inventors found that buffering the data even when no contention has occurred is wasteful.

Considering the data processing efficiency of the processor, the first priority is to improve the throughput of read operations that respond to the processor's read accesses. Read operations of a cache memory also include reads for copy-back (or write-back) that accompany write accesses by the processor, and in most cases such read operations do not require high throughput. This is because copy-back is an operation that saves the data of a dirty cache line to the main memory so that the line can be replaced on a cache miss. The inventors therefore found that, when use as a cache memory is considered, the improvement of read data throughput should be weighted according to the purpose of the read data, so that the logic scale of the logic circuit does not grow needlessly.

For the processor's write accesses, speeding up the write processing itself is not so important. Considering the processor's data processing efficiency, however, the write access request must be accepted and the processor released from that operation in a short time. In the case of DRAM in particular, the stored information must be refreshed at every refresh interval, and this refresh must not delay the acceptance of write access requests.

An object of the present invention is to provide a semiconductor integrated circuit that can improve the throughput of read operations in a configuration that employs a data buffer to avoid data contention caused by the parallel operation of memory blocks.

Another object of the present invention is to provide a semiconductor integrated circuit that can improve the throughput of read operations without needlessly increasing the logic scale of the logic circuit.

A further object of the present invention is to provide a semiconductor integrated circuit that can easily accept write access requests regardless of the internal memory operation state.

The above and other objects and novel features of the present invention will become apparent from the description of this specification and the accompanying drawings.

The following is a brief summary of representative inventions disclosed in the present application.

[1] A read buffer is employed to avoid data contention caused by the parallel operation of memory blocks, and the throughput of read operations is thereby improved. To this end, the semiconductor integrated circuit has: a plurality of memory blocks (BNK0 to BNK7) operable in parallel; external interface means (I/F1) capable of receiving write data from outside and outputting read data to the outside; read buffers (RB0 to RB3) capable of holding read data read from the memory blocks in response to a state in which the read data cannot be output to the outside from the external interface means; and selection means (40, 41) that, when the output-disabled state has been resolved, selects either read data read from the memory blocks or read data read from the read buffers and supplies it to the external interface means.

With the above means, if a read operation is performed on another memory block while read data from one of the parallel-operable memory blocks is being output to the outside from the external interface means, the new read data would cause resource contention for the external output. It is therefore stored temporarily in the read buffer and, after the preceding data output operation has finished, becomes available for output from the read buffer to the outside. Consequently, even when a read access request would cause resource contention in the read data output operation, the read operation can be started without making the request wait, and once the risk of resource contention is gone the read data can be output from the buffer to the outside immediately. In this respect the throughput of the read data output operation can be improved.

If there is no such resource contention when data is read from a memory block, the read data is output directly from the external interface means to the outside without passing through the read buffer. This avoids the waste of buffering data even when no data contention has occurred, which also contributes to improving the throughput of the read data output operation.
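The read-path behavior described in item [1] can be illustrated with a small cycle-level model. This is only a sketch under stated assumptions, not the circuit of the patent: the function name `simulate_reads`, the fixed burst length of four cycles, and the single shared output port are all hypothetical choices made for illustration.

```python
from collections import deque


def simulate_reads(arrivals, burst=4):
    """Toy model of the read path: bypass when the port is free, buffer otherwise.

    arrivals: list of (cycle, tag) sorted by cycle, the cycle at which a
              bank's read data becomes available (hypothetical input format).
    burst:    cycles the external port stays busy per read (assumption).
    Returns a list of (output_start_cycle, tag, path).
    """
    pending = deque()   # stands in for the read buffers RB0 to RB3
    port_free_at = 0    # first cycle at which the external port is free
    outputs = []

    def drain(until):
        # Output buffered data for as long as the port frees up in time.
        nonlocal port_free_at
        while pending and (until is None or port_free_at <= until):
            tag = pending.popleft()
            outputs.append((port_free_at, tag, "buffered"))
            port_free_at += burst

    for cycle, tag in arrivals:
        drain(cycle)
        if not pending and port_free_at <= cycle:
            # No contention: the data bypasses the read buffer.
            outputs.append((cycle, tag, "through"))
            port_free_at = cycle + burst
        else:
            # The port is still busy: hold the data in the read buffer.
            pending.append(tag)

    drain(None)  # flush whatever is still buffered
    return outputs
```

In this model a read that arrives while the port is busy never stalls the bank; it is simply parked and output as soon as the port becomes free, while a read that arrives at an idle port is not buffered at all.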

The read buffer may be formed of a memory that is smaller in capacity and faster than the memory blocks. For example, when the memory blocks are formed of DRAM modules, the read buffer may be formed of an SRAM module.

Described from the viewpoint of control, the semiconductor integrated circuit has: a plurality of memory blocks (BNK0 to BNK7) operable in parallel; read buffers (RB0 to RB3) capable of holding read data read from the memory blocks; external interface means (I/F1) capable of outputting to the outside the read data output from the read buffers and the read data output from the memory blocks; and control means (MCNT) that causes the read buffers to hold read data read from the memory blocks in response to a state in which that read data cannot be output to the outside from the external interface means, and that, when the output-disabled state has been resolved, causes the read data read from the memory blocks or the read data read from the read buffers to be output from the external interface means.

[2] To make it easy to accept external write access requests regardless of the internal memory operation state, the semiconductor integrated circuit has: a plurality of memory blocks (BNK0 to BNK7) operable in parallel; external interface means (I/F1) capable of receiving write data from outside; and write buffers (WB0 to WB3) that receive and hold the write data input to the external interface means and supply the write data to a memory block after that memory block has become ready for a write operation.

Even if a write access is requested while a memory block is performing an internal operation such as a refresh of stored information or a read operation, the write data can be buffered in the write buffer in advance, so the processor or other device performing the write access can be released from the write access operation in a short time. In terms of the data processing efficiency of the processor or the like, speeding up the write processing on the memory side is not so important. Because the processor's write access request is not made to wait, however, the arrangement contributes to improving the data processing efficiency of the whole system.
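The write-buffer behavior of item [2] can be sketched in the same spirit. The class below is a hypothetical software model, not the patented circuit: the class name, the `busy` flag standing in for a refresh or read in progress, and the unbounded buffer depth are all assumptions made for illustration.

```python
from collections import deque


class BankWithWriteBuffer:
    """Toy model of one memory block fronted by a write buffer."""

    def __init__(self):
        self.cells = {}              # stands in for the DRAM cell array
        self.write_buffer = deque()  # stands in for WB0 to WB3 (depth unbounded here)
        self.busy = False            # True while the bank refreshes or reads

    def request_write(self, addr, data):
        """Accept the write at once, whatever the bank is doing."""
        self.write_buffer.append((addr, data))
        return True  # the requester is released immediately

    def tick(self):
        """One internal cycle: retire one buffered write if the bank is idle."""
        if not self.busy and self.write_buffer:
            addr, data = self.write_buffer.popleft()
            self.cells[addr] = data
```

A write issued during a refresh is accepted immediately and reaches the cell array only after the bank becomes idle, which is the decoupling the text describes. A real buffer has finite depth, so a full buffer would still have to stall the requester.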

The write buffer may be formed of a memory that is smaller in capacity and faster than the memory blocks. As above, when the memory blocks are formed of DRAM modules, for example, the write buffer may be formed of an SRAM module.

Described mainly from the viewpoint of control, the semiconductor integrated circuit has: external interface means (I/F1) capable of receiving write data from outside; write buffers (WB0 to WB3) that receive the write data input to the external interface means; a plurality of memory blocks (BNK0 to BNK7) to which write data is supplied from the write buffers; and control means (MCNT) that, in response to an external access request, stores the write data supplied to the external interface means in the write buffer, waits until the access-target memory block becomes ready for a write operation, and then causes the write data to be supplied from the write buffer to the memory block.

[3] A semiconductor integrated circuit that combines both the read buffer and the write buffer configurations has: a plurality of memory blocks (BNK0 to BNK7) operable in parallel; external interface means (I/F1) capable of receiving write data from outside and outputting read data to the outside; write buffers (WB0 to WB3) that receive and hold the write data input to the external interface means and supply the write data to a memory block after that memory block has become ready for a write operation; read buffers (RB0 to RB3) capable of holding read data read from the memory blocks in response to a contention state in which the read data cannot be output to the outside from the external interface means; and selection means (40, 41) that, when the output-disabled contention state has been resolved, selects either read data read from the memory blocks or read data read from the read buffers and supplies it to the external interface means.

[4] Use as a cache memory connectable to both lower-level and upper-level storage hierarchies is assumed. In this case, the semiconductor integrated circuit has: a plurality of memory blocks (BNK0 to BNK7) operable in parallel; first external interface means (I/F1) capable of receiving write data from outside and outputting read data to the outside; and second external interface means (I/F2) capable of receiving write data from outside and outputting read data to the outside. The semiconductor integrated circuit further has: write buffers (WB0 to WB3) that receive and hold write data input to the first or second external interface means and supply the write data to a memory block after that memory block has become ready for a write operation; read buffers (RB0 to RB3) capable of holding read data to be output from the second external interface means, and of holding read data that is to be output from the first external interface means but is in a contention state in which it cannot be output from the first external interface means; and selection means (40, 41) that, when the output-disabled contention state has been resolved, selects either read data read from the memory blocks or read data read from the read buffers and supplies it to the first external interface means.

In this configuration, the first external interface means is connected to the upper storage hierarchy and the second external interface means is connected to the lower storage hierarchy. The basic operation of the read buffers and write buffers for the processor's read and write access requests is the same as described above. Notably, read data is output to the lower storage hierarchy through the second external interface means only by way of the read buffers. This is because the read data output to the lower storage hierarchy is assumed to be a read operation for copy-back (or write-back) accompanying a write access by the processor. Copy-back saves the data of a dirty cache line to the main memory so that the line can be replaced on a cache miss, and such a read operation seldom requires high throughput. The data path that would let read data bypass the read buffers and be output directly from the second external interface means, and the logic circuit for it, are therefore omitted so that the logic scale of the circuit does not grow needlessly.

When application of the semiconductor integrated circuit to a multiprocessor system is considered, another processor will be connected on the lower storage hierarchy side, and the semiconductor integrated circuit may also be operated by accesses from that other processor. To handle this, it suffices that the first and second external interface means can each individually receive, from outside, an access request and an access address for the memory blocks.

In addition, considering the resource contention that can arise when read data is supplied from the lower storage hierarchy through the semiconductor integrated circuit to the upper storage hierarchy, further providing a memory buffer (54) that receives and holds data from the second external interface means and can output the held data to the outside from the second external interface means increases the convenience of the semiconductor integrated circuit as a cache memory.

[5] When the memory blocks are formed of DRAM, for example, the DRAM access time can also be shortened by the known page mode or static column mode. In addition, to shorten the apparent access time of a memory block such as one formed of DRAM, serial-to-parallel conversion is applied to the data input and parallel-to-serial conversion is applied to the data output. That is, the semiconductor integrated circuit includes a memory block having a memory cell array (10), a row selection circuit (11), column selection circuits (12, 13), a serial-to-parallel conversion circuit (21), a write amplifier (17W), a main amplifier (17R), and a parallel-to-serial conversion circuit (25). The memory cell array has a plurality of memory cells, each with its selection terminal connected to a word line and its data input/output terminal connected to a bit line. The row selection circuit responds to a change in the row address strobe signal in synchronization with the clock signal and selects the word line designated by the row address signal. The column selection circuit responds to a change in the column address strobe signal in synchronization with the clock signal and selects, in parallel, a plurality of bit lines designated by the column address signal. The serial-to-parallel conversion circuit converts write data input serially from the write buffer into parallel data in synchronization with the clock signal. The write amplifier outputs the output of the serial-to-parallel conversion circuit in parallel to the bit lines selected by the column selection circuit. The main amplifier amplifies the parallel data output in parallel from the bit lines selected by the column selection circuit. The parallel-to-serial conversion circuit converts the parallel data supplied from the main amplifier into serial data in synchronization with the clock signal and outputs it toward the read buffer and the selection means.

The memory block receives the column address strobe signal, which changes with a period n times the clock signal period (n being an integer of 2 or more). In each cycle in which the column address signal changes, a plurality of serial data items that were read from the memory cell array and parallel-to-serial converted in synchronization with the clock signal cycles are output from the memory block, and parallel data that was input to the memory block in synchronization with the clock signal cycles and serial-to-parallel converted is written to the memory cell array. This access specification, in which the column address strobe signal changes once every n clock cycles, makes it possible to speed up memory operation.

The serial data input path of the serial-to-parallel conversion circuit and the serial data output path of the parallel-to-serial conversion circuit are preferably provided independently. In a read operation, data is read from the memory cell array in response to a change in the column address strobe signal, and the serial data is then output from the memory block after the time needed for parallel-to-serial conversion. In a write operation, by contrast, the conversion of the serial data input to the memory block into parallel data must be completed before the parallel data is written to the memory cell array in response to a change in the column address strobe signal. When a write operation is instructed right after a read operation, it is therefore expected that the serial data for the write will often have to be input to the memory block while the serial data from the read is still being output from the memory block. In other words, the serial data output timing and the serial data input timing of the memory block are highly likely to overlap. Providing the memory block with independent serial data input and output paths, as described above, avoids data collisions even when such processing overlaps and allows efficient processing.

[6] When the propagation delay of read data is considered, the following layout is preferable for the semiconductor integrated circuit. Assume, for example, a center-pad arrangement in which external connection electrodes such as bonding pads or bump electrodes for signal input and output are placed in the central portion of the chip. In this case, memory blocks are placed facing each other, spaced apart, on the semiconductor chip. Between the facing memory blocks are placed read buffers capable of holding read data read from the memory blocks and write buffers capable of holding write data to be supplied to the memory blocks. External interface means is placed near the read buffers and write buffers, and the external connection electrodes are located near the external interface means. The write buffers receive and hold the write data input to the external interface means and supply the write data to a memory block after that memory block has become ready for a write operation. The read buffers can hold read data read from the memory blocks in response to a state in which the read data cannot be output to the outside from the external interface means.

The effects obtained by representative inventions disclosed in the present application are briefly described below.

That is, the throughput of read operations can be improved in a configuration that employs a data buffer to avoid data contention caused by the parallel operation of memory blocks.

The throughput of read operations can be improved without needlessly increasing the logic scale of the logic circuit.

A semiconductor integrated circuit that can easily accept write access requests regardless of the internal memory operation state can be realized.

FIG. 1 shows an overall example of a semiconductor integrated circuit according to the present invention. The semiconductor integrated circuit 1 shown in the figure is, although not limited to this, a semiconductor integrated circuit intended for use as an L3 cache memory. It has eight memory blocks BNK0 to BNK7, four write buffers WB0 to WB3, four read buffers RB0 to RB3, an upper-layer interface block I/F1 connected to the upper storage hierarchy (for example, a processor bus), a lower-layer interface block I/F2 connected to the lower storage hierarchy (for example, a memory bus), and a memory control circuit MCNT.

The upper-layer interface block I/F1 is connected to the upper storage hierarchy, for example a processor bus to which a processor with built-in L1 and L2 cache memories is connected. It receives access control information, including access control signals and access address signals, and inputs and outputs data in, for example, 72-bit parallel form.

The lower-layer interface block I/F2 is connected to the lower storage hierarchy, for example a memory bus to which a main memory or an L4 cache memory is connected, and inputs and outputs data in, for example, 72-bit parallel form. Although not limited to this, a multiprocessor system is assumed: so that a processor other than the one in the upper storage hierarchy can also access the circuit, the lower-layer interface block I/F2 can receive access control information from that other processor and allow access to the memory blocks BNK0 to BNK7.

The memory control circuit MCNT receives the access control information, decodes part of the address information contained in it to determine the access-target memory block, and outputs a local memory address and access control signals to that memory block to control its operation.
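The address decoding performed by MCNT can be illustrated as follows. The text says only that part of the address selects the memory block. Which bits are used is not stated, so the mapping below (32-byte lines interleaved across the eight banks on the low-order line bits) is purely an assumption, as are the names `decode_address`, `NUM_BANKS`, and `LINE_BYTES`.

```python
NUM_BANKS = 8    # BNK0 to BNK7
LINE_BYTES = 32  # one 32-byte parallel DRAM access per line (assumed granularity)


def decode_address(addr):
    """Split a byte address into (bank, local line address).

    Assumption: consecutive lines are interleaved across banks, so the
    bank number comes from the low-order bits of the line index.
    """
    if addr < 0:
        raise ValueError("address must be non-negative")
    line = addr // LINE_BYTES
    bank = line % NUM_BANKS
    local_line = line // NUM_BANKS
    return bank, local_line
```

With this mapping, sequential lines land in different banks, which is one way to let parallel-operable banks overlap their accesses. Other bit assignments would serve the description in the text equally well.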

The memory block BNK0, shown as a representative, sequentially latches write data that is input serially in 72-bit (8-byte) units into four write registers (ILT) 22, so that the data can be written to the DRAM core 8 in 288-bit (32-byte) parallel form. Read data read from the DRAM core 8 in 288-bit parallel form is latched into read registers (OLT) 26 in 72-bit units, and a selector 27 selects the outputs of the read registers 26 in turn so that the read data can be output serially in 72-bit units. The memory block BNK0 can therefore input and output data at four times the speed corresponding to the access time of the DRAM core 8. In this specification, one byte consists of 8 data bits and 1 parity bit.
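The 72-bit to 288-bit conversion can be sketched with plain integers. The widths follow the text (a 72-bit word is eight 9-bit bytes, and four words make one 288-bit access). The function names and the ordering of words inside the 288-bit value (first word in the lowest bits) are assumptions made for illustration.

```python
WORD_BITS = 72        # 8 bytes x (8 data bits + 1 parity bit)
WORDS_PER_ACCESS = 4  # four write registers (ILT) / read registers (OLT)
WORD_MASK = (1 << WORD_BITS) - 1


def serial_to_parallel(words):
    """Pack four serially received 72-bit words into one 288-bit value."""
    if len(words) != WORDS_PER_ACCESS:
        raise ValueError("expected exactly four 72-bit words")
    value = 0
    for i, word in enumerate(words):
        if not 0 <= word <= WORD_MASK:
            raise ValueError("word does not fit in 72 bits")
        value |= word << (i * WORD_BITS)  # assumed ordering: word 0 in the low bits
    return value


def parallel_to_serial(value):
    """Split one 288-bit value into four 72-bit words, as the selector would."""
    return [(value >> (i * WORD_BITS)) & WORD_MASK for i in range(WORDS_PER_ACCESS)]
```

Because one DRAM core access moves four words, the core needs to be accessed only once for every four word transfers on the 72-bit side, which is the source of the fourfold rate mentioned in the text.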

Write data from the upper storage hierarchy that is input to the upper-hierarchy interface block I/F1 is supplied to the memory block BNK0 (BNK1 to BNK7) via the write buffer WB0 (WB1 to WB3).

Read data read from the memory block BNK0 (BNK1 to BNK7) has three output paths: an upper through path, an upper buffering path, and a lower buffering path. The upper through path outputs the data from the upper-hierarchy interface block I/F1 to the upper storage hierarchy via the schematically illustrated selectors 40 and 41. The upper buffering path outputs read data that has first been stored in the read buffer RB0 (RB1 to RB3) from the upper-hierarchy interface block I/F1 to the upper storage hierarchy via the selectors 40 and 41. The lower buffering path outputs read data that has first been stored in the read buffer RB0 (RB1 to RB3) from the lower-hierarchy interface block I/F2 to the lower storage hierarchy via the selector 42. No through path to the lower hierarchy is provided.

The read buffers RB0 to RB3 and the write buffers WB0 to WB3 are constituted by SRAMs, each of which can be accessed in units of one cycle defined by the system clock signal. The SRAM constituting each of the read buffers RB0 to RB3 and the write buffers WB0 to WB3 can be configured in the same manner as a known SRAM. Although not particularly limited, the SRAM comprises a memory array including a plurality of static memory cells, a plurality of word lines, and a plurality of complementary data line pairs; an address decoder that selects a given word line in response to an address signal; a sense amplifier that amplifies the data of the selected memory cells; and a data output circuit that outputs the amplified data.

As described below, each SRAM is configured so that 72 memory cells are selected simultaneously in response to one set of address signals. Each static memory cell includes an information storage section, formed by a pair of CMOS inverters (each comprising an N-channel MOSFET and a P-channel MOSFET) whose inputs and outputs are cross-coupled, and selection transistors consisting of a plurality of N-channel transfer MOSFETs for selecting that information storage section. The gate terminals of the selection transistors are selectively coupled to one or more word lines, and their source-drain paths are coupled to the corresponding one or more pairs of complementary data lines, so that the cell is configured as a multi-input-port, multi-output-port memory cell. Although not particularly limited, each SRAM constituting the read buffers RB0 to RB3 and the write buffers WB0 to WB3 has a 128-word × 72-bit configuration.

Those skilled in the art will readily understand that the configuration of the multi-input-port, multi-output-port memory cell itself can be modified in various ways.

FIG. 2 illustrates details of the read data output paths in the semiconductor integrated circuit 1. The memory blocks BNK0 and BNK4 share the read buffer RB0 and the write buffer WB0. Likewise, the memory blocks BNK1 and BNK5 share the read buffer RB1 and the write buffer WB1, the memory blocks BNK2 and BNK6 share the read buffer RB2 and the write buffer WB2, and the memory blocks BNK3 and BNK7 share the read buffer RB3 and the write buffer WB3. Although not particularly limited, each of the write buffers WB0 to WB3 and the read buffers RB0 to RB3 has two read ports and two write ports. Each port is an 8-byte parallel access port.
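
The pairing listed above follows a simple pattern: memory block BNKn uses the read buffer and write buffer whose number is n modulo 4. A minimal Python sketch, in which the function name shared_buffer_index is hypothetical:

```python
def shared_buffer_index(bank):
    """Return n such that memory block BNK<bank> uses RB<n> and WB<n>."""
    if not 0 <= bank <= 7:
        raise ValueError("bank must be in the range 0 to 7")
    return bank % 4
```

For example, BNK0 and BNK4 both map to index 0, that is, to RB0 and WB0.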

A selector 41Aa is provided to select either the read data from one memory block of a pair, BNK0, or the read data from the other, BNK4. Similar selectors 41Ab to 41Ad are provided for the other memory blocks. S10 to S13 are the selection control signals for the selectors 41Aa to 41Ad. A selector 40Aa is provided to select either the read data output from the read buffer RB0 or the read data selected by the selector 41Aa. Similar selectors 40Ab to 40Ad are provided for the other memory blocks. S20 to S23 are the selection control signals for the selectors 40Aa to 40Ad. One of the outputs of the selectors 40Aa to 40Ad is selected by a selector 41B and supplied to the upper-hierarchy interface block I/F1. The operation of the selector 41B is controlled by the 2-bit selection signal S30A, S30B. The selector 42 selects the output from one read port of the read buffers RB0 to RB3 and supplies it to the lower-hierarchy interface block I/F2. The operation of the selector 42 is controlled by the 2-bit selection signal S31A, S31B.
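
The selector tree toward the upper-hierarchy interface block I/F1 can be modeled as three multiplexer stages. The following Python sketch is illustrative only: the function name and the polarity of each selection signal (which value selects which input) are assumptions, since the specification does not state them.

```python
def select_upper_output(bank_out, rb_out, s1, s2, s30):
    """Model the selector tree that feeds the upper-hierarchy interface I/F1.

    bank_out: list of 8 values, the current outputs of BNK0 to BNK7
    rb_out:   list of 4 values, the current outputs of RB0 to RB3
    s1:       list of 4 bits (S10 to S13); assumed 0 = BNKn, 1 = BNK(n+4)
    s2:       list of 4 bits (S20 to S23); assumed 0 = through, 1 = read buffer
    s30:      integer 0 to 3 formed from S30A and S30B; picks one of four groups
    """
    stage2 = []
    for n in range(4):
        # Selectors 41Aa to 41Ad: choose one memory block of each pair.
        through = bank_out[n + 4] if s1[n] else bank_out[n]
        # Selectors 40Aa to 40Ad: choose the buffered or the through data.
        stage2.append(rb_out[n] if s2[n] else through)
    # Selector 41B: choose one of the four groups for I/F1.
    return stage2[s30]
```

With all S2x signals at 0 the data reaches I/F1 over the upper through path; setting S2n to 1 switches group n to the upper buffering path.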

FIG. 3 illustrates the control signals generated by the memory control circuit MCNT. For each of the memory blocks BNK0 to BNK7, the memory control circuit MCNT outputs an address signal ADRS, a row address strobe signal RAS, a column address strobe signal CAS, a write enable signal WE, and the like. For each of the read buffers RB0 to RB3, and likewise for each of the write buffers WB0 to WB3, it outputs an address signal ADRS, a memory enable signal MS, a read/write signal R/W, and a port select signal PSL. It also outputs the selector selection signals S10 to S13, S20 to S23, S30A, S30B, S31A, and S31B, and output enable signals OEP1 and OEP2 and the like for the interface blocks I/F1 and I/F2. The memory control circuit MCNT receives access control information from both the upper storage hierarchy and the lower storage hierarchy and activates the necessary control signals at predetermined timings so as to carry out the operation that the received access control information specifies. Signals MRef0 to MRef7, which concern the refresh operation periods of the memory blocks BNK0 to BNK7, are input to the memory control circuit MCNT from the respective memory blocks.

The access control information 43 includes an address designation part 43A and an operation designation part 43B, as illustrated in FIG. 4. The address designation part 43A contains information designating which of the memory blocks BNK0 to BNK7 is to be read or written, and address information within that memory block. The operation designation part 43B designates to the semiconductor integrated circuit 1 an operation such as a read or write of 8 bytes of data from the address designated by the address designation part, a read or write of 16 consecutive bytes from that address, or a read or write of 32 consecutive bytes from that address.
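
For illustration, the two parts of the access control information could be represented by a small Python data class. The class name, field names and the validation are assumptions for this sketch; the specification does not define an encoding.

```python
from dataclasses import dataclass

ACCESS_UNIT_BYTES = 8   # one 72-bit transfer carries 8 bytes (parity included)


@dataclass(frozen=True)
class AccessControlInfo:
    bank: int           # address designation part: target memory block (0 to 7)
    local_address: int  # address designation part: address inside the block
    is_write: bool      # operation designation part: read or write
    length_bytes: int   # operation designation part: 8, 16 or 32 bytes

    def __post_init__(self):
        if not 0 <= self.bank <= 7:
            raise ValueError("bank must be 0 to 7")
        if self.length_bytes not in (8, 16, 32):
            raise ValueError("length must be 8, 16 or 32 bytes")

    def transfer_count(self):
        """Number of 8-byte access units needed for this request."""
        return self.length_bytes // ACCESS_UNIT_BYTES
```

A 32-byte read thus corresponds to four consecutive 8-byte transfers, which matches the four read registers of a memory block.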

The memory control circuit MCNT accepts external access requests so as to operate the memory blocks BNK0 to BNK7 in parallel to the extent that resources inside the semiconductor integrated circuit 1 do not conflict. The memory control circuit MCNT also connects one memory block selected from among the memory blocks BNK0 to BNK7, or one read buffer selected from among the read buffers RB0 to RB3, to the interface blocks I/F1 and I/F2 to control the external output of read data.

FIG. 5 representatively shows the main control procedure of the memory control circuit MCNT in response to external access requests.

In response to a write access request, the memory control circuit MCNT first captures the write data in the corresponding one of the write buffers WB0 to WB3, regardless of whether the write target memory block is performing an internal operation such as refresh (T1). It then determines whether the write target memory block is free of internal operations such as refresh and is able to perform a write (T2), and once it determines that a write is possible, the data is written to the target memory block (T3).
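
Steps T1 to T3 can be sketched in Python as follows. This is a behavioral illustration, not the circuit: the class BufferedBank, its attribute names, and the use of a deque and a dict to stand in for the SRAM write buffer and the DRAM core are all assumptions.

```python
from collections import deque


class BufferedBank:
    """One memory block with its write buffer (for example BNK0 with WB0)."""

    def __init__(self):
        self.write_buffer = deque()   # stands in for the SRAM write buffer
        self.core = {}                # stands in for the DRAM core
        self.refreshing = False       # True while an internal operation runs

    def accept_write(self, address, data):
        # T1: always capture the data, whether or not the block is refreshing.
        self.write_buffer.append((address, data))

    def drain(self):
        # T2: check whether the block can perform a write right now.
        if self.refreshing:
            return 0
        # T3: move everything that was buffered into the core.
        written = 0
        while self.write_buffer:
            address, data = self.write_buffer.popleft()
            self.core[address] = data
            written += 1
        return written
```

The requester only ever calls accept_write, so from its point of view a refresh in progress does not delay the write.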

FIG. 6 shows an example of a write operation in which a refresh operation intervenes during the write access. In FIG. 6 the memory block BNK0 is the write target. U1 to U8 each denote an access unit supplied from outside the semiconductor integrated circuit 1, and the data of each access unit is 72 bits in parallel. In operation cycles 4 to 9 of the system the memory block BNK0 performs a refresh operation; before and after that period it can be read and written. The write data is stored sequentially in the write buffer WB0 with a delay of one cycle. As long as the write target memory block BNK0 can be read and written, the write data held in the write buffer WB0 is supplied sequentially, one unit per cycle, to the write registers 22 of the memory block BNK0. By the time the data of access unit U4 is latched in the write registers 22, the memory block BNK0 has already entered the refresh operation. The memory control circuit MCNT therefore suspends the data transfer from the write buffer WB0 to the write registers 22 and waits for the refresh operation to end. Meanwhile, writing of write data into the write buffer WB0 continues. When the refresh operation of the memory block BNK0 completes in operation cycle 9, the memory control circuit MCNT, in cycle 10, asserts the strobe signals RAS, CAS, and WE to the memory block BNK0, supplies the write address, and writes the data of access units U1 to U4 to the DRAM core 8 over four cycles. In parallel with this write to the DRAM core 8, the write data of the subsequent access units U5 to U8 is transferred sequentially to the write registers 22. In cycle 14 the memory control circuit MCNT again asserts the strobe signals RAS, CAS, and WE to the memory block BNK0, supplies the write address, and writes the data of access units U5 to U8 to the DRAM core 8 over four cycles. As a result, the processor on the upper storage hierarchy side that requested the write access of access units U1 to U8 is released from this write access in cycle 8 and is unaffected even though a refresh operation intervenes in the memory block BNK0.

FIG. 7 shows an example of a write operation when the write buffer WB0 is not provided. Without the write buffer WB0, once the refresh operation starts in the DRAM core 8, the processor on the upper storage hierarchy side that requested the write access must suspend the output of write addresses and write data, that is, the issuing of write access requests, in cycle 4 and wait while detecting completion of the refresh operation. From cycle 9 onward, after the refresh operation has ended, the processor issues write access requests again and outputs the addresses and data of access units U5 to U8 sequentially from cycle 10. The writing of access units U1 to U8 to the DRAM core 8 of the memory block BNK0 thus completes in cycle 19, but the processor that requested the write access is not released from write access processing until cycle 13. As is clear from a comparison with FIG. 6, providing the write buffers WB0 to WB3 in the semiconductor integrated circuit 1 serving as the L3 cache memory makes it possible to greatly improve the data processing efficiency of the upper-hierarchy processor.
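
The difference in the cycle at which the processor is released (cycle 8 with the write buffer, cycle 13 without it) can be reproduced with a very small Python model. The model is an assumption made for illustration: it supposes that the processor issues one access unit per cycle starting in cycle 1 and that, without a write buffer, it cannot issue in cycles 5 to 9. The function name and parameters are hypothetical, and the figures themselves may differ in detail.

```python
def release_cycle(units, stalled_cycles=frozenset(), first_cycle=1):
    """Cycle in which the processor issues its last access unit.

    The processor issues one unit per cycle and skips any cycle listed in
    stalled_cycles (an assumed model of waiting for a refresh to finish).
    """
    cycle = first_cycle
    issued = 0
    last = None
    while issued < units:
        if cycle not in stalled_cycles:
            issued += 1
            last = cycle
        cycle += 1
    return last


with_buffer = release_cycle(8)                          # no stall seen by the processor
without_buffer = release_cycle(8, set(range(5, 10)))    # stalled in cycles 5 to 9
```

Under these assumptions the write buffer frees the processor five cycles earlier for an eight-unit write.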

When controlling the operation of outputting read data read from the memory blocks BNK0 to BNK7 to the outside via the interface blocks I/F1 and I/F2, on the other hand, the memory control circuit MCNT selects the read data output path, that is, performs selection control among the upper through path, the upper buffering path, and the lower buffering path, thereby improving the throughput of data read operations.

Whether the upper through path or the upper buffering path is selected depends on the memory control circuit MCNT determining whether resource contention may occur.

That is, as illustrated in FIG. 5, in response to a read access request the memory control circuit MCNT reads data from the DRAM core 8 of the memory block to be accessed (T4) and determines whether there is resource contention in outputting the read data externally from the upper-hierarchy interface block I/F1 (T5). If there is resource contention, that is, if the read data cannot currently be output externally from the upper-hierarchy interface block I/F1, the read data is held in the corresponding one of the read buffers RB0 to RB3 (T6). When the state in which output is impossible has been resolved, the read data read from the memory blocks BNK0 to BNK7, or the read data read from the read buffers RB0 to RB3, is output externally from the upper-hierarchy interface block I/F1 (T7).
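
Steps T4 to T7 amount to a choice between the upper through path and the upper buffering path. A minimal Python sketch under assumed names (handle_read, flush_read_buffer, and a list standing in for the SRAM read buffer):

```python
def handle_read(core_data, upper_interface_busy, read_buffer):
    """Return (path, data_sent_now) for one read request.

    core_data:            data just read from the DRAM core (T4)
    upper_interface_busy: result of the resource-contention check (T5)
    read_buffer:          list standing in for the SRAM read buffer
    """
    if upper_interface_busy:
        read_buffer.append(core_data)        # T6: hold the data for later
        return "upper buffering path", None
    return "upper through path", core_data   # T7: output directly


def flush_read_buffer(read_buffer):
    """T7 for buffered data: output it once the interface is free again."""
    data = list(read_buffer)
    read_buffer.clear()
    return data
```

The point of the design is visible in the first branch: the DRAM core read completes either way, and only the output step is deferred.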

FIG. 8 shows an example of a read operation that uses the read buffers RB0 to RB3. In FIG. 8 the processor on the upper storage hierarchy side issues access requests A to D in successive system operation cycles. Access request A is a read access request that reads 32 consecutive bytes of data, A-0, A-1, A-2, and A-3, from address A of the memory block BNK0. Similarly, access request B reads 32 consecutive bytes of data, B-0, B-1, B-2, and B-3, from address B of the memory block BNK4; access request C reads 32 consecutive bytes of data, C-0, C-1, C-2, and C-3, from address C of the memory block BNK1; and access request D reads 32 consecutive bytes of data, D-0, D-1, D-2, and D-3, from address D of the memory block BNK3.

In response to access request A, the memory control circuit MCNT reads the 288 bits of data designated by address A in parallel from the DRAM core 8 of the memory block BNK0 and latches them in the read registers 26. The read registers 26 are then selected in turn, and the read data A-0, A-1, A-2, and A-3 is output from the memory block BNK0 in 8-byte units, one unit per system operation cycle. At this time the upper-hierarchy interface block I/F1 is not performing an output operation. The memory control circuit MCNT therefore uses the selectors 41Aa, 40Aa, and 41B to pass the output data A-0, A-1, A-2, and A-3 from the memory block BNK0 directly to the upper-hierarchy interface block I/F1 for external output.

In parallel with this output operation, the next access request B is issued one cycle later, and the read data B-0, B-1, B-2, and B-3 is output sequentially from the memory block BNK4. The output read data is stored sequentially in the read buffer RB0. Similarly, the following access request C causes the read data C-0, C-1, C-2, and C-3 to be output sequentially from the memory block BNK1 and stored in the read buffer RB1, and the access request D after that causes the read data D-0, D-1, D-2, and D-3 to be output sequentially from the memory block BNK3 and stored in the read buffer RB3. When the external data output by the upper-hierarchy interface block I/F1 finishes while subsequent read data is still being stored in the read buffers, the read data of the following access requests is read from the read buffers and output externally. That is, after data A-3, the read data B-0, B-1, B-2, and B-3 is output sequentially from the read buffer RB0, selected by the selectors 40Aa and 41B, and output externally from the upper-hierarchy interface block I/F1. Data C-0 through D-3 is then output externally in succession.
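
The behavior in FIG. 8 can be approximated by a simple scheduling model in Python. The model is a sketch under assumptions that are not in the specification: a fixed core latency of two cycles, one 8-byte transfer per cycle on I/F1, and requests served strictly in issue order. The function name and parameters are hypothetical.

```python
def schedule_reads(requests, beats=4, core_latency=2):
    """Decide the path of each request and the cycle of each output transfer.

    requests: list of (name, issue_cycle) in issue order.
    Returns (paths, order), where order is a list of (cycle, label).
    """
    paths = {}
    order = []
    interface_free_at = 0
    for name, issue_cycle in requests:
        ready = issue_cycle + core_latency     # first data leaves the memory block
        if ready >= interface_free_at:
            paths[name] = "through"            # I/F1 is idle: output directly
            start = ready
        else:
            paths[name] = "buffered"           # I/F1 is busy: park in a read buffer
            start = interface_free_at
        for beat in range(beats):
            order.append((start + beat, "%s-%d" % (name, beat)))
        interface_free_at = start + beats
    return paths, order


paths, order = schedule_reads([("A", 0), ("B", 1), ("C", 2), ("D", 3)])
```

In this model request A takes the through path, requests B to D are buffered, and the sixteen transfers A-0 to D-3 leave I/F1 in consecutive cycles with no gaps.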

By contrast, without read buffers, as illustrated in FIG. 9, the next access request cannot be accepted until all the read data for the first access request A has been output externally. At the very least, the output operations of the read registers in different memory blocks must be kept from conflicting.

As is clear from the above, adopting the read buffers RB0 to RB3 allows subsequent read access requests to be accepted in advance so that the internal operation of the memory blocks can proceed ahead of the output. In addition, because the read buffers use SRAM, which has a faster access speed than DRAM, outputting buffered data does not slow the output operation, and the throughput of data read operations can be improved.

Furthermore, if there is no resource contention when data is read from the memory blocks BNK0 to BNK7, the read data is output externally from the upper-hierarchy interface block I/F1 directly, without passing through the read buffers RB0 to RB3. This avoids the waste of buffering data even when no contention has occurred, and in this respect too contributes to improving the throughput of read data output operations.

Next, the semiconductor integrated circuit 1 is described as applied to a cache memory system.

FIG. 10 shows a first example of the cache memory system. The semiconductor integrated circuit 1 is used as an L3 cache memory and is placed between a processor 50 and a main memory 51. A processor bus 52 is connected to the upper-hierarchy interface block I/F1 of the semiconductor integrated circuit 1, which exchanges data with the processor 50 and receives the access control information output from the processor 50. A memory bus 53 is connected to the lower-hierarchy interface block I/F2 of the semiconductor integrated circuit 1, which exchanges data with the main memory 51. Although not particularly limited, the access control information for the main memory 51 is issued by the processor 50.

The processor 50 incorporates an L1 cache memory 50B and an L2 cache memory 50C together with a CPU 50A, and further includes tag control logic (TAG) 50D for the L3 cache memory. The semiconductor integrated circuit 1 is positioned as the data memory section of the L3 cache memory. For each cache line of the semiconductor integrated circuit 1 serving as the L3 cache memory, the tag control logic 50D holds information that associates the index address with the tag address of the cache entry. For each cache line it also holds a valid bit indicating whether the cache line is valid, a dirty bit indicating whether a copy-back or write-back to the lower storage hierarchy is needed when the cache line is replaced, and the like.
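
The per-line information held by the tag control logic (tag address, valid bit, dirty bit) can be illustrated with the following Python sketch. The direct-mapped organization, the class names, and the way an address is split into index and tag are assumptions; the specification does not state the mapping policy.

```python
from dataclasses import dataclass


@dataclass
class TagEntry:
    tag: int = 0
    valid: bool = False   # the line holds usable data
    dirty: bool = False   # the line must be copied back before replacement


class TagControl:
    """Direct-mapped tag store; the mapping policy is an assumption."""

    def __init__(self, num_lines):
        self.entries = [TagEntry() for _ in range(num_lines)]

    def split(self, line_address):
        """Return (index, tag) for a cache-line address."""
        return line_address % len(self.entries), line_address // len(self.entries)

    def is_hit(self, line_address):
        index, tag = self.split(line_address)
        entry = self.entries[index]
        return entry.valid and entry.tag == tag

    def fill(self, line_address, dirty=False):
        """Install a line; return True if the old line needs a copy-back."""
        index, tag = self.split(line_address)
        old = self.entries[index]
        needs_copy_back = old.valid and old.dirty
        self.entries[index] = TagEntry(tag, True, dirty)
        return needs_copy_back
```

Because the tags live in the processor, the hit or miss decision is made before any request reaches the semiconductor integrated circuit 1, which only stores the data.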

In FIG. 10, the lower storage hierarchy of the semiconductor integrated circuit 1 is not limited to a main memory and may be an L4 cache memory. The tag control section for the L4 cache memory may be provided inside the processor 50.

FIG. 11 shows a data flow focusing on the read access operation of the processor in the cache memory system of FIG. 10. When the L1 cache memory 50B and the L2 cache memory 50C built into the processor 50 both miss and the tag control logic 50D indicates a cache hit in the semiconductor integrated circuit 1, the processor 50 requests a read access targeting the semiconductor integrated circuit 1. The access control information then follows path P1. As described above, if there is no resource contention, the read data is returned to the processor 50 directly from the memory blocks BNK0 to BNK7 (path P2). If there is resource contention, the read data is first held in one of the read buffers RB0 to RB3 and is returned to the processor 50 from that read buffer at a time when no resource contention occurs (path P2'). When the semiconductor integrated circuit 1 also misses, the processor 50 supplies the access control information to the main memory 51 (path P3), and the read data of the main memory 51 is returned to the processor 50 (path P4).
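
The choice among paths P1 to P4 can be summarized in a few lines of Python. The function name and the boolean parameters are hypothetical; the path labels are those used in the description.

```python
def read_data_path(l3_hit, upper_interface_busy=False):
    """Return the path labels used for one processor read that has already
    missed in the on-chip L1 and L2 caches."""
    if l3_hit:
        # P1 carries the access control information to the L3 device;
        # the data returns directly (P2) or via a read buffer (P2').
        return ["P1", "P2'" if upper_interface_busy else "P2"]
    # L3 miss: the request goes to the main memory instead.
    return ["P3", "P4"]
```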

Suppose that, at this point, bus contention arises on path P4 owing to another circuit module, so that the read data cannot be sent from the main memory 51 to the processor 50. Even after the bus contention is resolved, the processor 50 would have to issue the access request to the main memory 51 again and re-access the main memory 51, which, being made of DRAM or the like, has a slow access speed. For this reason, as illustrated in FIG. 11, a memory buffer (MB) 54 capable of high-speed access and made of SRAM or the like, similar to the read buffers RB0 to RB3, is preferably placed between the main memory 51 and the processor 50.

The memory buffer 54 may be built into the semiconductor integrated circuit 1. In that case the memory buffer 54 receives and holds data from the lower-hierarchy interface block I/F2 and makes the held data available for external output from the upper-hierarchy interface block I/F1. The read data output of the memory buffer 54 and the read data output of the memory blocks BNK0 to BNK7 need only be mutually exclusive; for example, the processor 50 may operate the memory buffer 54 by designating it directly.

FIG. 12 shows a data flow focusing on the write access operation of the processor in the cache memory system of FIG. 10. In a write access by the processor 50, when the L1 cache memory 50B and the L2 cache memory 50C built into the processor 50 both miss and the tag control logic 50D indicates a cache hit in the semiconductor integrated circuit 1, the processor 50 requests a write access targeting the semiconductor integrated circuit 1. The access control information then follows path P1. As described above, the write data is first stored in one of the write buffers WB0 to WB3, and when the write target memory block becomes able to perform a write, the write data is written from the write buffer to the memory block (path P5). Because the semiconductor integrated circuit 1 has the write buffers WB0 to WB3, a write request need not be interrupted even if a refresh operation of the memory block intervenes partway through it. The processor 50 can therefore be released from the write processing early. When the semiconductor integrated circuit 1 also misses, the processor 50 supplies the access control information to the main memory 51 (path P3) and supplies the write data to the main memory 51 (path P6).

FIG. 13 shows a data flow focusing on cache line replacement in the cache memory system of FIG. 10. When a given cache line of the memory blocks BNK0 to BNK7 is replaced in response to a cache miss in the semiconductor integrated circuit 1 during a write access or read access, and the dirty bit of that cache line is set, the entry of the cache line must be copied back to the lower-hierarchy area of the corresponding tag address before the replacement. The data to be copied back need only be stored from the memory blocks BNK0 to BNK7 into the read buffers RB0 to RB3; there is no need to wait until it has actually been copied back to the main memory 51, which consists of DRAM with a slow access speed. Moreover, the data of the new cache entry that replaces the line may be written from the main memory 51 into the write buffers WB0 to WB3 without waiting for the data to be copied back to be transferred to the read buffers RB0 to RB3. This improves the final data read throughput of the processor 50 even when cache line replacement is involved.
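
The staging role of the two buffers during replacement can be sketched in Python as follows. The function name, its parameters, and the use of lists for the read and write buffers are assumptions made for illustration; the returned strings simply name the transfers that may complete later.

```python
def replace_cache_line(line_is_dirty, old_data, new_data, read_buffer, write_buffer):
    """Stage a cache line replacement and return the transfers still pending."""
    pending = []
    if line_is_dirty:
        # The evicted data only has to reach the read buffer; the slow
        # copy-back to main memory can finish afterwards.
        read_buffer.append(old_data)
        pending.append("copy back from read buffer to main memory")
    # The new data can be staged in the write buffer independently.
    write_buffer.append(new_data)
    pending.append("write new line from write buffer to memory block")
    return pending
```

Neither staging step waits on the other, which is why the replacement does not have to stall on the DRAM main memory.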

As described with reference to FIGS. 1 and 2, the lower-layer interface block I/F2 is connected to the memory blocks BNK0 to BNK7 only by a path through the read buffers RB0 to RB3; unlike the upper layer, no through path is provided. Copy-back is an operation that saves the data of a dirty cache line to the main memory so that the line can be replaced on a cache miss, and such a read operation rarely requires high throughput. Omitting a data path that would bypass the read buffers RB0 to RB3 and output read data directly from the lower-layer interface block I/F2, together with the logic circuit for it, therefore keeps the logic scale of the semiconductor integrated circuit 1 from growing needlessly.

FIG. 14 shows a second example of the cache memory system. The semiconductor integrated circuit 1 can also be used as the main memory of the processor 50. In this case, the lower-layer interface block I/F2 of the semiconductor integrated circuit 1 need not be used.

FIG. 15 shows a third example of the cache memory system. The cache memory system shown in the figure is an example applied to a multiprocessor system. Although not particularly limited, it has the processors 50-1 and 50-2, to which the L3 cache memories 1-1 and 1-2, each constituted by the semiconductor integrated circuit 1, are respectively connected, and the L3 cache memories 1-1 and 1-2 are connected to the main memory 51 via a bus switch circuit 55.

The L3 cache memories 1-1 and 1-2 are coupled to the processors 50-1 and 50-2 via processor buses 52-1 and 52-2 connected to their upper-layer interface blocks I/F1; they exchange data with the processors 50-1 and 50-2 and receive the access control information output from the processors 50-1 and 50-2. The lower-layer interface blocks I/F2 of the L3 cache memories 1-1 and 1-2 are connected to the bus switch circuit 55 via memory buses 53-1 and 53-2, and the main memory 51 is connected to the bus switch circuit 55 via a memory bus 53-3.

Although not particularly limited, the bus switch circuit 55 selectively realizes first to fourth bus connection states. In the first bus connection state, the access control information output from the processor 50-1 is transmitted to the main memory 51, and data can be input and output between the main memory 51 and the L3 cache memory 1-1 or the processor 50-1. In the second bus connection state, the access control information output from the processor 50-2 is transmitted to the main memory 51, and data can be input and output between the main memory 51 and the L3 cache memory 1-2 or the processor 50-2. In the third bus connection state, the access control information output from the processor 50-1 is transmitted to the L3 cache memory 1-2, and data can be input and output between the L3 cache memory 1-2 and the processor 50-1 or the L3 cache memory 1-1. In the fourth bus connection state, the access control information output from the processor 50-2 is transmitted to the L3 cache memory 1-1, and data can be input and output between the L3 cache memory 1-1 and the processor 50-2 or the L3 cache memory 1-2.
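The four connection states can be restated as a lookup table. The sketch below only re-encodes the description above; the dictionary layout and the helper `select_state` are hypothetical and say nothing about how the bus switch circuit 55 is actually implemented.

```python
# Reference numerals are used as plain strings: processors "50-1"/"50-2",
# L3 cache memories "1-1"/"1-2", main memory "51".
BUS_STATES = {
    1: {"requester": "50-1", "target": "51", "data_peers": ("1-1", "50-1")},
    2: {"requester": "50-2", "target": "51", "data_peers": ("1-2", "50-2")},
    3: {"requester": "50-1", "target": "1-2", "data_peers": ("50-1", "1-1")},
    4: {"requester": "50-2", "target": "1-1", "data_peers": ("50-2", "1-2")},
}


def select_state(requester, target):
    """Return the number of the connection state that routes `requester` to `target`."""
    for number, state in BUS_STATES.items():
        if state["requester"] == requester and state["target"] == target:
            return number
    raise ValueError(f"no bus connection state routes {requester} to {target}")


if __name__ == "__main__":
    print(select_state("50-1", "1-2"))
```

Here `target` is the unit that receives the access control information, and `data_peers` lists the units that may exchange data with it in that state.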

To respond to the third bus connection state, the L3 cache memory 1-2 can perform a cache memory operation upon receiving, at its lower-layer interface block I/F2, the access control information output from the processor 50-1. Likewise, to respond to the fourth bus connection state, the L3 cache memory 1-1 can perform a cache memory operation upon receiving, at its lower-layer interface block I/F2, the access control information output from the processor 50-2.

FIG. 16 shows a chip layout of the semiconductor integrated circuit 1. The central portion of the main surface of a single rectangular semiconductor chip 1A, made of single-crystal silicon or the like, is a logic circuit region 1B, and the memory blocks BNK0 to BNK3 and the memory blocks BNK4 to BNK7 are arranged separately above and below it, respectively. The read buffers RB0 to RB3 and the write buffers WB0 to WB3 are arranged separately at the ends of the logic circuit region 1B. The interface blocks I/F1 and I/F2 are arranged separately between the read buffers RB0 to RB3 and the write buffers WB0 to WB3. A large number of external connection electrodes (not shown), such as bonding pads or bump electrodes, are arranged near the interface blocks I/F1 and I/F2. Although not particularly limited, the buffer memory (MB) 54 described with reference to FIG. 11 is arranged between the interface blocks I/F1 and I/F2. Other logic circuits, not specifically shown, are also arranged in the logic circuit region 1B.

Adopting the layout of FIG. 16 places the read buffers RB0 to RB3 closer to the interface blocks I/F1 and I/F2 and to the external connection electrodes than the memory blocks BNK0 to BNK7 are. As a result, the operation delay and propagation delay of the path that outputs read data to the outside from the read buffers RB0 to RB3 can be kept from becoming excessively larger than those of the path that outputs read data directly to the outside from the read registers of the memory blocks BNK0 to BNK7 without passing through the read buffers RB0 to RB3. The above layout thus contributes to improving the throughput of the data read operation.

FIG. 17 shows a detailed example of a memory block. The memory block BNK0, shown as a representative in the figure, has a memory cell array 10 in which dynamic memory cells (not shown) are arranged in a matrix. Each dynamic memory cell includes a capacitive element that stores information and a selection transistor, an N-channel MOSFET, coupled to it. The selection terminal, which is the gate of the selection transistor, is connected to a word line WL; one end of the source-drain path of the selection transistor is coupled to the capacitive element; and the other end of the source-drain path, that is, the data input/output terminal, is connected to a complementary bit line BL. Although not specifically shown, the complementary bit lines have a folded bit line structure centered on sense amplifiers, and precharge circuits and the like are arranged between the complementary bit lines.

The row decoder 11 is a row selection circuit that selects the word line WL designated by the row address signal RASADR in response to a falling transition of the row address strobe signal RAS. The complementary bit lines BL are selected by the column decoder 13 and the column switch circuit 12. In response to a falling transition of the column address strobe signal CAS, the column decoder 13 generates a column selection signal 14 for selecting, in parallel, a plurality of complementary bit lines designated by the column address signal CASADR. The column decoder 13 further activates a write signal 15W in response to a write operation instruction given by the low level of the write enable signal WE, and activates a read signal 15R in response to a read operation instruction given by the high level of the write enable signal WE. The column switch circuit 12 is switched by the column selection signal 14 and connects the 32 bytes (288 bits) of complementary bit lines designated by the signal 14 to 32 bytes of complementary write data lines WIO and 32 bytes of complementary read data lines RIO, respectively.

The complementary write data lines WIO are supplied in parallel with the 32 bytes of write data output from the write amplifier 17W. The complementary read data lines RIO supply 32 bytes of read data in parallel to the main amplifier 17R. The write amplifier 17W has 288 write amplification circuits and, in response to activation of the write signal 15W, can output in parallel on the complementary write data lines WIO, 288 bits wide, the amplified signals for the 288 bits of write data DIN<0> to DIN<3> that are input in parallel. The main amplifier 17R has 288 read amplification circuits and, in response to activation of the read signal 15R, can output in parallel the amplified signals for the inputs from the complementary read data lines RIO as 288 bits of read data MAOUT<0> to MAOUT<3>. Each of the data DIN<0>, ..., DIN<3> is 8 bytes, and likewise each of the data MAOUT<0>, ..., MAOUT<3> is 8 bytes.

A serial-to-parallel conversion circuit 21 is arranged between the input path 20 of the write data WD and the write amplifier 17W. Although not particularly limited, the write data WD is supplied 8 bytes in parallel. The serial-to-parallel conversion circuit 21 has the four write registers 22 and a data latch control circuit 23. The input terminals of the write registers 22 are connected in common to the input path 20, and their output terminals are individually coupled to the input terminals of the write amplification circuits of the write amplifier 17W. The data latch control circuit 23 decodes the 2-bit latch control data DLAT<1:0> in synchronization with the clock signal CLK to generate the 4-bit latch control signal DINL<3:0>, and controls latching of the corresponding write register 22. As the latch control data LATD<1:0> is incremented sequentially, the write data WD input in parallel in 8-byte units is latched into the four write registers 22 one after another in synchronization with the clock signal CLK, and the write data DIN<0> to DIN<3> is obtained 32 bytes in parallel at the outputs of the four write registers 22.
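The serial-to-parallel conversion can be modelled in a few lines. This is a behavioural sketch only: the function names are hypothetical, one loop iteration stands in for one cycle of the clock signal CLK, and plain 8-byte chunks are assumed, so any extra bits carried by the 288-bit paths are ignored.

```python
def decode_one_hot(code, width=4):
    """Decode a 2-bit code into a one-hot list, like DINL<3:0> from the latch control data."""
    if not 0 <= code < width:
        raise ValueError("code out of range")
    return [1 if i == code else 0 for i in range(width)]


def serial_to_parallel(chunks):
    """Latch four 8-byte chunks arriving one per clock into four write registers."""
    if len(chunks) != 4 or any(len(c) != 8 for c in chunks):
        raise ValueError("expected four 8-byte chunks")
    registers = [b""] * 4                  # the four write registers
    for code, chunk in enumerate(chunks):  # latch control data increments every clock
        enables = decode_one_hot(code)
        for i, enabled in enumerate(enables):
            if enabled:
                registers[i] = chunk       # only the selected register latches
    return b"".join(registers)             # DIN<0>..DIN<3> presented in parallel


if __name__ == "__main__":
    word = serial_to_parallel([bytes([i]) * 8 for i in range(4)])
    print(len(word), word.hex())
```

After four clocks all four registers hold data, so the memory cell array can be written with the whole 32-byte word in a single column access.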

A parallel-to-serial conversion circuit 25 is arranged between the output path 29 of the read data MUXOUT and the main amplifier 17R. The parallel-to-serial conversion circuit 25 has four read registers 26, an output selector 27, and a selection control circuit 28. The read data MAOUT<0> to MAOUT<3> from the main amplifier 17R is input to the input terminals of the respective read registers 26. The latch timing of the read registers 26 is controlled by a latch control signal PDOLTT. The output control circuit 30, described later, controls the latch timing given by the latch control signal PDOLTT so that it falls after the read data MAOUT<0> to MAOUT<3> has been settled by the data read from the memory cells.

The selector 27 selects the output data DOUT<0> to DOUT<3> of the read registers 26, 8 bytes at a time, according to the selection control signal MSEL<3:0>, and outputs it to the output path 29. The selection control circuit 28 decodes the 2-bit selection control data MUXSEL<1:0> in synchronization with the clock signal CLK to generate the 4-bit selection control signal MSEL<3:0>. As the selection control data MUXSEL<1:0> is incremented sequentially, the output data DOUT<0> to DOUT<3> is output to the output path 29, 8 bytes at a time, in synchronization with the clock signal CLK, yielding the read data MUXOUT.
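The read side mirrors the write side and can be sketched the same way. Again this is only a behavioural model under assumptions: `parallel_to_serial` is a hypothetical name, one loop iteration stands for one clock cycle, and the word is treated as four plain 8-byte chunks.

```python
def parallel_to_serial(word, chunk_size=8):
    """Emit a 32-byte word as four 8-byte chunks, one per clock, via a one-hot select."""
    if len(word) != 4 * chunk_size:
        raise ValueError("expected a word of four chunks")
    # DOUT<0>..DOUT<3>: contents of the four read registers after latching.
    registers = [word[i * chunk_size:(i + 1) * chunk_size] for i in range(4)]
    output = []
    for muxsel in range(4):  # selection control data increments every clock
        msel = [1 if i == muxsel else 0 for i in range(4)]  # one-hot select signal
        output.append(registers[msel.index(1)])             # selector drives the output path
    return output


if __name__ == "__main__":
    for chunk in parallel_to_serial(bytes(range(32))):
        print(chunk.hex())
```

Because all four chunks are latched by one column access, the output path can deliver a chunk on every clock while the memory cell array is already free for the next access.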

The output control circuit 30 generates the latch control signal PDOLTT according to the CAS latency. When a data read operation responds, in synchronization with the clock, to a falling transition of the column address strobe signal CAS, the CAS latency is the delay from the following clock cycle until the data input of the parallel-to-serial conversion circuit 25 is settled, expressed as the equivalent number of cycles of the clock signal CLK. More specifically, when the fall of the column address strobe signal CAS is detected at a falling edge of the clock signal CLK, the CAS latency is the number of cycles of the clock signal CLK from the falling edge following the one at which the fall of CAS is detected to the first falling edge of the clock signal CLK at which the read data DOUT<0> to DOUT<3> is settled. The time taken by the data read operation from the memory cell array 10 and by the amplification of the read data in the main amplifier 17R is uniquely determined by the circuit configuration, the characteristics of the circuit elements, and the like. To output data to the outside at high speed, it is therefore necessary to set a CAS latency whose delay time is no shorter than, and as close as possible to, those operation delay times. Because the CAS latency corresponds to a number of cycles of the clock signal CLK, as noted above, the actual delay time it represents depends on the frequency of the clock signal CLK: for the same delay time, the CAS latency is relatively large when the frequency is high and relatively small when the frequency is low. In the example of FIG. 1, the output control circuit 30 receives latency setting data FRCD<1:0> and thereby implements a CAS latency control circuit that can variably control the CAS latency. The CAS latency is reflected in the latch timing given by the latch control signal PDOLTT.
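The dependence of the CAS latency on clock frequency comes down to a ceiling division. The sketch below is a numerical illustration only; the function name and the example figures (a 12 ns internal read delay, 10 ns and 4 ns clock periods) are assumptions and do not come from this specification.

```python
def min_cas_latency(read_delay_ps, clock_period_ps):
    """Smallest number of whole clock cycles that is not shorter than the read delay.

    Both arguments are integers in picoseconds, which avoids floating-point rounding.
    """
    if read_delay_ps <= 0 or clock_period_ps <= 0:
        raise ValueError("delay and period must be positive")
    return -(-read_delay_ps // clock_period_ps)  # ceiling division


if __name__ == "__main__":
    # Same 12 ns delay at a 100 MHz clock (10 ns period) and a 250 MHz clock (4 ns period).
    print(min_cas_latency(12_000, 10_000))
    print(min_cas_latency(12_000, 4_000))
```

For a fixed internal delay, the faster clock needs the larger cycle count, which is why a variable latency setting is useful when the same memory block is run at different clock frequencies.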

The refresh control circuit (RCC) 40 is a control circuit for periodically refreshing the data of each memory cell in the memory cell array; it generates a plurality of internal control signals ref and supplies them to the internal circuits of the memory block BNK0. The refresh control circuit 40 also outputs to the memory control circuit MCNT a refresh period notification signal MRef0, which is activated while the memory block BNK0 is in a refresh period.

As is clear from the above description, the memory blocks BNK0 to BNK7 receive the column address strobe signal CAS, which changes with a period that is a multiple of the period of the clock signal CLK. In each cycle in which the column address strobe signal CAS changes, parallel data read from the memory cell array 10 is converted from parallel to serial in synchronization with the cycles of the clock signal CLK and output from the memory block as a plurality of pieces of serial data, and data input to the memory block in synchronization with the cycles of the clock signal CLK is converted from serial to parallel and written to the memory cell array 10 as parallel data. An access specification in which the column address strobe signal CAS is changed once every several cycles of the clock signal CLK thus makes it possible to speed up memory operation.

The invention made by the present inventors has been described above specifically on the basis of embodiments, but it goes without saying that the present invention is not limited to them and can be modified in various ways without departing from its gist.

For example, the semiconductor integrated circuit of the present invention is not limited to a configuration having both upper-layer and lower-layer interface blocks. The invention can be understood separately as a configuration including memory blocks and read buffers as shown in FIG. 11, a configuration including memory blocks and write buffers as shown in FIG. 12, and a configuration including memory blocks, read buffers, and write buffers as shown in FIG. 13.

If the chip area permits, a read buffer and a write buffer may be provided for each memory block.

The number of memory blocks, the number of parallel data input/output bits, the number of stages of read registers and write registers, and the like can also be changed as appropriate.

The memory blocks are not limited to DRAM, and the read buffers and write buffers are not limited to SRAM; memories of other storage types may be used. It goes without saying that the present invention can be applied widely to cache memories of various levels, main memories, and other semiconductor integrated circuits with embedded logic.

FIG. 1 is a block diagram showing an overall example of a semiconductor integrated circuit according to the present invention.
FIG. 2 is a block diagram illustrating details of the read data output path in the semiconductor integrated circuit of FIG. 1.
FIG. 3 is an explanatory diagram illustrating the control signals generated by the memory control circuit.
FIG. 4 is an explanatory diagram illustrating the information format of the access control information.
FIG. 5 is a flowchart representatively showing the main control procedure of the memory control circuit in response to an external access request.
FIG. 6 is a timing chart showing an example of a write operation when a refresh operation intervenes during a write access.
FIG. 7 is a timing chart showing, as a comparative example, a write operation when no write buffer is provided.
FIG. 8 is a timing chart showing an example of a read operation using the read buffers.
FIG. 9 is a timing chart showing, as a comparative example, a read operation when no read buffer is provided.
FIG. 10 is a block diagram of a cache memory system using the semiconductor integrated circuit as an L3 cache memory.
FIG. 11 is an explanatory diagram showing a data flow focusing on the read access operation of the processor in the cache memory system of FIG. 10.
FIG. 12 is an explanatory diagram showing a data flow focusing on the write access operation of the processor in the cache memory system of FIG. 10.
FIG. 13 is an explanatory diagram showing a data flow focusing on cache line replacement in the cache memory system of FIG. 10.
FIG. 14 is a block diagram of a memory system using the semiconductor integrated circuit as the main memory of a processor.
FIG. 15 is a block diagram showing an example in which the semiconductor integrated circuit is applied to a multiprocessor system as an L3 cache memory.
FIG. 16 is a layout diagram illustrating a chip layout of the semiconductor integrated circuit according to the present invention.
FIG. 17 is a block diagram showing a detailed example of a memory block.

Explanation of symbols

1: Semiconductor integrated circuit
BNK0 to BNK7: Memory block
RB0 to RB3: Read buffer
WB0 to WB3: Write buffer
MCNT: Memory control circuit
I/F1: Upper-layer interface block
I/F2: Lower-layer interface block
40 (40Aa, 40Ab, 40Ac, 40Ad): Selector
41 (41Aa, 41Ab, 41Ac, 41Ad, 41B): Selector
42: Selector
8: DRAM core
22: Write register
26: Read register
50, 50-1, 50-2: Processor
50A: CPU
50B: L1 cache memory
50C: L2 cache memory
50D: Tag control logic
51: Main memory
52, 52-1, 52-2: Processor bus
53, 53-1, 53-2, 53-3: Memory bus
54: Memory buffer
55: Bus switch circuit
CLK: Clock signal
BL: Complementary bit line
WL: Word line
RAS: Row address strobe signal
CAS: Column address strobe signal

Claims

1. A semiconductor integrated circuit comprising:
a write buffer circuit for sequentially storing input data;
a plurality of write registers for sequentially storing, in units of the input data, the input data stored in the write buffer circuit;
a memory block having a plurality of memory cells that periodically require a refresh operation, and holding in parallel the input data stored in the plurality of write registers; and
a control circuit that, when a refresh period starts, continues the input of data to the write buffer circuit while making the storing of the data of the write buffer circuit into the plurality of write registers and the writing of the input data to the memory block wait, and that, when the refresh period ends, resumes the operation of writing the input data stored in the write registers into the memory block.

2. The semiconductor integrated circuit according to claim 1, wherein:
the memory block further includes a plurality of word lines and a plurality of data lines;
each of the plurality of memory cells includes a selection transistor and is coupled to the plurality of word lines and the plurality of data lines so that one memory cell is coupled to one word line and one data line; and
the selection transistor has a selection terminal coupled to a corresponding word line and a data input/output terminal coupled to a corresponding data line.

3. The semiconductor integrated circuit according to claim 2, wherein each write buffer circuit includes:
a memory array including a plurality of static memory cells, a plurality of word lines, and a plurality of complementary data line pairs;
an address decoder for selecting a predetermined word line in response to an address signal;
a sense amplifier that amplifies data of a plurality of selected memory cells; and
a data output circuit for outputting the amplified data.

4. The semiconductor integrated circuit according to claim 3, wherein each of the plurality of static memory cells includes a pair of inverters whose input / output terminals are cross-coupled.