CA1323451C - Cache-memory architecture - Google Patents

Cache-memory architecture

Info

Publication number
CA1323451C
Authority
CA
Canada
Prior art keywords
cache
memory
cmmus
main
processor bus
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA000597419A
Other languages
French (fr)
Inventor
Victor Jacques Menasce
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nortel Networks Ltd
Original Assignee
Northern Telecom Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Northern Telecom Ltd
Priority to CA000597419A (CA1323451C)
Priority to US07/871,578 (US5193166A)
Application granted
Publication of CA1323451C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14 Handling requests for interconnection or transfer
    • G06F13/20 Handling requests for interconnection or transfer for access to input/output bus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0864 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches using pseudo-associative means, e.g. set-associative or hashing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

A cache-memory system comprising a processor bus communicating with an associated processor (CPU) in said system; a plurality of cache-memory management units (CMMUs) on said processor bus each having a single cache address tag array for addressing one associated cache array, and each one of said plurality of CMMUs communicating through an associated one of a plurality of memory buses with a main memory of said system.

Description

IMPROVED CACHE-MEMORY ARCHITECTURE

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to digital computers in general, and to computer memory organization in particular. More particularly still, it relates to organization and architecture of cache-memory systems.

2. Prior Art of the Invention

The closest prior art known to the present invention is disclosed in detail in Motorola Inc.'s Users Manual of an integrated circuit known as the MC88200 cache/memory management unit or CMMU.

An excellent background paper on the general subject of the present invention by Alan Jay Smith is entitled "Problems, Directions and Issues in Memory Hierarchies", published in the Proceedings of the Eighteenth Annual Hawaii International Conference on System Sciences, 1985. Section 2 of this paper concerns cache memories and is particularly relevant.

Another important background paper by the above-mentioned author, entitled "Line (Block) Size Choice for CPU Cache Memories", was published September 9, 1987, in the IEEE Transactions on Computers, Vol. C-36, No. 9.

Cache memories temporarily hold portions of the contents of the main system memory which hopefully have a high probability of current use by the CPU or processor. In practice, the cache memory holds recently used contents of main memory. Thus the three basic performance parameters are "hit access time", "miss access time" and "hit ratio". Hit access time is the time to read or write cache memory when the data is in the cache memory. Miss access time is the time to read main memory when the requested word is not in the cache memory. Hit ratio is the probability of finding the target data in the cache, and directly affects memory traffic, making memory or bus bandwidth a critical and limiting resource in RISC (Reduced Instruction Set Computer) microprocessor systems (Smith, 1985).
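The three parameters defined above are commonly combined into an average access time. The following is a minimal sketch of that standard weighting, not a formula stated in the patent; the function name and the cycle counts are illustrative assumptions.

```python
def effective_access_time(hit_ratio: float, hit_time: float, miss_time: float) -> float:
    """Average access time: hit and miss times weighted by their probabilities."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must lie in [0, 1]")
    return hit_ratio * hit_time + (1.0 - hit_ratio) * miss_time


# Illustrative numbers only (assumed, not from the patent):
# a 1-cycle hit access time and a 10-cycle miss access time.
for h in (0.90, 0.95, 0.99):
    t = effective_access_time(h, hit_time=1.0, miss_time=10.0)
    print(f"hit ratio {h:.2f}: {t:.2f} cycles on average")
```

Under these assumed timings, a small change in hit ratio has a visible effect on the average, which is why the description insists on not degrading the hit or miss ratios.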

Memory traffic consists of two components: fetch traffic and write or copyback traffic. The memory fetch traffic increases with line size in the cache, while generally the "miss ratio" (opposite of "hit ratio") declines with increasing line size. Memory traffic also increases with miss ratio.
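The trade-off between line size and miss ratio can be illustrated with a simple model in which every miss fetches one full line. This is a sketch under that assumption; the miss ratios in the table are hypothetical values chosen only to show the trend, not measurements from the patent or from Smith's papers.

```python
def fetch_traffic_words(references: int, miss_ratio: float, words_per_line: int) -> float:
    """Words fetched from main memory, assuming each miss fetches one whole line."""
    return references * miss_ratio * words_per_line


# Hypothetical miss ratios that decline as the line grows (assumed values).
assumed_miss_ratio = {1: 0.10, 2: 0.07, 4: 0.05, 8: 0.04}

for words_per_line, miss_ratio in assumed_miss_ratio.items():
    traffic = fetch_traffic_words(1000, miss_ratio, words_per_line)
    print(f"{words_per_line} words/line, miss ratio {miss_ratio:.2f}: "
          f"{traffic:.0f} words fetched per 1000 references")
```

With these assumed values the miss ratio falls more slowly than the line grows, so fetch traffic rises with line size even though fewer accesses miss.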

SUMMARY OF THE INVENTION

A primary object of the present invention is to improve memory bandwidth and reduce its latency without degrading the hit or miss ratios.

The improved cache-memory architecture or system of the present invention is equally applicable to instructions or code caches and data caches, or to mixed caches.

Among disadvantages of the present invention are:

an increase in the cache address "tag array" size; and

an increase in the number of simultaneously switching signals on the system "backplane".

According to the present invention there is provided a cache-memory system comprising:

a processor bus communicating with an associated processor (CPU) in said system;

a plurality of cache-memory management units (CMMUs) on said processor bus each having a single cache address tag array for addressing one associated cache array; and

each one of said plurality of CMMUs communicating through an associated one of a plurality of memory buses with a main memory of said system.

Preferably, a cache will contain one data or code word per line of storage; but the system according to the invention will have a larger apparent or effective line size.

BRIEF DESCRIPTION OF THE DRAWINGS

The preferred embodiment of the present invention will now be described in detail in conjunction with the annexed drawings, in which:

Figure 1 is a block schematic of the prior art cache-memory architecture;

Figure 2 is a block schematic of the cache-memory system of the present invention; and

Figure 3 is a block schematic of an alternative to Figure 2.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

Figure 1 of the drawings shows the system architecture of the prior art, where a processor 10 generally has a processor bus 11 communicating with a plurality of CMMUs 0 to N. A CMMU, such as Motorola's MC88200, has 16k Bytes of high-speed instruction or data storage space. These 16k Bytes of storage are organized in 1024 lines of 4 words/line, each word being 4 Bytes long. This is explained in full detail in the above-mentioned MC88200 Users Manual. The plurality of CMMUs, in turn, communicate with the main system memory 12, generally a RAM, via a memory bus 13. This organization is a 4 word/line cache and is also a 4-way set associative cache.
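The figures quoted for the prior-art cache (16k Bytes, 4 words/line, 4 Bytes/word, 4-way set associative) determine how an address divides into tag, set index and word offset. The sketch below derives that split from the geometry alone; the class and method names are hypothetical, and the field layout is the generic textbook one rather than a statement of the MC88200's documented bit assignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheGeometry:
    total_bytes: int
    bytes_per_word: int
    words_per_line: int
    ways: int

    @property
    def lines(self) -> int:
        return self.total_bytes // (self.bytes_per_word * self.words_per_line)

    @property
    def sets(self) -> int:
        return self.lines // self.ways

    def split(self, addr: int) -> tuple[int, int, int, int]:
        """Return (tag, set_index, word_in_line, byte_in_word) for a byte address."""
        byte_in_word = addr % self.bytes_per_word
        word_addr = addr // self.bytes_per_word
        word_in_line = word_addr % self.words_per_line
        line_addr = word_addr // self.words_per_line
        return line_addr // self.sets, line_addr % self.sets, word_in_line, byte_in_word


prior_art = CacheGeometry(total_bytes=16 * 1024, bytes_per_word=4, words_per_line=4, ways=4)
print(prior_art.lines)  # 1024 lines, as stated in the description
print(prior_art.sets)   # 256 sets
print(prior_art.split(0x12345678))
```

The 256 sets obtained here agree with the 256 tag-array entries the description later cites for the standard CMMU.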

Turning now to Figure 2 of the drawings, it shows an architecture according to the present invention, wherein the number of sets has been increased by a factor of N (here N=4) and, maintaining the size of cache memory, the number of words/line is reduced to 1. The apparent or effective number of words/line, however, remains at 4 words/line, thus maintaining the same hit or miss ratios.
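The arithmetic behind this reorganization can be checked directly. The sketch below assumes that each reorganized CMMU keeps the 16k Byte capacity and 4-way associativity of the standard part, which the description implies but does not state outright; the function name is hypothetical.

```python
def cache_sets(total_bytes: int, bytes_per_word: int, words_per_line: int, ways: int) -> int:
    """Number of sets for a set-associative cache of the given geometry."""
    lines = total_bytes // (bytes_per_word * words_per_line)
    return lines // ways


NUM_CMMUS = 4  # one CMMU per word of the effective line, as in Figure 2

standard_sets = cache_sets(16 * 1024, 4, words_per_line=4, ways=4)
reorganized_sets = cache_sets(16 * 1024, 4, words_per_line=1, ways=4)
effective_words_per_line = NUM_CMMUS * 1  # each CMMU contributes one word

print(standard_sets)              # 256
print(reorganized_sets)           # 1024
print(effective_words_per_line)   # 4
```

Reducing the line to one word at constant capacity quadruples the number of sets per CMMU, while four such CMMUs side by side still present a 4-word line to the processor.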

In Figure 2, the processor 10 has, as is often the case, two buses: a code bus 14 and a data bus 15. However, while the data and code words are interleaved on the PBUS, here they are, as a result of the cache reorganization, no longer interleaved on a memory bus. Rather, each word, here 0 to 3, of the cache in each of the CMMUs accesses the corresponding word (RAM WORD 0 to RAM WORD 3) in the system RAM via a separate bus (MBUS0 to MBUS3), thus interleaving the CMMUs rather than the data or code words. This simple reorganization can result in an increase of memory bandwidth, and reduces average memory latency compared to sequential burst memory access. At the same time, due to the simplicity of the reorganization, the PBUS and MBUS specifications need not be altered. And should it be necessary to have still larger caches, the CMMUs for each word may be paralleled as taught by the prior art.
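The interleaving described above amounts to selecting a CMMU, and its private MBUS, from the low-order word-address bits. The sketch below shows that mapping together with a deliberately simple timing comparison against a sequential burst; the function names and cycle counts are illustrative assumptions, and the timing model ignores arbitration and other real bus effects.

```python
NUM_CMMUS = 4        # CMMU 0..3, each on its own MBUS0..MBUS3 (Figure 2)
BYTES_PER_WORD = 4


def cmmu_for_address(addr: int) -> int:
    """Index of the CMMU (and MBUS) that holds the word at byte address addr."""
    return (addr // BYTES_PER_WORD) % NUM_CMMUS


def sequential_burst_cycles(words: int, first_word_latency: int, per_word_cycles: int) -> int:
    """One shared memory bus: words of the line arrive one after another."""
    return first_word_latency + (words - 1) * per_word_cycles


def parallel_fetch_cycles(first_word_latency: int) -> int:
    """One bus per CMMU: every word of the effective line is fetched at once."""
    return first_word_latency


for addr in (0x00, 0x04, 0x08, 0x0C, 0x10):
    print(f"address {addr:#04x} -> CMMU {cmmu_for_address(addr)}")

# Assumed timings: 5 cycles to the first word, 1 more cycle per burst word.
print(sequential_burst_cycles(4, first_word_latency=5, per_word_cycles=1))
print(parallel_fetch_cycles(first_word_latency=5))
```

Under this simplified model the four-word effective line is filled in the time of a single memory access, which is the bandwidth and latency benefit the description claims over sequential burst access.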

The system architecture shown in Figures 2 and 3, however, requires an increase in the total cache address tag array size by a factor of approximately 4 for each CMMU (actually the increase is from 256 entries to 1024 entries). Operationally, the standard cache algorithm is altered so that a miss on a line address results in a fetch instruction from the system memory 12 for all CMMUs, but only the CMMU that matches on the word address returns the result to the processor 10.
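The altered miss handling can be sketched as a small simulation. This is a behavioral model under simplifying assumptions, not the patented hardware: each CMMU is modeled as an unbounded dictionary keyed by line address (no set associativity, no replacement, reads only), and the class and function names are hypothetical.

```python
BYTES_PER_WORD = 4
NUM_CMMUS = 4


class OneWordCMMU:
    """A CMMU caching one word per line: word `word_index` of every effective line."""

    def __init__(self, word_index: int) -> None:
        self.word_index = word_index
        self.lines: dict[int, int] = {}  # line address -> cached word
        self.fetches = 0                 # fetches issued on this CMMU's own MBUS

    def access(self, addr: int, memory: dict[int, int]):
        line = addr // (BYTES_PER_WORD * NUM_CMMUS)
        if line not in self.lines:
            # Miss on the line address: every CMMU fetches its own word of the line.
            own_addr = (line * NUM_CMMUS + self.word_index) * BYTES_PER_WORD
            self.lines[line] = memory[own_addr]
            self.fetches += 1
        # Only the CMMU matching the word address answers the processor.
        matches_word = (addr // BYTES_PER_WORD) % NUM_CMMUS == self.word_index
        return self.lines[line] if matches_word else None


def read(cmmus: list[OneWordCMMU], addr: int, memory: dict[int, int]) -> int:
    replies = [cmmu.access(addr, memory) for cmmu in cmmus]
    (data,) = [r for r in replies if r is not None]  # exactly one CMMU replies
    return data


cmmus = [OneWordCMMU(i) for i in range(NUM_CMMUS)]
memory = {a: a * 10 for a in range(0, 64, BYTES_PER_WORD)}

print(read(cmmus, 0x08, memory), [c.fetches for c in cmmus])  # miss: all four fetch
print(read(cmmus, 0x0C, memory), [c.fetches for c in cmmus])  # hit: no new fetches
```

The second read hits because the first miss caused all four CMMUs to load their word of the same line, which is how one-word lines still behave like a 4-word effective line.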

Turning now to Figure 3, it shows a unitary CMMU 16 having four cache memories 17 to 20, being addressed by means of four tags 21 to 24. This CMMU 16 is equivalent to four CMMUs shown in Figure 2. Since the capability of address translation from logical or virtual memory addresses to physical cache addresses is determined by the much larger virtual addresses, this capability is essentially the same as that in the standard CMMU.

Thus the CMMU 16 in Figure 3 differs only in the manner and scale of integration, rather than in any principle of operation. Practical pin-out problems may, however, have to be overcome.

Claims (6)

1. A cache-memory system comprising:

a processor bus communicating with an associated processor (CPU) in said system;

a plurality of cache-memory management units (CMMUs) on said processor bus each having a single cache address tag array for addressing one associated cache array; and

each one of said plurality of CMMUs communicating through an associated one of a plurality of memory buses with a main memory of said system.
2. The cache-memory system of claim 1, wherein said each one of said plurality of caches is configured to store only one word per line.
3. The cache-memory system of claim 2, wherein said processor bus supports separate code and data caches.
4. The cache-memory system of claims 1, 2 or 3, wherein the effective number of words/line is larger than the actual number of words/line stored in a cache.
5. The cache-memory system of claims 1, 2 or 3, wherein a plurality of CMMUs access the same memory bus.
6. A cache-memory system comprising:
a central processing unit (CPU);
a plurality of cache-memory management units (CMMUs), each of said CMMUs having a single cache address tag for addressing one associated cache;
a processor bus for providing communication between said CPU and said CMMUs;
a plurality of main-memory storage units; and
a plurality of buses for providing communication between said CMMUs and said main-memory storage units, each of said memory buses being associated with only one of said CMMUs and with only one of said main-memory storage units, such that each word stored in one of said CMMUs is associated with only one word stored in one of said main-memory storage units, thereby increasing memory bandwidth while decreasing average memory latency.
CA000597419A 1989-04-21 1989-04-21 Cache-memory architecture Expired - Fee Related CA1323451C (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CA000597419A CA1323451C (en) 1989-04-21 1989-04-21 Cache-memory architecture
US07/871,578 US5193166A (en) 1989-04-21 1992-04-20 Cache-memory architecture comprising a single address tag for each cache memory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CA000597419A CA1323451C (en) 1989-04-21 1989-04-21 Cache-memory architecture

Publications (1)

Publication Number Publication Date
CA1323451C true CA1323451C (en) 1993-10-19

Family

ID=4139937

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000597419A Expired - Fee Related CA1323451C (en) 1989-04-21 1989-04-21 Cache-memory architecture

Country Status (1)

Country Link
CA (1) CA1323451C (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0845120A1 (en) * 1995-08-16 1998-06-03 MicroUnity Systems Engineering, Inc. General purpose, programmable media processor
EP0845120A4 (en) * 1995-08-16 2005-05-04 Microunity Systems Eng General purpose, programmable media processor
US7213131B2 (en) 1995-08-16 2007-05-01 Microunity Systems Engineering, Inc. Programmable processor and method for partitioned group element selection operation
US7216217B2 (en) 1995-08-16 2007-05-08 Microunity Systems Engineering, Inc. Programmable processor with group floating-point operations
US7222225B2 (en) 1995-08-16 2007-05-22 Microunity Systems Engineering, Inc. Programmable processor and method for matched aligned and unaligned storage instructions
US7260708B2 (en) 1995-08-16 2007-08-21 Microunity Systems Engineering, Inc. Programmable processor and method for partitioned group shift
US7301541B2 (en) 1995-08-16 2007-11-27 Microunity Systems Engineering, Inc. Programmable processor and method with wide operations
US7353367B2 (en) 1995-08-16 2008-04-01 Microunity Systems Engineering, Inc. System and software for catenated group shift instruction
US7386706B2 (en) 1995-08-16 2008-06-10 Microunity Systems Engineering, Inc. System and software for matched aligned and unaligned storage instructions
US7653806B2 (en) 1995-08-16 2010-01-26 Microunity Systems Engineering, Inc. Method and apparatus for performing improved group floating-point operations
US7660973B2 (en) 1995-08-16 2010-02-09 Microunity Systems Engineering, Inc. System and apparatus for group data operations
US7660972B2 (en) 1995-08-16 2010-02-09 Microunity Systems Engineering, Inc Method and software for partitioned floating-point multiply-add operation
US7730287B2 (en) 1995-08-16 2010-06-01 Microunity Systems Engineering, Inc. Method and software for group floating-point arithmetic operations
US7818548B2 (en) 1995-08-16 2010-10-19 Microunity Systems Engineering, Inc. Method and software for group data operations
US7849291B2 (en) 1995-08-16 2010-12-07 Microunity Systems Engineering, Inc. Method and apparatus for performing improved group instructions
US7987344B2 (en) 1995-08-16 2011-07-26 Microunity Systems Engineering, Inc. Multithreaded programmable processor and system with partitioned operations
US8001360B2 (en) 1995-08-16 2011-08-16 Microunity Systems Engineering, Inc. Method and software for partitioned group element selection operation
US8117426B2 (en) 1995-08-16 2012-02-14 Microunity Systems Engineering, Inc System and apparatus for group floating-point arithmetic operations
US8289335B2 (en) 1995-08-16 2012-10-16 Microunity Systems Engineering, Inc. Method for performing computations using wide operands

Legal Events

Date Code Title Description
MKLA Lapsed