EP1894098A1 - Cache with flexible configuration, data processing system using same, and method therefor - Google Patents

Cache with flexible configuration, data processing system using same, and method therefor

Info

Publication number
EP1894098A1
Authority
EP
European Patent Office
Prior art keywords
cache
bus
bus operation
controller
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP05766573A
Other languages
German (de)
French (fr)
Inventor
Syed R. Rahman
David W. Todd
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NXP USA Inc
Original Assignee
Freescale Semiconductor Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Freescale Semiconductor Inc filed Critical Freescale Semiconductor Inc
Publication of EP1894098A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/12 Replacement control
    • G06F 12/121 Replacement control using replacement algorithms
    • G06F 12/126 Replacement control using replacement algorithms with special data handling, e.g. priority of data or instructions, handling errors or pinning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F 12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G06F 12/0848 Partitioned cache, e.g. separate instruction and operand caches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/25 Using a specific main memory architecture
    • G06F 2212/251 Local memory within processor subsystem
    • G06F 2212/2515 Local memory within processor subsystem being configurable for different purposes, e.g. as cache or non-cache memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2212/00 Indexing scheme relating to accessing, addressing or allocation within memory systems or architectures
    • G06F 2212/60 Details of cache memory
    • G06F 2212/601 Reconfiguration of cache memory

Abstract

A cache (120) is adapted to be coupled to a bus (116) and includes an array (125) and a controller (122). The array (125) has a plurality of sets (200, 210, 220) each having a plurality of cache lines (202), each cache line in each set including a tag and a plurality of data words. The controller (122) is coupled to the array (125), and associates both a first portion (205) of the plurality of cache lines (202) of each set with a first bus operation type and a second portion (206) of the plurality of cache lines (202) of each set with a second bus operation type according to a configuration value (232, 234). The controller (122) also causes data conducted over the bus (116) during a bus operation to be selectively stored in the first portion (205) if a type of the bus operation is the first bus operation type or in the second portion (206) if the type of the bus operation is the second bus operation type.

Description

CACHE WITH FLEXIBLE CONFIGURATION, DATA PROCESSING SYSTEM USING SAME,
AND METHOD THEREFOR
Field of the Invention
The invention relates generally to data processing systems, and more particularly to caches for use in data processing systems.
BACKGROUND
A known way to increase the performance of a computer system is to include a local, high-speed memory known as a cache. A cache increases system performance because there is a high probability that when the central processing unit (CPU) accesses data (instructions or data operands) at a particular address, it will access that data or data at an adjacent address shortly thereafter. The cache fetches and stores data that is located adjacent to the requested piece of data from a slower, main memory or lower-level cache.
In very high performance computer systems, several caches may be placed in a hierarchy. The cache that is closest to the CPU, known as the upper-level or "L1" cache, is the highest-level cache in the hierarchy and is generally the fastest. Other, generally slower caches are then placed in descending order in the hierarchy starting with the "L2" cache, etc., until the lowest level cache that is connected to main memory. Modern single-chip microprocessors or systems-on-chip (SOCs) typically include both L1 and L2 caches on chip.
In addition to the level in the cache hierarchy, a cache can also be categorized by the type of data that it stores. An instruction cache stores instructions fetched from memory. A data cache stores data operands allocated to the cache from memory. A popular system configuration includes separate L1 instruction and data caches and a combined instruction and data cache at the L2 level.
A basic cache includes a large static random access memory (SRAM) partitioned into multiple-word cache lines. Caches vary in the degree of associativity of the cache lines. A fully-associative cache allows any line from memory to be stored in any line in the cache. A set-associative cache includes cache lines grouped into sets, in which each cache line in the set has a portion of its address, known as the set index, in common with other cache lines in the set. A direct-mapped cache has a specific location for each line of the memory. The set-associative technique is usually used because it limits the number of address comparisons during an access to a manageable amount, thereby allowing large cache arrays to be built with a reasonable amount of circuit area.
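For illustration only (this example is not part of the patent text), a minimal C sketch with a hypothetical geometry of 32-byte lines and 256 sets shows how an access address selects a set, so that only the ways of that one set need their tags compared:

```c
#include <stdint.h>

/* Hypothetical geometry for illustration only: 32-byte lines, 256 sets.
 * In a set-associative cache only the ways of the selected set are
 * compared against the tag, not every line in the array. */
#define LINE_BYTES 32u
#define NUM_SETS   256u

static inline uint32_t set_index(uint32_t addr)
{
    /* Drop the byte-offset bits, then keep log2(NUM_SETS) bits. */
    return (addr / LINE_BYTES) % NUM_SETS;
}

static inline uint32_t tag_of(uint32_t addr)
{
    /* The remaining upper bits identify a line within its set. */
    return addr / (LINE_BYTES * NUM_SETS);
}
```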
In a set-associative cache, each cache line or "way" in a set has a corresponding tag that stores a portion of the address. A match between this portion of the access address and the tag of a valid cache line in the corresponding set is termed a cache hit. The tag is usually implemented with content addressable memory (CAM) to quickly compare an access address against the tag of each cache line in the set.
Since the majority of the circuit area of a cache is SRAM, it has been recognized that some of the cache can be programmed to operate as independent SRAM. Terry Biggs et al. in U.S. Patent No. 5,410,669 disclose a technique to allow a portion of a cache to be used as SRAM on a set-by-set basis. This partitioning provides a programmer with improved flexibility.
Once the cache has been completely filled with data, it is necessary to store new data in cache lines by replacing data that is less likely to be needed in the near future. Thus it is desirable to select cache lines to be removed that have been least-recently used (LRU). To overcome the difficulties with implementing a true LRU system, most caches use a pseudo LRU system.
In addition to instructions and data operands needed by the CPU, the cache is also a convenient place to store data communication frames for further processing. However using the cache for this purpose can cause inefficiency during periods of high data communication traffic. During periods of intense activity on the network, the portion of the cache used for data frames grows and can crowd out locations available to store CPU instructions and data. This crowding out effect can cause the CPU to operate inefficiently since it now has to fetch needed instructions and data from lower levels of the memory hierarchy and must perform relatively slow accesses to main memory more frequently.
What is needed then is a cache system in which crowding out is limited or controlled. Such a cache system is provided by the present invention, whose features and advantages will become apparent from the Detailed Description taken in conjunction with the attached drawings.
BRIEF DESCRIPTION OF DRAWINGS
The present disclosure may be better understood, and its numerous features and advantages made apparent to those skilled in the art by referencing the accompanying drawing, in which like reference numbers indicate similar or identical items.
FIG. 1 illustrates in block diagram form a data processing system according to the present invention;
FIG. 2 illustrates in block diagram form a portion of the cache of FIG. 1 useful in understanding the organization thereof;
FIG. 3 illustrates in block diagram form a cache access algorithm used by cache access logic in the cache controller of FIG. 1; and
FIG. 4 illustrates a flow diagram of a cache replacement algorithm used by cache replacement logic in the cache controller of FIG. 1.
The use of the same reference symbols in different drawings indicates similar or identical items.
DETAILED DESCRIPTION
In one form a cache is adapted to be coupled to a bus and comprises an array and a controller. The array has a plurality of sets each having a plurality of cache lines, each cache line in each set including a tag and a plurality of data words. The controller is coupled to the array, and associates both a first portion of the plurality of cache lines of each set with a first bus operation type and a second portion of the plurality of cache lines of each set with a second bus operation type according to a configuration value. The controller also causes data conducted over the bus during a bus operation to be selectively stored in the first portion if a type of the bus operation is the first bus operation type or in the second portion if the type of the bus operation is the second bus operation type.
For such a cache the first bus operation type may be an input/output stash bus operation and the second bus operation type may be a processor bus operation. In this case the controller may further associate a third portion of the plurality of cache lines of each set to be used as a static random access memory according to the configuration value. The cache may further include a configuration register for storing the configuration value. The configuration register may include a first field for determining a size of the first portion and a second field for determining a size of the third portion, a size of the second portion being a remainder of the plurality of cache lines that are not in the first portion or the third portion.
Also for such a cache, the controller may include replacement logic responsive to an address of the bus operation to select a cache line for replacement from one of the first and second portions of the plurality of cache lines corresponding to the type of the bus operation. The replacement logic may further perform a pseudo least recently used (LRU) algorithm to select the cache line for replacement. Each set may also store a plurality of history bits and the controller may operate on the plurality of history bits to determine a cache line for replacement while avoiding ineligible lines identified by the configuration value.
In another form a data processing system includes a first bus, a central processing unit (CPU), an input/output controller, and a cache. The CPU is coupled to the first bus and performs processor bus operations on the first bus. The input/output controller performs input/output stash bus operations on the first bus. The cache is coupled to the first bus and includes an array and a controller. The array has a plurality of sets each having a plurality of cache lines, each cache line in each set including a tag and a plurality of data words. The controller is coupled to the array, and associates both a first portion of the plurality of cache lines of each set with the input/output stash bus operations and a second portion of the plurality of cache lines of each set with the processor bus operations according to a configuration value. The controller also causes data conducted over the first bus during a bus operation to be selectively stored in the first portion if the bus operation is an input/output stash bus operation or in the second portion if the bus operation is the processor bus operation.
The first bus, the central processing unit, the input/output controller, and the cache may be combined on a single integrated circuit. In this case the data processing system may include a second bus coupled to the input/output controller, and a coherency module coupled to the first bus and to the second bus that provides input/output stash bus operations on the second bus to the first bus. In this case the input/output controller may further perform other bus operations that are not the input/output stash bus operations. For example, the input/output controller may perform the input/output stash bus operations for data communication header data and the other bus operations that are not the input/output stash bus operations for data communication payload data.
In one specific embodiment the central processing unit includes a level one (L1) cache and the cache functions as a level two (L2) cache.
The controller may comprise replacement logic responsive to an address of the bus operation to select a cache line for replacement from one of the first and second portions of the plurality of cache lines corresponding to the type of the bus operation. Such replacement logic may further perform a pseudo least recently used (LRU) algorithm to select the cache line for replacement. In this case, each set may store a plurality of history bits and the controller may operate on the plurality of history bits to determine a cache line for replacement while avoiding ineligible lines identified by the configuration value.
In yet another form a method for organizing a cache is provided. An array of the cache is organized into a plurality of sets each having a plurality of cache lines, each cache line including a tag and a plurality of data words. A configuration value is received and stored. Both a first portion of the plurality of cache lines of each set is associated with a first bus operation type and a second portion of the plurality of cache lines of each set is associated with a second bus operation type according to the configuration value. Data conducted over a bus during a bus operation is selectively stored in the first portion if the bus operation is the first bus operation type or in the second portion if the bus operation is the second bus operation type.
The step of associating the first portion of the plurality of cache lines with the first bus operation type may further comprise the step of associating the first portion of the plurality of cache lines with an input/output stash bus operation, and the step of associating the second portion of the plurality of cache lines with the second bus operation type may comprise the step of associating the second portion of the plurality of cache lines with a processor bus operation type. The method may also include a step of associating a third portion of the plurality of cache lines of each set to be used as static random access memory according to the configuration value.
Now turning to the drawings, FIG. 1 illustrates in block diagram form a data processing system 100 according to the present invention. Data processing system 100 includes generally a central processing unit (CPU) core 110, a level 2 ("L2") cache 120, a coherency module 130, a bus 140, a double data rate (DDR) synchronous dynamic random access memory (SDRAM) controller 150, a local bus interface 160, input/output controllers in the form of Ethernet controllers 170 and 180, and a peripheral hub 190 to which is attached a variety of peripheral devices that will be described in greater detail below. In the illustrated embodiment, the components of data processing system 100 are combined into a single integrated circuit. Thus local bus interface 160 connects to an off-chip local bus, and DDR SDRAM controller 150 connects to off-chip DDR SDRAMs. It should be apparent that a variety of other integrated circuit configurations are possible using the same basic components of data processing system 100.
Now turning to the components of data processing system 100, CPU core 110 is a high performance reduced instruction set computer (RISC) core. As part of the high performance, CPU core 110 includes a level one ("L1") data cache 112 and an L1 instruction cache 114. CPU core 110 also includes an interface to a front-side bus 116 by which it performs load and store accesses.
L2 cache 120 includes a bidirectional connection to front-side bus 116, and is a combined instruction and data cache that operates in conjunction with L1 data cache 112 and L1 instruction cache 114 in a non-exclusive arrangement. More particularly, L2 cache 120 stores both CPU instructions and data operands, and input/output data. The input/output data is in the form of portions of data communication frames as will be described more fully below. L2 cache 120 includes generally a controller 122 and an array of cache lines 125. As will be described more fully below, array 125 includes both SRAM for storing cache lines and a CAM having locations corresponding to the cache lines for storing tags.
Coherency module 130 is bidirectionally connected to both front-side bus 116 and internal bus 140.
Coherency module 130 performs two basic functions. First, it snoops read and write activity on bus 140 and causes caches 112, 114, and 120 to search for cache hits and to store data of certain input/output transactions. Second, it forms a bridge between bus 116 and bus 140 so that CPU core 110 can access devices on internal bus 140.
Local bus interface 160 is provided as a bridge between transactions on bus 140 and an external bus.
The external bus may be, for example, a general-purpose memory bus.
Ethernet controllers 170 and 180 are connected to separate Ethernet links and transmit data frames to and receive data frames from these links. As illustrated, they perform at least a portion of the data link layer function and interface to off-chip physical layer devices. Thus Ethernet controllers 170 and 180 are sources and sinks for data communication frames. Coherency module 130 recognizes transactions that must be globally visible for snooping and provides them over front-side bus 116 for storage in L2 cache 120. Thus Ethernet Controllers 170 and 180 advantageously provide data communication frame load and store transactions with attributes that identify them as being globally visible for snooping.
DDR SDRAM controller 150 provides an interface to external mass storage that is compliant with JEDEC Standard JESD-79. In other embodiments, DDR SDRAM controller 150 may be compliant with the so-called DDR-II standard, JESD-90, some other standard, or with conventional asynchronous DRAMs.
Hub 190 is the interconnection point or bridge for connection to a variety of peripheral devices. Data processing system 100 includes various ones of such devices, including an input/output messaging unit 191, four links labeled "SRIO" that operate according to a serial form of the protocol published by the RapidIO Trade Association, shown as a single block 192 connected to links 193 having up to four lanes, a peripheral component interconnect (PCI) Express (PCI-E) bus interface 194 connected to links 195 having up to eight lanes, a dedicated PCI or PCI-X controller 196 connected to a corresponding PCI link 197, and a direct memory access (DMA) controller 198. It should be apparent that the peripherals connected to hub 190 are merely illustrative.
Ethernet controllers 170 and 180 may operate with normal 10 megabit-per-second (Mbps) Ethernet links, fast Ethernet at 100 Mbps, or Gigabit Ethernet at 1 gigabit-per-second (Gbps). As should be apparent, communication at this high a rate may occasionally result in a large amount of frame data being accumulated in L2 cache 120, which may eventually start to degrade the performance of CPU core 110 through the crowding out effect. However, according to the present invention, L2 cache 120 has the ability to dedicate portions of array 125 for either input/output (I/O) data or CPU data. The I/O operation, known as I/O stashing, will be further described with respect to FIGs. 2-4 below. Generally, I/O stashing means storing data conducted during an I/O bus operation in the cache. For example, Ethernet controller 170 may write Ethernet frame data to memory over bus 140, and L2 cache 120 could advantageously stash this data since it would be needed soon thereafter for further processing by CPU core 110. This capability allows a first programmable portion of L2 cache 120 to be set aside for (i.e. associated with) input/output operations to support the data communication task and a second programmable portion to be set aside for CPU instructions and operand data, to substantially avoid crowding out. In addition, as will be described below, a third programmable portion of cache 120 can be dedicated for use as a general-purpose SRAM, providing a high degree of flexibility for different system needs.
FIG. 2 illustrates in block diagram form a portion of cache 120 of FIG. 1 useful in understanding the organization thereof. As shown in FIG. 2, cache 120 includes array 125 and a control register 230 labeled "L2 CACHE CONTROL REGISTER" associated with controller 122. Array 125 includes N sets of cache lines including a set 200 labeled "SET 0", a set 210 labeled "SET 1", and a set 220 labeled "SET N". Each set is 8-way associative in that it has eight cache lines 202, and also a common set of pseudo least recently used (PLRU) bits 204. As described herein, PLRU bits 204 function as history bits to store the history of accesses to the cache lines of set 200. Each cache line has a corresponding tag labeled "ADDRESS TAG 0" through "ADDRESS TAG 7" corresponding to both a way of the group WAY 0 through WAY 7 and a group of eight data words labeled "WORDS[0-7]". Each cache line also includes control bits for implementing cache coherency, not specifically shown in FIG. 2, including a valid bit for indicating whether the cache line is valid.
The sets in array 125 are further organized into groups of sets known as banks. For example, array 125 is organized into eight banks of 256 sets, each set including eight cache lines or "ways" of thirty-two bytes. Thus cache 120 includes 8 X 256 X 8 X 32 = 512K bytes of memory. Note that in the specific embodiment shown in FIG. 2, one word is four bytes in length, but in other embodiments cache lines may be of different lengths.
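The stated geometry can be checked directly; a small sketch using the figures above:

```c
/* Geometry of array 125 as stated in the text: eight banks of 256 sets,
 * eight ways per set, 32 bytes (eight 4-byte words) per cache line. */
enum {
    BANKS          = 8,
    SETS_PER_BANK  = 256,
    WAYS_PER_SET   = 8,
    BYTES_PER_LINE = 32,
};

_Static_assert(BANKS * SETS_PER_BANK * WAYS_PER_SET * BYTES_PER_LINE
               == 512 * 1024, "array 125 totals 512K bytes");
```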
PLRU bits 204 are used to implement the cache replacement algorithm. For example, the bank and set index of a particular access address indicates that new data will be allocated to a line in SET 0. PLRU bits 204, in conjunction with the association by bus operation type and locking mechanisms, then determine which cache line must be cast out, as will be further explained with respect to FIG. 4 below.
In order to substantially avoid crowding out, L2 CACHE CONTROL REGISTER 230 stores a configuration value to associate a first portion of each set with I/O stash operations and a second portion of each set with processor bus operations. This association ensures that certain amounts of the cache will remain available to the processor even during periods of high network traffic, allowing the processor to continue operating efficiently. In this specific architecture, controller 122 determines the bus operation type by examining both a set of signals known as the "transfer type" issued by the bus master during a bus transaction, and the address of the transaction. Considering both the transfer type and the address allows controller 122 to interpret only some of the transactions initiated by Ethernet controllers 170 and 180 as being I/O stash operations that need to be cached. Thus, for example, data communication header data that will be subsequently processed by CPU core 110 will be available in cache 120, whereas payload data will not be cached. In other embodiments, however, these and other factors could be used differently to determine bus operation type.
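That classification step might be sketched as follows; the transfer-type encoding and the header address window are hypothetical placeholders, since the patent does not give concrete values:

```c
#include <stdint.h>

/* Hedged sketch of the classification described above. The patent says
 * controller 122 examines the bus master's "transfer type" signals and
 * the transaction address; the encoding XT_IO_WRITE and the header
 * window below are hypothetical, not values from the text. */
enum bus_op { OP_PROCESSOR, OP_IO_STASH, OP_UNCACHED };

enum bus_op classify(uint32_t xfer_type, uint64_t addr)
{
    const uint32_t XT_IO_WRITE = 0x5;         /* hypothetical encoding */
    const uint64_t HDR_BASE = 0x10000000ull;  /* hypothetical header   */
    const uint64_t HDR_SIZE = 0x00100000ull;  /* buffer window         */

    if (xfer_type != XT_IO_WRITE)
        return OP_PROCESSOR;                  /* ordinary CPU traffic  */
    /* Only header data, which CPU core 110 will process soon, is
     * stashed; payload transactions fall outside the window and are
     * not allocated in the cache. */
    if (addr >= HDR_BASE && addr < HDR_BASE + HDR_SIZE)
        return OP_IO_STASH;
    return OP_UNCACHED;
}
```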
L2 cache 120 also includes the ability to dedicate portions of each set to be used as SRAM, providing additional flexibility for systems that need dedicated SRAM for high-speed scratchpad memory.
A first 3-bit field 232 labeled "L2SRAM" determines which portion of each set starting at the top will be associated with the SRAM function, shown as SRAM portion 207. L2SRAM allows sizes of 0, 1, 2, 4, and 8 ways associated with the SRAM function. A second 2-bit field 234 labeled "L2STASHCTL" determines which portion of each set, starting from the bottom, will be associated with the I/O stash function, shown as I/O STASH portion 205. L2STASHCTL allows 0, 1, 2, and 4 ways to be associated with the I/O stash function. Any remaining portion in the middle will be associated with the processor cache function, shown as PROCESSOR CACHE portion 206. It should be apparent that L2SRAM field 232 and L2STASHCTL field 234 collectively form a configuration value for the three portions, that L2 CACHE CONTROL REGISTER 230 can determine the size of all three portions of each set through the specification of the size of any two of the fields, and that the fields that are provided to determine the size of these three portions may vary in other embodiments.
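One way to picture the resulting partition of an 8-way set is as three way masks. The mapping from raw L2SRAM/L2STASHCTL field encodings to way counts is only partially specified by the example values, so this sketch takes the already-decoded counts as inputs:

```c
#include <stdint.h>

/* Sketch: derive per-function way masks for one 8-way set from decoded
 * portion sizes. Bit i of a mask corresponds to WAY i. */
typedef uint8_t way_mask_t;

static way_mask_t sram_mask(int sram_ways)   /* SRAM fills from WAY0 up */
{
    return (way_mask_t)((1u << sram_ways) - 1u);
}

static way_mask_t stash_mask(int stash_ways) /* stash fills from WAY7 down */
{
    return (way_mask_t)~((1u << (8 - stash_ways)) - 1u);
}

static way_mask_t proc_mask(int sram_ways, int stash_ways)
{
    /* The processor cache gets whatever remains in the middle. */
    return (way_mask_t)~(sram_mask(sram_ways) | stash_mask(stash_ways));
}

/* FIG. 2 example: sram_mask(2) = 0x03 (WAY0-1), stash_mask(2) = 0xC0
 * (WAY6-7), proc_mask(2, 2) = 0x3C (WAY2-5). */
```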
For the example shown in FIG. 2, L2SRAM field 232 has the value of "100" to indicate that the SRAM portion includes the top two cache lines in each set corresponding to WAY0 and WAY1. L2STASHCTL field 234 has the value of "10" to indicate that the I/O stash area includes two cache lines in each set corresponding to WAY6 and WAY7. Thus, the processor cache area includes the remaining four cache lines in the middle of each set corresponding to WAY2-WAY5.
FIG. 3 illustrates in block diagram form a cache access algorithm used by cache access logic in cache controller 122 of FIG. 1. Controller 122 performs the lookup process by examining a 36-bit access address labeled "ADDR" 320. ADDR 320 includes a first portion 322 corresponding to bits 20-22 that controller 122 uses as a 3-bit bank select index 330 to select one of the 8 available banks; a second portion 324 corresponding to bits 23-30 that controller 122 uses as an 8-bit set index 332 to select one of the 256 available sets in the selected bank; a third portion 326 corresponding to bits 0-19 that controller 122 uses indirectly to form a 3-bit way select index 334; and a fourth portion 328 corresponding to bits 31-35 that controller 122 uses as a 5-bit byte select index 336. Once the bank and set are determined, the 20-bit way select comparison determines whether the line is stored in the cache by performing eight comparisons in parallel using eight 20-bit tags 310. If there is a match (and the line is valid as indicated by the valid bit), then the one of the eight cache lines that matched the tag portion of the ADDR indicates the selected way.
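A sketch of this field extraction in C, assuming the patent's bit numbering in which bit 0 is the most significant of the 36 address bits:

```c
#include <stdint.h>

/* Field split of the 36-bit ADDR 320, assuming big-endian bit numbering
 * (bit 0 is the most significant bit, bit 35 the least significant). */
struct addr_fields {
    uint32_t tag;   /* bits 0-19,  20 bits, compared against the CAM */
    uint32_t bank;  /* bits 20-22,  3 bits, selects one of 8 banks   */
    uint32_t set;   /* bits 23-30,  8 bits, selects one of 256 sets  */
    uint32_t byte;  /* bits 31-35,  5 bits, selects a byte in a line */
};

static struct addr_fields decode_addr(uint64_t addr36)
{
    struct addr_fields f;
    f.byte = (uint32_t)(addr36 & 0x1F);            /* low 5 bits  */
    f.set  = (uint32_t)((addr36 >> 5) & 0xFF);     /* next 8 bits */
    f.bank = (uint32_t)((addr36 >> 13) & 0x7);     /* next 3 bits */
    f.tag  = (uint32_t)((addr36 >> 16) & 0xFFFFF); /* top 20 bits */
    return f;
}
```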
If there is no hit, then a cache line corresponding to ADDR 320 may be allocated to a selected area of the cache. The process of allocation involves replacing an existing cache line with the new cache line. Controller 122 determines which existing cache line will be replaced according to a pseudo LRU algorithm that has been specially modified to store the new cache line in a portion of the cache that has been allocated to the source of the data as indicated by the bus operation type, either processor cache or I/O stash.
The replacement algorithm will next be described. FIG. 4 illustrates a cache replacement algorithm 400 used by cache replacement logic in cache controller 122 of FIG. 1. When a cache line is to be allocated, the set is first selected by the bank and set indexes of the access address. Then PLRU bits 204 are used to select the appropriate way. The general flow of cache replacement algorithm 400, which is a decision tree, will first be described. At a step 410 bit P0 is examined. If P0 = 0, then the flow continues to a step 420, at which the state of bit P1 is examined. If P1 = 0, then flow continues to a step 430, at which the state of bit P3 is tested. If P3 = 0, then cache controller 122 allocates WAY 0, whereas if P3 = 1, then cache controller 122 allocates WAY 1. If P1 = 1, then flow continues to a step 440, at which the state of bit P4 is tested. If P4 = 0, then cache controller 122 allocates WAY 2, whereas if P4 = 1, then cache controller 122 allocates WAY 3. If P0 = 1, then the flow continues to a step 450, at which the state of bit P2 is examined. If P2 = 0, then flow continues to a step 460, at which the state of bit P5 is tested. If P5 = 0, then cache controller 122 allocates WAY 4, whereas if P5 = 1, then cache controller 122 allocates WAY 5. If P2 = 1, then flow continues to a step 470, at which the state of bit P6 is tested. If P6 = 0, then cache controller 122 allocates WAY 6, whereas if P6 = 1, then cache controller 122 allocates WAY 7.
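Walked directly, the decision tree reduces to a few nested tests; the packing of P0-P6 into one byte below is an assumption of this sketch:

```c
#include <stdint.h>

/* Direct transcription of cache replacement algorithm 400 (FIG. 4).
 * Bit i of 'plru' is assumed to hold Pi. */
static int plru_victim(uint8_t plru)
{
    if (((plru >> 0) & 1) == 0) {                 /* step 410: P0 */
        if (((plru >> 1) & 1) == 0)               /* step 420: P1 */
            return ((plru >> 3) & 1) ? 1 : 0;     /* step 430: P3 */
        else
            return ((plru >> 4) & 1) ? 3 : 2;     /* step 440: P4 */
    } else {
        if (((plru >> 2) & 1) == 0)               /* step 450: P2 */
            return ((plru >> 5) & 1) ? 5 : 4;     /* step 460: P5 */
        else
            return ((plru >> 6) & 1) ? 7 : 6;     /* step 470: P6 */
    }
}
```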
Next a conventional pseudo LRU algorithm that does not take into account the bus operation type will be described. Assume that WAY 3 is the least recently used cache line. PLRU bits 204 will have the value [01xx1xx], wherein "x" represents a don't care state. Since P0 = 0, the flow proceeds from box 410 leftward to box 420; since P1 = 1, the flow next proceeds rightward to box 440. Finally, since P4 = 1, WAY 3 is allocated.
The pseudo LRU algorithm that has been modified according to the present invention will now be described. In general, controller 122 modifies cache replacement algorithm 400 to walk the decision tree while avoiding ineligible lines identified by the configuration value stored in L2 CACHE CONTROL REGISTER 230. Suppose again that WAY 3 is the LRU cache line, but the bus operation type is an I/O stash operation. Since L2STASHCTL has allocated the two lines corresponding to WAY 6 and WAY 7 to the I/O stash operation, the leftward flow from decision box 410 is an ineligible transition regardless of the state of P0, so the flow proceeds rightward to decision box 450. Similarly, the leftward flow from decision box 450 is an ineligible transition, so the flow proceeds rightward to decision box 470. Then the state of P6 determines whether to replace the cache line in WAY 6 or WAY 7 with the new cache line. This pseudo LRU algorithm allows easy integration with a cache locking mechanism. When a line in the cache is locked, it cannot be de-allocated, regardless of the states of the PLRU bits. Thus, to support locking, controller 122 also recognizes ineligible state transitions in response to locked lines.
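A hedged sketch of the modified walk: whenever every way reachable through one side of a node is ineligible, whether because it lies outside the portion associated with the bus operation type or because it is locked, that transition is skipped regardless of the PLRU bit. The eligibility mask convention follows the earlier sketches (bit i selects WAY i):

```c
#include <stdint.h>

/* Choose between two adjacent ways (lo_way, lo_way + 1) using PLRU bit
 * 'bit', skipping an ineligible way. */
static int pick(uint8_t plru, int bit, int lo_way, uint8_t eligible)
{
    uint8_t lo = (uint8_t)(1u << lo_way);
    uint8_t hi = (uint8_t)(1u << (lo_way + 1));
    if (!(eligible & lo)) return lo_way + 1;
    if (!(eligible & hi)) return lo_way;
    return ((plru >> bit) & 1) ? lo_way + 1 : lo_way;
}

/* Modified walk of algorithm 400; 'eligible' must select at least one
 * way (portion mask with any locked ways cleared). */
static int plru_victim_masked(uint8_t plru, uint8_t eligible)
{
    uint8_t left  = eligible & 0x0F;   /* WAY0-3, reached via P0 = 0 */
    uint8_t right = eligible & 0xF0;   /* WAY4-7, reached via P0 = 1 */
    int go_right = right && (!left || ((plru >> 0) & 1));

    if (!go_right) {
        uint8_t ll = eligible & 0x03;  /* WAY0-1 via P1 = 0 */
        uint8_t lr = eligible & 0x0C;  /* WAY2-3 via P1 = 1 */
        if (lr && (!ll || ((plru >> 1) & 1)))
            return pick(plru, 4, 2, eligible);   /* P4: WAY2 or WAY3 */
        return pick(plru, 3, 0, eligible);       /* P3: WAY0 or WAY1 */
    } else {
        uint8_t rl = eligible & 0x30;  /* WAY4-5 via P2 = 0 */
        uint8_t rr = eligible & 0xC0;  /* WAY6-7 via P2 = 1 */
        if (rr && (!rl || ((plru >> 2) & 1)))
            return pick(plru, 6, 6, eligible);   /* P6: WAY6 or WAY7 */
        return pick(plru, 5, 4, eligible);       /* P5: WAY4 or WAY5 */
    }
}
```

With eligible = 0xC0, the FIG. 2 I/O stash portion, the walk is forced rightward at boxes 410 and 450 regardless of P0 and P2, and P6 alone chooses between WAY 6 and WAY 7, matching the example above; a locked line is handled by simply clearing its bit in the mask.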
While at least one exemplary embodiment has been presented in the foregoing detailed description, it should be appreciated that a vast number of variations exist. It should also be appreciated that the exemplary embodiment or exemplary embodiments are only examples, and are not intended to limit the scope, applicability, or configuration of the invention in any way. Rather, the foregoing detailed description will provide those skilled in the art with a convenient road map for implementing the exemplary embodiment or exemplary embodiments. For example, data processing system 100 could be implemented as a single integrated circuit (IC), or various components such as L2 cache 120 could be implemented as separate ICs or systems. Also controller 122 could separate portions of each set based on other combinations of bus operation types besides I/O stash bus operations and processor bus operations. Furthermore many other cache organizations besides the example shown are possible. It should be understood that various changes can be made in the function and arrangement of elements without departing from the scope of the invention as set forth in the appended claims and the legal equivalents thereof.

Claims

WHAT IS CLAIMED IS:
1. A cache (120) adapted to be coupled to a bus (116), comprising: an array (125) having a plurality of sets (200, 210, 220) each having a plurality of cache lines (202), each cache line in each set including a tag and a plurality of data words; and a controller (122) coupled to said array (125), said controller (122) associating both a first portion (205) of said plurality of cache lines (202) of each set with a first bus operation type and a second portion (206) of said plurality of cache lines (202) of each set with a second bus operation type according to a configuration value (232, 234) and causing data conducted over the bus (116) during a bus operation to be selectively stored in said first portion (205) if a type of said bus operation is said first bus operation type or in said second portion (206) if said type of said bus operation is said second bus operation type.
2. The cache (120) of claim 1 wherein said first bus operation type is an input/output stash bus operation and said second bus operation type is a processor bus operation.
3. The cache (120) of claim 2 wherein said controller (122) further associates a third portion (207) of said plurality of cache lines (202) of each set to be used as static random access memory according to said configuration value (232, 234).
4. The cache (120) of claim 3 further comprising a configuration register (230) for storing said configuration value (232, 234).
5. The cache (120) of claim 4 wherein said configuration register (230) includes a first field (234) for determining a size of said first portion (205) and a second field (232) for determining a size of said third portion (207), a size of said second portion (206) being a remainder of said plurality of cache lines (202) that are not in said first portion (205) or said third portion (207).
6. The cache (120) of claim 1 wherein said controller (122) comprises replacement logic (122, 400) responsive to an address of said bus operation to select a cache line for replacement from one of said first and second portions of said plurality of cache lines corresponding to said type of said bus operation.
7. The cache (120) of claim 6 wherein said replacement logic (122, 400) further performs a pseudo least recently used (LRU) algorithm to select said cache line for replacement.
8. The cache (120) of claim 7 wherein each set stores a plurality of history bits (204) and said controller (122) operates on said plurality of history bits (204) to determine a cache line for replacement while avoiding ineligible lines identified by said configuration value (232, 234).
9. A data processing system (100) comprising: a first bus (116); a central processing unit (CPU) (110) coupled to said first bus (116) for performing processor bus operations on said first bus (116); an input/output controller (170) for performing input/output stash bus operations on said first bus (116); and a cache (120) coupled to said first bus (116) comprising: an array (125) having a plurality of sets (200, 210, 220) each having a plurality of cache lines (202), each cache line in each set including a tag and a plurality of data words; and a controller (122) coupled to said array (125), said controller (122) associating both a first portion (205) of said plurality of cache lines (202) of each set with said input/output stash bus operations and a second portion (206) of said plurality of cache lines (202) of each set with said processor bus operations according to a configuration value (232, 234) and causing data conducted over said first bus (116) during a bus operation to be selectively stored in said first portion (205) if said bus operation is an input/output stash bus operation or in said second portion (206) if said bus operation is said processor bus operation.
10. The data processing system (100) of claim 9 wherein said first bus (116), said central processing unit (110), said input/output controller (170), and said cache (120) are combined on a single integrated circuit.
11. The data processing system (100) of claim 9 further comprising: a second bus (140) coupled to said input/output controller (170); and a coherency module (130) coupled to said first bus (116) and to said second bus (140) that provides input/output stash bus operations on said second bus (140) to said first bus (116).
12. The data processing system (100) of claim 11 wherein said input/output controller (170) further performs other bus operations that are not said input/output stash bus operations.
13. The data processing system (100) of claim 12 wherein said input/output controller (170) performs said input/output stash bus operations for data communication header data and said other bus operations that are not said input/output stash bus operations for data communication payload data.
14. The data processing system (100) of claim 9 wherein said central processing unit (110) includes a level one (Ll) cache (112) and said cache (120) functions as a level two (L2) cache.
15. The data processing system (100) of claim 9 wherein said controller (122) comprises replacement logic (122, 400) responsive to an address of said bus operation to select a cache line for replacement from one of said first and second portions of said plurality of cache lines corresponding to said type of said bus operation.
16. The data processing system (100) of claim 15 wherein said replacement logic (400) further performs a pseudo least recently used (LRU) algorithm to select said cache line for replacement.
17. The data processing system (100) of claim 16 wherein each set stores a plurality of history bits (204) and said controller (122) operates on said plurality of history bits (204) to determine a cache line for replacement while avoiding ineligible lines identified by said configuration value (232, 234).
18. A method for organizing a cache (120) comprising the steps of: organizing an array (125) of the cache (120) into a plurality of sets (200, 210, 220) each having a plurality of cache lines (202), each cache line including a tag and a plurality of data words; receiving and storing a configuration value (232, 234); associating both a first portion (205) of said plurality of cache lines (202) of each set with a first bus operation type and a second portion (206) of said plurality of cache lines (202) of each set with a second bus operation type according to said configuration value (232, 234); and selectively storing data conducted over a bus (116) during a bus operation in said first portion (205) if said bus operation is said first bus operation type or in said second portion (206) if said bus operation is said second bus operation type.
19. The method of claim 18 wherein said step of associating said first portion (205) of said plurality of cache lines (202) with said first bus operation type comprises the step of associating said first portion (205) of said plurality of cache lines (202) with an input/output stash bus operation, and wherein said step of associating said second portion (206) of said plurality of cache lines (202) with said second bus operation type comprises the step of associating said second portion (206) of said plurality of cache lines (202) with a processor bus operation type.
20. The method of claim 19 further comprising the step of associating a third portion of said plurality of cache lines (202) of each set to be used as static random access memory according to said configuration value (232, 234).
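Claims 4, 5, and 18 through 20 describe a configuration value whose fields size the portions of each set. Purely for illustration (the field names, their encoding, the placement of the portions within a set, and the 8-way organization are assumptions not fixed by the claims), such a value could be decoded into per-type eligibility masks as follows:

```c
#include <stdint.h>

#define NUM_WAYS 8u

/* Hypothetical decode of a configuration value into way masks.
 * stash_ways: ways reserved for I/O stash operations, taken from
 * the top of the set; sram_ways: ways mapped as SRAM, just below
 * them. Per claim 5, the processor portion is the remainder. */
struct way_masks {
    uint8_t stash;  /* ways eligible for I/O stash allocations */
    uint8_t proc;   /* ways eligible for processor allocations */
};

static struct way_masks decode_config(unsigned stash_ways, unsigned sram_ways)
{
    struct way_masks m;
    uint8_t all  = (uint8_t)((1u << NUM_WAYS) - 1u);
    m.stash      = (uint8_t)(all & ~((1u << (NUM_WAYS - stash_ways)) - 1u));
    uint8_t sram = (uint8_t)(((1u << sram_ways) - 1u)
                             << (NUM_WAYS - stash_ways - sram_ways));
    m.proc       = (uint8_t)(all & ~m.stash & ~sram);
    return m;
}
```

With stash_ways = 2 and sram_ways = 0, decode_config(2, 0) yields stash = 0xC0 and proc = 0x3F, matching the WAY 6/WAY 7 partition used in the replacement example earlier in the description.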
EP05766573A 2005-06-15 2005-06-15 Cache with flexible configuration, data processing system using same, and method therefor Withdrawn EP1894098A1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/US2005/021146 WO2007001257A1 (en) 2005-06-15 2005-06-15 Cache with flexible configuration, data processing system using same, and method therefor

Publications (1)

Publication Number Publication Date
EP1894098A1 (en)

Family

ID=35636869

Family Applications (1)

Application Number Title Priority Date Filing Date
EP05766573A Withdrawn EP1894098A1 (en) 2005-06-15 2005-06-15 Cache with flexible configuration, data processing system using same, and method therefor

Country Status (3)

Country Link
EP (1) EP1894098A1 (en)
JP (1) JP2008544366A (en)
WO (1) WO2007001257A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10140219B2 (en) * 2012-11-02 2018-11-27 Blackberry Limited Multi-port shared cache apparatus
US10452593B1 (en) * 2018-05-03 2019-10-22 Arm Limited High-performance streaming of ordered write stashes to enable optimized data sharing between I/O masters and CPUs

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
JPH07248967A (en) * 1994-03-11 1995-09-26 Hitachi Ltd Memory control system
US5893153A (en) * 1996-08-02 1999-04-06 Sun Microsystems, Inc. Method and apparatus for preventing a race condition and maintaining cache coherency in a processor with integrated cache memory and input/output control
EP1215581A1 (en) 2000-12-15 2002-06-19 Texas Instruments Incorporated Cache memory access system and method
US20030041213A1 (en) * 2001-08-24 2003-02-27 Yakov Tokar Method and apparatus for using a cache memory
JP3929872B2 (en) * 2002-10-30 2007-06-13 株式会社東芝 Cache memory, processor and cache control method

Non-Patent Citations (1)

Title
See references of WO2007001257A1 *

Also Published As

Publication number Publication date
JP2008544366A (en) 2008-12-04
WO2007001257A1 (en) 2007-01-04


Legal Events

Code Title Description

PUAI: Public reference made under Article 153(3) EPC to a published international application that has entered the European phase. Free format text: ORIGINAL CODE: 0009012.

17P: Request for examination filed. Effective date: 20080115.

AK: Designated contracting states. Kind code of ref document: A1. Designated state(s): DE FR GB.

17Q: First examination report despatched. Effective date: 20080613.

RBV: Designated contracting states (corrected). Designated state(s): DE FR GB.

DAX: Request for extension of the European patent (deleted).

STAA: Information on the status of an EP patent application or granted EP patent. Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN.

18D: Application deemed to be withdrawn. Effective date: 20120103.