GB2365582A - High bandwidth cache - Google Patents

High bandwidth cache

Info

Publication number
GB2365582A
GB2365582A (application GB0102442A)
Authority
GB
Grant status
Application
Prior art keywords
cache
multiple
Prior art date
Legal status
Withdrawn
Application number
GB0102442A
Other versions
GB0102442D0 (en)
Inventor
Reid James Riedlinger
Dean Ahmad Mulla
Thomas Grutkowski
Current Assignee
HP Inc
Intel Corp
Original Assignee
HP Inc
Intel Corp
Priority date
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844 Multiple simultaneous or quasi-simultaneous cache accessing
    • G06F12/0846 Cache with multiple tag or data arrays being simultaneously accessible
    • G06F12/0851 Cache with interleaved addressing

Abstract

A system and method are disclosed which provide a high bandwidth cache 100 that enables reads and writes to be performed simultaneously. More specifically, a system and method are disclosed which provide a cache design that enables any one of multiple cache banks 10, 20 to be mapped to any one of multiple ports to satisfy a memory access request. In a preferred embodiment, multiple ports are dedicated as load (or "read") ports and multiple ports are dedicated for stores and fills (i.e., "write" ports). In a preferred embodiment, the cache structure is segmented into multiple cache banks. In a preferred embodiment, the cache structure is implemented such that any one of the multiple cache banks may be mapped to any one of the multiple ports, thereby enabling a high bandwidth cache. In a preferred embodiment, the cache structure comprises a cross-over MUX 14 that enables data from any one of the multiple cache banks to be mapped to any one of the multiple ports to satisfy a memory access request. Moreover, in a preferred embodiment, the cache structure is arranged to receive multiple memory access requests and map any one of the multiple cache banks to any one of the multiple ports in order to satisfy, in parallel, multiple ones of the multiple memory access requests received. Accordingly, in a preferred embodiment, the cache structure is arranged such that it may satisfy a read request via a dedicated read port and a write request via a dedicated write port, in parallel.

Description

METHOD AND SYSTEM FOR PROVIDING A HIGH BANDWIDTH CACHE THAT ENABLES SIMULTANEOUS READS AND WRITES WITHIN THE CACHE

RELATED APPLICATIONS This application is related to co-filed and commonly assigned U.S. Patent Application Serial Number [Attorney Docket No. 10971421] entitled "METHOD AND SYSTEM FOR EARLY TAG ACCESSES FOR LOWER-LEVEL CACHES IN PARALLEL WITH FIRST-LEVEL CACHE," and co-filed and commonly assigned U.S. Patent Application Serial Number [Attorney Docket No. 10971230] entitled "SYSTEM AND METHOD UTILIZING SPECULATIVE CACHE ACCESS FOR IMPROVED PERFORMANCE," the disclosures of which are hereby incorporated herein by reference.

TECHNICAL FIELD This invention relates in general to cache design for a computer processor, and in particular to a high bandwidth cache design.

BACKGROUND Computer systems may employ a multilevel hierarchy of memory, with relatively fast, expensive but limited-capacity memory at the highest level of the hierarchy and proceeding to relatively slower, lower cost but higher-capacity memory at the lowest level of the hierarchy. The hierarchy may include a small fast memory called a cache, either physically integrated within a processor or mounted physically close to the processor for speed. The computer system may employ separate instruction caches and data caches. In addition, the computer system may use multiple levels of caches. The use of a cache is generally transparent to a computer program at the instruction level and can thus be added to a computer architecture without changing the instruction set or requiring modification to existing programs.

Computer processors typically include cache for storing data. When executing an instruction that requires access to memory (e.g., read from or write to memory), a processor typically accesses cache in an attempt to satisfy the instruction. Of course, it is desirable to have the cache implemented in a manner that allows the processor to access the cache in an efficient manner. That is, it is desirable to have the cache implemented in a manner such that the processor is capable of accessing the cache (i.e., reading from or writing to the cache) quickly so that the processor may be capable of executing instructions quickly.

Bank cache structures of the prior art are typically designed having dedicated ports for each bank. Generally, a "port" is physical wire(s) coupled directly to a bank for memory access (e.g., to read from and write to a memory bank). As an example of a prior art bank cache structure, a cache may be designed with two banks: an even address bank and an odd address bank, and such cache may have a dedicated port for both stores (i.e., writes) and loads (i.e., reads) to each one of these banks. That is, a dedicated port is typically used for both reading from and writing to the bank. The even side of the banks could be doing a write and the odd side could be doing a read at the same time. Thus, such prior designs enable a higher bandwidth from the cache, but the designs limit the number of accesses to the number of physical ports and the number of banks that are implemented for the cache.

Since prior art caches are designed with a dedicated port for each cache bank, such cache designs allow only a load or a store to a particular bank to be performed at any given time. Accordingly, since either port of a cache bank may only be used for a load or a store at any given time, a high number of address conflicts occur in prior art designs. Such address conflicts result in an execution unit requesting access to the cache being stalled while awaiting removal of the conflict. Accordingly, the time required for satisfying the execution unit may be delayed, thereby resulting in a greater latency for the computer system. Additionally, prior art cache designs might only allow simultaneous access to an odd address bank and an even address bank. Such prior art cache designs do not allow simultaneous access to two even address banks or two odd address banks, for example. Such a design further constrains the memory access requests that may be satisfied simultaneously by the cache.
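
As a rough illustration of the conflict problem described above, the following Python sketch models a prior-art two-bank (even/odd) cache in which each bank has a single dedicated port; the bank-selection rule, the 128-byte line size, and the request format are assumptions for illustration only, not details taken from any particular prior art design.

```python
# Hypothetical model of a prior-art two-bank cache with one dedicated
# port per bank: two requests can proceed together only if one targets
# the even-line bank and the other targets the odd-line bank.

def bank_of(address: int) -> str:
    """Assumed rule: the low-order line bit picks the even or odd bank (128-byte lines)."""
    return "even" if (address >> 7) % 2 == 0 else "odd"

def conflicts(addr_a: int, addr_b: int) -> bool:
    """Two simultaneous requests stall if they both need the same bank's only port."""
    return bank_of(addr_a) == bank_of(addr_b)

# Two even-line addresses collide, so one execution unit must stall:
print(conflicts(0x1000, 0x1100))   # True  -> stall
print(conflicts(0x1000, 0x1080))   # False -> can proceed in parallel
```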

SUMMARY OF THE INVENTION In view of the above, a desire exists for a cache bank structure that allows for high bandwidth for the cache. Generally, "bandwidth" is the amount (or "width") of data that can be provided to the core at any point in time, as well as the speed at which it is provided to the core. Thus, increasing the amount of data that can be provided to the core at any given time or increasing the speed at which data can be provided to the core typically increases the bandwidth of the cache memory. A further desire exists for a cache bank structure that allows for multiple accesses of the cache simultaneously, while reducing the number of address conflicts that occur. That is, a further desire exists for a cache bank structure that allows for simultaneous reads and writes to be performed within the cache, while reducing the number of address conflicts incurred.

These and other objects, features and technical advantages are achieved by a system and method which provide a cache design that enables any one of multiple cache banks to be mapped to any one of multiple ports to satisfy a memory access request. In a preferred embodiment, multiple ports are dedicated as load (or "read") ports and multiple ports are dedicated for stores and fills (i.e., "write" ports). In a most preferred embodiment, four ports are dedicated as load ports, four ports are dedicated as store ports, and one is dedicated as a fill port. However, in alternative embodiments, any number of ports may be dedicated for loads, stores, and fills. In a preferred embodiment, the cache structure is segmented into multiple cache banks. In a most preferred embodiment, the cache structure is segmented into sixteen cache banks. However, in alternative embodiments, any number of cache banks may be implemented within the cache. In a preferred embodiment, the cache structure is implemented such that any one of the multiple cache banks may be mapped to any one of the multiple ports, thereby enabling a high bandwidth cache. In a preferred embodiment, the cache structure comprises a crossover MUX that enables data from any one of the multiple cache banks to be mapped to any one of the multiple load ports.
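
The following short Python sketch illustrates the "any bank to any port" idea from this summary: sixteen banks, four load ports and four store ports, with a per-cycle mapping that is free to pair any port with any bank as long as each bank serves at most one port. The data structures, port names, and the greedy assignment are illustrative assumptions, not the claimed hardware.

```python
# Illustrative mapping of ports to banks for one cycle. A request is
# (port_name, bank_index); the mapping succeeds if no bank is asked to
# serve two ports in the same cycle.

def map_ports_to_banks(requests, num_banks=16):
    mapping, busy = {}, set()
    for port, bank in requests:
        if bank in busy or not 0 <= bank < num_banks:
            return None           # bank conflict: one request would stall
        busy.add(bank)
        mapping[port] = bank
    return mapping

# Four loads and four stores hitting eight distinct banks all proceed in
# parallel, including two loads to two different even banks, which a fixed
# even/odd design could not service together.
cycle = [("PL0", 0), ("PL1", 2), ("PL2", 5), ("PL3", 9),
         ("PS0", 1), ("PS1", 4), ("PS2", 8), ("PS3", 12)]
print(map_ports_to_banks(cycle))
```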

In a most preferred embodiment, a fill uses eight banks, each store uses one bank at a time, and each load uses one bank at a time. More specifically, in a preferred embodiment, a single fill line is distributed across eight banks. Thus, a portion of the fill line may be read from each of the eight banks across which it is distributed simultaneously. Since four store ports are implemented in a most preferred embodiment, four banks may be utilized for performing stores, and since four load ports are implemented in a most preferred embodiment, four banks may be utilized for performing loads simultaneously. Accordingly, in a most preferred embodiment, sixteen cache accesses may effectively be performed simultaneously (e.g., four stores, four loads, and eight banks may be used for a fill). Because alternative embodiments may be implemented having a greater or fewer number of banks and ports, such alternative embodiments may enable a greater or fewer number of simultaneous cache accesses. Utilizing dedicated ports for loads enables better scheduling of accesses within the cache, which thereby increases the bandwidth of data that can be returned to the "core." As used herein, the "core" of a chip is the particular execution unit (e.g., an integer execution unit or floating point execution unit) that issued the memory access request to the cache. Additionally, utilizing dedicated store ports ("write" ports) in the cache enables a higher bandwidth of write data to be written into the cache without interfering with the loads that may be occurring simultaneously.
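
A minimal sketch of the fill-line distribution described above: a 128-byte fill line is split into eight 16-byte pieces, one per bank, so different portions of the same line sit in different banks and can be accessed at the same time. The bank-numbering scheme and helper names are assumptions for illustration.

```python
# Distribute a 128-byte fill line across eight banks, 16 bytes per bank.
LINE_BYTES, BANKS_PER_LINE = 128, 8
CHUNK = LINE_BYTES // BANKS_PER_LINE          # 16 bytes per bank

def distribute_fill(line: bytes, first_bank: int = 0):
    """Return {bank_index: 16-byte chunk} for one fill line (assumed layout)."""
    assert len(line) == LINE_BYTES
    return {first_bank + i: line[i * CHUNK:(i + 1) * CHUNK]
            for i in range(BANKS_PER_LINE)}

chunks = distribute_fill(bytes(range(128)))
print(sorted(chunks))            # banks 0..7 each hold one piece of the line
print(len(chunks[3]))            # 16
```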

It should be understood that in prior art architectures a cache may be designed having multiple ports (e.g., four ports) coupled to the cache. However, at any given time, such multiple ports of the prior art designs can only be used for loads or only be used for stores. Thus, at most, such prior art cache designs allow for four stores and no loads to be performed at any given time, or four loads and no stores to be performed at any given time. A preferred embodiment of the present invention provides a cache structure that enables simultaneous reads and writes. More specifically, a preferred embodiment enables simultaneous reads and writes within the cache by providing multiple dedicated read ports and multiple dedicated write ports, and enabling any one of multiple cache banks to be mapped to any one of the multiple dedicated read ports and multiple dedicated write ports to satisfy a memory access request simultaneously with a cache bank being mapped to another one of the ports for satisfying another memory access request.

In a preferred embodiment, a line in the cache structure is distributed across multiple banks (e.g., eight banks) to enable multiple banks (e.g., four banks if there are four read ports) to be utilized for performing simultaneous reads on the same line. Thus, for example, memory access requests may be received on multiple read ports, with the memory access request on each read port requesting access to the same line of the cache structure. For example, it is common within the execution of software code for the code to hit the same line very often. In a preferred embodiment, the cache may satisfy multiple read requests for a single line of cache simultaneously by enabling multiple read ports to be mapped to the same line of cache simultaneously, thereby increasing the bandwidth of the cache and decreasing the latency involved in accessing the cache.

It should be appreciated that a technical advantage of one aspect of the present invention is that a cache structure having a high bandwidth is provided. That is, a cache structure is disclosed in which the cache bandwidth far exceeds that of cache constructions of the prior art. A further technical advantage of one aspect of the present invention is that a cache structure is disclosed which enables simultaneous read and write operations to be satisfied within the cache. Accordingly, because a greater number of simultaneous operations can occur, the number of stalls required for execution units is decreased. Still a further technical advantage of one aspect of the present invention is that a cache structure is provided which eliminates the need for a large store buffer. Prior art caches are typically implemented having a large store buffer; however, because a cache structure of a preferred embodiment provides a greater amount of store bandwidth, such a large store buffer may be eliminated from the cache design of a preferred embodiment.

In addition, a further technical advantage of one aspect of the present invention is that because stores have a different linked pipeline, a store that is bypassed does not conflict with a later load in the pipeline. For example, assume that a store is bypassed in one cycle. Further assume that it takes two additional cycles for the store to reach the cache. Now assume that a load instruction, which is younger than the received store instruction (i.e., was issued later in time than the store instruction), is received by the cache. The load and store instructions may be satisfied simultaneously by the cache if they do not have a bank conflict. That is, the instructions may be satisfied simultaneously if their physical address indexes (e.g., bits [7:4] of their physical addresses) do not conflict.
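
The paragraph above ties bank conflicts to bits [7:4] of the physical address. A hedged Python sketch of that check, assuming those four bits directly name one of sixteen banks (the function names and example addresses are illustrative), is:

```python
# Assumed bank-selection rule: physical address bits [7:4] pick one of
# sixteen banks. A bypassed store and a younger load can complete in the
# same cycle only if these bits differ.

def bank_index(phys_addr: int) -> int:
    return (phys_addr >> 4) & 0xF

def can_issue_together(store_addr: int, load_addr: int) -> bool:
    return bank_index(store_addr) != bank_index(load_addr)

print(can_issue_together(0x1240, 0x5640))   # both map to bank 4 -> False (conflict)
print(can_issue_together(0x1240, 0x5650))   # banks 4 and 5      -> True
```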

Yet a further technical advantage of one aspect of the present invention is that the use of dedicated read and write ports enables data to be sent to the cache and not written until a bank is available. For example, fill data may be sent to the cache in four chunks. In a preferred embodiment, the data is not written into the cache bank's data array until the last chunk arrives in the cache. In traditional cache architectures, the whole line is sent to the cache at one time. This prevents any other read or write operation from occurring simultaneously in traditional cache architectures. However, with the design of a preferred embodiment, multiple reads and writes may be occurring simultaneously. For instance, in a most preferred embodiment, four loads, four stores, and a fill may be occurring simultaneously. Thus, in a most preferred embodiment, the cache appears as though it is sixteen ported because a fill requires eight banks and each one of the loads and stores requires an additional bank.
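
As a sketch of the chunked-fill behavior described above, the following assumes a 128-byte line delivered in four 32-byte pieces and a simple buffer that releases the line only when the last piece arrives; the buffering policy, class name, and chunk sizes are assumptions, not the patent's mechanism.

```python
# Hypothetical fill buffer: fill data arrives in four chunks; nothing is
# committed to the banks until the final chunk has arrived, so loads and
# stores to other banks can proceed in the meantime.

class FillBuffer:
    CHUNKS_PER_LINE = 4            # assumed: 4 x 32 bytes = one 128-byte line

    def __init__(self):
        self.pending = {}          # line address -> list of received chunks

    def receive(self, line_addr: int, chunk: bytes):
        parts = self.pending.setdefault(line_addr, [])
        parts.append(chunk)
        if len(parts) == self.CHUNKS_PER_LINE:
            del self.pending[line_addr]
            return b"".join(parts)   # complete line, ready to write into 8 banks
        return None                  # still waiting; the banks stay free

buf = FillBuffer()
for i in range(4):
    line = buf.receive(0x4000, bytes([i]) * 32)
print(len(line))                     # 128 once the last chunk arrives
```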

The foregoing has outlined rather broadly the features and technical advantages of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention. It should be appreciated by those skilled in the art that the conception and specific embodiment disclosed may be readily utilized as a basis for modifying or designing other structures for carrying out the same purposes of the present invention. It should also be realized by those skilled in the art that such equivalent constructions do not depart from the spirit and scope of the invention as set forth in the appended claims.

BRIEF DESCRIPTION OF THE DRAWING For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawing, in which: FIGURE 1 shows an overview of a cache design for a preferred embodiment; and FIGURE 2 shows a preferred embodiment of a cache design in greater detail.

DETAILED DESCRIPTION Turning to FIG. 1, an overview of a preferred embodiment of a cache design 100 is shown. In a preferred embodiment, the cache 100 is partitioned into sixteen banks (or "ways"). As used herein, a "bank" or "way" is a segmentation of the cache memory. In a most preferred embodiment, the cache 100 is a 256 kilobyte (KB) cache, which is partitioned into sixteen banks each having 16 KB of data. Thus, in a preferred embodiment, the cache 100 is implemented with sixteen banks of 16 KB of data, which results in a total of 256 KB for the cache. In alternative embodiments, the cache 100 may be any size having any number of banks implemented therein, and any such implementation is intended to be within the scope of the present invention.

As shown in FIG. 1, the cache 100 is implemented with a first set of banks 10, which may comprise any number of banks 10_1, 10_2, ..., 10_N, and a second set of banks 20, which may comprise any number of banks 20_1, 20_2, ..., 20_N. In a most preferred embodiment, bank sets 10 and 20 each comprise eight banks. In alternative embodiments, any number of sets of banks, each set comprising any number of banks, may be implemented for the cache 100, and any such implementation is intended to be within the scope of the present invention. As further shown in FIG. 1, the cache 100 is implemented with a store/fill multiplexer ("MUX") 12 for the first set of banks 10, and a store/fill MUX 16 for the second set of banks 20. The cache 100 further comprises a crossover MUX 14. In a most preferred embodiment, crossover MUX 14 comprises four 16:1 MUXes, such that MUX 14 is capable of mapping any one of sixteen banks to any one of four read ports. However, in alternative embodiments, crossover MUX 14 may comprise any number of MUXes such that it is capable of mapping any number of banks to any number of ports, and any such implementation is intended to be within the scope of the present invention.

The data path for the level of cache shown in FIG. 1 (e.g., level L1) supplies each of the banks 10_1 through 10_N and 20_1 through 20_N with an independent address. In a most preferred embodiment, such independent address is in the form of a physical address index (e.g., bits [14:8] of a physical address) requested by an instruction, and a way select signal (e.g., bits [7:0] of a way select signal for an eight way cache) indicates the appropriate way for such requested physical address. A data bus (not shown) coupled to each bank is used for input/output (I/O) for the bank. In a most preferred embodiment, such data bus is a 128 bit data bus. However, in alternative embodiments, any size data bus may be implemented, and any such implementation is intended to be within the scope of the present invention. In a preferred embodiment, such a data bus is used both to write data into the bank to which it is coupled, as well as to read data from such bank. The individual wires of such a data bus may be referred to herein as global bit lines.
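
A hedged sketch of how a physical address might be decomposed for this cache: it uses the bank-select bits [7:4] discussed later in the description and, as assumptions, a 16-byte per-bank chunk and a per-bank set index in bits [14:8] consistent with a 256 KB, sixteen-bank, eight-way arrangement. The helper names and example address are illustrative only.

```python
# Assumed address decomposition for a 256 KB, 16-bank, 8-way cache with
# 128-byte lines distributed 16 bytes per bank.

def decompose(phys_addr: int):
    return {
        "byte_in_chunk": phys_addr & 0xF,          # bits [3:0], within a 16-byte bank chunk
        "bank":          (phys_addr >> 4) & 0xF,   # bits [7:4], one of 16 banks
        "index":         (phys_addr >> 8) & 0x7F,  # bits [14:8], assumed per-bank set index
    }

def way_select(way: int) -> int:
    """One-hot way-select signal for an eight-way cache (bits [7:0])."""
    assert 0 <= way < 8
    return 1 << way

print(decompose(0x3A5C))      # {'byte_in_chunk': 12, 'bank': 5, 'index': 58}
print(bin(way_select(3)))     # 0b1000
```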

Each bank 10_1 through 10_N and 20_1 through 20_N has only one I/O input to it for a read or write instruction. Thus, read and write instructions use the same I/O data input to a bank, in a preferred embodiment. In a preferred embodiment, when data is being read from a cache bank (e.g., bank 10_1), the global bit line for the bank is pulled down to a low voltage value by the cache's random access memory (RAM) cells, which causes the data to be received at a read port from the cache bank's data array via crossover MUX 14. When data is being written to a cache bank (e.g., bank 10_1), the global bit line for the bank is pulled down by the store/fill MUX 12, 16 for the cache bank. Thereafter, the word line is fired for the bank and the data is written from the global bit line into the cache bank's data array. Accordingly, each cache bank effectively has an independent I/O wire out of it.
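
The shared read/write data path per bank can be pictured behaviorally as below. This is a software analogy (a single bus value reused for reads and writes), with assumed class and method names; it is not a description of the actual bit-line circuitry.

```python
# Behavioral analogy of one cache bank with a single shared I/O bus:
# on a read the data array drives the bus toward the crossover MUX,
# on a write the store/fill MUX drives the bus into the data array.

class Bank:
    def __init__(self, entries=128, width=16):
        self.data_array = {i: bytes(width) for i in range(entries)}

    def read(self, index: int) -> bytes:
        bus = self.data_array[index]      # data array drives the shared bus
        return bus                        # crossover MUX forwards it to a load port

    def write(self, index: int, data: bytes):
        bus = data                        # store/fill MUX drives the shared bus
        self.data_array[index] = bus      # word line fires; array captures the bus

bank = Bank()
bank.write(7, b"\xAA" * 16)
print(bank.read(7)[:4])                   # b'\xaa\xaa\xaa\xaa'
```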

The store/fill MUX 12 is used to select between the store ports of the cache banks 10 and the fill data input to the store/fill MUX 12. In general, "fill data" is data that is written into the cache from other levels of the memory hierarchy (e.g., main memory). For example, data may be written into the cache from the disk drive of a computer system to allow for faster access of such data thereafter. Fill data is typically written to the cache via a "line," which in a preferred embodiment is 128 bytes (e.g., capable of writing 128 bytes of fill data), but may be any size in alternative embodiments. The store/fill MUX 16 is used to select between the store ports of the cache banks 20 and the fill data input to the store/fill MUX 16. Thus, in a preferred embodiment, each bank may only be performing one store or one fill at any given time. A bank may not be performing both a store and a fill at the same time. Additionally, a bank may only be performing either a store/fill (i.e., write) or a load (i.e., read) at any given time. Thus, each bank may only satisfy either a read or a write operation at any given time.

In a most preferred embodiment, the cache design 100 comprises four store ports (write ports), four load ports (read ports), and one fill port. In a preferred embodiment, the fill port is distributed across eight banks. As discussed above, fill data is preferably written to the cache via a 128 byte line. In a preferred embodiment, the data fill line is distributed across eight banks, such that the 128 bytes are written into the eight banks with each bank receiving sixteen bytes of the line. It should be recognized that in a preferred embodiment, portions of an individual data fill line can be accessed in parallel for satisfying multiple loads (reads). Accordingly, a preferred embodiment provides a pseudo-sixteen ported SRAM enabling simultaneous reads and writes. That is, in a preferred embodiment, it appears as though sixteen ports are implemented because sixteen banks can be accessed simultaneously. More specifically, the four store ports may each access one bank (for a total of four banks being accessed by the store ports), the four load ports may each access one bank (for a total of four banks being accessed by the load ports), and the fill port may access eight banks, resulting in a total of sixteen banks that may be accessed simultaneously. However, in alternative embodiments, any number of store ports, load ports, and fill ports may be implemented, and any such implementation is intended to be within the scope of the present invention. It should also be understood that in alternative embodiments a fill port may be distributed across any number of banks, and any such implementation is intended to be within the scope of the present invention. It should be understood that by implementing a greater number of banks within cache 100, a greater number of read and write operations may be satisfied by the cache 100 simultaneously.
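
To illustrate the pseudo-sixteen-ported behavior described above, the sketch below checks one cycle's worth of accesses (four loads, four stores, and one fill spanning eight banks) for bank disjointness. The scheduling check and the particular bank assignments are assumptions for illustration, not the patent's arbitration logic.

```python
# One cycle of accesses for the assumed configuration: 4 load ports,
# 4 store ports, and one fill occupying 8 banks. All sixteen banks can
# be busy at once as long as no bank is claimed twice.

def schedule(loads, stores, fill_banks, num_banks=16):
    claimed = set()
    for bank in list(loads) + list(stores) + list(fill_banks):
        if bank in claimed or not 0 <= bank < num_banks:
            return False               # conflict: some access must stall
        claimed.add(bank)
    return True

loads  = [8, 9, 10, 11]                # four loads, one bank each
stores = [12, 13, 14, 15]              # four stores, one bank each
fill   = range(0, 8)                   # fill line spread across eight banks
print(schedule(loads, stores, fill))   # True: sixteen banks busy simultaneously
```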

As shown in FIG. 1, a crossover MUX 14 is implemented within a preferred embodiment. Crossover MUX 14 enables the cache to map data from any one of the banks of sets 10 and 20 (i.e., any one of banks 10_1 through 10_N and 20_1 through 20_N) to any of the cache ports. As discussed above, a total of nine ports, including four load ports, four store ports, and one fill port, are implemented in a most preferred embodiment. Thus, for a most preferred embodiment, crossover MUX 14 enables the cache to map any one of the sixteen banks to any one of the four load ports. Similarly, crossover MUX 14 enables any one of the four store ports to be mapped to any one of the banks implemented for the cache. Accordingly, crossover MUX 14 is capable of mapping any one of the cache banks to any of the cache ports based upon whether an address requested by a port is contained within a cache bank. That is, crossover MUX 14 is capable of mapping any one of the ports, which contains a memory access request to a memory address, to a bank containing the requested memory address.

As a result, a preferred embodiment provides a cache design that does not have a dedicated port to any one bank. Accordingly, the resulting cache design provides a much greater bandwidth than the cache designs of the prior art, in which a port is dedicated to a single bank. Moreover, because any cache bank may be mapped to any port, the number of address conflicts in accessing the cache is reduced in a preferred embodiment, thereby reducing the number of stalls required for an execution unit and decreasing the overall latency in accessing the cache of a system.

Turning to FIG. 2, a preferred embodiment is shown in greater detail. In a preferred embodiment, a bit line for a bank (e.g., bank 10_1) is shared for both read and write operations. As shown in FIG. 2, in a preferred embodiment, data store/fill circuitry 12 comprises a MUX 30 that is used to select between the fill data and the store ports (PS0-PS3). A write source signal is input to the MUX 30 to control its operation by selecting which source to write out onto a particular bank's bit line (e.g., bank 10_1's bit line). If a write instruction is being performed, the data is driven out onto the bit line from the data store/fill MUX 30, and then the word lines and way select signals are fired to actually write that data into the cache bank's data array. In that case (i.e., when a write instruction is being performed), the data through the crossover MUX 14 is not used. That is, if a write instruction is being performed, the bit line going into the crossover MUX 14 is ignored.

On the other hand, if a read instruction is being performed, the data store/fill MUX 30 is turned off and the data array of the particular bank (e.g., of bank 10_1) is allowed to drive the bit line. When the data array of bank 10_1 drives the bit line, it inputs the data into the crossover MUX 14. As discussed above in conjunction with FIG. 1, crossover MUX 14 selects to which load (or "read") port (PL0-PL3) a particular bank (e.g., bank 10_1) is mapped. In a preferred embodiment, the control signals for MUX 14 are bits [7:4] of the physical address for a memory access request. Bits [7:4] of the physical address are known early in a preferred embodiment because they are the same in the virtual address. Thus, when the virtual address for a memory access request is received, bits [7:4] of the physical address are known. Therefore, a preferred embodiment enables the control circuitry to be set up very early, thereby reducing the latency involved in accessing the cache memory. It should be understood that the present invention is not intended to be limited solely to the use of bits [7:4] of the physical address for such control of MUX 14, but rather any known bits of the physical address may be utilized in alternative embodiments.

Thus, if a read instruction is being performed, MUX 30 of store/fill circuitry 12 is turned off, and crossover MUX 14 maps the appropriate bank to the appropriate read port PL0-PL3. In addition, if a write instruction is being performed, the bit line from the cache bank containing the desired address for the write instruction to the crossover MUX 14 is ignored, and MUX 30 allows the store data from the appropriate store port PS0-PS3 to be written into the cache bank's data array. Accordingly, implementing crossover MUX 14 within cache 100 enables any of the cache banks to be mapped to any of the ports. Actually, in a preferred embodiment, any one of the cache banks may be mapped to drive more than one of the read ports. In fact, in a preferred embodiment, any one of the cache banks may be mapped to drive all of the read ports (e.g., bank 10_1 may be mapped to drive all of ports P0-P3). As illustrated in FIG. 2, crossover MUX 14 enables a cache design that does not have a dedicated port for any particular bank.

It should be recognized that the architecture shown in FIG. 2 may be duplicated for any number of banks, and in a most preferred embodiment it is duplicated for seven additional banks, resulting in a total of eight banks (e.g., the total number of banks within a set of banks 10). For simplicity, FIG. 2 illustrates crossover MUX 14 as only having eight banks coupled to it, which would be duplicated in a most preferred embodiment (indicated in FIG. 2 by "2X") to allow sixteen banks to be coupled to MUX 14. Thus, in a most preferred embodiment, sixteen banks are coupled to crossover MUX 14. Of course, in alternative embodiments the cache design may be implemented for any number of banks, as well as for any number of ports, and any such implementation is intended to be within the scope of the present invention.

It should be understood that the cache of a preferred embodiment may be implemented within a multilevel cache, such as the multilevel cache disclosed in U.S. Patent Application Serial No. [Attorney Docket No. 10971421] entitled "METHOD AND SYSTEM FOR EARLY TAG ACCESSES FOR LOWER-LEVEL CACHES IN PARALLEL WITH FIRST-LEVEL CACHE," the disclosure of which is hereby incorporated herein by reference. Furthermore, a preferred embodiment may be implemented within a high performance cache as disclosed in U.S. Patent Application Serial Number [Attorney Docket No. 10971230] entitled "SYSTEM AND METHOD UTILIZING SPECULATIVE CACHE ACCESS FOR IMPROVED PERFORMANCE," the disclosure of which is hereby incorporated herein by reference. It should also be understood that a cache structure of the present invention may be implemented within any type of computer system having a processor, including but not limited to a personal computer (PC), laptop computer, and personal data assistant (e.g., a Palmtop PC).

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims. Moreover, the scope of the present application is not intended to be limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification. As one of ordinary skill in the art will readily appreciate from the disclosure of the present invention, processes, machines, manufacture, compositions of matter, means, methods, or steps, presently existing or later to be developed that perform substantially the same function or achieve substantially the same result as the corresponding embodiments described herein may be utilized according to the present invention. Accordingly, the appended claims are intended to include within their scope such processes, machines, manufacture, compositions of matter, means, methods, or steps.

Claims (10)

WHAT IS CLAIMED IS:
  1. A method of accessing cache, wherein said cache comprises multiple banks and multiple ports, said method comprising the steps of: receiving a memory access request in said cache 100; selecting one bank of multiple banks 10, 20 of said cache, wherein said selected bank contains a memory address required to satisfy said memory access request; and upon receiving said memory access request, mapping said selected bank to at least one of said multiple ports to satisfy said memory access request, wherein any one of said multiple banks can be mapped to any one of said multiple ports.
  2. A computer system comprising: at least one processor for executing instructions; and a cache structure 100 accessible by said at least one processor to satisfy a memory access request therefrom, wherein said cache structure comprises multiple cache banks 10, 20 and multiple ports, and wherein said cache is configured such that any one of said multiple cache banks can be mapped to any one of said multiple ports to satisfy a memory access request.
  3. A cache structure that is accessible by at least one computer processor to satisfy memory access requests for instructions being executed by said at least one computer processor, said cache structure comprising: means for receiving a memory access request from at least one processor; multiple cache banks 10, 20; multiple ports; and means 14 operable upon receiving a memory access request for mapping any one of said multiple cache banks to any one of said multiple ports to satisfy a received memory access request.
  4. The method of claim 1 or the computer system of claim 2 wherein said multiple ports include multiple read ports and multiple write ports.
  5. The method of claim 4 or the computer system of claim 4 wherein said multiple write ports include at least one data fill port 12, 16.
  6. The method of claim 1 further comprising the steps of: receiving a second memory access request for said cache; selecting a second bank of said multiple banks of said cache, wherein said second bank contains a memory address required to satisfy said second memory access request; and in parallel with said last-mentioned mapping step, mapping said second bank to at least one other of said multiple ports to satisfy said second memory access request.
  7. The computer system of claim 2 wherein said cache structure further comprises: a crossover MUX 14 that enables data from any one of said multiple cache banks to be mapped to any one of said multiple ports.
  8. The computer system of claim 7 wherein said cache structure is arranged to receive multiple memory access requests and map any one of said multiple cache banks to any one of said multiple ports to satisfy multiple ones of said received multiple memory access requests in parallel.
  9. The cache structure of claim 3 wherein said means for mapping comprises a crossover MUX 14.
  10. The cache structure of claim 3 wherein said multiple ports comprise multiple read ports and multiple write ports, and wherein said means for receiving a memory access request comprises a cache data path.
GB0102442A 2000-02-18 2001-01-31 Method and system for providing a high bandwidth cache that enables simultaneous reads and writes within the cache Withdrawn GB0102442D0 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US50724100 2000-02-18 2000-02-18

Publications (2)

Publication Number Publication Date
GB0102442D0 (en) 2001-03-14
GB2365582A 2002-02-20

Family

ID=24017821

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0102442A Withdrawn GB0102442D0 (en) 2000-02-18 2001-01-31 Method and system for providing a high bandwidth cache that enables simultaneous reads and writes within the cache

Country Status (1)

Country Link
GB (1) GB0102442D0 (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4905141A (en) * 1988-10-25 1990-02-27 International Business Machines Corporation Partitioned cache memory with partition look-aside table (PLAT) for early partition assignment identification
EP0468453A2 (en) * 1990-07-27 1992-01-29 Kabushiki Kaisha Toshiba Multiport cache memory
WO1998014951A1 (en) * 1996-09-30 1998-04-09 Sun Microsystems, Inc. Computer caching methods and apparatus
GB2345770A (en) * 1999-01-15 2000-07-19 Advanced Risc Mach Ltd Data processing memory system with dual-port first-level memory

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004099974A1 (en) * 2003-05-12 2004-11-18 International Business Machines Corporation Simultaneous access of the same line in cache storage

Also Published As

Publication number Publication date Type
GB0102442D0 (en) 2001-03-14 grant

Similar Documents

Publication Publication Date Title
US5410669A (en) Data processor having a cache memory capable of being used as a linear ram bank
US6665749B1 (en) Bus protocol for efficiently transferring vector data
US5319763A (en) Data processor with concurrent static and dynamic masking of operand information and method therefor
US6108745A (en) Fast and compact address bit routing scheme that supports various DRAM bank sizes and multiple interleaving schemes
US5895487A (en) Integrated processing and L2 DRAM cache
US5924117A (en) Multi-ported and interleaved cache memory supporting multiple simultaneous accesses thereto
US5412787A (en) Two-level TLB having the second level TLB implemented in cache tag RAMs
US5226147A (en) Semiconductor memory device for simple cache system
US6112265A (en) System for issuing a command to a memory having a reorder module for priority commands and an arbiter tracking address of recently issued command
US6260114B1 (en) Computer cache memory windowing
US6167486A (en) Parallel access virtual channel memory system with cacheable channels
US4823259A (en) High speed buffer store arrangement for quick wide transfer of data
US6272597B1 (en) Dual-ported, pipelined, two level cache system
US6389514B1 (en) Method and computer system for speculatively closing pages in memory
US4616310A (en) Communicating random access memory
US6021471A (en) Multiple level cache control system with address and data pipelines
US5530941A (en) System and method for prefetching data from a main computer memory into a cache memory
US5826052A (en) Method and apparatus for concurrent access to multiple physical caches
US20080046666A1 (en) Systems and methods for program directed memory access patterns
US6128244A (en) Method and apparatus for accessing one of a plurality of memory units within an electronic memory device
US6513107B1 (en) Vector transfer system generating address error exception when vector to be transferred does not start and end on same memory page
US6813701B1 (en) Method and apparatus for transferring vector data between memory and a register file
US5276850A (en) Information processing apparatus with cache memory and a processor which generates a data block address and a plurality of data subblock addresses simultaneously
US6553486B1 (en) Context switching for vector transfer unit
US5561781A (en) Port swapping for improved virtual SRAM performance and processing of concurrent processor access requests

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)