WO1995022791A2 - Method and apparatus for single cycle cache access on double word boundary cross - Google Patents
- Publication number
- WO1995022791A2 WO1995022791A2 PCT/US1995/001779 US9501779W WO9522791A2 WO 1995022791 A2 WO1995022791 A2 WO 1995022791A2 US 9501779 W US9501779 W US 9501779W WO 9522791 A2 WO9522791 A2 WO 9522791A2
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- cache
- output
- significant byte
- input
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3802—Instruction prefetching
- G06F9/3816—Instruction alignment, e.g. cache line crossing
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/0802—Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
- G06F12/0877—Cache access modes
- G06F12/0886—Variable-length word access
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/1027—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB]
- G06F12/1045—Address translation using associative or pseudo-associative address translation means, e.g. translation look-aside buffer [TLB] associated with a data cache
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F12/00—Accessing, addressing or allocating within memory systems or architectures
- G06F12/02—Addressing or allocation; Relocation
- G06F12/08—Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
- G06F12/10—Address translation
- G06F12/109—Address translation for multiple virtual address spaces, e.g. segmentation
Definitions
- the invention relates generally to cache memory and, more particularly, to cache memory that allows single cycle cache access to data that crosses a double word boundary.
- the invention further relates to a method of using such a cache memory.
- Prior cache memory accesses have been limited to boundaries defined in main memory.
- the prior art cache memory systems include high speed memory for holding the most recently used information for future reuse by the CPU. Thus, the CPU views the cache as another memory module.
- the CPU has an address bus and a data bus.
- the address bus accesses the desired main memory location and the data bus communicates the data to the CPU. Since the CPU treats cache as another memory module, the address bus and data bus also access the cache.
- prior art systems use slower memory for the bulk of a computer's main memory and faster memory in the cache.
- the cache helps move data between the main memory and the CPU with the least delay. Without the cache, the CPU may sit idle while it waits for retrieval of the requested data. With the cache, however, a computer can keep the data most likely to be requested readily available. The data in the faster cache can be provided to the CPU with a minimum of delay.
- a cache memory system further includes a Data File, a Tag File, a Valid Flag File, and a Least- Recently-Used (LRU) File.
- the Tag File holds indications of which portions of main memory are stored in the cache.
- the Valid Flag File indicates which portions of the cache are valid.
- the Least-Recently-Used (LRU) file defines which portions of the cache to discard.
- the Data File provides a memory bank for storage of data and instructions fetched from main memory. Memory in conventional computers is divided into 8-bit quantities (bytes), 16-bit quantities (words), and 32-bit quantities (double words). In most 32-bit computers, main memory is organized into double word (32-bit) boundaries that correspond to the width of the data bus. To increase processor throughput, prior art cache memory systems are organized to allow access to more memory per cycle than is obtainable on the main memory data path. For example, each address line of the cache memory can hold multiple double words. Thus the cache memory is wider than the main memory data bus.
- data samples reside in main memory within a double word boundary.
- a data sample may also cross a double word boundary.
- data may cross a double word boundary in main memory such that a portion of the data sample resides in a first double word and a portion resides in an adjacent double word.
- two read cycles are required: one cycle to read from the data within the first double word boundary; and one cycle to read the data in the adjacent double word boundary.
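The two-cycle condition can be sketched in Python (an illustrative model, not circuitry from the patent; byte addressing and a 4-byte double word are assumed):

```python
def crosses_double_word(addr, size):
    """True when an access of `size` bytes starting at byte address
    `addr` spans two adjacent 4-byte (double word) regions."""
    first_dword = addr // 4
    last_dword = (addr + size - 1) // 4
    return first_dword != last_dword
```

For example, a 4-byte read at address 2 touches double words 0 and 1, so a conventional main memory needs two read cycles to retrieve it.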
- prior art cache memory systems can access multiple double words, the prior art systems still operate on double word boundaries in order to maintain consistency with main memory. Therefore, these prior art cache memory systems suffer from several drawbacks. For example, two read cycles are necessary to retrieve data that crosses a double word boundary. Also, data that crosses a double word boundary cannot be stored into the cache in a single cycle. Furthermore, prior art cache memories typically do not provide alignment circuitry to allow single cycle accesses to cache memory while maintaining consistency with main memory. In addition to data, some cache memory systems also allow storage of instructions. To enhance performance, processors often decode instruction codes in a pipeline fashion. In exemplary prior art, an instruction unit in the CPU processes instruction codes. To enhance instruction decoding and processing, prior art systems employ a wide instruction bus to move instructions between the cache and the instruction unit. A wide instruction bus can take advantage of a simultaneous cache access to multiple double words.
- prior art cache systems fail to differentiate between instruction and data cache accesses that cross double word boundaries. When data crosses a double word boundary, cache circuitry needs to provide proper alignment, whereas instructions that cross double word boundaries do not need alignment. Furthermore, prior art cache memories do not provide circuitry to allow direct access for the wider instruction bus, and an aligned access for a separate data bus.
- a cache memory alignment system for single cycle access to data that crosses double word boundaries.
- the subject invention comprises a microprocessor having a central processing unit (CPU) and an on-chip cache that enhances the microprocessor processing speed.
- the cache access circuitry allows single cycle access to larger instruction segments and to data that crosses double word boundaries.
- the cache memory uses a plurality of random access memories (RAMs) to provide a memory bank 16 bytes wide and 512 lines high.
- the reliance on a plurality of RAMs provides the further benefit of accessing a byte, word, or double word for data, and multiple double words for instructions. Barrel shifters and multiplexers are used to eliminate unwanted read and write delays.
- the 16 RAMs share the same address lines.
- the memory accessed by the address lines is known as a cache data line.
- nine address lines access 512 cache data lines.
- the Tag File, Valid Flag File and LRU File use the physical address lines on the address bus to enable a corresponding cache data line.
- the access circuitry allows single cycle access of eight bytes of instruction information from a cache data line.
- the access circuitry allows single cycle access to data that crosses a double word boundary within a cache data line. The advantage of this design is higher total performance. All cache data accesses take one cycle. All cache instruction accesses provide up to 16 bytes of data in two cycles.
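With 512 lines of 16 bytes each, a byte address splits naturally into a line index and a byte offset. The exact bit assignment below is a hypothetical decode consistent with the nine line-address lines and the four low-order physical address bits described in this document:

```python
def decode_cache_address(addr):
    # low 4 bits select one of 16 bytes within a cache data line;
    # the next 9 bits select one of 512 cache data lines
    byte_in_line = addr & 0xF
    line = (addr >> 4) & 0x1FF
    return line, byte_in_line
```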
- a byte, word, or double word is fetched from main memory.
- a select line on the input multiplexer is asserted to allow the external data bus to load the input barrel shifter. If data is sent from the CPU, a select line on the input multiplexer is asserted to allow the internal bus to load the input barrel shifter.
- the input data may include multiple bytes where a least significant byte is stored in a non-least significant byte location of a first double word in the addressed cache data line, and where the input data's most significant byte is stored in a non-most significant byte location of a second adjacent double word in the addressed cache data line.
- the data output path receives output data from an enabled cache data line and aligns the output data in a single access cycle.
- the output data may include multiple bytes with a least significant byte stored in a non-least significant byte location of a first double word of the enabled cache data line and include a most significant byte stored in a non-most significant byte location of a second adjacent double word in the enabled cache data line.
- the output circuitry selectively shifts the output data to position the least significant byte to the least significant byte position within the multiple byte output.
- Yet another aspect of the invention includes an instruction output path coupled to said RAM data outputs that selects and outputs a plurality of instruction bytes from a plurality of double words of an addressed cache data line.
- FIG. 1 is a block diagram of a microprocessor system comprising a central processing unit (CPU), a cache, and an external main memory.
- Fig. 4 is a table of cache memory locations illustrating the RAMs organized into four double words.
- Fig. 5 is a block diagram of the access circuitry.
- Fig. 6 is a schematic of the input barrel shifter connected to the RAMs.
- Fig. 7 is a truth table of input barrel shifter lines BARREL_IN1 and BARREL_IN0.
- FIGs. 8a and 8b show a truth table of alignment register values, size values, and bytes in a cache data line.
- Fig. 11 is a truth table of output barrel shifter lines BARREL_OUT1 and BARREL_OUT0.
- Fig. 14 is a timing diagram of a data read cycle in the cache memory of the present invention.
- Fig. 16 is a block diagram of the output circuitry and a data sample W, X, Y, and Z that crosses a double word boundary.
- FIG. 1 illustrates a microprocessor having an on-chip cache.
- the microprocessor comprises a central processing unit (CPU) 100, and a cache 102.
- the on-chip cache 102 connects to a main memory 104 via an external data bus 106 and an address bus 108.
- the CPU 100 accesses the cache via the address bus 108 and the internal bus 110.
- the CPU accesses the cache memory via a local data bus 112, and a local instruction bus 114.
- FIG. 2 illustrates a block diagram of the preferred embodiment of the present invention.
- a cache memory, or Data File 102 stores data and instructions retrieved from main memory.
- the Data File 102 includes access circuitry 116, an input barrel shifter 120, input multiplexers 115, 117, 118, 119, memory 122, an instruction register 126, an instruction multiplexer 124, data output multiplexers 130, 132, 134, 136, and an output barrel shifter 138.
- the memory 122 organizes a plurality of RAMs into a memory module. As shown in FIG. 3, each RAM 140 is provided with a plurality of address lines 142, and a set of eight data lines (one byte) 144. In the preferred embodiment, nine address lines access 512 memory locations. A write enable (WE) 146 is provided to strobe data from the set of data lines 144 into the RAM 140. The data is strobed into the RAM 140 at a location determined by the access circuitry (not shown). Thus, the plurality of address lines access 512 locations in the RAM 140, each location holds eight bits (one byte). As shown by a block diagram in FIG. 4, the Data File includes 16 RAMs 140.
- the cache memory is a two dimensional matrix, 512 rows (lines) high and 16 columns (bytes) wide.
- the organization of the 16 RAMs allows the cache to access 16 bytes of data in a single cycle.
- the 16 RAMs share the same address lines 142.
- the 16 bytes of memory that correspond to a particular address are known as a cache data line 148. Since the address lines access 512 locations, the cache has 512 cache data lines 148.
- the 16 bytes of each cache data line 148 are numbered from 0 to 15 (where zero is the least significant byte and 15 is the most significant byte).
- the cache further includes a Tag File, a Valid Flag File, and a Least-Recently-Used (LRU) File (not shown).
- the Tag File, Valid Flag File, and LRU File are well known in the prior art.
- the Tag File holds indications of which portions of main memory are stored in the cache.
- the Valid Flag File indicates which portions of the cache are valid.
- the LRU file defines which portions of the cache to discard.
- the Tag File contains four sets of 128x21 RAMs and 21-bit comparators.
- the Tag File stores bits 31:10 of the physical addresses of the data stored in the cache.
- the Valid Flag File has four 128x1 RAMs for storing the valid bits that determine if valid data exists at a particular location in the cache.
- the LRU File (Least Recently Used) includes three 128x1 RAMs. Once the cache memory is loaded, the LRU defines what locations to discard. No further discussion of the Tag File, Valid Flag File, or LRU File is made herein as one who is skilled in the relevant technology will understand their function.
- each cache data line is further partitioned into four double words.
- Each cache double word corresponds to a double word in main memory.
- each row (cache data line) has 16 bytes, the 16 bytes are further organized into four large columns (double words).
- each cache data line 148 are numbered from 0 to 15 (where zero is the least significant byte and 15 is the most significant byte).
- the first double word occupies bytes 0-3, the second double word occupies bytes 4-7, the third double word occupies bytes 8-11 and the fourth double word occupies bytes 12-15 of each cache data line 148.
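This partitioning reduces to a simple mapping (a sketch; the "lane" number anticipates the multiplexer wiring described later, where byte n is served by multiplexer n mod 4):

```python
def partition(byte_index):
    # byte 0 is the least significant byte of the cache data line;
    # bytes 0-3 form double word 0, bytes 4-7 double word 1, and so on
    double_word = byte_index // 4
    lane = byte_index % 4   # position within its double word
    return double_word, lane
```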
- Referring to FIG. 1, when the CPU 100 initiates a fetch from main memory 104, it also initiates a cache access. If the requested data is not in the cache 102 (a miss), then a fetch from main memory 104 is generated on the external data bus 106. The fetch from main memory 104 retrieves four consecutive double words. The four double words (16 bytes) from main memory are copied into the selected cache data line. The CPU generates the address lines that determine which 16 consecutive bytes to retrieve from main memory.
- the access circuitry 116 controls the input barrel shifter 120, the input multiplexers 115, 117, 118, 119, memory (RAMs) 122, the instruction register 126, the instruction multiplexer 124, the data output multiplexers 130, 132, 134, 136 and the output barrel shifter 138.
- the access circuitry includes an alignment register 150, a size register 152, and control logic 154.
- the alignment register 150 receives physical address lines 3-0.
- the size register 152 identifies how many bytes to access in the selected cache data line 148. Since it is possible to access a byte, word, or double word, the alignment register 150 identifies the lowest order byte, and the size register 152 identifies any additional bytes. The control logic 154, the size register 152, and the alignment register 150 determine which write enables 146 of the cache data line 148 are asserted.
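A sketch of how the alignment and size registers could gate the 16 write enables (illustrative only; sizes of 1, 2, and 4 bytes correspond to byte, word, and double word accesses):

```python
def write_enable_mask(alignment, size):
    """Return a 16-entry list; entry b is True when byte b of the
    addressed cache data line should latch input data."""
    return [alignment <= b < alignment + size for b in range(16)]
```

For instance, write_enable_mask(8, 4) asserts the write enables for bytes 8 through 11, matching the double word example of FIG. 8b.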
- the input barrel shifter 120, input multiplexers 115, 117, 118, 119, and access circuitry 116 load the cache memory (RAMs) 122.
- the input barrel shifter 120 connects to the internal bus 110.
- the input multiplexers 115, 117, 118, and 119 connect to the external data bus 106 and the outputs of the input barrel shifter 120.
- the internal bus 110 and external data bus 106 are 32-bits wide and carry both data and instructions.
- the input barrel shifter 120 transfers the contents of the internal data bus 110 to the input multiplexers 115, 117, 118, 119.
- the input multiplexers 115, 117, 118, 119 are two-to-one multiplexers that select data from the external data bus 106 or the input barrel shifter 120.
- the outputs of input multiplexers 115, 1 17, 118, 119 connect to the cache memory 122.
- the input barrel shifter 120 holds a double word (4 bytes) and can shift each byte left with the most significant byte wrapping around to the least significant byte location.
- the wrap around shift left capability of the barrel shifter 120 allows alignment of data before storage into the enabled cache data line 148.
- the shift left moves each byte in the barrel shifter 120 left one byte.
- the most significant byte wraps around to the least significant byte.
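The wrap-around left shift amounts to a rotate, which can be modelled as follows (a sketch; index 0 holds the least significant byte):

```python
def barrel_rotate_left(dword, n):
    """Rotate a 4-byte list left by n byte positions; the most
    significant byte (index 3) wraps to the least significant slot."""
    n %= 4
    return dword[-n:] + dword[:-n] if n else list(dword)
```

Rotating ['Z', 'Y', 'X', 'W'] (Z least significant) two bytes left yields ['X', 'W', 'Z', 'Y'], i.e. Y, Z, W, X read from most to least significant, the ordering used in the FIG. 12 example.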
- Each byte of the input barrel shifter 120 connects to one of the input multiplexers 115, 117, 118, 119.
- the letters W, X, Y, and Z represent the byte ordering of the input barrel shifter 120 that corresponds to bytes in the cache data line 148.
- the lowest order byte of the barrel shifter (Z) connects to MUX4 119.
- the highest order byte of the barrel shifter (W) connects to MUX7 115. Accordingly, the middle two bytes (X and Y) connect to MUX6 117, and MUX5 118.
- MUX4 119 connects to bytes 0, 4, 8, and 12 in the Cache Data Line.
- the output of MUX5 118 connects to bytes 1, 5, 9, and 13.
- the output of MUX6 117 connects to bytes 2, 6, 10, and 14.
- the output of MUX7 115 connects to bytes 3, 7, 11, and 15.
- the input barrel shifter 120 can align bytes for storage into the cache data line 148. To properly align the bytes, the input barrel shifter shifts data left. Two control lines, BARREL_IN1 and BARREL_IN0, determine how far to shift data left.
- FIG. 7 illustrates a truth table for the control lines BARREL_IN1 and BARREL_IN0. It should be understood that because the four byte outputs of the barrel shifter are replicated for each double word, it is not necessary to shift left more than three bytes. Referring to FIG. 5, to write a byte, the control logic 154 asserts the write enable identified by the alignment register 150. To write a word, the control logic 154 asserts the write enable identified in the alignment register 150 and the next higher order byte.
- FIGs. 8a and 8b show a table of the alignment register values (hexadecimal), the byte, word or double word size, and the corresponding write enable asserted by the access circuitry.
- the control logic 154 asserts the appropriate write enables 146 identified by the alignment register 150. For example, to write a word at byte location zero in a cache data line 148 (see line 2 in FIG. 8a), the control logic 154 asserts the write enable identified in the alignment register 150.
- the alignment register 150 identifies byte zero in the cache data line 148.
- the size register 152 identifies that the data sample is two bytes (a word), thus the control logic 154 also asserts the write enable 146 to byte one in the cache data line 148.
- to write a double word, the control logic 154 asserts the write enable identified by the alignment register 150 and the write enables 146 of the next three higher order bytes. For example, to write a double word at byte location eight in a cache data line 148 (see first line in FIG. 8b), the alignment register 150 identifies byte eight, and the size register 152 identifies a double word. The control logic 154 asserts the write enables 146 to bytes eight, nine, ten, and eleven in the cache data line 148. Turning now to the output circuitry as shown in FIG. 2, once data or instructions are stored into the cache data line 148, the CPU can access the cache data line 148 in a single cycle. For this purpose, the cache includes a separate output path for instructions and data.
- Physical address line 3 drives the select line of the instruction multiplexer 124. If address line 3 is not asserted, the instruction multiplexer 124 selects the first eight bytes (bytes 0-7) of the instruction register 126. If address line 3 is asserted, the instruction multiplexer 124 selects the second eight bytes (bytes 8-15) of the instruction register 126. In order to retrieve data, the data output path includes the four data output multiplexers 130, 132, 134, 136.
- Each multiplexer has two select lines that control which byte to select. Select lines MUX3_SEL1 and MUX3_SEL0 control MUX3 130. Select lines MUX2_SEL1 and MUX2_SEL0 control MUX2 132. Select lines MUX1_SEL1 and MUX1_SEL0 control MUX1 134. Select lines MUX0_SEL1 and MUX0_SEL0 control MUX0 136. Table 1 illustrates a truth table for the select lines of each data multiplexer.
- MUX3 selects from bytes 15, 11, 7, 3; MUX2 selects from bytes 14, 10, 6, 2; MUX1 selects from bytes 13, 9, 5, 1; MUX0 selects from bytes 12, 8, 4, 0.
- each multiplexer is hard-wired to particular bytes in the cache data line 148, the bytes selected by the data output multiplexers 130, 132, 134, 136 may need alignment.
- the output barrel shifter 138 shifts data right.
- the output barrel shifter 138 wraps the least significant byte around to the most significant byte location.
- Two control lines, BARREL_OUT1 and BARREL_OUT0, determine how far to shift data right.
- the access circuitry 116 generates control lines BARREL_OUT1 and BARREL_OUT0.
- FIG. 11 shows a truth table for the control lines BARREL_OUT1 and BARREL_OUT0.
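The right shift mirrors the input path (a sketch; index 0 is the least significant byte, which wraps around to the top slot):

```python
def barrel_rotate_right(dword, n):
    """Rotate a 4-byte list right by n byte positions; the least
    significant byte (index 0) wraps to the most significant slot."""
    n %= 4
    return dword[n:] + dword[:n]
```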
- the cache allows accesses of bytes, words and double words.
- the cache stores a copy of the data retrieved from main memory.
- the access circuitry decodes the address and size of the data sample and enables the appropriate cache data line 148.
- the access circuitry 116 also directs the input multiplexers 115, 117, 118, 119 to select the external data bus 106.
- the byte ordering on the external data bus 106 will correspond to the bytes addressed in main memory. Since data retrieved from the external data bus 106 aligns within double word boundaries, no alignment is performed. If data is sent from the CPU 100, the access circuitry 116 directs the input multiplexers 115,
- To retrieve such a data sample from main memory, two cycles are required. Referring to the timing diagrams in FIGs. 13 and 14, and the block diagram in FIG. 11, the CPU 100 first issues an access command. If the data is not in the cache, the CPU 100 reads from the main memory 104. The access circuitry 116 directs the input multiplexers 115, 117, 118, 119 to select the external data bus 106. As shown by the timing diagram in FIG. 13, during one cycle, bytes W and X are transferred to bytes 5 and 4 in the cache data line 148. In the other cycle, bytes Y and Z are transferred to bytes 3 and 2 in the cache data line. Two additional read cycles retrieve the third and fourth double words from main memory to load the third and fourth double words in the cache data line.
- Since each byte of the input barrel shifter 120 is hard-wired via the input multiplexers 115, 117, 118, 119 to specific byte locations in the cache data line 148, the input barrel shifter 120 must align the bytes before storage into the cache data line 148. To properly align the data sample, the following will occur. As illustrated in FIG. 12, the access circuitry asserts BARREL_IN1 to command the barrel shifter to rotate W, X, Y, and Z two bytes left, so that Y, Z, W, and X appear in the barrel shifter in that order. Furthermore, the access circuitry 116 asserts the write enables to bytes 5, 4, 3, and 2 in the cache data line 148. FIG. 8a shows that bytes 5, 4, 3, and 2 are enabled when the alignment register holds a two and the size register holds a double word. Input multiplexer 119 passes the first byte (lowest order byte) in the input barrel shifter 120 to the cache data line 148.
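Putting the pieces together, the write path of the FIG. 12 example can be simulated as follows (an illustrative model, not the patent's circuitry; it assumes the barrel shifter output is replicated across all four double words and the write enables select which bytes latch):

```python
def cache_store(line, data, alignment):
    """Write `data` (least significant byte first) into a 16-byte cache
    data line starting at byte `alignment`, in one modelled cycle."""
    size = len(data)                        # 1, 2, or 4 bytes
    barrel = (list(data) + [None] * 4)[:4]  # barrel shifter is 4 bytes wide
    rot = alignment % 4                     # BARREL_IN shift count
    if rot:
        barrel = barrel[-rot:] + barrel[:-rot]  # wrap-around left rotate
    for b in range(16):
        if alignment <= b < alignment + size:   # write enable asserted
            line[b] = barrel[b % 4]             # mux lane b % 4 feeds byte b
    return line
```

Storing the sample ['Z', 'Y', 'X', 'W'] (Z least significant) at alignment 2 places the four bytes into bytes 2 through 5 of the line, matching the example above.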
- the access circuitry 116 allows access of bytes, words, and double words within the cache data line.
- a data sample may cross a cache boundary.
- a data sample W, X, Y, and Z resides in cache such that bytes Y and Z reside in the fourth double word of a first cache data line 148, and bytes W and X reside in the first double word of a second cache data line 148.
- the access circuitry 116 will access bytes Y and Z in the first cache data line 148 in one cycle, and access W and X in the second cache data line 148 in a second cycle.
- the cache will access the data in two cycles.
- the timing diagram in FIG. 15 shows a fetch of instructions in the present invention.
- the CPU issues an access command and retrieves four double words of instructions from main memory in four cycles.
- the four double words are also loaded into an enabled cache data line 148.
- the cache determines that the instructions reside in cache memory (a hit).
- the access circuitry 116 enables the appropriate cache data line 148.
- the 16 bytes in the enabled cache data line 148 are loaded into the instruction register 126.
- the instruction multiplexer 124 selects the first and second double words of the instruction register 126 and outputs to the local instruction bus 114.
- the instruction multiplexer 124 selects the third and fourth double words from the instruction register 126 and outputs to the local instruction bus. Therefore, the cache of the present invention can fetch eight instruction bytes in a single cycle and 16 instruction bytes in two cycles.
- the instruction multiplexer selects the upper eight bytes of the cache data line.
- the instruction multiplexer outputs the eight upper bytes of the instruction register K, L, M, N, O, P, Q, and R onto the local instruction bus.
- the CPU can retrieve 16 bytes of instruction information in two fetch cycles.
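The instruction-side selection reduces to one address bit (a sketch; physical address line 3 picks which half of the 16-byte instruction register is driven onto the instruction bus):

```python
def fetch_eight_instruction_bytes(instruction_register, address):
    """Return the 8-byte half of a 16-byte instruction register
    selected by physical address line 3."""
    a3 = (address >> 3) & 1     # physical address line 3
    return instruction_register[8:16] if a3 else instruction_register[0:8]
```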
- the cache of the preferred embodiment can retrieve a byte, word, or double word.
- the access circuitry 116 decodes the address and size of the data sample and selects a particular cache data line 148.
- the access circuitry 116 also asserts the select lines to each data output multiplexer 130, 132, 134, 136 in order to retrieve four consecutive bytes in the cache data line 148.
- the data output multiplexers 130, 132, 134, 136 connect to the input of the output barrel shifter 138. If alignment is necessary, the access circuitry 116 commands the output barrel shifter 138 to shift right. After shifting, the output barrel shifter 138 outputs the data to the local data bus 112.
- the access circuitry can access data sizes of a byte, word, or double word in a single cycle. Table 2 illustrates an example of four bytes of data aligned within a double word boundary.
- the four bytes of data are designated as W, X, Y, and Z.
- W is the highest order byte and Z is the lowest order byte.
- Bytes W, X, Y, and Z exist in bytes 3, 2, 1, and 0 of the cache data line. Conceptually, bytes W, X, Y, and Z reside in the first double word of the cache data line.
- the CPU initiates a read data cycle. The cache determines that W, X, Y, and Z reside in the cache RAMs (a hit).
- the access circuitry enables the appropriate cache data line 148.
- the access circuitry 116 asserts the select lines of the data multiplexers.
- the data output multiplexers, MUX3 130, MUX2 132, MUX1 134, and MUX0 136 select data bytes 3, 2, 1, and 0 in the cache data line as shown in Table 3.
- the data output multiplexers 130, 132, 134, 136 load the output barrel shifter 138 with W, X, Y, and Z. Since W, X, Y, and Z align within a double word boundary, no shifting is necessary. Thus the access circuitry does not assert BARREL_OUT1 or BARREL_OUT0, and the output barrel shifter 138 loads W, X, Y, and Z onto the local data bus 112.
- the data selected by the data output multiplexers MUX3 130, MUX2 132, MUX1 134, and MUX0 136 may need alignment before it is sent to the CPU 100.
- four bytes that cross a double word boundary are designated as W, X, Y, and Z as shown in FIG. 16.
- W is the highest order byte and Z is the lowest order byte.
- Bytes W and X reside in the cache data line 148 at bytes 13 and 12.
- Bytes Y and Z reside in bytes 11 and 10.
- Bytes W and X reside in the fourth double word, while bytes Y and Z reside in the third double word.
- the cache determines that W, X, Y, and Z reside in the cache RAMs.
- the access circuitry 116 enables the appropriate cache data line.
- FIG. 16 shows that the access circuitry also asserts the select lines on the data output multiplexers 130, 132, 134, 136 to select data bytes 13, 12, 11, and 10. Because the data output multiplexers 130, 132, 134, 136 are hard-wired to specific bytes in the cache data line 148, the access circuitry 116 directs MUX3 130 to select byte 11, MUX2 132 to select byte 10, MUX1 134 to select byte 13, and MUX0 136 to select byte 12. As a result, the byte ordering differs from the original sample.
- the data multiplexers load Y, Z, W, and X into the output barrel shifter 138.
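The read side of the FIG. 16 example can be modelled the same way (a sketch; output mux lane i is hard-wired to bytes i, i+4, i+8, i+12 of the line, the rotate count is assumed to be the low two alignment bits, and the access must not cross the 16-byte line):

```python
def cache_load(line, alignment, size=4):
    """Read `size` bytes starting at byte `alignment` from a 16-byte
    cache data line and return them least significant byte first."""
    lanes = [None] * 4
    for b in range(alignment, alignment + size):
        lanes[b % 4] = line[b]          # output mux lane b % 4 selects byte b
    rot = alignment % 4                 # BARREL_OUT shift count
    lanes = lanes[rot:] + lanes[:rot]   # rotate right; LSB wraps to top
    return lanes[:size]
```

With W, X, Y, Z stored at bytes 13, 12, 11, 10, the lanes first hold Y, Z, W, X (most to least significant); the right rotate by two restores W, X, Y, Z in a single modelled cycle.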
- Table 4 illustrates another example where four bytes of data cross a double word boundary. MUX3 130 selects byte 11, MUX2 132 selects byte 10, MUX1 134 selects byte 9, and MUX0 136 selects byte 12, as shown in Table 4.
- the present apparatus and method provides speed and flexibility by providing alignment, storage, and retrieval of data that crosses double word boundaries in a single access cycle.
- the separate instruction output path allows access to an entire cache data line 148 in two cycles.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Memory System Of A Hierarchy Structure (AREA)
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US19338394A | 1994-02-08 | 1994-02-08 | |
US08/193,383 | 1994-02-08 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO1995022791A2 true WO1995022791A2 (en) | 1995-08-24 |
WO1995022791A3 WO1995022791A3 (en) | 1995-09-21 |
Family
ID=22713424
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US1995/001779 WO1995022791A2 (en) | 1994-02-08 | 1995-02-08 | Method and apparatus for single cycle cache access on double word boundary cross |
Country Status (2)
Country | Link |
---|---|
- TW (1) | TW255024B
- WO (1) | WO1995022791A2
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000002127A3 (de) * | 1998-07-03 | 2000-06-29 | Infineon Technologies Ag | Datenspeichervorrichtung |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS54122043A (en) * | 1978-03-15 | 1979-09-21 | Toshiba Corp | Electronic computer |
JPS5530727A (en) * | 1978-08-22 | 1980-03-04 | Nec Corp | Information processor |
US4814976C1 (en) * | 1986-12-23 | 2002-06-04 | Mips Tech Inc | Risc computer with unaligned reference handling and method for the same |
US5386531A (en) * | 1991-05-15 | 1995-01-31 | International Business Machines Corporation | Computer system accelerator for multi-word cross-boundary storage access |
- 1994
  - 1994-02-15 TW TW083101184A patent/TW255024B/zh not_active IP Right Cessation
- 1995
  - 1995-02-08 WO PCT/US1995/001779 patent/WO1995022791A2/en active Application Filing
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2000002127A3 (de) * | 1998-07-03 | 2000-06-29 | Infineon Technologies Ag | Datenspeichervorrichtung |
US6952762B1 (en) | 1998-07-03 | 2005-10-04 | Infineon Technologies Ag | Data storage device with overlapped buffering scheme |
Also Published As
Publication number | Publication date |
---|---|
TW255024B | 1995-08-21 |
WO1995022791A3 (en) | 1995-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US5091851A (en) | Fast multiple-word accesses from a multi-way set-associative cache memory | |
US5796972A (en) | Method and apparatus for performing microcode paging during instruction execution in an instruction processor | |
US5706466A (en) | Von Neumann system with harvard processor and instruction buffer | |
EP0192202B1 (en) | Memory system including simplified high-speed data cache | |
US5826052A (en) | Method and apparatus for concurrent access to multiple physical caches | |
US5813031A (en) | Caching tag for a large scale cache computer memory system | |
US6138209A (en) | Data processing system and multi-way set associative cache utilizing class predict data structure and method thereof | |
US6275902B1 (en) | Data processor with variable types of cache memories and a controller for selecting a cache memory to be access | |
US5070502A (en) | Defect tolerant set associative cache | |
EP0407119B1 (en) | Apparatus and method for reading, writing and refreshing memory with direct virtual or physical access | |
CA1181866A (en) | Multiword memory data storage and addressing technique and apparatus | |
JP4006436B2 (ja) | Multi-level cache having overlapping congruence groups of associativity sets in different cache levels | |
KR100341948B1 (ko) | Data processor having controlled burst memory access capability and method thereof | |
JP2002509312A (ja) | Digital signal processor having a data alignment buffer for performing unaligned data accesses | |
EP0706133A2 (en) | Method and system for concurrent access in a data cache array utilizing multiple match line selection paths | |
US5721957A (en) | Method and system for storing data in cache and retrieving data from cache in a selected one of multiple data formats | |
WO1981001894A1 (en) | Cache memory in which the data block size is variable | |
US6157980A (en) | Cache directory addressing scheme for variable cache sizes | |
US5805855A (en) | Data cache array having multiple content addressable fields per cache line | |
US6473835B2 (en) | Partition of on-chip memory buffer for cache | |
JPH04233050A (ja) | キャッシュメモリ交換プロトコル | |
US5761714A (en) | Single-cycle multi-accessible interleaved cache | |
US5574883A (en) | Single chip processing unit providing immediate availability of frequently used microcode instruction words | |
US6442667B1 (en) | Selectively powering X Y organized memory banks | |
US5070444A (en) | Storage system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AK | Designated states | Kind code of ref document: A2. Designated state(s): CN JP KR |
| AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
| AK | Designated states | Kind code of ref document: A3. Designated state(s): CN JP KR |
| AL | Designated countries for regional patents | Kind code of ref document: A3. Designated state(s): AT BE CH DE DK ES FR GB GR IE IT LU MC NL PT SE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
122 | Ep: pct application non-entry in european phase |