US20020032827A1 - Structure and method for providing multiple externally accessible on-chip caches in a microprocessor - Google Patents

Structure and method for providing multiple externally accessible on-chip caches in a microprocessor

Publication number
US20020032827A1
Authority
US
United States
Prior art keywords
pins
data
cache
bus
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US08/818,060
Other versions
US6446164B1 (en
Inventor
De H. Nguyen
Raymond M. Chu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to US08/818,060
Publication of US20020032827A1
Application granted
Publication of US6446164B1
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches

Definitions

  • This invention relates to integrated circuits, and in particular, relates to the design of microprocessors.
  • cache memories have been successfully used to achieve high performance in many computer systems.
  • cache memories of microprocessor-based systems are provided off-chip using high performance memory components. This is primarily because the amount of silicon area necessary to provide an on-chip cache memory of reasonable performance would have been impractical, since increasing the size of an integrated circuit to accommodate a cache memory will adversely impact the yield of the integrated circuit in a given manufacturing process.
  • the central processing unit looks into the cache memory system for a copy of the memory word. If the memory word is found in the cache memory, a cache “hit” is said to have occurred, and the main memory is not accessed. Thus, a figure of merit which can be used to measure the effectiveness of the cache memory is the “hit” ratio.
  • the hit ratio is the percentage of total memory references in which the desired datum is found in the cache memory without accessing the main memory. When the desired datum is not found in the cache memory, a “cache miss” is said to have occurred.
  • This portion of the address space is said to be “uncached” or “uncacheable”.
  • the addresses assigned to input/output (I/O) devices are almost always uncached. Both a cache miss and an uncacheable memory reference result in an access to the main memory.
  • ICE in-circuit emulator
  • the ICE typically monitors the signals on the microprocessor's pins.
  • the ICE causes alternative instructions to be executed for such purposes as reading or altering the internal states of the CPU.
  • Such alternative instructions can be preloaded or excluded from the cache memory.
  • the ability to load or exclude instructions from the cache memory from a source external to the CPU can be very useful in many applications. Such ability is not known in the prior art.
  • the ICE can easily isolate the cache memory and perform diagnostic tests on each cell in the cache memory, using such techniques as exhaustive standard memory test algorithms, independently of the operation of the CPU.
  • the transactions between the cache memory and the CPU can be monitored by the ICE on the off-chip bus between the cache memory and the CPU.
  • no difficulty is created in testing or using an off-chip cache.
  • the cache memory is implemented on-chip, the transactions between the cache and the CPU occur on an on-chip bus, which cannot be probed from the pins of the integrated circuit.
  • debugging operations using an ICE in a system with an on-chip cache system can be very restricted.
  • a structure and a method provide read and write accesses to a microprocessor's internal cache.
  • an external data bus transmits to an internal data bus an address, cache tags and data in accordance with a clock signal provided externally.
  • the external data bus transmits an address and receives from the internal data bus data and tag, also in accordance with the clock signal provided externally.
  • the external data bus is time-multiplexed to transmit the address, the cache tags and data in two clock periods of an externally provided clock signal.
  • the external data bus is time-multiplexed to transmit to the internal data bus an address in the first clock period of the external clock signal, and to receive cache tags and data in the next two successive clock periods of the externally provided clock signal.
  • “reserved” pins are used to specify a cache access mode. Control signals for the cache access are provided via pins which are used during functional operation to receive external interrupt signals.
  • the present invention allows the user of the microprocessor to exhaustively test the on-chip cache using standard memory test algorithms.
  • the present invention also allows preloading the on-chip cache under control of signals external to the microprocessor. Such preloading operations can be useful in certain applications.
  • the present invention provides a facility for external testing equipment to monitor or intervene in internal operations of the microprocessor.
  • FIG. 1 a shows a computer system 100 having a processor 101 with an on-chip instruction cache system 102 and a main memory system 150 external to the processor 101 , in accordance with the present invention.
  • FIG. 1 b is a block diagram of the processor 101 of FIG. 1 a.
  • FIG. 2 is a block diagram showing the addressing scheme used in instruction cache 102 a of the cache system 102 of FIGS. 1 a and 1 b.
  • FIG. 3 is a block diagram in further detail than FIG. 2 of the interface between CPU core 103 and the instruction and data caches 102a and 102b, including the control signals ICLK, DCLK, I̅W̅R̅, D̅W̅R̅, I̅R̅D̅ and D̅R̅D̅.
  • FIG. 4 summarizes some control signals generated from signals received on the microprocessor's pins for controlling reading and writing the instruction and data caches 102 a and 102 b , in accordance with the present invention.
  • FIG. 5 shows data flow between one pin of processor 101 to one bit each in the DATA[31:0] bus and one of ADRLO[12:0] and TAG[31:11] busses, in accordance with the present invention.
  • FIG. 6 shows a timing diagram for a read cycle and a write cycle involving either the instruction cache memory 102 a , or the data cache memory 102 b , in accordance with the present invention.
  • FIG. 1 a shows, as an example, a computer system 100 having a processor 101 with an on-chip cache system 102 and a main memory system 150 external to the processor, in accordance with the present invention.
  • external or read and write memory (“main memory”) system 150 which is interfaced to the processor 101 over a bus 153 , comprises a dynamic random access memory (DRAM) controller 151 , a main memory 152 implemented by banks 152 a and 152 b of DRAMs and a bus interface 154 .
  • DRAM dynamic random access memory
  • the address space of computer system 100 is also used to access other memory-mapped devices such as I/O controller 141 , I/O devices 142 and 143 , and programmable read-only memory (PROM) 144 .
  • I/O system 140 the memory-mapped devices other than the main memory 150 defined above are collectively referred to as the I/O system 140 , even though read-only memories, such as PROM 144 , are often not considered part of the I/O system.
  • I/O system 140 is also interfaced to the bus 153 .
  • Bus 153 comprises address/data bus 153 a and control bus 153 b .
  • Memory data and memory addresses are time-multiplexed on the 32-bit address/data bus 153 a .
  • Other device configurations using the memory address space are also possible within the scope of the present invention.
  • processor 101 includes two co-processors 103 a and 103 b , controlled by a master pipeline control unit 103 c .
  • Coprocessor 103 a is also referred to as the integer CPU, and includes 32 32-bit general registers 103 a - 1 , an ALU 103 a - 2 , a shifter 103 a - 3 , a multiplication and division unit 103 a - 4 , an address adder 103 a - 5 , and program counter control unit 103 a - 6 .
  • Processor 103 a executes the instruction set known as the MIPS-I Instruction Set Architecture (ISA).
  • ISA MIPS-I Instruction Set Architecture
  • Coprocessor 103 b also known as the System Control Coprocessor, comprises exception/control registers 103 b - 1 , a memory management registers unit 103 b - 2 and a translation look-aside buffer (TLB) 103 b - 3 .
  • the TLB unit 103 b - 3 provides a mapping between virtual and physical addresses.
  • the TLB unit 103 b - 3 has a 64-entry look-up table to provide mapping between virtual and physical addresses efficiently.
  • the TLB unit 103 b - 3 is provided at the user's option.
  • the TLB unit 103 b - 3 can be disabled.
  • the above units of the coprocessors 103 a and 103 b can be implemented by conventional or any suitable designs known in the art.
  • the coprocessor units 103 a and 103 b , and the pipeline control unit 103 c are collectively referred to as the CPU core 103 .
  • the cache system 102 of processor 101 comprises two cache memories 102 a and 102 b .
  • Cache 102 a is an instruction cache.
  • the capacity of cache 102a can be 4K or 8K bytes, with block fill and line sizes of four memory words each.
  • Cache 102 b is a data cache, and has a selectable block refill size of one or four memory words, a line size of one memory word, and a capacity of 2K bytes.
  • Other cache, block refill and line sizes can be provided within the scope of the present invention. Both the capacities of cache 102 a and cache 102 b , and their respective block refill and line sizes, are matters of design choice. In addition, it is also not necessary to provide separate data and instruction caches.
  • a joint data and instruction cache is also within the scope of the present invention.
  • the TLB unit 103 b - 3 receives from the CPU core 103 on bus 109 a virtual address and provides to either cache 102 a or cache 102 b on bus 107 the corresponding physical memory address.
  • Although cache accessing using virtual addresses is also possible, by using physical addressing in the instruction and data caches, the present embodiment simplifies software requirements and avoids the cache flushing operations necessary during a context switch in a virtually addressed cache.
  • the cache addressing scheme of the present embodiment is discussed below in conjunction with FIG. 2. Other cache addressing schemes are also possible within the scope of the present invention.
  • Bus interface unit (BIU) 106 interfaces processor 101 with the main memory 150 when a read or write access to main memory is required.
  • BIU 106 comprises a 4-deep write buffer 106 - 4 , a 4-deep read buffer 106 - 3 , a DMA arbiter 106 - 2 and BIU control unit 106 - 1 .
  • BIU control unit 106 - 1 provides all control signals on bus 153 b , which comprises buses 153 b - 1 to 153 b - 3 necessary to interface with the main memory 150 and the I/O system 140 .
  • Both addresses and data are multiplexed on the address/data bus 153a, and the control signals are provided on the R̅d̅/W̅r̅ control bus 153b-1, the system clock signal 153b-2, and the DMA control bus 153b-3.
  • FIG. 2 is a block diagram showing the addressing scheme used in the instruction cache 102 a of the cache system 102 , which is shown in FIGS. 1 a and 1 b.
  • the higher order 20 bits of a virtual address (generated by CPU core 103, as shown in FIG. 1b), which are represented by block 202, are provided to the cache addressing mechanism represented by block 201.
  • the remaining 10 bits of the memory word address are common between the virtual and the physical addresses. (The lowest two address bits are byte addresses, which are not used in cache addressing.) These common bits are directly provided to index into the cache memory 102 a , represented by blocks 204 and 205 .
  • Block 205 represents the data portion of the cache line, which comprises four 32-bit memory words in this embodiment.
  • Block 204 represents the “tag” portion (TAG[32:11]) of the cache data word; this tag portion contains both a “valid” TAGV bit and the higher order 20 bits of the memory word addresses of the data words stored in the cache line. (Since the addresses of memory words within the cache line are contiguous, the higher order 20 bits are common to all of the memory words in the cache line).
  • the valid bit TAGV indicates that the cache word contains valid data. Invalid data may exist if the data in the cache does not contain a current memory word. This condition may arise, for example, after a reset period.
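  • As an illustration of the addressing scheme of FIG. 2, the following Python sketch splits a 32-bit address into the three fields described above: the higher order 20 bits, the 10 common word-address bits, and the two byte-address bits. It assumes the 4K-byte instruction cache configuration; the function name `split_address` and the field names are hypothetical and do not appear in the patent.

```python
from typing import Tuple


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a 32-bit byte address into (upper 20 bits, word index, byte offset)."""
    addr &= 0xFFFFFFFF                 # keep 32 bits
    upper20 = addr >> 12               # bits [31:12]: translated, then compared with the tag
    word_index = (addr >> 2) & 0x3FF   # bits [11:2]: common to virtual and physical addresses
    byte_offset = addr & 0x3           # bits [1:0]: byte address, unused in cache addressing
    return upper20, word_index, byte_offset


upper20, word_index, byte_offset = split_address(0x12345678)
print(hex(upper20), hex(word_index), byte_offset)
```

  • Because the 10 index bits are identical in the virtual and physical addresses, a model like this can index the cache without waiting for the translation of the upper 20 bits.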
  • Each virtual address is associated with a particular process identified by a unique “process id” PID, which is represented by block 203 .
  • Block 201 represents the virtual address to the physical address translation, which is performed using the TLB unit 103 b - 3 when the TLB is present. (FIG. 1 b .)
  • A TLB miss occurs if a mapping between the virtual address and the corresponding physical address cannot be found in the 64 entries of the TLB unit 103b-3, if the PID stored in the TLB unit 103b-3 does not match the PID of the virtual address, or if the valid bit in the data word is not set.
  • Block 207 represents the determination of whether a TLB miss has occurred.
  • the TLB miss condition raises an exception condition, which is handled by CPU core 103. If a virtual address to physical address mapping is found, the higher order 20 bits of the physical memory word address are compared (block 206) with the memory address portion of the tag. The valid bit is examined to ensure the data portion of the cache line contains valid data. If the comparison (block 206) indicates a cache hit, the selected 32-bit word in the cache line is the desired data.
  • BIU 106 is invoked and CPU core 103 stalls until BIU 106 indicates that the requested data is available.
  • a cache miss can also be generated when the memory access is to an “uncacheable” portion of memory.
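  • The hit determination described above (tag comparison plus valid bit, with uncacheable references always missing) can be sketched in Python as follows. This is a minimal behavioural model, not the patent's circuit: it assumes a direct-mapped 4K-byte instruction cache of 256 four-word lines and an address that has already been translated to a physical address. The names `CacheLine`, `lookup`, `NUM_LINES` and `WORDS_PER_LINE` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

WORDS_PER_LINE = 4   # line size of the instruction cache
NUM_LINES = 256      # assumption: 4K bytes / (4 words x 4 bytes per word)


@dataclass
class CacheLine:
    valid: bool = False   # models the TAGV bit
    tag: int = 0          # higher order 20 bits of the physical address
    words: List[int] = field(default_factory=lambda: [0] * WORDS_PER_LINE)


def lookup(lines: List[CacheLine], paddr: int, cacheable: bool = True) -> Optional[int]:
    """Return the cached word on a hit, or None on a miss."""
    if not cacheable:
        return None                                   # uncacheable references go to main memory
    line = lines[(paddr >> 4) & (NUM_LINES - 1)]      # bits [11:4] select the line
    if line.valid and line.tag == ((paddr >> 12) & 0xFFFFF):
        return line.words[(paddr >> 2) & 0x3]         # bits [3:2] select the word in the line
    return None
```

  • A `None` result stands for the case in which the bus interface unit would be invoked to fetch the word from main memory.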
  • When BIU 106 receives a datum from main memory, the CPU core 103 executes either a “refill”, a “fix-up”, or a “stream” cycle.
  • In a refill cycle, an instruction datum received (in the read buffer 106-3) is brought into the cache 102a.
  • In a fix-up cycle, the CPU core 103 transitions from a refill cycle to execute the instruction brought out of the read buffer 106-3.
  • In a stream cycle, the CPU core 103 simultaneously refills cache memory 102a and executes the instruction brought out of the read buffer 106-3. For uncacheable references, the CPU core 103 executes a fix-up cycle to bring out the fetched memory word from the read buffer 106-3, but the uncacheable memory word is not brought into the cache memory 102a. Otherwise, the CPU core 103 executes refill cycles until the miss address is reached. At that time, a fix-up cycle is executed. Subsequent cycles are stream cycles until the end of the 4-memory-word block is reached and normal run operation resumes. If sequential execution is interrupted, e.g. by a successful branch condition, refill cycles are executed to refill the cache before execution is resumed at the branch address.
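  • One reading of the cycle sequence described above can be sketched in Python as follows. The sketch only lists the cycle types executed for a miss at a given word position within the four-word block; it does not model a branch interrupting sequential execution. The function name `miss_cycles` and the string labels are hypothetical.

```python
from typing import List


def miss_cycles(miss_word: int, block_words: int = 4, cacheable: bool = True) -> List[str]:
    """List the cycle types executed after an instruction fetch miss.

    miss_word is the position (0-based) of the missed word within its block.
    """
    if not cacheable:
        return ["fixup"]                        # word is executed but never cached
    if not 0 <= miss_word < block_words:
        raise ValueError("miss_word must lie inside the block")
    return (["refill"] * miss_word              # words before the miss address
            + ["fixup"]                         # the missed word itself
            + ["stream"] * (block_words - miss_word - 1))  # rest of the block


print(miss_cycles(2))
```

  • A miss on the first word of a block therefore involves no refill cycles at all under this reading: one fix-up cycle followed by three stream cycles.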
  • the operation of the data cache 102b is similar to that of instruction cache 102a, except that only one fix-up cycle is used after one or four refill cycles, depending upon the refill block size selected. Because the size of the data cache is 2K bytes, a 21-bit “tag” is required. Hence, because of the different sizes of the instruction and data caches, the data cache's tag is 1 bit longer than the instruction cache's tag. In order to have the data and instruction caches share a common cache addressing scheme, the instruction cache routes one of its lower order address bits back as a tag bit, so as to appear as if the tag portion of the instruction cache is 21-bit. If the refill block size selected for the data cache is four memory words, as will be apparent below, the present invention provides the same benefit in the data cache as in the instruction cache.
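  • The tag widths quoted above follow from simple arithmetic, sketched here in Python under the assumptions of a 32-bit physical address and a direct-mapped cache, so that every address bit not needed to locate a byte inside the cache belongs to the tag. The function name `tag_bits` is hypothetical.

```python
def tag_bits(cache_bytes: int, address_bits: int = 32) -> int:
    """Number of address bits that must be stored in the tag."""
    if cache_bytes <= 0 or cache_bytes & (cache_bytes - 1):
        raise ValueError("cache size must be a power of two")
    index_and_offset_bits = cache_bytes.bit_length() - 1   # log2 of the capacity
    return address_bits - index_and_offset_bits


print(tag_bits(2 * 1024), tag_bits(4 * 1024))
```

  • For the 2K-byte data cache this gives 32 - 11 = 21 bits, and for a 4K-byte instruction cache 32 - 12 = 20 bits, which is the one-bit difference the shared addressing scheme has to absorb.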
  • FIG. 3 is a more detailed block diagram of the interface between CPU core 103 and the instruction cache memory 102 a and the data cache memory 102 b .
  • CPU core 103 provides the lower order bits of the physical cache addresses on bus 107-1 (ADRLO[12:0]) to address either of the cache memories 102a and 102b, and receives the tag and data contents of the addressed cache memory on 22-bit bus 108-1 (TAG[31:11] and TAGV, hereinafter “TAG BUS”) and 32-bit bus 108-2 (“DATA[31:0]”), respectively.
  • CPU core 103 provides to instruction cache 102a the clock signal ICLK, the read signal I̅R̅d̅, and the write signal I̅W̅r̅ for reading and writing cache 102a.
  • An analogous set of signals DCLK, D̅R̅d̅ and D̅W̅r̅ is provided to the data cache memory 102b.
  • Instruction cache 102a is divided into two banks 102a-1 and 102a-2. The tags of the cache entries are stored in bank 102a-1, and the data words are stored in bank 102a-2.
  • Because instruction cache 102a has a line size of four, there are four times as many entries in the data bank 102a-2 as in the tag bank 102a-1.
  • Data cache 102 b is similarly divided into tag and cache banks 102 b - 1 and 102 b - 2 respectively.
  • Processor 101 is a microprocessor of 84 pins. Other than the power and ground signals, processor 101 receives or provides: a 32-bit address or data bus ADBUS[31:0], lower address bus ADR[3:2], address latch enable signal ALE, data input enable signal D̅a̅t̅a̅E̅n̅, burst transfer or write near signal B̅u̅r̅s̅t̅/W̅r̅N̅e̅a̅r̅, read signal R̅d̅, write signal W̅r̅, acknowledge signal A̅C̅K̅, read buffer clock enable signal R̅d̅C̅E̅n̅, bus error signal B̅u̅s̅E̅r̅r̅o̅r̅, diagnostic signals Diag[1:0], DMA bus request signal B̅u̅s̅R̅e̅q̅, DMA bus grant signal B̅u̅s̅G̅n̅t̅, branch condition port BrCond[3:0], interrupt signals ⁇ overscore (Int
  • the pins receiving reserved signals RSVD[4:0] are used to place processor 101 into the “cache memory access” mode. This is accomplished when bit pattern ‘011’ is detected on the reserved pins RSVD[4:2].
  • Reserved pins RSVD[4:0] are provided for general testing purposes, such as testing the cache memories 102a and 102b as provided by the present invention. To avoid accidentally placing processor 101 into a testing mode, reserved pins RSVD[4:0] are each provided with a weak pull-down device. Consequently, since the user of processor 101 will normally leave reserved pins RSVD[4:0] floating, each of the reserved pins RSVD[4:0] will settle at ground voltage.
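  • The mode selection described above can be sketched in Python as a simple pattern match. The sketch assumes the five reserved pins are sampled into an integer whose bit i holds RSVD[i], and that the bit pattern ‘011’ is read with RSVD[4] as its leftmost bit; the names `in_cache_access_mode` and `CACHE_ACCESS_PATTERN` are hypothetical.

```python
CACHE_ACCESS_PATTERN = 0b011   # RSVD[4]=0, RSVD[3]=1, RSVD[2]=1


def in_cache_access_mode(rsvd: int) -> bool:
    """True when RSVD[4:2] carries the cache memory access pattern."""
    return ((rsvd >> 2) & 0b111) == CACHE_ACCESS_PATTERN


# Floating pins settle at ground through the weak pull-downs, i.e. read as 0.
print(in_cache_access_mode(0b00000), in_cache_access_mode(0b01100))
```

  • Since an all-zero sample never matches the pattern, leaving the pins unconnected keeps such a model in functional mode, which mirrors the purpose of the pull-down devices.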
  • the CPU core 103 stalls to yield control of the data busses DATA[31:0] (108-2), ADRLO[12:0] (107-1), TAG BUS (108-1) and the leads for the cache control signals ICLK, DCLK, I̅W̅r̅, I̅R̅d̅, D̅W̅r̅ and D̅R̅d̅ to the external testing device desiring to access the cache memory.
  • the signals on tag and data buses TAG BUS (108-1) and DATA[31:0] and the control signals ICLK, DCLK, I̅R̅d̅, D̅R̅d̅, I̅W̅r̅ and D̅W̅r̅ are provided externally.
  • the pins (“I̅N̅T̅[5:0] pins”) normally receiving interrupt signals I̅N̅T̅[5:0], and the reserved pin RSVD[1] are used to provide these control signals from the external testing device.
  • the I̅N̅T̅[0] pin provides a clock signal CA_CLK
  • the I̅N̅T̅[1] pin provides a read signal C̅A̅_̅R̅d̅
  • the I̅N̅T̅[2] pin provides a write signal C̅A̅_̅W̅r̅.
  • the signal (“I/D̅”) on reserved pin RSVD[1] indicates whether the signals on the I̅N̅T̅[2:0] pins are directed to data cache 102b (RSVD[1] at logic low) or the instruction cache 102a (RSVD[1] at logic high).
  • the control signals ICLK, DCLK, I̅R̅d̅, D̅R̅d̅, I̅W̅r̅, and D̅W̅r̅ are generated internally.
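  • The steering of the externally supplied clock, read and write signals to one of the two caches can be sketched in Python as follows. Booleans here mean “asserted”, not a voltage level, and the sketch assumes that the control signals of the cache that is not selected are simply held deasserted, which the text does not state. The names `CacheControls`, `IDLE` and `steer_controls` are hypothetical.

```python
from typing import NamedTuple, Tuple


class CacheControls(NamedTuple):
    clk: bool   # ICLK or DCLK
    rd: bool    # read strobe asserted
    wr: bool    # write strobe asserted


IDLE = CacheControls(False, False, False)   # assumption: unselected cache stays idle


def steer_controls(ca_clk: bool, ca_rd: bool, ca_wr: bool,
                   i_not_d: bool) -> Tuple[CacheControls, CacheControls]:
    """Return (instruction cache controls, data cache controls).

    i_not_d models RSVD[1]: True selects the instruction cache, False the data cache.
    """
    active = CacheControls(ca_clk, ca_rd, ca_wr)
    return (active, IDLE) if i_not_d else (IDLE, active)
```

  • Calling `steer_controls(True, False, True, i_not_d=False)` in this model directs a clocked write to the data cache and leaves the instruction cache untouched.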
  • the pins ADBUS[31:0] and ADR[3:2], which are to be used for reading or writing the cache memories 102 a and 102 b must be time-multiplexed.
  • data flowing to and from the data bus DATA[31:0]( 108 - 2 ), and the data flowing to and from the TAG BUS ( 108 - 1 ) must occur at different phases of the CA_CLK.
  • the tag and data phases of the clock are indicated by the logic state of the signal (“T/D̅”) on the I̅N̅T̅[5] pin.
  • In order to provide time-multiplexing of ADBUS[31:0], control signals must be generated according to (i) whether a read cycle or a write cycle is desired, and (ii) which one of the TAG BUS 108-1, the ADRLO[12:0] bus 107-1, and the DATA[31:0] bus 108-2 is to exchange data with ADBUS[31:0].
  • a set of control signals TEST[4:2, 0] are generated accordingly.
  • each bit on an external pin (any pin on the ADBUS[31:0] bus or the ADR[3:2] bus) is time-multiplexed between a bit on the DATA[31:0] bus 108 - 2 and a bit from either the TAG BUS 108 - 1 or the ADRLO[12:0] bus 107 - 1 .
  • the present invention provides datapaths between an ADBUS bit and its corresponding DATA ( 108 - 2 ) bit and ADRLO ( 107 - 1 ) or TAG BUS ( 108 - 1 ) bit in the manner provided in FIG. 5.
  • an external pin 501 is provided with both receiving (i.e. input) and driving (i.e.
  • FIG. 5 is a generalized data path description of one external pin.
  • ADBUS[11] which is multiplexed between DATA[11] and TAG[11] does not have the circuit enclosed in box 503 .
  • ADBUS[4] which is multiplexed between DATA[4] and ADRLO[4] does not have the circuit enclosed in box 502 .
  • the signal received by input buffer 505 is provided to the tristate buffer 510 and to either the latch 506 or the tristate buffer 512 depending on whether pin 501 is associated with the TAG BUS ( 108 - 1 ) or the ADRLO[12:0] bus ( 107 - 1 ).
  • Latch 506 is clocked by a signal TAG_LC, which is a derivative of the clock signal CA_CLK driven from the I̅N̅T̅[0] pin, to latch a tag bit from pin 501.
  • Tristate buffer 507 is controlled by the control signal TEST[3] for driving the TAG BUS 108 - 1 at the predetermined phase of the CA_CLK.
  • a similar tristate buffer 512 is controlled by the control signal TEST[4] to drive the ADRLO[12:0] bus ( 107 - 1 ).
  • the control signal TEST[2] activates tristate buffer 508.
  • To output a bit from DATA bus 108-2, tristate buffer 509, which is controlled by control signal TEST[0], is activated. Conversely, to input a bit from pin 501, tristate buffer 510, which is controlled by control signal TEST[3], is activated.
  • FIG. 6 is a timing diagram showing a write cycle and a read cycle for either the instruction cache memory 102a or the data cache memory 102b, depending on whether the I/D̅ signal on the RSVD[1] pin is at logic high (instruction cache), or at logic low (data cache).
  • the output signals of the read buffer 106-3 and the write buffer 106-4 are deselected from their functional operation output pins ADBUS[31:0].
  • the write cycle, which is two S̅y̅s̅C̅l̅k̅ periods long, is initiated at time t0.
  • the cache address ADR[12:2], in the order specified, is placed on the ADBUS[3:2, 10:4] and the ADR[3:2] pins.
  • the tag data to be written TAG[31:11] and TAGV are placed on the ADBUS[31:11] and the ADBUS[0] pins.
  • the CA_CLK signal on the I̅N̅T̅[0] pin latches the ADRLO[12:2] data in the address latches of the cache memory specified by the signal I/D̅ on the RSVD[1] pin.
  • the tag data TAG[31:11] and the TAGV bit are latched into latches provided, such as latch 506.
  • the control signal TEST[4] is activated to drive the input signals on the ADBUS[3:2], the ADBUS[10:4] and the ADR[3:2] pins onto the target ADRLO bus.
  • the data to be written DATA[31:0] are placed on the ADBUS[31:0] pins.
  • the C̅A̅_̅W̅R̅ signal on the I̅N̅T̅[1] pin is asserted and both the tag data TAG[31:11] previously latched, and the data DATA[31:0] on the ADBUS[31:0] are written into the location specified by ADRLO[12:2] in the selected cache memory.
  • the control signal TEST[3] is activated to drive both the signals on ADBUS[31:0] and the tag data previously latched onto their respective targets, i.e. the DATA[31:0] bus (108-2) and the TAG BUS (108-1).
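  • The pin assignment used in the two clock periods of the write cycle can be sketched in Python as follows. The sketch assumes that “in the order specified” means ADRLO[12:11] on ADBUS[3:2], ADRLO[10:4] on ADBUS[10:4] and ADRLO[3:2] on ADR[3:2], and it leaves ADR[3:2] unspecified (`None`) in the data period. The function names `pack_address_tag_phase` and `write_cycle` are hypothetical.

```python
from typing import List, Optional, Tuple


def pack_address_tag_phase(adrlo: int, tag: int, tagv: bool) -> Tuple[int, int]:
    """First clock period: return (ADBUS[31:0], ADR[3:2]) pin values.

    adrlo carries address bits [12:2] in place; tag is the 21-bit TAG[31:11] value.
    """
    adbus = (tag & 0x1FFFFF) << 11        # TAG[31:11]  -> ADBUS[31:11]
    adbus |= adrlo & 0x7F0                # ADRLO[10:4] -> ADBUS[10:4]
    adbus |= ((adrlo >> 12) & 1) << 3     # ADRLO[12]   -> ADBUS[3]
    adbus |= ((adrlo >> 11) & 1) << 2     # ADRLO[11]   -> ADBUS[2]
    adbus |= int(tagv)                    # TAGV        -> ADBUS[0]
    adr = (adrlo >> 2) & 0x3              # ADRLO[3:2]  -> ADR[3:2]
    return adbus, adr


def write_cycle(adrlo: int, tag: int, tagv: bool,
                data: int) -> List[Tuple[int, Optional[int]]]:
    """Pin values for the two clock periods of a cache write."""
    first = pack_address_tag_phase(adrlo, tag, tagv)
    second = (data & 0xFFFFFFFF, None)    # DATA[31:0] -> ADBUS[31:0]; ADR[3:2] not modelled
    return [first, second]
```

  • In this model the address and tag share the first period only because they occupy disjoint pins; the 32-bit data word needs every ADBUS pin and therefore takes the second period to itself.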
  • ADRLO[12:2] of the location in the cache memory selected by the I/D̅ signal on RSVD[1] is placed on the assigned ADBUS[3:2, 10:4] and ADR[3:2] pins.
  • this address is latched into the address latches of the selected cache memory, the control signal TEST[4] having driven this address onto the ADRLO[12:0] bus.
  • the T/D̅ signal on the I̅N̅T̅[5] pin goes to logic low to select the DATA[31:0] bus (108-2) for output in the next S̅y̅s̅C̅l̅k̅ cycle, i.e. after time t6.
  • the C̅A̅_̅R̅d̅ signal is asserted to cause the selected cache memory to place the tag and data bits respectively onto the TAG BUS (108-1) and the DATA[31:0] bus (108-2), and the control signal ADOUTEN enables the ADBUS[31:0] pins for output.
  • Control signal TEST[0] is also asserted to activate tristate buffer 509, so as to allow the data on DATA[31:0] bus (108-2) to be output on the ADBUS[31:0] pins.
  • the signal T/D̅ on pin INT[5] goes to logic high, activating control signal TEST[2] and deactivating control signal TEST[0], so that the tag data on TAG BUS 108-1 (TAG[31:11] and TAGV bit) can be output on the ADBUS[31:11] and ADBUS[0].
  • the read cycle completes at time t10, when the read signal C̅A̅_̅R̅d̅ is negated.
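  • From the point of view of an external tester, the read cycle described above can be sketched in Python as three phases: an address period, a data period and a tag period, with the T/D̅ signal choosing which internal bus drives the pins. The sketch is a simplified model; the names `READ_CYCLE_PHASES`, `adbus_source` and `unpack_tag_phase` are hypothetical.

```python
from typing import Tuple

# Order of the phases as described for the read cycle.
READ_CYCLE_PHASES = (
    ("address", "ADBUS and ADR pins driven by the external device"),
    ("data", "ADBUS pins driven from DATA[31:0], T/D at logic low"),
    ("tag", "ADBUS pins driven from TAG BUS, T/D at logic high"),
)


def adbus_source(t_not_d: bool) -> str:
    """Internal bus selected for output by the T/D signal on the INT[5] pin."""
    return "TAG BUS" if t_not_d else "DATA"


def unpack_tag_phase(adbus: int) -> Tuple[int, bool]:
    """Recover (TAG[31:11], TAGV) from the pins during the tag phase."""
    tag = (adbus >> 11) & 0x1FFFFF   # ADBUS[31:11] -> TAG[31:11]
    tagv = bool(adbus & 1)           # ADBUS[0]     -> TAGV
    return tag, tagv
```

  • ADBUS[10:1] carry no tag information in this model, so a tester would ignore those pins during the tag phase.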
  • each location in each of the instruction cache memory 102 a and the data cache memory 102 b can be accessed.
  • Standard exhaustive memory testing algorithms can be applied to each of the instruction and data cache memories 102 a and 102 b .
  • the present invention allows testing processor 101 using methods requiring preloading the cache memories with data and instructions. Further, during testing by an in-circuit emulator, the contents of the cache memory can be examined and monitored.
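  • As an example of a standard memory test algorithm that such external access makes possible, the following Python sketch implements the MATS+ march test (write 0 everywhere; ascending read 0, write 1; descending read 1, write 0). MATS+ is chosen here for illustration only; the patent does not name a specific algorithm. The `read` and `write` callables stand in for the external read and write cycles, and all names are hypothetical.

```python
from typing import Callable, Optional


def mats_plus(read: Callable[[int], int], write: Callable[[int, int], None],
              num_words: int, width: int = 32) -> Optional[int]:
    """Run MATS+ over num_words locations; return the first failing address or None."""
    zeros, ones = 0, (1 << width) - 1
    for addr in range(num_words):              # element 1: write 0 to every word
        write(addr, zeros)
    for addr in range(num_words):              # element 2: ascending, read 0 then write 1
        if read(addr) != zeros:
            return addr
        write(addr, ones)
    for addr in reversed(range(num_words)):    # element 3: descending, read 1 then write 0
        if read(addr) != ones:
            return addr
        write(addr, zeros)
    return None


# Usage with a plain list standing in for the cache data bank.
memory = [0] * 16
print(mats_plus(memory.__getitem__, memory.__setitem__, len(memory)))
```

  • In a real setup the two callables would wrap the pin-level write and read cycles of FIG. 6, and the same routine could be run separately on the tag and data banks of each cache.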

Abstract

A structure and a method provide read and write access to a microprocessor's internal cache. During write access, an external data bus transmits to an internal data bus an address, cache tags and data in accordance with a clock provided externally. During read access, the external data bus transmits an address and receives from the internal data bus data and cache tags. In one embodiment, during write access, the external data bus is time-multiplexed to transmit an address, cache tags and data in two clock periods of an externally provided clock signal. During read access, the external data bus is time-multiplexed to transmit to the internal data bus an address in the first clock period of the external clock signal, and to receive tag and data in the next successive clock periods of the externally provided clock signal. In this embodiment, reserved pins are used to specify a cache access mode. Control for the cache access is provided via pins which are used during functional operation to receive external interrupt signals.

Description

    FIELD OF THE INVENTION
  • This invention relates to integrated circuits, and in particular, relates to the design of microprocessors. [0001]
    DESCRIPTION OF RELATED ART
  • Exploiting the property of locality of memory references, cache memories have been successfully used to achieve high performance in many computer systems. In the past, cache memories of microprocessor-based systems were provided off-chip using high performance memory components. This is primarily because the amount of silicon area necessary to provide an on-chip cache memory of reasonable performance would have been impractical, since increasing the size of an integrated circuit to accommodate a cache memory will adversely impact the yield of the integrated circuit in a given manufacturing process. However, with the density achieved recently in integrated circuit technology, it is now possible to provide on-chip cache memory economically. [0002]
  • In a computer system in which a cache memory is provided, when a memory word is needed, the central processing unit (CPU) looks into the cache memory system for a copy of the memory word. If the memory word is found in the cache memory, a cache “hit” is said to have occurred, and the main memory is not accessed. Thus, a figure of merit which can be used to measure the effectiveness of the cache memory is the “hit” ratio. The hit ratio is the percentage of total memory references in which the desired datum is found in the cache memory without accessing the main memory. When the desired datum is not found in the cache memory, a “cache miss” is said to have occurred. In addition, in many computer systems, there are one or more portions of the address space which are not mapped to the cache memory. Such a portion of the address space is said to be “uncached” or “uncacheable”. For example, the addresses assigned to input/output (I/O) devices are almost always uncached. Both a cache miss and an uncacheable memory reference result in an access to the main memory. [0003]
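  • The hit ratio defined above is a simple percentage, sketched here in Python. The function name `hit_ratio` and its parameters are hypothetical, and the counts in the example are made-up illustration values, not measurements.

```python
def hit_ratio(hits: int, total_references: int) -> float:
    """Percentage of memory references satisfied by the cache without a main memory access."""
    if total_references <= 0:
        raise ValueError("total_references must be positive")
    if not 0 <= hits <= total_references:
        raise ValueError("hits must lie between 0 and total_references")
    return 100.0 * hits / total_references


# Illustrative counts only: 950 hits out of 1000 references.
print(hit_ratio(950, 1000))
```

  • The remaining references, here 50 out of 1000, are the only ones that reach main memory and would therefore be visible on the pins of a processor with an on-chip cache.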
  • In the course of developing or debugging a computer system, it is often necessary to monitor program execution by the CPU or to interrupt one instruction stream to direct the CPU to execute certain alternate instructions. For example, a technique for testing a microprocessor in a system under development uses an in-circuit emulator (ICE) which provides facilities to monitor and intervene in the CPU's instruction stream. The ICE typically monitors the signals on the microprocessor's pins. In one mode of ICE operation, when a predetermined condition in the program execution is encountered, the ICE causes alternative instructions to be executed for such purposes as reading or altering the internal states of the CPU. Such alternative instructions can be preloaded or excluded from the cache memory. The ability to load or exclude instructions from the cache memory from a source external to the CPU can be very useful in many applications. Such ability is not known in the prior art. [0004]
  • When the cache memory is implemented off-chip, the ICE can easily isolate the cache memory, perform diagnostic test on each cell in the cache memory, using such techniques as exhaustive standard memory test algorithms independent from the operation of the CPU. In addition, the transactions between the cache memory and the CPU can be monitored by the ICE on the off-chip bus between the cache memory and the CPU. Hence, no difficulty is created in testing or using an off-chip cache. However, when the cache memory is implemented on-chip, the transactions between the cache and the CPU occur on an on-chip bus, which cannot be probed from the pins of the integrated circuit. As a result, debugging operations using an ICE in a system with an on-chip cache system can be very restricted. The inability to access and exhaustively test the internal cache makes diagnosing certain system problems difficult. When the on-chip cache achieves a high hit ratio, only the relatively infrequent accesses to main memory due to cache misses or references to uncacheable parts of memory can be monitored from the pins. [0005]
  • SUMMARY OF THE INVENTION
  • [0006] In accordance with the present invention, a structure and a method provide read and write accesses to a microprocessor's internal cache. During write access, an external data bus transmits to an internal data bus an address, cache tags and data in accordance with a clock signal provided externally. During read access, the external data bus transmits an address and receives from the internal data bus data and tag, also in accordance with the clock signal provided externally.
  • [0007] In one embodiment, during write access, the external data bus is time-multiplexed to transmit the address, the cache tags and data in two clock periods of an externally provided clock signal. In the same embodiment, during read access, the external data bus is time-multiplexed to transmit to the internal data bus an address in the first clock period of the external clock signal, and to receive cache tags and data in the next two successive clock periods of the externally provided clock signal. In this embodiment, “reserved” pins are used to specify a cache access mode. Control signals for the cache access are provided via pins which are used during functional operation to receive external interrupt signals.
  • [0008] The present invention allows the user of the microprocessor to exhaustively test the on-chip cache using standard memory test algorithms. The present invention also allows preloading the on-chip cache under control of signals external to the microprocessor. Such preloading operations can be useful in certain applications. In addition, the present invention provides a facility for external testing equipment to monitor or intervene in internal operations of the microprocessor.
  • [0009] The present invention is better understood upon consideration of the detailed description below and the accompanying drawings.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • [0010] FIG. 1a shows a computer system 100 having a processor 101 with an on-chip instruction cache system 102 and a main memory system 150 external to the processor 101, in accordance with the present invention.
  • [0011] FIG. 1b is a block diagram of the processor 101 of FIG. 1a.
  • [0012] FIG. 2 is a block diagram showing the addressing scheme used in instruction cache 102a of the cache system 102 of FIGS. 1a and 1b.
  • [0013] FIG. 3 is a block diagram in further detail than FIG. 2 of the interface between CPU core 103 and the instruction and data caches 102a and 102b, including the control signals ICLK, DCLK, {overscore (IWR)}, {overscore (DWR)}, {overscore (IRD)} and {overscore (DRD)}.
  • [0014] FIG. 4 summarizes some control signals generated from signals received on the microprocessor's pins for controlling reading and writing the instruction and data caches 102a and 102b, in accordance with the present invention.
  • [0015] FIG. 5 shows data flow between one pin of processor 101 and one bit each in the DATA[31:0] bus and one of the ADRLO[12:0] and TAG[31:11] busses, in accordance with the present invention.
  • [0016] FIG. 6 shows a timing diagram for a read cycle and a write cycle involving either the instruction cache memory 102a or the data cache memory 102b, in accordance with the present invention.
  • DETAILED DESCRIPTION
  • [0017] FIG. 1a shows, as an example, a computer system 100 having a processor 101 with an on-chip cache system 102 and a main memory system 150 external to the processor, in accordance with the present invention. As shown in FIG. 1a, external or read and write memory (“main memory”) system 150, which is interfaced to the processor 101 over a bus 153, comprises a dynamic random access memory (DRAM) controller 151, a main memory 152 implemented by banks 152a and 152b of DRAMs and a bus interface 154. In addition, the address space of computer system 100 is also used to access other memory-mapped devices such as I/O controller 141, I/O devices 142 and 143, and programmable read-only memory (PROM) 144. To facilitate reference, the memory-mapped devices other than the main memory 150 defined above are collectively referred to as the I/O system 140, even though read-only memories, such as PROM 144, are often not considered part of the I/O system. I/O system 140 is also interfaced to the bus 153. Bus 153 comprises address/data bus 153a and control bus 153b. Memory data and memory addresses are time-multiplexed on the 32-bit address/data bus 153a. Other device configurations using the memory address space are also possible within the scope of the present invention.
  • [0018] The organization of processor 101 is shown in FIG. 1b. As shown in FIG. 1b, processor 101 includes two coprocessors 103a and 103b, controlled by a master pipeline control unit 103c. Coprocessor 103a is also referred to as the integer CPU, and includes 32 32-bit general registers 103a-1, an ALU 103a-2, a shifter 103a-3, a multiplication and division unit 103a-4, an address adder 103a-5, and program counter control unit 103a-6. Processor 103a executes the instruction set known as the MIPS-I Instruction Set Architecture (ISA). Coprocessor 103b, also known as the System Control Coprocessor, comprises exception/control registers 103b-1, a memory management registers unit 103b-2 and a translation look-aside buffer (TLB) 103b-3. The TLB unit 103b-3 provides a mapping between virtual and physical addresses. The TLB unit 103b-3 has a 64-entry look-up table to provide mapping between virtual and physical addresses efficiently. In this embodiment, the TLB unit 103b-3 is provided at the user's option. The TLB unit 103b-3 can be disabled. The above units of the coprocessors 103a and 103b can be implemented by conventional or any suitable designs known in the art. The coprocessor units 103a and 103b, and the pipeline control unit 103c are collectively referred to as the CPU core 103.
  • [0019] The cache system 102 of processor 101 comprises two cache memories 102a and 102b. Cache 102a is an instruction cache. In the embodiment shown, cache 102a can have a capacity of 4K or 8K bytes, with block refill and line sizes of four memory words each. Cache 102b is a data cache, and has a selectable block refill size of one or four memory words, a line size of one memory word, and a capacity of 2K bytes. Other cache, block refill and line sizes can be provided within the scope of the present invention. Both the capacities of cache 102a and cache 102b, and their respective block refill and line sizes, are matters of design choice. In addition, it is also not necessary to provide separate data and instruction caches. A joint data and instruction cache is also within the scope of the present invention. The TLB unit 103b-3 receives a virtual address from the CPU core 103 on bus 109 and provides the corresponding physical memory address to either cache 102a or cache 102b on bus 107. Although cache accessing using virtual addresses is also possible, by using physical addressing in the instruction and data caches, the present embodiment simplifies software requirements and avoids the cache flushing operations necessary during a context switch in a virtually addressed cache. The cache addressing scheme of the present embodiment is discussed below in conjunction with FIG. 2. Other cache addressing schemes are also possible within the scope of the present invention.
  • [0020] Bus interface unit (BIU) 106 interfaces processor 101 with the main memory 150 when a read or write access to main memory is required. BIU 106 comprises a 4-deep write buffer 106-4, a 4-deep read buffer 106-3, a DMA arbiter 106-2 and BIU control unit 106-1. BIU control unit 106-1 provides all control signals on bus 153b, which comprises buses 153b-1 to 153b-3, necessary to interface with the main memory 150 and the I/O system 140. Both addresses and data are multiplexed on the address/data bus 153a, and the control signals are provided on the {overscore (Rd)}/{overscore (Wr)} control bus 153b-1, the system clock signal 153b-2, and the DMA control bus 153b-3.
  • [0021] FIG. 2 is a block diagram showing the addressing scheme used in the instruction cache 102a of the cache system 102, which is shown in FIGS. 1a and 1b. As shown in FIG. 2, the higher order 20 bits of a virtual address (generated by CPU core 103, as shown in FIG. 1b), which are represented by block 202, are provided to the cache addressing mechanism represented by block 201. The remaining 10 bits of the memory word address are common between the virtual and the physical addresses. (The lowest two address bits are byte addresses, which are not used in cache addressing.) These common bits are directly provided to index into the cache memory 102a, represented by blocks 204 and 205. Block 205 represents the data portion of the cache line, which comprises four 32-bit memory words in this embodiment. Block 204 represents the “tag” portion (TAG[32:11]) of the cache data word; this tag portion contains both a “valid” TAGV bit and the higher order 20 bits of the memory word addresses of the data words stored in the cache line. (Since the addresses of memory words within the cache line are contiguous, the higher order 20 bits are common to all of the memory words in the cache line.) The valid bit TAGV indicates that the cache word contains valid data. Invalid data may exist if the data in the cache does not contain a current memory word. This condition may arise, for example, after a reset period.
  • [0022] Each virtual address is associated with a particular process identified by a unique “process id” PID, which is represented by block 203. Block 201 represents the virtual address to physical address translation, which is performed using the TLB unit 103b-3 when the TLB is present. (FIG. 1b.) When the TLB is present, a TLB miss occurs if a mapping between the virtual address and the corresponding physical address cannot be found in the 64 entries of the TLB unit 103b-3, if the PID stored in the TLB unit 103b-3 does not match the PID of the virtual address, or if the valid bit in the data word is not set. Block 207 represents the determination of whether a TLB miss has occurred. The TLB miss condition raises an exception condition, which is handled by CPU core 103. If a virtual address to physical address mapping is found, the higher order 20 bits of the physical memory word address are compared (block 206) with the memory address portion of the tag. The valid bit is examined to ensure the data portion of the cache line contains valid data. If the comparison (block 206) indicates a cache hit, the selected 32-bit word in the cache line is the desired data.
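  • The address split and tag comparison described above can be sketched as follows. This is a minimal model only, assuming the 4K-byte instruction cache (a 10-bit word index, four-word lines and a 20-bit tag) and an address that has already been translated to a physical address; the TLB and PID checks are not modeled, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, field

BYTE_BITS = 2          # byte-within-word bits, not used for cache addressing
WORD_INDEX_BITS = 10   # word-address bits common to virtual and physical addresses
WORDS_PER_LINE = 4     # line size of the instruction cache in this embodiment


def split_address(phys_addr: int):
    """Split a 32-bit physical address into (20-bit tag, 10-bit word index)."""
    word_index = (phys_addr >> BYTE_BITS) & ((1 << WORD_INDEX_BITS) - 1)
    tag = (phys_addr >> (BYTE_BITS + WORD_INDEX_BITS)) & 0xFFFFF
    return tag, word_index


@dataclass
class Line:
    valid: bool = False    # models the TAGV bit
    tag: int = 0
    words: list = field(default_factory=lambda: [0] * WORDS_PER_LINE)


class InstructionCacheModel:
    """Hypothetical software model; one tag is shared by the four words of a line."""

    def __init__(self):
        self.lines = [Line() for _ in range((1 << WORD_INDEX_BITS) // WORDS_PER_LINE)]

    def fill_line(self, phys_addr: int, words):
        tag, word_index = split_address(phys_addr)
        line = self.lines[word_index // WORDS_PER_LINE]
        line.valid, line.tag, line.words = True, tag, list(words)

    def lookup(self, phys_addr: int):
        """Return the cached word on a hit, or None on a miss."""
        tag, word_index = split_address(phys_addr)
        line = self.lines[word_index // WORDS_PER_LINE]
        if line.valid and line.tag == tag:
            return line.words[word_index % WORDS_PER_LINE]
        return None
```

  • In this sketch a lookup misses both when the line is invalid (for example, after reset) and when the stored tag differs from the upper 20 address bits, which mirrors the two checks performed at block 206.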
  • [0023] If a cache miss is indicated, BIU 106 is invoked and CPU core 103 stalls until BIU 106 indicates that the requested data is available. A cache miss can also be generated when the memory access is to an “uncacheable” portion of memory. When BIU 106 receives a datum from main memory, the CPU core 103 executes either a “refill”, a “fixup”, or a “stream” cycle. In a refill cycle, an instruction datum received (in the read buffer 106-3) is brought into the cache 102a. In a fixup cycle, the CPU core 103 transitions from a refill cycle to execute the instruction brought out of the read buffer 106-3. In a stream cycle, the CPU core 103 simultaneously refills cache memory 102a and executes the instruction brought out of the read buffer 106-3. For uncacheable references, the CPU core 103 executes a fixup cycle to bring out the fetched memory word from the read buffer 106-3, but the uncacheable memory word is not brought into the cache memory 102a. Otherwise, the CPU core 103 executes refill cycles until the miss address is reached. At that time, a fixup cycle is executed. Subsequent cycles are stream cycles until the end of the 4-memory word block is reached and normal run operation resumes. If sequential execution is interrupted, e.g. by a successful branch condition, refill cycles are executed to refill the cache before execution is resumed at the branch address.
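  • The order of cycles following an instruction cache miss can be illustrated with a short sketch. It assumes one reading of the description above: words of the 4-word block that precede the miss address are handled by refill cycles, the word at the miss address by a fixup cycle, and the remaining words by stream cycles. The interrupted (branch) case is not modeled, and the function name is hypothetical.

```python
WORDS_PER_BLOCK = 4  # instruction cache block refill size in this embodiment


def miss_cycles(miss_word_offset: int, cacheable: bool = True) -> list:
    """Return the cycle types executed for a miss at the given word offset in its block."""
    if not 0 <= miss_word_offset < WORDS_PER_BLOCK:
        raise ValueError("offset must lie within the 4-word block")
    if not cacheable:
        # Uncacheable reference: the word is taken from the read buffer only.
        return ["fixup"]
    cycles = ["refill"] * miss_word_offset                           # words before the miss address
    cycles.append("fixup")                                           # the word at the miss address
    cycles += ["stream"] * (WORDS_PER_BLOCK - 1 - miss_word_offset)  # rest of the block
    return cycles


print(miss_cycles(2))  # a miss on the third word of a block
```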
  • [0024] The operation of the data cache 102b is similar to that of instruction cache 102a, except that only one fixup cycle is used after one or four refill cycles, depending upon the refill block size selected. Because the size of the data cache is 2K bytes, a 21-bit “tag” is required. Hence, because of the different sizes of the instruction and data caches, the data cache's tag is 1 bit longer than the instruction cache's tag. In order to have the data and instruction caches share a common cache addressing scheme, the instruction cache routes one of its lower order address bits back as a tag bit, so that the tag portion of the instruction cache appears to be 21 bits wide. If the refill block size selected for the data cache is four memory words, as will be apparent below, the present invention provides the same benefit in the data cache as in the instruction cache.
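  • The tag widths quoted above follow from simple arithmetic: the address bits that are not needed to select a byte within the cache must be stored as the tag. The sketch below shows the calculation, assuming 32-bit physical addresses and a cache in which each address maps to exactly one location; the function name is hypothetical.

```python
def tag_bits(capacity_bytes: int, address_bits: int = 32) -> int:
    """Return the number of address bits that must be kept in the tag."""
    if capacity_bytes <= 0 or capacity_bytes & (capacity_bytes - 1):
        raise ValueError("capacity must be a power of two")
    index_bits = capacity_bytes.bit_length() - 1  # log2 of the capacity
    return address_bits - index_bits


print(tag_bits(2 * 1024))  # 2K-byte data cache
print(tag_bits(4 * 1024))  # 4K-byte instruction cache
```

  • For the 2K-byte data cache the result is 21 bits, and for the 4K-byte instruction cache it is 20 bits, which is the one-bit difference noted above.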
  • [0025] FIG. 3 is a more detailed block diagram of the interface between CPU core 103 and the instruction cache memory 102a and the data cache memory 102b. As shown in FIG. 3, CPU core 103 provides the lower order bits of the physical cache addresses on bus 107-1 (ADRLO[12:0]) to address either of the cache memories 102a and 102b, and receives the tag and data contents of the cache memory addressed respectively on 22-bit bus 108-1 (TAG[31:11] and TAGV, hereinafter “TAG BUS”) and 32-bit bus 108-2 (“DATA[31:0]”). CPU core 103 provides to instruction cache 102a the clock signal ICLK, the read signal {overscore (IRd)}, and the write signal {overscore (IWr)} for reading and writing cache 102a. An analogous set of signals DCLK, {overscore (DRd)} and {overscore (DWr)} is provided to the data cache memory 102b. Instruction cache 102a is divided into two banks 102a-1 and 102a-2. The tags of the cache entries are stored in bank 102a-1, and the data words are stored in bank 102a-2. Since instruction cache 102a has a line size of four, there are four times as many entries in the data bank 102a-2 as in the tag bank 102a-1. Data cache 102b is similarly divided into tag and data banks 102b-1 and 102b-2 respectively.
  • [0026] Processor 101 is an 84-pin microprocessor. Other than the power and ground signals, processor 101 receives or provides: a 32-bit address or data bus ADBUS[31:0], lower address bus ADR[3:2], address latch enable signal ALE, data input enable signal {overscore (DataEn)}, burst transfer or write near signal {overscore (Burst)}/{overscore (WrNear)}, read signal {overscore (Rd)}, write signal {overscore (Wr)}, acknowledge signal {overscore (ACK)}, read buffer clock enable signal {overscore (RdCEn)}, bus error signal {overscore (BusError)}, diagnostic signals Diag[1:0], DMA bus request signal {overscore (BusReq)}, DMA bus grant signal {overscore (BusGnt)}, branch condition port BrCond[3:0], interrupt signals {overscore (Int[5:0])}, clock signals Clk2xIn and {overscore (SysClk)}, reset signal {overscore (Reset)}, and reserved signals RSVD[4:0]. The functional descriptions of these signals can be found in the “IDT79R3051 Family Hardware User's Manual,” available from Integrated Device Technology, Inc., Santa Clara, Calif. This hardware manual is hereby incorporated by reference in its entirety.
  • [0027] In order to provide the benefits of the present invention, the pins receiving reserved signals RSVD[4:0] (i.e. the “reserved pins RSVD[4:0]”) are used to place processor 101 into the “cache memory access” mode. This is accomplished when bit pattern ‘011’ is detected on the reserved pins RSVD[4:2]. Reserved pins RSVD[4:0] are provided for general testing purposes, such as testing the cache memories 102a and 102b as provided by the present invention. To avoid accidentally placing processor 101 into a testing mode, reserved pins RSVD[4:0] are each provided with a weak pull-down device. Consequently, since the user of processor 101 will normally leave reserved pins RSVD[4:0] floating, each of the reserved pins RSVD[4:0] will settle at ground voltage.
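  • The mode decode described above can be sketched in a few lines. The sketch assumes that the pattern ‘011’ is read most significant bit first, i.e. RSVD[4]=0, RSVD[3]=1 and RSVD[2]=1; the function name and the packing of the five pins into one integer are hypothetical.

```python
CACHE_ACCESS_PATTERN = 0b011  # assumed order: RSVD[4], RSVD[3], RSVD[2]


def in_cache_access_mode(rsvd: int) -> bool:
    """rsvd holds the levels of RSVD[4:0] as a 5-bit integer, with RSVD[4] as bit 4."""
    return ((rsvd >> 2) & 0b111) == CACHE_ACCESS_PATTERN


# Floating pins settle at ground through the weak pull-downs, so the mode is not entered.
print(in_cache_access_mode(0b00000))
# An external tester drives the pattern on RSVD[4:2].
print(in_cache_access_mode(0b01100))
```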
  • [0028] When cache memory access mode is entered, the CPU core 103 stalls to yield control of the data busses DATA[31:0] (108-2), ADRLO[12:0] (107-1), TAG BUS (108-1) and the leads for the cache control signals ICLK, DCLK, {overscore (IWr)}, {overscore (IRd)}, {overscore (DWr)} and {overscore (DRd)} to the external testing device desiring to access the cache memory. Because processor 101 stalls in cache memory access mode, the signals on tag and data buses TAG BUS (108-1) and DATA[31:0] and the control signals ICLK, DCLK, {overscore (IRd)}, {overscore (DRd)}, {overscore (IWr)} and {overscore (DWr)} are provided externally. In the cache memory access mode, the pins (“{overscore (INT[5:0])} pins”) normally receiving interrupt signals {overscore (INT[5:0])}, and the reserved pin RSVD[1] are used to provide these control signals from the external testing device. Specifically, the {overscore (INT[0])} pin provides a clock signal CA_CLK, the {overscore (INT[1])} pin provides a read signal {overscore (CA_Rd)}, and the {overscore (INT[2])} pin provides a write signal {overscore (CA_Wr)}. In addition, the signal (“I/{overscore (D)}”) on reserved pin RSVD[1] indicates whether the signals on the {overscore (INT[2:0])} pins are directed to data cache 102b (RSVD[1] at logic low) or the instruction cache 102a (RSVD[1] at logic high). Using the signals on these pins, the control signals ICLK, DCLK, {overscore (IRd)}, {overscore (DRd)}, {overscore (IWr)}, and {overscore (DWr)} are generated internally. Under cache memory access mode, because the combined width of the TAG, ADRLO, and DATA busses is 67 bits and, when added to the number of the control signals, exceeds the total number of functional pins (i.e. other than power and ground pins) available, the pins ADBUS[31:0] and ADR[3:2], which are used for reading or writing the cache memories 102a and 102b, must be time-multiplexed.
Specifically, data flowing to and from the data bus DATA[31:0] (108-2) and data flowing to and from the TAG BUS (108-1) must occur at different phases of the CA_CLK. During a read cycle (see below) the tag and data phases of the clock are indicated by the logic state of the signal (“T/{overscore (D)}”) on the {overscore (INT[5])} pin. Consequently, the following pin assignments are made:
    FUNCTIONAL MODE         CACHE MEMORY ACCESS MODE
    {overscore (INT[0])}    CA_CLK
    {overscore (INT[1])}    {overscore (CA_Rd)}
    {overscore (INT[2])}    {overscore (CA_Wr)}
    {overscore (INT[5])}    T/{overscore (D)}
    RSVD[1]                 I/{overscore (D)}
    ADBUS[31:11]            TAG[31:11], DATA[31:11]
    ADBUS[10:4]             ADRLO[10:4], DATA[10:4]
    ADBUS[3:2]              ADRLO[12:11], DATA[3:2]
    ADBUS[0]                TAGV
    ADR[3:2]                ADRLO[3:2]
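  • The pin assignments listed above can be expressed as bit-field packing. The following sketch shows how an external tester might form the values driven on ADBUS[31:0] and ADR[3:2] during the address and tag phase, and how they map back to ADRLO[12:2], TAG[31:11] and TAGV. The function names are hypothetical; the tag is assumed to be supplied already aligned to bits 31:11; and ADRLO[1:0] is not carried on any pin in the table, so it is left at zero here.

```python
def pack_tag_phase(adrlo: int, tag: int, tagv: int):
    """Return (adbus, adr) for the phase carrying the address, tag and TAGV."""
    adbus = tag & 0xFFFFF800                # ADBUS[31:11] <- TAG[31:11]
    adbus |= adrlo & 0x07F0                 # ADBUS[10:4]  <- ADRLO[10:4]
    adbus |= ((adrlo >> 11) & 0x3) << 2     # ADBUS[3:2]   <- ADRLO[12:11]
    adbus |= tagv & 0x1                     # ADBUS[0]     <- TAGV
    adr = (adrlo >> 2) & 0x3                # ADR[3:2]     <- ADRLO[3:2]
    return adbus, adr


def unpack_tag_phase(adbus: int, adr: int):
    """Recover (adrlo, tag, tagv) from the pin values of the same phase."""
    adrlo = (((adbus >> 2) & 0x3) << 11) | (adbus & 0x07F0) | ((adr & 0x3) << 2)
    return adrlo, adbus & 0xFFFFF800, adbus & 0x1


def pack_data_phase(data: int) -> int:
    """In the data phase ADBUS[31:0] carries DATA[31:0] unchanged."""
    return data & 0xFFFFFFFF


adbus, adr = pack_tag_phase(adrlo=0x16AC, tag=0xABCDE800, tagv=1)
print(hex(adbus), adr)
```

  • Because the same ADBUS pins carry a tag or address bit in one phase and a data bit in the other, a single set of pins serves both the TAG BUS or ADRLO bus and the DATA bus, which is the time-multiplexing described above.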
  • [0029] In order to provide time-multiplexing of ADBUS[31:0], control signals must be generated according to (i) whether a read cycle or a write cycle is desired, and (ii) which one of the TAG BUS 108-1, the ADRLO[12:0] bus 107-1, and the DATA[31:0] bus 108-2 is to exchange data with the ADBUS[31:0]. A set of control signals TEST[4:2, 0] is generated accordingly. Some control signals generated from the values of the control pins discussed above for accomplishing the present invention are summarized in FIG. 4.
  • [0030] As shown above, each bit on an external pin (any pin on the ADBUS[31:0] bus or the ADR[3:2] bus) is time-multiplexed between a bit on the DATA[31:0] bus 108-2 and a bit from either the TAG BUS 108-1 or the ADRLO[12:0] bus 107-1. The present invention provides datapaths between an ADBUS bit and its corresponding DATA (108-2) bit and ADRLO (107-1) or TAG BUS (108-1) bit in the manner provided in FIG. 5. As shown in FIG. 5, an external pin 501 is provided with both receiving (i.e. input) and driving (i.e. output) abilities by input buffer 505 and output buffer 504 respectively. When inputting, the output buffer 504 is disabled by control signal ADOUTEN (ADBUS output enable). The input buffer 505 is always enabled. During functional operations, pin 501 is multiplexed between the read buffer 106-3 (FIG. 1b) and the write buffer 106-4. An output signal from write buffer 106-4, for example, is provided on lead 513 for output to pin 501 through tristate buffers 511 and 504. Tristate buffer 511 is controlled by NOR gate 512, which receives as input signals the control signals TEST[0] and TEST[2]. During cache access mode, however, the write buffer 106-4 and the read buffer 106-3 are deselected by placing tristate buffer 511 in the high impedance state.
  • [0031] Depending on whether pin 501 is associated with a TAG BUS (108-1) bit or an ADRLO (107-1) bit, only one of the circuits enclosed in the boxes 502 and 503 is present at any pin. Thus, FIG. 5 is a generalized data path description of one external pin. For example, ADBUS[11], which is multiplexed between DATA[11] and TAG[11], does not have the circuit enclosed in box 503. Alternatively, ADBUS[4], which is multiplexed between DATA[4] and ADRLO[4], does not have the circuit enclosed in box 502.
  • [0032] As shown in FIG. 5, the signal received by input buffer 505 is provided to the tristate buffer 510 and to either the latch 506 or the tristate buffer 512, depending on whether pin 501 is associated with the TAG BUS (108-1) or the ADRLO[12:0] bus (107-1). Latch 506 is clocked by a signal TAG_LC, which is a derivative of the clock signal CA_CLK driven from the {overscore (INT[0])} pin, to latch a tag bit from pin 501. Tristate buffer 507 is controlled by the control signal TEST[3] for driving the TAG BUS 108-1 at the predetermined phase of the CA_CLK. In the circuit enclosed in box 503, a similar tristate buffer 512 is controlled by the control signal TEST[4] to drive the ADRLO[12:0] bus (107-1). When outputting a TAG BUS (108-1) bit, the control signal TEST[2] activates tristate buffer 508.
  • [0033] To output a bit from DATA bus 108-2, tristate buffer 509, which is controlled by control signal TEST[0], is activated. Conversely, to input a bit from pin 501, tristate buffer 510, which is controlled by control signal TEST[3], is activated.
  • [0034] FIG. 6 is a timing diagram showing a write cycle and a read cycle for either the instruction cache memory 102a or the data cache memory 102b, depending on whether the I/{overscore (D)} signal on the RSVD[1] pin is at logic high (instruction cache), or at logic low (data cache). As mentioned above, in the cache memory access mode, the output signals of the read buffer 106-3 and the write buffer 106-4 are deselected from their functional operation output pins ADBUS[31:0].
  • [0035] As shown in FIG. 6, the write cycle, which is two {overscore (SysClk)} periods long, is initiated at time t0. The cache address ADRLO[12:2], in the order specified, is placed on the ADBUS[3:2, 10:4] and the ADR[3:2] pins. At the same time, the tag data to be written TAG[31:11] and TAGV are placed on the ADBUS[31:11] and the ADBUS[0] pins. The CA_CLK signal on the {overscore (INT[0])} pin latches the ADRLO[12:2] data in the address latches of the cache memory specified by the signal I/{overscore (D)} on the RSVD[1] pin. At the same time, the tag data TAG[31:11] and the TAGV bit are latched into latches provided, such as latch 506. The control signal TEST[4] is activated to drive the input signals on the ADBUS[3:2], the ADBUS[10:4] and the ADR[3:2] pins onto the target ADRLO bus. At the next {overscore (SysClk)} cycle, i.e. after time t2, the data to be written DATA[31:0] are placed on the ADBUS[31:0] pins. At time t3, the {overscore (CA_Wr)} signal on the {overscore (INT[2])} pin is asserted and both the tag data TAG[31:11] previously latched, and the data DATA[31:0] on the ADBUS[31:0] are written into the location specified by ADRLO[12:2] in the selected cache memory. The control signal TEST[3] is activated to drive both the signals on ADBUS[31:0] and the previously latched tag data onto their respective targets, i.e. the DATA[31:0] bus (108-2) and the TAG BUS (108-1).
  • [0036] At time t4, a read cycle is initiated. The address ADRLO[12:2] of the location in the cache memory selected by the I/{overscore (D)} signal on RSVD[1] is placed on the assigned ADBUS[3:2, 10:4] and ADR[3:2] pins. At time t5, this address is latched into the address latches of the selected cache memory, the control signal TEST[4] having driven this address onto the ADRLO[12:0] bus. At the same time, the T/{overscore (D)} signal on the {overscore (INT[5])} pin goes to logic low to select the DATA[31:0] bus (108-2) for output in the next {overscore (SysClk)} cycle, i.e. after time t6. At time t7, the {overscore (CA_Rd)} signal is asserted to cause the selected cache memory to place the tag and data bits respectively onto the TAG BUS (108-1) and the DATA[31:0] bus (108-2), and the control signal ADOUTEN enables the ADBUS[31:0] pins for output. Control signal TEST[0] is also asserted to activate tristate buffer 509, so as to allow the data on DATA[31:0] bus (108-2) to be output on the ADBUS[31:0] pins. At time t8, the signal T/{overscore (D)} on the {overscore (INT[5])} pin goes to logic high, activating control signal TEST[2] and deactivating control signal TEST[0], so that the tag data on TAG BUS 108-1 (TAG[31:11] and TAGV bit) can be output on the ADBUS[31:11] and ADBUS[0] pins. The read cycle completes at time t10, when the read signal {overscore (CA_Rd)} is negated.
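  • The ordering of the phases in the write and read cycles described above can be summarized with a toy software model. It captures only the sequence of events (address and tag latched first, data written second; on a read, data driven out while T/{overscore (D)} is low and the tag while it is high) and none of the timing or electrical behavior. The class name, method names and the all-zero content returned for unwritten locations are assumptions made for illustration.

```python
class CacheAccessPortModel:
    """Toy model of one cache as seen by an external tester in cache memory access mode."""

    def __init__(self):
        self.cells = {}  # ADRLO word address -> (tag, tagv, data)

    def write_cycle(self, adrlo: int, tag: int, tagv: int, data: int) -> None:
        # First clock period: the address, tag and TAGV are presented and latched.
        latched_addr, latched_tag, latched_tagv = adrlo, tag, tagv
        # Second clock period: the data is presented and the write strobe
        # commits the latched tag together with the data.
        self.cells[latched_addr] = (latched_tag, latched_tagv, data & 0xFFFFFFFF)

    def read_cycle(self, adrlo: int) -> list:
        # First clock period: the address is presented and latched.
        # Unwritten locations read as zero here, which is an assumption of this model.
        tag, tagv, data = self.cells.get(adrlo, (0, 0, 0))
        # Next two periods: data phase (T/D low), then tag phase (T/D high).
        return [("data", data), ("tag", (tag, tagv))]


port = CacheAccessPortModel()
port.write_cycle(adrlo=0x0040, tag=0x12345, tagv=1, data=0xDEADBEEF)
print(port.read_cycle(0x0040))
```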
  • [0037] Using these read and write cycles, every location in each of the instruction cache memory 102a and the data cache memory 102b can be accessed. Standard exhaustive memory testing algorithms can be applied to each of the instruction and data cache memories 102a and 102b. In addition, the present invention allows testing processor 101 using methods requiring preloading the cache memories with data and instructions. Further, during testing by an in-circuit emulator, the contents of the cache memory can be examined and monitored.
  • [0038] The above detailed description is provided to illustrate the specific embodiments provided above, and is not intended to limit the present invention. Many modifications and variations within the scope of the present invention are possible. The present invention is defined by the following Claims.
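  • As one example of a standard memory test that could be run over such read and write cycles, the following sketch implements a MATS+-style march test at word granularity. The description above does not name a particular algorithm, so this choice is an assumption; the `read` and `write` callables stand in for the external read and write cycles and are hypothetical.

```python
def mats_plus(read, write, num_words: int, zero: int = 0x00000000, one: int = 0xFFFFFFFF) -> list:
    """Run a MATS+-style march test and return the failing word addresses."""
    failures = []
    for addr in range(num_words):              # any order: write the background pattern
        write(addr, zero)
    for addr in range(num_words):              # ascending: read background, write inverse
        if read(addr) != zero:
            failures.append(addr)
        write(addr, one)
    for addr in reversed(range(num_words)):    # descending: read inverse, write background
        if read(addr) != one:
            failures.append(addr)
        write(addr, zero)
    return failures


# Usage with a plain list standing in for a 1024-word (4K-byte) cache data bank.
memory = [0] * 1024
print(mats_plus(memory.__getitem__, memory.__setitem__, len(memory)))
```

  • A fault-free memory yields an empty list; a cell that cannot hold one of the two patterns is reported by its word address.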

Claims (8)

We claim:
1. A structure for reading and writing an internal memory of an integrated circuit having a plurality of pins, comprising:
an internal bus interfaced to said internal memory;
means for receiving at one of said pins a clock signal;
means for receiving at one of said pins a read signal indicating reading of said internal memory is desired;
means for receiving at one of said pins a write signal indicating writing of said internal memory is desired;
means for providing the data on said internal bus to a first group of said pins; and
means for providing the data on a second group of said pins to said internal bus.
2. A structure as in claim 1, wherein said first and second groups of pins includes common pins belonging to both said first and second groups of pins, said common pins provided with tristate buffers to effectuate bidirectional operations.
3. A structure as in claim 1, wherein in said internal memory has a bit-width exceeding the number of pins in said first group of pins, said means for providing the data on said internal bus provides said data by time-multiplexing said first group of pins.
4. A structure as in claim 1, wherein in said internal memory has a bit-width exceeding the number of pins in said second group of pins, said means for providing the data on said second group of pins provides said data by time-multiplexing said second group of pins.
5. A method for writing an internal memory of an integrated circuit having a plurality of pins, comprising the steps of:
providing an internal bus interfaced to said internal memory;
receiving at one of said pins a clock signal;
receiving at one of said pins a write signal indicating writing of said internal memory is desired; and
providing the data on a group of said pins to said internal bus.
6. A method for reading an internal memory of an integrated circuit having a plurality of pins, comprising the steps of:
providing an internal bus interfaced to said internal memory;
receiving at one of said pins a clock signal;
receiving at one of said pins a read signal indicating reading of said internal memory is desired; and
providing the data on said internal bus to a group of said pins.
7. A method as in claim 6 wherein in said internal memory has a bit-width exceeding the number of pins in said group of pins, said step of providing the data on said internal bus provides said data by time-multiplexing said group of pins.
8. A structure as in claim 5, wherein in said internal memory has a bit-width exceeding the number of pins in said group of pins, said step of providing the data on said group of pins provides said data by time-multiplexing said group of pins.
US08/818,060 1991-06-27 1997-03-14 Test mode accessing of an internal cache memory Expired - Fee Related US6446164B1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US72202691A 1991-06-27 1991-06-27
US08/818,060 US6446164B1 (en) 1991-06-27 1997-03-14 Test mode accessing of an internal cache memory

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US72202691A Continuation 1991-06-27 1991-06-27

Publications (2)

Publication Number Publication Date
US20020032827A1 true US20020032827A1 (en) 2002-03-14
US6446164B1 US6446164B1 (en) 2002-09-03

Family

ID=24900223

Country Status (1)

Country Link
US (1) US6446164B1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060059316A1 (en) * 2004-09-10 2006-03-16 Cavium Networks Method and apparatus for managing write back cache
US7558925B2 (en) 2004-09-10 2009-07-07 Cavium Networks, Inc. Selective replication of data structures
US7594081B2 (en) 2004-09-10 2009-09-22 Cavium Networks, Inc. Direct access to low-latency memory
US20120096213A1 (en) * 2009-04-10 2012-04-19 Kazuomi Kato Cache memory device, cache memory control method, program and integrated circuit
US9311239B2 (en) 2013-03-14 2016-04-12 Intel Corporation Power efficient level one data cache access with pre-validated tags

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3189816B2 (en) * 1998-12-08 2001-07-16 日本電気株式会社 Semiconductor storage device
US6889344B2 (en) * 2001-08-09 2005-05-03 International Business Machines Corporation System and method for exposing hidden events on system buses
US7007157B2 (en) 2001-10-30 2006-02-28 Microsoft Corporation Network interface sharing methods and apparatuses that support kernel mode data traffic and user mode data traffic
US20040044508A1 (en) * 2002-08-29 2004-03-04 Hoffman Robert R. Method for generating commands for testing hardware device models
CN1332315C (en) * 2002-09-27 2007-08-15 佛山市顺德区顺达电脑厂有限公司 Internal storage testing method
DE10248753B4 (en) * 2002-10-18 2005-09-15 Infineon Technologies Ag Semiconductor device and method for functional test and configuration of a semiconductor device
US7779212B2 (en) 2003-10-17 2010-08-17 Micron Technology, Inc. Method and apparatus for sending data from multiple sources over a communications bus
US7395454B1 (en) 2005-01-04 2008-07-01 Marvell Israel (Misl) Ltd. Integrated circuit with integrated debugging mechanism for standard interface
US7778812B2 (en) * 2005-01-07 2010-08-17 Micron Technology, Inc. Selecting data to verify in hardware device model simulation test generation
US7353345B1 (en) * 2005-03-07 2008-04-01 Integrated Device Technology, Inc. External observation and control of data in a computing processor

Family Cites Families (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4071889A (en) * 1973-07-28 1978-01-31 Mitsubishi Denki Kabushiki Kaisha Central processing apparatus for generating and receiving time division multiplex signals
US3867579A (en) * 1973-12-21 1975-02-18 Bell Telephone Labor Inc Synchronization apparatus for a time division switching system
US4257095A (en) * 1978-06-30 1981-03-17 Intel Corporation System bus arbitration, circuitry and methodology
US4315310A (en) * 1979-09-28 1982-02-09 Intel Corporation Input/output data processing system
US4365294A (en) * 1980-04-10 1982-12-21 Nixdorf Computer Corporation Modular terminal system using a common bus
US4575792A (en) * 1982-03-31 1986-03-11 Honeywell Information Systems Inc. Shared interface apparatus for testing the memory sections of a cache unit
US4591975A (en) * 1983-07-18 1986-05-27 Data General Corporation Data processing system having dual processors
US4701844A (en) * 1984-03-30 1987-10-20 Motorola Computer Systems, Inc. Dual cache for independent prefetch and execution units
US4933835A (en) * 1985-02-22 1990-06-12 Intergraph Corporation Apparatus for maintaining consistency of a cache memory with a primary memory
US4920534A (en) * 1986-02-28 1990-04-24 At&T Bell Laboratories System for controllably eliminating bits from packet information field based on indicator in header and amount of data in packet buffer
US4922438A (en) * 1986-12-11 1990-05-01 Siemens Aktiengesellschaft Method and apparatus for reading packet-oriented data signals into and out of a buffer
JPH0221342A (en) * 1987-02-27 1990-01-24 Hitachi Ltd Logical cache memory
US4933846A (en) * 1987-04-24 1990-06-12 Network Systems Corporation Network communications adapter with dual interleaved memory banks servicing multiple processors
US5185878A (en) * 1988-01-20 1993-02-09 Advanced Micro Devices, Inc. Programmable cache memory as well as system incorporating same and method of operating programmable cache memory
US5553262B1 (en) * 1988-01-21 1999-07-06 Mitsubishi Electric Corp Memory apparatus and method capable of setting attribute of information to be cached
US5131083A (en) * 1989-04-05 1992-07-14 Intel Corporation Method of transferring burst data in a microprocessor
JPH0359741A (en) * 1989-07-28 1991-03-14 Mitsubishi Electric Corp Cache memory
US5307477A (en) * 1989-12-01 1994-04-26 Mips Computer Systems, Inc. Two-level cache memory system
US5226130A (en) * 1990-02-26 1993-07-06 Nexgen Microsystems Method and apparatus for store-into-instruction-stream detection and maintaining branch prediction cache consistency
US5317718A (en) * 1990-03-27 1994-05-31 Digital Equipment Corporation Data processing system and method with prefetch buffers
US5249281A (en) * 1990-10-12 1993-09-28 Lsi Logic Corporation Testable ram architecture in a microprocessor having embedded cache memory
US5479630A (en) * 1991-04-03 1995-12-26 Silicon Graphics Inc. Hybrid cache having physical-cache and virtual-cache characteristics and method for accessing same
US5293603A (en) * 1991-06-04 1994-03-08 Intel Corporation Cache subsystem for microprocessor based computer system with synchronous and asynchronous data path
US5317711A (en) * 1991-06-14 1994-05-31 Integrated Device Technology, Inc. Structure and method for monitoring an internal cache
US5636363A (en) * 1991-06-14 1997-06-03 Integrated Device Technology, Inc. Hardware control structure and method for off-chip monitoring entries of an on-chip cache
US5649232A (en) * 1991-06-14 1997-07-15 Integrated Device Technology, Inc. Structure and method for multiple-level read buffer supporting optimal throttled read operations by regulating transfer rate

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060059316A1 (en) * 2004-09-10 2006-03-16 Cavium Networks Method and apparatus for managing write back cache
US20060059310A1 (en) * 2004-09-10 2006-03-16 Cavium Networks Local scratchpad and data caching system
US7558925B2 (en) 2004-09-10 2009-07-07 Cavium Networks, Inc. Selective replication of data structures
US7594081B2 (en) 2004-09-10 2009-09-22 Cavium Networks, Inc. Direct access to low-latency memory
US7941585B2 (en) * 2004-09-10 2011-05-10 Cavium Networks, Inc. Local scratchpad and data caching system
US9141548B2 (en) 2004-09-10 2015-09-22 Cavium, Inc. Method and apparatus for managing write back cache
US20120096213A1 (en) * 2009-04-10 2012-04-19 Kazuomi Kato Cache memory device, cache memory control method, program and integrated circuit
US9026738B2 (en) * 2009-04-10 2015-05-05 Panasonic Intellectual Property Corporation Of America Cache memory device, cache memory control method, program and integrated circuit
US9311239B2 (en) 2013-03-14 2016-04-12 Intel Corporation Power efficient level one data cache access with pre-validated tags

Also Published As

Publication number Publication date
US6446164B1 (en) 2002-09-03

Similar Documents

Publication Publication Date Title
US5317711A (en) Structure and method for monitoring an internal cache
US5636363A (en) Hardware control structure and method for off-chip monitoring entries of an on-chip cache
US5276833A (en) Data cache management system with test mode using index registers and CAS disable and posted write disable
US5519839A (en) Double buffering operations between the memory bus and the expansion bus of a computer system
US5539890A (en) Microprocessor interface apparatus having a boot address relocator, a request pipeline, a prefetch queue, and an interrupt filter
US5581727A (en) Hierarchical cache system flushing scheme based on monitoring and decoding processor bus cycles for flush/clear sequence control
EP0734553B1 (en) Split level cache
US5249281A (en) Testable ram architecture in a microprocessor having embedded cache memory
US6446164B1 (en) Test mode accessing of an internal cache memory
US6202125B1 (en) Processor-cache protocol using simple commands to implement a range of cache configurations
US5893921A (en) Method for maintaining memory coherency in a computer system having a cache utilizing snoop address injection during a read transaction by a dual memory bus controller
US5671231A (en) Method and apparatus for performing cache snoop testing on a cache system
US11321248B2 (en) Multiple-requestor memory access pipeline and arbiter
US6173243B1 (en) Memory incoherent verification methodology
US5860113A (en) System for using a dirty bit with a cache memory
US5455925A (en) Data processing device for maintaining coherency of data stored in main memory, external cache memory and internal cache memory
US6446169B1 (en) SRAM with tag and data arrays for private external microprocessor bus
KR100367139B1 (en) Pipeline-type microprocessor that prevents the cache from being read if the contents of the cache are invalid
US6282626B1 (en) No stall read access-method for hiding latency in processor memory accesses
US6549984B1 (en) Multi-bus access cache
KR0155931B1 (en) On-chip cache memory system
AU706450B2 (en) A processor interface circuit
Garcia et al. Single chip PCI bridge and memory controller for PowerPC microprocessors
Path INTEL 430TX PCISET: 82439TX SYSTEM CONTROLLER (MTXC)
GB2272548A (en) Zero wait state cache using non-interleaved banks of asynchronous static random access memories

Legal Events

Date Code Title Description
FPAY Fee payment

Year of fee payment: 4

FPAY Fee payment

Year of fee payment: 8

REMI Maintenance fee reminder mailed
LAPS Lapse for failure to pay maintenance fees
STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20140903