WO1999032976A1 - Risc processor with concurrent snooping and instruction execution - Google Patents

Risc processor with concurrent snooping and instruction execution

Info

Publication number
WO1999032976A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
execution unit
snooping
resource
data
Prior art date
Application number
PCT/IB1998/001545
Other languages
French (fr)
Inventor
Slobodan Simovich
Brad E. Eltman
Original Assignee
Koninklijke Philips Electronics N.V.
Philips Ab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Koninklijke Philips Electronics N.V., Philips Ab filed Critical Koninklijke Philips Electronics N.V.
Publication of WO1999032976A1 publication Critical patent/WO1999032976A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0806Multiuser, multiprocessor or multiprocessing cache systems
    • G06F12/0815Cache consistency protocols
    • G06F12/0831Cache consistency protocols using a bus scheme, e.g. with bus monitoring or watching means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3824Operand accessing
    • G06F9/3834Maintaining memory consistency

Definitions

  • the invention relates to a data processing system comprising a CPU coupled to a data resource shared with another device.
  • a shared-memory multiprocessor system is a data processing system wherein multiple processors share a memory. Sharing the same data among the processors may give rise to the so-called data cache-coherence problem. Two or more of the processors may read the same word from the shared memory and load this word into their respective caches. A first one of the processors may modify this word in its own cache and the shared memory, while the data cache of a second one of the processors still has the old word. If the process that is running on the second processor uses this old word, the semantics of the shared memory is violated.
  • a known solution to this cache-coherence problem is the snooping cache technique. See, for example, "Structured Computer Organization", A.S. Tanenbaum, Prentice Hall International Editions, third edition, 1990, especially pp. 498-505, or U.S. patent 5,353,415, incorporated herein by reference.
  • a cache is a relatively small but fast memory arranged between the data and/or instruction inputs of the CPU and main memory in order to compensate for the difference in speed between the processing in the CPU and the fetching of data and instructions from the main memory.
  • Cache operation relies on the locality principle: program references to memory tend to be clustered in time and in logical space. Temporal clustering relates to the tendency to reference the same address more than once within a specific period of time. Spatial clustering relates to the tendency to fetch data or instructions from logically consecutive memory addresses. The data and instructions in the main memory are mapped into the cache in blocks of logically coherent addresses.
  • a CPU and another device are connected to a memory via a shared bus.
  • the other device is capable of writing to the memory and is, for example, another CPU or a peripheral.
  • the bus has a bus controller.
  • the other device requests ownership of the bus from the bus controller and the latter grants the bus to the device if the bus is available.
    • the device then becomes the bus master.
  • the bus master writes to the memory via the bus.
  • the bus controller monitors the traffic. Upon finding that the bus master issues a snoopable memory address, the bus controller sends a snoop request to the CPU.
  • Upon receipt of the snoop request, the CPU checks its cache to determine if the cache contains data associated with the address, referred to as the snooping address. If the data associated with the snooping address is present in the cache, the cache controller invalidates the corresponding data in the cache. Upon a read operation of the CPU's cache at that address, the CPU experiences a miss and the correct data is fetched from main memory.
  • the execution unit of the CPU and the cache controller may want to access the cache simultaneously, the execution unit in order to read or write data, and the cache controller in order to check for possibly shared data and to invalidate or otherwise qualify the shared data. Stalling the execution unit decreases the efficiency of the execution of the program.
  • the invention provides a data processing system comprising a CPU and a device coupled to a data resource shared with the CPU.
  • the device is capable of writing to the resource.
  • the CPU has an execution unit for processing data under control of instructions.
  • the execution unit is coupled to the resource via a cache.
  • the cache has a controller for controlling the cache in response to receiving a snooping address generated by the device.
  • the execution unit conditionally stalls dependent on whether or not there is a conflict between the execution unit and the cache controller regarding access to the cache.
  • the invention is based on the insight that a conflict regarding simultaneous cache access requests arises only if the execution unit is about to execute LOAD or STORE instructions while the cache controller is about to respond to a snoop or vice versa.
  • LOAD and STORE instructions move data between the memory and the general registers of the CPU. These so-called memory reference instructions work directly between the registers and main memory.
  • LOAD and STORE instructions also can operate between the registers and the data cache on implementations so equipped. See, for example, "MIPS RISC Architecture", Gerry Kane and Joe Heinrich, Prentice Hall, 1992, especially pages A5-A6.
  • a LOAD instruction loads a general register with data from the data cache.
  • a STORE instruction stores a data value from a general register into the data cache. There is no need for the CPU to stall the execution of instructions if the instructions being carried out do not interact with the cache, that is, if the instructions are neither LOADS nor STORES. In contrast, the known RISC architectures having a single-ported data cache stall the execution unconditionally in response to a snoop.
  • Fig.1 is a block diagram of a multiprocessor system
  • Fig.2 is a block diagram of a part of a processor for support of the conditional stalling.
  • Fig.1 is a block diagram of a data processing system 100 according to the invention.
  • System 100 comprises a CPU 102 and a device 104 that are coupled to main memory 106 via a bus 108. Bus traffic is controlled by a bus controller 110. Both CPU 102 and device 104 are capable of writing to memory 106. Device 104 may, but need not, be another CPU.
  • CPU 102 has an instruction execution unit 112, a bus interface 114, an instruction cache 116, an instruction cache controller 118, a data cache 120 and a data cache controller 122.
  • CPU 102 has a pipelined LOAD/STORE architecture.
  • the instructions that reference memory are LOAD instructions and STORE instructions as mentioned above.
  • the pipeline operation is brought about by having the CPU's components, e.g., execution unit 112, its registers (not shown), caches 116 and 120, and the instruction fetch and decode unit (not shown), work in parallel so that at any instant several instructions are in various stages of processing.
  • the pipeline has, for example, the following stages in this order: fetch the instruction, decode the instruction and/or access the register file, execute the instruction, access memory (cache), and write back to the cache.
  • CPU 102 has a typical RISC architecture. See, e.g., "Structured Computer Organization", A.S. Tanenbaum, Prentice Hall International Editions, third edition, 1990, especially pp. 431-450, and "MIPS RISC Architecture", Gerry Kane and Joe Heinrich, Prentice Hall, 1992, especially Chapter 1.
  • Data cache controller 122 receives a snooping address via bus controller 110 and then checks if the snooping address supplied is present in its local cache 120. If so, cache controller 122 has to invalidate the corresponding data or take another action.
  • the term "snooping cycle" is used herein to indicate this sequence of actions including the checking and invalidating steps necessary to maintain cache-coherence.
  • the invention distinguishes between two possible situations in which second device 104 triggers a snooping cycle: i) while execution unit 112 of CPU 102 is currently executing a LOAD instruction or a STORE instruction; and ii) while execution unit 112 of CPU 102 is executing an instruction other than a LOAD instruction or a STORE instruction.
  • execution unit 112 is conditionally stalled during snooping, dependent on whether or not there is an outstanding LOAD or STORE instruction currently present in the pipeline of CPU 102. Identification of the type of instruction is done by the instruction fetch and decode unit (not shown).
  • Circuitry 200 comprises a multiplexer 202 that has an input 204 for receiving a snooping address and an input 206 for receiving the addresses associated with the data supplied to execution unit 112.
  • Circuitry 200 further comprises a multiplex controller 208 for control of multiplexer 202.
  • Multiplex controller 208 is part of cache controller 122 and is governed by the state machine (not shown) of controller 122.
  • Controller 208 has an output connected to execution unit 112 for supply of a snoop-stall signal that stalls execution unit 112 in case a LOAD or STORE instruction is about to be executed by unit 112. Hardware-interlocking is handled by execution unit 112.

Abstract

A RISC processor has a cache that is conditionally accessible for a snooping address or a normal address in an instruction in the instruction stream. Upon receipt of a snooping address, execution of instructions is stalled dependent on whether or not the instruction currently being executed or currently advancing in the processor's pipeline is either a LOAD or a STORE instruction.

Description

RISC processor with concurrent snooping and instruction execution.
FIELD OF THE INVENTION
The invention relates to a data processing system comprising a CPU coupled to a data resource shared with another device.
BACKGROUND ART
A shared-memory multiprocessor system is a data processing system wherein multiple processors share a memory. Sharing the same data among the processors may give rise to the so-called data cache-coherence problem. Two or more of the processors may read the same word from the shared memory and load this word into their respective caches. A first one of the processors may modify this word in its own cache and the shared memory, while the data cache of a second one of the processors still has the old word. If the process that is running on the second processor uses this old word, the semantics of the shared memory is violated. A known solution to this cache-coherence problem is the snooping cache technique. See, for example, "Structured Computer Organization", A.S. Tanenbaum, Prentice Hall International Editions, third edition, 1990, especially pp. 498-505, or U.S. patent 5,353,415, incorporated herein by reference.
A cache is a relatively small but fast memory arranged between the data and/or instruction inputs of the CPU and main memory in order to compensate for the difference in speed between the processing in the CPU and the fetching of data and instructions from the main memory. Cache operation relies on the locality principle: program references to memory tend to be clustered in time and in logical space. Temporal clustering relates to the tendency to reference the same address more than once within a specific period of time. Spatial clustering relates to the tendency to fetch data or instructions from logically consecutive memory addresses. The data and instructions in the main memory are mapped into the cache in blocks of logically coherent addresses.
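To make the block mapping concrete, the following is a minimal C sketch of a direct-mapped cache lookup. The line size, number of lines, and all names are illustrative assumptions, not parameters taken from this patent; the point is only how an address splits into a tag and a line index, and how temporal locality turns a second access to the same address into a hit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define LINE_BYTES 32u   /* assumed block (line) size     */
#define NUM_LINES  256u  /* assumed number of cache lines */

typedef struct {
    bool     valid;
    uint32_t tag;
} cache_line_t;

static cache_line_t cache[NUM_LINES];

/* Split an address into tag and index; report hit or miss. */
static bool cache_lookup(uint32_t addr, uint32_t *idx_out)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;  /* block -> line  */
    uint32_t tag   = addr / (LINE_BYTES * NUM_LINES);  /* remaining bits */
    *idx_out = index;
    return cache[index].valid && cache[index].tag == tag;
}

/* On a miss, the block containing the address is brought in. */
static void cache_fill(uint32_t addr)
{
    uint32_t index = (addr / LINE_BYTES) % NUM_LINES;
    cache[index].valid = true;
    cache[index].tag   = addr / (LINE_BYTES * NUM_LINES);
}

int main(void)
{
    uint32_t idx;
    printf("first access: %s\n", cache_lookup(0x1234, &idx) ? "hit" : "miss");
    cache_fill(0x1234);  /* temporal locality: the refetch pays off below */
    printf("second access: %s\n", cache_lookup(0x1234, &idx) ? "hit" : "miss");
    return 0;
}
```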
As a typical snooping scenario consider the following. Assume that a CPU and another device are connected to a memory via a shared bus. The other device is capable of writing to the memory and is, for example, another CPU or a peripheral. The bus has a bus controller. The other device requests ownership of the bus from the bus controller and the latter grants the bus to the device if the bus is available. The device then becomes the bus master. The bus master writes to the memory via the bus. The bus controller monitors the traffic. Upon finding that the bus master issues a snoopable memory address, the bus controller sends a snoop request to the CPU. Upon receipt of the snoop request, the CPU checks its cache to determine if the cache contains data associated with the address, referred to as the snooping address. If the data associated with the snooping address is present in the cache, the cache controller invalidates the corresponding data in the cache. Upon a read operation of the CPU's cache at that address, the CPU experiences a miss and the correct data is fetched from main memory. A problem associated with maintaining cache-coherence using snooping in a configuration with a processor that has a single-ported cache is the need for stalling the execution of instructions when the local cache controller is verifying the status or presence of shared data in the cache. The execution unit of the CPU and the cache controller may want to access the cache simultaneously, the execution unit in order to read or write data, and the cache controller in order to check for possibly shared data and to invalidate or otherwise qualify the shared data. Stalling the execution unit decreases the efficiency of the execution of the program.
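Continuing the hypothetical cache layout sketched above, the snoop-invalidate step of this scenario could look as follows: probe the cache at the snooping address and, on a hit, clear the valid bit so that the CPU's next read of that address misses and refetches from main memory. This is a sketch of the general technique, not the patent's circuit.

```c
/* Snoop handling, reusing cache_lookup() and cache[] from the previous
 * sketch (illustrative names). On a snoop hit the line is invalidated,
 * so a later CPU read misses and fetches the fresh data from memory. */
static void snoop_invalidate(uint32_t snoop_addr)
{
    uint32_t idx;
    if (cache_lookup(snoop_addr, &idx)) {
        cache[idx].valid = false;   /* the cached copy is now stale */
    }
}
```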
OBJECT OF THE INVENTION
It is therefore an object of the invention to provide a data processing system whose program execution is less hampered by the snooping procedures than the system of the prior art.
SUMMARY OF THE INVENTION
To this end, the invention provides a data processing system comprising a CPU and a device coupled to a data resource shared with the CPU. The device is capable of writing to the resource. The CPU has an execution unit for processing data under control of instructions. The execution unit is coupled to the resource via a cache. The cache has a controller for controlling the cache in response to receiving a snooping address generated by the device. The execution unit conditionally stalls dependent on whether or not there is a conflict between the execution unit and the cache controller regarding access to the cache. The invention is based on the insight that a conflict regarding simultaneous cache access requests arises only if the execution unit is about to execute LOAD or STORE instructions while the cache controller is about to respond to a snoop, or vice versa. LOAD and STORE instructions move data between the memory and the general registers of the CPU. These so-called memory reference instructions work directly between the registers and main memory. LOAD and STORE instructions also can operate between the registers and the data cache on implementations so equipped. See, for example, "MIPS RISC Architecture", Gerry Kane and Joe Heinrich, Prentice Hall, 1992, especially pages A5-A6. A LOAD instruction loads a general register with data from the data cache. A STORE instruction stores a data value from a general register into the data cache. There is no need for the CPU to stall the execution of instructions if the instructions being carried out do not interact with the cache, that is, if the instructions are neither LOADS nor STORES. In contrast, the known RISC architectures having a single-ported data cache stall the execution unconditionally in response to a snoop.
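The conditional-stall rule stated in this summary reduces to a single predicate: stall only when a snoop cycle and a memory-reference instruction both need the single cache port in the same cycle. Below is a hedged C sketch of that predicate; the enum, the signal names, and the per-cycle granularity are assumptions for illustration, not taken from the patent.

```c
#include <stdbool.h>

/* Illustrative instruction classes; in a LOAD/STORE (RISC) architecture
 * only LOAD and STORE instructions touch the data cache. */
typedef enum { OP_ALU, OP_BRANCH, OP_LOAD, OP_STORE } opclass_t;

static bool is_mem_ref(opclass_t op)
{
    return op == OP_LOAD || op == OP_STORE;
}

/* true  -> stall the execution unit this cycle;
 * false -> the snoop proceeds in parallel with instruction execution. */
static bool snoop_stall(bool snoop_cycle_active, opclass_t current_op)
{
    return snoop_cycle_active && is_mem_ref(current_op);
}
```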
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is explained in more detail and by way of example with reference to the accompanying drawings, wherein:
Fig.1 is a block diagram of a multiprocessor system; and Fig.2 is a block diagram of a part of a processor for support of the conditional stalling.
Throughout the figures, same reference numerals indicate similar or corresponding features.
DETAILED EMBODIMENTS
Fig.1 is a block diagram of a data processing system 100 according to the invention. System 100 comprises a CPU 102 and a device 104 that are coupled to main memory 106 via a bus 108. Bus traffic is controlled by a bus controller 110. Both CPU 102 and device 104 are capable of writing to memory 106. Device 104 may, but need not, be another CPU. CPU 102 has an instruction execution unit 112, a bus interface 114, an instruction cache 116, an instruction cache controller 118, a data cache 120 and a data cache controller 122. CPU 102 has a pipelined LOAD/STORE architecture. The instructions that reference memory are LOAD instructions and STORE instructions as mentioned above. The pipeline operation is brought about by having the CPU's components, e.g., execution unit 112, its registers (not shown), caches 116 and 120, and the instruction fetch and decode unit (not shown), work in parallel so that at any instant several instructions are in various stages of processing. The pipeline has, for example, the following stages in this order: fetch the instruction, decode the instruction and/or access the register file, execute the instruction, access memory (cache), and write back to the cache. CPU 102 has a typical RISC architecture. See, e.g., "Structured Computer Organization", A.S. Tanenbaum, Prentice Hall International Editions, third edition, 1990, especially pp. 431-450, and "MIPS RISC Architecture", Gerry Kane and Joe Heinrich, Prentice Hall, 1992, especially Chapter 1.
Data cache controller 122 receives a snooping address via bus controller 110 and then checks if the snooping address supplied is present in its local cache 120. If so, cache controller 122 has to invalidate the corresponding data or take another action. The term "snooping cycle" is used herein to indicate this sequence of actions, including the checking and invalidating steps necessary to maintain cache-coherence.
The invention distinguishes between two possible situations in which second device 104 triggers a snooping cycle: i) while execution unit 112 of CPU 102 is currently executing a LOAD instruction or a STORE instruction; and ii) while execution unit 112 of CPU 102 is executing an instruction other than a LOAD instruction or a STORE instruction.
In case i), the LOAD or STORE instruction is postponed until snooping has been completed, and is resumed thereafter. During the snooping cycle, execution unit 112 of CPU 102 is stalled. That is, the pipeline advances through the various stages until cache 120 is to be accessed for the write-back. The execution stops before performing the access step, since the LOAD or STORE operation is then competing with cache controller 122 for access to cache 120, and access to cache 120 is denied to execution unit 112. In case ii), CPU 102 performs the snooping cycle in parallel with the instruction flow, without stalling execution unit 112. If, during the snooping cycle, a LOAD or a STORE instruction enters the instruction stream of CPU 102, hardware interlocking is used to prevent these instructions from advancing in the pipeline of CPU 102 until the snooping cycle has been completed. In this scheme, execution unit 112 is conditionally stalled during snooping, dependent on whether or not there is an outstanding LOAD or STORE instruction currently present in the pipeline of CPU 102. Identification of the type of instruction is done by the instruction fetch and decode unit (not shown).
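As a toy illustration of the two cases, the per-cycle loop below (reusing opclass_t and snoop_stall() from the earlier sketch) holds a LOAD or STORE back while an assumed two-cycle snoop occupies the cache port, while ALU and branch instructions keep advancing. The instruction stream and the timing are invented for the example, not taken from the patent.

```c
#include <stdbool.h>
#include <stdio.h>

int main(void)
{
    /* A made-up instruction stream: only the LOAD and the STORE can
     * conflict with the snoop for the single cache port (case i);
     * the other instructions flow in parallel with it (case ii). */
    opclass_t pipeline[] = { OP_ALU, OP_LOAD, OP_ALU, OP_STORE, OP_BRANCH };
    size_t n = sizeof pipeline / sizeof pipeline[0], pc = 0;
    int snoop_cycles_left = 2;          /* assumed snoop-cycle length */

    for (int cycle = 0; pc < n; cycle++) {
        bool snooping = snoop_cycles_left > 0;
        if (snoop_stall(snooping, pipeline[pc])) {
            printf("cycle %d: stall, snoop vs. %s\n", cycle,
                   pipeline[pc] == OP_LOAD ? "LOAD" : "STORE");
        } else {
            printf("cycle %d: instruction %zu advances\n", cycle, pc);
            pc++;
        }
        if (snooping) {
            snoop_cycles_left--;        /* snoop used the cache port */
        }
    }
    return 0;
}
```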
Fig.2 is a diagram of snoop-control circuitry 200 in cache controller 122. Circuitry 200 comprises a multiplexer 202 that has an input 204 for receiving a snooping address and an input 206 for receiving the addresses associated with the data supplied to execution unit 112. Circuitry 200 further comprises a multiplex controller 208 for control of multiplexer 202. Multiplex controller 208 is part of cache controller 122 and is governed by the state machine (not shown) of controller 122. Controller 208 has an output connected to execution unit 112 for supply of a snoop-stall signal that stalls execution unit 112 in case a LOAD or STORE instruction is about to be executed by unit 112. Hardware-interlocking is handled by execution unit 112.
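In software form, the Fig.2 arrangement could be modeled as below: a two-way multiplexer selects either the snooping address (input 204) or the execution-unit address (input 206) for the single cache port, and the controller asserts the snoop-stall signal only when the current instruction actually needs the cache. Field and function names are assumptions; the patent describes hardware, and this C sketch (reusing opclass_t and is_mem_ref() from above) only mirrors its selection logic.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t snoop_addr;     /* input 204: snooping address             */
    uint32_t exec_addr;      /* input 206: execution-unit address       */
    bool     snoop_pending;  /* controller state machine wants a snoop  */
} mux_inputs_t;

/* Returns the address that reaches the cache this cycle and sets the
 * snoop-stall signal: the snoop wins the port, but the execution unit
 * stalls only when its current instruction is a LOAD or a STORE. */
static uint32_t address_mux(const mux_inputs_t *in, opclass_t current_op,
                            bool *snoop_stall_out)
{
    *snoop_stall_out = in->snoop_pending && is_mem_ref(current_op);
    return in->snoop_pending ? in->snoop_addr : in->exec_addr;
}
```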

Claims

CLAIMS:
1. A data processing system (100) comprising:
- a data resource (106);
- a device (104) coupled to the resource, and capable of writing to the resource;
- a CPU (102) coupled to the resource and having: - an execution unit (112) for processing data under control of instructions;
- a cache (120) coupling the unit to the resource;
- a cache controller (122) for controlling the cache in response to receiving a snooping address generated by the device; wherein: - the execution unit conditionally stalls dependent on whether or not there is a conflict, regarding access to the cache, between the execution unit and the cache controller.
2. The system of claim 1, wherein the execution unit stalls when the cache controller requires access to the cache in response to the snooping address while a LOAD instruction or a STORE instruction is being executed.
3. An electronic circuit comprising a CPU (102) that has:
- an execution unit (112) for processing data under control of instructions;
- a data cache (120) coupled to the unit; and - a cache controller (122) for controlling the cache in response to receiving a snooping address; wherein:
- the execution unit conditionally stalls dependent on whether or not there is a conflict, regarding access to the cache, between the execution unit and the cache controller.
4. The circuit of claim 3, wherein the execution unit stalls when the cache controller requires access to the cache in response to the snooping address while a LOAD instruction or a STORE instruction is being executed.
5. A method of controlling a data processing system (100) comprising:
- a data resource (106);
- a device (104) coupled to the resource, and capable of writing to the resource;
- a CPU (102) coupled to the resource and having: - an execution unit (112) for processing data under control of instructions;
- a cache (120) coupling the unit to the resource;
- a cache controller (122) for controlling the cache in response to receiving a snooping address generated by the device; wherein: - the method comprises conditionally stalling the execution unit dependent on whether or not there is a conflict, regarding access to the cache, between the execution unit and the cache controller.
PCT/IB1998/001545 1997-12-18 1998-10-05 Risc processor with concurrent snooping and instruction execution WO1999032976A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US99395497A 1997-12-18 1997-12-18
US08/993,954 1997-12-18

Publications (1)

Publication Number Publication Date
WO1999032976A1 true WO1999032976A1 (en) 1999-07-01

Family

ID=25540131

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB1998/001545 WO1999032976A1 (en) 1997-12-18 1998-10-05 Risc processor with concurrent snooping and instruction execution

Country Status (1)

Country Link
WO (1) WO1999032976A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046514B2 (en) * 2000-11-21 2011-10-25 Aspex Technology Limited Broadcasting data across a bus in which data transmission can be delayed if a snooping device is not ready to receive
US8099560B2 (en) 2008-08-29 2012-01-17 Freescale Semiconductor, Inc. Synchronization mechanism for use with a snoop queue
US8131947B2 (en) 2008-08-29 2012-03-06 Freescale Semiconductor, Inc. Cache snoop limiting within a multiple master data processing system
US8131948B2 (en) 2008-08-29 2012-03-06 Freescale Semiconductor, Inc. Snoop request arbitration in a data processing system
US8200908B2 (en) 2009-02-06 2012-06-12 Freescale Semiconductor, Inc. Method for debugger initiated coherency transactions using a shared coherency manager
US8327082B2 (en) 2008-08-29 2012-12-04 Freescale Semiconductor, Inc. Snoop request arbitration in a data processing system
US8688910B2 (en) 2009-02-06 2014-04-01 Freescale Semiconductor, Inc. Debug control for snoop operations in a multiprocessor system and method thereof
US11457900B2 (en) 2015-02-02 2022-10-04 Centre National De La Recherche Scientifique Microdevice for the in vivo capture of circulating cellular biomarkers

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0667578A2 (en) * 1994-02-14 1995-08-16 Hewlett-Packard Company Double cache snoop mechanism

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0667578A2 (en) * 1994-02-14 1995-08-16 Hewlett-Packard Company Double cache snoop mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TANENBAUM, A.S., "Structured Computer Organization", third edition, Prentice-Hall International, Inc., 1990, pages 498-505. *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8046514B2 (en) * 2000-11-21 2011-10-25 Aspex Technology Limited Broadcasting data across a bus in which data transmission can be delayed if a snooping device is not ready to receive
US8099560B2 (en) 2008-08-29 2012-01-17 Freescale Semiconductor, Inc. Synchronization mechanism for use with a snoop queue
US8131947B2 (en) 2008-08-29 2012-03-06 Freescale Semiconductor, Inc. Cache snoop limiting within a multiple master data processing system
US8131948B2 (en) 2008-08-29 2012-03-06 Freescale Semiconductor, Inc. Snoop request arbitration in a data processing system
US8327082B2 (en) 2008-08-29 2012-12-04 Freescale Semiconductor, Inc. Snoop request arbitration in a data processing system
US8200908B2 (en) 2009-02-06 2012-06-12 Freescale Semiconductor, Inc. Method for debugger initiated coherency transactions using a shared coherency manager
US8688910B2 (en) 2009-02-06 2014-04-01 Freescale Semiconductor, Inc. Debug control for snoop operations in a multiprocessor system and method thereof
US11457900B2 (en) 2015-02-02 2022-10-04 Centre National De La Recherche Scientifique Microdevice for the in vivo capture of circulating cellular biomarkers

Similar Documents

Publication Publication Date Title
US8539485B2 (en) Polling using reservation mechanism
US5802585A (en) Batched checking of shared memory accesses
US5787480A (en) Lock-up free data sharing
JP2881309B2 (en) Integrated circuit, computer system, and method for updating cache block status in a cache in an integrated circuit
JP4982375B2 (en) Sharing a monitored cache line across multiple cores
US5761729A (en) Validation checking of shared memory accesses
US8255591B2 (en) Method and system for managing cache injection in a multiprocessor system
US5265233A (en) Method and apparatus for providing total and partial store ordering for a memory in multi-processor system
EP1311956B1 (en) Method and apparatus for pipelining ordered input/output transactions in a cache coherent, multi-processor system
EP1215584A2 (en) Highly pipelined bus architecture
US7600076B2 (en) Method, system, apparatus, and article of manufacture for performing cacheline polling utilizing store with reserve and load when reservation lost instructions
US20020138698A1 (en) System and method for caching directory information in a shared memory multiprocessor system
EP0726523A2 (en) Method for maintaining memory coherency in a computer system having a cache
US6553442B1 (en) Bus master for SMP execution of global operations utilizing a single token with implied release
US20090106498A1 (en) Coherent dram prefetcher
EP1994469B1 (en) Method, system, apparatus and article of manufacture for performing cacheline polling utilizing a store and reserve instruction
KR20180090284A (en) Event-initiated programmable prefetchers
US7581067B2 (en) Load when reservation lost instruction for performing cacheline polling
JP2695017B2 (en) Data transfer method
US6507880B1 (en) Bus protocol, bus master and bus snooper for execution of global operations utilizing multiple tokens
US6480915B1 (en) Bus protocol and token manager for SMP execution of global operations utilizing a single token with implied release
US6460101B1 (en) Token manager for execution of global operations utilizing multiple tokens
WO1999032976A1 (en) Risc processor with concurrent snooping and instruction execution
US6973541B1 (en) System and method for initializing memory within a data processing system
JPH0467242A (en) Cache memory control system

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): CN JP KR

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase