CA2030888C - Cache data consistency mechanism for workstations and servers with an i/o cache - Google Patents

Cache data consistency mechanism for workstations and servers with an i/o cache

Info

Publication number
CA2030888C
CA2030888C
Authority
CA
Canada
Prior art keywords
cache
memory
data
devices
central
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CA 2030888
Other languages
French (fr)
Other versions
CA2030888A1 (en)
Inventor
John Watkins
David Labuda
William C. Van Loo
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Microsystems Inc
Original Assignee
Sun Microsystems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Microsystems Inc filed Critical Sun Microsystems Inc
Publication of CA2030888A1 publication Critical patent/CA2030888A1/en
Application granted granted Critical
Publication of CA2030888C publication Critical patent/CA2030888C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Memory System Of A Hierarchy Structure (AREA)

Abstract

Hardware and software improvements in workstations which utilize a cache for increasing the throughput of Direct Memory Access (DMA) I/O on an operating system supporting multiple concurrent I/O operations. In a workstation or server having an operating system supporting multiple concurrent I/O operations, performance may be improved significantly by including a write back cache for I/O as one of the system elements. Such a write back cache supports external devices with at least two types of device interfaces, a standard system bus interface and a network control interface, through a unique combination of hardware and software support, while maintaining data consistency between the I/O cache and the CPU cache. All associated controls, I/O Cache arrays, CPU Cache arrays, data paths, and diagnostic and programming support necessary to implement an efficient data consistency mechanism between the CPU cache data and the I/O Cache data are provided.

Description

A Cache Data Consistency Mechanism for Workstations and Servers with an I/O Cache

Summary of the Invention:

This invention is directed to certain hardware and software improvements in workstations which utilize a cache for increasing the throughput of Direct Memory Access (DMA) I/O on an operating system supporting multiple concurrent I/O operations. In this connection, for convenience the invention will be described with reference to a particular operating system with this support, namely the Unix operating system.
(Unix is a registered trademark of the American Telephone & Telegraph Corp.) However, the invention is not limited to use in connection with the Unix operating system, nor are the claims to be interpreted as covering an invention which may be used only with the Unix operating system.

In a Unix based workstation or server, system performance may be improved significantly by including a write back cache for I/O as one of the system elements. However, a problem that can arise from this strategy is maintaining data consistency between the I/O cache and the CPU cache.
Traditional solutions to this problem place the burden of maintaining consistency either on the operating system, which causes severe performance degradation, or on the system hardware, which increases the cost and complexity of the cache design.

The write back cache for I/O, which is assumed as a system element in the present invention, is a component in the workstation or server's Input/Output (I/O) subsystem. In a typical workstation or server configuration, the major system components include a Central Processing Unit (CPU), a Memory Management Unit (MMU), an optional Cache subsystem, Main Memory, and an Input/Output (I/O) subsystem for transferring data between the memory subsystem (Cache and Main Memory) and external devices. The I/O subsystem described here supports external devices with at least two types of device interfaces: a standard system bus interface and a network control interface. The standard system bus is typically capable of supporting a variety of devices, including disk controllers, as one example.

Control of data movement between the external devices and the main memory subsystem is typically done in either of two ways. First, data movement can be controlled by the CPU directly reading from the device (to internal CPU registers) or writing from registers to the device. This type of control is called Programmed I/O. The second type of control is with data movement being controlled, for the most part, by the external device itself. This type of control is called Direct Memory Access, or, if the device accesses memory through virtual addresses (as is the case in the preferred embodiment), Direct Virtual Memory Access (DVMA). Coordination between the external device and the CPU is typically handled either by message passing or through interrupts.
The I/O Cache assumed here is a mechanism to significantly enhance the performance of DVMA (or DMA) data transfers. It is further assumed that the I/O Cache may be any of a variety of devices whose purpose is to temporarily buffer data being transferred between DVMA devices and the system's cache and memory subsystem.

In the present invention, the problem of maintaining data consistency between an I/O cache and a CPU cache is solved through a unique combination of hardware and software support, which is collectively called "Consistency Controls". The term "Consistency Controls"
as used below will be meant to include all associated controls, I/O Cache arrays, CPU Cache arrays, data paths, and diagnostic and programming support necessary to implement an efficient data consistency mechanism between the CPU cache data and I/O Cache data.

DVMA device classifications

In order to support the proper operation of the I/O Cache, the operating system divides all DVMA or DMA I/O devices on the system into three classes. These classes of devices are each treated differently by the operating system, but all devices within a class are treated identically by the routines that support the I/O Cache operation.

Class 1 devices are characterized by their sequential I/O to a dynamic buffer in system memory. They are high throughput devices, such as magnetic disk and tape, and thus increased system performance can be achieved by properly caching their data in the I/O Cache. They always perform I/O via sequential DMA accesses to a specified buffer, and communicate with the operating system via shared memory outside the data buffer. In all cases, the data buffer used by a Class 1 device is dynamically allocated, so the operating system must allocate and deallocate the buffers for each operation.

Class 2 devices are characterized by their I/O to multiple, static data buffers. This class includes networking devices, which typically use a ring buffer scheme for sending and receiving network packets. Class 2 devices do not allocate and deallocate buffers per operation. Instead, a set of statically allocated data buffers is repeatedly used to perform I/O
operations. These devices must perform sequential DMA within a data buffer, but they can be accessing several data buffers simultaneously in an interleaved fashion. Class 2 devices are also high throughput devices, so it is beneficial to system performance to have their data cached in the I/O
Cache.

Class 3 devices are characterized by either non-sequential DMA
accesses to their data buffers, or throughput that is too low to gain noticeable system performance from caching their data in the I/O Cache.
The operating system is designed to have Class 3 devices bypass the I/O
Cache entirely, so their data is never cached in the I/O Cache. Such data may or may not be cached in the Central Cache used by the CPU.
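The classification can be summarized in software terms. The following C fragment is an illustrative sketch only; the type and function names are assumptions introduced here and do not appear in the patent.

    #include <stdbool.h>

    /* Illustrative classification of DVMA/DMA devices into the three classes
     * described above.  Names are hypothetical. */
    enum dvma_class {
        DVMA_CLASS1,  /* sequential I/O to dynamically allocated buffers (disk, tape) */
        DVMA_CLASS2,  /* sequential I/O within static ring buffers (network interfaces) */
        DVMA_CLASS3   /* non-sequential or low throughput; bypasses the I/O Cache */
    };

    /* Only Class 1 and Class 2 devices have their pages marked I/O Cacheable. */
    static bool io_cacheable_class(enum dvma_class c)
    {
        return c == DVMA_CLASS1 || c == DVMA_CLASS2;
    }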

When DMA devices in any of the three classes employ a virtual addressing space, then these devices are called Direct Virtual Memory Access (DVMA) devices. Within the preferred embodiment, all I/O devices which are the subject of this description are DVMA devices. However, this description may be modified to include Direct Memory Access (DMA) devices either as a substitute for DVMA devices or in conjunction with DVMA devices. DMA devices differ, conceptually, from DVMA devices only in their mechanisms to address data in main memory. DMA devices access memory using real (or physical) memory addresses; DVMA devices access memory through virtual memory addresses which are mapped to real addresses. The mechanism to accomplish this mapping in the preferred embodiment system is the I/O Mapper. The concepts of the I/O Cache, developed here for a system with DVMA devices, may be extended as well to a system supporting DMA devices.

Examples of Class 1 devices in the preferred embodiment of the invention are devices connected to the system through a standard system bus, the VMEbus. An example of a Class 2 device in the preferred embodiment is the Intel Ethernet interface with supporting DVMA logic.
Examples of Class 3 devices include slower speed (e.g., serial data communication) devices connected to the system through a standard system bus (VMEbus) interface.

Hardware Data Consistency Support for I/O Device Classes 1-3

The Consistency Controls in the present embodiment use an efficient combination of hardware and operating system commands to ensure that the CPU and each DVMA Class 1, 2, and 3 device access consistent data.
There are three distinct problems in maintaining data consistency. First, if the Central Cache is a write back cache, then the Consistency Controls must ensure that all CPU writes into the cache are seen by DVMA devices reading data from the cache-memory subsystem.

Second, regardless of whether the Central Cache is a write through or write back (copy back) cache, the Consistency Controls must ensure that all addresses for blocks within the Central Cache which are overwritten by DVMA devices writing new data into the cache-memory subsystem, are marked as invalid or "stale" addresses within the Central Cache.

Third, since the I/O Cache acts as a temporary storage buffer for DVMA data in transit between the cache-memory subsystem and DVMA
devices, the Consistency Controls must ensure that data and controls within the I/O Cache are properly reset at the conclusion of each transfer sequence by a DVMA device. In particular, for DVMA devices writing into the cache-memory subsystem, any data in the I/O Cache buffer at the end of the transfer sequence must be flushed into memory. For DVMA devices reading from the cache-memory subsystem, any valid read data left in the buffer at the end of the transfer sequence must be invalidated.

Within the Consistency Controls, Class 3 DVMA devices resolve these three requirements by bypassing the I/O Cache on all DVMA accesses and, instead, accessing data directly from the Central Cache (or Main Memory, if the DVMA data is non-cacheable for the Central Cache).

For Class 1 and Class 2 DVMA devices, the first and second requirements are efficiently solved within the Consistency Controls by the use of hardware controls which "snoop" into the Central Cache upon every I/O Cache "miss" on a block of data. When a DVMA read request "misses" the I/O Cache, the block "miss" address is checked against the Central Cache. If a matching address is found, this block of data is copied from the Central Cache into the I/O Cache while bypassing the required "miss" data to the DVMA device. Similarly, when a DVMA write request "misses" the I/O Cache, the block "miss" address is also checked against the Central Cache. If a matching address is found, this block of data is invalidated within the Central Cache. This data consistency mechanism is efficient in that the frequency with which Class 1 and 2 DVMA devices interfere with the CPU for access to the CPU cache is dramatically reduced: only I/O Cache miss cycles require Central Cache snooping.

The third requirement for Class 1 and Class 2 DVMA devices is solved by hardware and software interaction, through the use of a Flush I/O Cache command issued by the CPU at the conclusion of a DVMA device transfer sequence. In summary, the Flush command addresses a block within the I/O Cache. If this block is valid and modified, then the contents of the block are written back into memory. If the block is valid, it is marked invalid. If the block is invalid, no action is taken. If the mapping of the DVMA device address space into the I/O Cache Arrays is properly specified, then the number of I/O Cache blocks to be flushed at the conclusion of a DVMA device transfer sequence will be minimal. The proper action of the Flush command depends on operating system conventions and constraints, which are outlined in the following sections.
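The Flush semantics just described can be sketched as follows; the C structure and names are hypothetical and serve only to restate the three cases (valid and modified, valid, invalid).

    #include <stdbool.h>
    #include <stdint.h>

    #define BLOCK_SIZE 16u

    struct ioc_line {                     /* hypothetical I/O Cache line state */
        bool     valid;
        bool     modified;
        uint32_t block_addr;              /* physical block address */
        uint8_t  data[BLOCK_SIZE];
    };

    extern void write_block_to_memory(uint32_t block_addr, const uint8_t *src);

    /* Flush one I/O Cache line: write back if valid and modified, invalidate if
     * valid, do nothing if invalid. */
    void ioc_flush(struct ioc_line *line)
    {
        if (!line->valid)
            return;
        if (line->modified)
            write_block_to_memory(line->block_addr, line->data);
        line->valid = false;
        line->modified = false;
    }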

Data Consistency Requirements

In response to the partial data consistency supported by the hardware, the operating system must logically divide the physical address space of the system into segments of size M bytes, where M is the larger of the cache line size of the CPU cache and the cache line size of the I/O cache.
For each of these resulting segments, the operating system must abide by the following rules to prevent data inconsistencies from occurring:

1. each segment must be identified internally at any given instant of time as being owned either by the CPU cache or the I/O cache.

2. segments owned by the CPU cache can be accessed freely by the CPU, but the operating system must ensure that the I/O cache contains no valid data for that segment and that no I/O cacheable DMA accesses occur to the segment while it is owned by the CPU cache.

3. segments owned by the I/O cache can be freely accessed by I/O cacheable DMA devices, but the operating system must ensure that no CPU accesses to the segment occur while it is owned by the I/O cache.

Note that because of the hardware support for partial data consistency, the operating system is not required to ensure that the CPU cache contains no data from a segment owned by the I/O cache. On the contrary, instances of data from segments owned by the I/O Cache may appear valid in the CPU cache. The operating system is only required to avoid accessing that data while the segment is owned by the I/O Cache.
Eliminating the requirement to actually invalidate the data in the CPU cache when ownership of a segment is transferred to the I/O Cache tremendously increases the performance benefits of the I/O cache, and is one of the key concepts of the invention.
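A minimal sketch of the bookkeeping implied by these rules is given below in C. The segment size and the ownership array are assumptions used only for illustration; the patent requires only that the operating system track ownership internally.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* M is the larger of the two cache line sizes; the values are placeholders. */
    #define CPU_CACHE_LINE 16u
    #define IO_CACHE_LINE  16u
    #define SEGMENT_SIZE   (CPU_CACHE_LINE > IO_CACHE_LINE ? CPU_CACHE_LINE : IO_CACHE_LINE)

    enum seg_owner { OWNED_BY_CPU_CACHE, OWNED_BY_IO_CACHE };

    /* One ownership tag per M-byte segment of physical memory. */
    extern enum seg_owner segment_owner[];

    static size_t segment_index(uint32_t phys_addr)
    {
        return phys_addr / SEGMENT_SIZE;
    }

    /* CPU accesses are permitted only while the CPU cache owns the segment;
     * no CPU cache invalidation is needed when ownership moves to the I/O cache. */
    static bool cpu_access_allowed(uint32_t phys_addr)
    {
        return segment_owner[segment_index(phys_addr)] == OWNED_BY_CPU_CACHE;
    }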

Operating System Consistency Guidelines

The operating system fulfills the above requirements by following these guidelines:

1. All I/O buffers used by the operating system that are to be marked I/O cacheable at any time must be aligned such that the lowest base 2 log(M) bits of their physical address all equal 0, and sized such that the buffer is an integral multiple of M bytes in length. This ensures that any I/O cacheable buffer begins and ends on cache line boundaries in both the CPU cache and the I/O cache, and thus can be easily assigned as wholly owned by either the CPU cache or the I/O cache. (An illustrative alignment check follows the per-class notes below.)

For Class 1 devices, this is accomplished by always allocating full pages of physical memory for I/O buffers. Full pages of physical memory always satisfy the above criteria.

For Class 2 devices, this is accomplished by explicitly padding the I/O buffers with unused bytes until they meet the above criteria. When the static buffers for Class 2 devices are allocated at system startup, enough additional memory is allocated so that the buffers can be aligned and sized according to the constraints.

For Class 3 devices, none of the I/O buffers are ever marked I/O cacheable, so the above criteria do not apply.
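The alignment and sizing rule of guideline 1 amounts to the following checks, sketched in C with an assumed segment size M; the function names are hypothetical.

    #include <stdbool.h>
    #include <stdint.h>

    #define SEGMENT_SIZE 16u   /* M: placeholder for the larger of the two line sizes */

    /* An I/O cacheable buffer must start on an M-byte boundary and span a whole
     * number of M-byte segments, so it never shares a cache line with data
     * owned by the other cache. */
    static bool buffer_ok_for_io_cache(uint32_t phys_addr, uint32_t len)
    {
        return len != 0 && (phys_addr % SEGMENT_SIZE) == 0 && (len % SEGMENT_SIZE) == 0;
    }

    /* Class 2 style padding: round a requested length up to a multiple of M when
     * the static buffers are carved out at system startup. */
    static uint32_t pad_to_segment(uint32_t len)
    {
        return (len + SEGMENT_SIZE - 1u) / SEGMENT_SIZE * SEGMENT_SIZE;
    }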


2. When ownership of a segment is transferred from the CPU cache to the I/O cache, the operating system must inhibit subsequent CPU accesses to that segment.

For Class 1 and 2 devices, this is accomplished by using the internal state of the operating system to mark the I/O buffer that segment is contained in as owned by the I/O device, and preventing any processes from generating CPU accesses to that segment. This also requires that the ownership of all segments contained in a given I/O buffer remain consistent at all times; the entire buffer is either owned by the CPU cache or by the I/O cache.

For Class 3 devices, ownership is never transferred to the I/O cache, so this criterion does not apply.

3. Whenever ownership of a segment is transferred from the I/O cache to the CPU cache, the operating system must flush any data for that segment out of the I/O cache, and inhibit subsequent I/O cacheable accesses to that segment.

For Class 1 devices, this is accomplished by the operating system module that deallocates the I/O buffers after completion of the operation. This module uses the address and size of the I/O buffer to calculate which cache lines of the I/O cache may contain data from segments within the I/O buffer, and executes a flush operation to each of those lines. Next the module invalidates the I/O cacheable mapping to the buffer, so no subsequent accesses to any segment in the buffer will be I/O cacheable. If the I/O buffer is later used for DMA from a Class 1 device, it is reallocated and remapped I/O Cacheable by the operating system before ownership of the buffer is transferred back to the I/O Cache.
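The deallocation path for a Class 1 buffer can be sketched as the loop below. The helper routines are assumptions standing in for the Flush command write and the I/O Mapper update described elsewhere in this specification.

    #include <stdint.h>

    #define IOC_LINE_SIZE 16u   /* I/O Cache line size; placeholder value */

    extern uint32_t ioc_line_index_for(uint32_t addr);            /* line an address maps to */
    extern void ioc_flush_line(uint32_t line_index);              /* CPU write decoded as Flush */
    extern void unmap_io_cacheable(uint32_t addr, uint32_t len);  /* I/O Mapper update */

    /* Flush every I/O Cache line that may hold data from the buffer, then remove
     * the I/O cacheable mapping so later accesses bypass the I/O Cache until the
     * buffer is reallocated and remapped. */
    void class1_release_buffer(uint32_t addr, uint32_t len)
    {
        for (uint32_t a = addr; a < addr + len; a += IOC_LINE_SIZE)
            ioc_flush_line(ioc_line_index_for(a));
        unmap_io_cacheable(addr, len);
    }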

For Class 2 devices, this is accomplished by the device driver module for the specific device. The device driver code that processes the completion of an I/O operation must perform a flush operation to the appropriate I/O cache line, based on the direction of the completed operation. The device driver must then change its internal state to mark the I/O buffer as owned by the CPU cache. It is up to the device driver to control the device and prevent it from making further accesses to any segments in the I/O buffer.

For Class 3 devices, ownership of segments in the I/O buffers is never transferred to the I/O cache, so the above criteria do not apply.

Accordingly, in one aspect the present invention provides in a computer system comprising a central processing unit (CPU), a central cache, an input/output (I/O) cache, a memory, and a plurality of I/O devices, a method for maintaining data coherency between said central cache, said I/O cache, and said memory, said method comprising the steps of:
a) partitioning said memory into a plurality of memory segments;
b) assigning ownership for each of said memory segments to said central cache, each of said memory segments assigned to said central cache being eligible to be cached by said central cache only, but accessible by both read and write cycles of said CPU and of said I/O devices addressed to said memory;

c) classifying each of said I/O devices to one of a plurality of I/O device classes based on their logical I/O
buffer and memory access characteristics;
d) allocating and deallocating said memory segments to said logical I/O buffers of said I/O devices, conditionally reassigning ownership of said memory segments being allocated and deallocated to said I/O cache and back to said central cache before said allocation and after said deallocation respectively, based on said I/O devices' classified I/O device classes, said memory segments assigned to said I/O cache being eligible to be cached by said central and I/O caches, but accessible by said read and write cycles of said I/O devices only;
e) detecting read and write cycles of said CPU and said I/O devices;
f) returning data to be read for said detected read cycles of said CPU from selected ones of (i) said central cache and (ii) said memory, and of said I/O devices from selected ones of (i) said I/O cache, (ii) said central cache, and (iii) said memory, respectively; and g) storing data to be written for write cycles of said CPU into selected ones of (i) said central cache, (ii) said memory, and (iii) both said central cache and memory, and of said I/O devices into selected ones of (i) said I/O cache and (ii) memory respectively.


In a further aspect, the present invention provides a computer system comprising:
a) a memory comprising a plurality of memory segments;
b) a central cache coupled to said memory, a CPU, and a plurality of I/O devices, said central cache being assigned ownership of said memory segments, said central cache assigned memory segments being eligible to be cached by said central cache only, but accessible to read and write cycles of said CPU
and said I/O devices addressed to said memory;
c) an I/O cache coupled to said memory and said I/O
devices, said I/O cache being conditionally reassigned ownership of said memory segments based on classified I/O
device classes of said I/O devices when said memory segments are allocated to logical I/O buffers of said I/O devices, said I/O cache assigned memory segments being eligible to be cached by said central cache and said I/O cache, but accessible to read and write cycles of said I/O devices addressed to said memory only;
d) an operating system allocating and deallocating said memory segments to said logical I/O buffers of said I/O
devices, and conditionally reassigning ownership of said memory segments being allocated and deallocated to said I/O cache and back to said central cache before said allocation and after said deallocation respectively, based on said classified I/O
device classes of said I/O devices;

e) said central processing unit (CPU) performing read and write cycles addressed to said memory on behalf of process being executed by said CPU, data to be read for said read cycles of said CPU being returned from selected ones of (i) said central cache, and (ii) said memory, data to be written for said write cycles of said CPU being stored into selected ones of (i) said central cache, (ii) said memory, and (iii) both said central cache and memory;
and f) said plurality of input/output (I/O) devices performing said read and write cycles addressed to said memory, said I/O devices being classified into said I/O
device classes based on their logical I/O buffer and memory access characteristics, data to be read for said cycles of said I/O devices being returned from selected ones of (i) said I/O cache, (ii) said central cache, and (iii) said memory, data to be written for said write cycles of said I/O devices being stored into selected ones of (i) said I/O
cache, and (ii) said memory.
Brief Description of the Drawings

Figure 1 is a block diagram showing the basic system elements in a workstation with an I/O Cache.
Figure 2a is a detailed overall block diagram of major functional elements in a workstation or server which incorporates an I/O cache.
Figure 2b (shown on two sheets) is a detailed block diagram of the major functional elements of the I/O cache itself.
Figure 3a is a diagram showing the cache "hit" logic for the central cache.
Figure 3b is a diagram showing the cache "hit" logic for the I/O cache.
Figure 4a is a diagram showing the DVMA address space for a typical workstation or server.
Figure 4b is a diagram showing how the DVMA address space maps into the I/O cache for Class 1 and Class 2 devices.
Figure 5 is a flow diagram of a cache bus arbiter for the I/O cache which describes the arbitration of the I/O
buses in response to three request types: an Ethernet request, a VME request, and a Flush I/O Cache request from the CPU.
Figure 6 (shown for clarity as Figures 6A and 6B but hereinafter collectively referred to as Figure 6) is a flow diagram of a cache memory read controller for the I/O cache which describes the control of the memory read operation for I/O cacheable pages. Also shown in this diagram is how the data flow would be altered if hardware data consistency controls have been implemented.
Figure 7 (shown for clarity as Figures 7A and 7B but hereinafter collectively referred to as Figure 7) is a flow diagram of a write back controller for the I/O cache which describes the control to download data into I/O Cache Write Back buffers when required.
Figure 8 (shown for clarity as Figures 8A and 8B but hereinafter collectively referred to as Figure 8) is a flow diagram of a governor for the I/O cache which describes the control of the I/O Cache data busses and certain I/O Cache state machines control signals.
Figure 9 (shown for clarity as Figures 9A and 9B but hereinafter collectively referred to as Figure 9) is a flow diagram of DVMA cycle termination logic for the I/O cache which describes I/O Cache state machine control signals used to conclude a DVMA cycle to the I/O Cache.
Figure 10 (shown for clarity as Figures 10A and 10B
but hereinafter collectively referred to as Figure 10) is a flow diagram of tag update logic for the I/O cache which describes the controls for updating tags in the I/O Cache.
Figure 11a is a diagram showing operation of the memory data bus during a block read cycle I/O transfer.
Figure 11b is a diagram showing operation of the memory data bus during a write back cycle I/O transfer.
Figure 11c is a diagram showing operation of a memory data bus during a write to don't cache page cycle I/O transfer.
Figure 12 (shown for clarity as Figures 12A and 12B
but hereinafter collectively referred to as Figure 12) is a timing diagram for an I/O cache read miss with CPU cache miss.
Figure 13 (shown for clarity as Figures 13A and 13B
but hereinafter collectively referred to as Figure 13) is a timing diagram for an I/O cache read miss with CPU cache hit.
Figure 14 (shown for clarity as Figures 14A and 14B
but hereinafter collectively referred to as Figure 14) is a timing diagram for an I/O cache write miss with no write back.
Figure 15 (shown for clarity as Figures 15A and 15B
but hereinafter collectively referred to as Figure 15) is a timing diagram for an I/O cache write miss with write back.
Figure 16 (shown for clarity as Figures 16A and 16B
but hereinafter collectively referred to as Figure 16) is a timing diagram for an I/O cache flush of a non-modified I/O cache line.
Figure 17 (shown for clarity as Figures 17A and 17B
but hereinafter collectively referred to as Figure 17) is a timing diagram for an I/O cache flush of a modified I/O
cache line.

Detailed Description of the Invention

Figure 1 shows the most fundamental elements of a cache based workstation or server with an I/O Cache and Consistency Controls. Within this system, three devices, a Central Processing Unit (or CPU) together with two DVMA devices, access programs and data from a memory subsystem.
The two DVMA devices shown are an Ethernet network transceiver and controller, and a VMEbus Slave interface. (In typical configurations, the CPU also utilizes a VMEbus Master interface to directly access devices on the VMEbus.) A DVMA Arbiter is used to arbitrate access to the I/O Cache busses among the two DVMA devices and the CPU, while a Central Cache Arbiter arbitrates between the CPU and a DVMA request. With the Consistency Controls, Class 3 DVMA devices will issue requests to access data through the Central Cache, and Class 1 and Class 2 DVMA devices will access the Central Cache only for data consistency checks.

The I/O subsystem of Figure 1 also includes the I/O Cache itself and the necessary control logic to operate the I/O Cache. This may include logic to detect an I/O Cache "miss", to create a DVMA request to the Central Cache Arbiter to handle this miss, and to download a modified I/O Cache block, if present, into a Write Back buffer. The Consistency Controls require that the CPU be able to access the I/O Cache to flush DVMA data from the cache at the end of DVMA transfer sequences. The I/O subsystem may include a DVMA data path directly to main memory, operating in parallel with the CPU data path, together with the necessary controls to coordinate the two memory interfaces.

Figure 2a shows in more detail the functional blocks in a workstation or server in which the present invention is implemented. The CPU and memory subsystem includes a microprocessor or Central Processing Unit (CPU) with its address buffer and data transceiver, CPU Address and Data busses, the Central Cache Tag and Data Arrays, an Address Incrementer, a Central Cache Tag Address Comparator, a CPU Memory Address Multiplexer, a CPU Memory Address Register, CPU Control Logic, a CPU
Input Data Buffer (labeled CPU Bfr), a CPU Write Back Buffer (labeled CPU
Wrt Back Buffer), a CPU to DVMA Data Transceiver, a CPU to DVMA
Address Buffer, a Memory Bus, and Main Memory. The I/O subsystem includes a VMEbus Master and Slave interface with its address buffer and data transceiver, an Ethernet Network DVMA interface with its address buffer and data transceiver, a DVMA Address and Data bus, an I/O Mapper, an I/O Cache Miss Address Register, I/O Cache Control Logic to both address and control the I/O Cache and to control other DVMA logic, the I/O Cache Tag and Data Arrays, the I/O Cache Address Comparators, to compare both the high order address (Page Address) and the Block identity within a page, an I/O Cache Address to Data Buffer, an I/O Cache Input Data Buffer (labeled IO Bfr), and an I/O Cache Write Back Buffer (labeled IO Wrt Back Buffer). A number of components of the CPU and memory subsystem also play a role in DVMA operations.

Figure 2b shows the I/O Cache subsystem in more detail. There are two added functional blocks shown in this diagram: the Miss Address Function Driver; and the IOvalid, IOdirty, Write Back Function Update Logic. In addition, the usage of control signals set by the state machine flow charts (in later figures) is also shown.

Description of the Elements of a System with Consistency Controls: the CPU Cache Subsystem

The CPU issues bus cycles to address instructions and data in memory and possibly other system devices. The CPU address itself is a real address of (A) bits in size which uniquely identifies bytes of instructions or data. The CPU bus cycle may be characterized by one or more control fields to uniquely identify the bus cycle. In particular, a Read/Write indicator is required, as well as a "Type" field. This field identifies the memory address and data space as well as the access priority (i.e., "Supervisor" or "User" access priority) for the bus cycle. A CPU which may be utilized in a workstation or server having real addressing and capable of supporting a multi-user operating system is a Motorola MC68030. Note that the Motorola MC68030 has an integral Memory Management Unit, and consequently presents real (or physical) addresses to the CPU Address Bus.

The CPU is interconnected with other system devices and local device busses through the CPU Address and Data busses. The Address bus is a real address bus 32 bits in width. The CPU Data bus is also 32 bits in width.

The cache subsystem has meaning, insofar as the present invention is concerned, only in that DVMA data may reside in this cache. If this is the case, then DVMA Class 3 devices need the Central Cache and its controls to source data, and DVMA Class 1 and Class 2 devices need the Central Cache to provide data consistency for the I/O subsystem, through the application of the Consistency Controls.

Within the Central Cache, the Central Cache Data Array is organized as an array of 2**N blocks of data, each of which contains 2**M bytes. The 2**M bytes within each block are uniquely identified with the low order M
address bits. Each of the 2**N blocks is uniquely addressed as an array element by the next lowest N address bits.

The Central Cache Data Array described herein is a "direct mapped"
cache, or "one way set associative" cache. While this cache organization is used to illustrate the invention, it is not meant to restrict the scope of the invention, which may also be used in connection with multi-way set associative caches.

Another element required for the Central Cache operation is the Central Cache Tag Array, which has one tag array element for each block of data in the Central Cache Data Array. The tag array thus contains 2**N
elements, each of which has a Valid bit (V), a Modified bit (M), and a real address field (RA). The contents of the real address field, together with low order address bits used to address the cache tag and data arrays, uniquely identify the cache block within the total real address space of (A) bits. That is, the tag real address field must contain at least (A - (M+N)) bits.

Central Cache "hit" logic compares the real addresses for cache accesses to the contents of the Central Cache Tag address field. Within the access address, the lowest order M bits address bytes within a block; the - 203a~88 -next lowest N bits address a block within the cache; and the remaining (A -(M+N)) bits compare with the tag real address field, as part of the cache "hit" logic. Logic for the Central Cache "hit" is shown in figure 3a. Protectionchecking for the real address cache is not necessary, since this can be accomplished at the time of address translation from virtual to real addresses, which is done within the I/O Mapper for DVMA cycles.

The system described herein utilizes a real address Central Cache.
The use of a real address cache is not a requirement for the implementation of the present invention: a virtual address Central Cache, with the appropriate controls for protection checking and the detection of "alias" virtual addresses within the cache control logic, is another possible system configuration in which the present invention may be implemented.
("Alias" virtual addresses arise when two or more different virtual addresses map to the same real address. ) The Address Incrementer controls the word addressing for data within the Central Cache Data Array. In the prefered embodiment, cache blocks are 16 bytes, or 4 words, in length. The Address Incrementer controls the address generation of bits A(03 :02) for the data array.

The CPU Memory Address Multiplexer multiplexes the high order address bits TAGA(31:16) from the Central Cache with the corresponding address bits CPUA(31:16) from the CPU Address Bus. The TAGA bus typically specifies a portion of a write back address, while CPUA(31:16) specifies a portion of a cache miss address. The multiplexer sends the resulting address into the CPU Memory Address Register. This register receives its low order address bits from the CPU Address Bus, bits CPUA(15:00).

The CPU Memory Address Register serves as the address interface to the Memory Bus for all accesses to main memory. These accesses specifically include reading cache blocks, writing back modified cache blocks, and writing partially modified double words (selected bytes from 8 byte double words).

The CPU Control Logic uses the results of the Central Cache Hit/Miss indication, as well as other information from the CPU and other system devices, to control the operation of that portion of the system related to the CPU.

The CPU Input Data Buffer is a registered buffer for 64 bits of data from the Memory Bus. It multiplexes the data onto the CPU Data Bus in 32 bit increments. On cache miss operations, the word miss address bit A(2) specifies which word from the CPU Input Data Buffer is multiplexed onto the CPU Data Bus first.

The CPU Write Back Buffer is a buffering register for a full cache block which is loaded from the 32 bit CPU Data Bus and drives the 64 bit Memory Bus. It is used to buffer modified cache blocks as well as partially modified double words to be written to memory.

The CPU to DVMA Data Transceiver buffers data between the CPU
Data Bus and the DVMA Data Bus. As long as DVMA devices "hit" the I/O Cache, these two busses and their controlling logic normally operate independently (that is, the buffers are disabled).

The CPU to DVMA Address Buffer registers and buffers the address from the CPU when it accesses devices which are on the DVMA Data Bus.
These devices include the VMEbus Master interface and the I/O Cache Tags and Data, for both diagnostic operations and cache flushing.

The Memory Bus is a 64 bit multiplexed Address and Data bus, whose operation is described in Figure 13. The CPU Memory Address Register is the source for the memory address for both CPU and DVMA bus cycles, but the data buffers for CPU and DVMA operations are independent.
That is, data transfers for DVMA operations utilize the IOC Input Data Buffer and IOC Write Back Buffer, while CPU transfers use the CPU Input Data Buffer and the CPU Write Back Buffer.

Main Memory is accessed over the 64 bit Memory Bus. It is addressed as a 30 bit device, is implemented with Dynamic RAM parts, and includes registers and controls for such operations as initializing physical address ranges, checking and generating ECC codes, generating DRAM
Refresh, and reporting errors. These memory features and others are only necessary to the invention as they enable the implementation of a reliable main memory subsystem.

Description of the Elements of a System with Consistency Controls: the CPU Cache Subsystem Operation

Within the present implementation, the Central Cache and memory subsystem are utilized for Consistency Controls in two possible roles. First, for Class 1 and 2 DVMA devices, the Central Cache and memory subsystem are accessed on I/O Cache "misses" to check for data consistency between the Central Cache and the I/O Cache. Second, for Class 3 DVMA devices, the Central Cache and memory subsystem can be the source (or destination) of the DVMA data.

For this latter case, the I/O Cache Miss Address Register (described below) issues a physical address. This address is checked against the contents of the Central Cache Tag Array. The low order bits of the address from the I/O Cache Miss Address Register are used to address both the Central Cache Tag and Data arrays. In particular, bits A(15:04) address the Tag Array, and bits A(15:02) address a word in the Data Array. The high order bits A(31:16) of the I/O Cache Miss Address Register address are compared with the contents of the address field of the Tag Array with the Central Cache Tag Address Comparator. If the compare is a match and the tag entry is legitimate, as indicated by a "Valid" bit within the Tag Array entry, then the I/O Cache Miss Address Register access has resulted in a Central Cache "hit". If the I/O Cache Miss Address Register issued a read operation, the contents of the Central Cache Data Array addressed by A(15:02) are sent to the DVMA Data bus. If the I/O Cache Miss Address Register issued a write operation, data from the DVMA Data bus is written into the Central Cache Data Array entry addressed by A(15:02), with bytes modified as indicated by a "size" field set by the I/O Cache Miss Address Register. The corresponding Tag entry's "Dirty" bit is set to indicate that the cache line has been modified.

Should the address issued by the I/O Cache Miss Address Register not result in a Central Cache "hit" (i.e., result in a cache "miss"), and the DVMA page is marked cacheable for the Central Cache, a block of data from Main Memory is read through the CPU Input Data Buffer and placed into the Central Cache Data Array. On a DVMA read miss, the miss data from the memory interface is forwarded onto the DVMA Data bus. If the operation is a write, incoming data from Main Memory is merged with modified bytes of DVMA data from the DVMA Data bus. This merged data is written into the Central Cache, along with the rest of the cache block from memory, and the "Dirty" bit in the Central Cache Tag Array is set. For any miss, as long as the DVMA page is marked cacheable for the Central Cache, the address of the new data is written into the address field of the Central Cache Tags.

Should a cache miss require data from Main Memory to be written to a location in the Central Cache Data Array currently occupied by a valid cache block that had been previously modified, the block is first read out of the Central Cache Data Array into the CPU Write Back Buffer. The data is written into Main Memory from the CPU Write Back Buffer after the memory read required by the cache miss. If the Central Cache hit rate is high, then the Main Memory traffic generated by the CPU will be low, allowing high bandwidth for DVMA devices to access Main Memory.

For DVMA Class 1 and Class 2 devices, the Central Cache is used to provide data consistency between the Central Cache and the I/O Cache. If the I/O Cache Miss Address Register indicates a read DVMA bus cycle, then data is sourced from the Central Cache onto the DVMA Data bus if the DVMA address "hits" the Central Cache. If the DVMA read address "misses" the Central Cache (the typical case) then read data is sourced from Main Memory through DVMA subsystem data buffers, as explained below.

Similarly, if the I/O Cache Miss Address Register indicates a write DVMA bus cycle, then a "hit" in the Central Cache causes the Central Cache entry at the "hit" address to be invalidated. A "miss" in the Central Cache simply allows the DVMA operation to complete within the I/O subsystem.

Description of the Elements of a System with Consistency Controls: the I/O Cache Subsystem

Within the I/O subsystem, the VMEbus Master and Slave Interface includes drivers and receivers for the VMEbus address and data busses together with arbiter logic, interrupt handling logic, and such other controls as are needed to implement a VMEbus Master and Slave interface according to the VMEbus specification. The VMEbus Slave interface supports DVMA cycles from the system bus.

A particular element of this control is logic to recognize virtual VMEbus addresses within the CPU's VMEbus DVMA address space. This DVMA virtual address space is shown in Figure 4a. From the full 32 bit (4 gigabyte) VMEbus address space (VMEbus A32 option from the VMEbus specification), or from the 24 bit (16 Megabyte) VMEbus address space (option A24 from the VMEbus specification), the lowest (P) pages are recognized as the DVMA virtual address space for the system, where each page is of size (S) bytes. In the present implementation, the page size is (S = 8) kilobytes, and the DVMA virtual address space for VMEbus devices is (P = 128) pages total. Of these, the top 8 pages are reserved.

The Ethernet Network DVMA Interface includes an Ethernet control chip and supporting logic together with address and data registers and buffers to interconnect with the DVMA Address and Data busses.

The DVMA Address Bus is a virtual address bus which interconnects the VMEbus DVMA address interface, the Ethernet DVMA address interface, and the CPU to DVMA Address Buffer with the I/O Mapper, the block address inputs for the I/O Cache Tag and Data Arrays, the I/O Cache Address Comparator, the I/O Cache Miss Address Register, the I/O Cache Address to Data Buffer, and the I/O Cache Control Logic.

The DVMA Data Bus interconnects the VMEbus data interface, the Ethernet data interface, and the CPU to DVMA Data Transceiver with the I/O Cache Tag and Data Arrays, the I/O Cache Address to Data Buffer, the I/O Cache Address Comparator, the I/O Cache Input Data Buffer, the I/O Cache Write Back Buffer, and the I/O Cache Control Logic.

The I/O Mapper translates the virtual addresses from the DVMA devices into physical addresses while performing protection checking. The Mapper is effectively a simple MMU. It has an entry for each page of the DVMA address space. Each entry is L bits in width and is broken into an address bit field and a status bit field. The address field provides the translation from virtual to physical page number for the virtual address supplied at the input. The status field consists of several bits which indicate if the page is valid, what the write and access protections for the page are, and if the page is I/O Cacheable. The key status bit required is the I/O Cacheable bit. The particular content of the I/O Mapper may vary considerably. In fact, an I/O system using DMA, with physically mapped devices, can still use the same principles described in this invention. An I/O Cacheable bit would, however, still be required for such a system.
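An I/O Mapper entry can be pictured as the following C structure. The field widths and the translation helper are assumptions (8 Kbyte pages, a DVMA space starting at page zero, as in the present implementation); only the I/O Cacheable bit is essential to the consistency scheme.

    #include <stdbool.h>
    #include <stdint.h>

    #define PAGE_SHIFT 13u                 /* 8 Kbyte pages of the present implementation */

    struct io_map_entry {
        uint32_t phys_page;                /* physical page number */
        bool     valid;                    /* page mapping is valid */
        bool     writable;                 /* write protection */
        bool     io_cacheable;             /* true for Class 1 and 2 pages, false for Class 3 */
    };

    extern struct io_map_entry io_mapper[];   /* one entry per DVMA page */

    /* Hypothetical virtual-to-physical translation for a DVMA address. */
    static uint32_t io_translate(uint32_t dvma_vaddr)
    {
        const struct io_map_entry *e = &io_mapper[dvma_vaddr >> PAGE_SHIFT];
        return (e->phys_page << PAGE_SHIFT) | (dvma_vaddr & ((1u << PAGE_SHIFT) - 1u));
    }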

The I/O Cacheable bit in the I/O Mapper distinguishes those pages, and consequently those DVMA devices, which can use the I/O Cache from those which can not. DVMA Class 1 and Class 2 devices are mapped as I/O Cacheable, while DVMA Class 3 devices are mapped as non-I/O Cacheable.
DVMA transfers for these latter devices are handled as accesses to the Central Cache, ignoring the I/O Cache.

The Miss Address Function Driver drives the low order address bits, IORA(03:00), with new updated page statistics and control bits during a page mapper update.

The I/O Miss Address Register captures the physical DVMA address for bus cycles which are non-I/O Cacheable, as indicated in the I/O Mapper. The I/O Miss Address Register also captures the physical DVMA address for DVMA cycles from Class 1 devices which "miss" the I/O Cache.

The address source for low order bits within a page is the Virtual I/O Address Bus, while the I/O Mapper sources the physical page translation.

The I/O Cache Control Logic controls the arbitration of the CPU and DVMA devices for use of the I/O Address and Data busses; the indexing of both the I/O Mapper and the I/O Cache Tag and Data Arrays; the updates of the I/O Mapper from the CPU; updates of the I/O Cache Tag and Data Arrays from the CPU, from the I/O Mapper, and from Main Memory; the control of Flush commands from the CPU to the I/O Cache; and all other controls associated with independent DVMA operation of the I/O Cache.
This logic also interacts with the CPU Control Logic on all I/O Cache misses, for cache consistency; all CPU accesses to devices within the DVMA subsection; and on all DVMA accesses to non-I/O Cacheable pages, again for cache consistency. It finally provides such control related to the handling of DVMA cycles and CPU access of DVMA logic which is necessary for both testability and functionality but is not explicitly enumerated in this summary description.

The I/O Cache Tag and Data Arrays contain P cache tag entries and P cache data blocks. Each I/O Cache data block contains B bytes of data.
Generally, the I/O Cache Data Array block size is the same as the Central Cache block size. This is not a requirement but does simplify the system implementation. Each of the P I/O Cache Tag Array entries records the memory address and control information for each of the P blocks of data in the I/O Cache Data Array at the corresponding address. In general, the memory address in the I/O Cache Tag Array may be either a physical address or a virtual address, and this virtual address again may be either from the DVMA device address space or from the operating system address space for DVMA Devices. In the present implementation, the address field of the I/O Cache Tag Array contains a physical address.

How the I/O Cache Tag and Data Arrays must be addressed is not a requirement for the Consistency Controls. In the present implementation, the I/O Cache Tag and Data Arrays are addressed with VIOA(19:13) for VMEbus DVMA devices, which corresponds to the page index within the virtual VMEbus DVMA address space. Within this range, the top 8 pages are unused. Two of these are in turn assigned for use as Ethernet buffers:
one for Ethernet read data, at A(19:13) = 0x7f, and one for Ethernet write data, at A(19:13) = 0x77.

In general, the Tag Array must contain within its address field as many bits as are required to uniquely identify the DVMA block address. In the present implementation, the Tag Array contains a physical address.
This is not a requirement for the I/O Cache, but leads to design simplifications in the present implementation. The I/O Mapper maps both the 24 bit Ethernet DVMA address space and the 20 bit VMEbus address space into the physical address space, which is 32 bits in this implementation. Therefore the address field in the I/O Cache Tag Array in this implementation contains the physical address A(31:04). In an I/O
Cache Tag Array access in the present implementation, the Tag Array address field bits A(31:13) are compared with the physical address PIOA(31:13) from the I/O Mapper, while the bits A(12:04) from the

The VO Cache Tag Array may be accessed by the CPU for at least two distinct operations, a diagnostic read / write operation and an I/O Cache Flush command, as part of the Consistency Controls. CPU diagnostic cycles can write data into and read data paterns from the Tag Array as a P entry memory array. In the present implementation, the CPU address bits A( 10:04) index the Tag Array, both on diagnostic operat ions and for the Flush command.

The Flush command, which is a CPU write cycle in the present implementation, results in the I/O Cache Control Logic first reading the l/O
Cache Tag Array entry to see if it is valid and whether it is modified. lf the entry is both valid and modified, the controls download the corresponding block in the VO Cache Data Array; see the Data Array description, below. If the entry is not Yalid, no further action is taken to the tag array. If the entry is valid, then this I/O Cache Tag Array entry is invalidated.

The l/O Cache Tag Array is also updated as a part of normal DVMA
cycles. If the DVMA device access "hits" the I/O Cache, then no update of the Tag Array is required. If the DVMA device bus cycle is l/O Cacheable, has no protection violation (as indicated through the I/O Mapper) and "misses" the I/O Cache, then at the conclusion of the DVMA bus cycle, the entry in the Tag Array will be written with the new DVMA physical block - 2~30~88 address, the valid bit set to true, and the modified bit set if the DVMA
device is executing a write cycle. On DVMA write cycles which "miss" the l/O Cache, if the old Tag Array entry is marked valid and modified, then the physical block address from the Tag Array, A(3 1:04) in the present implementation, is written into the I/O Cache Miss Address Register. This address will be loaded into the CPU Memory Address Register to provide the write back address for the modified I/O Cache block.

The I/O Cache Data Array has P blocks, corresponding to the P Tag Array entries. Like the Tag Array, it may be accessed by the CPU for at least two distinct operations, a diagnostic read/write operation and an I/O Cache Flush command, as part of the Consistency Controls. CPU diagnostic cycles can write data into and read data patterns from the Data Array as a P entry memory array of B bytes. In the present implementation, the CPU address bits A(10:04) index the block of the Data Array, while A(3:2) identify a word within the block.

The I/O Cache Address Comparators provide the address comparison to determine if an I/O Cache "hit" has occurred. In the present implementation, the block identification bits from the Tag Array, A(12:04), must match the DVMA address in VIOA(12:04), and the physical page address from the Tag Array, A(31:13), must match the I/O Mapper physical address, PIOA(31:13).

The I/O Cache Address to Data Buffer provides the path to access the output of the I/O Mapper onto the I/O Data Bus. This buffer has two uses.
First, this path is used to update the physical address field in the I/O Cache Tag Array. Second, the path is used for diagnostic testing of the I/O Mapper by the CPU.

The IOvalid, IOdirty, Write Back Function Update Logic drives the low order address bits, IORA(03:00), with new updated tag values on I/O Cache updates. It also examines these bits during an I/O cache tag check to see if a write back of a modified I/O Cache block is required.

The I/O Cache Input Data Buffer provides the data path to the I/O Cache Data Array for DVMA data returned from Main Memory on DVMA read cycles which "miss" the I/O Cache. On such operations, the "miss" data for the DVMA device is simultaneously bypassed to the DVMA device while it is written into the I/O Cache Data Array. The buffer is also used as the data path for returning data from Main Memory to those Class 3 DVMA devices which are mapped to non-I/O Cacheable pages.

The I/O Cache Write Back Buffer provides the data path for writing modified data from the I/O Cache Data Array back into Main Memory. It also buffers the write back address from the I/O Cache Tag Array.

Description of the Elements of a System with Consistency Controls: the I/O Cache Subsystem Operation Summary

The operation of the components of the I/O Cache subsystem for a DVMA transfer from a VMEbus device is summarized below. The cycle begins with the VMEbus DVMA interface decoding the VMEbus address as being in the DVMA address space. Since VMEbus is the default device on the I/O address and data busses, the I/O Cache Tags and I/O Cache Mapper are accessed immediately, in parallel with the synchronizing of the VMEbus Address Strobe. The VMEbus address within the DVMA address space, VMEA(19:01), maps directly into the I/O address bus VIOA(19:01); VIOA(00) is set from VMEbus byte controls.

The VMEbus device uses the virtual DVMA page address VIOA(19:13) to index the I/O Cache Tag Array. The address field of the I/O Cache Tag Array contains a physical address. The I/O Cache Address Comparator compares the lower order block address bits contained in the address field of the tag entry selected, A(12:04), against the untranslated bits of the DVMA block address generated by the VMEbus device, in VIOA(12:04).

In parallel with the I/O Cache Tag Array access, the I/O Mapper is also accessed. The Mapper output, RIOA(31:13), is then compared with the Tag Array high order address field, TAGA(31:13), for the second tag address comparison. If the two comparisons described above match and the valid bit of the I/O Cache Tag Array entry is set, then an I/O Cache "hit" is indicated. If the VMEbus device is doing a read cycle, data from the I/O Cache Data Array entry is sent to the VMEbus device. If a write cycle is occurring, data from the VMEbus device is written into the I/O Cache Data Array entry. An I/O Cache "miss" results, in general, if either of the two address comparisons does not match, if the valid bit in the Tag Array is not set, or if the Dirty bit is not set on a bus cycle in which the DVMA device is doing a write cycle.

During a Class 1 or Class 2 DVMA read cycle which "misses" the I/O Cache, a block of data is written into the Data Cache. Depending on the results of the cache consistency check with the Central Cache, this data may originate from either of two sources: Main Memory, or the Central Cache.

The consistency check against the Central Cache begins with the I/O Cache Control Logic initiating a read request to the Central Cache through the I/O Cache Miss Address Register. If an address match is found in the Central Cache, then a block of data from the Central Cache is downloaded to the DVMA Data Bus through the CPU to DVMA Data Transceiver. If no address match is found for the consistency check, then data is transferred from Main Memory through the I/O Cache Data Input Buffer to the DVMA Data Bus. In both cases, the requested data is bypassed to the DVMA device while the block is written into the I/O Cache Data Array. The I/O Cache Tag Array entry is updated with the new DVMA address and marked valid.

Subsequent sequential reads by the DVMA device will result in an I/O Cache "hit" until all the data of the block addressed in the initial "miss" cycle has been read by the DVMA device. It is only the initial "miss" cycle that requires arbitration with the CPU and access to the Central Cache for a consistency check.

During a Class 1 or Class 2 DVMA write cycle which "misses" the I/O Cache, in the present implementation the I/O Cache Tag Array entry addressed by the DVMA device is first examined. If this block is valid and modified, then the address from the Tag Array and the block of data from the data array are downloaded into the I/O cache write back buffer; if the block is not modified, no download is necessary. The DVMA data from the current write cycle can now be written into the I/O Cache Data Array, and the I/O Cache Tag Array entry can be updated with the new physical address and marked valid and modified. A DVMA write back cycle, with the address and data provided through the I/O Cache Write Back Buffer, returns the former modified data from the I/O Cache to Main Memory. The completion of the DVMA write cycle does not depend on completing the Write Back cycle to Main Memory.
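The write miss sequence just described is sketched below in C. The structures are hypothetical stand-ins for the I/O Cache line and Write Back Buffer; the real sequence is a hardware state machine (see Figures 7, 14 and 15), and a DVMA write cycle may of course modify only part of the block.

    #include <stdbool.h>
    #include <stdint.h>
    #include <string.h>

    #define BLOCK_SIZE 16u

    struct ioc_line {
        bool     valid, modified;
        uint32_t block_addr;
        uint8_t  data[BLOCK_SIZE];
    };

    struct wb_buffer {
        bool     pending;                  /* background write back outstanding */
        uint32_t block_addr;
        uint8_t  data[BLOCK_SIZE];
    };

    /* Write miss: stage a valid and modified victim in the Write Back Buffer,
     * accept the new DVMA data, and retag the line valid and modified; the
     * memory write back then completes in the background. */
    void ioc_write_miss(struct ioc_line *line, struct wb_buffer *wb,
                        uint32_t new_addr, const uint8_t *dvma_data)
    {
        if (line->valid && line->modified) {
            wb->pending    = true;
            wb->block_addr = line->block_addr;
            memcpy(wb->data, line->data, BLOCK_SIZE);
        }
        memcpy(line->data, dvma_data, BLOCK_SIZE);
        line->block_addr = new_addr;
        line->valid      = true;
        line->modified   = true;
    }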

To check for Central Cache data consistency, the Central Cache Tag Array is accessed with the physical address of the DVMA write cycle. If an address match is found, then the corresponding block in the Central Cache is invalidated. If no match is found, then control is returned to the I/O Cache Controls.
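
For illustration, the write-side consistency action can be sketched as below. The Central Cache organization used here (direct-mapped, 256 lines of 16 bytes) is purely an assumption for the sketch; the text specifies only the behavior, namely invalidate on an address match and take no action otherwise.

#include <stdbool.h>
#include <stdint.h>

enum { CC_BLK = 16, CC_LINES = 256 };   /* assumed Central Cache geometry */

struct cc_line {                        /* one Central Cache line (tags only) */
    uint32_t tag;
    bool     valid;
};

/* Consistency action for a DVMA write, as described above: a matching block
 * in the Central Cache is invalidated; on no match, nothing happens and
 * control returns to the I/O Cache controls. */
static void central_cache_write_consistency(struct cc_line cc[CC_LINES],
                                            uint32_t paddr)
{
    struct cc_line *l = &cc[(paddr / CC_BLK) % CC_LINES];
    if (l->valid && l->tag == paddr / (CC_BLK * CC_LINES))
        l->valid = false;               /* stale copy invalidated */
}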

Subsequent sequential writes by the DVMA device may, in certain implementations, result in an I/O Cache "hit" until the entire block has been filled by the DVMA device. Then the next sequential write will result in an I/O Cache "miss". Assuming that the address accesses the same I/O Cache entry, the data in the I/O Cache block is dirty and cannot be overwritten by new data. This "miss" causes an I/O Cache Write Back cycle to be initiated by filling the Write Back Buffer before new data is written into the I/O Cache. It is only the initial "miss" cycle that requires arbitration with the CPU and access to the Central Cache for a consistency check.

On both read and write cycles which miss the cache and have no protection violation, the I/O Cache Control Logic updates the I/O Cache Tag Array entry addressed by the DVMA device. The real address field bits A(31:13) are updated with the translated physical address, from RIOA(31:13), transmitted onto the I/O Cache Data Bus through the I/O Cache Address to Data Buffer. The block address bits A(12:04) are updated from VIOA(12:04), similarly transmitted onto the I/O Cache Data Bus through the I/O Cache Address to Data Buffer.

Any data that might potentially remain in the I/O Cache Data Array at the end of a transfer sequence by a DVMA device must be removed. This data can be removed from the I/O Cache by means of an I/O Cache Flush command. The system software specifies the I/O Cache array index that is to be flushed from the I/O Cache. The CPU can indicate this command by several means, but the one used in this implementation is a write to a particular address range. This range is uniquely decoded as a Flush command by the I/O Cache Control Logic, and A(10:04) from the Flush command then specifies the block to be flushed. When the I/O Cache Control Logic executes a Flush command, the I/O Cache Tag Array is accessed, indexed from A(10:04). If the tag entry's Valid bit is set, the entry is flushed. The Valid bit is cleared, and if the Modified tag bit is set, then the data is read out of the I/O Cache Data Array into the I/O Cache Write Back Buffer in preparation for a background write back cycle to Main Memory.
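
A C sketch of the Flush command's effect on one I/O cache index is shown below. The 128-entry array follows from the 7-bit index A(10:04), but the data structures and names are otherwise illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

enum { BLK = 16, IOC_LINES = 128 };     /* 128 entries from the 7-bit index A(10:04) */

struct ioc_entry {
    uint32_t paddr;
    bool     valid, dirty;
    uint8_t  data[BLK];
};

struct wb_buffer {
    bool     full;
    uint32_t paddr;
    uint8_t  data[BLK];
};

/* Flush of one I/O cache index: clear the Valid bit and, if the entry was
 * modified, stage its address and data in the Write Back Buffer for a
 * background write back to Main Memory. An invalid entry needs no action. */
static void ioc_flush(struct ioc_entry ioc[IOC_LINES], struct wb_buffer *wb,
                      unsigned index /* A(10:04) of the Flush command */)
{
    struct ioc_entry *e = &ioc[index & (IOC_LINES - 1)];

    if (!e->valid)
        return;
    e->valid = false;
    if (e->dirty) {
        wb->paddr = e->paddr;
        memcpy(wb->data, e->data, BLK);
        wb->full  = true;
        e->dirty  = false;
    }
}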

Software Requirements for Consistency Controls

Following is a summary of system software and DVMA device requirements for implementing Consistency Controls.

First, as DVMA data is either read from memory or written to memory, for all Class 1, Class 2, and Class 3 DVMA devices, hardware controls ensure that the DVMA data is kept consistent with data in the Central Cache. For Class 1 and 2 DVMA devices, on DVMA read cycles, the DVMA data will be sourced from the Central Cache if a matching address is found. On DVMA write cycles, a stale block of data found in the Central Cache at the same physical address as a block of data being written by a DVMA device will be invalidated. For Class 3 DVMA devices, the Central Cache will be searched on both read and write cycles.

Second, at the conclusion of a DVMA transfer sequence, in order to ensure that all data from the I/O Cache is properly flushed from the I/O Cache into Main Memory (on a DVMA write) or that the DVMA address is invalidated from the I/O Cache (on a DVMA read), the operating system will issue Flush commands to the I/O Cache. The operating system must recognize when a DVMA transfer sequence has terminated as a prerequisite for issuing a Flush command.

The Flush command specifies the I/O Cache array index to be flushed, through the CPU address bits A(10:04). For VMEbus DVMA devices, this address corresponds to the device page address, A(19:13), in the VMEbus DVMA address space. For Ethernet, a Flush command to A(10:04) = 0x7F flushes the Ethernet read buffer, and a flush to A(10:04) = 0x77 flushes the Ethernet write buffer. If the I/O Cache block specified is valid and modified, an I/O Cache Write Back cycle is initiated for the block. If the block specified is valid, then the valid bit is cleared. If the block specified is invalid, no action is taken.
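
The index selection above can be expressed as a small helper, sketched below. The helper name is illustrative only, and the CPU write to the decoded flush address range that actually triggers the command is not modeled here.

#include <stdint.h>

/* Flush index selection as described above: a VMEbus DVMA buffer is flushed
 * at the index given by its DVMA page address bits A(19:13), while the
 * Ethernet read and write buffers sit at the fixed indices 0x7F and 0x77. */
enum { ENET_READ_FLUSH_INDEX = 0x7F, ENET_WRITE_FLUSH_INDEX = 0x77 };

static unsigned vme_flush_index(uint32_t dvma_addr)
{
    return (dvma_addr >> 13) & 0x7F;    /* A(19:13) becomes flush A(10:04) */
}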

The third requirement relates to the CPU programming sequence and the I/O Cache Flush command. Since the CPU, at the conclusion of a DVMA transfer sequence, must first issue an I/O Cache Flush command in order to guarantee that all DVMA data is properly transferred to or from Main Memory, it is the responsibility of the operating system to ensure that no CPU reference to DVMA data is made prior to the conclusion of the DVMA transfer sequence and prior to the issuance of the I/O Cache Flush command. The section above, "Operating System Consistency Guidelines", discusses the operating system solution to this requirement in detail. CPU and I/O Cache controls ensure that a Flush write back cycle to memory will complete prior to a subsequent reference by the CPU to the DVMA data.

I/O Cache Flowchart Operation

Figure 5 describes the arbitration of I/O Cache busses for fundamental I/O Cache bus cycles. There are three functional I/O Cache request types: a VMEbus request, an Ethernet request, and a CPU Flush request. In addition, the CPU can also issue at least three other requests which may require the use of I/O Cache busses, which are not fundamental to the functional operation of the I/O Cache. All of these requests behave, in terms of the arbiter operation, like the CPU Flush request which is shown.

The first additional CPU request is a VMEbus Master cycle, in which the CPU requires the use of I/O Cache data and address paths to access the VMEbus Master interface. The second CPU request is an I/O Cache Diagnostic cycle, in which the CPU tests the I/O Cache by writing and reading the I/O Cache tag and data arrays. This testing is not normally done as a part of the I/O Cache functional operation, but rather for diagnosis only. The third CPU request is an I/O Mapper update, in which the CPU reads and writes the contents of the I/O Mapper.

The following convention is used in the flowcharts to describe and reference certain control signals. If the signal is an "active high" signal, then its name has no "-" suffix; if it is an "active low" signal, then its name has a "-" suffix. If an active low signal is true, then it will be at a physical value of "0". When active low signals are tested in a decision block, a "0" decision block output corresponds to the condition that the active low signal is true, and a "1" decision block output corresponds to the condition that the active low signal is false.
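
The same convention can be stated in two lines of C (illustrative only; signals are modeled here simply as integer logic levels):

#include <stdbool.h>

/* An active-low signal, whose name carries a "-" suffix, is true when its
 * physical level is 0, so a "0" decision-block output means "asserted". */
static bool active_low_true(int physical_level)  { return physical_level == 0; }
static bool active_high_true(int physical_level) { return physical_level != 0; }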

An Ethernet DVMA request from the Ethernet DVMA interface is indicated by the ETHERNET signal. A VMEbus DVMA request is indicated by the VME signal. This signal would result from an active VMEbus address and data strobe, together with a VMEbus address which is decoded as being within the DVMA address space recognized by the system, as shown in Figure 4a. A CPU bus cycle which is decoded as an I/O Cache flush request is indicated by the FLUSH signal.

When the arbiter grants bus ownership to each of these three functional I/O Cache requests, the arbiter asserts a "GO" signal, which helps to control both the flow of control logic and also the enabling of various device data paths. The CPUGO- signal is asserted for the Flush operation; the VMEGO- signal for the VMEbus DVMA cycle; and the ENETGO- signal for the Ethernet DVMA cycle. The arbiter also asserts a DVMA address strobe, labeled DVMAAS-, to indicate a valid DVMA cycle to I/O cache state machines. Both the "GO" and DVMAAS- signals are deasserted at the conclusion of a bus cycle by a DVMA acknowledge signal, labeled DVMAACK-, which is asserted in Figure 9.

Figure 6 describes the memory read operation for an I/O cacheable DVMA cycle. The test condition CONSISTENCYCHECK is both a control timing point and a logic signal. With the consistency checking in the preferred embodiment, this signal will be asserted, following an I/O cache miss detection and establishing that the DVMA page is I/O cacheable, when the DVMA consistency request gains CPU bus mastership and has the DVMA address asserted on the CPU address bus, CPUA (Figure 2a).

The control for systems with hardware Central Cache data consistency checking is as follows. First, the test for READ/WRITE depends on whether the DVMA cycle is a read or write bus cycle. For both cases, a test is made to see if the DVMA address matches a tag address in the Central Cache, as indicated by the signal CPUCACHEHIT. For a write cycle which misses the Central Cache, there is no action taken (state B). If a write cycle hits the Central Cache, then the Central Cache tags are invalidated (state A).

For a read cycle which misses the Central Cache, a read request to main memory is initiated (state C). The bus protocol for a block read from memory is described in Figure 11a, entitled "Memory Data Bus: I/O Transfers". The I/O Cache Data Array will be continuously written (state D) until the first transfer of valid data is returned from main memory, as indicated by the control signal MEMORYDATAREADY. This signal is set in response to the Data Ack 0 and Data Ack 1 signals from main memory (Figure 11a). The DATAENBHI and DATAENBLO control signals enable, respectively, the high and low words of the I/O Input Buffer, as shown in Figure 2b. The IOWE- control signal sets the I/O Cache array write enable input, also shown in Figure 2b. Since the memory bus is two 32-bit words in width, but the I/O cache array is one word wide, two update cycles are required for each memory bus transfer. These are shown as the states D, F, H, and J. In states E, G, and I, the I/O Cache Data Array counter is incremented by one word, as indicated with the control signal IOCOUNT-.

For a read cycle which hits the Central Cache, a line of data is read from the Central Cache and passed to the I/O Cache for updating. This is shown in the loop control states K, L, M, and N. The Central Cache array output enable is indicated by the signal CPUCACHEOE-.

Figure 7 describes the I/O Cache write back control. To initiate this state machine, first DVMAAS- must be asserted. A test is first made to see whether this cycle is a FLUSH from the CPU. If so, a test is made for WRITEBACKNEEDED. This control signal will be set if the I/O Cache Tag Array indicates that the addressed block is both valid and modified (or "dirty"). If so, a test is made for WRITEBUFFERFULL. This control signal will be set if the I/O Cache Write Back Buffer still has valid contents from a previous I/O Cache cycle requiring data to be written back to main memory. If the buffer is full, the state machine loops in state CW until the Write Back Buffer is available.

Now the current I/O cache block can be downloaded into the Write Back Buffer. First, in state C, the address for the block is loaded into the I/O Write Back Buffer from the data transceivers between the IORA bus and the IOCDB bus (see Figure 2b). The data had been captured in the transceivers at the start of the cycle. (See also the description for Figure 10, below.) The Write Back Buffer holds both the write back address and the block of data. The control signal IOCOEON- indicates to the state machine in Figure 8 that the I/O cache output enable is to be set active to read data onto the IOCDB data bus; the actual enable signal, IOCOE-, is set in Figure 8. The control signals LI0-, RI0-, LI1-, and RI1- control the selection of the word buffers within the I/O Cache Write Back Buffer for both the data and the address. In state J, a signal PIOREQ- is asserted to indicate to the system memory controller that a write back cycle to main memory must be initiated. When all data is loaded, in state M, the state machine control goes to state X, where it waits until the controls complete this I/O Cache bus cycle.
If a FLUSH request does not have WRITEBACKNEEDED active, then the state machine branches directly to state X. If there is no FLUSH request, a test is made for VALIDPAGE. This control signal is set if the DVMA page in the I/O Mapper is marked as valid; this signal is shown as an I/O Mapper output in Figure 2b. If the page is invalid, control is transferred to state X.
If the page is valid, then a test is made for an I/O Cache hit, indicated by the control signal CACHEHIT. This signal is set by the hit logic for the I/O
cache, shown both in Figure 2b and Figure 3.

If there is an I/O cache hit, then a test is made for FIRSTWRITE. This control signal is set if the DVMA cycle is a write cycle but the I/O Cache Tag Array entry is marked as not modified (not "dirty"). If this cycle is a first write cycle, then the Central Cache must be checked for cache consistency, as required for support of the present invention. The request for a cache consistency check is made through the PIOREQ- control signal.
If this DVMA cycle is not a first write cycle, then control branches to state X to wait for the completion of the cycle.

If the DVMA cycle misses the I/O cache, as indicated by a deasserted CACHEHIT, then a test is made for the signal WRITEBACKORIOCREAD. This signal is set if either the current I/O Cache Tag Array entry is valid and modified (indicating write back is required) or if the DVMA bus cycle is a read cycle to the I/O Cache. This is established by checking the I/O Mapper to see if the DVMA cycle is I/O cacheable. If WRITEBACKORIOCREAD is not active, then the state machine transitions to state X, where PIOREQ- will be asserted on a read bus cycle to initiate the memory read cycle as well as a Central Cache consistency check.

If WRITEBACKORIOCREAD is active, then the state machine again tests for a WRITEBUFFERFULL condition. On a DVMA read cycle, this test ensures data consistency among DVMA bus cycles by guaranteeing that a FIFO ordering is observed in processing DVMA requests: the previous write back cycle must complete before the read miss cycle to main memory is initiated. When the WRITEBUFFERFULL condition is cleared, a further test of WRITEBACKNEEDED differentiates DVMA read miss cycles from cycles requiring a write back. If WRITEBACKNEEDED is inactive, then the DVMA address is loaded into the IOC Miss Address Register by enabling the address through the signal WBADRV-, referenced in Figure 2b. The signal PIOREQ- is asserted to initiate a block read memory bus cycle and a Central Cache consistency test.
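
This ordering rule can be summarized in a few lines of C (a sketch only: the flag stands in for the WRITEBUFFERFULL signal, and the request function stands in for asserting PIOREQ- to the memory controller).

#include <stdbool.h>
#include <stdint.h>

static volatile bool write_buffer_full;   /* stands in for WRITEBUFFERFULL */

/* Stands in for asserting PIOREQ- to start the block read from Main Memory
 * together with the Central Cache consistency check. */
static void issue_read_and_consistency_check(uint32_t paddr)
{
    (void)paddr;
}

/* Ordering rule described above: the read-miss request may be issued only
 * after any pending write back has drained, so DVMA cycles are processed in
 * FIFO order. */
static void ioc_read_miss_request(uint32_t miss_paddr)
{
    while (write_buffer_full)
        ;                                 /* loop in state B */
    issue_read_and_consistency_check(miss_paddr);
}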

On a DVMA write cycle which misses the I/O Cache, the control signal WRITEBACKORIOCREAD will be deasserted. Control will fall through to state C2, which will initiate a consistency test for the read miss address in the Central Cache through PIOREQ-.

In Figure 8, the basic controls for the state machines and the data bus are established. In the state IDLE, the I/O array output enable, IOCOE-, is set active. Since the default address for the I/O array is the tag array, and the default device priority is the VMEbus (Figure 5), the I/O Tag Array entry addressed by the VMEbus is the default bus content for the IOCDB bus.

If FLSH- is active, the state machine transitions to state C4. The TIMEOUT signal tests for a memory bus time out. If it occurs, state Z holds until the timeout signal is deasserted. If TIMEOUT is inactive, then IOCOEON- is tested. This signal is set in Figure 7. If active, it indicates that data must be downloaded into the I/O Cache Write Back Buffer. State C4B sets the output enable signal IOCOE- to the I/O Cache array (Figure 2b) and holds this active until the download is complete. When the download is complete, the signal ENABLEDEVICEXCVR tests whether the data bus transceivers for the device selected by the arbiter (Figure 5) are active. These transceivers will be activated when the I/O Cache tag check is complete (if required) and the downloading of write back data is complete (if required). When ENABLEDEVICEXCVR is active for the Flush cycle, the control passes immediately from state C4 to state F2, where the state machine waits for the conclusion of the cycle.

If FLSH- is not active, the control signal CACHEHIT is tested to see if an I/O Cache hit condition is detected (see Figures 2b and 3). If not, then IOCACHEDPAGE is tested. This signal from the I/O Mapper (Figure 2b) determines whether the DVMA page is I/O cacheable. If the page is cacheable, the signal IORW is tested. IORW is active on DVMA read cycles and inactive on DVMA write cycles. For DVMA read cycles to cacheable pages which miss the I/O cache, control passes to state C4. When TIMEOUT and IOCOEON- are inactive, ENABLEDEVICEXCVR is tested. When the DVMA device (Ethernet or VMEbus) is enabled, the control signal IOBSY- is tested. This signal is set by the main memory control logic and indicates that an I/O Cache bus cycle to main memory is active. It is reset at the conclusion of the memory bus cycle. The signal is tested twice: first, before state D4, and second, after state D4. The first test is to establish whether the I/O Cache bus cycle to main memory is active yet; when it is, state control passes to state C4. The second test establishes whether the memory bus cycle is complete; if so, control passes to state F2, awaiting the conclusion of the I/O Cache cycle.

I/O cacheable write DVMA requests which miss the I/O cache pass to state C2B. From here, the control signal IOCOEON-, set in Figure 7, determines whether there is valid and modified data in the I/O cache entry which must be downloaded. If so, control passes to state C4B until the download is complete. When complete, the signal ENABLEDEVICEXCVR is tested to see if the DVMA device can drive the IOCDB bus. If so, the control signal DEVOE- is set active in states C2, D2, and E2. This signal is gated with the device (VMEGO- and ENETGO-) in Figure 2b to enable data onto the IOCDB bus. (The I/O cache write enable is controlled in Figure 9.)

For DVMA read cycles which hit the I/O cache, control passes to state C. In states C, D, and E, the signal IOCOE- is asserted, which enables the I/O Cache Data Array onto the IOCDB bus (Figure 2b). For DVMA write cycles, a test is made for the IODIRTY signal to determine whether the current I/O cache entry is marked as valid and modified. If it is not marked as modified, then control passes to state C2B, where the operation is treated as an I/O cache write miss. If IODIRTY is active, then states C2, D2, and E2 drive DEVOE- to enable the DVMA device data onto the IOCDB bus.

For DVMA requests for which IOCACHEDPAGE is not active, control passes from IDLE to state C3. Read cycles remain in state C3 until the cycle completes. Write cycles remain in state D3, with DEVOE- active and the DVMA device driving the IOCDB bus, until the cycle completes.

Figure 9 describes the controls for cycle termination. From the IDLE
state, a test is made for VALIDPAGE. This control signal is set from I/O Mapper outputs. It is active only if the I/O Mapper indicates the DVMA page translation is valid; for write DVMA bus cycles, the page also must indicate that the DVMA device has write permission. If VALIDPAGE is inactive, an error condition is signalled (DVMAERR-) and the cycle is completed (asserting DVMAACK-).

For FLUSH requests with VALIDPAGE active, control passes to state G.
From state G, in general, tests are made for memory responses. For the Flush operation, these responses have no meaning, except for the test for IOSMBSY-. When this signal, set in Figure 8, is inactive, control passes to state F. Here DVMAACK- is asserted to conclude the arbiter state machine, Figure 5. The control signal FLUSHORNONIOCACHEDPAGE will be active for the Flush operation and for non-I/O cacheable DVMA cycles. This signal resets the state machine for Flush commands.

If FLUSH is inactive, CACHEHIT is tested to see if an I/O cache hit condition is detected (Figures 2b and 3). If a cache hit is detected, then IORW is tested. For I/O cache read hits (IORW active), control passes to state C, and then to states D and F, where DVMAACK- is asserted. For I/O cache write hits (IORW inactive), the IODIRTY signal is tested to see if the current I/O cache entry is marked as modified (dirty). If not, control passes to state F, where DVMAACK- is asserted. If IODIRTY is active, then control passes to state E, where both DVMAACK- and IOCWE- are asserted.
IOCWE- is the write enable signal to update the I/O Cache Data Array, Figure 2b.

If FLUSH is inactive and CACHEHIT is inactive, then IOCACHEDPAGE is tested. If the DVMA page is not I/O cacheable, then control passes to state G. If it is I/O cacheable, then IORW is tested. For cacheable read cycles, control also passes to state G. In state G, in general, tests are made for memory responses. A TIMEOUT response terminates the cycle with an error, DVMAERR-. A MEMORYACK would be signalled for all DVMA read cycles which go to memory. It is set on the first data strobe (Data Strobe 0, from Figure 11a). CPUCACHEHIT is asserted for non-I/O cacheable cycles which "hit" the Central Cache and when a read consistency check for I/O cacheable data "hits" the Central Cache. CPUCONTROLACK is asserted for non-I/O cacheable writes. When any of these conditions is true, control passes to state F, where DVMAACK- is asserted. From state F, for cacheable write cycles, control passes to state C; for other cycles, control passes to IDLE. From state C, cacheable write cycles update the I/O Cache Data Array in state H by asserting IOCWE-.

Figure 10 describes the state machine to control the I/O cache tag update. In the IDLE state, two enable control signals are asserted: MAPOE- and FUNCOE-. MAPOE- sets the output enable for the I/O Mapper, as shown in Figure 2b. FUNCOE- combines with the selected I/O device (the CPU, VMEbus, or Ethernet) to enable a virtual address onto the IORA bus. ENETGO- and FUNCOE- create EFUNCOE-, and VMEGO- and FUNCOE- create VFUNCOE-, referenced in Figure 2b in setting IORA(12:04). Since VMEGO- is asserted as the default device, the default address on the IORA bus is the VMEbus address.

At the start of every I/O cache bus cycle, the address for the current contents of the I/O Cache Tag Array is driven onto the IOCDB bus. This is because the address generator for the I/O cache is pointing to the Tag Array, and the I/O cache array is enabled on the IOCDB bus (Figure 8, state IDLE). If the current I/O cache contents are valid and modified, then the write back address is active on the IOCDB bus. At the start of each bus cycle, this address is clocked into the data transceivers between the IORA bus and the IOCDB bus (see Figure 2b). This address will subsequently be loaded into the I/O Cache Write Back Buffer for the write back cycle, as described below.

From the IDLE state, the signal FLUSHCYCLE indicates that a Flush command from the CPU has I/O arbiter priority (Figure 5). If FLUSHCYCLE is asserted, then WRITEBACKNEEDED is tested. This is determined by reading the I/O Cache Tag Array entry and examining the Valid and Modified bits. If both bits are set, then WRITEBACKNEEDED is set active. If WRITEBACKNEEDED is active, then WRITEBUFFERFULL is tested. This signal is active when the I/O Cache Write Back Buffer is full, pending a write back cycle to main memory. Testing WRITEBUFFERFULL prior to state B checks for a pending write back from a previous I/O cache cycle.
While waiting in state B for this cycle to complete, MAPOE- and FUNCOE- are continued, and IOLTCH-, IOTAGDRV-, and TAGWRT- are asserted. An active IOLTCH- latches and holds the I/O address; see Figure 2b for its use in capturing VMEbus and Ethernet addresses. IOTAGDRV- enables the I/O Mapper output to drive the IOCDB bus with the new physical address, for use in updating the I/O Cache Tag Array. An active TAGWRT- drives the Tag Array write enable; see Figure 2b for reference to both of these signals.

When WRITEBUFFERFULL is deasserted, indicating that the previous write back cycle is complete, the state machine control goes to state C1. In this state, the control signals MAPOE-, FUNCOE-, IOCLTCH-, IOTAGDRV-, and TAGWRT- are asserted, along with MISS-. The actual Tag Array update occurs at the end of this cycle. The control signal MISS- drives the Miss Address Function Generator, which generates the Valid and Modified signals, encoded in the nibble IOCDB(03:00), for both updating the I/O Cache Tag Array and for capturing the write back address in the Write Back buffers. In state C1, for the Flush cycle, MISS- drives the Valid and Modified bits inactive for the Tag Array update. The result of state C1 for the Flush cycle is that the I/O Cache Tag Array entry is now marked as invalid.
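
A sketch of the value the Miss Address Function Generator would drive onto IOCDB(03:00) in the cases described here and below. The specific bit positions of Valid and Modified within the nibble are not given in the text, so the positions used are assumptions made only for this illustration.

#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions within IOCDB(03:00); the text does not give them. */
#define IOCDB_VALID    0x1u
#define IOCDB_MODIFIED 0x2u

/* Value driven for the tag update in each case described in this section:
 * a Flush marks the entry invalid, a write miss or first write marks it
 * valid and modified, and a read miss marks it valid only. */
static uint8_t miss_function_nibble(bool flush, bool write_cycle)
{
    if (flush)
        return 0;
    return write_cycle ? (uint8_t)(IOCDB_VALID | IOCDB_MODIFIED)
                       : (uint8_t)IOCDB_VALID;
}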

Control now passes to state D. For the Flush cycle, the write back address captured in the transceivers between the IORA bus and the IOCDB bus is written into the Write Back Buffer. In state E, the control signal SEEDATA- is set. This signal informs the I/O Cache Control Logic in block 20a, Figure 2a, to change the array address bit A9 to point to the data array portion of the I/O cache array, rather than the tag array. Control passes to state F, where it is held until the cycle completes. This is indicated by the deassertion of IOSMBSY-, set in Figure 8, and the deassertion of DVMAAS-, set in Figure 5.

If the test for FLUSHCYCLE from the IDLE state is false, then CACHEHIT is tested. CACHEHIT is set if the DVMA cycle hits the I/O cache, as indicated in Figures 2b and 3b. If CACHEHIT is true, then the FIRSTWRITE control signal is tested. This signal will be active on DVMA write cycles which write to an I/O Cache entry which is currently marked as valid but not modified. If FIRSTWRITE is true, then the tags must be updated to set the Modified bit. This update is done in state C1, with the Miss Address Function Driver from Figure 2b driving the Valid and Modified bits into the tag array over IOCDB(03:00). If FIRSTWRITE is inactive for the DVMA cycle which hits the I/O cache, then no tag update is required. Control passes to state F, where it remains until the cycle is complete.

If CACHEHIT is not true, implying an I/O cache miss, then the control signal IOCACHEDPAGE is tested. This signal is read from the I/O Mapper, as shown in Figure 2b. If the page is not I/O cacheable, then control passes to state C2. In C2, MAPOE- is asserted so that the DVMA physical address can be written into the IOC Miss Address Register, shown in Figure 2b. Control then passes through states D and E to state F, where the state machine waits for the conclusion of the cycle.
If IOCACHEDPAGE is active, then the signal WRITEBACKNEEDED is tested. WRITEBACKNEEDED indicates the presence of a valid and modified I/O cache entry in the array. If this signal is active, the control flow is treated as in the Flush case. First, WRITEBUFFERFULL is tested to see if the Write Back Buffer is still busy from a previous write back cycle. If so, control loops in state B until the buffer is cleared. Then control passes to state C1. In state C1, the tags are updated, with the tag address driven from the I/O Mapper by asserting the buffer enable IOTAGDRV-, referenced in Figure 2b. The MISS- signal informs the Miss Address Function Driver to update the I/O cache Valid and Modified bits as appropriate, depending on whether the bus cycle is a read or write cycle.
Control then passes to state D, where the write back address is written into the Write Back Buffer, as in the case for the Flush cycle. Next control passes to states E and F to await the conclusion of the cycle.

If WRITEBACKNEEDED is inactive, then the control signal IORW is tested. This signal will be active for I/O cache read cycles. If the DVMA cycle is a read cycle, then in order to assure data consistency for I/O cache data read from memory, the state machine tests whether the signal WRITEBUFFERFULL is active. This assures that any pending write back cycle will complete before data for the DVMA read cycle is returned. After looping in state B until the write back cycle is complete, control passes to state C1. Here IOTAGDRV- enables the physical address from the I/O Mapper onto the IOCDB bus, while the MISS- input to the Miss Address Function Driver, Figure 2b, is asserted. The function driver sets the Valid bit in IOCDB(03:00). TAGWRT- updates the tag array entry. Control now passes through states D and E to state F, where the state machine loops until the end of the cycle.

If IORW is inactive, indicating a DVMA write cycle, then control passes directly to state C1 (since WRITEBACKNEEDED is inactive). Here the tags are updated, as above, except that the Miss Address Function Driver (Figure 2b) sets both the Valid and Modified bits active. The write cycle concludes just as the DVMA read cycle does, above.

Claims (16)

1. In a computer system comprising a central processing unit (CPU), a central cache, an input/output (I/O) cache, a memory, and a plurality of I/O devices, a method for maintaining data coherency between said central cache, said I/O
cache, and said memory, said method comprising the steps of:
a) partitioning said memory into a plurality of memory segments;
b) assigning ownership for each of said memory segments to said central cache, each of said memory segments assigned to said central cache being eligible to be cached by said central cache only, but accessible by both read and write cycles of said CPU and of said I/O devices addressed to said memory;
c) classifying each of said I/O devices to one of a plurality of I/O device classes based on their logical I/O
buffer and memory access characteristics;
d) allocating and deallocating said memory segments to said logical I/O buffers of said I/O devices, conditionally reassigning ownership of said memory segments being allocated and deallocated to said I/O cache and back to said central cache before said allocation and after said deallocation respectively, based on said I/O devices' classified I/O device classes, said memory segments assigned to said I/O cache being eligible to be cached by said central and I/O caches, but accessible by said read and write cycles of said I/O devices only;
e) detecting read and write cycles of said CPU and said I/O devices;
f) returning data to be read for said detected read cycles of said CPU from selected ones of (i) said central cache and (ii) said memory, and of said I/O devices from selected ones of (i) said I/O cache, (ii) said central cache, and (iii) said memory, respectively; and g) storing data to be written for write cycles of said CPU into selected ones of (i) said central cache, (ii) said memory, and (iii) both said central cache and memory, and of said I/O devices into selected ones of (i) said I/O cache and (ii) memory respectively.
2. The method as set forth in claim 1, wherein, said central cache comprises a plurality of cache lines, each cache line having a line size of n1 bytes;
said I/O cache comprises a plurality of I/O cache lines, each I/O cache line having a line size of n2 bytes; and each of said memory segments has a segment size of n3 bytes, where n3 equals the larger of n2 and n1 if n2 is unequal to n1 and n3 equals both n2 and n1 if n2 equals n1.
3. The method as set forth in claim 2, wherein, each of said logical I/O buffers comprising I/O cache assigned memory segments comprises at least one I/O cache assigned memory segment;
each of said logical I/O buffers comprising at least one I/O cache assigned memory segment has a physical address whose lowest log2(n3) bits are equal to zero and a buffer size that is in multiples of n3 bytes.
4. The method as set forth in claim 3, wherein, each of said logical I/O buffers comprising at least one I/O cache assigned memory segment has a buffer size of a memory page if it is dynamically allocated, said memory page having a page size that is in multiples of n3 bytes; and each of said logical I/O buffers comprising at least one I/O cache assigned memory segment is padded to said buffer size that is in multiples of n3 bytes if it is statically allocated.
5. The method as set forth in claim 1, wherein, said step d) further comprises the steps of:
inhibiting processes executed by said CPU from causing said CPU to perform read and write cycles addressed to said allocated memory segments, and writing all dirty data cached in I/O cache lines of said I/O cache for said memory segments being deallocated and reassigned, dirty data being normally cached in said I/O cache lines and written back into said allocated memory segments cached by said I/O cache lines when said I/O cache lines are reallocated to cache other allocated memory segments.
6. The method as set forth in claim 1, wherein, said data to be read for said read cycles of said I/O device are simultaneously stored into said I/O cache and bypassed to said I/O device if said data to be read is being returned from a selected one of (i) said central cache and (ii) memory, and the memory segments addressed by said read cycles of said I/O cache are eligible to be cached in said I/O cache.
7. The method as set forth in claim 1, wherein, said step f) further comprises the step of invalidating all previously valid data cached in said I/O cache for said allocated memory segments after said I/O devices completed their corresponding current sequence of successive read cycles if said allocated memory segments are assigned to said I/O
cache;
said step g) further comprises the step of invalidating any previously valid data cached in said central cache for said allocated memory and the step of writing all dirty data cached in said I/O cache for said allocated memory segments after said I/O devices completed their corresponding current sequence of successive write cycles, if said allocated memory segments are assigned to said I/O cache.
8. The method as set forth in claim 1, wherein, said central cache is a selected one of (i) a central write through cache and (ii) a central write back cache;
said I/O device classes comprise:
(i) a first I/O device class whose I/O devices dynamically allocate logical I/O buffers in said memory segments, one logical I/O buffer per I/O device, said allocated memory segments being reassigned to said I/O cache, and perform sequential accesses to their dynamically allocated logical I/O
buffers, (ii) a second I/O device class whose I/O devices statically allocate logical I/O buffers in said memory segments, a plurality of logical I/O buffers per I/O device, said allocated memory segments being reassigned to said I/O
cache, and perform interleaving sequential accesses to their statically allocated I/O buffers, and (iii) a third I/O device class whose I/O devices perform accesses to their logical I/O buffers comprising allocated memory segments assigned to said central cache;
said allocated memory segments are addressed by said read and write cycles of said I/O devices in a selected one of physical addressing and virtual addressing.
9. A computer system comprising:
a) a memory comprising a plurality of memory segments;

b) a central cache coupled to said memory, a CPU, and a plurality of I/O devices, said central cache being assigned ownership of said memory segments, said central cache assigned memory segments being eligible to be cached by said central cache only, but accessible to read and write cycles of said CPU
and said I/O devices addressed to said memory;
c) an I/O cache coupled to said memory and said I/O
devices, said I/O cache being conditionally reassigned ownership of said memory segments based on classified I/O
device classes of said I/O devices when said memory segments are allocated to logical I/O buffers of said I/O devices, said I/O cache assigned memory segments being eligible to be cached by said central cache and said I/O cache, but accessible to read and write cycles of said I/O devices addressed to said memory only;
d) an operating system allocating and deallocating said memory segments to said logical I/O buffers of said I/O
devices, and conditionally reassigning ownership of said memory segments being allocated and deallocated to said I/O cache and back to said central cache before said allocation and after said deallocation respectively, based on said classified I/O
device classes of said I/O devices;
e) said central processing unit (CPU) performing said read and write cycles addressed to said memory on behalf of process being executed by said CPU, data to be read for said read cycles of said CPU being returned from selected ones of (i) said central cache, and (ii) said memory, data to be written for said write cycles of said CPU being stored into selected ones of (i) said central cache, (ii) said memory, and (iii) both said central cache and memory; and f) said plurality of input/output (I/O) devices performing said read and write cycles addressed to said memory, said I/O devices being classified into said I/O device classes based on their logical I/O buffer and memory access characteristics, data to be read for said cycles of said I/O
devices being returned from selected ones of (i) said I/O
cache, (ii) said central cache, and (iii) said memory, data to be written for said write cycles of said I/O devices being stored into selected ones of (i) said I/O cache, and (iii) said memory.
10. The computer system as set forth in claim 9 wherein, said central cache comprises a plurality of cache lines, each cache line having a line size of n1 bytes;
said I/O cache comprises a plurality of I/O cache lines, each I/O cache line having a size of n2 bytes; and each of said memory segments has a segment size of n3 bytes, where n3 equals the larger of n2 and n1 if n2 is unequal to n1, and n3 equals both n2 and n1 if n2 equals n1.
11. The computer system as set forth in claim 10, wherein, each of said logical I/O buffers comprising I/O cache assigned memory segments comprises at least one I/O cache assigned memory segment;
each of said logical I/O buffers comprising at least one I/O cache assigned memory segment has a physical address whose lowest log2(n3) bits are equal to zero and a buffer size that is in multiples of n3 bytes.
12. The computer system as set forth in claim 11, wherein, each of said logical I/O buffers comprising at least one I/O cache assigned memory segment has a buffer size of a memory page if it is dynamically allocated, said memory page having a page size that is in multiples of n3 bytes; and each of said logical I/O buffers comprising at least one I/O cache assigned memory segment is padded to said buffer size that is in multiples of n3 bytes if it is statically allocated.
13. The computer system as set forth in claim 9, wherein, said operating system inhibits processes executed by said CPU
from causing said CPU to perform read and write cycles addressed to said allocated memory segments, and writes all dirty data cached in I/O cache lines of said I/O cache for said memory segments being deallocated and reassigned, dirty data being normally cached in said I/O cache lines and written back into said allocated memory segments cached by said I/O cache lines when said I/O cache lines are reallocated to cache other allocated memory segments.
14. The computer system as set forth in claim 9, wherein, said data to be read for said read cycles of said I/O device are simultaneously stored into said I/O cache and bypassed to said I/O device if said data to be read is being returned from a selected one of (i) said central cache and (ii) said memory, and the memory segments addressed by said read cycles of said I/O device are eligible to be cached in said I/O cache.
15. The computer system as set forth in claim 9, wherein, said I/O cache invalidates all previously valid data cached in itself for said allocated memory segments after said I/O devices completed their corresponding current sequence of successive read cycles if said allocated memory segments are assigned to said I/O cache;
said I/O cache further writes all dirty data cached in itself for said allocated memory segments after said I/O
devices completed their corresponding current sequence of successive write cycles if said allocated memory segments are assigned to said I/O cache; and said central cache invalidates any previously valid data cached in itself for said allocated memory segments.
16. The computer system as set forth in claim 9, wherein, said central cache is a selected one of (i) a central write through cache and (ii) a central write back cache;
said I/O device classes comprise:
(i) a first I/O device class whose I/O devices dynamically allocate logical I/O buffers in said memory segments, one logical I/O buffer per I/O device, said allocated memory segments being reassigned to said I/O cache, and perform sequential accesses to their dynamically allocated logical I/O
buffers, (ii) a second I/O device class whose I/O devices statically allocate logical I/O buffers in said memory segments, a plurality of logical I/O buffers per I/O device, said allocated memory segments being reassigned to said I/O
cache, and perform interleaving sequential accesses to their statically allocated logical I/O buffers, and (iii) a third I/O device class whose I/O devices perform accesses to their logical I/O buffers having allocated memory segments assigned to said central cache;
said allocated memory segments are addressed by said read cycles of said I/O devices in a selected one of physical addressing and virtual addressing.
CA 2030888 1990-04-12 1990-11-26 Cache data consistency mechanism for workstations and servers with an i/o cache Expired - Fee Related CA2030888C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US50893990A 1990-04-12 1990-04-12
US508,939 1995-07-28

Publications (2)

Publication Number Publication Date
CA2030888A1 CA2030888A1 (en) 1991-10-13
CA2030888C true CA2030888C (en) 1996-04-30

Family

ID=24024677

Family Applications (1)

Application Number Title Priority Date Filing Date
CA 2030888 Expired - Fee Related CA2030888C (en) 1990-04-12 1990-11-26 Cache data consistency mechanism for workstations and servers with an i/o cache

Country Status (1)

Country Link
CA (1) CA2030888C (en)

Also Published As

Publication number Publication date
CA2030888A1 (en) 1991-10-13

Similar Documents

Publication Publication Date Title
US5247648A (en) Maintaining data coherency between a central cache, an I/O cache and a memory
US5263142A (en) Input/output cache with mapped pages allocated for caching direct (virtual) memory access input/output data based on type of I/O devices
US6272579B1 (en) Microprocessor architecture capable of supporting multiple heterogeneous processors
US6622214B1 (en) System and method for maintaining memory coherency in a computer system having multiple system buses
US6633967B1 (en) Coherent translation look-aside buffer
US6049847A (en) System and method for maintaining memory coherency in a computer system having multiple system buses
US8180981B2 (en) Cache coherent support for flash in a memory hierarchy
US5659709A (en) Write-back and snoop write-back buffer to prevent deadlock and to enhance performance in an in-order protocol multiprocessing bus
US5161162A (en) Method and apparatus for system bus testability through loopback
US12038840B2 (en) Multi-level cache security
US5551000A (en) I/O cache with dual tag arrays
US6892283B2 (en) High speed memory cloner with extended cache coherency protocols and responses
US20040111576A1 (en) High speed memory cloning facility via a source/destination switching mechanism
US20040111575A1 (en) Dynamic data routing mechanism for a high speed memory cloner
JPH06318174A (en) Cache memory system and method for performing cache for subset of data stored in main memory
EP0681241A1 (en) Processor board having a second level writeback cache system and a third level writethrough cache system which stores exclusive state information for use in a multiprocessor computer system
US6898677B2 (en) Dynamic software accessibility to a microprocessor system with a high speed memory cloner
US7502917B2 (en) High speed memory cloning facility via a lockless multiprocessor mechanism
CA2030888C (en) Cache data consistency mechanism for workstations and servers with an i/o cache
JP3251903B2 (en) Method and computer system for burst transfer of processor data
US20040111581A1 (en) Imprecise cache line protection mechanism during a memory clone operation
CA2036372C (en) An input/output cache for caching direct (virtual) memory access data
US20240320154A1 (en) Multi-level cache security

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed