WO1990005954A2

WO1990005954A2 - Computer upgrading

Info

Publication number: WO1990005954A2
Application number: PCT/GB1989/001411
Authority: WO
Inventors: Bernard William Gill
Original assignee: Xitek Product Design Ltd.
Priority date: 1988-11-24
Filing date: 1989-11-24
Publication date: 1990-05-31
Also published as: WO1990005954A3

Abstract

A personal computer or the like can generally be upgraded by replacing the microprocessor chip on the main (mother) board by a "turbo board" mounted in an expansion socket and having a flying lead connection to the microprocessor chip socket. The turbo board comprises a more powerful microprocessor together with a clock oscillator, a memory replacing the main memory on the main board, and interfacing circuitry which converts the signals fed to the socket by the main board into a form compatible with the turbo board microprocessor and vice versa. In the present invention, the turbo board carries a microprocessor (34) of substantially the same type but of higher operating frequency. The signals passing between the main board and the turbo board are fully compatible; interfacing circuitry (41) adjusts their timing appropriately. A cache memory system (40) is interposed between the chip (34) and the interfacing circuitry (41), but the memory on the main board remains operative. Posted writes are used, and certain areas of the main memory are made uncacheable by the cache control unit (43).

Description

Computer Upgrading

The present invention relates to computers of the type commonly known as personal computers and similar such computers such as workstations, and is concerned more specifically with the upgrading of existing computers.

The standard design of personal computer consists generally of a monitor, a keyboard, a printer, and a processor unit. There can of course be other units as well, and variations are of course possible - for example, a single printer may be shared between several computers (which may then be referred to as workstations).

The present invention is concerned with the processor unit, which is normally housed in a rectangular box (which often has disc drives built into it as well). More specifically, the processor normally consists of a main board, carrying the microprocessor and a substantial amount of additional circuitry which is virtually indispensable for the essential functions of the computer, and having on it an expansion bus with several connectors into which additional circuit boards (expansion boards) can be plugged. These expansion boards generally provide functions which are not essential but are desired by the particular user, e.g. additional disc drives, an improved video drive to the monitor, modems, etc.

An outstanding feature of personal computer technology is continuing technical advance. This has shown itself in many aspects, but the two aspects of most importance here are the development of new microprocessor chips and the development of faster versions of a given chip. The type of personal computer being considered here uses a single microprocessor chip for central control of the computer.

The microprocessor chip used in many early personal computers was the 8088, which is an 8-bit chip running at 5 MHz and generally using 4 clock cycles per operation. The next chip to be introduced was the 8086, which is a 16-bit chip; it initially ran at 5 MHz, but was later developed to run at 8 MHz. Next came the 80286 chip, which is also a 16-bit chip running at 6 MHz (then developed to 8 MHz); however, this chip generally uses 2 clock cycles per operation. The main chip at present is the 80386, which is a 32-bit chip; the original version runs at 16 MHz, but 20 and 25 MHz versions have been developed, and it is possible that it will be speeded up still more.
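As a rough illustration of the figures quoted above, the number of operations per second can be estimated as the clock frequency divided by the number of clock cycles per operation. The following Python sketch is illustrative only: the function name is hypothetical, and the cycle counts are the "general" figures given in the text rather than exact timings for any particular instruction.

```python
def operations_per_second(clock_hz: float, cycles_per_operation: int) -> float:
    """Estimate throughput as clock frequency divided by cycles per operation."""
    return clock_hz / cycles_per_operation


# Figures quoted in the text (approximate, "generally" applicable).
rate_8088 = operations_per_second(5e6, 4)   # 5 MHz, 4 clock cycles per operation
rate_80286 = operations_per_second(8e6, 2)  # 8 MHz, 2 clock cycles per operation

print(rate_8088)   # 1250000.0
print(rate_80286)  # 4000000.0
```

On these figures, the gain from an 8088 to an 8 MHz 80286 comes from both the higher clock frequency and the halved cycle count per operation.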

There are therefore four main classes of personal computer, depending on which chip is used. As each new chip was introduced, so a new class of microcomputer was introduced with it, providing a substantial advance in computing power over the previous class. But within the more recent classes, there have been increases in computing power due to the increased speed of the corresponding types of chip.

The cost of personal computers, while small compared to large computer systems (mainframes and minis), is nevertheless substantial in comparison with, for example, calculators - the difference is a couple of orders of magnitude. Further, personal computers are generally designed to be expandable by the addition of expansion boards. A substantial activity has therefore developed in providing upgrades to existing personal computers. (In some cases, this is done to maintain the "image" of the owner rather than from any pressing need for technical improvement.)

Such upgrading is achieved by substituting a newer type of microprocessor for the existing one. This generally involves replacing an 8086 chip by an 80286 chip, or an 80286 chip by an 80386 chip. This technique is commonly termed "turbo-charging"; the replacement chip is mounted with some associated circuitry on a board termed a turbo board.

The computing power of a computer designed to use a given chip will, of course, in general be greater than that of a computer designed for the previous type of chip which has been upgraded by the addition of a turbo board. However, the upgrade will generally provide a good deal of the increased power of the new computer, at a cost of a modest fraction of the cost of the existing computer (while the cost of the new computer is likely to be substantially higher than that of the existing computer).

According to its main aspect the present invention provides a computer system of the type described, comprising a plurality of subsystems including a main memory and a clock oscillator, adapted to produce substantially all the microprocessor coupling signals required to couple to a microprocessor (a slow microprocessor) of a particular type and of operating frequency matching the clock oscillator, characterized by a fast microprocessor of substantially the same type as the slow microprocessor but of higher operating frequency, and interfacing circuitry interconnecting the fast microprocessor with the microprocessor coupling signals and comprising a fast clock oscillator of operating frequency matching that of the fast microprocessor and which converts said interfacing signals to signals of the same type but with timing matching that of the fast microprocessor.

The fast microprocessor and the interfacing circuitry may be mounted on the same board as said plurality of subsystems. This would normally occur if the computer system were designed as an improved version of an existing design.

However, the fast microprocessor and the interfacing circuitry may be mounted on a separate board from the board carrying said plurality of subsystems. The separate board would be a turbo board which would normally be purchased by the user to upgrade his existing computer. The turbo board would preferably be mounted in a socket carried on the main board and from which the slow microprocessor has been removed.

One of the advantages of this, compared to the standard technique of using a microprocessor chip on the turbo board which is of a more advanced type than the one which is being replaced on the main board, is that complete compatibility is automatically achieved, because every function which the original chip could perform can also be performed, in precisely the same way, by the new chip on the turbo board.

There is preferably a cache memory system, having a memory unit and a control unit, connected between the fast microprocessor and the main memory; preferably, data is written into the cache memory unit only from the fast microprocessor, and all data from the fast processor is written into the main memory.

The cache memory control unit preferably includes posted write logic means which determine whether or not the fast microprocessor can, after it has generated data, proceed with the next operation before that data has been written into the main memory; means which determine whether or not data can be written into the cache memory unit in dependence on its address; bus size means which, for predetermined address ranges, set the bus size at one or other of two possible values in dependence on the address; and cache reset logic means which cause resetting of the cache in response to predetermined conditions, which preferably include an address within predetermined ranges, input/output operations, and floppy disc unit operations.
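The address-dependent decisions of the cache control unit described above can be sketched as follows. This is a minimal illustration in Python, not the circuitry of the invention: the address ranges are hypothetical placeholders (the text gives no values at this point), the function names are invented for the example, and the bus size means is omitted.

```python
from typing import Sequence, Tuple

AddressRange = Tuple[int, int]  # (inclusive start, exclusive end)

# Hypothetical ranges, chosen only for illustration; not taken from the text.
UNCACHEABLE_RANGES: Sequence[AddressRange] = ((0xA0000, 0xC0000),)
RESET_RANGES: Sequence[AddressRange] = ((0xC0000, 0xD0000),)


def in_ranges(address: int, ranges: Sequence[AddressRange]) -> bool:
    """Return True if the address falls inside any of the given ranges."""
    return any(start <= address < end for start, end in ranges)


def is_cacheable(address: int) -> bool:
    """Data may be written into the cache only if its address is not excluded."""
    return not in_ranges(address, UNCACHEABLE_RANGES)


def should_reset_cache(address: int, is_io: bool, is_floppy: bool) -> bool:
    """Reset on an I/O operation, a floppy disc operation, or a listed address."""
    return is_io or is_floppy or in_ranges(address, RESET_RANGES)


print(is_cacheable(0x1000))                      # True
print(is_cacheable(0xB8000))                     # False
print(should_reset_cache(0x1000, False, True))   # True
print(should_reset_cache(0x1000, False, False))  # False
```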

To understand the nature of turbo boards more thoroughly, it is necessary to consider more closely the structure and organization of a personal computer processor. It will be convenient to begin by considering primarily processors using 80286 chips. This discussion will be with reference to the drawings, in which:

Fig. 1 shows diagrammatically a typical board layout of an 80286-based processor;

Fig. 2 shows diagrammatically a typical turbo board for the processor of Fig. 1;

Fig. 3 shows diagrammatically, and by way of example only, a typical turbo board in accordance with the present invention;

Fig. 3A shows diagrammatically an offset adaptor for the turbo board;

Fig. 4 is a block diagram of the turbo board;

Fig. 5 is a block diagram of the cache memory of the cache system of Fig. 4;

Fig. 6 is a block diagram of the logic circuitry of the control unit of the cache system;

Fig. 7 is a block diagram of the timing circuitry of the cache system;

Fig. 7A is a diagram showing a convention used in Fig. 7;

Fig. 8 is a block diagram of the buffer unit of the interface circuitry of the turbo board;

Fig. 9 is a block diagram of the timing unit of the interface circuitry; and

Figs. 10A and 10B are block diagrams of two modifications of the turbo board.

Fig. 1 shows a typical main board layout of an 80286-based processor. The board 10 is divided roughly into quarters. An 80286 microprocessor chip 11 is mounted towards the centre of the board. Beside this, in the bottom left-hand quarter, there are several connectors 12 into which memory boards are plugged. This provides the primary memory, which is connected to the microprocessor 11 by a memory bus (not shown). (In practice, the memory may well be mounted directly on the board, rather than coupled to it via connectors as described, for reasons discussed below.) The bottom right-hand quarter of the board has various logic units 13 mounted on it, providing a variety of functions such as a real time clock, possible battery back-up for that, timers, mouse control, DMA (direct memory access) control, and system logic. Above this, in the top right-hand quarter of the board, there is a set of expansion connectors, connected to the expansion bus (not shown). In the top left-hand quarter of the board, there is more logic 15, concerned primarily with interfacing between the microprocessor 11 and the expansion bus and with driving that bus.

The original design of the expansion bus was for processors using the 8088 chip, which is an 8-bit chip. The boards to be plugged into the expansion bus were therefore all originally designed as 8-bit boards, and many current boards are derivatives of these early designs. So the logic 15 also carries out any necessary data conversions from 32 bits to 16 bits or 8 bits. There has in fact been a fairly general adoption of a 16-bit expansion bus, but such a bus often has some short connectors on it for use with 8-bit address expansion boards.

As noted above, there have been frequent advances in microprocessor speed. The expansion bus is physically large and can have various expansion boards of a variety of types plugged into it. The speed at which this bus is operated is therefore generally 8 MHz (which is the same as that used in the faster 8086-based processors); serious problems would be encountered in trying to run that bus significantly faster. 80386 chips, however, run much faster than this, and one major function of the logic 13 is to provide the timing conversion between the chip and the expansion bus.

The memory 12 is coupled to the chip 11 by a separate memory bus, and is located physically close to the chip 11. This is to enable memory accesses to be as fast as possible, with the memory bus being run at the speed of the microprocessor. (In practice, it usually takes 3 clock cycles for a memory access.)

The organization of a typical main board layout of an 80386-based processor is broadly similar to that of the 80286-based processor as described, though of course the physical layout of the components on the board is not necessarily the same. The organization of an 8086-based processor board is also broadly similar to that just described. In particular, the memory on an 8086-based board is usually in the form of memory chips mounted directly on the board, because the size of memory addressable by a 16-bit chip is 640 Kbytes. (The chip contains a memory offset register which adds 4 bits to the normal 16 bits, giving a potential 1 Mbyte, part of which is normally reserved and not usable as RAM.) An 80286 chip can address up to 16 Mbytes, and an 80386 can address up to 4 Gbytes. In such cases, it is therefore often convenient to mount part of the memory directly on the main board, but to accommodate the remainder on memory boards or modules which plug into the main board.
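The memory sizes quoted above follow directly from the number of address bits: 16 + 4 = 20 bits for the 8086, 24 bits for the 80286, and 32 bits for the 80386. A short Python check of this arithmetic, with an illustrative function name, is:

```python
KBYTE = 1024
MBYTE = 1024 * KBYTE
GBYTE = 1024 * MBYTE


def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses reachable with the given address width."""
    return 2 ** address_bits


print(addressable_bytes(16 + 4) // MBYTE)  # 1   (8086: 16 bits plus 4 offset bits)
print(addressable_bytes(24) // MBYTE)      # 16  (80286)
print(addressable_bytes(32) // GBYTE)      # 4   (80386)
```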

To turbocharge a processor by known techniques, the existing microprocessor chip is replaced by a later and larger type of chip, as noted above. However, one cannot simply pull the existing chip out of its mounting and plug in the new one, for several reasons. For one thing, the different types of chip have different mountings. But, more important than that, appropriate interfacing has to be provided so that the new chip, which has a larger word length and probably runs at a higher speed, can still drive the existing board. The turbo board therefore carries not only the new chip but also the additional interfacing circuitry.

As with other expansion boards, the turbo board is fitted into one of the expansion slots 14 on the expansion bus. It draws its power supplies from that bus, and can interface directly with the logic signals on that bus. It also, of course, has to interface with the socket of the original microprocessor chip 11, and this is achieved by means of a flying lead from the turbo board, with a connector on its end which plugs into the existing microprocessor chip socket.

It is desirable to minimize the length of the connections to the new chip. The turbo board normally extends across substantially the full width of the board 10, and the microprocessor chip is therefore normally mounted towards its left-hand end. However, the length of the connections is obviously still increased considerably, and the existing chip is being replaced by one which runs substantially faster. The access time from the chip to the memory on the main board 10 (either directly mounted on the board or plugged into the connectors 12, as the case may be) would limit the speed of operation to well below the chip speed. A replacement memory is therefore also mounted on the turbo board, together with circuitry between the chip and the flying lead plug which intercepts memory access attempts and prevents them from reaching the original memory on the board 10, directing them instead to the new memory mounted on the turbo board. That original memory thus becomes inoperative, at any rate for its original purpose. (It may be possible to arrange for it to be used as an additional or extended memory, somewhat as if it were plugged into the expansion bus.)

An additional advantage of this arrangement is that the memory on the turbo board can be of a size matching the new microprocessor on that board, i.e. larger than the original memory on the main board 10.

The organization of such a turbo board is therefore generally as shown in Fig. 2. The board 20 is shown tilted backwards into the flat position; tilting it forwards into the vertical position orients it so that its edge connector 21 can be plugged into one of the expansion sockets 14 of the main board 10 (Fig. 1). The microprocessor chip 22 is mounted towards the centre of the board.

On the right-hand side, there is the primary memory 23 for the chip 22. On the left-hand side, there is interfacing circuitry 24 and the flying lead 25 which plugs into the socket on the main board 10 when the original chip 11 is removed. Logically, the chip 22 is coupled to the memory 23 on one side and to the interfacing circuitry 24 on the other. The interfacing circuitry 24 also includes a clock chip to drive the microprocessor 22 at the appropriate speed.

Thus the length of the connections from the microprocessor socket on the main board to the turbo board and the increased speed of the microprocessor on the turbo board have dictated the provision of a memory on the turbo board which effectively replaces the existing memory on the main board.

Turbocharging of the general character described has been widely available for upgrading 8088/8086 processors with 80286 and 80386 chips, and 80286-based processors with 80386 chips. Upgrading of 8088 processors to 8086 processors has not been common, because there is little speed advantage in changing an 8088 chip for an 8086 chip and the doubled word width is of little advantage in a processor designed for the original word width.

THE PRESENT SYSTEM

The preferred arrangement of the present invention provides a different approach to the design of a turbo board. Different aspects of this relate to the choice of the chip used on the turbo board to replace the existing chip, the organization of the turbo board itself, and the physical arrangement of the turbo board.

Physically, the present turbo board is designed to be mounted in the socket of the existing microprocessor. Organizationally, the existing memory is not replaced; instead, only a cache system is provided on the turbo board, and the existing memory remains operative.

The organization of the present turbo board is generally as shown in Fig. 3. The board 30 is shown oriented to be plugged into the socket on the main board 10 (Fig. 1) in which the chip 11 is normally mounted, and has a plug 31 formed on its under surface therefor. The microprocessor chip 32 is mounted on the opposite corner of the board 30. In the top left-hand corner of the board 30, there is interfacing circuitry 33, which matches the speed of the main board 10 with that of the microprocessor 32. In the lower right-hand corner of the board, there is a cache 34. There is also a flying lead to an edge connector 35 which can be plugged into one of the sockets 14 on the expansion bus. (Alternatively, a tail connector may be used which cooperates with an expansion board, either an existing one or a dummy one such as a piece of fibre glass.)

To enable the present turbo board, which is to be plugged directly into the existing socket, to clear the existing components of the circuitry 15 in the upper left-hand corner of the main board 10, it may be desirable for the board to be of flexible material, so that the parts carrying the new microprocessor 32, the circuitry 33, and the cache 34 can curve upwardly above the circuitry 15.

The details of the physical layout of the main board, and in particular of the location of the microprocessor, vary considerably among different models of personal computer. In particular, space may be restricted on one or two sides of the microprocessor. The turbo board layout shown in Fig. 3 has been chosen to fit the majority of personal computers, but there may be various designs which, for the mechanical reasons just noted, it will not fit. This problem can be alleviated by using an offset adaptor as shown in Fig. 3A. This adaptor consists of a plug 36 and a socket 37 mounted as shown on a base 38, with corresponding pins connected.

The pin configuration of an 80386 microprocessor is symmetric, so the adaptor can be plugged into the socket on the main board in any one of four different orientations. This raises the location of the turbo board somewhat higher above the main board, and also allows its location to be offset in any one of four different directions to avoid interference with adjacent components or other structural elements. The orientation of the 80386 microprocessor on the turbo board is of course fixed, and the orientation of the turbo board relative to the main board is likewise fixed. Such a microprocessor normally has one corner truncated, so that its orientation is immediately apparent. It is convenient for the corresponding corner of the turbo board to be similarly truncated, to give a direct indication of its required orientation relative to the socket on the main board.

If the existing microprocessor socket is such that it is not feasible to plug a turbo board into it, then the board can be constructed to plug into the expansion bus and be connected to the microprocessor socket by means of a flying lead. It would then be convenient to power the board from the expansion bus socket, and the board would also use the signal from the expansion bus which would normally be picked up by the flying lead, although all other signals would pass through the microprocessor socket. This arrangement, of course, has the disadvantage that the signal paths to the board are all lengthened (apart from the one from the expansion bus).

The interfacing circuitry 33 on this board performs a similar role to the circuitry 24 of the known turbo board 20. However, the memory organization is different. When the chip 32 tries to perform a memory access, the address is sent to the cache 34. If the address is not found to be in the cache 34, an access of the existing memory 12 is initiated.

If the address is found to be in the cache 34, the access takes typically 2 clock cycles of the chip 32, the chip 32 and the cache 34 running at matching speeds. However, if the address is not found in the cache 34, the access to the main memory 12 on the board 10 proceeds at the speed determined by the main memory 12, and therefore takes typically 5 cycles of the chip 32. The fact that some accesses are to the main memory 12 obviously slows down the operation somewhat. However, in practice most of the accesses are likely to be to addresses which are already resident in the cache 34. The access time of caches tends to be lower than that of main memories, even though caches are more complex than main memories, because they are generally much smaller than main memories. Thus with the present turbo board, most memory accesses are somewhat faster than with a conventional turbo board.
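The effect of the hit rate on the average access time can be estimated from the two typical figures given above (2 cycles for a cache hit, 5 cycles for a main memory access). The following Python sketch is illustrative only; the function name is hypothetical, and the hit rate of 0.75 used in the example is an assumed value, not one stated in the text.

```python
def average_access_cycles(hit_rate: float,
                          hit_cycles: int = 2,
                          miss_cycles: int = 5) -> float:
    """Weighted average of hit and miss access times, in clock cycles of the chip."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie between 0 and 1")
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# Assumed hit rate of 0.75, for illustration only.
print(average_access_cycles(0.75))  # 2.75
```

With every access hitting the cache the average falls to 2 cycles, and with every access missing it rises to 5 cycles, so the benefit depends on most accesses being to addresses already resident in the cache.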

With a cache, when data is written into memory, it is written into the cache (because the cache is normally designed so that it contains the most recently used memory locations). Arrangements must be made to ensure that there can be no conflict between the data so written into the cache and the previous data in the same address in the main memory. In the present system, the preferred way of achieving this is to write all data to the main memory as well as to the cache. It is also preferred to buffer such writes (i.e. use a posted write technique). This means that the turbo board processor can continue processing while the main memory write is proceeding. Thus most writes will take only 2 cycles. But, of course, if a write is closely followed by another write, then the chip 32 will have to hold the second write until the first write into the main memory has been completed.
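The timing behaviour of posted writes can be illustrated with a simple model. The Python sketch below makes several assumptions that are not stated in the text: the buffer holds a single pending write, a main memory write occupies 5 cycles (borrowed from the typical main memory access figure), and the class and method names are invented for the example.

```python
class PostedWriteBuffer:
    """Timing model of a one-entry posted-write buffer (a simplifying assumption)."""

    def __init__(self, cache_write_cycles: int = 2, memory_write_cycles: int = 5):
        self.cache_write_cycles = cache_write_cycles
        self.memory_write_cycles = memory_write_cycles
        # Cycle at which the pending main memory write will have completed.
        self.memory_busy_until = 0

    def write(self, now: int) -> int:
        """Issue a write at cycle `now`; return the cycle at which the processor continues."""
        # A write that closely follows another is held until the earlier
        # main memory write has been completed.
        start = max(now, self.memory_busy_until)
        # The main memory write then proceeds in the background.
        self.memory_busy_until = start + self.memory_write_cycles
        # The processor itself is occupied only for the cache write.
        return start + self.cache_write_cycles


buffer = PostedWriteBuffer()
print(buffer.write(0))   # 2   isolated write: 2 cycles
print(buffer.write(2))   # 7   held until cycle 5, then 2 cycles
print(buffer.write(20))  # 22  main memory idle again: 2 cycles
```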

Further, the cache operates over the full memory address range. A conventional turbo board is normally fitted with a 1 MByte memory, so while all memory accesses within that range will be relatively fast, any outside that range will involve accessing the memory on the main board 10 and will therefore be slow (unless a memory expansion upgrade has been fitted to the turbo board).

The present turbo board thus broadly matches the speed of a conventional turbo board. But the cost of the present turbo board is less than that of a conventional one, because the cost of a cache is less than that of a conventional memory; as with speed, the much smaller size more than outweighs the greater complexity.

The present turbo board is mounted in the socket of the existing microprocessor chip 11, to minimize lead lengths. Most of the signals it requires are those which are available on the leads to that socket. However, it also has a flying lead and connector 35 to the expansion bus, so that it can sense signals on that bus. In particular, it is desirable for the turbo board 30 to be able to determine when a direct memory access (DMA) involving a main memory write is being performed. The occurrence of this situation cannot in general be deduced from the states of the signals on the leads to the microprocessor 11 socket alone.

The flying lead can be omitted if the original board is constructed so that the signals which would be detected by the flying lead are made available on a spare pin on the microprocessor chip socket. This can obviously be done if the main board is designed to permit this type of upgrade. If the board is not so designed, however, it may still be possible to avoid the need for the flying lead. For example, as will become apparent, the present system is designed so that the cache memory is flushed (reset) when a floppy disc drive is operated. If the main system is such that the only direct memory accesses involving the main board involve the use of floppy disc drives, then the flying lead will not be required. More generally, it will not be required if the main board is such that the turbo board can be designed to detect any I/O address involving an operation on the main board, and flush the cache memory accordingly.

The turbo board 30 obviously requires more power than the chip 11, because the board 30 includes the interfacing circuitry 33 and the cache 34 as well as the replacement chip 32. But the power available from the socket for the chip 11 will normally be sufficient to drive the board 30. The power requirement of the board 30 is substantially less than that of a normal turbo board, because it has a cache instead of a main memory. Once again, the increased complexity of a cache is more than outweighed by its much smaller size. However, the flying lead can include power lines to power the board 30 from the expansion bus if desired.

The present turbo board has been described so far largely in terms of comparison with known turbo boards. Such known boards are used, as discussed above, to replace one type of chip by a later chip - for example, to replace an 80286 chip by an 80386 chip. The present techniques can of course be used in the same way, replacing an 80286 chip by an 80386 chip plus a cache. However, a major aspect of the present invention lies in replacing a given chip by a faster chip of the same type. Thus it can be used to replace an 80286 chip by a faster 80286 chip, or an 80386 chip by a faster 80386 chip. In such a case, the design of the turbo board is simplified, because although interfacing circuitry 33 is still required to match the speed of the new chip to the main board (which still runs at its original speed), no other conversion (such as of word lengths) is required.

Since the present techniques retain the existing memory on the main board 10, they cannot take advantage of any potentially greater memory space which may be usable with the chip 32 on the turbo board. However, this apparent disadvantage is not usually of practical significance. If the present techniques are used to upgrade an 80286-based processor with an 80286 chip, or an 80386-based processor with an 80386 chip, this possibility does not arise. If an 80286-based processor is being upgraded with an 80386 chip, the memory space available to the original 80286 chip is 16 Mbytes (the chip contains internal address offset registers giving an extra 8 bits for addresses), which is not generally unduly restrictive, and the processor design and programs will generally be tailored to a memory of that size anyway.

The present principles and techniques can obviously also be applied to upgrading personal computers using other types of processor chip, such as Motorola chips.

When the present techniques are used to replace a chip of a particular type with a faster chip of the same type, the resulting system runs at a speed which is largely determined by the new chip. However, it is only the new chip and the associated cache which run at the higher speed; the remainder of the system runs at the old, slower speed. The present principles can therefore be utilized by PC manufacturers who want to upgrade an existing PC design by replacing the existing microprocessor chip by a faster chip of the same size. By using the present principles, the existing chip is replaced by the faster chip, the associated cache, and interfacing circuitry which matches the new, faster chip to the existing, slow board. The advantage of this is that virtually none of the existing board design need be changed. In particular, the other chips of the design can be kept as the slower (and cheaper) versions.

DETAILED CIRCUIT DESCRIPTION

Fig. 4 is a block diagram of the circuitry of the present turbo board. This consists of the microprocessor 34, a cache system 40, interfacing circuitry 41, and an oscillator 46. The cache system 40 consists of two sub-units, a cache memory unit 42 and a cache control unit 43; the interfacing circuitry 41 also consists of two sub-units, a buffer unit 44 and a control unit 45. The turbo board is controlled essentially by the microprocessor 34. The oscillator 46 provides clock signals which clock the microprocessor and the cache system 40.

The units are connected to each other and to the microprocessor header 31 and the expansion bus connector 35 as shown. The connections can be broadly divided into information buses and control line buses (shown in Fig. 4 by double lines and single lines respectively). The information buses carry information which plays little or no part in controlling the operation of the cache system 40 and the interfacing circuitry 41, but is merely passed passively between the microprocessor 34 and the header 31. (Such information consists mainly of data and addresses.)

As noted above, when the microprocessor wants to perform a memory read, the address is sent to the cache system, and if the address is in the cache system, its contents are read out and passed to the microprocessor; the interfacing circuitry 41 is not involved. If the microprocessor wants to perform a memory write, the address and data are sent to the cache system, and written into it; again, the interfacing circuitry 41 is not directly involved. However, such writes are accompanied by writes to the main memory 12 on the board 10, in order to keep the contents of the cache system and the main memory consistent. Also, if the address on a read is not in the cache system, then the main memory has to be accessed.
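The read and write paths just described can be summarized in a small behavioural model. The Python sketch below is an illustration under stated assumptions, not the circuit of Fig. 4: the cache is modelled as an unbounded dictionary, a read miss is assumed to load the fetched data into the cache, and all names are hypothetical.

```python
class WriteThroughCache:
    """Behavioural model of a cache in front of a main memory (illustrative only)."""

    def __init__(self, main_memory: dict):
        self.main_memory = main_memory
        self.lines = {}                # cached address -> data (unbounded: an assumption)
        self.main_memory_accesses = 0  # accesses that go via the interfacing circuitry

    def read(self, address: int) -> int:
        if address in self.lines:
            # Hit: served by the cache; the main board is not involved.
            return self.lines[address]
        # Miss: the main memory has to be accessed.
        self.main_memory_accesses += 1
        data = self.main_memory[address]
        self.lines[address] = data     # assumption: the cache is filled on a read miss
        return data

    def write(self, address: int, data: int) -> None:
        # Every write goes to the cache and also to the main memory,
        # so the two can never hold conflicting data.
        self.lines[address] = data
        self.main_memory[address] = data
        self.main_memory_accesses += 1

    def flush(self) -> None:
        """Reset the cache, e.g. when main memory may have changed behind it."""
        self.lines.clear()


memory = {0x100: 7}
cache = WriteThroughCache(memory)
print(cache.read(0x100), cache.main_memory_accesses)  # 7 1  (miss)
print(cache.read(0x100), cache.main_memory_accesses)  # 7 1  (hit)
cache.write(0x200, 9)
print(memory[0x200], cache.main_memory_accesses)      # 9 2  (write-through)
```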

Hence for some reads and all writes, there has to be communication with the main board 10. This is achieved through the interfacing circuitry 41. The buffer unit 44 consists of buffer registers for holding information which is to be passed between the main board 10 and the microprocessor 34. The control unit 45 is concerned with making the turbo board emulate the microprocessor which has been removed from the socket or header 31, and particularly with signal retiming. It must transmit signals which can be correctly received by main board circuitry (which is clocked by the slow main board clock), and correctly receive signals which are synchronized with that main board clock.

Microprocessor chip pins

The microprocessor chip has a large number of pins (input and output lines), and the microprocessor socket on the base board has a corresponding set of socket holes. Substantially all of the pins have to be properly simulated at the microprocessor socket. These pins will therefore be reviewed.

There is a set of data lines D0-31 (which are input and output lines) and a set of address lines BE0-3 and A2-31 (which are output lines). There are also several lines concerned generally with control and signalling, as follows. Associated with the address and data lines there is an output line /ADS, which the microprocessor chip sets to 0 to indicate when it has placed a valid address on the address lines, and an input line /RDY, which is set to 0 to indicate when valid data has been placed on the data lines for feeding to the microprocessor chip.

There is a group of four lines which are concerned generally with control. These are: a read/write output line W/R, which is 1 for a write and 0 for a read; a memory/IO line M/IO, which is 1 for a memory operation and 0 for an input/output operation; a data/control line D/C, the state of which indicates, in conjunction with the other three lines, such matters as the servicing of interrupts; and a lock line /LOCK, which the microprocessor chip sets to 0 if two accesses are to be tied together so that no other device can make an intervening memory access (this is used, for example, for semaphore reading and writing).

There are also sundry further lines. An input line /BS16 is set to 0 if a memory or IO access is a 16-bit access; an input line HOLD is set to 1 if the microprocessor chip is being requested to hold (suspend) the current operation and relinquish the address and data bus outputs; an output line HLDA is set to 1 by the microprocessor chip to acknowledge that it has received a HOLD signal and has duly relinquished the buses; an input reset line RESET resets the microprocessor chip when set to 1; and a clock input line CLK2 is used to feed a double frequency clock signal to the microprocessor chip (which divides it down by 2 internally for its internal timing). (The removal of the RESET signal is synchronized with the CLK2 signal in order to determine the phase of the internal clock of the microprocessor.)

The lines associated with the microprocessor chip on the turbo board are referred to by the symbols above; the corresponding lines associated with the microprocessor socket on the base board will be distinguished by the addition of an initial "B". A line is generally given a name or symbol which to some extent indicates its function or effect. If there is no "/" in the symbol, that function or effect is produced when the line is at 1; if there is a "/" in the name, the function or effect corresponding to the symbol after the "/" is produced when the line is at 0. The "/" can be regarded as symbolizing the line itself, with the position of the symbol above or below the line indicating whether the corresponding function or effect is produced for a high (1) or low (0) signal. (Thus the W/R line is 1 for write, 0 for read.) The inverse of a signal can generally be indicated by placing a "/" before its symbol or by reversing the order of the functions - e.g. R/W is the inverse of W/R.

Memory organization

To understand the nature and purpose of the cache system, the abstract nature of the memory organization, addressing, and data buses and organization must first be understood.

The 80386 chip is a full 32-bit organized chip, which operates with 32-bit words and addresses. That is, it has a 32-bit data bus, lines D0-31, and a 32-bit address bus. However, there are certain elaborations of this simple basis. The 80386 chip can subdivide the 32-bit data words into 4 8-bit bytes (partly as a matter of general convenience, and partly to preserve some degree of compatibility with the older 80286 chip which operated with 16-bit words, and which in turn preserved some degree of compatibility with the still older 8088 chip which operated with 8-bit words or bytes). The 32-bit address structure is therefore modified so that the top 30 bits appear on 30 address lines A2-31, but there are four byte enable lines BE0-3 corresponding to the bottom 2 bits. Each of these byte enable lines can be 1 or 0 independently, so that any combination of the 4 bytes of a word can be selected (enabled). The word itself is of course selected by the top 30 address bits, so that word boundaries occur at every fourth byte.
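The address structure just described can be illustrated by a minimal Python sketch. The function name `split_address` and its parameters are hypothetical and are not part of the system described; the sketch follows the text in treating a byte enable value of 1 as "selected" and ignores the electrical polarity of the pins.

```python
def split_address(byte_address, size):
    """Return (word address carried on A2-31, [BE0, BE1, BE2, BE3]).

    `size` is the number of consecutive bytes accessed; the access is
    assumed not to cross a 32-bit word boundary.
    """
    offset = byte_address & 0b11           # bottom 2 bits: byte within the word
    if size < 1 or offset + size > 4:
        raise ValueError("access must lie within one 32-bit word")
    word_address = byte_address >> 2       # top 30 bits select the word
    byte_enables = [1 if offset <= i < offset + size else 0 for i in range(4)]
    return word_address, byte_enables
```

For example, a 2-byte access at byte address 1002 (hex) selects word 400 (hex) with the upper two byte enables set.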

The memory organization is also determined by conventions established with the 8088 chip. That chip could address 1 Mbyte of memory; although it operated with 8-bit words, its memory registers were effectively 20 bits long. (This length was made up of a 16-bit register added to a 16-bit register offset by 4 bits.)
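The formation of a 20-bit address from two 16-bit registers can be sketched as follows. The names `segment` and `offset` are chosen here for illustration only, and the result is masked to 20 bits on the assumption, taken from the statement above, that addresses were effectively 20 bits long.

```python
def physical_address(segment, offset):
    """Combine two 16-bit register values into a 20-bit address.

    One register is shifted up by 4 bits and the other is added to it.
    """
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("both registers are 16 bits wide")
    return ((segment << 4) + offset) & 0xFFFFF   # keep 20 bits (1 Mbyte)
```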

The bottom end of the memory was used for application programs and data; the top end was reserved for special purposes. More specifically, the bottom 640 kbytes of memory were for general use. The 128 kbytes of memory area with addresses from A0000 to BFFFF (hexadecimal) was reserved for video memory; that is, for the video display. Video display circuitry, separate from the 8086 microprocessor, repeatedly read that memory area in synchronism with the video screen scanning to generate the video display. That memory area was thus in principle a form of bit map of the screen display (at least for graphic display; for character display, a character look-up table was normally used). That left 256 kbytes of memory area above the video memory area, and that final 256 kbytes was divided into four portions, each of 64 kbytes; the first two portions were reserved for use with the expansion bus slots and could be used for video ROM, and the last two were reserved for copies of expansion ROM data and boot ROM data respectively.

For compatibility reasons, the same memory organization has been adopted for computers using 80286 and 80386 chips. These chips have, of course, much larger memory spaces, though not all of the memory space is physically present. (The 80386 chip, for example, can support a 4 Gbyte memory, which is far larger than any personal computer is likely to require for some time.) But whatever extra memory systems using 80286 and 80386 chips have, the bottom 1 Mbyte is organized as just described, with the memory area above 1 Mbyte (address 100000) being used as general memory, and treated as an extension of the main memory area up to 640 kbytes.
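The organization of the bottom 1 Mbyte described above can be restated as a small look-up, given here as a sketch only. The table `LOW_MEMORY_MAP`, the function `region_of`, and the region descriptions are illustrative names; the boundaries are those stated in the text.

```python
KB = 1024
MB = 1024 * KB

# (start, end exclusive, description) for the bottom 1 Mbyte
LOW_MEMORY_MAP = [
    (0,        640 * KB, "general memory"),
    (640 * KB, 768 * KB, "video memory"),
    (768 * KB, 896 * KB, "expansion bus slots / video ROM"),
    (896 * KB, 960 * KB, "copy of expansion ROM data"),
    (960 * KB, 1 * MB,   "copy of boot ROM data"),
]

def region_of(address):
    """Return a description of the region containing a byte address."""
    if address < 0:
        raise ValueError("address must be non-negative")
    if address >= MB:
        return "general memory (extension above 1 Mbyte)"
    for start, end, name in LOW_MEMORY_MAP:
        if start <= address < end:
            return name
```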

There are several restrictions on the use of the cache, as not all memory addresses can be treated similarly. These restrictions can be summarized with reference to the memory map below, showing the various regions of memory which are treated differently.

One memory function which has to be provided is that of reset. This is required for various purposes. The details of this are discussed later; for present purposes, it must be noted that this function requires an 8 kbyte stretch of the memory space to be reserved for resetting, and so uncacheable. As will be seen shortly, there are substantial stretches of memory space which are made uncacheable for other reasons, and the reset tag value is chosen to lie in one of these stretches.

In some existing systems, the memory area from 0 to 64 kbyte is copied by devices on the board onto a corresponding area located just above 1 Mbyte. In the present system, this area (0 to 64 kbyte) is therefore made uncacheable. Also, since the memory area from addresses 640 kbyte to 1 Mbyte is reserved for special system functions, this area is also made uncacheable. That is, the cache system is prevented from storing any address in these memory areas. If the microprocessor chip tries to access an address in these areas, the data is automatically transferred to or from the main memory on the base board.

Because of the history of the development of this type of computer, some parts of the memory area are treated as having a 16-bit word length; this is because some of the devices on the system board use 16-bit words. A signal (BB16) is generated on the main board indicating when a 16-bit word operation is in progress. However, this signal is not available on the turbo board in time for it to be used when a cache read hit or posted write occurs.

The present system generates a signal BS16 which indicates whether the word length for an access is 16 or 32 bits. For certain situations, this signal is synthesized, on the basis of assumptions about the way the memory is utilized. The memory areas 0 to 640 kbyte, and 1 to 16 Mbyte are for general memory, and all data in these areas is assumed to be 32-bit words. The video memory area, 640 to 768 kbyte, is likewise assumed to be 16-bit words. Also, all I/O operations are assumed to use 16-bit words. However, no assumptions can reliably be made regarding the remaining area of memory, 768 kbyte to 1 Mbyte; the signal BS16 cannot be synthesized for this area, but must be generated from BB16; the delay involved in obtaining BB16 from the main board is unavoidable in this case.

There is a further complication to the addressing scheme. There is a further line, M/IO, which can be regarded as an additional address line, the state of which determines whether the memory or an input/output device is to be accessed. (The IO address space is normally limited, e.g. to 64 k or less; the upper part of the address is used for IO device selection, with the lower part being used as a control signal to the IO device so selected.)

Posted writes

The interface logic includes buffer registers for both addresses and data.

When the microprocessor chip writes data, it is ready to proceed with its next operation as soon as that data and its address have been accepted by whatever circuitry the data has been passed to. The address and data are in fact passed to both the cache system and the buffer registers in the interface logic. Thus once the cache system has stored the data and the interface logic buffers have latched the data and its address, the microprocessor can proceed to its next operation. That will normally involve reading further data. If that data is not in the cache system, then it will have to be fetched from the main memory on the base board, and that will have to wait until the data previously stored in the interface circuitry has been accepted by the base board and written into the main memory. However, if the further data is in the cache system, it can be obtained by the microprocessor chip, which can thus continue with its operations in parallel with the writing of the data held in the interface circuitry into the main memory on the base board. This process is known as posted write, since the microprocessor chip in effect posts the information to be written, and then proceeds with its operations while leaving the postal system to complete the write.
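The behaviour of a posted write can be illustrated by a toy model, given as a sketch under simplifying assumptions rather than as a description of the actual circuitry. The class `PostedWriteModel` and its methods are hypothetical; the model assumes a single buffered write, represents the cache and main memory as dictionaries keyed by address, and updates the cache on a write only when the word is already present.

```python
from typing import Dict, Optional, Tuple


class PostedWriteModel:
    """Toy model: one pending write held between the cache and main memory."""

    def __init__(self, main_memory: Dict[int, int]) -> None:
        self.main_memory = main_memory
        self.cache: Dict[int, int] = {}
        self.pending: Optional[Tuple[int, int]] = None  # (address, data)

    def drain(self) -> None:
        """The main board accepts the buffered write, if there is one."""
        if self.pending is not None:
            address, data = self.pending
            self.main_memory[address] = data
            self.pending = None

    def write(self, address: int, data: int) -> None:
        self.drain()                    # only one write can be buffered
        if address in self.cache:       # keep an existing cache copy current
            self.cache[address] = data
        self.pending = (address, data)  # the processor may now continue

    def read(self, address: int) -> Tuple[int, bool]:
        """Return (data, waited); waited is True if the main board was used."""
        if address in self.cache:
            return self.cache[address], False   # hit: pending write untouched
        self.drain()                            # miss: earlier write goes first
        data = self.main_memory[address]
        self.cache[address] = data
        return data, True
```

In this model a read which hits the cache returns while the posted write is still pending, whereas a read which misses first forces the pending write through to the main memory.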

However, it is only possible to perform a posted write if the word size (16 bits or 32 bits) is known promptly. As noted above, the memory areas 0 to 640 kbyte and 1 to 16 Mbyte are assumed to be 32-bit words, while the video memory area, 640 to 768 kbyte, is assumed to be 16-bit words. For these areas, posted writes are therefore permissible. For the remaining areas, posted writes are not permissible, and the turbo board has to wait for a signal from the system board indicating the word size before it can proceed with the next operation.

Cache resetting

Since the use of a cache system means that some addresses are physically duplicated, it is essential to ensure that inconsistency cannot arise. Read and write memory accesses can both be initiated by both the main board 10 and the turbo board 20.

Considering the turbo board first, when this first reads an address, the address will not exist in the cache system. The word will therefore have to be fetched from the main memory, and copied into the cache system. When the turbo board writes a word, it is automatically written into the main memory 12; if the word already exists in the cache system, it is written into the cache system as well. (If the word does not already exist in the cache system, it is not written into it; this avoids the problem which would otherwise arise if not all bytes of the word are being written.)

Considering now accesses initiated by other devices on the main board 10, these will all be directed to the main memory 12. Since all writes by the turbo board are copied into the main memory, reads by the main board are necessarily consistent. However, a write by the main board could change an address in the main memory 12 while the same address exists, and continues to contain the previous version of that word, in the cache system. In the present system, this potential inconsistency is prevented by clearing the whole of the cache system every time there is a write into the main memory by the system board. This means that the operation of the turbo board is slowed down greatly until the cache system has been substantially reloaded. However, such main board writes are relatively rare.

There is another circumstance in which the cache is reset. It will be realized that when the cache has been reset, the operation of the microprocessor will be slowed down, because for some time all (or nearly all) data will have to be obtained from the main board memory. Certain IO devices, such as floppy disc units, are frequently designed in such a way that they will not operate correctly unless the speed of the microprocessor is below a certain value. The present turbo board can often exceed that limiting value, and its operation must therefore be slowed down when such IO devices are operating. This is achieved by resetting the cache.

It is also necessary to initialize the cache, so that when the system is starting up, the contents of the cache are not used until they are consistent with the contents of the main memory. This can be achieved in various ways, e.g. by copying a block of words of the main memory into the cache on initialization. In the present system, however, this is achieved by resetting the cache system in the same way as for main board writes.

Resetting is accomplished by forcing the cache addresses to predetermined values, and providing hardware trap circuitry which prevents the cache from being able to register a hit for those addresses. This means that there is a set of predetermined addresses in the memory organization which cannot be mapped (copied) into the cache system. The block chosen for this purpose starts at address 0. As noted above, this region is made not cacheable for other reasons, so its use for resetting does not affect the operation of the system.

As a practical matter, the tag chip in the cache memory can only be reset to a value of zero. In fact, in the present system the bottom 8 kbyte of the memory space has to be uncacheable for other reasons, so this reset tag value does not cause any problems. If however it is desirable for the bottom 8 kbyte of the memory space to be cacheable, then the reset 8 kbyte stretch can be relocated to some other part of the memory space. This can be achieved by providing inverters between the address bus and the cache memory for those bits in the tag part of the chosen address which are 1. The chosen address, which is at some other location in the "real" memory space than the bottom end, is thus seen by the cache memory as being at the bottom (zero address) end.

Cache memory unit

Fig. 5 shows the organization of the cache memory unit 42. This comprises four 8-bit by 8 k data chips 50-1 to 50-4 which are used for data storage, a further similar tag chip 51 which is used for partial address storage, and a comparator 52. (The comparator 52 may in practice be integrated with the tag chip 51.)

The manner in which data is stored is straightforward. The word size is 32 bits, i.e. 4 bytes each of 8 bits. The 32-bit data bus, lines D0-31, is split into 4 bytes, with each byte being fed to a corresponding data chip.

The data chips provide storage for 8 k words, and the bottom 13 bits of the word address on the address bus, lines A2-14, are used to select an address in these chips. Byte selection by the "downwards extension" of the address, on the byte select lines BE0-3, is required for writing. These lines are therefore fed to the write control inputs of these chips via a set of 4 AND gates 53, which are enabled by the line W/R, which is 1 for writing. For reading, all four bytes of a word are read from the cache, with the microprocessor 34 selecting whichever bytes it requires. (A cache write will of course occur on a read by the microprocessor if there is a cache miss, i.e. if the word is cacheable and is not already in the cache.)

The main memory is therefore mapped repeatedly, in 8 k blocks, into the data chips. That is, each location in the data chips has a large number of addresses, spaced 8 k apart throughout the main memory, mapped into it. Which particular one of this series of words in the main memory is in fact duplicated in the data chips is determined by the tag chip 51, which has an 8-bit tag storage location for each word location in the data chips. The top 8 bits of the address, lines A15-22, are used as the tag.

When an access is made (or attempted) to the cache memory unit, the required address is split into the bottom 13 bits and the top 8 bits (the tag). For a write, the word is always written into the cache memory unit, erasing whatever was there before; for this, the tag is written into the tag chip 51 in parallel with the writing of the word itself into the data chips 50. When a read is attempted from the cache memory unit, the address of the word to be read is sent to the cache memory unit, and the bottom 13 bits of this address are used to read the stored tag from the tag chip 51. The stored tag is that for the word already stored in the cache memory unit. This stored tag is compared with the tag part of the required address by the comparator 52. If these two parts are the same, then the line HIT goes to 1, indicating that the required word is in the cache system (i.e. a hit). The word itself will have been read out at the same time from the data chips 50 onto the data lines D0-31.

If there is no hit, then the word read out from the cache is discarded and the required word must be fetched from the main memory on the base board 10. If its address is cacheable, it will then be written into the cache; that is, the word itself will be written into the data chips 50 and its tag into the tag chip 51.
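The look-up just described corresponds to what is generally called a direct-mapped cache, and can be sketched as follows. The class `CacheMemoryUnit` and the helper `split_index_tag` are illustrative names only; byte enables, cacheability checks and timing are deliberately left out, and a whole 32-bit word is assumed to be stored at each location.

```python
INDEX_BITS = 13   # lines A2-14: one of 8 k word locations
TAG_BITS = 8      # lines A15-22: held in the tag chip


def split_index_tag(byte_address):
    """Split a byte address into (index, tag) as used by the cache."""
    word = byte_address >> 2                        # drop the byte-select bits
    index = word & ((1 << INDEX_BITS) - 1)
    tag = (word >> INDEX_BITS) & ((1 << TAG_BITS) - 1)
    return index, tag


class CacheMemoryUnit:
    def __init__(self):
        self.data = [0] * (1 << INDEX_BITS)   # models data chips 50-1 to 50-4
        self.tags = [0] * (1 << INDEX_BITS)   # models tag chip 51

    def read(self, byte_address):
        """Return (hit, word); the word is meaningful only on a hit."""
        index, tag = split_index_tag(byte_address)
        hit = self.tags[index] == tag         # role of comparator 52
        return hit, self.data[index]

    def fill(self, byte_address, word):
        """Write a word and its tag, replacing whatever was there before."""
        index, tag = split_index_tag(byte_address)
        self.data[index] = word
        self.tags[index] = tag
```

Two byte addresses which differ by 8000 (hex), that is by 8 k words, share the same location in this model, so filling one displaces the other. It may also be noted that with all stored tags at zero, any address whose tag field is zero would register a hit; this is why the reset block has to be excluded from cacheing.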

To reset the cache memory unit, the tag chip 51 must have all stored tags set to zero, which is the reset tag value chosen (as discussed above). This value can be forced into all locations in the chip 51 by a reset signal applied to a resetting input of the chip. The chip is fed by a reset line HLDA which, when set to 1, clears the contents of the chip; i.e. resets all the stored tags to 00000000. (The cache control unit 43 contains circuitry preventing the accessing of this memory region as cache, as will be described shortly.) The signal HLDA is in fact produced by the microprocessor, as will be described with reference to Fig. 7.
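The relationship between the zero reset tag and the block of addresses which must not be cached, together with the optional relocation of that block by inverters on the tag bits, can be sketched as below. The function names and the `invert_mask` parameter are hypothetical; the mask has a 1 for each tag bit assumed to be fitted with an inverter, and address bits above A22 are ignored because such addresses are not cached in any case.

```python
TAG_SHIFT = 15    # the tag is taken from address lines A15-22
TAG_MASK = 0xFF


def tag_seen_by_chip(byte_address, invert_mask=0):
    """Tag presented to the tag chip, after any inverters on the tag bits."""
    return ((byte_address >> TAG_SHIFT) & TAG_MASK) ^ invert_mask


def is_reset_block(byte_address, invert_mask=0):
    """True if the address would falsely hit once all stored tags are zero."""
    return tag_seen_by_chip(byte_address, invert_mask) == 0
```

With no inverters the reset block lies at the bottom of the memory space. With inverters on the bits which are 1 in the tag of a chosen address, that address is seen by the tag chip as having a zero tag, and the bottom of the memory space no longer is.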

Cache control unit

The cache control unit 43 can be broadly divided into combinatorial logic circuitry and timing (flip-flop) circuitry. Fig. 6 shows the logic circuitry, and Fig. 7 shows the timing circuitry.

Cache logic circuitry

The address on the address bus is decoded to generate a signal CACHE which is 1 for addresses which can be cached. The remaining addresses are uncacheable; these include the address block which is used for resetting. More generally, the address range from 640 kbyte to 1 Mbyte is made uncacheable, because that range is not normally used for general storage. Also, the maximum address which can be stored in the cache memory unit 42 is 800000 (hex), 8 Mbyte, because the cache memory unit cannot cope with address bits above A22. Hence memory above this is also made uncacheable. It is assumed that the physical memory will not extend beyond 16 Mbyte at the most; if it does extend to 16 Mbyte, it is unlikely that the top 8 Mbyte will be used much.

The signal CACHE is produced by an AND gate 60, so all inputs to that gate must be 1 for CACHE to be 1. One input is the signal M/IO (which is 1 for memory operations, 0 for IO operations); a second signal is /A31 (which can be 0 for coprocessor operations); and a third signal is /A23 (which is 0 for the memory range from 8 Mbyte to 16 Mbyte). The fourth signal is from an OR gate 61, which is fed with A20, A21, and A22, so its output is 1 for the memory range from 1 Mbyte upwards. Gate 61 is also fed with signal /A19, which is 1 for the memory range from 0 to 512 kbyte. The final input to gate 61 is from an AND gate 62, which is fed with the signals /A17, /A18, and A19; in the only memory range not yet considered, 512 kbyte to 1 Mbyte, the range from 512 to 640 kbyte has A19 at 1 and both A17 and A18 at 0, whereas the range from 640 kbyte to 1 Mbyte has A19 at 1 and at least one of A17 and A18 at 1. Thus CACHE is 1 for the memory range 0 to 512 kbyte (/A19 = 1, gate 61), the range 512 to 640 kbyte (gate 62), and the range 1 to 8 Mbyte (gate 61), provided of course that the range is not above 8 Mbyte, and that a memory operation rather than an IO operation is required.
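The decoding performed by gates 60 to 62 can be written out as a Boolean function, as a sketch for checking the address ranges. The function name `cache_signal` and its parameters are illustrative; the gate numbers in the comments follow the description, with gate 62 taken as decoding the 512 to 640 kbyte range from /A17, /A18 and A19.

```python
def cache_signal(address, m_io):
    """Return True if CACHE would be 1 for this address.

    `m_io` is 1 for a memory operation and 0 for an IO operation.
    """
    def a(n):
        return (address >> n) & 1

    gate62 = (not a(17)) and (not a(18)) and a(19)              # 512-640 kbyte
    gate61 = a(20) or a(21) or a(22) or (not a(19)) or gate62
    return bool(m_io and not a(31) and not a(23) and gate61)    # gate 60
```

Evaluating this function at the boundaries gives CACHE = 1 up to 9FFFF (hex), 0 from A0000 to FFFFF, 1 from 100000 to 7FFFFF, and 0 from 800000 upwards within the first 16 Mbyte.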

The address is also decoded to produce a signal POST, which is 1 if the address is one for which a write can be posted. The signal POST is produced by an AND gate 63, which is enabled by the signal W/R, i.e. for writes. Gate 63 is fed by an OR gate 64, which is fed with the signal CACHE; hence all addresses which can be cached can also be posted. Gate 64 is also fed by an AND gate 65, which is fed with the signals A19, /A18, and A17. The output of gate 65 is 1 for the memory area 640 to 768 kbyte, so addresses in this area can be posted. Gate 64 is also fed by an AND gate 66, which is fed with the signals IO/M and /A31; hence posting can occur for all IO operations except those to the coprocessor (for which A31 = 1).
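The POST decoding can likewise be sketched as a Boolean function. The name `post_signal` is illustrative, and the CACHE signal is taken as an input rather than recomputed. Gate 65 is modelled literally from the three inputs listed (A19, /A18, A17), so for memory accesses its output is meaningful only for addresses below 1 Mbyte.

```python
def post_signal(address, m_io, w_r, cache):
    """Return True if POST would be 1.

    `m_io` is 1 for memory, 0 for IO; `w_r` is 1 for a write;
    `cache` is the value of the CACHE signal for the same access.
    """
    a17, a18, a19, a31 = ((address >> n) & 1 for n in (17, 18, 19, 31))
    gate65 = a19 and not a18 and a17        # 640-768 kbyte (video memory)
    gate66 = (not m_io) and not a31         # IO access, not to the coprocessor
    gate64 = cache or gate65 or gate66
    return bool(w_r and gate64)             # gate 63
```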

The address is also decoded to produce a signal BS16, which is 1 for those addresses for which the word length is 16 bits. The signal BS16 is produced by an OR gate 67, which is fed by the gates 65 and 66. Hence the signal BS16 is 1 for the memory area 640 k to 768 k, and for all IO operations except those to the coprocessor (for which A31 = 1). Gate 67 is also fed by an AND gate 68, which is driven by the signal BB16, which is 1 when the system logic requires a 16-bit transfer. Gate 68 is also fed by a NOR gate 69, which enables BB16 and which is fed with the signals POST, CACHE, and A31. Gate 69 produces a 1 when none of its inputs are 1, i.e. when there is no posting or cacheing or coprocessor accessing; that is, for these regions BS16 follows BB16.
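The synthesis of BS16 by gates 65 to 69 can be sketched in the same way. The name `bs16_signal` is illustrative; POST, CACHE and BB16 are taken as inputs, and gate 65 is again modelled only from A19, /A18 and A17 as listed, so the sketch should be applied to memory addresses below 1 Mbyte or to addresses for which CACHE is already known.

```python
def bs16_signal(address, m_io, post, cache, bb16):
    """Return True if BS16 would be 1 (16-bit word length)."""
    a17, a18, a19, a31 = ((address >> n) & 1 for n in (17, 18, 19, 31))
    gate65 = a19 and not a18 and a17          # video memory, 640-768 kbyte
    gate66 = (not m_io) and not a31           # IO access, not to the coprocessor
    gate69 = not (post or cache or a31)       # NOR: no posting, cacheing or coprocessor
    gate68 = bb16 and gate69                  # BB16 passed through only where needed
    return bool(gate65 or gate66 or gate68)   # gate 67
```

For an access in the 768 kbyte to 1 Mbyte range the result simply follows BB16, whereas for cacheable general memory BB16 is ignored.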

The microprocessor 34 normally requires 2 clock cycles per operation. If it is performing a read access and the address is in the cache system, then the required word can be made available by the cache system in 2 clock cycles, and the turbo board runs at its maximum speed. Similarly, if the microprocessor is performing a posted write operation, then the system takes 2 clock cycles to discover if the word is already in cache, i.e. if there is a hit. If there is a hit, then a third clock cycle is required to complete the writing of the word into the cache system. The system is then ready to start the next operation. However, if the access being performed cannot use the cache system - that is, if the required word is not in the cache memory unit for a read or a posted write is not allowed for that address - then the system has to wait for the access to be completed. This is controlled by a signal RDY, which is set to 0 while an access is proceeding and to 1 when it is completed.

The signal RDY is produced by an OR gate 70, which is fed by 4 AND gates 71 to 74. AND gate 71 produces a 1 for a cacheable address (CACHE = 1) on a read operation (R/W = 1) provided that the address is in the cache system (HIT = 1) on the occurrence of a timing signal CADS, discussed below. Gate 72 produces a 1 for a postable address (POST = 1) on a write operation (W/R = 1) if the address is not in the cache system (HIT = 0), again on the occurrence of the timing signal CADS. Gate 73 produces a 1 for a postable address (POST = 1) on a write operation (W/R = 1) if the address is in the cache system (HIT = 1) on the occurrence of a tag write signal TGWR, discussed below. Finally, gate 74 produces a 1 on the occurrence of a signal FIN and a signal CMD, in the absence of the timing signal CADS.
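The four conditions under which RDY goes to 1 can be summarized in a sketch as follows. The function name `rdy_signal` and the keyword parameters are illustrative; each parameter stands for the signal of the same name at the instant considered, and timing is not modelled.

```python
def rdy_signal(*, cache=False, post=False, w_r=False, hit=False,
               cads=False, tgwr=False, fin=False, cmd=False):
    """Return True if RDY would be 1 for the given signal states."""
    read = not w_r                                  # R/W is the inverse of W/R
    gate71 = cache and read and hit and cads        # cache read hit
    gate72 = post and w_r and (not hit) and cads    # posted write, word not cached
    gate73 = post and w_r and hit and tgwr          # posted write, cache updated
    gate74 = fin and cmd and not cads               # main board access finished
    return bool(gate71 or gate72 or gate73 or gate74)   # gate 70
```

In this form it can be seen that a posted write which hits the cache is not complete until the tag write signal TGWR occurs, which corresponds to the third clock cycle mentioned above.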

Cache memory resetting is controlled by a tag reset request signal TGRS, which is produced when a device on the main board writes into memory, the system is reset, accesses need to be slowed down for slow peripheral devices, or accesses are made to registers which can swap around the memory map. Signal MWR is taken from the expansion bus connector 35 and indicates a device on the main board writing into memory, and signal BHLDA indicates that that main board device has control; these two signals are combined by an AND gate 57. An AND gate 55 gates CADS (which indicates that an access is being performed) with the signal IO/M (which indicates that the access is an IO access rather than a memory access) provided that address bits A9, A8, and A5 are all 1. The IO address areas covered by these address bits include addresses 3FX (which are the addresses where floppy disc controllers are located) and 32X (which are addresses where memory switching registers may be located). A gate 59 combines the signals HOLD and HLDA, which makes sure that the HOLD signal to the microprocessor is held until HLDA is activated, i.e. until the microprocessor has acknowledged the HOLD signal to it. The outputs of gates 55, 57, and 59 are combined by an OR gate 58 to generate the signal TGRS.
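The generation of TGRS can be sketched as below. The function name and parameters are illustrative. The exact function of gate 59 is not stated beyond its purpose, so the sketch assumes, purely for illustration, that it produces a 1 while HOLD is asserted and HLDA has not yet been received.

```python
def tgrs_signal(*, mwr=False, bhlda=False, cads=False, m_io=1,
                io_address=0, hold=False, hlda=False):
    """Return True if the tag reset request TGRS would be 1."""
    a5, a8, a9 = ((io_address >> n) & 1 for n in (5, 8, 9))
    gate57 = mwr and bhlda                              # main board device writing to memory
    gate55 = cads and (not m_io) and a9 and a8 and a5   # IO access to a decoded address
    gate59 = hold and not hlda                          # ASSUMED: keeps request until acknowledged
    return bool(gate55 or gate57 or gate59)             # gate 58
```

Since only A9, A8 and A5 are decoded, IO addresses other than 3FX and 32X may also satisfy gate 55 in this sketch; for example any IO address from 320 to 33F (hex) does so.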

Cache timing circuitry

Fig. 7 shows the timing logic of the cache control unit 43. This logic consists of a number of flip-flops together with their input and output logic. All the flip-flops are D type flip-flops, being set or cleared according to the state of the input signal when clocked.

A clock divider flip-flop 80 is driven by the oscillator 46 signal CLK2, and divides this down by 2 to generate the main turbo board clock, /CLK, which is used to clock all the other flip-flops in the cache control unit. A reset flip-flop 81 is fed with a reset signal BRES obtained from the main board, and produces a corresponding reset signal RESET. This is fed to the microprocessor 34 to reset it; this resetting synchronizes the internal divide-by-2 clocking of the microprocessor with the turbo board signal /CLK.

(Where a flip-flop is shown as clocked by the main turbo board clock /CLK, it is in fact clocked by the oscillator clock signal CLK2, and its input is fed from a multiplexer MUX which selects between its output and the data input to the flip-flop under control of the main turbo board clock /CLK, as shown in Fig. 7A; this arrangement reduces signal delay times.)

The microprocessor 34 produces an address signal ADS when it has generated an address on its address lines; this signal is only active for 1 clock period. A clocked address signal flip-flop 82 is therefore used to produce a clocked (latched) version of ADS, CADS. Flip-flop 82 is fed with the signal ADS via an OR gate 84. An AND gate 83, enabled by the output CADS of the flip-flop, is fed with a command signal CMD, which is 1 when the turbo board wants to communicate with the main board. Hence CADS is set to 1 by ADS and held at 1 until the command signal CMD goes to 0, indicating that the previous command has been performed; CMD will go to 1 for the next clock period. Signals ADS and CADS are combined by an OR gate 85, which feeds an AND gate 86 enabled by the signal R/W. Gate 86 produces the cache read signal CRD, so that the cache is read when ADS or CADS is active and R/W indicates a read.
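The latching of ADS into CADS can be illustrated by a clock-by-clock sketch. The function `cads_trace` is hypothetical; it assumes an idealized D type flip-flop whose input is ADS OR (CADS AND CMD), sampled once per turbo board clock, and takes the ADS and CMD values for each clock period as given lists rather than deriving them.

```python
def cads_trace(ads, cmd):
    """Return the CADS value after each clock edge.

    `ads` and `cmd` are equal-length sequences of 0/1 values, one per
    clock period, sampled just before the clock edge.
    """
    cads = 0
    trace = []
    for a, c in zip(ads, cmd):
        cads = 1 if (a or (cads and c)) else 0   # OR gate 84 fed by ADS and AND gate 83
        trace.append(cads)
    return trace
```

With ADS active for one period and CMD at 1 for the two following periods, CADS in this model is set by ADS and remains at 1 until the clock edge at which CMD is found at 0.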

Writing into the cache memory unit is controlled by a tag write signal TGWR, which is produced by a flip-flop 90. This flip-flop is fed by an OR gate 91, which is in turn fed by three AND gates 92 to 94. Gate 92 holds the flip-flop set provided a finish signal FIN has not gone to 1 and reading is occurring (R/W = 1). Gates 93 and 94 are both enabled by a combination of three signals: CADS, CACHE (the memory address is cacheable), and /CMD (there is no command to access the main board). Provided this combination exists, gate 93 produces a 1 if there is a read (R/W = 1) and no hit (HIT = 0), while gate 94 produces a 1 if there is a write and a hit.

The finish signal FIN is produced by a finish flip-flop 96 in conjunction with a clocked finish flip-flop 95. These are controlled by a command signal BCMD, which follows the command signal CMD but is synchronized with the baseboard clock. When BCMD goes to 1, it sets the clocked finish flip-flop 95 via an OR gate 97. This sends the output of an AND gate 98 to 1. When BCMD goes back to 0, it enables an AND gate 99, which sets the finish flip-flop 96. This sends the output of the AND gate 98 back to 0. Both inputs to gate 97 are now at 0, so flip-flop 95 is cleared. Flip-flop 96 is therefore also cleared. Thus FIN goes to 1 for one turbo board clock period after BCMD goes to 0.

The command signal CMD is produced by the clear state of a flip-flop 100. This flip-flop is set when all the inputs to an OR gate 101 are 1. For this, OR gate 102 must produce a 1, which means that signal FIN must be 1 or the flip-flop must be already set. Also a NAND gate 103 must produce a 1, which means that CACHE must be 0 (the address is uncacheable), CADS must be 0, or flip-flop 100 is cleared. Also, a NAND gate 104 must produce a 1, which means that there is a hit (HIT = 1) and there is a read (W/R = 0), as determined by an OR gate 105, CADS must be 0, or flip-flop 100 is cleared.

The tag reset request signal TGRS is fed to a tag reset flip-flop 107, which feeds the hold input HOLD of the microprocessor. When the microprocessor receives this signal, it produces a hold acknowledge output signal HLDA, which is fed to the tag chip 51 to reset it. However, signal HLDA is not produced until the microprocessor has finished an operation or a series of closely linked operations. Hence cache resetting cannot interrupt a microprocessor operation or a pair of linked operations, so such linked operations cannot be disrupted by the contents of the cache memory changing partway through them.

Interfacing circuitry

This consists of buffer circuitry and timing logic. Fig. 8 shows the buffer circuitry and a small part of the timing logic; Fig. 9 shows the main part of the timing logic.

The address bus, lines A2-31 and BE0-3, together with the four control lines W/R, M/IO, D/C, and /LOCK, are outputs from the microprocessor 34 and their states have to be buffered through to the main board 10. A buffer 110 is provided for this, with output lines BA2-31, BBE0-3, BW/R, BM/IO, BD/C, and /BLOCK. The data lines D0-31 are bidirectional, and are buffered to the main board lines BD0-31 by two buffers 111 and 112, one for each direction. Buffers 110 and 111 are clocked by the signal CMD (which, when it rises to 1, loads the data on the buffer inputs into the buffers). Buffer 112 is clocked by the corresponding signal BCMD and loads data when the signal BCMD ends.

Signal CMD also clocks a latched write flip-flop 113, which is set or cleared by the signal W/R. The two outputs of this flip-flop are fed to two AND gates 114 and 115, which are also fed with the signals /BHLDA (hold acknowledge signal from the main board) and CMD. These two gates feed the output enable inputs of the buffers 111 and 112 as shown. Thus the contents of buffer 111 are passed to the main board for a write (W/R = 1), and the contents of buffer 112 are passed to the rest of the turbo board for a read, on the signal CMD provided that the main board is not producing the signal BHLDA. Signal /BHLDA enables the outputs of buffer 110, so those contents are available unless there is a main board hold acknowledge.

A B clock divider flip-flop 120 is driven by the main board oscillator signal BCLK2, and divides this down by 2 to generate the main board clock, /BCLK, which is used to clock all but one of the other flip-flops in the interfacing circuitry timing logic.

A B address flip-flop 121 is fed with the signal BCMD, and feeds an AND gate 122 which produces the signal BADS; this AND gate is also fed with the signals BCMD and /BHLDA.

A B hold acknowledge flip-flop 123 produces the signal BHLDA. This flip-flop is fed by an OR gate 124 fed by two AND gates 125 and 126, which are both enabled by signal BHOLD. Gate 125 produces a 1 if the signals BLOCK (from buffer 110) and BCMD are both 0; gate 126 produces a 1 if the signal BHLDA is 1.

A B command flip-flop 130 produces the signal BCMD. This flip-flop is fed by an OR gate 131, which is fed by 3 AND gates 132 to 134. Gate 132 is fed by the signals BRDY and BCMD. Gates 133 and 134 are enabled by a signal BRQ, which is produced by a B request flip-flop 135, which is set by the signal CMD and cleared by the signal BCMD; this flip-flop 135 is operated asynchronously (it has a 1 to its data input, CMD is fed to its clock input, and BCMD is fed to its reset input). Gate 133 produces a 1 when the signals BHOLD and BHLDA are both 0; gate 134 produces a 1 when BLOCK is 1 and BHLDA is 0.

The B clock divider flip-flop 120, the B address flip-flop 121, the B hold acknowledge flip-flop 123, and the B command flip-flop 130 are all reset by the main board reset signal BRES.
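The next-state logic of the B hold acknowledge and B command flip-flops can be written out as a small Python sketch, which may make the gate descriptions easier to check. The function name is hypothetical and signals are 0/1 integers. One point is an assumption: the text gives the inputs of gate 132 (BRDY and BCMD) but not their polarity, so the sketch assumes that gate 132 holds BCMD at 1 while BRDY is 0, in line with the statement elsewhere in the description that BCMD stays at 1 as long as BRDY is 0.

```python
def next_state(bhold, block, brq, brdy, bhlda, bcmd):
    """Compute (next BHLDA, next BCMD) for one /BCLK clock edge."""
    # Flip-flop 123 (BHLDA), fed by OR gate 124 from AND gates 125 and 126.
    g125 = bhold & (1 - block) & (1 - bcmd)  # grant: no locked transfer, no command
    g126 = bhold & bhlda                     # keep the grant while it is active
    next_bhlda = g125 | g126

    # Flip-flop 130 (BCMD), fed by OR gate 131 from AND gates 132 to 134.
    g132 = bcmd & (1 - brdy)                 # assumed polarity: hold until ready
    g133 = brq & (1 - bhold) & (1 - bhlda)   # start: no other master wants the bus
    g134 = brq & block & (1 - bhlda)         # start: locked transfer takes priority
    next_bcmd = g132 | g133 | g134
    return next_bhlda, next_bcmd
```

With both BHOLD and BRQ at 1 from an idle state, the sketch grants the bus to the other master unless BLOCK is 1, in which case the locked transfer proceeds first.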

In most microcomputers, a memory read results in all four bytes of a 32-bit word being read, with the appropriate bytes then being selected by means of the byte enable signals BE0-3. In some types, however, only the particular bytes required are read from the main memory. If the present board is used with such a type, a problem arises on a cache read miss. The whole word containing the required bytes must be read from the main memory on the main board and written into the cache. However, the byte enable signals will in general define only some of the bytes of that word. Hence if the address including these byte enable signals is fed to the main board, only some of the bytes will be read from the main board. If only these bytes are copied into the cache memory, then there will be inconsistency between the remaining bytes of that word - those remaining bytes will not have been updated in the cache memory to match the main memory. To overcome this, circuitry (not shown) may be provided to force all the byte enable signals to 1 in such circumstances. This circuitry may conveniently be controlled by the W/R and CACHE signals, so that it operates on cache reads.

Coprocessor

The system may include a coprocessor, which is used for performing certain mathematical functions. The coprocessor, if fitted, is treated as an input/output device, and is therefore addressed by an appropriate address combined with the signal M/IO being 0. In addition, the top bit of the address lines, line A31, may be used to control communication with the coprocessor. If M/IO is 0 and A31 is 1, then data is being sent to the coprocessor. (This combination of signals is recognized by certain types of coprocessors; using A31 obviates the need for the coprocessor to decode the lower order address bits to determine whether the coprocessor or some other input/output device is being addressed.)

A jumper switch (not shown) is provided, and is manually set open if a coprocessor is fitted. A decoder circuit (not shown), enabled by the jumper switch being open, is provided, and recognizes when the coprocessor is fitted and is being addressed. The output of the decoder disables the data buffers (discussed below) and also causes the ready signal line RDY (also discussed below) to be "floated", so that that signal can be controlled by the coprocessor. The RDY signal line is preferably provided with a pull-up resistor (not shown) so that it is taken to 1 if the coprocessor is not fitted despite the jumper switch being set open.
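As the decoder is not shown in the figures, the following Python sketch is only one plausible reading of its behaviour. The function names are hypothetical; the decode condition (jumper open, M/IO = 0, A31 = 1) and the pull-up on RDY are taken from the text above.

```python
def coprocessor_selected(jumper_open, m_io, a31):
    """True when the decoder recognizes an access to the coprocessor."""
    return bool(jumper_open and m_io == 0 and a31 == 1)


def rdy_line(decoder_active, board_rdy, coprocessor_rdy=None):
    """Resolve the level seen on the RDY line.

    decoder_active:  result of coprocessor_selected()
    board_rdy:       RDY as the turbo board logic would drive it
    coprocessor_rdy: level driven by the coprocessor, or None if none is fitted
    """
    if not decoder_active:
        return board_rdy        # normal case: the board drives RDY
    if coprocessor_rdy is None:
        return 1                # line floated, pull-up resistor takes it to 1
    return coprocessor_rdy      # line floated, coprocessor controls it
```

The pull-up case shows why a wrongly set jumper does not hang the microprocessor: a coprocessor access with no coprocessor fitted still sees RDY at 1.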

A coprocessor normally has one corner truncated in correspondence with the microprocessor, as discussed above.

An alternative arrangement is to provide a small auxiliary board having a plug which fits into the 80386 microprocessor socket and two sockets, one for the 80386 microprocessor and the other for a coprocessor. This board also has mounted on it the additional circuitry required when a coprocessor is provided. This arrangement eliminates the need for the jumper mentioned above; it is assumed that if the auxiliary board is used, then a coprocessor will be fitted to it. The present arrangement permits the use of any one of a variety of different coprocessors in conjunction with the main microprocessor.

Programmed control

The system has been largely described in terms of discrete logic circuits (gates and flip-flops). It will be realized that in practice, it may be more convenient to implement at least some of it by means of PLAs (programmed logic arrays).

Further, the logic may be made programmable, so that the user can set various of the turbo board functions. For example, it may be desirable to be able to change the specific memory areas which are made uncacheable, or for which posted writes are permitted, or for which the signal BS16 is forced to 0 or 1 or dependent on BB16. It may also be desirable to slow down the operation of the turbo board; a modest slowing down may be achieved by permitting no posted writes, and a major slowing down may be achieved by making all addresses uncacheable. All these options may be made programmable, by providing a control register the main function of which is to determine the memory areas for the various memory functions. Such a control register would be addressed as an IO device, i.e. by setting the signal M/IO to 0, and located at an otherwise unoccupied region of the IO address space.

Certain 80286-based systems are already provided with means for controlling their speed; such means comprise a register which effectively controls the oscillator speed. The present programmable control register would operate in a manner somewhat similar to such speed control, though of course in a more elaborate manner.
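The description does not give a layout for such a control register, so the following Python sketch is purely illustrative. The `AreaRule` record, the `area_policy` function, the two slow-down flags, and the example address range are all hypothetical; only the behaviour (per-area cacheability and posted-write permission, a modest slow-down by permitting no posted writes, a major slow-down by making all addresses uncacheable) comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AreaRule:
    """Hypothetical programmable rule for one memory area."""
    start: int            # first address of the area
    end: int              # last address of the area (inclusive)
    cacheable: bool
    posted_writes: bool


# Addresses not covered by any rule are assumed cacheable with posted writes.
DEFAULT_RULE = AreaRule(0x00000000, 0xFFFFFFFF, cacheable=True, posted_writes=True)


def area_policy(rules, address, no_posted_writes=False, all_uncacheable=False):
    """Return (cacheable, posted_write_allowed) for an address."""
    rule = next((r for r in rules if r.start <= address <= r.end), DEFAULT_RULE)
    cacheable = rule.cacheable and not all_uncacheable       # major slow-down
    posted = rule.posted_writes and not no_posted_writes     # modest slow-down
    return cacheable, posted
```

For example, with a single rule `AreaRule(0xA0000, 0xBFFFF, cacheable=False, posted_writes=False)` (an arbitrary range chosen for illustration), `area_policy(rules, 0xA1234)` gives `(False, False)`, while an address outside the range is governed by the default rule and the two slow-down flags.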

A further possibility is to provide jumper switches for controlling these options, or some of them. This may be particularly desirable for controlling whether or not the bus width signal BS16 is dependent on the main board bus width signal BB16, since in some microcomputers the bus width signal is permanently forced to 32 bits and any conversions required are performed externally of the microprocessor.

It was noted above that in certain circumstances, it is necessary to slow down the operation of the turbo board. It may be desirable to provide means for enabling the board to be set into a permanently slowed condition. This can be done by including a jumper-controlled signal as one of the signals which are ORed together to form the tag reset signal TGRS. This can conveniently be done by locating the jumper switch on the expansion bus connector 35, which will usually be accessible, whereas the turbo board itself may not be readily accessible. The signal MWR will then be taken from the expansion bus for one jumper position and forced to 1 (by a pull-up resistor) for the other jumper position.

Operation of the system

An access by the microprocessor to the cache system takes 2 clock cycles, matching the operation time of the microprocessor. An access to the main board takes much longer, and extra clock cycles have to be inserted in the microprocessor timing; this is done by holding signal RDY at 0.

The general significance of the signals CADS and CMD is as follows. When the microprocessor wants to do something (an access), it sends CADS to 1. If CMD is at 1, that indicates that the rest of the turbo board (the bus control unit, for example) is doing something. So if the processor tries to start performing a new instruction, CMD shows whether the current instruction can proceed. CADS cannot go to 0 until the previous command has finished, i.e. until CMD has gone to 0. When that happens, CADS can go to 0, and CMD can go to 1 at the same time.

For a read operation, the microprocessor places the address on its address lines and takes ADS to 1; it takes ADS to 0 after one further cycle, so when ADS goes to 1, CADS is taken to and held at 1. CADS goes to 0 two cycles later if a cache read occurs. CRD (cache read) is at 1 at this time, enabling the cache memory unit outputs. If a cache read does not occur, then CRD and CADS are taken to 0, and CMD is taken to 1 to initiate a main board access. If the memory address is cacheable, then TGWR is taken to 1 to write into the cache the data (word) being read.

The interface circuitry now takes BCMD to 1 and holds it at 1 for a main board cycle, and takes BADS to 1. The system board responds to this by performing a memory access, and when this is complete, it sends BRDY to 1. This latches the word just read from the main memory into the read latches, and also causes signal FIN to go to 1 for one turbo board clock cycle after BCMD has gone to 0. FIN causes RDY to go to 1, so allowing the microprocessor to read the data from the latches and terminate the access. CMD goes to 0 at this point to indicate the end of the access. TGWR will return to 0 at this point, having been held at 1 until this point to write the data being read into the four data chips of the cache memory unit and the tag into the tag chip when the word becomes available.

For a write access, the signal CRD is kept at 1, so the cache data chips are not enabled. Instead, the microprocessor drives the data bus lines with the word to be written. Towards the end of the second clock cycle, CADS goes to 0 and an access to the system board is initiated by taking CMD to 1. The cache memory is checked to determine whether the address is already in it, the signal HIT indicating the result. If the address is not already in the cache system, TGWR is taken to 1 for one clock cycle (the third clock cycle) to write the new word; if it is, then the tag chip is left as it is and the byte enable bits BE0-3 control which of the data chips have new bytes written into them. Signal TGWR is taken to 1 for one cycle for such an update (possibly partial).

The microprocessor would then normally wait until the end of the full write operation before c

However, if the write is a posted write (signal POST = 1), then the access can terminate after 3 clock periods if the address was already in the cache system, or 2 if it was not. If a posted write is in progress, then a new access cannot be initiated through the interfacing circuitry until it is finished. In this case CADS stays at 1 until CMD goes to 0 (which occurs at the end of an access through the interfacing circuitry).

At power up, BRES goes to 1 to reset the main board. This forces RES to 1 to reset the turbo board, and in particular its microprocessor. It also resets the cache system, by sending TGRS to 1.

TGRS is also sent to 1 to reset the cache system when another bus master writes into the system memory. This event is indicated by the signal MWR from the expansion bus being at 1 in combination with the signal BHLDA being at 1. MWR is 1 for memory writes, and BHLDA at 1 indicates that control of the address, status, and data lines has been relinquished so that they can be driven by some other device (i.e. not the microprocessor being emulated).

In the interfacing circuitry, when CMD goes to 1, BRQ goes to 1 and stays at 1 until BCMD goes to 1, indicating that transfer is occurring across the header. When transfer is finished, BCMD goes to 0 and latches data into the read data latches 112. (This occurs even if the operation is in fact a write operation; the data latched into these latches is irrelevant, and will be erased by whatever is next written into them.)
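Returning to the cache reset conditions given above, the combination of signals which sends TGRS to 1 can be sketched as follows. The function name is hypothetical, signals are 0/1 integers, and only the two conditions named in this section (power-up reset, and a memory write by another bus master) are modelled; any further conditions ORed into TGRS elsewhere in the system are omitted.

```python
def tag_reset_request(bres, mwr, bhlda):
    """Return TGRS for the two reset conditions described in this section.

    bres:  main board reset signal BRES (1 at power up)
    mwr:   expansion bus memory write signal MWR
    bhlda: main board hold acknowledge BHLDA (1 = another device owns the bus)
    """
    foreign_write = mwr & bhlda   # another bus master is writing to memory
    return bres | foreign_write
```

A memory write by the turbo board's own microprocessor (MWR = 1 with BHLDA = 0) does not request a reset in this sketch, since the cache is updated directly by such writes.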

The control logic of the interfacing circuitry basically arbitrates between BRQ and BHOLD, to give control to another master by taking BHLDA to 1 or to carry out the requested transfer by taking BCMD to 1. These two signals are generated on the rising edge of BCLK.

The operation of the arbitration is as follows. BHLDA will go to 1 if BHOLD is 1 and BCMD is 0, provided that BLOCK is 0. If BLOCK is 1, then a locked transfer is in progress and BHLDA waits until BLOCK goes to 0 before going to 1. (A locked transfer is where two transfers need to be carried out without any possibility of another device updating only some of the data being transferred.) Conversely, BCMD can only go to 1 if BRQ is 1 and BHLDA is 0. BCMD cannot go to 1 if BHOLD is 1 unless BLOCK is 1. BHLDA stays at 1 as long as BHOLD is 1, whereas BCMD stays at 1 as long as BRDY is 0. BRDY is taken to 1 by the main board when the main board has completed its part of a transfer.

The signal which actually initiates the transfer on the main board is BADS, which goes to 1 for one clock cycle as soon as BCMD goes to 1. This is achieved by clocking BCMD and then gating the clocked signal with BCMD. BADS is not driven when BHLDA is 1.
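The one-cycle BADS pulse can be illustrated with a short Python simulation. This is a sketch under stated assumptions: the function name is hypothetical, each list entry is the value of BCMD during one main board clock period, the clocked copy of BCMD is assumed to be inverted before gating (so that BADS is 1 only in the first period of a command), and the undriven state of the line is represented by `None`.

```python
def simulate_bads(bcmd_samples, bhlda=0):
    """Return the BADS value for each clock period of a BCMD waveform."""
    clocked_bcmd = 0                      # output of the B address flip-flop
    waveform = []
    for bcmd in bcmd_samples:
        if bhlda:
            waveform.append(None)         # BADS is not driven under hold acknowledge
        else:
            # BCMD gated with the (assumed inverted) clocked copy of itself.
            waveform.append(bcmd & (1 - clocked_bcmd))
        clocked_bcmd = bcmd               # flip-flop captures BCMD at the clock edge
    return waveform
```

For the waveform `[0, 1, 1, 1, 0, 1, 1]` the sketch returns `[0, 1, 0, 0, 0, 1, 0]`: one pulse at the start of each command, however long BCMD is held at 1.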

Memory operation modifications

In the cache memory (Fig. 5), it should be noted that the gates 53 feed the active high enable inputs of the chips 50. This results in faster operation. The active low inputs of these chips are forced to 0.

Figs. 10A and 10B show two modifications and developments of the system as previously described.

As noted in the previous description, it may be desirable to force the byte select signals BBE0-3 to the main board to 1 on a read for a cacheable address, regardless of the states of the byte enable signals BE0-3. This is achieved by providing a set of four OR gates 140 (Fig. 10A) through which the byte enable signals BE0-3 are fed on their way to the buffer register 110. The other input to each of these four OR gates is obtained from an AND gate 141, which is fed with the cache signal CACHE and the write/read signal W/R as shown.
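The effect of gates 140 and 141 can be sketched in Python as follows. The function name is hypothetical and signals are 0/1 integers. Since a read is indicated by W/R = 0, the sketch assumes that W/R is inverted at the input of AND gate 141; the text says only that the gate is fed with CACHE and W/R "as shown".

```python
def main_board_byte_enables(be, cache, w_r):
    """Return the four byte enables passed on towards the main board.

    be:    list of the four byte enable signals BE0-3 from the microprocessor
    cache: CACHE signal (1 = address is cacheable)
    w_r:   W/R signal (1 = write, 0 = read)
    """
    force_all = cache & (1 - w_r)          # gate 141: cacheable read (assumed polarity)
    return [bit | force_all for bit in be] # the four OR gates 140
```

On a cacheable read, a single-byte request such as `[1, 0, 0, 0]` is widened to `[1, 1, 1, 1]`, so that the whole word is fetched for the cache; writes and uncacheable reads pass through unchanged.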

In some types of microcomputers, the signal BB16 is valid only at certain times. In the system described above, it is assumed that BB16 is always valid. The system as previously described may therefore not operate correctly in these types of microcomputer.

This problem is overcome by the circuit shown in Fig. 10B. The signal BB16 is clocked into a flip-flop 150 at the appropriate time, and the signal CB16 is used in the rest of the system in place of the signal BB16. The two flip-flops 150 and 151 are clocked by the clock signal /BCLK, just like the flip-flops shown in Fig. 9.

The circuit samples the signal BB16 on the falling edge of the signal BCLK prior to the rising edge of BCLK on which the signal BCMD goes active. This is done by feeding the signal BB16 through an AND gate 153 and an OR gate 154 to the flip-flop 150, which produces the signal CB16. Once flip-flop 150 has been set to 1 (CB16 = 1), it is latched until it is reset by signal BCMD going to 1. This sends the output of an AND gate 154 to 0, resetting the flip-flop when CMD goes active again. The microprocessor on the turbo board samples CB16, gated to form BS16, just after BCMD goes inactive.

A further complication is that in some types of system, a signal which is usually designated NA is used. This is a signal which is (in a system in which it is used) generated for feeding to the microprocessor, to inform the microprocessor that it can generate and put out the next address. (For the microprocessor to make use of this signal, it must obviously be generated before the end of the current microprocessor operation; also, whether the microprocessor can in fact respond to it depends on whether or not it has to wait for the next operation before it can determine what the next address will be.) In most systems, this signal is not generated, and the NA input to the microprocessor is left unconnected.

If the signal NA is used, then the signal BB16 must be ignored if NA is active on any falling edge of signal BCLK prior to, but not including, the BCLK clock edge mentioned above on which the signal BB16 is sampled. In that case, the access is a 32-bit access, and flip-flop 151 is used to generate an "ignore BS16" signal IG16. Signal BNA (i.e. the main board signal usually designated NA) is gated by the signals BCMD and CBCMD in an AND gate 155. Signal CBCMD is a "clocked BCMD" signal, obtained as the output of flip-flop 121 (Fig. 9). This effectively prevents the signal BNA from being sampled at the start of an operation, when BADS is active (i.e. BCMD is 1 while CBCMD is 0). An AND gate 156 latches flip-flop 151 until the end of an operation, when BCMD goes to 0; gates 155 and 156 feed flip-flop 151 via an OR gate 157. Signal IG16 from flip-flop 151 disables gate 152, preventing signal BB16 from being sampled if signal BNA was active.

The manual for the 80386 microprocessor states that the status lines should only be sampled in the time periods between BADS going low and BRDY going high. However, at least one microcomputer using an 80386 chip does not conform to this recommendation. Such a microcomputer may not be able to operate satisfactorily with the system as so far described, because although the states of the status lines generated by the turbo board as so far described are compatible with those of a real 80386 chip during the time periods just mentioned, they are not necessarily compatible at other times. To overcome this incompatibility, the status lines may be controlled by the signal BCMD. They will then become valid in the same BCLK clock as BADS and stay valid until the clock in which BRDY is returned at 1. Outside those periods, the status lines would be tristated, and would normally be pulled to 1 by pull-up resistors on the main (mother) board. The address lines may be treated similarly, for the same reason.

Claims

1 A computer system of the type described, comprising a plurality of subsystems (12, 13, 14, 15) including a main memory (12) and a clock oscillator (in 13 or 15), adapted to produce substantially all the microprocessor coupling signals required to couple to a microprocessor (a slow microprocessor) (11) of a particular type and of operating frequency matching the clock oscillator, characterized by a fast microprocessor (34) of substantially the same type as the slow microprocessor but of higher operating frequency, and interfacing circuitry (41) interconnecting the fast microprocessor with the microprocessor coupling signals (at 31) and comprising a fast clock oscillator (46) of operating frequency matching that of the fast microprocessor and which converts said interfacing signals to signals of the same type but with timing matching that of the fast microprocessor.

2 A computer system according to claim 1, characterized in that the fast microprocessor and the interfacing circuitry are mounted on the same board (10) as said plurality of subsystems.

3 A computer system according to claim 1, characterized in that the fast microprocessor and the interfacing circuitry are mounted on a separate board (30) from the board (10) carrying said plurality of subsystems.

4 A computer system according to claim 3, characterized in that the board (30) bearing the microprocessor and the interfacing circuitry is mounted in a socket (11) carried on the main board and which normally has said slow microprocessor mounted therein.

5 A computer system according to any previous claim, characterized in that a cache memory system (40), having a memory unit (42) and a control unit (43), is connected between the fast microprocessor (34) and the main memory (12).

6 A computer system according to claim 5, characterized in that data is written into the cache memory unit only from the fast microprocessor, and all data from the fast processor is written into the main memory.

7 A computer system according to claim 6, characterized in that the cache memory control unit (43) includes posted write logic means (60-66) which determine whether or not the fast microprocessor can, after it has generated data, proceed with the next operation before that data has been written into the main memory.

8 A computer system according to either of claims 6 and 7, characterized in that the cache memory control unit (43) includes means (60-62) which determine whether or not data can be written into the cache memory unit in dependence on its address.

9 A computer system according to any one of claims 6 to 8, characterized in that the cache memory control unit (43) includes bus size means which, for predetermined address ranges, set the bus size at one or other of two possible values in dependence on the address.

10 A computer system according to any one of claims 6 to 9, characterized in that the cache memory control unit (43) includes cache reset logic means (55-58) which cause resetting of the cache in response to predetermined conditions.

11 A computer system according to claim 10, characterized in that said predetermined conditions include an address within predetermined ranges.

12 A computer system according to either of claims 10 and 11, characterized in that said predetermined conditions include input/output operations.

13 A computer system according to any one of claims 10 to 12, characterized in that said predetermined conditions include floppy disc unit operations.