GB2241801A

GB2241801A - Data bypass structure in a register file on a microprocessor

Info

Publication number: GB2241801A
Application number: GB9101089A
Authority: GB
Inventors: James M Arnold; Glenn J Hinton; Frank S Smith
Original assignee: Intel Corp
Current assignee: Intel Corp
Priority date: 1990-03-05
Filing date: 1991-01-17
Publication date: 1991-09-11
Anticipated expiration: 2011-01-17
Also published as: GB2241801B; JPH04219825A; GB9101089D0; GB9320089D0

Abstract

A register file on a pipelined microprocessor chip has a bypass structure (16, 24) that drives the correct source data from an immediately previous write result. This guarantees that the most recent data is used when executing an instruction stream in the pipeline. Load data or execution result data (52) is returned to the RAM array register file (18) in the second phase of a cycle, but is actually written into the cells of the RAM array in the first phase of the succeeding clock cycle. Otherwise an instruction would be delayed by one cycle while the data is written into the RAM and then read out again. To avoid this, bypass logic routes the returning Load or execution result data directly onto the column lines of the read ports of the source busses during the second phase of the cycle in which the data is returned.

Description

DATA BYPASS STRUCTURE IN A REGISTER FILE ON A MICROPROCESSOR CHIP TO ENSURE DATA INTEGRITY

Cross Reference to Related Applications

The following copending patent applications are both assigned to Intel Corporation: SN 486,407 (D-1273), entitled "REGISTER SCOREBOARDING EXTENDED TO ALL MULTIPLE-CYCLE OPERATIONS IN A PIPELINED MICROPROCESSOR", and SN 486,408 (D-1276), entitled "SIX-WAY ACCESS PORTED RAM ARRAY CELL".

Technical Field

The present invention relates to data processing systems, and more particularly to a register file that has a bypass structure that drives the correct source data from an immediately previous write result to guarantee that the most recent data is utilized.

Background Art

US patent 4,981,733, "Register Scoreboarding on a Microprocessor Chip", granted on January 2, 1990 to David Budde et al. and assigned to Intel Corporation, describes apparatus for minimizing idle time when executing an instruction stream in a pipelined microprocessor. It does this by using a scoreboarding technique for load instructions in a register file that contains user-accessible registers. Copending application SN 486,407 (D-1273) extends scoreboarding to all multiple-cycle operations. Several instructions are issued in each clock cycle and are executed concurrently. To meet the demand for access to the registers needed for multiple operands, a multi-ported register file is provided, which allows multiple operations to access the data they need concurrently. To accomplish this most effectively, a new RAM array cell is provided, as described in SN 486,408 (D-1276). In the prior art, bypassing occurred outside the basic register file array. It was done by multiplexing the correct data directly onto the source data busses, which are located after the sense amplifiers.

It is an object of the present invention to provide an apparatus that incorporates a bypass structure that drives the correct source data from an immediately previous write result to guarantee the most recent data is utilized in a subsequent operation.

Brief Description of the Invention

The above object is accomplished in accordance with the present invention by providing a bypass circuit that is connected between a memory interface and a random access memory (RAM) array. The memory interface includes a LdData bus (106), and the RAM Array includes a plurality of word registers having a plurality of outgoing ports. Alignment logic is connected to the memory interface and to the RAM array. It arranges data destined for the memory interface (in the Store case) and prepares data coming from the memory interface to be entered into the RAM Array (in the Load case).

Means are provided for clearing the registers in the RAM array immediately prior to the column select lines being driven. The Load Bypass logic bypasses the LdData bus such that incoming data returning from the Memory Interface is placed directly onto the outgoing column lines of the read ports of the RAM array.

The bypass logic includes register address comparison logic. This logic compares the register addresses of all the registers to which data is returning with the addresses of the requested source registers, and so produces a match output signal.

In response to the match output signal, data coming in from the Load alignment logic or the execution unit is placed directly onto the RAM column lines of the source.

Description of the Drawings

The foregoing and other objects, features, and advantages of the invention will be apparent from the following more particular description of a preferred embodiment of the invention, as illustrated in the accompanying drawings.

FIGURE 1 is a functional block diagram of the register file in which the invention is embodied; FIGURE 2 is a more detailed block diagram of the RAM array and of the comparison logic within the RAM array of the register file of FIGURE 1; and FIGURE 3 is a timing diagram illustrating the operation of the circuits of FIGURES 1 and 2.

DESCRIPTION

Referring now to FIGURE 1, the register file (RF) has 16 local and 16 global registers. It is connected to the memory interface unit/instruction decode (8) and to the execution units (4). The RF has 4 independent read ports and 2 independent write ports to support parallelism. It also checks and maintains the register scoreboard (21), as described in SN 486,407 (D-1273).

The circuit of FIGURE 1 is driven by a two-phase, non-overlapping clock, such as the clock described in U.S. patent 4,816,700. Four clocks, PH1, PH1I, PH2, and PH2I, are distributed across the chip. PH1 and PH2 are traditional NMOS non-overlapping clocks with equal duty cycles. PH1I and PH2I are the PMOS analogs of PH1 and PH2, and are exact inversions of PH1 and PH2 respectively.
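The clock relationships in this paragraph can be checked with a small behavioral model. It is a sketch under one assumption that is not in the text: each cycle is split into four equal time slots (PH1 high, dead time, PH2 high, dead time). Real edge timing and gap widths are set by the clock generator and will differ.

```python
# Behavioral sketch of a two-phase non-overlapping clock with inverted
# PMOS analogs. Assumption: four equal slots per cycle
# (PH1 high, dead time, PH2 high, dead time).

def clock_waveforms(cycles):
    """Return the per-slot levels of PH1, PH1I, PH2 and PH2I."""
    ph1, ph2 = [], []
    for _ in range(cycles):
        ph1 += [1, 0, 0, 0]  # PH1 high in slot 0 only
        ph2 += [0, 0, 1, 0]  # PH2 high in slot 2 only
    return {
        "PH1": ph1,
        "PH1I": [1 - v for v in ph1],  # exact inversion of PH1
        "PH2": ph2,
        "PH2I": [1 - v for v in ph2],  # exact inversion of PH2
    }


def non_overlapping(a, b):
    """True if the two clocks are never high in the same slot."""
    return all(not (x and y) for x, y in zip(a, b))


waves = clock_waveforms(3)
print(non_overlapping(waves["PH1"], waves["PH2"]))  # True
print(sum(waves["PH1"]) == sum(waves["PH2"]))       # True: equal duty cycles
```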

The Register File (RF) is the focal point for all data operands in the microprocessor. Because the microprocessor implements a LOAD/STORE architecture, all data operands associated with a program (excluding special function register operands) must at one time or another reside in the RF.

The RF contains the macrocode and microcode visible RAM registers. The RF provides a high performance interface to these registers through a multi-ported access structure, allowing four reads and two writes on different registers to occur during the same machine cycle.

The Register File consists of six major logic blocks: the Load/Store Alignment (10, 12), the Base Mux (14), the Load Bypass (16), the RAM Array (18), the Destination Bypass (24), and the Src1/Src2 Muxes (26).

Four reads are possible: Store (58), Base (50), Src1 (54), and Src2 (56). Similarly, two writes are possible: Load (52) and Destination (60).

As shown in FIGURE 2, the entire data path, including the actual RAM array (18), is structured as a 4-word x 32-bits/word, 128-bit-wide path. Bits of equal weight from each word are grouped together (Word 3 bit 31, Word 2 bit 31, Word 1 bit 31, Word 0 bit 31, Word 3 bit 30, etc.). This arrangement reduces RAM cell width and makes alignment of Load/Store data easier.
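The bit-grouped layout can be sketched as an index mapping. Numbering the rightmost physical column as 0 (Word 0, bit 0) is an assumption made for illustration. With that convention, the column of (word, bit) is `4 * bit + word`, and moving a word to Word 0 never carries a bit outside its own 4-column bit cell. The model shows this as a per-bit multiplexer.

```python
# Sketch of the bit-grouped 4 x 32 data path. Assumption: physical column 0
# is the rightmost column (Word 0, bit 0); ordering from the left is
# W3 b31, W2 b31, W1 b31, W0 b31, W3 b30, ...
WORDS, BITS = 4, 32


def column(word, bit):
    """Physical column of (word, bit) when equal-weight bits are grouped."""
    return bit * WORDS + word


def pack(words):
    """Interleave four 32-bit words into one 128-bit row value."""
    row = 0
    for w, value in enumerate(words):
        for b in range(BITS):
            if (value >> b) & 1:
                row |= 1 << column(w, b)
    return row


def unpack(row):
    """Recover the four 32-bit words from a 128-bit row value."""
    return [sum(((row >> column(w, b)) & 1) << b for b in range(BITS))
            for w in range(WORDS)]


def shift_to_word0(row, src_word):
    """Word shift as a 4:1 mux inside each bit cell; other lanes are zeroed."""
    out = 0
    for b in range(BITS):
        if (row >> column(src_word, b)) & 1:
            out |= 1 << column(0, b)
    return out


row = pack([0x11111111, 0x22222222, 0xDEADBEEF, 0x44444444])
print(hex(unpack(shift_to_word0(row, 2))[0]))  # 0xdeadbeef
```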

FIGURE 3 shows the basic timings for reading and writing the RAM Array and for checking and setting the scoreboard bits. In the diagram of FIGURE 3, the Load data is returned after an arbitrary number of cycles. At that time, the signal LdValid (104) is asserted, indicating that valid data is on the LdData bus (106). Registers in the RAM Array are read in Ph2 and written in Ph1. When the Load data is written into the register file, the following occurs. In Pipe Stage 2, Phase 2 (denoted "q22"), the zeros of the data are written into the RAM. The ones are written one phase later, in q31. As described in copending application SN 486,408 (D-1276), 0's cannot overwrite 1's in the RAM cells, so the registers to be written must be cleared just before the data is actually written.
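The write rule above can be illustrated with a behavioral model. It abstracts the two-phase write into its net effect: a write can set cells to 1, but a 0 leaves a stored 1 untouched. This model is an assumption for illustration, not the cell circuit of SN 486,408. Without the clear, stale ones survive the write.

```python
# Behavioral sketch (assumption): a write can set cells to 1 but cannot
# turn a stored 1 into 0, so a register must be cleared before it is written.

class Register:
    def __init__(self, value=0):
        self.value = value

    def clear(self):
        """Clear line asserted: force every cell to 0."""
        self.value = 0

    def write(self, data):
        """Only the ones in `data` take effect; zeros leave cells unchanged."""
        self.value |= data


stale = Register(0b1010)
stale.write(0b0110)
print(bin(stale.value))  # 0b1110 (wrong: the stale 1 in bit 3 survives)

r = Register(0b1010)
r.clear()
r.write(0b0110)
print(bin(r.value))      # 0b110 (correct)
```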

The Load data is returned in q22 but is actually written into the RAM Array in q31. Without bypassing, the Add instruction that uses this data would be delayed by one cycle while the data is written into the RAM and then read out again. To avoid this, the RF bypasses the returning Load data onto the Src1 bus during q22.

The Load data is written as usual in q31, while concurrently the Add instruction is executed by the EU (4).
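The one-cycle saving can be checked by counting phases. This sketch numbers pipe stage s, phase p (written "qsp" above) as `2*s + (p - 1)`. The index scheme is an assumption for illustration. Reads happen only in phase 2, so without bypassing the earliest read after the q31 write is q32.

```python
# Sketch of the bypass timing argument. Assumption: pipe stage s, phase p
# ("qsp" in the text) maps to the linear phase index 2*s + (p - 1).

def phase_index(stage, phase):
    return 2 * stage + (phase - 1)


LOAD_RETURN = phase_index(2, 2)  # q22: Load data arrives on LdData
RAM_WRITE = phase_index(3, 1)    # q31: data actually written into the cells


def earliest_src_read(bypass):
    """Phase in which the dependent instruction's Src1 operand is valid.
    RAM reads occur only in phase 2, i.e. at odd phase indices."""
    if bypass:
        return LOAD_RETURN       # data driven onto the Src1 column lines in q22
    t = RAM_WRITE + 1
    while t % 2 == 0:            # advance to the next phase-2 slot
        t += 1
    return t


saved_phases = earliest_src_read(False) - earliest_src_read(True)
print(saved_phases // 2)  # 1 cycle saved
```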

Load/Store Alignment

The Load and Store Alignment logic block (10, 12) arranges the data destined for the Memory Interface (in the Store case). It also prepares the data coming from the Memory Interface to be entered into the RAM Array (in the Load case). The procedure is almost identical for both cases, with just the direction reversed, so only the Load alignment process is described.

Load data returning from the Memory Interface is word aligned to the least significant word (LSW), which is Word 0. For example, a word returning from Word 2 of a 4-word memory block is shifted to Word 0 before it is placed on the data bus. The RF data path is structured as a 4-word x 32-bits/word path with equal-weight bits grouped together (all bit zeros together, all bit ones together, etc.). The LdData and StData busses are structured the same way. A word shift to Word 0 is therefore simply a multiplexing process in each bit cell. The Memory Interface performs only word alignment. From its point of view, sub-word (byte and short-word) accesses are therefore identical to a word access. For example, a byte returning from Byte 13 of the 16-byte (bytes 0-15) memory block would return in Word 3, bits 8-15. The Memory Interface would then align this to the LSW, Word 0, but the byte would still be in bits 8-15.
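The Byte 13 example can be reproduced with integer arithmetic. This sketch assumes little-endian byte numbering within the 16-byte block (byte 0 in Word 0, bits 0-7), which matches the example. The Memory Interface's word alignment changes only the word number, not the bit position.

```python
# Sketch: position of a byte in the 16-byte block as seen by the Memory
# Interface. Assumption: little-endian byte numbering (byte 0 = Word 0,
# bits 0-7), consistent with Byte 13 -> Word 3, bits 8-15.

def block_position(byte_offset):
    """Return (word, low_bit, high_bit) of a byte within the 16-byte block."""
    if not 0 <= byte_offset < 16:
        raise ValueError("byte offset must be in 0..15")
    word = byte_offset // 4
    low_bit = (byte_offset % 4) * 8
    return word, low_bit, low_bit + 7


def after_word_alignment(byte_offset):
    """Word alignment moves the data to Word 0; bits within the word stay put."""
    _, low_bit, high_bit = block_position(byte_offset)
    return 0, low_bit, high_bit


print(block_position(13))        # (3, 8, 15)
print(after_word_alignment(13))  # (0, 8, 15)
```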

The first step performed by the RF Load alignment logic (10) is to byte align the incoming data to the least significant byte (LSB). This is needed only if the returning data is a sub-word quantity. The RF determines this from the Typein (length) field, which is returned one phase earlier than the data.

Byte aligning the incoming data requires physically moving the data to the lowest byte, an actual "steering" of the data perpendicular to the data path of the RF. After this step, all data destined for the Register File is aligned to the LSB.

Zero extension is performed next, in the Load case. This step is unique to the Load Alignment block; Stores do not need it. Zero extension pads the rest of the 32-bit word to be written to a register. It must be performed if the data returning from memory is a byte or short word and bit 3 of the Typein field is zero.
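The byte alignment and extension steps can be sketched on a single 32-bit word. Two assumptions apply. First, the access size is taken as already decoded from the Typein length field, in bytes. Second, bit 3 set is treated as selecting sign extension. The text defines only the zero-extension case, so the second branch is hypothetical.

```python
# Sketch of byte alignment plus extension for one 32-bit Load word.
# Assumptions: size_bytes is already decoded from Typein; typein_bit3 == 1
# is treated as sign extension (the text defines only zero extension).

MASK32 = 0xFFFFFFFF


def byte_align(word_value, byte_in_word):
    """Steer the sub-word quantity down to the least significant byte."""
    return (word_value >> (8 * byte_in_word)) & MASK32


def extend(value, size_bytes, typein_bit3):
    """Pad a byte (1) or short word (2) to 32 bits; words pass through."""
    if size_bytes == 4:
        return value & MASK32
    bits = 8 * size_bytes
    value &= (1 << bits) - 1
    if typein_bit3 == 0:
        return value                      # zero extension: upper bits stay 0
    if value >> (bits - 1):               # hypothetical sign extension
        value |= MASK32 & ~((1 << bits) - 1)
    return value


raw = 0x0000AB00                          # byte still in bits 8-15
print(hex(extend(byte_align(raw, 1), 1, 0)))  # 0xab
print(hex(extend(byte_align(raw, 1), 1, 1)))  # 0xffffffab
```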

The final step is register alignment, which positions the word(s) in their intended word locations before they are written into the RAM Array.
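Register alignment can be sketched as placing the aligned words into the four word lanes of the 128-bit path. The lane rule here is hypothetical and not stated in the text. It assumes that each RAM row holds four consecutive registers, so the low two bits of the destination register address pick the starting lane.

```python
# Hypothetical sketch of register alignment. Assumption (not from the text):
# each 128-bit RAM row holds four consecutive registers, so the low two
# bits of the destination register address select the starting word lane.

def register_align(words, dest_reg):
    """Place aligned words into lanes 0-3 starting at dest_reg's lane.
    Lanes that receive no data are returned as None."""
    start = dest_reg % 4
    if start + len(words) > 4:
        raise ValueError("multi-word access must not cross a 4-register row")
    lanes = [None] * 4
    for i, value in enumerate(words):
        lanes[start + i] = value
    return lanes


print(register_align([0xAB], 6))        # [None, None, 171, None]
print(register_align([1, 2, 3, 4], 8))  # [1, 2, 3, 4]
```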

Base Mux

The Base Mux (14) contains a 2-to-1 multiplexer that reduces the 64-bit field coming from the RAM Array read ports shown in FIGURE 2 to a 32-bit Base suitable for the Memory Interface (8). The Base Mux is needed to handle the 64-bit RAM Array Base bus (50). It also saves area in the RAM Array, which would otherwise have had to multiplex the 64-bit value down to a 32-bit base itself. The multiplexer is controlled by bit 0 of the BaseAdr bus, which specifies which word is placed onto the Base bus.
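A minimal sketch of the 2-to-1 selection follows. Which value of BaseAdr bit 0 picks which half is an assumption: here an odd address selects the upper 32 bits of the 64-bit field.

```python
# Sketch of the Base Mux. Assumption: BaseAdr bit 0 = 1 selects the upper
# 32 bits of the 64-bit Base field, bit 0 = 0 selects the lower 32 bits.

def base_mux(base64, base_adr):
    """Return the 32-bit Base value sent to the Memory Interface."""
    if base_adr & 1:
        return (base64 >> 32) & 0xFFFFFFFF  # odd register: upper word
    return base64 & 0xFFFFFFFF              # even register: lower word


pair = (0x12345678 << 32) | 0x9ABCDEF0
print(hex(base_mux(pair, 0b0000100)))  # 0x9abcdef0
print(hex(base_mux(pair, 0b0000101)))  # 0x12345678
```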

Load Bypass

The Load Bypass logic block (16) contains the logic to bypass the LdData bus (106), returning from the Memory Interface, onto the various outgoing ports: StData (58), Base (50), Src1 (54), and Src2 (56).

The register addresses of all the registers to which data is returning are compared with the addresses of the requested source registers. When a match is found, the bypass logic places the data coming in from the Load alignment logic block directly onto the source's RAM column lines.

From the point of view of the Src busses, the result of the operation is the same as if the data had been read from the RAM Array cells; no differences can be detected.

This method of driving the column lines is possible because the registers which are being bypassed have been cleared by the clear line (67) shown in FIGURE 2, which is asserted immediately prior to the column lines being driven. If it were not for this fact, the stale contents of the registers would be driven onto the column lines, since the decode logic is still enabling the RAM cells to drive their data.

The column lines are precharged negative true, which means that zeros in the cell do not affect the state of the lines.
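The column-line behavior described above can be modeled at the logic level. In this sketch, each column line is precharged high, which reads as logic 0 under negative-true signaling. An enabled cell holding 1, or the bypass driver asserting a 1, discharges the line. The `match` input stands for the result of the register address comparison. The electrical details are abstracted away. The model shows why the clear matters: stale ones would be ORed into the bypassed data.

```python
# Logic-level sketch of a bypassed read port with negative-true column
# lines. Assumption: `match` is supplied by the address comparison logic.

def column_value(enabled_cells, bypass_bit):
    """Logical value sensed on one precharged, negative-true column line."""
    level = 1                                # precharged high = logic 0
    if bypass_bit or any(enabled_cells):
        level = 0                            # discharged by a cell or bypass
    return 1 - level                         # low level reads as logic 1


def read_port(cell_word, bypass_word, match, width=32):
    """Value seen on a source bus when the addressed register holds cell_word."""
    out = 0
    for b in range(width):
        cell = (cell_word >> b) & 1
        drive = (bypass_word >> b) & 1 if match else 0
        out |= column_value([cell], drive) << b
    return out


# Register cleared before bypassing: the bus sees exactly the returning data.
print(hex(read_port(0x00000000, 0x1234, match=True)))  # 0x1234
# Without the clear, stale ones are ORed into the result.
print(hex(read_port(0x0000F000, 0x1234, match=True)))  # 0xf234
```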

RAM Array

The RAM Array logic block contains the Literal generation logic (19), the Register RAM Array (18), the address decoders (20, 22), and the register scoreboard bits (21).

The Register File provides 32 literals, values 0 through 31, for use by the programmer/microprogrammer. The Literal logic (19) that produces these values resides directly above the RAM Array. The RAM column lines (50, 58) pass through this section and continue to the Load Bypass logic block (16). Literals are not allowed as sources for Base and Store use. When a literal is requested as a Src1 or Src2 operand, its corresponding "register address" is placed on the S1Adr or S2Adr bus.

Destination Bypass

The Destination Bypass logic block (24) contains the circuitry to bypass the Dst bus (110), returning from the EU or REG coprocessors, onto the various outgoing ports: StData, Base, Src1, and Src2. The Destination bypass is virtually identical to the Load bypass (16), with some minor differences. The logic is actually simpler in the Destination bypass. Only two registers can possibly be bypassed, since the Dst bus (110) is only 64 bits wide. In the Load bypass, the register address comparison logic has to handle the possibility of four registers being bypassed, since the LdData bus (106) is 128 bits wide. Apart from these differences, the logic is almost identical to the Load bypass circuitry.
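The difference between the two bypass blocks comes down to how many returning registers each source address must be compared against. The number is the bus width divided by 32 bits. This sketch rests on two assumptions. The returning registers are taken to be consecutive, starting at the returning register address. `words` is taken to be the number actually returning on this access. The function names are hypothetical.

```python
# Sketch of the bypass register address comparison. Assumptions: returning
# registers are consecutive from the returning address, and `words` of them
# are valid on this access. Function names are hypothetical.

def max_bypassed(bus_bits):
    """One 32-bit register per bus word: 128-bit LdData -> 4, 64-bit Dst -> 2."""
    return bus_bits // 32


def match_signals(return_adr, words, bus_bits, src_adrs):
    """Per read port, True if its source register is among those returning."""
    if not 1 <= words <= max_bypassed(bus_bits):
        raise ValueError("more words than the bus can carry")
    returning = {return_adr + i for i in range(words)}
    return {port: adr in returning for port, adr in src_adrs.items()}


ports = {"StData": 9, "Base": 7, "Src1": 5, "Src2": 12}
print(match_signals(4, 4, 128, ports))  # quad-word Load: registers 4-7
# {'StData': False, 'Base': True, 'Src1': True, 'Src2': False}
print(match_signals(6, 2, 64, ports))   # 64-bit Dst result: registers 6-7
# {'StData': False, 'Base': True, 'Src1': False, 'Src2': False}
```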

Src1 and Src2 Muxes

The Src1 and Src2 Muxes (26) contain the multiplexers that select either one of the two words of the 64-bit source RAM data or the SFRInBus to drive the 32-bit operand on the Src1 and Src2 busses. The logic block also contains the buffers that drive the Src1Hi bus, providing a full 64-bit source when needed. Two address fields control the selection of one of the three possible sources as the single-word Src operand. The first is the LSB of S1Adr (or S2Adr). The second is the upper 2 bits of S1Adr (S2Adr); a value of "10" in this field tells the logic to enable the SFRInBus.

The Src1Hi bus is driven regardless of whether the data is required by the EU or REG coprocessors.
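The source operand selection can be sketched as follows. The assumptions are: bit 6 is the most significant bit of the 7-bit source register address; an address LSB of 1 selects the upper word of the 64-bit RAM data; and values are treated as logical, so the active-low polarity of the SFR bus is not modeled.

```python
# Sketch of the Src1/Src2 operand select. Assumptions: bit 6 is the MSB of
# the 7-bit source address, LSB = 1 selects the upper RAM word, and all
# values are logical (the SFR bus's active-low polarity is not modeled).

MASK32 = 0xFFFFFFFF


def src_select(s_adr, ram64, sfr_in):
    """Choose the 32-bit Src operand from two RAM words or the SFR bus."""
    if ((s_adr >> 5) & 0b11) == 0b10:    # upper two address bits are "10"
        return sfr_in & MASK32
    return (ram64 >> (32 * (s_adr & 1))) & MASK32


pair = (0xAAAA0000 << 32) | 0x0000BBBB
print(hex(src_select(0b0000010, pair, 0x55)))  # 0xbbbb (even: lower word)
print(hex(src_select(0b0000011, pair, 0x55)))  # 0xaaaa0000 (odd: upper word)
print(hex(src_select(0b1000000, pair, 0x55)))  # 0x55 (SFR bus selected)
```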

The following is a general description of the major busses and signals that connect the RF to the other logic blocks shown in FIGURE 1.

Memory Interface Busses

The following busses carry actual data to and from the RF.

LdData(0:127) This is the 128-bit Load Data bus, which returns information from the Memory Interface (external memory, Data Cache, etc.).

StData(0:127) This is the 128-bit Store Data bus which sends information to the Memory Interface.

Base(0:31) The Base bus is the 32-bit base address bus sent to the Memory Interface specifying the memory address of the Load or Store.

The following busses carry control and register address information, specifying type and location information for the above data busses. All register addresses are 7 bits.

BaseAdr This is the address of the register to be used to drive the Base bus.

LdAdrOut The Load Address Out bus is used in several cases. The IS sends it to the RF along with the opcode. On a Load instruction, it specifies the starting register to which data is to be returned and which is to be scoreboarded (on a quad-word access, for example, the first of the four registers). It is also used to specify the starting register to be sent on the StData bus on a Store instruction. Finally, it contains the address of the register to which data is returned on an LDA (Load Effective Address) instruction.

LdAdrIn This is the register address of the Load or LDA data returning from the Memory Interface or IS. It is driven when the data is ready to return to the register file.

TypeOut(0:3) This 4-bit field specifies the access length and also the type of extension used for sub-word accesses.

It is driven by the IS along with the opcode and the LdAdrOut bus. It is used to determine which registers to scoreboard (and check) on Loads, and which registers drive the StData bus on Stores.

Typein(0:3) This is the TypeOut field. It has been captured by the Memory Interface while waiting for the data to return, whether from the Data Cache or from external memory. It is returned along with the LdAdrIn bus.

LdStOut(0:3) This field determines which kind of memory operation is requested: Load, LDA, Store, or instruction fetch.

It is sent along with the Typein and LdAdrIn fields.

LdValid This signal driven by the Memory Interface is asserted when valid data is placed on the LdData bus.

MemScbOk This signal is driven by the RF. It indicates to the rest of the logic blocks that a register used by the current memory-type instruction is not free. The instruction must then be reissued when the register is no longer in use.
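The scoreboard check behind this signal can be sketched with one busy bit per register. This is a hypothetical model, not the circuit of SN 486,407. It assumes 32 scoreboarded registers (the 16 local plus 16 global registers) and a start/count interface, and its class and method names are illustrative.

```python
# Hypothetical scoreboard sketch: one busy bit per register (16 local +
# 16 global). Class/method names and the start/count interface are
# illustrative assumptions.

class Scoreboard:
    def __init__(self, nregs=32):
        self.busy = [False] * nregs

    def registers_free(self, start, count):
        """True when every register used by the instruction is free."""
        return not any(self.busy[start:start + count])

    def issue_load(self, start, count):
        """Scoreboard the destination registers; False means reissue later."""
        if not self.registers_free(start, count):
            return False
        for r in range(start, start + count):
            self.busy[r] = True
        return True

    def data_returned(self, start, count):
        """Clear the busy bits when the Load data has been written back."""
        for r in range(start, start + count):
            self.busy[r] = False


sb = Scoreboard()
print(sb.issue_load(4, 4))  # True: quad-word Load scoreboards r4-r7
print(sb.issue_load(6, 1))  # False: r6 still busy, so the instruction is reissued
sb.data_returned(4, 4)
print(sb.issue_load(6, 1))  # True
```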

Register Execution Busses

The following busses carry data to and from the RF.

Src1Hi, Src1 These two 32-bit busses form the 64-bit source operand 1, which is sent to the EU and coprocessors.

Src2Hi, Src2 These two 32-bit busses form the 64-bit source operand 2, which is sent to the EU and coprocessors.

DstHi#, DstLo# These constitute the 64-bit destination bus, which the EU and coprocessors use to return the result of the operation performed. Both busses are negative true.

SFRInBus(0:31)# This is the 32-bit Special Function Register bus, which allows external Core logic functions to be read as if they were registers. The RF allows the SFRInBus to drive the Src1 or Src2 bus when the register address field matches an SFR register address. It is also an active-low bus.

The following busses carry register address information to and from the RF. Each is 7 bits wide.

S1Adr This is the address of the register(s) used to drive the Src1 bus.

S2Adr This specifies the address of the register(s) used to drive the Src2 bus.

DstAdrOut This is the address of the registers that will be used to store the destination of the operation to be performed. It is used by the RF to scoreboard (and check) the appropriate registers.

DstAdrIn This is the register address for the data returning on the DstHi and DstLo busses.

While the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the scope of the invention.

Claims

1. A data processor connectable to a main memory and a microinstruction bus for carrying a current microinstruction, said data processor including: a Memory Interface (8) including a LdData bus (106); a RAM Array (18) including a plurality of word registers; Alignment logic (10) connected to said Memory Interface (8) and to said RAM Array (18) for arranging data destined for said Memory Interface (in the Store case) and for preparing data coming from said Memory Interface to be entered into the RAM Array (in the Load case); and means (62) for clearing said registers in said RAM Array immediately prior to said column lines being driven; the improvement characterized by: Load Bypass logic (16) for bypassing said LdData bus (106) returning from said Memory Interface (8) onto said column lines of the outgoing ports of said RAM Array: StData (58), Base (50), Src1 (54), and Src2 (56); said Load Bypass logic including register address comparison logic means for comparing register addresses of all the registers to which data is returning with the addresses of said requested source registers to thereby produce a match output signal; and means responsive to said match output signal for placing said data coming in from said Load alignment logic (10) directly onto the RAM column lines of said source.

2. In a pipelined microprocessor, the method of loading a multiported register file which has a plurality of RAM cells, comprising the steps of: A. enabling said RAM cells with decoded address data to thereby select particular cells to receive data; B. asserting a clear line to thereby set said selected cells to a zero state; C. precharging column lines negative true, such that zeros in the cells do not affect the state of the cell output lines; D. comparing the register addresses of all the registers to which data is returning with the addresses of the source registers requested; E. generating a match signal upon the condition that an identity exists; and F. placing the data on said data bus directly onto the source's RAM column lines for those cells for which a match signal is generated.

3. A data processor connectable to a main memory and a microinstruction bus for carrying a current microinstruction, substantially as hereinbefore described with reference to the accompanying drawings.

4. The method of loading a multiported register file which has a plurality of RAM cells, substantially as hereinbefore described.