WO1996019767A1 - Microprocessor for simultaneous execution of a plurality of programs

Info

Publication number: WO1996019767A1
Application number: PCT/EP1995/005051
Authority: WO
Inventors: Joachim Krucken
Original assignee: Motorola Gmbh
Priority date: 1994-12-22
Filing date: 1995-12-20
Publication date: 1996-06-27
Also published as: JPH10510936A; GB2296352A; GB9426474D0; EP0799446A1

Abstract

A microprocessor includes an ALU (1) coupled to receive instructions from an instruction latch and decoding unit (3) and data from a data memory (2). Two program counters (6 and 7) are coupled to the instruction memory (5) via a switching unit (8) controlled by a clock signal (9), such that instructions from the instruction memory (5) are multiplexed at the instruction latch and decoding unit (3) and instructions relating to two separate program flows are executed, in turn, by the ALU (1).

Description

Microprocessor for simultaneous execution of a plurality of programs.

Field Of The Invention

This invention relates to microprocessors, and more particularly to microprocessors having a so-called Harvard architecture (Harvard processors), in which there are separate memories for the program instructions and the data.

Background Of The Invention

In known Harvard processors, an Arithmetic Logic Unit (ALU) accepts program instructions (instruction fetch) and data (data fetch) from separate memories controlled by a program counter, so that the instructions and data are received consecutively to be operated on by the ALU (data process). To speed up processing, it is known to pipeline the operation by fetching the next instruction while the current data process is being executed. However, if the current data process produces a condition that causes the program flow to branch, the next instruction, which has already been fetched, is inappropriate and must be discarded before the correct (branched) instruction can be fetched, thereby wasting at least one fetch/process cycle.

Furthermore, in a normal (serial) program flow, it is difficult to arrange for two separate functions to run simultaneously. For example, in a communications interface processor, it is difficult to arrange for both the receiving function and the transmitting function to operate at the same time.

It is therefore an object of the present invention to provide a processor which overcomes, or at least reduces, the above-mentioned disadvantages of the prior art.

Brief Summary Of The Invention

Accordingly, the invention provides a processor comprising an ALU coupled to receive instructions from a program memory and data from a data memory, two or more program counters, each coupled to the program memory via a switching unit controlled by a clock signal such that instructions relating to two or more separate program flows are multiplexed at the ALU.

In a preferred embodiment, an instruction latch and decoding unit is coupled between the program memory and the ALU and between the program memory and the data memory. Preferably, the instruction latch and decoding unit is also coupled to a controller which updates the program counters.

Brief Description Of The Drawings

One embodiment of the invention will now be more fully described, by way of example, with reference to the drawings, of which:

FIG. 1 shows a schematic diagram of a conventional Harvard processor;

FIG. 2 shows the instruction and data timing of the processor of FIG. 1 with a linear program flow;

FIG. 3 shows the instruction and data timing of the processor of FIG. 1 with a program flow containing a conditional branch;

FIG. 4 shows a schematic diagram of a processor according to the present invention;

FIG. 5 shows the instruction and data timing of the processor of FIG. 4; and

FIG. 6 shows a timing diagram for the processor of FIG. 4.

Detailed Description

As shown in FIG. 1, a known Harvard processor consists of an ALU 1, which processes data received from a data memory 2 according to instruction codes received from an instruction latch 3 and outputs the result back to the data memory or register file 2. The instruction code and, in some cases, data are derived from the instruction latch 3, which includes instruction decoder logic. The instruction latch 3 includes an output coupled to the data memory 2, which provides the data memory 2 with the address of the data to be output to the ALU 1 for processing.

In order to allow program jumps or branches in the program flow, the instruction latch 3 also provides an output coupled to an input of an ALU 4, which handles the increment and branch calculations necessary to update a program counter 6. An output of the program counter 6 is coupled to an instruction memory 5 and provides the address of the next instruction to be used. The instruction memory 5 thus provides the next instruction to the instruction latch 3, where it is decoded and passed to the ALU 1, as described above. In order to achieve a high data throughput, the instruction fetch from the instruction memory 5 to the instruction latch 3, the instruction decode by the instruction decoder logic in the instruction latch 3, and the data processing by the ALU 1 according to the instruction code provided by the instruction latch 3 are performed in a pipelined manner, as explained above.

In particular, the basic instruction and data timing of the processor shown in FIG. 1 is shown in FIG. 2 as a simple one-stage pipeline. In FIG. 2, the top line shows timeslots 11 through 15, clocked by a clocking signal provided to the program counter 6, instruction memory 5, instruction latch 3 and ALU 1. The next line down shows the timeslots during which instructions FI1 through FI4 are fetched from the instruction memory 5 by the instruction latch 3. The lower line shows the timeslots during which the ALU 1 processes (executes) the data EI1 through EI4 corresponding to the fetched and decoded instructions. So, for example, in timeslot 13 the ALU 1 processes the data EI2 according to the instruction FI2 fetched during timeslot 12, while the instruction latch 3 fetches the next instruction FI3. As can be seen, FIG. 2 shows a linear program flow in which each execution of an instruction (data process) takes place in the timeslot following that in which the instruction was fetched, and the instructions are fetched and executed in linear serial order.
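The timing of FIG. 2 can be tabulated with a short sketch. This is an illustration only, written in Python; the function name `linear_schedule` and the dictionary layout are assumptions made for this example and do not appear in the specification. It reproduces the rule stated above, namely that each instruction is executed in the timeslot following its fetch.

```python
def linear_schedule(first_slot, n_instructions):
    """Return {timeslot: (fetch, execute)} for a one-stage fetch/execute pipeline.

    Hypothetical helper: instruction k is fetched as FIk in one timeslot and
    executed as EIk in the following timeslot, as described for FIG. 2.
    """
    schedule = {}
    for k in range(n_instructions + 1):
        slot = first_slot + k
        fetch = f"FI{k + 1}" if k < n_instructions else None   # nothing left to fetch in the last slot
        execute = f"EI{k}" if k >= 1 else None                  # nothing to execute in the first slot
        schedule[slot] = (fetch, execute)
    return schedule


schedule = linear_schedule(11, 4)
for slot, (fetch, execute) in schedule.items():
    print(slot, fetch or "-", execute or "-")
```

With timeslots 11 through 15 and four instructions, the entry for timeslot 13 pairs the fetch FI3 with the execution EI2, matching the example in the text.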

FIG. 3 shows a program flow similar to that of FIG. 2, but containing a conditional branch in the instruction FI2 fetched in timeslot 12. This condition can only be evaluated after the ALU 1 has finished the data execution EI2 in timeslot 13. The instruction fetch FI3 already performed in timeslot 13 therefore has to be discarded, since the next required instruction is not the one fetched in timeslot 13 but some other instruction, which now needs to be fetched in timeslot 14 before it can be executed in timeslot 15. For example, if instruction FI6 is required, it can only be executed as EI6 in timeslot 15. This results in a performance loss of one cycle (timeslot 14 in this example) per branch.
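The one-cycle loss described for FIG. 3 can be reproduced with a small behavioural model. This is a sketch under stated assumptions: the function `run_single_flow`, the `branches` dictionary (mapping an instruction number to the target of a taken branch) and the trace format are all hypothetical and chosen only for illustration.

```python
def run_single_flow(branches, n_execute, first_slot=11):
    """Model the fetch/execute pipeline of FIG. 1 for one program flow.

    branches: {instruction number: branch target} for taken branches only
              (a hypothetical encoding; all other instructions fall through).
    Returns (trace, wasted), where trace holds (slot, fetched, executed)
    tuples and wasted counts timeslots in which the ALU had nothing to execute.
    """
    pc, latch, slot = 1, None, first_slot
    trace, executed = [], 0
    while executed < n_execute:
        fetched = pc                 # the fetch always runs one instruction ahead
        current = latch              # instruction fetched in the previous timeslot
        if current is not None:
            executed += 1
        if current in branches:      # branch resolved only at the end of execution
            pc, latch = branches[current], None   # discard the prefetched instruction
        else:
            pc, latch = pc + 1, fetched
        trace.append((slot, fetched, current))
        slot += 1
    # The very first timeslot is pipeline fill, not a branch penalty.
    wasted = sum(1 for _, _, e in trace[1:] if e is None)
    return trace, wasted


trace, wasted = run_single_flow({2: 6}, n_execute=3)
for slot, fetched, current in trace:
    print(slot, f"FI{fetched}", f"EI{current}" if current else "-")
print("wasted timeslots:", wasted)
```

With a taken branch from instruction 2 to instruction 6, the model fetches FI3 in timeslot 13, discards it, fetches FI6 in timeslot 14 with no execution in that slot, and executes EI6 in timeslot 15.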

FIG. 4 shows the architecture of a processor according to one embodiment of the present invention, which allows two program flows to be executed in parallel. As shown in FIG. 4, this processor generally includes the same functional blocks as described with respect to FIG. 1, and these blocks have the same reference numerals as in FIG. 1. In addition to these blocks, however, there is a second program counter 7 connected in parallel with the first program counter 6, with a switch 8 provided between the two program counters 6 and 7 and the instruction memory 5. The two program counters 6 and 7 are switched using the switch 8 so that only one of them provides the address of the next instruction to be fetched. The switch 8 is timed using a simple clock signal 9, such that during one phase of this clock signal 9 the first program counter 6 provides the next address, and during the other phase the second program counter 7 provides the instruction address. Thus, the two program counters are, in effect, multiplexed so that each can control a separate program flow, and the two flows are thereby multiplexed together in the instruction fetch and execute flow.

This is best shown in FIG. 5, where the instruction fetch and data execution timing for two program flows A and B running in parallel is shown, with the top line showing the timeslots 21 through 29. Instruction fetches of program flow A are indicated by FIA1 through FIA4 and instruction executes of program flow A are indicated by EIA1 through EIA4, while instruction fetches and executes of program flow B are indicated by FIB1 through FIB4 and EIB1 through EIB4, respectively.

Thus, as can be seen, because the instruction fetches are multiplexed between program flow A and program flow B, the instruction being fetched in any one timeslot belongs to a different program flow from the instruction being executed in that timeslot. For example, in timeslot 24 the instruction being executed, EIA2, is from program flow A, whereas the instruction being fetched, FIB2, is from program flow B. Similarly, in the next timeslot 25 the instruction being executed, EIB2, is from program flow B, whereas the instruction being fetched, FIA3, is from program flow A. The program flows A and B are therefore not affected by conditional branching, since the next instruction of a flow is fetched only after the preceding ALU operation that sets the conditions for that program flow has finished.
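The interleaving of FIG. 5 can be written out as a table. The following Python sketch is illustrative only; `interleaved_schedule` and its parameters are names invented for this example, and the flow labels A and B follow the figure.

```python
def interleaved_schedule(first_slot=21, per_flow=4, flows=("A", "B")):
    """Return {timeslot: (fetch, execute)} for flows multiplexed in turn.

    Hypothetical helper: fetches alternate between the flows, and each
    instruction is executed in the timeslot after its fetch.
    """
    # Fetch order A1, B1, A2, B2, ... as selected by the switch.
    order = [f"{flow}{i}" for i in range(1, per_flow + 1) for flow in flows]
    schedule = {}
    for k in range(len(order) + 1):
        fetch = "FI" + order[k] if k < len(order) else None
        execute = "EI" + order[k - 1] if k >= 1 else None
        schedule[first_slot + k] = (fetch, execute)
    return schedule


schedule = interleaved_schedule()
for slot, (fetch, execute) in schedule.items():
    print(slot, fetch or "-", execute or "-")
```

In this table the fetch and the execution in any one timeslot always belong to different flows, and each flow's next fetch (for example FIA3 in timeslot 25) follows the execution of its previous instruction (EIA2 in timeslot 24).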

FIG. 6 shows the timing more precisely. In this figure, program flow A is indicated by single hatching, while program flow B is indicated by crosshatching. The signal SWITCH indicates which program counter is currently active (i.e. it indicates the position of the multiplexer switch 8).

As shown, during phase P00 the instruction memory 5 is precharged. Program counter 6 for program flow A is updated, i.e. left unaffected if the instruction is not a branch, or set to the next address by evaluating the branch condition, and the ALU 1 reads the operand data of the previous program flow B instruction. During the next phase P01, the instruction addressed by program counter 6 for program flow A is latched in the instruction latch 3, while program counter 7 for program flow B is incremented in anticipation of a linear program flow. The ALU 1 finalizes the calculation of the result and writes it back to the data memory 2. During phase P10, the instruction memory 5 is precharged in order to prepare the evaluation of the second program counter 7 for program flow B. The second program counter 7 is updated in case of a branch instruction, and the ALU 1 reads the instruction latch 3 for the instruction fetched during phase P01. During phase P11, the instruction addressed by the second program counter 7 is latched in the instruction latch 3. The first program counter 6 is incremented in anticipation of a linear program flow. The ALU 1 finalizes the calculation of the result and writes it back to the data memory 2.

It will therefore be apparent that, since each execution of a program flow is completed before the next instruction fetch of that program flow, there can be no problems with fetching unrequired instructions due to branching, and therefore no cycles are wasted.
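The absence of wasted cycles can be checked with a behavioural model at timeslot granularity. This sketch makes several simplifying assumptions that are not part of the specification: it does not model the precharge phases, it treats the pair of phases P00/P01 as one timeslot (fetch A, execute B) and P10/P11 as the next (fetch B, execute A), and the names `run_two_flows` and `programs` are hypothetical.

```python
def run_two_flows(programs, n_slots, first_slot=21):
    """Model two program counters multiplexed onto one instruction latch.

    programs: {"A": {instr: target}, "B": {...}} listing taken branches per
              flow (a hypothetical encoding; other instructions fall through).
    Returns (trace, wasted) with trace entries (slot, fetched, executed).
    """
    pcs = {"A": 1, "B": 1}           # program counters 6 and 7
    latch = None                     # (flow, instruction) held in the instruction latch
    trace = []
    for k in range(n_slots):
        flow = "AB"[k % 2]           # position of the switch in this timeslot
        fetched = (flow, pcs[flow])  # this counter is already final: its flow is not executing now
        executed = latch
        if executed is not None:
            eflow, instr = executed
            # Only the executing flow's counter is updated: branch target or next address.
            pcs[eflow] = programs[eflow].get(instr, instr + 1)
        latch = fetched              # nothing is ever discarded
        trace.append((first_slot + k, fetched, executed))
    wasted = sum(1 for _, _, e in trace[1:] if e is None)
    return trace, wasted


trace, wasted = run_two_flows({"A": {2: 6}, "B": {}}, n_slots=6)
for slot, fetched, executed in trace:
    print(slot, "fetch", fetched, "execute", executed)
print("wasted timeslots:", wasted)
```

In this run, flow A branches from instruction 2 to instruction 6. Its fetch sequence is 1, 2, 6 with no discarded fetch, because the branch is resolved in the timeslot in which flow B is being fetched.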

Thus, within one cycle of the clock (switch), two instructions and two ALU operations are executed, and performance loss due to nonlinear program flow is avoided with minimal overhead (only a second program counter and a simple multiplexer are necessary). Having two programs executed in parallel is also very useful for many applications, e.g. software execution of serial data transmission, where one program handles the data transmission and the other program handles the data reception. Many more applications, such as timing processing machines, can be effectively programmed using this architecture.

It will be appreciated that although only one particular embodiment of the invention has been described in detail, various modifications and improvements can be made by a person skilled in the art without departing from the scope of the present invention. For example, although the invention has been described with two program counters and two program flows, more than two program counters controlling more than two program flows can easily be implemented by switching all the program counters in turn.
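The extension to more than two program counters can be sketched as a round-robin switch. The Python below is an illustration under assumptions: the helper names are hypothetical, timeslots are counted from 0, and the check simply applies the rule from the description that each instruction executes in the timeslot after its fetch.

```python
from itertools import cycle, islice


def switch_order(n_counters, n_slots):
    """Index of the program counter selected by the switch in each timeslot."""
    return list(islice(cycle(range(n_counters)), n_slots))


def fetch_execute_slots(n_counters, flow, n_instructions):
    """(fetch slot, execute slot) for successive instructions of one flow."""
    return [(flow + i * n_counters, flow + i * n_counters + 1)
            for i in range(n_instructions)]


def fetch_never_overlaps_own_execute(n_counters, n_instructions=4):
    """True if every flow fetches its next instruction only after the
    previous instruction of that same flow has finished executing."""
    for flow in range(n_counters):
        slots = fetch_execute_slots(n_counters, flow, n_instructions)
        for (_, execute), (next_fetch, _) in zip(slots, slots[1:]):
            if next_fetch <= execute:
                return False
    return True


print(switch_order(3, 7))
for n in (1, 2, 3):
    print(n, "counter(s):", fetch_never_overlaps_own_execute(n))
```

Under these assumptions, a single counter corresponds to the conventional pipeline of FIG. 1, where a fetch overlaps the execution of the preceding instruction of the same flow, while two or more counters switched in turn keep the two apart.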

Claims

1. A processor comprising an ALU coupled to receive instructions from a program memory and data from a data memory, two or more program counters, each coupled to the program memory via a switching unit controlled by a clock signal such that instructions relating to two or more separate program flows are multiplexed at the ALU.

2. A processor according to claim 1, further comprising an instruction latch and decoding unit coupled between the program memory and the ALU and between the program memory and the data memory.

3. A processor according to claim 2, further comprising a controller coupled between the instruction latch and decoding unit and the program counters to update the program counters.