In modern microcomputer chip design, reduced instruction set (RISC) pipeline architectures are becoming increasingly popular; the purpose of adopting a pipeline is to increase execution speed. An example illustrates this. Suppose that, under a given process technology, a microcomputer chip needs 1 μs on average to execute one instruction. Without a pipeline, its average execution speed is 1 instruction per 1 μs; with a 4-stage pipeline, its average execution speed is 4 instructions per 1 μs. The working principle of a RISC pipeline microcontroller is described below, taking a 4-stage pipeline as the example.
The working principle of the RISC pipeline microcontroller is described with reference to Fig. 1.
Stage 1, instruction fetch (IF stage): the program counter 101 supplies an address P1 to the instruction read-only memory (ROM) 102, which outputs the instruction vector P2 corresponding to that address. The instruction vector comprises an operand or operand address, a destination address, an operation control code, and so on.
Stage 2, operand read (RD stage): the instruction vector P2 is delayed by one clock cycle in the instruction register 103 and then passed from the register's output 103a to the instruction decoder 105. The decoder 105 decodes the incoming instruction vector and outputs the microcode vectors P7, P8, P9, P10, P11, etc. Vectors P7 and P8 are the addresses of the two operands. According to these addresses, the corresponding operands P13 and P14 are fetched from the random-access memory (RAM) 107 and delivered to the inputs of the data selection and forwarding circuit 109. Under the control vector P15 from the data dependency detection circuit 108, the circuit 109 selects the two appropriate operands and delivers them to the arithmetic logic unit 111 through its outputs P16 and P17.
Stage 3, execute (EX stage): the arithmetic logic unit (ALU) 111 performs the operation specified by the operation control code P11 on operands P16 and P17 and delivers the result through its output 111a to the write-result register 106.
Stage 4, write result (WR stage): the result held in the write-result register 106 is written into the RAM 107 through the register's output 106a.
The operation of the above four-stage pipeline can be followed in Table 1 below.
In cycle 0, instruction 1 is in the instruction fetch stage and is fetched from the instruction memory 102. In cycle 1, instruction 1 moves to the operand read stage, where operands A and B are read from the addresses specified in instruction 1; at the same time, instruction 2 enters the instruction fetch stage. In cycle 2, instruction 1 enters the execute stage: its operation code is supplied to the ALU 111, which performs the specified operation. Instruction 2 moves to the operand read stage, and instruction 3 enters the instruction fetch stage. Finally, in cycle 3, instruction 1 enters the write-result stage, and the result from the ALU 111 is written into the storage unit specified by the destination address.
Table 1
Cycle | Instruction fetch (IF) | Operand read (RD) | Execute (EX) | Write result (WR) |
0 | Instruction 1 | | | |
1 | Instruction 2 | Instruction 1 | | |
2 | Instruction 3 | Instruction 2 | Instruction 1 | |
3 | Instruction 4 | Instruction 3 | Instruction 2 | Instruction 1 |
4 | Instruction 5 | Instruction 4 | Instruction 3 | Instruction 2 |
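The stage occupancy in the table follows a simple rule: the instruction fetched in cycle 0 advances one stage per cycle. The following is a minimal Python sketch of that rule, not part of the described circuit; the function name `pipeline_table` and the stage abbreviations are chosen here for illustration only.

```python
STAGES = ["IF", "RD", "EX", "WR"]

def pipeline_table(num_instructions, num_cycles):
    """Return, per cycle, the instruction number in each stage (None if empty)."""
    rows = []
    for cycle in range(num_cycles):
        row = []
        for stage_index in range(len(STAGES)):
            # Instruction 1 is fetched in cycle 0 and advances one stage per cycle.
            instr = cycle - stage_index + 1
            row.append(instr if 1 <= instr <= num_instructions else None)
        rows.append(row)
    return rows

# Print the occupancy for five instructions over cycles 0 to 4.
for cycle, row in enumerate(pipeline_table(5, 5)):
    cells = ["Instruction %d" % i if i else "" for i in row]
    print(cycle, "|", " | ".join(cells))
```

Each printed row lists the IF, RD, EX, and WR columns in that order for one cycle.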
The pipeline chart above shows that by the time the result of instruction 1 is written to the storage unit, the operands of instructions 2 and 3 are being read or have already been read. If instruction 2 or 3 needs the result of instruction 1 in its operand read or execute stage, that result must be delivered to the input of the ALU 111 ahead of time; otherwise a data hazard occurs and a wrong result is obtained. For example, if instruction 2 uses the result of instruction 1, then in cycle 3 the result of instruction 1 must be forwarded to the execute stage of instruction 2 at the same time as it is written to memory. This forwarding is implemented as follows: the operand addresses or locations in the operand read and execute stages are compared with the destination address or location in the write-result stage, and a set of multiplexers passes the appropriate data to the ALU 111.
Consider the following program fragment:
SUB A, B, C----A-B, the result is stored in C
ADD D, C, F----D+C, the result is stored in F
Instruction X
Instruction Y
The execution order of the above program fragment can be shown by the pipeline chart in Table 2 below.
Table 2
Cycle | Instruction fetch (IF) | Operand read (RD) | Execute (EX) | Write result (WR) |
0 | SUB A, B, C | | | |
1 | ADD D, C, F | SUB A, B, C | | |
2 | Instruction X | ADD D, C, F | SUB A, B, C | |
3 | Instruction Y | Instruction X | ADD D, C, F | SUB A, B, C |
In cycle 3, the result of A-B is written into C, while in the execute stage of the same cycle C is used to compute D+C. The normal operand read path must therefore be bypassed, and the value of C in the write-result stage must be delivered to the input of the ALU in the execute stage; otherwise the expected result is not obtained.
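The effect of the bypass can be illustrated with a small timing model. This is a hedged sketch, not the circuit itself: it assumes that instruction j reads its operands one cycle after fetch and that a result becomes visible in memory only after its write-result cycle, so the results of the two preceding instructions are not yet readable. The function `run`, the tuple layout, and the sample memory values are all hypothetical.

```python
OPS = {"SUB": lambda x, y: x - y, "ADD": lambda x, y: x + y}

def run(program, memory, forwarding):
    """Run (op, src1, src2, dest) instructions under a 4-stage timing model."""
    mem = dict(memory)
    results = []  # results[i] = (dest, value) produced by instruction i
    for j, (op, src1, src2, dest) in enumerate(program):
        # Only instruction j-3 (and older) has finished writing before j reads.
        if j >= 3:
            d, v = results[j - 3]
            mem[d] = v
        operands = [mem[src1], mem[src2]]
        if forwarding:
            # Compare source addresses with the destinations of the two
            # preceding instructions; the newest match wins.
            for older in (j - 2, j - 1):
                if older >= 0:
                    d, v = results[older]
                    for k, src in enumerate((src1, src2)):
                        if src == d:
                            operands[k] = v
        results.append((dest, OPS[op](*operands)))
    # Drain the pipeline: commit the writes that are still in flight.
    for d, v in results[max(0, len(program) - 3):]:
        mem[d] = v
    return mem

program = [("SUB", "A", "B", "C"), ("ADD", "D", "C", "F")]
memory = {"A": 9, "B": 4, "C": 0, "D": 10, "F": 0}
print(run(program, memory, forwarding=True)["F"])   # 15: uses the new C = 5
print(run(program, memory, forwarding=False)["F"])  # 10: uses the stale C = 0
```

With forwarding disabled, ADD computes D+C from the stale value of C, which is the wrong result the text warns about.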
In a complex instruction set (CISC) microcontroller, the executions of adjacent instructions do not overlap in time: the next instruction is fetched from the instruction memory and begins execution only after the current instruction has completed. In a RISC pipeline microcontroller, by contrast, the executions of adjacent instructions overlap in time. If an operand of an instruction is the result of one of the preceding one or two instructions (a situation called a data dependency), that result must be forwarded to the execute stage; otherwise a data hazard occurs and the expected result is not obtained. To transfer data correctly between stages, the operand addresses of the instructions in the operand read and execute stages must be compared with the destination address of the instruction in the write-result stage; the signals produced by the comparison then control a set of multiplexers, which route the appropriate operand to the data inputs of the ALU. To perform this transfer reliably, a data dependency detection circuit is needed to determine whether adjacent instructions are data dependent.
At present, the data dependency detection circuit used in a typical RISC pipeline controller requires four address comparators in addition to other combinational and sequential logic.
The present invention proposes a new data dependency detection and data selection and forwarding circuit for a pipelined microcontroller or central processing unit.
The invention provides a new RISC pipeline microcontroller comprising a data dependency detection circuit and a data selection and forwarding circuit. The data dependency detection circuit comprises a first and a second address comparator. The first address comparator compares the operand address of the instruction in the operand read stage of the microcontroller with the destination address of the instruction in the write-result stage. The second address comparator compares the operand address of the instruction in the execute stage of the microcontroller with the operand address of the instruction in the write-result stage. According to the comparison results of the data dependency detection circuit, the data selection and forwarding circuit selectively outputs data from the accumulator, the random-access memory, the instruction decoder, the write-result register, and the ALU of the microcontroller to the ALU.
The instruction word of the microcontroller of the invention is defined using three instruction fields: one field specifies an immediate value, an operand address, or a destination address; another field specifies the operation code of the operation to perform; and a third field holds an identification control code that identifies the operand addresses or locations and the destination address or location in the instruction, and thereby controls data transfer between the pipeline stages of the RISC pipeline.
For example, with an instruction length of 17 bits, three bits serve as the identification control code that identifies the operand addresses or locations and the destination address or location in the instruction, and thereby controls data transfer between pipeline stages; eight bits hold the immediate value, operand address, or destination address used by the instruction; and six bits hold the operation code.
For example, for arithmetic, logic, or cyclic shift operations, the lowest 8 bits of the instruction (bits 0 to 7) hold the immediate value, operand address, or destination address; bits 8 to 13 hold the operation code; and bits 14 to 16 identify the operand locations and the destination in the instruction. When bit 16 is '1', one operand is read from the accumulator; when bit 16 is '0', no operand comes from the accumulator. When bit 15 is '1', one operand is read from the RAM; when bit 15 is '0', no operand is read from the RAM. When bit 14 is '1', the destination is a RAM unit; when bit 14 is '0', the destination to which the result is written is the accumulator. Other instruction definition schemes will of course be apparent to those skilled in the art; for clarity, the invention is described using the scheme above, which should not be regarded as a limitation of the invention.
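The 17-bit layout described above can be sketched as a field decoder. This is an illustrative Python sketch only; the function name `decode`, the dictionary keys, and the sample opcode value 5 are assumptions, since no opcode table is given here.

```python
def decode(instruction):
    """Split a 17-bit instruction word into the fields described above."""
    if not 0 <= instruction < (1 << 17):
        raise ValueError("instruction must fit in 17 bits")
    return {
        "operand_field": instruction & 0xFF,            # bits 0-7: immediate or address
        "opcode": (instruction >> 8) & 0x3F,            # bits 8-13: operation code
        "dest_is_ram": bool((instruction >> 14) & 1),   # bit 14: 1 = RAM, 0 = accumulator
        "reads_ram": bool((instruction >> 15) & 1),     # bit 15: an operand comes from RAM
        "reads_acc": bool((instruction >> 16) & 1),     # bit 16: an operand comes from the accumulator
    }

# Hypothetical word: accumulator and RAM operands, RAM destination,
# opcode 5 (made-up value), address field 0x2A.
word = (1 << 16) | (1 << 15) | (1 << 14) | (5 << 8) | 0x2A
print(decode(word))
```

The three flag bits are exactly the information that the dependency logic needs from each instruction: where its operands come from and where its result goes.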
The technical solution of the invention is illustrated in Fig. 2. Referring to Fig. 2, the input signals of the data dependency detection circuit are: the top three bits (bits 14 to 16) of the instruction in the RD stage [n1 (i16), n5 (i15), and n3 (i14) in Fig. 2]; its lowest 8 bits (bits 0 to 7) [n7, or i(7:0), in Fig. 2]; the clock signal n4 (clk); and the control signals n2 (e_alu) and n6 (w_alu). Signals e_alu and w_alu indicate the operation class of the instructions in the EX and WR stages respectively: when e_alu or w_alu is '1', the corresponding instruction performs an arithmetic, logic, or cyclic shift operation; otherwise it performs some other operation (such as jump control).
Signal i14 passes through register REG1, which after a delay of one clock cycle outputs signal n9 and its inverse n8; n9 then passes through register REG3, which after a further delay of one clock cycle outputs n15 and its inverse n14. In the data selection and forwarding circuit 209, input signals n19 to n23 carry data from five different stages and sources: n19 is the result w_result of the write-result stage; n20 is the result e_result of the execute stage; n21 is data from the RAM; n22 is data from the accumulator; and n23 is the immediate value.
From the definitions of i14, i15, and i16 and the working principle of the pipelined microcontroller given above, together with Fig. 2, it follows that n8 = '1' means the final write destination of the EX-stage instruction is the accumulator, and n14 = '1' means the final write destination of the WR-stage instruction is the accumulator. When n1, n2, and n8 are all '1', the output of AND gate AND1 is n12 = '1'. This indicates that the accumulator operand of the RD-stage instruction is the result of the EX-stage instruction, so that result must be forwarded to the A port of the ALU 211; otherwise a wrong result is obtained. n12 is connected to the control input of multiplexer (MUX) 1, so when n12 = '1' the EX-stage result n20 (e_result) is passed to the output n24 of MUX 1. Because n25 is also '1' when n12 = '1', e_result is passed on to the output n27 of MUX 5, and n27 is delivered to the A port of the ALU 211. Likewise, when n1, n6, and n14 are all '1', the output of AND3 is n18 = '1', indicating that the accumulator operand of the RD-stage instruction is the result of the WR-stage instruction. In that case, if n12 = '0', the WR-stage result n19 (w_result) is delivered to the A port of the ALU 211. If n12 and n18 are both '1', only e_result is delivered to the A port of the ALU 211, as is readily seen from the circuit in Fig. 2.
Continuing with Fig. 2, the address n7 in the RD-stage instruction (here and below, an address may be either an operand address or a destination address) and the address n11 in the EX-stage instruction are fed to address comparator COM1. If n2 (e_alu) is '1', COM1 compares the two addresses: if they are equal, the comparator output is n10 = '1'; otherwise n10 = '0'.
n9 = '1' means the final write destination of the EX-stage instruction is a RAM unit. Similarly, n15 = '1' means the final write destination of the WR-stage instruction is a RAM unit. When n9, n5, and n10 are all '1', n13 = '1', indicating that the RAM operand of the RD-stage instruction is the result of the EX-stage instruction. That result must therefore be forwarded to the B port of the ALU 211; otherwise a data hazard occurs and a wrong result is obtained.
When n13 = '1', MUX 3 passes e_result to its output n30. Because the control input of MUX 6 is then also '1', MUX 6 selects n30, so e_result is passed on to its output n32 and delivered to the B port of the ALU 211. Likewise, if n15, n5, and n16 are all '1', then n28 = '1', indicating that the RAM operand of the RD-stage instruction is the result of the WR-stage instruction. In that case, if n13 = '0', w_result is delivered to the B port of the ALU 211. If n13 and n28 are both '1', only e_result is delivered to the B port of the ALU 211.
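The selection rules for both ALU ports can be summarized in a short behavioral sketch. It is a hedged Python model of the logic as described, not a netlist of Fig. 2: the parameter names are hypothetical, n16 is assumed to be the output of a second comparator that matches the RD-stage address against the WR-stage address when w_alu is '1', and "normal" stands for the ordinary read path whose exact source is not modeled.

```python
def forwarding_selects(i16, i15, e_alu, w_alu, ex_dest_is_ram, wr_dest_is_ram,
                       rd_addr, ex_addr, wr_addr):
    """Return the data sources chosen for ALU ports A and B."""
    # A port: accumulator operand of the RD-stage instruction.
    n12 = i16 and e_alu and not ex_dest_is_ram   # n1 AND n2 AND n8
    n18 = i16 and w_alu and not wr_dest_is_ram   # n1 AND n6 AND n14
    # B port: RAM operand of the RD-stage instruction.
    n10 = e_alu and rd_addr == ex_addr           # COM1 output
    n16 = w_alu and rd_addr == wr_addr           # assumed second comparator output
    n13 = ex_dest_is_ram and i15 and n10         # n9 AND n5 AND n10
    n28 = wr_dest_is_ram and i15 and n16         # n15 AND n5 AND n16
    # The EX-stage result takes priority over the WR-stage result.
    port_a = "e_result" if n12 else "w_result" if n18 else "normal"
    port_b = "e_result" if n13 else "w_result" if n28 else "normal"
    return port_a, port_b

# RD-stage instruction reads the accumulator and a RAM unit at address 0x10;
# the EX-stage instruction writes the accumulator, the WR stage is idle.
print(forwarding_selects(True, True, True, False, False, False, 0x10, 0x05, 0x00))
```

Giving e_result priority reflects that the EX-stage instruction is the more recent writer, so its value is the one the RD-stage instruction must see.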
As described above, compared with the background art the invention has the following advantages: the circuit is simpler and therefore more reliable; and the chip area of a RISC pipeline microcontroller using the invention is smaller, so its cost is correspondingly lower.
An implementation of the invention in a RISC pipeline microcontroller is now described with reference to Fig. 3.
(1) Instruction fetch stage (clock cycle 0): the program counter 201 outputs the instruction address Q1 according to the control signal Q3 output by the control unit 204. The instruction memory 202 outputs the corresponding instruction vector Q2 (instruction length 17 bits) according to address Q1.
(2) Operand read stage (clock cycle 1): the instruction vector Q2 is delayed by one clock cycle in the instruction register 203 and then passed from the register's output 203a to the instruction decoder 205. The instruction decoder 205 decodes the incoming instruction vector and outputs the microcode vectors Q7, Q9, Q10, and Q11. From the microcode vectors supplied by the instruction decoder 205, the data dependency detection circuit 208 determines the data dependencies among the stages of the instruction pipeline and accordingly outputs the data selection and forwarding control code vector Q15. According to Q15, the data selection and forwarding circuit 209 selects two appropriate data items from the following five sources, which come from different stages, and passes them to the A port and B port of the arithmetic logic unit (ALU) 211 respectively:
A. the result output of the ALU 211 at its output 211a;
B. the output Q13 of the accumulator 212;
C. the output of the write-result register 206 at its output 206a;
D. the data output Q14 of the random-access memory 207;
E. the output vector Q9 of the instruction decoder 205 (the immediate value).
(3) Execute stage: the ALU 211 performs the operation specified by the operation control code vector Q18 on input operands Q16 and Q17 and delivers the result from its output 211a to the write-result register 206.
(4) Write-result stage: the result held in the write-result register 206 is written through its output 206a into the random-access memory 207 or the accumulator 212.
Two specific embodiments of the invention follow.
Example 1. Consider the following two instructions:
ADD A, X
SUB [m1], A
The first instruction adds the immediate value X to the data in the accumulator A and stores the result in A. The second instruction performs a subtraction between the data in the RAM unit at address m1 and the data in the accumulator A, and stores the result in the RAM unit at address m1. Because an operand of the second instruction (the data in the accumulator) is the result of the first instruction, the two instructions are data dependent. Therefore, when the second instruction is in the operand read stage, the data dependency detection circuit 208 outputs the corresponding data selection control code Q15, which causes the data selection and forwarding circuit 209 to forward the result of the first instruction at output 211a directly, as Q16, to the A port of the ALU 211.
Example 2. Consider the following three instructions:
ADD [m1], A
INC [m2]
SUB A, [m1]
The first instruction adds the data in the RAM unit at address m1 to the data in the accumulator and stores the result in the RAM unit at address m1. The second instruction adds 1 to the data in the RAM unit at address m2 and stores the result in the RAM unit at address m2. The third instruction performs a subtraction between the data in the accumulator and the data in the RAM unit at address m1, and stores the result in the accumulator. In this example, the first and second instructions are not data dependent, nor are the second and third. The first and third instructions, however, are data dependent: the operand [m1] of the third instruction (the data in the RAM unit at address m1) is the result of the first instruction. When the third instruction is in the operand read stage, the first instruction is in the write-result stage, but its result has not yet been written into the RAM unit at address m1. Therefore, when the third instruction is in the operand read stage, the data dependency detection circuit 208 outputs the corresponding data selection control code Q15, which causes the data selection and forwarding circuit 209 to deliver the result of the first instruction at output 206a, as Q17, to the B port of the ALU 211.
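The dependency reasoning in this example can be traced with a small sketch that works from pipeline distance alone. It is an illustrative Python model under stated assumptions: the third instruction's RAM operand is taken to be [m1], as the description above states; the dictionary layout and the function `forwarding_source` are hypothetical; and the e_alu/w_alu gating is omitted because all three instructions are arithmetic operations.

```python
def forwarding_source(program, j, operand):
    """Where instruction j takes `operand` from while it is in the RD stage."""
    # Instruction j-1 is in the EX stage while j is in the RD stage.
    if j >= 1 and program[j - 1]["dest"] == operand:
        return "EX result (output 211a)"
    # Instruction j-2 is in the WR stage while j is in the RD stage.
    if j >= 2 and program[j - 2]["dest"] == operand:
        return "write-result register (output 206a)"
    return "normal read path"

example_two = [
    {"text": "ADD [m1], A", "dest": "m1", "reads": ["m1", "A"]},
    {"text": "INC [m2]", "dest": "m2", "reads": ["m2"]},
    {"text": "SUB A, [m1]", "dest": "A", "reads": ["A", "m1"]},
]

# Report the source of every operand of every instruction.
for j, instr in enumerate(example_two):
    for operand in instr["reads"]:
        print(instr["text"], "|", operand, "|", forwarding_source(example_two, j, operand))
```

Only the [m1] operand of the third instruction is reported as coming from the write-result register; every other operand uses the normal read path.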
The above examples are only particular instances of the invention and must not be regarded as limiting it.