JPH09311786A - Data processor - Google Patents

Data processor

Info

Publication number
JPH09311786A
Authority
JP
Japan
Prior art keywords
instruction
stage
instructions
data processing
operand
Legal status
Granted
Application number
JP5277297A
Other languages
Japanese (ja)
Inventor
Yasuhiko Saito
Masahiro Uminaga
靖彦 斎藤
正博 海永
Original Assignee
Hitachi Ltd
株式会社日立製作所
Priority to JP6057196
Priority to JP8-60571
Application filed by Hitachi Ltd (株式会社日立製作所)
Publication of JPH09311786A


Abstract

PROBLEM TO BE SOLVED: To reduce pipeline stalls caused by data hazards in a superscalar system and improve processing speed by converting instructions in a first instruction format stored in an instruction memory into instructions in a second instruction format. SOLUTION: Instructions are fetched from the instruction memory in a first stage 101, and the fetched instructions are decoded in a second stage 103. The decoded instructions are executed in a third stage, and the execution results are written to registers in a fourth stage 107. In this pipeline, instructions in the first instruction format stored in the instruction memory are converted into instructions in the second instruction format and then executed. Consequently, pipeline stalls caused by data hazards in the superscalar system are reduced and the processing speed is improved.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a data processing device such as a microprocessor or a microcomputer, and more particularly to a technique effective when applied to a data processing device, such as a superscalar processor, that performs parallel processing.

[0002]

2. Description of the Related Art A microprocessor (a CPU (Central Processing Unit), microcomputer, etc.) sequentially fetches, decodes, and executes a sequence of instructions. To simplify the decoding circuit, the instructions executed by microprocessors are now commonly of fixed length. A microprocessor that executes fixed-length instructions in a pipeline (pipelining) is called a RISC (Reduced Instruction Set Computer) type processor.

FIG. 1 shows a pipelined implementation of a microprocessor. For simplicity, the memory access stage (MEM) that normally exists is omitted. Each of the stages (101, 103, 105, 107) operates in one unit of time (clock), and each instruction is processed by passing in turn from the first stage to the last stage through the latch groups (102, 104, 106). The first stage 101 fetches an instruction (IF). The second stage 103 decodes the instruction and reads registers (ID). The third stage 105 executes the operation specified by the instruction (EX). The fourth stage 107 writes the result, via the signal line 108, to the register file located in the second stage 103 (WB).

FIG. 2 is a conceptual diagram of four instructions processed in the pipeline. If a subsequent instruction uses a register written by a preceding instruction, the subsequent instruction must wait, leaving an empty slot in the pipeline (this is called a pipeline stall due to a data hazard). This state is shown in (a) of FIG. 2, where the two arrows pointing to the lower left indicate that the subsequent instruction reads the register only after the preceding instruction has written it.

As a means of solving this problem, when a subsequent instruction uses the result of the preceding operation, that value is also sent via the signal line 108 to the arithmetic unit in the third stage 105. The signal lines 109 and 110 are the control lines for this. This technique is known as forwarding, and it allows an instruction to be executed every clock. The two arrows pointing to the lower left in (b) of FIG. 2 indicate forwarding. Thus, although individual instruction processing takes, for example, four clocks, each stage handles a new instruction every clock, so the throughput is one instruction per clock. Because one instruction can be executed per clock, the fewer instructions a given process (program) executes, the shorter its execution time.
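The stall and forwarding behavior described above can be illustrated with a small cycle-count model. This is a minimal Python sketch, not the patent's circuit: it assumes an in-order IF/ID/EX/WB pipeline in which, without forwarding, a register can be read in ID only in the clock after the producer's WB, while with forwarding the result is available to the very next EX. The names `Insn` and `total_cycles` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    dest: str               # register written by the instruction
    srcs: tuple[str, ...]   # registers read by the instruction


def total_cycles(prog: list[Insn], forwarding: bool) -> int:
    """Clock of the last WB in an in-order 4-stage IF/ID/EX/WB pipeline."""
    ex_cycle: list[int] = []
    for i, insn in enumerate(prog):
        # With no hazards, instruction i reaches EX in clock 3 + i.
        earliest = 3 if i == 0 else ex_cycle[-1] + 1
        for j in range(i):
            if prog[j].dest in insn.srcs:
                if forwarding:
                    # The EX result is sent straight to the next EX.
                    earliest = max(earliest, ex_cycle[j] + 1)
                else:
                    # ID must follow the producer's WB (its EX + 1),
                    # so EX can be no earlier than the producer's EX + 3.
                    earliest = max(earliest, ex_cycle[j] + 3)
        ex_cycle.append(earliest)
    return ex_cycle[-1] + 1  # WB follows EX


# mov R3, R2 followed by add R4, R2 (a 2-operand add also reads R2).
dep = [Insn("R2", ("R3",)), Insn("R2", ("R4", "R2"))]
indep = [Insn("R2", ("R3",)), Insn("R5", ())]
print(total_cycles(dep, forwarding=False))    # 7
print(total_cycles(dep, forwarding=True))     # 5
print(total_cycles(indep, forwarding=False))  # 5
```

With forwarding, the dependent pair finishes as quickly as an independent pair, which is the one-instruction-per-clock throughput described in the text.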

For pipelining and forwarding, see Hennessy et al., "Computer Organization and Design", Chapter 6, "Enhancing Performance with Pipelining" (pages 362 to 450), Morgan Kaufmann Publishers, 1994.

Next, as an example of a method for improving the processing speed of a microprocessor, the superscalar method (superscalar) is described. In the superscalar method, there are several arithmetic units that can operate at the same time, for example two, and correspondingly two instructions can be fetched and two instructions decoded at once. In this case, when there is no data dependence, as shown in FIG. 3A, two instructions can ideally be executed every clock, so the execution time is half that of the ordinary pipeline method. For the superscalar method, see Nikkei Electronics, November 27, 1989 issue (No. 487), pages 191 to 200, "Next-generation RISC aiming for 100 MIPS in CMOS by introducing parallel processing".

Conventional RISC-type microprocessors that adopt the superscalar system have a fixed instruction length of 4 bytes, and arithmetic instructions generally have three operands. An example is disclosed in Japanese Patent Laid-Open No. 2-130634. On the other hand, there are RISC-type microprocessors with 2-byte fixed-length instructions, which improve code efficiency (reduce the amount of memory used to store instructions). However, RISC-type microprocessors with 2-byte fixed-length instructions do not employ the superscalar system. An example is described in JP-A-5-197546.

[0009]

The problems caused by the superscalar method are explained below with reference to FIG. 3. The operations of the instructions shown in FIG. 3 are as follows.

(1) mov R3, R2: copy the contents of register R3 to register R2.
(2) mov #32, R5: copy the immediate data #32 to register R5.
(3) add R4, R2: add the contents of registers R4 and R2 and store the result in R2.
(4) and R3, R5: AND the contents of registers R3 and R5 and store the result in R5.

Instructions (1) and (2), and instructions (3) and (4), have no data dependency (data flow). However, instructions (1) and (3), and instructions (2) and (4), do have a data dependency: instructions (1) and (3) both use register R2, and instructions (2) and (4) both use register R5. Therefore, instruction (3) must be executed after instruction (1), and instruction (4) must be executed after instruction (2).
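The dependency relationships listed above can be reproduced by comparing register read and write sets. This is a minimal Python sketch, assuming that two instructions conflict when the second reads or overwrites a register that the first writes, and that a two-input 2-operand instruction also reads its D register; `Insn`, `two_op`, and `dependent` are hypothetical names.

```python
from typing import NamedTuple


class Insn(NamedTuple):
    text: str
    reads: frozenset
    writes: frozenset


def two_op(text: str, op: str, src: str, dst: str) -> Insn:
    """Model a 2-operand instruction 'op src, dst'.

    mov only copies src into dst; two-input operations such as add and and
    also read dst, because D is both the second input and the output.
    Operands starting with '#' are immediates, not registers.
    """
    reads = set() if src.startswith("#") else {src}
    if op != "mov":
        reads.add(dst)
    return Insn(text, frozenset(reads), frozenset({dst}))


def dependent(first: Insn, second: Insn) -> bool:
    """True if second reads or overwrites a register that first writes."""
    return bool(first.writes & (second.reads | second.writes))


i1 = two_op("(1) mov R3, R2", "mov", "R3", "R2")
i2 = two_op("(2) mov #32, R5", "mov", "#32", "R5")
i3 = two_op("(3) add R4, R2", "add", "R4", "R2")
i4 = two_op("(4) and R3, R5", "and", "R3", "R5")

for a, b in [(i1, i2), (i3, i4), (i1, i3), (i2, i4)]:
    print(a.text, "->", b.text, ":", dependent(a, b))
```

Pairs (1)/(2) and (3)/(4) report no dependency, while (1)/(3) and (2)/(4) do, matching the text.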

That is, when there is no data dependency between the instructions executed at the same time, there is no gap in the pipeline, as shown in FIG. 3A, and the processing speed is twice that obtained when executing one instruction at a time. However, if there is a data dependency between the instructions executed at the same time, the pipeline is disturbed, as shown in FIG. 3B, and the processing speed becomes the same as when executing one instruction at a time.

Therefore, as shown in FIG. 3C, when there is a data dependency between instructions to be executed at the same time, one conceivable method is to defer the subsequent instruction to the next pipeline slot and, in its place, execute a no-operation instruction nop alongside the preceding instruction, avoiding disruption of the pipeline. However, this adds useless instructions, increasing the total number of executed instructions and therefore the execution time.
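The nop-filling method of FIG. 3C can be sketched as a greedy in-order dual-issue loop. This is a minimal Python sketch, assuming that results are forwarded between consecutive clocks (so only same-clock pairs matter) and that instructions are described by their read and write register sets; `dependent` and `dual_issue_with_nop` are hypothetical names.

```python
# (label, registers read, registers written) for the FIG. 3 example instructions.
I1 = ("(1)", {"R3"}, {"R2"})        # mov R3, R2
I2 = ("(2)", set(), {"R5"})         # mov #32, R5
I3 = ("(3)", {"R4", "R2"}, {"R2"})  # add R4, R2
I4 = ("(4)", {"R3", "R5"}, {"R5"})  # and R3, R5


def dependent(a, b):
    """b reads or overwrites a register that a writes."""
    return bool(a[2] & (b[1] | b[2]))


def dual_issue_with_nop(prog):
    """Greedy in-order dual issue; a dependent second slot is filled with nop.

    Returns one (slot0, slot1) label pair per clock.
    """
    clocks = []
    i = 0
    while i < len(prog):
        first = prog[i]
        if i + 1 < len(prog) and not dependent(first, prog[i + 1]):
            clocks.append((first[0], prog[i + 1][0]))
            i += 2
        else:
            # The subsequent instruction slides to the next clock; nop takes its slot.
            clocks.append((first[0], "nop"))
            i += 1
    return clocks


print(dual_issue_with_nop([I1, I2, I3, I4]))
print(dual_issue_with_nop([I1, I3, I2, I4]))
```

In the first order the four instructions pair up in two clocks. In the second order, (1)/(3) and the trailing (4) each need a nop, so three clocks are used and two issue slots are wasted, which is the cost the text attributes to this method.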

Next, in order to clarify the problems caused by the instruction format and instruction system, description will be given below with reference to FIGS. 4 and 5.

FIG. 4 shows an example of the instruction format and instruction repertoire of a 4-byte/3-operand (4-byte fixed-length) instruction system. In this figure, the OP field 401 specifies the instruction function. The S1 field 403 holds the register number specifying the first input (first operand), the S2 field 404 holds the register number specifying the second input (second operand), and the D field 402 holds the register number specifying the output (third operand). That is, this instruction format can specify three operands. The instruction functions include copy (data transfer), addition, subtraction, and so on. In addition, because the 4-byte instruction length leaves room to spare, compound instructions such as the 1-bit left shift addition instruction asl1add and the zero-extension addition instruction zextadd are also provided. The asl1add instruction shifts the bit pattern of the first operand one bit to the left and then performs a normal addition, and the zextadd instruction clears the left half of the bit pattern of the first operand to zero and then performs a normal addition. For simplicity, the memory access instructions, branch instructions, etc. that would normally exist are omitted. For a copy instruction (data transfer instruction), the S2 field 404 is ignored, and the contents of the register specified by the S1 field 403 (transfer source register) are copied (transferred) unchanged to the register specified by the D field 402 (transfer destination register).
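The behavior of this repertoire can be summarized as a small evaluator. This is a minimal Python sketch, assuming 32-bit registers (so "left half" means the upper 16 bits) and that sub computes S1 minus S2; neither width nor operand order is stated in the text, and `execute3` is a hypothetical name.

```python
MASK32 = 0xFFFF_FFFF


def execute3(op: str, s1: int, s2: int) -> int:
    """Value written to the D register by a 3-operand instruction."""
    if op == "mov":
        return s1 & MASK32                      # S2 field ignored
    if op == "add":
        return (s1 + s2) & MASK32
    if op == "sub":
        return (s1 - s2) & MASK32
    if op == "asl1add":
        return ((s1 << 1) + s2) & MASK32        # shift first operand left 1, then add
    if op == "zextadd":
        return ((s1 & 0xFFFF) + s2) & MASK32    # clear the left (upper) half, then add
    raise ValueError(f"unsupported op {op!r}")


print(execute3("asl1add", 3, 10))              # 16
print(hex(execute3("zextadd", 0x12345678, 1)))  # 0x5679
```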

FIG. 5 shows an example of the instruction format and instruction repertoire of a 2-byte/2-operand (2-byte fixed-length) instruction system. In FIG. 5, the OP field 501 specifies the instruction function. The S1 field 503 holds the register number specifying the first input (first operand), and the D field 502 holds the register number specifying the second input, which is also the register number specifying the output (second operand). That is, this instruction format can specify two operands. The clear difference from the instruction format of FIG. 4 is the absence of the S2 field, i.e., one fewer operand. The remaining fields are also shorter than those of FIG. 4.

The instruction functions include one-input transfer instructions (a copy (data transfer) instruction, a zero-extension instruction, a sign-extension instruction, and a 1-bit left shift instruction) and two-input operation instructions (an addition instruction, a subtraction instruction, etc.). For the 1-bit left shift instruction, the instruction-length constraint means that the input register (transfer source register) and the output register (transfer destination register) have the same number. Therefore, in this case the S1 field holds not a register number but an extended instruction code specifying the asl1 instruction.

In order to clarify the advantages and disadvantages of the 4-byte / 3-operand instruction system and the 2-byte / 2-operand instruction system, consider the following formula, for example.

a = b + c + d;  (A)

Converted into an instruction sequence of the 4-byte/3-operand instruction system (instruction sequence (A1)), this becomes:

add Rb, Rc, Ra
add Ra, Rd, Ra

Converted into an instruction sequence of the 2-byte/2-operand instruction system (instruction sequence (A2)), it becomes:

mov Rb, Ra
add Rc, Ra
add Rd, Ra

With the 4-byte/3-operand instruction system, two instructions are executed, but 8 bytes are stored in the instruction memory (and fetched for execution). With the 2-byte/2-operand instruction system, the number of executed instructions increases to three, but the number of bytes stored in the instruction memory (and fetched for execution) decreases to 6. This tendency holds in general: the 4-byte/3-operand instruction system is generally said to execute 10 to 20% fewer instructions than the 2-byte/2-operand instruction system, but to require 60% more bytes of storage.
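The instruction and byte counts for expression (A) can be checked by running both sequences. This is a minimal Python sketch, assuming only mov and add are needed and that a 2-operand add computes D ← S1 + D; `run_3op`, `run_2op`, and `footprint` are hypothetical helpers.

```python
# Instruction sequences for expression (A): a = b + c + d.
A1 = [("add", "Rb", "Rc", "Ra"),   # 3-operand: op S1, S2, D
      ("add", "Ra", "Rd", "Ra")]
A2 = [("mov", "Rb", "Ra"),         # 2-operand: op S1, D
      ("add", "Rc", "Ra"),
      ("add", "Rd", "Ra")]


def run_3op(seq, regs):
    regs = dict(regs)
    for op, s1, s2, d in seq:
        assert op == "add"           # only add occurs in A1
        regs[d] = regs[s1] + regs[s2]
    return regs


def run_2op(seq, regs):
    regs = dict(regs)
    for op, s1, d in seq:
        regs[d] = regs[s1] if op == "mov" else regs[s1] + regs[d]
    return regs


def footprint(seq, insn_bytes):
    """(executed instructions, bytes stored in and fetched from instruction memory)"""
    return len(seq), len(seq) * insn_bytes


init = {"Ra": 0, "Rb": 2, "Rc": 3, "Rd": 4}
print(footprint(A1, 4), footprint(A2, 2))              # (2, 8) (3, 6)
print(run_3op(A1, init)["Ra"], run_2op(A2, init)["Ra"])  # 9 9
```

Both sequences compute the same value; the 3-operand form executes fewer instructions, while the 2-operand form occupies fewer bytes.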

However, the 2-byte/2-operand instruction system has one problem, which concerns the extra data transfer instructions needed in a two-operand instruction set. Expression (A) could be used to explain it equally well, but the following expression (B) is used here.

a = b + c;  (B)

Converted into an instruction sequence of the 4-byte/3-operand instruction system (instruction sequence (B1)), this becomes:

add Rb, Rc, Ra

On the other hand, converted into an instruction sequence of the 2-byte/2-operand instruction system (instruction sequence (B2)), it becomes:

mov Rb, Ra
add Rc, Ra

The 4-byte/3-operand instruction system can execute (B1) in one clock using only one of the pipelines. With the 2-byte/2-operand instruction system, however, there is a data flow between the two instructions, the extra copy (data transfer) instruction mov and the subsequent add instruction add; that is, the value produced by the preceding instruction is used by the subsequent instruction. The subsequent add must therefore wait for the result of the preceding mov, which takes two clocks. For the instruction sequence

mov Rb, Ra
add Rc, Rd

there would be no data flow between the two instructions, so they could be executed in one clock using two pipelines; but the instruction sequence (B2) corresponding to expression (B) needs extra processing time because of its data flow. Thus, when the superscalar system is adopted, the 2-byte/2-operand instruction system tends to take more execution time than the 4-byte/3-operand instruction system because it executes more instructions.
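The clock counts for (B1), (B2), and the independent pair can be reproduced with a two-wide in-order issue model. This is a minimal Python sketch, assuming one issue group per clock, that adjacent instructions pair only when the first writes no register the second reads or writes, and that a 2-operand ALU instruction also reads its D register; `reads_writes` and `clocks` are hypothetical names.

```python
def reads_writes(insn):
    """Register sets for ('op', S1, D) (2-operand) or ('op', S1, S2, D) (3-operand)."""
    op, *regs = insn
    *srcs, d = regs
    reads = set(srcs)
    if len(regs) == 2 and op != "mov":
        reads.add(d)  # 2-operand ALU: D is both the second input and the output
    return reads, {d}


def clocks(seq):
    """Clocks to issue seq on a 2-wide in-order machine, pairing only
    adjacent instructions with no data flow between them."""
    count, i = 0, 0
    while i < len(seq):
        pair_ok = False
        if i + 1 < len(seq):
            _, w0 = reads_writes(seq[i])
            r1, w1 = reads_writes(seq[i + 1])
            pair_ok = not (w0 & (r1 | w1))
        i += 2 if pair_ok else 1
        count += 1
    return count


B1 = [("add", "Rb", "Rc", "Ra")]
B2 = [("mov", "Rb", "Ra"), ("add", "Rc", "Ra")]
no_flow = [("mov", "Rb", "Ra"), ("add", "Rc", "Rd")]
print(clocks(B1), clocks(B2), clocks(no_flow))  # 1 2 1
```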

The problem of the 2-byte/2-operand instruction system has been explained above by comparison with the 4-byte/3-operand instruction system. However, the 4-byte/3-operand instruction system also has a data flow when executing a 4-operand operation, as in instruction sequence (A1), and therefore has problems similar to those of the 2-byte/2-operand instruction system.

For existing microprocessors, it is difficult to change the instruction format and instruction system because software assets have accumulated over the years and must be inherited. It is therefore necessary to improve processing speed while keeping the conventional instruction format and instruction system.

An object of the present invention is to reduce pipeline stalls caused by data hazards in the superscalar system and thereby improve processing speed.

Another object of the present invention is to reduce the number of executed instructions and thereby improve processing speed.

Still another object of the present invention is to improve the processing speed of a data processing device that executes a 2-byte/2-operand instruction system.

The above and other objects and novel characteristics of the present invention will be apparent from the description of this specification and the accompanying drawings.

[0031]

The following is a brief description of an outline of a typical invention among the inventions disclosed in the present application.

A pipelined data processing device has a stage that reads fixed-length instructions stored in an instruction memory; a stage that, when a plurality of the read instructions have a predetermined relationship in the data dependency among them, changes those instructions so that they can be executed in parallel in a plurality of pipelines; and a stage that executes the changed instructions in parallel.

The instruction system is a 2-byte/2-operand instruction system, but instructions are processed internally as a 3-operand instruction system. That is, the instruction fetch stage fetches two instructions, the instruction decode stage decodes two adjacent instructions, and two sets of arithmetic units are prepared for the operation stage. The instruction decoder is provided with means for detecting whether two adjacent two-operand instructions are equivalent to one three-operand instruction and, if so, means for integrating the two instructions into one three-operand instruction and sending it to the subsequent execution stage. As a result, one 3-operand instruction is sent to the execution stage and executed in one clock. Further, means is provided that, when two adjacent instructions are detected to have a data flow relationship but cannot be integrated into one three-operand instruction, sends the source data of the preceding instruction to the arithmetic unit for the subsequent instruction.

This allows the two instructions to be executed at the same time. With these two means, two instructions that conventionally took two clocks because of the data flow between them can be processed in one clock, so the total number of execution clocks can be reduced.
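The second means (sending the source data of the preceding instruction to the subsequent instruction's arithmetic unit) can be illustrated at the level of register semantics. This is a minimal Python sketch, assuming that a 2-operand instruction "op S1, D" computes D ← S1 op D and that the preceding instruction is a copy "mov Rm, RN" whose RN differs from the subsequent instruction's D; the operand placement is this sketch's own convention, and `to_3op`, `run`, and `forward_source` are hypothetical names.

```python
OPS = {
    "mov": lambda a, b: a,
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
}


def to_3op(op, s1, d):
    """2-operand 'op S1, D' (D <- S1 op D; mov: D <- S1) as (op, S1, S2, D)."""
    return (op, s1, s1, d) if op == "mov" else (op, s1, d, d)


def run(insns, regs, parallel):
    """Execute 3-operand instructions. parallel=True reads every input before
    any write, as when both instructions issue in the same clock."""
    out = dict(regs)
    src = regs if parallel else out
    for op, s1, s2, d in insns:
        out[d] = OPS[op](src[s1], src[s2])
    return out


def forward_source(first, second):
    """If first is 'mov Rm, RN' and second reads RN but writes another register,
    let second read Rm (the source data of first) instead of RN."""
    f3, s3 = to_3op(*first), to_3op(*second)
    if first[0] != "mov" or second[2] == first[2]:
        return [f3, s3]  # not the non-integrable case handled here
    rm, rn = first[1], first[2]
    op, a, b, d = s3
    return [f3, (op, rm if a == rn else a, rm if b == rn else b, d)]


regs = {"Rm": 10, "RN": 0, "Rx": 3}
first, second = ("mov", "Rm", "RN"), ("sub", "RN", "Rx")  # Rx <- RN - Rx
reference = run([to_3op(*first), to_3op(*second)], regs, parallel=False)
naive = run([to_3op(*first), to_3op(*second)], regs, parallel=True)
forwarded = run(forward_source(first, second), regs, parallel=True)
print(reference["Rx"], naive["Rx"], forwarded["Rx"])  # 7 -3 7
```

Issuing the pair naively in one clock reads the stale RN; substituting Rm gives the same result as sequential execution, even for the non-commutative sub.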

[0035]

DESCRIPTION OF THE PREFERRED EMBODIMENTS A microprocessor according to an embodiment of the present invention will be described item by item.

<< Microprocessor Pipeline Data Path >> FIG. 6 shows the pipeline data path of the microprocessor according to the embodiment of the present invention. The following describes the case where the microprocessor fetches and executes instructions of the 2-byte/2-operand instruction system shown in FIG. 5.

The first stage 700 is the instruction fetch stage, the second stage 800 is the instruction decode stage, the third stage 900 is the operation stage, and the fourth stage 1000 is the stage for register write-back and forwarding. A first latch group 750, a second latch group 850, and a third latch group 950 are provided between the stages. Note that each stage in FIG. 6 and subsequent figures shows the flow of data, not the physical arrangement of the circuits described for each stage.

<< Instruction Fetch Stage >> FIG. 7 is a detailed block diagram of the first stage 700 and the first latch group 750. The first stage 700 includes a program counter (PC) 701, a fetch control unit 702, and an instruction memory 703. The role of the instruction fetch stage (first stage 700) is to pass instructions in the instruction memory to the next stage, the instruction decode stage (second stage 800).

The address indicated by the program counter 701 is sent on the signal line 704, and 4 bytes of instructions (2 instructions) in the instruction memory 703 are fetched into the fetch control unit 702 via the signal line 705. The two instructions fetched by the fetch control unit 702 are sent to the signal lines 706 and 707 in accordance with the signal line 803. The content of the signal line 706 is then stored in the latch 751 of the first latch group 750, and the content of the signal line 707 is stored in the latch 752. The latch 751 stores the first instruction and the latch 752 stores the second instruction. Here, the first instruction precedes the second instruction in the order of the instruction sequence. In this application, the first instruction is also referred to as the preceding instruction, and the second instruction as the subsequent instruction.

The value of the program counter 701 plus 4 is set back into the program counter 701. The first stage 700 operates so that the value of the program counter 701 (the address used to access the instruction memory) is a multiple of 2, fetching a 4-byte (2-instruction) unit from the instruction memory and latching it in the first latch group 750. However, the 4 bytes fetched from the instruction memory are not always latched in the first latch group 750 as they are. That is, the instruction decode stage (second stage 800) sends to the fetch control unit 702 of the first stage 700, via the signal line 803, information on how many bytes ahead of the current instruction the next desired instruction is. In response, the fetch control unit 702 uses its internal buffer to send the 4 bytes (2 instructions) desired by the instruction decode stage to the signal lines 706 and 707, and stores them in the latches 751 and 752 of the first latch group 750.
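The buffering behavior of the fetch control unit can be sketched as a small model that always presents two instructions, even when the decode stage consumed only one. This is a minimal Python sketch, assuming memory is read in aligned 4-byte units and that the decode stage reports the number of instructions it consumed (1 or 2) rather than a byte offset. The class `FetchUnit` and its methods are hypothetical.

```python
class FetchUnit:
    """Delivers two 2-byte instructions per clock from aligned 4-byte fetches."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.pc = 0                # address of the next aligned 4-byte fetch
        self.buffer = bytearray()  # fetched but not yet consumed instruction bytes

    def _refill(self) -> None:
        # Fetch 4 bytes (2 instructions) at a time until two instructions are buffered.
        while len(self.buffer) < 4 and self.pc < len(self.memory):
            self.buffer += self.memory[self.pc:self.pc + 4]
            self.pc += 4           # PC <- PC + 4

    def deliver(self) -> list:
        """Contents for latches 751 (preceding) and 752 (subsequent)."""
        self._refill()
        return [bytes(self.buffer[0:2]), bytes(self.buffer[2:4])]

    def advance(self, consumed: int) -> None:
        """Drop the instructions the decode stage accepted this clock (1 or 2)."""
        del self.buffer[:2 * consumed]


fu = FetchUnit(bytes(range(12)))
print(fu.deliver())  # [b'\x00\x01', b'\x02\x03']
fu.advance(1)        # e.g. only the preceding instruction was issued
print(fu.deliver())  # [b'\x02\x03', b'\x04\x05']
```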

<< Instruction Decode Stage >> FIG. 8 shows a detailed block diagram of the second stage 800 and the second latch group 850. The second stage 800 includes a decode control unit 801 and a register file 802. The roles of the instruction decode stage (second stage 800) are as follows.

(1) Prepare the input data used by the two instructions and pass it to the next stage, the operation stage (third stage 900).

(2) Check the data flow between the two instructions and, if the execution result of the preceding instruction (first instruction) is not used by the subsequent instruction (second instruction), request the operation stage to process both instructions.

(3) Check the data flow between the two instructions and, if the execution result of the preceding instruction is used by the subsequent instruction, modify the two instructions according to predetermined rules.

(4) Send the number of instructions requested to be processed in the operation stage to the instruction fetch stage, in preparation for the next pipeline processing.

The operation of the instruction decode stage (second stage 800) is described below. FIG. 12 shows a detailed block diagram of part of the decode control unit 801. The decode control unit 801 has a data flow detection circuit DFDC, an instruction conversion circuit INCC, and so on. The instruction conversion circuit INCC has selectors SEL1 to SEL4; under the control of the data flow detection circuit DFDC, it processes the contents of the latches 751 and 752 and converts them into the contents of the latches 851 and 852.

For the first instruction, held in the latch 751, the OP field is called OP-1, the D field D-1, and the S1 field S1-1. For the second instruction, held in the latch 752, the OP field is called OP-2, the D field D-2, and the S1 field S1-2. For the first instruction held in the latch 851, the OP field is called OPN-1, the D field DN-1, and the S1 field S1N-1. For the second instruction held in the latch 852, the OP field is called OPN-2, the D field DN-2, and the S1 field S1N-2; this instruction also has an S2 field, called S2N-2.

The decode control unit 801 takes the two instructions, the preceding instruction and the subsequent instruction, from the latches 751 and 752 of the latch group 750 via the signal lines 753 and 754. The data flow detection circuit DFDC then checks whether the register number in the D field of the preceding instruction (D-1) equals the register number in the S1 field (S1-2) or in the D field (D-2) of the subsequent instruction.

If the register numbers are not equal, it can be determined that there is no data flow; if they are equal, it can be determined that there is a data flow. The data flow detection circuit DFDC then switches the selectors SEL1 to SEL4 with the control signals 821 to 824, respectively, and stores the converted first and second instructions in the latches 851 and 852 via the signal lines 813 and 804. The invalid instruction NOP 820 generated by INCC is always supplied to one input of each of the selectors SEL1 and SEL2.
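The comparison made by the data flow detection circuit can be sketched as a predicate over the two instructions' fields. This is a minimal Python sketch that compares exactly the fields named in the text (D-1 against S1-2 and D-2); skipping the S1 comparison when the subsequent instruction is asl1, whose S1 field holds an extended opcode, is an assumption of this sketch. `TwoOp` and `has_data_flow` are hypothetical names.

```python
from typing import NamedTuple


class TwoOp(NamedTuple):
    op: str
    d: int   # D field (register number)
    s1: int  # S1 field (register number, or extended opcode for asl1)


def has_data_flow(first: TwoOp, second: TwoOp) -> bool:
    """DFDC-style check: D-1 == S1-2 or D-1 == D-2."""
    # Assumption: when the subsequent instruction is asl1, its S1 field is an
    # extended opcode rather than a register, so only the D fields are compared.
    s1_match = second.op != "asl1" and first.d == second.s1
    return s1_match or first.d == second.d


print(has_data_flow(TwoOp("mov", 5, 3), TwoOp("add", 5, 4)))  # D-1 == D-2
print(has_data_flow(TwoOp("mov", 5, 3), TwoOp("add", 7, 5)))  # D-1 == S1-2
print(has_data_flow(TwoOp("mov", 5, 3), TwoOp("add", 7, 4)))  # no data flow
```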

In addition, a new instruction generated by the data flow detection circuit DFDC is supplied to the selector SEL2 via the signal line 840. This new instruction is generated by DFDC from OP-1 of the latch 751 and OP-2 of the latch 752 and is stored in OPN-2 of the latch 852. One example is the 1-bit shift addition instruction asl1add, generated when OP-1 is the 1-bit shift instruction asl1 and OP-2 is the addition instruction add.

The selector SEL3 selects one of two values, S1-1 or D-, and stores it in S1N-2.

The selector SEL4 selects either S1-1 or S1-2 and stores it in S2N-2.

FIG. 11 shows the rules (conditions, and the instructions passed to the operation stage) for converting the two instructions in the instruction decode stage into two instructions for the operation stage. The first instruction is either passed unchanged or converted into the invalid instruction nop. The second instruction is either converted from the 2-byte/2-operand format shown in FIG. 5 into the 4-byte/3-operand format shown in FIG. 4, or converted into the invalid instruction nop. ALU in FIG. 11 is a generic name for two-input operation instructions such as arithmetic operations (addition, subtraction, etc.) and logical operations (AND, OR, etc.). As described above, zextALU is an instruction that zero-extends the first input to the arithmetic unit and then performs the ALU operation, and asl1ALU is an instruction that shifts the first input to the arithmetic unit one bit to the left and then performs the ALU operation.

FIG. 11 (1) shows the case where a three-operand operation, which the 2-operand instruction system can express only with two instructions (a copy instruction mov and an operation instruction ALU), is converted into a single three-operand operation instruction ALU. This applies when the register number in the D field of the copy instruction mov matches the register number in the D field of the operation instruction ALU. In this case, the first instruction is converted into the invalid instruction nop, and the second instruction is converted into a three-operand operation instruction and passed to the operation stage.

The values stored in the fields of the latches 851 and 852 are summarized as follows. Here, "←" means that the value on the right of "←" is stored in the field on the left.

[0055] Specifically, this works as follows. Assume that OP-1 of the latch 751 holds "mov", D-1 holds "RN", and S1-1 holds "Rm", and that OP-2 of the latch 752 holds "ALU", D-2 holds "RN", and S1-2 holds "Rl". The data flow detection circuit DFDC detects that D-1 and D-2 are both "RN", i.e., that the register numbers match. DFDC then controls the selector SEL1 via the control signal 821 so that SEL1 selects the nop instruction 820, and the nop instruction 820 is stored in OPN-1 of the latch 851. DFDC stores D-1 and S1-1 of the latch 751 as they are in DN-1 and S1N-1 of the latch 851 via the signal lines 753 and 813.

Further, the data flow detection circuit DFDC controls the selector SEL2 via the control signal 822 so that SEL2 selects OP-2 of the latch 752, and OP-2 of the latch 752 is stored in OPN-2 of the latch 852. DFDC also controls the selector SEL3 via the control signal 823 so that SEL3 selects S1-1 of the latch 751, and S1-1 of the latch 751 is stored in S1N-2 of the latch 852. DFDC stores D-2 of the latch 752 as it is in DN-2 of the latch 852 via the signal line 754. Finally, DFDC controls the selector SEL4 via the control signal 824 so that SEL4 selects S1-2 of the latch 752, and S1-2 of the latch 752 is stored in S2N-2 of the latch 852.
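The field moves for rule (1) can be written as a function from the contents of latches 751/752 to the contents of latches 851/852, together with a check that the merged instruction reproduces the original pair. This is a minimal Python sketch, assuming the latches are plain dictionaries, that the ALU operation is add, and that Rl differs from RN; `rule1` and `run_add3` are hypothetical names.

```python
def rule1(l751: dict, l752: dict):
    """FIG. 11 (1): mov Rm, RN followed by ALU Rl, RN (D-1 == D-2).

    Returns the contents of latches 851 and 852 as dicts of fields.
    """
    assert l751["OP"] == "mov" and l751["D"] == l752["D"]
    l851 = {
        "OP": "nop",       # SEL1 selects the invalid instruction nop 820
        "D": l751["D"],    # D-1 passed through to DN-1
        "S1": l751["S1"],  # S1-1 passed through to S1N-1
    }
    l852 = {
        "OP": l752["OP"],  # SEL2 selects OP-2
        "D": l752["D"],    # D-2 passed through to DN-2
        "S1": l751["S1"],  # SEL3 selects S1-1
        "S2": l752["S1"],  # SEL4 selects S1-2
    }
    return l851, l852


def run_add3(insn: dict, regs: dict) -> dict:
    """Execute a 3-operand add (D <- S1 + S2); any other OP is treated as nop."""
    out = dict(regs)
    if insn["OP"] == "add":
        out[insn["D"]] = regs[insn["S1"]] + regs[insn["S2"]]
    return out


regs = {"Rm": 5, "Rl": 7, "RN": 100}
l851, l852 = rule1({"OP": "mov", "D": "RN", "S1": "Rm"},
                   {"OP": "add", "D": "RN", "S1": "Rl"})
# Original 2-operand meaning: RN <- Rm, then RN <- Rl + RN, i.e. 12.
print(l852, run_add3(l852, regs)["RN"])
```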

FIG. 11 (2) shows the case where the register number in the D field of the copy instruction mov matches the register number in the S1 field of the operation instruction ALU. In this case, the first instruction is left unchanged, and the second instruction is converted into a three-operand operation instruction and passed to the operation stage.

The values stored in the fields of the latches 851 and 852 are summarized as follows. Specifically, assume that OP-1 of the latch 751 holds "mov", D-1 holds "RN", and S1-1 holds "Rm", and that OP-2 of the latch 752 holds "ALU", D-2 holds "Rx", and S1-2 holds "RN". The data flow detection circuit DFDC detects that D-1 and S1-2 are both "RN", i.e., that the register numbers match. DFDC then controls the selector SEL1 via the control signal 821 so that SEL1 selects OP-1 of the latch 751 (here, the mov instruction), and the mov instruction is stored in OPN-1 of the latch 851.

DFDC stores D-1 and S1-1 of the latch 751 as they are in DN-1 and S1N-1 of the latch 851 via the signal lines 753 and 813. DFDC also controls the selector SEL2 via the control signal 822 so that SEL2 selects OP-2 of the latch 752, and OP-2 of the latch 752 is stored in OPN-2 of the latch 852. DFDC stores D-2 of the latch 752 as it is in DN-2 of the latch 852 via the signal lines 754 and 804. DFDC further controls the selector SEL3 via the control signal 823 so that SEL3 selects S1-1 of the latch 751, and S1-1 is stored in S1N-2 of the latch 852 via the signal line 804. Finally, DFDC stores S1-2 of the latch 752 as it is in S2N-2 of the latch 852 via the signal lines 754 and 804.

The specific values stored in the latches 851 and 852 have been described above for (1) and (2); they are omitted for the remaining cases, because the values to be stored in the latches 851 and 852 can be created in the same way as in (1) and (2) of FIG. 11.

FIG. 11 (3) converts a 1-operand 1-bit left shift instruction into a 2-operand 1-bit left shift instruction. This applies when the register number in the D field of the copy instruction mov matches the register number in the D field of the 1-bit left shift instruction asl1. In this case, the first instruction becomes the invalid instruction nop in the operation stage, and the second instruction is converted into a 2-operand 1-bit left shift instruction asl1 and passed on.
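Rule (3) can be sketched as an instruction rewrite plus a check that the single converted shift matches the original pair. This is a minimal Python sketch, assuming 32-bit registers and that the converted 2-operand form "asl1 Rm, RN" means RN ← Rm << 1; `rule3`, `run_pair_sequential`, and `run_converted` are hypothetical names.

```python
MASK32 = 0xFFFF_FFFF


def rule3(first, second):
    """FIG. 11 (3): ('mov', Rm, RN) followed by 1-operand ('asl1', RN).

    Returns (nop, 2-operand asl1 Rm, RN), where asl1 Rm, RN means RN <- Rm << 1.
    """
    op1, rm, rn = first
    op2, d2 = second
    assert op1 == "mov" and op2 == "asl1" and d2 == rn
    return ("nop",), ("asl1", rm, rn)


def run_pair_sequential(regs, rm, rn):
    """Original pair executed one after the other."""
    out = dict(regs)
    out[rn] = out[rm]                   # mov Rm, RN
    out[rn] = (out[rn] << 1) & MASK32   # asl1 RN
    return out


def run_converted(regs, insn):
    """Converted 2-operand asl1 src, dst executed in a single slot."""
    _, src, dst = insn
    out = dict(regs)
    out[dst] = (regs[src] << 1) & MASK32
    return out


regs = {"Rm": 0x4000_0001, "RN": 9}
slot0, slot1 = rule3(("mov", "Rm", "RN"), ("asl1", "RN"))
print(slot0, slot1, hex(run_converted(regs, slot1)["RN"]))
```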

That is, each field is converted as follows.

[0063] FIG. 11 (4) is the case where the first instruction is the copy instruction mov and the second instruction or the condition does not correspond to (1), (2), or (3) of FIG. 11. In this case, the first instruction is passed unchanged, and the second instruction is converted into the invalid instruction nop and passed to the operation stage. The "other" (second) instruction is executed in the next pipeline, one clock later.

That is, each field is converted as follows.

OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop

In FIG. 11, (5) combines the zero-extension instruction zext and the operation instruction ALU into the zero-extended operation instruction zextALU. This is the case where the register number in the D field of zext matches the register number in the D field of the operation instruction ALU. In this case, the first instruction becomes the invalid instruction nop, and the second instruction is converted into the 3-operand zero-extended operation instruction zextALU. Both are passed to the operation stage.

That is, each field is converted as follows.

[0067] FIG. 11 (6) shows the case where the register number in the D field of the zero-extension instruction zext matches the register number in the S1 field of the addition instruction add. In this case, the first instruction is passed to the operation stage unchanged. The second instruction is converted into the 3-operand zero-extended addition instruction zextadd and passed.

That is, each field is converted as follows.

[0069] Besides the addition instruction add, other commutative instructions can be converted in the same manner. Examples include the logical AND instruction and and the logical OR instruction or.

In FIG. 11, (7) is the case where the first instruction is the zero-extension instruction zext and the second instruction or the condition does not correspond to (5) or (6) of FIG. 11. In this case, the first instruction is passed to the operation stage unchanged, and the second instruction is converted into the invalid instruction nop. The remaining (second) instruction is executed in the next pipeline, one clock later.

That is, each field is converted as follows.

OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop

In FIG. 11, (8) combines the 1-bit left shift instruction asl1 and the operation instruction ALU into the 1-bit left shift operation instruction asl1ALU. This is the case where the register number in the D field of asl1 matches the register number in the D field of the operation instruction ALU. In this case, the first instruction is converted into the invalid instruction nop, and the second instruction is converted into the 3-operand 1-bit left shift operation instruction asl1ALU. Both are passed to the operation stage.

That is, each field is converted as follows.

[0074] FIG. 11 (9) shows the case where the register number in the D field of the 1-bit left shift instruction asl1 matches the register number in the S1 field of the addition instruction add. In this case, the first instruction is passed unchanged, and the second instruction is converted into the 3-operand 1-bit left shift addition instruction asl1add. Both are passed to the operation stage.

That is, each field is converted as follows.

[0076] In FIG. 11, (10) is the case where the first instruction is the 1-bit left shift instruction asl1 and the second instruction or the condition does not correspond to (8) or (9) of FIG. 11. In this case, the first instruction is passed to the operation stage unchanged, and the second instruction is converted into the invalid instruction nop. The remaining (second) instruction is executed in the next pipeline, one clock later.

That is, each field is converted as follows.

OPN-1 ← OP-1, DN-1 ← D-1, S1N-1 ← S1-1, OPN-2 ← nop

In FIG. 11, (11) is the case where there is no data flow between the two instructions. No instruction conversion is performed.
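The conversion rules of FIG. 11 amount to a small decision procedure, summarized in the Python sketch below. It is an illustrative reconstruction under stated assumptions, not the decode circuit of the embodiment.

The assumptions are as follows:
- Original 2-operand instructions compute D ← D op S1.
- mov and zext copy or zero-extend S1 into D.
- asl1 is a 1-operand instruction that shifts D in place.

Several other details are also assumed rather than taken from the text:
- the opcode sets `ALU_OPS` and `COMMUTATIVE`;
- treating write-after-write on the same register as data flow;
- giving case (11) priority;
- deferring any dependent pair not listed in FIG. 11.

Cases (1) and (2) are inferred from claims 3 and 10. The count of consumed instructions stands in for the information that must be returned to the instruction fetch stage.

```python
from dataclasses import dataclass
from typing import Optional, Set, Tuple

@dataclass(frozen=True)
class Insn:
    op: str                    # OP field (mnemonic)
    d: Optional[str] = None    # D field register number
    s1: Optional[str] = None   # S1 field register number
    s2: Optional[str] = None   # S2 field; used only by converted 3-operand forms

NOP = Insn("nop")
ALU_OPS = {"add", "sub", "and", "or", "xor"}  # hypothetical 2-operand ALU set
COMMUTATIVE = {"add", "and", "or"}            # ops eligible for cases (2), (6), (9)
TRANSFER = {"mov", "zext", "asl1"}

def reads(i: Insn) -> Set[Optional[str]]:
    """Registers read by an original 2-operand instruction (assumed semantics)."""
    if i.op in ALU_OPS:
        return {i.d, i.s1}          # d <- d op s1
    if i.op in ("mov", "zext"):
        return {i.s1}               # d <- s1, d <- zext(s1)
    if i.op == "asl1":
        return {i.d}                # 1-operand: d <- d << 1
    return set()

def has_data_flow(first: Insn, second: Insn) -> bool:
    # Assumption: read-after-write and write-after-write on first.d both count.
    return first.d is not None and (first.d in reads(second) or first.d == second.d)

def convert_pair(first: Insn, second: Insn) -> Tuple[Insn, Insn, int]:
    """Return (new first, new second, number of fetched instructions consumed)."""
    if not has_data_flow(first, second):
        return first, second, 2                       # case (11): no conversion
    if first.op in TRANSFER:
        src = first.d if first.op == "asl1" else first.s1
        prefix = "" if first.op == "mov" else first.op  # mov fuses to a plain 3-operand op
        hit_d = second.d == first.d
        hit_s1 = second.s1 == first.d
        if second.op in ALU_OPS and hit_d and not hit_s1:        # cases (1), (5), (8)
            return NOP, Insn(prefix + second.op, second.d, src, second.s1), 2
        if second.op in COMMUTATIVE and hit_s1 and not hit_d:    # cases (2), (6), (9)
            return first, Insn(prefix + second.op, second.d, src, second.d), 2
        if first.op == "mov" and second.op == "asl1" and hit_d:  # case (3)
            return NOP, Insn("asl1", second.d, src), 2
    # Cases (4), (7), (10), and (by assumption) any other dependent pair:
    # keep the first, put nop in the second slot, refetch the second next clock.
    return first, NOP, 1
```

Example: `convert_pair(Insn("zext", "r1", "r2"), Insn("add", "r4", "r1"))` keeps the zext. It turns the add into `Insn("zextadd", "r4", "r2", "r4")`, so both execute in the same clock, as in case (6).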

The two new instructions produced by the decode controller 801 are sent to the signal lines 813 and 804. They are stored in the latches 851 and 852 of the second latch group 850, respectively. In addition, the data flow detection circuit DFDC checks the relationship between the preceding and succeeding instructions. The result is sent via the signal line 803 to the instruction fetch stage (first stage 700), where it serves as the basis for updating the value of the program counter PC 701. That is, the instruction fetch stage is told which two instructions are to be decoded in the next pipeline.

Further, the decode controller 801 sends four register numbers to the register file 802 via the signal lines 805, 806, 807, and 808:
- the S1 field (S1-1) of the first instruction;
- the D field (D-1) of the first instruction;
- the S1 field 503 (S1-2) of the second instruction;
- the D field 502 (D-2) of the second instruction.

The contents of the four registers in the register file 802 are read onto the signal lines 809, 810, 811, and 812. They are stored in the following latches of the second latch group 850:
- latch 853 (1-1 input);
- latch 854 (1-2 input);
- latch 855 (2-1 input);
- latch 856 (2-2 input).

A block diagram of the register file 802 is shown in FIG. 15. The register file 802 includes a register RGSTR, a register control circuit RCC, and the like. The register RGSTR has four read ports and two write ports. The signal lines 809, 810, 811, and 812 are connected to the read ports, and the signal lines 955 and 956 to the write ports. The register file 802 can therefore read the contents of four registers and write to two registers at the same time.

In cases (1), (5), and (8) of FIG. 11, the contents of the two registers designated by S1-1 and S1-2 are read onto the signal lines 811 and 812. They are stored in the latch 855 (2-1 input) and the latch 856 (2-2 input).

In cases (2), (6), and (9) of FIG. 11, the contents of the register designated by S1-1 are read onto the signal lines 809 and 811. They are stored in the latch 853 (1-1 input) and the latch 855 (2-1 input). The contents of the register designated by D-2 are read onto the signal line 812 and stored in the latch 856 (2-2 input).

In case (3) of FIG. 11, the contents of the register designated by S1-1 are read onto the signal line 811 and stored in the latch 855 (2-1 input).

In cases (4), (7), and (10) of FIG. 11, the contents of the register designated by S1-1 are read onto the signal line 809 and stored in the latch 853 (1-1 input).

In case (11) of FIG. 11, the contents of the four registers designated by S1-1, D-1, S1-2, and D-2 are read onto the signal lines 809, 810, 811, and 812. They are stored in the latch 853 (1-1 input), the latch 854 (1-2 input), the latch 855 (2-1 input), and the latch 856 (2-2 input), respectively.
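The operand routing listed above for each case of FIG. 11 can be tabulated in the Python sketch below. The dictionary-based register file and the field-name strings are illustrative stand-ins for the signal lines 805 to 812, not part of the embodiment. Each loaded latch has its own read port, so no case needs more than the four read ports of the register RGSTR.

```python
from typing import Dict

# Field whose register feeds each second-latch-group input, per FIG. 11 case.
# Keys are latch numbers: 853 = 1-1, 854 = 1-2, 855 = 2-1, 856 = 2-2 input.
ROUTES: Dict[int, Dict[int, str]] = {}
for case in (1, 5, 8):
    ROUTES[case] = {855: "S1-1", 856: "S1-2"}
for case in (2, 6, 9):
    ROUTES[case] = {853: "S1-1", 855: "S1-1", 856: "D-2"}
ROUTES[3] = {855: "S1-1"}
for case in (4, 7, 10):
    ROUTES[case] = {853: "S1-1"}
ROUTES[11] = {853: "S1-1", 854: "D-1", 855: "S1-2", 856: "D-2"}

READ_PORTS = 4  # the register RGSTR has four read ports

def latch_inputs(case: int, fields: Dict[str, str], regfile: Dict[str, int]) -> Dict[int, int]:
    """Return latch number -> operand value for one decode cycle.

    fields maps a field name (S1-1, D-1, S1-2, D-2) to a register number;
    regfile maps a register number to its current contents.
    """
    route = ROUTES[case]
    # One read port per loaded latch (signal lines 809-812), so at most four.
    assert len(route) <= READ_PORTS
    return {latch: regfile[fields[name]] for latch, name in route.items()}
```

In case (2), for example, the S1-1 register is loaded into both the 1-1 and 2-1 inputs, mirroring the duplicated read on the signal lines 809 and 811.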

<< Execution Stage >> FIG. 9 shows the third stage 900 and the third latch group 950 in detail. The third stage 900 includes:
- an arithmetic control unit 901;
- arithmetic units 902 and 903, each including an ALU (Arithmetic Logic Unit);
- first input adjustment circuits 904 and 905;
- selectors 906 and 907.

The role of the execution stage (third stage 900) is to execute the operations of the two instructions.

The arithmetic unit 902 and the first input adjustment circuit 904 are the circuits that compute the preceding instruction. The 1-1 input and the 1-2 input from the two latches 853 and 854 of the second latch group 850 are sent to the selector 906 via the signal lines 859 and 860. In addition, the first and second outputs from the two latches 953 and 954 of the third latch group 950 are sent to the selector 906 via the signal lines 955 and 956.

The selector 906 selects one of the signal lines 859, 955, and 956 according to the signal line 1001. It sends the data to the arithmetic unit 902 via the first input adjustment circuit 904 and the signal line 912. The selector 906 also selects one of the signal lines 860, 955, and 956 according to the signal line 1001 and sends the data to the arithmetic unit 902 via the signal line 913.

The arithmetic control unit 901 takes in the instruction in the latch 851 of the second latch group 850. According to the function of that instruction, it controls the arithmetic unit 902 and the first input adjustment circuit 904 through the signal lines 911 and 908 to perform the operation of the preceding instruction. The resulting value (first output) is stored in the latch 953 of the third latch group 950 via the signal line 918.

The arithmetic unit 903 and the first input adjustment circuit 905, in turn, are the circuits that compute the succeeding instruction. The 2-1 input and the 2-2 input from the two latches 855 and 856 of the second latch group 850 are sent to the selector 907 via the signal lines 861 and 862. In addition, the first and second outputs from the two latches 953 and 954 of the third latch group 950 are sent to the selector 907 via the signal lines 955 and 956.

The selector 907 selects one of the signal lines 861, 955, and 956 according to the signal line 1002. It sends the data to the arithmetic unit 903 via the first input adjustment circuit 905 and the signal line 914. The selector 907 also selects one of the signal lines 862, 955, and 956 according to the signal line 1002 and sends the data to the arithmetic unit 903 via the signal line 915.

The arithmetic control unit 901 takes in the instruction in the latch 852 of the second latch group 850. According to the function of that instruction, it controls the arithmetic unit 903 and the first input adjustment circuit 905 through the signal lines 910 and 909 to perform the operation of the succeeding instruction. The resulting value (second output) is stored in the latch 954 of the third latch group 950 via the signal line 919.

This concludes the description of the execution stage (third stage 900). A supplementary note on the asl1add and zextadd instructions follows.

These instructions can be realized by slightly adjusting the first input to the arithmetic unit 902 or 903, which is capable of addition. The first input is not fed directly to the arithmetic unit. It first passes through the first input adjustment circuit 904 or 905, which the arithmetic control unit 901 controls to apply a 1-bit left shift or a zero-extension. The adjusted value is then input to the arithmetic unit 902 or 903, which is controlled to perform an ordinary addition.
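The input-adjustment scheme described above can be modeled in a few lines. This Python sketch assumes 32-bit arithmetic units, as in the microcomputer example. The zero-extension source width `ext_bits` is a hypothetical parameter, because the text does not state which width zext extends from. The adder itself is left unmodified, which is the point of the design.

```python
MASK32 = 0xFFFFFFFF  # 32-bit arithmetic units

def adjust_first_input(value: int, mode: str, ext_bits: int = 16) -> int:
    """Model of a first input adjustment circuit (904 or 905)."""
    if mode == "asl1":
        return (value << 1) & MASK32          # 1-bit left shift, carry-out discarded
    if mode == "zext":
        return value & ((1 << ext_bits) - 1)  # keep the low ext_bits bits, upper bits 0
    if mode == "none":
        return value & MASK32                 # pass through unchanged
    raise ValueError(f"unknown adjustment mode: {mode}")

# Operation -> adjustment applied to the first input before an ordinary add.
ADJUST = {"add": "none", "asl1add": "asl1", "zextadd": "zext"}

def execute_add_family(op: str, first: int, second: int, ext_bits: int = 16) -> int:
    adjusted = adjust_first_input(first, ADJUST[op], ext_bits)
    return (adjusted + second) & MASK32       # the same, unmodified adder for all three ops
```

For example, `execute_add_family("asl1add", 0x80000001, 3)` shifts the first input to 2 and then adds 3. Only the small adjustment stage differs between add, asl1add, and zextadd.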

<< Write Stage >> FIG. 10 is a block diagram for explaining the operation of the fourth stage 1000. The fourth stage 1000 consists of a register number decoding circuit 1010 and a forwarding control circuit 1020. It writes results to registers and performs forwarding; its roles are as follows.

(1) Write the operation results of the two instructions to the registers with the specified numbers.

(2) If the operation results of the two instructions are used by the operation stage in the current clock (that is, by the next pipeline), adjust the arithmetic-unit inputs (forwarding). The values latched in the third latch group 950 are input instead of the values latched in the second latch group 850.

First, process (1) is described. The fourth stage 1000 takes the two instructions just executed from the latches 951 and 952 of the third latch group 950 into the register number decoding circuit 1010 via the signal lines 957 and 958. The values of the two results just computed are sent from the latches 953 and 954 of the third latch group 950 to the signal lines 955 and 956. The register number decoding circuit 1010 then sends the register numbers in the two D fields of the just-executed instructions to the signal lines 1003 and 1004. These specify the write register numbers of the register file 802 in the second stage 800, and the two result values are written into the register file 802.

Next, process (2) is described. The fourth stage 1000 takes the two instructions to be computed in this clock from the latches 851 and 852 of the second latch group 850 into the forwarding control circuit 1020 via the signal lines 857 and 858. It also takes the two instructions just executed from the latches 951 and 952 of the third latch group 950 into the forwarding control circuit 1020 via the signal lines 957 and 958. The forwarding control circuit 1020 then checks whether either of the register numbers in the two D fields of the just-executed instructions equals any S1 or S2 field number of the two instructions to be computed now.

For each input where a match is found, the forwarding control circuit 1020 controls the two selectors 906 and 907 through the signal lines 1001 and 1002. The arithmetic units 902 and 903 then receive the values in the latches 953 and 954 of the third latch group 950 (signal lines 955 and 956) instead of the values in the latches 853, 854, 855, and 856 of the second latch group 850.
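Processes (1) and (2) of the fourth stage can be sketched in Python as follows. The dictionary register file and the string labels `"955"`, `"956"`, and `"latch"` are hypothetical stand-ins for the selector settings driven by the signal lines 1001 and 1002. Two further behaviours are assumptions not stated in the text:
- when both just-executed instructions name the same register, the later (second) result wins;
- a nop has no destination register.

```python
from typing import Dict, List, Optional, Sequence

def write_back(regfile: Dict[str, int], dests: Sequence[Optional[str]],
               results: Sequence[int]) -> None:
    """Process (1): write both results through the two write ports (lines 955, 956)."""
    for d, value in zip(dests, results):
        if d is not None:            # a nop has no destination register
            regfile[d] = value       # same register named twice: second write wins

def forward_selects(prev_dests: Sequence[Optional[str]],
                    cur_sources: Sequence[Optional[str]]) -> List[str]:
    """Process (2): choose the operand source for each arithmetic-unit input.

    prev_dests: D fields of the two instructions just executed (latches 951, 952).
    cur_sources: source register numbers feeding inputs 1-1, 1-2, 2-1, 2-2.
    Returns "955" / "956" to forward the first / second result, or "latch"
    to keep the value already held in the second latch group 850.
    """
    selects = []
    for src in cur_sources:
        if src is not None and src == prev_dests[1]:
            selects.append("956")    # assumption: the later result takes priority
        elif src is not None and src == prev_dests[0]:
            selects.append("955")
        else:
            selects.append("latch")
    return selects
```

Comparing register numbers rather than values keeps the check cheap. It needs only equality comparators between the two D fields and the current source fields.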

<< Processing of Instruction Sequence >> FIG. 13 shows how an instruction sequence is processed clock by clock in the superscalar processing of the present invention. For comparison, it also shows how the same sequence is processed when an invalid instruction nop is simply inserted whenever two instructions cannot be executed in parallel. The present invention can process two instructions per clock. Compared with nop insertion, it executes 6 fewer instructions (about 40% fewer for this instruction sequence) and therefore has a shorter execution time.

Consider a preceding transfer-type instruction such as mov, zext, or asl1 followed by an addition instruction such as add. The two instructions are converted into one instruction and executed in one clock, which reduces the number of clocks and increases speed. Likewise, a preceding transfer instruction may be followed by an operation instruction with data flow between them. The pair is still executed in one clock, reducing the total number of clocks and increasing speed.
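The clock saving can be illustrated with a greedy dual-issue counter in Python. The six-instruction sequence below is hypothetical, not the sequence of FIG. 13. The sketch counts clocks rather than executed instructions, so it does not reproduce the figures quoted above.

Two simplifications are assumptions:
- fusion is limited to a transfer-type instruction followed by add;
- when a pair cannot issue together, the second instruction is paired with the next one in the following clock.

```python
from typing import Callable, Optional, Sequence, Set, Tuple

# (op, D field, S1 field); hypothetical encoding, S1 is None for 1-operand asl1
Insn = Tuple[str, str, Optional[str]]
TRANSFER = ("mov", "zext", "asl1")

def reads(i: Insn) -> Set[Optional[str]]:
    op, d, s1 = i
    if op in ("mov", "zext"):
        return {s1}              # d <- s1 or d <- zext(s1)
    if op == "asl1":
        return {d}               # d <- d << 1
    return {d, s1}               # 2-operand ALU: d <- d op s1

def dependent(a: Insn, b: Insn) -> bool:
    # b reads a's result, or both write the same register
    return a[1] in reads(b) or a[1] == b[1]

def fusible(a: Insn, b: Insn) -> bool:
    # simplified rule: transfer-type then add, where exactly one of add's D/S1 is a's D
    return a[0] in TRANSFER and b[0] == "add" and ((b[1] == a[1]) != (b[2] == a[1]))

def count_clocks(seq: Sequence[Insn], can_pair: Callable[[Insn, Insn], bool]) -> int:
    """Greedy dual issue: issue two if allowed, otherwise one (the other slot gets nop)."""
    clocks, i = 0, 0
    while i < len(seq):
        paired = i + 1 < len(seq) and can_pair(seq[i], seq[i + 1])
        i += 2 if paired else 1
        clocks += 1
    return clocks

def nop_insertion(a: Insn, b: Insn) -> bool:
    return not dependent(a, b)

def with_conversion(a: Insn, b: Insn) -> bool:
    return not dependent(a, b) or fusible(a, b)

SEQ = [
    ("mov", "r1", "r2"), ("add", "r1", "r3"),    # like case (1)
    ("zext", "r4", "r5"), ("add", "r6", "r4"),   # like case (6)
    ("asl1", "r7", None), ("add", "r8", "r7"),   # like case (9)
]

print(count_clocks(SEQ, nop_insertion), count_clocks(SEQ, with_conversion))  # 4 3
```

With nop insertion, the first dependent pair splits and shifts the pairing of everything after it. With conversion, each dependent pair still issues together.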

<< Application Example to Microcomputer >> FIG. 14 shows a microcomputer system using the superscalar system of the present invention. The microcomputer MCU includes:
- a central processing unit CPU;
- a floating point processing unit FPU;
- a multiplier MULT with a product-sum operation function;
- a memory management unit MMU that converts logical addresses into physical addresses;
- an instruction and data cache memory CACHE;
- a cache controller CCNT;
- an external bus interface EBIF;
- a 32-bit logical address bus LABUS;
- a 32-bit physical address data bus PABUS;
- 32-bit data buses DBUS and DBS.

These are formed on a semiconductor substrate such as single-crystal silicon and resin-sealed (sealed in a plastic package).

The microcomputer MCU is connected via an external address bus EAB and a data bus EDB to a main memory unit MM composed of a semiconductor memory using dynamic memory elements such as DRAM as memory cells.

The central processing unit CPU is composed of the pipeline data path shown in FIG. 6. A memory access stage is, however, inserted between the third and fourth stages, forming a so-called 5-stage pipeline. The data memory and the instruction memory 703 correspond to the cache memory CACHE or the main memory MM and are not located inside the central processing unit CPU.

The CPU executes an instruction set of 2-byte fixed-length instructions. The arithmetic units 902 and 903 each include a 32-bit ALU or the like, and the register file 802 has sixteen 32-bit general-purpose registers. That is, the CPU executes the instructions of the 2-byte, 2-operand instruction system (instruction set) described in Japanese Patent Laid-Open No. 5-197546. The CPU described in JP-A-5-197546 is not a superscalar processor. The present CPU, by contrast, is superscalar while executing the same instruction set as that described in application number 1992/897457.

High performance can therefore be achieved while maintaining compatibility with existing software (object code compatibility). The high code efficiency that characterizes 2-byte fixed-length instructions is also preserved.

The invention made by the present inventors has been described specifically based on the embodiments. It goes without saying, however, that the present invention is not limited to these embodiments and can be modified in various ways without departing from its gist. For example:
- The embodiment of FIG. 6 and the subsequent figures describes a 2-byte, 2-operand instruction system, but the invention is also applicable to a 4-byte, 3-operand instruction system.
- The zero-extension instruction and the zero-extended operation instruction have been described, but the same approach applies to a sign-extension instruction and a sign-extended operation instruction.
- The case where the S1 field of the first (transfer) instruction specifies a register has been described, but the invention can also be applied when the S1 field holds immediate data.

[0105]

The effects obtained by typical ones of the inventions disclosed in the present application will be briefly described as follows.

Detecting the data flow between adjacent instructions and converting the instructions allows them to be executed in parallel. Processing of several instructions that conventionally took several clocks can therefore be performed in one clock, which reduces the overall number of execution clocks.

[Brief description of drawings]

FIG. 1 illustrates a pipelined implementation of a microprocessor.

FIG. 2 shows a concept of pipeline processing.

FIG. 3 shows the concept of superscalar processing.

FIG. 4 shows an example of an instruction format and an instruction repertoire of a 4-byte instruction system.

FIG. 5 shows an example of an instruction format and an instruction repertoire of a 2-byte instruction system.

FIG. 6 is a diagram showing a data path of a pipeline of a microprocessor according to an embodiment of the present invention.

FIG. 7 is a detailed block diagram of a first stage and a first latch group.

FIG. 8 is a detailed block diagram of a second stage and a second latch group.

FIG. 9 is a detailed block diagram of a third stage and a third latch group.

FIG. 10 is a block diagram illustrating the operation of the fourth stage.

FIG. 11 shows rules for converting two instructions in the instruction decode stage into two instructions in the operation stage.

FIG. 12 shows a detailed block diagram of a part of a decoding control unit.

FIG. 13 shows how an instruction sequence is processed with individual clocks.

FIG. 14 is a diagram of a microcomputer system using the superscalar system of the present invention.

FIG. 15 is a block diagram of a register file.

[Explanation of symbols]

101 ... 1st stage, 103 ... 2nd stage, 105 ... 3rd stage, 107 ... 4th stage, 108, 109, 110 ... Signal line, 401 ... OP field, 402 ... D field, 403 ... S1 field, 404 ... S2 field, 501 ... OP field, 502 ... D field, 503 ... S1 field, 700 ... First stage, 800 ... Second stage, 900 ... Third stage, 1000 ... Fourth stage, 701 ... Program counter, 702 ... Fetch controller, 703 ... Instruction memory, 704, 705, 706, 707 ... Signal line, 751, 752 ... Latch, 801 ... Decode control unit, 802 ... Register file, 803, 804, 805, 806, 807, 808, 809, 810, 811, 812, 813 ... Signal line, 851, 852, 853, 854, 855, 856 ... Latch, 857, 858, 859, 860, 861, 862 ... Signal line, 901 ... Arithmetic control unit, 902 ... Arithmetic unit, 903 ... Arithmetic unit, 904 ... First input adjustment circuit, 905 ... First input adjustment circuit, 906 ... Selector, 907 ... Selector, 908, 909, 910, 911, 912, 913, 914, 915, 916, 917, 918, 919 ... Signal line, 951, 952, 953, 954 ... Latch, 955, 956, 957, 958 ... Signal line, 1001, 1002, 1003, 1004 ... Signal line, 1010 ... Register number control circuit, 1020 ... Forwarding control circuit, INCC ... Instruction conversion circuit, DFDC ... Data flow detection circuit, MCU ... Microcomputer, CPU ... Central processing unit, FPU ... Floating point processing unit, MULT ... Multiplier, MMU ... Memory management unit, CACHE ... Instruction and data cache memory, CCNT ... Cache controller, EBIF ... External bus interface, LABUS ... 32-bit logical address bus, PABUS ... 32-bit physical address data bus, DBUS, DBS ... 32-bit data bus, EAB ... External address bus, EDB ... External data bus, MM ... Main memory, RCC ... Register control circuit, RGSTR ... Register.

Claims (20)

[Claims]
1. A data processing device that executes an instruction by dividing its execution into a plurality of stages. The plurality of stages include at least:
- a first stage for fetching an instruction from an instruction memory;
- a second stage for decoding the instruction fetched in the first stage;
- a third stage for executing the instruction decoded in the second stage;
- a fourth stage for writing the result executed in the third stage into a register.

An instruction of a first instruction format stored in the instruction memory is changed into an instruction of a second instruction format and executed.
2. The data processing device according to claim 1, wherein:
- the first instruction format is an instruction format in which an operation instruction operates on a first operand and a second operand and stores the operation result in the second operand;
- the second instruction format is an instruction format in which an operation instruction operates on the first operand and the second operand and stores the operation result in a third operand.
3. The data processing apparatus according to claim 2, wherein the second stage detects that:
- the preceding instruction is a data transfer instruction between registers;
- the succeeding instruction is an arithmetic instruction;
- the transfer destination register number of the preceding instruction equals the transfer destination register number of the succeeding instruction.

On this detection, the second stage converts the instructions into an operation instruction of the second instruction format and sends the operation instruction to the third stage.
4. The data processing device according to claim 3, which is formed on a single semiconductor substrate.
5. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that transfers the content of the transfer source register to the transfer destination register as it is.
6. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that shifts the contents of a transfer destination register and transfers the contents to the transfer destination register.
7. The data processing apparatus according to claim 4, wherein the preceding instruction is a data transfer instruction that zero-extends or sign-extends the contents of the transfer source register and transfers the result to the transfer source register.
8. The data processing device according to claim 1, wherein the second instruction format includes an instruction in which a plurality of instructions of the first instruction format are combined.
9. The data processing apparatus according to claim 8, wherein the second stage detects that:
- the preceding instruction is a data transfer between registers;
- the succeeding instruction is a fixed-bit shift instruction;
- the transfer destination register number of the preceding instruction equals the transfer source register number of the succeeding instruction.

On this detection, the second stage converts the two instructions into a single shift instruction of the second instruction format and sends the shift instruction to the third stage.
10. The data processing apparatus according to claim 2, wherein the second stage detects that:
- the preceding instruction is a data transfer instruction between registers;
- the succeeding instruction is an arithmetic instruction;
- the transfer destination register number of the preceding instruction equals the transfer source register number of the succeeding instruction.

On this detection, the second stage converts the succeeding instruction into an operation instruction of the second instruction format that has no data flow relationship with the preceding instruction, and sends the operation instruction to the third stage. The two instructions can thereby be executed in parallel in the same stage.
11. The data processing apparatus according to claim 10, wherein the first instruction format is a 2-byte fixed length instruction.
12. The data processing device according to claim 11, wherein the preceding instruction is a data transfer instruction for directly transferring the contents of the transfer source register to the transfer destination register.
13. The data processing apparatus according to claim 11, wherein the preceding instruction is a data transfer instruction that shifts the contents of a transfer destination register and transfers the contents to the transfer destination register.
14. The data processing apparatus according to claim 11, wherein the preceding instruction is a data transfer instruction for 0-extending or sign-extending the contents of the transfer source register and transferring the result to the transfer source register.
15. A pipelined data processing apparatus comprising:
- a first stage for reading fixed-length instructions stored in an instruction memory;
- a second stage that changes a plurality of read instructions so that they can be executed in parallel in a plurality of pipelines, when the instructions have a dependency on the data they operate on and have a predetermined relationship;
- a third stage for executing the changed plurality of instructions in parallel.
16. The data processing device according to claim 15, wherein the first stage reads two instructions simultaneously, and the second stage modifies the two instructions so that they can be executed in parallel by two pipelines.
17. The data processing device according to claim 16, wherein the first stage reads 2-byte fixed-length instructions.
18. A microcomputer in which a CPU and an instruction memory are formed on a single semiconductor substrate, wherein the CPU comprises:
- an instruction fetch unit for reading two 2-byte fixed-length instructions stored in the instruction memory;
- an instruction decoder that modifies the two read instructions so that they can be executed in parallel by two pipelines, when the instructions have a dependency on the data they operate on and have a predetermined relationship;
- two 4-byte-wide arithmetic units that execute the two modified instructions in parallel.
19. The microcomputer according to claim 18, wherein the instruction decoder changes an operation instruction that operates on a first operand and a second operand and stores the result in the second operand into an instruction that operates on the first operand and the second operand and stores the result in a third operand.
20. The microcomputer according to claim 18, wherein the instruction decoder detects that:
- the preceding instruction is a data transfer instruction between registers;
- the succeeding instruction is an arithmetic instruction;
- the transfer destination register number of the preceding instruction equals the transfer source register number of the succeeding instruction.

On this detection, the instruction decoder changes the succeeding instruction into an operation instruction that has no data flow relationship with the preceding instruction.
JP5277297A 1996-03-18 1997-03-07 Data processor Granted JPH09311786A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP6057196 1996-03-18
JP8-60571 1996-03-18
JP5277297A JPH09311786A (en) 1996-03-18 1997-03-07 Data processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP5277297A JPH09311786A (en) 1996-03-18 1997-03-07 Data processor

Publications (1)

Publication Number Publication Date
JPH09311786A true JPH09311786A (en) 1997-12-02

Family

ID=26393432

Family Applications (1)

Application Number Title Priority Date Filing Date
JP5277297A Granted JPH09311786A (en) 1996-03-18 1997-03-07 Data processor

Country Status (1)

Country Link
JP (1) JPH09311786A (en)

US7072817B1 (en) 1999-10-01 2006-07-04 Stmicroelectronics Ltd. Method of designing an initiator in an integrated circuit
US7076638B2 (en) 2001-09-20 2006-07-11 Matsushita Electric Industrial Co., Ltd. Processor, compiler and compilation method
US7260745B1 (en) 1999-10-01 2007-08-21 Stmicroelectronics Ltd. Detection of information on an interconnect
US7266728B1 (en) 1999-10-01 2007-09-04 Stmicroelectronics Ltd. Circuit for monitoring information on an interconnect
JP2010271818A (en) * 2009-05-20 2010-12-02 Nec Computertechno Ltd Device and method of instruction fusion calculation
JP2014194755A (en) * 2013-03-15 2014-10-09 Intel Corp Methods and apparatus for fusing instructions to provide or-test and and-test functionality on multiple test sources

Cited By (54)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6298394B1 (en) 1999-10-01 2001-10-02 Stmicroelectronics, Ltd. System and method for capturing information on an interconnect in an integrated circuit
US6349371B1 (en) 1999-10-01 2002-02-19 Stmicroelectronics Ltd. Circuit for storing information
US6351803B2 (en) 1999-10-01 2002-02-26 Hitachi Ltd. Mechanism for power efficient processing in a pipeline processor
US6408381B1 (en) 1999-10-01 2002-06-18 Hitachi, Ltd. Mechanism for fast access to control space in a pipeline processor
US6412043B1 (en) 1999-10-01 2002-06-25 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6412047B2 (en) 1999-10-01 2002-06-25 Stmicroelectronics, Inc. Coherency protocol
US6434665B1 (en) 1999-10-01 2002-08-13 Stmicroelectronics, Inc. Cache memory store buffer
US6449712B1 (en) 1999-10-01 2002-09-10 Hitachi, Ltd. Emulating execution of smaller fixed-length branch/delay slot instructions with a sequence of larger fixed-length instructions
US6457118B1 (en) 1999-10-01 2002-09-24 Hitachi Ltd Method and system for selecting and using source operands in computer system instructions
US6460174B1 (en) 1999-10-01 2002-10-01 Stmicroelectronics, Ltd. Methods and models for use in designing an integrated circuit
US6463553B1 (en) 1999-10-01 2002-10-08 Stmicroelectronics, Ltd. Microcomputer debug architecture and method
US6487683B1 (en) 1999-10-01 2002-11-26 Stmicroelectronics Limited Microcomputer debug architecture and method
US6496905B1 (en) 1999-10-01 2002-12-17 Hitachi, Ltd. Write buffer with burst capability
US6502210B1 (en) 1999-10-01 2002-12-31 Stmicroelectronics, Ltd. Microcomputer debug architecture and method
US6530047B1 (en) 1999-10-01 2003-03-04 Stmicroelectronics Limited System and method for communicating with an integrated circuit
US6542983B1 (en) 1999-10-01 2003-04-01 Hitachi, Ltd. Microcomputer/floating point processor interface and method
US6546480B1 (en) 1999-10-01 2003-04-08 Hitachi, Ltd. Instructions for arithmetic operations on vectored data
US6553460B1 (en) 1999-10-01 2003-04-22 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6557119B1 (en) 1999-10-01 2003-04-29 Stmicroelectronics Limited Microcomputer debug architecture and method
US6567932B2 (en) 1999-10-01 2003-05-20 Stmicroelectronics Limited System and method for communicating with an integrated circuit
US6574651B1 (en) 1999-10-01 2003-06-03 Hitachi, Ltd. Method and apparatus for arithmetic operation on vectored data
US6590907B1 (en) 1999-10-01 2003-07-08 Stmicroelectronics Ltd. Integrated circuit with additional ports
US6591369B1 (en) 1999-10-01 2003-07-08 Stmicroelectronics, Ltd. System and method for communicating with an integrated circuit
US6591340B2 (en) 1999-10-01 2003-07-08 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6598128B1 (en) 1999-10-01 2003-07-22 Hitachi, Ltd. Microprocessor having improved memory management unit and cache memory
US6598177B1 (en) 1999-10-01 2003-07-22 Stmicroelectronics Ltd. Monitoring error conditions in an integrated circuit
US6601189B1 (en) 1999-10-01 2003-07-29 Stmicroelectronics Limited System and method for communicating with an integrated circuit
US7228389B2 (en) 1999-10-01 2007-06-05 Stmicroelectronics, Ltd. System and method for maintaining cache coherency in a shared memory system
US6629115B1 (en) 1999-10-01 2003-09-30 Hitachi, Ltd. Method and apparatus for manipulating vectored data
US6629207B1 (en) 1999-10-01 2003-09-30 Hitachi, Ltd. Method for loading instructions or data into a locked way of a cache memory
US6633971B2 (en) 1999-10-01 2003-10-14 Hitachi, Ltd. Mechanism for forward data in a processor pipeline using a single pipefile connected to the pipeline
US6665816B1 (en) 1999-10-01 2003-12-16 Stmicroelectronics Limited Data shift register
US6684348B1 (en) 1999-10-01 2004-01-27 Hitachi, Ltd. Circuit for processing trace information
US6693914B1 (en) 1999-10-01 2004-02-17 Stmicroelectronics, Inc. Arbitration mechanism for packet transmission
US6701405B1 (en) 1999-10-01 2004-03-02 Hitachi, Ltd. DMA handshake protocol
US6732307B1 (en) 1999-10-01 2004-05-04 Hitachi, Ltd. Apparatus and method for storing trace information
US6779145B1 (en) 1999-10-01 2004-08-17 Stmicroelectronics Limited System and method for communicating with an integrated circuit
US6820195B1 (en) 1999-10-01 2004-11-16 Hitachi, Ltd. Aligning load/store data with big/little endian determined rotation distance control
US6826191B1 (en) 1999-10-01 2004-11-30 Stmicroelectronics Ltd. Packets containing transaction attributes
US6859891B2 (en) 1999-10-01 2005-02-22 Stmicroelectronics Limited Apparatus and method for shadowing processor information
US6918065B1 (en) 1999-10-01 2005-07-12 Hitachi, Ltd. Method for compressing and decompressing trace information
US7346072B2 (en) 1999-10-01 2008-03-18 Stmicroelectronics Ltd. Arbitration mechanism for packet transmission
US7000078B1 (en) 1999-10-01 2006-02-14 Stmicroelectronics Ltd. System and method for maintaining cache coherency in a shared memory system
US7072817B1 (en) 1999-10-01 2006-07-04 Stmicroelectronics Ltd. Method of designing an initiator in an integrated circuit
US7266728B1 (en) 1999-10-01 2007-09-04 Stmicroelectronics Ltd. Circuit for monitoring information on an interconnect
US6615370B1 (en) 1999-10-01 2003-09-02 Hitachi, Ltd. Circuit for storing trace information
US7260745B1 (en) 1999-10-01 2007-08-21 Stmicroelectronics Ltd. Detection of information on an interconnect
US6928073B2 (en) 1999-10-01 2005-08-09 Stmicroelectronics Ltd. Integrated circuit implementing packet transmission
US7761692B2 (en) 2001-09-20 2010-07-20 Panasonic Corporation Processor, compiler and compilation method
US7076638B2 (en) 2001-09-20 2006-07-11 Matsushita Electric Industrial Co., Ltd. Processor, compiler and compilation method
JP2010271818A (en) * 2009-05-20 2010-12-02 Nec Computertechno Ltd Device and method of instruction fusion calculation
US8677102B2 (en) 2009-05-20 2014-03-18 Nec Corporation Instruction fusion calculation device and method for instruction fusion calculation
JP2016103280A (en) * 2013-03-15 2016-06-02 インテル・コーポレーション Method and apparatus for fusing instructions to provide or-test and and-test functionality on multiple test sources
JP2014194755A (en) * 2013-03-15 2014-10-09 Intel Corp Methods and apparatus for fusing instructions to provide or-test and and-test functionality on multiple test sources

Similar Documents

Publication Title
US8762688B2 (en) Multithreaded processor with multiple concurrent pipelines per thread
US8078828B1 (en) Memory mapped register file
US5761470A (en) Data processor having an instruction decoder and a plurality of executing units for performing a plurality of operations in parallel
US5659722A (en) Multiple condition code branching system in a multi-processor environment
US5838984A (en) Single-instruction-multiple-data processing using multiple banks of vector registers
DE68929483T2 (en) Data processor with an instruction unit having a cache memory and a ROM.
US7877582B2 (en) Multi-addressable register file
US7533243B2 (en) Processor for executing highly efficient VLIW
US6334176B1 (en) Method and apparatus for generating an alignment control vector
US5311458A (en) CPU with integrated multiply/accumulate unit
US5163139A (en) Instruction preprocessor for conditionally combining short memory instructions into virtual long instructions
US5826089A (en) Instruction translation unit configured to translate from a first instruction set to a second instruction set
RU2111531C1 (en) Circuit for parallel processing of at least two instructions in digital computer
JP3858939B2 (en) System and method for retirement of instructions in a superscaler microprocessor
US5870598A (en) Method and apparatus for providing an optimized compare-and-branch instruction
JP2550213B2 (en) Parallel processing device and parallel processing method
DE4301417C2 (en) Computer system with parallel command execution facility
US7793079B2 (en) Method and system for expanding a conditional instruction into a unconditional instruction and a select instruction
DE69627807T2 (en) Data processor for simultaneous data loading and execution of a multiply-add operation
KR100690225B1 (en) Data processor system and instruction system using grouping
CA1174370A (en) Data processing unit with pipelined operands
US5530817A (en) Very large instruction word type computer for performing a data transfer between register files through a signal line path
US6490673B1 (en) Processor, compiling apparatus, and compile program recorded on a recording medium
US6058465A (en) Single-instruction-multiple-data processing in a multimedia signal processor
EP0381471B1 (en) Method and apparatus for preprocessing multiple instructions in a pipeline processor