KR19990084909A

KR19990084909A - Superscalar Pipeline Structure

Info

Publication number: KR19990084909A
Application number: KR1019980016977A
Authority: KR
Inventors: 정대석
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1998-05-12
Filing date: 1998-05-12
Publication date: 1999-12-06

Abstract

A superscalar pipeline structure is disclosed that simultaneously executes the instruction at the branch target address and the instruction at the next sequential address and then, at a predetermined point in time, selects the applicable one of the two and continues executing it. According to the present invention, the performance of the superscalar pipeline does not depend on the accuracy of branch prediction, and no separate branch prediction device is required.

Description

Superscalar Pipeline Structure

The present invention relates to a pipeline structure, and more particularly to a superscalar pipeline structure that operates efficiently without requiring a branch prediction device.

To greatly improve the performance of semiconductor devices, instruction-level parallelism techniques, in particular the superscalar and superpipeline techniques, have been studied. Superscalar processing has recently become a common technique for achieving high performance in single-processor systems.

In the superscalar and superpipeline techniques, branch instructions are a very important factor in determining overall system performance. Branch instructions are generally used to carry out complex control operations. Therefore, as the number of branches increases, system performance degrades.

In a conventional RISC (Reduced Instruction Set Computer) central processing unit (CPU) architecture, a branch prediction algorithm is used when processing a branch instruction to predict the address to branch to, and instructions are then fetched from that address.

FIG. 1 shows a schematic diagram of superscalar pipeline operation in a superscalar pipeline structure using a conventional branch prediction algorithm. Here, reference numerals T1 to T6 denote clock (CLK) cycles, reference numerals 151 to 162 denote example instructions, and reference numeral 163 denotes the point in time at which the accuracy of the branch address is determined.

Referring to FIG. 1, the superscalar pipeline structure using a conventional branch prediction algorithm includes pipelines 110, 120, 130, and 140.

Each of the pipelines 110, 120, 130, and 140 consists of the following stages: instruction fetch (IF), instruction decode (DEC), execution (EXE), and memory access / register storage (WB).

As shown in FIG. 1, the conventional pipeline structure is configured so that four instructions can be fetched and executed simultaneously in every clock cycle. In clock cycle T1, four instructions 151 to 154 are fetched at the same time. Here, the instructions 151 to 154 include a branch instruction. When a branch instruction is processed, a branch prediction algorithm is used to predict the address to branch to, and instructions are fetched from that address. The accuracy of the predicted branch address is determined in clock cycle T3, after the instruction decode stage (DEC) in clock cycle T2.

Therefore, if the branch prediction algorithm predicts the branch address correctly, the pipeline continues to operate without disturbing the execution of the instructions 151 to 158 fetched in clock cycles T1 and T2. If the branch address is not predicted correctly, however, the instructions 153 and 154 that follow the branch instruction 152 among the instructions 151 to 154 fetched in clock cycle T1, together with the instructions 155 to 158 fetched in clock cycle T2, were fetched in error and must be flushed. Starting in clock cycle T3, instructions 159 to 162 must then be fetched again from the correctly determined address and executed. The performance of the superscalar pipeline is thus determined by the accuracy of branch prediction.
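The FIG. 1 walk-through can be restated as a small Python sketch. It only tracks which instruction numbers survive, using the reference numerals from the text. The function name `conventional_fetch` and the flag `prediction_correct` are hypothetical names introduced here for illustration and do not appear in the patent.

```python
def conventional_fetch(prediction_correct):
    """Model the FIG. 1 example: return (kept, flushed, refetched)."""
    t1_group = [151, 152, 153, 154]  # fetched in T1; 152 is the branch instruction
    t2_group = [155, 156, 157, 158]  # fetched in T2 from the predicted address
    branch = 152

    if prediction_correct:
        # Prediction confirmed in T3: nothing is discarded or refetched.
        return t1_group + t2_group, [], []

    # Misprediction found in T3: everything fetched after the branch is flushed.
    kept = [i for i in t1_group if i <= branch]
    flushed = [i for i in t1_group if i > branch] + t2_group
    refetched = [159, 160, 161, 162]  # fetched from the corrected address from T3 on
    return kept, flushed, refetched
```

Under this model a misprediction discards six of the eight instructions fetched in T1 and T2, which is the cost the invention aims to avoid.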

In such a conventional superscalar pipeline structure, the probability of predicting the branch address correctly depends on the algorithm used but is about 90%. A branch prediction algorithm can therefore still predict a wrong branch address. When an instruction has been fetched from a wrong address, it must be flushed. As a result, several clock cycles are lost and CPU performance degrades.
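As a rough illustration of this cost, the sketch below computes the average number of fetched instructions discarded per branch. The 90% figure comes from the text; the count of six flushed instructions is taken from the FIG. 1 example (153, 154, and 155 to 158), and applying it to every misprediction is a simplifying assumption. The function name `expected_flushed_per_branch` is hypothetical.

```python
def expected_flushed_per_branch(accuracy, flushed_on_miss):
    """Average number of fetched instructions discarded per branch."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return (1.0 - accuracy) * flushed_on_miss


# About 90% accuracy (from the text) and 6 flushed instructions per miss
# (assumed constant, taken from the FIG. 1 example).
print(round(expected_flushed_per_branch(0.90, 6), 2))  # prints 0.6
```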

Accordingly, an object of the present invention is to provide a pipeline structure, in particular a superscalar pipeline structure, that operates efficiently without requiring a branch prediction device.

FIG. 1 is a schematic diagram of a conventional superscalar pipeline structure.

FIG. 2 is a schematic diagram of a superscalar pipeline structure according to an embodiment of the present invention.

* Description of the reference numerals in the drawings

T1 to T6: clock cycles

151 to 162, 251 to 262: instructions

163, 263: points in time at which branch address accuracy is determined

IF: instruction fetch

DEC: instruction decode

EXE: execution

WB: memory access / register storage

To achieve the above object, the pipeline structure according to the present invention is characterized in that it simultaneously executes the instruction at the branch target address and the instruction at the next sequential address and then, at a predetermined point in time, selects the applicable one of the two and continues executing it.

A superscalar pipeline structure according to a specific embodiment of the present invention will now be described in detail with reference to the accompanying drawings.

FIG. 2 shows a schematic diagram of superscalar pipeline operation in a superscalar pipeline structure using a branch prediction algorithm according to an embodiment of the present invention. Here, reference numerals T1 to T6 denote clock (CLK) cycles, reference numerals 251 to 262 denote example instructions, and reference numeral 263 denotes the point in time at which the accuracy of the branch address is determined.

Referring to FIG. 2, the superscalar pipeline structure using a branch prediction algorithm according to an embodiment of the present invention includes pipelines 210, 220, 230, and 240.

Each of the pipelines 210, 220, 230, and 240 consists of the following stages: instruction fetch (IF), instruction decode (DEC), execution (EXE), and memory access / register storage (WB).

As shown in FIG. 2, the superscalar pipeline structure according to the embodiment of the present invention is configured so that four instructions can be fetched and executed simultaneously in every clock cycle. In clock cycle T1, four instructions 251 to 254 are fetched at the same time. Here, the instructions 251 to 254 include a branch instruction. When a branch instruction is processed, a branch prediction algorithm is used to predict the address to branch to, and instructions are fetched from that address. The accuracy of the branch address is determined in clock cycle T3, after the instruction decode stage (DEC) in clock cycle T2.

While the instructions 251 to 254 are being decoded in clock cycle T2, the branch-address instruction (Branch addr) 255, the branch-address+1 instruction (Branch addr+1) 256, the next-address instruction (Next addr) 257, and the next-address+1 instruction (Next addr+1) 258 are fetched simultaneously. The accuracy of the branch address is then determined in clock cycle T3, after the instruction decode stage (DEC) in clock cycle T2. If the branch address is correct, the pipeline continues to operate without disturbing the execution of the branch-address instruction 255 and the branch-address+1 instruction 256 among the instructions 255 to 258 fetched in clock cycle T2. If the branch address is not correct, the pipeline continues to operate without disturbing the execution of the next-address instruction 257 and the next-address+1 instruction 258 among the instructions 255 to 258 fetched in clock cycle T2. Starting in clock cycle T3, instructions 259 to 262 (Precise addr to Precise addr+3) are fetched from the correctly determined address and executed. The performance of the superscalar pipeline therefore does not depend on the accuracy of branch prediction, and no separate branch prediction device is required.
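The selection step described for FIG. 2 can be sketched in Python in the same style, again tracking only reference numerals. The names `dual_path_fetch` and `branch_address_correct` are hypothetical. The text does not say what happens to the pair that is not selected, so treating it as dropped is an assumption of this sketch.

```python
def dual_path_fetch(branch_address_correct):
    """Model the FIG. 2 example: return (selected, dropped, next_group)."""
    # All four are fetched together in T2 while 251 to 254 are decoded.
    branch_path = [255, 256]  # Branch addr, Branch addr+1
    next_path = [257, 258]    # Next addr, Next addr+1

    # The decision at point 263 (clock cycle T3) picks one pair.
    if branch_address_correct:
        selected, dropped = branch_path, next_path
    else:
        # Assumption: the unselected pair is simply discarded.
        selected, dropped = next_path, branch_path

    # From T3 on, four instructions come from the resolved address.
    next_group = [259, 260, 261, 262]  # Precise addr to Precise addr+3
    return selected, dropped, next_group
```

In this model two of the four T2 fetch slots carry useful instructions whichever way the branch resolves, so the outcome does not depend on a prediction. The trade-off, under the same assumption, is that the other two slots are always spent on the path that is not selected.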

As described above, the superscalar pipeline structure according to the embodiment of the present invention simultaneously executes the instruction at the branch target address and the instruction at the next sequential address and then, at a predetermined point in time, selects the applicable one of the two and continues executing it. The performance of the superscalar pipeline therefore does not depend on the accuracy of branch prediction, and no separate branch prediction device is required.

According to the present invention, the applicable one of the instruction at the branch target address and the instruction at the next sequential address is selected at a predetermined point in time and executed, so that the performance of the superscalar pipeline does not depend on the accuracy of branch prediction and no separate branch prediction device is required.

Claims

1. A superscalar pipeline structure characterized by simultaneously executing an instruction at an address to branch to and an instruction at a next address, and then, at a predetermined point in time, selecting and executing the applicable one of the instruction at the address to branch to and the instruction at the next address.

2. The superscalar pipeline structure of claim 1, wherein the superscalar pipeline structure consists of four pipelines.

3. The superscalar pipeline structure of claim 2, wherein each of the pipelines includes an instruction fetch stage, an instruction decode stage, an operation perform stage, and a memory access / register storage stage.