KR20100092230A

KR20100092230A - Static branch prediction method for pipeline processor and compile method therefor

Info

Publication number: KR20100092230A
Application number: KR1020090011513A
Authority: KR
Inventors: 김태송; 서동관; 김석진
Original assignee: 삼성전자주식회사
Priority date: 2009-02-12
Filing date: 2009-02-12
Publication date: 2010-08-20
Also published as: KR101579589B1; US20100205405A1; US8954946B2

Abstract

PURPOSE: A static branch prediction method for a pipeline processor and a compile method therefor are provided that suit control parts with small basic blocks by eliminating the use of delay slots. CONSTITUTION: A conditional branch code is predicted as taken or not-taken (S10). The conditional branch code is converted into a jump target address setting code and a test code, and the codes within the block are scheduled, wherein the jump target address setting code includes target address information and branch timing information (S20). If the conditional branch code is predicted as taken, the target address indicated by the target address information is fetched at the cycle time indicated by the branch timing information (S30).

Description

Static branch prediction method for pipeline processor and compile method therefor

The present disclosure relates to a processor that executes instructions, and more particularly to a pipeline processor.

In a pipeline processor, the processing of a single instruction is divided into several stages. For example, instruction processing may be divided into a fetch stage, a decode stage, an execute stage, a memory stage, and a write stage. Because a plurality of instructions can be executed in an overlapped manner, each passing through the pipeline stages in sequence, a pipeline processor can process a program at high speed.
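As an illustration of the overlap described above, the following minimal sketch tabulates which stage each instruction occupies in an ideal five-stage pipeline with no hazards. The function names and the 1-based cycle numbering are assumptions made for this example, not part of the disclosure.

```python
# Ideal five-stage pipeline: fetch, decode, execute, memory, write.
STAGES = ["F", "D", "E", "M", "W"]


def stage_table(num_instructions):
    """Return one dict per instruction mapping cycle number -> stage.

    Instruction i (0-based) enters the fetch stage at cycle i + 1 and
    advances one stage per cycle (hypothetical hazard-free model).
    """
    return [
        {i + s + 1: name for s, name in enumerate(STAGES)}
        for i in range(num_instructions)
    ]


def total_cycles(num_instructions):
    """Cycles needed to finish all instructions when they fully overlap."""
    return len(STAGES) + num_instructions - 1


if __name__ == "__main__":
    for idx, row in enumerate(stage_table(3)):
        print(idx, row)
    print("total cycles:", total_cycles(3))
```

With full overlap, three instructions finish in 7 cycles instead of the 15 that strictly sequential processing would need.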

One of the factors affecting the performance of a pipeline processor is the 'branch hazard', also called the 'pipeline control hazard'. A branch hazard refers to a drop in the processing speed of a pipeline processor caused by a branch instruction. In a pipeline processor, the address of the next instruction to fetch is not known until the decode stage of the branch instruction has completed or its execute stage has been performed, so a branch instruction can degrade performance. Much research has been conducted on eliminating branch hazards in pipeline processors, and techniques such as dynamic branch prediction, delayed branch, and static branch prediction have been proposed.

Meanwhile, a reconfigurable processor accelerates data-intensive loops on a coarse-grained reconfigurable array (CGA), while the control part runs on a very long instruction word (VLIW) machine. The control part is generally characterized by small basic blocks (BBs) and simple data flow. In a VLIW machine, the execution schedule of instructions is determined outside the processor by the compiler, which is software, and the execution schedule inside the processor is fixed, so the hardware can be kept simple.

Among the methods for mitigating the branch hazard described above, dynamic branch prediction predicts a conditional branch instruction as taken or not-taken during program execution in light of its past history. Because dynamic branch prediction requires a large amount of hardware to solve the branch problem, it is not suitable for eliminating pipeline control hazards in a VLIW machine, which calls for a simple hardware configuration. The delayed branch method, which fills delay slots with instructions that have no dependency on the conditional branch operation, is likewise unsuitable for a VLIW machine, which mainly handles small numbers of instructions because its basic blocks (BBs) are small.

In static branch prediction, the taken or not-taken prediction for a conditional branch instruction is already determined before the program is executed. In existing static branch prediction methods, no delay slot is used when the branch is predicted not-taken, but delay slots are placed after the branch instruction when it is predicted taken. Existing static branch prediction methods therefore also have limits when applied to a VLIW machine. In addition, in existing methods a branch operation needs a large amount of information (data) and must perform a large amount of work (such as a comparison and a branch), which can lead to a shortage of encoding space when the branch instruction is processed.

A static branch prediction method and apparatus that can improve the performance of a pipeline processor by reducing or eliminating its control hazards, and a compile method therefor, are proposed. In particular, a static branch prediction method and apparatus for a pipeline processor suited to high-speed processing of programs whose basic blocks (BBs) contain few instructions, and a compile method therefor, are proposed.

Also proposed are a static branch prediction method and apparatus, and a compile method therefor, that add little hardware to the pipeline processor and do not need delay slots even when a branch is predicted taken. In particular, a static branch prediction method and apparatus, and a compile method therefor, are proposed that avoid a shortage of encoding space when a branch instruction is processed.

A static branch prediction method for a pipeline processor according to one embodiment includes: predicting a conditional branch code as taken or not-taken; converting the conditional branch code into a test code and a jump target address setting code that includes target address information and branch timing information, and scheduling the codes in the block such that the test code is scheduled in the last slot of the block and the jump target address setting code is scheduled in an empty slot after all other codes in the block have been scheduled; and, when taken was predicted in the predicting step, fetching the target address indicated by the target address information at the cycle time indicated by the branch timing information.

A code compile method for static branch prediction according to another embodiment includes: converting a conditional branch code into a test code and a jump target address setting code that includes target address information and branch timing information; and scheduling all codes in the block that contains the test code and the jump target address setting code, wherein in the scheduling step the test code is scheduled in the last slot of the block and the jump target address setting code is scheduled in an empty slot after all other codes in the block have been scheduled.

A code execution method for a pipeline processor according to yet another embodiment includes: converting a conditional branch code into a jump target address setting code that includes target address information and branch timing information; scheduling the jump target address setting code in an empty slot that remains after all other codes in the block have been scheduled; and fetching the target address indicated by the target address information at the cycle time indicated by the branch timing information.

According to one embodiment, delay slots are not needed even though a static branch prediction method, which keeps the hardware overhead low, is used. The method therefore suits the processing of control parts and the like whose basic blocks are small, and is well suited to a VLIW machine. Because the cycle time needed to process a basic block can be shortened compared with existing methods, the performance and speed of the processor can be improved and the compiler can also be simplified. In addition, the jump target address setting (JTS) instruction can be scheduled into an empty slot after all other instructions in the basic block have been scheduled, so schedule quality is high, and processing a conditional branch instruction does not require much encoding space.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing aspects of the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole.

FIG. 1 is a flowchart illustrating a static branch prediction method according to an embodiment. The embodiment shown in FIG. 1 is the case where the source code of an instruction register (IR) contains a conditional branch instruction or code.

Referring to FIG. 1, a conditional branch operation is first predicted as taken or not-taken (S10). FIG. 1 shows the case where taken is predicted, but this is merely an example. There is no restriction on the algorithm used to predict taken or not-taken. For example, the same algorithms used in existing static branch prediction methods may be used, such as predicting every branch as taken, predicting every branch as not-taken, or profile-based prediction. The result of step S10, namely the prediction information indicating whether taken or not-taken was predicted, is added to each conditional branch instruction in the scheduling step or compilation process (S20), as described later.
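The prediction policies mentioned for step S10 can be sketched as follows. This is only an illustration: the profile format (a mapping from a branch identifier to taken and not-taken counts) and the function names are assumptions, and the text places no restriction on the algorithm actually used.

```python
# Prediction values follow the text: 't' for taken, 'n' for not-taken.

def predict_always_taken(branch_id, profile=None):
    """Predict every conditional branch as taken."""
    return "t"


def predict_always_not_taken(branch_id, profile=None):
    """Predict every conditional branch as not-taken."""
    return "n"


def predict_from_profile(branch_id, profile):
    """Profile-based prediction.

    profile is a hypothetical dict: branch_id -> (taken_count, not_taken_count)
    collected from an earlier profiling run. Branches without profile data
    default to taken in this sketch (an arbitrary choice).
    """
    taken, not_taken = profile.get(branch_id, (0, 0))
    return "t" if taken >= not_taken else "n"


if __name__ == "__main__":
    profile = {"loop_back": (90, 10), "error_check": (1, 99)}
    for branch in profile:
        print(branch, predict_from_profile(branch, profile))
```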

Next, the conditional branch instruction is converted into a jump target address setting (JTSc) code and a test code, and the codes in the instruction register (IR) are then scheduled (S20). The names "jump target address setting (JTSc) code" and "test code" are arbitrary; other names may be used for codes that perform the same functions and carry the same information. Scheduling is the process of rearranging the codes in each instruction register (IR) or each basic block (BB) into the order in which they will be executed, and may be part of compiling the instructions. In a processor with a single pipeline, one instruction register (IR) is compiled using one processing block, whereas in superscalar structures it may be compiled into a plurality of processing blocks.

The JTSc code (where 'c' stands for 'conditional') includes target address information and branch timing information, and may also include prediction information. The names of these pieces of information are also exemplary. The target address information may be the address information of the target block to be executed when the conditional branch operation is taken (that is, when taken was predicted and the prediction is correct, or when not-taken was predicted and the prediction is incorrect). The branch timing information indicates when the branch occurs and may be, for example, a value indicating how many cycles later the test code appears. The prediction information indicates whether taken or not-taken was predicted in step S10 and may be set to any value that denotes taken (t) or not-taken (n). The prediction information can serve as a hint when the test result is compared in the test operation.
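The three pieces of information carried by a JTSc code can be pictured with the following minimal sketch. The class name, field names, and validation rules are hypothetical; the text does not specify an encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JTSc:
    """Illustrative record for a conditional jump target address setting code."""
    target_address: int   # target address information: block fetched if taken
    cycles_to_test: int   # branch timing information: cycles until the test code
    prediction: str = "t" # prediction information: 't' (taken) or 'n' (not-taken)

    def __post_init__(self):
        if self.prediction not in ("t", "n"):
            raise ValueError("prediction must be 't' or 'n'")
        if self.cycles_to_test < 0:
            raise ValueError("cycles_to_test must be non-negative")

    def branch_is_taken(self, prediction_correct):
        """The branch is taken when taken was predicted and the prediction is
        correct, or when not-taken was predicted and the prediction is wrong."""
        return (self.prediction == "t") == prediction_correct


if __name__ == "__main__":
    jts = JTSc(target_address=0x300, cycles_to_test=2, prediction="t")
    print(jts, jts.branch_is_taken(prediction_correct=True))
```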

The test code, or test instruction, checks whether the prediction made in step S10 is correct and may be, for example, an instruction with the same function as a compare instruction. Executing the test instruction requires the results of other instructions, so the test instruction depends on those results. The test instruction is therefore generally scheduled to be processed last in its block.

FIG. 2 is a flowchart illustrating an example of the instruction scheduling process, or the process of compiling the codes of the instruction register (IR), in step S20 of FIG. 1.

Referring to FIG. 2, prediction information is first added to the conditional branch instruction (S21). The prediction information represents the prediction result of step S10 of FIG. 1 and may indicate taken (t) or not-taken (n). The conditional branch instruction containing the prediction information is then split into a JTSc code and a test code (S22). As described above, of the information contained in the conditional branch instruction, the information for the comparison operation goes into the test code, and the target address information and the like go into the JTSc code. In addition to the target address information, the JTSc code may also contain branch timing information (e.g., a value indicating how many cycles later the test code appears) together with the prediction information inserted in step S21.

Then the JTSc code, the test code, and the other instructions in the block are arranged in execution order (S23). Here, the instructions other than the JTSc code may be scheduled first in the conventional manner. The conventional manner may be the instruction scheduling used by a conventional static branch prediction method, in which the conditional branch instruction is not split into a JTSc code and a test code. For example, in the conventional manner (which uses delay slots), a branch instruction is inserted after a compare instruction, and delay slots are inserted after the branch instruction so that other instructions that do not depend on the compare instruction can be scheduled into the delay slots.

In the present embodiment, however, where the conditional branch instruction is split into a JTSc code and a test code, the test code, which depends on other instructions in the block, is executed last in the block. The execution position of the JTSc code is decided last, after all other instructions have been scheduled. As described above, the JTSc code has no dependency on other instructions, so it can be scheduled last and there are almost no constraints on where it is placed. The JTSc code contains the information about the address of the block to be fetched after the current block if the conditional branch is taken. The earlier this information is obtained, the more it helps to eliminate or reduce the branch hazard of the pipeline processor. In the present embodiment, therefore, the JTSc code may be scheduled so that it executes as early as possible. For example, the JTSc code may be placed in the earliest of the slots that the existing scheduling method (i.e., step S22) assigns as nop slots.
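The placement rule described here (test code last, JTSc code into the earliest nop slot after everything else is scheduled) can be sketched as below. The schedule representation (a list of cycles, each a list of issue slots, with the string 'nop' marking an empty slot) and the function name are assumptions made for illustration. The text does not say what happens when no nop slot exists, so this sketch simply raises an error in that case.

```python
def place_jtsc(schedule, test_op="test"):
    """Place a JTSc code into the earliest nop slot of an already scheduled block.

    schedule: list of cycles; each cycle is a list of slot contents (strings),
              where 'nop' marks an empty slot (hypothetical representation).
    Returns (new_schedule, cycles_to_test), where cycles_to_test is the number
    of cycles between the JTSc code and the test code in the last cycle.
    """
    last = len(schedule) - 1
    if test_op not in schedule[last]:
        raise ValueError("test code must be scheduled in the last cycle of the block")
    for cycle, slots in enumerate(schedule):
        for slot, op in enumerate(slots):
            if op == "nop":
                placed = [list(s) for s in schedule]  # copy; leave input intact
                placed[cycle][slot] = "JTSc"
                return placed, last - cycle
    raise ValueError("no empty slot available for the JTSc code")


if __name__ == "__main__":
    block = [["add", "nop"], ["ld", "sub"], ["test", "nop"]]
    placed, timing = place_jtsc(block)
    print(placed, "cycles to test:", timing)
```

Choosing the earliest nop slot makes the target address available as soon as possible, which is the property the paragraph above identifies as helpful for reducing the branch hazard.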

Referring again to FIG. 1, the instructions are fetched and executed sequentially in the scheduled order (S30). When taken was predicted in step S10, as shown in FIG. 1, after the test code is fetched, the address of the block indicated by the target address information in the JTSc code can be fetched in the cycle indicated by the branch timing information in the JTSc code (e.g., after the fetch of the test code). When not-taken was predicted in step S10, on the other hand, the instructions of the block following the current block (the block containing the test code) can be fetched sequentially after the test code is fetched.

Fetching instructions on the basis of this prediction can begin immediately after the test code is fetched and continue until the decode stage of the test code finishes or its execute stage makes it possible to confirm whether the prediction was correct. As a result, even when taken was predicted in step S10, delay slots after the test code are unnecessary or their use can be minimized. This is because, in the present embodiment, the JTSc code separated from the conditional branch instruction is executed first, so the target address and the branch timing are known in advance without the test code having been decoded and executed.

Then the test code of the block is executed and, depending on whether the prediction in step S10 was correct, the fetched instructions are either processed as they are or the instructions fetched after the test code are flushed (S40). That is, if the prediction in step S10 was correct, the instructions are executed in the order fetched in step S30; if the prediction was incorrect, all instructions fetched after the test code are flushed and a different address (e.g., the block following the current block, or the target address contained in the JTSc code) is fetched. The prediction information contained in the JTSc code can be used to check whether the prediction was correct.

For example, suppose that taken was predicted in step S10, as shown in FIG. 1. If executing the test code confirms that the prediction was correct, the instructions of the block indicated by the branch address information in the JTSc code, that is, the instructions already fetched after the test code, are processed as they are. If the prediction was incorrect, all instructions fetched after the test code are flushed and the instructions of the block following the current block are fetched instead.
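The fetch and flush behavior of steps S30 and S40 can be summarized in a small sketch. The function names and the use of integer addresses are assumptions for illustration; the sketch models only the decision, not pipeline timing.

```python
def speculative_fetch_address(prediction, target, fallthrough):
    """Address fetched after the test code, before the prediction is verified (S30).

    prediction: 't' (taken) or 'n' (not-taken), as stored in the JTSc code.
    target: address from the target address information.
    fallthrough: address of the block following the current block.
    """
    return target if prediction == "t" else fallthrough


def resolve(prediction, actually_taken, target, fallthrough):
    """Decision made once the test code has executed (S40).

    Returns (flush, refetch_address). When the prediction was correct, nothing
    is flushed and no refetch is needed, so refetch_address is None.
    """
    predicted_taken = prediction == "t"
    if predicted_taken == actually_taken:
        return False, None
    return True, (target if actually_taken else fallthrough)


if __name__ == "__main__":
    # Predicted taken but the branch falls through: flush and fetch the next block.
    print(resolve("t", False, target=0x300, fallthrough=0x200))
```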

The differences between the conventional static branch prediction method and the static branch prediction method of the present embodiment are described below, using the codes shown in FIG. 3 as an example. FIG. 3 is an example of codes in an instruction register (IR) that can be executed on a VLIW machine.

FIG. 4A shows the codes of FIG. 3 scheduled, i.e., compiled, according to the conventional static branch prediction method, and FIG. 4B shows the pipeline stages for some of the codes scheduled as in FIG. 4A when the prediction is correct. FIG. 5A shows the codes of FIG. 3 scheduled, i.e., compiled, according to the embodiment described above, and FIG. 5B shows the pipeline stages for some of the codes scheduled as in FIG. 5A when the prediction is correct. In FIGS. 4B and 5B, F, D, E, M, and W denote the fetch, decode, execute, memory, and write stages, respectively. The illustrated examples are for a superscalar structure with two processing blocks, but this hardware configuration is merely exemplary.

Referring to FIGS. 4A and 4B, in the conventional method the execute stage (E) of the compare instruction (cmp) compares the given values (r3, r2) to decide whether to branch, and the branch target address to be fetched next is not known until the decode stage (D) of the branch instruction. In the conventional method, therefore, delay slots must be added after the branch instruction; FIG. 4A shows nops added because there is no suitable code to insert into the delay slots. In the illustrated example, because of the added delay slots, executing the first basic block (BB1) takes a total of 6 cycles, and the new target address is fetched at cycle time 6.

In contrast, referring to FIGS. 5A and 5B, the branch target address is known at cycle time 2, the decode stage (D) of the JTSc code, and it is also known that an instruction of the third basic block (BB3), the branch address, will be fetched two cycles later at cycle time 4. The test code is then decoded and executed, and if the branch prediction is correct, nothing further is done and the pipeline processor continues processing instructions. In the illustrated example, therefore, executing the first basic block (BB1) takes a total of 4 cycles, and the target address is fetched at cycle time 4. This shows that the present embodiment can improve the processing speed and performance of the pipeline processor.
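The cycle counts quoted for FIGS. 4B and 5B can be reproduced with a short calculation. The sketch assumes one reading of the example, namely that the target is fetched at the JTSc decode cycle plus the number of cycles given by the branch timing information; the function name is hypothetical.

```python
def target_fetch_cycle(jtsc_decode_cycle, branch_timing):
    """Cycle at which the predicted target block starts being fetched."""
    return jtsc_decode_cycle + branch_timing


# Values taken from the example in the text.
CONVENTIONAL_FETCH_CYCLE = 6  # FIG. 4B: target fetched at cycle time 6
proposed_fetch_cycle = target_fetch_cycle(jtsc_decode_cycle=2, branch_timing=2)
cycles_saved = CONVENTIONAL_FETCH_CYCLE - proposed_fetch_cycle

if __name__ == "__main__":
    print("proposed fetch cycle:", proposed_fetch_cycle)
    print("cycles saved for BB1:", cycles_saved)
```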

FIG. 5C shows the pipeline stages for some of the codes scheduled as in FIG. 5A when the prediction is incorrect. Referring to FIG. 5C, as in FIG. 5B, the JTSc code of the first basic block (BB1) is decoded, the target address is computed at cycle time 2, and the new address, the third basic block (BB3), is fetched two cycles later at cycle time 4. Because executing the test code shows that the prediction was wrong, however, all operations fetched from the third basic block (BB3) are flushed, and the subsequent block, the second basic block (BB2), is fetched at cycle time 6.
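The two outcomes described for FIGS. 5B and 5C can be laid out as a list of fetch events. This is a minimal sketch: the refetch cycle is passed in as a parameter taken from the text (cycle time 6) rather than derived from a pipeline model, and the function name and event strings are made up for this example.

```python
def fetch_events(jtsc_decode_cycle, branch_timing, prediction_correct, refetch_cycle):
    """Return (cycle, event) pairs for a branch predicted taken.

    refetch_cycle is only used when the prediction turns out to be wrong;
    it is supplied by the caller rather than computed here.
    """
    events = [(jtsc_decode_cycle + branch_timing, "fetch BB3")]
    if not prediction_correct:
        events.append((refetch_cycle, "flush BB3 operations, fetch BB2"))
    return events


if __name__ == "__main__":
    print("correct:  ", fetch_events(2, 2, True, 6))
    print("incorrect:", fetch_events(2, 2, False, 6))
```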

FIG. 6 is a flowchart illustrating an instruction scheduling method or compile method according to another embodiment, for the case where the source code contains an unconditional branch instruction. The unconditional branch instruction may be, for example, the code 'jump' of the second basic block (BB2) in the example of FIG. 3, but this is merely an example.

Referring to FIG. 6, the unconditional branch instruction (jump) is first converted into a jump target address setting (JTSu) code (S110). Here too, the name "jump target address setting (JTSu) code" (where 'u' stands for 'unconditional') is arbitrary; another name may be used for a code that performs the same function and carries the same information.

The JTSu code includes target address information and branch timing information. The names of these pieces of information are also exemplary. The target address information may be the address information of the target block to be fetched when the branch operation or jump operation is performed according to the unconditional branch operation. The branch timing information may indicate when the branch or jump occurs. The JTSu code differs from an existing jump code in that it includes branch timing information. No prediction information is needed to execute an unconditional branch instruction.

Next, the block that includes the JTSu code is scheduled (S120). Scheduling is the process of rearranging the codes in each instruction register (IR) or each basic block (BB) into the order in which they will be executed, and may be part of the process of compiling the instructions. In a processor having a single pipeline, one instruction register (IR) is compiled into one processing block, whereas in a superscalar structure it may be compiled into a plurality of processing blocks. FIG. 4A shows an example of the result (the second basic block (BB2) of FIG. 4A) of scheduling, according to the present embodiment, the block that includes the unconditional branch instruction of FIG. 3 (the second basic block (BB2) of FIG. 3).

More specifically, the instructions other than the JTSu code are scheduled first (S121). That is, all instructions in the block (for example, the second basic block (BB2)) except the JTSu code are arranged in execution order. There is no particular restriction on how the instructions other than the JTSu code are scheduled, and techniques conventional in this field may be applied.

After all the other instructions have been scheduled, the execution order of the JTSu code is determined (S122). Because the JTSu code has no dependencies on the other instructions, there are almost no constraints on scheduling it. In addition, unlike a jump code, the JTSu code includes branch timing information, so it can be scheduled more flexibly than a jump code, which must always be executed at the end of its block. In other words, the JTSu code can be scheduled to execute as early in the block as possible. For example, the JTSu code may be placed in the earliest of the slots that a conventional scheduling method would fill with a nop.
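Steps S121 and S122 can be sketched as below. This is an illustrative sketch under stated assumptions, not the compiler of the embodiment: the block is modeled as a list of slots in which `None` marks a nop slot, the other instructions are assumed to have been scheduled already by some conventional scheduler, the branch timing is assumed to be the number of cycles from the JTSu slot to the first cycle after the block, and appending a slot when no nop slot exists is a choice made only for this example. The names `JTSu` and `place_jtsu` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class JTSu:
    """Hypothetical JTSu code; field names and encoding are assumptions."""
    target_address: int
    branch_time: int  # assumed: cycles from the JTSu slot to the target fetch


Slot = Union[str, JTSu, None]  # None marks a nop slot


def place_jtsu(scheduled: List[Optional[str]], target_address: int) -> List[Slot]:
    """S122: place a JTSu code into a block whose other instructions
    were already scheduled in S121."""
    block: List[Slot] = list(scheduled)
    try:
        slot = block.index(None)   # earliest nop slot in the block
    except ValueError:
        block.append(None)         # assumption: no free slot, so add one at the end
        slot = len(block) - 1
    # The target block is fetched in the cycle right after the last slot.
    block[slot] = JTSu(target_address, branch_time=len(block) - slot)
    return block


bb2 = place_jtsu(["ld", None, "add", "st"], target_address=0x40)
```

Here `bb2` keeps its four slots: the JTSu code fills the nop in slot 1 and records that the jump happens three cycles later, so no slot has to be reserved at the end of the block for a jump.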

According to the present embodiment, there is no need to add a delay slot after the JTSu code. With the conventional method shown in FIG. 4B, a delay slot was added even after the jump operation of the second basic block (BB2) was performed, so that executing the second basic block (BB2) takes a total of 5 cycles. When scheduling instead uses a JTSu code that includes branch timing information together with target address information, the JTSu code can be placed toward the front of the block, and as a result no separate delay slot needs to be added, so that executing the second basic block (BB2) takes a total of 4 cycles. The present embodiment can therefore improve the performance and speed of a pipeline processor.
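As a quick check of the cycle counts quoted above, the relative saving for the second basic block (BB2) can be computed directly. The helper name `relative_saving` is introduced only for this illustration.

```python
def relative_saving(cycles_before: int, cycles_after: int) -> float:
    """Fraction of cycles saved when a block shrinks from cycles_before to cycles_after."""
    return (cycles_before - cycles_after) / cycles_before


# BB2: 5 cycles with the delay slot, 4 cycles with the JTSu code.
saving = relative_saving(5, 4)
```

For BB2 this is one cycle out of five, that is, 20% fewer cycles each time the block is executed.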

The above description is merely one embodiment of the present invention, and a person having ordinary skill in the art to which the invention pertains will be able to implement it in modified forms without departing from its essential characteristics. Accordingly, the scope of the present invention should not be limited to the embodiments described above, but should be construed to include various embodiments within a scope equivalent to what is recited in the claims.

FIG. 1 is a flowchart illustrating a static branch prediction method according to an embodiment.

FIG. 2 is a flowchart illustrating an example of the instruction scheduling process in step S20 of FIG. 1.

FIG. 3 is an example of codes in an instruction register (IR) that may be executed on a VLIW machine.

FIG. 4A is a diagram showing the result of compiling the codes of FIG. 3 according to a conventional static branch prediction method.

FIG. 4B is a diagram showing the pipeline stages for some of the codes scheduled as in FIG. 4A when the prediction is correct.

FIG. 5A is a diagram showing the result of compiling the codes of FIG. 3 according to the static branch prediction method of the present embodiment.

FIG. 5B is a diagram showing the pipeline stages for some of the codes scheduled as in FIG. 5A when the prediction is correct.

FIG. 5C is a diagram showing the pipeline stages for some of the codes scheduled as in FIG. 5A when the prediction is incorrect.

FIG. 6 is a flowchart illustrating an instruction compile method according to another embodiment.

Claims

A static branch prediction method for a pipeline processor, comprising: predicting a conditional branch code as taken or not-taken;

scheduling the codes in a block by converting the conditional branch code into a test code and a jump target address setting code that includes target address information and branch timing information, wherein the test code is scheduled in the last slot of the block and the jump target address setting code is scheduled in an empty slot after all other codes in the block have been scheduled; and

fetching, when the conditional branch code is predicted as taken in the predicting step, the target address indicated by the target address information at the cycle time indicated by the branch timing information.

The method of claim 1,

wherein, if execution of the test code shows that the prediction made in the predicting step is correct, the codes fetched in the fetching step are processed as they are, and if the prediction is incorrect, the method further comprises flushing all of the codes fetched in the fetching step.

The method of claim 2,

wherein the cycle time indicated by the branch timing information is the cycle time immediately after the test code is fetched.

The method of claim 2,

wherein the jump target address setting code further includes prediction information from the predicting step, and the prediction information is used to determine whether the prediction is correct.

The method of claim 1,

further comprising: fetching the address of the block following the current block after fetching the test code, when the conditional branch code is predicted as not-taken in the predicting step; and

if execution of the test code shows that the prediction made in the predicting step is correct, processing the codes fetched in the fetching step as they are, and if the prediction is incorrect, flushing all of the codes fetched in the fetching step and fetching the target address indicated by the target address information.

Converting a conditional branch code into a test code and a jump target address setting code that includes target address information and branch timing information; and

Scheduling all codes in a block including the test code and the jump target address setting code;

wherein, in the scheduling step, the test code is scheduled in the last slot of the block, and the jump target address setting code is scheduled in an empty slot after all other codes in the block have been scheduled.

The method of claim 6,

wherein the jump target address setting code further includes prediction information indicating taken or not-taken, and the test code further uses the prediction information in predicting the branch.

The method of claim 7,

wherein the branch timing information indicates the cycle time at which to fetch the target block indicated by the target address information when the prediction information indicates taken.

The method of claim 8,

wherein the cycle time for fetching the target block is the cycle time immediately following the fetch of the test code.

Converting a conditional branch code into a jump target address setting code that includes target address information and branch timing information;

scheduling the jump target address setting code in an empty slot that remains after all other codes in the block have been scheduled; and

fetching the target address indicated by the target address information at the cycle time indicated by the branch timing information.