KR20090089358A

KR20090089358A - A system and method for using a working global history register

Info

Publication number: KR20090089358A
Application number: KR1020097011477A
Authority: KR
Inventors: Brian Michael Stempel; James Norris Dieffenderfer; Thomas Andrew Sartorius; Rodney Wayne Smith
Original assignee: Qualcomm Incorporated
Priority date: 2006-11-03
Filing date: 2007-10-25
Publication date: 2009-08-21
Also published as: JP2010509680A; US7984279B2; KR101081674B1; JP5209633B2; EP2084602B1; WO2008055045A1; US20080109644A1; CN101529378A; EP2084602A1; DE602007012131D1; CN101529378B; ATE496329T1

Abstract

A method of processing branch history information is disclosed. The method retrieves branch instructions from an instruction cache and executes them in a plurality of pipeline stages. The method verifies that a branch instruction has been identified. The method further receives branch history information during a first pipeline stage and loads the branch history information into a first register, wherein the first register … The method further loads the branch history information into a second register during a second pipeline stage. ® KIPO & WIPO 2009

Description

A SYSTEM AND METHOD FOR USING A WORKING GLOBAL HISTORY REGISTER

The present invention relates generally to computer systems and, more particularly, to a method and system for using a working global history register.

Processors are key to the evolution of computer platforms. Earlier processors were limited by the technology available at the time. New advances in manufacturing technology allow transistor designs to shrink to 1/1000 of the size of those in earlier processors, or smaller. These smaller processor designs are faster and more efficient, and they provide processing power beyond earlier expectations while consuming substantially less power.

As the physical design of processors has evolved, so have the innovative ways of processing information and performing functions. For example, instruction "pipelining" has been implemented in processor designs since the early 1960s. One example of pipelining is the concept of breaking the execution pipeline into units, or stages, through which instructions flow sequentially in a stream. The stages are arranged so that several stages can simultaneously process the appropriate portions of several instructions. One advantage of pipelining is that the execution of instructions overlaps, because the instructions are evaluated in parallel.

A processor pipeline consists of many stages, each of which performs a function related to executing an instruction. Each stage is referred to as a pipe stage or pipe segment. The stages are connected together to form a pipeline. Instructions enter at one end of the pipeline and exit at the other.
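The overlap described above can be quantified with the textbook ideal-pipeline model. The sketch below assumes one stage per clock cycle and no stalls, hazards, or flushes, which real pipelines (including those hit by the branch mispredictions discussed next) do not guarantee; the function names are illustrative only.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    # Each instruction occupies the whole datapath for num_stages cycles.
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    # The first instruction needs num_stages cycles to fill the pipeline;
    # after that, ideally one instruction completes every cycle.
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1


print(unpipelined_cycles(100, 5), pipelined_cycles(100, 5))  # 500 104
```

With five stages, 100 instructions finish in 104 cycles instead of 500 under these ideal assumptions; every pipeline flush eats into that gain, which is why branch prediction matters.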

Most programs executed by a processor contain conditional branch instructions, whose actual branch behavior is not known until the instruction is evaluated deep in the pipeline. To avoid the stall that would result from waiting for the actual evaluation of a branch instruction, modern processors may employ some form of branch prediction, whereby the branch behavior of a conditional branch instruction is predicted early in the pipeline. Based on the predicted branch evaluation, the processor speculatively fetches and executes instructions from a predicted address: either the branch target address (if the branch is predicted taken) or the next sequential address after the branch instruction (if the branch is predicted not taken). Whether a conditional branch instruction is taken or not taken is referred to as determining the branch direction. The branch direction may be determined both at prediction time and at actual branch resolution time. When the actual branch behavior is determined, if the branch was mispredicted, the speculatively fetched instructions must be flushed from the pipeline and new instructions must be fetched from the corrected address. Speculatively fetching instructions in response to an incorrect branch prediction adversely affects processor performance and power consumption. Consequently, improving the accuracy of branch predictions is an important processor design goal.

One known form of branch prediction partitions branch prediction into two predictors: an initial branch target address cache (BTAC) and a branch history table (BHT). The BTAC is indexed by an instruction fetch group address and contains the next fetch address, also referred to as the branch target, corresponding to that instruction fetch group address. Entries are added to the BTAC after a branch instruction has passed through the processor pipeline and its branch was taken. If the BTAC is full, entries are removed from it using standard cache replacement algorithms (such as round robin or least-recently used) when the next entry is added.

The BTAC may be a highly associative cache design and is accessed early in the instruction execution pipeline. If the fetch group address matches a BTAC entry (a BTAC hit), the corresponding next fetch address, or target address, is fetched in the next cycle. This match and the subsequent fetching of the target address are referred to as an implicit taken branch prediction. If there is no match (a BTAC miss), the next sequentially incremented address is fetched in the next cycle. This no-match situation is also referred to as an implicit not-taken prediction.
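The BTAC behavior described above (implicit taken prediction on a hit, sequential fetch on a miss, replacement when full) can be sketched as follows. This is a minimal model assuming a fully associative BTAC with least-recently-used replacement and 16-byte fetch groups; the class name, `capacity`, and `fetch_group_bytes` are hypothetical choices, not details from this document.

```python
from collections import OrderedDict
from typing import Optional


class BTAC:
    """Toy fully associative BTAC: fetch group address -> branch target."""

    def __init__(self, capacity: int, fetch_group_bytes: int = 16):
        self.capacity = capacity
        self.fetch_group_bytes = fetch_group_bytes  # assumed fetch group size
        self.entries: "OrderedDict[int, int]" = OrderedDict()

    def lookup(self, fetch_addr: int) -> Optional[int]:
        target = self.entries.get(fetch_addr)
        if target is not None:
            self.entries.move_to_end(fetch_addr)  # mark as most recently used
        return target

    def next_fetch_address(self, fetch_addr: int) -> int:
        target = self.lookup(fetch_addr)
        if target is not None:
            return target  # BTAC hit: implicit taken prediction
        return fetch_addr + self.fetch_group_bytes  # miss: implicit not taken

    def insert_taken_branch(self, fetch_addr: int, target: int) -> None:
        # Called after a taken branch has passed through the pipeline.
        if fetch_addr in self.entries:
            self.entries.move_to_end(fetch_addr)
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        self.entries[fetch_addr] = target


btac = BTAC(capacity=2)
btac.insert_taken_branch(0x100, 0x400)
print(hex(btac.next_fetch_address(0x100)))  # 0x400 (hit)
print(hex(btac.next_fetch_address(0x110)))  # 0x120 (miss)
```

A hardware BTAC would more likely be set-associative with a tag compare; the `OrderedDict` simply makes the LRU ordering explicit.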

BTACs can be used together with a more accurate individual branch direction predictor, such as a branch history table (BHT), also known as a pattern history table (PHT). A conventional BHT may contain a set of saturating predicted-direction counters to produce more accurate taken/not-taken decisions for individual branch instructions. For example, each saturating counter may be a 2-bit counter that assumes one of four states, each assigned a weighted prediction value as follows:

11 - Strongly predicted taken

10 - Weakly predicted taken

01 - Weakly predicted not taken

00 - Strongly predicted not taken
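The four states listed above behave as a saturating counter. A minimal sketch, assuming the conventional update rule (increment on a taken outcome, decrement on a not-taken outcome, saturating at 11 and 00) and reading the counter's upper bit as the prediction; the constant and function names are illustrative.

```python
STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN = 0b00, 0b01, 0b10, 0b11


def predict_taken(counter: int) -> bool:
    # States 10 and 11 predict taken, i.e. the upper bit of the counter.
    return counter >= WEAK_TAKEN


def update_counter(counter: int, taken: bool) -> int:
    # Saturating increment on taken, saturating decrement on not taken.
    if taken:
        return min(counter + 1, STRONG_TAKEN)
    return max(counter - 1, STRONG_NOT_TAKEN)


c = WEAK_NOT_TAKEN
for outcome in (True, True, True, False):
    c = update_counter(c, outcome)
print(format(c, "02b"), predict_taken(c))  # 10 True
```

The hysteresis is the point of the two "weak" states: a single not-taken outcome after a run of taken outcomes moves the counter from 11 to 10 without flipping the prediction.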

The output of a conventional BHT, also referred to as a prediction value, is a taken or not-taken decision that determines whether the target address of the branch instruction or the next sequential address is fetched in the next cycle. The BHT is typically updated with branch outcome information as the outcomes become known.

To increase the accuracy of branch predictions, various other prediction techniques may be implemented that use recent branch history information from other branches as feedback. Those skilled in the art will recognize that current branch behavior is correlated with the history of previously executed branch instructions. For example, the history of previously executed branch instructions can influence how a conditional branch instruction is predicted.

A global history register (GHR), also referred to in the art as a global branch history register or global history shift register, may be used to track the past history of previously executed branch instructions. As stored in the GHR, the branch history provides a view of the sequence of branch instructions encountered along the code path leading up to the currently executing branch instruction, which helps achieve improved prediction results.
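A GHR is commonly modeled as a fixed-width shift register. The sketch below assumes a 4-bit width (chosen only to keep the output readable), encodes taken as 1 and not taken as 0, and shifts the newest outcome in at bit 0; the class name is hypothetical.

```python
class GlobalHistoryRegister:
    """Fixed-width shift register of recent branch outcomes (1 = taken)."""

    def __init__(self, width: int):
        self.width = width
        self.value = 0

    def shift_in(self, taken: bool) -> None:
        mask = (1 << self.width) - 1
        # The oldest outcome falls off the top; the newest enters at bit 0.
        self.value = ((self.value << 1) | int(taken)) & mask


ghr = GlobalHistoryRegister(width=4)
for outcome in (True, False, True, True, False):
    ghr.shift_in(outcome)
print(format(ghr.value, "04b"))  # 0110
```

Only the last four outcomes survive: the first `True` has already been shifted out.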

In some processors, a branch instruction and its associated prediction information are identified only after the instruction decode stage. Typically, the instruction decode stage is a later stage in the instruction execution sequence. After an instruction is decoded and confirmed as a branch instruction, the GHR is loaded with the appropriate branch history information: as the branch history information is identified, it is shifted into the GHR. The output of the GHR is used to identify a prediction value stored in the BHT, which is used to predict the next conditional branch instruction.

In a conventional processor using a GHR, the GHR may not reflect the actual branch history encountered when multiple branch instructions are executed in parallel within a relatively short period of time. In that case, the GHR has not yet been updated with the branch history information from a first branch instruction when a second branch instruction is predicted. As a result, an inaccurate GHR value may be used to identify the BHT entry for the second conditional branch instruction, and indexing the BHT with an inaccurate value can degrade branch prediction accuracy. Had the processor been able to track the branch history information from the first conditional branch instruction, a different value would have been stored in the GHR, and a different BHT entry would have been identified for the second conditional branch instruction.
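The staleness problem can be made concrete with a small example. Assume, purely for illustration, a 4-bit history and a BHT indexed by the history value alone; the helper `shift` and the history values are hypothetical. A second branch predicted before the first one has updated the GHR uses a different index than it would with up-to-date history.

```python
HISTORY_BITS = 4  # assumed history length, illustration only
HISTORY_MASK = (1 << HISTORY_BITS) - 1


def shift(history: int, taken: bool) -> int:
    """Hypothetical helper: shift one branch outcome into a history value."""
    return ((history << 1) | int(taken)) & HISTORY_MASK


ghr = 0b0101                    # history before branch 1
branch1_predicted_taken = True  # branch 1's prediction, known early

# Branch 2 is predicted while branch 1 is still ahead of decode, so a GHR
# that is only updated at decode still holds the old pattern.
stale_index = ghr
up_to_date_index = shift(ghr, branch1_predicted_taken)
print(stale_index, up_to_date_index)  # 5 11
```

Under this indexing, the two values select different BHT counters, so branch 2 may be predicted from an entry trained under a different history.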

Accordingly, there is a need in the industry for a processor that can store and use branch history information in addition to the GHR to achieve more accurate branch predictions. The present invention recognizes this need and discloses a processor that identifies branch instructions early in the processor's execution stages. Using the branch instruction information as input, the processor can steer the selection of prediction values for subsequent conditional branch instructions.

A method of processing branch history information is disclosed. The method identifies branch instructions during a first pipeline stage and loads the branch history information into a first register during the first pipeline stage. The method confirms the branch instructions in a second pipeline stage, and the branch history information is loaded into a second register during the second pipeline stage.

A pipelined processor is disclosed that includes a first register containing branch history information and a second register containing branch history information. The pipelined processor includes a plurality of pipeline stages. The first register is loaded with the branch history information in a first pipeline stage when a branch instruction is identified, and the second register is loaded with the branch history information during a second pipeline stage.

Another method of processing branch history information is disclosed. The method fetches a branch instruction, identifies the branch instruction during a first pipeline stage, and loads the branch history information into a first register during the first pipeline stage. The method confirms the branch instruction in a second pipeline stage, and the branch history information is loaded into a second register during the second pipeline stage.

A more complete understanding of the present invention, as well as further features and advantages of the invention, will be apparent from the following detailed description and the accompanying drawings.

FIG. 1 shows a high-level logic hardware block diagram of a processor employing one embodiment of the present invention.

FIG. 2 shows an exemplary branch history table used by the processor of FIG. 1.

FIG. 3 shows a lower-level logic block diagram of the processor of FIG. 1 employing a working global history register.

FIG. 4 shows a detailed view of the working global history register and the global history register.

FIG. 5 shows an exemplary group of instructions executed by the processor of FIG. 1.

FIG. 6 shows a timing diagram of the exemplary group of instructions of FIG. 5 as they are executed through the various stages of the processor of FIG. 1.

FIG. 7 shows a flow chart illustrating the instruction process flow performed by the processor of FIG. 1 using the working global history register.

The detailed description set forth below in connection with the accompanying drawings is intended as a description of various embodiments of the present invention and is not intended to represent the only embodiments in which the present invention may be practiced. The detailed description includes specific details for the purpose of providing a thorough understanding of the present invention. However, it will be apparent to those skilled in the art that the present invention may be practiced without these specific details. In some instances, well-known structures and components are shown in block diagram form to avoid obscuring the concepts of the present invention. Acronyms and other descriptive terminology may be used merely for convenience and clarity and are not intended to limit the scope of the invention.

FIG. 1 shows a high-level view of a superscalar processor 100 employing the embodiments described herein. The processor 100 has a central processing unit (CPU) 102 coupled to an instruction cache 106 via a dedicated high-speed bus 104. The instruction cache is also coupled to memory 114 via a general-purpose bus 116.

Within the processor 100, an instruction fetch unit (IFU) 122 controls the loading of instructions from the memory 114 into the instruction cache 106. Once the instruction cache 106 is loaded with instructions, the CPU 102 can access them via the high-speed bus 104. The instruction cache 106 may be a separate memory structure, as shown in FIG. 1, or it may be integrated as an internal component of the CPU 102. Whether it is integrated may depend on the size of the instruction cache 106 as well as on the complexity and power dissipation of the CPU 102. A branch target address cache (BTAC) 130, a branch history table (BHT) 140, and two lower pipelines 160 and 170 are also coupled to the IFU 122.

Instructions are fetched from the instruction cache 106 and decoded, and several instructions may be fetched and decoded at a time. Within the instruction cache 106, instructions are grouped into sections known as cache lines. Each cache line may contain multiple instructions as well as associated data. The number of instructions fetched may depend on the required fetch bandwidth as well as on the number of instructions in each cache line. Within the IFU 122, the fetched instructions are analyzed for operation type and data dependencies. After analyzing the instructions, the processor 100 may distribute them from the IFU 122 to lower functional units, or lower pipelines, 160 or 170 for further execution.

The lower pipelines 160 and 170 may contain various execution units (EU) 118, such as arithmetic logic units, floating-point units, store units, and load units. For example, an EU 118 such as an arithmetic logic unit can perform a wide range of arithmetic functions, including integer addition, subtraction, simple multiplication, bitwise logical operations (e.g., AND, NOT, OR, XOR), and bit shifting. In addition, the lower pipelines 160 and 170 may have a resolution stage (not shown) in which the actual outcome of a conditional branch instruction is identified. Once the actual outcome of a branch instruction is identified, the processor 100 can compare it with the predicted outcome; if they do not match, a misprediction has occurred.

Those skilled in the art will recognize that the BTAC 130 may be similar to a branch target buffer (BTB) or a branch target instruction cache (BTIC). A BTB or BTIC stores both the address of a branch and the instruction data (or opcodes) of the branch target. For ease of explanation, the BTAC 130 is used in describing the various embodiments of the present invention. Other embodiments of the invention may alternatively include a BTB or BTIC instead of the BTAC 130.

The first time a branch instruction is executed, there is no entry for it in the BTAC 130, and a BTAC miss occurs. After the branch instruction has finished executing, the BTAC 130 may subsequently be updated to indicate the target address of that particular conditional branch instruction as well as the processor mode (for example, Arm versus Thumb operation in the Advanced RISC Machine (ARM) processor architecture). Any time the branch instruction is fetched again, the information stored in the BTAC 130 will be fetched in the next processor cycle, even though the fetched branch instruction has not yet been fully decoded.

A BTAC hit (i.e., when a fetch group address matches an address in the BTAC 130) may occur for either a conditional or an unconditional branch instruction. This is because the BTAC 130 can store information relating to both conditional and unconditional branch instructions. In the case of a BTAC hit for an unconditional branch instruction, the predicted target address and the predicted processor mode may be stored, along with the fact that the branch instruction is unconditional. When an unconditional branch instruction address is stored in an entry of the BTAC 130, the entry will indicate a taken branch direction.

FIG. 2 shows a more detailed view of an exemplary branch history table (BHT) 140 used by the processor 100. The BHT 140 may be organized as 2^m lines 202, indexed using an address with m address bits. In one embodiment, nine address bits are used, yielding a BHT 140 with 512 lines. Within each line 202 there are 2^n counters 204, where n is the number of bits used to select the appropriate counter. For example, three address bits may be used to select a counter 204, yielding a BHT 140 with eight counters 204 per line 202. In one exemplary embodiment, fetch group address bits 12 through 4 may be used to select a line 202 of the BHT 140, and fetch group address bits 3 through 1 may be used to select a particular counter 204.
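The field extraction described for FIG. 2 can be sketched as bit masking. This assumes m = 9 and n = 3 as in the exemplary embodiment, and initializes every counter to weakly not taken, which is an assumption rather than something the document specifies; the names are illustrative.

```python
LINE_BITS = 9     # m: fetch group address bits 12..4 select one of 512 lines
COUNTER_BITS = 3  # n: fetch group address bits 3..1 select one of 8 counters


def line_select(fetch_addr: int) -> int:
    return (fetch_addr >> 4) & ((1 << LINE_BITS) - 1)


def counter_select(fetch_addr: int) -> int:
    return (fetch_addr >> 1) & ((1 << COUNTER_BITS) - 1)


class BranchHistoryTable:
    def __init__(self) -> None:
        # 2**m lines, each holding 2**n two-bit counters (assumed start: 01).
        self.lines = [[0b01] * (1 << COUNTER_BITS) for _ in range(1 << LINE_BITS)]

    def counter(self, line: int, slot: int) -> int:
        return self.lines[line][slot]


addr = 0x1A36
print(line_select(addr), counter_select(addr))  # 419 3
```

For 0x1A36, bits 12 through 4 are `1 1010 0011` (419) and bits 3 through 1 are `011` (3); the table holds 512 × 8 = 4096 counters in total.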

The processor 100 can identify branch instructions earlier in the instruction execution process, ahead of the instruction decode stage. When branch instructions are identified earlier, the associated branch history information, such as the prediction value (for a conditional branch instruction) or the taken branch direction (for an unconditional branch instruction), can be identified at the same time. The processor 100 may use a working global history register (WGHR) to receive and process branch history information earlier in the instruction execution process. For example, the WGHR may store the prediction values of conditional branch instructions as well as the branch directions of unconditional branch instructions. Alternatively, the WGHR may store only the prediction values of conditional branch instructions. The output of the WGHR may be used to index the corresponding entry in the BHT 140 for the next conditional branch instruction.

FIG. 4 shows a lower-level logic block diagram 400 of the processor 100 that includes a working global history register (WGHR) 416. An upper pipe 450 is part of the lower-level block diagram 400, and the top of the upper pipe is coupled to fetch logic 402. The upper pipe 450 contains four instruction execution stages: an instruction cache 1 stage (IC1) 404, an instruction cache 2 stage (IC2) 406, an instruction data alignment stage (IDA) 408, and a decode stage (DCD) 410. Note that pipe stages may be added to or removed from the upper pipe 450 without limiting the scope of the present invention. The fetch logic 402, along with the upper pipe 450, the WGHR 416, a global history register (GHR) 414, branch correction logic (BCL) 440, a select mux 422, and address hashing logic 420, may also be located within the IFU 122.

When the processor 100 begins executing instructions, the fetch logic 402 determines which instructions are fetched during the IC1 stage 404. To retrieve the instructions, the fetch logic 402 sends a fetch group address to the instruction cache 106. If the fetch group address is found in the instruction cache 106 (an instruction cache hit), the instructions are read from the hit cache line of the instruction cache 106 during the IC2 stage 406.

In parallel, during the IC1 stage 404, the processor 100 sends the fetch group address to the BTAC 130. If the processor 100 encounters a BTAC hit, the information stored in the BTAC for that fetch group address is received during the IC2 stage 406. As mentioned previously, the information stored in the BTAC 130 may include branch information such as the branch target and processor mode, as well as the taken branch direction (in the case of an unconditional branch instruction).

Also during the IC1 stage 404, the fetch logic 402 sends the fetch group address to the address hashing logic 420. Within the address hashing logic 420, bits 12 through 4 of the fetch group address are exclusive-ORed (XORed) with the output of the select mux 422. The result of the address hashing logic 420 (i.e., of the XOR function) provides the address index into the BHT 140. As mentioned previously, bits 3 through 1 of the fetch group address may provide the select bits used to choose the appropriate counter 204.
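The hashing step can be sketched in a few lines. This assumes the select mux 422 delivers a 9-bit history value, so that it lines up with address bits 12 through 4; the function name and the example history values are hypothetical. Only the line index is hashed, while bits 3 through 1 select the counter directly.

```python
def bht_index(fetch_addr: int, history: int) -> tuple[int, int]:
    """Return (line, counter) for a BHT lookup.

    Assumes a 9-bit history value from the select mux: address bits 12..4
    are XORed with it, and bits 3..1 pick the counter without hashing.
    """
    line = ((fetch_addr >> 4) & 0x1FF) ^ (history & 0x1FF)
    counter = (fetch_addr >> 1) & 0x7
    return line, counter


print(bht_index(0x1A36, 0b000000000))  # (419, 3)
print(bht_index(0x1A36, 0b110100011))  # (0, 3)
```

The same fetch group address maps to different BHT lines under different histories, which is what lets the table separate one branch's behavior along different code paths, and also why a stale history selects the wrong line.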

During the IC2 stage 406, the processor 100 reads the results of sending the instruction fetch group address to the instruction cache 106, the BTAC 130, and the BHT 140. In the IC2 stage 406, the processor 100 determines whether a BTAC hit has occurred. When a BTAC hit is confirmed during the IC2 stage 406, the processor 100 also determines whether the branch is a conditional or an unconditional branch instruction. The prediction value from the BHT 140 is also received and stored in the IC2 stage 406.

Because each cache line of the instruction cache 106 may contain multiple instructions, individual instructions may need to be separated from the cache line. Likewise, data may be interleaved with the instructions in a cache line. The information from a cache line may therefore need to be formatted and aligned so that the instructions can be properly analyzed and executed. Aligning and formatting the instructions into individual executable instructions takes place during the IDA stage 408.

After the instructions are processed in IDA stage 408, they pass to decode (DCD) stage 410. During DCD stage 410, each instruction is analyzed to determine its type and any additional information or resources needed for further processing. Processor 100 may hold the instruction in DCD stage 410 or pass it to lower pipeline 160 or 170 for further execution. The choice depends on the instruction type or the current instruction load. In DCD stage 410, processor 100 confirms that the instruction is a conditional branch instruction. It also identifies the instruction's prediction value from BHT 140, which was read during IC2 stage 406. The accuracy of the prediction is checked later, during an execution stage of lower pipeline 160 or 170. Until the branch prediction is found to be incorrect (i.e., a misprediction), processor 100 assumes the prediction value is correct and continues fetching instructions based on it.

Upper pipe 450 is coupled to working global history register (WGHR) 416. WGHR 416 allows processor 100 to store and process branch history information for branch instructions identified before DCD stage 410. In one embodiment, WGHR 416 may be loaded with the prediction value from BHT 140 for a conditional branch instruction when a BTAC hit occurs. As described above, a BTAC hit means that the instruction being fetched is a branch instruction with associated branch history information. That information is a prediction value for a conditional branch instruction or a taken indication for an unconditional branch instruction. Processor 100 can therefore use the branch history information for subsequent branch predictions earlier than if it waited for the branch instruction to be confirmed in DCD stage 410. In other words, the branch history information it uses is more current. The output of WGHR 416 is sent, through select mux 422, to address hashing logic circuit 420 to determine the address index for the next BHT 140 entry.

When branch history information becomes available depends on two things: how quickly it can be retrieved from BHT 140 and how quickly a BTAC hit can be confirmed. In some processor designs, the branch history information and the BTAC hit may be received during IC2 stage 406. In other designs, they may be received during IDA stage 408. Some designs incorporate stages other than those described above. In those designs, the branch history information and the BTAC hit may become available during any of those stages that precede the decode stage.

In one embodiment, branch history information for conditional branch instructions is shifted into WGHR 416 during IC2 stage 406, when a BTAC hit occurs. In another embodiment, branch history information for both conditional and unconditional branch instructions is shifted into WGHR 416. In a further embodiment, WGHR 416 may be updated with branch history information during IDA stage 408. This may happen when the prediction value stored in BHT 140, or the BTAC hit information, is not available until IDA stage 408.

Select mux 422 is configured to receive the output of WGHR 416. In one embodiment, the output of WGHR 416 is a 9-bit value holding the branch history of the last nine branch instructions processed by processor 100. The output of select mux 422 is an input to address hashing logic circuit 420, which uses it to index into BHT 140 for the next conditional branch instruction.
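A 9-bit history register behaves as a shift register. The sketch below assumes the newest outcome enters at the MSB and the oldest bit drops off the LSB. This convention is inferred from the "101011111" (MSB => LSB) example later in this section and from FIG. 6, where instruction A's taken prediction appears as "100". The helper names are hypothetical.

```python
HISTORY_BITS = 9
HISTORY_MASK = (1 << HISTORY_BITS) - 1


def shift_in(history: int, taken: bool) -> int:
    """Insert the newest outcome at the MSB; the oldest (LSB) bit is dropped."""
    return ((int(taken) << (HISTORY_BITS - 1)) | (history >> 1)) & HISTORY_MASK


def as_bits(history: int) -> str:
    """Render the register MSB-first, as in the description's examples."""
    return format(history, f"0{HISTORY_BITS}b")


h = 0
h = shift_in(h, True)    # as_bits(h) == "100000000"
h = shift_in(h, True)    # as_bits(h) == "110000000"
```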

GHR 414 operates much like WGHR 416, except that GHR 414 is loaded with branch history information during DCD stage 410. Once a branch instruction has passed through DCD stage 410, the contents of GHR 414 mirror those of WGHR 416. Depending on circumstances, the output of GHR 414 may be used to index the prediction value.

The output of GHR 414 is coupled to select mux 422. Suppose a BTAC miss occurs and the instruction is then confirmed during DCD stage 410 as a taken branch instruction. In that case, select mux 422 is directed to select the output of GHR 414 for address hashing logic circuit 420 to use in indexing. GHR 414 is used because the BTAC miss means WGHR 416 does not yet hold branch history information for the taken branch. Alternatively, address hashing logic circuit 420 may use the output of GHR 414 whenever a BTAC miss occurs. The reason is that subsequently fetched branch instructions may update WGHR 416 before BHT 140 is indexed for the current branch instruction. In that case WGHR 416 does not hold the proper value for the current branch instruction. If address hashing logic circuit 420 used it, an incorrect entry in BHT 140 could be indexed.
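The selection rule for select mux 422 can be sketched as a small priority function. This is an illustrative model, not the patent's circuit, and the function and argument names are hypothetical. Placing the BCL 440 input ahead of GHR 414 is an assumption based on the misprediction handling described below. The optional use of GHR 414 on every BTAC miss is not modelled.

```python
from typing import Optional


def select_mux_422(wghr: int, ghr: int, bcl: Optional[int],
                   taken_at_dcd_after_btac_miss: bool) -> int:
    """Choose the history value passed to address hashing logic circuit 420.

    bcl is the corrected history from BCL 440, or None when no misprediction
    is being repaired this cycle.
    """
    if bcl is not None:                  # misprediction: corrected history wins
        return bcl
    if taken_at_dcd_after_btac_miss:     # WGHR never saw this branch
        return ghr
    return wghr                          # normal case: the speculative history
```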

The output of GHR 414 is also coupled to branch correction logic circuit (BCL) 440. When a misprediction occurs, BCL 440 uses GHR 414 to provide a "true" copy of the branch history information for recovery. It then restores the branch history information in both GHR 414 and WGHR 416. As mentioned above, a misprediction occurs when a branch instruction reaches the resolution stage and its actual outcome does not match the predicted outcome.

When a misprediction occurs, BCL 440 sends information to fetch logic circuit 402. This information directs fetch logic circuit 402 to flush the instructions fetched on the basis of the mispredicted conditional branch instruction. For efficiency, BCL 440 may restore GHR 414 and WGHR 416 with the corrected branch history information while also providing that information to select mux 422. Processor 100 may then select the output of BCL 440 through select mux 422. That output goes to address hashing logic circuit 420, which uses it to index the appropriate counter 204.

When processor 100 encounters a misprediction, BCL 440 restores GHR 414 and WGHR 416 to their proper values. In one embodiment, BCL 440 may take a snapshot of GHR 414 after GHR 414 is loaded with the prediction value for a conditional branch instruction. BCL 440 may then invert the most recent prediction value in that snapshot (e.g., the MSB). Taking the opposite of the prediction gives the corrected value that GHR 414 and WGHR 416 should hold if a misprediction occurs. For example, suppose a conditional branch instruction and its prediction value are identified during DCD stage 410. GHR 414 and BCL 440 are then loaded with the value "101011111" (MSB => LSB). BCL 440 may flip the MSB, which corresponds to the conditional branch instruction, and store the corrected value "001011111" linked to that instruction. If the conditional branch instruction turns out to have been mispredicted, the corrected value is ready to be sent to GHR 414, WGHR 416 and select mux 422.
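The precomputed repair value can be sketched directly from the worked example. The class below is a toy model. Identifying in-flight branches by a string tag, and the class and method names, are hypothetical, because the patent does not say how BCL 440 associates a stored value with its branch.

```python
from typing import Dict, Tuple

HISTORY_BITS = 9
MSB = 1 << (HISTORY_BITS - 1)


def corrected_history(ghr_snapshot: int) -> int:
    """Flip the MSB (the newest prediction) of a post-update GHR snapshot."""
    return ghr_snapshot ^ MSB


class BranchCorrectionLogic:
    """Toy model of BCL 440: one precomputed repair value per branch in flight."""

    def __init__(self) -> None:
        self._repairs: Dict[str, int] = {}  # hypothetical branch tag -> fix

    def record(self, tag: str, ghr_after_update: int) -> None:
        self._repairs[tag] = corrected_history(ghr_after_update)

    def recover(self, tag: str) -> Tuple[int, int, int]:
        """Return the values for GHR 414, WGHR 416 and select mux 422."""
        fix = self._repairs.pop(tag)
        return fix, fix, fix


bcl = BranchCorrectionLogic()
bcl.record("A", int("101011111", 2))
ghr, wghr, mux = bcl.recover("A")   # format(ghr, "09b") == "001011111"
```

Computing the corrected value when the branch is decoded, rather than when it resolves, leaves only a register write on the misprediction path.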

FIG. 5 shows a detailed view 500 of WGHR 416, GHR 414 and BCL 440. Within detailed view 500, WGHR select mux 502 receives branch history information from IC2 stage 406 and DCD stage 410. It also receives corrected branch history information from BCL 440. GHR select mux 504 receives branch history information from DCD stage 410 and corrected branch history information from BCL 440.

WGHR select mux 502 selects which input is used to load branch history information into WGHR 416. When a misprediction occurs, the input from BCL 440 takes priority over information sent from IC2 stage 406 or DCD stage 410. The reason is that branch history information arriving after the misprediction may belong to conditional branch instructions fetched along the mispredicted path. The branch history information passed on by IC2 stage 406 or DCD stage 410 may therefore also be inaccurate.

If no misprediction occurs, the input selected by WGHR select mux 502 may be determined by the following cases, listed from highest to lowest priority:

a) A branch instruction may return a BTAC miss during IC2 stage 406 but be predicted taken after it is decoded in DCD stage 410. In that case, the branch history value identified during DCD stage 410 is shifted into WGHR 416. DCD stage 410 takes priority here because any instruction fetched after the predicted-taken branch instruction must be flushed. IC2 stage 406 may have identified branch history information for a subsequent branch instruction that is ready to be written to WGHR 416 in the same processor cycle. That information is therefore discarded.

b) If DCD stage 410 is not handling a branch instruction associated with a BTAC miss, IC2 stage 406 has the next highest priority. If a BTAC hit occurs for the branch instruction, the branch history information identified during IC2 stage 406 is shifted into WGHR 416.

c) A branch instruction may have been identified earlier as a BTAC hit, with its branch history information already loaded according to case (b). In that case, WGHR 416 is written again from DCD stage 410. Likewise, if a conditional branch instruction is a BTAC miss and is predicted not taken, WGHR 416 is written with that branch history information. This write ensures that GHR 414 and WGHR 416 are synchronized after the instruction passes through DCD stage 410.
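The priority order for WGHR select mux 502 can be sketched as a first-match chain. This is a hypothetical software model. Each source is represented by the history value it offers in a given cycle, or `None` when it has nothing to write. The argument names paraphrase cases (a) to (c) and are not signal names from the patent.

```python
from typing import Optional


def wghr_select_mux_502(current: int,
                        bcl: Optional[int] = None,
                        dcd_btac_miss_taken: Optional[int] = None,
                        ic2_btac_hit: Optional[int] = None,
                        dcd_rewrite: Optional[int] = None) -> int:
    """Return the value written into WGHR 416 this cycle.

    Sources are tried in priority order: BCL 440 (misprediction),
    case (a), case (b), case (c). None means "nothing to write".
    """
    for candidate in (bcl, dcd_btac_miss_taken, ic2_btac_hit, dcd_rewrite):
        if candidate is not None:
            return candidate
    return current  # no source is writing: WGHR holds its value
```

In this sketch, case (a) automatically discards a same-cycle IC2 value, because the chain stops at the first available source.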

GHR select mux 504 selects the appropriate input for updating GHR 414. As with WGHR select mux 502, GHR select mux 504 gives the input from BCL 440 the highest priority, for the same reasons described above. If no misprediction occurs, GHR 414 is updated with the branch history information identified during DCD stage 410 for the particular branch instruction.
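GHR select mux 504 has only two write sources, so its model is shorter. The sketch below is hypothetical. It uses the same convention as above: a source that has nothing to write in a cycle is passed as `None`.

```python
from typing import Optional


def ghr_select_mux_504(current: int, bcl: Optional[int] = None,
                       dcd: Optional[int] = None) -> int:
    """Return the value written into GHR 414 this cycle (BCL 440 first)."""
    if bcl is not None:      # misprediction repair overrides everything
        return bcl
    if dcd is not None:      # normal update from DCD stage 410
        return dcd
    return current           # no branch decoded: GHR holds its value
```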

FIG. 6 shows a timing diagram 600 of an exemplary group of instructions 500 moving through upper pipeline 450. The exemplary group contains several branch instructions. X-axis 602 of FIG. 6 represents processor cycles. Y-axis 604 shows the execution stage in upper pipe 450 that each instruction occupies, along with the contents of GHR 414 and WGHR 416. The contents of GHR 414 and WGHR 416 are written during one processor cycle and latched at the start of the next processor cycle. The timing diagram shows these latched contents. For ease of explanation, only the three most significant bits of GHR 414 and WGHR 416 are shown. As instructions execute, they move down Y-axis 604.

In processor cycle 1, fetch logic circuit 402 sends the fetch group address for instruction A to instruction cache 106, BTAC 130 and address hashing logic circuit 420. Timing diagram 600 shows this as instruction A entering IC1 stage 404. Also in processor cycle 1, the three most significant bits of GHR 414 and WGHR 416 are all 0. This indicates that none of the last three executed branch instructions was taken.

In processor cycle 2, the results of sending the fetch group address to instruction cache 106, BTAC 130 and BHT 140 are received. The timing diagram shows this as instruction A entering IC2 stage 406. Instruction cache 106 holds multiple instructions per line, so instruction A+4 is shown being retrieved together with instruction A in IC2 stage 406. Logic in IC2 stage 406 analyzes the information received from BTAC 130 and BHT 140. During IC2 stage 406, processor 100 determines from the BTAC hit that instruction A is a conditional branch instruction. It also receives the prediction value returned from BHT 140. In this example, instruction A is predicted taken. Its actual BHT 140 entry may be either strongly taken (11) or weakly taken (10). At the end of processor cycle 2, processor 100 loads a "1" into the MSB of WGHR 416 to represent the prediction value for conditional branch instruction A. Because instruction A is predicted taken, the next sequential instruction A+4 will not be the next instruction executed. Instruction A+4 is therefore flushed after instruction A passes through IC2 stage 406. As shown in timing diagram 600, the value "100" is latched into WGHR 416 at the start of processor cycle 3.

During processor cycle 3, instruction A enters IDA stage 408, where it is formatted and aligned in preparation for DCD stage 410. At the same time, the fetch group address for instruction B is sent to instruction cache 106, BTAC 130 and BHT 140 in IC1 stage 404.

In processor cycle 4, instruction A enters DCD stage 410. The results of the fetch request for instructions B and B+4 are received. The fetch group address for instruction B+8 is sent to instruction cache 106, BTAC 130 and BHT 140 (IC1 stage 404). Select mux 422 selects the contents of WGHR 416 ("100"), and address hashing logic circuit 420 uses them to index the BHT 140 entry for instruction B+8. With instruction A in DCD stage 410, processor 100 confirms that A is a conditional branch instruction, and its prediction value ("1") is shifted into GHR 414. Processor 100 does not see this updated GHR 414 value until GHR 414 is latched at the start of processor cycle 5. At the end of processor cycle 4, instruction A leaves upper pipe 450 and is directed to lower pipe 160 or 170 for further execution.

Consider a conventional processor that runs the exemplary group of instructions 500 using only a GHR, without WGHR 416, to store branch history information. In such a processor, the prediction value returned from the BHT for instruction B+8 may be inaccurate. The reason is that the address hashing logic would use the GHR value from processor cycle 4 ("000") to determine the BHT entry for instruction B+8. That value does not reflect the actual branch history the processor has encountered, because it does not yet include the branch history information for instruction A. Now suppose the same instruction sequence is executed again, but this time the fetch of instruction B+8 is delayed. The GHR contents would then be updated by the time the address hashing logic used them to access the BHT, so a different BHT entry could be accessed. A processor that relies solely on a GHR could therefore access two different BHT entries for the same conditional branch instruction under the same instruction execution sequence.
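A short calculation makes the aliasing concrete. The sketch below reuses the gshare-style hash modelled earlier in this section, with the index assumed to be bits 12-4 of the address XORed with a 9-bit history. The address of B+8 is hypothetical. The lower six history bits are assumed to be zero, because FIG. 6 shows only the three MSBs.

```python
def bht_index(fetch_group_addr: int, history: int) -> int:
    """Bits 12-4 of the address XOR a 9-bit history (gshare-style)."""
    return ((fetch_group_addr >> 4) & 0x1FF) ^ (history & 0x1FF)


ADDR_B8 = 0x2040            # hypothetical fetch group address of B+8

ghr_cycle4 = 0b000000000    # GHR-only design: A's "1" is not latched yet
ghr_cycle5 = 0b100000000    # same design if the fetch of B+8 slips a cycle
wghr_cycle4 = 0b100000000   # WGHR already holds A's prediction

on_time = bht_index(ADDR_B8, ghr_cycle4)     # 4
delayed = bht_index(ADDR_B8, ghr_cycle5)     # 260
with_wghr = bht_index(ADDR_B8, wghr_cycle4)  # 260
```

In the GHR-only design, the same branch lands in row 4 or row 260 depending on fetch timing. The WGHR value yields row 260 whether or not the fetch is delayed.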

In one embodiment, when instruction A is in DCD stage 410, WGHR 416 is rewritten with the prediction value at the same time GHR 414 is loaded. Writing both registers with the same prediction value at the same time synchronizes them for instruction A. Two conditional branch instructions that are both predicted taken rarely follow one another immediately. This synchronization is therefore unlikely to lose any branch history information.

In processor cycle 5, instructions B+8 and B+12 enter IC2 stage 406 while instructions B and B+4 enter IDA stage 408. Also in processor cycle 5, the fetch group address for instructions B+16 and B+20 is sent to instruction cache 106, BTAC 130 and BHT 140. In IC2 stage 406, instruction B+8 returns a BTAC hit. From this hit, processor 100 determines that instruction B+8 is a conditional branch instruction. The prediction value returned for it from BHT 140 during IC2 stage 406 is shifted into WGHR 416. In this example, instruction B+8 is also predicted taken. Its actual BHT 140 entry may be either strongly taken (11) or weakly taken (10). After instruction B+8 leaves IC2 stage 406, the target address received from the BTAC hit is directed to fetch logic circuit 402. This address points to instruction C. Because B+8 is a predicted-taken branch, fetch logic circuit 402 then flushes instructions B+12, B+16 and B+20. The contents of WGHR 416 are updated with the taken prediction value ("1"). The result is latched at the start of processor cycle 6, as shown in timing diagram 600.

In processor cycle 6, instructions B and B+4 enter DCD stage 410 while instruction B+8 enters IDA stage 408. Also during processor cycle 6, the fetch group address for instruction C is sent to instruction cache 106, BTAC 130 and BHT 140 (IC1 stage 404). At the end of processor cycle 6, instructions B and B+4 leave upper pipe 450 and are directed to lower pipeline 160 or 170 for further execution.

In processor cycle 7, instruction B+8 is processed in DCD stage 410, where it is confirmed as a conditional branch instruction and its prediction value is identified. During processor cycle 7, that prediction value is shifted into GHR 414 and reloaded into WGHR 416. Instructions C and C+4 are returned from instruction cache 106 in IC2 stage 406. At the end of processor cycle 7, instruction B+8 leaves upper pipe 450 and is directed to lower pipeline 160 or 170 for further execution.

In code segments where branch instructions execute very close to one another (relative to the depth of the pipeline), this arrangement ensures that branch predictions use the most recent branch history information.

During processor cycle 8, the value of GHR 414 is latched along with WGHR 416. Instructions C and C+4 are processed in IDA stage 408, and any sequential instructions following C and C+4 may be fetched and executed.
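The register contents in the FIG. 6 walkthrough can be replayed with a few lines of code. This is an illustration, not a pipeline model. The cycles in which instructions A and B+8 reach IC2 and DCD are hard-coded from the text above. Only the three MSBs are tracked. Each write takes effect at the next cycle boundary.

```python
BITS = 3  # FIG. 6 shows only the three most significant bits


def shift_in(history: int, taken: bool) -> int:
    """Newest outcome enters at the MSB; the LSB falls off."""
    return (int(taken) << (BITS - 1)) | (history >> 1)


def replay_fig6() -> dict:
    """Return {cycle: (GHR, WGHR)} as latched at the start of each cycle."""
    ghr = wghr = 0
    trace = {}
    for cycle in range(1, 9):
        trace[cycle] = (format(ghr, "03b"), format(wghr, "03b"))
        next_ghr, next_wghr = ghr, wghr
        if cycle in (2, 5):              # A, then B+8: BTAC hit in IC2, predicted taken
            next_wghr = shift_in(wghr, True)
        if cycle in (4, 7):              # A, then B+8: confirmed in DCD
            next_ghr = shift_in(ghr, True)
            next_wghr = next_ghr         # WGHR rewritten with the same value
        ghr, wghr = next_ghr, next_wghr  # latch at the cycle boundary
    return trace
```

The trace shows WGHR at "100" from cycle 3, while GHR reaches that value only in cycle 5. WGHR moves to "110" in cycle 6, and both registers read "110" in cycle 8.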

FIG. 7 is a flow chart of an instruction process flow 700 that processor 100 follows when executing instructions using working global history register (WGHR) 416. Instruction process flow 700 begins at block 702. It proceeds to block 704, where fetch logic circuit 402 sends the fetch group address to BTAC 130 and to address hashing logic circuit 420 (for indexing into BHT 140). As described above, the fetch group address may be sent during IC1 stage 404 of processor 100. Also at block 704, the results of searching BTAC 130 are returned during IC2 stage 406. The search determines whether the fetched instruction is a branch instruction. From block 704, instruction process flow 700 proceeds to decision block 706, where processor 100 determines whether a BTAC hit has occurred. This determination may also take place during IC2 stage 406. As described above, a BTAC hit may occur for a conditional branch instruction or for a taken unconditional branch instruction. If there is no BTAC hit (i.e., a BTAC miss), instruction process flow 700 proceeds directly to block 712.

If there is a BTAC hit, instruction process flow 700 proceeds to block 710. There, WGHR 416 is updated by shifting in the prediction value retrieved from BHT 140. For example, a "1" is shifted into WGHR 416 if the branch instruction is predicted taken, and a "0" if it is predicted not taken. Depending on the implementation, the prediction value may be returned during any processor execution stage before the decode stage. In the embodiment described above, WGHR 416 is updated during IC2 stage 406.

Instruction process flow 700 then proceeds to block 712, where the instruction passes through the decode stage (e.g., DCD stage 410). During the decode stage, at block 712, the instruction may be confirmed as a branch instruction. After the instruction has been processed in the decode stage, instruction process flow 700 proceeds to decision block 714. If the instruction is not a branch instruction, instruction process flow 700 ends at block 720.

If processor 100 confirms at decision block 714 that the instruction is a branch instruction, instruction process flow 700 proceeds to block 716. At block 716, WGHR 416 and GHR 414 are updated with the appropriate branch history information, and instruction process flow 700 ends at block 720.
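Flow 700 can be sketched for a single instruction in isolation. The function and argument names are hypothetical. The sketch ignores other branches in flight and misprediction recovery. It also assumes that the "appropriate branch history information" at block 716 means shifting the prediction into GHR 414 and copying the result into WGHR 416. That assumption follows the synchronization embodiment described above.

```python
from typing import Tuple

HISTORY_BITS = 9
HISTORY_MASK = (1 << HISTORY_BITS) - 1


def shift_in(history: int, taken: bool) -> int:
    """Newest outcome enters at the MSB; the LSB falls off."""
    return ((int(taken) << (HISTORY_BITS - 1)) | (history >> 1)) & HISTORY_MASK


def flow_700(is_branch: bool, btac_hit: bool, predicted_taken: bool,
             ghr: int, wghr: int) -> Tuple[int, int]:
    """Follow blocks 702-720 for one instruction; return (GHR, WGHR)."""
    # Blocks 704/706: address sent in IC1, BTAC result checked in IC2.
    if btac_hit:
        # Block 710: speculative WGHR update before decode.
        wghr = shift_in(wghr, predicted_taken)
    # Block 712: decode; block 714: is the instruction a branch?
    if is_branch:
        # Block 716: update GHR, then resynchronise WGHR with it.
        ghr = shift_in(ghr, predicted_taken)
        wghr = ghr
    return ghr, wghr  # block 720: end
```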

The various illustrative logical blocks, modules, circuits, elements, and/or components described in connection with the embodiments disclosed herein may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

Although specific embodiments have been illustrated and described herein, those of ordinary skill in the art will appreciate that any arrangement calculated to achieve the same purpose may be substituted for the specific embodiments shown, and that the invention has other applications in other environments. This application is intended to cover any adaptations or variations of the present invention. The following claims are in no way intended to limit the scope of the invention to the specific embodiments described herein.

Claims

A method of processing branch history information, the method comprising:

Identifying a branch instruction at a first pipeline stage;

Loading the branch history information into a first register during the first pipeline stage;

Identifying the branch instruction at a second pipeline stage; and

Loading the branch history information into a second register during the second pipeline stage.

The method of claim 1,

Wherein identifying the branch instruction occurs when a branch target address cache (BTAC) hit is received.

The method of claim 1,

Wherein identifying the branch instruction occurs when a branch target instruction cache (BTIC) hit is received.

The method of claim 1,

Wherein the first pipeline stage is an instruction cache stage.

The method of claim 1,

Wherein the first register and the second register are shift registers.

The method of claim 5,

Wherein the first register and the second register are 9-bit shift registers.

The method of claim 1,

Wherein the first and second registers store branch history information for conditional branch instructions.

The method of claim 1,

Wherein the first and second registers store branch history information for conditional branch instructions and unconditional branch instructions.

The method of claim 1,

Wherein the second pipeline stage is a decoding stage.

A pipeline processor, comprising:

A first register containing branch history information;

A second register containing branch history information; and

A plurality of pipeline stages,

Wherein, when a branch instruction is identified, the branch history information is loaded into the first register during a first pipeline stage, and the branch history information is loaded into the second register during a second pipeline stage.

The pipeline processor of claim 10,

Wherein the branch instruction is identified when a branch target address cache (BTAC) hit occurs.

The pipeline processor of claim 10,

Wherein the branch instruction is identified when a branch target instruction cache (BTIC) hit occurs.

The pipeline processor of claim 10,

Wherein the first pipeline stage is an instruction cache stage.

The pipeline processor of claim 10,

Wherein the second pipeline stage is an instruction decoding stage.

The pipeline processor of claim 10,

Wherein the branch history information further includes branch history information for conditional branch instructions.

The pipeline processor of claim 10,

Wherein the branch history information further comprises branch history information for conditional branch instructions and unconditional branch instructions.

The pipeline processor of claim 10,

Wherein the first register and the second register are shift registers.

The pipeline processor of claim 10,

Wherein the second register is used to provide input to a branch correction logic circuit.

A method of processing branch history information, the method comprising:

Fetching a branch instruction;

Identifying the branch instruction at a first pipeline stage;

Loading the branch history information into a first register during the first pipeline stage;

Identifying the branch instruction at a second pipeline stage; and

Loading the branch history information into a second register during the second pipeline stage.

The method of claim 19,

Wherein the first pipeline stage is an instruction cache stage.

The method of claim 19,

Wherein the second pipeline stage is a decoding stage.