KR101086457B1

KR101086457B1 - Processor system having low power trace cache and upcoming instruction set predictor

Info

Publication number: KR101086457B1
Application number: KR1020090132139A
Authority: KR
Inventors: 김철홍; 심성훈; 최홍준
Original assignee: 전남대학교산학협력단
Priority date: 2009-12-28
Filing date: 2009-12-28
Publication date: 2011-11-25
Also published as: KR20110075638A; WO2011081300A3; WO2011081300A2

Abstract

The present invention relates to a processor system, and more specifically to a processor system that includes a branch predictor. The system comprises a low-power trace cache that stores a plurality of instruction sets, each set consisting of instructions for which no branch is predicted to occur, and an instruction set predictor that predicts the next instruction set to be executed and, while one instruction set is being fetched, blocks the processor core from accessing the branch predictor and the main instruction cache. This reduces the power consumed by the operation of the branch predictor.

Processor system, processor core, branch predictor, main instruction cache

Description

Processor system having low power trace cache and upcoming instruction set predictor

In general, a processor system includes a processor core that requests and fetches instructions, and a main instruction cache that stores frequently used instructions from among the instructions stored in main memory.

The processor core is also called a central processing unit (CPU) and serves as the central processing unit of various electronic and information devices such as computers and mobile phones.

FIG. 1 is a diagram illustrating a conventional processor system.

Referring to FIG. 1, the conventional processor system 10 includes a processor core 11, a branch predictor 12, and a main instruction cache 13.

As described above, the processor core 11 fetches instructions from the main instruction cache 13 to control external devices, and the main instruction cache 13 stores frequently used instructions.

The instructions stored in the main instruction cache 13 are classified into non-branch instructions and branch instructions.

The branch predictor 12 is provided to reduce misses on instructions fetched from the main instruction cache and to raise the success rate (hit rate) of predicting the next instruction to be executed.

When a branch instruction is fetched, the branch predictor 12 decides whether to take the branch and fetch from the non-sequential branch target address, or not to take the branch and fetch from the sequential next address.

The branch predictor 12 includes a pattern history table (PHT) 12b, which stores whether a given branch instruction branched in the past, and a branch target buffer (BTB) 12a, which stores the address of the branch target.

The process by which the processor core 11 fetches instructions using the branch predictor 12 is, briefly, as follows. First, when the processor core 11 issues an instruction request, the main instruction cache 13 is accessed and the instruction is fetched. At the same time, the processor core 11 accesses the branch predictor 12 and requests the address of the next instruction to be executed.

Next, the branch predictor 12 uses the address of the currently requested instruction to check, through the pattern history table 12b, whether a branch was taken after that instruction was previously executed. If a branch was taken, it returns the branch target address stored in the branch target buffer 12a.

The processor core 11 then fetches and executes the instruction stored at the branch target address.
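The conventional fetch flow described above can be sketched roughly as follows. This is a minimal illustration, not the patent's circuit: the class and method names are hypothetical, the tables are modeled as plain dictionaries, and instructions are assumed to occupy one address each so that the sequential next address is `addr + 1`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConventionalBranchPredictor:
    """Toy model of branch predictor 12 (all names here are illustrative)."""
    pht: dict = field(default_factory=dict)  # 12b: branch address -> was it taken last time?
    btb: dict = field(default_factory=dict)  # 12a: branch address -> branch target address
    accesses: int = 0                        # activations; one per fetched instruction

    def record(self, branch_addr: int, taken: bool, target: Optional[int] = None) -> None:
        """Remember the outcome of an executed branch."""
        self.pht[branch_addr] = taken
        if taken and target is not None:
            self.btb[branch_addr] = target

    def next_address(self, addr: int) -> int:
        """Return the predicted address of the next instruction to fetch."""
        self.accesses += 1
        if self.pht.get(addr, False) and addr in self.btb:
            return self.btb[addr]   # predicted taken: non-sequential target
        return addr + 1             # non-branch, or predicted not taken


predictor = ConventionalBranchPredictor()
predictor.record(100, taken=True, target=301)
taken_target = predictor.next_address(100)   # branch at 100 was taken before
sequential = predictor.next_address(101)     # no history: sequential next address
```

Note that `accesses` grows by one on every fetch, including fetches of non-branch instructions, which is the behavior the rest of the description sets out to avoid.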

In other words, the conventional processor system 10 uses the branch predictor 12 to let the processor core 11 fetch instructions quickly with a very high prediction success rate.

However, in the processor system 10 with the conventional branch predictor 12, the processor core 11 accesses and activates the branch predictor 12 every time it fetches a single instruction, so a great deal of power is consumed.

In the course of their efforts to minimize the power consumption of processor systems, the present inventors developed a technical configuration that prevents the processor core from activating the branch predictor on every instruction fetch and that fetches instructions selectively from a low-power trace cache or the main instruction cache, and thereby completed the present invention.

Accordingly, an object of the present invention is to provide a processor system that uses a low-power trace cache to minimize the power consumed during instruction fetch.

Another object of the present invention is to provide a processor system that minimizes the power consumed by the operation of the branch predictor by preventing the processor core from activating the branch predictor in every instruction fetch cycle.

The objects of the present invention are not limited to those mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

To achieve the above objects, the present invention provides a processor system that includes a main instruction cache storing a plurality of non-branch instructions and branch instructions; a processor core that causes the instructions of the main instruction cache to be fetched; and a branch predictor that, each time an instruction is fetched, checks whether it is a non-branch instruction or a branch instruction and, if it is a branch instruction, decides whether to take the branch and fetch from the non-sequential branch target address or not to take it and fetch from the sequential next address. The processor system further includes a low-power trace cache and an instruction set predictor. The low-power trace cache stores a plurality of instruction sets, each instruction set grouping a plurality of the instructions fetched to the processor core. When the instruction requested by the processor core is a first branch instruction and the branch predictor decides that the branch is taken, the instruction set predictor stores a branch-taken counter value, which is the number of instruction addresses from the branch target address of the first branch instruction up to, but not including, the next branch instruction fetched. When the branch predictor decides that the branch is not taken, the instruction set predictor stores a branch-not-taken counter value, which is the number of instruction addresses from the not-taken target address (the address following the first branch instruction) up to, but not including, the next branch instruction fetched. Thereafter, when the first branch instruction is requested again, the instruction set predictor causes the instructions of the instruction set beginning at the branch target address or the not-taken target address to be fetched sequentially, and the number of instructions fetched equals the branch-taken counter value or the branch-not-taken counter value.

In a preferred embodiment, each instruction set stored in the low-power trace cache consists of non-branch instructions from among the instructions fetched to and executed by the processor core, and one instruction set is stored per line of the low-power trace cache.

In a preferred embodiment, when an instruction set is fetched to the processor core, the instruction set predictor keeps the processor core's instruction requests from reaching the branch predictor, so that the branch predictor does not operate.

In a preferred embodiment, the branch predictor includes a branch target buffer that stores, for a given branch instruction, the branch target address to be executed next; each branch-taken counter value stored in the instruction set predictor is stored in one-to-one correspondence with a branch target address stored in the branch target buffer; and the instruction set predictor causes the instruction set that begins at the address stored in the branch target buffer to be fetched.

In a preferred embodiment, the instruction set predictor includes: a counter value calculator that calculates and generates the branch-taken counter value and the branch-not-taken counter value; a counter value storage table that stores those values; a control signal generator that generates as many control signals as the counter value; a branch predictor access controller that, when a control signal is generated, blocks the processor core's instruction requests from being input to the branch predictor; and a cache selector that, when a control signal is generated, prevents instructions from being fetched from the main instruction cache and causes an instruction set of the low-power trace cache to be fetched instead.

In a preferred embodiment, the counter value calculator includes a branch target address register that stores the branch target address, that is, the address to branch to if the previously fetched branch instruction is taken; a not-taken target address register that stores the not-taken target address, that is, the next sequential address if the previously fetched branch instruction is not taken; and a current branch instruction address register that stores the address of the branch instruction currently being fetched. If the previously fetched branch instruction was taken, the branch-taken counter value is calculated as the current branch instruction address minus the branch target address. If it was not taken, the branch-not-taken counter value is calculated as the current branch instruction address minus the not-taken target address.

The present invention has the following advantageous effects.

First, according to the processor system of the present invention, when non-branch instructions that require no branch decision are executed, the instruction set predictor blocks the processor core from accessing the branch predictor, which reduces the power consumed by such accesses.

In addition, according to the processor system of the present invention, a low-power trace cache that stores one instruction set per line is used to fetch instruction sets in preference to the main instruction cache, which reduces power consumption.

The terms used in the present invention were chosen, where possible, from general terms in wide current use. In certain cases, however, a term was chosen arbitrarily by the applicant, and in such cases its meaning should be understood from how it is described or used in the detailed description of the invention, not merely from the name of the term.

Hereinafter, the technical configuration of the present invention is described in detail with reference to the preferred embodiments shown in the accompanying drawings.

However, the present invention is not limited to the embodiments described here and may be embodied in other forms. Like reference numerals designate like elements throughout the specification.

FIG. 2 is a diagram showing a processor system according to an embodiment of the present invention, FIG. 3 is a diagram showing the instruction set predictor of the processor system, FIG. 4 is a diagram for explaining a conventional trace cache, FIG. 5 is a diagram for explaining the low-power trace cache of the processor system, and FIG. 6 is a diagram for explaining an operation example of the processor system.

Referring to FIG. 2, the processor system 100 according to an embodiment of the present invention includes a processor core 110, a main instruction cache 140, a low-power trace cache 150, a branch predictor 120, and an instruction set predictor 130.

The processor core 110, in response to a command signal entered through an input means, fetches instructions stored in the main instruction cache 140 or the low-power trace cache 150, both described below, and uses the fetched instructions to operate external devices.

For example, the processor core 110 may be the central processing unit of various electronic and information devices such as computers and mobile phones.

The main instruction cache 140 is a storage medium that holds the frequently fetched instructions from among the instructions stored in an external main memory (not shown). The stored instructions consist of non-branch instructions and branch instructions.

The low-power trace cache 150 stores a plurality of instruction sets, each a group of instructions that were fetched from the main instruction cache 140 and executed, and every instruction within an instruction set is a non-branch instruction.

That is, no branch decision needs to be made while all the instructions in an instruction set are being fetched. Details are given below with reference to FIG. 3.

The low-power trace cache 150 is described in detail with reference to FIGS. 4 and 5. In a conventional trace cache, a plurality of instruction sets 13a, 13b, and 13c are stored in a single line of the trace cache 130.

That is, the instruction sets 13a, 13b, and 13c of the conventional trace cache are simply fetched and executed in sequence, with no distinction between instruction sets.

In contrast, the instruction sets 151, 152, and 153 stored in the low-power trace cache 150 of the present invention are stored on separate lines. In other words, one instruction set is stored per line.

Accordingly, the low-power trace cache 150 of the present invention can fetch and execute just one instruction set per fetch, which greatly reduces power consumption.
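The one-set-per-line organization can be pictured with a small data-structure sketch. It assumes, purely for illustration, that each line is indexed by the start address of its instruction set; the class name, method names, and the `insn@...` placeholder strings are hypothetical and do not come from the patent.

```python
class LowPowerTraceCache:
    """Toy model of trace cache 150: each line holds exactly one instruction set."""

    def __init__(self) -> None:
        # Assumption: a line is keyed by the start address of its instruction set.
        self.lines = {}

    def store(self, start_addr, instructions):
        """Place one instruction set (non-branch instructions only) on its own line."""
        self.lines[start_addr] = list(instructions)

    def fetch_set(self, start_addr):
        """Return the single instruction set starting at start_addr, or None on a miss."""
        return self.lines.get(start_addr)


cache = LowPowerTraceCache()
# Instruction set covering addresses 301 through 379, as in the FIG. 6 example.
cache.store(301, [f"insn@{addr}" for addr in range(301, 380)])
hit = cache.fetch_set(301)
miss = cache.fetch_set(101)
```

Because a lookup returns only the one set that is needed, no other instruction set has to be read out along with it.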

The branch predictor 120 receives from the processor core 110 a request for the address of the next instruction to be executed. On such a request, it determines whether the instruction currently being fetched is a branch instruction or a non-branch instruction. If it is a branch instruction, the branch predictor decides whether to take the branch and fetch the instruction at the non-sequential branch target address, or not to take it and fetch the instruction at the sequential next address, and it sends the address of the next instruction to be executed to the processor core 110.

The instruction set predictor 130 decides whether to fetch instructions from the main instruction cache 140 or from the low-power trace cache 150 and, when instructions are to be fetched from the low-power trace cache 150, predicts the instruction set to be fetched.

Referring to FIG. 3, the instruction set predictor 130 includes a counter value calculator 131, a counter value storage table 132, a control signal generator 133, a branch predictor access controller 134, and a cache selector 135.

When the instruction fetched to the processor core 110 is a first branch instruction and the branch predictor 120 predicts that the branch is taken, the counter value calculator 131 generates the branch-taken counter value, which is the number of instruction addresses from the branch target address of the first branch instruction up to, but not including, the next branch instruction fetched. When the branch predictor 120 predicts that the branch is not taken, the counter value calculator 131 generates the branch-not-taken counter value, which is the number of instruction addresses from the not-taken target address (the address following the first branch instruction) up to, but not including, the next branch instruction fetched.

More specifically, the counter value calculator 131 includes a branch result register 131d, which stores whether a branch was taken after the first branch instruction was previously executed; a branch target address register 131a, which stores the branch target address, that is, the address branched to if the first branch instruction was taken; a not-taken target address register 131b, which stores the not-taken target address, that is, the next sequential address if the first branch instruction was not taken; and a current branch instruction address register 131c, which stores the address of the branch instruction currently being executed.

The branch-taken counter value is generated by subtracting the value of the branch target address register 131a from the value of the current branch instruction address register 131c, and the branch-not-taken counter value is generated by subtracting the value of the not-taken target address register 131b from the value of the current branch instruction address register 131c.
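The two subtractions can be written compactly as below. The symbols are notation introduced here for convenience and do not appear in the patent: $A_{\text{cur}}$ is the content of register 131c, $A_{\text{target}}$ the content of register 131a, and $A_{\text{next}}$ the content of register 131b.

```latex
\[
C_{\text{taken}} = A_{\text{cur}} - A_{\text{target}},
\qquad
C_{\text{not-taken}} = A_{\text{cur}} - A_{\text{next}}
\]
```

With one instruction per address, each difference equals the number of non-branch instructions lying between the start address and the next branch instruction.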

When a branch is taken after the first branch instruction is executed, the branch-taken counter value is stored in the counter value storage table 132, described below, in one-to-one correspondence with the branch target address to which the first branch instruction branched.

The branch target address to which the first branch instruction branched is stored in the branch target buffer 12a of the branch predictor 120.

As an example of how the branch-taken and branch-not-taken counter values are calculated, refer to FIG. 6. Suppose first that the branch instruction at address 100 branches to address 301 and the non-branch instructions at addresses 301 through 379 are fetched in sequence (these non-branch instructions are stored as one instruction set 153 in one line of the low-power trace cache 150). When the branch instruction at address 380 is then executed, the current branch instruction address register 131c holds the current address, '380', and the branch target address register 131a holds '301', the target address to which the previous branch instruction branched. Subtracting the value of the branch target address register 131a from the value of the current branch instruction address register 131c gives '79', which is the branch-taken counter value.

Suppose instead that the branch instruction at address 100 is not taken and falls through to address 101, and the non-branch instructions at addresses 101 through 299 are fetched in sequence. When the branch instruction at address 300 is then executed, the current branch instruction address register 131c holds '300' and the not-taken target address register 131b holds '101', the address following the branch instruction at address 100. Subtracting the value of the not-taken target address register 131b from the value of the current branch instruction address register 131c gives '199', which is the branch-not-taken counter value.
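The two worked cases above can be reproduced with a small model of the counter value calculator. This is only a sketch under stated assumptions: the class, its field names, and its two methods are hypothetical, and instructions are assumed to occupy one address each so that the address following a branch is `branch_addr + 1`.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class CounterValueCalculator:
    """Toy model of calculator 131; the fields mirror registers 131a to 131d."""
    branch_target: int = 0      # 131a: target of the previous branch, if taken
    not_taken_target: int = 0   # 131b: address following the previous branch
    current_branch: int = 0     # 131c: address of the branch being executed now
    was_taken: bool = False     # 131d: outcome of the previous branch

    def record_branch(self, branch_addr: int, taken: bool,
                      target: Optional[int] = None) -> None:
        """Latch the outcome of a branch instruction that has just executed."""
        self.was_taken = taken
        self.not_taken_target = branch_addr + 1
        if taken and target is not None:
            self.branch_target = target

    def on_next_branch(self, branch_addr: int) -> Tuple[int, int]:
        """Return (start address, counter value) once the next branch is reached."""
        self.current_branch = branch_addr
        start = self.branch_target if self.was_taken else self.not_taken_target
        return start, self.current_branch - start


taken_case = CounterValueCalculator()
taken_case.record_branch(100, taken=True, target=301)
taken_result = taken_case.on_next_branch(380)          # (301, 79)

not_taken_case = CounterValueCalculator()
not_taken_case.record_branch(100, taken=False)
not_taken_result = not_taken_case.on_next_branch(300)  # (101, 199)
```

Returning the start address together with the counter value reflects the one-to-one pairing of counter values and start addresses described in the text.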

That is, the branch-taken counter value and the branch-not-taken counter value each equal the number of non-branch instructions that are executed without any branch decision being required.

The counter value storage table 132 stores the branch-taken counter value and the branch-not-taken counter value. It includes a branch-taken counter value storage table 132a, which stores the branch-taken counter values, and a branch-not-taken counter value storage table 132b, which stores the branch-not-taken counter values.

The control signal generator 133 generates as many control signals as the branch-taken counter value or the branch-not-taken counter value.

The control signal generator 133 includes a counter 133a, which counts down the branch-taken counter value or the branch-not-taken counter value, and a control signal generating means 133b, which generates the control signals.

That is, the control signal generator 133 generates one control signal for each non-branch instruction that is executed without a branch decision being required.

As an example of how an instruction set stored in the low-power trace cache 150 is fetched, refer again to FIG. 6. Suppose the branch instruction at address 100 is currently being executed, the branch target buffer holds the target address 301, and the branch-taken counter value storage table holds a branch-taken counter value stored in one-to-one correspondence with target address 301. Then 79 control signals are generated, so that the non-branch instructions from address 301 through address 379 are fetched sequentially to the processor core 110.

Suppose instead that the branch instruction at address 100 is currently being executed, the branch target buffer holds no target address for it, and the branch-not-taken counter value storage table holds a branch-not-taken counter value stored in one-to-one correspondence with address 101, the address following the current address. Then 199 control signals are generated, and the non-branch instructions from address 101 through address 299 are fetched sequentially.

The non-branch instructions fetched in this way are output from one of the instruction sets 151, 152, and 153.

When the control signal generator 133 generates a control signal, the branch predictor access controller 134 blocks the processor core 110 from accessing the branch predictor 120.

When the control signal generator 133 generates a control signal, the cache selector 135 prevents instructions from being fetched from the main instruction cache 140 and causes one of the instruction sets 151, 152, and 153 stored in the low-power trace cache 150 to be fetched.

Accordingly, the branch predictor 120 is kept idle while non-branch instructions that need no branch decision are executed, which reduces power consumption, and because those instructions are fetched from the low-power trace cache 150 instead of the main instruction cache 140, power consumption is reduced further still.
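To make the saving concrete, the gating behavior can be simulated at the level of fetched addresses. The sketch below is a simplification under loud assumptions: the function name and its arguments are hypothetical, the counter value storage table is modeled as a dictionary keyed by the start address of an instruction set, and the only things counted are branch predictor activations and fetches served by the trace cache. It does not model power, timing, or mispredictions.

```python
def simulate_fetch(trace, counter_table=None):
    """Count branch predictor accesses and trace cache fetches for an address trace.

    trace:         addresses in the order the processor core fetches them
    counter_table: start address -> stored counter value (assumed table layout)
    """
    counter_table = counter_table or {}
    remaining = 0       # down counter, in the role of counter 133a
    bp_accesses = 0     # times the branch predictor is activated
    tc_fetches = 0      # instructions served from the low-power trace cache

    for addr in trace:
        # A new instruction set begins: load its counter value.
        if remaining == 0 and addr in counter_table:
            remaining = counter_table[addr]
        if remaining > 0:
            # Control signal active: predictor access is blocked and the
            # instruction comes from the trace cache.
            remaining -= 1
            tc_fetches += 1
        else:
            # No control signal: ordinary fetch with a branch predictor access.
            bp_accesses += 1
    return bp_accesses, tc_fetches


# FIG. 6 taken case: branch at 100, then addresses 301..379, then branch at 380.
trace = [100] + list(range(301, 380)) + [380]
baseline = simulate_fetch(trace)             # no instruction set predictor
gated = simulate_fetch(trace, {301: 79})     # counter value 79 stored for 301
```

In this toy trace the conventional flow activates the branch predictor on all 81 fetches, while the gated flow activates it only for the two branch instructions and serves the 79 non-branch instructions from the trace cache.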

As described above, the present invention has been shown and described with reference to a preferred embodiment, but it is not limited to that embodiment, and various changes and modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the spirit of the invention.

FIG. 1 is a diagram showing a conventional processor system;

FIG. 2 is a diagram showing a processor system according to an embodiment of the present invention;

FIG. 3 is a diagram showing the instruction set predictor of a processor system according to an embodiment of the present invention;

FIG. 4 is a diagram for explaining a conventional trace cache;

FIG. 5 is a diagram for explaining the low power trace cache of a processor system according to an embodiment of the present invention; and

FIG. 6 is a diagram for explaining an example of the operation of a processor system according to an embodiment of the present invention.

In the drawings of the present invention, the same reference numerals are used for components having substantially the same configuration and function.

<Description of reference numerals for the main parts of the drawings>

100: processor system; 110: processor core

120: branch predictor; 130: instruction set predictor

131: counter value calculator; 131a: branch destination address register

131b: non-branch destination address register; 131c: current branch instruction address register

131d: branch result register; 132: counter value storage table

132a: branch execution counter value storage table

132b: branch non-execution counter value storage table

133: control signal generator; 133a: counter

133b: control signal generating means; 134: branch predictor access controller

135: cache selector; 140: main instruction cache

150: low power trace cache; 151, 152, 153: instruction sets

Claims

A processor system comprising a main instruction cache that stores a plurality of non-branch instructions and branch instructions, a processor core that causes instructions of the main instruction cache to be fetched, and a branch predictor that checks, whenever an instruction is fetched, whether the instruction is a non-branch instruction or a branch instruction and, if the instruction is a branch instruction, determines whether the next instruction is to be fetched by branching to a non-sequential branch destination address or by not branching and proceeding to the next sequential address, the processor system further comprising:

a low power trace cache that stores a plurality of instruction sets, each instruction set consisting of a plurality of instructions among the instructions fetched to the processor core; and

an instruction set predictor that, if the instruction requested by the processor core is a first branch instruction and the branch predictor determines that the branch is taken, stores a branch execution counter value, which is the number of instruction addresses from the branch destination address of the first branch instruction until the next branch instruction is fetched; that, if the branch predictor determines that the branch is not taken, stores a branch non-execution counter value, which is the number of instruction addresses from the non-branch destination address, which is the address following the first branch instruction, until the next branch instruction is fetched; and that, when the first branch instruction is requested again, causes the instructions of the instruction set starting at the branch destination address or the non-branch destination address to be fetched sequentially, as many as the branch execution counter value or the branch non-execution counter value,

wherein, when the instruction set is fetched to the processor core, the instruction set predictor blocks the instruction request of the processor core from being delivered to the branch predictor, so that the branch predictor does not operate.

The processor system of claim 1,

wherein each instruction set stored in the low power trace cache consists of a set of non-branch instructions among the instructions fetched to and executed by the processor core,

and one instruction set is stored on one line of the low power trace cache.

delete

The processor system of claim 1 or 2,

wherein the branch predictor comprises a branch destination buffer that stores, for any branch instruction, the branch destination address to be executed next,

the branch execution counter value stored in the instruction set predictor is stored in one-to-one correspondence with a branch destination address stored in the branch destination buffer,

and the instruction set predictor causes the instruction set, among the instruction sets, that starts at an address stored in the branch destination buffer to be fetched.

The processor system of claim 4,

wherein the instruction set predictor comprises:

a counter value calculator that calculates and generates the branch execution counter value and the branch non-execution counter value;

a counter value storage table that stores the branch execution counter value and the branch non-execution counter value;

a control signal generator that generates as many control signals as the counter value;

a branch predictor access controller that blocks instruction requests from the processor core from being input to the branch predictor when the control signal is generated; and

a cache selector that, when the control signal is generated, prevents instructions from being fetched from the main instruction cache and causes an instruction set of the low power trace cache to be fetched.

The processor system of claim 5,

wherein the counter value calculator comprises:

a branch destination address register that stores a branch destination address, which is the address branched to when the previously fetched branch instruction is taken;

a non-branch destination address register that stores a non-branch destination address, which is the next sequential address fetched when the previously fetched branch instruction is not taken; and

a current branch instruction address register that stores a current branch instruction address, which is the address of the branch instruction being fetched,

wherein, when the previously fetched branch instruction is taken, the value obtained by subtracting the branch destination address from the current branch instruction address is calculated as the branch execution counter value,

and, when the previously fetched branch instruction is not taken, the value obtained by subtracting the non-branch destination address from the current branch instruction address is calculated as the branch non-execution counter value.