KR20140011940A

KR20140011940A - Processor using branch instruction execution cache and operating method thereof

Info

Publication number: KR20140011940A
Application number: KR1020130077191A
Authority: KR
Inventors: 권영수
Original assignee: Electronics and Telecommunications Research Institute (ETRI)
Priority date: 2012-07-18
Filing date: 2013-07-02
Publication date: 2014-01-29

Abstract

Disclosed are a processor using a branch instruction execution cache and a method of operating the processor. The processor includes a fetch unit, a branch prediction unit, an instruction queue, a decoding unit and an execution unit, all operating in a pipelined manner. The processor further includes a branch instruction execution cache that stores the address and decode information of instructions output by the decoding unit and, when the execution unit detects a branch misprediction, provides at least part of the stored decode information to the execution unit to recover from the misprediction. The processor can thereby minimize pipeline-clear overhead, preventing performance degradation and reducing power consumption. [Reference numerals] (110) Processor core; (120) Instruction cache; (121) Fetch unit; (122) Branch prediction unit; (123) Instruction queue; (124) Decoding unit; (125) Execution unit; (130) Branch instruction execution cache; (AA) Control

Description

Processor using branch instruction execution cache and method of operating processor using branch instruction execution cache

TECHNICAL FIELD The present invention relates to a processor, and more particularly, to a processor structure capable of reducing the overhead caused by branch misprediction in a high-performance processor core having a deep pipelining structure, to a branch instruction execution cache for the processor, and to a method of operating the processor.

A processor is hardware or IP (intellectual property) that executes an algorithm for a particular application area by reading instructions from storage such as memory or disk, performing the operation encoded in each instruction on its operands, and storing the result back.

Processors are used across virtually every area of system semiconductors. Their applications range from high-performance processing of large volumes of multimedia data (video compression and decompression, audio compression and decompression, audio transformation and sound effects), wired and wireless communication modems, voice codec algorithms, network data processing and touch screens, through low-performance microcontroller platforms such as home-appliance controllers and motor control, to devices that have no stable power supply or cannot be powered externally at all, such as wireless sensor networks and electronic dust.

A processor basically consists of a core, a translation lookaside buffer (TLB) and a cache. The work the processor performs is defined as a combination of many instructions: the instructions are stored in memory and fed sequentially into the processor, which performs a specific operation every clock cycle. The TLB translates virtual addresses into physical addresses so that operating-system-based applications can run, and the cache increases processor speed by temporarily holding, on chip, instructions that are stored in external memory.

High-performance processor cores running at 1 GHz and above now essentially all use deep pipelining. A deep pipeline maximizes the operating frequency and increases throughput. However, when a branch instruction executes in such a pipeline, its branch target address is determined only in the later stages. By the clock cycle in which the branch actually takes effect, the instructions already in the earlier stages are ones that should not be executed, so the pipeline must be cleared. After the pipeline clear, instructions are fetched again starting at the branch target address, which costs more than 10 cycles of performance overhead.
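As a rough illustration of why this overhead matters, the cycles lost to pipeline clears can be estimated as (number of mispredicted branches) × (penalty per clear). The sketch below assumes a fixed penalty of 10 cycles, the lower bound quoted above; the branch fraction and misprediction rate in the usage line are hypothetical values, not figures from this document.

```python
def cycles_lost_to_flushes(num_instructions: int,
                           branch_fraction: float,
                           mispredict_rate: float,
                           flush_penalty: int = 10) -> float:
    """Estimate cycles lost to pipeline clears caused by branch mispredictions.

    Assumes every misprediction costs the same fixed penalty; in real cores
    the penalty varies with pipeline depth and occupancy.
    """
    mispredictions = num_instructions * branch_fraction * mispredict_rate
    return mispredictions * flush_penalty


# Hypothetical workload: 1,000,000 instructions, 20% branches, 5% of them mispredicted.
lost = cycles_lost_to_flushes(1_000_000, 0.20, 0.05)
```

Because the cost scales linearly with the per-clear penalty, shortening the recovery path pays off on every misprediction, not only on rare ones.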

Pipeline clears are particularly significant in deeply pipelined processor cores. A core with a pipeline of about five stages generally makes no special provision for pipeline clears, whereas a deeply pipelined core implements a branch predictor.

When an instruction is fetched in the early part of the pipeline, the branch predictor predicts in advance how the branch will go and fetches instructions from the predicted memory address. The prediction result is carried down to the later part of the pipeline, where, once the branch target address has been determined, the earlier prediction is checked for correctness. If it was correct, the core continues without a pipeline clear; if it was not, that is, if a branch misprediction occurred, the pipeline is cleared. Pipeline clears therefore cannot be avoided even with a branch predictor, and a method is needed to minimize the pipeline-clear overhead when a branch misprediction occurs.
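The check performed in the later part of the pipeline can be sketched as follows. The `Prediction` and `Resolution` records are hypothetical names used only for illustration. A misprediction is flagged if the taken/not-taken direction is wrong or if, for a taken branch, the predicted target address differs from the one computed at execution.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Prediction:
    taken: bool
    target: Optional[int]  # predicted target address; None when predicted not-taken


@dataclass
class Resolution:
    taken: bool
    target: int  # actual next address resolved in the execution stage


def is_mispredicted(pred: Prediction, actual: Resolution) -> bool:
    """Return True if the early-stage prediction disagrees with the resolved branch."""
    if pred.taken != actual.taken:
        return True  # wrong direction
    if actual.taken and pred.target != actual.target:
        return True  # right direction, wrong target address
    return False
```

Only a `True` result triggers recovery; a correct prediction lets the instructions already in flight continue.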

An object of the present invention, devised to solve the above problem, is to provide a processor structure that, in order to increase the performance and reduce the power consumption of a pipelined processor, enables fast recovery from branch mispredictions and minimizes the pipeline-clear overhead of that recovery.

Another object of the present invention is to provide a method of operating a processor that, in order to increase the performance and reduce the power consumption of a pipelined processor, enables fast recovery from branch mispredictions and minimizes the pipeline-clear overhead of that recovery.

Yet another object of the present invention is to provide a structure for a branch instruction execution cache that can be applied to a pipelined processor to increase its performance and reduce its power consumption, enabling fast recovery from branch mispredictions and minimizing the pipeline-clear overhead of that recovery.

One aspect of the present invention for achieving the above object provides a processor comprising: a fetch unit that reads a current instruction from an instruction cache; a branch prediction unit that receives the current instruction from the fetch unit and outputs it and, if the current instruction is a branch instruction, performs branch prediction and controls the fetch unit, according to the prediction result, to output the next instruction from either the branch target address of the current instruction or the address following the current instruction; an instruction queue that stores the instructions output from the branch prediction unit; a decoding unit that decodes an instruction delivered from the instruction queue and outputs the address and decode information of that instruction; and an execution unit that performs the operation corresponding to the delivered instruction based on the output address and decode information. The processor further comprises a branch instruction execution cache that stores the address and decode information of the delivered instruction output from the decoding unit and, when the execution unit determines that a branch misprediction has occurred, provides at least part of the stored decode information to the execution unit to recover from the misprediction.

Here, the fetch unit, the branch prediction unit, the instruction queue, the decoding unit and the execution unit may be configured to operate in a pipelined manner. In that case, if the execution unit determines that a branch misprediction has occurred and the branch instruction execution cache cannot provide at least part of the stored decode information to the execution unit, a pipeline clear may be performed.

Here, the fetch unit may be configured to read the next instruction from the branch target address of the current instruction when the branch prediction unit predicts that the branch of the current instruction will be taken, and from the address following the current instruction when it predicts that the branch will not be taken.

Here, the branch instruction execution cache may store the decode information of at least some of the instructions following the branch instruction.

Here, the branch instruction execution cache may store the decode information of at least some of the instructions located after the branch target address of the branch instruction.

Here, the branch instruction execution cache may comprise: a saving unit that receives the address and decode information of a decoded instruction from the decoding unit of the processor device; a memory unit that receives the address and decode information from the saving unit and stores them; and a recovery unit that receives a branch misprediction signal from the execution unit and provides the decode information stored in the memory unit to the execution unit.

In this case, the memory unit may include a tag memory storing at least one tag item, each specified by at least part of the address of a decoded instruction, and an instruction group memory composed of instruction groups, each specified one-to-one by a tag item. Each instruction group may be configured to store decode information for at least one instruction.
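The text does not say how a tag item is derived from "at least part of the address". One plausible reading, borrowed from ordinary direct-mapped caches, is to split the address into a tag, an index that picks the tag item and its instruction group, and a slot within the group. The sketch below assumes 4-byte instructions, 4 instructions per group and 16 entries; all three constants and the function name are hypothetical.

```python
from typing import Tuple

INSTR_BYTES = 4   # assumed fixed instruction size
GROUP_SIZE = 4    # assumed number of instructions per instruction group
NUM_ENTRIES = 16  # assumed number of tag items / instruction groups


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split an instruction address into (tag, index, slot) for a direct-mapped layout."""
    word = addr // INSTR_BYTES          # instruction number
    slot = word % GROUP_SIZE            # position inside the instruction group
    group_no = word // GROUP_SIZE       # which aligned group the instruction belongs to
    index = group_no % NUM_ENTRIES      # selects the tag item and its instruction group
    tag = group_no // NUM_ENTRIES       # remaining high-order bits kept in the tag memory
    return tag, index, slot
```

Under these assumptions, address 0x104 maps to tag 1, index 0, slot 1. A set-associative organization would work equally well; the text does not fix one.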

In this case, the saving unit may be configured to store at least part of the address of the instruction output from the decoding unit in a tag item selected in the tag memory based on that address, and to store the decode information of the instruction in the instruction group, within the instruction group memory, that is specified by the selected tag item.

In this case, the recovery unit may be configured to receive the branch misprediction signal and the branch target address from the execution unit, select a tag item in the tag memory by referring to the branch target address, read out the instruction decode information belonging to the instruction group that the selected tag item specifies in the instruction group memory, and deliver it to the execution unit.
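The saving unit, memory unit and recovery unit can be modelled together as a small behavioural sketch. It assumes a direct-mapped layout (4-byte instructions, 4-instruction groups, 16 entries), treats decode information as an opaque value, and evicts a group whenever a new tag claims its entry; the class name, method names and eviction policy are all hypothetical, since the text does not specify them. On recovery it returns the decode information from the target's slot to the end of the group, stopping at the first slot that was never filled.

```python
from typing import Any, List, Optional, Tuple

INSTR_BYTES = 4   # assumed fixed instruction size
GROUP_SIZE = 4    # assumed instructions per instruction group
NUM_ENTRIES = 16  # assumed number of tag items


class BranchInstructionExecutionCache:
    """Behavioural sketch: tag memory + instruction group memory."""

    def __init__(self) -> None:
        self.tags: List[Optional[int]] = [None] * NUM_ENTRIES
        self.groups: List[List[Any]] = [[None] * GROUP_SIZE for _ in range(NUM_ENTRIES)]

    @staticmethod
    def _split(addr: int) -> Tuple[int, int, int]:
        word = addr // INSTR_BYTES
        group_no = word // GROUP_SIZE
        return group_no // NUM_ENTRIES, group_no % NUM_ENTRIES, word % GROUP_SIZE

    def save(self, addr: int, decode_info: Any) -> None:
        """Saving unit: record decode info for the instruction at addr."""
        tag, index, slot = self._split(addr)
        if self.tags[index] != tag:
            # Assumed policy: a different tag evicts the old group entirely.
            self.tags[index] = tag
            self.groups[index] = [None] * GROUP_SIZE
        self.groups[index][slot] = decode_info

    def recover(self, target_addr: int) -> Optional[List[Any]]:
        """Recovery unit: return decode info starting at target_addr, or None on a miss."""
        tag, index, slot = self._split(target_addr)
        if self.tags[index] != tag or self.groups[index][slot] is None:
            return None
        out = []
        for info in self.groups[index][slot:]:
            if info is None:
                break  # stop at the first slot that was never filled
            out.append(info)
        return out
```

A `None` result corresponds to the case in which the cache cannot supply decode information, which is when the processor falls back to a pipeline clear.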

One aspect of the present invention for achieving the other object provides a method of operating a processor, comprising: a branch prediction step of outputting and analyzing a current instruction read from an instruction cache, performing branch prediction if the current instruction is a branch instruction, and outputting the next instruction from either the branch target address of the current instruction or the address following the current instruction according to the prediction result; an instruction storing step of storing the instructions output in the branch prediction step in an instruction queue; a decoding step of decoding an instruction delivered from the instruction queue and outputting its address and decode information; and an execution step of performing the operation corresponding to the output instruction based on the address and decode information output in the decoding step. The method stores the address and decode information of the instruction output in the decoding step and, when a branch misprediction is determined in the execution step, provides at least part of the stored decode information to the execution step to recover from the misprediction.

Here, the branch prediction step, the instruction storing step, the decoding step and the execution step may be configured to operate in a pipelined manner. In that case, if a branch misprediction is determined in the execution step and at least part of the stored decode information cannot be provided to the execution step, a pipeline clear may be performed.

One aspect of the present invention for achieving yet another object provides a branch instruction execution cache applied to a pipelined processor, comprising: a saving unit that receives the address and decode information of a decoded instruction from the decoding unit of the processor; a memory unit that receives the address and decode information from the saving unit and stores them; and a recovery unit that receives a branch misprediction signal from the execution unit of the processor and provides the decode information stored in the memory unit to the execution unit.

Here, the memory unit may include a tag memory storing at least one tag item, each specified by at least part of the address of a decoded instruction, and an instruction group memory composed of instruction groups, each specified one-to-one by a tag item. Each instruction group may be configured to store decode information for at least one instruction.

Here, the saving unit may be configured to store at least part of the address of the instruction output from the decoding unit in a tag item selected in the tag memory based on that address, and to store the decode information of the instruction in the instruction group, within the instruction group memory, that is specified by the selected tag item.

Here, the recovery unit may be configured to receive the branch misprediction signal and the branch target address from the execution unit, select a tag item in the tag memory by referring to the branch target address, read out the instruction decode information belonging to the instruction group that the selected tag item specifies in the instruction group memory, and deliver it to the execution unit.

Here, when the recovery unit cannot provide the decode information stored in the memory unit to the execution unit in response to a branch misprediction signal from the execution unit, a pipeline clear of the processor may be performed.
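The decision between cache-based recovery and a pipeline clear can be sketched as below. A plain dictionary stands in for the branch instruction execution cache, mapping a branch target address to the decode information saved for the instructions starting there; this simplification, the function name and the returned action labels are hypothetical. How fetching resumes once the cached instructions run out is not described in this section and is left out.

```python
from typing import Any, Dict, List, Tuple


def handle_mispredict(cache: Dict[int, List[Any]], target_addr: int) -> Tuple[str, Any]:
    """Choose the recovery action after the execution unit signals a misprediction.

    `cache` is a simplified stand-in for the tag/instruction-group memory:
    target address -> decode information saved for instructions starting there.
    """
    entries = cache.get(target_addr)
    if entries:
        # Hit: hand the saved decode information straight to the execution unit,
        # bypassing the fetch, prediction, queue and decode stages.
        return ("supply_from_cache", entries)
    # Miss (or nothing saved): fall back to a conventional pipeline clear
    # and refetch from the branch target address.
    return ("pipeline_clear", target_addr)
```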

In a conventional deeply pipelined processor core, a pipeline clear is performed to recover from every branch misprediction, which degrades performance and in turn increases power consumption.

The processor according to the present invention uses the branch instruction execution cache to store instruction decode information and, when a branch misprediction occurs, immediately provides the decode information stored in the cache to the execution unit, reducing how often pipeline clears occur. The processor structure according to the present invention therefore prevents processor performance degradation and reduces processor power consumption.
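The benefit can be framed as an expected penalty per misprediction: a fraction of recoveries (the cache hit rate) pay a short penalty and the rest pay the full pipeline-clear penalty. This weighted average is a standard back-of-the-envelope model, not a formula given in the text, and the hit rate and penalty values in the usage line are purely hypothetical.

```python
def expected_recovery_penalty(hit_rate: float, hit_penalty: float, flush_penalty: float) -> float:
    """Average cycles lost per misprediction when some recoveries are served by the cache."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * hit_penalty + (1.0 - hit_rate) * flush_penalty


# Hypothetical: 60% of mispredictions hit the cache at 2 cycles; misses cost 12 cycles.
avg = expected_recovery_penalty(0.6, 2, 12)
```

With a hit rate of zero the model reduces to the conventional full-clear cost, which matches the fallback behaviour described above.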

FIG. 1 is a block diagram illustrating an embodiment of a processor according to the present invention.
FIG. 2 is a block diagram illustrating an embodiment of a branch instruction execution cache according to the present invention.
FIG. 3 is a block diagram illustrating in detail an embodiment of a branch instruction execution cache according to the present invention.
FIG. 4 is a flowchart illustrating an embodiment of a method of operating a processor according to the present invention.

While the present invention is susceptible to various modifications and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail herein. This is not intended to limit the present invention to those particular embodiments; rather, the invention should be understood to include all modifications, equivalents and alternatives falling within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A and B may be used to describe various components, but the components should not be limited by these terms, which are used only to distinguish one component from another. For example, a first component may be referred to as a second component, and similarly a second component may be referred to as a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items or any one of them.

When a component is referred to as being "coupled" or "connected" to another component, it may be directly coupled or connected to that other component, but other components may also be present in between. In contrast, when a component is referred to as being "directly coupled" or "directly connected" to another component, no other components are present in between.

The terminology used in this application describes only specific embodiments and is not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "comprise" or "have" indicate that the features, numbers, steps, operations, components, parts or combinations thereof described in the specification are present, and do not preclude the presence or possible addition of one or more other features, numbers, steps, operations, components, parts or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments according to the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram illustrating an embodiment of a processor according to the present invention.

Referring to FIG. 1, an embodiment 100 of a processor device according to the present invention may include a plurality of processor cores 110. Each processor core 110 may include a fetch unit 121, a branch prediction unit 122, an instruction queue 123, a decoding unit 124 and an execution unit 125. These components (the fetch unit, branch prediction unit, instruction queue, decoding unit and execution unit) operate in a pipelined manner. The example configuration above includes only the essential elements of a processor core; an actual processor core implementation may include more components.

In addition, the processor device according to the present invention is characterized in that a branch instruction execution cache 130 is interposed between the decoding unit 124 and the execution unit 125.

First, the fetch unit 121 reads a current instruction from an instruction cache 120 in the processor. The fetch unit 121 may read the current instruction from the instruction cache 120 under the control of the branch prediction unit 122, described below.

For example, when the current instruction is not a branch instruction, the fetch unit 121 may be configured to read instructions sequentially, taking the instruction at the address following that of the current instruction.

When the current instruction is a branch instruction, the fetch unit 121 may, under the control of the branch prediction unit 122 described below (that is, control based on that unit's prediction), read either the instruction located at the branch target address corresponding to the branch instruction or the instruction located at the address following the current instruction.

The branch prediction unit 122 receives the current instruction from the fetch unit 121 and outputs it to the instruction queue 123, described below. If the current instruction is a branch instruction, the branch prediction unit 122 performs branch prediction and, according to the prediction result, controls the fetch unit 121 to output the next instruction from either the branch target address of the current instruction or the address following the current instruction.

That is, the branch prediction unit 122 performs branch prediction when the current instruction is a branch instruction. For this purpose, the branch prediction unit 122 typically includes a BTB (branch target buffer) and a BP (branch prediction decision) unit, which it uses to predict whether a branch will be taken and, if so, to estimate the branch target address. The branch prediction unit 122 may take various detailed configurations; because these are outside the scope of the present invention, a detailed description is omitted.
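Since the predictor's internals are left outside the scope of the invention, the following is only a generic textbook-style sketch, not the structure used in this document: a dictionary plays the role of the BTB (branch address to last seen target), and a 2-bit saturating counter per branch plays the role of the decision unit. The class and method names are hypothetical.

```python
from typing import Dict, Optional, Tuple


class SimpleBranchPredictor:
    """Toy predictor: a branch target buffer plus a 2-bit saturating counter per branch."""

    def __init__(self) -> None:
        self.btb: Dict[int, int] = {}       # branch address -> last seen target address
        self.counters: Dict[int, int] = {}  # branch address -> counter in 0..3 (>= 2 means 'Taken')

    def predict(self, pc: int) -> Tuple[bool, Optional[int]]:
        """Return (taken, target). Branches missing from the BTB are predicted 'Not-Taken'."""
        if pc not in self.btb:
            return False, None
        taken = self.counters.get(pc, 1) >= 2
        return taken, (self.btb[pc] if taken else None)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Train with the outcome resolved in the execution stage."""
        c = self.counters.get(pc, 1)  # start in the weakly 'Not-Taken' state
        self.counters[pc] = min(c + 1, 3) if taken else max(c - 1, 0)
        if taken:
            self.btb[pc] = target
```

The 2-bit counter tolerates a single anomalous outcome (for example, a loop exit) without immediately flipping its prediction.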

The branch prediction unit 122 cannot always predict correctly, because the branch target address can be known exactly only after the instructions that entered the pipeline before the branch instruction have completely finished executing.

The prediction result of the branch prediction unit 122 is either 'Taken' or 'Not-Taken': 'Taken' means the branch is predicted to actually occur, and 'Not-Taken' means it is predicted not to occur.

When the prediction result is 'Taken', the branch prediction unit 122 causes the fetch unit 121 to read the next instruction from the branch target address. When the prediction result is 'Not-Taken', it causes the fetch unit 121 to read the next instructions at sequentially increasing addresses. When the current instruction input to the branch prediction unit 122 is not a branch instruction, the next instructions are likewise read at sequentially increasing addresses, as in the 'Not-Taken' case.
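The selection rule above fits in a few lines. The sketch assumes a fixed 4-byte instruction encoding, which the text does not specify, and the function name is hypothetical.

```python
from typing import Optional

INSTR_BYTES = 4  # assumed fixed-length instruction encoding


def next_fetch_address(pc: int, is_branch: bool, predicted_taken: bool,
                       predicted_target: Optional[int]) -> int:
    """Pick the address the fetch unit reads next."""
    if is_branch and predicted_taken and predicted_target is not None:
        return predicted_target  # 'Taken': redirect fetch to the branch target address
    return pc + INSTR_BYTES      # 'Not-Taken' or not a branch: continue sequentially
```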

The output of the branch predictor is the instruction sequence it has speculated, and this sequence is input to the instruction queue 123. The instruction queue 123 holds multiple instructions so that a high-performance processor core can execute several of them (for example, two to four) at the same time. The instruction queue may take various detailed configurations; since these are outside the scope of the present invention, a detailed description is omitted.
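The description leaves the queue's internals open. One simple realization is a bounded FIFO from which up to an issue width of instructions is removed at a time. The sketch below assumes this. The `capacity` and `issue_width` values are illustrative assumptions; the text mentions only two to four instructions at a time.

```python
from collections import deque

class InstructionQueue:
    """Hypothetical bounded FIFO between the branch predictor and the decoder."""

    def __init__(self, capacity=16, issue_width=4):
        self.capacity = capacity
        self.issue_width = issue_width
        self._q = deque()

    def push(self, instr):
        """Enqueue one predicted-path instruction; False means fetch must stall."""
        if len(self._q) >= self.capacity:
            return False
        self._q.append(instr)
        return True

    def pop_group(self):
        """Dequeue up to issue_width instructions in program order."""
        n = min(self.issue_width, len(self._q))
        return [self._q.popleft() for _ in range(n)]

    def flush(self):
        """Discard all queued (wrong-path) instructions on pipeline initialization."""
        self._q.clear()
```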

The decoding unit 124 reads an instruction from the instruction queue and decodes the kind of operation the instruction requires, the locations of its operands, its condition, and so on, generating decoded information 126. The decode information 126 is transmitted to the execution unit 125, which then executes the operation corresponding to the instruction based on that decode information.
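To make "decode information" concrete, here is a minimal sketch of a record holding the fields the text names: operation kind, operand locations, and condition. It uses a toy decoder for a hypothetical assembly syntax such as `add r1, r2, r3` or `b.eq loop`. Neither the record layout nor the syntax comes from the document.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class DecodeInfo:
    """Fields named in the text: operation kind, operand locations, condition."""
    op: str
    dest: Optional[str]
    sources: Tuple[str, ...]
    condition: Optional[str] = None

def decode(text):
    """Decode a hypothetical toy instruction string into DecodeInfo."""
    mnemonic, _, rest = text.partition(" ")
    operands = tuple(o.strip() for o in rest.split(",") if o.strip())
    op, _, cond = mnemonic.partition(".")
    if op == "b":  # branch: one target operand plus an optional condition suffix
        return DecodeInfo(op, None, operands, cond or None)
    return DecodeInfo(op, operands[0] if operands else None, operands[1:])
```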

The branch instruction execution cache 130 stores, as needed, the decode information 126 transmitted from the decoding unit 124. When the execution unit 125 detects a branch misprediction and signals it (127), the cache transmits stored decode information 128 to the execution unit to recover from the misprediction. In other words, the branch instruction execution cache 130 stores the address and decode information of each instruction decoded by the decoding unit. When the execution unit detects a branch misprediction, the cache provides at least part of the stored decode information to the execution unit to recover from it.

That is, the branch instruction execution cache may hold two kinds of decode information. The first is for the instruction groups located at the branch address, which must execute when the branch is actually taken. The second is for the instruction groups immediately following the branch instruction, which must execute when the branch is not taken. For example, consider a repeatedly executed branch instruction (e.g., a loop). As execution repeats, the decode information of the instruction groups for both the taken and the not-taken case comes to be stored in the branch instruction execution cache. Then, even if a branch misprediction occurs, the decode information of instruction groups already decoded and stored in the cache can be supplied to the execution unit immediately, without initializing the pipeline.
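The loop case can be traced with a small simulation. It is a hedged sketch built on assumptions that are not in the text: unlimited cache capacity, a cold predictor that first guesses Not-Taken, a naive last-outcome predictor, and decode information of wrong-path instructions being cached as well. The addresses are hypothetical.

```python
# Toy loop (hypothetical addresses): the body starts at LOOP_TOP, and the
# backward branch's fall-through (loop exit) path starts at EXIT_PC.
LOOP_TOP, EXIT_PC = 0x100, 0x120

def run_loop(iterations):
    """Return (iteration, outcome) for each misprediction of the loop branch.

    outcome is "hit" if the correct path's decoded group is already cached
    (recovery without flush) or "flush" if the pipeline must be initialized.
    """
    cached = set()               # start addresses of cached decoded groups
    predicted_taken = False      # assumption: a cold predictor says Not-Taken
    events = []
    for i in range(iterations):
        cached.add(LOOP_TOP)     # the loop body is decoded on every iteration
        # The decoder follows the predicted path, so that path is cached as well.
        cached.add(LOOP_TOP if predicted_taken else EXIT_PC)
        actual_taken = i < iterations - 1    # only the last iteration exits
        if predicted_taken != actual_taken:
            target = LOOP_TOP if actual_taken else EXIT_PC
            events.append((i, "hit" if target in cached else "flush"))
        predicted_taken = actual_taken       # assumption: last-outcome predictor
    return events
```

Under these assumptions, `run_loop(5)` returns `[(0, "hit"), (4, "hit")]`. Both mispredictions, on entry and at the final exit, find the correct path already decoded. The exit path is available because it was decoded, and cached, on the wrong path during the first iteration.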

In the processor according to the present invention, pipeline initialization occurs only in one case: the execution unit 125 has detected a branch misprediction and notified the branch instruction execution cache, but the cache cannot provide the execution unit with at least some of the previously stored decode information needed to overcome the misprediction. The processor according to the present invention can therefore minimize the overhead of pipeline initialization for branch misprediction recovery.

FIG. 2 is a block diagram illustrating an embodiment of a branch instruction execution cache according to the present invention.

The branch instruction execution cache described below is a component applied to a processor core with a pipeline structure. As described above, a processor to which the cache is applied may typically include a fetch unit, a branch predictor, an instruction queue, a decoding unit, and an execution unit, all of which operate in a pipelined manner. The branch instruction execution cache 130 according to the present invention is interposed between the decoding unit 124 and the execution unit 125.

Referring to FIG. 2, an embodiment 130 of the branch instruction execution cache according to the present invention may include a saving unit 131, a memory unit 140, and a recovery unit 132.

The saving unit 131 receives the address and decode information of each instruction decoded by the decoding unit 124 of the processor. It receives this information at the same time as the execution unit 125 does, and stores it in the memory unit 140 described below.

The memory unit 140 receives and stores the instruction address and decode information passed on by the saving unit; an embodiment of its configuration is described later. As described above, the memory unit 140 comes to hold more decode information as the processor continues to operate. This includes the decode information of at least some of the instructions following a branch instruction and of at least some of the instructions located after that branch's target address.

The recovery unit 132 receives a branch prediction error signal from the execution unit 125 of the processor. In response, it reads the instruction decode information stored in the memory unit and provides it to the execution unit 125.
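The three units can be modeled functionally before looking at the tag/group organization. The sketch below is simplified and uses hypothetical names. A plain dict keyed by instruction address stands in for the memory unit, and capacity and replacement are ignored. It also assumes fixed 4-byte instructions and a cap of eight entries per recovery.

```python
class BranchInstructionExecutionCache:
    """Functional model of the saving, memory and recovery units (sketch)."""

    def __init__(self, instr_bytes=4, max_recover=8):
        self.instr_bytes = instr_bytes   # assumption: fixed-width instructions
        self.max_recover = max_recover   # assumption: at most one group per recovery
        self.memory = {}                 # memory unit: address -> decode info

    def save(self, address, decode_info):
        """Saving unit: record decoder output as it is sent to the execution unit."""
        self.memory[address] = decode_info

    def recover(self, branch_address):
        """Recovery unit: return consecutive stored decode info from branch_address.

        An empty list means nothing usable is stored, so the pipeline
        must be initialized instead.
        """
        infos = []
        addr = branch_address
        while len(infos) < self.max_recover and addr in self.memory:
            infos.append(self.memory[addr])
            addr += self.instr_bytes
        return infos
```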

FIG. 3 is a block diagram illustrating an embodiment of the branch instruction execution cache according to the present invention in more detail.

The branch instruction execution cache applicable to the processor of the present invention may be implemented in various forms; FIG. 3 illustrates one specific example.

Referring to FIG. 3, the memory unit 140 included in an embodiment of the branch instruction execution cache according to the present invention may include a tag memory 141 and an instruction group memory 143. The saving unit 131 and the recovery unit 132 are the components described above with reference to FIG. 2.

First, the tag memory 141 of the branch instruction execution cache may include at least one tag item 142. Each tag item corresponds one-to-one to an instruction group stored in the instruction group memory described below.

Next, the instruction group memory 143 of the branch instruction execution cache holds a plurality of instruction groups, each holding decode information for a plurality of instructions. As described above, each instruction group is mapped one-to-one to a tag item 142 of the tag memory 141.

The operation of the saving unit 131, the recovery unit 132, and the memory unit 140 is described below based on a specific implementation of the memory unit 140.

Referring to FIG. 3, the address and decode information of an instruction decoded by the decoding unit 124 are input to the saving unit 131. Based on that address, the saving unit 131 searches the tag memory 141 for an empty tag item. If there is no empty tag item, it selects the least recently used tag item (assume it is 142).

The saving unit 131 stores at least part of the instruction address (for example, its upper bits) in the selected tag item 142. Storing part of the address in the tag item allows the instruction address to be used later to identify the tag item. That tag item designates the instruction group in which the instruction's decode information is stored. Each tag item is mapped 1:1 to an instruction group (assume it is 144) in the instruction group memory 143.
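Storing "the upper bits" of the address as the tag amounts to a simple split. The sketch below assumes fixed 4-byte instructions and uses the text's example of eight decode entries per group, so one group covers 32 bytes. The tag is then the address divided by 32, equivalent to `addr >> 5`. The slot is the instruction index within that 32-byte window. Names are hypothetical.

```python
INSTR_BYTES = 4                          # assumption: fixed-width 4-byte instructions
GROUP_SIZE = 8                           # example group size from the description
GROUP_BYTES = INSTR_BYTES * GROUP_SIZE   # 32 bytes covered by one group

def split_address(addr):
    """Split an instruction address into (tag, slot).

    tag:  the upper address bits kept in a tag item
    slot: which of the GROUP_SIZE decode-info entries inside the group
    """
    tag = addr // GROUP_BYTES
    slot = (addr // INSTR_BYTES) % GROUP_SIZE
    return tag, slot
```

For example, 0x1234 and 0x1220 share tag 0x91 and therefore belong to the same instruction group, in slots 5 and 0 respectively.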

One instruction group (e.g., 144) stores decode information for a plurality of instructions (e.g., eight), 145-1, ..., 145-N. Each entry of instruction decode information 145-1, ..., 145-N has a valid bit 146-1, ..., 146-N indicating whether the entry holds the decoded result of a valid instruction.
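The saving unit's behavior can be sketched together with the tag memory and instruction group memory. The sketch covers the search for an empty tag item, the least-recently-used fallback, the 1:1 tag-to-group mapping, and per-slot valid bits. An `OrderedDict` keeps tags in LRU order. Three choices here are assumptions not stated in the text: reusing a tag item that already matches, the 4-byte instruction size, and the tag count.

```python
from collections import OrderedDict

GROUP_SIZE = 8    # decode-info entries per instruction group (text's example)
INSTR_BYTES = 4   # assumption: fixed-width instructions

class SavingUnitModel:
    """Hypothetical tag memory + instruction group memory filled by the saving unit."""

    def __init__(self, num_tags=4):
        self.num_tags = num_tags
        # Key order doubles as LRU order: the first key is the least recently used.
        self.groups = OrderedDict()   # tag -> list of (valid, decode_info) slots

    @staticmethod
    def split(addr):
        return addr // (INSTR_BYTES * GROUP_SIZE), (addr // INSTR_BYTES) % GROUP_SIZE

    def save(self, addr, decode_info):
        tag, slot = self.split(addr)
        if tag in self.groups:
            self.groups.move_to_end(tag)            # assumption: reuse matching tag
        else:
            if len(self.groups) >= self.num_tags:   # no empty tag item left
                self.groups.popitem(last=False)     # evict least recently used
            self.groups[tag] = [(False, None)] * GROUP_SIZE  # all valid bits clear
        self.groups[tag][slot] = (True, decode_info)         # set this slot's valid bit
```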

When the execution unit 125 detects a branch misprediction, it notifies the recovery unit 132 through the branch prediction error signal. The execution unit 125 delivers the branch address together with this signal. The recovery unit 132 searches the tag memory 141 for the tag item corresponding to that address. If a corresponding tag item exists, the recovery unit 132 identifies the instruction group mapped to it in the instruction group memory. It then provides the decode information of that group's instructions to the execution unit 125. If the recovery unit 132 cannot find a tag item corresponding to the branch address in the tag memory 141, the processor is controlled to initialize the pipeline.
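The recovery-unit lookup can be sketched as a function over a tag-indexed group memory. Two choices here are assumptions, since the text says only that the group's decode information is provided. The first is starting at the branch target's own slot within the group. The second is stopping at the first entry whose valid bit is clear. Returning `None` stands for the case where the pipeline must be initialized. The 4-byte instruction size is also assumed.

```python
GROUP_SIZE = 8
INSTR_BYTES = 4   # assumption: fixed-width instructions

def recover(tag_memory, branch_addr):
    """Recovery-unit lookup (sketch).

    tag_memory maps tag -> list of GROUP_SIZE (valid, decode_info) slots.
    Returns the decode info to hand to the execution unit, or None when
    nothing usable is cached, in which case the pipeline is initialized.
    """
    tag = branch_addr // (INSTR_BYTES * GROUP_SIZE)
    slot = (branch_addr // INSTR_BYTES) % GROUP_SIZE
    group = tag_memory.get(tag)
    if group is None:
        return None                   # no matching tag item
    infos = []
    for valid, info in group[slot:]:
        if not valid:
            break                     # later instructions were never decoded
        infos.append(info)
    return infos or None              # an invalid target slot also forces a flush
```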

FIG. 4 is a flowchart illustrating an embodiment of a method of operating a processor according to the present invention.

Referring to FIG. 4, a method of operating a processor according to the present invention may include a branch prediction step (S410), an instruction storing step (S420), a decoding step (S430), and an execution step (S440). These steps operate in a pipelined manner; that is, they can proceed in parallel.

The branch prediction step (S410) outputs and analyzes the current instruction read from the instruction cache. If the current instruction is a branch instruction, the step performs branch prediction. According to the prediction result, it outputs the next instruction from either the branch address of the current instruction or the address following the current instruction. In the processor embodiment described with reference to FIG. 2, this step can be understood as the operations performed by the fetch unit 121 and the branch predictor 122.

Next, the instruction storing step (S420) stores the instructions output from the branch prediction step (S410) in the instruction queue. In the processor embodiment described with reference to FIG. 2, it can be understood as the operation performed by the instruction queue 123.

Next, the decoding step (S430) decodes an instruction delivered from the instruction queue and outputs its address and decode information. First the instruction is decoded (S431). The address and decode information of the decoded instruction are then stored in the branch instruction execution cache (S432) and output to the execution step (S440). In the processor embodiment described with reference to FIG. 2, this step can be understood as the operations performed by the decoding unit 124 and the branch instruction execution cache 130.
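The ordering within the decoding step (decode in S431, store in S432, then forward to S440) can be written as a small function. This is a minimal sketch. The `decoder`, `biec` (a dict standing in for the branch instruction execution cache), and `execute` arguments are hypothetical stand-ins supplied by the caller.

```python
def decoding_step(address, raw_instr, decoder, biec, execute):
    """Decoding step S430 (sketch): decode, save to the cache, forward to execution."""
    info = decoder(raw_instr)    # S431: decode the instruction
    biec[address] = info         # S432: save address and decode information
    execute(address, info)       # hand the same result to the execution step S440
    return info
```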

Finally, in the execution step (S440), the decode information delivered by the decoding step (S430) is used to determine whether a branch misprediction has occurred (S441). If no misprediction has occurred, the operation based on the decode information is performed. If a misprediction is detected, the next check is whether the branch instruction execution cache, filled during the decoding step (S430), holds decode information for the instruction group corresponding to the branch address (S443). If it does, that decode information is read from the cache and the operation based on it is performed (S442). If it does not, the pipeline initialization process (S444) is performed.
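The branching in the execution step can be sketched as follows. This is a hedged model, not the patent's implementation. `biec` is a hypothetical dict mapping a branch address to a list of cached decode information. `execute_op` is a hypothetical callback that runs one operation. The string return values simply label which path of the flowchart was taken.

```python
def execution_step(decode_info, mispredicted, branch_addr, biec, execute_op):
    """Execution step S440 following the FIG. 4 flowchart (sketch)."""
    if not mispredicted:                  # S441: no misprediction detected
        execute_op(decode_info)
        return "executed"
    cached = biec.get(branch_addr)        # S443: is the correct path cached?
    if cached:
        for info in cached:               # S442: execute from the cache, no flush
            execute_op(info)
        return "recovered from cache"
    return "pipeline initialized"         # S444: fall back to pipeline initialization
```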

Although the present invention has been described with reference to preferred embodiments, those skilled in the art will understand that various modifications and changes can be made without departing from the spirit and scope of the present invention as set forth in the following claims.

100: processor 110: processor core
120: instruction cache 121: fetch unit
122: branch prediction unit 123: instruction queue
124: decoding unit 125: execution unit
126, 128: decode information 127: branch prediction error signal
130: branch instruction execution cache
131: saving unit 132: recovery unit
140: memory unit 141: tag memory
142: tag item 143: instruction group memory
144: instruction group
145-1, ..., 145-N: instruction decode information
146-1, ..., 146-N: valid bits

Claims

A processor comprising:
a fetch unit that reads a current instruction from an instruction cache;
a branch predictor that receives and outputs the current instruction and, if the current instruction is a branch instruction, performs branch prediction and controls the fetch unit to output the next instruction from either the branch address of the current instruction or the address following the address of the current instruction, according to the branch prediction result;
an instruction queue that stores instructions output from the branch predictor;
a decoding unit that decodes an instruction transmitted from the instruction queue and outputs the address and decode information of the transmitted instruction;
an execution unit that performs an operation corresponding to the decode information based on the address and decode information of the instruction output from the decoding unit; and
a branch instruction execution cache that stores the address and decode information of the instruction output from the decoding unit and, if a branch prediction error is determined by the execution unit, provides at least some of the stored decode information to the execution unit to recover from the branch prediction error.

The processor according to claim 1,
wherein the fetch unit, the branch predictor, the instruction queue, the decoding unit, and the execution unit operate in a pipelined manner.

The processor according to claim 2,
wherein pipeline initialization is performed if a branch prediction error is determined in the execution unit and the branch instruction execution cache fails to provide at least some of the stored decode information to the execution unit.

The processor according to claim 1,
wherein the fetch unit reads the next instruction from the branch address of the current instruction when the branch predictor predicts that the branch of the current instruction will be taken, and reads the next instruction from the address following the address of the current instruction when the branch predictor predicts that the branch will not be taken.

The processor according to claim 1,
wherein the branch instruction execution cache stores decode information of at least some of the instructions following a branch instruction.

The processor according to claim 1,
wherein the branch instruction execution cache stores decode information of at least some of the instructions located after the branch address of a branch instruction.

The processor according to claim 1,
wherein the branch instruction execution cache comprises:
a saving unit that receives the address and decode information of the decoded instruction from the decoding unit of the processor;
a memory unit that receives the address and decode information of the decoded instruction from the saving unit and stores them; and
a recovery unit that receives a branch prediction error signal from the execution unit and provides the decode information stored in the memory unit to the execution unit.

The processor according to claim 7,
wherein the memory unit comprises:
a tag memory storing at least one tag item specified by at least part of the address of the decoded instruction; and
an instruction group memory composed of instruction groups each specified one-to-one by a tag item,
wherein each instruction group stores decode information for at least one instruction.

The processor according to claim 8,
wherein the saving unit stores at least part of the address of the instruction output from the decoding unit in a tag item selected in the tag memory based on that address, and stores the decode information of the output instruction in the instruction group of the instruction group memory specified by the selected tag item.

The processor according to claim 8,
wherein the recovery unit receives the branch prediction error signal and the branch address from the execution unit, reads the instruction decode information belonging to the instruction group of the instruction group memory specified by the tag item selected in the tag memory by referring to the branch address, and transfers it to the execution unit.

A branch instruction execution cache applied to a processor having a pipeline structure, comprising:
a saving unit that receives the address and decode information of the decoded instruction from the decoding unit of the processor;
a memory unit that receives the address and decode information of the decoded instruction from the saving unit and stores them; and
a recovery unit that receives a branch prediction error signal from the execution unit of the processor and provides the decode information stored in the memory unit to the execution unit.

The branch instruction execution cache according to claim 11,
wherein the memory unit comprises:
a tag memory storing at least one tag item specified by at least part of the address of the decoded instruction; and
an instruction group memory composed of instruction groups each specified one-to-one by a tag item,
wherein each instruction group stores decode information for at least one instruction.

The branch instruction execution cache according to claim 12,
wherein the saving unit stores at least part of the address of the instruction output from the decoding unit in a tag item selected in the tag memory based on that address, and stores the decode information of the output instruction in the instruction group of the instruction group memory specified by the selected tag item.

The branch instruction execution cache according to claim 12,
wherein the recovery unit receives the branch prediction error signal and the branch address from the execution unit, reads the instruction decode information belonging to the instruction group of the instruction group memory specified by the tag item selected in the tag memory by referring to the branch address, and transfers it to the execution unit.

The branch instruction execution cache according to claim 12,
wherein, if the recovery unit does not provide the decode information stored in the memory unit to the execution unit in response to the branch prediction error signal input from the execution unit, the branch instruction execution cache causes pipeline initialization of the processor to be performed.

A method of operating a processor, comprising:
a branch prediction step of outputting and analyzing a current instruction read from an instruction cache, performing branch prediction if the current instruction is a branch instruction, and outputting the next instruction from either the branch address of the current instruction or the address following the address of the current instruction according to the branch prediction result;
an instruction storing step of storing the instruction output from the branch prediction step in an instruction queue;
a decoding step of decoding an instruction transmitted from the instruction queue and outputting the address and decode information of the transmitted instruction; and
an execution step of performing an operation corresponding to the output instruction based on the address and decode information of the instruction output from the decoding step,
wherein the address and decode information of the instruction output in the decoding step are stored, and, if a branch prediction error is determined in the execution step, at least some of the stored decode information is provided to the execution step to overcome the branch prediction error.

The method according to claim 16,
wherein the branch prediction step, the instruction storing step, the decoding step, and the execution step operate in a pipelined manner.

The method according to claim 17,
wherein pipeline initialization is performed if a branch prediction error is determined in the execution step and at least some of the stored address and decode information are not provided to the execution step.