KR100861073B1

KR100861073B1 - Parallel processing processor architecture adapting adaptive pipeline

Info

Publication number: KR100861073B1
Application number: KR1020070007118A
Authority: KR
Inventors: 조경록; 이승숙; 이제훈
Original assignee: 충북대학교 산학협력단
Priority date: 2007-01-23
Filing date: 2007-01-23
Publication date: 2008-10-01
Also published as: KR20080069423A

Abstract

Disclosed are an asynchronous pipeline structure in which the positions of the pipeline stages of a hardware-implemented processor can be changed adaptively according to the instruction being executed; an asynchronous control method that partitions the datapath of the whole system by instruction type and executes different instructions simultaneously, so that several kinds of instructions can be processed in parallel without additional hardware; and a processor architecture that applies this pipeline structure and control method to eliminate unnecessary operations and increase parallelism. Applying the asynchronous adaptive pipeline and the parallel processing scheme to a system eliminates unnecessary operations and allows several kinds of instructions to be processed in parallel without additional hardware. As a result, instruction processing speed can be increased relative to conventional processors, and power consumption can be reduced because no hardware is added.

Description

Parallel processing processor architecture adapting adaptive pipeline

FIG. 1 is a structural diagram of an adaptive pipeline employing stage skipping and stage combining.

FIG. 2A shows an example of adaptive pipeline operation illustrating stage skipping.

FIG. 2B shows an example of adaptive pipeline operation illustrating stage combining.

FIG. 3A is a block diagram showing how data is processed in a conventional pipeline structure.

FIG. 3B is a block diagram showing the pipeline structure based on datapath separation according to the present invention.

FIG. 4 is a diagram of the pipeline and parallel processing structure applied to a processor.

FIG. 5 is a diagram of a processor architecture to which the adaptive pipeline structure and datapath separation are applied.

FIG. 6 is a table showing the execution datapath of each instruction of an ARM processor according to the present invention.

<Explanation of reference numerals for the main parts of the drawings>

110a to 110c: latch controllers
120a, 120b: pipeline stages

121: IF stage
130: ID stage

140: EX stage

The present invention relates to a parallel processing processor architecture employing an adaptive pipeline, and more particularly to one that applies an asynchronous adaptive pipeline and a parallel processing scheme to a system so as to eliminate unnecessary operations and to process several kinds of instructions in parallel without additional hardware.

In general, when synchronous and asynchronous designs are compared at the level of a system that uses a processor, a synchronous system drives the entire system from the system clock, whereas an asynchronous system uses data-dependent control and, following the flow of data, selectively drives only the modules that need to operate.

In an asynchronous system, modules that do not need to operate consume no energy while idle. Because of this low power consumption, asynchronous design is regarded as one of the design approaches for sub-micron systems.

A pipeline means that the movement of instructions into the processor, or the arithmetic steps the processor takes to carry out those instructions, proceeds continuously and partly overlaps. Without a pipeline, a processor fetches the first instruction from memory, completes the operation it requires, then fetches the next instruction from memory and completes its operation, and so on. Instructions are therefore executed strictly one at a time, each waiting for the one before it to finish, and the processor's computation speed is inevitably low.

A pipelined processor, by contrast, can fetch the next instruction while it performs the arithmetic operation of the previous one, and holds the fetched instruction in a buffer near the processor until its operation can be performed. Instruction fetch proceeds continuously. As a result, the number of instructions that can be executed in a given time increases. In other words, a pipeline has the advantage of letting the processor's operations for successive instructions proceed concurrently.
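The throughput gain described above can be illustrated with the usual idealized cycle count. The sketch below assumes that every stage takes exactly one cycle and that there are no stalls; neither assumption comes from this document, and the function names are hypothetical.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when each instruction finishes all stages before the next starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Cycles needed when a new instruction enters the pipeline every cycle.

    The first instruction needs n_stages cycles; each later instruction
    completes one cycle after its predecessor (idealized, no stalls).
    """
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)
```

Under these assumptions, ten instructions on a three-stage pipeline take 12 cycles instead of 30.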

A conventional pipeline divides the processor's execution into a fixed number of stages. As the processor executes instructions, a given instruction may have stages it does not need to execute, and it may use neighboring stages one after another. Once a conventional structure is applied to a processor and fabricated as a chip, the pipeline's operation cannot be modified. Moreover, because a conventional synchronous pipeline has a fixed length and fixed stage positions, it cannot easily accommodate variable operation.

In addition, processors that use conventional instruction-level parallelism, such as VLIW (Very Long Instruction Word) and superscalar architectures, process several instructions at once, but they do so by increasing the number of execution units to match the number of instructions executed simultaneously, which increases the processor's hardware size and power consumption.

The present invention was devised to solve these problems, and allows the positions of the pipeline stages to be changed flexibly even when the design is implemented as hardware IP. The datapath required by an instruction in execution can thus differ by instruction type, and an execution datapath optimized for each instruction can be defined. In addition, the whole system is divided into several datapaths according to instruction type, and instructions with different datapaths are executed simultaneously, which enables parallel instruction processing without increasing hardware size.

An object of the present invention is to present a pipeline structure that can be applied variably according to instruction type, together with a datapath control method, and to provide a parallel processing processor architecture that applies them.

To achieve this object, the present invention includes:

pipeline stage latches and controllers (110a, 110b, 110c) that select, under a control signal, whether to store data, so that according to the datapath required by the instruction being executed each latch either stores its input or becomes transparent and passes the data to the next stage;

pipeline stages (120a, 120b) that, according to the datapath required by the instruction being executed, selectively either perform their operation or pass the data directly to the next-stage latch without performing it;

an IF stage (121) that performs predecoding to distinguish whether an input instruction is an ARM instruction or a Thumb instruction;

an ID stage (130) connected to the output side of the IF stage (121) and composed of a third stage (130a) including a Thumb decoder (D1) for decoding Thumb instructions, a fourth stage (130b) including an ARM decoder (D2) for decoding ARM instructions, and a fifth stage (130c) including a decoder (D3) for separating and outputting the decoded Thumb and ARM instructions;

a seventh stage (140b) that receives the instructions decoded in the ID stage (130) simultaneously, instruction by instruction, and performs the execution operation corresponding to each instruction; and

an EX stage (140) composed of a sixth stage and an eighth stage (140a, 140c) used, through the join operation at the seventh stage (140b), for storing to memory or a register file.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a structural diagram of an adaptive pipeline employing stage skipping and stage combining, FIG. 2A shows an example of adaptive pipeline operation illustrating stage skipping, and FIG. 2B shows an example of adaptive pipeline operation illustrating stage combining. FIG. 3A is a block diagram showing how data is processed in a conventional pipeline structure, and FIG. 3B is a block diagram showing the pipeline structure based on datapath separation according to the present invention.

The asynchronous adaptive pipeline according to the present invention applies a stage skipping method and a stage combining method. In general, a processor includes bubble stages, which need not execute, in order to keep its operation regular. Processor execution time can be reduced in two ways: by skipping the bubble-stage operations contained in each instruction's execution, or, when the next stage to execute is empty, by merging two stages so that the data produced by the current stage is not latched but sent immediately to the next stage.

As shown in FIG. 1, the pipeline structure employing stage skipping and stage combining provides control signals EC_N and EL_N to the pipeline latch controllers (110a to 110c) and the pipeline stages (120a, 120b). EC_N, the stage skipping control signal, determines whether the corresponding stage performs its operation or passes data straight through. When EC_N is high, the Nth stage does not operate and the data is passed to the next latch; when it is low, the Nth stage operates normally. EL_N, the stage combining control signal, determines whether the latch operates normally or behaves as if it were not there. When EL_N is high, the Nth latch behaves as if it did not exist; when it is low, it performs normal latch operation.
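The effect of the two control signals on a single data item can be sketched as a small functional model. This is only an illustration under stated assumptions: stages are modeled as plain Python functions, indices are 0-based, and latch `n` is taken to be the latch that follows stage `n`. The name `run_pipeline` and the returned bookkeeping lists are hypothetical and do not appear in this document.

```python
from typing import Callable, List, Sequence, Tuple

Stage = Callable[[int], int]


def run_pipeline(
    value: int,
    stages: Sequence[Stage],
    ec: Sequence[bool],
    el: Sequence[bool],
) -> Tuple[int, List[int], List[int]]:
    """Push one data item through the pipeline.

    ec[n] high: stage n is skipped and its input passes through unchanged.
    el[n] high: the latch after stage n is transparent and stores nothing.
    Returns (result, indices of stages that ran, indices of latches that stored).
    """
    executed: List[int] = []
    stored: List[int] = []
    for n, stage in enumerate(stages):
        if not ec[n]:          # EC low: the stage operates normally
            value = stage(value)
            executed.append(n)
        if not el[n]:          # EL low: normal latch operation
            stored.append(n)
    return value, executed, stored
```

For example, with stages `x + 1` and `x * 2` and an input of 3, setting `ec = [True, False]` skips the first stage and yields 6, while setting `el = [True, False]` leaves the result at 8 but removes the first latch from the path.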

In the examples of stage skipping and stage combining shown in FIGS. 2A and 2B, after decoding an instruction in the instruction decode (ID) stage, the processor identifies the bubble stages that the instruction does not need and the stages that are currently available. That is, the ID stage passes the micro-instruction together with the EL_N and EC_N signals to the next stage.

In stage skipping mode, as shown in FIG. 2A, each stage includes a MUX (not shown) at its front in order to eliminate bubble-stage operations. The EC1 signal is input to the MUX and determines whether the operation of the first stage (120a) is skipped. The combinational circuit of the first stage (120a) operates when EC1 is low. Conversely, when the signal is high, the combinational circuit performs no computation and passes its input directly to the next stage.

In stage combining mode, as shown in FIG. 2B, two or more consecutive pipeline stages (120a, 120b) are merged into a single stage. The latch and controller (110b) between the first stage (120a) and the second stage (120b) becomes transparent when the EL1 signal is high, and in that state it does not store data. For the latch and controller (110b) to become transparent, the second stage (120b) must have finished all of its computation and be ready for the next one. Conversely, when EL1 is low, the latch operates normally. The EL_N signals for the first and second stages (120a, 120b) are generated in the ID stage and passed from stage to stage until they reach their destination latch.
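The time saved by the two modes can be estimated with a simple additive delay model. The sketch below is hypothetical: the delay values, the single `latch_delay` handshake cost, and the assumption that a skipped stage costs nothing are not taken from this document. It does encode the condition stated above, that a latch only becomes transparent when its EL signal is high and the following stage is idle.

```python
from typing import Sequence


def path_latency(
    stage_delay: Sequence[int],
    latch_delay: int,
    ec: Sequence[bool],
    el: Sequence[bool],
    next_idle: Sequence[bool],
) -> int:
    """Estimate the latency of one instruction through the pipeline.

    stage_delay[n]: delay of stage n when it operates (arbitrary time units).
    latch_delay:    cost of storing data in a latch (assumed equal for all latches).
    ec[n]:          high means stage n is skipped.
    el[n]:          high requests that the latch after stage n be transparent.
    next_idle[n]:   True when the stage after latch n has finished and is ready.
    """
    total = 0
    for n, delay in enumerate(stage_delay):
        if not ec[n]:
            total += delay
        # The latch is bypassed only if requested AND the next stage is idle.
        transparent = el[n] and next_idle[n]
        if not transparent:
            total += latch_delay
    return total
```

With stage delays of 4 and 6 units and a latch cost of 1 unit, the normal path costs 12 units, skipping the first stage costs 8, and combining the two stages costs 11 provided the second stage is idle.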

Processors with a conventional pipeline adopt a structure in which each stage performs one task. That is, as shown in FIG. 3A, as many instructions as there are pipeline stages perform equal amounts of work in parallel. Some processors, such as superscalar and VLIW processors, also duplicate functional blocks for parallel processing. The duplicated functional blocks can execute the same operation simultaneously, as many times as there are datapaths. Such processors, however, necessarily incur an increase in hardware size. In practice, not all instructions pass through the same datapath.

In an asynchronous processor, parallel processing is possible with simple handshaking control. If all instructions are grouped into a few instruction groups with different datapaths, and each stage is built from separate datapaths according to instruction type, then instructions from different groups can execute simultaneously. As in the present invention, executing mutually independent instructions simultaneously without duplicating functional blocks, as shown in FIG. 3B, greatly improves processor performance. An asynchronous design method is better suited to such a design than a synchronous one. With this method, when the instructions being executed have different datapaths, they are processed in parallel through the separate datapaths. This control method divides the whole system into several datapaths according to instruction type.
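The grouping idea can be sketched as a greedy, in-order packing of instructions into sets that use distinct datapaths. The instruction classes and the `DATAPATH_OF` mapping below are hypothetical placeholders rather than the contents of FIG. 6, and the sketch deliberately ignores data and control dependencies, which a real issue mechanism would also have to check.

```python
from typing import Dict, List, Sequence, Set

# Hypothetical mapping from instruction class to the datapath it occupies.
DATAPATH_OF: Dict[str, str] = {
    "data_processing": "alu",
    "multiply": "mul",
    "load_store": "mem",
    "branch": "branch",
}


def pack_parallel(instr_types: Sequence[str]) -> List[List[str]]:
    """Group consecutive instructions that occupy different datapaths.

    Instructions stay in program order. A new group starts as soon as an
    instruction needs a datapath already taken in the current group.
    """
    groups: List[List[str]] = []
    current: List[str] = []
    used: Set[str] = set()
    for kind in instr_types:
        path = DATAPATH_OF[kind]
        if path in used:           # datapath conflict: close the current group
            groups.append(current)
            current, used = [], set()
        current.append(kind)
        used.add(path)
    if current:
        groups.append(current)
    return groups
```

Each inner list represents instructions that could occupy the separated datapaths at the same time without any functional block being duplicated.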

FIG. 4 is a diagram of the pipeline and parallel processing structure applied to a processor, FIG. 5 is a diagram of a processor architecture to which the adaptive pipeline structure and datapath separation are applied, and FIG. 6 is a table showing the execution datapath of each instruction of an ARM processor according to the present invention.

In the pipeline and parallel processing structure applied to the processor, as shown in FIG. 4, datapaths are distinguished by instruction type, and instructions with distinct datapaths are executed in parallel. Instructions decoded in the ID stage (130) are delivered simultaneously, by a fork operation, to the corresponding EX stage (140) for each instruction. After the EX stage (140) completes its operation, a join operation is performed for storing to memory or the register file.
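The fork and join steps can be mimicked in software with a thread pool. This is an analogy only: threads stand in for the asynchronous handshake hardware, and the two operations (`add`, `load`), the tuple layout of a decoded instruction, and the function names are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Decoded = Tuple[str, str, Tuple[int, ...]]  # (destination register, op, operands)


def execute(op: str, args: Tuple[int, ...], memory: Dict[int, int]) -> int:
    """Stand-in for one execution datapath."""
    if op == "add":
        return args[0] + args[1]
    if op == "load":
        return memory[args[0]]
    raise ValueError(f"unknown op {op!r}")


def fork_join(
    decoded: List[Decoded],
    registers: Dict[str, int],
    memory: Dict[int, int],
) -> Dict[str, int]:
    """Fork decoded instructions to separate workers, then join before write-back."""
    with ThreadPoolExecutor(max_workers=max(1, len(decoded))) as pool:
        # Fork: every decoded instruction is handed to its own worker.
        futures = {
            dest: pool.submit(execute, op, args, memory)
            for dest, op, args in decoded
        }
        # Join: wait for all results before touching the register file.
        results = {dest: fut.result() for dest, fut in futures.items()}
    registers.update(results)
    return registers
```

Write-back happens only after every forked operation has returned, which corresponds to the join that precedes storage to memory or the register file.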

As shown in FIG. 5, the processor architecture with the adaptive pipeline structure and datapath separation according to the present invention includes an IF stage (121) that performs predecoding to distinguish whether an input instruction is a 32-bit ARM instruction or a Thumb instruction reduced to 16 bits. Connected to the output side of the IF stage (121) are the three stages that make up the ID stage (130): a third stage (130a) including a Thumb decoder (D1) for decoding Thumb instructions, a fourth stage (130b) including an ARM decoder (D2) for decoding ARM instructions, and a fifth stage (130c) including a decoder (D3) for separating and outputting the decoded Thumb and ARM instructions.

Instructions decoded in the ID stage (130), which consists of the third to fifth stages (130a to 130c), are delivered simultaneously, by a fork operation, to the corresponding EX stage (140) for each instruction. If the instruction input to the IF stage (121) is a Thumb instruction and the fourth stage (130b) has finished its work and is empty, the EL signal goes high and the Thumb decoder (D1) and the fourth stage (130b) are combined. If the instruction is an ARM instruction, the EC signal goes high and the operation of the Thumb decoder (D1) is skipped, shortening the datapath.
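The decode-stage rule in this paragraph reduces to two Boolean expressions. The sketch below takes the ARM/Thumb distinction as an input, since the predecoding mechanism itself is not detailed here. The function name, the dictionary keys, and the choice to hold EL low for ARM instructions are assumptions made for illustration.

```python
from typing import Dict


def id_control(is_thumb: bool, fourth_stage_idle: bool) -> Dict[str, bool]:
    """Derive the control signals for the Thumb decoder stage (D1).

    EC high: the Thumb decoder is skipped (ARM instruction).
    EL high: the Thumb decoder is combined with the fourth stage
             (Thumb instruction while the fourth stage is idle).
    """
    ec = not is_thumb
    el = is_thumb and fourth_stage_idle
    return {"EC": ec, "EL": el}
```

A Thumb instruction that arrives while the fourth stage is still busy gets neither signal, so the stages operate with normal latching.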

The EX stage (140), composed of the sixth to eighth stages (140a to 140c), is connected to the output side of the ID stage (130). In the EX stage (140), the instructions decoded in the ID stage (130) are delivered simultaneously, by a fork operation, to the seventh stage (140b) for each instruction. After the execution operation for the delivered instruction completes, the seventh stage (140b) performs a join operation for storing to memory or the register file in the sixth and eighth stages (140a, 140c).

In the processor according to the present invention, taking the barrel shifter and the ALU connected to it as an example, when an instruction that uses the shifter but not the ALU is to be executed, the EL signal goes high and the two stages are combined. When the shifter is not used, the EC signal goes high and stage skipping is performed. The EX stage is partitioned by instruction type, and instructions with distinct datapaths are executed in parallel.

As shown in FIG. 6, which lists the execution datapath of each instruction of the ARM processor according to the present invention, instructions with different datapaths are executed in parallel when there are no data or control dependencies between them, which raises the processor's instruction throughput.
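The condition for parallel execution stated here, different datapaths and no data or control dependency, can be written as a small predicate. The `Instr` record, its field names, and the conservative treatment of branches are hypothetical; the sketch uses the standard read-after-write, write-after-read, and write-after-write hazard checks rather than any mechanism specified in this document.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Instr:
    datapath: str               # execution datapath the instruction occupies
    reads: FrozenSet[str]       # source registers
    writes: FrozenSet[str]      # destination registers
    is_branch: bool = False     # a branch creates a control dependency


def can_run_in_parallel(first: Instr, second: Instr) -> bool:
    """Return True if `second` may execute alongside the earlier `first`."""
    if first.datapath == second.datapath:
        return False            # same datapath: no separate hardware available
    if first.is_branch:
        return False            # control dependency on the earlier branch
    raw = first.writes & second.reads     # read after write
    war = first.reads & second.writes     # write after read
    waw = first.writes & second.writes    # write after write
    return not (raw or war or waw)
```

For instance, an ALU instruction writing `r0` can run alongside a load that touches only `r3` and `r4`, but not alongside a load whose address comes from `r0`.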

As described above, applying the asynchronous adaptive pipeline and the parallel processing scheme according to the present invention to a system eliminates unnecessary operations and allows several kinds of instructions to be processed in parallel without additional hardware. As a result, instruction processing speed can be increased relative to conventional processors, and power consumption can be reduced because no hardware is added.

Although preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, the present invention is not limited to them, and those skilled in the art may improve or modify it within the scope of its technical idea.

Claims

A parallel processing processor architecture employing an adaptive pipeline, comprising:

pipeline stage latches and controllers (110a, 110b, 110c) that select, under a control signal, whether to store data, so that according to the datapath required by the instruction being executed each latch either stores its input or becomes transparent and passes the data to the next stage;

first and second stages (120a, 120b) that, according to the datapath required by the instruction being executed, selectively either perform their operation or pass the data directly to the next-stage latch without performing it;

an IF stage (121) that performs predecoding to distinguish whether an input instruction is an ARM instruction or a Thumb instruction;

an ID stage (130) connected to the output side of the IF stage (121) and composed of a third stage (130a) including a Thumb decoder (D1) for decoding Thumb instructions, a fourth stage (130b) including an ARM decoder (D2) for decoding ARM instructions, and a fifth stage (130c) including a decoder (D3) for separating and outputting the decoded Thumb and ARM instructions;

a seventh stage (140b) that receives the instructions decoded in the ID stage (130) simultaneously, instruction by instruction, and performs the execution operation corresponding to each instruction; and

an EX stage (140) composed of a sixth stage and an eighth stage (140a, 140c) used, through the join operation at the seventh stage (140b), for storing to memory or a register file.