KR100632317B1

KR100632317B1 - Method and system for buffering instructions in processor

Info

Publication number: KR100632317B1
Application number: KR1019990018393A
Authority: KR
Inventors: 수바쉬챈더지.; 밀라트디파크(엔엠아이)
Original assignee: Texas Instruments Incorporated
Priority date: 1998-05-21
Filing date: 1999-05-21
Publication date: 2006-10-11
Also published as: KR19990088464A

Abstract

According to one embodiment of the invention, a method of buffering instructions in a processor having a pipeline that includes a decode stage comprises detecting a stall of the decode stage, reissuing a prefetch for an instruction in memory until the stall of the decode stage is released, and writing the fetched instruction into an instruction buffer after the stall of the decode stage is released.

Pipeline, decode stage, instruction buffer, program counter, memory location

Description

METHOD AND SYSTEM FOR BUFFERING INSTRUCTIONS IN A PROCESSOR

FIG. 1 is a block diagram of a computer system according to the teachings of the present invention.

FIG. 2 is a timing diagram showing the location of instructions obtained from the program memory of the computer system of FIG. 1 with respect to the various stages of the pipeline shown in FIG. 1, for an example in which the pipeline does not stall.

FIG. 3 is a timing diagram showing the location of instructions obtained from the program memory of the computer system of FIG. 1 with respect to the various stages of the pipeline shown in FIG. 1, for an example in which the pipeline stalls intermittently.

FIG. 4 is a block diagram showing additional details of the prefetch stage and the fetch stage of the processor shown in FIG. 1.

Description of reference numerals for the main parts of the drawings

10: computer system

12: processor

14: memory system

16: program memory

20: pipeline

24: prefetch stage

26: fetch stage

28: decode stage

30: read stage

32: execute stage

34: store stage

The present invention relates generally to processors and, more particularly, to a method and system for buffering instructions in a processor.

To achieve greater efficiency, most processors today use a pipeline within the processor. With a pipeline, a task is divided into several sequential subtasks. Dividing a task into sequential subtasks allows many program instructions to be fetched, decoded, and executed at a given time, so that at any particular time several instructions may be in process at different stages of the pipeline. Many such processors include a pipeline with a decode stage. In the decode stage, an instruction obtained from program memory is decoded so that it can be executed. Once an instruction has been decoded, it no longer needs to be stored in the processor. Until it is decoded, however, an instruction obtained from program memory must be stored. To store instructions until they are decoded, most processors use an instruction buffer.

A conventional instruction buffer contains enough registers to store as many instructions as there are stages up to and including the decode stage. For example, if the first three stages of a pipeline are prefetch, fetch, and decode stages, the associated instruction buffer would have three registers to store three instructions. This number of registers has conventionally been used because it allows the instruction buffer to retain the instructions being obtained when the decode stage is determined to be stalled.

Using an instruction buffer allows processing to resume without loss of information, but it is not without problems. For example, as the size of common instruction fetches increases, each register in the instruction buffer must grow in size, which requires additional silicon area.

Accordingly, a need has arisen for an improved method and system for buffering instructions in a processor. The present invention provides a system and method for buffering instructions in a processor that address the shortcomings of prior systems and methods.

According to one embodiment of the invention, a method of buffering instructions in a processor having a pipeline with a decode stage includes detecting a stall of the decode stage, reissuing a prefetch for an instruction in memory until the stall of the decode stage is released, and then writing the fetched instruction into an instruction buffer after the stall of the decode stage is released.

According to another embodiment of the invention, a processor pipeline includes a plurality of sequential stages preceding a decode stage and a limited instruction buffer operable to store simultaneously a number of instructions less than or equal to the number of those sequential stages. The processor pipeline also includes a counter system. The counter system includes a counter for storing a count that specifies the address of a memory location storing an instruction to be received by the limited instruction buffer. The counter system is also operable to adjust the count of the counter based on the state of the decode stage. The plurality of sequential stages includes a fetch stage operable to fetch the instruction at the address specified by the count.

Embodiments of the invention provide numerous technical advantages. For example, in one embodiment of the invention, a limited instruction buffer may be used in place of a larger conventional instruction buffer to buffer instructions fetched from program memory. The smaller limited instruction buffer reduces silicon area, which allows a smaller device or allows that silicon area to be used in other areas of the processor.

Other technical features will be readily apparent to those skilled in the art from the accompanying drawings, description, and claims.

Embodiments of the invention and their advantages are best understood by referring to FIGS. 1 through 4, in which like reference numerals are used for like and corresponding parts of the various drawings.

FIG. 1 is a block diagram of a computer system according to the teachings of the present invention. Computer system 10 includes a processor 12 and a memory system 14. Processor 12 is operable to access memory system 14. Memory system 14 may include both a program memory 16 and a data memory 18. Processor 12 includes a pipeline 20. Pipeline 20 includes a prefetch stage 24, a fetch stage 26, a decode stage 28, a read stage 30, an execute stage 32, and a store stage 34. Processor 12 may also include additional processing elements 21.

Prefetch stage 24 determines the address of the memory location in program memory 16 from which an instruction is to be read. Fetch stage 26 reads the instruction at the program memory location determined by prefetch stage 24. Fetch stage 26 includes a limited instruction buffer 50 for buffering instructions fetched from program memory 16. Limited instruction buffer 50 is shown in FIG. 4. In this embodiment, limited instruction buffer 50 includes a first register 52 and a second register 54. According to the invention, limited instruction buffer 50 includes a number of registers equal to the number of stages in pipeline 20 that precede decode stage 28, which in this embodiment is two.

Decode stage 28 decodes the instruction obtained from program memory 16. Read stage 30 reads from data memory 18 any data required for execution of the instruction decoded by decode stage 28. Read stage 30 may be replaced by more than one stage. For example, read stage 30 may be replaced by one stage that performs the calculation needed to determine the location in data memory 18 from which data is to be read and a separate stage that reads that data. Execute stage 32 executes the instruction decoded by decode stage 28. Store stage 34 writes any data that may need to be written as a result of executing the instruction.

Using a limited instruction buffer having a number of registers equal to the number of stages preceding decode stage 28 reduces the silicon area required for processor 12, which is generally advantageous. According to the invention, a limited instruction buffer can be used without a resulting loss of processor performance by reissuing fetches for instructions in program memory 16, as described in greater detail below with reference to FIGS. 2 through 4.
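As a rough illustration of the saving, the storage in an instruction buffer can be approximated as the number of registers multiplied by the fetch width. The sketch below assumes that this bit count is an acceptable proxy for silicon area; the function names and the example fetch widths are hypothetical and do not come from the specification.

```python
def buffer_storage_bits(num_registers, fetch_width_bits):
    """Total register storage in an instruction buffer, in bits."""
    return num_registers * fetch_width_bits


def bits_saved(stages_through_decode, fetch_width_bits):
    """Storage saved by sizing the buffer to the stages *before* decode.

    A conventional buffer has one register per stage up to and including
    decode; the limited buffer has one register per stage preceding decode.
    """
    conventional = buffer_storage_bits(stages_through_decode, fetch_width_bits)
    limited = buffer_storage_bits(stages_through_decode - 1, fetch_width_bits)
    return conventional - limited


if __name__ == "__main__":
    # Hypothetical fetch widths; the pipeline of FIG. 1 has three stages
    # through decode (prefetch, fetch, decode).
    for width in (32, 64):
        print(width, bits_saved(3, width))
```

Under this approximation the saving is one register of storage, so it scales directly with the fetch width.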

FIG. 2 is a timing diagram showing the location of instructions obtained from program memory 16 with respect to the various stages of pipeline 20, for an example in which pipeline 20 does not stall. During normal processing, processor 12 continually fetches instructions from locations in program memory 16 having sequential addresses for execution. If processor 12 stalls for some reason, processor 12 continues to fetch instructions from program memory 16. For example, processor 12 may stall while waiting to receive data from data memory 18. In such a case, limited instruction buffer 50 can accumulate two instructions while waiting for processing to resume. The operation of prefetch stage 24, fetch stage 26, and decode stage 28 in conjunction with limited instruction buffer 50 is described below with reference to FIG. 2 for an example in which no stall occurs.

During the first clock cycle, the address of the memory location corresponding to a first instruction I₁ is calculated. During the second clock cycle, the address for a second instruction I₂ is calculated, and fetching of instruction I₁ is initiated. During the third clock cycle, the address of the memory location corresponding to a third instruction I₃ is calculated, fetching of instruction I₁ is completed, fetching of instruction I₂ is initiated, and, because processor 12 is not stalled, decoding of instruction I₁ is initiated and completed. During the third clock cycle, register 52 of limited instruction buffer 50 stores instruction I₁. Limited instruction buffer 50 includes a pointer 56 that designates the register within limited instruction buffer 50 that stores the current instruction to be decoded. Pointer 56 is shown in FIG. 4 pointing to register 52. During this third clock cycle, pointer 56 designates register 52. During the fourth clock cycle, the address for a fourth instruction I₄ is calculated, fetching of instruction I₂ is completed, fetching of instruction I₃ is initiated, and decoding of instruction I₂ is initiated and completed. During the fourth clock cycle, register 52 continues to store instruction I₁ because instruction I₁ has not yet been overwritten by another instruction, register 54 stores instruction I₂, and pointer 56 designates register 54.

During the fifth clock cycle, the address for a fifth instruction I₅ is calculated, fetching of instruction I₃ is completed, fetching of instruction I₄ is initiated, and decoding of instruction I₃ is initiated and completed. During the fifth clock cycle, instruction I₃ is stored in register 52, and pointer 56 of limited instruction buffer 50 designates register 52. During the sixth clock cycle, the address for a sixth instruction I₆ is calculated, fetching of instruction I₄ is completed, fetching of instruction I₅ is initiated, decoding of instruction I₄ is initiated and completed, and any data associated with instruction I₃ is read. During the sixth clock cycle, instruction I₄ is stored in register 54, and pointer 56 of limited instruction buffer 50 designates register 54.
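The no-stall sequence above follows a regular pattern that can be written in closed form. The following is a minimal sketch, not the patented implementation: the function name and the dictionary keys are hypothetical, instruction Iₙ is written as `"In"`, and the pointer is reported by the reference numeral of the register it designates.

```python
def no_stall_cycle(cycle):
    """Stage occupancy for one clock cycle (numbered from 1) with no stalls."""
    row = {"prefetch": f"I{cycle}"}           # address calculated this cycle
    if cycle >= 2:
        row["fetch_start"] = f"I{cycle - 1}"  # fetch initiated this cycle
    if cycle >= 3:
        done = f"I{cycle - 2}"
        row["fetch_done"] = done              # fetch completed this cycle
        row["decode"] = done                  # decoded in the same cycle
        # Registers 52 and 54 are filled alternately, starting with 52,
        # and pointer 56 designates the register just filled.
        row["pointer"] = 52 if (cycle - 3) % 2 == 0 else 54
    return row


if __name__ == "__main__":
    for c in range(1, 7):
        print(c, no_stall_cycle(c))
```

Because one instruction is decoded in every cycle, a register is always free by the time the next fetch completes, so two registers suffice in this case.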

In the sequence of obtaining and processing instructions described above, limited instruction buffer 50 stores each instruction to be decoded in one of the two registers 52 and 54. Because processor 12 does not stall in this sequence, limited instruction buffer 50 is sufficient to allow additional sequential instructions to be fetched continually without reissuing any fetch. The operation of processor 12 during intermittent stalls is described with reference to FIG. 3.

FIG. 3 is a timing diagram showing the location of instructions obtained from program memory 16 with respect to the various stages of pipeline 20, for an example in which pipeline 20 stalls intermittently. The operation of prefetch stage 24, fetch stage 26, decode stage 28, and limited instruction buffer 50 is described for this example. During the first clock cycle, the address for a first instruction I₁ is calculated. As shown in the last row of FIG. 3, processor 12 does not stall during this first clock cycle. During the second clock cycle, the address for a second instruction I₂ is calculated, and fetching of instruction I₁ is initiated. During the third clock cycle, the address for a third instruction I₃ is calculated, fetching of instruction I₁ is completed, fetching of instruction I₂ is initiated, and decoding of instruction I₁ is initiated. In this example, decode stage 28 stalls during the third clock cycle, so decoding of instruction I₁ is not completed during the third clock cycle. As used herein, a stall of decode stage 28 refers either to an actual stall of decode stage 28 or to a stall of another stage in pipeline 20 that prevents decode stage 28 from decoding the next instruction. Such a stall may occur, for example, because read stage 30 stalls during a memory read operation. During the third clock cycle, instruction I₁ is stored in register 52, and pointer 56 designates register 52.

During the fourth clock cycle, no additional address is calculated for a next instruction, because decode stage 28 was stalled in the preceding clock cycle. Fetching of instruction I₂ is completed, and fetching of instruction I₃ is initiated. Because decode stage 28 remains stalled, decoding of instruction I₁ continues but is not completed. During this fourth clock cycle, instruction I₁ remains in register 52, instruction I₂ is stored in register 54, and pointer 56 continues to designate register 52.

During the fifth clock cycle, no additional address is calculated for a further instruction, because decode stage 28 was stalled in the preceding clock cycle. In addition, because limited instruction buffer 50 is full, fetching of instruction I₃ cannot be completed, since instruction I₃ cannot be stored in instruction buffer 50. Instruction I₃ is therefore discarded, and fetching of instruction I₃ is initiated again during this fifth clock cycle. Because decode stage 28 is no longer stalled, decoding of instruction I₁ is completed during the fifth clock cycle. Rather than discarding instruction I₃ and reissuing the fetch for instruction I₃, a conventional processor would have included a third register in the instruction buffer to store instruction I₃. The present invention does not require such a register and therefore reduces the silicon area required. During the fifth clock cycle, register 52 stores instruction I₁, register 54 stores instruction I₂, and pointer 56 designates register 52.

During the sixth clock cycle, decode stage 28 is no longer stalled. Therefore, the address for a fourth instruction I₄ is calculated, fetching of instruction I₃ is completed, and decoding of instruction I₂ is initiated and completed. During this sixth clock cycle, instruction I₃ is stored in register 52, instruction I₂ is stored in register 54, and pointer 56 designates register 54.

Thus, in the example described with reference to FIG. 3, in which pipeline 20 stalls intermittently, the fetch of the instruction whose address was being calculated when pipeline 20 stalled is reissued continually until the stall is released. The present invention therefore avoids the need for an instruction buffer having a third register to store the instruction found at the address being calculated when the pipeline first stalls, which in the example above is instruction I₃. The need for such a register is avoided by reissuing the fetch of an instruction, such as instruction I₃, while pipeline 20 is stalled. An example of a physical implementation of the above method of processing instructions is described below with reference to FIG. 4.
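The discard-and-reissue behavior of the FIG. 3 example can be sketched as a small cycle-by-cycle model. This is a simplified sketch under stated assumptions, not the patented circuit: it models only fetch completion, the two registers, and the decode pointer, it treats the buffer as full when both registers hold undecoded instructions at the start of a cycle, and it abstracts away the overlap between the prefetch and fetch stages. The function name, the log format, and the index convention (0 for register 52, 1 for register 54) are hypothetical.

```python
def simulate(stalled_cycles, first_cycle=3, last_cycle=6):
    """Return (cycle, event, (reg52, reg54), pointer_index) per cycle.

    stalled_cycles: set of cycle numbers in which the decode stage is stalled.
    """
    regs = [None, None]   # index 0 models register 52, index 1 register 54
    pending = []          # register indices holding undecoded instructions
    write_idx = 0         # register that receives the next accepted fetch
    next_instr = 1        # instruction whose fetch completes next
    pointer = 0           # models pointer 56
    log = []
    for cycle in range(first_cycle, last_cycle + 1):
        if len(pending) == len(regs):
            # Buffer full: the fetched instruction is discarded and its
            # fetch is reissued, so next_instr is left unchanged.
            event = f"discard I{next_instr}, reissue fetch"
        else:
            regs[write_idx] = f"I{next_instr}"
            pending.append(write_idx)
            write_idx = (write_idx + 1) % len(regs)
            event = f"write I{next_instr}"
            next_instr += 1
        if pending:
            pointer = pending[0]               # oldest undecoded instruction
            if cycle not in stalled_cycles:
                pending.pop(0)                 # decode completes this cycle
        log.append((cycle, event, tuple(regs), pointer))
    return log


if __name__ == "__main__":
    for row in simulate({3, 4}):   # decode stalled in cycles 3 and 4
        print(row)
```

With the decode stage stalled in the third and fourth cycles, the model discards I3 in the fifth cycle and accepts the reissued fetch in the sixth, matching the register contents and pointer positions described above. With an empty stall set it reproduces the alternating register use of the no-stall example.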

FIG. 4 is a block diagram showing additional details of prefetch stage 24 and fetch stage 26 of processor 12. As shown, prefetch stage 24 includes a program counter 58 and a multiplexer 60. Program counter 58 holds a current count specifying the location in memory system 14 from which an instruction is to be received. Multiplexer 60 receives as input signals the previous count of program counter 58 and the previous count of program counter 58 incremented by one. Multiplexer 60 generates an output signal 62 that provides the updated count for program counter 58. Multiplexer 60 is controlled by a select signal 64 received from decode stage 28. When decode stage 28 is stalled, whether because decode stage 28 itself stalls or because a stage downstream of decode stage 28 in pipeline 20 stalls, select signal 64 selects the previous count of program counter 58; when decode stage 28 is not stalled, select signal 64 selects the previous count of program counter 58 incremented by one. Output signal 62 is then provided to program counter 58. In this manner, when decode stage 28 is stalled, the most recent instruction whose address was calculated by prefetch stage 24 is fetched repeatedly until the stall of decode stage 28 is released.
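The selection performed by multiplexer 60 can be sketched as a two-input choice driven by the stall state. This is an illustrative software model only, assuming the count advances by one per instruction as described; the function names are hypothetical.

```python
def next_program_count(previous_count, decode_stalled):
    """Model of multiplexer 60: the value loaded into program counter 58."""
    incremented = previous_count + 1   # second multiplexer input
    # Select signal 64: hold the count while decode is stalled,
    # otherwise advance to the next sequential address.
    return previous_count if decode_stalled else incremented


def run_counter(start, stall_pattern):
    """Apply the multiplexer once per cycle; return the sequence of counts."""
    counts = [start]
    for stalled in stall_pattern:
        counts.append(next_program_count(counts[-1], stalled))
    return counts


if __name__ == "__main__":
    # Starting at the address of I3 with decode stalled for two cycles,
    # the count holds twice and then advances to the address of I4.
    print(run_counter(3, [True, True, False]))   # [3, 3, 3, 4]
```

Holding the count is what causes the same fetch to be reissued; no separate retry logic is needed in this model.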

As described above, because the present invention allows a limited instruction buffer to be used in place of a larger conventional instruction buffer to buffer instructions fetched from program memory, it reduces silicon area, which allows a smaller device or allows that silicon area to be used in other areas of the processor.

Although the present invention and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A method of buffering instructions in a processor having a pipeline that includes a decode stage, the method comprising:

detecting a stall of the decode stage;

in response to detecting the stall of the decode stage, reissuing a prefetch for an instruction in memory until the stall of the decode stage is released; and

writing the fetched instruction into an instruction buffer after the stall of the decode stage is released.

2. The method of claim 1, further comprising determining whether the instruction buffer is full.

3. The method of claim 2, wherein reissuing the prefetch comprises reissuing a prefetch for an instruction stored in memory only when the instruction buffer is full.

4. The method of claim 1, wherein reissuing the prefetch comprises fetching an instruction stored at a memory location having an address specified by a count of a program counter.

5. The method of claim 4, wherein reissuing the prefetch comprises adjusting the count of the program counter to a count corresponding to the address of the memory location from which the instruction was prefetched.

6. The method of claim 5, wherein reissuing the prefetch comprises leaving the count of the program counter unchanged.

7. The method of claim 1, wherein the pipeline comprises a plurality of sequential stages preceding the decode stage, and wherein writing the fetched instruction into an instruction buffer comprises writing the fetched instruction into an instruction buffer operable to store simultaneously a number of instructions less than or equal to the number of sequential stages preceding the decode stage.

8. A processor pipeline comprising:

a plurality of sequential stages preceding a decode stage;

a limited instruction buffer operable to store simultaneously a number of instructions less than or equal to the number of sequential stages; and

a counter system comprising a counter for storing a count specifying an address of a memory location that stores an instruction to be received by the limited instruction buffer, the counter system operable to adjust the count of the counter based on a state of the decode stage;

wherein the plurality of sequential stages includes a fetch unit operable to fetch the instruction at the address specified by the count.

9. The processor pipeline of claim 8, wherein the counter system is operable to increment the count of the counter when the decode stage is not stalled and to leave the count of the counter unchanged when the decode stage is stalled.

10. The processor pipeline of claim 8, wherein the counter system is operable to decrement the count of the counter when the decode stage is not stalled and to leave the count of the counter unchanged when the decode stage is stalled.

11. The processor pipeline of claim 8, wherein the counter system further comprises a multiplexer operable to receive a control signal indicative of the state of the decode stage.

12. The processor pipeline of claim 11, wherein the multiplexer is operable to receive the count of the counter and an incremented count of the counter, and to generate, based on the control signal indicative of the state of the decode stage, an output signal representing one of the count of the counter and the incremented count of the counter.