KR20000010200A

KR20000010200A - Instruction decoder having reduced instruction decoding path

Info

Publication number: KR20000010200A
Application number: KR1019980030963A
Authority: KR
Inventors: 전성천; 유인선
Original assignee: 김영환; 현대전자산업 주식회사
Priority date: 1998-07-30
Filing date: 1998-07-30
Publication date: 2000-02-15

Abstract

PURPOSE: An instruction decoder is provided that shortens the instruction decoding path by removing the critical path in instruction decoding in a microprocessor operating at high speed. CONSTITUTION: The instruction decoder comprises a prefetch unit (210), an instruction queue unit (220), and a decoding unit (230). In response to a cache hit or cache miss signal output from a cache unit (200), the prefetch unit receives an instruction line output from the cache unit on a cache hit, or an instruction line accessed and input from external memory on a cache miss. From the received instruction lines, the prefetch unit checks whether the instructions to be executed in each pipeline can be executed in parallel, and predecodes the instructions. The instruction queue unit stores the instructions output from the prefetch unit. The decoding unit performs the final decoding of the instructions output for each pipeline from the instruction queue unit. The critical decoding path is thereby eliminated, enabling stable high-speed operation.

Description

Instruction decoding device with reduced instruction decoding path

The present invention relates to a microprocessor, and more particularly to an instruction decoding apparatus with a shortened instruction decoding path.

In general, a high-performance microprocessor has a superscalar structure with multiple pipelines in order to increase instruction processing throughput. That is, two or more instructions are executed simultaneously, each in its own pipeline consisting of a plurality of stages.

FIG. 1 shows the stages through which a typical superscalar high-performance microprocessor processes instructions: an instruction fetch stage that fetches from the cache, in advance, the instructions to be executed in the X pipeline and the Y pipeline; an instruction decoding stage that decodes the fetched instructions; an address generation stage that calculates the address of the data to be read from the data cache; an execution stage that executes the instructions; and a write-back stage that writes the results to memory or a register. The X pipeline and the Y pipeline thus process two instructions simultaneously at each stage.
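The five stages and the two pipelines described above can be pictured with a small timing sketch. It is an idealized illustration only: it assumes every consecutive pair of instructions issues together and that nothing stalls, and the stage abbreviations and function names (`schedule`, `stage_at`, `total_cycles`) are hypothetical, not taken from the patent.

```python
# Stage abbreviations are hypothetical labels for the five stages named above:
# fetch, decode, address generation, execute, write back.
STAGES = ["IF", "ID", "AG", "EX", "WB"]


def schedule(instructions):
    """Assign instructions pairwise to the X and Y pipelines.

    Idealized assumption: every consecutive pair issues together, no stalls.
    Returns a list of (name, pipeline, issue_cycle) tuples.
    """
    plan = []
    for i, name in enumerate(instructions):
        pipeline = "X" if i % 2 == 0 else "Y"
        plan.append((name, pipeline, i // 2))
    return plan


def stage_at(issue_cycle, cycle):
    """Return the stage occupied in `cycle`, or None if not in flight."""
    offset = cycle - issue_cycle
    return STAGES[offset] if 0 <= offset < len(STAGES) else None


def total_cycles(instructions):
    """Cycles needed to drain the idealized dual pipeline."""
    if not instructions:
        return 0
    last_issue = (len(instructions) - 1) // 2
    return last_issue + len(STAGES)


if __name__ == "__main__":
    program = ["i0", "i1", "i2", "i3"]
    for name, pipeline, issued in schedule(program):
        row = [stage_at(issued, c) or "--" for c in range(total_cycles(program))]
        print(f"{name} ({pipeline}): {' '.join(row)}")
```

Each printed row shows one instruction moving through the stages, with the X and Y instructions of a pair occupying the same stage in the same cycle.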

FIG. 2 is a block diagram of a conventional superscalar microprocessor, in which 100 denotes a cache unit, 110 an instruction queue unit, 120 an instruction decoder unit, 130 an execution unit, 140 a bus interface unit, 150 a multiplexer, 160 a data cache, and 170 a register file.

The conventional instruction processing method is described below with reference to FIG. 2.

To support a high instruction request bandwidth, the superscalar microprocessor configured as above reads instruction codes from the cache unit (100) or the external memory (10), stores them in the instruction queue unit (110), and sends them to the decoder unit (120). The parallelism check block (121) of the decoder unit (120) then checks whether two consecutive instructions can be executed in parallel (pairing). Depending on the result, the instructions are assigned to the decoders (122, 123) of the X pipeline and the Y pipeline, and it is decided whether the lower-priority Y pipeline proceeds. The decoders (122, 123) of each pipeline then output decoding information such as instruction size, register addresses, and processor dependencies to the execution unit (130). Finally, the result of executing the instruction in the execution unit (130) is written to the data cache (160), the register file (170), or the like.

In such a conventional superscalar microprocessor, the longest decoding path during instruction processing is a sequential path consisting of checking parallelism in the parallelism check block (121) of the decoder unit (120), assigning the instructions to each pipeline, decoding the instructions in the decoders (122, 123) of each pipeline, and reading the register file to access data in response to the decoding result. In a microprocessor operating at high speed, this path becomes the critical path and limits the performance of the processor.

The present invention has been devised to solve the above problem, and its object is to provide an instruction decoding apparatus that shortens the instruction decoding path by removing the critical path in instruction decoding in a microprocessor operating at high speed.

FIG. 1 shows the stages for processing instructions in a typical superscalar high-performance microprocessor.

FIG. 2 is a block diagram of a conventional superscalar microprocessor.

FIG. 3 is a block diagram of a microprocessor according to an embodiment of the present invention.

* Description of the main parts of the drawings *

200: cache unit
210: prefetch unit
220: instruction queue unit
230: decoder unit
240: execution unit
250: memory resource unit
260: bus interface unit
213: parallelism check block
214: predecoder

To achieve the above object, the instruction decoding apparatus of the present invention comprises: a prefetch unit that, in response to a cache hit/miss signal output from a cache unit, receives the instruction line output from the cache unit on a cache hit or the instruction line accessed and input from external memory on a cache miss, checks whether the instructions to be executed in each pipeline can be executed in parallel, and performs predecoding on the instructions; an instruction queue unit that stores the instructions to be executed, output from the prefetch unit; and a decoder unit that performs the final decoding of the instructions output for each pipeline from the instruction queue unit.

To satisfy the high instruction request bandwidth of a high-performance microprocessor, the present invention distributes part of the work handled in the instruction fetch and decoding stages to the prefetch cycle, thereby removing the critical path in instruction decoding in a microprocessor operating at high speed.

Hereinafter, an embodiment of the present invention is described in detail with reference to the drawings.

FIG. 3 is a block diagram of a microprocessor according to an embodiment of the present invention. The cache unit (200) consists of a code RAM (203) that stores instructions frequently used by the execution unit (240) contiguously in a line and supplies the requested instruction on a cache hit; a tag RAM (202) that stores the instruction tag address, which is compared with the upper bits of the common address of the instructions stored in one line of the code RAM (203) in order to determine a cache hit or miss; and a cache controller (201) that determines the cache hit or miss of a requested instruction and performs functions such as supplying instructions and fetching instructions from external memory. A bus interface unit (260) accesses external memory on a cache miss. A multiplexer (204) responds to the cache hit/miss signal by selecting and outputting either the instruction line output from the code RAM (203) on a cache hit, or the instruction line accessed from external memory through the bus interface unit (260) on a cache miss, which is also to be newly stored in the code RAM (203) of the cache unit (200). A prefetch unit (210) receives the instruction line output from the multiplexer (204), checks whether the instructions to be executed in each pipeline can be executed in parallel, and performs initial decoding (predecoding) of the instructions. An instruction queue unit (220) stores the instructions to be executed, output from the prefetch unit (210). A decoder unit (230) performs the final decoding (postdecoding) of the instructions output for each pipeline from the instruction queue unit (220). An execution unit (240) receives the instructions output for each pipeline from the decoder unit (230) and performs the appropriate operations. A memory resource unit (250), comprising a data cache (251) and a register file (252), receives and stores the results of the instructions executed by the execution unit (240).
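The hit/miss decision and the role of the multiplexer (204) can be sketched as follows. This is a minimal model under stated assumptions: the patent does not specify the cache organization, so a direct-mapped layout, the constants `LINE_BYTES` and `NUM_LINES`, and the names `InstructionCache` and `fetch_line` are all hypothetical. External memory is modeled as a plain dictionary keyed by line address.

```python
LINE_BYTES = 16  # assumed line size, not given in the patent
NUM_LINES = 4    # assumed number of lines, not given in the patent


class InstructionCache:
    """Toy direct-mapped cache with a tag RAM and a code RAM (organization assumed)."""

    def __init__(self):
        self.tag_ram = [None] * NUM_LINES
        self.code_ram = [None] * NUM_LINES

    @staticmethod
    def split(addr):
        """Split a byte address into (tag, index); the tag is the upper address bits."""
        line_addr = addr // LINE_BYTES
        return line_addr // NUM_LINES, line_addr % NUM_LINES

    def lookup(self, addr):
        """Compare the stored tag with the upper address bits to decide hit or miss."""
        tag, index = self.split(addr)
        if self.tag_ram[index] == tag:
            return True, self.code_ram[index]
        return False, None

    def fill(self, addr, line):
        """Store a line fetched from external memory."""
        tag, index = self.split(addr)
        self.tag_ram[index] = tag
        self.code_ram[index] = line


def fetch_line(cache, external_memory, addr):
    """Select the instruction line handed to the prefetch unit.

    On a hit the line comes from the code RAM; on a miss it comes from
    external memory (standing in for the bus interface unit) and is also
    written into the code RAM.
    """
    hit, line = cache.lookup(addr)
    if not hit:
        line = external_memory[addr // LINE_BYTES]
        cache.fill(addr, line)
    return hit, line
```

Calling `fetch_line` twice with the same address returns a miss followed by a hit, which mirrors the two inputs the multiplexer chooses between.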

The prefetch unit (210) consists of an instruction buffer A (211) and an instruction buffer B (212) that alternately store, for each pipeline, the instruction lines output from the multiplexer (204); a parallelism check block (213) that checks whether the two instructions output from the instruction buffers (211, 212) can be executed in parallel; a predecoder (214) that predecodes the instructions output from the parallelism check block (213) to determine the instruction length, register file addresses, and the like; and per-pipeline predecoded instruction queues (215, 216) that store the predecoded instructions output from the predecoder (214) and hold them until they are actually executed.

Next, the decoding operation of the microprocessor configured as above, from which the critical decoding path has been removed, is described.

First, the cache controller (201) receives an instruction fetch address every cycle from an address generation unit (not shown) and sends it to the tag RAM and code RAM (202, 203) to determine a cache hit or miss. The instruction code is received from the code RAM (203) on a cache hit, or from the external memory (10) on a cache miss, and is stored sequentially in instruction buffers A and B (211, 212) of the prefetch unit (210). Next, the parallelism check block (213) checks whether the instruction codes in instruction buffers A and B (211, 212) can be executed in parallel, after which the instruction codes are passed to the instruction predecoder (214), where the two instruction codes are partially decoded. If parallel processing is possible after this decoding, for example when the pair consists of instructions executed in independent integer units, or of instructions executed in an integer unit and the floating-point unit respectively, the two instruction codes are stored simultaneously in the X pipeline predecoded instruction queue (215) and the Y pipeline predecoded instruction queue (216), respectively. An instruction pair with a dependency is stored sequentially in the higher-priority X pipeline predecoded instruction queue (215), so that the instructions can be processed without resource conflicts when they are executed in the execution unit (240).
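The pairing check and the queue allocation described above can be sketched as follows. The patent does not spell out the dependency rules, so this sketch assumes a simple one (the second instruction must neither read nor overwrite the first instruction's destination register) and a simplified unit rule (the first instruction uses an integer unit, the second an integer unit or the floating-point unit). The names `Instr`, `has_dependency`, `can_pair`, and `allocate` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str                    # "int" or "fp" (assumed classification)
    dest: Optional[str] = None   # destination register, if any
    srcs: Tuple[str, ...] = ()   # source registers


def has_dependency(first, second):
    """Assumed rule: second reads or overwrites the destination of first."""
    if first.dest is None:
        return False
    return first.dest in second.srcs or first.dest == second.dest


def can_pair(first, second):
    """Pairable if the units do not clash and there is no dependency."""
    units_ok = first.unit == "int" and second.unit in ("int", "fp")
    return units_ok and not has_dependency(first, second)


def allocate(first, second, x_queue, y_queue):
    """Place a pair into the predecoded instruction queues.

    Pairable instructions go to the X and Y queues together; otherwise both
    are stored one after the other in the higher-priority X queue.
    Returns True if the pair was issued in parallel.
    """
    if can_pair(first, second):
        x_queue.append(first)
        y_queue.append(second)
        return True
    x_queue.append(first)
    x_queue.append(second)
    return False


if __name__ == "__main__":
    xq, yq = deque(), deque()
    add = Instr("add", "int", "r1", ("r2", "r3"))
    mul = Instr("mul", "int", "r7", ("r1", "r2"))  # reads r1, so it depends on add
    print(allocate(add, mul, xq, yq), [i.name for i in xq], [i.name for i in yq])
```

Serializing a dependent pair into one queue keeps the ordering decision in the prefetch cycle, so the later decode stage does not have to revisit it.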

Next, the predecoded instructions from the prefetch unit (210) are input to the corresponding pipeline instruction queues (221, 222) of the instruction queue unit (220). The decoder unit (230) performs the remaining decoding on the instructions input from the instruction queues (221, 222) and then outputs the associated control signals and operands to the execution unit (240), the register block, and so on. Integer unit A (241) of the execution unit (240) executes an instruction based on the decoding result output from the X pipeline post-decoder (231) and stores the result in the register file (252) or the data cache (251), completing the execution of one instruction. Integer unit B (242) or the floating-point unit (243) of the execution unit (240) executes an instruction based on the decoding result output from the Y pipeline post-decoder (232) and stores the result in the register file (252) or the data cache (251), completing the execution of another instruction.
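The split between predecoding and the remaining (post) decoding, followed by routing to an execution unit, might look like the sketch below. The 16-bit instruction format, the opcode table, the fixed instruction length, and the names `predecode` and `postdecode` are all invented for illustration; the patent defines no instruction encoding.

```python
# Invented 16-bit format: [15:12] opcode, [11:8] dest, [7:4] src1, [3:0] src2.
OPCODES = {0x1: ("add", "int"), 0x2: ("sub", "int"), 0x8: ("fadd", "fp")}


def predecode(word):
    """Prefetch-cycle work: instruction length and register file addresses."""
    return {
        "word": word,
        "length": 2,  # bytes; fixed in this toy format
        "dest": (word >> 8) & 0xF,
        "src1": (word >> 4) & 0xF,
        "src2": word & 0xF,
    }


def postdecode(pre, pipeline):
    """Decode-stage work: resolve the operation and pick the execution unit.

    The X pipeline feeds integer unit A; the Y pipeline feeds integer unit B
    or the floating-point unit, as in the description above.
    """
    op, kind = OPCODES[(pre["word"] >> 12) & 0xF]
    if pipeline == "X":
        unit = "integer_unit_A"
    else:
        unit = "fpu" if kind == "fp" else "integer_unit_B"
    return {
        "op": op,
        "unit": unit,
        "dest": pre["dest"],
        "srcs": (pre["src1"], pre["src2"]),
    }


if __name__ == "__main__":
    print(postdecode(predecode(0x1234), "X"))
    print(postdecode(predecode(0x8123), "Y"))
```

Because the register fields are already extracted when the instruction reaches the decoder unit, the post-decoder in this sketch only has to resolve the opcode and the target unit.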

Thus, the instruction decoding operation of the present invention uses instruction prefetching and performs the parallel-execution check of the superscalar structure, together with the decoding of instruction size, register file addresses, and the like, in advance during the instruction prefetch cycle. This reduces the workload of the decoding stage and removes the critical path from the decoding operation.

Although the technical idea of the present invention has been described in detail with reference to the preferred embodiment above, it should be noted that the embodiment is for the purpose of explanation and not of limitation. Those of ordinary skill in the art will also understand that various embodiments are possible within the scope of the technical idea of the present invention.

As described above, the present invention newly provides a prefetch unit to meet the high instruction request bandwidth of a superscalar structure, and performs the parallel-execution check of consecutive instructions and the predecoding of the instructions in the instruction fetch cycle. This removes the critical decoding path in a superscalar microprocessor operating at high speed and enables stable high-speed operation.
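The effect of moving work into the prefetch cycle can be illustrated with a toy delay model. Every number below is an invented, unitless value chosen only to show the reasoning; none is a measurement or a figure from the patent. The split of decoding into `predecode` and `postdecode` delays, and the assumption that the register file read stays in the decode stage, are likewise assumptions of this sketch.

```python
# Invented, unitless delays for illustration only (not from the patent).
DELAYS = {
    "pairing_check": 3,
    "pipeline_allocation": 1,
    "predecode": 2,           # instruction length, register file addresses
    "postdecode": 2,          # remaining decoding
    "register_file_read": 2,
}

# Conventional design: all steps run back to back inside the decode stage.
CONVENTIONAL_DECODE = [
    "pairing_check",
    "pipeline_allocation",
    "predecode",
    "postdecode",
    "register_file_read",
]

# Proposed design: the first steps move to the prefetch cycle.
PROPOSED_PREFETCH = ["pairing_check", "predecode", "pipeline_allocation"]
PROPOSED_DECODE = ["postdecode", "register_file_read"]


def stage_delay(steps):
    """A stage's delay is the sum of the steps it performs sequentially."""
    return sum(DELAYS[step] for step in steps)


def limiting_delay(*stages):
    """The slowest stage bounds the clock period."""
    return max(stage_delay(stage) for stage in stages)


if __name__ == "__main__":
    print("conventional decode stage:", stage_delay(CONVENTIONAL_DECODE))
    print("proposed prefetch cycle:  ", stage_delay(PROPOSED_PREFETCH))
    print("proposed decode stage:    ", stage_delay(PROPOSED_DECODE))
    print("proposed limiting delay:  ", limiting_delay(PROPOSED_PREFETCH, PROPOSED_DECODE))
```

The total work is unchanged, but because it is spread over two cycles, the longest single stage in this model is shorter than the conventional decode stage.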

Claims

An instruction decoding apparatus comprising:

a prefetch unit that, in response to a cache hit/miss signal output from a cache unit, receives an instruction line output from the cache unit on a cache hit or an instruction line accessed and input from external memory on a cache miss, checks whether the instructions to be executed in each pipeline can be executed in parallel, and performs predecoding on the instructions;

an instruction queue unit that stores the instructions to be executed, output from the prefetch unit; and

a decoder unit that performs the final decoding of the instructions output for each pipeline from the instruction queue unit.

The apparatus of claim 1,

wherein the prefetch unit comprises:

a plurality of instruction storage means for alternately storing, for each pipeline, the instructions of the instruction line;

parallelism checking means for checking whether the instructions output from the instruction storage means can be executed in parallel;

a predecoder for predecoding the instructions output from the parallelism checking means; and

a plurality of predecoded instruction storage means for storing, for each pipeline, the predecoded instructions output from the predecoder.