KR0172303B1

KR0172303B1 - Instruction aligner

Info

Publication number: KR0172303B1
Application number: KR1019950064416A
Authority: KR
Inventors: 고동범; 임채덕
Original assignee: 김주용; 현대전자산업주식회사
Priority date: 1995-12-29
Filing date: 1995-12-29
Publication date: 1999-03-30
Also published as: KR970049497A

Abstract

The present invention relates to an instruction aligner that correctly provides instructions stored in an external prefetch queue to a decoding unit according to a plurality of control signals generated by an external prefetch unit. The instruction aligner comprises: selection means (21 to 25) for selecting and outputting instructions stored in the prefetch queue according to a plurality of first control signals (sel0, sel1, sel2) from the prefetch unit; and rotating means (26) for rotating the instructions selected by the selection means according to a second control signal (Bsht_amt[4:0]) from the prefetch unit and providing them to the decoding unit. No interface is required between the prefetch queue and the cache block, and all instruction alignment functions are placed between the prefetch queue and the decoder, which facilitates design improvements.

Description

Instruction aligner

FIG. 1 is a block diagram showing the peripheral functional blocks associated with the aligner according to the present invention.

FIG. 2 is a structural diagram of an instruction aligner according to an embodiment of the present invention.

* Explanation of symbols for main parts of the drawings

21 to 25: multiplexers; 26: byte rotator

The present invention relates to an instruction aligner, and more particularly to an instruction aligner that is placed between the cache unit and the decoding unit in a microprocessor design so that the correct instruction is supplied to the decoding unit in time.

In general, instruction length is not fixed in a CISC (Complex Instruction Set Computer). In microprocessors of this type, instructions range from 1 byte for the shortest to 15 bytes for the longest.

When instructions are read from the code cache into the prefetch unit, they are normally read one line at a time. If an instruction spans two consecutive lines, however, the prefetch unit must access the code cache twice to read both lines.
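As a rough illustration of when this double access arises, the following Python sketch checks whether an instruction crosses a line boundary. The 32-byte line size is the one assumed later in this description; the function name and its parameters (a byte address and an instruction length) are assumptions made for the example.

```python
LINE_BYTES = 32  # line size assumed later in this description

def spans_two_lines(addr: int, length: int) -> bool:
    """Return True if an instruction of `length` bytes starting at byte
    address `addr` extends past the end of its cache line."""
    return (addr % LINE_BYTES) + length > LINE_BYTES
```

For instance, a 6-byte instruction starting at offset 28 of a line occupies offsets 28 to 33 and therefore needs bytes from the next line as well.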

To avoid this overhead, conventional designs usually perform a split line access, which fetches the upper half of one line and the lower half of the next line together in a single cycle.

However, when instructions are read from the code cache in this way, they are written to the prefetch queue without being correctly aligned. An aligner that aligns the instructions correctly is therefore needed so that the correct instruction is sent to the decoding unit.

The present invention has been devised to solve the above problem and meet this need, and its object is to provide an instruction aligner that allows the correct instruction to be provided to the decoding unit in time.

To achieve the above object, the present invention provides an instruction aligner for correctly providing instructions stored in an external prefetch queue to a decoding unit according to a plurality of control signals generated by an external prefetch unit, the instruction aligner comprising: selection means for selecting and outputting instructions stored in the prefetch queue according to a plurality of first control signals from the prefetch unit; and rotating means for rotating the instructions selected by the selection means according to a second control signal from the prefetch unit and providing them to the decoding unit.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

For simplicity, a line size of 32 bytes, 128 cache entries, and a physical cache are assumed. The prefetch address is assumed to be 32 bits wide, with the lower 12 bits accessing the cache without address translation. The prefetch address can then be divided as follows.
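The field boundaries can be inferred from the sizes just stated: a 32-byte line needs a 5-bit byte offset, and 128 entries need a 7-bit index, which together make up the 12 untranslated bits. The Python sketch below splits an address on that basis. It assumes one cache entry per index value, and the field names tag, index, and offset are illustrative rather than taken from the patent.

```python
LINE_BYTES = 32      # line size stated in the text
CACHE_ENTRIES = 128  # number of cache entries stated in the text

OFFSET_BITS = LINE_BYTES.bit_length() - 1    # 5 bits: address[4:0]
INDEX_BITS = CACHE_ENTRIES.bit_length() - 1  # 7 bits: address[11:5]

def split_prefetch_address(addr: int) -> dict:
    """Split a 32-bit prefetch address into assumed tag/index/offset fields."""
    addr &= 0xFFFFFFFF
    offset = addr & (LINE_BYTES - 1)                     # byte within the line
    index = (addr >> OFFSET_BITS) & (CACHE_ENTRIES - 1)  # cache entry
    tag = addr >> (OFFSET_BITS + INDEX_BITS)             # address[31:12]
    return {"tag": tag, "index": index, "offset": offset}
```

Under these assumptions the tag occupies the remaining 20 bits, address[31:12], which is the part that requires address translation.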

Instructions read from the code cache into the prefetch unit are always loaded into the prefetch queue one line at a time, according to the prefetch address generated by the prefetch unit. For a split line access, the access proceeds only if both the current line and the next line hit; if either line misses, the miss handling routine is performed.

FIG. 1 is a block diagram showing the peripheral functional blocks associated with the aligner according to the present invention. In the drawing, 1 denotes a page unit, 2 a bus unit, 3 a code TLB (Translation Lookaside Buffer), 4 a code cache, 5 an x pairing check unit, 6 a y pairing check unit, 7 a prefetch unit, 8 a decoding unit, 9 an aligner, and 10 a prefetch queue.

The page unit (1) controls address translation and, when a TLB miss occurs, requests a memory access from the bus unit (2) to fetch the information needed for address translation from main memory.

The code TLB (3) holds the information needed for address translation and, on a TLB hit, sends the physical address to the code cache (4).

The code cache (4) holds the instructions to be sent to the prefetch queue and, when a cache access request arrives from the prefetch unit, loads instructions into the prefetch queue one line (32 bytes) at a time.

To reduce stalls in the instruction flow to the decoding unit on a cache miss, there is also a 64-bit path that loads the prefetch queue directly from the bus unit (2).

The decoding unit (8) reads instructions from the prefetch queue through the code cache aligner as needed and decodes them.

To support the dual pipeline, paths exist from both prefetch queue pair 0 and pair 1 to the x pairing check unit (5) and the y pairing check unit (6).

The pairing check units (5, 6) determine whether pairing is possible and inform the prefetch unit (7).

The prefetch unit (7) generates prefetch addresses to fill the prefetch queue so that it always holds valid instructions, and controls the prefetch queue and the aligner (9) so that the correct instruction reaches the decoding unit (8).

In the prefetch queue, two 32-byte lines form one queue pair, and loading switches to the other queue pair whenever a branch is predicted, for example because of a jump instruction.

Accordingly, when the decoding unit (8) reads instructions, it accesses only one queue pair until a branch is predicted, and accesses the other queue pair once a branch is predicted. The aligner must therefore contain paths from both queue pairs to the x pairing check unit (5) and the y pairing check unit (6).

FIG. 2 is a structural diagram of an instruction aligner according to an embodiment of the present invention, which supports both split line access and non-split line access. In the drawing, sel0, sel1, sel2, and Bsht_amt[4:0] are control signals generated externally (by the prefetch unit) that control the aligner so that the decoder reads the correct instruction.

As shown in the drawing, this embodiment comprises a plurality of multiplexers (21 to 25) that select and output instructions stored in the prefetch queue according to the control signals (sel0, sel1, sel2) from the prefetch unit, and a byte rotator (26) that rotates the instructions selected by the multiplexers (21 to 25) according to the control signal (Bsht_amt[4:0]) from the prefetch unit and provides them to the decoding unit.

The prefetch queues (PFQ_00, PFQ_01, PFQ_10, PFQ_11) are register-type storage that receives 32 bytes of instructions from the code cache and holds them until the decoding unit reads them. So that instructions can also be loaded from the external bus on a cache miss, loads in units of the external bus width must be possible (64-bit loads if the external bus is 64 bits wide).

There is one valid bit for every 8 bytes, indicating that those bytes are valid. The valid bits are used to check whether valid instructions are present in the queue when instructions are read into the decoding unit.

The multiplexer (16_2×1) selects and outputs one of two 16-byte inputs according to a selection signal (sel0 or sel1). The multiplexer (32_2×1) selects and outputs one of two 32-byte inputs according to the selection signal (sel2). When the selection signal is 0, the multiplexer input labeled 0 is selected; when it is 1, the input labeled 1 is selected.
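The selection stage can be modeled in a few lines of Python. This is only a behavioral sketch: the drawing is not reproduced in this text, so the wiring is an assumption, namely that sel0 drives the upper-half multiplexers and sel1 the lower-half multiplexers of both queue pairs, and that sel2 picks between the two pairs. The function names and the dictionary layout are hypothetical.

```python
HALF = 16  # bytes per half line

def mux2(sel: int, in0: bytes, in1: bytes) -> bytes:
    """2-to-1 multiplexer: sel = 0 picks in0, sel = 1 picks in1."""
    return in1 if sel else in0

def select_window(pfq: dict, sel0: int, sel1: int, sel2: int) -> bytes:
    """Return the 32 bytes presented to the byte rotator.

    pfq maps 'PFQ_00', 'PFQ_01', 'PFQ_10', 'PFQ_11' to 32-byte values,
    with index 0 holding the lowest byte of each queue entry.
    """
    pairs = []
    for first, second in (("PFQ_00", "PFQ_01"), ("PFQ_10", "PFQ_11")):
        # 16_2x1 multiplexer for the upper halves (bytes 31-16)
        upper = mux2(sel0, pfq[first][HALF:], pfq[second][HALF:])
        # 16_2x1 multiplexer for the lower halves (bytes 15-0)
        lower = mux2(sel1, pfq[first][:HALF], pfq[second][:HALF])
        pairs.append(lower + upper)
    # 32_2x1 multiplexer choosing between queue pair 0 and queue pair 1
    return mux2(sel2, pairs[0], pairs[1])
```

With sel0 = 0 and sel1 = 1, for example, the model returns the lower half of PFQ_01 together with the upper half of PFQ_00, which is the mixed selection that a single 32-byte multiplexer could not produce.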

The byte rotator rotates its 32-byte input according to the control signal (Bsht_amt[4:0]) and sends the result to its output. The output width of the byte rotator is determined by the length of instruction that the decoder unit can decode at one time; in this embodiment it is 12 bytes.

For example, if the following 32-byte input, written in hexadecimal, arrives and the control signal (Bsht_amt[4:0]) has the value 4, the lowest 4 bytes of the input are skipped and the next 12 bytes are output.

32-byte input:

00ffeeddccbbaa998877665544332211112233445566778899aabbccddeeff00

12-byte output: 112233445566778899aabbcc
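The worked example can be reproduced with a short Python model. It assumes the hexadecimal strings are written with the highest byte first, which is the reading under which the stated output follows from the stated input; the function name and the default output width parameter are illustrative.

```python
LINE_BYTES = 32

def byte_rotate(line_hex: str, bsht_amt: int, out_bytes: int = 12) -> str:
    """Rotate a 32-byte value right by bsht_amt bytes and return the lowest
    out_bytes bytes, as a hex string with the highest byte first."""
    width = LINE_BYTES * 8
    mask = (1 << width) - 1
    value = int(line_hex, 16) & mask
    shift = (bsht_amt & 0x1F) * 8  # Bsht_amt is a 5-bit byte count
    # Bytes shifted out at the bottom re-enter at the top (rotation).
    rotated = ((value >> shift) | (value << (width - shift))) & mask
    out = rotated & ((1 << (out_bytes * 8)) - 1)
    return format(out, f"0{out_bytes * 2}x")

line = "00ffeeddccbbaa998877665544332211112233445566778899aabbccddeeff00"
print(byte_rotate(line, 4))  # 112233445566778899aabbcc
```

Because the operation is a rotation rather than a shift, a Bsht_amt larger than 20 makes the 12-byte output wrap from the top of the 32-byte input back to its lowest bytes.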

The selection signals (sel0, sel1, sel2) and the control signal (Bsht_amt[4:0]) are generated by the prefetch unit so that the appropriate instruction is read into the decoding unit, taking into account the instruction size received from the decoding unit, the valid bits of the prefetch queue (16 bits), whether there is a branch prediction hit, and so on. If split line access is not supported, there is no need to multiplex 16 bytes at a time with the selection signals (sel0, sel1); multiplexing 32 bytes with a single control signal is sufficient. To support split line access, the selection logic is built from four 16-byte multiplexers.

Assuming that instructions are supplied from the code cache to the prefetch queue without interruption, the aligner configured as above operates as follows.

To read instructions from the code cache into the prefetch queue, the prefetch unit generates prefetch addresses. Normally it increments the address sequentially; when a branch is predicted, when branch recovery is needed, and in similar cases, it accesses the code cache with a new address. When the prefetch queue is loaded, four cases can arise depending on bits 4 and 5 of the prefetch address. Bit 4 indicates whether the access is a split line access, and bit 5 determines whether to load the upper prefetch queue in the drawing (PFQ_00 or PFQ_10) or the lower prefetch queue (PFQ_01 or PFQ_11).
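The four cases can be enumerated with a small Python sketch. It covers queue pair 0 only, and the function name and return format are assumptions made for illustration.

```python
def classify_prefetch(addr: int):
    """Decode bits 4 and 5 of a prefetch address (queue pair 0 only).

    Returns (is_split, target_queue).
    """
    is_split = bool((addr >> 4) & 1)                     # bit 4: split line access?
    target = "PFQ_01" if (addr >> 5) & 1 else "PFQ_00"   # bit 5: which queue to load
    return is_split, target

for bits in (0b00, 0b01, 0b10, 0b11):  # prefetch_address[5:4]
    print(format(bits, "02b"), classify_prefetch(bits << 4))
```

The loop prints one line per value of prefetch_address[5:4], showing that bit 4 alone decides the access type and bit 5 alone decides the target queue.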

The two cases distinguished by bit 4 are described in turn. When a branch is predicted, nothing changes except the queue pair, so the description uses queue pair 0 (PFQ_00, PFQ_01) only.

First, consider non-split line access (prefetch_address[5:4] = 00 or 10).

In this case prefetch_address[4] = 0, and only one line is accessed rather than two. The line is loaded into PFQ_00 if prefetch_address[5] is 0 and into PFQ_01 if prefetch_address[5] is 1.

If prefetch_address[5] is initially 0, it is incremented to 1 when the next consecutive line is accessed, so PFQ_00 and PFQ_01 are loaded alternately. The prefetch queue has valid bits so that the prefetch unit can detect that a prefetch queue is empty when loading it.

To support 64-bit loads from the bus unit as well, a valid bit is needed for every 8 bytes, so PFQ_00 has 4 valid bits and PFQ_01 has 4 valid bits.
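One way to picture the per-chunk valid bits is the following Python sketch. It models the 4 valid bits of a single 32-byte queue entry as a small integer; the helper names and the range check are hypothetical and are not described in the patent.

```python
CHUNK = 8             # bytes covered by one valid bit
CHUNKS_PER_QUEUE = 4  # 32-byte queue entry / 8 bytes per valid bit

def mark_chunk_valid(valid: int, chunk: int) -> int:
    """Set the valid bit of one 8-byte chunk, e.g. after a 64-bit bus load."""
    if not 0 <= chunk < CHUNKS_PER_QUEUE:
        raise ValueError("chunk index out of range")
    return valid | (1 << chunk)

def range_is_valid(valid: int, start: int, length: int) -> bool:
    """Check that every byte in [start, start + length) of one queue entry
    is covered by a set valid bit."""
    first = start // CHUNK
    last = (start + length - 1) // CHUNK
    if last >= CHUNKS_PER_QUEUE:
        raise ValueError("range extends past the 32-byte queue entry")
    return all((valid >> c) & 1 for c in range(first, last + 1))
```

After a single 64-bit load into chunk 0, bytes 0 to 7 are readable, but an 8-byte read starting at byte 4 is not until chunk 1 has also been loaded.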

The control signal (sel0) is the selection signal of the multiplexer (MUX) that selects either the upper 16 bytes (31-16) of PFQ_00 or the upper 16 bytes of PFQ_01, and the control signal (sel1) is the selection signal of the multiplexer (MUX) that selects either the lower 16 bytes (15-0) of PFQ_00 or the lower 16 bytes of PFQ_01.

The control signal (sel2) is generated when the queue pair is switched by branch prediction or branch recovery.

The control signal (Bsht_amt[4:0]) directs the rotator to rotate the 32-byte input by a minimum of 1 byte and a maximum of 31 bytes. This signal is needed to support split line access and the prefetching of variable-length instructions.

In this way, using the code cache aligner and the control signals (sel0, sel1, sel2, Bsht_amt[4:0]), a variable-length instruction that spans two lines can be sent to the decoding unit in a single access.

If bit 5 of the first prefetch address is 0, instructions are first loaded into PFQ_00, so reading starts from PFQ_00. The position of the first byte to be read from PFQ_00 is determined by prefetch_address[4:0]. At this point the control signals (sel0, sel1, sel2) are all 0.

Because prefetch_address[4] is 0, fetching starts at a byte in the lower 16 bytes of PFQ_00. When fetching moves from the lower half line (lower 16 bytes) to the upper half line, the lower half line has been fully read, so the control signal (sel1) is changed to 1 to select the lower half line of PFQ_01. Then, when the upper half line of PFQ_00 has been fully read and fetching moves to the lower half line of PFQ_01, the control signal (sel0) is changed to 1 to select the upper half line of PFQ_01.

Instructions continue to be read in this manner until the queue pair is switched by a branch.
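The non-split sequence can be checked with a small behavioral model in Python. The sketch assumes that the stream starts at address 0, that even-numbered lines sit in PFQ_00 and odd-numbered lines in PFQ_01, and that the next line is always already loaded; how the selects continue after the two transitions described above is extrapolated from that pattern. All names are hypothetical, and the 12-byte output width is the one used in this embodiment.

```python
LINE, HALF, OUT = 32, 16, 12  # line, half line, rotator output (bytes)

def nonsplit_controls(p: int):
    """Return (sel0, sel1, Bsht_amt) for a read pointer p (byte address)."""
    line = p // LINE
    in_upper_half = (p % LINE) >= HALF
    sel0 = line % 2  # the upper half always comes from the current line
    # Once the lower half is consumed, the lower mux moves on to the next line.
    sel1 = (line + 1) % 2 if in_upper_half else line % 2
    return sel0, sel1, p % LINE

def read_at(memory: bytes, p: int) -> bytes:
    """Model one aligner read: select two half lines, then rotate."""
    line = p // LINE
    # Assumed queue state: key 0 is PFQ_00, key 1 is PFQ_01.
    pfq = {
        line % 2: memory[line * LINE:(line + 1) * LINE],
        (line + 1) % 2: memory[(line + 1) * LINE:(line + 2) * LINE],
    }
    sel0, sel1, amt = nonsplit_controls(p)
    window = pfq[sel1][:HALF] + pfq[sel0][HALF:]  # lower half, then upper half
    return bytes(window[(amt + i) % LINE] for i in range(OUT))
```

In this model the 12 bytes delivered for any read pointer equal the 12 consecutive bytes at that address, including reads that start in the upper half of one line and wrap into the lower half of the next.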

Next, consider split line access (prefetch_address[5:4] = 01 or 11).

In this case prefetch_address[4] = 1, and one line and the following line are both accessed. The result is loaded into PFQ_00 if prefetch_address[5] is 0 and into PFQ_01 if prefetch_address[5] is 1.

When prefetch_address[5] is 0, the upper half line of PFQ_00 (bytes 31-16) is loaded with the upper half line at the current address, and the lower half line of PFQ_00 is loaded with the lower half line of the next line. As before, if prefetch_address[5] is initially 0, it is incremented to 1 when the next consecutive line is accessed, so PFQ_00 and PFQ_01 are loaded alternately.
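The contents of a queue entry after one split line access can be sketched in Python as follows. The flat `memory` byte string standing in for the code cache and the function name are assumptions made for illustration.

```python
LINE, HALF = 32, 16

def split_line_entry(memory: bytes, addr: int) -> bytes:
    """Build the 32-byte queue entry written by one split line access."""
    if not (addr >> 4) & 1:
        raise ValueError("bit 4 of the address must be 1 for a split line access")
    base = (addr // LINE) * LINE
    upper = memory[base + HALF:base + LINE]         # upper half of the current line
    lower = memory[base + LINE:base + LINE + HALF]  # lower half of the next line
    return lower + upper  # index 0 is the lowest byte of the entry
```

The two halves are adjacent in memory but stored in swapped positions within the entry, which is why a rotation that wraps from byte 31 back to byte 0 reads them in program order.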

If bit 5 of the first prefetch address is 0, instructions are first loaded into PFQ_00, so reading starts from PFQ_00. The position of the first byte to be read from PFQ_00 is determined by prefetch_address[4:0]. At this point the control signals (sel0, sel1, sel2) are all 0. Because prefetch_address[4] is 1, the first instruction to be fetched lies in the upper half line of PFQ_00.

Fetching starts at a byte in the upper 16 bytes of PFQ_00. When fetching moves from the upper half line of PFQ_00 (the 16 bytes 31-16) to the lower half line of PFQ_00, the selection signal (sel0) selects the upper half line of PFQ_01. Then, when the lower half line of PFQ_00 has been fully read and fetching moves to the upper half line of PFQ_01, the control signal (sel1) is changed to 1 to select the lower half line of PFQ_01. Instructions continue to be read in this manner until the queue pair is switched by a branch.
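The split line read sequence can be modeled in the same spirit. This Python sketch assumes the stream starts in the upper half of line 0, that the j-th split access fills PFQ_00 when j is even and PFQ_01 when j is odd, and that the next entry is always already loaded. The continuation of the select pattern beyond the transitions described above is an extrapolation, and all names are hypothetical.

```python
LINE, HALF, OUT = 32, 16, 12  # line, half line, rotator output (bytes)

def split_entry(memory: bytes, j: int) -> bytes:
    """Entry written by the j-th split access: upper half of line j in the
    upper half, lower half of line j + 1 in the lower half."""
    upper = memory[j * LINE + HALF:(j + 1) * LINE]
    lower = memory[(j + 1) * LINE:(j + 1) * LINE + HALF]
    return lower + upper  # index 0 is the lowest byte of the entry

def split_controls(p: int):
    """Return (sel0, sel1, Bsht_amt) for a read pointer p >= 16."""
    h = p // HALF  # half-line index; the stream starts at h = 1
    if h % 2:      # pointer in an upper half: both halves come from entry j
        j = (h - 1) // 2
        return j % 2, j % 2, p % LINE
    j = (h - 2) // 2  # pointer in a lower half: upper mux moves to entry j + 1
    return (j + 1) % 2, j % 2, p % LINE

def read_at(memory: bytes, p: int) -> bytes:
    """Model one aligner read: select two half lines, then rotate."""
    j = (p // HALF - 1) // 2  # oldest entry still needed
    pfq = {j % 2: split_entry(memory, j), (j + 1) % 2: split_entry(memory, j + 1)}
    sel0, sel1, amt = split_controls(p)
    window = pfq[sel1][:HALF] + pfq[sel0][HALF:]  # lower half, then upper half
    return bytes(window[(amt + i) % LINE] for i in range(OUT))
```

As in the non-split case, the model delivers the 12 consecutive bytes at the read pointer for every position, with sel0 and sel1 changing in the opposite order: sel0 first, then sel1.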

For reference, this aligner can be used not only in microprocessors but in any computing system that executes instructions, such as microcontrollers.

The present invention configured as above requires no interface between the prefetch queue and the cache block, and places all instruction alignment functions between the prefetch queue and the decoder, which has the distinctive effect of making design improvements easy.

Claims

1. An instruction aligner for correctly providing instructions stored in an external prefetch queue to a decoding unit according to a plurality of control signals generated by an external prefetch unit, the instruction aligner comprising: selection means for selecting and outputting instructions stored in the prefetch queue according to a plurality of first control signals from the prefetch unit; and rotating means for rotating the instructions selected by the selection means according to a second control signal from the prefetch unit and providing them to the decoding unit.

2. The instruction aligner of claim 1, wherein the selection means comprises: a plurality of first multiplexers, each selecting and outputting instructions stored in the prefetch queue using one of the plurality of first control signals as a selection signal; and a second multiplexer that selects one of the outputs of the first multiplexers, using one of the plurality of first control signals as a selection signal, and supplies the selected output to the input of the rotating means.

3. The instruction aligner of claim 1 or 2, wherein the output size of the rotating means is determined by the length of instruction that the decoding unit can decode at one time.

4. The instruction aligner of claim 3, wherein the plurality of first control signals and the second control signal are generated according to the size of the instruction from the decoding unit, the valid bits of the prefetch queue, and whether there is a branch prediction hit.