CN107092464B

CN107092464B - Method for generating selectively compressed microprogram code and selectively decompressed microprogram code

Info

Publication number: CN107092464B
Application number: CN201611196464.XA
Authority: CN
Inventors: G·葛兰·亨利; 泰瑞·派克斯; 布兰特·比恩
Original assignee: Via Technologies Inc
Current assignee: Via Technologies Inc
Priority date: 2013-10-18
Filing date: 2014-09-04
Publication date: 2020-06-23
Anticipated expiration: 2034-09-04
Also published as: TW201516861A; CN107092464A; CN107085513A; CN104216682A; CN107085513B; CN104216682B; TWI522910B

Abstract

The invention provides a method for generating selectively compressed microcode and a method for selectively decompressing microcode. The method for generating selectively compressed microcode comprises the following steps: receiving source code that includes a plurality of microcode assembly language instructions, wherein each of a portion of the microcode assembly language instructions is marked with an indication in the source code; and, for each of the microcode assembly language instructions, generating a single-word compressed binary representation of the instruction if it is not marked with the indication, and generating a multi-word uncompressed binary representation of the instruction if it is marked with the indication. The invention can reduce the complexity and the size of the microprocessor.

Description

Method for generating selectively compressed microprogram code and selectively decompressed microprogram code

The present application is a divisional application of application No. 201410447345.1, filed on September 4, 2014 and entitled "Microprocessor and related method for selectively decompressing microcode".

Technical Field

The present invention relates to selectively compressing and decompressing microcode instructions.

Background

Modern microprocessors typically include microcode that implements complex and/or infrequently executed instructions of the microprocessor's instruction set architecture. A benefit of implementing some instructions of the instruction set architecture in microcode is that the complexity of other functional units (e.g., execution units) of the microprocessor may be reduced. However, as the number and complexity of the instructions in the instruction set architecture increase, more microcode is required, and microprocessors accordingly need more non-volatile memory (e.g., read-only memory (ROM)) to store the microcode. This problem is exacerbated as the number of cores in a multi-core microprocessor increases, since each core has memory to store microcode.

Disclosure of Invention

The invention provides a microprocessor. The microprocessor includes one or more memories, each of the one or more memories holding a plurality of microcode instructions. At least one first memory is configured to provide M-bit wide microcode word sets for compressed microcode instructions, at least one second memory is configured to provide N-bit wide microcode word sets for uncompressed microcode instructions, M and N are integers greater than zero, and N is greater than M. The microprocessor also includes a decompression unit for decompressing the compressed microcode instructions after they are accessed from at least one of the first memories and before they are executed.

The invention also provides a method for selectively decompressing the microprogram code. The method for selectively decompressing microcode includes receiving a first N-bit wide microcode word set from a memory. The method for selectively decompressing microcode also includes determining whether a predetermined portion of the first N-bit-wide microcode word is a predetermined value. The method for selectively decompressing microcode also includes decompressing the first N-bit-wide microcode word set to generate an M-bit-wide microcode word set if the predetermined portion is not the predetermined value, where M and N are integers greater than zero and M is greater than N. The method for selectively decompressing microcode also includes, if the predetermined portion is the predetermined value, receiving a second N-bit wide microcode word set from the memory and combining portions of the first N-bit wide microcode word set and the second N-bit wide microcode word set to generate the M-bit wide microcode word set.

The invention also provides a method for generating selectively compressed microcode. The method includes receiving source code that includes a plurality of microcode assembly language instructions, wherein each of a portion of the microcode assembly language instructions is marked with an indication in the source code. The method also includes, for each of the microcode assembly language instructions, generating a single-word compressed binary representation of the instruction if it is not marked with the indication, and generating a multi-word uncompressed binary representation of the instruction if it is marked with the indication.

The invention also provides a method for generating a description of a microcode decompressor. The method includes receiving source code that includes a plurality of microcode assembly language instructions. The method also includes generating an uncompressed binary representation for each of the microcode assembly language instructions. The method also includes, for each unique instruction among the microcode assembly language instructions, generating a mapping between the uncompressed binary representation of the unique instruction and a compressed binary representation.

The invention also provides a microprocessor. The microprocessor includes a plurality of memories, each of which holds a plurality of microcode instructions. At least one first memory of the memories is configured to provide a plurality of M-bit wide microcode word sets of compressed microcode instructions, and at least one second memory of the memories is configured to provide a plurality of N-bit wide microcode word sets of uncompressed microcode instructions, where M and N are integers greater than zero and N is greater than M. The microprocessor also includes a decompression unit configured to decompress the compressed microcode instructions after they have been accessed from the at least one first memory and before they are executed.

The present invention also provides a method for selectively decompressing microcode, suitable for decompressing a plurality of microcode instructions in a microprocessor having a plurality of memories, each of which holds microcode instructions. The method includes accessing a plurality of M-bit wide microcode word sets of compressed microcode instructions from at least one first memory of the memories. The method also includes accessing a plurality of N-bit wide microcode word sets of uncompressed microcode instructions from at least one second memory of the memories, where M and N are integers greater than zero and N is greater than M. The method also includes decompressing the compressed microcode instructions accessed from the first memory. The method also includes passing the uncompressed microcode instructions through without decompression.

The present invention also provides a computer program product encoded in at least one non-transitory computer-usable medium for use with a computing device. The computer program product includes computer-usable program code embodied in the non-transitory computer-usable medium for specifying a microprocessor. The computer-usable program code includes first program code specifying a plurality of memories, each of which holds a plurality of microcode instructions, wherein at least one first memory of the memories is configured to provide a plurality of M-bit wide microcode word sets of compressed microcode instructions. At least one second memory of the memories is configured to provide a plurality of N-bit wide microcode word sets of uncompressed microcode instructions, where M and N are integers greater than zero and N is greater than M. The computer-usable program code also includes second program code specifying a decompression unit for decompressing the compressed microcode instructions after they are accessed from the first memory and before they are executed.

The invention can reduce the complexity and the size of the microprocessor.

Drawings

FIG. 1 is a block diagram of a multi-core microprocessor according to an embodiment of the invention.

FIG. 2 is a detailed block diagram of the processing core of FIG. 1 according to the present invention.

FIG. 3 is a detailed block diagram of the decompression unit of FIG. 2 according to an embodiment of the invention.

FIG. 4 is a flowchart illustrating selective compression of microcode instructions by an assembler according to one embodiment of the invention.

FIG. 5 is a flowchart illustrating the process of creating selectively compressed microcode according to an embodiment of the invention.

FIG. 6 is a flowchart illustrating assembly of microcode by an assembler according to an embodiment of the invention.

FIG. 7 is a flowchart illustrating operation of the complex instruction translator of FIG. 2, and in particular the decompression unit of FIG. 3, according to one embodiment of the present invention.

FIGS. 8-13 are block diagrams illustrating different combinations of compressed and uncompressed microcode instructions held by the microcode memories of a microprocessor according to other embodiments of the present invention.

Wherein the symbols in the drawings are briefly described as follows:

100 microprocessor

102 processing core

104 core microcode read-only memory

106 arbitration logic

108 uncore read only memory

114 uncore microcode patch RAM

202 instruction cache

204 simple instruction translator

206 register alias table

208 reservation station

212 execution unit

214 retirement unit

216 instruction access unit

218 architectural program counter

222 first multiplexer

224 target address

226 microinstruction

232 non-architectural microcode program counter

235 instruction indirect register

236 microsequencer

237 micro translator

239 decompression unit

242 architectural instruction

244 first microinstruction

245 select control input

246 second microinstruction

247 first microcode instruction

248 select control input

251 second microcode instruction

252 microcode address

253 uncompressed microcode instruction

254 non-architectural microcode access address

255 micro instruction information

262 memory subsystem

264 registers

292 second multiplexer

294 instruction translator

296 complex instruction translator

306 patch content-addressable memory

308 patch address

353 uncompressed microcode instruction

355 38-bit result

392 three-input multiplexer

394 decompressor

396 control logic

398 buffer

402 first microcode assembly language instruction

412 first escape indicator

414 second microcode assembly language instruction

432 second escape indicator

434 lower 16 bits

436 6 bits

438 upper 22 bits

1299 core patch RAM

502 source code

504 assembler

506 selectively compressed binary microcode

508 list of all microcode instructions

512 compression program

514 compression table

516 decompressor description

518 logic generation program (ESPRESSO)

524 decompressor register transfer language file

602 to 612, 702 to 714.

Detailed Description

Most, but not all, of the microcode instructions described herein are compressed and held in a microcode memory. The microprocessor includes a decompression unit, such as the decompression unit 239 of FIG. 2, for selectively decompressing the compressed microcode instructions. According to one embodiment of the present invention, some of the microcode instructions held in the memory are uncompressed and are therefore longer than the word width of the memory. Such a microcode instruction is divided into two parts that are stored in two different words of the memory. In these cases, a predetermined value (referred to herein as an "escape indicator") is placed (e.g., by the microcode assembler) in a predetermined portion of the first word of the two-word sequence. When the decompression unit detects the escape indicator in the first word read from the memory, the decompression unit combines the appropriate portions of the two words to generate the uncompressed microcode instruction. Advantageously, most of the microcode instructions held in the memory can thus be compressed, so that the memory can be narrower than in an implementation without the selective compression mechanism.
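
The escape mechanism described above can be sketched in Python as follows. This is only an illustrative model under stated assumptions, not the circuit of the embodiments: the 28-bit word width, the 6-bit escape field in the top bits of the word, the escape value 0x3F, the split of the uncompressed instruction into an upper 22 bits and a lower 16 bits, and the dictionary used as a stand-in decompressor are all hypothetical choices made for the example.

```python
# Hypothetical model of selective decompression (all constants are assumptions).
WORD_BITS = 28                 # assumed width of one memory word
ESCAPE_FIELD_BITS = 6          # assumed width of the predetermined portion
ESCAPE_VALUE = 0x3F            # assumed predetermined value (escape indicator)
UPPER_BITS = WORD_BITS - ESCAPE_FIELD_BITS   # 22 bits carried by the first word
LOWER_BITS = 16                # assumed bits carried by the second word


def fetch_instructions(words, table):
    """Yield uncompressed instructions from a stream of memory words.

    `table` is a stand-in decompressor mapping a compressed word to its
    uncompressed instruction.
    """
    it = iter(words)
    for first in it:
        if (first >> UPPER_BITS) == ESCAPE_VALUE:
            # Escape indicator found: the instruction spans two words.
            second = next(it, None)
            if second is None:
                raise ValueError("escape word without a following word")
            upper = first & ((1 << UPPER_BITS) - 1)
            lower = second & ((1 << LOWER_BITS) - 1)
            yield (upper << LOWER_BITS) | lower
        else:
            # Ordinary case: a single compressed word is decompressed.
            yield table[first]


def escape_encode(instruction):
    """Split an uncompressed instruction into the two-word escaped form."""
    first = (ESCAPE_VALUE << UPPER_BITS) | (instruction >> LOWER_BITS)
    second = instruction & ((1 << LOWER_BITS) - 1)
    return [first, second]
```

In this sketch a compressed word must never carry the escape value in its top six bits, which is why the escape value is reserved when compressed encodings are assigned.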

Furthermore, embodiments are described of a microprocessor having a plurality of microcode memories, at least one of which has the width of a compressed microcode instruction and provides compressed microcode instructions, and at least one of which has the width of an uncompressed microcode instruction and provides uncompressed microcode instructions. Various combinations of compressed-width and uncompressed-width core, uncore, and patch microcode memories are detailed herein.

Referring to FIG. 1, FIG. 1 is a block diagram illustrating a multi-core microprocessor 100 according to an embodiment of the invention. The microprocessor 100 is physically located on an integrated circuit and includes a plurality of processing cores 102, an uncore read-only memory (ROM) 108 shared by the plurality of processing cores 102, an uncore microcode patch random access memory (RAM) 114 shared by the plurality of processing cores 102, and arbitration logic 106 (also referred to as control logic) coupling the processing cores 102 to the uncore ROM 108 and the uncore microcode patch RAM 114. Each processing core 102 includes a corresponding core microcode ROM 104 that is not shared with the other processing cores 102 but is dedicated to its processing core 102. Each processing core 102 is coupled to the arbitration logic 106 via a corresponding bus 112. The uncore ROM 108, the uncore microcode patch RAM 114, and the core microcode ROMs 104 all hold microcode instructions.

The microprocessor 100 includes a portion referred to as the uncore. The uncore is the portion of the microprocessor 100 that is not part of any processing core 102. The uncore ROM 108 and the uncore microcode patch RAM 114 are located in the uncore portion of the microprocessor 100. In one embodiment, the processing core 102 is a single design that is replicated multiple times. Four processing cores 102 are shown in the embodiment of FIG. 1, while other embodiments have a different number of processing cores 102. The arbitration logic 106 is also located in the uncore portion of the microprocessor 100, and arbitrates among the processing cores 102 when multiple processing cores 102 request access to the uncore ROM 108 or the uncore microcode patch RAM 114.

The uncore ROM 108 provides a number of words (referred to as "J") for storing microcode instructions and is accessible by all processing cores 102. Each core microcode ROM 104 provides a number of words (referred to as "K") for storing microcode instructions and is accessible only by the corresponding processing core 102. The J words of the uncore ROM 108 and the K words of the corresponding core microcode ROM 104 are located at different addresses within the microcode memory address space of the processing core. In summary, for each processing core 102, the corresponding core microcode ROM 104 and the uncore ROM 108 together provide J + K words of storage for microcode instructions that can be accessed by that processing core 102.
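
Because each processing core sees J + K words of microcode storage, a microcode address can be decoded to one of the two ROMs. The following Python sketch shows one way this could look. The layout is purely an assumption for illustration (core ROM words first, uncore ROM words after them), and the function name `select_rom` is hypothetical; the text does not specify how the two ROMs are ordered in the address space.

```python
def select_rom(address, k_core_words, j_uncore_words):
    """Decode a microcode address to (rom_name, offset_within_rom).

    Assumed layout (hypothetical): the core ROM occupies addresses
    0 .. K-1 and the uncore ROM occupies K .. K+J-1.
    """
    if not 0 <= address < k_core_words + j_uncore_words:
        raise ValueError("address outside the J + K word microcode space")
    if address < k_core_words:
        return ("core", address)
    return ("uncore", address - k_core_words)
```

For example, with K = 16 and J = 48, address 20 would fall in the uncore ROM at offset 4 under this assumed layout.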

In one embodiment, the uncore ROM 108 has J addressable memory locations, each of which is the width of a compressed microcode instruction. In one embodiment, each core microcode ROM 104 has K addressable memory locations, each of which is the width of a compressed microcode instruction. In one embodiment, compressed microcode instructions are 28 bits wide, while uncompressed or decompressed microcode instructions are 38 bits wide.
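
The effect of these widths on ROM size can be estimated with a short calculation. The Python sketch below is a rough model, not a figure from the embodiments: it assumes that every uncompressed instruction occupies exactly two 28-bit words and compares the result with storing every instruction in a 38-bit wide ROM. The function name and the example counts are hypothetical.

```python
def rom_bits(num_instructions, num_escaped,
             compressed_width=28, uncompressed_width=38):
    """Return (selective_bits, baseline_bits) for a microcode image.

    num_escaped instructions are stored uncompressed as two-word sequences
    (an assumption of this sketch); all others take one compressed word.
    The baseline stores every instruction at the uncompressed width.
    """
    words = (num_instructions - num_escaped) + 2 * num_escaped
    selective_bits = words * compressed_width
    baseline_bits = num_instructions * uncompressed_width
    return selective_bits, baseline_bits


selective, baseline = rom_bits(1000, 50)
print(selective, baseline)  # 29400 38000
```

Under these assumptions, selective compression pays off as long as the fraction of escaped instructions stays small, since each escaped instruction costs 56 bits instead of 38.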

According to one embodiment of the invention, the uncore ROM 108 includes a single read port that is shared by all processing cores 102, and the arbitration logic 106 grants the processing cores 102 access to the read port according to an arbitration algorithm. According to one embodiment of the present invention, if only one processing core 102 requests access to the uncore ROM 108 during a given request cycle, the arbitration logic 106 grants the request to that processing core 102; if multiple processing cores 102 request access to the uncore ROM 108 during a given request cycle, the arbitration logic 106 grants access in round-robin order, although other arbitration algorithms may be used. In other embodiments, the uncore ROM 108 may include a read port for each processing core 102. However, it should be noted that the more read ports the uncore ROM 108 includes, the larger it becomes, which reduces the die-area benefit.
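
A round-robin grant policy of the kind mentioned above can be modeled in a few lines of Python. This is a behavioral sketch only; the class name, the method name, and the choice to give core 0 priority on the first cycle are assumptions, and the actual arbitration logic is hardware.

```python
class RoundRobinArbiter:
    """Grants one requester per cycle, rotating priority after each grant."""

    def __init__(self, num_cores):
        self.num_cores = num_cores
        # Start so that core 0 has the highest priority on the first cycle.
        self.last_granted = num_cores - 1

    def grant(self, requests):
        """Return the core index granted this cycle, or None if no requests.

        `requests` is an iterable of core indices requesting the read port.
        """
        requesting = set(requests)
        for step in range(1, self.num_cores + 1):
            candidate = (self.last_granted + step) % self.num_cores
            if candidate in requesting:
                self.last_granted = candidate
                return candidate
        return None
```

A lone requester is always granted immediately, and when several cores request in the same cycle, the core following the most recently granted one wins.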

Using the uncore ROM 108 reduces the area of the microprocessor 100 at the cost of increased latency when microcode instructions are accessed from the uncore ROM 108. The increased latency may result from the greater distance between the microcode unit of each processing core 102 and the uncore ROM 108; that is, the additional distance increases the propagation delay, which requires additional pipeline stages and the associated additional clock cycles. The increased latency may also result from the fact that the uncore ROM 108 is a shared resource that must be arbitrated among the processing cores 102 when multiple processing cores 102 request access to it. Furthermore, the shared nature of the uncore ROM 108 may result in a variable access latency, unlike the fixed latency of the core microcode ROM 104. However, in some embodiments, the increased and/or variable latency is an acceptable price for the reduced die area. Advantageously, the size of the uncore ROM 108 may be further reduced by the selective microcode compression described in the embodiments herein.

Referring to FIG. 2, FIG. 2 is a detailed block diagram of the processing core 102 of FIG. 1 according to an embodiment of the invention. The processing core 102 includes a pipeline of stages that includes a number of functional units. In one embodiment, the processing core 102 is a superscalar, out-of-order execution, pipelined data processing core. The processing core 102 includes an instruction access unit 216, coupled to the instruction cache 202, coupled to the instruction translator 294, coupled to a register alias table (RAT) 206, coupled to the reservation station 208, coupled to the execution unit 212, coupled to the retirement unit 214. The execution unit 212 receives operands from the registers 264 (both architectural and non-architectural) and from the memory subsystem 262. The retirement unit 214 retires microinstruction results to the registers 264 and the memory subsystem 262. The instruction translator 294 is coupled to the arbitration logic 106 via the bus 112 of FIG. 1. The access unit 216 includes the architectural program counter 218 of the processing core 102. When the retirement unit 214 retires an instruction, the retirement unit 214 updates the architectural program counter 218. The access unit 216 provides an architectural instruction access address to the instruction cache 202. The access unit 216 generates the architectural access address based on the architectural program counter 218. Additionally, the execution unit 212 may execute branch instructions and provide branch target addresses to the access unit 216 for generating the architectural access address. Finally, a branch predictor (not shown) of the access unit 216 may provide a predicted branch target address for generating the architectural access address.
The architectural program counter 218 is distinct from the non-architectural microcode program counter 232 maintained by the instruction translator 294, and the architectural access address generated by the access unit 216 for delivery to the instruction cache 202 is distinct from the non-architectural microcode access address 254 generated by the microsequencer 236 (also referred to as control logic) of the instruction translator 294, as described in greater detail below.

The architectural instructions 242 cached by the instruction cache 202, also referred to as macroinstructions or ISA instructions, are defined by the instruction set architecture of the microprocessor 100 (e.g., x86, ARM, SPARC, etc.). The instruction translator 294 translates the architectural instructions 242 into microinstructions 226 of the microarchitecture of the processing core 102. The microinstructions 226 typically have characteristics associated with reduced instruction set computer (RISC) architectures.

The instruction translator 294 provides microinstructions to the RAT 206 in program order. The RAT 206 allocates an entry for each microinstruction, in program order, in the reorder buffer of the retirement unit 214. The RAT 206 performs renaming of the registers 264. The RAT 206 provides the microinstructions to the reservation station 208, from which they are issued to the execution unit 212 and executed out of program order, as soon as the source operands of each microinstruction are available and the execution unit 212 is able to execute the microinstruction. The retirement unit 214 retires instructions to the architectural state of the processing core 102 in program order by retiring the results of the microinstructions executed by the execution unit 212. The execution units 212 may include load units, store units, integer units, floating-point units, branch units, single-instruction-multiple-data (SIMD) units, and the like. The load unit reads data from the level-1 (L1) data cache and the store unit writes data to the L1 data cache. The level-2 (L2) cache backs the L1 data cache and the instruction cache 202.

The instruction translator 294 receives blocks of the architectural instructions 242 from the instruction cache 202 of FIG. 2. The architectural instructions 242 are also referred to as macroinstructions 242 or ISA instructions 242. The instruction translator 294 translates the architectural instructions 242 into the implementing microinstructions 226 that are provided to the RAT 206. The instruction translator 294 includes a simple instruction translator (SIT) 204, a complex instruction translator (CIT) 296, and a second multiplexer 292. The simple instruction translator 204 outputs the first microinstructions 244 and the microcode address 252. The complex instruction translator 296, also referred to as a microcode unit 296, receives the microcode address 252 and provides the implementing second microinstructions 246. The second multiplexer 292 receives the first microinstructions 244 from the simple instruction translator 204 on one input and the second microinstructions 246 from the complex instruction translator 296 on another input, and provides the implementing microinstructions 226 to the execution unit 212 of FIG. 2 according to the select control input 248.

The complex instruction translator 296 includes a microsequencer 236, the core microcode ROM 104 of FIG. 1, an instruction indirect register (IIR) 235, a first multiplexer 222, a decompression unit 239, and a micro-translator 237. The microsequencer 236 receives the microcode address 252 and maintains a non-architectural microcode program counter (micro-PC) 232. The core microcode ROM 104 receives the non-architectural microcode access address 254, which is based on the non-architectural microcode program counter 232. The uncore ROM 108 also receives the non-architectural microcode access address 254, via the bus 112. The first multiplexer 222 receives the microcode instructions 251 from the non-shared core microcode ROM 104 on a first input and the microcode instructions 249 from the shared uncore ROM 108 (via the bus 112) on a second input, and outputs the first microcode instructions 247 based on the select control input 245 generated by the microsequencer 236. The decompression unit 239 receives the first microcode instructions 247 from the first multiplexer 222 and selectively decompresses them to generate the uncompressed microcode instructions 253. The micro-translator 237 translates the uncompressed microcode instructions 253 received from the decompression unit 239 to generate the second microinstructions 246 output by the complex instruction translator 296. The first microinstructions 244 generated by the simple instruction translator 204 and the second microinstructions 246 generated by the complex instruction translator 296 are both microinstructions 226 of the microinstruction set of the microarchitecture of the microprocessor 100, and are executed by the pipeline of the execution unit 212.

The second multiplexer 292 is controlled by the select control input 248. The second multiplexer 292 normally selects the first microinstructions 244 from the simple instruction translator 204; however, when the simple instruction translator 204 encounters a complex instruction 242 and transfers control to (or traps to) the complex instruction translator 296, the simple instruction translator 204 controls the select control input 248 such that the second multiplexer 292 selects the second microinstructions 246 from the complex instruction translator 296. When the register alias table 206 encounters a microinstruction 226 with a special bit set (denoted herein as the ".T" bit) indicating that it is the last microinstruction 226 in the sequence implementing the complex instruction 242, the register alias table 206 controls the select control input 248 such that the second multiplexer 292 returns to selecting the first microinstructions 244 from the simple instruction translator 204. In addition, when the retirement unit 214 is ready to retire a microinstruction 226 whose status indicates that it has caused an exception, the retirement unit 214 controls the select control input 248 such that the second multiplexer 292 selects the second microinstructions 246 from the complex instruction translator 296.
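
The selection behavior described above amounts to a two-state machine, which the following Python sketch models. It is a simplified illustration; the class name and the event names are hypothetical labels for the three conditions in the text, and the real select control input is driven by hardware signals rather than strings.

```python
class TranslatorSelect:
    """Tracks which translator the second multiplexer currently selects."""

    def __init__(self):
        # The simple instruction translator (SIT) is the normal source.
        self.selected = "SIT"

    def handle(self, event):
        """Update the selection for one event and return the new source."""
        if event in ("complex_instruction", "exception"):
            # SIT traps to the CIT, or the retirement unit reports an exception.
            self.selected = "CIT"
        elif event == "last_microinstruction":
            # The last microinstruction of the microcode sequence was seen.
            self.selected = "SIT"
        else:
            raise ValueError(f"unknown event: {event}")
        return self.selected
```

A typical sequence would be `complex_instruction` (switch to the CIT), followed later by `last_microinstruction` (switch back to the SIT).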

The simple instruction translator 204 receives and decodes the architectural instruction 242 to determine whether the architectural instruction 242 is a simple macroinstruction or a complex macroinstruction. A simple architectural instruction 242 is one for which the simple instruction translator 204 can issue all of the implementing microinstructions 226 that implement the architectural instruction 242; that is, the complex instruction translator 296 does not provide any implementing microinstructions for a simple architectural instruction 242. In contrast, a complex architectural instruction 242 requires the complex instruction translator 296 to provide at least some, if not all, of the implementing microinstructions 226. In one embodiment, for a subset of the architectural instructions 242, the simple instruction translator 204 issues a first portion of the microinstructions 244 that implement the architectural instruction 242 and then transfers control to the complex instruction translator 296, which issues the remaining microinstructions 246 that implement the architectural instruction 242. The second multiplexer 292 is controlled to first provide the implementing microinstructions 244 from the simple instruction translator 204 as microinstructions 226 to the execution units 212, and then to provide the implementing microinstructions 246 from the complex instruction translator 296 as microinstructions 226 to the execution units 212.
The simple instruction translator 204 knows the starting microcode addresses of the various microcode routines that the complex instruction translator 296 uses to generate the implementing microinstructions 226 for the various complex architectural instructions 242, and when the simple instruction translator 204 decodes a complex architectural instruction 242, it provides the associated microcode address 252 to the non-architectural microcode program counter 232 of the complex instruction translator 296. The simple instruction translator 204 issues all of the microinstructions 244 needed to implement a relatively large percentage of the architectural instructions 242, particularly architectural instructions 242 that tend to be executed frequently by ISA machine language programs, while only a relatively small percentage require the complex instruction translator 296 to provide second microinstructions 246. In one embodiment, the simple instruction translator 204 is a block of Boolean logic gates synthesized using well-known synthesis tools.

The complex instruction translator 296 outputs a sequence of second microinstructions 246 to the second multiplexer 292. The core microcode ROM 104 and the uncore ROM 108 store the selectively compressed microcode instructions 251 and 249, respectively, of the microcode routines. The core microcode ROM 104 and the uncore ROM 108 output the selectively compressed microcode instructions 251 and 249, respectively, in response to the non-architectural microcode access address 254, which is based on the value held in the non-architectural microcode program counter 232. Generally, the non-architectural microcode program counter 232 receives its initial value, the microcode address 252, from the simple instruction translator 204 in response to the simple instruction translator 204 decoding a complex architectural instruction 242. In other cases, such as in response to a reset or an exception, the non-architectural microcode program counter 232 receives the address of the reset microcode routine or of the appropriate microcode exception handler, respectively. Generally, the microsequencer 236 increments the non-architectural microcode program counter 232 by the size of a microcode instruction (which, according to one embodiment of the present invention, is the size of a word in the core microcode ROM 104 or the uncore ROM 108) in order to sequence through the microcode routine. Alternatively, the microsequencer 236 updates the non-architectural microcode program counter 232 with the target address 224 generated by the micro-translator 237 in response to decoding a control-type microcode instruction (e.g., a branch instruction), or with a target address generated by the execution unit 212 in response to executing a control-type microinstruction 226, in order to branch to a non-sequential location in the core microcode ROM 104 or the uncore ROM 108. The core microcode ROM 104 and the uncore ROM 108 are fabricated on the same semiconductor die as the microprocessor 100.
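
The ways in which the microcode program counter receives a new value can be summarized in a small Python function. This is a sketch under assumptions: the text lists the sources of a new value but not their relative priority, so the ordering below (reset, exception, branch target, new routine address, sequential increment) is hypothetical, as are the function and parameter names and the default increment of one word address.

```python
def next_micro_pc(micro_pc, *, reset_address=None, exception_address=None,
                  branch_target=None, routine_address=None, increment=1):
    """Return the next microcode program counter value.

    Each keyword argument models one source described in the text; a value
    of None means that source is inactive this cycle. The priority order
    used here is an assumption of this sketch.
    """
    for candidate in (reset_address, exception_address,
                      branch_target, routine_address):
        if candidate is not None:
            return candidate
    # Default: sequence to the next microcode word.
    return micro_pc + increment
```

For example, `next_micro_pc(0x100)` returns `0x101`, while `next_micro_pc(0x100, branch_target=0x200)` returns `0x200`.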

In addition to the first microinstructions 244 that implement a simple architectural instruction 242 or part of a complex architectural instruction 242, the simple instruction translator 204 also generates microinstruction information 255, which is written to an instruction indirect register (IIR) 235. The microinstruction information 255 stored in the instruction indirect register 235 includes information about the architectural instruction 242 being translated, such as the source and destination registers specified by the architectural instruction 242 and the format of the architectural instruction 242, for example whether the architectural instruction 242 operates on a memory operand or on an architectural register 264 of the microprocessor 100. This allows the microcode routines to be generic, i.e., a different microcode routine is not needed for each different source and/or destination architectural register 264. Specifically, the simple instruction translator 204 has knowledge of the registers 264 and, through the microinstruction information 255, translates the register information provided by the architectural instruction 242 to the appropriate register among the registers 264. The microinstruction information 255 also includes a displacement field, an immediate field, a constant field, renaming information for each source operand and for the microinstruction 226 itself, information indicating the first and last microinstructions of the sequence of microinstructions 226 that implements the architectural instruction 242, and other useful bits of information gathered by the simple instruction translator 204 when decoding the architectural instruction 242.

The micro-translator 237 receives the uncompressed micro code instructions 253 from the decompression unit 239 and the contents of the instruction indirect register 235, and generates the second micro instruction 246 in response. The micro-translator 237 translates certain uncompressed microcode instructions 253 to different sequences of micro-instructions 246 according to information received from the instruction indirect registers 235, such as according to the format of the architectural instructions 242 and the combination of source and/or target architecture registers 264 specified thereby. In some cases, most of the micro instruction information 255 is merged with the uncompressed micro code instructions 253 to generate the second micro instructions 246. In one embodiment, each uncompressed micro program code instruction 253 is 38 bits wide and each second micro instruction 246 is approximately 200 bits wide. In one embodiment, the micro-translator 237 is capable of generating up to three second micro instructions 246 from the uncompressed micro code instruction 253. The micro-translator 237 includes Boolean logic gates that generate the second microinstruction 246.

Because the simple instruction translator 204 generates the microinstruction information 255, the core microcode ROM 104 and the non-core ROM 108 do not need to store the information that the instruction indirect register 235 provides; the micro-translator 237 therefore offers the advantage of reducing the size of the core microcode ROM 104 and the non-core ROM 108. Furthermore, microcode routines may include fewer conditional branch instructions, because they need not include a separate routine for each different architectural instruction format and for each combination of source and/or destination architectural registers 264. For example, if the complex architectural instruction 242 is of the memory format, the simple instruction translator 204 may generate a prologue of first microinstructions 244 that includes a first microinstruction 244 to load the source operand from memory into a temporary register 264, and the micro-translator 237 may generate a second microinstruction 246 to store the result from the temporary register to memory; whereas if the complex architectural instruction 242 is of the register format, the prologue may move the source operand from the source register specified by the architectural instruction 242 to the temporary register 264, and the micro-translator 237 may generate a second microinstruction 246 to move the result from the temporary register to the architectural destination register 264 specified by the instruction indirect register 235. In one embodiment, the micro-translator 237 is similar in many respects to the micro-translator described in U.S. patent application No. 12/766,244, filed on April 23, 2010, which claims priority to U.S. provisional application No. 61/234,008, filed on August 14, 2009, and which was published on February 17, 2011 as U.S. publication No. US2011/0040953, each of which is hereby incorporated by reference in its entirety for all purposes.

In another embodiment, the instruction translator 294 does not include the micro-translator 237, and the second microcode instructions 251 and the microcode instructions 249 fetched from the core microcode ROM 104 and the non-core ROM 108 are selectively decompressed into microinstructions executable by the execution unit 212.

Note that the non-architectural microcode program counter 232 is distinct from the architectural program counter 218; that is, the non-architectural microcode program counter 232 does not hold the address of an architectural instruction 242, and the address held in the non-architectural microcode program counter 232 is not within the system memory address space.

As described above, the first microcode instructions 247 are non-architectural instructions stored in one or more of the core microcode ROM 104 and the non-core ROM 108 of the microprocessor 100. The processing core 102 fetches the first microcode instructions 247 according to the non-architectural microcode access address 254 held in the non-architectural microcode program counter 232 and uses them to implement the architectural instructions 242 of the instruction set architecture of the microprocessor 100. The uncompressed microcode instruction 253 is translated by the micro-translator 237 into the second microinstruction 246 for execution by the execution unit 212 or, in another embodiment of the invention, is executed directly by the execution unit 212 (in which case the uncompressed microcode instruction 253 is itself the second microinstruction 246). The uncompressed microcode instructions 253 are non-architectural in the sense that they are not instructions of the instruction set architecture (ISA) of the microprocessor 100, but are encoded according to an instruction set different from the architectural instruction set. Likewise, the non-architectural microcode program counter 232 is not defined by the instruction set architecture of the microprocessor 100 and is distinct from the architectural program counter 218. Microcode is used to implement some or all of the instructions of the instruction set architecture of the microprocessor, as follows. In response to decoding an architectural instruction 242, the microprocessor 100, and in particular the simple instruction translator 204, transfers control to the microcode program associated with that architectural instruction 242. The microcode program consists of microcode instructions. The execution unit 212 executes the uncompressed microcode instructions 253 or, according to the embodiment of FIG. 2, the uncompressed microcode instructions 253 are further translated into second microinstructions 246 that the execution unit 212 executes. The result of the execution of the uncompressed microcode instructions 253 (or of the second microinstructions 246 into which they are translated) by the execution unit 212 is the result defined by the architectural instruction 242. Thus, the architectural instruction 242 is implemented by the collective execution, by the execution unit 212, of the microcode program associated with it (or of the second microinstructions 246 into which the microcode program is translated); that is, the collective execution performs the operation specified by the architectural instruction 242 on the inputs specified by the architectural instruction 242 to produce the result defined by the architectural instruction 242. In addition, microcode instructions may be executed (or translated into microinstructions that are executed) when the microprocessor is reset, in order to configure the microprocessor.

According to one embodiment of the invention, the arbitration logic 106 of FIG. 1 includes a request queue (not shown) that holds requests received from the processing cores 102 to access the uncore ROM 108 or the uncore microcode patch RAM 114. According to one embodiment of the invention, each bus 112 between the arbitration logic 106 and a processing core 102 includes a request portion and a response portion. On the request portion, the processing core 102 specifies the non-architectural microcode access address 254 of the requested microcode instruction word. On the response portion, the arbitration logic 106 provides a microcode instruction word, an address, a core number, and a valid indicator. The microcode instruction word, the address, and the core number are valid only if the valid indicator so indicates. The core number identifies the processing core 102 to which the arbitration logic 106 is responding, namely the core that previously requested access to the uncore ROM 108 or the uncore microcode patch RAM 114. The address specifies the location in the uncore ROM 108 or the uncore microcode patch RAM 114 from which the microcode instruction word was fetched. According to one embodiment of the invention, the arbitration logic 106 asserts a stall signal on the bus 112 to a processing core 102 to indicate that it cannot accept any more requests from that processing core 102 to access the uncore ROM 108, and deasserts the stall signal as soon as it can accept requests again. According to one embodiment of the present invention, if the RAT 206 asserts a stall signal to the instruction translator 294 to indicate that it cannot accept any more microinstructions 226, the instruction translator 294 flushes any in-flight accesses to the uncore ROM 108. Once the RAT 206 deasserts the stall signal, the micro program 236 resumes fetching microcode instructions at the address following that of the last microinstruction 226 sent to the RAT 206. According to another embodiment of the present invention, the instruction translator 294 saves the state of its in-flight accesses to the uncore ROM 108 or the uncore microcode patch RAM 114 so that the associated microcode instructions need not be fetched again.

The access latency of the uncore ROM 108 is greater than that of each core microcode ROM 104. According to an embodiment of the present invention, the core microcode ROM 104 has an access latency of three cycles, whereas the access latency of the uncore ROM 108 is variable in embodiments in which its read port is shared by multiple processing cores 102.

Referring to the block diagram of FIG. 3, the decompression unit 239 of FIG. 2 is shown in greater detail according to an embodiment of the present invention. FIG. 3 also shows the patch content-addressable memory (CAM) 306. The patch CAM 306 holds patch addresses 308; when the non-architectural microcode access address 254 matches the contents of one of the entries of the patch CAM 306, the patch CAM 306 outputs the corresponding patch address 308 to the micro program 236. In this case, the micro program 236 outputs the patch address 308 as the non-architectural microcode access address 254 instead of the next sequential access address (or the target address 224), in response to which the uncore microcode patch RAM 114 outputs the patch microcode instruction 249 on the bus 112. As a result, the patch microcode instruction 249 is fetched from the uncore microcode patch RAM 114 in place of the unwanted microcode instruction 249 or second microcode instruction 251 that would otherwise be fetched from the uncore ROM 108 or the core microcode ROM 104, respectively. The contents of the patch CAM 306 and the uncore microcode patch RAM 114 are loaded in response to architectural instructions, such as those of system software, e.g., a basic input/output system (BIOS) or an operating system running on the microprocessor 100. The decompression unit 239 includes a decompressor 394, a buffer 398, a three-input multiplexer 392, and control logic 396.
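The patch lookup can be pictured as a small associative table. The following Python sketch is illustrative only: `resolve_fetch_address` and the dictionary `patch_cam` are hypothetical names that do not appear in the embodiment, and the sketch models only the address substitution, not the timing of the hardware.

```python
def resolve_fetch_address(access_address, patch_cam):
    """Model a patch CAM 306 match on a microcode access address.

    patch_cam is a hypothetical dict: matched address -> patch address 308.
    Returns (source, address), where source names the memory that is read.
    """
    if access_address in patch_cam:
        # CAM hit: the patch instruction is fetched from the patch RAM.
        return ("patch_ram", patch_cam[access_address])
    # CAM miss: the fetch goes to the ROM at the original address.
    return ("rom", access_address)
```

With this model, an address that has no CAM entry is fetched from ROM unchanged, while a patched address is redirected to the patch RAM location stored in the matching entry.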

The decompressor 394 receives the compressed first microcode instruction 247 from the first multiplexer 222 of FIG. 2, decompresses it into an uncompressed microcode instruction 353, and provides the uncompressed microcode instruction 353 to the first input of the three-input multiplexer 392. According to one embodiment of the invention, the decompressor 394 comprises a programmable logic array (PLA) synthesized from register transfer language (RTL) code, such as Verilog code, that is automatically generated by the programmable logic array generator 616 of FIG. 6. An example of how the decompressor 394 decompresses the compressed first microcode instructions 247 is described in detail below.

The buffer 398 receives bits [15:0] of the 28-bit first microcode instruction 247 and loads them when directed to do so by the control logic 396; otherwise, the buffer 398 retains its previous value. According to one embodiment of the present invention, during the clock cycle following the one in which bits [15:0] of the 28-bit first microcode instruction 247 are loaded into the buffer 398, the contents of the buffer 398 are concatenated with bits [21:0] of the current 28-bit microcode word 247 to form the 38-bit result 355 presented at the second input of the three-input multiplexer 392.

The control logic 396 receives bits [27:16] of the microcode word 247 and determines whether their value is the predetermined escape indicator value. According to an embodiment of the present invention, the predetermined escape indicator value is 0x3FF. If so, the control logic 396 controls the buffer 398 to load bits [15:0] of the 28-bit first microcode instruction 247. In addition, when the first multiplexer 222 provides the next 28-bit microcode word 247, the control logic 396 controls the three-input multiplexer 392 to select its second input to provide the uncompressed microcode instruction 253 to the micro-translator 237, i.e., to select the 38-bit result 355 formed by merging the 16-bit contents of the buffer 398 with bits [21:0] of the 28-bit microcode word 247. The next 28-bit microcode word 247 is the next word of the microcode instruction 249 or of the second microcode instruction 251 fetched from the uncore ROM 108 or the core microcode ROM 104, respectively, i.e., the word that follows the word 247 whose bits were loaded into the buffer 398.

According to another embodiment of the present invention, the decompression unit 239 receives, in the same clock cycle, both words of an uncompressed microcode instruction, including the word that contains the escape pointer. In this embodiment, the buffer 398 is omitted; the appropriate portions of the two adjacent words are combined within that clock cycle and provided to the second input of the three-input multiplexer 392, and the control logic 396 controls the three-input multiplexer 392 to select its second input.

The three-input multiplexer 392 receives at its third input a 38-bit microcode word from the bus 112, for example from the uncore microcode patch RAM 114. If the current source of microcode instructions is 38 bits wide, such as the uncore microcode patch RAM 114, the control logic 396 controls the three-input multiplexer 392 to select its third input (i.e., the 38-bit microcode word from the bus 112); otherwise, the control logic 396 controls the three-input multiplexer 392 to select either its first input or its second input. If the current source of microcode instructions is 28 bits wide, such as the uncore ROM 108 or the core microcode ROM 104, which store (in addition to compressed microcode instructions) the separated portions of uncompressed microcode words that must be merged, and if the previous word contained the escape pointer (i.e., the second escape pointer 432 of FIG. 4), the control logic 396 controls the three-input multiplexer 392 to select its second input (i.e., the 38-bit result 355). If the current source is a 28-bit memory storing compressed microcode instructions (in addition to the separated portions of uncompressed microcode words and the escape pointers), and the current word does not contain the escape pointer, the control logic 396 controls the three-input multiplexer 392 to select its first input (i.e., the 38-bit uncompressed microcode instruction 353 from the decompressor 394).
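The selection among the three multiplexer inputs can be summarized in a behavioral model. The Python sketch below is a simplification under stated assumptions, not the hardware of the embodiment: the class name, the `step` method, and the dictionary standing in for the decompressor 394 are hypothetical, and the sketch assumes that the 16 bits passed through uncompressed are the low 16 bits of the 38-bit instruction.

```python
ESCAPE = 0x3FF        # escape indicator value in bits [27:16] (one embodiment)
MASK16 = 0xFFFF
MASK22 = 0x3FFFFF


class DecompressionUnit:
    """Toy model of buffer 398, control logic 396 and multiplexer 392."""

    def __init__(self, table):
        # table: hypothetical stand-in for decompressor 394, mapping a
        # 12-bit compressed value to 22 uncompressed bits.
        self.table = table
        self.buffer = 0              # 16-bit buffer 398
        self.escape_pending = False  # previous word held the escape value

    def step(self, word, width=28):
        """Feed one fetched word; return a 38-bit instruction or None."""
        if width == 38:
            # Third input: a 38-bit source such as the patch RAM.
            return word
        if self.escape_pending:
            # Second input: buffered 16 bits merged with bits [21:0].
            self.escape_pending = False
            return ((word & MASK22) << 16) | self.buffer
        if (word >> 16) & 0xFFF == ESCAPE:
            # First word of an escaped pair: load the buffer, emit nothing.
            self.buffer = word & MASK16
            self.escape_pending = True
            return None
        # First input: decompress bits [27:16]; bits [15:0] pass through.
        return (self.table[(word >> 16) & 0xFFF] << 16) | (word & MASK16)
```

Feeding the model a word whose bits [27:16] equal 0x3FF produces no output for that cycle; the following word then yields the merged 38-bit instruction, which mirrors the one-cycle delay described for the buffer 398.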

According to one embodiment of the present invention, the 38-bit uncompressed microcode instruction 253 that the three-input multiplexer 392 provides to the micro-translator 237 has, in some instruction formats (typically those of non-immediate instructions), the following fields: a 13-bit opcode field, a 5-bit first source operand address field, a 5-bit second source operand address field, a 5-bit destination operand address field, a 4-bit size field specifying the operand size, a 4-bit field specifying how each 5-bit operand register field is to be decoded by the micro-translator 237, a one-bit ".T" field specifying whether the microcode instruction is the last instruction of the sequence of microcode instructions that implements an x86 instruction, and one extra bit. Other 38-bit uncompressed microcode instructions 253, typically immediate instructions, have a format that includes a 16-bit immediate field holding a 16-bit immediate value, such as an immediate operand or the target address of a jump instruction, together with a subset of the other fields described above, for example omitting the 5-bit second source operand field and using a smaller opcode field.
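The field widths of the non-immediate format add up to exactly 38 bits (13 + 5 + 5 + 5 + 4 + 4 + 1 + 1). The Python sketch below checks that sum and packs and unpacks the fields. The field names and, in particular, the order of the fields within the word are assumptions made for illustration; the text gives the widths but not the bit positions.

```python
# Field widths as listed in the text, most-significant field first.
# ASSUMPTION: the field order and the short names are illustrative only.
FIELDS = [
    ("opcode", 13), ("src1", 5), ("src2", 5), ("dst", 5),
    ("size", 4), ("reg_decode", 4), ("last", 1), ("extra", 1),
]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 38 for this format


def pack(fields):
    """Pack a dict of field values into one 38-bit integer."""
    value = 0
    for name, width in FIELDS:
        value = (value << width) | (fields[name] & ((1 << width) - 1))
    return value


def unpack(instr):
    """Split a 38-bit integer back into named fields."""
    out = {}
    shift = TOTAL_BITS
    for name, width in FIELDS:
        shift -= width
        out[name] = (instr >> shift) & ((1 << width) - 1)
    return out
```

Packing a set of field values and unpacking the result returns the same values, which is a quick way to confirm that the widths are mutually consistent.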

Referring now to FIG. 4, flowcharts are shown that illustrate the selective compression of microcode instructions by an assembly process. FIG. 4 includes two flowcharts: one shows an example of assembling a first microcode assembly language instruction 402 into a single-word compressed binary instruction 404, and the other shows an example of assembling a second microcode assembly language instruction 414, which is marked by a first escape pointer 412, into a multi-word uncompressed binary instruction 424 that includes an escape pointer.

The first flowchart illustrates the assembly of a first microcode assembly language instruction 402 into a compressed binary instruction 404 that is stored as a single word in a microcode memory, such as the core microcode ROM 104 or the non-core ROM 108. In the embodiment of FIG. 4, the single word is 28 bits wide, i.e., the width of the core microcode ROM 104 or the non-core ROM 108. When the single-word compressed binary instruction 404 is read from the core microcode ROM 104 or the non-core ROM 108, it is decompressed into an uncompressed microcode instruction by the decompressor 394 of FIG. 3. In the embodiment of FIG. 4, the first microcode assembly language instruction 402 adds the contents of the first general register R2 and the second general register R3 and writes the result to the third general register R4. In this embodiment, this is a microcode instruction for which the compression table contains an entry at assembly time, which allows the assembly program to compress it, as described in detail below.

The second flowchart illustrates the case in which the second microcode assembly language instruction 414, preceded by the first escape pointer 412, is assembled into a multi-word uncompressed binary instruction 424 that is split across two 28-bit words. The first word includes a second escape pointer 432 at a predetermined position within the word. The decompression unit 239 of FIG. 2 (in particular, the control logic 396) recognizes the second escape pointer 432 and responds by combining the remaining portion 434 of the word containing the second escape pointer 432 with a portion 438 of the next word from the core microcode ROM 104 or the non-core ROM 108. In one embodiment of the present invention, the predetermined value of the second escape pointer 432 is 0x3FF, and the predetermined position is bits [27:16] of the first word. However, the present invention is not limited to this embodiment, and other embodiments use different values and different bit positions. The first escape pointer 412 is a predetermined character string (e.g., "ESCAPE" as shown in FIG. 4) that the programmer inserts on a line of the microcode source code file before a microcode instruction, so that the assembly program does not compress the following microcode instruction but instead splits it into two words, each having the length of a compressed instruction, with the second escape pointer 432 at the beginning of the first word.

The lower 16 bits of the first word are the lower 16 bits 434 of the multi-word uncompressed binary instruction 424, and the lower 22 bits of the second word are the upper 22 bits 438 of the multi-word uncompressed binary instruction 424. When the control logic 396 detects the second escape pointer 432 at the beginning of the first word, the decompression unit 239 combines the lower 16 bits 434 and the upper 22 bits 438. In practice, the upper 6 bits 436 of the second word may all be zeros. In the embodiment of FIG. 4, the second microcode assembly language instruction 414 adds the contents of the second general register R3 and an architectural register (e.g., the x86 architectural register ESI) and writes the result to the third general register R4. In this embodiment, this microcode instruction has no entry in the compression table at assembly time, and it must therefore be preceded by the first escape pointer 412 to avoid an assembly error, as described in more detail below.
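Seen from the assembly side, an escaped instruction is laid out as two 28-bit words. The Python sketch below illustrates that layout for the embodiment with the value 0x3FF in bits [27:16]; the function names `split_escaped` and `join_escaped` are hypothetical and are not taken from the assembly program 504.

```python
ESCAPE = 0x3FF  # escape value placed in bits [27:16] of the first word


def split_escaped(instr38):
    """Split a 38-bit uncompressed instruction into two 28-bit words."""
    if not 0 <= instr38 < (1 << 38):
        raise ValueError("instruction does not fit in 38 bits")
    first = (ESCAPE << 16) | (instr38 & 0xFFFF)  # escape + lower 16 bits 434
    second = instr38 >> 16   # upper 6 bits 436 zero, then upper 22 bits 438
    return first, second


def join_escaped(first, second):
    """Rebuild the 38-bit instruction from the two stored words."""
    if (first >> 16) != ESCAPE:
        raise ValueError("first word does not carry the escape value")
    return ((second & 0x3FFFFF) << 16) | (first & 0xFFFF)
```

Because the second word carries only 22 significant bits, its upper 6 bits come out as zeros, which matches the remark about the upper 6 bits 436.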

Referring to the flowchart of FIG. 5, the process of creating selectively compressed microcode is illustrated. The flow begins with source code 502, which is developed by a microcode designer and may include the first escape pointer 412. The assembly program 504 receives the source code 502 and the compression table 514. In one embodiment, the compression table 514 is contained in a file generated by the compression program 512 described below. The assembly program 504 assembles the source code 502 using the compression table 514 to produce the selectively compressed binary microcode 506. The selectively compressed binary microcode 506 includes single-word compressed binary instructions (e.g., the single-word compressed binary instruction 404 of FIG. 4) and multi-word uncompressed binary instructions that include the second escape pointer 432 (e.g., the multi-word uncompressed binary instruction 424 of FIG. 4). The source code 502 typically comprises multiple source files, which the assembly program 504 assembles to produce the selectively compressed binary microcode 506.

The assembly program 504 also generates a list 508 of all the microcode instructions included in the selectively compressed binary microcode 506. In one embodiment, the list 508 of all microcode instructions is a human-readable listing that includes an entry for each microcode instruction in the selectively compressed binary microcode 506. For each microcode instruction, the entry specifies: (1) its address in the core microcode ROM 104 or the non-core ROM 108; (2) its uncompressed binary representation, e.g., its 38-bit binary representation; and (3) a representation similar to, but modified from, its assembly language representation, to facilitate the generation of the compression table 514 by the compression program 512. Depending on whether the microcode instruction is marked by the first escape pointer 412 in the source code 502, its 38-bit uncompressed binary representation is either split into the lower 16 bits 434 and the upper 22 bits 438 of the multi-word uncompressed binary instruction 424 of FIG. 4, or compressed into the single-word compressed binary instruction 404.

The compression program 512 receives the list 508 of all microcode instructions and generates a compression table 514 from it. The compression table 514 then serves as input to a subsequent instance of the assembly program 504, which assembles the source code 502 into selectively compressed binary microcode 506. Typically, the subsequent assembly is of new or modified source code 502. Alternatively, the subsequent assembly may be of the same source code 502 that was used to generate the compression table 514, for example when the compression table was initially empty.

The compression program 512 examines the list 508 of all microcode instructions and generates a list of unique instructions. For example, the list 508 of all microcode instructions may include multiple instances of an instruction that subtracts R1 from R2 and places the result in R3; when generating the unique instruction list, however, the compression program 512 treats these instances as a single unique microcode instruction. Microcode is highly compressible in part because, for many microcode instructions, multiple instances of the same instruction appear in the source code 502. The criteria for deciding that a microcode instruction is unique may vary with the compression method used in the various embodiments, as described in more detail below. In one embodiment, the compression program 512 is a program written in the Python language.

After generating the unique instruction list, the compression program 512 assigns a unique compressed value to each unique microcode instruction. The compression table 514 contains this one-to-one mapping of unique microcode instructions to unique compressed values. In a subsequent assembly instance, the assembly program 504 uses the mapping to compress each assembly language instruction of the source code 502 that is not marked by an escape pointer into a compressed instruction 404. The compressed value becomes the single-word compressed binary instruction 404 (or a part of it), and the decompressor 394 decompresses the compressed value into the uncompressed microcode instruction 353 (or a part of it). In one embodiment, only a portion of the 38-bit uncompressed binary representation is compressed, while the remaining bits are left uncompressed. In one embodiment, 22 bits of the 38-bit uncompressed binary representation are compressed into 12 bits of the 28-bit compressed binary instruction 404, and the remaining 16 bits of the 38-bit uncompressed binary representation pass through uncompressed into the other 16 bits of the 28-bit compressed binary instruction 404. The decompressor 394 performs the corresponding reverse process, as described below.
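The deduplication and value assignment can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the compression program 512 itself: it treats two instructions as the same when their upper 22 bits match, assumes the low 16 bits are the ones that pass through uncompressed, hands out compressed values sequentially, and skips the escape value 0x3FF. The function names are hypothetical.

```python
ESCAPE = 0x3FF  # reserved: must never be used as a compressed value


def build_compression_table(listing):
    """Map each unique upper-22-bit pattern to a unique 12-bit value.

    listing: iterable of 38-bit uncompressed instructions, playing the
    role of the list 508 of all microcode instructions.
    """
    table = {}
    next_value = 0
    for instr in listing:
        upper22 = instr >> 16
        if upper22 in table:
            continue                 # another instance of a known pattern
        if next_value == ESCAPE:
            next_value += 1          # keep the escape value unassigned
        if next_value > 0xFFF:
            raise ValueError("more unique patterns than 12 bits can encode")
        table[upper22] = next_value
        next_value += 1
    return table


def compress(instr, table):
    """Return the 28-bit single-word form; KeyError if there is no entry."""
    return (table[instr >> 16] << 16) | (instr & 0xFFFF)
```

An instruction whose pattern is missing from the table raises `KeyError` in this sketch, which corresponds to the situation in which the source code must mark the instruction with the first escape pointer 412.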

In one embodiment, the compression program 512 generates three compression tables 514: one that assigns unique compressed values to jump microcode instructions, one that assigns unique compressed values to immediate non-jump microcode instructions, and one that assigns unique compressed values to all other microcode instructions, referred to herein as "miscellaneous" microcode instructions. In one embodiment, the 28-bit compressed binary instruction 404 of a compressed jump instruction has a binary "1" in its leading bit (i.e., bit [27]), whereas the 28-bit compressed binary instructions 404 of immediate non-jump microcode instructions and of miscellaneous microcode instructions have a binary "0" in the leading bit. The 28-bit compressed binary instructions 404 of immediate non-jump microcode instructions have a value between 0x000 and 0x2FF in bits [26:16], and those of miscellaneous microcode instructions have a value between 0x300 and 0x7FF in bits [26:16]. In one embodiment, the immediate value of the immediate jump and immediate non-jump 28-bit compressed binary instructions 404 is located in bits [15:0] of the 28-bit microcode word 247.
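The three value ranges make it possible to tell the classes apart from the upper bits of a 28-bit word alone. The Python sketch below illustrates that decoding for the embodiment described; `classify_word` is a hypothetical name, and checking for the escape value 0x3FF in bits [27:16] first is an assumption about precedence, made because that value falls numerically inside the miscellaneous range.

```python
ESCAPE = 0x3FF  # escape value in bits [27:16] (one embodiment)


def classify_word(word28):
    """Classify a 28-bit microcode word by its upper bits."""
    if (word28 >> 16) & 0xFFF == ESCAPE:
        return "escape"                  # first word of an uncompressed pair
    if (word28 >> 27) & 1:
        return "jump"                    # leading bit [27] is 1
    value = (word28 >> 16) & 0x7FF       # bits [26:16]
    if value <= 0x2FF:
        return "immediate non-jump"      # 0x000 through 0x2FF
    return "miscellaneous"               # 0x300 through 0x7FF
```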

The compression program 512 also generates a decompressor description 516. The decompressor description 516 may be understood conceptually as the inverse of the compression table 514, i.e., it contains a one-to-one mapping of the unique compressed values of the compression table 514 to the binary representations (or portions thereof) of the unique uncompressed microcode instructions. As described above, a portion of the bits of the first microcode instruction 247 (16 bits in one embodiment) may be left uncompressed, so the decompressor description 516 need only map each unique 12-bit compressed value to the unique 22 bits of the uncompressed binary representation of the microcode instruction. The decompressor 394 combines the unique 22-bit portion with the 16-bit portion that passed through uncompressed to produce the 38-bit uncompressed microcode instruction 253 that is ultimately provided to the micro-translator 237. Thus, in one embodiment, the decompressor description 516 maps each unique 12-bit compressed value of the compression table 514 to a unique 22-bit value, and the mapping is used to manufacture or simulate a decompressor 394 that outputs the unique 22-bit value as part of the uncompressed microcode instruction 353 in response to receiving the corresponding unique 12-bit compressed value in bits [27:16] of the compressed first microcode instruction 247.
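Inverting the table and applying it to a compressed word can be sketched as follows. The Python code is illustrative only: the function names are hypothetical, the table is a plain dictionary from 22-bit patterns to 12-bit values, and the sketch assumes that the 22 decompressed bits form the upper part of the 38-bit instruction while bits [15:0] pass through.

```python
def build_decompressor_description(compression_table):
    """Invert a table of {22-bit pattern: 12-bit compressed value}."""
    description = {}
    for upper22, value12 in compression_table.items():
        if value12 in description:
            # Two patterns sharing one value could not be told apart.
            raise ValueError("compression table is not one-to-one")
        description[value12] = upper22
    return description


def decompress(word28, description):
    """Expand a 28-bit compressed word into a 38-bit instruction."""
    upper22 = description[(word28 >> 16) & 0xFFF]  # look up bits [27:16]
    return (upper22 << 16) | (word28 & 0xFFFF)     # bits [15:0] pass through
```

Decompressing with a description built from a different table would return a different 22-bit pattern for the same 12-bit value, which is why the table used for assembly and the description used to build the decompressor have to correspond.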

The logic generator 518 (e.g., the well-known Espresso logic minimizer program) converts the decompressor description 516 into a decompressor register transfer language file 524, which may be used to manufacture or simulate the decompressor 394. In one embodiment, the decompressor register transfer language file 524 may be synthesized into a programmable logic array that forms the decompressor 394.

Once an instance of the microprocessor 100, and in particular the hardware decompressor 394, has been manufactured or simulated using a given instance of the decompressor register transfer language file 524, any selectively compressed binary microcode 506 subsequently executed by that instance of the microprocessor 100 must be assembled by the assembly program 504 using the compression table 514 that corresponds to the decompressor description 516 from which the decompressor register transfer language file 524 was generated. Otherwise, the decompressor 394 may decompress the selectively compressed binary microcode 506 incorrectly.

For example, the selectively compressed binary microcode 506 executed by simulation software for the microprocessor 100 must be assembled by the assembly program 504 using the compression table 514 that corresponds to the decompressor description 516 from which the decompressor register transfer language file 524 used in the simulation software was generated. According to another embodiment, in which the patch microcode loaded into the uncore microcode patch RAM 114 is selectively compressed, the selectively compressed binary microcode 506 must be assembled by the assembly program 504 using the compression table 514 that corresponds to the decompressor description 516 from which was generated the decompressor register transfer language file 524 used to manufacture the instance of the microprocessor 100.

In another example, a microprocessor is fabricated in many semiconductor layers, and the core microcode ROM 104 and the uncore ROM 108 are typically fabricated in one of the last layers. This gives microcode developers an opportunity to continue developing microcode even after the earlier semiconductor layers (typically including the decompressor 394) have been fabricated. In this case, the assembly program 504 must assemble the microcode for the instance of the microprocessor 100 using the compression table 514 that corresponds to the decompressed description 516 from which the decompressor register transfer language file 524 was generated. This is particularly advantageous because, in some cases, it allows the microcode designer to continue developing the microcode for weeks after the hardware designers have finished. The source code 502 may then include new microcode instructions that are not in the list 508 of all microcode instructions that was used to generate the decompressor register transfer language file 524 for the instance of the microprocessor 100. In this case, the microcode designer needs to insert the first escape pointer 412 into the source code 502 before each new microcode instruction, as described below with respect to step 608 of FIG. 6.

The flowchart of FIG. 6 illustrates how the assembly program 504 assembles microcode. The flow begins at step 602.

In step 602, the assembly program 504 receives the source code 502 and the compression table 514 of FIG. 5. The assembly program 504 uses the compression table 514 to assemble the source code 502, which may include the escape pointer 412. On the first assembly run, the compression table 514 may be empty. It is noted that the microcode developer may not know whether he has inserted a new microcode instruction into the source code 502, that is, a microcode instruction for which the current compression table 514 has no correspondence. In this case, the assembly program 504 generates an associated error when it assembles the source code 502. Flow proceeds to step 604.

In step 604, if the assembly program 504 determines that an error occurred during assembly in step 602, flow proceeds to step 606; otherwise, flow proceeds to step 612. In particular, an assembly error may be caused by the compression table 514 not including a correspondence for a microcode instruction of the source code 502 that is not marked by the first escape pointer 412.

In step 606, the assembly program 504 outputs the list 508 of all microcode instructions in the source code 502, and flow proceeds to step 608.

In step 608, the first escape pointer 412 is inserted into the source code 502 in front of each microcode instruction for which the assembly program 504 generated an error because the compression table 514 does not include a correspondence. In one embodiment, the first escape pointer 412 is inserted by the microcode designer. In another embodiment, a program uses the error messages generated by the assembly program 504 to insert the first escape pointer 412 automatically. Flow then returns to step 602 to assemble the source code 502 again, and this repeats until no errors are generated. It should be noted that the compression table 514 is empty when the source code 502 is first assembled, in which case the assembly program 504 generates errors for all of the microcode instructions of the source code 502. However, because the assembly program 504 has generated the list 508 of all the microcode instructions, the compressor 512 may be run to generate the compression table 514, and the same source code 502 (without the first escape pointer 412 inserted) may then be reassembled to generate the selectively compressed binary microcode 506. That microcode may be executed by an instance of the microprocessor 100 that includes an instance of the decompressor 394 made from the decompressor register transfer language file 524, which was in turn generated from the decompressed description 516 produced by the same run of the compressor 512.

In step 612, the assembly program 504 outputs the list 508 of all the microcode instructions, which may be used in fabricating the microprocessor 100, as well as the selectively compressed binary microcode 506. Specifically, the selectively compressed binary microcode 506 includes single-word compressed binary instructions 404 and multi-word uncompressed binary instructions 424 located in the uncore ROM 108, the core microcode ROM 104, and/or the uncore microcode patch RAM 114. Flow ends at step 612.
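The flow of steps 602 through 612 can be sketched as a retry loop, here using the embodiment in which escape pointers are inserted automatically. This is a minimal sketch under stated assumptions, not the actual assembly program 504: the source code is modeled as a list of (38-bit instruction, escaped flag) pairs, the 22-bit mapped portion is assumed to occupy bits [37:16], and an escaped instruction is assumed to be emitted as an escape word carrying the 16 low bits followed by a word carrying the 22 high bits. All function names are hypothetical.

```python
ESCAPE = 0x3FF  # escape value in bits [27:16] (one embodiment)


def assemble(source, compression):
    """One assembly pass (step 602). Returns (words, indices_with_errors)."""
    words, errors = [], []
    for index, (instr38, escaped) in enumerate(source):
        upper22, low16 = instr38 >> 16, instr38 & 0xFFFF
        if escaped:
            # Multi-word uncompressed form: escape word, then the 22-bit word.
            words.append((ESCAPE << 16) | low16)
            words.append(upper22)
        elif upper22 in compression:
            # Single-word compressed form.
            words.append((compression[upper22] << 16) | low16)
        else:
            errors.append(index)  # no correspondence in the compression table
    return words, errors


def assemble_with_escapes(source, compression):
    """Repeat steps 602-608 until assembly succeeds, then finish (step 612)."""
    source = list(source)
    listing = sorted({instr38 for instr38, _ in source})  # list of all instructions
    while True:
        words, errors = assemble(source, compression)  # step 602
        if not errors:                                 # step 604
            return words, source, listing              # step 612
        for index in errors:                           # step 608
            instr38, _ = source[index]
            source[index] = (instr38, True)            # mark with an escape
```

With a compression table that knows only one of two instructions, the first is emitted as one 28-bit word and the second as two, at the cost of the extra word for every escaped instruction.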

Referring to FIG. 7, a flowchart is shown illustrating the operation of the complex instruction translator 296 of FIG. 2, and in particular the operation of the decompression unit 239 of FIG. 3. The flow begins at step 702.

In step 702, the control logic 396 of FIG. 3 receives and decodes bits [27:16] of the 28-bit compressed first microcode instruction 247 of FIG. 2. Flow continues to step 704.

In step 704, the control logic 396 determines whether the current 28-bit compressed first microcode instruction 247 decoded in step 702 includes an escape pointer. In one embodiment, the control logic 396 determines that the current 28-bit compressed first microcode instruction 247 includes an escape pointer if predetermined bits of the instruction are equal to a predetermined value. In one embodiment, the predetermined bits are bits [27:16] and the predetermined value is 0x3FF, although the embodiments are not limited to these bits or this value. If the current 28-bit compressed first microcode instruction 247 includes an escape pointer, flow proceeds to step 706; otherwise, flow proceeds to step 712.

In step 706, the control logic 396 controls the buffer 398 to load bits [15:0] of the 28-bit compressed first microcode instruction 247. Flow proceeds to step 708.

In step 708, when the next 28-bit compressed first microcode instruction 247 arrives (e.g., from the core microcode ROM 104 or the uncore ROM 108), the control logic 396 controls the three-port multiplexer 392 to select the 38-bit result 355, which combines the 16-bit output of the buffer 398 (i.e., bits [15:0] of the 28-bit compressed first microcode instruction 247 decoded in step 702) with bits [21:0] of the next 28-bit compressed first microcode instruction 247, and to output the 38-bit result 355 as the 38-bit uncompressed microcode instruction 253, which in this case was never compressed. In one embodiment, the next 28-bit compressed first microcode instruction 247 may not arrive for a relatively large number of clock cycles because of pipeline delays or contention with other processing cores 102 for use of the uncore ROM 108. Flow proceeds to step 714.

In step 712, the decompressor 394 decompresses the 28-bit compressed first microcode instruction 247 into a 38-bit uncompressed microcode instruction 353. As described above, in one embodiment, the decompressor 394 maps 12 bits of the 28-bit compressed first microcode instruction 247 to 22 bits of the 38-bit uncompressed microcode instruction 353 and passes the remaining 16 bits of the 28-bit compressed first microcode instruction 247 through without mapping them; the 16 passed-through bits are combined with the 22 mapped bits to generate the 38-bit uncompressed microcode instruction 353. In one embodiment, the decompressor 394 also includes a plurality of multiplexers that direct each of the 22 mapped bits and the 16 passed-through bits to its respective bit position in the 38-bit uncompressed microcode instruction 353 in response to selection signals generated by logic that decodes the 28-bit compressed first microcode instruction 247. For example, in one embodiment, the compressor 512 generates three compression tables 514, one each for immediate jump, immediate non-jump, and miscellaneous microcode instructions, and the multiplexers direct the 22 mapped bits and the 16 passed-through bits into the 38-bit uncompressed microcode instruction 353 according to which of the three types of microcode instruction is being decompressed. In one embodiment, in the case of an immediate-type microcode instruction, the multiplexers direct the 16 passed-through bits to the immediate field of the 38-bit uncompressed microcode instruction 353, even though the location of the immediate field within the uncompressed microcode instruction 353 differs between immediate jump and immediate non-jump instructions; in the case of a miscellaneous instruction, the multiplexers direct subsets of the 16 passed-through bits to fields and/or subfields of the uncompressed microcode instruction 353 other than the immediate field; and the multiplexers direct subsets of the 22 mapped bits to different fields and/or subfields of the 38-bit uncompressed microcode instruction 353 depending on which of the three types of compressed first microcode instruction 247 is being decompressed. The control logic 396 controls the three-port multiplexer 392 to select the 38-bit uncompressed microcode instruction 353 from the decompressor 394 and output it as the selectively decompressed 38-bit uncompressed microcode instruction 253, which in this case is the decompressed form of the current 28-bit compressed first microcode instruction 247 decoded in step 702. Flow proceeds to step 714.

In step 714, the micro-translator 237 translates the selectively decompressed 38-bit uncompressed microcode instruction 253 into a second micro-instruction 246, wherein the second micro-instruction 246 is executable by the execution unit 212 of the microprocessor 100. Flow ends at step 714.

FIG. 8 shows an embodiment in which the uncore ROM 108 is 28 bits wide and holds compressed microcode words, the core microcode ROM 104 is 28 bits wide and holds compressed microcode words, and the uncore microcode patch RAM 114 is 38 bits wide and holds uncompressed microcode words. In another embodiment, the uncore ROM 108 is 38 bits wide and holds uncompressed microcode words instead of compressed microcode words, as shown in FIGS. 9, 11, and 13, in which case the three-port multiplexer 392 may receive the 38-bit microcode words from the uncore ROM 108 on one input. In another embodiment, the core microcode ROM 104 is 38 bits wide and holds uncompressed microcode words instead of compressed microcode words, as shown in FIG. 10, in which case the three-port multiplexer 392 may receive the 38-bit microcode words from the core microcode ROM 104 on one input. In another embodiment, the uncore microcode patch RAM 114 is 28 bits wide and holds compressed microcode words rather than uncompressed microcode words, as shown in FIG. 9, in which case the 28-bit microcode words from the uncore microcode patch RAM 114 may be provided to one input of the first multiplexer 222 and, when selected, provided to the three-port multiplexer 392, the decompressor 394, the control logic 396, and the buffer 398. Furthermore, in another embodiment, each processing core 102 includes a core patch RAM 1299, which functions like the uncore microcode patch RAM 114 except that it is not shared by multiple processing cores 102; instead, each core patch RAM 1299 is dedicated to its respective processing core 102, as shown in FIGS. 12 and 13. In the embodiment of FIG. 12, the core patch RAM 1299 is 38 bits wide and holds uncompressed microcode words, in which case the three-port multiplexer 392 may receive the 38-bit microcode words from the core patch RAM 1299 on one input. In the embodiment of FIG. 13, the core patch RAM 1299 is 28 bits wide and holds compressed microcode words, in which case the first multiplexer 222 may receive the 28-bit microcode words from the core patch RAM 1299 on one input. As described above, in each embodiment in which a microcode memory is 28 bits wide and holds compressed microcode words, the microcode memory may also hold multi-word uncompressed binary instructions 424, each of which is divided into two 28-bit words.

Another advantage of the ability to selectively compress microcode as described herein is that, as storing microcode in a programmable non-volatile memory of a microprocessor becomes more commercially viable, the microcode may continue to be developed until the time at which the microcode memory of a microprocessor 100 part is programmed. Further, if the part's memory can be programmed in the field, such as by a user or field technician, the user or technician can also reprogram the part in the field to fix errors. In either case, the new source code 502 may have to include escape pointers because the hardware decompressor 394 is already fixed at that point.

Although the microcode words and memories are described as having particular widths in the various embodiments, these widths are given by way of example, and other embodiments may include microcode words and memories of different widths. Furthermore, although embodiments are described herein in which compressed microcode instructions have a particular width and uncompressed microcode instructions have a particular width, these embodiments are described by way of example, and other embodiments may have different widths for the respective compressed and uncompressed microcode instructions. Furthermore, although embodiments described herein use selectively compressed microcode instructions in a multi-core processor, other embodiments include a single-core microprocessor that uses selectively compressed microcode and includes a microcode memory holding both compressed and uncompressed microcode instructions. Finally, although the embodiments described herein have a particular correspondence between uncompressed binary representations and compressed binary representations, other embodiments may have a different correspondence, as required by a different set of microcode instructions. In particular, the number of bits is somewhat dependent on the acceptable delay of the decompression hardware.

While various embodiments of the present invention have been described herein, it should be understood that they have been presented by way of example only, and not limitation. It will be understood by those skilled in the computer-related art that various changes in form and detail may be made therein without departing from the scope of the invention. For example, software can enable the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished through the use of general programming languages (e.g., C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, and the like, or other available programs. Such software may be disposed on any known computer-readable medium such as magnetic tape, semiconductor, magnetic disk, or optical disc (e.g., CD-ROM, DVD-ROM, etc.), or on a network, wired, wireless, or other communication medium. Embodiments of the apparatus and methods described herein may be included in a semiconductor intellectual property core, such as a microprocessor processing core (e.g., embodied or specified in a hardware description language), and transformed into hardware in the production of integrated circuits. Furthermore, the apparatus and methods described herein may be embodied as a combination of hardware and software. Accordingly, the present invention should not be limited to any of the embodiments described herein, but should be defined only in accordance with the scope of the appended claims and their equivalents. In particular, the present invention may be implemented in a microprocessor apparatus, wherein the microprocessor apparatus may be used in a general purpose computer.
Finally, those skilled in the art should appreciate that they can readily use the disclosed conception and specific embodiment as a basis for designing or modifying other structures for carrying out the same purposes of the present invention without departing from the scope of the invention as defined by the appended claims.

Claims

1. A method for converting instructions of an architectural instruction set into microcode instructions, comprising:

maintaining a plurality of microcode instructions in at least one of a plurality of memories, wherein at least a portion of the microcode instructions are compressed;

accessing one or more microcode instructions from the at least one of the plurality of memories in response to receiving an architectural instruction;

decompressing the compressed microcode instructions before they are executed; and

providing a plurality of sets of microcode words through said at least one of said plurality of memories in response to a microcode access address input, wherein said plurality of sets of microcode words include at least a portion of compressed microcode instructions,

wherein, the step of decompressing the compressed micro program code instruction comprises:

determining whether the set of microcode words accessed from the at least one of the plurality of memories includes a first portion of compressed microcode instructions or uncompressed microcode instructions, wherein the uncompressed microcode instructions further have a second portion contained in an adjacent word; and

decompressing said compressed microcode instructions into a plurality of uncompressed microcode instructions, or merging said first portion and said second portion of said uncompressed microcode instructions into a plurality of uncompressed microcode instructions;

wherein the method further comprises:

receiving a first N-bit wide microprogram codeword from a memory;

determining whether the predetermined portion of the first N-bit-wide microprogram codeword is a predetermined value;

if the predetermined portion is not the predetermined value, decompressing the first N-bit-wide micro-program code word to generate an M-bit-wide micro-program code word, wherein M and N are integers greater than zero, and M is greater than N; and

if the predetermined portion is the predetermined value, receiving a second N-bit wide microprogram codeword from the memory, and combining portions of the first N-bit wide microprogram codeword and the second N-bit wide microprogram codeword to generate the M-bit wide microprogram codeword;

wherein if the predetermined portion is not the predetermined value, decompressing the first N-bit-wide micro-program codeword to generate an M-bit-wide micro-program codeword, comprising:

decompressing K bits of the first N-bit wide microprogram codeword set and skipping (N-K) bits of the first N-bit wide microprogram codeword set, wherein K is an integer greater than zero and N is greater than K;

wherein the step of decompressing the K bits of the first N-bit wide microprogram codeword set comprises:

outputting a plurality of unique L-bit values in response to a plurality of unique values of the predetermined K bits contained in the first N-bit wide microprogram codeword according to a predetermined correspondence, wherein L is an integer greater than zero and less than M.

2. The method of claim 1, wherein determining whether the set of microcode words accessed from the at least one of the plurality of memories includes a first portion of a compressed microcode instruction or an uncompressed microcode instruction comprises: determining whether the predetermined portion of the micro program code word is a predetermined value.

3. The method of claim 1, wherein a first memory of the plurality of memories is configured to hold a plurality of compressed microcode instructions, and wherein a second memory of the plurality of memories is configured to hold one or more uncompressed microcode instructions that patch the compressed microcode instructions provided by the first memory.

4. The method of claim 1, further comprising:

maintaining a portion of the compressed microcode instructions in a core microcode memory of each of a plurality of processing cores, each core microcode memory being one of the plurality of memories; and

maintaining a portion of the compressed microcode instructions in an uncore microcode memory that is shared by the plurality of processing cores and is one of the plurality of memories.