WO2013101147A1 - Configurable reduced instruction set core
- Publication number: WO2013101147A1
- Application number: PCT/US2011/068016
- Authority: WO, WIPO (PCT)
Classifications
- G06F9/30076—Arrangements for executing specific machine instructions to perform miscellaneous control operations, e.g. NOP
- G06F1/32—Means for saving power
- G06F9/30196—Instruction operation extension or modification using decoder, e.g. decoder per instruction set, adaptable or programmable decoders
- G06F9/3822—Parallel decoding, e.g. parallel decode units
- G06F9/3891—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units controlled by multiple instructions, e.g. MIMD, decoupled access or execute organised in groups of units sharing resources, e.g. clusters
Definitions
- some selectable features are performance oriented, as is the case with bypasses, branch predictors and multipliers, and others are functionality or feature oriented, such as those related to caches, memory protection units and memory management units.
- a core configuration sequence 60 may be implemented in software, hardware and/or firmware.
- software and firmware embodiments it may be implemented by computer executed instructions stored in a non-transitory computer readable medium such as an optical, magnetic or semiconductor storage.
- the sequence 60 begins by displaying selectable cache options for a partial core design as indicated in block 62. Once the user makes a selection, as indicated in diamond 64, the option is set as indicated in block 66, meaning that it will be recorded and ultimately be implemented into the necessary code without further user action in some embodiments. If a selection is not made, the flow simply awaits the selection.
- Next branch prediction options may be displayed as indicated in block 68 followed by a selection check at diamond 70 and an option set stage at block 72.
- pipeline bypass options may be displayed (block 74) followed by selection at diamond 76 and option setting at block 78.
- multiplier options may be displayed as indicated at block 80. This may again be followed by a selection decision at diamond 82 and option setting at block 84.
- a system 90 for implementing one embodiment of the present invention may include a processor 92 coupled to a code database 94, an RTL engine 96, a display driver 100 and a software code generator 98.
- Code database 94 stores the database of codes for the different selectable options.
- the RTL engine 96 includes the ability to generate RTL code in response to user selections.
- the software code generator generates the necessary software code to implement the user selections.
- the display driver 100 drives the display 104 and includes software for generating the graphical user interface (GUI) 102 in one embodiment that provides user selectability of various defined options.
- GUI graphical user interface
- references throughout this specification to "one embodiment” or “an embodiment” mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase “one embodiment” or “in an embodiment” are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in other suitable forms other than the particular embodiment illustrated and all such forms may be encompassed within the claims of the present application.
Abstract
A processor may be built with cores that execute only a partial set of the instructions needed to be fully backwards compliant. Thus, in some embodiments power consumption may be reduced by providing partial cores that only execute certain instructions and not other instructions. The instructions not supported may be handled in other, more energy efficient ways, so that the overall processor, including the partial core, may be fully backwards compliant.
Description
CONFIGURABLE REDUCED INSTRUCTION SET CORE
Background
[0001] This relates generally to computing and particularly to processing.
[0002] In order to be compatible with previous generations of processors, a subsequent generation generally includes support for legacy features. Over time, some of these legacy features become less and less commonly used since developers tend to revise their programs to work with the most current instruction sets. As time goes on, the number of legacy instructions that need to be supported continually increases. Nonetheless these legacy instructions may be executed less and less often.
Brief Description Of The Drawings
[0003] Some embodiments are described with respect to the following figures:
Figure 1 is a flow chart for one embodiment of the present invention;
Figure 2 is a schematic depiction of one embodiment of the present invention;
Figure 3 is a flow chart for another embodiment of the present invention;
Figure 4 is a flow chart for still another embodiment of the present invention;
Figure 5 is a hardware depiction for yet another embodiment of the present invention;
Figure 6 is a flow chart for another embodiment; and
Figure 7 is a schematic depiction of one embodiment.
Detailed Description
[0004] In accordance with some embodiments, a processor may be built with a partial core that only executes a partial set of the total instructions, by eliminating some instructions needed to be fully backwards compliant. Thus, in some embodiments power consumption may be reduced by providing partial cores that only execute certain instructions and not other instructions needed to be backwards compliant. The instructions not supported may be handled in other, more energy efficient ways, so that the overall processor, including the partial core, may be fully backwards compliant. But the processor core may operate on the bulk of the instructions that are used in current generations of processors without having to support legacy instructions. This may mean that, in some cases, the partial core processors may be more energy efficient.
[0005] For example, a partial core may eliminate a variety of different instructions. In one embodiment, a partial core may eliminate microcode read-only memory dependencies. In such case, the partial core instructions are implemented as a single operation instruction. Thus, the instructions get directly translated in hardware without needing to fetch corresponding micro-operations from the microcode read-only memory as is commonly done with complete or non-partial processors. This may save a significant amount of microcode read-only memory space.
[0006] In addition, only a subset of those instructions that are available on complete cores are actually used by modern compilers. As a result of architecture evolution over the last couple of decades, commercial instruction set architectures have many obsolete or non-useful instructions that can be eliminated for efficiency but at the cost of some lack of backwards compatibility.
[0007] Features from previous generations, like 16-bit real mode from the Microsoft Disk Operating System (DOS) days, the segmentation based memory protection architecture, and local and global descriptor tables, are being carried forward for backward compatibility reasons. But most modern operating systems do not need or use these features anymore. Thus, in some embodiments these features may simply be eliminated from partial cores.
[0008] Thus, in one embodiment, the partial core may be legacy-free or non-backwards compliant. This may make the core more energy efficient and particularly suitable for embedded applications. Other examples may include reducing the number of floating point and single-instruction multiple data instructions as well as support for caches. Only integer and scalar instruction set architecture subsets may be implemented in one embodiment of a partial core. The same idea can be extended to floating point and vector (single instruction multiple data) instruction sets as well as to features typically implemented by full cores. The partial core is simply an implementation of a subset architecture that in some embodiments may be targeted to embedded applications. Other implementations of a subset architecture include different numbers of pipeline stages and other performance features, like out-of-order execution, superscalar execution and caches, to make these partial cores suitable for particular market segments such as personal computers, tablets or servers.
[0009] Thus, referring to Figure 1, an instruction memory 12 provides instructions to an instruction fetch unit 14 in a pipeline 10. Those instructions are then decoded at the decode unit 16. Operand fetch 18 fetches operands from a data memory 24 for execution at execute unit 20. The data is then written back to the data memory 24 at write-back 22.
[0010] In order to achieve full backwards compatibility, unsupported instructions may be handled in different ways. According to one embodiment, shown in Figure 2, a full decoder 16 may be provided in the pipeline 10. This decoder, at the time of full instruction decoding, detects unimplemented instructions and invokes pre-built handlers 34 in execute unit 20 for those instructions. These pre-built handlers are dedicated designs that handle a particular instruction or instruction type. These pre-built handlers can be software or hardware based.
[0011] This approach may use a full-blown or complete decoder that speeds up detection of unsupported instructions and execution of execution handlers. These pre-built handlers can be software or hardware based.
[0012] This full-blown decoder speeds up detection of unsupported instructions and execution of execution handlers. The decoder may be divided into two parts. One part decodes commonly executed instructions and the second part decodes less frequently used instructions.
[0013] Thus, referring to Figure 2, the instructions are received by decode unit 16. In this embodiment, the decode unit 16 may include an instruction parser 26 that detects which instructions are supported by the partial core 32 (which may be described as commonly executed instructions) and which instructions are not supported (which may be called less commonly or uncommonly executed instructions). The instructions that are supported by the partial core are decoded by a commonly executed decoder 28 and passed to the partial core 32. Instructions that are uncommonly executed or unsupported are decoded by the decoder 30 and handled by pre-built handlers 34 in the execute unit 20 in one embodiment.
[0014] In some embodiments, a sequence 36, shown in Figure 3, may be implemented in software, firmware and/or hardware. In software and firmware embodiments the sequence may be implemented by computer executed instructions stored in a non-transitory computer readable medium such as an optical, semiconductor or magnetic storage.
[0015] The sequence 36, shown in Figure 3, begins by parsing the instructions as indicated in block 38. Namely, the instructions may be parsed by identifying which instructions are supported by the partial core and which are not. In one embodiment the supported instructions are the commonly executed instructions. In other embodiments, particular instructions may be parsed out because they are ones that are supported by the partial core.
[0016] As indicated in block 40 the instructions of one type are sent to the first (commonly executed) decoder 28 and instructions of the second type are sent to the second 41 (uncommonly executed) decoder 30. Then the decoded instructions of the first type are sent to the partial core and the decoded instructions of the second type are sent to the prebuilt handlers 34 as shown in block 42.
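The parse, decode and dispatch steps of blocks 38 to 42 can be illustrated with a minimal software sketch. It is only a model under stated assumptions: the `SUPPORTED` set is a small hypothetical subset of mnemonics, and the names `parse`, `decode_common`, `decode_uncommon` and `dispatch` are invented for illustration and do not come from the specification.

```python
from dataclasses import dataclass

# Hypothetical subset of mnemonics implemented by the partial core (illustrative only).
SUPPORTED = frozenset({"mov", "add", "sub", "cmp", "jmp", "and", "or", "xor"})


@dataclass(frozen=True)
class Decoded:
    mnemonic: str
    target: str  # "partial_core" or "prebuilt_handler"


def parse(mnemonic: str) -> bool:
    """Block 38: report whether the partial core supports the instruction."""
    return mnemonic.lower() in SUPPORTED


def decode_common(mnemonic: str) -> Decoded:
    """Stand-in for the commonly executed decoder 28."""
    return Decoded(mnemonic.lower(), "partial_core")


def decode_uncommon(mnemonic: str) -> Decoded:
    """Stand-in for the uncommonly executed decoder 30."""
    return Decoded(mnemonic.lower(), "prebuilt_handler")


def dispatch(stream):
    """Blocks 40 and 42: route each instruction to a decoder, then to its target."""
    return [decode_common(m) if parse(m) else decode_uncommon(m) for m in stream]
```

For example, `dispatch(["mov", "daa"])` routes `mov` to the partial core and `daa` to a pre-built handler in this model.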
[0017] According to another embodiment, a core may generate an undefined instruction exception. This may be an existing exception or a newly defined special exception. The exception may be generated when an instruction is encountered that is unsupported by the partial core. Then a software or binary translation layer may get control of execution or resolve the exception. For example, in one embodiment the binary translation layer may execute a handler program that emulates the unsupported instruction.
[0018] In some embodiments, a hybrid of this approach and the previously described approach, shown in Figures 2 and 3, may be used. Thus, referring to Figure 4, a sequence 44 may be implemented in software, firmware and/or hardware. In software and firmware embodiments the sequence may be implemented by computer executed instructions stored on a non-transitory computer readable medium such as a magnetic, optical or semiconductor storage.
[0019] The sequence 44 begins by determining whether the instruction is supported as indicated in diamond 46. If so, the instruction may be executed in the partial core as indicated in block 48. Otherwise an exception is issued as indicated in block 50.
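The exception path of sequence 44 can be sketched in software as follows. This is a simplified model, not the claimed implementation: the `UndefinedInstruction` class, the two-instruction toy core, the register dictionary and the `emulate_bsf` handler are all assumptions made for illustration. The handler emulates a bit scan forward using only shifts and tests, standing in for a software layer that resolves the exception.

```python
class UndefinedInstruction(Exception):
    """Hypothetical exception raised for instructions the partial core lacks."""

    def __init__(self, mnemonic, operands):
        super().__init__(mnemonic)
        self.mnemonic = mnemonic
        self.operands = operands


def partial_core_execute(mnemonic, operands, regs):
    """Diamond 46 / block 48: execute if supported, else raise (block 50)."""
    if mnemonic == "mov":          # mov reg, immediate
        dst, value = operands
        regs[dst] = value & 0xFFFFFFFF
    elif mnemonic == "add":        # add reg, immediate
        dst, value = operands
        regs[dst] = (regs[dst] + value) & 0xFFFFFFFF
    else:
        raise UndefinedInstruction(mnemonic, operands)


def emulate_bsf(operands, regs):
    """Software handler: index of the lowest set bit of the source register."""
    dst, src = operands
    value = regs[src]
    if value == 0:
        return  # this sketch leaves the destination unchanged for a zero source
    index = 0
    while (value & 1) == 0:
        value >>= 1
        index += 1
    regs[dst] = index


HANDLERS = {"bsf": emulate_bsf}


def run(program, regs):
    """Execute on the partial core; resolve exceptions in the software layer."""
    for mnemonic, operands in program:
        try:
            partial_core_execute(mnemonic, operands, regs)
        except UndefinedInstruction as exc:
            handler = HANDLERS.get(exc.mnemonic)
            if handler is None:
                raise
            handler(exc.operands, regs)
    return regs
```

An instruction with no registered handler propagates the exception, which corresponds to a genuinely undefined instruction in this model.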
[0020] In accordance with yet another embodiment, a processor may have one or two cores that include the full and complete instruction set and some number of partial cores that only implement a certain portion of the complete instruction set, such as commonly executed features. Whenever a partial core comes across an unsupported instruction, the partial core transfers that task to one of the complete cores. The complete core in the mixed or heterogeneous environment can be hidden or exposed to operating systems. In some embodiments, this approach does not involve any binary translation layer, either software or hardware, and differences in core features can be hidden from the operating system in other software layers.
[0021] Thus, referring to Figure 5, the architecture may include at least one complete core 51 and at least one partial core 52. Instructions are checked by the partial core 52. If the instructions are unsupported, then they are transferred to the complete core 51. Other cases where instructions are transferred may also be contemplated.
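The transfer from partial core 52 to complete core 51 can be modelled with a short sketch. The class names, the per-instruction granularity of the transfer and the `transferred` counter are assumptions for illustration only; an actual design might instead migrate the whole task, as paragraph [0020] describes.

```python
class CompleteCore:
    """Models a core implementing the full instruction set (accepts anything)."""

    def __init__(self):
        self.executed = []

    def execute(self, mnemonic):
        self.executed.append(mnemonic)


class PartialCore:
    """Models a core that checks each instruction and hands off unsupported ones."""

    def __init__(self, supported, complete_core):
        self.supported = frozenset(supported)
        self.complete_core = complete_core
        self.executed = []
        self.transferred = 0

    def execute(self, mnemonic):
        if mnemonic in self.supported:
            self.executed.append(mnemonic)
        else:
            # Unsupported: transfer to the complete core.
            self.transferred += 1
            self.complete_core.execute(mnemonic)


def run_on_partial(stream, partial_core):
    """Feed an instruction stream to the partial core."""
    for mnemonic in stream:
        partial_core.execute(mnemonic)
```

Counting transfers gives a rough idea of how often the complete core would need to be woken for a given workload.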
[0022] In accordance with one embodiment of a partial core processor, the following instructions may be supported:
Data Transfer
bswap, xchg, xadd, cmpxchg, mov, push, pop, movsx, movzx, cbw, cwd, cmovcc
Arithmetic
add, adc, sub, sbb, imul, mul, idiv, div, inc, dec, neg, cmp
Logical
and, or, xor, not
Shift and Rotate
sar, shr, sal, shl, ror, rol, rcr, rcl
Bit and Byte
bt, bts, btr, btc, test
Control Transfer
jmp, jcc, call, ret, iret, int, into
Flag Control
stc, clc, cmc, pushf, popf, sti, cli
Miscellaneous
lea, nop, ud2
System
lidt, lock, sidt, hlt, rdmsr, wrmsr
[0023] The following instructions may not be supported in accordance with one embodiment:
Data Transfer
cmpxchg8b, pusha, popa
Decimal Arithmetic
daa, das, aaa, aas, aam, aad
Shift and Rotate
shrd, shld
Bit and Byte
setcc, bound, bsf, bsr
Control Transfer
enter, leave
String
movsb, movsw, movsd, cmpsb, cmpsw, cmpsd, scasb, scasw, scasd, lodsb, lodsw, lodsd, stosb, stosw, stosd, rep, repz, repnz
I/O
in, out, insb, insw, insd, outsb, outsw, outsd
Flag Control
cld, std, lahf, sahf
Segment Register
lds, les, lfs, lgs, lss
Miscellaneous
xlat, cpuid, movbe
System
lgdt, sgdt, lldt, sldt, ltr, str, lmsw, smsw, clts, arpl, lar, lsl, verr, verw, invd, wbinvd, invlpg, rsm, rdpmc, rdtscp, sysenter, sysexit, xsave, xrstor, xgetbv, xsetbv
[0024] In some embodiments, a configurable partial core may be produced with the appropriate circuit elements and software. In one embodiment, the user can enter selections in response to graphical user interfaces. Then the system automatically generates the register transfer level (RTL) code and software to implement a partial core with those features. In some embodiments, the instruction set is predefined and further configurability may be offered. In other embodiments, a system may enable the user to manually implement configuration selections. As an example, one system may permit configuration of caches, branch predictors, pipeline bypasses, and multipliers.
[0025] For example, in one embodiment, a cache configuration may be set by default with tightly coupled data and instruction caches. The options that may be selected include split data and instruction caches and selectable cache parameters, such as cache size, line size, associativity, and error correction code.
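The selectable cache parameters determine the cache geometry. The following sketch shows how the number of sets and the address field widths could be derived from a chosen size, line size and associativity. The `CacheConfig` name, the power-of-two restriction and the 32-bit default address width are assumptions for illustration and are not stated in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheConfig:
    size_bytes: int
    line_bytes: int
    associativity: int
    ecc: bool = False          # error correction code option
    address_bits: int = 32     # assumed address width

    def __post_init__(self):
        # Assumption: all three parameters are positive powers of two.
        for name in ("size_bytes", "line_bytes", "associativity"):
            value = getattr(self, name)
            if value <= 0 or value & (value - 1):
                raise ValueError(f"{name} must be a positive power of two")
        if self.line_bytes * self.associativity > self.size_bytes:
            raise ValueError("cache too small for one set")

    @property
    def sets(self) -> int:
        return self.size_bytes // (self.line_bytes * self.associativity)

    @property
    def offset_bits(self) -> int:
        return self.line_bytes.bit_length() - 1

    @property
    def index_bits(self) -> int:
        return self.sets.bit_length() - 1

    @property
    def tag_bits(self) -> int:
        return self.address_bits - self.index_bits - self.offset_bits
```

With an 8 KB, two-way cache and 32-byte lines, this yields 128 sets, 5 offset bits, 7 index bits and 20 tag bits.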
[0026] Branch predictors may be set by default using the always not-taken approach to conditional branching. Selectable options, in some embodiments, may include backwards taken and forwards not-taken, branch target buffers of two, four, eight or sixteen entries, full scale G-share based, or a predictor with a configurable number of entries.
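The difference between the default always not-taken approach and the backwards taken, forwards not-taken option can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation of any embodiment; the function names and the example trace are hypothetical.

```python
def predict_always_not_taken(pc, target):
    """Default approach: every conditional branch is predicted not taken."""
    return False


def predict_btfn(pc, target):
    """Backwards taken, forwards not-taken: a branch whose target lies at a
    lower address than the branch itself is predicted taken."""
    return target < pc


def accuracy(predictor, trace):
    """Fraction of (pc, target, taken) records the predictor gets right."""
    hits = sum(predictor(pc, target) == taken for pc, target, taken in trace)
    return hits / len(trace)


# Hypothetical trace: a loop-closing branch at 0x120 jumping back to 0x100,
# taken nine times and then falling through once.
LOOP_TRACE = [(0x120, 0x100, True)] * 9 + [(0x120, 0x100, False)]
```

On this trace the backwards taken rule is right nine times out of ten, whereas always not-taken is right only on the final fall-through, which is why loop-heavy code tends to favor the former.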
[0027] A set of default pipeline bypasses may be selectively deactivated in one embodiment. Default bypasses allow users to trade off performance for higher frequency but at the expense of power. For example, a bypass called IF_IBUF allows data coming from the instruction memory/cache to go directly to the predecoder and decoder stages without first going into the instruction buffer. Similarly, another bypass in some embodiments sends the result of a compare instruction to the operand fetch and instruction stages to quickly determine whether a jump instruction that follows the compare instruction jumps to a different location. Based on this information, the instruction fetch unit can start fetching instructions at the new address. This bypass reduces the penalty for conditional jump instructions. While these bypasses offer higher efficiency, they do so at the cost of frequency. If a particular application needs higher frequency, then these bypasses can be selectively turned off at design time.
[0028] Still another set of options relates to the multiplier. A default configuration in one embodiment may offer one-, two- or multiple-cycle multipliers. The user can choose one of these three multipliers based on the user's requirements. The single-cycle multiplier takes more area and may limit the design from reaching higher frequencies but takes only one cycle to execute 32x32 bit multiplication operations. The multi-cycle multiplier, on the other hand, takes about 2,000 gates versus 7,000 gates for a single-cycle multiplier, but takes more than one cycle to execute 32x32 bit multiplication operations.
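The cycle-count side of this trade-off can be illustrated with a behavioral model. The specification does not describe the internal structure of the multipliers, so the Python sketch below assumes a generic shift-and-add scheme that consumes a fixed number of multiplier bits per cycle; the name multiply_iterative and the bits_per_cycle parameter are hypothetical, and the model says nothing about gate counts.

```python
MASK32 = 0xFFFFFFFF


def multiply_iterative(a, b, bits_per_cycle=1):
    """Unsigned 32x32 shift-and-add multiply.

    Returns (64-bit product, cycles used). Each loop iteration models one
    cycle that consumes `bits_per_cycle` bits of the multiplier `b`.
    """
    if bits_per_cycle <= 0 or 32 % bits_per_cycle:
        raise ValueError("bits_per_cycle must be a positive divisor of 32")
    a &= MASK32
    b &= MASK32
    chunk_mask = (1 << bits_per_cycle) - 1

    product = 0
    cycles = 0
    for shift in range(0, 32, bits_per_cycle):
        chunk = (b >> shift) & chunk_mask   # bits of b handled this cycle
        product += (a * chunk) << shift     # accumulate the partial product
        cycles += 1
    return product, cycles
```

With bits_per_cycle set to 32 the model finishes in one cycle, with 16 it takes two, and with 1 it takes thirty-two, mirroring the choice between single-cycle, two-cycle, and multi-cycle options while always producing the same product.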
[0029] In some embodiments, other configurable features, including a memory protection unit, a memory management unit, and a write back buffer, may be made available. Configurability can also be extended to the floating point unit, single instruction multiple data, superscalar, and the number of supported interrupts, to mention some additional configurable features.
[0030] In some embodiments, some selectable features are performance oriented, as is the case with bypasses, branch predictors and multipliers, and others are functionality or feature oriented, such as those related to caches, memory protection units and memory management units.
[0031] Referring to Figure 6, a core configuration sequence 60 may be implemented in software, hardware and/or firmware. In software and firmware embodiments it may be implemented by computer executed instructions stored in a non-transitory computer readable medium such as an optical, magnetic or semiconductor storage.
[0032] In one embodiment, the sequence 60 begins by displaying selectable cache options for a partial core design as indicated in block 62. Once the user makes a selection, as indicated in diamond 64, the option is set as indicated in block 66, meaning that it will be recorded and ultimately be implemented into the necessary code without further user action in some embodiments. If a selection is not made, the flow simply awaits the selection.
[0033] Next, branch prediction options may be displayed as indicated in block 68, followed by a selection check at diamond 70 and an option set stage at block 72.
[0034] Thereafter, pipeline bypass options may be displayed (block 74) followed by selection at diamond 76 and option setting at block 78. Next, multiplier options may be displayed as indicated at block 80. This may again be followed by a selection decision at diamond 82 and option setting at block 84.
[0035] Finally, all the options that have been set or selected are collected and the appropriate RTL and software code is automatically generated as indicated in block 86. Thus, based on the designer's selections, the necessary code to create the hardware and software configuration may be generated automatically in some embodiments.
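The display, select, and set steps of sequence 60, followed by code generation, can be sketched as a short loop. The Python below is a hypothetical illustration: the option names, the choose callback standing in for the graphical user interface, and the parameter line format emitted by generate_rtl_parameters are all assumptions, not output of any actual RTL engine.

```python
# Option categories in the order the sequence presents them (blocks 62-84).
# The option names themselves are illustrative placeholders.
OPTION_STEPS = [
    ("cache", ["tightly_coupled", "split"]),
    ("branch_predictor", ["always_not_taken", "btfn", "btb", "gshare"]),
    ("pipeline_bypasses", ["enabled", "disabled"]),
    ("multiplier", ["single_cycle", "two_cycle", "multi_cycle"]),
]


def configure(choose):
    """Walk the steps in order and record each selection.

    `choose(name, options)` stands in for the GUI and returns the user's
    pick. The flow waits at each step until a valid selection is made.
    """
    selections = {}
    for name, options in OPTION_STEPS:
        while True:
            choice = choose(name, options)
            if choice in options:
                break
        selections[name] = choice   # option set
    return selections


def generate_rtl_parameters(selections):
    """Emit one placeholder parameter line per recorded selection."""
    return "\n".join(
        f'parameter {name.upper()} = "{value}";'
        for name, value in selections.items()
    )
```

A caller that always accepts the first option, for example configure(lambda name, options: options[0]), yields the default configuration, which generate_rtl_parameters then turns into one line per category.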
[0036] Referring to Figure 7, a system 90 for implementing one embodiment of the present invention may include a processor 92 coupled to a code database 94, an RTL engine 96, a display driver 100 and a software code generator 98. The code database 94 stores the code for the different selectable options. The RTL engine 96 includes the ability to generate RTL code in response to user selections. The software code generator 98 generates the necessary software code to implement the user selections. The display driver 100 drives the display 104 and includes software for generating the graphical user interface (GUI) 102 in one embodiment that provides user selectability of various defined options.
[0037] References throughout this specification to "one embodiment" or "an embodiment" mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation encompassed within the present invention. Thus, appearances of the phrase "one embodiment" or "in an embodiment" are not necessarily referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be instituted in suitable forms other than the particular embodiment illustrated and all such forms may be encompassed within the claims of the present application.
[0038] While the present invention has been described with respect to a limited number of embodiments, those skilled in the art will appreciate numerous modifications and variations therefrom. It is intended that the appended claims cover all such modifications and variations as fall within the true spirit and scope of the present invention.
Claims
What is claimed is:
1. A method comprising:
determining if an instruction is supported by a partial core;
only if the instruction is supported, providing said instruction for execution by the partial core;
providing a number of selectable partial core design options; and
based on user selections, automatically generating code to implement a partial core with the selections.
2. The method of claim 1 including executing an instruction not supported by the partial core by a complete core.
3. The method of claim 1 including executing an instruction not supported by the partial core by a pre-built handler.
4. The method of claim 1 including issuing an exception if an instruction is not supported by the partial core.
5. The method of claim 1 including excluding instructions from the instruction set of the partial core for handling read-only dependencies.
6. The method of claim 1 including translating instructions in hardware without fetching corresponding micro-operations from microcode read-only memory.
7. The method of claim 1 including enabling cache configuration selections.
8. The method of claim 1 including enabling selection of branch predictors.
9. The method of claim 1 including enabling selection of pipeline bypasses.
10. The method of claim 1 including enabling selection of multipliers.
11. A non-transitory computer readable medium storing instructions to:
determine if an instruction is supported by a core that only executes some of the instructions of an instruction set;
only if the instruction is supported, provide said instruction for execution by the core;
provide a number of selectable partial core design options; and
based on user selections, generate code to implement a partial core with the selections.
12. The medium of claim 11, storing instructions to execute an instruction not supported by the core by a complete core.
13. The medium of claim 11, storing instructions to execute an instruction not supported by the core by a pre-built handler.
14. The medium of claim 11, storing instructions to issue an exception if an instruction is not supported by the partial core.
15. The medium of claim 11, storing instructions to exclude instructions from the instruction set of the core for handling read-only dependencies.
16. The medium of claim 11, storing instructions to translate instructions in hardware without fetching corresponding microoperations from microcode read-only memory.
17. The medium of claim 11, storing instructions to enable cache configuration selections.
18. The medium of claim 11, storing instructions to enable selection of branch predictors.
19. The medium of claim 11, storing instructions to enable selection of pipeline bypasses.
20. The medium of claim 11, storing instructions to enable selection of multipliers.
21. The apparatus comprising:
a processor to enable a user to select from among options for a processor core including cache design options; and
a code database storing code to implement selectable design options for a processor core, including register transfer level and a software code.
22. The apparatus of claim 21, said processor to enable selection of branch predictors.
23. The apparatus of claim 21, said processor to enable selection of pipeline bypasses.
24. The apparatus of claim 21, said processor to enable selection of multipliers.
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201180076171.7A CN104025034B (en) | 2011-12-30 | 2011-12-30 | Configurable reduction instruction set core |
PCT/US2011/068016 WO2013101147A1 (en) | 2011-12-30 | 2011-12-30 | Configurable reduced instruction set core |
US13/992,797 US20140223145A1 (en) | 2011-12-30 | 2011-12-30 | Configurable Reduced Instruction Set Core |
EP11878898.3A EP2798467A4 (en) | 2011-12-30 | 2011-12-30 | Configurable reduced instruction set core |
TW101149530A TWI472911B (en) | 2011-12-30 | 2012-12-24 | Method and apparatus for configurable reduced instruction set core and non-transitory computer readable medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2011/068016 WO2013101147A1 (en) | 2011-12-30 | 2011-12-30 | Configurable reduced instruction set core |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2013101147A1 (en) | 2013-07-04 |
Family
ID=48698381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2011/068016 WO2013101147A1 (en) | 2011-12-30 | 2011-12-30 | Configurable reduced instruction set core |
Country Status (5)
Country | Link |
---|---|
US (1) | US20140223145A1 (en) |
EP (1) | EP2798467A4 (en) |
CN (1) | CN104025034B (en) |
TW (1) | TWI472911B (en) |
WO (1) | WO2013101147A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150113254A1 (en) * | 2013-10-23 | 2015-04-23 | Nvidia Corporation | Efficiency through a distributed instruction set architecture |
WO2015165323A1 (en) * | 2014-04-30 | 2015-11-05 | 华为技术有限公司 | Data processing method, processor, and data processing device |
WO2017105670A1 (en) * | 2015-12-15 | 2017-06-22 | Intel Corporation | Instruction and logic for partial reduction operations |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9830150B2 (en) | 2015-12-04 | 2017-11-28 | Google Llc | Multi-functional execution lane for image processor |
TWI790991B (en) * | 2017-01-24 | 2023-02-01 | 香港商阿里巴巴集團服務有限公司 | Database operation method and device |
TWI805544B (en) * | 2017-01-24 | 2023-06-21 | 香港商阿里巴巴集團服務有限公司 | Database operation method and device |
US10540181B2 (en) * | 2018-01-19 | 2020-01-21 | Marvell World Trade Ltd. | Managing branch prediction information for different contexts |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0283115A2 (en) * | 1987-02-09 | 1988-09-21 | Advanced Micro Devices, Inc. | Methods and apparatus for achieving an interface for a reduced instruction set computer system |
JP2001005660A (en) * | 1999-05-26 | 2001-01-12 | Infineon Technol North America Corp | Method and device for reducing instruction transaction in microprocessor |
WO2002019098A1 (en) * | 2000-08-30 | 2002-03-07 | Intel Corporation | Method and apparatus for a unified risc/dsp pipeline controller for both reduced instruction set computer (risc) control instructions and digital signal processing (dsp) instructions |
US20050086352A1 (en) * | 2003-09-29 | 2005-04-21 | Eric Boisvert | Massively reduced instruction set processor |
Family Cites Families (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5632028A (en) * | 1995-03-03 | 1997-05-20 | Hal Computer Systems, Inc. | Hardware support for fast software emulation of unimplemented instructions |
US5752035A (en) * | 1995-04-05 | 1998-05-12 | Xilinx, Inc. | Method for compiling and executing programs for reprogrammable instruction set accelerator |
US5699537A (en) * | 1995-12-22 | 1997-12-16 | Intel Corporation | Processor microarchitecture for efficient dynamic scheduling and execution of chains of dependent instructions |
US6374349B2 (en) * | 1998-03-19 | 2002-04-16 | Mcfarling Scott | Branch predictor with serially connected predictor stages for improving branch prediction accuracy |
US6480952B2 (en) * | 1998-05-26 | 2002-11-12 | Advanced Micro Devices, Inc. | Emulation coprocessor |
US6185672B1 (en) * | 1999-02-19 | 2001-02-06 | Advanced Micro Devices, Inc. | Method and apparatus for instruction queue compression |
US6708268B1 (en) * | 1999-03-26 | 2004-03-16 | Microchip Technology Incorporated | Microcontroller instruction set |
US6425116B1 (en) * | 2000-03-30 | 2002-07-23 | Koninklijke Philips Electronics N.V. | Automated design of digital signal processing integrated circuit |
US7287147B1 (en) * | 2000-12-29 | 2007-10-23 | Mips Technologies, Inc. | Configurable co-processor interface |
US6886092B1 (en) * | 2001-11-19 | 2005-04-26 | Xilinx, Inc. | Custom code processing in PGA by providing instructions from fixed logic processor portion to programmable dedicated processor portion |
US7100060B2 (en) * | 2002-06-26 | 2006-08-29 | Intel Corporation | Techniques for utilization of asymmetric secondary processing resources |
EP1387259B1 (en) * | 2002-07-31 | 2017-09-20 | Texas Instruments Incorporated | Inter-processor control |
US20040128477A1 (en) * | 2002-12-13 | 2004-07-01 | Ip-First, Llc | Early access to microcode ROM |
TWI232457B (en) * | 2003-12-15 | 2005-05-11 | Ip First Llc | Early access to microcode ROM |
US7165229B1 (en) * | 2004-05-24 | 2007-01-16 | Altera Corporation | Generating optimized and secure IP cores |
US7353489B2 (en) * | 2004-05-28 | 2008-04-01 | Synopsys, Inc. | Determining hardware parameters specified when configurable IP is synthesized |
US7529909B2 (en) * | 2006-12-28 | 2009-05-05 | Microsoft Corporation | Security verified reconfiguration of execution datapath in extensible microcomputer |
US7895415B2 (en) * | 2007-02-14 | 2011-02-22 | Intel Corporation | Cache sharing based thread control |
US20100262966A1 (en) * | 2009-04-14 | 2010-10-14 | International Business Machines Corporation | Multiprocessor computing device |
-
2011
- 2011-12-30 CN CN201180076171.7A patent/CN104025034B/en active Active
- 2011-12-30 WO PCT/US2011/068016 patent/WO2013101147A1/en active Application Filing
- 2011-12-30 EP EP11878898.3A patent/EP2798467A4/en not_active Withdrawn
- 2011-12-30 US US13/992,797 patent/US20140223145A1/en not_active Abandoned
-
2012
- 2012-12-24 TW TW101149530A patent/TWI472911B/en not_active IP Right Cessation
Non-Patent Citations (1)
Title |
---|
See also references of EP2798467A4 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150113254A1 (en) * | 2013-10-23 | 2015-04-23 | Nvidia Corporation | Efficiency through a distributed instruction set architecture |
US10503513B2 (en) * | 2013-10-23 | 2019-12-10 | Nvidia Corporation | Dispatching a stored instruction in response to determining that a received instruction is of a same instruction type |
WO2015165323A1 (en) * | 2014-04-30 | 2015-11-05 | 华为技术有限公司 | Data processing method, processor, and data processing device |
US10025752B2 (en) | 2014-04-30 | 2018-07-17 | Huawei Technologies Co., Ltd. | Data processing method, processor, and data processing device |
WO2017105670A1 (en) * | 2015-12-15 | 2017-06-22 | Intel Corporation | Instruction and logic for partial reduction operations |
Also Published As
Publication number | Publication date |
---|---|
CN104025034A (en) | 2014-09-03 |
TWI472911B (en) | 2015-02-11 |
CN104025034B (en) | 2018-09-11 |
US20140223145A1 (en) | 2014-08-07 |
TW201346524A (en) | 2013-11-16 |
EP2798467A1 (en) | 2014-11-05 |
EP2798467A4 (en) | 2016-04-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP2508985B1 (en) | Apparatus and method for handling of modified immediate constant during instruction translation | |
EP2508979B1 (en) | Efficient conditional alu instruction in read-port limited register file microprocessor | |
US9898291B2 (en) | Microprocessor with arm and X86 instruction length decoders | |
US9274795B2 (en) | Conditional non-branch instruction prediction | |
US8924695B2 (en) | Conditional ALU instruction condition satisfaction propagation between microinstructions in read-port limited register file microprocessor | |
US9032189B2 (en) | Efficient conditional ALU instruction in read-port limited register file microprocessor | |
US9336180B2 (en) | Microprocessor that makes 64-bit general purpose registers available in MSR address space while operating in non-64-bit mode | |
US9043580B2 (en) | Accessing model specific registers (MSR) with different sets of distinct microinstructions for instructions of different instruction set architecture (ISA) | |
US9292470B2 (en) | Microprocessor that enables ARM ISA program to access 64-bit general purpose registers written by x86 ISA program | |
US20120260075A1 (en) | Conditional alu instruction pre-shift-generated carry flag propagation between microinstructions in read-port limited register file microprocessor | |
CN107832083B (en) | Microprocessor with conditional instruction and processing method thereof | |
EP3151109A1 (en) | Microprocessor that translates conditional load or store instructions into a variable number of microinstructions | |
EP2508978A1 (en) | Microprocessor that performs x86 ISA and ARM ISA machine language program instructions by hardware translation | |
US20140223145A1 (en) | Configurable Reduced Instruction Set Core | |
US9645822B2 (en) | Conditional store instructions in an out-of-order execution microprocessor | |
US9378019B2 (en) | Conditional load instructions in an out-of-order execution microprocessor | |
EP2508982B1 (en) | Control register mapping in heterogenous instruction set architecture processor | |
EP3179363B1 (en) | Microprocessor that enables arm isa program to access general purpose registers written by x86 isa program | |
US20140258685A1 (en) | Using Reduced Instruction Set Cores | |
EP2704001B1 (en) | Microprocessor that makes 64-bit general purpose registers available in MSR address space while operating in non-64-bit mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
WWE | Wipo information: entry into national phase |
Ref document number: 13992797 Country of ref document: US |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 11878898 Country of ref document: EP Kind code of ref document: A1 |
|
REEP | Request for entry into the european phase |
Ref document number: 2011878898 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2011878898 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: DE |