CN102200905A

CN102200905A - Microprocessor with compact instruction set architecture

Info

Publication number: CN102200905A
Application number: CN2011101282001A
Authority: CN
Inventors: E·K·诺登; J·H·罗宾森; D·Y-M·拉尤
Original assignee: MIPS Technologies Inc
Current assignee: MIPS Tech LLC
Priority date: 2010-03-26
Filing date: 2011-03-25
Publication date: 2011-09-28

Abstract

A re-encoded instruction set architecture (ISA) provides smaller bit-width instructions or a combination of smaller and larger bit-width instructions to improve instruction execution efficiency and reduce code footprint. The ISA can be re-encoded from a legacy ISA having larger bit-width instructions, and the re-encoded ISA can maintain assembly-level compatibility with the ISA from which it is derived. In addition, the re-encoded ISA can have new and different types of additional instructions, including instructions with encoded arguments determined by statistical analysis and instructions that have the effect of combinations of instructions.

Description

Microprocessor with compact instruction set architecture

Cross-Reference to Related Applications

This application is a continuation-in-part of, and claims the benefit under 35 U.S.C. § 120 of, U.S. Patent Application No. 12/463,330, entitled "Microprocessor with Compact Instruction Set Architecture," filed on May 8, 2009. U.S. Patent Application No. 12/463,330 claims the benefit of U.S. Provisional Application No. 61/051,642, entitled "Compact Instruction Set Architecture," filed on May 8, 2008. The subject matter of all of the above-mentioned applications is incorporated herein by reference in its entirety.

Technical field

Embodiments of the invention relate generally to microprocessors. More specifically, embodiments of the invention relate to instruction set architectures for microprocessors.

Background

There is a growing need for economical, high-performance microprocessors, particularly for deeply embedded applications such as microcontrollers. As a result, microprocessor customers need solutions that can be integrated quickly and efficiently into their products. In addition, microprocessor designers and customers continue to demand lower power consumption, and have recently focused on environmentally friendly, microprocessor-based devices.

One approach to meeting these requirements is to modify an existing instruction set (also referred to herein as an instruction set architecture, or ISA) into a new instruction set with a smaller "code footprint." A smaller code footprint generally translates into lower power consumption per executed task. Smaller instruction sizes (also referred to as "code compaction") can also yield higher performance, in part because fewer memory accesses are needed to fetch smaller instructions. Further benefits can be obtained by basing the new ISA on a combination of smaller bit-width instructions, derived from an existing ISA that has larger bit-width instructions, and larger bit-width instructions.

Summary of the invention

Embodiments of the invention relate to re-encoding an instruction set architecture for use with a microprocessor, and to the new instructions that result. According to one embodiment, a larger bit-width instruction set is re-encoded into a smaller bit-width instruction set, or into an instruction set that combines smaller and larger bit-width instructions. In an embodiment, the smaller bit-width instruction set maintains assembly-level compatibility with the larger bit-width instruction set from which it is derived, and adds instructions of different types. Furthermore, the new smaller bit-width instruction set, or the combined smaller and larger bit-width instruction set, can be more efficient and have higher performance than the larger bit-width instruction set from which it was re-encoded.

In one embodiment, a number of new smaller bit-width instructions are added to the new instruction set, including: Jump Register Compact (JRC); Jump Register, Adjust Stack Pointer (16-bit) (JRADDIUSP); Add Immediate Unsigned Word, 5-bit Register Select (16-bit) (ADDIUS5); Move Register Pair (MOVEP); and Jump and Link Register, Short Delay-Slot (16-bit) (JALRS16).

In another embodiment, a number of new instructions having the same size as the original instruction set are added to the new instruction set, including: Branch on Equal to Zero, Compact (BEQZC); Branch on Not Equal to Zero, Compact (BNEZC); Jump and Link Exchange (JALX); Load Word Pair (LWP); Load Word Multiple (LWM); Store Word Pair (SWP); Store Word Multiple (SWM); Add Immediate Unsigned Word (PC-relative) (ADDIUPC); Branch on Greater Than or Equal to Zero and Link, Short Delay-Slot (BGEZALS); Branch on Less Than Zero and Link, Short Delay-Slot (BLTZALS); Jump and Link Register, Short Delay-Slot (JALRS); Jump and Link Register with Hazard Barrier, Short Delay-Slot (JALRS.HB); and Jump and Link, Short Delay-Slot (JALS).

Brief Description of the Drawings

Embodiments of the invention are described with reference to the accompanying drawings. In the drawings, like reference numbers may indicate identical or functionally similar elements. The drawing in which an element first appears is generally indicated by the leftmost digit(s) of the corresponding reference number.

Fig. 1 is a schematic diagram of the format of a 32-bit instruction of an ISA according to an embodiment of the invention.

Fig. 2 is a schematic diagram of the format of a 16-bit instruction of an ISA according to an embodiment of the invention.

Fig. 3A is a schematic diagram illustrating the format of a Branch on Equal to Zero, Compact (BEQZC) instruction according to an embodiment of the invention.

Fig. 3B is a flowchart illustrating the operation of a BEQZC instruction in a microprocessor according to an embodiment of the invention.

Fig. 3C is a schematic diagram illustrating the format of a Branch on Not Equal to Zero, Compact (BNEZC) instruction according to an embodiment of the invention.

Fig. 3D is a flowchart illustrating the operation of a BNEZC instruction in a microprocessor according to an embodiment of the invention.

Fig. 3E is a schematic diagram illustrating the format of a Jump and Link Exchange (JALX) instruction according to an embodiment of the invention.

Fig. 3F is a flowchart illustrating the operation of a JALX instruction in a microprocessor according to an embodiment of the invention.

Fig. 3G is a schematic diagram illustrating the format of a second embodiment of the JALX instruction.

Fig. 3H is a flowchart illustrating the operation of the second embodiment of the JALX instruction.

Fig. 3I is a schematic diagram illustrating the format of a Jump Register Compact (JRC) instruction according to an embodiment of the invention.

Fig. 3J is a flowchart illustrating the operation of a JRC instruction in a microprocessor according to an embodiment.

Fig. 3K is a schematic diagram illustrating the format of a Load Word Pair (LWP) instruction according to an embodiment of the invention.

Fig. 3L is a flowchart illustrating the operation of an LWP instruction according to an embodiment.

Fig. 3M is a schematic diagram illustrating the format of a Load Word Multiple (LWM) instruction according to an embodiment of the invention.

Fig. 3N is a flowchart illustrating the operation of an LWM instruction in a microprocessor according to an embodiment.

Fig. 3O is a schematic diagram illustrating the format of a Store Word Pair (SWP) instruction according to an embodiment of the invention.

Fig. 3P is a flowchart illustrating the operation of an SWP instruction according to an embodiment.

Fig. 3Q is a schematic diagram illustrating the format of a Store Word Multiple (SWM) instruction according to an embodiment of the invention.

Fig. 3R is a flowchart illustrating the operation of an SWM instruction according to an embodiment.

Fig. 4A is a schematic diagram illustrating the format of a Jump Register, Adjust Stack Pointer (16-bit) (JRADDIUSP) instruction according to an embodiment of the invention.

Fig. 4B is a flowchart illustrating the operation of a JRADDIUSP instruction in a microprocessor according to an embodiment of the invention.

Fig. 4C is a schematic diagram illustrating the format of an Add Immediate Unsigned Word, 5-bit Register Select (16-bit) (ADDIUS5) instruction according to an embodiment of the invention.

Fig. 4D is a flowchart illustrating the operation of an ADDIUS5 instruction in a microprocessor according to an embodiment of the invention.

Fig. 4E is a schematic diagram illustrating the format of an Add Immediate Unsigned Word (PC-relative) (ADDIUPC) instruction according to an embodiment of the invention.

Fig. 4F is a flowchart illustrating the operation of an ADDIUPC instruction in a microprocessor according to an embodiment.

Fig. 4G is a schematic diagram illustrating the format of a Move Register Pair (MOVEP) instruction according to an embodiment of the invention.

Fig. 4H is a flowchart illustrating the operation of a MOVEP instruction according to an embodiment of the invention.

Fig. 5A is a schematic diagram illustrating the format of a Branch on Greater Than or Equal to Zero and Link, Short Delay-Slot (BGEZALS) instruction according to an embodiment of the invention.

Fig. 5B is a flowchart illustrating the operation of a BGEZALS instruction in a microprocessor according to an embodiment of the invention.

Fig. 5C is a schematic diagram illustrating the format of a Branch on Less Than Zero and Link, Short Delay-Slot (BLTZALS) instruction according to an embodiment of the invention.

Fig. 5D is a flowchart illustrating the operation of a BLTZALS instruction in a microprocessor according to an embodiment of the invention.

Fig. 5E is a schematic diagram illustrating the format of a Jump and Link Register, Short Delay-Slot (16-bit) (JALRS16) instruction according to an embodiment of the invention.

Fig. 5F is a flowchart illustrating the operation of a JALRS16 instruction in a microprocessor according to an embodiment.

Fig. 5G is a schematic diagram illustrating the format of a Jump and Link Register, Short Delay-Slot (JALRS) instruction according to an embodiment of the invention.

Fig. 5H is a flowchart illustrating the operation of a JALRS instruction according to a second embodiment.

Fig. 5I is a schematic diagram illustrating the format of a Jump and Link Register with Hazard Barrier, Short Delay-Slot (JALRS.HB) instruction according to an embodiment of the invention.

Fig. 5J is a flowchart illustrating the operation of a JALRS.HB instruction in a microprocessor according to an embodiment.

Fig. 5K is a schematic diagram illustrating the format of a Jump and Link, Short Delay-Slot (JALS) instruction according to an embodiment of the invention.

Fig. 5L is a flowchart illustrating the operation of a JALS instruction according to an embodiment.

Fig. 6 is a schematic diagram of a microprocessor core according to an embodiment of the invention.

Detailed Description

Although the present invention is described herein with reference to illustrative embodiments for particular applications, it should be understood that the invention is not limited thereto. Those skilled in the art with access to the teachings provided herein will recognize additional modifications, applications, and embodiments within the scope of the invention, as well as additional fields in which the invention would be of significant utility. The following sections describe an instruction set architecture according to an embodiment of the invention.

I. Overview

II. Re-encoded Architecture

A. Assembly-Level Compatibility

B. Special-Event ISA Mode Selection

III. New Types of Instructions

A. Re-encoded Branch and Jump Instructions

B. Fields Encoded Based on Statistical Analysis

C. Optimized Encoding of Instruction Arguments

D. Delay Slots

E. Instructions with a Reduced Set of Target Registers

F. Combining the Effects of Existing Instructions

IV. Instruction Formats

A. Principles of Opcode Organization

B. Major Opcodes

V. New ISA Instructions

VI. Example Processor Core

VII. Software Embodiments

VIII. Conclusion

I. Overview

The embodiments described herein relate to an ISA comprising instructions to be executed, to a microprocessor that can execute the instructions of the ISA, and to a method of re-encoding an existing ISA. Some embodiments described herein relate to a new ISA obtained by re-encoding an existing ISA. Some embodiments relate to an ISA that is re-encoded from an existing larger bit-width ISA into an ISA that combines smaller and larger bit-width instructions. In one embodiment, the existing larger bit-width ISA is MIPS32, available from MIPS Technologies, Inc. of Sunnyvale, California; the new re-encoded smaller bit-width ISA is the MicroMIPS 16-bit instruction set, also available from MIPS Technologies, Inc.; and the new re-encoded larger bit-width ISA is the MicroMIPS 32-bit instruction set, also available from MIPS Technologies, Inc.

In another embodiment, a larger bit-width architecture can be re-encoded into an improved architecture whose instructions have the same bit width, or into a combination of instructions of the same bit width and instructions of a smaller bit width. In one embodiment, the re-encoded larger bit-width instruction set is encoded as an ISA of the same bit width in a manner that is compatible with, and complementary to, a re-encoded smaller bit-width instruction set of the type discussed herein. Embodiments of the re-encoded larger bit-width instruction set may be referred to as "enhanced," and may include the features of the new instruction set discussed below that allow operation in a parallel mode, in which both instruction sets are available on the processor. The re-encoded instruction sets described herein also operate in a standalone mode, in which only one instruction set is active at a time.

II. Re-encoded Architecture

A. Assembly-Level Compatibility

Some embodiments described herein maintain assembly-level compatibility after an ISA is re-encoded from a larger bit-width ISA into a smaller bit-width ISA or a combined bit-width ISA. To achieve this, in one embodiment, the assembly-language mnemonics of the re-encoded instruction set are the same as those of the instruction set from which it is derived. Maintaining assembly-level compatibility allows assembly source code written for the larger bit-width ISA to be assembled for the smaller bit-width ISA. In other words, an assembler that targets the new ISA of an embodiment of the present invention can also assemble code for the legacy ISA from which that embodiment is derived.

In one embodiment, the assembler determines which instruction size to use when processing a particular instruction. To distinguish instructions of ISAs with different bit widths, in one embodiment the opcode mnemonics are extended with suffixes corresponding to the different sizes. For example, in one embodiment, the suffix "16" or "32" is placed at the end of the mnemonic, before the first "." (if there is one), to distinguish 16-bit and 32-bit encodings. Thus, in one embodiment, "ADD16" refers to the 16-bit version of the ADD instruction, and "ADD32" refers to the 32-bit version. As those skilled in the art will appreciate, other suffixes can be used.
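The suffix convention above can be illustrated with a small mnemonic parser. This is a hypothetical helper, not part of any MIPS toolchain; it assumes the only size suffixes are "16" and "32" and that no unsized base mnemonic itself ends in those digits.

```python
def split_mnemonic(mnemonic):
    """Split a mnemonic into (base, size, dotted_suffix).

    Assumes (hypothetically) that an optional "16" or "32" size suffix sits at
    the end of the opcode name, before the first "." if there is one, and that
    no base mnemonic itself ends in "16" or "32".
    """
    head, dot, tail = mnemonic.upper().partition(".")
    size = None
    for suffix in ("16", "32"):
        # Strip the suffix only if something remains as the base mnemonic.
        if head.endswith(suffix) and len(head) > len(suffix):
            head, size = head[: -len(suffix)], int(suffix)
            break
    return head, size, dot + tail


print(split_mnemonic("ADD16"))     # ('ADD', 16, '')
print(split_mnemonic("JALRS.HB"))  # ('JALRS', None, '.HB')
```

A result of `None` for the size means the source did not fix the encoding, leaving the choice to the assembler as described next.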

Other embodiments do not use suffixes to specify instruction size; in these embodiments the bit-width suffix can be omitted. In one embodiment, the assembler examines the values in the register and immediate fields of the instruction and determines whether the larger or the smaller bit-width instruction is appropriate. Depending on its settings, the assembler can automatically select the smallest available instruction size when processing a particular instruction.
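The following sketch shows how an assembler might make this size decision automatically. The 16-bit constraints are assumptions for illustration: a 3-bit register field that can name only eight registers (the set `SHORT_FORM_REGS` is hypothetical) and a signed immediate whose width the caller supplies. Any operand outside those limits forces the 32-bit form.

```python
# Hypothetical constraints of a 16-bit encoding, for illustration only.
SHORT_FORM_REGS = frozenset({2, 3, 4, 5, 6, 7, 16, 17})  # reachable with a 3-bit field


def fits_signed(value, bits):
    """True if value is representable as a two's-complement field of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def choose_size(regs, imm, imm_bits, prefer_smallest=True):
    """Return 16 if every operand fits the assumed short form, else 32."""
    if (
        prefer_smallest
        and all(r in SHORT_FORM_REGS for r in regs)
        and fits_signed(imm, imm_bits)
    ):
        return 16
    return 32
```

For example, `choose_size([4, 5], 3, 4)` returns 16, while naming register 8, or using an immediate of 8 with a 4-bit field, returns 32. Setting `prefer_smallest=False` models an assembler configured not to shrink instructions.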

B. Special-Event ISA Mode Selection

In another embodiment, ISA selection takes place upon one of the following events: an exception, an interrupt, or a power-on event. In this embodiment, the handler for the special event specifies the ISA. For example, on power-on, the power-on handler can specify the ISA. Similarly, in one embodiment, an interrupt or exception handler can specify the ISA. In another embodiment, the user can select, by means of a control bit, which ISA to use for each event type.
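A minimal sketch of the control-bit variant, assuming one hypothetical bit per event type in a hypothetical control register: a set bit starts that event's handler in the re-encoded ISA, and a clear bit starts it in the legacy ISA. The bit positions and names are illustrative only.

```python
from enum import IntEnum


class Event(IntEnum):
    """Hypothetical bit positions in a hypothetical ISA-select control register."""
    POWER_ON = 0
    INTERRUPT = 1
    EXCEPTION = 2


LEGACY_ISA = "legacy"
REENCODED_ISA = "re-encoded"


def isa_for_event(control_bits, event):
    """Return the ISA that the handler for `event` starts in."""
    return REENCODED_ISA if (control_bits >> event) & 1 else LEGACY_ISA


# Example: only interrupt handlers start in the re-encoded ISA.
control = 1 << Event.INTERRUPT
print(isa_for_event(control, Event.INTERRUPT))  # re-encoded
print(isa_for_event(control, Event.EXCEPTION))  # legacy
```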

III. New Types of Instructions

Embodiments of new ISA instructions, and embodiments of re-encoded instructions, are described below. Several general principles were used to develop these instructions; they are explained in the following subsections.

A. Re-encoded Branch and Jump Instructions

In one embodiment, the re-encoded smaller bit-width ISA supports branch target addresses with a finer alignment, which provides greater flexibility. For example, in one embodiment, 32-bit branch instructions that are re-encoded as 16-bit branch instructions support 16-bit-aligned branch target addresses.

In another example, because the offset field of a re-encoded 32-bit branch instruction keeps the same size as in the legacy 32-bit instruction, while targets are now 16-bit aligned, the branch range can be smaller. In a further embodiment, the jump instructions J, JAL, and JALX support the full jump range by supporting 32-bit-aligned target addresses.
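The trade-off between alignment and reach is simple arithmetic. For a signed offset field of n bits scaled by 2^s, the reachable byte offsets run from -(2^(n-1))(2^s) to (2^(n-1) - 1)(2^s). The sketch below assumes a 16-bit offset field (the MIPS32 branch offset width) and ignores the base address the offset is added to; halving the scale factor to reach 16-bit-aligned targets halves the range.

```python
def branch_offset_range(offset_bits, shift):
    """Byte-offset range reachable by a signed offset field scaled by 2**shift."""
    lo = -(1 << (offset_bits - 1)) << shift
    hi = ((1 << (offset_bits - 1)) - 1) << shift
    return lo, hi


# A 16-bit offset scaled by 4 (32-bit-aligned targets) versus by 2 (16-bit-aligned targets).
print(branch_offset_range(16, 2))  # (-131072, 131068)
print(branch_offset_range(16, 1))  # (-65536, 65534)
```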

B. Fields Encoded Based on Analysis of ISA Usage

As term used herein " digital section immediately " is known in the art.In an embodiment, digital section can comprise address offset field, load/store instruction and the aiming field of branch immediately.In an embodiment, instruction is depended in width of digital section immediately and the position in the order number.In one embodiment, the digital section immediately of instruction is divided into several fields, and it needs not be adjacent.In another embodiment, order format can have single, continuous digital section immediately.

In one embodiment, certain registers and immediate values used with ISA instructions and macros are more useful than other values. Several embodiments described herein use this principle to enhance the usefulness of instructions. For example, in one embodiment, a statistical frequency analysis is performed of the registers and the immediate-field values used over a period of use of the ISA.

In another embodiment, the statistical analysis can examine the arguments used by instructions, such as target registers and immediate values. Argument-usage analysis can be performed on instructions as they operate in the ISA to determine various useful statistics, for example: the overall frequency of use of argument values; the frequency of use of argument values for a specific instruction or class of instructions; and the frequency of use of argument values by a particular class of computer programs or users.

In one example of such a statistical analysis and its application, a first ISA has a specific instruction that accepts a 5-bit target and a 5-bit immediate value. In preparing to re-encode this instruction, an embodiment described herein collects data about how the instruction is used, in particular which values are used for the target register and the immediate value over time. In another embodiment, usage data can be collected over time for all instructions in the ISA. The collection period can vary according to sampling needs.

Continuing the example above, the data collected for the first ISA in general, and for the specific instruction in particular, can be used to re-encode that instruction, either within the same ISA or for a second, new ISA. As described herein, one reason for re-encoding instructions is to increase code compaction. Based on the collected data, the re-encoded version of the specific instruction can have arguments that require fewer bits. In one embodiment, this size reduction is achieved by selecting a subset of an argument's possible values, such as the target register and immediate value described above. For example, of the 32 possible values that a 5-bit argument can reference, the eight most frequently used values can be selected based on a statistical analysis of the type described above. In embodiments, these top values can be called the "useful" values for a specific instruction, ISA, computer program, type of computer program, application, type of application, or other similar grouping.
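The selection step can be sketched with a frequency count. The observed values below are made-up sample data. `select_useful_values` keeps the k most frequent values (ties are broken by first occurrence, which is how `collections.Counter.most_common` orders equal counts), and `build_code_table` assigns them dense codes and reports how many bits the shortened field needs.

```python
from collections import Counter


def select_useful_values(observed, k=8):
    """Return the k most frequently observed argument values, most frequent first."""
    return [value for value, _ in Counter(observed).most_common(k)]


def build_code_table(useful):
    """Assign each useful value a dense code 0..n-1; also return the field width needed."""
    table = {value: code for code, value in enumerate(useful)}
    width = max(1, (len(useful) - 1).bit_length())
    return table, width


# Made-up sample of 5-bit target-register arguments seen for one instruction.
observed = [16] * 9 + [17] * 8 + [2] * 7 + [3] * 6 + [4] * 5 + [5] * 4 + [6] * 3 + [7] * 2 + [31]
useful = select_useful_values(observed)
table, width = build_code_table(useful)
print(useful)  # [16, 17, 2, 3, 4, 5, 6, 7]
print(width)   # 3
```

Here the eight useful values fit a 3-bit field instead of the original 5-bit field; value 31, seen only once, falls out of the table and would need the full-width instruction.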

The top eight values in the example above can be "encoded" in a table structure of the type shown below, for example in Table 9. In this way, the re-encoded version of the example instruction does not operate on the full set of 32 values encodable in 5 bits; instead, its format dedicates fewer bits to that particular argument. As described herein, this encoding method also aids ISA re-encoding by reducing the size required for register and immediate fields, because encodings for certain rarely used values can be omitted. For example, encoded register and immediate values can be represented with a shorter bit width than the original values; for example, "1001" can be encoded as "10". When a larger bit-width instruction set is re-encoded into a smaller bit-width ISA, the less frequently used values can be omitted from the new list. In embodiments, the instructions described herein can be newly created or re-encoded from existing instructions so that they are more useful for these groupings.

Taking this example further, because the arguments of the re-encoded instruction require less space, an instruction of a given length (for example, 32 bits) can be re-encoded into a smaller version (for example, 16 bits). In the embodiments described herein, both the old, larger instruction and the smaller, re-encoded instruction can be present in an embodiment of the new ISA.
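The bit budget behind this example can be checked directly. The 32-bit layout below is the standard MIPS32 I-type layout (6-bit opcode, two 5-bit register fields, 16-bit immediate); the 16-bit layout is hypothetical and only shows that narrowing the register fields to 3 bits and the immediate to 4 bits lets the same kind of operation fit in half the space.

```python
# Field widths in bits. The 32-bit layout is the MIPS32 I-type format;
# the 16-bit layout is hypothetical.
MIPS32_ITYPE = {"opcode": 6, "rs": 5, "rt": 5, "imm": 16}
SHORT_FORM = {"opcode": 6, "rs": 3, "rt": 3, "imm": 4}


def total_width(fmt):
    """Sum of all field widths in an instruction format."""
    return sum(fmt.values())


def field_capacity(fmt, field):
    """Number of distinct values a field can encode."""
    return 1 << fmt[field]


print(total_width(MIPS32_ITYPE), total_width(SHORT_FORM))                  # 32 16
print(field_capacity(MIPS32_ITYPE, "rs"), field_capacity(SHORT_FORM, "rs"))  # 32 8
```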

As those skilled in the art will appreciate from the description herein, different statistics can be collected for different components of an instruction, allowing each to be re-encoded differently. Furthermore, based on this analysis, other embodiments described herein do not use unmodified register or immediate values; instead, they encode the values so that the available codes map to the most useful registers and immediate values, that is, the most frequently used values as determined by the statistical analysis above.

C. Optimized Encoding of Instruction Arguments

In one embodiment, when the most useful registers and immediate values are mapped to codes, some mappings can be more useful than others. Embodiments described herein apply this principle to enhance the usefulness of encoded instructions.

For example, Table 1A lists the encoded and decoded values of the register fields of the Move Register Pair (MOVEP) instruction, which is described below and illustrated in Figs. 4G and 4H. Note that in Table 1A the mapping between the encoded value of rt (or rs) (decimal) and the decoded value (decimal) is not one-to-one. In one embodiment, the mapping of encoded value 1 to decoded value 17 is chosen based on characteristics of the processor that executes the instruction. Those skilled in the art will appreciate that some hardware can map one value to another using less computational power.
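A non-one-to-one mapping of this kind reduces to a small lookup table in the decoder and its inverse in the assembler. In the sketch below only the 1 → 17 entry is taken from the text; the other entries are placeholders chosen for illustration. A register with no code cannot be named by the short form, so `encode_reg` returns `None` for it.

```python
# Decode table for a 3-bit register field: index = encoded value, entry = register number.
# Only the 1 -> 17 entry comes from the text; the other entries are placeholders.
MOVEP_REG_DECODE = (0, 17, 2, 3, 16, 18, 19, 20)
MOVEP_REG_ENCODE = {reg: code for code, reg in enumerate(MOVEP_REG_DECODE)}


def decode_reg(code):
    """Map a 3-bit encoded value to a register number."""
    return MOVEP_REG_DECODE[code]


def encode_reg(reg):
    """Map a register number to its 3-bit code, or None if the short form cannot name it."""
    return MOVEP_REG_ENCODE.get(reg)


print(decode_reg(1), encode_reg(17), encode_reg(5))  # 17 1 None
```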

Table 1A: Example MOVEP encoded and decoded values

D. Delay Slots

In embodiments of a pipelined architecture, the instruction immediately following a branch is said to be in the branch delay slot. For a delayed branch, the instruction in the delay slot is always executed when the branch is executed. In one embodiment, the delay-slot instruction is executed even when the preceding branch is taken. Delay slots can increase efficiency, but they are not effective for all applications. For example, for some applications (such as high-performance applications), not using delay slots has little or no effect on code compaction, that is, on making the resulting code smaller. Sometimes, in an embodiment, a compiler attempting to fill a delay slot cannot find a useful instruction. In that case, a no-operation (NOP) instruction is placed in the delay slot, which adds to the program footprint and can reduce performance efficiency.
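The behavior of a delay slot can be shown with a toy interpreter. This is a sketch, not a model of any real pipeline: instructions are just labels, every branch is taken, and `delayed` switches between a branch with a delay slot and a compact branch without one.

```python
def run(program, delayed=True, max_steps=100):
    """Execute a toy program and return the labels of executed instructions, in order.

    Each instruction is ("op", label) or ("branch", label, target_index).
    Branches are always taken.
    """
    trace, pc, pending_target = [], 0, None
    for _ in range(max_steps):
        if pc >= len(program):
            break
        insn = program[pc]
        trace.append(insn[1])
        next_pc = pc + 1
        if pending_target is not None:   # this instruction was in the delay slot
            next_pc, pending_target = pending_target, None
        elif insn[0] == "branch":
            if delayed:
                pending_target = insn[2]  # transfer control after the delay slot
            else:
                next_pc = insn[2]         # compact branch: transfer control now
        pc = next_pc
    return trace


PROGRAM = [
    ("branch", "B", 3),
    ("op", "SLOT"),
    ("op", "SKIPPED"),
    ("op", "TARGET"),
]
print(run(PROGRAM, delayed=True))   # ['B', 'SLOT', 'TARGET']
print(run(PROGRAM, delayed=False))  # ['B', 'TARGET']
```

If the compiler had nothing useful to put at `SLOT`, it would have to place a NOP there in the delayed version, which is the footprint cost described above.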

The embodiments described herein let the developer choose when to use delay slots. With this choice, the developer can decide how best to use delay slots to maximize the desired outcome, such as smaller code size, performance efficiency, instruction usefulness, or ease of development. In one embodiment, some of the instructions described herein, for example jump and branch instructions, come in two versions: one with a delay slot and one without. In one embodiment, software selects which version to use when the instruction is encoded. In another embodiment, the developer selects which version to use (as with the choice between ADD16 and ADD32 described above). In another embodiment, the assembler automatically selects which version to use (as described above). This feature of these embodiments can also help maintain compatibility with legacy hardware processors.

In another embodiment, the size of the delay slot is fixed. These embodiments involve instruction sets with two instruction sizes (for example, 16 and 32 bits). A fixed-width delay slot allows the designer to define the delay-slot instruction so that it always has a particular size, for example the larger or the smaller bit width. This choice lets designers pursue a range of development goals. To minimize the code footprint, the smaller bit-width delay slot can be used uniformly; however, this raises the likelihood that the smaller slot cannot be filled. Conversely, to maximize the potential performance benefit of delay slots, the larger bit-width slot can be used; however, in some embodiments, this choice may increase the code footprint.

In one embodiment, the designer can choose the larger or the smaller bit width for the delay slot when the instruction is encoded. This is similar to the manual selection of instruction bit width (ADD16 or ADD32) described herein. As with the fixed-width choice described above, this delay-slot choice allows the designer to pursue different development goals in some embodiments. With this approach, however, the bit width can be chosen for each instruction rather than for the system as a whole. In one embodiment, the ability to choose the delay-slot size lets the developer avoid wasting delay-slot space in an ISA with variable-length instructions. For example, filling a larger delay slot with a shorter instruction may produce a larger code footprint than necessary and reduce performance efficiency. In some embodiments, the developer can choose a smaller delay slot for a smaller instruction and thereby avoid this inefficiency.
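The footprint trade-off between fixed and per-branch delay-slot sizes can be tallied with a small cost function. Every rule in it is a hypothetical assumption for illustration: an empty slot holds a NOP; a 16-bit filler in a fixed 32-bit slot is widened to its 32-bit form; a 32-bit filler cannot move into a fixed 16-bit slot, so the slot gets a 16-bit NOP and the filler stays in the body; and under the per-branch policy each slot is sized to its filler, with a 16-bit NOP when empty.

```python
def delay_slot_cost(filler_sizes, policy):
    """Bytes attributable to delay slots: slot contents plus any filler left outside a slot.

    filler_sizes: per branch, the bit size (16 or 32) of the instruction chosen
    to fill its slot, or None if no useful instruction was found.
    """
    total = 0
    for size in filler_sizes:
        if policy == "fixed16":
            # 16-bit filler or 16-bit NOP; a 32-bit filler stays outside next to a NOP.
            total += 2 if size in (16, None) else 2 + 4
        elif policy == "fixed32":
            total += 4  # filler widened to 32 bits, or a 32-bit NOP
        elif policy == "per_branch":
            total += 4 if size == 32 else 2  # slot sized to filler; 16-bit NOP if empty
        else:
            raise ValueError(f"unknown policy: {policy!r}")
    return total


fillers = [16, 32, None, 16]
for policy in ("fixed16", "fixed32", "per_branch"):
    print(policy, delay_slot_cost(fillers, policy))
```

For these fillers the totals are 12 bytes (fixed 16-bit slots), 16 bytes (fixed 32-bit slots), and 10 bytes (per-branch sizing). The fixed 32-bit policy is largest but puts every available filler into its slot, which is the performance side of the trade-off described above.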

As those skilled in the art will appreciate, the delay-slot methods described above can be applied to any instruction that uses a delay slot, and to other ISA bit widths.

E. Instructions with a Reduced Set of Target Registers

Embodiments of the re-encoded ISA can improve code compaction by adding new instructions that are the same size as, or larger than, the original ISA instructions. In one embodiment, the re-encoded ISA uses instructions of the same size as instructions in the original ISA but targets a reduced number of registers, which increases the number of bits available to encode other instruction arguments (for example, the instruction's immediate field). In one example, a 32-bit instruction has bits dedicated to specifying target registers and bits for one or more immediate fields. In a re-encoded version of this example instruction, only a reduced set of target registers is made available, which reduces the number of bits that must be dedicated to target registers and allows more bits to be used to encode the immediate field.
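The gain from a reduced target-register set is easy to quantify. Assuming a 32-bit instruction with a 6-bit opcode and one register field, narrowing that field from 5 bits (any of 32 registers) to 3 bits (a reduced set of 8) frees 2 bits for the immediate. If the immediate is also scaled by 4, as assumed here for a word-aligned PC-relative offset, those 2 bits quadruple the reach. These field widths are illustrative assumptions, not a specification of ADDIUPC.

```python
def immediate_bits(total_bits, opcode_bits, reg_bits):
    """Bits left for the immediate after the opcode and one register field."""
    return total_bits - opcode_bits - reg_bits


def pc_relative_reach(imm_bits, shift=2):
    """Largest positive byte offset of a signed immediate scaled by 2**shift."""
    return ((1 << (imm_bits - 1)) - 1) << shift


full = immediate_bits(32, 6, 5)     # 5-bit register field: any of 32 registers
reduced = immediate_bits(32, 6, 3)  # 3-bit register field: a reduced set of 8
print(full, reduced)                                        # 21 23
print(pc_relative_reach(full), pc_relative_reach(reduced))  # 4194300 16777212
```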

In one embodiment, the reduced set of target registers made available in this way consists of the registers used most frequently with the particular instruction. As described above, in one embodiment, this reduced set can be determined by a statistical analysis of the instruction's register usage over a period of use.

As those skilled in the art will appreciate, the method described above can be applied to instructions of larger or smaller bit widths than in the example, and other methods of allocating instruction bits can be used. One example embodiment of a reduced target register set is the ADDIUPC instruction, described and illustrated in Figs. 4E and 4F.

F. Combining the Effects of Existing Instructions

In one embodiment, a new instruction in the re-encoded ISA can combine the effects of two or more instructions in the original ISA. In one embodiment, combinations of instructions that are frequently executed together are identified, and based on this identification a new instruction is included in the re-encoded ISA. Several embodiments identify instruction combinations that can be merged into a single re-encoded instruction, together with a selection of register targets and immediate values. In one embodiment, the combined re-encoded instruction uses fewer total encoding bits than the original instructions it combines. In one embodiment, statistical analysis, in a process similar to the analysis described above, is used to identify combinations of instructions that are frequently executed together.
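Finding candidate combinations can start with counting adjacent instruction pairs in a trace. The trace below is made up for illustration and repeats a common MIPS32 function-epilogue pattern, `JR` followed by an `ADDIU` in its delay slot; a real analysis would use traces from representative programs and would also examine the operands.

```python
from collections import Counter


def frequent_pairs(trace, top=3):
    """Count adjacent opcode pairs in an instruction trace, most frequent first."""
    return Counter(zip(trace, trace[1:])).most_common(top)


# Made-up trace for illustration.
trace = ["JR", "ADDIU", "LW", "JR", "ADDIU", "MOVE", "JR", "ADDIU"]
print(frequent_pairs(trace, top=1))  # [(('JR', 'ADDIU'), 3)]
```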

An embodiment of a re-encoded instruction that performs the same operations as several instructions of the existing ISA, as described above, can combine a jump to an address held in a register with an operation that modifies the value of another register by an amount encoded in an immediate value. An example of this embodiment is the JRADDIUSP instruction, described and illustrated in Figs. 4A and 4B. In one embodiment, the JRADDIUSP instruction performs the same operations as the MIPS32 "JR" instruction and the MIPS32 "ADDIU" instruction. In one embodiment, to make this combination possible, the "ADDIU" portion of the JRADDIUSP instruction can target only a subset of the registers and immediate values available to the original MIPS32 "ADDIU" instruction.
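The equivalence claimed for JRADDIUSP can be checked with a small model. The Python sketch below assumes that the jump register is the return-address register (GPR 31), that the modified register is the stack pointer (GPR 29), and that the combined instruction encodes its immediate as an unsigned count of words scaled by 4. All three are illustrative assumptions about the "subset of registers and immediate values". The legacy sequence is modeled as JR ra with ADDIU sp, sp, imm in its delay slot.

```python
MASK32 = 0xFFFFFFFF
SP, RA = 29, 31  # assumed: stack pointer and return-address registers


def jr_then_addiu_sp(gpr, imm):
    """Legacy pair: JR ra, with ADDIU sp, sp, imm in the delay slot."""
    target = gpr[RA]                    # JR reads ra before the delay slot runs
    gpr[SP] = (gpr[SP] + imm) & MASK32  # delay-slot ADDIU
    return target


def jraddiusp(gpr, imm_field):
    """Combined instruction; imm_field is assumed to count 4-byte words."""
    gpr[SP] = (gpr[SP] + (imm_field << 2)) & MASK32
    return gpr[RA]


g1 = {SP: 0x7FFF0000, RA: 0x00401234}
g2 = dict(g1)
print(hex(jr_then_addiu_sp(g1, 32)), hex(jraddiusp(g2, 8)), g1 == g2)
```

Both calls return the same jump target and leave the register state identical, while the combined form needs only one instruction and a narrower immediate.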

Another embodiment of a re-encoded instruction that performs the same operations as several instructions of the original ISA copies the values of a pair of source registers into a pair of destination registers. An example of this embodiment is the MOVEP instruction, described and illustrated in Figs. 4G and 4H. MOVEP is a MicroMIPS instruction that performs the same operation as a pair of MIPS32 MOVE instructions, for a statistically selected subset of source and destination registers.

Other embodiments that use the combination technique include: the LWP instruction, described and illustrated in Figs. 3K and 3L; the LWM32 instruction, described and illustrated in Figs. 3M and 3N; the SWP instruction, illustrated in Figs. 3O and 3P; and the SWM instruction, illustrated in Figs. 3Q and 3R.

IV. Instruction formats

In one embodiment, the new ISA includes instructions of at least two different bit widths. For example, an ISA according to one embodiment includes instructions that are 16 and 32 bits wide. Although the embodiments of the new ISA described here use two instruction sets that operate in a complementary fashion, the teachings here apply to an ISA with any number of instruction sets.

In one embodiment, an instruction has an opcode that comprises a major opcode and, in some cases, a minor opcode. The major opcode has a fixed width, while the width of the minor opcode depends on the instruction and can be large enough to address the entire register set. For example, in one embodiment, the MOVE instruction has a 5-bit minor opcode and can reach the entire register set.

In one embodiment, the major opcode is the same for the larger-width and smaller-width instruction sets. For example, in one embodiment, the encoding includes 16-bit and 32-bit instructions that both have a left-aligned 6-bit major opcode in the instruction word, followed by a variable-width minor opcode.

A. Opcode organization

Fig. 1 is a schematic diagram of the format 110 of a 32-bit re-encoded instruction according to one embodiment. Embodiments of instruction format 110 can have zero, one, or more register fields 120, followed by an optional immediate field 130. In one embodiment, the 32-bit re-encoded instructions have 5-bit-wide register fields 120. Other optional instruction-specific fields 140 can lie between the immediate field 130 and the opcode field 160.

As shown in Fig. 1, in an exemplary embodiment an instruction can have 0 to 4 register fields 120, followed by an optional immediate field 130. Other optional instruction-specific fields 140 lie between the immediate field 130 and the opcode field 150 or 160. In one embodiment, the register fields 120 have a fixed configuration: whenever they are present, they always occupy the same bit ranges. As noted above, the opcode field comprises a major opcode 160 and, in some cases, a minor opcode (not shown). Some embodiments have the following format features:

C1. The 6-bit major opcode is always leftmost, in bits 31:26.

C2. The 5-bit register fields 120 always occupy fixed positions: if an instruction has an rt field, it is always in bits 25:21, immediately to the right of the major opcode; if it has an rs field, it is always in bits 20:16, immediately to the right of the rt field; if it has an rd field, it is always in bits 15:11, immediately to the right of the rs field; and if it has an rr field, it is always in bits 10:6, immediately to the right of the rd field. In one embodiment, because of these fixed positions, the register fields can be used directly to access the register file.

C3. One or more immediate fields 130, always right-aligned and always starting at bit 0.

C4. Minor opcode 140 and other fields (not shown): placed in the bit positions not occupied by the register and immediate fields described above.

The above list of features C1-C4 is non-limiting and is intended to describe different features that can be used with the embodiments described herein. In one embodiment, the leftmost position described in features C1-C4 is the least significant bit of the instruction format; in another embodiment, it is the most significant bit. Features C1-C4 list exemplary values and features used to explain the embodiments, and one or more of them can be combined in an embodiment. Other values, labels, and structures can be used without departing from the spirit of the embodiments described herein.
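Because of features C1-C3, a decoder can slice every potential field out of a 32-bit word without first looking at the opcode, and the register fields can go straight to the register file. The Python sketch below assumes the most-significant-bit-first numbering and uses a 16-bit immediate as the example width. It extracts all fields unconditionally; which ones are meaningful depends on the instruction.

```python
def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of `word`."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode32(word):
    """Slice a 32-bit instruction word according to features C1-C3."""
    return {
        "major": bits(word, 31, 26),  # C1: 6-bit major opcode
        "rt": bits(word, 25, 21),     # C2: fixed 5-bit register fields
        "rs": bits(word, 20, 16),
        "rd": bits(word, 15, 11),
        "rr": bits(word, 10, 6),
        "imm16": bits(word, 15, 0),   # C3: right-aligned immediate from bit 0
    }


word = (0b000100 << 26) | (3 << 21) | (5 << 16) | 0x1234
fields = decode32(word)
print(fields["major"], fields["rt"], fields["rs"], hex(fields["imm16"]))  # 4 3 5 0x1234
```

Note that the rd and rr slices overlap the immediate; for an instruction with a 16-bit immediate the decoder would simply ignore them.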

Fig. 2 is a schematic diagram of the format 210 of a 16-bit instruction 200 according to one embodiment. Embodiments of instruction format 210 can have zero, one, or more register fields 220. In one embodiment, the 16-bit instructions use 3-bit register fields 220 with an instruction-specific register encoding. In another embodiment, the 16-bit instructions use 5-bit register fields (rd 230 and rs 235). Instruction-specific register encoding refers to a mapping, defined for a particular instruction, from specific locations in the register space to the 3-bit register field of the 16-bit instruction.

Some embodiments have the following format features:

D1. The 6-bit major opcode is always leftmost, in bits 15:10.

D2. If there are one or more minor opcode fields (260, 265), they can be located immediately to the right of the major opcode field 260, and in some embodiments they can also include a single bit 265 (the rightmost bit) at bit position 0.

D3. For 3-bit register fields 220, in one embodiment, if an instruction has a 3-bit rd register field, it is the leftmost 3-bit register field; if the instruction has other 3-bit register fields, those fields have no fixed position. In one embodiment, these register fields have no fixed position because they are encoded and therefore cannot be used directly to access the register file as described in feature C2 above.

D4. For 5-bit register fields (230, 235), in one embodiment, if an instruction has a 5-bit rd register field 230, it is always in bits 9:5, immediately to the right of the major opcode 260; and if it has a 5-bit rs register field 235, it is always in bits 4:0, the rightmost 5 bits of the instruction. In one embodiment, the fixed placement of these 5-bit register fields (230, 235) allows them to be used directly to access the register file, as in feature C2 above. For example, in one embodiment, the 16-bit MOVE instruction has 5-bit register fields, which allow it to access any register in a set of 32 registers.

D5. Immediate and other fields (not shown): these use the bit positions not used by the fields mentioned above.

The features D1-D5 listed above are non-limiting and are intended to describe different features that can be associated with the embodiments described herein. In one embodiment, the leftmost position described in features D1-D5 is the least significant bit of the instruction format; in another embodiment, it is the most significant bit. Features D1-D5 list exemplary values and features used to explain the embodiments, and one or more of them can be combined in an embodiment. Other values, labels, and structures can be used without departing from the spirit of the embodiments described herein.
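The contrast between D3 and D4 can be illustrated with a 16-bit decoder: 5-bit fields are register numbers, while a 3-bit field must first be translated through an instruction-specific table. In the Python sketch below, both the table `REG3_MAP` and the placement of the 3-bit rd field in bits 9:7 are hypothetical choices for illustration only; the actual mappings are defined per instruction (see Table 5).

```python
# Hypothetical instruction-specific mapping for a 3-bit register field.
REG3_MAP = (16, 17, 2, 3, 4, 5, 6, 7)


def bits(word, hi, lo):
    """Extract bits hi..lo (inclusive) of `word`."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode16(halfword):
    """Slice a 16-bit instruction word; only some fields apply per instruction."""
    return {
        "major": bits(halfword, 15, 10),        # D1: 6-bit major opcode
        "rd5": bits(halfword, 9, 5),            # D4: usable directly as a GPR number
        "rs5": bits(halfword, 4, 0),
        "rd3": REG3_MAP[bits(halfword, 9, 7)],  # D3: must be mapped before use
    }


hw = (0b000011 << 10) | (4 << 5) | 17
f = decode16(hw)
print(f["major"], f["rd5"], f["rs5"], f["rd3"])  # 3 4 17 17
```

The extra table lookup is why D3 fields cannot index the register file directly, whereas D4 fields can.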

B. Major opcodes

Table 1B provides an exemplary list of instruction formats for the 16-bit instructions of an ISA according to one embodiment, and Table 2 provides a list of instruction formats for the 32-bit instructions of an ISA according to another embodiment. As these tables show, the instructions in this exemplary ISA are 16 or 32 bits wide. As those skilled in the relevant art will know, the names of the instruction formats in these tables are based on the number of register fields and the size of the immediate field of each format. That is, a format name has the form R<x>I<y>, where <x> is the number of registers in the format and <y> is the size of the immediate field. For example, an instruction based on format R2I16 has two register fields and a 16-bit immediate field.

Table 3 provides an exemplary list of the immediate-field formats of the 32-bit instructions of the ISA. Table 3 is divided into three parts: 32-bit instruction formats with a 26-bit immediate field, 32-bit instruction formats with a 16-bit immediate field, and 32-bit instruction formats with a 12-bit immediate field.

As those skilled in the relevant art will understand, different formats can be used to implement the embodiments described herein without departing from the spirit of the disclosed concepts.

Table 1B: 16-bit instruction set formats

Table 2: 32-bit instruction set formats

Table 3: Immediate fields in 32-bit instructions

32-bit instruction formats with a 26-bit immediate field:

32-bit instruction formats with a 16-bit immediate field:

32-bit instruction formats with a 12-bit immediate field:

V. Re-encoded instructions

In embodiments that obtain a new ISA by re-encoding an existing ISA, new instructions are added and legacy instructions are re-encoded. In various embodiments, these new and re-encoded instructions are designed to reduce code size. Tables 1B-3 show the formats of the re-encoded instructions of an ISA according to one embodiment. Table 4 provides the instruction formats of the 32-bit instructions of the legacy ISA that are re-encoded as 16-bit instructions of the new ISA according to one embodiment. In another embodiment, the choice of which legacy 32-bit ISA instructions to re-encode as 16-bit instructions of the new ISA is based on statistical analysis of legacy code used over a period of time, to determine the most frequently used instructions. An exemplary set of such instructions is provided in Tables 2 and 3. Table 3 provides examples of the instruction-specific register encodings and immediate-field size encodings described above. Table 4 provides the instruction formats of 32-bit instructions of the new ISA obtained, according to one embodiment, by re-encoding 32-bit instructions of the legacy ISA. Table 5 provides instruction-specific register descriptors and immediate field values for embodiments of the re-encoded instructions according to one embodiment.

Table 6 provides an exemplary list, according to one embodiment, of the most-significant-bit formats used for the exemplary ISA re-encoding, showing the register fields, immediate fields, other fields, empty fields, and minor opcode fields relative to the major opcode field. As described above, embodiments of the 32-bit re-encoded instructions can have 5-bit-wide register fields. In one embodiment, the 5-bit register fields use a linear encoding (r0 = '00000', r1 = '00001', and so on).

16-bit instructions can have register fields of different sizes, for example 3-bit and 5-bit register fields. The register field widths of the 16-bit instructions according to one embodiment are given in Table 1B. "Other fields" are defined by the corresponding columns, and the order of these fields is defined by the order in which they appear in the instruction word.

A. New 16-bit instructions re-encoded from 32-bit instructions

As discussed above, in the embodiments described here, an ISA of larger bit width can be re-encoded into an ISA of smaller bit width, or into a combined ISA of smaller and larger bit widths. In one embodiment, to allow the larger ISA to be re-encoded into the smaller ISA, the smaller-width ISA instructions have smaller register fields and immediate fields. In one embodiment, as described above, this reduction can be achieved by encoding the frequently used registers and immediate values.

In one embodiment, the ISA uses an enhanced 32-bit instruction set and a narrower, re-encoded 16-bit instruction set. The re-encoded 16-bit instructions have smaller register fields and immediate fields, and the size reduction is achieved by encoding the frequently used registers and immediate values.

For example, the re-encodings of frequently used legacy instructions listed in Table 4 below are shown with smaller register fields and immediate fields that correspond to the frequently used registers and immediate values.

Table 4: 16-bit re-encodings of frequently used MIPS32 instructions

Table 5: Instruction-specific register descriptors and immediate field values

In one embodiment, there are four variants of the ADDIU instruction. The first variant has a larger immediate field and only one register field, which serves as both source and destination. The second variant has a smaller immediate field but two register fields. The third variant, ADDIUSP, has no source-register encoding bits: it uses a single register (GPR 29) as both the source and the destination, and its increments and decrements are multiples of 4. The fourth variant, ADDIUR1SP, uses SP as the source register and a 3-bit field to select the destination register; the remaining encoding bits encode the increment, which is a multiple of 4.
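The scaling used by the two SP-based variants can be sketched as follows. Storing the immediate in units of 4 bytes lets a given number of encoding bits cover four times the byte range. The field widths (9 bits for ADDIUSP, 6 bits for ADDIUR1SP) are hypothetical, as is the treatment of the ADDIUSP field as signed (to provide both increments and decrements) and the ADDIUR1SP field as unsigned.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def addiusp(sp, field, field_bits=9):
    """ADDIUSP: GPR 29 is source and destination; signed multiple of 4."""
    return (sp + (sign_extend(field, field_bits) << 2)) & MASK32


def addiur1sp(sp, field, field_bits=6):
    """ADDIUR1SP: rd <- SP + unsigned multiple of 4 (rd chosen by a 3-bit field)."""
    return (sp + ((field & ((1 << field_bits) - 1)) << 2)) & MASK32


print(hex(addiusp(0x1000, 0x1FF)), hex(addiusp(0x1000, 4)), hex(addiur1sp(0x1000, 0x3F)))
# 0xffc 0x1010 0x10fc
```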

Using 16-bit instructions can occasionally cause misalignment. To address this misalignment, and to align instructions on 32-bit boundaries where particular cases require it, the embodiments described herein provide a 16-bit NOP instruction. This 16-bit NOP instruction can also reduce code size.

The NOP instruction is not shown in the tables because, in the exemplary embodiment, it is implemented as a macro. For example, in one embodiment, the 16-bit NOP instruction is implemented as "MOVE16 r0, r0".
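As a small illustration of how a 16-bit NOP could be used for alignment, the Python sketch below returns the padding an assembler might emit before an instruction that must start on a 32-bit boundary. The function name and the idea of returning assembly text are assumptions for illustration. The expansion "MOVE16 r0, r0" is the macro given above.

```python
NOP16 = "MOVE16 r0, r0"  # 16-bit NOP macro expansion from the text


def align_to_32(address):
    """Padding needed so the next instruction starts on a 32-bit boundary."""
    if address % 2:
        raise ValueError("instruction addresses are at least 16-bit aligned")
    # An address that is 2 mod 4 needs exactly one 16-bit NOP.
    return [NOP16] * ((address % 4) // 2)


print(align_to_32(0x1000), align_to_32(0x1002))  # [] ['MOVE16 r0, r0']
```

Padding with one 16-bit NOP costs half as much space as a 32-bit NOP would.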

In one embodiment, when the delay slot after a JR cannot be filled, the compact instruction JRC is preferred over JR. Because JRC executes no faster than a JR with a NOP in its delay slot, JR should be used whenever the delay slot can be filled with useful work.
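The rule above amounts to a simple peephole optimization. The Python sketch below works on a hypothetical list of assembly strings. It replaces a JR whose delay slot holds only a NOP with JRC, saving the NOP, and leaves a JR alone when its delay slot does useful work.

```python
def compact_jr(program):
    """Replace 'JR rs' followed by a delay-slot 'NOP' with 'JRC rs'."""
    out = []
    i = 0
    while i < len(program):
        insn = program[i]
        nxt = program[i + 1] if i + 1 < len(program) else None
        if insn.startswith("JR ") and nxt == "NOP":
            out.append("JRC " + insn[3:])  # drop the wasted delay slot
            i += 2
        else:
            out.append(insn)               # filled delay slot: keep JR
            i += 1
    return out


print(compact_jr(["ADDIU sp, sp, 8", "JR ra", "NOP"]))  # ['ADDIU sp, sp, 8', 'JRC ra']
print(compact_jr(["JR ra", "ADDIU sp, sp, 8"]))         # unchanged
```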

In addition, in one embodiment, the breakpoint instructions BREAK and SDBBP have 16-bit variants. This allows a breakpoint to be inserted at any instruction address without overwriting more than a single instruction.

E. New ISA instructions

As noted above, the new ISA according to one embodiment provides several new instructions. The new instructions of one embodiment and their formats are summarized in Table 6.

Figs. 3A-Z, 4A-H, and 5A-L are diagrams and flowcharts describing the formats and operation of the instructions summarized in Table 6 and of some of the instructions summarized in Table 4. The following sections give the format, purpose, description, restrictions, operation, exceptions, and programming notes for exemplary embodiments of each instruction.

Table 6: New instructions (32-bit)

Fig. 3A is a schematic diagram illustrating the format of the compact branch-on-equal-to-zero (BEQZC) instruction according to an embodiment of the invention. In assembly code, the format of the BEQZC instruction is "BEQZC rs, offset", where rs is a general-purpose register and offset is an immediate offset value. The purpose of the BEQZC instruction is to test a GPR: if the value of the GPR is zero (0), the processor performs a PC-relative conditional branch. In other words, if GPR[rs] = 0, the processor branches to the effective target address.

Fig. 3B is a flowchart illustrating the operation of the BEQZC instruction in a microprocessor according to one embodiment. In step 302, the register (rs) and the offset are obtained. In step 304, the offset is shifted left by one bit. In step 306, the offset is sign-extended if necessary. In step 308, the offset is added to the address of the instruction following the branch to form the target address. In step 310, if the contents of GPR rs equal zero, then in step 312 the program branches to the target address, with no delay-slot instruction; otherwise, instruction processing finishes in step 313.

Pseudocode describing the above operation is provided as follows:

In one embodiment, if the BEQZC instruction is placed in the delay slot of a branch or jump, processor operation is unpredictable. In one embodiment, BEQZC raises no exceptions. In one embodiment, BEQZC has no delay slot.

Fig. 3C is a schematic diagram illustrating the format of the compact branch-on-not-equal-to-zero (BNEZC) instruction according to an embodiment of the invention. In assembly code, the format of the BNEZC instruction is "BNEZC rs, offset", where rs is a general-purpose register and offset is an immediate offset value. The purpose of the BNEZC instruction is to test a GPR: if the value of the GPR is not zero (0), the processor performs a PC-relative conditional branch. In other words, if GPR[rs] ≠ 0, the processor branches.

Fig. 3D is a flowchart illustrating the operation of the BNEZC instruction in a microprocessor according to one embodiment. In step 314, the register (rs) and the offset are obtained. In step 316, the offset is shifted left by one bit, and in step 318 the offset operand is sign-extended if necessary. In step 320, the offset is added to the address of the instruction following the branch to form the target address. In step 322, if the contents of GPR rs are not equal to zero, then in step 324 the program branches to the target address, with no delay-slot instruction; otherwise, instruction processing finishes in step 325.

Pseudocode describing the above operation is provided as follows:

In one embodiment, if the BNEZC instruction is placed in the delay slot of a branch or jump, processor operation is unpredictable. BNEZC raises no exceptions. In one embodiment, BNEZC has no delay slot.
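The flowcharts of Figs. 3B and 3D share the same target computation and differ only in the test. The Python sketch below models both compact branches under the following assumptions: a 32-bit encoding with a 16-bit offset field, the branch's own address passed in as `pc`, and 32-bit address arithmetic. Shifting left by one bit means the offset counts 16-bit halfwords. Because there is no delay slot, a not-taken branch simply falls through to the next instruction.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def branch_target(pc, offset_field, offset_bits=16, insn_bytes=4):
    """Shift and sign-extend the offset, add it to the next instruction's address."""
    offset = sign_extend(offset_field, offset_bits) << 1  # halfword units
    return (pc + insn_bytes + offset) & MASK32


def beqzc_next_pc(pc, rs_value, offset_field):
    """BEQZC: branch if GPR[rs] == 0; no delay slot either way."""
    if (rs_value & MASK32) == 0:
        return branch_target(pc, offset_field)
    return (pc + 4) & MASK32


def bnezc_next_pc(pc, rs_value, offset_field):
    """BNEZC: branch if GPR[rs] != 0; no delay slot either way."""
    if (rs_value & MASK32) != 0:
        return branch_target(pc, offset_field)
    return (pc + 4) & MASK32


print(hex(beqzc_next_pc(0x1000, 0, 0x0010)), hex(bnezc_next_pc(0x1000, 0, 0x0010)))
# 0x1024 0x1004
```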

Fig. 3E is a schematic diagram showing the format of the jump-and-link-exchange (JALX) instruction according to an embodiment of the invention. In assembly code, the format of the JALX instruction is "JALX target", where target is the field used in computing the effective target address. The purpose of the JALX instruction is to perform a procedure call and change the ISA mode, for example from the smaller-width instruction set to the larger-width instruction set.

Fig. 3F is a flowchart illustrating the operation of the JALX instruction in a microprocessor according to one embodiment. In step 326, the target field is obtained. In step 328, the return link address is determined as the address of the next instruction after the branch delay-slot instruction; this is where execution continues when the procedure call returns. In step 330, the return link address is placed in GPR 31. Any GPR can be used to store this return link, as long as doing so does not interfere with software execution. In step 331, the value of bit 0 of GPR 31 is set to the current value of the ISA mode bit. In one embodiment, the ISA mode bit indicates which instruction set (the original ISA or the re-encoded ISA) is currently used to interpret instructions. In one embodiment, setting bit 0 of GPR 31 comprises concatenating the value of the ISA mode bit to the upper 31 bits of the address of the next instruction after the branch delay-slot instruction.

In one embodiment, the JALX instruction is a PC-region branch rather than a PC-relative branch. In other words, the effective target address lies in the "current" 256 MB-aligned region and is determined as follows. In step 332, the low 28 bits of the effective target address are obtained by shifting the target field left by 2 bits. In one embodiment, this shift is performed by concatenating two zero bits to the target field value. The remaining upper bits of the effective target address are the corresponding bits of the address of the instruction following the branch (not of the branch itself). In step 336, the jump to the effective target address is performed together with toggling the ISA mode bit. The operation ends in step 338.

In one embodiment, the JALX instruction raises no exceptions. In one embodiment, the effective target address could be formed by adding a signed relative offset to the value of the PC. However, if all program code addresses fit within a 256 MB region aligned on a 256 MB boundary, it is advantageous to form the jump target address by concatenating the PC with the shifted 26-bit target field rather than by adding a signed offset. Using the concatenated PC and 26-bit target allows a jump from any location in the region to any other location in the region, which a signed relative offset does not allow.

Pseudocode describing the above operation is provided as follows:

I: GPR[31] ← (PC + 8)[GPRLEN-1..1] ‖ ISAMode

I+1: PC ← PC[GPRLEN-1..28] ‖ target ‖ 0²

ISAMode ← (not ISAMode)
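The pseudocode above can be modeled directly. The Python sketch below assumes 32-bit GPRs, a 26-bit target field, and a JALX at address `pc` whose delay slot is at `pc + insn_bytes`. Under those assumptions, the region bits come from the delay-slot address, as the text specifies ("not of the branch itself"). The example call places the JALX in the last word of a 256 MB region so that the difference is visible.

```python
MASK32 = 0xFFFFFFFF


def jalx(pc, target_field, isa_mode, insn_bytes=4):
    """Return (link, new_pc, new_isa_mode) for a JALX at address `pc`."""
    # Steps 328-331: return past the delay slot; bit 0 records the current ISA mode.
    link = (((pc + 8) & MASK32) & ~1) | (isa_mode & 1)
    # Step 332: low 28 bits = 26-bit target field followed by two zero bits.
    low28 = (target_field & 0x3FFFFFF) << 2
    # Upper 4 bits come from the address after the branch, not the branch itself.
    region = ((pc + insn_bytes) & MASK32) & 0xF0000000
    # Step 336: jump and toggle the ISA mode bit.
    return link, region | low28, isa_mode ^ 1


link, new_pc, mode = jalx(0x0FFFFFFC, 0x10, 1)
print(hex(link), hex(new_pc), mode)  # 0x10000005 0x10000040 0
```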

Fig. 3G is a schematic diagram showing the format of a second embodiment of the JALX instruction, a 32-bit JALX instruction according to an embodiment of the invention. In assembly code, the format of the 32-bit JALX instruction is "JALX instr_index", where instr_index is the field used in computing the effective target address. The purpose of the 32-bit JALX instruction is to perform a procedure call and change the ISA mode, for example from the larger-width instruction set to the smaller-width instruction set.

Fig. 3H is a flowchart illustrating the operation of the JALX instruction according to the second embodiment. In step 340, the instr_index field is obtained. In step 342, the return link address is determined as the address of the next instruction after the branch; this is where execution continues when the procedure call returns. In step 344, the return link address is placed in GPR 31. Any GPR can be used to store the return link, as long as doing so does not interfere with software execution. In step 345, the value of bit 0 of GPR 31 is set to the current value of the ISA mode bit. In one embodiment, setting bit 0 of GPR 31 comprises concatenating the value of the ISA mode bit to the upper 31 bits of the address of the next instruction after the branch.

In one embodiment, the JALX instruction is a PC-region branch rather than a PC-relative branch. In other words, the effective target address lies in the "current" 256 MB-aligned region and is determined as follows. In step 346, the effective target address is determined by shifting the instr_index field left by 2 bits. In one embodiment, this shift is performed by concatenating two zero bits to the instr_index value. The remaining upper bits of the effective target address are the corresponding bits of the address of the second instruction after the branch (not of the branch itself). In step 350, the instruction in the delay slot is executed. In step 352, the jump to the effective target address is performed together with toggling the ISA mode bit. The operation ends in step 354.

In one embodiment, the second embodiment of the JALX instruction has no restrictions and raises no exceptions. In one embodiment, the effective target address could be formed by adding a signed relative offset to the value of the PC. However, if all program code addresses fit within a 256 MB region aligned on a 256 MB boundary, it is advantageous to form the jump target address by concatenating the PC with the shifted 26-bit target field rather than by adding a signed offset. Using the concatenated PC and 26-bit target allows a jump from any location in the region to any other location in the region, which a signed relative offset does not allow.

In one embodiment, the second embodiment of the JALX instruction supports only 32-bit-aligned branch target addresses. In one embodiment, if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump, processor operation is unpredictable. In one embodiment, the 32-bit JALX instruction raises no exceptions.

Pseudocode describing the above operation is provided as follows:

I: GPR[31] ← (PC + 8)[GPRLEN-1..1] ‖ ISAMode

I+1: PC ← PC[GPRLEN-1..28] ‖ instr_index ‖ 0²

ISAMode ← (not ISAMode)

Fig. 3I is a schematic diagram showing the format of the compact jump-register (JRC) instruction according to an embodiment of the invention. In assembly code, the format of the JRC instruction is "JRC rs", where rs is a general-purpose register. The purpose of the JRC instruction is to branch to an instruction address held in a register. In other words, PC ← GPR[rs].

Fig. 3J is a flowchart illustrating the operation of the JRC instruction in a microprocessor according to one embodiment. In step 356, the address held in register rs is obtained. In step 358, the program jumps unconditionally to the address specified in GPR rs, and the ISA mode bit is set to the value of bit 0 of GPR rs. In one embodiment, there is no delay-slot instruction. The operation ends in step 360.

In one embodiment, bit 0 of the target address is always zero (0). Therefore, no address exception occurs when bit 0 of the source register is one (1). In one embodiment, the effective target address in GPR rs must be 32-bit aligned. If bit 0 of GPR rs is zero and bit 1 of GPR rs is one, an address error exception occurs when the jump target is subsequently fetched as an instruction. The JRC instruction itself raises no exceptions.

Pseudocode describing the above operation is provided as follows:

I: PC ← GPR[rs][GPRLEN-1..1] ‖ 0

ISAMode ← GPR[rs][0]
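The JRC pseudocode and the alignment rule above fit in a few lines. The Python sketch below assumes 32-bit GPRs. It also reports whether the later fetch of the target would raise the address error described above, which happens when the target selects ISA mode 0 but is only 16-bit aligned. The `fetch_faults` flag is an illustrative addition; the instruction itself raises no exception.

```python
MASK32 = 0xFFFFFFFF


def jrc(rs_value):
    """Return (new_pc, new_isa_mode, fetch_faults) for JRC rs."""
    rs_value &= MASK32
    new_pc = rs_value & ~1    # PC <- GPR[rs][31..1] || 0
    new_mode = rs_value & 1   # ISAMode <- GPR[rs][0]
    # Bit 0 clear with bit 1 set: target is not 32-bit aligned, so fetching faults.
    fetch_faults = new_mode == 0 and (rs_value & 2) != 0
    return new_pc, new_mode, fetch_faults


for value in (0x00401001, 0x00401002, 0x00401004):
    pc, mode, faults = jrc(value)
    print(hex(pc), mode, faults)
```

Because bit 0 is always cleared before it reaches the PC, an odd register value selects the ISA mode rather than causing a misaligned jump.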

Fig. 3K is a schematic diagram showing the format of the load word pair (LWP) instruction according to an embodiment of the invention. In one embodiment, the purpose of the LWP instruction is to load two consecutive words from memory. In other words, GPR[rd], GPR[rd+1] ← memory[GPR[base] + offset]. In assembly code, the format of the LWP instruction is "LWP rd, offset(base)", where rd is the first register of the destination register pair; base is the register holding the base address, to which offset is added to form the effective address in memory from which the data is loaded; and offset is an immediate value.

Fig. 3L is a flowchart illustrating the operation of the LWP instruction according to one embodiment. In step 368, the registers rd and base and the offset are obtained. In step 369, GPR base is added to the offset to form the effective address. In step 370, the contents of the memory location specified by the 32-bit-aligned effective address are loaded. In step 371, the loaded word is sign-extended to the width of a GPR if necessary. In step 372, the first retrieved word is stored in GPR rd. In step 373, the effective address of the second word to be loaded is determined by adding GPR base to offset + 4. In step 374, the contents of the memory location specified by the newly determined effective address are retrieved as the second loaded word. In step 375, the second loaded word is sign-extended to the width of a GPR if necessary. In step 376, the second memory word is stored in GPR rd+1. The operation ends in step 377.

In one embodiment, the effective address must be 32-bit aligned. If either of the two least significant bits of the address is nonzero, an address error exception occurs. In one embodiment, if rd equals GPR 31, the behavior of the instruction is architecturally undefined, since GPR rd+1 would not exist. If base is the same as rd, the behavior of the LWP instruction is also architecturally undefined; this allows the LWP operation to be restarted if it is interrupted or aborted during execution, because loading rd would otherwise overwrite the base address needed to recompute the effective address. In one embodiment, if this instruction is placed in the delay slot of a jump or branch, its behavior is also architecturally undefined. In one embodiment, the LWP exceptions are: TLB refill, TLB invalid, bus error, address error, and watch.

Pseudocode describing the above operation is provided as follows:

In one embodiment, the LWP instruction may execute in a variable number of cycles and may perform a variable number of loads from memory. In addition, in one embodiment, any exception taken during execution returns with a complete restart of the operation sequence.
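The steps of Fig. 3L and the restrictions above can be captured in a short model. In the Python sketch below, `gpr` is a list of 32 register values, `memory` is a dictionary from byte addresses to 32-bit words, and `offset` is an already sign-extended integer; these data structures are assumptions for illustration. The undefined operand combinations are rejected with an error here, which is one possible choice; the architecture leaves them undefined.

```python
class AddressError(Exception):
    """Misaligned effective address."""


def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return (value ^ sign_bit) - sign_bit


def lwp(gpr, memory, rd, base, offset, gprlen=32):
    """Load the word pair at offset(base) into GPR rd and GPR rd+1."""
    if rd == 31 or base == rd:
        raise ValueError("architecturally undefined LWP operands")
    mask = (1 << gprlen) - 1
    ea = (gpr[base] + offset) & mask          # step 369
    if ea & 3:
        raise AddressError(hex(ea))
    for i in range(2):                        # steps 370-376
        word = memory[(ea + 4 * i) & mask]
        gpr[rd + i] = sign_extend(word, 32) & mask  # widen to GPR length


gpr = [0] * 32
gpr[4] = 0x1000
mem = {0x1008: 0x11111111, 0x100C: 0x80000000}
lwp(gpr, mem, 8, 4, 8, gprlen=64)
print(hex(gpr[8]), hex(gpr[9]))  # 0x11111111 0xffffffff80000000
```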

Fig. 3M is a schematic diagram showing the format of the load multiple words (LWM) instruction according to an embodiment of the invention. In assembly code, the format of the LWM instruction is "LWM reglist, offset(base)", where reglist (the register list) is a bit field in which each bit corresponds to a different register.

In another embodiment, reglist is an encoded bit field, each encoded value of which maps to a subset of the available registers. In another embodiment, reglist identifies the registers by a bit field in which each bit corresponds to a different register. The purpose of the LWM instruction is to load a sequence of consecutive words from memory. In other words, GPR[reglist[m]] ... GPR[reglist[n]] ← memory[GPR[base]] ... memory[GPR[base] + 4*(n-m)]. Table 7 shows examples of reglist encodings according to several embodiments.

Table 7: Register list encoding examples

In various embodiments of LWM, the contents of consecutive 32-bit words at the memory location specified by the 32-bit-aligned effective address are fetched, sign-extended to the GPR width if necessary, and placed in the GPRs defined by reglist. The 12-bit signed offset is added to the contents of GPR base to form the effective address.

Fig. 3N is a flowchart illustrating the operation of the LWM instruction in a microprocessor according to one embodiment. In step 380, the register list (reglist), base, and offset values are obtained. In step 381, the effective address is formed by an unsigned addition of the instruction's offset field and the contents of GPR(base). In step 382, the contents of the 32-bit-aligned memory location specified by the effective address are fetched. In step 383, the fetched word is sign-extended to the width of the GPR if necessary. In step 384, the result is stored in the GPR corresponding to the next register identified in reglist. In step 385, the effective address is updated to the next word to be loaded from memory. In step 386, steps 382 through 385 are repeated for each register identified in reglist.

In one embodiment, the effective address must be 32-bit aligned. If either of the two least significant bits of the address is nonzero, an address error exception occurs. If base is included in reglist, the behavior of the LWM instruction is architecturally undefined; this restriction allows the operation to be restarted if it is interrupted or aborted during execution.
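A similar sketch of LWM follows, assuming the first reglist embodiment in which bit r of a 32-bit mask selects GPR r, and assuming the selected registers are filled in ascending register order (the text does not fix the order, and the Table 7 encodings are not modeled). The memory model and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u /* hypothetical tiny memory: mem[i] holds byte address 4*i */

typedef struct {
    uint32_t gpr[32];
    uint32_t mem[MEM_WORDS];
} cpu_t;

typedef enum { ST_OK, ST_ADDRESS_ERROR, ST_UNDEFINED } status_t;

/* LWM reglist, offset(base): bit r of reglist selects GPR r (assumed encoding). */
static status_t lwm(cpu_t *c, uint32_t reglist, unsigned base, int32_t offset)
{
    assert(base < 32);
    assert(offset >= -2048 && offset <= 2047);       /* 12-bit signed offset */
    if (reglist & (1u << base))                      /* base in reglist: undefined */
        return ST_UNDEFINED;

    uint32_t ea = c->gpr[base] + (uint32_t)offset;   /* step 381 */
    if (ea & 3u)
        return ST_ADDRESS_ERROR;

    for (unsigned r = 0; r < 32; r++) {              /* step 386: each listed register */
        if (!(reglist & (1u << r)))
            continue;
        assert((ea >> 2) < MEM_WORDS);               /* toy bounds check only */
        c->gpr[r] = c->mem[ea >> 2];                 /* steps 382-384 */
        ea += 4u;                                    /* step 385: next word */
    }
    return ST_OK;
}
```

Because the first address is word aligned and each step adds 4, a single alignment check covers every word in the transfer.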

Pseudocode describing the above operation is provided as follows:

In one embodiment, the LWM exceptions are TLB Refill, TLB Invalid, Bus Error, Address Error, and Watch. In one embodiment, the LWM instruction may execute for a variable number of cycles and may perform a variable number of loads from memory. In one embodiment, a full restart of the sequence of operations is performed on return from any exception taken during execution.

Fig. 3O is a schematic diagram showing the format of the store word pair (SWP) instruction according to an embodiment of the invention. In one embodiment, the purpose of the SWP instruction is to store two consecutive words to memory, that is, memory[GPR[base] + offset] ← GPR[rs1], GPR[rs1+1]. In assembly code, the format of the SWP instruction is "SWP rs1, offset(base)", where rs1 is the first register of the source register pair, base is the register holding the base address, and offset is an immediate value that is added to the base address to determine the effective address in memory at which the data is stored.

Fig. 3P is a flowchart illustrating the operation of the SWP instruction according to one embodiment. In step 387, the register (rs1), the base register (base), and the offset are obtained. In step 388, GPR(base) is added to the offset to form the effective address. In step 390, the least significant 32-bit word is obtained from GPR(rs1). In step 392, this first 32-bit word is stored at the aligned memory location specified by the effective address. In step 394, the effective address is updated to GPR(base) + offset + 4 to address the next memory location at which data is to be stored; the offset value is sign-extended as needed. In step 396, the least significant 32-bit word is obtained from GPR(rs1+1). In step 398, this second 32-bit word is stored at the memory location specified by the updated aligned effective address. The operation ends at step 399.

One restriction in an embodiment is that the effective address must be 32-bit aligned. If either of the two least significant bits of the address is nonzero, an address error exception occurs. In one embodiment, if this instruction is placed in the delay slot of a jump or branch, its behavior is architecturally undefined.
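The SWP steps can be sketched the same way. The sketch assumes 32-bit GPRs, a hypothetical word-array memory, and that rs1 is at most 30 so that GPR rs1+1 exists (the text does not state this restriction explicitly); `cpu_t`, `status_t`, and `swp()` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u /* hypothetical tiny memory: mem[i] holds byte address 4*i */

typedef struct {
    uint32_t gpr[32];
    uint32_t mem[MEM_WORDS];
} cpu_t;

typedef enum { ST_OK, ST_ADDRESS_ERROR } status_t;

/* SWP rs1, offset(base): store GPR rs1 and GPR rs1+1 to two consecutive words. */
static status_t swp(cpu_t *c, unsigned rs1, unsigned base, int32_t offset)
{
    assert(rs1 < 31 && base < 32);                   /* sketch assumes GPR rs1+1 exists */
    uint32_t ea = c->gpr[base] + (uint32_t)offset;   /* step 388 */
    if (ea & 3u)                                     /* must be 32-bit aligned */
        return ST_ADDRESS_ERROR;
    uint32_t ea2 = ea + 4u;                          /* step 394: base + offset + 4 */
    assert((ea >> 2) < MEM_WORDS && (ea2 >> 2) < MEM_WORDS); /* toy bounds check only */
    c->mem[ea >> 2] = c->gpr[rs1];                   /* steps 390-392 */
    c->mem[ea2 >> 2] = c->gpr[rs1 + 1];              /* steps 396-398 */
    return ST_OK;
}
```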

In one embodiment, the SWP instruction may execute for a variable number of cycles and may perform a variable number of stores to memory. In addition, in one embodiment, a full restart of the sequence of operations is performed on return from any exception taken during execution. In one embodiment, the SWP exceptions are TLB Refill, TLB Invalid, TLB Modified, Address Error, and Watch.

Pseudocode describing the above operation is provided as follows:

Fig. 3Q is a schematic diagram showing the format of the store multiple words (SWM) instruction according to an embodiment of the invention. In assembly code, the format of the SWM instruction is "SWM reglist (base)", where reglist is a bit field in which each bit corresponds to a different register. In another embodiment, reglist is an encoded bit field in which each encoded value maps to a subset of the available registers. In yet another embodiment, reglist identifies a register containing a bit field in which each bit corresponds to a different register. The purpose of the SWM instruction is to store a sequence of consecutive words to memory, that is,

memory[GPR[base]] ... memory[GPR[base] + 4*(n-m)] ← GPR[reglist[m]] ... GPR[reglist[n]]

Fig. 3R is a flowchart illustrating the operation of the SWM instruction according to one embodiment. In step 380a, the register list (reglist), base operand, and offset operand are obtained. In step 381a, the effective address is formed as the contents of GPR(base) + sign_extend(offset). In step 382a, the least significant 32 bits of the next GPR identified by reglist are obtained. In step 383a, the obtained data is stored in memory at the address corresponding to the effective address. In step 384a, the effective address is updated to the next address at which data is to be written to memory. In step 385a, steps 382a through 384a are repeated for each register identified in reglist.

In one embodiment, a restriction on the SWM instruction is that the effective address must be 32-bit aligned. If either of the two least significant bits of the address is nonzero, an address error exception occurs. In one embodiment, if this instruction is placed in the delay slot of a jump or branch, its behavior is architecturally undefined. In one embodiment, the SWM instruction executes for a variable number of cycles and performs a variable number of stores to memory. A full restart of the sequence of operations is performed on return from any exception taken during execution. In one embodiment, the SWM exceptions are TLB Refill, TLB Invalid, TLB Modified, Address Error, and Watch.
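An illustrative SWM sketch follows, assuming a reglist bit mask in which bit r selects GPR r and assuming registers are stored in ascending register order; neither detail is fixed by the text, and the names and memory model are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64u /* hypothetical tiny memory: mem[i] holds byte address 4*i */

typedef struct {
    uint32_t gpr[32];
    uint32_t mem[MEM_WORDS];
} cpu_t;

typedef enum { ST_OK, ST_ADDRESS_ERROR } status_t;

/* SWM reglist, offset(base): store each selected GPR to consecutive words. */
static status_t swm(cpu_t *c, uint32_t reglist, unsigned base, int32_t offset)
{
    assert(base < 32);
    uint32_t ea = c->gpr[base] + (uint32_t)offset;   /* step 381a: base + sign_extend(offset) */
    if (ea & 3u)
        return ST_ADDRESS_ERROR;

    for (unsigned r = 0; r < 32; r++) {              /* step 385a: each listed register */
        if (!(reglist & (1u << r)))
            continue;
        assert((ea >> 2) < MEM_WORDS);               /* toy bounds check only */
        c->mem[ea >> 2] = c->gpr[r];                 /* steps 382a-383a */
        ea += 4u;                                    /* step 384a: next address */
    }
    return ST_OK;
}
```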

Pseudocode describing the above operation is provided as follows:

Fig. 4A is a schematic diagram showing the format of the jump register, adjust stack pointer (JRADDIUSP) instruction according to an embodiment of the invention. In one embodiment, the purpose of the JRADDIUSP instruction is to branch to the instruction address held in a register and adjust the stack pointer. In assembly code, the format of the JRADDIUSP instruction is "JRADDIUSP immediate", where immediate is the immediate-value argument to be decoded.

Fig. 4B is a flowchart illustrating the operation of the JRADDIUSP instruction according to an embodiment. In step 402, the values stored in registers GPR 29 and GPR 31 and the immediate increment value are obtained. In step 404, the immediate increment value is shifted left by 2 bits and the result is zero-extended. In step 406, the shifted immediate value is added to the value from GPR 29, and the result is placed in GPR 29. In step 408, the effective target address is set to the value in GPR 31 with bit 0 cleared. In step 410, the current ISA mode bit is set to bit 0 of the value from GPR 31. In step 412, a jump to the effective target address is performed. The operation ends at step 432.
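The JRADDIUSP flow maps onto a few lines of C. In the usual MIPS software convention GPR 29 is the stack pointer and GPR 31 holds the return address, so the instruction behaves like a combined stack adjustment and return. The `cpu_t` structure and its `isa_mode` flag are hypothetical modeling choices, and the immediate field width is left open because the text does not give it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t gpr[32];
    uint32_t pc;
    bool isa_mode;   /* hypothetical: true when bit 0 of the jump target was 1 */
} cpu_t;

/* JRADDIUSP imm: adjust GPR 29 and jump through GPR 31 (no delay slot). */
static void jraddiusp(cpu_t *c, uint32_t imm)
{
    uint32_t ra = c->gpr[31];        /* step 402 */
    c->gpr[29] += imm << 2;          /* steps 404-406: zero-extended, scaled by 4, wraps, never traps */
    c->pc = ra & ~1u;                /* steps 408 and 412: bit 0 cleared */
    c->isa_mode = (ra & 1u) != 0;    /* step 410 */
}
```

Because the instruction has no delay slot, the jump takes effect immediately after the stack pointer update; nothing else needs to be sequenced in the model.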

In one embodiment, no integer overflow exception occurs under any circumstances when GPR 29 is updated. In other embodiments, whether interrupts are disabled during the sequence of operations generated by this instruction is implementation dependent.

In one embodiment, the JRADDIUSP instruction has no exceptions. In one embodiment, a restriction on the JRADDIUSP instruction is that if bit 0 of GPR 31 is zero, specifying a jump to a MIPS32 target, and bit 1 of GPR 31 is one, an address error exception occurs when the jump target is subsequently fetched as an instruction. Another restriction in this embodiment is that if ISA mode switching is not possible (for example, MIPS32 is not implemented), bit 0 of GPR 31 must be set to 1, and if bit 0 of GPR 31 is 0, an address error exception occurs when the jump target is subsequently fetched as an instruction. In addition, in an embodiment of JRADDIUSP, unlike most MIPS "jump" instructions, the instruction has no delay slot.

Pseudocode describing the above operation is provided as follows:

Fig. 4C is a schematic diagram showing the format of the add immediate unsigned word, 5-bit register select (ADDIUS5) instruction according to an embodiment of the invention. In assembly code, the format of the ADDIUS5 instruction is "ADDIUS5 rd, immediate_value", where rd is a general-purpose register and immediate_value is the immediate-value argument to be decoded.

In one embodiment, the purpose of the ADDIUS5 instruction is to add a constant to a 32-bit integer.

Fig. 4D is a flowchart illustrating the operation of the ADDIUS5 instruction according to an embodiment. In step 422, the 4-bit instruction immediate value is fetched. In step 424, the 4-bit immediate value is sign-extended. In step 426, the 5-bit register index rd is obtained from the instruction. Table 8 shows examples of encoded and decoded values of the signed immediate field. In step 428, GPR(rd) is added to the sign-extended immediate value. In step 430, the result of this addition is placed in GPR(rd). The operation ends at step 414.

In one embodiment, the ADDIUS5 instruction has no restrictions and no exceptions.

Table 8: Encoded and decoded values of the signed immediate field

Pseudocode describing the above operation is provided as follows:

Operation:

temp ← GPR[rd] + sign_extend(immediate)

GPR[rd] ← temp

In one embodiment, ADDIUS5 performs 32-bit modular arithmetic that does not trap on overflow. An embodiment is suitable for unsigned arithmetic, such as address arithmetic, or for integer arithmetic environments that ignore overflow, such as C language arithmetic.
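The ADDIUS5 operation can be expressed in C with unsigned 32-bit arithmetic, which wraps modulo 2^32 and never traps, matching the overflow behavior described above. The sketch assumes the 4-bit immediate is decoded by plain two's-complement sign extension, as step 424 describes; the Table 8 encodings are not reproduced here. `sign_extend()` and `addius5()` are illustrative names, and this toy model does not hardwire GPR 0 to zero.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 32 bits (1 <= bits <= 31). */
static uint32_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;
    return (v ^ sign) - sign;            /* unsigned arithmetic: wraps, never traps */
}

/* ADDIUS5 rd, imm: GPR[rd] <- GPR[rd] + sign_extend(imm4), modulo 2^32. */
static void addius5(uint32_t gpr[32], unsigned rd, uint32_t imm4)
{
    assert(rd < 32 && imm4 < 16);        /* 5-bit register index, 4-bit immediate */
    gpr[rd] += sign_extend(imm4, 4);     /* no integer overflow trap */
}
```

For example, an encoded immediate of `0xF` decodes to -1, so ADDIUS5 with that value decrements the register, and adding 1 to `0x7FFFFFFF` simply yields `0x80000000` without trapping.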

In one embodiment, the purpose of the ADDIUPC instruction is to add a constant to the program counter value and write the result to a register. In assembly code, the format of the ADDIUPC instruction is "ADDIUPC rs, left_shifted_immediate", where rs is a general-purpose register and left_shifted_immediate is the left-shifted immediate-value argument.

Fig. 4F is a flowchart illustrating the operation of the ADDIUPC instruction according to an embodiment. In step 422, the 23-bit instruction immediate value is fetched. In step 444, the 23-bit immediate value is shifted left by 2 bits. In step 446, the shifted immediate value is sign-extended. In step 448, the 3-bit register index (rs) is obtained from the instruction. In step 450, the 3-bit register index (rs) is converted into a decoded 5-bit register index (rs_decoded). In step 452, the program counter value of this instruction is copied. In step 454, bits 0 and 1 of the copied program counter value are cleared. In step 456, the copied program counter value is added to the sign-extended immediate value. In step 458, the sum is placed in GPR(rs_decoded). The operation ends at step 460.

In one embodiment, no integer overflow exception occurs under any circumstances. Unlike the legacy 16-bit ISA version of this instruction (for example, MIPS16e, an implementation available from MIPS Technologies, Inc. of Sunnyvale, California), in one embodiment the program counter (PC) value of the ADDIUPC instruction itself is always used, even when the ADDIUPC instruction is placed in the delay slot of a jump or branch instruction.

In one embodiment, a restriction on the ADDIUPC instruction is that the 3-bit register field can only specify GPRs $2-$7, $16, and $17. In one embodiment, the ADDIUPC instruction has no exceptions.

Pseudocode describing the above operation is provided as follows:

Operation:

temp ← (PC_{GPRLEN-1..2} ‖ 0²) + sign_extend(immediate ‖ 0²)

GPR[Xlat(rs)] ← temp

In one embodiment, ADDIUPC performs 32-bit modular arithmetic that does not trap on overflow. An embodiment is suitable for unsigned arithmetic, such as address arithmetic, or for integer arithmetic environments that ignore overflow, such as C language arithmetic.
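A C sketch of the ADDIUPC computation follows. `pc` is the address of the ADDIUPC instruction itself, consistent with the rule that the instruction's own PC is used even in a delay slot. The ordering of the 3-bit register decode table is an assumption (the text states only that $2-$7, $16, and $17 are reachable), and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed 3-bit register decoding: only the reachable set is given in the text. */
static const unsigned xlat3[8] = {16, 17, 2, 3, 4, 5, 6, 7};

/* Sign-extend the low `bits` bits of v to 32 bits (1 <= bits <= 31). */
static uint32_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;
    return (v ^ sign) - sign;
}

/* ADDIUPC rs, imm: GPR[Xlat(rs)] <- (PC with bits 1..0 cleared) + sign_extend(imm || 00). */
static void addiupc(uint32_t gpr[32], uint32_t pc, unsigned rs3, uint32_t imm23)
{
    assert(rs3 < 8 && imm23 < (1u << 23));
    uint32_t offset = sign_extend(imm23 << 2, 25);   /* steps 444-446: 23 + 2 = 25 bits */
    gpr[xlat3[rs3]] = (pc & ~3u) + offset;           /* steps 452-458; wraps, never traps */
}
```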

Fig. 4G is a schematic diagram showing the format of the move register pair (MOVEP) instruction according to an embodiment of the invention. In assembly code, the format of the MOVEP instruction is "MOVEP rd, re, rs, rt", where rd, re, rs, and rt are general-purpose registers.

In one embodiment, the purpose of the MOVEP instruction is to move a register pair, that is, to copy two GPRs to two other GPRs: GPR[rd] ← GPR[rs]; GPR[re] ← GPR[rt].

Fig. 4H is a flowchart illustrating the operation of the MOVEP instruction according to an embodiment. In step 462, the 3-bit encoded register index Enc_rs and the 3-bit encoded register index Enc_rt are fetched from the instruction. In step 464, the 3-bit encoded register index Enc_rs is converted into a decoded 5-bit register index (rs). Table 9 shows examples of the encoded values of Enc_rt and Enc_rs. In step 466, the 3-bit encoded register index Enc_rt is converted into a decoded 5-bit register index (rt). In step 468, the 3-bit destination register pair code Enc_dest is obtained from the instruction. In step 470, the Enc_dest value is converted into the 5-bit destination register indexes rd and re. Table 10 shows example decodings of Enc_dest. In step 472, the value of GPR(rs) is copied into GPR(rd). In step 474, the value of GPR(rt) is copied into GPR(re). The operation ends at step 476.

Table 9: Encoded and decoded values of the Enc_rs and Enc_rt fields

Table 10: Encoded and decoded values of the Enc_dest field

In one embodiment, whether interrupts are disabled during the sequence of operations generated by this instruction is implementation dependent.

In one embodiment, the restrictions on the MOVEP instruction are that the destination register pair field Enc_dest can only specify the register pairs defined in Table 10, and the source registers Enc_rs and Enc_rt can only specify GPRs 0, 2-3, and 16-20. If this instruction is placed in the delay slot of a jump or branch, its behavior is architecturally UNDEFINED. In one embodiment, the MOVEP instruction has no exceptions.
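The MOVEP flow can be sketched as two table lookups followed by two copies. Tables 9 and 10 are not reproduced in this text, so the decode tables below are hypothetical stand-ins: the source table respects the stated GPR 0, 2-3, and 16-20 limit, and the destination pairs are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for Tables 9 and 10. */
static const unsigned src_xlat[8] = {0, 17, 2, 3, 16, 18, 19, 20};
static const unsigned dest_pair[8][2] = {
    {5, 6}, {5, 7}, {6, 7}, {4, 21}, {4, 22}, {4, 5}, {4, 6}, {4, 7},
};

/* MOVEP: GPR[rd] <- GPR[rs]; GPR[re] <- GPR[rt]. */
static void movep(uint32_t gpr[32], unsigned enc_dest, unsigned enc_rs, unsigned enc_rt)
{
    assert(enc_dest < 8 && enc_rs < 8 && enc_rt < 8);
    unsigned rs = src_xlat[enc_rs];                 /* step 464 */
    unsigned rt = src_xlat[enc_rt];                 /* step 466 */
    unsigned rd = dest_pair[enc_dest][0];           /* step 470 */
    unsigned re = dest_pair[enc_dest][1];
    uint32_t vs = gpr[rs], vt = gpr[rt];            /* read both sources before writing */
    gpr[rd] = vs;                                   /* step 472 */
    gpr[re] = vt;                                   /* step 474 */
}
```

Reading both sources before either write is a defensive choice; with these stand-in tables the source and destination sets never overlap, so the order would not change the result.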

Pseudocode describing the above operation is provided as follows:

Operation:

GPR[rd] ← GPR[rs]; GPR[re] ← GPR[rt]

Fig. 5A is a schematic diagram showing the format of the branch on greater than or equal to zero and link, short delay slot (BGEZALS) instruction according to an embodiment of the invention. In assembly code, the format of the BGEZALS instruction is "BGEZALS rs, offset".

In one embodiment, the purpose of the BGEZALS instruction is to test a GPR and then perform a PC-relative conditional procedure call; for example, if GPR[rs] ≥ 0, then procedure_call.

Fig. 5B is a flowchart illustrating the operation of the BGEZALS instruction according to an embodiment. In step 512, the register (rs) and the offset operand are obtained. In step 514, the offset is shifted left by 1 bit. In step 516, the offset is sign-extended. In step 518, the offset is added to the address of the instruction following the branch to create the target address. In one embodiment, this target address is a PC-relative target address. In step 520, 2 is added to the address of the instruction following the branch, and the result is placed in GPR[31]. In step 522, if the contents of GPR(rs) are greater than or equal to zero, operation proceeds to step 524, where the instruction following the branch instruction is executed; in one embodiment, this instruction is in the delay slot. In step 526, the branch to the target address is taken. If, in step 522, the contents of GPR(rs) are less than zero, the operation ends at step 523.

In one embodiment, a restriction on the BGEZALS instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of a BGEZALS instruction. In one embodiment, processor operation is unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump. GPR 31 must not be used as the source register rs, because such an instruction does not have the same effect when re-executed; the result of executing such an instruction is unpredictable. This restriction allows an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot.

Pseudocode describing the above operation is provided as follows:

Operation:

Fig. 5C is a schematic diagram showing the format of the branch on less than zero and link, short delay slot (BLTZALS) instruction according to an embodiment of the invention. In assembly code, the format of the BLTZALS instruction is "BLTZALS rs, offset".

In one embodiment, the purpose of the BLTZALS instruction is to test a GPR and then perform a PC-relative conditional procedure call.

Fig. 5D is a flowchart illustrating the operation of the BLTZALS instruction according to an embodiment. In step 528, the value of register (rs) and the offset operand are obtained. In step 530, the offset is shifted left by 1 bit. In step 532, the offset is sign-extended. In step 534, the offset is added to the address of the instruction following the branch to create the target address. In step 536, 2 is added to the address of the instruction following the branch, and the result is placed in GPR[31]. In step 538, if the contents of GPR[rs] are less than zero, operation proceeds to step 540, where the instruction following the branch instruction is executed. In step 542, the branch to the target address is taken. If the contents of GPR[rs] are greater than or equal to zero, the operation ends at step 539.

In one embodiment, a restriction on the BLTZALS instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of BLTZALS, and GPR 31 must not be used as the source register rs, because such an instruction does not have the same effect when re-executed. In one embodiment, this restriction allows an exception handler to resume execution by re-executing the branch when an exception occurs in the branch delay slot. In one embodiment, processor operation is unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump. In one embodiment, the BLTZALS instruction has no exceptions.
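BGEZALS and BLTZALS differ only in the sense of the test, so one hypothetical helper can model both. The sketch assumes the branch is a 32-bit instruction at `pc` and that the offset field is 16 bits wide (the text gives neither width); following the flowcharts, it writes the link value to GPR[31] before the condition decides whether the branch is taken. Delay-slot execution is not modeled, and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sign-extend the low `bits` bits of v to 32 bits (1 <= bits <= 31). */
static uint32_t sign_extend(uint32_t v, unsigned bits)
{
    uint32_t sign = 1u << (bits - 1);
    v &= (1u << bits) - 1u;
    return (v ^ sign) - sign;
}

/* Shared model of BGEZALS (branch_if_ge = true) and BLTZALS (branch_if_ge = false).
 * Returns whether the branch is taken and stores the PC-relative target in *target. */
static bool branch_zero_and_link_short(uint32_t gpr[32], unsigned rs, bool branch_if_ge,
                                       uint32_t pc, uint32_t offset16, uint32_t *target)
{
    assert(rs < 31 && offset16 < (1u << 16));        /* GPR 31 may not be the source */
    bool negative = (gpr[rs] >> 31) != 0;            /* sign of the 32-bit value */
    bool taken = branch_if_ge ? !negative : negative;
    uint32_t next = pc + 4u;                         /* instruction after the branch */
    *target = next + sign_extend(offset16 << 1, 17); /* shift left 1, sign-extend, add */
    gpr[31] = next + 2u;                             /* link past the 16-bit delay slot */
    return taken;
}
```

Since GPR[31] is written regardless of the outcome, re-executing the branch after a delay-slot exception would read a modified source register if rs were 31, which is exactly why the text forbids that choice.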

Pseudocode describing the above operation is provided as follows:

Operation:

Fig. 5E is a schematic diagram showing the format of the jump and link register, short delay slot (16-bit) (JALRS16) instruction according to an embodiment of the invention. In assembly code, the format of the JALRS16 instruction is "JALRS16 rs", where rs is a general-purpose register.

In one embodiment, the purpose of the JALRS16 instruction is to perform a procedure call to an instruction address in a register, for example, GPR[31] ← return_addr, PC ← GPR[rs].

Fig. 5F is a flowchart illustrating the operation of the JALRS16 instruction according to an embodiment. In step 544, the value of register rs is obtained. In step 546, the effective target ISA mode is set to the value of bit 0 of GPR[rs]. In step 548, the effective target address is set to the value in GPR[rs] with bit 0 cleared. In step 550, 2 is added to the address of the instruction following the jump, and the result is placed in GPR[31]. In step 552, the instruction following the jump instruction is executed. In step 554, operation jumps to the effective target address, and the ISA mode is set to the effective target ISA mode. The operation ends at step 556.

In one embodiment, a restriction on the JALRS16 instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of a JALRS16 instruction. In one embodiment, the effective target address in GPR rs must be naturally aligned.

In one embodiment, if bit 0 is zero and bit 1 is one, an address error exception occurs when the jump target is subsequently fetched as an instruction. In one embodiment, bit 0 of the target address is always kept zero to prevent an address exception when bit 0 of the source register is one. In one embodiment, processor operation is unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.
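The JALRS16 target decode and link computation can be sketched as follows. JALRS16 is taken to be a 16-bit instruction at `pc`, so the instruction after it starts at `pc + 2` and the link value is `pc + 4`. The `fetch_fault` flag captures the address-error case described above; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t target;    /* effective target address, bit 0 cleared */
    bool isa_mode;      /* bit 0 of the register value */
    bool fetch_fault;   /* address error expected when the target is fetched */
} jump_target_t;

/* Steps 546-548, plus the misalignment check: bit 0 == 0 selects MIPS32,
 * whose instructions are word aligned, so bit 1 must then be 0 as well. */
static jump_target_t decode_jump_target(uint32_t reg)
{
    jump_target_t j;
    j.isa_mode = (reg & 1u) != 0;
    j.target = reg & ~1u;
    j.fetch_fault = !j.isa_mode && (reg & 2u) != 0;
    return j;
}

/* JALRS16 rs at address pc: GPR[31] <- (pc + 2) + 2, then jump. */
static jump_target_t jalrs16(uint32_t gpr[32], unsigned rs, uint32_t pc)
{
    assert(rs < 32);
    jump_target_t j = decode_jump_target(gpr[rs]);   /* read rs before linking */
    gpr[31] = (pc + 2u) + 2u;                        /* step 550 */
    return j;
}
```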

In one embodiment, the JALRS16 instruction has no exceptions.

Pseudocode describing the above operation is provided as follows:

Operation:

Fig. 5G is a schematic diagram showing the format of the jump and link register, short delay slot (JALRS) instruction according to an embodiment of the invention. In assembly code, the formats of the JALRS instruction are "JALRS rs" (rt = 31 implied) and "JALRS rt, rs", where rt and rs are general-purpose registers.

In one embodiment, the purpose of the JALRS instruction is to perform a procedure call to an instruction address in a register, for example, GPR[rt] ← return_addr, PC ← GPR[rs].

Fig. 5H is a flowchart illustrating the operation of an example JALRS instruction according to an embodiment. In step 558, the values of registers rs and rt are obtained. In step 560, the effective target ISA mode is set to the value of bit 0 of GPR[rs]. In step 562, the effective target address is set to the value in GPR[rs] with bit 0 cleared. In step 564, 2 is added to the address of the instruction following the jump, and the result is placed in GPR[rt]. In step 566, the instruction following the jump instruction is executed. In step 568, operation jumps to the effective target address, and the ISA mode is set to the effective target ISA mode. The operation ends at step 570.

In one embodiment, a restriction on the JALRS instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of JALRS. Another restriction in one embodiment is that the register specifiers rs and rt must not be equal, because such an instruction does not have the same effect when re-executed. In one embodiment, processor operation is unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump.

In one embodiment, the JALRS instruction has no exceptions.
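A JALRS sketch with an explicit link register follows, assuming JALRS is a 32-bit instruction at `c->pc`, so the link value is `pc + 6` (the 4-byte jump plus its 16-bit delay slot). The assert encodes the rs != rt restriction; the `cpu_t` model and names are hypothetical, and the delay-slot instruction itself is not executed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t gpr[32];
    uint32_t pc;
    bool isa_mode;
} cpu_t;

/* JALRS rt, rs (assembly default rt = 31). */
static void jalrs(cpu_t *c, unsigned rt, unsigned rs)
{
    assert(rt < 32 && rs < 32 && rt != rs);   /* rs == rt is not allowed */
    uint32_t dest = c->gpr[rs];               /* step 558 */
    c->gpr[rt] = (c->pc + 4u) + 2u;           /* step 564: skip the 16-bit delay slot */
    c->isa_mode = (dest & 1u) != 0;           /* step 560 */
    c->pc = dest & ~1u;                       /* steps 562 and 568 */
}
```

If rs and rt were the same register, the link write would overwrite the jump target, so a restarted JALRS would jump to the return address instead of the intended target.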

Pseudocode describing the above operation is provided as follows:

Operation:

Fig. 5I is a schematic diagram showing the format of the jump and link register with hazard barrier, short delay slot (JALRS.HB) instruction according to an embodiment of the invention. In assembly code, the formats of the JALRS.HB instruction are "JALRS.HB rs" (rt = 31 implied) and "JALRS.HB rt, rs", where rt and rs are general-purpose registers.

In one embodiment, the purpose of the JALRS.HB instruction is to perform a procedure call to an instruction address in a register, for example, GPR[rt] ← return_addr, PC ← GPR[rs].

Fig. 5J is a flowchart illustrating the operation of an example JALRS.HB instruction according to an embodiment. In step 572, the values of registers rs and rd are obtained. In step 576, the effective target ISA mode is set to the value of bit 0 of GPR[rs]. In step 578, the effective target address is set to the value in GPR[rs] with bit 0 cleared. In step 580, 2 is added to the address of the instruction following the jump, and the result is placed in GPR[rd]. In step 582, the instruction following the jump instruction is executed. In step 584, all execution and instruction hazards are cleared. In step 586, operation jumps to the effective target address, and the ISA mode is set to the effective target ISA mode. The operation ends at step 588.

One embodiment of the JALRS.HB instruction implements a software barrier that resolves all execution and instruction hazards created by Coprocessor 0 state changes. In various embodiments, for example, the effects of this barrier are seen starting with the fetch and decode of the instruction at the PC to which the JALRS.HB instruction jumps. An equivalent barrier is also implemented by the ERET instruction, but that instruction can be used only when Coprocessor 0 is accessible, whereas embodiments of the JALRS.HB instruction can be used in all operating modes. In one embodiment, the JALRS.HB instruction clears execution and instruction hazards.

In one embodiment, a restriction on the JALRS.HB instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of JALRS.HB, and the register specifiers rs and rd must not be equal, because such an instruction does not have the same effect when re-executed (the result is unpredictable). In one embodiment, processor operation is also unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump. In one embodiment, the JALRS.HB instruction has no exceptions.

Pseudocode describing the above operation is provided as follows:

Operation:

In embodiments of the ISA described herein, the JALR, JALR.HB, JALR16, JALRS16, JALRS, and JALRS.HB instructions are the only branch-and-link instructions that can select the register used for the return link; all other link instructions use a specific register, such as GPR 31. In an embodiment of the JALRS.HB instruction, if the register is omitted in the assembly-language instruction, the default register used for GPR rt is GPR 31.

In an embodiment of the JALRS.HB instruction, the JALRS.HB instruction clears execution and instruction hazards before execution continues. In one embodiment, a hazard is created when a Coprocessor 0 or TLB write affects execution or the mapping of the instruction stream, or after a write to the instruction stream; when such a situation exists, software must explicitly indicate to hardware that the hazard should be cleared. In one embodiment, execution hazards alone can be cleared with the EHB instruction, whereas instruction hazards can be cleared only with the JR.HB, JALRS.HB, or ERET instructions, which cause hardware to clear the hazard before the instruction at the jump target is fetched. Note that, in one embodiment, because the JR.HB, JALRS.HB, and ERET instructions are encoded as jumps, the clearing of instruction hazards can often be included as part of a call (JALR) or return (JR) sequence simply by replacing the original instruction with its hazard-barrier (.HB) equivalent, such as JALRS.HB.

Example: Clearing hazards caused by an ASID change

Fig. 5K is a schematic diagram showing the format of the jump and link, short delay slot (JALS) instruction according to an embodiment of the invention. In one embodiment, the purpose of the JALS instruction is to perform a procedure call within the current 128 MB-aligned region.

Fig. 5L is a flowchart illustrating the operation of the JALS instruction according to an embodiment. In step 590, the 26-bit instr_index (instruction index) field is obtained from the instruction. In step 591, the 26-bit instr_index field is shifted left by 1 bit. In step 592, bits 31..27 of the address of the instruction following the jump are concatenated with the shifted 26-bit instr_index field to obtain the effective target address. In step 593, 2 is added to the address of the instruction following the jump, and the result is placed in GPR[31]. In step 594, the instruction following the jump instruction is executed. In step 595, a jump to the effective target address is performed. The operation ends at step 596.

In one embodiment, a restriction on the JALS instruction is that the delay-slot instruction must be 16 bits in size. In one embodiment, processor operation is unpredictable if a 32-bit instruction is placed in the delay slot of JALS. In one embodiment, processor operation is also unpredictable if a branch, jump, ERET, DERET, or WAIT instruction is placed in the delay slot of a branch or jump. In one embodiment, the JALS instruction has no exceptions.

Pseudocode describing the above operation is provided as follows:

Operation:

I: GPR[31] ← PC + 6

I+1: PC ← PC_{GPRLEN-1..27} ‖ instr_index ‖ 0¹
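The JALS address arithmetic can be checked with a short C sketch. Bits 31..27 select a 128 MB (2^27-byte) region and `instr_index << 1` supplies bits 26..1, so the target always stays within the region of the instruction after the jump. The function names are hypothetical, and `pc` in `jals_link()` is the address of the JALS instruction itself (instruction I in the pseudocode).

```c
#include <assert.h>
#include <stdint.h>

/* I+1: PC <- PC[31..27] || instr_index || 0, where PC is the delay-slot address. */
static uint32_t jals_target(uint32_t delay_slot_pc, uint32_t instr_index)
{
    assert(instr_index < (1u << 26));                      /* 26-bit field */
    return (delay_slot_pc & 0xF8000000u) | (instr_index << 1);
}

/* I: GPR[31] <- PC + 6 (4-byte JALS plus its 16-bit delay slot). */
static uint32_t jals_link(uint32_t pc)
{
    return pc + 6u;
}
```

Because the region comes from the delay-slot address, a JALS occupying the last four bytes of a region targets the following region.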

VI. Example Processor Core

Fig. 6 is a schematic diagram of an example processor core 600 for implementing an ISA according to embodiments of the invention. Processor core 600 is an example processor intended to be illustrative and not limiting. Those skilled in the art will recognize numerous processor implementations for use with an ISA according to embodiments of the invention.

As shown in Fig. 6, processor core 600 includes an execution unit 602, a fetch unit 604, a floating-point unit 606, a load/store unit 608, a memory management unit (MMU) 610, an instruction cache 612, a data cache 614, a bus interface unit 616, a multiply/divide unit (MDU) 620, a coprocessor 622, general-purpose registers 624, a scratchpad memory 630, and a core extend unit 634. Although processor core 600 is described here as including several separate components, many of these components are optional and will not be present in every embodiment of the invention, or components may be combined so that, for example, the functionality of two components resides in a single component. Additional components may also be added. Thus, the individual components shown in Fig. 6 are illustrative and are not intended to limit the invention.

In one embodiment, processor core 600 is a reduced instruction set computer (RISC) processor. As those skilled in the art will know, one feature of such a processor is its use of instructions that perform simple functions and directly access register addresses. Embodiments of the RISC processor can be implemented using a RISC architecture, an example of which is described below.

An embodiment of execution unit 602 implements a load-store (RISC) architecture with single-cycle arithmetic logic unit (ALU) operations (for example, logical, shift, add, and subtract). Execution unit 602 interfaces with fetch unit 604, floating point unit 606, load/store unit 608, multiply/divide unit 620, coprocessor 622, general-purpose registers 624, and core extension unit 634.

Fetch unit 604 is responsible for providing instructions to execution unit 602. In one embodiment, fetch unit 604 includes control logic for instruction cache 612, a recoder for compressed-format instructions, dynamic branch prediction, and an instruction buffer that decouples the operation of fetch unit 604 from execution unit 602. Fetch unit 604 interfaces with execution unit 602, memory management unit 610, instruction cache 612, and bus interface unit 616.

Floating point unit 606 interfaces with execution unit 602 and operates on non-integer data. Floating point unit 606 includes floating point registers 618. In one embodiment, floating point registers 618 may be located outside floating point unit 606. Floating point registers 618 may be 32-bit or 64-bit registers used for the floating point operations performed by floating point unit 606. Typical floating point operations are arithmetic, such as addition and multiplication, and may also include exponential or trigonometric calculations.

Load/store unit 608 is responsible for data loads and stores and includes data cache control logic. Load/store unit 608 interfaces with data cache 614 and with scratchpad memory 630 and/or a fill buffer (not shown). Load/store unit 608 also interfaces with memory management unit 610 and bus interface unit 616.

Memory management unit 610 translates virtual addresses into physical addresses for memory access. In one embodiment, memory management unit 610 includes a translation lookaside buffer (TLB) and may include a separate instruction TLB and a separate data TLB. Memory management unit 610 interfaces with fetch unit 604 and load/store unit 608.
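To make the translation step concrete, here is a minimal Python sketch of a TLB lookup. It assumes fixed 4 KiB pages and a dictionary-based, fully associative TLB; real MIPS TLBs also carry address-space identifiers, paired even/odd page mappings, and variable page sizes, none of which are modeled. The class names are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: fixed 4 KiB pages


class TLBMiss(Exception):
    """Raised when no entry maps the virtual page (a real core would take an exception)."""


class TLB:
    """Minimal fully associative TLB: virtual page number -> physical frame number."""

    def __init__(self) -> None:
        self.entries: dict[int, int] = {}

    def insert(self, vpn: int, pfn: int) -> None:
        self.entries[vpn] = pfn

    def translate(self, vaddr: int) -> int:
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.entries:
            raise TLBMiss(hex(vaddr))
        # The page offset passes through translation unchanged.
        return (self.entries[vpn] << PAGE_SHIFT) | offset
```

A separate instruction TLB and data TLB, as in the embodiment above, would simply be two independent instances, one consulted by the fetch unit and one by the load/store unit.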

Instruction cache 612 is an on-chip memory array organized as a multi-way set-associative or direct-mapped cache (for example, a 2-way, 4-way, or 8-way set-associative cache). Instruction cache 612 is preferably virtually indexed and physically tagged, which allows virtual-to-physical address translation and cache access to proceed in parallel. In one embodiment, in addition to physical address bits, each tag also includes a valid bit and an optional parity bit. Instruction cache 612 interfaces with fetch unit 604.
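The parallelism of a virtually indexed, physically tagged cache can be illustrated with a short Python sketch: the set index is taken from the virtual address while the tag is compared against the physical address that the TLB produces. The geometry (16 KiB, 4-way, 32-byte lines), the 4 KiB page size, and the names TagEntry and vipt_lookup are assumptions for illustration.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4 KiB minimum page size


@dataclass
class TagEntry:
    valid: bool
    ptag: int        # physical address bits above the index and line-offset fields
    parity: int = 0  # optional parity bit (stored but not checked in this sketch)


def vipt_lookup(tags, vaddr, paddr, size=16 * 1024, ways=4, line=32):
    """Return the hitting way number, or None on a miss; tags[set][way] is a TagEntry."""
    sets = size // (ways * line)
    off_bits = line.bit_length() - 1
    idx_bits = sets.bit_length() - 1
    # If index + offset bits fit inside the page offset, the virtual and physical
    # index are identical, so the set can be read before translation completes.
    assert off_bits + idx_bits <= PAGE_SHIFT, "index bits must lie within the page offset"
    index = (vaddr >> off_bits) & (sets - 1)
    ptag = paddr >> (off_bits + idx_bits)   # compared against the translated address
    for way, entry in enumerate(tags[index]):
        if entry.valid and entry.ptag == ptag:
            return way
    return None
```

With the default geometry there are 128 sets, so 5 offset bits plus 7 index bits exactly fill a 4 KiB page offset. Data cache 614, described next, follows the same indexing scheme.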

Data cache 614 is also an on-chip memory array. Data cache 614 is preferably virtually indexed and physically tagged. In one embodiment, in addition to physical address bits, each tag also includes a valid bit and an optional parity bit. Data cache 614 interfaces with load/store unit 608.

Bus interface unit 616 controls the external interface signals of processor core 600. In one embodiment, bus interface unit 616 includes a collapsing write buffer used to merge write-through transactions and to gather writes from uncached stores.
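One way a collapsing write buffer can merge stores is sketched below in Python: stores that fall in the same aligned block share one buffer entry, and draining emits one bus transaction per block with a byte-enable mask. The 16-byte block size, buffer depth, merge policy, and class name are all hypothetical; the text above only states that the buffer merges write-through transactions and gathers uncached stores.

```python
BLOCK = 16  # assumption: each buffer entry covers one 16-byte aligned block


class CollapsingWriteBuffer:
    """Sketch: stores to the same aligned block collapse into one pending entry."""

    def __init__(self, depth: int = 4) -> None:
        self.depth = depth
        self.entries: dict[int, dict[int, int]] = {}  # block address -> {byte offset: byte}

    def write(self, addr: int, data: bytes) -> bool:
        """Buffer a store (assumed not to cross a block); False if a new entry is needed but none is free."""
        block = addr & ~(BLOCK - 1)
        if block not in self.entries:
            if len(self.entries) == self.depth:
                return False
            self.entries[block] = {}
        for i, b in enumerate(data):
            self.entries[block][addr - block + i] = b  # a later store overwrites earlier bytes
        return True

    def drain(self) -> list[tuple[int, int, bytes]]:
        """Return (block address, byte-enable mask, merged bytes) in FIFO order and empty the buffer."""
        out = []
        for block, pending in self.entries.items():
            mask, buf = 0, bytearray(BLOCK)
            for off, b in pending.items():
                mask |= 1 << off
                buf[off] = b
            out.append((block, mask, bytes(buf)))
        self.entries.clear()
        return out
```

Three separate stores to bytes 0 to 5 of one block thus leave the core as a single bus write with a 6-bit byte-enable mask.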

Multiply/divide unit 620 performs the multiply and divide operations of processor core 600. In one embodiment, multiply/divide unit 620 preferably includes a pipelined multiplier, accumulation registers (accumulators) 626, and multiply and divide state machines, together with all of the control logic required to perform, for example, multiply, multiply-add, and divide functions. As shown in FIG. 6, multiply/divide unit 620 interfaces with execution unit 602. Accumulators 626 store the results of the arithmetic operations performed by multiply/divide unit 620.
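The multiply-add function can be illustrated with a small Python model of a 64-bit accumulator. As an assumption, accumulator 626 is modeled as the HI/LO register pair familiar from MIPS, and mult and madd follow the classic signed 32 x 32 to 64-bit semantics; the class and method names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


class Accumulator:
    """Hypothetical model of accumulator 626 as a HI/LO pair of 32-bit registers."""

    def __init__(self) -> None:
        self.hi = 0
        self.lo = 0

    def value(self) -> int:
        return to_signed((self.hi << 32) | self.lo, 64)

    def store(self, v: int) -> None:
        v &= MASK64  # keep 64 bits, wrapping like the hardware register pair
        self.hi, self.lo = v >> 32, v & MASK32

    def mult(self, rs: int, rt: int) -> None:
        self.store(to_signed(rs, 32) * to_signed(rt, 32))

    def madd(self, rs: int, rt: int) -> None:
        # Multiply-add: accumulate the signed 64-bit product into HI/LO.
        self.store(self.value() + to_signed(rs, 32) * to_signed(rt, 32))
```

For example, mult(3, 4) followed by madd(0xFFFFFFFF, 5) leaves 12 + (-1 x 5) = 7 in the accumulator.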

Coprocessor 622 performs various overhead functions for processor core 600. In one embodiment, coprocessor 622 is responsible for virtual-to-physical address translation, implementing cache protocols, exception handling, operating mode selection, and enabling/disabling interrupt functions. Coprocessor 622 interfaces with execution unit 602. Coprocessor 622 includes state registers 628 and general memory 638. State registers 628 are generally used to hold variables used by coprocessor 622. State registers 628 may also include registers for holding state information used generally by processor core 600. For example, state registers 628 may include a status register. General memory 638 may be used to hold temporary values, such as coefficients generated during computation. In one embodiment, general memory 638 takes the form of a register file.

General-purpose registers 624 are typically 32-bit or 64-bit registers used for scalar integer operations and address calculations. In one embodiment, general-purpose registers 624 are part of execution unit 602. Optionally, one or more additional register file sets, such as shadow register file sets, may be included to minimize context-switching overhead, for example during interrupt and/or exception processing.

Scratchpad memory 630 is memory that stores data for, or supplies data to, load/store unit 608. One or more specific address regions of the scratchpad memory may be preconfigured, or configured programmatically while processor 600 is running. An address region may be, for example, a contiguous address range specified by a base address and a region size. When a base address and a region size are used, the base address specifies the start of the address region, and the region size is added to the base address to specify the end of the address region. Typically, once an address region of the scratchpad memory has been specified, all data corresponding to that address region is retrieved from the scratchpad memory.
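The base-plus-size region check can be sketched in a few lines of Python. The sketch assumes the end address base + size is exclusive and that any access outside all configured regions goes to the cache hierarchy; the names ScratchpadRegion and route_access are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScratchpadRegion:
    base: int  # start of the region
    size: int  # length in bytes; the region ends at base + size (exclusive, an assumption)

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


def route_access(addr: int, regions: list[ScratchpadRegion]) -> str:
    """Return 'scratchpad' if addr falls in any configured region, else 'cache'."""
    if any(r.contains(addr) for r in regions):
        return "scratchpad"
    return "cache"
```

Because the regions are plain data, reprogramming them at run time amounts to replacing the list passed to route_access.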

User-defined instruction (UDI) unit 634 allows processor core 600 to be customized for specific applications. UDI 634 allows users to define and add their own instructions, which can operate, for example, on data stored in general-purpose registers 624. UDI 634 allows users to add new functionality while maintaining compatibility with an industry-standard architecture. UDI 634 includes UDI memory 636, which can be used to store user-added instructions and variables generated during computation. In one embodiment, UDI memory 636 takes the form of a register file.

VII. Software Embodiments

In addition to implementations using hardware (for example, within or coupled to a central processing unit ("CPU"), microprocessor, microcontroller, digital signal processor, processor core, system on chip ("SOC"), or any other programmable or electronic device), implementations may also be embodied in software (for example, computer-readable code, program code, and/or instructions in any form, such as source, object, or machine language) disposed, for example, in a computer-usable (for example, readable) medium configured to store the software. Such software can enable, for example, the function, fabrication, modeling, simulation, description, and/or testing of the apparatus and methods described herein. This can be accomplished, for example, through the use of general programming languages (for example, C, C++), hardware description languages (HDL) including Verilog HDL, VHDL, SystemC Register Transfer Level (RTL), and so on, or other available programs, databases, and/or circuit (that is, schematic) capture tools. Such software can be disposed in any known computer-usable medium, including semiconductor, magnetic disk, and optical disk (for example, CD-ROM, DVD-ROM, etc.), and as a computer data signal embodied in a computer-usable (for example, readable) transmission medium (for example, any other medium including digital, optical, or analog-based media). As such, the software can be transmitted over communication networks, including the Internet and intranets.

It is understood that the apparatus and method embodiments described herein may be included in a semiconductor intellectual property core, such as a microprocessor core (for example, embodied in HDL), and transformed into hardware in the production of integrated circuits. Additionally, the apparatus and methods described herein may be embodied as a combination of hardware and software.

VIII. Conclusion

The Summary and Abstract sections may set forth one or more, but not all, exemplary embodiments of the present invention as contemplated by the inventors, and thus are not intended to limit the present invention and the appended claims in any way.

Embodiments have been described above with the aid of functional building blocks illustrating the implementation of specified functions and the relationships between them. The boundaries of these functional building blocks have been defined arbitrarily herein for convenience of description. Alternate boundaries can be defined so long as the specified functions and their relationships are appropriately performed.

The foregoing description of the specific embodiments so fully reveals the general nature of the invention that others can, by applying knowledge within the skill of the art, readily modify and/or adapt such specific embodiments for various applications without undue experimentation and without departing from the general concept of the present invention. Therefore, based on the teaching and guidance presented herein, such adaptations and modifications are intended to be within the meaning and range of equivalents of the disclosed embodiments. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, so that the terminology or phraseology of this specification is to be interpreted by those skilled in the art in light of the teachings and guidance.

The breadth and scope of the present invention should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

Claims

1. A RISC processor for executing instructions belonging to an instruction set architecture having instructions of at least two different sizes, the processor comprising:

an instruction fetch unit configured to fetch at least one instruction per cycle;

an instruction decode unit configured to determine the size of each fetched instruction and to decode each fetched instruction according to its determined size; and

an execution unit configured to execute the decoded instructions, wherein the instructions in the instruction set architecture are backward compatible with a compiler used with a legacy processor.

2. The RISC processor of claim 1, wherein the size of a particular instruction in the instruction set architecture is determined based on a statistical analysis of instruction usage.

3. The RISC processor of claim 2, wherein smaller instruction sizes are provided for frequently used instructions.

4. The RISC processor of claim 1, wherein the instruction set architecture includes instructions of only three sizes.

5. The RISC processor of claim 3, wherein the instruction set architecture comprises:

a first group of 16-bit instructions; and

a second group of 32-bit instructions.
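The fetch-then-size-then-decode flow of claims 1 to 5 can be sketched in Python for a mixed 16/32-bit stream. The sketch assumes that instructions are fetched as 16-bit halfwords, that a 32-bit instruction stores its major opcode in the upper 6 bits of its first halfword, and that a fixed set of major opcodes marks 16-bit instructions; the set SIXTEEN_BIT_MAJORS is a hypothetical placeholder, not the ISA's actual encoding table.

```python
# Hypothetical set of 6-bit major opcodes that denote 16-bit instructions.
# The real assignment is defined by the ISA encoding tables, not by this sketch.
SIXTEEN_BIT_MAJORS = {0b000001, 0b000010, 0b000011}


def instruction_size(first_halfword: int) -> int:
    """Determine the instruction size in bits from the major opcode of its first halfword."""
    major = (first_halfword >> 10) & 0x3F
    return 16 if major in SIXTEEN_BIT_MAJORS else 32


def fetch_and_decode(halfwords: list[int]) -> list[tuple[int, int]]:
    """Walk a stream of 16-bit halfwords and return (size_in_bits, raw_instruction) pairs."""
    out, i = [], 0
    while i < len(halfwords):
        if instruction_size(halfwords[i]) == 16:
            out.append((16, halfwords[i]))
            i += 1
        else:
            # A 32-bit instruction spans two halfwords; the first holds the major opcode.
            out.append((32, (halfwords[i] << 16) | halfwords[i + 1]))
            i += 2
    return out
```

Keying the size on the first halfword alone means the decoder knows how far to advance before it has decoded anything else, which is what lets a single fetch path handle both sizes.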

6. A method of creating a new processor instruction set architecture (ISA) by re-encoding an existing processor ISA, comprising:

collecting, using a computer, data corresponding to the execution values of existing instructions of the existing ISA over a period of use;

analyzing the collected data using the computer; and

re-encoding new instructions of the new ISA from the existing instructions and the analysis.

7. The method of claim 6, wherein the new instructions have a smaller bit length than the existing instructions.

8. The method of claim 6, wherein the analysis comprises a statistical analysis.

9. The method of claim 6, wherein the execution values include destination registers, and the new instructions use an encoding to reference a reduced set of destination registers.

10. The method of claim 6, wherein the execution values include immediate values, and the new instructions use encoded values to accept a reduced set of possible immediate values.

11. The method of claim 10, wherein at least one encoded value is based on a particular characteristic of the computer on which the new ISA is to be executed.
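Claims 6 to 11 describe collecting execution values, analyzing them statistically, and re-encoding instructions to accept a reduced set of values. A minimal Python sketch of one such analysis follows: it counts the immediates seen in a trace, keeps the 2^k most frequent as the values a k-bit field can encode, and reports what fraction of dynamic uses the compact form would cover. The trace format, the "most frequent 2^k values" rule, and the function names are assumptions, not the patent's exact procedure.

```python
from collections import Counter


def build_immediate_encoding(trace: list[int], field_bits: int) -> dict[int, int]:
    """Map the 2**field_bits most frequent immediates in trace to compact codes 0..2**field_bits-1."""
    counts = Counter(trace)
    # most_common orders by descending count; equal counts keep first-seen order.
    chosen = [value for value, _ in counts.most_common(2 ** field_bits)]
    return {value: code for code, value in enumerate(chosen)}


def coverage(trace: list[int], table: dict[int, int]) -> float:
    """Fraction of dynamic uses that the compact encoding can express."""
    return sum(1 for v in trace if v in table) / len(trace)


# Toy trace for illustration only (not measured data).
toy = [4] * 10 + [8] * 6 + [-4] * 5 + [1] * 3 + [100]
table = build_immediate_encoding(toy, field_bits=2)
print(table, coverage(toy, table))  # {4: 0, 8: 1, -4: 2, 1: 3} 0.96
```

Values left out of the table, such as 100 in the toy trace, would still need the full-width form of the instruction.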

12. A tangible computer-readable storage medium comprising a processor embodied in software, the processor comprising:

an instruction fetch unit configured to fetch a first instruction, the first instruction being associated with a first instruction set architecture (ISA);

an instruction decode unit configured to determine the size of the first instruction and to decode the first instruction according to its determined size; and

an execution unit configured to execute the decoded first instruction, wherein the size of an argument of the first instruction is determined based on a statistical analysis of a second instruction.

13. The tangible computer-readable storage medium of claim 12, wherein the second instruction is associated with a second ISA.

14. The tangible computer-readable storage medium of claim 12, wherein the statistical analysis comprises analyzing the usage of the second instruction over a period of time and determining the frequency of the argument values used.

15. The tangible computer-readable storage medium of claim 12, wherein the statistical analysis comprises analyzing the usage of the second instruction and other instructions over a period of time and determining the frequency of use of the second instruction relative to the other instructions.

16. The tangible computer-readable storage medium of claim 12, wherein the execution unit is configured to execute the decoded first instruction, and wherein the first instruction is obtained by re-encoding the second instruction based on the statistical analysis.

17. The tangible computer-readable storage medium of claim 12, wherein the first instruction is configured to receive encoded argument values.

18. The tangible computer-readable storage medium of claim 17, wherein the encoded argument values are determined based on characteristics of the processor.

19. The tangible computer-readable storage medium of claim 17, wherein the encoded argument values are immediate values.

20. The tangible computer-readable storage medium of claim 17, wherein the encoded argument is a destination register value.

21. A processor comprising:

an instruction decode unit configured to determine the size of a first instruction and to decode the first instruction according to its determined size; and

an execution unit configured to execute the decoded first instruction, wherein the first instruction is a combination of a second instruction and a third instruction, and wherein the first instruction accepts encoded argument values, the encoded argument values corresponding to an unencoded argument from one of the second instruction and the third instruction.

22. The processor of claim 21, wherein the second instruction and the third instruction are associated with a second ISA.

23. The processor of claim 21, wherein the encoded argument values are produced by a process comprising:

analyzing the usage of the unencoded argument over a period of time; and

selecting and encoding a plurality of arguments to be used by the first instruction.

24. The processor of claim 23, wherein the selected plurality of arguments correspond to those arguments of the second instruction that the analysis determines to be the most frequently used.

25. A method for executing a branch-on-equal-to-zero compact instruction on a processor, the method comprising:

receiving, at the processor, a bit sequence corresponding to the instruction;

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is a branch-on-equal-to-zero compact instruction;

decoding, using the decoder, an rs value and an offset value from the instruction;

shifting the offset value by a predetermined number of bits;

sign-extending the offset value;

forming a target address by adding the offset value to the memory address of the instruction;

determining whether the contents of a GPR address equal zero, the GPR address corresponding to the rs value; and

if the contents of the examined GPR equal zero,

branching to the target address.

26. The method of claim 25, wherein:

the bit length of the instruction is 32 bits;

the opcode portion of the instruction comprises a major opcode and a minor opcode;

the bit length of the major opcode portion of the instruction is 6 bits;

the bit length of the minor opcode portion of the instruction is 5 bits;

the bit length of the offset portion is 16 bits; and

the bit length of the rs portion of the instruction is 5 bits.
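The steps of claims 25 and 26 can be traced with a short Python sketch. The field order (major opcode in bits 31..26, minor opcode in 25..21, rs in 20..16, offset in 15..0) and the 1-bit shift (halfword-aligned targets) are assumptions, since the claims give only field widths and "a predetermined number of bits". Following claim 25's wording, the offset is added to the instruction's own address; the fall-through to the next 32-bit instruction is also an assumption, reflecting that a compact branch has no delay slot.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def decode_fields(word: int) -> tuple[int, int, int, int]:
    """Split a 32-bit word into (major, minor, rs, offset); this field order is assumed."""
    return (word >> 26) & 0x3F, (word >> 21) & 0x1F, (word >> 16) & 0x1F, word & 0xFFFF


def beqzc(word: int, pc: int, gpr: list[int]) -> int:
    """Return the next PC after a branch-on-equal-to-zero compact instruction at address pc."""
    _major, _minor, rs, offset = decode_fields(word)
    # Shift by one bit (assumed), then sign-extend the resulting 17-bit value.
    tgt_offset = sign_extend(offset << 1, 17)
    target = (pc + tgt_offset) & MASK32   # per claim 25: added to the instruction's address
    if gpr[rs] == 0:
        return target
    return (pc + 4) & MASK32              # not taken: continue with the next instruction
```

An offset field of 0xFFFE therefore branches 4 bytes backward when the tested register is zero.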

27. A method for executing a load-multiple-word instruction on a processor, the method comprising:

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is a load-multiple-word instruction;

decoding, using the decoder, the register list, offset value, and base operand portions of the instruction;

sign-extending the offset value;

forming an effective address by an unsigned addition of the sign-extended offset value and the contents of a GPR address, the GPR address corresponding to the base operand value;

performing the following steps for each register listed in the register list:

fetching a memory word from memory at the effective address;

sign-extending the fetched memory word to the length of a GPR;

storing the fetched memory word at a GPR address corresponding to the value stored in the register list; and

incrementing the effective address to the next memory word.

28. as the method for claim 27, wherein

The bit length of described instruction is 32 bits;

The bit length of described operation part of the described writ of execution is 4 bits;

The bit length of the described array of registers matrix section of described instruction is 5 bits;

The bit length of described plot operand part is 5 bits; And

The bit length of the described Offset portion of described instruction is 12 bits.
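The load-multiple loop of claims 27 and 28 can be modeled in Python as follows. The sketch assumes the 5-bit register-list field has already been decoded into a plain list of GPR numbers (the mapping from the field to registers is not given here), models memory as a dictionary of 32-bit words keyed by byte address, and defaults to 64-bit GPRs so that the sign-extension step is visible. The function name lwm is hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def lwm(reglist: list[int], base: int, offset12: int, gpr: list[int],
        memory: dict[int, int], gprlen: int = 64) -> None:
    """Load one 32-bit word per register in reglist, starting at GPR[base] + offset."""
    mask = (1 << gprlen) - 1
    # Effective address: unsigned (non-trapping) add of the base contents and the offset.
    ea = (gpr[base] + sign_extend(offset12, 12)) & mask
    for reg in reglist:
        word = memory[ea]                          # fetch the memory word at ea
        gpr[reg] = sign_extend(word, 32) & mask    # sign-extend it to the GPR length
        ea = (ea + 4) & mask                       # advance to the next memory word
```

Computing the effective address once, before the loop, keeps the sequence of addresses fixed even if the base register also appears in the list.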

29. A method for executing a jump-register-adjust-stack-pointer instruction on a processor, the method comprising:

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is a jump-register-adjust-stack-pointer instruction;

decoding, using the decoder, the increment value portion of the instruction;

fetching the values stored in a first general-purpose register and a second general-purpose register;

shifting the increment value left by a predetermined number of bits;

adding the left-shifted immediate value to the value stored in the second register and placing the result in the first register;

setting an actual target address to the value stored in the first register;

clearing bit 0 of the actual target address;

setting an instruction set architecture mode bit to the value stored in bit 0 of the second register; and

jumping to the actual target address.

30. as the method for claim 29, wherein:

The bit length of described instruction is 16 bits;

The bit length of described operation part of described instruction is 5 bits; And

The bit length of counting incremental portion immediately of described instruction is 5 bits.
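A Python sketch of a jump-register-adjust-stack-pointer operation is given below. Because the claim text leaves ambiguous which of the "first" and "second" registers supplies the jump target, this sketch assumes the arrangement of the microMIPS JRADDIUSP instruction: the stack pointer (GPR 29) is incremented by the 5-bit immediate scaled by 4 (a left shift of 2), and the target and ISA mode bit both come from the return-address register (GPR 31), which is read before any update. These register assignments and the shift amount are assumptions.

```python
SP, RA = 29, 31  # assumed stack-pointer and return-address register numbers


def jraddiusp(imm5: int, gpr: list[int], gprlen: int = 32) -> tuple[int, int]:
    """Adjust the stack pointer in place and return (jump target, ISA mode bit)."""
    mask = (1 << gprlen) - 1
    temp = gpr[RA]                                      # read the return address first
    gpr[SP] = (gpr[SP] + ((imm5 & 0x1F) << 2)) & mask   # immediate shifted left by 2
    isa_mode = temp & 1                                 # bit 0 selects the ISA mode
    target = temp & ~1 & mask                           # clear bit 0 of the target address
    return target, isa_mode
```

Folding the stack-pointer increment into the return jump lets a function epilogue pop its frame and return in one 16-bit instruction.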

31. A method for executing an add-immediate-unsigned-word register-select instruction on a processor, the method comprising:

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is an add-immediate-unsigned-word register-select instruction;

decoding, using the decoder, the portions of the instruction corresponding to an instruction immediate value and a register index value;

sign-extending the instruction immediate value;

adding the value stored at a GPR address to the sign-extended instruction immediate value, the GPR address corresponding to the register index value; and

placing the result of the addition at the GPR address, wherein:

the bit length of the instruction is 16 bits;

the opcode portion of the instruction comprises a major opcode and a minor opcode;

the bit length of the major opcode portion of the instruction is 6 bits;

the bit length of the minor opcode portion of the instruction is 1 bit;

the bit length of the register index portion of the instruction is 5 bits; and

the bit length of the immediate portion of the instruction is 4 bits.
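The 16-bit add-immediate of claim 31 can be sketched in Python as below. The field layout (major opcode in bits 15..10, register index in 9..5, immediate in 4..1, minor opcode in bit 0) is an assumption, since the claim gives only the widths. "Unsigned word" is taken in its usual MIPS sense: the 32-bit sum wraps without an overflow trap and is then sign-extended to the register width.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def addius5(halfword: int, gpr: list[int], gprlen: int = 32) -> None:
    """Execute GPR[r] <- GPR[r] + sign_extend(imm4) for a 16-bit instruction word."""
    # Assumed layout: major[15:10] reg[9:5] imm[4:1] minor[0].
    reg = (halfword >> 5) & 0x1F
    imm = sign_extend((halfword >> 1) & 0xF, 4)   # immediate range is -8..7
    mask = (1 << gprlen) - 1
    # Wrap modulo 2**32 (no overflow exception), then sign-extend the 32-bit result.
    result32 = (gpr[reg] + imm) & 0xFFFFFFFF
    gpr[reg] = sign_extend(result32, 32) & mask
```

Selecting any of the 32 registers with a 4-bit signed immediate covers common small adjustments such as loop-counter updates.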

32. A method for executing a move-register-pair instruction on a processor, the method comprising:

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is a move-register-pair instruction;

decoding, using the decoder, the portions of the instruction corresponding to a first encoded register address value, a second encoded register address value, and an encoded destination address value;

converting the first encoded register address value into a first decoded register address value;

converting the second encoded register address value into a second decoded register address value;

determining a third decoded register address value and a fourth decoded register address value from the encoded destination address value;

copying the contents of a first register to a third register, the first register address corresponding to the first decoded register address value, and the third register address corresponding to the third decoded register address value; and

copying the contents of a second register to a fourth register, the second register address corresponding to the second decoded register address value, and the fourth register address corresponding to the fourth decoded register address value.

33. as the method for claim 32, wherein:

The bit length of described instruction is 16 bits;

The bit length of described operation part of described instruction is 1 bit;

The bit length with the lower part of described instruction is 3 bits:

The register value of described first coding,

The register value of described second coding, and

The destination value of described coding; And

Bit length with the lower part is 5 bits:

The register value of described first decoding,

The register value of described second decoding,

The register value of described the 3rd decoding, and

The register value of described the 4th decoding.
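The decode-and-copy sequence of claims 32 and 33 is sketched in Python below. The claims state that 3-bit codes expand to 5-bit register numbers and that one 3-bit code selects a destination pair, but they do not list the mappings, so SRC_TABLE and DST_PAIRS are hypothetical placeholder tables. Both source registers are read before either destination is written, so the result does not depend on write order.

```python
# Hypothetical decode tables (placeholders; the real mappings are defined by the ISA).
SRC_TABLE = [0, 17, 2, 3, 16, 18, 19, 20]              # 3-bit code -> 5-bit register number
DST_PAIRS = [(5, 6), (5, 7), (6, 7), (4, 21),
             (4, 22), (4, 5), (4, 6), (4, 7)]           # 3-bit code -> (third, fourth) register


def movep(enc_dst: int, enc_rs: int, enc_rt: int, gpr: list[int]) -> None:
    """Copy two source registers into the destination pair selected by enc_dst."""
    rs = SRC_TABLE[enc_rs & 7]        # first decoded register address
    rt = SRC_TABLE[enc_rt & 7]        # second decoded register address
    rd, re = DST_PAIRS[enc_dst & 7]   # third and fourth decoded register addresses
    v1, v2 = gpr[rs], gpr[rt]         # read both sources before writing
    gpr[rd] = v1
    gpr[re] = v2
```

Packing two moves and three register fields into 16 bits is what makes such a combined instruction attractive for argument setup before a call.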

34. A method for executing a jump-and-link instruction with a delay slot on a processor, the method comprising:

decoding an opcode portion of the instruction using a decoder, the opcode indicating that the instruction is a jump-and-link instruction with a delay slot;

decoding, using the decoder, the portion of the instruction corresponding to an instruction index;

shifting the instruction index left by a predetermined shift amount;

forming an actual target address by concatenating a specific number of bits from the delay-slot address with the left-shifted instruction index;

forming a return address by adding a value to the address of the instruction, wherein the ISA in which the instruction executes has variable instruction bit lengths, and the added value depends on the size of the delay-slot instruction;

placing the return address in a GPR;

receiving, at the processor, a bit sequence corresponding to the delay-slot address;

decoding, using a decoder, the instruction located at the delay-slot address;

executing the delay-slot instruction; and

jumping to the formed actual target address.

35. as the method for claim 34, wherein:

The bit length of described instruction is 32 bits;

The bit length of the operation part of described instruction is 6 bits;

The bit length of described instruction index is 26 bits; And

Added value is 2 or 4.
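Claim 34 lets the return-address offset depend on the size of the delay-slot instruction. The Python sketch below reads claim 35's "added value of 2 or 4" as being added to the delay-slot address (the jump's address plus 4), so a 16-bit delay-slot instruction gives PC + 6, in line with the JALS pseudocode earlier in this section, and a 32-bit one gives PC + 8. The 1-bit shift of the 26-bit index, 32-bit GPRs, and the function name are assumptions.

```python
def jal_with_delay_slot(pc: int, instr_index: int, slot_size: int,
                        gprlen: int = 32) -> tuple[int, int]:
    """Return (return address, jump target) for a 32-bit jump-and-link at address pc.

    slot_size is the delay-slot instruction's size in bytes (2 or 4), as
    reported by the decoder's size determination.
    """
    if slot_size not in (2, 4):
        raise ValueError("delay-slot instruction must be 16 or 32 bits")
    mask = (1 << gprlen) - 1
    slot_addr = (pc + 4) & mask                         # the delay slot follows the jump
    ra = (slot_addr + slot_size) & mask                 # skip the delay-slot instruction
    upper = slot_addr & ~((1 << 27) - 1) & mask         # upper bits of the delay-slot address
    target = upper | ((instr_index & 0x3FFFFFF) << 1)   # || instr_index || 0
    return ra, target
```

Only the return address changes with the delay-slot size; the jump target is the same in both cases.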