CN106066786A

CN106066786A - Processor and processor operating method

Info

Publication number: CN106066786A
Application number: CN201610361995.3A
Authority: CN
Inventors: 杨梦晨
Original assignee: Shanghai Zhaoxin Integrated Circuit Co Ltd
Current assignee: Shanghai Zhaoxin Integrated Circuit Co Ltd
Priority date: 2016-05-26
Filing date: 2016-05-26
Publication date: 2016-11-02

Abstract

A processor includes a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects multiple microcode instructions. The micro-operation memory stores multiple micro-operations, converts each microcode instruction into a first number of micro-operations, and issues a parallel number of the micro-operations in parallel according to the state of the parallel indicator bit corresponding to each micro-operation. The execution unit executes the parallel number of micro-operations in parallel.

Description

Processor and processor operating method

Technical field

The present invention relates to a processor, and in particular to a processor operating method that executes microcode instructions in parallel.

Background

Instructions executed by a processor are divided into simple instructions and microcode instructions. A simple instruction can be decoded by a decode unit into a single micro-operation (micro-op, uop), which is then executed in one pass by an execution unit. Some processors, however, use microcode instructions to carry out specific procedures. A microcode instruction can be defined as a complex instruction, that is, an instruction of the instruction set architecture that cannot simply be decoded into a single micro-operation for the execution unit to execute. A microcode instruction can be translated, through one or more lookups in an operation lookup table, into a series of micro-operations stored in a memory (e.g., a read-only memory). This "operation lookup table" is also called "microcode"; using the microcode instruction as an index, the corresponding micro-operations can be found.
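
As a rough illustration of the lookup described above, the sketch below models the microcode as a table indexed by microcode instruction. The table name `MICROCODE_TABLE`, the instruction names, and the micro-operation names are hypothetical and do not come from the patent; a real processor holds this table in hardware (e.g., a read-only memory).

```python
# Hypothetical microcode table: each microcode instruction (the index)
# maps to the series of micro-operations it expands to.
MICROCODE_TABLE = {
    "COMPLEX_A": ["uop_load", "uop_add", "uop_store"],
    "COMPLEX_B": ["uop_load", "uop_shift"],
}


def translate(microcode_instruction):
    """Return the micro-operations for one microcode instruction."""
    # A copy is returned so callers cannot modify the table itself.
    return list(MICROCODE_TABLE[microcode_instruction])
```

For instance, `translate("COMPLEX_A")` yields the three micro-operations listed for that entry.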

Although one microcode instruction can be translated into multiple micro-operations for execution, some old microcode (legacy microcode) exists to help the processor support old functions or instructions. Because this old microcode was developed many years ago, and because of the hardware limits of earlier processors (for example, single-issue processors), it typically issues only one micro-operation per clock cycle to the back end for execution, so its issue rate is low. It is therefore necessary to raise the issue rate of old microcode without changing the old microcode itself.

Summary of the invention

In view of this, the present invention provides a processor including a microcode collector, a micro-operation memory, and an execution unit. The microcode collector detects and collects multiple microcode instructions. The micro-operation memory stores multiple micro-operations, converts each microcode instruction into a first number of micro-operations, and issues a parallel number of the micro-operations in parallel according to the state of the parallel indicator bit corresponding to each micro-operation. The execution unit executes the parallel number of micro-operations in parallel.

The present invention also provides a processor operating method including: detecting and collecting multiple microcode instructions; converting the microcode instructions into a first number of micro-operations; issuing a parallel number of the micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation; and executing the parallel number of micro-operations in parallel.

Brief description of the drawings

Fig. 1 is a block diagram of a processor according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the parallel indicator bits corresponding to micro-operations according to an embodiment of the present invention;

Fig. 3 is a block diagram of a processor according to another embodiment of the present invention; and

Fig. 4 is a flowchart of a processor operating method according to an embodiment of the present invention.

Detailed description of the embodiments

The following description presents embodiments of the present invention. Its purpose is to illustrate the general principles of the invention and should not be taken as limiting the invention; the scope of the invention is defined by the claims.

It should be noted that the following disclosure provides multiple embodiments or examples for implementing different features of the invention. The specific components and arrangements described below are only intended to illustrate the spirit of the invention briefly and are not intended to limit its scope. In addition, the following description may reuse the same reference numerals or wording in multiple examples. Such reuse is only for simplicity and clarity and does not dictate a relationship between the embodiments and/or configurations discussed below. Furthermore, where the following description states that one feature is connected to, coupled to, and/or formed on another feature, this may in practice cover multiple different embodiments, including ones in which the features are in direct contact and ones in which additional features are formed between them so that the features are not in direct contact.

Fig. 1 is a block diagram of a processor according to an embodiment of the present invention. As shown in Fig. 1, the processor 100 includes an instruction cache 110, a microcode collector 120, a micro-operation memory 130, an execution unit 140, and a decoding unit 150. It should be understood that the complete processor 100 may include other necessary components; the description is simplified here to explain the technical features of the invention in detail.

The instruction cache 110 caches instructions of an instruction set architecture such as the x86 instruction set architecture, including simple instructions and microcode instructions. According to an embodiment of the present invention, a simple instruction can be translated directly by the decoding unit 150 into a single micro-operation and then executed directly by the back-end execution unit 140. A microcode instruction cannot be executed after direct translation by the decoding unit 150; it must first be translated via the microcode into a corresponding series of micro-operations before the execution unit 140 can execute it.

According to an embodiment of the present invention, a simple instruction is translated directly by the decoding unit 150 and sent to the execution unit 140 via a first path P1. According to another embodiment of the present invention, a microcode instruction is sent to the microcode collector 120 via a second path P2. The microcode collector 120 detects and collects multiple microcode instructions fetched from the instruction cache 110. The micro-operation memory 130 stores multiple micro-operations, converts each microcode instruction into a certain number of micro-operations, and issues a parallel number of micro-operations in parallel to the execution unit 140 according to the state of the parallel indicator bit corresponding to each micro-operation. When the execution unit 140 receives the parallel number of micro-operations, it executes them in parallel. In one embodiment, the microcode collector 120 and the micro-operation memory 130 may also reside in another decoding unit arranged in parallel with the decoding unit 150. Note that micro-operations are not issued directly from the micro-operation memory 130 to the execution unit 140 for execution; several conventional processor pipeline stages in between, such as the register alias table (RAT), the reorder buffer (ROB), and the reservation station, are omitted here and are not described further.

According to an embodiment of the present invention, a parallel indicator bit is stored for the micro-operation in each row of the micro-operation memory 130, indicating whether that micro-operation can be issued in parallel with the micro-operation before it. For example, if the parallel indicator bit of a row's micro-operation is logic "1", the micro-operation can be issued in parallel with the micro-operation before it; if it is logic "0", it cannot. The invention is of course not limited to this encoding. The parallel indicator bits may be stored in the micro-operation memory 130 at positions corresponding to the storage positions of their respective micro-operations. Because old microcode is inconvenient to modify, in other embodiments the parallel indicator bits may instead be stored in a bit memory (not shown) outside the micro-operation memory 130, in which each parallel indicator bit is stored in the original order of the micro-operations in the microcode. Fig. 2 describes the parallel indicator bits in more detail.
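
To make the two storage options above concrete, here is a minimal sketch, assuming the indicator bits are kept in a separate list that follows the original order of the micro-operations. The names `MicroOp`, `attach_bits`, and `bit_memory` are illustrative only and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicroOp:
    name: str
    # 1: may issue in parallel with the micro-operation before it; 0: may not.
    parallel_bit: int


def attach_bits(uop_names, bit_memory):
    """Pair each micro-operation with its bit from a separate bit memory.

    The microcode itself is left untouched; the bits are matched purely
    by position, i.e. by the original order of the micro-operations.
    """
    if len(uop_names) != len(bit_memory):
        raise ValueError("one indicator bit is needed per micro-operation")
    return [MicroOp(name, bit) for name, bit in zip(uop_names, bit_memory)]
```

Keeping the bits in a side structure mirrors the motivation given in the text: legacy microcode is hard to modify, so the annotation lives next to it rather than inside it.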

According to an embodiment of the present invention, the micro-operation memory 130 further includes a logic module 131 for detecting the state of the parallel indicator bit of each micro-operation corresponding to a microcode instruction. The logic module 131 may be implemented by a section of code placed after the microcode or included at the end of the microcode. Fig. 2 is a schematic diagram of the parallel indicator bits corresponding to micro-operations according to an embodiment of the present invention. As shown in Fig. 2, the first micro-operation INST1 has a first parallel indicator bit PI1, the second micro-operation INST2 has a second parallel indicator bit PI2, and the N-th micro-operation INSTN has an N-th parallel indicator bit PIN.

When the micro-operation memory 130 converts a microcode instruction into a first micro-operation INST1, a second micro-operation INST2, ..., and an N-th micro-operation INSTN, the logic module 131 detects the first parallel indicator bit PI1, the second parallel indicator bit PI2, ..., and the N-th parallel indicator bit PIN to judge whether INST1, INST2, ..., INSTN can be executed in parallel. For example, if the second parallel indicator bit PI2 is logic "1", the second micro-operation INST2 can be issued in parallel with the first micro-operation INST1; if the third parallel indicator bit PI3 is logic "1", the third micro-operation INST3 can be issued in parallel with the second micro-operation INST2; and so on, until the parallel indicator bit (e.g., PIM) of some micro-operation (e.g., INSTM) is logic "0", which means that INSTM cannot be issued in parallel with the micro-operation before it. The logic module 131 then determines that micro-operations INST1 to INST(M-1) can be issued in parallel. In general, ordinary micro-operations in microcode, such as arithmetic logic (ALU) operations, single-instruction multiple-data (SIMD) operations, and memory access operations, can be issued in parallel, whereas some special micro-operations, such as branches, interrupts, and reads and writes of model-specific registers (MSR) (RDMSR/WRMSR), must be issued in order.
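
The scan performed by the logic module can be sketched as follows, assuming the "1 means parallel with the previous micro-operation" encoding used in the example above. The function name `parallel_groups` is hypothetical, and indices are zero-based, so index 0 corresponds to INST1.

```python
def parallel_groups(bits):
    """Split micro-operation indices into runs that may issue in parallel.

    bits[i] == 1 means micro-operation i may issue in parallel with
    micro-operation i - 1; any other value starts a new run.
    """
    groups = []
    for i, bit in enumerate(bits):
        if i > 0 and bit == 1:
            groups[-1].append(i)   # extend the current run
        else:
            groups.append([i])     # a 0 bit (or the first entry) starts a run
    return groups
```

With the bits `[0, 1, 1, 0, 1]`, the first three micro-operations form one run and the last two form another, matching the INST1 to INST(M-1) grouping described in the text with M = 4.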

According to an embodiment of the present invention, when the logic module detects that, among the first number of micro-operations corresponding to the microcode instruction, a second number of parallel indicator bits (for example, those of micro-operations INST1 to INST(M-1)) are at a first logic level, the corresponding micro-operations (for example, INST1 to INST(M-1)) can be executed in parallel. According to an embodiment of the present invention, the first logic level may be logic "0" or logic "1", depending on design needs.

In this embodiment, the micro-operation memory 130 selects a parallel number of micro-operations from the second number of micro-operations whose parallel indicator bits are at the first logic level, and issues them in parallel to the execution unit 140 for parallel execution. According to an embodiment of the present invention, the parallel number is determined by the issue rate of the processor 100 (for example, 4/6/8-issue, meaning that 4/6/8 micro-operations can be executed in parallel); that is, the parallel number depends on how many micro-operations the back-end pipeline of the processor 100 (such as the execution unit 140) can execute in parallel. In the prior art, although the back-end pipeline of the processor 100 is capable of multi-issue, old microcode supports only single issue, so only one micro-operation can be issued to the back end of the processor 100 per clock cycle. The present invention achieves multi-issue to the back end of the processor 100 without changing the old microcode (legacy microcode) itself, making full use of the back-end bandwidth of the processor 100 and improving instruction execution efficiency.
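
The selection step above can be pictured with a short sketch, assuming the issuable micro-operations are already known and are taken in their original order. `select_for_issue` and its parameter names are hypothetical.

```python
def select_for_issue(issuable, parallel_number):
    """Pick up to `parallel_number` micro-operations for one clock cycle.

    Returns the micro-operations issued this cycle and those left over.
    """
    if parallel_number < 1:
        raise ValueError("the parallel number must be at least 1")
    return issuable[:parallel_number], issuable[parallel_number:]
```

On a 4-issue back end, six issuable micro-operations would be split into four issued now and two left for a later cycle.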

According to an embodiment of the present invention, when the first number of micro-operations that the micro-operation memory 130 produces from a microcode instruction is greater than the parallel number, that is, when the number of micro-operations produced by converting the microcode instruction according to the microcode exceeds the number that the execution unit 140 can execute in parallel, the micro-operation memory 130 must repeatedly convert the microcode instruction into micro-operations until the execution unit 140 has completed all micro-operations corresponding to the microcode instruction.

For example, the micro-operation memory 130 converts one microcode instruction into 100 micro-operations, while the execution unit 140 can execute only 4 in parallel. Assuming all 100 micro-operations can be issued in parallel, the micro-operation memory 130 needs 25 clock cycles to complete this microcode instruction.
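
The 25-cycle figure follows from dividing the number of micro-operations by the parallel number and rounding up. The helper below is a sketch of that arithmetic only, under the same assumption as the example (every micro-operation can be issued in parallel); `issue_cycles` is a hypothetical name.

```python
def issue_cycles(num_micro_ops, parallel_number):
    """Clock cycles needed when every micro-operation can issue in parallel."""
    if parallel_number < 1:
        raise ValueError("the parallel number must be at least 1")
    # Integer ceiling division: a partly filled final cycle still counts.
    return -(-num_micro_ops // parallel_number)
```

`issue_cycles(100, 4)` gives 25, compared with 100 cycles for single issue.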

Fig. 3 is a block diagram of a processor according to another embodiment of the present invention. As shown in Fig. 3, the processor 300 includes an instruction cache 310, a microcode collector 320, a micro-operation memory 330, an execution unit 340, a parallel execution buffer 350, and a decoding unit 360. The instruction cache 310, microcode collector 320, micro-operation memory 330, execution unit 340, and decoding unit 360 correspond respectively to the instruction cache 110, microcode collector 120, micro-operation memory 130, execution unit 140, and decoding unit 150 of Fig. 1.

Simple instructions and microcode instructions output by the instruction cache 310 are sent toward the execution unit 340 via the first path P1 and the second path P2, respectively. Compared with the processor 100 of Fig. 1, the processor 300 further includes the parallel execution buffer 350, which temporarily stores some or all of the micro-operations that can be issued in parallel (the second number) among the first number of micro-operations produced by the micro-operation memory 330 for a microcode instruction. In one embodiment, the parallel execution buffer 350 also replaces the logic module 131 of Fig. 1 in detecting and judging the state of the parallel indicator bits corresponding to the micro-operations. When the parallel execution buffer 350 detects that a second number of the first number of micro-operations can be issued in parallel, the micro-operation memory 330 pushes all of the second number of micro-operations into the parallel execution buffer 350 within one clock cycle, where they are temporarily stored. In some embodiments, if the storage capacity of the parallel execution buffer 350 (a third number) is insufficient to hold the second number of micro-operations, the micro-operation memory 330 pushes a third number of the second number of micro-operations into the parallel execution buffer 350 within one clock cycle.
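
The capacity rule above (push everything if it fits, otherwise only as many micro-operations as the buffer can hold) might be modeled as below. This is a sketch, not the patented circuit; `push_to_buffer`, `buffered`, and `capacity` are hypothetical names, with `capacity` standing in for the third number.

```python
def push_to_buffer(issuable, buffered, capacity):
    """Push issuable micro-operations into the buffer in one step.

    At most `capacity - len(buffered)` micro-operations are pushed.
    Returns the micro-operations that did not fit.
    """
    free_slots = max(capacity - len(buffered), 0)
    pushed = issuable[:free_slots]
    buffered.extend(pushed)
    return issuable[len(pushed):]
```

With an empty buffer of capacity 6 and 10 issuable micro-operations, 6 are buffered and 4 remain in the micro-operation memory.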

According to an embodiment of the present invention, in the following clock cycle the parallel execution buffer 350 selects a parallel number of micro-operations from those it holds, issues them to the execution unit 340 for parallel execution, and continues to hold the remaining micro-operations. According to another embodiment of the present invention, in a following clock cycle the parallel execution buffer 350 again selects a parallel number of micro-operations from the remaining micro-operations and issues them to the execution unit 340 for parallel execution.

For example, the micro-operation memory 330 converts one microcode instruction into 200 micro-operations, of which 100 can be issued in parallel, while the execution unit 340 can execute only 4 in parallel per clock cycle. Assume the parallel execution buffer 350 is large enough to hold all 100 micro-operations. Although the parallel execution buffer 350 needs 25 clock cycles to issue all 100 micro-operations, the microcode collector 320 and the micro-operation memory 330 are released as soon as these 100 micro-operations have been pushed into the parallel execution buffer 350, and can then process the next microcode instruction, further improving the execution efficiency of the processor 300.

According to an embodiment of the present invention, if micro-operations remain in the parallel execution buffer 350 after it has sent a parallel number of micro-operations to the execution unit 340 for parallel execution, the parallel execution buffer 350 issues further micro-operations that can be executed in parallel to the back-end execution unit 340 in the following clock cycle.
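
The cycle-by-cycle draining of the buffer can be sketched with a queue, assuming every buffered micro-operation is issuable and that one batch is issued per clock cycle. `drain` is a hypothetical helper, and the returned list has one entry per cycle.

```python
from collections import deque


def drain(buffer, parallel_number):
    """Issue up to `parallel_number` buffered micro-operations per cycle."""
    if parallel_number < 1:
        raise ValueError("the parallel number must be at least 1")
    cycles = []
    while buffer:
        batch_size = min(parallel_number, len(buffer))
        # Oldest micro-operations leave the buffer first.
        cycles.append([buffer.popleft() for _ in range(batch_size)])
    return cycles
```

Draining 100 buffered micro-operations at 4 per cycle takes 25 cycles, as in the example above, while the front end is free to work on the next microcode instruction during that time.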

Fig. 4 is a flowchart of a processor operating method according to an embodiment of the present invention. The flowchart of Fig. 4 is described below with reference to Fig. 1 and Fig. 3 to explain the technical features of the invention in detail.

First, the microcode collector 120, 320 detects and collects multiple microcode instructions output by the instruction cache 110, 310 (step S1); that is, it picks the microcode instructions out of all instructions from the instruction cache 110 and gathers them. The micro-operation memory 130, 330 translates each microcode instruction held by the microcode collector 120, 320 into a first number of micro-operations (step S2). The micro-operation memory 130, 330 then issues a parallel number of micro-operations in parallel according to the state of the parallel indicator bit of each micro-operation (step S3). According to an embodiment of the present invention, the micro-operation memory 130 also detects the state of the parallel indicator bit corresponding to each micro-operation, determines that a second number of the first number of micro-operations can be executed in parallel, and selects a parallel number of micro-operations from the second number of micro-operations for issue.

In contrast, according to another embodiment of the present invention (Fig. 3), the parallel execution buffer 350 temporarily stores the first number (for example, 200) of micro-operations produced by the micro-operation memory 330, detects the state of the parallel indicator bits of the micro-operations, and determines that a second number (for example, 100) of the first number (for example, 200) of micro-operations can be executed in parallel. In each clock cycle, the parallel execution buffer 350 then selects a parallel number (for example, 4) of micro-operations from the second number (for example, 100) of micro-operations and issues them.

When the execution unit 140, 340 at the back end of the processor receives the parallel number of micro-operations, it executes them in parallel (step S4). According to an embodiment of the present invention, the parallel number is determined by the issue rate of the processor 100, 300; that is, it depends on the number of micro-operations that the back-end pipeline (such as the execution unit 140, 340) can execute in parallel.

According to another embodiment of the present invention, when the parallel indicator bit of a micro-operation is at a second logic level, and the second logic level differs from the first logic level, the micro-operation cannot be executed in parallel with the micro-operation before it. The micro-operation memory 130, 330 therefore issues a micro-operation whose parallel indicator bit is at the second logic level on its own, so that the back-end execution unit 140, 340 executes it in order.
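
Steps S1 to S4 can be tied together in one small simulation. This is a sketch under stated assumptions, not the patented hardware: a bit of 1 is taken as the first logic level (may issue with the previous micro-operation), any instruction absent from the table is treated as a simple instruction and skipped, and the names `run_microcode`, `microcode`, and `indicator_bits` are hypothetical.

```python
def run_microcode(instructions, microcode, indicator_bits, parallel_number):
    """Return the micro-operations issued in each clock cycle.

    microcode[ins] lists the micro-operations of a microcode instruction;
    indicator_bits[ins] lists one bit per micro-operation, in the same order.
    """
    # S1: detect and collect the microcode instructions.
    collected = [ins for ins in instructions if ins in microcode]
    cycles = []
    for ins in collected:
        # S2: convert the microcode instruction into its micro-operations.
        micro_ops, bits = microcode[ins], indicator_bits[ins]
        current = []
        for uop, bit in zip(micro_ops, bits):
            # S3: a 0 bit, or a full issue group, closes the current cycle.
            if current and (bit == 0 or len(current) == parallel_number):
                cycles.append(current)  # S4: this group executes in parallel
                current = []
            current.append(uop)
        if current:
            cycles.append(current)
    return cycles
```

In this model a micro-operation such as a branch, marked 0 and followed by another micro-operation marked 0, ends up alone in its cycle, which corresponds to the in-order issue described above.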

The foregoing outlines features of the embodiments. Those of ordinary skill in the art should readily be able to use the present invention as a basis for designing or modifying other arrangements that carry out the same purposes and/or achieve the same advantages as the embodiments described here. Those of ordinary skill in the art should also understand that such equivalent configurations do not depart from the spirit and scope of the invention, and that they may make various changes, substitutions, and alterations without departing from that spirit and scope. The illustrative methods represent only exemplary steps, and these steps need not be performed in the order shown. Steps may be added, replaced, reordered, and/or removed as appropriate, consistent with the spirit and scope of the disclosed embodiments.

Claims

1. A processor, characterized by comprising:

a microcode collector that detects and collects multiple microcode instructions;

a micro-operation memory that stores multiple micro-operations, converts each of the microcode instructions into a first number of the micro-operations, and issues a parallel number of the micro-operations in parallel according to the state of a parallel indicator bit corresponding to each of the micro-operations; and

an execution unit that executes the parallel number of the micro-operations in parallel.

2. The processor according to claim 1, characterized by further comprising:

a bit memory that stores the parallel indicator bit corresponding to each of the micro-operations.

3. The processor according to claim 1, characterized in that the parallel indicator bit indicates whether each of the micro-operations can be issued in parallel with the micro-operation before it.

4. The processor according to claim 1, characterized in that the micro-operation memory comprises:

a logic module that detects the state of the parallel indicator bit corresponding to each of the first number of micro-operations, wherein when the logic module detects that the parallel indicator bits corresponding to a second number of the first number of micro-operations are at a first logic level, the logic module determines that the second number of micro-operations can be issued in parallel.

5. The processor according to claim 4, characterized in that when the logic module determines that the second number of micro-operations can be issued in parallel, the micro-operation memory selects the parallel number of the micro-operations from the second number of micro-operations and issues them in parallel to the execution unit for parallel execution.

a parallel execution buffer having a storage capacity of at most a third number of the micro-operations, wherein when it is detected that a second number of the first number of micro-operations can be issued in parallel, the micro-operation memory pushes a third number of the second number of micro-operations into the parallel execution buffer for temporary storage, and the parallel execution buffer selects, in one clock cycle, the parallel number of the micro-operations from the third number of micro-operations and issues them to the execution unit for parallel execution.

a parallel execution buffer, wherein when it is detected that a second number of the first number of micro-operations can be issued in parallel, the micro-operation memory pushes the second number of micro-operations into the parallel execution buffer for temporary storage, and the parallel execution buffer selects, in one clock cycle, the parallel number of the micro-operations from the second number of micro-operations and issues them to the execution unit for parallel execution.

8. A processor operating method, characterized by comprising:

detecting and collecting multiple microcode instructions;

converting the microcode instructions into a first number of micro-operations;

issuing a parallel number of the micro-operations in parallel according to the state of a parallel indicator bit of each of the micro-operations; and

executing the parallel number of the micro-operations in parallel.

9. The processor operating method according to claim 8, characterized by further comprising:

detecting the state of the parallel indicator bit corresponding to each of the micro-operations; and

when it is detected that the parallel indicator bits corresponding to a second number of the first number of micro-operations are at a first logic level, determining that the second number of micro-operations can be executed in parallel.

10. The processor operating method according to claim 9, characterized by further comprising:

when it is determined that the second number of micro-operations can be issued in parallel, selecting the parallel number of the micro-operations from the second number of micro-operations and issuing them in parallel to an execution unit for parallel execution.