CN1116638C - Microprocessor using basic block high speed buffer storage - Google Patents

Publication number
CN1116638C
Authority
CN
China
Prior art keywords
instruction
high speed
basic block
buffer storage
group
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN00137005A
Other languages
Chinese (zh)
Other versions
CN1303044A (en
Inventor
詹姆斯·A·卡尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Publication of CN1303044A
Application granted
Publication of CN1116638C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3804 Instruction prefetching for branches, e.g. hedging, branch folding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3802 Instruction prefetching
    • G06F9/3808 Instruction prefetching for instruction reuse, e.g. trace cache, branch target cache

Abstract

The invention discloses a microprocessor and a related method and data processing system. The microprocessor includes an instruction cracking unit configured to receive a first set of microprocessor instructions. The cracking unit organizes the set of instructions as an instruction group, in which each instruction shares a common instruction group tag. The processor further includes a basic block cache facility that is organized in the instruction group format and is configured to cache the instruction groups generated by the cracking unit. An execution unit of the processor is suitable for executing the instructions in an instruction group.

Description

Microprocessor, system, and instruction execution method using a basic block cache
The subject matter of this application is related to that of U.S. patent application Serial No. 09/428,399, filed October 28, 1999, entitled "Instruction Group Organization and Exception Handling in a Microprocessor," which shares a common assignee with the present application and is incorporated by reference herein.
Technical field
The present invention relates generally to the field of microprocessor architectures, and more particularly to a microprocessor that uses an instruction group architecture, a corresponding cache facility, and useful extensions thereof.
Background art
As microprocessor technology has reached gigahertz performance, a major challenge for microprocessor designers is to take advantage of state-of-the-art technology while maintaining compatibility with the large base of installed software designed to run on a specific instruction set architecture (ISA). To address this problem, designers have implemented "layered architecture" microprocessors, which are adapted to receive instructions formatted according to an existing ISA and to convert them to an internal ISA that is better suited to operation in gigahertz execution pipelines. Referring to Fig. 4, selected portions of a layered architecture microprocessor 401 are shown. In this design, an instruction cache 410 of microprocessor 401 receives and stores instructions fetched from main memory by a fetch unit 402. The instructions stored in instruction cache 410 are formatted according to a first ISA (i.e., the ISA in which the programs executed by processor 401 are written). Instructions are then retrieved from instruction cache 410 and converted to a second ISA by an ISA conversion unit 412. Because the conversion from the first ISA to the second ISA takes multiple cycles, the conversion is typically pipelined, so that at any given time multiple instructions are in the process of being converted. The converted instructions are then forwarded for execution in execution pipelines 422 of processor 401. Fetch unit 402 includes branch prediction logic 406, which attempts to determine the address of the instruction to be executed after a branch instruction by predicting the outcome of the branch decision. Instructions are then speculatively issued and executed according to the branch prediction. When a branch is mispredicted, however, the instructions pending between instruction cache 410 and finish stage 432 of microprocessor 401 must be flushed. When a mispredicted branch causes such a flush, the performance penalty is a function of the pipeline length: the more pipeline stages that must be flushed, the greater the misprediction penalty. Because a layered architecture adds stages to the processor pipeline and increases the number of instructions that may be "in flight" at a given time, the branch misprediction penalty associated with a layered architecture can become a limiting factor in processor performance. It is therefore highly desirable to implement a layered architecture microprocessor that addresses the branch misprediction penalty. It is further desirable for the solution to address, at least in part, the repeated occurrence of exception conditions caused by repeatedly executing a section of code. It is also desirable to be able to operate substantially larger issue queues without sacrificing the ability to search for the next instruction to issue.
Summary of the invention
The problems identified above are in large part addressed by a microprocessor that uses instruction groups together with a cache facility matched to the instruction group format. One embodiment of the invention provides a microprocessor and an associated method and data processing system. The microprocessor includes an instruction cracking unit configured to receive a first set of microprocessor instructions. The cracking unit organizes the set of instructions as an instruction group, in which each instruction shares a common instruction group tag. The processor further includes a basic block cache that is organized in the instruction group format and is configured to cache the instruction groups generated by the cracking unit. An execution unit of the processor is suitable for executing the instructions in an instruction group. In one embodiment, when an exception that causes a flush is generated during execution of an instruction in the instruction group, the flush discards only those instructions that have been dispatched from the basic block cache. Because only instructions that have reached the basic block cache are flushed, the processor need not flush the instructions pending in the cracking unit pipeline. Since fewer instructions are flushed, the exception performance penalty is reduced. In one embodiment, the received instructions are formatted according to a first instruction format and the second set of instructions (the instructions placed in the instruction group) is formatted according to a second instruction format, where the second instruction format is wider than the first. The basic block cache is suitably configured to store each instruction group in a corresponding entry of the basic block cache. In one embodiment, each entry in the basic block cache includes an entry field identifying the corresponding basic block cache entry and a pointer predicting the next instruction group to be executed. The processor is preferably configured to update the cache entry's pointer in response to a mispredicted branch.
The invention further provides a microprocessor, data processing system, and method that use instruction history information in conjunction with the basic block cache to improve performance. The processor is suitable for receiving a set of instructions and organizing it into an instruction group, which is then dispatched for execution. When the instruction group is executed, instruction history information indicative of an exception event associated with the group is recorded. Thereafter, during a subsequent execution of the instruction group, the execution of the instructions is modified in response to the instruction history information so as to prevent the exception event from recurring. The processor includes a storage facility, such as an instruction cache, an L2 cache, or system memory, together with a cracking unit and a basic block cache. The cracking unit is configured to receive a set of instructions from the storage facility and is adapted to organize the set into an instruction group. The cracking unit may also modify the format of the instruction set from a first instruction format to a second instruction format. The basic block cache is structured to store the instruction groups and includes an instruction history field corresponding to each basic block cache entry. The instruction history information is indicative of an exception event associated with the instruction group. In a preferred embodiment, each basic block cache entry corresponds to one instruction group generated by the cracking unit. The processor may further include completion table control logic configured to store information in the instruction history field when the instruction group completes. The instruction history information may indicate whether an instruction in the instruction group has a dependency on another instruction, or whether a previous execution of the instruction group caused a store forwarding exception. In the latter embodiment, the processor is configured to execute the instructions in order in response to detecting that a previous execution of the instruction group caused a store forwarding exception.
The invention further provides a processor, data processing system, and associated method that use a primary issue queue and a secondary issue queue. The processor is suitable for dispatching instructions to an issue unit that includes the primary and secondary issue queues. An instruction is stored in the primary issue queue if it is currently eligible to issue, and in the secondary issue queue if it is not. The processor determines the next instruction to issue from among the instructions in the primary issue queue. An instruction may be moved from the primary issue queue to the secondary issue queue if it depends on the result of another instruction. In one embodiment, an instruction may be moved from the primary to the secondary issue queue after it has been issued for execution. In this embodiment, the instruction may be held in the secondary issue queue for a specified interval; thereafter, if the instruction has not been rejected, the secondary issue queue entry containing it is deallocated. The microprocessor includes an instruction cache, a dispatch unit configured to receive instructions from the instruction cache, and an issue unit configured to receive instructions from the dispatch unit. The issue unit is adapted to allocate dispatched instructions that are currently eligible for execution to the primary issue queue and to allocate instructions that are not currently eligible to the secondary issue queue.
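A small Python sketch may help picture the two-queue arrangement. It assumes that eligible instructions go to the primary queue, that only the primary queue is searched, and that selection is oldest-first; the class and method names are hypothetical, and timing, rejection, and deallocation are not modeled.

```python
from collections import deque


class IssueUnit:
    """Toy model of an issue unit with a primary and a secondary queue."""

    def __init__(self):
        self.primary = deque()    # instructions currently eligible to issue
        self.secondary = deque()  # instructions not currently eligible

    def allocate(self, instr, eligible):
        # Dispatched instructions are steered by their current eligibility.
        (self.primary if eligible else self.secondary).append(instr)

    def demote(self, instr):
        # Move an instruction that depends on another instruction's result
        # out of the primary queue so it no longer takes part in the search.
        self.primary.remove(instr)
        self.secondary.append(instr)

    def select_next(self):
        # Only the primary queue is searched; oldest-first is an assumption.
        return self.primary.popleft() if self.primary else None


unit = IssueUnit()
unit.allocate("add", eligible=True)
unit.allocate("mul", eligible=False)
unit.allocate("sub", eligible=True)
unit.demote("add")
issued = unit.select_next()
```

Keeping ineligible instructions out of the searched queue is what lets the total queue capacity grow without widening the search for the next instruction to issue.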
Other objects and advantages of the invention will become apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Brief description of the drawings
Fig. 1 is a block diagram of selected portions of a data processing system according to one embodiment of the invention;
Fig. 2 is a block diagram of selected portions of a microprocessor according to one embodiment of the invention;
Fig. 3 illustrates examples of the instruction cracking function performed by one embodiment of the processor of Fig. 2;
Fig. 4 is a block diagram of selected portions of a microprocessor;
Fig. 5 is a block diagram of the basic block cache of the microprocessor of Fig. 2;
Fig. 6 illustrates various branch scenarios that the microprocessor of Fig. 2 may encounter;
Fig. 7 is a block diagram of a completion table suitable for use with the invention;
Fig. 8 is a block diagram of a basic block cache that includes instruction history information; and
Fig. 9 is a block diagram of an issue queue that includes a primary issue queue and a secondary issue queue according to one embodiment of the invention.
Detailed description of the embodiments
While the invention can be implemented in various forms, specific embodiments are shown by way of example in the drawings and are described in detail herein. It should be understood, however, that the drawings and detailed description are not intended to limit the invention to these particular embodiments; on the contrary, the invention is to cover all modifications and equivalent alternatives falling within the spirit and scope of the invention as defined by the appended claims.
Referring to Fig. 1, an embodiment of a data processing system 100 according to the invention is shown. System 100 includes one or more central processing units (processors) 101a, 101b, 101c, etc. (collectively or generically referred to as processors 101). In one embodiment, each processor 101 may comprise a reduced instruction set computer (RISC) microprocessor. Additional information concerning RISC processors in general is available in C. May et al., eds., PowerPC Architecture: A Specification for a New Family of RISC Processors (Morgan Kaufmann, 1994, 2d edition). Processors 101 are coupled to system memory 250 and various other components via system bus 113. Read-only memory (ROM) 102 is coupled to system bus 113 and may include a basic input/output system (BIOS), which controls certain basic functions of system 100. Fig. 1 further shows an I/O adapter 107 and a network adapter 106 coupled to system bus 113. I/O adapter 107 links system bus 113 with mass storage devices 104, such as a hard disk 103 and/or a tape storage drive 105. Network adapter 106 interconnects bus 113 with an outside network, enabling data processing system 100 to communicate with other such systems. A display monitor 136 is connected to system bus 113 by display adapter 112, which may include a graphics adapter, to improve the performance of graphics-intensive applications, and a video controller. In one embodiment, adapters 107, 106, and 112 may be connected to one or more I/O buses that are connected to system bus 113 via an intermediate bus bridge (not shown). Suitable I/O buses for connecting peripheral devices such as hard disk controllers, network adapters, and graphics adapters include the Peripheral Component Interconnect (PCI) bus according to PCI Local Bus Specification Rev. 2.2, available from the PCI Special Interest Group, Hillsboro, OR, and incorporated by reference herein. Additional input/output devices are shown connected to system bus 113 via user interface adapter 108. A keyboard 109, a mouse 110, and a speaker 111 are all connected to bus 113 via user interface adapter 108, which may include, for example, a Super I/O chip that integrates multiple device adapters into a single integrated circuit. For additional information concerning one such chip, the reader is referred to the PC87338/PC97338 ACPI 1.0 and PC98/99 Compliant SuperI/O data sheet from National Semiconductor Corporation (November 1998) at www.national.com. Thus, as configured in Fig. 1, system 100 includes processing means in the form of processors 101, storage means including system memory 250 and mass storage 104, input means such as keyboard 109 and mouse 110, and output means including speaker 111 and display 136. In one embodiment, system memory 250 and mass storage 104 collectively store an operating system, such as the AIX operating system from IBM Corporation or another suitable operating system, to coordinate the functions of the various components shown in Fig. 1. Additional detail concerning the AIX operating system is available in AIX Version 4.3 Technical Reference: Base Operating System and Extensions, Volumes 1 and 2 (order numbers SC23-4159 and SC23-4160); AIX Version 4.3 System User's Guide: Communications and Networks (order number SC23-4122); and AIX Version 4.3 System User's Guide: Operating System and Devices (order number SC23-4121), from IBM Corporation at www.ibm.com, all incorporated by reference herein.
Turning now to Fig. 2, a simplified block diagram of a processor 101 according to one embodiment of the invention is shown. Processor 101 as shown in Fig. 2 includes an instruction fetch unit 202 suitable for generating the address of the next instruction to be fetched. The instruction address generated by fetch unit 202 is provided to instruction cache 210. Fetch unit 202 may include branch prediction logic that, as its name suggests, makes predictions about the outcome of decisions that affect program execution flow. The ability to predict branch decisions correctly is a significant factor in the overall ability of processor 101 to achieve improved performance by executing instructions speculatively and out of order. Instruction cache 210 contains a subset of the contents of system memory in a high-speed storage facility. The instructions stored in instruction cache 210 are preferably formatted according to a first ISA, which is typically a legacy ISA such as the PowerPC or an x86-compatible instruction set. Detailed information regarding the PowerPC instruction set is available in the PowerPC 620 RISC Microprocessor User's Manual, available from Motorola, Inc. (Order No. MPC620UM/AD), which is incorporated by reference herein. If the instruction address generated by fetch unit 202 corresponds to a system memory location that is currently replicated in instruction cache 210, instruction cache 210 forwards the corresponding instruction to cracking unit 212. If the instruction corresponding to the address generated by fetch unit 202 does not currently reside in instruction cache 210 (i.e., the address misses in instruction cache 210), the instruction must be fetched from an L2 cache (not shown) or from system memory before it can be forwarded to cracking unit 212.
Cracking unit 212 is adapted to modify the incoming instruction stream to produce a set of instructions optimized for execution in the underlying execution pipelines at high operating frequencies (e.g., operating frequencies exceeding 1 GHz). In one embodiment, for example, cracking unit 212 receives instructions in a 32-bit-wide ISA, such as the instruction set supported by the PowerPC microprocessor, and converts them to a second, preferably wider, ISA that facilitates execution in high-speed execution units operating in the range of 1 GHz and above. The wider format of the instructions generated by cracking unit 212 may include, for example, explicit fields containing information (such as operand values) that is merely implied or referenced in the instructions received by cracking unit 212, which are formatted according to the first format. In one embodiment, for example, the ISA of the instructions generated by cracking unit 212 is 64 bits wide or wider.
In one embodiment, cracking unit 212 as contemplated herein, in addition to converting instructions from the first format to the preferably wider second format, is designed to organize a set of fetched instructions into instruction "groups" 302, examples of which are shown in Fig. 3. Each instruction group 302 includes a set of instruction slots 304a, 304b, etc. (collectively or generically referred to as instruction slots 304). Organizing a set of instructions into instruction groups facilitates high-speed execution, chiefly by simplifying the logic needed to maintain rename register mapping and completion tables for a large number of in-flight instructions. Fig. 3 shows three examples of instruction grouping that may be implemented by cracking unit 212.
In Example 1, a set of instructions indicated by reference numeral 301 is transformed into a single instruction group 302 by cracking unit 212. In the depicted embodiment, each instruction group 302 includes five slots indicated by reference numerals 304a, 304b, 304c, 304d, and 304e. Each slot 304 may contain a single instruction, so each instruction group may include a maximum of five instructions. In one embodiment, the instructions in instruction set 301 received by cracking unit 212 are formatted according to a first ISA, as discussed previously, and the instructions stored in group 302 are formatted according to a second, wider format. The use of instruction groups simplifies renaming recovery and completion table logic by reducing the number of instructions that must be individually tagged and tracked. The use of instruction groups thus contemplates sacrificing some information about each instruction in an effort to simplify the tracking of pending instructions in an out-of-order processor.
Example 2 of Fig. 3 illustrates a second example of the instruction grouping performed by cracking unit 212 according to one embodiment of the invention. This example demonstrates the ability of cracking unit 212 to break down complex instructions into a group of simple instructions for higher-speed execution. In the depicted example, a sequence of two load-with-update (LDU) instructions is broken down into an instruction group that includes a pair of load instructions in slots 304a and 304c and a pair of ADD instructions in slots 304b and 304d. In this example, because group 302 does not contain a branch instruction, the last slot 304e of instruction group 302 contains no instruction. The PowerPC load-with-update instruction, like analogous instructions in other instruction sets, is a complex instruction in that it affects the contents of multiple general purpose registers (GPRs). Specifically, the load-with-update instruction can be broken down into a load instruction that affects the contents of a first GPR and an ADD instruction that affects the contents of a second GPR. Thus, in instruction group 302 of Example 2 in Fig. 3, instructions in two or more instruction slots 304 correspond to a single instruction received by cracking unit 212.
In Example 3, a single instruction entering cracking unit 212 is broken down into a set of instructions occupying multiple groups 302. More specifically, Example 3 illustrates a load multiple (LM) instruction. The load multiple instruction (according to the PowerPC instruction set) loads the contents of consecutive locations in memory into consecutively numbered GPRs. In the depicted example, a load multiple of six consecutive memory locations is broken down into six load instructions. Because each group 302 according to the depicted embodiment of processor 101 includes at most five instructions, and because the fifth slot 304e is reserved for branch instructions, a load multiple of six registers is broken down into two groups, 302a and 302b. Four of the load instructions are stored in the first group 302a and the remaining two are stored in the second group 302b. Thus, in Example 3, a single instruction is broken down into a set of instructions that spans multiple instruction groups.
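The grouping rules of Examples 2 and 3 can be reproduced with a short Python sketch. It is a simplified illustration under stated assumptions: instructions are modeled as tuples with made-up mnemonics, `crack` and `form_groups` are hypothetical names, and only the five-slot limit and the branch-only last slot are modeled (no format widening or register fields).

```python
GROUP_SLOTS = 5                 # slots 304a..304e in the depicted embodiment
BRANCH_SLOT = GROUP_SLOTS - 1   # the last slot is reserved for a branch


def crack(instr):
    """Expand one input instruction into simpler internal operations."""
    op = instr[0]
    if op == "LDU":                   # load with update -> load + add
        return [("LOAD",), ("ADD",)]
    if op == "LM":                    # load multiple of n registers -> n loads
        return [("LOAD",)] * instr[1]
    return [instr]


def form_groups(instrs):
    """Pack cracked operations into five-slot groups; None marks an empty slot."""
    groups, current = [], []

    def close():
        nonlocal current
        groups.append(current + [None] * (GROUP_SLOTS - len(current)))
        current = []

    for instr in instrs:
        for op in crack(instr):
            if op[0] == "BRANCH":
                # Pad so the branch lands in the last slot, then end the group.
                current += [None] * (BRANCH_SLOT - len(current))
                current.append(op)
                close()
            else:
                if len(current) == BRANCH_SLOT:   # non-branch slots are full
                    close()
                current.append(op)
    if current:
        close()
    return groups


example2 = form_groups([("LDU",), ("LDU",)])   # one group, last slot empty
example3 = form_groups([("LM", 6)])            # two groups: 4 loads, then 2
```

Running the sketch on two LDU instructions yields a single group with the last slot empty, and a six-register LM yields two groups holding four and two loads, matching the split described for Fig. 3.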
Returning now to Fig. 2, the instruction groups 302 generated by the preferred embodiment of cracking unit 212 are forwarded to basic block cache 213, where they are stored pending execution. Referring to Fig. 5, an embodiment of basic block cache 213 is shown. In the depicted embodiment, basic block cache 213 includes a set of entries 502a through 502n (collectively or generically referred to as basic block cache entries 502). In one embodiment, each entry 502 in basic block cache 213 contains a single instruction group 302. In addition, each entry 502 may include an entry identifier 504, a pointer 506, and an instruction address (IA) field 507. The instruction address field 507 of each entry 502 is analogous to the IA field 704 of completion table 218. In one embodiment, each entry 502 in basic block cache 213 corresponds to an entry in completion table 218, and instruction address field 507 indicates the instruction address of the first instruction in the corresponding instruction group 302. In one embodiment, pointer 506 indicates the entry identifier of the next instruction group 302 to be executed, as determined by a branch prediction algorithm, a branch history table, or another suitable branch prediction mechanism. As indicated previously, the preferred implementation of forming instruction groups with cracking unit 212 allocates branch instructions to the last slot 304 in each instruction group 302. In addition, the preferred embodiment of cracking unit 212 produces instruction groups 302 in which the number of branch instructions in a group 302 is one (or fewer). In this arrangement, each instruction group 302 can be thought of as representing a "leg" of the branch tree 600 shown in Fig. 6, in which each instruction group 302 is represented by the value of its corresponding entry identifier 504. The first instruction group 302a, for example, is indicated by entry number (1), and so forth. As an example, suppose that the branch prediction mechanism of processor 101 predicts that leg 2 (corresponding to the second group 302b) will be executed after leg 1, and that leg 3 will be executed after leg 2. According to one embodiment of the invention, basic block cache 213 reflects these branch predictions by setting pointer 506 to indicate the next group 302 to be executed. The pointer 506 of each entry 502 in basic block cache 213 may be used to determine the next instruction group 302 to be dispatched.
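The role of pointer 506 can be sketched in Python as a chain of entries keyed by entry identifier. This is an illustrative model only: the `Entry` class, the dictionary-based cache, the addresses, and entry 4 as the corrected target are assumptions, and the actual branch tree of Fig. 6 is not reproduced.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    entry_id: int            # entry identifier 504
    ia: int                  # IA field 507: address of the group's first instruction
    next_id: Optional[int]   # pointer 506: predicted next entry, if any


def predicted_path(cache: Dict[int, Entry], start_id: int, limit: int = 8) -> List[int]:
    """Follow the next-group pointers from start_id, visiting at most limit entries."""
    path, cur = [], start_id
    while cur is not None and cur in cache and len(path) < limit:
        path.append(cur)
        cur = cache[cur].next_id
    return path


def on_mispredict(cache: Dict[int, Entry], entry_id: int, actual_next_id: int) -> None:
    # After a mispredicted branch, the entry's pointer is updated to the
    # group that actually followed it.
    cache[entry_id].next_id = actual_next_id


cache = {
    1: Entry(1, 0x100, 2),     # leg 1 predicted to be followed by leg 2
    2: Entry(2, 0x120, 3),     # leg 2 predicted to be followed by leg 3
    3: Entry(3, 0x140, None),
    4: Entry(4, 0x200, None),  # hypothetical alternative successor of leg 1
}
before = predicted_path(cache, 1)
on_mispredict(cache, 1, 4)
after = predicted_path(cache, 1)
```

Because each group holds at most one branch, in its last slot, a single pointer per entry is enough to name the predicted successor.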
Basic block cache 213 works in conjunction with a block fetch unit 215 in a manner analogous to the way fetch unit 202 works with instruction cache 210. More specifically, block fetch unit 215 is responsible for generating the instruction address that is provided to basic block cache 213. The instruction address provided by block fetch unit 215 is compared against the addresses in the instruction address fields 507 of basic block cache 213. If the instruction address provided by block fetch unit 215 hits in basic block cache 213, the appropriate instruction group is forwarded to issue queues 220. If the address provided by block fetch unit 215 misses in basic block cache 213, the instruction address is fed back to fetch unit 202 so that the appropriate instructions can be retrieved from instruction cache 210. In an embodiment suitable for conserving area (die size), basic block cache 213 makes it possible to eliminate instruction cache 210. In this embodiment, instructions are retrieved from a suitable storage facility, such as an L2 cache or system memory, and are provided directly to cracking unit 212. If an instruction address generated by block fetch unit 215 misses in basic block cache 213, the appropriate instructions are retrieved from the L2 cache or system memory rather than from instruction cache 210.
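The hit and miss handling can be sketched as a simple lookup with a fallback path. In this Python sketch the cache is a dictionary keyed by the address of a group's first instruction, `refill` stands in for the slower path through fetch unit 202 and cracking unit 212, and filling the cache on a miss is an assumption based on the statement that cracked groups are stored in the basic block cache; all names are hypothetical.

```python
def fetch_group(address, basic_block_cache, refill):
    """Return (group, 'hit' or 'miss') for the group starting at address."""
    group = basic_block_cache.get(address)
    if group is not None:
        return group, "hit"            # forwarded directly toward the issue queues
    group = refill(address)            # slow path: fetch and crack the instructions
    basic_block_cache[address] = group
    return group, "miss"


refill_calls = []


def refill_from_icache(address):
    # Stand-in for fetching from the instruction cache (or L2 / system memory)
    # and cracking the result into a group.
    refill_calls.append(address)
    return ["op@%x" % address]


bbc = {}
first = fetch_group(0x100, bbc, refill_from_icache)
second = fetch_group(0x100, bbc, refill_from_icache)
```

On a hit the group bypasses fetch and cracking entirely, which is where the reduction in misprediction penalty comes from.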
The depicted embodiment of processor 101 further includes a dispatch unit 214. Dispatch unit 214 is responsible for ensuring that all necessary resources are available before the instructions in each instruction group are forwarded to their appropriate issue queues 220. In addition, dispatch unit 214 communicates with dispatch and completion control logic 216 to keep track of the order in which instructions were issued and the completion status of those instructions, which facilitates out-of-order execution. In the embodiment of processor 101 in which cracking unit 212 organizes incoming instructions into instruction groups as discussed above, each instruction group 302 is assigned a group tag (GTAG) by dispatch and completion control logic 216 that conveys the order in which the instruction groups were issued. As an example, dispatch unit 214 may assign monotonically increasing values to consecutive instruction groups. With this arrangement, instruction groups with lower GTAG values are known to have issued before (i.e., are older than) instruction groups with larger GTAG values. Although the depicted embodiment of processor 101 shows dispatch unit 214 as a distinct functional block, the group-oriented organization of basic block cache 213 lends itself to incorporating the functions of dispatch unit 214. Thus, in one embodiment, dispatch unit 214 is incorporated within basic block cache 213, which is connected directly to issue queues 220.
In combination with dispatch and completion control logic 216, a completion table 218 is used in one embodiment of the invention to track the status of issued instruction groups. Referring to Fig. 7, a block diagram of one embodiment of completion table 218 is shown. In the depicted embodiment, completion table 218 includes a set of entries 702a through 702n (referred to collectively as completion table entries 702). In this embodiment, each entry 702 in completion table 218 includes an instruction address (IA) field 704 and a status bit field 706. In this embodiment, the GTAG value of each instruction group 302 identifies the entry 702 in completion table 218 in which the completion information for that instruction group 302 is stored. Thus, the instruction group 302 stored in entry 1 of completion table 218 has a GTAG value of 1, and so on. In this embodiment, completion table 218 may further include a "wrap around" bit to indicate that an instruction group with a lower GTAG value is actually younger than an instruction group with a higher GTAG value. In one embodiment, instruction address field 704 contains the address of the instruction in the first position 304a of the corresponding instruction group 302. Status field 706 may contain one or more status bits that indicate, for example, whether the corresponding entry 702 in completion table 218 is available or whether the entry has been allocated to a pending instruction group.
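A small Python sketch may help make the GTAG indexing and the "wrap around" bit concrete. It is a simplified model under stated assumptions: the names `CompletionTable` and `is_older` are hypothetical, entries are numbered from 0 rather than 1, and the age comparison is valid only while no more groups are in flight than the table has entries.

```python
class CompletionTable:
    """Toy model of table 218: the GTAG doubles as the entry index."""

    def __init__(self, size):
        self.size = size
        self.entries = [None] * size   # None marks an available entry
        self.next_gtag = 0
        self.wrap = 0                  # flips each time GTAG allocation wraps

    def allocate(self, instruction_address):
        gtag = self.next_gtag
        if self.entries[gtag] is not None:
            raise RuntimeError("completion table full")
        # Each entry holds the group's instruction address and the wrap bit.
        self.entries[gtag] = {"ia": instruction_address, "wrap": self.wrap}
        tag = (gtag, self.wrap)
        self.next_gtag = (gtag + 1) % self.size
        if self.next_gtag == 0:
            self.wrap ^= 1
        return tag

    def release(self, gtag):
        self.entries[gtag] = None


def is_older(a, b):
    """True if group a was allocated before group b; both are (gtag, wrap)."""
    (gtag_a, wrap_a), (gtag_b, wrap_b) = a, b
    if wrap_a == wrap_b:
        return gtag_a < gtag_b   # same pass through the table: lower is older
    return gtag_a > gtag_b       # different pass: the lower GTAG is the newer one


table = CompletionTable(4)
tags = [table.allocate(0x100 + 16 * i) for i in range(4)]
table.release(0)                 # oldest group completes, freeing entry 0
wrapped = table.allocate(0x140)  # reuses entry 0 with the wrap bit flipped
```

Here the group in entry 0 after the wrap has a lower GTAG than the group in entry 3 but is correctly identified as the newer of the two.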
In the embodiment of processor 101 shown in Fig. 2, instructions are issued from dispatch unit 214 to issue queues 220, where they wait to be executed in the corresponding execution pipes 222. Processor 101 may include various types of execution pipes, each designed to execute a subset of the processor's instruction set. In one embodiment, execution pipes 222 may include a branch unit pipeline 224, a load store pipeline 226, a fixed point arithmetic unit 228, and a floating point unit 230. Each execution pipe 222 may comprise two or more pipeline stages. Instructions stored in issue queues 220 may be issued to execution pipes 222 using any of a variety of issue priority algorithms. In one embodiment, for example, the oldest pending instruction in issue queues 220 is the next instruction issued to execution pipes 222. In this embodiment, the GTAG values assigned by dispatch unit 214 are used to determine the relative age of the instructions pending in issue queues 220. Before it is issued, the destination register operand of an instruction is assigned to an available rename GPR. When an instruction is finally issued from issue queues 220 to the appropriate execution pipe, that pipe performs the operation indicated by the instruction's opcode and writes the result to the instruction's rename GPR by the time the instruction reaches the finish stage (indicated by reference numeral 232) of the pipeline. A mapping is maintained between the rename GPRs and their corresponding architected registers. When all instructions in an instruction group (and all instructions in older instruction groups) have finished without generating an exception, a completion pointer in completion table 218 is incremented to the next instruction group. When the completion pointer is incremented to a new instruction group, the rename registers associated with the instructions in the old instruction group are released, thereby committing the results of the instructions in the old instruction group. If one or more instructions older than a finished (but not yet committed) instruction generates an exception, the instruction generating the exception and all younger instructions are flushed, and a rename recovery routine is invoked to return the GPR mapping to the last known valid state.
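The in-order advance of the completion pointer can be illustrated with the following minimal Python sketch. The function name `advance_completion` and the dictionary layout are hypothetical, and rename register release and exception recovery are not modeled; the sketch only shows that a group is committed when it and every older group have finished without an exception.

```python
def advance_completion(groups, pointer):
    """Advance the completion pointer over groups that may be committed.

    groups is ordered oldest first; each group is a dict with a 'gtag' and
    the flags 'finished' and 'exception' (a hypothetical layout).
    """
    committed = []
    while pointer < len(groups):
        group = groups[pointer]
        # Stop at the first group that is still executing or that faulted.
        if not group["finished"] or group["exception"]:
            break
        committed.append(group["gtag"])
        pointer += 1
    return pointer, committed


groups = [
    {"gtag": 0, "finished": True, "exception": False},
    {"gtag": 1, "finished": True, "exception": False},
    {"gtag": 2, "finished": False, "exception": False},
    {"gtag": 3, "finished": True, "exception": False},  # finished out of order
]
pointer, committed = advance_completion(groups, 0)
```

Group 3 has finished but is not committed, because the older group 2 is still pending.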
If a predicted branch is not taken (branch misprediction), the instructions pending in execution pipes 222 and issue queues 220 are flushed. In addition, the pointer 506 of the basic block cache entry 502 associated with the mispredicted branch is updated to reflect the most recent branch taken. Fig. 5 shows an example of this update process for the case in which program execution results in a branch from leg 1 (instruction group 302a) to leg 4 (instruction group 302d). Because pointer 506 of entry 502a had previously predicted a branch to the instruction group residing in entry number 2 of basic block cache 213 (i.e., group 302b), the actual branch from instruction group 302a to group 302d was mispredicted. The mispredicted branch is detected and reported back to block fetch unit 215, the instructions pending between basic block cache 213 and the finish stage 232 of each pipeline 222 are flushed, and execution is restarted with instruction group 302d in entry 4 of basic block cache 213. In addition, pointer 506 of basic block cache entry 502a is changed from its previous value of 2 to the new value of 4 to reflect the most recent branch information. By incorporating basic block cache 213 and block fetch unit 215 in close proximity to execution pipes 222, the invention contemplates a reduced performance penalty for mispredicted branches. More specifically, by implementing basic block cache 213 "downstream" of instruction cracking unit 212, the invention removes the instructions pending in cracking unit 212 from the branch misprediction flush path, thereby reducing the number of pipeline stages that must be purged following a branch misprediction and reducing the performance penalty. In addition, basic block cache 213 provides a caching mechanism whose structure matches the organization of dispatch and completion control logic 216 and completion table 218, thereby simplifying the organization of the intervening logic and facilitating useful extensions to basic block cache 213, as described below.
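The pointer update of the Fig. 5 example can be sketched as follows. This is an illustrative model only: the function name `resolve_branch` and the dictionary layout are hypothetical, the group label 302c and the pointer values for entries 2 through 4 are placeholders, and the flush is reduced to clearing a list of in-flight instructions.

```python
def resolve_branch(cache, entry_no, actual_next, in_flight):
    """Compare the predicted next entry with the actual one.

    Returns True on a misprediction, in which case the in-flight
    instructions are flushed and the entry's pointer (506) is retrained.
    """
    predicted = cache[entry_no]["pointer"]
    if predicted == actual_next:
        return False
    in_flight.clear()                        # flush instructions behind the branch
    cache[entry_no]["pointer"] = actual_next  # remember the most recent branch
    return True


# Entry 1 (group 302a) initially predicts entry 2 (group 302b), as in Fig. 5.
cache = {
    1: {"group": "302a", "pointer": 2},
    2: {"group": "302b", "pointer": 3},  # placeholder pointer value
    3: {"group": "302c", "pointer": 4},  # placeholder group and pointer value
    4: {"group": "302d", "pointer": 1},  # placeholder pointer value
}
in_flight = ["302b-op0", "302b-op1"]

mispredicted = resolve_branch(cache, 1, 4, in_flight)  # actual branch: 1 -> 4
repeat = resolve_branch(cache, 1, 4, in_flight)        # same branch next time
```

After the first call the pointer of entry 1 holds 4, so the same branch is predicted correctly the next time it is encountered.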
In one embodiment, basic block cache 213 further includes instruction history information that can improve processor performance by recording information that may be useful during subsequent executions of the same instruction group, making it possible to avoid situations that result in exceptions, flushes, interrupts, or other performance limiting events (collectively referred to as exception events). In the embodiment of basic block cache 213 shown in Fig. 8, the instruction history information is stored in an instruction history field 508 of each entry 502. As an example of the type of information that may be stored in instruction history field 508, consider an instruction group containing a particular load instruction that caused a store forwarding exception the last time the load instruction was executed. A store forwarding exception, as the term is used herein, occurs in an out-of-order machine when a load instruction that follows (in program order) a store instruction sharing a common memory reference is executed before the store instruction. Because the load instruction retrieves an invalid value from the register if it executes before the store instruction, an exception is generated that results in an instruction flush. The correspondence between basic block cache 213 and dispatch and completion control logic 216 greatly facilitates the task of conveying, to the corresponding entry of basic block cache 213, information that dispatch and completion control logic 216 learns about the manner in which instructions execute and complete. In the absence of this correspondence, completion information from dispatch and completion control logic 216 would typically have to pass through some form of intervening hash table or other suitable mechanism to correlate the group instruction information with its component instructions. In the store forwarding example, upon detecting the store forwarding exception, dispatch and completion control logic 216 writes one or more bits in the instruction history field 508 of the appropriate entry in basic block cache 213 to indicate the store forwarding exception. If the instruction group is subsequently executed, the instruction history information indicating the prior occurrence of a store forwarding exception can be used, for example, to place the processor in an in-order mode in which loads are prevented from executing before the stores complete. This embodiment of the invention thus contemplates recording instruction history information indicative of an exception event associated with an instruction group and thereafter modifying the execution of the instruction group to prevent the exception event from occurring during subsequent executions of the group. Although illustrated with the store forwarding example, the instruction history information field contemplated herein is also suitable for recording information related to a variety of historical events that may enable the processor to avoid recurring exception conditions, such as information related to the accuracy of any prediction mechanism, operand value prediction, cache hits and misses, and so forth.
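The store forwarding example can be sketched in Python as a history bit that changes how a group is executed the next time. The sketch rests on assumptions: the bit assignment `STORE_FORWARDING`, the class `Entry`, and the two mode names are hypothetical, and the actual exception detection is not modeled.

```python
STORE_FORWARDING = 0b01  # hypothetical bit position within history field 508


class Entry:
    """Toy model of a cache entry 502 with an instruction history field."""

    def __init__(self, group):
        self.group = group
        self.history = 0  # no exception events recorded yet


def record_exception(entry, event_bit):
    # Called when the completion logic detects an exception event.
    entry.history |= event_bit


def execution_mode(entry):
    # A recorded store forwarding exception forces in-order execution,
    # so that the load cannot run ahead of the store again.
    if entry.history & STORE_FORWARDING:
        return "in_order"
    return "out_of_order"


entry = Entry(["store r1, 0(r2)", "load r3, 0(r2)"])
before = execution_mode(entry)             # first execution: no history
record_exception(entry, STORE_FORWARDING)  # exception detected and recorded
after = execution_mode(entry)              # later executions avoid the flush
```

Further bits in the same field could record other event types in the same way.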
One example of the information that may be recorded in the execution history field 508 of basic block cache 213 is illustrated by the embodiment shown in Fig. 9, in which one or more of the issue queues 220 are subdivided into a first issue queue 902 and a second issue queue 904. The optimum size or depth of issue queues 220 represents a balance between competing considerations. On the one hand, it is desirable to implement very large and deep issue queues to take maximum advantage of the ability of processor 101 to execute instructions out of order. The ability to issue instructions out of order is limited by the number of instructions pending in issue queues 220: a larger issue queue corresponds to a larger number of instructions eligible for out-of-order processing. On the other hand, as the depth of issue queues 220 increases, the processor's ability to determine the next instruction to issue within its cycle time constraints decreases. In other words, the greater the number of instructions pending in issue queues 220, the longer it takes to determine the next instruction to issue. For this reason, issue queues such as issue queues 220 are typically limited to a depth of about 20 or fewer. One embodiment of the invention contemplates achieving the benefits of a deep issue queue without requiring a significant increase in the logic needed to search the issue queue for the next instruction to issue. The invention takes advantage of the fact that, frequently, an instruction pending in issue queues 220 is not eligible for immediate issue, either because it has already been issued and is pending in the execution pipes 222 of processor 101, or because it is waiting for the completion of another instruction on which it depends for an operand value.
Referring to Fig. 9, issue queue 220 according to one embodiment of the invention includes a first issue queue 902 and a second issue queue 904. First issue queue 902 contains instructions that are eligible for immediate issue. In one embodiment, instructions dispatched from dispatch unit 214 are initially stored in an available entry of first issue queue 902. If it is subsequently determined that an instruction has a dependency on another instruction, the dependent instruction is moved to second issue queue 904 until the instruction on which it depends has retrieved the necessary information. For example, if an add instruction following a load instruction requires the result of the load instruction, both instructions may initially be placed in first issue queue 902. When it is subsequently determined that the add instruction depends on the load instruction, however, the add instruction is transferred from first issue queue 902 to second issue queue 904. In an embodiment that uses the instruction history field 508 described above with reference to Fig. 8, the dependency of the add instruction can be recorded so that, during subsequent executions of the instruction, the add instruction is stored directly in second issue queue 904. Second issue queue 904 may also be used to store instructions that have recently been issued and are still pending in the execution pipes of the processor. In this embodiment, an instruction is issued from first issue queue 902 and then transferred to second issue queue 904. In one embodiment, the instruction may reside in second issue queue 904 until it is determined that the instruction has not been rejected. One method of determining that an instruction has not been rejected is to implement a timer/counter (not shown) associated with each entry in second issue queue 904. When an instruction is first transferred from first issue queue 902 to second issue queue 904, the timer/counter is started. In one embodiment, the timer/counter counts the number of clock cycles that have elapsed since it was started. If the timer/counter continues counting for a predetermined number of cycles without a rejection of the instruction being detected, the instruction is assumed to have completed successfully and its entry in second issue queue 904 is deallocated. By using an issue queue that includes a first issue queue dedicated to instructions that are currently eligible to be issued for execution and a second issue queue containing instructions that are pending but not currently eligible for execution, either because of an instruction dependency or because the instruction was recently issued from the first issue queue and is still in flight, the effective size or depth of the issue queue is increased without significantly increasing the time (i.e., the number of logic levels) required to determine the next instruction to issue.
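The split queue arrangement can be modeled with the short Python sketch below. It is a simplified illustration under several assumptions: the class `SplitIssueQueue` and its method names are hypothetical, a dependent instruction is woken when its producer leaves the second queue (the text says only that it waits until the producer has retrieved the required information), and a rejected instruction is simply returned to the front of the first queue.

```python
class SplitIssueQueue:
    """Toy model of queues 902 and 904 with a per-entry cycle counter."""

    def __init__(self, reject_window=3):
        self.reject_window = reject_window  # cycles to wait for a rejection
        self.primary = []   # queue 902: instructions eligible to issue now
        self.waiting = {}   # queue 904: dependent instruction -> its producer
        self.issued = {}    # queue 904: issued instruction -> cycles elapsed

    def dispatch(self, instr, depends_on=None):
        if depends_on is None:
            self.primary.append(instr)
        else:
            self.waiting[instr] = depends_on  # known dependency: park in 904

    def issue(self):
        # Only the short primary queue is searched for the next instruction.
        if not self.primary:
            return None
        instr = self.primary.pop(0)
        self.issued[instr] = 0                # start the counter for this entry
        return instr

    def reject(self, instr):
        # Assumption: a rejected instruction becomes eligible to issue again.
        del self.issued[instr]
        self.primary.insert(0, instr)

    def tick(self):
        """Advance one clock cycle; return instructions deallocated from 904."""
        done = []
        for instr in list(self.issued):
            self.issued[instr] += 1
            if self.issued[instr] >= self.reject_window:
                del self.issued[instr]        # no rejection seen: assume success
                done.append(instr)
        for instr, producer in list(self.waiting.items()):
            if producer in done:
                del self.waiting[instr]
                self.primary.append(instr)    # dependency resolved: now eligible
        return done


queue = SplitIssueQueue(reject_window=2)
queue.dispatch("load")
queue.dispatch("add", depends_on="load")
issued = queue.issue()
cycle1 = queue.tick()
cycle2 = queue.tick()
```

The design point the sketch tries to capture is that the search for the next instruction touches only the first queue, so adding capacity to the second queue does not lengthen that search.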
Those skilled in the art having the benefit of this disclosure will appreciate that the invention contemplates various embodiments of a microprocessor including a cache suitable for storing grouped instructions (i.e., instructions that have been converted from a first format to a second format) in order to reduce the latency associated with mispredicted branches. It is to be understood that the form of the invention shown and described in detail herein is merely a presently preferred example. The following claims are intended to be interpreted broadly so as to embrace all variations and modifications of the disclosed embodiments.

Claims (20)

1. A method of executing microprocessor instructions, comprising the steps of:
converting a received first set of instructions into an instruction group;
storing the instruction group in an entry of a basic block cache, wherein each entry of the cache contains an instruction group;
issuing the instructions in the instruction group for execution; and
in response to an exception generated during execution of an instruction in the instruction group, flushing only those instructions pending between the basic block cache and the finish stage.
2. The method of claim 1, wherein the generated exception comprises a branch misprediction exception.
3. The method of claim 1, wherein the received instructions are formatted according to a first instruction format and the instructions in the instruction group are formatted according to a second instruction format.
4. The method of claim 3, wherein the second instruction format is wider than the first instruction format.
5. The method of claim 4, further comprising assigning a pointer to each entry in the cache, wherein the pointer is used to predict the next instruction group to be executed.
6. The method of claim 5, further comprising, in response to detecting a mispredicted branch during execution of an instruction group, updating the pointer of the cache entry corresponding to the mispredicted branch.
7. A microprocessor, comprising:
an instruction cracking unit configured to receive a first set of microprocessor instructions and to organize the set of instructions into instruction groups;
a basic block cache configured to cache the instruction groups generated by the cracking unit; and
an execution unit adapted to execute the instructions in an instruction group;
wherein an exception that is generated during execution of an instruction in an instruction group and that causes a flush results in flushing only those instructions dispatched from the basic block cache.
8. The processor of claim 7, further comprising a dispatch unit configured to retrieve instructions from an instruction group in the basic block cache and to forward the instructions to an issue queue.
9. The processor of claim 7, wherein the received instructions are formatted according to a first instruction format and the second set of instructions is formatted according to a second instruction format, and wherein the second instruction format is wider than the first instruction format.
10. The processor of claim 7, wherein the basic block cache is configured to store each instruction group in a corresponding entry of the basic block cache.
11. The processor of claim 10, wherein the basic block cache includes an entry field indicative of the corresponding basic block cache entry.
12. The processor of claim 11, wherein each entry of the basic block cache includes a pointer used to predict the next instruction group to be executed.
13. The processor of claim 12, wherein the processor is configured to update the pointer of an entry in response to a mispredicted branch.
14. A data processing system including at least one processor, memory, an input device, and a display device, wherein the processor comprises:
an instruction cracking unit configured to receive a first set of microprocessor instructions and to organize the set of instructions into instruction groups;
a basic block cache configured to cache the instruction groups generated by the cracking unit; and
an execution unit adapted to execute the instructions in an instruction group;
wherein an exception that is generated during execution of an instruction in an instruction group and that causes a flush results in flushing only those instructions dispatched from the basic block cache.
15. The data processing system of claim 14, further comprising a dispatch unit configured to retrieve instructions from an instruction group in the basic block cache and to forward the instructions to an issue queue.
16. The data processing system of claim 14, wherein the received instructions are formatted according to a first instruction format and the second set of instructions is formatted according to a second instruction format, and wherein the second instruction format is wider than the first instruction format.
17. The data processing system of claim 14, wherein the basic block cache is configured to store each instruction group in a corresponding entry of the basic block cache.
18. The data processing system of claim 17, wherein the basic block cache includes an entry field indicative of the corresponding basic block cache entry.
19. The data processing system of claim 18, wherein each entry of the basic block cache includes a pointer used to predict the next instruction group to be executed.
20. The data processing system of claim 14, wherein the processor is configured to update the pointer of an entry in response to a mispredicted branch.
CN00137005A 2000-01-06 2000-12-27 Microprocessor using basic block high speed buffer storage Expired - Fee Related CN1116638C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US47830800A 2000-01-06 2000-01-06
US09/478,308 2000-01-06

Publications (2)

Publication Number Publication Date
CN1303044A CN1303044A (en) 2001-07-11
CN1116638C true CN1116638C (en) 2003-07-30

Family

ID=23899386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN00137005A Expired - Fee Related CN1116638C (en) 2000-01-06 2000-12-27 Microprocessor using basic block high speed buffer storage

Country Status (4)

Country Link
JP (1) JP3629551B2 (en)
KR (1) KR100402820B1 (en)
CN (1) CN1116638C (en)
HK (1) HK1035946A1 (en)

Also Published As

Publication number Publication date
HK1035946A1 (en) 2001-12-14
KR100402820B1 (en) 2003-10-22
JP2001229024A (en) 2001-08-24
CN1303044A (en) 2001-07-11
KR20010070434A (en) 2001-07-25
JP3629551B2 (en) 2005-03-16

Legal Events

Date Code Title Description
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C06 Publication
PB01 Publication
C14 Grant of patent or utility model
GR01 Patent grant
C19 Lapse of patent right due to non-payment of the annual fee
CF01 Termination of patent right due to non-payment of annual fee