CN105278915A - Instruction distribution device for superscalar processor based on decoupling-check-out operations - Google Patents

Info

Publication number: CN105278915A
Authority: CN (China)
Prior art keywords: instruction, execution unit, ROB, unit, VPU
Legal status: Granted, Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201510020399.4A
Other languages: Chinese (zh)
Other versions: CN105278915B (en)
Inventor: 杨思博
Current Assignee: BEIJING GUORUI ZHONGSHU TECHNOLOGY CO LTD (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: BEIJING GUORUI ZHONGSHU TECHNOLOGY CO LTD
Application filed by: BEIJING GUORUI ZHONGSHU TECHNOLOGY CO LTD
Priority: CN201510020399.4A

Abstract

The invention provides an instruction distribution device for a superscalar processor based on decoupling-check-out operations. The device comprises an instruction operation queue, an instruction distribution unit, an ROB (ReOrder Buffer) and execution units. The instruction distribution unit comprises an execution unit dependence analysis device, an execution unit position analysis device, an execution unit state judgment device, an execution unit distribution device and an ROB distribution device. The execution unit position analysis device and the execution unit dependence analysis device obtain the correlation between instructions, including the position of each execution unit needed by an instruction and the dependence of the needed execution unit on other execution units, and carry out the decoupling operation on correlated instructions. The execution unit state judgment device judges whether each execution unit can accept a newly distributed instruction. The execution unit distribution device carries out the check-out operation on each execution unit. The ROB distribution device carries out the check-out operation according to the position of each virtual execution unit and the corresponding available signal. With this device, the timing of the dispatch operation can be optimized.

Description

Instruction distribution device for a superscalar processor based on decoupling-check-out operations
Technical field
The present invention relates to the technical field of superscalar processors, and in particular to an instruction distribution device for a superscalar processor based on decoupling-check-out operations.
Background art
A superscalar processor contains multiple instruction execution units that can work in parallel, and uses register renaming to eliminate write-after-read and write-after-write conflicts between instructions, so that as many instructions as possible execute simultaneously in different execution units. Register renaming must be completed before an instruction enters the reservation station of an execution unit. Up to that point, instructions must flow through the processor pipeline in program order; once an instruction has entered the reservation station of an execution unit, its execution no longer depends on program order.
From the standpoint of execution efficiency, the fewer pipeline stages a processor has, the better. From the standpoint of operating frequency, however, the pipeline needs a certain number of stages. Processor design therefore aims to reach as high an operating frequency as possible with as few pipeline stages as possible. A superscalar processor needs to send multiple instructions into the reservation stations of multiple execution units in the same cycle; this process is called instruction distribution, or instruction dispatch.
To reduce the number of pipeline stages as far as possible, register renaming and instruction distribution can be performed in the same cycle. This has a further benefit: an instruction enters the reservation station immediately after register renaming, so the renamed addresses no longer need to be updated and maintained in an instruction distribution queue.
However, this approach requires instruction distribution to follow program order, that is, the instructions must be distributed one after another, in program order, to the reservation stations of different instruction execution units. Different types of instructions need different execution units, the number of execution units of each type in the processor differs, and even execution units of the same type have dynamically changing priorities. Therefore, among the instructions waiting to be distributed in the same cycle, the distribution of an instruction later in program order depends on the distribution results of the instructions earlier in program order. For example, for two instructions of the same type, whether the later instruction can be distributed, and to which execution unit, depends on the distribution result of the earlier instruction and on the occupancy of the execution units' reservation stations.
The traditional approach solves the multi-instruction distribution problem by iterative judgment. One related technical solution proposes a method and system for single-cycle dispatch of multiple instructions in a superscalar processor. The logic flow chart of the single-cycle instruction dispatch process of its instruction dispatcher (Fig. 1) shows that each instruction passes through a series of logic judgments to determine whether it can be distributed, and that once this series of judgments completes, control returns to the start of the procedure to judge the next instruction. With this iterative approach, the distribution decision logic for the instructions in the same cycle executes serially. For example, whether the first instruction obtains an execution unit may affect whether the second instruction can obtain one, so the distribution of the second instruction depends on the distribution result of the first. This inevitably puts heavy timing pressure on instruction distribution within a single cycle and makes the timing tight.
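The serial dependence described above can be illustrated with a minimal sketch. It is not the flow chart of Fig. 1; the function name, the per-type free-slot counter and the rule of stopping at the first instruction that cannot be distributed are assumptions made for illustration.

```python
# Hypothetical model of iterative (serial) distribution; all names are illustrative.
def iterative_dispatch(instr_types, free_slots):
    """Decide, one instruction at a time, which instructions can be distributed.

    instr_types: instruction types in program order, e.g. ["ALU", "ALU", "FPU"].
    free_slots:  number of free execution units per type, e.g. {"ALU": 1}.
    """
    remaining = dict(free_slots)   # state carried from one decision to the next
    dispatched = []
    for t in instr_types:          # each iteration depends on the previous one
        if remaining.get(t, 0) == 0:
            break                  # assumed in-order rule: stop at the first failure
        remaining[t] -= 1
        dispatched.append(t)
    return dispatched
```

Because `remaining` is updated inside the loop, the decision for each instruction cannot start until the decision for the previous one has finished, which is the serial chain the background section attributes to the iterative approach.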
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above.
To this end, the object of the invention is to propose an instruction distribution device for a superscalar processor based on decoupling-check-out operations. The device decomposes the iterative processing between instructions into independent decoupling operations and check-out operations. The two kinds of operations can be processed in parallel or pipelined, thereby optimizing the timing of the dispatch operation.
To achieve this object, embodiments of the invention propose an instruction distribution device for a superscalar processor based on decoupling-check-out operations, comprising an instruction operation queue, an instruction distribution unit, an ROB and execution units. The instruction operation queue stores the instructions waiting to be distributed. The instruction distribution unit distributes the instructions in the instruction queue to the ROB and the execution units. The ROB holds the distributed instructions in program order. The execution units execute the instructions. The instruction distribution unit comprises an execution unit dependence analysis device, an execution unit position analysis device, an execution unit state judgment device, an execution unit distribution device and an ROB distribution device. The execution unit position analysis device and the execution unit dependence analysis device implement the instruction decoupling operation: they obtain the instruction type of each instruction to be distributed from the instruction queue, derive the correlation between instructions, including the position of the execution unit each instruction needs and the dependence of that execution unit on other execution units, and decouple the correlated instructions according to instruction type. The execution unit state judgment device judges whether each execution unit can accept a newly distributed instruction. The execution unit distribution device performs the check-out operation on each execution unit. The ROB distribution device performs the ROB check-out operation according to the position of each virtual execution unit and the corresponding available signal.
The instruction distribution device according to the embodiments of the invention decomposes the iterative processing between instructions into independent decoupling operations and check-out operations. The two kinds of operations can be processed in parallel or pipelined, thereby optimizing the timing of the dispatch operation.
In addition, the instruction distribution device according to the above embodiments of the invention may have the following additional technical features:
In some examples, the instruction types comprise: ALU instruction: executed by a fixed-point arithmetic unit; FPU instruction: executed by a floating-point unit; VPU0 instruction: executed by vector operation unit VPU_0 or VPU_1; VPU1 instruction: executed only by vector operation unit VPU_1; LSU instruction: executed only by the memory access units LSU_0 and LSU_1; BQU instruction: executed by the branch queue unit; MDU instruction: executed by the fixed-point multiply-divide unit; ROB instruction: executed by the ROB, needing no instruction execution unit, so that it only has to be distributed to the ROB.
In some examples, the instruction distribution unit comprises the following virtual execution units: ALU_1ST: the fixed-point arithmetic unit used by the first ALU instruction among the instructions to be dispatched, corresponding to the actual ALU_0 or ALU_1; ALU_2ND: the fixed-point arithmetic unit used by the second ALU instruction among the instructions to be dispatched, corresponding to the actual ALU_0 or ALU_1; FPU_1ST: the floating-point unit used by the first FPU instruction among the instructions to be dispatched, corresponding to the actual FPU_0 or FPU_1; FPU_2ND: the floating-point unit used by the second FPU instruction among the instructions to be dispatched, corresponding to the actual FPU_0 or FPU_1; VPU_1ST: the vector operation unit used by the first VPU instruction among the instructions to be dispatched, corresponding to the actual VPU_0 or VPU_1; VPU_2ND: the vector operation unit used by the second VPU instruction among the instructions to be dispatched, corresponding to the actual VPU_0 or VPU_1; LSU_1ST: the first LSU instruction among the instructions to be dispatched, corresponding to the actual LSU_0; LSU_2ND: the second LSU instruction among the instructions to be dispatched, corresponding to the actual LSU_1; BQU: the first BQU instruction among the instructions to be dispatched, corresponding to the actual BQU; MDU: the first MDU instruction among the instructions to be dispatched, corresponding to the actual MDU.
In some examples, the execution unit position analysis device determines, from the instruction types and the order of the instructions, the position among the instructions to be dispatched of the instruction that will be assigned to each virtual execution unit.
In some examples, the execution unit dependence analysis device determines, from the instruction types and the order of the instructions, the dependences between the instructions that will be assigned to the virtual execution units.
In some examples, the inputs of the execution unit state judgment device come from the execution units, and its outputs are the available signals of the execution units.
In some examples, the check-out operation is as follows: for a virtual execution unit xxx, check whether each xxx_check_list_yyy is valid and, if it is, check whether yyy_available is valid. xxx can be checked out only if yyy_available is valid for every valid xxx_check_list_yyy.
In some examples, the inputs of the ROB distribution device come from the execution unit position analysis device, the execution unit dependence analysis device and the execution unit state judgment device.
In some examples, the ROB distribution device follows instruction order when performing the check-out operation: a later instruction can be checked out only if the instruction ahead of it has been checked out.
Additional aspects and advantages of the invention are given in part in the following description; in part they will become apparent from the description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the invention will become apparent and easy to understand from the following description of the embodiments taken together with the accompanying drawings, in which:
Fig. 1 is a logic flow chart of the single-cycle instruction dispatch process of the instruction dispatcher in a current related technical solution;
Fig. 2 is a structural block diagram of an instruction distribution device for a superscalar processor based on decoupling-check-out operations according to an embodiment of the invention;
Fig. 3 is an internal structure diagram of the instruction distribution unit according to an embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below. Examples of the embodiments are shown in the drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to embodiments of the invention is described below with reference to the drawings.
Fig. 2 is a structural block diagram of an instruction distribution device for a superscalar processor based on decoupling-check-out operations according to an embodiment of the invention. As shown in Fig. 2, the device comprises an instruction operation queue 110, an instruction distribution unit 120, an ROB (ReOrder Buffer) and execution units 130 (not shown).
Specifically, the instruction operation queue 110 stores the instructions waiting to be distributed. The instruction distribution unit 120 distributes the instructions in the instruction queue to the ROB and the execution units 130. The ROB holds the distributed instructions in program order, so that a processor with out-of-order execution can still produce precise breakpoints. The execution units 130 perform the actual instruction operations.
In some examples, the instruction distribution unit 120 can distribute 4 instructions per cycle (the first 4 instructions in the instruction queue). An instruction is distributed to the ROB and to an execution unit 130 at the same time; some instructions, such as NOP, need no execution unit 130 and are therefore distributed only to the ROB. The distribution device of this embodiment has 10 execution units 130: two fixed-point arithmetic units ALU_0 and ALU_1; two floating-point units FPU_0 and FPU_1; two vector operation units VPU_0 and VPU_1; two memory access units LSU_0 and LSU_1; one branch queue unit BQU; and one fixed-point multiply-divide unit MDU. The instruction distribution unit 120 decides, from the type of each instruction and the occupancy of each functional unit, which execution unit each of the 4 instructions to be distributed is assigned to.
In one embodiment of the invention, as shown in Fig. 3, the instruction distribution unit 120 comprises an execution unit dependence analysis device 121, an execution unit position analysis device 122, an execution unit state judgment device 123, an execution unit distribution device 124 and an ROB distribution device 125.
The execution unit position analysis device 122 and the execution unit dependence analysis device 121 implement the instruction decoupling operation. They obtain the instruction type of each instruction to be distributed from the instruction queue and derive the correlation between instructions, namely the position of the execution unit each instruction needs (the output of the execution unit position analysis device 122) and the dependence of that execution unit on other execution units (the output of the execution unit dependence analysis device 121), and they decouple the correlated instructions according to instruction type. Instruction distribution can then be analyzed in terms of the execution units 130, without further regard to the types and order of the 4 original instructions. As soon as the state of the execution units 130 has been judged (the output of the execution unit state judgment device 123), all 4 instructions can be distributed at the same time, without the iterative loop shown in Fig. 1. Both the execution unit position analysis device 122 and the execution unit dependence analysis device 121 take their inputs from the instruction queue.
In one embodiment of the invention, the instruction types comprise, for example:
ALU instruction: executed by a fixed-point arithmetic unit;
FPU instruction: executed by a floating-point unit;
VPU0 instruction: executed by vector operation unit VPU_0 or VPU_1;
VPU1 instruction: executed only by vector operation unit VPU_1;
LSU instruction: executed only by the memory access units LSU_0 and LSU_1;
BQU instruction: executed by the branch queue unit;
MDU instruction: executed by the fixed-point multiply-divide unit;
ROB instruction: executed by the ROB; it needs no instruction execution unit and only has to be distributed to the ROB.
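The instruction types listed above, together with the ten execution units named in this embodiment, can be written down as a lookup table. This is an illustrative encoding only; the enum and the names `ELIGIBLE_UNITS` and `needs_execution_unit` are assumptions, not identifiers from the patent.

```python
from enum import Enum

class InstrType(Enum):
    """Instruction types of the embodiment (encoding is illustrative)."""
    ALU = "ALU"
    FPU = "FPU"
    VPU0 = "VPU0"
    VPU1 = "VPU1"
    LSU = "LSU"
    BQU = "BQU"
    MDU = "MDU"
    ROB = "ROB"

# Actual execution units that may execute each instruction type.
ELIGIBLE_UNITS = {
    InstrType.ALU: ("ALU_0", "ALU_1"),
    InstrType.FPU: ("FPU_0", "FPU_1"),
    InstrType.VPU0: ("VPU_0", "VPU_1"),  # either vector unit
    InstrType.VPU1: ("VPU_1",),          # VPU_1 only
    InstrType.LSU: ("LSU_0", "LSU_1"),
    InstrType.BQU: ("BQU",),
    InstrType.MDU: ("MDU",),
    InstrType.ROB: (),                   # distributed to the ROB only
}

def needs_execution_unit(instr_type):
    """Return True if the instruction must also be sent to an execution unit."""
    return bool(ELIGIBLE_UNITS[instr_type])
```

The empty tuple for the ROB type reflects the statement that such instructions, NOP for example, only need an ROB entry.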
The execution unit position analysis device 122 and the execution unit dependence analysis device 121 then obtain the instruction type of each instruction to be distributed from the instruction queue and, on that basis, analyze the correlation between the instructions.
In some examples, the instruction distribution unit 120 comprises, for example, the following virtual execution units:
ALU_1ST: the fixed-point arithmetic unit used by the first ALU instruction among the instructions to be dispatched, corresponding to the actual ALU_0 or ALU_1;
ALU_2ND: the fixed-point arithmetic unit used by the second ALU instruction among the instructions to be dispatched, corresponding to the actual ALU_0 or ALU_1;
FPU_1ST: the floating-point unit used by the first FPU instruction among the instructions to be dispatched, corresponding to the actual FPU_0 or FPU_1;
FPU_2ND: the floating-point unit used by the second FPU instruction among the instructions to be dispatched, corresponding to the actual FPU_0 or FPU_1;
VPU_1ST: the vector operation unit used by the first VPU instruction among the instructions to be dispatched, corresponding to the actual VPU_0 or VPU_1;
VPU_2ND: the vector operation unit used by the second VPU instruction among the instructions to be dispatched, corresponding to the actual VPU_0 or VPU_1;
LSU_1ST: the first LSU instruction among the instructions to be dispatched, corresponding to the actual LSU_0;
LSU_2ND: the second LSU instruction among the instructions to be dispatched, corresponding to the actual LSU_1;
BQU: the first BQU instruction among the instructions to be dispatched, corresponding to the actual BQU;
MDU: the first MDU instruction among the instructions to be dispatched, corresponding to the actual MDU.
More specifically, the execution unit position analysis device 122 determines, from the instruction types and the order of the instructions, the position among the instructions to be dispatched of the instruction that will be assigned to each virtual execution unit (if it can be distributed). In this example, the output of the execution unit position analysis device 122 is denoted xxx_position, where xxx is the name of the corresponding virtual execution unit. For ALU_1ST, FPU_1ST, VPU_1ST, LSU_1ST, BQU and MDU, xxx_position is 4 bits wide, one bit for each of the 1st to 4th instructions to be distributed; if bit X is 1, the instruction that will be assigned to this functional unit is the X-th instruction to be distributed. For ALU_2ND, FPU_2ND, VPU_2ND and LSU_2ND, xxx_position is 3 bits wide, because the first instruction to be distributed can never be assigned to them.
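A minimal sketch of the position analysis follows, assuming each xxx_position is modelled as a list of bits and that the 3-bit vectors of the _2ND units cover the 2nd to 4th instructions. The function name, the `UNIT_CLASS` table and the handling of a third instruction of the same class (it gets no virtual unit) are assumptions; the source does not state them.

```python
# Map each instruction type to the class of virtual execution unit it uses.
# ROB-type instructions are absent: they need no execution unit.
UNIT_CLASS = {"ALU": "ALU", "FPU": "FPU", "VPU0": "VPU", "VPU1": "VPU",
              "LSU": "LSU", "BQU": "BQU", "MDU": "MDU"}
PAIRED = ("ALU", "FPU", "VPU", "LSU")   # classes with _1ST and _2ND units

def position_analysis(instr_types):
    """Compute xxx_position for every virtual unit from up to 4 instruction types."""
    positions = {"BQU": [0] * 4, "MDU": [0] * 4}
    for cls in PAIRED:
        positions[cls + "_1ST"] = [0] * 4   # bits for instructions 1..4
        positions[cls + "_2ND"] = [0] * 3   # bits for instructions 2..4
    seen = {}                               # instructions seen so far per class
    for i, t in enumerate(instr_types[:4]):
        cls = UNIT_CLASS.get(t)
        if cls is None:
            continue                        # ROB-type instruction
        n = seen.get(cls, 0)
        seen[cls] = n + 1
        if cls in PAIRED:
            if n == 0:
                positions[cls + "_1ST"][i] = 1
            elif n == 1:
                positions[cls + "_2ND"][i - 1] = 1
        elif n == 0:
            positions[cls][i] = 1
    return positions
```

For the instruction sequence ALU, FPU, ALU, ROB, this marks bit 1 of ALU_1ST, bit 2 of FPU_1ST and the middle bit of ALU_2ND (the 3rd instruction), and leaves every other vector at zero.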
Further, the execution unit dependence analysis device 121 determines, from the instruction types and the order of the instructions, the dependences between the instructions that will be assigned to the virtual execution units. Its output is denoted xxx_check_list_yyy, where xxx is the name of the corresponding virtual execution unit and yyy is the name of another virtual execution unit; the signal indicates whether a dependence exists between xxx and yyy. If xxx_check_list_yyy is 1, the instruction that will be assigned to xxx can be distributed only after the instruction that will be assigned to yyy has been distributed. A special case is xxx_check_list_xxx, which, when 1, indicates that some instruction needs to be assigned to xxx. Because instructions are distributed in order, ALU_2ND, FPU_2ND, VPU_2ND and LSU_2ND naturally depend on ALU_1ST, FPU_1ST, VPU_1ST and LSU_1ST respectively, whereas the latter can never depend on the former, so the signals describing the dependences between them are omitted.
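The dependence signals can be sketched as sets, under the assumption that an instruction depends on every virtual execution unit used by an earlier instruction in the same cycle. The function name is hypothetical, and unlike the device described above this sketch keeps the _2ND on _1ST dependences explicit instead of omitting them.

```python
def dependence_analysis(virtual_units):
    """Build the check list of every virtual unit that receives an instruction.

    virtual_units: the virtual unit of each instruction in program order,
                   or None for a ROB-type instruction.
    Returns {xxx: set of yyy}, where yyy in the set stands for
    xxx_check_list_yyy == 1 (including the special case yyy == xxx).
    """
    check_list = {}
    earlier = []                            # units used by earlier instructions
    for unit in virtual_units:
        if unit is None:
            continue
        check_list[unit] = set(earlier) | {unit}
        earlier.append(unit)
    return check_list
```

A unit that is absent from the result corresponds to xxx_check_list_xxx being 0, meaning no instruction in this cycle needs it.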
In some examples, the execution unit state judgment device 123, the execution unit distribution device 124 and the ROB distribution device 125 implement the check-out operation and thereby complete the instruction distribution.
Specifically, the execution unit state judgment device 123 judges whether each execution unit can accept a newly distributed instruction. Its inputs come from the execution units (these inputs are omitted in Fig. 2 and Fig. 3), and its outputs are the available signals of the execution units. The execution units meant here are the actual execution units, not the virtual ones. When zzz_available is 1, execution unit zzz can accept a distributed instruction. For execution unit types that have more than one unit (for example ALU, FPU and VPU), the execution unit state judgment device 123 also determines which unit corresponds to the first instruction of that type and which to the second. Although LSU instructions also have two execution units, these two units must be allocated in order, so no such determination is needed for them.
The inputs of the execution unit distribution device 124 come from the execution unit dependence analysis device 121 and the execution unit state judgment device 123. The execution unit distribution device 124 performs the check-out operation on each execution unit. The check-out operation is as follows: for a virtual execution unit xxx, check whether each xxx_check_list_yyy is valid and, if it is, check whether yyy_available is valid. xxx can be checked out only if yyy_available is valid for every valid xxx_check_list_yyy.
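The check-out rule just described reduces to an AND over the available signals selected by the check list. A minimal sketch, with the check list modelled as a dictionary of sets and `check_out` as a hypothetical function name:

```python
def check_out(check_list, available):
    """Check out each virtual unit xxx.

    check_list: {xxx: set of yyy for which xxx_check_list_yyy is 1}.
    available:  {yyy: bool}, the yyy_available signal of each virtual unit.
    xxx is checked out only if every yyy on its check list is available.
    """
    return {xxx: all(available.get(yyy, False) for yyy in deps)
            for xxx, deps in check_list.items()}
```

Each entry of the result depends only on the two inputs and not on any other entry, so in hardware all virtual units could be evaluated side by side rather than one after another.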
Note that the output zzz_available of the execution unit state judgment device 123 corresponds to actual execution units, whereas yyy is a virtual execution unit, so zzz_available must be converted into yyy_available. The actual execution units LSU_0, LSU_1, BQU and MDU each have a unique corresponding virtual execution unit, so the name of the actual execution unit can be translated directly into the name yyy of its virtual execution unit. For ALU, FPU and VPU, however, the correspondence between actual and virtual execution units is not unique and must be resolved. For ALU and FPU, the execution unit state judgment device 123 determines which ALU or FPU currently holds the fewest distributed instructions; that ALU or FPU is treated as the preferred execution unit and mapped to ALU_1ST or FPU_1ST. For VPU, because some instructions in this example can be executed only by VPU_1, it must also be determined whether a VPU instruction among the instructions to be distributed can be executed only by VPU_1. If the instruction of VPU_1ST can be executed only by VPU_1, VPU_1ST is set directly to VPU_1. If the instruction of VPU_1ST can be executed by either VPU but the instruction of VPU_2ND can be executed only by VPU_1, VPU_2ND is set directly to VPU_1. If the instructions of both VPU_1ST and VPU_2ND can be executed by either VPU, the same method as for ALU and FPU is used. In addition, checking out an execution unit requires judging not only its own state and the state of the other execution units it depends on, but also whether the ROB has room to accept the instruction that will be assigned to this execution unit. This also uses xxx_position: because the ROB stores the distributed instructions in order, whether it has room for an instruction depends on the position of that instruction among the instructions to be dispatched. To reduce clutter, this is not shown in Fig. 3.
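The mapping from actual to virtual execution units can be sketched as follows. The function names and the occupancy arguments are hypothetical. Two details are assumptions that the text does not cover: a tie in occupancy is resolved in favor of unit 0, and when both VPU instructions require VPU_1 the second one is given no unit.

```python
def map_pair(occupancy_0, occupancy_1, names):
    """Return (unit for _1ST, unit for _2ND) for a pair of identical units.

    The unit holding fewer distributed instructions becomes the _1ST unit.
    Ties go to the first name (assumption).
    """
    if occupancy_0 <= occupancy_1:
        return names[0], names[1]
    return names[1], names[0]

def map_vpu(first_needs_vpu1, second_needs_vpu1, occupancy_0, occupancy_1):
    """Return (unit for VPU_1ST, unit for VPU_2ND), honoring VPU1-only instructions."""
    if first_needs_vpu1:
        # Assumption: a second VPU1-only instruction gets no unit this cycle.
        return "VPU_1", (None if second_needs_vpu1 else "VPU_0")
    if second_needs_vpu1:
        return "VPU_0", "VPU_1"
    return map_pair(occupancy_0, occupancy_1, ("VPU_0", "VPU_1"))
```

Once the mapping is known, each yyy_available is the zzz_available of the actual unit that yyy was mapped to; for example, if ALU_1ST maps to ALU_1, then ALU_1ST_available is ALU_1_available.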
The inputs of the ROB distribution device 125 come from the execution unit position analysis device 122, the execution unit dependence analysis device 121 and the execution unit state judgment device 123. The ROB distribution device 125 performs the ROB check-out operation according to the position of each virtual execution unit and the corresponding available signal. The check-out operation follows instruction order: a later instruction can be checked out only if the instruction ahead of it has been checked out. For the first instruction to be dispatched, if the first bit of the xxx_position of some virtual execution unit is 1 and the corresponding xxx_check_list_xxx and xxx_available are also 1, the instruction can be distributed. If the instruction is of ROB type and the ROB can accept it, it can also be distributed. For the 2nd to 4th instructions to be distributed, besides judging whether the execution unit corresponding to the instruction's own position and the ROB allow distribution, it must also be judged whether the instructions ahead of it can be accepted by their execution units. Note that because the ROB is allocated in order, if the ROB can accept this instruction it must also be able to accept the instructions ahead of it, so no separate judgment is needed for them.
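The in-order rule of the ROB check-out can be sketched as a running AND over the per-instruction results. The function name and arguments are hypothetical: `unit_ok[i]` stands for "the execution unit needed by instruction i was checked out, or the instruction is of ROB type", and the ROB state is reduced to a count of free entries.

```python
def rob_check_out(unit_ok, rob_free_entries):
    """Decide which of the instructions to be dispatched are actually distributed.

    unit_ok:          per-instruction flags in program order (see lead-in).
    rob_free_entries: number of ROB entries that can accept instructions this cycle.
    """
    result = []
    previous_ok = True
    for i, ok in enumerate(unit_ok):
        # Instruction i occupies the (i + 1)-th free ROB entry, so it fits only
        # if at least i + 1 entries are free. This also covers the earlier ones.
        can_dispatch = previous_ok and ok and i < rob_free_entries
        result.append(can_dispatch)
        previous_ok = can_dispatch
    return result
```

The loop is written sequentially for readability; since each flag in `unit_ok` is already known before the loop starts, the same result is a prefix AND that hardware can evaluate as flat combinational logic.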
In the above example, note that the execution unit position analysis device 122, the execution unit dependence analysis device 121 and the execution unit state judgment device 123 can run in parallel, and that the execution unit distribution device 124 and the ROB distribution device 125 can also run in parallel. Moreover, because decoupling and check-out are two independent operations, the operations of the execution unit position analysis device 122 and the execution unit dependence analysis device 121 can be moved to the preceding pipeline stage. In an embodiment of the invention, part of the logic of the execution unit position analysis device 122 and the execution unit dependence analysis device 121 is placed in the preceding pipeline stage, while the remaining logic works in parallel with the execution unit state judgment device 123. Because this remaining logic takes less time than the execution unit state judgment device 123, the timing critical path of the instruction distribution stage is the execution unit state judgment device 123 plus whichever of the execution unit distribution device 124 and the ROB distribution device 125 takes longer. In this way the impact of inter-instruction dependences on timing is reduced as far as possible, and a higher operating speed is reached.
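The critical-path statement above can be written as a formula. The symbols are assumptions introduced here: T_123, T_124 and T_125 denote the delays of devices 123, 124 and 125, and T_dec denotes the delay of the part of the decoupling logic (devices 121 and 122) that stays in the distribution stage.

```latex
T_{\mathrm{dispatch}} = \max\left(T_{123},\, T_{\mathrm{dec}}\right) + \max\left(T_{124},\, T_{125}\right)
                      = T_{123} + \max\left(T_{124},\, T_{125}\right)
\quad \text{since } T_{\mathrm{dec}} < T_{123}.
```

In this expression the delay no longer grows with the number of instructions judged one after another, which is the contrast with the iterative scheme of Fig. 1.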
In summary, the instruction distribution device for a superscalar processor based on decoupling-check-out operations according to the embodiments of the invention decomposes the iterative processing between instructions into independent decoupling operations and check-out operations. The two kinds of operations can be processed in parallel or pipelined, thereby optimizing the timing of the dispatch operation.
In the description of the invention, it should be understood that orientation or position terms such as "center", "longitudinal", "transverse", "length", "width", "thickness", "up", "down", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial" and "circumferential" are based on the orientations or positions shown in the drawings. They serve only to simplify the description of the invention and do not indicate or imply that the device or element referred to must have a specific orientation or be constructed and operated in a specific orientation; they are therefore not to be construed as limiting the invention.
In addition, the terms "first" and "second" are used for description only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may thus explicitly or implicitly include at least one such feature. In the description of the invention, "multiple" means at least two, for example two or three, unless specifically defined otherwise.
In the invention, unless otherwise expressly specified and limited, terms such as "installed", "linked", "connected" and "fixed" are to be understood broadly. For example, a connection may be fixed, detachable or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediary; and it may be an internal connection between two elements or an interaction between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the invention according to the circumstances.
In the invention, unless otherwise expressly specified and limited, a first feature being "on" or "under" a second feature may mean that the two features are in direct contact or in indirect contact through an intermediary. A first feature being "on", "above" or "over" a second feature may mean that the first feature is directly or obliquely above the second feature, or merely that the first feature is at a higher level than the second feature. A first feature being "under", "below" or "beneath" a second feature may mean that the first feature is directly or obliquely below the second feature, or merely that the first feature is at a lower level than the second feature.
In this specification, a description referring to "an embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the invention. Such expressions do not necessarily refer to the same embodiment or example. The specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, a person skilled in the art may combine the different embodiments or examples described in this specification and the features of those embodiments or examples.
Although embodiments of the invention have been shown and described above, it should be understood that these embodiments are exemplary and are not to be construed as limiting the invention. A person of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the invention.

Claims (9)

1. An instruction distribution device for a superscalar processor based on decoupling-check-out operations, characterized in that it comprises an instruction operation queue, an instruction distribution unit, an ROB, and execution units, wherein
the instruction operation queue is used for storing instructions waiting to be distributed;
the instruction distribution unit is used for distributing the instructions in the instruction queue to the ROB and the execution units;
the ROB is used for holding the distributed instructions in program execution order;
the execution units are used for executing the instructions;
wherein the instruction distribution unit comprises an execution unit dependence analysis device, an execution unit position analysis device, an execution unit state judgment device, an execution unit distribution device, and an ROB distribution device, wherein
the execution unit position analysis device and the execution unit dependence analysis device are used for implementing the instruction decoupling operation so as to obtain the correlation between instructions, including the position of the execution unit needed by each instruction and the dependence of that execution unit on other execution units; they obtain the instruction type of each instruction to be distributed from the instruction queue and, according to the instruction type, carry out the decoupling operation on instructions that are correlated;
the execution unit state judgment device is used for judging whether each execution unit can accept a newly distributed instruction;
the execution unit distribution device is used for carrying out the check-out operation on each execution unit;
the ROB distribution device is used for carrying out the ROB check-out operation according to the position of each virtual execution unit and the corresponding available signal.
2. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 1, characterized in that the instruction types comprise:
ALU instruction: executed by a fixed-point arithmetic unit;
FPU instruction: executed by a floating-point unit;
VPU0 instruction: executed by vector operation units VPU_0 and VPU_1;
VPU1 instruction: can only be executed by vector operation unit VPU_1;
LSU instruction: can only be executed by memory access units LSU_0 and LSU_1;
BQU instruction: executed by the branch queue unit;
MDU instruction: executed by the fixed-point multiply-divide unit;
ROB instruction: executed by the ROB; it needs no execution unit and only has to be allocated into the ROB.
3. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 1, characterized in that the instruction distribution unit comprises the following virtual execution units:
ALU_1ST: the fixed-point arithmetic unit used by the first ALU instruction among the instructions to be sent, corresponding to the actual ALU_0 or ALU_1;
ALU_2ND: the fixed-point arithmetic unit used by the second ALU instruction among the instructions to be sent, corresponding to the actual ALU_0 or ALU_1;
FPU_1ST: the floating-point unit used by the first FPU instruction among the instructions to be sent, corresponding to the actual FPU_0 or FPU_1;
FPU_2ND: the floating-point unit used by the second FPU instruction among the instructions to be sent, corresponding to the actual FPU_0 or FPU_1;
VPU_1ST: the vector operation unit used by the first VPU instruction among the instructions to be sent, corresponding to the actual VPU_0 or VPU_1;
VPU_2ND: the vector operation unit used by the second VPU instruction among the instructions to be sent, corresponding to the actual VPU_0 or VPU_1;
LSU_1ST: the first LSU instruction among the instructions to be sent, corresponding to the actual LSU_0;
LSU_2ND: the second LSU instruction among the instructions to be sent, corresponding to the actual LSU_1;
BQU: the first BQU instruction among the instructions to be sent, corresponding to the actual BQU;
MDU: the first MDU instruction among the instructions to be sent, corresponding to the actual MDU.
4. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 3, characterized in that the execution unit position analysis device determines, according to the instruction type and the order of the instructions, the position among the instructions to be sent of the instruction that will be assigned to each virtual execution unit.
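As an illustration only (not part of the claims), the position analysis of claims 3 and 4 can be sketched in Python. The function name `analyze_positions`, the table `VIRTUAL_UNITS`, the grouping of VPU0 and VPU1 instructions into a single VPU class, and the treatment of an instruction that finds no free virtual unit are all assumptions made for this sketch; the claims state only that the position is determined from the instruction type and the order of the instructions.

```python
from typing import Dict, List, Optional

# Instruction class -> virtual execution units, in order of occurrence (claim 3).
# Treating VPU0 and VPU1 instructions as one "VPU" class is an assumption.
VIRTUAL_UNITS: Dict[str, List[str]] = {
    "ALU": ["ALU_1ST", "ALU_2ND"],
    "FPU": ["FPU_1ST", "FPU_2ND"],
    "VPU": ["VPU_1ST", "VPU_2ND"],
    "LSU": ["LSU_1ST", "LSU_2ND"],
    "BQU": ["BQU"],
    "MDU": ["MDU"],
}


def analyze_positions(instr_types: List[str]) -> List[Optional[str]]:
    """Return the virtual unit chosen for each instruction, in program order.

    None means no virtual unit: either a ROB instruction, which needs no
    execution unit, or (assumed here) an instruction that arrives after all
    virtual units of its class are already taken.
    """
    seen: Dict[str, int] = {}  # how many instructions of each class so far
    result: List[Optional[str]] = []
    for t in instr_types:
        cls = "VPU" if t in ("VPU0", "VPU1") else t
        slots = VIRTUAL_UNITS.get(cls, [])
        n = seen.get(cls, 0)
        result.append(slots[n] if n < len(slots) else None)
        seen[cls] = n + 1
    return result
```

For example, for the hypothetical group `["ALU", "LSU", "ALU", "ROB"]` the sketch assigns the first ALU instruction to ALU_1ST, the LSU instruction to LSU_1ST, the second ALU instruction to ALU_2ND, and gives the ROB instruction no virtual unit.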
5. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to any one of claims 1-4, characterized in that the execution unit dependence analysis device determines, according to the instruction type and the order of the instructions, the dependences among the instructions that will be assigned to the virtual execution units.
6. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 1, characterized in that the inputs of the execution unit state judgment device come from the execution units, and its outputs are the available signals of the execution units.
7. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 1, characterized in that the check-out operation is as follows: for a virtual execution unit xxx, check whether each xxx_check_list_yyy is valid; if it is valid, check whether yyy_available is valid; xxx can be checked out only if the yyy_available corresponding to every valid xxx_check_list_yyy is valid.
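The check-out rule of claim 7 amounts to a logical AND over the selected available signals. The following Python sketch is illustrative only: the function name `can_check_out` and the use of dictionaries keyed by unit name are assumptions, with `check_list[yyy]` standing in for the signal xxx_check_list_yyy and `available[yyy]` for yyy_available.

```python
from typing import Dict


def can_check_out(check_list: Dict[str, bool], available: Dict[str, bool]) -> bool:
    """Decide whether one virtual execution unit xxx can be checked out.

    check_list[yyy] models xxx_check_list_yyy (is unit yyy relevant to xxx?).
    available[yyy]  models yyy_available (can unit yyy accept an instruction?).
    A unit missing from `available` is treated as not available (assumption).
    """
    return all(
        available.get(yyy, False)
        for yyy, valid in check_list.items()
        if valid  # only valid check-list entries constrain the result
    )
```

In this sketch a virtual unit whose check list has no valid entry is checked out unconditionally; whether the device behaves this way is not stated in the claim.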
8. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 1, characterized in that the inputs of the ROB distribution device come from the execution unit position analysis device, the execution unit dependence analysis device, and the execution unit state judgment device.
9. The instruction distribution device for a superscalar processor based on decoupling-check-out operations according to claim 8, characterized in that, when carrying out the check-out operation, the ROB distribution device follows the order of the instructions: a later instruction can be checked out only if the instructions before it have been checked out.
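The ordering constraint of claim 9 can be read as a running AND over the instructions in program order. A minimal Python sketch, assuming a hypothetical input list `ready` in which `ready[i]` says whether instruction i could be checked out on its own (for example, because its virtual execution unit was checked out); the function name `rob_check_out` is likewise an assumption.

```python
from typing import List


def rob_check_out(ready: List[bool]) -> List[bool]:
    """Apply in-order check-out: instruction i is checked out only if it is
    ready itself and every earlier instruction was checked out."""
    checked_out: List[bool] = []
    all_previous = True  # no earlier instruction has blocked the group yet
    for r in ready:
        all_previous = all_previous and r
        checked_out.append(all_previous)
    return checked_out
```

With `ready = [True, False, True]`, only the first instruction is checked out: the second is not ready, and the third is blocked because an earlier instruction was not checked out.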
CN201510020399.4A 2015-01-15 2015-01-15 Instruction distribution device for superscalar processor based on decoupling-check-out operations Active CN105278915B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510020399.4A CN105278915B (en) 2015-01-15 2015-01-15 The superscalar processor that operation is checked out based on decoupling instructs distributor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510020399.4A CN105278915B (en) 2015-01-15 2015-01-15 The superscalar processor that operation is checked out based on decoupling instructs distributor

Publications (2)

Publication Number Publication Date
CN105278915A true CN105278915A (en) 2016-01-27
CN105278915B CN105278915B (en) 2018-03-06

Family

ID=55147987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510020399.4A Active CN105278915B (en) 2015-01-15 2015-01-15 Instruction distribution device for superscalar processor based on decoupling-check-out operations

Country Status (1)

Country Link
CN (1) CN105278915B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102362257A (en) * 2009-03-24 2012-02-22 国际商业机器公司 Tracking deallocated load instructions using a dependence matrix
CN102422262A (en) * 2009-05-08 2012-04-18 松下电器产业株式会社 Processor
US20110231636A1 (en) * 2010-03-16 2011-09-22 Olson Christopher H Apparatus and method for implementing instruction support for performing a cyclic redundancy check (crc)
CN102566976A (en) * 2010-12-27 2012-07-11 北京国睿中数科技股份有限公司 Register renaming system and method for managing and renaming registers

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
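任永青 (Ren Yongqing): "Many-core processor architecture with dynamically reconfigurable logical cores", Wanfang Dissertation Database (万方学位论文数据库) *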

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579187A (en) * 2022-04-28 2022-06-03 飞腾信息技术有限公司 Instruction distribution method and device, electronic equipment and readable storage medium
CN114579187B (en) * 2022-04-28 2022-08-19 飞腾信息技术有限公司 Instruction distribution method and device, electronic equipment and readable storage medium

Also Published As

Publication number Publication date
CN105278915B (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN102129390B (en) Task scheduling system of on-chip multi-core computing platform and method for task parallelization
US6728866B1 (en) Partitioned issue queue and allocation strategy
US8990827B2 (en) Optimizing data warehousing applications for GPUs using dynamic stream scheduling and dispatch of fused and split kernels
KR100636759B1 (en) Vector processing apparatus with overtaking function
US9495206B2 (en) Scheduling and execution of tasks based on resource availability
KR100616722B1 (en) Pipelined instruction dispatch unit in a superscalar processor
US9760352B2 (en) Program optimization method, program optimization program, and program optimization apparatus
CN103116485B (en) A kind of assembler method for designing based on very long instruction word ASIP
CN104899181A (en) Data processing apparatus and method for processing vector operands
KR20110106717A (en) Reconfigurable array and control method of reconfigurable array
Wang et al. Register renaming and scheduling for dynamic execution of predicated code
KR100628573B1 (en) Apparatus capable of execution of conditional instructions in out of order and method thereof
Karnagel et al. Heterogeneity-aware operator placement in column-store DBMS
US8516223B2 (en) Dispatching instruction from reservation station to vacant instruction queue of alternate arithmetic unit
JP3721780B2 (en) Data processing apparatus having a plurality of pipeline processing mechanisms
EP2159686A1 (en) Information processor
CN105278915A (en) Instruction distribution device for superscalar processor based on decoupling-check-out operations
US11150906B2 (en) Processor with a full instruction set decoder and a partial instruction set decoder
CN114116015B (en) Method and system for managing hardware command queue
Douma et al. Fast and precise cache performance estimation for out-of-order execution
JPH08241213A (en) Decentralized control system in microprocessor
Benhamamouch et al. Computing WCET using symbolic execution
US8683181B2 (en) Processor and method for distributing load among plural pipeline units
Kotthaus et al. Performance analysis for parallel R programs: towards efficient resource utilization
CN112612585B (en) Thread scheduling method, configuration method, microprocessor, device and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant