CN111538534A - Multi-instruction out-of-order emission method based on instruction fading and processor - Google Patents

Publication number
CN111538534A
Authority
CN
China
Prior art keywords: instruction, circuit, age, transmitted, withering
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010264562.2A
Other languages
Chinese (zh)
Other versions
CN111538534B (en)
Inventor
虞致国
马晓杰
魏敬和
顾晓峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangnan University
Original Assignee
Jiangnan University
Application filed by Jiangnan University
Priority to CN202010264562.2A
Priority to PCT/CN2020/098961 (WO2021203560A1)
Publication of CN111538534A
Application granted
Publication of CN111538534B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a multi-instruction out-of-order issue method based on instruction withering, and a processor, and belongs to the field of processor design. The invention abandons the cumbersome arbitration structure of the traditional issue architecture and adds an instruction withering circuit. An instruction age array represents how long each instruction has been held in the CPU, and one wake-up state bit is added to each instruction age. Instructions whose age exceeds the withering threshold are stored in a sedimentation tank so that the CPU can issue them directly. The circuit structures of the instruction request circuit, the instruction distribution circuit, the wake-up circuit and others are also improved, which effectively improves the timing of the critical path for multi-instruction issue in the processor. When instructions are woken up, instructions with a short execution period are woken up with a delay and instructions with a long execution period are woken up in advance, so that instructions can be executed back to back. This meets the requirements of a high performance-to-power ratio, low delay and high IPC in modern superscalar out-of-order processors, and solves the prior-art problem that delay keeps growing as the number of issue queue entries keeps growing.

Description

Multi-instruction out-of-order emission method based on instruction fading and processor
Technical Field
The invention relates to a multi-instruction out-of-order transmitting method based on instruction fading and a processor, and belongs to the field of processor design.
Background
Single-core CPU performance has improved particularly slowly in the more than ten years since Dennard scaling ended. In this context, it is necessary to re-examine the core microarchitecture in order to obtain high single-core performance.
Among the many structures in a CPU, the instruction issue architecture is one of the most important for achieving high performance. It schedules instructions for execution by selecting and issuing, each cycle, instructions from those waiting in the instruction issue queue. To achieve high performance, the instruction issue architecture must deliver high IPC (instructions per cycle) with low delay. Low delay is an important design consideration because the instruction issue architecture lies on a timing-critical path in the processor, so its delay strongly affects the operating clock frequency of the CPU.
The traditional multi-instruction out-of-order issue architecture selects the instructions to issue through an arbitration circuit. Its advantage is that the oldest instructions can be selected accurately for issue, which preserves the efficiency of the processor pipeline; however, the delay of the arbitration circuit increases as the number of issue queue entries increases.
Modern processors, in pursuit of high IPC, often provide many entries in the issue queue. This makes the delay of the arbitration circuit significant, so that the instruction issue circuit becomes a critical path in the processor and a bottleneck for its clock frequency.
In view of these requirements and challenges, a multi-instruction out-of-order issue architecture based on instruction withering that offers low delay and high IPC is urgently needed.
The multi-instruction out-of-order issue architecture designed by the invention can effectively judge the age of instructions with minimal impact on the efficiency of the processor pipeline. The delay of its timing path does not increase with the number of entries in the issue queue, which keeps the delay as small as possible in a processor with many entries and supports a higher processor clock frequency.
Disclosure of Invention
The invention provides a multi-instruction out-of-order issue method based on instruction withering, and a processor, to solve the problem that, in the existing approach of selecting issuable instructions through an arbitration circuit, the delay of the arbitration circuit increases with the number of issue queue entries.
In the multi-instruction out-of-order issue method, an instruction withering circuit is added to the instruction out-of-order issue architecture of a processor; the circuit stores newly allocated instructions into the issue queue and performs the withering operation on the instructions in the issue queue. The method comprises the following steps:
setting the highest bit of the instruction age corresponding to each instruction in the instruction withering circuit as a wake-up state bit, the remaining bits of the instruction age representing the intrinsic age of the instruction; the wake-up state bit indicates whether the corresponding instruction has been woken up, so that in the issue queue the age of a woken-up instruction is larger than that of an instruction that has not been woken up;
setting a withering threshold, the instruction age array triggering a withering signal when the instruction age of an instruction exceeds the withering threshold, so that the instruction is withered; withered instructions can be selected arbitrarily for issue without arbitration, thereby realizing out-of-order issue of multiple instructions;
and determining the issue order of the instructions in the issue queue according to instruction age and wake-up state.
Optionally, in the method, when instructions are woken up, instructions with a short execution period are woken up with a delay and instructions with a long execution period are woken up in advance, to ensure that instructions can be executed back to back.
Optionally, in the method, for instructions that must execute in sequence, after the preceding instruction is issued the processor waits for it to finish executing and then wakes up the following instruction.
Optionally, the instruction out-of-order issue architecture further includes an instruction distribution circuit, an adder-like instruction request circuit, and a dynamic delay wake-up circuit;
the instruction distribution circuit is used for distributing a plurality of instructions sent by the physical register to idle entries in the issue queue;
the adder-like instruction request circuit is used for counting the total number of entry idle signals in the issue queue and encoding that number with a special code; if the encoded total number of idle signals is less than the likewise encoded instruction issue width, it sends an instruction request signal to the physical register file;
the dynamic delay wake-up circuit is used for sending a wake-up signal when the source register number of an instruction waiting to be issued equals the destination register number of an issued instruction; in addition, the wake-up circuit identifies the execution period of the instruction waiting to be issued through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, to ensure that instructions can be executed back to back.
Optionally, the instruction withering circuit includes an instruction age array, an issue queue, a withering threshold adjuster, a sedimentation tank, and a global age feature extraction circuit;
the instruction age array is used for indicating the instruction age of each instruction in the issue queue and whether the instruction has been woken up;
the issue queue is used for storing the instructions sent from the physical register; the issue queue is designed as a non-compacting structure, that is, when the instruction in an entry has been issued and the entry becomes idle, the other entries are not shifted; each entry temporarily stores the physical register number of the current instruction and also records the wake-up state of the current instruction and whether the entry is idle;
the withering threshold adjuster is used for dynamically adjusting and outputting the withering threshold according to the number of idle entries in the sedimentation tank and the age values of the instructions still held in the issue queue;
the sedimentation tank is used for storing withered instructions, that is, instructions that have met the withering condition;
the global age feature extraction circuit is used for computing global age features.
Optionally, the input of the withering threshold adjuster is the age of each instruction in the instruction age array, and the output is a withering threshold x, that is:
Figure BDA0002440761560000031
where σ is the variance of the instruction age, μ is the expectation of the instruction age, α is the adjustment coefficient, α satisfies
Figure BDA0002440761560000032
Optionally, the adder-like instruction request circuit comprises an adder-like layer followed by log2(n/2) layers of shift logic, where n represents the number of entries in the issue queue.
Optionally, the dynamic delay wake-up circuit is composed of a comparator, an instruction execution discrimination circuit, and a register; the inputs of the wake-up circuit are the source register number of an instruction waiting to be issued and the destination register number of an issued instruction; the comparator compares the two numbers and, if they are equal, a wake-up signal is generated; meanwhile, the wake-up circuit identifies the execution period of the instruction waiting to be issued through the instruction execution discrimination circuit and outputs its cycle count, and the register holds the pending wake-up signal according to that cycle count, thereby adjusting the order of the wake-up signals.
The application also provides a processor whose instruction out-of-order issue architecture comprises an instruction distribution circuit, an instruction withering circuit, an adder-like instruction request circuit, and a dynamic delay wake-up circuit;
the instruction distribution circuit is used for distributing a plurality of instructions sent by the physical register to idle entries in the issue queue;
the instruction withering circuit is used for storing newly allocated instructions into the issue queue and performing the withering operation on the instructions in the issue queue according to the instruction age of each instruction; withered instructions can be selected arbitrarily for issue without arbitration;
the adder-like instruction request circuit is used for counting the total number of entry idle signals in the issue queue and encoding that number with a special code; if the encoded total number of idle signals is less than the likewise encoded instruction issue width, it sends an instruction request signal to the physical register file;
the dynamic delay wake-up circuit is used for sending a wake-up signal when the source register number of an instruction waiting to be issued equals the destination register number of an issued instruction; in addition, the wake-up circuit identifies the execution period of the instruction waiting to be issued through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, to ensure that instructions can be executed back to back.
Optionally, the highest bit of the instruction age of each instruction is set as the wake-up state bit of the instruction, and the remaining bits of the instruction age represent the intrinsic age of the instruction; the wake-up state bit indicates whether the corresponding instruction has been woken up, so that in the issue queue the age of a woken-up instruction is larger than that of an instruction that has not been woken up.
The invention has the beneficial effects that:
The cumbersome arbitration structure of the traditional issue architecture is abandoned and an instruction withering circuit is added. An instruction age array represents how long each instruction has been held in the CPU, and a wake-up state bit is added to each instruction age. Instructions whose age exceeds the withering threshold are stored in a sedimentation tank so that the CPU can issue them directly. The circuit structures of the instruction request circuit, the instruction distribution circuit, the wake-up circuit and others are improved, which effectively improves the timing of the critical path for multi-instruction issue in the processor. When instructions are woken up, instructions with a short execution period are woken up with a delay and instructions with a long execution period are woken up in advance, so that instructions can be executed back to back. This meets the requirements of a high performance-to-power ratio, low delay and high IPC in modern superscalar out-of-order processors, and solves the prior-art problem that delay keeps growing as the number of issue queue entries keeps growing.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a block diagram of the overall structure of the multi-instruction out-of-order issue architecture based on instruction withering according to the present invention.
Fig. 2 is a schematic diagram of the instruction withering circuit according to the present invention.
FIG. 3 is a block diagram of an instruction distribution circuit according to the present invention.
FIG. 4 is a block diagram of an adder-like instruction request circuit according to the present invention.
FIG. 5 is a schematic diagram of the dynamic delay wake-up circuit according to the present invention.
FIG. 6 is a schematic diagram of a pipeline for adjusting a wake-up sequence via a wake-up circuit.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment is as follows:
This embodiment provides a processor. FIG. 1 shows the overall structure of the multi-instruction out-of-order issue architecture of the processor, which includes an instruction distribution circuit, an instruction withering circuit, an adder-like instruction request circuit, and a dynamic delay wake-up circuit.
The instruction distribution circuit distributes register-renamed instructions to entries in the instruction issue queue. The instruction issue queue contains a number of entries, each holding one instruction waiting to be issued; if the queue has an idle entry, it accepts an instruction sent through the distribution circuit.
Every instruction that has just entered an entry is in the not-woken-up state. If the source register number of an instruction equals the destination register tag of an issued instruction, the instruction is woken up by the wake-up circuit. The instructions in all entries are subject to withering by the instruction withering circuit, and every withered instruction is eventually issued, which realizes out-of-order issue of multiple instructions in a superscalar out-of-order processor.
The schematic diagram of the instruction withering circuit is shown in fig. 2, and the instruction withering circuit includes an instruction age array, an emission queue, a withering threshold adjuster, a sedimentation tank, and a global age feature extraction circuit.
A newly allocated instruction from the instruction distribution circuit enters the instruction withering circuit and is stored in an idle issue queue entry; at the same time, the corresponding instruction age in the instruction age array is initialized to a random value between 0 and 1.
Whenever an instruction is issued, an age increment signal is sent to the age array, and the age of every instruction in the issue queue that has not yet been issued is increased by 1.
The withering threshold adjuster adjusts and outputs the withering threshold according to the idle entry information of the sedimentation tank and the global age feature. If an instruction age in the instruction age array is larger than the withering threshold, the instruction age array outputs a withering signal. The instruction that receives the withering signal undergoes the withering operation: it moves from the issue queue into the sedimentation tank, and its entry in the issue queue becomes idle and waits for a newly allocated instruction.
Withered instructions in the sedimentation tank can be issued without arbitration.
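As a non-limiting illustration, the withering flow described above can be modeled in software. The Python sketch below is a behavioral model under stated assumptions, not the circuit itself: the names `WitheringQueue`, `allocate` and `on_issue` are hypothetical, the threshold is held fixed instead of being produced by the withering threshold adjuster, new ages start at 0, and an instruction withers only while the sedimentation tank has room.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    instr: str
    age: int = 0  # assumption: new instructions start at age 0


class WitheringQueue:
    """Behavioral model of a non-compacting queue with a sedimentation tank."""

    def __init__(self, n_entries: int, tank_size: int, threshold: int) -> None:
        self.queue: List[Optional[Entry]] = [None] * n_entries
        self.tank: List[str] = []
        self.tank_size = tank_size
        self.threshold = threshold  # assumption: fixed, not dynamically adjusted

    def allocate(self, instr: str) -> bool:
        """Store a new instruction in the first idle entry, if any."""
        for i, entry in enumerate(self.queue):
            if entry is None:
                self.queue[i] = Entry(instr)
                return True
        return False

    def on_issue(self) -> None:
        """An instruction was issued: age every waiting one by 1, then wither."""
        for entry in self.queue:
            if entry is not None:
                entry.age += 1
        self._wither()

    def _wither(self) -> None:
        """Move instructions older than the threshold into the tank."""
        for i, entry in enumerate(self.queue):
            if (entry is not None and entry.age > self.threshold
                    and len(self.tank) < self.tank_size):
                self.tank.append(entry.instr)
                self.queue[i] = None  # entry becomes idle; others do not shift


q = WitheringQueue(n_entries=4, tank_size=2, threshold=2)
q.allocate("add")
q.on_issue()
q.allocate("mul")
q.on_issue()
q.on_issue()  # "add" reaches age 3 and withers; "mul" stays at age 2
```

In this model anything held in `tank` can be picked in any order, which mirrors the statement that withered instructions need no arbitration.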
The inputs of the withering threshold adjuster are the idle entry information of the sedimentation tank and the global age feature, the latter being output by the global age feature extraction circuit. The adjuster adjusts and outputs the withering threshold according to the number of idle entries in the sedimentation tank and all current instruction age values.
The input of the withering threshold adjuster is the age of each instruction in the instruction age array, and the output is a withering threshold, wherein the withering threshold x is as follows:
Figure BDA0002440761560000051
wherein α satisfies
Figure BDA0002440761560000052
σ is the variance of the instruction age and μ is the expectation of the instruction age. The derivation process of the characteristic value is as follows:
A modern processor can process hundreds of millions of instructions per second, and the initial value of each age is a random value between 0 and 1. With such a large sample, the instruction age can be treated as continuous and, by the law of large numbers, as following a normal distribution:
Figure BDA0002440761560000053
where σ is the variance of the instruction age and μ is the expectation of the instruction age.
Construct the function g(x):
Figure BDA0002440761560000061
Rearranging formula (2) gives
Figure BDA0002440761560000062
Taking the first derivative of (3) gives
Figure BDA0002440761560000063
Taking the second derivative of (3) gives
Figure BDA0002440761560000064
Setting formula (4) equal to 0 gives
Figure BDA0002440761560000065
Substituting formula (6) into (5) gives
Figure BDA0002440761560000066
which yields
Figure BDA0002440761560000067
To make the age of the instructions withered by the threshold x as large as possible, and the impact on pipeline efficiency as small as possible, take
Figure BDA0002440761560000068
Taking the lowest threshold value then gives the constraint on the adjustment coefficient α:
Figure BDA0002440761560000069
In summary,
Figure BDA00024407615600000610
and α satisfies
Figure BDA00024407615600000611
The instruction age array is essentially an array of counters. Each counter has a total of
Figure BDA00024407615600000612
bits and represents the instruction age of the corresponding instruction, of which the low
Figure BDA00024407615600000613
bits are the age counting bits and the highest bit is the wake-up state bit.
When a newly allocated instruction enters the issue queue of the instruction withering circuit, the corresponding instruction age is set to zero;
each time an instruction is issued, the instruction age of every instruction not yet issued is increased by 1;
when an instruction in the issue queue is woken up, the wake-up state bit of its instruction age is set to 1; if the instruction age of an instruction is larger than the withering threshold, a withering signal is output to the issue queue. Here n represents the number of entries in the issue queue and s represents the instruction issue width.
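The age word described above, with counting bits in the low positions and the wake-up state bit on top, can be sketched in Python as follows. This is only an illustration: the field width `AGE_BITS` is a hypothetical value chosen here (the actual width is given by the formula in the figures), the helper names are invented for the example, and saturating the counter at its maximum is an assumption the text does not state.

```python
AGE_BITS = 4  # hypothetical width of the age-counting field

WAKE_MASK = 1 << AGE_BITS   # highest bit: wake-up state bit
AGE_MASK = WAKE_MASK - 1    # low bits: age counting bits


def new_age() -> int:
    """Age word of a newly allocated instruction: age 0, not woken up."""
    return 0


def tick(word: int) -> int:
    """Add 1 to the counting field (saturating, by assumption); keep the wake bit."""
    age = min((word & AGE_MASK) + 1, AGE_MASK)
    return (word & WAKE_MASK) | age


def wake(word: int) -> int:
    """Set the wake-up state bit to 1."""
    return word | WAKE_MASK


def should_wither(word: int, threshold: int) -> bool:
    """Withering signal: the whole age word exceeds the threshold."""
    return word > threshold


young_but_awake = wake(tick(new_age()))  # counting field 1, wake bit set
old_but_asleep = AGE_MASK                # counting field at maximum, wake bit clear
```

Because the wake-up bit is the most significant bit, a plain integer comparison already ranks every woken-up instruction above every instruction that has not been woken up, whatever their counting fields hold.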
The issue queue comprises n entries, and each entry stores an instruction waiting to be issued and an entry idle bit.
The sedimentation tank is an instruction queue with far fewer entries than the instruction issue queue. It holds the withered instructions, that is, those that have met the withering condition, and the withered instructions in the sedimentation tank can be issued directly without arbitration.
FIG. 3 is a schematic diagram of the instruction distribution circuit. The instruction distribution circuit is used for distributing a plurality of instructions sent by the physical register to idle entries in the issue queue.
The instruction distribution circuit comprises s entry number selection circuits. The inputs of each entry number selection circuit are the idle signal sequence of n/s entries of the issue queue in the instruction withering circuit and the corresponding issue queue entry numbers. The entry number selection circuit selects an issue queue entry number according to whether each input idle signal is valid: if several idle signals are valid, it selects the entry number of the first valid idle signal; if no idle signal is valid, it outputs the maximum value representable in the data bits, which indicates that no entry was selected. The entry number output by the entry number selection circuit is compared with this upper limit; if it equals the upper limit, the valid signal is set to 1, and otherwise the valid signal is set to 0. Each instruction to be distributed that is input to the distribution circuit is written into the corresponding entry according to the entry number and the valid signal. Here s represents the instruction issue width and n represents the number of entries in the issue queue.
The entry number selection circuit is composed of an array of selectors, as shown in FIG. 3. The first layer of selectors takes the entry numbers as inputs; because the first idle entry must be selected, each selector uses the idle signal of the smaller entry number as its select signal. The inputs of the second layer are the entry numbers output by the first layer, again selected by the idle signal associated with the smaller entry number, and so on, for a total of log2(n) selection layers. The result of the log2(n) selection layers is passed to a full/empty entry selector, whose select signal is the select signal of the last selection layer and whose data inputs are the result of the selection layers and the numerical upper limit. If the select signal is 0, the upper limit is output as the final entry number; otherwise the result of the selection layers is output as the final entry number, where n represents the number of entries.
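The layered selection just described can be illustrated by a small Python model. It is a sketch under assumptions: the function name `select_free_entry` is hypothetical, the input length is taken to be a power of two, and the "numerical upper limit" is modeled as an all-ones value one bit wider than the largest entry number so that it cannot collide with a real entry.

```python
from typing import List, Tuple


def select_free_entry(free: List[bool]) -> int:
    """Return the lowest-numbered idle entry, or an all-ones sentinel if none."""
    n = len(free)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two size"
    sentinel = (1 << n.bit_length()) - 1  # assumption: one spare bit of width

    # Each node carries (entry number, idle flag).
    layer: List[Tuple[int, bool]] = list(enumerate(free))
    while len(layer) > 1:  # log2(n) selection layers
        nxt: List[Tuple[int, bool]] = []
        for k in range(0, len(layer), 2):
            (lo_num, lo_free), (hi_num, hi_free) = layer[k], layer[k + 1]
            # The idle signal of the smaller entry number is the select signal.
            nxt.append((lo_num, True) if lo_free else (hi_num, hi_free))
        layer = nxt

    num, any_free = layer[0]
    # Final full/empty selector: no idle entry means output the upper limit.
    return num if any_free else sentinel


first_idle = select_free_entry([False, False, True, True])
none_idle = select_free_entry([False, False, False, False])
```

Each pass of the `while` loop corresponds to one selector layer, so the depth grows with log2 of the number of inputs rather than linearly.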
FIG. 4 is a schematic diagram of the instruction request circuit. The instruction request circuit counts the total number of entry idle signals, encodes that number with a special code, and sends an instruction request signal to the physical register file if the encoded total is less than the likewise encoded instruction issue width. The instruction request circuit is composed of two parts: the adder-like layer and the following log2(n/2) layers of shift logic.
The adder-like layer is composed of adder-like units. To count the total number of entry idle signals, the idle signal sequence of the entries is input to the adder-like layer, which counts the idle signals pairwise, applies the special code, and outputs the specially coded counts. The output of the adder-like layer is sent to the following log2(n/2) layers of shift logic, which output the final count; this result is compared with the likewise specially coded instruction issue width to determine whether an instruction request signal needs to be sent.
Specifically, when the total number of entry idle signals is counted, the idle signal sequence of the entries is input to the adder-like layer. Each adder-like unit takes two binary digits of the idle signal sequence as inputs, performs an AND operation and an XOR operation on them, and then compares the two results:
if the results differ and the XOR result is 1, the code representing 1 is output: the sum of the two binary inputs of the adder-like unit is 1, encoded as "01";
if the results are equal, in which case both are 0, the code representing 0 is output: the sum of the two binary inputs of the adder-like unit is 0, encoded as "10";
if the results differ and the AND result is 1, the code representing 2 is output: the sum of the two binary inputs of the adder-like unit is 2, encoded as "00";
the number of encoding bits is n.
The following log2(n/2) shift logic layers consist of right shifters. The output of the addition-like layer is input to these layers, and the result is compared with the instruction transmission width, encoded in the same way, to determine whether an instruction request signal must be sent, as follows:
each right shifter takes the output of one addition-like unit as the data to be shifted and the output of another addition-like unit as the shift amount, and shifts the data right by m bits, where m is the decimal number that the shift amount encodes.
For example, if the data to be shifted is "01" and the shift amount is "00", which encodes 2 under the above coding rule, then "01" is shifted right by 2 bits.
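The special code and the shift layers can be illustrated with a small model. It rests on one reading of the text that the source does not spell out: a count k is represented by a single 1 that starts at the top bit and moves right by k positions ("10" for 0, "01" for 1, "00" for 2 in the two-bit case), widened here to n bits. The helper names are hypothetical, and the addition-like unit is modelled by the sum of its two inputs rather than by the AND/XOR comparison.

```python
def encode(count, bits):
    """Special code: a single 1 at the top bit, shifted right by `count`."""
    return (1 << (bits - 1)) >> count


def decode(code, bits):
    """Inverse of encode: how many right shifts the top bit has undergone."""
    return bits if code == 0 else bits - code.bit_length()


def addition_like_unit(a, b, bits):
    """Encode the number of idle signals (0, 1 or 2) among two inputs."""
    return encode(a + b, bits)


def count_idle_code(idle):
    """Return the specially coded count of idle signals.

    One addition-like layer is followed by log2(n/2) layers of right shifters.
    """
    n = len(idle)
    assert n >= 2 and n & (n - 1) == 0, "sketch assumes n is a power of two"
    codes = [addition_like_unit(idle[i], idle[i + 1], n) for i in range(0, n, 2)]
    while len(codes) > 1:
        # One code is the data, its neighbour supplies the shift amount.
        codes = [codes[i] >> decode(codes[i + 1], n) for i in range(0, len(codes), 2)]
    return codes[0]


def request_needed(idle, transmission_width):
    """Request when the coded idle count is less than the coded width."""
    return count_idle_code(idle) < encode(transmission_width, len(idle))
```

Under this reading, a smaller code corresponds to a larger count, so "encoded count less than encoded width" is the same condition as "more idle entries than the transmission width". With idle signals `[1, 0, 1, 1, 0, 0, 1, 0]` the coded count is `0b00001000`, which decodes to 4 idle entries.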
Fig. 5 is a schematic diagram of the wake-up circuit. The wake-up circuit consists of a comparator, an instruction execution discrimination circuit, and a register.
The inputs of the wake-up circuit are the source register number of an instruction to be transmitted and the destination register number of a transmitted instruction. The comparator compares the two numbers, and a wake-up signal is generated if they are equal. At the same time, the instruction execution discrimination circuit identifies the execution period of the instruction to be transmitted and outputs its cycle count, and the register holds the pending wake-up signal according to that cycle count. This adjusts the order of the wake-up signals: wake-up is delayed for instructions with a short execution period and advanced for instructions with a long execution period, so that the instructions in the pipeline can be executed back to back and pipeline efficiency is improved.
FIG. 6 is a schematic diagram of a pipeline adjusted by instruction wake-up. Instruction A needs 3 execution cycles, and instructions B, C and D each need one execution cycle. Through the wake-up order adjustment of the wake-up circuit, instruction D is woken up two cycles later than instruction A, so that the two back-to-back instructions B and C can be inserted between instructions A and D. All 4 instructions are thus executed back to back without delay bubbles, which improves the execution efficiency of the pipeline.
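The comparator and the delayed wake-up can be sketched as follows. The delay rule used here (hold the wake-up signal for the producer's cycle count minus one) is an assumption chosen so that the sketch reproduces the two-cycle delay of the FIG. 6 example; the source does not state the rule in this form. The function name, dictionary fields, and register names are hypothetical.

```python
def wakeups(producer, waiting):
    """Return {instruction name: cycle at which its wake-up signal is released}.

    producer: dict with 'dest' (destination register number), 'cycles'
              (execution cycles) and 'issue_cycle' of the transmitted instruction.
    waiting:  list of dicts with 'name' and 'srcs' (source register numbers).
    """
    released = {}
    for instr in waiting:
        # Comparator: a source register equals the transmitted destination register.
        if producer["dest"] in instr["srcs"]:
            # Register stage: hold the wake-up signal (assumed rule: cycles - 1).
            released[instr["name"]] = producer["issue_cycle"] + producer["cycles"] - 1
    return released


# FIG. 6 style example: A takes 3 cycles and D consumes A's result.
A = {"dest": "p5", "cycles": 3, "issue_cycle": 0}
queue = [
    {"name": "B", "srcs": ["p1", "p2"]},
    {"name": "C", "srcs": ["p3", "p4"]},
    {"name": "D", "srcs": ["p5", "p6"]},
]
schedule = wakeups(A, queue)
```

In this example only D matches A's destination register, and its wake-up is released two cycles after A is transmitted, leaving room for the independent instructions B and C.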
Example two
This embodiment provides a multi-instruction out-of-order transmitting method based on instruction withering, used in the processor described in embodiment one. The transmitting architecture in the processor is a non-data-capture architecture, that is, the CPU reads the physical register file only after the transmit stage, and each table entry in the transmission queue stores physical register numbers. The method comprises the following steps:
S1, when the physical register file receives an instruction request signal from the instruction request circuit, it outputs the appropriate instructions to the instruction distribution circuit.
S2, the instruction distribution circuit allocates the instructions output by the physical register file to table entries in the transmission queue:
the instruction distribution circuit comprises s table entry number selection circuits. The input of each table entry number selection circuit is the idle signal sequence of n/s table entries of the transmission queue in the instruction withering circuit, together with the corresponding transmission queue table entry numbers. Each selection circuit selects a transmission queue table entry number according to whether the input idle signals are valid: if several idle signals are valid, the number of the first table entry whose idle signal is valid is selected; if no idle signal is valid, the output is the maximum value representable in the data bits (the numerical upper limit), indicating that no table entry was selected.
The table entry number output by each table entry number selection circuit is compared with the numerical upper limit: if they are equal, the valid signal is set to 1; otherwise, the valid signal is set to 0. Each instruction to be distributed that is input to the distribution circuit is written into the corresponding table entry according to the table entry number and the valid signal.
Here s is the instruction transmission width and n is the number of table entries in the transmission queue.
Instructions newly distributed by the instruction distribution circuit enter the instruction withering circuit and are stored in idle transmission queue entries; at the same time, the corresponding instruction ages in the instruction age array are initialized to a random value between 0 and 1.
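Step S2 can be sketched as s independent selectors, each watching its own group of n/s table entries. This is a simplified model under stated assumptions: the function name `distribute`, the sentinel value `2 * n - 1`, and the use of a dictionary for the written entries are illustrative choices, and the valid signal itself is not modelled (the sketch simply writes an instruction only when a real table entry was selected).

```python
def distribute(idle, instructions, upper_limit=None):
    """Assign up to s instructions to idle entries, one selector per group.

    idle:         list of n idle signals (1 = idle) for the transmission queue.
    instructions: list of s instructions to be distributed (n divisible by s).
    Returns (selected entry numbers, {entry number: instruction written}).
    """
    n, s = len(idle), len(instructions)
    assert s > 0 and n % s == 0, "sketch assumes n is a multiple of s"
    if upper_limit is None:
        upper_limit = 2 * n - 1  # hypothetical "numerical upper limit"
    group = n // s

    selected, written = [], {}
    for j, instr in enumerate(instructions):
        lo, hi = j * group, (j + 1) * group
        # First entry in this group whose idle signal is valid, else the sentinel.
        number = next((i for i in range(lo, hi) if idle[i]), upper_limit)
        selected.append(number)
        if number != upper_limit:
            written[number] = instr
    return selected, written
```

With eight entries, idle signals `[0, 1, 1, 0, 0, 0, 0, 0]` and two instructions, the first selector picks entry 1 and the second reports the sentinel because its group has no idle entry.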
S3, when the transmission queue in the instruction withering circuit receives a new instruction, the instruction age in the instruction age array corresponding to the table entry holding that instruction is set to zero. Each time the instruction withering circuit transmits an instruction, the instruction ages of the instructions still in the transmission queue are incremented by one. The highest bit of an instruction's age is its wake-up state bit, and the remaining bits represent its intrinsic age. After an instruction in the transmission queue is woken up, the highest bit of its age is set to 1, which ensures that the age of a woken instruction is greater than that of an instruction that has not been woken.
When an instruction's age exceeds the withering threshold, the instruction age array triggers a withering signal and the instruction withers: it moves from the transmission queue into the sedimentation tank, and its table entry in the transmission queue is marked idle.
The sedimentation tank is an instruction queue with far fewer entries than the transmission queue. It holds the withered instructions, and any withered instruction in the sedimentation tank can be selected at random for transmission.
The transmission queue in the withering circuit has a non-compressing structure: when the instruction in a table entry has been transmitted and the entry becomes idle, the other table entries are not shifted. Each table entry temporarily stores the physical register numbers of its instruction and also records the wake-up state of the instruction and whether the table entry is idle.
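The age bookkeeping of step S3 can be modelled in a few lines. The sketch makes several assumptions that are not in the source: a 3-bit intrinsic age that saturates instead of overflowing into the wake-up state bit, a fixed withering threshold passed in by the caller (the dynamic threshold adjuster is not modelled), comparison of the full age value including the wake-up bit against the threshold, and a withered instruction staying in the queue while the sedimentation tank is full. The class and method names are hypothetical.

```python
AGE_BITS = 3                   # assumed width of the intrinsic age
WOKEN = 1 << AGE_BITS          # highest bit of the age = wake-up state bit
MAX_INTRINSIC = WOKEN - 1


class WitherQueue:
    def __init__(self, n, threshold, tank_size):
        self.entries = [None] * n   # None marks an idle entry; entries never shift
        self.age = [0] * n
        self.threshold = threshold
        self.tank = []              # sedimentation tank for withered instructions
        self.tank_size = tank_size

    def insert(self, index, instr):
        """Store a newly distributed instruction and set its age to zero."""
        self.entries[index] = instr
        self.age[index] = 0

    def wake(self, index):
        """Set the wake-up state bit so the instruction outranks unwoken ones."""
        self.age[index] |= WOKEN

    def on_transmit(self):
        """One instruction was transmitted: age the remaining ones, then wither."""
        for i, instr in enumerate(self.entries):
            if instr is not None and (self.age[i] & MAX_INTRINSIC) < MAX_INTRINSIC:
                self.age[i] += 1
        for i, instr in enumerate(self.entries):
            over = instr is not None and self.age[i] > self.threshold
            if over and len(self.tank) < self.tank_size:
                self.tank.append(instr)   # instruction withers into the tank
                self.entries[i] = None    # its table entry becomes idle
                self.age[i] = 0
```

As a usage example, with a threshold of `WOKEN + 1`, a woken instruction withers after two further transmissions, while an unwoken instruction never crosses the threshold because its wake-up state bit is still 0.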
S4, the idle signals of the table entries in the transmission queue are sent to the instruction request circuit, which counts the idle table entries in the transmission queue. If the number of idle table entries is greater than the instruction transmission width, the request circuit sends an instruction request signal to the physical register file; on receiving the request signal, the physical register file sends instructions to the instruction distribution circuit.
S5, during instruction transmission, the wake-up circuit compares the destination register number of the currently transmitted instruction with the source register numbers of each instruction in the transmission queue. If they are equal, the wake-up circuit generates a wake-up signal and decides, according to the execution period of the instruction, whether the wake-up signal must be delayed: instructions with a long execution period are woken up early, and wake-up is delayed for instructions with a short execution period. The wake-up signal sets the wake-up state bit in the corresponding instruction age to 1, which ensures that the age of a woken instruction is greater than that of an instruction that has not been woken.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A multi-instruction out-of-order transmitting method, characterized in that an instruction withering circuit is added to the out-of-order instruction transmitting architecture of a processor, the instruction withering circuit being used to store newly distributed instructions in a transmission queue and to perform withering operations on the instructions in the transmission queue; the method comprises the following steps:
setting the highest bit of the instruction age of each instruction in the instruction withering circuit as the wake-up state bit of the instruction, the remaining bits of the instruction age representing the intrinsic age of the instruction; the wake-up state bit indicates whether the corresponding instruction has been woken up, and the age of a woken instruction in the transmission queue is greater than that of an instruction that has not been woken;
setting a withering threshold, wherein when the instruction age of an instruction exceeds the withering threshold, the instruction age array triggers a withering signal so that the instruction withers; withered instructions can be selected at random for transmission without arbitration, thereby realizing out-of-order transmission of multiple instructions;
and determining the transmission order of the instructions in the transmission queue according to instruction age and wake-up state.
2. The method of claim 1, wherein, when waking up instructions, the method delays wake-up for instructions with a short execution period and wakes up instructions with a long execution period early, to ensure that the instructions can be executed back to back.
3. The method of claim 2, wherein the method wakes up the subsequent instruction after the previous instruction is transmitted in the sequential order, when the processor waits for the previous instruction to finish executing.
4. The method of claim 3, wherein the out-of-order instruction transmitting architecture further comprises an instruction distribution circuit, an adder-like instruction request circuit, and a dynamic delay wake-up circuit;
the instruction distribution circuit is used to distribute a plurality of instructions sent by the physical register file to idle table entries in the transmission queue;
the adder-like instruction request circuit is used to count the total number of idle signals of the table entries in the transmission queue and to encode that count with a special code, and sends an instruction request signal to the physical register file if the encoded count is less than the encoded instruction transmission width;
the dynamic delay wake-up circuit is used to generate a wake-up signal when the source register number of an instruction to be transmitted is equal to the destination register number of a transmitted instruction; at the same time, the wake-up circuit identifies the execution period of the instruction to be transmitted through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, to ensure that the instructions can be executed back to back.
5. The method of claim 4, wherein the instruction withering circuit comprises an instruction age array, a transmission queue, a withering threshold adjuster, a sedimentation tank, and a global age feature extraction circuit;
the instruction age array is used to indicate the instruction age of each instruction in the transmission queue and whether the instruction has been woken up;
the transmission queue is used to store the instructions sent from the physical register file; the transmission queue has a non-compressing structure, that is, when the instruction in a table entry has been transmitted and the entry becomes idle, the other table entries are not shifted, and each table entry temporarily stores the physical register numbers of its instruction and also records the wake-up state of the instruction and whether the table entry is idle;
the withering threshold adjuster is used to dynamically adjust and output the withering threshold according to the number of idle entries in the sedimentation tank and the ages of the instructions still remaining in the transmission queue;
the sedimentation tank is used to store the withered instructions that meet the withering condition;
the global age feature extraction circuit is used to collect global age features.
6. The method of claim 5, wherein the input of the wither threshold adjuster is the age of each instruction in the instruction age array, and the output is the wither threshold x, which is:
Figure FDA0002440761550000021
where σ is the variance of the instruction age, μ is the expectation of the instruction age, α is the adjustment coefficient, α satisfies
Figure FDA0002440761550000022
7. The method of claim 4, wherein the adder-like instruction request circuit comprises an addition-like layer followed by log2(n/2) shift logic layers, n representing the number of table entries in the transmission queue.
8. The method of claim 4, wherein the dynamic delay wake-up circuit consists of a comparator, an instruction execution discrimination circuit, and a register; the inputs of the wake-up circuit are the source register number of an instruction to be transmitted and the destination register number of a transmitted instruction; the comparator compares the two numbers, and a wake-up signal is generated if they are equal; at the same time, the wake-up circuit identifies the execution period of the instruction to be transmitted through the instruction execution discrimination circuit and outputs its cycle count, and the register holds the pending wake-up signal according to that cycle count, thereby adjusting the order of the wake-up signals.
9. A processor, characterized in that the out-of-order instruction transmitting architecture of the processor comprises an instruction distribution circuit, an instruction withering circuit, an adder-like instruction request circuit, and a dynamic delay wake-up circuit;
the instruction distribution circuit is used to distribute a plurality of instructions sent by the physical register file to idle table entries in the transmission queue;
the instruction withering circuit is used to store newly distributed instructions in the transmission queue and to perform withering operations on the instructions in the transmission queue according to the instruction age of each instruction; withered instructions can be selected at random for transmission without arbitration;
the adder-like instruction request circuit is used to count the total number of idle signals of the table entries in the transmission queue and to encode that count with a special code, and sends an instruction request signal to the physical register file if the encoded count is less than the encoded instruction transmission width;
the dynamic delay wake-up circuit is used to generate a wake-up signal when the source register number of an instruction to be transmitted is equal to the destination register number of a transmitted instruction; at the same time, the wake-up circuit identifies the execution period of the instruction to be transmitted through the instruction execution discrimination circuit and adjusts the order of the wake-up signals according to that execution period, to ensure that the instructions can be executed back to back.
10. The processor of claim 9, wherein the most significant bit of the instruction age of each instruction is set to the instruction's awake state bit, the remaining bits of the instruction age representing the instruction's intrinsic age; the wakeup status bit is used to indicate whether the corresponding instruction is woken up, and the age of the woken up instruction in the transmission queue is larger than that of the non-woken up instruction.
CN202010264562.2A 2020-04-07 2020-04-07 Multi-instruction out-of-order transmitting method and processor based on instruction wither Active CN111538534B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010264562.2A CN111538534B (en) 2020-04-07 2020-04-07 Multi-instruction out-of-order transmitting method and processor based on instruction wither
PCT/CN2020/098961 WO2021203560A1 (en) 2020-04-07 2020-06-29 Instruction withering-based multi-instruction out-of-order transmission method and processor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010264562.2A CN111538534B (en) 2020-04-07 2020-04-07 Multi-instruction out-of-order transmitting method and processor based on instruction wither

Publications (2)

Publication Number Publication Date
CN111538534A true CN111538534A (en) 2020-08-14
CN111538534B CN111538534B (en) 2023-08-08

Family

ID=71978534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010264562.2A Active CN111538534B (en) 2020-04-07 2020-04-07 Multi-instruction out-of-order transmitting method and processor based on instruction wither

Country Status (2)

Country Link
CN (1) CN111538534B (en)
WO (1) WO2021203560A1 (en)

Also Published As

Publication number Publication date
WO2021203560A1 (en) 2021-10-14
CN111538534B (en) 2023-08-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant