CN109032665B - Method and device for processing instruction output in microprocessor - Google Patents

Info

Publication number
CN109032665B
Authority
CN
China
Prior art keywords
instruction
loop
microprocessor
output
register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710433164.7A
Other languages
Chinese (zh)
Other versions
CN109032665A (en
Inventor
张华亮
吴瑞阳
乔崇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Loongson Technology Corp Ltd
Original Assignee
Loongson Technology Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Loongson Technology Corp Ltd filed Critical Loongson Technology Corp Ltd
Priority to CN201710433164.7A priority Critical patent/CN109032665B/en
Publication of CN109032665A publication Critical patent/CN109032665A/en
Application granted granted Critical
Publication of CN109032665B publication Critical patent/CN109032665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3005Arrangements for executing specific machine instructions to perform operations for flow control
    • G06F9/30065Loop control instructions; iterative instructions, e.g. LOOP, REPEAT
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a method and a device for processing instruction output in a microprocessor. The method comprises the following steps: determining loop state information of a first instruction, wherein the loop state information comprises a loop identifier and a loop count, the loop identifier identifies whether the first instruction is a loop instruction, and the loop count identifies the number of loop iterations of the loop body where the first instruction is located when the first instruction is processed; and determining whether to output the first instruction according to the loop state information of the first instruction. The method arranges the output order of instructions based on whether an instruction is a loop instruction and on the loop count corresponding to the instruction, which reflects the waiting time of the instruction in the microprocessor. This prevents an instruction with a long execution time from being indefinitely unable to output, further prevents the resources of the microprocessor from being occupied for too long, and ensures the performance of the microprocessor.

Description

Method and device for processing instruction output in microprocessor
Technical Field
The present invention relates to computer technologies, and in particular, to a method and an apparatus for processing instruction output in a microprocessor.
Background
Most current microprocessors adopt a superscalar multi-pipeline design, which generally comprises pipeline stages such as instruction fetch, predecode, decode, register renaming, scheduling, issue, register read, execute, and write-back and commit. In the execute stage, a plurality of execution components in the microprocessor (a fixed-point execution component, a floating-point execution component, a memory-access execution component, and the like) execute their instructions in parallel and output the execution results to an output bus. Each execution unit within the fixed-point execution component executes one class of instructions. The number of output buses in the microprocessor is smaller than the number of execution units, so a group of output buses must be shared by a plurality of execution units. Therefore, when several execution units need to output results to the output buses at the same time, the output order of their operation results must be controlled by some algorithm.
The prior art provides a short-instruction-first output scheduling algorithm, which uses the execution time of an instruction as its priority weight for output: instructions with shorter execution times output their operation results first.
However, when loop instructions are processed, the prior-art method may cause resources in the microprocessor to be occupied for too long, which degrades the performance of the microprocessor.
Disclosure of Invention
The invention provides a method and a device for processing instruction output in a microprocessor, which are used for solving the prior-art problem that resources in the microprocessor are occupied for too long when loop instructions are processed.
The first aspect of the present invention provides a method for processing instruction output in a microprocessor, including:
determining loop state information of a first instruction, wherein the loop state information comprises a loop identifier and loop times, the loop identifier is used for identifying whether the first instruction is a loop instruction, and the loop times are used for identifying loop iteration times of a loop body where the first instruction is located when the first instruction is processed;
and determining whether to output the first instruction according to the loop state information of the first instruction.
Further, the microprocessor includes a plurality of execution units, each execution unit for executing one type of instruction; the determining whether to output the first instruction according to the loop state information of the first instruction comprises:
if the first instruction is executed completely, the loop identification of the first instruction is true, and the loop number of each second instruction in the N second instructions and the loop number of the first instruction meet the output condition, determining to output the first instruction;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
Further, the determining whether to output the first instruction according to the loop state information of the first instruction includes:
if the loop identifier of the first instruction is false, determining whether to output the first instruction according to the priority of an execution unit executing the first instruction; or,
if the loop count of each second instruction in the N second instructions and the loop count of the first instruction do not meet the output condition, determining whether to output the first instruction according to the priority of an execution unit executing the first instruction;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
Further, the loop count of the first instruction and the priority of the execution unit executing the first instruction are represented by a data unit consisting of a preset number of bytes, wherein a preset number of bits starting from the highest bit of the data unit represent the loop count of the first instruction, and the remaining bits of the data unit represent the priority of the execution unit executing the first instruction.
Furthermore, the microprocessor comprises a loop count register and a loop status register, the loop count register is connected with the loop status register, the loop count register is used for storing the loop count, and the loop status register is used for storing the loop identifier; the determining loop state information of the first instruction comprises:
judging whether the first-instruction identifier of the instruction queue where the first instruction is located is equal to the first-instruction identifier of the loop body where the first instruction is located, and if so:
adding 1 to the value in the loop count register, and taking the result as the new value of the loop count register; and
updating the value of the loop status register to true; and
taking the new value of the loop count register as the loop count corresponding to the first instruction, and taking the value of the loop status register as the loop identifier corresponding to the first instruction.
Further, the determining the loop state information of the first instruction includes:
if the value of the loop status register is true and the first-instruction identifier of the instruction queue where the first instruction is located is not equal to the first-instruction identifier of the loop body where the first instruction is located, then:
taking the value of the loop count register as the loop count corresponding to the first instruction; and
taking the value of the loop status register as the loop identifier corresponding to the first instruction.
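The register update described above can be sketched in software as follows. This is a minimal illustration, not the claimed hardware: the class and method names (`LoopTracker`, `tag`) are hypothetical, the "first-instruction identifier" is read here as an identifier (for example, an address) of the head instruction of the queue or loop body, and the behaviour when the loop status register is false and the identifiers differ is an assumption, since the text does not cover that case.

```python
from dataclasses import dataclass


@dataclass
class LoopState:
    """Loop state information attached to one instruction."""
    loop_flag: bool   # loop identifier
    loop_count: int   # loop count


class LoopTracker:
    """Software model of the loop count register and loop status register."""

    def __init__(self) -> None:
        self.loop_count_reg = 0
        self.loop_status_reg = False

    def tag(self, queue_head_id: int, loop_head_id: int) -> LoopState:
        # Identifiers equal: a new iteration of the loop body starts,
        # so increment the count and mark the loop status as true.
        if queue_head_id == loop_head_id:
            self.loop_count_reg += 1
            self.loop_status_reg = True
        # Otherwise the registers are left unchanged. Returning the current
        # values when the status is still false is an assumption.
        return LoopState(self.loop_status_reg, self.loop_count_reg)


tracker = LoopTracker()
before_loop = tracker.tag(queue_head_id=0x40, loop_head_id=0x80)
first_iter = tracker.tag(queue_head_id=0x80, loop_head_id=0x80)
inside_body = tracker.tag(queue_head_id=0x84, loop_head_id=0x80)
second_iter = tracker.tag(queue_head_id=0x80, loop_head_id=0x80)
```

How the identifier of the loop body's head instruction is obtained, and when the registers are cleared, are not described in this passage and are left to the caller in the sketch.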
A second aspect of the present invention provides an instruction output processing apparatus in a microprocessor, including:
a first determining module, configured to determine loop state information of a first instruction, wherein the loop state information comprises a loop identifier and a loop count, the loop identifier is used for identifying whether the first instruction is a loop instruction, and the loop count is used for identifying the number of loop iterations of the loop body where the first instruction is located when the first instruction is processed;
and a second determining module, configured to determine whether to output the first instruction according to the loop state information of the first instruction.
Further, the microprocessor includes a plurality of execution units, each execution unit for executing one type of instruction; the second determining module includes:
an output unit, configured to output the first instruction when the first instruction has completed execution and a loop flag of the first instruction is true, and a loop count of each of N second instructions and a loop count of the first instruction satisfy an output condition;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
Further, the second determining module further comprises:
a determining unit, configured to determine whether to output the first instruction according to a priority of an execution unit that executes the first instruction when a loop identifier of the first instruction is false, or determine whether to output the first instruction according to a priority of an execution unit that executes the first instruction when a loop count of each of the N second instructions and a loop count of the first instruction do not satisfy an output condition;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
Further, the loop count of the first instruction and the priority of the execution unit executing the first instruction are represented by a data unit consisting of a preset number of bytes, wherein a preset number of bits starting from the highest bit of the data unit represent the loop count of the first instruction, and the remaining bits of the data unit represent the priority of the execution unit executing the first instruction.
Furthermore, the microprocessor comprises a loop count register and a loop status register, the loop count register is connected with the loop status register, the loop count register is used for storing the loop count, and the loop status register is used for storing the loop identifier; the first determining module includes:
a judging unit, configured to judge whether the first-instruction identifier of the instruction queue where the first instruction is located is equal to the first-instruction identifier of the loop body where the first instruction is located;
an adding unit, configured to add 1 to the value in the loop count register when the judgment result of the judging unit is yes, and take the result as the new value of the loop count register;
an updating unit, configured to update the value of the loop status register to true when the judgment result of the judging unit is yes;
and a first processing unit, configured to take the new value of the loop count register as the loop count corresponding to the first instruction and take the value of the loop status register as the loop identifier corresponding to the first instruction when the judgment result of the judging unit is yes.
Further, the first determining module further comprises:
a second processing unit, configured to take the value of the loop count register as the loop count corresponding to the first instruction and take the value of the loop status register as the loop identifier corresponding to the first instruction when the value of the loop status register is true and the first-instruction identifier of the instruction queue where the first instruction is located is not equal to the first-instruction identifier of the loop body where the first instruction is located.
A third aspect of the invention provides a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the method of the first aspect as described above.
According to the method and the device for processing instruction output in a microprocessor provided by the invention, the microprocessor first determines the loop state information of the first instruction and then determines whether to output the first instruction according to that information. In other words, the output order of instructions is arranged based on whether an instruction is a loop instruction and on its loop count, which reflects how long the instruction has waited in the microprocessor. This prevents an instruction with a long execution time from being indefinitely unable to output, further prevents the resources of the microprocessor from being occupied for too long, and ensures the performance of the microprocessor.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the following briefly introduces the drawings needed to be used in the description of the embodiments or the prior art, and obviously, the drawings in the following description are some embodiments of the present invention, and those skilled in the art can obtain other drawings according to the drawings without inventive labor.
FIG. 1 is a flowchart illustrating a first embodiment of a method for processing instruction output from a microprocessor according to the present invention;
FIG. 2 is a diagram of pipeline components of a microprocessor;
FIG. 3 is a schematic diagram of cycle state information delivery;
FIG. 4 is a flowchart illustrating a second embodiment of a method for processing instruction output in a microprocessor according to the present invention;
FIG. 5 is a flowchart illustrating a third exemplary embodiment of a method for processing instruction output from a microprocessor according to the present invention;
FIG. 6 is a block diagram of a first embodiment of an instruction output processing apparatus in a microprocessor according to the present invention;
FIG. 7 is a block diagram of a second embodiment of an instruction output processing apparatus in a microprocessor according to the present invention;
FIG. 8 is a block diagram of a third embodiment of an instruction output processing apparatus in a microprocessor according to the present invention;
FIG. 9 is a block diagram illustrating a fourth exemplary embodiment of an instruction output processing apparatus in a microprocessor according to the present invention;
FIG. 10 is a block diagram of a fifth embodiment of an instruction output processing apparatus in a microprocessor according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The short-instruction-first output scheduling algorithm provided in the prior art allows more instructions to be output to the output bus in a shorter time, which improves the throughput of a single execution unit. However, from the perspective of the microprocessor as a whole, when the instructions being processed include loop instructions, resources are occupied for too long. Specifically, the fixed-point execution component of the microprocessor includes a plurality of execution units. Suppose the instruction executed by the Arithmetic Logic Unit (ALU) execution unit, which executes addition instructions, is a loop instruction, and both the ALU execution unit and the multiplication execution unit, which executes multiplication instructions, have a result to output at the same time. According to the short-instruction-first principle, the result in the ALU execution unit is output to the output bus first, and the result of the multiplication instruction waits. Because a loop instruction is being executed in the ALU execution unit, the ALU execution unit continuously generates the results of addition instructions, and under the short-instruction-first principle these results are all output ahead of the waiting multiplication result, so the multiplication result waits indefinitely. As a result, the physical registers of the rename registers at the front end of the execution component are exhausted, the decoding module in front of the rename registers cannot obtain a valid physical register and stalls, input instructions cannot be processed, and the performance of the microprocessor is affected.
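The starvation described above can be reproduced with a toy model. The sketch below is only an illustration under simplifying assumptions that are not in the original text: a single shared output bus, an ALU execution unit that produces one result per cycle for the duration of the loop, and one multiplication result that is ready at cycle 0. The function name `simulate_short_first` is hypothetical.

```python
def simulate_short_first(loop_iterations: int) -> int:
    """Return how many cycles the multiplication result waits for the bus
    when ALU results (short instructions) always win arbitration."""
    alu_results_left = loop_iterations  # one ALU result per cycle while looping
    mul_waiting = True                  # multiplication result ready at cycle 0
    waited = 0
    while mul_waiting:
        if alu_results_left > 0:
            # Short-instruction-first: the ALU result takes the bus this cycle.
            alu_results_left -= 1
            waited += 1
        else:
            # No competing ALU result: the multiplication result is output.
            mul_waiting = False
    return waited


wait_cycles = simulate_short_first(1000)
print(wait_cycles)
```

In this model the wait grows linearly with the number of loop iterations, which is the behaviour the invention sets out to bound.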
The method provided by the invention aims to solve the technical problems.
The overall idea of the embodiment of the invention is as follows: an instruction with a short execution time outputs its calculation result first, and the time each instruction waits in the execution component after completing execution is recorded. If an instruction has waited in the execution component for too long after completing execution, the slower execution unit with the longer execution time is allowed to output its result to the bus, so as to prevent resources from being held by subsequent instructions that are associated with that instruction and cannot be committed.
Fig. 1 is a flowchart illustrating a first embodiment of a method for processing an instruction output from a microprocessor according to the present invention, where an execution main body of the method is the microprocessor, and as shown in fig. 1, the method includes:
S101, determining loop state information of the first instruction, wherein the loop state information comprises a loop identifier and a loop count, the loop identifier is used for identifying whether the first instruction is a loop instruction, and the loop count is used for identifying the number of loop iterations of the loop body where the first instruction is located when the first instruction is processed.
Fig. 2 is a schematic diagram of pipeline components of a microprocessor, and as shown in fig. 2, after receiving an instruction input by an external module, the microprocessor may sequentially pass through an instruction fetch module, an instruction queue, a decoding module, a renaming module, an instruction queue to be transmitted, an instruction queue corresponding to an execution component, and the execution component. The execution unit may include a fixed point execution unit, a floating point execution unit, and a memory access execution unit, where an instruction that the fixed point execution unit needs to execute enters a fixed point instruction queue in advance, an instruction that the floating point execution unit needs to execute enters a floating point instruction queue in advance, and an instruction that the memory access execution unit needs to execute enters a memory access instruction queue in advance. In addition, the fixed-point execution unit may specifically include an ALU execution unit, a multiplication execution unit, a division execution unit, and a Digital Signal Processing (DSP) execution unit.
In this step, the loop state information of the first instruction is determined when the decoding module performs decoding processing.
Specifically, when the first instruction is processed during decoding, the microprocessor determines, according to the current operation information, whether the first instruction is a loop instruction and what its loop count is. The loop count identifies the number of loop iterations of the loop body where the first instruction is located when the first instruction is processed. For example, if the loop body of the first instruction has reached its 3rd iteration when the first instruction is processed, the loop count of the first instruction is 3. The loop count of the first instruction can therefore indicate the waiting time of the first instruction in the microprocessor: the smaller the loop count, the earlier the first instruction was decoded, and the larger the loop count, the later it was decoded.
It should be noted that fig. 2 shows only one optional arrangement of microprocessor components, which may be changed according to actual needs. The component diagram shown in fig. 2 is therefore not a limitation of the present invention, and step S101 may also be executed during decoding processing under other component configurations.
S102, determining whether to output the first instruction according to the loop state information of the first instruction.
Specifically, taking the pipeline component shown in fig. 2 as an example, when the first instruction executes decoding processing in the decoding module, the loop state information of the first instruction is determined, and further, when the first instruction continues to be transmitted to a subsequent module, the loop state information of the first instruction is carried in the first instruction and transmitted until the first instruction enters the corresponding execution unit to be executed. For example, if the first instruction is an add instruction, the loop status information of the add instruction is carried in the add instruction and transmitted to the ALU execution unit, and after the ALU execution unit executes the add instruction, it determines whether to output the add instruction according to the loop status information of the add instruction. Fig. 3 is a schematic diagram illustrating the transmission of the loop state information, and as shown in fig. 3, the loop state information is transmitted from the decoding module to the fixed point execution unit along with the first instruction, and is used as a basis for whether the first instruction is output or not in the fixed point execution unit.
In this embodiment, the microprocessor first determines the loop state information of the first instruction and then determines whether to output the first instruction according to that information. In other words, the output order of instructions is arranged based on whether an instruction is a loop instruction and on its loop count, which reflects how long the instruction has waited in the microprocessor. This prevents an instruction with a long execution time from being indefinitely unable to output, further prevents the resources of the microprocessor from being occupied for too long, and ensures the performance of the microprocessor.
On the basis of the above embodiments, the present embodiment relates to a specific method in which the microprocessor determines whether to output the first instruction based on the loop state information of the first instruction. That is, step S102 is specifically:
and if the first instruction is executed completely, the loop identification of the first instruction is true, and the difference value of the loop times of each second instruction in the N second instructions minus the loop times of the first instruction is larger than a preset value, outputting the first instruction.
If the loop identifier of the first instruction is false, whether to output the first instruction is determined according to the priority of the execution unit executing the first instruction; or, if the difference obtained by subtracting the loop count of the first instruction from the loop count of each of the N second instructions is less than or equal to the preset value, whether to output the first instruction is determined according to the priority of the execution unit executing the first instruction.
The second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
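The decision in step S102 can be sketched as a small function. This is a hedged reading rather than the patented circuit: the names `Decision` and `decide` are hypothetical, the condition on "each of the N second instructions" is interpreted as "at least N other completed instructions exceed the first instruction's loop count by more than the preset value", and the `WAIT` outcome for an instruction that has not finished executing is an assumption. How the unit-priority comparison itself is resolved is not modelled here.

```python
from enum import Enum
from typing import Sequence


class Decision(Enum):
    WAIT = "wait"                          # assumption: not finished yet
    OUTPUT = "output"                      # loop-count rule grants the bus
    USE_UNIT_PRIORITY = "use_unit_priority"  # fall back to unit priority


def decide(finished: bool, loop_flag: bool, loop_count: int,
           other_loop_counts: Sequence[int],
           num_units: int, num_buses: int, preset: int = 1) -> Decision:
    if not finished:
        return Decision.WAIT
    if not loop_flag:
        return Decision.USE_UNIT_PRIORITY
    n = num_units - num_buses  # N as defined in the text
    # Count second instructions whose loop count minus ours exceeds preset.
    ahead = sum(1 for c in other_loop_counts if c - loop_count > preset)
    return Decision.OUTPUT if ahead >= n else Decision.USE_UNIT_PRIORITY


# 4 execution units, 2 output buses, so N = 2.
granted = decide(True, True, 3, [6, 7, 4], num_units=4, num_buses=2)
fallback = decide(True, True, 3, [6, 4, 4], num_units=4, num_buses=2)
```

With loop counts 6, 7 and 4 in the other units, two of them exceed the first instruction's count of 3 by more than 1, so the first instruction is output; with 6, 4 and 4 only one does, so the decision falls back to execution-unit priority.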
Specifically, as mentioned above, the microprocessor includes execution components such as a fixed-point execution component, and the fixed-point execution component includes an ALU execution unit, a multiplication execution unit, a division execution unit, a DSP execution unit, and the like. Each execution unit is configured to execute one type of instruction. For example, the ALU execution unit may execute single-beat instructions such as addition, subtraction, and comparison, that is, instructions that complete in one clock cycle, while the multiplication execution unit executes multiplication instructions, which are multi-beat instructions, that is, instructions that require multiple clock cycles to complete.
Each execution unit has a corresponding priority, specifically, the ALU execution unit has a higher priority than the multiply execution unit, the multiply execution unit has a higher priority than the divide execution unit, and the divide execution unit has a higher priority than the DSP execution unit. In this embodiment, 1 indicates the priority of the ALU execution unit, 2 indicates the priority of the multiply execution unit, 3 indicates the priority of the divide execution unit, and 4 indicates the priority of the DSP execution unit.
As an alternative embodiment, one data unit may be used to indicate both the loop count of the first instruction and the priority of the execution unit executing the first instruction. Specifically, the data unit consists of a preset number of bytes, wherein a preset number of bits starting from the highest bit of the data unit indicate the loop count of the first instruction, and the remaining bits of the data unit indicate the priority of the execution unit executing the first instruction. The present embodiment is described by taking an example in which the data unit consists of 2 bytes, that is, 16 bits, where the upper 13 bits indicate the loop count of the first instruction and the lower 3 bits indicate the priority of the execution unit.
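The 16-bit layout in this example can be illustrated with a pair of helper functions. This is a minimal sketch: the function names `pack` and `unpack` are hypothetical, and raising an error for out-of-range values is an assumption, since the text does not say how a loop count larger than 13 bits is handled.

```python
from typing import Tuple

COUNT_BITS = 13  # upper bits: loop count
PRIO_BITS = 3    # lower bits: execution-unit priority


def pack(loop_count: int, unit_priority: int) -> int:
    """Pack loop count and unit priority into one 16-bit data unit."""
    if not 0 <= loop_count < (1 << COUNT_BITS):
        raise ValueError("loop count does not fit in 13 bits")
    if not 0 <= unit_priority < (1 << PRIO_BITS):
        raise ValueError("priority does not fit in 3 bits")
    return (loop_count << PRIO_BITS) | unit_priority


def unpack(word: int) -> Tuple[int, int]:
    """Return (loop_count, unit_priority) from a 16-bit data unit."""
    return word >> PRIO_BITS, word & ((1 << PRIO_BITS) - 1)


# Division execution unit (priority 3) with loop count 3.
word = pack(3, 3)
print(bin(word), unpack(word))
```

Placing the loop count in the high bits means that comparing two data units as plain integers orders them by loop count first and by unit priority second, which may be one reason for this layout, although the text does not state it.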
For convenience of explaining the scheme of the present embodiment, the following description takes as an example a microprocessor that includes the above-mentioned 4 types of execution units and has 2 output buses.
Assuming that the first instruction is a division instruction, that is, the execution unit executing the first instruction is the division execution unit, the upper 13 bits of the data unit representing the first instruction may be denoted PRI_HI_3 and the lower 3 bits may be denoted PRI_LO_3. The second instructions are denoted in the same manner; for example, the second instruction to be output by the ALU execution unit may be denoted by PRI_HI_1 and PRI_LO_1.
Whether to output the first instruction may be expressed by the following equation (1):
DIV_OUT_EN=DIV_RES_VALID&&((LOOP_JUDGE_FLAG&&PRI_HI_3_EN)||(~LOOP_JUDGE_FLAG&&PRI_LO_3_EN)) (1)
Here, DIV_RES_VALID indicates whether the divide execution unit has output a result; that is, only when the first instruction has finished executing can it be further determined whether the first instruction can be output. PRI_HI_3_EN is given by formula (2):
PRI_HI_3_EN=(PRI_HI_1-PRI_HI_3-1>0&&PRI_HI_2-PRI_HI_3-1>0)||(PRI_HI_2-PRI_HI_3-1>0&&PRI_HI_4-PRI_HI_3-1>0)||(PRI_HI_1-PRI_HI_3-1>0&&PRI_HI_4-PRI_HI_3-1>0) (2)
Since there are only 2 output buses and 4 execution units, only 2 execution units can output results at any one time. PRI_HI_3_EN, the result of formula (2), is an intermediate result: it requires that the loop counts of the instructions to be output by 2 (4 - 2 = 2) of the 4 execution units exceed the loop count of the first instruction by more than 1. This ensures that the first instruction ranks among the first two of the 4 units in output order, and therefore that the first instruction can be output. Specifically, PRI_HI_3_EN is true as long as the loop counts of the instructions to be output by any two execution units, other than the execution unit corresponding to the first instruction, each exceed the loop count of the first instruction by more than a preset value (for example, 1).
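As a non-limiting illustration, formula (2) can be sketched in Python as a count over the other execution units. The function name pri_hi_en and its parameters margin and needed are hypothetical; the disjunction of pairs in formula (2) is expressed here as "at least two of the other three units satisfy the difference condition", which is equivalent for 4 execution units.

```python
def pri_hi_en(loop_counts, unit, margin=1, needed=2):
    """Sketch of formula (2) for an arbitrary execution unit.

    loop_counts maps a unit number (1..4) to the loop count (PRI_HI_x)
    of the instruction that unit is waiting to output.
    """
    mine = loop_counts[unit]
    # Count the other units whose loop count exceeds ours by more than margin,
    # i.e. PRI_HI_x - PRI_HI_unit - margin > 0.
    later = sum(
        1
        for other, count in loop_counts.items()
        if other != unit and count - mine - margin > 0
    )
    return later >= needed


# Unit 3 (divide) holds an instruction from iteration 2, while the ALU and
# multiply units hold instructions from iterations 5 and 6.
counts = {1: 5, 2: 6, 3: 2, 4: 2}
div_enabled = pri_hi_en(counts, 3)
```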
LOOP_JUDGE_FLAG=(LOOP_VALID_1+LOOP_VALID_2+LOOP_VALID_3+LOOP_VALID_4>2)&&(PRI_HI_1_EN||PRI_HI_2_EN||PRI_HI_3_EN||PRI_HI_4_EN) (3)
LOOP_VALID_1 indicates whether the instruction to be output by the ALU execution unit is a loop instruction, LOOP_VALID_2 indicates whether the instruction to be output by the multiply execution unit is a loop instruction, LOOP_VALID_3 indicates whether the first instruction is a loop instruction, and so on. Formula (3) is the overall condition for judging whether the instructions to be output by the execution units need to compete for output. Specifically, LOOP_JUDGE_FLAG is true only if the instructions to be output by at least two of the 4 execution units are loop instructions and the loop count of the instruction to be output by at least one execution unit is smaller than those of the instructions to be output by the other execution units.
Combining formulas (1), (2) and (3): when the first instruction has finished executing, the loop identifier of the first instruction is true, and the loop count of the first instruction is smaller, by more than the preset value, than the loop counts of the instructions to be output by N other execution units, the first instruction may be output. That is, the output order of the instructions is determined by their loop counts.
For example, suppose the ALU execution unit and the multiply execution unit each have an instruction result to be output at the same time. The ALU execution unit holds an add instruction A, which is a loop instruction with a loop count of 3; the multiply execution unit holds a multiply instruction B with a loop count of 1. This indicates that multiply instruction B was processed when the loop body in the decoding module was executing its 1st iteration, while add instruction A was processed when the loop body was executing its 3rd iteration; that is, the multiply instruction was processed earlier and has already waited a long time in the microprocessor. According to formula (1), since the loop count of the multiply instruction is 2 less than that of the add instruction, the result of formula (1) for the multiply instruction can be true. Therefore, multiply instruction B can be output before add instruction A, rather than the add instruction always being output first as in the prior art, which solves the problem of long resource occupation caused by the multiply instruction never being output.
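As a non-limiting illustration, formulas (1) to (3) can be combined in a short Python sketch for the 4-unit, 2-bus example. All function and parameter names are hypothetical. The priority result PRI_LO_x_EN is passed in as an input, and Python's `not` is used for the logical negation written as `~` in formula (1).

```python
def pri_hi_en(loop_counts, unit, margin=1, needed=2):
    """Formula (2): at least `needed` other units are ahead by more than margin."""
    mine = loop_counts[unit]
    return sum(
        1 for other, count in loop_counts.items()
        if other != unit and count - mine - margin > 0
    ) >= needed


def loop_judge_flag(loop_valid, loop_counts):
    """Formula (3): enough loop instructions, and some unit is clearly older."""
    return sum(loop_valid.values()) > 2 and any(
        pri_hi_en(loop_counts, unit) for unit in loop_counts
    )


def out_en(unit, res_valid, loop_valid, loop_counts, pri_lo_en):
    """Formula (1), written for any unit rather than only the divide unit."""
    flag = loop_judge_flag(loop_valid, loop_counts)
    by_loop_count = flag and pri_hi_en(loop_counts, unit)
    by_priority = (not flag) and pri_lo_en[unit]
    return res_valid[unit] and (by_loop_count or by_priority)


all_true = {1: True, 2: True, 3: True, 4: True}
counts = {1: 5, 2: 6, 3: 2, 4: 2}                  # units 3 and 4 hold older work
pri_lo = {1: True, 2: True, 3: False, 4: False}    # assumed priority results
divide_outputs = out_en(3, all_true, all_true, counts, pri_lo)
alu_outputs = out_en(1, all_true, all_true, counts, pri_lo)
```

In this sketch the divide unit wins an output bus on loop count even though its priority result is false, while the ALU unit, which would win on priority alone, is held back.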
In the other case, where (LOOP_JUDGE_FLAG && PRI_HI_3_EN) in formula (1) is not satisfied and (~LOOP_JUDGE_FLAG && PRI_LO_3_EN) in formula (1) is satisfied, either the first instruction is not a loop instruction, or the difference between the loop count of the second instruction and the loop count of the first instruction does not satisfy the preset difference condition. If the first instruction is not a loop instruction, the aforementioned problem of long resource occupation does not arise; if the difference condition is not satisfied, the first instruction has not waited very long in the microprocessor, so the problem does not arise either. In this case, whether to output the first instruction may be determined according to the priority of the execution unit.
PRI_LO_3_EN in formula (1) indicates whether the priority condition for the first instruction is satisfied. Specifically, PRI_LO_3_EN may be determined from PRI_LO_3 and the priorities of the other second instructions. For example, if PRI_LO_3 of the first instruction is 3 and PRI_LO_1 and PRI_LO_2 are both valid, that is, both the ALU execution unit and the multiply execution unit have an instruction result to output, PRI_LO_3_EN is false; if PRI_LO_1 or PRI_LO_2 is not valid, that is, the ALU execution unit or the multiply execution unit has no instruction result to output, PRI_LO_3_EN is true.
That is, when the first instruction is not a loop instruction or the latency of the first instruction in the microprocessor is not long compared to the second instruction, the result may be output according to the priority of the execution unit, thereby ensuring that the throughput of the microprocessor is as large as possible without affecting the performance of the microprocessor.
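As a non-limiting illustration, the priority-based result PRI_LO_x_EN can be sketched in Python as follows. The embodiment gives only the example for the divide unit with 2 output buses; the generalization used here (a unit may output when fewer higher-priority units than output buses have a result pending) is an assumption, and the function name is hypothetical.

```python
def pri_lo_en(unit_priority, valid_priorities, num_buses=2):
    """Return True if the unit may output under the priority rule.

    valid_priorities holds the priorities (1 = highest) of all units that
    currently have a result to output, including the unit itself.
    """
    # Assumption: each higher-priority unit with a pending result takes one bus.
    higher = sum(1 for p in valid_priorities if p < unit_priority)
    return higher < num_buses


# Divide unit (priority 3) while both the ALU (1) and multiply (2) units
# have results pending: both buses are taken.
blocked = pri_lo_en(3, {1, 2, 3})
# Multiply unit has nothing to output: one bus is free for the divide unit.
allowed = pri_lo_en(3, {1, 3})
```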
On the basis of the foregoing embodiments, the present embodiment relates to a specific method by which the microprocessor determines the loop state information of the first instruction. Fig. 4 is a flowchart illustrating a second embodiment of a method for processing instruction output in a microprocessor according to the present invention. As shown in fig. 4, the step S101 specifically includes:
S401, judging whether the head-instruction identifier (the identifier of the first instruction in order) of the instruction queue in which the first instruction is located is equal to the head-instruction identifier of the loop body in which the first instruction is located; if yes, executing S402-S404; if not, continuing with the steps shown in fig. 5 below.
S402, adding 1 to the value in the loop count register, and taking the sum as the new value of the loop count register.
Specifically, two registers, namely a loop count register and a loop status register, are newly added in the microprocessor, wherein the loop count register is connected with the loop status register, the loop count register is used for storing the loop count, and the loop status register is used for storing the loop identifier.
It should be noted that the positions of the loop count register and the loop status register in the microprocessor, and the specific connections between the two registers and other components of the microprocessor, may be implemented flexibly in various ways according to the functions of the two registers, and are not limited in this embodiment.
S403, updating the value of the loop status register to true.
S404, taking the new value of the loop count register as the loop count corresponding to the first instruction, and taking the value of the loop status register as the loop identifier corresponding to the first instruction.
The initial value of the loop count register is 0 and the initial value of the loop status register is false.
The loop body is executed cyclically in the decoding module, and the instructions of one loop body may be divided into a plurality of instruction queues, with decoding performed in units of instruction queues. The identifier of the head instruction of the whole loop body may be recorded in advance in a register, referred to here as LOOP_ID. When the instruction queue in which the first instruction is located is executed, if the head-instruction identifier of that instruction queue is equal to LOOP_ID, a new loop iteration has been entered; the value of the loop count register is incremented by 1 and the value of the loop status register is updated to true. The value of the loop count register and the value of the loop status register are then used as the loop state information of the first instruction.
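As a non-limiting illustration, steps S401 to S404 can be sketched in Python as follows. The class and method names are hypothetical, and the two registers are modeled as plain fields; only the branch in which the queue head matches LOOP_ID is modeled here.

```python
from dataclasses import dataclass


@dataclass
class LoopRegisters:
    loop_id: int          # identifier of the head instruction of the loop body (LOOP_ID)
    count: int = 0        # loop count register, initial value 0
    status: bool = False  # loop status register, initial value false

    def on_queue_head(self, head_id):
        """Apply S401-S404 for an instruction queue whose head is head_id."""
        if head_id == self.loop_id:          # S401: a new iteration starts
            self.count += 1                  # S402
            self.status = True               # S403
            return self.count, self.status   # S404: loop state information
        return None                          # other branch (fig. 5) not modeled


regs = LoopRegisters(loop_id=0x100)
first_iteration = regs.on_queue_head(0x100)   # queue starting the loop body
mid_body = regs.on_queue_head(0x104)          # later queue of the same iteration
second_iteration = regs.on_queue_head(0x100)  # loop body entered again
```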
In another case, when the instruction queue in which the first instruction is located is not the first instruction queue of the loop body, the processing procedure is as shown in fig. 5. Fig. 5 is a flowchart illustrating a third embodiment of a method for processing instruction output in a microprocessor according to the present invention. As shown in fig. 5, the step S101 specifically includes:
S501, judging whether the value of the loop status register is true and whether the head-instruction identifier of the instruction queue of the first instruction is not equal to the head-instruction identifier of the loop body of the first instruction; if both hold, executing S502-S503; otherwise, proceeding according to the prior-art process, which is not described in detail here.
S502, taking the value of the loop count register as the loop count corresponding to the first instruction.
S503, taking the value of the loop status register as the loop identifier corresponding to the first instruction.
That is, if the value in the loop status register is true and the head-instruction identifier of the instruction queue of the first instruction is not equal to LOOP_ID, the first instruction is a loop instruction and its instruction queue is not the first instruction queue in the loop body. It is therefore not necessary to update the value of the loop count register; the current values of the loop count register and the loop status register are simply used as the loop state information of the first instruction.
In addition, if the first instruction is not a loop instruction, the number of loops of the first instruction may be set to 0, the loop flag may be set to false, and when a determination of whether to output is made, the microprocessor may determine whether to output the first instruction based on these values and the priority of the execution unit.
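As a non-limiting illustration, steps S501 to S503 together with the default values for a non-loop instruction can be sketched in Python as follows. The function name is hypothetical, and raising an error when the queue head equals LOOP_ID is an assumption made to keep this sketch limited to the branch of fig. 5.

```python
def loop_state_without_increment(status_reg, count_reg, head_id, loop_id):
    """Return (loop count, loop identifier) for an instruction whose queue
    does not start a new loop iteration."""
    if head_id == loop_id:
        # Assumption: this case belongs to S402-S404 and is out of scope here.
        raise ValueError("queue head equals LOOP_ID; handled by S402-S404")
    if status_reg:                  # S501: inside a loop, not at its first queue
        return count_reg, True      # S502, S503: reuse the current register values
    return 0, False                 # not a loop instruction: count 0, identifier false


in_loop = loop_state_without_increment(True, 3, head_id=0x104, loop_id=0x100)
not_loop = loop_state_without_increment(False, 0, head_id=0x204, loop_id=0x100)
```

With the loop identifier false, the output decision falls back to the priority of the execution unit.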
Fig. 6 is a block diagram of a first embodiment of an instruction output processing apparatus in a microprocessor according to the present invention, as shown in fig. 6, the apparatus includes:
The first determining module 601 is configured to determine loop state information of the first instruction, where the loop state information includes a loop identifier and a loop count; the loop identifier identifies whether the first instruction is a loop instruction, and the loop count identifies the number of loop iterations of the loop body in which the first instruction is located at the time the first instruction is processed.
The second determining module 602 is configured to determine whether to output the first instruction according to the loop status information of the first instruction.
The device is used for realizing the method embodiments, the realization principle and the technical effect are similar, and the details are not repeated here.
Fig. 7 is a block diagram of a second embodiment of an instruction output processing apparatus in a microprocessor according to the present invention, as shown in fig. 7, the microprocessor includes a plurality of execution units, each of which is configured to execute one type of instruction; the second determining module 602 includes:
an output unit 6021 configured to output the first instruction when the first instruction has completed execution, and the loop flag of the first instruction is true, and the loop number of each of the N second instructions and the loop number of the first instruction satisfy an output condition.
The second instruction is an instruction which is executed in an execution unit except an execution unit for executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of the output buses of the microprocessor.
Fig. 8 is a block diagram of a third embodiment of an instruction output processing apparatus in a microprocessor according to the present invention, and as shown in fig. 8, the second determining module 602 further includes:
a determining unit 6022, configured to determine whether to output the first instruction according to the priority of the execution unit executing the first instruction when the loop identifier of the first instruction is false, or when the loop count of each of the N second instructions and the loop count of the first instruction do not satisfy the output condition;
the second instruction is an instruction that has completed execution in an execution unit other than the execution unit executing the first instruction in the microprocessor, and N is the difference between the number of execution units of the microprocessor and the number of output buses of the microprocessor.
In another embodiment, the loop count of the first instruction and the priority of the execution unit executing the first instruction are represented by a data unit consisting of a preset number of bytes, wherein a preset number of bits starting from the highest bit of the data unit represent the loop count of the first instruction, and the remaining bits of the data unit represent the priority of the execution unit executing the first instruction.
Fig. 9 is a block diagram of a fourth embodiment of an instruction output processing apparatus in a microprocessor according to the present invention. As shown in fig. 9, the microprocessor includes a loop count register and a loop status register, and the loop count register is connected to the loop status register; the loop count register is used for storing the loop count, and the loop status register is used for storing the loop identifier; the first determining module 601 includes:
the determining unit 6011 is configured to determine whether a first instruction identifier of the instruction queue of the first instruction is equal to a first instruction identifier of a loop body of the first instruction.
An adding unit 6012, configured to add the value in the loop number register and 1 when the determination result of the determining unit 6011 is yes, and take the addition result as a new value of the loop number register.
An updating unit 6013, configured to update the value of the loop status register to true when the determination result of the determining unit 6011 is yes.
The first processing unit 6014 is configured to, when the determination result of the determining unit 6011 is yes, use the new value of the loop count register as the loop count corresponding to the first instruction, and use the value of the loop status register as the loop identifier corresponding to the first instruction.
Fig. 10 is a block diagram of a fifth embodiment of an instruction output processing apparatus in a microprocessor according to the present invention, and as shown in fig. 10, the first determining module 601 further includes:
a second processing unit 6015, configured to, when the value of the loop status register is true and the head-instruction identifier of the instruction queue of the first instruction is not equal to the head-instruction identifier of the loop body of the first instruction, take the value of the loop count register as the loop count corresponding to the first instruction, and take the value of the loop status register as the loop identifier corresponding to the first instruction.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (11)

1. A method for processing an instruction output in a microprocessor, comprising:
determining loop state information of a first instruction, wherein the loop state information comprises a loop identifier and loop times, the loop identifier is used for identifying whether the first instruction is a loop instruction, and the loop times are used for identifying loop iteration times of a loop body where the first instruction is located when the first instruction is processed;
determining whether to output the first instruction according to the loop state information of the first instruction;
the microprocessor comprises a plurality of execution units, and each execution unit is used for executing one type of instructions;
the determining whether to output the first instruction according to the loop state information of the first instruction comprises:
if the first instruction is executed completely, the loop identification of the first instruction is true, and the loop number of each second instruction in the N second instructions and the loop number of the first instruction meet the output condition, determining to output the first instruction;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
2. The method of claim 1, wherein determining whether to output the first instruction based on loop state information of the first instruction comprises:
if the loop identifier of the first instruction is false, determining whether to output the first instruction according to the priority of an execution unit executing the first instruction; or,
if the cycle number of each second instruction in the N second instructions and the cycle number of the first instruction do not meet the output condition, determining whether to output the first instruction according to the priority of an execution unit executing the first instruction;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
3. The method of claim 2, wherein the loop times of the first instruction and the priority of the execution unit executing the first instruction are represented by a data unit consisting of preset bytes, wherein a preset number of bits from a highest bit in the data unit are used for representing the loop times of the first instruction, and wherein bits other than the bits used for representing the loop times in the data unit are used for representing the priority of the execution unit executing the first instruction.
4. The method according to any one of claims 1-3, wherein a loop number register and a loop status register are included in the microprocessor, the loop number register is connected to the loop status register, the loop number register is used for storing the loop number, and the loop status register is used for storing the loop identifier;
the determining loop state information of the first instruction comprises:
judging whether the first instruction identification of the instruction queue where the first instruction is located is equal to the first instruction identification of the loop body where the first instruction is located, if so, then:
adding 1 to the value in the loop number register, and taking the addition result as a new value of the loop number register; and,
updating the value of the loop status register to true; and,
taking the new value of the loop number register as the loop times corresponding to the first instruction, and taking the value of the loop status register as the loop identifier corresponding to the first instruction.
5. The method of claim 4, wherein determining loop state information for the first instruction comprises:
if the value of the loop status register is true and the first instruction identifier of the instruction queue of the first instruction is not equal to the first instruction identifier of the loop body of the first instruction, then:
taking the value of the loop number register as the loop times corresponding to the first instruction; and,
taking the value of the loop status register as the loop identifier corresponding to the first instruction.
6. An instruction output processing apparatus in a microprocessor, comprising:
a first determining module, configured to determine loop state information of a first instruction, wherein the loop state information comprises a loop identifier and loop times, the loop identifier is used for identifying whether the first instruction is a loop instruction, and the loop times are used for identifying loop iteration times of a loop body where the first instruction is located when the first instruction is processed;
a second determining module, configured to determine whether to output the first instruction according to the loop state information of the first instruction;
the microprocessor comprises a plurality of execution units, and each execution unit is used for executing one type of instructions; the second determining module includes:
an output unit, configured to output the first instruction when the first instruction has completed execution and a loop flag of the first instruction is true, and a loop count of each of N second instructions and a loop count of the first instruction satisfy an output condition;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
7. The apparatus of claim 6, wherein the second determining module further comprises:
a determining unit, configured to determine whether to output the first instruction according to a priority of an execution unit that executes the first instruction when a loop identifier of the first instruction is false, or determine whether to output the first instruction according to a priority of an execution unit that executes the first instruction when a loop count of each of the N second instructions and a loop count of the first instruction do not satisfy an output condition;
the second instruction is an instruction which is executed and completed in execution units except for an execution unit executing the first instruction in the microprocessor, and N is a difference value between the number of the execution units of the microprocessor and the number of output buses of the microprocessor.
8. The apparatus of claim 7, wherein the loop times of the first instruction and the priority of the execution unit executing the first instruction are represented by a data unit consisting of preset bytes, wherein a preset number of bits from a highest bit in the data unit are used for representing the loop times of the first instruction, and wherein bits other than the bits used for representing the loop times in the data unit are used for representing the priority of the execution unit executing the first instruction.
9. The apparatus according to any one of claims 6-8, wherein the microprocessor includes a loop number register and a loop status register, the loop number register is connected to the loop status register, the loop number register is used for storing the loop number, and the loop status register is used for storing the loop identifier; the first determining module includes:
the judging unit is used for judging whether the first instruction identifier of the instruction queue where the first instruction is located is equal to the first instruction identifier of the loop body where the first instruction is located;
an adding unit, configured to add 1 to the value in the loop number register when the determination result of the judging unit is yes, and take the addition result as a new value of the loop number register;
an updating unit, configured to update the value of the loop status register to true when the determination result of the judging unit is yes;
and a first processing unit, configured to take the new value of the loop number register as the loop times corresponding to the first instruction and take the value of the loop status register as the loop identifier corresponding to the first instruction when the determination result of the judging unit is yes.
10. The apparatus of claim 9, wherein the first determining module further comprises:
a second processing unit, configured to take the value of the loop number register as the loop times corresponding to the first instruction and take the value of the loop status register as the loop identifier corresponding to the first instruction when the value of the loop status register is true and the first instruction identifier of the instruction queue of the first instruction is not equal to the first instruction identifier of the loop body of the first instruction.
11. A computer storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method of any of claims 1-5.
CN201710433164.7A 2017-06-09 2017-06-09 Method and device for processing instruction output in microprocessor Active CN109032665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710433164.7A CN109032665B (en) 2017-06-09 2017-06-09 Method and device for processing instruction output in microprocessor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710433164.7A CN109032665B (en) 2017-06-09 2017-06-09 Method and device for processing instruction output in microprocessor

Publications (2)

Publication Number Publication Date
CN109032665A CN109032665A (en) 2018-12-18
CN109032665B true CN109032665B (en) 2021-01-26

Family

ID=64629813

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710433164.7A Active CN109032665B (en) 2017-06-09 2017-06-09 Method and device for processing instruction output in microprocessor

Country Status (1)

Country Link
CN (1) CN109032665B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112000370B (en) 2020-08-27 2022-04-15 北京百度网讯科技有限公司 Processing method, device and equipment of loop instruction and storage medium
CN111796869A (en) * 2020-09-07 2020-10-20 华夏芯(北京)通用处理器技术有限公司 Program instruction block processing method and device
CN113778528B (en) * 2021-09-13 2023-03-24 北京奕斯伟计算技术股份有限公司 Instruction sending method and device, electronic equipment and storage medium

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1307700A (en) * 1998-06-30 2001-08-08 英特尔公司 Computer processor with replay system
CN101048731A (en) * 2004-10-20 2007-10-03 英特尔公司 Looping instructions for a single instruction, multiple data execution engine
CN101452394A (en) * 2007-11-28 2009-06-10 无锡江南计算技术研究所 Compiling method and compiler
CN101788903A (en) * 2008-11-05 2010-07-28 英特尔公司 Optimizing performance of instructions based on sequence detection or information associated with the instructions
CN101986263A (en) * 2010-11-25 2011-03-16 中国人民解放军国防科学技术大学 Method and microprocessor for supporting single instruction stream and multi-instruction stream dynamic switching execution
CN102053819A (en) * 2009-10-26 2011-05-11 索尼公司 Information processing apparatus and instruction decoder for the information processing apparatus
CN102270112A (en) * 2010-06-03 2011-12-07 边立剑 Reduced instruction-set computer (RISC) microprocessor command decoding circuit
CN103383651A (en) * 2012-05-01 2013-11-06 瑞萨电子株式会社 Semiconductor device
CN104714779A (en) * 2013-12-12 2015-06-17 华为技术有限公司 Command processing method and device
CN105653242A (en) * 2015-12-28 2016-06-08 北京经纬恒润科技有限公司 Timing method and apparatus
CN105677299A (en) * 2016-01-05 2016-06-15 天脉聚源(北京)传媒科技有限公司 Method and device used for identification selection
CN106775591A (en) * 2016-11-21 2017-05-31 江苏宏云技术有限公司 A kind of hardware loop processing method and system of processor

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10318308B2 (en) * 2012-10-31 2019-06-11 Mobileye Vision Technologies Ltd. Arithmetic logic unit

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1307700A (en) * 1998-06-30 2001-08-08 Intel Corporation Computer processor with replay system
CN101048731A (en) * 2004-10-20 2007-10-03 Intel Corporation Looping instructions for a single instruction, multiple data execution engine
CN101452394A (en) * 2007-11-28 2009-06-10 Wuxi Jiangnan Institute of Computing Technology Compiling method and compiler
CN101788903A (en) * 2008-11-05 2010-07-28 Intel Corporation Optimizing performance of instructions based on sequence detection or information associated with the instructions
CN102053819A (en) * 2009-10-26 2011-05-11 Sony Corporation Information processing apparatus and instruction decoder for the information processing apparatus
CN102270112A (en) * 2010-06-03 2011-12-07 Bian Lijian Reduced instruction-set computer (RISC) microprocessor command decoding circuit
CN101986263A (en) * 2010-11-25 2011-03-16 National University of Defense Technology of the Chinese People's Liberation Army Method and microprocessor for supporting single instruction stream and multi-instruction stream dynamic switching execution
CN103383651A (en) * 2012-05-01 2013-11-06 Renesas Electronics Corporation Semiconductor device
CN104714779A (en) * 2013-12-12 2015-06-17 Huawei Technologies Co., Ltd. Command processing method and device
CN105653242A (en) * 2015-12-28 2016-06-08 Beijing Jingwei Hirain Technologies Co., Ltd. Timing method and apparatus
CN105677299A (en) * 2016-01-05 2016-06-15 Tianmai Juyuan (Beijing) Media Technology Co., Ltd. Method and device used for identification selection
CN106775591A (en) * 2016-11-21 2017-05-31 Jiangsu Hongyun Technology Co., Ltd. Hardware loop processing method and system for a processor

Non-Patent Citations (1)

Title
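Low-power design of the LS-DSP digital signal processor bus; Che Deliang; Li Jianchuan; Shen Xubang; "Low-Power Design of the LS-DSP Digital Signal Processor Bus"; 20050425; full text *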

Also Published As

Publication number Publication date
CN109032665A (en) 2018-12-18

Similar Documents

Publication Publication Date Title
KR102074961B1 (en) Method and apparatus for efficient scheduling for asymmetrical execution units
CN108089883B (en) Allocating resources to threads based on speculation metrics
JP3547482B2 (en) Information processing equipment
US6560697B2 (en) Data processor having repeat instruction processing using executed instruction number counter
US11698790B2 (en) Queues for inter-pipeline data hazard avoidance
US20080313444A1 (en) Microcomputer and dividing circuit
CN109032665B (en) Method and device for processing instruction output in microprocessor
US8572355B2 (en) Support for non-local returns in parallel thread SIMD engine
US10303399B2 (en) Data processing apparatus and method for controlling vector memory accesses
EP3436928A1 (en) Complex multiply instruction
CN115934168A (en) Processor and memory access method
US10942748B2 (en) Method and system for processing interrupts with shadow units in a microcontroller
US20070028077A1 (en) Pipeline processor, and method for automatically designing a pipeline processor
CN114746840A (en) Processor unit for multiply and accumulate operations
CN111221573B (en) Management method of register access time sequence, processor, electronic equipment and computer readable storage medium
US11714641B2 (en) Vector generating instruction for generating a vector comprising a sequence of elements that wraps as required
US20190391815A1 (en) Instruction age matrix and logic for queues in a processor
US20080215859A1 (en) Computer with high-speed context switching
CN114237878A (en) Instruction control method, circuit, device and related equipment
US9170819B2 (en) Forwarding condition information from first processing circuitry to second processing circuitry
CN112463218A (en) Instruction emission control method and circuit, data processing method and circuit
US20190384608A1 (en) Arithmetic processor and control method of arithmetic processor
CN112445587A (en) Task processing method and task processing device
GB2576457A (en) Queues for inter-pipeline data hazard avoidance
WO2022121273A1 (en) Simt instruction processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 100095 Building 2, Longxin Industrial Park, Zhongguancun Environmental Protection Technology Demonstration Park, Haidian District, Beijing
Applicant after: Loongson Zhongke Technology Co.,Ltd.
Address before: 100095 Building 2, Longxin Industrial Park, Zhongguancun Environmental Protection Technology Demonstration Park, Haidian District, Beijing
Applicant before: LOONGSON TECHNOLOGY Corp.,Ltd.
GR01 Patent grant