CN118312220B - Method, device and equipment for sending instruction - Google Patents


Publication number
CN118312220B
Authority
CN
China
Prior art keywords
instruction
write
sending
type
queue
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410742201.2A
Other languages
Chinese (zh)
Other versions
CN118312220A (en
Inventor
李祖松
郇丹丹
宋德林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Micro Core Technology Co ltd
Original Assignee
Beijing Micro Core Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Micro Core Technology Co ltd
Priority to CN202410742201.2A
Publication of CN118312220A
Application granted
Publication of CN118312220B
Active
Anticipated expiration

Landscapes

  • Advance Control (AREA)

Abstract

The disclosure provides a method, a device and equipment for sending an instruction. The method comprises the following steps: a first issue queue receives an instruction of a first type together with the source operands required by the instruction and, after all source operands required by the instruction have been acquired, sends the instruction and all source operands to a functional unit. A second issue queue receives an instruction of a second type and sends the instruction to a register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and also reduces the storage space the issue queues need for data.

Description

Method, device and equipment for sending instruction
Technical Field
The disclosure relates to the technical field of computer processors, and in particular relates to a method, a device and equipment for sending instructions.
Background
In a computer processor architecture, the register file, the issue queue (Issue Queue), and the functional units (Function Unit) are three core components in the execution of instructions by a CPU. The register file stores the source operands and execution results of instructions. The issue queue stores instructions and issues them to the functional units out of order, depending on, among other things, the readiness of each instruction's source operands. The functional unit executes the instruction and obtains its execution result.
In the related art, the interaction among the register file, the issue queue and the functional unit occupies a large amount of resources.
Disclosure of Invention
The present disclosure aims to solve, at least to some extent, one of the technical problems in the related art.
Therefore, an object of the present disclosure is to provide a method, an apparatus, and a device for sending an instruction, so as to reduce the number of register file read ports used and to reduce the storage space the issue queue needs for data.
The method for sending the instruction provided by the embodiment of the first aspect of the disclosure comprises the following steps: selecting an instruction to be executed based on a preset allocation rule; in the case that the type of the instruction is a first type, reading the source operands required by the instruction from a register file and sending the instruction and the source operands to a first issue queue; in the case that the type of the instruction is a second type, sending the instruction to a second issue queue.
The method for sending the instruction provided by the embodiment of the second aspect of the disclosure includes: a first issue queue receives an instruction of a first type and the source operands required by the instruction, and sends the instruction and all source operands to a functional unit after acquiring all source operands required by the instruction; a second issue queue receives an instruction of a second type and sends the instruction to a register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit.
An instruction sending device according to an embodiment of a third aspect of the present disclosure includes: a selection module, used for selecting an instruction to be executed based on a preset allocation rule; a sending module, used for reading the source operands required by the instruction from the register file and sending the instruction and the source operands to the first issue queue in the case that the type of the instruction is a first type; and a sending module, used for sending the instruction to the second issue queue in the case that the type of the instruction is a second type.
An instruction sending device according to an embodiment of a fourth aspect of the present disclosure includes: a sending module, used for causing the first issue queue to receive an instruction of a first type and the source operands required by the instruction, and to send the instruction and all source operands to the functional unit after acquiring all source operands required by the instruction; and a sending module, used for causing the second issue queue to receive an instruction of a second type and send the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit.
An embodiment of a fifth aspect of the present disclosure provides an electronic device, including a memory, a processor, and a computer program stored on the memory and capable of running on the processor, where the processor, when executing the program, implements the method for sending an instruction as set forth in the embodiment of the first aspect of the present disclosure, or implements the method for sending an instruction as set forth in the embodiment of the second aspect of the present disclosure.
An embodiment of a sixth aspect of the present disclosure proposes a non-transitory computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method for sending an instruction as proposed by the embodiment of the first aspect of the present disclosure, or implements the method for sending an instruction as proposed by the embodiment of the second aspect of the present disclosure.
An embodiment of a seventh aspect of the present disclosure proposes a computer program product, where instructions in the computer program product, when executed by a processor, perform the method for sending an instruction as proposed by the embodiment of the first aspect of the present disclosure, or implement the method for sending an instruction as proposed by the embodiment of the second aspect of the present disclosure.
The method, the device and the equipment for sending the instruction provided by the disclosure have at least the following beneficial effects: the first issue queue receives an instruction of the first type and the source operands required by the instruction, and sends the instruction and all source operands to the functional unit after acquiring all source operands required by the instruction. The second issue queue receives an instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and also reduces the storage space the issue queue needs for data.
Additional aspects and advantages of the disclosure will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the disclosure.
Drawings
The foregoing and/or additional aspects and advantages of the present disclosure will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the interaction involved in sending instructions according to an embodiment of the present disclosure;
Fig. 2 is a flow chart illustrating a method of sending instructions according to another embodiment of the present disclosure;
Fig. 3 is a flow chart illustrating a method of sending instructions according to another embodiment of the present disclosure;
Fig. 4 is a flow chart illustrating a method of sending instructions according to another embodiment of the present disclosure;
Fig. 5 is a schematic structural diagram of an instruction sending device according to an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an instruction sending device according to another embodiment of the present disclosure;
Fig. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the present disclosure and are not to be construed as limiting the present disclosure. On the contrary, the embodiments of the disclosure include all alternatives, modifications, and equivalents as may be included within the spirit and scope of the appended claims.
Fig. 1 is a schematic diagram of the interaction in a method for sending an instruction according to an embodiment of the disclosure.
In the present application, three types of functional units are configured: floating point functional units, fixed point functional units, and memory access functional units. Three types of issue queues are provided to match the three types of functional units: floating point issue queues, fixed point issue queues, and memory access issue queues. Two register files are also provided: a floating point register file, which stores floating point data, and a fixed point register file, which stores fixed point data.
The processor assigns instructions to the corresponding issue queues based on their type. For example, the processor sends a floating point instruction (an instruction whose source operands are floating point data) to the floating point register file; the floating point register file then prepares the source operands stored in it that the floating point instruction requires, and once the source operands are prepared, the floating point instruction and the prepared source operands are sent to the floating point functional unit. The processor sends a fixed point instruction (an instruction whose source operands are fixed point data) to the fixed point register file; the fixed point register file then prepares the source operands stored in it that the fixed point instruction requires, and once the source operands are prepared, the fixed point instruction and the prepared source operands are sent to the fixed point functional unit. The processor sends a memory access instruction (whose source and destination operands may each be either floating point or fixed point data) to the memory access issue queue; the issue queue then sends the memory access instruction to the floating point register file and/or the fixed point register file, which in turn sends the memory access instruction and its source operands to the memory access functional unit. After a functional unit executes an instruction and obtains the result, it acquires the right to use the write-back bus through write-back arbitration and feeds the result back to the corresponding register file over the write-back bus. For example, if the destination operand of an instruction is fixed point, the result is written to the fixed point register file; if the destination operand is floating point, the result is written to the floating point register file.
Furthermore, a plurality of functional units of the same type may be provided, with one issue queue provided for each functional unit.
For example, 7 fixed point functional units may be provided, including: 4 fixed point arithmetic logic units (ALUs, supporting add/subtract, logic, branch, partial bit-extension instructions, etc.), 2 multiply/divide functional units, and 1 jump/control register operation/register transfer unit (i.e., an execution unit for JAL (jump and link), JALR (jump and link register), CSR (control and status register operation), and fixed-point-to-floating-point data transfer instructions). Correspondingly, 7 fixed point issue queues are provided, each corresponding to one fixed point functional unit.
6 floating point functional units may be provided, including: 4 floating point multiply-add units (FMAC) and 2 units for floating point division and other operations (including conversion between fixed point and floating point, floating point format conversion, etc.). Correspondingly, 6 floating point issue queues are provided.
8 memory access functional units may be provided, including 4 load units (Load) and 4 store units (Store). Correspondingly, 8 memory access issue queues are provided.
The number and classification of the functional units above, and the functions, classification and number of the corresponding issue queues, are only examples and do not limit the present application. The number and classification of the functional units and the issue queues may be set based on actual requirements.
Fig. 2 is a flow chart illustrating a method for sending an instruction according to an embodiment of the disclosure.
It should be noted that the method for sending an instruction in this embodiment is executed by an instruction sending device. The device may be implemented in software and/or hardware and may be configured in an electronic device, which may include, but is not limited to, a terminal, a server, and so on.
As shown in fig. 2, the method for sending the instruction includes:
Step 201: Select an instruction to be executed based on a preset allocation rule.
The allocation rule may be, for example, that the number of instructions sent to each functional unit at the same time is smaller than a preset threshold; the present application does not limit this.
In the present application, based on the order in which the instructions are stored, fewer than the preset threshold number of instructions are fetched for each functional unit and dispatched.
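As an illustration of such an allocation rule, the following is a minimal sketch in Python, not the patented implementation. The function name `select_for_dispatch`, the `(name, unit)` tuple format and the threshold value are hypothetical. The sketch also assumes that a later instruction may be selected when an earlier one is held back, which the text does not specify.

```python
from collections import Counter

def select_for_dispatch(pending, threshold):
    """Pick instructions in stored order so that the number selected for
    each functional unit stays strictly below `threshold` (hypothetical rule)."""
    chosen = []
    counts = Counter()
    for name, unit in pending:
        # Selecting this instruction must keep the per-unit count below the threshold.
        if counts[unit] + 1 < threshold:
            chosen.append(name)
            counts[unit] += 1
    return chosen

# Hypothetical pending instructions, listed in their stored order.
pending = [("add1", "alu0"), ("add2", "alu0"), ("add3", "alu0"), ("fmul1", "fmac0")]
selected = select_for_dispatch(pending, threshold=3)
print(selected)
```

With a threshold of 3, at most two instructions per functional unit are selected in one round, so `add3` waits for a later round.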
Step 202: In the case that the type of the instruction is the first type, the source operands required by the instruction are read from the register file, and the instruction and the source operands are sent to the first issue queue.
The first type is the type of instructions whose number of required source operands is greater than or equal to a threshold, such as the floating point instruction type, the fixed point instruction type, etc.
In the present application, different types of instructions correspond to different issue queues. For example, floating point instructions, fixed point instructions, memory access instructions, etc. each correspond to a different issue queue. The first issue queue is the queue corresponding to instructions of the first type.
When the processor dispatches instructions, instructions of different types may be dispatched to their corresponding issue queues for processing. In the case that the type of the dispatched instruction is the first type, the source operands required by the instruction may first be read from the corresponding register file. Then the instruction and the read source operands are sent to the first issue queue to be saved.
For example, when the instruction to be dispatched is a floating point instruction, the processor reads the source operands of the floating point instruction from the floating point register file, and then sends the floating point instruction and the read source operands to the floating point issue queue.
It should be noted that, when the physical register file (Physical Register File) is read to obtain the source operands before the issue queue sends the instruction to the functional unit, the number of register file read ports required is the dispatch width multiplied by the number of source operands per instruction, where the dispatch width is the maximum number of instructions sent to one issue queue. If the dispatch width is 6 and each instruction has 3 source operands, 6 × 3 = 18 read ports are required. When the physical register file is read to obtain the source operands after the issue queue sends the instruction to the functional unit, the number of register file read ports required is the sum of the numbers of source operands required by the instructions executed by each functional unit. Assuming that there are 12 functional units and that each instruction executed by each functional unit requires 3 source operands, 3 × 12 = 36 read ports are required. Because the processor has many functional units, the number of instructions dispatched by the dispatch unit in one beat is much smaller than the number of instructions executed by the functional units in one beat. Thus, in the case where the type of the instruction is the first type, fetching the source operands before sending the instruction to the first issue queue reduces the number of read ports of the register file.
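The read port arithmetic in this paragraph can be checked with a short Python sketch. The function names are hypothetical; the figures (dispatch width 6, 12 functional units, 3 source operands) are the ones used in the text.

```python
def ports_read_before_issue(dispatch_width, src_operands_per_instruction):
    """Read ports needed when operands are read before entering the issue queue."""
    return dispatch_width * src_operands_per_instruction

def ports_read_after_issue(src_operands_per_unit):
    """Read ports needed when operands are read after issue: one group of
    ports per functional unit, summed over all functional units."""
    return sum(src_operands_per_unit)

before = ports_read_before_issue(6, 3)    # dispatch width 6, 3 source operands
after = ports_read_after_issue([3] * 12)  # 12 functional units, 3 operands each
print(before, after)
```

Under these figures, reading before issue needs half as many read ports as reading after issue.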
Step 203: In the case that the type of the instruction is the second type, the instruction is sent to the second issue queue.
The second type is the type of instructions whose number of required source operands is less than the threshold, such as the memory access instruction type. The second issue queue is the queue corresponding to instructions of the second type.
For example, when the instruction to be dispatched is a memory access instruction, the processor sends the memory access instruction directly to the memory access issue queue.
After receiving the instruction, the second issue queue sends the instruction to the register file (Physical Register File) to fetch the source operands required by the instruction. The register file then sends the instruction and the source operands required by the instruction to the functional unit. The issue queue therefore does not need to store the source operands required by the instruction, which reduces the storage resources it occupies. In addition, the write-back data of the functional unit does not need to be written into the issue queue, which reduces power consumption.
In the present application, after an instruction to be executed is selected based on a preset allocation rule, the source operands required by the instruction are read from the register file and the instruction and the source operands are sent to the first issue queue in the case that the type of the instruction is the first type, and the instruction is sent to the second issue queue in the case that the type of the instruction is the second type. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and also reduces the storage space the issue queue needs for data.
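Steps 202 and 203 can be sketched as a small routing function. This is a minimal Python illustration under stated assumptions: the class names, the `kind` labels and the dictionary-based register file are hypothetical, and an operand that is not yet in the register file is represented by `None`.

```python
from dataclasses import dataclass, field

FIRST_TYPE = {"float", "fixed"}   # assumed first-type instruction kinds
SECOND_TYPE = {"mem"}             # assumed second-type instruction kinds

@dataclass
class Instruction:
    name: str
    kind: str
    src_regs: tuple

@dataclass
class Dispatcher:
    regfile: dict                                     # register number -> value
    first_queue: list = field(default_factory=list)   # holds (instruction, operands)
    second_queue: list = field(default_factory=list)  # holds instructions only

    def dispatch(self, inst):
        if inst.kind in FIRST_TYPE:
            # Step 202: read the operands first, then enqueue them with the instruction.
            operands = {r: self.regfile.get(r) for r in inst.src_regs}
            self.first_queue.append((inst, operands))
        elif inst.kind in SECOND_TYPE:
            # Step 203: enqueue the instruction alone; operands are read later.
            self.second_queue.append(inst)
        else:
            raise ValueError(f"unknown instruction kind: {inst.kind}")

d = Dispatcher(regfile={"p1": 10, "p2": 20})
d.dispatch(Instruction("fadd", "float", ("p1", "p2")))
d.dispatch(Instruction("ld", "mem", ("p1",)))
print(d.first_queue[0][1], d.second_queue[0].name)
```

Only entries of the first queue carry operand values, which is where the storage saving for the second queue comes from.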
Fig. 3 is a flow chart illustrating a method for sending an instruction according to another embodiment of the present disclosure.
As shown in fig. 3, the method for sending the instruction includes:
Step 301: The first issue queue receives an instruction of the first type and the source operands required by the instruction, and sends the instruction and all source operands to the functional unit after acquiring all source operands required by the instruction.
The first type is the type of instructions whose number of required source operands is greater than or equal to a threshold, such as the floating point instruction type, the fixed point instruction type, etc. The first issue queue is the queue corresponding to instructions of the first type.
In the present application, the first issue queue receives an instruction of the first type together with the source operands required by the instruction, and saves the instruction and its source operands. The first issue queue thus receives the source operands before it sends the instruction to the functional unit.
Then, when it is determined that the instruction is to be sent to its corresponding functional unit, all source operands required by the instruction may be obtained, and the instruction and all of its source operands may be sent to the functional unit, which executes the instruction to obtain a result.
In addition, if a source operand required by an instruction has not yet been computed by a preceding instruction, or has not yet been loaded into the register file, the processor cannot obtain the source operand from the register file and cannot send it to the first issue queue together with the instruction. In this case, the number of the register is written to the first issue queue and the state of the source operand is marked as currently unavailable (Non-available). When the first issue queue determines that a source operand required by an instruction has not been received, it may snoop the write-back bus, fetch the source operand and save it. After all source operands required by the instruction have been acquired, the instruction and all of its source operands are sent to the functional unit. Because the source operand is obtained by snooping the write-back bus, no register file read port is occupied; this reduces the number of register file read ports, raises the clock frequency of the processor, and reduces the area and power consumption of the processor.
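The snooping behaviour described above can be sketched as follows. This is a minimal Python illustration, not the patented circuit; the class name `FirstQueueEntry`, the register names and the use of `None` to mark a Non-available operand are assumptions.

```python
class FirstQueueEntry:
    """One first-issue-queue entry: an instruction plus its operand slots."""

    def __init__(self, name, operands):
        # operands maps register number -> value, or None when Non-available.
        self.name = name
        self.operands = dict(operands)

    def snoop(self, reg, value):
        """Capture a result seen on the write-back bus if this entry waits for it."""
        if reg in self.operands and self.operands[reg] is None:
            self.operands[reg] = value

    def ready(self):
        """The entry may be sent to the functional unit once every operand is present."""
        return all(v is not None for v in self.operands.values())

entry = FirstQueueEntry("fadd", {"p1": 1.5, "p7": None})
ready_before = entry.ready()
entry.snoop("p3", 9.0)   # unrelated write-back, ignored
entry.snoop("p7", 2.5)   # the missing operand arrives on the write-back bus
ready_after = entry.ready()
print(ready_before, ready_after)
```

The missing operand is filled in from the bus, so no register file read port is used for it.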
It should be noted that, when the physical register file (Physical Register File) is read to obtain the source operands before the issue queue sends the instruction to the functional unit, the number of register file read ports required is the dispatch width multiplied by the number of source operands per instruction, where the dispatch width is the maximum number of instructions sent to one issue queue. If the dispatch width is 6 and each instruction has 3 source operands, 6 × 3 = 18 read ports are required. When the physical register file is read to obtain the source operands after the issue queue sends the instruction to the functional unit, the number of register file read ports required is the sum of the numbers of source operands required by the instructions executed by each functional unit. Assuming that there are 12 functional units and that each instruction executed by each functional unit requires 3 source operands, 3 × 12 = 36 read ports are required. Because the processor has many functional units, the number of instructions dispatched by the dispatch unit in one beat is much smaller than the number of instructions executed by the functional units in one beat. Thus, in the case where the type of the instruction is the first type, fetching the source operands before sending the instruction to the first issue queue reduces the number of read ports of the register file.
Step 302: The second issue queue receives an instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit.
The second type is the type of instructions whose number of required source operands is less than the threshold, such as the memory access instruction type. The second issue queue is the queue corresponding to instructions of the second type.
In the present application, the second issue queue receives an instruction of the second type and saves it. When it is determined that the instruction is to be sent to its corresponding functional unit, the instruction may be sent directly to the functional unit; the source operands required by the instruction are then retrieved and sent to the functional unit.
Alternatively, when it is determined that the instruction is to be sent to its corresponding functional unit, the instruction may be sent to the register file, and the register file then sends the instruction and the source operands required by the instruction to the functional unit. In addition, in the case that a source operand required by the instruction is not stored in the register file, the second issue queue snoops the write-back bus to obtain the write-back state of the source operand, then obtains the source operand from the write-back bus based on that state and sends it to the functional unit corresponding to the instruction. Because the source operand is obtained by snooping the write-back bus, no register file read port is occupied; this reduces the number of register file read ports, raises the clock frequency of the processor, and reduces the area and power consumption of the processor.
It should be noted that, when the physical register file (Physical Register File) is read to obtain the source operands after the issue queue issues the instruction, the issue queue does not need to store the source operands required by the instruction, which reduces the storage resources it occupies. In addition, the write-back data of the functional unit does not need to be written into the issue queue, which reduces power consumption.
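The second issue queue's behaviour can be sketched in the same style. This is a minimal Python illustration under assumptions: queue entries hold only an instruction name and its source register numbers, the register file is a dictionary, and the function name `issue_second_type` is hypothetical. The write-back bus snooping path is left out.

```python
def issue_second_type(queue, regfile):
    """Issue the oldest queued instruction whose source registers are all
    present in the register file; operands are read only at this point.

    Returns (name, operand_values) or None when nothing can issue."""
    for i, (name, src_regs) in enumerate(queue):
        if all(r in regfile for r in src_regs):
            del queue[i]
            # The register file supplies the operands along with the instruction.
            return name, [regfile[r] for r in src_regs]
    return None

# Entries store register numbers only, never operand values.
queue = [("ld1", ("p9",)), ("st1", ("p1", "p2"))]
regfile = {"p1": 100, "p2": 7}
issued = issue_second_type(queue, regfile)
print(issued, queue)
```

Here `st1` issues ahead of `ld1` because `p9` is not yet in the register file, which matches the out-of-order issue described in the Background.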
Optionally, the first issue queue or the second issue queue may send instructions to the functional units based on the priorities of the instructions. The priorities of the instructions may be preconfigured in the system. This improves the efficiency of instruction execution.
In one possible implementation, the first issue queue arbitrates among instructions for the right to use the write-back bus according to their priorities, and sends an instruction to the functional unit after that instruction has obtained the right to use the write-back bus. Similarly, the second issue queue arbitrates among instructions for the right to use the write-back bus according to their priorities, and sends an instruction to the functional unit after that instruction has obtained the right to use the write-back bus.
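A priority-based arbitration of this kind can be sketched in a few lines of Python. The function name `arbitrate`, the `(priority, name)` format, the convention that a larger number means a higher priority, and the tie-break in favour of the earlier entry are all assumptions; the text only states that priorities are preconfigured.

```python
def arbitrate(candidates):
    """Grant the write-back bus to the highest-priority candidate.

    candidates: list of (priority, name); a larger number is assumed to mean
    a higher priority, and ties go to the earlier entry in the list."""
    if not candidates:
        return None
    # max() returns the first maximal element, which gives the tie-break above.
    return max(candidates, key=lambda c: c[0])[1]

winner = arbitrate([(1, "add_a"), (3, "mul_b"), (3, "div_c")])
print(winner)
```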
In the present application, the first issue queue receives an instruction of the first type and the source operands required by the instruction, and sends the instruction and all source operands to the functional unit after acquiring all source operands required by the instruction. The second issue queue receives an instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and also reduces the storage space the issue queue needs for data.
Fig. 4 is a flowchart of a method for sending an instruction according to another embodiment of the present disclosure.
As shown in fig. 4, the method for sending the instruction includes:
Step 401: The first issue queue receives an instruction of the first type. The first issue queue includes a write-back state table, and the write-back state table includes the number of write-back cycles corresponding to each issued instruction.
The first type is the type of instructions whose number of required source operands is greater than or equal to a threshold, such as the floating point instruction type, the fixed point instruction type, etc.
In the present application, a write-back state table may be preset in the first issue queue. As shown in Table 1, the write-back state table includes at least the number of write-back cycles corresponding to each issued instruction. The number of write-back cycles is the number of cycles remaining from the current moment until the issued instruction writes back its result.
TABLE 1
In addition, the write-back state table may also include a valid bit, a state, etc. The state may be sent (indicating that the corresponding instruction has been sent to the functional unit), written back (indicating that the result of the corresponding instruction has been received), etc. The valid bit is initialized to 0, indicating that the entry is invalid. After an instruction enters the issue queue, the valid bit corresponding to the instruction is set to 1, indicating that the entry is valid.
Step 402: After all source operands required by the instruction are acquired, a resource sharing instruction, which uses the same write-back bus as the instruction, is determined among the issued instructions in the write-back state table that have not yet been written back.
In the present application, the types of instructions handled by each write-back bus may be preset in the system. Whether two instructions use the same write-back bus can then be determined from the type of the instruction and the type of an issued instruction, and an issued instruction that uses the same write-back bus as the instruction is determined to be a resource sharing instruction.
For example, assume that all multiply instructions use the same write-back bus. If the current instruction A is a multiply instruction and the issued instruction B is also a multiply instruction, it can be determined that instruction B and instruction A use the same write-back bus, and instruction B is therefore a resource sharing instruction.
Step 403: in the case where the number of execution cycles of the instruction is the same as the number of write-back cycles of the resource-sharing instruction, sending of the instruction to the functional unit is suspended in the current clock cycle.
In the present application, the number of cycles that each type of instruction needs to complete (i.e., its number of execution cycles) is fixed. Before an instruction is sent to the functional unit, its number of execution cycles can be compared with the number of write-back cycles of the resource-sharing instruction, that is, the number of clock cycles remaining until the resource-sharing instruction writes back its result. When the two are the same, the instruction and the resource-sharing instruction would occupy the same write-back bus and the same write-back port of the register file at the same time, causing a write-back conflict. The instruction is therefore not sent to the functional unit in the current clock cycle; it waits and is sent in a later clock cycle, so that no conflict occurs at write-back.
Optionally, in the case where the number of execution cycles of the instruction differs from the number of write-back cycles of the resource-sharing instruction, the instruction and all source operands are sent to the functional unit. This ensures that instructions do not conflict at write-back.
Optionally, in each clock cycle, the number of write-back cycles of each issued instruction in the write-back state table that has not yet been written back is decremented by 1 until that instruction is written back. This keeps the write-back cycle counts accurate and thus ensures reliable instruction processing.
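The comparison in step 403 and the per-cycle decrement can be sketched together as below. This is a minimal software model under stated assumptions, not the hardware design: the class names `Entry` and `WritebackStateTable`, the method names, and the use of integers for buses are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    bus: int          # write-back bus the issued instruction will use
    cycles_left: int  # clock cycles remaining until it writes back


@dataclass
class WritebackStateTable:
    entries: list = field(default_factory=list)

    def conflicts(self, bus, exec_cycles):
        """True if an issued instruction on `bus` writes back in the same cycle."""
        return any(
            e.bus == bus and e.cycles_left == exec_cycles for e in self.entries
        )

    def try_issue(self, name, bus, exec_cycles):
        """Issue the instruction unless it would collide at write-back."""
        if self.conflicts(bus, exec_cycles):
            return False  # suspended this cycle; retry in a later cycle
        self.entries.append(Entry(name, bus, exec_cycles))
        return True

    def tick(self):
        """Advance one clock cycle: decrement counters, drop written-back entries."""
        for e in self.entries:
            e.cycles_left -= 1
        self.entries = [e for e in self.entries if e.cycles_left > 0]


table = WritebackStateTable()
print(table.try_issue("B", bus=1, exec_cycles=3))  # True
print(table.try_issue("A", bus=1, exec_cycles=3))  # False: same write-back cycle
table.tick()
print(table.try_issue("A", bus=1, exec_cycles=3))  # True: B now has 2 cycles left
```

Delaying instruction A by one cycle is enough here because instruction B's remaining count has dropped to 2 while A still needs 3 cycles.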
Step 404: the second issue queue receives an instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit.
In the present application, a write-back state table may also be preset in the second issue queue. Before an instruction is sent to the functional unit, the second issue queue may determine, among the issued instructions in the write-back state table that have not yet been written back, the resource-sharing instruction that uses the same write-back bus as the instruction. Then, in the case where the number of execution cycles of the instruction is the same as the number of write-back cycles of the resource-sharing instruction, sending of the instruction to the functional unit is suspended in the current clock cycle. In the case where the number of execution cycles of the instruction differs from the number of write-back cycles of the resource-sharing instruction, the instruction is sent to the functional unit. This ensures that instructions do not conflict at write-back.
In the present application, the first issue queue receives an instruction of the first type. The first issue queue includes a write-back state table, and the write-back state table includes the number of write-back cycles corresponding to each issued instruction. After all source operands required by the instruction have been acquired, the resource-sharing instruction that uses the same write-back bus as the instruction is determined among the issued instructions in the write-back state table that have not yet been written back. Then, in the case where the number of execution cycles of the instruction is the same as the number of write-back cycles of the resource-sharing instruction, sending of the instruction to the functional unit is suspended in the current clock cycle. The second issue queue receives an instruction of the second type and sends it to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit. Thus, before an instruction is sent to the functional unit, the write-back state table is used to determine whether the instruction to be sent would have a write-back conflict with an already issued instruction. Write-back conflicts are thereby avoided, which improves the efficiency and reliability of the system.
In one embodiment of the application, the first issue queue and the second issue queue each contain a write-back state table. The write-back state table records, for each write-back bus and each register file write-back port, the number of issued instructions to be written back in each clock cycle.
Based on this write-back state table, the first issue queue or the second issue queue sends an instruction to the functional unit as follows. The write-back state table is queried to determine the number of issued instructions that have not yet been written back and that will use, in a target clock cycle, the write-back bus and the register file write-back port used by the instruction, where the target clock cycle follows the current clock cycle by the number of execution cycles of the instruction. If that number is not zero, sending of the instruction to the functional unit is suspended in the current clock cycle. If that number is zero, the instruction is sent to the functional unit in the current clock cycle.
For example, assume that instruction A, instruction B, and instruction C all use write-back bus 1 and write-back port 1 of the register file. When instruction C is about to be sent, instruction A and instruction B have already been sent to the functional unit; instruction A is 2 clock cycles away from writing back its result, and instruction B is 1 clock cycle away. That is, write-back bus 1 and write-back port 1 will be occupied by instruction B in the first clock cycle after the current clock cycle and by instruction A in the second clock cycle after the current clock cycle. In the write-back state table, the number of issued instructions to be written back on write-back bus 1 and write-back port 1 is therefore 1 for the first clock cycle after the current clock cycle, 1 for the second clock cycle after the current clock cycle, and 0 for all other clock cycles. Assume that instruction C has 2 execution cycles, so that the target clock cycle is the second clock cycle after the current clock cycle. Before instruction C is sent, the write-back state table is queried, and the number of issued instructions that have not been written back on write-back bus 1 and write-back port 1 in the second clock cycle after the current clock cycle is found to be 1, which means that one issued instruction will write back data through write-back bus 1 and write-back port 1 in that clock cycle. Sending of instruction C to the functional unit is therefore suspended in the current clock cycle.
If the number of issued instructions is zero, write-back bus 1 and write-back port 1 are unoccupied in the second clock cycle after the current clock cycle, and instruction C may be sent to the functional unit in the current clock cycle. Write-back conflicts between instructions are thereby avoided.
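The per-cycle variant of the write-back state table can be modelled with a small sketch that replays the example of instructions A, B, and C. The class `OccupancyTable`, its method names, and the dictionary layout are assumptions made for illustration; the text does not describe how the table is stored.

```python
class OccupancyTable:
    """Pending write-back counts per (bus, port), indexed by cycles from now."""

    def __init__(self):
        # (bus, port, cycles_ahead) -> number of issued instructions that
        # will write back on that bus and port `cycles_ahead` cycles from now
        self.counts = {}

    def pending(self, bus, port, cycles_ahead):
        return self.counts.get((bus, port, cycles_ahead), 0)

    def try_issue(self, bus, port, exec_cycles):
        """Issue only if the target clock cycle is free on this bus and port."""
        if self.pending(bus, port, exec_cycles) != 0:
            return False  # suspended in the current clock cycle
        self.counts[(bus, port, exec_cycles)] = 1
        return True

    def tick(self):
        """Advance one clock cycle: every pending slot moves one cycle closer."""
        self.counts = {
            (bus, port, k - 1): n
            for (bus, port, k), n in self.counts.items()
            if k > 1
        }


table = OccupancyTable()
table.try_issue(bus=1, port=1, exec_cycles=2)         # instruction A
table.try_issue(bus=1, port=1, exec_cycles=1)         # instruction B
print(table.try_issue(bus=1, port=1, exec_cycles=2))  # instruction C: False
table.tick()
print(table.try_issue(bus=1, port=1, exec_cycles=2))  # instruction C: True
```

After one clock cycle, instruction A's slot has moved to one cycle ahead, so the slot two cycles ahead is free and instruction C can be sent.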
Fig. 5 is a schematic structural diagram of a device for sending an instruction according to an embodiment of the present disclosure.
As shown in fig. 5, the instruction sending apparatus 500 includes:
a selecting module 510, configured to select an instruction to be executed based on a preset allocation rule;
a sending module 520, configured to, when the type of the instruction is a first type, read the source operands required by the instruction from a register file and send the instruction and the source operands to a first issue queue;
the sending module 520 being further configured to send the instruction to a second issue queue when the type of the instruction is a second type.
In the present application, after an instruction to be executed is selected based on a preset allocation rule, the source operands required by the instruction are read from the register file and sent together with the instruction to the first issue queue if the instruction is of the first type, and the instruction alone is sent to the second issue queue if it is of the second type. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and, at the same time, the storage space the issue queues need for data.
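The routing performed by the sending module 520 can be sketched as follows. The claims define the first type as instructions whose number of required source operands is greater than or equal to a threshold; the threshold value `OPERAND_THRESHOLD = 3`, the dictionary-based instruction format, and the use of Python lists as queues are assumptions for illustration only.

```python
OPERAND_THRESHOLD = 3  # hypothetical value; the patent does not fix the threshold


def dispatch(instr, register_file, first_queue, second_queue):
    """Route one instruction to an issue queue according to its type.

    `instr` is a dict with "op" and "srcs" (a list of source register names).
    """
    if len(instr["srcs"]) >= OPERAND_THRESHOLD:
        # First type: read the operands now and store them with the
        # instruction; None marks a value that has not been produced yet.
        operands = {reg: register_file.get(reg) for reg in instr["srcs"]}
        first_queue.append((instr, operands))
    else:
        # Second type: only the instruction is queued; its operands are
        # read from the register file later, when the queue issues it.
        second_queue.append(instr)


register_file = {"r1": 5, "r2": 7}
first_queue, second_queue = [], []
dispatch({"op": "fma", "srcs": ["r1", "r2", "r3"]}, register_file, first_queue, second_queue)
dispatch({"op": "add", "srcs": ["r1", "r2"]}, register_file, first_queue, second_queue)
print(first_queue[0][1])  # {'r1': 5, 'r2': 7, 'r3': None}
print(len(second_queue))  # 1
```

Only first-type instructions consume register file read ports at dispatch in this sketch, and only the first queue stores operand data, which is the trade-off the paragraph above describes.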
Corresponding to the instruction sending method provided by the embodiments of fig. 1 and fig. 2, the present disclosure further provides an instruction sending device. Because this device corresponds to the method of those embodiments, the implementation of the instruction sending method also applies to the device and is not described again in detail here.
Fig. 6 is a schematic structural diagram of an instruction sending device according to an embodiment of the present disclosure.
As shown in fig. 6, the instruction sending apparatus 600 includes:
a sending module 610, configured so that a first issue queue receives an instruction of a first type and the source operands required by the instruction, and sends the instruction and all source operands to a functional unit after all source operands required by the instruction have been acquired;
the sending module 610 being further configured so that a second issue queue receives an instruction of a second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit.
In some embodiments of the present disclosure, the apparatus further comprises:
a first snooping module, configured so that, in the case that any source operand required by the instruction has not been received, the first issue queue snoops the write-back bus, acquires that source operand, and stores it.
In some embodiments of the present disclosure, the apparatus further comprises a second snooping module configured so that:
in the case that any source operand required by the instruction is not stored in the register file, the second issue queue snoops the write-back bus to acquire the write-back state of that source operand;
based on the write-back state, that source operand is acquired from the write-back bus and sent to the functional unit.
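The snooping behaviour of the first issue queue can be illustrated with a small sketch of a single queue entry. The class `QueueEntry`, the method names, and the representation of a write-back bus broadcast as a (destination register, value) pair are assumptions; the text does not specify the bus format.

```python
class QueueEntry:
    """First-queue entry that stores operand values and snoops for missing ones."""

    def __init__(self, op, operands):
        self.op = op
        # register name -> value, or None if the value has not arrived yet
        self.operands = dict(operands)

    def snoop(self, dest_reg, value):
        """Capture a write-back bus result if it is an operand still missing."""
        if dest_reg in self.operands and self.operands[dest_reg] is None:
            self.operands[dest_reg] = value

    def ready(self):
        """True once every required source operand has been acquired."""
        return all(v is not None for v in self.operands.values())


entry = QueueEntry("fma", {"r1": 5, "r2": 7, "r3": None})
print(entry.ready())   # False: r3 has not been produced yet
entry.snoop("r9", 1)   # unrelated result on the write-back bus, ignored
entry.snoop("r3", 42)  # the missing operand is written back
print(entry.ready())   # True
```

Once `ready()` holds, the entry has all source operands and would be eligible to be sent to the functional unit, subject to the write-back conflict check.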
In some embodiments of the present disclosure, the first issue queue and the second issue queue each include a write-back state table, the write-back state table includes the number of write-back cycles corresponding to each issued instruction, and the sending module 610 is configured to:
determine, among the issued instructions in the write-back state table that have not yet been written back, a resource-sharing instruction that uses the same write-back bus as the instruction;
suspend sending the instruction to the functional unit in the current clock cycle in the case where the number of execution cycles of the instruction is the same as the number of write-back cycles of the resource-sharing instruction.
In some embodiments of the present disclosure, the apparatus 600 further comprises an update module configured to:
decrement by 1, in each clock cycle, the number of write-back cycles of each issued instruction in the write-back state table that has not yet been written back, until that instruction is written back.
In some embodiments of the present disclosure, the first issue queue and the second issue queue each include a write-back state table, the write-back state table records, for each write-back bus and each register file write-back port, the number of issued instructions to be written back in each clock cycle, and the sending module 610 is configured to:
query the write-back state table and determine the number of issued instructions that have not yet been written back and that will use, in a target clock cycle, the write-back bus and the register file write-back port used by the instruction, where the target clock cycle follows the current clock cycle by the number of execution cycles of the instruction;
suspend sending the instruction to the functional unit in the current clock cycle in the case where that number is not zero.
In some embodiments of the present disclosure, the sending module 610 is configured to:
send instructions to the functional unit based on their priorities.
Corresponding to the instruction sending method provided by the embodiments of fig. 1 and fig. 3 to fig. 4, the present disclosure further provides an instruction sending device. Because this device corresponds to the method of those embodiments, the implementation of the instruction sending method also applies to the device and is not described again in detail here.
In the present application, the first issue queue receives an instruction of the first type and the source operands required by the instruction, and sends the instruction and all source operands to the functional unit after all source operands required by the instruction have been acquired. The second issue queue receives an instruction of the second type and sends it to the register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit. The source operands are thus read at different times for different instructions, which reduces the number of register file read ports used and, at the same time, the storage space the issue queues need for data.
In order to implement the above embodiments, the present disclosure further proposes an electronic device including: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the instruction sending method proposed by the foregoing embodiments of the present disclosure.
In order to implement the above embodiments, the present disclosure also proposes a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the instruction sending method proposed by the foregoing embodiments of the present disclosure.
In order to implement the above embodiments, the present disclosure also proposes a computer program product; when instructions in the computer program product are executed by a processor, the instruction sending method proposed by the foregoing embodiments of the present disclosure is performed.
Fig. 7 illustrates a block diagram of an exemplary electronic device suitable for use in implementing embodiments of the present disclosure. The electronic device 12 shown in fig. 7 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 7, the electronic device 12 is in the form of a general purpose computing device. Components of the electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (hereinafter: ISA) bus, the Micro Channel Architecture (hereinafter: MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (hereinafter: VESA) local bus, and the Peripheral Component Interconnect (hereinafter: PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
Memory 28 may include computer system readable media in the form of volatile memory, such as random access memory (Random Access Memory; hereinafter: RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard disk drive").
Although not shown in fig. 7, a magnetic disk drive for reading from and writing to a removable nonvolatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable nonvolatile optical disk (e.g., a compact disc read-only memory (hereinafter: CD-ROM), a digital video disc read-only memory (hereinafter: DVD-ROM), or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. Memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the various embodiments of the disclosure.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods in the embodiments described in this disclosure.
The electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), one or more devices that enable a user to interact with the electronic device 12, and/or any devices (e.g., network card, modem, etc.) that enable the electronic device 12 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 22. Also, the electronic device 12 may communicate with one or more networks, such as a local area network (Local Area Network; hereinafter: LAN), a wide area network (Wide Area Network; hereinafter: WAN), and/or a public network, such as the Internet, through the network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 over the bus 18. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example implementing the instruction sending method described in the foregoing embodiments.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
It should be noted that in the description of the present disclosure, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present disclosure in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present disclosure.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or part of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
Furthermore, each functional unit in the embodiments of the present disclosure may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented as software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present disclosure have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the present disclosure, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present disclosure.

Claims (11)

1. A method of transmitting an instruction, the method comprising:
a first issue queue receives an instruction of a first type and the source operands required by the instruction, and sends the instruction and all source operands to a functional unit after all source operands required by the instruction have been acquired, wherein, in the case that any source operand required by the instruction has not been received, the first issue queue snoops a write-back bus, acquires that source operand, and stores it, and wherein the first type corresponds to instructions for which the number of source operands required for processing is greater than or equal to a threshold;
a second issue queue receives an instruction of a second type and sends the instruction to a register file, after which the register file sends the instruction and the source operands required by the instruction to a functional unit, wherein the second type corresponds to instructions for which the number of source operands required for processing is smaller than the threshold.
2. The method of claim 1, wherein the method further comprises:
in the case that any source operand required by the instruction is not stored in the register file, the second issue queue snoops the write-back bus to acquire the write-back state of that source operand;
based on the write-back state, that source operand is acquired from the write-back bus and sent to the functional unit.
3. The method of any of claims 1-2, wherein the first issue queue and the second issue queue each include a write-back state table, the write-back state table includes the number of write-back cycles corresponding to each issued instruction, and sending the instruction to a functional unit includes:
determining, among the issued instructions in the write-back state table that have not yet been written back, a resource-sharing instruction that uses the same write-back bus as the instruction;
suspending sending the instruction to the functional unit in the current clock cycle in the case where the number of execution cycles of the instruction is the same as the number of write-back cycles of the resource-sharing instruction.
4. The method of claim 3, wherein the method further comprises:
decrementing by 1, in each clock cycle, the number of write-back cycles of each issued instruction in the write-back state table that has not yet been written back, until that instruction is written back.
5. The method of any of claims 1-2, wherein the first issue queue and the second issue queue each include a write-back state table, the write-back state table records, for each write-back bus and each register file write-back port, the number of issued instructions to be written back in each clock cycle, and sending the instruction to a functional unit includes:
querying the write-back state table, and determining the number of issued instructions that have not yet been written back and that will use, in a target clock cycle, the write-back bus and the register file write-back port used by the instruction, wherein the target clock cycle follows the current clock cycle by the number of execution cycles of the instruction;
suspending sending the instruction to the functional unit in the current clock cycle in the case where that number is not zero.
6. The method of any of claims 1-2, wherein sending the instruction to a functional unit comprises:
sending the instruction to the functional unit based on the priority of the instruction.
7. A method of transmitting an instruction, the method comprising:
selecting an instruction to be executed based on a preset allocation rule;
in the case that the type of the instruction is a first type, reading the source operands required by the instruction from a register file and sending the instruction and the source operands to a first issue queue, so that the first issue queue receives the instruction of the first type and the source operands required by the instruction and sends the instruction and all source operands to a functional unit after all source operands required by the instruction have been acquired, wherein, in the case that any source operand required by the instruction has not been received, the first issue queue snoops a write-back bus, acquires that source operand, and stores it, and wherein the first type corresponds to instructions for which the number of source operands required for processing is greater than or equal to a threshold;
in the case that the type of the instruction is a second type, sending the instruction to a second issue queue, so that the second issue queue receives the instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to a functional unit, wherein the second type corresponds to instructions for which the number of source operands required for processing is smaller than the threshold.
8. An apparatus for transmitting instructions, the apparatus comprising:
a selecting module, configured to select an instruction to be executed based on a preset allocation rule;
a sending module, configured to, in the case that the type of the instruction is a first type, read the source operands required by the instruction from a register file and send the instruction and the source operands to a first issue queue, so that the first issue queue receives the instruction of the first type and the source operands required by the instruction and sends the instruction and all source operands to a functional unit after all source operands required by the instruction have been acquired, wherein, in the case that any source operand required by the instruction has not been received, the first issue queue snoops a write-back bus, acquires that source operand, and stores it, and wherein the first type corresponds to instructions for which the number of source operands required for processing is greater than or equal to a threshold;
the sending module being further configured to, in the case that the type of the instruction is a second type, send the instruction to a second issue queue, so that the second issue queue receives the instruction of the second type and sends the instruction to the register file, after which the register file sends the instruction and the source operands required by the instruction to a functional unit, wherein the second type corresponds to instructions for which the number of source operands required for processing is smaller than the threshold.
9. An apparatus for transmitting instructions, the apparatus comprising:
a sending module, configured so that a first issue queue receives an instruction of a first type and the source operands required by the instruction and sends the instruction and all source operands to a functional unit after all source operands required by the instruction have been acquired, wherein, in the case that any source operand required by the instruction has not been received, the first issue queue snoops a write-back bus, acquires that source operand, and stores it, and wherein the first type corresponds to instructions for which the number of source operands required for processing is greater than or equal to a threshold;
the sending module being further configured so that a second issue queue receives an instruction of a second type and sends the instruction to a register file, after which the register file sends the instruction and the source operands required by the instruction to the functional unit, wherein the second type corresponds to instructions for which the number of source operands required for processing is smaller than the threshold.
10. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6 or to perform the method of claim 7.
11. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-6 or to perform the method of claim 7.
CN202410742201.2A 2024-06-11 2024-06-11 Method, device and equipment for sending instruction Active CN118312220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410742201.2A CN118312220B (en) 2024-06-11 2024-06-11 Method, device and equipment for sending instruction

Publications (2)

Publication Number Publication Date
CN118312220A (en) 2024-07-09
CN118312220B (en) 2024-08-30

Family

ID=91733825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410742201.2A Active CN118312220B (en) 2024-06-11 2024-06-11 Method, device and equipment for sending instruction

Country Status (1)

Country Link
CN (1) CN118312220B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104049945A (en) * 2013-03-15 2014-09-17 英特尔公司 Methods and apparatus for fusing instructions to provide or-test and and-test functionality on multiple test sources
CN113778522A (en) * 2021-09-13 2021-12-10 中国电子科技集团公司第五十八研究所 Instruction transmitting processing method in transmitting unit

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116243976A (en) * 2022-04-14 2023-06-09 海光信息技术股份有限公司 Instruction execution method and device, electronic equipment and storage medium
US20240020127A1 (en) * 2022-07-13 2024-01-18 Simplex Micro, Inc. Out-of-order execution of loop instructions in a microprocessor
CN118093024A (en) * 2024-03-22 2024-05-28 飞腾信息技术有限公司 Instruction dispatch method, dispatch device and related equipment

Also Published As

Publication number Publication date
CN118312220A (en) 2024-07-09

Similar Documents

Publication Publication Date Title
US7310722B2 (en) Across-thread out of order instruction dispatch in a multithreaded graphics processor
US10877766B2 (en) Embedded scheduling of hardware resources for hardware acceleration
US6189065B1 (en) Method and apparatus for interrupt load balancing for powerPC processors
CN111258935B (en) Data transmission device and method
EP1405184A2 (en) Data processing apparatus
US9304774B2 (en) Processor with a coprocessor having early access to not-yet issued instructions
JP2011238271A (en) Simulation of multi-port memory using memory having small number of ports
KR20100053593A (en) Mechanism for broadcasting system management interrupts to other processors in a computer system
US6154832A (en) Processor employing multiple register sets to eliminate interrupts
US9304775B1 (en) Dispatching of instructions for execution by heterogeneous processing engines
US8578387B1 (en) Dynamic load balancing of instructions for execution by heterogeneous processing engines
US20070074009A1 (en) Scalable parallel pipeline floating-point unit for vector processing
CN114153500A (en) Instruction scheduling method, instruction scheduling device, processor and storage medium
CN107870780B (en) Data processing apparatus and method
US20130212364A1 (en) Pre-scheduled replays of divergent operations
JP3431941B2 (en) Method and apparatus for determining instruction execution order in a data processing system
US6738837B1 (en) Digital system with split transaction memory access
CN118312220B (en) Method, device and equipment for sending instruction
US5765017A (en) Method and system in a data processing system for efficient management of an indication of a status of each of multiple registers
CN112559403A (en) Processor and interrupt controller therein
CN116991480A (en) Instruction processing method, device, circuit, transmitter, chip, medium and product
WO2021037124A1 (en) Task processing method and task processing device
CN114625422A (en) Area and power efficient mechanism to consolidate wake-up store-related loads based on store evictions
KR100237989B1 (en) Method and system for efficiently utilizing rename buffers to reduce dispatch unit stalls in a superscalar processor
JP2006285724A (en) Information processor and information processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant