CN117170750A - Multi-source operand instruction scheduling method, device, processor, equipment and medium - Google Patents

Multi-source operand instruction scheduling method, device, processor, equipment and medium

Publication number
CN117170750A
Authority
CN
China
Prior art keywords
instruction
source operand
source
execution unit
operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311135017.3A
Other languages
Chinese (zh)
Inventor
张稚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Original Assignee
Hexin Technology Co ltd
Shanghai Hexin Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hexin Technology Co ltd, Shanghai Hexin Digital Technology Co ltd
Priority to CN202311135017.3A
Publication of CN117170750A
Current legal status: Pending

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a scheduling method, apparatus, processor, device, and medium for multi-source operand instructions. The method comprises the following steps: in a first clock cycle of sending an instruction to an execution unit, the instruction scheduling unit reads a first source operand and a second source operand of a plurality of source operands from a register file; in a second clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the multi-source operand instruction, the first source operand, and the second source operand to the execution unit and, in the same cycle, reads a third source operand of the plurality of source operands from the register file; in a third clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit performs the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result. The multi-source operand instruction and the plurality of source operands are thereby transferred more stably within the processor.

Description

Multi-source operand instruction scheduling method, device, processor, equipment and medium
Technical Field
The present application relates to computer technologies, and in particular, to a method, an apparatus, a processor, a device, and a medium for scheduling multi-source operand instructions.
Background
Most computer processor instruction sets include multi-source operand instructions. A multi-source operand instruction is an instruction having more than two source operands. The operands are typically stored in a physical register file (Physical Register File, PRF). When the instruction dispatch unit (Instruction Schedule Unit, ISU) schedules an instruction for issue, it typically uses one clock cycle to read the source operands from the PRF and sends the source operands and the micro-instruction to the corresponding execution unit in the next clock cycle.
Because a multi-source operand instruction requires the ISU to have multiple operands ready at the same time, the ISU must read multiple source operands from the PRF simultaneously, and the PRF must therefore support multi-port reads and writes. However, whether the PRF is implemented with memory granules or with registers, supporting multi-port reads and writes requires considerably more resources (in some cases more than 100% extra) and worsens timing. In addition, issuing multi-source operand instructions requires more data bus paths between the dispatch unit and the execution unit, which further increases physical overhead.
Disclosure of Invention
The application provides a scheduling method, apparatus, processor, device, and medium for multi-source operand instructions, intended to solve the prior-art problems of register file access conflicts or data bus congestion, high physical resource overhead, and increased instruction scheduling time. The aim is to transfer the multi-source operand instruction and its source operands more stably within the processor, save resources such as data buses, and improve the execution speed and parallelism of multi-source operand instructions.
In one aspect, the present application provides a method of scheduling a multi-source operand instruction, the method comprising:
in a first clock cycle of sending an instruction to an execution unit, the instruction dispatch unit reads a first source operand and a second source operand of a plurality of source operands from a register file respectively;
in a second clock cycle of sending an instruction to an execution unit, the instruction dispatch unit sends a multi-source operand instruction, the first source operand, the second source operand to the execution unit, and synchronously reads a third source operand of the plurality of source operands from the register file;
in a third clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit executes operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
In an alternative embodiment, the instruction dispatch unit reads a first source operand and a second source operand of a plurality of source operands from a register file, respectively, and includes:
acquiring a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file;
reading the first source operand from an operand storage area in a first source operand register and the second source operand from an operand storage area in a second source operand register according to the plurality of source operand information;
reading a third source operand of a plurality of source operands from the register file, comprising:
the third source operand is read from an operand storage area in a third source operand register according to the plurality of source operand information.
In an alternative embodiment, the multi-source operand instruction includes: the instruction type mark is used for identifying the type of the multi-source operand instruction, and the operation code is used for identifying the type of the operation corresponding to the multi-source operand instruction;
the execution unit is used for executing operation corresponding to the multi-source operand instruction on the plurality of source operands based on the instruction type mark and the operation code to obtain the instruction operation result.
In an alternative embodiment, the instruction scheduling unit and the execution unit are both included in a core of a general purpose microprocessor, such that the instruction scheduling unit schedules the multi-source operand instruction in the core of the general purpose microprocessor, and the execution unit performs an operation corresponding to the multi-source operand instruction on the plurality of source operands in the core of the general purpose microprocessor.
In an alternative embodiment, the arithmetic operation corresponding to the multi-source operand instruction includes at least one of: fused multiply-add operations, three-operand addition operations.
In an alternative embodiment, the method further comprises:
in a second clock cycle of issuing an instruction to the execution unit, the instruction dispatch unit further reads a fourth source operand of a plurality of source operands from the register file;
and in a fourth clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends the fourth source operand to the execution unit through multiplexing a data bus of the second source operand, so that the execution unit executes operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
In another aspect, the present application provides a processor comprising:
an instruction scheduling unit for reading a first source operand and a second source operand of the plurality of source operands from the register file, respectively, in a first clock cycle of an instruction to the execution unit; in a second clock cycle of sending an instruction to an execution unit, sending a multi-source operand instruction, the first source operand, the second source operand to the execution unit, and synchronously reading a third source operand of the plurality of source operands from the register file; transmitting the third source operand to the execution unit through multiplexing the data bus of the first source operand in a third clock cycle of transmitting an instruction to the execution unit;
and the execution unit is used for executing operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
In an alternative embodiment, the instruction scheduling unit is further configured to obtain a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file; and reading the first source operand from an operand storage area in a first source operand register and the second source operand from an operand storage area in a second source operand register according to the plurality of source operand information; and reading a third source operand from an operand storage area in the third source operand register according to the plurality of source operand information.
In an alternative embodiment, the multi-source operand instruction includes: the instruction type mark is used for identifying the type of the multi-source operand instruction, and the operation code is used for identifying the type of the operation corresponding to the multi-source operand instruction;
the execution unit is further configured to execute operation corresponding to the multi-source operand instruction on the plurality of source operands based on the instruction type flag and the operation code, and obtain the instruction operation result.
In another aspect, the present application provides a scheduling apparatus for a multi-source operand instruction, the apparatus comprising:
a reading module, configured to cause the instruction scheduling unit, in a first clock cycle of sending an instruction to the execution unit, to read a first source operand and a second source operand of a plurality of source operands from the register file;
a first sending module, configured to cause the instruction scheduling unit, in a second clock cycle of sending an instruction to the execution unit, to send a multi-source operand instruction, the first source operand, and the second source operand to the execution unit and to synchronously read a third source operand of the plurality of source operands from the register file;
and a second sending module, configured to cause the instruction scheduling unit, in a third clock cycle of sending an instruction to the execution unit, to send the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit performs the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
In another aspect, the present application provides an electronic device, including: a processor and a memory connected with the processor; the memory stores computer-executable instructions; the processor executes the computer-executable instructions stored in the memory to implement the method as described in any one of the above.
In another aspect, the application provides a computer-readable storage medium having stored therein computer-executable instructions which, when executed by a processor, carry out the method described in any one of the above.
In another aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements any of the methods described above.
The application provides a scheduling method, a device, a processor, equipment and a medium of a multi-source operand instruction, wherein an instruction scheduling unit reads a first source operand and a second source operand in a plurality of source operands from a register file respectively in a first clock cycle of sending an instruction to an execution unit; in a second clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the multi-source operand instruction, the first source operand and the second source operand to the execution unit, and synchronously reads a third source operand in the plurality of source operands from the register file; in a third clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit executes operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
For a multi-source operand instruction, the first and second source operands can be read in the first clock cycle. In the second clock cycle the instruction begins to issue: the first and second source operands are sent while the third source operand is read synchronously. In the third clock cycle the third source operand is sent to the execution unit. None of this requires additional resource overhead. Meanwhile, the third source operand can multiplex the data bus of the first source operand, which further reduces physical resource overhead. This addresses the prior-art problems of register file access conflicts or data bus congestion, high physical resource overhead, and increased instruction scheduling time, so that the multi-source operand instruction and its source operands are transferred more stably within the processor, resources such as data buses are saved, and the execution speed and parallelism of multi-source operand instructions improve.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of a processor according to an embodiment of the present application;
FIG. 2 is a schematic diagram of an architecture of an alternative processor according to an embodiment of the present application;
FIG. 3 is a schematic waveform diagram of an alternative FMA using the method of the present application;
FIG. 4 is a flow chart of a method for scheduling multi-source operand instructions according to an embodiment of the present application;
FIG. 5 is a flow chart of an alternative method of scheduling multi-source operand instructions according to an embodiment of the present application;
FIG. 6 is a block diagram illustrating a multi-source operand instruction scheduler according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
First, the terms involved in the present application will be explained:
An operand is the entity on which an operator acts. It is a component of an expression that specifies the quantity operated on in an instruction and indicates the source of the data that the instruction's operation requires.
Multi-source operands (Multi-Source operands) generally means that one instruction has more than two source operands. For example, a fused multiply-accumulate (FMA, also known as floating-point multiply-accumulate) instruction requires at least three source operands.
The physical register file (Physical Register File, PRF) refers to system registers and the like, implemented with registers or memory granules, and is commonly used to store operation data.
A cache memory (cache) exploits the principle of locality to reduce the number of CPU accesses to main memory. In short, instructions and data that the CPU is accessing are likely to be accessed again later, as is the memory region near them. Therefore, when a region is accessed for the first time, it is copied into the cache, and later accesses to instructions or data in that region do not need to go to main memory.
An instruction cache (Icache) stores the instructions that the CPU needs to access.
A data cache (Dcache) stores the data that the CPU needs to access.
Most computer processor instruction sets include multi-source operand instructions. A multi-source operand instruction is an instruction having more than two source operands, e.g., three or four. The operands are typically stored in a physical register file (PRF). When the ISU schedules an instruction for issue, it typically uses one clock cycle to read the source operands from the PRF and sends the source operands and the micro-instruction to the corresponding execution unit in the next clock cycle.
Because a multi-source operand instruction requires the ISU to have multiple operands ready at the same time, the ISU must read three or four source operands from the PRF simultaneously, and the PRF must therefore support multi-port reads and writes. However, whether the PRF is implemented with memory granules or with registers, supporting multi-port reads and writes requires considerably more resources (in some cases more than 100% extra) and worsens timing. In addition, issuing multi-source operand instructions requires more data bus paths between the dispatch unit and the execution unit, which further increases physical overhead.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
The application provides a method for scheduling multi-source operand instructions that aims to solve the above technical problems in the prior art. The method may be applied to the processor architecture shown in FIG. 1. As shown in FIG. 1, the processor includes:
an instruction dispatch unit 101 for reading a first source operand and a second source operand of the plurality of source operands from the register file 100 in a first clock cycle of sending an instruction to the execution unit; in a second clock cycle of sending an instruction to the execution unit, sending a multi-source operand instruction, the first source operand, and the second source operand to the execution unit, and synchronously reading a third source operand of the plurality of source operands from the register file 100; and in a third clock cycle of sending an instruction to the execution unit, sending the third source operand to the execution unit by multiplexing the data bus of the first source operand.
And an execution unit 102, configured to execute operation corresponding to the multi-source operand instruction on the plurality of source operands, to obtain an instruction operation result.
In the alternative, the processor may be a microprocessor, such as a general purpose microprocessor.
In one example, the instruction scheduling unit and the execution unit are both included in a core of a general-purpose microprocessor, such that the instruction scheduling unit schedules the multi-source operand instruction in the core of the general-purpose microprocessor, and the execution unit performs an operation corresponding to the multi-source operand instruction on the plurality of source operands in the core of the general-purpose microprocessor.
In one example, a clock cycle in the embodiments of the present application is a clock cycle in which an instruction is sent to the execution unit. Every processor runs at a certain frequency, and that frequency determines the clock period. For example, at a frequency of 1 GHz the clock period is 1000 ps: the first clock cycle is the first 1000 ps, the second clock cycle is the next 1000 ps, and so on.
For example, if a multi-source operand instruction is to be executed in the processor, the instruction scheduling unit may read three source operands from the register file over two clock cycles, and the multi-source operand instruction and the source operands are sent to the execution unit within three clock cycles. The execution unit then performs the corresponding operation on the three source operands and writes the result into a target register.
Compared with the prior-art approach of reading all operands in one clock cycle and sending the multi-source operand instruction and all source operands to the execution unit in the next clock cycle, the embodiment of the present application is easier on timing, transfers data more stably, reduces physical resource overhead, and improves the execution speed and parallelism of multi-source operand instructions.
Multi-source operand instructions typically involve complex arithmetic such as fused multiply-add or three-operand addition. If all source operands and the instruction were sent to the execution unit in one clock cycle, the execution unit's input buffers could overflow or the data bus could become congested, increasing the latency of the execution unit. Instead, the first and second source operands can be read in the first clock cycle. In the second clock cycle the instruction begins to issue: the first and second source operands are sent while the third source operand is read synchronously. In the third clock cycle the third source operand is sent to the execution unit. None of this requires additional resource overhead. Meanwhile, the third source operand can multiplex the data bus of the first source operand, which further reduces physical resource overhead.
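The relationship between frequency and clock period in the example above can be checked with a short calculation. The following is a minimal sketch; the function names `clock_period_ps` and `cycle_window_ps` are illustrative and do not come from the application.

```python
def clock_period_ps(frequency_hz: float) -> float:
    """Return the clock period in picoseconds for a frequency given in hertz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    # One second is 1e12 picoseconds, so period = 1e12 / frequency.
    return 1e12 / frequency_hz


def cycle_window_ps(cycle_index: int, frequency_hz: float) -> tuple:
    """Return the (start, end) time in ps of the 1-based cycle_index-th cycle."""
    period = clock_period_ps(frequency_hz)
    return ((cycle_index - 1) * period, cycle_index * period)


if __name__ == "__main__":
    print(clock_period_ps(1e9))      # period at 1 GHz
    print(cycle_window_ps(2, 1e9))   # time window of the second clock cycle
```

At 1 GHz this gives a 1000 ps period, with the second clock cycle spanning 1000 ps to 2000 ps, matching the example in the text.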
In the embodiment of the application, if three or more source operands need to be read from the register file by the multi-source operand instruction, access conflict of the register file or congestion of a data bus may be caused due to reading all the source operands in one clock period, so that the instruction scheduling time is increased. By reading three source operands in two clock cycles, the problems can be avoided, and the efficiency of instruction scheduling is improved.
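The effect of spreading the reads over two cycles can be illustrated by counting register-file reads per cycle. The sketch below is a simplified model under stated assumptions: the tuple layout, the labels `src1` to `src3`, and the function names are hypothetical, and the single-cycle baseline is a plain reading of the prior-art description rather than a model of any specific processor.

```python
from collections import defaultdict


def staggered_schedule():
    """Three-operand schedule described in the text, as (cycle, action, item)."""
    return [
        (1, "read", "src1"),
        (1, "read", "src2"),
        (2, "send", "instruction"),
        (2, "send", "src1"),
        (2, "send", "src2"),
        (2, "read", "src3"),   # read overlaps with the first send
        (3, "send", "src3"),
    ]


def single_cycle_schedule():
    """Assumed prior-art baseline: read everything, then send everything."""
    return [
        (1, "read", "src1"),
        (1, "read", "src2"),
        (1, "read", "src3"),
        (2, "send", "instruction"),
        (2, "send", "src1"),
        (2, "send", "src2"),
        (2, "send", "src3"),
    ]


def peak_per_cycle(schedule, action, ignore=()):
    """Largest number of `action` events that fall in any single cycle."""
    counts = defaultdict(int)
    for cycle, act, item in schedule:
        if act == action and item not in ignore:
            counts[cycle] += 1
    return max(counts.values())


if __name__ == "__main__":
    for name, sched in [("staggered", staggered_schedule()),
                        ("single-cycle", single_cycle_schedule())]:
        reads = peak_per_cycle(sched, "read")
        sends = peak_per_cycle(sched, "send", ignore=("instruction",))
        print(name, "peak reads:", reads, "peak operand sends:", sends)
```

In this model the staggered schedule never needs more than two register-file reads or two operand transfers in one cycle, whereas the baseline needs three of each.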
FIG. 2 is a schematic diagram of an alternative processor architecture according to an embodiment of the present application. As shown in FIG. 2, the instruction cache stores the instructions that the processor's CPU needs to access. The instruction fetch and decode unit is a functional unit within the processor that accesses the instruction cache, fetches instructions, and decodes them, translating instruction-set instructions into micro-instructions that the processor can recognize and execute.
As shown in fig. 2, the instruction scheduling unit may read a plurality of source operands from the register file, for example, in a first clock cycle of sending an instruction to the execution unit, and the instruction scheduling unit may read a first source operand and a second source operand in the plurality of source operands from the register file respectively, so that efficiency of instruction scheduling may be improved, and instruction delay may be reduced; in a second clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends a multi-source operand instruction, the first source operand and the second source operand to the execution unit, and reads a third source operand in the plurality of source operands from the register file synchronously, so that the quick transmission and execution of the multi-source operand instruction can be realized, and the performance of the processor is improved.
In a general-purpose microprocessor the data bus is a scarce and important resource. If each source operand occupied a separate data bus, data bus resources could run short or data bus collisions could occur. As shown in FIG. 2, in the third clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the third source operand (source operand 3) to the execution unit by multiplexing the data bus of the first source operand (source operand 1). Reusing a data bus that has already been used saves data bus resources and reduces the risk of data bus collisions.
In FIG. 2, a multi-source operand instruction is placed in the reorder buffer queue when it enters the ISU. After execution completes, its state in the reorder buffer is updated to indicate that the instruction may be retired, although it must wait until all earlier instructions have retired. The reorder buffer can be regarded as a first-in-first-out (FIFO) queue: multi-source operand instructions are enqueued at issue and dequeued at retirement, so that instructions commit in order.
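The bus reuse described above can be sketched as a small occupancy model. This is an illustration only: the `DataBus` class, the bus names, and the rule that a bus carries one value per cycle are assumptions made for the example, not details taken from the application.

```python
class DataBus:
    """One operand bus; assumed to carry at most one value per clock cycle."""

    def __init__(self, name):
        self.name = name
        self.traffic = {}  # cycle number -> (operand label, value)

    def drive(self, cycle, label, value):
        """Place a value on the bus in the given cycle, rejecting collisions."""
        if cycle in self.traffic:
            raise RuntimeError(f"{self.name} is already busy in cycle {cycle}")
        self.traffic[cycle] = (label, value)


def send_three_operands(src1, src2, src3):
    """Send three operands over two buses, following the cycles in the text."""
    bus1 = DataBus("bus1")
    bus2 = DataBus("bus2")
    bus1.drive(2, "src1", src1)   # second cycle: operand 1 on bus 1
    bus2.drive(2, "src2", src2)   # second cycle: operand 2 on bus 2
    bus1.drive(3, "src3", src3)   # third cycle: operand 3 reuses bus 1
    return bus1, bus2


if __name__ == "__main__":
    b1, b2 = send_three_operands(1.5, 2.0, 3.0)
    print(b1.name, b1.traffic)
    print(b2.name, b2.traffic)
```

Because the third operand travels one cycle after the first, the two never occupy the first bus at the same time, so no third bus is needed in this model.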
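The in-order retirement behavior of the reorder buffer can be sketched with a FIFO queue. The class below is a minimal illustration; the method names `issue`, `mark_done`, and `retire` and the use of string tags are hypothetical, and a real reorder buffer tracks far more state.

```python
from collections import deque


class ReorderBuffer:
    """Toy FIFO reorder buffer: enqueue at issue, dequeue at retirement."""

    def __init__(self):
        self._queue = deque()  # each entry is [tag, done_flag]

    def issue(self, tag):
        """Enqueue an instruction when it is issued."""
        self._queue.append([tag, False])

    def mark_done(self, tag):
        """Record that an instruction has finished executing."""
        for entry in self._queue:
            if entry[0] == tag:
                entry[1] = True
                return
        raise KeyError(tag)

    def retire(self):
        """Retire finished instructions from the head only, oldest first."""
        retired = []
        while self._queue and self._queue[0][1]:
            retired.append(self._queue.popleft()[0])
        return retired


if __name__ == "__main__":
    rob = ReorderBuffer()
    rob.issue("add")
    rob.issue("fma")
    rob.mark_done("fma")   # the younger instruction finishes first
    print(rob.retire())    # nothing retires while "add" is still pending
    rob.mark_done("add")
    print(rob.retire())    # both retire, in issue order
```

An instruction that finishes early stays in the queue until every older instruction has retired, which is what keeps commits in program order.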
Furthermore, by sending the multi-source operand instruction and the plurality of source operands to the execution unit, the execution unit may perform the corresponding operation on the received plurality of source operands and write the operation result to the destination register. According to the embodiment of the application, the processor can complete the dispatching and execution of the multi-source operand instruction, so that the performance of the processor is improved.
For a multi-source operand instruction, the first and second source operands can be read in the first clock cycle. In the second clock cycle the instruction begins to issue: the first and second source operands are sent while the third source operand is read synchronously. In the third clock cycle the third source operand is sent to the execution unit. None of this requires additional resource overhead. Meanwhile, the third source operand can multiplex the data bus of the first source operand, which further reduces physical resource overhead.
This addresses the prior-art problems of register file access conflicts or data bus congestion, high physical resource overhead, and increased instruction scheduling time. As a result, the multi-source operand instruction and its source operands can be transferred stably at a higher operating frequency, resources such as data buses are saved, and the execution speed and parallelism of multi-source operand instructions improve. Furthermore, in the embodiment of the application, the multi-source operand instruction and its source operands are sent flexibly and efficiently within the processor core, so support for multi-source operand instructions is achieved with few changes.
In an optional embodiment, the instruction scheduling unit is further configured to obtain a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file; reading the first source operand from an operand storage area in a first source operand register and the second source operand from an operand storage area in a second source operand register according to the plurality of source operand information; and reading the third source operand from an operand storage area in a third source operand register based on the plurality of source operand information.
Optionally, in one example, the source operand information area stores information such as the address, type, and length of a source operand. This information is generated from the register number in the instruction decoding stage. By obtaining it from the source operand information area, the location and attributes of the source operand in the register file can be determined quickly, which speeds up register file access. For example, if the address of the first source operand is R1, the type is integer, and the length is 32 bits, this information can be retrieved from the source operand information area, and the 32-bit integer value in the R1 register is then read from the register file as the first source operand.
Optionally, in another example, the operand storage area stores the actual value of the source operand. Values are written into the corresponding registers in the instruction execution stage according to operation results or data transfer results. Reading the value from the operand storage area according to the source operand information ensures the integrity and correctness of the data and avoids data loss or errors. For example, if the first source operand is a 32-bit integer value in the R1 register, the value can be read from the operand storage area in the R1 register as the first source operand.
By adopting the instruction scheduling unit to acquire a plurality of source operand information from the source operand information areas of a plurality of source operand registers in the register file, reading a first source operand from the operand storage area in the first source operand register according to the plurality of source operand information, reading a second source operand from the operand storage area in the second source operand register, and reading a third source operand from the operand storage area in the third source operand register according to the plurality of source operand information, the plurality of source operands can be quickly positioned and accurately read, and the access efficiency of the register file and the integrity and the correctness of data can be improved.
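The two-step lookup described above (consult the source operand information area, then read the operand storage area) can be sketched as follows. This is a minimal illustrative model, not the patented hardware: the class names `SourceOperandInfo` and `Register`, the function `read_operand`, and the width masking are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class SourceOperandInfo:
    """Hypothetical contents of a register's source operand information area."""
    address: str      # register name, e.g. "R1"
    kind: str         # operand type, e.g. "integer"
    length_bits: int  # operand length, e.g. 32


@dataclass
class Register:
    info: SourceOperandInfo  # source operand information area
    value: int               # operand storage area


def read_operand(register_file: dict, name: str) -> int:
    """Consult the information area first, then read the stored value."""
    reg = register_file[name]
    info = reg.info
    # Assumption: the declared length selects how many low bits are returned.
    mask = (1 << info.length_bits) - 1
    return reg.value & mask


register_file = {
    "R1": Register(SourceOperandInfo("R1", "integer", 32), 7),
    "R2": Register(SourceOperandInfo("R2", "integer", 32), 5),
    "R3": Register(SourceOperandInfo("R3", "integer", 32), 3),
}
operands = [read_operand(register_file, r) for r in ("R1", "R2", "R3")]
```

Here `operands` holds the first, second, and third source operands in order; in the described scheme the first two would be read in the first clock cycle and the third in the second.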
In an alternative embodiment, the multi-source operand instruction includes: the instruction type mark is used for identifying the type of the multi-source operand instruction, and the operation code is used for identifying the type of the operation corresponding to the multi-source operand instruction.
The instruction type flag is a field that identifies the type of the multi-source operand instruction, such as integer, floating point, or vector. The type determines the format and length of the multi-source operand instruction as well as the kind of data bus and execution unit required, so the flag makes it possible to determine whether an instruction is a multi-source operand instruction. The opcode is a field that identifies the type of operation to which the multi-source operand instruction corresponds, such as addition, multiplication, or fused multiply-add; it determines the function and priority of the multi-source operand instruction as well as the execution unit required.
FIG. 3 is a schematic waveform diagram of a fused multiply-add (FMA) instruction that uses the multi-source operand instruction provided by the embodiments of the present application. Here, clk is the clock signal; FMA_req is the request signal of the fused multiply-add instruction; and multi-src-valid is the valid signal for the multiple source operands (the signal name is determined by the design implementation and is only illustrative here). When this signal is asserted, the currently transmitted source operand and instruction encoding are valid. uops denotes micro-operations (micro-instructions): an instruction is translated by the decoder into micro-instructions so that it can be executed correctly in the processor, and in the present embodiment the micro-instruction refers to the multi-source operand instruction.
In addition, dbus1-src1 indicates that the first source data, data1, is transmitted through data bus 1; dbus1-src2 indicates that the second source data, data2, is transmitted through data bus 1; and the third source data, data3, is likewise transmitted through data bus 1. The opcode (operation code) is the micro-instruction encoding; its bit width and encoding scheme differ from design to design.
By including the instruction type flag and the operation code in the multi-source operand instruction, the multi-source operand instruction can be rapidly identified and classified in the instruction decoding stage, which facilitates subsequent instruction scheduling and execution and improves their efficiency. For example, in the embodiment of the application, the instruction type flag is valid (the signal is set from 0 to 1) in the first, second, and third clock cycles and invalid (the signal is set from 1 to 0) in the fourth clock cycle.
Optionally, in another exemplary embodiment, the execution unit is further configured to execute an operation corresponding to the multi-source operand instruction on the plurality of source operands based on the instruction type flag and the operation code, to obtain the instruction operation result.
The execution unit is the component that performs the arithmetic on the plurality of source operands, for example an adder, a multiplier, or a fused multiply-adder, and it can apply different operation methods for different operation types. In the embodiment of the application, the execution unit selects the appropriate operation method based on the instruction type flag and the operation code, executes the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain the instruction operation result, and writes the result into the target register. This realizes accurate execution of the multi-source operand instruction and improves the performance of the processor.
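As a rough software analogy of this selection step, the sketch below dispatches on an instruction type flag and an opcode. The string encodings `"multi_source"`, `"fma"`, and `"add3"`, and the function name `execute`, are hypothetical; the patent leaves the actual bit widths and encodings to the design.

```python
# Hypothetical opcode table: each entry maps an opcode to the operation
# applied to the three source operands.
OPERATIONS = {
    "fma": lambda a, b, c: a * b + c,   # fused multiply-add: A * B + C
    "add3": lambda a, b, c: a + b + c,  # three-operand addition
}


def execute(instr_type: str, opcode: str, operands: list) -> int:
    """Select the operation from the type flag and opcode, then apply it."""
    if instr_type != "multi_source":
        raise ValueError("not a multi-source operand instruction")
    if opcode not in OPERATIONS:
        raise ValueError(f"unsupported opcode: {opcode}")
    a, b, c = operands
    return OPERATIONS[opcode](a, b, c)


fma_result = execute("multi_source", "fma", [2, 3, 4])
add3_result = execute("multi_source", "add3", [1, 2, 3])
```

A table-driven dispatch is used here only because it keeps the mapping from opcode to operation explicit; a hardware execution unit would decode the same fields with combinational logic.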
The embodiment of the present application also provides an embodiment of a method for scheduling a multi-source operand instruction, and fig. 4 is a schematic flow diagram of the method for scheduling a multi-source operand instruction provided by the embodiment of the present application, as shown in fig. 4, where the method includes:
S401, in a first clock cycle of sending an instruction to an execution unit, an instruction scheduling unit reads a first source operand and a second source operand in a plurality of source operands from a register file respectively.
S402, in a second clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends a multi-source operand instruction, a first source operand, a second source operand to the execution unit, and synchronously reads a third source operand of the plurality of source operands from the register file.
S403, in the third clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit executes the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain the instruction operation result.
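Steps S401 to S403 can be traced cycle by cycle with a small model. This is a sketch under stated assumptions: the function name `schedule_three_operand`, the log layout, and the bus labels `bus1` and `bus2` are introduced here for illustration, and the register file is modelled as a plain dictionary.

```python
def schedule_three_operand(instr: str, regs: dict, srcs: tuple) -> list:
    """Return one log entry per clock cycle for a three-source instruction."""
    s1, s2, s3 = srcs
    log = []

    # Cycle 1 (S401): read the first and second source operands.
    v1, v2 = regs[s1], regs[s2]
    log.append({"cycle": 1, "read": [s1, s2],
                "instr": None, "bus1": None, "bus2": None})

    # Cycle 2 (S402): send the instruction with operands 1 and 2,
    # and read the third source operand in the same cycle.
    v3 = regs[s3]
    log.append({"cycle": 2, "read": [s3],
                "instr": instr, "bus1": v1, "bus2": v2})

    # Cycle 3 (S403): send operand 3 by reusing the bus of operand 1.
    log.append({"cycle": 3, "read": [],
                "instr": None, "bus1": v3, "bus2": None})
    return log


trace = schedule_three_operand("FMA", {"R1": 2, "R2": 3, "R3": 4},
                               ("R1", "R2", "R3"))
for entry in trace:
    print(entry)
```

The trace makes the resource usage visible: at most two register reads and two bus transfers occur in any cycle, and no third data bus is needed.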
The method for scheduling multi-source operand instruction provided by the application can be applied to the processor shown in fig. 1, and optionally, the processor can be a microprocessor, for example, a general-purpose microprocessor.
For example, in the embodiment of the present application, if a multi-source operand instruction needs to be executed in a processor, the instruction scheduling unit may read the three source operands from the register file over two clock cycles and complete sending the multi-source operand instruction and the source operands to the execution unit by the third clock cycle. The execution unit then performs the corresponding operation on the three source operands and writes the operation result into the target register.
In this way, compared with the prior-art approach of reading all of the operands in one clock cycle and sending the multi-source operand instruction and the plurality of source operands to the execution unit in the next clock cycle, the embodiment of the application allows the transmission of the multi-source operand instruction and the plurality of source operands within the processor to run stably at a higher frequency, reduces the physical resource overhead, and improves the execution speed and parallelism of the multi-source operand instruction.
Since multi-source operand instructions typically involve complex arithmetic operations, such as fused multiply-add and three-operand addition, sending the instruction and all source operands to the execution unit in one clock cycle may overflow the input buffers of the execution unit or congest the data bus, thereby increasing the latency of the execution unit. Taking three source operands as an example, the first and second source operands may be read in the first clock cycle; the instruction and the first and second source operands may be sent starting in the second clock cycle while the third source operand is read synchronously; and the third source operand is sent to the execution unit in the third clock cycle, without additional resource overhead. Meanwhile, the third source operand can multiplex the data bus of the first source operand, further saving physical resource overhead.
In the embodiment of the application, if three or more source operands need to be read from the register file by the multi-source operand instruction, access conflict of the register file or congestion of a data bus may be caused due to reading all the source operands in one clock period, so that the instruction scheduling time is increased. By reading a plurality of source operands in two clock cycles, the problems can be avoided, and the efficiency of instruction scheduling can be improved.
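The effect of spreading the reads over two cycles can be illustrated with a simple grouping calculation. The sketch assumes that two source operands are read per clock cycle, which matches S401 and S402 but is otherwise a hypothetical parameter (`reads_per_cycle`); the function name `read_plan` is also introduced here.

```python
import math


def read_plan(operands: list, reads_per_cycle: int = 2) -> list:
    """Group operand reads so no cycle exceeds the assumed read capacity."""
    cycles = math.ceil(len(operands) / reads_per_cycle)
    return [operands[i * reads_per_cycle:(i + 1) * reads_per_cycle]
            for i in range(cycles)]


plan = read_plan(["src1", "src2", "src3"])
print(plan)  # [['src1', 'src2'], ['src3']]
```

With three operands the plan needs two read cycles, and the second read cycle overlaps with the cycle in which the instruction and the first two operands are sent.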
For example, in the first clock cycle of sending an instruction to the execution unit, the instruction scheduling unit may read the first source operand and the second source operand of the plurality of source operands from the register file, which improves the efficiency of instruction scheduling and reduces instruction delay. In the second clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the multi-source operand instruction, the first source operand, and the second source operand to the execution unit and synchronously reads the third source operand of the plurality of source operands from the register file, so that the multi-source operand instruction is transmitted and executed quickly and the performance of the processor is improved.
In general-purpose microprocessors the data bus is a scarce and important resource. If each source operand occupied a separate data bus for transmission, data bus resources might be insufficient or data bus collisions might occur. In the embodiment of the application, in the third clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand. Reusing a data bus that is already in place saves data bus resources and reduces the risk of data bus collisions.
Furthermore, by sending the multi-source operand instruction and the plurality of source operands to the execution unit, the execution unit may perform the corresponding operation on the received plurality of source operands and write the operation result to the destination register. According to the embodiment of the application, the processor can complete the dispatching and execution of the multi-source operand instruction, so that the performance of the processor is improved.
For multi-source operand instructions, the first and second source operands may be read in the first clock cycle; the instruction and the first and second source operands may be sent starting in the second clock cycle while the third source operand is read synchronously; and the third source operand is sent to the execution unit in the third clock cycle, without additional resource overhead. Meanwhile, the third source operand can multiplex the data bus of the first source operand, further saving physical resource overhead.
Therefore, the problems in the prior art of register file access conflicts, data bus congestion, high physical resource overhead, and increased instruction scheduling time can be solved. The technical effects achieved are that the transmission of the multi-source operand instruction and the plurality of source operands within the processor can run stably at a higher frequency, resource overhead such as data buses is saved, and the execution speed and parallelism of the multi-source operand instruction are improved. Furthermore, in the embodiment of the application, the multi-source operand instruction and the plurality of source operands are sent flexibly and efficiently within the processor core, and support for multi-source operand instructions is achieved with few changes.
By including the instruction type flag and the operation code (opcode) in the multi-source operand instruction, the multi-source operand instruction can be rapidly identified and classified in the instruction decoding stage, which facilitates subsequent instruction scheduling and execution and improves their efficiency. For example, in the embodiment of the application, the instruction type flag is valid in the first, second, and third clock cycles and invalid in the fourth clock cycle.
According to the embodiment of the application, the execution unit executes the operation corresponding to the multi-source operand instruction on the plurality of source operands based on the instruction type mark and the operation code to obtain the instruction operation result, so that the accurate execution of the multi-source operand instruction can be realized, and the performance of the processor is improved.
In an alternative embodiment, the multi-source operand instruction has more than two source operands. These generally include a first source operand, a second source operand, and a third source operand, and may further include a fourth source operand. In the embodiment of the present application, the fourth source operand may multiplex the data bus of the second source operand to save physical resource overhead. Specifically, the method further includes:
the instruction dispatch unit further reads a fourth source operand of the plurality of source operands from the register file during a second clock cycle of issuing an instruction to the execution unit;
in a fourth clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends the fourth source operand to the execution unit by multiplexing the data bus of the second source operand, so that the execution unit executes operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain the instruction operation result.
In the embodiment of the application, in the fourth clock cycle of sending the instruction to the execution unit, the instruction scheduling unit sends the fourth source operand to the execution unit by multiplexing the data bus of the second source operand. Reusing a data bus that is already in place saves data bus resources and reduces the risk of data bus collisions. For multi-source operand instructions, the first and second source operands may be read in the first clock cycle; the instruction and the first and second source operands may be sent starting in the second clock cycle while the third source operand is read synchronously; the third source operand is sent to the execution unit in the third clock cycle; the fourth source operand may be read in the second or third clock cycle; and in the fourth clock cycle the instruction scheduling unit sends the fourth source operand to the execution unit by multiplexing the data bus of the second source operand. In this way, the third source operand multiplexes the data bus of the first source operand and the fourth source operand multiplexes the data bus of the second source operand, further saving physical resource overhead.
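The four-operand send schedule described above can be written down as a table and checked for bus conflicts. In this sketch the bus labels `dbus1` and `dbus2` and the helper `check_plan` are illustrative assumptions; only the cycle numbers and the pairing of operands to shared buses come from the description.

```python
from collections import Counter

# (operand, send cycle, data bus) following the described schedule:
# src3 reuses the bus of src1, and src4 reuses the bus of src2.
SEND_PLAN = [
    ("src1", 2, "dbus1"),
    ("src2", 2, "dbus2"),
    ("src3", 3, "dbus1"),
    ("src4", 4, "dbus2"),
]


def check_plan(plan: list) -> tuple:
    """Return (conflicting (cycle, bus) pairs, sorted list of buses used)."""
    usage = Counter((cycle, bus) for _, cycle, bus in plan)
    conflicts = [key for key, count in usage.items() if count > 1]
    buses = sorted({bus for _, _, bus in plan})
    return conflicts, buses


conflicts, buses = check_plan(SEND_PLAN)
print(conflicts, buses)  # [] ['dbus1', 'dbus2']
```

Four operands are delivered over two buses with no bus carrying two operands in the same cycle, whereas a dedicated bus per operand would require four.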
In an alternative embodiment, the instruction scheduling unit and the execution unit are both included in a core of a general-purpose microprocessor, such that the instruction scheduling unit schedules the multi-source operand instruction in the core of the general-purpose microprocessor, and the execution unit performs an operation corresponding to the multi-source operand instruction on the plurality of source operands in the core of the general-purpose microprocessor.
A general-purpose microprocessor is composed of a plurality of cores; each core has its own instruction scheduling unit and execution unit, as well as resources such as register files, data buses, and caches that are shared with other cores. By including both the instruction scheduling unit and the execution unit in each core of the general-purpose microprocessor, each core can schedule and execute multi-source operand instructions independently, thereby improving the performance of the general-purpose microprocessor. For example, if a multi-source operand instruction needs to be executed in a first core, the instruction scheduling unit of that core may read the corresponding source operands from the register file and send the instruction and the source operands to the execution unit of that core for execution, without communicating or coordinating with other cores.
Therefore, the transmission of the multi-source operand instruction and a plurality of source operands in the processor can run stably at a higher frequency, the efficient scheduling and execution of the multi-source operand instruction are realized, the performance of the general microprocessor is improved, and the execution speed and the parallelism of the multi-source operand instruction are improved.
In an alternative embodiment, the operation corresponding to the multi-source operand instruction includes at least one of: fused multiply-add operations, three-operand addition operations.
Since fused multiply-add operations and three-operand addition operations are commonly used mathematical operations, they can be used to implement a variety of complex algorithms and functions, such as matrix multiplication, vector operations, neural networks, and the like. The operation corresponding to the multi-source operand instruction comprises the operations, so that the processor can complete the fused multiply-add operation or the three-operand addition operation of a plurality of source operands in one clock period, and the performance of the processor is improved.
For example, if a multi-source operand instruction is a fused multiply-add instruction whose three source operands are a first source operand A, a second source operand B, and a third source operand C, and whose destination register is D, the processor may complete the A × B + C operation in one clock cycle and write the result into destination register D. Because the operations corresponding to the multi-source operand instruction include fused multiply-add and three-operand addition, efficient operation on a plurality of source operands can be realized and the performance of the processor is improved.
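A numeric illustration of A × B + C follows. It assumes the conventional IEEE 754 meaning of "fused", namely that the exact product-plus-addend is rounded once, which the text above does not spell out. Exact rational arithmetic stands in for the hardware unit, and the function names `fused_multiply_add` and `unfused_multiply_add` are introduced here for illustration.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a * b + c exactly, then round once to a double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round after the multiply and again after the add."""
    return a * b + c


A = 1.0 + 2.0 ** -30
B = 1.0 - 2.0 ** -30
C = -1.0

D_fused = fused_multiply_add(A, B, C)      # exact result is -2**-60
D_unfused = unfused_multiply_add(A, B, C)  # product rounds to 1.0 first
```

With these inputs the exact product is 1 − 2⁻⁶⁰, which rounds to 1.0 when the multiply is rounded separately, so the unfused path returns 0.0 while the single-rounding path keeps the −2⁻⁶⁰ term.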
In an alternative implementation manner, fig. 5 is a flow chart of a scheduling method of a multi-source operand instruction according to an embodiment of the present application, as shown in fig. 5, the instruction scheduling unit reads a first source operand and a second source operand in a plurality of source operands from a register file, respectively, including:
S4011, obtaining a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file.
S4012, reading the first source operand from an operand storage area in a first source operand register and reading the second source operand from an operand storage area in a second source operand register based on the plurality of source operand information.
In addition, in an alternative embodiment, still as shown in fig. 5, reading a third source operand of the plurality of source operands from the register file includes:
S4013, reading the third source operand from an operand storage area in a third source operand register based on the plurality of source operand information.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region, and provide corresponding operation entries for the user to select authorization or rejection.
According to one or more embodiments of the present application, there is provided a multi-source operand instruction scheduling apparatus, and fig. 6 is a block diagram of a multi-source operand instruction scheduling apparatus according to an embodiment of the present application, and as shown in fig. 6, the apparatus includes:
a reading module 601, configured to cause an instruction scheduling unit, in a first clock cycle of sending an instruction to an execution unit, to read a first source operand and a second source operand of a plurality of source operands from a register file, respectively;
a first sending module 602, configured to cause the instruction scheduling unit, in a second clock cycle of sending the instruction to the execution unit, to send a multi-source operand instruction, the first source operand, and the second source operand to the execution unit, and to synchronously read a third source operand of the plurality of source operands from the register file;
a second sending module 603, configured to cause the instruction scheduling unit, in a third clock cycle of sending the instruction to the execution unit, to send the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit performs the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
According to one or more embodiments of the present application, the multi-source operand instruction includes: an instruction type mark and an operation code, wherein the instruction type mark is used for identifying the type of the multi-source operand instruction, and the operation code is used for identifying the type of the operation corresponding to the multi-source operand instruction;
the execution unit executes an operation corresponding to the multi-source operand instruction on the plurality of source operands based on the instruction type flag and the operation code, thereby obtaining the instruction operation result.
According to one or more embodiments of the present application, the instruction scheduling unit and the execution unit are both included in a core of a general-purpose microprocessor such that the instruction scheduling unit schedules the multi-source operand instruction in the core of the general-purpose microprocessor, and the execution unit performs an operation corresponding to the multi-source operand instruction on the plurality of source operands in the core of the general-purpose microprocessor.
According to one or more embodiments of the present application, the operation corresponding to the multi-source operand instruction includes at least one of: fused multiply-add operations, three-operand addition operations.
According to one or more embodiments of the present application, the reading module includes:
an obtaining unit, configured to obtain a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file;
a first reading unit for reading the first source operand from an operand storage area in a first source operand register and the second source operand from an operand storage area in a second source operand register according to the plurality of source operand information.
According to one or more embodiments of the present application, the first sending module includes:
a second reading unit, configured to read the third source operand from an operand storage area in a third source operand register according to the plurality of source operand information.
In an exemplary embodiment, an embodiment of the present application further provides an electronic device, including: a processor and a memory connected with the processor;
the memory stores computer-executable instructions;
The processor executes the computer-executable instructions stored in the memory to implement the method as described in any one of the above.
In an exemplary embodiment, an embodiment of the application further provides a computer-readable storage medium having computer-executable instructions stored therein, where the computer-executable instructions, when executed by a processor, implement the method described in any one of the above.
In an exemplary embodiment, the application also provides a computer program product comprising a computer program which, when executed by a processor, implements any of the methods described above.
In order to achieve the above embodiment, the embodiment of the present application further provides an electronic device. Referring to fig. 7, there is shown a schematic structural diagram of an electronic device 700 suitable for use in implementing an embodiment of the present application, where the electronic device 700 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a messaging device, a game console, a medical device, an exercise device, a personal digital assistant (Personal Digital Assistant, PDA for short), a tablet computer (Portable Android Device, PAD for short), a portable multimedia player (Portable Media Player, PMP for short), an in-vehicle terminal (e.g., in-vehicle navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 7 is only an example and should not be construed as limiting the functionality and scope of use of the embodiments of the application.
As shown in fig. 7, the electronic device 700 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 701 that may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
In general, the following devices may be connected to the I/O interface 705: an input device 706 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication device 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 shows an electronic device 700 having various devices, it is to be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 709, or installed from the storage device 708, or installed from the ROM 702. When executed by the processing device 701, the computer program performs the above-described functions defined in the method of the embodiments of the present application.
The computer readable medium of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, fiber optic cables, RF (radio frequency), and the like, or any suitable combination of the foregoing.
The computer readable medium may be contained in the electronic device; or may exist alone without being incorporated into the electronic device.
The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.
Computer program code for carrying out operations of the present application may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (e.g., via the internet using an internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The name of a unit does not in any way limit the unit itself; for example, the first acquisition unit may also be described as a "unit that acquires at least two internet protocol addresses".
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
In the context of the present application, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method of scheduling a multi-source operand instruction, the method comprising:
in a first clock cycle of sending an instruction to an execution unit, an instruction scheduling unit reads a first source operand and a second source operand of a plurality of source operands from a register file;
in a second clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends a multi-source operand instruction, the first source operand, and the second source operand to the execution unit, and synchronously reads a third source operand of the plurality of source operands from the register file;
in a third clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit executes an operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
2. The method of claim 1, wherein the instruction scheduling unit reading a first source operand and a second source operand of the plurality of source operands from the register file comprises:
acquiring a plurality of source operand information from source operand information areas of a plurality of source operand registers in the register file;
reading the first source operand from an operand storage area in a first source operand register and the second source operand from an operand storage area in a second source operand register according to the plurality of source operand information;
and reading the third source operand of the plurality of source operands from the register file comprises:
reading the third source operand from an operand storage area in a third source operand register according to the plurality of source operand information.
3. The method of claim 1, wherein
the multi-source operand instruction includes an instruction type mark and an operation code, the instruction type mark identifying the type of the multi-source operand instruction and the operation code identifying the type of the operation corresponding to the multi-source operand instruction;
the execution unit is configured to execute, based on the instruction type mark and the operation code, the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain the instruction operation result.
4. The method according to any one of claims 1 to 3, wherein
the instruction scheduling unit and the execution unit are both included in a core of a general purpose microprocessor such that the instruction scheduling unit schedules the multi-source operand instruction in the core of the general purpose microprocessor, and the execution unit performs an operation corresponding to the multi-source operand instruction on the plurality of source operands in the core of the general purpose microprocessor.
5. The method according to any one of claims 1 to 3, wherein the operation corresponding to the multi-source operand instruction comprises at least one of: a fused multiply-add operation and a three-operand addition operation.
6. The method according to any one of claims 1 to 3, wherein the method further comprises:
in the second clock cycle of sending an instruction to the execution unit, the instruction scheduling unit further reads a fourth source operand of the plurality of source operands from the register file;
and in a fourth clock cycle of sending an instruction to the execution unit, the instruction scheduling unit sends the fourth source operand to the execution unit by multiplexing the data bus of the second source operand, so that the execution unit executes the operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain the instruction operation result.
7. A processor, comprising:
an instruction scheduling unit, configured to: in a first clock cycle of sending an instruction to an execution unit, read a first source operand and a second source operand of a plurality of source operands from a register file; in a second clock cycle of sending an instruction to the execution unit, send a multi-source operand instruction, the first source operand, and the second source operand to the execution unit, and synchronously read a third source operand of the plurality of source operands from the register file; and in a third clock cycle of sending an instruction to the execution unit, send the third source operand to the execution unit by multiplexing the data bus of the first source operand;
and the execution unit, configured to execute an operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
8. A scheduling apparatus for a multi-source operand instruction, the apparatus comprising:
a reading module, configured to read, by an instruction scheduling unit, a first source operand and a second source operand of a plurality of source operands from a register file in a first clock cycle of sending an instruction to an execution unit;
a first sending module, configured to, in a second clock cycle of sending an instruction to the execution unit, send, by the instruction scheduling unit, a multi-source operand instruction, the first source operand, and the second source operand to the execution unit, and synchronously read a third source operand of the plurality of source operands from the register file;
and a second sending module, configured to, in a third clock cycle of sending an instruction to the execution unit, send the third source operand to the execution unit by multiplexing the data bus of the first source operand, so that the execution unit executes an operation corresponding to the multi-source operand instruction on the plurality of source operands to obtain an instruction operation result.
9. An electronic device, comprising: a processor, and a memory coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 6.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 6.
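
The three-cycle schedule recited in claim 1 can be illustrated with a small Python model. This is a hedged sketch rather than the claimed hardware: the names `MultiSourceInstruction`, `schedule`, `bus1`, and `bus2`, the opcode strings, and the `"multi_source"` type-mark value are all assumptions made for illustration. Only the cycle ordering, the reuse of the first operand's data bus, and the two operations named in claim 5 come from the text.

```python
from dataclasses import dataclass

# Operations named in claim 5; the opcode strings are assumed encodings.
OPERATIONS = {
    "fma": lambda a, b, c: a * b + c,   # fused multiply-add
    "add3": lambda a, b, c: a + b + c,  # three-operand addition
}


@dataclass(frozen=True)
class MultiSourceInstruction:
    type_mark: str  # marks the instruction as multi-source (assumed value)
    opcode: str     # selects the operation
    srcs: tuple     # names of the three source registers


def schedule(instr, register_file):
    """Return (trace, result) for a three-source instruction sent over two buses."""
    trace = []

    # Cycle 1: read the first and second source operands.
    op1 = register_file[instr.srcs[0]]
    op2 = register_file[instr.srcs[1]]
    trace.append({"cycle": 1, "reads": instr.srcs[:2], "bus1": None, "bus2": None})

    # Cycle 2: send the instruction with operands 1 and 2, one per bus,
    # while reading the third source operand in the same cycle.
    received = [op1, op2]
    op3 = register_file[instr.srcs[2]]
    trace.append({"cycle": 2, "reads": instr.srcs[2:], "bus1": op1, "bus2": op2})

    # Cycle 3: send operand 3 over the bus that carried operand 1.
    received.append(op3)
    trace.append({"cycle": 3, "reads": (), "bus1": op3, "bus2": None})

    # Execution unit: check the type mark, then dispatch on the opcode.
    if instr.type_mark != "multi_source":
        raise ValueError("not a multi-source operand instruction")
    result = OPERATIONS[instr.opcode](*received)
    return trace, result


regs = {"r1": 2, "r2": 3, "r3": 4}
fma = MultiSourceInstruction("multi_source", "fma", ("r1", "r2", "r3"))
trace, result = schedule(fma, regs)
print(result)  # prints: 10
```

In this model no cycle drives more than two operand buses, which is the apparent purpose of the multiplexing: a three-source instruction is served without a third operand bus, at the cost of one extra cycle.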
CN202311135017.3A 2023-09-04 2023-09-04 Multi-source operand instruction scheduling method, device, processor, equipment and medium Pending CN117170750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311135017.3A CN117170750A (en) 2023-09-04 2023-09-04 Multi-source operand instruction scheduling method, device, processor, equipment and medium


Publications (1)

Publication Number Publication Date
CN117170750A true CN117170750A (en) 2023-12-05

Family

ID=88933122


Country Status (1)

Country Link
CN (1) CN117170750A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101529377A (en) * 2006-10-27 2009-09-09 英特尔公司 Communication between multiple threads in a processor
US20110035570A1 (en) * 2009-08-07 2011-02-10 Via Technologies, Inc. Microprocessor with alu integrated into store unit
CN102043750A (en) * 2010-12-13 2011-05-04 青岛海信信芯科技有限公司 Microprocessor bus structure and microprocessor
US20110153993A1 (en) * 2009-12-22 2011-06-23 Vinodh Gopal Add Instructions to Add Three Source Operands
CN103645886A (en) * 2013-12-13 2014-03-19 广西科技大学 Addition/subtraction, multiplication and division operation control unit for multiple floating-point operands
CN103677742A (en) * 2013-12-13 2014-03-26 广西科技大学 Multi-floating point operand adding/subtracting operation controller
WO2018063513A1 (en) * 2016-10-01 2018-04-05 Intel Corporation Systems and methods for executing a fused multiply-add instruction for complex numbers
CN108027769A (en) * 2015-09-19 2018-05-11 微软技术许可有限责任公司 Initiating instruction block execution using a register access instruction
CN108959180A (en) * 2018-06-15 2018-12-07 北京探境科技有限公司 A kind of data processing method and system
CN114816526A (en) * 2022-04-19 2022-07-29 北京微核芯科技有限公司 Operand domain multiplexing-based multi-operand instruction processing method and device
CN115658146A (en) * 2022-12-14 2023-01-31 成都登临科技有限公司 AI chip, tensor processing method and electronic equipment
CN115904510A (en) * 2023-02-15 2023-04-04 南京砺算科技有限公司 Multi-operand instruction processing method, graphics processor and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
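ELSAYED, ESSAM et al.: "A novel power-efficient multi-operand digit-multiplier using reconfiguration and clock gating", Journal of Supercomputing, vol. 71, no. 7, 29 July 2015 (2015-07-29) *
李克俭; 李洋; 柯宝中; 雷琳: "Design of an FPGA-based IP core for addressing and operation operand storage", Journal of Guangxi University of Science and Technology (广西科技大学学报), no. 04, 31 December 2017 (2017-12-31) *
李秀娟; 王祖强; 张甜: "Design of a data path model for an 8-bit embedded RISC MCU IP core", Application of Electronic Technique (电子技术应用), no. 04, 30 May 2006 (2006-05-30) *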


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination