CN116302114A - Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU - Google Patents
Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
- Publication number
- CN116302114A (application CN202310161288.XA)
- Authority
- CN
- China
- Prior art keywords
- instruction
- fusion
- total
- scheduling
- instructions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3836—Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/40—Transformation of program code
- G06F8/41—Compilation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3867—Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a compiler instruction scheduling optimization method for a CPU that supports instruction macro-fusion, comprising the following steps: step 1, adding the total latency value of each group of fusible instructions to the compiler's instruction description information; step 2, replacing each group of fusible instructions as a whole with a corresponding self-defined fusion total instruction, and having the scheduler perform instruction scheduling on the replaced instruction sequence; and step 3, after instruction scheduling is finished, replacing each self-defined fusion total instruction back with its corresponding group of fusible instructions. Because the fusible instructions take part in scheduling in the form of a fusion total instruction, the two instructions cannot be accidentally separated in a way that would prevent fusion, and the latency actually seen by the hardware is reflected, so the software scheduling algorithm can correctly eliminate pipeline bubbles. The effect of the optimization is confined to the scheduling stage, no other part of the compiler or the hardware needs to change, compatibility is strong, the algorithm of the existing scheduler does not need to be modified, and the scheduling result takes the impact on instruction macro-fusion into account.
Description
Technical Field
The invention relates to CPU compiler technology, and in particular to an instruction scheduling optimization method for the compiler back end.
Background
1. Instruction fusion
Instruction fusion is a dynamic process inside a processor in which two instructions are combined into a single operation or micro-operation (uop) sequence. Instructions stored in the processor's instruction queue (IQ) may be "fused" after being read out of the IQ and before being sent to the instruction decoder, or after being decoded by the instruction decoder.
Typically, instruction fusion that occurs before instruction decoding is referred to as "macro-fusion", while instruction fusion that occurs after instruction decoding (i.e., on uops) is referred to as "micro-fusion".
Macro-fusion effectively increases instruction throughput, reduces the latency of the fused instructions, raises IPC (instructions per cycle, the average number of instructions executed per clock cycle and an important measure of processor performance), and improves processor efficiency. This is especially valuable for RISC-V, where macro-fusion compensates for the simplicity of the instruction set: because the RISC-V reduced instruction set lacks the complex instructions found in other instruction sets, such as the load-pair (double GPR load) instructions of the ARM architecture, basic RISC-V instructions need to be macro-fused (for example, a lui/addi pair that materializes a constant), so that a RISC-V CPU can reach the same performance as CPUs of complex-instruction-set architectures when performing the same complex functions.
It is therefore desirable to find as many opportunities as possible to achieve instruction fusion.
2. Compiler support for instruction macro-fusion
Instruction macro-fusion requires both a hardware-level implementation and software-level support.
Software support mainly means that the compiler must identify fusible instruction pairs and arrange them according to the hardware's fusion requirements. For example, when the CPU requires an instruction pair to be adjacent in order to fuse, the compiler must place the fusible pair strictly adjacent, creating the opportunity for fusion. When the CPU supports out-of-order execution, fusion may not require strict adjacency, but the compiler must still place the instructions within a certain window to create the fusion opportunity.
Optimization in the compiler's scheduling phase typically uses list scheduling combined with heuristics. This only drives the scheduling result toward an optimal solution as far as possible, and scenarios such as instruction macro-fusion may not be covered. Compilers such as LLVM therefore provide an opportunity to redefine dependencies before scheduling completes, where software can carry out the processing related to instruction macro-fusion.
For example, suppose the current microarchitecture supports macro-fusion of instruction a and instruction b and requires that instruction b execute immediately after instruction a, while the DAG generated by the compiler for scheduling is as shown in Fig. 1.
From this DAG, the emitted instruction sequence could be either instruction a | instruction b | instruction c | instruction d, or instruction a | instruction c | instruction b | instruction d. To force the sequence instruction a | instruction b | instruction c | instruction d, the usual approach is to "add a mutation" between instruction b and instruction c, i.e., to force an added dependency edge that guarantees the required node order, as shown in Fig. 2.
With the DAG relationship of Fig. 2, the instructions are forced into the order instruction a | instruction b | instruction c | instruction d.
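The effect of "adding a mutation" can be pictured with a few lines of hypothetical code (not the compiler's actual API; the graph is the one of Fig. 1): once an artificial edge b -> c is inserted, any topological ordering of the DAG must place instruction b before instruction c, so the pair stays in the required order.

```cpp
// Hypothetical sketch: force "instruction b directly after instruction a"
// by adding an artificial edge b -> c to the DAG of Fig. 1 before ordering.
#include <cstdio>
#include <map>
#include <queue>
#include <string>
#include <vector>

int main() {
    // DAG of Fig. 1: b and c depend on a, d depends on b and c.
    std::map<std::string, std::vector<std::string>> succ = {
        {"a", {"b", "c"}}, {"b", {"d"}}, {"c", {"d"}}, {"d", {}}};

    // "Add a mutation": an artificial edge b -> c that is not a real data
    // dependency, only an ordering constraint created for macro-fusion.
    succ["b"].push_back("c");

    // Kahn's algorithm; with the extra edge the only valid order is a b c d.
    std::map<std::string, int> indeg = {{"a", 0}, {"b", 0}, {"c", 0}, {"d", 0}};
    for (const auto& [n, ss] : succ)
        for (const auto& s : ss) indeg[s]++;
    std::queue<std::string> ready;
    for (const auto& [n, d] : indeg)
        if (d == 0) ready.push(n);
    while (!ready.empty()) {
        std::string n = ready.front();
        ready.pop();
        std::printf("%s ", n.c_str());
        for (const auto& s : succ[n])
            if (--indeg[s] == 0) ready.push(s);
    }
    std::printf("\n");   // prints: a b c d
    return 0;
}
```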
3. Instruction scheduling principle
Instruction scheduling is a code performance optimization technique provided by compilers that exploits instruction-level parallelism so that programs execute efficiently on a central processor with an instruction pipeline. Scheduling is generally divided into static scheduling and dynamic scheduling, according to the stage at which it occurs.
However, both static scheduling and dynamic scheduling reorder the execution of instructions so as to:
(1) reduce bubbles in the instruction pipeline;
(2) increase IPC (instructions per cycle), the average number of instructions executed per clock cycle and an important measure of processor performance;
(3) relieve register pressure (a well-designed scheduler can, to some extent, shorten register live ranges).
If two dependent instructions are adjacent, for example instruction a immediately followed by instruction b, with instruction b depending on instruction a and instruction a having a latency greater than 1, then instruction b must wait out instruction a's latency before it can execute. When the CPU actually runs, a bubble therefore appears between the two instructions, as shown in Fig. 3.
When adjacent instructions have no dependency, the pipeline stays full and no bubble occurs; if instruction b depends on instruction a, a bubble is generated between instructions a and b.
In the prior art, for a specific CPU the compiler has each instruction's resource usage, latency, and similar information, so it has the opportunity to eliminate these bubbles as far as possible through scheduling while keeping the program's behavior unchanged.
In the example above, suppose instruction c has no dependency on instruction a or instruction b, and swapping the execution order of instruction c and instruction b does not change the overall behavior; instruction c can then be moved into the position where the bubble occurs to fill it as far as possible. If the total latency of the filler instructions is greater than or equal to instruction a's latency, the bubble can be completely eliminated, as shown in Fig. 4.
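For concreteness, the following minimal sketch (a hypothetical single-issue, in-order model with illustrative latencies, not the behavior of any specific CPU) counts the bubbles described above and shows how moving the independent instruction c between a and b hides part of a's latency.

```cpp
// Hypothetical single-issue, in-order pipeline model used only to count
// bubbles: an instruction can issue one cycle after the previous one, but
// no earlier than the cycle at which all of its operands become ready.
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Instr {
    std::string name;
    int latency;                        // cycles until the result is ready
    std::vector<std::string> deps;      // producers whose results are read
};

static int bubbles(const std::vector<Instr>& seq) {
    std::map<std::string, int> ready;   // cycle at which each result is ready
    int cycle = 0, stalls = 0;
    for (const auto& i : seq) {
        int earliest = cycle + 1;
        for (const auto& d : i.deps) earliest = std::max(earliest, ready[d]);
        stalls += earliest - (cycle + 1);   // extra wait cycles = bubbles
        cycle = earliest;
        ready[i.name] = cycle + i.latency;
    }
    return stalls;
}

int main() {
    Instr a{"a", 3, {}}, b{"b", 1, {"a"}}, c{"c", 1, {}};
    std::printf("a b c : %d bubbles\n", bubbles({a, b, c}));   // 2 (b waits on a)
    std::printf("a c b : %d bubbles\n", bubbles({a, c, b}));   // 1 (c hides one)
    return 0;
}
```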
The technical problem is that the compiler performs instruction scheduling based on the individual latency information of each actual instruction and does not consider the change in latency after instruction macro-fusion; problems can therefore arise after macro-fusion, and the performance gain that macro-fusion brings is cancelled out.
For example, when instruction b depends on instruction a and instruction a's latency is 3, a -> b introduces a 2-cycle bubble, as shown in Fig. 5.
The compiler may schedule two other instructions x and y, each with latency 1, forming the sequence a -> x -> y -> b and eliminating the bubble between a and b, as shown in Fig. 6.
However, when the hardware fuses x and y, the total latency of the fused pair drops to 1. This saves 1 cycle, but after fusion the sequence a -> (x, y) -> b leaves b facing a 1-cycle bubble, so the overall performance before and after fusing x and y is unchanged, as shown in Fig. 7.
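A back-of-the-envelope check of these numbers (illustrative values; on a single-issue, in-order model a latency-L producer leaves L-1 stall cycles for an adjacent consumer, and each unit-latency instruction placed between them hides one of those cycles):

```cpp
// Cycle arithmetic for the a -> x -> y -> b example (illustrative values).
#include <algorithm>
#include <cstdio>

int main() {
    int latA = 3;                        // b depends on a; a's result is ready 3 cycles later
    int stallsAdjacent = latA - 1;       // a -> b alone: 2 bubbles
    int slotsUnfused = 2;                // scheduler's view: x and y sit between a and b
    int slotsFused = 1;                  // hardware's view: the fused (x, y) is one op
    std::printf("a -> b adjacent        : %d bubbles\n", stallsAdjacent);
    std::printf("scheduler's view (x,y) : %d bubbles\n",
                std::max(0, stallsAdjacent - slotsUnfused));   // 0
    std::printf("hardware after fusion  : %d bubbles\n",
                std::max(0, stallsAdjacent - slotsFused));     // 1
    return 0;
}
```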
The root cause of this scheduling deficiency is that the scheduler schedules based on the latency information of individual instructions, without considering the impact of instruction macro-fusion.
The related prior art can be seen in the following patent documents: CN115357230A, CN105378683A, CN104050077A, CN104050026A, CN104049945A, CN103870243A.
Disclosure of Invention
To solve the above technical problem, the invention provides a compiler instruction scheduling optimization method for a CPU that supports instruction macro-fusion. The adopted technical scheme extends the scheduler's knowledge of instructions to cover the total latency value after instruction macro-fusion, so that the effect of fusion is taken into account during scheduling and no bubble appears after macro-fusion at a position that originally had none. The method is as follows:
a compiler instruction scheduling optimization method for supporting an instruction macro fusion CPU comprises the following steps:
and 3, after the instruction scheduling is finished, replacing each self-defined fusion total instruction one by one to each corresponding fusion instruction group.
The invention has the following beneficial technical effects:
1. Before scheduling, each group of fusible instructions is replaced by a self-defined fusion total instruction, which takes part in scheduling as a single unit. The two instructions therefore cannot be accidentally separated in a way that would prevent fusion, and the latency actually seen by the hardware is reflected, so the software scheduling algorithm can eliminate bubbles correctly.
2. After scheduling, each fusion total instruction is replaced back with its corresponding group of fusible instructions, i.e., the actual original CPU instructions, so the effect of the optimization is confined to the scheduling stage; no other part of the compiler or the hardware needs to change, and compatibility is strong.
3. Through instruction replacement, the scheduling result takes the effect on instruction macro-fusion into account without modifying the algorithm of the existing scheduler.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a schematic diagram of the compiler scheduling DAG before a mutation is added;
FIG. 2 is a schematic diagram of the compiler scheduling DAG after a mutation is added;
FIG. 3 is a diagram illustrating a bubble caused by a dependency between adjacent instructions;
FIG. 4 is a diagram illustrating elimination of a bubble by scheduling instruction c;
FIG. 5 is a diagram showing instruction a and instruction b generating a 2-cycle bubble;
FIG. 6 is a diagram illustrating elimination of a bubble by scheduling instruction x and instruction y;
FIG. 7 is a diagram showing the special scenario in which the cycle count is not reduced before and after the fusion of instruction x and instruction y;
FIG. 8 is a diagram showing a new bubble generated after the fusion of instruction x and instruction y;
FIG. 9 is a diagram showing that, once the scheduler knows the latency value after instruction x and instruction y are fused, it eliminates the bubble by scheduling instruction z;
FIG. 10 is a schematic diagram of the DAG for instructions 1 to 4.
Detailed Description
The technical terms used in the invention are defined as follows:
1. Latency: the number of clock cycles required to fully execute an instruction; the unit is a cycle.
2. Bubble: the hardware blocks (stalls) the execution of a following instruction, delaying it relative to the preceding instruction; this is called pipeline blocking, and the delay it causes is called a bubble.
3. Dependency: to complete a function, several instructions sometimes need to interact, and some instructions have a dependency relationship, i.e., one instruction can complete only by relying on the execution results of other instructions.
4. DAG: in graph theory, a directed graph is a directed acyclic graph (DAG, Directed Acyclic Graph) if it is impossible to start at any vertex and return to that vertex by following a sequence of edges.
The DAG discussed here is the DAG that the compiler constructs in the scheduling stage, after instruction selection, from the instructions of each basic block of the program and the dependency relationships between them; its purpose is local optimization. Consider the following basic block:
Instruction 1: a = b + c
Instruction 2: b = a - d
Instruction 3: c = b + c
Instruction 4: d = a - c
The corresponding DAG is shown in FIG. 10.
In other words, a DAG here is a directed acyclic graph representing the data dependencies between the instructions of a basic block. Based on those data dependencies, it can be written as DAG = (V, E), where V is the set of nodes corresponding to all instructions in the program and E is the set of data-dependency edges between instructions (a small construction sketch is given after these definitions).
5. Adding a mutation to the DAG: as described above, the DAG takes the basic block's instructions as nodes and the dependencies between instructions as edges; adding a mutation means adding an edge to the DAG, i.e., forcing a dependency relationship between two instructions.
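The DAG = (V, E) construction for the basic block above can be sketched as follows (hypothetical code; only read-after-write data dependences are tracked, and anti-/output dependences are ignored for brevity).

```cpp
// Hypothetical sketch: build the RAW-dependence edges E of DAG = (V, E)
// for the four-instruction basic block above.
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Instr {
    int id;
    char dst;           // variable written by the instruction
    std::string srcs;   // variables read by the instruction
};

int main() {
    std::vector<Instr> block = {
        {1, 'a', "bc"},   // instruction 1: a = b + c
        {2, 'b', "ad"},   // instruction 2: b = a - d
        {3, 'c', "bc"},   // instruction 3: c = b + c
        {4, 'd', "ac"},   // instruction 4: d = a - c
    };
    std::map<char, int> lastWriter;      // variable -> defining instruction
    for (const auto& ins : block) {
        for (char s : ins.srcs) {
            auto it = lastWriter.find(s);
            if (it != lastWriter.end())  // value produced inside the block
                std::printf("edge: %d -> %d (through %c)\n", it->second, ins.id, s);
        }
        lastWriter[ins.dst] = ins.id;
    }
    // Prints the edges 1->2 (a), 2->3 (b), 1->4 (a), 3->4 (c).
    return 0;
}
```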
The invention discloses a compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion, which comprises the following steps:
and step 1, adding the total latency value information of each group of fused instructions into the instruction description information of the compiler. a. Custom fusion general instruction: classifying the macro fusion situation of all instructions in instruction description information of a compiler, defining each macro fusion situation of the instructions as a fusion total instruction, representing the total of a group of macro fusion instructions, defining the name of each fusion total instruction at the rear end of the compiler, but not distributing machine codes for the fusion total instruction, distributing a special computing resource for each fusion total instruction, and defining the special computing resources; b. the correct latency value is set: in the scheduling model file, the latency value of the special computing resource fused by all the instruction macros is set as the total latency value of a group of fused instructions, and the special computing resource is distributed to each corresponding fused total instruction, namely, the mapping of the total latency value fused by the instruction macros to the corresponding fused total instruction is realized, and the total latency value information fused by the instruction macros is supplemented into the scheduler in this way.
Step 2: replace each group of fusible instructions as a whole with the corresponding self-defined fusion total instruction, and have the scheduler schedule the replaced instruction sequence. After the instruction sequence has been generated, it is scanned and each group of fusible instructions is replaced as a whole by its corresponding self-defined fusion total instruction; the scheduler then schedules the replaced sequence, i.e., it schedules according to the total latency value after instruction macro-fusion, and the scheduling stage is carried out on that basis. Once each group of fusible instructions has been replaced, the scheduler sees the correct post-fusion latency values and, working with its existing algorithm, can correctly eliminate the bubbles that instruction macro-fusion would otherwise introduce.
Step 3: after instruction scheduling is finished, scan the instruction sequence again and replace each self-defined fusion total instruction, one by one, with its corresponding group of fusible instructions, i.e., the real original CPU instructions.
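Steps 2 and 3 can be sketched as two small passes around the existing scheduler (hypothetical code and names; the real replacement works on the compiler's internal instruction sequence): fusible adjacent groups are collapsed into their fusion total instruction before scheduling and expanded back into the original CPU instructions afterwards.

```cpp
// Hypothetical replacement (step 2) and restoration (step 3) passes.
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

using Seq = std::vector<std::string>;

// Fusible adjacent pair -> pseudo fusion total instruction.
static const std::map<std::pair<std::string, std::string>, std::string> kFusePairs =
    {{{"X", "Y"}, "FUSED_XY"}};
// Pseudo fusion total instruction -> original group of CPU instructions.
static const std::map<std::string, Seq> kExpand = {{"FUSED_XY", {"X", "Y"}}};

Seq replaceFusiblePairs(const Seq& in) {           // before scheduling
    Seq out;
    for (size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size()) {
            auto it = kFusePairs.find({in[i], in[i + 1]});
            if (it != kFusePairs.end()) { out.push_back(it->second); ++i; continue; }
        }
        out.push_back(in[i]);
    }
    return out;
}

Seq expandFusedInstrs(const Seq& in) {             // after scheduling
    Seq out;
    for (const auto& op : in) {
        auto it = kExpand.find(op);
        if (it != kExpand.end()) out.insert(out.end(), it->second.begin(), it->second.end());
        else out.push_back(op);
    }
    return out;
}

static void print(const char* tag, const Seq& s) {
    std::printf("%s:", tag);
    for (const auto& op : s) std::printf(" %s", op.c_str());
    std::printf("\n");
}

int main() {
    Seq generated = {"A", "X", "Y", "Z", "B"};
    Seq forScheduler = replaceFusiblePairs(generated);   // A FUSED_XY Z B
    // ... the existing scheduler runs here on forScheduler, unmodified ...
    Seq emitted = expandFusedInstrs(forScheduler);       // A X Y Z B
    print("generated    ", generated);
    print("for scheduler", forScheduler);
    print("emitted      ", emitted);
    return 0;
}
```

The scheduler between the two passes runs unmodified; it simply sees the fusion total instruction as one more instruction with its own latency.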
For example, when instruction b depends on instruction a and instruction a's latency is 3, the sequence a -> b introduces a 2-cycle bubble; a further instruction sequence x -> y -> z is available, each instruction with latency 1.
Without considering instruction macro-fusion, the compiler may generate the sequence a -> x -> y -> b -> z: from the compiler's point of view there are 2 instructions between a and b, hiding 2 cycles of latency, which is enough to eliminate the bubble between a and b. However, when x and y are fused, a bubble still exists between a and b when the hardware actually executes, as shown in Fig. 8.
With the technical scheme of the invention, a new fusion total instruction xy is first defined with latency 1, and the instruction sequence x -> y -> z is rewritten as xy -> z. After scheduling, the compiler produces the sequence a -> xy -> z -> b, eliminating the bubble between a and b, as shown in Fig. 9. After scheduling is finished, the fusion total instruction is replaced back with the actual original CPU instructions, giving the sequence a -> x -> y -> z -> b.
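Under the same simple single-issue, in-order model as the earlier sketch (illustrative latencies; the hardware is assumed to fuse an adjacent x, y pair into one latency-1 operation), the two final instruction sequences of this example can be compared directly:

```cpp
// Execution-time bubble count: fusion-unaware vs fusion-aware schedule.
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct Op { std::string name; int latency; std::vector<std::string> deps; };

// Model of what the hardware executes: adjacent "x","y" become one fused op.
static std::vector<Op> fuseAdjacentXY(const std::vector<Op>& seq) {
    std::vector<Op> out;
    for (size_t i = 0; i < seq.size(); ++i) {
        if (i + 1 < seq.size() && seq[i].name == "x" && seq[i + 1].name == "y") {
            out.push_back({"xy", 1, seq[i].deps});   // fused total latency = 1
            ++i;
        } else {
            out.push_back(seq[i]);
        }
    }
    return out;
}

static int bubbles(const std::vector<Op>& seq) {     // single-issue, in-order
    std::map<std::string, int> ready;
    int cycle = 0, stalls = 0;
    for (const auto& op : seq) {
        int earliest = cycle + 1;
        for (const auto& d : op.deps) earliest = std::max(earliest, ready[d]);
        stalls += earliest - (cycle + 1);
        cycle = earliest;
        ready[op.name] = cycle + op.latency;
    }
    return stalls;
}

int main() {
    Op a{"a", 3, {}}, b{"b", 1, {"a"}}, x{"x", 1, {}}, y{"y", 1, {}}, z{"z", 1, {}};
    std::vector<Op> fusionUnaware = {a, x, y, b, z};   // scheduler ignored fusion
    std::vector<Op> fusionAware   = {a, x, y, z, b};   // expanded a -> xy -> z -> b
    std::printf("a x y b z : %d bubble(s) at execution\n",
                bubbles(fuseAdjacentXY(fusionUnaware)));   // 1, as in Fig. 8
    std::printf("a x y z b : %d bubble(s) at execution\n",
                bubbles(fuseAdjacentXY(fusionAware)));     // 0, as in Fig. 9
    return 0;
}
```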
Relationship between the total latency value after fusion and the sum of the latency values before fusion:
1. Sum of latency values before fusion: the sum of the instructions' individual latency values, ignoring the fact that instruction fusion occurs.
2. Total latency value after fusion: the total latency required to execute the fused group on hardware that supports the fusion, when instruction fusion occurs.
3. Total latency value after fusion < sum of latency values before fusion: the point of instruction fusion is that when two specific instructions are adjacent in a certain order they can be executed in parallel. If executing the two instructions separately takes (x1 + x2) cycles, then after the group is fused executing them takes y cycles with y < x1 + x2; that is, the total latency value after fusion is smaller than the sum of the latency values before fusion.
4. Significance of setting a total latency value after fusion: once instruction fusion is implemented in hardware, the tool chain cannot set a new fused latency value for every fusible instruction pair the way it does for an ordinary instruction, because the tool chain (taking LLVM as an example) can only attach a latency value to a single instruction, i.e., each instruction corresponds to exactly one latency value. Setting the fused total latency value directly on the individual instructions would conflict with their own latency values, since the instructions that make up a fusible pair also appear outside the fusible scenario, either alone or in an order that does not satisfy the fusion rule, in which case the scheduler should schedule them according to their individual latency values.
The above is only a preferred embodiment of the invention, and the scope of the invention is not limited thereto. Any feature that, based on the invention, uses substantially the same means to realize substantially the same function with substantially the same effect, including a replacement that a person of ordinary skill in the art could conceive without creative effort, also falls within the protection scope of the invention.
Claims (4)
1. A compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion, comprising the following steps:
step 1, adding the total latency value of each group of fusible instructions to the compiler's instruction description information;
step 2, replacing each group of fusible instructions as a whole with a corresponding self-defined fusion total instruction, and having the scheduler perform instruction scheduling on the replaced instruction sequence; and
step 3, after instruction scheduling is finished, replacing each self-defined fusion total instruction back with its corresponding group of fusible instructions.
2. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 1 comprises:
a. defining custom fusion total instructions
classifying all macro-fusion cases of instructions in the compiler's instruction description information, defining each macro-fusion case as a fusion total instruction that stands for the whole group of macro-fused instructions, defining a name for each fusion total instruction in the compiler back end without assigning it a machine encoding, and allocating and defining a dedicated compute resource for each fusion total instruction;
b. setting the correct latency value
in the scheduling model file, setting the latency value of each dedicated compute resource for instruction macro-fusion to the total latency value of the corresponding group of fused instructions, and assigning the dedicated resource to the corresponding fusion total instruction, thereby mapping the post-fusion total latency value onto the corresponding fusion total instruction and supplying the scheduler with the total latency information for instruction macro-fusion.
3. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 2 comprises: after the instruction sequence has been generated, scanning the instruction sequence, replacing each group of fusible instructions as a whole with its corresponding self-defined fusion total instruction, and having the scheduler schedule the replaced sequence, i.e., schedule according to the total latency value after instruction macro-fusion, thereby carrying out the scheduling stage.
4. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 3 comprises: after instruction scheduling is finished, scanning the instruction sequence again and replacing each self-defined fusion total instruction, one by one, with its corresponding group of fusible instructions, i.e., the real original CPU instructions.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310161288.XA CN116302114B (en) | 2023-02-24 | 2023-02-24 | Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310161288.XA CN116302114B (en) | 2023-02-24 | 2023-02-24 | Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116302114A true CN116302114A (en) | 2023-06-23 |
CN116302114B CN116302114B (en) | 2024-01-23 |
Family
ID=86814263
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310161288.XA Active CN116302114B (en) | 2023-02-24 | 2023-02-24 | Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116302114B (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH05265769A (en) * | 1992-03-18 | 1993-10-15 | Fujitsu Ltd | Instruction scheduling processing method for compiler |
CN1670699A (en) * | 2004-03-19 | 2005-09-21 | 中国科学院计算技术研究所 | A micro-dispatching method supporting directed cyclic graph |
CN101866281A (en) * | 2010-06-13 | 2010-10-20 | 清华大学 | Multi-cycle instruction execution method and device |
CN102200924A (en) * | 2011-05-17 | 2011-09-28 | 北京北大众志微系统科技有限责任公司 | Modulus-scheduling-based compiling method and device for realizing circular instruction scheduling |
CN102799418A (en) * | 2012-08-07 | 2012-11-28 | 清华大学 | Processor architecture and instruction execution method integrating sequence and VLIW (Very Long Instruction Word) |
CN105190579A (en) * | 2013-03-15 | 2015-12-23 | 索夫特机械公司 | A method for implementing a line speed interconnect structure |
CN105190541A (en) * | 2013-03-15 | 2015-12-23 | 索夫特机械公司 | A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates |
WO2014154917A1 (en) * | 2013-03-27 | 2014-10-02 | Intel Corporation | Mechanism for facilitating the dynamic and efficient merging of computer instructions in software programs |
CN110692039A (en) * | 2017-05-26 | 2020-01-14 | 微软技术许可有限责任公司 | Microprocessor instruction pre-dispatch prior to block commit |
CN113196244A (en) * | 2018-12-10 | 2021-07-30 | 斯法夫股份有限公司 | Macro operation fusion |
CN110543121A (en) * | 2019-08-30 | 2019-12-06 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Instruction synchronous distribution control device of full-digital phased array system |
CN112527393A (en) * | 2019-09-18 | 2021-03-19 | 无锡江南计算技术研究所 | Instruction scheduling optimization device and method for master-slave fusion architecture processor |
CN112527304A (en) * | 2019-09-19 | 2021-03-19 | 无锡江南计算技术研究所 | Self-adaptive node fusion compiling optimization method based on heterogeneous platform |
CN111930428A (en) * | 2020-09-27 | 2020-11-13 | 南京芯瞳半导体技术有限公司 | Method and device for fusing conditional branch instructions and computer storage medium |
CN115576608A (en) * | 2022-09-29 | 2023-01-06 | 平头哥(上海)半导体技术有限公司 | Processor core, processor, chip, control equipment and instruction fusion method |
Non-Patent Citations (2)
Title |
---|
MARIA LAURA GATTO: "Biomechanical performances of PCL/HA micro- and macro-porous lattice scaffolds fabricated via laser powder bed fusion for bone tissue engineering", MATERIALS SCIENCE AND ENGINEERING: C, vol. 128 * |
YU XIAOJIANG; LUO XIN: "Instruction latency scheduling based on the RISC-V GCC compiler", Electronic Technology & Software Engineering, no. 08 *
Also Published As
Publication number | Publication date |
---|---|
CN116302114B (en) | 2024-01-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP4042604B2 (en) | Program parallelization apparatus, program parallelization method, and program parallelization program | |
EP2965198B1 (en) | Reducing excessive compilation times | |
US6817013B2 (en) | Program optimization method, and compiler using the same | |
EP2677424B1 (en) | OpenCL compilation | |
US9996325B2 (en) | Dynamic reconfigurable compiler | |
JP5411587B2 (en) | Multi-thread execution device and multi-thread execution method | |
US6760906B1 (en) | Method and system for processing program for parallel processing purposes, storage medium having stored thereon program getting program processing executed for parallel processing purposes, and storage medium having stored thereon instruction set to be executed in parallel | |
US20120198427A1 (en) | Ensuring Register Availability for Dynamic Binary Optimization | |
US20060277529A1 (en) | Compiler apparatus | |
US20050289530A1 (en) | Scheduling of instructions in program compilation | |
JPH01108638A (en) | Parallelized compilation system | |
US20130305021A1 (en) | Method for convergence analysis based on thread variance analysis | |
EP3908920B1 (en) | Optimizing hardware fifo instructions | |
US7712091B2 (en) | Method for predicate promotion in a software loop | |
CN116302114B (en) | Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU | |
EP2677423A2 (en) | OpenCL compilation | |
JP2011138494A (en) | Relational modeling for performance analysis of multi-core processor using virtual task | |
JP2008276547A (en) | Program processing method and information processor | |
US8806466B2 (en) | Program generation device, program production method, and program | |
US8768678B1 (en) | Scheduling processes in simulation of a circuit design based on simulation costs and runtime states of HDL processes | |
US20060200648A1 (en) | High-level language processor apparatus and method | |
Tasca et al. | Enhanced architecture for programmable logic controllers targeting performance improvements | |
KR20140122564A (en) | Apparatus and method for calculating physical address of register in processor | |
JP6528769B2 (en) | INFORMATION PROCESSING APPARATUS, PROCESSING METHOD, AND PROGRAM | |
JP6776914B2 (en) | Parallelization method, parallelization tool |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |