CN116302114A - Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU - Google Patents

Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU

Info

Publication number
CN116302114A
Authority
CN
China
Prior art keywords
instruction
fusion
total
scheduling
instructions
Prior art date
Legal status
Granted
Application number
CN202310161288.XA
Other languages
Chinese (zh)
Other versions
CN116302114B (en)
Inventor
许谦
庄秋彬
Current Assignee
Jindi Spacetime Zhuhai Technology Co ltd
Original Assignee
Jindi Spacetime Zhuhai Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Jindi Spacetime Zhuhai Technology Co ltd filed Critical Jindi Spacetime Zhuhai Technology Co ltd
Priority to CN202310161288.XA priority Critical patent/CN116302114B/en
Publication of CN116302114A publication Critical patent/CN116302114A/en
Application granted granted Critical
Publication of CN116302114B publication Critical patent/CN116302114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3836 Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 8/00 Arrangements for software engineering
    • G06F 8/40 Transformation of program code
    • G06F 8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention discloses a compiler instruction scheduling optimization method for a CPU that supports instruction macro-fusion, comprising the following steps: step 1, adding the total latency value of each group of fusible instructions to the compiler's instruction description information; step 2, replacing each group of fusible instructions as a whole with a corresponding custom fused total instruction, and having the scheduler perform instruction scheduling on the replaced instruction sequence; step 3, after instruction scheduling is finished, replacing each custom fused total instruction one by one with its corresponding group of fusible instructions. Because the fusible pair participates in scheduling as a single fused total instruction, the two instructions cannot be accidentally split apart and lose the chance to fuse, and the latency seen by the scheduler matches what the hardware actually executes, so the software's instruction scheduling algorithm can eliminate bubbles correctly. The effect of the optimization is confined to the scheduling stage: no other part of the compiler and no hardware needs to change, so compatibility is strong, the algorithm of the existing scheduler does not need to be modified, and the scheduling result nevertheless takes the impact of instruction macro-fusion into account.

Description

Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU
Technical Field
The invention relates to CPU compiler technology, and in particular to an instruction scheduling optimization method for the compiler back end.
Background
1. Instruction fusion
Instruction fusion is a dynamic process inside a processor that combines two instructions into a single instruction producing one operation, micro-operation, or micro-operation sequence. Instructions stored in the processor's Instruction Queue (IQ) may be "fused" after they are read out of the IQ and before they are sent to the instruction decoder, or after they have been decoded by the instruction decoder.
Typically, instruction fusion that occurs before instruction decoding is referred to as "macro-fusion", while instruction fusion that occurs after instruction decoding (e.g., on uops) is referred to as "micro-fusion".
Macro-fusion can effectively increase instruction throughput, reduce the latency of the fused instructions, raise IPC (instructions per cycle, the average number of instructions executed in one clock cycle and an important measure of processor performance), and improve the running efficiency of the processor. This is especially true for RISC-V, where instruction macro-fusion can make up for the limitations of a reduced instruction set: because the RISC-V instruction set does not contain the complex instructions common in other instruction sets, such as the load-pair (load double GPR) instructions of the ARM architecture, the basic RISC-V instructions need to be macro-fused so that a RISC-V CPU can reach the same performance as CPUs of complex-instruction-set architectures when completing the same complex functions.
It is therefore desirable to find as many opportunities as possible for instruction fusion.
2. Compiler support for instruction macro-fusion
Implementing instruction macro-fusion requires both a hardware-level implementation and software-level support.
Support on the software side mainly means that the compiler must identify fusible instruction pairs and arrange them according to the hardware's fusion requirements. For example, when the CPU requires an instruction pair to be adjacent before it can fuse, the compiler must arrange the fusible pair strictly adjacent to each other, creating the opportunity for fusion. When the CPU supports out-of-order execution, fusion may not require strict adjacency, but the compiler still needs to place the instructions within a certain range to create the opportunity for fusion.
Optimization in the scheduling phase of a compiler typically uses list scheduling plus heuristic algorithms, which only drive the scheduling result toward an optimal solution as far as possible and may not cover scenarios such as instruction macro-fusion. For this reason, a compiler (LLVM, for example) typically provides an opportunity to redefine dependencies before scheduling completes, which the software can use to perform the processing needed for instruction macro-fusion.
For example, suppose the instruction macro-fusion supported by the current microarchitecture targets instruction a and instruction b and requires instruction b to be executed immediately after instruction a, while the DAG generated by the compiler for scheduling is as shown in Fig. 1.
From this DAG, the resulting instruction sequence may be either instruction a | instruction b | instruction c | instruction d, or instruction a | instruction c | instruction b | instruction d. In this case, to force the sequence to be instruction a | instruction b | instruction c | instruction d, the usual approach is to "add a mutation" between instruction b and instruction c, i.e., to force an added dependency relationship that guarantees the required node order, as shown in Fig. 2.
With the DAG relationship of Fig. 2, the instructions are forced into the order instruction a | instruction b | instruction c | instruction d.
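As a concrete illustration of such a hook, the sketch below follows the style of LLVM's scheduling-DAG mutations. Instead of the forced edge between instruction b and instruction c described above, it keeps a fusible pair adjacent by adding a Cluster edge from the first instruction of the pair to the second, which is how LLVM's own macro-fusion utility expresses the same ordering requirement; the predicate isFusionCandidatePair() is a hypothetical placeholder for the microarchitecture-specific check, so this is a sketch under those assumptions rather than the patent's implementation.

    // Sketch of a scheduling-DAG mutation that pins a fusible pair together.
    // isFusionCandidatePair() is a hypothetical target-specific predicate.
    #include "llvm/CodeGen/MachineInstr.h"
    #include "llvm/CodeGen/ScheduleDAGInstrs.h"
    #include "llvm/CodeGen/ScheduleDAGMutation.h"

    using namespace llvm;

    // Hypothetical target hook: may Second fuse with First when adjacent?
    bool isFusionCandidatePair(const MachineInstr &First,
                               const MachineInstr &Second);

    struct PinFusiblePairs : public ScheduleDAGMutation {
      void apply(ScheduleDAGInstrs *DAG) override {
        for (SUnit &SU : DAG->SUnits) {
          MachineInstr *Second = SU.getInstr();
          if (!Second)
            continue;
          // Find a data predecessor that is the first half of a fusible pair.
          SUnit *FusionPred = nullptr;
          for (const SDep &Pred : SU.Preds) {
            MachineInstr *First =
                Pred.getSUnit() ? Pred.getSUnit()->getInstr() : nullptr;
            if (First && isFusionCandidatePair(*First, *Second)) {
              FusionPred = Pred.getSUnit();
              break;
            }
          }
          // A Cluster edge asks the scheduler to keep the pair adjacent.
          if (FusionPred)
            DAG->addEdge(&SU, SDep(FusionPred, SDep::Cluster));
        }
      }
    };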
3. Instruction scheduling principle
Instruction scheduling is a code performance optimization technique provided by compilers that uses instruction-level parallelism so that programs execute efficiently on a central processor with an instruction pipeline. Scheduling is generally divided into static scheduling and dynamic scheduling, according to the stage at which it occurs.
Both static scheduling and dynamic scheduling work by reordering the execution order of instructions so as to:
(1) reduce bubbles in the instruction pipeline;
(2) increase IPC (instructions per cycle, the average number of instructions executed in one clock cycle and an important measure of processor performance);
(3) relieve register pressure (a well-designed scheduler can, to some extent, shorten the live ranges of registers).
If two dependent instructions are adjacent, for example instruction a is placed immediately before instruction b, instruction b depends on instruction a, and the latency of instruction a is greater than 1, then instruction b must wait out the latency of instruction a before it can execute. When the CPU actually runs the code, a bubble therefore appears between the two instructions, as shown in Fig. 3.
When adjacent instructions have no dependency, the instruction pipeline stays full and no bubble occurs; if instruction b depends on instruction a, a bubble is generated between instructions a and b.
In the prior art, for a specific CPU the compiler holds information such as the resource usage and latency of each instruction, so the compiler has the opportunity, while keeping the program's behavior unchanged, to eliminate these bubbles as far as possible through scheduling.
In the example above, assume that instruction c has no dependency on instruction a or instruction b, and that exchanging the execution order of instruction c and instruction b does not change the overall behavior; instruction c can then be moved to the position where the bubble occurs to fill it as much as possible. If the total latency of the filler instruction c is greater than or equal to the execution latency of instruction a, the bubble can be completely eliminated, as shown in Fig. 4.
The technical problem is that the compiler performs instruction scheduling based on the individual latency information of each actual instruction and does not take into account how the latency changes once instructions are macro-fused. Problems can therefore appear after instruction macro-fusion and cancel out the performance gain that macro-fusion brings.
For example, when instruction b depends on instruction a and the latency of instruction a is 3, the sequence a -> b brings a bubble of 2 cycles, as shown in Fig. 5.
The compiler may schedule two other instructions x and y, each with a latency of 1, to form the sequence a -> x -> y -> b and eliminate the bubble between a and b, as shown in Fig. 6.
However, when the hardware supports fusing x and y, the total latency after fusion drops to 1, which by itself would save 1 cycle; but with the fused sequence a -> (x, y) -> b, instruction b now needs a bubble of 1 cycle, so the overall performance with instructions x and y fused is no better than before fusion, as shown in Fig. 7.
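The cycle accounting behind this example can be checked with a small in-order, single-issue model. This is only a toy sketch using the latencies stated above (a = 3, x = y = 1, fused (x, y) = 1), not the patent's implementation:

    // Toy single-issue, in-order cycle model for the example above:
    // instruction b depends on instruction a (latency 3); x and y are
    // independent fillers of latency 1 that the hardware can fuse into a
    // single 1-cycle operation.
    #include <algorithm>
    #include <cstdio>
    #include <string>
    #include <vector>

    struct Instr {
        std::string name;
        int latency;            // cycles until the result is available
        std::vector<int> deps;  // sequence indices of the producers it reads
    };

    // Counts stall cycles (bubbles) when the sequence is issued in order,
    // one instruction per cycle, stalling until all operands are ready.
    static int countBubbles(const std::vector<Instr> &seq) {
        std::vector<int> ready(seq.size(), 0);
        int bubbles = 0, cycle = 0;
        for (size_t i = 0; i < seq.size(); ++i) {
            int earliest = cycle;
            for (int d : seq[i].deps)
                earliest = std::max(earliest, ready[d]);
            bubbles += earliest - cycle;
            ready[i] = earliest + seq[i].latency;
            cycle = earliest + 1;
        }
        return bubbles;
    }

    int main() {
        // Without fusion: a -> x -> y -> b; b (index 3) depends on a (index 0).
        std::vector<Instr> unfused = {
            {"a", 3, {}}, {"x", 1, {}}, {"y", 1, {}}, {"b", 1, {0}}};
        // As the hardware sees it after fusion: a -> (x, y) -> b.
        std::vector<Instr> fused = {
            {"a", 3, {}}, {"xy", 1, {}}, {"b", 1, {0}}};
        std::printf("a->x->y->b : %d bubble(s)\n", countBubbles(unfused)); // 0
        std::printf("a->(x,y)->b: %d bubble(s)\n", countBubbles(fused));   // 1
        return 0;
    }

Either way, instruction b cannot issue before a's result is ready at cycle 3, which is why the 1-cycle saving from fusing x and y is cancelled by the new 1-cycle bubble.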
The root cause of this scheduling defect is that the scheduler schedules based on the latency information of individual instructions and does not consider the impact of instruction macro-fusion.
Related prior art can be found in the following patent documents: CN115357230A, CN105378683A, CN104050077A, CN104050026A, CN104049945A, CN103870243A.
Disclosure of Invention
In order to solve the above technical problem, the invention provides a compiler instruction scheduling optimization method for a CPU that supports instruction macro-fusion. The technical scheme adopted extends the scheduler's information about instructions to cover the total latency value after instruction macro-fusion, so that the effect of fusion is taken into account during scheduling and no bubble is introduced at a position that had no bubble before macro-fusion. The method comprises the following steps:
A compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion comprises the following steps:
Step 1: add the total latency value of each group of fusible instructions to the compiler's instruction description information;
Step 2: replace each group of fusible instructions as a whole with a corresponding custom fused total instruction, and have the scheduler perform instruction scheduling on the replaced instruction sequence;
Step 3: after instruction scheduling is finished, replace each custom fused total instruction one by one with its corresponding group of fusible instructions.
The invention has the following beneficial technical effects:
1. Before scheduling, the fusible instructions are replaced by a custom fused total instruction and take part in scheduling in that form. This prevents the two instructions from being accidentally pulled apart so that they can no longer fuse, and it reflects the latency of real hardware execution, so the software's instruction scheduling algorithm can eliminate bubbles correctly.
2. After scheduling is finished, the fused total instruction is replaced back by the corresponding group of fusible instructions, i.e., the actual original CPU instructions, so the effect of the optimization is confined to the scheduling stage; no other part of the compiler and no hardware needs to change, and compatibility is strong.
3. Through instruction replacement, the scheduling result takes the impact of instruction macro-fusion into account without modifying the algorithm of the existing scheduler.
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a diagram of the compiler's scheduling DAG without an added mutation;
FIG. 2 is a diagram of the compiler's scheduling DAG after a mutation has been added;
FIG. 3 is a diagram illustrating a bubble caused by a dependency between adjacent instructions;
FIG. 4 is a diagram illustrating elimination of the bubble by scheduling instruction c;
FIG. 5 is a diagram showing instruction a and instruction b generating a bubble of 2 cycles;
FIG. 6 is a diagram of eliminating the bubble by scheduling instruction x and instruction y;
FIG. 7 is a diagram of the special scenario in which the cycle count is not reduced by fusing instruction x and instruction y;
FIG. 8 is a diagram showing the new bubble generated after instruction x and instruction y are fused;
FIG. 9 is a diagram showing the scheduler, knowing the latency value of the fused instruction x and instruction y, eliminating the bubble by scheduling instruction z;
FIG. 10 is a drawing of the DAG for instructions 1 to 4.
Detailed Description
The technical terms used in the invention are defined as follows:
1. Latency: the number of clock cycles required to fully execute an instruction; the unit is the cycle.
2. Bubble: when the hardware blocks (stalls) a following instruction so that its execution is delayed until an earlier instruction completes, the pipeline is said to be blocked, and the delay this causes is called a bubble.
3. Dependency: to complete a function, several instructions sometimes need to interact, and some of these instructions have a dependency relationship, that is, one instruction can only complete by relying on the execution results of other instructions.
4. DAG: in graph theory, a directed graph is a directed acyclic graph (DAG) if, starting from any vertex, no path of edges leads back to that vertex.
The DAG discussed here is the DAG that the compiler builds in the scheduling stage, after instruction selection, from the instructions of each basic block of the program and the dependency relationships between them, for the purpose of local optimization. Consider the following basic block:
Instruction 1: a = b + c
Instruction 2: b = a - d
Instruction 3: c = b + c
Instruction 4: d = a - c
Its DAG is drawn in Fig. 10.
In other words, a DAG is a directed acyclic graph that represents the data dependencies between the instructions of a basic block. Based on the data dependencies between instructions, it can be written as DAG = (V, E), where V is the set of nodes corresponding to all instructions in the program and E is the set of data-dependence edges between instructions.
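As a small self-contained illustration of DAG = (V, E), the sketch below derives the true (read-after-write) dependence edges for the four-instruction basic block above. The exact edges drawn in Fig. 10, and any anti- or output-dependence edges, may differ, so this is only a toy illustration, not the compiler's DAG builder:

    // Build the data-dependence edges for the basic block above, considering
    // only read-after-write (true) dependencies.
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    struct Node {
        std::string text;  // instruction text, e.g. "a = b + c"
        char def;          // variable written
        std::string uses;  // variables read
    };

    int main() {
        std::vector<Node> V = {
            {"a = b + c", 'a', "bc"},  // instruction 1
            {"b = a - d", 'b', "ad"},  // instruction 2
            {"c = b + c", 'c', "bc"},  // instruction 3
            {"d = a - c", 'd', "ac"},  // instruction 4
        };
        std::vector<std::pair<int, int>> E;  // edges: producer -> consumer
        for (int c = 0; c < static_cast<int>(V.size()); ++c) {
            for (char u : V[c].uses) {
                // The closest earlier definition of u, if any, is the producer.
                for (int p = c - 1; p >= 0; --p) {
                    if (V[p].def == u) {
                        E.push_back({p, c});
                        break;
                    }
                }
            }
        }
        for (const auto &e : E)
            std::printf("instruction %d -> instruction %d\n",
                        e.first + 1, e.second + 1);
        return 0;
    }

Running this prints the edges 1 -> 2, 2 -> 3, 1 -> 4 and 3 -> 4 for the example block.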
5. Adding a mutation to the DAG: as defined above, the DAG takes the instructions of a basic block as nodes and the dependency relationships between instructions as edges; adding a mutation means adding an edge to the DAG, i.e., forcing a dependency relationship between two instructions.
The invention discloses a compiler instruction scheduling optimization method for a CPU that supports instruction macro-fusion, comprising the following steps:
Step 1: add the total latency value of each group of fusible instructions to the compiler's instruction description information. a. Define custom fused total instructions: classify all instruction macro-fusion cases in the compiler's instruction description information, define each macro-fusion case as a fused total instruction that stands for the whole group of macro-fused instructions, define a name for each fused total instruction in the compiler back end without assigning it a machine encoding, and allocate and define a dedicated compute resource for each fused total instruction. b. Set the correct latency value: in the scheduling model file, set the latency value of each dedicated compute resource used for instruction macro-fusion to the total latency value of the corresponding group of fused instructions, and assign the dedicated compute resource to the corresponding fused total instruction; this maps the total post-fusion latency value onto the corresponding fused total instruction and thereby supplies the total latency information of instruction macro-fusion to the scheduler. A sketch of the data this step introduces is given below.
Step 2: replace each group of fusible instructions as a whole with the corresponding custom fused total instruction, and have the scheduler perform instruction scheduling on the replaced instruction sequence. After the instruction sequence has been generated, the sequence is searched and each group of fusible instructions is replaced as a whole by its corresponding custom fused total instruction; the scheduler then schedules the replaced instruction sequence, that is, it schedules according to the total post-macro-fusion latency value, which completes the scheduling stage. Once each group of fusible instructions has been replaced, the scheduler obtains the correct post-fusion latency values and, working with its existing algorithm, can correctly eliminate the bubbles brought about by instruction macro-fusion.
Step 3: after instruction scheduling is finished, search the instruction sequence again and replace each custom fused total instruction, one by one, with its corresponding group of fusible instructions, i.e., the actual original CPU instructions. A sketch of steps 2 and 3 wrapped around an unchanged scheduler is given below.
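The following compact sketch shows steps 2 and 3 around an existing scheduler; it is a toy illustration that assumes a single fusible pair (x, y) mapped to the hypothetical fused total instruction FUSED_XY, and runScheduler() is only an identity stand-in for the existing scheduling algorithm, which the method leaves unchanged:

    // Steps 2 and 3 as a wrapper around an unchanged scheduler: collapse each
    // adjacent fusible group into its fused total instruction before
    // scheduling, expand it back into the original CPU instructions afterwards.
    #include <cstdio>
    #include <string>
    #include <vector>

    using InstrSeq = std::vector<std::string>;

    // Stand-in for the existing scheduler: a real list scheduler would reorder
    // the sequence using per-opcode latencies (which, after step 1, include
    // the fused totals). The algorithm itself is untouched by the method.
    static InstrSeq runScheduler(const InstrSeq &seq) { return seq; }

    // Step 2: replace each whole fusible group (here the adjacent pair x, y)
    // with its fused total instruction before scheduling.
    static InstrSeq collapseFusionGroups(const InstrSeq &seq) {
        InstrSeq out;
        for (size_t i = 0; i < seq.size(); ++i) {
            if (i + 1 < seq.size() && seq[i] == "x" && seq[i + 1] == "y") {
                out.push_back("FUSED_XY");
                ++i;  // skip the second member of the group
            } else {
                out.push_back(seq[i]);
            }
        }
        return out;
    }

    // Step 3: after scheduling, replace each fused total instruction back with
    // its group of real CPU instructions, preserving their required order.
    static InstrSeq expandFusionGroups(const InstrSeq &seq) {
        InstrSeq out;
        for (const std::string &op : seq) {
            if (op == "FUSED_XY") {
                out.push_back("x");
                out.push_back("y");
            } else {
                out.push_back(op);
            }
        }
        return out;
    }

    int main() {
        InstrSeq input = {"a", "x", "y", "z", "b"};
        InstrSeq scheduled = runScheduler(collapseFusionGroups(input));
        InstrSeq emitted = expandFusionGroups(scheduled);
        for (const std::string &op : emitted)
            std::printf("%s ", op.c_str());  // a x y z b
        std::printf("\n");
        return 0;
    }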
For example, suppose instruction b depends on instruction a, the latency of instruction a is 3, and the sequence a -> b therefore brings a bubble of 2 cycles; there is also the instruction sequence x -> y -> z, each with a latency of 1.
Without considering instruction macro-fusion, the compiler may generate the sequence a -> x -> y -> b -> z: from the compiler's point of view there are 2 instructions between a and b, contributing a latency of 2 cycles, enough to eliminate the bubble from a to b. However, when x and y fuse, the hardware in fact still sees a bubble between a and b when it executes, as shown in Fig. 8.
With the technical scheme of the invention, a new fused total instruction xy with a latency of 1 is first defined, and the instruction sequence x -> y -> z is rewritten as xy -> z; after scheduling, the compiler forms the sequence a -> xy -> z -> b and eliminates the bubble from a to b, as shown in Fig. 9; after scheduling is finished, the fused total instruction is replaced back with the actual original CPU instructions, forming the sequence a -> x -> y -> z -> b.
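Under the same single-issue cycle accounting as the toy model earlier (a = 3, fused xy = 1, z = 1), the scheduled sequence issues without a bubble, and the expansion in step 3 does not disturb it:

    scheduling view:  cycle 0: a    cycle 1: xy   cycle 2: z    cycle 3: b    (0 bubbles)
    emitted code:     a, x, y, z, b   (x and y stay adjacent, so the hardware still fuses
                      them and the issue pattern matches the scheduling view)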
Relationship between the total latency value after fusion and the sum of the latency values before fusion:
1. Sum of the latency values before fusion: the sum of the instructions' individual latency values, without considering whether instruction fusion occurs.
2. Total latency value after fusion: the total latency required to execute the fused instructions on a CPU that supports instruction fusion, when instruction fusion occurs.
3. Total latency value after fusion < sum of the latency values before fusion:
This is the point of instruction fusion: when two specific instructions are adjacent in a particular order, the two instructions can be executed in parallel. Executing the two instructions separately would originally take (x1 + x2) cycles; after the group of instructions is fused, executing them takes y cycles, with y < x1 + x2, that is, the total latency value after fusion is smaller than the sum of the latency values before fusion.
4. Why the total latency value after fusion needs to be set:
Once instruction fusion has been implemented in hardware, the toolchain cannot set a new fused latency value for each group of fusible instruction pairs the way it does for an ordinary instruction, because the toolchain (taking LLVM as an example) can only set a latency value for a single instruction (each instruction corresponds to exactly one latency value). Setting a fused total latency value there would conflict with the individual latency values of the instructions in the fusible pair, since the instructions that form a fusible pair do not appear only in the fusible arrangement; they also appear on their own, or in orders that do not satisfy the fusion rule, and in those cases the scheduler should schedule them according to their individual latency values.
The above is only a preferred embodiment of the present invention, and the scope of the present invention is not limited to it. Any feature that is based on the present invention and uses substantially the same means to achieve substantially the same function with substantially the same effect also falls within the protection scope of the present invention, including replacement by features that a person of ordinary skill in the art could conceive of without creative effort.

Claims (4)

1. A compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion, comprising the following steps:
step 1, adding the total latency value of each group of fusible instructions to the compiler's instruction description information;
step 2, replacing each group of fusible instructions as a whole with a corresponding custom fused total instruction, and having the scheduler perform instruction scheduling on the replaced instruction sequence;
step 3, after instruction scheduling is finished, replacing each custom fused total instruction one by one with its corresponding group of fusible instructions.
2. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 1 comprises:
a. defining custom fused total instructions:
classifying all instruction macro-fusion cases in the compiler's instruction description information, defining each macro-fusion case as a fused total instruction standing for the whole group of macro-fused instructions, defining a name for each fused total instruction in the compiler back end without assigning it a machine encoding, and allocating and defining a dedicated compute resource for each fused total instruction;
b. setting the correct latency value:
in the scheduling model file, setting the latency value of each dedicated compute resource used for instruction macro-fusion to the total latency value of the corresponding group of fused instructions, and assigning the dedicated compute resource to the corresponding fused total instruction, i.e., mapping the total post-fusion latency value onto the corresponding fused total instruction and thereby supplying the total latency information of instruction macro-fusion to the scheduler.
3. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 2 comprises: after the instruction sequence has been generated, searching the instruction sequence, replacing each group of fusible instructions as a whole with the corresponding custom fused total instruction, and having the scheduler schedule the replaced instruction sequence, i.e., scheduling according to the total post-macro-fusion latency value, thereby completing the scheduling stage.
4. The compiler instruction scheduling optimization method for a CPU supporting instruction macro-fusion according to claim 1, wherein step 3 comprises: after instruction scheduling is finished, searching the instruction sequence again and replacing each custom fused total instruction, one by one, with its corresponding group of fusible instructions, i.e., the actual original CPU instructions.
CN202310161288.XA 2023-02-24 2023-02-24 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU Active CN116302114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310161288.XA CN116302114B (en) 2023-02-24 2023-02-24 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310161288.XA CN116302114B (en) 2023-02-24 2023-02-24 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU

Publications (2)

Publication Number Publication Date
CN116302114A (en) 2023-06-23
CN116302114B (en) 2024-01-23

Family

ID=86814263

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310161288.XA Active CN116302114B (en) 2023-02-24 2023-02-24 Compiler instruction scheduling optimization method for supporting instruction macro fusion CPU

Country Status (1)

Country Link
CN (1) CN116302114B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05265769A (en) * 1992-03-18 1993-10-15 Fujitsu Ltd Instruction scheduling processing method for compiler
CN1670699A (en) * 2004-03-19 2005-09-21 中国科学院计算技术研究所 A micro-dispatching method supporting directed cyclic graph
CN101866281A (en) * 2010-06-13 2010-10-20 清华大学 Multi-cycle instruction execution method and device
CN102200924A (en) * 2011-05-17 2011-09-28 北京北大众志微系统科技有限责任公司 Modulus-scheduling-based compiling method and device for realizing circular instruction scheduling
CN102799418A (en) * 2012-08-07 2012-11-28 清华大学 Processor architecture and instruction execution method integrating sequence and VLIW (Very Long Instruction Word)
CN105190579A (en) * 2013-03-15 2015-12-23 索夫特机械公司 A method for implementing a line speed interconnect structure
CN105190541A (en) * 2013-03-15 2015-12-23 索夫特机械公司 A method for executing blocks of instructions using a microprocessor architecture having a register view, source view, instruction view, and a plurality of register templates
WO2014154917A1 (en) * 2013-03-27 2014-10-02 Intel Corporation Mechanism for facilitating the dynamic and efficient merging of computer instructions in software programs
CN110692039A (en) * 2017-05-26 2020-01-14 微软技术许可有限责任公司 Microprocessor instruction pre-dispatch prior to block commit
CN113196244A (en) * 2018-12-10 2021-07-30 斯法夫股份有限公司 Macro operation fusion
CN110543121A (en) * 2019-08-30 2019-12-06 西南电子技术研究所(中国电子科技集团公司第十研究所) Instruction synchronous distribution control device of full-digital phased array system
CN112527393A (en) * 2019-09-18 2021-03-19 无锡江南计算技术研究所 Instruction scheduling optimization device and method for master-slave fusion architecture processor
CN112527304A (en) * 2019-09-19 2021-03-19 无锡江南计算技术研究所 Self-adaptive node fusion compiling optimization method based on heterogeneous platform
CN111930428A (en) * 2020-09-27 2020-11-13 南京芯瞳半导体技术有限公司 Method and device for fusing conditional branch instructions and computer storage medium
CN115576608A (en) * 2022-09-29 2023-01-06 平头哥(上海)半导体技术有限公司 Processor core, processor, chip, control equipment and instruction fusion method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MARIA LAURA GATTO: "Biomechanical performances of PCL/HA micro- and macro-porous lattice scaffolds fabricated via laser powder bed fusion for bone tissue engineering", MATERIALS SCIENCE AND ENGINEERING: C, vol. 128 *
余晓江; 罗欣: "Instruction latency scheduling based on the RISC-V GCC compiler" (基于RISC-V GCC编译器的指令延迟调度), Electronic Technology & Software Engineering (电子技术与软件工程), no. 08 *

Also Published As

Publication number Publication date
CN116302114B (en) 2024-01-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant