CN113296788A - Instruction scheduling method, apparatus, device, storage medium and program product - Google Patents

Instruction scheduling method, apparatus, device, storage medium and program product

Info

Publication number
CN113296788A
CN113296788A (application CN202110650043.4A)
Authority
CN
China
Prior art keywords
instruction
node
scheduling
preset
directed acyclic
Prior art date
Legal status
Granted
Application number
CN202110650043.4A
Other languages
Chinese (zh)
Other versions
CN113296788B (en)
Inventor
殷闻强
王瑶池
张圣铭
陈扬洋
陈光胜
Current Assignee
Shanghai Eastsoft Microelectronics Co ltd
Original Assignee
Shanghai Eastsoft Microelectronics Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Eastsoft Microelectronics Co ltd filed Critical Shanghai Eastsoft Microelectronics Co ltd
Priority to CN202110650043.4A priority Critical patent/CN113296788B/en
Publication of CN113296788A publication Critical patent/CN113296788A/en
Application granted granted Critical
Publication of CN113296788B publication Critical patent/CN113296788B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation
    • G06F8/4434 Reducing the memory space required by the program code
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The application provides an instruction scheduling method, apparatus, device, storage medium and program product. A plurality of instructions is first divided into a plurality of basic blocks according to a preset division rule; the directed acyclic graph of each basic block is then determined according to the data dependency relationships among the instructions in that basic block, and the data dependency graph of each basic block is obtained. Each node in each directed acyclic graph is scheduled based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained. Because the instructions that have data dependency relationships are placed in the same basic block, executing them according to the node ordering obtained by scheduling over the directed acyclic graph with the preset scheduling algorithm effectively reduces the memory capacity occupied by the program, reduces the amount of memory and the number of registers required, and effectively avoids register overflow, while the development threshold and cost required are low and the approach is highly practicable.

Description

Instruction scheduling method, apparatus, device, storage medium and program product
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method, an apparatus, a device, a storage medium, and a program product for instruction scheduling.
Background
In the field of computer technology, a compiler is a computer program that translates source code into a computer-executable program. The executable program is composed of a plurality of instructions, and a group of instructions executed in a fixed order can be regarded as a basic block.
In some computer systems, memory space is quite limited, so there are strict requirements on program size. For example, an embedded system usually has only a small number of registers, and instructions usually use registers as operands; when the registers are not enough to hold the variables in the program, some register values are temporarily stored to memory, i.e., "register overflow" (register spilling) occurs. Register overflow inevitably increases the frequency with which the processor accesses memory, which in turn reduces program execution efficiency and increases program code size.
To avoid register overflow, the code needs to be optimized to reduce the space the program occupies, but optimizing the source code raises many problems. For example, the technical threshold of source-code optimization is often high: it places high professional demands on source-code developers and requires a deep understanding of the compiler and the instruction set, so a great deal of manpower is usually needed to analyze the program and modify the source code. An effective solution for avoiding register overflow is therefore urgently needed.
Disclosure of Invention
The application provides an instruction scheduling method, an instruction scheduling device, an instruction scheduling apparatus, a storage medium and a program product, which are used for reducing the number of required registers and memory and avoiding the occurrence of a register overflow phenomenon.
In a first aspect, the present application provides an instruction scheduling method, including:
dividing a plurality of instructions into a plurality of basic blocks according to a preset division rule, wherein the preset division rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block, wherein each node in each directed acyclic graph corresponds one-to-one to an instruction in the corresponding basic block;
and scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node sequence of each directed acyclic graph is obtained.
In one possible design, after the obtaining the node ordering of each directed acyclic graph, the method further includes:
and determining the instruction execution sequence of the corresponding basic block according to the node sequence of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence to finish the ordered execution of the plurality of instructions.
In one possible design, the preset start instruction type includes any one of the following:
a first instruction of the function, a target address of a preset instruction and a next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In a possible design, the determining a directed acyclic graph of each basic block according to a data dependency relationship between instructions in each basic block and obtaining a data dependency graph of each basic block includes:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction and generate the data dependency graph of each basic block;
obtaining the directed acyclic graph of each basic block according to the User chain of each corresponding node in each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data are represented by weight values.
In one possible design, the data dependency relationship includes any one or more of: alias analysis dependencies, use-definition (Use-Def) chain dependencies, and use (Use) chain dependencies.
In one possible design, the scheduling of the nodes in each directed acyclic graph based on the preset scheduling algorithm includes:
determining a node corresponding to an ending instruction as a first finished scheduling node in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph;
scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
In one possible design, the scheduling of the other schedulable nodes except the currently completed scheduling node according to the preset counting rule, the preset priority scheduling rule and the User chain of the currently completed scheduling node includes:
obtaining a node to be selected according to the User chain of the currently finished scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the currently finished scheduling nodes and the data dependency relationship of the currently finished scheduling nodes in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled;
and the preset priority scheduling rule is used for representing the priority scheduling of the node to be selected with the largest count value in the Next chain.
In a possible design, during the priority scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the User chains of these nodes to be selected are merged into the Next chain to obtain the length of each merged chain, and the node to be selected corresponding to the smallest chain length is scheduled preferentially.
In one possible design, after the scheduling of all schedulable nodes of each directed acyclic graph is completed, the method further includes:
and determining the scheduling completion sequence of each schedulable node in each directed acyclic graph as the node sequence of each directed acyclic graph.
In a second aspect, the present application provides an instruction scheduling apparatus, including:
a first processing module, configured to divide a plurality of instructions into a plurality of basic blocks according to a preset division rule, wherein the preset division rule includes a preset starting instruction type and a preset ending instruction type, the preset ending instruction type includes any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not include a specified function call instruction;
a second processing module, configured to determine the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block and obtain the data dependency graph of each basic block, wherein each node in each directed acyclic graph corresponds one-to-one to an instruction in the corresponding basic block;
and a third processing module, configured to schedule each node in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
In one possible design, the instruction scheduling apparatus further includes: a fourth processing module; the fourth processing module is configured to:
and determining the instruction execution sequence of the corresponding basic block according to the node sequence of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence to finish the ordered execution of the plurality of instructions.
In one possible design, the preset start instruction type includes any one of the following:
a first instruction of the function, a target address of a preset instruction and a next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In one possible design, the second processing module is specifically configured to:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction and generate the data dependency graph of each basic block;
obtaining the directed acyclic graph of each basic block according to the User chain of each corresponding node in each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data are represented by weight values.
In one possible design, the data dependency relationship includes any one or more of: alias analysis dependencies, use-definition (Use-Def) chain dependencies, and use (Use) chain dependencies.
In one possible design, the third processing module includes:
the first processing submodule is used for determining a node corresponding to an ending instruction as a first finished scheduling node in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph;
the second processing submodule is used for scheduling other schedulable nodes except the current finished scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current finished scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
In one possible design, the second processing submodule is specifically configured to:
obtaining a node to be selected according to the User chain of the currently finished scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the currently finished scheduling nodes and the data dependency relationship of the currently finished scheduling nodes in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled;
and the preset priority scheduling rule is used for representing the priority scheduling of the node to be selected with the largest count value in the Next chain.
In a possible design, when performing the priority scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the second processing submodule is further configured to:
merge the User chains of these nodes to be selected into the Next chain to obtain the length of each merged chain, and preferentially schedule the node to be selected corresponding to the smallest chain length.
In one possible design, the instruction scheduling apparatus further includes: a fifth processing module; the fifth processing module is configured to:
and determining the scheduling completion sequence of each schedulable node in each directed acyclic graph as the node sequence of each directed acyclic graph.
In a third aspect, the present application provides an electronic device, comprising:
a processor; and
a memory for storing a computer program for the processor;
wherein the processor is configured to perform any one of the possible instruction scheduling methods provided by the first aspect via execution of the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements any one of the possible instruction scheduling methods provided by the first aspect.
In a fifth aspect, the present application further provides a computer program product comprising a computer program, which when executed by a processor implements any one of the possible instruction scheduling methods provided in the first aspect.
The application provides an instruction scheduling method, apparatus, device, storage medium and program product. A plurality of instructions is first divided into a plurality of basic blocks according to a preset division rule; the directed acyclic graph of each basic block is then determined according to the data dependency relationships among the instructions in that basic block, and the data dependency graph of each basic block is obtained, where each node in each directed acyclic graph corresponds one-to-one to an instruction in the corresponding basic block. Each node in each directed acyclic graph is scheduled based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained. In this instruction scheduling method, the plurality of instructions is first divided into basic blocks, and scheduling of the instructions is then realized on the basis of the directed acyclic graph obtained from the data dependencies of the instructions in each basic block. Because the instructions that have data dependency relationships are placed in the same basic block, the node ordering obtained after scheduling over the directed acyclic graph effectively reduces the memory capacity occupied by the program, reduces the amount of memory and the number of registers required, and effectively avoids register overflow, while the development threshold and cost required are low and the approach is highly practicable.
Drawings
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of an instruction scheduling method according to an embodiment of the present application;
Fig. 3 is a flowchart illustrating another instruction scheduling method according to an embodiment of the present application;
Fig. 4 is a schematic illustration of a User chain according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a directed acyclic graph according to an embodiment of the present application;
Fig. 6 is a flowchart illustrating a further instruction scheduling method according to an embodiment of the present application;
Fig. 7 is a flowchart illustrating another instruction scheduling method according to an embodiment of the present application;
Fig. 8 is a diagram of a Next chain and a directed acyclic graph according to an embodiment of the present application;
Fig. 9 is a diagram of another Next chain and directed acyclic graph according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application;
Fig. 11 is a schematic structural diagram of another instruction scheduling apparatus according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of an electronic device provided in the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present application; rather, they are merely examples of methods and apparatus consistent with certain aspects of the present application, as detailed in the appended claims.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data used in the manner described is interchangeable under appropriate circumstances in order to describe the embodiments of the application, e.g., can be implemented in an order other than that shown or described. Furthermore, the terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The storage space of a computer system is very limited. When the registers are not enough to hold the variables in a program, the corresponding register values are temporarily stored to memory to free available registers, and are loaded back from memory into registers when they are needed again, i.e., "register overflow" occurs. This phenomenon inevitably increases the frequency of the processor's memory accesses, which in turn reduces program execution efficiency and increases code size. To avoid register overflow, the code needs to be optimized to reduce the space the program occupies. However, optimizing the source code raises many problems: for example, the technical threshold of source-code optimization is often high, placing high professional demands on source-code developers and requiring a deep understanding of the compiler and the instruction set, so a great deal of manpower is usually needed to analyze the program and modify the source code. As can be seen from the above, an effective solution for avoiding register overflow is urgently needed.
In view of the foregoing problems in the prior art, the present application provides an instruction scheduling method, apparatus, device, storage medium, and program product. The inventive concept of the instruction scheduling method is as follows. First, a plurality of instructions is divided into a plurality of basic blocks by a preset division rule, where the preset division rule includes a preset starting instruction type and a preset ending instruction type, so that once the starting instruction of a basic block is determined, subsequent instructions keep being added to that basic block until an instruction of the preset ending instruction type appears, which expands the range of the basic block. Further, on the basis of the directed acyclic graph obtained from the data dependency relationships among the instructions in each basic block, the nodes of each basic block are scheduled by a preset scheduling algorithm to obtain the node ordering of each directed acyclic graph, i.e., the scheduling of the instructions in each basic block. Because the instructions that have data dependency relationships are in the same basic block and their node ordering is obtained by scheduling, the life cycles of variables in the program are shortened, the memory capacity occupied by the program can be effectively reduced, the amount of memory and the number of registers required are reduced, and register overflow is effectively avoided, while the development threshold and cost required are low and the approach is highly practicable.
An exemplary application scenario of the embodiments of the present application is described below.
Fig. 1 is a schematic diagram of an application scenario provided in an embodiment of the present application. As shown in fig. 1, a single-chip microcomputer (microcontroller unit, MCU) 11 may be deployed in any terminal that implements corresponding control by executing computer instructions, such as an air conditioner, an intelligent instrument or a printer, and the computer instructions executed by the MCU 11 may be scheduled with the instruction scheduling method provided in the embodiments of the present application. The instruction scheduling method can be implemented by the instruction scheduling apparatus provided in the embodiments of the present application, where the computer 12 may be the electronic device corresponding to that apparatus and its processor is configured to execute the instruction scheduling method. Specifically, the plurality of computer instructions to be executed by the MCU 11 is first divided to obtain a plurality of basic blocks; then, on the basis of obtaining the directed acyclic graph of each basic block from the data dependency relationships among the instructions in that basic block, the nodes in each directed acyclic graph are scheduled according to the designed preset scheduling algorithm to obtain the node ordering of each directed acyclic graph, i.e., to realize the scheduling of the instructions in each basic block. Thus, when the MCU 11 executes the plurality of computer instructions, the instructions in each basic block are executed according to the node ordering, the life cycles of variables in the program are shortened, the memory capacity occupied by the program is effectively reduced, the amount of memory and the number of registers required are reduced, and register overflow is effectively avoided.
It should be noted that the above application scenarios are only exemplary, and the instruction scheduling method, apparatus, device, storage medium, and program product provided in the embodiments of the present application include, but are not limited to, the above application scenarios.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating an instruction scheduling method according to an embodiment of the present application. As shown in fig. 2, the present embodiment includes:
S101: dividing the plurality of instructions into a plurality of basic blocks according to a preset division rule.
The preset division rule comprises a preset starting instruction type and a preset ending instruction type.
The compiler translates source code into a computer-executable program. For example, the compiler may generate corresponding intermediate code from the input source code, where each piece of generated intermediate code may be understood as one instruction; thus a plurality of instructions corresponding to the source code is first obtained from the source code. The intermediate code can be understood as a different representation form of the source code, i.e., an intermediate representation.
After the plurality of instructions is obtained, they are divided into a plurality of basic blocks according to the preset division rule, where each basic block consists of an ordered group of instructions. Each basic block has exactly one entry and one exit: the entry is the starting instruction of the basic block and the exit is its ending instruction, so the plurality of instructions can be divided into a plurality of basic blocks according to the preset division rule.
The preset division rule may include a preset start instruction type and a preset end instruction type.
The preset starting instruction type may be the first instruction of a function; the target address of a preset instruction, where the preset instructions include unconditional jump instructions and conditional jump instructions, i.e., the instruction pointed to by the PC (program counter) register after the jump completes; or the next instruction after a conditional or unconditional jump instruction. That is, the preset starting instruction type includes any of the instructions or target addresses listed above.
The preset ending instruction type may be an unconditional jump instruction, a conditional jump instruction, a function call instruction, or the last instruction of a function, where the function call instructions do not include specified function call instructions. A specified function call instruction refers to a call to some system library functions or to functions having certain attributes. Therefore, the preset ending instruction type includes any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction, and the last instruction of a function.
For example, among the plurality of instructions corresponding to the source code, an instruction matching the preset starting instruction type is determined as the starting instruction of a basic block to be divided; subsequent instructions are then included in that basic block until an instruction matching the preset ending instruction type appears, and that instruction is determined as the ending instruction of the basic block, which completes the division of one basic block. Because the preset ending instruction type does not include specified function call instructions, a basic block that would otherwise end at such a call continues to incorporate instructions; thus, when dividing the basic blocks, the preset division rule of the instruction scheduling method provided in the embodiments of the present application can merge several adjacent basic blocks into one larger basic block, expanding the range of the basic block. Since instructions can only be scheduled within a basic block, expanding its range means that, if the basic block containing each instruction before division by the preset division rule is regarded as a small basic block, instruction scheduling after the division can cross several of the original small basic blocks, which helps shorten variable life cycles when the source code is executed.
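For illustration, the following is a minimal Python sketch of the division step described above. The Instruction record, its field names (kind, is_jump_target, callee_is_specified, and so on) and the instruction kinds "jmp", "jcc" and "call" are assumptions introduced only for this sketch; the patent does not prescribe any particular representation.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Instruction:
    text: str
    kind: str                          # assumed kinds: "jmp", "jcc", "call", "other"
    is_jump_target: bool = False       # target address of a preset (jump) instruction
    is_first_of_function: bool = False
    is_last_of_function: bool = False
    callee_is_specified: bool = False  # "specified function call" (e.g. system library)

def is_block_start(ins: Instruction, prev: Optional[Instruction]) -> bool:
    # Preset starting instruction type: first instruction of a function,
    # a jump target, or the instruction right after a (conditional) jump.
    return (ins.is_first_of_function
            or ins.is_jump_target
            or (prev is not None and prev.kind in ("jmp", "jcc")))

def is_block_end(ins: Instruction) -> bool:
    # Preset ending instruction type: unconditional/conditional jump,
    # function call (except specified calls), or last instruction of a function.
    if ins.kind in ("jmp", "jcc"):
        return True
    if ins.kind == "call" and not ins.callee_is_specified:
        return True
    return ins.is_last_of_function

def divide_into_basic_blocks(instructions: List[Instruction]) -> List[List[Instruction]]:
    blocks: List[List[Instruction]] = []
    current: List[Instruction] = []
    prev: Optional[Instruction] = None
    for ins in instructions:
        if current and is_block_start(ins, prev):
            blocks.append(current)        # a new start also closes the open block
            current = []
        current.append(ins)
        if is_block_end(ins):
            blocks.append(current)
            current = []
        prev = ins
    if current:
        blocks.append(current)
    return blocks
```

Because calls to specified functions do not end a block in this sketch, a block that reaches such a call keeps absorbing instructions, which mirrors how the preset division rule merges several small basic blocks into one larger block.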
S102: determining the directed acyclic graph of each basic block according to the data dependency relationships among the instructions in each basic block, and obtaining the data dependency graph of each basic block.
Each node in each directed acyclic graph corresponds one-to-one to an instruction in the corresponding basic block.
After the plurality of instructions has been divided into basic blocks, a directed acyclic graph (DAG) is determined for each basic block according to the data dependency relationships among the instructions in that basic block. Each instruction consists of an operator and zero or more operands, so the data dependency relationships among the operators and operands of the instructions can be determined and the DAG corresponding to each basic block can be formed. Each node of the resulting directed acyclic graph corresponds one-to-one to an instruction in the corresponding basic block, and each edge carries the constraint data of the data dependency between the instructions corresponding to its adjacent nodes, where the constraint data is represented by a weight value.
When an instruction has more than one operand, different operands have different data dependency relationships; if two instructions are related through data dependencies on multiple operands, constraint data exists between those data dependency relationships, and the corresponding constraint data can be represented by weight values.
In addition, the data dependency relationship between instructions is determined by the data dependency relationship between operands of the instructions. For example, the data dependency of the operand may be any one or more of an alias analysis dependency, a Use-definition (Use-Def) chain dependency, and a Use (Use) chain dependency, which is not limited in this embodiment.
The alias analysis dependency determines whether a memory location may be accessed in more than one way. If this has not been analyzed, or the location is determined to be accessible in multiple ways, the order of the two consecutive instructions cannot be changed, i.e., the node corresponding to the later instruction points to the node corresponding to the earlier instruction.
The Use-Def chain dependency reflects that an instruction defines only one operand. For a given operand, if it is used directly without having been defined, the value read from it is indeterminate; therefore the instruction that defines the operand must be executed before the instruction that uses it.
In Use chain dependencies, the same operand may be used in multiple instructions, and thus multiple instructions with that operand may be concatenated through the same operand to form a corresponding User chain.
By determining the data dependency relationships among the instructions in each basic block, generating the User chain of the node corresponding to each instruction, and then forming the directed acyclic graph used for instruction scheduling, multiple instructions that share the same operand can be executed close together, which reduces memory access operations.
When determining the directed acyclic graph of each basic block, the data dependency relationship between the instructions in each basic block can be recorded in the form of a data dependency graph.
In a possible design, a possible implementation manner of this step S102 is shown in fig. 3, and fig. 3 is a flowchart of another instruction scheduling method provided in this embodiment of the present application. As shown in fig. 3, the present embodiment includes:
S1021: determining the data dependency relationships among the instructions in each basic block to generate the User chain of the node corresponding to each instruction.
S1022: obtaining the directed acyclic graph of each basic block according to the User chains of the corresponding nodes in each basic block, and generating and recording the data dependency graph of each basic block.
The data dependency relationships among the instructions can be determined from the operators and operands of the instructions in each basic block, and the dependency relationships among all nodes related to the node of each instruction are characterized by its User chain; that is, the User chain of the node corresponding to each instruction is generated. The directed acyclic graph corresponding to each basic block can then be constructed from the User chains of the nodes corresponding to the instructions in that basic block.
Meanwhile, to facilitate recording the data dependency relationships among the instructions in each basic block, a data dependency graph can be generated, so that the data dependency relationships are recorded in the form of a data dependency graph. The data dependency graph may be denoted by G, where G = (I, E): I denotes the node set and E denotes the directed edge set. The data dependency relationship of each node in the directed acyclic graph corresponding to each basic block is a subset of the node set I, i.e., a node subset, and the constraint data of the data dependency relationship of each node is a subset of the directed edge set E, i.e., a directed edge subset; for example, the constraint data of the data dependency relationship between nodes i6 and i1 in the directed acyclic graph shown in fig. 5 is 2.
Table 1 lists the instructions in one basic block (i.e., the instructions corresponding to nodes i0 to i10). Fig. 4, a schematic diagram of User chains provided in an embodiment of the present application, shows the User chains of the nodes i0 to i10 generated by determining the data dependency relationships between the instructions of the basic block in table 1. On the basis of fig. 4, the directed acyclic graph of the basic block shown in fig. 5 can be obtained; fig. 5 is a schematic diagram of a directed acyclic graph provided in an embodiment of the present application.
TABLE 1
[Table 1 is reproduced as an image in the original publication; it lists the instructions corresponding to nodes i0 to i10 of the example basic block.]
When the instructions corresponding to the nodes of a basic block are known, such as the instructions corresponding to nodes i0 to i10 in table 1, the data dependency relationships between those instructions can be determined from the operators and operands that make up each instruction, and these relationships are represented by the User chain of the node corresponding to each instruction, as shown in fig. 4. In fig. 4, the numerical value attached to W is a weight value used to represent the constraint data between nodes (i.e., between the instructions corresponding to the nodes). For example, as shown in fig. 4, the User chain of node i1 is node i0, and the constraint data between nodes i0 and i1 is represented by the weight value W = 1. In addition, the User chains of nodes i0, i2, and i5 are all empty. From fig. 4, the directed acyclic graph of the basic block of table 1, shown in fig. 5, can be constructed. As shown in fig. 5, each node of the directed acyclic graph corresponds to an instruction of the basic block, i.e., nodes i0 to i10 correspond one-to-one to the instructions in table 1. The edges of the directed acyclic graph connect nodes that have data dependency relationships and carry the constraint data of those relationships.
In the instruction scheduling method provided by the embodiments of the present application, for each basic block the User chain of the node corresponding to each instruction is first obtained from the data dependency relationships among the instructions in the basic block, and the directed acyclic graph of the basic block is then constructed from these User chains, so that instruction scheduling can be performed on the directed acyclic graph and the node ordering of each basic block can be determined.
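A hedged Python sketch of S1021 and S1022 follows. It models only Use-Def dependencies inside one block (alias-analysis dependencies would add edges in the same way), reads the "User chain of a node" as the list of earlier nodes whose results that node uses, and treats the weight W as growing with the number of shared operands; these readings, and every name in the code, are assumptions of the sketch rather than definitions taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Instr:
    index: int                                         # node identifier i0, i1, ...
    defines: List[str] = field(default_factory=list)   # operands written
    uses: List[str] = field(default_factory=list)      # operands read

def build_user_chains_and_dag(block: List[Instr]):
    """Return user_chain[n] = [(dependency node, weight W), ...] for each node n,
    plus the DAG edge set derived from it (a later node points to the earlier
    nodes it depends on)."""
    last_def: Dict[str, int] = {}
    user_chain: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for ins in block:
        weights: Dict[int, int] = defaultdict(int)
        for op in ins.uses:
            if op in last_def:                 # Use-Def dependency within the block
                weights[last_def[op]] += 1     # each shared operand adds 1 to W (assumed)
        for dep, w in weights.items():
            user_chain[ins.index].append((dep, w))
        for op in ins.defines:
            last_def[op] = ins.index
    # Data dependency graph G = (I, E): nodes are instruction indices,
    # directed edges carry the weight values as constraint data.
    dag_edges = {n: list(deps) for n, deps in user_chain.items()}
    return user_chain, dag_edges
```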
S103: scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
After the directed acyclic graph of each basic block is obtained, each node is scheduled by using a preset scheduling algorithm provided by the embodiment of the application, so that the node sequence of each directed acyclic graph is obtained. The preset scheduling algorithm is used for representing a scheduling principle and a scheduling mode which are followed when each node in each directed acyclic graph is scheduled.
The node sequence of each directed acyclic graph refers to a node sequence determined after scheduling each node in each directed acyclic graph.
In one possible design, a possible implementation of this step S103 is shown in fig. 6. Fig. 6 is a flowchart illustrating a further instruction scheduling method according to an embodiment of the present application. As shown in fig. 6, the present embodiment includes:
S1031: for the schedulable nodes in each directed acyclic graph, determining the node corresponding to the ending instruction as the first completed scheduling node in the current directed acyclic graph.
When the nodes in each directed acyclic graph are scheduled by the preset scheduling algorithm, either a bottom-up or a top-down mode can be adopted. In either mode, a node can be scheduled only if its in-degree is 0, i.e., no other node depends on it; the nodes of each directed acyclic graph that satisfy this condition are defined as schedulable nodes. Moreover, in either mode the position of the node corresponding to the ending instruction of a basic block cannot change: the ending instruction after scheduling is still the ending instruction that the basic block had before scheduling, while the positions of the other instructions may change. The data dependency relations of the bottom-up and top-down scheduling modes are opposite to each other, i.e., the direction of the directed edge set E is reversed.
It should be noted that in the embodiments of the present application the directed acyclic graph formed from the data dependency relationships corresponds to bottom-up scheduling, so the embodiment shown in fig. 6 schedules in a bottom-up manner. Specifically, among the schedulable nodes of each directed acyclic graph, the node corresponding to the ending instruction may be selected first and determined as the first completed scheduling node of the current directed acyclic graph being scheduled.
It can be understood that the scheduling of the instruction may be regarded as ordering the nodes corresponding to the instruction, the sequence of selecting the nodes is the scheduling sequence, and the process of selecting the nodes is the process of scheduling the nodes.
Continuing with the example of the directed acyclic graph shown in fig. 5, if the bottom-up scheduling is performed, it can be known from table 1 that the instruction corresponding to the node i10 is an end instruction, and therefore the node i10 is the first completed scheduling node in the current directed acyclic graph shown in fig. 5.
S1032: scheduling the other schedulable nodes except the currently completed scheduling node according to the preset counting rule, the preset priority scheduling rule and the User chain of the currently completed scheduling node.
After the current finished scheduling node is obtained, scheduling other schedulable nodes except the current finished scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current finished scheduling node. In other words, other schedulable nodes are selected one by one according to the preset counting rule, the preset priority scheduling rule and the User chain of the currently completed scheduling node, so that the scheduling of the schedulable node is completed. It is understood that the preset scheduling algorithm includes a bottom-up or top-down scheduling manner, a preset counting rule, and a preset priority scheduling rule.
Specifically, a possible implementation manner of step S1032 is shown in fig. 7, and fig. 7 is a schematic flowchart of another instruction scheduling method provided in the embodiment of the present application. As shown in fig. 7, the present embodiment includes:
S201: obtaining a node to be selected according to the User chain of the currently completed scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected, so as to generate a Next chain.
The node to be selected is a schedulable node which has a data dependency relationship with the current completed scheduling node in the User chain of the current completed scheduling node. For example, if the node whose scheduling is currently completed in step S1031 is node i10, it may be determined that the node having a data dependency relationship with i10 is i9 according to the User chain of each node shown in fig. 4, and i9 is the currently obtained node to be selected.
Then the count value of the node to be selected is updated according to the preset counting rule and the constraint data corresponding to that node, so as to generate a Next chain.
The node to be selected is used as the head of the Next chain and its count value updated by the preset counting rule as the tail, so that the current Next chain is generated.
In addition, the preset counting rule may be represented by the following relation (1):
Count′ = Count + W    (1)
where Count′ is the updated count value of the node to be selected, Count is the count value of the currently completed scheduling node at the time the node to be selected is obtained, and W is the constraint data of the data dependency between the node to be selected and the currently completed scheduling node, i.e., the weight of the edge connecting them in the directed acyclic graph.
It should be noted that, the initial count values of all schedulable nodes in the current directed acyclic graph are all 0, and the node to be selected is a node in a Next chain.
S202: deleting the currently completed scheduling node and its data dependency relationships from the current directed acyclic graph to obtain a new current directed acyclic graph.
After the currently completed scheduling node is obtained, the currently completed scheduling node and its data dependency relationships are deleted from the current directed acyclic graph to obtain a new current directed acyclic graph.
S203: repeating the above steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled.
The preset priority scheduling rule is used for representing priority scheduling of the node to be selected with the largest count value in the Next chain.
After the new current directed acyclic graph and the Next chain are obtained, steps S201 to S203 are repeated on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled.
The preset priority scheduling rule is that, when the Next chain is not empty, the node to be selected with the largest count value in the Next chain is scheduled preferentially. Specifically, when the Next chain contains only one node to be selected, that node becomes the currently completed scheduling node. If the number of nodes to be selected with the largest count value in the Next chain is greater than 1, the User chains of those nodes are merged into the Next chain, the length of each merged chain is obtained, and the node to be selected corresponding to the shortest chain length is scheduled preferentially, i.e., determined as the currently completed scheduling node.
In one possible design, after all schedulable nodes in each directed acyclic graph are scheduled through the embodiments shown in fig. 6 and fig. 7, the order in which all schedulable nodes in each directed acyclic graph complete scheduling is determined as the node ordering of each directed acyclic graph.
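Combining S1031, S1032 and steps S201 to S203, the following Python sketch shows one possible bottom-up scheduling loop over the user_chain structure built in the earlier sketch. The Count update, the largest-count priority rule and the shortest-merged-chain tie-break follow the description above, but details such as seeding of initially independent nodes and ties beyond that are assumptions of the sketch, not of the patent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def schedule_dag(user_chain: Dict[int, List[Tuple[int, int]]], end_node: int) -> List[int]:
    """Bottom-up scheduling of one basic block's DAG.

    user_chain[n] lists the (dependency node, weight W) pairs of n; edges point
    from a later node to the earlier nodes it depends on, so "in-degree 0"
    means no still-unscheduled node depends on n."""
    nodes = {end_node} | set(user_chain) | {d for deps in user_chain.values() for d, _ in deps}
    in_degree: Dict[int, int] = defaultdict(int)
    for deps in user_chain.values():
        for dep, _ in deps:
            in_degree[dep] += 1

    count: Dict[int, int] = defaultdict(int)   # initial count value of every node is 0
    next_chain: Dict[int, int] = {}            # node to be selected -> count value
    for n in nodes:                            # seed any other initially schedulable nodes
        if n != end_node and in_degree[n] == 0:
            next_chain[n] = 0

    order = [end_node]                         # the ending instruction is scheduled first
    current = end_node
    while len(order) < len(nodes):
        # S201: candidates come from the User chain of the current node;
        # their count values are updated with Count' = Count + W.
        for dep, w in user_chain.get(current, []):
            count[dep] = count[current] + w
            in_degree[dep] -= 1                # S202: delete the edges of `current`
            if in_degree[dep] == 0:            # dep has become schedulable
                next_chain[dep] = count[dep]
        # S203: priority rule: largest count value first; ties are broken by the
        # shortest chain obtained by merging the User chain into the Next chain.
        best = max(next_chain.values())
        tied = [n for n, c in next_chain.items() if c == best]
        if len(tied) > 1:
            def merged_len(n: int) -> int:
                return len({m for m, _ in user_chain.get(n, [])} | set(next_chain))
            current = min(tied, key=merged_len)
        else:
            current = tied[0]
        del next_chain[current]
        order.append(current)
    return order   # the order in which nodes completed scheduling = node ordering
```

Under the sketch's assumptions, running it on a user_chain structure matching fig. 4 would be expected to yield an ordering like the one derived step by step below.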
The following describes in detail a specific implementation manner of the embodiments shown in fig. 6 and fig. 7, with reference to the instructions of the nodes (i0 to i10) corresponding to one basic block shown in table 1, the User chain of the nodes shown in fig. 4, and the directed acyclic graph of the basic block shown in fig. 5.
First, according to the instructions shown in table 1, the instruction corresponding to the node i10 is an end instruction, and therefore the node i10 is the first completed scheduling node, in other words, the node i10 is the current completed scheduling node.
Then, from the User chains of the nodes shown in fig. 4, the node to be selected, i9, and the constraint data W between nodes i9 and i10 are obtained according to the User chain of the currently completed scheduling node i10. The count value of node i9 is updated from its initial value 0 by the preset counting rule; the node to be selected i9 is then used as the head and its updated count value (0 + 1 = 1, i.e., C: 1) as the tail, which gives the Next chain shown in (a) of fig. 8, where fig. 8 is a schematic diagram of a Next chain and a directed acyclic graph provided in an embodiment of the present application. Further, the currently completed scheduling node i10 and its data dependency relationships are deleted from the current directed acyclic graph, i.e., from fig. 5, which gives the new current directed acyclic graph shown in (b) of fig. 8. Since the Next chain currently contains only the single node to be selected i9, node i9 becomes the currently completed scheduling node.
Further, the steps of generating the Next chain, obtaining the new current directed acyclic graph and following the preset priority scheduling rule are executed repeatedly to continue the scheduling of the other schedulable nodes (i.e., the nodes other than node i10 and node i9).
The Next chain generated from the User chain of node i9 is shown in (a) of fig. 9, where fig. 9 is a schematic diagram of another Next chain and directed acyclic graph provided in an embodiment of the present application. The new current directed acyclic graph obtained by deleting the completed scheduling node i9 and its data dependencies is shown in (b) of fig. 9. At this point the count values of the nodes to be selected i6 and i8 in the Next chain are equal, so, following the priority scheduling rule, the User chains of nodes i6 and i8 are merged into the Next chain and the chain length of each is obtained: the merged chain of node i6 contains nodes i6, i8 and i1, with chain length 3, while the merged chain of node i8 contains nodes i6, i8, i7 and i4, with chain length 4. The former is the shortest chain length, so node i6 is scheduled preferentially, i.e., node i6 becomes the currently completed scheduling node.
The steps of generating the Next chain, obtaining the new current directed acyclic graph and following the preset priority scheduling rule are repeated for the remaining schedulable nodes, and the subsequently completed scheduling nodes are node i1, node i0, node i8, node i7, node i5, node i4, node i3 and node i2. At this point all schedulable nodes in the directed acyclic graph have been scheduled. The order in which all schedulable nodes (nodes i0 to i10) completed scheduling, i10-i9-i6-i1-i0-i8-i7-i5-i4-i3-i2, is determined as the node ordering of the directed acyclic graph.
As can be seen from the above description of the embodiments, once the node ordering of each directed acyclic graph is obtained, the scheduling of the instructions corresponding to its nodes is complete. The instruction scheduling method provided by the embodiments of the present application schedules the intermediate code of the source code: the number of instructions is not reduced, only their order is rearranged, so that when the target code for a specific target machine is generated from the instructions of the intermediate code, variable life cycles are shortened, fewer registers are required, and the possibility of register overflow is reduced, so that register overflow is avoided.
In the instruction scheduling method provided by the embodiments of the present application, a plurality of instructions is first divided into a plurality of basic blocks, and scheduling of the instructions is then realized on the basis of the directed acyclic graph obtained from the data dependency relationships of the instructions in each basic block. Because the instructions that have data dependency relationships are placed in the same basic block and are executed according to the node ordering obtained after scheduling, the memory capacity occupied by the program can be effectively reduced, the amount of memory and the number of registers required are reduced, register overflow is effectively avoided, and the development threshold and cost required are low while the approach is highly practicable.
In a possible design, after the node ordering of each directed acyclic graph is obtained, the instruction execution order of the corresponding basic block can be determined from that ordering. If a bottom-up mode was used to obtain the completed scheduling nodes of each directed acyclic graph, i.e., the first completed scheduling node is the node corresponding to the ending instruction, the node ordering of the directed acyclic graph is the reverse of the instruction execution order of the corresponding basic block. Conversely, if a top-down mode was used, i.e., the first completed scheduling node is the node corresponding to the starting instruction, the node ordering is the instruction execution order of the corresponding basic block. After the instruction execution order is obtained, the instructions in each basic block are executed in that order, which completes the ordered execution of the plurality of instructions corresponding to the source code.
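As a small illustration (continuing the sketches above, with assumed node labels), the execution order can be derived from the node ordering as follows:

```python
from typing import List

def execution_order(node_ordering: List[str], bottom_up: bool = True) -> List[str]:
    # Bottom-up scheduling yields the reverse of the execution order;
    # a top-down ordering would be used directly.
    return list(reversed(node_ordering)) if bottom_up else list(node_ordering)

# Using the ordering derived in the walkthrough above (bottom-up scheduling):
ordering = ["i10", "i9", "i6", "i1", "i0", "i8", "i7", "i5", "i4", "i3", "i2"]
print(execution_order(ordering))
# ['i2', 'i3', 'i4', 'i5', 'i7', 'i8', 'i0', 'i1', 'i6', 'i9', 'i10']
```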
Because different architectures provide different numbers of available registers, the benefit brought by the instruction scheduling method provided by the embodiment of the application cannot be quantified directly. Assume instead that the target machine has an unlimited number of virtual registers and that the variables in the instructions are allocated to these virtual registers; the more virtual registers are used, the more hardware resources are required. Therefore, the number of variables that must be held at each instruction can be used as a metric to evaluate the instruction scheduling method provided by the embodiment of the application. Still taking the instruction fragment shown in Table 1 as an example, Tables 2 and 3 below show the variable life cycles without and with the instruction scheduling method provided by the embodiment of the present application, respectively:
TABLE 2
[Table 2 is reproduced as an image in the original publication; it lists, for each instruction of the fragment in Table 1, the variables that must be kept live when the instructions are executed without scheduling.]
TABLE 3
[Table 3 is reproduced as an image in the original publication; it lists, for each instruction, the variables that must be kept live after the instructions are scheduled by the method of this embodiment.]
As can be seen from Table 2, the number of variables to be held is largest when the instruction of node i5 is executed: 6 variables must be held. If the target machine does not have enough registers to hold these 6 variables, a register overflow occurs; register overflow introduces additional Load/Store instructions and increases memory usage, and should therefore be reduced as far as possible. Scheduling the instructions with the method provided by the embodiment of the application lowers the probability of register overflow; the resulting variable life cycles are shown in Table 3. As can be seen from Table 3, after instruction scheduling with the method provided by the embodiment of the present application, each instruction needs to hold at most 4 variables. Compared with the unscheduled case in Table 2, the demand on registers and memory is greatly reduced and the life cycles of variables are shortened as much as possible, so that the limited register resources can be fully exploited through register reuse and the register utilization is improved.
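The measurement used in Tables 2 and 3 (the largest number of variables that must be held at any instruction) can be approximated with a simple backward liveness count. The sketch below is only an illustration of that metric on a hypothetical fragment; it is not the procedure that produced the tables above:

# Illustrative sketch (not the evaluation procedure of this application):
# estimate register pressure as the maximum number of variables that are
# simultaneously live across a straight-line instruction sequence.
def max_live_variables(instrs):
    """instrs: list of (defs, uses) pairs, each a set of variable names."""
    live = set()
    peak = 0
    # Walk backwards: a variable is live from its last use back to its definition.
    for defs, uses in reversed(instrs):
        live -= defs           # the defining instruction ends the live range
        live |= uses           # a use extends the live range upwards
        peak = max(peak, len(live))
    return peak

# Hypothetical fragment: t3 = t1 + t2; t4 = t3 * t1; store t4
fragment = [
    ({"t3"}, {"t1", "t2"}),
    ({"t4"}, {"t3", "t1"}),
    (set(), {"t4"}),
]
print(max_live_variables(fragment))   # 2: at most two variables live at once

Applying the same count to a basic block before and after scheduling gives the kind of comparison reflected in Tables 2 and 3.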
Fig. 10 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application. As shown in fig. 10, the instruction scheduling apparatus 300 according to the present embodiment includes:
the first processing module 301 is configured to divide a plurality of instructions into a plurality of basic blocks according to a preset division rule.
The preset division rule comprises a preset starting instruction type and a preset ending instruction type.
The preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, wherein the function call instruction does not comprise a specified function call instruction.
The second processing module 302 is configured to determine a directed acyclic graph of each basic block according to a data dependency relationship between instructions in each basic block, and obtain a data dependency graph of each basic block.
Each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one by one;
the third processing module 303 is configured to schedule each node in each directed acyclic graph based on a preset scheduling algorithm until the node ordering of each directed acyclic graph is obtained.
Based on fig. 10, fig. 11 is a schematic structural diagram of another instruction scheduling apparatus according to an embodiment of the present application. As shown in fig. 11, the instruction scheduling apparatus 300 according to this embodiment further includes: a fourth processing module 304, the fourth processing module 304 configured to:
and determining the instruction execution sequence of the corresponding basic block according to the node sequence of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence to finish the ordered execution of a plurality of instructions.
In one possible design, the preset start instruction type includes any one of the following:
a first instruction of the function, a target address of a preset instruction and a next instruction after the preset instruction; the preset instructions comprise unconditional jump instructions and conditional jump instructions.
In one possible design, the second processing module 302 is specifically configured to:
and determining the data dependency relationship among the instructions in each basic block to generate a User chain of the node corresponding to each instruction and generate a data dependency graph of each basic block.
And obtaining the directed acyclic graph of each basic block according to the User chain of each corresponding node in each basic block.
The data dependency graph of each basic block comprises a node set and a directed edge set, each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data are represented by weight values.
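By way of illustration of this construction (a sketch under assumed data structures; the chain layout and the placeholder weight of 1 are not taken from this application), a User chain can be kept as an adjacency list that records, for each node, the later nodes that consume its result, with a weight per directed edge standing in for the constraint data:

# Sketch: build a per-basic-block dependency structure from def/use information.
# user_chain[n] lists (consumer node, weight) pairs; the weight plays the role
# of the constraint data attached to each directed edge. All names are
# illustrative and not taken from this application.
from collections import defaultdict

def build_user_chains(block):
    """block: list of (node_id, defs, uses) with defs/uses as sets of names."""
    last_def = {}                          # variable -> node that defined it
    user_chain = defaultdict(list)         # node -> [(user node, weight), ...]
    for node, defs, uses in block:
        for var in uses:
            if var in last_def:
                # def -> use dependency; weight 1 as a placeholder constraint
                user_chain[last_def[var]].append((node, 1))
        for var in defs:
            last_def[var] = node
    return user_chain

block = [
    ("i0", {"a"}, set()),
    ("i1", {"b"}, {"a"}),
    ("i2", {"c"}, {"a", "b"}),
]
print(dict(build_user_chains(block)))
# {'i0': [('i1', 1), ('i2', 1)], 'i1': [('i2', 1)]}

The keys and edge lists of such a structure correspond, respectively, to the node set and the directed edge set of the data dependency graph described above.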
In one possible design, the data dependencies include any one or more of alias analysis dependencies, use-definition (Use-Def) chain dependencies, and use (Use) chain dependencies.
In one possible design, the third processing module 303 includes:
and the first processing submodule is used for determining a node corresponding to the ending instruction as a first finished scheduling node in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph.
And the second processing submodule is used for scheduling other schedulable nodes except the current finished scheduling node according to the preset counting rule, the preset priority scheduling rule and the User chain of the current finished scheduling node.
The preset scheduling algorithm comprises a preset counting rule and a preset priority scheduling rule.
In one possible design, the second processing submodule is specifically configured to:
and obtaining a node to be selected according to the User chain of the currently finished scheduling node, and updating the count value of the node to be selected according to a preset counting rule and constraint data corresponding to the node to be selected so as to generate a Next chain.
And deleting the currently finished scheduling nodes in the current directed acyclic graph and the data dependency relationship of the currently finished scheduling nodes to obtain a new current directed acyclic graph.
And repeating the steps on the new current directed acyclic graph based on a preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled.
The preset priority scheduling rule is used for representing priority scheduling of the node to be selected with the largest count value in the Next chain.
In one possible design, when performing priority scheduling, if the number of nodes to be selected in the Next chain with the largest count value is greater than 1, the second processing submodule is further configured to:
and merging the User chains of the nodes to be selected into a Next chain to obtain the lengths of all the merged chains, and preferentially scheduling the nodes to be selected corresponding to the minimum chain length.
In one possible design, the instruction scheduling apparatus 300 further includes: and a fifth processing module. The fifth processing module is configured to:
and determining the scheduling completion sequence of each schedulable node in each directed acyclic graph as the node sequence of each directed acyclic graph.
It should be noted that the instruction scheduling apparatus provided in fig. 10 and 11 and the alternative embodiments may be configured to execute corresponding steps of the instruction scheduling method provided in any of the embodiments, and specific implementation manners and technical effects are similar and will not be described again here.
The foregoing apparatus embodiments provided in this application are merely exemplary, and the described module division is only one kind of logical function division; in actual implementation there may be other division manners. For example, multiple modules may be combined, or may be integrated into another system. The modules may be coupled to each other through interfaces, such as electrical communication interfaces, without excluding mechanical or other forms of interfaces.
Fig. 12 is a schematic structural diagram of an electronic device provided in the present application. As shown in fig. 12, the electronic device 400 may include: at least one processor 401 and a memory 402. Fig. 12 illustrates the electronic device taking one processor as an example.
A memory 402 for storing computer programs for the processor 401. In particular, the program may include program code including computer operating instructions.
The memory 402 may include a high-speed RAM and may also include a non-volatile memory, such as at least one magnetic disk memory.
The processor 401 is configured to execute the computer program stored in the memory 402 to implement the steps of the instruction scheduling method in the above method embodiments.
The processor 401 may be a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
Alternatively, the memory 402 may be separate or integrated with the processor 401. When the memory 402 is a device independent of the processor 401, the electronic device 400 may further include:
a bus 403 for connecting the processor 401 and the memory 402. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. Buses may be classified as address buses, data buses, control buses, etc., but do not represent only one bus or type of bus.
Alternatively, in a specific implementation, if the memory 402 and the processor 401 are integrated on a chip, the memory 402 and the processor 401 may communicate through an internal interface.
The present application also provides a computer-readable storage medium, which may be any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk. Specifically, the computer-readable storage medium stores a computer program; when at least one processor of the electronic device executes the computer program, the electronic device performs the steps of the instruction scheduling method provided by the foregoing embodiments.
Embodiments of the present application also provide a computer program product, which includes a computer program, and the computer program is stored in a readable storage medium. The computer program may be read from a readable storage medium by at least one processor of the electronic device, and execution of the computer program by the at least one processor causes the electronic device to perform the steps of the instruction scheduling method provided by the various embodiments described above.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (21)

1. An instruction scheduling method, comprising:
dividing a plurality of instructions into a plurality of basic blocks according to a preset division rule, wherein the preset division rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
determining a directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block, and obtaining the data dependency graph of each basic block, wherein each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one to one;
and scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node sequence of each directed acyclic graph is obtained.
2. The instruction scheduling method according to claim 1, further comprising, after said obtaining the node ordering of each directed acyclic graph:
and determining an instruction execution sequence in the corresponding basic block according to the node sequence of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence to complete the ordered execution of the plurality of instructions.
3. The instruction scheduling method according to claim 1, wherein the preset starting instruction type comprises any one of the following:
a first instruction of the function, a target address of a preset instruction and a next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
4. The instruction scheduling method according to any one of claims 1 to 3, wherein the determining a directed acyclic graph of each basic block according to the data dependency relationship between instructions in each basic block and obtaining the data dependency graph of each basic block comprises:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction;
obtaining the directed acyclic graph of each basic block according to the User chain of each corresponding node in each basic block, and generating the data dependency graph of each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data are represented by weight values.
5. The instruction scheduling method of claim 4, wherein the data dependencies comprise: any one or more of alias analysis dependencies, use-definition (Use-Def) chain dependencies, and use (Use) chain dependencies.
6. The instruction scheduling method according to claim 4, wherein the scheduling nodes in each directed acyclic graph based on a preset scheduling algorithm comprises:
determining a node corresponding to an ending instruction as a first finished scheduling node in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph;
scheduling other schedulable nodes except the current completed scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current completed scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
7. The method of claim 6, wherein said scheduling schedulable nodes other than said currently completed scheduling node according to a predetermined counting rule, a predetermined priority scheduling rule and a User chain of said completed scheduling node comprises:
obtaining a node to be selected according to the User chain of the currently finished scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the currently finished scheduling nodes and the data dependency relationship of the currently finished scheduling nodes in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled;
and the preset priority scheduling rule is used for representing the priority scheduling of the node to be selected with the largest count value in the Next chain.
8. The method of claim 7, wherein during the priority scheduling, if the number of nodes to be selected with the largest count value in the Next chain is greater than 1, merging the User chain of the nodes to be selected to the Next chain to obtain the merged chain lengths, and performing the priority scheduling on the node to be selected corresponding to the shortest chain length.
9. The method according to claim 7, further comprising, after said scheduling is completed for all schedulable nodes in each directed acyclic graph:
and determining the scheduling completion sequence of all schedulable nodes in each directed acyclic graph as the node sequence of each directed acyclic graph.
10. An instruction scheduling apparatus, comprising:
the system comprises a first processing module, a second processing module and a third processing module, wherein the first processing module is used for dividing a plurality of instructions into a plurality of basic blocks according to a preset division rule, the preset division rule comprises a preset starting instruction type and a preset ending instruction type, the preset ending instruction type comprises any one of an unconditional jump instruction, a conditional jump instruction, a function call instruction and a last instruction of a function, and the function call instruction does not comprise a specified function call instruction;
the second processing module is used for determining the directed acyclic graph of each basic block according to the data dependency relationship among the instructions in each basic block and obtaining the data dependency graph of each basic block, and each node in each directed acyclic graph corresponds to each instruction in each corresponding basic block one to one;
and the third processing module is used for scheduling each node in each directed acyclic graph based on a preset scheduling algorithm until the node sequence of each directed acyclic graph is obtained.
11. The instruction scheduling apparatus of claim 10, further comprising: a fourth processing module; the fourth processing module is configured to:
and determining the instruction execution sequence of the corresponding basic block according to the node sequence of each directed acyclic graph, and executing each instruction in each basic block according to the instruction execution sequence to finish the ordered execution of the plurality of instructions.
12. The instruction scheduling apparatus of claim 10, wherein the preset start instruction type comprises any one of the following:
a first instruction of the function, a target address of a preset instruction and a next instruction after the preset instruction;
the preset instructions comprise unconditional jump instructions and conditional jump instructions.
13. The instruction scheduling apparatus according to any one of claims 10 to 12, wherein the second processing module is specifically configured to:
determining the data dependency relationship among the instructions in each basic block to generate a User chain of a node corresponding to each instruction and generate the data dependency graph of each basic block;
obtaining the directed acyclic graph of each basic block according to the User chain of each corresponding node in each basic block;
the data dependency graph of each basic block comprises a node set and a directed edge set, each node subset in the node set is used for representing the data dependency relationship of each node, each directed edge subset in the directed edge set is used for representing constraint data of the data dependency relationship of each node, and the constraint data are represented by weight values.
14. The instruction scheduling apparatus of claim 13, wherein the data dependencies comprise: any one or more of alias analysis dependencies, use-definition (Use-Def) chain dependencies, and use (Use) chain dependencies.
15. The instruction scheduling apparatus of claim 13, wherein the third processing module comprises:
the first processing submodule is used for determining a node corresponding to an ending instruction as a first finished scheduling node in the current directed acyclic graph aiming at the schedulable node in each directed acyclic graph;
the second processing submodule is used for scheduling other schedulable nodes except the current finished scheduling node according to a preset counting rule, a preset priority scheduling rule and a User chain of the current finished scheduling node;
the preset scheduling algorithm comprises the preset counting rule and the preset priority scheduling rule.
16. The instruction scheduling device of claim 15, wherein the second processing submodule is specifically configured to:
obtaining a node to be selected according to the User chain of the currently finished scheduling node, and updating the count value of the node to be selected according to the preset counting rule and the constraint data corresponding to the node to be selected so as to generate a Next chain;
deleting the currently finished scheduling nodes and the data dependency relationship of the currently finished scheduling nodes in the current directed acyclic graph to obtain a new current directed acyclic graph;
repeating the steps on the new current directed acyclic graph based on the preset priority scheduling rule until all schedulable nodes in each directed acyclic graph are scheduled;
and the preset priority scheduling rule is used for representing the priority scheduling of the node to be selected with the largest count value in the Next chain.
17. The instruction scheduling device of claim 16, wherein in performing the priority scheduling, if the number of nodes to be selected in the Next chain with the largest count value is greater than 1, the second processing sub-module is further configured to:
and merging the User chain of the node to be selected to the Next chain to obtain the length of each merged chain, and performing the priority scheduling on the node to be selected corresponding to the minimum chain length.
18. The instruction scheduling apparatus of claim 16, wherein the instruction scheduling apparatus further comprises: a fifth processing module; the fifth processing module is configured to:
and determining the scheduling completion sequence of each schedulable node in each directed acyclic graph as the node sequence of each directed acyclic graph.
19. An electronic device, comprising:
a processor; and the number of the first and second groups,
a memory for storing a computer program for the processor;
wherein the processor is configured to perform the instruction scheduling method of any one of claims 1 to 9 via execution of the computer program.
20. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the instruction scheduling method of any one of claims 1 to 9.
21. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the instruction scheduling method of any one of claims 1 to 9.
CN202110650043.4A 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium Active CN113296788B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110650043.4A CN113296788B (en) 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110650043.4A CN113296788B (en) 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113296788A true CN113296788A (en) 2021-08-24
CN113296788B CN113296788B (en) 2024-04-12

Family

ID=77328054

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110650043.4A Active CN113296788B (en) 2021-06-10 2021-06-10 Instruction scheduling method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113296788B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5894576A (en) * 1996-11-12 1999-04-13 Intel Corporation Method and apparatus for instruction scheduling to reduce negative effects of compensation code
CN102830954A (en) * 2012-08-24 2012-12-19 北京中科信芯科技有限责任公司 Method and device for instruction scheduling
US20170161689A1 (en) * 2015-12-08 2017-06-08 TCL Research America Inc. Personalized func sequence scheduling method and system
CN105843660A (en) * 2016-03-21 2016-08-10 同济大学 Code optimization scheduling method for encoder
CN110333857A (en) * 2019-07-12 2019-10-15 辽宁工程技术大学 A kind of custom instruction automatic identifying method based on constraint planning

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024066875A1 (en) * 2022-09-29 2024-04-04 深圳市中兴微电子技术有限公司 Instruction-level parallel scheduling method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN113296788B (en) 2024-04-12

Similar Documents

Publication Publication Date Title
JP4957729B2 (en) Program parallelization method, program parallelization apparatus and program
JP4042604B2 (en) Program parallelization apparatus, program parallelization method, and program parallelization program
US6598221B1 (en) Assembly code performance evaluation apparatus and method
US8893080B2 (en) Parallelization of dataflow actors with local state
EP3066560B1 (en) A data processing apparatus and method for scheduling sets of threads on parallel processing lanes
CN110968321B (en) Tensor calculation code optimization method, device, equipment and medium
TW201923561A (en) Scheduling tasks in a multi-threaded processor
EP3908920B1 (en) Optimizing hardware fifo instructions
JP4965995B2 (en) Program processing method, processing program, and information processing apparatus
CN106462431B (en) The extraction system framework in higher synthesis
CN114510339B (en) Computing task scheduling method and device, electronic equipment and readable storage medium
WO2016105840A1 (en) Technologies for low-level composable high performance computing libraries
CN113296788B (en) Instruction scheduling method, device, equipment and storage medium
US10990073B2 (en) Program editing device, program editing method, and computer readable medium
EP4231138A1 (en) Method and apparatus for fixing weak memory ordering problem
US10089088B2 (en) Computer that performs compiling, compiler program, and link program
CN111930359A (en) System and method for algorithm development on heterogeneous embedded system
US20140013312A1 (en) Source level debugging apparatus and method for a reconfigurable processor
US9934035B2 (en) Device and method for tracing updated predicate values
JP5140105B2 (en) Instruction scheduling apparatus, instruction scheduling method, and instruction scheduling program
US20120226890A1 (en) Accelerator and data processing method
WO2018198745A1 (en) Calculation resource management device, calculation resource management method, and computer-readable recording medium
JP6473023B2 (en) Performance evaluation module and semiconductor integrated circuit incorporating the same
Xiao et al. Optimization on operation sorting for HLS scheduling algorithms
CN115658242B (en) Task processing method for logic system design and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant