WO2024066875A1 - Instruction-level parallel scheduling method, apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2024066875A1
Authority
WO
WIPO (PCT)
Prior art keywords
dag
operand
operands
relationship graph
instruction
Prior art date
Application number
PCT/CN2023/115697
Inventor
刘炜
Original Assignee
深圳市中兴微电子技术有限公司
Application filed by 深圳市中兴微电子技术有限公司
Publication of WO2024066875A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38: Concurrent instruction execution, e.g. pipeline or look ahead

Definitions

  • the present application relates to but is not limited to the field of processor technology.
  • the present application provides an instruction-level parallel scheduling method, which includes: obtaining a DAG (Directed Acyclic Graph) relationship graph corresponding to an instruction set, searching for intermediate operands corresponding to each operation instruction with a dependency relationship in the instruction set by traversing the DAG relationship graph; sorting each intermediate operand according to the ready order of each intermediate operand to obtain an intermediate operand list; generating an original DAG relationship graph corresponding to each intermediate operand, and inserting each intermediate operand into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performing parallel scheduling on the instructions in the instruction set based on the target DAG relationship graph.
  • The present application also provides an instruction-level parallel scheduling device, comprising: an operand search module, configured to obtain a DAG relationship graph corresponding to an instruction set, and to search for intermediate operands corresponding to each operation instruction with a dependency relationship in the instruction set by traversing the DAG relationship graph; an operand sorting module, configured to sort the intermediate operands according to the ready order of the intermediate operands to obtain an intermediate operand list; a dependency breaking module, configured to generate an original DAG relationship graph corresponding to each of the intermediate operands, and to insert each of the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and a parallel scheduling module, configured to perform parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
  • the present application also provides an electronic device, which is a physical device, and includes: a memory, a processor, and a program of the instruction-level parallel scheduling method stored in the memory and executable on the processor.
  • The present application also provides a computer-readable storage medium, on which is stored a program for implementing the instruction-level parallel scheduling method.
  • When the program of the instruction-level parallel scheduling method is executed by a processor, the steps of the instruction-level parallel scheduling method described herein are implemented.
  • FIG. 1 is a schematic flow chart of the instruction-level parallel scheduling method of the present application.
  • FIG. 2 is a schematic diagram of a DAG relationship graph in the instruction-level parallel scheduling method of the present application.
  • FIG. 3 is a schematic diagram of the original DAG relationship graph corresponding to the DAG relationship graph in the instruction-level parallel scheduling method of the present application.
  • FIG. 4 is a schematic diagram of inserting basic operations into the original DAG relationship graph in the instruction-level parallel scheduling method of the present application.
  • FIG. 5 is a schematic diagram of a target DAG relationship graph in the instruction-level parallel scheduling method of the present application.
  • FIG. 6 is a flow chart of the instruction-level parallel scheduling method of the present application.
  • FIG. 7 is a schematic diagram of the instruction-level parallel scheduling device of the present application.
  • FIG. 8 is a schematic diagram of the device structure of the hardware operating environment involved in the instruction-level parallel scheduling method of the present application.
  • the present application provides an instruction level parallel scheduling method.
  • The instruction-level parallel scheduling method may include steps S10 to S40.
  • In step S10, a DAG relationship graph corresponding to the instruction set is obtained, and the intermediate operands corresponding to each operation instruction having a dependency relationship in the instruction set are found by traversing the DAG relationship graph.
  • In step S20, the intermediate operands are sorted according to their ready order to obtain an intermediate operand list.
  • In step S30, an original DAG relationship graph corresponding to the intermediate operands is generated, and each of the intermediate operands is inserted into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph.
  • In step S40, the instructions in the instruction set are scheduled in parallel according to the target DAG relationship graph.
  • the instruction set includes at least one instruction
  • the DAG relationship graph is a relationship graph generated according to the instruction information corresponding to each instruction, and the DAG relationship graph contains information such as the dependency relationship and operation relationship between the operands corresponding to the instructions.
  • Figure 2 is a schematic diagram of the DAG relationship graph in the present application, in which a, b, c, d, t1, t2, t3, t4, t5, t6, t8, t9, t10 and t11 are operands corresponding to instructions; *, +, /, -, shl, and, and or are basic operations; and 1, 3 and 16 are ready cycle counts. Once the given number of cycles has elapsed, the operand associated with that ready cycle count is ready and the corresponding instruction can be issued.
  • the intermediate operands are operands corresponding to the operation instructions having a dependency relationship, and the intermediate operands are not result operands of the operation instructions having a dependency relationship.
  • the addition operations with dependency relationships in FIG. 2 are a+b, t1+t2, t9+t3, t10+t4, and the operands involved are a, b, t1, t2, t9, t3, t10, and t4, respectively.
  • the intermediate operands obtained are a, b, t1, t3, and t4.
  • steps S10 to S40 include: obtaining a DAG relationship graph generated according to the instruction information of each instruction in the instruction set; searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the identical operation instructions that have a dependency relationship in the DAG relationship graph; sorting the intermediate operands from earliest ready to latest ready according to the ready order of each intermediate operand in the DAG relationship graph to obtain an intermediate operand list; regenerating the DAG relationship graph according to the association relationship between the intermediate operands to obtain an original DAG relationship graph, and sequentially inserting each intermediate operand and the operation results between the intermediate operands into the original DAG relationship graph according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performing parallel scheduling on the instructions in the instruction set according to the depth corresponding to each instruction in the target DAG relationship graph, wherein, since the depth corresponding to each instruction in the target DAG relationship graph is consistent with the ready order of the operands in the target DAG relationship graph, there is no dependency relationship between the instructions, thereby achieving the purpose of breaking
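  • The depth-based scheduling in step S40 can be illustrated with a minimal sketch. It assumes each instruction is a hypothetical `(dest, src1, src2)` tuple and that producers are listed before their consumers; the function name `schedule_by_depth` and the sample DAG are illustrative and do not come from the application. Two instructions at the same depth cannot depend on each other, so each level is a candidate group for parallel issue.

```python
from collections import defaultdict

def schedule_by_depth(instrs):
    """Group instructions into issue levels by their depth in the DAG.

    instrs: list of (dest, src1, src2) tuples, listed so that every
    producer appears before its consumers (topological order).
    """
    depth = {}                    # operand name -> depth; graph inputs count as 0
    levels = defaultdict(list)
    for dest, src1, src2 in instrs:
        # An instruction sits one level below its deepest source operand.
        d = 1 + max(depth.get(src1, 0), depth.get(src2, 0))
        depth[dest] = d
        levels[d].append(dest)
    return [levels[d] for d in sorted(levels)]

# Hypothetical target DAG: s1 and s2 are independent, s3 needs both.
target = [("s1", "a", "b"), ("s2", "c", "d"), ("s3", "s1", "s2")]
levels = schedule_by_depth(target)
print(levels)
```

  • In this hypothetical input, `s1` and `s2` share a level and `s3` falls in the next one.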
  • The step of generating the original DAG relationship graph corresponding to each of the intermediate operands includes step A10 and/or step B10.
  • In step A10, operation instructions unrelated to the intermediate operands are deleted from the DAG relationship graph to obtain the original DAG relationship graph.
  • the original DAG relationship graph can be obtained by adjusting the DAG relationship graph corresponding to the instruction set.
  • step A10 includes: determining the operation instructions related to each intermediate operand in the DAG relationship graph; deleting the instructions except the operation instructions related to each intermediate operand in the DAG relationship graph to obtain the original DAG relationship graph.
  • In step B10, the original DAG relationship graph is generated according to the original operation instructions corresponding to each of the intermediate operands in the DAG relationship graph.
  • the original DAG relationship graph may be a DAG relationship graph regenerated based on instruction information corresponding to each intermediate operand.
  • step B10 includes: determining the original operation instructions of each intermediate operand in the DAG relationship graph, and generating an original DAG relationship graph based on instruction information corresponding to the original operation instructions.
  • Figure 3 is a schematic diagram of the original DAG relationship diagram corresponding to the DAG relationship diagram in the present application.
  • the original DAG relationship diagram can be adjusted on the basis of the DAG relationship diagram in Figure 2.
  • the addition operations with dependency relationships in Figure 2 are a+b, t1+t2, t9+t3, t10+t4, respectively, and the operands included are a, b, t1, t2, t9, t3, t10 and t4, respectively.
  • the intermediate operands obtained are a, b, t1, t3 and t4.
  • The intermediate operand list obtained by sorting the intermediate operands by ready order is a(0), b(0), t4(1), t1(3) and t3(16), where 0, 1, 3 and 16 are the ready cycle counts. Then, according to the intermediate operand list, the operation instructions not related to the intermediate operands are deleted from the DAG relationship graph in Figure 2 to obtain the original DAG relationship graph.
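  • Sorting by ready order (step S20) can be sketched as follows. The ready cycle counts are the ones quoted in the text for a, b, t1, t3 and t4; the dictionary layout and the name `sort_by_ready` are assumptions made for illustration only.

```python
# Ready cycle count per intermediate operand, as quoted in the text.
ready_cycles = {"a": 0, "b": 0, "t1": 3, "t3": 16, "t4": 1}

def sort_by_ready(ready):
    """Return (operand, ready_cycle) pairs, earliest-ready first."""
    # sorted() is stable, so operands with equal ready cycles keep input order.
    return sorted(ready.items(), key=lambda item: item[1])

operand_list = sort_by_ready(ready_cycles)
print(operand_list)
```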
  • the step of inserting each intermediate operand into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain the target DAG relationship graph includes steps S31 to S33.
  • In step S31, the first target operand and the second target operand, which are the two highest-ranked operands, are taken out of the intermediate operand list.
  • In step S32, the original DAG relationship graph and the intermediate operand list are updated according to the first target operand and the second target operand.
  • In step S33, execution returns to the step of taking out the two highest-ranked operands from the intermediate operand list, until all the operands in the intermediate operand list have been inserted into the original DAG relationship graph; the resulting original DAG relationship graph is used as the target DAG relationship graph.
  • steps S31 to S33 include: taking out the first target operand and the second target operand that rank highest in ready order in the intermediate operand list; constructing a basic operation between the first target operand and the second target operand according to the operation relationship between the operands in the DAG relationship graph; updating the original DAG relationship graph and the intermediate operand list by inserting the basic operation into the original DAG relationship graph and inserting the operation result of the basic operation into the intermediate operand list; and returning to the step of taking out the first target operand and the second target operand that rank highest in the intermediate operand list, until all the operands in the intermediate operand list have been inserted into the original DAG relationship graph, and using the original DAG relationship graph as the target DAG relationship graph.
  • the implementation method of the present application realizes that each intermediate operand and the operation result corresponding to each intermediate operand are inserted into the original DAG relationship graph in sequence according to the ready order of the operands in the intermediate operand list, which can ensure that there is no dependency between the operation instructions in the target DAG relationship graph.
  • the step of updating the original DAG relationship graph and the intermediate operand list according to the first target operand and the second target operand includes steps S321 to S323.
  • In step S321, a basic operation between the first target operand and the second target operand is constructed.
  • In step S322, the original DAG relationship graph is updated by inserting the basic operation into the original DAG relationship graph.
  • In step S323, the intermediate result operand corresponding to the basic operation is added to the intermediate operand list, and the intermediate operand list is updated by re-sorting it.
  • steps S321 to S323 include: constructing a basic operation between the first target operand and the second target operand according to the operation relationship between the first target operand and the second target operand in the DAG relationship graph, and updating the original DAG relationship graph by inserting the basic operation and the operation result of the basic operation into the original DAG relationship graph, wherein the operation result may be an intermediate result operand; adding the intermediate result operand corresponding to the basic operation to the intermediate operand list, and updating the intermediate operand list by re-sorting the intermediate operand list according to the ready order of each operand in the intermediate operand list.
  • FIG 4 is a schematic diagram of inserting basic operations into the original DAG relationship diagram in the present application.
  • the DAG relationship diagram in Figure 4 is a DAG relationship diagram obtained by inserting the basic operations between a and b on the basis of the original DAG relationship diagram in Figure 3, wherein a is the first target operand, b is the second target operand, s1 is the operation result corresponding to a and b, that is, the intermediate result operand, and accordingly, the updated intermediate operand list is s1(1), t4(1), t1(3), t3(16).
  • FIG. 5 is a schematic diagram of the target DAG relationship diagram in the present application. Based on the DAG relationship diagram in FIG. 4 , the intermediate operands in the intermediate operand list are sequentially inserted in a loop to obtain the target DAG relationship diagram in FIG. 5 .
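  • The loop of steps S31 to S33 (with S321 to S323 inside it) can be sketched as below. This is a minimal illustration, not the application's implementation. It assumes a latency of one cycle per basic operation (consistent with s1(1) being produced from a(0) and b(0)), generated names s1, s2, ..., a tie rule that places a new result ahead of existing operands with the same ready cycle (consistent with the list s1(1), t4(1), t1(3), t3(16)), and an associative and commutative operation such as integer addition, so that regrouping does not change the final value.

```python
def insert_by_ready_order(operands, latency=1):
    """Pair the two earliest-ready operands until one result remains.

    operands: list of (name, ready_cycle) pairs.
    Returns (ops, final), where ops is a list of
    (result, left, right, ready_cycle) tuples in insertion order.
    """
    work = sorted(operands, key=lambda p: p[1])   # intermediate operand list
    ops = []
    while len(work) > 1:
        (x, rx), (y, ry) = work[0], work[1]       # S31: take the first two
        work = work[2:]
        name = f"s{len(ops) + 1}"                 # hypothetical result name
        ready = max(rx, ry) + latency             # S321: build the basic operation
        ops.append((name, x, y, ready))           # S322: insert it into the graph
        # S323: put the result back, ahead of operands with the same ready cycle.
        idx = next((i for i, (_, r) in enumerate(work) if r >= ready), len(work))
        work.insert(idx, (name, ready))
    return ops, work[0]

ops, final = insert_by_ready_order(
    [("a", 0), ("b", 0), ("t1", 3), ("t3", 16), ("t4", 1)]
)
for op in ops:
    print(op)
```

  • Under these assumptions the late operand t3(16) is consumed last, so the final result is ready one operation after t3 rather than behind a further dependent addition.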
  • the step of searching for intermediate operands corresponding to each operation instruction having a dependency relationship in the instruction set by traversing the DAG relationship graph includes steps S11 to S13.
  • In step S11, by traversing the DAG relationship graph, the same-operation DAG trees whose operations depend on one another are searched for in the DAG relationship graph.
  • In step S12, the identical operations having dependency relationships are found by traversing the same-operation DAG trees.
  • In step S13, the operands corresponding to the identical operations are obtained, and the result operands are deleted from these operands to obtain the intermediate operands.
  • the DAG relationship graph may be composed of at least one DAG tree.
  • step S11 to step S13 include: by traversing the DAG relationship graph, searching in the DAG relationship graph for the same operation DAG tree corresponding to the same operation with mutual dependence; by traversing each of the same operation DAG trees, searching for each of the same operations with mutual dependence; extracting the operands corresponding to each of the same operations to obtain an operand list; deleting the operands that serve as result operands in the operand list to obtain each intermediate operand.
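  • Steps S11 to S13 can be sketched as follows, assuming each instruction is a hypothetical `(dest, op, src1, src2)` tuple. The destination names for the four additions (t2, t9, t10, t11) are inferred from the operand lists in the text and are an assumption, as is the unrelated subtraction used as filler; `intermediate_operands` is an illustrative name.

```python
def intermediate_operands(instrs, op):
    """Find the non-result operands of mutually dependent `op` instructions."""
    same = [i for i in instrs if i[1] == op]
    results = {dest for dest, _, _, _ in same}
    consumed = {s for _, _, a, b in same for s in (a, b)}
    # Keep an instruction only if it feeds, or is fed by, another `op` instruction.
    chain = [i for i in same
             if i[0] in consumed or i[2] in results or i[3] in results]
    chain_results = {i[0] for i in chain}
    found = []
    for _, _, a, b in chain:
        for s in (a, b):
            # Drop result operands; keep first-seen order without duplicates.
            if s not in chain_results and s not in found:
                found.append(s)
    return found

instrs = [
    ("t2", "+", "a", "b"),
    ("t9", "+", "t1", "t2"),
    ("t10", "+", "t9", "t3"),
    ("t11", "+", "t10", "t4"),
    ("t5", "-", "c", "d"),      # hypothetical unrelated instruction
]
print(intermediate_operands(instrs, "+"))
```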
  • The embodiment of the present application provides an instruction-level parallel scheduling method, that is: obtaining a DAG relationship graph corresponding to an instruction set, and searching for intermediate operands corresponding to each operation instruction with a dependency relationship in the instruction set by traversing the DAG relationship graph; sorting each intermediate operand according to the ready order of each intermediate operand to obtain an intermediate operand list; generating an original DAG relationship graph corresponding to each intermediate operand, and inserting each intermediate operand into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph, so that the target DAG relationship graph of the instruction set is constructed according to the ready order of the operands; and performing parallel scheduling on the instructions in the instruction set based on the target DAG relationship graph. In this way, instructions are scheduled in parallel according to the ready order of their operands, which eliminates the influence of dependencies between instructions on parallel scheduling, improves the issue efficiency of dependent instructions, and solves the technical problem of low issue efficiency caused by dependencies between instructions.
  • the step of sorting each of the intermediate operands according to the ready order of each of the intermediate operands to obtain a list of intermediate operands includes: step S21, determining the operand source type corresponding to each of the intermediate operands, wherein the operand source type includes at least one of a long pipeline type and a short pipeline type; step S22, sorting each of the intermediate operands according to the operand source type and the ready order of each of the intermediate operands to obtain a list of intermediate operands.
  • the number of ready cycles corresponding to multiple intermediate operands is often the same.
  • the operand source type corresponding to the intermediate operand can be considered as the basis for further sorting.
  • the actual number of ready cycles corresponding to the long pipeline type intermediate operands is often greater than the reference number of ready cycles given in the DAG relationship diagram.
  • The step of fine-tuning the sorting order of the preliminary intermediate operand list according to the operand source type to obtain the intermediate operand list includes: determining the target intermediate operands in the preliminary intermediate operand list that have the same ready order, and, if a long pipeline type is present among the operand source types corresponding to these target intermediate operands, moving the target intermediate operands of the long pipeline type back in the preliminary intermediate operand list, where the move-back distance is configurable, for example 1 arrangement position or 2 arrangement positions.
  • the step of fine-tuning the sorting order of the preliminary intermediate operand list according to the source type of each operand to obtain the intermediate operand list includes: determining each target intermediate operand belonging to the long pipeline type in the preliminary intermediate operand list according to the source type of each operand; shifting each target intermediate operand back by a preset number of arrangement positions in the preliminary intermediate operand list to obtain the intermediate operand list, wherein the preset number of arrangement positions can be set by oneself, such as 1 arrangement position or 2 arrangement positions, etc.
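  • Moving long-pipeline operands back by a preset number of positions can be sketched as below. Which operands come from a long pipeline is not stated in the text, so the set passed in here is purely hypothetical, as is the name `shift_long_pipeline`; the choice to clamp at the end of the list and to process operands from the back is also an assumption.

```python
def shift_long_pipeline(prelim, long_pipeline, shift=1):
    """Move each long-pipeline operand `shift` positions back in the list.

    prelim: preliminary intermediate operand list (names, earliest-ready first).
    long_pipeline: set of operand names whose source is a long pipeline.
    """
    result = list(prelim)
    # Walk from the back so that a moved operand does not disturb the
    # positions of operands that still have to be examined.
    for name in reversed(prelim):
        if name in long_pipeline:
            i = result.index(name)
            j = min(i + shift, len(result) - 1)   # clamp at the end of the list
            result.insert(j, result.pop(i))
    return result

# Hypothetical: treat b as coming from a long pipeline.
print(shift_long_pipeline(["a", "b", "t4", "t1", "t3"], {"b"}, shift=1))
```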
  • the step of sorting each of the intermediate operands according to the source type of each of the operands and the ready order of each of the intermediate operands to obtain a list of intermediate operands includes: step S221, determining the hardware resource limitation parameters by traversing the DAG relationship graph; step S222, sorting each of the intermediate operands according to the hardware resource limitation parameters, the source type of each of the operands and the ready order of each of the intermediate operands to obtain a list of intermediate operands.
  • the real number of ready cycles for the intermediate operands of the long pipeline type is often greater than the reference number of ready cycles given in the DAG relationship diagram.
  • the hardware resource limitation parameter is the maximum number of instructions issued by the device in one ready cycle. If the number of instructions that can be issued in parallel is greater than the hardware resource limitation parameter, some of the instructions that originally need to be issued in parallel will be changed to serial issuance, which will cause the real number of ready cycles to be greater than the reference number of ready cycles given in the DAG relationship diagram.
  • step S221 to step S222 include: determining the maximum number of instructions issued by the device in a ready cycle by traversing the DAG relationship graph, and obtaining a hardware resource limitation parameter; sorting the intermediate operands by ready order to obtain a preliminary intermediate operand list; according to the hardware resource limitation parameter, detecting whether there is a target list fragment in the preliminary intermediate operand list, wherein the target list fragment is a list fragment composed of intermediate operands with the same ready order, and the number of intermediate operands in the target list fragment is greater than the hardware resource limitation parameter; if there is no target list fragment, using the preliminary intermediate operand list as the intermediate operand list; if there is a target list fragment, then, according to the operand source type corresponding to the intermediate operands in the target list fragment, selectively moving back the arrangement position of the intermediate operands in the target list fragment in the preliminary intermediate operand list until the number of intermediate operands in the target list fragment is not greater than the hardware resource limitation parameter, and the execution step is returned
  • the implementation method of the present application takes into account the influence of the hardware resource limitation and the operand source type of the intermediate operand on the parallel issuance of instructions, which can further improve the accuracy of sorting the intermediate operands, so that the target DAG relationship diagram generated according to the intermediate operands is more in line with the actual application scenario, which can further improve the issuance efficiency of instructions.
  • the step of selectively shifting backward the arrangement position of the intermediate operands in the target list segment in the preliminary intermediate operand list according to the operand source type corresponding to the intermediate operands in the target list segment includes: detecting whether there are intermediate operands of long pipeline type in the target list segment according to the operand source type corresponding to the intermediate operands in the target list segment; if there are no intermediate operands of long pipeline type, randomly selecting intermediate operands in the target list segment to shift backward; if there are intermediate operands of long pipeline type, shifting backward the arrangement position of the intermediate operands of long pipeline type in the target list segment in the preliminary intermediate operand list; detecting whether the number of intermediate operands in the target list segment is greater than the hardware resource limitation parameter, and if so, returning to the execution step: detecting whether there are intermediate operands of long pipeline type in the target list segment according to the operand source type corresponding to the intermediate operands in the target list segment; if not, determining that the intermediate operands in the target
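  • The issue-width check can be sketched as follows. This is a simplified model under stated assumptions: "moving back" is represented by deferring an operand to the next ready cycle, the operand deferred when no long-pipeline operand is present is taken from the tail of the group rather than chosen at random, and the names `enforce_issue_width`, `width` and the sample ready cycles are hypothetical.

```python
def enforce_issue_width(operands, width, long_pipeline=frozenset()):
    """Limit each same-ready group to `width` operands.

    operands: list of (name, ready_cycle) pairs.
    Returns the adjusted list, earliest-ready first.
    """
    if width < 1:
        raise ValueError("width must be at least 1")
    pending = sorted(operands, key=lambda p: p[1])
    result = []
    while pending:
        cycle = pending[0][1]
        group = [p for p in pending if p[1] == cycle]
        rest = [p for p in pending if p[1] != cycle]
        # Stable sort: short-pipeline operands first, long-pipeline ones last,
        # so long-pipeline operands are the first to be deferred.
        group.sort(key=lambda p: p[0] in long_pipeline)
        keep, defer = group[:width], group[width:]
        result.extend(keep)
        deferred = [(name, cycle + 1) for name, _ in defer]
        pending = sorted(deferred + rest, key=lambda p: p[1])
    return result

# Hypothetical: three operands ready at cycle 0, at most two issued per cycle.
print(enforce_issue_width([("a", 0), ("b", 0), ("c", 0), ("t4", 1)],
                          width=2, long_pipeline={"a"}))
```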
  • An embodiment of the present application provides a method for sorting intermediate operands, that is, determining the operand source type corresponding to each intermediate operand, wherein the operand source type includes at least one of a long pipeline type and a short pipeline type; sorting each intermediate operand according to the operand source type and the ready order of each intermediate operand to obtain an intermediate operand list.
  • the embodiment of the present application not only sorts the intermediate operands according to the reference ready order in the DAG relationship diagram, but also considers the influence of whether the intermediate operand belongs to the long pipeline type on the real ready order.
  • The manner of sorting the intermediate operands in the embodiment of the present application is more in line with the actual application scenario when instructions are issued in parallel, which can improve the accuracy of sorting the intermediate operands, so that the process of issuing instructions in parallel based on the target DAG relationship graph constructed from the intermediate operand list is more in line with the actual application scenario, which can further improve the efficiency of parallel issuance of instructions.
  • An embodiment of the present application also provides an instruction-level parallel scheduling device, which includes: an operand search module 10, configured to obtain a DAG relationship graph corresponding to an instruction set, and to search for intermediate operands corresponding to each operation instruction with a dependency relationship in the instruction set by traversing the DAG relationship graph; an operand sorting module 20, configured to sort each intermediate operand according to the ready order of each intermediate operand to obtain an intermediate operand list; a dependency breaking module 30, configured to generate an original DAG relationship graph corresponding to each intermediate operand, and to insert each intermediate operand into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and a parallel scheduling module 40, configured to perform parallel scheduling of instructions in the instruction set based on the target DAG relationship graph.
  • the dependency breaking module 30 is further configured to: take out the first target operand and the second target operand that are ranked higher in the intermediate operand list; update the original DAG relationship graph and the intermediate operand list based on the first target operand and the second target operand; return to the execution step: take out the first target operand and the second target operand that are ranked higher in the intermediate operand list until all the operands in the intermediate operand list have been inserted into the original DAG relationship graph, and use the original DAG relationship graph as the target DAG relationship graph.
  • the dependency breaking module 30 is further configured to: construct a basic operation between the first target operand and the second target operand; update the original DAG relationship graph by inserting the basic operation into the original DAG relationship graph; add the intermediate result operand corresponding to the basic operation to the intermediate operand list, and update the intermediate operand list by reordering the intermediate operand list.
  • The operand search module 10 is further configured to: search, by traversing the DAG relationship graph, for the same-operation DAG trees with mutual dependencies in the DAG relationship graph; find the identical operations with dependencies by traversing the same-operation DAG trees; and obtain the operands corresponding to the identical operations and delete the result operands from these operands to obtain the intermediate operands.
  • The operand sorting module 20 is further configured to: determine the operand source type corresponding to each of the intermediate operands, wherein the operand source type includes at least one of a long pipeline type and a short pipeline type; and sort each of the intermediate operands according to the operand source type and the ready order of each of the intermediate operands to obtain an intermediate operand list.
  • the operand sorting module 20 is further configured to: determine the hardware resource limitation parameters by traversing the DAG relationship graph; sort each of the intermediate operands according to the hardware resource limitation parameters, the source type of each of the operands and the ready order of each of the intermediate operands to obtain an intermediate operand list.
  • the dependency breaking module 30 is further configured to: delete operation instructions that are not related to each of the intermediate operands in the DAG relationship graph to obtain an original DAG relationship graph; and/or generate an original DAG relationship graph based on the original operation instructions corresponding to each of the intermediate operands in the DAG relationship graph.
  • The instruction-level parallel scheduling device provided by the present application adopts the instruction-level parallel scheduling method of the above implementation, which solves the technical problem of low issue efficiency caused by dependencies between instructions.
  • the beneficial effects of the instruction-level parallel scheduling device provided by the implementation mode of the present application are the same as the beneficial effects of the instruction-level parallel scheduling method provided by the above-mentioned implementation mode, and the other technical features in the instruction-level parallel scheduling device are the same as the features disclosed in the above-mentioned implementation mode method, which will not be repeated here.
  • An embodiment of the present application provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the instruction-level parallel scheduling method in any of the above embodiments.
  • Referring to FIG. 8, a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure is shown.
  • the electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (such as vehicle-mounted navigation terminals), and fixed terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG8 is only an example and should not bring any limitation to the functions and scope of use of the embodiments of the present disclosure.
  • electronic equipment may include a processing device (e.g., a central processing unit, a graphics processing unit, etc.), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage device into a random access memory (RAM).
  • In the RAM, various programs and data required for the operation of the electronic equipment are also stored.
  • the processing device, ROM, and RAM are connected to each other via a bus.
  • An input/output (I/O) interface is also connected to the bus.
  • the following systems can be connected to the I/O interface: input devices including, for example, a touch screen, a touchpad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, a gyroscope, etc.; output devices including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; storage devices including, for example, a magnetic tape, a hard disk, etc.; and communication devices.
  • the communication device can allow the electronic device to communicate with other devices wirelessly or by wire to exchange data.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a computer-readable medium, and the computer program contains a program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from a network through a communication device, or installed from a storage device, or installed from a ROM.
  • When the computer program is executed by a processing device, the above-mentioned functions defined in the method of the embodiment of the present disclosure are executed.
  • the electronic device provided by the present application adopts the instruction-level parallel scheduling method of the above implementation, which solves the technical problem of low instruction issue efficiency caused by dependencies between instructions.
  • the beneficial effects of the electronic device provided by the implementation of the present application are the same as the beneficial effects of the instruction-level parallel scheduling method provided by the above-mentioned implementation, and the other technical features in the electronic device are the same as the features disclosed in the above-mentioned implementation method, which will not be repeated here.
  • This embodiment provides a computer-readable storage medium having computer-readable program instructions stored thereon, and the computer-readable program instructions are used to execute the instruction-level parallel scheduling method in the above-mentioned embodiment 1.
  • the computer-readable storage medium provided in the embodiment of the present application can be, for example, a USB flash drive, but is not limited to an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media can include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • the computer-readable storage medium can be any tangible medium containing or storing a program, which can be used by, or in combination with, an instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium can be transmitted with any appropriate medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable storage medium may be included in the electronic device, or may exist independently without being installed in the electronic device.
  • the computer-readable storage medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: obtains a DAG relationship graph corresponding to an instruction set, and searches, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; sorts the intermediate operands according to their ready order to obtain an intermediate operand list; generates an original DAG relationship graph jointly corresponding to the intermediate operands, and inserts the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performs parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages, or a combination thereof, including object-oriented programming languages, such as Java, Smalltalk, C++, and conventional procedural programming languages, such as "C" or similar programming languages.
  • the program code may be executed entirely on the user's computer, partially on the user's computer, as a separate software package, partially on the user's computer and partially on a remote computer, or entirely on a remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • each block in the flow chart or block diagram can represent a module, a program segment, or a part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the blocks can also occur in a sequence different from that marked in the accompanying drawings. For example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each block in the block diagram and/or flow chart, and any combination of blocks in the block diagram and/or flow chart, can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.
  • the modules involved in the embodiments described in the present disclosure may be implemented by software or hardware, wherein the name of the module does not limit the unit itself in some cases.
  • the computer-readable storage medium provided by the present application stores computer-readable program instructions for executing the above-mentioned instruction-level parallel scheduling method, which solves the technical problem of low instruction issue efficiency caused by dependencies between instructions.
  • the beneficial effects of the computer-readable storage medium provided in the embodiment of the present application are the same as the beneficial effects of the instruction-level parallel scheduling method provided in the above embodiment, which will not be described in detail here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

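The present application provides an instruction-level parallel scheduling method and apparatus, an electronic device, and a storage medium. The instruction-level parallel scheduling method includes: obtaining a DAG relationship graph corresponding to an instruction set, and searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; sorting the intermediate operands according to their ready order to obtain an intermediate operand list; generating an original DAG relationship graph jointly corresponding to the intermediate operands, and inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performing parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.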

Description

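Instruction-level parallel scheduling method and apparatus, electronic device, and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202211204579.4, filed with the China Patent Office on September 29, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to, but is not limited to, the field of processor technology.
Background
On platforms that support instruction-level parallelism, multiple instructions can often be issued in each cycle, that is, multiple instructions are executed at the same time. However, current instruction-level parallelism is based on parallelizing instructions that have no dependencies on one another. Instructions that do depend on one another have an output dependency relationship and generally cannot be issued in parallel, which reduces instruction issue efficiency.
Summary
The present application provides an instruction-level parallel scheduling method, which includes: obtaining a DAG (Directed Acyclic Graph) relationship graph corresponding to an instruction set, and searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; sorting the intermediate operands according to their ready order to obtain an intermediate operand list; generating an original DAG relationship graph jointly corresponding to the intermediate operands, and inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performing parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
The present application further provides an instruction-level parallel scheduling apparatus, which includes: an operand search module configured to obtain a DAG relationship graph corresponding to an instruction set and to search, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; an operand sorting module configured to sort the intermediate operands according to their ready order to obtain an intermediate operand list; a dependency breaking module configured to generate an original DAG relationship graph jointly corresponding to the intermediate operands and to insert the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and a parallel scheduling module configured to perform parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
The present application further provides an electronic device, which is a physical device and includes: a memory, a processor, and a program of the instruction-level parallel scheduling method that is stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the instruction-level parallel scheduling method described herein.
The present application further provides a computer-readable storage medium on which a program implementing the instruction-level parallel scheduling method is stored, wherein the program, when executed by a processor, implements the steps of the instruction-level parallel scheduling method described herein.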
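Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the instruction-level parallel scheduling method of the present application;
FIG. 2 is a schematic diagram of a DAG relationship graph in the instruction-level parallel scheduling method of the present application;
FIG. 3 is a schematic diagram of the original DAG relationship graph corresponding to the DAG relationship graph in the instruction-level parallel scheduling method of the present application;
FIG. 4 is a schematic diagram of inserting a basic operation into the original DAG relationship graph in the instruction-level parallel scheduling method of the present application;
FIG. 5 is a schematic diagram of the target DAG relationship graph in the instruction-level parallel scheduling method of the present application;
FIG. 6 is a schematic flowchart of the instruction-level parallel scheduling method of the present application;
FIG. 7 is a schematic diagram of the instruction-level parallel scheduling apparatus of the present application;
FIG. 8 is a schematic structural diagram of the device in the hardware operating environment involved in the instruction-level parallel scheduling method of the present application.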
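Detailed Description
To make the above objects, features, and advantages of the present application more apparent and easier to understand, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
Referring to FIG. 1, the present application provides an instruction-level parallel scheduling method. In one embodiment of the present application, the instruction-level parallel scheduling method may include steps S10 to S40.
In step S10, a DAG relationship graph corresponding to an instruction set is obtained, and the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships are found by traversing the DAG relationship graph.
In step S20, the intermediate operands are sorted according to their ready order to obtain an intermediate operand list.
In step S30, an original DAG relationship graph jointly corresponding to the intermediate operands is generated, and the intermediate operands are inserted into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph.
In step S40, the instructions in the instruction set are scheduled in parallel according to the target DAG relationship graph.
It should be noted that, in this embodiment, the instruction set includes at least one instruction, and the DAG relationship graph is a relationship graph generated from the instruction information of each instruction. The DAG relationship graph contains information such as the dependency relationships and the operation relationships between the operands of the instructions.
As an example, refer to FIG. 2, which is a schematic diagram of a DAG relationship graph in the present application. Here, a, b, c, d, t1, t2, t3, t4, t5, t6, t8, t9, t10, and t11 are operands of instructions; *, +, /, -, shl, and, and or are basic operations; and 1, 3, and 16 are ready cycle counts. That is, once the given number of cycles has elapsed, the operand associated with that ready cycle count is ready and the corresponding instruction can be issued.
An intermediate operand is an operand of the operation instructions that have dependency relationships, and is not a result operand of those instructions, that is, not an operation result. As an example, referring further to FIG. 2, the dependent addition operations in FIG. 2 are a+b, t1+t2, t9+t3, and t10+t4, whose operands are a, b, t1, t2, t9, t3, t10, and t4. Deleting the result operands leaves the intermediate operands a, b, t1, t3, and t4.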
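The filtering in the FIG. 2 example can be illustrated with a minimal Python sketch. The tuple encoding, the function name, and the result names assigned to the four additions (`t2`, `t9`, `t10`, and in particular `t11` for the last one) are assumptions made for illustration; the text itself only lists the operands.

```python
# Hypothetical encoding of the chained additions from FIG. 2:
# each tuple is (result, left_operand, right_operand).
ADD_CHAIN = [
    ("t2", "a", "b"),
    ("t9", "t1", "t2"),
    ("t10", "t9", "t3"),
    ("t11", "t10", "t4"),  # result name assumed
]


def intermediate_operands(chain):
    """Return the operands of a dependent chain that are not results of the chain."""
    results = {result for result, _, _ in chain}
    found = []
    for _, left, right in chain:
        for operand in (left, right):
            # Keep first-seen order and drop anything produced inside the chain.
            if operand not in results and operand not in found:
                found.append(operand)
    return found


print(intermediate_operands(ADD_CHAIN))
```

Under these assumptions the sketch returns the same five operands named in the text: a, b, t1, t3, and t4.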
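As an example, steps S10 to S40 include: obtaining a DAG relationship graph generated from the instruction information of each instruction in the instruction set; searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the identical operation instructions that have dependency relationships in the DAG relationship graph; sorting the intermediate operands from earliest ready to latest ready according to their ready order in the DAG relationship graph to obtain an intermediate operand list; regenerating a DAG relationship graph according to the association relationships between the intermediate operands to obtain an original DAG relationship graph, and inserting the intermediate operands and the operation results between the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performing parallel scheduling on the instructions in the instruction set according to the depth of each instruction in the target DAG relationship graph. Because the depth of each instruction in the target DAG relationship graph matches the ready order of the operands in the target DAG relationship graph, and the instructions do not depend on one another, the dependencies between instructions are broken.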
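Scheduling "according to the depth of each instruction" can be sketched as grouping instructions into depth levels, where instructions on the same level have no dependency on one another. The following Python sketch is one possible reading, not the patented implementation; the tuple format, the function name, and the three-instruction example are hypothetical.

```python
from collections import defaultdict


def schedule_by_depth(instructions):
    """instructions: list of (result, left, right). Returns {depth: [results]}."""
    sources = {result: (left, right) for result, left, right in instructions}
    depth = {}

    def depth_of(name):
        if name not in sources:
            # An input operand that no instruction in the graph produces.
            return 0
        if name not in depth:
            depth[name] = 1 + max(depth_of(operand) for operand in sources[name])
        return depth[name]

    levels = defaultdict(list)
    for result, _, _ in instructions:
        levels[depth_of(result)].append(result)
    # Instructions that share a depth can be issued in the same step.
    return dict(sorted(levels.items()))


# Hypothetical target graph: s1 and s2 are independent, s3 needs both.
plan = schedule_by_depth([("s3", "s1", "s2"), ("s1", "a", "b"), ("s2", "c", "d")])
print(plan)
```

In this example `s1` and `s2` land on depth 1 and can be issued together, while `s3` waits on depth 2.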
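The step of generating the original DAG relationship graph jointly corresponding to the intermediate operands includes step A10 and/or step B10.
In step A10, the operation instructions unrelated to the intermediate operands are deleted from the DAG relationship graph to obtain the original DAG relationship graph.
As an example, it should be noted that the original DAG relationship graph can be obtained by adjusting the DAG relationship graph corresponding to the instruction set.
As an example, step A10 includes: determining, in the DAG relationship graph, the operation instructions related to the intermediate operands; and deleting from the DAG relationship graph all instructions other than the operation instructions related to the intermediate operands to obtain the original DAG relationship graph.
In step B10, the original DAG relationship graph is generated according to the original operation instructions corresponding to the intermediate operands in the DAG relationship graph.
As an example, the original DAG relationship graph may be a DAG relationship graph regenerated from the instruction information corresponding to the intermediate operands.
As an example, step B10 includes: determining the original operation instructions of the intermediate operands in the DAG relationship graph, and generating the original DAG relationship graph based on the instruction information corresponding to those original operation instructions.
As an example, refer to FIG. 3, which is a schematic diagram of the original DAG relationship graph corresponding to the DAG relationship graph in the present application. The original DAG relationship graph can be obtained by adjusting the DAG relationship graph in FIG. 2. The dependent addition operations in FIG. 2 are a+b, t1+t2, t9+t3, and t10+t4, whose operands are a, b, t1, t2, t9, t3, t10, and t4. Deleting the result operands leaves the intermediate operands a, b, t1, t3, and t4. Sorting the intermediate operands yields the intermediate operand list a(0), b(0), t1(3), t3(16), and t4(1), where 0, 1, 3, and 16 are ready cycle counts. The operation instructions unrelated to the intermediate operands are then deleted from the DAG relationship graph in FIG. 2 according to the intermediate operand list, which yields the original DAG relationship graph.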
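Ordering the operands of this example by ready cycle can be sketched in Python as follows. The dictionary layout and function name are hypothetical; the ready cycle counts are the ones given in the text. Note that a strict earliest-first sort places t4(1) ahead of t1(3), which agrees with the updated list s1(1), t4(1), t1(3), t3(16) quoted for FIG. 4.

```python
# Ready cycle counts taken from the FIG. 3 example; the dict layout is illustrative.
READY_CYCLE = {"a": 0, "b": 0, "t1": 3, "t3": 16, "t4": 1}


def sort_by_ready(ready_cycle):
    """Return (operand, ready_cycle) pairs ordered from earliest to latest ready."""
    # sorted() is stable, so operands with equal ready cycles keep their input order.
    return sorted(ready_cycle.items(), key=lambda item: item[1])


operand_list = sort_by_ready(READY_CYCLE)
print(operand_list)
```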
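The step of inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain the target DAG relationship graph includes steps S31 to S33.
In step S31, the first target operand and the second target operand that rank first in the intermediate operand list are taken out.
In step S32, the original DAG relationship graph and the intermediate operand list are updated according to the first target operand and the second target operand.
In step S33, the process returns to the step of taking out the first target operand and the second target operand that rank first in the intermediate operand list, until all operands in the intermediate operand list have been inserted into the original DAG relationship graph, and the original DAG relationship graph is then taken as the target DAG relationship graph.
As an example, steps S31 to S33 include: taking out of the intermediate operand list the first target operand and the second target operand that rank first in ready order; constructing a basic operation between the first target operand and the second target operand according to the operation relationships between the operands in the DAG relationship graph; updating the original DAG relationship graph and the intermediate operand list by inserting the basic operation into the original DAG relationship graph and inserting the result of the basic operation into the intermediate operand list; and returning to the step of taking out the first target operand and the second target operand that rank first in the intermediate operand list, until all operands in the intermediate operand list have been inserted into the original DAG relationship graph, and taking the original DAG relationship graph as the target DAG relationship graph. In this way, the embodiment of the present application repeatedly inserts the intermediate operands and their operation results into the original DAG relationship graph in the ready order of the operands in the intermediate operand list, which ensures that the operation instructions in the target DAG relationship graph do not depend on one another.
The step of updating the original DAG relationship graph and the intermediate operand list according to the first target operand and the second target operand includes steps S321 to S323.
In step S321, a basic operation between the first target operand and the second target operand is constructed.
In step S322, the original DAG relationship graph is updated by inserting the basic operation into the original DAG relationship graph.
In step S323, the intermediate result operand corresponding to the basic operation is added to the intermediate operand list, and the intermediate operand list is updated by re-sorting it.
As an example, steps S321 to S323 include: constructing a basic operation between the first target operand and the second target operand according to their operation relationship in the DAG relationship graph; updating the original DAG relationship graph by inserting the basic operation and its operation result into the original DAG relationship graph, where the operation result may be an intermediate result operand; and adding the intermediate result operand corresponding to the basic operation to the intermediate operand list, and updating the intermediate operand list by re-sorting it according to the ready order of the operands in the list.
As an example, refer to FIG. 4, which is a schematic diagram of inserting a basic operation into the original DAG relationship graph in the present application. The DAG relationship graph in FIG. 4 is obtained by inserting the basic operation between a and b into the original DAG relationship graph in FIG. 3, where a is the first target operand, b is the second target operand, and s1 is the operation result jointly corresponding to a and b, that is, the intermediate result operand. Accordingly, the updated intermediate operand list is s1(1), t4(1), t1(3), t3(16).
As an example, referring further to FIG. 5, which is a schematic diagram of the target DAG relationship graph in the present application, the target DAG relationship graph in FIG. 5 is obtained by repeatedly inserting the intermediate operands in the intermediate operand list, in order, on the basis of the DAG relationship graph in FIG. 4.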
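The loop of steps S31 to S33 can be sketched with a priority queue. This is a minimal illustration, not the patented implementation: it assumes every basic operation is an addition with a latency of one cycle, that a result becomes ready at the later of its two operands' ready cycles plus that latency, and that ties are broken by insertion order. The function name and the `s1`, `s2`, ... result names are hypothetical, apart from `s1`, which the text uses for a+b.

```python
import heapq

ADD_LATENCY = 1  # assumed latency of one addition, in cycles


def rebuild_chain(ready_cycle, latency=ADD_LATENCY):
    """Greedily pair the two earliest-ready operands until one result remains."""
    # Heap entries are (ready cycle, insertion index, name); the index breaks ties.
    heap = [(cycle, index, name)
            for index, (name, cycle) in enumerate(ready_cycle.items())]
    heapq.heapify(heap)
    next_index = len(heap)
    operations = []
    while len(heap) > 1:
        first_ready, _, first = heapq.heappop(heap)
        second_ready, _, second = heapq.heappop(heap)
        result = f"s{len(operations) + 1}"
        result_ready = max(first_ready, second_ready) + latency
        operations.append((result, first, second, result_ready))
        # Pushing the result back corresponds to re-sorting the operand list.
        heapq.heappush(heap, (result_ready, next_index, result))
        next_index += 1
    return operations


ops = rebuild_chain({"a": 0, "b": 0, "t1": 3, "t3": 16, "t4": 1})
for result, left, right, ready in ops:
    print(f"{result} = {left} + {right}  (ready at cycle {ready})")
```

With the ready cycles from the example, the first step pairs a and b into s1 at cycle 1, matching FIG. 4, and the last result is ready at cycle 17. Under the same assumed one-cycle latency, the original serial chain a+b, t1+t2, t9+t3, t10+t4 would finish at cycle 18, because the late operand t3 is consumed before t4.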
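The step of searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships includes steps S11 to S13.
In step S11, same-operation DAG trees with mutual dependencies are found in the DAG relationship graph by traversing the DAG relationship graph.
In step S12, the identical operations that have dependency relationships are found by traversing each same-operation DAG tree.
In step S13, the operands corresponding to the identical operations are obtained, and the result operands are deleted from those operands to obtain the intermediate operands.
It should be noted that, in this embodiment, the DAG relationship graph may consist of at least one DAG tree.
As an example, steps S11 to S13 include: searching, by traversing the DAG relationship graph, for the same-operation DAG trees corresponding to identical operations with mutual dependencies in the DAG relationship graph; finding, by traversing each same-operation DAG tree, the identical operations that depend on one another; extracting the operands corresponding to those identical operations to obtain an operand list; and deleting from the operand list the operands that are result operands to obtain the intermediate operands.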
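One way to picture step S11 is to group instructions that use the same operation and feed one another. The Python sketch below does this with a small union-find; it is an illustrative assumption about how such trees could be located, and the instruction tuples (including the multiplication that produces `t1`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical instruction tuples: (result, operation, left_operand, right_operand).
INSTRUCTIONS = [
    ("t1", "*", "c", "d"),
    ("t2", "+", "a", "b"),
    ("t9", "+", "t1", "t2"),
    ("t10", "+", "t9", "t3"),
    ("t11", "+", "t10", "t4"),
]


def same_operation_trees(instructions):
    """Group instructions that share an operation and depend on one another."""
    producer = {result: (op, index)
                for index, (result, op, _, _) in enumerate(instructions)}
    parent = list(range(len(instructions)))

    def find(index):
        # Union-find lookup with path halving.
        while parent[index] != index:
            parent[index] = parent[parent[index]]
            index = parent[index]
        return index

    for index, (_, op, left, right) in enumerate(instructions):
        for operand in (left, right):
            # Link only when the operand is produced by the same kind of operation.
            if operand in producer and producer[operand][0] == op:
                parent[find(index)] = find(producer[operand][1])

    groups = defaultdict(list)
    for index, instruction in enumerate(instructions):
        groups[find(index)].append(instruction)
    # A tree needs at least two dependent instructions of the same operation.
    return [group for group in groups.values() if len(group) > 1]


trees = same_operation_trees(INSTRUCTIONS)
print(trees)
```

Here the four additions form one tree, while the multiplication is left out because it does not share an operation with its consumer.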
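The embodiment of the present application provides an instruction-level parallel scheduling method, namely: obtaining a DAG relationship graph corresponding to an instruction set, and searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; sorting the intermediate operands according to their ready order to obtain an intermediate operand list; and generating an original DAG relationship graph jointly corresponding to the intermediate operands, and inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph. The target DAG relationship graph of the instruction set is thus built according to the ready order of the operands. Performing parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph makes it possible to schedule the instructions in parallel in the ready order of their operands, which removes the effect of inter-instruction dependencies on parallel scheduling. This improves the issue efficiency of dependent instructions and solves the technical problem of low issue efficiency caused by dependencies between instructions.
Further, referring to FIG. 6, in another embodiment of the present application, content that is the same as or similar to the above embodiment can be found in the description above and is not repeated below. The step of sorting the intermediate operands according to their ready order to obtain an intermediate operand list includes: step S21, determining the operand source type corresponding to each intermediate operand, where the operand source type includes at least one of a long pipeline type and a short pipeline type; and step S22, sorting the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain the intermediate operand list.
It should be noted that, in this embodiment, when the intermediate operands are sorted, several intermediate operands often have the same ready cycle count. In that case, the operand source type of each intermediate operand can be used as a further sorting criterion. For a long-pipeline intermediate operand, factors such as system latency and system resource limits mean that its actual ready cycle count is often greater than the reference ready cycle count given in the DAG relationship graph.
As an example, steps S21 to S22 include: determining the operand source type corresponding to each intermediate operand, where the operand source type includes at least one of a long pipeline type and a short pipeline type; sorting the intermediate operands according to their ready order to obtain a preliminary intermediate operand list; and fine-tuning the order of the preliminary intermediate operand list according to the operand source types to obtain the intermediate operand list.
As an example, the step of fine-tuning the order of the preliminary intermediate operand list according to the operand source types to obtain the intermediate operand list includes: determining the target intermediate operands that have the same ready order in the preliminary intermediate operand list; and, if any of those target intermediate operands is of the long pipeline type, moving the long-pipeline target intermediate operands backward in the preliminary intermediate operand list, where the distance moved can be set freely, for example one position or two positions.
As an example, the step of fine-tuning the order of the preliminary intermediate operand list according to the operand source types to obtain the intermediate operand list includes: determining, according to the operand source types, the target intermediate operands of the long pipeline type in the preliminary intermediate operand list; and moving each target intermediate operand backward in the preliminary intermediate operand list by a preset number of positions to obtain the intermediate operand list, where the preset number of positions can be set freely, for example one position or two positions.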
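The first fine-tuning variant, in which long-pipeline operands are moved behind short-pipeline operands that share the same ready cycle, can be sketched as a two-part sort key. This is one possible choice under stated assumptions: the `'long'` and `'short'` labels, the tuple layout, and the function name are hypothetical, and the sketch always moves a long-pipeline operand to the end of its tie group rather than by a configurable number of positions.

```python
def sort_with_source_type(operands):
    """operands: list of (name, ready_cycle, source_type), source_type 'long' or 'short'."""
    # Tuples compare element by element: ready cycle first,
    # then False (short pipeline) before True (long pipeline).
    return sorted(operands, key=lambda item: (item[1], item[2] == "long"))


# Hypothetical operands: x and y are both ready at cycle 1, but x is long-pipeline.
ordered = sort_with_source_type([
    ("x", 1, "long"),
    ("y", 1, "short"),
    ("z", 0, "short"),
])
print([name for name, _, _ in ordered])
```

In this example `z` comes first because it is ready earliest, and `y` is placed ahead of `x` because `x` is the long-pipeline operand within the tie.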
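The step of sorting the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain the intermediate operand list includes: step S221, determining a hardware resource limitation parameter by traversing the DAG relationship graph; and step S222, sorting the intermediate operands according to the hardware resource limitation parameter, the operand source types, and the ready order of the intermediate operands to obtain the intermediate operand list.
It should be noted that, in this embodiment, because the hardware resources of the device are limited, the actual ready cycle count of a long-pipeline intermediate operand is often greater than the reference ready cycle count given in the DAG relationship graph. The hardware resource limitation parameter is the maximum number of instructions the device can issue in one ready cycle. If the number of instructions that could currently be issued in parallel exceeds the hardware resource limitation parameter, some of the instructions that were meant to be issued in parallel are issued serially instead, which makes the actual ready cycle count greater than the reference ready cycle count given in the DAG relationship graph.
As an example, steps S221 to S222 include: determining, by traversing the DAG relationship graph, the maximum number of instructions the device can issue in one ready cycle to obtain the hardware resource limitation parameter; sorting the intermediate operands according to their ready order to obtain a preliminary intermediate operand list; detecting, according to the hardware resource limitation parameter, whether a target list fragment exists in the preliminary intermediate operand list, where a target list fragment is a list fragment made up of intermediate operands with the same ready order and containing more intermediate operands than the hardware resource limitation parameter; if no target list fragment exists, taking the preliminary intermediate operand list as the intermediate operand list; and, if a target list fragment exists, selectively moving the intermediate operands of the target list fragment backward in the preliminary intermediate operand list according to their operand source types until the number of intermediate operands in the target list fragment no longer exceeds the hardware resource limitation parameter, and returning to the step of detecting, according to the hardware resource limitation parameter, whether a target list fragment exists in the preliminary intermediate operand list, until no target list fragment is detected in the preliminary intermediate operand list. When sorting the intermediate operands, the embodiment of the present application takes into account the effect of both the hardware resource limit and the operand source types on parallel instruction issue. This further improves the accuracy of the sorting, makes the target DAG relationship graph generated from the intermediate operands better match real application scenarios, and can further improve instruction issue efficiency.
As an example, the step of selectively moving the intermediate operands of the target list fragment backward in the preliminary intermediate operand list according to their operand source types includes: detecting, according to the operand source types of the intermediate operands in the target list fragment, whether the target list fragment contains a long-pipeline intermediate operand; if it does not, randomly selecting intermediate operands in the target list fragment to move backward, and if it does, moving the long-pipeline intermediate operands of the target list fragment backward in the preliminary intermediate operand list; detecting whether the number of intermediate operands in the target list fragment is greater than the hardware resource limitation parameter, and if so, returning to the step of detecting, according to the operand source types of the intermediate operands in the target list fragment, whether the target list fragment contains a long-pipeline intermediate operand; and, if not, determining that the backward movement of the intermediate operands in the target list fragment is complete.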
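The interaction between the issue limit and the source type can be sketched as follows. This Python sketch is a simplification under stated assumptions: it models "moving backward" as deferring an operand to the next cycle, it defers long-pipeline operands first, and where the text selects operands at random it instead defers the ones listed last so that the result is reproducible. The function name, tuple layout, and sample data are hypothetical.

```python
from collections import defaultdict


def apply_issue_limit(operands, max_issue):
    """operands: list of (name, ready_cycle, source_type).

    Returns (name, effective_cycle) pairs in which no cycle holds more than
    max_issue operands.
    """
    if max_issue < 1:
        raise ValueError("max_issue must be at least 1")
    pending = defaultdict(list)
    for name, cycle, source in operands:
        pending[cycle].append((name, source))
    ordered = []
    while pending:
        cycle = min(pending)
        group = pending.pop(cycle)
        # Stable sort: short-pipeline operands stay in front, so any overflow
        # is taken from the long-pipeline operands first.
        group.sort(key=lambda item: item[1] == "long")
        keep, overflow = group[:max_issue], group[max_issue:]
        ordered.extend((name, cycle) for name, _ in keep)
        if overflow:
            # Deferred operands compete again in the following cycle.
            pending[cycle + 1].extend(overflow)
    return ordered


limited = apply_issue_limit(
    [("p", 1, "short"), ("q", 1, "long"), ("r", 1, "short"), ("u", 2, "short")],
    max_issue=2,
)
print(limited)
```

With an assumed limit of two operands per cycle, the long-pipeline operand `q` is pushed from cycle 1 to cycle 2, where it is placed behind the short-pipeline operand `u`.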
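The embodiment of the present application provides a method for sorting intermediate operands, namely: determining the operand source type corresponding to each intermediate operand, where the operand source type includes at least one of a long pipeline type and a short pipeline type; and sorting the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain an intermediate operand list. When sorting the intermediate operands, the embodiment of the present application not only sorts by the reference ready order of the intermediate operands in the DAG relationship graph but also considers how an intermediate operand's being of the long pipeline type affects its actual ready order. This way of sorting better matches the real conditions of parallel instruction issue, so it improves the accuracy of the sorting. As a result, parallel instruction issue based on the target DAG relationship graph built from the intermediate operand list better matches real application scenarios, which can further improve the efficiency of parallel instruction issue.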
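To achieve the above object, referring to FIG. 7, the present application further provides an instruction-level parallel scheduling apparatus, which includes: an operand search module 10 configured to obtain a DAG relationship graph corresponding to an instruction set and to search, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; an operand sorting module 20 configured to sort the intermediate operands according to their ready order to obtain an intermediate operand list; a dependency breaking module 30 configured to generate an original DAG relationship graph jointly corresponding to the intermediate operands and to insert the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and a parallel scheduling module 40 configured to perform parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
Optionally, the dependency breaking module 30 is further configured to: take out the first target operand and the second target operand that rank first in the intermediate operand list; update the original DAG relationship graph and the intermediate operand list according to the first target operand and the second target operand; and return to the step of taking out the first target operand and the second target operand that rank first in the intermediate operand list, until all operands in the intermediate operand list have been inserted into the original DAG relationship graph, and take the original DAG relationship graph as the target DAG relationship graph.
Optionally, the dependency breaking module 30 is further configured to: construct a basic operation between the first target operand and the second target operand; update the original DAG relationship graph by inserting the basic operation into the original DAG relationship graph; and add the intermediate result operand corresponding to the basic operation to the intermediate operand list, and update the intermediate operand list by re-sorting it.
Optionally, the operand search module 10 is further configured to: search, by traversing the DAG relationship graph, for same-operation DAG trees with mutual dependencies in the DAG relationship graph; find, by traversing each same-operation DAG tree, the identical operations that have dependency relationships; and obtain the operands corresponding to the identical operations and delete the result operands from those operands to obtain the intermediate operands.
Optionally, the operand sorting module 20 is further configured to: determine the operand source type corresponding to each intermediate operand, where the operand source type includes at least one of a long pipeline type and a short pipeline type; and sort the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain the intermediate operand list.
Optionally, the operand sorting module 20 is further configured to: determine a hardware resource limitation parameter by traversing the DAG relationship graph; and sort the intermediate operands according to the hardware resource limitation parameter, the operand source types, and the ready order of the intermediate operands to obtain the intermediate operand list.
Optionally, the dependency breaking module 30 is further configured to: delete from the DAG relationship graph the operation instructions unrelated to the intermediate operands to obtain the original DAG relationship graph; and/or generate the original DAG relationship graph according to the original operation instructions corresponding to the intermediate operands in the DAG relationship graph.
The instruction-level parallel scheduling apparatus provided by the present application adopts the instruction-level parallel scheduling method of the above embodiments and solves the technical problem of low issue efficiency caused by dependencies between instructions. Compared with the prior art, the beneficial effects of the instruction-level parallel scheduling apparatus provided by the embodiment of the present application are the same as those of the instruction-level parallel scheduling method provided by the above embodiments, and the other technical features of the apparatus are the same as the features disclosed in the method of the above embodiments, which are not repeated here.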
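An embodiment of the present application provides an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor, where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the instruction-level parallel scheduling method of any of the above embodiments.
Reference is now made to FIG. 8, which shows a schematic structural diagram of an electronic device suitable for implementing the embodiments of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (for example, vehicle-mounted navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 8 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 8, the electronic device may include a processing device (for example, a central processing unit or a graphics processing unit), which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) or a program loaded from a storage device into a random access memory (RAM). The RAM also stores various programs and data required for the operation of the electronic device. The processing device, the ROM, and the RAM are connected to one another via a bus. An input/output (I/O) interface is also connected to the bus.
Generally, the following systems can be connected to the I/O interface: input devices including, for example, a touch screen, a touchpad, a keyboard, a mouse, an image sensor, a microphone, an accelerometer, and a gyroscope; output devices including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; storage devices including, for example, a magnetic tape and a hard disk; and a communication device. The communication device can allow the electronic device to communicate with other devices wirelessly or by wire to exchange data. Although the figure shows an electronic device with various systems, it should be understood that it is not required to implement or have all of the systems shown. More or fewer systems may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the process described above with reference to the flowchart can be implemented as a computer software program. For example, an embodiment of the present disclosure includes a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication device, installed from the storage device, or installed from the ROM. When the computer program is executed by the processing device, the above functions defined in the method of the embodiments of the present disclosure are performed.
The electronic device provided by the present application adopts the instruction-level parallel scheduling method of the above embodiments and solves the technical problem of low issue efficiency caused by dependencies between instructions. Compared with the prior art, the beneficial effects of the electronic device provided by the embodiment of the present application are the same as those of the instruction-level parallel scheduling method provided by the above embodiments, and the other technical features of the electronic device are the same as the features disclosed in the method of the above embodiments, which are not repeated here.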
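It should be understood that the parts of the present disclosure can be implemented in hardware, software, firmware, or a combination thereof. In the description of the above embodiments, specific features, structures, materials, or characteristics can be combined in a suitable manner in any one or more embodiments or examples.
The above is only a specific embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
This embodiment provides a computer-readable storage medium having computer-readable program instructions stored thereon, the computer-readable program instructions being used to perform the instruction-level parallel scheduling method of the first embodiment above.
The computer-readable storage medium provided by the embodiment of the present application may be, for example, a USB flash drive, but is not limited to an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this embodiment, the computer-readable storage medium may be any tangible medium that contains or stores a program, which can be used by, or in combination with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium can be transmitted over any appropriate medium, including but not limited to: wire, optical cable, RF (radio frequency), or any suitable combination of the above.
The computer-readable storage medium may be included in the electronic device, or it may exist separately without being installed in the electronic device.
The computer-readable storage medium carries one or more programs. When the one or more programs are executed by the electronic device, the electronic device: obtains a DAG relationship graph corresponding to an instruction set, and searches, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships; sorts the intermediate operands according to their ready order to obtain an intermediate operand list; generates an original DAG relationship graph jointly corresponding to the intermediate operands, and inserts the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and performs parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a part of code, which contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and any combination of blocks in the block diagrams and/or flowcharts, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
The modules involved in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the unit itself.
The computer-readable storage medium provided by the present application stores computer-readable program instructions for performing the above instruction-level parallel scheduling method and solves the technical problem of low issue efficiency caused by dependencies between instructions. Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by the embodiment of the present application are the same as those of the instruction-level parallel scheduling method provided by the above embodiments, which are not repeated here.
The above are only preferred embodiments of the present application and do not limit the patent scope of the present application. Any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.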

Claims (10)

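  1. An instruction-level parallel scheduling method, comprising:
    obtaining a directed acyclic graph (DAG) relationship graph corresponding to an instruction set, and searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships;
    sorting the intermediate operands according to their ready order to obtain an intermediate operand list;
    generating an original DAG relationship graph jointly corresponding to the intermediate operands, and inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and
    performing parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
  2. The instruction-level parallel scheduling method according to claim 1, wherein the step of inserting the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain the target DAG relationship graph comprises:
    taking out the first target operand and the second target operand that rank first in the intermediate operand list;
    updating the original DAG relationship graph and the intermediate operand list according to the first target operand and the second target operand; and
    returning to the step of taking out the first target operand and the second target operand that rank first in the intermediate operand list, until all operands in the intermediate operand list have been inserted into the original DAG relationship graph, and taking the original DAG relationship graph as the target DAG relationship graph.
  3. The instruction-level parallel scheduling method according to claim 2, wherein the step of updating the original DAG relationship graph and the intermediate operand list according to the first target operand and the second target operand comprises:
    constructing a basic operation between the first target operand and the second target operand;
    updating the original DAG relationship graph by inserting the basic operation into the original DAG relationship graph; and
    adding the intermediate result operand corresponding to the basic operation to the intermediate operand list, and updating the intermediate operand list by re-sorting the intermediate operand list.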
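  4. The instruction-level parallel scheduling method according to claim 1, wherein the step of searching, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships comprises:
    searching, by traversing the DAG relationship graph, for same-operation DAG trees with mutual dependencies in the DAG relationship graph;
    finding, by traversing each same-operation DAG tree, the identical operations that have dependency relationships; and
    obtaining the operands corresponding to the identical operations, and deleting the result operands from those operands to obtain the intermediate operands.
  5. The instruction-level parallel scheduling method according to claim 1, wherein the step of sorting the intermediate operands according to their ready order to obtain the intermediate operand list comprises:
    determining the operand source type corresponding to each intermediate operand, wherein the operand source type includes at least one of a long pipeline type and a short pipeline type; and
    sorting the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain the intermediate operand list.
  6. The instruction-level parallel scheduling method according to claim 5, wherein the step of sorting the intermediate operands according to the operand source types and the ready order of the intermediate operands to obtain the intermediate operand list comprises:
    determining a hardware resource limitation parameter by traversing the DAG relationship graph; and
    sorting the intermediate operands according to the hardware resource limitation parameter, the operand source types, and the ready order of the intermediate operands to obtain the intermediate operand list.
  7. The instruction-level parallel scheduling method according to claim 1, wherein the step of generating the original DAG relationship graph jointly corresponding to the intermediate operands comprises:
    deleting from the DAG relationship graph the operation instructions unrelated to the intermediate operands to obtain the original DAG relationship graph; and/or
    generating the original DAG relationship graph according to the original operation instructions corresponding to the intermediate operands in the DAG relationship graph.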
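  8. An instruction-level parallel scheduling apparatus, comprising:
    an operand search module configured to obtain a DAG relationship graph corresponding to an instruction set and to search, by traversing the DAG relationship graph, for the intermediate operands corresponding to the operation instructions in the instruction set that have dependency relationships;
    an operand sorting module configured to sort the intermediate operands according to their ready order to obtain an intermediate operand list;
    a dependency breaking module configured to generate an original DAG relationship graph jointly corresponding to the intermediate operands and to insert the intermediate operands into the original DAG relationship graph in sequence according to the arrangement priority of the intermediate operand list to obtain a target DAG relationship graph; and
    a parallel scheduling module configured to perform parallel scheduling on the instructions in the instruction set according to the target DAG relationship graph.
  9. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the instruction-level parallel scheduling method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, wherein a program implementing an instruction-level parallel scheduling method is stored on the computer-readable storage medium, and the program implementing the instruction-level parallel scheduling method is executed by a processor to implement the steps of the instruction-level parallel scheduling method according to any one of claims 1 to 7.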
PCT/CN2023/115697 2022-09-29 2023-08-30 Instruction-level parallel scheduling method and apparatus, electronic device, and storage medium WO2024066875A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211204579.4A CN117827287A (zh) 2022-09-29 2022-09-29 Instruction-level parallel scheduling method and apparatus, electronic device, and storage medium
CN202211204579.4 2022-09-29

Publications (1)

Publication Number Publication Date
WO2024066875A1 (zh) 2024-04-04

Family

ID=90475958

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/115697 WO2024066875A1 (zh) 2022-09-29 2023-08-30 Instruction-level parallel scheduling method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN117827287A (zh)
WO (1) WO2024066875A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140082330A1 (en) * 2012-09-14 2014-03-20 Qualcomm Innovation Center, Inc. Enhanced instruction scheduling during compilation of high level source code for improved executable code
CN104424026A (zh) * 2013-08-21 2015-03-18 华为技术有限公司 Instruction scheduling method and apparatus
CN113296788A (zh) * 2021-06-10 2021-08-24 上海东软载波微电子有限公司 Instruction scheduling method, apparatus, device, storage medium, and program product


Also Published As

Publication number Publication date
CN117827287A (zh) 2024-04-05


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23870116

Country of ref document: EP

Kind code of ref document: A1