WO2021239056A1 - Method for calculating data dependencies in a program and computer-readable storage medium - Google Patents

Method for calculating data dependencies in a program and computer-readable storage medium

Info

Publication number
WO2021239056A1
Authority
WO
WIPO (PCT)
Prior art keywords
fixed value
information
memory descriptor
instruction
memory
Application number
PCT/CN2021/096378
Other languages
English (en)
French (fr)
Inventor
刘艳娜
苏振宇
Original Assignee
中科寒武纪科技股份有限公司
Application filed by 中科寒武纪科技股份有限公司
Priority to US18/000,120 (US20230195599A1)
Publication of WO2021239056A1 (Chinese)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/43 Checking; Contextual analysis
    • G06F8/433 Dependency analysis; Data or control flow analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/36 Preventing errors by testing or debugging software
    • G06F11/362 Software debugging
    • G06F11/3636 Software debugging by tracing the execution of the program
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/20 Software design
    • G06F8/22 Procedural
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding
    • G06F8/443 Optimisation

Definitions

  • The present disclosure generally relates to the field of computers. More specifically, it relates to a method for calculating data dependencies in a program and to a computer-readable storage medium.
  • SSA: static single assignment.
  • IR: intermediate representation.
  • The solution of the present disclosure provides a method for calculating data dependencies in a program and a computer-readable storage medium.
  • In one aspect, the present disclosure discloses a method for calculating data dependencies in a program, the method comprising: constructing the program into a control flow graph in static single assignment form, the control flow graph including a plurality of nodes; determining the initial information of the memory descriptors of the plurality of nodes, the initial information being the fixed value information and kill set information of all memory descriptors of the node's directly dominating node; and iterating over the control flow graph, each iteration performing the following steps: traversing the plurality of nodes, calculating and updating the fixed value information and kill set information of the memory descriptors of each node, and updating the data dependencies of the instructions.
  • The data dependencies of the instructions serve as reference information for instruction compilation and optimization.
  • In another aspect, the present disclosure discloses a computer-readable storage medium storing computer program code for calculating data dependencies in a program; when the computer program code is run by a processor, the foregoing method is executed.
  • With the present disclosure, a compiler analyzes the memory interval whose value each instruction (or intermediate code) fixes and the memory interval it uses, calculates the data dependencies between instructions, and establishes fixed value-use chains and use-fixed value chains for instructions that have a fixed value-use relationship. This simplifies the data dependencies between memory accesses and speeds up the calculation of image, voice, and text data.
  • Figure 1A shows an exemplary source code and its control flow graph.
  • Figure 1B shows the equivalent static single assignment form and its control flow graph.
  • Figure 2 shows an exemplary control flow graph.
  • FIG. 3 shows a flowchart of an embodiment of the present disclosure.
  • FIG. 4 shows a flowchart of another embodiment of the present disclosure.
  • FIG. 5 shows a flowchart of another embodiment of the present disclosure.
  • FIG. 6 shows a flowchart of another embodiment of the present disclosure.
  • FIG. 7 shows a flowchart of a detailed implementation of another embodiment of the present disclosure.
  • FIG. 8 shows a flowchart of another embodiment of the present disclosure.
  • FIG. 9 is a structural diagram of an integrated circuit device according to an embodiment of the present disclosure.
  • FIG. 10 is a structural diagram of a board card according to an embodiment of the present disclosure.
  • The present disclosure operates on a control flow graph in static single assignment form.
  • A compiler analyzes the memory interval whose value each instruction fixes (writes) and the memory interval it uses (reads), and calculates the data dependencies between instructions. For instructions with a fixed value-use relationship, a fixed value-use chain and a use-fixed value chain are established.
  • The control flow graph is an abstract data structure used in compilers. It represents all paths that a program may traverse during execution and shows, in the form of a flowchart, the possible flow between all basic blocks (BB) of a procedure.
  • A control flow graph consists of nodes and the edges between them.
  • Each node is a basic block, i.e., a maximal sequence of statements in the program that execute in order.
  • Each basic block has exactly one entry and one exit: execution starts at its entry and ends at its exit.
  • A basic block has the property that once its first instruction is executed, all of its instructions are executed in order until the last one has been executed.
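To make the basic-block definition concrete, here is a minimal Python sketch of the classic leader-based partitioning: the first instruction, every jump target, and every instruction following a jump start a new block. It assumes a hypothetical toy instruction format of (opcode, jump target) pairs; the opcodes br and jmp and the function name split_basic_blocks are illustrative, not part of the disclosure.

```python
from typing import List, Optional, Tuple

# Hypothetical toy IR: (opcode, jump target index or None).
# "br" is a conditional branch, "jmp" an unconditional jump.
Instr = Tuple[str, Optional[int]]


def split_basic_blocks(code: List[Instr]) -> List[List[int]]:
    """Partition a linear instruction list into basic blocks (lists of indices)."""
    leaders = {0} if code else set()
    for i, (op, target) in enumerate(code):
        if op in ("br", "jmp"):
            if target is not None:
                leaders.add(target)   # a jump target starts a new block
            if i + 1 < len(code):
                leaders.add(i + 1)    # so does the instruction after a jump
    starts = sorted(leaders)
    blocks = []
    for k, start in enumerate(starts):
        end = starts[k + 1] if k + 1 < len(starts) else len(code)
        blocks.append(list(range(start, end)))
    return blocks


# Roughly the program of Figure 1A:
# x = input; if x != 42 goto 4; y = 1; goto 5; y = x + 2; print y
program = [("read", None), ("br", 4), ("mov", None), ("jmp", 5), ("add", None), ("print", None)]
print(split_basic_blocks(program))  # [[0, 1], [2, 3], [4], [5]]
```

Each resulting block has one entry (its leader) and one exit (its last instruction), matching the definition above.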
  • The main purpose of the static single assignment form is to rename variables so that each variable is assigned only once, so as to achieve better computing efficiency.
  • For example, consider the following three lines of code: y = 1; y = 2; x = y.
  • The first line assigns 1 to y.
  • The second line assigns 2 to y.
  • The third line assigns the value of y to x. Because y is assigned again in the second line, overwriting the assignment in the first line, and the third line uses that second value, the first line is unnecessary.
  • In static single assignment form, the code above is converted into the following intermediate code: y1 = 1; y2 = 2; x1 = y2. Since y1 is never used, it is evident that its assignment is unnecessary.
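The renaming that turns the three-line example into static single assignment form can be sketched in Python as follows. This is a minimal illustration for straight-line code only (no branches, hence no PHI functions); the statement encoding and the helper names to_ssa and unused_definitions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Straight-line assignments: (target, operands); str operands are variable names.
Stmt = Tuple[str, List[object]]


def to_ssa(stmts: List[Stmt]) -> List[Stmt]:
    """Rename straight-line code so that every variable is assigned exactly once."""
    version: Dict[str, int] = {}
    out: List[Stmt] = []
    for target, operands in stmts:
        # A use refers to the latest version; names never assigned (inputs) stay as-is.
        renamed = [f"{v}{version[v]}" if isinstance(v, str) and v in version else v
                   for v in operands]
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", renamed))
    return out


def unused_definitions(ssa: List[Stmt], live_out: Set[str]) -> List[str]:
    """SSA names that are never read and are not needed after the code."""
    used = {op for _, ops in ssa for op in ops if isinstance(op, str)} | live_out
    return [t for t, _ in ssa if t not in used]


ssa = to_ssa([("y", [1]), ("y", [2]), ("x", ["y"])])
print(ssa)                               # [('y1', [1]), ('y2', [2]), ('x1', ['y2'])]
print(unused_definitions(ssa, {"x1"}))   # ['y1']
```

Uses are renamed before the target's version is bumped, so a statement such as y = y + 1 correctly reads the previous version.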
  • Figure 1A shows an exemplary source code and its control flow graph.
  • The input value is assigned to variable x, and the program then tests whether x equals 42. If it does, y is assigned 1; otherwise, y is assigned x + 2. Finally, the value of y is printed.
  • Figure 1B shows the equivalent static single assignment form and its control flow graph. Variable y is renamed y1 and y2 in the two branches, and the static single assignment form uses a φ (PHI) function to merge values where control flow paths join.
  • A characteristic of the static single assignment form is that each source variable is split according to its live ranges (from a fixed value to its uses), producing new variables with numeric suffixes, each of which is fixed only once; for example, y above becomes y1 and y2. A PHI function is placed where control flow paths join to combine the values arriving along different paths.
  • A PHI function represents a parallel operation that selects one of its inputs according to the path actually taken at run time.
  • The static single assignment form simplifies two kinds of links: the use-fixed value chain (use-def chain) and the fixed value-use chain (def-use chain).
  • A fixed value-use chain is a data structure containing a fixed value (definition) of a variable and the set of all its uses.
  • A use-fixed value chain contains a use of a variable and the set of all fixed values that may reach it. Because a variable outside static single assignment form may be fixed more than once, these chains are very complicated for each variable.
  • In static single assignment form, no two variables share a name and each variable is fixed only once, so every use of a given name belongs to that name's single fixed value-use chain, and each use is reached by exactly one fixed value, which makes the use-fixed value chain one-to-one. Both kinds of chains are therefore simplified under static single assignment form.
  • The static single assignment form brings the following benefits: for scalars, each variable has a unique fixed value, so data flow analysis and optimization algorithms become simpler; the memory occupied by use-fixed value chains is reduced from exponential to linear growth; the relationship between uses and fixed values is more intuitive; and several unrelated uses of the same variable in the source program become uses of different variables in static single assignment form, so unnecessary redundant instructions can be avoided.
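Because each name has exactly one fixed value under static single assignment form, both chains can be built in one forward pass. The sketch below assumes straight-line SSA code in a hypothetical (target, operands) encoding and only illustrates the idea; it is not the disclosure's data structure, and build_chains is an illustrative name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Stmt = Tuple[str, List[object]]  # (SSA target, operands); str operands are SSA names


def build_chains(ssa: List[Stmt]):
    """Return (fixed value-use chains, use-fixed value chains) for straight-line SSA code."""
    def_site: Dict[str, int] = {}
    def_use: Dict[str, List[int]] = defaultdict(list)  # name -> statements using it
    use_def: Dict[Tuple[int, str], int] = {}           # (statement, name) -> fixed value site
    for i, (target, operands) in enumerate(ssa):
        for op in operands:
            if isinstance(op, str) and op in def_site:
                def_use[op].append(i)
                use_def[(i, op)] = def_site[op]        # SSA: exactly one fixed value
        assert target not in def_site, f"{target} assigned twice: not SSA"
        def_site[target] = i
    return dict(def_use), use_def


code = [("y1", [1]), ("y2", [2]), ("x1", ["y2"]), ("z1", ["y2", "x1"])]
du, ud = build_chains(code)
print(du)  # {'y2': [2, 3], 'x1': [3]}
print(ud)  # {(2, 'y2'): 1, (3, 'y2'): 1, (3, 'x1'): 2}
```

Each use maps to a single site, which is the one-to-one use-fixed value chain described above; y1 has no chain entry at all because it is never used.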
  • Each variable can be described by memory descriptors.
  • A memory descriptor (MD) describes a memory interval and carries three pieces of information: variable name, offset, and size. For example, MD1(var:A,offset:0,size:128) indicates that memory descriptor No. 1 holds data of variable A stored in the 128 consecutive addresses [0:127] starting at offset 0.
  • If both the start and the end of a memory descriptor's interval are known, the memory descriptor is "exact"; otherwise, it is inexact. For example, MD3[0,127] is exact, whereas MD4[32,X] is inexact (X is unknown).
  • Kill set information: the kill set of a memory descriptor records the intervals that have been killed (overwritten) by other instructions; the fixed value of the memory descriptor is no longer valid for these intervals.
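A memory descriptor and its exactness test might be modeled as below. This is a sketch under stated assumptions: an unknown bound is represented by None, descriptors of different variables are assumed never to alias, overlap is assumed conservatively when a bound is unknown, and the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MemoryDescriptor:
    name: str
    var: str
    offset: Optional[int]  # None if the start address is unknown
    size: Optional[int]    # None if the length is unknown

    @property
    def exact(self) -> bool:
        return self.offset is not None and self.size is not None

    def interval(self) -> Tuple[int, int]:
        """Closed interval [offset, offset + size - 1]; only defined if exact."""
        if not self.exact:
            raise ValueError(f"{self.name} is not exact")
        return (self.offset, self.offset + self.size - 1)

    def may_overlap(self, other: "MemoryDescriptor") -> bool:
        if self.var != other.var:
            return False          # assumption: different variables never alias
        if not (self.exact and other.exact):
            return True           # unknown bounds: conservatively assume overlap
        (a_lo, a_hi), (b_lo, b_hi) = self.interval(), other.interval()
        return a_lo <= b_hi and b_lo <= a_hi


md1 = MemoryDescriptor("MD1", "A", 0, 128)
md4 = MemoryDescriptor("MD4", "A", 32, None)   # like MD4[32,X]: end unknown
print(md1.interval(), md1.exact, md4.exact)     # (0, 127) True False
print(md1.may_overlap(md4))                     # True
```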
  • FIG. 2 shows an exemplary control flow graph used to illustrate the following node relationships.
  • Taking node H as an example: the path starts at node A and passes through node E, after which it branches to either node F or node G; whichever branch is taken, control then passes through node H.
  • A dominating node (dominator) of a basic block is a basic block that must be passed through on every path from the start of the control flow graph to that block. To reach node H, control must pass through node A, node E, and node H itself, so nodes A, E, and H are all dominators of node H.
  • Strictly dominating nodes are the dominators other than the basic block itself. Again taking node H as an example, nodes A and E are the strict dominators of node H.
  • The directly dominating node (immediate dominator) is the strict dominator closest to a given basic block; node E is the directly dominating node of node H.
  • The dominance frontier is where the dominance of a node ends and control flow from other paths may enter.
  • Intuitively, the dominance frontier is the boundary of the region dominated by a node (the boundary itself is not dominated by it).
  • For example, the dominance frontier of node E consists of node D (the immediate successor of node F) and node M (the immediate successor of node H).
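Dominators, immediate dominators and dominance frontiers can be computed with the simple iterative formulation Dom(n) = {n} ∪ ⋂ Dom(p) over the predecessors p of n, which is swapped in here for the faster Lengauer-Tarjan algorithm mentioned in step 302. Because the graph of FIG. 2 cannot be recovered from the text, this sketch uses the diamond of Figure 1 (a hypothetical block named entry branching to A and B, which join at C) and assumes every node is reachable from the entry.

```python
from typing import Dict, List, Optional, Set, Tuple


def dominance_info(succ: Dict[str, List[str]], entry: str
                   ) -> Tuple[Dict[str, Optional[str]], Dict[str, Set[str]]]:
    """Return (immediate dominator, dominance frontier) for every node."""
    nodes = list(succ)
    pred: Dict[str, List[str]] = {n: [] for n in nodes}
    for n, targets in succ.items():
        for s in targets:
            pred[s].append(n)

    # Iterative dominator sets: Dom(n) = {n} ∪ intersection of Dom(p) over predecessors p.
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            if n == entry or not pred[n]:
                continue
            new = {n} | set.intersection(*(dom[p] for p in pred[n]))
            if new != dom[n]:
                dom[n], changed = new, True

    # The dominators of n form a chain, so the closest strict one has the largest Dom set.
    idom: Dict[str, Optional[str]] = {entry: None}
    for n in nodes:
        if n != entry:
            strict = dom[n] - {n}
            idom[n] = max(strict, key=lambda d: len(dom[d])) if strict else None

    # Dominance frontier: walk up from each predecessor of a join node to its idom.
    df: Dict[str, Set[str]] = {n: set() for n in nodes}
    for b in nodes:
        if len(pred[b]) >= 2:
            for p in pred[b]:
                runner = p
                while runner is not None and runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return idom, df


diamond = {"entry": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
idom, df = dominance_info(diamond, "entry")
print(idom)  # {'entry': None, 'A': 'entry', 'B': 'entry', 'C': 'entry'}
print(df)    # {'entry': set(), 'A': {'C'}, 'B': {'C'}, 'C': set()}
```

The frontier walk follows Cooper, Harvey and Kennedy: C is where the dominance of A (and of B) ends, so C is in both frontiers.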
  • An embodiment of the present disclosure is a method for calculating data dependencies in a program represented as a control flow graph.
  • The program may be used, for example, to perform inference on image, voice, and text data in a neural network, or for other computations; no specific limitation is imposed here.
  • The memory range of a single tensor may be fixed (written) by multiple instructions.
  • The method of this embodiment analyzes the data dependencies between instructions according to their memory descriptors.
  • FIG. 3 shows a flowchart of this embodiment.
  • In step 301, the program is constructed into a control flow graph in static single assignment form.
  • To construct the static single assignment form of a program, the simplest approach is first to replace each assigned variable with a new variable name, as is done for basic blocks A and B of the source program in Figure 1A.
  • That is, variable y is renamed y1 in basic block A and y2 in basic block B of Figure 1B.
  • However, which version of y is used in basic block C of Figure 1A cannot be determined, because both predecessor basic blocks (basic block A and basic block B) fix the value of y, and at compile time it is impossible to know from which predecessor control reaches basic block C. When the control flow graph contains such a join, this step adds a PHI function. As shown in Figure 1B, a PHI function instruction is inserted at the beginning of basic block C, defining a new variable y3 from the versions of y. The PHI function selects the corresponding version according to the path the program actually takes, that is, it uses either y1 or y2.
  • The dominance frontier is used to determine precisely which basic blocks need PHI functions. If a node fixes a variable, that fixed value reaches every node the node dominates, and a PHI function must be added in the basic blocks where this dominance ends, i.e., in the node's dominance frontier.
  • The PHI function thus becomes the unique fixed value of variable y3.
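Given the dominance frontiers, PHI placement for one variable is commonly done with the iterated dominance frontier worklist of Cytron et al.; the sketch below shows that standard technique, which may differ in detail from the disclosure. The frontier table is written out by hand for the Figure 1 diamond, and the block names (including entry) and the function name place_phis are hypothetical.

```python
from typing import Dict, Iterable, Set


def place_phis(df: Dict[str, Set[str]], def_blocks: Iterable[str]) -> Set[str]:
    """Blocks needing a PHI function for one variable (iterated dominance frontier)."""
    has_phi: Set[str] = set()
    work = list(def_blocks)
    queued = set(work)
    while work:
        block = work.pop()
        for join in df.get(block, set()):
            if join not in has_phi:
                has_phi.add(join)
                # The PHI is itself a new fixed value of the variable, so its
                # own dominance frontier may need PHI functions as well.
                if join not in queued:
                    queued.add(join)
                    work.append(join)
    return has_phi


# Dominance frontiers of the Figure 1 diamond (entry -> A | B -> C); y is fixed in A and B.
df = {"entry": set(), "A": {"C"}, "B": {"C"}, "C": set()}
print(place_phis(df, ["A", "B"]))  # {'C'}
```

The result places the single PHI function at the beginning of block C, which is where y3 is defined in Figure 1B.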
  • In step 302, the directly dominating nodes of all nodes are identified.
  • The classic Lengauer-Tarjan algorithm or other algorithms can be used to identify directly dominating nodes. Taking the control flow graph of FIG. 2 as an example, this embodiment identifies node A, node B, node E, and node I as directly dominating nodes.
  • In step 303, the initial information of all nodes is set; the initial information of a node is the fixed value information and kill set information of all memory descriptors of its directly dominating node. Because the variables of a directly dominating node are used directly by the basic blocks it directly dominates, this embodiment takes the fixed values and kill sets of the variables in the directly dominating node as the initial information of those basic blocks. Since variables are described by memory descriptors, setting the initial information of a node amounts to setting, for each variable in the node, the initial fixed value information and kill set information of the corresponding memory descriptors.
  • The kill set information is an ordered list of intervals, generally sorted by address from low to high, which facilitates intersection and union operations.
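Keeping kill sets as address-sorted, merged interval lists turns union, intersection and coverage checks into linear sweeps. A minimal sketch, assuming closed integer intervals [lo, hi] as in the MD[lo, hi] notation used here; the helper names normalize, union, intersect and covers are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed address interval [lo, hi]


def normalize(ivs: List[Interval]) -> List[Interval]:
    """Sort by address and merge overlapping or adjacent intervals."""
    out: List[Interval] = []
    for lo, hi in sorted(ivs):
        if out and lo <= out[-1][1] + 1:
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    return normalize(a + b)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Two-pointer sweep over two normalized lists."""
    a, b = normalize(a), normalize(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def covers(kill: List[Interval], iv: Interval) -> bool:
    """True if the kill set covers every address of iv."""
    return intersect(kill, [iv]) == [iv]


kill_md1 = union([(0, 15)], [(16, 31)])
print(kill_md1)                     # [(0, 31)]
print(covers(kill_md1, (0, 31)))    # True
print(covers([(0, 15)], (0, 31)))   # False
```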
  • In step 304, the control flow graph is iterated over.
  • In each iteration, the fixed value information and kill set information of the memory descriptors of each node are calculated and updated, and the data dependencies of the instructions are updated.
  • In the first iteration, the fixed value information and kill set information of the memory descriptors of each node are calculated from the initial information obtained in step 303. The newly obtained fixed value information and kill set information are then fed back into the calculation, and the iteration repeats until the fixed value information and kill set information of the memory descriptors of the variables in every node no longer change.
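The iterate-until-unchanged loop of step 304 can be sketched generically in Python. For illustration, the transfer function below is a plain reaching-fixed-values computation on the Figure 1 diamond, not the full memory-descriptor update, and all names (including the entry block) are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

State = Dict[str, FrozenSet[str]]


def iterate_to_fixed_point(order: List[str],
                           transfer: Callable[[str, State], FrozenSet[str]],
                           init: State) -> Tuple[State, int]:
    """Re-run transfer over all nodes until no node's information changes."""
    state = dict(init)
    rounds = 0
    while True:
        rounds += 1
        changed = False
        for node in order:
            new = transfer(node, state)
            if new != state[node]:
                state[node] = new
                changed = True
        if not changed:
            return state, rounds


# Figure 1 diamond: y1 is fixed in block A, y2 in block B; C joins them.
pred = {"entry": [], "A": ["entry"], "B": ["entry"], "C": ["A", "B"]}
gen = {"entry": frozenset(), "A": frozenset({"y1"}), "B": frozenset({"y2"}), "C": frozenset()}


def reaching(node: str, state: State) -> FrozenSet[str]:
    # Fixed values leaving a block: those it creates plus those arriving from predecessors.
    return gen[node] | frozenset().union(*(state[p] for p in pred[node]))


start = {n: frozenset() for n in pred}
out, rounds = iterate_to_fixed_point(["entry", "A", "B", "C"], reaching, start)
print(sorted(out["C"]), rounds)   # ['y1', 'y2'] 2
```

The last round only confirms that nothing changed. With the order reversed (["C", "B", "A", "entry"]) the same call needs three rounds, because C is evaluated before its predecessors have been updated.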
  • The data flow may be traversed in pre-order, in-order, post-order, or reverse post-order.
  • Taking post-order as an example, the traversal visits the left subtree in post-order, then the right subtree in post-order, and finally the root node.
  • The post-order traversal of the control flow graph in Figure 2 is M → D → C → B → H → F → G → E → L → J → K → I → A.
  • Taking reverse post-order as another example, it is obtained by first traversing in post-order and then reversing the result.
  • The reverse post-order traversal of the control flow graph in Figure 2 is A → I → K → J → L → E → G → F → H → B → C → D → M; traversing in reverse post-order lets the iteration converge earlier.
  • This embodiment does not limit the traversal order, but reverse post-order traversal is preferred.
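Reverse post-order can be obtained from a depth-first search exactly as described: record each node after all of its successors, then reverse the list. This sketch assumes a successor-list dictionary and reuses the Figure 1 diamond with a hypothetical entry block; the recursive search is fine for small graphs, while large graphs would need an iterative version to stay within Python's recursion limit.

```python
from typing import Dict, List, Set


def reverse_post_order(succ: Dict[str, List[str]], entry: str) -> List[str]:
    seen: Set[str] = set()
    post: List[str] = []

    def dfs(n: str) -> None:
        seen.add(n)
        for s in succ.get(n, []):
            if s not in seen:
                dfs(s)
        post.append(n)          # a node is emitted after all its successors

    dfs(entry)
    return post[::-1]           # reversing post-order puts predecessors first


diamond = {"entry": ["A", "B"], "A": ["C"], "B": ["C"], "C": []}
print(reverse_post_order(diamond, "entry"))  # ['entry', 'B', 'A', 'C']
```

In reverse post-order every node comes after all of its predecessors except along back edges, which is why forward data flow information tends to settle in fewer rounds.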
  • A node may involve a PHI function.
  • In that case, the instructions of each basic block are further traversed to determine whether the successor basic block of each basic block is a PHI node. If it is, the fixed value information of all immediate predecessors of the PHI node is set as the fixed value information of the PHI node's memory descriptor, which determines the PHI node's fixed value information. The kill set information of these immediate predecessors is then intersected, and the resulting intersection interval is set as the kill set information of the PHI node's memory descriptor, which determines the PHI node's kill set information.
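The PHI-node rule (fixed values collected from all predecessors, kill sets intersected) can be sketched as follows; presumably the intersection is used because an interval is only certainly killed after the join if it was killed on every incoming path. The predecessor states and the instruction names IR_a and IR_b are hypothetical.

```python
from typing import List, Set, Tuple

Interval = Tuple[int, int]
State = Tuple[Set[str], List[Interval]]  # (fixed value instructions, sorted kill set)


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted lists of disjoint closed intervals."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def merge_at_phi(preds: List[State]) -> State:
    """Fixed values: union over predecessors; kill set: intersection over predecessors."""
    defs: Set[str] = set()
    kill = preds[0][1]
    for pred_defs, pred_kill in preds:
        defs |= pred_defs
        kill = intersect(kill, pred_kill)
    return defs, kill


# Hypothetical predecessor states of one memory descriptor at a PHI node.
defs, kill = merge_at_phi([({"IR_a"}, [(0, 15)]), ({"IR_b"}, [(8, 31)])])
print(sorted(defs), kill)   # ['IR_a', 'IR_b'] [(8, 15)]
```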
  • In step 304 of the foregoing embodiment, the uses of an instruction are generally processed first, and then its fixed values.
  • That is, the foregoing embodiment first identifies which operands of the instructions in each node are uses and which are fixed values, then calculates and updates the fixed value information and kill set information of the memory descriptors used by those instructions, and finally calculates and updates the fixed value information and kill set information of the memory descriptors whose values those instructions fix.
  • For example, an addition instruction fixes MD1(var:C,offset:0,size:128) and uses MD2(var:A,offset:0,size:128) and MD3(var:B,offset:0,size:128).
  • The fixed value information and kill set information of the addition instruction's uses, MD2 and MD3, are calculated and updated first, and then those of its fixed value, MD1.
  • FIG. 4 shows the flowchart for calculating and updating this information for the uses of an instruction.
  • In step 401, it is determined whether the use of the instruction has a memory descriptor that must be used (a must-use memory descriptor). If not, step 402 is executed to set the fixed value information of the memory descriptors that may be used (may-use memory descriptors) as the fixed value information used by the instruction. The may-use memory descriptors may be multiple memory descriptors of the same variable or of different variables. If a must-use memory descriptor exists, step 403 is executed to determine whether it is exact.
  • If it is not exact, step 404 is executed to set the fixed value information of both the must-use and the may-use memory descriptors as the fixed value information used by the instruction. If it is exact, step 405 is executed.
  • In step 405, it is determined whether fixed value information of the exact must-use memory descriptor exists. If not, step 406 is executed to identify all may-use memory descriptors that have fixed value information, and then step 413 is executed. If it exists, step 407 is executed to set the fixed value information of the exact must-use memory descriptor as the fixed value information used by the instruction, and then step 408 is executed.
  • In step 408, it is determined whether the kill set information of the exact must-use memory descriptor is empty; the kill set may be determined by considering all possible memory descriptors of the same variable. If it is empty, the fixed value information of the exact must-use memory descriptor has not been killed, so step 409 is executed: no kill set information needs to be set for the use of the instruction (that is, it remains empty), and the whole process ends. If it is not empty, part of the fixed value information of the exact must-use memory descriptor has been killed, and step 411 is executed.
  • In step 411, it is determined whether fixed value information exists for the may-use memory descriptors that made the kill set non-empty. If not, the may-use memory descriptors need not be considered and the fixed value information already set from the must-use memory descriptor suffices, so step 409 is executed to end the whole process. If it exists, step 413 is executed to determine whether the intersection interval of the fixed value information of the must-use memory descriptor and that of the may-use memory descriptor has been killed.
  • If it has not been killed, step 414 is executed to set the fixed value information of the may-use memory descriptor, together with that of the must-use memory descriptor, as the fixed value information used by the instruction. If it has been killed, step 409 is executed to end the whole process.
  • FIG. 5 shows a flowchart of another embodiment, which differs from the embodiment of FIG. 4 in two places.
  • First, step 501 is executed to set the fixed value information of the may-use memory descriptors as the fixed value information used by the instruction.
  • Second, when it is determined in step 411 that fixed value information of the may-use memory descriptors exists, step 413 is not executed; instead, step 502 is executed to set the fixed value information of both the may-use and the must-use memory descriptors as the fixed value information used by the instruction.
  • In step 304, after the fixed value information and kill set information of the memory descriptors used by the instruction have been calculated and updated, the fixed value information and kill set information of the instruction's fixed values are calculated and updated.
  • FIG. 6 shows the flowchart for this.
  • In step 601, it is determined whether the fixed value of the instruction has a memory descriptor whose value it necessarily fixes (a necessary fixed-value memory descriptor). If not, step 602 is executed to identify all possible fixed-value memory descriptors. Step 603 is then executed to exclude, among them, those covered by other possible fixed-value memory descriptors. Next, step 604 is executed to exclude, among them, those that already have fixed value information. Next, step 605 is executed to establish fixed value information for the possible fixed-value memory descriptors that were not excluded. If it is determined in step 601 that the fixed value of the instruction has a necessary fixed-value memory descriptor, step 606 is executed.
  • In step 606, it is determined whether the necessary fixed-value memory descriptor is exact. If it is, step 607 is executed to establish the fixed value information of the necessary fixed-value memory descriptor in the fixed value of the instruction. Next, step 608 is executed to identify the possible fixed-value memory descriptors of that necessary fixed-value memory descriptor. Next, step 609 is executed to place the fixed value information of the necessary fixed-value memory descriptor into the kill set information of these possible fixed-value memory descriptors. If it is determined in step 606 that the necessary fixed-value memory descriptor is not exact, step 610 is executed.
  • In step 610, it is determined whether the possible fixed-value memory descriptors have fixed value information. If so, step 611 is executed to mark these possible fixed-value memory descriptors as killed. If not, there is no fixed value information to update, and step 612 is executed to end the whole process.
  • The placement in step 609 can be further refined into the process shown in FIG. 7.
  • In step 701, it is determined whether the possible fixed-value memory descriptor is completely killed. If so, step 702 is executed to delete the record of that possible fixed-value memory descriptor. If not, step 703 is executed to determine whether its kill set information is empty. If it is empty, step 704 is executed to place all intervals of the fixed value information of the necessary fixed-value memory descriptor into the kill set information of the possible fixed-value memory descriptor.
  • If it is not empty, step 705 is executed to form the union of the kill set information of the possible fixed-value memory descriptor and the fixed value information of the necessary fixed-value memory descriptor.
  • Step 706 is then executed to put this union interval into the kill set information of the possible fixed-value memory descriptor.
  • In another embodiment, when it is determined in step 601 that the fixed value of the instruction has no necessary fixed-value memory descriptor, the possible fixed-value memory descriptors are not specially excluded as in steps 603 and 604; instead, after all possible fixed-value memory descriptors are identified in step 602, fixed value information is established directly for all of them.
  • This embodiment further executes the process shown in FIG. 8.
  • In step 801, it is determined whether the necessary fixed-value memory descriptor of the instruction's fixed value is predicated, that is, whether the instruction writes it only under a condition. If not, step 802 is executed, i.e., the process of FIG. 6. If it is predicated, fixed value information for the instruction's necessary fixed-value memory descriptor needs to be established, and step 803 determines whether that memory descriptor already has fixed value information. If it does, step 804 is executed to add the instruction to the existing fixed value information of the necessary fixed-value memory descriptor. If it does not, new fixed value information for the necessary fixed-value memory descriptor must be established, so step 802 is executed, i.e., the process of FIG. 6.
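The effect of FIG. 8 on the fixed value information of one memory descriptor can be sketched as below. This is a simplification that tracks only the set of fixed value instructions per descriptor, ignores kill sets, and assumes that the FIG. 6 path replaces earlier fixed value information; the function name record_fixed_value and the instructions IR7 and IR8 are hypothetical.

```python
from typing import Dict, Set


def record_fixed_value(records: Dict[str, Set[str]], md: str, instr: str,
                       predicated: bool) -> None:
    """Update which instructions provide the fixed value of md (FIG. 8 sketch)."""
    if predicated and md in records:
        # Step 804: the write may not happen, so earlier fixed values stay valid
        # and the predicated instruction is added alongside them.
        records[md].add(instr)
    else:
        # Step 802 (FIG. 6): establish new fixed value information for md.
        records[md] = {instr}


records: Dict[str, Set[str]] = {}
record_fixed_value(records, "MD1", "IR1", predicated=False)
record_fixed_value(records, "MD1", "IR7", predicated=True)
print(sorted(records["MD1"]))   # ['IR1', 'IR7']
record_fixed_value(records, "MD1", "IR8", predicated=False)
print(records)                  # {'MD1': {'IR8'}}
```

A later use of MD1 after IR7 therefore depends on both IR1 and IR7, since either may have produced the value.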
  • Step 304 can also update the data dependencies of the instructions at the same time.
  • The data dependency of an instruction may refer to the fixed value information corresponding to each of its uses, from which the fixed value-use chains and use-fixed value chains can be determined.
  • The data dependencies can be recorded and stored as an array.
  • For example, suppose a variable in a control flow graph involves the memory descriptors MD1[0,31], MD2[0,15], MD3[16,31], and MD4[0,64], and that, following the data flow, six instructions currently fix or use the variable, as shown in the following table (the last column lists the possible memory descriptors, i.e., the other memory descriptors whose intervals overlap the accessed one):

    Instruction   Operation     Memory descriptor   Possible memory descriptors
    IR1           fixed value   MD1                 {MD2, MD3, MD4}
    IR2           use           MD1                 {MD2, MD3, MD4}
    IR3           fixed value   MD2                 {MD1, MD4}
    IR4           use           MD1                 {MD2, MD3, MD4}
    IR5           fixed value   MD3                 {MD1, MD4}
    IR6           use           MD4                 {MD1, MD2, MD3}
  • For IR1, step 601 determines whether IR1 has a necessary fixed-value memory descriptor. MD1 is such a descriptor, so step 606 determines whether it is exact; the interval [0,31] of MD1 is exact, so step 607 establishes the fixed value information of the necessary fixed-value memory descriptor in IR1, that is, MD1[0,31] is set as the fixed value information of the variable. Steps 608 and 609 are then executed.
  • MD2, MD3, and MD4 are possible fixed-value memory descriptors, but querying the initial information shows that none of them has fixed value information from instructions before IR1. They therefore need not be considered for this fixed value, so steps 608 and 609 neither record possible fixed-value memory descriptors nor update kill set information. After IR1, the memory descriptor record of this variable therefore contains only MD1, with fixed value information IR1 and an empty kill set.
  • An empty kill set is recorded as ∅.
  • For the use in IR2, step 401 determines whether the use has a must-use memory descriptor. Because MD1 exists, step 403 determines whether the must-use memory descriptor is exact. Since MD1 is exact, step 405 determines whether the fixed value information of the exact must-use memory descriptor exists; the fixed value information [0,31] of MD1 exists, so step 407 sets it as the fixed value information used by IR2.
  • Step 408 then determines whether the kill set information of the exact must-use memory descriptor is empty. Because no interval of MD1 has been killed yet, step 409 is executed: the kill set information need not be considered, and the whole process ends.
  • The fixed value information of IR2 is therefore IR1, which determines the data dependency between instructions IR2 and IR1 (that is, the use-def chain); the kill set information is empty.
  • For IR3, step 601 determines whether IR3 has a necessary fixed-value memory descriptor. MD2 is such a descriptor, so step 606 determines whether it is exact; the interval [0,15] of MD2 is exact, so step 607 establishes the fixed value information of the necessary fixed-value memory descriptor in IR3, that is, MD2[0,15] is set as the fixed value information of the variable.
  • Steps 608 and 609 are then executed. Since MD1 is a possible fixed-value memory descriptor of MD2, the kill set information of MD1 is updated to [0,15] in step 609. The memory descriptor record of this variable is therefore updated to: MD1 with fixed value information IR1 and kill set [0,15]; MD2 with fixed value information IR3 and an empty kill set.
  • For the use in IR4, step 401 determines whether the use has a must-use memory descriptor. Because MD1 exists, step 403 determines whether the must-use memory descriptor is exact. Since MD1 is exact, step 405 determines whether its fixed value information exists; it does, so step 407 sets the fixed value information of MD1 as the fixed value information used by IR4, i.e., IR1. Step 408 then determines whether the kill set information of MD1 is empty.
  • Since the kill set information of MD1 is not empty, step 411 determines whether fixed value information of the may-use memory descriptors exists. MD2 has fixed value information, so step 413 determines whether the interval [0,15], where the fixed value information of MD1 and that of MD2 intersect, has been killed. Since the interval [0,15] of MD2 has not been killed, step 414 is executed, and the fixed value information of MD2 is also set as the fixed value information used by the instruction.
  • The fixed value information of IR4 is therefore IR1 (the fixed value instruction of MD1) and IR3 (the fixed value instruction of MD2), which determines the data dependencies between instruction IR4 and instructions IR1 and IR3 (that is, the use-def chain).
  • IR4 uses MD1, whose interval is fixed by two instructions: IR1 provides [16,31] and IR3 provides [0,15].
  • In step 601, it is determined whether IR5 has a must-define memory descriptor. MD3 is a must-define memory descriptor, so step 606 determines whether it is exact. MD3 is exact, so step 607 establishes the definition information of the must-define memory descriptor in IR5, that is, MD3 is recorded as a definition of the variable.
  • Steps 608 and 609 are then executed. Because MD1 is a may-define memory descriptor of MD3, the kill set information of MD1 is updated in step 609.
  • In step 401, it is determined whether the use in the instruction has a must-use memory descriptor. Since MD4 has not been defined by any preceding instruction, there is no must-use memory descriptor, so step 402 is executed: the definition information of the may-use memory descriptors is set as the definition information used by the instruction.
  • The definition information of these two memory descriptors (MD2 and MD3) is used as the definition information of IR6, which establishes the data dependencies (that is, the use-def chains) between IR6 and the instructions IR3 and IR5.
  • In step 601, it is determined whether IR1 has a must-define memory descriptor. MD1 is a must-define memory descriptor, so step 606 determines whether it is exact. MD1 is exact, so step 607 establishes the definition information of the must-define memory descriptor in IR1, that is, MD1[0,31] is recorded as a definition of variable A. Steps 608 and 609 are then executed.
  • The memory descriptor records created for variable A are as follows:
                 definition information   kill set information   is_kill
    MD1          IR1                      {}                     false
  • is_kill records whether the memory descriptor has been killed: true means it has been killed, and false means it has not.
  • In step 601, it is determined whether IR2 has a must-define memory descriptor. MD2 is a must-define memory descriptor, so step 606 determines whether it is exact. MD2 is not exact because X is unknown, so step 610 determines whether the may-define memory descriptor has definition information.
  • Since the may-define memory descriptor MD1 of IR2 has definition information, step 611 is executed and MD1 is noted as killed: its is_kill flag is changed to true to indicate that part of MD1's interval has been killed. Because the definition information of MD2 is not exact, the kill set information of MD1 cannot be determined, so the is_kill flag alone records that MD1 has been killed. The records of variable A are updated as follows:
                 definition information   kill set information   is_kill
    MD1          IR1                      {}                     true
    MD2          IR2                      {}                     false
  • In step 401, it is determined whether the use in the instruction has a must-use memory descriptor. MD2 is a must-use memory descriptor, so step 403 determines whether it is exact. Since MD2 is not exact, step 404 is executed:
  • The definition information of both the must-use and the may-use memory descriptors is set as the definition information used by the instruction; that is, the definition information of MD2 and that of MD1 both become definition information used by IR3.
  • The definitions used by IR3 for MD2 are therefore IR1 and IR2, which establishes the data dependencies (that is, the use-def chains) between IR3 and the instructions IR1 and IR2.
  • In step 601, it is determined whether the instruction has a must-define memory descriptor. Because it has none, step 602 identifies all may-define memory descriptors, which in this example are MD1, MD2, MD3, and MD4. Step 603 then excludes, from all may-define memory descriptors, those covered by other may-define memory descriptors; since MD3 is completely covered by MD4, MD3 is excluded. Next, step 604 excludes the may-define memory descriptors that already have definition information.
  • Step 605 then establishes the definition information of the may-define memory descriptors that have not been excluded, i.e., the definition information of MD4 for the newly added variable B. After the update, the definition information of each memory descriptor is as follows:
                 definition information   kill set information   is_kill
    MD1          IR1                      {}                     true
    MD2          IR2                      {}                     false
    MD4          IR4                      {}                     false
  • In step 801, it is determined whether the must-define memory descriptor of the instruction's definition is predicated. It is, so step 803 determines whether that must-define memory descriptor has definition information. MD1 has definition information, so step 804 adds the instruction's definition to the definition information of the must-define memory descriptor. The records are updated to:
                 definition information   kill set information   is_kill
    MD1          IR1, IR5                 {}                     true
    MD2          IR2                      {}                     false
    MD4          IR4                      {}                     false
  • In step 401, it is determined whether the use in the instruction has a must-use memory descriptor. There is none, so step 402 sets the definition information of the may-use memory descriptors as the definition information used by instruction IR6.
  • The definitions used by IR6 are therefore IR1, IR2, IR4, and IR5, which establishes the data dependencies (that is, the use-def chains) between IR6 and the instructions IR1, IR2, IR4, and IR5.
  • Another embodiment of the present disclosure is a computer-readable storage medium storing computer program code for calculating data dependencies in a program. When the computer program code is run by a processor, it executes the methods of the present disclosure, such as the methods described in FIGS. 3 to 8.
  • In summary, the compiler analyzes the memory intervals defined and used by each instruction, together with their kill set information, to calculate the data dependencies between instructions, and establishes def-use and use-def chains for instructions that have a definition-use relationship.
  • The compiler can then perform compilation optimizations based on these data dependencies, effectively eliminating unnecessary computation and producing optimized code.
  • The compiler can further convert the optimized code into machine code executable by an artificial intelligence chip, thereby accelerating the chip's computation, in particular neural network inference.
  • FIG. 9 is a structural diagram of an integrated circuit device 900 to which an embodiment of the present disclosure is applied.
  • The integrated circuit device 900 includes a computing device 902, which hosts machine code compiled according to the embodiments of the present disclosure to perform neural network inference.
  • The integrated circuit device 900 further includes a universal interconnect interface 904 and another processing device 906.
  • The other processing device 906 may be one or more general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processor, or an artificial intelligence processor; their number is not limited and is determined by actual needs.
  • The other processing device 906 serves as the interface between the computing device 902 and external data and control, performing basic control that includes, but is not limited to, data transfer and starting and stopping the computing device 902.
  • The other processing device 906 can also cooperate with the computing device 902 to complete computing tasks.
  • The universal interconnect interface 904 transfers data and control commands between the computing device 902 and the other processing device 906.
  • The computing device 902 can obtain the required input data from the other processing device 906 via the universal interconnect interface 904 and write it into the on-chip storage unit of the computing device 902.
  • The computing device 902 can obtain control instructions from the other processing device 906 via the universal interconnect interface 904 and write them into the on-chip control buffer of the computing device 902.
  • The universal interconnect interface 904 can also read the data in the storage module of the computing device 902 and transmit it to the other processing device 906.
  • The integrated circuit device 900 also includes a storage device 908, which is connected to both the computing device 902 and the other processing device 906.
  • The storage device 908 stores data of the computing device 902 and the other processing device 906, and is particularly suitable for data required by a computation that cannot be held entirely in the internal storage of the computing device 902 or the other processing device 906.
  • The integrated circuit device 900 can be used as a system-on-chip (SoC) for equipment such as mobile phones, robots, drones, and video capture devices, effectively reducing the core area of the control portion, increasing processing speed, and reducing overall power consumption.
  • In that case, the universal interconnect interface 904 of the integrated circuit device 900 is connected to certain components of the equipment, such as a camera, a monitor, a mouse, a keyboard, a network card, or a Wi-Fi interface.
  • The present disclosure also discloses a chip or integrated circuit chip that includes the integrated circuit device 900.
  • The present disclosure also discloses a chip packaging structure that includes the above chip.
  • The board card 1000 may also include other supporting components.
  • The supporting components include a storage device 1004, an interface device 1006, and a control device 1008.
  • The storage device 1004 is connected to the chip 1002 in the chip packaging structure through a bus 1014 and is used to store data.
  • The storage device 1004 may include multiple groups of storage units 1010.
  • The interface device 1006 is electrically connected to the chip 1002 in the chip packaging structure.
  • The interface device 1006 implements data transmission between the chip 1002 and an external device 1012 (for example, a server or a computer).
  • For example, the interface device 1006 is a standard PCIe interface, through which the data to be processed is transferred from the server to the chip 1002.
  • The calculation result of the chip 1002 is likewise transmitted back to the external device 1012 by the interface device 1006.
  • The control device 1008 is electrically connected to the chip 1002 to monitor the state of the chip 1002. Specifically, the chip 1002 and the control device 1008 may be electrically connected through an SPI interface.
  • The control device 1008 may include a single-chip microcomputer (micro controller unit, MCU).
  • Electronic equipment or devices may include data processing devices, robots, computers, printers, scanners, tablets, smart terminals, mobile phones, dashcams, navigators, sensors, webcams, servers, cloud servers, cameras, video cameras, projectors, watches, headphones, mobile storage, wearable devices, vehicles, household appliances, and/or medical equipment.
  • The vehicles include airplanes, ships, and/or automobiles;
  • the household appliances include televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods;
  • the medical equipment includes MRI scanners, B-mode ultrasound scanners, and/or electrocardiographs.
  • A method for calculating data dependencies in a program, comprising: constructing the program as a control flow graph in static single assignment form, the control flow graph comprising a plurality of nodes; setting initial information of the memory descriptors of the plurality of nodes, the initial information being the definition information and kill set information of all memory descriptors of the immediate dominator of each node; and iterating based on the control flow graph, each iteration comprising: traversing the plurality of nodes, calculating and updating the definition information and kill set information of the memory descriptors of each node, and updating the data dependencies of the instructions.
  • The data dependencies of the instructions serve as reference information for instruction compilation and optimization.
  • Clause A2. The method according to clause A1, wherein the iterating step is performed in reverse post-order data-flow order.
  • Clause A3. The method according to clause A1, wherein in the calculating step, the memory descriptors used by an instruction are calculated and updated first, and the memory descriptors defined by the instruction are calculated and updated afterward.
  • The calculating step further comprises: determining whether an exact must-use memory descriptor exists; if so, determining whether the definition information of the exact must-use memory descriptor exists; and if the definition information exists, setting it as the definition information of the use.
  • The calculating step further comprises: determining whether the kill set information of the exact must-use memory descriptor is empty; if not, identifying the may-use memory descriptor that makes the kill set information non-empty; determining whether the definition information of that may-use memory descriptor exists; if it exists, determining whether the intersection interval between the definition information of the must-use memory descriptor and that of the may-use memory descriptor has been killed; and if not, setting the definition information of the may-use memory descriptor and of the must-use memory descriptor as the definition information used by the instruction.
  • The calculating step further comprises: determining whether the kill set information of the exact must-use memory descriptor is empty; if not, determining whether the definition information of the may-use memory descriptor exists; and if it exists, setting the definition information of the may-use memory descriptor and of the must-use memory descriptor as the definition information used by the instruction.
  • The calculating step further comprises: identifying all may-use memory descriptors that have definition information; determining whether the intersection interval between the definition information of the exact must-use memory descriptor and that of the may-use memory descriptor has been killed; and if not, setting the definition information of the may-use memory descriptor and of the must-use memory descriptor as the definition information used by the instruction.
  • The calculating step further comprises: identifying all may-use memory descriptors that have definition information; and setting the definition information of the may-use memory descriptors as the definition information used by the instruction.
  • The calculating step further comprises: setting the definition information of the must-use memory descriptor and of the may-use memory descriptors as the definition information used by the instruction.
  • Clause A10. The method according to clause A4, wherein the calculating step further comprises: determining whether the use in the instruction has a must-use memory descriptor; and if not, setting the definition information of the may-use memory descriptors as the definition information used by the instruction.
  • The calculating step comprises: establishing the definition information of the must-define memory descriptor in the instruction's definition; identifying the may-define memory descriptors of that must-define memory descriptor; and placing the definition information of the must-define memory descriptor into the kill set information of the may-define memory descriptors.
  • Clause A12. The method according to clause A11, wherein the placing step comprises: determining whether the may-define memory descriptor has been completely killed; and if so, deleting the record of the may-define memory descriptor.
  • Clause A13. The method according to clause A11, wherein the placing step comprises: determining whether the kill set information of the may-define memory descriptor is empty; and if so, placing all intervals of the definition information of the must-define memory descriptor into the kill set information of the may-define memory descriptor.
  • The placing step comprises: establishing the union interval of the kill set information of the may-define memory descriptor and the definition information of the must-define memory descriptor; and placing the union interval into the kill set information of the may-define memory descriptor.
  • The calculating step comprises: determining whether the must-define memory descriptor in the instruction's definition is exact; if exact: establishing the definition information of the must-define memory descriptor and identifying its may-define memory descriptors; if not exact: determining whether the may-define memory descriptor has definition information and, if so, noting that the may-define memory descriptor has been killed.
  • The calculating step comprises: determining whether the instruction's definition has a must-define memory descriptor; and if not, performing the following steps: identifying all may-define memory descriptors; excluding those covered by other may-define memory descriptors; excluding those that already have definition information; and establishing the definition information of the may-define memory descriptors that have not been excluded.
  • The calculating step comprises: determining whether the instruction's definition has a must-define memory descriptor; and if not, establishing the definition information of all may-define memory descriptors.
  • The calculating step comprises: determining whether the must-define memory descriptor of the instruction's definition is predicated; and if so, establishing the definition information of that must-define memory descriptor.
  • Clause A19. The method according to clause A18, wherein establishing the definition information of the must-define memory descriptor of the instruction's definition comprises: determining whether that must-define memory descriptor has definition information; if so, adding the instruction's definition to its definition information; and if not, creating new definition information for the must-define memory descriptor.
  • Clause A20. The method according to clause A1, wherein the calculating step comprises: traversing the instructions of each basic block and determining whether the successor basic block of each basic block is a PHI node; and if so, performing the following steps: setting the definition information of each such basic block as the definition information of the memory descriptor of the PHI node; establishing the intersection interval of all kill set information of those basic blocks; and setting the intersection interval as the kill set information of the PHI node.
  • Clause A21. The method according to clause A1, further comprising: identifying the immediate dominators of the plurality of nodes.
  • Clause A23. A computer-readable storage medium storing computer program code for calculating data dependencies in a program, wherein the computer program code, when run by a processor, performs the method of any one of clauses A1 to A22.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

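A method for calculating data dependencies in a program, and a computer-readable storage medium, applied in a computing device (902). The computing device (902) is included in an integrated circuit device, which further includes a universal interconnect interface (904) and other processing devices (906). The computing device (902) interacts with the other processing devices (906) to jointly complete computing operations specified by the user. The integrated circuit device may further include a storage device (908), which is connected to the computing device (902) and to the other processing devices (906), respectively, and is used for data storage of the computing device (902) and the other processing devices (906).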

Description

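Method for Calculating Data Dependencies in a Program, and Computer-Readable Storage Medium
Cross-Reference to Related Application
This application claims priority to Chinese patent application No. 2020104748695, filed on May 29, 2020, entitled "Method for calculating data dependencies in a program and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates generally to the field of computers. More specifically, it relates to a method for calculating data dependencies in a program and a computer-readable storage medium.
Background
Accurate data-flow analysis is the foundation that allows compiler optimizations to proceed efficiently. Static single assignment (SSA) form is an efficient data-flow analysis technique used in nearly all modern compilers. SSA form is an intermediate representation (IR) in which each variable is assigned exactly once, which makes its use-define chains explicit and helps simplify compiler algorithms.
When a memory interval may be defined by several instructions, existing techniques have difficulty analyzing the data dependencies between those instructions. Compilation therefore cannot be simplified effectively and computing resources are wasted, so a new method for analyzing the data dependencies between memory regions is urgently needed.
Summary
To at least partially solve the technical problems mentioned in the background, the present disclosure provides a method for calculating data dependencies in a program and a computer-readable storage medium.
In one aspect, the present disclosure discloses a method for calculating data dependencies in a program, the method comprising: constructing the program as a control flow graph in static single assignment form, the control flow graph comprising a plurality of nodes; setting initial information for the memory descriptors of the plurality of nodes, the initial information being the definition information and kill set information of all memory descriptors of the immediate dominator of each node; and iterating based on the control flow graph, each iteration comprising: traversing the plurality of nodes, calculating and updating the definition information and kill set information of the memory descriptors of each node, and updating the data dependencies of the instructions. The data dependencies of the instructions serve as reference information for instruction compilation and optimization.
In another aspect, the present disclosure discloses a computer-readable storage medium storing computer program code for calculating data dependencies in a program; when the computer program code is run by a processor, the foregoing method is performed.
In the present disclosure, a compiler analyzes the memory intervals defined and used by each instruction (or intermediate representation) and calculates the data dependencies between instructions. For instructions that have a definition-use relationship, it builds def-use chains and use-def chains, thereby simplifying the data dependencies between memory regions and accelerating computation on image, speech, and text data.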
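Brief Description of the Drawings
The above and other objects, features, and advantages of the exemplary embodiments of the present disclosure will become readily understood by reading the following detailed description with reference to the accompanying drawings. In the drawings, several embodiments of the present disclosure are shown by way of example and not limitation, and identical or corresponding reference numerals denote identical or corresponding parts, in which:
FIG. 1A shows exemplary source code and its control flow graph;
FIG. 1B shows the equivalent static single assignment form and control flow graph;
FIG. 2 shows an exemplary control flow graph;
FIG. 3 shows a flowchart of an embodiment of the present disclosure;
FIG. 4 shows a flowchart of another embodiment of the present disclosure;
FIG. 5 shows a flowchart of another embodiment of the present disclosure;
FIG. 6 shows a flowchart of another embodiment of the present disclosure;
FIG. 7 shows a flowchart of the placing step in another embodiment of the present disclosure;
FIG. 8 shows a flowchart of another embodiment of the present disclosure;
FIG. 9 is a structural diagram of an integrated circuit device according to an embodiment of the present disclosure; and
FIG. 10 is a structural diagram of a board card according to an embodiment of the present disclosure.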
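Detailed Description
The technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the scope of protection of the present disclosure.
It should be understood that the terms "first", "second", "third", "fourth", etc. in the claims, specification, and drawings of the present disclosure are used to distinguish different objects rather than to describe a particular order. The terms "comprise" and "include" used in the specification and claims indicate the presence of the described features, wholes, steps, operations, elements, and/or components, but do not exclude the presence or addition of one or more other features, wholes, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present disclosure. As used in the specification and claims, the singular forms "a", "an", and "the" are intended to include the plural forms unless the context clearly indicates otherwise. It should further be understood that the term "and/or" used in the specification and claims refers to, and includes, any and all possible combinations of one or more of the associated listed items.
As used in this specification and the claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining", or "in response to detecting".
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The present disclosure uses a control flow graph in static single assignment form: a compiler analyzes the memory intervals defined and used by each instruction and calculates the data dependencies between instructions. For instructions that have a definition-use relationship, def-use chains and use-def chains are established.
A control flow graph is an abstract data structure used in compilers. It represents all paths a program may traverse during execution, showing in flowchart form the possible flow between all basic blocks (BB) of a procedure. A control flow graph consists of nodes and the relationships between them. The nodes are basic blocks, i.e., maximal sequences of statements executed in order; each basic block has exactly one entry and one exit, and execution begins at its entry and ends at its exit. The defining property of a basic block is that once its first instruction is executed, all of its instructions are executed in order until the last one completes.
The main purpose of static single assignment form is to rename variables so that each variable is assigned only once, in order to achieve better computational efficiency. For example, consider the following three lines of code: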
y:=1
y:=2
x:=y
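The first line assigns 1 to y, the second assigns 2 to y, and the third assigns the value of y to x. Because y is reassigned in the second line, overwriting the first assignment before y is used in the third line, the first line is unnecessary. In static single assignment form, the code above is converted into the following intermediate representation:
y1:=1
y2:=2
x1:=y2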
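For straight-line code, the renaming above can be sketched in a few lines of Python. This is an illustrative sketch only: the `(target, operands)` tuple format and the function name `to_ssa` are assumptions. Because there are no branches, no PHI functions are needed.

```python
from typing import Dict, List, Tuple

# Hypothetical instruction format: (target, operands); an operand is a name or an int.
Instr = Tuple[str, List[object]]


def to_ssa(code: List[Instr]) -> List[Instr]:
    """Rename straight-line code so that every variable is assigned exactly once."""
    version: Dict[str, int] = {}  # latest version number of each source variable
    result: List[Instr] = []
    for target, operands in code:
        # Uses refer to the most recent version (computed before the target is bumped).
        new_ops = [f"{op}{version[op]}" if isinstance(op, str) else op for op in operands]
        # Each assignment creates a fresh version of its target.
        version[target] = version.get(target, 0) + 1
        result.append((f"{target}{version[target]}", new_ops))
    return result


program = [("y", [1]), ("y", [2]), ("x", ["y"])]
for target, ops in to_ssa(program):
    print(target, ":=", *ops)
# y1 := 1
# y2 := 2
# x1 := y2
```

After renaming, `y1` has no remaining uses, so the redundancy of the first assignment becomes explicit.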
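Taking FIG. 1 as another example, FIG. 1A shows exemplary source code and its control flow graph. First, the input value is assigned to x. Then x is tested for equality with 42: if it is equal, y is assigned 1; otherwise y is assigned x+2. Finally, the value of y is printed. FIG. 1B shows the equivalent static single assignment form and control flow graph. The variable y is renamed y1 and y2 in the two branches. Where the control flow merges, static single assignment form uses a Φ (PHI) function to merge the values y1 and y2 from the different branches, meaning that one of them is obtained; that value is assigned to y3, and finally y3 is printed. As FIG. 1B shows, static single assignment form renames the variables of the source code so that each variable is assigned only once.
Static single assignment form is characterized as follows. First, source variables are split by live range (from a definition of the variable to its uses), producing new variables with numeric suffixes, each defined only once, as when y is renamed y1 and y2 above. Second, PHI functions at control-flow merge points combine the values from different paths. A PHI function represents a parallel operation that selects one value according to the path taken at run time.
Static single assignment form simplifies two kinds of links: use-def chains and def-use chains. A def-use chain is a data structure consisting of a definition of a variable together with the set of all its uses. Conversely, a use-def chain consists of a use of a variable together with the set of all its definitions. Because a variable in non-SSA form may be defined more than once, its def-use chains can be very complex. In SSA form, by contrast, no two variables share a name and each variable is defined only once, so all uses of a name belong to the def-use chain of that single definition; and since every use is reached by exactly one definition, use-def chains are one-to-one. Both kinds of chains are thus simplified in SSA form.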
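The one-to-one use-def property can be illustrated with a minimal sketch that builds both kinds of chains for straight-line SSA code. The instruction format `(label, defined variable, used variables)` and the name `build_chains` are hypothetical. The sketch assumes definitions appear before their uses, which holds for straight-line code but not across loops.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical SSA instruction format: (label, defined variable, used variables).
Instr = Tuple[str, str, List[str]]


def build_chains(code: List[Instr]):
    """Build use-def and def-use chains for straight-line SSA code."""
    def_site: Dict[str, str] = {}                      # SSA: one definition per name
    use_def: Dict[Tuple[str, str], str] = {}           # (use label, var) -> def label
    def_use: Dict[str, List[str]] = defaultdict(list)  # def label -> use labels
    for label, target, uses in code:
        for var in uses:
            site = def_site[var]
            use_def[(label, var)] = site   # exactly one reaching definition per use
            def_use[site].append(label)    # a definition may have many uses
        def_site[target] = label
    return use_def, dict(def_use)


code = [("I1", "y1", []), ("I2", "y2", []), ("I3", "x1", ["y2"])]
use_def, def_use = build_chains(code)
print(use_def)  # {('I3', 'y2'): 'I2'}
print(def_use)  # {'I2': ['I3']}
```

`I1` (`y1:=1`) does not appear in `def_use`, meaning it has no uses. This is exactly what a dead-code pass looks for.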
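Static single assignment form brings the following benefits. For scalars, each variable has a single definition, so data-flow analysis and optimization algorithms become simpler. The memory occupied by use-def chains drops from super-linear growth (quadratic in the worst case) to linear growth. The relationship between uses and definitions is more intuitive. Finally, several unrelated uses of the same variable in the source program become uses of different variables in SSA form, which helps avoid executing unnecessary redundant instructions.
For tensors, each variable can be defined by memory descriptors. A memory descriptor describes a memory region and carries three pieces of information: the variable name, the offset, and the size. For example, MD1(var:A,offset:0,size:128) denotes memory descriptor No. 1, which holds data of variable A stored in the 128 consecutive addresses [0,127] starting at offset 0.
If there is also MD2(var:A,offset:32,size:32), then MD2 and MD1 clearly intersect on the memory interval [32,63]. An instruction that defines MD1 may therefore also affect the memory interval of MD2. The instruction's definition of MD1 is a must define, while for the other memory descriptors whose intervals intersect MD1 (such as MD2) it is a may define. Memory descriptors that are used are classified in the same way, as must use and may use. In other words, "must" refers to the interval the instruction definitely describes, while "may" refers to any other interval that intersects the "must" interval.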
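A memory descriptor and the must/may classification might be modeled as follows. This is a sketch under assumptions: the class `MemoryDescriptor` and the helper `may_descriptors` are hypothetical names, intervals are closed `[offset, offset + size - 1]`, and descriptors of different variables are assumed never to overlap.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MemoryDescriptor:
    """A memory region: descriptor name, variable, start offset, and size."""
    name: str
    var: str
    offset: int
    size: int

    @property
    def interval(self) -> Tuple[int, int]:
        # Closed address interval [first, last] covered by this descriptor.
        return (self.offset, self.offset + self.size - 1)

    def intersect(self, other: "MemoryDescriptor") -> Optional[Tuple[int, int]]:
        if self.var != other.var:
            return None  # assumption: different variables never alias
        lo = max(self.interval[0], other.interval[0])
        hi = min(self.interval[1], other.interval[1])
        return (lo, hi) if lo <= hi else None


def may_descriptors(must: MemoryDescriptor, all_mds: List[MemoryDescriptor]) -> List[str]:
    """Every other descriptor overlapping the 'must' descriptor is a 'may' descriptor."""
    return [md.name for md in all_mds if md is not must and must.intersect(md) is not None]


md1 = MemoryDescriptor("MD1", "A", 0, 128)
md2 = MemoryDescriptor("MD2", "A", 32, 32)
print(md1.intersect(md2))                # (32, 63)
print(may_descriptors(md1, [md1, md2]))  # ['MD2']
```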
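Furthermore, a memory descriptor is exact if both its offset and its size are exact values or ranges; otherwise it is inexact. For example, MD3[0,127] is exact, whereas MD4[32,X] is inexact (X is unknown).
Again taking MD1 and MD2 above as an example, suppose MD2 is defined before MD1. Then whatever the definition of MD2 is, it is killed (that is, overwritten) by the later definition of MD1. If instead MD2 is defined after MD1, the definition of MD2 kills part of MD1's defined interval, namely [32,63]. Information about such killed intervals is called kill set information. In other words, the kill set of a memory descriptor records the intervals that are killed by other instructions, and the memory descriptor's definition is no longer valid for those intervals.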
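The two orderings described above can be checked with a small helper. The function names are hypothetical and intervals are closed address ranges. The sketch only computes which part of the earlier definition is killed; it does not model kill-set bookkeeping.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int]  # closed address interval [lo, hi]


def killed_part(earlier: Interval, later: Interval) -> Optional[Interval]:
    """Portion of an earlier definition that a later definition overwrites (kills)."""
    lo, hi = max(earlier[0], later[0]), min(earlier[1], later[1])
    return (lo, hi) if lo <= hi else None


def fully_killed(earlier: Interval, later: Interval) -> bool:
    # The earlier definition is dead everywhere if the killed part is all of it.
    return killed_part(earlier, later) == earlier


MD1, MD2 = (0, 127), (32, 63)
print(fully_killed(MD2, MD1))  # True: MD2 defined first, then MD1 overwrites all of it
print(killed_part(MD1, MD2))   # (32, 63): MD2 defined later kills part of MD1
```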
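A control flow graph has dominators, strict dominators, and immediate dominators. FIG. 2 shows an exemplary control flow graph that illustrates these nodes. For node H, the path starts at node A and passes through node E. After E there is a branch to either node F or node G, and whichever is taken, node H is reached afterward. A dominator of a basic block is any basic block that must be passed through on every path from the entry of the control flow graph to that block. To reach node H, the path must pass through nodes A and E and through H itself, so A, E, and H are all dominators of H. Strict dominators are the dominators other than the block itself; for H, nodes A and E are its strict dominators. The immediate dominator is the strict dominator closest to the block; node E is the immediate dominator of node H.
When constructing static single assignment form, the dominance frontier must also be considered. It is where a node's dominance ends and where other control flow enters. Intuitively, it is the boundary of the region a given node dominates (excluding the boundary itself). Taking node E in FIG. 2 as an example, E strictly dominates nodes F, G, and H. The dominance frontier of E therefore consists of node D (the immediate successor of node F), node M (the immediate successor of node H), and node L (the immediate successor of node …). As shown in the figure, nodes D, M, and L all have incoming edges from other branches, so they are not dominated by E.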
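Dominance frontiers are commonly computed from immediate dominators with the "runner" method of Cooper, Harvey, and Kennedy. The sketch below uses that method for illustration; it is not necessarily the method of this disclosure. It assumes predecessor lists and an `idom` map are given, with the entry mapping to itself. The small graph is hypothetical and only loosely follows the E/F/G/H diamond of FIG. 2, because the text does not give the full edge set of FIG. 2.

```python
from typing import Dict, List, Set


def dominance_frontiers(preds: Dict[str, List[str]], idom: Dict[str, str]) -> Dict[str, Set[str]]:
    """Dominance frontiers via the Cooper-Harvey-Kennedy runner method.

    preds: predecessor list of every node; idom: immediate dominator of every
    node, with the entry node mapping to itself.
    """
    df: Dict[str, Set[str]] = {n: set() for n in preds}
    for node, ps in preds.items():
        if len(ps) < 2:
            continue  # only join points can appear in a dominance frontier
        for p in ps:
            runner = p
            # Walk up the dominator tree until reaching node's immediate dominator.
            while runner != idom[node]:
                df[runner].add(node)
                runner = idom[runner]
    return df


# Hypothetical graph: A->E, A->X, E->F, E->G, F->H, G->H, H->M, X->M.
preds = {"A": [], "E": ["A"], "X": ["A"], "F": ["E"], "G": ["E"],
         "H": ["F", "G"], "M": ["H", "X"]}
idom = {"A": "A", "E": "A", "X": "A", "F": "E", "G": "E", "H": "E", "M": "A"}
df = dominance_frontiers(preds, idom)
print(sorted(df["E"]))  # ['M']: E dominates F, G, H, but M is also reached via X
```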
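One embodiment of the present disclosure is a method for calculating data dependencies in a program on a control flow graph, applicable to environments with tensor instructions. The program may be used for neural-network inference or other computation on image, speech, or text data, without specific limitation here. When tensor instructions are involved, one tensor memory interval may be defined by multiple instructions; the method of this embodiment analyzes the data dependencies between instructions based on memory descriptors. FIG. 3 shows a flowchart of this embodiment.
In step 301, the program is constructed as a control flow graph in static single assignment form. There are many ways to convert a program into SSA form. The simplest is first to replace each assigned variable with a new variable name; for example, the variable y in basic blocks A and B of the source program in FIG. 1A becomes the variable y1 in basic block A and y2 in basic block B in FIG. 1B.
However, the use of y in basic block C of FIG. 1A cannot be resolved. Both predecessor basic blocks (A and B) define y, and it cannot be known at compile time from which predecessor the program reaches C. When the control flow graph encounters this situation, this step inserts a PHI function. As shown in FIG. 1B, a PHI function instruction is added at the beginning of basic block C, and a new variable y3 is defined based on y. The PHI function selects the version corresponding to the run-time path, that is, it uses either y1 or y2.
This embodiment uses dominance frontiers to determine precisely in which basic blocks PHI functions should be inserted. If a node defines a variable, that definition reaches every node the node dominates. PHI functions must therefore be inserted in the blocks on the node's dominance frontier, i.e., the immediate successors where its dominance ends.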
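PHI placement from dominance frontiers is usually done with a worklist over the iterated dominance frontier, because each inserted PHI is itself a new definition of the variable. The sketch below assumes the frontiers are already computed; `phi_blocks` and the example graphs are hypothetical.

```python
from typing import Dict, Iterable, Set


def phi_blocks(def_blocks: Iterable[str], df: Dict[str, Set[str]]) -> Set[str]:
    """Blocks needing a PHI for one variable: iterated dominance frontier of its definitions."""
    result: Set[str] = set()
    work = list(def_blocks)
    queued = set(work)
    while work:
        b = work.pop()
        for f in df.get(b, set()):
            if f not in result:
                result.add(f)            # insert a PHI at the start of block f
                if f not in queued:      # the PHI is itself a new definition
                    queued.add(f)
                    work.append(f)
    return result


# Hypothetical diamond modeled on FIG. 1B: S branches to A and B, which join at C.
df = {"S": set(), "A": {"C"}, "B": {"C"}, "C": set()}
print(phi_blocks(["A", "B"], df))  # {'C'}
```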
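This embodiment implements PHI functions as follows. Again taking FIG. 1B as an example, suppose y1 in basic block A is allocated to register R1 and y2 in basic block B is allocated to register R2. The PHI function y3←Φ(A:y1, B:y2) in basic block C can then be implemented by generating the following instruction at the end of basic block A:
move R3←R1
That is, the data in register R1 is moved to register R3. Likewise, the following instruction is generated at the end of basic block B:
move R3←R2
This moves the data in register R2 to register R3. Thus, whether the program passes through basic block A or basic block B, register R3 holds the correct definition of y3.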
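The copy-based lowering described above might be sketched as follows. The block and register names are hypothetical (including `R0` for x), instructions are plain strings, and `lower_phis` is an illustrative name.

```python
from typing import Dict, List, Tuple

# A PHI: (target register, {predecessor block: source register}); hypothetical format.
Phi = Tuple[str, Dict[str, str]]


def lower_phis(blocks: Dict[str, List[str]], phis: List[Phi]) -> Dict[str, List[str]]:
    """Replace PHI functions by register moves at the end of each predecessor block."""
    out = {name: list(body) for name, body in blocks.items()}  # do not mutate the input
    for target, sources in phis:
        for pred, src in sources.items():
            out[pred].append(f"move {target} <- {src}")
    return out


# FIG. 1B after register allocation: y1 in R1, y2 in R2, y3 in R3 (x assumed in R0).
blocks = {"A": ["R1 <- 1"], "B": ["R2 <- R0 + 2"], "C": ["print R3"]}
lowered = lower_phis(blocks, [("R3", {"A": "R1", "B": "R2"})])
print(lowered["A"])  # ['R1 <- 1', 'move R3 <- R1']
print(lowered["B"])  # ['R2 <- R0 + 2', 'move R3 <- R2']
```

A production implementation would also insert the copies before each predecessor's terminating branch, split critical edges, and treat all PHIs of a block as one parallel copy. The sketch ignores these details.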
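In step 302, the immediate dominator of every node is identified. This step may use the classic Lengauer-Tarjan algorithm or another algorithm. For the control flow graph of FIG. 2, this embodiment identifies nodes A, B, E, and I as immediate dominators.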
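The text allows Lengauer-Tarjan "or another algorithm". As an illustration of the alternative, the sketch below uses the simpler iterative algorithm of Cooper, Harvey, and Kennedy, which processes nodes in reverse post-order. It is not Lengauer-Tarjan. The function name and the example graph are hypothetical.

```python
from typing import Dict, List


def immediate_dominators(succs: Dict[str, List[str]], entry: str) -> Dict[str, str]:
    """Iterative dominator algorithm (Cooper, Harvey, Kennedy); entry maps to itself."""
    post: List[str] = []
    seen = set()

    def dfs(n: str) -> None:
        seen.add(n)
        for s in succs.get(n, []):
            if s not in seen:
                dfs(s)
        post.append(n)

    dfs(entry)
    rpo = post[::-1]                      # reverse post-order of reachable nodes
    index = {n: i for i, n in enumerate(rpo)}
    preds: Dict[str, List[str]] = {n: [] for n in rpo}
    for n in rpo:
        for s in succs.get(n, []):
            preds[s].append(n)

    idom = {entry: entry}

    def intersect(a: str, b: str) -> str:
        # Move the "finger" that is later in RPO up the dominator tree until they meet.
        while a != b:
            while index[a] > index[b]:
                a = idom[a]
            while index[b] > index[a]:
                b = idom[b]
        return a

    changed = True
    while changed:
        changed = False
        for n in rpo[1:]:
            done = [p for p in preds[n] if p in idom]   # non-empty: DFS parent precedes n
            new = done[0]
            for p in done[1:]:
                new = intersect(p, new)
            if idom.get(n) != new:
                idom[n] = new
                changed = True
    return idom


succs = {"A": ["E", "X"], "E": ["F", "G"], "F": ["H"], "G": ["H"],
         "H": ["M"], "X": ["M"], "M": []}
idom = immediate_dominators(succs, "A")
print(idom["H"], idom["M"])  # E A
```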
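In step 303, the initial information of every node is set to the definition information and kill set information of all memory descriptors of its immediate dominator. Because the variables in an immediate dominator are used directly in its successor basic blocks, this embodiment takes the definitions and kill sets of the variables in the immediate dominator as the initial information of the successor basic blocks. Furthermore, since variables are defined by memory descriptors, setting the initial information of a node means setting the initial information of each variable in it, that is, the initial definition information and kill set information of the memory descriptors corresponding to those variables.
Taking node C in FIG. 2 as an example, its immediate dominator is node B. The definition information and kill set information of the memory descriptors of the variables in node B therefore serve as the initial definition information and kill set information of the memory descriptors of the variables in node C. Similarly, the immediate dominator of node L in FIG. 2 is node A, so the definition information and kill set information of the memory descriptors of the variables in node A serve as the initial information for the variables in node L.
In this embodiment, kill set information is kept as ordered lists, generally sorted by address from low to high, which makes intersection and union operations convenient.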
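Keeping kill sets as address-sorted interval lists makes both operations simple sweeps. The following is a minimal sketch. It assumes closed integer intervals and treats adjacent intervals such as [0,15] and [16,31] as contiguous. The function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed address interval [lo, hi]


def union(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Union of two interval lists, merging overlapping or adjacent intervals."""
    merged: List[Interval] = []
    for lo, hi in sorted(a + b):
        if merged and lo <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def intersection(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted, non-overlapping interval lists (two-pointer sweep)."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


print(union([(0, 15)], [(16, 31)]))          # [(0, 31)]
print(intersection([(1, 127)], [(32, 63)]))  # [(32, 63)]
```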
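In step 304, iteration is performed based on the control flow graph. In each iteration, all nodes are traversed, the definition information and kill set information of the memory descriptors of each node are calculated and updated, and the data dependencies of the instructions are updated. The first iteration starts from the initial information obtained in step 303. Once new definition information and kill set information are obtained, they are substituted back and recalculated. The iteration continues until the definition information and kill set information of the memory descriptors of the variables in every node no longer change.
During iteration, the data flow may be processed in pre-order, in-order, post-order, or reverse post-order. In post-order, the left subtree is traversed in post-order, then the right subtree, and finally the root is visited. For FIG. 2, the post-order traversal is M→D→C→B→H→F→G→E→L→J→K→I→A. Reverse post-order is obtained by traversing in post-order and then reversing the result, so the reverse post-order traversal of the control flow graph of FIG. 2 is A→I→K→J→L→E→G→F→H→B→C→D→M. Reverse post-order can converge sooner. This embodiment does not restrict the traversal order, but reverse post-order is preferred.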
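The iteration of step 304 is a fixed-point loop over the nodes in reverse post-order. In the sketch below, the graph is hypothetical. A toy transfer function (the set of nodes that can reach each node) stands in for the definition/kill-set update. The names `reverse_post_order` and `solve` are illustrative.

```python
from typing import Callable, Dict, List

Graph = Dict[str, List[str]]


def reverse_post_order(succs: Graph, entry: str) -> List[str]:
    post: List[str] = []
    seen = set()

    def visit(n: str) -> None:
        seen.add(n)
        for s in succs.get(n, []):
            if s not in seen:
                visit(s)
        post.append(n)  # appended only after its whole DFS subtree

    visit(entry)
    return post[::-1]


def solve(succs: Graph, entry: str, init: Callable, transfer: Callable) -> Dict:
    """Apply `transfer` to every node in reverse post-order until nothing changes."""
    order = reverse_post_order(succs, entry)
    state = {n: init(n) for n in order}
    changed = True
    while changed:
        changed = False
        for n in order:
            new = transfer(n, state)
            if new != state[n]:
                state[n] = new
                changed = True
    return state


# Hypothetical graph with a loop B -> C -> B.
succs = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
preds = {"A": [], "B": ["A", "C"], "C": ["B"], "D": ["C"]}
reach = solve(succs, "A", lambda n: frozenset(),
              lambda n, s: frozenset({n}).union(*(s[p] for p in preds[n])))
print(reverse_post_order(succs, "A"))  # ['A', 'B', 'C', 'D']
print(sorted(reach["B"]))              # ['A', 'B', 'C']
```

The back edge C→B is why a second pass is needed: B's information only becomes complete after C has been visited once.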
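When the definition information and kill set information of the memory descriptors of the variables in every node no longer change, the iteration ends. The def-use chains, use-def chains, and kill set information at that point are the final result. In detail, updating the definition information and kill set information of each memory descriptor is in effect updating the data dependencies between instructions. These data dependencies serve as reference information for instruction compilation and optimization. The optimized instructions can be compiled into machine code that a computer can execute, so that the computer can perform computation on image, speech, text, and other data according to these data dependencies. Instruction compilation optimizations include, but are not limited to, dead code elimination, common subexpression elimination, and constant folding.
Because nodes may involve PHI functions, this step further traverses the instructions of each basic block and determines whether the successor of each basic block is a PHI node. If so, the definition information of all immediate predecessors of the PHI node is set as the definition information of the PHI node's memory descriptor, which determines the PHI node's definition information. Next, the kill set information of these immediate predecessors is intersected, and the resulting intersection interval is set as the kill set information of the PHI node's memory descriptor, which determines the PHI node's kill set information.
The following example shows how the definitions of a PHI function are determined. Suppose a PHI node has two incoming branches, and in each of its two immediate predecessors an instruction defines memory descriptor MD1, giving versions MD1V1 and MD1V2 with the following records:
          Definition information   Kill set information
MD1V1     IR1, IR2                 {[1,127]}
MD1V2     IR3, IR4                 {[32,63]}
The definition information of all immediate predecessors of the PHI node is set as the definition information of the PHI node's memory descriptor, namely IR1, IR2, IR3, and IR4. The kill set information of these immediate predecessors is then intersected: the intersection of [1,127] and [32,63] is [32,63], so the kill set information of the PHI node's memory descriptor is [32,63].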
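The PHI merge rule above can be reproduced on the MD1V1/MD1V2 example: take the union of the predecessors' definitions and the intersection of their kill sets. The record layout `(definitions, kill set)` and the function names are assumptions of this sketch.

```python
from typing import List, Tuple

Interval = Tuple[int, int]                     # closed address interval [lo, hi]
PathRecord = Tuple[List[str], List[Interval]]  # (definitions, kill set) on one path


def intersect_lists(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersection of two sorted interval lists by a two-pointer sweep."""
    i = j = 0
    out: List[Interval] = []
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def merge_at_phi(incoming: List[PathRecord]) -> PathRecord:
    """Combine the records of a PHI node's immediate predecessors."""
    defs: List[str] = []
    for path_defs, _ in incoming:
        defs += [d for d in path_defs if d not in defs]  # union of definitions
    kill = incoming[0][1]
    for _, path_kill in incoming[1:]:
        kill = intersect_lists(kill, path_kill)          # killed on every path
    return defs, kill


md1v1 = (["IR1", "IR2"], [(1, 127)])
md1v2 = (["IR3", "IR4"], [(32, 63)])
print(merge_at_phi([md1v1, md1v2]))  # (['IR1', 'IR2', 'IR3', 'IR4'], [(32, 63)])
```

The intersection is the conservative choice: an interval counts as killed after the join only if it is killed on every incoming path.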
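In step 304 of the foregoing embodiment, the uses of an instruction are generally identified before its definitions. More specifically, the embodiment first identifies which instructions in each node are uses and which are definitions. It then calculates and updates the definition information and kill set information of the memory descriptors used by those instructions, and after that, those of the memory descriptors they define.
For example, the addition instruction C[0,127]=A[0,127]+B[0,127] defines MD1(var:C,offset:0,size:128) and uses MD2(var:A,offset:0,size:128) and MD3(var:B,offset:0,size:128). In the embodiments of the present disclosure, when calculating and updating the definition information and kill set information of the memory descriptors in each node, those of the addition instruction's uses (MD2 and MD3) may be calculated and updated first, followed by those of its definition MD1.
Another embodiment of the present disclosure is a method for calculating and updating the definition information and kill set information of the memory descriptors used by an instruction; FIG. 4 shows its flowchart.
In step 401, it is determined whether the use in the instruction has a must-use memory descriptor. If not, step 402 sets the definition information of the may-use memory descriptors as the definition information of the memory descriptors used by the instruction. The may-use memory descriptors may be several memory descriptors of the same variable or descriptors of several different variables. If a must-use memory descriptor exists, step 403 determines whether there is an exact must-use memory descriptor, that is, whether the must-use memory descriptor is exact. If it is not exact, step 404 sets the definition information of both the must-use and the may-use memory descriptors as the definition information used by the instruction. If it is exact, step 405 is executed.
In step 405, it is determined whether the definition information of the exact must-use memory descriptor exists. If not, step 406 identifies all may-use memory descriptors that have definition information, and then step 413 is executed. If it exists, step 407 sets the definition information of the exact must-use memory descriptor as the definition information used by the instruction, and then step 408 is executed.
In step 408, it is determined whether the kill set information of the exact must-use memory descriptor is empty; the kill set may be determined by considering all may-use memory descriptors of the same variable. If it is empty, the definition information of the exact must-use memory descriptor has not been killed. Step 409 is then executed: the kill set information of the instruction's use need not be set (it is empty), and the process ends. If it is not empty, part of the definition information of the exact must-use memory descriptor has been killed, and step 411 is executed.
In step 411, it is determined whether definition information exists for the may-use memory descriptor that makes the kill set information non-empty. If not, the may-use memory descriptors need not be considered further, the definition information of the must-use memory descriptor above suffices, and step 409 ends the process. If it exists, step 413 determines whether the intersection interval between the definition information of the must-use memory descriptor and that of the may-use memory descriptor has been killed. If not, step 414 sets the definition information of the may-use memory descriptor and of the must-use memory descriptor as the definition information of the use. If it has been killed, step 409 ends the process.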
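The decisions of FIG. 4 might be encoded as follows. This is a sketch under assumptions, not the disclosed implementation. The `Record` layout is hypothetical. "The may-use descriptors that make the kill set non-empty" is approximated as the may-use descriptors with definitions whose intervals overlap the must-use descriptor's kill set. An interval counts as "killed" only if a may-use descriptor's kill set covers it completely. The usage reproduces the IR4 step of the first worked example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Interval = Tuple[int, int]  # closed address interval [lo, hi]


@dataclass
class Record:
    """Hypothetical per-descriptor state: interval, exactness, definitions, kill set."""
    interval: Interval
    exact: bool = True
    defs: List[str] = field(default_factory=list)       # definition information
    kill: List[Interval] = field(default_factory=list)  # kill set information


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def covered(iv: Interval, kill: List[Interval]) -> bool:
    # Assumes a merged kill set, so a single interval must contain all of iv.
    return any(lo <= iv[0] and iv[1] <= hi for lo, hi in kill)


def use_defs(must: Optional[str], may: List[str], rec: Dict[str, Record]) -> List[str]:
    """Definitions reaching one use, following the decisions of FIG. 4 (sketch)."""
    out: List[str] = []

    def take(names: List[str]) -> None:
        for name in names:
            for d in rec[name].defs:
                if d not in out:
                    out.append(d)

    if must is None:                      # step 401 -> 402
        take(may)
        return out
    m = rec[must]
    if not m.exact:                       # step 403 -> 404
        take([must] + may)
        return out
    if m.defs:                            # step 405 -> 407
        take([must])
        if not m.kill:                    # step 408 -> 409
            return out
        # Step 411 (approximation): may-use descriptors with definitions over the killed part.
        cands = [n for n in may if rec[n].defs
                 and any(overlap(rec[n].interval, k) for k in m.kill)]
    else:                                 # step 405 -> 406
        cands = [n for n in may if rec[n].defs]
    for n in cands:                       # step 413 -> 414
        iv = overlap(m.interval, rec[n].interval)
        if iv is not None and not covered(iv, rec[n].kill):
            take([n])
    return out


# State just before IR4 in the first worked example.
rec = {"MD1": Record((0, 31), defs=["IR1"], kill=[(0, 15)]),
       "MD2": Record((0, 15), defs=["IR3"]),
       "MD3": Record((16, 31)), "MD4": Record((0, 64))}
print(use_defs("MD1", ["MD2", "MD3", "MD4"], rec))  # ['IR1', 'IR3']
```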
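Another embodiment of the present disclosure is also a method for computing and updating the definition information and kill-set information of an instruction's use. FIG. 5 shows its flowchart, which differs from the embodiment of FIG. 4 in two respects.
First, the embodiment of FIG. 5 does not include step 413. In other words, after step 406, step 501 is executed directly, setting the definition information of the may-use memory descriptors as the definition information of the instruction's use. It does not check whether the intersection interval of the must-use and may-use memory descriptors' definition information has been killed.
Second, in step 411, when the definition information of the may-use memory descriptors is determined to exist, step 502 is executed instead of step 413: the definition information of both the may-use and the must-use memory descriptors is set as the definition information of the instruction's use.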
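After computing and updating the definition information and kill-set information of the memory descriptors used by the instruction, step 304 computes and updates the definition information and kill-set information of the instruction's definitions.
Another embodiment of the present disclosure is a method for computing and updating the definition information and kill-set information of an instruction's definitions; FIG. 6 shows its flowchart.
In step 601, it is determined whether the instruction's definition has a must-def memory descriptor. If not, step 602 is executed to identify all may-def memory descriptors. Step 603 then excludes, from all may-def memory descriptors, those covered by other may-def memory descriptors. Step 604 then excludes those that already have definition information. Step 605 then creates definition information for the may-def memory descriptors that were not excluded. If step 601 determines that the instruction's definition has a must-def memory descriptor, step 606 is executed.
In step 606, it is determined whether the must-def memory descriptor is precise. If it is, step 607 is executed to create definition information for the must-def memory descriptor of the instruction's definition. Step 608 then identifies the may-def memory descriptors of that must-def memory descriptor. Step 609 then places the definition information of the must-def memory descriptor into the kill-set information of those may-def memory descriptors. If step 606 determines that the must-def memory descriptor is not precise, step 610 is executed.
In step 610, it is determined whether the may-def memory descriptors have definition information. If they do, step 611 is executed to mark those may-def memory descriptors as killed. If not, there is no definition information to act on, so step 612 is executed and the flow ends.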
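The placement in step 609 can be further refined into the flow shown in FIG. 7. In step 701, it is determined whether the may-def memory descriptor has been completely killed. If so, step 702 deletes its record. If not, step 703 determines whether its kill-set information is empty. If it is empty, step 704 places the entire interval of the must-def memory descriptor's definition information into the may-def memory descriptor's kill-set information. If it is not empty, step 705 forms the union of the may-def memory descriptor's kill-set information and the must-def memory descriptor's definition information, and step 706 places that union into the may-def memory descriptor's kill-set information.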
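The kill-set update of FIG. 7 can be illustrated with a sorted, coalesced interval list. An ordered list is also what clause A22 describes. The sketch below follows the order of the IR5 walk-through in the first worked example: it merges first, then drops the record once the kill set covers the descriptor's whole range.

`union` and `add_kill` are hypothetical names. Clipping the new definition to the victim descriptor's range is an added assumption, since only the overlapping part can be overwritten.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # closed integer range [lo, hi]

def union(intervals: List[Interval]) -> List[Interval]:
    """Sort and coalesce overlapping or adjacent intervals into an ordered list."""
    out: List[Interval] = []
    for lo, hi in sorted(intervals):
        if out and lo <= out[-1][1] + 1:          # overlaps or touches the previous one
            out[-1] = (out[-1][0], max(out[-1][1], hi))
        else:
            out.append((lo, hi))
    return out

def add_kill(records: Dict[str, dict], victim: str, victim_range: Interval,
             new_def: Interval) -> None:
    """Record that new_def (a precise must-def) overwrote part of the may-def
    descriptor `victim`; delete the victim's record once it is fully killed."""
    lo, hi = max(new_def[0], victim_range[0]), min(new_def[1], victim_range[1])
    if lo > hi:                                    # no overlap, nothing is killed
        return
    rec = records[victim]
    rec["kill"] = union(rec["kill"] + [(lo, hi)])  # steps 703-706 (empty set is the trivial case)
    if rec["kill"] == [victim_range]:              # steps 701-702
        del records[victim]

records = {"MD1": {"defs": ["IR1"], "kill": []}}
add_kill(records, "MD1", (0, 31), (0, 15))     # IR3 defines MD2[0,15]
print(records["MD1"]["kill"])                  # [(0, 15)]
add_kill(records, "MD1", (0, 31), (16, 31))    # IR5 defines MD3[16,31]
print("MD1" in records)                        # False
```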
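In another scenario, when step 601 determines that the instruction's definition has no must-def memory descriptor, the special may-def memory descriptors described in steps 603 and 604 are not excluded. Instead, after all may-def memory descriptors are identified in step 602, definition information is created directly for all of them.
In another scenario, the must-def memory descriptor is predicated, so whether the defining instruction will actually execute cannot be determined at compile time. For the predicated case, this embodiment further executes the flow shown in FIG. 8.
In step 801, it is determined whether the must-def memory descriptor of the instruction's definition is predicated. If not, step 802 is executed, i.e., the flow of FIG. 6. If so, definition information must be established for the instruction's must-def memory descriptor, and step 803 determines whether that must-def memory descriptor already has definition information. If it does, step 804 adds the instruction's definition to the must-def memory descriptor's existing definition information. If it does not, new definition information must be created for the must-def memory descriptor, so step 802 is executed, i.e., the flow of FIG. 6.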
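Meanwhile, while computing and updating the definition information and kill-set information of the instructions' memory descriptors, step 304 may also update the data dependences of the instructions. The data dependences of an instruction may refer to the definition information corresponding to each of its uses, from which the def-use chains and use-def chains can be determined. The data dependences may be recorded and stored in the form of an array.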
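The text says only that the dependences may be stored as an array. The sketch below uses a Python dict instead, an assumption made for readability. It shows how use-def chains (the reaching definitions recorded for each use) can be inverted into def-use chains. `def_use_chains` is a hypothetical helper. Its input is the use-def information that the first worked example derives for IR2, IR4 and IR6.

```python
from collections import defaultdict
from typing import Dict, List

def def_use_chains(use_def: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert use-def chains (use -> reaching defs) into def-use chains (def -> uses)."""
    def_use: Dict[str, List[str]] = defaultdict(list)
    for use, defs in use_def.items():
        for d in defs:
            def_use[d].append(use)
    return dict(def_use)

use_def = {"IR2": ["IR1"], "IR4": ["IR1", "IR3"], "IR6": ["IR3", "IR5"]}
print(def_use_chains(use_def))
# {'IR1': ['IR2', 'IR4'], 'IR3': ['IR4', 'IR6'], 'IR5': ['IR6']}
```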
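For a clearer understanding of the technical solution of the present disclosure, the above flows are illustrated with examples below.
In one example, a variable in a control flow graph involves the following memory descriptors and intervals: MD1[0,31], MD2[0,15], MD3[16,31] and MD4[0,64]. Following the data flow in reverse postorder, six relevant instructions define and use the variable as shown in the table below: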
Instruction    Must-def/use  May-def/use
IR1: def MD1   MD1           {MD2, MD3, MD4}
IR2: use MD1   MD1           {MD2, MD3, MD4}
IR3: def MD2   MD2           {MD1, MD4}
IR4: use MD1   MD1           {MD2, MD3, MD4}
IR5: def MD3   MD3           {MD1, MD4}
IR6: use MD4   MD4           {MD1, MD2, MD3}
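The instructions are processed in order. Instruction IR1 is processed first. It must-defines MD1; since this is a definition, the flow of FIG. 6 is executed. In step 601, it is determined whether IR1 has a must-def memory descriptor. MD1 is one, so step 606 determines whether it is precise. The interval [0,31] of MD1 is precise, so step 607 creates definition information for IR1's must-def memory descriptor, i.e., MD1[0,31] is recorded as the variable's definition information. Steps 608 and 609 follow. MD2, MD3 and MD4 are may-def memory descriptors, but querying the initial information shows that none of them appears in any instruction before IR1. They therefore cannot yet be considered for this definition, so no may-def memory descriptors need to be recorded and no kill-set information needs to be updated in steps 608 and 609. The record created for the variable's memory descriptors is therefore:
Descriptor  Definition info  Kill set
MD1         IR1              {}
Here a kill set recorded as {} is empty.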
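Instruction IR2 is processed next. It must-uses MD1; since this is a use, the flow of FIG. 4 or FIG. 5 is executed, and FIG. 4 is used here for illustration. In step 401, it is determined whether the instruction's use has a must-use memory descriptor. MD1 is one, so step 403 determines whether a precise must-use memory descriptor exists. MD1 is precise, so step 405 determines whether its definition information exists. The definition information of MD1, [0,31], exists, so step 407 sets it as the definition information of the instruction's use. The definition information of IR2's use is thus IR1 (the instruction defining MD1). Step 408 then determines whether the kill-set information of MD1 is empty. No interval of MD1 has been killed yet, so step 409 is executed: kill-set information need not be considered further, and the flow ends. In other words, the definition information of IR2 is IR1, which establishes the data dependence (use-def chain) between IR2 and IR1, and the kill-set information is empty.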
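Instruction IR3 is processed next. It must-defines MD2, so the flow of FIG. 6 is executed. In step 601, MD2 is found to be IR3's must-def memory descriptor, so step 606 determines whether it is precise. The interval [0,15] of MD2 is precise, so step 607 creates definition information for IR3's must-def memory descriptor, i.e., MD2[0,15] is recorded as definition information of the variable. Steps 608 and 609 follow. Since MD1 is a may-def memory descriptor of MD2, the kill-set information of MD1 is updated in step 609. The variable's memory descriptor records are updated as follows:
Descriptor  Definition info  Kill set
MD1         IR1              {[0,15]}
MD2         IR3              {}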
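Instruction IR4 is processed next. It must-uses MD1. In step 401, MD1 is found to be a must-use memory descriptor, so step 403 is executed. MD1 is precise, so step 405 determines whether MD1's definition information exists. It does, so step 407 sets MD1's definition information, IR1, as the definition information of IR4's use. Step 408 then determines whether MD1's kill-set information is empty. After IR3 was processed it is no longer empty, so step 411 determines whether definition information exists for the may-use memory descriptors. MD2 has definition information, so step 413 determines whether [0,15], the intersection interval of MD1's and MD2's definition information, has been killed. MD2's interval [0,15] has not been killed, so step 414 also sets MD2's definition information as definition information of the instruction's use. The definition information of IR4 is therefore IR1 (the instruction defining MD1) and IR3 (the instruction defining MD2), which establishes the data dependences (use-def chains) between IR4 and IR1 and IR3. In other words, IR4 uses MD1, whose interval is defined by two instructions: IR1 supplies [16,31] and IR3 supplies [0,15].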
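Instruction IR5 is processed next. It must-defines MD3, so the flow of FIG. 6 is executed. In step 601, MD3 is found to be IR5's must-def memory descriptor, so step 606 determines whether it is precise. MD3 is precise, so step 607 creates definition information for IR5's must-def memory descriptor, i.e., MD3 is recorded as definition information for the variable's memory descriptor. Steps 608 and 609 follow. Since MD1 is a may-def memory descriptor of MD3, MD1's kill-set information is updated in step 609. MD1's existing kill interval is [0,15] and the new kill interval is [16,31]; together they merge into [0,31], which is exactly MD1's full interval. In other words, MD1's definition has been completely killed, so its record is deleted. The variable's records are updated as follows:
Descriptor  Definition info  Kill set
MD2         IR3              {}
MD3         IR5              {}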
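Instruction IR6 is processed next. It must-uses MD4. In step 401, it is determined whether the instruction's use has a must-use memory descriptor. MD4 has not been defined by any preceding instruction, so there is no must-use memory descriptor to consult. Step 402 therefore sets the definition information of the may-use memory descriptors as the definition information of the instruction's use. Among all may-use memory descriptors, only MD2 and MD3 have records, and neither has had its valid interval killed. Their definition information (IR3 and IR5) therefore becomes IR6's definition information, which establishes the data dependences (use-def chains) between IR6 and IR3 and IR5.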
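In another example, suppose there are four memory descriptors: MD1[0,31], MD2[0,X], MD3[0,31] and MD4[0,63]. X indicates that the exact extent of the corresponding memory interval cannot be determined at compile time. MD1 and MD2 describe the same variable, say variable A, and MD3 and MD4 describe the same variable, say variable B. The instructions operating on these four memory descriptors are as follows: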
Figure PCTCN2021096378-appb-000002
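Instruction IR1 is processed first. It must-defines MD1; since this is a definition, the flow of FIG. 6 is executed. In step 601, MD1 is found to be IR1's must-def memory descriptor, so step 606 determines whether it is precise. MD1 is precise, so step 607 creates definition information for IR1's must-def memory descriptor, i.e., MD1[0,31] is recorded as definition information of variable A. Steps 608 and 609 follow. A definition of MD2 would affect MD1's definition information, but no instruction before IR1 defines MD2. The may-def memory descriptor MD2 therefore cannot yet be considered, so no may-def memory descriptors need to be recorded and no kill-set information needs to be updated in steps 608 and 609. The record created for variable A's memory descriptors is therefore:
Descriptor  Definition info  Kill set  is_kill
MD1         IR1              {}        false
Here is_kill records whether the memory descriptor has been killed: true means killed, false means not killed.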
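Instruction IR2 is processed next. It must-defines MD2. In step 601, MD2 is found to be IR2's must-def memory descriptor, so step 606 determines whether it is precise. MD2 is not precise because X is undetermined, so step 610 determines whether the may-def memory descriptor has definition information. MD1, the may-def memory descriptor of IR2, has definition information, so step 611 marks MD1 as killed: MD1's is_kill becomes true, indicating that part of MD1's interval has been killed. Because MD2's definition information is imprecise, MD1's kill-set information cannot be determined, so the is_kill flag is used instead to indicate that MD1 has been killed. Variable A's memory descriptor records are updated as follows:
Descriptor  Definition info  Kill set  is_kill
MD1         IR1              {}        true
MD2         IR2              {}        false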
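Instruction IR3 is processed next. It must-uses MD2. In step 401, MD2 is found to be a must-use memory descriptor, so step 403 determines whether a precise must-use memory descriptor exists. MD2 is not precise, so step 404 sets the definition information of the must-use and may-use memory descriptors as the definition information of the memory descriptors used by the instruction. That is, the definition information of both MD2 and MD1 becomes IR3's definition information. The definition information of IR3's use of MD2 is therefore IR1 and IR2, which establishes the data dependences (use-def chains) between IR3 and IR2 and IR1.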
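Instruction IR4 is processed next; it is a definition. In step 601, it is determined whether the instruction has a must-def memory descriptor. It does not, so step 602 identifies all may-def memory descriptors, which in this example are MD1, MD2, MD3 and MD4. Step 603 then excludes those covered by other may-def memory descriptors; MD3 is completely covered by MD4, so MD3 is excluded. Step 604 then excludes those that already have definition information; MD1 and MD2 already have definition information and are excluded, leaving only MD4. Step 605 then creates definition information for the remaining may-def memory descriptor, i.e., definition information is added for MD4 of variable B. The updated definition information of each memory descriptor is:
Descriptor  Definition info  Kill set  is_kill
MD1         IR1              {}        true
MD2         IR2              {}        false
MD4         IR4              {}        false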
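Instruction IR5 is processed next. It must-defines MD1. Because IR5 is predicated, the flow shown in FIG. 8 is executed. In step 801, it is determined whether the must-def memory descriptor of the instruction's definition is predicated. It is, so step 803 determines whether that must-def memory descriptor has definition information. MD1 has definition information, so step 804 adds the instruction's definition to MD1's definition information. The definition records are updated to:
Descriptor  Definition info  Kill set  is_kill
MD1         IR1, IR5         {}        true
MD2         IR2              {}        false
MD4         IR4              {}        false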
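Instruction IR6 is processed next; it is a use. In step 401, it is determined whether the instruction's use has a must-use memory descriptor. It does not, so step 402 sets the definition information of the may-use memory descriptors as the definition information of IR6's use. The definition information used by IR6 is IR1, IR2, IR4 and IR5, which establishes the data dependences (use-def chains) between IR6 and IR1, IR2, IR4 and IR5.
Another embodiment of the present disclosure is a computer-readable storage medium storing computer program code for computing data dependences in a program. When the computer program code is run by a processor, the various methods of the present disclosure are performed, such as those described with reference to FIGS. 3 to 8.
In the foregoing embodiments, the compiler analyzes the memory intervals and kill-set information of every definition and use of each instruction and computes the data dependences between instructions. It also builds def-use chains and use-def chains for instructions that have a definition-use relationship. This addresses the technical problem that data dependences between memory accesses are difficult to analyze, which wastes computing resources, and achieves the technical effect of improved computing efficiency.
The compiler can perform compiler optimization based on these data dependences, effectively removing unnecessary computation and producing optimized code. The compiler can further convert the optimized code into machine code executable by an artificial intelligence chip, so that the chip, and neural network inference in particular, runs faster.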
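FIG. 9 is a structural diagram of an integrated circuit device 900 to which the embodiments of the present disclosure are applied. As shown in FIG. 9, the integrated circuit device 900 includes a computing device 902, which carries machine code compiled according to the embodiments of the present disclosure to perform neural network inference. The integrated circuit device 900 further includes a universal interconnect interface 904 and another processing device 906.
The other processing device 906 may be one or more types of general-purpose and/or special-purpose processors, such as a central processing unit, a graphics processing unit or an artificial intelligence processor. Their number is not limited and is determined by actual needs. The other processing device 906 serves as the interface between the computing device 902 and external data and control. Its tasks include, but are not limited to, data transfer and basic control such as starting and stopping the computing device 902. The other processing device 906 may also cooperate with the computing device 902 to complete computing tasks.
The universal interconnect interface 904 may be used to transfer data and control instructions between the computing device 902 and the other processing device 906. For example, the computing device 902 may obtain the required input data from the other processing device 906 via the universal interconnect interface 904 and write it to the on-chip storage unit of the computing device 902. Further, the computing device 902 may obtain control instructions from the other processing device 906 via the universal interconnect interface 904 and write them to the on-chip control cache of the computing device 902. Alternatively or optionally, the universal interconnect interface 904 may also read data from the storage module of the computing device 902 and transfer it to the other processing device 906.
The integrated circuit device 900 further includes a storage device 908, which may be connected to the computing device 902 and the other processing device 906 respectively. The storage device 908 stores data of the computing device 902 and the other processing device 906. It is particularly suitable for data that cannot be held entirely in the internal storage of the computing device 902 or the other processing device 906.
Depending on the application scenario, the integrated circuit device 900 may serve as a system on chip (SoC) for devices such as mobile phones, robots, drones and video capture equipment. This effectively reduces the core area of the control portion, increases processing speed and lowers overall power consumption. In this case, the universal interconnect interface 904 of the integrated circuit device 900 is connected to certain components of the device, such as a camera, display, mouse, keyboard, network card or Wi-Fi interface.
The present disclosure further discloses a chip or integrated circuit chip including the integrated circuit device 900, and a chip package structure including the above chip.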
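Another embodiment of the present disclosure is a board card including the above chip package structure. Referring to FIG. 10, in addition to a plurality of the above chips 1002, the board card 1000 may include other supporting components, including a storage device 1004, an interface device 1006 and a control device 1008.
The storage device 1004 is connected to the chip 1002 in the chip package structure via a bus 1014 and is used to store data. The storage device 1004 may include multiple groups of storage units 1010.
The interface device 1006 is electrically connected to the chip 1002 in the chip package structure and implements data transfer between the chip 1002 and an external device 1012 (for example, a server or computer). In this embodiment, the interface device 1006 is a standard PCIe interface, through which data to be processed is transferred from the server to the chip 1002. The computation results of the chip 1002 are likewise sent back to the external device 1012 by the interface device 1006.
The control device 1008 is electrically connected to the chip 1002 to monitor its state. Specifically, the chip 1002 and the control device 1008 may be electrically connected through an SPI interface. The control device 1008 may include a microcontroller unit (MCU).
Another embodiment of the present disclosure is an electronic device or apparatus including the above board card 1000. Depending on the application scenario, the electronic device or apparatus may include:
- a data processing apparatus, robot, computer, printer, scanner, tablet computer, smart terminal or mobile phone;
- a dashboard camera, navigator, sensor, webcam, camera or video camera;
- a server or cloud server;
- a projector, watch, earphones, mobile storage or wearable device;
- a vehicle, household appliance and/or medical device.
The vehicle includes an aircraft, ship and/or car. The household appliance includes a television, air conditioner, microwave oven, refrigerator, rice cooker, humidifier, washing machine, electric light, gas stove and range hood. The medical device includes a nuclear magnetic resonance instrument, B-mode ultrasound scanner and/or electrocardiograph.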
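The foregoing may be better understood in light of the following clauses:
Clause A1. A method for computing data dependences in a program, the method comprising: constructing the program into a control flow graph in static single assignment form, the control flow graph comprising a plurality of nodes; setting initial information of the memory descriptors of the plurality of nodes, the initial information being the definition information and kill-set information of all memory descriptors of the immediate dominator node; and iterating based on the control flow graph, each iteration performing the following steps: traversing the plurality of nodes, computing and updating the definition information and kill-set information of the memory descriptors of each node, and updating the data dependences of the instructions, wherein the data dependences of the instructions serve as reference information for instruction compilation optimization.
Clause A2. The method of clause A1, wherein the iterating step is performed in a reverse-postorder data-flow manner.
Clause A3. The method of clause A1, wherein in the computing step, the memory descriptors used by an instruction are computed and updated first, and then the memory descriptors defined by the instruction are computed and updated.
Clause A4. The method of clause A3, wherein for the use of the instruction, the computing step further comprises: determining whether a precise must-use memory descriptor exists; if so, determining whether definition information of the precise must-use memory descriptor exists; and if the definition information exists, setting the definition information of the precise must-use memory descriptor as the definition information of the use.
Clause A5. The method of clause A4, wherein if the definition information exists, the computing step further comprises: determining whether the kill-set information of the precise must-use memory descriptor is empty; and if not empty, identifying the may-use memory descriptor that makes the kill-set information non-empty; determining whether definition information of the may-use memory descriptor exists; if the definition information of the may-use memory descriptor exists, determining whether the intersection interval of the definition information of the must-use memory descriptor and the definition information of the may-use memory descriptor has been killed; and if not killed, setting the definition information of the may-use memory descriptor and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
Clause A6. The method of clause A4, wherein if the definition information exists, the computing step further comprises: determining whether the kill-set information of the precise must-use memory descriptor is empty; and if not empty, determining whether definition information of the may-use memory descriptor exists; and if the definition information of the may-use memory descriptor exists, setting the definition information of the may-use memory descriptor and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
Clause A7. The method of clause A4, wherein if the definition information does not exist, the computing step further comprises: identifying all may-use memory descriptors having definition information; determining whether the intersection interval of the definition information of the precise must-use memory descriptor and the definition information of the may-use memory descriptors has been killed; and if not killed, setting the definition information of the may-use memory descriptors and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
Clause A8. The method of clause A4, wherein if the definition information does not exist, the computing step further comprises: identifying all may-use memory descriptors having definition information; and setting the definition information of the may-use memory descriptors as the definition information of the use of the instruction.
Clause A9. The method of clause A4, wherein if no precise must-use memory descriptor exists, the computing step further comprises: setting the definition information of the must-use memory descriptor and the may-use memory descriptors as the definition information of the use of the instruction.
Clause A10. The method of clause A4, wherein the computing step further comprises: determining whether the use of the instruction has a must-use memory descriptor; and if no must-use memory descriptor exists, setting the definition information of the may-use memory descriptors as the definition information of the use of the instruction.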
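Clause A11. The method of clause A3, wherein for the definition of the instruction, the computing step comprises: establishing definition information of the must-def memory descriptor in the definition of the instruction; identifying the may-def memory descriptors of the must-def memory descriptor in the definition of the instruction; and placing the definition information of the must-def memory descriptor of the instruction into the kill-set information of the may-def memory descriptors.
Clause A12. The method of clause A11, wherein the placing step comprises: determining whether the may-def memory descriptor has been completely killed; and if so, deleting the record of the may-def memory descriptor.
Clause A13. The method of clause A11, wherein the placing step comprises: determining whether the kill-set information of the may-def memory descriptor is empty; and if empty, placing the entire interval of the definition information of the must-def memory descriptor in the definition of the instruction into the kill-set information of the may-def memory descriptor.
Clause A14. The method of clause A13, wherein if the kill-set information of the may-def memory descriptor is not empty, the placing step comprises: forming the union interval of the kill-set information of the may-def memory descriptor and the definition information of the must-def memory descriptor; and placing the union interval into the kill-set information of the may-def memory descriptor.
Clause A15. The method of clause A3, wherein for the definition of the instruction, the computing step comprises: determining whether the must-def memory descriptor in the definition of the instruction is precise; if precise: establishing definition information of the must-def memory descriptor in the definition of the instruction; and identifying the may-def memory descriptors of that must-def memory descriptor; if not precise: determining whether the may-def memory descriptors have definition information; and if the may-def memory descriptors have definition information, marking the may-def memory descriptors as killed.
Clause A16. The method of clause A3, wherein for the definition of the instruction, the computing step comprises: determining whether the definition of the instruction has a must-def memory descriptor; and if no must-def memory descriptor exists, performing the following steps: identifying all may-def memory descriptors; excluding, from all may-def memory descriptors, those covered by other may-def memory descriptors; excluding, from all may-def memory descriptors, those that already have definition information; and establishing definition information for the may-def memory descriptors that were not excluded.
Clause A17. The method of clause A3, wherein for the definition of the instruction, the computing step comprises: determining whether the definition of the instruction has a must-def memory descriptor; and if no must-def memory descriptor exists, establishing definition information for all may-def memory descriptors.
Clause A18. The method of clause A3, wherein for the definition of the instruction, the computing step comprises: determining whether the must-def memory descriptor of the definition of the instruction is predicated; and if predicated, establishing definition information of the must-def memory descriptor of the definition of the instruction.
Clause A19. The method of clause A18, wherein the step of establishing definition information of the must-def memory descriptor of the definition of the instruction comprises: determining whether the must-def memory descriptor of the definition of the instruction has definition information; if so, adding the definition of the instruction to the definition information of that must-def memory descriptor; and if not, establishing new definition information of the must-def memory descriptor of the definition of the instruction.
Clause A20. The method of clause A1, wherein the computing step comprises: traversing the instructions of each basic block and determining whether the successor basic block of each basic block is a PHI node; and if it is a PHI node, performing the following steps: setting the definition information of each such basic block as the definition information of the memory descriptor of the PHI node; forming the intersection interval of the kill-set information of all such basic blocks; and setting the intersection interval as the kill-set information of the PHI node.
Clause A21. The method of clause A1, further comprising: identifying the immediate dominator node among the plurality of nodes.
Clause A22. The method of any one of clauses A1 to A21, wherein the kill-set information is an ordered list.
Clause A23. A computer-readable storage medium storing computer program code for computing data dependences in a program, wherein when the computer program code is run by a processor, the method of any one of clauses A1 to A22 is performed.
The embodiments of the present disclosure have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the disclosure. The description of the above embodiments is intended only to help understand the method of the disclosure and its core idea. Those of ordinary skill in the art may, based on the idea of the disclosure, make changes to the specific implementations and scope of application. In summary, the content of this specification should not be construed as limiting the disclosure.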

Claims (23)

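  1. A method for computing data dependences in a program, the method comprising:
    constructing the program into a control flow graph in static single assignment form, the control flow graph comprising a plurality of nodes;
    setting initial information of the memory descriptors of the plurality of nodes, the initial information being the definition information and kill-set information of all memory descriptors of the immediate dominator node; and
    iterating based on the control flow graph, each iteration performing the following steps:
    traversing the plurality of nodes, computing and updating the definition information and kill-set information of the memory descriptors of each node, and updating the data dependences of instructions;
    wherein the data dependences of the instructions serve as reference information for instruction compilation optimization.
  2. The method according to claim 1, wherein the iterating step is performed in a reverse-postorder data-flow manner.
  3. The method according to claim 1, wherein in the computing step, the memory descriptors used by an instruction are computed and updated first, and then the memory descriptors defined by the instruction are computed and updated.
  4. The method according to claim 3, wherein for the use of the instruction, the computing step further comprises:
    determining whether a precise must-use memory descriptor exists;
    if so, determining whether definition information of the precise must-use memory descriptor exists;
    if the definition information exists, setting the definition information of the precise must-use memory descriptor as the definition information of the use.
  5. The method according to claim 4, wherein if the definition information exists, the computing step further comprises:
    determining whether the kill-set information of the precise must-use memory descriptor is empty; and
    if not empty, determining whether definition information of the may-use memory descriptor exists;
    if the definition information of the may-use memory descriptor exists, determining whether the intersection interval of the definition information of the must-use memory descriptor and the definition information of the may-use memory descriptor has been killed;
    if not killed, setting the definition information of the may-use memory descriptor and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
  6. The method according to claim 4, wherein if the definition information exists, the computing step further comprises:
    determining whether the kill-set information of the precise must-use memory descriptor is empty; and
    if not empty, determining whether definition information of the may-use memory descriptor exists;
    if the definition information of the may-use memory descriptor exists, setting the definition information of the may-use memory descriptor and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
  7. The method according to claim 4, wherein if the definition information does not exist, the computing step further comprises:
    identifying all may-use memory descriptors having definition information;
    determining whether the intersection interval of the definition information of the precise must-use memory descriptor and the definition information of the may-use memory descriptors has been killed; and
    if not killed, setting the definition information of the may-use memory descriptors and the definition information of the must-use memory descriptor as the definition information of the use of the instruction.
  8. The method according to claim 4, wherein if the definition information does not exist, the computing step further comprises:
    identifying all may-use memory descriptors having definition information; and
    setting the definition information of the may-use memory descriptors as the definition information of the use of the instruction.
  9. The method according to claim 4, wherein if no precise must-use memory descriptor exists, the computing step further comprises:
    setting the definition information of the must-use memory descriptor and the may-use memory descriptors as the definition information of the use of the instruction.
  10. The method according to claim 4, wherein the computing step further comprises:
    determining whether the use of the instruction has a must-use memory descriptor; and
    if no must-use memory descriptor exists, setting the definition information of the may-use memory descriptors as the definition information of the use of the instruction.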
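  11. The method according to claim 3, wherein for the definition of the instruction, the computing step comprises:
    establishing definition information of the must-def memory descriptor in the definition of the instruction;
    identifying the may-def memory descriptors of the must-def memory descriptor in the definition of the instruction; and
    placing the definition information of the must-def memory descriptor of the instruction into the kill-set information of the may-def memory descriptors.
  12. The method according to claim 11, wherein the placing step comprises:
    determining whether the may-def memory descriptor has been completely killed; and
    if completely killed, deleting the record of the may-def memory descriptor.
  13. The method according to claim 11, wherein the placing step comprises:
    determining whether the kill-set information of the may-def memory descriptor is empty;
    if empty, placing the entire interval of the definition information of the must-def memory descriptor in the definition of the instruction into the kill-set information of the may-def memory descriptor.
  14. The method according to claim 13, wherein if the kill-set information of the may-def memory descriptor is not empty, the placing step comprises:
    forming the union interval of the kill-set information of the may-def memory descriptor and the definition information of the must-def memory descriptor; and
    placing the union interval into the kill-set information of the may-def memory descriptor.
  15. The method according to claim 3, wherein for the definition of the instruction, the computing step comprises:
    determining whether the must-def memory descriptor in the definition of the instruction is precise;
    if precise:
    establishing definition information of the must-def memory descriptor in the definition of the instruction; and
    identifying the may-def memory descriptors of the must-def memory descriptor in the definition of the instruction;
    if not precise:
    determining whether the may-def memory descriptors have definition information; and
    if the may-def memory descriptors have definition information, marking the may-def memory descriptors as killed.
  16. The method according to claim 3, wherein for the definition of the instruction, the computing step comprises:
    determining whether the definition of the instruction has a must-def memory descriptor;
    if no must-def memory descriptor exists, performing the following steps:
    identifying all may-def memory descriptors;
    excluding, from all may-def memory descriptors, those covered by other may-def memory descriptors;
    excluding, from all may-def memory descriptors, those that already have definition information; and
    establishing definition information for the may-def memory descriptors that were not excluded.
  17. The method according to claim 3, wherein for the definition of the instruction, the computing step comprises:
    determining whether the definition of the instruction has a must-def memory descriptor;
    if no must-def memory descriptor exists, establishing definition information for all may-def memory descriptors.
  18. The method according to claim 3, wherein for the definition of the instruction, the computing step comprises:
    determining whether the must-def memory descriptor of the definition of the instruction is predicated;
    if predicated, establishing definition information of the must-def memory descriptor of the definition of the instruction.
  19. The method according to claim 18, wherein the step of establishing definition information of the must-def memory descriptor of the definition of the instruction comprises:
    determining whether the must-def memory descriptor of the definition of the instruction has definition information;
    if it has definition information, adding the definition of the instruction to the definition information of that must-def memory descriptor; and
    if it has no definition information, establishing new definition information of the must-def memory descriptor of the definition of the instruction.
  20. The method according to claim 1, wherein the computing step comprises:
    traversing the instructions of each basic block and determining whether the successor basic block of each basic block is a PHI node;
    if it is a PHI node, performing the following steps:
    setting the definition information of each such basic block as the definition information of the memory descriptor of the PHI node;
    forming the intersection interval of the kill-set information of all such basic blocks; and
    setting the intersection interval as the kill-set information of the PHI node.
  21. The method according to claim 1, further comprising: identifying the immediate dominator node among the plurality of nodes.
  22. The method according to any one of claims 1 to 21, wherein the kill-set information is an ordered list.
  23. A computer-readable storage medium storing computer program code for computing data dependences in a program, wherein when the computer program code is run by a processor, the method according to any one of claims 1 to 22 is performed.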
PCT/CN2021/096378 2020-05-29 2021-05-27 Method for computing data dependence relationship in program, and computer readable storage medium WO2021239056A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/000,120 US20230195599A1 (en) 2020-05-29 2021-05-27 Method for computing data dependence relationship in program, and computer readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010474869.5A CN113741861A (zh) 2020-05-29 2020-05-29 Method for computing data dependence relationship in program, and computer readable storage medium
CN202010474869.5 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021239056A1 true WO2021239056A1 (zh) 2021-12-02

Family

ID=78724583

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096378 WO2021239056A1 (zh) 2020-05-29 2021-05-27 Method for computing data dependence relationship in program, and computer readable storage medium

Country Status (3)

Country Link
US (1) US20230195599A1 (zh)
CN (1) CN113741861A (zh)
WO (1) WO2021239056A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6301704B1 (en) * 1998-06-16 2001-10-09 Silicon Graphics, Inc. Method, system, and computer program product for using static single assignment form as a program representation and a medium for performing global scalar optimization
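CN110168548A (zh) * 2017-01-05 2019-08-23 Koninklijke Philips N.V. Compiling device and method
CN110633248A (zh) * 2018-06-25 2019-12-31 Intel Corporation Sequence optimization in a high-performance computing environment
CN111857815A (zh) * 2019-04-24 2020-10-30 Huawei Technologies Co., Ltd. Instruction processing method and apparatus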

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MA HONGTU, ZHAO RONG-CAI, SU YAN-BING: "Analysis and Implementation of the Computation of Dominator in CFG", COMPUTER SCIENCE, KEXUE JISHU WENXIAN CHUBANSHE CHONGQING FENSHE, CN, vol. 36, no. 3, 31 March 2009 (2009-03-31), CN , XP055871688, ISSN: 1002-137X *

Also Published As

Publication number Publication date
US20230195599A1 (en) 2023-06-22
CN113741861A (zh) 2021-12-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21812058

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21812058

Country of ref document: EP

Kind code of ref document: A1